MVS TOOLS AND TRICKS OF THE TRADE JULY 2001 Sam Golob MVS Systems Programmer P.O. Box 906 Tallman, New York 10982 Sam Golob is a Senior Systems Programmer. He also participates in library tours and book signings with his wife, author Courtney Taylor. Sam can be contacted at sbgolob@cbttape.org. Information about the CBT MVS Tapes can be found on the web, at http://www.cbttape.org. EBCDIC to ASCII - YES or NO? As we all know, data is represented on the computer as a succession of bit settings: 0 if the bit is off, 1 if the bit is on. To represent data on most of the systems we work with, the bits are gathered in groups of eight. A group of eight consecutive bits is called a byte. The number of different possible combinations of the eight bits in a byte, is 2 to the eighth power, which equals 256. So it follows that there are 256 different possible byte combinations of eight bits. We use the common hexadecimal representation, to shorten how we picture eight-bit bytes. Each hexadecimal number represents 4 bits. And a byte represents 8 bits. Therefore, it takes two 4-bit hexadecimal numbers to describe one 8-bit byte. So, in the common hexadecimal representation, the values of these 8-bit byte numbers, representing the decimal numbers from 0 thru 255, ranges from hex 00 (or X'00') to hex FF (or X'FF'). This is true, since a hexadecimal number represents a four-bit value from 0 to 15, denoted in increasing order, as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. And two successive hexadecimal 4-bit numbers are used to represent one eight-bit byte. Thus, the eight-bit byte whose value is 0001,1111 is represented by the hexadecimal value 1F, with the hex 1 being the four bytes of 0001, and the hex F being the four bytes of 1111. When we represent characters such as a "small a" or a "capital A" on the computer, we choose one of these eight-bit combinations to represent each character. So now it is a question of deciding which combination of eight bits should represent which character. Obviously, in order for one computer system to communicate with another computer system, we must set a standard. We must decide, which byte combination stands for A and which for B, which for 1 and which for 9. Also, all of the special characters such as the pound sign (#) and so forth, must likewise be accounted for, in a standard manner, by eight-bit "byte" representations. And different characters present in different languages, such as the "umlaut-ed" vowels in German, and so forth, must likewise be standardly represented. Has this been done? Of course it has. What is the complication? It has been done in two different, and completely incompatible ways. There are currently two entirely different standards for character representations of machine-readable data. Of the two standards, one, which has been championed by IBM, is called EBCDIC. The other, which has been adopted by practically everybody else, is called ASCII. IBM is a powerful enough force, that its standard cannot be ignored. MVS, for the most part, and VM, and VSE, and AS/400, largely use EBCDIC data representation for characters. And then, of course, there's the rest of the world.... They use ASCII. TRANSLATION BETWEEN THE TWO SYSTEMS. In Assembler Language, it is not hard to convert, for example, an 80-byte card image full of characters, from EBCDIC to ASCII representation, and vice-versa. To do so, you use a translate table containing 256 characters, and a single TR (translate) instruction. Once this is coded, the machine does the rest. An example of a partial translation table from ASCII to EBCDIC representation, is shown in Figure 1. This table come to us, courtesy of Tachyon Software, makers of the magnificent Tachyon 390 Cross Assembler for the PC. The Tachyon 390 Cross Assembler, which can take source code input either in ASCII or EBCDIC representation, mimics almost the complete set of features of the MVS/VM/VSE High Level Assembler from IBM. But it does it on a PC, with compatible output, and it can produce compatible ADATA. When the Tachyon Assembler "sees" a piece of source code, it looks at the first few characters, and figures out if they look like ASCII letters and numbers, or EBCDIC letters and numbers. Then it "gears itself" to take the rest of the source input in the same way. To find out more about Tachyon products, go to www.tachyonsoft.com . If you look closely at Figure 1, you'll notice some startling differences. In ASCII, the lower case letters are represented by bigger hex numbers than the upper case letters. But in EBCDIC, the opposite is the case--upper case letters are larger numbers than lower case letters. Another difference: in EBCDIC, as is widely known among Assembler programmers, you convert a lower case letter to an upper case letter by OR'ing it with a blank (X'40' in EBCDIC). But in ASCII, you convert an upper case letter to a lower case letter by OR'ing it with a blank (X'20' in ASCII). In EBCDIC, numbers (X'F0' thru X'F9') are bigger hex numbers than letters (either upper case or lower case). But in ASCII, numbers (X'30' thru x'39') are smaller hex numbers than either the upper case or the lower case letters. Whether one hex number representing a letter or a number is bigger, or smaller than another, makes a great difference to us. We see the difference when we do sorting. Sorting usually goes completely according to the numeric value of a character. If the character is represented by a bigger number, that character sorts higher. So we see that if we'd sort a list of characters in ASCII, the numbers would sort lower than the letters. But in EBCDIC, the numbers sort higher. That's why, when we look in the index of a book, and we see the lower case words coming after the upper case words, and the numbers coming before both of them, we know that the book was composed on a computer system using ASCII representation. Whereas if the lower case words come first, and the numbers come last, we know the book was composed and indexed using an EBCDIC-based machine. Sometimes even IBM confuses us this way--we look at the index to an IBM MVS manual, and its index is sorted the ASCII way, although the subject matter of the book is EBCDIC oriented. The solution to this paradox is easy. IBM simply used a PC to compose its MVS manuals. If you know these differences between EBCDIC and ASCII sorts, those apparent anomalies are completely explained. MOVING DATA BETWEEN SYSTEMS If you're moving text from a pc to an MVS machine, you would be wise to know, if the program that does the moving, also does a translate operation at the same time. For example, if IND$FILE is the program which does the data moving, you instruct IND$FILE to translate from ASCII to EBCDIC on an upload, or from EBCDIC to ASCII on a download, with the "ASCII" keyword of the IND$FILE command. Data delimiting is another difference between the two systems. MVS (fixed blocked) records are delimited by the record size, while ASCII text files on the PC, are often delimited either by a Carriage Return (CR) X'0D' character followed by a Line Feed (LF) X'0A' character, or by one of them without the other (as is done by many UNIX machines). All of this may be confusing to the beginner, but after one becomes knowledgeable, one can become an accomplished "data jockey". For example, I use SPF/PC (from Command Technology Corporation) on the PC, to look at data. I most often use one of the early Windows versions of SPF/PC, which still looks very mainframe-ISPF-like. SPF/PC allows you to control the profile (keyed on the second name for the file, after the dot), to describe the characteristics of the file you think you're looking at. Then, when you rename the second name of the file to match the profile, you can either see the data properly, or it looks mangled. You adjust the SPF/PC profile characteristics, using your ASCII-EBCDIC knowledge and your data delimiting smarts, until the data can be seen clearly. Once that's done, SPF/PC offers data conversion facilities, to convert a text file from say, ASCII and CR-LF delimited, to EBCDIC, and 80-character delimited (padded with blanks). I'll conclude this month's discussion with some words about the usefulness of the TSO XMIT command for moving data. On MVS, the TSO XMIT command is used to convert pds'es and other format data files, to FB-80 byte, nice neat sequential files, which are very suitable for transmission to other systems. XMIT accomplishes this data reformatting with its OUTDSN(re.formatted.dataset) keyword. For the purposes of today's discussion, one must know that XMIT-format files are strictly EBCDIC. Any attempt to perform ASCII translation on them, will mess them up irreparably. (You must do a re-transmission of the original file, without any conversion.) Therefore, all data transmission of XMIT-format MVS files must be BINARY, with no ASCII keyword, or CR or LF (if you're using IND$FILE). As the proprietor of the CBT MVS Utilities Tape collection, I have to be very familiar will all of this data-moving stuff. Most of the CBT contributions come in on the Internet, and although I prefer getting XMIT-formatted pds'es, which were transmitted in BINARY, and which can be converted into an unchanged pds on my MVS system, I often have to "suffer" and handle ASCII data, or something else. I have learned to become very skilled at this. An often vexing problem for me, is to handle someone's REXX exec that has been transmitted to me in ASCII, and which has to be converted to EBCDIC. The vertical line in a REXX exec, when looked at in EBCDIC, should be X'4F'. But some of the ASCII-to-EBCDIC translate tables make it into X'6A', a broken vertical line, which does not function the same way in a REXX exec. I've got to make sure that all X'6A' are converted to X'4F', and I'm never quite sure that the exec will really work right. Other problems are caused by the slanty quotes in ASCII, which may not translate correctly into EBCDIC, and there's always that stuff that becomes C'3D' on the MVS system, but which was some box, or other, on the ASCII system. When I get a REXX contribution for the CBT MVS collection, I always prefer that it has never undergone an ASCII conversion. Therefore, I greatly prefer going with the XMIT route, which is pure EBCDIC. I hope this month's talk has opened your eyes to a few of the problems with data transmission between different machines, and that it has better made you more aware of ASCII and EBCDIC issues. I always wish you the best of everything, and I hope to see you again next month. * * * * * * * * * * * * * * * * * * * * * * * Figure 1. An ASCII To EBCDIC Translation Table This table, with the first hex number as ASCII and the second hex number as its EBCDIC equivalent, comes to us, courtesy of Tachyon Software, makers of the Tachyon Cross-Assembler, Tachyon Operating System, and the Tachyon File Tools for the PC. The Tachyon Assembler can accept Assembler Language source input that is either in ASCII representation or EBCDIC representation, and it uses (by default) the following translation table to convert from one data representation format, to the other. blank ! " # $ % & ' 00=00 20=40 21=5A 22=7F 23=7B 24=5B 25=6C 26=50 27=7D ( ) * + , - . / 0 28=4D 29=5D 2A=5C 2B=4E 2C=6B 2D=60 2E=4B 2F=61 30=F0 1 2 3 4 5 6 7 8 9 31=F1 32=F2 33=F3 34=F4 35=F5 36=F6 37=F7 38=F8 39=F9 : ; < = > ? @ A B 3A=7A 3B=5E 3C=4C 3D=7E 3E=6E 3F=6F 40=7C 41=C1 42=C2 C D E F G H I J K 43=C3 44=C4 45=C5 46=C6 47=C7 48=C8 49=C9 4A=D1 4B=D2 L M N O P Q R S T 4C=D3 4D=D4 4E=D5 4F=D6 50=D7 51=D8 52=D9 53=E2 54=E3 U V W X Y Z í \ ù 55=E4 56=E5 57=E6 58=E7 59=E8 5A=E9 5B=BA 5C=E0 5D=BB ª _ ` a b c d e f 5E=B0 5F=6D 60=79 61=81 62=82 63=83 64=84 65=85 66=86 g h i j k l m n o 67=87 68=88 69=89 6A=91 6B=92 6C=93 6D=94 6E=95 6F=96 p q r s t u v w x 70=97 71=98 72=99 73=A2 74=A3 75=A4 76=A5 77=A6 78=A7 y z { | } æ ½ ^ ¨ 79=A8 7A=A9 7B=C0 7C=4F 7D=D0 7E=A0 9B=4A AA=5F C0=AB â ž Ñ C3=EB C4=BF DA=AC