MVS TOOLS AND TRICKS OF THE TRADE
                          JULY 2001

                                      Sam Golob
                                      MVS Systems Programmer
                                      P.O. Box 906
                                      Tallman, New York 10982

Sam Golob is a Senior Systems Programmer.  He also participates in
library tours and book signings with his wife, author Courtney Taylor.
Sam can be contacted at sbgolob@cbttape.org.  Information about the CBT
MVS Tapes can be found on the web, at http://www.cbttape.org.


EBCDIC to ASCII - YES or NO?

      As we all know, data is represented on the computer as a
succession of bit settings:  0 if the bit is off, 1 if the bit is on.
To represent data on most of the systems we work with, the bits are
gathered in groups of eight.  A group of eight consecutive bits is
called a byte.  The number of different possible combinations of the
eight bits in a byte is 2 to the eighth power, which equals 256.

      We use the common hexadecimal representation to shorten how we
picture eight-bit bytes.  Each hexadecimal digit represents 4 bits, a
value from 0 to 15, denoted in increasing order as 0, 1, 2, 3, 4, 5, 6,
7, 8, 9, A, B, C, D, E, and F.  Since a byte holds 8 bits, it takes two
hexadecimal digits to describe one byte.  So the values of an 8-bit
byte, representing the decimal numbers from 0 thru 255, range from hex
00 (or X'00') to hex FF (or X'FF').  Thus, the eight-bit byte whose
value is 0001,1111 is represented by the hexadecimal value 1F, with the
hex 1 being the four bits of 0001, and the hex F being the four bits of
1111.
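
      The arithmetic can be checked with a few lines of Python.  This
is only an illustrative sketch; the function name nibbles is made up
for the example.

```python
def nibbles(byte_value: int) -> tuple[int, int]:
    """Split an 8-bit value into its high and low 4-bit halves."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("not an 8-bit value")
    return byte_value >> 4, byte_value & 0x0F


value = 0b0001_1111            # the byte 0001,1111 from the text
high, low = nibbles(value)

# Each 4-bit half corresponds to exactly one hexadecimal digit.
print(f"{value:08b} -> high {high:04b} = {high:X}, low {low:04b} = {low:X}")
print(f"as two hex digits: {value:02X}")
print(f"distinct byte values: {2 ** 8}")
```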

      When we represent characters such as a "small a" or a "capital
A" on the computer, we choose one of these eight-bit combinations to
represent each character.  So now it is a question of deciding which
combination of eight bits should represent which character.  Obviously,
in order for one computer system to communicate with another computer
system, we must set a standard.  We must decide which byte combination
stands for A and which for B, which for 1 and which for 9.  All of the
special characters, such as the pound sign (#), must likewise be
accounted for, in a standard manner, by eight-bit "byte"
representations.  And the characters particular to other languages,
such as the "umlaut-ed" vowels in German, must be given standard
representations too.

      Has this been done?  Of course it has.  What is the complication?
It has been done in two different, and completely incompatible ways.
There are currently two entirely different standards for character
representations of machine-readable data.

      Of the two standards, one, which has been championed by IBM, is
called EBCDIC.  The other, which has been adopted by practically
everybody else, is called ASCII.  IBM is a powerful enough force, that
its standard cannot be ignored.  MVS, for the most part, and VM, and
VSE, and AS/400, largely use EBCDIC data representation for characters.
And then, of course, there's the rest of the world....  They use ASCII.


TRANSLATION BETWEEN THE TWO SYSTEMS

      In Assembler Language, it is not hard to convert, for example,
an 80-byte card image full of characters, from EBCDIC to ASCII
representation, and vice-versa.  To do so, you use a translate table
containing 256 characters, and a single TR (translate) instruction.
Once this is coded, the machine does the rest.  An example of a partial
translation table from ASCII to EBCDIC representation, is shown in
Figure 1.  This table comes to us courtesy of Tachyon Software, makers
of the magnificent Tachyon 390 Cross Assembler for the PC.
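
      For readers without an assembler handy, the same idea can be
sketched in Python, where bytes.translate() plays the role of the TR
instruction and a 256-byte table plays the role of the translate
table.  The table below is built only from some of the entries in
Figure 1 (digits, letters, blank, comma, period, and X'00').  Mapping
every other byte to X'6F' (the EBCDIC question mark) is an assumption
made for this example, not part of Tachyon's table.

```python
# (ASCII start, EBCDIC start, run length) for contiguous runs in Figure 1.
RUNS = [
    (0x30, 0xF0, 10),  # 0-9
    (0x41, 0xC1, 9),   # A-I
    (0x4A, 0xD1, 9),   # J-R
    (0x53, 0xE2, 8),   # S-Z
    (0x61, 0x81, 9),   # a-i
    (0x6A, 0x91, 9),   # j-r
    (0x73, 0xA2, 8),   # s-z
]
# A few single entries from Figure 1: X'00', blank, comma, period.
SINGLES = {0x00: 0x00, 0x20: 0x40, 0x2C: 0x6B, 0x2E: 0x4B}


def build_ascii_to_ebcdic() -> bytes:
    """Build a 256-byte translate table indexed by the ASCII byte value."""
    # Assumption for this sketch: unlisted bytes become X'6F' (EBCDIC "?").
    table = bytearray([0x6F] * 256)
    for ascii_start, ebcdic_start, length in RUNS:
        for offset in range(length):
            table[ascii_start + offset] = ebcdic_start + offset
    for ascii_code, ebcdic_code in SINGLES.items():
        table[ascii_code] = ebcdic_code
    return bytes(table)


ASCII_TO_EBCDIC = build_ascii_to_ebcdic()

# An 80-byte "card image", padded on the right with ASCII blanks.
card = b"HELLO, WORLD 2001.".ljust(80)

# One call translates the whole record, much as one TR instruction would.
ebcdic_card = card.translate(ASCII_TO_EBCDIC)
print(ebcdic_card[:18].hex().upper())
```

The reverse direction would need a second table, indexed by the EBCDIC
byte value and holding the ASCII equivalents.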

      The Tachyon 390 Cross Assembler, which can take source code input
either in ASCII or EBCDIC representation, mimics almost the complete set
of features of the MVS/VM/VSE High Level Assembler from IBM.  But it
does it on a PC, with compatible output, and it can produce compatible
ADATA.  When the Tachyon Assembler "sees" a piece of source code, it
looks at the first few characters, and figures out if they look like
ASCII letters and numbers, or EBCDIC letters and numbers.  Then it
"gears itself" to take the rest of the source input in the same way.  To
find out more about Tachyon products, go to www.tachyonsoft.com.
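
      How Tachyon actually makes this decision is not described here,
so the following Python sketch is only a guess at the general idea,
not their method.  It counts how many of the first few bytes fall in
the ASCII letter, digit, and blank ranges, versus the EBCDIC ones
(taken from Figure 1), and goes with the larger count.  The function
name guess_encoding and the 64-byte sample size are hypothetical.

```python
def _byte_ranges(*pairs):
    """Return the set of byte values covered by inclusive (low, high) pairs."""
    return frozenset(b for low, high in pairs for b in range(low, high + 1))


# Letters, digits and blank in each representation (values as in Figure 1).
ASCII_ALNUM = _byte_ranges((0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A)) | {0x20}
EBCDIC_ALNUM = _byte_ranges(
    (0xF0, 0xF9),                              # digits
    (0xC1, 0xC9), (0xD1, 0xD9), (0xE2, 0xE9),  # upper case
    (0x81, 0x89), (0x91, 0x99), (0xA2, 0xA9),  # lower case
) | {0x40}


def guess_encoding(data: bytes, sample_size: int = 64) -> str:
    """Guess 'ascii' or 'ebcdic' from the first few bytes of the input."""
    sample = data[:sample_size]
    ascii_hits = sum(b in ASCII_ALNUM for b in sample)
    ebcdic_hits = sum(b in EBCDIC_ALNUM for b in sample)
    if ascii_hits == ebcdic_hits:
        return "unknown"
    return "ascii" if ascii_hits > ebcdic_hits else "ebcdic"


source_line = "TEST     CSECT"
print(guess_encoding(source_line.encode("ascii")))
# Python's cp037 codec stands in for EBCDIC here (an assumption).
print(guess_encoding(source_line.encode("cp037")))
```

The two sets of byte values do not overlap, which is what makes such a
simple count workable for ordinary source text.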

      If you look closely at Figure 1, you'll notice some startling
differences.  In ASCII, the lower case letters are represented by bigger
hex numbers than the upper case letters.  But in EBCDIC, the opposite is
the case--upper case letters are larger numbers than lower case letters.
Another difference:  in EBCDIC, as is widely known among Assembler
programmers, you convert a lower case letter to an upper case letter by
OR'ing it with a blank (X'40' in EBCDIC).  But in ASCII, you convert an
upper case letter to a lower case letter by OR'ing it with a blank
(X'20' in ASCII).  In EBCDIC, numbers (X'F0' thru X'F9') are bigger hex
numbers than letters (either upper case or lower case).  But in ASCII,
numbers (X'30' thru X'39') are smaller hex numbers than either the upper
case or the lower case letters.
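
      The two OR'ing tricks are easy to verify.  The sketch below uses
the letter values from Figure 1; the function names are made up for
the example, and the trick is only meaningful when the input byte is
already a letter.

```python
EBCDIC_BLANK = 0x40
ASCII_BLANK = 0x20


def ebcdic_upper(code: int) -> int:
    """OR an EBCDIC letter with X'40' to get its upper case form."""
    return code | EBCDIC_BLANK


def ascii_lower(code: int) -> int:
    """OR an ASCII letter with X'20' to get its lower case form."""
    return code | ASCII_BLANK


# EBCDIC: lower case a (X'81') becomes upper case A.
print(f"EBCDIC X'81' OR X'40' = X'{ebcdic_upper(0x81):02X}'")
# ASCII: upper case A (X'41') becomes lower case a.
print(f"ASCII  X'41' OR X'20' = X'{ascii_lower(0x41):02X}'")
```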

      Whether one hex number representing a letter or a number is
bigger, or smaller than another, makes a great difference to us.  We see
the difference when we do sorting.  Sorting usually goes completely
according to the numeric value of a character.  If the character is
represented by a bigger number, that character sorts higher.  So we see
that if we sort a list of characters in ASCII, the numbers will sort
lower than the letters.  But in EBCDIC, the numbers sort higher.  That's
why, when we look in the index of a book, and we see the lower case
words coming after the upper case words, and the numbers coming before
both of them, we know that the book was composed on a computer system
using ASCII representation.  Whereas if the lower case words come first,
and the numbers come last, we know the book was composed and indexed
using an EBCDIC-based machine.
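
      Here is a small Python sketch of the two sort orders.  It sorts
the same three words by their byte values in each representation.
Python's built-in cp037 codec is used as a stand-in for EBCDIC, which
is an assumption of the example; for letters and digits it agrees with
Figure 1.

```python
words = ["apple", "Zebra", "42"]

# Sort by the numeric value of each character's byte representation.
ascii_order = sorted(words, key=lambda w: w.encode("ascii"))
ebcdic_order = sorted(words, key=lambda w: w.encode("cp037"))

print("ASCII :", ascii_order)   # numbers, then upper case, then lower case
print("EBCDIC:", ebcdic_order)  # lower case, then upper case, then numbers
```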

      Sometimes even IBM confuses us this way--we look at the index to
an IBM MVS manual, and its index is sorted the ASCII way, although the
subject matter of the book is EBCDIC oriented.  The solution to this
paradox is easy.  IBM simply used a PC to compose its MVS manuals.  If
you know these differences between EBCDIC and ASCII sorts, those
apparent anomalies are completely explained.


MOVING DATA BETWEEN SYSTEMS

      If you're moving text from a PC to an MVS machine, you would be
wise to know whether the program that does the moving also does a
translate operation at the same time.  For example, if IND$FILE is the
program which does the data moving, you instruct IND$FILE to translate
from ASCII to EBCDIC on an upload, or from EBCDIC to ASCII on a
download, with the "ASCII" keyword of the IND$FILE command.  Data
delimiting is another difference between the two systems.  MVS (fixed
blocked) records are delimited by the record size, while ASCII text
files on the PC are often delimited either by a Carriage Return (CR)
X'0D' character followed by a Line Feed (LF) X'0A' character, or by one
of them without the other (as is done by many UNIX machines).

      All of this may be confusing to the beginner, but after one
becomes knowledgeable, one can become an accomplished "data jockey".
For example, I use SPF/PC (from Command Technology Corporation) on the
PC, to look at data.  I most often use one of the early Windows versions
of SPF/PC, which still looks very mainframe-ISPF-like.  SPF/PC allows
you to control the profile (keyed on the second name for the file, after
the dot), to describe the characteristics of the file you think you're
looking at.  Then, when you rename the second name of the file to match
the profile, you can either see the data properly, or it looks mangled.
You adjust the SPF/PC profile characteristics, using your ASCII-EBCDIC
knowledge and your data delimiting smarts, until the data can be seen
clearly.  Once that's done, SPF/PC offers data conversion facilities,
to convert a text file from say, ASCII and CR-LF delimited, to EBCDIC,
and 80-character delimited (padded with blanks).
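
      The kind of conversion just described can be sketched in Python.
This is not how SPF/PC does it; it is only an illustration of the two
steps involved: splitting on the line delimiters, then padding each
line with blanks to 80 characters and translating it.  The function
name text_to_fb80 is hypothetical, the cp037 codec is an assumed
stand-in for EBCDIC, and refusing lines longer than 80 characters is
a choice made for the example.

```python
def text_to_fb80(ascii_text: bytes, lrecl: int = 80) -> bytes:
    """Turn delimited ASCII text into fixed-length EBCDIC records."""
    records = []
    # bytes.splitlines() accepts CR-LF, a lone LF, or a lone CR.
    for line in ascii_text.splitlines():
        if len(line) > lrecl:
            raise ValueError(f"line longer than {lrecl} bytes")
        # Pad with blanks to the record length, then translate.
        # An ASCII blank (X'20') becomes an EBCDIC blank (X'40').
        padded = line.decode("ascii").ljust(lrecl)
        records.append(padded.encode("cp037"))
    # No delimiters in the output: the record length delimits the data.
    return b"".join(records)


pc_file = b"//MYJOB JOB\r\n//STEP1 EXEC PGM=IEFBR14\r\n"
fb80 = text_to_fb80(pc_file)
print(len(fb80), "bytes =", len(fb80) // 80, "records of 80")
```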

      I'll conclude this month's discussion with some words about the
usefulness of the TSO XMIT command for moving data.  On MVS, the TSO
XMIT command is used to convert pds'es and other format data files, to
FB-80 byte, nice neat sequential files, which are very suitable for
transmission to other systems.  XMIT accomplishes this data reformatting
with its OUTDSN(re.formatted.dataset) keyword.  For the purposes of
today's discussion, one must know that XMIT-format files are strictly
EBCDIC.  Any attempt to perform ASCII translation on them, will mess
them up irreparably.  (You must do a re-transmission of the original
file, without any conversion.)  Therefore, all data transmission of
XMIT-format MVS files must be BINARY, with no ASCII keyword, or CR or LF
(if you're using IND$FILE).

      As the proprietor of the CBT MVS Utilities Tape collection, I
have to be very familiar with all of this data-moving stuff.  Most of
the CBT contributions come in on the Internet, and although I prefer
getting XMIT-formatted pds'es, which were transmitted in BINARY, and
which can be converted into an unchanged pds on my MVS system, I often
have to "suffer" and handle ASCII data, or something else.  I have
learned to become very skilled at this.

      An often vexing problem for me, is to handle someone's REXX exec
that has been transmitted to me in ASCII, and which has to be converted
to EBCDIC.  The vertical line in a REXX exec, when looked at in EBCDIC,
should be X'4F'.  But some of the ASCII-to-EBCDIC translate tables make
it into X'6A', a broken vertical line, which does not function the same
way in a REXX exec.  I've got to make sure that all X'6A' are converted
to X'4F', and I'm never quite sure that the exec will really work right.
Other problems are caused by the slanty quotes in ASCII, which may not
translate correctly into EBCDIC, and there's always that stuff that
becomes X'3D' on the MVS system, but which was some box, or other, on
the ASCII system.  When I get a REXX contribution for the CBT MVS
collection, I always prefer that it has never undergone an ASCII
conversion.  Therefore, I greatly prefer going with the XMIT route,
which is pure EBCDIC.
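
      The X'6A' to X'4F' repair described above can be sketched in
Python as a plain byte replacement on the EBCDIC data.  The function
name is hypothetical.  Note that the replacement is blind: any X'6A'
that really was meant to be a broken vertical line gets changed too,
so the result still has to be checked by eye.

```python
def fix_vertical_bars(ebcdic_data: bytes) -> tuple[bytes, int]:
    """Replace every X'6A' with X'4F'; return the data and the count."""
    count = ebcdic_data.count(b"\x6a")
    return ebcdic_data.replace(b"\x6a", b"\x4f"), count


# EBCDIC bytes for "a" X'6A' X'6A' "b": a concatenation whose two
# vertical lines were translated to the broken form.
damaged = bytes([0x81, 0x6A, 0x6A, 0x82])
repaired, changed = fix_vertical_bars(damaged)
print(repaired.hex().upper(), "-", changed, "byte(s) changed")
```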

      I hope this month's talk has opened your eyes to a few of the
problems with data transmission between different machines, and that it
has made you more aware of ASCII and EBCDIC issues.  I always
wish you the best of everything, and I hope to see you again next
month.


  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *


Figure 1.       An ASCII To EBCDIC Translation Table

            This table, with the first hex number as ASCII and
            the second hex number as its EBCDIC equivalent,
            comes to us, courtesy of Tachyon Software, makers
            of the Tachyon Cross-Assembler, Tachyon Operating
            System, and the Tachyon File Tools for the PC.
            The Tachyon Assembler can accept Assembler Language
            source input that is either in ASCII representation
            or EBCDIC representation, and it uses (by default)
            the following translation table to convert from one
            data representation format, to the other.

 NUL   blank    !      "      #      $      %      &      '
00=00  20=40  21=5A  22=7F  23=7B  24=5B  25=6C  26=50  27=7D
  (      )      *      +      ,      -      .      /      0
28=4D  29=5D  2A=5C  2B=4E  2C=6B  2D=60  2E=4B  2F=61  30=F0
  1      2      3      4      5      6      7      8      9
31=F1  32=F2  33=F3  34=F4  35=F5  36=F6  37=F7  38=F8  39=F9
  :      ;      <      =      >      ?      @      A      B
3A=7A  3B=5E  3C=4C  3D=7E  3E=6E  3F=6F  40=7C  41=C1  42=C2
  C      D      E      F      G      H      I      J      K
43=C3  44=C4  45=C5  46=C6  47=C7  48=C8  49=C9  4A=D1  4B=D2
  L      M      N      O      P      Q      R      S      T
4C=D3  4D=D4  4E=D5  4F=D6  50=D7  51=D8  52=D9  53=E2  54=E3
  U      V      W      X      Y      Z      [      \      ]
55=E4  56=E5  57=E6  58=E7  59=E8  5A=E9  5B=BA  5C=E0  5D=BB
  ^      _      `      a      b      c      d      e      f
5E=B0  5F=6D  60=79  61=81  62=82  63=83  64=84  65=85  66=86
  g      h      i      j      k      l      m      n      o
67=87  68=88  69=89  6A=91  6B=92  6C=93  6D=94  6E=95  6F=96
  p      q      r      s      t      u      v      w      x
70=97  71=98  72=99  73=A2  74=A3  75=A4  76=A5  77=A6  78=A7
  y      z      {      |      }      ~      ¢      ¬      └
79=A8  7A=A9  7B=C0  7C=4F  7D=D0  7E=A0  9B=4A  AA=5F  C0=AB
  ├      ─      ┌
C3=EB  C4=BF  DA=AC