(My apologies to our readership. This article has been edited.)

MVS TOOLS AND TRICKS OF THE TRADE
March 1994

Sam Golob
MVS Systems Programmer
sbgolob@cbttape.org

Sam Golob is a Senior Systems Programmer

THE FASCINATION OF THE DISASSEMBLER - PART TWO.

In Part I (January 1994), we began studying the usefulness of disassembling machine code. As you may recall, this is the process of converting load module CSECTs into assembler source code using a utility program called a disassembler. Two public domain disassemblers can be found on the CBT MVS Utilities Tape, on Files 217 and 171. The CBT Tape is obtainable through NaSPA. These two programs are somewhat similar to each other, but they are also significantly different. You can decide which one you like better. Let's take a quick look at both of them.

First, we'll look at their similarities. Both disassemblers produce very detailed print output including instruction displacements, a hex dump of the load module and all the load module attributes. Additionally, both disassemblers produce a much plainer punch output, which is reassembleable. As for differences, the disassembler on File 171 allows labels to be fitted from actual assembly of real macros, whereas the File 217 disassembler requires its own special "label coding." As of Version 363 of the CBT Tape, the File 171 disassembler has been much improved.

The vendor product PDSTOOLS from SERENA International in Burlingame, Calif., contains both a load module disassembler and an object deck disassembler (besides the zillion other things it does). Reassembleable source code produced by the PDSTOOLS disassembler has displacement information on each line, making it a handy tool for fitting zaps into the original load module. The PDSTOOLS disassembler does not produce separate print and punch output. One output is produced, complete with the displacement information.
Depending upon a keyword, the PDSTOOLS disassembler output can be made instantly reassembleable, sandwiched between appropriate reassembly JCL. Otherwise, its output is displayed in a more "diagnostic" format, resembling an assembler listing. The PDSTOOLS disassembler also has another feature the other disassemblers do not: It can decode entire load modules and it is not restricted to doing just a CSECT at a time. I once used the PDSTOOLS disassembler to disassemble IEANUC01 (for MVS/370 a while back) and I quickly reassembled it so it actually worked and could be used for production. (Note: Don't do this unless you perform many checks and verifications against the original nucleus to ensure "sameness.") You can see that the PDSTOOLS disassembler is very powerful and quick to use.

In Part I, we discussed a few applications of disassembling load modules. In this month's column, we'll go into more detail. Right now, I'll summarize some useful techniques involving disassembly that we discussed previously.

First problem: How do we determine if two different load modules actually have the same code? Answer: Disassemble them both with the same disassembler and compare the two source decks line by line with a compare utility. If the decks are the same, the code is the same. Only the load module attributes and the CSECT structure (the load module maps) will then need to be compared. This will prove "sameness" of the modules, even if the link-edit dates are different.

Next situation: Suppose we have two load modules, one zapped and the other not. How do we determine where the zap is? Answer: Simply disassemble both of them and again compare the source decks line by line. The location of the zap(s) will appear immediately upon comparison.

Finally, suppose we have to upgrade a USERMOD zap from a lower maintenance level of (non-OCO) IBM code to a more recent level of the same module.
Disassembling both modules will usually reveal the appropriate location of the corresponding zap in the higher maintenance level code. The technique may even work if some of the actual assembler instructions which are to be zapped became different instructions in the later level of the module. For example, suppose the newer module was created by IBM using a different PL/S compiler than the old one. It is possible that the same PL/S logic may translate into completely different assembler code. Nevertheless, you can sometimes find the appropriate zapping point with the aid of disassembled source code because you understand the overall logic better when you see the assembler instructions before you.

Trying To Learn How Code Works

Usually, when you disassemble a load module, you have far less information about how it works than if you had a commented source deck. With disassembled source, you have the instructions but must do considerable detective work to learn the logic. It pays to use whatever extra clues you can find. However, let's start with the basics.

Learning the details of how a program works from a disassembled source deck usually takes a lot of patience. Still, a quick glance at a disassembled source deck will reveal some significant information. For instance, you can usually spot patch areas quickly. DCB control blocks are easily recognizable once you've seen them. It is usually possible to see the DDNAME in the DCB. You can tell if it's hard coded in the DCB or if a substitute DDNAME is being dropped in. SVC instructions are decoded by all these disassemblers. You can get a general idea of what the program is doing by looking at which SVC instructions have been coded.

One of the more subtle details you can get from a disassembled module is the structure of a work area or a control block created by that module.
You may notice a section of code that has a series of STore instructions to different displacements off the same register, with other instructions, including a Load or Load Address instruction, placed before each STore. This may indicate that the program is obtaining information and placing it in the various pigeonholes of a work area or control block. With some patience, you can then map the layout of that control block or work area.

I once used this observation profitably. I had to refit a zap to some IBM code which got a new internal work area after a certain PTF level. I was worried that the newly added module structure might interfere with the workings of my zap. From the cover letter of the PTF I got some idea of the new work area's purpose. I didn't have source at the higher PTF level because the microfiche (remember microfiche?) for that PUT level wasn't obtainable yet. Our PTF had been an advance shipment obtained from a CBPDO. Nevertheless, with disassembled source I was able to figure out what kind of data went into that work area and at which point in the code execution it happened. It turned out that my zap code could safely use the new work area before the IBM code used it. The zap code "did its thing" before the IBM code initialized that new piece of storage for the first time. So I successfully fitted and tested my zap.

Deciphering uncommented disassembled code is much easier if you have clues. Often, a lower level of the same module can be found in base level machine readable source (the Optional Materials tapes) or on microfiche. Even though some of the code will change at the higher maintenance or release levels, much of that module's structure and logic will also be found at the lower level. Additionally, you have commented assembler source at the lower level. Exploit this hint as much as possible. It really pays to save your old Optional Materials tapes.
I have a scheme to get at these source materials quickly, which you can see on Files 188 and 189 of the CBT Utilities Tape. Unfortunately, I do not have the space to describe its details here. Once I have my scheme in place, I can find and assemble any source module in five minutes. Why is this so good? Because IBM source code is heavily commented. Even at base level code, these materials will aid immeasurably in trying to negotiate a disassembly deck of the code you are actually running. Together with care, patience and intelligence, these hints will definitely assist you in this work.

Disassembling Object Decks

The public disassembler programs we've mentioned take load modules as input, as opposed to object decks. Object decks are RECFM=FB card-image decks that are output from the assembler and from language compilers. Load modules are RECFM=U and are output from the linkage editor, or now, from the binder. Even though object decks are not directly executable on the computer, they do contain machine instructions identical to those in load modules. As such they could be candidates for disassembling. However, with the free disassemblers from the CBT Tape Files 217 and 171, the object decks will have to be link-edited into load modules first, before their code can be disassembled by these programs.

A shortcut to this process would be a disassembler program that could take object decks as input instead of load modules. As of this writing, I don't know of any free program that does this. However, the PDSTOOLS vendor product, besides its DISASM subcommand to disassemble load modules, has a READOBJ subcommand to disassemble object decks directly without having to link-edit them first. For example, you can point the READOBJ subcommand directly at a PTF and obtain instant disassemblies of all the object decks in the PTF.

Whatever tools you have, disassembling an object deck directly from the PTF can pay big dividends.
Since all the disassemblers generate instruction displacements within some of their outputs, it is possible to fit a zap directly onto the PTF without having to APPLY it first. The PTF doesn't even have to be on your system for you to figure out the proper hooks into it. You don't have to wait until the full-sized load module is actually sitting on a system library.

I'll show you a startling example of something I was able to accomplish with PDSTOOLS. But the same kind of thing can be done, albeit much more awkwardly, with public tools.

My friend, Merv Hemp, recently asked me to refit an old modification to module IDCLC01, which performs LISTC processing and displays catalog entries. A 1-byte zap to this module could greatly shorten the "LISTC LEVEL" output by eliminating the repetition of the catalog name after displaying every catalog entry (See Figure 1). With the zap on, the LISTC display is much shorter and runs much quicker than with the zap off. I mentioned this case in passing last time.

This zap had been fitted at DFP Version 2.4 base level. Merv wanted me to fit the same zap to a DFP Version 3.3.1 level of IDCLC01 at PTF level UY83416. I actually later fitted this zap to 17 different maintenance levels of IDCLC01, doing the whole job with PDSTOOLS in less than two hours. A full machine-readable exhibition of this work can be found on a new CBT Tape, File 236.

How did I go about this? First, I had to find every IBM SMPPTS data set in my data center complex. In PDSTOOLS, I did a global LISTFILE command against all of our 400 mounted volumes, looking for all data sets containing the string "PTS" within their names. When I found these, I narrowed the list down (by eyeball) to just the SMPPTS data sets for IBM modules. There were six of them, which referred to five different levels of the MVS operating system. As you know, these PTS data sets contain the actual text of IBM PTF fixes.
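Because the PTFs held in a PTS carry their object decks as 80-byte card images, the machine code can be picked out of them with very little logic. The sketch below, in Python, assumes the standard object deck TXT card layout (X'02' in column 1, EBCDIC 'TXT' in columns 2-4, a 3-byte address in columns 6-8, a 2-byte length in columns 11-12 and up to 56 bytes of text starting in column 17). The function names are hypothetical, and the sample card is built by hand for the illustration rather than taken from a real PTF.

```python
TXT_ID = "TXT".encode("cp037")   # EBCDIC for 'TXT' (X'E3E7E3')


def parse_txt_card(card):
    """Return (address, text_bytes) for a TXT card image, or None for any other card."""
    if len(card) != 80 or card[0] != 0x02 or card[1:4] != TXT_ID:
        return None
    address = int.from_bytes(card[5:8], "big")             # columns 6-8
    count = min(int.from_bytes(card[10:12], "big"), 56)    # columns 11-12
    return address, card[16:16 + count]                    # text starts in column 17


def make_txt_card(address, text):
    """Build a hand-made TXT card for testing (hypothetical helper, not a real deck)."""
    card = bytearray([0x40]) * 80                  # 80 EBCDIC blanks
    card[0] = 0x02
    card[1:4] = TXT_ID
    card[5:8] = address.to_bytes(3, "big")
    card[10:12] = len(text).to_bytes(2, "big")
    card[14:16] = (1).to_bytes(2, "big")           # ESDID, assumed to be 1 here
    card[16:16 + len(text)] = text
    return bytes(card)


# Illustration only: the first L / LTR / BNZ group from the Figure 2 listing,
# placed on a card at the displacement shown there.
sample_text = bytes.fromhex("58E0205812EE47709E93")
sample_card = make_txt_card(0x1E26, sample_text)

address, text = parse_txt_card(sample_card)
print(format(address, "06X"), text.hex().upper())
```

A real PTF also contains SMP/E control statements and other card types (ESD, RLD, END), which this sketch simply skips by returning None.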
Next, I scanned each PTS data set for the string "IDCLC01", getting a small list of members in each PTS. The PDSTOOLS command which does this is: "FIND : /IDCLC01/ THEN(MEMLIST)," which says: Find every member containing the string "IDCLC01" and display a member list consisting of only those members. Once I had this list in each PTS data set, I ran the READOBJ command against every PTF to disassemble all the object decks in it (and used the ISPF EDIF service to display the result). Going up and down each disassembly of an IDCLC01 module, I found the proper place to fit the zap for that level.

Fitting this zap to the 17 different levels of IDCLC01 I found at our data center took two hours with PDSTOOLS. The same job would have taken perhaps 10 to 25 hours without PDSTOOLS, but it could have been done using public materials too. Here's how. After finding which PTFs contained the string "IDCLC01," you could edit a copy of each PTF to extract its object deck for IDCLC01. Then you would link-edit that object deck and run it through one of the disassembler programs. At this point, the zap could be fitted by looking through the disassembler print output.

Conclusion

I hope this month's column opens your mind in a new direction. Disassembling should no longer be a "dirty word" or label you as "dangerous." On the contrary, it is just another essential technique for a systems programmer to use in solving many of the problems that come up at most shops. Good luck.

* * * * * * * * * * * * * * * * * * * * * * *

Figure 1. Illustration of the MOD to LISTC in module IDCLC01.
(THIS IS THE ACTION OF THE MOD)

* ---- LISTC LISTING WITH THE MOD IN PLACE ---- *
(THE RESPONSE IS ALSO MUCH FASTER.)
LISTC LEV(TIMMY)
IN CATALOG:STAFF.CATLG
TIMMY.ACS.CNTL
TIMMY.ASM2
TIMMY.DATA
TIMMY.DCOLL.DATA
READY
END

* ---- LISTC LISTING WITHOUT THE MOD IN PLACE ---- *

LISTC LEV(TIMMY)
NONVSAM ------- TIMMY.ACS.CNTL
     IN-CAT --- STAFF.CATLG
NONVSAM ------- TIMMY.ASM2
     IN-CAT --- STAFF.CATLG
NONVSAM ------- TIMMY.DATA
     IN-CAT --- STAFF.CATLG
NONVSAM ------- TIMMY.DCOLL.DATA
     IN-CAT --- STAFF.CATLG
READY
END

* * * * * * * * * * * * * * * * * * * * * * *

(This figure was cut, in the published article.)

Figure 2. Illustration of how the Assembler Instructions in an IBM module can change when different releases of PL/S are used, or when the PL/S logic may have changed slightly. The top set of instructions comes from the DFP Release 2.4 machine-readable source at base level. The bottom set of instructions is from a PDSTOOLS object module disassembly obtained from PTF UY83416 itself, using the PDSTOOLS "READOBJ" subcommand.

Included here are instructions on how to fit this zap, in general. The zap is fitted on various levels of the module IDCLC01, which belongs to load module IDCLC01.

In XA DFP 2.4 base level, the mod is done to the first branch of nine similar compares and branches that occur one after the other as follows:

(This is from a source code assembly.)

         C     @02,FDTPTR+80(,@15)
         BNE   @RF01262             <----- (Zap 4770 to 4700 here)
         C     @02,FDTPTR+8(,@15)
         BNE   @RF01262
         C     @02,FDTPTR+96(,@15)
         BNE   @RF01262
         C     @02,FDTPTR+100(,@15)
         BNE   @RF01262
         C     @02,FDTPTR+104(,@15)
         BNE   @RF01262
         C     @02,FDTPTR+92(,@15)
         BNE   @RF01262
         C     @02,FDTPTR+56(,@15)
         BNE   @RF01262
         C     @02,FDTPTR+64(,@15)
         BNE   @RF01262
         C     @02,FDTPTR+60(,@15)
         BNE   @RF01263

Because of changes in the PL/AS compiler (whichever one is being used now--it doesn't matter to us) this logic is done differently in MVS/ESA 4: The idea is that you have to find a similar pattern of nine sets of instructions in module IDCLC01 and zap the branch of the first set to a noop (4770 to 4700).

(This is from READOBJ output of PDSTOOLS.)
* This module is at PTF level UY83416 (MVS/ESA 4.2 - DFP 3.3.1)

                                                           Displacement
      L     R14,88(,R2)                 58E0 2058  *.\..*  1E26  001E26
      LTR   R14,R14                     12EE       *..*    1E2A  001E2A
----> BNZ   3731(,R9)  (Zap to 4700)    4770 9E93  *...l*  1E2C  001E2C
      L     R15,16(,R2)                 58F0 2010  *.0..*  1E30  001E30
      LTR   R15,R15                     12FF       *..*    1E34  001E34
      BNZ   3731(,R9)                   4770 9E93  *...l*  1E36  001E36
      L     R0,104(,R2)                 5800 2068  *....*  1E3A  001E3A
      LTR   R0,R0                       1200       *..*    1E3E  001E3E
      BNZ   3731(,R9)                   4770 9E93  *...l*  1E40  001E40
      L     R1,108(,R2)                 5810 206C  *...%*  1E44  001E44
      LTR   R1,R1                       1211       *..*    1E48  001E48
      BNZ   3731(,R9)                   4770 9E93  *...l*  1E4A  001E4A
      L     R14,112(,R2)                58E0 2070  *.\..*  1E4E  001E4E
      LTR   R14,R14                     12EE       *..*    1E52  001E52
      BNZ   3731(,R9)                   4770 9E93  *...l*  1E54  001E54
      L     R15,100(,R2)                58F0 2064  *.0..*  1E58  001E58
      LTR   R15,R15                     12FF       *..*    1E5C  001E5C
      BNZ   3731(,R9)                   4770 9E93  *...l*  1E5E  001E5E
      L     R0,64(,R2)                  5800 2040  *... *  1E62  001E62
      LTR   R0,R0                       1200       *..*    1E66  001E66
      BNZ   3731(,R9)                   4770 9E93  *...l*  1E68  001E68
      L     R1,72(,R2)                  5810 2048  *....*  1E6C  001E6C
      LTR   R1,R1                       1211       *..*    1E70  001E70
      BNZ   3731(,R9)                   4770 9E93  *...l*  1E72  001E72
      L     R2,68(,R2)                  5820 2044  *....*  1E76  001E76
      LTR   R2,R2                       1222       *..*    1E7A  001E7A
      BNZ   3727(,R9)                   4770 9E8F  *....*  1E7C  001E7C
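
The pattern hunt described for Figure 2 can also be mechanized. What follows is a minimal sketch in Python (not a tool from the CBT Tape or from PDSTOOLS) that scans a string of machine code for nine consecutive Load / Load-and-Test / Branch-on-Not-Zero groups and reports where the first branch sits. The hex string is copied from the listing above. The function names, and the assumption that every group is exactly L, LTR and BC with mask 7, are mine; another maintenance level may need a different pattern (the DFP 2.4 code, for example, uses C and BNE pairs).

```python
# Machine code of the nine groups in Figure 2 (PTF UY83416), copied from the listing.
FIGURE2_HEX = (
    "58E02058 12EE 47709E93 "
    "58F02010 12FF 47709E93 "
    "58002068 1200 47709E93 "
    "5810206C 1211 47709E93 "
    "58E02070 12EE 47709E93 "
    "58F02064 12FF 47709E93 "
    "58002040 1200 47709E93 "
    "58102048 1211 47709E93 "
    "58202044 1222 47709E8F"
)
FIRST_DISPLACEMENT = 0x1E26   # displacement of the first L, from the listing
GROUP_LENGTH = 10             # L (4 bytes) + LTR (2 bytes) + BC (4 bytes)


def is_group(code, pos):
    """True if an L / LTR / BC-mask-7 group starts at pos."""
    chunk = code[pos:pos + GROUP_LENGTH]
    return (len(chunk) == GROUP_LENGTH
            and chunk[0] == 0x58           # L
            and chunk[4] == 0x12           # LTR
            and chunk[6] == 0x47           # BC
            and chunk[7] >> 4 == 0x7)      # mask 7: BNZ / BNE


def find_branch_run(code, run_length=9):
    """Offset of the first branch in the first run of run_length groups, or None."""
    last_start = len(code) - GROUP_LENGTH * run_length
    for start in range(0, last_start + 1, 2):   # instructions are halfword aligned
        if all(is_group(code, start + GROUP_LENGTH * i) for i in range(run_length)):
            return start + 6                    # the BC follows the L and the LTR
    return None


code = bytes.fromhex(FIGURE2_HEX)
offset = find_branch_run(code)
zap_at = FIRST_DISPLACEMENT + offset

# Change only the mask nibble: 4770 (branch on not zero) becomes 4700 (no-op).
old = code[offset:offset + 2].hex().upper()
new = "4700"
print("VER", format(zap_at, "06X"), old)   # prints: VER 001E2C 4770
print("REP", format(zap_at, "06X"), new)   # prints: REP 001E2C 4700
```

The VER and REP lines are printed in the usual AMASPZAP style only as an illustration. The displacement shown is the one within the object deck, so it must still be checked against the CSECT offset in the real load module before any zap is applied.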