MVS TOOLS AND TRICKS OF THE TRADE February 1997 Sam Golob MVS Systems Programmer sbgolob@cbttape.org Sam Golob is a Senior Systems Programmer CREATIVE ENQUEUING. This month, I'd like to talk about two ways how I used ENQUEUE processing, which arose in my work. These specific cases may seem to be a bit far-fetched when looked at in an isolated way. But they illustrate the way enqueues, deques, and reserves are actually used by the operating system, to protect the use of system resources. If you follow what I'm going to be talking about, you yourselves will be able to use these same principles creatively, in your own work, to solve your own problems. What is enqueuing? IBM created this process to deal with a situation when two different tasks want to access the same data, or to use some system facility. This data or system facility might be sensitive to the fact that it shouldn't be accessed or updated by two independent jobs at the same time. We want to make one of the jobs wait until the other is finished accessing the data, or using the facility. Then the other job will be able to go in and "do its thing", separately from the first job. In this way, the data's integrity will be preserved, or the system facility will be able to work in the way it was designed. What is the mechanism of this process? How is it done? What happens is that each job that wants to use a resource, creates an object called an ENQUEUE. This enqueue can be either EXCLUSIVE or SHARED. Jobs or tasks which possess a shared enqueue can all use the resource concurrently. A job or task which possesses an exclusive enqueue is the only one which can use the resource at this time. Other jobs or tasks possessing the same enqueue, either shared, or exclusive, are put into a wait state by the operating system, until the possessor of the exclusive enqueue releases (or "dequeues") its enqueue. What is this object called an enqueue, and how is it created? An enqueue is created inside a running program, by the use of the ENQ or RESERVE macros, and it is released by the DEQ macro. The process is a bit more complicated, because an enqueue consists of three separate parts, and they all must match exactly, in order for the enqueue to be effective. What are the three parts of an enqueue? These are, the QNAME, the RNAME, and the SCOPE. The holder of an exclusive enqueue cannot stop another job or task, causing it to wait, unless the other task is also holding an enqueue with the exact same QNAME, RNAME, and SCOPE. If the other job has an enqueue which differs, however slightly, in either QNAME, or RNAME, or SCOPE, it will not be stopped, and the whole mechanism will be ineffective. You see, that the programming to protect a resource using enqueues, has to be very well coordinated among all the resource's potential users. IBM, from within its own programming of the MVS operating system, has created its own enqueues consistently. If any of IBM's own enqueues will be found to be defective, this is a bug, and it must be reported as a programming error to IBM. Users who create their own application programming systems, can also use the enqueue-dequeue mechanism to protect the resources which they create themselves, provided that they are internally consistent in creating and releasing the enqueues. There are several helpful IBM manuals, both for users, and for system programmers. Among these, are: "Application Development Guide: Assembler Language Programs" (GC28-1644), and "Application Development Reference: Services for Assembler Language Programs" (GC28-1642). For authorized programs there is the multi-volume: "Application Development Reference Services for Authorized Assembler Language Programs". All these manuals are available on IBM's CD-rom online library "MVS Collection" of manuals. Today, in our role as system programmers, we will try and break in to a few of IBM's enqueues. In the process, we will learn very much, about how exact this system is. Once we see what is going on, and we gain familiarity with the enqueue process, we will be able to write more of our own similar programs, and we will be better able to diagnose system problems in the enqueue area, that will arise in our daily work from time to time. (Editor: Please keep the "we" in the previous paragraph. The reason is that "I" know how to do this stuff, but if "we" go through it together, then the readers will pay attention, because they are part of the process. Thanks. ) TRACKING AND DISPLAYING ENQUEUES How can we look at an enqueue in all its detail? Actually, we can start with a console display that is quite effective. This is the "DISPLAY GRS" command with the "RES=" keyword. The relevant syntax of this command is: DISPLAY GRS,RES=(qname,rname) ,HEX where HEX is optional. The HEX operand is useful because qnames and rnames can be quite arbitrary. The qnames can be up to 8 characters long, and the rnames much longer--at least up to 60 characters and probably more. In the "D GRS" command, you can use the asterisk '*' as a wild-card indicator in either the qname or rname fields. This wildcard capability can be used to discover the qnames or rnames in your enqueue if you are not completely sure of them, or if you only know part of the name. The "D GRS" display will come back on the console with considerably detailed information about the structure of the enqueues. See Figure 1. You have to examine the SYSLOG with SDSF or whatever SYSLOG browsing tool you have, to see the details clearly. Otherwise they will roll off the screen, most likely. Once you have examined the "D GRS" display, you will know quite a lot about the enqueues you are concerned with. The "D GRS" display has some limitations. On a recent NVS/ESA system, the length of the RNAME display is limited to only 49 characters. Sometimes, the actual RNAME is bigger, as we shall soon see. Also, "D GRS" cannot limit the display by owning jobname, although it displays the jobname. In all, though, "D GRS" is a very useful display, and its CONTENTION keyword is quite useful for finding system bottlenecks caused by conflicting enqueues. Full syntax of the "D GRS" command can be found in the "MVS System Commands" manual that is current for your version of MVS. What other tools do I know about for displaying enqueues? I once called GRS Level 2 support because I couldn't display an enqueue with a long rname. The technician told me that he wrote a program to display long rnames, and he could send it to me. This program, which you run from a cataloged procedure, can now be found on File 250 of the CBT MVS Utilities Tape, a huge free software collection that can be obtained through the NaSPA office. Most vendor-written performance monitors have good enqueue displays. I have access to OMEGAMON from Candle Corp. which has its WHO and XQCB commands. The other big performance monitors are probably just about as good. I myself have used a very fine monitor called IMON that is written by Greg Price of Melbourne, Australia, and which is a vendor product. IMON is a TSO command and is very compact, user friendly and simple to run. However IMON has a very large array of capabilities. IMON is distributed in Sydney, Australia by the "Press Enter Group", email address: www.pressenter.com.au . See Figure 2 for a small sample of IMON's enqueue displays. TWO PROBLEMS INVOLVING ENQUEUES I have had two recent problems which helped me to gain some experience with enqueues, and I wish to share them with you. The first problem involved trying to temporarily and "harmlessly" stop JES2 on a test system for a test. The second involved my work with the SYS1.BRODCAST dataset. See this column from July 1995. (All my previous columns may be found on File 120 of the CBT MVS Tape.) Subsequent to writing that column, I wrote a package of utilities to manage all the user messages in SYS1.BRODCAST. I wrote three different programs for deleting all the broadcast messages to any particular TSO user. These programs were each based on different mechanisms, but in the third program, called BCMDEL2, I went into the SYS1.BRODCAST dataset myself, and physically displayed and deleted all of the user's messages, instead of calling IBM's LISTBC program to do the work. Therefore, I had to imitate IBM's enqueues that are created by the SEND and LISTBC programs, to protect the integrity of the SYS1.BRODCAST dataset while my own program, BCMDEL2, was doing its stuff. My SYS1.BRODCAST utilities may be found on File 247 of the CBT Tape, but unfortunately the correct enqueue protection in BCMDEL2 only found its way to the tape in Version 409. The NaSPA CD-rom is more back-leveled and contains Version 404, so you need to get a new tape to see my correct code. Now to the details of the first problem. Why did we have to stop JES2? One of our remote users claimed to have "irrevocably" lost data from one of its RJE transmissions, during a time when JES2 was hung with an enqueue lockout ("a deadly embrace"). JES2 obtains, at intervals, an exclusive RESERVE enqueue to its checkpoint dataset. A RESERVE enqueue ties up the disk pack, as well as locking out other jobs from obtaining the same enqueue. There happened to be another job which wanted an exclusive enqueue to a dataset on the checkpoint's pack, in such a way that JES2 was hung until the other job was cancelled. We had to duplicate this situation under test conditions, since we certainly would not want to stop JES2 on a production system. I decided to write a program in the form of an authorized TSO command (since I was breaking into an IBM system enqueue) that would ask for an exclusive RESERVE enqueue to the JES2 checkpoint dataset, and then WAIT indefinitely, without posting the WAIT ECB. I could free JES2 again by ATTENTION'ing my TSO command or cancelling my TSO session. This would be a wonderful way to control the test. The remote users would start a transmission, and during the transmission, I would "hang" JES2 purposely. Then we could see what would happen. The problem was that I would have to exactly imitate JES2's enqueue to the checkpoint dataset. I knew from looking at JES2's enqueues, that the QNAME is SYSZJES2, and the beginning of the RNAME consists of the 6 character volume serial that the checkpoint dataset is on, followed immediately by the dataset name of the checkpoint dataset. The enqueue was a RESERVE with a scope of SYSTEM. I wrote the program with this in mind, and found that it did not stop anything. It was then that I tried the "D GRS" display in HEX, and found that the RNAME of the enqueue was padded with blanks as far as the display would go. I couldn't actually know how long the RNAME really was. If you look at Figure 1, you'll see how the RNAME of JES2's RESERVE enqueue to the checkpoint dataset is padded with blanks. The actual length of this RNAME, I think, is 50 or 51 characters. I tried lengthening my own enqueue's RNAME, one character at a time, until I got my TSO command to successfully stop JES2. Stopping access to SYS1.BRODCAST by IBM's SEND and LISTBC commands was a completely different problem. From looking at SEND's enqueues with IMON, I saw that the QNAME was SYSIKJBC, and the RNAME was usually a three-byte hex number. Sometimes the number was X'000000', and sometimes it was something else. The enqueue was EXCLUSIVE, not a RESERVE, and it had a scope of SYSTEM. From my knowledge of the structure of SYS1.BRODCAST (which is a direct access dataset with relative record addressing from the beginning), I saw that the RNAME of X'000000' was an enqueue on the header record of SYS1.BRODCAST, and the nonzero RNAME was an enqueue on the relative record address which contained that TSO user's userid record. With this knowledge, and with my intention about what my program should do, I carefully was able to protect SYS1.BRODCAST while my program did its message deletions, and all was happy. I have not treated the subject of enqueues here with any thoroughness, and there are many gaps in what I have said. Nevertheless, I hope that this month's article will whet your appetite to explore this area. If you have a performance monitor, it would be nice to watch its enqueue displays with variations, so you can see what enqueues the various system components throw out. You might do something similar with the "D GRS" display, but try and do the display once, or a few times, looking at SYSLOG afterwards, so you don't confuse the operators with all the console messages. Good luck and good hunting. I hope to see you again next month. * * * * * * * * * * * * * * * * * * * * * * * Figure 1. Illustration of MVS's DISPLAY GRS command to display the details of outstanding ENQUEUEs. When datasets are allocated, the system uses a qname of SYSDSN and an RNAME of the dataset's name. The RESERVE enqueue on the JES2 checkpoint dataset is more complicated. Its QNAME is SYSZJES2, while its RNAME starts with the checkpoint volume followed by the checkpoint dataset name. The RNAME is padded with blanks past the end of the display, to 50 characters. D GRS,RES=(SYSDSN,SYS1.PARMLIB) ISG020I 11.05.02 GRS STATUS 858 S=SYSTEM SYSDSN SYS1.PARMLIB SYSNAME JOBNAME ASID TCBADDR EXC/SHR OWN/WAIT MVS1 RACF 0020 009FDE88 SHARE OWN MVS1 BLXSHR 001E 009FDE88 SHARE OWN MVS1 IHDMP 0117 009FDE88 SHARE OWN D GRS,RES=(SYSZJES2.*),HEX ISG020I 11.00.48 GRS STATUS 010 S=SYSTEM SYSZJES2 JESSP1SYS1.HASPCKPT T EEEEDCEF DCEEDFEEEF4CCEDCDDE444444444444444444444444444444 28291522 1522712821B81273273000000000000000000000000000000 SYSNAME JOBNAME ASID TCBADDR EXC/SHR OWN/WAIT MVS1 JES2 0014 009F7DC8 EXCLUSIVE OWN * * * * * * * * * * * * * * * * * * * * * * * Figure 2. IMON display of the JES2 checkpoint RESERVE enqueue. You can tell that the enqueue is a successful RESERVE enqueue by the color of the display line. A tutorial panel is provided (by pressing PF1) which explains the many details described in the display line. SYSTEM MVS1 96/12/20 10:14:29.33 LINE 1 OF 2 75% ENQUEUE DISPLAY SELECTION CRITERIA: Q=******** R=**************** J=JES2**** Q-NAME E RESOURCE-NAME (RSV-DEVICE) USER SYSZJES2*JESSP1SYS1.HASPCKPT 9 JESSP1)JES2