Facing an APAR-rent Problem?
IBM’s APAR process provides the tools for dealing with software issues
6/12/2013 1:01:01 AM
By Gabe Goldberg
The word APAR is like a Swiss Army knife: It’s a noun, verb, epithet, four-letter word, and it’s short for authorized program analysis report. But nobody I know says that mouthful!
TechTarget begins its dry definition, “APAR is a term used in IBM for a description of a problem with an IBM program that is formally tracked until a solution is provided.” Well, yes. More than a description or event, though, APAR is a process worth understanding when facing misbehaving software.
Handbooks and Definitions
It's fortunate that mainframers are accustomed to reading documentation, because there's plenty related to APARs and kindred terms. IBM’s Software Support Handbook is provided to:
• Introduce IBM Software Support
• Describe the IBM Software Support organization
• Provide information on support service offerings, including definitions of programs, policies and procedures
• Help use Web knowledge content and various self-assist tools
• Facilitate contacting IBM Software Support
• Assist getting information on software support for companies acquired by IBM and not fully integrated into mainstream offerings and processes
• Describe optional IBM Software Support services
Useful for newbies and veterans alike is a quick-read section, labeled “Acronyms A-Z,” which is nearly accurate, since it proceeds from A-is-for-APAR to U-is-for-UR1 (meaning “unable to be reproduced on the next product release”). Don't miss the helpful definition of IBM: “International Business Machines. Undisputedly the largest software developer, anywhere in the world.” If this isn't enough jargon, continue to the online IBM Terminology database, giving terms and definitions from many IBM software and hardware products as well as general computing terms. Truly inclusive, this runs from AARP (no, not that organization; it's AppleTalk Address Resolution Protocol) to z/VM.
Oddly, another resource, titled “IBM Software Support Handbook Version 5.0.0,” is also available. Contents range from “What's New” (as of March 2013) to multiple useful appendices.
The star (and start) of the show, of course, is the APAR, from which many other terms follow. An APAR (that is, IBM agreeing a problem exists and assigning a tracking number for it) can result from elaborate customer debugging, a simple “It’s broken” report, or, quite often, customer/IBM collaboration to research, recreate and document problem details. While that continues, details and communications are collected in a problem management record (PMR). Customers with access to IBMLink can view and update PMRs. Tips on PMR management are available at developerWorks.
Not all problems are created equal. When first reporting a problem and continuing through research, customers assign severity levels ranging from 1 to 4, meaning, respectively, critical situation/system down; severe impact; moderate impact; and minimal impact. A “Sev 1” report commits IBM to working 24-7 for resolution, as long as customer staff is equally available. The severity level can be changed if circumstances change from when first entered or to match current business impact conditions. In fact, though perhaps not widely known, severity can be toggled to reflect customer work hours (Sev 1 when on duty and lower severity otherwise), rather like processor On/Off Capacity on Demand.
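The severity scale and the duty-hours toggle can be sketched in a few lines of Python. This is only an illustration: the names `Severity` and `severity_for_hour`, the 9-to-17 duty window and the off-duty fallback level are assumptions made for the example, not part of any IBM tool or interface.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels as the article describes them (1 is most urgent)."""
    CRITICAL = 1   # critical situation/system down
    SEVERE = 2     # severe impact
    MODERATE = 3   # moderate impact
    MINIMAL = 4    # minimal impact


def severity_for_hour(hour, on_duty_start=9, on_duty_end=17,
                      off_duty=Severity.SEVERE):
    """Return Sev 1 while customer staff is on duty, a lower level otherwise.

    The duty window and the off-duty level are hypothetical defaults.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if on_duty_start <= hour < on_duty_end:
        return Severity.CRITICAL
    return off_duty


if __name__ == "__main__":
    for hour in (8, 10, 18):
        level = severity_for_hour(hour)
        print(f"{hour:02d}:00 -> Sev {int(level)} ({level.name})")
```

In practice the customer changes severity by updating the PMR; the function above only models the decision of which level applies at a given hour.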
It’s important not to abuse severity levels—that is, don't attempt to make everything a fire drill or bully IBM into unnecessary urgency. Crying wolf too often costs credibility and can delay resolving real crises. And consider whether every real but perhaps minor oddity or inconsistency is worth reporting, since working them occupies vendor resources that might be better spent.
APARs simply describe problems; they’re closed, or answered, with various responses. The most useful, of course, is a patch, called a program temporary fix (PTF), consisting of documentation and/or code. A PTF is temporary only in the sense that it disappears with the next product release, when it’s integrated in base product code.
Some problems are automatically escalated. According to IBM’s z/VM and z/OS Statements of Integrity (issued, respectively, in 2007), IBM accepts APARs describing system exposures which allow circumventing system access controls. In fact, similar priority is given to other products’ defects which create breaches. A long time ago, I reported a problem in which an aspect of OfficeVision allowed one VM user to take arbitrary and unauthorized action on another’s virtual machine; it was remedied essentially overnight.
IBM’s Consulting Enterprise Software Architect Tim Sipples notes one challenge of security-related patches. “Vendors have to adequately describe the problem,” he says, “its scope and the importance of applying a fix—without revealing too much information to describe how somebody could exploit the problem."
Customers face the mirror-image problem of being aware when security/integrity patches are available. IBM maintains lists of fixes for critical APARs that should be conscientiously installed between fix pack installations, depending on local applicability. These APARs, categorized as high-impact pervasive (HIPER), describe serious and widespread problems.
The Art and Science of Debugging and Reporting Problems
An earlier Destination z article gives problem-reporting tips. With practice, researching problems (understanding what’s wrong and learning whether it’s a known problem) and collaborating for fixes with IBM and other vendors is an acquired skill. Multiple PMRs are sometimes created for the same problem, until various symptoms are known to be related; the first one encountered might not be most central to the issue at hand. When you’ve discovered an already-known problem, subscribe to the “interested parties list” to be notified of actions taken on it.
Similarly, IBM staff can provide world-class support. For example, decades ago, Don Ariola—now director, Global Channel Sales, Technical Support Services at IBM—provided heroic service expanding and extending the VM/Passthru product, scaling it up for large customer networks.
Bill Bitner, a senior software engineer with IBM z/VM Customer Focus and Care, suggests ways to have better chances of getting APARs accepted and closed in timely fashion: “Be nice, professional, and thorough in communications.” He appreciates customers offering problem details—sometimes even suggesting which code areas are failing—or being able to recreate problems.
Besides PTF creation, APARs can be closed with codes such as:
• DOC: documentation change
• FIN: fixed-if-next, with the fix deferred until a future release of the product
• UR1: unable to be reproduced on next product release
• WAD: working as designed, with no code or doc fix provided
• SUGG: thanks for the suggestion, which acknowledges a desired change but doesn't promise implementation
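For anyone scripting against exported problem records, the closing codes listed above fit naturally into a small lookup table. The sketch below covers only the codes named in this article; the dictionary `CLOSING_CODES` and the helper `describe_closing_code` are hypothetical names for illustration, and IBM may use other codes not shown here.

```python
# Closing codes taken from the list in this article; not an exhaustive set.
CLOSING_CODES = {
    "DOC": "documentation change",
    "FIN": "fixed-if-next; fix deferred until a future release of the product",
    "UR1": "unable to be reproduced on next product release",
    "WAD": "working as designed; no code or doc fix provided",
    "SUGG": "suggestion acknowledged; implementation not promised",
}


def describe_closing_code(code):
    """Return a plain-language description for a closing code.

    Input is normalized (trimmed, upper-cased); unknown codes are reported
    rather than raising, since real records may carry codes not listed here.
    """
    key = code.strip().upper()
    return CLOSING_CODES.get(key, f"unknown closing code: {key}")


if __name__ == "__main__":
    for code in ("doc", "WAD", "XYZ"):
        print(f"{code}: {describe_closing_code(code)}")
```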
The nightmare case is a buggy PTF, or PTF in error (PE). Sipples notes that it’s possible to get “ping pong” APARs/PTFs, where fixing one problem causes another, but fixing that problem requires reinstating the original problem—with no way around that reality.
The mainframe, of course, is supported by multiple environments—z/VM, z/VSE, z/OS, Linux on System z and z/TPF—with each requiring different debugging, problem reporting, and service application tools and techniques.
CPR Systems’ Pete Clark says very few z/VSE customers open new APARs, and there are fewer problems than ever before. Mostly, Clark adds, customers search the problem database, find and order a PTF or the latest refresh.
And z/TPF nomenclature differs a bit, sometimes using APAR to refer to an individual fix and PTF to refer to a “fix level” (e.g., PTF 6 meaning the sixth fix level including many APARs and, perhaps, multiple enhancements).
Beware different procedures followed by various vendors. From a single APAR, IBM typically creates unique PTFs for various product levels; other vendors may apply the same fix number for multiple version/release/maintenance levels, making it challenging to match service to one's environment.
Contrary to the occasional misconception, high or low APAR count isn't necessarily a measure of quality or popularity. A high count can indicate a wildly popular new product or feature and a low count can simply reveal lack of use.
Finally, a venerable and entertaining bit of APAR folklore exists regarding a heavily used but tiny module, IEFBR14. Did it REALLY suffer more APARs than it has instructions?
Gabe Goldberg has developed, worked with and written about technology for decades. He can be contacted at firstname.lastname@example.org.