Checklists Bring Order Out of Chaos and Enhance Reliability
Systematic procedures help avoid and solve problems
10/5/2016 12:00:38 AM
By Gabe Goldberg
After spending a career with mainframes, I prefer order to chaos. I like the deterministic manner in which mainframe OSes are maintained (audit trails, full piece-part identification, systematic system builds, standardized maintenance tools, maintenance history, etc.) more than the free-fire frontier mentality for configuring, maintaining and debugging Windows PCs (plug-and-pray, installs that sprinkle random files everywhere, rebooting to vanquish problems, etc.). Even though I log changes to my PC, I feel at the mercy of the next software install that mysteriously corrupts a working system. But I take some comfort from keeping detailed records and notes on how I've done things and set up systems, so I can retrace my path.
In the same manner, when my wife prepares her favorite potluck contribution, rum cake, she follows a written recipe (see below). I've teased her about needing (or at least wanting) instructions, but I admire her process and see that it mirrors mine. I'm happy that the only variable between cakes is whether they include chocolate chips and nuts, with no risk of forgetting key ingredients.
Following the Rules Is Key
Current events tragically illustrate consequences of ignoring procedure
The pilot of an Embraer Phenom failed to turn on crucial de-icing equipment during an approach to the airport in Gaithersburg, Maryland, on December 8, 2014, causing a deadly crash, the NTSB said at its probable-cause hearing. All three people on the jet and three on the ground were killed. “Pilots must rely on checklists and procedures because relying only on memory can have deadly results,” said NTSB Chairman Christopher Hart. “The pilot’s failure to turn on the de-icing system in an icing situation proved to be disastrous.” By not taking possible icing into consideration, the pilot set approach and landing speeds that were too slow for conditions, leading to an aerodynamic stall at an altitude at which a recovery was not possible, the board said. The airplane crashed less than a mile from the runway.
Checklists and procedures are hardly new. Test pilots live by them; written reports are key, and most washouts occur because of an inability to write. Articles such as "Mainframes, the Military and Me" and "Checked, Set, Roger" illustrate their use in aviation.
Ed Gould, semi-retired and supporting a nonprofit organization's z Systems usage, related two settings with very different orientations. "In the ’70s and early ’80s we were almost monthly going through Systems Assurance with IBM," he says, and those reviews were thorough, covering elevator dimensions and the like. "We were in Chicago, so power cords were always an issue; it was the only place in the country with a limit of seven feet." Yet when the company built a New Jersey data center after hurriedly changing locations, it discovered that the elevator measurements weren't quite correct. So movers sat idle while a wall was knocked out and helicopters and a crane lifted equipment to the second floor. He noted that the detour cost about $50,000.
Atul Gawande's book "The Checklist Manifesto: How to Get Things Right" uses examples from medicine, the military, aviation and other fields to illustrate the great and pervasive value of applying structure, planning, discipline and checklists to complex projects. Checklists and structured reports avoid and solve problems. They also improve quality by identifying weak areas.
Checklists in the IT World
Working with technology, we're more than familiar with lists of instructions for installing, maintaining, debugging and repairing it. For example, VM has, through many incarnations, used an Installation Verification Program (IVP) to exercise basic functions. Long ago, in a VM data center, we added local functions to the test suite. Ensuring that the IVP ran successfully at least reduced unpleasant surprises when systems entered production. Such standard and local "regression testing" ensures that system changes and maintenance haven't broken anything. Checklist-managed change control avoids surprises and "you never told me about that" stakeholder screams.
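The idea of a standard verification suite extended with local checks can be sketched in a few lines of Python. This is a minimal illustration, not VM's actual IVP: the function `run_checklist` and the check names are hypothetical, and each check is assumed to be a zero-argument function that returns True on success. Every item runs even after an earlier failure, so a single pass reports everything that needs attention before production.

```python
from typing import Callable

# Assumption: a check is a named, zero-argument function returning True on success.
Check = Callable[[], bool]


def run_checklist(checks: dict[str, Check]) -> list[str]:
    """Run every check in order and return the names of those that failed.

    Items keep running after a failure, so one pass shows the full picture.
    """
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a check that crashes counts as a failure
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
        if not ok:
            failures.append(name)
    return failures


# Hypothetical stand-ins: real checks would exercise the system under test.
standard_checks: dict[str, Check] = {
    "logon works": lambda: True,
    "disk I/O works": lambda: True,
}
local_checks: dict[str, Check] = {
    "site accounting exit loaded": lambda: False,
}

if __name__ == "__main__":
    failed = run_checklist({**standard_checks, **local_checks})
    print("Ready for production" if not failed else f"Fix before production: {failed}")
```

Keeping the local checks in their own table mirrors adding site functions to a vendor-supplied suite: the standard list can be refreshed with each release while the local additions carry forward.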
The opposite approach is all too common: keeping details in one's head and managing by crisis, for example by leaving out of mandatory procedures the stress testing that exercises systems and applications under realistic load to ensure scalability. Benchmarking is just as easy to skip: running a controlled or calibrated load and measuring performance and throughput ensures that upgrades deliver on their promises and that functional changes haven't degraded operation.
Checklists support and automate redundancy and skills transfer, and they avoid disasters or reduce their consequences. It's important to remember that checklists aren't just procedures; they produce after-action reports that are used to improve the checklists.
Mainframe consultant Bill Janulin describes working in a pre- and post-sales capacity for ISVs, with responsibility for customer-reported problems. He notes that they used a concise standard checklist for diagnosis that captured a brief problem description, the product release level, when the problem occurred, what product setup work had been done (including product configuration) and which processes were running at the time of failure. For completeness, they also requested console logs and memory dumps. Even when a problem didn't directly involve the supported product, the checklist and the complete information it gathered prompted further questions until the problem was resolved.
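A diagnostic checklist like this maps naturally onto a simple record whose empty fields show what's still missing before analysis starts. The Python sketch below is an illustration only; the `ProblemReport` field names, the `missing_items` helper and the sample values are hypothetical, not the checklist Janulin's team actually used.

```python
from dataclasses import dataclass, fields


@dataclass
class ProblemReport:
    """Intake checklist for a customer-reported problem (field names are illustrative)."""
    description: str = ""
    release_level: str = ""
    when_occurred: str = ""
    setup_work: str = ""         # including product configuration
    running_processes: str = ""  # what was running at the time of failure
    console_log: str = ""        # reference to the attached console log
    memory_dump: str = ""        # reference to the attached memory dump


def missing_items(report: ProblemReport) -> list[str]:
    """Return the checklist items that are still empty, in checklist order."""
    return [f.name for f in fields(report) if not getattr(report, f.name).strip()]


# Hypothetical sample report with only the first three items filled in.
report = ProblemReport(
    description="Abend during nightly batch",
    release_level="R1.0",
    when_occurred="2016-09-30 02:14",
)
print(missing_items(report))
# ['setup_work', 'running_processes', 'console_log', 'memory_dump']
```

Asking for the missing items up front, rather than discovering the gaps midway through analysis, is what keeps the back-and-forth with the customer short.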
IT executive and pilot Stan King reports that his company, Information Technology Company, owes its success to the religious use of checklists. "In fact," he says, "[for] any organization looking to achieve ISO/IEC certification and accreditation, it’s a must." But, he continues, "a checklist is only as good as the person using it." As a pilot, he observes that aviation has relied on checklists for decades. A normal airline flight may require 15 checklists, some for emergencies and others for routine operations that are followed in detail. Checklists are essential because complacency slips into a cockpit easily, especially when you're intimately familiar with an aircraft. Bypassing a step may seem innocuous while sitting at the gate but may haunt you at the worst possible time.
Icing on the Cake
Just as checklists are valuable in medicine, aviation, cooking and information technology, they're essential for event planning. Software engineering expert Capers Jones puts it this way: "Describe the perfect state of affairs for the project at hand; then draw from this fantasy and create the perfect checklist."
Keep notes on what's done, log steps and decisions, record problems and solutions, identify the components used and date-stamp events. Whether you or someone else repeats the process, having a log and a recipe to follow will produce a much better IT "rum cake."
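One way to keep such a log is an append-only list of date-stamped entries, from which the steps can be replayed as next time's checklist. This minimal Python sketch assumes hypothetical entry kinds ("step", "decision", "problem", "solution", "component") and hypothetical sample entries; it's one possible shape, not a prescribed tool.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LogEntry:
    kind: str  # hypothetical kinds: "step", "decision", "problem", "solution", "component"
    text: str
    stamp: datetime = field(default_factory=datetime.now)  # date-stamped on creation


class WorkLog:
    """Append-only, date-stamped record of how a job was done."""

    def __init__(self) -> None:
        self.entries: list[LogEntry] = []

    def record(self, kind: str, text: str) -> None:
        self.entries.append(LogEntry(kind, text))

    def recipe(self) -> list[str]:
        """Replay only the steps, in order, as a checklist for next time."""
        return [e.text for e in self.entries if e.kind == "step"]

    def dump(self) -> str:
        """Render the full history, one timestamped line per entry."""
        return "\n".join(f"{e.stamp:%Y-%m-%d %H:%M}  {e.kind:<9} {e.text}" for e in self.entries)


# Hypothetical usage
log = WorkLog()
log.record("component", "Printer driver v2.1 installed")
log.record("step", "Back up configuration")
log.record("problem", "Service failed to start after install")
log.record("solution", "Restored previous config file")
log.record("step", "Restart service and verify")
print(log.recipe())  # ['Back up configuration', 'Restart service and verify']
print(log.dump())
```

Because problems and their solutions sit in the same timeline as the steps, whoever repeats the job sees not just what to do but what went wrong last time.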
This lesson applies to practically everything you do in life. I'm coming to the end of two years serving as chairman of my police station's Citizens Advisory Committee. A surprise in taking on this role was that it included running catering operations: an annual June cookout for members and Thanksgiving/Christmas meals for officers working those days. My first step in taking these on was gathering recollections of how things had been done and coding them into, of course, checklists. This made planning and executing events in this second year as reliable as my wife's cake baking, just with a few (dozen) more people and a more complete menu.
Gabe Goldberg has developed, worked with and written about technology for decades. Email him at firstname.lastname@example.org.
Rum Cake Recipe
Cake
- One 18.25 oz package yellow cake mix
- One 5.1 oz package Jello Instant Vanilla Pudding
- Four eggs
- 1/2 cup cold water
- 1/2 cup oil, any type
- 1 cup chopped pecans or walnuts
- 1/2 cup Bacardi dark or light rum 80 proof
Directions
- Preheat oven to 325°F
- Grease & flour 12" Bundt pan
- Sprinkle nuts over pan bottom
- Mix remaining cake ingredients together
- Pour batter over nuts
- Bake 1 hour, cool
- Invert cake on serving plate
- Prick top, drizzle and smooth glaze evenly over top and sides
- Allow cake to absorb glaze, repeat until glaze is used up
Glaze
- 1/4 lb butter
- 1/4 cup water
- 1 cup granulated sugar
- 1/2 cup Bacardi dark or light rum 80 proof
Glaze directions
- Melt butter in saucepan, stir in water and sugar
- Boil 5 minutes, stirring constantly
- Remove from heat, stir in rum, allow to cool