Greater Greater Washington


Metro needs calm, proactive hazard analysis

NTSB members' emotional tongue-lashing of Metro last week may have been well deserved. But the NTSB critique also risks being counterproductive unless cooler heads prevail at WMATA, focused more on actual safety than on just responding to NTSB.


Photo from the NTSB.

NTSB's safety recommendations are reactive, not proactive. They illuminate the facts of the crash, but are unhelpful in preventing the next crash, whose specific causes are likely to be very different given the rarity of accidents in any transit system.

Furthermore, because NTSB makes its recommendations without regard to cost, yet expects WMATA to implement them in their entirety or suffer further tongue-lashing, those recommendations risk diverting funds from higher-priority corrective actions. WMATA really needs a prioritized list of initiatives (corrective action plans, or CAPs) that would boost safety, regardless of whether NTSB has made political footballs of them.

Where should such a list come from? Hazard analysis, conducted systematically, is the central discipline in safety management, and it is missing at WMATA and from NTSB's recommendations. It is common practice in industries such as airlines and nuclear power.

A hazard is a cause of an accident, and the purpose of hazard analysis is to identify as many hazards as possible and then prioritize them by likelihood, severity of the consequences, and the cost of correcting them. There are two types of hazard analysis, and both are critical.

Root Cause Analysis: Whenever accidents happen, root cause analyses must be conducted to identify the root causes, or hazards, that led to the accident. NTSB conducted an excellent root cause analysis of the Red Line crash. The problem with relying on root cause analyses alone is that systems with very, very few accidents present few opportunities to identify root causes, and the root causes of each accident are statistically likely to be different.

Failure Modes and Effects Analysis (FMEA): A more proactive approach to hazard analysis is to identify all of the ways in which a system might fail. These are the system's failure modes. Loss of train detection by the automatic train control system was the failure mode implicated in the Red Line crash. But there are dozens of other failure modes. FMEA identifies as many failure modes as possible, identifies the causes of each failure mode, and then prioritizes the actions that would correct each cause by the severity and likelihood of the effects of their failure mode and the cost of the corrective action.
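To make the prioritization step concrete, here is a minimal sketch in Python. It assumes a simple scoring rule (severity times likelihood, divided by cost) that is only one of many ways to rank corrective actions; it is not WMATA's or the FTA's method. Every name and number below is hypothetical and for illustration only. Only the "loss of train detection" failure mode comes from the Red Line crash discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectiveAction:
    """One candidate corrective action plan (CAP) tied to a failure mode."""
    failure_mode: str
    action: str
    severity: int     # 1 (minor) to 10 (catastrophic); hypothetical scale
    likelihood: int   # 1 (rare) to 10 (frequent); hypothetical scale
    cost: float       # estimated cost in millions of dollars; hypothetical

    @property
    def risk(self) -> int:
        # Risk of the failure mode this action addresses.
        return self.severity * self.likelihood

    @property
    def priority(self) -> float:
        # Assumed scoring rule: risk addressed per million dollars spent.
        return self.risk / self.cost


def prioritize(actions):
    """Return the actions sorted from highest to lowest priority score."""
    return sorted(actions, key=lambda a: a.priority, reverse=True)


# Illustrative entries only; the scores and costs are invented.
CANDIDATES = [
    CorrectiveAction("loss of train detection",
                     "replace track circuit modules", 10, 3, 3.0),
    CorrectiveAction("hypothetical: door opens off platform",
                     "add door interlock check", 6, 2, 2.0),
    CorrectiveAction("hypothetical: worker struck on track",
                     "revise track access rules", 9, 4, 1.0),
]

if __name__ == "__main__":
    for rank, cap in enumerate(prioritize(CANDIDATES), start=1):
        print(f"{rank}. {cap.action} "
              f"(risk {cap.risk}, priority {cap.priority:.1f})")
```

The point of the sketch is the shape of the output: a ranked list in which each action's position can be traced back to explicit estimates of severity, likelihood, and cost, so the ranking can be debated and revised as new hazards are identified.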

NTSB rightly identified the deeper cause of the Red Line crash as not the failure of track circuit modules but an institutional failure to address safety. This institutional failure, though, was unhelpfully generalized as the "lack of a safety culture." How does one get a "safety culture"? NTSB's recommendations are sorely lacking in detail on this topic, with no mention of hazard analysis or FMEA. The result, as has been said, is a tone of petulance by the NTSB.

When WMATA calmly, systematically begins to conduct hazard analysis, publicly displays the resulting prioritized list of Corrective Action Plans in its monthly Vital Signs report, and then updates the list itself (as hazard analyses are conducted continuously) and the status of each plan, then people will think of WMATA as having a "safety culture".


FTA Guide for Transit Hazard Analysis.

In this regard, the FTA was much more helpful than the NTSB in the FTA Audit's recommendations for a "Hazard Management Program". The institutional root cause of the Red Line crash, unidentified by NTSB, was described perfectly by the FTA: "There is no evidence that safety analysis is being performed to prioritize hazards for elimination and mitigation."

Will cooler heads at WMATA prevail? Preliminary signs are not encouraging. The WMATA Board criticized the failure to implement over 100 Corrective Action Plans. Similarly, a WMATA official told the Riders Advisory Council, in explaining why the new 7000 series of rail cars will forgo longitudinal seating, that if there were anything the agency could do, no matter what, to improve safety, then it would be remiss to skip it.

Both of these incidents portray a shell-shocked WMATA that is reflexively saying "of course" to any idea that could improve safety. This emotional response to safety is precisely what leads to the false "safety vs cost" trade-off. A proactive hazard analysis program, however, must prioritize this list of ideas because it produces far more corrective action plans than there is money or time to implement.

This results in a lean safety agenda that prioritizes CAPs with a high safety return on investment, not those that will only push large volumes of riders into cars for a minimal improvement in safety. That's why the FTA asked WMATA for its list of "top ten" hazards that it plans to address.

Furthermore, it's unclear if FMEA and Hazard Analysis are skills that exist within WMATA. The recent WMATA Vital Signs monthly report of Key Performance Indicators, such as passenger injuries and bus on-time performance, is to be commended for transparently monitoring and reporting metrics. But the discussions of "Why did performance change?" and "Actions to improve performance" for each KPI seem so arbitrary that it appears no root cause analyses were conducted for each KPI that was below target. Hopefully new Chief Safety Officer James Dougherty can bring these skills to WMATA.

It's time for calm, proactive analysis to replace emotional, reactive safety initiatives. The Metro Board and GM, as well as journalists and bloggers, can be more helpful by asking the right questions, as the FTA did, instead of exposing every safety idea that WMATA has not implemented as indicative of an agency with no "safety culture."

Ken Archer is CTO of a software firm in Tysons Corner. He commutes to Tysons by bus from his home in Georgetown, where he lives with his wife and son. Ken completed a master's degree in Philosophy from The Catholic University of America.

Comments

This hits the nail on the head.

by Ben Ross on Aug 4, 2010 12:17 pm

Very thoughtful/thought-provoking post. Excellent.

by Penny Everline on Aug 4, 2010 12:31 pm

Hazards and causes of hazards - DC schoolchildren.

by Redline SOS on Aug 4, 2010 1:15 pm

Calm and systematic. Two words that are not in the dictionary of government, administration and politics.

by Jasper on Aug 4, 2010 1:28 pm

I don't know how one makes a CAP for "getting employees to do their jobs," especially if their failure to perform due diligence will most likely never be uncovered until and unless something goes catastrophically wrong. How long have the parasitic oscillations in the ATC system been a problem? At least since the near-three-way crash at Rosslyn, right? So it took just the right set of conditions for an accident to actually occur, out of thousands upon thousands of train trips, even though it's obvious on its face that "the train detection system sometimes doesn't detect trains" is a gaping safety hazard.

You can make all the bullet-points you want; if the people carrying them out don't care to do them right, and they know there are no consequences for them personally, you're not going to truly see that "high safety return on investment."

Captcha: able missioned

by Dizzy on Aug 4, 2010 2:04 pm

"Calm and systematic. Two words that are not in the dictionary of government, administration and politics."

I'm not sure if this is necessarily an issue pertaining to the government itself per se, but more one pertaining to the screaming mob outside of their windows.

by andrew on Aug 4, 2010 2:40 pm

Great article! One thing to note: perhaps NTSB did address hazard analysis to some degree in its meetings and in its examination of High Reliability Organizations. See Accident Docket postings for dates Feb 17, 2010 and March 2, 2010.
http://www.ntsb.gov/Dockets/RailRoad/DCA09MR007/default.htm

by fly-on-the-wall on Aug 4, 2010 3:22 pm

@ andrew: I'm not sure if this is necessarily an issue pertaining to the government itself per se, but more one pertaining to the screaming mob outside of their windows.

What screaming mob? I am making no more sound than the tapping of the keys I stroke typing to post a reaction on a niche blog where discussions are civil, sometimes cynical, and moderated when they go beyond that.

Most political maneuvering I see is not based on calm analysis, but on blame shifting. "It wasn't me. I was not present at the WMATA meetings dealing with safety".

Calm and systematic. I can't really come up with a list of local administrative and political persons that would fit the list. Most of 'em fit the bill of 'desperately hanging onto their seat, doing whatever to hang on'.

Please indulge me and prove me wrong.

by Jasper on Aug 4, 2010 4:41 pm

Thank you for the MBA lecture on Root Cause Analysis.

However, as has been said repeatedly, what Metro needs is a culture shift. That has to come from top management and be forced down through the organization until employees "get it"--and believe it. All freight railroads have it; they live and breathe safety. Even their e-mail messages close with "Be safe!" I know many professional railroaders, from Class 1s down to shortlines, and they know safety is not just good for them but also for the health of the company.

And so I've been astonished to observe over the years one astonishing fatal or near-fatal accident after another involving Metro maintenance-of-way workers, long before the Fort Totten accident--a clear sign to me they had a miserable safety culture.

Metro needs new, strong, safety-oriented management. It was necessary for NTSB to yell at them and make them soil themselves. The WMATA board now needs to put the right people in charge.

by John in Alexandria on Aug 5, 2010 8:42 am

For those interested in safety culture in high-risk businesses, the safetymatters blog (http://www.safetymattersblog.com/) provides some thought-provoking analyses and commentary.

by Bob on Aug 5, 2010 9:10 am

Is WMATA really not using a comprehensive, systematic safety technique? If true, that's absolutely ridiculous, but looking at their website I can find no mention of how they systematically address safety.

Another option, which is used extensively by the nuclear industry, is Probabilistic Risk Assessment (PRA). PRAs use event trees and/or fault trees to model potential failures and provide a quantitative assessment of the risk of a hazard. They can be used to evaluate the safety benefit of adding equipment or improving its reliability, which helps to prioritize maintenance, surveillance and system upgrades.
http://en.wikipedia.org/wiki/Probabilistic_risk_analysis

Ken, have you forwarded this post to WMATA? They clearly need to read it.

by Brian on Aug 5, 2010 12:39 pm

@Brian,

Yes, I have, but it probably wasn't necessary as GGW gets read pretty widely.

by Ken Archer on Aug 5, 2010 12:46 pm

What federal policy has ever been based on calm and rational analysis? The entire mission of the Department of Homeland Security is to do everything except be calm and rational. If we were being calm and rational, Metro would be spending money on fixing the track circuits (a known problem that has actually led to fatalities) instead of anti-terrorism boondoggles to protect against imaginary problems that have killed 0 people. (Metro's own press release states, "There is no current [terrorist] threat to the transit agency or elevated threat level.")

by Stanton Park on Aug 5, 2010 4:40 pm

I cannot agree. If not for the NTSB showing the world how the WMATA Board is thus far incapable of *any* leadership at all, how exactly ARE you going to change things?

Not ONE Board member was there to hear the NTSB Recommendations. Previous reports have fallen on deaf ears.

The Board evisceration by the NTSB was not just deserved, but long overdue. After all, this winter the WMATA Chairman sat there and said in effect "Safety's not OUR problem; we only diddle with *policy*..."

Where does he think a safety attitude comes from; the stork?

Years ago, I worked at an oil company with an active safety posture and culture. Since I left, BP bought them, and guess what my old co-workers tell me is the case now? Leadership at the TOP is where it MUST start.

I cannot see that the NTSB report in any way constrained WMATA from any proactive movement towards a safer system. Only their own attitudes prevent such.

by George B on Aug 5, 2010 11:58 pm

Unfortunately, WMATA's priority is to meet the ever-growing demands of the public for extensive service, rather than making safety and reliability their mission. In order to thoroughly and safely maintain the aging system, WMATA will have to limit service hours so employees can safely do the necessary maintenance. It is impossible to diligently do your job when you have only a matter of minutes before the next train is coming through and management is expecting you to hurry along so schedules are maintained. It is this lack of patience that is much responsible for the tragedies that WMATA has come to accept as par for the course.

by Grace G. on Aug 6, 2010 10:48 am

@Ken

Once the NTSB's report is available in its entirety, I'd encourage you to check out the section of the analysis entitled "Safety Culture." It lays out in fairly specific detail the essential characteristics of an effective safety culture, many of which were advanced by Dr. James Reason. Part of establishing such a culture is, in fact, hazard analysis, and the NTSB report recommended the following to WMATA: "Review the Hazard Identification and Resolution Matrix process in your system safety program plan to ensure that safety-critical systems such as the automatic train control system and its subsystem components are assigned appropriate levels of risk in light of the issues identified in this accident."

by Sean on Aug 6, 2010 2:20 pm

Jim Graham just finished his excuses ^H^H speech to the NTSB board. He again claimed it was a "technology" failure [ergo not HIS fault], ignoring the meat of the NTSB report....

I see little hope for improvement with such attitudes still in abundance....

by George B on Aug 9, 2010 4:56 pm

Graham's reply to the NTSB is the result of how the causes of the accident have been framed: direct cause is technical, institutional cause is cultural. What can the board do if those are the causes? Hence Graham's honest reply:

"You say to yourself, 'What could I have done differently?' What could you have done about this circuitry test? I conclude that I don't know what I could have done."

Jim Reason would be the first to say that you get a safety culture through calm, systematic hazard analysis, not simply through speeches to employees or mandating "safety comes first" in email signatures. Like Reason's "Swiss Cheese Model" of accident causes, his 4 features of a safety culture have been unhelpfully oversimplified by safety execs looking for "rules of thumb".

In fact, safety culture sloganeering is counterproductive, as Deming said in his 10th point:

Eliminate slogans, exhortations, and targets for the work force asking for zero defects and new levels of productivity. Such exhortations only create adversarial relationships, as the bulk of the causes of low quality and low productivity belong to the system and thus lie beyond the power of the work force.

Why did the Operations Control Center staff ignore the repeated circuit failure indicators? Attributing this to lack of vigilance blames the work force. But was hazard analysis conducted on reported system malfunctions? No. If hazard analysis had been conducted systematically on all reported malfunctions, resulting in a reprioritized agenda of action plans to improve safety, then the OCC staff would almost certainly have reported the malfunctions.

Graham and other board members can ask Metro what the FTA asked, 'Do you have a top 10 list of actions to improve safety?' and, if so, 'Why those 10?'. From what I can tell, though, no one other than the FTA has asked Metro this question to date.

by Ken Archer on Aug 10, 2010 8:30 am
