The inglorious path back to Metro glory: Maintenance

For all the complaints about Metrorail, from outages to safety concerns, it's easy to forget that Metrorail was until recently considered by many to be the best subway system in the country. For those of us who rode the subway in the 1990s, this is not a distant memory.

Bus Maintenance circa 1954

So what happened? The Metrorail system got old. Much of Metrorail was built between 1970 and 1990. For a generation, we didn't have to worry about broken escalators and elevators, doors that wouldn't close and tracks that malfunctioned. Everything just worked because it was new.

The solution, according to WMATA's Capital Needs Inventory, is to replace all of the aging infrastructure that is at the end of its useful life. Hence the sizeable capital budget from WMATA.

The $11 billion in capital needs are driven by a number of factors, including the age and condition of Metro's assets. The 30-year old Metrorail system requires many life cycle replacement costs for the first time, including the replacement of nearly one-third of the rail car fleet. Similarly, Metrobuses need to be replaced and rehabilitated on a regular schedule.

The obstacle, we are told, is a lack of dedicated funding to finance this massive replacement. But is "useful life"-based replacement really the solution? Is it the best practice in maintenance today? Let's look a little closer at maintenance that is based on the "useful life" of infrastructure.

Scheduled Maintenance: The first step in the evolution of any organization's maintenance strategy is from reactive to proactive maintenance. The advantages of this step are obvious (fewer breakdowns, longer service life) and the easiest way to implement proactive maintenance is with a schedule. All transit agencies have implemented proactive, scheduled maintenance programs, for which we are fortunate as firms in many industries have not.

However, scheduled maintenance has one fundamental weakness: because maintenance is based on a calendar and not the objective condition of an asset, it almost always comes either too late, after a breakdown has already occurred, or far too early, which is wasteful. The breakdowns, of course, only increase reactive maintenance expenses, undermining the attempt to be proactive and stealing funds from proactive maintenance efforts.

The FTA is even complaining that manufacturers are building vehicles whose maximum useful life is based on agency expectations. While this weakness can be partially addressed by scheduling maintenance based on usage rather than a calendar (just like a car's 3,000-mile maintenance intervals), any scheduled maintenance strategy inevitably creates a false cost-versus-quality trade-off. This is because the only way to improve reliability through scheduled maintenance is to increase its frequency, which further increases wasteful maintenance costs.

Much of the nation's built environment was built in the same generation as Metrorail, and our daily lives have become increasingly dependent on this infrastructure. Maintenance of aging infrastructure is thus not just a Metrorail challenge but one of the leading challenges facing the country. Scheduled maintenance could bankrupt our country while still leaving it with an unreliable infrastructure. Fortunately, maintenance best practices have developed that provide a blueprint to a smarter, leaner and more reliable built infrastructure.

Reliability-Centered Maintenance: Reliability-Centered Maintenance initiates maintenance activities when monitors or tests indicate that an asset's condition is likely to lead to breakdown. For example, vibration or temperature, two of the most common leading indicators of breakdowns, are easily monitored with remote sensors. Because the condition of an asset, instead of a schedule, determines when maintenance is initiated, this approach is called condition-based or reliability-centered maintenance.
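To make the idea concrete, here is a minimal sketch in Python of a condition-based trigger. Everything in it is an assumption for illustration: the `Reading` record, the asset IDs and the two threshold values are hypothetical, and nothing here reflects how WMATA or any vendor system actually works. It only shows the principle that sensor readings, not a calendar, decide which assets get serviced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One remote sensor sample for an asset (hypothetical record layout)."""
    asset_id: str
    vibration_mm_s: float
    temperature_c: float


# Hypothetical alert limits; real limits would come from failure analysis.
VIBRATION_LIMIT_MM_S = 7.0
TEMPERATURE_LIMIT_C = 85.0


def needs_maintenance(reading: Reading) -> bool:
    """Flag a reading when either leading indicator exceeds its limit."""
    return (
        reading.vibration_mm_s > VIBRATION_LIMIT_MM_S
        or reading.temperature_c > TEMPERATURE_LIMIT_C
    )


def assets_to_service(readings: list[Reading]) -> list[str]:
    """Return the sorted, de-duplicated IDs of assets with a flagged reading."""
    return sorted({r.asset_id for r in readings if needs_maintenance(r)})
```

Called with a night's worth of readings, `assets_to_service` returns only the assets whose condition warrants attention, however recently or long ago they were last serviced.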

The goal of reliability-centered maintenance is to initiate the right maintenance at the right time. The result is that maintenance is less costly and more effective.

DoD Hierarchy of Maintenance Approaches

Identifying the conditions that are leading indicators of different types of breakdown is accomplished through Failure Modes and Effects Analysis (FMEA). The ways in which a car or bus system or subsystem could fail, its failure modes, are identified along with the possible causes of each failure mode. Causes of failure modes that are more likely to occur or have severe consequences are then monitored using remote sensors or manual tests. FMEA is an essential step to improving both reliability and safety at Metro.
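One common FMEA convention, which is an assumption here rather than anything Metro has published, scores each cause of a failure mode from 1 to 10 for severity, occurrence and difficulty of detection, then multiplies the three into a risk priority number (RPN). The sketch below ranks causes by RPN to decide which ones deserve a sensor or manual test. The example scores and the threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCause:
    """One cause of one failure mode, scored 1-10 on each scale (hypothetical)."""
    failure_mode: str
    cause: str
    severity: int    # how bad the consequences are
    occurrence: int  # how likely the cause is to occur
    detection: int   # how hard the cause is to detect before failure

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def causes_to_monitor(causes: list[FailureCause], rpn_threshold: int) -> list[FailureCause]:
    """Return causes at or above the threshold, highest risk first."""
    flagged = [c for c in causes if c.rpn >= rpn_threshold]
    return sorted(flagged, key=lambda c: c.rpn, reverse=True)
```

The design choice worth noting is that the ranking, not the vendor's maintenance schedule, determines where monitoring effort goes, and the scores can be revised as failure data accumulates.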

Where is Metro on the hierarchy of maintenance approaches? Metro currently practices calendar-based scheduled maintenance, and has made the decision to migrate to usage-based scheduled maintenance. While this is good, it will not enable Metro to return to its glory days at an affordable price. The prospect of migrating to reliability-centered maintenance has both good and bad news.

The Good News: Many pieces are in place for a transition to reliability-centered maintenance that would be a model for the nation's transit agencies.

First, WMATA has invested in the leading asset management software system (IBM Maximo) which supports reliability-centered maintenance. Metro is currently using Maximo to track every asset it owns (267,000 assets) so that, for example, replacement components can be identified instantly or maintenance instructions can be remotely downloaded for any component. The Safety Management System that was quickly built by WMATA IT this year enables the Safety Office to analyze failures through point-and-click identification of components in any system.

However, Maximo could also be used to associate asset conditions (e.g. temperature levels) with failure modes. When this is done, Maximo can not only enable more cost-efficient reliability-centered maintenance, it can even use the data it collects to report the maintenance or replacement costs required to support any asset availability target (e.g. 99% availability). Imagine a capital expense budget that includes this type of data-driven, performance-based justification for each line item.
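To illustrate what an availability target implies, the sketch below uses the standard steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is mean time between failures and MTTR is mean time to repair. This is not Maximo's API or its reporting logic; the function names and the example numbers are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def required_mtbf(target_availability: float, mttr_hours: float) -> float:
    """MTBF needed to reach a target availability for a given repair time.

    Rearranges A = MTBF / (MTBF + MTTR) into MTBF = A * MTTR / (1 - A).
    """
    if not 0 < target_availability < 1:
        raise ValueError("target availability must be strictly between 0 and 1")
    return target_availability * mttr_hours / (1 - target_availability)
```

For example, an asset that takes 4 hours to repair would need to run roughly 396 hours between failures to be available 99% of the time. Attaching a cost to closing that gap is what would turn a budget line into a performance-based justification.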

Second, WMATA has equipped the majority of its buses with Automatic Vehicle Monitoring (AVM) instruments. These instruments continuously survey the bus during operation, silently collecting fault, performance, and service data from braking, electrical, engine, transmission, security, fare collection, accessibility, and climate control systems, and then automatically uploading the data nightly.

The Bad News: Despite the presence of the building blocks for implementing maintenance best practices, there seems to be no management-level leadership in maintenance best practices, perhaps the most critical discipline for the future of Metro. As a result, WMATA remains in the trap of expensive reactive maintenance caused by calendar-based maintenance schedules that are independent of the conditions of WMATA's 267,000 assets.

A case in point is the elevators and escalators, some of whose manufacturers are out of business, requiring expensive consultants or wholesale replacement. However, we were only reliant on these manufacturers because we implemented their maintenance schedules, instead of conducting Failure Modes and Effects Analysis to develop our own internal knowledge base and condition-based maintenance system for each elevator and escalator. Now Metro has hired a consultant to "fix" the elevators and escalators in 4 stations, a short-term reactive solution that will only work until the next elevator or escalator failure in those stations requires another heroic, expensive consultant.

WMATA can do this. It's been done in the airline and defense industries, and it will eventually be done in transit. The WMATA Board should select a GM with experience in reliability-centered maintenance, preferably from the airline or defense sectors. And we should encourage WMATA to lead the way among transit agencies, none of which has adopted these maintenance best practices, lest rail travel across the country be increasingly perceived as outdated, dangerous and unreliable.

Ken Archer is CTO of a software firm in Tysons Corner. He commutes to Tysons by bus from his home in Georgetown, where he lives with his wife and son. Ken completed a Master's degree in Philosophy from The Catholic University of America.


Yes, if only Metro and transit in general could be in as good of shape as the aviation industry...


Won't even get started on the defense sector, which has operated in an "unlimited money" environment, as Bob Gates himself has noted. Not exactly a realistic model to follow.

by Dizzy on Aug 30, 2010 3:17 pm

So is the fact that "there seems to be no management-level leadership in maintenance best practices" a result of continually going to the well of cutting payroll at WMATA, or is this something that has never really been considered?

And for those wondering, that graph can be found in section 2-4 of the PDF.

by Steven Yates on Aug 30, 2010 3:27 pm

@Steven Yates

That's a great question. An industry has to get to the point where safety or maintenance costs are so unacceptable that organizations are willing to take political ownership of maintenance instead of hiding behind vendor-designed maintenance schedules.

Reliability-centered maintenance was designed by US Airlines and the Rand Corporation in the 1970s, because of the unacceptable crash rates of aircraft. What they found was that the best maintenance approach is also the most cost-efficient, because it applies the right maintenance at the right time.

by Ken Archer on Aug 30, 2010 3:39 pm

Well, a couple thoughts:

1. You fail to mention anything about employees and contractors. I don't know what percentage of WMATA's employees are being hired for their race rather than their competence, but any public sector employer has to balance efficiency vs. being a jobs bank. And another aspect of WMATA's problem is how to manage contractors, which they haven't learned yet.

2. Role of federal government. I think your point in the comments re: reliability-centered maintenance is a good one. To compare, if air travel had continued to be as "dangerous" as it was in the 1970s it might still be considered safe and acceptable. However, NTSB was very good at drawing a line to where airlines are incredibly safe. Perhaps too much so -- there is a financial cost and we can see the airline industry suffering. However, any metric established needs to include the entire transit system and not just the safety measures.

Metro is still safe -- however the little breakdowns - yesterday I had 1) a faregate that didn't display my fare, 2) one escalator and one elevator out of service, and 3) another fare machine that didn't work w/cc -- are what drive people nuts.

by charlie on Aug 30, 2010 3:56 pm

You gotta wonder how competent leadership is if maintenance cost comes as a shock. On the other hand, it seems to be a generic problem in the US.

by Jasper on Aug 30, 2010 4:01 pm

More substantively, while reliability-centered maintenance seems like a worthwhile concept, I think focusing on the conceptual level is a mistake at the moment. As the NTSB report on the Red Line crash noted, there were countless alarms going off that the ATC system had issues. The warnings were uniformly ignored. The circuit testing method that could have detected the specific error went unused. What good is a high-level maintenance concept if the people doing the maintenance aren't going to implement the concept with more than 50% of their gluteal fat pads?

by Dizzy on Aug 30, 2010 4:03 pm


The Operations Control Center staff ignored the warnings because management wouldn't have done anything about it, like subject the reported warnings to Root Cause Analysis that results in updated procedures. That's why these changes have to be initiated from the top, and that can only happen if they are understood conceptually. Let's not blame the workers for a failure of leadership and management.

by Ken Archer on Aug 30, 2010 4:15 pm

We need to go back to building the system as it should have been built...with a third and fourth set of tracks. Then preventative maintenance could occur with less disruption to riders.

Five days of Metro shutdown on the Redline is five days of revenue and ridership loyalty lost.

by Redline SOS on Aug 30, 2010 4:21 pm

I think it's important to recognize that while the overall, end-state benefits are readily available in a brief synopsis, those bullet points are the effect of everything going according to plan.

Being in the Navy until last year, I spent almost a quarter of each work week (9 hrs on average) reviewing maintenance records and validating my Sailors' work schedules. I'll be honest with you that there was somewhat of a mix of "preventive" and RCM (sad how I still use 3-letter acronyms) going on, but I can say that double redundancy (if WMATA could successfully navigate it into their operational maintenance budget) might save some money for them down the road.

The reason I bring up double redundancy is, as an example, the escalator situation. It is unsatisfactory for any public-facing authority to allow their facilities to fall into total disrepair in the manner that Metro did. Now, I am not suggesting a fully installed, off-site escalator whose parts are continuously having "predictive" maintenance performed upon them, simply waiting for its counterpart on the public-facing escalator to go kaput. Rather, Metro should get the requisite information on the parts most likely to fail, order those backups as necessary, and keep the spares well maintained.

Ultimately, it will be up to Metro's board of directors to make the best decision possible with respect to how the new GM's view of maintenance will affect his/her daily decisions for WMATA and its fleet, and I think the board will get it right.

by C. R. on Aug 30, 2010 4:55 pm


If you know your managers are so blatantly incompetent as to ignore massive system warnings/failures, then you turn to the Safety department or the Inspector General.

While maintenance failures were not the proximate cause in this particular calamity, they have been implicated in various rail yard derailments, for example. The NTSB page containing the full list of documents in the investigation record has some of the nitty-gritty details.

by Dizzy on Aug 30, 2010 5:10 pm

"On the other hand, it seems to be a generic problem in the US."

Ayup. While those arguing for austerity now aren't very interested in examining the depths of poverty which "throwing money at things" prevented during the 1930s, we emerged from the Great Depression with a lot of public goods. A bunch of things whose cornerstones or brass plaques say 193x, 195x or 196x have been wearing out. Did everybody just think that they just fell from the sky?

by ThresherK on Aug 30, 2010 10:37 pm

Beyond the annual wrangling with local governments, Metro is basically alone in not having a predictable revenue source besides the farebox. It has had an upper management vacuum for several years. If it were run like a business, there would be even less investment in maintenance and underfunding of pensions, the usual way the private sector functions. This also is how a lot of the public sector operates--Virginia's "surplus" comes from not fulfilling obligations.

Metro needs strong management and a better board, along with a way to insulate itself from hacks like McDonnell and O'Malley. My guess is that the only way this will happen is if riders are more active and organized than they are now.

by Rich on Aug 30, 2010 10:39 pm

Whenever I hear the name "Maximo" my eye starts to twitch.

That software must have been designed by a drunk sadist. It is the polar opposite of intuitive, efficient, and user-friendly.

There are lots of little cryptic symbols that have nothing to do with the function they are associated with.

Many pages have multiple fields that are unused, and are therefore an unnecessary and confusing distraction.

During the first Maximo class I attended, the entire Maximo system crashed and the class had to be cut short.

Paperwork that used to take a few minutes can now take a half hour or more.

Needless to say, having maintenance records computerized has many advantages. It is something Metro should have done years ago, but there has to be a better way than Maximo.

by S. Johnson on Aug 30, 2010 11:41 pm

As somebody who spent much of my adolescence and younger adulthood riding SEPTA fairly frequently, I can assure you, Metro is not alone in its funding problems.

by Nate on Aug 31, 2010 12:20 am

All short runs of escalators should be replaced with concrete stairs as soon as budgets allow.

by stevek_fairfax on Aug 31, 2010 8:31 am

@Red Line SOS: If third and fourth tracks are so crucial, why do so few systems outside of New York have them? Even then, anyone living alongside the gentrifying areas of the L and M trains in Brooklyn (which, gasp, have only two tracks) would beg to differ. New York is an exception to the rule and, well, every rule.

@Rich: I think that the last year has organized riders; the problem is that a lot of said riders see DC as a waystation in life and probably couldn't name their local representatives if they tried. The frustrations aired on social media could be put to good use if people channeled them to the powers that be.
Case in point: Back in 2004, the Post did an article on Montgomery and Prince George's complaining about Annapolis favoring Baltimore (and keeping MTA's fares low) over them even though Maryland would be in trouble without them. If that article ran today, you bet people would mobilize over that article since social media has evolved and most users of it did not live here then.

@Nate: SEPTA gives riders more (unlimited passes, functional commuter rail, etc.) for less money. The most expensive all-in SEPTA pass is still less than the cheapest MARC TLC. After the storm and the nickel-and-diming fare hikes, I'd take SEPTA over WMATA/MARC/VRE in a heartbeat.

by Jason on Aug 31, 2010 8:33 am

