[Image: Bus maintenance, circa 1954]

For all the complaints about Metrorail, from outages to safety concerns, it’s easy to forget that Metrorail was until recently considered by many to be the best subway system in the country. For those of us who rode the subway in the 1990s, this is not a distant memory.

So what happened? The Metrorail system got old. Much of Metrorail was built between 1970 and 1990. For a generation, we didn’t have to worry about broken escalators and elevators, doors that wouldn’t close and tracks that malfunctioned. Everything just worked because it was new.

The solution, according to WMATA’s Capital Needs Inventory, is to replace all of the aging infrastructure that is at the end of its useful life. Hence the sizeable capital budget from WMATA.

The $11 billion in capital needs are driven by a number of factors, including the age and condition of Metro’s assets. The 30-year old Metrorail system requires many life cycle replacement costs for the first time, including the replacement of nearly one-third of the rail car fleet. Similarly, Metrobuses need to be replaced and rehabilitated on a regular schedule.

The obstacle, we are told, is a lack of dedicated funding to finance this massive replacement. But is “useful life”-based replacement really the solution? Is it the best practice in maintenance today? Let’s look a little closer at maintenance that is based on the “useful life” of infrastructure.

Scheduled Maintenance: The first step in the evolution of any organization’s maintenance strategy is the move from reactive to proactive maintenance. The advantages of this step are obvious (fewer breakdowns, longer service life), and the easiest way to implement proactive maintenance is with a schedule. All transit agencies have implemented proactive, scheduled maintenance programs. We are fortunate in this, as firms in many industries have not.

However, scheduled maintenance has one fundamental weakness: because maintenance is based on a calendar and not on the objective condition of an asset, it almost always comes either too late, after a breakdown has already occurred, or far too early, which is wasteful. The breakdowns, of course, only increase reactive maintenance expenses, undermining the attempt to be proactive and diverting funds from proactive maintenance efforts.

The FTA is even complaining that manufacturers are building vehicles whose maximum useful life is based on agency expectations. While this weakness can be partially addressed by scheduling maintenance based on usage rather than a calendar (just like a car’s 3,000-mile maintenance intervals), any scheduled maintenance strategy inevitably creates a false cost-versus-quality trade-off. This is because the only way to improve reliability through scheduled maintenance is to increase its frequency, which further increases wasteful maintenance costs.
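The difference between the two scheduling triggers can be sketched in a few lines of Python. This is only an illustration: the 90-day interval, the function names and the two example buses are assumptions made up for the sketch, and only the 3,000-mile figure comes from the car analogy above.

```python
from datetime import date, timedelta


def due_by_calendar(last_service: date, today: date, interval_days: int = 90) -> bool:
    """Calendar-based trigger: due once a fixed number of days has passed.

    The 90-day default is a hypothetical interval chosen for illustration.
    """
    return today - last_service >= timedelta(days=interval_days)


def due_by_usage(miles_since_service: float, interval_miles: float = 3000) -> bool:
    """Usage-based trigger: due once the vehicle has covered a fixed mileage."""
    return miles_since_service >= interval_miles


today = date(2010, 6, 1)

# Hypothetical lightly used bus: 92 days since service, but only 1,200 miles.
light_calendar = due_by_calendar(date(2010, 3, 1), today)   # True: serviced too early
light_usage = due_by_usage(1200)                            # False

# Hypothetical heavily used bus: 31 days since service, already 4,500 miles.
heavy_calendar = due_by_calendar(date(2010, 5, 1), today)   # False: serviced too late
heavy_usage = due_by_usage(4500)                            # True
```

The usage-based trigger tracks wear more closely than the calendar does, but both are still fixed schedules: neither looks at the actual condition of the bus.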

Much of the nation’s built environment was built in the same generation as Metrorail, and our daily lives have become increasingly dependent on this infrastructure. Maintenance of aging infrastructure is thus not just a Metrorail challenge but one of the leading challenges facing the country. Scheduled maintenance could bankrupt our country while still leaving it with unreliable infrastructure. Fortunately, maintenance best practices have developed that provide a blueprint for smarter, leaner and more reliable infrastructure.

Reliability-Centered Maintenance: Reliability-Centered Maintenance initiates maintenance activities when monitors or tests indicate that an asset’s condition is likely to lead to breakdown. For example, vibration or temperature, two of the most common leading indicators of breakdowns, are easily monitored with remote sensors. Because the condition of an asset, instead of a schedule, determines when maintenance is initiated, this approach is called condition-based or reliability-centered maintenance.

The goal of reliability-centered maintenance is to initiate the right maintenance at the right time. The result is that maintenance is less costly and more effective.
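A minimal sketch of a condition-based trigger, assuming each monitored asset reports vibration and temperature readings from remote sensors. The threshold values, units, field names and asset IDs below are hypothetical. In practice, limits would come from engineering analysis of each asset type.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    asset_id: str
    vibration_mm_s: float   # vibration velocity, assumed unit: mm/s
    temperature_c: float    # temperature, assumed unit: degrees Celsius


# Hypothetical alarm limits, for illustration only.
VIBRATION_LIMIT_MM_S = 7.0
TEMPERATURE_LIMIT_C = 85.0


def needs_maintenance(reading: Reading) -> bool:
    """Flag an asset when either leading indicator exceeds its limit."""
    return (
        reading.vibration_mm_s > VIBRATION_LIMIT_MM_S
        or reading.temperature_c > TEMPERATURE_LIMIT_C
    )


def work_orders(readings: list[Reading]) -> list[str]:
    """Return the IDs of assets whose condition calls for maintenance now."""
    return [r.asset_id for r in readings if needs_maintenance(r)]


readings = [
    Reading("escalator-12", vibration_mm_s=9.3, temperature_c=60.0),
    Reading("bus-4401", vibration_mm_s=2.0, temperature_c=70.0),
    Reading("railcar-1007", vibration_mm_s=3.0, temperature_c=91.5),
]
flagged = work_orders(readings)  # ["escalator-12", "railcar-1007"]
```

Maintenance is initiated only for the two assets whose readings cross a limit. The healthy bus is left alone regardless of how long it has been since its last service.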

[Image: DoD hierarchy of maintenance approaches]

Identifying the conditions that are leading indicators of different types of breakdown is accomplished through Failure Modes and Effects Analysis (FMEA). The ways in which a car or bus system or subsystem could fail, its failure modes, are identified along with the possible causes of each failure mode. Causes of failure modes that are more likely to occur or have severe consequences are then monitored using remote sensors or manual tests. FMEA is an essential step to improving both reliability and safety at Metro.
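The prioritization step can be sketched as follows. This is a simplified illustration, not Metro's method: it scores each cause by multiplying an occurrence rating by a severity rating (each on a 1 to 10 scale), whereas many FMEA worksheets also include a detection rating. The example subsystems, ratings and the threshold of 40 are all made up for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCause:
    subsystem: str
    failure_mode: str
    cause: str
    occurrence: int  # 1 (rare) to 10 (frequent), hypothetical rating
    severity: int    # 1 (minor) to 10 (safety-critical), hypothetical rating

    @property
    def risk(self) -> int:
        """Simple risk score: likelihood of the cause times severity of the failure."""
        return self.occurrence * self.severity


def causes_to_monitor(causes: list[FailureCause], risk_threshold: int = 40) -> list[FailureCause]:
    """Select causes worth monitoring with sensors or tests, highest risk first."""
    flagged = [c for c in causes if c.risk >= risk_threshold]
    return sorted(flagged, key=lambda c: c.risk, reverse=True)


# Made-up worksheet entries for illustration.
worksheet = [
    FailureCause("escalator step chain", "chain seizes", "worn rollers", occurrence=6, severity=8),
    FailureCause("bus brakes", "loss of braking force", "pad wear", occurrence=7, severity=9),
    FailureCause("climate control", "weak airflow", "clogged filter", occurrence=5, severity=3),
]

monitored = causes_to_monitor(worksheet)
monitored_causes = [c.cause for c in monitored]  # ["pad wear", "worn rollers"]
```

The low-risk cause (a clogged filter) stays on a simple schedule or is run to failure, while sensors and tests are concentrated on the causes that matter most.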

Where is Metro on the hierarchy of maintenance approaches? Metro currently practices calendar-based scheduled maintenance, and has made the decision to migrate to usage-based scheduled maintenance. While this is good, it will not enable Metro to return to its glory days at an affordable price. The prospect of migrating to reliability-centered maintenance has both good and bad news.

The Good News: Many pieces are in place for a transition to reliability-centered maintenance that would be a model for the nation’s transit agencies.

First, WMATA has invested in the leading asset management software system (IBM Maximo) which supports reliability-centered maintenance. Metro is currently using Maximo to track every asset it owns (267,000 assets) so that, for example, replacement components can be identified instantly or maintenance instructions can be remotely downloaded for any component. The Safety Management System that was quickly built by WMATA IT this year enables the Safety Office to analyze failures through point-and-click identification of components in any system.

However, Maximo could also be used to associate asset conditions (e.g. temperature levels) with failure modes. When this is done, Maximo can not only enable more cost-efficient reliability-centered maintenance, it can even use the data it collects to report the maintenance or replacement costs required to support any asset availability target (e.g. 99% availability). Imagine a capital expense budget that includes this type of data-driven, performance-based justification for each line item.
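To make the idea of an availability target concrete, here is a rough sketch using the standard steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is mean time between failures and MTTR is mean time to repair. This is generic reliability arithmetic, not Maximo functionality, and the 8-hour repair time is an assumed figure.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the asset is in service."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def required_mtbf(target_availability: float, mttr_hours: float) -> float:
    """MTBF needed to hit a target availability for a given repair time.

    Obtained by solving A = MTBF / (MTBF + MTTR) for MTBF.
    """
    if not 0 < target_availability < 1:
        raise ValueError("target_availability must be strictly between 0 and 1")
    return target_availability * mttr_hours / (1 - target_availability)


# Hypothetical escalator that takes 8 hours on average to repair.
needed = required_mtbf(0.99, mttr_hours=8.0)   # roughly 792 hours between failures
check = availability(needed, 8.0)              # roughly 0.99
```

If condition data showed that an asset class fell short of the required MTBF, the gap would be the data-driven justification for a maintenance or replacement line item.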

Second, WMATA has equipped the majority of its buses with Automatic Vehicle Monitoring (AVM) instruments. These instruments continuously survey the bus during operation, silently collecting fault, performance, and service data from braking, electrical, engine, transmission, security, fare collection, accessibility, and climate control systems, and then automatically uploading the data nightly.

The Bad News: Despite the presence of the building blocks for implementing maintenance best practices, there seems to be no management-level leadership in maintenance best practices, perhaps the most critical discipline for the future of Metro. As a result, WMATA remains in the trap of expensive reactive maintenance caused by calendar-based maintenance schedules that are independent of the conditions of WMATA’s 267,000 assets.

A case in point is the elevators and escalators, some of whose manufacturers are out of business, requiring expensive consultants or wholesale replacement. However, we were reliant on these manufacturers only because we implemented their maintenance schedules instead of conducting Failure Modes and Effects Analysis to develop our own internal knowledge base and condition-based maintenance system for each elevator and escalator. Now Metro has hired a consultant to “fix” the elevators and escalators in four stations, a short-term reactive solution that will work only until the next elevator or escalator failure in those stations requires another heroic, expensive consultant.

WMATA can do this. It’s been done in the airline and defense industries, and it will eventually be done in transit. The WMATA Board should select a GM with experience in reliability-centered maintenance, preferably from the airline or defense sectors. And we should encourage WMATA to lead the way among transit agencies, none of which have adopted these maintenance best practices, lest rail travel across the country be increasingly perceived as outdated, dangerous and unreliable.

Ken Archer is CTO of a software firm in Tysons Corner. He commutes to Tysons by bus from his home in Georgetown, where he lives with his wife and son. Ken completed a master’s degree in Philosophy at The Catholic University of America.