Greater Greater Washington


Which DC-area transit agencies offer open data?

Projects like the Mobility Lab's real-time screens and Transit Near Me can help riders and boost transit usage, but they can only show information for agencies which provide open data. How do our region's agencies stack up?


Photo by rllayman on Flickr.

The table below lists the many transit agencies in the Washington region and their open data progress. In a nutshell, there are 2 kinds of open data: schedule data and real-time arrival data.

General Transit Feed Specification (GTFS) files list schedules and the locations of stops and routes, powering applications such as maps and trip planners. Real-time arrival data lets applications tell riders how far away the bus actually is, for tools like smartphone apps or digital screens.
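To make the schedule side concrete: a GTFS feed is a zip archive of comma-separated text files, one of which, stops.txt, lists each stop with its coordinates. Here is a minimal sketch of reading it in Python. The sample rows and the `load_stops` helper are made up for illustration and do not come from any agency's real feed.

```python
import csv
import io

# Hypothetical sample of a GTFS stops.txt file; real feeds have many more rows.
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
1001,Example St & 1st Ave,38.9000,-77.0300
1002,Example St & 2nd Ave,38.9010,-77.0310
"""


def load_stops(stops_txt):
    """Return a dict mapping stop_id to (name, latitude, longitude)."""
    stops = {}
    for row in csv.DictReader(io.StringIO(stops_txt)):
        stops[row["stop_id"]] = (
            row["stop_name"],
            float(row["stop_lat"]),
            float(row["stop_lon"]),
        )
    return stops


stops = load_stops(SAMPLE_STOPS_TXT)
for stop_id, (name, lat, lon) in stops.items():
    print(stop_id, name, lat, lon)
```

Because every agency's stops.txt uses the same column names, the same few lines would work unchanged on a feed from another city.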

Columns, in order: Schedule data (Public GTFS, Shapes in GTFS, On Google) and Real-time data (Tracking, Tracking API)
Metrorail
Here

and Bing

Custom
Metrobus Most1
and Bing

Custom
Circulator (DC)
WMATA2

WMATA2

Nextbus
ART (Arlington)
Here

In process

Connexionz
DASH (Alexandria) Via email only3
Ride On (Montgomery)
Old?4

More info
The Bus (Prince George's)
Nextbus
MTA (Maryland) commuter bus
Here
MARC
Confusingly5

Here
Fairfax (County) Connector
CUE (Fairfax City)

Nextbus
Loudoun County Transit
Text/email alerts
PRTC
VRE
Unofficial6
Mix of GPS & manual7
1 WMATA's GTFS file contains most Metrobus routes, but on some routes the path cuts diagonally across the street grid over long sections, such as freeway or bridge segments.

2 Circulator route and schedule data is included as part of the WMATA GTFS feed. However, there are some quality issues, such as cryptic route names.

3 DASH feed is not publicly available, but officials can provide it via email.

4 Ride On's feed no longer appears to be on their website. GTFS Data Exchange has cached a version from December 2010 which was apparently posted in a news release.

5 MARC lines are listed in the MTA Maryland feed as lines 300, 301, and 302, which makes them hard to identify for someone unfamiliar with the feed.

6 Someone not affiliated with VRE created a GTFS file in 2009, but it hasn't been updated since and VRE does not offer an official one.

7 VRE has a page with train status which lists some trains' positions through GPS and some from manual reports from the conductor.

What the columns mean

Creating public GTFS feeds (the 1st column) allows someone who's written an app to easily incorporate schedule and route data for a transit agency. GTFS has emerged as a national standard for representing transit feeds, and there's tremendous value in having as many agencies as possible support the same standard. That way, if someone writes an app in Chicago, they can make it work in Denver, Albany, or Miami at the same time.

Most of the transit agencies' feeds include the paths that the vehicles take, but some, like DASH's, do not. The 2nd column shows this information. Feeds without paths are still usable, but apps that visualize routes, like Transit Near Me, end up showing unsightly diagonal lines cutting across city blocks.
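For anyone curious how an app could tell whether a feed includes paths: in GTFS, each trip in trips.txt can carry an optional shape_id that points at a sequence of points in shapes.txt. The sketch below lists routes with at least one trip that has no usable shape. The sample data and the `routes_missing_shapes` helper are hypothetical, not drawn from any local feed.

```python
import csv
import io

# Hypothetical excerpts of trips.txt and shapes.txt from a GTFS feed.
TRIPS_TXT = """route_id,service_id,trip_id,shape_id
R1,WK,T1,S1
R1,WK,T2,S1
R2,WK,T3,
"""

SHAPES_TXT = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
S1,38.90,-77.03,1
S1,38.91,-77.04,2
"""


def routes_missing_shapes(trips_txt, shapes_txt):
    """Return sorted route_ids that have a trip without a matching shape."""
    known = {row["shape_id"] for row in csv.DictReader(io.StringIO(shapes_txt))}
    missing = set()
    for row in csv.DictReader(io.StringIO(trips_txt)):
        # shape_id is optional in GTFS, so treat an absent column as empty.
        shape_id = (row.get("shape_id") or "").strip()
        if shape_id not in known:
            missing.add(row["route_id"])
    return sorted(missing)


print(routes_missing_shapes(TRIPS_TXT, SHAPES_TXT))
```

An app that finds a route on this list has to fall back to drawing straight lines between consecutive stops, which is where the diagonal lines come from.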

Agencies can also sign a contract with Google to have their routes and schedules on Google Maps. The 3rd column shows agencies which have done this. Some agencies put out their data files, but aren't willing to sign this contract because of indemnification or other clauses which Google unfortunately insists upon. On the flip side, some agencies sign up with Google but then don't publish the GTFS feed publicly.

The agency might provide it to those who ask, or might not, but this dissuades app creators from including this agency, and makes it harder for them to get regular updates. Every agency should strive to host a public and up-to-date GTFS feed on their site so that anyone building apps can easily incorporate that agency's services into the tool.

The other type of open data is real-time locations or predictions. To make this possible, agencies first have to deploy AVL (Automatic Vehicle Location) technology on their buses or trains (the 4th column). The main obstacle is that this is somewhat expensive; a physical device has to go into each vehicle, and those devices then need some amount of maintenance over time.

Once an agency has tracking, it's relatively simple to offer a computer interface for apps to access and tell riders about this information (the 5th column). Most of the agencies with tracking offer such an interface. Ride On, MARC, and Loudoun Transit, however, all have public tracking sites that provide some services to riders but offer no way for other apps to tap into the information those sites contain.

What agencies can do

Agencies with red X's on this chart can start thinking about how to provide schedule and/or real-time open data. Creating GTFS files isn't extremely difficult, though it does require some staff time. For agencies that use scheduling software, the manufacturers of that software often offer modules to export data as GTFS.

Some GTFS feeds could benefit from quality fixes. For example, WMATA's Metrorail GTFS file doesn't show the specific paths trains take, and paths are missing for a few bus routes. The "Transparent Metro Data Sets" Application Programming Interface (API), a special interface WMATA created to offer access to much of its data, does include the correct paths. But many people develop apps to access GTFS files for multiple cities. It's much less likely they will put in extra development effort to specifically pull just these route shapes from this unique API.

The Circulator's routes are part of the WMATA GTFS feed, which makes things even easier for apps than having to download a separate feed. One problem is that the route names are all cryptic: there's "DCDGR" for the Dupont-Georgetown-Rosslyn Circulator, or "DC98" for the route which replaced the former 98 bus. Those are fine for internal systems inside the agencies, but they aren't very clear to riders.
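Until the feed itself is fixed, an app can paper over cryptic codes with its own lookup table. The sketch below assumes the code appears in the route_short_name column of routes.txt, which may not match how the WMATA feed is actually laid out. The sample rows and the `display_names` helper are hypothetical, and only the DCDGR name comes from the text above.

```python
import csv
import io

# Rider-friendly names keyed by internal code. Only DCDGR is taken from the
# article; other codes would need to be filled in by someone who knows them.
FRIENDLY_NAMES = {"DCDGR": "Dupont-Georgetown-Rosslyn Circulator"}

# Hypothetical routes.txt excerpt; route_type 3 means bus in GTFS.
ROUTES_TXT = """route_id,route_short_name,route_long_name,route_type
10,DCDGR,,3
11,DC98,,3
"""


def display_names(routes_txt, overrides):
    """Map route_id to a rider-facing name, falling back to the raw code."""
    names = {}
    for row in csv.DictReader(io.StringIO(routes_txt)):
        code = row["route_short_name"]
        names[row["route_id"]] = overrides.get(code, code)
    return names


print(display_names(ROUTES_TXT, FRIENDLY_NAMES))
```

The drawback is that every app has to maintain its own table, which is why fixing the names once in the feed is the better outcome.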

Agencies which have provided their data to Google but don't offer the feeds publicly (like DASH, Ride On, and MARC) should post those feeds on their websites and publicly link to the feeds. They are already creating the GTFS files for Google, so it's a trivial step to also let others download the same files.

WMATA also has much of the route data for other local bus systems in the region, which it uses in its trip planner. Agencies which don't have GTFS files can give WMATA permission to include their data in its GTFS feed, as the Circulator does.

Agencies with AVL systems already on their vehicles should set up APIs to give apps access to the locations or predictions, and agencies without AVL can work toward getting the budget necessary to deploy AVL.

What others can do

Transit industry associations and vendors which sell technology to transit agencies can all encourage open data to be part of any contract. Vendors can encourage agencies to open their data and provide services to do so, and associations can encourage agencies to ask their vendors for these services.

The industry can also help move toward a clear standard for bus tracking. GTFS has become a standard for schedule and route data because large numbers of agencies went ahead and offered GTFS files. But there is not yet a consensus around what format to use to offer real-time predictions.

WMATA built its own API which provides the data in a certain format. Circulator, The Bus, and CUE all use Nextbus for tracking, which has its own API. ART uses another service, Connexionz. This unfortunately means that anyone who is building a real-time application and wants to incorporate multiple services has to support at least 3 different APIs.
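In practice, supporting several APIs usually means writing one small adapter per source that converts its response into a shared structure. The sketch below shows the idea with three invented payload shapes. None of them is the real WMATA, Nextbus, or Connexionz format, and all the function and field names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One arrival prediction in a common, agency-neutral shape."""
    agency: str
    route: str
    minutes: int


def from_custom_json(payload):
    # Hypothetical shape: {"Predictions": [{"Route": "X2", "Minutes": "5"}]}
    return [
        Prediction("custom", p["Route"], int(p["Minutes"]))
        for p in payload["Predictions"]
    ]


def from_vendor_one(payload):
    # Hypothetical shape: [("dupont", 420)], meaning (route tag, seconds away)
    return [Prediction("vendor-one", tag, secs // 60) for tag, secs in payload]


def from_vendor_two(payload):
    # Hypothetical shape: {"41": [3, 18]}, meaning route -> minutes away
    return [
        Prediction("vendor-two", route, mins)
        for route, times in payload.items()
        for mins in times
    ]


def merge_predictions(*batches):
    """Combine already-normalized batches, soonest arrival first."""
    merged = [p for batch in batches for p in batch]
    return sorted(merged, key=lambda p: p.minutes)


arrivals = merge_predictions(
    from_custom_json({"Predictions": [{"Route": "X2", "Minutes": "5"}]}),
    from_vendor_one([("dupont", 420)]),
    from_vendor_two({"41": [3, 18]}),
)
for p in arrivals:
    print(f"{p.minutes:>2} min  {p.route}  ({p.agency})")
```

Each new vendor format costs one more adapter function. With a shared standard, the three adapters would collapse into one.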

There are efforts to create such standards, like GTFS-Realtime, but it hasn't achieved the same widespread adoption as GTFS, nor has any other standard.

It's still possible to build apps without a standard, and the Mobility Lab's real-time screen project does connect to all 3 different systems in our region. But that requires extra work, not just for the Mobility Lab but for every other app creator who wants to offer predictions for multiple transit agencies.

The easier we make it to build apps, the more we'll get. Ultimately, it would be great for one standard to emerge, and for the various vendors like Nextbus to agree to all offer data to apps in that same standard format.

Update: Commenter intermodal commuter pointed out the real-time status page for VRE. It combines some train positions from GPS and some from manual reports from conductors. There is not an API to access the data. I've corrected the chart.

Update 2: Commenter Adam noted that MARC is actually contained in the MTA Maryland GTFS file, but listed only as routes 300, 301, and 302; when we examined the feed, we didn't realize these weren't commuter buses. But you can see the MARC lines on Transit Near Me (for example, centered around Union Station).

Also, ACCS Web Manager Joe Chapline posted a status update about ART's efforts to get into Google Transit; according to Chapline, this was delayed for a time due to contract issues, and now is awaiting action by the Google legal department, which I know from past personal experience is often understaffed and backlogged.

David Alpert is the Founder and Editor-in-Chief of Greater Greater Washington and Greater Greater Education. He worked as a Product Manager for Google for six years and has lived in the Boston, San Francisco, and New York metro areas in addition to Washington, DC. He loves the area which is, in many ways, greater than those others, and wants to see it become even greater. 

Comments

Great roundup. Thanks.

by Brad on Jan 12, 2012 11:14 am

Nicely done, David. This is a great template for advocacy.

It's the gaps that are surprising -- who knew that ART wasn't in Google? Or that MARC has GTFS they're not sharing?

by Matt Caywood on Jan 12, 2012 11:40 am

Fairfax Connector needs to step up to the plate.

by inlogan on Jan 12, 2012 11:53 am

VRE I believe has real-time tracking.

by Jack Love on Jan 12, 2012 11:56 am

Although I would switch the 2nd and 3rd columns, because "Public GTFS" and "On Google" are both similar (indicating that a GTFS feed exists somewhere). "Shapes in GTFS" is a different kind of issue.

by Matt Caywood on Jan 12, 2012 11:56 am

And yes David, very well done.

by Jack Love on Jan 12, 2012 11:57 am

I deliberately put Google 3rd versus 2nd because it's not as important as the others from an open data standpoint; it's not really an open data step at all. It's nice for agencies to be on Google, but that doesn't necessarily lead to the files being public, and unfortunately Google insists they sign an agreement that can have some one-sided indemnification provisions which has become a problem for a few agencies.

The reason I did include it is that if an agency is on Google but doesn't have public GTFS, we know that they have a GTFS file somewhere, and therefore that they should just release it.

by David Alpert on Jan 12, 2012 12:15 pm

What does a transit agency have to submit to get their rail/bus lines on the Google Transit Layer? By that I mean the visual layer with colored lines you can turn on by checking "Transit" in the menu or clicking on a stop. NYC, Boston, and others have this, but DC and Baltimore do not.

by Phil LaCombe on Jan 12, 2012 12:26 pm

Your chart is incorrect for VRE. VRE has real-time tracking on their website at http://www.vre.org/vremap/app?action=ovmap. It's called RailTime and shows the current position of the train on the map and indicates whether the train is late or on time. It's similar to MARC's; however, VRE launched theirs first.

by Davin Peterson on Jan 12, 2012 12:32 pm

Didn't Connector swear for years that their data was tied with WMATA and if WMATA did something that they'd be forced to follow? Doesn't seem that true anymore.

by Jason on Jan 12, 2012 12:33 pm

I would like for WMATA to publish bus timetables as comma- or tab-delimited text files alongside their PDFs. I could then transfer these files to my PDA and open them in Tiny Sheet.

Last year, I produced my own text files from their PDFs through a somewhat tedious process. Because most of my routes have changed since then, I must redo many of these files, but I'd rather not have to.

I believe that transit agencies are reluctant to release schedule data because they don't want to be blamed if third-party applications such as Transit Near Me produce erroneous information.

by The Civic Center on Jan 12, 2012 12:35 pm

VRE's web site has two tracking options:

The VRE map and (when it works) the MARC status page can be sampled to show delays over the course of the day.  The information is accessible, though not in a standard format. 

by intermodal commuter on Jan 12, 2012 12:40 pm

Great post.

by Gavin on Jan 12, 2012 12:45 pm

@The Civic Center

Agreed - having the schedules in a csv or tsv format would allow schedules to be put into real-time-arrival displays even if the agency doesn't have real-time data. It's not as useful, but it is certainly preferable to nothing.

by OctaviusIII on Jan 12, 2012 12:46 pm

Rather, since such a file would show only the timing stops, it would help at certain locations but not necessarily along the route. Better to cover the hubs than not cover anything, though.

by OctaviusIII on Jan 12, 2012 12:50 pm

Civic Center and OctaviusIII: WMATA should just release the data in GTFS, as they do, and then someone else can (and should, if it doesn't exist already) create a tool to generate simple tabular schedules.

The open source code for Transit Near Me, for example, includes loading GTFS files into a database; once that's in there, you can just do a select query.

There's also an existing app that I think does what you are asking, specifically for Northern Virginia transit schedules: http://www.commuterpage.com/novasched.htm

We shouldn't put the onus on WMATA to release data in all the different formats that people might find useful. Instead, they should release it in one format that's as standard as possible, and then let people create tools which do many things with that data.

by David Alpert on Jan 12, 2012 1:16 pm

Thanks, intermodal commuter. I've updated the table for VRE.

by David Alpert on Jan 12, 2012 1:33 pm

Thank you for your comments, OctaviusIII, and thank you, David, for a very informative post.

I use published timetables and NextBus in combination.

By the way, many riders who use the audio version of NextBus may not be aware that it can be reached directly at 301-562-4669.

Another tip: If you are waiting for a South- or East-bound bus at Tenleytown, use StopID 1002500 instead of 1002478. Using 1002478 gives you times for both directions or westbound only, because stops in both directions at Wisc and Albemarle are assigned this number. I emailed WMATA about this ambiguity a few months ago, and I think that they're fixing it.

by The Civic Center on Jan 12, 2012 1:35 pm

The fact that Fairfax Connector doesn't even seem to be moving forward with any data sharing really bothers me. My guess is they'll offer real time tracking just in time for the first phase of the Silver Line to open. Then I won't be riding them anymore and won't care.

by Car Free Todd on Jan 12, 2012 3:23 pm

MARC should be in the same GTFS feed with MTA Commuter bus. The routes, once properly labeled, now seem to be labeled 300-302 lines.

by Adam on Jan 12, 2012 3:38 pm

It's true that ART data isn't on Google, although ART submitted data to Google and finished the testing process about a year ago. There were some concerns about the partnership agreement, as David indicated there sometimes are. It took some time to figure this out, but in May 2011 the County Board approved partnering with Google through the Virginia Department of Rail and Public Transportation (DRPT), who already had an agreement. This seemed like a good idea at the time, but apparently it required that existing agreement to be amended. Since September, we've been waiting for Google's legal department to sign the agreement. As far as I know, there's no dispute; we're just waiting for it to happen.

by Joe Chapline, ACCS Web Manager on Jan 12, 2012 6:34 pm

Joe: Thanks. I've added a link to your comment in the table so people curious about it can easily jump to the comment.

Adam: I see. You're right. Thanks. I've updated the table.

by David Alpert on Jan 12, 2012 6:51 pm

FYI - ART, through its AVL provider (Connexionz), has scheduled the development of a GTFS-Realtime standard feed during 2012. We had originally thought it would be good to mirror the WMATA API but decided to use GTFS-Realtime for future compatibility with other agencies.

by Tom Scherer on Jan 13, 2012 8:51 am

Thanks for the update to the chart, David. When I worked at MTA Md, I wound up taking on the task of improving the Google Data labels after seeing the rail lines simply being given the internal data numbers and being labeled as the default of bus lines. I even devised a color scheme, and added schedule data from numerous operations connecting to the Baltimore core such as Howard Transit and CMRT to create better regional itineraries in Google.

After all that, it is dismaying to see the return to the internal labels, while a process of under an hour to update the old files could make the data far less confusing to new riders now being told to take the "301" rail line and connect to the "200" rail line in Baltimore.

by Adam on Jan 13, 2012 8:59 am

Agreed that MTA should (re)correct the labeling of the rail lines. Almost all of the formatting needs to be changed, really. For instance, all of the bus lines include their short name (number) in the long name and the trip headsign. It's redundant. Some of the buses use white text on a yellow background--illegible. It wouldn't take long to clean it all up, would it? I might volunteer my time!

by Phil LaCombe on Jan 13, 2012 9:12 am

The route number appearing in the headsign field (ex "120 Bus towards 120 White Marsh") is a bit tricky since that field in the scheduling software is also used by an application for next-vehicle arrival signs that needs that route number, and the Google data has a headsign field for every trip.

That said, things like the yellow-white color combo on the qb48 line, the quirky "leading zeros" on bus routes 99 and under, the colors and labels on Howard and Annapolis Transit lines, and the rail line labels would only need updating in one file to improve the usability and appearance of the resulting output.

by Adam on Jan 13, 2012 9:24 am

@Adam
Thanks for shedding some light on the MTA GTFS data.
On the headsign issue, you couldn't just bring the trips file into Excel and do a find replace "120 " ""? You'd have to repeat the process for every route, but it couldn't take more than a few hours. Is the MTA the only agency using that next vehicle arrival technology? I wonder how other agencies work around the problem.
Why are stops in all caps? And is it necessary to include the direction in the stop (e.g. "nb")? Is this also due to the data being used elsewhere? Other agencies do not name their stops this way in GTFS.

by Phil LaCombe on Jan 13, 2012 9:41 am

@ Phil,
I think I considered something like that, but since the file also contains other number values, I was hesitant. For example, replacing "13" with "" would also snatch up the 4:13 departure of some other line, unless, as you mentioned, I pulled the data over to Excel and then reparsed it into a comma-delimited file, something that I didn't have the time to do given other more pressing duties. Google was something I elected to take on, and I focused on getting the most bang for the buck timewise.

One thing I wish Google Transit made use of are some of the extra data fields in the feed. I actually took the time and populated these, for example noting that Express routes have a 40¢ surcharge on the fare, that Light Rail tickets should be purchased prior to boarding, or that a line operated school days only.

Creating a fare matrix on Google is another matter, particularly when incorporating other systems that have "semi-reciprocal" arrangements. Get an MTA daypass on the 17 bus, and you can use it to board the J or Silver at Arundel Mills. However, catch the "J" from Laurel and try to get an MTA daypass and you're out of luck, so you've got to pony up to get the 17 at the same spot.

by Adam on Jan 13, 2012 10:10 am

In addition to the table above, many area universities provide transit service. The University of Maryland has almost 30 bus lines, which are in NextBus. GMU also has several in NextBus. They certainly qualify as "transit agencies," although they are not "public transit agencies."

http://www.nextbus.com/predictor/agencySelector.jsp#Maryland
http://www.nextbus.com/predictor/agencySelector.jsp#Virginia
http://www.nextbus.com/predictor/agencySelector.jsp#District%20of%20Columbia

by Matt Caywood on Jan 13, 2012 3:12 pm

PRTC has the capacity to produce a GTFS feed and has contracted with Google to participate in Google Transit. The agency’s schedule data, including shapes, currently resides in the test environment and testing will commence in the near future. Once testing is completed and the agency is ready to go live on Google Transit, the GTFS feed will be made available publicly. In addition, the agency is poised to embark on a CAD/AVL project, which ultimately will provide real-time data. Until such time, PRTC will continue pushing service related information to riders via email and SMS messages using the Rider Express subscription service.

by Althea Evans on Jan 13, 2012 3:56 pm

Update on Tenleytown Bus Stop at Wisc and Albemarle

WMATA assigned a new ID, 1003486, to the east-/south-bound stop. Stop ID 1002478, which had corresponded to both stops at that intersection, now refers only to the west-/north-bound stop in front of the Thai restaurant.

I got this problem fixed by contacting WMATA through the developer page of their web site late last year.

And, David, this reminds me that bus timetables could be generated also from WMATA's XML files. The developer page shows how to access these files. An off-the-shelf XML-to-CSV converter or a custom application in Java or some other language could then be used to produce timetables in a delimited format. I don't know whether I'll ever have time to pursue such a project.

Thanks for the link to The Commuter Page. They provide schedules for Palm OS, which is what I have, but only for NoVa transit systems, but what they're doing is worth exploring.

by The Civic Center on Jan 18, 2012 11:32 pm

Kurt Raschke has a spreadsheet with updated information current as of 11/2012. It's not as comprehensive as this post, but there are a few major changes that need to be highlighted:

* Ride-On now has a GTFS-realtime API.

* Fairfax Connector now has GTFS.

* VRE now has an official GTFS schedule, and a GTFS-realtime API.

by Matt Caywood on Nov 13, 2012 1:39 am