Transit
Which DC-area transit agencies offer open data?
Projects like the Mobility Lab's real-time screens and Transit Near Me can help riders and boost transit usage, but they can only show information for agencies which provide open data. How do our region's agencies stack up?
The table below lists the many transit agencies in the Washington region and their open data progress. In a nutshell, there are 2 kinds of open data: schedule data and real-time arrival data.
General Transit Feed Specification (GTFS) files list schedules and the locations of stops and routes, powering applications such as maps and trip planners. Real-time arrival data lets applications tell riders how far away the bus actually is, for tools like smartphone apps or digital screens.
| Schedule data | Real-time data | ||||
|---|---|---|---|---|---|
| Public GTFS | Shapes in GTFS | On Google | Tracking | Tracking API | |
| Metrorail | ![]() Here |
| ![]() and Bing |
| ![]() Custom |
| Metrobus |
| Most1 | ![]() and Bing |
| ![]() Custom |
| Circulator (DC) | ![]() WMATA2 |
| ![]() WMATA2 |
| ![]() Nextbus |
| ART (Arlington) | ![]() Here |
| ![]() In process |
| ![]() Connexionz |
| DASH (Alexandria) | Via email only3 |
|
|
|
|
| Ride On (Montgomery) | ![]() Old?4 |
|
|
| ![]() More info |
| The Bus (Prince George's) |
|
|
|
| ![]() Nextbus |
| MTA (Maryland) commuter bus | ![]() Here |
|
|
|
|
| MARC | ![]() Confusingly5 |
|
| ![]() Here |
|
| Fairfax (County) Connector |
|
|
|
|
|
| CUE (Fairfax City) |
|
|
|
| ![]() Nextbus |
| Loudoun County Transit |
|
|
| ![]() Text/email alerts |
|
| PRTC |
|
|
|
|
|
| VRE | ![]() Unofficial6 |
|
| Mix of GPS & manual7 |
|
2 Circulator route and schedule data is included as part of the WMATA GTFS feed. However, there are some quality issues, such as cryptic route names.
3 DASH feed is not publicly available, but officials can provide it via email.
4 Ride On's feed no longer appears to be on their website. GTFS Data Exchange has cached a version from December 2010 which was apparently posted in a news release.
5 MARC lines are listed in the MTA Maryland feed as lines 300, 301, and 302, which makes them hard to pick out for someone unfamiliar with that GTFS feed.
6 Someone not affiliated with VRE created a GTFS file in 2009, but it hasn't been updated since and VRE does not offer an official one.
7 VRE has a page with train status which lists some trains' positions through GPS and some from manual reports from the conductor.
What the columns mean
Creating public GTFS feeds (the 1st column) allows someone who's written an app to easily incorporate schedule and route data for a transit agency. GTFS has emerged as a national standard for representing transit feeds, and there's tremendous value in having as many agencies as possible support the same standard. That way, if someone writes an app in Chicago, they can make it work in Denver, Albany, or Miami at the same time.
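To give a sense of why a shared format matters, here is a minimal sketch of reading route names out of a GTFS feed. A feed is a zip archive of comma-separated text files, and `routes.txt` with its `route_short_name` and `route_long_name` columns is part of the GTFS specification. The function names `read_gtfs_table` and `list_routes` are my own, invented for illustration, and are not from any agency's tools.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed, table_name):
    """Return the rows of one GTFS text file (e.g. "routes.txt") as dicts.

    `feed` is a path or file-like object for the feed's zip archive.
    """
    with zipfile.ZipFile(feed) as archive:
        with archive.open(table_name) as raw:
            # utf-8-sig drops the byte-order mark that some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def list_routes(feed):
    """Return (short name, long name) pairs for every route in the feed."""
    return [
        (row.get("route_short_name", ""), row.get("route_long_name", ""))
        for row in read_gtfs_table(feed, "routes.txt")
    ]
```

Because every GTFS feed uses the same file and column names, the same two functions should work on any agency's feed without changes, which is the point of having a common standard.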
Most of the transit agencies' feeds include the paths that the vehicles take, but some do not, like DASH. The 2nd column shows this information. Feeds without paths are still usable, but apps that visualize routes, like Transit Near Me, end up showing unsightly diagonal lines cutting across city blocks.
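Those diagonal lines come from a fallback that a map app might reasonably use, sketched below under some assumptions. The column names (`shape_id`, `shape_pt_sequence`, `stop_sequence`, and so on) are from the GTFS specification, but the function `trip_path` is hypothetical and is not how Transit Near Me necessarily does it. Each argument is assumed to be a list of dict rows already read from the matching GTFS file.

```python
def trip_path(trip, shapes, stop_times, stops):
    """Return a list of (lat, lon) points for drawing one trip.

    `trip` is one row of trips.txt; the other arguments are lists of rows
    from shapes.txt, stop_times.txt, and stops.txt respectively.
    """
    shape_id = trip.get("shape_id") or ""
    points = [row for row in shapes if row["shape_id"] == shape_id] if shape_id else []
    if points:
        # The feed includes the real path: follow it in sequence order.
        points.sort(key=lambda row: int(row["shape_pt_sequence"]))
        return [(float(p["shape_pt_lat"]), float(p["shape_pt_lon"])) for p in points]

    # No shape data: connect consecutive stops with straight segments,
    # which is what produces diagonal lines across city blocks.
    location = {
        s["stop_id"]: (float(s["stop_lat"]), float(s["stop_lon"])) for s in stops
    }
    calls = [row for row in stop_times if row["trip_id"] == trip["trip_id"]]
    calls.sort(key=lambda row: int(row["stop_sequence"]))
    return [location[row["stop_id"]] for row in calls]
```

With shapes, the path bends around corners wherever the agency recorded a point. Without them, the only known locations are the stops themselves.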
Agencies can also sign a contract with Google to have their routes and schedules on Google Maps. The 3rd column shows agencies which have done this. Some agencies put out their data files, but aren't willing to sign this contract because of indemnification or other clauses which Google unfortunately insists upon. On the flip side, some agencies sign up with Google but then don't publish the GTFS feed publicly.
The agency might provide it to those who ask, or might not, but this dissuades app creators from including this agency, and makes it harder for them to get regular updates. Every agency should strive to host a public and up-to-date GTFS feed on their site so that anyone building apps can easily incorporate that agency's services into the tool.
The other type of open data is real-time locations or predictions. To make this possible, agencies first have to deploy AVL (Automatic Vehicle Location) technology on their buses or trains (the 4th column). The main obstacle is that this is somewhat expensive; a physical device has to go into each vehicle, and those devices then need some amount of maintenance over time.
Once an agency has tracking, it's relatively simple to offer a computer interface for apps to access and tell riders about this information (the 5th column). Most of the agencies with tracking offer such an interface, but Ride On, MARC, and Loudoun Transit all have public tracking sites that provide some services to riders yet offer no way for other apps to tap into the information those sites contain.
What agencies can do
Agencies with red X's on this chart can start thinking about how to provide schedule and/or real-time open data. Creating GTFS files isn't extremely difficult, though it does require some staff time. For agencies that use scheduling software, the manufacturers of that software often offer modules to export data as GTFS as well.
Some GTFS feeds could benefit from quality fixes. For example, WMATA's Metrorail GTFS file doesn't show the specific paths trains take, and paths are missing for a few bus routes. The "Transparent Metro Data Sets" Application Programming Interface (API), a special interface WMATA created to offer access to much of its data, does include the correct paths. But many people develop apps to access GTFS files for multiple cities. It's much less likely they will put in extra development effort to specifically pull just these route shapes from this unique API.
The Circulator's routes are part of the WMATA GTFS feed, which makes things even easier for apps than having to download a separate feed. One problem is that the route names are all cryptic: there's "DCDGR" for the Dupont-Georgetown-Rosslyn Circulator, or "DC98" for the route which replaced the former 98 bus. Those are fine for internal systems inside the agencies, but they aren't very clear to riders.
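Until the feed itself is fixed, an app could paper over cryptic names with a small override table. This is only a sketch: I'm assuming the internal code shows up in the `route_short_name` column of `routes.txt`, which I haven't confirmed, and `FRIENDLY_NAMES` and `with_friendly_names` are made-up names for illustration.

```python
# Hypothetical override table: internal route code -> rider-facing name.
# Only the pairing described in the article is listed here.
FRIENDLY_NAMES = {
    "DCDGR": "Dupont-Georgetown-Rosslyn Circulator",
}


def with_friendly_names(routes, overrides=FRIENDLY_NAMES):
    """Return copies of routes.txt rows with rider-friendly long names.

    Assumes the internal code appears in `route_short_name`; rows whose
    code has no override are returned unchanged.
    """
    renamed = []
    for row in routes:
        row = dict(row)  # copy so the caller's rows are left untouched
        code = row.get("route_short_name", "")
        if code in overrides:
            row["route_long_name"] = overrides[code]
        renamed.append(row)
    return renamed
```

The catch is that every app developer has to build and maintain a table like this separately, which is why fixing the names once in the feed is the better answer.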
Agencies which have provided their data to Google but don't offer the feeds publicly (like DASH, Ride On, and MARC) should post those feeds on their websites and publicly link to the feeds. They are already creating the GTFS files for Google, so it's a trivial step to also let others download the same files.
WMATA also has much of the route data for other local bus systems in the region, which it uses in its trip planner. Agencies which don't have GTFS files can give WMATA permission to include their data in its GTFS feed, as the Circulator does.
Agencies with AVL systems already on their vehicles should set up APIs to give apps access to the locations or predictions, and agencies without AVL can work toward getting the budget necessary to deploy AVL.
What others can do
Transit industry associations and vendors which sell technology to transit agencies can all encourage open data to be part of any contract. Vendors can encourage agencies to open their data and provide services to do so, and associations can encourage agencies to ask their vendors for these services.
The industry can also help move toward a clear standard for bus tracking. GTFS has become a standard for schedule and route data because large numbers of agencies went ahead and offered GTFS files. But there is not yet a consensus around what format to use to offer real-time predictions.
WMATA built its own API which provides the data in a certain format. Circulator, The Bus, and CUE all use Nextbus for tracking, which has its own API. ART uses another service, Connexionz. This unfortunately means that anyone who is building a real-time application and wants to incorporate multiple services has to support at least 3 different APIs.
There are efforts to create such standards, like GTFS-Realtime, but this hasn't achieved the same widespread adoption as GTFS, nor has any other standard.
It's still possible to build apps without a standard, and the Mobility Lab's real-time screen project does connect to all 3 different systems in our region. But that requires extra work, not just for the Mobility Lab but for every other app creator who wants to offer predictions for multiple transit agencies.
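The extra work looks roughly like the sketch below: one small adapter per data source, each converting that source's responses into a shared record. To be clear, the two payload shapes here ("feed A" and "feed B") are invented for illustration and are not the real WMATA, Nextbus, or Connexionz formats; the names `Arrival`, `from_feed_a`, `from_feed_b`, and `next_arrivals` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arrival:
    """One predicted arrival in a vendor-neutral form."""
    agency: str
    route: str
    stop_id: str
    minutes: int


def from_feed_a(payload, agency):
    """Hypothetical feed A: a list of dicts that already report minutes."""
    return [
        Arrival(agency, item["route"], item["stop"], int(item["minutes"]))
        for item in payload
    ]


def from_feed_b(payload, agency, now):
    """Hypothetical feed B: dicts carrying an arrival time in epoch seconds.

    `now` is the current time in epoch seconds.
    """
    return [
        Arrival(
            agency,
            item["line"],
            item["stop_code"],
            max(0, int((item["arrives_at"] - now) // 60)),
        )
        for item in payload
    ]


def next_arrivals(arrivals, limit=3):
    """Return the soonest `limit` arrivals across all agencies."""
    return sorted(arrivals, key=lambda a: a.minutes)[:limit]
```

Every additional vendor format means another adapter to write and keep working. With a single real-time standard, one adapter would cover every agency that adopted it.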
The easier we make it to build apps, the more we'll get. Ultimately, it would be great for one standard to emerge, and for the various vendors like Nextbus to agree to all offer data to apps in that same standard format.
Update: Commenter intermodal commuter pointed out the real-time status page for VRE. It combines some train positions from GPS and some from manual reports from conductors. There is not an API to access the data. I've corrected the chart.
Update 2: Commenter Adam noted that MARC is actually contained in the MTA Maryland GTFS file, but listed only as routes 300, 301, and 302, which we didn't realize were not commuter buses upon examining the feed. But you can see the MARC lines on Transit Near Me (for example, center around Union Station).
Also, ACCS Web Manager Joe Chapline posted a status update about ART's efforts to get into Google Transit; according to Chapline, this was delayed for a time due to contract issues, and now is awaiting action by the Google legal department, which I know from past personal experience is often understaffed and backlogged.
Comments
by Brad on Jan 12, 2012 11:14 am • link • report
It's the gaps that are surprising -- who knew that ART wasn't in Google? Or that MARC has GTFS they're not sharing?
by Matt Caywood on Jan 12, 2012 11:40 am • link • report
by inlogan on Jan 12, 2012 11:53 am • link • report
by Jack Love on Jan 12, 2012 11:56 am • link • report
by Matt Caywood on Jan 12, 2012 11:56 am • link • report
by Jack Love on Jan 12, 2012 11:57 am • link • report
The reason I did include it is that if an agency is on Google but doesn't have public GTFS, we know that they have a GTFS file somewhere, and therefore that they should just release it.
by David Alpert on Jan 12, 2012 12:15 pm • link • report
by Phil LaCombe on Jan 12, 2012 12:26 pm • link • report
by Davin Peterson on Jan 12, 2012 12:32 pm • link • report
by Jason on Jan 12, 2012 12:33 pm • link • report
Last year, I produced my own text files from their PDFs through a somewhat tedious process. Because most of my routes have changed since then, I must redo many of these files, but I'd rather not have to.
I believe that transit agencies are reluctant to release schedule data because they don't want to be blamed if third-party applications such as Transit Near Me produce erroneous information.
by The Civic Center on Jan 12, 2012 12:35 pm • link • report
The VRE map and (when it works) the MARC status page can be sampled to show delays over the course of the day. The information is accessible, though not in a standard format.
by intermodal commuter on Jan 12, 2012 12:40 pm • link • report
by Gavin on Jan 12, 2012 12:45 pm • link • report
Agreed - having the schedules in a csv or tsv format would allow schedules to be put into real-time-arrival displays even if the agency doesn't have real-time data. It's not as useful, but it is certainly preferable to nothing.
by OctaviusIII on Jan 12, 2012 12:46 pm • link • report
by OctaviusIII on Jan 12, 2012 12:50 pm • link • report
The open source code for Transit Near Me, for example, includes loading GTFS files into a database; once that's in there, you can just do a select query.
There's also an existing app that I think does what you are asking, specifically for Northern Virginia transit schedules: http://www.commuterpage.com/novasched.htm
We shouldn't put the onus on WMATA to release data in all the different formats that people might find useful. Instead, they should release it in one format that's as standard as possible, and then let people create tools which do many things with that data.
by David Alpert on Jan 12, 2012 1:16 pm • link • report
by David Alpert on Jan 12, 2012 1:33 pm • link • report
I use published timetables and NextBus in combination.
By the way, many riders who use the audio version of NextBus may not be aware that it can be reached directly at 301-562-4669.
Another tip: If you are waiting for a South- or East-bound bus at Tenleytown, use StopID 1002500 instead of 1002478. Using 1002478 gives you times for both directions or westbound only, because stops in both directions at Wisc and Albemarle are assigned this number. I emailed WMATA about this ambiguity a few months ago, and I think that they're fixing it.
by The Civic Center on Jan 12, 2012 1:35 pm • link • report
by Car Free Todd on Jan 12, 2012 3:23 pm • link • report
by Adam on Jan 12, 2012 3:38 pm • link • report
by Joe Chapline, ACCS Web Manager on Jan 12, 2012 6:34 pm • link • report
Adam: I see. You're right. Thanks. I've updated the table.
by David Alpert on Jan 12, 2012 6:51 pm • link • report
by Tom Scherer on Jan 13, 2012 8:51 am • link • report
After all that, it is dismaying to see the return to the internal labels, while a process of under an hour to update the old files could make the data far less confusing to new riders now being told to take the "301" rail line and connect to the "200" rail line in Baltimore.
by Adam on Jan 13, 2012 8:59 am • link • report
by Phil LaCombe on Jan 13, 2012 9:12 am • link • report
That said, things like the yellow-white color combo on the qb48 line, the quirky "leading zeros" on bus routes 99 and under, the colors and labels on Howard and Annapolis Transit lines, and the rail line labels would only need updating in one file to improve the usability and appearance of the resulting output.
by Adam on Jan 13, 2012 9:24 am • link • report
Thanks for shedding some light on the MTA GTFS data.
On the headsign issue, you couldn't just bring the trips file into Excel and do a find replace "120 " ""? You'd have to repeat the process for every route, but it couldn't take more than a few hours. Is the MTA the only agency using that next vehicle arrival technology? I wonder how other agencies work around the problem.
Why are stops in all caps? And is it necessary to include the direction in the stop (e.g. "nb")? Is this also due to the data being used elsewhere? Other agencies do not name their stops this way in GTFS.
by Phil LaCombe on Jan 13, 2012 9:41 am • link • report
I think I considered something like that, but since the file also contains other number values, I was hesitant. For example, replacing "13" with "" would also snatch up the 4:13 departure of some other line, unless, as you mentioned, I pulled the data over to Excel and then reparsed it to a comma-delimited file, something that I didn't have the time to do given other more pressing duties. Google was something I elected to take on, and I focused on getting the most bang for the buck timewise.
One thing I wish Google Transit made use of are some of the extra data fields in the feed. I actually took the time and populated these, for example noting that Express routes have a 40¢ surcharge on the fare, that Light Rail tickets should be purchased prior to boarding, or that a line operated school days only.
Creating a fare matrix on Google is another matter, particularly when incorporating other systems that have "semi-reciprocal" arrangements. Get an MTA daypass on the 17 bus, and you can use it to board the J or Silver at Arundel Mills. However, catch the "J" from Laurel and try to get an MTA daypass and you're out of luck, so you've got to pony up to get the 17 at the same spot.
by Adam on Jan 13, 2012 10:10 am • link • report
http://www.nextbus.com/predictor/agencySelector.jsp#Maryland
http://www.nextbus.com/predictor/agencySelector.jsp#Virginia
http://www.nextbus.com/predictor/agencySelector.jsp#District%20of%20Columbia
by Matt Caywood on Jan 13, 2012 3:12 pm • link • report
by Althea Evans on Jan 13, 2012 3:56 pm • link • report
WMATA assigned a new ID, 1003486, to the east-/south-bound stop. Stop ID 1002478, which had corresponded to both stops at that intersection, now refers only to the west-/north-bound stop in front of the Thai restaurant.
I got this problem fixed by contacting WMATA through the developer page of their web site late last year.
And, David, this reminds me that bus timetables could be generated also from WMATA's XML files. The developer page shows how to access these files. An off-the-shelf XML-to-CSV converter or a custom application in Java or some other language could then be used to produce timetables in a delimited format. I don't know whether I'll ever have time to pursue such a project.
Thanks for the link to The Commuter Page. They provide schedules for Palm OS, which is what I have, though only for NoVa transit systems. Still, what they're doing is worth exploring.
by The Civic Center on Jan 18, 2012 11:32 pm • link • report
* Ride-On now has a GTFS-realtime API.
* Fairfax Connector now has GTFS.
* VRE now has an official GTFS schedule, and a GTFS-realtime API.
by Matt Caywood on Nov 13, 2012 1:39 am • link • report