Greater Greater Washington

Bicycling


Capital Bikeshare data already yields interesting facts

Reader Corey H has already taken the Capital Bikeshare anonymous trip data, released just a few days ago, and crunched the numbers to come up with some fascinating nuggets of information:


Photo by DDOTDC on Flickr.

1) Downhill flow.* The average trip has an elevation change of -1.94 meters, which adds up to over 2,632 kilometers of elevation change in total. The average ride from Wisconsin and Macomb loses 55 meters in elevation.

Fairfax Village has the highest start station:end station ratio (71 trips started, only 29 ended).

2) Last mile usage. The four most common one-way trips are Adams Mill/Columbia to Calvert/Woodley and back as well as Eastern Market Metro to Lincoln Park and back.

3) Tourists like to use it to sightsee. The 6th most common one-way trip is from the Smithsonian station back to the Smithsonian station, at 3,586 trips.

The average [such] trip is 2 hours, 48 minutes. 76.1% of [these] trips generate usage fees. Breaking that down between casual users and members, 86.0% of casual users incurred fees on these rides while 18.8% of members did.

4) Casual vs. member usage fees. 40.7% of casual rides incur fees. 3.3% of member rides incur fees.

*Using GPS elevation data so all caveats apply. And it only factors in station-to-station elevation change.
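
For readers who want to reproduce the downhill-flow figure, here is a minimal sketch of the station-to-station calculation the footnote describes. The station elevations and trip list are made-up placeholders, not Corey's data, and the function name is only illustrative.

```python
# Hypothetical station elevations in meters. Real values would come from a
# GPS elevation lookup, so they carry that lookup's error.
ELEVATION_M = {
    "Wisconsin/Macomb": 115.0,
    "Dupont Circle": 30.0,
    "Eastern Market": 25.0,
}

# Hypothetical (start_station, end_station) pairs standing in for trip records.
trips = [
    ("Wisconsin/Macomb", "Dupont Circle"),
    ("Dupont Circle", "Eastern Market"),
    ("Eastern Market", "Dupont Circle"),
    ("Wisconsin/Macomb", "Eastern Market"),
]


def elevation_change(start, end, elevations=ELEVATION_M):
    """End elevation minus start elevation; negative means net downhill."""
    return elevations[end] - elevations[start]


# Only the two endpoints are known, so hills climbed mid-ride are invisible.
changes = [elevation_change(start, end) for start, end in trips]
mean_change = sum(changes) / len(changes)
net_change = sum(changes)

print(f"mean change per trip: {mean_change:.2f} m, net change: {net_change:.0f} m")
```

A negative mean says riders finish lower than they start on average, which is what the -1.94 meter figure reports.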

Corey added in an email, "I've already taken the time to clean the data and get it into a usable database. So if there are specific questions you'd like to be answered I can easily put together a query to get those answers (and I'm sure the others can as well)."

What would you like to know about Capital Bikeshare usage? He can't necessarily investigate everyone's questions, but if anyone posts some interesting questions that catch Corey's eye, maybe he will analyze them for us.

David Alpert is the Founder and Editor-in-Chief of Greater Greater Washington and Greater Greater Education. He worked as a Product Manager for Google for six years and has lived in the Boston, San Francisco, and New York metro areas in addition to Washington, DC. He loves the area which is, in many ways, greater than those others, and wants to see it become even greater. 

Comments


I have definitely contributed to the downhill flow situation. :)

by tom veil on Jan 13, 2012 1:46 pm • linkreport

To Corey: Any chance I could get hold of your cleaned database? I'd love to play with the bikeshare data myself (I'm a bit of a machine-learning dork), and getting my hands on a cleaned version would be awesome.

Thanks!

by AR on Jan 13, 2012 1:47 pm • linkreport

I'm curious about the sightseeing usage out of the Crystal City stations. Are there significant round-trips there?

by Kevin Beekman on Jan 13, 2012 1:55 pm • linkreport

I've been crunching these numbers a bit too... It has been really interesting. I came up with a different number for registered user trips that incurred fees - 3.3% (33,011 rides over 2011) rather than .3% (probably just a moving-the-decimal error). My number for casual users was 40.4%. Both of those exclude trips that took more than 24 hours, since I'm not sure exactly how CaBi would actually deal with those relatively-rare instances - would they actually charge $1000? There were only 240 of these instances in 2011, so they probably handle it on a case by case basis.

I also worked out an estimate for how much CaBi might have made in overage fees for 2011. Again, excluding trips over 24 hours, I came up with more than $1.3 million, mostly from casual users (87.9% of the money). The average overage fee for casual users was also much higher than for registered users.

I put together a blog post this morning with my numbers at bikesharestation.com/capital-bikeshare-data/, with some more numbers probably coming soon.

by Eric O on Jan 13, 2012 2:03 pm • linkreport
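
The fee percentages Eric and Corey quote can be approximated from trip durations alone. The sketch below assumes a trip incurs a usage fee once it runs past a 30-minute free period; that threshold is an assumption and is not stated in the post. The records are made up, and trips over 24 hours are left out, as Eric describes.

```python
FREE_SECONDS = 30 * 60        # assumed free period; not stated in the post
DAY_SECONDS = 24 * 60 * 60    # trips longer than this are excluded

# Hypothetical (member_type, duration_seconds) records.
trips = [
    ("Registered", 600),
    ("Registered", 2400),
    ("Casual", 1500),
    ("Casual", 5400),
    ("Casual", 90000),  # longer than 24 hours, so it is ignored below
]


def fee_share(records, member_type):
    """Fraction of a member type's trips that run past the free period."""
    durations = [
        seconds
        for kind, seconds in records
        if kind == member_type and seconds <= DAY_SECONDS
    ]
    if not durations:
        return 0.0
    return sum(seconds > FREE_SECONDS for seconds in durations) / len(durations)


for kind in ("Registered", "Casual"):
    print(kind, f"{fee_share(trips, kind):.1%}")
```

Changing the cutoff for very long trips is one way two people can get slightly different percentages from the same data.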

What were the top-10 most-common trips? Given how early the 17th & R NW station empties out in the morning, I'm guessing rides to and from there will be on the list.

by Brad on Jan 13, 2012 2:10 pm • linkreport

Which stations spend the most time dock-blocked or bike-less? And what times should we avoid those stations? Is there any way to use this data to build a calculator where you plug in a time and date and it will tell you the likelihood of whether the station will be dock-blocked or bike-less?

by Falls Church on Jan 13, 2012 2:18 pm • linkreport

I would like to get my hands on the cleaned-up database as well. I'd like to use it to stalk a few people and to track the movements of Christiane Amanpour (C'mon, you know she's a member). Take that, restraining order.

by David C on Jan 13, 2012 2:19 pm • linkreport

I'd first like to know what format Corey's cleaned up database is actually in and how it differs from the CSV file up on the web?

I've also cleaned up the data, because there's not a whole lot you can do with the CSVs up on the web, because the date and time stamps are stored as character strings. It also doesn't help that the station variable doesn't have the ID parsed out from the huge long station name.

I have the data stored as SAS datasets and could answer many questions as well - it's just a matter of a) finding the time to sit down and code it and b) figuring out where to begin.

by Rob P on Jan 13, 2012 2:28 pm • linkreport

@Brad

The 10 most common one-way trips on weekdays by registered members:

5:00 AM to 9:59 AM:
1) Lincoln Park to Eastern Market
2) 13th/D NE to Union Station
3) 18th/Bell (Crystal City Metro) to 27th/Crystal Drive
4) 14th/V NW to Dupont Circle Metro
5) 17th/Corcoran to 17th/K
6) 18th/Bell to 23rd/Crystal
7) 13th/H St to Union Station
8) Union Station to L'Enfant Plaza
9) Adams Mill/Columbia to Calvert/Woodley
10) 15th/P to Dupont Circle

3:00 PM to 7:59 PM
1) Eastern Market to Lincoln Park
2) Calvert/Woodley to Adams Mill/Columbia
3) Union Station to 13th/D
4) Dupont Circle to 15th/P
5) Union Station to 13th/H
6) 27th/Crystal to 18th/Bell
7) Adams Mill/Columbia to Calvert/Woodley
8) Dupont Circle to 16th/U
9) Dupont Circle to 14th/V
10) Eastern Market to 13th/D

@Falls Church

The data are for trips only. It's difficult to find whether these stations were dockblocked/empty at any time. www.cabitracker.com has good data on that.

@Rob P, David C, Eric O

Basically I normalized the five CSVs, then removed the trips from non-standard stations, removed trips to and from the same station lasting less than 60 seconds, removed incomplete entries, and removed trips over 24 hours. I also made the personal decision to remove the White House station trips, but that's a judgement call. Once I did that, I added elevation change and station-to-station distance.

GPS Coordinates came from http://www.capitalbikeshare.com/stations/bikeStations.xml

Elevations came from http://www.gpsvisualizer.com/elevation

Distances were calculated from the GPS coordinates using the Great Circle Distance (http://en.wikipedia.org/wiki/Great-circle_distance)

by Corey H. on Jan 13, 2012 2:45 pm • linkreport
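
The great-circle distance Corey mentions is commonly computed with the haversine formula. This is a minimal sketch of that formula, not necessarily the exact method he used. The Earth radius is the usual mean value and the coordinates in the usage line are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Earth is not a perfect sphere


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two illustrative points in central DC (not real station coordinates).
print(round(great_circle_km(38.90, -77.03, 38.89, -77.00), 2), "km")
```

This is the straight-line distance over the Earth's surface, so it will understate what a rider actually covers on the street grid.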

Corey, thanks for clarifying. How are you handling the date and time stamps? The key issue for me is that I want to be able to set up cross-tabs and filters by time of day, day of week, etc. I can't do that so long as the date and time stamp is stored as a string.

by Rob P on Jan 13, 2012 3:08 pm • linkreport
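
The two cleanup problems Rob describes, string timestamps and station IDs buried in the station name, can both be handled with a little parsing. This sketch assumes a timestamp layout like "1/13/2012 8:05" and a station name that ends with the ID in parentheses. Both layouts are guesses for illustration and may not match the real CSVs.

```python
import re
from datetime import datetime

TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M"  # assumed layout of the date strings
STATION_PATTERN = re.compile(r"^(?P<name>.*?)\s*\((?P<id>\d+)\)\s*$")


def parse_timestamp(text):
    """Turn a date string into a datetime so it can be filtered and grouped."""
    return datetime.strptime(text.strip(), TIMESTAMP_FORMAT)


def split_station(text):
    """Split 'Some Station (12345)' into (name, id); id is None if absent."""
    match = STATION_PATTERN.match(text)
    if match is None:
        return text.strip(), None
    return match.group("name"), match.group("id")


start = parse_timestamp("1/13/2012 8:05")
name, station_id = split_station("Metro Center / 12th & G St NW (31230)")

# weekday() is 0 for Monday through 6 for Sunday.
print(start.hour, start.weekday(), name, station_id)
```

Once the hour and weekday are real numbers, cross-tabs by time of day or day of week are a simple grouping step.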

Fascinating how much commuter traffic is in Crystal City and Capitol Hill. I would not have expected that.

by charlie on Jan 13, 2012 3:26 pm • linkreport

@Corey: Ah-ha! #5 in the a.m. Thanks!

by Brad on Jan 13, 2012 3:26 pm • linkreport

Capitol Hill is totally unsurprising to me. Bikeshare allows residents in the "interior" areas of the neighborhood to have more convenient access to the Metro Stations along its perimeter.

Lincoln Park to Eastern Market is by no means an insurmountable distance to walk. However, it'd be annoying to have to do every day. Bikeshare adds a huge convenience factor.

Ditto for Crystal City. There are a few points in the area that are irritatingly far from the Metro. My only gripe is that CC still has a long way to come in terms of bicycle friendliness. Biking from the Metro to the 23rd St restaurants should be a no-brainer, although it's one of the few Bikeshare trips that I'll *never* attempt at night or without a helmet.

by andrew on Jan 13, 2012 3:31 pm • linkreport

@Charlie,

They're the most common one-way routes, not necessarily the heaviest-used stations. Those top 10 trips are almost exclusively last-mile connections to transit.

But look at 14th/V, with the second most originations in the AM rush. Its trips go all over the city: Dupont Circle, 19th/L, 17th/K, 18th/M, 21st/M, and 14th/H are the most popular destinations.

by Corey H. on Jan 13, 2012 3:37 pm • linkreport
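
Corey's distinction between the most common one-way routes and the busiest origin stations comes down to what gets counted. Here is a minimal sketch with made-up records. The 5:00 AM to 9:59 AM window follows his morning list, and everything else is illustrative.

```python
from collections import Counter

# Hypothetical (start_hour, start_station, end_station) records.
trips = [
    (8, "Lincoln Park", "Eastern Market"),
    (8, "Lincoln Park", "Eastern Market"),
    (7, "14th/V", "Dupont Circle"),
    (8, "14th/V", "17th/K"),
    (9, "14th/V", "19th/L"),
    (17, "Eastern Market", "Lincoln Park"),
]


def in_window(hour, first_hour, last_hour):
    """True when the start hour falls inside the inclusive window."""
    return first_hour <= hour <= last_hour


# Morning window: start hours 5 through 9 cover 5:00 AM to 9:59 AM.
morning = [(start, end) for hour, start, end in trips if in_window(hour, 5, 9)]

pair_counts = Counter(morning)                          # one-way routes
origin_counts = Counter(start for start, _ in morning)  # originations

top_pair = pair_counts.most_common(1)[0]
top_origin = origin_counts.most_common(1)[0]
print(top_pair, top_origin)
```

In this toy data the top route starts at Lincoln Park, yet 14th/V has more originations because its trips scatter across several destinations.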

Is it possible to determine which routes are primarily on bike lanes? On the really bad weather days are the top ten routes different?

by tour guide on Jan 13, 2012 3:53 pm • linkreport

@tour guide -- that would be extremely challenging, as the routes themselves are not available (just the origin-destination pairs).

But one could definitely look into pairs of stations that are connected by bike lanes, and assume that a fair number of riders would choose the bike lane routes.

Similarly, when DDOT gets around to adding new bike lanes in the city, it might be possible to do a before/after comparison (I think a time-series, although it's been a long time since I took stats) to see if the lanes impact overall usage in stations along those routes.

I would be interested in seeing how bad weather days impact CaBi usage at various stations. I would hypothesize that stations near Metro (or trips originating/ending at metro stations) don't have as much of a drop in rainy-day usage as station pairs in which Metro is not involved.

by Jacques on Jan 13, 2012 4:01 pm • linkreport

@CoreyH; thanks. That makes more sense to me -- after looking at oobrien's animations.

Thanks again for doing this.

I'd love to look at the EOTR data. I know it has picked up in recent months, but it might be interesting to see where it goes.

Another point. David's billboard concept -- I suggested that on the bikeshare side you show station availability and the 5 most-used destination stations from that station. With that data, you could do that.

by charlie on Jan 13, 2012 4:07 pm • linkreport

I would also add that there are probably a lot of one-way trips that aren't taken during the morning and evening rush periods simply because there aren't any bikes.

by Adam L on Jan 13, 2012 4:16 pm • linkreport

In the article, the 0.3% should read 3.3% as Eric O. pointed out.

@Jacques, it would be easy enough to add historic temperatures, humidity, and precipitation to the database. However, you'd have to factor in the cyclical trends of usage to make any real insights. (http://goo.gl/J3B5f for historic weather data)

@Rob P, David C, Eric O

For clarification on the cleaned data set, starting with 1,361,007 total records, here are the records I removed

18 incomplete records (no bike or member type)
169 records with non-standard start stations
807 non-standard or empty end stations
2,224 trips that started or ended at the White House station
16,027 trips that started AND ended at the same station and lasted 60 seconds or less
287 trips that lasted 24 hours or longer

That leaves 1,341,478 usable records. I then added station-to-station distance in miles, and elevation change in feet while converting the duration to seconds.

The CSV is at:
www.coreyholman.com/flat.csv

The Station IDs are at:
www.coreyholman.com/stations.txt

Member Type 1 = Registered, 2 = Casual

by Corey H. on Jan 13, 2012 4:42 pm • linkreport
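
The removal rules in Corey's list can be written as one function that names the first rule a record breaks. This is a rough sketch: the field names, station IDs, and sample records are all hypothetical, and the rules are applied in the order he lists them.

```python
from collections import Counter

WHITE_HOUSE = "WH"                             # hypothetical station ID
STANDARD_STATIONS = {"A", "B", WHITE_HOUSE}    # hypothetical station IDs
DAY_SECONDS = 24 * 60 * 60


def rejection_reason(trip):
    """Return the first cleaning rule a trip breaks, or None if it is usable."""
    if not trip.get("bike") or not trip.get("member_type"):
        return "incomplete"
    if trip.get("start") not in STANDARD_STATIONS:
        return "non-standard start"
    if trip.get("end") not in STANDARD_STATIONS:
        return "non-standard or empty end"
    if WHITE_HOUSE in (trip["start"], trip["end"]):
        return "white house"
    if trip["start"] == trip["end"] and trip["seconds"] <= 60:
        return "same station, 60 seconds or less"
    if trip["seconds"] >= DAY_SECONDS:
        return "24 hours or longer"
    return None


def make_trip(bike="W001", member_type=1, start="A", end="B", seconds=600):
    """Build a made-up trip record with sensible defaults."""
    return {"bike": bike, "member_type": member_type,
            "start": start, "end": end, "seconds": seconds}


trips = [
    make_trip(),
    make_trip(bike=""),
    make_trip(start="Z"),
    make_trip(end=""),
    make_trip(start=WHITE_HOUSE),
    make_trip(end="A", seconds=45),
    make_trip(seconds=90000),
    make_trip(end="A", seconds=300),
]

removed = Counter(r for r in map(rejection_reason, trips) if r is not None)
usable = [t for t in trips if rejection_reason(t) is None]
print(len(usable), dict(removed))
```

Because each record is counted under only the first rule it breaks, the per-rule counts depend on the order in which the rules are checked.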

@Charlie

Of the trips originating from Ward 7 and Ward 8, not surprisingly, given geography, most of them are either self-contained or go to Ward 6.

Start Node - End Node - Percent of Total

DC7-DC1 1.0%
DC7-DC2 3.8%
DC7-DC5 2.5%
DC7-DC6 41.6%
DC7-DC7 38.6%
DC7-DC8 12.3%
DC7-RB 0.1%

DC8-CC 0.3%
DC8-DC1 0.6%
DC8-DC2 8.1%
DC8-DC3 0.1%
DC8-DC4 0.2%
DC8-DC5 0.8%
DC8-DC6 34.3%
DC8-DC7 4.6%
DC8-DC8 50.1%
DC8-RB 0.9%

(CC = Crystal City cluster, RB = Rosslyn-Ballston cluster)

@Adam L.

You've highlighted one of the biggest issues with Bikeshare data collection: there's no way to tell rides not taken or rides that went to multiple stations before finding a dock.

by Corey H. on Jan 13, 2012 5:14 pm • linkreport
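
Corey's ward table reads as the share of each start node's trips that end in each node, since each block of rows adds up to roughly 100%. Under that reading, a minimal sketch of the calculation looks like this. The node labels follow his table and the trip pairs are made up.

```python
from collections import Counter, defaultdict

# Hypothetical (start_node, end_node) pairs, one per trip.
trips = [
    ("DC7", "DC6"),
    ("DC7", "DC7"),
    ("DC7", "DC6"),
    ("DC7", "DC8"),
    ("DC8", "DC8"),
    ("DC8", "DC6"),
]


def destination_shares(pairs):
    """Percent of each start node's trips that end in each end node."""
    totals = Counter(start for start, _ in pairs)
    counts = Counter(pairs)
    shares = defaultdict(dict)
    for (start, end), n in counts.items():
        shares[start][end] = 100.0 * n / totals[start]
    return dict(shares)


shares = destination_shares(trips)
for start in sorted(shares):
    for end in sorted(shares[start]):
        print(f"{start}-{end} {shares[start][end]:.1f}%")
```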

The study of CaBi casual users mentions a technique called bottomless stations to measure real demand. Basically they take a ton of bikes to stations that are regularly emptied in the morning and they just keep replacing them to see how many additional bikes would be needed to meet demand. Pretty clever trick.

Something else that would be neat - and difficult - to correlate is how CaBi use changes on days when there are Metro service outages.

by David C on Jan 13, 2012 5:28 pm • linkreport

@David C: That is clever -- yet it fails to take account of riders like my wife, who several months ago largely gave up on using CaBi in the morning because the station was consistently empty by 8 am.

by Brad on Jan 13, 2012 5:43 pm • linkreport

Brad, true, but I think you could easily model discouraged users. First do the bottomless station trick to determine the needed station size. Come back in 3 months and do it again. Whatever extra demand you have then is probably a close approximation to the discouraged riders. Do that a few dozen times and you can come up with an estimate for the likely number of discouraged users.

by David C on Jan 13, 2012 5:48 pm • linkreport

You know, that is interesting. I always wondered why the Eastern Market station seemed to be full in the morning and empty during the evening commute. I had thought people would be using it to commute downtown. It actually turns out they are using it between home and the subway.

by beatbox on Jan 13, 2012 5:54 pm • linkreport

The geographic center of the system is around 11th & F:
http://g.co/maps/d9wnc

38.8975160356978 -77.0273450932033

Closest station: Metro Center/12th & G (terminalName 31230)

by akg on Jan 13, 2012 6:15 pm • linkreport
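
akg doesn't say how the center was computed. One simple reading is the unweighted average of all station coordinates, followed by a search for the station nearest that point. The sketch below takes that approach as an assumption, with made-up station names and coordinates.

```python
import math

# Hypothetical station coordinates as (latitude, longitude) in degrees.
stations = {
    "Station A": (38.90, -77.03),
    "Station B": (38.89, -77.02),
    "Station C": (38.92, -77.04),
}


def centroid(coords):
    """Unweighted mean latitude and longitude; fine over a city-sized area."""
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def nearest_station(point, candidates):
    """Name of the station closest to point, using a flat-map approximation."""
    lat0, lon0 = point
    # A degree of longitude is shorter than a degree of latitude away from
    # the equator, so scale longitude differences by cos(latitude).
    scale = math.cos(math.radians(lat0))

    def squared_distance(coord):
        lat, lon = coord
        return (lat - lat0) ** 2 + ((lon - lon0) * scale) ** 2

    return min(candidates, key=lambda name: squared_distance(candidates[name]))


center = centroid(list(stations.values()))
closest = nearest_station(center, stations)
print(center, closest)
```

Weighting each station by its trip count would give a usage center instead of a geographic one, which could land somewhere quite different.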

On a related note, is there any Google Maps layer or shared map that shows the location of cabi stations? It would be very helpful when planning trips with public transportation....at least until G integrates the cabi data

by RE on Jan 14, 2012 8:30 am • linkreport

Corey H. -- nice work.

by Richard Layman on Jan 14, 2012 12:53 pm • linkreport

It might be interesting to see how real distances affect the analysis instead of CABI distances.

My three most-used routes are 1.4x, 1.4x, and 1.2x longer than CABI's point-to-point calculations.

Obviously you can't know the actual route that any rider takes, but we do know that unless someone rides in a straight line from station to station on the same road, the ride is longer than the point-to-point number. Nearly all of them probably will be.

It shouldn't be too hard to approximate the real distances for say the Top 25 trips: it's probably safe to guess that if a route is that popular then there is probably a best path that most reasonable people would take.

by Mike C on Jan 14, 2012 3:58 pm • linkreport

This is awesome info. Too bad I need Silverlight on my iPad :( #fail #ugh

by Ghosts of DC on Jan 14, 2012 9:02 pm • linkreport

Oliver O'Brien does the magic that he does:

http://oliverobrien.co.uk/2012/01/bike-share-route-fluxes/

by Corey H. on Jan 17, 2012 4:03 pm • linkreport

The routes map is cool, but I have my doubts about how much traffic the cycletrack gains (he admits this too) and apparently N St NW is some kind of bike thoroughfare?

by MLD on Jan 17, 2012 6:04 pm • linkreport

Inspired by Corey H and others, I've been crunching the CaBi dataset too. Here's round 1 of system-level data: http://wp.me/p1UX2j-97

by JDAntos on Jan 17, 2012 10:43 pm • linkreport

CaBi locations in predominantly African American neighborhoods need to be looked at more closely. More specifically, we need to look at data such as heavily traveled Metrobus patterns and taxi cab data (if available). Simply putting CaBi stations at Metro stations does not suffice, as most individuals exiting the Metro stations are heading home. It is inconceivable that they will use CaBi to commute to their residence when there is no end station close by.

by John on Jan 18, 2012 11:53 am • linkreport

Wonder if there is a mailing list (or even better a Google Group) where all CaBi and other local transpo data hackers are hanging out?

by Dmitry Kachaev on Jan 26, 2012 1:31 pm • linkreport

@Dmitry, our friends at WABA/BikeArlington just set up a forum for Bikeshare hacker discussions.

by Matt Caywood on Feb 3, 2012 10:36 am • linkreport
