Greater Greater Washington

Capital Bikeshare releases anonymous trip data

Programmers or analysts interested in studying Capital Bikeshare patterns or creating useful apps can now do a lot more. Capital Bikeshare has followed through on its promise and posted data files with individual (but anonymous) trip data.


Anyone can now make a map like this one of CaBi trip patterns. Image from CommuterPageBlog.

The files, one for each quarter going back to late 2010, list individual trips, including the time each started and ended, duration, which station it started and ended at, and an identifying number for the individual bike. They don't say anything about the member who used the bike, except whether they are a "registered" (annual or monthly) member or a "casual" member (daily or 3- or 5-day).

Now, people can generate tables or graphics showing the most popular station pairs, or where people most often go from an individual station, or what weather patterns make usage heavier or lighter, or where the nighttime activity is, and much more.
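
For anyone who wants to try the first of these, here is a minimal Python sketch that counts the most common station pairs. It assumes each row is comma-separated in the order duration, start time, end time, start station, end station, bike number, member type, which matches rows readers have quoted from the files. The sample rows, station names and function names are hypothetical, made up for illustration, and the real files may differ in details such as headers or time format.

```python
import csv
import re
from collections import Counter
from io import StringIO

# Made-up sample rows in the ASSUMED column order:
# duration, start time, end time, start station, end station, bike number, member type
SAMPLE = """\
0h 7min. 12sec.,11/16/2011 18:32,11/16/2011 18:39,Station A (00001),Station B (00002),W00001,Registered
0h 9min. 3sec.,11/16/2011 18:40,11/16/2011 18:49,Station A (00001),Station B (00002),W00002,Casual
0h 8min. 30sec.,11/17/2011 8:05,11/17/2011 8:13,Station B (00002),Station A (00001),W00001,Registered
2h 48min. 0sec.,11/17/2011 12:00,11/17/2011 14:48,Station A (00001),Station A (00001),W00003,Casual
"""

# Matches durations written like "23h 58min. 56sec."
DURATION_RE = re.compile(r"(\d+)h (\d+)min\. (\d+)sec\.")


def parse_duration_seconds(text):
    """Convert a duration such as '0h 7min. 12sec.' to a number of seconds."""
    match = DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def top_station_pairs(rows, n=3):
    """Return the n most common (start station, end station) pairs with counts."""
    counts = Counter()
    for row in rows:
        if len(row) != 7:
            continue  # skip blank or malformed rows rather than guess at them
        counts[(row[3], row[4])] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    rows = list(csv.reader(StringIO(SAMPLE)))
    for (start, end), trips in top_station_pairs(rows):
        print(f"{start} -> {end}: {trips} trips")
    print(parse_duration_seconds("0h 7min. 12sec."), "seconds")
```

To run it against a real quarterly file, replace `StringIO(SAMPLE)` with an opened file and skip the header row if there is one.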

This data has been available for some time for London, allowing people to create animations of a day's bike usage and diagrams of a single bike's path over several days. The folks who built those and other tools can now even adapt their code to work for Capital Bikeshare, if they're so inclined.

Arlington, DC and Alta officials agreed in November to offer the data, after discussions with Tom Fairchild of the Mobility Lab, lab collaborator and advisor Matt Caywood, and CaBi Tracker creator Daniel Gohlke. (I am working on projects for the Mobility Lab as well, but was not involved in this specific discussion.)

To make it even easier to work with the data, Dylan Barlett imported the files into Google Fusion Tables, a tool that lets people easily sort, manipulate and visualize data.

If you put together an interesting analysis or visualization, please send it to us! We'd love to post interesting things you come up with using this or other open data.

David Alpert is the founder and editor-in-chief of Greater Greater Washington. He worked as a Product Manager for Google for six years and has lived in the Boston, San Francisco, and New York metro areas in addition to Washington, DC. He now lives with his wife and daughter in Dupont Circle. 

Comments


Glad to see a lot of trans-river traffic, though I don't see an awful lot of actual bikes out on the 14th Street Bridge bikeway or on the Memorial.

This should be a good argument to extend into Alexandria and deeper into Arlington.

by Jack Love on Jan 11, 2012 3:37 pm • linkreport

@Jack Love -- I think the image above was from the first quarter of CaBi's existence (hence the Nannie Helen Burroughs stop, and none in the Rosslyn/Ballston corridor).

I imagine the cross-river traffic has only increased since.

by Jacques on Jan 11, 2012 3:47 pm • linkreport

It's a shame that CaBi doesn't have GPS inside each bike like other competing systems, because that data would be huge in finding out which streets cyclists use. Point-to-point doesn't tell us about the preferred path.

by JJJJJ on Jan 11, 2012 3:54 pm • linkreport

This is great news! I've been awaiting this for months now.

by Rob P on Jan 11, 2012 4:02 pm • linkreport

I'm enjoying this.

"23h 58min. 56sec.,11/16/2011 18:32,11/17/2011 18:31,Georgia & New Hampshire Ave NW (31400),19th & E Street NW (31206),W00347,Registered" ... Why?

"20h 58min. 42sec.,11/24/2011 17:53,11/25/2011 14:51,21st & M St NW (31212),21st & M St NW (31212),W00662,Casual" ... Probably didn't understand the rules.

by ZR on Jan 11, 2012 4:19 pm • linkreport

Here are my first couple of takeaways from the over 1.36 million trips in the database:

1*) Downhill flow. Average trip is -1.94 meters, or over 2,632 kilometers in elevation change total. The average ride from Wisconsin and Macomb loses 55 meters in elevation. Fairfax Village has the highest start station:end station ratio (71 trips started, only 29 ended).

2) Last mile usage. The four most common one-way trips are Adams Mill/Columbia to Calvert/Woodley and back as well as Eastern Market Metro to Lincoln Park and back.

3) Tourists like to use it to sight-see. The 6th most common one-way trip is from the Smithsonian station back to the Smithsonian Station at 3,586 trips. The average trip is 2 hours, 48 minutes. 76.1% of trips generate usage fees. Breaking that down between casual and members, 86.0% of casual incurred fees on these rides while 18.8% of members incurred fees.

4) Casual vs. members usage fees. 40.7% of casual rides incur fees. 0.3% of member rides incur fees.

*Using GPS elevation data, so all caveats apply. And it only factors in station-to-station elevation change.

by Corey H. on Jan 11, 2012 4:50 pm • linkreport

JJJJ, I suspect it is only a matter of time before we see GPS added.

by David C on Jan 11, 2012 5:12 pm • linkreport

"The files, one for each quarter going back to late 2010, list individual trips, including the time each started and ended,

Anytime you release data that hasn't been aggregated (i.e., you release the data at the individual level), it becomes possible for someone with a little knowledge to connect the dots and determine who that data is applicable to. The feds consider this confidential data and keep lots of restrictions on its transfer and dissemination. I'm surprised this is being allowed. It's not a good idea. For example, just by knowing approximately when a loved one left a specific location (their work?), someone can fairly easily determine where they went if they know the person uses Capital Bikeshare. Releasing aggregated data is one thing; individual trips (i.e., data which is specific to an individual) is a whole other ball game.

by Lance on Jan 11, 2012 11:28 pm • linkreport

Yes, I'm just sure there will be hundreds of people stalking someone they already know is going somewhere on bikeshare by combing through trip spreadsheets.

by MLD on Jan 12, 2012 7:59 am • linkreport

Lance, you're adorable.

by David C on Jan 12, 2012 8:13 am • linkreport

@MLD "Yes, I'm just sure there will be hundreds of people stalking someone they already know is going somewhere on bikeshare by combing through trip spreadsheets."

All it takes is one person doing so.

The other side of the coin is, "What does non-aggregated data give you for following patterns that you couldn't get from aggregated data where it's impossible to discern individual patterns?" Aren't policy decisions made at the aggregate level? I.e., no policy is going to be made based on one person going from point A to point B.

by Lance on Jan 12, 2012 8:20 am • linkreport

With the individual trip data you don't have to rely on CaBi to aggregate it the way you want. With the individual data we could look at specific times of day on different days, or see how an individual bike moves around. We can cut the data in hundreds of different ways to try to find patterns.

The data doesn't have anything identifying an individual, you can't track a person from one trip to the next. I suppose if you know when a person was getting on a bike you could attempt to find that one trip in the list. But to say that those "privacy concerns" are a reason not to release the data is to ignore the zillions of other easier and more dangerous ways people can track someone.

by MLD on Jan 12, 2012 8:37 am • linkreport

I suppose if you know when a person was getting on a bike you could attempt to find that one trip in the list.

That is the whole ballgame. Suppose you suspect your significant other is having an affair. If you know s/he got on a bike at a certain time and place, claimed to go to work, and instead went to the tryst pad, the Cabi data enables this violation of privacy. Talk to any divorce lawyer; they will give you lots of similar (and usually hilarious) stories, especially when they employ an investigator using a GPS tracking device.

by goldfish on Jan 12, 2012 9:11 am • linkreport

@goldfish
But not really. How do you know your loved one is the one who got a bike at 5:21 downtown and went to Adams Morgan and not the person who got a bike at the same location at 5:23 and went to Capitol Hill?

The other thing is this data is not being released in real time, so assuming that the most recent trip released is a couple of weeks in the past at a minimum, I doubt anyone is really going to keep a detailed log of where they suspect their cheating loved one is and what time they went somewhere. If they were to be that obsessive, I would suspect that they would use the GPS in that person's phone or some other tracking method first.

Not to make this political, but Dick Cheney had the 1% doctrine: if something is 1% likely, we should act like it will happen and plan accordingly to prevent it. I felt that was stupid, and I think what you are worried about with this is more like the .0001% doctrine. Instead of getting the benefit from this data, you are saying we should not release it because of the one in a million (billion?) chance someone might do something bad with the information, information that they could get more easily another way anyhow.

by nathaniel on Jan 12, 2012 9:31 am • linkreport

@nathaniel -- the data is not good enough to catch every cheater, but knowing when somebody gets on a bike and seeing that nobody else got on one at the same time, enables one to figure out where the bicyclist went. Did s/he go straight downtown (i.e., the port near the metro station), or to that other person's apt (i.e., a 10 minute trip took 2 hours)?

There are lots of divorces and usually child custody and significant money is in play. Divorce lawyers do very, very well in this business. So what makes you think that this is less than 1%?

by goldfish on Jan 12, 2012 9:40 am • linkreport

@goldfish, @Lance:
Except that the system is complex enough that it would be near impossible to get proof-positive.

So for example, you notice your hubby leave the apartment at 8:05. It's a 4 minute walk to the nearest bikeshare station, so he should check out a bike at about 8:09.

Let's say just one bike is checked out between 8:07 and 8:11. But that bike doesn't go to your significant other's office. There you go. You knew it! He is having an affair! Bastard!

Except that what really happened was that your husband stopped off at the corner 7-Eleven to pick up some Tylenol and didn't get a bike until 8:12.

Or, as luck would have it, your husband got to the bike kiosk a few seconds after the person in front of him, and that young lady took the last bike and went to her office. Instead, he had to walk to a different kiosk.

Or, drat, your husband realized that his morning staff meeting got moved up to 8:30, and what do you know, here comes the limited-stop Metrobus right now.

On the other hand, perhaps your husband was smart and checked SpotCycle and realized the three stations closest to his office were full, so he had to bike to a different one and walk (or transit) over to his office.

Of course, if your husband were really smart, he'd add some confounding variables. He'd start biking toward his office, and then turn his bike in at a kiosk on the way. And then take out another bike. One more bike change, and it will be impossible to figure out where he went.

And of course, if there were any more than 1 bike checked out within the time period you think your husband would have done it, you'd never be able to confirm which trip was his. Especially if (like Lance posits) you don't know the endpoint to begin with.

by Matt Johnson on Jan 12, 2012 9:42 am • linkreport

@MLD: doesn't have anything identifying an individual, you can't track a person from one trip to the next.

This minimizes the risks created when someone has offline knowledge to identify the individual. Much more worrisome than the hypothetical Lance has put out there involving a loved one is the risk of stranger/acquaintance stalking based on this information.

For example, the guy who waits at the bus stop across from your office every day, or twice a week, at the same time you're leaving work and admires your iPhone or sees that you're always carrying a laptop case. Checking this data, it won't be hard to see the recurring destination pattern to intercept you sometime for a mugging. And substitute "legs" for "iPhone" and the risks are even worse.

It's easy to become paranoid contemplating crime risks, and given how unaware most of us are as we bumble through the Metro buried in our earbuds and a copy of the Expressaminer, a determined malfeasor could probably follow many people home anyway. But this data does make something previously very difficult much more possible -- tying together one portion of a person's travel pattern that you might see regularly enough to note with their regular destination. And in many cases, the origin and destination will provide the most likely route that person will take as well.

by Arl Fan on Jan 12, 2012 9:45 am • linkreport

So someone who wishes to do me harm will now have a marginally easier time to track me? This is a risk I'm willing to take. If someone is taking time to notice when I get on a bike or leave my apartment then their ability to make some assumptions from a computer program isn't the thing that bothers me.

by Canaan on Jan 12, 2012 10:02 am • linkreport

@Canaan -- Agreed, especially if this data is being released quarterly, at which point it's dated at best.

If there was a member identifier attached to the data being used, I'd be concerned, but as is, the burden of proof seems unreachable, and the suspicion (or other efforts) that would have to be exercised in order to get to this point makes the marginal utility almost nil.

by Jacques on Jan 12, 2012 10:06 am • linkreport

It's easy to become paranoid contemplating crime risks

Exactly, that's exactly what it is, paranoia. Seriously, a criminal is going to track your movements and then go look at CaBi data to figure out where you're going so they can take your iPhone? Please.

For every ridiculous hypothetical there are 10 easier ways someone could do the same thing to you. If you are so worried about crimes that these cockamamie theoretical stalkings scare you, I suggest you never leave your house. You're probably more likely to be hit by a car or have an asteroid come down on you than have some mugger scanning CaBi spreadsheets to track where you ride a bike.

by MLD on Jan 12, 2012 10:09 am • linkreport

Yeah I don't think it would require a lot of effort to eventually tie a trip, or set of trips, to an individual. It would involve surveillance, which is easy enough to do in public; and time, to eliminate the noise of other trips and riders.

But what does that tell you? Past trips are no guarantee of future trips. The data only show O&D with no indication as to stops along the way. That will change when GPS units are introduced.

Personally, I would love to have GPS units in the bikes to see where they are ridden, which bike lane facilities are used, how riders (anonymised) build trips from one destination to another. e.g. Who's using 15th Street? Pennsylvania Avenue? What's the best bike route from Benning Road to Downtown? etc etc.

But I see the potential risks in making that information available.

by Jack Love on Jan 12, 2012 10:22 am • linkreport

Could we solve this privacy issue by "fuzzing" each departure time by 0-5 minutes?

by andrew on Jan 12, 2012 10:35 am • linkreport

The average trip is 2 hours, 48 minutes.

So, I assume these people are not actually doing a 3 hour marathon bike ride but actually stopping places like maybe a museum. If so, what do they do with their bike?

I've often wanted to use bikeshare to run an errand some place with no nearby docking stations and wondered if it's safe to bring my own lock and lockup the bikeshare bike while I'm inside. It's $1000 if the bike gets stolen, so I haven't risked it but maybe it's safe because nobody would steal a bikeshare bike. Thoughts?

by Falls Church on Jan 12, 2012 10:36 am • linkreport

Personally Identifiable Information

The Office of Management and Budget (OMB) Guidance for the implementation of the Confidential Information Protection and Statistical Efficiency Act of 2002 and OMB Memorandum 07-16, Safeguarding Against and Responding to the Breach of Personally Identifiable Information, both state that “The term ‘personally identifiable information’ refers to information that can be used to distinguish or trace an individual’s identity, such as their name, Social Security Number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother’s maiden name, etc.”

If you can use this data to help facilitate this in any way, then the authorities releasing this data are in violation of some federal rule somewhere. I dunno ... does CaBi get any funding that originates with the feds? Only if it doesn't could it claim to not be in violation of this rule (or a variation of it dependent on the federal agency providing the funding.)

Bottom line whether we agree or disagree that this type of personal information can be obtained in other ways, or whether (as some have suggested) that we personally feel the trade offs make this risk 'worth it', the government has ruled that this type of information may not be released to individuals without a 'need to know'. And definitely not be broadcast as is being proposed here.

by Lance on Jan 12, 2012 10:50 am • linkreport

@ Jack But what does that tell you? Past trips are no guarantee of future trips.

Absolutely. But a pattern of past trips -- particularly for a system which facilitates many daily commutes -- can create a high probability of a future trip, the same way that a burglar's observation that someone leaves their apartment every Sunday at 10:00 in a suit and returns at 12:30 establishes a high probability the pattern will recur.

Data suggests that CaBi has greatly helped bring the male/female gender ratio in bike ridership in DC closer to equality (not yet there by a long way, but a real improvement). I think there's a real safety concern here that could undermine that: there are a lot of women who would be very uncomfortable if you told them that someone could spot them checking out a CaBi and find out exactly which station they rode to -- even if it's not until a month from now. Particularly if they are daily users.

I strongly suggest that the exact-time data be replaced with something more general -- break the day into chunks as "morning/afternoon/evening/overnight." Even just zeroing out the minutes and providing only the hour of checkout/return would greatly increase the level of privacy, without hugely damaging the usefulness of the data.

by Arl Fan on Jan 12, 2012 10:52 am • linkreport
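
The two time-coarsening ideas raised in this thread, zeroing out the minutes and shifting each departure by a random 0 to 5 minutes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the timestamp format is guessed from rows quoted earlier in the comments (such as "11/16/2011 18:32"), the function names are hypothetical, and nothing here reflects how Capital Bikeshare actually prepares its files.

```python
import random
from datetime import datetime, timedelta

# ASSUMED timestamp format, taken from rows quoted in the comments
TIME_FORMAT = "%m/%d/%Y %H:%M"


def truncate_to_hour(stamp):
    """Zero out the minutes so only the hour of checkout or return remains."""
    moment = datetime.strptime(stamp, TIME_FORMAT)
    return moment.replace(minute=0).strftime(TIME_FORMAT)


def fuzz_minutes(stamp, max_minutes=5, rng=random):
    """Shift a timestamp later by a random whole number of minutes (0..max_minutes)."""
    moment = datetime.strptime(stamp, TIME_FORMAT)
    offset = rng.randint(0, max_minutes)  # randint includes both endpoints
    return (moment + timedelta(minutes=offset)).strftime(TIME_FORMAT)


if __name__ == "__main__":
    original = "11/16/2011 18:32"
    print("original: ", original)
    print("truncated:", truncate_to_hour(original))
    print("fuzzed:   ", fuzz_minutes(original))
```

Truncating to the hour throws away more detail than fuzzing does, so the choice depends on how much timing precision an analysis needs.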

@Lance

Except the CaBi data contains no "personally identifiable information." There's no member ID involved, no SSN, no name. It's just a list of date/time, origin/destination, and the bicycle used.

by MLD on Jan 12, 2012 11:02 am • linkreport

@Lance --
Please let me know what part of the information that CaBi released would reasonably fit into the definition that you provided?

“The term ‘personally identifiable information’ refers to information that can be used to distinguish or trace an individual’s identity, such as their name, Social Security Number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother’s maiden name, etc.”

There is no user-specific information whatsoever in the data that has been released. Unless there is a station that has been used by only one person in the entire quarter, there is no reasonable way to tell who is riding which bikes.

(CaBi likely has that information, and in the event of a criminal investigation, it's likely that such information could be used, but it's not in the data released).

by Jacques on Jan 12, 2012 11:14 am • linkreport

This is likely a silly question, one I believe I've asked and has been answered before.

If you rent a bike and decide to go into a store, what do you do with the bike? I saw a guy @the CabiStation @Cong Heights y'day who wanted to ride to Giant but wasn't sure if he was just supposed to leave the bike outside. I told him that I thought so and both of us laughed.

by HogWash on Jan 12, 2012 11:19 am • linkreport

@MLD and Jacques

alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual

hint: as part of my job I've had to certify that I understand what this phrase means. And for those doubting that links can be made with only the most tenuous data out there, Google my name and [Deleted for violating the comment policy.]

by Lance on Jan 12, 2012 11:22 am • linkreport

@Lance: I believe the protection of 'personally identifiable information' is to prevent identity theft. I don't think the Cabi data has any PII.

Otoh the information is good enough to catch someone in a lie. It would require some work to do this, and nobody will put forth such effort unless the consequences are serious enough. Possibilities include a divorce case or some civil liability. Definitely privacy is violated; it is like getting the list of the movies someone has watched.

by goldfish on Jan 12, 2012 11:29 am • linkreport

OK Lance, so tell us, out of the following columns that the files contain, which piece of information is a piece of "personal or identifying information... such as date and place of birth, mother’s maiden name, etc."?

Trip Duration
Start Date/Time
End Date/Time
Start Station
End Station
Bike #
Member Type (Annual/Casual)

None of those are personal or can be linked to personal information like a birth date, etc.

If you SAW someone pick out a bike and somehow had your watch synced to the CaBi time clock you might possibly be able to figure out which trip you're looking at. But beyond that you still don't have any information that helps you figure out WHO that person is.

You can't use the database to figure out one user's trips; you can only use it to figure out something that you've already seen and already tracked in a different way (by watching them).

by MLD on Jan 12, 2012 11:33 am • linkreport

I'm glad that there are so many people concerned with privacy. Privacy implications were given serious consideration during the process of opening the data.

With the exception of registered vs. casual user, all the data pertains to bikes, not people. There is nothing "personally identifiable" about it.

CaBi bikes are ridden in public. A hypothetical malefactor could find out which station a CaBi user is riding to by following the person on the street, in a car, on a bike, or even on foot. A hypothetical burglar could watch a person leave their house, from any of 100 public places.

The value to a hypothetical highly technology-savvy malefactor of being able to spot someone checking out a CaBi, write down the time so they know it's them and not someone else, and 3 months later "find out exactly which station they rode to" (and then what?) is very low.

Andrew's and Arl Fan's suggestion to make times less precise is reasonable, but it may decrease the utility of the data. Part of the value of having this data is in understanding how stations empty and fill up, and this can occur on time scales of seconds or minutes.

by Matt Caywood on Jan 12, 2012 11:45 am • linkreport

@Lance,

I redact government documents for a living. There's absolutely no way that a court would ever deem the information CaBi is releasing "personally identifiable information".

Besides, no one riding a shared bicycle in a public right of way in broad daylight has an expectation of privacy... at all! That's like me posting a comment on a blog, using my own name, at 10:47 AM on a Thursday, and then claiming "but you had no right to know that I was reading GGW at work!" (I'm on break right now, btw)

If your wife sees you get on a CaBi bike at 9:05 AM, she can easily hail a cab (or jump on another bike) and follow you to your destination. CaBi isn't a private affair.

by Steven Harrell on Jan 12, 2012 11:47 am • linkreport

@goldfish, good point

by Lance on Jan 12, 2012 11:55 am • linkreport

Like @Lance and @goldfish, I too am extremely concerned about these highly relevant privacy concerns. And my personal antipathy towards Capital Bikeshare has nothing whatsoever to do with my worries. Perhaps we should dismantle the system and fire everyone who was ever involved in its existence, just to be sure.

by oboe on Jan 12, 2012 12:03 pm • linkreport

@oboe:
Is firing them a strong enough action?

I mean, what if one of them remembers that time bike #65712 took a trip from Dupont to Columbia Heights at 10:56:14 on December 12, 2011?

by Matt Johnson on Jan 12, 2012 12:08 pm • linkreport

Get out the Neuralyzer.

by David C on Jan 12, 2012 12:17 pm • linkreport

@goldfish: it is like getting the list of the movies someone has watched.

It is more like getting a list of all movies that were watched by all persons with start time and end times.
-------------

When I sat down with the CaBi Board, along with Matt Caywood & Mobility Lab, we discussed privacy. We originally requested they include a hashed unique user ID, but it was not included in the files, I believe due to privacy concerns.

I'd also like to point out that London released the same trip data. Has anyone cited the data being used for the reasons some have mentioned above?

Being concerned about privacy is important, but I think we have to be realistic. There shouldn't be any assumption of privacy while riding a CaBi bike in public. This data is about the system, not the people. In fact, there is nothing personally identifiable in the files.

by Daniel Gohlke on Jan 12, 2012 12:24 pm • linkreport

As a daily bikeshare rider, I have absolutely no qualms about this data being released. As a privacy concern, it ranks way way way down there.

by MrTinDC on Jan 12, 2012 12:28 pm • linkreport

I'll make an analogy with credit card data. Suppose someone posted the following information on all credit card transactions in DC over the past three months:

Date/Time
Location
Vendor name
Amount

but without the card name and account number, i.e., no 'personally identifiable information'.

No doubt many people would find the information interesting and useful. Otoh one could easily use that data to piece together what someone else is doing, if the investigator has a way to link the subject to a given transaction.

Is privacy violated? you bet.

by goldfish on Jan 12, 2012 12:29 pm • linkreport

Otoh one could easily use that data to piece together what someone else is doing, if the investigator has a way to link the subject to a given transaction.

Except you can't connect one instance to another, unless you are witness to every single one of the transactions. In which case, you've already been following the person around recording what they're doing.

by MLD on Jan 12, 2012 12:34 pm • linkreport

@Daniel Gohlke: There shouldn't be any assumption of privacy while riding a CaBi bike in public. I couldn't disagree more. When you are out on your own, you have a reasonable expectation that no one is tracking your movements. Therefore, what you are doing is "private". This very question is currently under consideration by the Supreme Court. I do not have much faith that they will do the right thing.

by goldfish on Jan 12, 2012 12:42 pm • linkreport

When you are out on your own, you have a reasonable expectation that no one is tracking your movements.

Correct, there should be an expectation that people cannot automatically track you.

Luckily, this data doesn't allow you to do that!

by MLD on Jan 12, 2012 12:44 pm • linkreport

@MLD, you need to think more like a divorce lawyer. In custody cases what they do is hire someone to plant and conceal a GPS on a subject's car. Two weeks later, the lawyer has everything that is needed to pressure the subject to settle the case. It gets really fun when the subject's iphone is hacked.

Obviously the Cabi data is not that good, but it is good enough to show if someone is lying over some short period.

by goldfish on Jan 12, 2012 1:00 pm • linkreport

I don't want to put words in Daniel's mouth, but I think what he meant to say is that it's not reasonable for people to assume they have absolute privacy when they're doing something *in public* like riding a bike.

We absolutely believe it's still worth considering people's concerns about whether a database of data about bikes has a meaningful effect on personal privacy. Currently CaBi is addressing these concerns by making the data be entirely about bikes, not about people, and by delaying data release.

London's bikeshare data has been out for nearly a year, as Daniel points out, and so far the results are in: awesome visualizations, useful code, and zero tabloid scare stories about privacy compromises.

by Matt Caywood on Jan 12, 2012 1:09 pm • linkreport

@Daniel "This data is about the system, not the people. In fact, there is nothing personally identifiable in the files."

Goldfish makes a good point that this may not come under the scope of what the PII laws are striving to prevent (i.e., identity theft), but if you substitute 'protecting privacy' for 'protecting one's identity from theft', there are parallels between the two. And the one I have been trying to make understood today is that non-primary data can be used to derive primary data ... when one is able to 'connect the lines' .... So while nothing being disseminated is apparently personally identifiable, if it can be used in conjunction with other data to identify an individual and their normal patterns of travel (or even their one-off travel in the past), then that is information which needs to be shielded from release. And I think the others on here have done a pretty good job of explaining that what CaBi is releasing can indeed be used to identify an individual's past trips or pattern of trips.

As others have offered, releasing this data at an aggregated level (e.g., departures per quarter hour at a specific station) would solve this problem. Additionally, just throwing uninterpreted (fully foot-noted) data out there serves no one's interests. It's as bad as when that guy a while back was tweeting cycling and pedestrian incidents in the District. Anyone with the money to buy a calculator could 'derive' just about any conclusion they wanted from the data ... which made it basically valueless. I suspect you're going to see the same problem here with respect to the CaBi data. And at possibly great cost to some individuals who will be paying for this with their privacy (in the best case).

by Lance on Jan 12, 2012 1:11 pm • linkreport

@ goldfish: Obviously the Cabi data is not that good, but it is good enough to show if someone is lying over some short period.

Of course it's not. The only way you could connect a given individual to a trip record would be to also observe that individual, or track that individual with a GPS. In that case, you wouldn't need the CaBi data. I suppose you'll say "if you record the time they leave on a date and then look at the report for when that trip terminates and no one else left at that time, you've got em!", in which case, great work divorce lawyers on waiting 3 months for something you could have done instantly with $20 in cab fare. And, of course, which would have been completely legal in the first place, regardless of the pending decision to which you point, given that it has to do with automatic tracking via GPS. So...what's the kerfuffle?

by worthing on Jan 12, 2012 1:13 pm • linkreport

I'm pretty sure OMB guidance wouldn't control what CABI does.

by charlie on Jan 12, 2012 1:13 pm • linkreport

@Lance

Ahh yes, data is worthless unless someone else interprets it for you, that's why there have never been any studies done except for those involving original research!

by MLD on Jan 12, 2012 1:21 pm

@goldfish

United States v. Jones deals, specifically, with requiring a warrant for the use of GPS tracking devices over an extended period. No one is questioning the legality of allowing police to follow a suspect. The law is quite clear: in a public place, a person has a very limited expectation of privacy.

by David R. on Jan 12, 2012 1:22 pm

@worthing: the principle at stake, and also that this is a slippery slope. Consider my cc example.

by goldfish on Jan 12, 2012 1:22 pm

@David R: yes we are all familiar with the government's argument in this case. What this does not consider is that government's authority to follow a suspect has until lately been extremely limited by the cost of doing so. With the new technology, the cost of surveillance has decreased by orders of magnitude, which makes 'fishing expeditions' viable. These are the sorts of personal rights that people have died to protect.

by goldfish on Jan 12, 2012 1:28 pm

Goldfish, I'm 100% with you on warrantless and automatic surveillance being very significant "uh-ohs" concerning the overall right to privacy and the general unpleasantness of Big Brother (the concept, not the show, though I'll take arguments for that also being unpleasant). It hits far too close to "well, if you have nothing to hide, you shouldn't have a problem with this, right?" However, I disagree that this CaBi data is positioned on that slippery slope; there is nothing automated here.

by worthing on Jan 12, 2012 1:38 pm

@worthing: the slippery slope is the GPS data that has been promised to be added later.

by goldfish on Jan 12, 2012 1:41 pm

Getting away from the theoretical arguments, there are some awkward entries in the files that were dumped.

1) There are 18 records that look like this:
Duration-Start date-End date-Start station-End station-Bike#-Member Type
0h 10min. 42sec.-11/7/2011 6:46:00 PM-11/7/2011 6:57:00 PM-Calvert St & Woodley Pl NW (31106)-W00877-Billed-20010

I'm not sure what they mean, but they're missing an end station and member type and have two fields appended. One has the value "Billed", and I'm not sure what it represents, but the last one is obviously a zip code (the other records show this). I'm not sure if this is an issue with the integrity of their database or if the process they use to dump the data failed on these 18 records. Either way they shouldn't have slipped through.

2) I really don't agree with including the White House station in this data. At all. I think it needs to be expunged like yesterday. On a less important scale, I'm not sure why they included the Alta warehouse stations.

by Corey H. on Jan 12, 2012 1:51 pm
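
Editor's sketch: records like the one Corey H. quotes can be flagged with a simple layout check. In the quoted record the field count still matches the header, but the values are shifted, so a bike number sits where the end station belongs and "Billed" sits in the Bike# column. The sketch below assumes rows have already been split into fields and that bike numbers always look like "W00877" (a "W" followed by digits); that pattern, the function name, and the well-formed row (including its "Registered" value and station labels) are assumptions for illustration, not documented properties of the files.

```python
import re

# Column order taken from the header line quoted in the comment above.
COLUMNS = ["Duration", "Start date", "End date", "Start station",
           "End station", "Bike#", "Member Type"]

# Assumption: bike numbers look like "W00877" (a "W" followed by digits).
BIKE_PATTERN = re.compile(r"W\d+")


def is_suspect(row):
    """Return True when a parsed row does not fit the expected layout."""
    if len(row) != len(COLUMNS):
        return True
    record = dict(zip(COLUMNS, row))
    # A shifted row puts something other than a bike number in this column.
    return BIKE_PATTERN.fullmatch(record["Bike#"]) is None


# Made-up well-formed row (station labels and member type are hypothetical).
good_row = ["0h 10min. 42sec.", "11/7/2011 6:46:00 PM", "11/7/2011 6:57:00 PM",
            "Station A (00001)", "Station B (00002)", "W00877", "Registered"]

# The malformed record quoted in the comment, split into its fields.
bad_row = ["0h 10min. 42sec.", "11/7/2011 6:46:00 PM", "11/7/2011 6:57:00 PM",
           "Calvert St & Woodley Pl NW (31106)", "W00877", "Billed", "20010"]

print(is_suspect(good_row))
print(is_suspect(bad_row))
```

Checking a column's content rather than only counting fields matters here, because the 18 shifted records would pass a field-count test.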

@goldfish: Who promised GPS data, and what data was promised, exactly? These CaBi bikes don't have GPS capability. Even if it were added in the future, that doesn't mean its data would get released with the trip data.

by Daniel Gohlke on Jan 12, 2012 1:56 pm

@Daniel Gohlke: ok you are right, Cabi has not promised GPS data. OTOH others upthread have suggested it, and it would be very easy to implement. And I am sure that many people would find such information interesting and useful.

by goldfish on Jan 12, 2012 1:59 pm

On the issue of whether you can pinpoint an individual's usage with the data: I am 100% certain I can tell you exactly where the guy who won the winter weather warrior contest was on February 22, 2011 between 7:30 AM and 11:30 PM. Sure, it's his "fault" because he used the same bike all day, but his infamous go-to-every-station ride is there in all its glory.

by Corey H. on Jan 12, 2012 1:59 pm

@goldfish

You wrote that:

When you are out on your own, you have a reasonable expectation that no one is tracking your movements. Therefore, what you are doing is "private". This very question is currently under consideration by the Supreme Court.
And you linked to United States v. Jones as evidence. But that's not what the case is about. People do not have an expectation of privacy in public; the argument in Jones is that the exceptional length and scope of the surveillance required a warrant.

As others have emphasized, tracking a person's movements with CaBi data would require either access to CaBi's financial/user records, or in-person surveillance, and the latter would be an impracticably inefficient and superfluous measure.

by David R. on Jan 12, 2012 3:48 pm

@Corey H: You illustrate an excellent point, and in my opinion what this whole matter boils down to in simple terms. The data shows that some guy hit every single station using the same bike during bad weather to win the Winter Warrior contest. However, without CaBi announcing the winner and his marathon ride that led to his win, you wouldn't be able to connect his trips from this data to him personally.

There is nothing personally identifiable that is wholly contained within this data. Lance/goldfish: Would you agree? However, you could use this data along with additional data (like CaBi's announcement of the winner) to make connections, like that the aforementioned guy was at x station at x time during his winning ride.

Lance/goldfish: Is your argument that simply being able to make that connection between this data and an external source (whether by observing them or through an announcement like this one) is a bad thing? Can you further elaborate on why?

by Daniel Gohlke on Jan 12, 2012 4:04 pm

@Daniel Gohlke: Is your argument that simply being able to make that connection between this data and an external source (whether by observing them or through an announcement like this one) is a bad thing? Can you further elaborate on why?

The reason that I've mentioned behavior patterns a couple of times is that a major trigger for opportunistic crimes -- of both property and violence -- is the casual observation of a desired target, whether an iPhone in use on the Metro, an SUV sitting in a parking lot, a person walking home alone in the dark. Being able to link a person that you've observed in public to a pattern that ends in a geographically distant spot, so that you can skip the step of physically following them for that distance is a huge step towards being able to target that person so that you can stalk them or commit a crime against them. Yes, it's possible to physically follow people to their destination. But it's hard, and it leaves the perpetrator vulnerable to being spotted and recognized, especially if done repeatedly. A lot of CaBi rides are part of a repeated pattern, and even now (and increasingly in the future) end (especially in the case of an evening ride home) in a location darker and with more isolation and less visibility than where they start. Releasing this data is making some people more vulnerable than we should be comfortable with. Being mugged can be an incredibly traumatic experience, to say nothing of the risk of more serious crimes.

by Arl Fan on Jan 12, 2012 7:04 pm

So someone's going to commit an opportunistic crime by observing someone at a CaBi station for days on end (taking notes on the times they arrive and the date of course!), THEN wait three months for the data to be released, THEN comb through and find the patterns of travel so they can wait, in hiding at the person's destination, and THEN steal their iPhone?

Seriously?

by MLD on Jan 12, 2012 10:28 pm

MLD, Some people like the seduction of the long theft.

by David C on Jan 12, 2012 10:42 pm

@Daniel Gohlke: taken alone, there is nothing personally identifiable in the data. However, data never exists alone, and connecting the dots can be obvious. The most trivial knowledge of someone's schedule (e.g., when your neighbor leaves for work in the morning), when combined with the cabi data, makes it possible to track someone's movements. I agree with Lance, the data should be 'fuzzed' or (say) grouped in 15 minute increments, to reduce the possibility of tracking someone.

An example of deriving important intelligence from seemingly harmless data: When a lot of late-night pizzas are delivered to military logistics planners, you know something is up. So all Syria needs to do is track when and where pizzas are delivered in Arlington.

David R: I disagree with your position in the strongest possible terms. When in public, people have an expectation that their movements are not tracked, and this is connected directly to personal privacy.

by goldfish on Jan 13, 2012 10:18 am

@David R: more thoughts on privacy of movement (for lack of a better term) --

Where are you going to be tonight at 9PM?

You will probably answer, "none of your business" because in fact, it is not my business to know where you are going.

That is privacy, and that is the privacy that is violated by knowing where a person is going without that person telling you voluntarily.

by goldfish on Jan 13, 2012 10:28 am

@goldfish:
However, David R has no expectation of privacy if he is in a public space at 9PM. I don't know where he is going to be at 9PM, and neither does anyone else.

But if I happened to see him in the Starbucks at 13th and U at 9PM, I'll know where he was. That is not a violation of his privacy.

There is no personally identifiable information released with the CaBi data. And the only way you could figure out where/when someone was some place is by actually following them. Not by using the data.

by Matt Johnson on Jan 13, 2012 10:38 am

@MJ: You are confusing "being in public" with "knowledge of one's movements". They are not the same. The Cabi data reveals the latter.

by goldfish on Jan 13, 2012 10:40 am

@goldfish:
In that case, I would like you to take a few moments to look into the CaBi data. Please tell me when/where my next CaBi trip will take place.

by Matt Johnson on Jan 13, 2012 10:42 am

@MJ: I happen to have met a good friend at the Lincoln Memorial last week. We chatted, but when we parted company I asked her where she was going. She deflected the question.

by goldfish on Jan 13, 2012 10:45 am

@goldfish:
Please tell me when/where my next CaBi trip will take place.

by Matt Johnson on Jan 13, 2012 10:47 am

@MJ: are you purposefully ignoring my point? Knowledge of someone in a public place at a particular time does not tell you where they are going or what their plans are. That is only revealed voluntarily. This is the very essence of privacy and/or freedom of movement, and this right is asserted every day when someone tells someone else that where they are going is a private affair and is 'none of your business.'

To answer your question, may I have your credit card records? Or maybe I should attach a GPS to your car. Oh wait! I could get that information by observing when you check out the cabi bicycle, and then comb through the cabi records to see where you went.

by goldfish on Jan 13, 2012 10:58 am

@goldfish:
Knowledge of someone in a public place at a particular time does not tell you where they are going or what their plans are.
Exactly. You cannot tell me where I am going to go in the future unless I tell you. Therefore, the release of CaBi records (especially without a name attached) will not enable you to do that.
Oh wait! I could get that information by observing when you check out the cabi bicycle, and then comb through the cabi records to see where you went.
My point exactly. Unless you actually make an effort to observe me in public, you cannot track my movements using CaBi data.

And if you are making an effort to observe me in public, you don't need the CaBi data to track my movements.

With respect to your idea that privacy is essential to the freedom of movement, it's baloney.

Let me give an example. You're standing on the sidewalk outside a bank. You hear gunshots and screaming. A man runs out of the building pulling off a mask and carrying a big sack with dollar signs written on it.

You take his picture with your cell phone as he runs by, and a few seconds later show it to a police officer. You tell him, "he went that way, officer!".

Later, in court, you're asked by the prosecutor to identify the man you saw escaping from the bank. You say, "That's him. The defendant."

But the judge says, "I'm sorry. I can't allow that identification. The defendant did not tell anyone that he was going to the bank to rob it, therefore, anything he did in public during that time is not observable. It's private. He chose to keep it private by not telling anyone."

See? That's not how being in public works. You have every right not to tell anyone where you're going to go. You have every right not to tell anyone where you've been or where you are.

But if someone sees you there, they are not legally or morally precluded from saying they saw you.

And the fact that someone could see you going somewhere in public and tell someone they saw you doing that does not violate your privacy and it does not impede your ability or right to movement.

by Matt Johnson on Jan 13, 2012 11:13 am

@MJ, your bank robbery example is not on point. Yes any witness can tell anyone else where someone else was and what that other person was doing. That is not private. But the witness does not know where the bank robbers went or what they did with the money -- that is private. If they jumped on a Cabi bicycle and rode to Bethesda, such information should only be revealed with a warrant, which only occurs when there is probable cause that a crime has been committed (in this case, that is true).

by goldfish on Jan 13, 2012 11:21 am

@goldfish:
Correct. The witness does not know where the robbers went. But since they were in public, they could have been observed the entire time (maybe they got on a bus). If they did take out a CaBi, you still couldn't necessarily know where they went. You as a private citizen could wait three months and then try to figure it out, but good luck.

Anyway the police wouldn't have any trouble getting a warrant for the CaBi data in that instance.

by Matt Johnson on Jan 13, 2012 11:31 am

@goldfish

Except the data is released months later, making the connection basically useless. It also doesn't let you create a pattern of someone's movement, because you can't connect any one trip to another trip. It's not like you can just pop on the site and immediately see where someone went.

Also, your neighbor example is bogus. Knowing "when your neighbor leaves for work" isn't enough to track the trip; in order to pick the right trip from the data you have to know exactly when they picked up the bike from the station in order to identify their trip.

The idea that there is a danger that someone could take observations of you at bike stations and then try to pick out your endpoints months later is ridiculous. I suppose it's POSSIBLE, but it sure sounds like a lot more work than just f-ing following someone if you want to.

I think in most of our minds that potential danger doesn't begin to outweigh the usefulness of the data. "Fuzzying" times by 15 minutes would hurt the integrity - if someone wants to develop a model for looking at patterns of docks emptying/filling then 15 minute intervals isn't a good enough resolution.

by MLD on Jan 13, 2012 11:36 am

@MLD: let's say I happen to live across the street from a cabi station. I can see exactly when everybody takes out a bike. With the cabi data I now know where my neighbors have gone. I am not the snooping type, but some are and such people can now easily figure out what is not their business. Before the cabi data was available, they could not without spending many $k on a private investigator.

by goldfish on Jan 13, 2012 11:58 am

At the same time, though, goldfish, you can't tell where (specifically) any of those neighbors went. Only where they parked the bikeshare bike that you saw them take out. From there, they could have easily walked several blocks in any direction, gotten onto a bus or metro, or taken out another cabi bike.

You know as much about where that person was going as you know if you're sitting on a metro train and watch somebody get off the train at a certain station. Unless you follow them from that point, you only know that they got off the train in neighborhood x.

by Jacques on Jan 13, 2012 12:18 pm

@Jacques: the information provided by the cabi data, of where my neighbors went, is a new incremental loss of what was before one's personal business. There is the slippery slope here, as further losses of privacy become acceptable -- such as GPS cabi data. (I dare suggest that installing GPS on cabi bicycles will happen sometime soon, as I am sure cabi has every right to know where its bicycles are.)

This represents a small but nevertheless undeniable loss of privacy. It is enough to catch someone in a lie about where they were going. For example, this information could be used to detect if one's spouse is having an extramarital affair. Or that snoopy neighbor could use damning conclusions partly based on the cabi data to blackmail someone. Etc.

by goldfish on Jan 13, 2012 12:55 pm

@goldfish,

Okay, okay. You've beaten the horse to death, then beaten it some more. And you managed to convince @Lance. Let it go, man!

:)

by oboe on Jan 13, 2012 1:09 pm

As part of their "Visualization Challenge," Boston's MBTA just released bikeshare data from October 9 and 11, 2011.

Another step down the slippery slope... /sarcasm

by Matt Caywood on Jan 13, 2012 5:47 pm

Adding to Lance's comments about Personally Identifiable Information, it's the "when combined" part of the definition that's the key. Look at the Netflix Prize controversy -- data that alone does not identify individuals can easily be correlated with other publicly available data to identify individuals with high probability. No stalking required.

This is precisely why, for a biometrics project I worked on, our fingerprint database was treated as PII, even though it was anonymized. The reality is that releasing "anonymized" data can easily make the difference between whether an adversary does or does not have the information they need to violate your privacy. If the data is not released, it won't enable such an outcome.

The problem isn't even limited to anonymized individual data. Even aggregate data, depending on the scale and type of aggregation, can easily be correlated to individuals. It's called "de-anonymizing" and you will find many interesting papers on the subject by searching on that term.

by HPC on Jan 15, 2012 2:17 pm
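
Editor's sketch: the "when combined" point, and MLD's earlier objection that you would need to know exactly when a bike was picked up, can both be made concrete by counting how many trip records are consistent with an outside observation. The fewer the candidates, the more an observation pins down a single record. Everything below is made-up toy data; the dictionary keys, station names, and function name are assumptions for illustration and do not reflect the real file layout.

```python
from datetime import datetime, timedelta

# Timestamp layout as it appears in the sample record quoted in this thread.
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def matching_trips(trips, station, observed, tolerance_minutes):
    """Return trips starting at `station` within the tolerance of `observed`."""
    seen = datetime.strptime(observed, TIME_FORMAT)
    window = timedelta(minutes=tolerance_minutes)
    return [
        trip for trip in trips
        if trip["start_station"] == station
        and abs(datetime.strptime(trip["start"], TIME_FORMAT) - seen) <= window
    ]


# Made-up trips for illustration only.
trips = [
    {"start": "11/7/2011 8:01:00 AM", "start_station": "Station A", "end_station": "Station B"},
    {"start": "11/7/2011 8:03:00 AM", "start_station": "Station A", "end_station": "Station C"},
    {"start": "11/7/2011 8:20:00 AM", "start_station": "Station A", "end_station": "Station D"},
    {"start": "11/7/2011 8:02:00 AM", "start_station": "Station E", "end_station": "Station B"},
]

# An observer who noted the exact minute versus one who only knows "around 8".
exact = matching_trips(trips, "Station A", "11/7/2011 8:01:00 AM", 0)
rough = matching_trips(trips, "Station A", "11/7/2011 8:00:00 AM", 30)
print(len(exact), len(rough))
```

In this toy data the exact observation leaves one candidate while the rough one leaves three. Coarser published timestamps would enlarge the candidate set in the same way, at the cost of the resolution MLD wants for dock modeling.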

Perhaps Lance and goldfish should just stop using CaBi to rendezvous with each other for their affair.

by Jon on Jan 16, 2012 8:23 am
