Bicycling
Capital Bikeshare releases anonymous trip data
Programmers or analysts interested in studying Capital Bikeshare patterns or creating useful apps can now do a lot more. Capital Bikeshare has followed through on its promise and posted data files with individual (but anonymous) trip data.
The files, one for each quarter going back to late 2010, list individual trips, including the time each started and ended, duration, which station it started and ended at, and an identifying number for the individual bike. It doesn't say anything about the member who used the bike, except whether they are a "registered" (annual or monthly) member or a "casual" member (daily or 3- or 5-day).
Now, people can generate tables or graphics showing the most popular station pairs, or where people most often go from an individual station, or what weather patterns make usage heavier or lighter, or where the nighttime activity is, and much more.
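As a rough illustration of the first kind of table mentioned above, here is a minimal Python sketch that tallies the most popular start/end station pairs. It assumes the quarterly files are comma-separated with a header row and that the station columns are called `Start station` and `End station`. Those column names, the `top_station_pairs` helper, and the sample trips are hypothetical (the station names are borrowed from records quoted in the comments), so check them against the real file headers.

```python
import csv
import io
from collections import Counter

def top_station_pairs(csv_text, n=3):
    """Return the n most common (start station, end station) pairs.

    Assumes hypothetical column names "Start station" and "End station";
    adjust them to match the headers in the actual CaBi files.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = Counter(
        (row["Start station"].strip(), row["End station"].strip())
        for row in reader
    )
    return pairs.most_common(n)

# Made-up sample trips, for illustration only.
SAMPLE = """Duration,Start date,End date,Start station,End station,Bike#,Member Type
0h 6min. 10sec.,11/7/2011 8:02,11/7/2011 8:08,21st & M St NW (31212),19th & E Street NW (31206),W00347,Registered
0h 7min. 2sec.,11/7/2011 8:15,11/7/2011 8:22,21st & M St NW (31212),19th & E Street NW (31206),W00662,Registered
0h 9min. 45sec.,11/7/2011 9:01,11/7/2011 9:10,Calvert St & Woodley Pl NW (31106),21st & M St NW (31212),W00877,Casual
"""

print(top_station_pairs(SAMPLE, n=1))
```

Filtering the rows to a single start station before counting gives the "where people most often go from an individual station" view.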
This data has been available for some time for London, allowing people to create animations of a day's CaBi usage and diagrams of a single bike's path over several days. The folks who built those and other tools can now even adapt their code to work for Capital Bikeshare, if they're so inclined.
Arlington, DC and Alta officials agreed in November to offer the data, after discussions with Tom Fairchild of the Mobility Lab, lab collaborator and advisor Matt Caywood, and CaBi Tracker creator Daniel Gohlke. (I am working on projects for the Mobility Lab as well, but was not involved in this specific discussion.)
To make it even easier to work with the data, Dylan Barlett imported the files into Google Fusion Tables, a tool that lets people easily sort, manipulate and visualize data.
If you put together an interesting analysis or visualization, please send it to us! We'd love to post interesting things you come up with using this or other open data.
Comments
This should be a good argument to extend into Alexandria and deeper into Arlington.
by Jack Love on Jan 11, 2012 3:37 pm
I imagine the cross-river traffic has only increased since.
by Jacques on Jan 11, 2012 3:47 pm
by JJJJJ on Jan 11, 2012 3:54 pm
by Rob P on Jan 11, 2012 4:02 pm
"23h 58min. 56sec.,11/16/2011 18:32,11/17/2011 18:31,Georgia & New Hampshire Ave NW (31400),19th & E Street NW (31206),W00347,Registered" ... Why?
"20h 58min. 42sec.,11/24/2011 17:53,11/25/2011 14:51,21st & M St NW (31212),21st & M St NW (31212),W00662,Casual" ... Probably didn't understand the rules.
by ZR on Jan 11, 2012 4:19 pm
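The duration field in the two records ZR quotes above is a string such as `23h 58min. 56sec.`. To sort or filter trips by length, it helps to convert it to seconds. A minimal sketch, assuming every record uses exactly this `Nh Nmin. Nsec.` pattern (anything else is rejected rather than guessed at); the function names and the 12-hour cutoff for an "unusually long" trip are hypothetical.

```python
import re

# Matches durations such as "23h 58min. 56sec." or "0h 10min. 42sec."
DURATION_RE = re.compile(r"^\s*(\d+)h\s+(\d+)min\.\s+(\d+)sec\.\s*$")

def duration_seconds(text):
    """Convert a duration string like '0h 10min. 42sec.' to whole seconds."""
    match = DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds

def is_long_trip(text, threshold_hours=12):
    """Flag unusually long trips; the 12-hour cutoff is an arbitrary choice."""
    return duration_seconds(text) >= threshold_hours * 3600

print(duration_seconds("23h 58min. 56sec."))  # 86336
print(is_long_trip("20h 58min. 42sec."))      # True
```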
1*) Downhill flow. The average trip's elevation change is -1.94 meters, or over 2,632 kilometers of elevation change in total. The average ride from Wisconsin and Macomb loses 55 meters in elevation. Fairfax Village has the highest start station:end station ratio (71 trips started, only 29 ended).
2) Last mile usage. The four most common one-way trips are Adams Mill/Columbia to Calvert/Woodley and back as well as Eastern Market Metro to Lincoln Park and back.
3) Tourists like to use it to sight-see. The 6th most common one-way trip is from the Smithsonian station back to the Smithsonian station, at 3,586 trips. The average such trip lasts 2 hours, 48 minutes. 76.1% of these trips generate usage fees. Broken down by member type, 86.0% of casual users' rides incurred fees, compared with 18.8% of members' rides.
4) Casual vs. members usage fees. 40.7% of casual rides incur fees. 0.3% of member rides incur fees.
*Using GPS elevation data, so all caveats apply. It also factors in only the station-to-station elevation change.
by Corey H. on Jan 11, 2012 4:50 pm
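Figures like Corey's start:end ratio (point 1) and fee shares (points 3 and 4) come down to simple counting. A minimal sketch of both, assuming trips have already been parsed into `(start_station, end_station)` pairs and `(member_type, duration_seconds)` pairs. The function names are hypothetical, and the 30-minute free period used to decide whether a trip "incurs fees" is an assumption, not something stated in the data files.

```python
from collections import Counter

def start_end_ratios(trips):
    """Map each station to (starts, ends, starts / ends).

    `trips` is an iterable of (start_station, end_station) pairs.
    A station with starts but no recorded arrivals gets an infinite ratio.
    """
    trips = list(trips)  # iterated twice below
    starts = Counter(start for start, _ in trips)
    ends = Counter(end for _, end in trips)
    ratios = {}
    for station in starts.keys() | ends.keys():
        s, e = starts[station], ends[station]
        ratios[station] = (s, e, s / e if e else float("inf"))
    return ratios

def fee_share_by_type(trips, free_seconds=30 * 60):
    """Fraction of trips per member type lasting longer than free_seconds.

    `trips` is an iterable of (member_type, duration_seconds) pairs. The
    30-minute free period is an assumption, not taken from the data files.
    """
    totals, billed = Counter(), Counter()
    for member_type, seconds in trips:
        totals[member_type] += 1
        if seconds > free_seconds:
            billed[member_type] += 1
    return {t: billed[t] / totals[t] for t in totals}
```

With Corey's Fairfax Village counts (71 starts, 29 ends), the ratio would be 71 / 29, or about 2.45.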
by David C on Jan 11, 2012 5:12 pm
Anytime you release data that hasn't been aggregated (i.e., you release the data at the individual level), it becomes possible for someone with a little knowledge to connect the dots and determine who that data applies to. The feds consider this confidential data and place lots of restrictions on its transfer and dissemination. I'm surprised this is being allowed. It's not a good idea. For example, just by knowing approximately when a loved one left a specific location (their work?), someone who knows they use Capital Bikeshare can fairly easily determine where they went. Releasing aggregated data is one thing; releasing individual trips (i.e., data that is specific to an individual) is a whole other ball game.
by Lance on Jan 11, 2012 11:28 pm
by MLD on Jan 12, 2012 7:59 am
by David C on Jan 12, 2012 8:13 am
All it takes is one person doing so.
The other side of the coin is, "What does non-aggregated data give you for following patterns that you couldn't get from aggregated data, where it's impossible to discern individual patterns?" Aren't policy decisions made at the aggregate level? That is, no policy is going to be made based on one person going from point A to point B.
by Lance on Jan 12, 2012 8:20 am
The data doesn't have anything identifying an individual, you can't track a person from one trip to the next. I suppose if you know when a person was getting on a bike you could attempt to find that one trip in the list. But to say that those "privacy concerns" are a reason not to release the data is to ignore the zillions of other easier and more dangerous ways people can track someone.
by MLD on Jan 12, 2012 8:37 am
That is the whole ballgame. Suppose you suspect your significant other is having an affair. If you know s/he got on a bike at a certain time and place, claimed to go to work, and instead went to the tryst pad, the CaBi data enabled this violation of privacy. Talk to any divorce lawyer; they will give you lots of similar (and usually hilarious) stories, especially when they employ an investigator using a GPS tracking device.
by goldfish on Jan 12, 2012 9:11 am
But not really. How do you know your loved one is the one who got a bike at 5:21 downtown and went to Adams Morgan and not the person who got a bike at the same location at 5:23 and went to Capitol Hill?
The other thing is that this data is not being released in real time. Assuming the most recent trip released is at least a couple of weeks in the past, I doubt anyone is really going to keep a detailed log of where they suspect their cheating loved one is and what time they went somewhere. If they were that obsessive, I suspect they would use the GPS in that person's phone or some other tracking method first.
Not to make this political, but Dick Cheney had the 1% doctrine: if something is 1% likely, we should act as if it will happen and plan accordingly to prevent it. I felt that was stupid, and I think what you are worried about here is more like the 0.0001% doctrine. Instead of getting the benefit from this data, you are saying we should not release it because of the one-in-a-million (billion?) chance that someone might do something bad with the information, information they could get more easily another way anyhow.
by nathaniel on Jan 12, 2012 9:31 am
There are lots of divorces, and usually child custody and significant money are in play. Divorce lawyers do very, very well in this business. So what makes you think that this is less than 1%?
by goldfish on Jan 12, 2012 9:40 am
Except that the system is complex enough that it would be near impossible to get proof-positive.
So for example, you notice your hubby leave the apartment at 8:05. It's a 4 minute walk to the nearest bikeshare station, so he should check out a bike at about 8:09.
Let's say just one bike is checked out between 8:07 and 8:11. But that bike doesn't go to your significant other's office. There you go. You knew it! He is having an affair! Bastard!
Except that what really happened was that your husband stopped off at the corner 7-Eleven to pick up some Tylenol and didn't get a bike until 8:12.
Or, as luck would have it, your husband got to the bike kiosk a few seconds after the person in front of him, and that young lady took the last bike and went to her office. Instead, he had to walk to a different kiosk.
Or, drat, your husband realized that his morning staff meeting got moved up to 8:30, and what do you know, here comes the limited-stop Metrobus right now.
On the other hand, perhaps your husband was smart and checked SpotCycle and realized the three stations closest to his office were full, so he had to bike to a different one and walk (or take transit) over to his office.
Of course, if your husband were really smart, he'd add some confounding variables. He'd start biking toward his office, and then turn his bike in at a kiosk on the way. And then take out another bike. One more bike change, and it will be impossible to figure out where he went.
And of course, if more than one bike were checked out within the time period when you think your husband would have done it, you'd never be able to confirm which trip was his. Especially if, as Lance posits, you don't know the endpoint to begin with.
by Matt Johnson on Jan 12, 2012 9:42 am
This minimizes the risks created when someone has offline knowledge to identify the individual. Much more worrisome than the hypothetical Lance has put out there involving a loved one is the risk of stranger/acquaintance stalking based on this information.
For example, the guy who waits at the bus stop across from your office every day, or twice a week, at the same time you're leaving work and admires your iPhone or sees that you're always carrying a laptop case. Checking this data, it won't be hard to see the recurring destination pattern to intercept you sometime for a mugging. And substitute "legs" for "iPhone" and the risks are even worse.
It's easy to become paranoid contemplating crime risks, and given how unaware most of us are as we bumble through the Metro buried in our earbuds and a copy of the Expressaminer, a determined malfeasor could probably follow many people home anyway. But this data does make something previously very difficult much more possible -- tying together one portion of a person's travel pattern that you might see regularly enough to note with their regular destination. And in many cases, the origin and destination will provide the most likely route that person will take as well.
by Arl Fan on Jan 12, 2012 9:45 am
by Canaan on Jan 12, 2012 10:02 am
If there were a member identifier attached to the data, I'd be concerned, but as is, the burden of proof seems unreachable, and the suspicion (or other effort) needed to get to this point makes the marginal utility almost nil.
by Jacques on Jan 12, 2012 10:06 am
Exactly, that's exactly what it is, paranoia. Seriously, a criminal is going to track your movements and then go look at CaBi data to figure out where you're going so they can take your iPhone? Please.
For every ridiculous hypothetical there are 10 easier ways someone could do the same thing to you. If you are so worried about crimes that these cockamamie theoretical stalkings scare you, I suggest you never leave your house. You're probably more likely to be hit by a car or have an asteroid come down on you than have some mugger scanning CaBi spreadsheets to track where you ride a bike.
by MLD on Jan 12, 2012 10:09 am
But what does that tell you? Past trips are no guarantee of future trips. The data only show O&D with no indication as to stops along the way. That will change when GPS units are introduced.
Personally, I would love to have GPS units in the bikes to see where they are ridden, which bike lane facilities are used, how riders (anonymised) build trips from one destination to another. e.g. Who's using 15th Street? Pennsylvania Avenue? What's the best bike route from Benning Road to Downtown? etc etc.
But I see the potential risks in making that information available.
by Jack Love on Jan 12, 2012 10:22 am
by andrew on Jan 12, 2012 10:35 am
So, I assume these people are not actually doing a 3 hour marathon bike ride but actually stopping places like maybe a museum. If so, what do they do with their bike?
I've often wanted to use bikeshare to run an errand somewhere with no nearby docking stations and wondered if it's safe to bring my own lock and lock up the bikeshare bike while I'm inside. It's $1000 if the bike gets stolen, so I haven't risked it, but maybe it's safe because nobody would steal a bikeshare bike. Thoughts?
by Falls Church on Jan 12, 2012 10:36 am
The Office of Management and Budget (OMB) Guidance for the implementation of the Confidential Information Protection and Statistical Efficiency Act of 2002 and OMB Memorandum 07-16, Safeguarding Against and Responding to the Breach of Personally Identifiable Information, both state that "the term personally identifiable information refers to information that can be used to distinguish or trace an individual's identity, such as their name, Social Security Number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother's maiden name, etc."
If you can use this data to help facilitate this in any way, then the authorities releasing this data are in violation of some federal rule somewhere. I dunno ... does CaBi get any funding that originates with the feds? Only if it doesn't could it claim to not be in violation of this rule (or a variation of it dependent on the federal agency providing the funding.)
Bottom line: whether we agree or disagree that this type of personal information can be obtained in other ways, or whether (as some have suggested) we personally feel the trade-offs make this risk 'worth it', the government has ruled that this type of information may not be released to individuals without a 'need to know', and it definitely should not be broadcast as is being proposed here.
by Lance on Jan 12, 2012 10:50 am
Absolutely. But a pattern of past trips -- particularly for a system which facilitates many daily commutes -- can create a high probability of a future trip, the same way that a burglar's observation that someone leaves their apartment every Sunday at 10:00 in a suit and returns at 12:30 establishes a high probability the pattern will recur.
Data suggests that CaBi has greatly helped bring the male/female gender ratio in bike ridership in DC closer to equality (not yet there by a long way, but a real improvement). I think there's a real safety concern here that could undermine that: there are a lot of women who would be very uncomfortable if you told them that someone could spot them checking out a CaBi and find out exactly which station they rode to -- even if it's not until a month from now. Particularly if they are daily users.
I strongly suggest that the exact-time data be replaced with something more general -- break the day into chunks as "morning/afternoon/evening/overnight." Even just zeroing out the minutes and providing only the hour of checkout/return would greatly increase the level of privacy, without hugely damaging the usefulness of the data.
by Arl Fan on Jan 12, 2012 10:52 am
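Arl Fan's proposal to coarsen the timestamps is simple to express in code. A minimal sketch, assuming timestamps look like the `11/16/2011 18:32` values in the records ZR quoted; the function names and the exact boundaries of the four periods are hypothetical choices.

```python
from datetime import datetime

TIME_FORMAT = "%m/%d/%Y %H:%M"  # matches timestamps like "11/16/2011 18:32"

def to_hour(timestamp):
    """Drop the minutes, keeping only the hour of checkout or return."""
    return datetime.strptime(timestamp, TIME_FORMAT).replace(minute=0)

def part_of_day(timestamp):
    """Bucket a timestamp into morning/afternoon/evening/overnight.

    The boundaries (5:00, 12:00, 17:00, 22:00) are arbitrary choices.
    """
    hour = datetime.strptime(timestamp, TIME_FORMAT).hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "overnight"

print(to_hour("11/16/2011 18:32"))      # 2011-11-16 18:00:00
print(part_of_day("11/16/2011 18:32"))  # evening
```

Either function would be applied to both the start and end columns before the files are published.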
Except the CaBi data contains no "personally identifiable information." There's no member ID involved, no SSN, no name. It's just a list of date/time, origin/destination, and the bicycle used.
by MLD on Jan 12, 2012 11:02 am
Please let me know which part of the information that CaBi released would reasonably fit the definition that you provided:
"The term personally identifiable information refers to information that can be used to distinguish or trace an individual's identity, such as their name, Social Security Number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother's maiden name, etc."
There is no user-specific information whatsoever in the data that has been released. Unless there is a station that has been used by only one person in the entire quarter, there is no reasonable way to tell who is riding which bikes.
(CaBi likely has that information, and in the event of a criminal investigation, it's likely that such information could be used, but it's not in the data released).
by Jacques on Jan 12, 2012 11:14 am
If you rent a bike and decide to go into a store, what do you do with the bike? I saw a guy at the CaBi station in Congress Heights yesterday who wanted to ride to Giant but wasn't sure if he was just supposed to leave the bike outside. I told him that I thought so, and both of us laughed.
by HogWash on Jan 12, 2012 11:19 am
alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual
hint: as part of my job I've had to certify that I understand what this phrase means. And for those doubting that links can be made with only the most tenuous data out there, Google my name and [Deleted for violating the comment policy.]
by Lance on Jan 12, 2012 11:22 am
Otoh the information is good enough to catch someone in a lie. It would require some work to do this, and nobody will put forth such effort unless the consequences are serious enough. Possibilities include a divorce case or some civil liability. Definitely privacy is violated; it is like getting the list of the movies someone has watched.
by goldfish on Jan 12, 2012 11:29 am
Trip Duration
Start Date/Time
End Date/Time
Start Station
End Station
Bike #
Member Type (Annual/Casual)
None of those are personal or can be linked to personal information like a birth date, etc.
If you SAW someone pick out a bike and somehow had your watch synced to the CaBi time clock you might possibly be able to figure out which trip you're looking at. But beyond that you still don't have any information that helps you figure out WHO that person is.
You can't use the database to figure out one user's trips; you can use it to figure out something that you've already seen and already tracked in a different way (by watching them).
by MLD on Jan 12, 2012 11:33 am
With the exception of registered vs. casual user, all the data pertains to bikes, not people. There is nothing "personally identifiable" about it.
CaBi bikes are ridden in public. A hypothetical malefactor could find out which station a CaBi user is riding to by following the person on the street, in a car, on a bike, or even on foot. A hypothetical burglar could watch a person leave their house, from any of 100 public places.
The value to a hypothetical highly technology-savvy malefactor of being able to spot someone checking out a CaBi, write down the time so they know it's them and not someone else, and 3 months later "find out exactly which station they rode to" (and then what?) is very low.
Andrew's and Arl Fan's suggestion to make times less precise is reasonable, but it may decrease the utility of the data. Part of the value of having this data is in understanding how stations empty and fill up, and this can occur on time scales of seconds or minutes.
by Matt Caywood on Jan 12, 2012 11:45 am
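To see why minute-level times matter for understanding how stations empty and fill up, here is a minimal sketch that turns trips into per-station departure and arrival events and a running change in bike count. It assumes each trip has already been parsed into `(start_station, start_time, end_station, end_time)` with comparable times (for example `datetime` objects); the function names are hypothetical, and the result presumably misses any bikes moved outside recorded trips, such as by rebalancing.

```python
from collections import defaultdict

def station_events(trips):
    """Group trips into a time-ordered list of (time, change) events per station.

    A departure removes a bike from the start station (-1) and an arrival
    adds one to the end station (+1).
    """
    events = defaultdict(list)
    for start_station, start_time, end_station, end_time in trips:
        events[start_station].append((start_time, -1))
        events[end_station].append((end_time, +1))
    return {station: sorted(evts) for station, evts in events.items()}

def running_count(events, initial=0):
    """Return (time, bikes) after each event, starting from `initial` bikes."""
    bikes, history = initial, []
    for time, change in events:
        bikes += change
        history.append((time, bikes))
    return history

# Times here are minutes after midnight, purely for illustration.
trips = [("A", 480, "B", 490), ("A", 482, "B", 495), ("B", 600, "A", 612)]
print(running_count(station_events(trips)["A"], initial=5))
```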
I redact government documents for a living. There's absolutely no way that a court would ever deem the information CaBi is releasing "personally identifiable information".
Besides, no one riding a shared bicycle in a public right-of-way in broad daylight has an expectation of privacy... at all! That's like me posting a comment on a blog, using my own name, at 10:47 AM on a Thursday, and then claiming "but you had no right to know that I was reading GGW at work!" (I'm on break right now, btw)
If your wife sees you get on a CaBi bike at 9:05 AM, she can easily hail a cab (or jump on another bike) and follow you to your destination. CaBi isn't a private affair.
by Steven Harrell on Jan 12, 2012 11:47 am
by Lance on Jan 12, 2012 11:55 am
by oboe on Jan 12, 2012 12:03 pm
Is firing them a strong enough action?
I mean, what if one of them remembers that time bike #65712 took a trip from Dupont to Columbia Heights at 10:56:14 on December 12, 2011?
by Matt Johnson on Jan 12, 2012 12:08 pm
by David C on Jan 12, 2012 12:17 pm
It is more like getting a list of all movies that were watched by all persons with start time and end times.
-------------
When I sat down with the CaBi Board, along with Matt Caywood & Mobility Lab, we discussed privacy. We originally requested that they include a hashed unique user ID, but it was not included in the files, I believe because of privacy concerns.
I'd also like to point out that London released the same trip data. Has anyone cited the data being used for the reasons some have mentioned above?
Being concerned about privacy is important, but I think we have to be realistic. There shouldn't be any assumption of privacy while riding a CaBi bike in public. This data is about the system, not the people. In fact, there is nothing personally identifiable in the files.
by Daniel Gohlke on Jan 12, 2012 12:24 pm
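For readers wondering what a "hashed unique user ID" would involve: the usual approach is a keyed hash, because a plain hash of a short member number could be reversed simply by hashing every possible number. A minimal sketch using HMAC-SHA256; the member ID format, the key, and the function name are hypothetical, and this is not a description of anything CaBi or Alta actually do. Even a pseudonymous ID lets anyone link one rider's trips to each other, which is exactly the kind of pattern-building debated in this thread.

```python
import hashlib
import hmac

def pseudonymous_id(member_id, secret_key):
    """Map a member ID to a stable, opaque token using HMAC-SHA256.

    The same member always gets the same token, so their trips can be
    linked to each other, but without `secret_key` nobody can recompute
    tokens from guessed member IDs. Truncated to 16 hex characters.
    """
    mac = hmac.new(secret_key, member_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]

# The key would stay with the operator and never be published.
print(pseudonymous_id("12345", b"example-secret-key"))
```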
by MrTinDC on Jan 12, 2012 12:28 pm
Date/Time
Location
Vendor name
Amount
but without the cardholder name and account number, i.e., no 'personally identifiable information'.
No doubt many people would find the information interesting and useful. Otoh one could easily use that data to piece together what someone else is doing, if the investigator has a way to link the subject to a given transaction.
Is privacy violated? You bet.
by goldfish on Jan 12, 2012 12:29 pm
Except you can't connect one instance to another unless you witness every single one of the transactions, in which case you've already been following the person around recording what they're doing.
by MLD on Jan 12, 2012 12:34 pm
by goldfish on Jan 12, 2012 12:42 pm
Correct, there should be an expectation that people cannot automatically track you.
Luckily, this data doesn't allow you to do that!
by MLD on Jan 12, 2012 12:44 pm
Obviously the CaBi data is not that good, but it is good enough to show if someone is lying over some short period.
by goldfish on Jan 12, 2012 1:00 pm
We absolutely believe it's still worth considering people's concerns about whether a database of data about bikes has a meaningful effect on personal privacy. Currently CaBi is addressing these concerns by making the data entirely about bikes, not about people, and by delaying data release.
London's bikeshare data has been out for nearly a year, as Daniel points out, and so far the results are in: awesome visualizations, useful code, and zero tabloid scare stories about privacy compromises.
by Matt Caywood on Jan 12, 2012 1:09 pm
Goldfish makes a good point that this may not come under the scope of what the PII laws are striving to prevent (i.e., identity theft), but if you substitute 'protecting privacy' for 'protecting one's identity from theft', there are parallels between the two. And the point I have been trying to get across today is that non-primary data can be used to derive primary data ... when one is able to 'connect the dots' ... So while nothing being disseminated is apparently personally identifiable, if it can be used in conjunction with other data to identify an individual and their normal patterns of travel (or even their one-off travel in the past), then that is information which needs to be shielded from release. And I think the others on here have done a pretty good job of explaining how what CaBi is releasing can indeed be used to identify an individual's past trips or pattern of trips.
As others have offered, releasing this data at an aggregated level (e.g., departures per quarter hour at a specific station) would solve this problem. Additionally, just throwing uninterpreted (fully foot-noted) data out there serves no one's interests. It's as bad as when that guy a while back was tweeting cycling and pedestrian incidents in the District. Anyone with the money to buy a calculator could 'derive' just about any conclusion they wanted from the data ... which made it basically valueless. I suspect you're going to see the same problem here with the CaBi data, possibly at great cost to some individuals who will be paying for this with their privacy (in the best case).
by Lance on Jan 12, 2012 1:11 pm
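The aggregated alternative Lance describes (departures per quarter hour at each station) takes only a few lines of code to produce. A minimal sketch, assuming departures have been reduced to `(start_station, start_timestamp)` pairs with timestamps formatted like `11/7/2011 8:02`; the function name and sample departures are hypothetical.

```python
from collections import Counter
from datetime import datetime

def departures_per_quarter_hour(departures):
    """Count departures per (station, start of 15-minute window)."""
    counts = Counter()
    for station, timestamp in departures:
        dt = datetime.strptime(timestamp, "%m/%d/%Y %H:%M")
        window = dt.replace(minute=dt.minute - dt.minute % 15)
        counts[(station, window)] += 1
    return counts

departures = [
    ("21st & M St NW (31212)", "11/7/2011 8:02"),
    ("21st & M St NW (31212)", "11/7/2011 8:14"),
    ("21st & M St NW (31212)", "11/7/2011 8:16"),
]
for (station, window), n in sorted(departures_per_quarter_hour(departures).items()):
    print(station, window.strftime("%H:%M"), n)
```

Publishing only these counts would give up the minute-level detail Matt Caywood describes, which is the trade-off at the center of this exchange.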
Of course it's not. The only way you could connect a given individual to a trip record would be to also observe that individual, or track that individual with a GPS. In that case, you wouldn't need the CaBi data. I suppose you'll say "if you record the time they leave on a date and then look at the report for when that trip terminates and no one else left at that time, you've got 'em!", in which case, great work, divorce lawyers, on waiting 3 months for something you could have done instantly with $20 in cab fare. And, of course, that would have been completely legal in the first place, regardless of the pending decision you point to, given that it has to do with automatic tracking via GPS. So... what's the kerfuffle?
by worthing on Jan 12, 2012 1:13 pm
by charlie on Jan 12, 2012 1:13 pm
Ahh yes, data is worthless unless someone else interprets it for you, that's why there have never been any studies done except for those involving original research!
by MLD on Jan 12, 2012 1:21 pm
United States v. Jones deals, specifically, with requiring a warrant for the use of GPS tracking devices over an extended period. No one is questioning the legality of allowing police to follow a suspect. The law is quite clear: in a public place, a person has a very limited expectation of privacy.
by David R. on Jan 12, 2012 1:22 pm
by goldfish on Jan 12, 2012 1:22 pm
by goldfish on Jan 12, 2012 1:28 pm
by worthing on Jan 12, 2012 1:38 pm
by goldfish on Jan 12, 2012 1:41 pm
1) There are 18 records that look like this:
Duration-Start date-End date-Start station-End station-Bike#-Member Type
0h 10min. 42sec.-11/7/2011 6:46:00 PM-11/7/2011 6:57:00 PM-Calvert St & Woodley Pl NW (31106)-W00877-Billed-20010
I'm not sure what they mean, but they're missing an end station and member type and have two fields appended. I'm not sure what the one whose value is "Billed" represents, but the last one is obviously a zip code (the other records show this). I'm not sure if this is an issue with the integrity of their database or if the process they use to dump the data failed on these 18 records. Either way, they shouldn't have slipped through.
2) I really don't agree with including the White House station in this data. At all. I think it needs to be expunged like yesterday. On a less important scale, I'm not sure why they included the Alta warehouse stations.
by Corey H. on Jan 12, 2012 1:51 pm
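A simple validation pass would catch records like the 18 Corey describes before publication. Note that the malformed record above still has seven fields (the missing end station and member type are offset by the two appended ones), so counting fields alone would not flag it; also checking that the last field is a known member type does. A minimal sketch, assuming comma-separated files with a header row and the two member types `Registered` and `Casual` seen in records quoted in this thread; the function name is hypothetical.

```python
import csv
import io

EXPECTED_FIELDS = 7
MEMBER_TYPES = {"Registered", "Casual"}

def find_malformed(csv_text, has_header=True):
    """Return (line_number, row) for rows that don't match the expected layout.

    A row is flagged if it doesn't have exactly seven fields or if its last
    field is not a known member type.
    """
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)
    bad = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != EXPECTED_FIELDS or row[-1].strip() not in MEMBER_TYPES:
            bad.append((reader.line_num, row))
    return bad

SAMPLE_ROWS = (
    "Duration,Start date,End date,Start station,End station,Bike#,Member Type\n"
    "0h 10min. 42sec.,11/7/2011 6:46:00 PM,11/7/2011 6:57:00 PM,"
    "Calvert St & Woodley Pl NW (31106),21st & M St NW (31212),W00877,Registered\n"
    "0h 10min. 42sec.,11/7/2011 6:46:00 PM,11/7/2011 6:57:00 PM,"
    "Calvert St & Woodley Pl NW (31106),W00877,Billed,20010\n"
)
print(find_malformed(SAMPLE_ROWS))
```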
by Daniel Gohlke on Jan 12, 2012 1:56 pm
by goldfish on Jan 12, 2012 1:59 pm
by Corey H. on Jan 12, 2012 1:59 pm
You wrote that:
And you linked to United States v. Jones as evidence. But that's not what the case is about. People do not have an expectation of privacy in public; the argument in Jones is that the exceptional length and scope of the surveillance required a warrant. As others have emphasized, tracking a person's movements with CaBi data would require either access to CaBi's financial/user records or in-person surveillance, and the latter would be an impracticably inefficient and superfluous measure.
by David R. on Jan 12, 2012 3:48 pm
There is nothing personally identifiable that is wholly contained within this data. Lance/goldfish: Would you agree? However, you could use this data along with additional data (like CaBi's announcement of the winner) to make connections, like that the aforementioned guy was at x station at x time during his winning ride.
Lance/goldfish: Is your argument that simply being able to make that connection between this data and an external source (whether by observing them or through an announcement like this one) is a bad thing? Can you further elaborate on why?
by Daniel Gohlke on Jan 12, 2012 4:04 pm
The reason that I've mentioned behavior patterns a couple of times is that a major trigger for opportunistic crimes -- of both property and violence -- is the casual observation of a desired target, whether an iPhone in use on the Metro, an SUV sitting in a parking lot, a person walking home alone in the dark. Being able to link a person that you've observed in public to a pattern that ends in a geographically distant spot, so that you can skip the step of physically following them for that distance is a huge step towards being able to target that person so that you can stalk them or commit a crime against them. Yes, it's possible to physically follow people to their destination. But it's hard, and it leaves the perpetrator vulnerable to being spotted and recognized, especially if done repeatedly. A lot of CaBi rides are part of a repeated pattern, and even now (and increasingly in the future) end (especially in the case of an evening ride home) in a location darker and with more isolation and less visibility than where they start. Releasing this data is making some people more vulnerable than we should be comfortable with. Being mugged can be an incredibly traumatic experience, to say nothing of the risk of more serious crimes.
by Arl Fan on Jan 12, 2012 7:04 pm
Seriously?
by MLD on Jan 12, 2012 10:28 pm
by David C on Jan 12, 2012 10:42 pm
An example of deriving important intelligence from seemingly harmless data: when a lot of late-night pizzas are delivered to military logistics planners, you know something is up. So all Syria needs to do is track when and where pizzas are delivered in Arlington.
David R: I disagree with your position in the strongest possible terms. When in public, people have an expectation that their movements are not tracked, and this is connected directly to personal privacy.
by goldfish on Jan 13, 2012 10:18 am
Where are you going to be tonight at 9PM?
You will probably answer, "none of your business" because in fact, it is not my business to know where you are going.
That is privacy, and that is the privacy that is violated by knowing where a person is going without that person telling you voluntarily.
by goldfish on Jan 13, 2012 10:28 am
However, David R has no expectation of privacy if he is in a public space at 9PM. I don't know where he is going to be at 9PM, and neither does anyone else.
But if I happened to see him in the Starbucks at 13th and U at 9PM, I'll know where he was. That is not a violation of his privacy.
There is no personally identifiable information released with the CaBi data. And the only way you could figure out where/when someone was some place is by actually following them. Not by using the data.
by Matt Johnson on Jan 13, 2012 10:38 am
by goldfish on Jan 13, 2012 10:40 am
In that case, I would like you to take a few moments to look into the CaBi data. Please tell me when/where my next CaBi trip will take place.
by Matt Johnson on Jan 13, 2012 10:42 am
by goldfish on Jan 13, 2012 10:45 am
Please tell me when/where my next CaBi trip will take place.
by Matt Johnson on Jan 13, 2012 10:47 am
To answer your question, may I have your credit card records? Or maybe I should attach a GPS to your car. Oh wait! I could get that information by observing when you check out the cabi bicycle, and then comb through the cabi records to see where you went.
by goldfish on Jan 13, 2012 10:58 am
And if you are making an effort to observe me in public, you don't need the CaBi data to track my movements.
With respect to your idea that privacy is essential to the freedom of movement, it's baloney.
Let me give an example. You're standing on the sidewalk outside a bank. You hear gunshots and screaming. A man runs out of the building pulling off a mask and carrying a big sack with dollar signs written on it.
You take his picture with your cell phone as he runs by, and a few seconds later show it to a police officer. You tell him, "he went that way, officer!".
Later, in court, you're asked by the prosecutor to identify the man you saw escaping from the bank. You say, "That's him. The defendant."
But the judge says, "I'm sorry. I can't allow that identification. The defendant did not tell anyone that he was going to the bank to rob it, therefore, anything he did in public during that time is not observable. It's private. He chose to keep it private by not telling anyone."
See? That's not how being in public works. You have every right not to tell anyone where you're going to go. You have every right not to tell anyone where you've been or where you are.
But if someone sees you there, they are not legally or morally precluded from saying they saw you.
And the fact that someone could see you going somewhere in public and tell someone they saw you doing that does not violate your privacy and it does not impede your ability or right to movement.
by Matt Johnson on Jan 13, 2012 11:13 am
by goldfish on Jan 13, 2012 11:21 am
Correct. The witness does not know where the robbers went. But since they were in public, they could have been observed the entire time (maybe they got on a bus). If they did take out a CaBi, you still couldn't necessarily know where they went. You as a private citizen could wait three months and then try to figure it out, but good luck.
Anyway the police wouldn't have any trouble getting a warrant for the CaBi data in that instance.
by Matt Johnson on Jan 13, 2012 11:31 am
Except the data is released months later, making the connection basically useless. It also doesn't let you create a pattern of someone's movement, because you can't connect any one trip to another trip. It's not like you can just pop on the site and immediately see where someone went.
Also, your neighbor example is bogus. Knowing "when your neighbor leaves for work" isn't enough to track the trip; in order to pick the right trip from the data you have to know exactly when they picked up the bike from the station in order to identify their trip.
The idea that there is a danger that someone could take observations of you at bike stations and then try to pick out your endpoints months later is ridiculous. I suppose it's POSSIBLE, but it sure sounds like a lot more work than just f-ing following someone if you want to.
I think in most of our minds that potential danger doesn't begin to outweigh the usefulness of the data. "Fuzzying" times by 15 minutes would hurt the integrity - if someone wants to develop a model for looking at patterns of docks emptying/filling then 15 minute intervals isn't a good enough resolution.
by MLD on Jan 13, 2012 11:36 am
by goldfish on Jan 13, 2012 11:58 am
You know as much about where that person was going as you know if you're sitting on a metro train and watch somebody get off the train at a certain station. Unless you follow them from that point, you only know that they got off the train in neighborhood x.
by Jacques on Jan 13, 2012 12:18 pm
This represents a small but nevertheless undeniable loss of privacy. It is enough to catch someone in a lie about where they were going. For example, this information could be used to detect whether one's spouse is having an extramarital affair. Or a snoopy neighbor could use damning conclusions based partly on the CaBi data to blackmail someone. Etc.
by goldfish on Jan 13, 2012 12:55 pm
Okay, okay. You've beaten the horse to death, then beaten it some more. And you managed to convince @Lance. Let it go, man!
:)
by oboe on Jan 13, 2012 1:09 pm
Another step down the slippery slope... /sarcasm
by Matt Caywood on Jan 13, 2012 5:47 pm
This is precisely why, on a biometrics project I worked on, our fingerprint database was treated as PII even though it was anonymized. The reality is that releasing "anonymized" data can easily make the difference between whether an adversary does or does not have the information they need to violate your privacy. If the data is not released, it can't enable such an outcome.
The problem isn't even limited to anonymized individual data. Even aggregate data, depending on the scale and type of aggregation, can easily be correlated to individuals. It's called "de-anonymizing" and you will find many interesting papers on the subject by searching on that term.
by HPC on Jan 15, 2012 2:17 pm
by Jon on Jan 16, 2012 8:23 am