Posts about Data Openness
WMATA planners have created a new ridership data visualization, a video that shows the volume at each station across the day:
This has a lot in common with Kenton Ngo's animated GIF that works basically the same way, but with less fine-grained time resolution:
WMATA planners created this before they saw Ngo's, planner Michael Eichler noted in an email. In each one, the circles are larger at times when more people are entering or exiting the station. The color shifts based on whether the traffic is people entering (pink), exiting (blue), or a mix (shades of purple in between).
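The size-and-color encoding described above can be sketched in a few lines of Python. This is only an illustration, not WMATA's or Ngo's actual code: the RGB values, the `max_volume` cap, and the `station_marker` function are all assumptions made up for this example.

```python
import math

# Assumed RGB endpoints; the real animation's exact colors are not known.
PINK = (255, 105, 180)  # all traffic is people entering
BLUE = (65, 105, 225)   # all traffic is people exiting


def station_marker(entries, exits, max_radius=40.0, max_volume=10000):
    """Return (radius, rgb) for one station in one time slice.

    The radius grows with total traffic, and the color slides from pink
    (all entries) through purple to blue (all exits).
    """
    total = entries + exits
    if total == 0:
        return 0.0, PINK  # nothing to draw for an idle station

    # Share of the traffic that is exits: 0.0 means pink, 1.0 means blue.
    exit_share = exits / total
    color = tuple(round(p + (b - p) * exit_share) for p, b in zip(PINK, BLUE))

    # Scale the circle's area (not its radius) with volume, so a station
    # with four times the traffic looks four times as big.
    radius = max_radius * math.sqrt(min(total, max_volume) / max_volume)
    return radius, color
```

Scaling area rather than radius is a design choice of this sketch; the post does not say which one either animation uses.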
The WMATA animation uses April 10, 2013, which was Metro's 4th highest ridership day ever. The PlanItMetro post says:
A combination of cherry blossom peak bloom and two sporting events ratcheted ridership up to 871,000 for the day, compared to an average weekday ridership of around 750,000. Note the high level of activity at the Smithsonian station all day long, and big dots that grow and shrink as the sports games begin and then end near Gallery Place and Navy Yard-Ballpark stations.
You can access the data yourself to create your own visualizations here. If you make some, let us know at email@example.com and we'll post some of the best.
Kenton Ngo made an animation showing how many people are entering or exiting Metro stations at each hour across the day.
Green circles show where people enter, and red where they exit. As you'd expect, green circles swell and then shrink at end-of-line and other busy suburban stations in the morning, while even larger red circles appear at the stations at major job centers. In the evening, the pattern reverses.
This is another way of visualizing the Metro station data which WMATA released last year. Matt Johnson used it to compute the busiest stations and the balance between stations. In 2009, Matt diagrammed the flows in each direction.
If you go to the large and interactive version on PlanItMetro, you can mouse over individual squares to see the date as a tooltip.
The darkest red days have the lowest ridership, the darkest green the highest. You can see high ridership events like President Obama's January 2009 inauguration, the Stewart/Colbert rally in October 2010, Snowmageddon/Snowpocalypse in February 2010, and more.
Stepping back, it's clear how ridership is highest in April, June, and July, and the number of very high ridership days jumped significantly in 2008 but then has stayed flat or a bit down since. Weekend ridership has gotten lower in recent years, probably because of all the trackwork.
What do you notice?
The Sunlight Foundation has put together a great interactive map of contributions for the April 23 DC Council at-large special election.
Map by the Sunlight Foundation. Contribution data from the April 15 release by the DC Office of Campaign Finance.
Their article by Ryan Sibley also shows many other interesting statistics, such as who got money from outside the region, the balance of corporate and individual contributions (Anita Bonds and Michael Brown got only about half individual contributions, while it's nearly 100% for Silverman), and more.
Sibley also notes that while DC's Office of Campaign Finance releases computer-readable data files with contribution information, some data is not in those files, like which candidate goes with a campaign committee. That's in PDFs, but PDF data isn't usable in mash-ups without human work.
What do you notice?
After open data advocates pointed out how ridiculous it is that private companies have a copyright on the only publicly-available versions of DC's laws, DC Council General Counsel David Zvenyach helped make a public domain version and posted it online.
Tom MacWright explained the problem last month. DC, like many governments, contracts with a company (in this case LexisNexis) to compile all of the laws and keep them updated as they change. They post the laws online, but with licenses that restrict your rights to reuse the information, even though it's the public law.
Rather than ignoring the problem or issuing silly legal threats against people who were digitizing the code without permission, Zvenyach worked with the advocates to create a version of the code free of these restrictions.
Mike Masnick writes at TechDirt:
Part of the issue was that the only digital copy of the code that they had was the one given to them by West, and it contained a variety of extraneous information that was West's IP, including West logos on each section of the law (representing many thousands of copies). Zvenyach had Joshua Tauberer come by and spend a day removing every bit of West IP from the document and quickly releasing a downloadable copy of the DC Code with a CC0 public domain license.
Tom MacWright notes that this is just one step:
There are a few things that this isn't: it isn't the official copy of the code, and lawyers would be ill-advised to cite it alone. It isn't up-to-date; the council is fast-moving and this is just a snapshot. In time we'll fix these problems too.
What can people do with an open source set of DC laws? We can think of a lot of things, but the best part is when people do things we don't think of. Some commenters on MacWright's post wondered why this matters; can't you just find the code on the existing website? Yes, you can't link directly to a part of the code, and can only download pieces in Microsoft Word, but so what?
The "so what" is all the ways someone could build better tools to make it easier to find the laws. Someone already made a tool that's for some purposes better than the official site. Or people could write automated programs to compare the laws on some topics, like yielding to pedestrians, to those in other states. (Hey, that would be a great idea! Has someone done that yet?)
Do you have ideas or want to implement some? MacWright is organizing a hackathon on Sunday. If you build something neat with the code, let us know and we'll show it off here.
Say you're moving to the area, have a job, and want to find places with good transit to work. How do you figure it out? A lot of people just look at the Metro map and don't consider other modes, but a new service called AutNo is trying to help people locate near transit.
This is actually a problem I hear often. A family friend moved to DC a couple of years ago, for a job at PriceWaterhouseCoopers in Tysons. The Silver Line was still a few years off, but he wanted to live in a vibrant, urban neighborhood. Where should he go?
The bus maps are daunting to decipher. It took me a couple of hours to really puzzle through the combinations and cross-reference them with my general knowledge of housing prices in various neighborhoods.
Boston-based AutNo tries to help by putting rental listings and trip planning together in one interface. You can view available rentals (it doesn't have places for sale, yet), click on one, and see transit directions to your office or another location you specify.
The about page reads:
AutNo is the first apartment search designed and developed specifically for people without cars. For the first time since the automobile was invented, the percentage of Americans who drive to school or work is on the decline. Gas prices are skyrocketing and automobile carbon emissions are contributing to global warming. Commuting and living without an automobile is the way of the future for many people. AutNo is dedicated to helping these people find apartments.
It will also show driving routes to work, if you want them.
You can narrow down results by price and number of bedrooms. A future feature that would be helpful is to also let people restrict the searches by travel time. That way, you could say that you want a place under $2,000 a month that's no more than a 45 minute trip to work, or whatever.
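Here is a rough sketch of what that travel-time filter could look like. It is not AutNo's code: the `Listing` fields and the `filter_listings` function are hypothetical, and the sketch assumes a trip planner has already computed a transit time to work for every listing.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """One rental listing. All fields are hypothetical for this sketch."""
    address: str
    monthly_rent: int      # dollars per month
    bedrooms: int
    transit_minutes: int   # assumed precomputed trip time to the workplace


def filter_listings(listings, max_rent, max_minutes, min_bedrooms=0):
    """Keep listings under max_rent and no more than max_minutes from work.

    Results are sorted so the shortest commutes come first.
    """
    matches = [
        listing for listing in listings
        if listing.monthly_rent < max_rent
        and listing.transit_minutes <= max_minutes
        and listing.bedrooms >= min_bedrooms
    ]
    return sorted(matches, key=lambda listing: listing.transit_minutes)
```

Calling `filter_listings(listings, max_rent=2000, max_minutes=45)` would express the "under $2,000 a month, no more than 45 minutes" search. The hard part in practice is producing `transit_minutes` for every listing, which is exactly where open transit data comes in.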
Basically, combine this with Mapnificent:
And, at the risk of sounding like a broken record: this is why open data is valuable. A transit agency might build a great app, but they're never going to build a mash-up of real estate data and transit data. When it's easy to put transit routing into an app, you not only can build apps that give people transit routing, but tools and apps that combine transit routing with almost anything else.
Update: I hadn't known it, but WalkScore actually has this exact Mapnificent-style feature. You can filter apartment listings by transit distance to a point:
However, when you click on an apartment, WalkScore does not show you the transit routing with trains and buses you would take, while AutNo does. Without that information, people won't as easily learn which buses might work best for them or be able to judge whether a location is really as accessible by transit as the system says.
It would be best to have both at once on the same site; as it is now, I'd recommend that people use a combination of both tools for their search.
There's a deep, persistent, and crippling problem with the laws of DC: you can't download a copy.
Due to a weak contract and a variety of legal techniques, it's not possible to create better ways to read the law or download it for offline access, or even to try to do better than the crummy online portal that serves as its official source.
It also means that it's hard to discuss legal matters online, since you can't link to specific laws.
How the law became scarce
How did this happen? It's a tricky answer of access, ownership, and contracts.
The DC Council writes and publishes bills, which are additions and subtractions to the law itself. The law is compiled by a contractor.
The contractor publishes a few different versions of the "compiled law," each with restrictions:
- The online portal has a "browsewrap" restriction against copying in full.
- The CD they publish has a "clickwrap" restriction against copying at all.
- Even the printed version has a registered copyright by the Council itself.
Unfortunately, courts have upheld these types of restrictions in the CD and website Terms of Service. They get further support from the wire fraud statute, which prosecutors used in the Aaron Swartz case to escalate charges to felonies. And in all of these versions, the contractor tries to claim copyright through compilation copyright and additional content like citations and prefaces.
In the face of these strong guards against freeing the law, the most reasonable avenue for creating a freely-accessible copy is buying and scanning the printed copies, which is exactly what some citizens are starting to do.
Why this matters
This has effects in many places. Advocacy organizations pushing for changes can't reference laws by linking to them, so they have to copy & paste relevant sections and hope that people trust their versions. Of course, when laws go out of date, these copy and pasted guides stop working.
The goal of better educating the police about laws (like the rules of the road for bicyclists) is harder. Police can't have an offline copy of the law for quick access in the field, and the online version is near-useless on smartphones.
It's also locking the DC Council into using a contractor for this purpose. DC's contracts with WestLaw and LexisNexis aren't strong enough to force the contractors to provide them with a copyright-cleaned version, so the council itself doesn't have a compiled copy of the law that they can publish by themselves if they want to take this in-house.
This is a hard problem to unwrap and fix, and there are multiple efforts afoot.
Waldo Jaquith is building The State Decoded, an open-source system for storing and displaying state codes. It's already deployed with Virginia's laws. Public Resource.org is working on the long task of
Meanwhile, it'll be months or years until it's possible to download DC's laws onto your iPhone and clarify whether it is, indeed, legal to bike on a sidewalk (sometimes) or drink in public space (never).
One of the best effects of open data is when people correlate data sets from very different places to generate interesting information. This graph cleverly combines DC's school quality tiers (known as "accountability categories") with Walk Score:
Sandra Moscoso wrote yesterday about how Code for DC's School Decisions Project has been gathering coders who want to use open data to help parents, students, and policymakers. This is one of the graphs they created at the recent Open Data Day using data from the Office of State Superintendent of Education (OSSE).
I've asked to get access to the raw spreadsheet for this graph so we can look at, for example, which schools each dot represents. Here are the accountability categories by school. I will add the spreadsheet with WalkScore matched up with category when it's available. Update: here's the data as a CSV file.
A few things immediately jump out. The most successful DCPS schools have high Walk Scores, while the least successful ones mostly (but not entirely) cluster in the lower range. This may reflect the fact that a public school's success has a lot to do with the socioeconomic status of the neighborhood, and the local retail that is a big part of Walk Score locates in areas with higher incomes.
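Anyone who wants to check this pattern against the CSV could start with something like the sketch below, which computes the median Walk Score for each accountability category. The column names `category` and `walk_score` are guesses, not the actual headers in the file, so adjust them to match, and treat the function as a starting point rather than a finished analysis.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def median_walk_score_by_category(csv_text,
                                  category_col="category",
                                  score_col="walk_score"):
    """Group schools by accountability category; return median Walk Scores.

    The default column names are assumptions; pass the real header names.
    Rows with a missing or non-numeric score are skipped.
    """
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            score = float(row[score_col])
            category = row[category_col]
        except (KeyError, TypeError, ValueError):
            continue  # skip rows we cannot interpret
        scores[category].append(score)
    return {category: median(values) for category, values in scores.items()}
```

A median is used instead of a mean so that one or two outlier schools don't dominate a category with only a handful of entries, such as the five "reward" charters.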
That income effect is also very pronounced in the graph Sandra posted yesterday:
That's not the case with charter schools. 3 of the 5 "reward" charters are in low-Walk Score areas (which could mean something, or just be a consequence of little data), while the "Rising" charters are basically all over the place. This may have a lot to do with the simple fact that since charters have to find and pay for their own space, they're in all manner of locations.
An interesting future step might be to correlate the school tiers with some data set about land prices or rents, or resident incomes. That could help illuminate whether charters end up locating in less-expensive areas, because they want to serve poorer residents and/or because they need cheaper land.
What do you see from looking at this data?