Posts about Data Openness
The Sunlight Foundation has put together a great interactive map of contributions for the April 23 DC Council at-large special election.
Map by the Sunlight Foundation. Contribution data from the April 15 release
by the DC Office of Campaign Finance.
Their article by Ryan Sibley also shows many other interesting statistics, such as who got money from outside the region, the balance of corporate and individual contributions (Anita Bonds and Michael Brown got only about half individual contributions, while it's nearly 100% for Silverman), and more.
Sibley also notes that while DC's Office of Campaign Finance releases computer-readable data files with contribution information, some data is not in those files, like which candidate goes with a campaign committee. That's in PDFs, but PDF data isn't usable in mash-ups without human work.
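The gap Sibley describes can be bridged with a small hand-maintained lookup table. Here is a minimal sketch of the idea. The column names (`committee`, `contributor_type`, `amount`), the committee names, and the sample rows are all made up for illustration and are not the Office of Campaign Finance's actual schema or data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; a real OCF export would have its own columns.
CONTRIBUTIONS_CSV = """committee,contributor_type,amount
Committee A,Individual,100.00
Committee A,Corporation,500.00
Committee B,Individual,250.00
"""

# Hand-built lookup standing in for the committee-to-candidate
# mapping that is only published in PDFs.
COMMITTEE_TO_CANDIDATE = {
    "Committee A": "Candidate One",
    "Committee B": "Candidate Two",
}


def individual_share_by_candidate(csv_text, committee_to_candidate):
    """Return {candidate: fraction of dollars that came from individuals}."""
    totals = defaultdict(float)
    individual = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        candidate = committee_to_candidate.get(row["committee"])
        if candidate is None:
            continue  # committee has not been mapped by hand yet
        amount = float(row["amount"])
        totals[candidate] += amount
        if row["contributor_type"] == "Individual":
            individual[candidate] += amount
    return {c: individual[c] / totals[c] for c in totals if totals[c] > 0}


shares = individual_share_by_candidate(CONTRIBUTIONS_CSV, COMMITTEE_TO_CANDIDATE)
```

The lookup table is the "human work" part: someone still has to read the PDFs once, but after that every new data release can be processed automatically.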
What do you notice?
After open data advocates pointed out how ridiculous it is that private companies have a copyright on the only publicly-available versions of DC's laws, DC Council General Counsel David Zvenyach helped make a public domain version and posted it online.
Tom MacWright explained the problem last month. DC, like many governments, contracts with a company (in this case LexisNexis) to compile all of the laws and keep them updated as they change. They post the laws online, but with licenses that restrict your rights to reuse the information, even though it's the public law.
Rather than ignoring the problem or issuing silly legal threats against people who were digitizing the code without permission, Zvenyach worked with the advocates to create a version of the code free of these restrictions.
Mike Masnick writes at TechDirt:
Part of the issue was that the only digital copy of the code that they had was the one given to them by West, and it contained a variety of extraneous information that was West's IP, including West logos on each section of the law (representing many thousands of copies). Zvenyach had Joshua Tauberer come by and spend a day removing every bit of West IP from the document and quickly releasing a downloadable copy of the DC Code with a CC0 public domain license.
Tom MacWright notes that this is just one step:
There are a few things that this isn't: it isn't the official copy of the code, and lawyers would be ill-advised to cite it alone. It isn't up-to-date: the council is fast-moving and this is just a snapshot. In time we'll fix these problems too.
What can people do with an open source set of DC laws? We can think of a lot of things, but the best part is when people do things we don't think of. Some commenters on MacWright's post wondered why this matters; can't you just find the code on the existing website? Yes, you can't link directly to a part of the code, and can only download pieces in Microsoft Word, but so what?
The "so what" is all the ways someone could build better tools to make it easier to find the laws. Someone already made a tool that's for some purposes better than the official site. Or people could write automated programs to compare the laws on some topics, like yielding to pedestrians, to those in other states. (Hey, that would be a great idea! Has someone done that yet?)
Do you have ideas or want to implement some? MacWright is organizing a hackathon on Sunday. If you build something neat with the code, let us know and we'll show it off here.
Say you're moving to the area, have a job, and want to find places with good transit to work. How do you figure it out? A lot of people just look at the Metro map and don't consider other modes, but a new service called AutNo is trying to help people locate near transit.
This is actually a problem I hear often. A family friend moved to DC a couple of years ago, for a job at PriceWaterhouseCoopers in Tysons. The Silver Line was still a few years off, but he wanted to live in a vibrant, urban neighborhood. Where should he go?
The bus maps are daunting to decipher. It took me a couple of hours to really puzzle through the combinations and cross-reference them with my general knowledge of housing prices in various neighborhoods.
Boston-based AutNo tries to help by putting rental listings and trip planning together in one interface. You can view available rentals (it doesn't have places for sale, yet), click on one, and see transit directions to your office or another location you specify.
The about page reads:
AutNo is the first apartment search designed and developed specifically for people without cars. For the first time since the automobile was invented, the percentage of Americans who drive to school or work is on the decline. Gas prices are skyrocketing and automobile carbon emissions are contributing to global warming. Commuting and living without an automobile is the way of the future for many people. AutNo is dedicated to helping these people find apartments.
It will also show driving routes to work if you want them.
You can narrow down results by price and number of bedrooms. A future feature that would be helpful is to also let people restrict the searches by travel time. That way, you could say that you want a place under $2,000 a month that's no more than a 45 minute trip to work, or whatever.
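A travel-time filter like that is simple once each listing carries a trip time. Here is a minimal sketch, assuming a trip planner has already computed the transit minutes to work for every listing. The field names and sample listings are hypothetical and do not reflect AutNo's actual data model.

```python
def filter_listings(listings, max_rent, max_minutes, min_bedrooms=0):
    """Keep listings that fit the rent, travel-time, and bedroom limits."""
    return [
        listing for listing in listings
        if listing["monthly_rent"] <= max_rent
        and listing["transit_minutes"] <= max_minutes
        and listing["bedrooms"] >= min_bedrooms
    ]


# Made-up listings for illustration only. "transit_minutes" is the
# door-to-door trip time to work, assumed to be precomputed.
SAMPLE_LISTINGS = [
    {"address": "Apartment A", "monthly_rent": 1800, "bedrooms": 1, "transit_minutes": 35},
    {"address": "Apartment B", "monthly_rent": 1950, "bedrooms": 2, "transit_minutes": 60},
    {"address": "Apartment C", "monthly_rent": 2400, "bedrooms": 2, "transit_minutes": 20},
]

# "Under $2,000 a month and no more than a 45 minute trip to work."
matches = filter_listings(SAMPLE_LISTINGS, max_rent=2000, max_minutes=45)
```

The hard part is not the filter but getting the trip times, which is exactly where open transit schedule data comes in.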
Basically, combine this with Mapnificent:
And, at the risk of sounding like a broken record: this is why open data is valuable. A transit agency might build a great app, but they're never going to build a mash-up of real estate data and transit data. When it's easy to put transit routing into an app, you not only can build apps that give people transit routing, but tools and apps that combine transit routing with almost anything else.
Update: I hadn't known it, but WalkScore actually has this exact Mapnificent-style feature. You can filter apartment listings by transit distance to a point:
However, when you click on an apartment, WalkScore does not show you the transit routing with trains and buses you would take, while AutNo does. Without that information, people won't as easily learn which buses might work best for them or be able to judge whether a location is really as accessible by transit as the system says.
It would be best to have both at once on the same site; as it is now, I'd recommend that people use a combination of both tools for their search.
There's a deep, persistent, and crippling problem with the laws of DC: you can't download a copy.
Due to a weak contract and a variety of legal techniques, it's not possible to create better ways to read the law or download it for offline access, or even to try to do better than the crummy online portal that serves as its official source.
It also means that it's hard to discuss legal matters online, since you can't link to specific laws.
How the law became scarce
How did this happen? The answer is a tricky mix of access, ownership, and contracts.
The DC Council writes and publishes bills, which are additions and subtractions to the law itself. The law is compiled by a contractor.
The contractor publishes a few different versions of the "compiled law," each with restrictions:
- The online portal has a "browsewrap" restriction against copying in full.
- The CD they publish has a "clickwrap" restriction against copying at all.
- Even the printed version has a registered copyright by the Council itself.
Unfortunately, courts have upheld these types of restrictions in the CD and website Terms of Service. They get further support from the wire fraud statute, which prosecutors used in the Aaron Swartz case to escalate charges to felonies. And in all of these versions, the contractor tries to claim copyright through compilation copyright and additional content like citations and prefaces.
In the face of these strong guards against freeing the law, the most reasonable avenue for creating a freely-accessible copy is buying and scanning the printed copies, which is exactly what some citizens are starting to do.
Why this matters
This has effects in many places. Advocacy organizations pushing for changes can't reference laws by linking to them, so they have to copy and paste relevant sections and hope that people trust their versions. Of course, when laws go out of date, these copied-and-pasted guides stop working.
The goal of better educating the police about laws (like the rules of the road for bicyclists) is harder. Police can't have an offline copy of the law for quick access in the field, and the online version is near-useless on smartphones.
It's also locking the DC Council into using a contractor for this purpose. DC's contracts with WestLaw and LexisNexis aren't strong enough to force the contractors to provide them with a copyright-cleaned version, so the council itself doesn't have a compiled copy of the law that they can publish by themselves if they want to take this in-house.
This is a hard problem to unwrap and fix, and there are multiple efforts afoot.
Waldo Jaquith is building The State Decoded, an open-source system for storing and displaying state codes. It's already deployed with Virginia's laws. Public Resource.org is working on the long task of
Meanwhile, it'll be months or years until it's possible to download DC's laws onto your iPhone and clarify whether it is, indeed, legal to bike on a sidewalk (sometimes) or drink in public space (never).
One of the best effects of open data is when people correlate data sets from very different places to generate interesting information. This graph cleverly combines DC's school quality tiers (known as "accountability categories") with Walk Score:
Sandra Moscoso wrote yesterday about how Code for DC's School Decisions Project has been gathering coders who want to use open data to help parents, students, and policymakers. This is one of the graphs they created at the recent Open Data Day using data from the Office of State Superintendent of Education (OSSE).
I've asked to get access to the raw spreadsheet for this graph so we can look at, for example, which schools each dot represents. Here are the accountability categories by school. I will add the spreadsheet with WalkScore matched up with category when it's available. Update: here's the data as a CSV file.
A few things immediately jump out. The most successful DCPS schools have high Walk Scores, while the least successful ones mostly (but not entirely) cluster in the lower range. This may reflect the fact that a public school's success has a lot to do with the socioeconomic status of the neighborhood, and the local retail that is a big part of Walk Score locates in areas with higher incomes.
That income effect is also very pronounced in the graph Sandra posted yesterday:
That's not the case with charter schools. 3 of the 5 "reward" charters are in low-Walk Score areas (which could mean something, or just be a consequence of little data), while the "Rising" charters are basically all over the place. This may have a lot to do with the simple fact that since charters have to find and pay for their own space, they're in all manner of locations.
An interesting future step might be to correlate the school tiers with some data set about land prices or rents, or resident incomes. That could help illuminate whether charters end up locating in less-expensive areas, because they want to serve poorer residents and/or because they need cheaper land.
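Comparisons like the ones above can be reproduced from a CSV in a few lines. Here is a minimal sketch, assuming hypothetical column names (`sector`, `category`, `walk_score`) and made-up rows. The real file's layout may differ, and only two of the tiers are shown.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Made-up rows for illustration; not the actual OSSE or Walk Score data.
SAMPLE_CSV = """school,sector,category,walk_score
School A,DCPS,Reward,92
School B,DCPS,Reward,85
School C,DCPS,Rising,48
School D,Charter,Reward,40
School E,Charter,Rising,77
School F,Charter,Rising,33
"""


def median_walk_score(csv_text):
    """Return {(sector, category): median Walk Score} for the rows given."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["sector"], row["category"])
        groups[key].append(int(row["walk_score"]))
    return {key: median(scores) for key, scores in groups.items()}


medians = median_walk_score(SAMPLE_CSV)
```

Swapping the `walk_score` column for a rent or income column would give the land-price comparison suggested above, provided someone first matches each school to that other data set.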
What do you see from looking at this data?
Starting at 12:06, Greater Greater Washington contributor Veronica Davis, WABA head Shane Farthing, and Arlington bike planner Chris Eatough will talk about bicycling in DC on the Kojo Nnamdi Show. Listen live or catch the archived audio once it's posted this afternoon.
They also posted this video which visualizes a few days of Capital Bikeshare trips:
This is yet another consequence of Capital Bikeshare's excellent decision to provide anonymous trip data. People have done all kinds of useful things with the data, like MV Jantzen's similar video and interactive visualization tool.
At the recent International Open Data Hackathon, Justin Grimes put the DC budget into a "treemap," a chart that shows a lot of items as rectangles of different sizes. This makes it very easy to understand how much money is going to different functions.
Since Justin's spreadsheet was public, I was able to make a copy to tweak a few things. I modified some of the titles to get the agency's abbreviation to the start, so that you can understand more of them in the top-level chart, and revised the color scale to one that should be more perceptible to color-blind readers.
The colors represent which categories increased or decreased in FY2013, the budget approved last year for the fiscal year we're in now. Green boxes increased more, while purple boxes decreased. However, categories in the DC budget sometimes grow and shrink because functions get shifted from one to another, so it can be tricky to really understand the increase and decrease numbers without delving deeply into the budget.
What do you notice in the budget?
And if you make a better treemap using a tool without some of the limitations of the Google one, or make a treemap for another area jurisdiction's budget, let us know at email@example.com.
Thanks to Sandra Moscoso for the tip.
WMATA planners helped STLTransit create an animation of transit across the entire Washington region. That's possible because WMATA has a single data file with all regional agencies' schedules. They hope to make that file public; that would fuel even more tools that aid the entire region.
Click full screen and HD to see the most detail.
One of the obstacles for people who want to build trip planners, analyze what areas are accessible by transit, design visualizations, or create mobile apps is that our region has a great many transit agencies, each with their own separate data files.
Want to build a tool that integrates Metrobus, Fairfax Connector, and Ride On? You have to chase down a number of separate files from different agencies in a number of different places, and not all agencies offer open data at all.
The effect is that many tool builders, especially those outside the region, don't bother to include all of our regional systems. For example, the fun tool Mapnificent, which shows you everywhere you can reach in a set time from one point by transit, only includes WMATA, DC Circulator, and ART services. That means it just won't know about some places you can reach in Fairfax, Alexandria, Montgomery, or Prince George's.
Sites like this can show data for many cities all across the world without the site's author having to do a bunch of custom work in every city, because many transit agencies release their schedules in an open file format called the General Transit Feed Specification (GTFS). Software developer Matt Caywood has been maintaining a list of which local agencies offer GTFS files as well as open real-time data.
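Part of why GTFS makes this possible is that a feed is just a zip file of comma-separated text tables with standard names and columns, such as `agency.txt` and `routes.txt`. Here is a minimal sketch of reading one. The feed is a tiny made-up example built in memory, not any local agency's real file, and the helper names `build_sample_feed` and `read_table` are hypothetical.

```python
import csv
import io
import zipfile
from collections import Counter


def build_sample_feed():
    """Build a tiny, made-up GTFS-style zip in memory for illustration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "agency.txt",
            "agency_id,agency_name,agency_url,agency_timezone\n"
            "A1,Example Transit,http://example.com,America/New_York\n",
        )
        zf.writestr(
            "routes.txt",
            "route_id,agency_id,route_short_name,route_long_name,route_type\n"
            "R1,A1,10,Main Street,3\n"
            "R2,A1,20,Crosstown,3\n",
        )
    buf.seek(0)
    return buf


def read_table(feed, name):
    """Read one table from a GTFS zip into a list of dicts."""
    with zipfile.ZipFile(feed) as zf:
        with zf.open(name) as raw:
            # utf-8-sig tolerates the byte-order mark some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


feed = build_sample_feed()
agencies = {row["agency_id"]: row["agency_name"] for row in read_table(feed, "agency.txt")}
routes_per_agency = Counter(
    agencies[row["agency_id"]] for row in read_table(feed, "routes.txt")
)
```

Because the table and column names are the same everywhere, the same `read_table` call works on any agency's feed, which is what lets one site cover many cities without custom work for each.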
We've made some progress. Fairfax Connector, for example, recently started offering its own GTFS feed. But while DASH has one, you have to email them for it, and there's none for Prince George's The Bus.
The best way to foster more neat tools and apps would be to have a single GTFS file that includes all systems. As it turns out, there is such a beast. WMATA already has all of the schedules for all regional systems for its own trip planner. It even creates a single GTFS file now.
Michael Eichler wrote on PlanItMetro that they give this file to the regional Transportation Planning Board for its modeling, and offered it to STLTransit, who have been making animations showing all transit in a region across a single day.
This is one of many useful ways people could use the file. How about letting others get it? Eichler writes, "We are working to make this file publicly available."
Based on the STLTransit video, WMATA's file apparently includes 5 agencies that Caywood's list says have no public GTFS files: PG's TheBus, PRTC OmniLink and OmniRide, Fairfax CUE, Frederick TransIT, and Loudoun County Transit. It also covers Laurel Connect-a-Ride, Reston LINK, Howard Transit, the UM Shuttle, and Annapolis Transit, which aren't even on that list and which most software developers might not even think to look for even if they did have available files.
Last I heard, the obstacles to the file being public included WMATA getting permission from the regional transit agencies, and some trepidation by folks inside the agency about whether they should take on the extra work to do this or would get criticized if the file has any errors.
Let's hope they can make this file public as soon as possible. Since it already exists, it should be a no-brainer. If any regional agencies or folks at WMATA don't understand why this is good for transit, a look at this video should bring it into clear focus.