Posts about Data Openness
If you go to the large and interactive version on PlanItMetro, you can mouse over individual squares to see the date as a tooltip.
The darkest red days have the lowest ridership, the darkest green the highest. You can see high ridership events like President Obama's January 2009 inauguration, the Stewart/Colbert rally in October 2010, Snowmageddon/Snowpocalypse in February 2010, and more.
Stepping back, it's clear how ridership is highest in April, June, and July, and the number of very high ridership days jumped significantly in 2008 but then has stayed flat or a bit down since. Weekend ridership has gotten lower in recent years, probably because of all the trackwork.
What do you notice?
June 1 is National Day of Civic Hacking and a series of DC education projects are planned for DC's Hack for Change event. In anticipation of this event, the Office of the State Superintendent of Education (OSSE) released some very exciting data about school populations that can potentially help families negotiate school choice.
This data is particularly useful when combined with other datasets. It can help to answer questions like:
- If I enroll my child in a school outside of our neighborhood, will there be a critical mass of students there from our neighborhood? (I think about this a lot, as I currently rely on other families for carpools to/from after-school activities).
- What is the neighborhood composition of my children's school? (There are infinite conversations about in-boundary versus out-of-boundary students on school listservs. It would be nice to differentiate between out of boundary by a block or two and out of the ward altogether.)
Adding more elements to the data, like special programming or lottery results (we're hoping to get multi-year results from DCPS and charters) could also provide information about the type of "school formula" that seems to attract families. Who knows? I can't wait to find out!
The Sunlight Foundation has put together a great interactive map of contributions for the April 23 DC Council at-large special election.
Map by the Sunlight Foundation. Contribution data from the April 15 release by the DC Office of Campaign Finance.
Their article by Ryan Sibley also shows many other interesting statistics, such as who got money from outside the region, the balance of corporate and individual contributions (Anita Bonds and Michael Brown got only about half individual contributions, while it's nearly 100% for Silverman), and more.
Sibley also notes that while DC's Office of Campaign Finance releases computer-readable data files with contribution information, some data is not in those files, like which candidate goes with a campaign committee. That's in PDFs, but PDF data isn't usable in mash-ups without human work.
What do you notice?
After open data advocates pointed out how ridiculous it is that private companies have a copyright on the only publicly-available versions of DC's laws, DC Council General Counsel David Zvenyach helped make a public domain version and posted it online.
Tom MacWright explained the problem last month. DC, like many governments, contracts with a company (in this case LexisNexis) to compile all of the laws and keep them updated as they change. They post the laws online, but with licenses that restrict your rights to reuse the information, even though it's the public law.
Rather than ignoring the problem or issuing silly legal threats against people who were digitizing the code without permission, Zvenyach worked with the advocates to create a version of the code free of these restrictions.
Mike Masnick writes at TechDirt:
"Part of the issue was that the only digital copy of the code that they had was the one given to them by West, and it contained a variety of extraneous information that was West's IP, including West logos on each section of the law (representing many thousands of copies). Zvenyach had Joshua Tauberer come by and spend a day removing every bit of West IP from the document and quickly releasing a downloadable copy of the DC Code with a CC0 public domain license."
Tom MacWright notes that this is just one step:
"There are a few things that this isn't: it isn't the official copy of the code, and lawyers would be ill-advised to cite it alone. It isn't up-to-date: the council is fast-moving and this is just a snapshot. In time we'll fix these problems too."
What can people do with an open source set of DC laws? We can think of a lot of things, but the best part is when people do things we don't think of.
Some commenters on MacWright's post wondered why this matters; can't you just find the code on the existing website? Yes, you can't link directly to a part of the code, and can only download pieces in Microsoft Word, but so what?
The "so what" is all the ways someone could build better tools to make it easier to find the laws. Someone already made a tool that's for some purposes better than the official site. Or people could write automated programs to compare the laws on some topics, like yielding to pedestrians, to those in other states. (Hey, that would be a great idea! Has someone done that yet?)
Do you have ideas or want to implement some? MacWright is organizing a hackathon on Sunday. If you build something neat with the code, let us know and we'll show it off here.
The Sunlight Foundation has named Greater Greater Education contributor Sandra Moscoso as an "OpenGov Champion" for her efforts to get more open education data in DC for parents and policymakers. Sandra also talks about her quest to ensure funding for school librarians and the effect librarians have on children, which she has blogged about here.
Also, Sandra shows off her bicycle-family cred by having 4 (or is that 5?) bicycles in the background during the interview. Congratulations, Sandra!
Say you're moving to the area, have a job, and want to find places with good transit to work. How do you figure it out? A lot of people just look at the Metro map and don't consider other modes, but a new service called AutNo is trying to help people locate near transit.
This is actually a problem I hear often. A family friend moved to DC a couple of years ago, for a job at PriceWaterhouseCoopers in Tysons. The Silver Line was still a few years off, but he wanted to live in a vibrant, urban neighborhood. Where should he go?
The bus maps are daunting to decipher. It took me a couple of hours to really puzzle through the combinations and cross-reference them with my general knowledge of housing prices in various neighborhoods.
Boston-based AutNo tries to help by putting rental listings and trip planning together in one interface. You can view available rentals (it doesn't have places for sale, yet), click on one, and see transit directions to your office or another location you specify.
The about page reads:
"AutNo is the first apartment search designed and developed specifically for people without cars. For the first time since the automobile was invented, the percentage of Americans who drive to school or work is on the decline. Gas prices are skyrocketing and automobile carbon emissions are contributing to global warming. Commuting and living without an automobile is the way of the future for many people. AutNo is dedicated to helping these people find apartments."
It will also show driving routes to work, if you want them.
You can narrow down results by price and number of bedrooms. A helpful future feature would be to let people restrict searches by travel time as well. That way, you could say that you want a place under $2,000 a month that's no more than a 45-minute trip to work, or whatever.
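To make that wished-for filter concrete, here is a minimal sketch in Python. It is not how AutNo works internally; the `Listing` fields, the `filter_listings` helper, and the sample listings are all made up for illustration, and the transit times are assumed to have been computed already by some trip planner.

```python
from dataclasses import dataclass

@dataclass
class Listing:
    # Hypothetical fields; a real rental feed would look different.
    name: str
    monthly_rent: int      # dollars per month
    bedrooms: int
    transit_minutes: int   # transit trip time to the workplace, precomputed

def filter_listings(listings, max_rent, max_minutes, min_bedrooms=0):
    """Keep listings under the rent cap and within the travel-time limit."""
    return [
        item for item in listings
        if item.monthly_rent < max_rent
        and item.transit_minutes <= max_minutes
        and item.bedrooms >= min_bedrooms
    ]

# Made-up sample data, not real listings.
listings = [
    Listing("Listing A", 1800, 1, 40),
    Listing("Listing B", 1900, 2, 60),
    Listing("Listing C", 2400, 2, 25),
]

# "Under $2,000 a month and no more than a 45-minute trip to work."
matches = filter_listings(listings, max_rent=2000, max_minutes=45)
print([item.name for item in matches])
```

The hard part in practice is the `transit_minutes` column, which is exactly the piece that open transit data makes possible.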
Basically, combine this with Mapnificent:
And, at the risk of sounding like a broken record: this is why open data is valuable. A transit agency might build a great app, but they're never going to build a mash-up of real estate data and transit data. When it's easy to put transit routing into an app, you not only can build apps that give people transit routing, but tools and apps that combine transit routing with almost anything else.
Update: I hadn't known it, but WalkScore actually has this exact Mapnificent-style feature. You can filter apartment listings by transit distance to a point:
However, when you click on an apartment, WalkScore does not show you the transit routing with the trains and buses you would take, while AutNo does. Without that information, people won't as easily learn which buses might work best for them or be able to judge whether a location is really as accessible by transit as the system says.
It would be best to have both at once on the same site; as it is now, I'd recommend that people use a combination of both tools for their search.
There's a deep, persistent, and crippling problem with the laws of DC: you can't download a copy.
Due to a weak contract and a variety of legal techniques, it's not possible to create better ways to read the law or download it for offline access, or even to try to do better than the crummy online portal that serves as its official source.
It also means that it's hard to discuss legal matters online, since you can't link to specific laws—
How the law became scarce
How did this happen? It's a tricky answer of access, ownership, and contracts.
The DC Council writes and publishes bills, which are additions and subtractions to the law itself. The law is compiled by a contractor—
The contractor publishes a few different versions of the "compiled law," each with restrictions:
- The online portal has a "browsewrap" restriction against copying in full.
- The CD they publish has a "clickwrap" restriction against copying at all.
- Even the printed version has a registered copyright by the Council itself.
Unfortunately, courts have upheld these types of restrictions in the CD and website Terms of Service. They get further support from the wire fraud statute, which prosecutors used in the Aaron Swartz case to escalate charges to felonies. And in all of these versions, the contractor tries to claim copyright through compilation copyright and additional content like citations and prefaces.
In the face of these strong guards against freeing the law, the most reasonable avenue for creating a freely-accessible copy is buying and scanning the printed copies, which is exactly what some citizens are starting to do.
Why this matters
This has effects in many places. Advocacy organizations pushing for changes can't reference laws by linking to them, so they have to copy and paste relevant sections and hope that people trust their versions. Of course, when laws go out of date, these copied-and-pasted guides stop working.
The goal of better educating the police about laws (like the rules of the road for bicyclists) is harder. Police can't have an offline copy of the law for quick access in the field, and the online version is near-useless on smartphones.
It's also locking the DC Council into using a contractor for this purpose. DC's contracts with WestLaw and LexisNexis aren't strong enough to force the contractors to provide them with a copyright-cleaned version, so the council itself doesn't have a compiled copy of the law that they can publish by themselves if they want to take this in-house.
This is a hard problem to unwrap and fix, and there are multiple efforts afoot.
Waldo Jaquith is building The State Decoded, an open-source system for storing and displaying state codes. It's already deployed with Virginia's laws. Public Resource.org is working on the long task of
Meanwhile, it'll be months or years until it's possible to download DC's laws onto your iPhone and clarify whether it is, indeed, legal to bike on a sidewalk (sometimes) or drink in public space (never).
On Friday DCPS released its initial budget allocations for the 2014 school year. This year's budget includes more information to help average parents and residents better understand the budget.
Easy-to-digest breakouts for individual schools show how DCPS is allocating funds among administrators, classroom teachers, special education, arts, and more. They show how each school's funding has changed from the previous year and enrollment trends over several years. For example, here is the worksheet for Maury Elementary.
This format makes it simpler for people who are interested in the budget, but can't spend the time poring over complex graphs or the rules that dictate how DCPS allocates money. DCPS is even providing a one-pager that allows you to quickly compare all DCPS schools on overall allocation and per pupil funding.
For people who want to compare funding among schools (for example, to see how many schools get funding for a full-time librarian), however, the PDFs don't organize the data in the most useful way. To analyze information across schools in more detail, you have to invest a significant amount of time clicking through individual PDFs, copying and pasting each amount and category into a spreadsheet.
To improve upon the current format, and to truly make this process transparent, DCPS could publish the budgets in a structured format like a spreadsheet, breaking out the allocations so that each allocation category is a cell in a row. Even better, OSSE could publish DCPS and DC public charter school allocations in a common structured (spreadsheet) format.
The best scenario would be to publish all school budgets as open data (free and machine-readable) in a central data catalog like data.dc.gov, and to publish not only allocations but actual expenditures. How about not just how we planned to spend the money, but how it was actually spent?
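As a rough illustration of the structured format described above, with one row per school and one allocation category per cell, here is a short Python sketch. The column names, the `to_csv` and `schools_funding` helpers, and every dollar figure are placeholders invented for this example; they are not actual DCPS allocations or an actual DCPS file layout.

```python
import csv
import io

# Placeholder figures for illustration only; not actual DCPS allocations.
rows = [
    {"school": "School A", "enrollment": 300, "classroom_teachers": 1500000, "librarian": 90000},
    {"school": "School B", "enrollment": 250, "classroom_teachers": 1200000, "librarian": 0},
    {"school": "School C", "enrollment": 400, "classroom_teachers": 2000000, "librarian": 45000},
]

def to_csv(rows):
    """Write one row per school, one allocation category per column."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def schools_funding(csv_text, category):
    """List the schools with a non-zero allocation in the given category."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["school"] for row in reader if float(row[category]) > 0]

csv_text = to_csv(rows)
with_librarian = schools_funding(csv_text, "librarian")
print(with_librarian)
```

With the data in this shape, the librarian question becomes a one-line query instead of an afternoon of copying numbers out of PDFs.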
One of the best effects of open data is when people correlate data sets from very different places to generate interesting information. This graph cleverly combines DC's school quality tiers (known as "accountability categories") with Walk Score:
Sandra Moscoso wrote yesterday about how Code for DC's School Decisions Project has been gathering coders who want to use open data to help parents, students, and policymakers. This is one of the graphs they created at the recent Open Data Day using data from the Office of the State Superintendent of Education (OSSE).
I've asked to get access to the raw spreadsheet for this graph so we can look at, for example, which schools each dot represents. Here are the accountability categories by school. I will add the spreadsheet with WalkScore matched up with category when it's available. Update: here's the data as a CSV file.
A few things immediately jump out. The most successful DCPS schools have high Walk Scores, while the least successful ones mostly (but not entirely) cluster in the lower range. This may reflect the fact that a public school's success has a lot to do with the socioeconomic status of the neighborhood, and the local retail that is a big part of Walk Score locates in areas with higher incomes.
That income effect is also very pronounced in the graph Sandra posted yesterday:
That's not the case with charter schools. 3 of the 5 "reward" charters are in low-Walk Score areas (which could mean something, or just be a consequence of little data), while the "Rising" charters are basically all over the place. This may have a lot to do with the simple fact that since charters have to find and pay for their own space, they're in all manner of locations.
An interesting future step might be to correlate the school tiers with some data set about land prices or rents, or resident incomes. That could help illuminate whether charters end up locating in less-expensive areas, because they want to serve poorer residents and/or because they need cheaper land.
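For anyone who wants to try this kind of mash-up, the core step is a join on school name followed by a per-category summary. Here is a minimal Python sketch. The two CSV snippets, their column names, and the scores are invented for illustration; the real OSSE and Walk Score files may be laid out differently.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Invented sample data; column names and values are assumptions.
categories_csv = """school,category
School A,Reward
School B,Rising
School C,Reward
School D,Rising
"""

walkscore_csv = """school,walk_score
School A,90
School B,55
School C,80
School E,70
"""

def join_on_school(categories_text, scores_text):
    """Match each school's category with its Walk Score, dropping non-matches."""
    scores = {
        row["school"]: int(row["walk_score"])
        for row in csv.DictReader(io.StringIO(scores_text))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(categories_text)):
        if row["school"] in scores:
            joined.append((row["school"], row["category"], scores[row["school"]]))
    return joined

def median_score_by_category(joined):
    """Group the joined rows by category and take the median Walk Score."""
    groups = defaultdict(list)
    for _school, category, score in joined:
        groups[category].append(score)
    return {category: median(values) for category, values in groups.items()}

joined = join_on_school(categories_csv, walkscore_csv)
summary = median_score_by_category(joined)
print(summary)
```

Swapping the Walk Score file for a table of rents or household incomes would give the land-price comparison suggested above, as long as both files share a common key such as the school name.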
What do you see from looking at this data?
DC has made a major commitment to pre-kindergarten education. Are these programs improving kids' performance in the rest of their education? Based on information available so far, we don't know for sure. We do know that a pre-K program has to be high quality to make a difference, and some do better than others.
In his State of the Union address, President Obama proposed greatly expanding pre-k and other early childhood education programs nationwide.
President Obama mentioned the success that Georgia and Oklahoma have had with their early education programs. He failed to mention that here, in the District of Columbia, high-quality education is already widespread for 3- and 4-year-olds, an accomplishment that we should celebrate.
DC's Pre-K Enhancement and Expansion Amendment Act of 2008 guarantees all DC 3- and 4-year-olds a pre-K seat. While the District's claims of having already achieved "universal access" can be debated, the important question, today, is "are early education programs having an impact on student achievement?"
What is the impact of early education in DC?
Last summer, Mayor Gray, along with officials from the Office of the State Superintendent of Education, announced that District students participating in pre-kindergarten programs demonstrated gains in overall proficiency by 3rd and 4th grade on the 2012 District of Columbia Comprehensive Assessment System (DC CAS).
City officials claim that the test results showed slight increases for both reading and math District-wide by students from both DC Public Schools and public charter schools who had attended early education programs.
However, a study released last December by DC Action for Children shows no significant improvement in the math and reading performance of third-grade students in DC public schools since 2007.
These two contrasting views clearly demonstrate that we need better data to evaluate our early education programs. (It's also important to note that the two evaluations used different methodologies.)
Our early education programs must be high quality
Even if early education isn't helping DC students today, it doesn't necessarily mean we shouldn't invest in universal early education. It means we should invest in high-quality universal early education.
It is generally accepted that early education programs must meet specific quality standards to impact a child's cognitive and social development. The President and early education advocates have long cited studies from the Perry Preschool Project, in Ypsilanti, Michigan, and the Carolina Abecedarian Project, in Chapel Hill, North Carolina, to claim that "high-quality early education provides the foundation for all children's success in school and helps to reduce achievement gaps."
In the District, the 2008 law requires many of the elements of "high-quality" programs, including small class sizes (16 children and 2 adults) and an approved curriculum. Lead teachers must have a bachelor's degree or higher, and assistant teachers need at least an associate's degree.
The legislation includes provisions for professional development for teachers; comprehensive, wrap-around services for children and their families, including home visits; and a parental component, including educational workshops, parent association meetings, and parent-teacher conferences.
While I haven't visited all District schools and I've been in only a few of the private and nonprofit early education classrooms that get support from the District's Pre-K legislation, I have seen remarkable differences among the classrooms. High-quality classrooms should be stocked with developmentally-appropriate materials. Children should be able to move around the classroom, engaging in hands-on activities. Adults should interact meaningfully with those children, helping them deepen the knowledge that they're gaining through play.
Many programs meet this standard; unfortunately, others fail this "I'll know it when I see it" test, despite their ability to check off the other more tangible measures of "quality."
There is hope; data is coming
District officials acknowledge, in a recent RAISE DC report, that the District lacks a clear measure for kindergarten readiness, "making it difficult to know overall where our youngest children are in meeting academic and developmental benchmarks." Additionally, despite the fact that the Office of the State Superintendent of Education (OSSE) insists that every preschool and pre-K program meet its new "gold" standard when applying for funding, a visit to classrooms in different parts of the city clearly shows that differences do exist.
For the past few years, OSSE has been developing a long-awaited Statewide Longitudinal Education Data Warehouse (SLED). Developed with federal Department of Education funds, the SLED will track students across DCPS and the charters from kindergarten through high school.
The SLED works by assigning every student in the District a unique student identifier and using that number to track students through their educational development, even as they change schools. By running a report in SLED, education agencies and school staff can look at real-time, standardized enrollment data broken down by gender, ward, and grade. Appropriate staff can also look at assessment scores and individual student progress.
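The unique identifier is what makes longitudinal tracking possible: records from different schools and years can be stitched together because they share one key. The Python sketch below shows the idea only. It says nothing about how SLED is actually built, and the record layout, the helper names, and the student IDs are all hypothetical.

```python
from collections import defaultdict

# Hypothetical enrollment records: (student_id, school_year, school, grade).
enrollments = [
    ("S001", 2011, "School A", 2),
    ("S002", 2011, "School A", 3),
    ("S001", 2012, "School B", 3),
    ("S002", 2012, "School A", 4),
]

def history_by_student(records):
    """Group records by student ID, ordered by school year."""
    histories = defaultdict(list)
    for student_id, year, school, grade in sorted(records, key=lambda r: (r[0], r[1])):
        histories[student_id].append((year, school, grade))
    return dict(histories)

def changed_schools(histories):
    """Return the IDs of students who appear at more than one school."""
    return sorted(
        student_id
        for student_id, history in histories.items()
        if len({school for _year, school, _grade in history}) > 1
    )

histories = history_by_student(enrollments)
movers = changed_schools(histories)
print(histories["S001"])
print(movers)
```

Without a shared identifier, the student who moved from one school to another would look like two unrelated children, one in each school's records.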
The current SLED incorporates 9 years of enrollment audit data and the last 5 years of DC CAS data. OSSE wants to expand SLED to include early childhood, college enrollment and adult education data starting later this year.
Having comprehensive and longitudinal data for students is imperative if we want to track and improve educational outcomes for students, as well as if we want to ensure programs that we're investing in are meeting their goals—