The US government’s efforts to create a culture of open government data are a big deal. Hopefully they signal a shift from the “pull” model of FOIA to a “push” mindset in which data is proactively returned to the public without anyone first having to ask (and pay). Still, data.gov has a lot of room for improvement, as Clay Johnson of Sunlight Labs mentions here and here.
Clay’s criticisms are well founded, but what I’d like to see more of is some brainstorming about what our ideal data.gov would look like. A recent post about the coming data.gov.uk site provides a nice foil, since the UK seems to be taking a very different approach. But more importantly, what would you want to see in a government data site, and how would you use it?
I once heard that it is a good exercise to try to compress an idea into three sentences or fewer, because it forces you to understand what you really want to say. So here is my three-sentence suggestion:
- Bring it all under one roof. The current data.gov site is like Yahoo! from the mid-90s: it is just a directory of links to other sites. This is a noble start, but we really need a single point of access if we want to revolutionize eGovernment. The government is an immense, heterogeneous organization, so this is as much an organizational challenge as a technical one. But there are plenty of precedents for systems that allow individual data publishers (the government agencies) to retain control over the publishing and updating of their own data, while allowing data consumers (the public) to access it all from a single location.
- …But don’t forget to give credit. When offering a single access point for all the data, it is essential to keep metadata that tracks which data came from where. This is as important for bookkeeping and data integration as it is for simply giving credit where credit is due. Agencies that publish especially useful data sets should be recognized for their work.
- Build it as you go. We don’t need the perfect system overnight. No single ontology, schema, or data format will be able to encompass all the government’s data. That’s OK — it doesn’t have to. Don’t let fear of not getting it perfect slow down incremental progress toward our goal. Just bringing the data under one roof is a fantastic start; you can always try to begin standardizing formats and “linking” it later. My next blog post will specifically address this topic.
- …And version data sets. The benefits of offering “versions” of data sets are threefold. First, it allows you to maintain a system in which data providers feel comfortable updating their data at will. Second, it allows the implementors of the system to feel comfortable experimenting with data integration techniques, knowing that if an experiment doesn’t work out, users still have access to the same system they had last week. Third, it is the ultimate expression of openness: like a Subversion repository for the government, everyone will be able to see the evolution of data over time.
- Help users discover data. With the sheer volume of data available, publishing it isn’t enough — you have to help people find what they want. The current data.gov site already does a decent job of offering search functionality. We can go further, providing data “footnotes” for bloggers to link back into the data.gov site (see the DataPress project for an idea of how this might work), suggestions of “hot” data sets for particular areas of interest, or a government data blog that highlights new and important data that has been recently published.
- …And let them tell you what they want. Your users, the citizens, are your best assets. Let them prioritize your tasks for you by allowing them to suggest and vote on features and data sets they would like to see added. This type of decentralized management strategy is making waves in the business community, and the same mindset can be applied to government.
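To make the “one roof, with credit, with versions” part concrete, here is a minimal sketch in Python of what such a catalog might look like. Everything in it is my own invention for illustration: the names `Catalog`, `DatasetVersion`, `publish`, `get`, and `history`, as well as the example agency and URL, are hypothetical and do not describe anything data.gov actually exposes. The idea is simply that each agency update appends a new immutable snapshot carrying its own provenance, so nothing is ever overwritten.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable snapshot of a data set, with provenance attached."""
    version: int       # 1-based version number within the data set
    agency: str        # who published it (the "credit")
    source_url: str    # where the agency hosts the original (hypothetical)
    published: date
    rows: tuple        # the data itself, kept opaque in this sketch


class Catalog:
    """A single access point; updates append versions instead of overwriting."""

    def __init__(self):
        self._versions = {}  # data set name -> list of DatasetVersion

    def publish(self, name, agency, source_url, published, rows):
        """Record a new snapshot and return it."""
        versions = self._versions.setdefault(name, [])
        snapshot = DatasetVersion(
            len(versions) + 1, agency, source_url, published, tuple(rows)
        )
        versions.append(snapshot)
        return snapshot

    def get(self, name, version=None):
        """Return the latest snapshot, or a specific 1-based version."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise LookupError(f"{name!r} has no version {version}")
        return versions[version - 1]

    def history(self, name):
        """Return every snapshot of a data set, oldest first."""
        return list(self._versions[name])


# Made-up example data, purely to show the calls.
catalog = Catalog()
catalog.publish("air-quality", "Example Agency",
                "https://agency.example/air.csv", date(2009, 6, 1),
                [("Boston", 41)])
catalog.publish("air-quality", "Example Agency",
                "https://agency.example/air.csv", date(2009, 7, 1),
                [("Boston", 41), ("Denver", 38)])

latest = catalog.get("air-quality")
print(latest.version, latest.agency, len(latest.rows))
```

Because old snapshots stay reachable through `get(name, version=...)`, an agency can update freely and an integration experiment can be rolled back without anyone losing the data they relied on last week.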
So there are my three sentences: Bring it all under one roof, but don’t forget to give credit. Build it as you go, and version data sets along the way. Help users discover data, and let them tell you what they want.
What are your three?



