Takeaways from the 2018 International Open Data Conference

Experiences from IODC18

2018-10-16

In the week of September 24–28, open data experts from across the globe descended on Buenos Aires, Argentina for the fifth International Open Data Conference (and the first edition of the conference to be held in the southern hemisphere). The conference itself took place on September 27–28, but a variety of satellite events were held in the days leading up to it (for example, the Open Cities Summit and the Open Data Research Symposium).

I had the chance to attend this conference and to hear first-hand what is being discussed on the global stage around open data. Based on the sessions I attended and the conversations I had, here are my top takeaways from my week in Buenos Aires.

1) A successful open data strategy doesn’t end with publishing data – it starts with it.

In the years since governments began embracing the idea of releasing their data to be viewed and reused by the public, there has been a gradual transition from focusing on the publishing of data toward focusing on the actual use and impact of the data. When the open data movement first emerged, there was a bit of naivety surrounding it (something many IODC participants freely admitted) – there persisted the idea that the act of releasing data would itself create transparency and trust between citizens and governments, and that innovation would flourish in such an open environment.

Today, we know better: publishing data online doesn’t magically mean that a government becomes more trustworthy, or that citizens’ lives are instantly better. It doesn’t just matter that the data is public: it matters what data was released and how that data can be and is being used. This line of thought feeds into deliberations of “open by default” versus “publishing with purpose”, i.e., whether governments should pursue policies mandating that all of their data automatically be made public (with specific, limited exceptions), or whether it’s better to be thoughtful and deliberate about which datasets are prioritized for publishing. The IODC community generally seemed to think that publishing with purpose was the better approach, at least in the short term, since it’s more likely to result in the most useful data being published, but this remains an active point of contention within the open data community.

2) If you want your data to have an impact, you need to know and engage your users.

“User-centered design” has become a well-established practice in the development of digital tools and processes. It espouses the importance of taking time to understand your users and their specific needs and wants, rather than simply assuming you know best and designing an end product based on your possibly-false assumptions.

The same principles can and should be applied to open data strategies. Governments can’t assume they know what data citizens are interested in and what they want to use it for. If they genuinely want to see their data being used and reused, then they need to engage with users (or potential users) to first ask what data would be most of use or interest, and how this data should be delivered (e.g., what format it should take or how it should be structured).

This is admittedly easier said than done. A big reason why this is challenging, and one that was repeatedly mentioned at the conference, is that governments still don’t have a good understanding of who the users of open data actually are. Sure, there are assumptions that businesses or citizen activists would make use of the data, but governments often don’t have any clear proof that their assumptions are correct (or incorrect, as the case may be). Administrations that genuinely want to see their data reach a wide audience need to make the effort to engage directly with their users (or possible users) – and not just online. How can you go about this? Representatives from Reboot, a social impact firm based in New York City, presented one project in which they interviewed more than a dozen individuals who were either actively using open data or who conceivably would have an interest in doing so. They used these interviews to develop archetypes (“user personas”) of both existing and potential open data users in NYC, with the intent of using these profiles to better understand user needs and guide future open data strategy. You can read more about this project here.

3) We need to get creative to bring open data to a wider audience.

People whose work revolves around data tend to forget they live in a bubble. To them, it’s obvious that if you have an interest in a specific, data-based topic (for example, how government money is being spent), you should search online to find where that data is available and whether there are existing visualizations or reports that offer an in-depth analysis or new perspectives on this topic. But there are plenty of people who aren’t data-savvy and nonetheless have an interest in how government money is being spent (or in other types of data) – they just might not realize that the information exists online, or the information that exists may not be suitable for them (e.g., it may exist as reports or websites written in academic or overly formal language, or as complicated figures). How do you engage these types of people with open data?

Just because data is open to citizens doesn’t mean it is intelligible to citizens. Data illiteracy is highly prevalent across most societies and is a significant barrier to widespread appreciation and usage of open data. Civic tech organizations or other NGOs can help fill this gap by taking this data and building tools or applications that help make it more intelligible (something the Technologiestiftung tries to do, for example). But they still need to be conscious of whether they are communicating information in a way that the average person can understand and relate to – and be willing to use non-traditional means to do so (for example: using wall murals as a form of data visualization that reaches people in their own communities – see this example out of Tanzania that was presented at an IODC panel on the intersections of art and data).

4) Data standards need to be a bigger part of open data conversations.

There were multiple sessions on the topic of data standards during the IODC (including a day-long workshop preceding the conference), and they all tended to start with the same gentle reassurance: data standards are not nearly as scary and confusing as they seem at first blush.

It’s understandable why many people active in the open data sphere tend to shy away from this topic – data standards are often described using heavily technical language and the standards themselves are usually structured as JSON code, both of which can be intimidating for people who consider themselves less tech-savvy (and while open data is a “tech” field, plenty of people working in it are not programmers). But the message from IODC was clear: everyone who works in the field of open data needs to at least be comfortable with the concept of data standards. As open data matures and we move from discussions of “why should we open up data” to “how can we maximize the value and usefulness of the data we’ve opened”, a crucial component of this becomes ensuring that different datasets covering similar topics can be combined or compared with minimal effort: data is far more powerful when it can be viewed in a broader context rather than within a single, isolated instance. Data standards are what enable datasets collected in different contexts or by different actors to be joined together meaningfully.
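For readers who have never seen what this looks like in practice, here is a minimal sketch in Python. Everything in it is made up for illustration: the two cities, their field names, the numbers, and the three-field "shared standard" (location, date, count) are assumptions, not a real published standard and not something presented at IODC. It only aims to show the basic idea that once two differently shaped datasets are mapped onto the same agreed structure, they can be combined and compared.

```python
from datetime import datetime

# Hypothetical data: two cities publish bike counts in different shapes.
city_a = [{"station": "Main St", "date": "2018-09-27", "bikes": 120}]
city_b = [{"location_name": "Plaza", "day": "27/09/2018", "count": "95"}]

# Hypothetical shared standard: every record has exactly these fields,
# with ISO 8601 dates and integer counts.
STANDARD_FIELDS = ("location", "date", "count")


def from_city_a(row):
    """Map a City A record onto the shared structure."""
    return {
        "location": row["station"],
        "date": row["date"],  # already an ISO 8601 date
        "count": int(row["bikes"]),
    }


def from_city_b(row):
    """Map a City B record onto the shared structure."""
    iso_date = datetime.strptime(row["day"], "%d/%m/%Y").date().isoformat()
    return {
        "location": row["location_name"],
        "date": iso_date,
        "count": int(row["count"]),  # City B publishes counts as text
    }


# Once both sources follow the same structure, joining them is trivial.
combined = [from_city_a(r) for r in city_a] + [from_city_b(r) for r in city_b]
total = sum(record["count"] for record in combined)

for record in combined:
    print(record)
print("Total bikes counted:", total)
```

Without an agreed structure, every person who wants to compare the two cities has to write (and maintain) this kind of translation themselves; a standard moves that effort to the publisher, where it only has to happen once.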

But getting people to embrace data standards isn’t the only problem: once you’ve decided to adopt a standard, finding an appropriate one and then choosing between candidates (if there are multiple options) can be challenging. In some contexts (like transportation, for example) there is a plethora of standards available, and it’s not always immediately clear whether standards are redundant or whether they actually serve distinctly different needs (and thus whether multiple standards are really necessary). Initiatives like GovEx’s Data Standards Directory are trying to bring new clarity to this landscape by making it easier to find and differentiate data standards for various topics. But IODC discussions made it clear that a lot of ground remains to be covered with respect to making data standards better understood and more widely used.

If you went to IODC18 but didn't take a picture with the giant hashtag, were you really there at all?