Berlin’s Core Datasets

The Berlin city administration possesses a treasure trove of potentially valuable data. The city’s E-Government Law (passed in 2016) requires the government and all of its associated departments and agencies to make their data available as machine-readable open data. But the question often remains: where exactly should government entities start with open data?

To move open data from theory into practice, and to better demonstrate its value, it helps to set priorities first: which datasets does the city most need as open data? Although Berlin has been publishing open data for several years, we still hear this question often. We therefore created this list of “core” datasets for Berlin, which contains the datasets we believe offer the city and its inhabitants the greatest possible value.

About the core datasets table

The table below contains 100 datasets that we identified as particularly valuable for Berlin when published as open data. We evaluated these datasets on five criteria: their potential for reuse by the research sector, the business sector, and civil society; the efficiency gains the government could achieve by publishing them; and their contribution toward a more transparent and democratic society. For a more complete explanation of how we assembled this list, see the accompanying publication we released with this project.

Rating each dataset on these five criteria allowed us to create a total score that captures its potential; this score appears in the table’s “Score” column. We also assessed whether each dataset is available as machine-readable open data; the results of this assessment appear in the “Availability” column, using the following scale:

0: The data either does not exist or its status is unknown
1: The data exists, but it’s not publicly accessible in any form
2: The data is publicly accessible but not machine-readable
3: The data is machine-readable but has limitations (such as insufficient quality or granularity)
4: The data is available as fully open and high-quality data
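
For readers who want to filter the table programmatically, the availability scale can be encoded as an ordered enumeration, so that comparisons such as “at least machine-readable” become simple integer checks. This is a minimal sketch: the class name `Availability` and the helpers `is_public` and `is_machine_readable` are hypothetical and not part of the published table; the only facts they encode are the five levels described above.

```python
from enum import IntEnum


class Availability(IntEnum):
    """Hypothetical encoding of the 0-4 availability scale used in the table."""

    UNKNOWN_OR_MISSING = 0           # data does not exist or its status is unknown
    NOT_PUBLIC = 1                   # data exists but is not publicly accessible in any form
    PUBLIC_NOT_MACHINE_READABLE = 2  # publicly accessible, but not machine-readable
    MACHINE_READABLE_LIMITED = 3     # machine-readable, with limits on quality or granularity
    FULLY_OPEN = 4                   # fully open, high-quality data


def is_public(level: int) -> bool:
    """True for levels 2-4, where the data is publicly accessible in some form.

    Availability(level) raises ValueError for values outside 0-4.
    """
    return Availability(level) >= Availability.PUBLIC_NOT_MACHINE_READABLE


def is_machine_readable(level: int) -> bool:
    """True for levels 3 and 4, the only levels described as machine-readable."""
    return Availability(level) >= Availability.MACHINE_READABLE_LIMITED
```

Because `IntEnum` members compare as integers, a value read straight from the “Availability” column (for example `2`) can be passed to either helper without further conversion.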

Where we knew of a source for the data (or at least of information about the data), the corresponding link is included in the “Link” column.

Please note: we chose these 100 datasets because our evaluation indicated that they offer the greatest potential value to the largest number of people. There are, of course, many more datasets that may have narrower appeal but should nevertheless be published as open data. The goal of this list was not to cover all datasets, only those with the greatest potential. We welcome feedback on this list, as well as a lively discussion about the role that open data can play in Berlin.

Has the availability of a certain dataset changed, or is one of our links outdated? Or do you think a critical dataset is missing from this list? Feel free to send corrections and comments to dykes@technologiestiftung-berlin.de.

Data availability