Which public administration data are relevant for citizens, politics and the economy? For the implementation of Open Government Data, answering this question is of central importance. Given their oft-scarce resources, administrations are often forced to prioritize data publishing. As part of our project ODIS, we support administrations in the selection and processing of open datasets. As a result of this work, we frequently find ourselves reflecting on the possible “added value” that open data can provide for a government. We previously published here some examples of how cities can benefit from Open Data. To further enhance our understanding of the demand for open data, we have now tried an additional approach: the analysis of parliamentary inquiries from the Berlin House of Representatives (Berliner Abgeordnetenhaus).
What are parliamentary inquiries (schriftliche Anfragen)?
Parliamentary inquiries are an instrument of parliamentary scrutiny. They offer MPs a way to request information from the government on various aspects of its activities (for example, information on specific policy initiatives or on government spending). The government must deliver a response within five weeks of receiving the inquiry. The answers are freely accessible as individual PDFs through the online parliament documentation system. The platform kleineAnfragen of the Open Knowledge Foundation collects all of these answers and their corresponding metadata and offers a machine-readable download of the responses. This collection opens up a huge amount of information about:
- Topics that are of interest to politics and the public
- Data that can potentially be provided as open data
The number of requests is increasing annually (only in 2016 – an election year – did the number decline). In 2018 alone, more than 3,000 inquiries were made to the Berlin administration. Inquiries may ask about specific events, e.g. a Women’s March on 18.2.2018, or about general datasets, e.g. the class sizes of Berlin schools.
According to one American study, the number of public information requests governments receive can be significantly reduced by proactively providing open data. Thus, open data represents not only a gain in transparency for citizens and parliamentarians, but also a reduction in the workload of administrative staff entrusted with answering parliamentary requests.
In the following data analysis, we examine the parliamentary requests from the Berlin House of Representatives to find out which topics are particularly in demand and the potential that these requests represent for open data.
More than 15,000 inquiries are listed on the kleineAnfragen website for Berlin in the period from November 2011 to January 2019. For each request there is a title, an applicant party, an answering administration and the full text of the request with the corresponding answers. Using the frequency of individual words in the request titles, we have identified the topics that are most frequently addressed in these inquiries.
Filler words (e.g. die (the), und (and), als (when)) and words specific to Berlin (e.g. Senat (Senate), Berlin, Bezirk (district)) that are not relevant for the analysis have been removed (here is a list of all of the “stopwords” used in this analysis). The remaining words were reduced to their word stem (for example, Kinder (children) to ‘kind’). After this data cleaning, it became possible to evaluate word frequency. To better understand the terms in context, they are presented below in a network visualization (words are displayed in the original German). The point size represents the frequency of a term, while the lines represent the connections between the terms. The more frequently two terms occur together in a title, the thicker the connecting line. When you click on a word, five example queries using this word in their title are listed below the graphic. To achieve a more filtered view, use the slider “minimum word count” to set the minimum number of times a word must appear in order to be displayed in the graphic.
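The cleaning and counting steps described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the stopword list is a tiny stand-in, the titles are invented, and `toy_stem` is a hypothetical suffix stripper standing in for a real German stemmer. It also assumes that each term is counted at most once per title, which the text does not state explicitly.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a tiny stand-in stopword list; the analysis used a much longer one.
STOPWORDS = {"die", "der", "und", "als", "an", "in", "senat", "berlin", "bezirk"}

# Hypothetical titles, invented purely for illustration.
TITLES = [
    "Schulen und Kinder in Berlin",
    "Polizei an der Schule",
    "Kinder und Polizei",
    "Kinder an Schulen",
]


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ern", "en", "er", "e", "n", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def title_terms(title):
    """Lowercase and tokenize a title, drop stopwords, return the set of stems."""
    tokens = re.findall(r"[a-zäöüß]+", title.lower())
    return {toy_stem(token) for token in tokens if token not in STOPWORDS}


def count_terms_and_pairs(titles):
    """Count in how many titles each term occurs and how often two terms co-occur."""
    term_counts = Counter()
    pair_counts = Counter()
    for title in titles:
        terms = sorted(title_terms(title))  # sorted so each pair has one canonical order
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    return term_counts, pair_counts


def visible_terms(term_counts, min_count):
    """Mimic the "minimum word count" slider: keep terms seen at least min_count times."""
    return {term for term, count in term_counts.items() if count >= min_count}


term_counts, pair_counts = count_terms_and_pairs(TITLES)
print(term_counts.most_common())
print(pair_counts.most_common())
print(visible_terms(term_counts, min_count=3))
```

In a network view of this kind, `term_counts` would drive the point sizes and `pair_counts` the line thickness. Sorting the terms before pairing means that a pair of terms is always stored under the same key regardless of word order in the title.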
The most popular topics
Two major domestic political themes – education (school) and domestic security – are reflected in the two most common terms: Schule (school) (653 inquiries) and Polizei (police) (456 inquiries). Current political and media topics in Berlin are also reflected in the inquiries: Wohnungen (apartments), öffentlicher Nahverkehr (public transport), Flughäfen (airports: BER, Tempelhof and Tegel) and Flüchtlinge (refugees) are also among the most-inquired topics.
Education and childcare
The largest share of inquiries concerns education and school. Words used here are, for example, “Schule” (school) (653 queries), “Universität” (university) (129), “Bildung” (education) (80), “Lehrkraft” (teacher) (48) and “Willkommensklassen” (welcome classes) (38), as well as “Kinder” (children) (213), “Kita”/“Kitaplatz” (daycare) (175) and “Jugend” (youth) (104).
What data is already available on this topic? Berlin offers detailed information on locations and offers of schools via a search form. These data were used by the “Jede Schule” project, which developed a website that made these data easier to navigate and understand. The Senate Department for Education, Youth and Family also provides statistics on the number of pupils by type of school and region as well as nice visualizations of the movements of students between residential districts and school districts.
However, there is still some potential for further open data releases. For example, the SPD has made 134 requests regarding the “Sonderungsverbot” for private schools (the ban on selecting pupils according to their parents’ financial means). Other queries containing interesting records include:
- WLAN at public schools
- PC equipment at schools
- Electricity and heating costs of schools
- School class visits to historical memorial sites
- Green schools
- Demand for educators
- Energy consumption and emissions from higher education institutions
- School class sizes
- Child poverty
- Day care for children
- Entry of non-traditional educators into the educational system - this inquiry was used for a visualization in this project by Thomas Tursics, for example
Police and crime
“Polizei” (police) is the second-most common term in the query titles. The topic is in demand across all parties. For example, the SPD has requested detailed operating figures on each police department. The CDU has submitted various requests on budgeted positions versus positions actually filled, and the FDP has submitted multiple inquiries on occupational safety and general health promotion within the police force. Other common terms are “fire brigade” (166), “organized crime” (74), “rocker crime” (53), and “prison” (32). Regarding the attack on Breitscheidplatz on 19 December 2016, 94 requests were made. Last year, the Crime Atlas already made an important crime-related dataset available as open data (you can read our blog post on the Crime Atlas here). Further potential open datasets based on parliamentary inquiries include:
- Reaction time of the Berlin police
- Property belonging to the police
- Police contact with citizens
- Overtime within the police force
- Motor vehicles belonging to the police
Apartments and real estate
On the topic of “Wohnen” (to reside) there were more requests in 2018 than in any year before. The term “wohn” itself occurs “only” 161 times, but it is hidden in 731 titles as part of different terms, e.g. “wohnungslos” (homeless), “Wohnungsbaugesellschaften” (housing associations), “Wohnraum” (living space), “Wohnungsnot” (housing shortage). Other terms on the subject are “Immobilien” (real estate) (70), “Neubau” (new construction) (41) and “Unterbringung” (housing) (95). Possible interesting datasets from the inquiries are:
- Right of first refusal usages
- Student housing in Berlin
- Construction activities by the Senate Department for Urban Development and Housing
- Housing stock of housing associations
- Accessible apartments
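The gap between the 161 occurrences of the stem “wohn” and the 731 titles that contain it inside a longer word can be illustrated with a small sketch. The titles below are invented, and `strip_en` is a hypothetical one-rule stemmer used only to keep the example self-contained; the actual analysis used a proper stemmer.

```python
import re

# Hypothetical titles, invented purely for illustration.
TITLES = [
    "Wohnen in Berlin",
    "Wohnungsnot in Berlin",
    "Wohnraum für Studierende",
    "Schule und Kita",
]


def strip_en(word):
    """Hypothetical one-rule stemmer: removes a trailing 'en' and nothing else."""
    return word[:-2] if word.endswith("en") else word


def count_stem_and_substring(titles, needle, stem):
    """Return (titles where a stemmed word equals needle,
    titles where needle appears anywhere inside a word)."""
    as_stem = 0
    as_substring = 0
    for title in titles:
        tokens = re.findall(r"[a-zäöüß]+", title.lower())
        if any(stem(token) == needle for token in tokens):
            as_stem += 1
        if any(needle in token for token in tokens):
            as_substring += 1
    return as_stem, as_substring


as_stem, as_substring = count_stem_and_substring(TITLES, "wohn", strip_en)
print(as_stem, as_substring)
```

Substring matching recovers compounds that a stemmer leaves intact, but it can also over-match: any word that merely contains the letters, such as “Gewohnheit” (habit), would be counted too.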
Public transportation (BVG)
For public transport, there are information requests for “BVG” (149), “S-Bahn” (87), “Bahnhof” (train station) (79) and “U-Bahnhof” (subway station) (49). Through the regular publishing of the VBB timetable as open data, the VBB ensures that one critical public transportation dataset is available online. But other important datasets, like statistics on delays within the BVG network, still have to be explicitly requested via an inquiry every year. Other possible open data candidates:
- BVG depots
- Crime in the U-Bahn
- Motor vehicles belonging to the BVG
- Passenger numbers for the BVG and S-Bahn
There are regular information requests for Berlin’s airports: “BER” (215), “Tegel” (99) and “Tempelhof” (66). Potentially relevant datasets are, for example, the size of the building located at the Tempelhof airfield or the capacity of the BER airport in comparison to Schönefeld.
On the subject of “Flüchtlinge” (refugees) (229) there were more than 70 inquiries in 2015. In 2018 there were only 15. The State Office for Refugee Affairs now offers informative figures and visualizations, e.g. on accommodations for refugees and on refugee arrivals.
The most-inquired government bodies
The senate departments for Transport & Environment, Home Affairs & Sports, Education & Youth, and Social Affairs received the most inquiries. (Note: Senate departments marked with (WP17) have since been restructured or renamed in the current legislative term).
The Senate Department for Home Affairs & Sports received the most requests in both terms. These were mainly requests related to the police, fire department, crime, the protection of the constitution and public swimming facilities.
The Senate Department for Environment, Transport and Climate Protection (WP18) received the second-most requests in legislative term 18. Inquiries mainly concern public transport, green spaces, bridges, renovations of buildings and airports. The Senate Department for Urban Development and Environment (WP17) received the most requests in the previous legislative term. In addition to overlapping topics such as rail, airport and accessibility, inquiries were made regarding state-owned housing associations, the “State Opera scandal”, and apartments.
The Senate Department for Education, Youth and Family (or previously Education, Youth and Science) received the third-most requests. Topics included schools, children, youth, day-care centers, and underage refugees.
Which party makes the most inquiries?
Opposition parties submit the most inquiries in both legislative terms, but a significant proportion comes from the ruling parties as well. During legislative term 17, 2400 inquiries were made by Die Grünen and Die Piraten, but the CDU and SPD also each made over 1000 inquiries. In the current legislative term, most requests come from the strongest opposition party, the CDU (over 1700). However, the ruling parties SPD, Die Grünen and Die Linke have also each already made almost 1000 requests. The obvious assumption that parties direct fewer inquiries to Senate Departments led by their own party cannot be clearly confirmed. Although there is a tendency in that direction, many requests, especially in legislative term 17, are made to Senate Departments under the leadership of the same party (exact numbers are available here in the analysis).
Implications for an open data strategy
A considerable number of the parliamentary inquiries submitted to government bodies in Berlin are answered by administrations using data as support for their answers. About one third of the more than 15,000 answers contain tables, which is usually an indication that a corresponding database (or at least dataset) exists somewhere. In addition, answers often provide aggregate numbers and statistics that suggest interesting underlying raw data: For example, this request mentions that there are regular school-based statistical surveys showing that roughly 18.2% of kids in Berlin schools can’t swim. These statistical surveys could be broken down by districts, schools or grade levels to form a relevant dataset.
Although the answers to parliamentary requests are generally published via the parliamentary documentation system, to date these answers (including the tables of data that are often included) are only available as PDFs – a format known to make reuse of its contents difficult. An accompanying publication of raw data in machine-readable, open data-compliant formats (like CSV files) could significantly increase the reusability of these answers. In principle, it would also be conceivable to publish such data directly via the Berlin Open Data Portal. Even more helpful would be the creation of more open interfaces to the administration through which MPs (as well as citizens) can query databases directly to obtain the data they are interested in and thereby avoid the detour of a parliamentary inquiry. Not least, this would be a considerable relief for the administrative staff themselves.
Limitations of the analysis
When interpreting these results, it should be remembered that not all queries refer to data sets. Thus, it is not possible to derive relevance for open data from the frequencies of the terms in all cases. It is also important to remember that data does not need to be requested if it is already readily available. Consequently, the frequency with which a topic is requested does not correspond one-to-one with the relevance of that topic (phrased differently: this analysis does not give us a list of the most important topics for politicians in Berlin, it only gives us a list of the topics that they most frequently request information on, i.e., topics for which there is not currently enough publicly available information).
The analysis of the titles gives a rough overview of the topics, but is not a completely accurate evaluation: MPs have full creative freedom to choose a title of their liking, and there are no guidelines or rules for how the titles should be structured. Thus, not all included topics of the requests are apparent from the title. An analysis of the full texts would be a much more complex undertaking, but it would potentially allow for a more detailed and nuanced analysis of the inquiries and their content. Finally, the following limitations of the actual text analysis should be considered: synonyms are not factored into the analysis, the analysis package used (NLTK Snowball Stemmer) was not always able to separate compound words properly (e.g. “Kitaausbau” is not separated into “Kita” and “Ausbau” and thus not assigned to the term “Kita”), and the analysis was not able to account for cases where a word might have two meanings (e.g. the German word “Behinderung” (disability) can refer to obstruction of traffic or physical disability).