2022: OSM & Wikidata Integration: Railway Station Case 🚉

Analyzes the integration potential of OSM and Wikidata through a case study of railway stations in 2022.

State of the Map
138 views • Oct 11, 2022

About this video

https://media.ccc.de/v/state-of-the-map-2022-academic-track-19388-comparative-integration-potential-analyses-of-osm-and-wikidata-the-case-study-of-railway-stations



In this work, we present a series of comparative analyses that help to better understand the potential and implications of integrating knowledge graphs with OSM.

OpenStreetMap (OSM) is one of the richest and most diverse sources of geographic information. However, it lacks two properties that are vital for spatio-semantic analyses: hierarchical structure and semantic linkage. OSM provides links to existing knowledge graphs (structured data that conforms to a specific ontology), e.g., via wikidata=* tags. The use of these link tags is currently limited to a small percentage of both OSM and Wikidata objects. Efforts have been made to improve geographic linking, the linking of nearby objects of the same type, and semantic linking [1-3]. On the side of hierarchical and semantic structuring of OSM, the WorldKG knowledge graph [4] provides a semantic mapping of a large subset of OSM. The free and open OSM tagging scheme is a fundamental part of the OSM project and enabled its success, but it also entails an inherent lack of structure. WorldKG overcomes this, paving the way for a knowledge-graph integration of the OSM dataset. Still, open knowledge graphs and OSM are not fully integrated.

The following analyses provide a series of comparative data insights that help to better understand the potential and implications of integration between knowledge graphs and OSM. OSM is compared to Wikidata, one of the largest open knowledge graph projects, run by the Wikimedia Foundation, which provides structured storage to other Wikimedia projects such as Wikipedia. In many respects, Wikidata is comparable to OSM: in its community structure, its free and open nature, and its simple contribution framework. The two datasets are first compared in size, data structure, and distribution. We then extend the analyses with a community comparison, which examines how two separate online communities with similar interests have evolved.

Grasping the size of the two projects is straightforward, as the figures are visible on their websites: OSM features around 1 billion elements [5], while Wikidata is much smaller with over 97 million objects, of which approximately 9 million have geographic coordinates. The topic of railway stations was chosen because these objects have a comparable definition and are well represented in both datasets, with ca. 130k elements in OSM and ca. 100k in Wikidata, indicating integration potential. In OSM, railway stations are mapped with the tags 'railway=station' or 'railway=halt'. In Wikidata, an object whose 'instance of (P31)' property has the value 'Q55488' is typed as Railway Station.
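The two selection criteria above can be written down as a short Python sketch. This is an illustration, not the code from the project repository: the function names, the User-Agent string, and the choice to match only direct 'instance of' statements (no subclasses) are assumptions. The endpoint URL is the public Wikidata SPARQL endpoint.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint.
WIKIDATA_SPARQL_URL = "https://query.wikidata.org/sparql"

# OSM 'railway' values that the text treats as railway stations.
OSM_STATION_VALUES = {"station", "halt"}


def is_osm_station(tags):
    """Return True if an OSM tag dict has railway=station or railway=halt."""
    return tags.get("railway") in OSM_STATION_VALUES


def build_station_query(limit=10):
    """Build a SPARQL query for items that are an instance of (P31) Q55488.

    P625 (coordinate location) is OPTIONAL so that items without
    geocoordinates are still returned.
    """
    return (
        "SELECT ?item ?coord WHERE { "
        "?item wdt:P31 wd:Q55488 . "
        "OPTIONAL { ?item wdt:P625 ?coord . } "
        "} LIMIT %d" % int(limit)
    )


def fetch_stations(limit=10):
    """Run the query against the endpoint (requires network access)."""
    url = WIKIDATA_SPARQL_URL + "?" + urllib.parse.urlencode(
        {"query": build_station_query(limit), "format": "json"}
    )
    # Hypothetical User-Agent; replace with one that identifies your tool.
    request = urllib.request.Request(
        url, headers={"User-Agent": "station-comparison-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `is_osm_station({"railway": "halt"})` returns `True`, and `fetch_stations(5)` would return up to five result rows. An item with several coordinate statements can appear in more than one row, so results would need de-duplication by item before counting.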

By defining generalizable comparison indicators, the presented work provides a framework and source code (available at https://gitlab.gistools.geog.uni-heidelberg.de/giscience/ideal-vgi/osm-wikidata-comparison under the GNU Affero General Public License v3) for VGI project description, comparison, and monitoring. Similar approaches have been established for OSM contributors [6], for single OSM elements [7], and for small geographic regions [8]. Wikidata data was collected using the Wikidata API (https://www.wikidata.org/w/api.php) and the Wikidata SPARQL endpoint. For Wikidata objects typed as 'Railway Station', the revision history was collected, containing user information, timestamps, and a number of properties. Overall contributions were collected for all users who have contributed to at least one object typed 'Railway Station'. OSM data was collected using the ohsome API (https://ohsome.org) to extract all railway stations mapped in OSM, including their history and all edits made by the users who edited these railway stations. In addition to a general comparison between the datasets, we derived five sets for a more detailed comparison: OSM elements with links to Wikidata (59,441 elements), OSM elements without links to Wikidata (74,659), Wikidata objects that are linked from OSM and typed as railway stations (45,050), Wikidata objects without links from OSM but with geocoordinates (54,594), and Wikidata objects without links from OSM and without geocoordinates (6,714).
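How the five sets relate to each other can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the input layout (a dict of OSM tag dicts and a dict of Wikidata QIDs with a has-coordinates flag), the function name, and the handling of semicolon-separated wikidata=* values are all choices made for this example.

```python
def derive_comparison_sets(osm_elements, wikidata_items):
    """Split both datasets into the five comparison sets.

    osm_elements:   dict of OSM element id -> tag dict (stations only)
    wikidata_items: dict of QID typed as railway station -> True if the
                    item has geocoordinates, else False
    """
    osm_linked = {eid for eid, tags in osm_elements.items() if tags.get("wikidata")}
    osm_unlinked = set(osm_elements) - osm_linked

    # QIDs referenced by any OSM station; a wikidata=* tag may hold
    # several values separated by semicolons.
    referenced = set()
    for eid in osm_linked:
        for qid in osm_elements[eid]["wikidata"].split(";"):
            referenced.add(qid.strip())

    # Only items that are themselves typed as stations count as linked.
    wd_linked = set(wikidata_items) & referenced
    wd_unlinked = set(wikidata_items) - wd_linked
    wd_unlinked_geo = {qid for qid in wd_unlinked if wikidata_items[qid]}
    wd_unlinked_nogeo = wd_unlinked - wd_unlinked_geo

    return {
        "osm_linked": osm_linked,
        "osm_unlinked": osm_unlinked,
        "wd_linked": wd_linked,
        "wd_unlinked_geo": wd_unlinked_geo,
        "wd_unlinked_nogeo": wd_unlinked_nogeo,
    }


# Toy data (invented for illustration): two OSM elements point to Q1,
# and one points to Q99, which is not in the station-typed item list.
example_osm = {
    "node/1": {"railway": "station", "wikidata": "Q1"},
    "node/2": {"railway": "halt"},
    "way/3": {"railway": "station", "wikidata": "Q1"},
    "node/4": {"railway": "station", "wikidata": "Q99"},
}
example_wd = {"Q1": True, "Q2": True, "Q3": False}
example_sets = derive_comparison_sets(example_osm, example_wd)
```

In this sketch the linked sets on the two sides need not have the same size, because several OSM elements can reference one Wikidata object and a reference can point to an object that is not typed as a railway station.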

Our first analysis, of the growth rate of the two sources, showed that OSM has reached a saturated state regarding the number of railway stations: only a few stations have been added since mid-2020. Wikidata, on the other hand, still sees a stable number of new stations added to the project. The two datasets show no clear temporal correlation, which hints at two independent communities: edits in OSM are not followed by edits in Wikidata, and vice versa. Despite the similar size of the two datasets at a global scale, they show significant discrepancies at the country level.
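The abstract does not say how temporal correlation was measured. One common way to check whether activity in one dataset follows activity in the other is to bin creation timestamps by month and compute a Pearson correlation at several lags. The sketch below illustrates that idea only; the function names and the monthly binning are assumptions.

```python
from collections import Counter
from math import sqrt


def monthly_counts(timestamps, months):
    """Count ISO 8601 timestamps per 'YYYY-MM' month, in the order of `months`."""
    counts = Counter(ts[:7] for ts in timestamps)
    return [counts.get(month, 0) for month in months]


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; None if one is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def lagged_correlation(xs, ys, lag):
    """Correlate xs[t] with ys[t + lag]; a positive lag means ys follows xs."""
    if lag >= 0:
        a, b = xs[: len(xs) - lag], ys[lag:]
    else:
        a, b = xs[-lag:], ys[: len(ys) + lag]
    return pearson(a, b)
```

With `osm` and `wd` as monthly series of newly added stations, scanning `lagged_correlation(osm, wd, lag)` over a small range of positive and negative lags would show whether either community tends to follow the other. Values near zero at all lags would be consistent with the independence described above.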

Video Information

Views: 138
Likes: 4
Duration: 26:46
Published: Oct 11, 2022
