Linking, Extending, Exploiting and Enhancing Tabular Data with Wikidata
K-CAP 2019, Marina del Rey, Los Angeles
Knowledge graphs have become a common asset for representing world knowledge in data-driven models and applications. In this tutorial, participants will learn how to integrate, link and extend tabular data using Wikidata, a crowdsourced knowledge graph with over 60 million entities and a lively community of curators. The first part of the tutorial presents an overview of Wikidata, its data model and its query and update APIs. The second part presents tools to link tabular datasets to Wikidata, to augment them using Wikidata, to extend Wikidata with new data and to query and visualize the extended Wikidata knowledge graph.
This tutorial is designed for those who want to know more about the Wikidata knowledge graph, its data model, useful applications for browsing and visualizing its contents, and how to exploit Wikidata to link and extend existing tabular data.
Participants should have basic knowledge representation skills (RDF) and query language skills (SPARQL). Basic Python skills are necessary to work with the Python notebooks and APIs used in the hands-on exercises.
Structure and Resources
- Introduction to Wikidata (Slides) (pdf)
- Goals of the Wikidata project
- The Wikidata data model
- Comparison with DBpedia, YAGO and Cyc
- Interesting statistics about Wikidata
- Getting Data From Wikidata (Slides (part 1)) (pdf (part 1)), (Slides (part 2)) (pdf (part 2))
- API and command-line tools
- Wikidata SPARQL endpoint
- Linking data to Wikidata items
- Extending Wikidata (Slides (part 1)) (pdf (part 1)), (Slides (part 2)) (pdf (part 2))
- Crowdsourcing, human curation
- Bots for importing data from databases and for quality assurance
- Tools to map structured data to Wikidata
- Wikidata satellites to hold specialized data
- Applications (Slides) (pdf)
- Visualization: SQID, Reasonator, etc.
- Wikipedia infoboxes
- Data enrichment via multi-lingual labels, external identifiers, analytics
- Discussion and Hands-On
- SPARQL and visualization
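As a rough illustration of the outline items on linking data to Wikidata items and using the SPARQL endpoint, the sketch below links table cells to Wikidata items and then augments them with one property. It is not taken from the tutorial materials. It assumes the public `wbsearchentities` API action and the public query service at `query.wikidata.org`; the helper names, the choice of the population property (P1082), and the `USER_AGENT` string are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
# Hypothetical identifier; replace with one that describes your own client.
USER_AGENT = "tabular-linking-sketch/0.1 (example only)"


def search_url(label, language="en", limit=1):
    """Build a wbsearchentities request URL for the text of one table cell."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def best_match(api_response):
    """Return the QID of the top search hit, or None if nothing matched."""
    hits = api_response.get("search", [])
    return hits[0]["id"] if hits else None


def population_query(qids):
    """Build a SPARQL query that fetches population (P1082) for the given items."""
    values = " ".join("wd:" + qid for qid in qids)
    return (
        "SELECT ?item ?population WHERE {\n"
        "  VALUES ?item { " + values + " }\n"
        "  ?item wdt:P1082 ?population .\n"
        "}"
    )


def sparql_url(query):
    """Build a GET URL for the query service, asking for JSON results."""
    return SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )


def parse_bindings(sparql_json, variable):
    """Map each item QID to the value bound to `variable` in the results."""
    result = {}
    for row in sparql_json["results"]["bindings"]:
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        result[qid] = row[variable]["value"]
    return result


def fetch_json(url):
    """Perform the HTTP GET. Needs network access; never called on import."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical flow would call `best_match(fetch_json(search_url(cell)))` for each cell in a column, then pass the collected QIDs to `population_query` and read the answer with `parse_bindings(fetch_json(sparql_url(query)), "population")`. Taking the top search hit is the simplest possible linking strategy; ambiguous labels generally need the kind of reconciliation tooling the tutorial covers.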
Computer Scientist, Information Sciences Institute (USC)
Researcher at the Information Sciences Institute of the University of Southern California. He received his PhD at the Ontology Engineering Group of the Universidad Politécnica de Madrid. His research is focused on using Semantic Web and Linked Data to facilitate the reuse and understanding of scientific workflows. Daniel has experience in presenting tutorials at international conferences such as Dublin Core and AAAI and universities such as Stanford, UCLA and USC.
Principal Scientist, Information Sciences Institute (USC)
Principal Scientist and Research Director of the Center on Knowledge Graphs at the USC Information Sciences Institute, and a Research Associate Professor at the USC Computer Science Department. His research focuses on table understanding, knowledge graphs and applications of knowledge graphs. He teaches a graduate course at USC on Building Knowledge Graphs, and has given tutorials on knowledge graph construction at KDD, ISWC, AAAI and WWW.