Knowledge Capture and Discovery

About

Knowledge graphs have become a common asset for representing world knowledge in data-driven models and applications. In this tutorial, participants will learn how to integrate, link and extend tabular data using Wikidata, a crowdsourced knowledge graph with over 60 million entities and a lively community of curators. The first part of the tutorial presents an overview of Wikidata, its data model, and its query and update APIs. The second part presents tools to link tabular datasets to Wikidata, to augment them using Wikidata, to extend Wikidata with new data, and to query and visualize the extended Wikidata knowledge graph.

Audience

This tutorial is designed for those who want to know more about the Wikidata knowledge graph, its data model, useful applications for browsing and visualizing its contents, and how to exploit Wikidata to link and extend existing tabular data.

Participants should have basic knowledge representation skills (RDF) and query language skills (SPARQL). Basic Python skills are necessary to handle the Python notebooks and APIs used in the hands-on exercises.

Structure and Resources

Introduction to Wikidata (Slides) (pdf)

Goals of the Wikidata project
The Wikidata data model
Comparison with DBpedia, YAGO and Cyc
Interesting statistics about Wikidata

Getting Data From Wikidata (Slides, part 1) (pdf, part 1) (Slides, part 2) (pdf, part 2)

API and command-line tools
Wikidata SPARQL endpoint
Linking data to Wikidata items
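
As a rough illustration of the SPARQL endpoint topic above, the following Python sketch builds a request for the public Wikidata Query Service and flattens the standard SPARQL JSON result format into plain rows. It is not taken from the tutorial materials: the helper names (`build_request`, `rows_from_results`, `run_query`), the example query and the User-Agent string are assumptions made for this example.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Example query: five items that are instances of (P31) house cat (Q146).
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_request(query):
    """Build a GET request asking the endpoint for SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical User-Agent; replace it with one describing your tool.
        "User-Agent": "wikidata-tutorial-sketch/0.1 (example only)",
    }
    return urllib.request.Request(SPARQL_ENDPOINT + "?" + params, headers=headers)


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query):
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(EXAMPLE_QUERY)` requires network access and returns one dictionary per result row, keyed by the query variables (`item` and `itemLabel` here).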

Extending Wikidata (Slides, part 1) (pdf, part 1) (Slides, part 2) (pdf, part 2)

Crowdsourcing, human curation
Bots for importing data from databases and for quality assurance
Tools to map structured data to Wikidata
Wikidata satellites to hold specialized data

Applications (Slides) (pdf)

Visualization: SQID, Reasonator, etc.
Wikipedia infoboxes
Data enrichment via multi-lingual labels, external identifiers, analytics

Discussion and Hands-On

SPARQL and visualization
Wikifier
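
The Wikifier used in the hands-on session is not reproduced here. As a much simpler stand-in, the sketch below shows the general idea of linking a table column to Wikidata items by label, using the `wbsearchentities` action of the Wikidata API and naively taking the top hit. The helper names (`search_url`, `best_match`, `fetch_qid`, `link_column`) and the `_qid` column suffix are assumptions made for this example, and a real linker would also need to disambiguate between candidates.

```python
import json
import urllib.parse
import urllib.request

# MediaWiki API entry point for Wikidata.
API_URL = "https://www.wikidata.org/w/api.php"


def search_url(label, language="en", limit=3):
    """Build a wbsearchentities URL that looks up items by label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def best_match(response):
    """Return the identifier of the top search hit, or None if there is none."""
    hits = response.get("search", [])
    return hits[0]["id"] if hits else None


def fetch_qid(label):
    """Query the API over the network and return the top-ranked identifier."""
    request = urllib.request.Request(
        search_url(label),
        # Hypothetical User-Agent; replace it with one describing your tool.
        headers={"User-Agent": "wikidata-tutorial-sketch/0.1 (example only)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return best_match(json.load(response))


def link_column(rows, column, lookup=fetch_qid):
    """Return copies of the rows with an added '<column>_qid' field."""
    return [{**row, column + "_qid": lookup(row[column])} for row in rows]
```

For a table loaded as a list of dictionaries, `link_column(rows, "city")` would add a `city_qid` field to each row. Passing a different `lookup` function makes it possible to try the linking step offline or to plug in a more careful matcher.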

Tutors

Daniel Garijo

Computer Scientist, Information Sciences Institute (USC)

Researcher at the Information Sciences Institute of the University of Southern California. He received his PhD from the Ontology Engineering Group of the Universidad Politécnica de Madrid. His research focuses on using Semantic Web and Linked Data technologies to facilitate the reuse and understanding of scientific workflows. Daniel has experience presenting tutorials at international conferences such as Dublin Core and AAAI, and at universities such as Stanford, UCLA and USC.

Pedro Szekely

Principal Scientist, Information Sciences Institute (USC)

Principal Scientist and Research Director of the Center on Knowledge Graphs at the USC Information Sciences Institute, and a Research Associate Professor at the USC Computer Science Department. His research focuses on table understanding, knowledge graphs and applications of knowledge graphs. He teaches a graduate course at USC on Building Knowledge Graphs, and has given tutorials on knowledge graph construction at KDD, ISWC, AAAI and WWW.