Transforming Wikipedia into a Very Large Ontology

Michael Strube

NLP Group, EML Research gGmbH, Heidelberg

March 9, 2009

301-421, 11:00 AM


Wikipedia provides a repository for world knowledge with more structure than the web and more coverage than manually created knowledge bases. Although its system of categories can be used straightforwardly as a semantic network, the Wikipedia categorization cannot be considered a proper taxonomy, as the relations between categories are not semantically typed.

In this presentation we show how to induce an isa hierarchy on top of the Wikipedia categorization. We start by treating the category system in Wikipedia as a conceptual network. We then label the semantic relations between categories using methods based on connectivity in the network and on lexico-syntactic matching. As a result, we are able to derive a large-scale taxonomy with isa relations between the concepts.
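
As a rough illustration of the lexico-syntactic side of this step, the Python sketch below labels a category-to-supercategory link by comparing the lexical heads of the two category names. A shared head suggests isa. If the parent's head appears only as a modifier of the child, the sketch labels the link not-isa. It then propagates isa labels transitively through the graph. This is a toy under stated assumptions, not the method presented in the talk. The head finder, the preposition list, the function names (lexical_head, label_link, isa_ancestors) and the example categories are all hypothetical. The transitive propagation stands in only loosely for the connectivity-based methods mentioned above.

```python
from collections import defaultdict

# Hypothetical preposition list for the toy head finder (an assumption, not from the talk).
PREPOSITIONS = {"of", "in", "by", "from", "for"}


def lexical_head(category: str) -> str:
    """Crude head finder: the word before the first preposition, else the last word."""
    words = category.lower().split()
    for i, word in enumerate(words):
        if word in PREPOSITIONS and i > 0:
            return words[i - 1]
    return words[-1]


def label_link(child: str, parent: str) -> str:
    """Label one category -> supercategory link as 'isa', 'notisa' or 'unknown'."""
    child_head = lexical_head(child)
    parent_head = lexical_head(parent)
    if child_head == parent_head:
        # Head matching, e.g. 'British computer scientists' -> 'Computer scientists'.
        return "isa"
    if parent_head in child.lower().split():
        # Modifier matching: the parent's head only modifies the child,
        # e.g. 'Crime comics' -> 'Crime'.
        return "notisa"
    return "unknown"


def isa_ancestors(links: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Keep links labeled 'isa' and close them transitively (A isa B, B isa C => A isa C)."""
    parents = defaultdict(set)
    for child, parent in links:
        if label_link(child, parent) == "isa":
            parents[child].add(parent)

    closure: dict[str, set[str]] = {}

    def visit(node: str) -> set[str]:
        if node not in closure:
            closure[node] = set()  # mark before recursing so cycles cannot loop forever
            for p in parents[node]:
                closure[node] |= {p} | visit(p)
        return closure[node]

    for node in list(parents):
        visit(node)
    return closure


if __name__ == "__main__":
    toy_links = [  # hypothetical (child, parent) category links
        ("British computer scientists", "Computer scientists"),
        ("Computer scientists", "Scientists"),
        ("Crime comics", "Crime"),
        ("Capitals in Europe", "Europe"),
    ]
    for child, parent in toy_links:
        print(child, "->", parent, ":", label_link(child, parent))
    print(isa_ancestors(toy_links)["British computer scientists"])
```

Head matching alone misses many true isa links and can be fooled by polysemous heads. This is one reason the talk combines lexical evidence with evidence from the structure of the network.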

We evaluate the quality of the taxonomy by comparing it with ResearchCyc, one of the largest manually created ontologies, and show that the Wikipedia-derived taxonomy compares favorably with it. We also discuss experiments on using Wikipedia to compute the semantic similarity of words: the Wikipedia-derived taxonomy performs as well as measures based on WordNet, a lexical database commonly used in Natural Language Processing. We conclude with an outlook on current work, which includes labeling additional relations such as part-of, location, and temporal relations.
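
Path-based similarity measures of the kind commonly computed over WordNet can, in principle, be computed over a Wikipedia-derived taxonomy in the same way. The sketch below implements one such measure, Leacock-Chodorow similarity, -log(len / 2D). Here len is the shortest path length counted in nodes and D is the maximum depth of the taxonomy. The abstract does not say which measures were used. The choice of measure, the function names (shortest_path_edges, leacock_chodorow) and the toy taxonomy are assumptions for illustration only. The search ignores edge direction. For a tree, this coincides with the path through the lowest common subsumer.

```python
import math
from collections import deque

# Hypothetical toy taxonomy: (child, parent) isa links.
TAXONOMY = [
    ("British computer scientists", "Computer scientists"),
    ("Computer scientists", "Scientists"),
    ("Physicists", "Scientists"),
    ("Scientists", "People"),
]


def shortest_path_edges(edges, source, target):
    """Breadth-first search over the isa links, ignoring edge direction.

    Returns the number of edges on a shortest path, or None if unconnected."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def leacock_chodorow(edges, c1, c2, max_depth):
    """-log(len / (2 * max_depth)), with len counted in nodes (edges + 1)."""
    dist = shortest_path_edges(edges, c1, c2)
    if dist is None:
        return None
    return -math.log((dist + 1) / (2 * max_depth))


if __name__ == "__main__":
    depth = 4  # longest chain in TAXONOMY, counted in nodes
    print(leacock_chodorow(TAXONOMY, "Computer scientists", "Physicists", depth))
    print(leacock_chodorow(TAXONOMY, "British computer scientists", "Physicists", depth))
```

With the toy depth of 4, "Computer scientists" and "Physicists" (3 nodes on the path) score log(8/3), roughly 0.98. "British computer scientists" and "Physicists" (4 nodes) score log 2, roughly 0.69. Concepts that are closer in the taxonomy are therefore rated as more similar.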



This page is maintained by Ji-seon Yoo (
Last update: November 19, 2009