Automatic taxonomy construction

Automatic taxonomy construction (ATC) is the use of autonomous or semi-autonomous software programs to create hierarchical outlines or taxonomical classifications from a body of texts (corpus). It is a branch of natural language processing, which in turn is a branch of artificial intelligence. ATC programs are examples of software agents and intelligent agents, and may be autonomous as well (see autonomous agent).

Other names for ATC include taxonomy generation, taxonomy learning, taxonomy extraction, taxonomy building, and taxonomy induction. Any of these terms may be preceded by the word "automatic", as in automatic taxonomy induction. ATC is also referred to as semantic taxonomy induction.

A taxonomy is a tree structure with built-in familial relationships (parent-offspring, sibling, etc.), as in a family tree. For example, physics is an offspring of physical science, which in turn is an offspring of science.
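The tree structure described above can be sketched in a few lines of Python. This is only an illustration, not a standard ATC data structure: the class name `Taxonomy` and its methods are hypothetical, and the term "chemistry" is an assumption added so that the sibling relationship has something to show.

```python
from collections import defaultdict


class Taxonomy:
    """A tree of terms stored as child -> parent links (illustrative only)."""

    def __init__(self):
        self.parent = {}                   # term -> its single parent
        self.children = defaultdict(list)  # term -> list of offspring

    def add(self, child, parent):
        # A single parent per term is what makes this a tree rather than a
        # general graph (cycle checks are omitted for brevity).
        if child in self.parent:
            raise ValueError(f"{child!r} already has a parent")
        self.parent[child] = parent
        self.children[parent].append(child)

    def ancestors(self, term):
        # Walk upward from the term until a term with no parent (the root).
        chain = []
        while term in self.parent:
            term = self.parent[term]
            chain.append(term)
        return chain

    def siblings(self, term):
        # Siblings are the other offspring of the same parent.
        if term not in self.parent:
            return []
        return [c for c in self.children[self.parent[term]] if c != term]


taxonomy = Taxonomy()
taxonomy.add("physical science", "science")
taxonomy.add("physics", "physical science")
taxonomy.add("chemistry", "physical science")  # assumed term, not from the text

print(taxonomy.ancestors("physics"))
print(taxonomy.siblings("physics"))
```

Storing only the child-to-parent link keeps the parent-offspring relationship explicit, and the sibling relationship follows from it without being stored separately.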

As mentioned above, the process is also called taxonomy induction. This is because, in order for a software program to construct a taxonomy from a corpus (for example, from Wikipedia, a web page, or the World Wide Web), it must induce which terms belong to the taxonomy and what the relationships between them are, for example by identifying hyponym-hypernym pairs. This is done using algorithms, including statistical algorithms. Deduction (deductive logic) is often employed as well (e.g., if B is a sibling of A, then B has the same parent as A and is placed under that parent in the taxonomy).
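As a rough sketch of the two steps just described, the Python below induces hyponym-hypernym pairs from text using a single lexical pattern ("X such as A, B and C"), then applies the sibling deduction. The pattern-based approach is a swapped-in illustration, one of several possible techniques, and not the method of any particular ATC system. The function names, the sample sentence, and the term "geology" are all assumptions. The toy pattern only handles single-word hypernyms, does no lemmatization (so "sciences" stays plural), and assumes the list of hyponyms runs to the end of the clause.

```python
import re

# Toy lexical pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# The second group stops at the first character that is not a word
# character, space, or comma (e.g. a full stop).
PATTERN = re.compile(r"\b(\w+) such as ([\w ,]+)")

# Separators between listed hyponyms: a comma, "and", or "or".
SEPARATOR = re.compile(r"\s*(?:,|\band\b|\bor\b)\s*")


def extract_pairs(text):
    """Return (hyponym, hypernym) pairs induced from the toy pattern."""
    pairs = []
    for match in PATTERN.finditer(text):
        hypernym = match.group(1)
        for hyponym in SEPARATOR.split(match.group(2)):
            if hyponym:  # skip empty pieces, e.g. from ", and"
                pairs.append((hyponym, hypernym))
    return pairs


def place_sibling(parent_of, known_term, new_term):
    """Deduction: a sibling of known_term shares known_term's parent."""
    parent_of[new_term] = parent_of[known_term]


# Hypothetical sample sentence standing in for a corpus.
text = "The library covers sciences such as physics, chemistry and biology."
pairs = extract_pairs(text)

# Induced child -> parent links.
parent_of = dict(pairs)

# Suppose some other step found that "geology" is a sibling of "physics".
place_sibling(parent_of, "physics", "geology")

print(pairs)
print(parent_of["geology"])
```

The induction step works from surface evidence in the text, while the deduction step needs no further evidence: once a sibling relationship is known, the parent follows.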

The primary application of automatic taxonomy construction is in ontology learning, a central activity within ontology engineering. In computer science and artificial intelligence, an ontology is a conceptual model of a (subject) domain. A domain is a given subject area or specifically defined sphere of interest. An ontology of a domain includes the domain's vocabulary (the terms for its concepts or entities) and the relationships between those concepts or entities. The backbone of most ontologies is a taxonomy, and taxonomical structure may be used throughout an ontology.

As building taxonomies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.

This article is issued from Wikipedia, version of 4/20/2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply for the media files.