The library track is a real-world task: matching the STW and TheSoz thesauri. They provide vocabularies for economic and social science subjects, respectively, and are used by libraries for indexing and retrieval. The goal of this track is to show whether matching systems can handle these lightweight ontologies, which contain a very large number of concepts and additional descriptions. The results are evaluated in two steps: first, an automatic evaluation against an existing reference alignment; second, a manual review of correspondences that are not included in the reference alignment.
For this track, we use specific versions of the thesauri that do not include extensions such as SKOS-XL or custom ones.
There are two possibilities to run the data set: you can either download it or run it on the SEALS platform.
This reference alignment contains only equivalence correspondences between descriptors. Other correspondences, e.g. those with a subsumption relation or those between descriptors and non-descriptors, will be manually checked, if possible. It is not a strict 1:1 alignment! This is the updated reference alignment, which includes additional correspondences found by the matching systems.
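The two evaluation steps described above can be sketched with plain set operations. This is only an illustration, not the evaluation code used for the track: the function name `split_for_evaluation` and the placeholder identifiers are assumptions, and correspondences are simplified to (source, target) pairs with an implicit equivalence relation.

```python
def split_for_evaluation(system_alignment, reference_alignment):
    """Split a system alignment for the automatic and the manual evaluation step.

    Both arguments are iterables of (source_concept, target_concept) pairs.
    Sets of pairs are used because the reference is not a strict 1:1
    alignment, so one concept may occur in several pairs.
    """
    system = set(system_alignment)
    reference = set(reference_alignment)
    confirmed = system & reference           # step 1: found in the reference
    needs_manual_check = system - reference  # step 2: candidates for manual review
    not_found = reference - system           # reference pairs the system missed
    return confirmed, needs_manual_check, not_found


# Hypothetical placeholder identifiers, not real correspondences.
reference = {("stw:A", "thesoz:X"), ("stw:B", "thesoz:Y")}
system = {("stw:A", "thesoz:X"), ("stw:C", "thesoz:Z")}

confirmed, needs_manual_check, not_found = split_for_evaluation(system, reference)
print(len(confirmed), len(needs_manual_check), len(not_found))
```

Pairs in `needs_manual_check` are not counted as wrong at this point; they may be correct correspondences that the reference alignment simply does not contain yet.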
We applied the following transformations to create an OWL version from the SKOS data:
skos:Concept ➔ owl:Class
skos:prefLabel, skos:altLabel ➔ rdfs:label
skos:scopeNote, skos:notation ➔ rdfs:comment
skos:narrower ➔ rdfs:superClassOf
skos:broader ➔ rdfs:subClassOf
skos:related ➔ rdfs:seeAlso
Since OWL does not provide properties such as preferred label and alternative label, we map both labels to rdfs:label. As a result, the information about which label is the preferred one is lost. If you would like a more fine-grained distinction, you can implement your own SKOS to OWL transformation.
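A minimal sketch of such a transformation, assuming the data is available as (subject, predicate, object) triples with prefixed names. The target names are copied from the list above; the function name `skos_to_owl`, the example concept `ex:c1` and its labels, and the choice to pass unmapped triples through unchanged are assumptions made for illustration.

```python
# Predicate mapping as listed above (prefixed names kept as plain strings).
PREDICATE_MAP = {
    "skos:prefLabel": "rdfs:label",
    "skos:altLabel": "rdfs:label",
    "skos:scopeNote": "rdfs:comment",
    "skos:notation": "rdfs:comment",
    "skos:narrower": "rdfs:superClassOf",
    "skos:broader": "rdfs:subClassOf",
    "skos:related": "rdfs:seeAlso",
}


def skos_to_owl(triples):
    """Rewrite SKOS triples into the OWL vocabulary described above."""
    result = []
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and obj == "skos:Concept":
            # Every SKOS concept becomes an OWL class.
            result.append((subject, "rdf:type", "owl:Class"))
        elif predicate in PREDICATE_MAP:
            result.append((subject, PREDICATE_MAP[predicate], obj))
        else:
            # Assumption: triples without a mapping are kept unchanged.
            result.append((subject, predicate, obj))
    return result


# Hypothetical example concept with one preferred and one alternative label.
skos_triples = [
    ("ex:c1", "rdf:type", "skos:Concept"),
    ("ex:c1", "skos:prefLabel", "Labour market"),
    ("ex:c1", "skos:altLabel", "Job market"),
]
owl_triples = skos_to_owl(skos_triples)
for triple in owl_triples:
    print(triple)
```

After the rewrite both labels carry the same predicate, which is exactly the loss of information mentioned above. A more fine-grained variant could map skos:altLabel to a different annotation property of your choice.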
We introduced a new namespace for the OWL version to avoid any confusion with the original data. First, we changed the base namespaces as follows:
http://zbw.eu/stw/ ➔ http://stw.owl
http://lod.gesis.org/thesoz/ ➔ http://thesoz.owl
Within the original data, the concepts are divided into descriptors and non-descriptors. This distinction is dropped during the transformation into OWL. Thus, the corresponding encoding in the URIs is also omitted.
Moreover, only letters and dots are permitted within the URI. Examples:
http://zbw.eu/stw/thsys/72180 ➔ http://stw.owl#72180
http://zbw.eu/stw/descriptor/16207-5 ➔ http://stw.owl#16207.5
http://lod.gesis.org/thesoz/classification/4.1.07 ➔ http://thesoz.owl#4.1.07
http://lod.gesis.org/thesoz/concept/10034303 ➔ http://thesoz.owl#10034303
These transformations simplify the matching, but the original data can nevertheless be reconstructed.
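The URI rewriting shown in the examples above can be sketched as follows. This is an illustration inferred from the four examples, not the script that was actually used: the function name `to_owl_uri` is hypothetical, and the sketch assumes that the last path segment is the local identifier and that the hyphen is the only character that needs to be replaced by a dot.

```python
# Base namespaces of the original data and of the OWL version.
BASE_MAP = {
    "http://zbw.eu/stw/": "http://stw.owl",
    "http://lod.gesis.org/thesoz/": "http://thesoz.owl",
}


def to_owl_uri(skos_uri: str) -> str:
    """Map an original thesaurus URI to its URI in the OWL version."""
    for old_base, new_base in BASE_MAP.items():
        if skos_uri.startswith(old_base):
            # Drop the type segment (descriptor, thsys, concept, ...) and
            # keep only the last path segment as the local identifier.
            local_id = skos_uri.rstrip("/").rsplit("/", 1)[-1]
            # Assumption: hyphens are the only characters to replace.
            return f"{new_base}#{local_id.replace('-', '.')}"
    raise ValueError(f"URI is not in a known namespace: {skos_uri}")


print(to_owl_uri("http://zbw.eu/stw/descriptor/16207-5"))
print(to_owl_uri("http://lod.gesis.org/thesoz/classification/4.1.07"))
```

The reverse direction is not covered by this sketch, because the dropped type segment has to be looked up in the original data.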
The STW Thesaurus for Economics provides vocabulary on any economic subject: more than 6,000 standardized subject headings (skos:Concepts, with preferred labels in English and German) and 19,000 additional keywords (skos:altLabels) in both languages. The vocabulary was developed for indexing purposes in libraries and economic research institutions and includes technical terms used in law, sociology, or politics, as well as geographic names. The entries are richly interconnected by 16,000 skos:broader/narrower and 10,000 skos:related relations. An additional hierarchy of main categories provides a high-level overview. The vocabulary is maintained on a regular basis by ZBW German National Library of Economics - Leibniz Centre for Economics and published under a CC-by-sa-nc license. An online XHTML/RDFa version for convenient browsing is available here.
The Thesaurus for the Social Sciences (TheSoz) serves as a crucial instrument for indexing documents and research information in the social sciences. It contains about 12,000 keywords in total, of which 8,000 are standardized subject headings (in English and German) and 4,000 are additional keywords. The thesaurus covers all topics and sub-disciplines of the social sciences. Additionally, terms from associated and related disciplines are included in order to support accurate and adequate indexing of interdisciplinary, practice-oriented and multi-cultural documents. The thesaurus is owned and maintained by GESIS - Leibniz Institute for the Social Sciences. Its SKOS version is published under a CC-by-nc-nd license. An HTML representation for browsing is available here.
The existing mapping between STW and TheSoz was created manually by domain experts in the KoMoHe project and does not cover the changes and enhancements made to both thesauri since 2006. The evaluation is therefore intended to show whether, and to what degree, the creation of the alignment can be automated. Moreover, the results could inform further automatic or semi-automatic mapping procedures to be implemented for regular maintenance of the mapping. Additionally, we would like to see how well current state-of-the-art matching systems deal with lightweight ontologies, which are very often used in practice. Given the large number of concepts together with the many semantic relations and additional keywords, the matching systems need to find a way to cope with these conditions. Thus, it should become clear which matching techniques are best suited for such real-world tasks.
Dominique Ritze (Research Group Data and Web Science, University of Mannheim) dominique[.][at]informatik[.]uni-mannheim[.]de
Kai Eckert (Research Group Data and Web Science, University of Mannheim)
Andreas Oskar Kempf (GESIS)
Joachim Neubert (ZBW)