Ontology Alignment Evaluation Initiative - OAEI-2022 Campaign

Results for OAEI 2022 - Common Knowledge Graphs Track

Experiment Settings

The evaluation has been executed on a Linux virtual machine with 128 GB of RAM and 16 vCPUs (2.4 GHz). Precision, Recall, and F-measure have been computed with respect to the reference alignment for this task. The evaluation was performed using the MELT evaluation toolkit, which was used to evaluate both SEALS matchers and those submitted as web packages. This year we added a new benchmark dataset in addition to the original Nell-DBpedia dataset: the new benchmark aligns classes from Yago and Wikidata. The gold standard for Nell-DBpedia is only a partial gold standard [1]. Therefore, to avoid over-penalising systems that may discover reasonable matches that are not coded in the gold standard, we ignore any predicted match if neither of the classes in that pair is present as a true positive pair with another class in the gold standard. The two datasets annotate numerous instances, which some matchers unfortunately could not handle. Therefore, this year we evaluate all matchers on a small version of both benchmarks; these small versions contain exactly the same schema information, but fewer annotated instances.
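The partial gold standard rule above can be sketched as follows. This is a hypothetical re-implementation for illustration only (the actual evaluation used MELT); the function names and the set-based alignment representation are assumptions.

```python
from typing import Set, Tuple

Match = Tuple[str, str]  # (source class, target class)

def filter_predictions(predicted: Set[Match], gold: Set[Match]) -> Set[Match]:
    """Drop predicted pairs in which neither class occurs anywhere in the
    (partial) gold standard, so such pairs are neither rewarded nor penalised."""
    gold_sources = {s for s, _ in gold}
    gold_targets = {t for _, t in gold}
    return {(s, t) for s, t in predicted
            if s in gold_sources or t in gold_targets}

def precision_recall_f1(predicted: Set[Match], gold: Set[Match]):
    """Compute P/R/F1 on the filtered predictions against the gold standard."""
    kept = filter_predictions(predicted, gold)
    tp = len(kept & gold)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, a predicted pair whose classes appear nowhere in the gold standard is simply ignored rather than counted as a false positive.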

Generated alignments

The alignment files produced by all of the evaluated matchers are also available to download here.

Participation and Success

This year, 8 matchers registered to participate in this track. We ran all participants on the two small benchmarks to test their ability to match the schemas of large-scale common KGs. Three matching systems failed with an exception: SEBMatcher, GraphMatcher, and WomboCombo, while ALion and Ciderlm completed the task but produced empty alignment files. Therefore, our results only include the matchers that completed the task with a non-empty alignment file within the 12-hour timeout: LogMap, ATMatcher, Matcha, KGMatcher+, LogMapLite, LogMapKG, AMD, and LsMatch. We have also evaluated a simple string-based baseline matcher, which computes the similarity between class labels in order to generate candidate matching classes. The baseline we use is the SimpleStringMatcher available through the Matching EvaLuation Toolkit (MELT); its source code can be found here.
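A string-based baseline of this kind can be sketched as below. This is a minimal illustration, not MELT's actual SimpleStringMatcher; the normalisation rules, the `threshold` parameter, and the use of `difflib` similarity are assumptions.

```python
import re
from difflib import SequenceMatcher

def normalise(label: str) -> str:
    """Split camelCase, collapse separators, and lowercase a class label."""
    label = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return re.sub(r"[\s_\-]+", " ", label).strip().lower()

def match_classes(source_labels, target_labels, threshold=0.9):
    """Return candidate class pairs whose normalised labels are sufficiently
    similar (hypothetical sketch of a string-based baseline matcher)."""
    candidates = []
    for s in source_labels:
        for t in target_labels:
            sim = SequenceMatcher(None, normalise(s), normalise(t)).ratio()
            if sim >= threshold:
                candidates.append((s, t, sim))
    return candidates
```

Under this sketch, for instance, the labels "SportsTeam" and "sports team" normalise to the same string and are proposed as a candidate match.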


The tables below show the aggregated results on the two datasets for systems that produced non-empty alignment files. The size column indicates the total number of class alignments discovered by each system. While the majority of the systems discovered alignments at both the schema and instance levels, we have only evaluated class alignments, as the two gold standards do not include any instance-level ground truth.

Further, not all systems were able to handle the original dataset versions (i.e., those with all annotated instances). On the Nell-DBpedia test case, LogMap, ATMatcher, KGMatcher+, and AMD were able to generate results when applied to the full-size dataset. On the Yago-Wikidata dataset, which is large-scale compared to the first, only ATMatcher, KGMatcher+, and AMD were able to generate alignments on the original dataset. Other systems either failed to complete the task within the allocated 24-hour time limit (LogMapLite, Matcha, and LsMatch) or produced an empty alignment file (Matcha, and LogMap on the Yago-Wikidata dataset only). LogMapKG, on the other hand, tends to align only instances when applied to the full-size datasets. As in the 2021 evaluation results, AMD does generate schema alignments, but in the wrong format; therefore, they cannot be evaluated.

On the Nell-DBpedia dataset, all systems outperformed the string baseline in terms of F1 score, except for LogMapLite. On the Yago-Wikidata dataset, two systems did not outperform the baseline: LogMapLite and LsMatch. In terms of runtime, the tables below report run time as HH:MM:SS, where we can observe that all matchers finished the task in less than an hour except for KGMatcher+. Finally, the Dataset Size column identifies whether the system was able to run on the original dataset or only on the smaller version.


Results on the Nell-DBpedia dataset:

Matcher Alignment Size Precision Recall F1 measure Time Dataset Size
LogMap 105 0.99 0.80 0.88 00:03:17 original
ATMatcher 104 1.00 0.80 0.89 00:03:10 original
Matcha 104 1.00 0.81 0.90 00:01:00 small
KGMatcher+ 117 1.00 0.91 0.95 02:43:50 original
LogMapLite 77 1.00 0.60 0.75 00:26:19 small
LogMapKG 104 0.98 0.80 0.88 00:00:00 small
AMD 102 0.00 0.00 0.00 00:00:23 original
LsMatch 101 0.96 0.75 0.84 00:00:52 small
String Baseline 78 1.00 0.60 0.75 00:00:37 original


Results on the Yago-Wikidata dataset:

Matcher Alignment Size Precision Recall F1 measure Time Dataset Size
LogMap 233 1.00 0.76 0.86 00:01:19 small
ATMatcher 233 1.00 0.77 0.87 00:19:04 original
Matcha 243 1.00 0.80 0.89 00:03:18 small
KGMatcher+ 253 0.99 0.83 0.91 02:07:59 original
LogMapLite 211 1.00 0.70 0.81 00:48:19 small
LogMapKG 232 1.00 0.76 0.83 00:00:10 small
AMD 125 0.00 0.00 0.00 00:29:04 original
LsMatch 196 0.96 0.63 0.76 00:02:28 small
String Baseline 212 1.00 0.70 0.82 00:00:02 original


This track is organized by:

For any questions or suggestions about the track please email: oafallatah1 at sheffield dot ac dot uk


[1] Fallatah, O., Zhang, Z., Hopfgartner, F.: A gold standard dataset for large knowledge graphs matching. In: Proceedings of the 15th Ontology Matching Workshop (2020). [pdf]