Ontology Alignment Evaluation Initiative - OAEI-2023 Campaign

Results for OAEI 2023 - Common Knowledge Graphs Track

Experiment Settings

The evaluation was executed on a virtual machine with 16 GB of RAM and 16 vCPUs (2.4 GHz). Precision, Recall, and F-measure were computed with respect to the reference alignment for this task. The evaluation was performed with the MELT toolkit, which was used to evaluate both SEALS matchers and those submitted as web packages.

This year we added a new benchmark dataset in addition to the original Nell-DBpedia dataset. The new benchmark aligns classes from Yago and Wikidata. The gold standard for Nell-DBpedia is only a partial gold standard [1]. Therefore, to avoid over-penalising systems that discover reasonable matches not encoded in the gold standard, we ignore a predicted match if neither of its classes appears in a true positive pair of the gold standard. Both datasets annotate numerous instances, which some matchers could not handle. Therefore, this year we also evaluate all matchers on a small version of each benchmark; the small versions carry exactly the same schema information, but with fewer annotated instances.
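To make the partial-gold-standard rule concrete, the following is a minimal sketch of how Precision, Recall, and F-measure could be computed under it, using plain Python over sets of class pairs. This is illustrative only and is not the MELT implementation; the function name and the example URIs are hypothetical.

# Minimal sketch of precision/recall under a partial gold standard.
# Not the MELT implementation; names and example URIs are illustrative.

def evaluate(predicted, gold):
    """predicted, gold: sets of (source_class, target_class) pairs."""
    # Every class that occurs anywhere in the gold standard.
    gold_classes = {c for pair in gold for c in pair}

    # Ignore a predicted pair when neither of its classes appears in the
    # gold standard, so systems are not penalised for plausible matches
    # the (partial) gold standard simply does not cover.
    considered = {p for p in predicted
                  if p[0] in gold_classes or p[1] in gold_classes}

    tp = len(considered & gold)
    precision = tp / len(considered) if considered else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f1 = evaluate(
    predicted={("nell:Sport", "dbo:Sport"),
               ("nell:Coach", "dbo:Trainer"),  # ignored: neither class in gold
               ("nell:City", "dbo:Town")},
    gold={("nell:Sport", "dbo:Sport"), ("nell:City", "dbo:City")},
)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=0.50 R=0.50 F1=0.50

Note how the hypothetical ("nell:Coach", "dbo:Trainer") prediction is excluded from the precision denominator because neither class occurs in the gold standard.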

Generated alignments

The alignment files produced by all of the evaluated matchers are also available to download here.
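These files typically follow the Alignment API RDF/XML format used throughout OAEI. As a convenience, the sketch below shows how the correspondences in such a file could be read; the function name is hypothetical, and the namespace and element details assume the standard Alignment Format rather than anything specific to this track.

# Sketch: reading correspondences from an OAEI alignment file.
# Assumes the standard Alignment API RDF/XML format; adjust if a
# downloaded file deviates from it.
import xml.etree.ElementTree as ET

ALIGN = "{http://knowledgeweb.semanticweb.org/heterogeneity/alignment}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

def read_alignment(path):
    """Yield one (entity1, entity2, relation, confidence) tuple per Cell."""
    root = ET.parse(path).getroot()
    for cell in root.iter(ALIGN + "Cell"):
        e1 = cell.find(ALIGN + "entity1").get(RDF + "resource")
        e2 = cell.find(ALIGN + "entity2").get(RDF + "resource")
        rel = cell.findtext(ALIGN + "relation", default="=")
        conf = float(cell.findtext(ALIGN + "measure", default="1.0"))
        yield e1, e2, rel, conf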

Participation and Success

This year, 7 matchers registered to participate in this track: LogMap, LogMapLite, LogMapKG, AMD, LsMatch, Matcha, and OLaLa. Our results only include matchers that completed the task with a non-empty alignment file within the 12-hour timeout. Further, because their systems require special hardware or software resources, we obtained the alignments of OLaLa and Matcha from the participants. We have also evaluated a simple string-based baseline matcher, which computes the similarity between class labels in order to generate candidate matching classes; a sketch of the idea follows below. The baseline is the SimpleStringMatcher available through the Matching EvaLuation Toolkit (MELT), and its source code can be found here.
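MELT's SimpleStringMatcher is implemented in Java; purely for illustration, the sketch below captures the general idea of such a label-based baseline in Python. The normalisation step and the exact-match policy are assumptions for this sketch, not a description of MELT's actual code.

# Illustrative label-equality baseline (not MELT's SimpleStringMatcher).

def normalise(label):
    # Lowercase and strip everything but letters and digits, so that
    # "Sports Team" and "sports_team" compare equal.
    return "".join(ch for ch in label.lower() if ch.isalnum())

def string_baseline(source_labels, target_labels):
    """source_labels/target_labels: dicts mapping class URI -> label.
    Returns candidate class pairs whose normalised labels match exactly."""
    index = {}
    for uri, label in target_labels.items():
        index.setdefault(normalise(label), []).append(uri)
    return [(s_uri, t_uri)
            for s_uri, s_label in source_labels.items()
            for t_uri in index.get(normalise(s_label), [])]

print(string_baseline({"nell:sportsteam": "Sports Team"},
                      {"dbo:SportsTeam": "sports team"}))
# -> [('nell:sportsteam', 'dbo:SportsTeam')]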

Results

The tables below present aggregated results on the two datasets for systems that produced non-empty alignment files. The "Alignment Size" column indicates the total number of class alignments discovered by each system. Although most systems found alignments at both the schema and instance levels, our evaluation covers class alignments only, since the two gold standards contain no instance-level ground truth.

On the Nell-DBpedia test case, LogMap, OLaLa, Matcha, and AMD successfully produced results on the full-size dataset. Some systems, however, struggled with the original dataset versions containing all annotated instances: LogMapLite and LsMatch failed to complete the task within the allocated timeout; LogMap handled the smaller version of Yago-Wikidata but produced empty alignment files on the original; and LogMapKG primarily aligned instances when given the full-size datasets. AMD generated schema alignments, but in an incorrect format, rendering them unassessable.

In terms of performance, all systems except LogMapLite outperformed the string baseline on the Nell-DBpedia dataset, whereas on the Yago-Wikidata dataset LogMapLite and LsMatch were unable to surpass the baseline. This year saw the return of several matchers and the introduction of a new one, OLaLa. While most matchers performed similarly to previous evaluations, Matcha notably improved its results on both datasets; it was also able to process the original datasets, a capability it lacked in the 2022 evaluation. OLaLa outperformed all other matchers on the Nell-DBpedia task, whereas Matcha excelled on the larger dataset, Yago-Wikidata. All matching processes completed in under an hour, as indicated in the "Time" column. Finally, the "Dataset Version" column specifies whether a system operated on the original dataset or only on the smaller version.

Nell-DBpedia

Matcher           Alignment Size   Precision   Recall   F1 measure   Time (hh:mm:ss)   Dataset Version
LogMap            105              0.99        0.80     0.88         00:03:17          original
OLaLa             120              1.00        0.92     0.96         00:07:07          original
LogMapLite        77               1.00        0.60     0.75         00:26:19          small
LogMapKG          104              0.98        0.80     0.88         00:00:00          small
AMD               102              0.00        0.00     0.00         00:00:23          original
LsMatch           101              0.96        0.75     0.84         00:00:52          small
Matcha            114              0.99        0.87     0.93         00:01:53          original
String Baseline   78               1.00        0.60     0.75         00:00:37          original

Yago-Wikidata

Matcher           Alignment Size   Precision   Recall   F1 measure   Time (hh:mm:ss)   Dataset Version
LogMap            233              1.00        0.76     0.86         00:00:26          small
OLaLa             209              1.00        0.68     0.81         00:03:56          original
LogMapLite        211              1.00        0.70     0.81         00:54:13          small
LogMapKG          232              1.00        0.76     0.83         00:00:10          small
AMD               125              0.00        0.00     0.00         00:29:04          original
LsMatch           196              0.97        0.63     0.76         00:02:33          small
Matcha            274              0.99        0.90     0.94         00:07:16          original
String Baseline   212              1.00        0.70     0.82         00:00:02          original

Organizers

This track is organized by:

For any questions or suggestions about the track, please email: oafallatah at uqu dot edu dot sa

References

[1] Fallatah, O., Zhang, Z., Hopfgartner, F.: A gold standard dataset for large knowledge graphs matching. In: Proceedings of the 15th Ontology Matching Workshop (2020). [pdf]