The evaluation was executed on a virtual machine with 16 GB of RAM and 16 vCPUs (2.4 GHz). Precision, Recall, and F-measure were computed with respect to the reference alignment for this task. The evaluation was performed with the MELT evaluation toolkit, which was used to evaluate both SEALS matchers and those submitted as web packages.

This year we added a new benchmark dataset in addition to the original Nell-DBpedia dataset; the new benchmark aligns classes from Yago and Wikidata. The gold standard for Nell-DBpedia is only a partial gold standard [1]. Therefore, to avoid over-penalising systems that discover reasonable matches not coded in the gold standard, a predicted match is ignored if neither of its classes appears in any pair of the gold standard. Both datasets annotate numerous instances, which some matchers could not handle. Therefore, this year we also evaluate all matchers on a small version of each benchmark; the small versions contain exactly the same schema information, but with fewer annotated instances.
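To make this filtering rule concrete, the following minimal sketch (plain Python, not the actual MELT implementation; all names and the toy class URIs are illustrative) scores a set of predicted class pairs against a partial gold standard:

```python
# Minimal sketch of scoring against a partial gold standard
# (illustrative Python, not the actual MELT implementation).

def evaluate_partial(predicted, gold):
    """`predicted` and `gold` are sets of (source_class, target_class) pairs."""
    # Classes that occur in at least one gold-standard pair.
    known = {c for pair in gold for c in pair}
    # Ignore a predicted match when neither of its classes is known to the
    # gold standard, so systems are not penalised for plausible matches
    # that fall outside its (partial) scope.
    considered = {(s, t) for (s, t) in predicted if s in known or t in known}
    tp = len(considered & gold)
    precision = tp / len(considered) if considered else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example (class URIs are made up):
gold = {("nell:Person", "dbo:Person"), ("nell:City", "dbo:City")}
predicted = {
    ("nell:Person", "dbo:Person"),  # true positive
    ("nell:City", "dbo:Town"),      # kept: nell:City occurs in the gold standard
    ("nell:Sport", "dbo:Sport"),    # ignored: neither class occurs in the gold standard
}
print(evaluate_partial(predicted, gold))  # (0.5, 0.5, 0.5)
```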
The alignment files produced by all of the evaluated matchers are also available for download here.
The tables below show aggregated results on the two datasets for the systems that generated non-empty alignment files. The "Alignment Size" column indicates the total number of class alignments discovered by each system, and the "Dataset Size" column specifies whether a system operated on the original dataset or only on the smaller version. Although most systems found alignments at both the schema and instance level, our evaluation focuses solely on class alignments, since neither gold standard provides instance-level ground truth (a sketch of how class-level correspondences can be counted is given after the tables).

On the Nell-DBpedia test case, LogMap, OLaLa, Matcha, and AMD successfully produced results on the full-size dataset, whereas some systems struggled with the original dataset versions containing all annotated instances. Specifically, LogMapLite and LsMatch failed to complete the task within the allocated 24-hour time limit, and LogMap handled the smaller version of Yago-Wikidata but produced empty alignment files on the original dataset. On the full-size datasets, LogMapKG primarily aligned instances. Additionally, AMD generated schema alignments, but in an incorrect format, rendering them unassessable.

In terms of performance, all systems except LogMapLite outperformed the basic string matcher on the Nell-DBpedia dataset, whereas on the Yago-Wikidata dataset neither LogMapLite nor LsMatch surpassed the baseline. This year saw the return of several matchers and the introduction of a new one, OLaLa. While most matchers performed similarly to previous evaluations, Matcha notably improved its results on both datasets and was able to run on the original datasets, a capability it lacked in the 2022 evaluation. Notably, OLaLa outperformed all other matchers on the Nell-DBpedia task, whereas Matcha excelled on the larger dataset, Yago-Wikidata. Finally, all matching processes completed in less than an hour, as shown in the runtime column.
Results for the Nell-DBpedia dataset:

Matcher | Alignment Size | Precision | Recall | F1 measure | Runtime (hh:mm:ss) | Dataset Size |
---|---|---|---|---|---|---|
LogMap | 105 | 0.99 | 0.80 | 0.88 | 00:03:17 | original |
OLaLa | 120 | 1.00 | 0.92 | 0.96 | 00:07:07 | original |
LogMapLite | 77 | 1.00 | 0.60 | 0.75 | 00:26:19 | small |
LogMapKG | 104 | 0.98 | 0.80 | 0.88 | 00:00:00 | small |
AMD | 102 | 0.00 | 0.00 | 0.00 | 00:00:23 | original |
LsMatch | 101 | 0.96 | 0.75 | 0.84 | 00:00:52 | small |
Matcha | 114 | 0.99 | 0.87 | 0.93 | 00:01:53 | original |
String Baseline | 78 | 1.00 | 0.60 | 0.75 | 00:00:37 | original |
Results for the Yago-Wikidata dataset:

Matcher | Alignment Size | Precision | Recall | F1 measure | Runtime (hh:mm:ss) | Dataset Size |
---|---|---|---|---|---|---|
LogMap | 233 | 1.00 | 0.76 | 0.86 | 00:00:26 | small |
OLaLa | 209 | 1.00 | 0.68 | 0.81 | 00:03:56 | original |
LogMapLite | 211 | 1.00 | 0.70 | 0.81 | 00:54:13 | small |
LogMapKG | 232 | 1.00 | 0.76 | 0.83 | 00:00:10 | small |
AMD | 125 | 0.00 | 0.00 | 0.00 | 00:29:04 | original |
LsMatch | 196 | 0.97 | 0.63 | 0.76 | 00:02:33 | small |
Matcha | 274 | 0.99 | 0.90 | 0.94 | 00:07:16 | original |
String Baseline | 212 | 1.00 | 0.70 | 0.82 | 00:00:02 | original |
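As noted above, only class-level correspondences count toward the "Alignment Size" column. The sketch below shows one way such a count could be obtained from an alignment file in the Alignment API RDF format; the pre-extracted class URI sets and all names are assumptions, and the actual evaluation relies on MELT rather than this script.

```python
# Sketch: extract class-level correspondences from an alignment file in the
# Alignment API RDF format (illustrative; the evaluation itself uses MELT).
import xml.etree.ElementTree as ET

ALIGN = "{http://knowledgeweb.semanticweb.org/heterogeneity/alignment}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

def class_alignments(path, source_classes, target_classes):
    """Return (entity1, entity2) pairs whose entities are known class URIs.

    `source_classes` and `target_classes` are sets of class URIs assumed to
    have been extracted beforehand from the two input knowledge graphs.
    """
    pairs = []
    for cell in ET.parse(path).iter(ALIGN + "Cell"):
        e1 = cell.find(ALIGN + "entity1").get(RDF + "resource")
        e2 = cell.find(ALIGN + "entity2").get(RDF + "resource")
        # Keep schema-level (class) correspondences only; instance matches
        # are skipped because the gold standards cover classes exclusively.
        if e1 in source_classes and e2 in target_classes:
            pairs.append((e1, e2))
    return pairs

# Usage (hypothetical file name and class sets):
# size = len(class_alignments("LogMap-nell-dbpedia.rdf", nell_classes, dbpedia_classes))
```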
This track is organized by:
For any questions or suggestions about the track, please email: oafallatah at uqu dot edu dot sa
[1] Fallatah, O., Zhang, Z., Hopfgartner, F.: A Gold Standard Dataset for Large Knowledge Graphs Matching. In: Proceedings of the 15th Ontology Matching Workshop (2020). [pdf]