MultiFarm Results for OAEI 2020

On this page, we report the results of the OAEI 2020 campaign for the MultiFarm track. Details on this data set can be found on the MultiFarm web page.

If you notice any kind of error (wrong numbers, incorrect information on a matching system, etc.), do not hesitate to contact us (the e-mail addresses are given in the Contact section at the end of this page).

Experimental setting

We conducted the evaluation on the blind data set. This data set includes the matching tasks involving the edas and ekaw ontologies (resulting in 55 x 24 tasks). Participants were able to test their systems on the open subset of tasks, available via the SEALS repository. The open subset comprises 45 x 25 tasks and does not include the Italian translations.
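The factors 55 and 45 correspond to the number of unordered language pairs with and without Italian. As a quick check, assuming the eleven language codes that appear in the per-pair results table on this page, the counts can be reproduced with the short sketch below (the function and variable names are illustrative only):

```python
from itertools import combinations

# Language codes as they appear in the per-pair results table on this page.
LANGUAGES = ["ar", "cn", "cz", "de", "en", "es", "fr", "it", "nl", "pt", "ru"]

def language_pairs(languages):
    """Return all unordered pairs of distinct languages, e.g. 'ar-cn'."""
    return [f"{a}-{b}" for a, b in combinations(sorted(languages), 2)]

blind_pairs = language_pairs(LANGUAGES)
# The open subset does not include the Italian translations.
open_pairs = language_pairs([lang for lang in LANGUAGES if lang != "it"])

print(len(blind_pairs))  # 55
print(len(open_pairs))   # 45
```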

We distinguish two types of matching tasks:

(i) those tasks where two different ontologies (edas-ekaw, for instance) have been translated into two different languages;
(ii) those tasks where the same ontology (edas-edas) has been translated into two different languages.

As observed in previous evaluations, for the tasks of type (ii), good results are not directly related to the use of specific techniques for dealing with cross-lingual ontologies, but to the ability to exploit the fact that both ontologies have an identical structure.

Participants

This year, 6 systems registered to participate in the MultiFarm track: AML, Lily, LogMap, LogMapLt, VeeAlign and Wiktionary. This is a slight increase with respect to the last campaign (5 in 2019, 6 in 2018, 8 in 2017, 7 in 2016, 5 in 2015, 3 in 2014, 7 in 2013, and 7 in 2012). The reader can refer to the OAEI papers for a detailed description of the strategies adopted by each system.

Evaluation results

Execution setting and runtime

The systems were executed on an Ubuntu Linux machine configured with 8GB of RAM and an Intel Core CPU (2.00GHz x 4). All measurements are based on a single run. As in each campaign, we observed large differences in the time required for a system to complete the 55 x 24 matching tasks: AML (170 minutes), Lily (453 minutes), LogMap (43 minutes) and Wiktionary (1290 minutes). These numbers are not comparable to those from last year, given that the SEALS repositories have moved to another server (with a different configuration). Furthermore, concurrent access to the SEALS repositories during the evaluation period may have had an impact on the time required to complete the tasks. These measurements are therefore only indicative of the time the systems require to finish the tasks in a common environment. Note as well that VeeAlign was run in a different environment, so its runtime is not reported.

Overall results

The table below presents the aggregated results for the matching tasks. They have been computed using the Alignment API 4.6 and can differ slightly from those computed with the SEALS client. We have not applied any threshold to the results. They are measured in terms of classical precision and recall. We do not report the results of non-specific systems here, as we observed in the last campaigns that they can have intermediate results in tests of type (ii) and poor performance in tests of type (i). Lily generated empty alignments, so there are no results to report for it.
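For readers less familiar with these measures, the sketch below shows the classical definitions of precision, recall and F-measure over sets of correspondences. It is a minimal illustration, not the Alignment API implementation: the function name, the representation of a correspondence as a pair of entity names, and the toy data are all assumptions made for the example.

```python
import math

def evaluate(found, reference):
    """Classical precision, recall and F-measure of an alignment.

    Both arguments are sets of correspondences; here a correspondence is
    simply a hashable value such as a (source_entity, target_entity) tuple.
    A zero denominator yields NaN instead of raising an exception.
    """
    correct = len(found & reference)
    precision = correct / len(found) if found else math.nan
    recall = correct / len(reference) if reference else math.nan
    if math.isnan(precision) or math.isnan(recall) or precision + recall == 0:
        f_measure = math.nan
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical toy data, not taken from the MultiFarm reference alignments.
reference = {("edas:Paper", "ekaw:Paper"), ("edas:Author", "ekaw:Author"),
             ("edas:Review", "ekaw:Review"), ("edas:Topic", "ekaw:Topic")}
found = {("edas:Paper", "ekaw:Paper"), ("edas:Author", "ekaw:Person")}

p, r, f = evaluate(found, reference)
print(p, r)  # 0.5 0.25
```

With this convention, an empty alignment gives an undefined (NaN) precision and a recall of 0, which is the pattern shown for some language pairs in the per-pair table further down this page.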

AML outperforms all other systems in terms of F-measure for tasks of type (i) (the same behaviour as in the last campaigns). In terms of precision, Wiktionary generates the most precise alignments, followed by LogMap and AML. For tasks of type (ii), LogMap has the best overall performance.

MultiFarm aggregated results per matcher, for each type of matching task: different ontologies (i) and same ontologies (ii). Time is measured in minutes (for completing the 55 x 24 matching tasks); ** indicates a tool that was run in a different environment, so its runtime is not reported. #pairs indicates the number of pairs of languages for which the tool was able to generate (non-empty) alignments. Size indicates the average number of generated correspondences for the tests where a (non-empty) alignment was generated. Two kinds of results are reported: those that do not distinguish empty and erroneous (or not generated) alignments, and those, given in parentheses, that consider only the non-empty alignments generated for a pair of languages.
			Different ontologies (i)				Same ontologies (ii)
System	Time	#pairs	Size	Prec.	F-m.	Rec.	Size	Prec.	F-m.	Rec.
AML	170	55	8.25	.72 (.72)	.47 (.47)	.35 (.35)	33.65	.94 (.96)	.28 (.28)	.17 (.17)
LogMap	43	55	6.64	.73 (.72)	.37 (.37)	.25 (.25)	46.62	.95 (.97)	.41 (.43)	.28 (.28)
LogMapLt	17	23	1.15	.34 (.35)	.04 (.09)	.02 (.02)	95.17	.02 (.02)	.01 (.03)	.01 (.01)
VeeAlign	**	52	2.53	.73 (.77)	.15 (.15)	.09 (.09)	11.98	.91 (.93)	.14 (.14)	.08 (.08)
Wiktionary	1290	53	4.92	.77 (.80)	.32 (.33)	.21 (.21)	9.38	.94 (.96)	.12 (.13)	.07 (.07)
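The difference between the two kinds of figures in the table (plain and in parentheses) can be illustrated with a small sketch. This page does not spell out the exact aggregation procedure, so the code below is only one plausible reading: it assumes a plain average over language pairs, where a pair without a generated alignment either counts as 0 or is left out. The function name and the sample values are hypothetical.

```python
import math

def aggregate(per_pair_scores):
    """Average a metric over language pairs in two ways.

    per_pair_scores maps a language pair to a score, or to NaN when no
    (non-empty) alignment was produced for that pair.
    Returns (overall, non_empty_only).
    """
    values = list(per_pair_scores.values())
    produced = [v for v in values if not math.isnan(v)]
    # Assumption: a missing alignment contributes 0 to the overall figure...
    overall = sum(produced) / len(values) if values else math.nan
    # ...and is simply left out of the figure shown in parentheses.
    non_empty_only = sum(produced) / len(produced) if produced else math.nan
    return overall, non_empty_only

# Hypothetical precision values for four language pairs, one of them missing.
precision = {"cz-de": 0.8, "cz-en": 0.6, "cz-es": 0.7, "cz-pt": math.nan}
overall, non_empty_only = aggregate(precision)
print(round(overall, 3), round(non_empty_only, 3))
```

Under this assumption the figure in parentheses is never lower than the plain one, and the two coincide when a system produces alignments for all 55 pairs.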

AML, LogMap and Wiktionary also participated last year. Unfortunately, we lost one participant along the way (EVOCROS), but gained a newcomer (VeeAlign). Compared with last year's results, in terms of F-measure (cases of type i), AML maintains its overall performance (.45 in 2019, .46 in 2018, .46 in 2017, .45 in 2016 and .47 in 2015). The same can be observed for LogMap (.37 in 2019, .37 in 2018, .36 in 2017, and .37 in 2016). Wiktionary's F-measure also remains stable (.31 in 2019).

Language specific results (type i)

The table below presents the results per pair of languages for the matching of different ontologies (test cases of type i).

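MultiFarm results per pair of languages (55 pairs), for the test cases of type (i). For each system, the three columns give precision, F-measure and recall, in the same order as in the aggregated table above.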

	AML			LogMap			LogMapLt			VeeAlign			Wiktionary
ar-cn	0,59	0,26	0,16	0,62	0,19	0,11	0,00	NaN	0,00	0,60	0,05	0,02	NaN	NaN	0,00
ar-cz	0,70	0,40	0,28	0,72	0,40	0,28	0,00	NaN	0,00	0,86	0,12	0,07	0,92	0,12	0,07
ar-de	0,69	0,39	0,28	0,73	0,37	0,25	0,00	NaN	0,00	0,63	0,24	0,14	1,00	0,11	0,06
ar-en	0,77	0,38	0,25	0,73	0,41	0,28	0,00	NaN	0,00	0,82	0,27	0,16	0,95	0,11	0,06
ar-es	0,67	0,45	0,34	0,69	0,36	0,25	0,00	NaN	0,00	0,51	0,18	0,11	0,94	0,09	0,05
ar-fr	0,65	0,36	0,25	0,64	0,29	0,19	0,00	NaN	0,00	0,41	0,08	0,04	1,00	0,02	0,01
ar-it	0,74	0,48	0,36	0,69	0,22	0,13	0,00	NaN	0,00	1,00	0,11	0,06	0,93	0,07	0,04
ar-nl	0,68	0,38	0,27	0,74	0,41	0,28	0,00	NaN	0,00	0,87	0,16	0,09	0,93	0,13	0,07
ar-pt	0,71	0,48	0,37	0,72	0,38	0,25	0,00	NaN	0,00	0,75	0,20	0,12	0,91	0,10	0,05
ar-ru	0,66	0,30	0,19	0,77	0,41	0,28	0,00	NaN	0,00	0,33	0,06	0,04	0,88	0,08	0,04
cn-cz	0,63	0,32	0,22	0,72	0,27	0,17	0,00	NaN	0,00	1,00	0,04	0,02	0,81	0,33	0,21
cn-de	0,64	0,35	0,24	0,71	0,23	0,13	0,00	NaN	0,00	0,54	0,09	0,05	0,70	0,31	0,20
cn-en	0,67	0,32	0,21	0,85	0,22	0,13	0,00	NaN	0,00	0,88	0,08	0,04	0,81	0,38	0,25
cn-es	0,69	0,41	0,29	0,66	0,25	0,15	0,00	NaN	0,00	1,00	0,09	0,05	0,72	0,39	0,27
cn-fr	0,68	0,41	0,29	0,69	0,23	0,14	0,00	NaN	0,00	0,75	0,02	0,01	1,00	0,01	0,00
cn-it	0,73	0,39	0,27	0,79	0,12	0,06	0,00	NaN	0,00	1,00	0,08	0,04	0,75	0,35	0,23
cn-nl	0,67	0,34	0,23	0,70	0,21	0,12	0,00	NaN	0,00	0,84	0,08	0,04	0,72	0,38	0,26
cn-pt	0,65	0,41	0,30	0,77	0,25	0,15	0,00	NaN	0,00	1,00	0,06	0,03	0,77	0,35	0,22
cn-ru	0,65	0,39	0,28	0,73	0,31	0,19	0,00	NaN	0,00	0,70	0,04	0,02	NaN	NaN	0,00
cz-de	0,68	0,47	0,36	0,70	0,39	0,27	0,93	0,13	0,07	0,89	0,04	0,02	0,80	0,36	0,23
cz-en	0,81	0,49	0,35	0,79	0,50	0,37	0,65	0,07	0,04	0,80	0,26	0,15	0,85	0,47	0,32
cz-es	0,77	0,57	0,45	0,68	0,39	0,27	0,82	0,05	0,02	0,83	0,23	0,14	0,77	0,45	0,32
cz-fr	0,78	0,54	0,42	0,66	0,39	0,28	0,00	NaN	0,00	0,78	0,04	0,02	0,90	0,33	0,20
cz-it	0,73	0,53	0,42	0,77	0,37	0,24	0,83	0,05	0,03	0,88	0,19	0,10	0,80	0,39	0,26
cz-nl	0,78	0,56	0,44	0,72	0,45	0,33	0,80	0,08	0,04	0,84	0,27	0,16	0,78	0,42	0,28
cz-pt	0,72	0,55	0,45	0,72	0,44	0,32	NaN	NaN	0,00	NaN	NaN	0,00	0,81	0,45	0,32
cz-ru	0,75	0,52	0,39	0,75	0,46	0,33	0,00	NaN	0,00	NaN	NaN	0,00	0,83	0,36	0,23
de-en	0,78	0,46	0,33	0,78	0,44	0,31	0,89	0,20	0,11	0,95	0,10	0,05	0,63	0,39	0,28
de-es	0,67	0,48	0,37	0,73	0,39	0,26	0,50	0,01	0,01	0,93	0,07	0,04	0,72	0,40	0,28
de-fr	0,73	0,50	0,38	0,75	0,43	0,30	0,75	0,05	0,02	0,72	0,19	0,11	0,78	0,37	0,24
de-it	0,74	0,53	0,42	0,70	0,34	0,22	0,83	0,05	0,03	1,00	0,07	0,04	0,74	0,34	0,22
de-nl	0,73	0,48	0,36	0,78	0,45	0,32	0,90	0,10	0,05	0,86	0,09	0,05	0,78	0,41	0,28
de-pt	0,70	0,49	0,37	0,70	0,38	0,26	0,87	0,07	0,04	1,00	0,09	0,05	0,73	0,37	0,25
de-ru	0,67	0,42	0,30	0,78	0,44	0,31	0,00	NaN	0,00	0,70	0,16	0,09	0,85	0,35	0,22
en-es	0,77	0,45	0,32	0,72	0,45	0,33	0,75	0,03	0,02	0,63	0,30	0,19	0,76	0,48	0,36
en-fr	0,81	0,46	0,32	0,70	0,43	0,31	0,79	0,10	0,05	0,75	0,05	0,02	0,63	0,44	0,33
en-it	0,78	0,44	0,30	0,71	0,41	0,29	0,86	0,09	0,05	0,90	0,29	0,17	0,79	0,44	0,30
en-nl	0,80	0,48	0,34	0,80	0,54	0,40	0,86	0,13	0,07	0,72	0,42	0,30	0,73	0,47	0,35
en-pt	0,78	0,49	0,36	0,76	0,52	0,39	0,86	0,09	0,05	0,81	0,36	0,23	0,86	0,52	0,38
en-ru	0,74	0,38	0,26	0,90	0,48	0,33	0,00	NaN	0,00	0,50	0,06	0,03	0,65	0,31	0,21
es-fr	0,76	0,56	0,44	0,69	0,40	0,28	0,00	NaN	0,00	0,65	0,08	0,04	0,77	0,44	0,30
es-it	0,75	0,60	0,50	0,63	0,27	0,17	0,94	0,16	0,09	0,78	0,35	0,23	0,71	0,43	0,31
es-nl	0,74	0,58	0,48	0,71	0,40	0,28	0,00	NaN	0,00	0,70	0,27	0,17	0,69	0,43	0,31
es-pt	0,73	0,58	0,49	0,70	0,45	0,33	0,82	0,20	0,11	0,74	0,42	0,29	0,71	0,41	0,28
es-ru	0,72	0,51	0,39	0,76	0,41	0,28	0,00	NaN	0,00	0,53	0,04	0,02	0,71	0,31	0,20
fr-it	0,75	0,57	0,46	0,67	0,35	0,24	0,00	NaN	0,00	0,93	0,07	0,04	0,77	0,31	0,19
fr-nl	0,74	0,55	0,43	0,71	0,42	0,30	0,90	0,09	0,05	0,56	0,08	0,04	0,84	0,34	0,21
fr-pt	0,74	0,55	0,44	0,67	0,39	0,28	0,50	0,01	0,01	0,77	0,05	0,03	0,77	0,41	0,28
fr-ru	0,74	0,49	0,37	0,74	0,36	0,24	0,00	NaN	0,00	0,64	0,18	0,10	0,78	0,39	0,26
it-nl	0,72	0,54	0,43	0,75	0,36	0,24	0,85	0,06	0,03	0,75	0,29	0,18	0,68	0,38	0,26
it-pt	0,76	0,62	0,52	0,64	0,33	0,23	0,92	0,17	0,09	0,88	0,42	0,28	0,78	0,46	0,33
it-ru	0,69	0,44	0,32	0,84	0,28	0,17	0,00	NaN	0,00	NaN	NaN	0,00	0,80	0,25	0,15
nl-pt	0,76	0,59	0,48	0,70	0,45	0,33	0,86	0,06	0,03	0,71	0,35	0,23	0,72	0,43	0,30
nl-ru	0,74	0,51	0,39	0,79	0,46	0,33	0,00	NaN	0,00	0,83	0,03	0,01	0,81	0,31	0,19
pt-ru	0,71	0,49	0,37	0,75	0,47	0,34	0,00	NaN	0,00	0,64	0,08	0,04	0,84	0,33	0,20

NaN: division by zero, likely due to an empty alignment.
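Because the per-pair table uses decimal commas and NaN entries, a little care is needed when loading it for further analysis. The sketch below parses one row under two assumptions: that the cells are tab-separated, and that each system contributes three columns in the order precision, F-measure, recall (as in the aggregated table). The helper names are illustrative.

```python
import math

# Systems in the order of the column groups of the per-pair table.
SYSTEMS = ["AML", "LogMap", "LogMapLt", "VeeAlign", "Wiktionary"]

def parse_value(cell):
    """Convert a cell such as '0,59' or 'NaN' to a float."""
    cell = cell.strip()
    return math.nan if cell == "NaN" else float(cell.replace(",", "."))

def parse_row(line):
    """Split one tab-separated row into (pair, {system: (prec, f_measure, rec)})."""
    cells = line.split("\t")
    pair, values = cells[0], [parse_value(c) for c in cells[1:]]
    if len(values) != 3 * len(SYSTEMS):
        raise ValueError(f"unexpected number of columns in row {pair!r}")
    triples = {name: tuple(values[3 * i:3 * i + 3])
               for i, name in enumerate(SYSTEMS)}
    return pair, triples

# First row of the table above.
row = ("ar-cn\t0,59\t0,26\t0,16\t0,62\t0,19\t0,11\t0,00\tNaN\t0,00"
       "\t0,60\t0,05\t0,02\tNaN\tNaN\t0,00")
pair, scores = parse_row(row)
print(pair, scores["AML"])  # ar-cn (0.59, 0.26, 0.16)
```

Note that NaN values compare unequal to everything, including themselves, so they should be detected with math.isnan rather than with ==.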

Conclusions

Unfortunately, this track does not attract many participants. In terms of performance, the F-measure for the blind tests remains relatively stable across campaigns. As observed in several campaigns, all systems still favour precision to the detriment of recall, and the results are below those obtained for the original Conference data set.

References

[1] Christian Meilicke, Raul Garcia-Castro, Fred Freitas, Willem Robert van Hage, Elena Montiel-Ponsoda, Ryan Ribeiro de Azevedo, Heiner Stuckenschmidt, Ondrej Svab-Zamazal, Vojtech Svatek, Andrei Tamilin, Cassia Trojahn, Shenghui Wang. MultiFarm: A Benchmark for Multilingual Ontology Matching. Accepted for publication at the Journal of Web Semantics.

An author's version of the paper can be found at the MultiFarm homepage, where the data set is described in detail.

Contact

This track is organized by Beyza Yaman and Cassia Trojahn dos Santos. If you have any problems working with the ontologies, any questions or suggestions, feel free to write an email to beyza [_] yaman [at] hotmail [.] com and cassia [.] trojahn [at] irit [.] fr.