MultiFarm Results for OAEI 2024

On this page, we report the results of the OAEI 2024 campaign for the MultiFarm track. Details on the data set can be found on the MultiFarm web page.

If you notice any kind of error (wrong numbers, incorrect information on a matching system, etc.), do not hesitate to contact us (the e-mail addresses are given in the Contact section at the end of this page).

Experimental setting

We have conducted an evaluation based on the blind data set. This data set includes the matching tasks involving the edas and ekaw ontologies (resulting in 55 x 24 tasks). Participants were able to test their systems on the open subset of tasks. The open subset comprises 45 x 25 tasks and does not include the Italian translations.
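One possible reading of these task counts, offered as an assumption rather than something stated above, is that the first factor counts unordered language pairs. Under that reading, 55 pairs would arise from 11 languages and 45 pairs from 10, which would be consistent with the open subset leaving out one language (Italian). A minimal Python sketch of that arithmetic:

```python
from math import comb


def language_pairs(n_languages: int) -> int:
    """Number of unordered pairs of distinct languages."""
    return comb(n_languages, 2)


# Assumption: the blind set covers 11 languages and the open subset 10
# (the same set without Italian). These counts are not stated in the text.
blind_pairs = language_pairs(11)
open_pairs = language_pairs(10)

print(blind_pairs, open_pairs)
```

The second factor (24 and 25) is not explained by this sketch and is left as reported.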

We distinguish two types of matching tasks:

(i) those tasks where two different ontologies (edas-ekaw, for instance) have been translated into two different languages;
(ii) those tasks where the same ontology (edas-edas) has been translated into two different languages.

As observed in previous evaluations, for tasks of type (ii), which involve the same ontology on both sides, good results do not depend directly on specific techniques for dealing with cross-lingual ontologies, but rather on the ability to exploit the fact that both ontologies have an identical structure. This year, we report the results for tasks of type (i), i.e. different ontologies.

Participants

This year, 4 systems registered to participate in the MultiFarm track: LogMap, LogMapLt, Matcha and MDMapper. The number of participating tools is similar to that of previous campaigns (4 in 2023, 5 in 2022, 6 in 2021, 6 in 2020, 5 in 2019, 6 in 2018, 8 in 2017, 7 in 2016, 5 in 2015, 3 in 2014, 7 in 2013, and 7 in 2012). This year, LSMatch Multilingual did not participate, while we received a new participation from Matcha. The reader can refer to the OAEI papers for a detailed description of the strategies adopted by each system.

Evaluation results

Execution setting and runtime

The systems were executed on a Windows Server 2025 machine with 96 GB of RAM, an Intel Xeon Silver 4114 CPU @ 2.20 GHz and a Tesla P40 GPU. All measurements are based on a single run. As in each campaign, we observed large differences in the time required for a system to complete the 55 x 24 matching tasks: LogMap (13 minutes), LogMapLt (265 minutes), Matcha (309 minutes) and MDMapper (493 minutes). This year we used a different machine to run the experiments; however, as the LogMap family of tools shows, the runtimes are similar to last year's. The runtimes of Matcha and MDMapper, in contrast, are quite high. These measurements are only indicative of the time the systems require to finish the task in a common environment.
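To put these totals in perspective, the following Python sketch converts them into an average time per matching task. It assumes that "55 x 24" denotes a product, i.e. 1320 tasks in total, and that the time is spread evenly across tasks; neither assumption is confirmed by the measurements, which are reported only as totals.

```python
# Reported wall-clock times in minutes (single run, taken from the text above).
runtime_min = {"LogMap": 13, "LogMapLt": 265, "Matcha": 309, "MDMapper": 493}

# Assumption: "55 x 24" is a product, giving 1320 matching tasks in total.
N_TASKS = 55 * 24


def seconds_per_task(minutes: float, n_tasks: int = N_TASKS) -> float:
    """Average number of seconds spent on one matching task."""
    return minutes * 60 / n_tasks


per_task = {name: seconds_per_task(m) for name, m in runtime_min.items()}
for name, seconds in per_task.items():
    print(f"{name}: {seconds:.1f} s per task")
```

Under these assumptions, the averages range from roughly 0.6 seconds per task for LogMap to roughly 22 seconds per task for MDMapper.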

Overall results

The table below presents the aggregated results per matcher for the matching tasks involving different ontologies. Time is measured in minutes (for completing the 55 x 24 matching tasks).


		Different ontologies (i)
System	Time (min)	Prec.	F-m.	Rec.
LogMap	~13	.72	.42	.32
LogMapLt	~265	.24	.038	.02
Matcha	~309	.21	.28	.44
MDMapper	~493	.25	.04	.26

The results show notable differences across the four systems in processing time, precision, F-measure and recall. LogMap has the shortest processing time (~13 minutes) and achieves the highest precision (0.72), but its recall is relatively low (0.32), resulting in a moderate F-measure of 0.42. LogMapLt takes considerably longer (~265 minutes) and obtains much lower precision (0.24), a very low recall (0.02) and a minimal F-measure (0.038). Matcha requires even more time (~309 minutes); it has the lowest precision (0.21) but the highest recall among the systems (0.44), for an F-measure of 0.28. Finally, MDMapper has the longest runtime (~493 minutes), with low precision (0.25), low recall (0.26) and an F-measure of 0.04, indicating limited effectiveness despite the extended processing time. Overall, LogMap stands out for its efficiency and precision, while Matcha achieves better recall at a significant cost in processing time.
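The comparisons in this paragraph can be checked mechanically against the table. The Python sketch below copies the reported values and picks the best system per column; the `Result` type and the `best` helper are illustrative names introduced here, not part of any OAEI tooling.

```python
from typing import NamedTuple


class Result(NamedTuple):
    system: str
    time_min: int
    precision: float
    f_measure: float
    recall: float


# Values copied from the table above (different ontologies, type (i)).
RESULTS = [
    Result("LogMap", 13, 0.72, 0.42, 0.32),
    Result("LogMapLt", 265, 0.24, 0.038, 0.02),
    Result("Matcha", 309, 0.21, 0.28, 0.44),
    Result("MDMapper", 493, 0.25, 0.04, 0.26),
]


def best(metric: str, lowest: bool = False) -> str:
    """Name of the system with the highest (or lowest) value for a metric."""
    pick = min if lowest else max
    return pick(RESULTS, key=lambda r: getattr(r, metric)).system


print("fastest:", best("time_min", lowest=True))
print("highest precision:", best("precision"))
print("highest F-measure:", best("f_measure"))
print("highest recall:", best("recall"))
```

On the reported values, LogMap comes first on time, precision and F-measure, and Matcha comes first on recall.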

Conclusions

A similar number of systems participates in the campaign each year. However, the set of participating tools changes from year to year. Each of the evaluated systems shows a distinct performance profile. All recall scores fall below 0.50; LogMap and LogMapLt favor precision over recall, whereas Matcha obtains a higher recall than precision. LogMapLt and MDMapper show the lowest F-measures, suggesting limited effectiveness in comparison. Overall, these results align with observations from previous evaluations, indicating that the systems’ results still fall short of those achieved on the original Conference data set.

References

[1] Christian Meilicke, Raul Garcia-Castro, Fred Freitas, Willem Robert van Hage, Elena Montiel-Ponsoda, Ryan Ribeiro de Azevedo, Heiner Stuckenschmidt, Ondrej Svab-Zamazal, Vojtech Svatek, Andrei Tamilin, Cassia Trojahn, Shenghui Wang. MultiFarm: A Benchmark for Multilingual Ontology Matching. Accepted for publication at the Journal of Web Semantics.

An author's version of the paper can be found on the MultiFarm homepage, where the data set is described in detail.

Contact

This track is organized by Beyza Yaman, Abhisek Sharma, Sarika Jain and Cassia Trojahn dos Santos. If you have any problems working with the ontologies, any questions or suggestions, feel free to write an email to beyza [.] yaman [at] adaptcentre [.] ie, jasarika [at] nitkkr [.] ac [.] in, and cassia [.] trojahn [at] irit [.] fr.