OAEI Evaluation Report: Beyond Equivalence

Overview & Interpretation

Description

This first-year track evaluates five matchers (LogMap, LogMap-bio, LogMap-kg, Matcha, and MDMapper) across 10 datasets from two families: industrial classification standards (ECLASS–GPC, ECLASS–UNSPSC, ETIM–ECLASS, GPC–UNSPSC, GPC–UNSPSCplus) and STROMA/TaSeR (G1, G2, G3, G5, G7). We report both traditional metrics (Precision, Recall, F1-Score), which reward only correspondences identical to those in the reference alignment, and isAmong metrics (Precision*, Recall*, F1-Score*), which also give credit for partially correct relations such as superclass/subclass and overlap.
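As an illustration of why the traditional metrics are strict, the following minimal sketch scores a toy alignment. It assumes that a correspondence is a (source, target, relation) triple and that only identical triples count as correct; the class names and the function `traditional_scores` are hypothetical and not part of the track's tooling.

```python
def traditional_scores(predicted, reference):
    """Precision, recall and F1 over exactly matching correspondences."""
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)  # identical triples only
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: the second prediction links the right classes
# but states equivalence where the reference states a subclass relation.
reference = {("a:Pump", "b:Pump", "="), ("a:Drill", "b:PowerTool", "<")}
predicted = {("a:Pump", "b:Pump", "="), ("a:Drill", "b:PowerTool", "=")}

p, r, f1 = traditional_scores(predicted, reference)
print(p, r, f1)
```

In this toy case the second correspondence earns no credit at all, although it connects the right pair of classes. This is the situation the isAmong metrics are meant to handle more gracefully.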

Overall, the produced alignments suggest that most systems have limited support for non-equivalence relations; an exception is MDMapper, which also captures superclass/subclass/overlap.

What is isAmong? In brief, an alignment is transformed so that each class is mapped to a covering set of descendant classes in the other ontology (its “isAmong” set). For each class, Precision, Recall, and F1-Score are then computed from the overlap between the predicted and the reference descendant sets, and the results are averaged across the source and target sides. This yields a fine-grained, relation-aware score that rewards containment and overlap even when a system does not predict the exact relation given in the reference alignment. The approach is designed for classification ontologies and avoids hand-tuned weights while supporting inference of border relations (≡, ≤, ≥, ≃).
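The scoring step described above can be sketched as follows. This is a simplified illustration, not the track's evaluation code: it takes the isAmong sets as given (the transformation from alignment to sets is not shown), and it assumes that class-level scores are averaged over every class appearing in either the predicted or the reference mapping. All function and class names are hypothetical.

```python
def set_prf(pred, ref):
    """Precision, recall and F1 from the overlap of two sets."""
    overlap = len(pred & ref)
    p = overlap / len(pred) if pred else 0.0
    r = overlap / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def side_scores(pred_sets, ref_sets):
    """Average class-level scores on one side (source or target).

    Assumption: every class seen in either mapping contributes one row.
    """
    classes = sorted(set(pred_sets) | set(ref_sets))
    if not classes:
        return 0.0, 0.0, 0.0
    rows = [set_prf(pred_sets.get(c, set()), ref_sets.get(c, set()))
            for c in classes]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def is_among_scores(pred_src, ref_src, pred_tgt, ref_tgt):
    """Average the source-side and target-side scores."""
    src = side_scores(pred_src, ref_src)
    tgt = side_scores(pred_tgt, ref_tgt)
    return tuple((s + t) / 2 for s, t in zip(src, tgt))


# Hypothetical toy data: the system finds one of two reference descendants.
ref_src = {"a:Drill": {"b:CordlessDrill", "b:HammerDrill"}}
pred_src = {"a:Drill": {"b:CordlessDrill"}}
ref_tgt = {"b:CordlessDrill": {"a:Drill"}, "b:HammerDrill": {"a:Drill"}}
pred_tgt = {"b:CordlessDrill": {"a:Drill"}}

p_star, r_star, f_star = is_among_scores(pred_src, ref_src, pred_tgt, ref_tgt)
print(p_star, r_star, f_star)
```

Here the partially correct prediction receives partial credit on both sides instead of being counted as a plain miss.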

Analysis

All 10 datasets (macro): LogMap leads on isAmong metrics (best F1-Score*), while LogMap-bio leads on traditional F1-Score.
Industrial classification standards: MDMapper leads under both regimes, and especially under isAmong (best Precision*, Recall*, F1-Score*). Performance on these benchmarks is low overall: ETIM–ECLASS is the only dataset on which any matcher exceeds 1% traditional F1-Score. This is likely due to the scarcity of true equivalences and the predominance of other relation types, and it indicates that existing matchers lack the capability to detect relations between concepts that differ in granularity or classification perspective.
STROMA/TaSeR: LogMap-bio achieves the best traditional F1-Score, whereas LogMap tops Recall and all isAmong metrics (Precision*, Recall*, F1-Score*).
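The macro-averages in the tables below can be read as unweighted means of the per-dataset scores. The sketch below works under that assumption and recomputes LogMap's traditional Precision from the per-dataset tables in this report; the helper `macro_average` is hypothetical. Under this reading, the macro F1-Score is the mean of the per-dataset F1-Scores, not the harmonic mean of the macro Precision and Recall, which is why it can fall below both.

```python
def macro_average(scores):
    """Unweighted mean of per-dataset scores (in percent)."""
    return sum(scores) / len(scores)


# LogMap traditional Precision per dataset, copied from this report.
industrial = [32.35, 17.34, 40.32, 24.51, 8.49]   # eclass-gpc ... gpc-unspscplus
stroma_taser = [3.20, 51.67, 43.75, 20.51, 64.71]  # g1-web ... g7-literature

industrial_avg = macro_average(industrial)
stroma_avg = macro_average(stroma_taser)
overall_avg = macro_average(industrial + stroma_taser)

print(f"{industrial_avg:.3f} {stroma_avg:.3f} {overall_avg:.3f}")
```

The three values agree with the reported 24.60%, 36.77%, and 30.69% up to rounding of the per-dataset inputs.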

Macro-Averages by Matcher — All (10 datasets)

Matcher	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	30.69%	17.44%	13.99%	22.33%	19.48%	19.62%
LogMap-bio	45.33%	15.30%	19.31%	20.82%	17.50%	17.92%
LogMap-kg	38.78%	16.68%	16.62%	21.29%	18.37%	18.59%
Matcha	25.57%	12.02%	6.10%	13.81%	11.44%	11.01%
MDMapper	47.94%	10.51%	16.40%	21.33%	16.12%	16.80%

Macro-Averages by Matcher — Industrial Classification Standards

Matcher	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	24.60%	5.02%	6.29%	10.98%	8.52%	8.64%
LogMap-bio	36.49%	5.03%	7.88%	10.85%	8.35%	8.49%
LogMap-kg	32.86%	5.03%	7.47%	10.85%	8.35%	8.49%
Matcha	13.53%	0.07%	0.14%	3.69%	1.65%	1.70%
MDMapper	29.74%	5.87%	9.13%	17.09%	12.31%	12.66%

Macro-Averages by Matcher — STROMA/TaSeR

Matcher	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	36.77%	29.86%	21.69%	33.68%	30.44%	30.61%
LogMap-bio	54.18%	25.56%	30.73%	30.79%	26.66%	27.34%
LogMap-kg	44.70%	28.34%	25.78%	31.74%	28.39%	28.69%
Matcha	37.62%	23.96%	12.06%	23.94%	21.23%	20.32%
MDMapper	66.15%	15.15%	23.67%	25.57%	19.92%	20.93%

eclass-gpc

Matcher	Traditional			isAmong
	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	32.35%	0.09%	0.17%	4.09%	1.67%	1.83%
LogMap-bio	30.56%	0.09%	0.17%	4.19%	1.65%	1.82%
LogMap-kg	30.56%	0.09%	0.17%	4.19%	1.65%	1.82%
MDMapper	12.83%	0.19%	0.37%	10.81%	6.09%	6.29%

eclass-unspsc

Matcher	Traditional			isAmong
	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	17.34%	0.04%	0.07%	3.97%	1.59%	1.74%
LogMap-bio	16.10%	0.04%	0.08%	4.31%	1.75%	1.91%
LogMap-kg	16.10%	0.04%	0.08%	4.31%	1.75%	1.91%
Matcha	14.09%	0.03%	0.05%	3.32%	1.14%	1.28%
MDMapper	11.56%	0.11%	0.21%	11.86%	5.04%	5.56%

etim-eclass

Matcher	Traditional			isAmong
	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	40.32%	24.69%	30.62%	37.00%	34.17%	34.44%
LogMap-bio	88.13%	24.82%	38.74%	37.15%	34.35%	34.60%
LogMap-kg	70.02%	24.82%	36.65%	37.15%	34.35%	34.60%
MDMapper	96.77%	28.55%	44.09%	42.17%	38.73%	39.50%

gpc-unspsc

Matcher	Traditional			isAmong
	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	24.51%	0.13%	0.25%	6.92%	3.11%	3.23%
LogMap-bio	24.51%	0.13%	0.25%	6.92%	3.11%	3.23%
LogMap-kg	24.51%	0.13%	0.25%	6.92%	3.11%	3.23%
Matcha	21.83%	0.16%	0.31%	9.14%	4.28%	4.32%
MDMapper	13.73%	0.29%	0.56%	16.35%	8.75%	9.07%

gpc-unspscplus

Matcher	Traditional			isAmong
	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	8.49%	0.16%	0.31%	2.91%	2.07%	1.96%
LogMap-bio	23.15%	0.09%	0.17%	1.68%	0.90%	0.92%
LogMap-kg	23.15%	0.09%	0.17%	1.68%	0.90%	0.92%
Matcha	18.18%	0.10%	0.20%	2.30%	1.17%	1.19%
MDMapper	13.79%	0.22%	0.44%	4.23%	2.93%	2.88%

g1-web

Matcher	Traditional			isAmong
	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	3.20%	53.64%	6.05%	50.10%	50.22%	48.76%
LogMap-bio	60.81%	40.91%	48.91%	44.59%	41.25%	41.50%
LogMap-kg	16.47%	53.64%	25.20%	50.10%	50.22%	48.76%
MDMapper	88.24%	36.36%	51.50%	45.96%	39.20%	40.69%

g2-diseases

Matcher	Traditional			isAmong
	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	51.67%	69.77%	59.38%	45.81%	47.64%	45.92%
LogMap-bio	60.17%	61.02%	60.59%	44.36%	44.55%	43.85%
LogMap-kg	57.14%	62.15%	59.54%	43.59%	44.25%	43.31%
Matcha	2.50%	70.34%	4.83%	28.04%	36.68%	28.64%
MDMapper	57.45%	7.63%	13.47%	11.03%	7.64%	8.07%

g3-text

Matcher	Traditional			isAmong
	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	43.75%	7.35%	12.58%	19.88%	9.11%	11.56%
LogMap-bio	43.75%	7.35%	12.58%	19.88%	9.11%	11.56%
LogMap-kg	43.75%	7.35%	12.58%	19.88%	9.11%	11.56%
Matcha	39.85%	6.96%	11.84%	19.79%	8.91%	11.47%
MDMapper	38.53%	5.51%	9.64%	14.97%	6.49%	8.37%

g5-groceries

Matcher	Traditional			isAmong
	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	20.51%	5.13%	8.21%	16.71%	14.00%	14.24%
LogMap-bio	27.59%	5.13%	8.65%	15.70%	13.60%	13.80%
LogMap-kg	27.59%	5.13%	8.65%	15.70%	13.60%	13.80%
Matcha	23.53%	5.13%	8.42%	16.98%	14.02%	14.32%
MDMapper	46.51%	12.82%	20.10%	28.84%	24.81%	24.61%

g7-literature

Matcher	Traditional			isAmong
	Precision	Recall	F1-Score	Precision*	Recall*	F1-Score*
LogMap	64.71%	13.41%	22.22%	35.90%	31.24%	32.56%
LogMap-bio	78.57%	13.41%	22.92%	29.43%	24.77%	26.01%
LogMap-kg	78.57%	13.41%	22.92%	29.43%	24.77%	26.01%
Matcha	84.62%	13.41%	23.16%	30.94%	25.31%	26.87%
MDMapper	100.00%	13.41%	23.66%	27.07%	21.47%	22.92%

OAEI Evaluation Report: Beyond Equivalence 2025