Overview & Interpretation
Description
This first-year track evaluates five matchers (LogMap, LogMap-bio, LogMap-kg, Matcha, and MDMapper) across 10 datasets from two families: industrial classification standards (ECLASS–GPC, ECLASS–UNSPSC, ETIM–ECLASS, GPC–UNSPSC, GPC–UNSPSCplus) and STROMA/TaSeR (G1, G2, G3, G5, G7). We report both traditional metrics (Precision, Recall, F1-Score), which reward only correspondences that exactly match the reference, and isAmong metrics (Precision*, Recall*, F1-Score*), which also give credit for partially correct relations such as superclass/subclass and overlap.
Overall, the produced alignments suggest that most systems have limited support for non-equivalence relations; an exception is MDMapper, which also captures superclass/subclass/overlap.
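To make the traditional (exact-match) regime concrete, here is a minimal sketch. It assumes a correspondence is a (source, target, relation) triple and that a prediction counts only if all three parts match the reference; the `Correspondence` type, the class names, and the relation symbols are hypothetical and are not taken from the track's evaluation code.

```python
from typing import Iterable, NamedTuple, Tuple


class Correspondence(NamedTuple):
    """Hypothetical triple: source class, target class, relation symbol."""
    source: str
    target: str
    relation: str  # e.g. "=", "<", ">"


def exact_match_scores(
    predicted: Iterable[Correspondence],
    reference: Iterable[Correspondence],
) -> Tuple[float, float, float]:
    """Precision, recall and F1 where only identical triples count as correct."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


reference = {
    Correspondence("A:Laptop", "B:NotebookComputer", "="),
    Correspondence("A:Laptop", "B:Computer", "<"),
}
predicted = {
    Correspondence("A:Laptop", "B:NotebookComputer", "="),
    # Right pair of classes, wrong relation: earns no credit here.
    Correspondence("A:Laptop", "B:Computer", "="),
}

p, r, f1 = exact_match_scores(predicted, reference)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Under this assumption, a matcher that only emits equivalences is penalised on both precision and recall whenever the reference relation is a subclass or overlap, which is the gap the isAmong metrics are meant to close.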
What is isAmong? In brief, an alignment is transformed so that each class is mapped to a covering set of descendant classes in the other ontology (its “isAmong” set). Precision, Recall, and F1-Score are then computed per class from the overlap between the predicted and reference descendant sets, and averaged across the source and target sides. This yields a fine-grained, relation-aware score that rewards containment and overlap even when a system does not predict the exact relation in the reference alignment. The approach is designed for classification ontologies and avoids hand-tuned weights while supporting inference of border relations (≡, ≤, ≥, ≃).
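The set-overlap idea can be sketched roughly as follows. This is a simplified illustration, not the track's implementation: it assumes the isAmong sets have already been derived from the alignments, that classes with an empty set score zero, and that per-class scores are averaged within each side before the two sides are averaged. All function names and class identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple

Scores = Tuple[float, float, float]


def set_prf(pred: Set[str], ref: Set[str]) -> Scores:
    """Precision, recall and F1 from the overlap of two descendant sets."""
    overlap = len(pred & ref)
    p = overlap / len(pred) if pred else 0.0
    r = overlap / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def side_scores(pred_sets: Dict[str, Set[str]], ref_sets: Dict[str, Set[str]]) -> Scores:
    """Average the per-class scores over every class seen on one side."""
    classes = sorted(set(pred_sets) | set(ref_sets))
    if not classes:
        return 0.0, 0.0, 0.0
    per_class = [
        set_prf(pred_sets.get(c, set()), ref_sets.get(c, set())) for c in classes
    ]
    n = len(per_class)
    return tuple(sum(s[i] for s in per_class) / n for i in range(3))


def is_among_scores(pred_src, ref_src, pred_tgt, ref_tgt) -> Scores:
    """Average the source-side and target-side scores (assumed order of averaging)."""
    src = side_scores(pred_src, ref_src)
    tgt = side_scores(pred_tgt, ref_tgt)
    return tuple((a + b) / 2 for a, b in zip(src, tgt))


# Source side: the reference places A:Laptop among two B classes; the matcher finds one.
ref_src = {"A:Laptop": {"B:Notebook", "B:Ultrabook"}}
pred_src = {"A:Laptop": {"B:Notebook"}}
# Target side: both B classes should point back to A:Laptop; only one does.
ref_tgt = {"B:Notebook": {"A:Laptop"}, "B:Ultrabook": {"A:Laptop"}}
pred_tgt = {"B:Notebook": {"A:Laptop"}}

p_star, r_star, f_star = is_among_scores(pred_src, ref_src, pred_tgt, ref_tgt)
print(f"P*={p_star:.3f} R*={r_star:.3f} F1*={f_star:.3f}")
```

In this toy case the partially covered source class still earns credit (full precision, half recall), whereas exact-match scoring would treat a prediction with the wrong relation as simply incorrect.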
Analysis
- All 10 datasets (macro): LogMap leads on isAmong metrics (best F1-Score*), while LogMap-bio leads on traditional F1-Score.
- Industrial classification standards: MDMapper leads under both regimes, and especially under isAmong (best P*, R*, F1*). Performance on these benchmarks is low overall (ETIM–ECLASS is the exception, with traditional F1-Scores between roughly 30% and 45%), likely due to the scarcity of true equivalences and the predominance of other relation types. This suggests that existing matchers struggle to detect relations between concepts that differ in granularity or classification perspective.
- STROMA/TaSeR: LogMap-bio achieves the best traditional F1-Score, whereas LogMap tops Recall and all isAmong metrics (Precision*, Recall*, F1-Score*).
Overall Results
Highlighted cells indicate the best score within a dataset for a given metric. Matchers are listed alphabetically.
Macro-Averages by Matcher — All (10 datasets)
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 30.69% | 17.44% | 13.99% | 22.33% | 19.48% | 19.62% |
LogMap-bio | 45.33% | 15.30% | 19.31% | 20.82% | 17.50% | 17.92% |
LogMap-kg | 38.78% | 16.68% | 16.62% | 21.29% | 18.37% | 18.59% |
Matcha | 25.57% | 12.02% | 6.10% | 13.81% | 11.44% | 11.01% |
MDMapper | 47.94% | 10.51% | 16.40% | 21.33% | 16.12% | 16.80% |
Macro-Averages by Matcher — Industrial Classification Standards
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 24.60% | 5.02% | 6.29% | 10.98% | 8.52% | 8.64% |
LogMap-bio | 36.49% | 5.03% | 7.88% | 10.85% | 8.35% | 8.49% |
LogMap-kg | 32.86% | 5.03% | 7.47% | 10.85% | 8.35% | 8.49% |
Matcha | 13.53% | 0.07% | 0.14% | 3.69% | 1.65% | 1.70% |
MDMapper | 29.74% | 5.87% | 9.13% | 17.09% | 12.31% | 12.66% |
Macro-Averages by Matcher — STROMA/TaSeR
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 36.77% | 29.86% | 21.69% | 33.68% | 30.44% | 30.61% |
LogMap-bio | 54.18% | 25.56% | 30.73% | 30.79% | 26.66% | 27.34% |
LogMap-kg | 44.70% | 28.34% | 25.78% | 31.74% | 28.39% | 28.69% |
Matcha | 37.62% | 23.96% | 12.06% | 23.94% | 21.23% | 20.32% |
MDMapper | 66.15% | 15.15% | 23.67% | 25.57% | 19.92% | 20.93% |
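As a reading aid, the macro-averaged values above are consistent with an unweighted mean of the per-dataset scores, with F1-Score averaged directly rather than recomputed from the macro-averaged Precision and Recall. The sketch below checks this for LogMap on the industrial classification datasets, using the rounded values from the per-dataset tables; the variable names are illustrative, and the averaging procedure is inferred from the numbers rather than stated by the track.

```python
from statistics import mean


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# LogMap traditional F1-Score (%) per industrial dataset, copied from the per-dataset tables.
logmap_industrial_f1 = {
    "eclass-gpc": 0.17,
    "eclass-unspsc": 0.07,
    "etim-eclass": 30.62,
    "gpc-unspsc": 0.25,
    "gpc-unspscplus": 0.31,
}

# Unweighted mean over datasets; the table reports 6.29% (inputs here are already rounded).
macro_f1 = mean(logmap_industrial_f1.values())

# For contrast: F1 recomputed from LogMap's macro Precision (24.60%) and Recall (5.02%).
f1_of_macro_pr = f1(24.60, 5.02)

print(f"mean of per-dataset F1: {macro_f1:.2f}%")
print(f"F1 of macro P and R:    {f1_of_macro_pr:.2f}%")
```

The two quantities differ by about two percentage points, so the macro F1-Score columns should not be expected to equal the harmonic mean of the macro Precision and Recall columns in the same row.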
Per-Dataset Results
eclass-gpc
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 32.35% | 0.09% | 0.17% | 4.09% | 1.67% | 1.83% |
LogMap-bio | 30.56% | 0.09% | 0.17% | 4.19% | 1.65% | 1.82% |
LogMap-kg | 30.56% | 0.09% | 0.17% | 4.19% | 1.65% | 1.82% |
MDMapper | 12.83% | 0.19% | 0.37% | 10.81% | 6.09% | 6.29% |
eclass-unspsc
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 17.34% | 0.04% | 0.07% | 3.97% | 1.59% | 1.74% |
LogMap-bio | 16.10% | 0.04% | 0.08% | 4.31% | 1.75% | 1.91% |
LogMap-kg | 16.10% | 0.04% | 0.08% | 4.31% | 1.75% | 1.91% |
Matcha | 14.09% | 0.03% | 0.05% | 3.32% | 1.14% | 1.28% |
MDMapper | 11.56% | 0.11% | 0.21% | 11.86% | 5.04% | 5.56% |
etim-eclass
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 40.32% | 24.69% | 30.62% | 37.00% | 34.17% | 34.44% |
LogMap-bio | 88.13% | 24.82% | 38.74% | 37.15% | 34.35% | 34.60% |
LogMap-kg | 70.02% | 24.82% | 36.65% | 37.15% | 34.35% | 34.60% |
MDMapper | 96.77% | 28.55% | 44.09% | 42.17% | 38.73% | 39.50% |
gpc-unspsc
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 24.51% | 0.13% | 0.25% | 6.92% | 3.11% | 3.23% |
LogMap-bio | 24.51% | 0.13% | 0.25% | 6.92% | 3.11% | 3.23% |
LogMap-kg | 24.51% | 0.13% | 0.25% | 6.92% | 3.11% | 3.23% |
Matcha | 21.83% | 0.16% | 0.31% | 9.14% | 4.28% | 4.32% |
MDMapper | 13.73% | 0.29% | 0.56% | 16.35% | 8.75% | 9.07% |
gpc-unspscplus
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 8.49% | 0.16% | 0.31% | 2.91% | 2.07% | 1.96% |
LogMap-bio | 23.15% | 0.09% | 0.17% | 1.68% | 0.90% | 0.92% |
LogMap-kg | 23.15% | 0.09% | 0.17% | 1.68% | 0.90% | 0.92% |
Matcha | 18.18% | 0.10% | 0.20% | 2.30% | 1.17% | 1.19% |
MDMapper | 13.79% | 0.22% | 0.44% | 4.23% | 2.93% | 2.88% |
g1-web
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 3.20% | 53.64% | 6.05% | 50.10% | 50.22% | 48.76% |
LogMap-bio | 60.81% | 40.91% | 48.91% | 44.59% | 41.25% | 41.50% |
LogMap-kg | 16.47% | 53.64% | 25.20% | 50.10% | 50.22% | 48.76% |
MDMapper | 88.24% | 36.36% | 51.50% | 45.96% | 39.20% | 40.69% |
g2-diseases
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 51.67% | 69.77% | 59.38% | 45.81% | 47.64% | 45.92% |
LogMap-bio | 60.17% | 61.02% | 60.59% | 44.36% | 44.55% | 43.85% |
LogMap-kg | 57.14% | 62.15% | 59.54% | 43.59% | 44.25% | 43.31% |
Matcha | 2.50% | 70.34% | 4.83% | 28.04% | 36.68% | 28.64% |
MDMapper | 57.45% | 7.63% | 13.47% | 11.03% | 7.64% | 8.07% |
g3-text
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 43.75% | 7.35% | 12.58% | 19.88% | 9.11% | 11.56% |
LogMap-bio | 43.75% | 7.35% | 12.58% | 19.88% | 9.11% | 11.56% |
LogMap-kg | 43.75% | 7.35% | 12.58% | 19.88% | 9.11% | 11.56% |
Matcha | 39.85% | 6.96% | 11.84% | 19.79% | 8.91% | 11.47% |
MDMapper | 38.53% | 5.51% | 9.64% | 14.97% | 6.49% | 8.37% |
g5-groceries
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 20.51% | 5.13% | 8.21% | 16.71% | 14.00% | 14.24% |
LogMap-bio | 27.59% | 5.13% | 8.65% | 15.70% | 13.60% | 13.80% |
LogMap-kg | 27.59% | 5.13% | 8.65% | 15.70% | 13.60% | 13.80% |
Matcha | 23.53% | 5.13% | 8.42% | 16.98% | 14.02% | 14.32% |
MDMapper | 46.51% | 12.82% | 20.10% | 28.84% | 24.81% | 24.61% |
g7-literature
Matcher | Precision | Recall | F1-Score | Precision* | Recall* | F1-Score* |
---|---|---|---|---|---|---|
LogMap | 64.71% | 13.41% | 22.22% | 35.90% | 31.24% | 32.56% |
LogMap-bio | 78.57% | 13.41% | 22.92% | 29.43% | 24.77% | 26.01% |
LogMap-kg | 78.57% | 13.41% | 22.92% | 29.43% | 24.77% | 26.01% |
Matcha | 84.62% | 13.41% | 23.16% | 30.94% | 25.31% | 26.87% |
MDMapper | 100.00% | 13.41% | 23.66% | 27.07% | 21.47% | 22.92% |