Pharmacogenomics Track

Task: The Pharmacogenomics Track (or PGx Track for short) involves n-ary tuples representing so-called "pharmacogenomic relationships" and their components of three distinct types: drugs, genetic factors, and phenotypes. Tuples are reified as instances of the class pgxo:PharmacogenomicRelationship. The goal of the track is to match these tuples (instance matching).

Motivation: Pharmacogenomic tuples involve drugs, genetic factors, and phenotypes, and state that patients being treated by the specified drugs while having the specified genetic factors may experience the given phenotypes. Knowledge in pharmacogenomics is scattered across several resources, e.g., reference databases (PharmGKB) or the biomedical literature. Hence, there is a need to build a consolidated view of the knowledge of this domain by aligning tuples from different sources. See [1] for a detailed motivation and [2] for a detailed task description.

Data sets

The data sets are available on Zenodo. You can either download them for local analysis and processing, or you can directly use the MELT toolkit.

MELT:

Track pgxTrack = TrackRepository.Pharmacogenomics.V1;

SEALS:

Repository: http://oaei.webdatacommons.org/tdrs/
Suite-ID: pharmacogenomics
Version-ID: v1

For the evaluation, we use a subset of the alignments available in PGxLOD that have been created with the matching rules described in [3].

Task with 10 % of PGx relationships

Alignments: 942
- relatedMatch alignments: 83
- sameAs alignments: 315
- closeMatch alignments: 46
- broadMatch alignments: 332
- narrowMatch alignments: 166
Entities to align in source: 2523
Triples in source: 211770
Entities to align in target: 2520
Triples in target: 208051

Task with 50 % of PGx relationships

Alignments: 23938
- relatedMatch alignments: 1500
- sameAs alignments: 8636
- closeMatch alignments: 1128
- broadMatch alignments: 8047
- narrowMatch alignments: 4627
Entities to align in source: 12767
Triples in source: 459483
Entities to align in target: 12450
Triples in target: 450364

Task with 100 % of PGx relationships

Alignments: 88569
- relatedMatch alignments: 5318
- sameAs alignments: 34510
- closeMatch alignments: 3978
- broadMatch alignments: 24097
- narrowMatch alignments: 20666
Entities to align in source: 25362
Triples in source: 691151
Entities to align in target: 25073
Triples in target: 687052

Examples

Pharmacogenomic tuple encoding

Components (drugs, genetic factors, phenotypes) are associated with pharmacogenomic tuples through various properties from the PGxO ontology (e.g. pgxo:isAssociatedWith, pgxo:causes). These properties are organized in a hierarchy, with inverses, and/or symmetry.

Two examples of pharmacogenomic tuples and their encoding are displayed below:

Matching examples

In the following example, the expected alignment is pgt_2 skos:broadMatch pgt_1 (denoted by pgt_2 < pgt_1) since the genetic factor causing pgt_2 is part of the one causing pgt_1, and the phenotype is caused by pgt_2 instead of only being associated with pgt_1.

In the following example, we rely on hierarchies of ontology classes to detect that one tuple is more specific than the other because it involves a more specific phenotype.

Modalities

Expected alignments

Only alignments between instances of the pgxo:PharmacogenomicRelationship class are expected. Four different types of alignments are considered:

owl:sameAs (=)
skos:closeMatch (Close)
skos:broadMatch (<)
skos:narrowMatch (>)
skos:relatedMatch (Related)

Evaluation

Fine-grained setting: Only alignments of the exact type expected in the reference are considered as correct. To illustrate, an alignment (e1, =, e2) where (e1, ~, e2) was expected will be considered as incorrect. Precision, Recall, and F1-score will be computed for each type of alignments.
Coarse-grained setting: Any type of alignment between entities expected to be aligned will be considered as correct. To illustrate, an alignment (e1, =, e2) where (e1, ~, e2) was expected will be considered as correct. Precision, Recall, and F1-score will be computed globally accordingly.

Results available

The results are available on the results page.

Organizers

Pierre Monnin (Université Côte d’Azur, Inria, CNRS, I3S, France), main contact for the track, pierre dot monnin at inria dot fr
Adrien Coulet (Inria Paris, Centre de Recherche des Cordeliers, Inserm, Université Paris Cité, Sorbonne Université, Paris, France)

References

[1] Pierre Monnin, Joël Legrand, Graziella Husson, Patrice Ringot, Andon Tchechmedjiev, Clément Jonquet, Amedeo Napoli, Adrien Coulet: PGxO and PGxLOD: a reconciliation of pharmacogenomic knowledge of various provenances, enabling further comparison. BMC Bioinformatics 20-S(4): 139:1-139:16 (2019) [pdf]

[2] Pierre Monnin, Adrien Coulet: Matching pharmacogenomic knowledge: particularities, results, and perspectives. OM@ISWC 2022: 79-83 [pdf]

[3] Pierre Monnin, Miguel Couceiro, Amedeo Napoli, Adrien Coulet: Knowledge-Based Matching of n-ary Tuples. ICCS 2020: 48-56 [pdf]