1 Background and Problem
In 1909, Wilhelm Johannsen introduced the term “phenotype”, but to this day it has no generally accepted definition. Usually, the term is understood as an observable characteristic or trait of an organism. The correct determination of phenotypes plays a key role in the diagnosis of diseases, the evaluation of risk factors and the recruitment of patients for clinical and epidemiological studies. However, translating phenotype algorithms into a machine-readable form is a challenging process. Recent work has shown that ontologies are suitable for handling phenotypes and that they can support clinical research and decision making. We developed an approach based on extended reasoning, in which phenotypic data are combined into complex phenotypes by means of calculations and classifications.
We consider a phenotype as an individual (as in the General Formal Ontology, GFO), for example, the weight of a concrete person. Abstract instantiable entities that are instantiated by phenotypes are called phenotype classes (e.g., the abstract property ‘body length’ has individual length values as instances).
There are single and composite properties (traits) and, correspondingly, single and composite phenotypes:
- Single phenotype: a single property (e.g., age, weight, height)
- Composite phenotype: a composite property that consists of single properties (e.g., BMI, SOFA Score) of an organism or its subsystem, in particular:
  - Boolean phenotype: a Boolean expression based on has_part relations
  - Mathematical phenotype: a calculation rule (e.g., BMI = weight / height²)
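To make the distinction concrete, the following minimal Python sketch (an illustration, not part of COP) represents single phenotypes as values and a composite phenotype as a collection of parts linked by has_part. A Boolean phenotype then becomes a predicate over those parts and a mathematical phenotype a calculation over their values. All class and function names are hypothetical, and the BMI calculation assumes weight in kg and height in m.

```python
from dataclasses import dataclass, field


@dataclass
class SinglePhenotype:
    """A single property of an organism, e.g. one concrete weight value."""
    kind: str     # name of the property, e.g. "weight"
    value: float
    unit: str


@dataclass
class CompositePhenotype:
    """A composite property whose parts are single phenotypes (has_part)."""
    parts: list[SinglePhenotype] = field(default_factory=list)

    def has_part(self, kind: str) -> bool:
        return any(p.kind == kind for p in self.parts)

    def value_of(self, kind: str) -> float:
        # Value of the first part of the given kind.
        return next(p.value for p in self.parts if p.kind == kind)


def has_height_and_weight(c: CompositePhenotype) -> bool:
    """Boolean phenotype: a Boolean expression over has_part relations."""
    return c.has_part("height") and c.has_part("weight")


def bmi(c: CompositePhenotype) -> float:
    """Mathematical phenotype: BMI = weight / height^2 (kg, m)."""
    return c.value_of("weight") / c.value_of("height") ** 2


person = CompositePhenotype([
    SinglePhenotype("weight", 85.0, "kg"),
    SinglePhenotype("height", 1.80, "m"),
])
print(has_height_and_weight(person), round(bmi(person), 1))  # True 26.2
```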
Furthermore, composite phenotype classes can associate certain conditions with specific predefined values (scores). We call such phenotype classes score phenotype classes.
We distinguish between restricted and non-restricted phenotype classes, depending on whether defined conditions restrict their extensions (sets of instances) to a certain range of individual phenotypes or all instances are admitted. For example:
- Non-restricted: phenotype class “age”, which is instantiated by the ages of all living beings
- Restricted: phenotype class “young age”, which is instantiated only by the ages of young individuals (ages below some threshold value).
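A restricted class can be read as a non-restricted class plus a condition that filters its extension. The sketch below is illustrative only: `PhenotypeClass`, `admits` and `extension` are hypothetical names, and the threshold of 18 years for “young age” is an assumption, since the text deliberately leaves the value open.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PhenotypeClass:
    name: str
    # None means non-restricted: every instance of the property is admitted.
    restriction: Optional[Callable[[float], bool]] = None

    def admits(self, value: float) -> bool:
        return self.restriction is None or self.restriction(value)

    def extension(self, values: list[float]) -> list[float]:
        # The instances (individual phenotypes) that fall into this class.
        return [v for v in values if self.admits(v)]


YOUNG_AGE_THRESHOLD = 18  # hypothetical value; the text leaves it open

age = PhenotypeClass("Age")
young_age = PhenotypeClass("Young_Age", lambda years: years < YOUNG_AGE_THRESHOLD)

ages = [3, 17, 20, 64]
print(age.extension(ages))        # [3, 17, 20, 64]
print(young_age.extension(ages))  # [3, 17]
```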
2.1 Architecture of the Core Ontology of Phenotypes
The Core Ontology of Phenotypes (COP) enables ontologists to model phenotype classes, so that phenotypes can be classified into phenotype classes based on instance data sets (e.g., of a patient). We used the GFO class gfo:Property to model properties or traits and defined the class cop:Phenotype as a subclass of gfo:Property (Figure 1 (a)). According to our definitions in Section 1, there are six types of phenotype classes:
- non-restricted (NSiP) and restricted (RSiP) single phenotype classes
- non-restricted (NScP) and restricted (RScP) score phenotype classes
- non-restricted (NMaP) and restricted (RMaP) mathematical phenotype classes
Each subclass of cop:Single_Phenotype, cop:Score_Phenotype and cop:Mathematical_Phenotype is a phenotype class and is instantiated by phenotypes. Direct subclasses are non-restricted, while subclasses of non-restricted phenotype classes are restricted (e.g., age greater than or equal to 20 years: Age_ge_20).
Phenotype classes possess various common attributes (e.g., labels, descriptions and links to external concepts). Additional attributes vary depending on the phenotype class. NSiP classes define the datatype and a unit of measure, NMaP classes have a mathematical formula, RSiP and RMaP classes have restrictions and RScP classes have a Boolean expression and an optional score. Logical relations between phenotype classes as well as range restrictions are represented by anonymous equivalent classes or general class axioms based on property restrictions.
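The class types and their attributes can be pictured as a small class hierarchy. The following is a hypothetical Python sketch, not COP's OWL representation: the names `NSiP`, `NScP`, `NMaP`, `RScP`, `Restricted`, `is_restricted` and all field names are assumptions. RSiP and RMaP classes share one `Restricted` type here because both carry a restriction, and `is_restricted` encodes the rule that direct subclasses of the COP root classes are non-restricted.

```python
from dataclasses import dataclass, field
from typing import Optional

COP_ROOTS = {"cop:Single_Phenotype", "cop:Score_Phenotype", "cop:Mathematical_Phenotype"}


@dataclass
class PhenotypeClass:
    """Attributes common to all phenotype classes."""
    name: str
    parent: str                                   # direct superclass
    labels: list[str] = field(default_factory=list)
    description: str = ""
    external_concepts: list[str] = field(default_factory=list)


@dataclass
class NSiP(PhenotypeClass):
    datatype: str = "decimal"
    unit: Optional[str] = None


@dataclass
class NScP(PhenotypeClass):
    pass


@dataclass
class NMaP(PhenotypeClass):
    formula: str = ""                             # uses NSiP and NScP class names


@dataclass
class Restricted(PhenotypeClass):
    """Stands for both RSiP and RMaP classes."""
    restriction: str = ""                         # e.g. ">= 20"


@dataclass
class RScP(PhenotypeClass):
    boolean_expression: str = ""                  # e.g. "has_part some Age_ge_20"
    score: Optional[float] = None                 # optional score value


def is_restricted(cls: PhenotypeClass) -> bool:
    # Direct subclasses of the COP root classes are non-restricted;
    # subclasses of those are restricted.
    return cls.parent not in COP_ROOTS


age = NSiP("Age", "cop:Single_Phenotype", datatype="integer", unit="years")
age_ge_20 = Restricted("Age_ge_20", "Age", restriction=">= 20")
print(is_restricted(age), is_restricted(age_ge_20))  # False True
```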
2.2 Example: Calculation of the Body Surface Area
The body surface area plays an important role in medicine, for example, for the calculation of the dosage of certain drugs. We used the formula of Gehan and George:
SA = a0 * Height ^ a1 * Weight ^ a2
Coefficients depend on the patient’s age:
- Age below 5 years: a0 = 0.02667, a1 = 0.42246, a2 = 0.51456
- Age of at least 5 and below 20 years: a0 = 0.03050, a1 = 0.35129, a2 = 0.54375
- Age of 20 years or above: a0 = 0.01545, a1 = 0.54468, a2 = 0.46336
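A direct Python rendering of the formula with age-dependent coefficients can serve as a reference for checking the ontology's results. It is a sketch under stated assumptions: height in cm and weight in kg (giving m²), age ranges read as half-open intervals matching the class names Age_ge_5_l_20 and Age_ge_20, and a hypothetical function name.

```python
# Gehan and George coefficients (a0, a1, a2) per age range. The bounds
# (age < 5, 5 <= age < 20, age >= 20) follow the class names Age_ge_5_l_20
# and Age_ge_20; the units (cm, kg -> m^2) are an assumption.
COEFFICIENTS = [
    (lambda age: age < 5, (0.02667, 0.42246, 0.51456)),
    (lambda age: 5 <= age < 20, (0.03050, 0.35129, 0.54375)),
    (lambda age: age >= 20, (0.01545, 0.54468, 0.46336)),
]


def body_surface_area(height_cm: float, weight_kg: float, age_years: float) -> float:
    """BSA = a0 * Height^a1 * Weight^a2 with age-dependent coefficients."""
    for in_range, (a0, a1, a2) in COEFFICIENTS:
        if in_range(age_years):
            return a0 * height_cm ** a1 * weight_kg ** a2
    raise ValueError(f"no coefficients for age {age_years!r}")  # e.g. NaN


print(round(body_surface_area(180, 85, 20), 2))  # 2.05
```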
Steps to model the scenario:
1. Model the NSiP classes “Height”, “Weight” and “Age” as subclasses of cop:Single_Phenotype and, where appropriate, add annotations for labels, descriptions, related concepts, etc.
2. Define the RSiP classes for the age ranges as subclasses of the class “Age”. For every RSiP class, an anonymous equivalent class that represents the corresponding restriction must be created.
3. For each coefficient, define a subclass of cop:Score_Phenotype (i.e., a0, a1 and a2).
4. Create subclasses of the NScP classes from step 3 that represent the single score values (e.g., a0_s2), reference the corresponding age range classes (e.g., Age_ge_5_l_20) by general class axioms (e.g., has_part some Age_ge_5_l_20 SubClassOf a0_s2) and define the score values (e.g., 0.0305 from above).
5. Model the resulting phenotype class for the body surface area (GG_BSA) as an NMaP class and add an annotation for the formula containing the names of the NSiP and NScP classes. Additionally, these classes are referenced by means of the general class axiom “a0 and a1 and a2 and (has_part some Height) and (has_part some Weight) SubClassOf GG_BSA”.
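The modelling steps can be mirrored in plain Python data to show how the pieces relate. This is a hypothetical sketch, not COP's OWL serialization: the names Age_l_5, a0_s1 and a0_s3 (and their a1 and a2 counterparts) follow the naming pattern of Age_ge_5_l_20 and a0_s2 but do not appear in the text, and Python predicates stand in for the anonymous equivalent classes and for a reasoner.

```python
# Hypothetical Python mirror of the modelling steps; COP itself uses OWL
# classes, annotations and axioms.

# Step 1: NSiP classes with their units of measure.
NSIP = {"Height": "cm", "Weight": "kg", "Age": "years"}

# Step 2: RSiP subclasses of "Age"; each predicate stands in for the
# anonymous equivalent class that encodes the range restriction.
RSIP = {
    "Age_l_5": ("Age", lambda v: v < 5),
    "Age_ge_5_l_20": ("Age", lambda v: 5 <= v < 20),
    "Age_ge_20": ("Age", lambda v: v >= 20),
}

# Steps 3 and 4: NScP classes a0, a1, a2 and their score subclasses. Each tuple
# (age class, score subclass, NScP parent, score value) stands for the axiom
# "has_part some <age class> SubClassOf <score subclass>" plus its score value.
COEFFS = {
    "Age_l_5": (0.02667, 0.42246, 0.51456),
    "Age_ge_5_l_20": (0.03050, 0.35129, 0.54375),
    "Age_ge_20": (0.01545, 0.54468, 0.46336),
}
SCORE_AXIOMS = [
    (age_cls, f"a{i}_s{s}", f"a{i}", value)
    for s, (age_cls, values) in enumerate(COEFFS.items(), start=1)
    for i, value in enumerate(values)
]

# Step 5: NMaP class GG_BSA with its formula annotation and the axiom
# "a0 and a1 and a2 and (has_part some Height) and (has_part some Weight)
#  SubClassOf GG_BSA".
GG_BSA_FORMULA = "a0 * Height ** a1 * Weight ** a2"


def is_gg_bsa(score_parents: set[str], part_classes: set[str]) -> bool:
    return {"a0", "a1", "a2"} <= score_parents and {"Height", "Weight"} <= part_classes


# A composite phenotype with parts Height, Weight and Age = 20 years.
age_value = 20
part_classes = {"Height", "Weight", "Age"} | {
    cls for cls, (parent, test) in RSIP.items() if parent == "Age" and test(age_value)
}
applicable = [ax for ax in SCORE_AXIOMS if ax[0] in part_classes]
score_parents = {parent for _, _, parent, _ in applicable}
print([name for _, name, _, _ in applicable])  # ['a0_s3', 'a1_s3', 'a2_s3']
print(is_gg_bsa(score_parents, part_classes))  # True
```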
2.3 Phenotype Reasoning
To use the COP for reasoning, one can use the Phenotype Manager (PhenoMan, early prototype), which follows a multistage reasoning approach that combines standard reasoners (e.g., Pellet or HermiT) with mathematical calculations. PhenoMan reasons as follows:
1. Receive an instance data set (e.g., height = 180 cm, weight = 85 kg and age = 20 years).
2. Insert the instance data as instances of the direct subclasses of cop:Single_Phenotype (Height, Weight and Age) and add the values as property assertions based on the has_value relation (e.g., has_value 180).
3. Create an instance of the class cop:Composite_Phenotype that combines all single phenotype instances by property assertions based on the has_part relation.
4. Start the reasoning.
5. Classification step 1: classify the single phenotype instances into restricted classes using a standard reasoner (e.g., 20 years >= 20 years => Age_ge_20).
6. Classification step 2: classify the composite phenotype instance into the score subclasses that correspond to its age range, i.e., a0_s3, a1_s3 and a2_s3 for Age_ge_20.
7. Calculation step: construct the formula of each mathematical phenotype class by inserting the values of the single phenotype instances (e.g., height and weight) and the relevant score values for the score variables (e.g., a0 = 0.01545), and pass it to an external library for calculation.
8. Go to step 4 if an NMaP class has subclasses (i.e., RMaP classes) that are in turn used in score phenotypes.
Step 4 may be repeated several times, depending on the complexity of the phenotypes. For example, the SOFA score consists of six other scores and requires two classification and two calculation steps.
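The multistage procedure can be sketched as a fixpoint loop. In this hypothetical Python version, simple predicates replace the standard reasoner (Pellet or HermiT) and a Python lambda replaces the external calculation library; `RESTRICTED`, `SCORES`, `MATH` and `reason` are illustrative names, not PhenoMan's API. Each pass performs both classification steps and the calculation step, and the loop ends once a pass adds nothing new.

```python
# Hypothetical, minimal model of the BSA example; COP keeps this in OWL.
RESTRICTED = {  # restricted class -> (class whose value is tested, restriction)
    "Age_l_5": ("Age", lambda v: v < 5),
    "Age_ge_5_l_20": ("Age", lambda v: 5 <= v < 20),
    "Age_ge_20": ("Age", lambda v: v >= 20),
}
SCORES = {  # "has_part some <class>" -> score values of the score subclasses
    "Age_l_5": {"a0": 0.02667, "a1": 0.42246, "a2": 0.51456},
    "Age_ge_5_l_20": {"a0": 0.03050, "a1": 0.35129, "a2": 0.54375},
    "Age_ge_20": {"a0": 0.01545, "a1": 0.54468, "a2": 0.46336},
}
MATH = {  # NMaP class -> (required inputs, formula)
    "GG_BSA": ({"Height", "Weight", "a0", "a1", "a2"},
               lambda x: x["a0"] * x["Height"] ** x["a1"] * x["Weight"] ** x["a2"]),
}


def reason(data: dict[str, float]) -> tuple[dict[str, float], set[str], int]:
    """Run classification and calculation passes until nothing changes."""
    values = dict(data)        # single phenotype instances with their has_value
    classes: set[str] = set()  # restricted classes the parts were classified into
    passes = 0
    while True:                # "Start the reasoning"
        passes += 1
        changed = False
        # Classification step 1: known values -> restricted classes.
        for cls, (parent, test) in RESTRICTED.items():
            if cls not in classes and parent in values and test(values[parent]):
                classes.add(cls)
                changed = True
        # Classification step 2: has_part some <class> -> score subclasses.
        for cls in classes:
            for var, score in SCORES.get(cls, {}).items():
                if var not in values:
                    values[var] = score
                    changed = True
        # Calculation step: evaluate every formula whose inputs are known.
        for name, (inputs, formula) in MATH.items():
            if name not in values and inputs.issubset(values):
                values[name] = formula(values)
                changed = True
        # A calculated value can enable new restricted (RMaP) classes, so loop
        # again; stop once a full pass adds nothing.
        if not changed:
            return values, classes, passes


values, classes, passes = reason({"Height": 180, "Weight": 85, "Age": 20})
print(sorted(classes), round(values["GG_BSA"], 2), passes)  # ['Age_ge_20'] 2.05 2
```

Because calculated values are stored alongside measured ones, a restricted class whose tested class is a mathematical phenotype (for example, a hypothetical restriction on GG_BSA) would be picked up in the following pass, which is the situation in which the procedure loops back to the reasoning step.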