Tour 1b: Ontology-Based Specification and Import of Content Into the Leipzig Health Atlas

1 Background

Designing complex web portals, including the modelling of the content to be represented, is a demanding process. The content is complex in structure because of the entities it contains and the connections among them. These entities and their relationships have to be analysed systematically and integrated into the Content Management System (CMS) of the Leipzig Health Atlas (LHA). We decided to use ontologies because they are well suited for modelling and specifying complex data and its dependencies. However, no automated import of ontologies into the CMS Drupal was available, so we implemented one.

The next sections outline how metadata is imported into the LHA. If you just want to contribute your data and models, have a look at Section 2. Section 3 gives a more in-depth view of the deployment process.

2 How to Contribute

As a contributor, you just have to download the LHA Excel template from here, fill it in and submit it to the LHA team. Whenever you decide to hand in an entity, you can generate the respective LHA IDs on the same page. For details on how to fill in the Excel template, see Section 3.2.

For each entry, you have to decide whether attachments (e.g., data sets with patient data, binaries of programmatic models or images) should be accessible to everyone, or whether access should be restricted to specific user groups (e.g., your research group or institute) determined by a whitelist.
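The access decision described above can be pictured with a small Python sketch. It is only an illustration: the `Attachment` class, its fields and the group names are hypothetical and do not describe how the LHA portal or Drupal actually stores permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Attachment:
    """Hypothetical record for one uploaded file (not the LHA data model)."""
    name: str
    public: bool = True
    # Groups that may open the file when it is not public.
    whitelist: set = field(default_factory=set)


def may_access(attachment, user_groups):
    """Return True if a user in the given groups may open the attachment."""
    if attachment.public:
        return True
    # Restricted: at least one of the user's groups must be whitelisted.
    return bool(attachment.whitelist & set(user_groups))


# Example values are invented for illustration.
model_binary = Attachment("model.zip", public=True)
patient_data = Attachment("patients.csv", public=False,
                          whitelist={"research-group-a"})
```

With these example values, `may_access(patient_data, ["research-group-a"])` is true, while a user without any whitelisted group only reaches `model_binary`.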

After your submission, you can expect the following:

  • People will download and try your provided public attachments.
  • Some of them will have a greater interest and will try to contact you to get more information or to pose questions.
  • A fraction of the interested people will cite you, or even want to start a cooperation.

3 Solution

3.1 From Excel to Drupal – Upload of LHA Metadata into the Web Portal

We have developed an ontology to describe the metadata about projects, publications, models and data sets to be represented in the LHA portal. Based on the ontology, we have implemented a pipeline (Figure 1) which can be used to specify the contents of the portal and import them into the CMS Drupal (version 8). Our approach enables ontology-based modelling of web portal content and its automatic import into Drupal.

Figure 1: The LHA deployment pipeline for metadata

The pipeline consists of the following four steps:

  1. Specification of the contents with the help of a specially developed Excel template
  2. Generation of domain-specific entities in the ontology from the specification using the Drupal Ontology Generator (DOG)
  3. Optional optimization of the ontology with an ontology editor, including the integration of external ontologies/terminologies such as the Human Disease Ontology
  4. Import of the ontology into Drupal's own database using the Drupal module Simple Ontology Loader

3.2 Metadata Specification with Excel Templates

For the structured specification of metadata of projects, publications, datasets of different types and methods, we developed a metadata model consisting of three linked levels (entity types) (Figure 2).

Figure 2: The LHA metadata model, showing all available entity types and their relations.

The project level is the top level, to which several publications can be assigned. The datasets (OMICS datasets, clinical trial data and other special datasets) and the related methods and models are mostly assigned to publications and form the lowest level for collecting the accompanying metadata. Since entities are linked to each other via IDs, a publication can be assigned to several projects, and an entity can likewise refer to multiple publications.
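The ID-based linking can be sketched in a few lines of Python. This is a simplified illustration, not the actual LHA schema: the `Entity` class, its field names and all IDs except the German Glioma Network ID (taken from this page) are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Simplified stand-in for one row of the metadata template."""
    lha_id: str
    kind: str       # e.g. "project", "publication", "clinical dataset"
    title: str
    refers_to: list = field(default_factory=list)  # IDs of linked entities


def referrers(target_id, entities):
    """IDs of all entities that link to the given entity."""
    return [e.lha_id for e in entities if target_id in e.refers_to]


def dangling_references(entities):
    """Referenced IDs that do not belong to any known entity."""
    known = {e.lha_id for e in entities}
    return sorted({ref for e in entities for ref in e.refers_to
                   if ref not in known})


# Only the first ID appears on this page; the others are placeholders.
entities = [
    Entity("LHA-7Q0CF98QUE-7", "project", "German Glioma Network"),
    Entity("PROJECT-B", "project", "Second project (placeholder)"),
    Entity("PUBLICATION-1", "publication", "Placeholder publication",
           ["LHA-7Q0CF98QUE-7", "PROJECT-B"]),
    Entity("DATASET-1", "clinical dataset", "Placeholder dataset",
           ["PUBLICATION-1"]),
]
```

Here the placeholder publication is assigned to two projects, and the dataset hangs below the publication. A check such as `dangling_references` is one way to catch mistyped IDs before a template is submitted.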

The metamodel is reflected in the structure of the template for collecting metadata (see the example below). Specific information on the respective levels (project, publication, OMICS dataset, clinical dataset, methods) is requested in the separate worksheets of the Excel file.

In addition to the bibliographic information, further information on the contents of the projects, publications and datasets is recorded in the entry forms. This includes, among other things:

  • Links to external websites, e.g. websites of study groups and PubMed entries
  • General descriptions such as project goals, motivations, abstracts of publications, etc.
  • Annotations with concepts of external terminology (e.g. for diseases)
  • Sponsors and, if available, grant numbers
  • Number of records and study design used
  • Contact details of scientists responsible for the content

Depending on the context, the metadata itself is stored in Excel as a link, a text, an enumeration of text values separated by a vertical bar (“|”), a numeric entry or a date.
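As an illustration of these value types, the following Python sketch converts raw cell strings into typed values. The function name, the `kind` labels, the ISO date format and the restriction to whole numbers are assumptions made for this example; the actual Drupal Ontology Generator may handle cells differently.

```python
from datetime import date


def parse_cell(raw, kind):
    """Convert one raw cell string into a typed value.

    `kind` is a hypothetical label for the expected value type:
    "link", "text", "list", "number" or "date".
    """
    text = raw.strip()
    if kind == "list":
        # Enumerations use the vertical bar as separator; drop empty parts.
        return [part.strip() for part in text.split("|") if part.strip()]
    if kind == "number":
        return int(text)  # assumes whole numbers, e.g. a record count
    if kind == "date":
        return date.fromisoformat(text)  # assumes ISO format YYYY-MM-DD
    if kind == "link":
        if not text.startswith(("http://", "https://")):
            raise ValueError(f"not a web link: {text!r}")
        return text
    if kind == "text":
        return text
    raise ValueError(f"unknown cell kind: {kind!r}")


# Example: an enumeration cell with two invented sponsor names.
sponsors = parse_cell("Sponsor A | Sponsor B", "list")
```

Splitting on the vertical bar and trimming each part keeps entries such as `"Sponsor A | Sponsor B"` and `"Sponsor A|Sponsor B"` equivalent.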

Example Excel of the German Glioma Network and its publications, data sets and methods

3.3 The Ontological Perspective – Architecture of the Drupal Ontology

We have developed the Drupal Upper Ontology (DUO), which models the standard components of Drupal (Field, Node, File and Vocabulary). According to the three-ontology method, DUO is a task ontology, i.e. an ontology for the problem that the software is supposed to solve. We also implemented a domain ontology, the Portal Ontology of LHA (POL), which is embedded in DUO and used to model the contents of the portal. For the integration and formal foundation of the task and domain ontologies, we used the General Formal Ontology (GFO) as the top-level ontology (Figure 3).

Based on GFO, we distinguish between symbolic structures (e.g. content of web pages, such as text and images) and the entities (categories or individuals, such as people or projects) represented by the symbolic structures. For reasons of simplicity, we only model the entities to be represented in the ontology, while their representations (on the web pages) are generated by the software.

Since both individuals and categories can be represented in a portal, we derive the class duo:Node_Item for modelling the entities to be represented from the class gfo:Item, which has the classes gfo:Individual and gfo:Category as subclasses. The class duo:Vocabulary_Concept is used to integrate the concepts of external ontologies/terminologies and is derived from gfo:Concept. We consider the files (duo:File) as continuants (gfo:Continuant) in the GFO sense, since they are concrete individuals with a certain lifetime.

In POL, categories from DUO are specialized and instantiated. On the one hand, different entity types, such as pol:Publication, pol:Project, pol:Method and pol:Clinical_Data, are defined by embedding them in duo:Node_Item; on the other hand, concrete instances of these classes are created and linked to each other. In addition, external terminologies (such as disease classifications) are referenced in POL so that the POL entities can be annotated with their concepts.

Both the Drupal fields (such as “title” or “content”) and the user-defined domain-specific fields (such as “address”, “author” and “disease”) are considered as properties and modelled as annotation properties, so that the instances in POL can be described and linked with each other through these fields.
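The class relations named in this section can be written down as a small Python sketch. It assumes that "derived from" and "embedding in" both correspond to a subclass relation, and it uses a plain dictionary rather than an OWL library; the instance name, its field values and the disease placeholder are invented for illustration.

```python
# Direct superclass of each class, as described in the text above.
# Treating "derived from" / "embedded in" as subclass-of is an assumption.
SUBCLASS_OF = {
    "gfo:Individual": "gfo:Item",
    "gfo:Category": "gfo:Item",
    "duo:Node_Item": "gfo:Item",
    "duo:Vocabulary_Concept": "gfo:Concept",
    "duo:File": "gfo:Continuant",
    "pol:Project": "duo:Node_Item",
    "pol:Publication": "duo:Node_Item",
    "pol:Method": "duo:Node_Item",
    "pol:Clinical_Data": "duo:Node_Item",
}


def ancestors(cls):
    """Follow the subclass chain upwards and return all superclasses."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


# A hypothetical POL instance; fields play the role of annotation properties.
instances = {
    "pol:example_project": {
        "type": "pol:Project",
        "title": "German Glioma Network",
        "disease": ["placeholder-disease-concept"],
    },
}


def is_node_item(instance_id):
    """True if the instance's class is derived from duo:Node_Item."""
    return "duo:Node_Item" in ancestors(instances[instance_id]["type"])
```

In this sketch, `ancestors("pol:Project")` yields `["duo:Node_Item", "gfo:Item"]`, which mirrors how a domain class in POL is grounded in DUO and, through it, in GFO.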

Figure 3: Relations between GFO, DUO and POL

Example Project Ontology in OWL for the German Glioma Network (OWL-Editor required)

German Glioma Network in LHA (ID: LHA-7Q0CF98QUE-7)

3.4 Content Annotation with Concepts of the Human Disease Ontology

By adding DOIDs to the Excel templates, content can be annotated with concepts from the Human Disease Ontology (HDO). When the project ontology is generated, all relevant HDO classes, their metadata and their superclasses are dynamically requested from Obolibrary and added to the POL.
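The following Python sketch illustrates the two ideas in this step: turning a DOID into a resolvable identifier and collecting a class together with all of its superclasses. It assumes the common OBO PURL pattern for the identifier; the text only states that classes are requested from Obolibrary, so the exact request used by the pipeline may differ. The parent map is a local stand-in for the remote lookup, and every ID in it except `DOID:4` is a made-up placeholder.

```python
import re

# Assumption: OBO-style PURLs; the real pipeline may query a different URL.
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def doid_to_iri(doid):
    """Map 'DOID:4' (or 'DOID_4') to an OBO-style IRI."""
    match = re.fullmatch(r"DOID[:_](\d+)", doid.strip())
    if match is None:
        raise ValueError(f"not a DOID: {doid!r}")
    return f"{OBO_PURL_BASE}DOID_{match.group(1)}"


def with_superclasses(doids, parents):
    """Return the given DOIDs plus all of their direct and indirect parents.

    `parents` maps a DOID to its direct superclasses and stands in for
    the information that would be requested remotely.
    """
    seen = set()
    stack = list(doids)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


# Placeholder hierarchy: the 99999xx IDs are invented, not real HDO content.
example_parents = {
    "DOID:9999902": ["DOID:9999901"],
    "DOID:9999901": ["DOID:4"],
}
needed = with_superclasses(["DOID:9999902"], example_parents)
```

Collecting the superclasses along with the annotated concept means the imported taxonomy keeps its path up to the root, so content tagged with a specific disease can also be found under broader terms.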

“brain glioma” tag in the Human Disease Ontology Taxonomy on the LHA portal

As you can see in the link above, the following metadata of HDO concepts are stored in the LHA:

  • Descriptions
  • DOID – the unique identifier of the concept
  • Synonyms
  • References to concepts in other terminologies
  • Related phenotypes

We provide references to other disease terminologies to achieve compatibility with a large number of standard terminologies in medicine.

4 Related Publications

Uciteli, Alexandr; Beger, Christoph; Rillich, Katja; Meineke, Frank A.; Loeffler, Markus; Herre, Heinrich (2018): Ontology-Based Modelling of Web Content: Example Leipzig Health Atlas. In: Hoppe, Thomas; Humm, Bernhard; Reibold, Anatol (eds.) (2018): Semantic applications. Methodology, technology, corporate use. Berlin: Springer. Available online at http://dx.doi.org/10.1007/978-3-662-55433-3.

Beger, Christoph; Uciteli, Alexandr; Herre, Heinrich (2017): Light-Weighted Automatic Import of Standardized Ontologies into the Content Management System Drupal. In: Studies in health technology and informatics 243, 170–174.
