Project Area 1: Semantic data integration, ontologies and mining services

Description

The overall goal of this project area is to develop methods that semantically integrate relevant data and metadata from selected resources and make them accessible (see projects 1.2 and 1.3). The original data come from several epidemiological studies, clinical trials, and competence networks. Providing these data in semantic form requires integrating raw data and analysis results together with the corresponding metadata. Moreover, we will annotate metadata and instances, i.e., create links to other, possibly external, data resources (e.g., ontologies), and thus make the data publicly available within the Linked Open Data (LOD) cloud (http://lod-cloud.net) while preserving privacy.

Figure 1: Project structure and dataflow of LHA project area 1

 

Project Area Projects

Project 1.1: Framework for Metadata and Data Integration

The objective of this project is to design and develop the integration framework. In cooperation with the applications in project area 2, we identify relevant data from trials and biomedical research projects, which are then physically integrated into a central research database to overcome technical restrictions and syntactic heterogeneity. We will use semi-automatically generated schema mappings from the sources to the research database to facilitate data migration and to harmonize data (e.g., code lists that differ between trials).
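To illustrate how a schema mapping can drive both migration and code-list harmonization, here is a minimal sketch. The trial names, column names, and code lists are hypothetical and chosen only for illustration; they are not taken from the LHA sources, and the actual framework may work differently.

```python
# Hypothetical schema mappings: source column -> research database column.
SCHEMA_MAPPINGS = {
    "trial_a": {"pat_id": "participant_id", "sex_cd": "sex"},
    "trial_b": {"id": "participant_id", "gender": "sex"},
}

# Hypothetical code lists: each trial encodes the same attribute differently.
SOURCE_CODE_LISTS = {
    "trial_a": {"1": "male", "2": "female"},
    "trial_b": {"m": "male", "f": "female", "x": "unknown"},
}

# Target columns whose values are coded and must be harmonized.
CODED_TARGET_COLUMNS = {"sex"}


def migrate_record(source, record):
    """Map one source record onto the target schema and harmonize its codes."""
    mapping = SCHEMA_MAPPINGS[source]
    codes = SOURCE_CODE_LISTS[source]
    target = {"source": source}
    for src_col, value in record.items():
        tgt_col = mapping.get(src_col)
        if tgt_col is None:
            continue  # columns without a mapping are not migrated
        if tgt_col in CODED_TARGET_COLUMNS:
            # Translate the trial-specific code into the shared vocabulary.
            value = codes.get(str(value).lower(), "unknown")
        target[tgt_col] = value
    return target


if __name__ == "__main__":
    print(migrate_record("trial_a", {"pat_id": 17, "sex_cd": 2}))
    print(migrate_record("trial_b", {"id": "B9", "gender": "M"}))
```

In this sketch both records end up with the same target columns and the same value vocabulary, which is the kind of harmonization the mappings are meant to achieve. In a semi-automatic setting, the mapping tables would be proposed by a matching tool and then confirmed by a curator.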

Project 1.2: Ontology Development

The objective of this project is to develop an ontological framework that is used to represent, integrate, and formalize various types of data, information, and knowledge. At the core of this framework, the Basic Investigation Ontology (BIO) describes all metadata and data structures of the integrated studies at different levels of abstraction (e.g., basic, metadata, and data levels). We will draw on our experience with the LIFE Investigation Ontology [1], which was developed specifically in and for LIFE to describe the collected data.

Project 1.3: Annotation Linking

In this project we will enrich LHA datasets with semantic annotations, using semi-automatic annotation methods, in order to achieve higher interoperability between different applications and datasets in the LHA platform (see Project 1.1). Despite much research in the link discovery area, little annotation linking has been done for biomedical documents and data such as case report forms (CRFs) for clinical trials, analysis results, and complex phenotypes.
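As a rough illustration of what "semi-automatic" can mean here, the sketch below proposes ontology concepts for a CRF item label by string similarity. High-scoring matches are accepted automatically and mid-scoring ones are queued for manual review. The concept identifiers (EX:...), the labels, and both thresholds are assumptions made for this example; the project's actual annotation methods are not specified in this description.

```python
from difflib import SequenceMatcher

# Hypothetical ontology concepts (identifier -> preferred label).
CONCEPTS = {
    "EX:0001": "body mass index",
    "EX:0002": "systolic blood pressure",
    "EX:0003": "smoking status",
}


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def propose_annotations(item_label, accept=0.9, review=0.6):
    """Return (concept_id, score, status) tuples, best match first.

    Scores >= accept are accepted automatically; scores in
    [review, accept) need confirmation by a curator; lower scores
    are discarded.
    """
    proposals = []
    for concept_id, concept_label in CONCEPTS.items():
        score = similarity(item_label, concept_label)
        if score >= accept:
            proposals.append((concept_id, score, "accepted"))
        elif score >= review:
            proposals.append((concept_id, score, "needs_review"))
    return sorted(proposals, key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    for label in ("Body Mass Index", "Systolic BP"):
        print(label, "->", propose_annotations(label))
```

Plain string similarity is only a stand-in for the matching step. A real pipeline would likely also use synonyms, abbreviations, and the context of the item within the form.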

Project 1.4: Bioinformatics knowledge mining in complex data

This project aims to provide bioinformatics methods for handling the massive and complex molecular and phenotypic data collected in clinical trials and epidemiological cohort studies. The goal is to make these data usable in downstream systems medicine applications, namely hypothesis generation and verification, mathematical modeling, and diagnostic and prognostic decision making, as described in WP 2.