Systems medicine and translational medicine aim to unravel disease pathogenesis, separate heterogeneous diseases diagnostically, predict disease progression and side effects, and identify tailored treatments to be tested in clinical trials. Novel molecular-genetic and phenotyping technologies have made it possible to gather rich, multi-layered data sets. A major challenge is to integrate data and analyses to arrive at novel and selective medical decision models for stratified medicine. One bottleneck is integrative data semantics.
The interdisciplinary project “Leipzig Health Atlas” (LHA) forms an alliance of medical ontologists, medical systems biologists and clinical trial groups to tackle these issues. We focus on areas in which our team has scientific and clinical expertise. The Leipzig Health Atlas will:
- provide an interoperable ontology-based semantic platform to share highly annotated data, novel ontologies, usable models and working software tools;
- provide advanced, application-oriented analytic pipelines that give a clinical and scientific user community access to disease-related phenotype classifications, omics-based disease sub-classifications, risk predictions and simulation models for diseases and organ functions.
The LHA invites other systems medicine groups to use this material and to contribute their own results and tools, and thereby aims to become a provider of integrative data semantics.
Medical and scientific users will access models, methods (algorithms, tools), data and metadata through a web-based platform. Models and data will be curated.
Data and Permissions
A strategic goal of the LHA is to contribute to the reduction of "waste in clinical research" [Lancet, 2014]. In clinical research, data are often withheld, published in a biased or incorrect form, or misinterpreted. This distorts the available evidence, which in turn leads to unnecessary use of resources in areas that are already sufficiently explored. Providing high-quality data from clinical trials rapidly and without distortion, both to the scientific community and to the public, can counteract this situation.
The LHA provides data, documents and algorithms of various types from trial and patient-related research and care projects in which we and our partners are actively involved. This includes annotations from, or links to, appropriate data, documents and algorithms from other consortia.
Data and documents are released to defined user groups through a selection and approval process that is coordinated with the participating consortia or data owners.
The LHA will provide the following data, documents and models/algorithms:
- Metadata from phenotyping and genotyping of subjects and patients: This includes, for example, biometric features from case report forms (CRFs) of clinical and epidemiological studies. The CRFs are prepared in a structured way and are represented in a metadata repository (MDR) with content identifiers, coding rules, annotations to the phenotypes and genotypes and, if applicable, the corresponding measurement rules (SOPs). We currently have several thousand such features, which are successively being made available in the LHA-MDR (LHA Data Portal). Metadata are generally used for study-related or cross-study purposes. The metadata also include identifiers and descriptions of the clinical and epidemiological studies from which the CRF features are derived, as well as documents or publications on study design or study protocols. The LHA will provide multiple data mining tools for the metadata.
- Phenotypic and molecular genetic data from pre-evaluated prospective clinical and epidemiological studies are provided in a prepared and curated form, depending on the audience and the intended use (see below). Depending on requirements and consensus, aggregated or individual-level data can be provided. Each data set retains a reference to the study that generated it. Descriptive population statistics are provided with the data sets to characterize the cohorts. In doing so, we aim to follow the reporting standards of the international EQUATOR network, which are now established for interventional therapeutic studies, diagnostic and prognostic studies and epidemiological studies. For molecular genetic data we optionally provide extensive annotations for the individual data. These annotations include, for example, methods and quality indicators of the measurements, the nature of the changes (e.g., genotype, mutation calls) and predictions of gene function (e.g., GO annotations, pathway annotations, damage prediction for mutations). Molecular genetic data include sequence data, genome-wide array data, proteome data, metabolome data, immunohistochemistry data, cytogenetics data and many more. Depending on the audience and the approval process, only portions of the data records may be provided, e.g. only phenome data or only genome data, with or without annotations. The LHA will provide a series of tools for data mining in data and documents.
- Biometric evaluations and systems-medicine modeling are producing numerous models and evaluation methods that are of increasing importance for medicine. These models depend on the data sets on which they were trained. The LHA will provide such models and algorithms along with their training data sets. The models can be divided into two broad groups: (1) statistical models for the diagnostic classification of diseases or phenotypes, or for the calculation of risk structures in predictions; and (2) dynamic simulation models of pathophysiological processes, of pharmacokinetics and pharmacodynamics, or of disease progression. Statistical models are defined in the LHA, and a calculation can be performed for an individual patient by entering data into a data entry form (e.g., a GUI). For the dynamic models, two usage scenarios are expected: first, the LHA offers a ready-to-use tool that can be run directly for simulations; second, the model algorithm is available for download for research purposes. Within the framework of the LHA, we will provide assessment methods and models only in accordance with the current standards for a certified medical device under the German Medical Devices Act (MPG; DIMDI certificate). Moreover, the metadata and data used for the associated model fitting are provided.
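The metadata bullet above lists what an MDR entry carries: a content identifier, coding rules, annotations and, where applicable, a measurement rule (SOP). A minimal sketch of such an entry and a simple keyword search over it might look as follows. The class name `MdrItem`, the function `find_items`, the field names and all example values are hypothetical illustrations, not the actual LHA-MDR schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MdrItem:
    """One CRF feature as it might be stored in a metadata repository (hypothetical schema)."""
    item_id: str                                           # content identifier (made-up format)
    label: str                                             # human-readable name of the feature
    study_id: str                                          # study the CRF feature is derived from
    coding: Dict[str, str] = field(default_factory=dict)   # coding rules: code -> meaning
    annotations: List[str] = field(default_factory=list)   # phenotype/genotype annotations
    sop: Optional[str] = None                              # measurement rule (SOP), if applicable


def find_items(items: List[MdrItem], term: str) -> List[MdrItem]:
    """Return items whose label or annotations contain the term (case-insensitive)."""
    needle = term.lower()
    return [
        item for item in items
        if needle in item.label.lower()
        or any(needle in note.lower() for note in item.annotations)
    ]


# Example entries; identifiers and values are invented for illustration only.
items = [
    MdrItem("EX-001", "Smoking status", "STUDY-A",
            coding={"0": "never", "1": "former", "2": "current"},
            annotations=["lifestyle", "tobacco use"]),
    MdrItem("EX-002", "Systolic blood pressure", "STUDY-A",
            annotations=["cardiovascular"], sop="SOP-BP-01"),
]

matches = find_items(items, "tobacco")
print([item.item_id for item in matches])
```

Keeping the study identifier on every item is one way to preserve the link between a feature and the study that generated it, which the text requires for cross-study use.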
Authorization concept and consent management
The LHA aims to make as much data as possible openly accessible (shared open data). However, a large part of the LHA information comes directly from clinical research, so open access is not always possible, mainly for data protection reasons and in individual cases also for reasons of intellectual property. Personal registration, or a specific inquiry to the LHA team or the data owner, may therefore be necessary.
Each set of metadata, data, documents or models provided in the LHA is assigned to a user group, in consultation with the data subjects involved and in accordance with the participants' consent to the associated research project. Three user groups are distinguished:
- Public: completely freely accessible data (public access).
- Registered (project-scoped): Personal, general access for all data of a project.
- Restricted (record-scoped): Personal access for specific individual records.
In the Restricted case, the LHA presents only metadata and forwards inquiries to the data owner. The exchange of the data is negotiated between the data owner and the inquirer and may be backed by a contract. Registration requests are processed in two stages.
- Description of a study with short profile, parts of the study protocol
- Description of the documentation concept with short profile, with CRF, with annotated MDR
- Only aggregated phenotypic data, individualized for a few features or for many features
- Only aggregated genome data, individualized for few or for many features, with or without functional annotations
- Derivatives of already published data
- Study protocol
- Phenotypic data for a few characteristics
- Analysis tools (microdata-based, e.g., i2b2)
- Models (either directly on the LHA server or as a download), use requires professional background
- Consent-restricted microdata
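
The three user groups described above (Public, Registered, Restricted) can be read as a simple access rule. The following is a minimal sketch under stated assumptions: the names `AccessLevel`, `Resource`, `User` and `can_access` are hypothetical, and the actual LHA approval process (two-stage registration, negotiation with the data owner) is not modeled here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class AccessLevel(Enum):
    PUBLIC = "public"          # freely accessible
    REGISTERED = "registered"  # project-scoped: personal access to all data of a project
    RESTRICTED = "restricted"  # record-scoped: personal access to specific records


@dataclass(frozen=True)
class Resource:
    resource_id: str
    project: str
    level: AccessLevel


@dataclass
class User:
    name: str
    projects: Set[str] = field(default_factory=set)  # projects the user is registered for
    records: Set[str] = field(default_factory=set)   # individual records released to the user


def can_access(resource: Resource, user: Optional[User] = None) -> bool:
    """Decide whether the data of a resource may be shown to the given (or anonymous) user."""
    if resource.level is AccessLevel.PUBLIC:
        return True
    if user is None:
        return False  # anything beyond public data requires a personal account
    if resource.level is AccessLevel.REGISTERED:
        return resource.project in user.projects
    # RESTRICTED: only records explicitly released by the data owner
    return resource.resource_id in user.records


# Invented example resources and user.
protocol = Resource("doc-1", "STUDY-A", AccessLevel.PUBLIC)
aggregates = Resource("agg-1", "STUDY-A", AccessLevel.REGISTERED)
microdata = Resource("rec-7", "STUDY-A", AccessLevel.RESTRICTED)
alice = User("alice", projects={"STUDY-A"})

print(can_access(protocol))            # anonymous user, public resource
print(can_access(aggregates, alice))   # registered for the project
print(can_access(microdata, alice))    # no record-level release yet
```

In this sketch a denied Restricted request would still leave the metadata visible, matching the statement that the LHA presents only metadata in that case; the release itself would be recorded by adding the record identifier to `User.records` once the data owner agrees.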