Publications

Abstract

OBJECTIVE: In the fields of medical care and research as well as hospital management, time series are an important part of the overall data basis. To ensure high quality standards and enable suitable decisions, tools for precise and generic imputations and forecasts that integrate the temporal dynamics are of great importance. Since forecasting and imputation tasks involve an inherent uncertainty, the focus of our work lay on a probabilistic multivariate generative approach that samples infillings or forecasts from an analysable distribution rather than producing deterministic results. MATERIALS AND METHODS: For this task, we developed a system based on generative adversarial networks that consist of recurrent encoders and decoders with attention mechanisms and can learn the distribution of intervals from multivariate time series conditioned on the periods before and, if available, periods after the values that are to be predicted. For training, validation and testing, a data set of jointly measured blood pressure series (ABP) and electrocardiograms (ECG) (length: 1,250 ≙ 10 s) was generated. For the imputation tasks, one interval of fixed length was masked randomly and independently in both channels of every sample. For the forecasting task, all masks were positioned at the end. RESULTS: The models were trained on around 65,000 bivariate samples and tested against 14,000 series of different persons. For the evaluation, 50 samples were produced for every masked interval to estimate the range of the generated infillings or forecasts. The element-wise arithmetic average of these samples served as an estimator for the mean of the learned conditional distribution. The approach showed better results than a state-of-the-art probabilistic multivariate forecasting mechanism based on Gaussian copula transformation and recurrent neural networks. On the imputation task, the proposed method reached a mean squared error (MSE) of 0.057 on the ECG channel and an MSE of 28.30 on the ABP channel, while the baseline approach reached MSEs of 0.095 (ECG) and 229.1 (ABP). Moreover, on the forecasting task, the presented system achieved MSEs of 0.069 (ECG) and 33.73 (ABP), outperforming the recurrent copula approach, which reached MSEs of 0.082 (ECG) and 196.53 (ABP).
CONCLUSION: The presented generative probabilistic system for the imputation and forecasting of (medical) time series features the flexibility to handle masks of different sizes and positions, the ability to quantify uncertainty due to its probabilistic predictions, and an adjustable trade-off between the goals of minimising errors in individual predictions and minimising the distance between the learned and the real conditional distribution of the infillings or forecasts.
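
The evaluation step named in this abstract (averaging the sampled infillings element-wise and scoring that mean with the MSE) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names and the toy numbers are assumptions, and only three samples are used instead of the 50 mentioned above.

```python
from statistics import fmean


def elementwise_mean(samples):
    """Average several equally long sampled series position by position."""
    return [fmean(values) for values in zip(*samples)]


def mean_squared_error(prediction, target):
    """Mean of the squared differences between two equally long series."""
    return fmean((p - t) ** 2 for p, t in zip(prediction, target))


# Toy data (made up for illustration): three sampled infillings
# for one masked interval of length 3, plus the true values.
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 2.0, 5.0],
    [3.0, 2.0, 4.0],
]
target = [2.0, 3.0, 4.0]

# Estimator for the mean of the conditional distribution.
mean_infilling = elementwise_mean(samples)

# Error of that estimator against the true values.
mse = mean_squared_error(mean_infilling, target)
```

Scoring the averaged series rather than each sample separately matches the description above, where the element-wise average serves as the point estimate and the spread of the samples conveys the uncertainty.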

Authors: Sven Festag, Cord Spreckelsen

Date Published: 1st Feb 2023

Publication Type: Journal article

Abstract

BACKGROUND: The growing interest in the secondary use of electronic health record (EHR) data has increased the number of new data integration and data sharing infrastructures. The present work has been developed in the context of the German Medical Informatics Initiative, where 29 university hospitals agreed to the usage of the Health Level Seven Fast Healthcare Interoperability Resources (FHIR) standard for their newly established data integration centers. This standard is optimized to describe and exchange medical data but less suitable for standard statistical analysis which mostly requires tabular data formats. OBJECTIVES: The objective of this work is to establish a tool that makes FHIR data accessible for standard statistical analysis by providing means to retrieve and transform data from a FHIR server. The tool should be implemented in a programming environment known to most data analysts and offer functions with variable degrees of flexibility and automation catering to users with different levels of FHIR expertise. METHODS: We propose the fhircrackr framework, which allows downloading and flattening FHIR resources for data analysis. The framework supports different download and authentication protocols and gives the user full control over the data that is extracted from the FHIR resources and transformed into tables. We implemented it using the programming language R [1] and published it under the GPL-3 open source license. RESULTS: The framework was successfully applied to both publicly available test data and real-world data from several ongoing studies. While the processing of larger real-world data sets puts a considerable burden on computation time and memory consumption, those challenges can be attenuated with a number of suitable measures like parallelization and temporary storage mechanisms. 
CONCLUSION: The fhircrackr R package provides an open source solution within an environment that is familiar to most data scientists and helps overcome the practical challenges that still hamper the usage of EHR data for research.
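
The idea of flattening a nested FHIR resource into one table row can be illustrated with a short sketch. fhircrackr itself is an R package; the Python below does not reproduce its API, and the function name `flatten`, the dotted path convention, and the simplified Patient resource are all assumptions made for illustration.

```python
def flatten(resource, prefix=""):
    """Turn a nested dict/list structure into a flat {path: value} dict.

    Dict keys and list indices are joined with dots to form the path.
    """
    if isinstance(resource, dict):
        items = resource.items()
    elif isinstance(resource, list):
        items = ((str(index), value) for index, value in enumerate(resource))
    else:
        # Scalar leaf: the accumulated path becomes the column name.
        return {prefix: resource}

    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(value, path))
    return flat


# Simplified, hypothetical Patient resource in FHIR's JSON shape.
patient = {
    "resourceType": "Patient",
    "id": "example",
    "gender": "female",
    "name": [{"family": "Doe", "given": ["Jane"]}],
}

# One flat row, ready to be placed in a table.
row = flatten(patient)
```

Here `row` contains keys such as `name.0.family` and `name.0.given.0`, which shows why repeated elements make the tabular form of FHIR data non-trivial.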

Authors: J. Palm, F. A. Meineke, J. Przybilla, T. Peschel

Date Published: 25th Jan 2023

Publication Type: Journal article

Abstract

The COVID-19 pandemic shed light on the need for quick diagnosis tools in healthcare, leading to the development of several algorithmic models for disease detection. Though these models are relatively easy to build, their training requires a lot of data, storage, and resources, which may not be available for use by medical institutions or could be beyond the skillset of the people who most need these tools. This paper describes a data analysis and machine learning platform that takes advantage of high-performance computing infrastructure for medical diagnosis support applications. This platform is validated by re-training a previously published deep learning model (COVID-Net) on new data, where it is shown that the performance of the model is improved through large-scale hyperparameter optimisation that uncovered optimal training parameter combinations. The per-class accuracy of the model, especially for COVID-19 and pneumonia, is higher when using the tuned hyperparameters (healthy: 96.5%; pneumonia: 61.5%; COVID-19: 78.9%) as opposed to parameters chosen through traditional methods (healthy: 93.6%; pneumonia: 46.1%; COVID-19: 76.3%). Furthermore, training speed-up analysis shows a major decrease in training time as resources increase, from 207 min using 1 node to 54 min when distributed over 32 nodes, but highlights the presence of a cut-off point where the communication overhead begins to affect performance. The developed platform is intended to provide the medical field with a technical environment for developing novel portable artificial-intelligence-based tools for diagnosis support.
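
The reported training times can be turned into the usual speed-up and parallel-efficiency figures with a few lines of arithmetic. This is a sketch using the standard definitions (speed-up = single-node time / multi-node time, efficiency = speed-up / number of nodes); the function names are assumptions, and only the two timings quoted in the abstract are used.

```python
def speedup(baseline_minutes, parallel_minutes):
    """How many times faster the parallel run is than the baseline."""
    return baseline_minutes / parallel_minutes


def parallel_efficiency(baseline_minutes, parallel_minutes, nodes):
    """Speed-up divided by the number of nodes (1.0 would be ideal scaling)."""
    return speedup(baseline_minutes, parallel_minutes) / nodes


# Timings from the abstract: 207 min on 1 node, 54 min on 32 nodes.
s = speedup(207, 54)                      # about 3.83
e = parallel_efficiency(207, 54, 32)      # about 0.12
```

A speed-up of roughly 3.8 on 32 nodes is far from linear scaling, which is consistent with the communication overhead the abstract mentions.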

Authors: Chadi Barakat, Marcel Aach, Andreas Schuppert, Sigurður Brynjólfsson, Sebastian Fritsch, Morris Riedel

Date Published: 2023

Publication Type: Journal article

Abstract

BACKGROUND: The Federal Ministry of Education and Research of Germany (BMBF) funds a network of university medicines (NUM) to support COVID-19 and pandemic research at national level. The “COVID-19 Data Exchange Platform” (CODEX) as part of NUM establishes a harmonised infrastructure that supports research use of COVID-19 datasets. The broad consent (BC) of the Medical Informatics Initiative (MII) has been agreed by all German federal states and forms the legal basis for data processing. All 34 participating university hospitals (NUM sites) work on a harmonised infrastructural as well as legal basis for the data protection-compliant collection and transfer of their research datasets to the central CODEX platform. Each NUM site ensures that the exchanged consent information conforms to the already-balloted HL7 FHIR consent profiles and the interoperability concept of the MII Task Force “Consent Implementation” (TFCI). The Independent Trusted Third-Party (TTP) of the University Medicine Greifswald supports data protection-compliant data processing and provides the consent management solution gICS. METHODS: Based on a stakeholder dialogue, a required set of FHIR functionalities was identified and technically specified with support from official FHIR experts. Next, a “TTP-FHIR Gateway” for the HL7 FHIR-compliant exchange of consent information using gICS was implemented. A last step included external integration tests and the development of a pre-configured consent template for the BC for the NUM sites. RESULTS: A FHIR-compliant gICS release and a corresponding consent template for the BC were provided to all NUM sites in June 2021. All FHIR functionalities comply with the already-balloted FHIR consent profiles of the HL7 Working Group Consent Management. The consent template simplifies the technical BC rollout and the corresponding implementation of the TFCI interoperability concept at the NUM sites. 
CONCLUSIONS: This article shows that an HL7 FHIR-compliant and interoperable nationwide exchange of consent information could be built using the consent management software gICS and the provided TTP-FHIR Gateway. The initial functional scope of the solution covers the requirements identified in the NUM-CODEX setting. The semantic correctness of these functionalities was validated by project partners from the Ludwig-Maximilian University in Munich. The production rollout of the solution package to all NUM sites has started successfully.

Authors: Martin Bialke, Lars Geidel, Christopher Hampf, Arne Blumentritt, Peter Penndorf, Ronny Schuldt, Frank-Michael Moser, Stefan Lang, Patrick Werner, Sebastian Stäubert, Hauke Hund, Fady Albashiti, Jürgen Gührer, Hans-Ulrich Prokosch, Thomas Bahls, Wolfgang Hoffmann

Date Published: 1st Dec 2022

Publication Type: Journal article

Abstract

Anti-CD19 CAR-T cell immunotherapy is a promising treatment option for patients with B cell lymphomas; however, it is associated with adverse effects, some of them severe, such as neurotoxicity. Single-cell resolved molecular data sets in combination with clinical parametrization allow for comprehensive characterization of cellular subpopulations, their transcriptomic states, and their relation to the adverse effects. We here present a re-analysis of single-cell RNA sequencing data of 24 patients comprising more than 130,000 cells with focus on cellular states and their association with immune cell related neurotoxicity. For this, we developed a single-cell data portraying workflow to disentangle the transcriptional state space with single-cell resolution and to analyse it in terms of modularly composed cellular programs. We demonstrated the capabilities of single-cell data portraying to disentangle transcriptional states using intuitive visualization, functional mining, molecular cell stratification, and variability analyses. Our analysis revealed that the T cell composition of the patients' infusion products and the spectrum of transcriptional states of cells derived from patients with low ICANS grade do not markedly differ from those of cells from patients with high ICANS grade, while the relative abundances, particularly those of cycling cells, of LAG3-mediated exhaustion and of CAR-positive cells, vary. Our study provides molecular details of the transcriptomic landscape that may help to overcome neurotoxicity.

Authors: H. Loeffler-Wirth, M. Rade, A. Arakelyan, M. Kreuz, M. Loeffler, U. Koehl, K. Reiche, H. Binder

Date Published: 17th Oct 2022

Publication Type: Journal article

Abstract

Machine learning (ML) models are developed on a learning dataset covering only a small part of the data of interest. If model predictions are accurate for the learning dataset but fail for unseen data, then the generalization error is considered high. This problem manifests itself within all major sub-fields of ML but is especially relevant in medical applications. Clinical data structures, patient cohorts, and clinical protocols may be highly biased among hospitals such that sampling of representative learning datasets to learn ML models remains a challenge. As ML models exhibit poor predictive performance over data ranges sparsely or not covered by the learning dataset, in this study, we propose a novel method to assess their generalization capability among different hospitals based on the convex hull (CH) overlap between multivariate datasets. To reduce dimensionality effects, we used a two-step approach. First, CH analysis was applied to find mean CH coverage between each of the two datasets, resulting in an upper bound of the prediction range. Second, 4 types of ML models were trained to classify the origin of a dataset (i.e., from which hospital) and to estimate differences in datasets with respect to underlying distributions. To demonstrate the applicability of our method, we used 4 critical-care patient datasets from different hospitals in Germany and the USA. We estimated the similarity of these populations and investigated whether ML models developed on one dataset can be reliably applied to another one. We show that the strongest drop in performance was associated with the poor intersection of convex hulls in the corresponding hospitals’ datasets and with a high performance of ML methods for dataset discrimination. Hence, we suggest the application of our pipeline as a first tool to assess the transferability of trained models. 
We emphasize that datasets from different hospitals represent heterogeneous data sources, and the transfer from one database to another should be performed with utmost care to avoid implications during real-world applications of the developed models. Further research is needed to develop methods for the adaptation of ML models to new hospitals. In addition, more work should be aimed at the creation of gold-standard datasets that are large and diverse with data from varied application sites.
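
The notion of convex hull coverage between two datasets can be illustrated in two dimensions. The sketch below is an assumption-laden simplification and not the authors' pipeline: it restricts the data to 2D points, uses Andrew's monotone chain algorithm for the hull, and defines coverage as the fraction of points from one site that fall inside the hull of another site; all names and the toy coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, p):
    """True if p lies inside or on the boundary of a CCW convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def coverage(reference, other):
    """Fraction of points in `other` that lie in the hull of `reference`."""
    hull = convex_hull(reference)
    if len(hull) < 3:
        raise ValueError("reference points are degenerate (no 2D hull)")
    return sum(inside(hull, p) for p in other) / len(other)


# Toy 2D feature values for two hypothetical hospital sites.
site_a = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
site_b = [(1, 1), (3, 3), (5, 5), (2, 6)]

cov = coverage(site_a, site_b)  # half of site_b lies inside site_a's hull
```

A low coverage value would suggest that a model trained on the reference site has to extrapolate for many patients of the other site, which is the situation the abstract associates with the strongest drop in performance.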

Authors: Konstantin Sharafutdinov, Jayesh S Bhat, Sebastian Johannes Fritsch, Kateryna Nikulina, Moein E Samadi, Richard Polzin, Hannah Mayer, Gernot Marx, Johannes Bickenbach, Andreas Schuppert

Date Published: 1st Oct 2022

Publication Type: Journal article
