The Data Repositories Powering Medical Miracles at MDC
How advanced data infrastructure is accelerating biomedical discovery
In the bustling labs of the Max Delbrück Center for Molecular Medicine (MDC), a quiet revolution is underway, one that reaches beyond microscopes and petri dishes to harness the power of bytes and algorithms. As one of Germany's premier biomedical research institutions, the MDC employs approximately 1,800 staff members who investigate the molecular origins of diseases, seeking new ways to diagnose, treat, and prevent human illness [8].
The data repositories behind this work are far more than digital storage spaces: they are dynamic resources that capture complex biological information, from the intricate dance of proteins within our cells to the genetic variations that predispose us to disease.
At a time when medical research is increasingly driven by massive datasets and artificial intelligence, these carefully curated collections have become indispensable tools in the quest to understand and ultimately conquer human disease. They represent the digital infrastructure that enables today's researchers to build upon yesterday's discoveries, accelerating the pace of biomedical innovation in ways previously unimaginable.
Repositories do more than store information: they are characterized by their comprehensive nature, standardized formats, and emphasis on reproducibility.
At MDC, repositories span multiple disciplines focused on cardiovascular and metabolic diseases, cancer, diseases of the nervous system, and medical systems biology [8].
[Infographic: key figures for protein domains, simulation time, risk increase identified, and patients in clinical cohort]
Among the most impressive repositories associated with MDC researchers is mdCATH, a groundbreaking dataset that addresses a critical gap in our understanding of proteins, the dynamic building blocks of life [1].
While we've made tremendous strides in determining static protein structures, a comprehensive understanding of how these structures move and change over time has remained elusive.
The mdCATH repository enables researchers to study protein folding thermodynamics, unfolding kinetics, and conformational changes at a proteome-wide scale [1]. This has profound implications for understanding how proteins function, and malfunction, in diseases ranging from Alzheimer's to cancer.
| Aspect | Scale/Details |
|---|---|
| Protein domains simulated | 5,398 |
| Temperatures | 320 K, 348 K, 379 K, 413 K, 450 K |
| Replicates per condition | 5 |
| Total simulation time | >62 milliseconds |
| Data recording interval | Every 1 nanosecond |
| Unique feature | Includes instantaneous forces in addition to coordinates |
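To get a feel for the scale in the table, the figures can be combined with some simple arithmetic. The sketch below assumes that every domain was simulated at every temperature with the full number of replicates, and it treats the reported total (">62 milliseconds") as a lower bound, so the derived values are lower bounds too. The variable names are illustrative and not part of the dataset.

```python
# Figures taken from the mdCATH table above.
N_DOMAINS = 5_398
TEMPERATURES_K = (320, 348, 379, 413, 450)
REPLICATES = 5
TOTAL_TIME_MS = 62        # reported as ">62 ms", so treat as a lower bound
FRAME_INTERVAL_NS = 1     # one recorded frame per nanosecond

NS_PER_MS = 1_000_000

# Assumption: every domain x temperature x replicate combination was run.
n_trajectories = N_DOMAINS * len(TEMPERATURES_K) * REPLICATES

total_time_ns = TOTAL_TIME_MS * NS_PER_MS
mean_length_ns = total_time_ns / n_trajectories
n_frames = total_time_ns // FRAME_INTERVAL_NS

print(f"trajectories: {n_trajectories:,}")
print(f"mean trajectory length: at least {mean_length_ns:.0f} ns")
print(f"recorded frames: at least {n_frames:,}")
```

Under these assumptions the dataset holds 134,950 trajectories averaging roughly 460 nanoseconds each, which is the kind of volume that makes proteome-wide analysis of folding and unfolding feasible.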
The dataset is available under a Creative Commons CC BY 4.0 license and can be accessed through multiple platforms, including HuggingFace and PlayMolecule [1].
Understanding protein dynamics has implications for diseases ranging from Alzheimer's to cancer, enabling new therapeutic approaches.
While some repositories store computational data like mdCATH, others emerge from experimental breakthroughs. A compelling example comes from MDC researchers' work on CRISPR-based genotyping, a technology that could revolutionize how we detect genetic risk factors for disease [6].
The research team developed a multiplexed CRISPR assay to identify genetic variants in the APOL1 gene prevalent among individuals of African ancestry. These variants are associated with an 8–30-fold increased risk of developing kidney disease.
1. The researchers focused on two APOL1 risk variants, G1 and G2 [6].
2. They employed three orthogonal Cas enzymes, enabling simultaneous detection of the variants [6].
3. They systematically designed and tested CRISPR RNAs for robust allele discrimination [6].
4. They incorporated machine-learning analysis to interpret the fluorescence-based readouts [6].
5. They adapted the assay for a lateral-flow readout, enabling visual genotype determination [6].
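The interpretation step in the workflow above can be illustrated with a deliberately simple rule. This is not the machine-learning analysis the study describes; it is plain thresholding, swapped in for illustration. The sketch assumes one normalized signal per allele, including the reference allele (conventionally called G0, a name not used in the text above), and both the `call_genotype` function and the 0.5 threshold are hypothetical.

```python
def call_genotype(signals, threshold=0.5):
    """Call a diploid genotype from per-allele signals.

    signals: dict mapping an allele name ("G0", "G1", "G2") to a
             normalized signal between 0 and 1 (hypothetical input format).
    Returns a genotype string such as "G0/G1", or None when no call is possible.
    """
    detected = sorted(a for a, s in signals.items() if s >= threshold)
    if len(detected) == 1:
        # Only one allele detected: both copies carry it (homozygous).
        return f"{detected[0]}/{detected[0]}"
    if len(detected) == 2:
        # Two different alleles detected: heterozygous.
        return "/".join(detected)
    # Zero or three alleles above threshold is inconsistent with a
    # diploid sample, so refuse to call rather than guess.
    return None


print(call_genotype({"G0": 0.9, "G1": 0.1, "G2": 0.05}))  # G0/G0
print(call_genotype({"G0": 0.1, "G1": 0.8, "G2": 0.7}))   # G1/G2
```

Returning `None` for ambiguous inputs mirrors the practical need to flag a sample for retesting instead of reporting an uncertain genotype.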
| Research Reagent | Function in Experiment |
|---|---|
| LwaCas13a | RNA-targeting CRISPR enzyme for allele discrimination |
| PsmCas13b | Additional RNA-targeting enzyme for multiplexing |
| LbaCas12a | DNA-targeting CRISPR enzyme expanding detection capability |
| Custom crRNAs | Guide molecules programmed to recognize specific APOL1 variants |
| RNA/DNA standards | Synthetic genetic material for assay validation |
| Lateral flow strips | Point-of-care compatible visual readout platform |
| Metric | Performance/Outcome |
|---|---|
| Genotypes detected | All 6 APOL1 genotypes |
| Clinical validation | >100 patients across multiple centers |
| Readout options | Fluorescence or lateral flow |
| Key advantage | Rapid results enabling point-of-care testing |
| Potential impact | Accessible genotyping for resource-limited settings |
The CRISPR-based genotyping assay successfully identified all six possible APOL1 genotypes with accuracy comparable to gold-standard sequencing methods [6]. The platform demonstrated particular promise for kidney transplant stratification, where knowing the donor's APOL1 genotype can significantly impact postoperative care and outcomes.
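The figure of six genotypes follows from simple counting: with the reference allele (conventionally called G0, a name not used in the text above) and the two risk variants G1 and G2, each person carries two copies drawn from three alleles. A short sketch of that enumeration, with illustrative function names:

```python
from itertools import combinations_with_replacement

ALLELES = ("G0", "G1", "G2")   # G0 = reference allele; G1, G2 = risk variants
RISK_ALLELES = {"G1", "G2"}


def all_genotypes(alleles=ALLELES):
    """List every unordered pair of alleles, e.g. 'G0/G1'."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]


def risk_allele_count(genotype):
    """Count how many of the two copies are risk variants (0, 1, or 2)."""
    return sum(allele in RISK_ALLELES for allele in genotype.split("/"))


genotypes = all_genotypes()
for g in genotypes:
    print(g, "risk alleles:", risk_allele_count(g))
print("total genotypes:", len(genotypes))
```

The enumeration yields G0/G0, G0/G1, G0/G2, G1/G1, G1/G2, and G2/G2, which matches the six genotypes the assay is reported to distinguish.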
Behind every repository and every breakthrough lies a collection of essential research tools. At MDC, the experimental toolkit spans multiple disciplines and technologies:
- Molecular tools include cloning vectors for DNA manipulation, cRNA probes for gene expression analysis, and knockout mouse models for studying gene function in complex organisms [2].
- Imaging studies leverage the center's state-of-the-art platforms, including 7 Tesla ultra-high-field MRI and electron microscopy, enabling visualization of biological structures from the whole organ down to the molecular level [8].
- Computational infrastructure forms the backbone of modern biomedical research, with bioinformatics platforms capable of processing the massive datasets generated by genomics, proteomics, and imaging studies [8].
This diverse toolkit reflects the multidisciplinary nature of contemporary molecular medicine, where biological insight emerges from the integration of diverse technologies and data types.
The repositories and research emerging from the Max Delbrück Center represent more than isolated scientific achievements: they embody a fundamental shift in how we approach biological understanding and medical progress. As Prof. Matthias Tschöp, CEO of Helmholtz Munich, noted: "Cutting-edge AI tools integrated across the entire organization accelerate the delivery of societal benefits" [3]. These words capture the transformative potential of marrying robust data repositories with advanced analytical tools.
Looking ahead, the future of medical research will increasingly depend on such comprehensive digital resources that capture the complexity of biological systems. As MDC's work demonstrates, the path to understanding—and ultimately treating—human disease requires not just brilliant individual experiments but the collective wisdom preserved in carefully constructed repositories.
For patients awaiting new therapies, for doctors seeking better diagnostic tools, and for scientists pursuing the next breakthrough, these repositories represent more than just data—they represent hope, encoded in the most universal language of all: the language of science.