Scientists Uncover 1,700+ Protein-like Molecules in the ‘Dark Proteome’

ISB researchers help reveal a previously hidden layer of the human proteome, identifying a new class of protein-like molecules known as “peptideins” with potential implications for cancer, immunotherapy, and human disease.

Finding the Signal in the Noise: An artistic interpretation of the "dark proteome." Illuminating a new class of protein-like molecules — peptideins — hidden within billions of genomic and proteomic data points. (Illustration by Delaney Nye / ISB)

Scientists at the Institute for Systems Biology (ISB) have helped uncover a vast, previously hidden layer of human biology, positively identifying more than 1,700 new protein-like molecules among more than 7,000 candidate sequences in the human genome, a discovery that could reshape our understanding of health and disease.

Published today in Nature, the international study reveals that large portions of the human genome once thought to be biologically inactive are, in fact, producing small, previously undetected molecules. These findings expand the known landscape of the human proteome and introduce a new class of biological entities researchers are calling “peptideins.”

The work was conducted by a global consortium of scientists led by the Princess Máxima Center for Pediatric Oncology in the Netherlands, the University of Michigan Medical School, the EMBL European Bioinformatics Institute in the UK, and ISB.

“These findings suggest that we have been missing a substantial portion of the molecular machinery that operates within human cells,” said ISB Professor Dr. Robert Moritz, a co-senior author of the study. “By bringing together large-scale data and advanced computational tools developed at ISB, we’ve been able to confirm the existence of thousands of these previously hidden molecules at a level of confidence that wasn’t possible before.”

Revealing New Protein-like Molecules in the ‘Dark Proteome’

For decades, scientists have focused on a relatively fixed set of approximately 19,500 proteins encoded by the human genome. But growing evidence suggests that many additional molecules — particularly small ones — have gone undetected.

In this study, researchers analyzed more than 7,000 understudied regions of DNA known as non-canonical open reading frames (ncORFs). So far, they have found that roughly 25 percent of these regions produce detectable protein-like molecules, many of them fewer than 50 amino acids long.

Because the function of most of these molecules remains unclear, the research team introduced the concept of “peptideins” to describe them — acknowledging their protein-like nature while leaving open questions about their biological roles.

ISB’s Role: Confirming Discovery at Scale

ISB played a central role in validating these discoveries, analyzing nearly 100,000 mass spectrometry experiments comprising more than 3.7 billion spectra drawn from publicly available datasets.

Using the ISB-developed Trans-Proteomic Pipeline and PeptideAtlas resources, researchers confirmed the existence of these peptideins with high confidence.

“Our developments in large-scale data processing at ISB provide us the capability to analyze the world’s publicly available proteomics data at scale, which was essential to this discovery,” said ISB Principal Scientist Dr. Eric Deutsch, co-first author of the study. “This work highlights the importance of open science infrastructure and data standards leadership, and demonstrates how large-scale data integration can reveal entirely new layers of biology.”

Implications for Disease and Therapy

Many of the newly identified peptideins are detected within cells and presented on cell surfaces as antigens to the immune system, making them potential candidates for targeted cancer therapeutics. Others may play roles in gene regulation, homeostasis, cell division, and disease processes that have so far remained unexplained.

In early experiments, researchers identified several peptideins that appear essential for cancer cell survival, suggesting they could serve as future drug targets.

“This is not just an expansion of the protein catalog — it’s a shift in how we think about gene function,” Moritz said. “We are opening the door to an entirely new area of biology that could have profound implications for understanding and treating human disease.”

The research was conducted in conjunction with the TransCODE Consortium, an international collaboration of more than 60 scientists across 30 institutions worldwide that focuses on identifying, annotating, and cataloging human non-canonical open reading frames (ncORFs).