Digested Protein DB - Enterprise

DigestedProteinDB Engine

Enterprise-grade, high-performance peptide indexing and search engine

Overview

The Enterprise Engine (Rust) is an ultra-optimized implementation of DigestedProteinDB designed for large-scale proteomics and metaproteomics.

The Java version serves as the open-source research reference, while this engine delivers sub-millisecond mass-range queries and seamless embedding into high-performance computing (HPC) pipelines.

Key capabilities

Ultra-fast mass search

Retrieve candidate peptides within narrow precursor mass windows in milliseconds, even for very large databases.
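
To illustrate the idea of a mass-range lookup, here is a minimal in-memory sketch in Python. It is not the engine's implementation: it assumes neutral monoisotopic masses of unmodified peptides (standard residue masses plus one water) and uses a plain sorted list with binary search. The function names `peptide_mass`, `build_index`, and `mass_range_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Standard monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565  # one H2O is added for the free N- and C-termini


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_index(peptides):
    """Return parallel lists (masses, peptides), sorted by mass."""
    pairs = sorted((peptide_mass(p), p) for p in set(peptides))
    return [m for m, _ in pairs], [p for _, p in pairs]


def mass_range_query(index, low, high):
    """Return all (mass, peptide) pairs with low <= mass <= high."""
    masses, peptides = index
    start = bisect_left(masses, low)    # first entry with mass >= low
    stop = bisect_right(masses, high)   # one past the last entry with mass <= high
    return list(zip(masses[start:stop], peptides[start:stop]))


index = build_index(["AGASCPICKKEIQLVIK", "GGKGDLCIVLNVLLMQK", "PEPTIDEK"])
hits = mass_range_query(index, 1800.0000, 1800.0002)
```

Because the list is sorted, each query costs two binary searches plus the size of the result, which is why narrow precursor windows stay cheap even when the index is large.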

Custom database generation

Build tailored peptide indexes for specific organisms, taxonomies, enzymes, and digestion parameters.
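
As a rough illustration of what the digestion parameters control, the following Python sketch performs an in-silico tryptic digest. It assumes the common convention of cleaving after K or R except when the next residue is P; whether the engine applies the proline exception is not stated here, so treat that rule, the function names, and the default values as assumptions.

```python
def cleavage_sites(protein):
    """Cut positions: after K or R, unless the following residue is P (assumed rule)."""
    sites = [0]
    for i, aa in enumerate(protein[:-1]):
        if aa in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))
    return sites


def digest(protein, missed_cleavages=2, min_len=6, max_len=50):
    """Return the set of peptides within the length bounds.

    A peptide with n missed cleavages spans n + 1 consecutive fragments.
    """
    sites = cleavage_sites(protein)
    peptides = set()
    for start in range(len(sites) - 1):
        last = min(start + missed_cleavages + 2, len(sites))
        for end in range(start + 1, last):
            peptide = protein[sites[start]:sites[end]]
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides


# Toy sequence with three cleavage sites, so four fully cleaved fragments.
toy = "AAAAAAKGGGGGGRSSSSSSKTTTTTT"
fully_cleaved = digest(toy, missed_cleavages=0)
with_missed = digest(toy, missed_cleavages=2)
```

Allowing more missed cleavages or a wider length range grows the peptide set quickly, which is the main driver of index size.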

Out-of-core design

No need to load the full digest into memory. Works efficiently with very large UniProt-scale datasets.

CLI-first architecture

Scriptable and pipeline-friendly command-line interface suitable for HPC and automated workflows.

Embeddable engine

Designed to be integrated into external software, services, or internal proteomics pipelines.

Enterprise customization

Supports custom builds, organism-specific databases, and performance tuning for production environments.

Performance snapshot

These results correspond to the largest database build we have benchmarked so far, which best reflects peak-scale performance. Additional snapshots for other database sizes will be added as they become available.

Enterprise Engine Specifications (Rust)

High-performance

Database Version	UniProtKB/TrEMBL (Release 2026_01)
Taxonomy & Scope	All Organisms (proteome-wide) - full global protein index.
Scale	202.55 million proteins / 17.20 billion peptides
Digestion Parameters	Enzyme: Trypsin; Missed Cleavages: Up to 2 allowed; Peptide Length: 6 to 50 amino acids
Search Performance	~150 ms average mass-range query time.
Disk Footprint	~255 GB (Optimized with Snappy/Zstd compression & 5-bit encoding)
Memory Usage	~316 MB peak RAM (Out-of-core indexing)
Storage Engine	RocksDB key-value store (optimized Rust core)
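
The 5-bit encoding mentioned under Disk Footprint works because the 20 standard amino acids fit into 5 bits (32 possible codes), so a residue needs 5 bits instead of the 8 used by one ASCII character. The engine's actual bit layout is not documented here; the Python sketch below shows one possible scheme, with code 0 reserved as padding. The alphabet order and the names `pack` and `unpack` are assumptions.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"                      # 20 standard residues
ENCODE = {aa: i + 1 for i, aa in enumerate(ALPHABET)}  # codes 1..20, 0 = padding
DECODE = {code: aa for aa, code in ENCODE.items()}


def pack(peptide):
    """Pack a peptide into bytes using 5 bits per residue, zero-padded to a byte boundary."""
    bits = 0
    for aa in peptide:
        bits = (bits << 5) | ENCODE[aa]
    nbits = 5 * len(peptide)
    pad = (-nbits) % 8               # zero bits needed to fill the last byte
    return (bits << pad).to_bytes((nbits + pad) // 8, "big")


def unpack(data):
    """Decode bytes produced by pack(); stops at the first padding code."""
    bits = int.from_bytes(data, "big")
    residues = []
    for shift in range(8 * len(data) - 5, -1, -5):
        code = (bits >> shift) & 0b11111
        if code == 0:                # reached the zero padding
            break
        residues.append(DECODE[code])
    return "".join(residues)


packed = pack("AGASCPICKKEIQLVIK")   # 17 residues: 85 bits, stored in 11 bytes
```

In this scheme a 17-residue peptide takes 11 bytes instead of 17, before any block compression is applied on top.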

Designed for ultra-fast candidate retrieval in large-scale metaproteomics.


user@proteomics-hpc:~$ massq 1800.0000 1800.0002 --db-path ./uniprot_trembl_db

#       Mass      Peptide             Accession   TaxID
1       1800.0001
                AGASCPICKKEIQLVIK   Q2HJ21      9913
                AGASCPICKKEIQLVIK   O15151      9606
                AVARMSVLSELCLPLAK   Q1WRS8      362948
                GGKGDLCIVLNVLLMQK   Q83NI2      218496
                GGKGDLCIVLNVLLMQK   Q83MZ4      203267
                GLMPLGITDEIRKMVK    A2RMH5      416870
                GLMPLGITDEIRKMVK    Q02XB8      272622
                ILMGASVGIPASSLCIIR  Q92275      5334
                KGQIVMTSDKPPKMLK    A0RLX8      360106
                KTMPLILSGVDVVAMAR   O49289      3702
                MIPMIVLATTNQNKVK    Q6AQD7      177439
                ... (1118 additional entries) ...

2       1800.0002
                VEEIYEDDEMNT        A0A1Y2ULM3  9606
                CHGWGGCHHIR         A0A8R8NSG9  10090
                ... (2394 additional entries) ...
... 
[SUCCESS] Found 239 peptides matching criteria.
Execution time: 469.129µs
user@proteomics-hpc:~$ _
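
For pipeline use, output like the session above can be turned into structured records. The Python sketch below assumes the text layout is exactly as displayed (a numbered line carrying the mass, followed by indented peptide, accession, and TaxID columns); the final output format of the released binary may differ. The helper names `massq_command` and `parse_massq_output` are hypothetical, and only the arguments shown in the session are used.

```python
def massq_command(low, high, db_path):
    """Build the argument list for the invocation shown in the session."""
    return ["massq", f"{low:.4f}", f"{high:.4f}", "--db-path", db_path]


def parse_massq_output(text):
    """Parse the tabular listing into a list of dicts (assumed layout)."""
    records = []
    mass = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0].isdigit():
            # Group header: running number followed by the mass.
            mass = float(fields[1])
        elif (len(fields) == 3 and mass is not None
              and line[:1].isspace() and fields[2].isdigit()):
            # Indented row: peptide, accession, TaxID.
            peptide, accession, taxid = fields
            records.append({"mass": mass, "peptide": peptide,
                            "accession": accession, "taxid": int(taxid)})
        # Anything else (column header, elisions, status lines) is skipped.
    return records


SAMPLE = """\
#       Mass      Peptide             Accession   TaxID
1       1800.0001
                AGASCPICKKEIQLVIK   Q2HJ21      9913
                AGASCPICKKEIQLVIK   O15151      9606
                ... (1118 additional entries) ...

2       1800.0002
                VEEIYEDDEMNT        A0A1Y2ULM3  9606
[SUCCESS] Found 239 peptides matching criteria.
Execution time: 469.129µs
"""

records = parse_massq_output(SAMPLE)
command = massq_command(1800.0, 1800.0002, "./uniprot_trembl_db")
```

In a real workflow the command list could be passed to `subprocess.run(command, capture_output=True, text=True)` and its standard output handed to the parser.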

Architecture and workflow

The diagram is split into two parts: database construction (left) and mass-based search (right). The first shows the build pipeline from UniProtKB input through digestion and encoding into a RocksDB index. The second shows how experimental mass queries are matched against the indexed data to return candidate peptides.

High-level build and query flow for the enterprise engine.
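
One detail of such a flow is how masses become keys. In an ordered key-value store that compares keys byte by byte (RocksDB's default comparator does this), a common technique is to store the mass as a fixed-point, big-endian unsigned integer so that byte order matches numeric order and a mass window becomes a contiguous key range. The Python sketch below illustrates this with an in-memory stand-in for the store; the scale of four decimal places, the 8-byte key width, and all names are assumptions, not the engine's documented key layout.

```python
import struct
from bisect import bisect_left, bisect_right, insort

SCALE = 10_000  # assumed fixed-point resolution: four decimal places


def mass_key(mass):
    """Encode a mass as an 8-byte big-endian key; byte order equals numeric order."""
    return struct.pack(">Q", round(mass * SCALE))


def key_mass(key):
    """Decode a key produced by mass_key() back to a mass."""
    return struct.unpack(">Q", key)[0] / SCALE


class SortedStore:
    """In-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []      # keys kept in bytewise sorted order
        self._values = {}    # key -> list of peptides sharing that mass

    def put(self, key, value):
        if key not in self._values:
            insort(self._keys, key)
            self._values[key] = []
        self._values[key].append(value)

    def scan(self, low_key, high_key):
        """Yield (key, values) for all keys in [low_key, high_key]."""
        start = bisect_left(self._keys, low_key)
        stop = bisect_right(self._keys, high_key)
        for key in self._keys[start:stop]:
            yield key, self._values[key]


store = SortedStore()
for mass, peptide in [(25000.1234, "X3"), (1800.0001, "X1"),
                      (999.5, "X0"), (1800.0002, "X2")]:
    store.put(mass_key(mass), peptide)   # "X0".."X3" are placeholder values

window = [(key_mass(k), v) for k, v in
          store.scan(mass_key(1800.0000), mass_key(1800.0002))]
```

A little-endian or text encoding of the mass would not sort correctly under bytewise comparison, which is why the big-endian fixed-point form is used in this sketch.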

Downloads

Downloadable builds and pre-built databases will be published as releases become available.

CLI Binary (Rust)

Optimized Rust binary for Linux, macOS, and Windows, with sub-millisecond query resolution and an ultra-low memory footprint.

Pre-release testing in progress.

Pre-built Databases

Download the complete UniProtKB/TrEMBL digest (Release 2026_01; Trypsin, up to 2 missed cleavages): approximately 255 GB of optimized RocksDB data.

Direct high-speed mirror links will be provided.

Community / Research

Java open-source implementation
Academic and research use
Basic digestion and search
Evaluation and prototyping

Enterprise Engine (Rust)

High-performance optimized core
Low-overhead CLI execution
Designed for embedding
Custom database builds
Production- and HPC-ready

Typical use cases

large-scale proteomics database preparation
metaproteomics search space reduction
DIA/DDA candidate filtering by precursor mass
core facility peptide indexing infrastructure
backend engine for custom proteomics software

Enterprise access and collaboration

For custom database builds, pipeline integration, embedding support, or commercial licensing, please get in touch.

Contact

Department: Bioinformatics Laboratory, Faculty of Food Technology and Biotechnology, University of Zagreb, Croatia

GitHub: Digested Protein DB

Email: jdiminic@pbf.hr