Enterprise-grade, high-performance peptide indexing and search engine
The Enterprise Engine (Rust) is an ultra-optimized implementation of DigestedProteinDB designed for large-scale proteomics and metaproteomics.
The Java version serves as the open-source research reference, while this engine delivers sub-millisecond mass-range queries and seamless embedding into high-performance computing (HPC) pipelines.
Retrieve candidate peptides within narrow precursor mass windows in milliseconds, even for very large databases.
Build tailored peptide indexes for specific organisms, taxonomies, enzymes, and digestion parameters.
No need to load the full digest into memory. Works efficiently with very large UniProt-scale datasets.
Scriptable and pipeline-friendly command-line interface suitable for HPC and automated workflows.
Designed to be integrated into external software, services, or internal proteomics pipelines.
Supports custom builds, organism-specific databases, and performance tuning for production environments.
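To make the mass-range retrieval described above concrete, here is a minimal sketch of a range lookup over an ordered key space. It is not the engine's schema: the fixed-point key scale (`SCALE`), the `MassIndex` type, and the use of an in-memory `BTreeMap` as a stand-in for a sorted on-disk key-value store are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical fixed-point scale: four decimal places of a dalton.
const SCALE: f64 = 10_000.0;

/// Convert a mass in daltons to an integer key so that keys sort numerically.
fn mass_key(mass: f64) -> u64 {
    (mass * SCALE).round() as u64
}

/// Toy in-memory index (assumed layout): mass key -> peptides with that mass.
struct MassIndex {
    by_mass: BTreeMap<u64, Vec<String>>,
}

impl MassIndex {
    fn new() -> Self {
        MassIndex { by_mass: BTreeMap::new() }
    }

    fn insert(&mut self, mass: f64, peptide: &str) {
        self.by_mass
            .entry(mass_key(mass))
            .or_default()
            .push(peptide.to_string());
    }

    /// Return every (mass, peptide) pair whose mass lies in [lo, hi], inclusive.
    fn query(&self, lo: f64, hi: f64) -> Vec<(f64, &str)> {
        if lo > hi {
            return Vec::new(); // BTreeMap::range panics on an inverted range
        }
        self.by_mass
            .range(mass_key(lo)..=mass_key(hi))
            .flat_map(|(k, peps)| {
                peps.iter().map(move |p| (*k as f64 / SCALE, p.as_str()))
            })
            .collect()
    }
}
```

Because the keys are ordered, a query only walks the entries inside the requested window instead of scanning the whole index, which is the property that keeps narrow precursor windows cheap even when the total number of peptides is very large.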
These results correspond to the largest database build we have benchmarked so far, which best reflects peak-scale performance. Additional snapshots for other database sizes will be added as they become available.
| Database Version | UniProtKB/TrEMBL (Release 2026_01) |
| Taxonomy & Scope | All Organisms (proteome-wide) - full global protein index. |
| Scale | 202.55 million proteins / 17.20 billion peptides |
| Digestion Parameters | Enzyme: Trypsin; Missed Cleavages: up to 2 allowed; Peptide Length: 6 to 50 amino acids |
| Search Performance | ~150 ms average mass-range query time. |
| Disk Footprint | ~255 GB (Optimized with Snappy/Zstd compression & 5-bit encoding) |
| Memory Usage | ~316 MB peak RAM (Out-of-core indexing) |
| Storage Engine | RocksDB key-value store (optimized Rust core) |
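The 5-bit encoding mentioned in the table works because the 20 standard amino acids fit into 5 bits (2^5 = 32 codes) instead of one byte per residue. The sketch below illustrates the idea only; the alphabet order, the most-significant-bit-first layout, and the `pack`/`unpack` names are assumptions, not the engine's actual on-disk format.

```rust
/// Hypothetical alphabet: the 20 standard residues; the index is the 5-bit code.
const ALPHABET: &[u8; 20] = b"ACDEFGHIKLMNPQRSTVWY";

/// Pack a peptide at 5 bits per residue, most significant bits first.
/// Returns None if the sequence contains a residue outside ALPHABET.
fn pack(peptide: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity((peptide.len() * 5 + 7) / 8);
    let mut acc: u32 = 0; // bit accumulator
    let mut bits = 0; // number of valid bits currently held in acc
    for b in peptide.bytes() {
        let code = ALPHABET.iter().position(|&a| a == b)? as u32;
        acc = (acc << 5) | code;
        bits += 5;
        while bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the bits not yet written
        }
    }
    if bits > 0 {
        out.push((acc << (8 - bits)) as u8); // left-align the final partial byte
    }
    Some(out)
}

/// Unpack `len` residues from a buffer produced by `pack`.
/// The length must be stored separately, since padding bits are ambiguous.
/// Panics on a short buffer or on a code outside the 20-letter alphabet.
fn unpack(bytes: &[u8], len: usize) -> String {
    let mut s = String::with_capacity(len);
    let mut acc: u32 = 0;
    let mut bits = 0;
    let mut it = bytes.iter();
    for _ in 0..len {
        while bits < 5 {
            acc = (acc << 8) | *it.next().expect("buffer too short") as u32;
            bits += 8;
        }
        bits -= 5;
        s.push(ALPHABET[((acc >> bits) & 0x1f) as usize] as char);
        acc &= (1 << bits) - 1;
    }
    s
}
```

Under this layout a 17-residue peptide needs 85 bits, so it occupies 11 bytes rather than 17. General-purpose compression such as Snappy or Zstd would then be applied on top of the packed data.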
user@proteomics-hpc:~$ massq 1800.0000 1800.0002 --db-path ./uniprot_trembl_db
#  Mass       Peptide             Accession   TaxID
1  1800.0001  AGASCPICKKEIQLVIK   Q2HJ21      9913
              AGASCPICKKEIQLVIK   O15151      9606
              AVARMSVLSELCLPLAK   Q1WRS8      362948
              GGKGDLCIVLNVLLMQK   Q83NI2      218496
              GGKGDLCIVLNVLLMQK   Q83MZ4      203267
              GLMPLGITDEIRKMVK    A2RMH5      416870
              GLMPLGITDEIRKMVK    Q02XB8      272622
              ILMGASVGIPASSLCIIR  Q92275      5334
              KGQIVMTSDKPPKMLK    A0RLX8      360106
              KTMPLILSGVDVVAMAR   O49289      3702
              MIPMIVLATTNQNKVK    Q6AQD7      177439
              ... (1118 additional entries) ...
2  1800.0002  VEEIYEDDEMNT        A0A1Y2ULM3  9606
              CHGWGGCHHIR         A0A8R8NSG9  10090
              ... (2394 additional entries) ...
...
[SUCCESS] Found 239 peptides matching criteria.
Execution time: 469.129µs
user@proteomics-hpc:~$ _
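The sample query passes a lower and an upper mass bound. In practice those bounds are usually derived from a measured precursor and a tolerance in parts per million. The helpers below are a small sketch of that arithmetic; the function names are hypothetical, and whether the index is keyed on neutral monoisotopic mass (rather than m/z) is an assumption to check against your build.

```rust
/// Symmetric mass window around a precursor mass.
/// `ppm` is the tolerance in parts per million; returns (lower, upper).
fn ppm_window(mass: f64, ppm: f64) -> (f64, f64) {
    let delta = mass * ppm / 1_000_000.0;
    (mass - delta, mass + delta)
}

/// Neutral mass from an observed m/z and charge state (positive ion mode):
/// M = z * (m/z - m_proton).
fn neutral_mass(mz: f64, charge: u32) -> f64 {
    const PROTON: f64 = 1.007_276_467; // proton mass in daltons (approximate)
    (mz - PROTON) * charge as f64
}
```

For example, a 10 ppm tolerance at 1800 Da gives a window of 1799.982 to 1800.018 Da. The 0.0002 Da window in the sample query corresponds to roughly ±0.06 ppm around 1800.0001 Da.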
The diagram is split into two parts: database construction (left) and mass-based search (right). The first shows the build pipeline from UniProtKB input through digestion and encoding into a RocksDB index. The second shows how experimental mass queries are matched against the indexed data to return candidate peptides.
High-level build and query flow for the enterprise engine.
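The digestion step of the build pipeline can be sketched as follows, using the parameters from the benchmark table (trypsin, up to 2 missed cleavages, peptide length 6 to 50). This is an illustrative in-silico digest, not the engine's code: the function names are hypothetical, and the rule used here (cut after K or R unless the next residue is P) is the common convention, which the engine may or may not apply.

```rust
/// Fragment boundaries for trypsin: cut after K or R, unless the next residue
/// is P (assumed rule). Always includes 0 and the sequence length.
fn cleavage_sites(protein: &[u8]) -> Vec<usize> {
    let mut sites = vec![0];
    for i in 0..protein.len() {
        let is_kr = protein[i] == b'K' || protein[i] == b'R';
        let next_is_p = protein.get(i + 1) == Some(&b'P');
        // Skip a cut at the very end so the terminal boundary is not duplicated.
        if is_kr && !next_is_p && i + 1 < protein.len() {
            sites.push(i + 1);
        }
    }
    sites.push(protein.len());
    sites
}

/// Enumerate peptides with up to `max_missed` missed cleavages whose length
/// lies in [min_len, max_len]. Expects an ASCII amino-acid sequence.
fn digest(protein: &str, max_missed: usize, min_len: usize, max_len: usize) -> Vec<&str> {
    let sites = cleavage_sites(protein.as_bytes());
    let mut peptides = Vec::new();
    for start in 0..sites.len() - 1 {
        for missed in 0..=max_missed {
            let end = start + 1 + missed;
            if end >= sites.len() {
                break; // ran past the C-terminus
            }
            let pep = &protein[sites[start]..sites[end]];
            if (min_len..=max_len).contains(&pep.len()) {
                peptides.push(pep);
            }
        }
    }
    peptides
}
```

Calling `digest(sequence, 2, 6, 50)` mirrors the table's settings. Each resulting peptide would then have its mass computed and be written to the index together with its accession and taxonomy ID, as the diagram describes.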
Downloadable builds and pre-built databases will be published as releases become available.
Optimized Rust binary for Linux, macOS, and Windows. Features sub-millisecond query resolution and ultra-low memory footprint.
Download the complete UniProtKB/TrEMBL 2026 digest (Trypsin, MC=2). Approximately 255 GB of optimized RocksDB data.
For custom database builds, pipeline integration, embedding support, or commercial licensing, please get in touch.
Contact