BOLDistilled
Using an open-source algorithm, each BOLDistilled library captures a snapshot of the genetic diversity and associated taxonomic information on BOLD in a comprehensive but compact dataset. The libraries are versioned, publicly available, and archived so that analyses conducted on a particular sequence array can be reproduced.
Presently, a BOLDistilled library is only available for COI.

Optimizing DNA Barcode Libraries
The BOLDistilling process employs an algorithm that acts on each BIN to select the minimal number of sequences that effectively capture its genetic diversity, reducing over 17M records to approximately 1.2M based on a 0.75% divergence threshold. Our tests indicated that this threshold led to a nearly ten-fold reduction in the size of the library while offering enough resolution to accurately infer taxonomy in taxa with high intra-specific genetic variation. This value might change slightly with future study and will be reported in the metadata accompanying each BOLDistilled library.
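The idea of keeping only enough sequences per BIN to cover its diversity can be illustrated with a minimal sketch. This is not the published BOLDistilled algorithm: it assumes pre-aligned sequences of equal length, uses a simple uncorrected p-distance, and applies a greedy pass that keeps a sequence only when it differs by more than the threshold from every sequence already kept. The function names (`p_distance`, `distill_bin`, `distill_library`) are hypothetical, and a greedy pass is not guaranteed to find the true minimum set.

```python
from typing import Dict, List

# 0.75% divergence, the threshold quoted in the text.
THRESHOLD = 0.0075


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only positions where both sequences have an unambiguous base.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return 1.0  # no comparable sites: treat as maximally divergent
    return sum(x != y for x, y in pairs) / len(pairs)


def distill_bin(sequences: List[str], threshold: float = THRESHOLD) -> List[str]:
    """Greedily keep a sequence only if it is farther than `threshold`
    from every representative kept so far (illustrative heuristic)."""
    kept: List[str] = []
    for seq in sequences:
        if all(p_distance(seq, rep) > threshold for rep in kept):
            kept.append(seq)
    return kept


def distill_library(bins: Dict[str, List[str]],
                    threshold: float = THRESHOLD) -> Dict[str, List[str]]:
    """Apply the per-BIN selection to every BIN in a {bin_id: sequences} mapping."""
    return {bin_id: distill_bin(seqs, threshold) for bin_id, seqs in bins.items()}


# Toy example: a 400 bp reference, a variant with 1 difference (0.25%)
# and a variant with 4 differences (1.0%).
base = "ACGT" * 100
one_diff = "T" + base[1:]
four_diff = "CATG" + base[4:]
print(len(distill_bin([base, one_diff, four_diff])))  # prints 2
```

In the toy example the 0.25% variant is absorbed by the reference, while the 1.0% variant exceeds the 0.75% threshold and is retained as a second representative.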
Key Benefits
01
reduced computation
Incorporation of a BOLDistilled library into current bioinformatic workflows can reduce 24 hours of computation by 98% – to less than 30 minutes.
02
democratizes research
Analyses using these libraries can be run locally on low-end computers without an Internet connection, ideal for use in remote communities.
03
accurate taxonomy
Inconsistencies are resolved prior to analysis, reducing the risk of misidentifications while maintaining intraspecific genetic variation.
04
bold.export
Export sequence data to FASTA or CSV/TSV formats with customizable sequence naming, preserving BCDM format.
05
bold.analyze.diversity
Calculate species richness and diversity indexes, visualizing results in plots and matrices for detailed analysis.
06
bold.analyze.map
Map geographic data from BOLD, displaying data points on global or regional maps in GIS-compatible formats.
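As a rough illustration of what "species richness and diversity indexes" mean in practice, the sketch below computes richness and the Shannon index from a list of species labels. It is written in Python for illustration only and is not the implementation behind `bold.analyze.diversity`; the function names and the input format (one label per record) are assumptions.

```python
import math
from collections import Counter
from typing import Iterable


def species_richness(labels: Iterable[str]) -> int:
    """Number of distinct species labels among the records."""
    return len(set(labels))


def shannon_index(labels: Iterable[str]) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over species proportions."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no records: define diversity as zero
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical records: one species label per sequence record.
records = ["Apis mellifera", "Apis mellifera", "Bombus terrestris"]
print(species_richness(records), round(shannon_index(records), 3))
```

Richness counts species regardless of abundance, whereas the Shannon index also reflects how evenly records are spread across species.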

Publication
Preprint publication is available at https://ecoevorxiv.org/repository/view/8991/
Capabilities
- Direct API Access: Effortlessly retrieve species identification, taxonomy, and sequence data from BOLD without writing any new code.
- Integration with Popular Tools: BOLDconnectR supports the integration of BOLD data with other R packages, enabling comprehensive analysis and visualization.
- Custom Workflows: Users can design tailored workflows for specific research needs, enhancing the efficiency and scope of their data analysis.
Connect with BOLDistilled
Presently, a BOLDistilled library is only available for COI. The BIN algorithm and our sequence divergence threshold have been fine-tuned based on our collective expertise with this locus. Similar libraries can be produced for other loci (e.g., rbcLa or ITS2), and we will generate them based on demand and further exploration of the distillation parameters.
Coming Soon...
BOLDistilled libraries will be available from this URL following acceptance of the BOLDistilled manuscript.