BOLDistilled

Using an open-source algorithm, each BOLDistilled library captures a snapshot of the genetic diversity and associated taxonomic information on BOLD in a comprehensive but compact dataset. These libraries are versioned, publicly available, and archived so that analyses conducted on a particular sequence array can be reproduced.

Presently, a BOLDistilled library is only available for COI.

Coming Soon

Current Libraries:

Optimizing DNA Barcode Libraries

The BOLDistilling process applies an algorithm to each BIN to select the minimal number of sequences that effectively capture its genetic diversity, reducing over 17M records to approximately 1.2M based on a 0.75% divergence threshold. Our tests indicated that this threshold led to a nearly ten-fold reduction in the size of the library while offering enough resolution to accurately infer taxonomy in taxa with high intraspecific genetic variation. The threshold may change slightly with further study, and the value used will be reported in the metadata accompanying each BOLDistilled library.
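The idea of thinning a BIN at a divergence threshold can be sketched as follows. This is a minimal illustration, not the published BOLDistilled algorithm: the function names `p_distance` and `distill_bin` are hypothetical, the sequences are assumed to be pre-aligned and of equal length, the uncorrected p-distance is an assumed distance measure, and the greedy pass shown here does not guarantee the smallest possible set of representatives.

```python
from typing import List


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous 'N'
    are skipped. Assumes both sequences come from the same alignment.
    """
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def distill_bin(sequences: List[str], threshold: float = 0.0075) -> List[str]:
    """Greedily keep sequences that diverge from every kept one by more than `threshold`.

    The default of 0.0075 corresponds to the 0.75% threshold quoted above.
    """
    representatives: List[str] = []
    for seq in sequences:
        # A sequence within the threshold of an existing representative
        # is treated as already covered and is dropped.
        if all(p_distance(seq, rep) > threshold for rep in representatives):
            representatives.append(seq)
    return representatives


# Toy BIN: three 400 bp sequences (hypothetical data).
base = "ACGT" * 100
one_diff = "T" + base[1:]      # 1 of 400 sites differs (0.25%)
four_diff = "CATG" + base[4:]  # 4 of 400 sites differ (1.0%)
kept = distill_bin([base, one_diff, four_diff])
```

In this toy case `one_diff` falls within the threshold of `base` and is dropped, while `four_diff` exceeds it and is retained as a second representative.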

Key Benefits

01

reduced computation

Incorporating a BOLDistilled library into current bioinformatic workflows can cut a 24-hour computation by 98%, to less than 30 minutes.

02

democratizes research

Analyses using these libraries can be run locally on low-end computers without an Internet connection, which makes them well suited to use in remote communities.

03

accurate taxonomy

Taxonomic inconsistencies are resolved before analysis, reducing the risk of misidentification while maintaining intraspecific genetic variation.

04

bold.export

Export sequence data to FASTA or CSV/TSV formats with customizable sequence naming, preserving BCDM format.

05

bold.analyze.diversity

Calculate species richness and diversity indexes, visualizing results in plots and matrices for detailed analysis.

06

bold.analyze.map

Map geographic data from BOLD, displaying data points on global or regional maps in GIS-compatible formats.

Publication

The preprint is available at https://ecoevorxiv.org/repository/view/8991/

Capabilities

  • Direct API Access: Effortlessly retrieve species identification, taxonomy, and sequence data from BOLD without writing any new code.
  • Integration with Popular Tools: BOLDconnectR supports the integration of BOLD data with other R packages, enabling comprehensive analysis and visualization.
  • Custom Workflows: Users can design tailored workflows for specific research needs, enhancing the efficiency and scope of their data analysis.

Connect with BOLDistilled

Presently, a BOLDistilled library is only available for COI. The BIN algorithm and our sequence divergence threshold have been fine-tuned based on our collective expertise with this locus. Similar libraries can be produced for other loci (e.g., rbcLa or ITS2), and we will generate them based on demand and further exploration of the distillation parameters.

Coming Soon...

BOLDistilled libraries will be available from this URL following acceptance of the BOLDistilled manuscript.