kmer-ord Workflow Overview
This section illustrates typical workflows with kmer-ord, including projection, clustering, visualisation, and interactive binning.
Each page walks through how to run commands, in what order, and how outputs connect, providing a hands-on guide for real datasets.
Typical Workflow
A standard kmer-ord analysis consists of the following stages:
- Projection: generate k-mer embeddings of your reads.
  Learn how to run: Projection pipeline
- Clustering: group reads based on high-dimensional embeddings.
  Learn how to run: Clustering pipeline
- Visualisation: inspect embeddings and feature distributions.
  Learn how to run: Visualisation
- Interactive Binning: select clusters and save bins using the Dash interface.
  Learn how to run: Interactive binning
Typical Workflows
- Projection pipeline (project)
  Convert raw reads into k-mer embeddings. Includes normalization, dimensionality reduction, and optional PCA preprocessing.
  Workflow tip: Start here to create embeddings for downstream clustering or interactive binning.
- Clustering pipeline (cluster)
  Cluster reads based on embeddings. Use density-based (HDBSCAN), graph-based (Leiden), or other methods.
  Workflow tip: Run after project to identify read clusters and generate databases for visualization.
- Visualisation pipeline (visualise)
  Generate plots to explore embeddings, feature distributions, and cluster results.
  Workflow tip: Useful to inspect clustering results before binning, or to generate figures for reports.
- Interactive binning (bin)
  Launch the b2w Dash app for manual binning. Supports lasso selection, overlaying points, and exporting bins.
  Workflow tip: Use after clustering or embedding exploration to extract high-confidence read subsets.
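To make the idea of a k-mer embedding more concrete, the sketch below computes a normalised k-mer frequency vector for a single read using only the Python standard library. It illustrates the general technique and is not kmer-ord's implementation: the function name kmer_profile, the default k=4, and the choice to skip k-mers containing ambiguous bases are all assumptions made for this example. Dimensionality reduction and PCA are not shown.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=4):
    """Return the normalised k-mer frequency vector of one read.

    Hypothetical helper for illustration only. The vector has one entry
    per possible DNA k-mer (4**k entries) in lexicographic order.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k in the read.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all k-mers over A, C, G, T; windows with other characters
    # (for example N) never match and are therefore skipped.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    raw = [counts.get(kmer, 0) for kmer in kmers]
    total = sum(raw)
    # Normalise counts to frequencies so reads of different lengths
    # are comparable; a read with no valid k-mers gives an all-zero vector.
    return [c / total if total else 0.0 for c in raw]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGTTGCA", k=2)
    print(len(profile))  # one entry per possible 2-mer
```

Each read becomes a fixed-length vector, which is what makes downstream clustering and 2D projection possible regardless of read length.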
Workflow Notes
- Workflows are modular — you can run each step independently or as part of a pipeline.
- Output from one step is usually input for the next (e.g., embeddings → clustering → visualization → binning).
- Use visualisation and binning iteratively to validate clusters and refine selections.
- Check the CLI reference for full command options and advanced configurations.
Recommended Workflow Sequence
- Prepare reads → clean, trim, or convert to FASTA/FASTQ.
- Project → generate embeddings using project.
- Cluster → perform clustering with cluster.
- Visualise → inspect embeddings and clusters with visualise.
- Interactive binning → fine-tune bins and export reads with bin.
- Optional → re-run subsets, explore parameter sweeps, or compare features.
This iterative approach helps produce reproducible, high-confidence bins while leaving users free to explore their data.
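The sequence above can be sketched as a small dry-run planner that lists the stages in order. Only the subcommand names (project, cluster, visualise, bin) come from this page. The executable name kmer-ord and the helper plan_commands are assumptions, and no options are shown because the real flags belong to the CLI reference. The script prints commands and does not execute them.

```python
import shlex

# Assumption: the tool is invoked as "kmer-ord <subcommand>".
EXECUTABLE = "kmer-ord"
# Stage order as listed in the recommended workflow sequence.
STAGES = ("project", "cluster", "visualise", "bin")


def plan_commands(start_at="project", stages=STAGES):
    """Return argv lists for every stage from start_at onwards, in order.

    Hypothetical helper: required inputs and options are intentionally
    omitted and must be taken from the CLI reference.
    """
    if start_at not in stages:
        raise ValueError(f"unknown stage: {start_at!r}")
    first = stages.index(start_at)
    return [[EXECUTABLE, stage] for stage in stages[first:]]


if __name__ == "__main__":
    # Dry run: show what would be run, without running anything.
    for argv in plan_commands():
        print(shlex.join(argv))
```

Because the steps are modular, passing start_at="visualise" plans only the later stages, which matches re-running visualisation and binning on existing clustering output.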
Tips
- Always check your embeddings visually; patterns are not always obvious from metrics alone.
- Use overlays and feature comparisons to validate clusters before creating bins.
- Filters affect both visualization and binning — apply with caution.
Next step: Projection pipeline