kmer-ord Workflow Overview

This section illustrates typical workflows with kmer-ord, including projection, clustering, visualisation, and interactive binning.

Each page shows which commands to run, in what order, and how the outputs of one stage feed the next, as a hands-on guide for real datasets.

Typical Workflow

A standard kmer-ord analysis consists of the following stages:

1. Projection: generate k-mer embeddings of your reads.
   Learn how to run: Projection pipeline
2. Clustering: group reads based on high-dimensional embeddings.
   Learn how to run: Clustering pipeline
3. Visualisation: inspect embeddings and feature distributions.
   Learn how to run: Visualisation
4. Interactive Binning: select clusters and save bins using the Dash interface.
   Learn how to run: Interactive binning

Projection pipeline (project)

Convert raw reads into k-mer embeddings. The pipeline covers normalization, dimensionality reduction, and optional PCA preprocessing.

Workflow tip: Start here to create embeddings for downstream clustering or interactive binning.
```
kmer-ord project \
    -i reads.fastq.gz \
    -o output_dir \
    --dr umap,localmap \
    --norm clr \
    --dims 2
```
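
To make the normalization step concrete, here is a minimal Python sketch of turning one read into a normalized k-mer vector. It assumes that `clr` refers to the centred log-ratio transform. The function names, the pseudocount, and the choice of k are illustrative and are not taken from kmer-ord, whose own implementation may differ (for example in how it treats reverse complements or ambiguous bases).

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k=4):
    """Count overlapping k-mers over the A/C/G/T alphabet, in a fixed order."""
    # All 4**k possible k-mers, in lexicographic order (AA.., AC.., ...).
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing other characters (e.g. N) are simply not looked up.
    return [observed.get(kmer, 0) for kmer in kmers]


def clr(counts, pseudocount=1.0):
    """Centred log-ratio: log of each count minus the mean of the logs.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


read = "ACGTACGTTGCA"  # toy sequence, not real data
vector = clr(kmer_counts(read, k=2))
print(len(vector))  # 16 features, one per possible 2-mer
```

A vector like this, computed per read, is the kind of input a dimensionality reduction method would then project to the number of dimensions requested with `--dims`.
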
Clustering pipeline (cluster)

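Cluster reads based on their embeddings, using density-based (HDBSCAN), graph-based (Leiden), or other methods. The example below clusters on a 15-dimensional UMAP embedding rather than a 2-dimensional one.

Workflow tip: Run after project to identify read clusters and generate databases for visualisation.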
```
kmer-ord cluster \
    -i reads.fastq.gz \
    -o output_dir \
    --dr umap \
    --dims 15 \
    --cluster hdbscan
```
Visualisation pipeline (visualise)

Generate plots to explore embeddings, feature distributions, and cluster results.

Workflow tip: Use this step to inspect clustering results before binning, or to generate figures for reports.
```
kmer-ord visualise \
    -d results/kmer-ord.sqlite \
    --embedding-mode all
```
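
Before plotting, it can help to confirm what the results database contains. The sketch below assumes the `.sqlite` file is a standard SQLite database (suggested by its extension, not confirmed here) and lists its tables with Python's built-in `sqlite3` module. It makes no assumption about the kmer-ord schema, and `list_tables` is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the table names in an SQLite file, opened read-only."""
    # mode=ro prevents accidental writes to the results database and
    # raises sqlite3.OperationalError if the file does not exist.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Example, using the path from the command above:
# print(list_tables("results/kmer-ord.sqlite"))
```

An empty list or an error at this point suggests the clustering step did not write the database you expected.
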
Interactive binning (bin)

Launch the b2w Dash app for manual binning. Supports lasso selection, overlaying points, and exporting bins.

Workflow tip: Use after clustering or embedding exploration to extract high-confidence read subsets.
```
kmer-ord bin -d results/kmer-ord.sqlite -o results/bins
```

- Workflows are modular: you can run each step independently or as part of a pipeline.
- Output from one step is usually input for the next (e.g., embeddings → clustering → visualisation → binning).
- Use visualisation and binning iteratively to validate clusters and refine selections.
- Check the CLI reference for full command options and advanced configurations.
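
The non-interactive stages can be chained from a script. The following is a minimal sketch that reuses only the flags shown in the examples on this page. The helper names `build_commands` and `run_pipeline` are hypothetical, and the database path passed to `visualise` is an assumption, since this page does not state where `cluster` writes its database. The `bin` stage is left out because it launches an interactive app.

```python
import shlex
import subprocess


def build_commands(reads, out_dir, db_path):
    """Return the non-interactive stages as argument lists, in run order."""
    return [
        ["kmer-ord", "project", "-i", reads, "-o", out_dir,
         "--dr", "umap,localmap", "--norm", "clr", "--dims", "2"],
        ["kmer-ord", "cluster", "-i", reads, "-o", out_dir,
         "--dr", "umap", "--dims", "15", "--cluster", "hdbscan"],
        # db_path is an assumption: point it at the database that the
        # clustering step actually produced.
        ["kmer-ord", "visualise", "-d", db_path, "--embedding-mode", "all"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_commands("reads.fastq.gz", "output_dir", "results/kmer-ord.sqlite"))` prints the three commands without running them. Pass `dry_run=False` once the paths are confirmed.
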

This iterative approach helps produce reproducible, high-confidence bins while leaving users free to explore their data.

- Always check your embeddings visually, because patterns are not always obvious from metrics alone.
- Use overlays and feature comparisons to validate clusters before creating bins.
- Filters affect both visualisation and binning, so apply them with caution.