
Core Concepts

Workspace hardening

Threading analyzes your code and prepares it for reproducible, accelerated execution.

A hardened workspace includes:

  • Optimized kernels — Bottlenecks vectorized and parallelized
  • Dependency lock — Complete, pinned environment
  • Provenance manifest — Hashes of code, data, and parameters
  • Configuration — Extracted parameters

Provenance

The provenance manifest records everything needed to reproduce an experiment exactly:

code:
  - path: src/pca.py
    hash: a13f9c2e

notebooks:
  - path: 03_pca_analysis.ipynb
    hash: f91b2e88

data:
  - path: data/processed/counts_filtered.tsv
    sha256: 8f3a2e1c...

parameters:
  source: config/params.yaml
  values:
    pca.n_components: 50
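The hashes in the manifest are content digests of each file. A minimal sketch of how such digests might be produced (the function names here are illustrative, not Threading's actual API; note the manifest uses full SHA-256 for data and an abbreviated hash for code):

```python
import hashlib

def sha256_bytes(data: bytes) -> str:
    """Full SHA-256 digest of in-memory bytes."""
    return hashlib.sha256(data).hexdigest()

def file_sha256(path: str) -> str:
    """Stream a file through SHA-256 so large data files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def short_hash(digest: str, length: int = 8) -> str:
    """Abbreviated form, like the 8-character code hashes in the manifest above."""
    return digest[:length]
```

Streaming in chunks keeps memory flat even for multi-gigabyte inputs like `counts_filtered.tsv`.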

UV (Unified Versioning)

The UV tree captures your complete execution environment:

  • Direct dependencies
  • Transitive dependencies
  • System dependencies
  • Implicit imports
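The distinction between direct and transitive dependencies can be sketched as a graph walk. This toy graph uses hypothetical package names and is hand-declared; the real UV tree is derived from your installed environment:

```python
# Toy dependency graph (hypothetical packages). Keys map a package to the
# packages it depends on directly.
DEPS = {
    "my-analysis": ["numpy", "scikit-learn"],   # direct dependencies
    "scikit-learn": ["numpy", "scipy", "joblib"],
    "scipy": ["numpy"],
    "numpy": [],
    "joblib": [],
}

def transitive_deps(pkg: str) -> set:
    """Collect everything the package pulls in, directly or indirectly."""
    seen = set()
    stack = list(DEPS.get(pkg, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPS.get(dep, []))
    return seen
```

Here `my-analysis` declares two direct dependencies but transitively pins four, which is why a lock must record the full closure, not just what your code imports.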

Computation graph

Threading builds a directed acyclic graph (DAG) from your code:

Load Data → Normalize → PCA → Plot

This enables dependency analysis, parallelization, caching, and checkpointing.
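A minimal sketch of the idea, using the standard-library `graphlib` to order the four stages above and a plain dict as a stand-in for result caching (the node names and `run` function are illustrative, not Threading's internals):

```python
from graphlib import TopologicalSorter

# Pipeline graph mirroring the stages above: each node maps to the
# set of stages it depends on.
graph = {
    "load_data": set(),
    "normalize": {"load_data"},
    "pca": {"normalize"},
    "plot": {"pca"},
}

cache = {}

def run(node):
    """Run a stage, reusing a cached result if one exists (checkpointing)."""
    if node not in cache:
        cache[node] = f"result-of-{node}"   # stand-in for the real computation
    return cache[node]

# Topological order guarantees dependencies run before dependents.
order = list(TopologicalSorter(graph).static_order())
for node in order:
    run(node)
```

Because the graph is explicit, independent branches could be dispatched in parallel, and any node whose inputs are unchanged can be restored from cache instead of recomputed.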


Kernel optimization

Threading identifies atomic computation units and applies optimization strategies:

Strategy          Typical speedup
Vectorization     5–50×
Parallelization   4–32×
GPU offload       10–1000×
Memory layout     2–10×
Kernel fusion     1.5–3×
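To illustrate one row of the table, here is a toy version of kernel fusion: two element-wise passes collapsed into one, eliminating the intermediate buffer. This is a conceptual sketch in plain Python, not Threading's actual transform:

```python
def two_pass(xs):
    """Two separate kernels: scale, then shift — allocates an intermediate list."""
    scaled = [x * 2.0 for x in xs]
    return [x + 1.0 for x in scaled]

def fused(xs):
    """Fused kernel: one pass over the data, no intermediate allocation."""
    return [x * 2.0 + 1.0 for x in xs]
```

Both produce identical results; the fused form touches memory half as often, which is where the 1.5–3× gains typically come from.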

Data parallelism

The same computation runs on different data shards across nodes:

$ threading ld s3://bio-data/microbiome_full_v2
[dataset] Sharding input across 4 nodes
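The sharding step can be sketched as: split the input into roughly equal chunks, run the same function on each, and combine the partial results. The `shard` and `process` functions below are hypothetical stand-ins, and a thread pool substitutes for real cluster nodes:

```python
from concurrent.futures import ThreadPoolExecutor

def shard(data, n):
    """Split data into n roughly equal contiguous shards."""
    k, r = divmod(len(data), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        out.append(data[start:end])
        start = end
    return out

def process(chunk):
    # Stand-in for the real per-shard computation.
    return sum(chunk)

data = list(range(100))
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process, shard(data, 4)))
total = sum(partials)
```

This pattern works whenever the per-shard computation is independent and the partial results can be merged, as with the 4-node sharding shown above.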

Parameter sweeps

Sweep a parameter across a range of values:

[iterate] Sweeping: pca.n_components=[10,20,30,40,50]
  • Reusing compiled kernels
  • Provenance captured for each sweep value
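The kernel-reuse behavior can be sketched with a cache: compile once, then run every sweep value against the same compiled artifact. The `compile_kernel` and `run_pca` names are illustrative, not Threading's API:

```python
import functools

@functools.lru_cache(maxsize=None)
def compile_kernel(name):
    """Stand-in for kernel compilation; cached so every sweep value reuses it."""
    return f"compiled:{name}"

def run_pca(n_components):
    kernel = compile_kernel("pca")     # compiled on the first call only
    # Each run records its own parameters, mirroring per-sweep provenance.
    return {"pca.n_components": n_components, "kernel": kernel}

results = [run_pca(n) for n in [10, 20, 30, 40, 50]]
```

After the sweep, the cache shows one compilation and four reuses, which is why sweeps cost little more than a single run plus the per-value computation.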