Generate rarefaction curves for diversity estimation

Estimate repertoire diversity across sequencing depths using rarefaction (interpolation) and extrapolation. The function uses the iNEXT framework to help determine whether sequencing depth is sufficient to capture repertoire diversity.

Usage

runINext(sample_table, q = 0, endpoint = 1e+05, nboot = 10, conf = 0.95)

Arguments

sample_table

A tibble from readImmunoSeq() or productiveSeq() containing "junction_aa", "duplicate_count", and "repertoire_id" columns. Can contain one or multiple repertoires.

q

Diversity order (Hill number) to calculate:

0 (default): Species richness (number of unique clones)
1: Shannon diversity (exponential of Shannon entropy; accounts for evenness)
2: Simpson diversity (inverse Simpson concentration; emphasizes abundant clones)
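The three orders can be illustrated with a minimal base R sketch. The helper `hill_number()` and the vector `toy_counts` are hypothetical names used only for illustration. The function computes the plain empirical Hill number from observed counts, not the bias-corrected estimators that iNEXT applies.

```r
# Empirical Hill number (effective number of clones) of order q
# for a vector of clone counts. Illustration only.
hill_number <- function(counts, q) {
  p <- counts[counts > 0] / sum(counts)  # relative clone frequencies
  if (q == 1) {
    exp(-sum(p * log(p)))                # limit q -> 1: exp(Shannon entropy)
  } else {
    sum(p^q)^(1 / (1 - q))               # general Hill number formula
  }
}

# Hypothetical repertoire: one dominant clone and three rarer ones
toy_counts <- c(70, 10, 10, 10)
sapply(c(0, 1, 2), function(q) hill_number(toy_counts, q))
```

For this uneven toy repertoire the value drops as q increases, because higher orders give more weight to the dominant clone. For a perfectly even repertoire all three orders equal the number of clones.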

endpoint

Maximum sequencing depth for extrapolation. Default is 100000. Set higher to predict diversity at deeper sequencing.

nboot

Number of bootstrap replicates used to compute confidence intervals (default 10). Higher values give more stable intervals but take longer to run.

conf

Confidence level for intervals (default 0.95 for 95% CI)
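How nboot and conf interact can be sketched in base R. This is a deliberately simplified bootstrap: it resamples reads from the observed clone frequencies only, whereas the iNEXT procedure is more elaborate and also accounts for undetected clones. `bootstrap_ci()` and `toy_counts` are hypothetical names.

```r
set.seed(1)
toy_counts <- c(50, 20, 10, 5, 5, 3, 2, 2, 1, 1, 1)  # hypothetical clone counts

# Simplified bootstrap interval for observed richness (illustration only)
bootstrap_ci <- function(counts, nboot = 10, conf = 0.95) {
  n <- sum(counts)
  observed <- sum(counts > 0)
  # Draw nboot resamples of n reads from the observed clone frequencies
  boot <- rmultinom(nboot, size = n, prob = counts / n)
  boot_richness <- colSums(boot > 0)     # richness in each resample
  se <- sd(boot_richness)                # bootstrap standard error
  z <- qnorm(1 - (1 - conf) / 2)         # normal quantile for the chosen level
  c(estimate = observed, lower = observed - z * se, upper = observed + z * se)
}

bootstrap_ci(toy_counts, nboot = 200, conf = 0.95)
```

With only a handful of replicates the standard error itself is noisy, which is why a larger nboot is usually preferable for final results.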

Value

A tibble with rarefaction/extrapolation results:

m: Sample size (sequencing depth)
Method: "Rarefaction", "Observed", or "Extrapolation"
Order.q: Diversity order (same as q parameter)
qD: Estimated diversity at depth m
qD.LCL: Lower confidence limit
qD.UCL: Upper confidence limit
SC: Estimated sample coverage at depth m
repertoire_id: Sample identifier

Details

This function wraps the iNEXT package (Chao et al. 2014) for rarefaction and extrapolation analysis. It converts the sample table into a matrix format where rows are unique sequences and columns are repertoires, then runs iNEXT on all samples simultaneously.
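The matrix conversion described above can be sketched in base R as follows. The toy data frame and its sequences are made up for illustration, and the package's internal implementation may differ. The commented-out call at the end shows how such a matrix is typically passed to `iNEXT::iNEXT()`; it is not executed here.

```r
# Hypothetical sample table with the three required columns
toy_table <- data.frame(
  repertoire_id   = c("S1", "S1", "S1", "S2", "S2"),
  junction_aa     = c("CASSLGQGNTEAFF", "CASSPDRGYEQYF", "CASSQETQYF",
                      "CASSLGQGNTEAFF", "CASSQETQYF"),
  duplicate_count = c(10, 3, 1, 7, 2)
)

# Rows: unique sequences; columns: repertoires; cells: summed counts
# (0 where a sequence is absent from a repertoire)
abundance <- xtabs(duplicate_count ~ junction_aa + repertoire_id,
                   data = toy_table)
abundance_matrix <- unclass(abundance)
attr(abundance_matrix, "call") <- NULL   # drop the xtabs bookkeeping attribute

abundance_matrix
# Assumed downstream call (requires the iNEXT package):
# iNEXT::iNEXT(abundance_matrix, q = 0, datatype = "abundance")
```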

Rarefaction vs Extrapolation:

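Rarefaction (m less than the observed depth): Estimates the diversity expected if only m sequences had been sampled; for q = 0 this is the expected number of unique clones in a subsample of size m. Shows how diversity increases with sequencing depth and is useful for comparing samples at equal depth.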
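For q = 0, the rarefied value does not require repeated random subsampling: the expected number of clones in a subsample of m reads has a closed form. The sketch below implements that classic formula in base R. `rarefied_richness()` and `toy_counts` are hypothetical names, and the code is an illustration rather than the iNEXT implementation.

```r
# Expected number of unique clones in a random subsample of m reads
# drawn without replacement (richness, q = 0).
rarefied_richness <- function(counts, m) {
  counts <- counts[counts > 0]
  n <- sum(counts)
  stopifnot(m >= 0, m <= n)
  # P(clone i is absent from the subsample) = choose(n - x_i, m) / choose(n, m),
  # computed on the log scale to avoid overflow at realistic depths
  p_absent <- exp(lchoose(n - counts, m) - lchoose(n, m))
  sum(1 - p_absent)
}

toy_counts <- c(50, 20, 10, 5, 5, 3, 2, 2, 1, 1, 1)  # hypothetical clone counts
depths <- c(1, 10, 25, 50, 100)
sapply(depths, function(m) rarefied_richness(toy_counts, m))
```

At m equal to the full depth the function returns the observed number of clones, and at m = 1 it returns 1, which makes a quick sanity check.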

Extrapolation (m greater than the observed depth): Predicts diversity at deeper sequencing; for q = 0 this relies on the Chao1 estimate of the number of unseen clones. Shows whether the curve has plateaued (sequencing is close to complete) or is still rising (more diversity remains unsampled).
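The richness extrapolation can be sketched from the counts of singletons (f1) and doubletons (f2), following the formula in Chao et al. (2014). `chao1_extrapolate()` and `toy_counts` are hypothetical names, and this base R version is an illustration under those assumptions, not the code iNEXT runs.

```r
# Predicted richness after sequencing m_star additional reads (q = 0)
chao1_extrapolate <- function(counts, m_star) {
  counts <- counts[counts > 0]
  n <- sum(counts)              # observed depth
  s_obs <- length(counts)       # observed number of clones
  f1 <- sum(counts == 1)        # singletons
  f2 <- sum(counts == 2)        # doubletons
  # Chao1 estimate of the number of unseen clones
  f0_hat <- if (f2 > 0) {
    (n - 1) / n * f1^2 / (2 * f2)
  } else {
    (n - 1) / n * f1 * (f1 - 1) / 2
  }
  if (f0_hat == 0) return(s_obs)  # no rare clones: nothing left to discover
  s_obs + f0_hat * (1 - (1 - f1 / (n * f0_hat + f1))^m_star)
}

toy_counts <- c(50, 20, 10, 5, 5, 3, 2, 2, 1, 1, 1)  # hypothetical clone counts
sapply(c(0, 100, 1000), function(m) chao1_extrapolate(toy_counts, m))
```

The predicted curve starts at the observed richness and levels off at the Chao1 asymptote (observed richness plus the estimated number of unseen clones).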

How to interpret the curve:

Plateau reached: Sequencing depth is sufficient; most clones have been captured.
Still increasing steeply: Deeper sequencing is needed to capture the full diversity.
Comparing samples: Use rarefied diversity at the same depth, not raw counts.
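Why raw counts mislead when depths differ can be shown with a small simulation. The sketch below subsamples reads at random to a common depth and averages the number of unique clones. The two repertoires are made up so that they share the same clone-size structure but differ fivefold in depth, and `subsample_richness()` is a hypothetical helper.

```r
set.seed(42)

# Mean number of unique clones across random subsamples of a given depth
subsample_richness <- function(counts, depth, reps = 100) {
  reads <- rep(seq_along(counts), times = counts)  # one entry per read
  mean(replicate(reps, length(unique(sample(reads, depth)))))
}

# Hypothetical repertoires with the same structure at different depths
deep    <- c(rep(1, 300), rep(5, 40), 500)  # 341 clones, 1000 reads
shallow <- c(rep(1, 60),  rep(5, 8),  100)  #  69 clones,  200 reads

common_depth <- min(sum(deep), sum(shallow))
c(
  deep_raw         = sum(deep > 0),
  shallow_raw      = sum(shallow > 0),
  deep_rarefied    = subsample_richness(deep, common_depth),
  shallow_rarefied = subsample_richness(shallow, common_depth)
)
```

The raw clone counts differ mostly because of depth; once both repertoires are rarefied to the shallower depth, the gap narrows considerably.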

References

Chao, A., et al. (2014). Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecological Monographs, 84(1), 45-67.

Examples

if (FALSE) { # \dontrun{
# Locate the example TCRB data bundled with LymphoSeq2
file_path <- system.file("extdata", "TCRB_sequencing",
  package = "LymphoSeq2"
)
study_table <- LymphoSeq2::readImmunoSeq(path = file_path, threads = 1)
# Keep productive sequences, aggregated by amino acid junction
amino_table <- LymphoSeq2::productiveSeq(study_table,
  aggregate = "junction_aa",
  prevalence = TRUE
)
# Run on all samples at once
rarefaction_table <- LymphoSeq2::runINext(amino_table)
} # }