
readImmunoSeq() imports tab-separated value (.tsv) files exported by the Adaptive Biotechnologies ImmunoSEQ analyzer, BGI IR-SEQ, or MiXCR and stores them as a MiAIRR-compliant table (a data.table by default; see return_type). It is optimized for performance on large datasets.

Usage

readImmunoSeq(
  path,
  recursive = FALSE,
  threads = parallel::detectCores()/2,
  parallel = TRUE,
  chunk_size = NULL,
  max_memory_gb = 8,
  streaming_mode = FALSE,
  progress_detail = "basic",
  return_type = "data.table",
  use_arrow = "auto",
  sample_mode = FALSE,
  sample_size = 1e+06
)

Arguments

path

Path to the directory containing the tab-delimited files. Only files with the .tsv extension are imported. The data from each file is named after that file.

recursive

A Boolean value

  • TRUE: recursively search path and all of its subdirectories for .tsv files.

  • FALSE (the default): search only the top level of path, without descending into subdirectories.
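For illustration, the difference between the two settings mirrors the recursive argument of base R's list.files(). The sketch below assumes file discovery works this way; find_tsv_files() is a hypothetical helper, not part of LymphoSeq2.

```r
# Hypothetical helper: list the .tsv files that would be considered for import.
# recursive = FALSE looks only at the top level of `path`;
# recursive = TRUE also descends into subdirectories.
find_tsv_files <- function(path, recursive = FALSE) {
  list.files(
    path,
    pattern = "\\.tsv$",   # match the .tsv extension only
    full.names = TRUE,
    recursive = recursive
  )
}

# Build a small throwaway directory tree to compare the two settings
root <- tempfile("tsv_demo")
dir.create(file.path(root, "batch2"), recursive = TRUE)
invisible(file.create(
  file.path(root, "sample_A.tsv"),
  file.path(root, "notes.txt"),             # ignored: wrong extension
  file.path(root, "batch2", "sample_B.tsv") # found only when recursive = TRUE
))

basename(find_tsv_files(root, recursive = FALSE))
basename(find_tsv_files(root, recursive = TRUE))
```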

threads

Number of threads to use. Defaults to half the number of CPU cores reported by parallel::detectCores().
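The default, parallel::detectCores()/2, is not guaranteed to be a whole number (an odd core count gives a fraction), and detectCores() can return NA on some platforms. When choosing threads yourself, a sketch like the hypothetical default_threads() below yields a safe integer; how readImmunoSeq() rounds its own default is not documented here.

```r
# Hypothetical helper: turn "half the detected cores" into a usable
# whole number of threads.
default_threads <- function(cores = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fall back to one core
  if (is.na(cores)) cores <- 1L
  # Halving an odd core count gives a fraction (e.g. 3 / 2 = 1.5),
  # so round down and never go below one thread
  max(1L, as.integer(floor(cores / 2)))
}

default_threads(8)   # 4
default_threads(3)   # 1
default_threads(NA)  # 1
```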

parallel

A Boolean value

  • TRUE (the default): Process files in parallel using future

  • FALSE: Process files sequentially

chunk_size

Integer specifying the number of files to process in each chunk for memory efficiency. Default is NULL (process all files at once).
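Conceptually, chunk_size partitions the input files into groups that are processed one after another. A minimal sketch, assuming simple consecutive grouping; chunk_files() is hypothetical.

```r
# Hypothetical helper: group file paths into chunks of at most `chunk_size`
# files; NULL means a single chunk containing every file.
chunk_files <- function(files, chunk_size = NULL) {
  if (is.null(chunk_size) || length(files) == 0) return(list(files))
  # Consecutive files share a chunk index: 1,1,1,1,2,2,2,2,3,3 for size 4
  split(files, ceiling(seq_along(files) / chunk_size))
}

files <- sprintf("sample_%02d.tsv", 1:10)
lengths(chunk_files(files, chunk_size = 4))  # chunk sizes 4, 4, 2
length(chunk_files(files))                   # 1 chunk when chunk_size is NULL
```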

max_memory_gb

Maximum memory to use, in GB. If this limit would be exceeded, the function automatically switches to chunked processing. Default is 8 GB.
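One rough way to judge whether a set of files approaches the limit is to compare their combined on-disk size with max_memory_gb. This is an assumption for illustration only: the exact estimate readImmunoSeq() uses is not documented, and needs_chunking() is a hypothetical helper.

```r
# Hypothetical helper: compare the total on-disk size of the input files
# with a memory budget in GB. On-disk size is only a rough proxy, since
# parsed tables in R usually need more RAM than the raw text.
needs_chunking <- function(files, max_memory_gb = 8) {
  total_gb <- sum(file.size(files), na.rm = TRUE) / 1024^3
  total_gb > max_memory_gb
}

# Toy input: a small file of about 16 KB
tmp <- tempfile(fileext = ".tsv")
writeLines(rep("CASSLGQGAEAFF\t1", 1000), tmp)
needs_chunking(tmp)                        # FALSE: far below 8 GB
needs_chunking(tmp, max_memory_gb = 1e-6)  # TRUE: the limit is about 1 KB
```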

streaming_mode

Boolean. If TRUE, very large files are processed in streaming chunks so that memory limits are not exceeded. Default is FALSE.
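Streaming means reading a file one block of lines at a time instead of loading it whole. The base-R sketch below shows the idea; stream_tsv() and its chunk_lines argument are hypothetical and do not describe LymphoSeq2's internals. The toy columns junction_aa and duplicate_count are illustrative AIRR-style names.

```r
# Hypothetical helper: stream a TSV in blocks of `chunk_lines` rows and apply
# `f` to each block, so only one block is held in memory at a time.
stream_tsv <- function(path, chunk_lines = 100000L, f) {
  con <- file(path, open = "r")
  on.exit(close(con))               # always release the connection
  header <- readLines(con, n = 1L)  # column names, reused for every block
  results <- list()
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break  # end of file reached
    block <- read.delim(text = c(header, lines), stringsAsFactors = FALSE)
    results[[length(results) + 1L]] <- f(block)
  }
  results
}

# Toy data: 250 rows streamed in blocks of 100
tmp <- tempfile(fileext = ".tsv")
toy <- data.frame(junction_aa = sprintf("CASS%03d", 1:250), duplicate_count = 1:250)
write.table(toy, tmp, sep = "\t", row.names = FALSE, quote = FALSE)
unlist(stream_tsv(tmp, chunk_lines = 100L, f = nrow))  # 100 100 50
```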

progress_detail

Level of progress reporting: "none", "basic", "detailed". Default is "basic".

return_type

Character string specifying the return type: "data.table" (the default), "tibble" (tidyverse-compatible), or "lazy_dt" (a dtplyr lazy data table, which accepts tidyverse syntax while running with data.table performance).

use_arrow

Character or logical. Controls whether Apache Arrow is used for large datasets: "auto" (the default) enables Arrow automatically for datasets larger than 5 GB, "always" forces Arrow, and "never" disables it. TRUE and FALSE simply enable or disable Arrow.
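The rules for use_arrow can be summarized as a small decision function. This sketch assumes that TRUE behaves like "always" and FALSE like "never", and that the 5 GB threshold is compared against the total dataset size; resolve_use_arrow() is hypothetical.

```r
# Hypothetical helper mirroring the use_arrow rules:
# "auto" -> Arrow only when the dataset exceeds 5 GB,
# "always" / TRUE -> always Arrow, "never" / FALSE -> never Arrow.
resolve_use_arrow <- function(use_arrow = "auto", total_gb) {
  # A single non-missing logical is taken as a direct on/off switch
  if (is.logical(use_arrow) && length(use_arrow) == 1L && !is.na(use_arrow)) {
    return(use_arrow)
  }
  # match.arg() errors on any string other than the three documented choices
  switch(
    match.arg(use_arrow, c("auto", "always", "never")),
    auto   = total_gb > 5,
    always = TRUE,
    never  = FALSE
  )
}

resolve_use_arrow("auto", total_gb = 12)   # TRUE
resolve_use_arrow("auto", total_gb = 0.5)  # FALSE
resolve_use_arrow(TRUE, total_gb = 0.5)    # TRUE
```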

sample_mode

Boolean. If TRUE, processes only a random sample of sequences from large datasets for quick analysis. Default is FALSE.

sample_size

Integer. Number of sequences to sample per file when sample_mode=TRUE. Default is 1,000,000.
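Per-file sampling can be pictured as drawing up to sample_size rows without replacement from each file's table. A minimal sketch with the hypothetical helper sample_rows() on toy data; it samples rows uniformly, and whether readImmunoSeq() instead weights sequences by their counts is not stated here.

```r
# Hypothetical helper: keep a random subset of at most `sample_size` rows
# from one file's table; smaller tables are returned unchanged.
sample_rows <- function(df, sample_size = 1e6) {
  n <- nrow(df)
  if (n <= sample_size) return(df)
  df[sample.int(n, size = sample_size), , drop = FALSE]  # without replacement
}

set.seed(1)
toy <- data.frame(junction_aa = sprintf("CASS%04d", 1:5000), duplicate_count = 1L)
nrow(sample_rows(toy, sample_size = 1000))  # 1000
nrow(sample_rows(toy))                      # 5000: fewer rows than sample_size
```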

Value

A data.table, tibble, or lazy_dt (as set by return_type) with MiAIRR-compliant column headers and a repertoire_id column.

Examples

file_path <- system.file("extdata", "TCRB_sequencing", package = "LymphoSeq2")
study_table <- LymphoSeq2::readImmunoSeq(
  path = file_path,
  recursive = FALSE,
  threads = 1
)
#> Dataset Analysis:
#>   Files: 10, Total: 0.00 GB, Largest: 0.0 MB
#>   Available memory: 11.6 GB
study_table <- LymphoSeq2::topSeqs(study_table, top = 100)