readImmunoSeq()
Imports tab-separated value (.tsv) files exported by the Adaptive Biotechnologies ImmunoSEQ analyzer, BGI IR-SEQ, or MiXCR and stores them as a MiAIRR-compliant table (a data.table by default; see return_type). Optimized for performance on large datasets.
Usage
readImmunoSeq(
path,
recursive = FALSE,
threads = parallel::detectCores()/2,
parallel = TRUE,
chunk_size = NULL,
max_memory_gb = 8,
streaming_mode = FALSE,
progress_detail = "basic",
return_type = "data.table",
use_arrow = "auto",
sample_mode = FALSE,
sample_size = 1e+06
)
Arguments
- path
Path to the directory containing tab-delimited files. Only files with the extension .tsv are imported. The names of the data frames are the same as the names of the files.
- recursive
A Boolean value.
TRUE: recursively search the directory for all .tsv files.
FALSE (the default): open files using path only.
- threads
Number of threads to use. Default is half the detected cores (parallel::detectCores()/2).
- parallel
A Boolean value.
TRUE (the default): process files in parallel using future.
FALSE: process files sequentially.
- chunk_size
Integer specifying the number of files to process in each chunk for memory efficiency. Default is NULL (process all files at once).
- max_memory_gb
Maximum memory to use, in GB. If this limit is exceeded, processing switches to chunked mode automatically. Default is 8 GB.
- streaming_mode
Boolean. If TRUE, processes extremely large files in streaming chunks to avoid memory limits. Default is FALSE.
- progress_detail
Level of progress reporting: "none", "basic", "detailed". Default is "basic".
- return_type
Character string specifying return type: "data.table" (default), "tibble" (tidyverse compatible), or "lazy_dt" (dtplyr - best of both worlds). lazy_dt provides tidyverse syntax with data.table performance.
- use_arrow
Character or logical. Controls Apache Arrow usage for large datasets: "auto" (default) enables Arrow automatically for datasets >5GB, "always" forces Arrow, "never" disables Arrow, TRUE/FALSE for simple enable/disable.
- sample_mode
Boolean. If TRUE, processes only a random sample of sequences from large datasets for quick analysis. Default is FALSE.
- sample_size
Integer. Number of sequences to sample per file when sample_mode=TRUE. Default is 1,000,000.
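To make the chunk_size argument concrete, here is a minimal base R sketch of what splitting a set of files into batches of a fixed size looks like. It is not the package's internal implementation; the file names and the variable names (files, batch_id, batches) are hypothetical and used only for illustration.

```r
# Hypothetical file names, used only for illustration.
files <- sprintf("sample_%02d.tsv", 1:10)
chunk_size <- 4L

# Assign each file a batch index (1, 1, 1, 1, 2, ...) and split on it.
batch_id <- ceiling(seq_along(files) / chunk_size)
batches <- split(files, batch_id)

# Each batch would be read and combined before moving on to the next one,
# so only chunk_size files need to be held in memory at a time.
for (i in seq_along(batches)) {
  message("Batch ", i, ": ", length(batches[[i]]), " file(s)")
}
```

With chunk_size = NULL (the default) there is no such split and all files are processed at once.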
Examples
file_path <- system.file("extdata", "TCRB_sequencing",
package = "LymphoSeq2")
study_table <- LymphoSeq2::readImmunoSeq(
path = file_path, recursive = FALSE,
threads = 1
)
#> Dataset Analysis:
#> Files: 10, Total: 0.00 GB, Largest: 0.0 MB
#> Available memory: 11.6 GB
study_table <- LymphoSeq2::topSeqs(study_table, top = 100)
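The documented arguments can also be combined, for example to read files sequentially and request a tibble. The following is a sketch that uses only arguments listed above; the names read_args, n_threads and study_tibble are illustrative, and the call is guarded so that it is skipped when LymphoSeq2 is not installed.

```r
# Half the detected cores, falling back to 1 if detection returns NA.
n_threads <- max(1L, parallel::detectCores() %/% 2L, na.rm = TRUE)

# Build the argument list from parameters documented above.
read_args <- list(
  path = system.file("extdata", "TCRB_sequencing", package = "LymphoSeq2"),
  recursive = FALSE,
  threads = n_threads,
  parallel = FALSE,          # process files sequentially
  progress_detail = "none",  # suppress progress reporting
  return_type = "tibble"     # tidyverse-compatible result
)

# Run the import only when the package is available.
if (requireNamespace("LymphoSeq2", quietly = TRUE)) {
  study_tibble <- do.call(LymphoSeq2::readImmunoSeq, read_args)
}
```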
