productiveSeq()
Select productive nucleotide or amino acid CDR3 sequences
from a tibble containing raw AIRR-formatted data. The raw data are aggregated
either on the productive CDR3 amino acid sequence (junction_aa) or on
the productive CDR3 nucleotide sequence (junction). If "junction_aa"
is selected, the resulting tibble will display the most frequently observed
V, D, and J genes associated with each productive CDR3
amino acid sequence. If "junction" is selected, all columns in the
original tibble will be present in the output. The outputs differ
because the same CDR3 amino acid sequence may be
encoded by multiple unique junction sequences with differing V, D, and J
genes.
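The amino acid aggregation described above can be sketched in base R. This is a hypothetical illustration, not the LymphoSeq2 implementation: the function name aggregate_by_junction_aa is invented, it assumes AIRR-style columns repertoire_id, v_call, d_call, and j_call, and it interprets "most frequently observed" as the gene calls of the most abundant junction within each amino acid group.

```r
# Hypothetical sketch, not the LymphoSeq2 implementation.
# Assumes AIRR-style columns: repertoire_id, junction_aa, duplicate_count,
# v_call, d_call, j_call.
aggregate_by_junction_aa <- function(df) {
  # Group rows by sample and CDR3 amino acid sequence (a tab cannot occur in either)
  key <- paste(df$repertoire_id, df$junction_aa, sep = "\t")
  rows <- lapply(split(df, key), function(g) {
    # Assumption: "most frequent" = gene calls of the most abundant junction
    top <- g[which.max(g$duplicate_count), ]
    data.frame(
      repertoire_id = g$repertoire_id[1],
      junction_aa = g$junction_aa[1],
      duplicate_count = sum(g$duplicate_count),
      v_call = top$v_call,
      d_call = top$d_call,
      j_call = top$j_call,
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  # Recompute frequencies so they sum to 1 within each repertoire
  out$duplicate_frequency <- out$duplicate_count /
    ave(out$duplicate_count, out$repertoire_id, FUN = sum)
  # Sort by sample, then by descending count
  out <- out[order(out$repertoire_id, -out$duplicate_count), ]
  rownames(out) <- NULL
  out
}

# Toy input: two junctions in sample s1 encode the same CDR3 "CASSLG"
raw <- data.frame(
  repertoire_id = c("s1", "s1", "s1", "s2"),
  junction_aa = c("CASSLG", "CASSLG", "CASSPF", "CASSLG"),
  duplicate_count = c(10, 30, 60, 5),
  v_call = c("TRBV5-1", "TRBV7-9", "TRBV20-1", "TRBV5-1"),
  d_call = c("TRBD1", "TRBD2", "TRBD1", "TRBD1"),
  j_call = c("TRBJ2-7", "TRBJ2-1", "TRBJ1-1", "TRBJ2-7"),
  stringsAsFactors = FALSE
)
agg <- aggregate_by_junction_aa(raw)
```

In this toy input the two s1 junctions encoding CASSLG collapse into one row with a duplicate_count of 40, carrying the TRBV7-9 call from the more abundant junction. This is why per-junction V, D, and J columns cannot all survive amino acid aggregation.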
Arguments
- study_table
A tibble consisting of antigen receptor sequencing data imported by the LymphoSeq2 function readImmunoSeq(). "junction_aa", "duplicate_count", and "duplicate_frequency" are required columns.
- aggregate
Indicates whether the values of "duplicate_count" and "duplicate_frequency" should be aggregated by amino acid or junction sequence. Acceptable values are "junction_aa" or "junction".
- prevalence
A Boolean value.
TRUE: Add a new column to the study table giving the prevalence of each CDR3 amino acid sequence in 55 healthy donor peripheral blood samples.
FALSE (the default): Do not add prevalence information.
Value
Returns a tibble of productive sequences, aggregated according to the aggregate argument, with recomputed values for "duplicate_count" and "duplicate_frequency". A productive sequence is defined as one that is in frame and does not contain a premature stop codon.
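The productivity rule above can be approximated as follows. This sketch is an assumption rather than the package's own check: the helper is_productive is hypothetical, and it treats a junction as in frame when its nucleotide length is a multiple of three, and as free of stop codons when junction_aa is present and contains no "*".

```r
# Hypothetical helper, not the LymphoSeq2 implementation.
# Assumes columns junction (nucleotides) and junction_aa ("*" marks a stop codon).
is_productive <- function(df) {
  # In frame: the junction length is a whole number of codons
  in_frame <- !is.na(df$junction) & nchar(df$junction) %% 3 == 0
  # A translated CDR3 amino acid sequence must be present
  has_aa <- !is.na(df$junction_aa) & df$junction_aa != ""
  # No stop codon within the translated CDR3
  no_stop <- !grepl("*", df$junction_aa, fixed = TRUE)
  in_frame & has_aa & no_stop
}

clones <- data.frame(
  junction = c("TGTGCCAGCAGCTTTTTC", "TGTGCCAGCAGCTTTTT", "TGTGCCTAGAGCTTTTTC"),
  junction_aa = c("CASSFF", NA, "CA*SFF"),
  stringsAsFactors = FALSE
)
productive <- clones[is_productive(clones), ]
```

Here only the first junction is kept: the second is 17 nucleotides long (out of frame) and the third contains the TAG stop codon.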
Examples
file_path <- system.file("extdata", "TCRB_sequencing",
  package = "LymphoSeq2")
study_table <- LymphoSeq2::readImmunoSeq(path = file_path, threads = 1)
study_table <- LymphoSeq2::topSeqs(study_table, top = 100)
amino_table <- LymphoSeq2::productiveSeq(
  study_table = study_table,
  aggregate = "junction_aa",
  prevalence = TRUE
)
nucleotide_table <- LymphoSeq2::productiveSeq(
  study_table = study_table,
  aggregate = "junction",
  prevalence = FALSE
)