Perform multiple sequence alignment using one of three methods and output results to the console or as a pdf file. One may perform the alignment of all amino acid or nucleotide sequences in a single repertoire_id. Alternatively, one may search for a given sequence within a list of samples using an edit distance threshold.
Usage
alignSeq(
study_table,
repertoire_ids = NULL,
sequence_list = NULL,
edit_distance = 15,
type = "junction",
method = "ClustalOmega",
top = 150
)Arguments
- study_table
A tibble consisting of antigen receptor sequences imported by the LymphoSeq2 function
readImmunoSeq().- repertoire_ids
A character vector indicating the name of the repertoire_id in the productive sequence list.
- sequence_list
A character vector of one ore more amino acid or nucleotide CDR3 sequences to search.
- edit_distance
An integer giving the minimum edit distance that the sequence must be less than or equal to. See details below.
- type
A character vector indicating whether "junction_aa" or "junction" sequences should be aligned. If "junction_aa" is specified, then run
productiveSeq()first.- method
A character vector indicating the multiple sequence alignment method to be used. Refer to the Bioconductor "msa" package for more details. Options include "ClustalW", "ClustalOmega", and "Muscle".
- top
The number of top sequences to perform alignment.
Details
Edit distance is a way of quantifying how dissimilar two sequences are to one another by counting the minimum number of operations required to transform one sequence into the other. For example, an edit distance of 0 means the sequences are identical and an edit distance of 1 indicates that the sequences different by a single amino acid or junction.
See also
If having trouble saving pdf files, refer to Bioconductor package msa for installation instructions http://bioconductor.org/packages/release/bioc/vignettes/msa/inst/doc/msa.pdf
Examples
file_path <- system.file("extdata", "IGH_sequencing", package = "LymphoSeq2")
study_table <- LymphoSeq2::readImmunoSeq(path = file_path, threads = 1)
#> Dataset Analysis:
#> Files: 10, Total: 0.00 GB, Largest: 0.0 MB
#> Available memory: 14.6 GB
study_table <- LymphoSeq2::topSeqs(study_table, top = 100)
nucleotide_table <- LymphoSeq2::productiveSeq(study_table, aggregate = "junction")
LymphoSeq2::alignSeq(nucleotide_table,
repertoire_ids = "IGH_MVQ92552A_BL", type = "junction",
method = "ClustalW"
)
#> use default substitution matrix
#> CLUSTAL 2.1
#>
#> Call:
#> msa::msa(string_list, method = method)
#>
#> MsaDNAMultipleAlignment with 43 rows and 179 columns
#> aln names
#> [1] -------------------------...GTACTTCGATCTCTGGGGCCGTGGC IGH_MVQ92552A_BL_33
#> [2] -------------------------...GTACTTCGATCTCTGGGGCCCTGGC IGH_MVQ92552A_BL_43
#> [3] -------------------------...GTACTTCGATCTCTGGGGCCGTGGC IGH_MVQ92552A_BL_9
#> [4] -------------------------...CGGTATGGACGTCTGGGGCCAAGGG IGH_MVQ92552A_BL_14
#> [5] -------------------------...CGGTTTGGACGTCTGGGGCCAAGGG IGH_MVQ92552A_BL_27
#> [6] -------------------------...CGTTTGGGACTACTGGGGCCAGGGA IGH_MVQ92552A_BL_28
#> [7] -------------------------...CTACTTTGACTACTGGGGCCAGGGA IGH_MVQ92552A_BL_32
#> [8] -------------------------...TTGCTATAGGCCAAGGGGCCAGGGA IGH_MVQ92552A_BL_22
#> [9] -------------------------...CTACTTCGACCCCTGGGGCCAGGGA IGH_MVQ92552A_BL_36
#> ... ...
#> [36] ------------------------G...-GGTATGGACGTCTGGGGCCAAGGG IGH_MVQ92552A_BL_39
#> [37] ---------------------CGCG...CTACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_25
#> [38] ------------------CTCCAGA...-TACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_29
#> [39] ---------------------AGCC...ACATCGTGACTCCTGGGGCCAGGGA IGH_MVQ92552A_BL_4
#> [40] ------------------CCCAAGT...TGCTTG-GAGCACTGGGGCCAGGGC IGH_MVQ92552A_BL_23
#> [41] ------------------------A...GTACTTTGACTACTGGGGCCAGGGA IGH_MVQ92552A_BL_2
#> [42] -----------TCTCCATCTGTAGA...CGACTTTGACTACTGGGGCCAGGGA IGH_MVQ92552A_BL_42
#> [43] ----------------------GCC...CTACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_38
#> Con ----------------------???...???CTT?GAC??CTGGGGCCAGGGA Consensus
