Align multiple sequences

Perform multiple sequence alignment using one of three methods and output the results to the console or as a PDF file. One may align all amino acid or nucleotide sequences in a single repertoire_id. Alternatively, one may search for a given sequence within a list of samples using an edit distance threshold.

Usage

alignSeq(
  study_table,
  repertoire_ids = NULL,
  sequence_list = NULL,
  edit_distance = 15,
  type = "junction",
  method = "ClustalOmega",
  top = 150
)

Arguments

study_table: A tibble consisting of antigen receptor sequences imported by the LymphoSeq2 function readImmunoSeq().
repertoire_ids: A character vector indicating the name of the repertoire_id in the productive sequence list.
sequence_list: A character vector of one or more amino acid or nucleotide CDR3 sequences to search.
edit_distance: An integer giving the maximum edit distance allowed between a sequence and the search sequence; sequences with an edit distance less than or equal to this value are kept. See details below.
type: A character vector indicating whether "junction_aa" or "junction" sequences should be aligned. If "junction_aa" is specified, then run productiveSeq() first.
method: A character vector indicating the multiple sequence alignment method to be used. Refer to the Bioconductor "msa" package for more details. Options include "ClustalW", "ClustalOmega", and "Muscle".
top: The number of top sequences to include in the alignment.

Value

Returns a multiple sequence alignment object.

Details

Edit distance is a way of quantifying how dissimilar two sequences are to one another by counting the minimum number of operations required to transform one sequence into the other. For example, an edit distance of 0 means the sequences are identical, and an edit distance of 1 indicates that the sequences differ by a single amino acid or nucleotide.
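
The idea of an edit distance threshold can be illustrated with base R alone. The sketch below is not taken from alignSeq(); it assumes a Levenshtein-style distance as computed by utils::adist(), and the sequences and the threshold of 1 are hypothetical values chosen only for illustration. The distance used internally by alignSeq() may differ.

```r
# Illustrative sketch only: utils::adist() computes the generalized
# Levenshtein distance; this is an assumption, not alignSeq()'s own code.
query <- "CASSLGQGYEQYF" # hypothetical CDR3 amino acid sequence

candidates <- c(
  "CASSLGQGYEQYF", # identical to the query
  "CASSLGQGYEQFF", # one substitution (Y -> F)
  "CASSLGGYEQYF",  # one deletion (Q removed)
  "CASSPGTGYEQYF"  # two substitutions (L -> P, Q -> T)
)

# adist() returns a matrix; flatten it to one distance per candidate
distances <- as.vector(utils::adist(query, candidates))
names(distances) <- candidates
print(distances)

# Keep only candidates whose distance is at most the (hypothetical) threshold
edit_distance <- 1
hits <- candidates[distances <= edit_distance]
print(hits)
```

With a threshold of 1, the first three candidates are retained and the fourth, which differs from the query by two substitutions, is dropped.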

Examples

file_path <- system.file("extdata", "IGH_sequencing", package = "LymphoSeq2")
study_table <- LymphoSeq2::readImmunoSeq(path = file_path, threads = 1)
#> Registered S3 methods overwritten by 'readr':
#>   method                    from 
#>   as.data.frame.spec_tbl_df vroom
#>   as_tibble.spec_tbl_df     vroom
#>   format.col_spec           vroom
#>   print.col_spec            vroom
#>   print.collector           vroom
#>   print.date_names          vroom
#>   print.locale              vroom
#>   str.col_spec              vroom
study_table <- LymphoSeq2::topSeqs(study_table, top = 100)
nucleotide_table <- LymphoSeq2::productiveSeq(study_table, aggregate = "junction")
LymphoSeq2::alignSeq(nucleotide_table,
  repertoire_ids = "IGH_MVQ92552A_BL", type = "junction",
  method = "ClustalW"
)
#> use default substitution matrix
#> CLUSTAL 2.1  
#> 
#> Call:
#>    msa::msa(string_list, method = method)
#> 
#> MsaDNAMultipleAlignment with 42 rows and 179 columns
#>      aln                                                   names
#>  [1] -------------------------...CCACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_23
#>  [2] -------------------------...CGCTATGGACGTCTGGGGCCAAGGG IGH_MVQ92552A_BL_36
#>  [3] -------------------------...CTACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_40
#>  [4] -------------------------...CCACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_6
#>  [5] ------------------------G...-GGTATGGACGTCTGGGGCCAAGGG IGH_MVQ92552A_BL_38
#>  [6] ---------------------CGCG...CTACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_24
#>  [7] ------------------------A...TGCTTTTGATGTTTGGGGCCAAGGG IGH_MVQ92552A_BL_3
#>  [8] -------------------------...CGCTATGGACGTCTGGGGCCAAGGG IGH_MVQ92552A_BL_11
#>  [9] -------------------------...CTACTTTGACGACTGGGGCCAGGGA IGH_MVQ92552A_BL_8 
#>  ... ...
#> [35] ------------CACCATCTCCAGA...--TCTTTGAATACTGGGGCCAGGGA IGH_MVQ92552A_BL_12
#> [36] ---------AGTCACGATTACCGCG...--GTTCGGGGAATTGGGGCCAGGGA IGH_MVQ92552A_BL_5
#> [37] ------------------GACAACA...--CTTTTGATTTTTGGGGCCAAGGG IGH_MVQ92552A_BL_34
#> [38] ---------------CATGACCAGG...--ACTTTGACTACTGGGGCCAGGGA IGH_MVQ92552A_BL_21
#> [39] ---------------------CGCG...--GCTTTGACCAGTGGGGCCAGGGA IGH_MVQ92552A_BL_25
#> [40] ------------------CTCCAGA...--TCCTCGACTATTGGGGCCAGGGA IGH_MVQ92552A_BL_29
#> [41] ------------------CTCCAGA...--ACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_28
#> [42] ----------------------GCC...CTACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_37
#>  Con ----------------------???...--?CTT?GAC?ACTGGGGCCAGGGA Consensus
