Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.19.344986v1?rss=1 Authors: Morgenstern, B., Blanke, M. Abstract: Phylogenetic placement is the task of placing a query sequence of unknown taxonomic origin into a given phylogenetic tree of a set of reference sequences. Several approaches to phylogenetic placement have been proposed in recent years. The most accurate of them need a multiple alignment of the reference sequences as input. Most of them also need alignments of the query sequences to the multiple alignment of the reference sequences. A major field of application of phylogenetic placement is taxonomic read assignment in metagenomics. Herein, we propose App-SpaM, an efficient alignment-free algorithm for phylogenetic placement of short sequencing reads on a tree of a set of reference genomes. App-SpaM is based on the Filtered Spaced Word Matches approach that we previously developed. Unlike other methods, our approach neither requires a multiple alignment of the reference genomes, nor alignments of the queries to the reference sequences. Moreover, App-SpaM works not only on assembled reference genomes, but can also take reference taxa as input for which only unassembled read sequences are available. The quality of the results achieved with App-SpaM is comparable to the best available approaches to phylogenetic placement. However, since App-SpaM is not based on sequence alignment, it is between one and two orders of magnitude faster than those existing methods. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.06.327585v1?rss=1 Authors: Ju, F., Zhu, J., Shao, B., Kong, L., Liu, T.-Y., Zheng, W.-M., Bu, D. Abstract: Protein functions are largely determined by the final details of their tertiary structures, and the structures could be accurately reconstructed based on inter-residue distances. Residue co-evolution has become the primary principle for estimating inter-residue distances since the residues in close spatial proximity tend to co-evolve. The widely-used approaches infer residue co-evolution using an indirect strategy, i.e., they first extract from the multiple sequence alignment (MSA) of query protein some handcrafted features, say, co-variance matrix, and then infer residue co-evolution using these features rather than the raw information carried by MSA. This indirect strategy always leads to considerable information loss and inaccurate estimation of inter-residue distances. Here, we report a deep neural network framework (called CopulaNet) to learn residue co-evolution directly from MSA without any handcrafted features. The CopulaNet consists of two key elements: i) an encoder to model context-specific mutation for each residue, and ii) an aggregator to model correlations among residues and thereafter infer residue co-evolutions. Using the CASP13 (the 13th Critical Assessment of Protein Structure Prediction) target proteins as representatives, we demonstrated the successful application of CopulaNet for estimating inter-residue distances and further predicting protein tertiary structure with improved accuracy and efficiency. Head-to-head comparison suggested that for 24 out of the 31 free modeling CASP13 domains, ProFOLD outperformed AlphaFold, one of the state-of-the-art prediction approaches.
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.07.241729v1?rss=1 Authors: Edgar, R. C., Taylor, J., Altman, T., Barbera, P., Meleshko, D., Lin, V., Lohr, D., Novakovsky, G., Al-Shayeb, B., Banfield, J., Korobeynikov, A., Chikhi, R., Babaian, A. Abstract: Public sequence data represents a major opportunity for viral discovery, but its exploration has been inhibited by a lack of efficient methods for searching this corpus, which is currently at the petabase scale and growing exponentially. To address the ongoing pandemic caused by Severe Acute Respiratory Syndrome Coronavirus 2 and expand the known sequence diversity of viruses, we aligned pangenomes for coronaviruses (CoV) and other viral families to 5.6 petabases of public sequencing data from 3.8 million biologically diverse samples. To implement this strategy, we developed a cloud computing architecture, `Serratus`, tailored for ultra-high throughput sequence alignment at the petabase scale. From this search, we identified and assembled thousands of CoV and CoV-like genomes and genome fragments ranging from known strains to putatively novel genera. We generalise this strategy to other viral families, identifying several novel deltaviruses and huge bacteriophages. To catalyse a new era of viral discovery we made millions of viral alignments and family identifications freely available to the research community (https://serratus.io). Expanding the known diversity and zoonotic reservoirs of CoV and other emerging pathogens can accelerate vaccine and therapeutic developments for the current pandemic, and help us anticipate and mitigate future ones.
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.07.242305v1?rss=1 Authors: Roe, D., Vierra-Green, C., Pyo, C.-W., Geraghty, D. E., Spellman, S. R., Maiers, M., Kuang, R. Abstract: Human chromosome 19q13.4 contains genes encoding killer-cell immunoglobulin-like receptors (KIR). Reported haplotype lengths range from 67 to 269 kilobases and contain 4 to 18 genes. The region has certain properties such as single nucleotide variation, structural variation, homology, and repetitive elements that make it hard to align accurately beyond single gene alleles. To the best of our knowledge, a multiple sequence alignment of KIR haplotypes has never been published or presented. Such an alignment would be useful to precisely define KIR haplotypes and loci, provide context for assigning alleles (especially fusion alleles) to genes, infer evolutionary history, impute alleles, interpret and predict co-expression, and generate markers. In order to extend the framework of KIR haplotype sequences in the human genome reference, 27 new sequences were generated including 24 haplotypes from 12 individuals of African American ancestry that were selected for genotypic diversity and novelty to the reference, to bring the total to 68 full length genomic KIR haplotype sequences. We leveraged these data and tools from our long-read KIR haplotype assembly algorithm to define and align KIR haplotypes at
Alfie Abdul-Rahman (Oxford e-Research Centre, University of Oxford) gives a talk for the 2016 Digital Humanities at Oxford Summer School. In this lecture, I will present a web-based visual analytics approach for detecting similarity between texts. ViTA: Visualization for Text Alignment is the result of our “Commonplace Cultures: Mining Shared Passages in the 18th Century using Sequence Alignment and Visual Analytics” project under the Digging into Data Challenge Program (III), and it is a collaboration between the University of Oxford, the University of Chicago, and the Australian National University. The team comprises computer scientists and domain experts in the fields of literary studies, intellectual history, and digital humanities. ViTA is a web-based visual analytics approach that allows domain experts to construct and modify a text alignment pipeline by visualizing the tools and connections for a specific method in conjunction with testing inputs and outputs. The construction of the text alignment pipeline is similar to that of an image processing pipeline. Because the approach was embedded directly in the context of 18th century print culture, it was developed in an interdisciplinary manner and was evaluated in intensive meetings with the domain experts at the design stage as well as after prototyping.
Bam.iobio is the first app of its kind that allows scientists to analyze genome sequence data in their web browser, interactively and in real time, without having to rely on terabytes of storage and vast sources of computing power. The resource, developed by a team led by Gabor Marth, DSc, co-director of the USTAR Center for Genetic Discovery and human genetics professor at the University of Utah, was published in the journal Nature Methods on Nov. 25. Marth explains bam.iobio, its features, another ready-to-use app, vcf.iobio, for analyzing variant call format files, and plans to allow developers to build off the IOBIO operating system. Learn more about bam.iobio.
The power of Scala keeps showing up in expected--and unexpected--places. In this talk, I will share how I used Scala in bioinformatics, neuroscience, online games, mobile applications, and desktop applications. In bioinformatics, the beauty of Scala really shines. Thanks to the abstraction in the Scala collection framework, creating collection types for DNA, RNA, and Protein Chains is super easy. With the proper collections at my disposal, I was able to implement algorithms such as Sequence Alignment, Genome Assembly, and Gibbs Sampling by focusing on the core ideas of the algorithm and not on the nitty-gritty details. On top of that, Scala makes dynamic programming, which is fundamental in many bioinformatics algorithms, easy to implement and understand. Last year, I talked about using brainwaves to play games. Since then, I have taken things up a notch and started working to solve more practical problems such as maintaining attention while performing tasks (something that my age group suffers from a lot). This exposed me to a whole new and exciting world of Digital Signal Processing. This project benefited a lot from Reactive programming techniques, which helped clean up my code and make it more reliable. Not to be outdone by the serious work of algorithms, I took the desktop version of Collidium and converted it to an online HTML5 Game using Scala.js. I have also improved my Android game, Wordsteal. I even used Scala with Swing to save a few bucks for our budget-starved Student Store by creating a simple cash register application. In this talk, I will share many interesting techniques I learned and convey why I continue to be a huge fan of Scala. See you there! Shadaj is a 13-year-old who loves to program. He has programmed in Logo, NXT Mindstorm, Ruby, Python, and C. However, he loves programming in Scala. Besides programming, he likes Math and Science. When he grows up, he wants to be a robotic scientist.
Shadaj hosts his projects on GitHub and has a channel on YouTube. He has presented at Scala Days 2012 and the Bay Area Scala Enthusiast group showing his Scala projects. When not doing his school work or programming, he plays guitar, sitar, and games, some of which he created. To view the video visit www.parleys.com. To get started building reactive apps visit: http://www.typesafe.com/activator.
Describes how to perform a multiple sequence alignment of protein amino acid sequences.
Blackburne, B (Manchester) Wednesday 22 June 2011, 15:00-15:20
Use of multiple sequence alignment to build a model of a set of biologically related sequences. Profiles, log-odds ratios.
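As a rough illustration of the profile idea described above, the following Python sketch builds a log-odds profile from a small gap-free alignment and scores a sequence against it. The uniform background distribution, the pseudocount of 1, the DNA alphabet, and the function names `log_odds_profile` and `score_against_profile` are all assumptions made for this example, not details taken from the lecture.

```python
import math


def log_odds_profile(alignment, alphabet="ACGT", pseudocount=1.0):
    """Return one dict per column mapping each symbol to log2(freq / background).

    Assumes a gap-free alignment of equal-length rows and a uniform background.
    """
    background = 1.0 / len(alphabet)
    profile = []
    for column in zip(*alignment):
        # Pseudocounts keep unseen symbols from producing log(0).
        total = len(column) + pseudocount * len(alphabet)
        profile.append({
            symbol: math.log2(((column.count(symbol) + pseudocount) / total) / background)
            for symbol in alphabet
        })
    return profile


def score_against_profile(profile, sequence):
    """Sum the per-column log-odds values for a sequence of the profile's length."""
    return sum(column[symbol] for column, symbol in zip(profile, sequence))


if __name__ == "__main__":
    profile = log_odds_profile(["ACGT", "ACGA", "ACGT"])
    print(score_against_profile(profile, "ACGT"))
    print(score_against_profile(profile, "TTTT"))
```

A positive entry means the symbol is more frequent in that column than the background would predict; a sequence that resembles the aligned family therefore accumulates a higher total score.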
The center tree method and analysis; progressive alignment, guide trees, CLUSTAL, uses of multiple alignment
Continuation of multiple sequence alignment; sum-of-pairs objective function; tree consistency theorem; factor-of-two approximation.
Start of the discussion of multiple sequence alignment: the sum-of-pairs objective function, the dynamic programming solution for three sequences, and the program MSA.
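The sum-of-pairs objective mentioned above can be sketched in a few lines of Python. This is a minimal illustration under assumed costs (match 0, mismatch 1, symbol against gap 1, gap against gap 0); the function name `sum_of_pairs` and the cost values are hypothetical choices for the example, and the lecture may use a different scoring scheme.

```python
from itertools import combinations


def sum_of_pairs(alignment, mismatch=1, gap=1):
    """Sum the pairwise column costs over every pair of rows in the alignment."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of the alignment must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == y:
                continue  # identical symbols, including gap against gap, cost 0
            total += gap if "-" in (x, y) else mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-T", "AG-T", "ACGT"]))
```

Under this distance-style convention a lower value indicates a better multiple alignment.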
In-depth treatment of local alignment using dynamic programming.
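For orientation, here is a small Python sketch of local alignment by dynamic programming in the style usually attributed to Smith and Waterman. The scoring values (match 2, mismatch -1, linear gap -1) and the function name `local_align` are assumptions for illustration only.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned piece of a, aligned piece of b)."""
    n, m = len(a), len(b)
    # table[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(0,
                              table[i - 1][j - 1] + sub,
                              table[i - 1][j] + gap,
                              table[i][j - 1] + gap)
            if table[i][j] > best:
                best, best_pos = table[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and table[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if table[i][j] == table[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif table[i][j] == table[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(local_align("XXXACGTXXX", "YYACGTYY"))
```

The key difference from global alignment is the zero in the maximum, which lets an alignment start afresh anywhere instead of paying for poorly matching prefixes.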
Continuation of the discussion of how to compute similarity and optimal sequence alignment using dynamic programming. Local as well as global alignment.
Continuation of the discussion of how to efficiently compute the similarity of two sequences. Introduction to the traceback operation to find the optimal alignment.
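The traceback operation mentioned above can be sketched as follows: fill the dynamic programming table for global alignment, then walk back from the bottom-right cell to recover one optimal alignment. The scoring values (match 1, mismatch -1, linear gap -1) and the function name `global_align` are assumptions for this example rather than the lecture's own parameters.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (optimal score, gapped a, gapped b) for a global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a prefix of a aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # a prefix of b aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback: at each cell, follow a predecessor that explains its value.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(global_align("ACGT", "AGT"))
```

When several predecessors explain a cell, this sketch prefers the diagonal move; other tie-breaking rules yield different alignments with the same optimal score.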
Review of the definition of sequence similarity and extensions of the model for greater biological fidelity. Introduction to parametric sequence alignment.
Definition of sequence similarity and string alignment; counting alignments; the need for fast computation.
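To give a feel for why fast computation is needed, the Python sketch below counts alignments of two sequences under one common convention: an alignment is a sequence of columns, each of which pairs two characters, pairs a character of the first sequence with a gap, or pairs a gap with a character of the second. This convention and the function name `count_alignments` are assumptions for the example; the lecture may count alignments differently.

```python
def count_alignments(n, m):
    """Count alignments of sequences of lengths n and m (columns: pair, gap-in-b, gap-in-a)."""
    # row[j] holds the count for the current prefix of the first sequence
    # and a prefix of length j of the second; empty prefixes give exactly 1.
    row = [1] * (m + 1)
    for _ in range(n):
        new_row = [1]
        for j in range(1, m + 1):
            # last column is a gap in b, a gap in a, or a pair of characters
            new_row.append(row[j] + new_row[j - 1] + row[j - 1])
        row = new_row
    return row[m]


if __name__ == "__main__":
    for length in (1, 2, 3, 10, 20):
        print(length, count_alignments(length, length))
```

The counts grow exponentially with sequence length, so enumerating all alignments is hopeless and dynamic programming is needed instead.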
IBM software engineer and musician Paul Reiners previews his featured article on Dynamic programming and sequence alignment, looking at how computer science aids molecular biology. Also, check out Paul's dW Space on music programming and algorithmic composition.