Microbial Bioinformatics is a rapidly changing field marrying computer science and microbiology. Join us as we share some tips and tricks we’ve learnt over the years. If you’re a student just getting to grips with the field, or someone who just wants to keep tabs on the latest and greatest, this podcast is for you. The hosts are Dr. Lee Katz from the Centers for Disease Control and Prevention (US), and Dr. Nabil-Fareed Alikhan and Dr. Andrew Page, both from Quadram Institute Bioscience (UK), who bring together years of experience in microbial bioinformatics. The opinions expressed here are our own and do not necessarily reflect the views of the Centers for Disease Control and Prevention or Quadram Institute Bioscience. Intro music: Werq - Kevin MacLeod (incompetech.com) Licensed under Creative Commons: By Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/ Outro music: Scheming Weasel (faster version) - Kevin MacLeod (incompetech.com) Licensed under Creative Commons: By Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/ Questions and comments? microbinfie@gmail.com
In this episode of the Micro Binfie Podcast, host Andrew Page takes listeners to the heart of the microbial genomics hackathon in Bethesda, Maryland, for an engaging conversation with special guest Megan Phillips, a PhD student from Emory University. Megan delves into her research on Staphylococcus aureus (including MRSA), highlighting its dual nature as both a harmless commensal and a potentially serious pathogen. Megan discusses the complexities of tetracycline resistance, focusing on plasmid-mediated mechanisms involving the pT181 plasmid. She explains how the efflux pump encoded by this plasmid's tetK gene contributes to variable resistance levels, and which factors influence variability in the MIC (minimum inhibitory concentration). Listeners will learn about plasmid copy number, the plasmid's global spread across clonal complexes, and the occurrence of horizontal and vertical gene transfer. Throughout the episode, Megan shares insights on working with short-read sequencing data and the strategies she employs to detect plasmid presence using tools like BLAST. She also touches on the challenges and discoveries involved in tracking historical sample data and integrating findings from older research papers, and shares her appreciation for the poetic style of scientific writing from the 1940s. For those interested in antimicrobial resistance, evolutionary microbiology, and the subtleties of bacterial genome analysis, this episode offers a compelling blend of technical detail and engaging storytelling. Tune in to hear more about Megan's upcoming publications, her experiences navigating complex genomic data, and her thoughts on antimicrobial stewardship and historical perspectives on drug resistance.
In a two-part discussion, the hosts analyze the movie Contagion from their expert perspectives, focusing on the film's portrayal of epidemiology and genomics. They note that the movie compresses timelines for dramatic effect, speeding up the virus's spread and the response to it, and that decisions about managing a crisis are based on societal values, not just science. In part one, the hosts discuss the movie's explanation of the R0 value. They note that while this explanation is useful for the audience, it is unlikely that an epidemiologist would need to explain the concept to other epidemiologists. The group notes that the MEV1 virus is modeled on the real-life Nipah virus and comments on a scene where the genome of the virus is described as being 15 to 19 kilobases in length with 6 to 10 genes. They also discuss the movie's depiction of virus isolation and the unrealistic speed with which the initial assessment of the virus occurs. The podcasters touch upon the BSL4 lab and how the film depicts the scientists' behavior. They also discuss Matt Damon's character, who is exposed to the virus but does not get sick, and is an example of an asymptomatic carrier. In part two, the bioinformaticians examine the "genome dashboard" scene, noting the software's informative interface, which includes an alignment panel, protein structure, and a recombination map. They also discuss a scene where a phylogenetic tree is used to determine a change in the R0 value. They find this unrealistic because the tree is just a picture that doesn't accurately represent how a virus spreads. The group discusses how the bioinformatician is depicted in the film as moving in and out of the lab, which may have been realistic in 2011 but is less so in the present day. They discuss how the CDC is portrayed in the film, noting scenes that were filmed at the actual CDC, as well as their own experiences at the CDC.
The podcasters note the movie is very US-centric and that many international partners would be involved in solving a global pandemic.
In this episode of the Micro Binfie podcast, host Andrew Page is joined by Nikhita Puthuveetil, Senior Bioinformatician at the American Type Culture Collection (ATCC). They delve into ATCC's ambitious project of sequencing a vast array of organisms from their renowned collection, tackling the challenges of assembling complex genomes from bacteria, viruses, fungi, and more. Discover how Nikhita and her team navigate through genomic roadblocks, leverage cutting-edge sequencing technologies, and work to ensure accurate data provenance. Whether it's large viral genomes or evolving taxonomy, this episode offers a deep dive into the fascinating world of microbial bioinformatics and genomic curation.
In this episode of the Micro Binfie Podcast, host Andrew Page speaks with Dr. Brooke Talbot, a recent PhD graduate from Emory University, about her research on Staphylococcus aureus, with a focus on MRSA and antibiotic resistance. Brooke shares insights into her molecular epidemiology work, discussing the complexities of tracking resistant bacterial strains in clinical settings and the significance of genomic epidemiology in public health. From honeybees to foodborne outbreaks, Brooke's diverse research background offers listeners a fascinating journey through science, microbiology, and epidemiology.
In this episode of the Micro Binfie podcast, host Andrew Page talks with Dr. Erin Young, a bioinformatician at the Utah Public Health Laboratory, recorded during the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. Erin shares her journey from researching hereditary cancer predisposition to her current role in public health bioinformatics, which she entered through a prestigious CDC and APHL fellowship. The conversation delves into her work with bacterial pathogens, particularly in tracking antimicrobial resistance in organisms like Klebsiella. Erin discusses the tools she uses for genome typing, such as MASH, FastANI, and SKA, and her innovative research on the accuracy of long-read sequencing technologies like Nanopore for detecting antimicrobial resistance genes. She also provides a preview of her upcoming poster for ASM, where she examines how Nanopore reads can be used effectively in public health microbiology. This episode offers a fascinating look at how bioinformatics and genomics are advancing the fight against infectious diseases.
In this episode of the Micro Binfie podcast, host Andrew Page is live from the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. He sits down with David Mahoney, a PhD student from Dalhousie University in Halifax, Nova Scotia. David shares his research on characterizing antimicrobial resistance (AMR) genes and their transfer within metagenomes, focusing on metagenomic assembly graphs. They delve into David's background in food safety microbiology and his interest in the public health implications of genomics. He explains his exciting work on analyzing how AMR genes transfer across different environments, such as food production plants and clinical settings, using both new and existing data from Canada's Genomics Research and Development Initiative. David also highlights his use of innovative methods like assembly graphs and graph-based approaches to uncover AMR gene flow and lateral gene transfers, including the potential of machine learning techniques such as graph convolutional neural networks.
In this episode of the Micro Binfie Podcast, host Andrew Page catches up with Torsten Seemann at the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. They discuss the rapid evolution of bioinformatics, the challenges faced by labs worldwide, and the explosion of tools post-COVID. Torsten shares insights into his work at Melbourne's Microbiological Diagnostic Unit (MDU), the development of platforms like OzTracker for bacterial genomics, and how his lab plays a national and international role in data sharing. The conversation dives into the future of the widely-used variant calling tool Snippy, as Torsten reveals exciting updates funded by the Chan Zuckerberg Initiative, including nanopore read support and the ability to process pre-assembled genomes. They also explore the importance of maintaining open-source bioinformatics tools to prevent them from becoming obsolete. Tune in for an in-depth discussion on the state of genomics, software development, and the challenges and rewards of open-source collaboration.
In this episode of the Micro Binfie Podcast, host Andrew Page sits down with Tim Dallman at the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. Tim shares insights from his work at Utrecht University in the Netherlands, where he focuses on genomic surveillance and machine learning models to predict disease risk and severity. They discuss the challenges of integrating genomic variation into predictive models, the importance of high-quality metadata, and the complexities of working with pathogens like Shiga toxin-producing E. coli. Tim also talks about his role at the WHO Pandemic and Epidemic Intelligence Hub and how global collaboration can drive innovation in public health genomics. Tune in to hear about cutting-edge research, the importance of interdisciplinary teamwork, and how genomic data can be harnessed for future pandemic preparedness.
Host Andrew Page is joined by Robert Petit from the Wyoming Public Health Laboratory. Robert, a key developer of the Bactopia pipeline, shares insights into how this end-to-end tool is transforming bacterial genomic surveillance. They dive into the origins of Bactopia, its applications in public health, and Robert's experience leading genomic projects in a rural setting. Discover how Bactopia streamlines pathogen detection, improves documentation, and integrates with other tools to deliver fast and accurate results. Listen in as they discuss new innovations in bioinformatics, including visualizations and human-read filtering, and explore future projects like CamelHUMP, designed to simplify sequence-based typing. Recorded live at the Microbial Bioinformatics Hackathon in Bethesda, Maryland, this episode brings you the latest in pathogen genomics and the challenges and rewards of working on the frontier of public health.
Andrew and Lee talk with Christine and Cynney about the Haiti cholera outbreak.
Cynney Walters: https://www.linkedin.com/in/cynney-walters-763111190
Walters et al., "Genome sequences from a reemergence of Vibrio cholerae in Haiti, 2022 reveal relatedness to previously circulating strains" https://journals.asm.org/doi/abs/10.1128/jcm.00142-23
Nabil and Lee have a quick chat about minimum spanning trees (MST). Eburst paper: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-152
We go over tree visualizations!
* Microreact https://microreact.org/, https://www.phylocanvas.gl/
* GrapeTree https://github.com/achtman-lab/GrapeTree
* Auspice/NextStrain https://nextstrain.org/
* Taxonium https://taxonium.org/
* iTOL https://itol.embl.de/
* PhyloViz https://online.phyloviz.net/index
* Phandango https://jameshadfield.github.io/phandango/#/main
Kostas Konstantinidis returns to talk to us about ANI and metagenomics.
We talk with Kostas! For more information please visit https://enve-omics.gatech.edu/
In this episode of the Micro Binfie Podcast, hosts Dr. Andrew Page and Dr. Lee Katz delve into the fascinating world of hash databases and their application in cgMLST (core genome Multilocus Sequence Typing) for microbial bioinformatics. The discussion begins with the challenges faced by bioinformaticians due to siloed MLST databases across the globe, which hinder synchronization and effective genomic surveillance. To address these issues, the concept of using hash databases for allele identification is introduced. Hashing allows for the creation of unique identifiers for genetic sequences, enabling easier database synchronization without the need for extensive system support or resources. Dr. Katz explains the principle of hashing and its application in genomics, where even a single nucleotide polymorphism (SNP) can result in a different hash, making it a perfect solution for distinguishing alleles. Various hashing algorithms, such as MD5 and SHA-256, are discussed, along with their advantages and potential risks of hash collisions. Despite these risks, the use of more complex hashes has been shown to significantly reduce the probability of such collisions. The episode also explores practical aspects of implementing hash databases in bioinformatics software, highlighting the need for exact matching algorithms due to the nature of hashing. Existing tools like eToKi and upcoming software are mentioned as examples of applications that can utilize hash databases. Furthermore, the conversation touches on the concept of sequence types in cgMLST and the challenges associated with naming and standardizing them in a decentralized database system. Alternatives like allele codes are mentioned, which could potentially simplify the representation of sequence types. 
Finally, the potential for adopting this hashing approach within larger bioinformatics organizations like PHA4GE or GMI is discussed, with an emphasis on the need for a standardized and community-supported framework to ensure the longevity and effectiveness of hash databases in microbial genomics. This episode provides a comprehensive overview of how hash databases could change microbial genomics by solving long-standing issues of database synchronization and allele identification, paving the way for more efficient and collaborative genomic surveillance worldwide.
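As a rough illustration of the allele-hashing idea described in this episode, the sketch below derives an identifier directly from the allele sequence, so two labs can arrive at the same identifier without sharing a central database. The sequences, the `allele_hash` helper, and the dictionary standing in for a database are all hypothetical; upper-casing the sequence before hashing is an assumption, not a convention taken from the episode.

```python
import hashlib


def allele_hash(sequence: str, algorithm: str = "sha256") -> str:
    """Return a hex digest that serves as an identifier for an allele sequence."""
    # Normalise case and whitespace so the same allele always hashes the same way
    # (this normalisation is an assumption made for the sketch).
    normalised = sequence.strip().upper()
    return hashlib.new(algorithm, normalised.encode("ascii")).hexdigest()


# Two made-up alleles that differ by a single SNP at the last base.
allele_a = "ATGGCTAAAGGTCTGACC"
allele_b = "ATGGCTAAAGGTCTGACT"

hash_a = allele_hash(allele_a)
hash_b = allele_hash(allele_b)

# A toy "database": lookups are exact matches on the hash, nothing fuzzy.
known_alleles = {hash_a: "known allele (hypothetical label)"}

print(known_alleles.get(hash_a, "novel allele"))
print(known_alleles.get(hash_b, "novel allele"))

# MD5 gives a 128-bit digest and SHA-256 a 256-bit digest, so the hex
# strings differ in length; the longer digest makes collisions less likely.
print(len(allele_hash(allele_a, "md5")), len(hash_a))
```

Because lookups are exact, any difference in the input, even one base, produces an unrelated identifier, which is why software built on hash databases needs exact allele matching.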
We discuss GAMBIT, software for accurately classifying bacteria and eukaryotes using a targeted k-mer based approach.
GAMBIT software: https://github.com/gambit-suite/gambit
GAMBIT suite: https://github.com/gambit-suite
GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification. https://doi.org/10.1371/journal.pone.0277575
TheiaEuk: a species-agnostic bioinformatics workflow for fungal genomic characterization https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1198213/full
In this episode, Andrew Page and Lee Katz continue their conversation with Titus Brown, diving deeper into his work on k-mers, Sourmash, and open source software development.
Topics discussed:
* K-mers for analyzing sequencing data, and how Sourmash builds on MinHash
* How Sourmash handles k-mers for metagenomic comparisons vs. MASH
* The modhash and bottom sketch approaches used in Sourmash
* Dealing with sequencing errors and noise in k-mer data
* Sourmash as a reference-based method, and applications for metagenomics
* Titus' focus on building reusable libraries and APIs vs one-off tools
* Recruiting collaborators through "nerd sniping" with interesting problems
* The open source philosophy that motivates Titus' software work
Overall, the conversation provides insight into Titus' approach to bioinformatics software through iterating quickly, focusing on usability, and building open source tools.
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
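For listeners who want a feel for the bottom sketch and modhash ideas mentioned in this episode's topics, here is a minimal sketch of both in their textbook forms. It is not Sourmash's code: the hash function (truncated SHA-256), the function names, and the example sequence are assumptions made for illustration, reverse complements are ignored, and Sourmash's own implementation may differ in its details.

```python
import hashlib


def kmers(sequence: str, k: int) -> set:
    """Return the set of k-mers in a sequence (forward strand only, a simplification)."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (truncated SHA-256 as a stand-in hash)."""
    digest = hashlib.sha256(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def bottom_sketch(sequence: str, k: int, num: int) -> set:
    """Bottom sketch: keep the `num` smallest k-mer hashes (fixed sketch size)."""
    hashes = sorted({hash_kmer(km) for km in kmers(sequence, k)})
    return set(hashes[:num])


def mod_sketch(sequence: str, k: int, modulus: int) -> set:
    """Modulo sketch: keep hashes divisible by `modulus` (size grows with the data)."""
    kept = set()
    for km in kmers(sequence, k):
        h = hash_kmer(km)
        if h % modulus == 0:
            kept.add(h)
    return kept


def jaccard_from_bottom(a: set, b: set, num: int) -> float:
    """Estimate Jaccard similarity from two bottom sketches of size `num`."""
    union_bottom = set(sorted(a | b)[:num])
    if not union_bottom:
        return 0.0
    return len(union_bottom & a & b) / len(union_bottom)


example = "ATGGCTAAAGGTCTGACCGGTAAACGT"  # made-up sequence
sketch = bottom_sketch(example, k=5, num=8)
print(len(sketch))
print(jaccard_from_bottom(sketch, sketch, num=8))
print(len(mod_sketch(example, k=5, modulus=1)))
```

The practical difference is that a bottom sketch has a fixed size regardless of input, while a modulo sketch keeps a roughly constant fraction of all k-mers, which matters when comparing datasets of very different sizes such as a genome and a metagenome.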
In this final episode with Titus Brown, the conversation focuses on his work scaling metagenomic search with Sourmash:
* An overview of what Sourmash does: sketching and comparing large k-mer datasets
* How the sampling approach enables analyses like containment estimation
* Exciting capabilities of the Branchwater tool for multi-threaded real-time SRA search
* Scaling to search across millions of metagenomes in seconds with WebAssembly
* Potential public health applications for tracking and sourcing pathogens
* Important caveats around resolution limits and need for follow-up analyses
* Ongoing work to characterize the technique's specificity and sensitivity
Overall, this episode highlights the massive scaling Sourmash enables for metagenomic search, and the potential use cases in public health, while acknowledging current limitations and uncertainties. Titus emphasizes the need to precisely convey what bioinformatic tools can and cannot do as research continues.
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
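The containment estimation mentioned in this episode can be illustrated with a small sketch: containment asks what fraction of a query's k-mers also appear in a (usually much larger) target. The version below samples k-mer hashes by keeping those divisible by a `modulus`, then compares the samples. The helper names, the hash function, and the toy sequences are assumptions for illustration only, and this is not how Sourmash or Branchwater are implemented.

```python
import hashlib


def sampled_hashes(sequence: str, k: int, modulus: int) -> set:
    """Hash every k-mer and keep only hashes divisible by `modulus`."""
    seq = sequence.upper()
    kept = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.sha256(seq[i:i + k].encode("ascii")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit stand-in hash
        if h % modulus == 0:
            kept.add(h)
    return kept


def containment(query: set, target: set) -> float:
    """Fraction of the query's sampled hashes that are also in the target."""
    if not query:
        return 0.0  # nothing sampled from the query, so no estimate is possible
    return len(query & target) / len(query)


# Made-up data: the "metagenome" contains the query as a substring.
metagenome = "ATGGCTAAAGGTCTGACCGGTAAACGT"
query = metagenome[5:20]
unrelated = "TTTTTTTTTT"

k = 5
# modulus=1 keeps every k-mer, so these two values are exact, not estimates.
present = containment(sampled_hashes(query, k, 1), sampled_hashes(metagenome, k, 1))
absent = containment(sampled_hashes(unrelated, k, 1), sampled_hashes(metagenome, k, 1))
print(present, absent)
```

With a larger `modulus` only a fraction of k-mers is kept, which is what makes searching very large collections feasible, at the cost of resolution on small queries. That trade-off is one reason the episode stresses follow-up analyses.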
In this episode, Andrew Page and Lee Katz continue their chat with Titus Brown, focusing on taxonomy assignment in metagenomics.
Topics discussed:
* Dealing with contamination and low quality genomes in reference databases
* Sourmash as a versatile search tool, not a curated database
* The need for high confidence in taxonomic assignment in public health
* Most microbial assignment tools have low specificity or sensitivity
* Possible ways to achieve perfect species classification (in theory)
* The challenges around defining species based on small genomic differences
* Interesting cryptography concept of 'unicity' distance for classification
* Conveying the nuances and uncertainties in taxonomic assignment
The conversation highlights the difficulties around taxonomic classification, especially at the species level, but explores ideas for improving accuracy. Overall it emphasizes the complexities of biology and the need to convey uncertainties transparently.
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
Andrew, Nabil, and Lee react to the bioinformatics and the science overall in the 1993 film Jurassic Park. We looked at these YouTube clips:
* https://youtu.be/mDTaykXudVI?si=I5aiUdBGStpIKHVC
* https://youtu.be/RLz5Api676Y?si=D6nps33O42Fmk4Ac
* https://youtu.be/0Nz8YrCC9X8?si=KNwkeFoS6Bu4LOpv
* https://youtu.be/dxIPcbmo1_U?si=TOBw5AONYVCW0JzV
* https://youtu.be/m1lc8GwBKFE?si=rj7Oiq51l2_dB6ro
More information on the secret tattoo here: https://underunderstood.com/podcast/episode/jeff-goldblums-secret-tattoo-jurassic-park-ian-malcolm/
In this episode of the Micro Binfie Podcast, Andrew Page and Lee Katz interview Titus Brown about his journey from studying math and physics as an undergrad to becoming a bioinformatician focusing on metagenomics and software development.
Topics discussed:
* Titus' background in math, physics, digital evolution research, and developmental biology
* His transition into bioinformatics to analyze the influx of genomic data in the 1990s
* Developing early tools for comparative genomics and sequence analysis
* The philosophy of creating usable software with good documentation
* Work on transcriptomics, metagenomics, and k-mers at Michigan State
* Digital normalization and dealing with large sequencing datasets
* Moving to UC Davis and continuing work on metagenomics and software like khmer and sourmash
* Thoughts on challenges around data reuse and accessibility in science
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
The podcast discusses an article co-authored by Andrew Page, examining the use of GPT-4 for research publication. The conversation focuses on the authorship of articles generated by GPT-4 and the implications for academic publishing. Authorship and Ethics: Andrew discusses the question of authorship when AI-generated content is involved in research articles. He explores the ethical implications and potential biases associated with AI-assisted writing, such as the omission of minority figures and novel discoveries. He emphasizes the importance of transparency when using AI and its potential to democratize research, as long as ethical guidelines are maintained. AI & Scientific Journals: The podcast delves into the current landscape of AI in academic publishing. It addresses the commercial use of AI in crafting manuscripts for research articles and the necessity of distinguishing between manual and AI-generated contributions. The possible misalignment of GPT-4's commercial objectives with academic goals is highlighted. Risks and Benefits: Andrew outlines the risks of using AI in publishing, such as unintentional plagiarism, biases, and outdated methods. He provides an example of bioinformatics software recommending deprecated methods, illustrating the need for caution. The conversation also touches upon the AI's potential to introduce bias unintentionally, citing past incidents where AI models quickly adopted extremist views. Andrew's co-authors, Niamh Tumelty and Sam Sheppard, bring different perspectives on ethics and the impact of AI on publishing. Niamh, associated with the London School of Economics, emphasizes ethical considerations, while Sam, editor-in-chief of Microbial Genomics, underscores the need to adapt to the reality of AI contributions in journal submissions. In conclusion, the podcast underscores the importance of recognizing and navigating the ethical challenges posed by AI in academic publishing. 
It suggests that the technology may evolve faster than policies can adapt, necessitating an ongoing conversation among researchers, publishers, and AI developers. Links: https://microbiologysociety.org/blog/microbe-talk-ai-a-useful-tool-or-dangerous-unstoppable-force.html https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.001049
We continue our conversation with Wytamma Wirth about write-the and all things AI. It starts with discussing the usage of language models, specifically ChatGPT, in writing boilerplate code, and how it can assist in generating code snippets, unit tests, and even documentation strings. The participants also explore the potential of incorporating it into code editors to make coding more efficient and less error-prone. The conversation then shifts to discuss the generation of research papers, specifically software announcements, by leveraging code documentation. The participants believe ChatGPT could be useful in generating introductions and backgrounds for such publications. They also touch upon the utility of language models in translating documentation into different human languages to assist non-native English speakers. The discussion returns to code documentation, focusing on the tool "write the docs" which auto-generates well-structured and searchable documentation websites. The participants appreciate the tool's ease of use and the potential it has in maintaining proper documentation for projects. The conversation ends with an acknowledgment of the importance of human oversight in automating tasks using language models. Links: Write-the software: https://github.com/Wytamma/write-the Wytamma Wirth: https://www.wytamma.com/
In this episode, we dive deep into the world of automated code documentation and conversion using ChatGPT through the write-the software developed by Dr Wytamma Wirth from The University of Melbourne. Our guest, an experienced software engineer, takes us on a journey through the challenges and nuances of writing code documentation and the role AI can play in easing this process. We explore the intersection of ChatGPT's capabilities with Write the Docs, a documentation system widely used by developers. From highlighting ChatGPT's ability to understand and generate code snippets, to demonstrating real-time code conversion across multiple programming languages, this episode is a treasure trove for developers looking to enhance their workflow. Whether you're a seasoned developer or just getting started, tune in to discover how the synergy of AI and coding can elevate your documentation game to the next level! Links: Write-the software: https://github.com/Wytamma/write-the Wytamma Wirth: https://www.wytamma.com/
Lee and Andrew are at the Global Microbial Identifier conference (GMI13) in Vancouver, Canada. On day 3 we catch up with Dr Ruth Timme.
Andrew and Lee are at the Global Microbial Identifier conference (GMI13) in Vancouver, Canada. We talk to Dr William Hsiao, one of the organisers of the conference.
Andrew and Lee are at the Global Microbial Identifier conference (GMI13) in Vancouver, Canada. On the first day they talked to Dr Finlay Maguire and Dr Emma Griffiths about microbial genomics and Tim Hortons.
In this episode there is a comprehensive discussion on the influence of AI, especially GPT-4, in the sphere of microbial bioinformatics. They reflect on a study testing GPT-4's problem-solving capabilities, which raises concerns about its potential impact on employment practices and academic integrity. There's speculation that AI's proficiency in tackling standard technical problems could interfere with genuinely evaluating a candidate's knowledge during interviews. Drawing parallels with calculators, the hosts deliberate on whether AI tools should be permitted during assessments. They stress the necessity for individuals to possess a deep understanding of their domain to accurately interpret and validate AI solutions. Discussing the AI's limitations, the hosts highlight its struggles with regular expressions and handling larger scripts. They observe the AI tends to loop and repeat itself, performing better with shorter scripts but faltering on more complex tasks often seen in bioinformatics. This prompts a discussion on how educators should address these developments in their teaching strategies. Moreover, the hosts explore the potential of large language models to improve base calling and read correction in sequencing, drawing on the structured and predictable nature of language and genetic code. They also discuss the idea of introducing randomness in these models to generate creative and varied solutions, potentially predicting future alleles or gene configurations. Ultimately, they express a blend of enthusiasm and apprehension towards the swift advances in this field and the ensuing implications for bioinformatics. They end on a note of anticipation for future developments, with a humorous nod towards AI's potential for automating mundane tasks like auto-correcting sample sheets. References: What Is ChatGPT Doing … and Why Does It Work? 
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
Many bioinformatics programming tasks can be automated with ChatGPT https://arxiv.org/ftp/arxiv/papers/2303/2303.13528.pdf
ChatGPT for bioinformatics https://medium.com/@91mattmoore/chatgpt-for-bioinformatics-404c6d0817a1
Empowering Beginners in Bioinformatics with ChatGPT https://www.biorxiv.org/content/10.1101/2023.03.07.531414v1
Lawyer uses GPT and gets ethics violation https://simonwillison.net/2023/May/27/lawyer-chatgpt/
Can ChatGPT solve bioinformatic problems with Python? https://dmnfarrell.github.io/bioinformatics/chatGPT-python
In this episode of the Micro Binfie Podcast, titled "AI Unleashed: Navigating the Opportunities and Challenges of AI in Microbial Bioinformatics", Lee, Nabil, and Andrew unpack the implications of generative predictive text AI tools, notably GPT, on microbial bioinformatics. They kick off the conversation by outlining the various applications of AI tools in their work, which range from generating boilerplate programs, drafting documents, to summarizing vast tracts of data. Andrew talks about his experience with GPT in coding, specifically via VS Code and GitHub Copilot, highlighting how GPT can generate nearly 90% of the necessary code based on a brief description of the task, thereby accelerating his work. He goes on to discuss the use of GPT in clarifying lines of code and notes that they used AI to generate a paper on the ethical considerations of employing AI in microbial genomics research during a recent hackathon. The conversation then switches gears as Nabil shares his experience of using GPT to standardize date formats in tables and summarize paper abstracts. While GPT is generally accurate in performing simple tasks, he warns that the tool can sometimes provide erroneous answers. Nabil also highlights GPT's ability to generate plausible but inaccurate responses for complex prompts, as illustrated by his experience when he used it to find a route in a video game. Andrew then talks about a script they created during a hackathon, which produces podcast episodes reviewing math tools. He points out the issues encountered, such as GPT providing wrong factual information. Looking ahead, Andrew envisions a future awash with GPT-generated content that may or may not be correct, raising the challenge of discerning real and false information. However, they also acknowledge the potential benefits of AI technologies for those with visual impairments, though it's far from a perfect solution at present. 
The conversation veers to the use of AI tech in handling boilerplate code and generating code snippets based on predictive text. The hosts further discuss the potential for this tool in rapid language learning. A live experiment ensues where Nabil and Andrew use a Perl script and utilize GPT-4 to convert this script into Python and back again to assess its capabilities in language translation. The AI tool proves proficient, considering comments, usage, and authorship and employing popular libraries like BioPython intelligently, though it does leave a disclaimer about potential inaccuracies. They consider the possibility of using AI to optimize coding, similar to minifying JavaScript, and even the idea of iterating through multiple languages and assessing the output. Nabil initiates a simpler task for the AI, asking it to write a Python script translating DNA into protein, which then gets translated into Rust. Andrew shares his experience of using AI to generate a Python class that compares two spreadsheets using pandas, demonstrating AI's comprehension and execution of complex tasks. In summary, this episode underscores the power and potential of AI in coding and the need for human oversight to ensure the quality and effectiveness of AI-generated content. It offers a glimpse into a future where AI tools, despite their limitations, can revolutionize many aspects of programming, bringing in new efficiencies and methods of working.
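For reference, the "simpler task" mentioned above, a Python script that translates DNA into protein, can be sketched in a few lines. This is not the script generated during the episode; it is a hand-written illustration that uses only the standard library and the standard genetic code, and the `translate` function and example sequence are choices made for this sketch.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, to_stop: bool = False) -> str:
    """Translate a DNA sequence in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only, ignoring any trailing 1-2 bases.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if to_stop and amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate(coding_dna))                # MAIVMGR*KGAR*
print(translate(coding_dna, to_stop=True))  # MAIVMGR
```

A script like this is a useful check on AI-generated code: the expected output for a short sequence can be worked out by hand, which is the kind of human oversight the hosts argue for.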
The MicroBinfie podcast discusses the top programming languages for bioinformatics. Andrew, Lee, and Nabil agree that Python is a great starting point for its consistency and rigor. Its strict syntax is ideal for teaching programming fundamentals that are essential in any language. In contrast, Perl encourages multiple ways of doing the same thing, creating confusion and difficulties in keeping track of things. The hosts caution against starting with trendy languages that are constantly changing. Instead, stick with more established languages like Python, which have established libraries and concepts that will help you advance more easily. Trendy languages come and go like changing tides, making them riskier choices. Additionally, they highlight the importance of understanding databases and their primary keys and unique fields. SQL is useful, particularly in dealing with large datasets. It is consistent across flavors and unlikely to go away soon. It takes a lot of skill to optimize queries to work in milliseconds. The hosts emphasize that the language you choose to learn depends on your individual goals and environment. For instance, Lee suggests that you should look to who is in your space and what they are using and who is willing to help you. Once you understand the programming concepts, it is easier to transfer them to other languages, and it is just a question of understanding the syntax. Andrew, Lee, and Nabil also discuss their own trajectories of learning programming languages, revealing that it takes a long time to become an expert in a language, and it is something that needs to be appreciated. They highlight the difference between just learning the basics of a language and really getting into the depths of it and the frameworks and libraries. The hosts also mention languages that are important to pick up, like SQL and bash scripting, and languages that are popular for web development, like JavaScript. 
However, they caution that JavaScript and Java are not the same thing and that JavaScript has a reputation for being a weird language. When asked what language they would choose for a task, Nabil says he would use Perl, Lee mentions R for stats, while Andrew admits that he has to relearn R every time he comes back to it and therefore prefers Perl for quick scripts. They also discuss their love-hate relationship with R, mentioning that while it has useful libraries like ggplot2 and ggtree, its syntax is difficult to work with and it offers several separate paradigms for approaching the same problem. The hosts conclude by acknowledging that there is no one-size-fits-all approach to learning programming languages. One should choose based on their goals, environment, and personal preferences. Python is a useful language to learn, even if one is not interested in bioinformatics. Additionally, they note that the fundamentals of databases and how they work are crucial to understand and are used across fields.
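The point about primary keys and unique fields can be made concrete with a small sketch using Python's built-in `sqlite3` module. The table layout, the column names and the accession `HYPO_0001` are all hypothetical and chosen only for illustration. The idea is that the database itself, not the calling script, refuses a duplicate record.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical samples table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE samples (
            sample_id INTEGER PRIMARY KEY,   -- row identifier assigned by SQLite
            accession TEXT NOT NULL UNIQUE,  -- no two rows may share an accession
            species   TEXT NOT NULL
        )
        """
    )
    return conn


def add_sample(conn: sqlite3.Connection, accession: str, species: str) -> bool:
    """Insert one sample; return False if the accession already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO samples (accession, species) VALUES (?, ?)",
                (accession, species),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when the UNIQUE constraint on accession is violated.
        return False


if __name__ == "__main__":
    db = build_sample_db()
    print(add_sample(db, "HYPO_0001", "Salmonella enterica"))
    print(add_sample(db, "HYPO_0001", "Escherichia coli"))
```

The second insert is rejected because the accession is already present, so the table never ends up with two conflicting records for the same sample.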
We are back talking about systematics and the SeqCode, a nomenclatural code for prokaryotes described from sequence data. Marike Palmer is a Postdoctoral researcher in the School of Life Sciences at the University of Nevada Las Vegas and Miguel Rodriguez is an Assistant Professor of Bioinformatics at the University of Innsbruck in the departments of Microbiology and the Digital Science Center (DiSC). Link to paper: https://www.nature.com/articles/s41564-022-01214-9 History paper: https://www.sciencedirect.com/science/article/pii/S0723202022000121 Palmer explained that the impetus for the SeqCode was the need to accommodate previously uncultivated organisms under a specific nomenclatural code. She emphasized that the SeqCode was written to allow names to be proposed in any peer-reviewed publication, and noted that the authors have designed three paths of validation. They hope that anyone proposing a name will work with the curation team to ensure the best quality descriptions, names, etymology, and solidification. Rodriguez discussed the SeqCode's governance, which is already in place and has been made public so that anyone interested can join the SeqCode community. The governance structure comprises an executive board, committees, and working groups. Some committee positions are held by co-opted members, while others are chosen by ballot. The hosts sought to clarify the relationship between ISME, the society backing the SeqCode, and the wider field in general. Rodriguez explained that ISME is simply providing support as an umbrella organization for the SeqCode. Palmer and Rodriguez clarified that the SeqCode is not a competing code but rather a parallel one that aims to accommodate previously uncultivated organisms.
Palmer noted that most scientists culture prokaryotes not for naming but to advance their knowledge of these organisms through physiology experiments. They emphasized that the new system is the result of a long collaborative effort that involved many different viewpoints and philosophies. The episode also discussed the practical requirements for naming under the new system, which include standards for the completeness and contamination levels required in the genome sequence data. Palmer noted that while the 16S rRNA gene sequence is not required for naming, it is recommended to improve cross-talk between different taxonomies. The conversation highlighted the importance and challenges of naming microorganisms and the ongoing efforts to create a system that is inclusive of all microorganisms, both cultivated and uncultivated. Rodriguez and Palmer agreed that high-quality genomes should be the main nomenclatural types, so that the system builds up rather than breaks down. They noted the challenge of obtaining full genomes of some organisms, such as obligate intracellular parasites, but suggested obtaining housekeeping genes as a potential solution. They further explained that estimating completeness or contamination is technically difficult for many taxa, but Palmer confirmed that registering a name on the SeqCode registry requires adding such estimates. The episode emphasized the importance of collaboration within the scientific community and the need to create a system that is inclusive of all microorganisms. It also highlighted the challenges inherent in naming microorganisms, while showing that scientists are working towards a system that is accurate, practical, and beneficial for all.
Today we are talking about systematics, and specifically the SeqCode, a nomenclatural code for prokaryotes described from sequence data. Joining us to talk about it are Marike Palmer and Miguel Rodriguez, co-authors on the recent publication. Marike Palmer is a Postdoctoral researcher in the School of Life Sciences at the University of Nevada Las Vegas and Miguel Rodriguez is an Assistant Professor of Bioinformatics at the University of Innsbruck in the departments of Microbiology and the Digital Science Center (DiSC).
An honest discussion about the upsides and downsides of doing a postdoc, in front of an audience of first year PhD students. Guests Dr Emma Waters, Dr Heather Felgate and Dr Muhammad Yasir are joined by Dr Andrew Page. It was recorded in front of a live audience of PhD students at the Microbes, Microbiomes and Bioinformatics doctoral training program in the Quadram Institute in Norwich UK. Emma starts the conversation by sharing that she enjoys research and solving problems with different tools. The thrill of discovery and exploration that comes with the postdoc position is something she loves. Heather echoes Emma's thoughts and says that she is happy where she is, rather than chasing after a higher paying job in industry. She appreciates the flexibility that academia offers, which has enabled her to balance her family and personal life. The conversation takes a turn when PhD students ask if any of the postdocs regret choosing academia despite the evident pay gap between industry and academia. Emma points out that although she might have earned more in industry, she is happy where she is, and finds satisfaction in helping people through her work. Chasing profits in industry would not offer her that kind of gratification. Yasir shares his success story of sequencing 600 samples of the SARS-CoV-2 virus in Pakistan, and how it contributed towards the fight against the pandemic. He credits the freedom and flexibility of academia that allows him to collaborate with colleagues from all over the world. In conclusion, Andrew advises students to explore their options and to keep their careers open-ended. He suggests that if they are after a higher paycheck, they should consider the bioinformatics data science path that offers more earning opportunities in industry. The postdocs stress the importance of following what makes one happy in life, rather than chasing big salaries.
This is a panel discussion on mobile genetic elements, guest chaired by Dr Muhammad Yasir with guests Dr Emma Waters, Dr Heather Felgate and Dr Andrew Page. We cover AMR, Salmonella Typhi and Staphylococci and outbreaks and the role of MGEs. It was recorded in front of a live audience of PhD students at the Microbes, Microbiomes and Bioinformatics doctoral training program in the Quadram Institute in Norwich UK.
We talk about KRAKEN the taxonomic classification software and the software suite around it and are joined by Jennifer Lu and Natalia Rincon from Johns Hopkins University Center for Computational Biology.
We are talking about KRAKEN - the taxonomic classification software and in the hot seat are Dr Jennifer Lu and Natalia Rincon from Johns Hopkins University Center for Computational Biology.
We talk to Professor Ed Feil (University of Bath) and Dr Natacha Couto (University of Oxford) about the early days of MLST.
We discuss One Health with Dr Natacha Couto (University of Oxford) and Professor Ed Feil (University of Bath).
We celebrate having 100 episodes! We look back at the history of our podcast and then talk about what the future might hold. Then: Lee gets his revenge by having Andrew and Nabil pronounce words local to him. We very briefly mentioned this paper: https://www.nature.com/articles/s41586-022-05543-x Nabil was trying to remember this particular site and remembered it after recording: https://phagesdb.org/phages/
At the 8th Microbial Bioinformatics Hackathon in Bath we talked to a live panel with Kristy Horan, Torsten Seemann, Finlay Maguire and Andrew Page about bioinformatics from the frontlines. We apologise for the poor audio quality. It was recorded in a room with 20 people in the background, so at points it got a bit loud, but we felt you might enjoy the discussion regardless.
We interview Frank Ambrosio. He is embarking on a lifestyle of nomadic bioinformatics, living his best life. * https://www.linkedin.com/in/francis-ambrosio/
We discuss recent advancements in genome sequencing technologies, based on what we've been hearing at conferences and within the community.
We've been busy attending in-person conferences such as IMMEM XIII and ASM NGS, so we thought we'd give you some of our reflections. We discuss wastewater surveillance, hybrid conferences and metadata, amongst other things.
For the first time ever all 3 MicroBinfies are together in person to record an episode. We are joined by Torsten Seemann for a conversation about how what we do in research can get lost in translation when applied to public health. We discuss what we did with SARS-CoV-2 genomics and somehow end up chatting about geography and language. Hope you enjoy.
Over the past few weeks scientists have been swapping Twitter for Mastodon. Our very own Nabil-Fareed Alikhan talks about his experience of setting up and running a Mastodon server called https://mstdn.science, which is one of the places scientists have moved to. We are joined by Emma Hodcroft to get an independent scientist's view on the whole thing.
We go through bug reports and issues and give insights into how bioinformaticians dig into them. We suggest the underlying problems and possible solutions and also provide tips on how to file better bug reports.
Often the hard part of bioinformatics isn't the analysis; it's getting all of the software you need set up and installed. Come with us on this journey and avoid dependency hell.
91 - What language should I learn? by Microbial Bioinformatics
90 Public health bioinformatics on the cloud with WDL + Terra.bio by Microbial Bioinformatics
Today on the @microbinfie podcast, we talk about WDL with @sevinsky and @DannyJPark. We learn what widdle means to Andrew and his kids. Joel takes a shot at Lyve-SET and you'll never guess what happens next.