Micro binfie podcast

Microbial Bioinformatics is a rapidly changing field marrying computer science and microbiology. Join us as we share some tips and tricks we’ve learnt over the years. If you’re a student just getting to grips with the field, or someone who just wants to keep tabs on the latest and greatest - this podcast is for you. The hosts are Dr. Lee Katz from the Centers for Disease Control and Prevention (US), and Dr. Nabil-Fareed Alikhan and Dr. Andrew Page, both from Quadram Institute Bioscience (UK), who bring together years of experience in microbial bioinformatics. The opinions expressed here are our own and do not necessarily reflect the views of the Centers for Disease Control and Prevention or Quadram Institute Bioscience.
Intro music: Werq - Kevin MacLeod (incompetech.com) Licensed under Creative Commons: By Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/
Outro music: Scheming Weasel (faster version) - Kevin MacLeod (incompetech.com) Licensed under Creative Commons: By Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/
Questions and comments? microbinfie@gmail.com

Microbial Bioinformatics

Apr 9, 2026 LATEST EPISODE
monthly NEW EPISODES
28m AVG DURATION
155 EPISODES

Latest episodes from Micro binfie podcast

152 - Deacon part 2

Play Episode Listen Later Apr 9, 2026 27:54

In this follow-up Software Deep Dive episode, we continue our conversation with Dr. Bede Constantinides (University of Birmingham) about the design and implementation of Deacon, a fast host-read removal tool for metagenomics. Deacon uses minimizers and k-mer set membership queries instead of alignment, allowing it to filter reads extremely quickly while balancing sensitivity and specificity. The tool is written in Rust, producing a small, fast binary and enabling very high throughput. We also discuss benchmarking with diverse viral and bacterial datasets, why tools like Kraken2 are not always ideal for host depletion, and why host read removal remains an unsolved problem—especially when balancing privacy, computational cost, and preservation of microbial reads. Links Deacon https://github.com/bede/deacon Hostile https://github.com/bede/hostile/ Bede Constantinides http://bede.im/ Kraken2 https://ccb.jhu.edu/software/kraken2/
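The description above contrasts minimizer and k-mer set membership queries with alignment. As a rough illustration of that general idea only, here is a minimal Python sketch; it is not Deacon's actual implementation (Deacon is written in Rust), and the function names, the tiny `k` and `w` values, the `min_hits` threshold, and the toy sequences are all assumptions made for the example.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Hash a k-mer to a 64-bit integer (hash choice is an assumption)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minimizers(seq: str, k: int = 5, w: int = 4) -> set:
    """Return the smallest k-mer hash from every window of w consecutive k-mers."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    if not hashes:
        return set()
    if len(hashes) <= w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def is_host_read(read: str, host_index: set, k: int = 5, w: int = 4, min_hits: int = 2) -> bool:
    """Flag a read as host if enough of its minimizers are in the host index.

    Raising min_hits trades sensitivity for specificity (hypothetical knob).
    """
    return len(minimizers(read, k, w) & host_index) >= min_hits


# Toy data: a made-up "host" sequence and two made-up reads.
host = "ACGTTGCATGGCTAGCTAGGATCCGATCGATTACG"
host_index = minimizers(host)  # set membership index, no alignment involved

reads = {
    "host_like": host[5:30],   # exact substring of the host
    "non_host": "T" * 25,      # shares no 5-mer with the host
}
kept = [name for name, seq in reads.items() if not is_host_read(seq, host_index)]
print(kept)
```

Because each read only needs a handful of set lookups, this style of filter avoids the cost of aligning every read, which is the speed advantage the episode description points to. Real tools use much longer k-mers and far larger indexes than this toy example.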

birmingham rust

151 - Deacon part 1

Play Episode Listen Later Mar 26, 2026 23:36

In this Software Deep Dive episode, we talk with Dr. Bede Constantinides (University of Birmingham) about Deacon, a tool for removing host DNA reads from metagenomic datasets. We discuss why host read removal is a deceptively difficult problem, the limitations of alignment-based approaches, and how Deacon evolved from Bede's earlier tool Hostile. The conversation covers practical issues such as human contamination in metagenomes, balancing sensitivity vs specificity when filtering reads, and the computational challenges of working with large human reference datasets and pangenomes. This episode focuses on the background and motivation for Deacon. The next episode will dive deeper into how the method works. Links Deacon https://github.com/bede/deacon Hostile https://github.com/bede/hostile/ Bede Constantinides http://bede.im/ nf-core taxprofiler https://nf-co.re/taxprofiler Kraken2 https://ccb.jhu.edu/software/kraken2/

dna birmingham hostile bede

150 - Genomicx

Play Episode Listen Later Mar 12, 2026 43:07

In this episode, Lee, Nabil, and Andrew experiment with “vibe coding” bioinformatics tools using AI coding assistants. The goal: quickly build useful genomics utilities that run entirely in the browser via WebAssembly, without requiring command-line installs or servers. Nabil states: “Even if you don't want to use this technology, you should pay attention - because everyone else will.” They discuss how existing bioinformatics programs can be compiled to WebAssembly and wrapped with simple browser interfaces so analyses run locally on a user's machine. This keeps genomic data private while making tools easier to access. The prototype tools discussed in the episode are available here: https://genomicx.github.io/ These examples show how browser-based bioinformatics might work for lightweight tasks such as genome comparisons and basic sequence analysis.
Topics Covered
* Using AI tools to rapidly prototype bioinformatics software
* Compiling genomics programs to WebAssembly
* Running analyses locally in the browser
* Privacy advantages of keeping genomic data on the user's computer
* Practical limits of browser-based computation
Try the tools and let us know what you think using the hashtag #genomicx

ai practical privacy nabil webassembly compiling

149 - Bridging AI & Biosciences

Play Episode Listen Later Feb 9, 2026 35:53

Join hosts Kieren Sharma (Artificially Ever After podcast, University of Bristol) and Andrew Page (MicroBinfie podcast, Origin Sciences) for a compelling live panel discussion exploring the dynamic intersection of artificial intelligence and the biosciences. In this episode, our expert panel discusses:

university ai safety chatgpt bridging practical advice ai revolution bioscience informatics

148 - NextFlow debate 3

Play Episode Listen Later Nov 13, 2025 24:20

Final part of our discussion/debate on NextFlow

debate

147 - NextFlow debate 2

Play Episode Listen Later Oct 30, 2025 15:13

Part two of debating NextFlow

debate

146 - NextFlow debate 1

Play Episode Listen Later Oct 16, 2025 18:32

Andrew, Nabil, and Lee debate about NextFlow!

debate nabil

145 - micro binfie Pathoplexus part 2

Play Episode Listen Later Sep 25, 2025 20:52

Nabil and Lee bring in guest host Clint to talk with some of the people behind Pathoplexus http://pathoplexus.org/
* Dr. Emma Hodcroft - https://pathoplexus.org/about/eb
* Dr. Theo Sanderson - https://pathoplexus.org/about/development-team
* Mr. Arthur Shem Kasambula - https://pathoplexus.org/about/development-team

micro clint nabil

144 - micro binfie Pathoplexus part 1

Play Episode Listen Later Sep 11, 2025 22:47

micro clint nabil

143 - where we catch up on our professional lives

Play Episode Listen Later Aug 28, 2025 13:05

professional

Microbinfie - genomeqc

Play Episode Listen Later Aug 7, 2025 26:42

The crew talks about the genomeqc ecosystem. Check it out at https://happykhan.github.io/genomeqc/

140 Exploring Clinical Metagenomics

Play Episode Listen Later Jul 3, 2025 22:54

In this episode of the Podcast, hosts Lee Katz and Andrew Page delve into the intricacies of clinical metagenomics. Joined by experts Torsten Seemann and Finlay Maguire, they explore the challenges and methodologies involved in identifying pathogens in clinical samples when traditional diagnostic methods fail. They discuss the types of samples analyzed, the impact of low DNA concentrations, and the complexities of sequencing data interpretation. The conversation highlights the importance of interdisciplinary collaboration between bioinformaticians and clinicians in managing difficult infections. Despite the hurdles, clinical metagenomics is presented as a complementary tool aiding in diagnosis, emphasizing the need for genomic literacy among healthcare professionals.

dna clinical metagenomics andrew page

139 Lassa virus genomics

Play Episode Listen Later Jun 19, 2025 15:29

On this episode we are joined by Sara Zufan to discuss her PhD journey and recent projects. The episode covers various research experiences and challenges in the field, including the rapid detection of AMR using nanopore sequencing, COVID-19 projects, hepatitis A outbreak investigations, and Lassa virus surveillance in Liberia. The guests share insights into their professional journeys, their experiences working across different continents, and the future of microbial bioinformatics research.

covid-19 phd viruses liberia genomics amr lassa

138 Decoding Tetracycline Resistance in MRSA

Play Episode Listen Later Jan 16, 2025 11:24

In this episode of the Micro Binfie Podcast, host Andrew Page takes listeners to the heart of the microbial genomics hackathon in Bethesda, Maryland, for a conversation with special guest Megan Phillips, a PhD student from Emory University. Megan delves into her research on Staphylococcus aureus, including MRSA, highlighting its dual nature as both a harmless commensal and a potentially serious pathogen. Megan discusses the complexities of tetracycline resistance, focusing on plasmid-mediated mechanisms involving the pT181 plasmid. She explains how this plasmid's efflux pump, encoded by the gene tetK, contributes to variable resistance levels, and describes the factors influencing MIC (Minimum Inhibitory Concentration) variability. Listeners will learn about plasmid copy numbers, the plasmid's global spread across clonal complexes, and the occurrence of horizontal and vertical gene transfer. Throughout the episode, Megan shares insights on working with short-read sequencing data and the strategies she employs to detect plasmid presence using tools like BLAST. She also touches on the challenges and discoveries of tracking historical sample data and integrating findings from older research papers, showcasing her appreciation for the poetic style of scientific writing from the 1940s. For those interested in antimicrobial resistance, evolutionary microbiology, and the subtleties of bacterial genome analysis, this episode offers a blend of technical detail and storytelling. Tune in to hear more about Megan's upcoming publications, her experiences navigating complex genomic data, and her thoughts on antimicrobial stewardship and historical perspectives on drug resistance.

phd maryland resistance blast bethesda decoding emory university mrsa staphylococcus andrew page tetracycline

137 Contagion part 2

Play Episode Listen Later Jan 2, 2025 32:26

In a two-part discussion, the hosts analyze the movie Contagion from their expert perspectives, focusing on the film's portrayal of epidemiology and genomics. They note that the movie compresses timelines for dramatic effect, speeding up the virus's spread and the response to it, and that decisions about managing a crisis are based on societal values, not just science.
In part one, the hosts discuss the movie's depiction of the R0 value. They note that while this explanation is useful for the audience, it is unlikely that an epidemiologist would need to explain the concept to other epidemiologists. The group notes that the MEV-1 virus is modeled on the real-life Nipah virus and comments on a scene where the genome of the virus is described as being 15 to 19 kilobases in length with 6 to 10 genes. They also discuss the movie's depiction of virus isolation and the unrealistic speed with which the initial assessment of the virus occurs. The podcasters touch upon the BSL4 lab and how the film depicts the scientists' behavior. They also discuss Matt Damon's character, who is exposed to the virus but does not get sick, and is an example of an asymptomatic carrier.
In part two, the bioinformaticians examine the "genome dashboard" scene, noting the software's informative interface, which includes an alignment panel, protein structure, and a recombination map. They also discuss a scene where a phylogenetic tree is used to determine a change in the R0 value. They find this unrealistic because the tree is just a picture that doesn't accurately represent how a virus spreads. The group discusses how the bioinformatician is depicted in the film as moving in and out of the lab, which may have been realistic in 2011 but is less so in the present day. They discuss how the CDC is portrayed in the film, noting scenes that were filmed at the actual CDC, as well as their own experiences at the CDC.
The podcasters note the movie is very US-centric and that many international partners would be involved in solving a global pandemic.

cdc matt damon contagion nipah r0

135 Assembling ATCC bacterial strains

Play Episode Listen Later Dec 19, 2024 13:35

In this episode of the Micro Binfie podcast, host Andrew Page is joined by Nikhita Puthuveetil, Senior Bioinformatician at the American Type Culture Collection (ATCC). They delve into ATCC's ambitious project of sequencing a vast array of organisms from their renowned collection, tackling the challenges of assembling complex genomes from bacteria, viruses, fungi, and more. Discover how Nikhita and her team navigate through genomic roadblocks, leverage cutting-edge sequencing technologies, and work to ensure accurate data provenance. Whether it's large viral genomes or evolving taxonomy, this episode offers a deep dive into the fascinating world of microbial bioinformatics and genomic curation.

discover assembling strains bacterial andrew page atcc

141 hackathon panel

Play Episode Listen Later Dec 18, 2024 15:20

Andrew, Lee, Torsten, and Fin muse over the future of MLST, Illumina, and Nanopore at the ASM NGS Hackathon

panel fin hackathons torsten illumina

136 Contagion part 1

Play Episode Listen Later Dec 16, 2024 30:43

cdc matt damon contagion nipah r0

134 Diving into MRSA, Genomics, and Public Health

Play Episode Listen Later Dec 5, 2024 13:27

In this episode of the Micro Binfie Podcast, host Andrew Page speaks with Dr. Brooke Talbot, a recent PhD graduate from Emory University, about her research on Staphylococcus aureus, with a focus on MRSA and antibiotic resistance. Brooke shares insights into her molecular epidemiology work, discussing the complexities of tracking resistant bacterial strains in clinical settings and the significance of genomic epidemiology in public health. From honeybees to foodborne outbreaks, Brooke's diverse research background offers listeners a fascinating journey through science, microbiology, and epidemiology.

phd diving public health emory university genomics mrsa staphylococcus andrew page

133 The Role of Bioinformatics in Public Health and Disease Outbreaks

Play Episode Listen Later Dec 5, 2024 12:36

In this episode of the Micro Binfie podcast, host Andrew Page talks with Dr. Erin Young, a bioinformatician at the Utah Public Health Laboratory, recorded during the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. Erin shares her journey from researching hereditary cancer predisposition to her current role in public health bioinformatics, which she entered through a prestigious CDC and APHL fellowship. The conversation delves into her work with bacterial pathogens, particularly in tracking antimicrobial resistance in organisms like Klebsiella. Erin discusses the tools she uses for genome typing, such as MASH, FastANI, and SKA, and her innovative research on the accuracy of long-read sequencing technologies like Nanopore for detecting antimicrobial resistance genes. She also provides a preview of her upcoming poster for ASM, where she examines how Nanopore reads can be used effectively in public health microbiology. This episode offers a fascinating look at how bioinformatics and genomics are advancing the fight against infectious diseases.

maryland disease cdc public health bethesda mash ska outbreaks asm bioinformatics klebsiella erin young andrew page

132 Unlocking the Secrets of Antimicrobial Resistance in Metagenomes

Play Episode Listen Later Nov 21, 2024 11:02

In this episode of the Micro Binfie podcast, host Andrew Page is live from the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. He sits down with David Mahoney, a PhD student from Dalhousie University in Halifax, Nova Scotia. David shares his research on characterizing antimicrobial resistance (AMR) genes and their transfer within metagenomes, focusing on metagenomic assembly graphs. They delve into David's background in food safety microbiology and his interest in the public health implications of genomics. He explains his exciting work on analyzing how AMR genes transfer across different environments, such as food production plants and clinical settings, using both new and existing data from Canada's Genomics Research and Development Initiative. David also highlights his use of innovative methods like assembly graphs and graph-based approaches to uncover AMR gene flow and lateral gene transfers, including the potential of machine learning techniques such as graph convolutional neural networks.

canada secrets phd maryland unlocking bethesda nova scotia halifax dalhousie university amr antimicrobial resistance andrew page

131 Bioinformatics Evolution: Torsten Seemann on Snippy, Open-Source Support, and Global Genomics

Play Episode Listen Later Nov 7, 2024 12:47

In this episode of the Micro Binfie Podcast, host Andrew Page catches up with Torsten Seemann at the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. They discuss the rapid evolution of bioinformatics, the challenges faced by labs worldwide, and the explosion of tools post-COVID. Torsten shares insights into his work at Melbourne's Microbiological Diagnostic Unit (MDU), the development of platforms like AusTrakka for bacterial genomics, and how his lab plays a national and international role in data sharing. The conversation dives into the future of the widely used variant calling tool Snippy, as Torsten reveals updates funded by the Chan Zuckerberg Initiative, including nanopore read support and the ability to process pre-assembled genomes. They also explore the importance of maintaining open-source bioinformatics tools to prevent them from becoming obsolete. Tune in for an in-depth discussion on the state of genomics, software development, and the challenges and rewards of open-source collaboration.

covid-19 evolution global maryland melbourne bethesda open source genomics torsten bioinformatics seemann chan zuckerberg initiative snippy andrew page

130 Exploring Genomic Innovation and Machine Learning in Public Health

Play Episode Listen Later Oct 25, 2024 12:35

In this episode of the Micro binfie Podcast, host Andrew Page sits down with Tim Dallman at the 10th Bioinformatics Hackathon in Bethesda, Maryland. Tim shares insights from his work at Utrecht University in the Netherlands, where he focuses on genomic surveillance and machine learning models to predict disease risk and severity. They discuss the challenges of integrating genomic variation into predictive models, the importance of high-quality metadata, and the complexities of working with pathogens like Shiga toxin-producing E. coli. Tim also talks about his role at the WHO Pandemic and Epidemic Intelligence Hub and how global collaboration can drive innovation in public health genomics. Tune in to hear about cutting-edge research, the importance of interdisciplinary teamwork, and how genomic data can be harnessed for future pandemic preparedness.

innovation maryland netherlands micro public health machine learning bethesda genomic utrecht university shiga andrew page

129 Genomics on the Frontier: Bactopia, Bioinformatics, and Pathogen Surveillance with Robert Petit

Play Episode Listen Later Oct 11, 2024 14:58

Host Andrew Page is joined by Robert Petit from the Wyoming Public Health Laboratory. Robert, a key developer of the Bactopia pipeline, shares insights into how this end-to-end tool is transforming bacterial genomic surveillance. They dive into the origins of Bactopia, its applications in public health, and Robert's experience leading genomic projects in a rural setting. Discover how Bactopia streamlines pathogen detection, improves documentation, and integrates with other tools to deliver fast and accurate results. Listen in as they discuss new innovations in bioinformatics, including visualizations and human-read filtering, and explore future projects like camlhmp, designed to simplify sequence-based typing. Recorded live at the Microbial Bioinformatics Hackathon in Bethesda, Maryland, this episode brings you the latest in pathogen genomics and the challenges and rewards of working on the frontier of public health.

discover maryland bethesda surveillance petit frontier genomics pathogens bioinformatics

128 Haiti cholera outbreak with Christine Lee and Cynney Walters

Play Episode Listen Later Sep 19, 2024 21:57

Andrew and Lee talk with Christine and Cynney about the Haiti cholera outbreak Cynney Walters: https://www.linkedin.com/in/cynney-walters-763111190 Walters et al, "Genome sequences from a reemergence of Vibrio cholerae in Haiti, 2022 reveal relatedness to previously circulating strains" https://journals.asm.org/doi/abs/10.1128/jcm.00142-23

haiti outbreak walters genome cholera vibrio christine lee

127 Minimum spanning trees

Play Episode Listen Later Sep 5, 2024 15:49

Nabil and Lee have a quick chat about minimum spanning trees (MSTs). eBURST paper: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-152

trees minimum spanning mst nabil

126 Tree viz

Play Episode Listen Later Aug 22, 2024 21:21

We go over tree visualizations!
* Microreact https://microreact.org/, https://www.phylocanvas.gl/
* GrapeTree https://github.com/achtman-lab/GrapeTree
* Auspice/Nextstrain https://nextstrain.org/
* Taxonium https://taxonium.org/
* iTOL https://itol.embl.de/
* PHYLOViZ https://online.phyloviz.net/index
* Phandango https://jameshadfield.github.io/phandango/#/main

tree

Kostas Konstantinidis returns to talk to us about ANI and metagenomics

Play Episode Listen Later Jun 6, 2024 22:07

ani kostas metagenomics

Kostas Konstantinidis talks to us about ANI and metagenomics

Play Episode Listen Later May 23, 2024 22:58

We talk with Kostas! For more information please visit https://enve-omics.gatech.edu/

kostas metagenomics

123 The Revolution of Hash Databases in cgMLST

Play Episode Listen Later Mar 21, 2024 17:42

In this episode of the Micro Binfie Podcast, hosts Dr. Andrew Page and Dr. Lee Katz delve into hash databases and their application in cgMLST (core genome Multilocus Sequence Typing) for microbial bioinformatics. The discussion begins with the challenges bioinformaticians face because MLST databases around the world are siloed, which hinders synchronization and effective genomic surveillance. To address these issues, the hosts introduce the concept of using hash databases for allele identification. Hashing creates a unique identifier for each genetic sequence, so databases can be synchronized without extensive system support or resources.
Dr. Katz explains the principle of hashing and its application in genomics, where even a single nucleotide polymorphism (SNP) results in a different hash, making it well suited to distinguishing alleles. Hashing algorithms such as MD5 and SHA-256 are discussed, along with their advantages and the risk of hash collisions. Using longer, more complex hashes significantly reduces the probability of such collisions.
The episode also explores practical aspects of implementing hash databases in bioinformatics software, highlighting the need for exact-matching algorithms, since a hash only identifies an identical sequence. Existing tools like EToKi and upcoming software are mentioned as examples of applications that can use hash databases. The conversation also touches on sequence types in cgMLST and the challenges of naming and standardizing them in a decentralized database system. Alternatives like allele codes are mentioned, which could simplify the representation of sequence types.
Finally, the hosts discuss the potential for adopting this hashing approach within larger bioinformatics organizations like PHA4GE or GMI, with an emphasis on the need for a standardized and community-supported framework to ensure the longevity and effectiveness of hash databases in microbial genomics. This episode gives an overview of how hash databases could address long-standing issues of database synchronization and allele identification, paving the way for more efficient and collaborative genomic surveillance worldwide.
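The allele-hashing idea described above can be sketched in a few lines of Python using the standard `hashlib` module. This is only an illustration under stated assumptions: the normalization step (strip whitespace, uppercase), the toy allele sequences, and the `local_db` dictionary are invented for the example and do not reflect any particular tool's conventions.

```python
import hashlib


def allele_hash(sequence: str, algorithm: str = "sha256") -> str:
    """Return a hex digest identifying an allele sequence.

    Normalization (strip + uppercase) is an assumption; two labs must agree
    on the same normalization and algorithm for their hashes to match.
    """
    normalized = sequence.strip().upper()
    return hashlib.new(algorithm, normalized.encode("ascii")).hexdigest()


# Two made-up alleles differing by a single SNP at the last base.
allele_a = "ATGACCGTTAAAGGC"
allele_b = "ATGACCGTTAAAGGT"

# A hypothetical local database keyed by hash rather than by a
# centrally assigned allele number.
local_db = {allele_hash(allele_a): "locus_1"}


def lookup(sequence: str, db: dict):
    """Exact-match lookup: only an identical sequence yields the same key."""
    return db.get(allele_hash(sequence))


print(allele_hash(allele_a))
print(allele_hash(allele_b))
print(lookup(allele_a, local_db), lookup(allele_b, local_db))
```

Because the identifier is computed from the sequence itself, two sites that have never exchanged data will still derive the same key for the same allele, which is what removes the need for a central numbering authority. The flip side, as the description notes, is that lookups are exact-match only: the one-SNP variant above is simply absent from the database rather than reported as a near match.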

revolution alternatives existing katz databases sha hash snp hashing phage md5 gmi andrew page

122 GAMBIT: Genomic Approximation Method for Bacterial Identification and Tracking

Play Episode Listen Later Mar 9, 2024 17:29

We discuss GAMBIT, software for accurately classifying bacteria and eukaryotes using a targeted k-mer based approach. GAMBIT software: https://github.com/gambit-suite/gambit GAMBIT suite: https://github.com/gambit-suite GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification. https://doi.org/10.1371/journal.pone.0277575 TheiaEuk: a species-agnostic bioinformatics workflow for fungal genomic characterization https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1198213/full

method tracking gambit identification bacterial genomic approximation

121 K-mers, Sourmash, and Open Source Software - More Conversations with Titus Brown

Play Episode Listen Later Feb 1, 2024 20:03

In this episode, Andrew Page and Lee Katz continue their conversation with Titus Brown, diving deeper into his work on k-mers, Sourmash, and open source software development.
Topics discussed:
* K-mers for analyzing sequencing data, and how Sourmash builds on MinHash
* How Sourmash handles k-mers for metagenomic comparisons vs. Mash
* The modhash and bottom sketch approaches used in Sourmash
* Dealing with sequencing errors and noise in k-mer data
* Sourmash as a reference-based method, and applications for metagenomics
* Titus' focus on building reusable libraries and APIs vs one-off tools
* Recruiting collaborators through "nerd sniping" with interesting problems
* The open source philosophy that motivates Titus' software work
Overall, the conversation provides insight into Titus' approach to bioinformatics software through iterating quickly, focusing on usability, and building open source tools.
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
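For readers unfamiliar with MinHash, the "bottom sketch" idea mentioned in the topics above can be illustrated with a short Python sketch: hash every k-mer, keep only the smallest few hashes, and estimate Jaccard similarity from those. This is a generic textbook-style illustration, not Sourmash's or Mash's code; the hash function, the tiny `k`, the sketch `size`, and the toy sequences are assumptions chosen for readability.

```python
import hashlib


def kmers(seq: str, k: int) -> set:
    """All distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer (hash choice is an assumption)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_sketch(seq: str, k: int = 4, size: int = 8) -> list:
    """Keep only the `size` smallest k-mer hashes (a bottom-k sketch)."""
    return sorted(kmer_hash(x) for x in kmers(seq, k))[:size]


def jaccard_estimate(sketch_a: list, sketch_b: list, size: int = 8) -> float:
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the `size` smallest hashes of the union and count how many
    of them appear in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


# Toy sequences, invented for the example.
seq_1 = "ACGTTGCATGGCTAGCTAGGATCC"
seq_2 = "ACGTTGCATGGCTAGCAAGGATCC"  # one substitution relative to seq_1

print(jaccard_estimate(bottom_sketch(seq_1), bottom_sketch(seq_1)))
print(jaccard_estimate(bottom_sketch(seq_1), bottom_sketch(seq_2)))
```

The sketch size is fixed regardless of genome size, which keeps comparisons cheap between similar-sized genomes but works less well when one dataset is far larger than the other, as with a genome versus a metagenome.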

conversations apis mers open source software andrew page

120 Scaling Metagenomic Search with Sourmash - Conversations with Titus Brown

Play Episode Listen Later Jan 18, 2024 25:25

In this final episode with Titus Brown, the conversation focuses on his work scaling metagenomic search with Sourmash:
* An overview of what Sourmash does - sketching and comparing large k-mer datasets
* How the sampling approach enables analyses like containment estimation
* Exciting capabilities of the Branchwater tool for multi-threaded real-time SRA search
* Scaling to search across millions of metagenomes in seconds with WebAssembly
* Potential public health applications for tracking and sourcing pathogens
* Important caveats around resolution limits and need for follow-up analyses
* Ongoing work to characterize the technique's specificity and sensitivity
Overall, this episode highlights the massive scaling Sourmash enables for metagenomic search, and the potential use cases in public health, while acknowledging current limitations and uncertainties. Titus emphasizes the need to precisely convey what bioinformatic tools can and cannot do as research continues.
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
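To make the link between a sampling approach and containment estimation more concrete, here is a hedged Python sketch of fraction-based sampling: keep every k-mer hash that falls below a fixed threshold, so each dataset retains roughly the same fraction of its k-mers. This illustrates the general idea only and is not Sourmash's implementation; the `scaled` parameter name mirrors common usage, while the hash function, `k`, and the toy sequences are assumptions.

```python
import hashlib

MAX_HASH = 2 ** 64  # size of the 64-bit hash space used below


def kmers(seq: str, k: int) -> set:
    """All distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer (hash choice is an assumption)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def fraction_sketch(seq: str, k: int = 4, scaled: int = 2) -> set:
    """Keep hashes below MAX_HASH / scaled, i.e. about 1/scaled of all k-mers.

    The same k-mer is kept or dropped consistently in every dataset,
    because the decision depends only on its hash.
    """
    threshold = MAX_HASH // scaled
    return {h for h in (kmer_hash(x) for x in kmers(seq, k)) if h < threshold}


def containment(query: set, reference: set) -> float:
    """Fraction of the query sketch that is also present in the reference sketch."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


# Toy data: a small "genome" embedded in a larger made-up "metagenome".
genome = "ACGTTGCATGGCTAGCTAGG"
metagenome = "TTGACCAGT" + genome + "GGATCCGATCGATTACG"

print(containment(fraction_sketch(genome), fraction_sketch(metagenome)))
print(containment(fraction_sketch(metagenome), fraction_sketch(genome)))
```

Containment is asymmetric: the genome can be fully contained in the metagenome while the metagenome is only partly contained in the genome. Because the sketch grows with the dataset rather than being capped at a fixed size, the comparison stays meaningful even when the two datasets differ greatly in size.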

conversations search scaling

119 The Challenges of Microbial Taxonomy - Conversations with Titus Brown

Play Episode Listen Later Jan 4, 2024 22:53

In this episode, Andrew Page and Lee Katz continue their chat with Titus Brown, focusing on taxonomy assignment in metagenomics.
Topics discussed:
* Dealing with contamination and low quality genomes in reference databases
* Sourmash as a versatile search tool, not a curated database
* The need for high confidence in taxonomic assignment in public health
* Most microbial assignment tools have low specificity or sensitivity
* Possible ways to achieve perfect species classification (in theory)
* The challenges around defining species based on small genomic differences
* Interesting cryptography concept of 'unicity' distance for classification
* Conveying the nuances and uncertainties in taxonomic assignment
The conversation highlights the difficulties around taxonomic classification, especially at the species level, but explores ideas for improving accuracy. Overall it emphasizes the complexities of biology and need for transparent conveyance of uncertainties.
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/

conversations challenges taxonomy microbial andrew page

118 Real bioinformaticians react to Jurassic Park

Play Episode Listen Later Dec 21, 2023 60:44

Andrew, Nabil, and Lee react to the bioinformatics and the science overall in the 1993 film Jurassic Park. We looked at these YouTube clips:
* https://youtu.be/mDTaykXudVI?si=I5aiUdBGStpIKHVC
* https://youtu.be/RLz5Api676Y?si=D6nps33O42Fmk4Ac
* https://youtu.be/0Nz8YrCC9X8?si=KNwkeFoS6Bu4LOpv
* https://youtu.be/dxIPcbmo1_U?si=TOBw5AONYVCW0JzV
* https://youtu.be/m1lc8GwBKFE?si=rj7Oiq51l2_dB6ro
More information on the secret tattoo here: https://underunderstood.com/podcast/episode/jeff-goldblums-secret-tattoo-jurassic-park-ian-malcolm/

jurassic park react nabil

117 From Math to Metagenomics - Titus Brown on Career Journeys and Software Solutions

Play Episode Listen Later Dec 7, 2023 20:12

In this episode of the Micro Binfie Podcast, Andrew Page and Lee Katz interview Titus Brown about his journey from studying math and physics as an undergrad to becoming a bioinformatician focusing on metagenomics and software development.
Topics discussed:
* Titus' background in math, physics, digital evolution research, and developmental biology
* His transition into bioinformatics to analyze the influx of genomic data in the 1990s
* Developing early tools for comparative genomics and sequence analysis
* The philosophy of creating usable software with good documentation
* Work on transcriptomics, metagenomics, and k-mers at Michigan State
* Digital normalization and dealing with large sequencing datasets
* Moving to UC Davis and continuing work on metagenomics and software like khmer and sourmash
* Thoughts on challenges around data reuse and accessibility in science
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/

career math journeys uc davis software solutions metagenomics andrew page

116 AI Authorship and Ethics in Academic Publishing for Genomics

Play Episode Listen Later Nov 23, 2023 32:27

The podcast discusses an article co-authored by Andrew Page, examining the use of GPT-4 for research publication. The conversation focuses on the authorship of articles generated by GPT-4 and the implications for academic publishing. Authorship and Ethics: Andrew discusses the question of authorship when AI-generated content is involved in research articles. He explores the ethical implications and potential biases associated with AI-assisted writing, such as the omission of minority figures and novel discoveries. He emphasizes the importance of transparency when using AI and its potential to democratize research, as long as ethical guidelines are maintained. AI & Scientific Journals: The podcast delves into the current landscape of AI in academic publishing. It addresses the commercial use of AI in crafting manuscripts for research articles and the necessity of distinguishing between manual and AI-generated contributions. The possible misalignment of GPT-4's commercial objectives with academic goals is highlighted. Risks and Benefits: Andrew outlines the risks of using AI in publishing, such as unintentional plagiarism, biases, and outdated methods. He provides an example of bioinformatics software recommending deprecated methods, illustrating the need for caution. The conversation also touches upon the AI's potential to introduce bias unintentionally, citing past incidents where AI models quickly adopted extremist views. Andrew's co-authors, Niamh Tumelty and Sam Sheppard, bring different perspectives on ethics and the impact of AI on publishing. Niamh, associated with the London School of Economics, emphasizes ethical considerations, while Sam, editor-in-chief of Microbial Genomics, underscores the need to adapt to the reality of AI contributions in journal submissions. In conclusion, the podcast underscores the importance of recognizing and navigating the ethical challenges posed by AI in academic publishing. 
It suggests that the technology may evolve faster than policies can adapt, necessitating an ongoing conversation among researchers, publishers, and AI developers. Links: https://microbiologysociety.org/blog/microbe-talk-ai-a-useful-tool-or-dangerous-unstoppable-force.html https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.001049

ai economics ethics risks gpt london school genomics niamh authorship sam sheppard academic publishing andrew page microbial genomics

115 Write-the: speeding up software development for bioinformatics

Play Episode Listen Later Nov 9, 2023 24:22

We continue our conversation with Wytamma Wirth about write-the and all things AI. It starts with discussing the usage of language models, specifically ChatGPT, in writing boilerplate code, and how it can assist in generating code snippets, unit tests, and even documentation strings. The participants also explore the potential of incorporating it into code editors to make coding more efficient and less error-prone. The conversation then shifts to discuss the generation of research papers, specifically software announcements, by leveraging code documentation. The participants believe ChatGPT could be useful in generating introductions and backgrounds for such publications. They also touch upon the utility of language models in translating documentation into different human languages to assist non-native English speakers. The discussion returns to code documentation, focusing on the tool "write the docs" which auto-generates well-structured and searchable documentation websites. The participants appreciate the tool's ease of use and the potential it has in maintaining proper documentation for projects. The conversation ends with an acknowledgment of the importance of human oversight in automating tasks using language models. Links: Write-the software: https://github.com/Wytamma/write-the Wytamma Wirth: https://www.wytamma.com/

ai english write chatgpt software development speeding up bioinformatics links write

114 Write-the: Automating Code Documentation ChatGPT

Play Episode Listen Later Nov 4, 2023 25:35

In this episode, we dive deep into the world of automated code documentation and conversion using ChatGPT through the write-the software developed by Dr Wytamma Wirth from The University of Melbourne. Our guest, an experienced software engineer, takes us on a journey through the challenges and nuances of writing code documentation and the role AI can play in easing this process. We explore the intersection of ChatGPT's capabilities with Write the Docs, a documentation system widely used by developers. From highlighting ChatGPT's ability to understand and generate code snippets, to demonstrating real-time code conversion across multiple programming languages, this episode is a treasure trove for developers looking to enhance their workflow. Whether you're a seasoned developer or just getting started, tune in to discover how the synergy of AI and coding can elevate your documentation game to the next level! Links: Write-the software: https://github.com/Wytamma/write-the Wytamma Wirth: https://www.wytamma.com/

university ai write chatgpt code melbourne docs automating documentation links write

113 Global Microbial Identifier conference with Ruth Timme

Play Episode Listen Later Sep 18, 2023 8:11

Lee and Andrew are at the Global Microbial Identifier conference (GMI13) in Vancouver Canada. On day 3 we catch up with Dr Ruth Timme.

global conference identifier microbial vancouver canada timme

112 Global Microbial Identifier conference with Will Hsiao

Play Episode Listen Later Sep 17, 2023 7:35

Andrew and Lee are at the Global Microbial Identifier conference (GMI13) in Vancouver Canada. We talk to Dr William Hsiao, one of the organisers of the conference.

global conference identifier microbial vancouver canada hsiao

111 Global Microbial Identifier with Finlay Maguire and Emma Griffiths

Play Episode Listen Later Sep 16, 2023 21:08

Andrew and Lee are at the Global Microbial Identifier conference (GMI13) in Vancouver, Canada. On the first day they talked to Dr Finlay Maguire and Dr Emma Griffiths about microbial genomics and Tim Hortons.

global maguire griffiths tim hortons finlay identifier microbial vancouver canada

110 ChatGPT: The Bioinformatics Calculator

Play Episode Listen Later Jul 6, 2023 26:47

In this episode there is a comprehensive discussion on the influence of AI, especially GPT-4, in the sphere of microbial bioinformatics. They reflect on a study testing GPT-4's problem-solving capabilities, which raises concerns about its potential impact on employment practices and academic integrity. There's speculation that AI's proficiency in tackling standard technical problems could interfere with genuinely evaluating a candidate's knowledge during interviews. Drawing parallels with calculators, the hosts deliberate on whether AI tools should be permitted during assessments. They stress the necessity for individuals to possess a deep understanding of their domain to accurately interpret and validate AI solutions. Discussing the AI's limitations, the hosts highlight its struggles with regular expressions and handling larger scripts. They observe the AI tends to loop and repeat itself, performing better with shorter scripts but faltering on more complex tasks often seen in bioinformatics. This prompts a discussion on how educators should address these developments in their teaching strategies. Moreover, the hosts explore the potential of large language models to improve base calling and read correction in sequencing, drawing on the structured and predictable nature of language and genetic code. They also discuss the idea of introducing randomness in these models to generate creative and varied solutions, potentially predicting future alleles or gene configurations. Ultimately, they express a blend of enthusiasm and apprehension towards the swift advances in this field and the ensuing implications for bioinformatics. They end on a note of anticipation for future developments, with a humorous nod towards AI's potential for automating mundane tasks like auto-correcting sample sheets. References: What Is ChatGPT Doing … and Why Does It Work? 
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/ Many bioinformatics programming tasks can be automated with ChatGPT https://arxiv.org/ftp/arxiv/papers/2303/2303.13528.pdf ChatGPT for bioinformatics https://medium.com/@91mattmoore/chatgpt-for-bioinformatics-404c6d0817a1 Empowering Beginners in Bioinformatics with ChatGPT https://www.biorxiv.org/content/10.1101/2023.03.07.531414v1 Lawyer uses GPT and gets ethics violation https://simonwillison.net/2023/May/27/lawyer-chatgpt/ Can ChatGPT solve bioinformatic problems with Python? https://dmnfarrell.github.io/bioinformatics/chatGPT-python

ai drawing chatgpt gpt python calculator bioinformatics can chatgpt why does it work

109 AI Unleashed: Navigating the Opportunities and Challenges of AI in Microbial Bioinformatics

Play Episode Listen Later Jun 22, 2023 30:56

In this episode of the Micro Binfie Podcast, titled "AI Unleashed: Navigating the Opportunities and Challenges of AI in Microbial Bioinformatics", Lee, Nabil, and Andrew unpack the implications of generative predictive text AI tools, notably GPT, on microbial bioinformatics. They kick off the conversation by outlining the various applications of AI tools in their work, which range from generating boilerplate programs, drafting documents, to summarizing vast tracts of data. Andrew talks about his experience with GPT in coding, specifically via VS Code and GitHub Copilot, highlighting how GPT can generate nearly 90% of the necessary code based on a brief description of the task, thereby accelerating his work. He goes on to discuss the use of GPT in clarifying lines of code and notes that they used AI to generate a paper on the ethical considerations of employing AI in microbial genomics research during a recent hackathon. The conversation then switches gears as Nabil shares his experience of using GPT to standardize date formats in tables and summarize paper abstracts. While GPT is generally accurate in performing simple tasks, he warns that the tool can sometimes provide erroneous answers. Nabil also highlights GPT's ability to generate plausible but inaccurate responses for complex prompts, as illustrated by his experience when he used it to find a route in a video game. Andrew then talks about a script they created during a hackathon, which produces podcast episodes reviewing math tools. He points out the issues encountered, such as GPT providing wrong factual information. Looking ahead, Andrew envisions a future awash with GPT-generated content that may or may not be correct, raising the challenge of discerning real and false information. However, they also acknowledge the potential benefits of AI technologies for those with visual impairments, though it's far from a perfect solution at present. 
The conversation veers to the use of AI tech in handling boilerplate code and generating code snippets based on predictive text. The hosts further discuss the potential for this tool in rapid language learning. A live experiment ensues where Nabil and Andrew use a Perl script and utilize GPT-4 to convert this script into Python and back again to assess its capabilities in language translation. The AI tool proves proficient, considering comments, usage, and authorship and employing popular libraries like BioPython intelligently, though it does leave a disclaimer about potential inaccuracies. They consider the possibility of using AI to optimize coding, similar to minifying JavaScript, and even the idea of iterating through multiple languages and assessing the output. Nabil initiates a simpler task for the AI, asking it to write a Python script translating DNA into protein, which then gets translated into Rust. Andrew shares his experience of using AI to generate a Python class that compares two spreadsheets using pandas, demonstrating AI's comprehension and execution of complex tasks. In summary, this episode underscores the power and potential of AI in coding and the need for human oversight to ensure the quality and effectiveness of AI-generated content. It offers a glimpse into a future where AI tools, despite their limitations, can revolutionize many aspects of programming, bringing in new efficiencies and methods of working.
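For readers curious about the simpler task mentioned above (a Python script translating DNA into protein), here is a minimal sketch. It is not the script generated during the episode: the function name `translate_dna` and its `to_stop` parameter are hypothetical, and the codon table is the standard genetic code built by hand rather than taken from BioPython, so the example runs with no extra dependencies.

```python
# Minimal sketch of DNA-to-protein translation using the standard genetic code.
# All names here are illustrative assumptions, not taken from the episode.

BASES = "TCAG"
# Amino acids for the 64 codons, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the codon -> amino acid lookup table ("*" marks a stop codon).
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((x, y, z) for x in BASES for y in BASES for z in BASES),
        AMINO_ACIDS,
    )
}


def translate_dna(dna: str, to_stop: bool = True) -> str:
    """Translate a DNA string in reading frame 1.

    Codons containing anything other than A, C, G or T become "X".
    Trailing bases that do not fill a whole codon are ignored.
    If to_stop is True, translation ends before the first stop codon.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_dna("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))
```

As the hosts stress, output like this still needs human checking: a real script would also have to handle reverse strands, alternative genetic codes and ambiguous bases, which this sketch deliberately leaves out.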

ai challenges opportunities navigating dna rust unleashed gpt python javascript nabil microbial github copilot vs code bioinformatics

Encore: What language should I learn?

Play Episode Listen Later Jun 8, 2023 29:48

The MicroBinfie podcast discusses the top programming languages for bioinformatics. Andrew, Lee, and Nabil agree that Python is a great starting point for its consistency and rigor. Its strict syntax is ideal for teaching programming fundamentals that are essential in any language. In contrast, Perl encourages multiple ways of doing the same thing, creating confusion and difficulties in keeping track of things. The hosts caution against starting with trendy languages that are constantly changing. Instead, stick with more established languages like Python, which have established libraries and concepts that will help you advance more easily. Trendy languages come and go like changing tides, making them riskier choices. Additionally, they highlight the importance of understanding databases and their primary keys and unique fields. SQL is useful, particularly in dealing with large datasets. It is consistent across flavors and unlikely to go away soon. It takes a lot of skill to optimize queries to work in milliseconds. The hosts emphasize that the language you choose to learn depends on your individual goals and environment. For instance, Lee suggests that you should look to who is in your space and what they are using and who is willing to help you. Once you understand the programming concepts, it is easier to transfer them to other languages, and it is just a question of understanding the syntax. Andrew, Lee, and Nabil also discuss their own trajectories of learning programming languages, revealing that it takes a long time to become an expert in a language, and it is something that needs to be appreciated. They highlight the difference between just learning the basics of a language and really getting into the depths of it and the frameworks and libraries. The hosts also mention languages that are important to pick up, like SQL and bash scripting, and languages that are popular for web development, like JavaScript. 
However, they caution that JavaScript and Java are not the same thing and that JavaScript has a reputation for being a weird language. When asked what language they would choose for a task, Nabil says he would use Perl, Lee mentions R for stats, while Andrew admits that he has to relearn R every time he comes back to it and therefore prefers Perl for quick scripts. They also discuss their love-hate relationship with R, mentioning that while it has useful libraries like ggplot2 and ggtree, its syntax is difficult to work with and it offers several competing paradigms for approaching the same problem. The hosts conclude by acknowledging that there is no one-size-fits-all approach to learning programming languages. One should choose based on one's goals, environment, and personal preferences. Python is a useful language to learn, even if one is not interested in bioinformatics. Additionally, they note that the fundamentals of databases and how they work are crucial to understand and are used across fields.

language python java javascript trendy sql nabil

108 SeqCode: a nomenclatural code for prokaryotes described from sequence data

Play Episode Listen Later May 25, 2023 45:59

We are back talking about systematics and SeqCode, a nomenclatural code for prokaryotes described from sequence data. Marike Palmer is a Postdoctoral researcher in the School of Life Sciences at the University of Nevada Las Vegas and Miguel Rodriguez is an Assistant Professor of Bioinformatics at the University of Innsbruck in the departments of Microbiology and the Digital Science Center (DiSC). Link to paper: https://www.nature.com/articles/s41564-022-01214-9 History paper: https://www.sciencedirect.com/science/article/pii/S0723202022000121 Palmer explained that the impetus for the SeqCode was the need to accommodate previously uncultivated organisms under a dedicated nomenclatural code. She emphasized that the SeqCode was written to allow any peer-reviewed publication, but noted that the authors have designed three paths of validation in the SeqCode. They hope that anyone proposing a name will work with the curation team to ensure the best quality descriptions, names, etymology, and solidification. Rodriguez discussed the SeqCode's governance, which is already in place and has been made public so that anyone interested can join the SeqCode community. The governance structure comprises an executive board, committees, and working groups. Some committee positions are held by co-opted members, while others are chosen by ballot. The hosts sought to clarify the relationship between ISME, the society backing the SeqCode, and the wider field in general. Rodriguez explained that ISME is simply providing support as an umbrella organization for the SeqCode. Palmer and Rodriguez clarified that the SeqCode is not a competing code but rather a parallel one that aims to accommodate previously uncultivated organisms.
Palmer noted that most scientists culture prokaryotes not for naming but to advance their knowledge of these organisms through physiology experiments. They emphasized that the new system is the result of a long collaborative effort that involved many different viewpoints and philosophies. The episode also discussed the practical requirements for naming under the new system, which include standards for the completeness and contamination levels required in the genome sequence data. Palmer noted that while the 16S rRNA gene sequence was not required for naming, it was recommended for improved accuracy in cross-talk between different taxonomies. The conversation highlighted the importance and challenges of naming microorganisms and the ongoing efforts to create a system that is inclusive of all microorganisms, both cultivated and uncultivated. Rodriguez and Palmer agreed that high-quality genomes should be the main nomenclatural types, to ensure the system builds up rather than breaks down. They noted the challenge of obtaining full genomes of some organisms, such as obligate intracellular parasites, but suggested obtaining housekeeping genes as a potential solution. They further explained the technical difficulty of estimating completeness or contamination for many taxa, but Palmer confirmed that registering a name on the SeqCode registry requires adding such estimates. The episode emphasized the importance of collaboration within the scientific community and showed that, despite the challenges inherent in naming microorganisms, scientists are working towards a system that is accurate, practical, and beneficial for all.

university history school data code assistant professor rodriguez life sciences sequence microbiology innsbruck bioinformatics isme postdoctoral nevada las vegas miguel rodriguez prokaryotes 16s rrna

107 Systematics and naming of prokaryotes in the era of sequencing

Play Episode Listen Later May 11, 2023 33:16

Today we are talking about systematics, and specifically SeqCode, a nomenclatural code for prokaryotes described from sequence data. Joining us to talk about it are two co-authors on the recent publication, Marike Palmer and Miguel Rodriguez. Marike Palmer is a Postdoctoral researcher in the School of Life Sciences at the University of Nevada Las Vegas and Miguel Rodriguez is an Assistant Professor of Bioinformatics at the University of Innsbruck in the departments of Microbiology and the Digital Science Center (DiSC).

university school assistant professor naming life sciences microbiology innsbruck sequencing bioinformatics postdoctoral nevada las vegas systematics miguel rodriguez prokaryotes

106 Why on earth would you do a PostDoc?

Play Episode Listen Later Apr 27, 2023 15:19

An honest discussion about the upsides and downsides of doing a postdoc. Guests Dr Emma Waters, Dr Heather Felgate and Dr Muhammad Yasir are joined by Dr Andrew Page. It was recorded in front of a live audience of first-year PhD students at the Microbes, Microbiomes and Bioinformatics doctoral training program in the Quadram Institute in Norwich, UK. Emma starts the conversation by sharing that she enjoys research and solving problems with different tools. She loves the thrill of discovery and exploration that comes with a postdoc position. Heather echoes Emma's thoughts and says she is happy where she is, rather than chasing after a higher-paying job in industry. She appreciates the flexibility that academia offers, which has enabled her to balance her family and personal life. The conversation takes a turn when PhD students ask if any of the postdocs regret choosing academia despite the evident pay gap between industry and academia. Emma points out that although she might have earned more in industry, she is happy where she is and finds satisfaction in helping people through her work. Chasing profits in industry would not offer her that kind of gratification. Yasir shares his success story of sequencing 600 samples of the SARS-CoV-2 virus in Pakistan, and how it contributed to the fight against the pandemic. He credits the freedom and flexibility of academia, which allow him to collaborate with colleagues from all over the world. In conclusion, Andrew advises students to explore their options and to keep their careers open-ended. He suggests that if they are after a higher paycheck, they should consider the bioinformatics and data science path, which offers more earning opportunities in industry. The postdocs stress the importance of following what makes one happy in life, rather than chasing big salaries.

phd chasing pakistan microbiome microbes postdoc bioinformatics yasir earth would andrew page sar cov

105 Mobile genetic elements panel discussion

Play Episode Listen Later Apr 20, 2023 43:54

This is a panel discussion on mobile genetic elements (MGEs), guest chaired by Dr Muhammad Yasir with guests Dr Emma Waters, Dr Heather Felgate and Dr Andrew Page. We cover AMR, Salmonella Typhi, Staphylococci, outbreaks and the role of MGEs. It was recorded in front of a live audience of PhD students at the Microbes, Microbiomes and Bioinformatics doctoral training program in the Quadram Institute in Norwich, UK.

phd mobile elements genetic panel discussion microbiome microbes amr bioinformatics emma waters andrew page staphylococci

104 The Kraken software suite

Play Episode Listen Later Apr 6, 2023 21:50

We talk about Kraken, the taxonomic classification software, and the software suite around it, and are joined by Jennifer Lu and Natalia Rincon from Johns Hopkins University Center for Computational Biology.

software suite kraken johns hopkins university computational biology jennifer lu

103 Release the Kraken

Play Episode Listen Later Mar 23, 2023 27:09

We are talking about Kraken, the taxonomic classification software, and in the hot seat are Dr Jennifer Lu and Natalia Rincon from Johns Hopkins University Center for Computational Biology.

kraken johns hopkins university computational biology jennifer lu
