Podcasts about microarrays

  • 16 podcasts
  • 20 episodes
  • 36 min average duration
  • 1 new episode per month
  • Latest: Dec 27, 2023




Latest podcast episodes about microarrays

The Nonlinear Library: LessWrong
LW - AI's impact on biology research: Part I, today by octopocta

Dec 27, 2023 · 6:35


Welcome to The Nonlinear Library, where we use text-to-speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI's impact on biology research: Part I, today, published by octopocta on December 27, 2023 on LessWrong.

I'm a biology PhD and have been working in tech for a number of years. I want to show why I believe that biological research is the most near-term, high-value application of machine learning. This has profound implications for human health, industrial development, and the fate of the world. In this article I explain the discoveries that machine learning has already enabled in biology. In the next article I will consider what this implies for the near term even without major improvements in AI, along with my speculation about how the expectations underlying our regulatory and business norms will fail. Finally, my last article will examine the longer-term possibilities for machine learning and biology, including crazy but plausible sci-fi speculation.

TL;DR

Biology is complex, and the potential space of biological solutions to chemical, environmental, and other challenges is incredibly large. Biological research generates huge, well-labeled datasets at low cost, which is a perfect fit for current machine learning approaches. Without computational assistance, humans have a very limited ability to understand biological systems well enough to simulate, manipulate, and generate them. Machine learning is giving us tools to do all of the above. This means that areas constrained by human limits, such as drug discovery or protein structure determination, are suddenly unconstrained, turning a paucity of results into a superabundance in one step.

Biology and data

Biological research has been using technology to collect vast datasets since the bioinformatics revolution of the 1990s. DNA sequencing costs have dropped by five orders of magnitude in 20 years (from $100,000,000 per human genome to $1,000 per genome)[1]. Microarrays allowed researchers to measure changes in mRNA expression across the entire genome of many species in response to different experimental conditions. High-throughput cell sorting, robotic multi-well assays, proteomics chips, automated microscopy, and many other technologies generate petabytes of data. As a result, biologists have been using computational tools to analyze and manipulate big datasets for over 30 years. Labs create, use, and share programs. Grad students are quick to adopt open source software, and lead researchers have been investing in powerful computational resources. There is a strong culture of adopting new technology, and this extends to machine learning.

Leading machine learning experts want to solve biology

Computer researchers have long been interested in applying computational resources to biological problems. Hedge fund billionaire David E. Shaw intentionally started a hedge fund so that he could fund computational biology research[2]. Demis Hassabis, a DeepMind co-founder, holds a PhD in neuroscience. Under his leadership DeepMind has made biological research a major priority, spinning off Isomorphic Labs[3], which focuses on drug discovery. The Chan Zuckerberg Initiative is devoted to enabling computational research in biology and medicine to "cure, prevent, or manage all diseases by the end of this century"[4]. This shows that machine learning research at the highest level is being devoted to biological problems.

What have we discovered so far?

In 2020, DeepMind's AlphaFold2 program achieved accuracy comparable to the best experimental methods of protein structure determination at the CASP14 protein structure prediction contest.[5] This result "solved the protein folding problem"[6] for the large majority of proteins, showing that a high-quality, biologically accurate 3D protein structure could be generated from the protein's amino acid sequence, which is itself specified by the DNA sequence that encodes the protein. DeepMind then used AlphaFold2 to generate structures for all proteins kn...
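
The sequencing-cost figures quoted above can be checked with a quick calculation: the number of orders of magnitude separating two costs is the base-10 logarithm of their ratio. A minimal Python sketch (the function name is illustrative, not from the article):

```python
import math

def orders_of_magnitude(old_cost: float, new_cost: float) -> float:
    """Return how many powers of ten separate an old and a new cost."""
    if old_cost <= 0 or new_cost <= 0:
        raise ValueError("costs must be positive")
    return math.log10(old_cost / new_cost)

# Cost per human genome, as quoted in the article
drop = orders_of_magnitude(100_000_000, 1_000)
print(f"{drop:.1f} orders of magnitude")
```

For $100,000,000 versus $1,000 the ratio is 10^5, i.e. five orders of magnitude.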

The Nonlinear Library
LW - AI's impact on biology research: Part I, today by octopocta

Dec 27, 2023 · 6:35



The Drug Discovery World Podcast
Helping to Advance Precision Medicine: Sengenics i-Ome Discovery Microarrays

Oct 19, 2023 · 29:19


The latest sponsored DDW Sitting Down With podcast features Professor Jonathan Blackburn, Chief Scientific Officer of Sengenics, who discusses the advantages of functional protein microarrays, the challenges to overcome in biomarker discovery, Sengenics' new i-Ome Discovery chip, and more. As CSO, Professor Blackburn heads the R&D and product development strategy. He invented the patented Sengenics K-REX technology, holds more than 30 technology patents, and has published more than 130 peer-reviewed papers, plus 9 book chapters. He serves on several committees, including the FNIH Steering Committees for Cancer and Inflammation & Immunity, and the Council of the Human Proteome Organization (HUPO). Listen now.

The Medbullets Step 1 Podcast
Biochemistry | Microarrays

May 6, 2023 · 4:58


In this episode, we review the high-yield topic of Microarrays from the Biochemistry section. Follow Medbullets on social media: Facebook: www.facebook.com/medbullets Instagram: www.instagram.com/medbulletsofficial Twitter: www.twitter.com/medbullets. Send in a voice message: https://podcasters.spotify.com/pod/show/medbulletsstep1/message

PaperPlayer biorxiv biochemistry
SARS-CoV-2 epitope mapping on microarrays highlights strong immune-response to N protein region

Nov 9, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.09.374082v1?rss=1 Authors: Musico, A., Frigerio, R., Mussida, A., Barzon, L., Sinigaglia, A., Riccetti, S., Gobbi, F., Piubelli, C., Bergamaschi, G., Chiari, M., Gori, A., Cretich, M. Abstract: A workflow for SARS-CoV-2 epitope discovery on peptide microarrays is reported. The process started with a proteome-wide screening of immunoreactivity on a high-density microarray, followed by a refinement and validation phase on a restricted panel of probes, using microarrays with tailored peptide immobilization through a click-based strategy. Progressively larger, independent cohorts of Covid-19-positive sera were tested during refinement, leading to the identification of immunodominant regions on the SARS-CoV-2 Spike (S) protein, Nucleocapsid (N) protein, and Orf1ab polyprotein. A summary study testing 50 serum samples highlighted an epitope of the N protein (region 155-171) that provided 92% sensitivity and 100% specificity for IgG detection in Covid-19 samples, making it a promising candidate for rapid implementation in serological tests. Copyright belongs to the original authors. Visit the link for more info.
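
Sensitivity and specificity, as reported in the abstract above, are simple ratios taken from a confusion matrix. The sketch below shows the two formulas in Python; the counts in the usage lines are hypothetical and are not the study's data (the abstract does not give the per-group breakdown of the 50 samples).

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly positive sera (e.g. Covid-19 positive) that the test detects."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of truly negative sera that the test correctly calls negative."""
    return tn / (tn + fp)

# Hypothetical counts for illustration only
tp, fn = 45, 5   # positive sera: detected vs missed
tn, fp = 38, 2   # negative sera: correctly negative vs false positives
print(f"sensitivity = {sensitivity(tp, fn):.0%}")
print(f"specificity = {specificity(tn, fp):.0%}")
```

A specificity of 100% simply means no false positives were observed among the negative samples tested (fp = 0).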

PaperPlayer biorxiv bioinformatics
CuBlock: A cross-platform normalization method for gene-expression microarrays

Oct 29, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.29.360198v1?rss=1 Authors: Junet, V., Farres, J., Mas, J. M., Daura, X. Abstract: Cross-(multi)platform normalization of gene-expression microarray data remains an unresolved issue. Although several algorithms exist, they are constrained either by the need to normalize all samples of all platforms together, which compromises scalability and reuse, by adherence to the platforms of a specific provider, or simply by poor performance. In addition, many of the methods presented in the literature have not been specifically tested against multi-platform data and/or against other methods applicable in this context. We therefore set out to develop a normalization algorithm suited to gene-expression studies based on multiple, potentially large microarray sets collected across multiple platforms and at different times, applicable in systematic studies aimed at extracting knowledge from the wealth of microarray data available in public repositories, for example to extract Real-World Data that complement data from Randomized Controlled Trials. Our main performance criterion was the algorithm's ability to properly separate samples from different biological groups. Copyright belongs to the original authors. Visit the link for more info.
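
The abstract does not describe how CuBlock itself works, so the sketch below is not CuBlock. It shows classic quantile normalization, a standard single-platform microarray method, to illustrate what "normalizing" samples to a common distribution means; cross-platform methods such as CuBlock tackle the harder case where samples come from different platforms. The function name and example values are hypothetical.

```python
from typing import List

def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize expression values.

    samples[j][i] is the value of probe i in sample j. Afterwards every
    sample has the same distribution: the mean, across samples, of the
    k-th smallest values. Ties are broken by probe order (a simplification;
    some implementations average tied ranks instead).
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must measure the same probes")
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples
    reference = [sum(s[k] for s in sorted_samples) / len(samples)
                 for k in range(n_probes)]
    normalized = []
    for s in samples:
        # Probe indices ordered from smallest to largest value in this sample
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace the value by the reference quantile
        normalized.append(out)
    return normalized

arrays = [[5.0, 2.0, 3.0, 4.0],   # hypothetical sample 1
          [4.0, 1.0, 4.5, 2.0],   # hypothetical sample 2
          [3.0, 4.0, 6.0, 8.0]]   # hypothetical sample 3
for row in quantile_normalize(arrays):
    print([round(v, 2) for v in row])
```

Because every sample must contain the same probes, this approach does not transfer directly between platforms with different probe designs, which is the gap cross-platform methods aim to close.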

Oncotarget
Screening of cancer tissue arrays identifies CXCR4 on adrenocortical carcinoma

Nov 10, 2017 · 48:42


Title - Screening of cancer tissue arrays identifies CXCR4 on adrenocortical carcinoma: correlates with expression and quantification on metastases using 64Cu-plerixafor PET
Abstract - Expression of the chemokine receptor CXCR4 by many cancers correlates with aggressive clinical behavior. As part of the initial studies in a project whose goal was to quantify CXCR4 expression on cancers non-invasively, we examined CXCR4 expression in cancer samples by immunohistochemistry using a validated anti-CXCR4 antibody. Among solid tumors, we found expression of CXCR4 on significant percentages of major types of kidney, lung, and pancreatic adenocarcinomas, and, notably, on metastases of clear cell renal cell carcinoma and squamous cell carcinoma of the lung. We found particularly high expression of CXCR4 on adrenocortical cancer (ACC) metastases. Microarrays of ACC metastases revealed correlations between expression of CXCR4 and other chemokine system genes, particularly CXCR7/ACKR3, which encodes an atypical chemokine receptor that shares a ligand, CXCL12, with CXCR4. A first-in-human study using 64Cu-plerixafor for PET in an ACC patient prior to resection of metastases showed heterogeneity among metastatic nodules and good correlations among PET SUVs, CXCR4 staining, and CXCR4 mRNA. Additionally, we were able to show that CXCR4 expression correlated with the growth rates of the pulmonary lesions in this patient. Further studies are needed to better understand the role of CXCR4 in ACC and whether targeting it may be beneficial. In this regard, non-invasive methods for assessing CXCR4 expression, such as PET using 64Cu-plerixafor, should be important investigative tools. Full text - http://bit.ly/2hwPKOq Facebook - bit.ly/2xznxjV Twitter - bit.ly/2xzWvsu LinkedIn - bit.ly/2xzJ6kc Pinterest - bit.ly/2xzX8SS Reddit - bit.ly/2hoxI0N www.Oncotarget.com
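
The abstract reports correlations between CXCR4 and other chemokine-system genes across ACC metastases but does not say which correlation measure was used. As an assumption for illustration, the sketch below computes a Pearson correlation between two genes' expression values across samples; the gene values are hypothetical, not data from the study.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two genes' expression across the same samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Undefined (division by zero) if either gene has constant expression
    return cov / (sx * sy)

# Hypothetical log2 expression values across five metastases
cxcr4 = [7.1, 8.4, 6.2, 9.0, 7.8]
ackr3 = [5.0, 6.1, 4.3, 6.6, 5.5]
print(f"r = {pearson(cxcr4, ackr3):.2f}")
```

With only a handful of samples, a rank-based measure such as Spearman correlation is a common, more outlier-robust alternative.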

Mendelspod Podcast
Cancer Researcher Tim Triche on the Staying Power of Microarrays

Jul 13, 2015


In the second part of our interview with Tim Triche, Director of the Personalized Medicine Center at Children’s Hospital Los Angeles, Tim says that microarrays are still a vital technology for today’s cancer researcher. Using both next-gen sequencing and arrays in his research, Tim confirms that arrays still have advantages in the clinic as well, such as quicker turnaround time. Tim also weighs in on ongoing questions about whether poor biospecimen quality is hampering research efforts and whether genomic medicine is paying off for patients.

Medizin - Open Access LMU - Teil 19/22
Detection and correction of probe-level artefacts on microarrays

Jan 1, 2012


Background: A recent large-scale analysis of Gene Expression Omnibus (GEO) data found frequent evidence of spatial defects in a substantial fraction of the Affymetrix microarrays in GEO. Nevertheless, in contrast to quality assessment, artefact detection is not widely used in standard gene expression analysis pipelines. Furthermore, although approaches have been proposed to detect diverse types of spatial noise on arrays, correcting these artefacts is mostly left to summarization methods, or the affected arrays are discarded entirely. Results: We show that state-of-the-art robust summarization procedures are vulnerable to artefacts on arrays and cannot appropriately correct for them. To address this problem, we present a simple approach to detect artefacts with high recall and precision, which we further improve by taking into account the spatial layout of arrays. Finally, we propose two correction methods for these artefacts that either substitute values of defective probes using probeset information or filter corrupted probes. We show that our approach can identify and correct defective probe measurements appropriately and outperforms existing tools. Conclusions: While summarization is insufficient to correct for defective probes, this problem can be addressed in a straightforward way by the methods we present for identifying and correcting defective probes. As these methods output CEL files with corrected probe values that serve as input to standard normalization and summarization procedures, they can easily be integrated into existing microarray analysis pipelines as an additional pre-processing step. An R package is freely available from http://www.bio.ifi.lmu.de/artefact-correction.
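
The published method is distributed as an R package; the Python sketch below is a deliberately simplified, hypothetical illustration of the general idea in the abstract, not the authors' algorithm. It assumes a reference log-intensity for every probe position (for example, a median over other arrays of the same type), flags probes whose residual has a large robust z-score, keeps only flags that form spatial clusters, and then substitutes the reference value for flagged probes. The function names, the 3.0 cut-off and the two-neighbour rule are all assumptions.

```python
import statistics
from typing import List

Grid = List[List[float]]

def robust_z(values: List[float]) -> List[float]:
    """Median/MAD-based z-scores, which the outliers themselves barely distort."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad or 1e-12  # avoid division by zero on perfectly flat data
    return [(v - med) / scale for v in values]

def detect_artefacts(obs: Grid, ref: Grid, z_cut: float = 3.0,
                     min_neighbors: int = 2) -> List[List[bool]]:
    """Flag probes that deviate from the reference and have flagged neighbours."""
    rows, cols = len(obs), len(obs[0])
    resid = [obs[r][c] - ref[r][c] for r in range(rows) for c in range(cols)]
    z = robust_z(resid)
    raw = [[abs(z[r * cols + c]) > z_cut for c in range(cols)] for r in range(rows)]
    # Spatial filter: artefacts such as scratches or bubbles cover neighbouring
    # probes, so keep a flag only if enough of its 8 neighbours are flagged too.
    flags = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not raw[r][c]:
                continue
            n = sum(raw[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                    if (rr, cc) != (r, c))
            flags[r][c] = n >= min_neighbors
    return flags

def correct(obs: Grid, ref: Grid, flags: List[List[bool]]) -> Grid:
    """Replace flagged probes with the reference value (one simple choice)."""
    return [[ref[r][c] if flags[r][c] else obs[r][c]
             for c in range(len(obs[0]))] for r in range(len(obs))]
```

Substituting the reference value is only one possible correction; the abstract describes substituting values from probeset information or filtering corrupted probes, which would need probe-to-probeset annotation not modelled here.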

Biologia Molecolare Avanzata « Federica
4. An example of a comparative transcriptome study using DNA microarrays

Nov 18, 2011 · 71:17


Familial Combined Hyperlipidemia (FCHL): a dominant (?) multigenic genetic disease caused

Biologia Molecolare Avanzata « Federica
3. Transcriptome analysis using DNA microarrays

Nov 18, 2011 · 74:03


The powerful tools of postgenomics! Gene expression profiling using microarrays

Fakultät für Biologie - Digitale Hochschulschriften der LMU - Teil 04/06
Identification of cellular target genes of KSHV-encoded miRNAs

Jul 7, 2010


Herpesviruses express micro-(mi)RNAs that influence the expression of cellular and viral genes. The genome of Kaposi's sarcoma-associated herpesvirus (KSHV) encodes a cluster of 12 miRNAs in total, which are expressed during both latent and lytic infection. Because very few cellular target genes of KSHV miRNAs were known, the aim of this study was to identify genes whose expression is influenced by the viral miRNAs of KSHV. To this end, KSHV miRNAs were expressed in B cells and in endothelial cells using a lentiviral transduction system; both cell types are natural host cells of KSHV. The resulting cell lines were examined with two different experimental approaches. In the first approach, the overall expression profile of these cells was analyzed with microarrays and, after filtering by bioinformatic methods, candidates for regulation by viral miRNAs were identified. The result was validated with biochemical methods, and two cellular transcripts were confirmed as target genes. Functional analyses showed that KSHV miRNAs inhibit caspase 3 and thereby protect the cells from apoptosis. In the second, far more efficient approach, the so-called RISC complexes were isolated with AGO2-specific antibodies, both from cells expressing KSHV miRNAs and from latently KSHV-infected cells, and the mRNAs bound to them were identified. The RISC complex plays the decisive role in miRNA-induced regulation and contains, in addition to protein components (including Argonaute (AGO) proteins), both the active miRNAs and the regulated mRNAs. After isolation of the bound RNAs, this method identified 72 genes as targets of KSHV miRNAs.
Many of them play an important role in diverse processes such as cell cycle control, apoptosis, or mRNA processing. A total of 11 of the identified target genes were examined by real-time PCR, and 10 were confirmed. Six of these were analyzed further and confirmed with 3'UTR luciferase assays. For the genes LRRC8D, NHP2L1, and GEMIN8, the responsible KSHV miRNAs and the corresponding binding sites on the transcript were identified. Interestingly, for the latter two these sites were located not in the 3'UTR, as expected, but in the coding region.
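
The thesis identified its targets experimentally. A common computational complement, not described in the abstract and shown here only as an illustration, is to scan a transcript for sites complementary to the miRNA "seed" (conventionally nucleotides 2 to 8). The Python sketch below finds perfect 7-mer seed matches anywhere in a sequence, including the coding region; the function names are hypothetical, and the example sequences are made up, not KSHV miRNAs or the genes named above.

```python
def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A-U and G-C pairs only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))

def seed_sites(mirna: str, target: str, start: int = 1, length: int = 7) -> list:
    """0-based positions in target that pair perfectly with the miRNA seed.

    By default the seed is miRNA nucleotides 2-8 (string indices 1..7). Only
    exact Watson-Crick 7-mer matches are reported; real target prediction also
    weighs G:U wobble pairs, site type, sequence context and conservation.
    """
    seed = mirna.upper()[start:start + length]
    site = reverse_complement_rna(seed)
    target = target.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    i = target.find(site)
    while i != -1:
        hits.append(i)
        i = target.find(site, i + 1)
    return hits

# Hypothetical sequences for illustration only
mirna = "AUCGGAUCCAGUUAGCAUGACU"
transcript = "AAAGAUCCGAUUUGAUCCGA"
print(seed_sites(mirna, transcript))  # [3, 13]
```

Seed matching alone yields many false positives, which is why biochemical approaches such as the AGO2-RISC isolation described above remain valuable.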

Clinical Chemistry Podcast
Protein Microarrays for Personalized Medicine

Mar 29, 2010 · 8:03


Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02

In the 1990s a number of technological innovations appeared that revolutionized biology, and 'Bioinformatics' became a new scientific discipline. Microarrays can measure the abundance of tens of thousands of mRNA species, the complete genomic sequences of many different organisms are available, and other technologies make it possible to study various processes at the molecular level. In Bioinformatics and Biostatistics, current research and computations are limited by the available computer hardware. However, this problem can be solved using high-performance computing resources. There are several reasons for the increased focus on high-performance computing: larger data sets, increased computational requirements stemming from more sophisticated methodologies, and the latest developments in computer chip production. The open-source programming language 'R' was developed to provide a powerful and extensible environment for statistical and graphical techniques. There are many good reasons for preferring R to other software or programming languages for scientific computations (in statistics and biology). However, the development of the R language was not aimed at providing software for parallel or high-performance computing. Nonetheless, during the last decade, a great deal of research has been conducted on using parallel computing techniques with R. This PhD thesis demonstrates the usefulness of the R language and parallel computing for biological research. It introduces parallel computing with R, and reviews and evaluates existing techniques and R packages for parallel computing on Computer Clusters, on Multi-Core Systems, and in Grid Computing. From a computer-science point of view, the packages were examined for their reusability in biological applications, and some upgrades were proposed. Furthermore, parallel applications for next-generation sequence data and for preprocessing of microarray data were developed.
Microarray data are characterized by high levels of noise and bias. Because these perturbations have to be removed, preprocessing of raw data has been a high-priority research topic over the past few years. A new Bioconductor package called affyPara for parallelized preprocessing of high-density oligonucleotide microarray data was developed and published. The data can be partitioned by arrays using a block cyclic partition, and, as a result, parallelization of algorithms becomes directly possible. Existing statistical algorithms and data structures had to be adjusted and reformulated for use in parallel computing. Using the new parallel infrastructure, normalization methods can be enhanced and new methods become available. Partitioning the data and distributing it to several nodes or processors solves the main memory problem and accelerates the methods by up to a factor of fifteen for 300 arrays or more. The final part of the thesis contains a large cancer study analysing more than 7,000 microarrays from a publicly available database and estimating gene interaction networks. For this purpose, a new R package for microarray data management was developed, and various challenges in analysing this amount of data are discussed. Comparing gene networks for different pathways and different cancer entities in this large dataset partly confirms already established forms of gene interaction.
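
affyPara itself is an R/Bioconductor package, and the abstract does not give its partitioning code. The Python sketch below only illustrates what a block cyclic partition of arrays across workers means: consecutive blocks of arrays are dealt to workers in round-robin order. The function name, block size, worker count and file names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def block_cyclic(items: Sequence[T], n_workers: int, block_size: int) -> List[List[T]]:
    """Deal consecutive blocks of items to workers in round-robin order.

    Block b (items b*block_size up to (b+1)*block_size - 1) goes to worker
    b % n_workers, which keeps the load balanced even when the number of
    arrays is not a multiple of n_workers * block_size.
    """
    if n_workers < 1 or block_size < 1:
        raise ValueError("n_workers and block_size must be positive")
    parts: List[List[T]] = [[] for _ in range(n_workers)]
    for start in range(0, len(items), block_size):
        worker = (start // block_size) % n_workers
        parts[worker].extend(items[start:start + block_size])
    return parts

arrays = [f"array_{i:02d}.CEL" for i in range(10)]  # hypothetical file names
for w, part in enumerate(block_cyclic(arrays, n_workers=3, block_size=2)):
    print(w, part)
```

Each worker then only needs to hold its own share of arrays in memory, which is the memory benefit the abstract describes; steps that need information across all arrays (such as a common reference distribution) still require combining summaries from the workers.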

Medizinische Fakultät - Digitale Hochschulschriften der LMU - Teil 09/19
Strategies for expression analysis in functional gene groups

Feb 20, 2009


The analysis of gene expression data produced by microarray technology has become an interesting research field in statistics in recent years. The first methods in this area aim to filter differentially expressed genes out of the huge set of all genes on a microarray. The result of such a gene-wise analysis is a list of interesting genes. Lists of individually selected genes, however, are difficult to place in a biological context. Moreover, they depend strongly on the analysis method used and on the particular dataset, so gene lists from different research groups usually agree relatively poorly. The analysis of functional gene groups offers an alternative to, or extension of, the gene-wise approach. These groups incorporate prior biological knowledge about how genes interact, so relevant gene groups can be interpreted more meaningfully than individual relevant genes. Various methods for testing functional gene groups for differential expression are presented and compared at the methodological level as well as on real data examples and in simulation studies. Of particular interest is the family of gene groups defined by the Gene Ontology. The hierarchical structure of these ontologies poses an additional challenge for the analysis, especially for the adjustment for multiple testing. One global test for differential expression in gene groups is the GlobalAncova method, which was developed further in this work and made available as an R package. The significance of gene groups can be assessed either with a permutation model or with the asymptotic distribution of the test statistic. We set out the theoretical foundations and programming aspects of the method. GlobalAncova is suitable for analysing complex research questions.
Several detailed analyses carried out in collaboration with physicians and biologists are presented to illustrate this.
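
GlobalAncova uses its own ANCOVA-based test statistic, which the abstract does not spell out, so the sketch below is not GlobalAncova. It is a simplified permutation test in the same spirit: it asks whether a gene group as a whole differs between two sample groups, using the sum over genes of squared mean differences as a hypothetical statistic and shuffled sample labels as the null model.

```python
import random
from typing import List, Sequence

def group_statistic(expr: List[List[float]], labels: Sequence[int]) -> float:
    """Sum over genes of the squared difference in group means.

    expr[g][s] is the expression of gene g in sample s; labels[s] is 0 or 1.
    """
    total = 0.0
    for row in expr:
        g0 = [v for v, lab in zip(row, labels) if lab == 0]
        g1 = [v for v, lab in zip(row, labels) if lab == 1]
        total += (sum(g1) / len(g1) - sum(g0) / len(g0)) ** 2
    return total

def permutation_pvalue(expr: List[List[float]], labels: Sequence[int],
                       n_perm: int = 2000, seed: int = 0) -> float:
    """Share of label permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = group_statistic(expr, labels)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if group_statistic(expr, perm) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the p-value above 0
    return (hits + 1) / (n_perm + 1)
```

Permuting sample labels, rather than genes, keeps the correlation between genes in the group intact, which is the key reason permutation models are popular for gene-group tests.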

Medizin - Open Access LMU - Teil 15/22
Transcript-specific expression profiles derived from sequence-based analysis of standard microarrays.

Jan 1, 2009


Alternative mRNA processing mechanisms lead to multiple transcripts (i.e., splice isoforms) of a given gene, which may have distinct biological functions. Microarrays such as Affymetrix GeneChips measure the mRNA expression of genes using sets of nucleotide probes. Until recently, probe sets were not designed for transcript specificity. Nevertheless, re-analysing established microarray data with newly defined transcript-specific probe sets may provide information about the expression levels of specific transcripts. In the present study, probe sequences of the Affymetrix microarray HG-U133A were aligned with Ensembl transcript sequences to define transcript-specific probe sets. Out of a total of 247,965 perfect match probes, 95,008 were designated “transcript-specific”, i.e. showing complete sequence alignment, no cross-hybridization, and transcript-, not only gene-specificity. These probes were grouped into 7,941 transcript-specific probe sets and 15,619 gene-specific probe sets. The former were used to differentiate 445 alternative transcripts of 215 genes. For selected transcripts that this analysis predicted to be differentially expressed in the human kidney, confirmatory real-time RT-PCR experiments were performed. First, the expression of two specific transcripts of the genes PPM1A (PP2CA_HUMAN and P35813) and PLG (PLMN_HUMAN and Q5TEH5) in human kidneys was determined by the transcript-specific array analysis and confirmed by real-time RT-PCR. Second, disease-specific differential expression of single transcripts of PLG and ABCA1 (ABCA1_HUMAN and Q5VYS0_HUMAN) was computed from the available array data sets and confirmed by transcript-specific real-time RT-PCR. Transcript-specific analysis of microarray experiments can be used to study gene regulation at the transcript level with conventional microarray data. In this study, predictions based on sufficient probe set size and fold change are confirmed by independent means.
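
The abstract does not give the exact rules used to build the probe sets, so the Python sketch below is a hypothetical simplification of the idea: given the set of transcripts each probe aligns to perfectly, a probe hitting transcripts of more than one gene (possible cross-hybridization) or none is discarded, a probe hitting every known transcript of one gene is gene-specific, and a probe hitting only some transcripts of one gene is transcript-specific. Probes with identical hit sets are then grouped into probe sets. All identifiers are made up.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

def classify_probes(probe_hits: Dict[str, Set[str]],
                    transcript_gene: Dict[str, str]) -> Dict[str, str]:
    """Label each probe by the transcripts it matches perfectly."""
    gene_transcripts: Dict[str, Set[str]] = defaultdict(set)
    for tx, gene in transcript_gene.items():
        gene_transcripts[gene].add(tx)

    labels = {}
    for probe, hits in probe_hits.items():
        genes = {transcript_gene[t] for t in hits}
        if len(genes) != 1:
            labels[probe] = "discard"  # no hit, or hits several genes
            continue
        gene = genes.pop()
        labels[probe] = ("gene-specific" if hits == gene_transcripts[gene]
                         else "transcript-specific")
    return labels

def group_probe_sets(probe_hits: Dict[str, Set[str]],
                     labels: Dict[str, str]) -> Dict[FrozenSet[str], List[str]]:
    """Group retained probes that match exactly the same transcripts."""
    sets: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for probe, label in labels.items():
        if label != "discard":
            sets[frozenset(probe_hits[probe])].append(probe)
    return dict(sets)

# Hypothetical annotation: GENE1 has two transcripts, GENE2 has one
tx_gene = {"TX1a": "GENE1", "TX1b": "GENE1", "TX2a": "GENE2"}
hits = {"p1": {"TX1a"}, "p2": {"TX1a", "TX1b"}, "p3": {"TX1a", "TX2a"}, "p4": set()}
labels = classify_probes(hits, tx_gene)
print(labels)
print(group_probe_sets(hits, labels))
```

In practice a minimum number of probes per set would also be required before trusting a transcript-level estimate, consistent with the abstract's emphasis on sufficient probe set size.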

Medizin - Open Access LMU - Teil 15/22
CMA - a comprehensive Bioconductor package for supervised classification with high dimensional data

Jan 1, 2008


Background: For the last eight years, microarray-based classification has been a major topic in statistics, bioinformatics and biomedical research. Traditional methods often yield unsatisfactory results or may even be inapplicable in the so-called "p >> n" setting, where the number of predictors p far exceeds the number of observations n, making it an ill-posed problem. Careful model selection and evaluation that satisfy accepted good-practice standards are very complex tasks for statisticians without experience in this area or for scientists with limited statistical background. The multiplicity of available methods for class prediction based on high-dimensional data is an additional practical challenge for inexperienced researchers. Results: In this article, we introduce a new Bioconductor package called CMA (standing for "Classification for MicroArrays") for automatically performing variable selection, parameter tuning, classifier construction, and unbiased evaluation of the constructed classifiers using a large number of commonly used methods. Without much time and effort, users are provided with an overview of the unbiased accuracy of most top-performing classifiers. Furthermore, the standardized evaluation framework underlying CMA can also benefit statistical research for comparison purposes, for instance when a new classifier has to be compared with existing approaches. Conclusion: CMA is a user-friendly, comprehensive package for classifier construction and evaluation that implements most common approaches. It is freely available from the Bioconductor website at http://bioconductor.org/packages/2.3/bioc/html/CMA.html.
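
CMA is an R/Bioconductor package, and the sketch below does not use or reproduce it. It illustrates, in Python, one principle behind "unbiased evaluation" in the p >> n setting: variable selection must be repeated inside each cross-validation fold using only training samples, because selecting genes on all samples first leaks test information and inflates accuracy. The mean-difference filter and nearest-centroid classifier are chosen for brevity; all names and the simulated data are hypothetical.

```python
import random

def select_features(X, y, train, k):
    """Top-k features by absolute difference of class means, using training samples only."""
    def score(j):
        a = [X[i][j] for i in train if y[i] == 0]
        b = [X[i][j] for i in train if y[i] == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def predict_nearest_centroid(X, y, train, feats, x):
    """Assign x to the class whose training centroid (on feats) is closest."""
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in train if y[i] == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    def dist(c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, centroids[c]))
    return min((0, 1), key=dist)

def cv_accuracy(X, y, n_folds=5, k=10):
    """Cross-validated accuracy with feature selection redone inside every fold."""
    n = len(X)
    correct = 0
    for f in range(n_folds):
        test = [i for i in range(n) if i % n_folds == f]
        train = [i for i in range(n) if i % n_folds != f]
        feats = select_features(X, y, train, k)  # never sees the test fold
        correct += sum(predict_nearest_centroid(X, y, train, feats, X[i]) == y[i]
                       for i in test)
    return correct / n

# Hypothetical data: 20 samples, 50 features, only feature 0 is informative
rng = random.Random(0)
y = [i % 2 for i in range(20)]
X = [[3.0 * y[i] + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(49)]
     for i in range(20)]
print(f"cross-validated accuracy: {cv_accuracy(X, y, k=5):.2f}")
```

Running the same loop with selection done once on all 20 samples before cross-validation would typically report optimistic accuracy even on pure-noise data, which is the bias this design avoids.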

Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02
Text Mining and Gene Expression Analysis Towards Combined Interpretation of High Throughput Data

Sep 13, 2007


Microarrays can capture gene expression activity for thousands of genes simultaneously and thus make it possible to analyze cell physiology and disease processes at the molecular level. The interpretation of microarray gene expression experiments benefits from knowledge about the analyzed genes and proteins and the biochemical networks in which they play a role. The trend is toward data analysis methods that integrate diverse data types. Currently, the most comprehensive biomedical knowledge source is the large body of free-text articles. Text mining makes it possible to extract and use information from these texts automatically. This thesis addresses two key aspects, biomedical text mining and gene expression data analysis, with a focus on providing high-quality methods and data that contribute to the development of integrated analysis approaches. The work is structured in three parts. Each part begins with the relevant background, and each chapter describes the developed methods as well as their applications and results. Part I deals with biomedical text mining. Chapter 2 summarizes the relevant background of text mining: its fundamentals, important text mining tasks, applications and particularities of text mining in the biomedical domain, and evaluation issues. In Chapter 3, a method for generating high-quality gene and protein name dictionaries is described. The analysis of the generated dictionaries revealed important properties of individual nomenclatures and of the databases used (Fundel and Zimmer, 2006). The dictionaries are publicly available via a wiki, a web service, and several client applications (Szugat et al., 2005). In Chapter 4, methods for the dictionary-based recognition of gene and protein names in texts and their mapping onto unique database identifiers are described. These methods make it possible to extract information from texts and to integrate text-derived information with data from other sources.
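Dictionary-based name recognition of the kind described for Chapters 3 and 4 can be approximated by normalizing synonyms and scanning text for the longest dictionary match. The sketch below is a simplified stand-in, not the algorithm of ProMiner or the thesis's systems: the normalization rule (case-folding and dropping spaces and hyphens), the greedy longest-match strategy, the `max_len` limit, and the `GENE:000n` identifiers are all hypothetical.

```python
import re


def normalize(name):
    """Case-fold and drop spaces/hyphens so 'IL-2', 'IL 2' and 'il2' compare equal."""
    return re.sub(r"[\s\-]+", "", name.lower())


def build_dictionary(entries):
    """entries: identifier -> list of synonyms. Returns normalized synonym -> set of ids."""
    index = {}
    for ident, synonyms in entries.items():
        for syn in synonyms:
            index.setdefault(normalize(syn), set()).add(ident)
    return index


def tag(text, index, max_len=4):
    """Greedy longest match over token n-grams; returns (surface text, ids) pairs."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", text)
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = normalize("".join(tokens[i:i + n]))
            if key in index:
                found.append((" ".join(tokens[i:i + n]), sorted(index[key])))
                i += n
                break
        else:
            # No dictionary entry starts at this token.
            i += 1
    return found


entries = {
    "GENE:0001": ["IL2", "interleukin 2", "interleukin-2"],
    "GENE:0002": ["IL6", "interleukin 6"],
}
index = build_dictionary(entries)
print(tag("Expression of interleukin-2 and IL 6 was increased.", index))
```

Mapping each normalized synonym to a set of identifiers keeps ambiguous names visible; a production system would additionally need rules for resolving such ambiguity and for filtering common-word false positives, which this sketch ignores.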
Three named entity identification systems have been set up, two of them building on the previously existing tool ProMiner (Hanisch et al., 2003). All of them showed very good performance in the BioCreAtIvE challenges (Fundel et al., 2005a; Hanisch et al., 2005; Fundel and Zimmer, 2007). In Chapter 5, a new method for relation extraction (Fundel et al., 2007) is presented. It was applied to the largest collection of biomedical literature abstracts, generating a comprehensive network of human gene and protein relations. A classification approach (Küffner et al., 2006) can be used to specify relation types further, e.g., as activating, direct physical, or gene regulatory relations. Part II deals with gene expression data analysis. Gene expression data need to be processed so that differentially expressed genes can be identified. This processing consists of several sequential steps. Two important steps are normalization, which aims to remove systematic variation between measurements, and quantification of differential expression by determining p-values and fold changes. Numerous methods exist for these tasks. Chapter 6 describes the relevant background of gene expression data analysis; it presents the biological and technical principles of microarrays and gives an overview of the most relevant data processing steps. Finally, it provides a short introduction to osteoarthritis, which is the focus of the analyzed gene expression data sets. In Chapter 7, quality criteria for selecting normalization methods are described, and a method for identifying differentially expressed genes is proposed that is appropriate for data with large intensity variances between spots representing the same gene (Fundel et al., 2005b).
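The two processing steps named above, normalization and quantification of differential expression, can be illustrated with a short sketch. The thesis's own Chapter 7 methods are not reproduced here; instead, quantile normalization (one widely used normalization method) and an exact permutation test on the difference of group means are swapped in as examples. The sketch assumes intensities are already log2-transformed, so a difference of group means is a log2 fold change; all function names are illustrative.

```python
import itertools
import statistics


def quantile_normalize(columns):
    """columns: one equal-length list of intensities per array.

    Each value is replaced by the mean, across arrays, of the values sharing its
    rank, so all arrays end up with the same intensity distribution.
    Ties are broken by position rather than averaged (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(c) for c in columns]
    rank_means = [statistics.mean(col[r] for col in sorted_cols) for r in range(n)]
    out = []
    for c in columns:
        order = sorted(range(n), key=lambda i: c[i])
        new = [0.0] * n
        for rank, i in enumerate(order):
            new[i] = rank_means[rank]
        out.append(new)
    return out


def log2_fold_change(a, b):
    """Log2 fold change of group b over group a, for log2-scale values."""
    return statistics.mean(b) - statistics.mean(a)


def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for the difference of means.

    Enumerates every relabeling, so it is only practical for small groups.
    """
    pooled = list(a) + list(b)
    observed = abs(statistics.mean(b) - statistics.mean(a))
    extreme = total = 0
    for idx in itertools.combinations(range(len(pooled)), len(b)):
        chosen = set(idx)
        group_b = [pooled[i] for i in idx]
        group_a = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(statistics.mean(group_b) - statistics.mean(group_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

For example, with log2 values `[1.0, 2.0, 3.0]` versus `[7.0, 8.0, 9.0]`, the log2 fold change is 6.0, and only 2 of the 20 possible relabelings are as extreme as the observed split, giving p = 0.1.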
Furthermore, a system is described that selects an appropriate combination of feature selection method and classifier, and thus identifies genes that lead to good classification results and show consistent behavior in different sample subgroups (Davis et al., 2006). The analysis of several gene expression data sets dealing with osteoarthritis is described in Chapter 8. This chapter contains the biomedical analysis of relevant disease processes and distinct disease stages (Aigner et al., 2006a), and a comparison of various microarray platforms and osteoarthritis models. Part III deals with integrated approaches and thus connects Parts I and II. Chapter 9 gives an overview of different types of integrated data analysis approaches, focusing on approaches that integrate gene expression data with manually compiled data, large-scale networks, or text mining. In Chapter 10, a method is described for identifying genes that are consistently regulated and have a coherent literature background (Küffner et al., 2005). This method shows how gene and protein name identification and gene expression data can be integrated to return clusters containing genes relevant to the respective experiment, together with literature information that supports interpretation. Finally, Chapter 11 presents ideas on how the described methods can contribute to current research, as well as possible future directions.

Genomics & Computational Biology
Lecture 05C: RNA 1: Microarrays, Library Sequencing and Quantitation Concepts

Apr 25, 2007 51:41


This course will assess the relationships among sequence, structure, and function in complex biological networks as well as progress in realistic modeling of quantitative, comprehensive, functional genomics analyses. Exercises will include algorithmic, statistical, database, and simulation approaches and practical applications to medicine, biotechnology, drug discovery, and genetic engineering. Future opportunities and current limitations will be critically addressed. In addition to the regular lecture sessions, supplementary sections are scheduled to address issues related to Perl, Mathematica and biology.

Genomics & Computational Biology
Lecture 05AB: RNA 1: Microarrays, Library Sequencing and Quantitation Concepts

Apr 25, 2007 58:47


This course will assess the relationships among sequence, structure, and function in complex biological networks as well as progress in realistic modeling of quantitative, comprehensive, functional genomics analyses. Exercises will include algorithmic, statistical, database, and simulation approaches and practical applications to medicine, biotechnology, drug discovery, and genetic engineering. Future opportunities and current limitations will be critically addressed. In addition to the regular lecture sessions, supplementary sections are scheduled to address issues related to Perl, Mathematica and biology.