Audio versions of bioRxiv paper abstracts
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.13.249920v1?rss=1 Authors: Goodin, D. S., Khankhanian, P., Gourraud, P.-A., Vince, N. Abstract: OBJECTIVE: To explore the nature of MS-susceptibility and, by extension, other complex-genetic diseases. BACKGROUND: Basic-epidemiological parameters of MS (e.g., prevalence, recurrence-risks for siblings and twins, time-dependent changes in sex-ratio, etc.) are well-established. Moreover, >200 genetic-loci are unequivocally MS-associated, especially the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype-association. DESIGN/METHODS: We define the genetically-susceptible subset-(G) to include everyone with any non-zero life-time chance of developing MS. We analyze, mathematically, the implications that these epidemiological observations have regarding genetic susceptibility. In addition, we use the sex-ratio change (observed over a 35-year interval), to derive the relationship between MS-probability and an increasing likelihood of a suitable environmental-exposure. RESULTS: We demonstrate that genetic-susceptibility is restricted to less than 4.7% of populations across Europe and North America. Among carriers of the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype, fewer than 20% are even in the subset-(G). Women are less likely to be susceptible than men although their MS-penetrance is considerably greater. Response-curves for MS-probability increase with an increasing likelihood of a suitable environmental-exposure, especially among women. These environmental response-curves plateau at under 50% for women and at a significantly lower level for men. CONCLUSIONS: MS is fundamentally a genetic disorder. Despite this, a suitable environmental-exposure is also critical for disease-pathogenesis. Genetic-susceptibility requires specific combinations of non-additive genetic risk-factors. For example, the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype, by itself, poses no MS-risk.
Moreover, the fact that environmental-response-curves plateau below 50%, indicates that disease-pathogenesis is partly stochastic. By extension, other diseases for which monozygotic-twin recurrence-risks greatly exceed disease-prevalence (e.g., rheumatoid arthritis, diabetes, and celiac disease), must have a similar genetic basis. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.13.249805v1?rss=1 Authors: Zapata, I., Lilly, M. L., Herron, M. E., Alvarez, C. E. Abstract: Very little is known about the etiology of personality and psychiatric disorders. Because the core neurobiology of many such traits is evolutionarily conserved, dogs present a powerful model. We previously reported genome scans of breed averages of ten traits related to fear, anxiety, aggression and social behavior in multiple cohorts of pedigree dogs. As a second phase of that discovery, here we tested the ability of markers at 13 of those loci to predict canine behavior in a community sample of 397 pedigree and mixed-breed dogs with individual-level genotype and phenotype data. We found support for all markers and loci. By including 122 dogs with veterinary behavioral diagnoses in our cohort, we were able to identify eight loci associated with those diagnoses. Logistic regression models showed subsets of those loci could predict behavioral diagnoses. We corroborated our previous findings that small body size is associated with many problem behaviors and large body size is associated with increased trainability. Children in the home were associated with anxiety traits; illness and other animals in the home with coprophagia; working-dog status with increased energy and separation-related problems; and competitive dogs with increased aggression directed at familiar dogs, but reduced fear directed at humans and unfamiliar dogs. Compared to other dogs, Pit Bull-type dogs were not defined by a set of our markers and were not more aggressive; but they were strongly associated with pulling on the leash. Using severity-threshold models, Pit Bull-type dogs showed reduced risk of owner-directed aggression (75th quantile) and increased risk of dog-directed fear (95th quantile). 
Our findings have broad utility, including for clinical and breeding purposes, but we caution that thorough understanding is necessary for their interpretation and use. Copy rights belong to original authors. Visit the link for more info
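As a rough illustration of how a logistic regression model can turn marker genotypes into a predicted probability of a diagnosis, here is a minimal sketch. It is not the authors' model: the single marker, the toy genotypes (0, 1 or 2 copies of an allele), the labels, and the function names are all hypothetical, and the fit is a plain gradient ascent rather than any particular statistics package.

```python
import math

def sigmoid(z):
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(intercept, coefs, genotypes):
    """Predicted probability of a diagnosis given marker genotypes."""
    z = intercept + sum(c * g for c, g in zip(coefs, genotypes))
    return sigmoid(z)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit by batch gradient ascent on the log-likelihood (illustrative only)."""
    n_features = len(X[0])
    intercept, coefs = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * n_features
        for xi, yi in zip(X, y):
            err = yi - predict_proba(intercept, coefs, xi)
            grad_b += err
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
        intercept += lr * grad_b / len(X)
        coefs = [w + lr * g / len(X) for w, g in zip(coefs, grad_w)]
    return intercept, coefs

# Hypothetical toy data: one marker, allele counts 0/1/2, diagnosis 0/1.
X = [[0], [0], [1], [1], [2], [2]]
y = [0, 0, 0, 1, 1, 1]
intercept, coefs = fit_logistic(X, y)
p_low = predict_proba(intercept, coefs, [0])
p_high = predict_proba(intercept, coefs, [2])
```

In this toy data, dogs carrying two copies of the hypothetical allele end up with a higher predicted probability than dogs carrying none; a real analysis would use many markers, covariates, and a proper model-fitting library.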
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.13.249342v1?rss=1 Authors: Kao, C.-H., Wu, P.-Y., Yang, M.-H. Abstract: Quantitative trait loci (QTL) hotspots (genomic locations enriched in QTL) are a common and notable feature when collecting many QTL for various traits in many areas of biological studies. The QTL hotspots are important and attractive since they are highly informative and may harbor genes for the quantitative traits. So far, the current statistical methods for QTL hotspot detection use either the individual-level data from the genetical genomics experiments or the summarized data from public QTL databases to proceed with the detection analysis. These detection methods attempt to address some of the concerns, including the correlation structure among traits, the magnitude of LOD scores within a hotspot and computational cost, that arise during the process of QTL hotspot detection. In this article, we describe a statistical framework that can handle both types of data as well as address all the concerns at a time for QTL hotspot detection. Our statistical framework directly operates on the QTL matrix and hence has a very cheap computation cost, and is deployed to take advantage of the QTL mapping results for assisting the detection analysis. Two special devices, trait grouping and top profile, are introduced into the framework. The trait grouping attempts to group the closely linked or pleiotropic traits together to take care of the true linkages and cope with the underestimation of hotspot thresholds due to non-genetic correlations (arising from ignoring the correlation structure among traits), so as to have the ability to obtain much stricter thresholds and dismiss spurious hotspots. The top profile is designed to outline the LOD-score pattern of a hotspot across the different hotspot architectures, so that it can serve to identify and characterize the types of QTL hotspots with varying sizes and LOD score distributions. 
Real examples, numerical analysis and simulation study are performed to validate our statistical framework, investigate the detection properties, and also compare with the current methods in QTL hotspot detection. The results demonstrate that the proposed statistical framework can effectively accommodate the correlation structure among traits, identify the types of hotspots and still keep the notable features of easy implementation and fast computation for practical QTL hotspot detection. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.13.244970v1?rss=1 Authors: Derisbourg, M., Wester, L., Baddi, R., Denzel, M. S. Abstract: Protein homeostasis is modulated by stress response pathways and its deficiency is a hallmark of aging. The integrated stress response (ISR) is a conserved stress-signaling pathway that tunes mRNA translation via phosphorylation of the translation initiation factor eIF2. ISR activation and translation initiation are finely balanced by eIF2 kinases and by the eIF2 guanine nucleotide exchange factor eIF2B. However, the role of the ISR during aging remains unexplored. Using a genomic screen in Caenorhabditis elegans, we discovered a role of eIF2B and the eIF2 kinases in longevity. By limiting the ISR, these mutations enhanced protein homeostasis and increased lifespan. Consistently, full ISR inhibition using phosphorylation-defective eIF2 or pharmacological ISR inhibition prolonged lifespan. Lifespan extension through ISR inhibition occurred without changes in overall protein synthesis, and depended on enhanced translational efficiency of the kinase KIN-35. Evidently, lifespan is limited by the ISR and its inhibition may provide an intervention in aging. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.11.245415v1?rss=1 Authors: Wang, W., Lin, X.-D., Zhang, H.-L., Wang, M.-R., Guan, X.-Q., Holmes, E. C., Zhang, Y.-Z. Abstract: To better understand the genetic diversity, host association and evolution of coronaviruses (CoVs) in China, we analyzed a total of 696 rodents encompassing 16 different species sampled from Zhejiang and Yunnan provinces. Based on reverse transcriptase PCR-based CoV screening of fecal samples and subsequent sequence analysis of the RdRp gene, we identified CoVs in diverse rodent species, comprising Apodemus agrarius, Apodemus latronum, Bandicota indica, Eothenomys miletus, E. eleusis, Rattus andamanesis, Rattus norvegicus, and R. tanezumi. Apodemus chevrieri was a particularly rich host, harboring 25 rodent CoVs. Genetic and phylogenetic analysis revealed the presence of three groups of CoVs carried by a range of rodents that were closely related to the Lucheng Rn rat coronavirus (LRNV), China Rattus coronavirus HKU24 (ChRCoV_HKU24) and Longquan Rl rat coronavirus (LRLV) identified previously. One newly identified A. chevrieri-associated virus closely related to LRNV lacked an NS2 gene. This virus had a similar genetic organization to AcCoV-JC34, recently discovered in the same rodent species in Yunnan, suggesting that it represents a new viral subtype. Notably, additional variants of LRNV were identified that contained putative nonstructural NS2b genes located downstream of the NS2 gene that were likely derived from the host genome. Recombination events were also identified in the ORF1a gene of Lijiang-71. In sum, these data reveal the substantial genetic diversity and genomic complexity of rodent-borne CoVs, and greatly extend our knowledge of these major wildlife virus reservoirs. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.13.249334v1?rss=1 Authors: Havrylov, S., Chrystal, P., van Baarle, S., French, C. R., MacDonald, I. M., Avasarala, J. R., Rogers, R. C., Berry, F. B., Kume, T., Waskiewicz, A. J., Lehmann, O. J. Abstract: Alterations to cilia are responsible for a wide range of severe disease; however, understanding of the transcriptional control of ciliogenesis remains incomplete. We evaluated whether ciliary dysfunction contributed to the pleiotropic phenotypes caused by the Forkhead transcription factor FOXC1. Here, we show that patients with FOXC1-attributable Axenfeld-Rieger Syndrome (ARS) have a prevalence of ciliopathy-associated phenotypes comparable to syndromic ciliopathies. We demonstrate that altering the level of Foxc1, via shRNA mediated inhibition and mRNA overexpression, modifies cilia length in vitro. These structural changes were associated with substantially perturbed cilia-dependent signaling [Hedgehog (Hh) and PDGFRalpha] and the altered ciliary compartmentalization of a major Hh pathway transcription factor, Gli2. Analyses of two Foxc1 murine mutant strains demonstrated altered axonemal length in the choroid plexus with the increased expression of an essential regulator of multi-ciliation, Foxj1. The novel complexity revealed in ciliation of the choroid plexus indicates a partitioning of function between these Forkhead transcription factors. Collectively, these results support a contribution from ciliary dysfunction to some FOXC1-induced phenotypes. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.12.247973v1?rss=1 Authors: Fernando, L. M., Adeel, S., Basar, M. A., Allen, A. K., Duttaroy, A. Abstract: The nematode C. elegans has a contingent of five sod genes, one of the largest among aerobic organisms. Earlier studies revealed that each of the five sod genes is capable of making perfectly active SOD proteins in heterologous expression systems; therefore, none appears to be a pseudogene. Yet deletion of the entire contingent of sod genes fails to impose any effect on the survival of C. elegans, except that these animals appear more sensitive to extraneously applied oxidative stress conditions. We asked how many of the five sod genes are actually active in C. elegans through an in-gel SOD activity analysis. Here we provide evidence that out of the five genes only the mitochondrial SOD gene is active in C. elegans, albeit at a much lower level than in D. melanogaster and E. coli. Mutant analysis further confirmed that among the mitochondrial forms, SOD-2 is the only naturally active SOD in C. elegans. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.12.248278v1?rss=1 Authors: Kern, A. D., Coffing, G. C., Battey, C. J. Abstract: Dimensionality reduction is a common tool for visualization and inference of population structure from genotypes, but popular methods either return too many dimensions for easy plotting (PCA) or fail to preserve global geometry (t-SNE and UMAP). Here we explore the utility of variational autoencoders (VAEs) -- generative machine learning models in which a pair of neural networks seek to first compress and then recreate the input data -- for visualizing population genetic variation. VAEs incorporate non-linear relationships, allow users to define the dimensionality of the latent space, and in our tests preserve global geometry better than t-SNE and UMAP. Our implementation, which we call popvae, is available as a command-line python program at github.com/kr-colab/popvae. The approach yields latent embeddings that capture subtle aspects of population structure in humans and Anopheles mosquitoes, and can generate artificial genotypes characteristic of a given sample or population. Copy rights belong to original authors. Visit the link for more info
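For listeners unfamiliar with variational autoencoders, the sketch below shows two standard ingredients of a generic VAE with a Gaussian latent space: drawing a latent point with the reparameterization trick, and the KL-divergence penalty that pulls the latent distribution toward a standard normal. It is not taken from popvae; the function names and the two-dimensional latent space are assumptions made for illustration, and the encoder and decoder networks are omitted entirely.

```python
import math
import random

def sample_latent(mu, log_var, rng=random):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

# Hypothetical encoder output for one sample, using a 2-D latent space
# so that the embedding can be plotted directly.
mu, log_var = [1.0, 0.0], [0.0, 0.0]
z = sample_latent(mu, log_var, random.Random(0))
penalty = kl_to_standard_normal(mu, log_var)
```

In a full model this penalty would be added to a reconstruction loss comparing the decoder's output with the input genotypes; the latent dimensionality is the length of `mu`, which the user chooses.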
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.10.245043v1?rss=1 Authors: Si, Y., Zoellner, S. Abstract: Genotype imputation is an indispensable step in human genetic studies. Large reference panels with deeply sequenced genomes now allow interrogating variants with minor allele frequency
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.10.245191v1?rss=1 Authors: Berg, M. D., Zhu, Y., Isaacson, J., Genereaux, J., Loll-Krippleber, R., Brown, G. W., Brandl, C. J. Abstract: Non-proteinogenic amino acids, such as the proline analog L-azetidine-2-carboxylic acid (AZC), are detrimental to cells because they are mis-incorporated into proteins and lead to proteotoxic stress. Our goal was to identify genes that show chemical-genetic interactions with AZC in Saccharomyces cerevisiae and thus also potentially define the pathways cells use to cope with amino acid mis-incorporation. Screening the yeast deletion and temperature sensitive collections, we found 72 alleles with negative synthetic interactions with AZC treatment and 12 alleles that suppress AZC toxicity. Many of the genes with negative synthetic interactions are involved in protein quality control pathways through the proteasome. Genes involved in actin cytoskeleton organization and endocytosis also had negative synthetic interactions with AZC. Related to this, the number of actin patches per cell increases upon AZC treatment. Many of the same cellular processes were identified to have interactions with proteotoxic stress caused by two other amino acid analogs, canavanine and thialysine, or a mistranslating tRNA variant that mis-incorporates serine at proline codons. Alleles that suppressed AZC-induced toxicity functioned through the amino acid sensing TOR pathway or controlled amino acid permeases required for AZC uptake. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.11.247049v1?rss=1 Authors: Balbona, J. V., Kim, Y., Keller, M. C. Abstract: Offspring resemble their parents for both genetic and environmental reasons. Understanding the relative magnitude of these alternatives has long been a core interest in behavioral genetics research, but traditional designs, which compare phenotypic covariances to make inferences about unmeasured genetic and environmental factors, have struggled to disentangle them. Recently, Kong et al. (2018) showed that by correlating offspring phenotypic values with the measured polygenic score of parents' nontransmitted alleles, one can estimate the effect of "genetic nurture"-- a type of passive gene-environment covariation that arises when heritable parental traits directly influence offspring traits. Here, we instantiate this basic idea in a set of causal models that provide novel insights into the estimation of parental influences on offspring. Most importantly, we show how jointly modeling the parental polygenic scores and the offspring phenotypes can provide an unbiased estimate of the variation attributable to the environmental influence of parents on offspring, even when the polygenic score accounts for a small fraction of trait heritability. This model can be further extended to a) account for the influence of assortative mating at both equilibrium and disequilibrium (after a single generation of assortment), and b) include measured parental phenotypes, allowing for the estimation of the total variation due to additive genetic effects and their covariance with the familial environment. By utilizing path analysis techniques developed for extended twin family designs, our approach provides a general framework for modeling polygenic scores in family studies and allows for various model extensions that can be used to answer old questions about familial influences in new ways. Copy rights belong to original authors. 
Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.09.243311v1?rss=1 Authors: McInnes, G. M., Altman, R. B. Abstract: Pharmacogenetics studies how genetic variation leads to variability in drug response. Guidelines for selecting the right drug and right dose for patients based on their genetics are clinically effective, but are still widely unused. For some drugs, the normal clinical decision making process may lead to the optimal dose of a drug that minimizes side effects and maximizes effectiveness. Without measurements of genotype, physicians and patients may observe and adjust dosage in a manner that reflects the underlying genetics. The emergence of genetic data linked to longitudinal clinical data in large biobanks offers an opportunity to confirm known pharmacogenetic interactions as well as discover novel associations by investigating outcomes from normal clinical practice. Here we use the UK Biobank to search for pharmacogenetic interactions among 200 drugs and 9 genes among 200,000 participants. We identify associations between pharmacogene phenotypes and drug maintenance dose as well as side effect incidence. We find support for several known drug-gene associations as well as novel pharmacogenetic interactions. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.11.246827v1?rss=1 Authors: Kim, Y., Balbona, J., Keller, M. C. Abstract: In a companion paper (Balbona et al. (2020)), we introduced a series of causal models that use polygenic scores from transmitted and nontransmitted alleles, the offspring trait, and parental traits to estimate the variation due to the environmental influences the parental trait has on the offspring trait (vertical transmission) as well as additive genetic effects. These models also estimate and account for the gene-gene and gene-environment covariation that arises from assortative mating and vertical transmission respectively. In the current study, we simulated polygenic scores and phenotypes of parents and offspring under genetic and vertical transmission scenarios, assuming two types of assortative mating. We instantiated the models from our companion paper in the OpenMx software, and compared the true values of parameters to maximum likelihood estimates from models fitted on the simulated data to quantify the bias and precision of estimates. We show that parameter estimates from these models are unbiased when assumptions are met, but as expected, they are biased to the degree that assumptions are unmet. Standard errors of the estimated variances due to vertical transmission and to genetic effects decrease with increasing sample sizes and with increasing r2 values of the polygenic score. Even when the polygenic score explains a modest amount of trait variation (r2 = .05), standard errors of these standardized estimates were reasonable (< .05) for n = 16K trios, and smaller sample sizes (e.g., down to 4K) when the polygenic score is more predictive. 
These causal models offer a novel approach for understanding how parents influence their offspring, but their use requires polygenic scores on relevant traits that are modestly predictive (e.g., r2 > .025) as well as datasets with genomic and phenotypic information on parents and offspring. The utility of polygenic scores for elucidating parental influences should thus serve as additional motivation for large genomic biobanks to perform GWAS's on traits that may be relevant to parenting and to oversample close relatives, particularly parents and offspring. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.11.247031v1?rss=1 Authors: Fu, Y.-X., Wang, G., Chen, K., Ma, X., Liu, S., Miao, W. Abstract: Cell division is a necessity of life which can be either mitotic or amitotic. While both are fundamental, amitosis is sometimes considered a relic of little importance in biology. Nevertheless, eukaryotes often have polyploid cells, including cancer cells, which may divide amitotically. To understand how amitosis ensures the completion of cell division, we turn to the macronuclei of ciliates. The grand scheme governing the proliferation of the macronuclei of ciliate cells, which involves chromosomal replication and amitosis, is currently unknown. Using a novel model that encompasses a wide range of mechanisms together with experimental data of the composition of mating types at different stages derived from a single karyonide of Tetrahymena thermophila, we show that the chromosomal replication of the macronucleus has a strong head-start effect, with only about five copies of chromosomes replicated at a time and persistent reuse of the chromosomes involved in the early replication. Furthermore, the fission of a fully grown macronucleus is non-random, with a strong tendency to push chromosomes and their replications to the same daughter cell. Similar strategies may exist for other Tetrahymena species or ciliates, and have implications for the amitosis of polyploid cells of multicellular organisms. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.11.246611v1?rss=1 Authors: Ding, X., Fragoza, R., Singh, P., Zhang, S., Yu, H., Schimenti, J. C. Abstract: Purpose: Approximately 7% of men suffer from infertility worldwide and sperm abnormalities are the major cause. Though genetic defects are thought to underlie a substantial fraction of all male infertility cases, the actual molecular bases are usually undetermined. Because the consequences of most genetic variants in populations are unknown, this complicates genetic diagnosis even after genome sequencing of patients. Some patients with ciliopathies, including primary ciliary dyskinesia (PCD) and Bardet-Biedl syndrome (BBS), also suffer from infertility because sperm flagella, which share several characteristics with cilia, are also affected in these patients. Methods: To identify infertility-causing genetic variants in human populations, we used in silico predictions to identify potentially deleterious SNP (single nucleotide polymorphism) alleles of RABL2A, a gene essential for normal cilia and flagella function. Candidate variants were assayed for protein stability in vitro, and the destabilizing variants were modeled in mice using CRISPR/Cas9-mediated genome editing. The resulting mice were characterized phenotypically for reproductive and developmental defects. Results: Two of the SNP alleles, Rabl2L119F (rs80006029) and Rabl2V158F (rs200121688), destabilized the protein. Mice bearing these alleles exhibited PCD- and BBS-associated disorders including male infertility, early growth retardation, excessive weight gain in adulthood, heterotaxia, pre-axial polydactyly, neural tube defects (NTD) and hydrocephalus. Conclusion: Our study identified and validated pathogenicity of two variants causing ciliopathies and male infertility in human populations, and identified phenotypes not previously described for null alleles of Rabl2. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.10.245167v1?rss=1 Authors: Gaynor, R. C., Gorjanc, G., Hickey, J. M. Abstract: This paper introduces AlphaSimR, an R package for stochastic simulations of plant and animal breeding programs. AlphaSimR is a highly flexible software package able to simulate a wide range of plant and animal breeding programs for diploid and autopolyploid species. AlphaSimR is ideal for testing the overall strategy and detailed design of breeding programs. AlphaSimR utilizes a scripting approach to building simulations that is particularly well suited for modeling highly complex breeding programs, such as commercial breeding programs. The primary benefit of this scripting approach is that it frees users from preset breeding program designs and allows them to model nearly any breeding program design. This paper lists the main features of AlphaSimR and provides a brief example simulation to show how to use the software. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.09.243287v1?rss=1 Authors: Trochet, H., Hussin, J. Abstract: Genetic risk scores (GRS), also known as polygenic risk scores, are a tool to estimate individuals' liabilities to a disease or trait measurement based solely on genetic information. They have value in clinical applications as well as for assessing relationships between traits and discovering causal determinants of complex disease. However, it has been shown that these scores are not robust to differences across continental populations and may not be portable within them either. Even within a single population, they may have variable predictive ability across sexes and socioeconomic strata, raising questions about their potential biases. In this paper, we investigated the accuracy of two different GRS across population strata of the UK Biobank, separated along principal component (PC) axes, considering different approaches to account for social and environmental confounders. We found that these scores did not predict the real differences in phenotypes observed along the first principal component, with evidence of discrepancies on axes as high as PC45. These results demonstrate that the measures currently taken to correct for population structure are not sufficient, and highlight the need for social and environmental confounders to be factored into the creation of GRS. Copy rights belong to original authors. Visit the link for more info
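A genetic risk score is commonly computed as a weighted sum of an individual's effect-allele counts, with one weight per variant. The sketch below shows that calculation only; the weights and genotypes are hypothetical values chosen for illustration, and it does not reproduce the scores or the confounder adjustments used in the paper.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))

# Hypothetical per-variant effect sizes (for example, GWAS log odds ratios).
weights = [0.5, -0.25, 1.0]
# Hypothetical individual carrying 2, 1 and 0 copies of the effect alleles.
score = genetic_risk_score([2, 1, 0], weights)
```

Because the weights are estimated in one population, the same sum can behave differently in another stratum, which is the portability concern the abstract raises.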
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.10.244293v1?rss=1 Authors: Elsworth, B. L., Lyon, M. S., Alexander, T., Liu, Y., Matthews, P., Hallett, J., Bates, P., Palmer, T., Haberland, V., Davey Smith, G., Zheng, J., Haycock, P., Gaunt, T. R., Hemani, G. Abstract: Data generated by genome-wide association studies (GWAS) are growing fast with the linkage of biobank samples to health records, and expanding capture of high-dimensional molecular phenotypes. However the utility of these efforts can only be fully realised if their complete results are collected from their heterogeneous sources and formats, harmonised and made programmatically accessible. Here we present the OpenGWAS database, an open source, open access, scalable and high-performance cloud-based data infrastructure that imports and publishes complete GWAS summary datasets and metadata for the scientific community. Our import pipeline harmonises these datasets against dbSNP and the human genome reference sequence, generates summary reports and standardises the format of results and metadata. Users can access the data via a website, an application programming interface, R and Python packages, and also as downloadable files that can be rapidly queried in high performance computing environments. OpenGWAS currently contains 126 billion genetic associations from 14,582 complete GWAS datasets representing a range of different human phenotypes and disease outcomes across different populations. We developed R and Python packages to serve as conduits between these GWAS data sources and a range of available analytical tools, enabling Mendelian randomization, genetic colocalisation analysis, fine mapping, genetic correlation and locus visualisation. OpenGWAS is freely accessible at https://gwas.mrcieu.ac.uk, and has been designed to facilitate integration with third party analytical tools. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.10.236562v1?rss=1 Authors: Roule, T., Ariel, F., Hartmann, C., Crespi, M., Blein, T. Abstract: Clustered organization of biosynthetic non-homologous genes is emerging as a characteristic feature of plant genomes. The co-regulation of clustered genes seems to largely depend on epigenetic reprogramming and three-dimensional chromatin conformation. Here we identified the long noncoding RNA (lncRNA) MARneral Silencing (MARS), localized inside the Arabidopsis marneral cluster, and which controls the local epigenetic activation of its surrounding region in response to ABA. MARS modulates the POLYCOMB REPRESSIVE COMPLEX 1 (PRC1) component LIKE-HETEROCHROMATIN PROTEIN 1 (LHP1) binding throughout the cluster in a dose-dependent manner, determining H3K27me3 deposition and chromatin condensation. In response to ABA, MARS decoys LHP1 away from the cluster and promotes the formation of a chromatin loop bringing together the MARNERAL SYNTHASE 1 (MRN1) locus and a distal ABA-responsive enhancer. The enrichment of co-regulated lncRNAs in clustered metabolic genes in Arabidopsis suggests that the acquisition of novel noncoding transcriptional units may constitute an additional regulatory layer driving the evolution of biosynthetic pathways. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.10.244244v1?rss=1 Authors: Hannan, F. M., Stevenson, M., Bayliss, A. L., Stokes, V. J., Stewart, M. E., Kooblall, K. G., Gorvin, C. M., Codner, G. F., Teboul, L., Wells, S., Thakker, R. V. Abstract: Mutations of the adaptor protein-2 sigma subunit (AP2S1) gene which encodes AP2-sigma2, a component of the ubiquitous AP2 heterotetrameric complex involved in endosomal trafficking of the calcium-sensing receptor (CaSR), cause familial hypocalciuric hypercalcemia type 3 (FHH3). FHH3 patients have heterozygous AP2S1 missense Arg15 mutations (p.Arg15Cys, p.Arg15His or p.Arg15Leu) with marked hypercalcemia and occasional hypophosphatemia and osteomalacia. To further characterise the phenotypic spectrum and calcitropic pathophysiology of FHH3, we used CRISPR/Cas9 genome editing to generate mice harboring the AP2S1 p.Arg15Leu mutation, which causes the most severe FHH3 phenotype. Heterozygous (Ap2s1+/L15) mice were viable, and had marked hypercalcemia, hypermagnesemia, hypophosphatemia, and increased plasma concentrations of parathyroid hormone, fibroblast growth factor 23 and alkaline phosphatase activity, but normal pro-collagen type 1 N-terminal pro-peptide and 1,25 dihydroxyvitamin D. Homozygous (Ap2s1L15/L15) mice invariably died perinatally. The AP2S1 p.Arg15Leu mutation impaired protein-protein interactions between AP2-sigma2 and the other AP2 subunits, and the CaSR. Cinacalcet, a CaSR allosteric activator, ameliorated the hypercalcemia and elevated PTH concentrations, but not the diminished AP2-sigma2-CaSR interaction. Thus, our studies have established a mouse model with a germline loss-of-function AP2S1 mutation that is representative for FHH3 in humans, and demonstrated that cinacalcet corrects the abnormalities of plasma calcium and PTH. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.10.245175v1?rss=1 Authors: Peterson, T., Zuo, T., Su, W. Abstract: Transposable elements (TEs) are DNA sequences that can mobilize and proliferate throughout eukaryotic genomes. Previous studies have shown that in plant genomes, TEs can influence gene expression in various ways such as inserting in introns or exons to alter transcript structure and content, and providing novel promoters and regulatory elements to generate new regulatory patterns. Furthermore, TEs can also regulate gene expression at the epigenetic level by modifying chromatin structure, changing DNA methylation status and generating small RNAs. In this study, we demonstrated that Ac/fAc transposable elements are able to induce ectopic gene expression by duplicating and shuffling enhancer elements. Ac/fAc elements belong to the hAT family of Class II TEs. They can undergo standard transposition events, which involve the two termini of a single transposon, or alternative transposition events which involve the termini of two different, nearby elements. Our previous studies have shown that alternative transposition can generate various genome rearrangements such as deletions, duplications, inversions, translocations and Composite Insertions (CIs). We identified over 50 independent cases of CIs generated by Ac/fAc alternative transposition and analyzed 10 of them in detail. We show that these CIs induced ectopic expression of the maize pericarp color 2 (p2) gene, which encodes a Myb-related protein. All the CIs analyzed contain sequences including a transcriptional enhancer derived from the nearby p1 gene, suggesting that the CI-induced activation of p2 is effected by mobilization of the p1 enhancer. This is further supported by analysis of a mutant in which the CI is excised and p2 expression is lost. 
These results show that alternative transposition events are not only able to induce genome rearrangements, but also generate Composite Insertions that can control gene expression. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.06.240721v1?rss=1 Authors: Belonogova, N. M., Zorkoltseva, I. V., Tsepilov, Y. A., Axenovich, T. I. Abstract: Recent genome-wide studies have reported about 600 genes potentially influencing neuroticism. Little is known about the mechanisms of their action. Here, we aimed to conduct a more detailed analysis of genes whose polymorphisms can regulate the level of neuroticism. Using UK Biobank-based GWAS summary statistics, we performed a gene-based association analysis using four sets of genetic variants within a gene differing in their protein coding properties. To guard against the influence of strong GWAS signals outside the gene, we used a specially designed procedure. As a result, we identified 190 genes associated with neuroticism due to their polymorphisms. Thirty-eight of these genes were novel. Within all genes identified, we distinguished two slightly overlapping groups comprising genes that demonstrated association when using protein-coding and non-coding SNPs. Many genes from the first group included potentially pathogenic variants. For some genes from the second group, we found evidence of pleiotropy with gene expression. We demonstrated that the association of almost two hundred known genes could be inflated by the GWAS signals outside the gene. Using bioinformatics analysis, we prioritized the neuroticism genes and showed that the genes influencing the trait by their polymorphisms are the most appropriate candidate genes. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.06.239178v1?rss=1 Authors: Tubbs, J. D., Hwang, L.-D., Luong, J., Evans, D. M., Sham, P. C. Abstract: Disaggregation and estimation of genetic effects from offspring and parents has long been of interest to statistical geneticists. Recently, technical and methodological advances have made the genome-wide and loci-specific estimation of direct offspring and parental genetic nurture effects more feasible. However, unbiased estimation using these methods requires datasets where both parents and at least one child have been genotyped, which are relatively scarce. Our group has recently developed a method and accompanying software (IMPISH; Hwang et al., 2020) which is able to impute missing parental genotypes from observed data on sibships and estimate their effects on an offspring phenotype conditional on the effects of genetic transmission. However, this method is unable to disentangle maternal and paternal effects, which may differ in magnitude and direction. Here, we introduce an extension to the original IMPISH routine which takes advantage of all available nuclear families to impute parent-specific missing genotypes and obtain asymptotically unbiased estimates of genetic effects on offspring phenotypes. We apply this method to data from related individuals in the UK Biobank, showing concordance with previous estimates of maternal genetic effects on offspring birthweight. We also conduct the first GWAS jointly estimating offspring-, maternal-, and paternal-specific genetic effects on body-mass index. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.07.242214v1?rss=1 Authors: Marcus, J. H., Ha, W., Barber, R. F., Novembre, J. Abstract: An important feature in spatial population genetic data is often "isolation-by-distance," where genetic differentiation tends to increase as individuals become more geographically distant. Recently, Petkova et al. (2016) developed a statistical method called Estimating Effective Migration Surfaces (EEMS) for visualizing spatially heterogeneous isolation-by-distance on a geographic map. While EEMS is a powerful tool for depicting spatial population structure, it can suffer from slow runtimes. Here we develop a related method called Fast Estimation of Effective Migration Surfaces (FEEMS). FEEMS uses a Gaussian Markov Random Field in a penalized likelihood framework that allows for efficient optimization and output of effective migration surfaces. Further, the efficient optimization facilitates the inference of migration parameters per edge in the graph, rather than per node (as in EEMS). When tested with coalescent simulations, FEEMS accurately recovers effective migration surfaces with complex gene-flow histories, including those with anisotropy. Applications of FEEMS to population genetic data from North American gray wolves show it to perform comparably to EEMS, but with solutions obtained orders of magnitude faster. Overall, FEEMS expands the ability of users to quickly visualize and interpret spatial structure in their data. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.07.241554v1?rss=1 Authors: Methorst, R., de Borst, G. J., Pasterkamp, G., van der Laan, S. W. Abstract: Background and aims: Atherosclerosis is a lipid-driven inflammatory disease presumably initiated by endothelial activation. Low vascular shear stress is known for its ability to activate endothelial cells. Differential DNA methylation (DNAm) is a relatively unexplored player in atherosclerotic disease development and endothelial dysfunction. A literature search revealed that expression of 11 genes has been found to be associated with differential DNAm due to low shear stress in endothelial cells. We hypothesized a causal relationship between DNAm of shear stress associated genes in human carotid plaque and increased risk of cardiovascular disease. Methods: Using Mendelian randomisation (MR) analysis, we explored the potential causal role of DNAm of shear stress associated genes on cardiovascular disease risk. We used genetic and DNAm data of 442 carotid endarterectomy derived advanced plaques from the Athero-Express Biobank Study for quantitative trait loci (QTL) discovery and performed MR analysis using these QTLs and GWAS summary statistics of coronary artery disease (CAD) and ischemic stroke (IS). Results: We discovered 9 methylation QTLs in plaque for differentially methylated shear stress associated genes. We found no significant effect of shear stress gene promoter methylation on the risk of CAD or IS. Conclusions: Differential methylation of shear stress associated genes in advanced atherosclerotic plaques is unlikely to increase cardiovascular risk. Copy rights belong to original authors. Visit the link for more info
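As a rough illustration of the kind of Mendelian randomisation calculation the abstract above refers to, the following is a minimal sketch of the standard fixed-effect inverse-variance weighted (IVW) estimator computed from summary statistics. It is not taken from the paper: the function names and the example numbers are hypothetical, and the abstract does not say which MR estimator the authors used.

```python
from math import sqrt


def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from a single variant: outcome effect / exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    beta_exposure: per-variant effects on the exposure (e.g. a methylation QTL effect)
    beta_outcome:  per-variant effects on the outcome (e.g. a GWAS log odds ratio)
    se_outcome:    standard errors of the outcome effects
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, sqrt(1.0 / denominator)


# Hypothetical summary statistics for three variants (illustration only).
bx = [0.10, 0.20, 0.40]
by = [0.05, 0.10, 0.20]
se = [0.01, 0.02, 0.02]

estimate, std_error = ivw_estimate(bx, by, se)
print(f"IVW estimate = {estimate:.3f}, SE = {std_error:.3f}")
```

In this toy input every variant has the same Wald ratio, so the IVW estimate equals that shared ratio. With real data the per-variant ratios differ and the weights determine how much each variant contributes.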
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.07.239103v1?rss=1 Authors: Lyu, B., Tsvetanov, K. A., Tyler, L. K., Clarke, A., Cam-CAN, Amos, W. Abstract: Identifying the genetic variations impacting human brain structure and further affecting cognitive functions will enhance our understanding of the fundamental bases of cognition. In this study, we take two different approaches to this issue: classical genome-wide association analysis (GWAS) and a relatedness-based regression approach (REL) to search for associations between genotype and brain structural measures of gray matter and white matter. Instead of searching genetic variants by testing the association between a phenotype trait and the genotype of each single-nucleotide polymorphism (SNP) as in GWAS, REL takes advantage of multiple SNPs within a genomic window as a single measure, which can potentially find associations wherever the functional SNP is in linkage disequilibrium (LD) with SNPs that have been sampled. We also conducted a simulation analysis to systematically compare GWAS and REL with respect to different levels of LD. Both methods succeed in identifying genetic variations associated with regional and global brain structural measures, though they tend to give complementary results due to the very different aspects of genetic properties used. Simulation results suggest that GWAS outperforms REL when the signal is relatively weak. However, the collective effects due to local LD boost the performance of REL with increasing signal strength, resulting in better performance than GWAS. Our study suggests that the optimal approach may vary across the genome and that pre-testing for LD could allow GWAS to be preferred where LD is high and REL to be used where LD is low, or the local pattern of LD is complex. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.07.240887v1?rss=1 Authors: Martell, H. J., Griffin, D. K., Wass, M. N. Abstract: The availability of thousands of individual genomes provides many opportunities to understand genetic variation and the relationship to phenotype, particularly disease. However, this remains challenging as it is often difficult to identify if a non-synonymous variant alters protein structure and function. Many computational methods have been developed but they typically interpret individual variants in isolation, despite the possibility of variant-variant interactions. Here, we combine the genetic variation data present in the 1000 Genomes Project with protein structural data to identify variant-variant interactions within individual human genomes. We find more than 4,000 combinations of variants that are located close together in 3D structure and more than 1,200 in protein-protein interfaces. Many variant combinations include amino acid changes that are compensatory, such as maintaining charges or functional groups, thus supporting that these are coevolutionary events. This highlights the need for variant interpretation and precision medicine to consider the gestalt effects of variants. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.05.239087v1?rss=1 Authors: Joshi, B. D., Singh, V. K., Singh, H., Singh, A., Singh, S. K., Chandra, K., Sharma, L. K., Thakur, M. Abstract: In the present study, we explored the intraspecific genetic variation and phylogeographic relationship among all the reported species in the genus Naemorhedus distributed in a wide range of habitats. The Bayesian based phylogeny demonstrated that the Himalayan goral is a highly diverged species from the other reported species of gorals. We claim the presence of two valid sub-species of Himalayan goral, i.e. N. g. bedfordi and N. g. goral, distributed in the western and central Himalaya, respectively. The comparative analysis with the inclusion of data available from different ranges suggests the presence of plausibly six species of gorals across the distribution with a few valid subspecies. Further, we report that N. griseus is a valid species and not a synonym of N. goral, considering the observed discrepancy in the available sequences. We recommend that all the sub-species present at distant locations be considered as Evolutionary Significant Units (ESUs) and, therefore, appeal for them to receive special attention for long term conservation and management. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.06.239137v1?rss=1 Authors: Genty, G., Guarnizo, C. E., Ramirez, J. P., Barrientos, L., Crawford, A. J. Abstract: The complex topography of the species-rich northern Andes creates heterogeneous environmental landscapes that are hypothesized to have promoted population fragmentation and diversification by vicariance, gradients and/or the adaptation of species. Previous phylogenetic work on the Palm Rocket Frog (Anura: Aromobatidae: Rheobates spp.), endemic to mid-elevation forests of Colombia, suggested valleys were important in promoting divergence between lineages. In this study, we use a spatial, multi-locus population genetic approach of two mitochondrial and four nuclear genes from 25 samples representing the complete geographic range of the genus to delimit species and test for landscape effects on genetic divergence within Rheobates. We tested three landscape genetic models: isolation by distance, isolation by resistance, and isolation by environment. Bayesian species delimitation (BPP) and a Poisson Tree Process (PTP) model both recovered five highly divergent genetic lineages within Rheobates, rather than the three inferred in a previous study. We found that isolation by environment provided the only variable significantly correlated with genetic distances for both mitochondrial and nuclear genes, suggesting that local adaptation may have a role driving the genetic divergence within this genus of frogs. Thus, genetic divergence in Rheobates may be driven by the local environments where these frogs live, even more so than by the environmental characteristics of the intervening regions among populations (i.e., geographic barriers). Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.05.238162v1?rss=1 Authors: Shi, Q., Jiang, X., Zhang, Y., Gao, J., Zhang, H., Ali, A., Zhao, D., Bao, J., Iqbal, F., Jiang, L. Abstract: The genetic basis of human infertility is currently under intensive investigation. However, only a handful of genes are validated in animal models as disease-causing genes in infertile men. Thus, to better understand the genetic basis of spermatogenesis in humans and to bridge the knowledge gap between humans and other animal species, we have constructed the FertilityOnline database, a resource that integrates the functional genes reported in the literature related to spermatogenesis into an existing spermatogenic database, SpermatogenesisOnline 1.0. Additional features, like functional annotation and statistical analysis of genetic variants of human genes, are also incorporated into FertilityOnline. By searching this database, users can focus on the top candidate genes associated with infertility and can perform enrichment analysis to instantly refine the number of candidates in a user-friendly web interface. Clinical validation of this database is established by the identification of novel causative mutations in SYCE1 and STAG3 in azoospermic men. In conclusion, FertilityOnline is not only an integrated resource for analysis of spermatogenic genes, but also a useful tool that facilitates study of the underlying genetic basis of male infertility. Availability: FertilityOnline can be freely accessed at http://mcg.ustc.edu.cn/bsc/spermgenes2.0/index.html. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.04.236968v1?rss=1 Authors: Plasil, S. L., Seth, A., Homanics, G. E. Abstract: The development of CRISPR/Cas9 technology has vastly sped up the process of genome editing by introducing a bacterial system that can be exploited for reverse genetics-based research. However, generating homozygous knockout (KO) animals using traditional CRISPR/Cas9-mediated techniques requires three generations of animals. A founder animal with a desired mutation is crossed to produce heterozygous F1 offspring which are subsequently interbred to generate homozygous F2 KO animals. This study describes a novel adaptation of the CRISPR/Cas9-mediated method to develop a homozygous gene-targeted KO animal cohort in one generation. A well-characterized ethanol-responsive gene, MyD88, was chosen as a candidate gene for generation of MyD88-/- mice as proof of concept. Previous studies have reported changes in ethanol-related behavioral outcomes in MyD88 KO mice. Therefore, it was hypothesized that a successful one-generation KO of MyD88 should reproduce the decreased responses to ethanol's sedative effects, as well as the increased ethanol consumption in males, that were observed in previous studies. One-cell mouse embryos were simultaneously electroporated with four gRNAs targeting a critical exon of MyD88 along with Cas9. DNA and RNA analysis of founder mice revealed a complex mix of genetic alterations, all of which were predicted to ablate MyD88 gene function. This study additionally compared responses of Mock treatment control mice generated through electroporation to controls purchased from a vendor. No substantial behavioral changes were noted between control cohorts. Overall, the CRISPR/Cas9 KO protocol reported here, which we call CRISPR Turbo Accelerated KnockOut (CRISPy TAKO), will be useful for reverse genetic in vivo screens of gene function in whole animals. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.05.236901v1?rss=1 Authors: Lin, Z., Seal, S., Basu, S. Abstract: SNP heritability of a trait is measured by the proportion of total variance explained by the additive effects of genome-wide single nucleotide polymorphisms (SNPs). Linear mixed models are routinely used to estimate SNP heritability for many complex traits. The basic concept behind this approach is to model genetic contribution as a random effect, where the variance of this genetic contribution contributes to the heritability of the trait. This linear mixed model approach requires estimation of relatedness among individuals in the sample, which is usually captured by estimating a genetic relationship matrix (GRM). Heritability is estimated by the restricted maximum likelihood (REML) or method of moments (MOM) approaches, and this estimation relies heavily on the GRM computed from the genetic data on individuals. Presence of population substructure in the data could significantly impact the GRM estimation and may introduce bias in heritability estimation. The common practice of accounting for such population substructure is to adjust for the top few principal components of the GRM as covariates in the linear mixed model. Here we propose an alternative way of estimating heritability in multi-ethnic studies. Our proposed approach is a MOM estimator derived from the Haseman-Elston regression and gives an asymptotically unbiased estimate of heritability in the presence of population stratification. It introduces adjustments for the population stratification in a second-order estimating equation and allows the total phenotypic variance to vary by ethnicity. We study the performance of different MOM and REML approaches in the presence of population stratification through extensive simulation studies. 
We estimate the heritability of height, weight and other anthropometric traits in the UK Biobank cohort to investigate the impact of subtle population substructure on SNP heritability estimation. Copy rights belong to original authors. Visit the link for more info
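For orientation, the classical Haseman-Elston regression that the abstract above builds on can be sketched in a few lines. This is a minimal sketch, not the authors' stratification-adjusted estimator: it assumes a single homogeneous population, regresses products of standardized phenotypes on off-diagonal GRM entries without an intercept, and uses simulated data. All function names and simulation settings are hypothetical.

```python
import numpy as np


def genetic_relationship_matrix(genotypes):
    """GRM from an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    return z @ z.T / genotypes.shape[1]


def haseman_elston(phenotype, grm):
    """Method-of-moments heritability: slope of y_i * y_j on A_ij over pairs i < j."""
    y = (phenotype - phenotype.mean()) / phenotype.std()
    upper = np.triu_indices(len(y), k=1)      # each unordered pair once
    products = np.outer(y, y)[upper]
    relatedness = grm[upper]
    return float(np.sum(relatedness * products) / np.sum(relatedness ** 2))


def simulate_and_estimate(n=1000, m=500, h2=0.5, seed=0):
    """Simulate unrelated individuals with true heritability h2, then estimate it."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=m)             # allele frequencies
    genotypes = rng.binomial(2, freqs, size=(n, m))   # independent SNPs
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    effects = rng.normal(0.0, np.sqrt(h2 / m), size=m)
    phenotype = z @ effects + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return haseman_elston(phenotype, genetic_relationship_matrix(genotypes))


print(f"Estimated SNP heritability: {simulate_and_estimate():.2f}")
```

The estimate should land near the simulated value of 0.5, with sampling noise that shrinks as the number of individuals grows. Under population stratification this naive version can be biased, which is the problem the abstract's proposed estimator is designed to address.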
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.05.237768v1?rss=1 Authors: Pachano, T., Sanchez-Gaya, V., Mariner-Fauli, M., Ealo, T., Asenjo, H. G., Respuela, P., Cruz-Molina, S., van IJcken, W., Landeira, D., Rada-Iglesias, A. Abstract: CpG islands (CGIs) represent a distinctive and widespread genetic feature of vertebrate genomes, being associated with ~70% of all annotated gene promoters. CGIs have been proposed to control transcription initiation by conferring nearby promoters with unique chromatin properties. In addition, there are thousands of distal or orphan CGIs (oCGIs) whose functional relevance and mechanism of action are barely known. Here we show that oCGIs are an essential component of poised enhancers (PEs) that boost their long-range regulatory activity and dictate the responsiveness of their target genes. Using a CRISPR/Cas9 knock-in strategy in mESC, we introduced PEs with or without oCGIs within topologically associating domains (TADs) harbouring genes with different types of promoters. By evaluating the chromatin, topological and regulatory properties of the engineered PEs, we uncover that, rather than increasing their local activation, oCGIs boost the physical and functional communication between PEs and distally located developmental genes. Furthermore, we demonstrate that developmental genes with CpG rich promoters are particularly responsive to PEs and that such responsiveness depends on the presence of oCGIs. Therefore, our work unveils a novel role for CGIs as genetic determinants of the compatibility between genes and enhancers, thus providing major insights into how developmental gene expression programs are deployed under both physiological and pathological conditions. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.05.237172v1?rss=1 Authors: Paschini, M., Reyes, C. M., Gillespie, A. E., Lewis, K. A., Glustrom, L. W., Sharpee, T. O., Wuttke, D. S., Lundblad, V. Abstract: Telomeres present unique challenges for genomes with linear chromosomes, including the inability of the semi-conservative DNA replication machinery to fully duplicate the ends of linear molecules. This is solved in virtually all eukaryotes by the enzyme telomerase, through the addition of telomeric repeats onto chromosome ends. It is widely assumed that the primary site of action for telomerase is the single-stranded G-rich overhang at the ends of chromosomes, formed after DNA replication is complete. We show here that the preferred substrate for telomerase in wild type yeast is instead a collapsed fork generated during replication of duplex telomeric DNA. Furthermore, newly collapsed forks are extensively elongated by telomerase by as much as ~200 nucleotides in a single cell division, indicating that a major source of newly synthesized telomeric repeats in wild type cells occurs at collapsed forks. Fork collapse and the subsequent response by telomerase are coordinated by the dual activities of a telomere-dedicated RPA-like complex, which facilitates replication of duplex telomeric DNA and also recruits telomerase to the fork, thereby ensuring a high probability of re-elongation if DNA replication fails. We further show that the ability of telomerase to elongate newly collapsed forks is dependent on the Rad51 protein, indicating that telomerase activity in response to fork collapse proceeds through a regulatory pathway distinct from how telomerase engages fully replicated chromosome termini. We propose a new model in which spontaneous replication fork collapse and the subsequent response by telomerase is a major determinant of telomere length homeostasis. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.05.232348v1?rss=1 Authors: Bosch, J. A., Birchak, G., Perrimon, N. Abstract: Precise genome editing is a valuable tool to study gene function in model organisms. Prime editing, a precise editing system developed in mammalian cells, does not require double strand breaks or donor DNA and has low off-target effects. Here, we applied prime editing to the model organism Drosophila melanogaster and developed conditions for optimal editing. By expressing prime editing components in cultured cells or somatic cells of transgenic flies, we precisely installed premature stop codons in three classical visible marker genes, ebony, white, and forked. Furthermore, by restricting editing to germ cells, we demonstrate efficient germ line transmission of a precise edit in ebony to ~50% of flies. Our results suggest that prime editing is a useful system in Drosophila to study gene function, such as engineering precise point mutations, deletions, or epitope tags. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.05.216739v1?rss=1 Authors: Kosicki, M., Allen, F., Bradley, A. Abstract: Repair of Cas9-induced double-stranded breaks results primarily in formation of small indels, but can also cause potentially harmful large deletions. While mechanisms leading to the creation of small indels are relatively well understood, very little is known about the origins of large deletions. Using a novel library of clonal mouse embryonic stem cells bona fide deficient for 32 DNA repair genes, we have shown that large deletion frequency increases in cells impaired for non-homologous end joining and decreases in cells deficient for the central resection gene Nbn and the microhomology-mediated end joining gene Polq. Across deficient clones, increase in large deletion frequency was closely correlated with the increase in the extent of microhomology and the size of small indels, implying a continuity of repair processes across different genomic scales. Furthermore, by targeting diverse genomic sites, we identified examples of repair processes that were highly locus-specific, discovering a novel role for exonuclease Trex1. Finally, we present evidence that indel sizes increase with the overall efficiency of Cas9 mutagenesis. These findings may have impact on both basic research and clinical use of CRISPR-Cas9, in particular in conjunction with repair pathway modulation. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.03.235127v1?rss=1 Authors: Haskell, D., Zinovyeva, A. Abstract: microRNAs (miRNAs) and RNA binding proteins (RBPs) regulate gene expression at the post-transcriptional level, but the extent to which these key regulators of gene expression coordinate and the precise mechanisms of their coordination are not well understood. RNA binding proteins often have recognizable RNA binding domains that correlate with specific protein function. Recently, several RBPs containing K Homology (KH) RNA binding domains were shown to work with miRNAs to regulate gene expression, raising the possibility that KH domains may be important for coordinating with miRNA pathways in gene expression regulation. To ascertain whether additional KH domain proteins functionally interact with miRNAs during Caenorhabditis elegans development, we knocked down twenty-four genes encoding KH-domain proteins in several miRNA sensitized genetic backgrounds. Here, we report that a majority of the KH domain-containing genes genetically interact with multiple miRNAs and Argonaute alg-1. Interestingly, two KH domain genes, predicted splicing factors sfa- and asd-2, genetically interacted with all of the miRNA mutants tested, while other KH domain genes exhibited functional interactions only with specific miRNAs. Our domain architecture and phylogenetic relationship analyses of the C. elegans KH domain-containing proteins revealed potential groups that may share both structure and function. Collectively, we show that many C. elegans KH domain RBPs functionally interact with miRNAs, suggesting direct or indirect coordination between these two classes of post-transcriptional gene expression regulators. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.04.235994v1?rss=1 Authors: Strausz, S., Ruotsalainen, S. E., Ollila, H. M., Karjalainen, J., Reeve, M., Kurki, M., Mars, N., Havulinna, A. S., Kiiskinen, T., Mansour Aly, D., Ahlqvist, E., Teder-Laving, M., Palta, P., Groop, L., Magi, R., Makitie, A., Salomaa, V., Bachour, A., Tuomi, T., Palotie, A., Palotie, T., Ripatti, S. Abstract: There is currently only limited understanding of the genetic aetiology of obstructive sleep apnoea (OSA). The aim of our study is to identify genetic loci associated with OSA risk and to test if OSA and its comorbidities share a common genetic background. We conducted the first large-scale genome-wide association study of OSA using FinnGen Study (217,955 individuals) with 16,761 OSA patients identified using nationwide health registries. We estimated 8.3% [0.06-0.11] heritability and identified five loci associated with OSA (P < 5.0x10^-8): rs4837016 near GTPase activating protein and VPS9 domains 1 (GAPVD1), rs10928560 near C-X-C motif chemokine receptor 4 (CXCR4), rs185932673 near Calcium/calmodulin-dependent protein kinase ID (CAMK1D) and rs9937053 near Fat mass and obesity-associated protein (FTO) - a variant previously associated with body mass index (BMI). In a BMI-adjusted analysis, an association was observed for rs10507084 near Rhabdomyosarcoma 2 associated transcript (RMST)/NEDD1 gamma-tubulin ring complex targeting factor (NEDD1). We found genetic correlations between OSA and BMI (rg=0.72 [0.62-0.83]) and with comorbidities including hypertension, type 2 diabetes (T2D), coronary heart disease (CHD), stroke, depression, hypothyroidism, asthma and inflammatory rheumatic diseases (IRD) (rg > 0.30). Polygenic risk score (PRS) for BMI showed 1.98-fold increased OSA risk between the highest and the lowest quintile and Mendelian randomization supported a causal relationship between BMI and OSA. 
Our findings support the causal link between obesity and OSA and joint genetic basis between OSA and comorbidities. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.31.231514v1?rss=1 Authors: Shastry, V., Adams, P. E., Lindtke, D., Mandeville, E. G., Parchman, T. L., Gompert, Z., Buerkle, C. A. Abstract: Non-random mating among individuals can lead to spatial clustering of genetically similar individuals and population stratification. This deviation from panmixia is commonly observed in natural populations. Consequently, individuals can have parentage in single populations or involving hybridization between differentiated populations. Accounting for this mixture and structure is important when mapping the genetics of traits and learning about the formative evolutionary processes that shape genetic variation among individuals and populations. Stratified genetic relatedness among individuals is commonly quantified using estimates of ancestry that are derived from a statistical model. Development of these models for polyploid and mixed-ploidy individuals and populations has lagged behind those for diploids. Here, we extend and test a hierarchical Bayesian model, called entropy, which can utilize low-depth sequence data to estimate genotype and ancestry parameters in autopolyploid and mixed-ploidy individuals (including sex chromosomes and autosomes within individuals). Our analysis of simulated data illustrated the trade-off between sequencing depth and genome coverage and found lower error associated with low depth sequencing across a larger fraction of the genome than with high depth sequencing across a smaller fraction of the genome. The model has high accuracy and sensitivity as verified with simulated data and through analysis of admixture among populations of diploid and tetraploid Arabidopsis arenosa. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.31.231126v1?rss=1 Authors: McCartney, D. L., Chasman, D. I., Visscher, P. M., Muniz-Terrera, G., Cox, S. R., Russ, T. C., Marioni, R. E., Mur, J. Abstract: Genetic variation in VKORC1 is associated with differences in the coagulation of blood and consequentially with sensitivity to the drug warfarin. Variation in VKORC1 has also been linked to parental dementia. However, it is unclear whether the relationship persists for the diagnosis in patients themselves, whether the association holds only for certain forms of dementia, and if those taking warfarin are at greater risk. Here, we use data from 211,423 participants from UK Biobank to examine the relationship between VKORC1, risk of dementia, and the interplay with warfarin use. We find that the T-allele in rs9923231 confers a greater risk for vascular dementia (OR=1.28, p=0.0069), but not for general dementia (OR=1.04, p=0.21) or Alzheimer dementia (OR=1.05, p=0.35), and that the risk of vascular dementia is not affected by warfarin use in carriers of the T-allele. Our study reports for the first time an association between rs9923231 and vascular dementia, but further research is warranted to explore potential mechanisms and specify the relationship between rs9923231 and features of vascular dementia. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.01.232512v1?rss=1 Authors: Jang, J. K., Gladstein, A., Das, A., Cisco, Z., McKim, K. Abstract: Female oocytes undergo meiosis without centrosomes, the major microtubule-organizing centers, which may make them especially vulnerable to aneuploidy. In the acentrosomal oocytes of Drosophila, meiotic spindle assembly depends on the chromosomal passenger complex (CPC). Aurora B is the catalytic component of the CPC while the remaining subunits regulate its localization. Using an inhibitor of Aurora B activity, Binucleine 2, we found that continuous Aurora B activity is required to maintain the oocyte spindle during meiosis I. Furthermore, the necessity of a kinase for spindle regulation suggests that spindle dynamics is regulated by phosphatases. Our results have shown that the protein complex Protein Phosphatase 2A (PP2A) opposes CPC activity, probably by dephosphorylating spindle-associated proteins such as the Kinesins. PP2A exists in two varieties, B55 and B56. While both antagonize Aurora B, they typically exhibit different localization and function. B55 has only minor roles in meiosis I spindle function. The B56 subunit is encoded by two partially redundant paralogs in the Drosophila genome, wdb and wrd. Knocking down both B56 subunits showed they are critical for multiple functions during meiosis I, including maintaining sister centromere and arm cohesion, end-on microtubule attachments, and the metaphase I arrest in oocytes. We found that WDB recruitment to the centromeres depends on BubR1, MEI-S332, and kinetochore protein SPC105R. However, only SPC105R is required for cohesion maintenance during meiosis I. We propose that SPC105R promotes cohesion maintenance by recruiting two proteins that further recruit PP2A: MEI-S332 and the Sororin homolog Dalmatian. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.03.232926v1?rss=1 Authors: Bengani, H., Grozeva, D., Moyon, L., Bhatia, S., Louros, S. R., Hope, J., Jackson, A., Prendergast, J., Owen, L. J., Naville, M., Rainger, J., Grimes, G., Halachev, M., Murphy, L. C., Boskovic, O. S., Heyningen, V. v., Kind, P., Abbott, C. M., Osterweil, E., Raymond, L., Roest Crollius, H., FitzPatrick, D. Abstract: Undiagnosed neurodevelopmental disease is significantly associated with rare variants in cis-regulatory elements (CRE) but demonstrating causality is challenging as target gene consequences may differ from a causative variant affecting the coding region. Here, we address this challenge by applying a procedure to discriminate likely diagnostic regulatory variants from those of neutral/low-penetrant effect. We identified six rare CRE variants using targeted and whole genome sequencing in 48 unrelated males with apparent X-linked intellectual disability (XLID) but without detectable coding region variants. These variants segregated appropriately in families and altered conserved bases in predicted CRE targeting known XLID genes. Three were unique and three were rare but too common to be plausibly causative for XLID. We compared the cis-regulatory activity of wild-type and mutant alleles in zebrafish embryos using dual-color fluorescent reporters. Two variants showed striking changes: one plausibly causative (FMR1CRE) and the other likely neutral/low-penetrant (TENM1CRE). These variants were knocked-in to mice and both altered embryonic neural expression of their target gene. Only Fmr1CRE mice showed disease-relevant behavioral defects. FMR1CRE is plausibly disease-associated resulting in complex misregulation of Fmr1/FMRP rather than loss-of-function. This is consistent both with absence of Fragile X syndrome in the probands and the observed electrophysiological anomalies in the FMR1CRE mouse brain. 
Although disruption of in vivo patterns of endogenous gene expression in disease-relevant tissues by CRE variants cannot be used as strong evidence for Mendelian disease association, in conjunction with extreme rarity in human populations and with relevant knock-in mouse phenotypes, such variants can become likely pathogenic. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.03.234005v1?rss=1 Authors: Gordenin, D. A., Klimczak, L. J., Randall, T. A., Saini, N., Li, J.-L. Abstract: Genomes of tens of thousands of SARS-CoV-2 isolates have been sequenced across the world and the total number of changes (predominantly single base substitutions) in these isolates exceeds ten thousand. We compared the mutational spectrum in the new SARS-CoV-2 mutation dataset with the previously published mutation spectrum in hypermutated genomes of rubella, another positive single-stranded (ss) RNA virus. Each of the rubella isolates arose by accumulation of hundreds of mutations during propagation in a single subject, while the SARS-CoV-2 mutation spectrum represents a collection of events in multiple virus isolates from individuals across the world. We found a clear similarity between the spectra of single base substitutions in rubella and in SARS-CoV-2, with C to U as well as A to G and U to C being the most prominent in plus strand genomic RNA of each virus. Of those, U to C changes universally showed preference for loops versus stems in predicted RNA secondary structure. Similarly to what was previously reported for rubella, C to U changes showed enrichment in the uCn motif, which suggested a subclass of APOBEC cytidine deaminase being a source of these substitutions. We also found enrichment of several other trinucleotide-centered mutation motifs only in SARS-CoV-2, likely indicative of a mutation process characteristic of this virus. Altogether, the results of this analysis suggest that the mutation mechanisms that lead to hypermutation of the rubella vaccine virus in a rare pathological condition may also operate in the background of the SARS-CoV-2 viruses currently propagating in the human population. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.03.235135v1?rss=1 Authors: Han, C., Yu, T., Qin, W., Liao, X., Huang, J., Liu, Z., Yu, L., Liu, X., Chen, Z., Yang, C., Wang, X., Mo, S., Zhu, G., Su, H., Mo, Z., Peng, T. Abstract: Background/Aims: Dietary aflatoxin B1 (AFB1) exposure, which induces DNA damage and codon 249 mutation of the TP53 gene, is one of the major risk factors for hepatocellular carcinoma (HCC). Hepatitis B virus (HBV) infection and AFB1 exert synergistic effects to promote carcinogenesis and TP53 R249S mutation in HCC. Methods: A genome-wide association study (GWAS) was conducted on 485 cases of HCC with chronic HBV infection, followed by a two-stage replication study on 270 cases with chronic HBV infection. Susceptibility variants for the TP53 R249S mutation in HCC were identified based on both GWAS and replication analysis. The associations of identified variants with expression levels of their located genes were validated in 20 paired independent samples. Results: Our results showed that TP53 R249S was significantly associated with ADAMTS18 rs9930984 (adjusted P = 4.84 x 10^-6), WDR49 rs75218075 (adjusted P = 7.36 x 10^-5) and SLC8A3 rs8022091 (adjusted P = 0.042). Additionally, ADAMTS18 mRNA expression was significantly higher in HCC tissue, compared with paired non-tumor tissue (P = 0.041) and patients carrying the TT genotype at rs9930984 showed lower ADAMTS18 expression in non-tumor tissue, compared with those carrying the GT genotype (P = 0.0028). Conclusions: TP53 expression is significantly associated with R249S mutation in HCC. Our collective results suggest that rs9930984, rs75218075 and rs8022091 are associated with susceptibility to the R249S mutation in cases of HCC exposed to AFB1 and HBV infection. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.03.235192v1?rss=1 Authors: Gifford, I., Dasgupta, A., Barrick, J. E. Abstract: Due to their universal presence and high sequence conservation, rRNA sequences are used widely in phylogenetics for inferring evolutionary relationships between microbes and in metagenomics for analyzing the composition of microbial communities. Most microbial genomes encode multiple copies of ribosomal RNA (rRNA) genes to supply cells with sufficient capacity for protein synthesis. These copies typically undergo concerted evolution that keeps their sequences identical, or nearly so, due to gene conversion, a type of intragenomic recombination that changes one copy of a homologous sequence to exactly match another. Widely varying rates of rRNA gene conversion have previously been estimated by comparative genomics methods and using genetic reporter assays. To more directly measure rates of rRNA intragenomic recombination, we sequenced the seven Escherichia coli rRNA operons in 15 lineages of cells that were evolved for ~13,750 generations with frequent single-cell bottlenecks that reduce the effects of selection. We identified 34 gene conversion events and estimate an overall rate of intragenomic recombination events between rRNA copies of 3.2 x 10^-4 per generation or 5.3 x 10^-5 per potential donor sequence. This rate varied only slightly from random expectations between different portions of the rRNA genes and between rRNA operons located at different locations in the genome. This accurate estimate of the rate of rRNA gene conversions fills a gap in our quantitative understanding of how ribosomal sequences and other multicopy elements diversify and homogenize during microbial genome evolution. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.03.235036v1?rss=1 Authors: Ghanta, K. S., Mello, C. C. Abstract: CRISPR genome editing has revolutionized genetics in many organisms. In the nematode Caenorhabditis elegans one injection into each of the two gonad arms of an adult hermaphrodite exposes hundreds of meiotic germ cells to editing mixtures, permitting the recovery of multiple indels or small precision edits from each successfully injected animal. Unfortunately, particularly for long insertions, editing efficiencies can vary widely, necessitating multiple injections, and often requiring co-selection strategies. Here we show that melting double stranded DNA (dsDNA) donor molecules prior to injection increases the frequency of precise homology-directed repair (HDR) by several fold for longer edits. We describe troubleshooting strategies that enable consistently high editing efficiencies resulting, for example, in up to 100 independent GFP knock-ins from a single injected animal. These efficiencies make C. elegans by far the easiest metazoan to genome edit, removing barriers to the use and adoption of this facile system as a model for understanding animal biology. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.02.233056v1?rss=1 Authors: Adolfi, A., Gantz, V. M., Jasinskiene, N., Lee, H.-F., Hwang, K., Bulger, E. A., Ramaiah, A., Bennett, J. B., Terradas, G., Emerson, J. J., Marshall, J. M., Bier, E., James, A. A. Abstract: The development of Cas9/gRNA-mediated gene-drive systems has bolstered the advancement of genetic technologies for controlling vector-borne pathogen transmission. These include population suppression approaches, genetic analogs of insecticidal techniques that reduce the number of vector insects, and population modification (replacement/alteration) approaches, which interfere with competence to transmit pathogens. We developed the first recoded gene-drive rescue system for population modification in the malaria vector, Anopheles stephensi, that relieves the load in females caused by integration of the drive into the kynurenine hydroxylase gene by rescuing its function. Non-functional resistant alleles are eliminated via a dominantly-acting maternal effect combined with slower-acting standard negative selection, and a functional resistant allele does not prevent drive invasion. Small cage trials show that single releases of gene-drive males robustly result in efficient population modification with >95% of mosquitoes carrying the drive within 5-11 generations over a range of initial release ratios. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.31.230565v1?rss=1 Authors: Rousseau, J., Mbakam, C. H., Guyon, A., Tremblay, G., Begin, F. G., Tremblay, J. P. Abstract: The base editing and PRIME editing techniques, both derived from the CRISPR/Cas9 discovery, make it possible to modify selected nucleotides. We initially used the base editing technique to introduce in the APP gene the A673T mutation, which prevents the development of Alzheimer disease. Although the desired cytidine to thymidine mutation was inserted in up to 17% of the APP gene in HEK293T, there were also modifications of up to 20% of other nearby cytidines. More specific mutations of the APP gene were obtained with the PRIME editing technique. However, the best percentage of mutations was only 5.8%. The efficiency of the PRIME editing treatment was initially tested on the EMX1 gene. A single treatment produced the desired modification in 36% of the EMX1 gene. Three consecutive treatments increased the percentage of mutations to 50%. The PRIME editing technique was also used to insert specific point mutations in exons 9 and 35 of the DMD gene, which codes for dystrophin and is mutated in Duchenne Muscular Dystrophy (DMD). Up to 10% desired mutations of the DMD gene were obtained. Three repeated treatments increased the percentage of specific mutations to 16%. Given that there are thousands of nuclei inside a human muscle fiber and that the dystrophin nuclear domain is about 500 microns, this level of modifications would be sufficient to produce a phenotype improvement in DMD patients. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.30.228932v1?rss=1 Authors: Su, J., Zhao, J., Zhao, S., Shang, X., Pang, S., Chen, S., Liu, D., Kang, Z., Wang, X. Abstract: Due to field soil changes, high-density planting, and straw-returning methods, wheat common root rot (spot blotch) and Fusarium crown rot (FCR) have become severe threats to global wheat production. Only a few wheat genotypes show moderate resistance to these root and crown rot fungal diseases, and the genetic determinants of wheat resistance to these two devastating diseases remain poorly understood. This review summarizes the recent progress of genetic studies on wheat resistance to common root rot and Fusarium crown rot. Wheat germplasms with relatively higher resistance are highlighted and genetic loci controlling the resistance to each of the diseases are summarized. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.30.228825v1?rss=1 Authors: Madison, B. S., Flanagan, M. K., Nath, S., White, M. A. Abstract: Crossover frequency often differs substantially between sexes (i.e. heterochiasmy). Although this phenomenon is widespread throughout taxa, the mechanisms that lead to heterochiasmy remain unclear. One pattern that has emerged is that the overall length of the synaptonemal complex likely has a direct influence on the total number of crossovers in each sex. However, this has only been investigated in a handful of species. The threespine stickleback fish (Gasterosteus aculeatus) is an excellent species to explore whether synaptonemal complex length is associated with a difference in the total number of crossovers, as females have longer linkage maps than males. We used immunocytogenetics to quantify synaptonemal complex length in late pachytene female and male meiocytes. We found that females had synaptonemal complexes that were 1.65 times longer than males, which is remarkably similar to the length difference observed in a sex-specific linkage map constructed from a cross between two other populations. Our results support a model where chromosome axis length determines overall crossover frequency and establish the threespine stickleback as a useful species to explore the mechanistic basis of heterochiasmy. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.31.229682v1?rss=1 Authors: Xiang, S., Koshland, D. Abstract: Cohesin helps mediate sister chromatid cohesion, chromosome condensation, DNA repair and transcription regulation. Cohesin can tether two regions of DNA and also extrude DNA loops. We interrogated cohesin architecture, oligomerization and function in vivo through proximity based labeling of cohesin domains. Our results suggest that the hinge and head domain of cohesin both bind DNA, and that cohesin coiled coils bend, bringing the head and hinge together to form a butterfly conformation. Our data also suggest that cohesin efficiently oligomerizes on and off DNA. The levels of oligomers and their distribution on chromosomes are cell cycle regulated. Cohesin oligomerization is blocked by mutations in distinct domains of the Smc3p and Mcd1p subunits of cohesin or in Pds5p, a cohesin regulator. These mutations also block the maintenance of cohesion but not loop extrusion, suggesting that cohesin oligomerization plays a specific role in the maintenance of cohesion. Copy rights belong to original authors. Visit the link for more info