In this week's episode Greg and Patrick talk about group coding approaches, like dummy variables and effect code variables, for helping to analyze group differences within the larger general linear model. Along the way they also discuss hacking up a lung, made for audio faces, walking pneumonia, putting Vicks VapoRub on your feet, cards in your spokes, confusing rental cars, crash test dummies, what is your quest, 25-cent Nyquil night, Bonferroni glasses, the Romans, and Nyquil haze. Stay in contact with Quantitude! Web page: quantitudepod.org TwitterX: @quantitudepod YouTube: @quantitudepod Merch: redbubble.com
In this lecture, we review what we have learned about one-sample confidence intervals (i.e., how to use them as graphical versions of one-sample t-tests) for absolute performance estimation in order to motivate the problem of relative performance estimation. We introduce two-sample confidence intervals (i.e., confidence intervals on DIFFERENCES based on different two-sample t-tests) that are tested against a null hypothesis of 0. This means covering confidence interval half widths for the paired-difference t-test, the equal-variance (pooled) t-test, and Welch's unequal variance t-test. Each of these different experimental conditions sets up a different standard error of the mean formula and formula for degrees of freedom that are used to define the actual confidence interval half widths (centered on the difference in sample means in the pairwise comparison of systems). We then generalize to the case of more than 2 systems, particularly for "ranking and selection (R&S)." This lets us review the multiple-comparisons problem (and Bonferroni correction) and how post hoc tests (after an ANOVA) are more statistically powerful ways to do comparisons.
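As a rough sketch of the three cases named above, the standard error and degrees of freedom for each two-sample interval can be written with Python's standard library alone. The function names here are hypothetical, chosen only for illustration, and the critical value `t_crit` is assumed to come from a t-table (or a statistics library) for the returned degrees of freedom.

```python
import math
from statistics import mean, variance  # variance() is the sample variance (n - 1)


def paired_se_df(x, y):
    """Paired-difference t: work on the per-pair differences."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return math.sqrt(variance(d) / n), n - 1


def pooled_se_df(x, y):
    """Equal-variance (pooled) t: one shared variance estimate."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2


def welch_se_df(x, y):
    """Welch's unequal-variance t with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    a, b = variance(x) / n1, variance(y) / n2
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return math.sqrt(a + b), df


def diff_ci(x, y, se, t_crit):
    """Interval centered on the difference in sample means; half width = t_crit * se."""
    center = mean(x) - mean(y)
    return center - t_crit * se, center + t_crit * se


def bonferroni_alpha(alpha, k):
    """Per-comparison significance level when making k comparisons."""
    return alpha / k


# Made-up measurements for two systems (illustrative only).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
se_pool, df_pool = pooled_se_df(x, y)
# 2.447 is the tabulated two-sided 95% critical value for 6 degrees of freedom.
print(diff_ci(x, y, se_pool, 2.447))
```

If the resulting interval contains 0, the difference between the two systems is not significant at that level. With more than two systems, each pairwise interval would use the smaller `bonferroni_alpha(alpha, k)` when looking up `t_crit`.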
Apologies for lower audio quality; we lost recordings and had to use backup tracks.

Our guests today are Anastasios Angelopoulos and Wei-Lin Chiang, leads of Chatbot Arena, fka LMSYS, the crowdsourced AI evaluation platform developed by the LMSys student club at Berkeley, which became the de facto standard for comparing language models. For many folks, Arena Elo is cited more often than MMLU scores, and the platform has attracted >1,000,000 people to cast votes since its launch, leading top model trainers to cite it over their own formal academic benchmarks.

The Limits of Static Benchmarks

We've done two benchmarks episodes: Benchmarks 101 and Benchmarks 201. One issue we've always brought up with static benchmarks is that 1) many are getting saturated, with models scoring almost perfectly on them, and 2) they often don't reflect production use cases, making it hard for developers and users to use them as guidance. The fundamental challenge in AI evaluation isn't technical - it's philosophical. How do you measure something that increasingly resembles human intelligence? Rather than trying to define intelligence upfront, Arena lets users interact naturally with models and collects comparative feedback. It's messy and subjective, but that's precisely the point - it captures the full spectrum of what people actually care about when using AI.

The Pareto Frontier of Cost vs Intelligence

Because the Elo scores are remarkably stable over time, we can put all the chat models on a map against their respective cost to gain a view of at least 3 orders of magnitude of model sizes/costs and observe the remarkable shift in intelligence per dollar over the past year. This frontier stood remarkably firm through the recent releases of o1-preview and price cuts of Gemini 1.5.

The Statistics of Subjectivity

In our Benchmarks 201 episode, Clémentine Fourrier from HuggingFace thought this design choice was one of the shortcomings of arenas: they aren't reproducible.
You don't know who ranked what and what exactly the outcome was at the time of ranking. That same person might rank the same pair of outputs differently on a different day, or might ask harder questions to better models compared to smaller ones, making it imbalanced. Another argument that people have brought up is confirmation bias. We know humans prefer longer responses and are swayed by formatting - Rob Mulla from Dreadnode had found some interesting data on this in May.

The approach LMArena is taking is to use logistic regression to decompose human preferences into constituent factors. As Anastasios explains: "We can say what components of style contribute to human preference and how they contribute." By adding these style components as parameters, they can mathematically "suck out" their influence and isolate the core model capabilities.

This extends beyond just style - they can control for any measurable factor: "What if I want to look at the cost adjusted performance? Parameter count? We can ex post facto measure that." This is one of the most interesting things about Arena: You have a data generation engine which you can clean and turn into leaderboards later. If you wanted to create a leaderboard for poetry writing, you could take existing data from Arena and normalize it by identifying these style components. Whether or not it's possible to really understand WHAT bias the voters have, that's a different question.

Private Evals

One of the most delicate challenges LMSYS faces is maintaining trust while collaborating with AI labs. The concern is that labs could game the system by testing multiple variants privately and only releasing the best performer. This was brought up when 4o-mini was released and ranked as the second best model on the leaderboard.

But this fear misunderstands how Arena works. Unlike static benchmarks where selection bias is a major issue, Arena's live nature means any initial bias gets washed out by ongoing evaluation.
As Anastasios explains: "In the long run, there's way more fresh data than there is data that was used to compare these five models." The other big question is WHAT model is actually being tested; as people often talk about on X / Discord, the same endpoint will randomly feel “nerfed”, as happened with “Claude European summer” and the corresponding conspiracy theories. It's hard to keep track of these performance changes in Arena as these changes (if real…?) are not observable.

The Future of Evaluation

The team's latest work on RouteLLM points to an interesting future where evaluation becomes more granular and task-specific. But they maintain that even simple routing strategies can be powerful - like directing complex queries to larger models while handling simple tasks with smaller ones.

Arena is now going to expand beyond text into multimodal evaluation and specialized domains like code execution and red teaming. But their core insight remains: the best way to evaluate intelligence isn't to simplify it into metrics, but to embrace its complexity and find rigorous ways to analyze it. To go after this vision, they are spinning out Arena from LMSys, which will stay as an academia-driven group at Berkeley.

Chapters

* 00:00:00 - Introductions
* 00:01:16 - Origin and development of Chatbot Arena
* 00:05:41 - Static benchmarks vs. Arenas
* 00:09:03 - Community building
* 00:13:32 - Biases in human preference evaluation
* 00:18:27 - Style Control and Model Categories
* 00:26:06 - Impact of o1
* 00:29:15 - Collaborating with AI labs
* 00:34:51 - RouteLLM and router models
* 00:38:09 - Future of LMSys / Arena

Show Notes

* Anastasios Angelopoulos
* Anastasios' NeurIPS Paper Conformal Risk Control
* Wei-Lin Chiang
* Chatbot Arena
* LMSys
* MTBench
* ShareGPT dataset
* Stanford's Alpaca project
* LLMRouter
* E2B
* Dreadnode

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast.
This is Alessio, Partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.

Swyx [00:00:14]: Hey, and today we're very happy and excited to welcome Anastasios and Wei Lin from LMSys. Welcome guys.

Wei Lin [00:00:21]: Hey, how's it going? Nice to see you.

Anastasios [00:00:23]: Thanks for having us.

Swyx [00:00:24]: Anastasios, I actually saw you, I think at last year's NeurIPS. You were presenting a paper, which I don't really super understand, but it was some theory paper about how your method was very dominating over other sort of search methods. I don't remember what it was, but I remember that you were a very confident speaker.

Anastasios [00:00:40]: Oh, I totally remember you. Didn't ever connect that, but yes, that's definitely true. Yeah. Nice to see you again.

Swyx [00:00:46]: Yeah. I was frantically looking for the name of your paper and I couldn't find it. Basically I had to cut it because I didn't understand it.

Anastasios [00:00:51]: Is this conformal PID control or was this the online control?

Wei Lin [00:00:55]: Blast from the past, man.

Swyx [00:00:57]: Blast from the past. It's always interesting how NeurIPS and all these academic conferences are sort of six months behind what people are actually doing, but conformal risk control, I would recommend people check it out. I have the recording. I just never published it just because I was like, I don't understand this enough to explain it.

Anastasios [00:01:14]: People won't be interested.

Wei Lin [00:01:15]: It's all good.

Swyx [00:01:16]: But Elo scores, Elo scores are very easy to understand. You guys are responsible for the biggest revolution in language model benchmarking in the last few years. Maybe you guys want to introduce yourselves and maybe tell a little bit of the brief history of LMSys.

Wei Lin [00:01:32]: Hey, I'm Wei Lin.
I'm a fifth year PhD student at UC Berkeley, working on Chatbot Arena these days, doing crowdsourcing AI benchmarking.

Anastasios [00:01:43]: I'm Anastasios. I'm a sixth year PhD student here at Berkeley. I did most of my PhD on like theoretical statistics and sort of foundations of model evaluation and testing. And now I'm working 150% on this Chatbot Arena stuff. It's great.

Alessio [00:02:00]: And what was the origin of it? How did you come up with the idea? How did you get people to buy in? And then maybe what were one or two of the pivotal moments early on that kind of made it the standard for these things?

Wei Lin [00:02:12]: Yeah, yeah. The Chatbot Arena project was started last year in April, May, around that. Before that, we were basically experimenting in a lab how to fine tune a chatbot open source based on the Llama 1 model that Meta released. At that time, Llama 1 was like a base model and people didn't really know how to fine tune it. So we were doing some explorations. We were inspired by Stanford's Alpaca project. So we basically, yeah, crawled a data set from the internet, which is called the ShareGPT data set, which is like a dialogue data set between user and ChatGPT conversation. It turns out to be like pretty high quality data, dialogue data. So we fine tune on it and then we train it and release the model called Vicuna. And people were very excited about it because it kind of like demonstrated that an open-weight model can reach this conversation capability similar to ChatGPT. And then we basically release the model and also build a demo website for the model. People were very excited about it. But during the development, the biggest challenge to us at the time was like, how do we even evaluate it? How do we even argue this model we trained is better than others? And then what's the gap between this open source model and other proprietary offerings? At that time, it was like GPT-4 was just announced and it's like Claude 1. What's the difference between them?
And then after that, like every week, there's a new model being fine tuned, released. So even until still now, right? And then we have that demo website for Vicuna now. And then we thought like, okay, maybe we can add a few more of the model as well, like API model as well. And then we quickly realized that people need a tool to compare between different models. So we have like a side by side UI implemented on the website so that people choose, you know, compare. And we quickly realized that maybe we can do something like, like a battle on top of LLMs, like just anonymize it, anonymize the identity, and then people vote which one is better. So the community decides which one is better, not us, not us arguing, you know, our model is better or what. And that turns out to be like, people are very excited about this idea. And then we tweet, we launch, and that's, yeah, that's April, May. And then it was like first two, three weeks, like just a few hundred thousand views on our launch tweets. And then we had regular weekly updates at the beginning, adding new models, GPT-4 as well. So it was like, that was the, you know, the initial.

Anastasios [00:04:58]: Another pivotal moment, just to jump in, would be private models, like the gpt2-chatbot,

Wei Lin [00:05:04]: im-a-good-gpt2-chatbot. That was this year. That was this year.

Anastasios [00:05:07]: Huge.

Wei Lin [00:05:08]: That was also huge.

Alessio [00:05:09]: In the beginning, I saw the initial release was May 3rd of the beta board. On April 6, we did a benchmarks 101 episode for a podcast, just kind of talking about, you know, how so much of the data is like in the pre-training corpus and blah, blah, blah. And like the benchmarks are really not what we need to evaluate whether or not a model is good. Why did you not make a benchmark?
Maybe at the time, you know, it was just like, Hey, let's just put together a whole bunch of data again, run a, make a score that seems much easier than coming out with a whole website where like users need to vote. Any thoughts behind that?

Wei Lin [00:05:41]: I think it's more like fundamentally, we don't know how to automate these kinds of benchmarks when it's more like, you know, conversational, multi-turn, and more open-ended tasks that may not come with a ground truth. So let's say if you ask a model to help you write an email for you for whatever purpose, there's no ground truth. How do you score them? Or write a story or a creative story or many other things like how we use ChatGPT these days. It's more open-ended. You know, we need human in the loop to give us feedback, which one is better. And I think the nuance here is like, sometimes it's also hard for a human to give the absolute rating. So that's why we have this kind of pairwise comparison, easier for people to choose which one is better. So from that, we use these pairwise comparisons to calculate the leaderboard. Yeah. You can add more about this methodology.

Anastasios [00:06:40]: Yeah. I think the point is that, and you guys probably also talked about this at some point, but static benchmarks are intrinsically, to some extent, unable to measure generative model performance. And the reason is because you cannot pre-annotate all the outputs of a generative model. You change the model, it's like the distribution of your data is changing. You need new labels to deal with that. New labels or great automated labeling, right? Which is why people are pursuing both. And yeah, static benchmarks, they allow you to zoom in to particular types of information like factuality, historical facts. We can build the best benchmark of historical facts, and we will then know that the model is great at historical facts. But ultimately, that's not the only axis, right? And we can build 50 of them, and we can evaluate 50 axes.
But it's just so, the problem of generative model evaluation is just so expansive, and it's so subjective, that it's just maybe not intrinsically impossible, but at least we don't see a way. We didn't see a way of encoding that into a fixed benchmark.

Wei Lin [00:07:47]: But on the other hand, I think there's a challenge where this kind of online dynamic benchmark is more expensive than static benchmark, offline benchmark, where people still need it. Like when they build models, they need static benchmark to track where they are.

Anastasios [00:08:03]: It's not like our benchmark is uniformly better than all other benchmarks, right? It just measures a different kind of performance that has proved to be useful.

Swyx [00:08:14]: You guys also published MTBench as well, which is a static version, let's say, of Chatbot Arena, right? That people can actually use in their development of models.

Wei Lin [00:08:25]: Right. I think one of the reasons we still do this static benchmark, we still wanted to explore, experiment whether we can automate this, because people, eventually, model developers need it to fast iterate their model. So that's why we explored LLM as a judge, and Arena-Hard, trying to filter, select high-quality data we collected from Chatbot Arena, the high-quality subset, and use that as a question and then automate the judge pipeline, so that people can quickly get high-quality signal, benchmark signals, using this online benchmark.

Swyx [00:09:03]: As a community builder, I'm curious about just the initial early days. Obviously when you offer effectively free A/B testing inference for people, people will come and use your arena. What do you think were the key unlocks for you? Was it funding for this arena? Was it marketing? When people came in, do you see a noticeable skew in the data?
Which obviously now you have enough data sets, you can separate things out, like coding and hard prompts, but in the early days, it was just all sorts of things.

Anastasios [00:09:31]: Yeah, maybe one thing to establish at first is that our philosophy has always been to maximize organic use. I think that really does speak to your point, which is, yeah, why do people come? They came to use free LLM inference, right? And also, a lot of users just come to the website to use direct chat, because you can chat with the model for free. And then you could think about it like, hey, let's just be kind of like more on the selfish or conservative or protectionist side and say, no, we're only giving credits for people that battle or so on and so forth. That strategy wouldn't work, right? Because what we're trying to build is like a big funnel, a big funnel that can direct people. And some people are passionate and interested and they battle. And yes, the distribution of the people that do that is different. It's like, as you're pointing out, it's like, they're enthusiasts.

Wei Lin [00:10:24]: They're early adopters of this technology.

Anastasios [00:10:27]: Or they like games, you know, people like this. And we've run a couple of surveys that indicate this as well, of our user base.

Wei Lin [00:10:36]: We do see a lot of developers come to the site asking coding questions, 20-30%. Yeah, 20-30%.

Anastasios [00:10:42]: It's obviously not reflective of the general population, but it's reflective of some corner of the world of people that really care. And to some extent, maybe that's all right, because those are like the power users. And you know, we're not trying to claim that we represent the world, right? We represent the people that come and vote.

Swyx [00:11:02]: Did you have to do anything marketing-wise? Was anything effective? Did you struggle at all? Was it success from day one?

Wei Lin [00:11:09]: At some point, almost done. Okay.
Because as you can imagine, this leaderboard depends on community engagement participation. If no one comes to vote tomorrow, then no leaderboard.

Anastasios [00:11:23]: So we had some period of time when the number of users was just, after the initial launch, it went lower. Yeah. And, you know, at some point, it did not look promising. Actually, I joined the project a couple months in to do the statistical aspects, right? As you can imagine, that's how it kind of hooked into my previous work. At that time, it wasn't like, you know, it definitely wasn't clear that this was like going to be the eval or something. It was just like, oh, this is a cool project. Like Wei-Lin seems awesome, you know, and that's it.

Wei Lin [00:11:56]: Definitely. In the beginning, because people don't know us, people don't know what this is for. So we had a hard time. But I think we were lucky enough that we had some initial momentum. And as well, the competition between model providers just, you know, became very intense. Intense. And then that puts the eval onto us, right? Because always number one is number one.

Anastasios [00:12:23]: There's also an element of trust. Our main priority in everything we do is trust. We want to make sure we're doing everything like all the I's are dotted and the T's are crossed and nobody gets unfair treatment and people can see from our profiles and from our previous work and from whatever, you know, we're trustworthy people. We're not like trying to make a buck and we're not trying to become famous off of this or that. It's just, we're trying to provide a great public leaderboard community venture project.

Wei Lin [00:12:51]: Yeah.

Swyx [00:12:52]: Yes. I mean, you are kind of famous now, you know, that's fine. Just to dive in more into biases and, you know, some of this is like statistical control.
The classic one for human preference evaluation is humans demonstrably prefer longer contexts or longer outputs, which is actually something that we don't necessarily want. You guys, I think maybe two months ago put out some length control studies. Apart from that, there are just other documented biases. Like, I'd just be interested in your review of what you've learned about biases and maybe a little bit about how you've controlled for them.

Anastasios [00:13:32]: At a very high level, yeah. Humans are biased. Totally agree. Like in various ways. It's not clear whether that's good or bad, you know, we try not to make value judgments about these things. We just try to describe them as they are. And our approach is always as follows. We collect organic data and then we take that data and we mine it to get whatever insights we can get. And, you know, we have many millions of data points that we can now use to extract insights from. Now, one of those insights is to ask the question, what is the effect of style, right? You have a bunch of data, you have votes, people are voting either which way. We have all the conversations. We can say what components of style contribute to human preference and how do they contribute? Now, that's an important question. Why is that an important question? It's important because some people want to see which model would be better if the lengths of the responses were the same, were to be the same, right? People want to see the causal effect of the model's identity controlled for length or controlled for markdown, number of headers, bulleted lists, is the text bold? Some people don't, they just don't care about that. The idea is not to impose the judgment that this is not important, but rather to say ex post facto, can we analyze our data in a way that decouples all the different factors that go into human preference? Now, the way we do this is via statistical regression.
That is to say the arena score that we show on our leaderboard is a particular type of linear model, right? It's a linear model that takes, it's a logistic regression that takes model identities and fits them against human preference, right? So it regresses human preference against model identity. What you get at the end of that logistic regression is a parameter vector of coefficients. And when the coefficient is large, it tells you that GPT-4o or whatever, very large coefficient, that means it's strong. And that's exactly what we report in the table. It's just the predictive effect of the model identity on the vote. The other thing that you can do is you can take that vector, let's say we have M models, that is an M dimensional vector of coefficients. What you can do is you say, hey, I also want to understand what the effect of length is. So I'll add another entry to that vector, which is trying to predict the vote, right? That tells me the difference in length between two model responses. So we have that for all of our data. We can compute it ex post facto. We add it into the regression and we look at that predictive effect. And then the idea, and this is formally true under certain conditions, not always verifiable ones, but the idea is that adding that extra coefficient to this vector will kind of suck out the predictive power of length and put it into that M plus first coefficient and quote, unquote, de-bias the rest so that the effect of length is not included. And that's what we do in style control. Now we don't just do it for M plus one. We have, you know, five, six different style components that have to do with markdown headers and bulleted lists and so on that we add here. Now, where is this going? You guys see the idea. It's a general methodology. If you have something that's sort of like a nuisance parameter, something that exists and provides predictive value, but you really don't want to estimate that. You want to remove its effect.
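[Editor's sketch: the regression described here can be illustrated in a few lines of plain Python. This is a toy under stated assumptions, not the Arena team's code: the function name `fit_preferences`, the small ridge penalty, the gradient-ascent fit, and the made-up battle counts are all hypothetical. Each battle gets +1 for model A, -1 for model B, and one extra feature for the length difference, which becomes the "M plus first" coefficient.]

```python
import math


def fit_preferences(battles, n_models, use_length, lr=0.5, steps=2000, l2=1e-3):
    """Logistic regression of votes on model identity (+ optional length feature).

    battles: list of (model_a, model_b, length_diff, a_wins) tuples, where
    length_diff is a normalized (len_a - len_b) and a_wins is 1 or 0.
    Returns M coefficients, plus one length coefficient if use_length is True.
    """
    dim = n_models + (1 if use_length else 0)
    w = [0.0] * dim
    n = len(battles)
    for _ in range(steps):
        grad = [0.0] * dim
        for a, b, length_diff, a_wins in battles:
            z = w[a] - w[b]
            if use_length:
                z += w[n_models] * length_diff
            err = a_wins - 1.0 / (1.0 + math.exp(-z))  # vote minus predicted P(a wins)
            grad[a] += err
            grad[b] -= err
            if use_length:
                grad[n_models] += err * length_diff
        # Gradient ascent on the average log-likelihood with a small ridge penalty,
        # which also pins down the otherwise arbitrary offset of the coefficients.
        w = [wi + lr * (g / n - l2 * wi) for wi, g in zip(w, grad)]
    return w


# Made-up data: the longer answer wins 75% of the time regardless of model,
# and model 0 happens to write the longer answer in 80 of 100 battles.
battles = (
    [(0, 1, +1.0, 1)] * 60 + [(0, 1, +1.0, 0)] * 20
    + [(0, 1, -1.0, 1)] * 5 + [(0, 1, -1.0, 0)] * 15
)

raw = fit_preferences(battles, n_models=2, use_length=False)
controlled = fit_preferences(battles, n_models=2, use_length=True)
print("gap without length feature:", raw[0] - raw[1])
print("gap with length feature:   ", controlled[0] - controlled[1])
print("length coefficient:        ", controlled[2])
```

[In this toy data the gap between the two models is large when length is left out and shrinks to roughly zero once the length feature is added, with the length coefficient absorbing the effect. That is the "suck out the predictive power" behavior described above, shown only for a case where the outcome was constructed to depend on length alone.]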
In causal inference, these things are called like confounders often. What you can do is you can model the effect. You can put them into your model and try to adjust for them. So another one of those things might be cost. You know, what if I want to look at the cost adjusted performance of my model, which models are punching above their weight, parameter count, which models are punching above their weight in terms of parameter count, we can ex post facto measure that. We can do it without introducing anything that compromises the organic nature of the data that we collect. Hopefully that answers the question.

Wei Lin [00:17:20]: It does.

Swyx [00:17:21]: So I guess with a background in econometrics, this is super familiar.

Anastasios [00:17:25]: You're probably better at this than me for sure.

Swyx [00:17:27]: Well, I mean, so I used to be, you know, a quantitative trader and so, you know, controlling for multiple effects on stock price is effectively the job. So it's interesting. Obviously the problem is proving causation, which is hard, but you don't have to do that.

Anastasios [00:17:45]: Yes. Yes, that's right. And causal inference is a hard problem and it goes beyond statistics, right? It's like you have to build the right causal model and so on and so forth. But we think that this is a good first step and we're sort of looking forward to learning from more people. You know, there's some good people at Berkeley that work on causal inference, and we're looking forward to learning from them on, like, what are the really most contemporary techniques that we can use in order to estimate true causal effects if possible.

Swyx [00:18:10]: Maybe we could take a step through the other categories. So style control is a category. It is not a default. I have thought that when you wrote that blog post, actually, I thought it would be the new default because it seems like the most obvious thing to control for.
But you also have other categories, you have coding, you have hard prompts. We consider that.

Anastasios [00:18:27]: We're still actively considering it. It's just, you know, once you make that step, once you take that step, you're introducing your opinion and I'm not, you know, why should our opinion be the one? That's kind of a community choice. We could put it to a vote.

Wei Lin [00:18:39]: We could pass.

Anastasios [00:18:40]: Yeah, maybe do a poll. Maybe do a poll.

Swyx [00:18:42]: I don't know. No opinion is an opinion.

Wei Lin [00:18:44]: You know what I mean?

Swyx [00:18:45]: Yeah.

Wei Lin [00:18:46]: There's no neutral choice here.

Swyx [00:18:47]: Yeah. You have all these others. You have instruction following too. What are your favorite categories that you like to talk about? Maybe you tell a little bit of the stories, tell a little bit of like the hard choices that you had to make.

Wei Lin [00:18:57]: Yeah. Yeah. Yeah. I think the, uh, initially the reason why we want to add these new categories is essentially to answer some of the questions from our community, which is we won't have a single leaderboard for everything. So these models behave very differently in different domains. Let's say this model is trained for coding, this model trained for more technical questions and so on. On the other hand, to answer people's question about like, okay, what if all these low quality, you know, because we crowdsource data from the internet, there will be noise. So how do we de-noise? How do we filter out these low quality data effectively? So that was like, you know, some questions we want to answer. So basically we spent a few months, like really diving into these questions to understand how do we filter all these data because these are like millions of data points.
And then if you want to re-label yourself, it's possible, but we need to kind of like automate this kind of data classification pipeline for us to effectively categorize them to different categories, say coding, math, structure, and also harder problems. So that was like, the hope is when we slice the data into these meaningful categories to give people more like better signals, more direct signals, and that's also to clarify what we are actually measuring for, because I think that's the core part of the benchmark. That was the initial motivation. Does that make sense?

Anastasios [00:20:27]: Yeah. Also, I'll just say, this does like get back to the point that the philosophy is to like mine organic, to take organic data and then mine it ex post facto.

Alessio [00:20:35]: Is the data cage-free too, or just organic?

Anastasios [00:20:39]: It's cage-free.

Wei Lin [00:20:40]: No GMO. Yeah. And all of these efforts are like open source, like we open source all of the data cleaning pipeline, filtering pipeline. Yeah.

Swyx [00:20:50]: I love the notebooks you guys publish. Actually really good just for learning statistics.

Wei Lin [00:20:54]: Yeah. I'll share these insights with everyone.

Alessio [00:20:59]: I agree on the initial premise of, Hey, writing an email, writing a story, there's like no ground truth. But I think as you move into like coding and like red teaming, some of these things, there's like kind of like skill levels. So I'm curious how you think about the distribution of skill of the users. Like maybe the top 1% of red teamers is just not participating in the arena. So how do you guys think about adjusting for it? And like feels like this where there's kind of like big differences between the average and the top. Yeah.

Anastasios [00:21:29]: Red teaming, of course, red teaming is quite challenging. So, okay. Moving back.
There's definitely like some tasks that are not as subjective that like pairwise human preference feedback is not the only signal that you would want to measure. And to some extent, maybe it's useful, but it may be more useful if you give people better tools. For example, it'd be great if we could execute code within arena, be fantastic.

Wei Lin [00:21:52]: We want to do it.

Anastasios [00:21:53]: There's also this idea of constructing a user leaderboard. What does that mean? That means some users are better than others. And how do we measure that? How do we quantify that? It's hard in Chatbot Arena, but where it is easier is in red teaming, because in red teaming, there's an explicit game. You're trying to break the model, you either win or you lose. So what you can do is you can say, Hey, what's really happening here is that the models and humans are playing a game against one another. And then you can use the same sort of Bradley-Terry methodology with some, some extensions that we came up with. You can read one of our recent blog posts for, for the sort of theoretical extensions. You can attribute like strength back to individual players and jointly attribute strength to like the models that are in this jailbreaking game, along with the target tasks, like what types of jailbreaks you want.

Wei Lin [00:22:44]: So yeah.

Anastasios [00:22:45]: And I think that this is, this is a hugely important and interesting avenue that we want to continue researching. We have some initial ideas, but you know, all thoughts are welcome.

Wei Lin [00:22:54]: Yeah.

Alessio [00:22:55]: So first of all, on the code execution, the E2B guys, I'm sure they'll be happy to help you. I'll please set that up. They're big fans. We're investors in a company called Dreadnode, which does a lot in AI red teaming. I think to me, the most interesting thing has been, how do you do sure? Like the model jailbreak is one side.
We also had Nicholas Carlini from DeepMind on the podcast, and he was talking about, for example, context stealing and weight stealing. So there's a lot more that goes around it. I'm curious just how you think about the model and then maybe the broader system. Even with Red Team Arena, you're just focused on jailbreaking of the model, right? You're not doing any testing at the more system level, where maybe you can get the training data back, or exfiltrate some of the layers and the weights and things like that.
Wei Lin [00:23:43]: So right now, as you can see, the Red Team Arena is at a very early stage and we are still exploring what could be the potential new games we can introduce to the platform. So the idea is still the same, right? We build a community-driven platform for people. They can have fun with this website, for sure. That's one thing, and then help everyone to test these models. So one of the aspects you mentioned is stealing secrets, stealing training sets. That could be designed as a game. Say, can you steal their credential? Maybe we can hide the credential in the system prompt, and so on. So there are a few potential ideas we want to explore for sure. Do you want to add more?
Anastasios [00:24:28]: I think that this is great. This idea is a great one. There's a lot of great ideas in the Red Teaming space. You know, I'm not personally a Red Teamer. I don't go around and Red Team models, but there are people that do that and they're awesome. They're super skilled. When I think about the Red Team arena, I think those are really the people that we're building it for. Like, we want to make them excited and happy, build tools that they like. And just like chatbot arena, we'll trust that this will end up being useful for the world.
And all these people are, you know, I won't say all these people in this community are actually good hearted, right? They're not doing it because they want to see the world burn. They're doing it because they think it's fun and cool. And yeah. Okay. Maybe they want to see, maybe they want a little bit.
Wei Lin [00:25:13]: I don't know. Majority.
Anastasios [00:25:15]: Yeah.
Wei Lin [00:25:16]: You know what I'm saying.
Anastasios [00:25:17]: So, you know, trying to figure out how to serve them best, I think, I don't know where that fits. I just, I'm not expressing. And give them credits, right?
Wei Lin [00:25:24]: And give them credit.
Anastasios [00:25:25]: Yeah. Yeah. So I'm not trying to express any particular value judgment here as to whether that's the right next step. It's just, that's sort of the way that I think we would think about it.
Swyx [00:25:35]: Yeah. We also talked to Sander Schulhoff of the HackAPrompt competition, and he's pretty interested in Red Teaming at scale. Let's just call it that. You guys maybe want to talk with him.
Wei Lin [00:25:45]: Oh, nice.
Swyx [00:25:46]: We wanted to cover a few topical things and then go into the other stuff that your group is doing. You know, you're not just running Chatbot Arena. We can also talk about the new website and your future plans, but I just wanted to briefly focus on o1. It is the hottest, latest model. Obviously, you guys already have it on the leaderboard. What is the impact of o1 on your evals?
Wei Lin [00:26:06]: Made our interface slower.
Anastasios [00:26:07]: It made it slower.
Swyx [00:26:08]: Yeah.
Wei Lin [00:26:10]: Because it needs like 30, 60 seconds, sometimes even more; the latency is higher. So that's one. Sure. But I think we observe very interesting things from this model as well. We observe significant improvement in certain categories, like more technical or math.
Yeah.
Anastasios [00:26:32]: I think actually one takeaway that was encouraging is that a lot of people before the o1 release were thinking, oh, this benchmark is saturated. And why were they thinking that? They were thinking that because there was a bunch of models that were kind of at the same level. They were just incrementally competing and it wasn't immediately obvious that any of them were any better. Nobody, including any individual person, it's hard to tell. But what o1 did is, it's clearly a better model for certain tasks. I mean, I used it for proving some theorems and, you know, there's some theorems that only I know because I still do a little bit of theory. Right. So it's like, I can go in there and ask, oh, how would you prove this exact thing? Which I can tell you has never been in the public domain. It'll do it. It's like, what?
Wei Lin [00:27:19]: Okay.
Anastasios [00:27:20]: So there's this model and it crushed the benchmark. You know, it's just really a big gap. And what that's telling us is that it's not saturated yet. It's still measuring some signal. That was encouraging. The takeaway is that the benchmark is comparative. There's no absolute number. There's no maximum Elo. It's just, if you're better than the rest, then you win. I think that was actually quite helpful to us.
Swyx [00:27:46]: I think people were criticizing, I saw some of the academics criticizing it as not apples to apples. Right.
Like, because it can take more time to reason, it's basically doing some search, doing some chain of thought that, if you actually let the other models do that same thing, they might do better.
Wei Lin [00:28:03]: Absolutely.
Anastasios [00:28:04]: To be clear, none of the leaderboard currently is apples to apples because you have Gemini Flash, you have, you know, all sorts of tiny models like Llama 8B; 8B and 405B are not apples to apples.
Wei Lin [00:28:19]: Totally agree. They have different latencies.
Anastasios [00:28:21]: Different latencies.
Wei Lin [00:28:22]: Control for latency. Yeah.
Anastasios [00:28:24]: Latency control. That's another thing. We can do style control, but latency control. You know, things like this are important if you want to understand the trade-offs involved in using AI.
Swyx [00:28:34]: o1 is a developing story. We still haven't seen the full model yet, but it's definitely a very exciting new paradigm. I think one community controversy I just wanted to give you guys space to address is the collaboration between you and the large model labs. People have been suspicious, let's just say, about how they choose to A/B test on you. I'll state the argument and let you respond, which is basically they run like five anonymous models and basically argmax their Elo on LMSYS or chatbot arena, and they release the best one. Right? What has been your end of the controversy? How have you decided to clarify your policy going forward?
Wei Lin [00:29:15]: On a high level, I think our goal here is to build a fast eval for everyone, so that everyone in the community can see the leaderboard and understand and compare the models. More importantly, I think we want to build the best eval also for model builders, like all these frontier labs building models. They're also internally facing a challenge, which is how do they eval the model? That's the reason why we want to partner with all the frontier lab people, and to help them test.
That's one of the... We want to solve this technical challenge, which is eval. Yeah.
Anastasios [00:29:54]: I mean, ideally, it benefits everyone, right?
Wei Lin [00:29:56]: Yeah.
Anastasios [00:29:57]: And people also are interested in seeing the leading edge of the models. People in the community seem to like that. Oh, there's a new model up. Is this strawberry? People are excited. People are interested. Yeah. And then there's this question that you bring up of, is it actually causing harm?
Wei Lin [00:30:15]: Right?
Anastasios [00:30:16]: Is it causing harm to the benchmark that we are allowing this private testing to happen? Maybe stepping back, why do you have that instinct? The reason why you and others in the community have that instinct is because when you look at something like a benchmark, like ImageNet, a static benchmark, what happens is that if I give you a million different models that are all slightly different, and I pick the best one, there's something called selection bias that comes into play, which is that the performance of the winning model is overstated. This is also sometimes called the winner's curse. And that's because statistical fluctuations in the evaluation are driving which model gets selected as the top. So this selection bias can be a problem. Now there's a couple of things that make this benchmark slightly different. So first of all, the selection bias that you incur when you're only testing five models is normally empirically small.
Wei Lin [00:31:12]: And that's why we have these confidence intervals constructed.
Anastasios [00:31:16]: That's right. Yeah. Our confidence intervals are actually not multiplicity adjusted. One thing that we could do immediately tomorrow in order to address this concern is if a model provider is testing five models and they want to release one, and we're constructing the intervals at level one minus alpha, we can just construct the intervals instead at level one minus alpha divided by five.
That's called Bonferroni correction. What that'll tell you is that the final performance of the model, the interval that gets constructed, is actually formally correct. We don't do that right now, partially because we know from simulations that the amount of selection bias you incur with these five things is just not huge. It's not huge in comparison to the variability that you get from just regular human voters. So that's one thing. But then the second thing is the benchmark is live, right? So what ends up happening is it'll be a small magnitude, but even if you suffer from the winner's curse after testing these five models, what'll happen is that over time, because we're getting new data, it'll get adjusted down. So if there's any bias that gets introduced at that stage, in the long run, it actually doesn't matter. Because asymptotically, basically in the long run, there's way more fresh data than there is data that was used to compare these five models against these private models.
Swyx [00:32:35]: The announcement effect is only just the first phase and it has a long tail.
Anastasios [00:32:39]: Yeah, that's right. And it sort of automatically corrects itself for this selection adjustment.
Swyx [00:32:45]: Every month, I do a little chart of LMSYS Elo versus cost, just to track the price per dollar, the amount of, how much money do I have to pay for one incremental point in Elo? And so I actually observe an interesting stability in most of the Elo numbers, except for some of them. For example, GPT-4o August has fallen from 12.90
BUFFALO, NY - April 1, 2024 – A new research paper was published on the cover of Aging (listed by MEDLINE/PubMed as "Aging (Albany NY)" and "Aging-US" by Web of Science) Volume 16, Issue 6, entitled, “Altered brain morphology and functional connectivity in postmenopausal women: automatic segmentation of whole-brain and thalamic subnuclei and resting-state fMRI.” The transition to menopause is associated with various physiological changes, including alterations in brain structure and function. However, menopause-related structural and functional changes are poorly understood. In this new study, researchers Gwang-Won Kim, Kwangsung Park, Yun-Hyeon Kim, and Gwang-Woo Jeong from Chonnam National University not only compared the brain volume changes between premenopausal and postmenopausal women, but also evaluated the functional connectivity between the targeted brain regions associated with structural atrophy in postmenopausal women. “To the best of our knowledge, no comparative neuroimaging study on alterations in the brain volume and functional connectivity, especially focusing on the thalamic subnuclei in premenopausal vs. postmenopausal women has been reported.” Each of the 21 premenopausal and postmenopausal women underwent magnetic resonance imaging (MRI). T1-weighted MRI and resting-state functional MRI data were used to compare the brain volume and seed-based functional connectivity, respectively. In statistical analysis, multivariate analysis of variance, with age and whole brain volume as covariates, was used to evaluate surface areas and subcortical volumes between the two groups.
Postmenopausal women showed significantly smaller cortical surface area, especially in the left medial orbitofrontal cortex (mOFC), right superior temporal cortex, and right lateral orbitofrontal cortex, compared to premenopausal women (p < 0.05, Bonferroni-corrected). Significantly decreased functional connectivity between the left mOFC and the right thalamus was also observed (p < 0.005, Monte-Carlo corrected). Although postmenopausal women did not show volume atrophy in the right thalamus, the volume of the right pulvinar anterior, which is one of the distinguished thalamic subnuclei, was significantly decreased (p < 0.05, Bonferroni-corrected). “Postmenopausal women showed significantly lower left mOFC, right lOFC, and right STC surface areas, reduced right PuA volume, and decreased left mOFC-right thalamus functional connectivity compared to premenopausal women. If replicated in an independent sample, these findings will be helpful for understanding the effects of menopause on the altered brain volume and functional connectivity in postmenopausal women.”
DOI - https://doi.org/10.18632/aging.205662
Corresponding author - Gwang-Woo Jeong - gwjeong@jnu.ac.kr
About Aging-US
Aging publishes research papers in all fields of aging research, including but not limited to aging from yeast to mammals, cellular senescence, age-related diseases such as cancer and Alzheimer's disease and their prevention and treatment, anti-aging strategies and drug development, and especially the role of signal transduction pathways such as mTOR in aging and potential approaches to modulate these signaling pathways to extend lifespan. The journal aims to promote treatment of age-related diseases by slowing down aging, validation of anti-aging drugs by treating age-related diseases, and prevention of cancer by inhibiting aging. Cancer and COVID-19 are age-related diseases. Please visit our website at https://www.Aging-US.com. MEDIA@IMPACTJOURNALS.COM
Let's jump into statistics and understand how we can ensure our results are truly significant! Become a Paid Subscriber for access to the full Data Science Interview Prep Podcast library: https://podcasters.spotify.com/pod/show/data-science-interview/subscribe ----------------------------------------------------------- If you enjoy our podcast, please consider becoming a premium member on either Patreon ($5 donation) or Spotify ($2.99 donation). Your donation goes directly to supporting this channel and the human labor that goes into each of these episodes. For each episode, we do research and fact-check our content to make sure that you get the best information possible, even on cutting edge topics. Becoming a premium member also gives you access to our locked episodes, which include helpful content such as: - NLP - Deep Learning - Recurrent Neural Networks - Imbalanced Data - The Bias-Variance Tradeoff - Transformers in NLP - Self-Attention in NLP - Distributions - Statistics - A/B Testing - ROC-AUC .... and so much more!
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.02.27.530185v1?rss=1 Authors: Kowalczyk, O. S., Medina, S., Tsivaka, D., McMahon, S. B., Williams, S. C. R., Brooks, J. C. W., Lythgoe, D. J., Howard, M. A. Abstract: Resting fMRI studies have identified intrinsic spinal cord activity, which forms organised motor (ventral) and sensory (dorsal) resting-state networks. However, to facilitate the use of spinal fMRI in, for example, clinical studies, it is crucial to first assess the reliability of the method, particularly given the unique anatomical, physiological, and methodological challenges associated with acquiring the data. Here we demonstrate a novel implementation for acquiring BOLD-sensitive resting-state spinal fMRI, which was used to characterise functional connectivity relationships in the cervical cord and assess their test-retest reliability in 23 young healthy volunteers. Resting-state networks were estimated in two ways: (1) by extracting the mean timeseries from anatomically constrained seed masks and estimating voxelwise connectivity maps and (2) by calculating seed-to-seed correlations between extracted mean timeseries. Seed regions corresponded to the four grey matter horns (ventral/dorsal and left/right) of C5-C8 segmental levels. Test-retest reliability was assessed using the intraclass correlation coefficient (ICC) in the following ways: for each voxel in the cervical spine; each voxel within an activated cluster; the mean signal as a summary estimate within an activated cluster; and correlation strength in the seed-to-seed analysis. Spatial overlap of clusters derived from voxelwise analysis between sessions was examined using Dice coefficients. 
Following voxelwise analysis, we observed distinct unilateral dorsal and ventral organisation of cervical spinal resting-state networks that was largely confined in the rostro-caudal extent to each spinal segmental level, with more sparse connections observed between segments (Bonferroni corrected p < 0.003, threshold-free cluster enhancement with 5000 permutations). Additionally, strongest correlations were observed between within-segment ipsilateral dorsal-ventral connections, followed by within-segment dorso-dorsal and ventro-ventral connections. Test-retest reliability of these networks was mixed. Reliability was poor when assessed on a voxelwise level, with more promising indications of reliability when examining the average signal within clusters. Reliability of correlation strength between seeds was highly variable, with highest reliability achieved in ipsilateral dorsal-ventral and dorso-dorsal/ventro-ventral connectivity. However, the spatial overlap of networks between sessions was excellent. We demonstrate that while test-retest reliability of cervical spinal resting-state networks is mixed, their spatial extent is similar across sessions, suggesting that these networks are characterised by a consistent spatial representation over time. Copy rights belong to original authors. Visit the link for more info Podcast created by Paper Player, LLC
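As a rough illustration of the Dice coefficient mentioned in the abstract, the sketch below computes the spatial overlap between two sets of voxel coordinates, one per session. The function name, the toy coordinates, and the convention used when both masks are empty are assumptions for illustration only, not the authors' pipeline.

```python
def dice_coefficient(voxels_a, voxels_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two collections of voxels."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        # Assumed convention: two empty masks count as perfect overlap.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical (x, y, z) voxel coordinates of one cluster in two sessions.
session_1 = [(1, 2, 3), (1, 2, 4), (1, 3, 3), (2, 2, 3)]
session_2 = [(1, 2, 3), (1, 2, 4), (2, 2, 3), (3, 3, 3)]

overlap = dice_coefficient(session_1, session_2)
print(f"Dice overlap between sessions: {overlap:.2f}")
```

A value of 1 means the two clusters occupy exactly the same voxels, and 0 means they share none.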
In this week's episode Patrick and Greg talk about an alternative to familywise Type I error control, the false discovery rate, and how it offers increased power in that middle ground between no error control and the severe control of Bonferroni. Along the way they also mention: Leif Ericson, discovering Columbus, The Flintstones, brontosauri, dying grandmas, Coors Field homeruns, hitting Richard, more Calvinball, the power reaper, Thelma and Louise, making flights on time, intellectual judo, wet paper bags, and distribution of blame. Stay in contact with Quantitude! Twitter: @quantitudepod Web page: quantitudepod.org Merch: redbubble.com
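To make the contrast concrete, here is a minimal sketch comparing Bonferroni with the Benjamini-Hochberg step-up procedure, one common way to control the false discovery rate. The episode description does not say which FDR procedure is discussed, so that choice, the function names, and the p-values are assumptions for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up: find the largest rank k with
    p_(k) <= (k / m) * alpha, then reject the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            reject[idx] = True
    return reject


# Hypothetical p-values from five tests.
p_values = [0.001, 0.008, 0.012, 0.030, 0.200]

n_bonferroni = sum(bonferroni_reject(p_values))
n_bh = sum(benjamini_hochberg_reject(p_values))
print("Bonferroni rejections:", n_bonferroni)
print("Benjamini-Hochberg rejections:", n_bh)
```

With these made-up numbers, Bonferroni compares every p-value against 0.05 / 5 = 0.01, while Benjamini-Hochberg lets the threshold grow with rank, which is where the extra power comes from.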
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.02.522524v1?rss=1 Authors: Ganesan, S., Moffat, B., Van Dam, N., Lorenzetti, V., Zalesky, A. Abstract: Objectives: Mapping the neurobiology of meditation using 3 Tesla functional MRI (fMRI) has burgeoned recently. However, limitations in signal quality and neuroanatomical resolution have impacted reliability and precision of extant findings. Although ultra-high strength 7 Tesla MRI overcomes these limitations, investigation of meditation using 7 Tesla fMRI is still in its infancy. Methods: In this feasibility study, we scanned 10 individuals who were beginner meditators using 7 Tesla fMRI while they performed focused attention meditation and non-focused rest. We also measured and adjusted the fMRI signal for key physiological differences between meditation and rest. Finally, we explored the 2-week impact of the single fMRI meditation session on mindfulness, anxiety and focused attention attributes. Results: Group-level task fMRI analyses revealed significant reductions in activity during meditation relative to rest in Default-mode network hubs, i.e., antero-medial prefrontal and posterior cingulate cortices, precuneus, as well as visual and thalamic regions. These findings survived stringent statistical corrections for fluctuations in physiological responses which demonstrated significant differences (p < 0.05/n, Bonferroni controlled) between meditation and rest. Compared to baseline, State Mindfulness Scale (SMS) scores were significantly elevated (F = 8.16, p < 0.05/n, Bonferroni controlled) following the fMRI meditation session, and were closely maintained at 2-week follow up. Conclusions: This pilot study establishes the feasibility and utility of investigating focused attention meditation using ultra-high strength (7 Tesla) fMRI, by supporting widespread evidence that focused attention meditation attenuates Default-mode activity responsible for self-referential processing.
Future functional neuroimaging studies of meditation should control for physiological confounds and include behavioural assessments.
In this lecture, we review what we have learned about one-sample confidence intervals (i.e., how to use them as graphical versions of one-sample t-tests) for absolute performance estimation in order to motivate the problem of relative performance estimation. We introduce two-sample confidence intervals (i.e., confidence intervals on DIFFERENCES based on different two-sample t-tests) that are tested against a null hypothesis of 0. This means covering confidence interval half widths for the paired-difference t-test, the equal-variance (pooled) t-test, and Welch's unequal variance t-test. Each of these different experimental conditions sets up a different standard error of the mean formula and formula for degrees of freedom that are used to define the actual confidence interval half widths (centered on the difference in sample means in the pairwise comparison of systems). We then generalize to the case of more than 2 systems, particularly for "ranking and selection (R&S)." This lets us review the multiple-comparisons problem (and Bonferroni correction) and how post hoc tests (after an ANOVA) are more statistically powerful ways to do comparisons.
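As a hedged sketch of how the three designs lead to different standard errors and degrees of freedom, the functions below compute both quantities for the paired-difference, pooled, and Welch cases, and then form an interval centered on the difference in sample means. The function names and the data are hypothetical, and the critical value is taken from a t-table (2.306 for 8 degrees of freedom at the 95% level) rather than computed, so that only the standard library is needed.

```python
import math
from statistics import mean, variance


def paired_se_and_df(a, b):
    """Paired-difference design: treat the per-pair differences as one sample."""
    d = [x - y for x, y in zip(a, b)]
    return math.sqrt(variance(d) / len(d)), len(d) - 1


def pooled_se_and_df(a, b):
    """Equal-variance (pooled) design."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return math.sqrt(sp2 * (1 / na + 1 / nb)), na + nb - 2


def welch_se_and_df(a, b):
    """Unequal-variance design with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return math.sqrt(va + vb), df


def difference_interval(a, b, se, t_crit):
    """Interval centered on the difference in sample means; half width = t_crit * se."""
    diff = mean(a) - mean(b)
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical performance measurements for two systems.
system_a = [10, 12, 11, 13, 14]
system_b = [8, 9, 7, 10, 6]

se, df = welch_se_and_df(system_a, system_b)
T_CRIT_95_DF8 = 2.306  # looked up from a t-table for df = 8 (assumption: 95% level)
low, high = difference_interval(system_a, system_b, se, T_CRIT_95_DF8)
print(f"Welch: se={se:.3f}, df={df:.1f}, 95% CI=({low:.3f}, {high:.3f})")
```

If the resulting interval excludes 0, the difference between the two systems is significant at the chosen level; in general the Welch degrees of freedom are not an integer, so the critical value would come from a statistics library instead of a table.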
In this lecture, we review the fundamental tradeoffs in hypothesis testing and the concrete origins of the assumptions in both the t-test and Chi-square test. We also discuss parametric and non-parametric statistics (including exact and non-exact tests) and how non-parametric, exact statistics like the Kolmogorov–Smirnov test are derived. This culminates in a discussion of the multiple comparisons (MC) problem and the Bonferroni correction as well as alternative tests (such as a MANOVA or an ANOVA with post hoc test) that have more statistical power than the Bonferroni correction. We close with an introduction to performance inference from simulation, which we will continue discussing in the next 3 lectures.
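A small sketch of why the multiple comparisons problem matters: assuming independent tests (an idealization, and the function name is hypothetical), the chance of at least one false positive across m tests grows quickly, and testing each at alpha / m, as the Bonferroni correction does, pulls it back under alpha.

```python
def familywise_error_rate(per_test_alpha, m):
    """Probability of at least one Type I error across m independent tests."""
    return 1 - (1 - per_test_alpha) ** m


alpha = 0.05
m = 10

uncorrected = familywise_error_rate(alpha, m)
bonferroni = familywise_error_rate(alpha / m, m)

print(f"Uncorrected familywise error rate: {uncorrected:.4f}")
print(f"Bonferroni-corrected familywise error rate: {bonferroni:.4f}")
```

With ten tests at the 0.05 level the uncorrected rate is roughly 0.40, while the corrected rate stays just below 0.05; the price is a much stricter per-test threshold, which is the loss of power the lecture contrasts with post hoc tests.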
Investigating the anti-hypertensive effects of pumpkin seed oil
Marymount University and University of Guilan (Iran), September 29, 2021
In a study, researchers from Iran and the U.S. found that pumpkin seed oil can potentially treat hypertension in postmenopausal women. Their report was published in Complementary Therapies in Clinical Practice. Postmenopausal women are more likely to develop hypertension than men of the same age. In vivo studies reveal that pumpkin seed oil has anti-hypertensive activity. The team investigated the effects of pumpkin seed oil supplementation on vascular function and heart rate variability in postmenopausal women with elevated blood pressure. Participants were assigned to take either a pumpkin seed oil supplement or a placebo for the six-week study. Those in the experimental group took 3 grams of pumpkin seed oil every day. Brachial and central blood pressure, wave reflection (augmentation index, AIx), arterial stiffness (SI) and various HRV parameters were measured at baseline and at the end of the study. Those who took pumpkin seed oil had significantly lower AIx, brachial and systolic blood pressure after treatment. SI and HRV parameters remained unchanged for the treatment group and the placebo group at the end of the study. In sum, taking pumpkin seed oil may improve arterial hemodynamics in postmenopausal women.
Health benefits of evening classes revealed
Oxford University, September 20, 2021
Those with a taste for adult education classes have long known it, but now Oxford University scientists have confirmed that taking part in the weekly sessions can boost wellbeing, regardless of the subject studied. In partnership with the Workers' Educational Association (WEA), the largest voluntary sector provider of adult education in England and Scotland, a team from Oxford's department of experimental psychology studied attendees at seven separate day-time adult education classes. Their findings are published in a series of papers.
Each class took place over seven months and included a break in the middle. Attendees completed questionnaires before and after their class three times over the seven months: at the beginning of their courses, after 3 months, and at the end of the seven months. Participants were involved in one of three activities: singing, crafts or creative writing. Overall, attendees at all seven classes had improved mental and physical health and reported more satisfaction with their lives at the end of their courses. Dr Eiluned Pearce led the research. She said: 'The students reported benefits including increased self-confidence, a greater feeling of control over their lives and more willingness to take on new challenges. Some said the classes made them more motivated to be more active, despite the classes not specifically involving physical activity. 'Participants also said that the classes broadened their networks of friends and gave them an increased sense of belonging. We also found that the more someone felt part of their group, the more their health and wellbeing improved.' An intriguing finding was in the singing and creative writing classes. Building on the results of an earlier paper from the same study, which found that people in singing classes felt closer to their group more quickly than those in the other classes, the team looked at how relationships formed between individuals in the classes. Each person was asked to name those other people in the class whose name they could remember, whether or not they felt connected to each person they named, and whether they had talked to that person during class. Dr Pearce said: 'The results showed that those in the singing and creative writing groups built up relationships with other individuals more quickly than the crafters, and singers felt more connected to the class as a whole more quickly than both the other groups. 
'While this confirms our earlier finding that singing has an 'ice-breaker effect' compared to other activities, it shows that other activities may enable people to increase their social networks just as much, even if it takes them longer to feel connected to their group as a whole.' Co-author Dr Jacques Launay adds: 'While much of our previous work has demonstrated the importance of music, it is likely that the most socially bonding activities are always those that are personally chosen and enjoyed. This research adds to growing support for the relevance of creative activities in creating happy communities and improving health and well-being, with consequent benefits for public services and society.' Dr Pádraig Mac Carron, Dr Anna Machin and Professor Robin Dunbar were also involved in the research. Howard Croft, WEA Regional Education Manager, said: 'The findings reiterate the feedback that we have had from our students over the years: learning is a fantastic way to boost your self-esteem and confidence. Also of note is its therapeutic effect. For many students, creative courses are a means of finding a new outlet for expressing their feelings. This can be of immense help during times of personal difficulty or emotional upheaval, such as divorce or bereavement. Simply going to a course can offer much-needed respite. 'For others, learning can be an opportunity to reignite a former passion. This could be anything from a subject which you enjoyed at school to an area which you are interested in. Whatever your reason, there are so many benefits to be gained by signing up to a course.'
Want to live forever? Theoretically, you could, study says
Swiss Federal Institute of Technology, September 29, 2021
Humans can probably live to at least 130, and possibly well beyond, though the chances of reaching such super old age remain vanishingly small, according to new research.
The outer limit of the human lifespan has long been hotly debated, with recent studies making the case we could live up to 150 years, or arguing that there is no maximum theoretical age for humans. The new research, published Wednesday in the Royal Society Open Science journal, wades into the debate by analyzing new data on supercentenarians (people aged 110 or more) and semi-supercentenarians, aged 105 or more. While the risk of death generally increases throughout our lifetime, the researchers' analysis shows that risk eventually plateaus and remains constant at approximately 50-50. "Beyond age 110 one can think of living another year as being almost like flipping a fair coin," said Anthony Davison, a professor of statistics at the Swiss Federal Institute of Technology in Lausanne (EPFL), who led the research. "If it comes up heads, then you live to your next birthday, and if not, then you will die at some point within the next year," he told AFP. Based on the data available so far, it seems likely that humans can live until at least 130, but extrapolating from the findings "would imply that there is no limit to the human lifespan," the research concludes. The conclusions match similar statistical analyses done on datasets of the very elderly. "But this study strengthens those conclusions and makes them more precise because more data are now available," Davison said. The first dataset the team studied is newly released material from the International Database on Longevity, which covers more than 1,100 supercentenarians from 13 countries. The second is from Italy on every person who was at least 105 between January 2009 and December 2015.
'One in a million'
The work involves extrapolating from existing data, but Davison said that was a logical approach. "Any study of extreme old age, whether statistical or biological, will involve extrapolation," he said.
"We were able to show that if a limit below 130 years exists, we should have been able to detect it by now using the data now available," he added. Still, just because humans can theoretically reach 130 or beyond, doesn't mean we're likely to see it anytime soon. For a start, the analysis is based on people who have already achieved the relatively rare feat of making it to well over 100. And even at age 110, your chances of making it to 130 are "about one in a million... not impossible but very unlikely," said Davison. He thinks we could see people reaching 130 within the century, as more people make it to supercentenarian status, increasing the chances of one becoming that one in a million. "But in the absence of major medical and social advances, ages much over this are highly unlikely ever to be observed," he added. For now, the oldest person on record is Frenchwoman Jeanne Calment, who died in 1997 at the confirmed age of 122. Her true age was the subject of some controversy, with claims of a possible fraud, but in 2019 several experts said a review of the evidence confirmed her age. Other pretenders to the throne of oldest person ever have a long way to go. The oldest verified living person in the world is Japan's Kane Tanaka, a comparatively youthful 118. Psychological treatment shown to yield strong, lasting pain relief, alter brain networks University of Colorado, September 29, 2021 Rethinking what causes pain and how great of a threat it is can provide chronic pain patients with lasting relief and alter brain networks associated with pain processing, according to new University of Colorado Boulder-led research. The study, published Sept. 29 in JAMA Psychiatry, found that two-thirds of chronic back pain patients who underwent a four-week psychological treatment called Pain Reprocessing Therapy (PRT) were pain-free or nearly pain-free post-treatment. And most maintained relief for one year. 
The findings provide some of the strongest evidence yet that a psychological treatment can provide potent and durable relief for chronic pain, which afflicts one in five Americans. "For a long time we have thought that chronic pain is due primarily to problems in the body, and most treatments to date have targeted that," said lead author Yoni Ashar, who conducted the study while earning his Ph.D. in the Department of Psychology and Neuroscience at CU Boulder. "This treatment is based on the premise that the brain can generate pain in the absence of injury or after an injury has healed, and that people can unlearn that pain. Our study shows it works." Misfiring neural pathways Approximately 85% of people with chronic back pain have what is known as "primary pain," meaning tests are unable to identify a clear bodily source, such as tissue damage. Misfiring neural pathways are at least partially to blame: Different brain regions, including those associated with reward and fear, activate more during episodes of chronic pain than acute pain, studies show. And among chronic pain patients, certain neural networks are sensitized to overreact to even mild stimuli. If pain is a warning signal that something is wrong with the body, primary chronic pain, Ashar said, is "like a false alarm stuck in the 'on' position." PRT seeks to turn off the alarm. "The idea is that by thinking about the pain as safe rather than threatening, patients can alter the brain networks reinforcing the pain, and neutralize it," said Ashar, now a postdoctoral researcher at Weill Cornell Medicine. For the randomized controlled trial, Ashar and senior author Tor Wager, now the Diana L. Taylor Distinguished Professor in Neuroscience at Dartmouth College, recruited 151 men and women who had back pain for at least six months at an intensity of at least four on a scale of zero to 10. 
Those in the treatment group completed an assessment followed by eight one-hour sessions of PRT, a technique developed by Los Angeles-based pain psychologist Alan Gordon. The goal: To educate the patient about the role of the brain in generating chronic pain; to help them reappraise their pain as they engage in movements they'd been afraid to do; and to help them address emotions that may exacerbate their pain. Pain is not 'all in your head' "This isn't suggesting that your pain is not real or that it's 'all in your head'," stressed Wager, noting that changes to neural pathways in the brain can linger long after an injury is gone, reinforced by such associations. "What it means is that if the causes are in the brain, the solutions may be there, too." Before and after treatment, participants also underwent functional magnetic resonance imaging (fMRI) scans to measure how their brains reacted to a mild pain stimulus. After treatment, 66% of patients in the treatment group were pain-free or nearly pain-free compared to 20% of the placebo group and 10% of the no-treatment group. "The magnitude and durability of pain reductions we saw are very rarely observed in chronic pain treatment trials," Ashar said, noting that opioids have yielded only moderate and short-term relief in many trials. And when people in the PRT group were exposed to pain in the scanner post-treatment, brain regions associated with pain processing, including the anterior insula and anterior midcingulate, had quieted significantly. The authors stress that the treatment is not intended for "secondary pain," that is, pain rooted in acute injury or disease. The study focused specifically on PRT for chronic back pain, so future, larger studies are needed to determine if it would yield similar results for other types of chronic pain. Meanwhile, other similar brain-centered techniques are already emerging among physical therapists and other clinicians who treat pain. 
"This study suggests a fundamentally new way to think about both the causes of chronic back pain for many people and the tools that are available to treat that pain," said co-author Sona Dimidjian, professor of psychology and neuroscience and director of the Renee Crown Wellness Institute at CU Boulder. " It provides a potentially powerful option for people who want to live free or nearly free of pain." Citicoline (CDP-choline) and Memory Function in Healthy Older Adults: A Randomized, Double-Blind, Placebo-Controlled Clinical Trial Kyowa Hakko Bio (Japan), September 2021 Supplementation of citicoline (CDP-choline), a naturally occurring mononucleotide, has shown beneficial effects on memory function and behavior in populations with a wide range of impairments. However, few studies have investigated its effect in healthy older populations. Objective The objective of this study was to investigate the effects of citicoline, on memory in healthy elderly populations with age-associated memory impairment (AAMI). Methods A total of 100 healthy men and women aged between 50 and 85 y with AAMI participated in this randomized, double-blind, placebo-controlled trial. Participants were randomized to receive placebo (n = 51) or citicoline (n = 49; 500 mg/d) for 12 wk. Memory function was assessed at baseline and end of the intervention (12 wk) using computerized tests (Cambridge Brain Sciences, Ontario, Canada). Safety measurements included adverse events query, body weight, blood pressure, and hematology and metabolic panel. Intent-to-treat analysis was conducted using ANCOVA for the primary and secondary outcome variables with Bonferroni correction for multiple comparisons. Results A total of 99 out of 100 participants completed the study in its entirety. 
After the 12-wk intervention, participants supplemented with citicoline showed significantly greater improvements in secondary outcomes of episodic memory (assessed by the Paired Associate test), compared with those on placebo (mean: 0.15 vs. 0.06, respectively, P = 0.0025). Composite memory (secondary outcome), calculated using the scores of 4 memory tests, also significantly improved to a greater extent following citicoline supplementation (mean: 3.78) compared with placebo (mean: 0.72, P = 0.0052). Conclusions Dietary supplementation of citicoline for 12 wk improved overall memory performance, especially episodic memory, in healthy older males and females with AAMI. The findings suggest that regular consumption of citicoline may be safe and potentially beneficial against memory loss due to aging. Sleep may strengthen long-term memories in the immune system University of Tuebingen (Germany) September 29, 2021 More than a century ago, scientists demonstrated that sleep supports the retention of memories of facts and events. Later studies have shown that slow-wave sleep, often referred to as deep sleep, is important for transforming fragile, recently formed memories into stable, long-term memories. Now, in an Opinion article published in Trends in Neurosciences, part of a special issue on Neuroimmunology, researchers propose that deep sleep may also strengthen immunological memories of previously encountered pathogens. "While it has been known for a long time that sleep supports long-term memory formation in the psychological domain, the idea that long-term memory formation is a function of sleep effective in all organismic systems is in our view entirely new," says senior author Jan Born of the University of Tuebingen. "We consider our approach toward a unifying concept of biological long-term memory formation, in which sleep plays a critical role, a new development in sleep research and memory research." 
The immune system "remembers" an encounter with a bacteria or virus by collecting fragments from the bug to create memory T cells, which last for months or years and help the body recognize a previous infection and quickly respond. These memory T cells appear to abstract "gist information" about the pathogens, as only T cells that store information about the tiniest fragments ever elicit a response. The selection of gist information allows memory T cells to detect new pathogens that are similar, but not identical, to previously encountered bacteria or viruses. Studies in humans have shown that long-term increases in memory T cells are associated with deep slow-wave sleep on the nights after vaccination. Taken together, the findings support the view that slow-wave sleep contributes to the formation of long-term memories of abstract, generalized information, which leads to adaptive behavioral and immunological responses. The obvious implication is that sleep deprivation could put your body at risk. "If we didn't sleep, then the immune system might focus on the wrong parts of the pathogen," Born says. "For example, many viruses can easily mutate some parts of their proteins to escape from immune responses. If too few antigen-recognizing cells [the cells that present the fragments to T cells] are available, then they might all be needed to fight off the pathogen. In addition to this, there is evidence that the hormones released during sleep benefit the crosstalk between antigen-presenting and antigen-recognizing cells, and some of these important hormones could be lacking without sleep." Born says that future research should examine what information is selected during sleep for storage in long-term memory, and how this selection is achieved. In the end, this research could have important clinical implications. 
"In order to design effective vaccines against HIV, malaria, and tuberculosis, which are based on immunological memory, the correct memory model must be available," Born says. "It is our hope that by comparing the concepts of neuronal and immunological memory, a model of immunological memory can be developed which integrates the available experimental data and serves as a helpful basis for vaccine development." Standardized astragalus extract for attenuation of the immunosuppression induced by strenuous physical exercise: randomized controlled trial University of Physical Sciences (Poland), September 3, 2021 This paper aimed to verify how supplementing rowers' diets with Astragalus Membranaceus Root (AMR) modulated their immune system response to maximal physical exertion. Methods The double-blind study included 18 members of the Polish Rowing Team assigned to the supplemented group (n = 10), and the placebo group (n = 8). The participants performed a 2000 m test on a rowing ergometer at the beginning and at the end of the six-week intensive training camp during which the supplemented group received 500 mg of AMR. Blood samples were obtained prior to, 1 min after completing, and 24 h after the exertion test. The levels of interleukin 2 (IL2), interleukin 4 (IL4), interleukin 10 (IL10), interferon γ (IFN-γ), and lactic acid were determined. Subpopulations of T regulatory lymphocytes [CD4+/CD25+/CD127−] (Treg), cytotoxic lymphocytes [CD8+/TCRαβ+] (CTL), natural killer cells [CD3−/CD16+/CD56+] (NK), and TCRδγ-positive cells (Tδγ) were determined with flow cytometry. Results After the camp, the initial NK and Treg levels were sustained at baseline, while Tδγ counts increased relative to the levels in the placebo group. 
In the supplemented subgroup, a decrease in IL2 level in reaction to maximal exertion clearly deepened while the change in IL-2/IL-10 level induced by the recovery after this exertion clearly increased, relative to the changes in the placebo group. Conclusions AMR restored the immunological balance in strenuously trained athletes through a stabilization of NK and Treg cells with a positive trend in Tδγ towards Th1 response during restitution by cytokine IL2 modulation.
Reporter Degrees Of Freedom I. A sample of Thursday’s talk at Yale These are four headlines describing the same study, Milkie, Nomaguchi and Denny (2015). The study found that of twenty or so outcomes, only three of them – all measuring delinquent behavior among teenagers – show significant effect from time spent with parents (and this result remains after Bonferroni correction). So Vox has a great argument for their headline. The National Post has an okay argument for their headline even though it’s kind of cherry-picked. The Washington Post just sort of reads between the lines and figures that if it’s not quantity of time that helps kids, it must be quality. And FOX also reads between the lines and figures that if moms spending time with their kids has no effect, the argument from opportunity costs suggests mothers are spending too much time with their kids. None of them are completely outright lying. And indeed, most of the articles eventually explain what I just said, halfway down the article, in one or two short sentences that most readers will skim over. But the rest of the article uses the study to support whatever the news source involved wants it to support, and so people will come up with four diametrically opposed conclusions from this one study depending on which source they read. II. Here’s a study that I wasn’t able to include in the presentation because it just came out recently. As per the Rice University press release: Overweight Men Just As Likely As Overweight Women To Face Discrimination.
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.04.366856v1?rss=1 Authors: Liew, S.-L., Zavaliangos-Petropulu, A., Schweighofer, N., Jahanshad, N., Lang, C. E., Lohse, K. R., Banaj, N., Barisano, G., Baugh, L. A., Bhattacharya, A. K., Bigjahan, B., Borich, M. R., Boyd, L. A., Brodtmann, A., Buetefisch, C. M., Byblow, W. D., Cassidy, J. M., Ciullo, V., Conforto, A. B., Craddock, R. C., Dula, A. N., Egorova, N., Feng, W., Fercho, K. A., Gregory, C. M., Hanlon, C. A., Hayward, K. S., Holguin, J. A., Hordacre, B., Hwang, D. H., Kautz, S. A., Khlif, M. S., Kim, B., Kuceyeski, A., Liu, J., Lin, D., Lotze, M., Kim, H., MacIntosh, B. J., Margetis, J. L., Mohamed, F. B., Nord Abstract: Background: Up to two-thirds of stroke survivors experience persistent sensorimotor impairments. Recovery relies on the integrity of spared brain areas to compensate for damaged tissue. Subcortical regions play critical roles in the control and regulation of sensorimotor circuits. Identifying relationships between sensorimotor behavior and non-lesioned subcortical volumes will reveal new neural targets for improving outcomes. Methods: We pooled high-resolution T1-weighted MRI brain scans and behavioral data in 828 individuals with unilateral stroke from 28 cohorts worldwide (age: median 63, interquartile range 19 years; 516 males, 312 females). Cross-sectional analyses using linear mixed-effects models related post-stroke sensorimotor behavior to non-lesioned subcortical volumes. We analyzed subacute (≤90 days) and chronic (≥180 days) stroke; sub-analyses in chronic stroke were performed on class of sensorimotor deficit (impairment, activity limitations) and side of lesioned hemisphere, with exploratory analyses in early stroke (≤21 days) and across time (Bonferroni-corrected, p
In this lecture, we continue to discuss hypothesis testing, introducing parametric, non-parametric, exact, and non-exact tests and reviewing the assumptions behind many popular parametric tests (like the t-test and ANOVA) and non-exact tests (Chi-square test). We then move to discuss the multiple-comparisons problem ("fishing") and Bonferroni correction. We end with an introduction to the estimation of absolute performance in simulated systems.
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.22.215939v1?rss=1 Authors: van Dongen, J., Hagenbeek, F. A., Suderman, M., Roetman, P. J., Sugden, K., Chiocchetti, A. G., Ismail, K., Mulder, R. H., Hafferty, J., Adams, M. J., Walker, R., Morris, S., Lahti, J., Kupers, L. K., Escaramis, G., Alemany, S., Bonder, M. J., Meijer, M., Ip, H. F., Jansen, R., Baselmans, B. M. L., Parmar, P., Lowry, E., Streit, F., Sirignano, L., Send, T., Frank, J., Jylhava, J., Wang, Y., Mishra, P. P., Colins, O. F., Corcoran, D., Poulton, R., Mill, J., Hannon, E., Arseneault, L., Korhonen, T., Vuoksimaa, E., Felix, J., Bakermans-Kranenburg, M., Campbell, A., Czamara, D., Binder, E., Corpel Abstract: DNA methylation profiles of aggressive behavior may capture lifetime cumulative effects of genetic, stochastic, and environmental influences associated with aggression. Here, we report the first large meta-analysis of epigenome-wide association studies (EWAS) of aggressive behavior (N=15,324 participants). In peripheral blood samples of 14,434 participants from 18 cohorts with mean ages ranging from 7 to 68 years, 13 methylation sites were significantly associated with aggression (alpha=1.2x10^-7; Bonferroni correction). In cord blood samples of 2,425 children from five cohorts with aggression assessed at mean ages ranging from 4 to 7 years, 83% of these sites showed the same direction of association with childhood aggression (r=0.74, p=0.006) but no epigenome-wide significant sites were found. Top-sites (48 at a false discovery rate of 5% in the peripheral blood meta-analysis or in a combined meta-analysis of peripheral blood and cord blood) have been associated with chemical exposures, smoking, cognition, metabolic traits, and genetic variation (mQTLs). Three genes whose expression levels were associated with top-sites were previously linked to schizophrenia and general risk tolerance. At six CpGs, DNA methylation variation in blood mirrors variation in the brain. 
On average 44% (range=3-82%) of the aggression-methylation association was explained by current and former smoking and BMI. These findings point at loci that are sensitive to chemical exposures with potential implications for neuronal functions. We hope these results to be a starting point for studies leading to applications as peripheral biomarkers and to reveal causal relationships with aggression and related traits. Copy rights belong to original authors. Visit the link for more info
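As a rough illustration of how a Bonferroni-corrected threshold like the one quoted in this abstract relates to the number of tests, the sketch below back-calculates the implied test count. The family-wise alpha of 0.05 is an assumption (the abstract does not state it), and the variable names are hypothetical.

```python
# Assumption: the study targeted a conventional family-wise alpha of 0.05.
family_alpha = 0.05

# Per-test threshold quoted in the abstract.
per_test_alpha = 1.2e-7

# Bonferroni sets per_test_alpha = family_alpha / number_of_tests,
# so the implied number of tests is the ratio of the two.
implied_tests = family_alpha / per_test_alpha
print(f"Implied number of tests: about {implied_tests:,.0f}")
```

If the assumed family-wise alpha is right, the threshold corresponds to a few hundred thousand tested methylation sites.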
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.19.188789v1?rss=1 Authors: Sin-Chan, P., Gosalia, N., Gao, C., Van Hout, C. V., Ye, B., Marcketta, A., Li, A. H., O'Dushlaine, C., Li, D., Overton, J. D., Reid, J. D., Baras, A., Genetics Center, R., Carey, D. J., Ledbetter, D. H., Rader, D., Ritchie, M. D., Damrauer, S. M., Milman, S., Barzilai, N., Glass, D. J., Economides, A. N., Shuldiner, A. R. Abstract: Aging is characterized by degeneration in cellular and organismal functions leading to increased disease susceptibility and death. Although our understanding of aging biology in model systems has increased dramatically, large-scale sequencing studies to understand human aging are now just beginning. We applied exome sequencing and association analyses (ExWAS) to identify age-related variants on 58,470 participants of the DiscovEHR cohort. Linear Mixed Model regression analyses of age at last encounter revealed variants in genes known to be linked with clonal hematopoiesis of indeterminate potential, which are associated with myelodysplastic syndromes, as top signals in our analysis, suggestive of age-related somatic mutation accumulation in hematopoietic cells despite patients lacking clinical diagnoses. In addition to APOE, we identified rare DISP2 rs183775254 (p = 7.40x10^-10) and ZYG11A rs74227999 (p = 2.50x10^-08) variants that were negatively associated with age in both sexes combined and in females, respectively, which were replicated with directional consistency in two independent cohorts. Epigenetic mapping showed these variants are located within cell-type-specific enhancers, suggestive of important transcriptional regulatory functions. To discover variants associated with extreme age, we performed exome-sequencing on persons of Ashkenazi Jewish descent ascertained for extensive lifespans. In case-control analyses, 525 Ashkenazi Jewish cases (Males ≥ 92 years, Females ≥ 95 years) were compared to 482 controls. 
Our results showed that variants in APOE (rs429358, rs6857) and TMTC2 (rs7976168) passed the Bonferroni-adjusted p-value threshold, along with several nominally associated population-specific variants. Collectively, our Age-ExWAS, the largest performed to date, confirmed and identified previously unreported candidate variants associated with human age.
In statistics, the Bonferroni correction is one of several methods used to counteract the multiple comparisons problem. Statistical hypothesis testing is based on rejecting the null hypothesis if the probability of the observed data under the null hypothesis is low. If multiple hypotheses are tested, the chance of observing a rare event increases, and with it the chance of incorrectly rejecting a null hypothesis (that is, of committing a Type I error).
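A short sketch of the effect described above: how the chance of at least one false rejection grows with the number of tests, and how dividing the threshold by the number of tests keeps it in check. It assumes the tests are independent and that every null hypothesis is true; the function names are illustrative.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false rejection among m independent
    tests, each run at level alpha, when all null hypotheses are true."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / m


alpha = 0.05
for m in (1, 5, 20, 100):
    uncorrected = familywise_error_rate(alpha, m)
    corrected = familywise_error_rate(bonferroni_threshold(alpha, m), m)
    print(f"m={m:>3}  uncorrected={uncorrected:.3f}  corrected={corrected:.3f}")
```

With the correction applied, the family-wise rate stays at or just below the nominal level no matter how many tests are run.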
Effect of Dexamethasone in Hospitalized Patients with COVID-19: Preliminary Report (RECOVERY) trial is a randomized, controlled, open-label, comparison of dexamethasone 6 mg given once daily for up to ten days vs. usual care alone. 2104 patients randomly allocated to receive dexamethasone were compared with 4321 patients concurrently allocated to usual care. The primary outcome was 28-day mortality (21.6%) patients allocated dexamethasone vs (24.6%) patients in usual care died within 28 days (age-adjusted rate ratio [RR] 0.83; 95% confidence interval [CI] 0.74 to 0.92; P
By Ernesto Mislej. Pseudo-real stories for understanding some principles of data mining. Statistics, research, and order. The podcast "Can everything be explained with data? Less is more and the Bonferroni Principle" by Ernesto Mislej was first published on Wetoker.
Discussion of error rates and statistical power in hypothesis testing, along with a deeper investigation of how the Student's t-test and the Chi-square test work and why they require the assumptions they do. An example paired-difference t-test with power analysis is done, and then the lecture closes with a discussion of the multiple comparisons problem (applied to simulation problems) and tools, such as Bonferroni correction, that can be used to prevent "statistical fishing." Many Halloween-inspired examples are given during the lecture, and the relationship between ghost busting and statistics is exploited.
Last year, a study came out showing that beef jerky and other cured meats could trigger mania in bipolar disorder (paper, popular article). It was a pretty big deal, getting coverage in the national press and affecting the advice psychiatrists (including me) gave their patients. The study was pretty simple: psychiatrists at a mental hospital in Baltimore asked new patients if they had ever eaten any of a variety of foods. After getting a few hundred responses, they compared answers to controls and across diagnostic categories. The only hit that came up was that people in the hospital for bipolar mania were more likely to have said they ate dry cured meat like beef jerky (odds ratio 3.49). This survived various statistical comparisons and made some biological sense. The methodology was a little bit weird, because they only asked if they’d ever had the food, not if they’d eaten a lot of it just before becoming sick. If you had beef jerky once when you were fourteen, and ended up in the psych hospital when you were fifty-five, that counted. Either they were hoping that “ever had beef jerky at all” was a good proxy for “eats a lot of beef jerky right now”, or that past consumption produced lasting changes in gut bacteria. In any case, they found a strong effect even after adjusting for confounders and doing the necessary Bonferroni corrections, so it’s hard to argue with success.
Controlling the false discovery rate (FDR) is a methodology that can be useful when struggling with the problem of multiple comparisons. In any experiment, if the experimenter checks more than one dependent variable, then they are making multiple comparisons. Naturally, if you make enough comparisons, you will eventually find some correlation by chance. Classically, people applied the Bonferroni correction. In essence, this procedure dictates that you should lower your p-value threshold (raise your standard of evidence) by dividing it by the number of comparisons you're making. While effective, this methodology is strict about preventing false positives (Type I errors). You aren't likely to find evidence for a hypothesis that is actually false using Bonferroni. However, your eagerness to avoid Type I errors may have introduced some Type II errors. There could be some hypotheses that are actually true, which you did not notice. This episode covers an alternative known as false discovery rates. The essence of this method is to make more specific adjustments to your expectation of what p-value is sufficient evidence.
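To make the contrast concrete, here is a minimal sketch comparing Bonferroni with one common FDR method, the Benjamini-Hochberg step-up procedure. The episode description does not say which FDR procedure it covers, so that choice is an assumption, and the p-values below are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up: find the largest rank k such that the
    k-th smallest p-value is at most (k / m) * alpha, then reject the
    k hypotheses with the smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


# Hypothetical p-values from five comparisons (illustrative only).
p_values = [0.001, 0.008, 0.012, 0.041, 0.20]
bonferroni = bonferroni_reject(p_values)
bh = benjamini_hochberg_reject(p_values)
print("Bonferroni rejections:", sum(bonferroni))
print("Benjamini-Hochberg rejections:", sum(bh))
```

On this made-up data the FDR procedure rejects one more hypothesis than Bonferroni, which is the trade-off described above: fewer missed effects in exchange for tolerating a controlled fraction of false discoveries.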
TACL 2017 paper by Rotem Dror, Gili Baumer, Marina Bogomolov, and Roi Reichart. Roi comes on to talk to us about how to make better statistical comparisons between two methods when there are multiple datasets in the comparison. This paper shows that there are more powerful methods available than the occasionally-used Bonferroni correction, and using the better methods can let you make stronger, statistically-valid conclusions. We talk a bit also about how the assumptions you make about your data can affect the statistical tests that you perform, and briefly mention other issues in replicability / reproducibility, like training variance. https://www.semanticscholar.org/paper/Replicability-Analysis-for-Natural-Language-Proces-Dror-Baumer/fa5129ab6fd85f8ff590f9cc8a39139e9dfa8aa2
Dr Matthew Atkinson, consultant in anaesthesia and critical care from East Lancashire Hospitals NHS Trust, discusses his article 'Statistical Analysis: Sample Size and Power Estimations', published in the May edition of BJA Education. Topics discussed include error types, the Bonferroni correction and post-hoc power analysis.
Today's episode begins by asking how many left-handed employees we should expect to be at a company before anyone should claim left-handedness discrimination. If not lefties, let's consider eye color, hair color, favorite ska band, most recent grocery store used, and any number of characteristics that could be studied to look for deviations from the norm in a company. When multiple comparisons are to be made simultaneously, one must account for this, and a common method for doing so is with the Bonferroni correction. It is not, however, a surefire procedure, and this episode wraps up with a bit of skepticism about it.
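The left-handedness question can be sketched as a one-sided binomial test. All of the numbers here are assumptions chosen for illustration: a 10% base rate of left-handedness, a company of 50 people with a single left-handed employee, and five characteristics examined in total.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Illustrative assumptions, not figures from the episode.
n_employees = 50
base_rate = 0.10          # assumed share of left-handed people
observed_lefties = 1
n_characteristics = 5     # handedness, eye color, hair color, ...
alpha = 0.05

# Chance of seeing this few left-handed employees if hiring were unbiased.
p_value = binomial_cdf(observed_lefties, n_employees, base_rate)
bonferroni_alpha = alpha / n_characteristics

print(f"Expected lefties: {n_employees * base_rate:.0f}")
print(f"p-value for <= {observed_lefties} lefties: {p_value:.4f}")
print(f"Significant at alpha={alpha}? {p_value <= alpha}")
print(f"Significant at Bonferroni alpha={bonferroni_alpha}? {p_value <= bonferroni_alpha}")
```

With these assumed numbers, the shortage of left-handed employees looks significant on its own but no longer clears the bar once the threshold is adjusted for the five characteristics examined.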
Faculty of Veterinary Medicine - Digital Theses of the LMU - Part 07/07
The aim of the present study was to investigate the influence of 17β-estradiol and/or progesterone on neurological function in vivo, as well as on neuronal morphology and on molecular aspects of cerebral inflammation and excitotoxicity in vitro, over a period of 4 days after extracorporeal circulation (ECC) of the kind typical in cardiac surgery, with 45 min of deep hypothermic circulatory arrest (DHCA), in a rat model. 100 female Sprague-Dawley rats were randomized into 5 experimental groups (n = 20): one group was sham-castrated (group “Intact”); the 4 other groups were castrated and, depending on group assignment, given a subcutaneous hormone implant: placebo (group “Placebo”), 17β-estradiol (group “Estrogen”), progesterone (group “Progesterone”), or 17β-estradiol and progesterone (group “Progesterone + Estrogen”, “P+E”). 4 weeks after castration/sham castration and, where applicable, hormone/placebo supplementation, the animals were anesthetized with isoflurane (2.0 to 2.5 vol% in 40% O2), intubated, catheterized, and cooled by means of ECC over 30 min to a rectal body temperature of 16 °C, and DHCA was carried out for 45 min. The animals were then rewarmed by ECC over 40 min to a rectal body temperature of 35.5 °C, ECC was terminated, the animals were ventilated for a further hour, and a telemetry transmitter probe was implanted intraperitoneally before anesthesia was ended and the animals were extubated at the onset of spontaneous breathing. Postoperative telemetric monitoring covered body temperature and heart rate, the latter coupled to an alarm system so that animals could be euthanized in time if necessary (heart rate < 150 beats/min). All animals were examined neurologically on preoperative day 1 (day –1) and on postoperative days 1, 2, 3, and 4, provided they survived. 
On postoperative day 4, the animals were exsanguinated under deep isoflurane anesthesia to obtain blood and serum, and their brains were removed and deep-frozen (–80 °C). In hematoxylin-eosin (HE)-stained brain sections, the number of intact and damaged neurons was determined by light microscopy in the striatum (bregma –0.3 mm), cortex, and hippocampus (bregma –3.3 mm). The TNFα concentration in the animals' serum was determined by enzyme-linked immunosorbent assay. The cerebral expression of estrogen receptor α (ERα), estrogen receptor β (ERβ), excitatory amino acid transporter 1 (EAAT1), glutamate transporter 1b (GLT-1b), CXC motif chemokine receptor 2 (CXCR2), interleukin-8 (IL-8), and inducible nitric oxide synthase (iNOS) was determined by Western blot. The analyses were performed on a selection, made according to specific criteria, of 5 surviving and 5 euthanized animals per experimental group. Statistical analysis was carried out using general linear models with post hoc one-way analysis of variance and Bonferroni t-tests, or Kruskal-Wallis with post hoc Mann-Whitney U tests (p < 0.05). The results show no group-specific differences in the intra- and postoperative physiological parameters. Serum TNFα concentrations, brain morphology, and neurological function were unaffected by the experimental treatment. When the expression of receptor, transporter, and inflammatory proteins in brain tissue was assessed by Western blot, group-specific differences emerged for some proteins: the prematurely euthanized animals of the “Placebo” group showed lower ERβ expression than the surviving animals of that group and than the likewise prematurely euthanized “Estrogen” and “P+E” groups. 
Furthermore, the surviving animals of the “P+E” group expressed significantly more EAAT1 in the brain than the prematurely euthanized animals of that group and than the surviving animals of the “Placebo” group. In addition, physiological hormonal fluctuations over the estrous cycle, under the joint action of 17β-estradiol and progesterone (group “Intact”), can be both advantageous (reduced iNOS expression) and disadvantageous (increased IL-8 expression) for postoperative cerebral inflammation compared with substitution of these two hormones (group “P+E”). In summary, of the many parameters and factors examined, the specific proteins typical of cerebral inflammation and excitotoxicity in particular offer sufficient potential for further studies with a similar animal-experimental design. In the long term, additional animal models such as aging, diabetes, atherosclerosis, and hypertension could allow a closer approximation to the human clinical situation.
Background: Since the pig is one of the most important livestock animals worldwide, mapping loci that are associated with economically important traits and/or traits that influence animal welfare is extremely relevant for efficient future pig breeding. Therefore, the purpose of this study was a genome-wide mapping of quantitative trait loci (QTL) associated with nine body composition and bone mineral traits: absolute (Fat, Lean) and percentage (FatPC, LeanPC) fat and lean mass, live weight (Weight), soft tissue X-ray attenuation coefficient (R), absolute (BMC) and percentage (BMCPC) bone mineral content and bone mineral density (BMD). Methods: Data on the nine traits investigated were obtained by Dual-energy X-ray absorptiometry for 551 pigs that were between 160 and 200 days old. In addition, all pigs were genotyped using Illumina's PorcineSNP60 Genotyping BeadChip. Based on these data, a genome-wide combined linkage and linkage disequilibrium analysis was conducted. Thus, we used 44 611 sliding windows that each consisted of 20 adjacent single nucleotide polymorphisms (SNPs). For the middle of each sliding window a variance component analysis was carried out using ASReml. The underlying mixed linear model included random QTL and polygenic effects, with fixed effects of sex, housing, season and age. Results: Using a Bonferroni-corrected genome-wide significance threshold of P < 0.001, significant peaks were identified for all traits except BMCPC. Overall, we identified 72 QTL on 16 chromosomes, of which 24 were significantly associated with one trait only and the remaining with more than one trait. For example, a QTL on chromosome 2 included the highest peak across the genome for four traits (Fat, FatPC, LeanPC and R). The nearby gene, ZNF608, is known to be associated with body mass index in humans and involved in starvation in Drosophila, which makes it an extremely good candidate gene for this QTL. 
Conclusions: Our QTL mapping approach identified 72 QTL, some of which confirmed results of previous studies in pigs. However, we also detected significant associations that have not been published before and were able to identify a number of new and promising candidate genes, such as ZNF608.
Medizinische Fakultät - Digitale Hochschulschriften der LMU - Teil 16/19
Summary: Aim: The purpose of this pilot study was to develop a health intervention for healthy older people who want to maintain or improve their health and performance, and to compare this intervention with conventional health programmes in terms of effect, effect size and sustainability. Health was to be conceptualized and evaluated holistically, in the sense of the bio-psycho-social model. Study design: An exploratory study with a three-group design and three measurement time points over six months was carried out. A total of N = 69 healthy persons aged 50 to 65 years took part. The training load was approx. 4 hours per week for each group, structured differently in each. Participants in the first group (SP1: N = 26) completed purely physical fitness training with strength, endurance, flexibility and coordination exercises for 4 hours/week. In the second group (SP2: N = 20), physical training was reduced to 2 hours/week and supplemented by a mindfulness-based intervention (MBSR according to Kabat-Zinn), likewise 2 hours/week. The third group (SP3: N = 23) completed holistic training consisting of 2 hours/week of physical training and 2 hours/week of mental, emotional and motivational-volitional training together with mindfulness training. The first eight weeks consisted of an intensive, structured seminar phase with a total of 26 hours of group intervention plus additional prescribed training sessions. In the following four months, participants could schedule their training sessions at their own discretion.
To assess changes in physical health, blood pressure, heart rate, heart rate variability, waist circumference, and endurance and strength tests were recorded and, in selected participants, an fMRI examination of the brain was performed. Psychosocial health parameters were collected with questionnaire instruments: the Fragebogen zum allgemeinen habituellen Wohlbefinden (FAHW, questionnaire on general habitual well-being), the Sense of Coherence Scale (SOC), the Perceived Stress Questionnaire (PSQ), the Überdruss-Skala (tedium scale), and the Fragebogen zur Erfassung von Ressourcen und Selbstmanagementfähigkeiten (FERUS, questionnaire on resources and self-management skills; specifically self-efficacy and coping). Methods: Because of the small sample size, significance testing between the subsamples was nonparametric, using the Friedman test (Chi2); to estimate relevance, effect sizes based on Cohen's d were also calculated. Changes over time were analysed with repeated-measures analyses of variance (SPSS); significance was determined via the Greenhouse-Geisser F value, and post hoc comparisons were Bonferroni-corrected. Within-subject effect sizes were calculated as partial eta squared. A bivariate correlation analysis was then performed, with two-sided Pearson significance testing. Finally, the relationships found were analysed with linear regression. Results: The hypothesis that exercise and physical training have a positive effect on health was supported in this study for waist circumference (SP1 p=.01 F=7; SP2 p=.01 F=6.1; SP3 p=.00 F=12.6) and strength performance (SP1 p=.00 F=32; SP2 p=.00 F=14.2; SP3 p=.00 F=21.4) in all three groups, whereas a significant relationship with endurance (SP3 p=.00 F=12.7) was confirmed only for the holistic group.
For blood pressure, heart rate variability and heart rate, gradual improvements were observed, but these did not reach significance. Evaluation of the questionnaire data showed a significant improvement in all three groups on the FAHW (total score (SP1 p=.03 F=4.2; SP2 p=.00 F=15.2; SP3 p=.00 F=22.1) and physical well-being (SP1 p=.01 F=6; SP2 p=.00 F=12.1; SP3 p=.00 F=15.6)). For the psychological (SP2 p=.00 F=16.2; SP3 p=.00 F=21.7) and social (SP2 p=.05 F=3.9; SP3 p=.01 F=6.8) health aspects of the FAHW, as well as for the SOC (SP2 p=.00 F=9.5; SP3 p=.00 F=7), PSQ (SP2 p=.00 F=15.4; SP3 p=.00 F=24.3), FERUS coping (SP2 p=.00 F=8.4; SP3 p=.00 F=8.1) and the tedium questionnaire (SP2 p=.00 F=21.8; SP3 p=.00 F=23.3), significant improvements were achieved with combined physical and mindfulness-based training (SP2) and with holistic training (SP3), whereas “only” physical training (SP1) produced merely a trend toward improvement. Self-efficacy (SP3 p=.00 F=12.6) was significantly increased only in the holistic group. The fMRI measurement was carried out only in SP2 and SP3; both groups showed a significantly expanded metabolic activation of the precuneus (SP2 p
Tierärztliche Fakultät - Digitale Hochschulschriften der LMU - Teil 06/07
A case-control investigation was undertaken to determine management and health related factors associated with pleurisy in slaughter pigs in England and Wales. The British Pig Executive Pig Health Scheme database of abattoir pathology was used to identify 121 case (≥10% prevalence of pleurisy on 3 or more assessment dates in the preceding 24 months) and 121 control units (≤5% prevalence of pleurisy on 3 or more assessment dates in the preceding 24 months). Farm data were collected by postal questionnaire. Data from respondents (70 cases and 51 controls) were analysed using simple logistic regression models with Bonferroni corrections. Limited multivariate analyses were also performed to check the robustness of the overall conclusions. Management factors associated with increased odds of pleurisy included ‘no all-in all-out pig flow’ (OR 9.3, 95% confidence interval [CI]: 3.3–29), ‘rearing of pigs with an age difference of >1 month in the same airspace’ (OR 6.5 [2.8–17]) and ‘repeated mixing’ (OR 2.2 [1.4–3.8]) and ‘moving’ (OR 2.2 [1.5–3.4]) of pigs during the rearing phase. Those associated with decreased odds of pleurisy included ‘filling wean-to-finish or grower-to-finish systems with piglets from ≤3 sources’ (OR 0.18 [0.07–0.41]) compared to farrow-to-finish systems, ‘cleaning and disinfecting’ of grower (ORs 0.28 [0.13–0.61] and 0.29 [0.13–0.61]) and finisher (ORs 0.24 [0.11–0.51] and 0.2 [0.09–0.44]) accommodation between groups, and ‘extended down time’ of grower and finisher accommodation (OR 0.84 [0.75–0.93] and 0.86 [0.77–0.94] respectively for each additional day of downtime). This study demonstrated the value of national-level abattoir pathology data collection systems for case control analyses and generated guidance for on-farm interventions to help reduce the prevalence of pleurisy in slaughter pigs.
Genome-wide association studies identified the autophagy gene IRGM to be strongly associated with Crohn's disease (CD), but its impact in ulcerative colitis (UC), its phenotypic effects and potential epistatic interactions with other IBD susceptibility genes are less clear; we therefore analyzed these in this study. Genomic DNA from 2060 individuals including 817 CD patients, 283 UC patients, and 961 healthy, unrelated controls (all of Caucasian origin) was analyzed for six IRGM single nucleotide polymorphisms (SNPs) (rs13371189, rs10065172 = p.Leu105Leu, rs4958847, rs1000113, rs11747270, rs931058). In all patients, a detailed genotype-phenotype analysis and testing for epistasis with the three major CD susceptibility genes NOD2, IL23R and ATG16L1 were performed. Our analysis revealed an association of the IRGM SNPs rs13371189 (p = 0.02, OR 1.31 [95% CI 1.05-1.65]), rs10065172 = p.Leu105Leu (p = 0.016, OR 1.33 [95% CI 1.06-1.66]) and rs1000113 (p = 0.047, OR 1.27 [95% CI 1.01-1.61]) with CD susceptibility. There was linkage disequilibrium between these three IRGM SNPs. In UC, several IRGM haplotypes were weakly associated with UC susceptibility (p
Metabolomic profiling and the integration of whole-genome genetic association data have proven to be powerful tools to comprehensively explore gene regulatory networks and to investigate the effects of genetic variation at the molecular level. Serum metabolite concentrations allow a direct readout of biological processes, and association of specific metabolomic signatures with complex diseases such as Alzheimer's disease and cardiovascular and metabolic disorders has been shown. There are well-known correlations between sex and the incidence, prevalence, age of onset, symptoms, and severity of a disease, as well as the reaction to drugs. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by gender. This study investigated sex-specific differences of serum metabolite concentrations and their underlying genetic determination. For discovery and replication we used more than 3,300 independent individuals from KORA F3 and F4 with metabolite measurements of 131 metabolites, including amino acids, phosphatidylcholines, sphingomyelins, acylcarnitines, and C6-sugars. A linear regression approach revealed significant concentration differences between males and females for 102 out of 131 metabolites (p-values
Main contributors to adverse outcomes in severely burned pediatric patients are profound and complex metabolic changes in response to the initial injury. It is currently unknown how long these conditions persist beyond the acute phase post-injury. The aim of the present study was to examine the persistence of abnormalities of various clinical parameters commonly utilized to assess the degree of hypermetabolic and inflammatory alterations in severely burned children for up to three years post-burn, in order to identify patient-specific therapeutic needs and interventions. Nine hundred seventy-seven severely burned pediatric patients with burns over 30% of the total body surface admitted to our institution between 1998 and 2008 were enrolled in this study and compared to a cohort of non-burned, non-injured children. Demographics and clinical outcomes, hypermetabolism, body composition, organ function, inflammatory and acute phase responses were determined at admission and at subsequent regular intervals for up to 36 months post-burn. Statistical analysis was performed using one-way ANOVA and Student's t-test with Bonferroni correction where appropriate, with significance accepted at p
Osteopontin represents a multifunctional molecule playing a pivotal role in chronic inflammatory and autoimmune diseases. Its expression is increased in inflammatory bowel disease (IBD). The aim of our study was to analyze the association of osteopontin (OPN/SPP1) gene variants in a large cohort of IBD patients. Genomic DNA from 2819 Caucasian individuals (n = 841 patients with Crohn's disease (CD), n = 473 patients with ulcerative colitis (UC), and n = 1505 healthy unrelated controls) was analyzed for nine OPN SNPs (rs2728127, rs2853744, rs11730582, rs11739060, rs28357094, rs4754 = p.Asp80Asp, rs1126616 = p.Ala236Ala, rs1126772 and rs9138). Considering the important role of osteopontin in Th17-mediated diseases, we performed analysis for epistasis with IBD-associated IL23R variants and analyzed serum levels of the Th17 cytokine IL-22. For four OPN SNPs (rs4754, rs1126616, rs1126772 and rs9138), we observed significantly different distributions between male and female CD patients. rs4754 was protective in male CD patients (p = 0.0004, OR = 0.69). None of the other investigated OPN SNPs was associated with CD or UC susceptibility. However, several OPN haplotypes showed significant associations with CD susceptibility. The strongest association was found for a haplotype consisting of the 8 OPN SNPs rs2728127-rs2853744-rs11730582-rs11439060-rs28357094-rs112661-rs1126772-rs9138 (omnibus p-value = 2.07×10⁻⁸). Overall, the mean IL-22 secretion in the combined group of OPN minor allele carriers with CD was significantly lower than that of CD patients with OPN wildtype alleles (p = 3.66×10⁻⁵). There was evidence for weak epistasis between the OPN SNP rs28357094 with the IL23R SNP rs10489629 (p = 4.18×10⁻²) and between OPN SNP rs1126616 and IL23R SNP rs2201841 (p = 4.18×10⁻²) but none of these associations remained significant after Bonferroni correction. 
Our study identified OPN haplotypes as modifiers of CD susceptibility, while the combined effects of certain OPN variants may modulate IL-22 secretion.
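To illustrate why a nominal p-value such as 4.18×10⁻² can lose significance under Bonferroni correction, here is a minimal Python sketch that multiplies each p-value by the number of tests and caps the result at 1. The number of tests used below is a hypothetical value chosen for illustration; the abstract does not say how many epistasis tests were performed.

```python
def bonferroni_adjust(p_values, m=None):
    """Return Bonferroni-adjusted p-values, min(1, p * m).

    m is the number of tests in the family; it defaults to len(p_values).
    """
    if m is None:
        m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# The two nominal epistasis p-values quoted in the abstract above.
nominal = [4.18e-2, 4.18e-2]

# ASSUMPTION: hypothetical family size; the real number of tests is not reported.
HYPOTHETICAL_M = 10

adjusted = bonferroni_adjust(nominal, m=HYPOTHETICAL_M)
for p_raw, p_adj in zip(nominal, adjusted):
    print(f"nominal p = {p_raw:.4f}, Bonferroni-adjusted p = {p_adj:.4f}")
```

Under this assumed family size both adjusted values exceed 0.05. With these particular p-values, any family of two or more tests gives the same qualitative outcome.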
Apart from its regulatory function in lipid and glucose metabolism, peroxisome proliferator-activated receptor (PPAR)γ has an impact on the regulation of inflammation and bone metabolism. The aim of the study was to investigate the association of five polymorphisms (rs10865710, rs2067819, rs3892175, rs1801282, rs3856806) within the PPARG gene with chronic periodontitis. The study population comprised 402 periodontitis patients and 793 healthy individuals. Genotyping of the PPARG gene polymorphisms was performed by PCR and melting curve analysis. Comparison of the frequency distribution of genotypes between individuals with periodontal disease and healthy controls for the polymorphism rs3856806 showed a P-value of 0.04 but failed to reach significance after correction for multiple testing (P 0.90). A 3-site analysis (rs2067819-rs1801282-rs3856860) revealed five haplotypes with a frequency of ≥1% among cases and controls. Following adjustment for age, gender and smoking, none of the haplotypes was significantly different between periodontitis patients and healthy controls after Bonferroni correction. This study could not show a significant association between PPARG gene variants and chronic periodontitis.
Tierärztliche Fakultät - Digitale Hochschulschriften der LMU - Teil 04/07
Up to now, the occult phase of cardiomyopathy in Doberman Pinschers, which has a cumulative prevalence of 63 % in Europe (WESS et al., 2009b), can only be diagnosed by specialized and cost-intensive methods, such as the 24-h ECG (Holter) and the echocardiogram. Electrocardiographic and echocardiographic changes usually appear at an age at which the dogs have already been used for breeding. Considering the genetic health of the Doberman Pinscher breed, it is of great importance to diagnose the genetically determined cardiomyopathy before it can be passed on to the offspring. The aim of this study, “Analysis of NTpro-BNP in Dilated Cardiomyopathy of the Doberman Pinscher”, was to evaluate NTpro-BNP in the Doberman Pinscher population, in healthy animals as well as in various stages of cardiomyopathy, to define reference values and to validate the use of NTpro-BNP as a diagnostic tool. To this end, 250 healthy dogs and 108 dogs with cardiomyopathy were studied between 2004 and 2008, in a total of 480 examinations. NTpro-BNP measurements were performed in plasma samples, using the ELISA VETSIGN™ Canine CardioSCREEN NTpro-BNP, Guildhay Ltd., UK. NTpro-BNP concentrations increased in correlation with the severity of disease. There was a statistically significant difference between the healthy control group (mean 315.7 pmol/l) and the occult and decompensated groups as well as the whole dog population with cardiomyopathy. To differentiate healthy dogs from dogs with cardiomyopathy, a cut-off value of 413.8 pmol/l was established that provided an area under the receiver operating characteristic curve of 0.820, with a sensitivity and specificity of 72.4 % and 80.2 %, respectively. There was no significant influence of age, weight or sex on the NTpro-BNP concentration. Sensitivity reached 90.0 % for predicting abnormal echocardiographic changes. Interestingly, there was a statistically significant increase of NTpro-BNP in the occult group with exclusively VPCs even before echocardiographic changes were present.
The “still normal” group, consisting of examinations of dogs that appeared to be healthy without any measurable signs at that time, but which later on developed cardiomyopathy, is highly interesting. Although the mean NTpro-BNP level (499.7 pmol/l) was higher than the mean NTpro-BNP concentration in the occult group with exclusively VPCs, no statistically significant difference was found in comparison to the healthy control group using Bonferroni's correction. The small number of only 17 dogs in this group might be a reason for the lack of statistical significance. One limitation of this study is the fact that not all dogs, especially those in the healthy control group, could be followed up to the end of their lives. Therefore, it is possible that “still healthy” dogs with genetically determined cardiomyopathy are included in the healthy control group. As a consequence, the reference value could be falsely high. Long-term studies will be needed to validate the reference value in a group of definitively genetically healthy animals. The tendency toward higher NTpro-BNP values in the still normal group should also be studied in a larger population in order to evaluate the potential of NTpro-BNP as an early marker of cardiomyopathy in Doberman Pinschers. NTpro-BNP cannot replace the Holter ECG and echocardiographic measurements in the diagnosis of occult cardiomyopathy at present. But if the time-consuming and cost-intensive methods that require a high degree of specialization are not available, NTpro-BNP levels could indicate the necessity of referring the patient to a cardiologist. Because of the high sensitivity of 90 % for predicting abnormal echocardiographic changes, it might be possible that the measurement of NTpro-BNP could eventually replace the echocardiographic examination, but not the Holter ECG.
Recent studies demonstrated an association of STAT4 variants with systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA), indicating that multiple autoimmune diseases share common susceptibility genes. We therefore investigated the influence of STAT4 variants on the susceptibility and phenotype of inflammatory bowel diseases (IBD) in a large patient and control cohort. Genomic DNA from 2704 individuals of Caucasian origin including 857 patients with Crohn's disease (CD), 464 patients with ulcerative colitis (UC), and 1383 healthy, unrelated controls was analyzed for seven SNPs in the STAT4 gene (rs11889341, rs7574865, rs7568275, rs8179673, rs10181656, rs7582694, rs10174238). In addition, a detailed genotype-phenotype analysis was performed. Our analysis revealed an association of the STAT4 SNP rs7574865 with overall decreased susceptibility to CD (p = 0.047, OR 0.86 [95% CI 0.74-0.99]). However, compared to CD patients carrying the wild type genotype, the STAT4 SNP rs7574865 was significantly associated with early CD onset (p = 0.021) and colonic CD (p = 0.008; OR = 4.60, 95% CI 1.63-12.96). For two other STAT4 variants, there was a trend towards protection against CD susceptibility (rs7568275, p = 0.058, OR 0.86 [95% CI 0.74-1.00]; rs10174238, p = 0.057, OR 0.86 [95% CI 0.75-1.00]). In contrast, we did not observe any association with UC susceptibility. Evidence for weak gene-gene interaction of STAT4 with the IL23R SNP rs11209026 was lost after Bonferroni correction. Our results identified the STAT4 SNP rs7574865 as a disease-modifying gene variant in colonic CD. However, in contrast to SLE and RA, the effect of rs7574865 on CD susceptibility is only weak.
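As a rough plausibility check on figures like “OR 0.86 [95% CI 0.74-0.99], p = 0.047”, a reported odds ratio and confidence interval can be converted back into an approximate standard error, z statistic and two-sided p-value. The Python sketch below assumes a Wald-type interval that is symmetric on the log-odds scale; that is an assumption, since the abstract does not state how the interval was computed. Because the reported values are rounded to two decimals, the result can only approximate the published p-value.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE, z and two-sided p from an OR and its 95% CI.

    Assumes the CI is exp(log(OR) +/- z_crit * SE), i.e. a Wald interval.
    """
    log_or = math.log(odds_ratio)
    # The CI width on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Values quoted in the abstract for rs7574865 and CD susceptibility.
se, z, p = p_from_or_ci(0.86, 0.74, 0.99)
print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, two-sided p = {p:.3f}")
```

The recovered p-value lands in the same range as the reported one. An exact match is not expected given the rounding of the inputs.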
The adipocyte-derived protein adiponectin is highly heritable and inversely associated with risk of type 2 diabetes mellitus (T2D) and coronary heart disease (CHD). We meta-analyzed 3 genome-wide association studies for circulating adiponectin levels (n = 8,531) and sought validation of the lead single nucleotide polymorphisms (SNPs) in 5 additional cohorts (n = 6,202). Five SNPs were genome-wide significant in their relationship with adiponectin (P
Background: The chemokine CXCL13 is known to dictate homing and motility of B cells in lymphoid tissue and has been implicated in the formation of ectopic lymphoid tissue in chronic inflammation. Whether it influences B cell trafficking during acute infection is largely unclear. In previous studies, we showed that (I) CXCL13 levels are markedly increased in the B cell-rich cerebrospinal fluid (CSF) of patients with acute Lyme neuroborreliosis (LNB), and (II) CXCL13 is released by monocytes upon recognition of borrelial outer surface proteins by Toll-like receptor 2. Here, we assessed the role of CXCL13 - in comparison to other chemokines - in the recruitment of B cells to the CSF of patients with acute LNB. Methods: Measurement of chemokines was done by ELISA. B cells were isolated from whole blood using magnetic cell separation (MACS). For migration experiments, a modified Boyden chamber assay was used and the migrated B cells were further analysed by FACS. The migration was inhibited either by preincubation of the CSF samples with neutralizing antibodies, heating to 60 °C, removal of proteins >3 kDa, or by pre-treatment of the B cells with pertussis toxin. The principal statistical tests used were one-way analysis of variance and Bonferroni test (chemokine measurements) as well as paired Student's t-test (migration experiments). Results: Measurements of chemokine levels revealed an increase in three of the four known major B cell chemoattractants, CXCL13, CCL19 and CXCL12, in LNB CSF. The CXCL13 CSF:serum ratio, as a measure of the chemotactic gradient, was substantially higher than that of CCL19 and CXCL12. Moreover, the chemotactic activity of LNB CSF was reduced by up to 56% after preincubation with a neutralizing CXCL13 antibody, while combined preincubation with antibodies against CXCL13, CCL19, and CXCL12 did not lead to further reduction.
Since treatment with pertussis toxin, heating to 60 °C, and removal of proteins >3 kDa abrogated the chemotactic activity, further, as yet unidentified chemokines seem to be involved in B cell recruitment to LNB CSF. Conclusion: Taken together, our study suggests a key role of CXCL13 in B cell migration to sites of infection, as shown here for the CSF of LNB patients.
The outcome of Genome-Wide Association Studies (GWAS) has challenged the field of blood pressure (BP) genetics as previous candidate genes have not been among the top loci in these scans. We used Affymetrix 500K genotyping data of the KORA S3 cohort (n = 1,644; Southern Germany) to address (i) SNP coverage in 160 BP candidate genes; (ii) the evidence for associations with BP traits in genome-wide and replication data, and haplotype analysis. In total, 160 gene regions (genic region +/- 10 kb) covered 2,411 SNPs across 11.4 Mb. Marker densities in genes varied from 0 (n = 11) to 0.6 SNPs/kb. On average 52.5% of the HAPMAP SNPs per gene were captured. No evidence for association with BP was obtained for 1,449 tested SNPs. Considerable associations (P50% of HAPMAP SNPs were tagged. In general, genes with higher marker density (>0.2 SNPs/kb) had a better chance of reaching close-to-significant associations. Although none of the detected P-values remained significant after Bonferroni correction (P
These are some verbal notes from a recent meeting I attended. There are some pictures, but it's mostly an audio thing. One thing I forgot to note was that in the Mirror Neuron presentation, an experiment was discussed that focused on sarcasm and the ability of an ASD child to "get it". "A balloon, oh great." Said by 2 different people, it means two entirely different things depending upon the tone. One person is excited about the balloon and the other is expressing disappointment after the balloon has popped. The ASD kid doesn't pick up on the meaning unless s/he's told to pay attention to the tone prior to hearing the phrase. Only then does the ASD kid "get it" and the appropriate spots light up on the MRI. Running time - a 22 minute beast that I failed to get under 10 minutes for youtube consumption. It's pretty rough, but Bonferroni and Venn have been taking up much of my non-sleeping time lately. Enjoy.
Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
This paper aims at constructing stepwise test procedures based on the Bonferroni-Holm principle for a multi-way ANOVA. For the two-way ANOVA in particular, it is shown that the procedures keep the multiple level alpha. These theoretical results are supplemented by a simulation study that compares the multiple procedures with respect to two power concepts and determines which of the introduced procedures performs best.
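For readers unfamiliar with the Bonferroni-Holm principle, a generic step-down version on a plain list of p-values can be sketched in Python as follows. This is not the paper's ANOVA-specific procedure, only the underlying principle; the example p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: return a reject flag for each hypothesis.

    The smallest p-value is compared with alpha/m, the next with alpha/(m-1),
    and so on; testing stops at the first non-rejection.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is retained as well
    return reject


def bonferroni_reject(p_values, alpha=0.05):
    """Single-step Bonferroni: compare every p-value with alpha/m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical p-values, chosen so that the two procedures disagree.
example = [0.010, 0.040, 0.020, 0.005]
print("Holm:      ", holm_reject(example))
print("Bonferroni:", bonferroni_reject(example))
```

On this example Holm rejects all four hypotheses while single-step Bonferroni rejects only two. Holm can only reject as many or more, because its thresholds are never smaller than alpha/m.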
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.04.23.057679v1?rss=1 Authors: Naganuma, R., Yabe, I., Takeuchi, M., Morishita, K., Nakane, S., Takase, R., Takahashi-Iwata, I., Matsushima, M., Otsuki, M., Shiraishi, H., Sasaki, H. Abstract: Studies on evoked responses in Parkinson's disease (PD) may be useful for elucidating the etiology and quantitative evaluation of PD. However, in previous studies, the association between evoked responses and detailed motor symptoms or cognitive functions has not been clear. This study investigated the characteristics of the visual (VEF), auditory (AEF), and somatosensory (SEF) evoked magnetic fields in patients with Parkinson's disease (PD), and the correlations between evoked fields and the patient's clinical characteristics, motor symptoms, and cognitive functions. Twenty patients with PD and 10 healthy controls (HCs) were recruited as participants. We recorded VEF, AEF, and SEF, collected clinical characteristics, performed physical examinations, and administered 10 cognitive tests. We investigated differences in the latencies of the evoked fields between patients with PD and HCs. We also evaluated the correlation of the latencies with motor symptoms and cognitive functioning. There were significant differences between the two groups in 6 of the cognitive tests, all of which suggested mild cognitive impairment in patients with PD. The latencies of the VEF N75m, P100m, N145m, AEF P50m, P100m, and SEF P60m components were greater in the patients with PD than in the HCs. The latencies mainly correlated with medication and motor symptoms, less so with cognitive tests, with some elements of the correlations remaining significant after Bonferroni correction. In conclusion, the latencies of the VEF, AEF, and SEF were greater in PD patients than in HCs and were mainly correlated with medication and motor symptoms rather than cognitive functioning. 
Findings from this study suggest that evoked fields may reflect basal ganglia functioning and are candidates for assessing motor symptoms or the therapeutic effects of medication in patients with PD. Copy rights belong to original authors. Visit the link for more info