If we want to make progress toward AGI, we need a clear definition of intelligence—and a way to measure it. In this episode, Hugo talks with Greg Kamradt, President of the ARC Prize Foundation, about ARC-AGI: a benchmark built on Francois Chollet's definition of intelligence as “the efficiency at which you learn new things.” Unlike most evals that focus on memorization or task completion, ARC is designed to measure generalization—and expose where today's top models fall short. They discuss:
You think you know your customers like the back of your hand… but are you sure you're not relying purely on assumptions? In this episode, we welcome Émilie Chollet, a specialist in personas and customer behavior analysis, to help you run a persona survey that is useful, clear, and free of missteps. With her, we break down the key steps for interviewing your customers effectively, understanding their real needs, and refining your marketing strategy.
PROGRAM:
Why persona surveys shouldn't be reserved for big companies
When to run a customer study (and when there's no need to redo everything)
How to ask the right questions (without influencing the answers)
The classic mistakes to avoid so you don't bias your results
Qualitative interview or questionnaire: how to choose?
Our step-by-step method for collecting and analyzing the data without drowning in it
An ultra-concrete episode to stop guessing and finally make marketing decisions based on reliable data.
ABOUT ÉMILIE CHOLLET
LinkedIn
In this fascinating episode, we dive deep into the race towards true AI intelligence, AGI benchmarks, test-time adaptation, and program synthesis with star AI researcher (and philosopher) Francois Chollet, creator of Keras and the ARC AGI benchmark, and Mike Knoop, co-founder of Zapier and now co-founder with Francois of both the ARC Prize and the research lab Ndea. With the launch of ARC Prize 2025 and ARC-AGI 2, they explain why existing LLMs fall short on true intelligence tests, how new models like O3 mark a step change in capabilities, and what it will really take to reach AGI.
We cover everything from the technical evolution of ARC 1 to ARC 2, the shift toward test-time reasoning, and the role of program synthesis as a foundation for more general intelligence. The conversation also explores the philosophical underpinnings of intelligence, the structure of the ARC Prize, and the motivation behind launching Ndea, a new AGI research lab that aims to build a "factory for rapid scientific advancement." Whether you're deep in the AI research trenches or just fascinated by where this is all headed, this episode offers clarity and inspiration.
Ndea
Website - https://ndea.com
X/Twitter - https://x.com/ndea
ARC Prize
Website - https://arcprize.org
X/Twitter - https://x.com/arcprize
François Chollet
LinkedIn - https://www.linkedin.com/in/fchollet
X/Twitter - https://x.com/fchollet
Mike Knoop
X/Twitter - https://x.com/mikeknoop
FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
(00:00) Intro
(01:05) Introduction to ARC Prize 2025 and ARC-AGI 2
(02:07) What is ARC and how it differs from other AI benchmarks
(02:54) Why current models struggle with fluid intelligence
(03:52) Shift from static LLMs to test-time adaptation
(04:19) What ARC measures vs. traditional benchmarks
(07:52) Limitations of brute-force scaling in LLMs
(13:31) Defining intelligence: adaptation and efficiency
(16:19) How O3 achieved a massive leap in ARC performance
(20:35) Speculation on O3's architecture and test-time search
(22:48) Program synthesis: what it is and why it matters
(28:28) Combining LLMs with search and synthesis techniques
(34:57) The ARC Prize structure: efficiency track, private vs. public
(42:03) Open source as a requirement for progress
(44:59) What's new in ARC-AGI 2 and human benchmark testing
(48:14) Capabilities ARC-AGI 2 is designed to test
(49:21) When will ARC-AGI 2 be saturated? AGI timelines
(52:25) Founding of NDEA and why now
(54:19) Vision beyond AGI: a factory for scientific advancement
(56:40) What NDEA is building and why it's different from LLM labs
(58:32) Hiring and remote-first culture at NDEA
(59:52) Closing thoughts and the future of AI research
OpenAI brings revolutionary images to ChatGPT with perfect text rendering and character consistency. Google surprises everyone with Gemini 2.5 Pro, which dominates every AI benchmark and offers a one-million-token context. François Chollet challenges AI once again with the "impossible" ARC-AGI-2 test, on which humans still win. And this week's main topic: AI and work. AI is making its way into every profession. Robotization first hit only the people standing at assembly lines, but the pressure is now felt in every office. Yet we have a choice. Do we let the machines take over everything that makes us human? It is no longer a theoretical question, and today we dive deep into it. How are you preparing for the future of work with AI?
Curious about the webinar? Read more at aireport.email/webinar
If you would like a talk on AI by Wietse or Alexander, that can be arranged. Email us at lezing@aireport.email
Want to stay up to date with the latest AI news and receive tips & tools twice a week to get the most out of AI (and join the webinar)? Subscribe to our newsletter at aireport.email
Want to start using AI in your company today? Go to deptagency.com/aireport
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.aireport.email/subscribe
We are joined by Francois Chollet and Mike Knoop to launch the new version of the ARC prize! In version 2, the challenges have been calibrated with humans such that at least 2 humans could solve each task in a reasonable amount of time, but also adversarially selected so that frontier reasoning models can't solve them. The best LLMs today get negligible performance on this challenge. https://arcprize.org/
SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***
TRANSCRIPT:
https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0
TOC:
1. ARC v2 Core Design & Objectives
[00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture
[00:03:16] 1.2 Test-Time Optimization and AGI Assessment
[00:06:24] 1.3 Human-AI Capability Analysis
[00:13:02] 1.4 OpenAI o3 Initial Performance Results
2. ARC Technical Evolution
[00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements
[00:21:12] 2.2 Human Validation Methodology
[00:26:05] 2.3 Task Design and Gaming Prevention
[00:29:11] 2.4 Intelligence Measurement Framework
3. O3 Performance & Future Challenges
[00:38:50] 3.1 O3 Comprehensive Performance Analysis
[00:43:40] 3.2 System Limitations and Failure Modes
[00:49:30] 3.3 Program Synthesis Applications
[00:53:00] 3.4 Future Development Roadmap
REFS:
[00:00:15] On the Measure of Intelligence, François Chollet
https://arxiv.org/abs/1911.01547
[00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop
https://arcprize.org/
[00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team
https://arcprize.org/blog/oai-o3-pub-breakthrough
[00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al.
https://arxiv.org/abs/2201.11903
[00:21:45] ARC-v2 benchmark tasks, Mike Knoop
https://arcprize.org/blog/introducing-arc-agi-public-leaderboard
[00:26:05] ARC Prize 2024: Technical Report, Francois Chollet et al.
https://arxiv.org/html/2412.04604v2
[00:32:45] ARC Prize 2024 Technical Report, Francois Chollet, Mike Knoop, Gregory Kamradt
https://arxiv.org/abs/2412.04604
[00:48:55] The Bitter Lesson, Rich Sutton
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
[00:53:30] Decoding strategies in neural text generation, Sina Zarrieß
https://www.mdpi.com/2078-2489/12/9/355/pdf
Mohamed Osman joins to discuss MindsAI's highest scoring entry to the ARC challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and the flexibility of the network. (A minimal sketch of the augmentation-and-voting idea follows these show notes.)
SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***
TRANSCRIPT + REFS:
https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0
Mohamed Osman (Tufa Labs) https://x.com/MohamedOsmanML
Jack Cole (Tufa Labs) https://x.com/MindsAI_Jack
How and why deep learning for ARC paper:
https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf
TOC:
1. Abstract Reasoning Foundations
[00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview
[00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning
[00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture
[00:20:26] 1.4 Technical Implementation with Long T5 Model
2. ARC Solution Architectures
[00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions
[00:27:54] 2.2 Model Generalization and Function Generation Challenges
[00:32:53] 2.3 Input Representation and VLM Limitations
[00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration
[00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches
3. Advanced Systems Integration
[00:43:00] 3.1 DreamCoder Evolution and LLM Integration
[00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs
[00:54:15] 3.3 ARC v2 Development and Performance Scaling
[00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations
[01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution
REFS:
[00:01:32] Original ARC challenge paper, François Chollet
https://arxiv.org/abs/1911.01547
[00:06:55] DreamCoder, Kevin Ellis et al.
https://arxiv.org/abs/2006.08381
[00:12:50] Deep Learning with Python, François Chollet
https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438
[00:13:35] Influence of pretraining data for reasoning, Laura Ruis
https://arxiv.org/abs/2411.12580
[00:17:50] Latent Program Networks, Clement Bonnet
https://arxiv.org/html/2411.08706v1
[00:20:50] T5, Colin Raffel et al.
https://arxiv.org/abs/1910.10683
[00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al.
https://arxiv.org/abs/2411.02272
[00:34:15] Six finger problem, Chen et al.
https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf
[00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI
https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
[00:40:10] ARC Prize 2024 Technical Report, François Chollet et al.
https://arxiv.org/html/2412.04604v2
[00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis
https://arxiv.org/html/2503.15540
[00:54:25] Abstraction and Reasoning Corpus, François Chollet
https://github.com/fchollet/ARC-AGI
[00:57:10] O3 breakthrough on ARC-AGI, OpenAI
https://arcprize.org/
[00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell
https://arxiv.org/abs/2305.07141
[01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Yang, Zhilin and Dai, Zihang and Salakhutdinov, Ruslan and Cohen, William W.
http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf
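The ensemble voting mechanism mentioned above is not spelled out in these notes; what follows is a model-agnostic sketch, under our own assumptions, of one common ARC ensembling pattern: predict on several symmetry-transformed views of the input grid, map each prediction back to the original frame, and keep the most frequent candidate. The predict function here is a stand-in stub for any trained (and possibly test-time fine-tuned) model, not MindsAI's actual system.

    from collections import Counter

    def transpose(g):
        return [list(r) for r in zip(*g)]

    def flip_h(g):
        return [row[::-1] for row in g]

    # (forward transform, inverse transform) pairs for a few grid symmetries.
    VIEWS = [
        (lambda g: g, lambda g: g),              # identity
        (transpose, transpose),                  # self-inverse
        (flip_h, flip_h),                        # self-inverse
        (lambda g: flip_h(transpose(g)),         # composed view...
         lambda g: transpose(flip_h(g))),        # ...and its inverse
    ]

    def predict(grid):
        # Stand-in for a trained, possibly test-time fine-tuned model.
        return grid

    def vote(grid):
        candidates = []
        for fwd, inv in VIEWS:
            out = predict(fwd(grid))                        # predict on the view
            candidates.append(tuple(map(tuple, inv(out))))  # back to original frame
        best, _ = Counter(candidates).most_common(1)[0]     # majority vote
        return [list(row) for row in best]

    print(vote([[1, 2], [3, 4]]))  # -> [[1, 2], [3, 4]] with the identity stub

With a real model, disagreements between views are common, which is exactly what the vote is meant to smooth over.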
fWotD Episode 2861: Leroy Chollet
Welcome to Featured Wiki of the Day, your daily dose of knowledge from Wikipedia's finest articles. The featured article for Wednesday, 5 March 2025 is Leroy Chollet.
Leroy Patrick Chollet (March 5, 1925 – June 10, 1998) was an American professional basketball player. Chollet and his brothers attended Holy Cross School in New Orleans and excelled in sports. After a year in the United States Navy, Chollet enrolled at Loyola University New Orleans and led the Loyola Wolf Pack to their first NAIA men's basketball championship in 1945. Louisiana schools were segregated at the time. Chollet had an African American great-grandparent, and when this was revealed he was pressured into leaving Loyola. He moved to New York and played three seasons for Canisius College. In New York, he passed as white; Canisius would later claim Chollet to be the school's first African American basketball player.
Chollet played for several professional teams, including the Syracuse Nationals. During the inaugural season of the National Basketball Association (NBA), he became a role player behind established veterans, and the team made it to the 1950 NBA Finals. An ankle injury limited Chollet's second year in the NBA. The Elmira Colonels, an American Basketball League team, signed Chollet for his third and final season. He married Barbara Knaus in June 1950. After retiring from professional basketball in 1952, he moved to her hometown, Lakewood, Ohio. They had three children: Lawrence, Melanie, and David. In Lakewood, Chollet worked on the construction of St. Edward High School and became a teacher and varsity head coach. He was inducted into the Halls of Fame of Holy Cross School, Loyola University, and Canisius College. He died in 1998.
This recording reflects the Wikipedia text as of 00:30 UTC on Wednesday, 5 March 2025. For the full current version of the article, see Leroy Chollet on Wikipedia. This podcast uses content from Wikipedia under the Creative Commons Attribution-ShareAlike License. Visit our archives at wikioftheday.com and subscribe to stay updated on new episodes. Follow us on Mastodon at @wikioftheday@masto.ai. Also check out Curmudgeon's Corner, a current events podcast. Until next time, I'm standard Salli.
Our 197th episode with a summary and discussion of last week's big AI news! Recorded on 01/17/2025.
Join our brand new Discord here! https://discord.gg/nTyezGSKwP
Hosted by Andrey Kurenkov and guest-hosted by the folks from Latent Space. Read our text newsletter and comment on the podcast at https://lastweekin.ai/.
Sponsors: The Generator - An interdisciplinary AI lab empowering innovators from all fields to bring visionary ideas to life by harnessing the capabilities of artificial intelligence.
In this episode:
- Google and Mistral sign deals with AP and AFP, respectively, to deliver up-to-date news through their AI platforms.
- ChatGPT introduces a tasks feature for reminders and to-dos, positioning itself more as a personal assistant.
- Synthesia raises $180 million to enhance its AI video platform for generating videos of human avatars.
- New U.S. guidelines restrict exporting AI chips to various countries, impacting Nvidia and other tech firms.
If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form.
Timestamps + Links:
(00:00:00) Intro / Banter
(00:04:29) News Preview
(00:05:09) Response to listener comments
(00:05:58) Sponsor Break
Tools & Apps
(00:07:01) Google is making AI in Gmail and Docs free — but raising the price of Workspace
(00:07:52) Microsoft relaunches Copilot for business with free AI chat and pay-as-you-go agents
(00:12:36) Google signs deal with AP to deliver up-to-date news through its Gemini AI chatbot
(00:18:08) Mistral signs deal with AFP to offer up-to-date answers in Le Chat
(00:18:45) ChatGPT can now handle reminders and to-dos
Applications & Business
(00:22:53) Palmer Luckey's AI Defense Company Anduril Is Building a $1 Billion Plant in Ohio
(00:28:36) OpenAI is bankrolling Axios' expansion into four new markets
(00:29:39) AI researcher François Chollet founds a new AI lab focused on AGI
(00:32:18) Nvidia-backed AI video platform Synthesia doubles valuation to $2.1 billion
(00:34:46) Anysphere Raises $105M in Series B
(00:40:14) Harvey Valuation of 3 Billion
Projects & Open Source
(00:46:12) MiniMax-01: Scaling Foundation Models with Lightning Attention
(00:51:16) MinMo: A Multimodal Large Language Model with Approximately 8B Parameters for Seamless Voice Interaction
(00:53:01) HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
Research & Advancements
(00:57:03) Titans: Learning to Memorize at Test Time
(01:04:38) Transformer2: Self-adaptive LLMs
(01:08:15) Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Policy & Safety
(01:11:23) Biden administration proposes sweeping new restrictions on exporting AI chips
(01:13:56) Biden orders Energy, Defense departments to lease sites for AI data centers, clean energy generation
(01:15:00) OpenAI presents its preferred version of AI regulation in a new 'blueprint'
(01:16:15) More teens report using ChatGPT for schoolwork, despite the tech's faults
Synthetic Media & Art
(01:17:55) In AI copyright case, Zuckerberg turns to YouTube for his defense
(01:19:53) Outro
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
A Daily Chronicle of AI Innovations on January 16th 2025
Nintendo finally takes the wraps off the Switch 2. Everybody seems to want to give TikTok more time, but can they find a way to do it? A big new AI research lab. A check-in with Nothing. The company, I mean. And the weird story of when Walgreens tried to replace refrigerator doors with smart screens.
Sponsors:
TryJoyMode.com and code RIDE at checkout
Links:
Here's the Nintendo Switch 2 (The Verge)
Trump considers executive order hoping to 'save TikTok' from ban or sale in U.S. law (Washington Post)
Biden administration looks for ways to keep TikTok available in the U.S. (NBCNews)
AI researcher François Chollet founds a new AI lab focused on AGI (TechCrunch)
Phone Startup Nothing Raises Funding, Crosses $1 Billion in Lifetime Sales (Bloomberg)
Walgreens Replaced Fridge Doors With Smart Screens. It's Now a $200 Million Fiasco (Bloomberg)
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Thomas Isle and his crew bring you all the latest cultural news, with guests and in-depth analysis, entirely free of preconceptions, but not of kindness.
François Chollet discusses the outcomes of the ARC-AGI (Abstraction and Reasoning Corpus) Prize competition in 2024, where accuracy rose from 33% to 55.5% on a private evaluation set.
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? They are hosting an event in Zurich on January 9th with the ARChitects, join if you can. Go to https://tufalabs.ai/
***
Read about the recent result on o3 with ARC here (Chollet knew about it at the time of the interview but wasn't allowed to say): https://arcprize.org/blog/oai-o3-pub-breakthrough
TOC:
1. Introduction and Opening
[00:00:00] 1.1 Deep Learning vs. Symbolic Reasoning: François's Long-Standing Hybrid View
[00:00:48] 1.2 "Why Do They Call You a Symbolist?" – Addressing Misconceptions
[00:01:31] 1.3 Defining Reasoning
3. ARC Competition 2024 Results and Evolution
[00:07:26] 3.1 ARC Prize 2024: Reflecting on the Narrative Shift Toward System 2
[00:10:29] 3.2 Comparing Private Leaderboard vs. Public Leaderboard Solutions
[00:13:17] 3.3 Two Winning Approaches: Deep Learning–Guided Program Synthesis and Test-Time Training
4. Transduction vs. Induction in ARC
[00:16:04] 4.1 Test-Time Training, Overfitting Concerns, and Developer-Aware Generalization
[00:19:35] 4.2 Gradient Descent Adaptation vs. Discrete Program Search
5. ARC-2 Development and Future Directions
[00:23:51] 5.1 Ensemble Methods, Benchmark Flaws, and the Need for ARC-2
[00:25:35] 5.2 Human-Level Performance Metrics and Private Test Sets
[00:29:44] 5.3 Task Diversity, Redundancy Issues, and Expanded Evaluation Methodology
6. Program Synthesis Approaches
[00:30:18] 6.1 Induction vs. Transduction
[00:32:11] 6.2 Challenges of Writing Algorithms for Perceptual vs. Algorithmic Tasks
[00:34:23] 6.3 Combining Induction and Transduction
[00:37:05] 6.4 Multi-View Insight and Overfitting Regulation
7. Latent Space and Graph-Based Synthesis
[00:38:17] 7.1 Clément Bonnet's Latent Program Search Approach
[00:40:10] 7.2 Decoding to Symbolic Form and Local Discrete Search
[00:41:15] 7.3 Graph of Operators vs. Token-by-Token Code Generation
[00:45:50] 7.4 Iterative Program Graph Modifications and Reusable Functions
8. Compute Efficiency and Lifelong Learning
[00:48:05] 8.1 Symbolic Process for Architecture Generation
[00:50:33] 8.2 Logarithmic Relationship of Compute and Accuracy
[00:52:20] 8.3 Learning New Building Blocks for Future Tasks
9. AI Reasoning and Future Development
[00:53:15] 9.1 Consciousness as a Self-Consistency Mechanism in Iterative Reasoning
[00:56:30] 9.2 Reconciling Symbolic and Connectionist Views
[01:00:13] 9.3 System 2 Reasoning - Awareness and Consistency
[01:03:05] 9.4 Novel Problem Solving, Abstraction, and Reusability
10. Program Synthesis and Research Lab
[01:05:53] 10.1 François Leaving Google to Focus on Program Synthesis
[01:09:55] 10.2 Democratizing Programming and Natural Language Instruction
11. Frontier Models and O1 Architecture
[01:14:38] 11.1 Search-Based Chain of Thought vs. Standard Forward Pass
[01:16:55] 11.2 o1's Natural Language Program Generation and Test-Time Compute Scaling
[01:19:35] 11.3 Logarithmic Gains with Deeper Search
12. ARC Evaluation and Human Intelligence
[01:22:55] 12.1 LLMs as Guessing Machines and Agent Reliability Issues
[01:25:02] 12.2 ARC-2 Human Testing and Correlation with g-Factor
[01:26:16] 12.3 Closing Remarks and Future Directions
SHOWNOTES PDF:
https://www.dropbox.com/scl/fi/ujaai0ewpdnsosc5mc30k/CholletNeurips.pdf?rlkey=s68dp432vefpj2z0dp5wmzqz6&st=hazphyx5&dl=0
Today in AI is a daily recap of the latest news and developments in the AI industry. Have a story you'd like featured in an upcoming episode? Reach out at tonyphoang.com
The Consumer Electronics Show in Las Vegas revealed groundbreaking automotive innovations from companies like Hyundai Mobis, BMW, and Honda. These innovations include advanced holographic displays, highly customizable in-vehicle systems, and extensive electric vehicle charging networks aimed at enhancing the driving experience, improving safety, and promoting sustainability. Honda's AI-integrated prototypes promise to redefine vehicles as personal companions, potentially impacting mental health and privacy.
In the realm of AI wearables, Based Hardware of San Francisco introduced Omi, an affordable device designed to boost productivity through a brain interface and voice commands. With features emphasizing user privacy and an open-source platform for developers, Omi represents the next step in wearable tech. Grove AI, founded by Stanford engineers, is using their AI agent, Grace, to make clinical trial enrollments more efficient, reducing administrative burdens for patients and healthcare providers alike.
Waymo has responded to recent safety incidents involving their autonomous vehicles with significant enhancements to their technology and infrastructure. These improvements aim to address software updates, communication systems, and public safety concerns, ensuring regulatory compliance. Concurrently, security vulnerabilities in Automated License Plate Recognition systems have sparked debates over privacy and the potential misuse of surveillance technologies, highlighting the need for stricter data security measures.
Microsoft faced backlash after upgrading its Bing Image Creator with the new DALL-E 3 model, which led to decreased user satisfaction due to image quality and ethical concerns. The company had to revert to the previous model, underscoring the challenges of keeping technical advancements in line with user expectations and the wider implications for market competition and public trust. In parallel, the ARC Prize Foundation, co-founded by François Chollet and Greg Kamradt, seeks to create benchmarks for evaluating AI's journey towards human-level intelligence, driving progress in artificial general intelligence.
Small businesses are increasingly adopting generative AI to improve efficiency, counteract labor shortages, and maintain a competitive edge. While data security and accuracy remain challenges, AI is proving transformative across industries, promoting innovation and growth. However, the development of AGI and superintelligence by entities like OpenAI presents significant ethical, economic, and geopolitical challenges that must be addressed.
OpenAI's O3 system, the future engine of ChatGPT, marks a major turning point in artificial intelligence research. It recently scored 85% on the ARC-AGI test, a reference benchmark designed to evaluate AI systems' ability to generalize and adapt to new situations. This result, equal to the human average, far surpasses the 55% achieved by previous AIs, an advance that fuels hopes of approaching artificial general intelligence (AGI). The ARC-AGI test, developed by French researcher François Chollet, measures sample efficiency: the ability to solve novel problems from only a few examples. Concretely, the AI must analyze transformations applied to square grids, given three examples, and then generalize a rule to solve an additional case. O3 impressed by demonstrating an aptitude for identifying simple, generalizable rules. According to some experts, the system may work through "chains of thought," testing different solution steps before selecting the best one, a method close to that of AlphaGo, the Google AI that beat the world Go champion. But this enthusiasm comes with caution. OpenAI remains tight-lipped about O3's technical details and real capabilities, limiting its communications to a few preliminary tests. Some experts fear that this performance reflects optimization specific to the ARC-AGI test rather than a genuine capacity for generalization that transfers to other contexts. Broader evaluations will be needed to settle the question. If O3 demonstrates human-like adaptability across many domains, the repercussions could be revolutionary, opening the way to self-improving AIs with major societal impacts. Whether that promise becomes reality remains to be seen. (A toy sketch of the ARC task format follows this summary.) Hosted by Acast. Visit acast.com/privacy for more information.
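For readers who want to see concretely what the grid test described above looks like in code: the public ARC-AGI tasks are published as JSON files in François Chollet's https://github.com/fchollet/ARC-AGI repository, each containing a few train input/output grid pairs plus test pairs. The sketch below uses an invented toy task and an invented candidate rule purely for illustration; it shows only the shape of the data and the "consistent with all train pairs" check, not any real solver.

    # Grids are lists of lists of ints (colors 0-9), as in the public ARC JSON.
    toy_task = {
        "train": [
            {"input": [[1, 0], [0, 0]], "output": [[0, 0], [0, 1]]},
            {"input": [[0, 2], [0, 0]], "output": [[0, 0], [2, 0]]},
        ],
        "test": [{"input": [[0, 0], [3, 0]]}],
    }

    def rotate_180(grid):
        # Candidate rule (invented for this toy task): rotate the grid 180 degrees.
        return [row[::-1] for row in grid[::-1]]

    def consistent(rule, task):
        # A rule counts only if it reproduces every train output exactly.
        return all(rule(p["input"]) == p["output"] for p in task["train"])

    if consistent(rotate_180, toy_task):
        print(rotate_180(toy_task["test"][0]["input"]))  # -> [[0, 3], [0, 0]]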
duration: 00:02:20 - The Dakar update with Herbst and Chollet
Analysis of image classifiers demonstrates that it is possible to understand backprop networks at the task-relevant run-time algorithmic level. In these systems, at least, networks gain their power from deploying massive parallelism to check for the presence of a vast number of simple, shallow patterns. https://betterwithout.ai/images-surface-features
This episode has a lot of links:
David Chapman's earliest public mention, in February 2016, of image classifiers probably using color and texture in ways that "cheat": twitter.com/Meaningness/status/698688687341572096
Jordana Cepelewicz's "Where we see shapes, AI sees textures," Quanta Magazine, July 1, 2019: https://www.quantamagazine.org/where-we-see-shapes-ai-sees-textures-20190701/
"Suddenly, a leopard print sofa appears", May 2015: https://web.archive.org/web/20150622084852/http://rocknrollnerd.github.io/ml/2015/05/27/leopard-sofa.html
"Understanding How Image Quality Affects Deep Neural Networks", April 2016: https://arxiv.org/abs/1604.04004
Goodfellow et al., "Explaining and Harnessing Adversarial Examples," December 2014: https://arxiv.org/abs/1412.6572
"Universal adversarial perturbations," October 2016: https://arxiv.org/pdf/1610.08401v1.pdf
"Exploring the Landscape of Spatial Robustness," December 2017: https://arxiv.org/abs/1712.02779
"Overinterpretation reveals image classification model pathologies," NeurIPS 2021: https://proceedings.neurips.cc/paper/2021/file/8217bb4e7fa0541e0f5e04fea764ab91-Paper.pdf
"Approximating CNNs with Bag-of-Local-Features Models Works Surprisingly Well on ImageNet," ICLR 2019: https://openreview.net/forum?id=SkfMWhAqYQ
Baker et al.'s "Deep convolutional networks do not classify based on global object shape," PLOS Computational Biology, 2018: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006613
François Chollet's Twitter threads about AI producing images of horses with extra legs: twitter.com/fchollet/status/1573836241875120128 and twitter.com/fchollet/status/1573843774803161090
"Zoom In: An Introduction to Circuits," 2020: https://distill.pub/2020/circuits/zoom-in/
Geirhos et al., "ImageNet-Trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness," ICLR 2019: https://openreview.net/forum?id=Bygh9j09KX
Dehghani et al., "Scaling Vision Transformers to 22 Billion Parameters," 2023: https://arxiv.org/abs/2302.05442
Hasson et al., "Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks," February 2020: https://www.gwern.net/docs/ai/scaling/2020-hasson.pdf
Current AI practice is not engineering, even when it aims for practical applications, because it is not based on scientific understanding. Enforcing engineering norms on the field could lead to considerably safer systems. https://betterwithout.ai/AI-as-engineering
This episode has a lot of links! Here they are.
Michael Nielsen's "The role of 'explanation' in AI": https://michaelnotebook.com/ongoing/sporadica.html#role_of_explanation_in_AI
Subbarao Kambhampati's "Changing the Nature of AI Research": https://dl.acm.org/doi/pdf/10.1145/3546954
Chris Olah and his collaborators:
"Thread: Circuits": distill.pub/2020/circuits/
"An Overview of Early Vision in InceptionV1": distill.pub/2020/circuits/early-vision/
Dai et al., "Knowledge Neurons in Pretrained Transformers": https://arxiv.org/pdf/2104.08696.pdf
Meng et al.:
"Locating and Editing Factual Associations in GPT": rome.baulab.info
"Mass-Editing Memory in a Transformer": https://arxiv.org/pdf/2210.07229.pdf
François Chollet on image generators putting the wrong number of legs on horses: twitter.com/fchollet/status/1573879858203340800
Neel Nanda's "Longlist of Theories of Impact for Interpretability": https://www.lesswrong.com/posts/uK6sQCNMw8WKzJeCQ/a-longlist-of-theories-of-impact-for-interpretability
Zachary C. Lipton's "The Mythos of Model Interpretability": https://arxiv.org/abs/1606.03490
Meng et al., "Locating and Editing Factual Associations in GPT": https://arxiv.org/pdf/2202.05262.pdf
Belrose et al., "Eliciting Latent Predictions from Transformers with the Tuned Lens": https://arxiv.org/abs/2303.08112
"Progress measures for grokking via mechanistic interpretability": https://arxiv.org/abs/2301.05217
Conmy et al., "Towards Automated Circuit Discovery for Mechanistic Interpretability": https://arxiv.org/abs/2304.14997
Elhage et al., "Softmax Linear Units": transformer-circuits.pub/2022/solu/index.html
Filan et al., "Clusterability in Neural Networks": https://arxiv.org/pdf/2103.03386.pdf
Cammarata et al., "Curve circuits": distill.pub/2020/circuits/curve-circuits/
You can support the podcast and get episodes a week early, by supporting the Patreon: https://www.patreon.com/m/fluidityaudiobooks
If you like the show, consider buying me a coffee: https://www.buymeacoffee.com/mattarnold
Original music by Kevin MacLeod. This podcast is under a Creative Commons Attribution Non-Commercial International 4.0 License.
We welcomed Hakuro Matsuda as our guest and talked about Taiwan, Intel, the MacBook Pro, OpenAI, and more. Sponsor: SRE Kaigi
Show Notes:
Hub3
12/3 Podcaster Interview: Tatsuhiko Miyagawa [Rebuild] - LISTEN NEWS
Loop Experience 2 Plus earplugs
Intel's CEO is out after only three years
Intel, Biden-Harris Administration Finalize $7.86 Billion Funding...
Pat Gelsinger (@PGelsinger)
Introducing ChatGPT Pro
Perplexity Pro|LINEMO
World Labs
AI pioneer François Chollet leaves Google
Home - SIGGRAPH Asia 2024
Popular video game company to close SF studio amid mass layoffs
桜井政博のゲーム作るには (Masahiro Sakurai on Creating Games)
デッドデッドデーモンズデデデデデストラクション (Dead Dead Demon's Dededede Destruction)
ダンダダン (Dandadan)
正体
本心
Gladiator II
機動戦士Gundam GQuuuuuuX
Ghost of Yōtei is coming in 2025
HD-2D版 ドラゴンクエスト (Dragon Quest HD-2D remake)
198Xのファミコン狂騒曲
中山美穂のトキメキハイスクール (Nakayama Miho no Tokimeki High School)
SRE Kaigi 2025 ★
A no-longer-so-anonymous workaholic
Copy chief at Le Monde Diplomatique from 2007 to 2022, Mona Chollet describes herself, euphemistically, as "rather conscientious." Interviewed by Femme Actuelle, the journalist explains: "The robotic side of salaried work suited me very well. As did that reassuring logic of effort rewarded: I knew I had earned the right to enjoy my weekends." Yet when the success of her books allowed her to free herself from that day job, panic set in, and it is on this panic that her latest essay, "Résister à la culpabilisation" (La Découverte, 2024), opens. This cerebral "bulldozer" adds: "I had forgotten what autonomy was. I had grown used to being told every morning where to go, what to do, and until what time. Organizing your own days causes great distress. I forced myself to work eight hours a day, and on weekends, so as not to let myself go (…) Working yourself to death, completely disregarding your well-being, turns out to be well regarded." Her own insights are well regarded too. With a first print run of 70,000 copies, "the new Mona Chollet," for which she is declining invitations to speak in public, already ranks among the ten best-selling books of the fall. Her book is not only about self-sacrifice at work; among what she catalogs as "impediments to existing," Chollet dissects misogynistic discourse, the blaming of victims of sexual violence, parenting injunctions, and "the policing of words and thoughts" within activist circles.
Followed by 92,000 subscribers on X, Mona Chollet sometimes describes her relationship to writing as "a drug in itself, a hidden door within the horror of our era." For this third and final episode, let us open the door to Mona's small, monastic study; she still dreams of a bigger room "whose window would stay lit until late into the night, for bringing books into the world."
The author of the month: Mona Chollet
Born in Geneva in 1973, "obsessed with reading, staying informed, and changing the world," the Swiss journalist Mona Chollet has become, for a whole generation of feminists, a model of intelligence, sensitivity, and precision. Since the early 2000s, across some ten erudite essays ("Beauté fatale," "Sorcières," "Réinventer l'amour"), she has brilliantly analyzed the mechanisms of domination (male, capitalist, professional, or all three at once), while sharing her admiration for the poetry of Mahmoud Darwich and the politically engaged prose of Susan Sontag, and for the series "Mad Men" and "The Marvelous Mrs. Maisel," all interwoven with personal confidences and those of her circle of friends. She lives and works in Paris.
Recordings: September 2024 - Production: Charlie Marcelet - Mixing: Charlie Marcelet - Illustration: Sylvain Cabot - Vocals, beatmaking: Élodie Milo - Original music: Samuel Hirsch - Interview, editing: Richard Gaitet - Sound recording: Mathilde Guermonprez - Editing: Gary Salin - Readings: Delphine Saltel - Production: ARTE Radio
Our beloved witch
In 2017, in the nocturnal secrecy of her laboratory, Mona Chollet threw into her mental cauldron the ingredients for rehabilitating a popular figure: the witch. Published the following year by éditions La Découverte, her book "Sorcières : la puissance invaincue des femmes" recalls the tens of thousands of femicides perpetrated from the 15th to the 17th century in Europe, which mainly targeted unmarried, childless women. Chollet deeply examines this "blow struck against all inclinations toward female independence," the "hatred" of gray hair, and the criminalization of contraception and abortion, drawing as much on Toni Morrison's novels as on the film "Fatal Attraction." There she refines her approach: "I write to bring out subjects that sometimes had not even been identified, by asserting their pertinence, their dignity. I am an amiable, well-brought-up bourgeoise, and drawing attention to myself embarrasses me. I step out of line when I cannot do otherwise, when my convictions and aspirations demand it. I write to give myself courage." Abracadabra! The book became a reference grimoire, translated into fifteen languages and selling 380,000 copies. Her name turned into an incantation. Hence the need to examine her spells: the structure of her best-sellers, which she situates "between self-help and politics," her use of quotations, and her reluctance toward "fieldwork," moving from the catwalks of "Beauté fatale" (on the clichés propagated by the fashion industry and women's magazines, published in 2012, 120,000 copies sold) to "Réinventer l'amour" (on the dead ends and the violence of heterosexual relationships, published in 2021, 200,000 copies sold), by way of her personal favorite, "Chez soi" (on "the wisdom of homebodies," published in 2015, 65,000 copies sold). Turlututu, pointy hat, let's wait no longer: let us fly off on the broomstick of this beloved witch, who sweeps the dust off many a musty idea!
The author of the month: Mona Chollet
Born in Geneva in 1973, "obsessed with reading, staying informed, and changing the world," the Swiss journalist Mona Chollet has become, for a whole generation of feminists, a model of intelligence, sensitivity, and precision. Since the early 2000s, across some ten erudite essays ("Beauté fatale," "Sorcières," "Réinventer l'amour"), she has brilliantly analyzed the mechanisms of domination (male, capitalist, professional, or all three at once), while sharing her admiration for the poetry of Mahmoud Darwich and the politically engaged prose of Susan Sontag, and for the series "Mad Men" and "The Marvelous Mrs. Maisel," all interwoven with personal confidences and those of her circle of friends. She lives and works in Paris.
Recordings: September 2024 - Production: Charlie Marcelet - Mixing: Charlie Marcelet - Illustration: Sylvain Cabot - Vocals, beatmaking: Élodie Milo - Original music: Samuel Hirsch - Interview, editing: Richard Gaitet - Sound recording: Mathilde Guermonprez - Editing: Gary Salin - Readings: Delphine Saltel - Production: ARTE Radio
duration: 00:02:08 - Guillaume Chollet - motorcycle rider in the Dakar Rally
Francois Chollet, a prominent AI expert and creator of ARC-AGI, discusses intelligence, consciousness, and artificial intelligence. Chollet explains that real intelligence isn't about memorizing information or having lots of knowledge - it's about being able to handle new situations effectively. This is why he believes current large language models (LLMs) have "near-zero intelligence" despite their impressive abilities. They're more like sophisticated memory and pattern-matching systems than truly intelligent beings.
***
MLST IS SPONSORED BY TUFA AI LABS!
The current winners of the ARC challenge, MindsAI, are part of Tufa AI Labs. They are hiring ML engineers. Are you interested?! Please go to https://tufalabs.ai/
***
He introduced his "Kaleidoscope Hypothesis," which suggests that while the world seems infinitely complex, it's actually made up of simpler patterns that repeat and combine in different ways. True intelligence, he argues, involves identifying these basic patterns and using them to understand new situations.
Chollet also talked about consciousness, suggesting it develops gradually in children rather than appearing all at once. He believes consciousness exists in degrees - animals have it to some extent, and even human consciousness varies with age and circumstances (like being more conscious when learning something new versus doing routine tasks).
On AI safety, Chollet takes a notably different stance from many in Silicon Valley. He views AGI development as a scientific challenge rather than a religious quest, and doesn't share the apocalyptic concerns of some AI researchers. He argues that intelligence itself isn't dangerous - it's just a tool for turning information into useful models. What matters is how we choose to use it.
ARC-AGI Prize: https://arcprize.org/
Francois Chollet: https://x.com/fchollet
Shownotes: https://www.dropbox.com/scl/fi/j2068j3hlj8br96pfa7bi/CHOLLET_FINAL.pdf?rlkey=xkbr7tbnrjdl66m246w26uc8k&st=0a4ec4na&dl=0
TOC:
1. Intelligence and Model Building
[00:00:00] 1.1 Intelligence Definition and ARC Benchmark
[00:05:40] 1.2 LLMs as Program Memorization Systems
[00:09:36] 1.3 Kaleidoscope Hypothesis and Abstract Building Blocks
[00:13:39] 1.4 Deep Learning Limitations and System 2 Reasoning
[00:29:38] 1.5 Intelligence vs. Skill in LLMs and Model Building
2. ARC Benchmark and Program Synthesis
[00:37:36] 2.1 Intelligence Definition and LLM Limitations
[00:41:33] 2.2 Meta-Learning System Architecture
[00:56:21] 2.3 Program Search and Occam's Razor
[00:59:42] 2.4 Developer-Aware Generalization
[01:06:49] 2.5 Task Generation and Benchmark Design
3. Cognitive Systems and Program Generation
[01:14:38] 3.1 System 1/2 Thinking Fundamentals
[01:22:17] 3.2 Program Synthesis and Combinatorial Challenges
[01:31:18] 3.3 Test-Time Fine-Tuning Strategies
[01:36:10] 3.4 Evaluation and Leakage Problems
[01:43:22] 3.5 ARC Implementation Approaches
4. Intelligence and Language Systems
[01:50:06] 4.1 Intelligence as Tool vs Agent
[01:53:53] 4.2 Cultural Knowledge Integration
[01:58:42] 4.3 Language and Abstraction Generation
[02:02:41] 4.4 Embodiment in Cognitive Systems
[02:09:02] 4.5 Language as Cognitive Operating System
5. Consciousness and AI Safety
[02:14:05] 5.1 Consciousness and Intelligence Relationship
[02:20:25] 5.2 Development of Machine Consciousness
[02:28:40] 5.3 Consciousness Prerequisites and Indicators
[02:36:36] 5.4 AGI Safety Considerations
[02:40:29] 5.5 AI Regulation Framework
duration: 01:00:33 - Être et savoir - by Louise Tourret - Can we educate without guilt-tripping (ourselves and our children)? - production: Peire Legras - guests: Mona Chollet, journalist and essayist
Alessandro Palmarini is a post-baccalaureate researcher at the Santa Fe Institute working under the supervision of Melanie Mitchell. He completed his undergraduate degree in Artificial Intelligence and Computer Science at the University of Edinburgh. Palmarini's current research focuses on developing AI systems that can efficiently acquire new skills from limited data, inspired by François Chollet's work on measuring intelligence. His work builds upon the DreamCoder program synthesis system, introducing a novel approach called "dream decompiling" to improve library learning in inductive program synthesis. Palmarini is particularly interested in addressing the Abstraction and Reasoning Corpus (ARC) challenge, aiming to create AI systems that can perform abstract reasoning tasks more efficiently than current approaches. His research explores the balance between computational efficiency and data efficiency in AI learning processes.
DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?
MLST is sponsored by Tufa Labs:
Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more.
Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2.
Interested? Apply for an ML research position: benjamin@tufa.ai
TOC:
1. Intelligence Measurement in AI Systems
[00:00:00] 1.1 Defining Intelligence in AI Systems
[00:02:00] 1.2 Research at Santa Fe Institute
[00:04:35] 1.3 Impact of Gaming on AI Development
[00:05:10] 1.4 Comparing AI and Human Learning Efficiency
2. Efficient Skill Acquisition in AI
[00:06:40] 2.1 Intelligence as Skill Acquisition Efficiency
[00:08:25] 2.2 Limitations of Current AI Systems in Generalization
[00:09:45] 2.3 Human vs. AI Cognitive Processes
[00:10:40] 2.4 Measuring AI Intelligence: Chollet's ARC Challenge
3. Program Synthesis and ARC Challenge
[00:12:55] 3.1 Philosophical Foundations of Program Synthesis
[00:17:14] 3.2 Introduction to Program Induction and ARC Tasks
[00:18:49] 3.3 DreamCoder: Principles and Techniques
[00:27:55] 3.4 Trade-offs in Program Synthesis Search Strategies
[00:31:52] 3.5 Neural Networks and Bayesian Program Learning
4. Advanced Program Synthesis Techniques
[00:32:30] 4.1 DreamCoder and Dream Decompiling Approach
[00:39:00] 4.2 Beta Distribution and Caching in Program Synthesis
[00:45:10] 4.3 Performance and Limitations of Dream Decompiling
[00:47:45] 4.4 Alessandro's Approach to ARC Challenge
[00:51:12] 4.5 Conclusion and Future Discussions
Refs: Full reference list in the YouTube video description, Show Notes, and MP3 metadata
Show Notes: https://www.dropbox.com/scl/fi/x50201tgqucj5ba2q4typ/Ale.pdf?rlkey=0ubvk7p5gtyx1gpownpdadim8&st=5pniu3nq&dl=0
François Chollet discusses the limitations of Large Language Models (LLMs) and proposes a new approach to advancing artificial intelligence. He argues that current AI systems excel at pattern recognition but struggle with logical reasoning and true generalization. This was Chollet's keynote talk at AGI-24, filmed in high quality. We will be releasing a full interview with him shortly. A teaser clip from that is played in the intro!
Chollet introduces the Abstraction and Reasoning Corpus (ARC) as a benchmark for measuring AI progress towards human-like intelligence. He explains the concept of abstraction in AI systems and proposes combining deep learning with program synthesis to overcome current limitations. Chollet suggests that breakthroughs in AI might come from outside major tech labs and encourages researchers to explore new ideas in the pursuit of artificial general intelligence. (A toy program-search sketch follows these show notes.)
TOC
1. LLM Limitations and Intelligence Concepts
[00:00:00] 1.1 LLM Limitations and Composition
[00:12:05] 1.2 Intelligence as Process vs. Skill
[00:17:15] 1.3 Generalization as Key to AI Progress
2. ARC-AGI Benchmark and LLM Performance
[00:19:59] 2.1 Introduction to ARC-AGI Benchmark
[00:20:05] 2.2 Introduction to ARC-AGI and the ARC Prize
[00:23:35] 2.3 Performance of LLMs and Humans on ARC-AGI
3. Abstraction in AI Systems
[00:26:10] 3.1 The Kaleidoscope Hypothesis and Abstraction Spectrum
[00:30:05] 3.2 LLM Capabilities and Limitations in Abstraction
[00:32:10] 3.3 Value-Centric vs Program-Centric Abstraction
[00:33:25] 3.4 Types of Abstraction in AI Systems
4. Advancing AI: Combining Deep Learning and Program Synthesis
[00:34:05] 4.1 Limitations of Transformers and Need for Program Synthesis
[00:36:45] 4.2 Combining Deep Learning and Program Synthesis
[00:39:59] 4.3 Applying Combined Approaches to ARC Tasks
[00:44:20] 4.4 State-of-the-Art Solutions for ARC
Shownotes (new!): https://www.dropbox.com/scl/fi/i7nsyoahuei6np95lbjxw/CholletKeynote.pdf?rlkey=t3502kbov5exsdxhderq70b9i&st=1ca91ewz&dl=0
[0:01:15] Abstraction and Reasoning Corpus (ARC): AI benchmark (François Chollet) https://arxiv.org/abs/1911.01547
[0:05:30] Monty Hall problem: Probability puzzle (Steve Selvin) https://www.tandfonline.com/doi/abs/10.1080/00031305.1975.10479121
[0:06:20] LLM training dynamics analysis (Tirumala et al.) https://arxiv.org/abs/2205.10770
[0:10:20] Transformer limitations on compositionality (Dziri et al.) https://arxiv.org/abs/2305.18654
[0:10:25] Reversal Curse in LLMs (Berglund et al.) https://arxiv.org/abs/2309.12288
[0:19:25] Measure of intelligence using algorithmic information theory (François Chollet) https://arxiv.org/abs/1911.01547
[0:20:10] ARC-AGI: GitHub repository (François Chollet) https://github.com/fchollet/ARC-AGI
[0:22:15] ARC Prize: $1,000,000+ competition (François Chollet) https://arcprize.org/
[0:33:30] System 1 and System 2 thinking (Daniel Kahneman) https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555
[0:34:00] Core knowledge in infants (Elizabeth Spelke) https://www.harvardlds.org/wp-content/uploads/2017/01/SpelkeKinzler07-1.pdf
[0:34:30] Embedding interpretive spaces in ML (Tennenholtz et al.) https://arxiv.org/abs/2310.04475
[0:44:20] Hypothesis Search with LLMs for ARC (Wang et al.) https://arxiv.org/abs/2309.05660
[0:44:50] Ryan Greenblatt's high score on ARC public leaderboard https://arcprize.org/
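As a concrete, heavily simplified illustration of the program-synthesis direction Chollet advocates in the talk: the sketch below brute-force searches compositions of a tiny invented grid DSL for a program consistent with the training pairs, preferring shorter programs. The primitives and the toy task are our own assumptions, not Chollet's; real systems guide this search with deep learning rather than enumerating blindly.

    from itertools import product

    # A tiny DSL of grid primitives; real DSLs are far richer.
    def identity(g):
        return g

    def transpose(g):
        return [list(r) for r in zip(*g)]

    def flip_v(g):
        return g[::-1]

    PRIMITIVES = [identity, transpose, flip_v]

    def synthesize(train_pairs, max_depth=3):
        # Shortest-program-first enumeration: a crude Occam bias.
        for depth in range(1, max_depth + 1):
            for program in product(PRIMITIVES, repeat=depth):
                def run(g, prog=program):
                    for step in prog:
                        g = step(g)
                    return g
                if all(run(i) == o for i, o in train_pairs):
                    return run
        return None

    train = [([[1, 2], [3, 4]], [[3, 1], [4, 2]])]  # toy task: rotate 90 deg clockwise
    solver = synthesize(train)
    print(solver([[5, 6], [7, 8]]))                 # -> [[7, 5], [8, 6]]

The search discovers that flip-then-transpose reproduces the rotation, and the returned program generalizes to the unseen grid; scaling this idea to ARC is exactly where learned guidance becomes necessary.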
On this episode, I talk to Matt Kirkland, Founder and Designer at Brand New Box. We talk about how to get good clients, the utility of reminding people that you exist, reading science fiction, ChatGPT as highly advanced autocomplete, reading history, the limits of ChatGPT-style AI as compared to AGI, and Matt's Dracula read-through newsletter Dracula Daily.
Philip K. Dick on Goodreads
François Chollet on Sean Carroll's Mindscape Podcast
Dracula Daily
MattKirkland.com
Brand New Box
Matt Kirkland on Twitter
Matt Kirkland on LinkedIn
In this second special episode of Business of Bouffe, recorded in Bordeaux for the "Bordeaux Fête le Vin" event, we talk in more detail about how wine is made. Our guests are once again a duo of enthusiasts. One is a winegrower: her name is Stéphanie Chollet, and she courageously retrained for a career in viticulture. The other is an oenologist: her name is Ophélie Michaud, and she works for the Mouton Cadet house, in close collaboration with partner winegrowers like Stéphanie. In this episode, we try to better understand the winegrower's profession and the Bordeaux wine model through the example of Mouton Cadet wines. This special episode was recorded in collaboration with the Mouton Cadet house. And don't forget, for your health: beware of alcohol abuse! Hosted by Acast. Visit acast.com/privacy for more information.
While "Smashing Security" is on its summer holiday, here's a chance to listen to an episode of its sister show - "The AI Fix".In episode ten of The AI Fix, Graham attempts to say "quinoa", Mark draws a line in the amper-sand, ChatGPT becomes an expert in solar panels and bomb disposal, and our hosts watch a terrifying trailer for a creepy new AI friend.Graham discovers that the world of AI cookery is a soggy, limey mess, and learns an unusual trick for making a great mojito, while Mark pits his co-host against the cleverest AI brains in the world.Episode links:OpenAI starts rollout of Advanced Voice Mode.UK Government shelves £1.3bn UK tech and AI plans.Friend trailer.Artificial intelligence has hard time with accents.Netherlands court uses ChatGPT to decide things.Argentina will use AI to ‘predict future crimes' but experts worry for citizens' rights.Twitter thread on crockpot cookbook.Get ready for AI to rip off your favorite cookbooks.‘One of the most disgusting meals I've ever eaten': AI recipes tested.This cookbook author was a best-seller on Amazon — but she may not even be human.ARC Prize.ARC Prize leaderboard.On the Measure of Intelligence research paper by François Chollet.The AI FixThe AI Fix podcast is presented by Graham Cluley and Mark Stockley.Learn more about the podcast at theaifix.show, and follow us on Twitter at @TheAIFix.Never miss another episode by following us in your favourite podcast app. It's free!Like to give us some feedback or sponsor the podcast? Get in touch.This...
Prof. Subbarao Kambhampati argues that while LLMs are impressive and useful tools, especially for creative tasks, they have fundamental limitations in logical reasoning and cannot provide guarantees about the correctness of their outputs. He advocates for hybrid approaches that combine LLMs with external verification systems. (A toy generate-and-verify sketch in that spirit follows these notes.)
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
TOC (sorry, the ones baked into the MP3 were wrong due to LLM hallucination!):
[00:00:00] Intro
[00:02:06] Bio
[00:03:02] LLMs are n-gram models on steroids
[00:07:26] Is natural language a formal language?
[00:08:34] Natural language is formal?
[00:11:01] Do LLMs reason?
[00:19:13] Definition of reasoning
[00:31:40] Creativity in reasoning
[00:50:27] Chollet's ARC challenge
[01:01:31] Can we reason without verification?
[01:10:00] LLMs can't solve some tasks
[01:19:07] LLM Modulo framework
[01:29:26] Future trends of architecture
[01:34:48] Future research directions
Youtube version: https://www.youtube.com/watch?v=y1WnHpedi2A
Refs (we didn't have space for URLs here, check the YT video description instead):
Can LLMs Really Reason and Plan?
On the Planning Abilities of Large Language Models: A Critical Investigation
Chain of Thoughtlessness? An Analysis of CoT in Planning
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
"Task Success" is not Enough
Partition function (number theory) (Srinivasa Ramanujan and G.H. Hardy's work)
Poincaré conjecture
Gödel's incompleteness theorems
ROT13 (Rotate13, "rotate by 13 places")
A Mathematical Theory of Communication (C. E. Shannon)
Sparks of AGI
Kambhampati thesis on speech recognition (1983)
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
Explainable human-AI interaction
Tree of Thoughts
On the Measure of Intelligence (ARC Challenge)
Getting 50% (SoTA) on ARC-AGI with GPT-4o (Ryan Greenblatt ARC solution)
PROGRAMS WITH COMMON SENSE (John McCarthy) - "AI should be an advice taker program"
Original chain of thought paper
ICAPS 2024 Keynote: Dale Schuurmans on "Computing and Planning with Large Generative Models" (COT)
The Hardware Lottery (Hooker)
A Path Towards Autonomous Machine Intelligence (JEPA/LeCun)
AlphaGeometry
FunSearch
Emergent Abilities of Large Language Models
Language models are not naysayers (Negation in LLMs)
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Embracing negative results
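The hybrid "LLM plus external verifier" idea (the LLM-Modulo framework referenced above) can be sketched generically. In the toy below, propose is a stand-in for an LLM sampling candidate plans, and only the sound verifier decides acceptance; the task and every name here are invented for illustration. Real LLM-Modulo loops also feed verifier critiques back into the prompt, which this sketch omits.

    import random

    # Toy "planning" task: order jobs so precedence constraints hold.
    JOBS = ["a", "b", "c", "d"]
    PRECEDES = [("a", "b"), ("b", "c"), ("a", "d")]  # x must come before y

    def verify(plan):
        # Sound external checker: the only component whose output is trusted.
        pos = {job: i for i, job in enumerate(plan)}
        return all(pos[x] < pos[y] for x, y in PRECEDES)

    def propose(rng):
        # Stand-in for an LLM: cheap, plausible, but unguaranteed guesses.
        plan = JOBS[:]
        rng.shuffle(plan)
        return plan

    def generate_and_verify(budget=1000, seed=0):
        rng = random.Random(seed)
        for _ in range(budget):
            plan = propose(rng)
            if verify(plan):  # accept only verifier-approved plans
                return plan
        return None

    print(generate_and_verify())  # e.g. ['a', 'b', 'd', 'c'], a valid ordering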
Sara Hooker is VP of Research at Cohere and leader of Cohere for AI. We discuss her recent paper critiquing the use of compute thresholds, measured in FLOPs (floating point operations), as an AI governance strategy. We explore why this approach, recently adopted in both US and EU AI policies, may be problematic and oversimplified. Sara explains the limitations of using raw computational power as a measure of AI capability or risk, and discusses the complex relationship between compute, data, and model architecture.
Equally important, we go into Sara's work on "The AI Language Gap." This research highlights the challenges and inequalities in developing AI systems that work across multiple languages. Sara discusses how current AI models, predominantly trained on English and a handful of high-resource languages, fail to serve the linguistic diversity of our global population. We explore the technical, ethical, and societal implications of this gap, and discuss potential solutions for creating more inclusive and representative AI systems.
We broadly discuss the relationship between language, culture, and AI capabilities, as well as the ethical considerations in AI development and deployment.
YT Version: https://youtu.be/dBZp47999Ko
TOC:
[00:00:00] Intro
[00:02:12] FLOPS paper
[00:26:42] Hardware lottery
[00:30:22] The Language gap
[00:33:25] Safety
[00:38:31] Emergent
[00:41:23] Creativity
[00:43:40] Long tail
[00:44:26] LLMs and society
[00:45:36] Model bias
[00:48:51] Language and capabilities
[00:52:27] Ethical frameworks and RLHF
Sara Hooker
https://www.sarahooker.me/
https://www.linkedin.com/in/sararosehooker/
https://scholar.google.com/citations?user=2xy6h3sAAAAJ&hl=en
https://x.com/sarahookr
Interviewer: Tim Scarfe
Refs:
The AI Language gap
https://cohere.com/research/papers/the-AI-language-gap.pdf
On the Limitations of Compute Thresholds as a Governance Strategy
https://arxiv.org/pdf/2407.05694v1
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
https://arxiv.org/pdf/2406.18682
Cohere Aya
https://cohere.com/research/aya
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
https://arxiv.org/pdf/2407.02552
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
https://arxiv.org/pdf/2402.14740
Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence
https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
EU AI Act
https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138_EN.pdf
The bitter lesson
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Neel Nanda interview
https://www.youtube.com/watch?v=_Ygf0GnlwmY
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
https://transformer-circuits.pub/2024/scaling-monosemanticity/
Chollet's ARC challenge
https://github.com/fchollet/ARC-AGI
Ryan Greenblatt on ARC
https://www.youtube.com/watch?v=z9j3wB1RRGA
Disclaimer: This is the third video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview.
As impressive as LLMs are, the growing consensus is that language, scale and compute won't get us to AGI. Although many AI benchmarks have quickly achieved human-level performance, there is one eval that has barely budged since it was created in 2019. Google researcher François Chollet wrote a paper that year defining intelligence as skill-acquisition efficiency—the ability to learn new skills as humans do, from a small number of examples. To make it testable he proposed a new benchmark, the Abstraction and Reasoning Corpus (ARC), designed to be easy for humans, but hard for AI. Notably, it doesn't rely on language. Zapier co-founder Mike Knoop read Chollet's paper as the LLM wave was rising. He worked quickly to integrate generative AI into Zapier's product, but kept coming back to the lack of progress on the ARC benchmark. In June, Knoop and Chollet launched the ARC Prize, a public competition offering more than $1M to beat and open-source a solution to the ARC-AGI eval. In this episode Mike talks about the new ideas required to solve ARC, gives updates from the first two weeks of the competition, and explains why he's excited for AGI systems that can innovate alongside humans.
Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
Mentioned:
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models: the 2022 paper that first caught Mike's attention about the capabilities of LLMs
- On the Measure of Intelligence: 2019 paper by Google researcher François Chollet that introduced the ARC benchmark, which remains unbeaten
- ARC Prize 2024: the $1M+ competition Mike and François have launched to drive interest in solving the ARC-AGI eval
- Sequence to Sequence Learning with Neural Networks: Ilya Sutskever's 2014 paper that influenced the direction of machine translation with deep neural networks
- Etched: Luke Miles on LessWrong wrote about the first ASIC chip that accelerates transformers on silicon
- Kaggle: the leading data science competition platform and online community, acquired by Google in 2017
- Lab42: Swiss AI lab that hosted ARCathon, the precursor to ARC Prize
- Jack Cole: researcher on the team that was #1 on the ARCathon leaderboard
- Ryan Greenblatt: researcher with the current high score (50%) on the ARC public leaderboard
(00:00) Introduction
(01:51) AI at Zapier
(08:31) What is ARC AGI?
(13:25) What does it mean to efficiently acquire a new skill?
(19:03) What approaches will succeed?
(21:11) A little bit of a different shape
(25:59) The role of code generation and program synthesis
(29:11) What types of people are working on this?
(31:45) Trying to prove you wrong
(34:50) Where are the big labs?
(38:21) The world post-AGI
(42:51) When will we cross 85% on ARC AGI?
(46:12) Will LLMs be part of the solution?
(50:13) Lightning round
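For readers who haven't seen the benchmark's shape: each ARC task in the public repository (https://github.com/fchollet/ARC-AGI) is a small JSON file of demonstration pairs plus test inputs, with grids given as 2-D arrays of color indices 0-9. The toy task below follows that format, but the puzzle itself is invented, not taken from the corpus.

```python
# A minimal look at the ARC task format: "train" holds demonstration
# input/output pairs, "test" holds inputs whose outputs must be predicted.
import json

task_json = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]}
  ],
  "test": [
    {"input": [[5, 0], [0, 5]]}
  ]
}
"""

task = json.loads(task_json)
for pair in task["train"]:
    print("demo:", pair["input"], "->", pair["output"])
print("solve:", task["test"][0]["input"], "-> ?")
```

The whole difficulty of the eval lives in that "-> ?": the solver sees only a handful of demonstrations and must infer the transformation rule, which is why memorization-heavy systems struggle.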
Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas
Which is more intelligent, ChatGPT or a 3-year-old? Of course this depends on what we mean by "intelligence." A modern LLM is certainly able to answer all sorts of questions that require knowledge far past the capacity of a 3-year-old, and even to perform synthetic tasks that seem remarkable to many human grown-ups. But is that really intelligence? François Chollet argues that it is not, and that LLMs are never going to be truly "intelligent" in the usual sense -- although other approaches to AI might get there. Support Mindscape on Patreon. Blog post with transcript: https://www.preposterousuniverse.com/podcast/2024/06/24/280-francois-chollet-on-deep-learning-and-the-meaning-of-intelligence/ François Chollet received his Diplôme d'Ingénieur from École Nationale Supérieure de Techniques Avancées, Paris. He is currently a Senior Staff Engineer at Google. He has been awarded the Global Swiss AI Award for breakthroughs in artificial intelligence. He is the author of Deep Learning with Python, and developer of the Keras software library for neural networks. He is the creator of the ARC (Abstraction and Reasoning Corpus) Challenge. Links: Web site | GitHub | Google Scholar publications | Wikipedia | "On the Measure of Intelligence" See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LLM Generality is a Timeline Crux, published by Egg Syntax on June 24, 2024 on The AI Alignment Forum. Short Summary LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible. Longer summary There is ML research suggesting that LLMs fail badly on attempts at general reasoning, such as planning problems, scheduling, and attempts to solve novel visual puzzles. This post provides a brief introduction to that research, and asks: Whether this limitation is illusory or actually exists. If it exists, whether it will be solved by scaling or is a problem fundamental to LLMs. If fundamental, whether it can be overcome by scaffolding & tooling. If this is a real and fundamental limitation that can't be fully overcome by scaffolding, we should be skeptical of arguments like Leopold Aschenbrenner's (in his recent 'Situational Awareness') that we can just 'follow straight lines on graphs' and expect AGI in the next few years. Introduction Leopold Aschenbrenner's recent 'Situational Awareness' document has gotten considerable attention in the safety & alignment community. Aschenbrenner argues that we should expect current systems to reach human-level given further scaling[1], and that it's 'strikingly plausible' that we'll see 'drop-in remote workers' capable of doing the work of an AI researcher or engineer by 2027. Others hold similar views. Francois Chollet and Mike Knoop's new $500,000 prize for beating the ARC benchmark has also gotten considerable recent attention in AIS[2]. Chollet holds a diametrically opposed view: that the current LLM approach is fundamentally incapable of general reasoning, and hence incapable of solving novel problems. We only imagine that LLMs can reason, Chollet argues, because they've seen such a vast wealth of problems that they can pattern-match against. But LLMs, even if scaled much further, will never be able to do the work of AI researchers. It would be quite valuable to have a thorough analysis of this question through the lens of AI safety and alignment. This post is not that[3], nor is it a review of the voluminous literature on this debate (from outside the AIS community). It attempts to briefly introduce the disagreement, some evidence on each side, and the impact on timelines. What is general reasoning? Part of what makes this issue contentious is that there's not a widely shared definition of 'general reasoning', and in fact various discussions of this use various terms. By 'general reasoning', I mean to capture two things. First, the ability to think carefully and precisely, step by step. Second, the ability to apply that sort of thinking in novel situations[4]. Terminology is inconsistent between authors on this subject; some call this 'system II thinking'; some 'reasoning'; some 'planning' (mainly for the first half of the definition); Chollet just talks about 'intelligence' (mainly for the second half). This issue is further complicated by the fact that humans aren't fully general reasoners without tool support either. For example, seven-dimensional tic-tac-toe is a simple and easily defined system, but incredibly difficult for humans to play mentally without extensive training and/or tool support. 
Generalizations that are in-distribution for humans seem like something that any system should be able to do; generalizations that are out-of-distribution for humans don't feel as though they ought to count. How general are LLMs? It's important to clarify that this is very much a matter of degree. Nearly everyone was surprised by the degree to which the last generation of state-of-the-art LLMs like GPT-3 generalized; for example, no one I know of predicted that LLMs trained on primarily English-language sources would be able to do translation between languages. Some in the field argued as...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LLM Generality is a Timeline Crux, published by eggsyntax on June 24, 2024 on LessWrong. Short Summary LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible. Longer summary There is ML research suggesting that LLMs fail badly on attempts at general reasoning, such as planning problems, scheduling, and attempts to solve novel visual puzzles. This post provides a brief introduction to that research, and asks: Whether this limitation is illusory or actually exists. If it exists, whether it will be solved by scaling or is a problem fundamental to LLMs. If fundamental, whether it can be overcome by scaffolding & tooling. If this is a real and fundamental limitation that can't be fully overcome by scaffolding, we should be skeptical of arguments like Leopold Aschenbrenner's (in his recent 'Situational Awareness') that we can just 'follow straight lines on graphs' and expect AGI in the next few years. Introduction Leopold Aschenbrenner's recent 'Situational Awareness' document has gotten considerable attention in the safety & alignment community. Aschenbrenner argues that we should expect current systems to reach human-level given further scaling and 'unhobbling', and that it's 'strikingly plausible' that we'll see 'drop-in remote workers' capable of doing the work of an AI researcher or engineer by 2027. Others hold similar views. Francois Chollet and Mike Knoop's new $500,000 prize for beating the ARC benchmark has also gotten considerable recent attention in AIS[1]. Chollet holds a diametrically opposed view: that the current LLM approach is fundamentally incapable of general reasoning, and hence incapable of solving novel problems. We only imagine that LLMs can reason, Chollet argues, because they've seen such a vast wealth of problems that they can pattern-match against. But LLMs, even if scaled much further, will never be able to do the work of AI researchers. It would be quite valuable to have a thorough analysis of this question through the lens of AI safety and alignment. This post is not that[2], nor is it a review of the voluminous literature on this debate (from outside the AIS community). It attempts to briefly introduce the disagreement, some evidence on each side, and the impact on timelines. What is general reasoning? Part of what makes this issue contentious is that there's not a widely shared definition of 'general reasoning', and in fact various discussions of this use various terms. By 'general reasoning', I mean to capture two things. First, the ability to think carefully and precisely, step by step. Second, the ability to apply that sort of thinking in novel situations[3]. Terminology is inconsistent between authors on this subject; some call this 'system II thinking'; some 'reasoning'; some 'planning' (mainly for the first half of the definition); Chollet just talks about 'intelligence' (mainly for the second half). This issue is further complicated by the fact that humans aren't fully general reasoners without tool support either. For example, seven-dimensional tic-tac-toe is a simple and easily defined system, but incredibly difficult for humans to play mentally without extensive training and/or tool support. 
Generalizations that are in-distribution for humans seem like something that any system should be able to do; generalizations that are out-of-distribution for humans don't feel as though they ought to count. How general are LLMs? It's important to clarify that this is very much a matter of degree. Nearly everyone was surprised by the degree to which the last generation of state-of-the-art LLMs like GPT-3 generalized; for example, no one I know of predicted that LLMs trained on primarily English-language sources would be able to do translation between languages. Some in the field argued as...
The ARC Challenge, created by Francois Chollet, tests how well AI systems can generalize from a few examples in a grid-based intelligence test. We interview the current winners of the ARC Challenge—Jack Cole, Mohamed Osman and their collaborator Michael Hodel. They discuss how they tackled ARC (Abstraction and Reasoning Corpus) using language models. We also discuss the new "50%" public-set approach announced today from Redwood Research (Ryan Greenblatt). Jack and Mohamed explain their winning approach, which involves fine-tuning a language model on a large, specifically-generated dataset and then doing additional fine-tuning at test time, a technique known in this context as "active inference". They use various strategies to represent the data for the language model and believe that with further improvements, the accuracy could reach above 50%. Michael talks about his work on generating new ARC-like tasks to help train the models. They also debate whether their methods stay true to the "spirit" of Chollet's measure of intelligence. Despite some concerns, they agree that their solutions are promising and adaptable for other similar problems.
Note: Jack's team is still the current official winner at 33% on the private set. Ryan's entry is not on the private leaderboard or eligible. Chollet invented ARC in 2019 (not 2017 as stated). "Ryan's entry is not a new state of the art. We don't know exactly how well it does since it was only evaluated on 100 tasks from the evaluation set and does 50% on those, reportedly. Meanwhile Jack's team's (i.e. MindsAI's) solution does 54% on the entire eval set and it is seemingly possible to do 60-70% with an ensemble."
Jack Cole: https://x.com/Jcole75Cole https://lab42.global/community-interview-jack-cole/
Mohamed Osman: Mohamed is looking to do a PhD in AI/ML — can you help him? Email: mothman198@outlook.com https://www.linkedin.com/in/mohamedosman1905/
Michael Hodel: https://arxiv.org/pdf/2404.07353v1 https://www.linkedin.com/in/michael-hodel/ https://x.com/bayesilicon https://github.com/michaelhodel
Getting 50% (SoTA) on ARC-AGI with GPT-4o - Ryan Greenblatt: https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
Neural networks for abstraction and reasoning: Towards broad generalization in machines [Mikel Bober-Irizar, Soumya Banerjee]: https://arxiv.org/pdf/2402.03507
On the Measure of Intelligence: https://arxiv.org/abs/1911.01547
YT version: https://youtu.be/jSAT_RuJ_Cg
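The test-time half of that recipe can be sketched roughly as follows. This is a minimal illustration, not the MindsAI system: it assumes an HF-style causal LM (gpt2 as a stand-in), a naive text serialization of grids, and made-up hyperparameters; the team's actual models, augmentations, and data representations are not public.

```python
# Minimal sketch of test-time fine-tuning ("active inference") on one ARC task:
# take a model already fine-tuned on generated ARC-like data, run a few extra
# gradient steps on this task's own demonstration pairs, then predict.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in for the real model
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def grid_to_text(grid):
    # Naive serialization: one line of space-separated color indices per row.
    return "\n".join(" ".join(str(c) for c in row) for row in grid)

def pair_to_prompt(inp, out=None):
    text = f"INPUT:\n{grid_to_text(inp)}\nOUTPUT:\n"
    return text + (grid_to_text(out) + tok.eos_token if out is not None else "")

task = {  # toy identity task standing in for a real ARC task
    "train": [{"input": [[1, 0], [0, 1]], "output": [[1, 0], [0, 1]]}],
    "test": [{"input": [[2, 0], [0, 2]]}],
}

# Test-time fine-tuning: a few gradient steps on the task's own demonstrations.
model.train()
for _ in range(10):
    for pair in task["train"]:
        ids = tok(pair_to_prompt(pair["input"], pair["output"]),
                  return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss
        opt.zero_grad(); loss.backward(); opt.step()

# Predict the held-out grid with the adapted weights.
model.eval()
prompt = tok(pair_to_prompt(task["test"][0]["input"]), return_tensors="pt").input_ids
pred = model.generate(prompt, max_new_tokens=32, pad_token_id=tok.eos_token_id)
print(tok.decode(pred[0][prompt.shape[1]:]))
```

The debate in the episode is whether adapting weights per task like this counts as measuring skill-acquisition efficiency or sidesteps it; either way, the mechanism itself is just a tiny, task-specific fine-tune at inference time.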
Tensorraum - The AI Podcast | News on AI, Machine Learning, LLMs, Tech Investments, and More
To coincide with the UEFA European Championship, we open the episode with an AI prediction game: how do ChatGPT, Gemini, and friends think the next Germany match will go? Then we move straight on to more serious topics: Google researcher François Chollet said in an NZZ interview in April that investment in GenAI development is a thousand times (!) too high. What's behind his thesis? Meanwhile, the consulting firm PwC has teamed up with Aleph Alpha to develop a virtual lawyer, creance.ai. Early studies of such applications show, however, that the models still need a lot of work. We then take a little more time to look at the current Lucidworks report, "The State of Generative AI in 2024". Finally, we experiment with the latest text-to-video model, lumalabs' Dream Machine.
Links for this episode:
- Google-Forscher zu KI-Hype in der Wirtschaft: «Die Investitionen sind um ein Tausendfaches zu hoch»
- PwC Germany and Aleph Alpha launch joint venture creance.ai
- AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries
- The State of Generative AI in 2024: Benchmarking the Hype vs. Reality
- lumalabs Dream Machine
Send us your feedback! Email: info@tensorraum.de
Links: https://www.tensorraum.de
Hosts:
Stefan Wiezorek: https://www.linkedin.com/in/stefanwiezorek/
Dr. Arne Meyer: https://www.linkedin.com/in/arne-meyer-6a36612b9/
Dr. Jannis Buchsteiner: https://www.linkedin.com/in/jannis-buchsteiner/
Chapter markers:
00:00:00 Opening and teaser
00:02:11 AI prediction game for the Euros
00:05:12 François Chollet: AI investment is too high
00:11:44 AlphaFold: a failed monetization?
00:18:42 PwC and Aleph Alpha present creance.ai
00:24:33 SLMs for specialist domains vs. LLMs
00:34:40 Lucidworks: The State of Generative AI in 2024
00:49:38 Text-to-video: lumalabs Dream Machine
00:57:43 Wrap-up and outlook
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Getting 50% (SoTA) on ARC-AGI with GPT-4o, published by ryan greenblatt on June 17, 2024 on LessWrong.
I recently got to 50%[1] accuracy on the public test set for ARC-AGI by having GPT-4o generate a huge number of Python implementations of the transformation rule (around 8,000 per problem) and then selecting among these implementations based on correctness of the Python programs on the examples (if this is confusing, go here)[2]. I use a variety of additional approaches and tweaks which overall substantially improve the performance of my method relative to just sampling 8,000 programs. [This post is on a pretty different topic than the usual posts I make about AI safety.]
The additional approaches and tweaks are:
- I use few-shot prompts which perform meticulous step-by-step reasoning.
- I have GPT-4o try to revise some of the implementations after seeing what they actually output on the provided examples.
- I do some feature engineering, providing the model with considerably better grid representations than the naive approach of just providing images. (See below for details on what a "grid" in ARC-AGI is.)
- I use specialized few-shot prompts for the two main buckets of ARC-AGI problems (cases where the grid size changes vs. doesn't).
The prior state of the art on this dataset was 34% accuracy, so this is a significant improvement.[3] On a held-out subset of the train set, where humans get 85% accuracy, my solution gets 72% accuracy.[4] (The train set is significantly easier than the test set as noted here.) Additional increases of runtime compute would further improve performance (and there are clear scaling laws), but this is left as an exercise to the reader.
In this post: I describe my method; I analyze what limits its performance and make predictions about what is needed to reach human performance; I comment on what it means for claims that François Chollet makes about LLMs. Given that current LLMs can perform decently well on ARC-AGI, do claims like "LLMs like Gemini or ChatGPT [don't work] because they're basically frozen at inference time. They're not actually learning anything." make sense? (This quote is from here.)
Thanks to Fabien Roger and Buck Shlegeris for a bit of help with this project and with writing this post.
What is ARC-AGI? ARC-AGI is a dataset built to evaluate the general reasoning abilities of AIs. It consists of visual problems like the below, where there are input-output examples which are grids of colored cells. The task is to guess the transformation from input to output and then fill out the missing grid. Here is an example from the tutorial: [tutorial example image omitted] This one is easy, and it's easy to get GPT-4o to solve it. But the tasks from the public test set are much harder; they're often non-trivial for (typical) humans. There is a reported MTurk human baseline for the train distribution of 85%, but no human baseline for the public test set, which is known to be significantly more difficult. Here are representative problems from the test set[5], and whether my GPT-4o-based solution gets them correct or not.
[Problem 1, Problem 2, and Problem 3 grid images omitted.]
My method: The main idea behind my solution is very simple: get GPT-4o to generate around 8,000 Python programs which attempt to implement the transformation, select a program which is right on all the examples (usually there are 3 examples), and then submit the output that program produces when applied to the additional test input(s). I show GPT-4o the problem as images and in various ASCII representations. My approach is similar in spirit to the approach applied in AlphaCode, in which a model generates millions of completions attempting to solve a programming problem and then aggregates over them to determine what to submit. Actually getting to 50% with this main idea took me about 6 days of work. This work includes construct...
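The selection half of that loop is simple enough to sketch. In the snippet below, `sample_program_candidates` is a hypothetical stand-in for the ~8,000 GPT-4o samples (the prompting, revision passes, and grid encodings are omitted); only the keep-programs-that-fit-every-demonstration filter is faithful to the description above.

```python
# Minimal sketch of sample-and-select program synthesis for an ARC-style task:
# generate candidate transformation programs, keep one that reproduces every
# training pair exactly, and apply it to the test input.
from typing import Callable, List, Optional

Grid = List[List[int]]

def sample_program_candidates(task, n: int) -> List[str]:
    # Hypothetical stand-in for thousands of LLM samples; here, two fixed guesses.
    return [
        "def transform(g):\n    return g",                         # identity
        "def transform(g):\n    return [row[::-1] for row in g]",  # mirror rows
    ]

def compile_candidate(src: str) -> Optional[Callable[[Grid], Grid]]:
    env: dict = {}
    try:
        exec(src, env)  # toy trusted strings only; real pipelines sandbox this
        return env.get("transform")
    except Exception:
        return None

def solve(task) -> Optional[Grid]:
    for src in sample_program_candidates(task, n=8000):
        fn = compile_candidate(src)
        if fn is None:
            continue
        try:
            # Keep the first program consistent with all demonstration pairs.
            if all(fn(p["input"]) == p["output"] for p in task["train"]):
                return fn(task["test"][0]["input"])
        except Exception:
            continue
    return None

toy = {"train": [{"input": [[1, 2]], "output": [[2, 1]]}],
       "test": [{"input": [[3, 4]]}]}
print(solve(toy))  # -> [[4, 3]]
```

The demonstrations act as a built-in verifier, which is why brute sampling works at all: almost all candidates are wrong, but wrong candidates are cheap to reject.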
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI, published by JWS on June 16, 2024 on The Effective Altruism Forum.
Overview: Recently Dwarkesh Patel released an interview with François Chollet (hereafter Dwarkesh and François). I thought this was one of Dwarkesh's best recent podcasts, and one of the best discussions that the AI community has had recently. Instead of subtweeting those with opposing opinions or vagueposting, we actually got two people with disagreements on the key issue of scaling and AGI having a good-faith and productive discussion.[1] I want to explicitly give Dwarkesh a shout-out for having such a productive discussion (even if I disagree with him on the object level) and having someone on who challenges his beliefs and preconceptions. Often when I think of different AI factions getting angry at each other, and the quality of AI risk discourse plummeting, I'm reminded of Scott's phrase "I reject the argument that Purely Logical Debate has been tried and found wanting. Like GK Chesterton, I think it has been found difficult and left untried." More of this kind of thing please, everyone involved.
I took notes as I listened to the podcast, and went through it again to make sure I got the key claims right. I grouped them into similar themes, as Dwarkesh and François often went down a rabbit hole to pursue an interesting point or crux and later returned to the main topic.[2] I hope this can help readers navigate to their points of interest, or make the discussion clearer, though I'd definitely recommend listening/watching for yourself! (It is long though, so feel free to jump around the doc rather than slog through it in one go!)
Full disclosure: I am sceptical of a lot of the case for short AGI timelines these days, and thus also sceptical of claims that x-risk from AI is an overwhelmingly important cause in the entire history of humanity. This of course comes across in my summarisation and takeaways, but I think acknowledging that openly is better than leaving it to be inferred, and I hope this post can be another addition in helping improve the state of AI discussion both in and outside of EA/AI-Safety circles. It is also important to state explicitly here that I might very well be wrong! Please take my perspective as just that, one perspective among many, and do not defer to me (or to anyone really). Come to your own conclusions on these issues.[3]
The Podcast: All timestamps are for the YouTube video, not the podcast recording. I've tried to cover the podcast's main points as they appeared chronologically, tracking them through the transcript. I include links to some external resources, passing thoughts in footnotes, and more full thoughts in block-quotes.
Introducing the ARC Challenge: The podcast starts with an introduction of the ARC Challenge itself, and Dwarkesh is happy that François has put out a line in the sand as an LLM sceptic instead of moving the goalposts [0:02:27]. François notes that LLMs struggle on ARC, in part because its challenges are novel and meant not to be found on the internet; instead the approaches that perform better are based on 'Discrete Program Search' [0:02:04]. He later notes that ARC puzzles are not complex and require very little knowledge to solve [0:25:45].
Dwarkesh agrees that the problems are simple and thinks it's an "intriguing fact" that ARC problems are simple for humans but LLMs are bad at them, and he hasn't been convinced by the explanations he's got from LLM proponents/scaling maximalists about why that is [0:11:57]. Towards the end François mentions in passing that big labs tried ARC but didn't share their results because they're bad [1:08:28].[4] One of ARC's main selling points is that humans are clearly meant to do well at this, even children, [0...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LLMs won't lead to AGI - Francois Chollet, published by tobycrisford on June 12, 2024 on The Effective Altruism Forum. I found this interview with Francois Chollet fascinating, and would be curious to hear what other people make of it. I think it is impressive that he's managed to devise a benchmark of tasks which are mostly pretty easy for most humans, but which LLMs have so far not been able to make much progress with. If you don't have time to watch the video, then I think these tweets of his sum up his views quite well: The point of general intelligence is to make it possible to deal with novelty and uncertainty, which is what our lives are made of. Intelligence is the ability to improvise and adapt in the face of situations you weren't prepared for (either by your evolutionary history or by your past experience) -- to efficiently acquire skills at novel tasks, on the fly. Meanwhile what the AI of today does is to combine extremely weak generalization power (i.e. ability to deal with novelty and uncertainty) with a dense sampling of everything it might ever be faced with -- essentially, use brute-force scale to *by-pass* the problem of intelligence entirely. If intelligence is the ability to deal with what you weren't prepared for, then the modern AI strategy is to prepare for everything, so you never need intelligence. This is of course a terrible strategy, because it is impossible to prepare for everything. The problem isn't just scale, the problem is the fact that the real world isn't sampled from a static distribution -- it is ever changing and ever novel. If his take on things is correct, I am not sure exactly what this implies for AGI timelines. Maybe it would mean that AGI is much further off than we think, because the impressive feats of LLMs that have led us to think it might be close have been overinterpreted. But it seems like it could also mean that AGI will arrive much sooner? Maybe we already have more than enough compute and training data for superhuman AGI, and we are just waiting on that one clever idea. Maybe that could happen tomorrow? Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
No Priors: Artificial Intelligence | Machine Learning | Technology | Startups
The first step in achieving AGI is nailing down a concise definition, and Mike Knoop, the co-founder and Head of AI at Zapier, believes François Chollet got it right when he defined general intelligence as a system that can efficiently acquire new skills. This week on No Priors, Mike joins Elad to discuss the ARC Prize, a multi-million-dollar non-profit public challenge looking for someone to beat the Abstraction and Reasoning Corpus (ARC) evaluation. In this episode, they also get into why Mike thinks LLMs will not get us to AGI, how Zapier is incorporating AI into its products and the power of agents, and why it's dangerous to regulate AGI before discovering its full potential.
Show Links:
About the Abstraction and Reasoning Corpus
Zapier Central
Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @mikeknoop
Show Notes:
(0:00) Introduction
(1:10) Redefining AGI
(2:16) Introducing ARC Prize
(3:08) Definition of AGI
(5:14) LLMs and AGI
(8:20) Promising techniques for developing AGI
(11:00) Sentience and intelligence
(13:51) Prize model vs investing
(16:28) Zapier AI innovations
(19:08) Economic value of agents
(21:48) Open source to achieve AGI
(24:20) Regulating AI and AGI
Here is my conversation with Francois Chollet and Mike Knoop on the $1 million ARC-AGI Prize they're launching today. I did a bunch of Socratic grilling throughout, but Francois's arguments about why LLMs won't lead to AGI are very interesting and worth thinking through. It was really fun discussing/debating the cruxes. Enjoy!
Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here.
Timestamps:
(00:00:00) – The ARC benchmark
(00:11:10) – Why LLMs struggle with ARC
(00:19:00) – Skill vs intelligence
(00:27:55) – Do we need "AGI" to automate most jobs?
(00:48:28) – Future of AI progress: deep learning + program synthesis
(01:00:40) – How Mike Knoop got nerd-sniped by ARC
(01:08:37) – Million $ ARC Prize
(01:10:33) – Resisting benchmark saturation
(01:18:08) – ARC scores on frontier vs open source models
(01:26:19) – Possible solutions to ARC Prize
Get full access to Dwarkesh Podcast at www.dwarkeshpatel.com/subscribe
duration: 01:30:16 - Le Grand dimanche Soir - by: Charline Vanhoenacker, Guillaume Meurice, Juliette ARNAUD, Aymeric LOMPRET - The whole team is on vacation, but we weren't about to leave you moping on a Sunday evening! A chance to (re)discover the best of the interviews with Christelle Chollet, Gilles Perret, and Cédric Khan, before getting into the groove with musical performances by La Poison and Gwendoline. - produced by: François AUDOIN
Meet François Chollet, creator of Keras, software engineer, and AI researcher at Google. Join François and hosts Ashley Oldacre and Gus Martins as they discuss how Keras 3 was created, integrating Keras 3 with Gemma and Kaggle, artificial general intelligence (AGI), and much more! Resources: François Chollet research → https://goo.gle/443V3vG Deep Learning With Python, Second Edition → https://goo.gle/3UnpdH1 Intelligence: On Intelligence: How a New Understanding of the Brain Will Lead to the Creation of Truly Intelligent Machines → https://goo.gle/3xDE33s Researcher Pierre-Yves Oudeyer → https://goo.gle/3W8a39V Monty Hall Challenge → https://goo.gle/3VYXAW5 Machine Learning: Keras 3 → https://goo.gle/3JqRgis Gemma on Keras → https://goo.gle/49Q0pfy The ARC challenge on Kaggle → https://goo.gle/3xQsDcr
duration: 01:30:27 - Le grand dimanche soir - by: Charline Vanhoenacker - Tonight we welcome Christelle Chollet, appearing on stage at the Théâtre de la Tour Eiffel in "L'Empiafée", a show that certainly doesn't lack bite. Then it's over to live music from the rockers of Johnny Montreuil, with two brand-new tracks, including one cover... 100% fun!
The Lead... Let's talk about Errol Spence Jr. vs Terence Crawford.
PPV Undercard: Isaac Cruz vs Giovanni Cabrera, Nonito Donaire vs Alexandro Santiago, and Tellez vs Sergio Garcia. Prelims on YouTube feature two bouts: Steven Nelson, a close friend of Crawford, will face veteran Rowdy Montgomery, and Jose Salas Reyes faces world title challenger Aston Palicte... In other notable bouts, Jabín Chollet and Michael Portales duel in what should be a competitive matchup, and Justin Viloria, who looks really good, is on the undercard too.
Results: George Kambosos Jnr vs Maxi Hughes is discussed at length, as well as Keyshawn Davis - what is the expectation for Davis in the short term and the long term? We also talk about notable performances including Giovani Santillan versus Erick Bone, Jeremiah Milton, and Troy Isley. Thompson Boxing closed up shop on Friday night; they had fighters like Timothy Bradley Jr. and Daniel Roman. Lee McGregor lost, as Erik Robles upset him.
News:
- What is the future of Devin Haney? The WBC extends the deadline for him... what is his next fight?
- Arnold Barboza Jr leaves Top Rank
- Tyson Fury vs Francis Ngannou thoughts
Next Week:
On ESPN+ Tuesday: unified super bantamweight world champion Stephen Fulton Jr. versus Naoya Inoue. Thoughts on Fulton's trainer questioning Inoue's wraps... Robeisy Ramirez on the undercard...
On ESPN+ Friday:
- Seniesa Estrada vs. Leonela Paola Yudica, for Estrada's WBC/WBA women's strawweight title
- Andres Cortes vs. Xavier Martinez
- Abraham Nova vs. Jonathan Romero
- Rohan Polanco vs. Cesar Francis
- Karlos Balderas vs. Nahir Albright
- Dante Benjamin and Charlie Sheehy are also on the card... plus Jaylan Phillips gets an A-side fight.
[Terence Crawford and Errol Spence Jr - Photo Credit: Alex Sanchez / Showtime]
[Kambosos-Hughes - Photo: Mikey Williams / Top Rank Inc]
Timestamps:
0:00 The awful decision of Kambosos vs Maxi Hughes
11:00 Keyshawn Davis vs Francisco Patera
15:50 Giovani Santillan vs Erick Bone
15:25 Stephan Shaw vs Joe Goodall
27:00 Troy Isley and Jeremiah Milton
31:00 Devin Haney title status
38:40 Stephen Fulton vs Naoya Inoue
34:00 Arnold Barboza Jr leaves Top Rank
35:58 Tyson Fury vs a guy named Ngannou
53:40 Errol Spence Jr vs Terence Crawford
01:01:00 Isaac Cruz vs Giovanni Cabrera
01:05:08 Nonito Donaire vs Alexandro Santiago
01:08:50 YouTube Stream prelims
01:10:00 Seniesa Estrada vs. Leonela Paola Yudica
01:14:00 Andres Cortes vs Xavier Martinez
01:15:20 Top Rank undercard
01:17:00 Outro
In this episode of Intelligence Matters, host Michael Morell speaks with State Department Counselor Derek Chollet about the state of the war in Ukraine as it enters its second year. Morell and Chollet discuss the implications of a deepening relationship between Russia and Iran as well as Russia and China, which the U.S. recently warned against providing material aid to Moscow. Chollet also provides new insights into the newly tense relationship between Washington and Beijing, following the shootdown of a Chinese surveillance balloon. He outlines the Biden administration's approach to managing Iran's nuclear ambitions after the earlier collapse of nuclear talks. See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.