If we want to make progress toward AGI, we need a clear definition of intelligence—and a way to measure it. In this episode, Hugo talks with Greg Kamradt, President of the ARC Prize Foundation, about ARC-AGI: a benchmark built on Francois Chollet's definition of intelligence as “the efficiency at which you learn new things.” Unlike most evals that focus on memorization or task completion, ARC is designed to measure generalization—and expose where today's top models fall short. They discuss:
You think you know your customers like the back of your hand… but are you sure you're not relying purely on assumptions? In this episode, we welcome Émilie Chollet, a specialist in personas and customer behavior analysis, to help you run a persona survey that is useful, clear, and free of missteps. With her, we break down the key steps for interviewing your customers effectively, understanding their real needs, and refining your marketing strategy.
PROGRAM:
Why persona surveys shouldn't be reserved for big companies
When to run a customer study (and when there's no need to redo everything)
How to ask the right questions (without influencing the answers)
The classic mistakes to avoid so you don't bias your results
Qualitative interview or questionnaire: how to choose?
Our step-by-step method for collecting and analyzing the data without drowning in it
An ultra-concrete episode to stop guessing and finally make marketing decisions based on reliable data.
ABOUT ÉMILIE CHOLLET
LinkedIn
In this fascinating episode, we dive deep into the race towards true AI intelligence, AGI benchmarks, test-time adaptation, and program synthesis with star AI researcher (and philosopher) Francois Chollet, creator of Keras and the ARC AGI benchmark, and Mike Knoop, co-founder of Zapier and now co-founder with Francois of both the ARC Prize and the research lab Ndea. With the launch of ARC Prize 2025 and ARC-AGI 2, they explain why existing LLMs fall short on true intelligence tests, how new models like O3 mark a step change in capabilities, and what it will really take to reach AGI.
We cover everything from the technical evolution of ARC 1 to ARC 2, the shift toward test-time reasoning, and the role of program synthesis as a foundation for more general intelligence. The conversation also explores the philosophical underpinnings of intelligence, the structure of the ARC Prize, and the motivation behind launching Ndea, a new AGI research lab that aims to build a "factory for rapid scientific advancement." Whether you're deep in the AI research trenches or just fascinated by where this is all headed, this episode offers clarity and inspiration.
Ndea
Website - https://ndea.com
X/Twitter - https://x.com/ndea
ARC Prize
Website - https://arcprize.org
X/Twitter - https://x.com/arcprize
François Chollet
LinkedIn - https://www.linkedin.com/in/fchollet
X/Twitter - https://x.com/fchollet
Mike Knoop
X/Twitter - https://x.com/mikeknoop
FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
(00:00) Intro
(01:05) Introduction to ARC Prize 2025 and ARC-AGI 2
(02:07) What is ARC and how it differs from other AI benchmarks
(02:54) Why current models struggle with fluid intelligence
(03:52) Shift from static LLMs to test-time adaptation
(04:19) What ARC measures vs. traditional benchmarks
(07:52) Limitations of brute-force scaling in LLMs
(13:31) Defining intelligence: adaptation and efficiency
(16:19) How O3 achieved a massive leap in ARC performance
(20:35) Speculation on O3's architecture and test-time search
(22:48) Program synthesis: what it is and why it matters
(28:28) Combining LLMs with search and synthesis techniques
(34:57) The ARC Prize structure: efficiency track, private vs. public
(42:03) Open source as a requirement for progress
(44:59) What's new in ARC-AGI 2 and human benchmark testing
(48:14) Capabilities ARC-AGI 2 is designed to test
(49:21) When will ARC-AGI 2 be saturated? AGI timelines
(52:25) Founding of NDEA and why now
(54:19) Vision beyond AGI: a factory for scientific advancement
(56:40) What NDEA is building and why it's different from LLM labs
(58:32) Hiring and remote-first culture at NDEA
(59:52) Closing thoughts and the future of AI research
OpenAI brings revolutionary images to ChatGPT with perfect text rendering and character consistency. Google surprises everyone with Gemini 2.5 Pro, which dominates every AI benchmark and offers a one-million-token context. François Chollet challenges AI once again with the "impossible" ARC-AGI-2 test, on which humans still win. And this week's main topic: AI and work. AI is making its way into every profession. Robotization first hit only the people standing at assembly lines, but the pressure is now felt in every office. Yet we have a choice. Do we let the machines take over everything that makes us human? It is no longer a theoretical question, and today we dive deep into it. How are you preparing for the future of work with AI?
Curious about the webinar? Read more at aireport.email/webinar
If you would like a talk on AI by Wietse or Alexander, that can be arranged. Email us at lezing@aireport.email
Want to stay up to date with the latest AI news and receive tips & tools twice a week to get the most out of AI (and join the webinar)? Subscribe to our newsletter at aireport.email
Want to start using AI in your company today? Go to deptagency.com/aireport
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.aireport.email/subscribe
We are joined by Francois Chollet and Mike Knoop to launch the new version of the ARC prize! In version 2, the challenges have been calibrated with humans such that at least 2 humans could solve each task in a reasonable amount of time, but also adversarially selected so that frontier reasoning models can't solve them. The best LLMs today get negligible performance on this challenge. https://arcprize.org/
SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***
TRANSCRIPT:
https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0
TOC:
1. ARC v2 Core Design & Objectives
[00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture
[00:03:16] 1.2 Test-Time Optimization and AGI Assessment
[00:06:24] 1.3 Human-AI Capability Analysis
[00:13:02] 1.4 OpenAI o3 Initial Performance Results
2. ARC Technical Evolution
[00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements
[00:21:12] 2.2 Human Validation Methodology
[00:26:05] 2.3 Task Design and Gaming Prevention
[00:29:11] 2.4 Intelligence Measurement Framework
3. O3 Performance & Future Challenges
[00:38:50] 3.1 O3 Comprehensive Performance Analysis
[00:43:40] 3.2 System Limitations and Failure Modes
[00:49:30] 3.3 Program Synthesis Applications
[00:53:00] 3.4 Future Development Roadmap
REFS:
[00:00:15] On the Measure of Intelligence, François Chollet
https://arxiv.org/abs/1911.01547
[00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop
https://arcprize.org/
[00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team
https://arcprize.org/blog/oai-o3-pub-breakthrough
[00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al.
https://arxiv.org/abs/2201.11903
[00:21:45] ARC-v2 benchmark tasks, Mike Knoop
https://arcprize.org/blog/introducing-arc-agi-public-leaderboard
[00:26:05] ARC Prize 2024: Technical Report, Francois Chollet et al.
https://arxiv.org/html/2412.04604v2
[00:32:45] ARC Prize 2024 Technical Report, Francois Chollet, Mike Knoop, Gregory Kamradt
https://arxiv.org/abs/2412.04604
[00:48:55] The Bitter Lesson, Rich Sutton
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
[00:53:30] Decoding strategies in neural text generation, Sina Zarrieß
https://www.mdpi.com/2078-2489/12/9/355/pdf
Mohamed Osman joins to discuss MindsAI's highest scoring entry to the ARC challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and the flexibility of the network. (A minimal sketch of the augmentation-and-voting idea follows these show notes.)
SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***
TRANSCRIPT + REFS:
https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0
Mohamed Osman (Tufa Labs) https://x.com/MohamedOsmanML
Jack Cole (Tufa Labs) https://x.com/MindsAI_Jack
How and why deep learning for ARC paper:
https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf
TOC:
1. Abstract Reasoning Foundations
[00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview
[00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning
[00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture
[00:20:26] 1.4 Technical Implementation with Long T5 Model
2. ARC Solution Architectures
[00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions
[00:27:54] 2.2 Model Generalization and Function Generation Challenges
[00:32:53] 2.3 Input Representation and VLM Limitations
[00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration
[00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches
3. Advanced Systems Integration
[00:43:00] 3.1 DreamCoder Evolution and LLM Integration
[00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs
[00:54:15] 3.3 ARC v2 Development and Performance Scaling
[00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations
[01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution
REFS:
[00:01:32] Original ARC challenge paper, François Chollet
https://arxiv.org/abs/1911.01547
[00:06:55] DreamCoder, Kevin Ellis et al.
https://arxiv.org/abs/2006.08381
[00:12:50] Deep Learning with Python, François Chollet
https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438
[00:13:35] Influence of pretraining data for reasoning, Laura Ruis
https://arxiv.org/abs/2411.12580
[00:17:50] Latent Program Networks, Clement Bonnet
https://arxiv.org/html/2411.08706v1
[00:20:50] T5, Colin Raffel et al.
https://arxiv.org/abs/1910.10683
[00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al.
https://arxiv.org/abs/2411.02272
[00:34:15] Six finger problem, Chen et al.
https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf
[00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI
https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
[00:40:10] ARC Prize 2024 Technical Report, François Chollet et al.
https://arxiv.org/html/2412.04604v2
[00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis
https://arxiv.org/html/2503.15540
[00:54:25] Abstraction and Reasoning Corpus, François Chollet
https://github.com/fchollet/ARC-AGI
[00:57:10] O3 breakthrough on ARC-AGI, OpenAI
https://arcprize.org/
[00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell
https://arxiv.org/abs/2305.07141
[01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Yang, Zhilin and Dai, Zihang and Salakhutdinov, Ruslan and Cohen, William W.
http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf
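The ensemble voting mechanism mentioned above is not spelled out in these notes; what follows is a model-agnostic sketch, under our own assumptions, of one common ARC ensembling pattern: predict on several symmetry-transformed views of the input grid, map each prediction back to the original frame, and keep the most frequent candidate. The predict function here is a stand-in stub for any trained (and possibly test-time fine-tuned) model, not MindsAI's actual system.

    from collections import Counter

    def transpose(g):
        return [list(r) for r in zip(*g)]

    def flip_h(g):
        return [row[::-1] for row in g]

    # (forward transform, inverse transform) pairs for a few grid symmetries.
    VIEWS = [
        (lambda g: g, lambda g: g),              # identity
        (transpose, transpose),                  # self-inverse
        (flip_h, flip_h),                        # self-inverse
        (lambda g: flip_h(transpose(g)),         # composed view...
         lambda g: transpose(flip_h(g))),        # ...and its inverse
    ]

    def predict(grid):
        # Stand-in for a trained, possibly test-time fine-tuned model.
        return grid

    def vote(grid):
        candidates = []
        for fwd, inv in VIEWS:
            out = predict(fwd(grid))                        # predict on the view
            candidates.append(tuple(map(tuple, inv(out))))  # back to original frame
        best, _ = Counter(candidates).most_common(1)[0]     # majority vote
        return [list(row) for row in best]

    print(vote([[1, 2], [3, 4]]))  # -> [[1, 2], [3, 4]] with the identity stub

With a real model, disagreements between views are common, which is exactly what the vote is meant to smooth over.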
fWotD Episode 2861: Leroy Chollet
Welcome to Featured Wiki of the Day, your daily dose of knowledge from Wikipedia's finest articles. The featured article for Wednesday, 5 March 2025 is Leroy Chollet.
Leroy Patrick Chollet (March 5, 1925 – June 10, 1998) was an American professional basketball player. Chollet and his brothers attended Holy Cross School in New Orleans and excelled in sports. After a year in the United States Navy, Chollet enrolled at Loyola University New Orleans and led the Loyola Wolf Pack to their first NAIA men's basketball championship in 1945. Louisiana schools were segregated at the time. Chollet had an African American great-grandparent, and when this was revealed he was pressured into leaving Loyola. He moved to New York and played three seasons for Canisius College. In New York, he passed as white; Canisius would later claim Chollet to be the school's first African American basketball player.
Chollet played for several professional teams, including the Syracuse Nationals. During the inaugural season of the National Basketball Association (NBA), he became a role player behind established veterans, and the team made it to the 1950 NBA Finals. An ankle injury limited Chollet's second year in the NBA. The Elmira Colonels, an American Basketball League team, signed Chollet for his third and final season. He married Barbara Knaus in June 1950. After retiring from professional basketball in 1952, he moved to her hometown, Lakewood, Ohio. They had three children: Lawrence, Melanie, and David. In Lakewood, Chollet worked on the construction of St. Edward High School and became a teacher and varsity head coach. He was inducted into the Halls of Fame of Holy Cross School, Loyola University, and Canisius College. He died in 1998.
This recording reflects the Wikipedia text as of 00:30 UTC on Wednesday, 5 March 2025. For the full current version of the article, see Leroy Chollet on Wikipedia. This podcast uses content from Wikipedia under the Creative Commons Attribution-ShareAlike License. Visit our archives at wikioftheday.com and subscribe to stay updated on new episodes. Follow us on Mastodon at @wikioftheday@masto.ai. Also check out Curmudgeon's Corner, a current events podcast. Until next time, I'm standard Salli.
Our 197th episode with a summary and discussion of last week's big AI news! Recorded on 01/17/2025.
Join our brand new Discord here! https://discord.gg/nTyezGSKwP
Hosted by Andrey Kurenkov and guest-hosted by the folks from Latent Space. Read our text newsletter and comment on the podcast at https://lastweekin.ai/.
Sponsors: The Generator - An interdisciplinary AI lab empowering innovators from all fields to bring visionary ideas to life by harnessing the capabilities of artificial intelligence.
In this episode:
- Google and Mistral sign deals with AP and AFP, respectively, to deliver up-to-date news through their AI platforms.
- ChatGPT introduces a tasks feature for reminders and to-dos, positioning itself more as a personal assistant.
- Synthesia raises $180 million to enhance its AI video platform for generating videos of human avatars.
- New U.S. guidelines restrict exporting AI chips to various countries, impacting Nvidia and other tech firms.
If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form.
Timestamps + Links:
(00:00:00) Intro / Banter
(00:04:29) News Preview
(00:05:09) Response to listener comments
(00:05:58) Sponsor Break
Tools & Apps
(00:07:01) Google is making AI in Gmail and Docs free — but raising the price of Workspace
(00:07:52) Microsoft relaunches Copilot for business with free AI chat and pay-as-you-go agents
(00:12:36) Google signs deal with AP to deliver up-to-date news through its Gemini AI chatbot
(00:18:08) Mistral signs deal with AFP to offer up-to-date answers in Le Chat
(00:18:45) ChatGPT can now handle reminders and to-dos
Applications & Business
(00:22:53) Palmer Luckey's AI Defense Company Anduril Is Building a $1 Billion Plant in Ohio
(00:28:36) OpenAI is bankrolling Axios' expansion into four new markets
(00:29:39) AI researcher François Chollet founds a new AI lab focused on AGI
(00:32:18) Nvidia-backed AI video platform Synthesia doubles valuation to $2.1 billion
(00:34:46) Anysphere Raises $105M in Series B
(00:40:14) Harvey Valuation of 3 Billion
Projects & Open Source
(00:46:12) MiniMax-01: Scaling Foundation Models with Lightning Attention
(00:51:16) MinMo: A Multimodal Large Language Model with Approximately 8B Parameters for Seamless Voice Interaction
(00:53:01) HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
Research & Advancements
(00:57:03) Titans: Learning to Memorize at Test Time
(01:04:38) Transformer2: Self-adaptive LLMs
(01:08:15) Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Policy & Safety
(01:11:23) Biden administration proposes sweeping new restrictions on exporting AI chips
(01:13:56) Biden orders Energy, Defense departments to lease sites for AI data centers, clean energy generation
(01:15:00) OpenAI presents its preferred version of AI regulation in a new 'blueprint'
(01:16:15) More teens report using ChatGPT for schoolwork, despite the tech's faults
Synthetic Media & Art
(01:17:55) In AI copyright case, Zuckerberg turns to YouTube for his defense
(01:19:53) Outro
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
A Daily Chronicle of AI Innovations on January 16th 2025
Nintendo finally takes the wraps off the Switch 2. Everybody seems to want to give TikTok more time, but can they find a way to do it? A big new AI research lab. A check-in with Nothing. The company, I mean. And the weird story of when Walgreens tried to replace refrigerator doors with smart screens.
Sponsors:
TryJoyMode.com and code RIDE at checkout
Links:
Here's the Nintendo Switch 2 (The Verge)
Trump considers executive order hoping to 'save TikTok' from ban or sale in U.S. law (Washington Post)
Biden administration looks for ways to keep TikTok available in the U.S. (NBCNews)
AI researcher François Chollet founds a new AI lab focused on AGI (TechCrunch)
Phone Startup Nothing Raises Funding, Crosses $1 Billion in Lifetime Sales (Bloomberg)
Walgreens Replaced Fridge Doors With Smart Screens. It's Now a $200 Million Fiasco (Bloomberg)
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Thomas Isle and his crew bring you all the latest cultural news, with guests and in-depth analysis, entirely free of preconceptions, but not of kindness.
François Chollet discusses the outcomes of the ARC-AGI (Abstraction and Reasoning Corpus) Prize competition in 2024, where accuracy rose from 33% to 55.5% on a private evaluation set.
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? They are hosting an event in Zurich on January 9th with the ARChitects, join if you can. Go to https://tufalabs.ai/
***
Read about the recent result on o3 with ARC here (Chollet knew about it at the time of the interview but wasn't allowed to say): https://arcprize.org/blog/oai-o3-pub-breakthrough
TOC:
1. Introduction and Opening
[00:00:00] 1.1 Deep Learning vs. Symbolic Reasoning: François's Long-Standing Hybrid View
[00:00:48] 1.2 "Why Do They Call You a Symbolist?" – Addressing Misconceptions
[00:01:31] 1.3 Defining Reasoning
3. ARC Competition 2024 Results and Evolution
[00:07:26] 3.1 ARC Prize 2024: Reflecting on the Narrative Shift Toward System 2
[00:10:29] 3.2 Comparing Private Leaderboard vs. Public Leaderboard Solutions
[00:13:17] 3.3 Two Winning Approaches: Deep Learning–Guided Program Synthesis and Test-Time Training
4. Transduction vs. Induction in ARC
[00:16:04] 4.1 Test-Time Training, Overfitting Concerns, and Developer-Aware Generalization
[00:19:35] 4.2 Gradient Descent Adaptation vs. Discrete Program Search
5. ARC-2 Development and Future Directions
[00:23:51] 5.1 Ensemble Methods, Benchmark Flaws, and the Need for ARC-2
[00:25:35] 5.2 Human-Level Performance Metrics and Private Test Sets
[00:29:44] 5.3 Task Diversity, Redundancy Issues, and Expanded Evaluation Methodology
6. Program Synthesis Approaches
[00:30:18] 6.1 Induction vs. Transduction
[00:32:11] 6.2 Challenges of Writing Algorithms for Perceptual vs. Algorithmic Tasks
[00:34:23] 6.3 Combining Induction and Transduction
[00:37:05] 6.4 Multi-View Insight and Overfitting Regulation
7. Latent Space and Graph-Based Synthesis
[00:38:17] 7.1 Clément Bonnet's Latent Program Search Approach
[00:40:10] 7.2 Decoding to Symbolic Form and Local Discrete Search
[00:41:15] 7.3 Graph of Operators vs. Token-by-Token Code Generation
[00:45:50] 7.4 Iterative Program Graph Modifications and Reusable Functions
8. Compute Efficiency and Lifelong Learning
[00:48:05] 8.1 Symbolic Process for Architecture Generation
[00:50:33] 8.2 Logarithmic Relationship of Compute and Accuracy
[00:52:20] 8.3 Learning New Building Blocks for Future Tasks
9. AI Reasoning and Future Development
[00:53:15] 9.1 Consciousness as a Self-Consistency Mechanism in Iterative Reasoning
[00:56:30] 9.2 Reconciling Symbolic and Connectionist Views
[01:00:13] 9.3 System 2 Reasoning - Awareness and Consistency
[01:03:05] 9.4 Novel Problem Solving, Abstraction, and Reusability
10. Program Synthesis and Research Lab
[01:05:53] 10.1 François Leaving Google to Focus on Program Synthesis
[01:09:55] 10.2 Democratizing Programming and Natural Language Instruction
11. Frontier Models and O1 Architecture
[01:14:38] 11.1 Search-Based Chain of Thought vs. Standard Forward Pass
[01:16:55] 11.2 o1's Natural Language Program Generation and Test-Time Compute Scaling
[01:19:35] 11.3 Logarithmic Gains with Deeper Search
12. ARC Evaluation and Human Intelligence
[01:22:55] 12.1 LLMs as Guessing Machines and Agent Reliability Issues
[01:25:02] 12.2 ARC-2 Human Testing and Correlation with g-Factor
[01:26:16] 12.3 Closing Remarks and Future Directions
SHOWNOTES PDF:
https://www.dropbox.com/scl/fi/ujaai0ewpdnsosc5mc30k/CholletNeurips.pdf?rlkey=s68dp432vefpj2z0dp5wmzqz6&st=hazphyx5&dl=0
Today in AI is a daily recap of the latest news and developments in the AI industry. Have a story you'd like featured in an upcoming episode? Reach out at tonyphoang.com
The Consumer Electronics Show in Las Vegas revealed groundbreaking automotive innovations from companies like Hyundai Mobis, BMW, and Honda. These innovations include advanced holographic displays, highly customizable in-vehicle systems, and extensive electric vehicle charging networks aimed at enhancing the driving experience, improving safety, and promoting sustainability. Honda's AI-integrated prototypes promise to redefine vehicles as personal companions, potentially impacting mental health and privacy.
In the realm of AI wearables, Based Hardware of San Francisco introduced Omi, an affordable device designed to boost productivity through a brain interface and voice commands. With features emphasizing user privacy and an open-source platform for developers, Omi represents the next step in wearable tech. Grove AI, founded by Stanford engineers, is using their AI agent, Grace, to make clinical trial enrollments more efficient, reducing administrative burdens for patients and healthcare providers alike.
Waymo has responded to recent safety incidents involving their autonomous vehicles with significant enhancements to their technology and infrastructure. These improvements aim to address software updates, communication systems, and public safety concerns, ensuring regulatory compliance. Concurrently, security vulnerabilities in Automated License Plate Recognition systems have sparked debates over privacy and the potential misuse of surveillance technologies, highlighting the need for stricter data security measures.
Microsoft faced backlash after upgrading its Bing Image Creator with the new DALL-E 3 model, which led to decreased user satisfaction due to image quality and ethical concerns. The company had to revert to the previous model, underscoring the challenges of keeping technical advancements in line with user expectations and the wider implications for market competition and public trust. In parallel, the ARC Prize Foundation, co-founded by François Chollet and Greg Kamradt, seeks to create benchmarks for evaluating AI's journey towards human-level intelligence, driving progress in artificial general intelligence.
Small businesses are increasingly adopting generative AI to improve efficiency, counteract labor shortages, and maintain a competitive edge. While data security and accuracy remain challenges, AI is proving transformative across industries, promoting innovation and growth. However, the development of AGI and superintelligence by entities like OpenAI presents significant ethical, economic, and geopolitical challenges that must be addressed.
OpenAI's O3 system, the future engine of ChatGPT, marks a major turning point in artificial intelligence research. It recently scored 85% on the ARC-AGI test, a reference benchmark designed to evaluate AI systems' ability to generalize and adapt to new situations. This result, equal to the human average, far surpasses the 55% achieved by previous AIs, an advance that fuels hopes of approaching artificial general intelligence (AGI). The ARC-AGI test, developed by French researcher François Chollet, measures sample efficiency: the ability to solve novel problems from only a few examples. Concretely, the AI must analyze transformations applied to square grids, given three examples, and then generalize a rule to solve an additional case. O3 impressed by demonstrating an aptitude for identifying simple, generalizable rules. According to some experts, the system may work through "chains of thought," testing different solution steps before selecting the best one, a method close to that of AlphaGo, the Google AI that beat the world Go champion. But this enthusiasm comes with caution. OpenAI remains tight-lipped about O3's technical details and real capabilities, limiting its communications to a few preliminary tests. Some experts fear that this performance reflects optimization specific to the ARC-AGI test rather than a genuine capacity for generalization that transfers to other contexts. Broader evaluations will be needed to settle the question. If O3 demonstrates human-like adaptability across many domains, the repercussions could be revolutionary, opening the way to self-improving AIs with major societal impacts. Whether that promise becomes reality remains to be seen. (A toy sketch of the ARC task format follows this summary.) Hosted by Acast. Visit acast.com/privacy for more information.
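For readers who want to see concretely what the grid test described above looks like in code: the public ARC-AGI tasks are published as JSON files in François Chollet's https://github.com/fchollet/ARC-AGI repository, each containing a few train input/output grid pairs plus test pairs. The sketch below uses an invented toy task and an invented candidate rule purely for illustration; it shows only the shape of the data and the "consistent with all train pairs" check, not any real solver.

    # Grids are lists of lists of ints (colors 0-9), as in the public ARC JSON.
    toy_task = {
        "train": [
            {"input": [[1, 0], [0, 0]], "output": [[0, 0], [0, 1]]},
            {"input": [[0, 2], [0, 0]], "output": [[0, 0], [2, 0]]},
        ],
        "test": [{"input": [[0, 0], [3, 0]]}],
    }

    def rotate_180(grid):
        # Candidate rule (invented for this toy task): rotate the grid 180 degrees.
        return [row[::-1] for row in grid[::-1]]

    def consistent(rule, task):
        # A rule counts only if it reproduces every train output exactly.
        return all(rule(p["input"]) == p["output"] for p in task["train"])

    if consistent(rotate_180, toy_task):
        print(rotate_180(toy_task["test"][0]["input"]))  # -> [[0, 3], [0, 0]]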
duration: 00:02:20 - The Dakar update with Herbst and Chollet
Analysis of image classifiers demonstrates that it is possible to understand backprop networks at the task-relevant run-time algorithmic level. In these systems, at least, networks gain their power from deploying massive parallelism to check for the presence of a vast number of simple, shallow patterns. https://betterwithout.ai/images-surface-features
This episode has a lot of links:
David Chapman's earliest public mention, in February 2016, of image classifiers probably using color and texture in ways that "cheat": twitter.com/Meaningness/status/698688687341572096
Jordana Cepelewicz's "Where we see shapes, AI sees textures," Quanta Magazine, July 1, 2019: https://www.quantamagazine.org/where-we-see-shapes-ai-sees-textures-20190701/
"Suddenly, a leopard print sofa appears", May 2015: https://web.archive.org/web/20150622084852/http://rocknrollnerd.github.io/ml/2015/05/27/leopard-sofa.html
"Understanding How Image Quality Affects Deep Neural Networks", April 2016: https://arxiv.org/abs/1604.04004
Goodfellow et al., "Explaining and Harnessing Adversarial Examples," December 2014: https://arxiv.org/abs/1412.6572
"Universal adversarial perturbations," October 2016: https://arxiv.org/pdf/1610.08401v1.pdf
"Exploring the Landscape of Spatial Robustness," December 2017: https://arxiv.org/abs/1712.02779
"Overinterpretation reveals image classification model pathologies," NeurIPS 2021: https://proceedings.neurips.cc/paper/2021/file/8217bb4e7fa0541e0f5e04fea764ab91-Paper.pdf
"Approximating CNNs with Bag-of-Local-Features Models Works Surprisingly Well on ImageNet," ICLR 2019: https://openreview.net/forum?id=SkfMWhAqYQ
Baker et al.'s "Deep convolutional networks do not classify based on global object shape," PLOS Computational Biology, 2018: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006613
François Chollet's Twitter threads about AI producing images of horses with extra legs: twitter.com/fchollet/status/1573836241875120128 and twitter.com/fchollet/status/1573843774803161090
"Zoom In: An Introduction to Circuits," 2020: https://distill.pub/2020/circuits/zoom-in/
Geirhos et al., "ImageNet-Trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness," ICLR 2019: https://openreview.net/forum?id=Bygh9j09KX
Dehghani et al., "Scaling Vision Transformers to 22 Billion Parameters," 2023: https://arxiv.org/abs/2302.05442
Hasson et al., "Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks," February 2020: https://www.gwern.net/docs/ai/scaling/2020-hasson.pdf
Current AI practice is not engineering, even when it aims for practical applications, because it is not based on scientific understanding. Enforcing engineering norms on the field could lead to considerably safer systems. https://betterwithout.ai/AI-as-engineering
This episode has a lot of links! Here they are.
Michael Nielsen's "The role of 'explanation' in AI": https://michaelnotebook.com/ongoing/sporadica.html#role_of_explanation_in_AI
Subbarao Kambhampati's "Changing the Nature of AI Research": https://dl.acm.org/doi/pdf/10.1145/3546954
Chris Olah and his collaborators:
"Thread: Circuits": distill.pub/2020/circuits/
"An Overview of Early Vision in InceptionV1": distill.pub/2020/circuits/early-vision/
Dai et al., "Knowledge Neurons in Pretrained Transformers": https://arxiv.org/pdf/2104.08696.pdf
Meng et al.:
"Locating and Editing Factual Associations in GPT": rome.baulab.info
"Mass-Editing Memory in a Transformer": https://arxiv.org/pdf/2210.07229.pdf
François Chollet on image generators putting the wrong number of legs on horses: twitter.com/fchollet/status/1573879858203340800
Neel Nanda's "Longlist of Theories of Impact for Interpretability": https://www.lesswrong.com/posts/uK6sQCNMw8WKzJeCQ/a-longlist-of-theories-of-impact-for-interpretability
Zachary C. Lipton's "The Mythos of Model Interpretability": https://arxiv.org/abs/1606.03490
Meng et al., "Locating and Editing Factual Associations in GPT": https://arxiv.org/pdf/2202.05262.pdf
Belrose et al., "Eliciting Latent Predictions from Transformers with the Tuned Lens": https://arxiv.org/abs/2303.08112
"Progress measures for grokking via mechanistic interpretability": https://arxiv.org/abs/2301.05217
Conmy et al., "Towards Automated Circuit Discovery for Mechanistic Interpretability": https://arxiv.org/abs/2304.14997
Elhage et al., "Softmax Linear Units": transformer-circuits.pub/2022/solu/index.html
Filan et al., "Clusterability in Neural Networks": https://arxiv.org/pdf/2103.03386.pdf
Cammarata et al., "Curve circuits": distill.pub/2020/circuits/curve-circuits/
You can support the podcast and get episodes a week early, by supporting the Patreon: https://www.patreon.com/m/fluidityaudiobooks
If you like the show, consider buying me a coffee: https://www.buymeacoffee.com/mattarnold
Original music by Kevin MacLeod. This podcast is under a Creative Commons Attribution Non-Commercial International 4.0 License.
We welcomed Hakuro Matsuda as our guest and talked about Taiwan, Intel, the MacBook Pro, OpenAI, and more. Sponsor: SRE Kaigi
Show Notes:
Hub3
12/3 Podcaster Interview: Tatsuhiko Miyagawa [Rebuild] - LISTEN NEWS
Loop Experience 2 Plus earplugs
Intel's CEO is out after only three years
Intel, Biden-Harris Administration Finalize $7.86 Billion Funding...
Pat Gelsinger (@PGelsinger)
Introducing ChatGPT Pro
Perplexity Pro|LINEMO
World Labs
AI pioneer François Chollet leaves Google
Home - SIGGRAPH Asia 2024
Popular video game company to close SF studio amid mass layoffs
桜井政博のゲーム作るには (Masahiro Sakurai on Creating Games)
デッドデッドデーモンズデデデデデストラクション (Dead Dead Demon's Dededede Destruction)
ダンダダン (Dandadan)
正体
本心
Gladiator II
機動戦士Gundam GQuuuuuuX
Ghost of Yōtei is coming in 2025
HD-2D版 ドラゴンクエスト (Dragon Quest HD-2D remake)
198Xのファミコン狂騒曲
中山美穂のトキメキハイスクール (Nakayama Miho no Tokimeki High School)
SRE Kaigi 2025 ★
A no-longer-so-anonymous workaholic
Copy chief at Le Monde Diplomatique from 2007 to 2022, Mona Chollet describes herself, euphemistically, as "rather conscientious." Interviewed by Femme Actuelle, the journalist explains: "The robotic side of salaried work suited me very well. As did that reassuring logic of effort rewarded: I knew I had earned the right to enjoy my weekends." Yet when the success of her books allowed her to free herself from that day job, panic set in, and it is on this panic that her latest essay, "Résister à la culpabilisation" (La Découverte, 2024), opens. This cerebral "bulldozer" adds: "I had forgotten what autonomy was. I had grown used to being told every morning where to go, what to do, and until what time. Organizing your own days causes great distress. I forced myself to work eight hours a day, and on weekends, so as not to let myself go (…) Working yourself to death, completely disregarding your well-being, turns out to be well regarded." Her own insights are well regarded too. With a first print run of 70,000 copies, "the new Mona Chollet," for which she is declining invitations to speak in public, already ranks among the ten best-selling books of the fall. Her book is not only about self-sacrifice at work; among what she catalogs as "impediments to existing," Chollet dissects misogynistic discourse, the blaming of victims of sexual violence, parenting injunctions, and "the policing of words and thoughts" within activist circles.
Followed by 92,000 subscribers on X, Mona Chollet sometimes describes her relationship to writing as "a drug in itself, a hidden door within the horror of our era." For this third and final episode, let us open the door to Mona's small, monastic study; she still dreams of a bigger room "whose window would stay lit until late into the night, for bringing books into the world."
The author of the month: Mona Chollet
Born in Geneva in 1973, "obsessed with reading, staying informed, and changing the world," the Swiss journalist Mona Chollet has become, for a whole generation of feminists, a model of intelligence, sensitivity, and precision. Since the early 2000s, across some ten erudite essays ("Beauté fatale," "Sorcières," "Réinventer l'amour"), she has brilliantly analyzed the mechanisms of domination (male, capitalist, professional, or all three at once), while sharing her admiration for the poetry of Mahmoud Darwich and the politically engaged prose of Susan Sontag, and for the series "Mad Men" and "The Marvelous Mrs. Maisel," all interwoven with personal confidences and those of her circle of friends. She lives and works in Paris.
Recordings: September 2024 - Production: Charlie Marcelet - Mixing: Charlie Marcelet - Illustration: Sylvain Cabot - Vocals, beatmaking: Élodie Milo - Original music: Samuel Hirsch - Interview, editing: Richard Gaitet - Sound recording: Mathilde Guermonprez - Editing: Gary Salin - Readings: Delphine Saltel - Production: ARTE Radio
Our beloved witch
In 2017, in the nocturnal secrecy of her laboratory, Mona Chollet threw into her mental cauldron the ingredients for rehabilitating a popular figure: the witch. Published the following year by éditions La Découverte, her book "Sorcières : la puissance invaincue des femmes" recalls the tens of thousands of femicides perpetrated from the 15th to the 17th century in Europe, which mainly targeted unmarried, childless women. Chollet deeply examines this "blow struck against all inclinations toward female independence," the "hatred" of gray hair, and the criminalization of contraception and abortion, drawing as much on Toni Morrison's novels as on the film "Fatal Attraction." There she refines her approach: "I write to bring out subjects that sometimes had not even been identified, by asserting their pertinence, their dignity. I am an amiable, well-brought-up bourgeoise, and drawing attention to myself embarrasses me. I step out of line when I cannot do otherwise, when my convictions and aspirations demand it. I write to give myself courage." Abracadabra! The book became a reference grimoire, translated into fifteen languages and selling 380,000 copies. Her name turned into an incantation. Hence the need to examine her spells: the structure of her best-sellers, which she situates "between self-help and politics," her use of quotations, and her reluctance toward "fieldwork," moving from the catwalks of "Beauté fatale" (on the clichés propagated by the fashion industry and women's magazines, published in 2012, 120,000 copies sold) to "Réinventer l'amour" (on the dead ends and the violence of heterosexual relationships, published in 2021, 200,000 copies sold), by way of her personal favorite, "Chez soi" (on "the wisdom of homebodies," published in 2015, 65,000 copies sold). Turlututu, pointy hat, let's wait no longer: let us fly off on the broomstick of this beloved witch, who sweeps the dust off many a musty idea!
The author of the month: Mona Chollet
Born in Geneva in 1973, "obsessed with reading, staying informed, and changing the world," the Swiss journalist Mona Chollet has become, for a whole generation of feminists, a model of intelligence, sensitivity, and precision. Since the early 2000s, across some ten erudite essays ("Beauté fatale," "Sorcières," "Réinventer l'amour"), she has brilliantly analyzed the mechanisms of domination (male, capitalist, professional, or all three at once), while sharing her admiration for the poetry of Mahmoud Darwich and the politically engaged prose of Susan Sontag, and for the series "Mad Men" and "The Marvelous Mrs. Maisel," all interwoven with personal confidences and those of her circle of friends. She lives and works in Paris.
Recordings: September 2024 - Production: Charlie Marcelet - Mixing: Charlie Marcelet - Illustration: Sylvain Cabot - Vocals, beatmaking: Élodie Milo - Original music: Samuel Hirsch - Interview, editing: Richard Gaitet - Sound recording: Mathilde Guermonprez - Editing: Gary Salin - Readings: Delphine Saltel - Production: ARTE Radio
duration: 00:02:08 - Guillaume Chollet - motorcycle rider in the Dakar Rally
Francois Chollet, a prominent AI expert and creator of ARC-AGI, discusses intelligence, consciousness, and artificial intelligence. Chollet explains that real intelligence isn't about memorizing information or having lots of knowledge - it's about being able to handle new situations effectively. This is why he believes current large language models (LLMs) have "near-zero intelligence" despite their impressive abilities. They're more like sophisticated memory and pattern-matching systems than truly intelligent beings.
***
MLST IS SPONSORED BY TUFA AI LABS!
The current winners of the ARC challenge, MindsAI, are part of Tufa AI Labs. They are hiring ML engineers. Are you interested?! Please go to https://tufalabs.ai/
***
He introduced his "Kaleidoscope Hypothesis," which suggests that while the world seems infinitely complex, it's actually made up of simpler patterns that repeat and combine in different ways. True intelligence, he argues, involves identifying these basic patterns and using them to understand new situations.
Chollet also talked about consciousness, suggesting it develops gradually in children rather than appearing all at once. He believes consciousness exists in degrees - animals have it to some extent, and even human consciousness varies with age and circumstances (like being more conscious when learning something new versus doing routine tasks).
On AI safety, Chollet takes a notably different stance from many in Silicon Valley. He views AGI development as a scientific challenge rather than a religious quest, and doesn't share the apocalyptic concerns of some AI researchers. He argues that intelligence itself isn't dangerous - it's just a tool for turning information into useful models. What matters is how we choose to use it.
ARC-AGI Prize: https://arcprize.org/
Francois Chollet: https://x.com/fchollet
Shownotes: https://www.dropbox.com/scl/fi/j2068j3hlj8br96pfa7bi/CHOLLET_FINAL.pdf?rlkey=xkbr7tbnrjdl66m246w26uc8k&st=0a4ec4na&dl=0
TOC:
1. Intelligence and Model Building
[00:00:00] 1.1 Intelligence Definition and ARC Benchmark
[00:05:40] 1.2 LLMs as Program Memorization Systems
[00:09:36] 1.3 Kaleidoscope Hypothesis and Abstract Building Blocks
[00:13:39] 1.4 Deep Learning Limitations and System 2 Reasoning
[00:29:38] 1.5 Intelligence vs. Skill in LLMs and Model Building
2. ARC Benchmark and Program Synthesis
[00:37:36] 2.1 Intelligence Definition and LLM Limitations
[00:41:33] 2.2 Meta-Learning System Architecture
[00:56:21] 2.3 Program Search and Occam's Razor
[00:59:42] 2.4 Developer-Aware Generalization
[01:06:49] 2.5 Task Generation and Benchmark Design
3. Cognitive Systems and Program Generation
[01:14:38] 3.1 System 1/2 Thinking Fundamentals
[01:22:17] 3.2 Program Synthesis and Combinatorial Challenges
[01:31:18] 3.3 Test-Time Fine-Tuning Strategies
[01:36:10] 3.4 Evaluation and Leakage Problems
[01:43:22] 3.5 ARC Implementation Approaches
4. Intelligence and Language Systems
[01:50:06] 4.1 Intelligence as Tool vs Agent
[01:53:53] 4.2 Cultural Knowledge Integration
[01:58:42] 4.3 Language and Abstraction Generation
[02:02:41] 4.4 Embodiment in Cognitive Systems
[02:09:02] 4.5 Language as Cognitive Operating System
5. Consciousness and AI Safety
[02:14:05] 5.1 Consciousness and Intelligence Relationship
[02:20:25] 5.2 Development of Machine Consciousness
[02:28:40] 5.3 Consciousness Prerequisites and Indicators
[02:36:36] 5.4 AGI Safety Considerations
[02:40:29] 5.5 AI Regulation Framework
duration: 01:00:33 - Être et savoir - by Louise Tourret - Can we educate without guilt-tripping (ourselves and our children)? - production: Peire Legras - guests: Mona Chollet, journalist and essayist
Alessandro Palmarini is a post-baccalaureate researcher at the Santa Fe Institute working under the supervision of Melanie Mitchell. He completed his undergraduate degree in Artificial Intelligence and Computer Science at the University of Edinburgh. Palmarini's current research focuses on developing AI systems that can efficiently acquire new skills from limited data, inspired by François Chollet's work on measuring intelligence. His work builds upon the DreamCoder program synthesis system, introducing a novel approach called "dream decompiling" to improve library learning in inductive program synthesis. Palmarini is particularly interested in addressing the Abstraction and Reasoning Corpus (ARC) challenge, aiming to create AI systems that can perform abstract reasoning tasks more efficiently than current approaches. His research explores the balance between computational efficiency and data efficiency in AI learning processes.
DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?
MLST is sponsored by Tufa Labs:
Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more.
Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2.
Interested? Apply for an ML research position: benjamin@tufa.ai
TOC:
1. Intelligence Measurement in AI Systems
[00:00:00] 1.1 Defining Intelligence in AI Systems
[00:02:00] 1.2 Research at Santa Fe Institute
[00:04:35] 1.3 Impact of Gaming on AI Development
[00:05:10] 1.4 Comparing AI and Human Learning Efficiency
2. Efficient Skill Acquisition in AI
[00:06:40] 2.1 Intelligence as Skill Acquisition Efficiency
[00:08:25] 2.2 Limitations of Current AI Systems in Generalization
[00:09:45] 2.3 Human vs. AI Cognitive Processes
[00:10:40] 2.4 Measuring AI Intelligence: Chollet's ARC Challenge
3. Program Synthesis and ARC Challenge
[00:12:55] 3.1 Philosophical Foundations of Program Synthesis
[00:17:14] 3.2 Introduction to Program Induction and ARC Tasks
[00:18:49] 3.3 DreamCoder: Principles and Techniques
[00:27:55] 3.4 Trade-offs in Program Synthesis Search Strategies
[00:31:52] 3.5 Neural Networks and Bayesian Program Learning
4. Advanced Program Synthesis Techniques
[00:32:30] 4.1 DreamCoder and Dream Decompiling Approach
[00:39:00] 4.2 Beta Distribution and Caching in Program Synthesis
[00:45:10] 4.3 Performance and Limitations of Dream Decompiling
[00:47:45] 4.4 Alessandro's Approach to ARC Challenge
[00:51:12] 4.5 Conclusion and Future Discussions
Refs: Full reference list in the YouTube video description, Show Notes, and MP3 metadata
Show Notes: https://www.dropbox.com/scl/fi/x50201tgqucj5ba2q4typ/Ale.pdf?rlkey=0ubvk7p5gtyx1gpownpdadim8&st=5pniu3nq&dl=0
François Chollet discusses the limitations of Large Language Models (LLMs) and proposes a new approach to advancing artificial intelligence. He argues that current AI systems excel at pattern recognition but struggle with logical reasoning and true generalization. This was Chollet's keynote talk at AGI-24, filmed in high quality. We will be releasing a full interview with him shortly. A teaser clip from that is played in the intro!
Chollet introduces the Abstraction and Reasoning Corpus (ARC) as a benchmark for measuring AI progress towards human-like intelligence. He explains the concept of abstraction in AI systems and proposes combining deep learning with program synthesis to overcome current limitations. Chollet suggests that breakthroughs in AI might come from outside major tech labs and encourages researchers to explore new ideas in the pursuit of artificial general intelligence. (A toy program-search sketch follows these show notes.)
TOC
1. LLM Limitations and Intelligence Concepts
[00:00:00] 1.1 LLM Limitations and Composition
[00:12:05] 1.2 Intelligence as Process vs. Skill
[00:17:15] 1.3 Generalization as Key to AI Progress
2. ARC-AGI Benchmark and LLM Performance
[00:19:59] 2.1 Introduction to ARC-AGI Benchmark
[00:20:05] 2.2 Introduction to ARC-AGI and the ARC Prize
[00:23:35] 2.3 Performance of LLMs and Humans on ARC-AGI
3. Abstraction in AI Systems
[00:26:10] 3.1 The Kaleidoscope Hypothesis and Abstraction Spectrum
[00:30:05] 3.2 LLM Capabilities and Limitations in Abstraction
[00:32:10] 3.3 Value-Centric vs Program-Centric Abstraction
[00:33:25] 3.4 Types of Abstraction in AI Systems
4. Advancing AI: Combining Deep Learning and Program Synthesis
[00:34:05] 4.1 Limitations of Transformers and Need for Program Synthesis
[00:36:45] 4.2 Combining Deep Learning and Program Synthesis
[00:39:59] 4.3 Applying Combined Approaches to ARC Tasks
[00:44:20] 4.4 State-of-the-Art Solutions for ARC
Shownotes (new!): https://www.dropbox.com/scl/fi/i7nsyoahuei6np95lbjxw/CholletKeynote.pdf?rlkey=t3502kbov5exsdxhderq70b9i&st=1ca91ewz&dl=0
[0:01:15] Abstraction and Reasoning Corpus (ARC): AI benchmark (François Chollet) https://arxiv.org/abs/1911.01547
[0:05:30] Monty Hall problem: Probability puzzle (Steve Selvin) https://www.tandfonline.com/doi/abs/10.1080/00031305.1975.10479121
[0:06:20] LLM training dynamics analysis (Tirumala et al.) https://arxiv.org/abs/2205.10770
[0:10:20] Transformer limitations on compositionality (Dziri et al.) https://arxiv.org/abs/2305.18654
[0:10:25] Reversal Curse in LLMs (Berglund et al.) https://arxiv.org/abs/2309.12288
[0:19:25] Measure of intelligence using algorithmic information theory (François Chollet) https://arxiv.org/abs/1911.01547
[0:20:10] ARC-AGI: GitHub repository (François Chollet) https://github.com/fchollet/ARC-AGI
[0:22:15] ARC Prize: $1,000,000+ competition (François Chollet) https://arcprize.org/
[0:33:30] System 1 and System 2 thinking (Daniel Kahneman) https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555
[0:34:00] Core knowledge in infants (Elizabeth Spelke) https://www.harvardlds.org/wp-content/uploads/2017/01/SpelkeKinzler07-1.pdf
[0:34:30] Embedding interpretive spaces in ML (Tennenholtz et al.) https://arxiv.org/abs/2310.04475
[0:44:20] Hypothesis Search with LLMs for ARC (Wang et al.) https://arxiv.org/abs/2309.05660
[0:44:50] Ryan Greenblatt's high score on ARC public leaderboard https://arcprize.org/
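As a concrete, heavily simplified illustration of the program-synthesis direction Chollet advocates in the talk: the sketch below brute-force searches compositions of a tiny invented grid DSL for a program consistent with the training pairs, preferring shorter programs. The primitives and the toy task are our own assumptions, not Chollet's; real systems guide this search with deep learning rather than enumerating blindly.

    from itertools import product

    # A tiny DSL of grid primitives; real DSLs are far richer.
    def identity(g):
        return g

    def transpose(g):
        return [list(r) for r in zip(*g)]

    def flip_v(g):
        return g[::-1]

    PRIMITIVES = [identity, transpose, flip_v]

    def synthesize(train_pairs, max_depth=3):
        # Shortest-program-first enumeration: a crude Occam bias.
        for depth in range(1, max_depth + 1):
            for program in product(PRIMITIVES, repeat=depth):
                def run(g, prog=program):
                    for step in prog:
                        g = step(g)
                    return g
                if all(run(i) == o for i, o in train_pairs):
                    return run
        return None

    train = [([[1, 2], [3, 4]], [[3, 1], [4, 2]])]  # toy task: rotate 90 deg clockwise
    solver = synthesize(train)
    print(solver([[5, 6], [7, 8]]))                 # -> [[7, 5], [8, 6]]

The search discovers that flip-then-transpose reproduces the rotation, and the returned program generalizes to the unseen grid; scaling this idea to ARC is exactly where learned guidance becomes necessary.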
On this episode, I talk to Matt Kirkland, Founder and Designer at Brand New Box. We talk about how to get good clients, the utility of reminding people that you exist, reading science fiction, ChatGPT as highly advanced autocomplete, reading history, the limits of ChatGPT-style AI as compared to AGI, and Matt's Dracula read-through newsletter Dracula Daily.
Philip K. Dick on Goodreads
François Chollet on Sean Carroll's Mindscape Podcast
Dracula Daily
MattKirkland.com
Brand New Box
Matt Kirkland on Twitter
Matt Kirkland on LinkedIn
In this second special episode of Business of Bouffe, recorded in Bordeaux for the "Bordeaux Fête le Vin" event, we talk in more detail about how wine is made. Our guests are once again a duo of enthusiasts. One is a winegrower: her name is Stéphanie Chollet, and she courageously retrained for a career in viticulture. The other is an oenologist: her name is Ophélie Michaud, and she works for the Mouton Cadet house, in close collaboration with partner winegrowers like Stéphanie. In this episode, we try to better understand the winegrower's profession and the Bordeaux wine model through the example of Mouton Cadet wines. This special episode was recorded in collaboration with the Mouton Cadet house. And don't forget, for your health: beware of alcohol abuse! Hosted by Acast. Visit acast.com/privacy for more information.
While "Smashing Security" is on its summer holiday, here's a chance to listen to an episode of its sister show - "The AI Fix".In episode ten of The AI Fix, Graham attempts to say "quinoa", Mark draws a line in the amper-sand, ChatGPT becomes an expert in solar panels and bomb disposal, and our hosts watch a terrifying trailer for a creepy new AI friend.Graham discovers that the world of AI cookery is a soggy, limey mess, and learns an unusual trick for making a great mojito, while Mark pits his co-host against the cleverest AI brains in the world.Episode links:OpenAI starts rollout of Advanced Voice Mode.UK Government shelves £1.3bn UK tech and AI plans.Friend trailer.Artificial intelligence has hard time with accents.Netherlands court uses ChatGPT to decide things.Argentina will use AI to ‘predict future crimes' but experts worry for citizens' rights.Twitter thread on crockpot cookbook.Get ready for AI to rip off your favorite cookbooks.‘One of the most disgusting meals I've ever eaten': AI recipes tested.This cookbook author was a best-seller on Amazon — but she may not even be human.ARC Prize.ARC Prize leaderboard.On the Measure of Intelligence research paper by François Chollet.The AI FixThe AI Fix podcast is presented by Graham Cluley and Mark Stockley.Learn more about the podcast at theaifix.show, and follow us on Twitter at @TheAIFix.Never miss another episode by following us in your favourite podcast app. It's free!Like to give us some feedback or sponsor the podcast? Get in touch.This...
Prof. Subbarao Kambhampati argues that while LLMs are impressive and useful tools, especially for creative tasks, they have fundamental limitations in logical reasoning and cannot provide guarantees about the correctness of their outputs. He advocates for hybrid approaches that combine LLMs with external verification systems. (A toy generate-and-verify sketch in that spirit follows these notes.)
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
TOC (sorry, the ones baked into the MP3 were wrong due to LLM hallucination!):
[00:00:00] Intro
[00:02:06] Bio
[00:03:02] LLMs are n-gram models on steroids
[00:07:26] Is natural language a formal language?
[00:08:34] Natural language is formal?
[00:11:01] Do LLMs reason?
[00:19:13] Definition of reasoning
[00:31:40] Creativity in reasoning
[00:50:27] Chollet's ARC challenge
[01:01:31] Can we reason without verification?
[01:10:00] LLMs can't solve some tasks
[01:19:07] LLM Modulo framework
[01:29:26] Future trends of architecture
[01:34:48] Future research directions
Youtube version: https://www.youtube.com/watch?v=y1WnHpedi2A
Refs (we didn't have space for URLs here, check the YT video description instead):
Can LLMs Really Reason and Plan?
On the Planning Abilities of Large Language Models: A Critical Investigation
Chain of Thoughtlessness? An Analysis of CoT in Planning
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
"Task Success" is not Enough
Partition function (number theory) (Srinivasa Ramanujan and G.H. Hardy's work)
Poincaré conjecture
Gödel's incompleteness theorems
ROT13 (Rotate13, "rotate by 13 places")
A Mathematical Theory of Communication (C. E. Shannon)
Sparks of AGI
Kambhampati thesis on speech recognition (1983)
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
Explainable human-AI interaction
Tree of Thoughts
On the Measure of Intelligence (ARC Challenge)
Getting 50% (SoTA) on ARC-AGI with GPT-4o (Ryan Greenblatt ARC solution)
PROGRAMS WITH COMMON SENSE (John McCarthy) - "AI should be an advice taker program"
Original chain of thought paper
ICAPS 2024 Keynote: Dale Schuurmans on "Computing and Planning with Large Generative Models" (COT)
The Hardware Lottery (Hooker)
A Path Towards Autonomous Machine Intelligence (JEPA/LeCun)
AlphaGeometry
FunSearch
Emergent Abilities of Large Language Models
Language models are not naysayers (Negation in LLMs)
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Embracing negative results
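The hybrid "LLM plus external verifier" idea (the LLM-Modulo framework referenced above) can be sketched generically. In the toy below, propose is a stand-in for an LLM sampling candidate plans, and only the sound verifier decides acceptance; the task and every name here are invented for illustration. Real LLM-Modulo loops also feed verifier critiques back into the prompt, which this sketch omits.

    import random

    # Toy "planning" task: order jobs so precedence constraints hold.
    JOBS = ["a", "b", "c", "d"]
    PRECEDES = [("a", "b"), ("b", "c"), ("a", "d")]  # x must come before y

    def verify(plan):
        # Sound external checker: the only component whose output is trusted.
        pos = {job: i for i, job in enumerate(plan)}
        return all(pos[x] < pos[y] for x, y in PRECEDES)

    def propose(rng):
        # Stand-in for an LLM: cheap, plausible, but unguaranteed guesses.
        plan = JOBS[:]
        rng.shuffle(plan)
        return plan

    def generate_and_verify(budget=1000, seed=0):
        rng = random.Random(seed)
        for _ in range(budget):
            plan = propose(rng)
            if verify(plan):  # accept only verifier-approved plans
                return plan
        return None

    print(generate_and_verify())  # e.g. ['a', 'b', 'd', 'c'], a valid ordering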
Sara Hooker is VP of Research at Cohere and leader of Cohere for AI. We discuss her recent paper critiquing the use of compute thresholds, measured in FLOPs (floating point operations), as an AI governance strategy. We explore why this approach, recently adopted in both US and EU AI policies, may be problematic and oversimplified. Sara explains the limitations of using raw computational power as a measure of AI capability or risk, and discusses the complex relationship between compute, data, and model architecture.
Equally important, we go into Sara's work on "The AI Language Gap." This research highlights the challenges and inequalities in developing AI systems that work across multiple languages. Sara discusses how current AI models, predominantly trained on English and a handful of high-resource languages, fail to serve the linguistic diversity of our global population. We explore the technical, ethical, and societal implications of this gap, and discuss potential solutions for creating more inclusive and representative AI systems.
We broadly discuss the relationship between language, culture, and AI capabilities, as well as the ethical considerations in AI development and deployment.
YT Version: https://youtu.be/dBZp47999Ko
TOC:
[00:00:00] Intro
[00:02:12] FLOPS paper
[00:26:42] Hardware lottery
[00:30:22] The Language gap
[00:33:25] Safety
[00:38:31] Emergent
[00:41:23] Creativity
[00:43:40] Long tail
[00:44:26] LLMs and society
[00:45:36] Model bias
[00:48:51] Language and capabilities
[00:52:27] Ethical frameworks and RLHF
Sara Hooker
https://www.sarahooker.me/
https://www.linkedin.com/in/sararosehooker/
https://scholar.google.com/citations?user=2xy6h3sAAAAJ&hl=en
https://x.com/sarahookr
Interviewer: Tim Scarfe
Refs:
The AI Language gap
https://cohere.com/research/papers/the-AI-language-gap.pdf
On the Limitations of Compute Thresholds as a Governance Strategy
https://arxiv.org/pdf/2407.05694v1
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
https://arxiv.org/pdf/2406.18682
Cohere Aya
https://cohere.com/research/aya
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
https://arxiv.org/pdf/2407.02552
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
https://arxiv.org/pdf/2402.14740
Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence
https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
EU AI Act
https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138_EN.pdf
The bitter lesson
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Neel Nanda interview
https://www.youtube.com/watch?v=_Ygf0GnlwmY
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
https://transformer-circuits.pub/2024/scaling-monosemanticity/
Chollet's ARC challenge
https://github.com/fchollet/ARC-AGI
Ryan Greenblatt on ARC
https://www.youtube.com/watch?v=z9j3wB1RRGA
Disclaimer: This is the third video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview.
As impressive as LLMs are, the growing consensus is that language, scale and compute won't get us to AGI. Although many AI benchmarks have quickly achieved human-level performance, there is one eval that has barely budged since it was created in 2019. Google researcher François Chollet wrote a paper that year defining intelligence as skill-acquisition efficiency—the ability to learn new skills as humans do, from a small number of examples. To make it testable he proposed a new benchmark, the Abstraction and Reasoning Corpus (ARC), designed to be easy for humans, but hard for AI. Notably, it doesn't rely on language. Zapier co-founder Mike Knoop read Chollet's paper as the LLM wave was rising. He worked quickly to integrate generative AI into Zapier's product, but kept coming back to the lack of progress on the ARC benchmark. In June, Knoop and Chollet launched the ARC Prize, a public competition offering more than $1M to beat and open-source a solution to the ARC-AGI eval. In this episode Mike talks about the new ideas required to solve ARC, gives updates from the first two weeks of the competition, and explains why he's excited for AGI systems that can innovate alongside humans.
Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
Mentioned:
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models: the 2022 paper that first caught Mike's attention about the capabilities of LLMs
- On the Measure of Intelligence: 2019 paper by Google researcher François Chollet that introduced the ARC benchmark, which remains unbeaten
- ARC Prize 2024: the $1M+ competition Mike and François have launched to drive interest in solving the ARC-AGI eval
- Sequence to Sequence Learning with Neural Networks: Ilya Sutskever's 2014 paper that influenced the direction of machine translation with deep neural networks
- Etched: Luke Miles on LessWrong wrote about the first ASIC chip that accelerates transformers on silicon
- Kaggle: the leading data science competition platform and online community, acquired by Google in 2017
- Lab42: Swiss AI lab that hosted ARCathon, the precursor to ARC Prize
- Jack Cole: researcher on the team that was #1 on the ARCathon leaderboard
- Ryan Greenblatt: researcher with the current high score (50%) on the ARC public leaderboard
(00:00) Introduction
(01:51) AI at Zapier
(08:31) What is ARC AGI?
(13:25) What does it mean to efficiently acquire a new skill?
(19:03) What approaches will succeed?
(21:11) A little bit of a different shape
(25:59) The role of code generation and program synthesis
(29:11) What types of people are working on this?
(31:45) Trying to prove you wrong
(34:50) Where are the big labs?
(38:21) The world post-AGI
(42:51) When will we cross 85% on ARC AGI?
(46:12) Will LLMs be part of the solution?
(50:13) Lightning round
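For readers who haven't seen the benchmark's shape: each ARC task in the public repository (https://github.com/fchollet/ARC-AGI) is a small JSON file of demonstration pairs plus test inputs, with grids given as 2-D arrays of color indices 0-9. The toy task below follows that format, but the puzzle itself is invented, not taken from the corpus.

```python
# A minimal look at the ARC task format: "train" holds demonstration
# input/output pairs, "test" holds inputs whose outputs must be predicted.
import json

task_json = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]}
  ],
  "test": [
    {"input": [[5, 0], [0, 5]]}
  ]
}
"""

task = json.loads(task_json)
for pair in task["train"]:
    print("demo:", pair["input"], "->", pair["output"])
print("solve:", task["test"][0]["input"], "-> ?")
```

The whole difficulty of the eval lives in that "-> ?": the solver sees only a handful of demonstrations and must infer the transformation rule, which is why memorization-heavy systems struggle.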
Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas
Which is more intelligent, ChatGPT or a 3-year-old? Of course this depends on what we mean by "intelligence." A modern LLM is certainly able to answer all sorts of questions that require knowledge far past the capacity of a 3-year-old, and even to perform synthetic tasks that seem remarkable to many human grown-ups. But is that really intelligence? François Chollet argues that it is not, and that LLMs are never going to be truly "intelligent" in the usual sense -- although other approaches to AI might get there. Support Mindscape on Patreon. Blog post with transcript: https://www.preposterousuniverse.com/podcast/2024/06/24/280-francois-chollet-on-deep-learning-and-the-meaning-of-intelligence/ François Chollet received his Diplôme d'Ingénieur from École Nationale Supérieure de Techniques Avancées, Paris. He is currently a Senior Staff Engineer at Google. He has been awarded the Global Swiss AI Award for breakthroughs in artificial intelligence. He is the author of Deep Learning with Python, and developer of the Keras software library for neural networks. He is the creator of the ARC (Abstraction and Reasoning Corpus) Challenge. Links: Web site | GitHub | Google Scholar publications | Wikipedia | "On the Measure of Intelligence" See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LLM Generality is a Timeline Crux, published by Egg Syntax on June 24, 2024 on The AI Alignment Forum. Short Summary LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible. Longer summary There is ML research suggesting that LLMs fail badly on attempts at general reasoning, such as planning problems, scheduling, and attempts to solve novel visual puzzles. This post provides a brief introduction to that research, and asks: Whether this limitation is illusory or actually exists. If it exists, whether it will be solved by scaling or is a problem fundamental to LLMs. If fundamental, whether it can be overcome by scaffolding & tooling. If this is a real and fundamental limitation that can't be fully overcome by scaffolding, we should be skeptical of arguments like Leopold Aschenbrenner's (in his recent 'Situational Awareness') that we can just 'follow straight lines on graphs' and expect AGI in the next few years. Introduction Leopold Aschenbrenner's recent 'Situational Awareness' document has gotten considerable attention in the safety & alignment community. Aschenbrenner argues that we should expect current systems to reach human-level given further scaling[1], and that it's 'strikingly plausible' that we'll see 'drop-in remote workers' capable of doing the work of an AI researcher or engineer by 2027. Others hold similar views. Francois Chollet and Mike Knoop's new $500,000 prize for beating the ARC benchmark has also gotten considerable recent attention in AIS[2]. Chollet holds a diametrically opposed view: that the current LLM approach is fundamentally incapable of general reasoning, and hence incapable of solving novel problems. We only imagine that LLMs can reason, Chollet argues, because they've seen such a vast wealth of problems that they can pattern-match against. But LLMs, even if scaled much further, will never be able to do the work of AI researchers. It would be quite valuable to have a thorough analysis of this question through the lens of AI safety and alignment. This post is not that[3], nor is it a review of the voluminous literature on this debate (from outside the AIS community). It attempts to briefly introduce the disagreement, some evidence on each side, and the impact on timelines. What is general reasoning? Part of what makes this issue contentious is that there's not a widely shared definition of 'general reasoning', and in fact various discussions of this use various terms. By 'general reasoning', I mean to capture two things. First, the ability to think carefully and precisely, step by step. Second, the ability to apply that sort of thinking in novel situations[4]. Terminology is inconsistent between authors on this subject; some call this 'system II thinking'; some 'reasoning'; some 'planning' (mainly for the first half of the definition); Chollet just talks about 'intelligence' (mainly for the second half). This issue is further complicated by the fact that humans aren't fully general reasoners without tool support either. For example, seven-dimensional tic-tac-toe is a simple and easily defined system, but incredibly difficult for humans to play mentally without extensive training and/or tool support. 
Generalizations that are in-distribution for humans seem like something that any system should be able to do; generalizations that are out-of-distribution for humans don't feel as though they ought to count. How general are LLMs? It's important to clarify that this is very much a matter of degree. Nearly everyone was surprised by the degree to which the last generation of state-of-the-art LLMs like GPT-3 generalized; for example, no one I know of predicted that LLMs trained on primarily English-language sources would be able to do translation between languages. Some in the field argued as...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LLM Generality is a Timeline Crux, published by eggsyntax on June 24, 2024 on LessWrong. Short Summary LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible. Longer summary There is ML research suggesting that LLMs fail badly on attempts at general reasoning, such as planning problems, scheduling, and attempts to solve novel visual puzzles. This post provides a brief introduction to that research, and asks: Whether this limitation is illusory or actually exists. If it exists, whether it will be solved by scaling or is a problem fundamental to LLMs. If fundamental, whether it can be overcome by scaffolding & tooling. If this is a real and fundamental limitation that can't be fully overcome by scaffolding, we should be skeptical of arguments like Leopold Aschenbrenner's (in his recent 'Situational Awareness') that we can just 'follow straight lines on graphs' and expect AGI in the next few years. Introduction Leopold Aschenbrenner's recent 'Situational Awareness' document has gotten considerable attention in the safety & alignment community. Aschenbrenner argues that we should expect current systems to reach human-level given further scaling and 'unhobbling', and that it's 'strikingly plausible' that we'll see 'drop-in remote workers' capable of doing the work of an AI researcher or engineer by 2027. Others hold similar views. Francois Chollet and Mike Knoop's new $500,000 prize for beating the ARC benchmark has also gotten considerable recent attention in AIS[1]. Chollet holds a diametrically opposed view: that the current LLM approach is fundamentally incapable of general reasoning, and hence incapable of solving novel problems. We only imagine that LLMs can reason, Chollet argues, because they've seen such a vast wealth of problems that they can pattern-match against. But LLMs, even if scaled much further, will never be able to do the work of AI researchers. It would be quite valuable to have a thorough analysis of this question through the lens of AI safety and alignment. This post is not that[2], nor is it a review of the voluminous literature on this debate (from outside the AIS community). It attempts to briefly introduce the disagreement, some evidence on each side, and the impact on timelines. What is general reasoning? Part of what makes this issue contentious is that there's not a widely shared definition of 'general reasoning', and in fact various discussions of this use various terms. By 'general reasoning', I mean to capture two things. First, the ability to think carefully and precisely, step by step. Second, the ability to apply that sort of thinking in novel situations[3]. Terminology is inconsistent between authors on this subject; some call this 'system II thinking'; some 'reasoning'; some 'planning' (mainly for the first half of the definition); Chollet just talks about 'intelligence' (mainly for the second half). This issue is further complicated by the fact that humans aren't fully general reasoners without tool support either. For example, seven-dimensional tic-tac-toe is a simple and easily defined system, but incredibly difficult for humans to play mentally without extensive training and/or tool support. 
Generalizations that are in-distribution for humans seem like something that any system should be able to do; generalizations that are out-of-distribution for humans don't feel as though they ought to count. How general are LLMs? It's important to clarify that this is very much a matter of degree. Nearly everyone was surprised by the degree to which the last generation of state-of-the-art LLMs like GPT-3 generalized; for example, no one I know of predicted that LLMs trained on primarily English-language sources would be able to do translation between languages. Some in the field argued as...
The ARC Challenge, created by Francois Chollet, tests how well AI systems can generalize from a few examples in a grid-based intelligence test. We interview the current winners of the ARC Challenge—Jack Cole, Mohamed Osman and their collaborator Michael Hodel. They discuss how they tackled ARC (Abstraction and Reasoning Corpus) using language models. We also discuss the new "50%" public-set approach announced today from Redwood Research (Ryan Greenblatt). Jack and Mohamed explain their winning approach, which involves fine-tuning a language model on a large, specifically-generated dataset and then doing additional fine-tuning at test time, a technique known in this context as "active inference". They use various strategies to represent the data for the language model and believe that with further improvements, the accuracy could reach above 50%. Michael talks about his work on generating new ARC-like tasks to help train the models. They also debate whether their methods stay true to the "spirit" of Chollet's measure of intelligence. Despite some concerns, they agree that their solutions are promising and adaptable for other similar problems.
Note: Jack's team is still the current official winner at 33% on the private set. Ryan's entry is not on the private leaderboard or eligible. Chollet invented ARC in 2019 (not 2017 as stated). "Ryan's entry is not a new state of the art. We don't know exactly how well it does since it was only evaluated on 100 tasks from the evaluation set and does 50% on those, reportedly. Meanwhile Jack's team's (i.e. MindsAI's) solution does 54% on the entire eval set and it is seemingly possible to do 60-70% with an ensemble."
Jack Cole: https://x.com/Jcole75Cole https://lab42.global/community-interview-jack-cole/
Mohamed Osman: Mohamed is looking to do a PhD in AI/ML — can you help him? Email: mothman198@outlook.com https://www.linkedin.com/in/mohamedosman1905/
Michael Hodel: https://arxiv.org/pdf/2404.07353v1 https://www.linkedin.com/in/michael-hodel/ https://x.com/bayesilicon https://github.com/michaelhodel
Getting 50% (SoTA) on ARC-AGI with GPT-4o - Ryan Greenblatt: https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
Neural networks for abstraction and reasoning: Towards broad generalization in machines [Mikel Bober-Irizar, Soumya Banerjee]: https://arxiv.org/pdf/2402.03507
On the Measure of Intelligence: https://arxiv.org/abs/1911.01547
YT version: https://youtu.be/jSAT_RuJ_Cg
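The test-time half of that recipe can be sketched roughly as follows. This is a minimal illustration, not the MindsAI system: it assumes an HF-style causal LM (gpt2 as a stand-in), a naive text serialization of grids, and made-up hyperparameters; the team's actual models, augmentations, and data representations are not public.

```python
# Minimal sketch of test-time fine-tuning ("active inference") on one ARC task:
# take a model already fine-tuned on generated ARC-like data, run a few extra
# gradient steps on this task's own demonstration pairs, then predict.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in for the real model
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def grid_to_text(grid):
    # Naive serialization: one line of space-separated color indices per row.
    return "\n".join(" ".join(str(c) for c in row) for row in grid)

def pair_to_prompt(inp, out=None):
    text = f"INPUT:\n{grid_to_text(inp)}\nOUTPUT:\n"
    return text + (grid_to_text(out) + tok.eos_token if out is not None else "")

task = {  # toy identity task standing in for a real ARC task
    "train": [{"input": [[1, 0], [0, 1]], "output": [[1, 0], [0, 1]]}],
    "test": [{"input": [[2, 0], [0, 2]]}],
}

# Test-time fine-tuning: a few gradient steps on the task's own demonstrations.
model.train()
for _ in range(10):
    for pair in task["train"]:
        ids = tok(pair_to_prompt(pair["input"], pair["output"]),
                  return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss
        opt.zero_grad(); loss.backward(); opt.step()

# Predict the held-out grid with the adapted weights.
model.eval()
prompt = tok(pair_to_prompt(task["test"][0]["input"]), return_tensors="pt").input_ids
pred = model.generate(prompt, max_new_tokens=32, pad_token_id=tok.eos_token_id)
print(tok.decode(pred[0][prompt.shape[1]:]))
```

The debate in the episode is whether adapting weights per task like this counts as measuring skill-acquisition efficiency or sidesteps it; either way, the mechanism itself is just a tiny, task-specific fine-tune at inference time.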
Tensorraum - The AI Podcast | News on AI, Machine Learning, LLMs, Tech Investments, and More
To coincide with the UEFA European Championship, we open the episode with an AI prediction game: how do ChatGPT, Gemini, and friends think the next Germany match will go? Then we move straight on to more serious topics: Google researcher François Chollet said in an NZZ interview in April that investment in GenAI development is a thousand times (!) too high. What's behind his thesis? Meanwhile, the consulting firm PwC has teamed up with Aleph Alpha to develop a virtual lawyer, creance.ai. Early studies of such applications show, however, that the models still need a lot of work. We then take a little more time to look at the current Lucidworks report, "The State of Generative AI in 2024". Finally, we experiment with the latest text-to-video model, lumalabs' Dream Machine.
Links for this episode:
- Google-Forscher zu KI-Hype in der Wirtschaft: «Die Investitionen sind um ein Tausendfaches zu hoch»
- PwC Germany and Aleph Alpha launch joint venture creance.ai
- AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries
- The State of Generative AI in 2024: Benchmarking the Hype vs. Reality
- lumalabs Dream Machine
Send us your feedback! Email: info@tensorraum.de
Links: https://www.tensorraum.de
Hosts:
Stefan Wiezorek: https://www.linkedin.com/in/stefanwiezorek/
Dr. Arne Meyer: https://www.linkedin.com/in/arne-meyer-6a36612b9/
Dr. Jannis Buchsteiner: https://www.linkedin.com/in/jannis-buchsteiner/
Chapter markers:
00:00:00 Opening and teaser
00:02:11 AI prediction game for the Euros
00:05:12 François Chollet: AI investment is too high
00:11:44 AlphaFold: a failed monetization?
00:18:42 PwC and Aleph Alpha present creance.ai
00:24:33 SLMs for specialist domains vs. LLMs
00:34:40 Lucidworks: The State of Generative AI in 2024
00:49:38 Text-to-video: lumalabs Dream Machine
00:57:43 Wrap-up and outlook
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Getting 50% (SoTA) on ARC-AGI with GPT-4o, published by ryan greenblatt on June 17, 2024 on LessWrong.
I recently got to 50%[1] accuracy on the public test set for ARC-AGI by having GPT-4o generate a huge number of Python implementations of the transformation rule (around 8,000 per problem) and then selecting among these implementations based on correctness of the Python programs on the examples (if this is confusing, go here)[2]. I use a variety of additional approaches and tweaks which overall substantially improve the performance of my method relative to just sampling 8,000 programs. [This post is on a pretty different topic than the usual posts I make about AI safety.]
The additional approaches and tweaks are:
- I use few-shot prompts which perform meticulous step-by-step reasoning.
- I have GPT-4o try to revise some of the implementations after seeing what they actually output on the provided examples.
- I do some feature engineering, providing the model with considerably better grid representations than the naive approach of just providing images. (See below for details on what a "grid" in ARC-AGI is.)
- I use specialized few-shot prompts for the two main buckets of ARC-AGI problems (cases where the grid size changes vs. doesn't).
The prior state of the art on this dataset was 34% accuracy, so this is a significant improvement.[3] On a held-out subset of the train set, where humans get 85% accuracy, my solution gets 72% accuracy.[4] (The train set is significantly easier than the test set as noted here.) Additional increases of runtime compute would further improve performance (and there are clear scaling laws), but this is left as an exercise to the reader.
In this post: I describe my method; I analyze what limits its performance and make predictions about what is needed to reach human performance; I comment on what it means for claims that François Chollet makes about LLMs. Given that current LLMs can perform decently well on ARC-AGI, do claims like "LLMs like Gemini or ChatGPT [don't work] because they're basically frozen at inference time. They're not actually learning anything." make sense? (This quote is from here.)
Thanks to Fabien Roger and Buck Shlegeris for a bit of help with this project and with writing this post.
What is ARC-AGI? ARC-AGI is a dataset built to evaluate the general reasoning abilities of AIs. It consists of visual problems like the below, where there are input-output examples which are grids of colored cells. The task is to guess the transformation from input to output and then fill out the missing grid. Here is an example from the tutorial: [tutorial example image omitted] This one is easy, and it's easy to get GPT-4o to solve it. But the tasks from the public test set are much harder; they're often non-trivial for (typical) humans. There is a reported MTurk human baseline for the train distribution of 85%, but no human baseline for the public test set, which is known to be significantly more difficult. Here are representative problems from the test set[5], and whether my GPT-4o-based solution gets them correct or not.
[Problem 1, Problem 2, and Problem 3 grid images omitted.]
My method: The main idea behind my solution is very simple: get GPT-4o to generate around 8,000 Python programs which attempt to implement the transformation, select a program which is right on all the examples (usually there are 3 examples), and then submit the output that program produces when applied to the additional test input(s). I show GPT-4o the problem as images and in various ASCII representations. My approach is similar in spirit to the approach applied in AlphaCode, in which a model generates millions of completions attempting to solve a programming problem and then aggregates over them to determine what to submit. Actually getting to 50% with this main idea took me about 6 days of work. This work includes construct...
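The selection half of that loop is simple enough to sketch. In the snippet below, `sample_program_candidates` is a hypothetical stand-in for the ~8,000 GPT-4o samples (the prompting, revision passes, and grid encodings are omitted); only the keep-programs-that-fit-every-demonstration filter is faithful to the description above.

```python
# Minimal sketch of sample-and-select program synthesis for an ARC-style task:
# generate candidate transformation programs, keep one that reproduces every
# training pair exactly, and apply it to the test input.
from typing import Callable, List, Optional

Grid = List[List[int]]

def sample_program_candidates(task, n: int) -> List[str]:
    # Hypothetical stand-in for thousands of LLM samples; here, two fixed guesses.
    return [
        "def transform(g):\n    return g",                         # identity
        "def transform(g):\n    return [row[::-1] for row in g]",  # mirror rows
    ]

def compile_candidate(src: str) -> Optional[Callable[[Grid], Grid]]:
    env: dict = {}
    try:
        exec(src, env)  # toy trusted strings only; real pipelines sandbox this
        return env.get("transform")
    except Exception:
        return None

def solve(task) -> Optional[Grid]:
    for src in sample_program_candidates(task, n=8000):
        fn = compile_candidate(src)
        if fn is None:
            continue
        try:
            # Keep the first program consistent with all demonstration pairs.
            if all(fn(p["input"]) == p["output"] for p in task["train"]):
                return fn(task["test"][0]["input"])
        except Exception:
            continue
    return None

toy = {"train": [{"input": [[1, 2]], "output": [[2, 1]]}],
       "test": [{"input": [[3, 4]]}]}
print(solve(toy))  # -> [[4, 3]]
```

The demonstrations act as a built-in verifier, which is why brute sampling works at all: almost all candidates are wrong, but wrong candidates are cheap to reject.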
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI, published by JWS on June 16, 2024 on The Effective Altruism Forum.
Overview: Recently Dwarkesh Patel released an interview with François Chollet (hereafter Dwarkesh and François). I thought this was one of Dwarkesh's best recent podcasts, and one of the best discussions that the AI community has had recently. Instead of subtweeting those with opposing opinions or vagueposting, we actually got two people with disagreements on the key issue of scaling and AGI having a good-faith and productive discussion.[1] I want to explicitly give Dwarkesh a shout-out for having such a productive discussion (even if I disagree with him on the object level) and having someone on who challenges his beliefs and preconceptions. Often when I think of different AI factions getting angry at each other, and the quality of AI risk discourse plummeting, I'm reminded of Scott's phrase "I reject the argument that Purely Logical Debate has been tried and found wanting. Like GK Chesterton, I think it has been found difficult and left untried." More of this kind of thing please, everyone involved.
I took notes as I listened to the podcast, and went through it again to make sure I got the key claims right. I grouped them into similar themes, as Dwarkesh and François often went down a rabbit hole to pursue an interesting point or crux and later returned to the main topic.[2] I hope this can help readers navigate to their points of interest, or make the discussion clearer, though I'd definitely recommend listening/watching for yourself! (It is long though, so feel free to jump around the doc rather than slog through it in one go!)
Full disclosure: I am sceptical of a lot of the case for short AGI timelines these days, and thus also sceptical of claims that x-risk from AI is an overwhelmingly important cause in the entire history of humanity. This of course comes across in my summarisation and takeaways, but I think acknowledging that openly is better than leaving it to be inferred, and I hope this post can be another addition in helping improve the state of AI discussion both in and outside of EA/AI-Safety circles. It is also important to state explicitly here that I might very well be wrong! Please take my perspective as just that, one perspective among many, and do not defer to me (or to anyone really). Come to your own conclusions on these issues.[3]
The Podcast: All timestamps are for the YouTube video, not the podcast recording. I've tried to cover the podcast's main points as they appeared chronologically, tracking them through the transcript. I include links to some external resources, passing thoughts in footnotes, and more full thoughts in block-quotes.
Introducing the ARC Challenge: The podcast starts with an introduction of the ARC Challenge itself, and Dwarkesh is happy that François has put out a line in the sand as an LLM sceptic instead of moving the goalposts [0:02:27]. François notes that LLMs struggle on ARC, in part because its challenges are novel and meant not to be found on the internet; instead the approaches that perform better are based on 'Discrete Program Search' [0:02:04]. He later notes that ARC puzzles are not complex and require very little knowledge to solve [0:25:45].
Dwarkesh agrees that the problems are simple and thinks it's an "intriguing fact" that ARC problems are simple for humans but LLMs are bad at them, and he hasn't been convinced by the explanations he's got from LLM proponents/scaling maximalists about why that is [0:11:57]. Towards the end François mentions in passing that big labs tried ARC but didn't share their results because they're bad [1:08:28].[4] One of ARC's main selling points is that humans are clearly meant to do well at this, even children, [0...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LLMs won't lead to AGI - Francois Chollet, published by tobycrisford on June 12, 2024 on The Effective Altruism Forum. I found this interview with Francois Chollet fascinating, and would be curious to hear what other people make of it. I think it is impressive that he's managed to devise a benchmark of tasks which are mostly pretty easy for most humans, but which LLMs have so far not been able to make much progress with. If you don't have time to watch the video, then I think these tweets of his sum up his views quite well: The point of general intelligence is to make it possible to deal with novelty and uncertainty, which is what our lives are made of. Intelligence is the ability to improvise and adapt in the face of situations you weren't prepared for (either by your evolutionary history or by your past experience) -- to efficiently acquire skills at novel tasks, on the fly. Meanwhile what the AI of today does is to combine extremely weak generalization power (i.e. ability to deal with novelty and uncertainty) with a dense sampling of everything it might ever be faced with -- essentially, use brute-force scale to *by-pass* the problem of intelligence entirely. If intelligence is the ability to deal with what you weren't prepared for, then the modern AI strategy is to prepare for everything, so you never need intelligence. This is of course a terrible strategy, because it is impossible to prepare for everything. The problem isn't just scale, the problem is the fact that the real world isn't sampled from a static distribution -- it is ever changing and ever novel. If his take on things is correct, I am not sure exactly what this implies for AGI timelines. Maybe it would mean that AGI is much further off than we think, because the impressive feats of LLMs that have led us to think it might be close have been overinterpreted. But it seems like it could also mean that AGI will arrive much sooner? Maybe we already have more than enough compute and training data for superhuman AGI, and we are just waiting on that one clever idea. Maybe that could happen tomorrow? Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
No Priors: Artificial Intelligence | Machine Learning | Technology | Startups
The first step in achieving AGI is nailing down a concise definition, and Mike Knoop, the co-founder and Head of AI at Zapier, believes François Chollet got it right when he defined general intelligence as a system that can efficiently acquire new skills. This week on No Priors, Mike joins Elad to discuss the ARC Prize, a multi-million-dollar non-profit public challenge looking for someone to beat the Abstraction and Reasoning Corpus (ARC) evaluation. In this episode, they also get into why Mike thinks LLMs will not get us to AGI, how Zapier is incorporating AI into its products and the power of agents, and why it's dangerous to regulate AGI before discovering its full potential.
Show Links:
About the Abstraction and Reasoning Corpus
Zapier Central
Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @mikeknoop
Show Notes:
(0:00) Introduction
(1:10) Redefining AGI
(2:16) Introducing ARC Prize
(3:08) Definition of AGI
(5:14) LLMs and AGI
(8:20) Promising techniques for developing AGI
(11:00) Sentience and intelligence
(13:51) Prize model vs investing
(16:28) Zapier AI innovations
(19:08) Economic value of agents
(21:48) Open source to achieve AGI
(24:20) Regulating AI and AGI
Here is my conversation with Francois Chollet and Mike Knoop on the $1 million ARC-AGI Prize they're launching today. I did a bunch of Socratic grilling throughout, but Francois's arguments about why LLMs won't lead to AGI are very interesting and worth thinking through. It was really fun discussing/debating the cruxes. Enjoy!
Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here.
Timestamps:
(00:00:00) – The ARC benchmark
(00:11:10) – Why LLMs struggle with ARC
(00:19:00) – Skill vs intelligence
(00:27:55) – Do we need "AGI" to automate most jobs?
(00:48:28) – Future of AI progress: deep learning + program synthesis
(01:00:40) – How Mike Knoop got nerd-sniped by ARC
(01:08:37) – Million $ ARC Prize
(01:10:33) – Resisting benchmark saturation
(01:18:08) – ARC scores on frontier vs open source models
(01:26:19) – Possible solutions to ARC Prize
Get full access to Dwarkesh Podcast at www.dwarkeshpatel.com/subscribe
duration: 01:30:16 - Le Grand dimanche Soir - by: Charline Vanhoenacker, Guillaume Meurice, Juliette ARNAUD, Aymeric LOMPRET - The whole team is on vacation, but we weren't about to leave you moping on a Sunday evening! A chance to (re)discover the best of the interviews with Christelle Chollet, Gilles Perret, and Cédric Khan, before getting into the groove with musical performances by La Poison and Gwendoline. - produced by: François AUDOIN
Meet François Chollet, creator of Keras, software engineer, and AI researcher at Google. Join François and hosts Ashley Oldacre and Gus Martins as they discuss how Keras 3 was created, integrating Keras 3 with Gemma and Kaggle, artificial general intelligence (AGI), and much more! Resources: François Chollet research → https://goo.gle/443V3vG Deep Learning With Python, Second Edition → https://goo.gle/3UnpdH1 Intelligence: On Intelligence: How a New Understanding of the Brain Will Lead to the Creation of Truly Intelligent Machines → https://goo.gle/3xDE33s Researcher Pierre-Yves Oudeyer → https://goo.gle/3W8a39V Monty Hall Challenge → https://goo.gle/3VYXAW5 Machine Learning: Keras 3 → https://goo.gle/3JqRgis Gemma on Keras → https://goo.gle/49Q0pfy The ARC challenge on Kaggle → https://goo.gle/3xQsDcr
duration: 01:30:27 - Le grand dimanche soir - by: Charline Vanhoenacker - Tonight we welcome Christelle Chollet, appearing on stage at the Théâtre de la Tour Eiffel in "L'Empiafée", a show that certainly doesn't lack bite. Then it's over to live music from the rockers of Johnny Montreuil, with two brand-new tracks, including one cover... 100% fun!
The Lead... Let's talk about Errol Spence Jr. vs Terence Crawford.
PPV Undercard: Isaac Cruz vs Giovanni Cabrera, Nonito Donaire vs Alexandro Santiago, and Tellez vs Sergio Garcia. Prelims on YouTube feature two bouts: Steven Nelson, a close friend of Crawford, will face veteran Rowdy Montgomery, and Jose Salas Reyes faces world title challenger Aston Palicte... In other notable bouts, Jabín Chollet and Michael Portales duel in what should be a competitive matchup, and Justin Viloria, who looks really good, is on the undercard too.
Results: George Kambosos Jnr vs Maxi Hughes is discussed at length, as well as Keyshawn Davis - what is the expectation for Davis in the short term and the long term? We also talk about notable performances including Giovani Santillan versus Erick Bone, Jeremiah Milton, and Troy Isley. Thompson Boxing closed up shop on Friday night; they had fighters like Timothy Bradley Jr. and Daniel Roman. Lee McGregor lost, as Erik Robles upset him.
News:
- What is the future of Devin Haney? The WBC extends the deadline for him... what is his next fight?
- Arnold Barboza Jr leaves Top Rank
- Tyson Fury vs Francis Ngannou thoughts
Next Week:
On ESPN+ Tuesday: unified super bantamweight world champion Stephen Fulton Jr. versus Naoya Inoue. Thoughts on Fulton's trainer questioning Inoue's wraps... Robeisy Ramirez on the undercard...
On ESPN+ Friday:
- Seniesa Estrada vs. Leonela Paola Yudica, for Estrada's WBC/WBA women's strawweight title
- Andres Cortes vs. Xavier Martinez
- Abraham Nova vs. Jonathan Romero
- Rohan Polanco vs. Cesar Francis
- Karlos Balderas vs. Nahir Albright
- Dante Benjamin and Charlie Sheehy are also on the card... plus Jaylan Phillips gets an A-side fight.
[Terence Crawford and Errol Spence Jr - Photo Credit: Alex Sanchez / Showtime]
[Kambosos-Hughes - Photo: Mikey Williams / Top Rank Inc]
Timestamps:
0:00 The awful decision of Kambosos vs Maxi Hughes
11:00 Keyshawn Davis vs Francisco Patera
15:50 Giovani Santillan vs Erick Bone
15:25 Stephan Shaw vs Joe Goodall
27:00 Troy Isley and Jeremiah Milton
31:00 Devin Haney title status
38:40 Stephen Fulton vs Naoya Inoue
34:00 Arnold Barboza Jr leaves Top Rank
35:58 Tyson Fury vs a guy named Ngannou
53:40 Errol Spence Jr vs Terence Crawford
01:01:00 Isaac Cruz vs Giovanni Cabrera
01:05:08 Nonito Donaire vs Alexandro Santiago
01:08:50 YouTube Stream prelims
01:10:00 Seniesa Estrada vs. Leonela Paola Yudica
01:14:00 Andres Cortes vs Xavier Martinez
01:15:20 Top Rank undercard
01:17:00 Outro
In this episode of Intelligence Matters, host Michael Morell speaks with State Department Counselor Derek Chollet about the state of the war in Ukraine as it enters its second year. Morell and Chollet discuss the implications of a deepening relationship between Russia and Iran as well as Russia and China, which the U.S. recently warned against providing material aid to Moscow. Chollet also provides new insights into the newly tense relationship between Washington and Beijing, following the shootdown of a Chinese surveillance balloon. He outlines the Biden administration's approach to managing Iran's nuclear ambitions after the earlier collapse of nuclear talks. See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.