Podcasts about ACI

  • 337 PODCASTS
  • 717 EPISODES
  • 38m AVG DURATION
  • 5 WEEKLY NEW EPISODES
  • LATEST: May 15, 2025

POPULARITY (trend chart covering 2017–2024)


Best podcasts about ACI

Latest podcast episodes about ACI

709 Watershed
46. Aquatic Conservation Initiative - An In-depth Conversation

May 15, 2025 · 34:09


On this episode, host Darren Sheppard speaks with ACI staff members Rachael Brown and Gabby Riefesel about what ACI is, what they do, their projects, and much more! ACI, the Aquatic Conservation Initiative, is an environmental non-profit based in St. John's that is quickly expanding its reach along the east coast of the province. Similar to our work here at IBEC, ACI aims to improve the environment in multiple ways while educating the public on the importance of good stewardship practices. Music by Giorgio Di Campo for FreeSound Music: https://www.youtube.com/watch?v=8j8sO7-kbRc

Arauto Repórter UNISC
ARAUTO REPÓRTER UNISC - April 30, 2025

Apr 30, 2025 · 21:47


On today's Arauto Repórter, you'll find: ➡️ Expocande 2025 opens TODAY in Candelária. ➡️ Growth and innovation opportunities for businesses were the theme of the Café Empresarial business breakfast hosted by ACI. ➡️ The deadline to pay the 2025 IPVA vehicle tax ends TODAY. ➡️ In public-safety news: a Santa Cruz do Sul resident loses more than 30 THOUSAND REAIS in a remote-phone-access scam, and a young man is arrested with 28 thousand packs of contraband cigarettes in a hearse.

Assunto Nosso
ARAUTO REPÓRTER UNISC - April 30, 2025

Apr 30, 2025 · 21:47


On today's Arauto Repórter, you'll find: ➡️ Expocande 2025 opens TODAY in Candelária. ➡️ Growth and innovation opportunities for businesses were the theme of the Café Empresarial business breakfast hosted by ACI. ➡️ The deadline to pay the 2025 IPVA vehicle tax ends TODAY. ➡️ In public-safety news: a Santa Cruz do Sul resident loses more than 30 THOUSAND REAIS in a remote-phone-access scam, and a young man is arrested with 28 thousand packs of contraband cigarettes in a hearse.

Smartinvesting2000
April 25th, 2025 | Gold Investment, University Endowments, Trade Wars & Home Prices, Converting Pretax, Netflix (NFLX), The Walt Disney Company (DIS), Albertsons Companies, Inc. (ACI) & UnitedHealth Group Inc (UNH)

Apr 26, 2025 · 55:40


Should you invest in gold for the long term? Gold has been a great asset to hold over the last year, but I remain a skeptic of investing in gold long term. I personally don't own any gold, nor would I recommend buying it at this point in time. While the recent gains in the price of gold look attractive, given that it is up over 20% so far this year in a difficult market, the long-term results aren't enticing. There are periods when gold has been a strong performer, but trying to guess those periods is extremely difficult. In January 1980 gold reached $850 per ounce, but the important number here is that the inflation-adjusted price was $3,486 per ounce. That means it was not until gold recently hit $3,500 per ounce that we saw an all-time high on an inflation-adjusted basis; essentially, you made no real gain for over 45 years. At the end of the day, gold is just a piece of metal worth only what the next person will pay for it. It has no earnings, no interest, no rents. This makes it extremely difficult to value, and given the added expenses of trading and holding gold, it just does not make sense to me. I will continue to invest in good, strong businesses at fair prices, as I believe that is the best strategy for long-term wealth creation.

Why is the government supporting universities with large endowments? I've never really thought about this before. I knew that some big universities have multibillion-dollar endowment funds, but I did not realize that 658 institutions hold approximately $874 billion, nearly $1 trillion, in endowment funds. When I dug a little deeper, I discovered that in addition to receiving money from the federal government via grants, some of these universities pay little or no income tax and also get a waiver on property taxes. If you're starting to get a little irritated at this point because your hard-earned dollars are going to universities like Harvard, which has a $53 billion endowment, or Yale, with a $41 billion endowment, you might be like me and think it's time that things change. The cost of tuition at Harvard is $57,000 per year and its president makes about $1.3 million a year. The president of San Diego State University has a salary of $531,000 and the cost of one year of tuition is about $8,700. I'm sure the students at Harvard receive a more prestigious education than at San Diego State University, but is it 6 1/2 times better? Do the students who graduate from Harvard make a salary that's 600% more than a graduate of San Diego State University? I don't think so. I wondered where money from these endowments goes: roughly 48.1% of endowment distributions fund student financial aid, 17.7% goes to academic programs and research, 10.8% is used for endowed faculty positions, and nearly 17% of the funds are used for other purposes. Wouldn't it be nice to know what those purposes are? I think we need to take a hard look at what universities have in their endowment funds, their tax benefits and grants, and let more students here in the United States benefit from those billions of dollars to get a good education, as opposed to the fat cats in the Ivy League towers. One other point I found interesting was the investing philosophy of these endowment funds: the goal is to earn around 8% per year and pay out 4.5% to 5% to fund those various expenses, which should allow the endowment to keep growing. A big problem is that many have not been able to achieve that goal, with only 25% of the 152 schools surveyed meeting the 8% return over the last 10 years. The other concern is liquidity: if grants dry up and expenses can't be cut, many endowments are not liquid. Harvard, for example, had 39% in private equity, 32% in hedge funds, 5% in real estate, 3% in real assets, and just 3% in cash. With all this said, I really believe this system should be reviewed to benefit the entire country rather than just the Ivy League.

Could the trade wars hurt home prices? We are starting to see some cracks in the housing market, such as the delinquency rate on FHA mortgages, which cater to higher-risk borrowers who can't qualify for a conventional mortgage because they have a small down payment or weak credit. The FHA delinquency rate currently stands at 11% according to the Mortgage Bankers Association; it has not been at that level for 12 years. Unfortunately, and we warned against it, many people have stretched themselves too far financially to get into a home over the last few years. Because it has only been two or three years since they bought, after fees and commissions they may not have much, if any, equity built up in that home. Another area of weakness is the homebuilders, who have really increased their incentives because they have more completed but unsold homes. The builders are getting a little worried because they have not seen this many homes sitting on their lots without buyers since 2009. Homebuilder incentives usually average around 5% of the total value of the home, but we are starting to see incentives around 13% from big builders like Lennar. The volatility of the 10-year Treasury, which mortgages generally trade off of, has not helped, because it has had a wide trading range lately. That makes it difficult for homebuyers to lock in a good rate. At this point I would wait to buy a home until maybe late summer. I think there should be some good deals by then, as the tariff war should continue to progress and we should have a clearer picture of the economy by that time.

Financial Planning: Why converting 100% of pretax is bad. Roth conversions can be a powerful tax planning tool, but like any tool, using it the wrong way can do more harm than good. One of the most common mistakes we see is the idea that you should convert all of your pre-tax retirement savings, such as a traditional IRA or 401(k), to a Roth account. Everyone loves the idea of a tax-free retirement. When you convert money from a traditional IRA to a Roth IRA, you're moving it from a pre-tax account to a tax-free account, but there's a price: the converted amount is considered income, and you must pay ordinary income tax in the year of the conversion. Once converted, funds grow tax-free. The best way to think about money in a pre-tax account is that it is deferred income. It will be taxed; it's just a matter of when. When you contribute to a pre-tax account, you are not receiving a tax deduction, you are deferring income to a future year. When performing a Roth conversion, you are voluntarily deciding to pay tax on that income even though you don't have to yet. This only makes sense if you can convert at a lower tax rate than you would otherwise be subject to if you did not convert. That most commonly happens between the beginning of retirement, typically in your 60s, and the beginning of your required distributions at age 75. During that period taxable income is generally lower, which means conversions may be done at a lower tax rate than when required distributions begin at 75. Required distributions can be a problem because if you have too much in pre-tax accounts, your required taxable distributions may push you into a higher tax bracket and trigger IRMAA. Roth conversions help by shifting funds from pre-tax to tax-free, reducing the level of taxable distributions beginning at 75. However, there is an efficient amount that should be converted for each person. Converting 100% of pre-tax funds means you will likely be in a lower tax bracket after the conversions and may have no tax liability at all. That doesn't sound bad, but it means you likely paid too much tax to convert the funds in the first place. Again, money in a pre-tax account is deferred income that will be taxed; the goal is to have that income taxed at the lowest rate possible. If you convert too aggressively, you may settle for a higher tax rate on the money coming out and not receive enough tax-free income from the Roth to justify it. Instead, structuring withdrawals and conversions to keep your taxable income consistently low throughout retirement will result in a higher level of after-tax income.

Companies discussed: Netflix (NFLX), The Walt Disney Company (DIS), Albertsons Companies, Inc. (ACI) & UnitedHealth Group Inc (UNH)
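To make the break-even logic concrete, here is a minimal sketch with hypothetical numbers (the balance, growth rate, and tax brackets below are illustrative assumptions, not figures from the show). Because growth multiplies both paths equally, the comparison comes down to the rate paid on conversion today versus the rate the same dollars would face when distributed later.

# Hedged sketch: hypothetical balance, growth, and tax rates (not from the episode).
balance = 100_000        # pre-tax IRA dollars being considered for conversion
growth = 1.07 ** 10      # assume 7% annual growth for 10 years on either path

convert_now_rate = 0.22  # marginal rate paid on a conversion today
later_rate_low = 0.12    # future rate if distributions stay in a low bracket
later_rate_high = 0.32   # future rate if large required distributions push you up

after_tax_convert = balance * (1 - convert_now_rate) * growth   # Roth grows tax-free
after_tax_wait_low = balance * growth * (1 - later_rate_low)    # taxed on the way out
after_tax_wait_high = balance * growth * (1 - later_rate_high)

print(f"convert at 22% now:       ${after_tax_convert:,.0f}")
print(f"wait, taxed at 12% later: ${after_tax_wait_low:,.0f}")
print(f"wait, taxed at 32% later: ${after_tax_wait_high:,.0f}")

In this toy comparison, converting at 22% only wins against the 32% scenario; if future distributions would be taxed at 12%, the conversion costs more than it saves, which is the episode's argument for converting an efficient amount rather than 100%.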

Engineering Greatness
Ep 25 - Engineering Greatness with Maria Juenger + Megan Voss-Warner

Apr 23, 2025 · 36:41


In this episode, Maria Juenger, Professor at The University of Texas at Austin and current president of ACI, joins Megan Voss-Warner, PhD, Assistant Professor of Civil Engineering at the University of Evansville, to share their personal paths into the world of civil engineering. Megan reflects on her journey from studying chemistry to discovering a passion for materials science and civil engineering, while Maria shares how an early love for math and science shaped her career. Together, they discuss how awareness of engineering careers has evolved over time, their active roles within the American Concrete Institute (ACI), and how student competitions help spark interest and engagement in the field. The conversation also touches on the challenges of balancing research and teaching, offers career advice for young engineers, and highlights the vital role of mentorship and professional communities like ACI in shaping successful careers. Check out the video podcast here: https://youtu.be/fTRRWJcwkR0   Engineering Greatness is produced by Association Briefings. 

Concrete Logic
EP #119: Is More Limestone the Key to Stronger, Cheaper Concrete? Find Out!

Apr 22, 2025 · 52:18 · Transcription Available


What if you were told there's better concrete out there with more limestone than what's used in Type IL cement mixes? Sounds crazy, right? But that's exactly what we're exploring today with John Guynn and John Kline. In this episode, we explore ACI 211.7R and its potential to reduce cement content without sacrificing performance. Learn how adjusting particle size distribution, using admixtures, and understanding the water-cement ratio can improve workability and strength. Plus, get insights on the challenges and benefits of low carbon concrete, real-world applications, and the regulatory hurdles that must be overcome to make these innovations standard in construction. Don't miss it!

What's Inside: Limestone reduces cement content in concrete mixes. ACI 211.7R provides guidelines for using mineral fillers. Particle size distribution and admixtures optimize performance. Real-world applications demonstrate the benefits of reduced cement mixes. Low carbon concrete excels in strength, workability, and durability. Regulatory challenges exist around water-cement ratios and limestone classification. Smarter material use can reduce costs and improve sustainability. Ongoing research is key to advancing concrete technology.

CHAPTERS: 00:00 Introduction to Concrete Innovations 02:56 Understanding ACI 211.7R and Its Implications 06:06 The Role of Limestone in Concrete Performance 08:56 Balancing Performance and Cost in Concrete Mixes 11:58 Water Demand and Its Impact on Concrete Quality 15:00 Particle Size Distribution and Its Importance 17:59 Admixtures and Their Role in Modern Concrete 21:06 Real-World Applications and Case Studies 23:54 Feedback from Finishers and Practical Considerations 29:27 Performance of Low Carbon Concrete 32:13 Regulatory Challenges and Water-Cement Ratio 35:02 Innovations in Cement and SCMs 38:47 Field Testing and Real-World Applications 46:04 Durability Testing and Future Prospects

LISTEN NOW: every concrete contractor & engineer needs to hear this one!

Guest: John Guynn | Company: Roman Cement | Email: john.guynn@roman-cement.com | Website: www.roman-cement.com
Guest: John Kline | Website: https://www.linkedin.com/in/john-kline-18003010/

Take Your Knowledge Further: join Concrete Logic Academy! Gain exclusive access to expert video courses, live Q&A, and cutting-edge industry insights. Earn Professional Development Hours (PDHs) and elevate your expertise! Learn more: https://www.concretelogicacademy.com

Support the Podcast and be part of the Concrete Revolution! Donate: https://www.concretelogicpodcast.com and become a producer recognized on our next episode!

Recommended Resources: ACI 211.7R-20, Guide for Proportioning Concrete Mixtures with Ground Calcium Carbonate and Other Mineral Fillers: https://www.concrete.org/Portals/0/Files/PDF/Previews/211.7R-20_preview.pdf

Producer: Jodi Tandett | Music by: Mike Dunton | Instagram: @Mike_Dunton

Stay Connected & Watch More! Host: Seth Tandett | Email: seth@concretelogicpodcast.com | LinkedIn: https://www.linkedin.com/in/seth-tandett/ | YouTube Channel: https://www.youtube.com/@concretelogicpodcast | Podcast Website: https://www.concretelogicpodcast.com | LIKE, SUBSCRIBE & SHARE for more expert concrete insights!
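For readers unfamiliar with the water-cement ratio the episode keeps returning to, here is a minimal sketch with hypothetical batch quantities (the per-cubic-meter weights below are illustrative assumptions, not a mix design from the show). The ratio is simply mixing water divided by cementitious material, and whether ground limestone filler counts as cementitious is one of the regulatory questions the guests raise.

# Hedged sketch: hypothetical per-cubic-meter batch weights, purely illustrative.
water_kg = 160            # free mixing water
cement_kg = 280           # portland-limestone (Type IL) cement
fly_ash_kg = 60           # supplementary cementitious material (SCM)
limestone_filler_kg = 40  # ground calcium carbonate filler

w_cm_excluding_filler = water_kg / (cement_kg + fly_ash_kg)
w_cm_including_filler = water_kg / (cement_kg + fly_ash_kg + limestone_filler_kg)

print(f"w/cm excluding filler: {w_cm_excluding_filler:.2f}")  # about 0.47
print(f"w/cm counting filler:  {w_cm_including_filler:.2f}")  # about 0.42

A lower computed ratio is generally associated with higher strength and lower permeability, which is why how the filler is classified matters for specifications written around a maximum water-cement ratio.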

Histoires d'Entreprises
#123 Philippe Rivière, founder and CEO of ACI Groupe

Apr 22, 2025 · 56:40


Today I am welcomed in Lagny-sur-Marne by Philippe Rivière, chairman, CEO and founder of Groupe ACI. ACI is an industrial group you probably don't know yet, but one you will hear more and more about. The group is barely five years old and already has revenue of more than 200 million euros; it is aiming for €500M within five years. ACI directly serves the largest industrial companies in France. At ACI they design, manufacture, finish and assemble for sectors as varied as aerospace, defense, nuclear and rail. Join Philippe to discover a company that is absorbing another company every month and setting up across our regions, from Saint-Etienne to Lyon, the Arve valley, the Loire, the North and Franche-Comté. And if I mention regions in this introduction, it is not for the sake of local marketing: geographic presence is at the heart of ACI's strategy. Philippe learned that strategy in Japan. Did he really have to go to the other side of the world to better understand how we could succeed? The answer is in this fascinating episode, which will restore hope to anyone who thinks the game is over for industry in France. There is always hope, and ACI proves it. Follow Philippe on LinkedIn. If you enjoyed this interview, tell people around you, give the podcast a 5-star rating (Spotify, Deezer, Apple Podcasts...) and write a review. Feel free to write to me on LinkedIn ➡️ LinkedIn/MartinVidelaine and subscribe to our weekly newsletter. All the Histoires d'Entreprises episodes are also available on histoiresentreprises.com and on the website of bluebirds.partners, the community of independent executives that I lead, which advises or stands in for company leaders. A podcast co-produced with Agnès Guillard. Hosted by Ausha. Visit ausha.co/politique-de-confidentialite for more information.

World vs Virus
Can aviation ever be sustainable? Here are some paths to net zero

Apr 10, 2025 · 43:39


Aviation is growing, but its environmental impact does not have to - if the technology and policies are available to de-carbonise the sector. We hear from a company making sustainable aviation fuel with waste CO2, from the head of one of the world's busiest airports, and from the body representing airports around the world. Hosts: Robin Pomeroy, Podcasts Editor, World Economic Forum; Laia Barbarà, Head, Climate Strategy - Net Zero, World Economic Forum Guests: Paul Griffiths, CEO of Dubai Airports Ayesha Choudhury, Chief Commercial Officer at Infinium Justin Erbacci, Director General of Airports Council International World Links: Airports of Tomorrow: https://initiatives.weforum.org/airports-of-tomorrow/home Global Aviation Sustainability Outlook 2025: https://www.weforum.org/publications/global-aviation-sustainability-outlook-2025/ Related podcasts: The energy transition moonshot: innovations that will transform our world Flying without emissions: how hydrogen is greening aviation An energy company is building the world's largest airplane. Here's why Skills are changeable - passions are not: Boom Supersonic CEO Blake Scholl   Check out all our podcasts on wef.ch/podcasts:  YouTube: - https://www.youtube.com/@wef/podcasts Radio Davos - subscribe: https://pod.link/1504682164 Meet the Leader - subscribe: https://pod.link/1534915560 Agenda Dialogues - subscribe: https://pod.link/1574956552 Join the World Economic Forum Podcast Club: https://www.facebook.com/groups/wefpodcastclub

Kanzlei WBS
EU plans a bazooka attack against Trump's tariffs: I am also taking heavy losses | Lawyer Solmecke

Apr 7, 2025 · 17:27


Secure 40 euros now as an Android user. It's easy with Privacy ReClaim: https://wbs.law/android (advertising) Filed a BAföG application but still no money? Then act now, risk-free: https://wbs.law/bafoeg

Is a trade war brewing? Politicians, the media, shareholders and many others are asking themselves this question right now, because Trump is throwing high tariffs around. His tariffs against the EU, among others, are setting off a small chain reaction, since the EU is now asking how best to respond to the US offensive. What options remain for the EU, and what would the possible measures look like legally? We answer these and other questions in this video.

ACI: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202302675
Information on the ACI: https://trade.ec.europa.eu/access-to-markets/de/content/instrument-zur-bekaempfung-von-zwangsmassnahmen#:~:text=Kontakt-,Instrument%20zur%20Bek%C3%A4mpfung%20von%20Zwangsma%C3%9Fnahmen,-Eine%20zentrale%20Anlaufstelle
DSA: https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/europe-fit-digital-age/digital-services-act_de and https://eu-digitalstrategie.de/kapitel-4-dsa/art-52-dsa-sanktionen/
Spiegel article: https://www.spiegel.de/wirtschaft/donald-trump-das-steckt-im-zollpaket-des-us-praesidenten-a-1ba1236b-db48-4211-9bdf-a1da3c2119a7
FR article: https://www.fr.de/wirtschaft/und-europa-holt-wirtschaftliche-atombombe-raus-trump-plant-neue-zoelle-zr-93659679.html
Tagesschau article: https://www.tagesschau.de/ausland/amerika/musk-europa-zollpolitik-100.html

▬▬▬▬▬▬▬▬▬▬▬▬▬ WBS.LEGAL is looking for you! Are you looking for an attractive, exciting and challenging job? Then apply and join our team. At WBS.LEGAL you work in the heart of the media capital Cologne and always stay at the pulse of the times in your professional life, guaranteed. Our open positions: https://www.wbs.legal/karriere/#jobs What can you expect from us? More information here: https://www.wbs.legal/karriere/.

▬▬▬▬▬▬▬▬▬▬▬▬▬ Attorney Prof. Christian Solmecke: As a lawyer and partner at the Cologne media-law firm WBS.LEGAL, Prof. Christian Solmecke specializes in advising the internet, IT and media industries. In recent years he has steadily expanded the firm's internet law and e-commerce practice and advises numerous media professionals, Web 2.0 platforms and app developers. Alongside his work as a lawyer, Prof. Christian Solmecke has written many books and, as founder of the cloud-based law-firm software Legalvisio.de, is also a successful legal-tech entrepreneur.

▬▬▬▬▬▬▬▬▬▬▬▬▬ Virtual office tour: https://wbs.law/rundgang Start your 3D, 360° tour of the WBS.LEGAL offices (including the YouTube studio).

▬▬▬▬▬▬▬▬▬▬▬▬▬ WBS.LEGAL social media channels: We would be delighted if you also visited and followed us on our other social media channels. Each of our channels stands on its own and is guaranteed to add value for you.

▬Instagram and TikTok▬ On our successful Instagram and TikTok channels we debunk legal misconceptions every day and share legal life hacks, so you always stay up to date and get your daily dose of everyday law. Short, snappy and always to the point. Follow us on Instagram and TikTok and impress your friends with new knowledge.
➥ Instagram: https://wbs.law/recht2go ➥ TikTok: https://wbs.law/recht2goTikTok

▬Facebook▬ On Facebook we are old hands by now; for years we have been posting daily updates on current legal news there. You are also welcome to send us an inquiry as a private message. Stop by! Here is the link: ➥ https://wbs.law/facebook

▬X / Twitter▬ Be the first to know when there is important legal news. You'll get punchy statements on current topics on our X account (formerly Twitter)! Here is the link: ➥ https://wbs.law/twitter

▬Podcasts▬ Du b

The Civil Engineering Academy Podcast
The State of the AEC Industry in 2025 with Mark Oakeson

Apr 1, 2025 · 28:46


How is the AEC industry doing right now, with lots of government funding but also high interest rates, inflation, and (even worse) tariffs?

Concrete Logic
EP #117: Boosting Concrete Testing Accuracy with the CTAC Program

Apr 1, 2025 · 30:41 · Transcription Available


In this episode, Seth Tandett talks with Todd Ohlheiser, Executive Director of the Colorado Ready Mixed Concrete Association (CRMCA), about the CTAC Program (Concrete Testing Adherence Collaboration Program). Todd explains how this program helps ensure that concrete cylinders are properly tested and cured, addressing common mistakes that lead to low strength results. They also discuss the role of ACI-certified technicians, the importance of correct testing, and how the program benefits contractors and suppliers by improving the accuracy of concrete testing.

Motley Fool Money
Right Trend, Wrong Stock

Mar 29, 2025 · 46:14


Investors weren't exactly wrong to be excited about the companies trying to make meal kits and plant-based meat cool. But they sure haven't made any money from those bets. So … what went wrong? Patrick Badolato is an Associate Professor of Instruction at the McCombs School of Business at the University of Texas at Austin, where he teaches accounting. He joins Ricky Mulvey for a conversation about companies that have opened the door for genuinely exciting opportunities, but haven't yet been able to figure out a workable business model. They also discuss: Expanding your definition of competition. Why Blue Apron and Beyond Meat haven't taken off like their IPO investors hoped. Whether Coca-Cola is at risk of becoming a “Cabbage Patch concept.” Companies/tickers discussed: KR, ACI, BYND, MCD, KO, NVDA, CELH, PEP, YETI Host: Ricky Mulvey Guest: Patrick Badolato Producer: Mary Long Engineers: Dan Boyd, Rick Engdahl Learn more about your ad choices. Visit megaphone.fm/adchoices

PENDENTE: Rubrica su Cinema, letteratura, fumetto ed esperienze culturali
Negativo, se po' fa!: Bianco, Rosso e Verdone

Mar 21, 2025 · 10:16


A new retrospective featuring a well-known director, screenwriter and actor whom the Italian public knows all too well but perhaps, at times, does not value as much as he deserves. So here I am tackling a quick but intense retrospective of Carlo Verdone's filmography. The director's second film, and decidedly more cohesive than his debut, "Bianco, Rosso e Verdone" is probably one of the high points of Verdone's strictly comic work and the best of his episodic films. So what are you waiting for to (re)live the most absurd and tragicomic election weekend of all time?! Remember your ACI membership card and, above all... where has MAGDA gone?!

Emprender LibreMente
Getting to know your neurodivergence through your daughters

Mar 18, 2025 · 14:59


This is a free preview of a paid episode. To hear more, visit libremente.substack.com

Hello, divergent people. In this episode we interview a woman identified as gifted as an adult who, like so many mothers, came to know her own neurotype by first identifying giftedness in her daughters. We talk about what led her to become an activist and educator on giftedness, and how, as a highly sensitive person, she deals with exposure on social media and the criticism she receives from time to time. Silvia talks about many of the issues that come up when you are a gifted (ACI) mother of girls who are also gifted, for example frustration in gifted children and how she has learned to manage it in her daughters. We also discuss whether it is important to have children evaluated when they are young, or whether, on the contrary, identifying them as gifted makes real the fears many mothers and fathers have, such as the child being singled out for being different. We also talk about the importance of normalizing giftedness in particular and neurodivergence in general, and about the dangers of hiding from children that they are gifted. And about how to explain to them, when they are little, that they are gifted, and about the fears that arise as a mother or father when your children are identified as ACI. Silvia also tells us what dysregulates her most as a mother and how she approaches this with daughters who are as intense as she is. And how she finally came to her own evaluation and how she is experiencing having been identified as ACI as an adult. Enjoy the interview and hit play. To learn more about her: * Her website * Her book "Yo siempre os daré voz" * Her children's stories * Her Instagram

Beyond the Numbers
Unlocking the Power of Appraisal Technology: A Conversation with ACI Software

Mar 17, 2025 · 21:39


In this episode of Beyond the Numbers, Kevin Hecht sits down with Kim Angelone, Program Director at ACI, to explore how appraisal technology is transforming the industry. They discuss ACI's latest innovations, including the highly anticipated Workbench platform, and how it helps appraisers streamline workflows, enhance efficiency, and stay ahead of regulatory changes. Tune in for insights on AI integration, compliance tools, and the future of real estate valuation software. Whether you're a seasoned appraiser or just starting out, this conversation is packed with valuable takeaways!

Le Business Club de France des Entrepreneurs
Entrepreneurs & farmers in our regions

Mar 14, 2025 · 13:00


A FIRST: TWO SHOWS IN ONE! Business Club de France des Entrepreneurs and Business Club de France des Agriculteurs, a program presented by Michel Picot.

IN THE NEWS:
- (Gard) DMS Group will produce and deliver 120 mobile radiology units for Ukraine.
- (Bouches-du-Rhône) CMA CGM announces a $20 billion investment in the USA.

ECO REGIONS: The finest foundry in Europe, in Haute-Marne: Hachette & Driout acquired by the ACI group (report by Puissance TV). In the Aube, Nigloland prepares its 2025 opening (April 4) with plenty of new attractions, notably new hotel and dining spaces (report by Canal 32).

BUSINESS CLUB DE FRANCE DES AGRICULTEURS: A look back at the 2025 Salon de l'Agriculture. What is an agricultural prize worth? (Report by LMTV.) In the Vosges: relaunching organic farming! (Report by Vosges TV.)

Hosted by Ausha. Visit ausha.co/politique-de-confidentialite for more information.

Ethereum Daily - Crypto News Briefing
Sepolia Pectra Upgrade Incident

Mar 6, 2025 · 2:54


The Sepolia testnet encounters a bug from the Pectra fork. The ACI proposes the Aave Savings Rate for GHO. And Devconnect 2025 will be held in Argentina. Read more: https://ethdaily.io/660

Ethereum Daily - Crypto News Briefing
Aavenomics Implementation Proposal

Mar 5, 2025 · 3:38


ACI introduces the Aavenomics implementation proposal. Lens introduces the Grove data storage solution. Hyperlane introduces OpenUSDT. And Lighthouse will enable light client support by default. Read more: https://ethdaily.io/659

Concrete Logic
EP #114: The True Cost of Low Carbon Concrete

Feb 27, 2025 · 55:55 · Transcription Available


In this episode of the Concrete Logic Podcast, Dr. Jon Belkowitz and host Seth Tandett discuss the excessive costs and challenges associated with low carbon concrete, particularly Type IL cement. They delve into the implications for the cement industry, the struggles faced by Departments of Transportation (DOTs) with new cement standards, and misconceptions surrounding blended cements. The conversation covers the shift towards performance-based design, the implications of ACI 323 on concrete practices, and the role of concrete pumping in quality assurance. They also highlight the importance of long-term testing for new materials and the need for research and development in ready-mix concrete to address industry challenges.

Takeaways: Low carbon concrete has significant cost implications for the industry. Misconceptions about blended cements are prevalent in the industry. The shift to low carbon cements has led to premature cracking issues. Concrete performance is not solely determined by its strength. Performance-based design has shifted the responsibility from prescriptive to outcome-based. Long-term testing is crucial for new concrete materials to ensure reliability. Research and development in ready-mix concrete is often neglected in contracts.

Chapters: 00:00 Introduction to Low Carbon Concrete 04:31 The Cost Implications of Low Carbon Concrete 08:06 Challenges Faced by DOTs with New Cements 12:42 Understanding the Shift in Cement Standards 16:45 The Impact of New Cements on Concrete Durability 20:32 Misconceptions About Blended Cements 24:15 Abrasion Resistance and Concrete Performance 27:14 The Concrete Industry's Strength Assumptions 29:39 Performance-Based Concrete Design 31:24 Impact of ACI 323 on Concrete Practices 32:53 The Role of Concrete Pumping in Quality Assurance 34:51 Historical Challenges in the Concrete Industry 37:59 Research and Development in Ready-Mix Concrete 40:43 The Importance of Long-Term Testing 46:34 The Future of Type IL Cements 49:43 Balancing Perspectives in Concrete Discussions

Did you learn something from this episode? Would you like to support the concrete industry's favorite podcast? If so, donate at https://www.concretelogicpodcast.com/support/. When YOU donate to the show, you will be listed as a producer of the next episode that is released!

Join the Concrete Logic Academy! Enhance your learning from our podcast with engaging quizzes that test your knowledge and help you earn Professional Development Hours (PDHs). Support Concrete Logic and take your education to the next level!

SBS Dinka - SBS Dinka
Banks dit Baai kɔu aɣeer

Feb 14, 2025 · 7:55


Aci banks dit gam lɔn aci bi thiök baai kɔu aɣeer tɛn ruon karou ku abak kɔk bi bɛn ka yiic.

The Structural Engineering Channel
Unparalleled Ways to Improve Seismic Construction With Tested Concrete Designs – Ep 149

Feb 13, 2025 · 28:05


In this episode, we speak with David Fanella, Ph.D., S.E., P.E., F.ACI, F.ASCE, F.SEI, vice president of engineering at the Concrete Reinforcing Steel Institute, about designing cost-effective steel-reinforced concrete buildings, the role of constructability in project success, especially in seismic construction, and how managing tolerances can streamline construction and reduce costs. ***The video version of […] The post Unparalleled Ways to Improve Seismic Construction With Tested Concrete Designs – Ep 149 appeared first on Engineering Management Institute.

The MAD Podcast with Matt Turck
The AI Coding Agent Revolution, The Future of Software, Techno-Optimism | Amjad Masad, CEO, Replit

Feb 6, 2025 · 89:39


Replit is one of the most visible and exciting companies reshaping how we approach software and application development in the Generative AI era. In this episode, we sit down with its CEO, Amjad Masad, for an in-depth discussion on all things AI, agents, and software. Amjad shares the journey of building Replit, from its humble beginnings as a student side project to becoming a major player in Generative AI today. We also discuss the challenges of launching a startup, the multiple attempts to get into Y Combinator, the pivotal moment when Paul Graham recognized Replit's potential, and the early bet on integrating AI and machine learning into the core of Replit. Amjad dives into the evolving landscape of AI and machine learning, sharing how these technologies are reshaping software development. We explore the concept of coding agents and the impact of Replit's latest innovation, Replit Agent, on the software creation process. Additionally, Amjad reflects on his time at Codecademy and Facebook, where he worked on groundbreaking projects like React Native, and how those experiences shaped his entrepreneurial journey. We end with Amjad's view on techno-optimism and his belief in an energized Silicon Valley. Replit Website - https://replit.com X/Twitter - https://x.com/Replit Amjad Masad LinkedIn - https://www.linkedin.com/in/amjadmasad X/Twitter - https://x.com/amasad FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) Intro (01:36) The origins of Replit (15:54) Amjad's decision to restart Replit (19:00) Joining Y Combinator (30:06) AI and ML at Replit (32:31) Explain Code (39:09) Replit Agent (52:10) Balancing usability for both developers and non-technical users (53:22) Sonnet 3.5 stack (58:43) The challenge of AI evaluation (01:00:02) ACI vs. HCI (01:05:02) Will AI replace software development? (01:10:15) If anyone can build an app with Replit, what's the next bottleneck? (01:14:31) The future of SaaS in an AI-driven world (01:18:37) Why Amjad embraces techno-optimism (01:20:36) Defining civilizationism (01:23:11) Amjad's perspective on government's role

Born To Speak
Communication Lessons from a Film Producer

Feb 3, 2025 · 29:55


Have you ever wondered what it's like to navigate the fast-paced world of film production? In this episode of the Body Talk Podcast, Alina chats with Chevonne O'Shaughnessy, a film producer and co-founder of American Cinema International, to explore her fascinating career in the industry. Chevonne shares how an unexpected turn in her life led her into the film world, starting from scratch and learning the ropes as she went. From building strong relationships to mastering negotiation skills, she explains the communication strategies crucial to her success, especially when standing her ground with intimidating agents or finding win-win solutions in high-stakes situations. Chevonne also talks about how she's adapted to industry shifts over the decades, including the rise of AI, and the success of ACI's YouTube channel, bringing her work to new audiences. Listen in to hear Chevonne's incredible story and her lessons on resilience, communication and staying ahead of the curve.

IN-the-Know
Developing Niche Products for Healthcare Providers with Maia Jarvis

Jan 29, 2025 · 26:21


As Vice President of Operations at BLISCare, Maia Jarvis provides executive leadership for the company's MGU activities, spearheads product development, and oversees the ever-evolving BLISCare insurance management platform. With over 10 years of experience in the insurance industry (focused on creating innovative, niche solutions) and more than two decades of leadership under her belt, Maia has mastered the art of problem-solving and resource-wrangling. A key player in the launch of the BLISCare captive, Maia now serves on its Board of Directors. Maia earned her BA from the University of Portland and her MBA in IT Management from Western Governors University. She holds CPCU and ARe designations along with several Salesforce certifications; she is working on her ACI (Associate in Captive Insurance) certification. In this episode of In the Know, Chris Hampshire and Maia explore the niche products offered at BLISCare, how this captive was formed, and the many exciting careers that can be found in the insurance industry.

Key Takeaways: Maia's insurance career started, like so many, somewhat unintentionally. The offerings at BLISCare began with a problem that needed to be solved. Lessons learned as this niche bariatric program launched. Marketing this targeted product started with word of mouth and has expanded with the help of medical provider referrals. The benefits behind the decision to create this captive product. Maia's journey from executive assistant to establishing a captive product with alternative risk financing was successful, in part, because of her mentors. An overview of the benefits of earning CPCU, ARe, and ACI certifications. Maia's IT Management MBA helps her understand how to cultivate solutions in the insurance industry. The exciting role of IT in navigating industry pain points to find solutions. Maia's message to anyone who is considering an insurance career. A promising look at the five-year future of the insurance industry. Maia's motivating advice to her early-career self.

Talking Pools Podcast
Concrete Cancer II, New Podcast, License Compliance

Jan 24, 2025 · 26:21


The concrete industry has methods to mitigate this. Hopefully, something here can be incorporated into a cancer treatment for swimming pools... In this episode, Rudy discusses the importance of community support in the pool service industry, addresses the issue of unlicensed contractor work, and delves into the complexities of concrete durability, specifically focusing on alkali-silica reaction (ASR) and rebar corrosion. He emphasizes the need for preventive measures, effective repair strategies, and the adoption of emerging technologies to combat concrete deterioration. The conversation highlights the significance of regular maintenance and proactive approaches to ensure the longevity of concrete structures, particularly in the pool industry.

Takeaways: License compliance. Community support is essential for industry growth. Unlicensed contractor work poses significant risks. Concrete is durable but susceptible to deterioration. ASR and rebar corrosion are major concerns in concrete. Water management is critical in preventing deterioration. Preventive measures can extend the lifespan of concrete. Emerging technologies offer innovative solutions for repair. Regular inspections can identify early signs of damage. Timely repairs can mitigate extensive damage. Collaboration and knowledge sharing enhance industry standards.

Pete 'The Pool Guy's TIP of The Day Podcast

Sound Bites: "There are solutions out there." "Concrete cancer might be curable." "We need to lift each other up."

References: American Concrete Institute (ACI) reports:
ACI 301-20, Section 4.2.2.6: Durability.
ACI 318, Chapter 19: Durability Requirements.
ACI 224R-01: Managing water ingress with surface sealants.
ACI 221.1R-98: Mitigating ASR gel formation.
ACI 503R: Restoring structural integrity.
ACI 222R: Preventing oxidation of reinforcement and chloride removal.
ACI 364.1R: pH r

POOL MAGAZINE: Pool Magazine is a leading, up-to-the-minute news source for swimming pool news and pool features. Ou
AquaStar Pool Products: The global leader in safety, dependability, & innovation in pool technology.
BLUERAY XL: The real mineral purifier! Reduce your pool maintenance costs & efforts by 50%.
CPO Certification Classes: Attend your CPO class with Rudy Stankowitz!
Jack's Magic: If you know Jack's you'd have no stains!
Online Pool Classes: The difference between you and your competition is what you know!
Raypak: Raypak, leading the evolution of environmental efficiency and sustainability in pool heaters.

Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you.

Support the show. Thank you so much for listening! You can find us on social media: Facebook, Instagram, TikTok. Email us: talkingpools@gmail.com

AviationPros Podcast
Justin Erbacci's Vision for ACI

Jan 24, 2025 · 32:29


Airports Council International World Director General Justin Erbacci talks with Editor-in-Chief Joe Petrie about the organization's future, challenges facing the airport industry and how technology is going to reshape how the world travels in the near future.  Erbacci joined ACI in his current role in September 2024. He previously served as Chief Executive Officer – Airports at NEOM in Saudi Arabia and as Chief Executive Officer of Los Angeles World Airports (LAWA). He also excelled as Chief Operating Officer, Chief IT and Commercial Strategy Officer and Chief IT and Innovation Officer at LAWA. Prior to his tenure at LAWA, Erbacci was Vice President of Customer Experience and Technology at the airline alliance Star Alliance Services GmbH. His ability to drive innovation and leverage technology was developed during his IT leadership roles at major global companies such as Credit Suisse and United Airlines. Earlier, he gained experience at management consultancies including Cambridge Management Consultants and Deloitte and Touche. Erbacci also practiced law as a civil rights defense litigator.

The Public Relations Podcast
Predictions for In-house comms in 2025

Jan 22, 2025 · 14:58


What is the future for in-house PR people in 2025? This is something I've been privately discussing on LinkedIn with Lynn Kwek from ACI in Singapore. But we thought, why don't we record a podcast episode instead? We discuss everything from AI to the importance of authentic storytelling, not just as a list of topics but in terms of how to actually approach them in 2025. As you'll hear, we have our predictions and our ideas, but more importantly, as always: what are your ideas? Do let us know.

TD Ameritrade Network
Albertsons (ACI) Earnings After Failed Kroger (KR) Merger

Jan 8, 2025 · 7:38


Bill Kirk and Pablo Garces cover Albertsons (ACI) earnings. Bill says that we're waiting for fiscal 2025 guidance in their next report after its failed merger with Kroger (KR). He has a Buy rating and $23 price target on the stock, citing efficiency and buyback programs. Pablo points to lower market share amid competition from mass merchants, dollar stores, and other grocery upstarts. He has a BB+ rating on ACI, expecting it to benefit from a cautious consumer. ======== Schwab Network ======== Empowering every investor and trader, every market day. Subscribe to the Market Minute newsletter - https://schwabnetwork.com/subscribe Download the iOS app - https://apps.apple.com/us/app/schwab-network/id1460719185 Download the Amazon Fire Tv App - https://www.amazon.com/TD-Ameritrade-Network/dp/B07KRD76C7 Watch on Sling - https://watch.sling.com/1/asset/191928615bd8d47686f94682aefaa007/watch Watch on Vizio - https://www.vizio.com/en/watchfreeplus-explore Watch on DistroTV - https://www.distro.tv/live/schwab-network/ Follow us on X – / schwabnetwork Follow us on Facebook – / schwabnetwork Follow us on LinkedIn - / schwab-network About Schwab Network - https://schwabnetwork.com/about

Concrete Logic
EP #109: ACI 323, The Fast-Approaching Code and Its Impact on Concrete

Jan 7, 2025 · 59:24 · Transcription Available


In this episode of the Concrete Logic Podcast, host Seth Tandett is joined by Rich Szecsy and Dr. Jon Belkowitz to discuss the recently released ACI 323 code on low carbon concrete. The trio dig into the significance of the code, the confusion surrounding its implementation, and the challenges it poses for ready-mix producers and construction practices. The hosts explore the implications of the new code on concrete pumping, the role of authorities in code adoption, and the need for future updates to adapt to evolving industry standards. The episode concludes with reflections on the potential impact of the code and the importance of ongoing dialogue in the concrete industry.

Takeaways: ACI 323 represents a significant shift in concrete standards. The code's implementation may create confusion for industry professionals. Ready-mix producers face challenges in meeting new low carbon requirements. There is a need for clarity on the authority of jurisdiction in code adoption. Frequent updates to the code will be necessary as more data becomes available. The relationship between contractors and ready-mix suppliers is crucial for compliance. Concrete pumping practices may be adversely affected by the new standards. Policymakers will likely default to adopting ACI 323 without fully understanding its implications. Ongoing discussions are essential to navigate the complexities of low carbon concrete.

Chapters: 00:00 Introduction to the Concrete Logic Podcast 03:17 Overview of ACI 323 and Its Significance 06:47 Confusion Surrounding Low Carbon Concrete Standards 09:21 Challenges in the Concrete Supply Chain 12:51 The Role of Authorities in Code Adoption 20:42 Impact of Low Carbon Concrete on Local Regulations 28:35 Future Updates and Adaptations of ACI 323 30:01 The Need for Frequent Updates in Concrete Codes 32:02 The Evolution of Low Carbon Concrete Standards 33:50 Challenges in Concrete Production and Compliance 38:55 Confusion Surrounding Concrete Codes and Specifications 48:59 The Impact of Policy on Concrete Standards 51:58 Looking Ahead: Future of Concrete Codes and Practices

Did you learn something from this episode? Would you like to support the concrete industry's favorite podcast? If so, donate at https://www.concretelogicpodcast.com/support/. When YOU donate to the show, you will be listed as a producer of the next episode that is released!

Join the Concrete Logic Academy! Enhance your learning from our podcast with engaging quizzes that test your knowledge and help you earn Professional Development Hours (PDHs). Support Concrete Logic and take your education to the next level!

Engineering Greatness
Ep 23 - Engineering Greatness with Madonna Saad + Vania Del Carmen Moreno-Colin

Dec 19, 2024 · 27:33


In this episode, Madonna Saad, a recent graduate of the University of Louisiana at Lafayette, and Vania Del Carmen Moreno-Colin, a student at the University of Washington, share their journeys in civil engineering. Vania reflects on her transition from participating in a rocketry team to taking on leadership roles within ACI, while Madonna discusses her active involvement in engineering clubs and her internship experiences. Together, they highlight the importance of seeking out opportunities, engaging in research, and leveraging networking and sponsorships to advance their careers. They also candidly explore the challenges and rewards of balancing academics with extracurricular commitments, offering valuable advice to aspiring engineers. Check out the video podcast here: https://youtu.be/848LlOErp6U   Engineering Greatness is produced by Association Briefings. 

Motley Fool Money
Young Investors, Root for a Bear Market

Dec 17, 2024 · 25:32


Investing is a decades-long game. (00:14) Bill Barker and Ricky Mulvey discuss: - The Federal Trade Commission's ruling on junk fees. - What killed a merger between Kroger and Albertsons. - How younger investors can prepare for the next bear market. Then, (17:28) Alison Southwick and Robert Brokamp offer some tips on tax-loss harvesting. WSJ column discussed: https://www.wsj.com/finance/stocks/why-this-frothy-market-has-me-scared-295c07c3 Companies discussed: KR, ACI, AZO, AAPL, ORLY, SBUX Host: Ricky Mulvey Guests: Bill Barker, Alison Southwick, Robert Brokamp Producer: Mary Long Engineer: Rick Engdahl Learn more about your ad choices. Visit megaphone.fm/adchoices

Rappin' With ReefBum
Guest: Chris Meckley, ACI Aquaculture

Dec 4, 2024 · 307:11


Rappin' With ReefBum is a LIVE talk show with host Keith Berkelhamer and guests from the reef keeping community. In this episode I chat with Chris Meckley, who is the owner of ACI Aquaculture in Plant City, Florida. ACI is a coral wholesaler and Chris runs the business with his wife Amanda and their staff.

Packet Pushers - Full Podcast Feed
NB506: Billions Flow for US Chips; FCC Lets T-Mobile, SpaceX Make Phone Calls from Orbit

Dec 3, 2024 · 24:31


Take a Network Break! We’ve got a full menu for our post-Thanksgiving episode. We start with a host of critical CVEs affecting Veritas and a couple more for QNAP. Cisco announces EOL for two versions of its ACI software, Verizon runs field trials for 1.6Tbps throughput on a single wavelength (with Ciena optical transceivers), and... Read more »

Packet Pushers - Network Break
NB506: Billions Flow for US Chips; FCC Lets T-Mobile, SpaceX Make Phone Calls from Orbit

Dec 3, 2024 · 24:31


Take a Network Break! We’ve got a full menu for our post-Thanksgiving episode. We start with a host of critical CVEs affecting Veritas and a couple more for QNAP. Cisco announces EOL for two versions of its ACI software, Verizon runs field trials for 1.6Tbps throughput on a single wavelength (with Ciena optical transceivers), and... Read more »

Packet Pushers - Fat Pipe
NB506: Billions Flow for US Chips; FCC Lets T-Mobile, SpaceX Make Phone Calls from Orbit

Dec 3, 2024 · 24:31


Take a Network Break! We’ve got a full menu for our post-Thanksgiving episode. We start with a host of critical CVEs affecting Veritas and a couple more for QNAP. Cisco announces EOL for two versions of its ACI software, Verizon runs field trials for 1.6Tbps throughput on a single wavelength (with Ciena optical transceivers), and... Read more »

The Teaching Your Toddler Podcast
How To Create A Safe Holiday Home

Dec 2, 2024 · 5:51


Hosting young children during the holidays can create some risks around the house you might not think of. Creating a safe, functional home, especially in areas like the laundry room, kitchen, medicine cabinet, and bedrooms, is essential for the whole family. Watch this episode on our YouTube channel here: https://youtu.be/VsVRA-qIWTg

As an example, a 2024 survey by the American Cleaning Institute showed that 36% of Americans use decorative jars or containers to enhance their laundry space, a trend made popular on social media. However, putting products into clear jars or containers for aesthetic purposes is one way you may be unintentionally putting children at risk. In this mini episode, Brian Sansoni, Senior Vice President of Communications, Outreach and Membership at the American Cleaning Institute, and Torine Creppy, President of Safe Kids Worldwide, show us how to help parents create safe and child-friendly spaces without sacrificing aesthetics. Both highlight the proper use of cleaning products and the dos and don'ts of storing them, additional items to keep an eye on throughout the home, and areas where safety should come first. Whether you're organizing, revamping, or preparing to bring a new baby home, ACI and Safe Kids have the resources, including a new safety guide, to help get your home into tip-top safety shape without sacrificing style.

About the American Cleaning Institute: ACI is a non-profit organization. Established in 1926, ACI is dedicated to advancing public understanding of the safety and benefits of cleaning products and to protecting the ability of its members to formulate products that best meet consumer needs. ACI serves both its members and the public by developing and sharing information about industry products with the technical community, policy makers, childcare and health professionals, educators, media, and consumers. Find them at https://www.cleaninginstitute.org/

About Safe Kids Worldwide: Safe Kids Worldwide® is a nonprofit organization working to reduce unintentional injuries to children ages 0-14 and build equitable and sustainable systems that support injury prevention. Most people are surprised to learn preventable injuries are the number one cause of death of children in the United States. Safe Kids works with strategic partners and an extensive network of more than 400 coalitions in the U.S. to reduce traffic injuries, drownings, sleep-related deaths, falls, burns, poisonings, and more. We achieve this work through a public health approach that includes research, interventions to educate and raise awareness, safety device distribution, and advocacy at the federal, state, and local levels. Safe Kids also supports a worldwide alliance of like-minded organizations in more than 20 countries. Find them at https://www.safekids.org/

Please like and subscribe to our podcast and leave a 5-star review so we can reach more parents like you! Subscribe by sending an email to subscribe@teachingyourtoddler.com

For more expert interviews, fun activities and story time podcasts, please visit our website at TeachingYourToddler.com. Check us out on Facebook at Teaching Your Toddler, on Twitter at @TeachingToddler, and on Instagram at @teachingyourtoddler.

To support great future content, please click here and help us out with a $5 gift: glow.fm/teachingyourtoddler

Leave us some feedback on this show and your ideas for future shows! #parenting #toddlers #moms #momlife #kids #podcast #toddlerlife

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
The new Claude 3.5 Sonnet, Computer Use, and Building SOTA Agents — with Erik Schluntz, Anthropic

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Nov 28, 2024 71:10


We have announced our first speaker, friend of the show Dylan Patel, and topic slates for Latent Space LIVE! at NeurIPS. Sign up for IRL/Livestream and to debate!

We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe here for a chance to appear on the show!

The vibe shift we observed in July - in favor of Claude 3.5 Sonnet, first introduced in June - has been remarkably long-lived and persistent, surviving multiple subsequent updates of 4o, o1 and Gemini versions, for Anthropic's Claude to end 2024 as the preferred model for AI Engineers and even being the exclusive choice for new code agents like bolt.new (our next guest on the pod!), which unlocked so much performance from Claude Sonnet that it went from $0 to $4m ARR in 4 weeks when it launched last month.

Anthropic has now raised an additional $4b from Amazon and made an incredibly well received update of Claude 3.5 Sonnet (and Haiku), making significant improvements in performance over its predecessors:

Solving SWE-Bench

As part of the October Sonnet release, Anthropic teased a blink-and-you'll-miss-it result:

The updated Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks, with particularly strong gains in agentic coding and tool use tasks. On coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%, scoring higher than all publicly available models—including reasoning models like OpenAI o1-preview and specialized systems designed for agentic coding. It also improves performance on TAU-bench, an agentic tool use task, from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the more challenging airline domain. The new Claude 3.5 Sonnet offers these advancements at the same price and speed as its predecessor.

This was followed up by a blogpost a week later from today's guest, Erik Schluntz, the engineer who implemented and scored this SOTA result using a simple, non-overengineered version of the SWE-Agent framework (you can see the submissions here). We have previously covered the SWE-Bench story extensively:

* Speaking with SWEBench/SWEAgent authors at ICLR
* Speaking with Cosine Genie, the previous SOTA (43.8%) on SWEBench Verified (with brief update at DevDay 2024)
* Speaking with Shunyu Yao on SWEBench and the ReAct paradigm driving SWE-Agent

One of the notable inclusions in this blogpost is the tools that Erik decided to give Claude, e.g. the “Edit Tool”:

The tools teased in the SWEBench submission/blogpost were then polished up and released with Computer Use…

And you can also see even more computer use tools given in the new Model Context Protocol servers:

Claude Computer Use

Because it is one of the best received AI releases of the year, we recommend watching the 2 minute Computer Use intro (and related demos) in its entirety:

Erik also worked on Claude's function calling, tool use, and computer use APIs, so we discuss that in the episode.

Erik [00:53:39]: With computer use, just give the thing a browser that's logged into what you want to integrate with, and it's going to work immediately. And I see that reduction in friction as being incredibly exciting. Imagine a customer support team where, okay, hey, you got this customer support bot, but you need to go integrate it with all these things. And you don't have any engineers on your customer support team.
But if you can just give the thing a browser that's logged into your systems that you need it to have access to, now, suddenly, in one day, you could be up and rolling with a fully integrated customer service bot that could go do all the actions you care about. So I think that's the most exciting thing for me about computer use, is reducing that friction of integrations to almost zero.

As you'll see, this is very top of mind for Erik as a former robotics founder whose company basically used robots to interface with human physical systems like elevators.

Full Video episode

Please like and subscribe!

Show Notes

* Erik Schluntz
* “Raising the bar on SWE-Bench Verified”
* Cobalt Robotics
* SWE-Bench
* SWE-Bench Verified
* Human Eval & other benchmarks
* Anthropic Workbench
* Aider
* Cursor
* Fireworks AI
* E2B
* Amanda Askell
* Toyota Research
* Physical Intelligence (Pi)
* Chelsea Finn
* Josh Albrecht
* Eric Jang
* 1X
* Dust
* Cosine Episode
* Bolt
* Adept Episode
* TauBench
* LMSys Episode

Timestamps

* [00:00:00] Introductions
* [00:03:39] What is SWE-Bench?
* [00:12:22] SWE-Bench vs HumanEval vs others
* [00:15:21] SWE-Agent architecture and runtime
* [00:21:18] Do you need code indexing?
* [00:24:50] Giving the agent tools
* [00:27:47] Sandboxing for coding agents
* [00:29:16] Why not write tests?
* [00:30:31] Redesigning engineering tools for LLMs
* [00:35:53] Multi-agent systems
* [00:37:52] Why XML so good?
* [00:42:57] Thoughts on agent frameworks
* [00:45:12] How many turns can an agent do?
* [00:47:12] Using multiple model types
* [00:51:40] Computer use and agent use cases
* [00:59:04] State of AI robotics
* [01:04:24] Robotics in manufacturing
* [01:05:01] Hardware challenges in robotics
* [01:09:21] Is self-driving a good business?

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners. And today we're in the new studio with my usual co-host, Shawn from Smol AI.

Swyx [00:00:14]: Hey, and today we're very blessed to have Erik Schluntz from Anthropic with us. Welcome.

Erik [00:00:19]: Hi, thanks very much. I'm Erik Schluntz. I'm a member of technical staff at Anthropic, working on tool use, computer use, and Swebench.

Swyx [00:00:27]: Yeah. Well, how did you get into just the whole AI journey? I think you spent some time at SpaceX as well? Yeah. And robotics. Yeah. There's a lot of overlap between like the robotics people and the AI people, and maybe like there's some interlap or interest between language models for robots right now. Maybe just a little bit of background on how you got to where you are. Yeah, sure.

Erik [00:00:50]: I was at SpaceX a long time ago, but before joining Anthropic, I was the CTO and co-founder of Cobalt Robotics. We built security and inspection robots. These are sort of five foot tall robots that would patrol through an office building or a warehouse looking for anything out of the ordinary. Very friendly, no tasers or anything. We would just sort of call a remote operator if we saw anything. We have about 100 of those out in the world, and had a team of about 100. We actually got acquired about six months ago, but I had left Cobalt about a year ago now, because I was starting to get a lot more excited about AI. I had been writing a lot of my code with things like Copilot, and I was like, wow, this is actually really cool. If you had told me 10 years ago that AI would be writing a lot of my code, I would say, hey, I think that's AGI.
And so I kind of realized that we had passed this level, like, wow, this is actually really useful for engineering work. That got me a lot more excited about AI and learning about large language models. So I ended up taking a sabbatical and then doing a lot of reading and research myself and decided, hey, I want to go be at the core of this and joined Anthropic.Alessio [00:01:53]: And why Anthropic? Did you consider other labs? Did you consider maybe some of the robotics companies?Erik [00:02:00]: So I think at the time I was a little burnt out of robotics, and so also for the rest of this, any sort of negative things I say about robotics or hardware is coming from a place of burnout, and I reserve my right to change my opinion in a few years. Yeah, I looked around, but ultimately I knew a lot of people that I really trusted and I thought were incredibly smart at Anthropic, and I think that was the big deciding factor to come there. I was like, hey, this team's amazing. They're not just brilliant, but sort of like the most nice and kind people that I know, and so I just felt like I could be a really good culture fit. And ultimately, I do care a lot about AI safety and making sure that I don't want to build something that's used for bad purposes, and I felt like the best chance of that was joining Anthropic.Alessio [00:02:39]: And from the outside, these labs kind of look like huge organizations that have these obscureSwyx [00:02:44]: ways to organize.Alessio [00:02:45]: How did you get, you joined Anthropic, did you already know you were going to work on of the stuff you publish or you kind of join and then you figure out where you land? I think people are always curious to learn more.Erik [00:02:57]: Yeah, I've been very happy that Anthropic is very bottoms up and sort of very sort of receptive to whatever your interests are. And so I joined sort of being very transparent of like, hey, I'm most excited about code generation and AI that can actually go out and sort of touch the world or sort of help people build things. And, you know, those weren't my initial initial projects. I also came in and said, hey, I want to do the most valuable possible thing for this company and help Anthropic succeed. And, you know, like, let me find the balance of those. So I was working on lots of things at the beginning, you know, function calling, tool use. And then sort of as it became more and more relevant, I was like, oh, hey, like, let's it's time to go work on encoding agents and sort of started looking at SWE-Bench as sort of a really good benchmark for that.Swyx [00:03:39]: So let's get right into SWE-Bench. That's one of the many claims to fame. I feel like there's just been a series of releases related with Cloud 3.5 Sonnet around about two or three months ago, 3.5 Sonnet came out and it was it was a step ahead in terms of a lot of people immediately fell in love with it for coding. And then last month you released a new updated version of Cloud Sonnet. We're not going to talk about the training for that because that's still confidential. But I think Anthropic's done a really good job, like applying the model to different things. So you took the lead on SWE-Bench, but then also we're going to talk a little bit about computer use later on. So maybe just give us a context about why you looked at SWE-Bench Verified and you actually came up with a whole system for building agents that would maximally use the model well. Yeah.Erik [00:04:28]: So I'm on a sub team called Product Research. 
And basically the idea of product research is to really understand what end customers care about and want in the models and then work to try to make that happen. So we're not focused on sort of these more abstract general benchmarks like math problems or MMLU, but we really care about finding the things that are really valuable and making sure the models are great at those. And so because I've been interested in coding agents, I knew that this would be a really valuable thing. And I knew there were a lot of startups and our customers trying to build coding agents with our models. And so I said, hey, this is going to be a really good benchmark to be able to measure that and do well on it. And I wasn't the first person at Anthropic to find SWE-Bench, and there are lots of people that already knew about it and had done some internal efforts on it. It fell to me to sort of both implement the benchmark, which is very tricky, and then also to sort of make sure we had an agent and basically like a reference agent, maybe I'd call it, that could do very well on it. Ultimately, we want to provide how we implemented that reference agent so that people can build their own agents on top of our system and get sort of the most out of it as possible. So with this blog post we released on SWE-Bench, we released the exact tools and the prompt that we gave the model to be able to do well.Swyx [00:05:46]: For people who don't know, who maybe haven't dived into SWE-Bench, I think the general perception is they're like tasks that a software engineer could do. I feel like that's an inaccurate description because it is basically, one, it's a subset of like 12 repos. It's everything they could find that every issue with like a matching commit that could be tested. So that's not every commit. And then SWE-Bench verified is further manually filtered by OpenAI. Is that an accurate description and anything you'd change about that? Yes.Erik [00:06:14]: SWE-Bench is, it certainly is a subset of all tasks. It's first of all, it's only Python repos, so already fairly limited there. And it's just 12 of these popular open source repos. And yes, it's only ones where there were tests that passed at the beginning and also new tests that were introduced that test the new feature that's added. So it is, I think, a very limited subset of real engineering tasks. But I think it's also very valuable because even though it's a subset, it is true engineering tasks. And I think a lot of other benchmarks are really kind of these much more artificial setups of even if they're related to coding, they're more like coding interview style questions or puzzles that I think are very different from day-to-day what you end up doing. I don't know how frequently you all get to use recursion in your day-to-day job, but whenever I do, it's like a treat. And I think it's almost comical, and a lot of people joke about this in the industry, is how different interview questions are.Swyx [00:07:13]: Dynamic programming. Yeah, exactly.Erik [00:07:15]: Like, you code. From the day-to-day job. But I think one of the most interesting things about SWE-Bench is that all these other benchmarks are usually just isolated puzzles, and you're starting from scratch. Whereas SWE-Bench, you're starting in the context of an entire repository. And so it adds this entirely new dimension to the problem of finding the relevant files. And this is a huge part of real engineering, is it's actually pretty rare that you're starting something totally greenfield. 
You need to go and figure out where in a codebase you're going to make a change and understand how your work is going to interact with the rest of the systems. And I think SWE-Bench does a really good job of presenting that problem.Alessio [00:07:51]: Why do we still use human eval? It's like 92%, I think. I don't even know if you can actually get to 100% because some of the data is not actuallySwyx [00:07:59]: solvable.Alessio [00:08:00]: Do you see benchmarks like that, they should just get sunsetted? Because when you look at the model releases, it's like, oh, it's like 92% instead of like 89%, 90% on human eval versus, you know, SWE-Bench verified is you have 49%, right? Which is like, before 45% was state of the art, but maybe like six months ago it was like 30%, something like that. So is that a benchmark that you think is going to replace human eval, or do you think they're just going to run in parallel?Erik [00:08:27]: I think there's still need for sort of many different varied evals. Like sometimes you do really care about just sort of greenfield code generation. And so I don't think that everything needs to go to sort of an agentic setup.Swyx [00:08:39]: It would be very expensive to implement.Erik [00:08:41]: The other thing I was going to say is that SWE-Bench is certainly hard to implement and expensive to run because each task, you have to parse, you know, a lot of the repo to understand where to put your code. And a lot of times you take many tries of writing code, running it, editing it. It can use a lot of tokens compared to something like human eval. So I think there's definitely a space for these more traditional coding evals that are sort of easy to implement, quick to run, and do get you some signal. Maybe hopefully there's just sort of harder versions of human eval that get created.Alessio [00:09:14]: How do we get SWE-Bench verified to 92%? Do you think that's something where it's like line of sight to it, or it's like, you know, we need a whole lot of things to go right? Yeah, yeah.Erik [00:09:23]: And actually, maybe I'll start with SWE-Bench versus SWE-Bench verified, which is I think something I missed earlier. So SWE-Bench is, as we described, this big set of tasks that were scraped.Swyx [00:09:33]: Like 12,000 or something?Erik [00:09:34]: Yeah, I think it's 2,000 in the final set. But a lot of those, even though a human did them, they're actually impossible given the information that comes with the task. The most classic example of this is the test looks for a very specific error string. You know, like assert message equals error, something, something, something. And unless you know that's exactly what you're looking for, there's no way the model is going to write that exact same error message, and so the tests are going to fail. So SWE-Bench verified was actually made in partnership with OpenAI, and they hired humans to go review all these tasks and pick out a subset to try to remove any obstacle like this that would make the tasks impossible. So in theory, all of these tasks should be fully doable by the model. And they also had humans grade how difficult they thought the problems would be. Between less than 15 minutes, I think 15 minutes to an hour, an hour to four hours, and greater than four hours. So that's kind of this interesting sort of how big the problem is as well. To get to SWE-Bench verified to 90%, actually, maybe I'll also start off with some of the remaining failures that I see when running our model on SWE-Bench. 
I'd say the biggest cases are the model sort of operates at the wrong level of abstraction. And what I mean by that is the model puts in maybe a smaller band-aid when really the task is asking for a bigger refactor. And some of those, you know, is the model's fault, but a lot of times if you're just sort of seeing the GitHub issue, it's not exactly clear which way you should do. So even though these tasks are possible, there's still some ambiguity in how the tasks are described. That being said, I think in general, language models frequently will produce a smaller diff when possible, rather than trying to do a big refactor. I think another area, at least the agent we created, didn't have any multimodal abilities, even though our models are very good at vision. So I think that's just a missed opportunity. And if I read through some of the traces, there's some funny things where, especially the tasks on matplotlib, which is a graphing library, the test script will save an image and the model will just say, okay, it looks great, you know, without looking at it. So there's certainly extra juice to squeeze there of just making sure the model really understands all the sides of the input that it's given, including multimodal. But yeah, I think like getting to 92%. So this is something that I have not looked at, but I'm very curious about. I want someone to look at, like, what is the union of all of the different tasks that have been solved by at least one attempt at SWE-Bench Verified. There's a ton of submissions to the benchmark, and so I'd be really curious to see how many of those 500 tasks at least someone has solved. And I think, you know, there's probably a bunch that none of the attempts have ever solved. And I think it'd be interesting to look at those and say, hey, is there some problem with these? Like, are these impossible? Or are they just really hard and only a human could do them?Swyx [00:12:22]: Yeah, like specifically, is there a category of problems that are still unreachable by any LLM agent? Yeah, yeah. And I think there definitely are.Erik [00:12:28]: The question is, are those fairly inaccessible or are they just impossible because of the descriptions? But I think certainly some of the tasks, especially the ones that the human graders reviewed as like taking longer than four hours are extremely difficult. I think we got a few of them right, but not very many at all in the benchmark.Swyx [00:12:49]: And did those take less than four hours?Erik [00:12:51]: They certainly did less than, yeah, than four hours.Swyx [00:12:54]: Is there a correlation of length of time with like human estimated time? You know what I mean? Or do we have sort of more of X paradox type situations where it's something super easy for a model, but hard for a human?Erik [00:13:06]: I actually haven't done the stats on that, but I think that'd be really interesting to see of like how many tokens does it take and how is that correlated with difficulty? What is the likelihood of success with difficulty? I think actually a really interesting thing that I saw, one of my coworkers who was also working on this named Simon, he was focusing just specifically on the very hard problems, the ones that are said to take longer than four hours. And he ended up sort of creating a much more detailed prompt than I used. And he got a higher score on the most difficult subset of problems, but a lower score overall on the whole benchmark. 
And the prompt that I made, which is sort of much more simple and bare bones, got a higher score on the overall benchmark, but lower score on the really hard problems. And I think some of that is the really detailed prompt made the model sort of overcomplicate a lot of the easy problems, because honestly, a lot of the suite bench problems, they really do just ask for a bandaid where it's like, hey, this crashes if this is none, and really all you need to do is put a check if none. And so sometimes trying to make the model think really deeply, it'll think in circles and overcomplicate something, which certainly human engineers are capable of as well. But I think there's some interesting thing of the best prompt for hard problems might not be the best prompt for easy problems.Alessio [00:14:19]: How do we fix that? Are you supposed to fix it at the model level? How do I know what prompt I'm supposed to use?Swyx [00:14:25]: Yeah.Erik [00:14:26]: And I'll say this was a very small effect size, and so I think this isn't worth obsessing over. I would say that as people are building systems around agents, I think the more you can separate out the different kinds of work the agent needs to do, the better you can tailor a prompt for that task. And I think that also creates a lot of like, for instance, if you were trying to make an agent that could both solve hard programming tasks, and it could just write quick test files for something that someone else had already made, the best way to do those two tasks might be very different prompts. I see a lot of people build systems where they first sort of have a classification, and then route the problem to two different prompts. And that's sort of a very effective thing, because one, it makes the two different prompts much simpler and smaller, and it means you can have someone work on one of the prompts without any risk of affecting the other tasks. So it creates like a nice separation of concerns. Yeah.Alessio [00:15:21]: And the other model behavior thing you mentioned, they prefer to generate like shorter diffs. Why is that? Like, is there a way? I think that's maybe like the lazy model question that people have is like, why are you not just generating the whole code instead of telling me to implement it?Swyx [00:15:36]: Are you saving tokens? Yeah, exactly. It's like conspiracy theory. Yeah. Yeah.Erik [00:15:41]: Yeah. So there's two different things there. One is like the, I'd say maybe like doing the easier solution rather than the hard solution. And I'd say the second one, I think what you're talking about is like the lazy model is like when the model says like dot, dot, dot, code remains the same.Swyx [00:15:52]: Code goes here. Yeah. I'm like, thanks, dude.Erik [00:15:55]: But honestly, like that just comes as like people on the internet will do stuff like that. And like, dude, if you're talking to a friend and you ask them like to give you some example code, they would definitely do that. They're not going to reroll the whole thing. And so I think that's just a matter of like, you know, sometimes you actually do just, just want like the relevant changes. And so I think it's, this is something where a lot of times like, you know, the models aren't good at mind reading of like which one you want. So I think that like the more explicit you can be in prompting to say, Hey, you know, give me the entire thing, no, no elisions versus just give me the relevant changes. 
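To make the classify-then-route setup Erik described a moment ago more concrete, here is a minimal sketch assuming the Anthropic Python SDK; the category names, system prompts, and model aliases are illustrative placeholders rather than anything used in the episode.

```python
# Classify-then-route: a cheap classification call picks which specialized prompt
# handles the request. Category names, prompts, and model aliases are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

ROUTE_PROMPTS = {
    "hard_change": "You are a careful engineer. Think through the refactor before editing.",
    "quick_test": "Write a small, focused test file for the code you are given.",
}

def classify(task: str) -> str:
    resp = client.messages.create(
        model="claude-3-5-haiku-latest",  # assumed alias for a small, cheap model
        max_tokens=10,
        system="Reply with exactly one word: hard_change or quick_test.",
        messages=[{"role": "user", "content": task}],
    )
    label = resp.content[0].text.strip()
    return label if label in ROUTE_PROMPTS else "hard_change"

def solve(task: str) -> str:
    route = classify(task)  # each route's prompt stays small and independently editable
    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed alias
        max_tokens=2048,
        system=ROUTE_PROMPTS[route],
        messages=[{"role": "user", "content": task}],
    )
    return resp.content[0].text
```

The point of the split is the separation of concerns Erik mentions: each route's prompt can be tuned on its own without risking regressions on the other.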
And that's something, you know, we want to make the models always better at following those kinds of instructions.Swyx [00:16:32]: I'll drop a couple of references here. We're recording this like a day after Dario, Lex Friedman just dropped his five hour pod with Dario and Amanda and the rest of the crew. And Dario actually made this interesting observation that like, we actually don't want, we complain about models being too chatty in text and then not chatty enough in code. And so like getting that right is kind of a awkward bar because, you know, you, you don't want it to yap in its responses, but then you also want it to be complete in, in code. And then sometimes it's not complete. Sometimes you just want it to diff, which is something that Enthopic has also released with a, you know, like the, the fast edit stuff that you guys did. And then the other thing I wanted to also double back on is the prompting stuff. You said, you said it was a small effect, but it was a noticeable effect in terms of like picking a prompt. I think we'll go into suite agent in a little bit, but I kind of reject the fact that, you know, you need to choose one prompt and like have your whole performance be predicated on that one prompt. I think something that Enthopic has done really well is meta prompting, prompting for a prompt. And so why can't you just develop a meta prompt for, for all the other prompts? And you know, if it's a simple task, make a simple prompt, if it's a hard task, make a hard prompt. Obviously I'm probably hand-waving a little bit, but I will definitely ask people to try the Enthopic Workbench meta prompting system if they haven't tried it yet. I went to the Build Day recently at Enthopic HQ, and it's the closest I've felt to an AGI, like learning how to operate itself that, yeah, it's, it's, it's really magical.Erik [00:17:57]: Yeah, no, Claude is great at writing prompts for Claude.Swyx [00:18:00]: Right, so meta prompting. Yeah, yeah.Erik [00:18:02]: The way I think about this is that humans, even like very smart humans still use sort of checklists and use sort of scaffolding for themselves. Surgeons will still have checklists, even though they're incredible experts. And certainly, you know, a very senior engineer needs less structure than a junior engineer, but there still is some of that structure that you want to keep. And so I always try to anthropomorphize the models and try to think about for a human sort of what is the equivalent. And that's sort of, you know, how I think about these things is how much instruction would you give a human with the same task? And do you, would you need to give them a lot of instruction or a little bit of instruction?Alessio [00:18:36]: Let's talk about the agent architecture maybe. So first, runtime, you let it run until it thinks it's done or it reaches 200k context window.Swyx [00:18:45]: How did you come up? What's up with that?Erik [00:18:47]: Yeah.Swyx [00:18:48]: Yeah.Erik [00:18:49]: I mean, this, so I'd say that a lot of previous agent work built sort of these very hard coded and rigid workflows where the model is sort of pushed through certain flows of steps. And I think to some extent, you know, that's needed with smaller models and models that are less smart. But one of the things that we really wanted to explore was like, let's really give Claude the reins here and not force Claude to do anything, but let Claude decide, you know, how it should approach the problem, what steps it should do. 
And so really, you know, what we did is like the most extreme version of this is just give it some tools that it can call and it's able to keep calling the tools, keep thinking, and then yeah, keep doing that until it thinks it's done. And that's sort of the most, the most minimal agent framework that we came up with. And I think that works very well. I think especially the new Sonnet 3.5 is very, very good at self-correction, has a lot of like grit. Claude will try things that fail and then try, you know, come back and sort of try different approaches. And I think that's something that you didn't see in a lot of previous models. Some of the existing agent frameworks that I looked at, they had whole systems built to try to detect loops and see, oh, is the model doing the same thing, you know, more than three times, then we have to pull it out. And I think like the smarter the models are, the less you need that kind of extra scaffolding. So yeah, just giving the model tools and letting it keep sample and call tools until it thinks it's done was the most minimal framework that we could think of. And so that's what we did.Alessio [00:20:18]: So you're not pruning like bad paths from the context. If it tries to do something, it fails. You just burn all these tokens.Swyx [00:20:25]: Yes.Erik [00:20:26]: I would say the downside of this is that this is sort of a very token expensive way to doSwyx [00:20:29]: this. But still, it's very common to prune bad paths because models get stuck. Yeah.Erik [00:20:35]: But I'd say that, yeah, 3.5 is not getting stuck as much as previous models. And so, yeah, we wanted to at least just try the most minimal thing. Now, I would say that, you know, this is definitely an area of future research, especially if we talk about these problems that are going to take a human more than four hours. Those might be things where we're going to need to go prune bad paths to let the model be able to accomplish this task within 200k tokens. So certainly I think there's like future research to be done in that area, but it's not necessary to do well on these benchmarks.Swyx [00:21:06]: Another thing I always have questions about on context window things, there's a mini cottage industry of code indexers that have sprung up for large code bases, like the ones in SweetBench. You didn't need them? We didn't.Erik [00:21:18]: And I think I'd say there's like two reasons for this. One is like SweetBench specific and the other is a more general thing. The more general thing is that I think Sonnet is very good at what we call agentic search. And what this basically means is letting the model decide how to search for something. It gets the results and then it can decide, should it keep searching or is it done? Does it have everything it needs? So if you read through a lot of the traces of the SweetBench, the model is calling tools to view directories, list out things, view files. And it will do a few of those until it feels like it's found the file where the bug is. And then it will start working on that file. And I think like, again, this is all, everything we did was about just giving Claude the full reins. So there's no hard-coded system. There's no search system that you're relying on getting the correct files into context. This just totally lets Claude do it.Swyx [00:22:11]: Or embedding things into a vector database. Exactly. Oops. No, no.Erik [00:22:17]: This is very, very token expensive. And so certainly, and it also takes many, many turns. 
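Stepping back to the "most minimal agent framework" Erik describes, a bare-bones tool-calling loop might look roughly like the sketch below, assuming the Anthropic Python SDK; the single bash tool, its schema, and the model alias are assumptions for illustration, not the harness used for the SWE-Bench submission.

```python
# Bare-bones tool-use loop: hand the model a tool, let it keep calling tools and
# thinking until it decides it is done (or the conversation gets too long).
import subprocess
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [
    {
        "name": "bash",
        "description": (
            "Run a shell command and return its combined stdout/stderr. "
            "Do not launch interactive programs (e.g. vim); they never return."
        ),
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
]

def run_tool(name: str, args: dict) -> str:
    if name == "bash":
        try:
            proc = subprocess.run(args["command"], shell=True, text=True,
                                  capture_output=True, timeout=120)
            return (proc.stdout + proc.stderr)[-10_000:]  # crude output truncation
        except subprocess.TimeoutExpired:
            return "Error: command timed out after 120s."
    return f"Error: unknown tool {name}"

def agent(task: str, max_turns: int = 100) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        resp = client.messages.create(
            model="claude-3-5-sonnet-latest",  # assumed alias
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            # The model decided it is done; return its final text.
            return "".join(b.text for b in resp.content if b.type == "text")
        tool_results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": run_tool(b.name, b.input)}
            for b in resp.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})
    return "Ran out of turns before the model finished."
```

There is no hard-coded workflow here: the loop just keeps sampling and executing tool calls until the model stops asking for tools, which is the "let Claude decide" approach discussed above.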
And so certainly if you want to do something in a single turn, you need to do RAG and just push stuff into the first prompt.Alessio [00:22:28]: And just to make it clear, it's using the Bash tool, basically doing LS, looking at files and then doing CAD for the following context. It can do that.Erik [00:22:35]: But it's file editing tool also has a command in it called view that can view a directory. It's very similar to LS, but it just sort of has some nice sort of quality of life improvements. So I think it'll only do an LS sort of two directories deep so that the model doesn't get overwhelmed if it does this on a huge file. I would say actually we did more engineering of the tools than the overall prompt. But the one other thing I want to say about this agentic search is that for SWE-Bench specifically, a lot of the tasks are bug reports, which means they have a stack trace in them. And that means right in that first prompt, it tells you where to go. And so I think this is a very easy case for the model to find the right files versus if you're using this as a general coding assistant where there isn't a stack trace or you're asking it to insert a new feature, I think there it's much harder to know which files to look at. And that might be an area where you would need to do more of this exhaustive search where an agentic search would take way too long.Swyx [00:23:33]: As someone who spent the last few years in the JS world, it'd be interesting to see SWE-Bench JS because these stack traces are useless because of so much virtualization that we do. So they're very, very disconnected with where the code problems are actually appearing.Erik [00:23:50]: That makes me feel better about my limited front-end experience, as I've always struggled with that problem.Swyx [00:23:55]: It's not your fault. We've gotten ourselves into a very, very complicated situation. And I'm not sure it's entirely needed. But if you talk to our friends at Vercel, they will say it is.Erik [00:24:04]: I will say SWE-Bench just released SWE-Bench Multimodal, which I believe is either entirely JavaScript or largely JavaScript. And it's entirely things that have visual components of them.Swyx [00:24:15]: Are you going to tackle that? We will see.Erik [00:24:17]: I think it's on the list and there's interest, but no guarantees yet.Swyx [00:24:20]: Just as a side note, it occurs to me that every model lab, including Enthopic, but the others as well, you should have your own SWE-Bench, whatever your bug tracker tool. This is a general methodology that you can use to track progress, I guess.Erik [00:24:34]: Yeah, sort of running on our own internal code base.Swyx [00:24:36]: Yeah, that's a fun idea.Alessio [00:24:37]: Since you spend so much time on the tool design, so you have this edit tool that can make changes and whatnot. Any learnings from that that you wish the AI IDEs would take in? Is there some special way to look at files, feed them in?Erik [00:24:50]: I would say the core of that tool is string replace. And so we did a few different experiments with different ways to specify how to edit a file. And string replace, basically, the model has to write out the existing version of the string and then a new version, and that just gets swapped in. We found that to be the most reliable way to do these edits. Other things that we tried were having the model directly write a diff, having the model fully regenerate files. 
That one is actually the most accurate, but it takes so many tokens, and if you're in a very big file, it's cost prohibitive. There's basically a lot of different ways to represent the same task. And they actually have pretty big differences in terms of model accuracy. I think Eider, they have a really good blog where they explore some of these different methods for editing files, and they post results about them, which I think is interesting. But I think this is a really good example of the broader idea that you need to iterate on tools rather than just a prompt. And I think a lot of people, when they make tools for an LLM, they kind of treat it like they're just writing an API for a computer, and it's sort of very minimal. It's sort of just the bare bones of what you'd need, and honestly, it's so hard for the models to use those. Again, I come back to anthropomorphizing these models. Imagine you're a developer, and you just read this for the very first time, and you're trying to use it. You can do so much better than just sort of the bare API spec of what you'd often see. Include examples in the description. Include really detailed explanations of how things work. And I think that, again, also think about what is the easiest way for the model to represent the change that it wants to make. For file editing, as an example, writing a diff is actually... Let's take the most extreme example. You want the model to literally write a patch file. I think patch files have at the very beginning numbers of how many total lines change. That means before the model has actually written the edit, it needs to decide how many numbers or how many lines are going to change.Swyx [00:26:52]: Don't quote me on that.Erik [00:26:54]: I think it's something like that, but I don't know if that's exactly the diff format. But you can certainly have formats that are much easier to express without messing up than others. And I like to think about how much human effort goes into designing human interfaces for things. It's incredible. This is entirely what FrontEnd is about, is creating better interfaces to kind of do the same things. And I think that same amount of attention and effort needs to go into creating agent computer interfaces.Swyx [00:27:19]: It's a topic we've discussed, ACI or whatever that looks like. I would also shout out that I think you released some of these toolings as part of computer use as well. And people really liked it. It's all open source if people want to check it out. I'm curious if there's an environment element that complements the tools. So how do you... Do you have a sandbox? Is it just Docker? Because that can be slow or resource intensive. Do you have anything else that you would recommend?Erik [00:27:47]: I don't think I can talk about sort of public details or about private details about how we implement our sandboxing. But obviously, we need to have sort of safe, secure, and fast sandboxes for training for the models to be able to practice writing code and working in an environment.Swyx [00:28:03]: I'm aware of a few startups working on agent sandboxing. E2B is a close friend of ours that Alessio has led around in, but also I think there's others where they're focusing on snapshotting memory so that it can do time travel for debugging. Computer use where you can control the mouse or keyboard or something like that. Whereas here, I think that the kinds of tools that we offer are very, very limited to coding agent work cases like bash, edit, you know, stuff like that. 
Yeah.Erik [00:28:30]: I think the computer use demo that we released is an extension of that. It has the same bash and edit tools, but it also has the computer tool that lets it get screenshots and move the mouse and keyboard. Yeah. So I definitely think there's sort of more general tools there. And again, the tools we released as part of SweetBench were, I'd say they're very specific for like editing files and doing bash, but at the same time, that's actually very general if you think about it. Like anything that you would do on a command line or like editing files, you can do with those tools. And so we do want those tools to feel like any sort of computer terminal work could be done with those same tools rather than making tools that were like very specific for SweetBench like run tests as its own tool, for instance. Yeah.Swyx [00:29:15]: You had a question about tests.Alessio [00:29:16]: Yeah, exactly. I saw there's no test writer tool. Is it because it generates the code and then you're running it against SweetBench anyway, so it doesn't really need to write the test or?Swyx [00:29:26]: Yeah.Erik [00:29:27]: So this is one of the interesting things about SweetBench is that the tests that the model's output is graded on are hidden from it. That's basically so that the model can't cheat by looking at the tests and writing the exact solution. And I'd say typically the model, the first thing it does is it usually writes a little script to reproduce the error. And again, most SweetBench tasks are like, hey, here's a bug that I found. I run this and I get this error. So the first thing the model does is try to reproduce that. So it's kind of been rerunning that script as a mini test. But yeah, sometimes the model will like accidentally introduce a bug that breaks some other tests and it doesn't know about that.Alessio [00:30:05]: And should we be redesigning any tools? We kind of talked about this and like having more examples, but I'm thinking even things of like Q as a query parameter in many APIs, it's like easier for the model to like re-query than read the Q. I'm sure it learned the Q by this point, but like, is there anything you've seen like building this where it's like, hey, if I were to redesign some CLI tools, some API tool, I would like change the way structure to make it better for LLMs?Erik [00:30:31]: I don't think I've thought enough about that off the top of my head, but certainly like just making everything more human friendly, like having like more detailed documentation and examples. I think examples are really good in things like descriptions, like so many, like just using the Linux command line, like how many times I do like dash dash help or look at the man page or something. It's like, just give me one example of like how I actually use this. Like I don't want to go read through a hundred flags. Just give me the most common example. But again, so you know, things that would be useful for a human, I think are also very useful for a model.Swyx [00:31:03]: Yeah. I mean, there's one thing that you cannot give to code agents that is useful for human is this access to the internet. I wonder how to design that in, because one of the issues that I also had with just the idea of a suite bench is that you can't do follow up questions. You can't like look around for similar implementations. These are all things that I do when I try to fix code and we don't do that. 
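As a concrete illustration of the string-replace editing approach and the "make the tool foolproof" philosophy discussed here, a handler for such a tool might look roughly like this; the description text, uniqueness check, and error messages are assumptions for the sketch, not Anthropic's released tool.

```python
# String-replace edit tool: the model writes out the exact existing text and its
# replacement, and the handler rejects ambiguous or relative-path edits so the
# tool is hard to use the wrong way. Error strings here are illustrative guesses.
from pathlib import Path

EDIT_TOOL = {
    "name": "str_replace",
    "description": (
        "Replace one occurrence of old_str with new_str in the file at path. "
        "path must be absolute. old_str must match the file exactly, including "
        "whitespace, and must appear exactly once.\n"
        "Example input: {\"path\": \"/repo/app.py\", "
        "\"old_str\": \"return 1\", \"new_str\": \"return 2\"}"
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string"},
            "old_str": {"type": "string"},
            "new_str": {"type": "string"},
        },
        "required": ["path", "old_str", "new_str"],
    },
}

def str_replace(path: str, old_str: str, new_str: str) -> str:
    """Apply the edit, returning a message the model can act on if it failed."""
    if not path.startswith("/"):
        return "Error: path must be absolute so the edit does not depend on the cwd."
    p = Path(path)
    if not p.is_file():
        return f"Error: {path} does not exist."
    text = p.read_text()
    count = text.count(old_str)
    if count == 0:
        return "Error: old_str not found; copy it exactly from the file."
    if count > 1:
        return f"Error: old_str occurs {count} times; add surrounding context to make it unique."
    p.write_text(text.replace(old_str, new_str, 1))
    return "OK: edit applied."
```

Note how the detailed description and the error messages do double duty: they document the tool for the model the way you would for a junior engineer, and they push it back onto the happy path when it slips up.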
It's not, it wouldn't be fair, like it'd be too easy to cheat, but then also it's kind of not being fair to these agents because they're not operating in a real world situation. Like if I had a real world agent, of course I'm giving it access to the internet because I'm not trying to pass a benchmark. I don't have a question in there more, more just like, I feel like the most obvious tool access to the internet is not being used.Erik [00:31:47]: I think that that's really important for humans, but honestly the models have so much general knowledge from pre-training that it's, it's like less important for them. I feel like versioning, you know, if you're working on a newer thing that was like, they came after the knowledge cutoff, then yes, I think that's very important. I think actually this, this is like a broader problem that there is a divergence between Sweebench and like what customers will actually care about who are working on a coding agent for real use. And I think one of those there is like internet access and being able to like, how do you pull in outside information? I think another one is like, if you have a real coding agent, you don't want to have it start on a task and like spin its wheels for hours because you gave it a bad prompt. You want it to come back immediately and ask follow up questions and like really make sure it has a very detailed understanding of what to do, then go off for a few hours and do work. So I think that like real tasks are going to be much more interactive with the agent rather than this kind of like one shot system. And right now there's no benchmark that, that measures that. And maybe I think it'd be interesting to have some benchmark that is more interactive. I don't know if you're familiar with TauBench, but it's a, it's a customer service benchmark where there's basically one LLM that's playing the user or the customer that's getting support and another LLM that's playing the support agent and they interact and try to resolve the issue.Swyx [00:33:08]: Yeah. We talked to the LMSIS guys. Awesome. And they also did MTBench for people listening along. So maybe we need MTSWE-Bench. Sure. Yeah.Erik [00:33:16]: So maybe, you know, you could have something where like before the SWE-Bench task starts, you have like a few back and forths with kind of like the, the author who can answer follow up questions about what they want the task to do. And of course you'd need to do that where it doesn't cheat and like just get the exact, the exact thing out of the human or out of the sort of user. But I think that would be a really interesting thing to see. If you look at sort of existing agent work, like a Repl.it's coding agent, I think one of the really great UX things they do is like first having the agent create a plan and then having the human approve that plan or give feedback. I think for agents in general, like having a planning step at the beginning, one, just having that plan will improve performance on the downstream task just because it's kind of like a bigger chain of thought, but also it's just such a better UX. It's way easier for a human to iterate on a plan with a model rather than iterating on the full task that sort of has a much slower time through each loop. If the human has approved this implementation plan, I think it makes the end result a lot more sort of auditable and trustable. So I think there's a lot of things sort of outside of SweetBench that will be very important for real agent usage in the world. 
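Here is a minimal sketch of the plan-first, human-approve interaction Erik points to (in the spirit of the Repl.it pattern he mentions); the prompts and the simple input()-based approval step are invented for illustration, assuming the Anthropic Python SDK.

```python
# Plan-first flow: ask for a short plan, let a human approve or edit it, then do the
# work conditioned on the approved plan. Prompts and the approval mechanism are
# illustrative, not any particular product's implementation.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-latest"  # assumed alias

def text_of(resp) -> str:
    return "".join(b.text for b in resp.content if b.type == "text")

def plan_then_execute(task: str) -> str:
    plan = text_of(client.messages.create(
        model=MODEL, max_tokens=1024,
        system="Produce a short numbered implementation plan. Do not write code yet.",
        messages=[{"role": "user", "content": task}],
    ))
    print(plan)
    edited = input("Edit the plan, or press Enter to approve: ").strip() or plan
    return text_of(client.messages.create(
        model=MODEL, max_tokens=4096,
        system="Implement the task, following the approved plan.",
        messages=[{"role": "user",
                   "content": f"Task:\n{task}\n\nApproved plan:\n{edited}"}],
    ))
```

Iterating on the short plan is much cheaper than iterating on a finished implementation, and the approved plan also leaves an auditable record of what the agent was asked to do.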
Yeah.Swyx [00:34:27]: I will say also, there's a couple of comments on names that you dropped. Copilot also does the plan stage before it writes code. I feel like those approaches have generally been less Twitter successful because it's not prompt to code, it's prompt plan code. You know, so there's a little bit of friction in there, but it's not much. Like it's, it actually, it's, it, you get a lot for what it's worth. I also like the way that Devin does it, where you can sort of edit the plan as it goes along. And then the other thing with Repl.it, we had a, we hosted a sort of dev day pregame with Repl.it and they also commented about multi-agents. So like having two agents kind of bounce off of each other. I think it's a similar approach to what you're talking about with kind of the few shot example, just as in the prompts of clarifying what the agent wants. But typically I think this would be implemented as a tool calling another agent, like a sub-agent I don't know if you explored that, do you like that idea?Erik [00:35:20]: I haven't explored this enough, but I've definitely heard of people having good success with this. Of almost like basically having a few different sort of personas of agents, even if they're all the same LLM. I think this is one thing with multi-agent that a lot of people will kind of get confused by is they think it has to be different models behind each thing. But really it's sort of usually the same, the same model with different prompts. And yet having one, having them have different personas to kind of bring different sort of thoughts and priorities to the table. I've seen that work very well and sort of create a much more thorough and thought outSwyx [00:35:53]: response.Erik [00:35:53]: I think the downside is just that it adds a lot of complexity and it adds a lot of extra tokens. So I think it depends what you care about. If you want a plan that's very thorough and detailed, I think it's great. If you want a really quick, just like write this function, you know, you probably don't want to do that and have like a bunch of different calls before it does this.Alessio [00:36:11]: And just talking about the prompt, why are XML tags so good in Cloud? I think initially people were like, oh, maybe you're just getting lucky with XML. But I saw obviously you use them in your own agent prompts, so they must work. And why is it so model specific to your family?Erik [00:36:26]: Yeah, I think that there's, again, I'm not sure how much I can say, but I think there's historical reasons that internally we've preferred XML. I think also the one broader thing I'll say is that if you look at certain kinds of outputs, there is overhead to outputting in JSON. If you're trying to output code in JSON, there's a lot of extra escaping that needs to be done, and that actually hurts model performance across the board. Versus if you're in just a single XML tag, there's none of that sort of escaping thatSwyx [00:36:58]: needs to happen.Erik [00:36:58]: That being said, I haven't tried having it write HTML and XML, which maybe then you start running into weird escaping things there. I'm not sure. But yeah, I'd say that's some historical reasons, and there's less overhead of escaping.Swyx [00:37:12]: I use XML in other models as well, and it's just a really nice way to make sure that the thing that ends is tied to the thing that starts. 
That's the only way to do code fences where you're pretty sure example one start, example one end, that is one cohesive unit.Alessio [00:37:30]: Because the braces are nondescriptive. Yeah, exactly.Swyx [00:37:33]: That would be my simple reason. XML is good for everyone, not just Cloud. Cloud was just the first one to popularize it, I think.Erik [00:37:39]: I do definitely prefer to read XML than read JSON.Alessio [00:37:43]: Any other details that are maybe underappreciated? I know, for example, you had the absolute paths versus relative. Any other fun nuggets?Erik [00:37:52]: I think that's a good sort of anecdote to mention about iterating on tools. Like I said, spend time prompt engineering your tools, and don't just write the prompt, but write the tool, and then actually give it to the model and read a bunch of transcripts about how the model tries to use the tool. I think by doing that, you will find areas where the model misunderstands a tool or makes mistakes, and then basically change the tool to make it foolproof. There's this Japanese term, pokayoke, about making tools mistake-proof. You know, the classic idea is you can have a plug that can fit either way, and that's dangerous, or you can make it asymmetric so that it can't fit this way, it has to go like this, and that's a better tool because you can't use it the wrong way. So for this example of absolute paths, one of the things that we saw while testing these tools is, oh, if the model has done CD and moved to a different directory, it would often get confused when trying to use the tool because it's now in a different directory, and so the paths aren't lining up. So we said, oh, well, let's just force the tool to always require an absolute path, and then that's easy for the model to understand. It knows sort of where it is. It knows where the files are. And then once we have it always giving absolute paths, it never messes up even, like, no matter where it is because it just, if you're using an absolute path, it doesn't matter whereSwyx [00:39:13]: you are.Erik [00:39:13]: So iterations like that, you know, let us make the tool foolproof for the model. I'd say there's other categories of things where we see, oh, if the model, you know, opens vim, like, you know, it's never going to return. And so the tool is stuck.Swyx [00:39:28]: Did it get stuck? Yeah. Get out of vim. What?Erik [00:39:31]: Well, because the tool is, like, it just text in, text out. It's not interactive. So it's not like the model doesn't know how to get out of vim. It's that the way that the tool is, like, hooked up to the computer is not interactive. Yes, I mean, there is the meme of no one knows how to get out of vim. You know, basically, we just added instructions in the tool of, like, hey, don't launch commands that don't return.Swyx [00:39:54]: Yeah, like, don't launch vim.Erik [00:39:55]: Don't launch whatever. If you do need to do something, you know, put an ampersand after it to launch it in the background. And so, like, just, you know, putting kind of instructions like that just right in the description for the tool really helps the model. And I think, like, that's an underutilized space of prompt engineering, where, like, people might try to do that in the overall prompt, but just put that in the tool itself so the model knows that it's, like, for this tool, this is what's relevant.Swyx [00:40:20]: You said you worked on the function calling and tool use before you actually started this vBench work, right? Was there any surprises? 
Because you basically went from creator of that API to user of that API. Any surprises or changes you would make now that you have extensively dog-fooded in a state-of-the-art agent?Erik [00:40:39]: I want us to make, like, maybe, like, a little bit less verbose SDK. I think some way, like, right now, it just takes, I think we sort of force people to do the best practices of writing out sort of these full JSON schemas, but it would be really nice if you could just pass in a Python function as a tool. I think that could be something nice.Swyx [00:40:58]: I think that there's a lot of, like, Python- There's helper libraries. ... structure, you know. I don't know if there's anyone else that is specializing for Anthropic. Maybe Jeremy Howard's and Simon Willis's stuff. They all have Cloud-specific stuff that they are working on. Cloudette. Cloudette, exactly. I also wanted to spend a little bit of time with SuiteAgent. It seems like a very general framework. Like, is there a reason you picked it apart from it's the same authors as vBench, or?Erik [00:41:21]: The main thing we wanted to go with was the same authors as vBench, so it just felt sort of like the safest, most neutral option. And it was, you know, very high quality. It was very easy to modify, to work with. I would say it also actually, their underlying framework is sort of this, it's like, youSwyx [00:41:39]: know, think, act, observe.Erik [00:41:40]: That they kind of go through this loop, which is like a little bit more hard-coded than what we wanted to do, but it's still very close. That's still very general. So it felt like a good match as sort of the starting point for our agent. And we had already sort of worked with and talked with the SWE-Bench people directly, so it felt nice to just have, you know, we already know the authors. This will be easy to work with.Swyx [00:42:00]: I'll share a little bit of like, this all seems disconnected, but once you figure out the people and where they go to school, it all makes sense. So it's all Princeton. Yeah, the SWE-Bench and SuiteAgent.Erik [00:42:11]: It's a group out of Princeton.Swyx [00:42:12]: Yeah, and we had Shun Yu on the pod, and he came up with the React paradigm, and that's think, act, observe. That's all React. So they're all friends. Yep, yeah, exactly.Erik [00:42:22]: And you know, if you actually read our traces of our submission, you can actually see like think, act, observe in our logs. And we just didn't even change the printing code. So it's like doing still function calls under the hood, and the model can do sort of multiple function calls in a row without thinking in between if it wants to. But yeah, so a lot of similarities and a lot of things we inherited from SuiteAgent just as a starting point for the framework.Alessio [00:42:47]: Any thoughts about other agent frameworks? I think there's, you know, the whole gamut from very simple to like very complex.Swyx [00:42:53]: Autogen, CooEI, LandGraph. Yeah, yeah.Erik [00:42:56]: I think I haven't explored a lot of them in detail. I would say with agent frameworks in general, they can certainly save you some like boilerplate. But I think there's actually this like downside of making agents too easy, where you end up very quickly like building a much more complex system than you need. And suddenly, you know, instead of having one prompt, you have five agents that are talking to each other and doing a dialogue. 
And it's like, because the framework made that 10 lines to do, you end up building something that's way too complex. So I think I would actually caution people to like try to start without these frameworks if you can, because you'll be closer to the raw prompts and be able to sort of directly understand what's going on. I think a lot of times these frameworks also, by trying to make everything feel really magical, you end up sort of really hiding what the actual prompt and output of the model is, and that can make it much harder to debug. So certainly these things have a place, and I think they do really help at getting rid of boilerplate, but they come with this cost of obfuscating what's really happening and making it too easy to very quickly add a lot of complexity. So yeah, I would recommend people to like try it from scratch, and it's like not that bad.Alessio [00:44:08]: Would you rather have like a framework of tools? Do you almost see like, hey, it's maybe easier to get tools that are already well curated, like the ones that you build, if I had an easy way to get the best tool from you, andSwyx [00:44:21]: like you maintain the definition?Alessio [00:44:22]: Or yeah, any thoughts on how you want to formalize tool sharing?Erik [00:44:26]: Yeah, I think that's something that we're certainly interested in exploring, and I think there is space for sort of these general tools that will be very broadly applicable. But at the same time, most people that are building on these, they do have much more specific things that they're trying to do. You know, I think that might be useful for hobbyists and demos, but the ultimate end applications are going to be bespoke. And so we just want to make sure that the model's great at any tool that it uses. But certainly something we're exploring.Alessio [00:44:52]: So everything bespoke, no frameworks, no anything.Swyx [00:44:55]: Just for now, for now.Erik [00:44:56]: Yeah, I would say that like the best thing I've seen is people building up from like, build some good util functions, and then you can use those as building blocks. Yeah, yeah.Alessio [00:45:05]: I have a utils folder, or like all these scripts. My framework is like def, call, and tropic. And then I just put all the defaults.Swyx [00:45:12]: Yeah, exactly. There's a startup hidden in every utils folder, you know? No, totally not. Like, if you use it enough, like it's a startup, you know? At some point. I'm kind of curious, is there a maximum length of turns that it took? Like, what was the longest run? I actually don't.Erik [00:45:27]: I mean, it had basically infinite turns until it ran into a 200k context. I should have looked this up. I don't know. And so for some of those failed cases where it eventually ran out of context, I mean, it was over 100 turns. I'm trying to remember like the longest successful run, but I think it was definitely over 100 turns that some of the times.Swyx [00:45:48]: Which is not that much. It's a coffee break. Yeah.Erik [00:45:52]: But certainly, you know, these things can be a lot of turns. And I think that's because some of these things are really hard, where it's going to take, you know, many tries to do it. And if you think about like, think about a task that takes a human four hours to do. Think about how many different files you read, and like times you edit a file in four hours. That's a lot more than 100.Alessio [00:46:10]: How many times you open Twitter because you get distracted. 
But if you had a lot more compute, what's kind of like the return on the extra compute now? So like, you know, if you had thousands of turns or like whatever, like how much better would it get?Erik [00:46:23]: Yeah, this I don't know. And I think this is, I think sort of one of the open areas of research in general with agents is memory and sort of how do you have something that can do work beyond its context length where you're just purely appending. So you mentioned earlier things like pruning bad paths. I think there's a lot of interesting work around there. Can you just roll back but summarize, hey, don't go down this path? There be dragons. Yeah, I think that's very interesting that you could have something that that uses way more tokens without ever using at a time more than 200k. So I think that's very interesting. I think the biggest thing is like, can you make the model sort of losslessly summarize what it's learned from trying different approaches and bring things back? I think that's sort of the big challenge.Swyx [00:47:11]: What about different models?Alessio [00:47:12]: So you have Haiku, which is like, you know, cheaper. So you're like, well, what if I have a Haiku to do a lot of these smaller things and then put it back up?Erik [00:47:20]: I think Cursor might have said that they actually have a separate model for file editing.Swyx [00:47:25]: I'm trying to remember.Erik [00:47:25]: I think they were on maybe the Lex Fridman podcast where they said they have a bigger model, like write what the code should be and then a different model, like apply it. So I think there's a lot of interesting room for stuff like that. Yeah, fast supply.Swyx [00:47:37]: We actually did a pod with Fireworks that they worked with on. It's speculative decoding.Erik [00:47:41]: But I think there's also really interesting things about like, you know, paring down input tokens as well, especially sometimes the models trying to read like a 10,000 line file. That's a lot of tokens. And most of it is actually not going to be relevant. I think it'd be really interesting to like delegate that to Haiku. Haiku read this file and just pull out the most relevant functions. And then, you know, Sonnet reads just those and you save 90% on tokens. I think there's a lot of really interesting room for things like that. And again, we were just trying to do sort of the simplest, most minimal thing and show that it works. I'm really hoping that people, sort of the agent community builds things like that on top of our models. That's, again, why we released these tools. We're not going to go and do lots more submissions to SWE-Bench and try to prompt engineer this and build a bigger system. We want people to like the ecosystem to do that on top of our models. But yeah, so I think that's a really interesting one.Swyx [00:48:32]: It turns out, I think you did do 3.5 Haiku with your tools and it scored a 40.6. Yes.Erik [00:48:38]: So it did very well. It itself is actually very smart, which is great. But we haven't done any experiments with this combination of the two models. But yeah, I think that's one of the exciting things is that how well Haiku 3.5 did on SWE-Bench shows that sort of even our smallest, fastest model is very good at sort of thinking agentically and working on hard problems. Like it's not just sort of for writing simple text anymore.Alessio [00:49:02]: And I know you're not going to talk about it, but like Sonnet is not even supposed to be the best model, you know? 
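A sketch of the "delegate file reading to a smaller model" idea floated above: a cheap model prunes a long file down to the relevant functions, and the larger model reasons only over that excerpt. The model aliases are placeholders and the prompts are illustrative; this is not a recommended production setup, just the shape of the pattern.

```python
# Sketch: small-model prefilter feeding a larger model, to cut input tokens.
import anthropic

client = anthropic.Anthropic()

def ask(model: str, prompt: str) -> str:
    resp = client.messages.create(model=model, max_tokens=4096,
                                  messages=[{"role": "user", "content": prompt}])
    return "".join(b.text for b in resp.content if b.type == "text")

def answer_about_file(path: str, question: str) -> str:
    source = open(path).read()                      # possibly a 10,000-line file
    excerpt = ask("claude-3-5-haiku-latest",        # cheap pass: prune the input
                  "Copy only the functions relevant to this question, verbatim.\n"
                  f"Question: {question}\n\nFile:\n{source}")
    return ask("claude-3-5-sonnet-latest",          # expensive pass: smaller input
               "Answer the question using this excerpt.\n"
               f"Question: {question}\n\nExcerpt:\n{excerpt}")
```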
Like Opus, it's kind of like we left it at three back in the corner intro. At some point, I'm sure the new Opus will come out. And if you had Opus Plus on it, that sounds very, very good.Swyx [00:49:19]: There's a run with SuiteAgent plus Opus, but that's the official SWE-Bench guys doing it.Erik [00:49:24]: That was the older, you know, 3.0.Swyx [00:49:25]: You didn't do yours. Yeah. Okay. Did you want to? I mean, you could just change the model name.Erik [00:49:31]: I think we didn't submit it, but I think we included it in our model card.Swyx [00:49:35]: Okay.Erik [00:49:35]: We included the score as a comparison. Yeah.Swyx [00:49:38]: Yeah.Erik [00:49:38]: And Sonnet and Haiku, actually, I think the new ones, they both outperformed the original Opus. Yeah. I did see that.Swyx [00:49:44]: Yeah. It's a little bit hard to find. Yeah.Erik [00:49:47]: It's not an exciting score, so we didn't feel like they need to submit it to the benchmark.Swyx [00:49:52]: We can cut over to computer use if we're okay with moving on to topics on this, if anything else. I think we're good.Erik [00:49:58]: I'm trying to think if there's anything else SWE-Bench related.Swyx [00:50:02]: It doesn't have to be also just specifically SWE-Bench, but just your thoughts on building agents, because you are one of the few people that have reached this leaderboard on building a coding agent. This is the state of the art. It's surprisingly not that hard to reach with some good principles. Right. There's obviously a ton of low-hanging fruit that we covered. Your thoughts on if you were to build a coding agent startup, what next?Erik [00:50:24]: I think the really interesting question for me, for all the startups out there, is this kind of divergence between the benchmarks and what real customers will want. So I'm curious, maybe the next time you have a coding agent startup on the podcast, you should ask them that. What are the differences that they're starting to make? Tomorrow.Swyx [00:50:40]: Oh, perfect, perfect. Yeah.Erik [00:50:41]: I'm actually very curious what they will see, because I also have seen, I feel like it's slowed down a little bit if I don't see the startups submitting to SWE-Bench that much anymore.Swyx [00:50:52]: Because of the traces, the trace. So we had Cosign on, they had a 50-something on full, on SWE-Bench full, which is the hardest one, and they were rejected because they didn't want to submit their traces. Yep. IP, you know? Yeah, that makes sense, that makes sense. Actually, tomorrow we're talking to Bolt, which is a cloud customer. You guys actually published a case study with them. I assume you weren't involved with that, but they were very happy with Cloud. Cool. One of the biggest launches of the year. Yeah, totally. We actually happened to b

The Leadership Launchpad Project
How to Grow Your People into Leaders with Candemir Akyildiz

The Leadership Launchpad Project

Play Episode Listen Later Nov 26, 2024 29:54


Candemir Akyildiz, Director at TAV Airports, joins Rob Kalwarowsky on the Leadership Launchpad Podcast to talk about how to get promoted, leading high-performance teams and growing people in your organization. Candemir AKYILDIZ, Airports Director, IAP, MBA, TAV Airports Holding. E-mail: candemir.akyildiz@tav.aero | LinkedIn: https://www.linkedin.com/in/candemirakyildiz-iap/ | Phone: +90 533 967 16 48. Driven and results-oriented aviation professional with over 23 years of dedicated experience in Airport Management. Throughout my career, I've garnered extensive expertise across various facets of airport operations, enabling me to excel in diverse management roles within the industry. My commitment to excellence has been recognized through my attainment of the prestigious International Airport Management certification, earned upon completing the AMPAP program offered through the collaboration of ICAO and ACI. In addition to my practical experience, I've had the privilege of contributing to the industry on a broader scale as a member of the ACI World Facilitation & Services Standing Committee. This role has not only allowed me to stay at the forefront of industry trends but also to actively shape policies and practices that drive operational efficiency and customer satisfaction at airports worldwide. Find Rob Kalwarowsky, World-Renowned Executive Coach, Author & TEDx Speaker, at the following links: https://www.robkalwarowsky.com/ and https://www.linkedin.com/in/robert-kalwarowsky/

IBM Expert Radio
Evolving the Payments Ecosystem with ACI Worldwide

IBM Expert Radio

Play Episode Listen Later Nov 19, 2024 15:22


“Customers are demanding faster time-to-market, and faster responses. They want more flexibility in the way that they can process payments,” says Ray Caradine, ACI Worldwide's product director. ACI has been helping clients manage payments workloads on the mainframe for decades, so they know where the ecosystem is heading. “We see IBM Z in the context of resilience and availability,” Caradine continues. “It's actually running in a data center where people can see it and touch it and control it.” More than ever, up-time, reliability, and security make the mainframe an ideal payments platform. Listen to learn how AI will analyze payments data to detect fraud and support custom-usage models -- and how Linux will help meet regulatory requirements and enable hybrid solutions. Resources: Visit the ISV Ecosystem User Group on the IBM Z and LinuxONE Community for more updates on how ISVs and partners are innovating the IBM Z platform: blogs, events, videos, discussions, and more. Join here. Subscribe to z/Action! Each month we meet some of the world's most innovative companies as they share how they're expanding horizons and driving success with IBM Z.

Liberty & Justice with Matt Whitaker
John McLaughlin, Donald Trump's pollster, joins Liberty & Justice with Matt Whitaker, Season 3, Episode 17.

Liberty & Justice with Matt Whitaker

Play Episode Listen Later Oct 31, 2024 28:55


John McLaughlin, Donald Trump's pollster, joins Liberty & Justice with Matt Whitaker, Season 3, Episode 17. Presented by American Cornerstone Institute. Learn more about ACI at https://americancornerstone.org/ Watch every episode of Liberty & Justice at www.whitaker.tv. John McLaughlin, CEO and Partner, McLaughlin and Associates. More here: mclaughlinonline.com. John McLaughlin has worked professionally as a strategic consultant and pollster for over 35 years. During this time he has earned a reputation for helping some of America's most successful corporations and winning some of the toughest elections in the nation. In 2016 John worked as an advisor and pollster for Donald Trump from the primaries through election day. His political clients have included former Presidential candidates Steve Forbes and Fred Thompson, former California Governor Arnold Schwarzenegger, former Florida Governor Jeb Bush, former Georgia Governor Nathan Deal and 22 current and former U.S. Senators and 16 current Republican members of Congress. Internationally, John has done work in Israel for Prime Minister Benjamin Netanyahu, The Conservative Party in the United Kingdom, former Conservative Prime Minister Stephen Harper of Canada and he advised Hungarian Prime Minister Viktor Orban in his 2018 landslide re-election. He is a founding partner of Opiniones Latinas, a public opinion research company dedicated to researching opinions of Latinos nationwide. John has appeared on every major broadcast and cable channel, as well as prominent radio talk shows across America. His articles have been published in a wide range of publications including National Review, Middle East Quarterly, Campaigns and Elections, and The Polling Report. His work has been recognized by winning Telly and PR Week Campaign Awards. John is a graduate of Fordham College (B.A.) and holds an M.B.A. from Fordham University with concentrations in Finance and Quantitative Methods. He is also a member of MENSA. Matthew G. Whitaker was acting Attorney General of the United States (2018-2019). Prior to becoming acting Attorney General, Mr. Whitaker served as Chief of Staff to the Attorney General. He was appointed as the U.S. Attorney for the Southern District of Iowa by President George W. Bush, serving from 2004-2009. Whitaker was the managing partner of Des Moines-based law firm, Whitaker Hagenow & Gustoff LLP from 2009 until rejoining DOJ in 2017. He was also the Executive Director for FACT, The Foundation for Accountability & Civic Trust, an ethics and accountability watchdog, between 2014 and 2017. Mr. Whitaker is the author of the book Above the Law: The Inside Story of How the Justice Department Tried to Subvert President Trump. Buy Matt's book here: https://amzn.to/3IXUOb8 Mr. Whitaker graduated with a Master of Business Administration, Juris Doctor, and Bachelor of Arts from the University of Iowa. While at Iowa, Mr. Whitaker was a three-year letterman on the football team where he received the prestigious Big Ten Medal of Honor. Mr. Whitaker is now a Senior Fellow with the American Cornerstone Institute, Co-Chair of the Center for Law and Justice at America First Policy Institute and a Senior Fellow at the American Conservative Union Foundation. Matt is on the Board of Directors for America First Legal Foundation. He is also Of Counsel with the Graves Garrett law firm. Whitaker appears regularly to discuss legal and political issues on Fox News, Newsmax and other news outlets.

SBS Dinka - SBS Dinka
Dhöl de kura de kɔc ke baai ku thiɛ̈c de piööc në Melbourne

SBS Dinka - SBS Dinka

Play Episode Listen Later Oct 2, 2024 5:45


Aciëŋ ë määt ë kɔc yiic ë pol de kura tënë mïth dhuo ku nyïïr të lëu bï yïn keek lɛɛr bïk pol ëtök.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

OpenAI DevDay is almost here! Per tradition, we are hosting a DevDay pregame event for everyone coming to town! Join us with demos and gossip!Also sign up for related events across San Francisco: the AI DevTools Night, the xAI open house, the Replicate art show, the DevDay Watch Party (for non-attendees), Hack Night with OpenAI at Cloudflare. For everyone else, join the Latent Space Discord for our online watch party and find fellow AI Engineers in your city.OpenAI's recent o1 release (and Reflection 70b debacle) has reignited broad interest in agentic general reasoning and tree search methods.While we have covered some of the self-taught reasoning literature on the Latent Space Paper Club, it is notable that the Eric Zelikman ended up at xAI, whereas OpenAI's hiring of Noam Brown and now Shunyu suggests more interest in tool-using chain of thought/tree of thought/generator-verifier architectures for Level 3 Agents.We were more than delighted to learn that Shunyu is a fellow Latent Space enjoyer, and invited him back (after his first appearance on our NeurIPS 2023 pod) for a look through his academic career with Harrison Chase (one year after his first LS show).ReAct: Synergizing Reasoning and Acting in Language Modelspaper linkFollowing seminal Chain of Thought papers from Wei et al and Kojima et al, and reflecting on lessons from building the WebShop human ecommerce trajectory benchmark, Shunyu's first big hit, the ReAct paper showed that using LLMs to “generate both reasoning traces and task-specific actions in an interleaved manner” achieved remarkably greater performance (less hallucination/error propagation, higher ALFWorld/WebShop benchmark success) than CoT alone. In even better news, ReAct scales fabulously with finetuning:As a member of the elite Princeton NLP group, Shunyu was also a coauthor of the Reflexion paper, which we discuss in this pod.Tree of Thoughtspaper link hereShunyu's next major improvement on the CoT literature was Tree of Thoughts:Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role…ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices.The beauty of ToT is it doesnt require pretraining with exotic methods like backspace tokens or other MCTS architectures. You can listen to Shunyu explain ToT in his own words on our NeurIPS pod, but also the ineffable Yannic Kilcher:Other WorkWe don't have the space to summarize the rest of Shunyu's work, you can listen to our pod with him now, and recommend the CoALA paper and his initial hit webinar with Harrison, today's guest cohost:as well as Shunyu's PhD Defense Lecture:as well as Shunyu's latest lecture covering a Brief History of LLM Agents:As usual, we are live on YouTube! 
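A toy version of the ReAct loop described in the notes above: the model interleaves a free-form Thought with an Action, the Observation from running that action is appended, and the loop repeats until a Finish action. The prompt format and the `call_model` helper are illustrative stand-ins, not the paper's exact setup.

```python
# Sketch: minimal ReAct-style think/act/observe loop over a dict of tools.
import re

def react(question, tools, call_model, max_steps=8):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_model(transcript + "Thought:", stop=["Observation:"])
        transcript += "Thought:" + step
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", step)   # e.g. Search[Zork]
        if not match:
            break
        action, arg = match.group(1), match.group(2)
        if action == "Finish":                                 # model is done
            return arg
        observation = tools[action](arg)                       # act, then observe
        transcript += f"\nObservation: {observation}\n"
    return "no answer within the step budget"
```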
Show Notes* Harrison Chase* LangChain, LangSmith, LangGraph* Shunyu Yao* Alec Radford* ReAct Paper* Hotpot QA* Tau Bench* WebShop* SWE-Agent* SWE-Bench* Trees of Thought* CoALA Paper* Related Episodes* Our Thomas Scialom (Meta) episode* Shunyu on our NeurIPS 2023 Best Papers episode* Harrison on our LangChain episode* Mentions* Sierra* Voyager* Jason Wei* Tavily* SERP API* ExaTimestamps* [00:00:00] Opening Song by Suno* [00:03:00] Introductions* [00:06:16] The ReAct paper* [00:12:09] Early applications of ReAct in LangChain* [00:17:15] Discussion of the Reflection paper* [00:22:35] Tree of Thoughts paper and search algorithms in language models* [00:27:21] SWE-Agent and SWE-Bench for coding benchmarks* [00:39:21] CoALA: Cognitive Architectures for Language Agents* [00:45:24] Agent-Computer Interfaces (ACI) and tool design for agents* [00:49:24] Designing frameworks for agents vs humans* [00:53:52] UX design for AI applications and agents* [00:59:53] Data and model improvements for agent capabilities* [01:19:10] TauBench* [01:23:09] Promising areas for AITranscriptAlessio [00:00:01]: Hey, everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO of Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Small AI.Swyx [00:00:12]: Hey, and today we have a super special episode. I actually always wanted to take like a selfie and go like, you know, POV, you're about to revolutionize the world of agents because we have two of the most awesome hiring agents in the house. So first, we're going to welcome back Harrison Chase. Welcome. Excited to be here. What's new with you recently in sort of like the 10, 20 second recap?Harrison [00:00:34]: Linkchain, Linksmith, Lingraph, pushing on all of them. Lots of cool stuff related to a lot of the stuff that we're going to talk about today, probably.Swyx [00:00:42]: Yeah.Alessio [00:00:43]: We'll mention it in there. And the Celtics won the title.Swyx [00:00:45]: And the Celtics won the title. You got that going on for you. I don't know. Is that like floorball? Handball? Baseball? Basketball.Alessio [00:00:52]: Basketball, basketball.Harrison [00:00:53]: Patriots aren't looking good though, so that's...Swyx [00:00:56]: And then Xun Yu, you've also been on the pod, but only in like a sort of oral paper presentation capacity. But welcome officially to the LinkedSpace pod.Shunyu [00:01:03]: Yeah, I've been a huge fan. So thanks for the invitation. Thanks.Swyx [00:01:07]: Well, it's an honor to have you on. You're one of like, you're maybe the first PhD thesis defense I've ever watched in like this AI world, because most people just publish single papers, but every paper of yours is a banger. So congrats.Shunyu [00:01:22]: Thanks.Swyx [00:01:24]: Yeah, maybe we'll just kick it off with, you know, what was your journey into using language models for agents? I like that your thesis advisor, I didn't catch his name, but he was like, you know... Karthik. Yeah. It's like, this guy just wanted to use language models and it was such a controversial pick at the time. Right.Shunyu [00:01:39]: The full story is that in undergrad, I did some computer vision research and that's how I got into AI. But at the time, I feel like, you know, you're just composing all the GAN or 3D perception or whatever together and it's not exciting anymore. And one day I just see this transformer paper and that's really cool. But I really got into language model only when I entered my PhD and met my advisor Karthik. 
So he was actually the second author of GPT-1 when he was like a visiting scientist at OpenAI. With Alec Redford?Swyx [00:02:10]: Yes.Shunyu [00:02:11]: Wow. That's what he told me. It's like back in OpenAI, they did this GPT-1 together and Ilya just said, Karthik, you should stay because we just solved the language. But apparently Karthik is not fully convinced. So he went to Princeton, started his professorship and I'm really grateful. So he accepted me as a student, even though I have no prior knowledge in NLP. And you know, we just met for the first time and he's like, you know, what do you want to do? And I'm like, you know, you have done those test game scenes. That's really cool. I wonder if we can just redo them with language models. And that's how the whole journey began. Awesome.Alessio [00:02:46]: So GPT-2 was out at the time? Yes, that was 2019.Shunyu [00:02:48]: Yeah.Alessio [00:02:49]: Way too dangerous to release. And then I guess the first work of yours that I came across was React, which was a big part of your defense. But also Harrison, when you came on The Pockets last year, you said that was one of the first papers that you saw when you were getting inspired for BlankChain. So maybe give a recap of why you thought it was cool, because you were already working in AI and machine learning. And then, yeah, you can kind of like intro the paper formally. What was that interesting to you specifically?Harrison [00:03:16]: Yeah, I mean, I think the interesting part was using these language models to interact with the outside world in some form. And I think in the paper, you mostly deal with Wikipedia. And I think there's some other data sets as well. But the outside world is the outside world. And so interacting with things that weren't present in the LLM and APIs and calling into them and thinking about the React reasoning and acting and kind of like combining those together and getting better results. I'd been playing around with LLMs, been talking with people who were playing around with LLMs. People were trying to get LLMs to call into APIs, do things, and it was always, how can they do it more reliably and better? And so this paper was basically a step in that direction. And I think really interesting and also really general as well. Like I think that's part of the appeal is just how general and simple in a good way, I think the idea was. So that it was really appealing for all those reasons.Shunyu [00:04:07]: Simple is always good. Yeah.Alessio [00:04:09]: Do you have a favorite part? Because I have one favorite part from your PhD defense, which I didn't understand when I read the paper, but you said something along the lines, React doesn't change the outside or the environment, but it does change the insight through the context, putting more things in the context. You're not actually changing any of the tools around you to work for you, but you're changing how the model thinks. And I think that was like a very profound thing when I, not that I've been using these tools for like 18 months. I'm like, I understand what you meant, but like to say that at the time you did the PhD defense was not trivial. Yeah.Shunyu [00:04:41]: Another way to put it is like thinking can be an extra tool that's useful.Alessio [00:04:47]: Makes sense. Checks out.Swyx [00:04:49]: Who would have thought? I think it's also more controversial within his world because everyone was trying to use RL for agents. And this is like the first kind of zero gradient type approach. 
Yeah.Shunyu [00:05:01]: I think the bigger kind of historical context is that we have this two big branches of AI. So if you think about RL, right, that's pretty much the equivalent of agent at a time. And it's like agent is equivalent to reinforcement learning and reinforcement learning is equivalent to whatever game environment they're using, right? Atari game or go or whatever. So you have like a pretty much, you know, you have a biased kind of like set of methodologies in terms of reinforcement learning and represents agents. On the other hand, I think NLP is like a historical kind of subject. It's not really into agents, right? It's more about reasoning. It's more about solving those concrete tasks. And if you look at SEL, right, like each task has its own track, right? Summarization has a track, question answering has a track. So I think really it's about rethinking agents in terms of what could be the new environments that we came to have is not just Atari games or whatever video games, but also those text games or language games. And also thinking about, could there be like a more general kind of methodology beyond just designing specific pipelines for each NLP task? That's like the bigger kind of context, I would say.Alessio [00:06:14]: Is there an inspiration spark moment that you remember or how did you come to this? We had Trida on the podcast and he mentioned he was really inspired working with like systems people to think about Flash Attention. What was your inspiration journey?Shunyu [00:06:27]: So actually before React, I spent the first two years of my PhD focusing on text-based games, or in other words, text adventure games. It's a very kind of small kind of research area and quite ad hoc, I would say. And there are like, I don't know, like 10 people working on that at the time. And have you guys heard of Zork 1, for example? So basically the idea is you have this game and you have text observations, like you see a monster, you see a dragon.Swyx [00:06:57]: You're eaten by a grue.Shunyu [00:06:58]: Yeah, you're eaten by a grue. And you have actions like kill the grue with a sword or whatever. And that's like a very typical setup of a text game. So I think one day after I've seen all the GPT-3 stuff, I just think about, you know, how can I solve the game? Like why those AI, you know, machine learning methods are pretty stupid, but we are pretty good at solving the game relatively, right? So for the context, the predominant method to solve this text game is obviously reinforcement learning. And the idea is you just try out an arrow in those games for like millions of steps and you kind of just overfit to the game. But there's no language understanding at all. And I'm like, why can't I solve the game better? And it's kind of like, because we think about the game, right? Like when we see this very complex text observation, like you see a grue and you might see a sword, you know, in the right of the room and you have to go through the wooden door to go to that room. You will think, you know, oh, I have to kill the monster and to kill that monster, I have to get the sword, I have to get the sword, I have to go, right? And this kind of thinking actually helps us kind of throw shots off the game. And it's like, why don't we also enable the text agents to think? And that's kind of the prototype of React. And I think that's actually very interesting because the prototype, I think, was around November of 2021. So that's even before like chain of thought or whatever came up. 
So we did a bunch of experiments in the text game, but it was not really working that well. Like those text games are just too hard. I think today it's still very hard. Like if you use GPD 4 to solve it, it's still very hard. So the change came when I started the internship in Google. And apparently Google care less about text game, they care more about what's more practical. So pretty much I just reapplied the idea, but to more practical kind of environments like Wikipedia or simpler text games like Alphard, and it just worked. It's kind of like you first have the idea and then you try to find the domains and the problems to demonstrate the idea, which is, I would say, different from most of the AI research, but it kind of worked out for me in that case.Swyx [00:09:09]: For Harrison, when you were implementing React, what were people applying React to in the early days?Harrison [00:09:14]: I think the first demo we did probably had like a calculator tool and a search tool. So like general things, we tried to make it pretty easy to write your own tools and plug in your own things. And so this is one of the things that we've seen in LangChain is people who build their own applications generally write their own tools. Like there are a few common ones. I'd say like the three common ones might be like a browser, a search tool, and a code interpreter. But then other than that-Swyx [00:09:37]: The LMS. Yep.Harrison [00:09:39]: Yeah, exactly. It matches up very nice with that. And we actually just redid like our integrations docs page, and if you go to the tool section, they like highlight those three, and then there's a bunch of like other ones. And there's such a long tail of other ones. But in practice, like when people go to production, they generally have their own tools or maybe one of those three, maybe some other ones, but like very, very few other ones. So yeah, I think the first demos was a search and a calculator one. And there's- What's the data set?Shunyu [00:10:04]: Hotpot QA.Harrison [00:10:05]: Yeah. Oh, so there's that one. And then there's like the celebrity one by the same author, I think.Swyx [00:10:09]: Olivier Wilde's boyfriend squared. Yeah. 0.23. Yeah. Right, right, right.Harrison [00:10:16]: I'm forgetting the name of the author, but there's-Swyx [00:10:17]: I was like, we're going to over-optimize for Olivier Wilde's boyfriend, and it's going to change next year or something.Harrison [00:10:21]: There's a few data sets kind of like in that vein that require multi-step kind of like reasoning and thinking. So one of the questions I actually had for you in this vein, like the React paper, there's a few things in there, or at least when I think of that, there's a few things that I think of. There's kind of like the specific prompting strategy. Then there's like this general idea of kind of like thinking and then taking an action. And then there's just even more general idea of just like taking actions in a loop. Today, like obviously language models have changed a lot. We have tool calling. The specific prompting strategy probably isn't used super heavily anymore. Would you say that like the concept of React is still used though? Or like do you think that tool calling and running tool calling in a loop, is that ReactSwyx [00:11:02]: in your mind?Shunyu [00:11:03]: I would say like it's like more implicitly used than explicitly used. To be fair, I think the contribution of React is actually twofold. 
So first is this idea of, you know, we should be able to use calls in a very general way. Like there should be a single kind of general method to handle interaction with various environments. I think React is the first paper to demonstrate the idea. But then I think later there are two form or whatever, and this becomes like a trivial idea. But I think at the time, that's like a pretty non-trivial thing. And I think the second contribution is this idea of what people call like inner monologue or thinking or reasoning or whatever, to be paired with tool use. I think that's still non-trivial because if you look at the default function calling or whatever, like there's no inner monologue. And in practice, that actually is important, especially if the tool that you use is pretty different from the training distribution of the language model. I think those are the two main things that are kind of inherited.Harrison [00:12:10]: On that note, I think OpenAI even recommended when you're doing tool calling, it's sometimes helpful to put a thought field in the tool, along with all the actual acquired arguments,Swyx [00:12:19]: and then have that one first.Harrison [00:12:20]: So it fills out that first, and they've shown that that's yielded better results. The reason I ask is just like this same concept is still alive, and I don't know whether to call it a React agent or not. I don't know what to call it. I think of it as React, like it's the same ideas that were in the paper, but it's obviously a very different implementation at this point in time. And so I just don't know what to call it.Shunyu [00:12:40]: I feel like people will sometimes think more in terms of different tools, right? Because if you think about a web agent versus, you know, like a function calling agent, calling a Python API, you would think of them as very different. But in some sense, the methodology is the same. It depends on how you view them, right? I think people will tend to think more in terms of the environment and the tools rather than the methodology. Or, in other words, I think the methodology is kind of trivial and simple, so people will try to focus more on the different tools. But I think it's good to have a single underlying principle of those things.Alessio [00:13:17]: How do you see the surface of React getting molded into the model? So a function calling is a good example of like, now the model does it. What about the thinking? Now most models that you use kind of do chain of thought on their own, they kind of produce steps. Do you think that more and more of this logic will be in the model? Or do you think the context window will still be the main driver of reasoning and thinking?Shunyu [00:13:39]: I think it's already default, right? You do some chain of thought and you do some tool call, the cost of adding the chain of thought is kind of relatively low compared to other things. So it's not hurting to do that. And I think it's already kind of common practice, I would say.Swyx [00:13:56]: This is a good place to bring in either Tree of Thought or Reflection, your pick.Shunyu [00:14:01]: Maybe Reflection, to respect the time order, I would say.Swyx [00:14:05]: Any backstory as well, like the people involved with NOAA and the Princeton group. We talked about this offline, but people don't understand how these research pieces come together and this ideation.Shunyu [00:14:15]: I think Reflection is mostly NOAA's work, I'm more like advising kind of role. 
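A sketch of the "thought field" trick Harrison mentions a little earlier in this exchange: give the tool an explicit reasoning argument and declare it first, so the model writes a short rationale before the real arguments. Only the parameter schema is shown; the surrounding wrapper object differs by provider and API version.

```python
# Sketch: tool parameter schema with a leading "thought" field for inner monologue.
search_tool_params = {
    "type": "object",
    "properties": {
        "thought": {   # declared first so the rationale is generated before the query
            "type": "string",
            "description": "One sentence on why this search should help.",
        },
        "query": {
            "type": "string",
            "description": "The search query to run.",
        },
    },
    "required": ["thought", "query"],
}
```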
The story is, I don't remember the time, but one day we just see this pre-print that's like Reflection and Autonomous Agent with memory or whatever. And it's kind of like an extension to React, which uses this self-reflection. I'm like, oh, somehow you've become very popular. And NOAA reached out to me, it's like, do you want to collaborate on this and make this from an archive pre-print to something more solid, like a conference submission? I'm like, sure. We started collaborating and we remain good friends today. And I think another interesting backstory is NOAA was contacted by OpenAI at the time. It's like, this is pretty cool, do you want to just work at OpenAI? And I think Sierra also reached out at the same time. It's like, this is pretty cool, do you want to work at Sierra? And I think NOAA chose Sierra, but it's pretty cool because he was still like a second year undergrad and he's a very smart kid.Swyx [00:15:16]: Based on one paper. Oh my god.Shunyu [00:15:19]: He's done some other research based on programming language or chemistry or whatever, but I think that's the paper that got the attention of OpenAI and Sierra.Swyx [00:15:28]: For those who haven't gone too deep on it, the way that you present the inside of React, can you do that also for reflection? Yeah.Shunyu [00:15:35]: I think one way to think of reflection is that the traditional idea of reinforcement learning is you have a scalar reward and then you somehow back-propagate the signal of the scalar reward to the rest of your neural network through whatever algorithm, like policy grading or A2C or whatever. And if you think about the real life, most of the reward signal is not scalar. It's like your boss told you, you should have done a better job in this, but you could jump on that or whatever. It's not like a scalar reward, like 29 or something. I think in general, humans deal more with long scalar reward, or you can say language feedback. And the way that they deal with language feedback also has this back-propagation process, right? Because you start from this, you did a good job on job B, and then you reflect what could have been done differently to change to make it better. And you kind of change your prompt, right? Basically, you change your prompt on how to do job A and how to do job B, and then you do the whole thing again. So it's really like a pipeline of language where in self-graded descent, you have something like text reasoning to replace those gradient descent algorithms. I think that's one way to think of reflection.Harrison [00:16:47]: One question I have about reflection is how general do you think the algorithm there is? And so for context, I think at LangChain and at other places as well, we found it pretty easy to implement React in a standard way. You plug in any tools and it kind of works off the shelf, can get it up and running. I don't think we have an off-the-shelf kind of implementation of reflection and kind of the general sense. I think the concepts, absolutely, we see used in different kind of specific cognitive architectures, but I don't think we have one that comes off the shelf. I don't think any of the other frameworks have one that comes off the shelf. And I'm curious whether that's because it's not general enough or it's complex as well, because it also requires running it more times.Swyx [00:17:28]: Maybe that's not feasible.Harrison [00:17:30]: I'm curious how you think about the generality, complexity. 
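A compressed sketch of the Reflexion loop described above: attempt the task, get language feedback from an evaluator, distill that feedback into a short reflection, and retry with the reflections prepended. The `call_model` and `evaluate` helpers are supplied by the caller; the actual paper adds more structure around memory and episodes.

```python
# Sketch: retry loop that turns evaluator feedback into verbal "lessons".
def reflexion(task, call_model, evaluate, max_attempts=3):
    reflections = []
    attempt = ""
    for _ in range(max_attempts):
        context = "\n".join(f"Lesson: {r}" for r in reflections)
        attempt = call_model(f"{context}\nTask: {task}\nAnswer:")
        ok, feedback = evaluate(task, attempt)        # e.g. unit tests for coding
        if ok:
            return attempt
        reflections.append(call_model(
            f"Task: {task}\nAttempt: {attempt}\nFeedback: {feedback}\n"
            "In one sentence, what should be done differently next time?"
        ))
    return attempt   # best effort after exhausting the attempt budget
```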
Should we have one that comes off the shelf?Shunyu [00:17:36]: I think the algorithm is general in the sense that it's just as general as other algorithms, if you think about policy grading or whatever, but it's not applicable to all tasks, just like other algorithms. So you can argue PPO is also general, but it works better for those set of tasks, but not on those set of tasks. I think it's the same situation for reflection. And I think a key bottleneck is the evaluator, right? Basically, you need to have a good sense of the signal. So for example, if you are trying to do a very hard reasoning task, say mathematics, for example, and you don't have any tools, you're operating in this chain of thought setup, then reflection will be pretty hard because in order to reflect upon your thoughts, you have to have a very good evaluator to judge whether your thought is good or not. But that might be as hard as solving the problem itself or even harder. The principle of self-reflection is probably more applicable if you have a good evaluator, for example, in the case of coding. If you have those arrows, then you can just reflect on that and how to solve the bug andSwyx [00:18:37]: stuff.Shunyu [00:18:38]: So I think another criteria is that it depends on the application, right? If you have this latency or whatever need for an actual application with an end-user, the end-user wouldn't let you do two hours of tree-of-thought or reflection, right? You need something as soon as possible. So in that case, maybe this is better to be used as a training time technique, right? You do those reflection or tree-of-thought or whatever, you get a lot of data, and then you try to use the data to train your model better. And then in test time, you still use something as simple as React, but that's already improved.Alessio [00:19:11]: And if you think of the Voyager paper as a way to store skills and then reuse them, how would you compare this reflective memory and at what point it's just ragging on the memory versus you want to start to fine-tune some of them or what's the next step once you get a very long reflective corpus? Yeah.Shunyu [00:19:30]: So I think there are two questions here. The first question is, what type of information or memory are you considering, right? Is it like semantic memory that stores knowledge about the word, or is it the episodic memory that stores trajectories or behaviors, or is it more of a procedural memory like in Voyager's case, like skills or code snippets that you can use to do actions, right?Swyx [00:19:54]: That's one dimension.Shunyu [00:19:55]: And the second dimension is obviously how you use the memory, either retrieving from it, using it in the context, or fine-tuning it. I think the Cognitive Architecture for Language Agents paper has a good categorization of all the different combinations. And of course, which way you use depends on the concrete application and the concrete need and the concrete task. But I think in general, it's good to think of those systematic dimensions and all the possible options there.Swyx [00:20:25]: Harrison also has in LangMEM, I think you did a presentation in my meetup, and I think you've done it at a couple other venues as well. User state, semantic memory, and append-only state, I think kind of maps to what you just said.Shunyu [00:20:38]: What is LangMEM? Can I give it like a quick...Harrison [00:20:40]: One of the modules of LangChain for a long time has been something around memory. 
And I think we're still obviously figuring out what that means, as is everyone kind of in the space. But one of the experiments that we did, and one of the proof of concepts that we did was, technically what it was is you would basically create threads, you'd push messages to those threads in the background, we process the data in a few ways. One, we put it into some semantic store, that's the semantic memory. And then two, we do some extraction and reasoning over the memories to extract. And we let the user define this, but extract key facts or anything that's of interest to the user. Those aren't exactly trajectories, they're maybe more closer to the procedural memory. Is that how you'd think about it or classify it?Shunyu [00:21:22]: Is it like about knowledge about the word, or is it more like how to do something?Swyx [00:21:27]: It's reflections, basically.Harrison [00:21:28]: So in generative worlds.Shunyu [00:21:30]: Generative agents.Swyx [00:21:31]: The Smallville. Yeah, the Smallville one.Harrison [00:21:33]: So the way that they had their memory there was they had the sequence of events, and that's kind of like the raw events that happened. But then every N events, they'd run some synthesis over those events for the LLM to insert its own memory, basically. It's that type of memory.Swyx [00:21:49]: I don't know how that would be classified.Shunyu [00:21:50]: I think of that as more of the semantic memory, but to be fair, I think it's just one way to think of that. But whether it's semantic memory or procedural memory or whatever memory, that's like an abstraction layer. But in terms of implementation, you can choose whatever implementation for whatever memory. So they're totally kind of orthogonal. I think it's more of a good way to think of the things, because from the history of cognitive science and cognitive architecture and how people study even neuroscience, that's the way people think of how the human brain organizes memory. And I think it's more useful as a way to think of things. But it's not like for semantic memory, you have to do this kind of way to retrieve or fine-tune, and for procedural memory, you have to do that. I think those are totally orthogonal kind of dimensions.Harrison [00:22:34]: How much background do you have in cognitive sciences, and how much do you model some of your thoughts on?Shunyu [00:22:40]: That's a great question, actually. I think one of the undergrad influences for my follow-up research is I was doing an internship at MIT's Computational Cognitive Science Lab with Josh Tannenbaum, and he's a very famous cognitive scientist. And I think a lot of his ideas still influence me today, like thinking of things in computational terms and getting interested in language and a lot of stuff, or even developing psychology kind of stuff. So I think it still influences me today.Swyx [00:23:14]: As a developer that tried out LangMEM, the way I view it is just it's a materialized view of a stream of logs. And if anything, that's just useful for context compression. I don't have to use the full context to run it over everything. But also it's kind of debuggable. If it's wrong, I can show it to the user, the user can manually fix it, and I can carry on. That's a really good analogy. I like that. I'm going to steal that. Sure. Please, please. You know I'm bullish on memory databases. I guess, Tree of Thoughts? Yeah, Tree of Thoughts.Shunyu [00:23:39]: I feel like I'm relieving the defense in like a podcast format. 
Yeah, no.Alessio [00:23:45]: I mean, you had a banger. Well, this is the one where you're already successful and we just highlight the glory. It was really good. You mentioned that since thinking is kind of like taking an action, you can use action searching algorithms to think of thinking. So just like you will use Tree Search to find the next thing. And the idea behind Tree of Thought is that you generate all these possible outcomes and then find the best tree to get to the end. Maybe back to the latency question, you can't really do that if you have to respond in real time. So what are maybe some of the most helpful use cases for things like this? Where have you seen people adopt it where the high latency is actually worth the wait?Shunyu [00:24:21]: For things that you don't care about latency, obviously. For example, if you're trying to do math, if you're just trying to come up with a proof. But I feel like one type of task is more about searching for a solution. You can try a hundred times, but if you find one solution, that's good. For example, if you're finding a math proof or if you're finding a good code to solve a problem or whatever, I think another type of task is more like reacting. For example, if you're doing customer service, you're like a web agent booking a ticket for an end user. Those are more reactive kind of tasks, or more real-time tasks. You have to do things fast. They might be easy, but you have to do it reliably. And you care more about can you solve 99% of the time out of a hundred. But for the type of search type of tasks, then you care more about can I find one solution out of a hundred. So it's kind of symmetric and different.Alessio [00:25:11]: Do you have any data or intuition from your user base? What's the split of these type of use cases? How many people are doing more reactive things and how many people are experimenting with deep, long search?Harrison [00:25:23]: I would say React's probably the most popular. I think there's aspects of reflection that get used. Tree of thought, probably the least so. There's a great tweet from Jason Wei, I think you're now a colleague, and he was talking about prompting strategies and how he thinks about them. And I think the four things that he had was, one, how easy is it to implement? How much compute does it take? How many tasks does it solve? And how much does it improve on those tasks? And I'd add a fifth, which is how likely is it to be relevant when the next generation of models come out? And I think if you look at those axes and then you look at React, reflection, tree of thought, it tracks that the ones that score better are used more. React is pretty easy to implement. Tree of thought's pretty hard to implement. The amount of compute, yeah, a lot more for tree of thought. The tasks and how much it improves, I don't have amazing visibility there. But I think if we're comparing React versus tree of thought, React just dominates the first two axes so much that my question around that was going to be like, how do you think about these prompting strategies, cognitive architectures, whatever you want to call them? When you're thinking of them, what are the axes that you're judging them on in your head when you're thinking whether it's a good one or a less good one?Swyx [00:26:38]: Right.Shunyu [00:26:39]: Right. I think there is a difference between a prompting method versus research, in the sense that for research, you don't really even care about does it actually work on practical tasks or does it help? 
Whatever. I think it's more about the idea or the principle, right? What is the direction that you're unblocking and whatever. And I think for an actual prompting method to solve a concrete problem, I would say simplicity is very important because the simpler it is, the less decision you have to make about it. And it's easier to design. It's easier to propagate. And it's easier to do stuff. So always try to be as simple as possible. And I think latency obviously is important. If you can do things fast and you don't want to do things slow. And I think in terms of the actual prompting method to use for a particular problem, I think we should all be in the minimalist kind of camp, right? You should try the minimum thing and see if it works. And if it doesn't work and there's absolute reason to add something, then you add something, right? If there's absolute reason that you need some tool, then you should add the tool thing. If there's absolute reason to add reflection or whatever, you should add that. Otherwise, if a chain of thought can already solve something, then you don't even need to use any of that.Harrison [00:27:57]: Yeah. Or if it's just better prompting can solve it. Like, you know, you could add a reflection step or you could make your instructions a little bit clearer.Swyx [00:28:03]: And it's a lot easier to do that.Shunyu [00:28:04]: I think another interesting thing is like, I personally have never done those kind of like weird tricks. I think all the prompts that I write are kind of like just talking to a human, right? It's like, I don't know. I never say something like, your grandma is dying and you have to solve it. I mean, those are cool, but I feel like we should all try to solve things in a very intuitive way. Just like talking to your co-worker. That should work 99% of the time. That's my personal take.Swyx [00:28:29]: The problem with how language models, at least in the GPC 3 era, was that they over-optimized to some sets of tokens in sequence. So like reading the Kojima et al. paper that was listing step-by-step, like he tried a bunch of them and they had wildly different results. It should not be the case, but it is the case. And hopefully we're getting better there.Shunyu [00:28:51]: Yeah. I think it's also like a timing thing in the sense that if you think about this whole line of language model, right? Like at the time it was just like a text generator. We don't have any idea how it's going to be used, right? And obviously at the time you will find all kinds of weird issues because it's not trained to do any of that, right? But then I think we have this loop where once we realize chain of thought is important or agent is important or tool using is important, what we see is today's language models are heavily optimized towards those things. So I think in some sense they become more reliable and robust over those use cases. And you don't need to do as much prompt engineering tricks anymore to solve those things. I feel like in some sense, I feel like prompt engineering even is like a slightly negative word at the time because it refers to all those kind of weird tricks that you have to apply. But I think we don't have to do that anymore. Like given today's progress, you should just be able to talk to like a coworker. And if you're clear and concrete and being reasonable, then it should do reasonable things for you.Swyx [00:29:51]: Yeah. 
The way I put this is you should not be a prompt engineer because it is the goal of the big labs to put you out of a job.Shunyu [00:29:58]: You should just be a good communicator. Like if you're a good communicator to humans, you should be a good communicator to languageSwyx [00:30:02]: models.Harrison [00:30:03]: That's the key though, because oftentimes people aren't good communicators to these language models and that is a very important skill and that's still messing around with the prompt. And so it depends what you're talking about when you're saying prompt engineer.Shunyu [00:30:14]: But do you think it's like very correlated with like, are they like a good communicator to humans? You know, it's like.Harrison [00:30:20]: It may be, but I also think I would say on average, people are probably worse at communicating with language models than to humans right now, at least, because I think we're still figuring out how to do it. You kind of expect it to be magical and there's probably some correlation, but I'd say there's also just like, people are worse at it right now than talking to humans.Shunyu [00:30:36]: We should make it like a, you know, like an elementary school class or whatever, how toSwyx [00:30:41]: talk to language models. Yeah. I don't know. Very pro that. Yeah. Before we leave the topic of trees and searching, not specific about QSTAR, but there's a lot of questions about MCTS and this combination of tree search and language models. And I just had to get in a question there about how seriously should people take this?Shunyu [00:30:59]: Again, I think it depends on the tasks, right? So MCTS was magical for Go, but it's probably not as magical for robotics, right? So I think right now the problem is not even that we don't have good methodologies, it's more about we don't have good tasks. It's also very interesting, right? Because if you look at my citation, it's like, obviously the most cited are React, Refraction and Tree of Thought. Those are methodologies. But I think like equally important, if not more important line of my work is like benchmarks and environments, right? Like WebShop or SuiteVenture or whatever. And I think in general, what people do in academia that I think is not good is they choose a very simple task, like Alford, and then they apply overly complex methods to show they improve 2%. I think you should probably match the level of complexity of your task and your method. I feel like where tasks are kind of far behind the method in some sense, right? Because we have some good test-time approaches, like whatever, React or Refraction or Tree of Thought, or like there are many, many more complicated test-time methods afterwards. But on the benchmark side, we have made a lot of good progress this year, last year. But I think we still need more progress towards that, like better coding benchmark, better web agent benchmark, better agent benchmark, not even for web or code. I think in general, we need to catch up with tasks.Harrison [00:32:27]: What are the biggest reasons in your mind why it lags behind?Shunyu [00:32:31]: I think incentive is one big reason. Like if you see, you know, all the master paper are cited like a hundred times more than the task paper. And also making a good benchmark is actually quite hard. It's almost like a different set of skills in some sense, right? I feel like if you want to build a good benchmark, you need to be like a good kind of product manager kind of mindset, right? 
You need to think about why people should use your benchmark, why it's challenging, why it's useful. If you think about like a PhD going into like a school, right? The prior skill that expected to have is more about, you know, can they code this method and can they just run experiments and can solve that? I think building a benchmark is not the typical prior skill that we have, but I think things are getting better. I think more and more people are starting to build benchmarks and people are saying that it's like a way to get more impact in some sense, right? Because like if you have a really good benchmark, a lot of people are going to use it. But if you have a super complicated test time method, like it's very hard for people to use it.Harrison [00:33:35]: Are evaluation metrics also part of the reason? Like for some of these tasks that we might want to ask these agents or language models to do, is it hard to evaluate them? And so it's hard to get an automated benchmark. Obviously with SweetBench you can, and with coding, it's easier, but.Shunyu [00:33:50]: I think that's part of the skillset thing that I mentioned, because I feel like it's like a product manager because there are many dimensions and you need to strike a balance and it's really hard, right? If you want to make sense, very easy to autogradable, like automatically gradable, like either to grade or either to evaluate, then you might lose some of the realness or practicality. Or like it might be practical, but it might not be as scalable, right? For example, if you think about text game, human have pre-annotated all the rewards and all the language are real. So it's pretty good on autogradable dimension and the practical dimension. If you think about, you know, practical, like actual English being practical, but it's not scalable, right? It takes like a year for experts to build that game. So it's not really that scalable. And I think part of the reason that SweetBench is so popular now is it kind of hits the balance between these three dimensions, right? Easy to evaluate and being actually practical and being scalable. Like if I were to criticize upon some of my prior work, I think webshop, like it's my initial attempt to get into benchmark world and I'm trying to do a good job striking the balance. But obviously we make it all gradable and it's really scalable, but then I think the practicality is not as high as actually just using GitHub issues, right? Because you're just creating those like synthetic tasks.Harrison [00:35:13]: Are there other areas besides coding that jump to mind as being really good for being autogradable?Shunyu [00:35:20]: Maybe mathematics.Swyx [00:35:21]: Classic. Yeah. Do you have thoughts on alpha proof, the new DeepMind paper? I think it's pretty cool.Shunyu [00:35:29]: I think it's more of a, you know, it's more of like a confidence boost or like sometimes, you know, the work is not even about, you know, the technical details or the methodology that it chooses or the concrete results. I think it's more about a signal, right?Swyx [00:35:47]: Yeah. Existence proof. Yeah.Shunyu [00:35:50]: Yeah. It can be done. This direction is exciting. It kind of encourages people to work more towards that direction. I think it's more like a boost of confidence, I would say.Swyx [00:35:59]: Yeah. So we're going to focus more on agents now and, you know, all of us have a special interest in coding agents. I would consider Devin to be the sort of biggest launch of the year as far as AI startups go. 
And you guys in the Princeton group worked on SWE-agent alongside SWE-bench. Tell us the story about SWE-agent. Sure.
Shunyu [00:36:21]: I think it's kind of like a trilogy, it's actually a series of three works now. So actually the first work is called InterCode, but it's not as famous, I know. And the second work is called SWE-bench and the third work is called SWE-agent. And I was just really confused why nobody was working on coding. You know, it's like a year ago, but I mean, not everybody's working on coding, obviously, but a year ago, like literally nobody was working on coding. I was really confused. And the people that were working on coding were, you know, trying to solve HumanEval in like a seq-to-seq way. There's no agent, there's no chain of thought, there's no anything, they're just, you know, fine-tuning the model and improving some points and whatever. I was really confused because obviously coding is the best application for agents, because it's autogradable, it's super important, and you can make everything like an API or code action, right? So I was confused, and I collaborated with some of the students in Princeton and we have this work called InterCode, and the idea is, first, if you care about coding, then you should solve coding in an interactive way, meaning more like a Jupyter Notebook kind of way than just writing a program and seeing if it fails or succeeds and stopping, right? You should solve it in an interactive way because that's exactly how humans solve it, right? You don't, you know, write a program like next token, next token, next token and stop and never do any edits, and you cannot really use any terminal or whatever tool. It doesn't make sense, right? And that's the way people were solving coding at the time, basically sampling a program from a language model without chain of thought, without tool calls, without refactoring, without anything. So the first point is we should solve coding in a very interactive way, and that's a very general principle that applies to various coding benchmarks. And also, I think you can make a lot of agent tasks kind of like interactive coding. If you have Python and you can call any package, then you can literally also browse the internet or do whatever you want, like control a robot or whatever. So that seems to be a very general paradigm. But obviously I think a bottleneck is that at the time we were still doing, you know, very simple tasks like HumanEval or whatever coding benchmark people proposed. They were super hard in 2021, like 20%, but they're like 95% already in 2023. So obviously the next step is we need a better benchmark. And Carlos and John, who are the first authors of SWE-bench, I think they came up with this great idea that we should just scrape GitHub and solve whatever human engineers are solving. And I think it's actually pretty easy to come up with the idea. And I think in the first week, they already made a lot of progress. They scraped GitHub and got it all working, but then there's a lot of painful infra work and whatever, you know. I think the idea is super easy, but the engineering is super hard. And I feel like that's a very typical signal of a good work in the AI era now.
Swyx [00:39:17]: I think also, I think the filtering was challenging, because if you look at open source PRs, a lot of them are just like, you know, fixing typos. I think it's challenging.
Shunyu [00:39:27]: And to be honest, we didn't do a perfect job at the time.
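As an aside for readers, the "solve coding interactively" idea InterCode argues for can be pictured as a simple propose-execute-observe loop. The sketch below is only an illustration of that pattern under stated assumptions (the `llm` callable and the RUN/SUBMIT action format are invented here), not code from the InterCode paper:

```python
import contextlib
import io

def run_python(code: str) -> str:
    """Execute a snippet and return stdout, or the error text, as the observation."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})  # toy execution; a real harness would sandbox this properly
        return buf.getvalue() or "(no output)"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"

def interactive_coding_loop(llm, task: str, max_turns: int = 10) -> str:
    """Propose-execute-observe loop: the model sees each result before its next action."""
    history = [f"Task: {task}"]
    for _ in range(max_turns):
        # `llm` is a hypothetical callable that returns either "RUN: <code>" or "SUBMIT: <answer>"
        action = llm("\n".join(history))
        if action.startswith("SUBMIT:"):
            return action.removeprefix("SUBMIT:").strip()
        observation = run_python(action.removeprefix("RUN:").strip())
        history.append(f"Action: {action}")
        history.append(f"Observation: {observation}")
    return "no answer within the turn budget"
```

The point is simply that the model sees the result of each action before deciding the next one, instead of emitting a whole program in one shot.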
So if you look at the recent blog post with OpenAI, we improved the filtering so that it's more solvable.
Swyx [00:39:36]: I think OpenAI was just like, look, this is a thing now. We have to fix this. These students just rushed it.
Shunyu [00:39:45]: It's a good convergence of interests for me.
Alessio [00:39:48]: Was that tied to you joining OpenAI? Or was that just unrelated?
Shunyu [00:39:52]: It's a coincidence for me, but it's a good coincidence.
Swyx [00:39:55]: There is a history of anytime a big lab adopts a benchmark, they fix it. Otherwise, it's a broken benchmark.
Shunyu [00:40:03]: So naturally, once we proposed SWE-bench, the next step is to solve it. But I think the typical way you solve something now is you collect some training samples, or you design some complicated agent method, and then you try to solve it. Either a super complicated prompt, or you build a better model with more training data. But I think at the time, we realized that even before those things, there's a fundamental problem with the interface or the tool that you're supposed to use. Because that's an ignored problem in some sense: what your tool is, and how that matters for your task. So what we found concretely is that if you just use the text terminal off the shelf as a tool for those agents, there are a lot of problems. For example, if you edit something, there's no feedback. So you don't know whether your edit is good or not. That makes the agent very confused and it makes a lot of mistakes. There are a lot of small problems, you would say. Well, you can try to do prompt engineering and improve that, but it turns out to be actually very hard. We realized that interface design is actually a very overlooked part of agent design. So we did this SWE-agent work. And the key idea is just, even before you talk about what the agent is, you should talk about what the environment is. You should make sure that the environment is actually friendly to whatever agent you're trying to apply. That's the same idea for humans. A text terminal is good for some tasks, like git pull or whatever. But it's not good if you want to look at a browser and whatever. Also, a browser is a good tool for some tasks, but it's not a good tool for other tasks. We need to talk about how to design interfaces, in some sense, where we should treat agents as our customers. It's like when we treat humans as customers, we design human-computer interfaces. We design those beautiful desktops or browsers or whatever, so that it's very intuitive and easy for humans to use. And this whole great subject of HCI is all about that. I think now the research idea of SWE-agent is just, we should treat agents as our customers. And we should do, like, you know… ACI.
Swyx [00:42:16]: ACI, exactly.
Harrison [00:42:18]: So what are the tools that a SWE-agent should have, or a coding agent in general should have?
Shunyu [00:42:24]: For SWE-agent, it's like a modified text terminal, which kind of adapts to a lot of the patterns of language models to make it easier for language models to use. For example, for edit, instead of having no feedback, it will actually have feedback like, actually here you introduced a syntax error and you should probably want to fix that, and there's an indentation error there. And that makes it super easy for the model to actually do that. And there are other small things, like how exactly you write arguments, right? Like, do you want to write a multi-line edit, or do you want to write a single-line edit?
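To make the "feedback on edits" idea concrete, an ACI-style edit command can lint the file immediately and hand the result back to the model instead of staying silent. This is a minimal sketch of that pattern, not the actual SWE-agent tool; it assumes a Python file and uses a plain `ast` parse as the syntax check:

```python
import ast
from pathlib import Path

def edit_file(path: str, start: int, end: int, replacement: str) -> str:
    """Replace lines start..end (1-indexed) and return feedback the model can act on."""
    lines = Path(path).read_text().splitlines()
    lines[start - 1:end] = replacement.splitlines()
    new_source = "\n".join(lines) + "\n"
    try:
        ast.parse(new_source)  # cheap syntax check (assumes a Python file) before committing
    except SyntaxError as err:
        return (
            f"Edit NOT applied: your replacement would introduce a syntax error on line "
            f"{err.lineno}: {err.msg}. Fix the snippet and issue the edit again."
        )
    Path(path).write_text(new_source)
    first = max(0, start - 3)
    window = "\n".join(f"{i + 1}: {text}" for i, text in enumerate(lines[first:end + 2], start=first))
    return f"Edit applied. Updated region of {path}:\n{window}"
```

Either outcome, success or failure, comes back as a short, legible observation the model can act on in its next turn, which is the whole point of the interface change.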
I think it's more interesting to think about the development process of an ACI rather than the actual ACI for a concrete application. Because I think the general paradigm is very similar to HCI and psychology, right? Basically, for how people develop HCIs, they do behavior experiments on humans, right? You do A/B tests, right? Like, which interface is actually better? And you do those behavior experiments, kind of like psychology experiments, on humans, and you change things. And I think what's really interesting for me, for this SWE-agent paper, is we can probably do the same thing for agents, right? We can do A/B tests for those agents and do behavior tests. And through the process, we not only invent better interfaces for those agents, which is the practical value, but we also better understand agents. Just like when we do those A/B tests and those HCI experiments, we better understand humans. Doing those ACI experiments, we actually better understand agents. And that's pretty cool.
Harrison [00:43:51]: Besides that A/B testing, what are other processes that people can use to think about this in a good way?
Swyx [00:43:57]: That's a great question.
Shunyu [00:43:58]: And I think SWE-agent is an initial work. And what we do is kind of the naive approach, right? You just try some interface, and you see what's going wrong, and then you try to fix that. We do this kind of iterative fixing. But I think what's really interesting is there will be a lot of future directions that are very promising if we can apply some of the HCI principles more systematically to interface design. I think that would be a very cool interdisciplinary research opportunity.
Harrison [00:44:26]: You talked a lot about agent-computer interfaces and interactions. What about human-to-agent UX patterns? Curious for any thoughts there that you might have.
Swyx [00:44:38]: That's a great question.
Shunyu [00:44:39]: And in some sense, I feel like prompt engineering is about the human-to-agent interface. But I think there can be a lot of interesting research done about... So prompting is about how humans can better communicate with the agent. But I think there could be interesting research on how agents can better communicate with humans, right? When to ask questions, how to ask questions, what's the frequency of asking questions. And I think those kinds of things could be very cool research.
Harrison [00:45:07]: Yeah, I think some of the most interesting stuff that I saw here was also related to coding with Devin from Cognition. And they had the three or four different panels where you had the chat, the browser, the terminal, and I guess the code editor as well.
Swyx [00:45:19]: There's more now.
Harrison [00:45:19]: There's more. Okay, I'm not up to date. Yeah, I think they also did a good job on ACI.
Swyx [00:45:25]: I think that's the main learning I have from Devin. They cracked that. Actually, there was no foundational planning breakthrough. The planner is actually pretty simple, but it's the ACI that they broke through on.
Shunyu [00:45:35]: I think making the tool good and reliable is probably like 90% of the whole agent. Once the tool is actually good, then the agent design can be much, much simpler. On the other hand, if the tool is bad, then no matter how much you put into the agent design, planning or search or whatever, it's still going to be trash.
Harrison [00:45:53]: Yeah, I'd argue the same. Same with, like, context and instructions.
Like, yeah, go hand in hand.
Alessio [00:46:00]: On the tool, how do you think about the tension of, like, for both of you, I mean, you're building a library, so even more for you. The tension between making a language or a library that is easy for the agent to grasp and write versus one that is easy for the human to grasp and write. Because, you know, the trend is like more and more code gets written by the agent. So why wouldn't you optimize the framework to be as easy as possible for the model versus for the person?
Swyx [00:46:24]: I think it's possible to design an interface
Shunyu [00:46:25]: that's both friendly to humans and agents. But what do you think?
Harrison [00:46:29]: We haven't thought about it from that perspective, like we're not trying to design LangChain or LangGraph to be friendly. But I mean, I think, to be friendly for agents to write.
Swyx [00:46:42]: But I mean, I think we see this with like,
Harrison [00:46:43]: I saw some paper that used TypeScript notation instead of JSON notation for tool calling and it got a lot better performance. So it's definitely a thing. I haven't really heard of anyone designing a syntax or a language explicitly for agents, but there are clearly syntaxes that are better.
Shunyu [00:46:59]: I think function calling is a good example where it's a good interface for both human programmers and for agents, right? For developers, it's actually a very friendly interface because it's very concrete and you don't have to do prompt engineering anymore. You can be very systematic. And for models, it's also pretty good, right? It can use all the existing coding content. So I think we need more of those kinds of designs.
Swyx [00:47:21]: I will mostly agree and I'll slightly disagree in terms of this, which is, like, whether designing for humans also overlaps with designing for AI. So Malte Ubl, who's the CTO of Vercel, who is creating basically JavaScript's competitor to LangChain, they're observing that basically, if the API is easy to understand for humans, it's actually much easier to understand for LLMs, for example, because there are no overloaded functions. They don't behave differently under different contexts. They do one thing and they always work the same way. It's easy for humans, it's easy for LLMs. And that makes a lot of sense. And obviously adding types is another one. Type annotations only help give extra context, which is really great. So that's the agreement. And then a disagreement is that when I use structured output to do my chain of thought, I have found that I change my field names to hint to the LLM what the field is supposed to do. So instead of saying topics, I'll say candidate topics. And that gives me a better result because the LLM was like, ah, this is just a draft thing I can use for chain of thought. And instead of summaries, I'll say topic summaries to link the previous field to the current field. So, little stuff like that, I find myself optimizing for the LLM where I, as a human, would never do that. Interesting.
Shunyu [00:48:32]: It's kind of like the way you optimize the prompt, it might be different for humans and for machines. You can have a common ground that's both clear for humans and agents, but to improve the human performance versus improving the agent performance, they might move in different directions.
Swyx [00:48:48]: Might move in different directions.
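A small sketch of the field-naming trick described here, using a structured-output schema whose names hint at how each field should be used; the schema, field names, and the parsing call in the final comment are illustrative assumptions rather than any particular project's API:

```python
from pydantic import BaseModel, Field

class TopicExtraction(BaseModel):
    # "candidate_topics" (rather than "topics") hints that this is draft chain-of-thought material
    candidate_topics: list[str] = Field(description="Rough first-pass topics; treat as a draft.")
    # "topic_summaries" ties this field back to the candidates above
    topic_summaries: list[str] = Field(description="One-sentence summary per candidate topic.")
    # the last field carries the name that downstream code actually consumes
    final_topics: list[str] = Field(description="The deduplicated topics worth keeping.")

# With a structured-output client, the schema doubles as the contract the model fills in, e.g.
# (assumed API, adjust to your client): client.chat.completions.parse(..., response_format=TopicExtraction)
```

The names cost a human reader nothing, but they give the model the "this is scratch work, this is the answer" cue discussed above.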
There's a lot more use of metadata as well, like descriptions, comments, code comments, annotations and stuff like that. Yeah.
Harrison [00:48:56]: I would argue that's just you communicating
Swyx [00:48:58]: to the agent what it should do.
Harrison [00:49:00]: And maybe you need to communicate a little bit more than to humans because models aren't quite good enough yet.
Swyx [00:49:06]: But like, I don't think that's crazy.
Harrison [00:49:07]: I don't think that's like- It's not crazy.
Swyx [00:49:09]: I will bring this in because it just happened to me yesterday. I was at the Cursor office. They held their first user meetup and I was telling them about the LLM OS concept and why basically every interface, every tool was being redesigned for AIs to use rather than humans. And they're like, why? Like, can we just use Bing and Google for LLM search? Why must I use Exa? Or what's the other one that you guys work with?
Harrison [00:49:32]: Tavily.
Swyx [00:49:33]: Tavily. A web search API dedicated to LLMs. What's the difference?
Shunyu [00:49:36]: Exactly. Compared to the Bing API.
Swyx [00:49:38]: Exactly.
Harrison [00:49:38]: There weren't great APIs for search. Like the best one, the one that we used initially in LangChain, was SerpAPI, which is like maybe illegal. I'm not sure.
Swyx [00:49:49]: And like, you know,
Harrison [00:49:52]: and now there are like venture-backed companies.
Swyx [00:49:53]: Shout out to DuckDuckGo, which is free.
Harrison [00:49:55]: Yes, yes.
Swyx [00:49:56]: Yeah.
Harrison [00:49:56]: I do think there are some differences though. I think generally these APIs try to return small amounts of text information, clear legible fields. It's not a massive JSON blob. And I think that matters. I think when you talk about designing tools, it's the interface in its entirety, not only the inputs but also the outputs, that really matters. And so I think they try to make the outputs.
Shunyu [00:50:18]: They're doing ACI.
Swyx [00:50:19]: Yeah, yeah, absolutely.
Harrison [00:50:20]: Really?
Swyx [00:50:21]: Like there's a whole set of industries that are just being redone for ACI. It's weird. And so my simple answer to them was, like, the error messages. When you give error messages, they should be basically prompts for the LLM to take and then self-correct. Then your error messages get more verbose, actually, than you normally would with a human. Stuff like that. Honestly, it's not that big. Again, like, is this worth a venture-backed industry? Unless you can tell us. But like, I think Code Interpreter, I think is a new thing. I hope so.
Alessio [00:50:52]: We invested in it to be so.
Shunyu [00:50:53]: I think that's a very interesting point. If you're trying to optimize to the extreme, then obviously they're going to be different. For example, the error—
Swyx [00:51:00]: Because we take it very seriously. Right.
Shunyu [00:51:01]: The error for a language model, the longer the better. But for humans, that will make them very nervous and very tired, right? But I guess the point is more like, maybe we should try to find a co-optimized common ground as much as possible. And then if we have divergence, then we should try to diverge. But it's more philosophical now.
Alessio [00:51:19]: But I think part of it is how you use it. So Google invented PageRank because ideally you only click on one link, you know, like the top three should have the answer. But with models, it's like, well, you can get 20.
So those searches are more like semantic grouping in a way. It's like, for this query, I'll return you like 20, 30 things that are kind of good, you know? So it's less about ranking and it's more about grouping.
Shunyu [00:51:42]: Another fundamental thing about HCI is the difference between humans' and machines' memory limits, right? So I think what's really interesting about this concept of HCI versus ACI, interfaces that are optimized for each of them, is you can kind of understand some of the fundamental characteristics and differences of humans and machines, right? Why, you know, if you look at find or whatever terminal command, you can only look at one thing at a time, and that's because we have a very small working memory. You can only deal with one thing at a time. You can only look at one paragraph of text at the same time. So the interface for us is, by design, a small piece of information, but more temporal steps. But for machines, that should be the opposite, right? You should just give them a hundred different results and they should just decide in context what's the most relevant stuff, and trade off context for temporal steps. That's actually also better for language models because the cost is smaller or whatever. So it's interesting to connect those interfaces to the fundamental differences of those.
Harrison [00:52:43]: When you said earlier, you know, we should try to design these to be maybe as similar as possible and diverge if we need to.
Swyx [00:52:49]: I actually don't have a problem with them diverging now
Harrison [00:52:51]: and seeing venture-backed startups emerging now, because we are different from machines, code AI. And it's just so early on, like they may still look kind of similar and there may still be small differences, but it's still just so early. And I think we'll only discover more ways that they differ. And so I'm totally fine with them kind of diverging early
Swyx [00:53:10]: and optimizing for the...
Harrison [00:53:11]: I agree. I think it's more like, you know,
Shunyu [00:53:14]: we should obviously try to optimize the human interface just for humans. We've already been doing that for 50 years. We should optimize the agent interface just for agents, but we might also try to co-optimize both and see how far we can get. There are enough people to try all three directions. Yeah.
Swyx [00:53:31]: There's a thesis I sometimes push, which is the sour lesson as opposed to the bitter lesson, which is we're always inspired by human development, but actually AI develops its own path.
Shunyu [00:53:40]: Right. We need to understand better, you know, what are the fundamental differences between those creatures.
Swyx [00:53:45]: It's funny, really early on this pod, you were like, how much grounding do you have in cognitive development and human brain stuff? And I'm like
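Two interface points from this exchange, return many compact legible results in one observation and phrase errors as instructions the model can self-correct from, can be combined in a single LLM-facing search tool. The sketch below is a rough illustration of that pattern; the backend callable is a placeholder you would supply, not a real search API:

```python
from typing import Callable, TypedDict

class Hit(TypedDict):
    title: str
    snippet: str
    url: str

def search_for_agent(query: str, backend: Callable[[str, int], list[Hit]], k: int = 20) -> str:
    """Return one observation with many short, legible results instead of a paginated JSON blob."""
    try:
        hits = backend(query, k)  # plug in whichever search API you actually use
    except Exception as exc:
        # The error is phrased as an instruction the model can follow to self-correct.
        return (
            f"Search failed ({exc}). Retry with a shorter query of 2-5 keywords, "
            "without quoted phrases or special characters."
        )
    lines = [f"{i + 1}. {h['title']} - {h['snippet']} ({h['url']})" for i, h in enumerate(hits)]
    return f"Top {len(lines)} results for '{query}':\n" + "\n".join(lines)
```

Handing the model 20 short results at once trades context for temporal steps, as described above, while a human-facing UI would paginate instead.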

The Heart Of Show Business With Alexia Melocchi
A new vision for Faith-Based Films- with Chevonne O'Shaughnessy

The Heart Of Show Business With Alexia Melocchi

Play Episode Listen Later Sep 3, 2024 37:21


New Season Premiere! With a portfolio that shifts from high-octane action films to heartwarming, inspirational family stories, I sit down with my longtime friend and industry powerhouse, Chevonne O'Shaughnessy from ACI Inspires. As the co-founder of ACI with indie sales agent and producer George Shamieh, she believes audiences are craving Christian-themed, uplifting content in today's world filled with turmoil. And Chevonne has a plan: producing ten movies within eighteen months.
You will love the back story on how the Love Finds You book series turned into a major hit for UP TV and how Chevonne overcame initial rejections by leveraging dating sites and Christian bookstores for its promotion. Despite the channel's pivot to a younger demographic, ACI broke viewership records and learned invaluable lessons about staying ahead of market trends. This episode offers a behind-the-scenes look at the bittersweet reality of changing management and the relentless drive to remain relevant in a constantly evolving industry.
Our conversation doesn't stop at domestic success; we explore the complexities of international markets and the importance of owning intellectual property. Learn how we navigated the challenges of producing and distributing films globally, including innovative tactics like dubbing foreign series into English and harnessing AI technology for production efficiency. Additionally, we shed light on the unique hurdles faced by women in executive roles and the significance of teamwork. This is a truly enlightening and inspiring listen for anyone interested in the dynamic world of entertainment.
Check out ACI on the Go for quality family entertainment: https://www.youtube.com/c/acionthego
Want to listen to more episodes on your favorite audio player? Visit our podcast website for over 100 episodes! https://www.theheartofshowbusiness.com
About your Host - Alexia Melocchi
Buy My Book - An Insiders Secret: Mastering the Hollywood Path
Alexia Melocchi - Website
The Heart of Show Business - Website
Little Studio Films - Website
Shop Our Merchandise!
Twitter
Instagram
Facebook
LinkedIn
Thanks for listening! Follow us on X, Instagram and Facebook and on the podcast's official site www.theheartofshowbusiness.com

Cisco Champion Radio
S11|E18 Modernize Data Center Networks with Cisco Innovations

Cisco Champion Radio

Play Episode Listen Later Sep 3, 2024 54:39


In this episode, we delve into the latest innovations in data center networking unveiled at Cisco Live US. Discover how Cisco is revolutionizing data center operations with new products and solutions aimed at simplifying operations, enhancing security, and ensuring a consistent user experience across various infrastructure architectures. Join us as we explore Cisco Nexus Dashboard, a pivotal component that centralizes operations, automation, and management for data center networks. Learn how it provides a unified view to manage and configure ACI fabrics, NX-OS devices, and interconnect networks, delivering consistent outcomes across multiple locations. We'll discuss its simplified deployment options, switch-based licensing model, and integration of campus capabilities for better visibility and seamless connectivity. Additionally, we'll cover Cisco's commitment to offering flexibility and choice with both Intel and AMD processors, and its focus on security, especially concerning AI workloads and secure communications. Tune in to understand how Cisco's agile, elastic, and cognitive solutions are driving businesses forward.
Learn more: https://www.cisco.com/site/us/en/products/networking/cloud-networking/index.html
Cisco guests:
Murali Gandluru, Vice President of Product Management, Cisco
Lukas Krattiger, Cisco Fellow, Cisco Networking - DC Network & Provider Connectivity, Cisco
Cisco Champion hosts:
Rita Younger, Practice Lead Data Center Networking, World Wide Technology
Rickey Keith, Consulting Systems Engineer, World Wide Technology
Michael Witte, Principal Solutions Architect, World Wide Technology
Liam Keegan, Advisor
Moderator:
Danielle Carter, Customer Voices and Cisco Champion Program

Choose 2 Think
311: Finally! Wholesome Movies You and Your Family Can Watch for FREE! With Chevonne O'Shaughnessy of ACI

Choose 2 Think

Play Episode Listen Later Aug 15, 2024 43:02


In this episode of the Choose 2 Think Inspirational Podcast, we sit down with Chevonne O'Shaughnessy, Co-Founder and President of American Cinema International, to explore her incredible journey in the film industry. Chevonne shares her passion for creating family-oriented, faith-based films that inspire and uplift audiences worldwide. With over 150 feature films to her name and a thriving YouTube channel called ACI on the Go with over 650,000 subscribers, Chevonne offers profound insights into the power of storytelling, the importance of staying true to your values, and her mission to make a positive impact through entertainment. Tune in for an inspiring conversation with a true trailblazer in Christian filmmaking!
This message is for you! You will be drawn to Chevonne's commitment to producing films that emphasize strong moral values and reflect your faith, offering you entertainment that resonates with your beliefs. Chevonne's journey in the film industry, marked by dedication and integrity, serves as an inspiring example of how to stay true to one's values while pursuing a career in a competitive field like Hollywood. You will gain valuable insights into the world of faith-based storytelling, learning how Chevonne and American Cinema International create content that uplifts, encourages, and spreads messages of hope and love.
CONNECT WITH CHEVONNE
Instagram: @chevonneinspires
YouTube: @ACIOnTheGo (Please subscribe to her channel)
Website: https://americancinemainspires.com/
CONNECT WITH VICTORIA:
*NEW RELEASE: Pickleball Passion A Marriage Devotional: 21 Days to a Stronger Connection on and off the Court https://amzn.to/48wnvaV
*CHOOSE 2 THINK 365-DAY DEVOTIONAL: https://amzn.to/3Hcl7v1
*CHOOSE 2 THINK JOURNAL: https://amzn.to/3WvinND
EMAIL: choose2think@gmail.com
WEBSITE: www.choose2think.co
MENTORING: www.choose2think.co/coaching.html
YOUTUBE: www.youtube.com/channel/UCz8Z2B9TtXvWn0RKelVY5DQ
FACEBOOK: www.facebook.com/groups/choose2think
INSTAGRAM: www.instagram.com/victoriadwalkerlydon/
*When you click on these Amazon affiliate links, I may earn a teeny commission from qualifying purchases at no extra cost to you. Thank you for your support!
DISCLAIMER: The Choose 2 Think Inspirational Podcast is for educational and entertainment purposes only. Please consult your physician or doctor for all medical advice and counsel.
Send in a voice message: https://podcasters.spotify.com/pod/show/victoria-d-lydon/message
SUPPORT CHOOSE 2 THINK MINISTRIES AND PODCAST HERE: https://podcasters.spotify.com/pod/show/victoria-d-lydon
Support this podcast: https://podcasters.spotify.com/pod/show/victoria-d-lydon/support

The Thoughtful Entrepreneur
1951 – Maximizing Time and Business Strategies with CO2 Coaching's Gary Cohen

The Thoughtful Entrepreneur

Play Episode Listen Later Jun 24, 2024 23:50 Transcription Available


The Value of Experience in Executive Coaching
In a recent episode of The Thoughtful Entrepreneur Show, host Josh Elledge engaged in a compelling conversation with Gary Cohen, the managing partner and executive coach at CO2 Coaching. The discussion delved into the nuances of executive coaching, particularly for CEOs and company presidents. Gary Cohen shared his wealth of leadership, coaching, and business strategy knowledge, offering listeners practical advice and profound insights. This blog post will distill the key themes and tips from the episode, providing a valuable guide for business leaders and entrepreneurs.
Gary Cohen highlighted the significance of practical experience in executive coaching, stressing that coaches who have held similar leadership roles can offer more relevant and actionable advice. He also introduced the CO2 Coaching framework, which focuses on helping clients reclaim their time by delegating or streamlining tasks. Additionally, Gary emphasized the importance of balancing cost reductions with revenue gains to ensure a sustainable business model. These strategies are crucial for effective time management and financial health, enabling leaders to concentrate on strategic initiatives.
The conversation also touched on fostering a culture of accountability and learning from failures. Gary discussed the role of leaders in setting clear expectations and holding team members accountable, which drives performance and alignment with organizational goals. He advocated for a blame-free culture that encourages innovation and continuous improvement. Furthermore, Gary underscored the necessity of emotional detachment in business decisions and the power of asking the right questions to engage and empower team members. By exploring coaching opportunities, business leaders can enhance their skills and drive their organizations to success.
About Gary Cohen:
Gary is famous for asking; he wrote the book on it. He probes his clients with the only kind of questions that can produce change: unexpected ones. From the client's answers, this dedicated Minneapolis leadership coach offers not just insights but alternative courses of action.
“There always are several good roads to Rome,” he says. “The key is to identify the one that best fits both your head and heart.” He focuses on the destination–and not the possible curves in the road–for a simple reason: most obstacles are artificial, and the rest are in our heads. “Clear your head,” he believes, “and the obstacles disappear.” This may explain why Gary's clients call him “eccentric in exactly the right way.” Gary has yet to meet a client who wants to be ordinary, and he helps them enjoy unusual success by employing unusual approaches.
CEO experience: Managing Partner and Co-founder of CO2 Partners, LLC (2004), an Executive Coaching and Leadership Development Firm. Founded ACI in 1989 with $4,000 and two employees, then grew 48 percent compounded annually for 12 years to over 2,200 employees and went public on the NASDAQ.
ACI was one of Venture Magazine's Top 10 Best Performing Businesses and Business Journal's 25 Fastest Growing Small Public Companies, and Gary was an Entrepreneur of the Year finalist.
Board memberships: All Kinds of Minds, Harvard Alumni Club of Minnesota, IC Systems, Inc., Richfield Bank, ACI, Telecentrics, Outward Bound National Advisory, HBS Alumni Club of Minnesota (Past President), Minnesota Zoo Foundation, among others.
Author: Just Ask Leadership: Why Great Managers Always Ask the Right Questions (McGraw Hill 2009); articles for Business Week, Leader to Leader, and Forbes.
Clients: Unilever, Intel, Genentech,...

Best in Fest
What Do Networks Expect in Today's Hollywood with Chevonne O'Shaughnessy - Ep #162

Best in Fest

Play Episode Listen Later Apr 23, 2024 35:20


As Co-Founder and President of American Cinema International, Chevonne began her professional journey as the President of International Sales at Quest Entertainment. Over the span of 10 years, she successfully oversaw the production and sales of more than 176 feature films and two popular television series. In 2000, Chevonne established American Cinema International. In less than a decade, ACI produced 20 feature films, securing international sales and prime-time broadcasts on platforms such as USA, SyFy, and HBO Premier. Additionally, they produced multiple romance movies for Hallmark and UPTV. Chevonne served as an executive producer for various films and TV mini-series. In 2014, along with George Shamieh, she established ACI INSPIRES, a new brand dedicated to producing inspirational entertainment with a focus on family films. Her most current project is ACI On the Go, the YouTube channel for her company, which boasts over 600,000 subscribers.