Fala Carlão talks with Rafael Butke, Director of Strategic Marketing at ICL, live from the ICL NutriExperts event. Rafael recounted the history of NutriExperts, which began 13 years ago and has become one of the company's most important events. He highlighted how ICL has evolved over that period, the size of its current structure, and the importance of teamwork in maintaining technical quality and delivering solutions that are ever more closely connected to the grower. During the conversation, Rafael also presented the launch of the new diagnostic platform, another step by ICL toward integrating technology and plant nutrition. Over to you, Rafael!
Fala Carlão talks with Michel Castelani, Director of Research and Development at ICL, live from the ICL NutriExperts event. With 12 years at the company, Michel talked about how the event was built: it was created to promote a high-level exchange of technical knowledge and has established itself as a strategic forum for hearing what customers have to say. He stressed that this active listening, done directly with the people who live agribusiness day to day, is essential for developing new technologies that are better aligned with the real needs of the field. The event evolves year after year because it is run by people who understand research but also respect practice. Over to you, Michel!
In today's episode, we welcome Sérgio Silva, who holds a doctorate in Agronomy, to talk about the impact of sensors on agriculture. How is this technology transforming soil and plant monitoring?
Fala Carlão talks with João Pascoalino, Digital Services Manager at ICL, live from the ICL NutriExperts event. João presented ICL's digital services arm and talked about the launch of a new tool that is set to revolutionize nutritional management: a real-time diagnosis that makes it possible to identify deficiencies in the crop immediately, correct them quickly, and keep the plant healthy. The solution promises savings in time and money, as well as more performance and productivity for the grower. It is technology applied intelligently, helping to protect the harvest and to make decisions based on concrete data. Over to you, João!
Fala Carlão talks with André Fernandes, agronomist, live from the ICL NutriExperts event. André spoke about his contribution to the book “Nutrologia Vegetal – A evolução da agricultura”, a pioneering work organized by ICL, and highlighted the importance of fertigation as a strategic tool for improving nutrient use by crops. With a special focus on coffee growing, he stressed that irrigation goes far beyond supplying water: it is a true productivity insurance, essential for maintaining balance and yield even under adverse weather. Over to you, André!
Fala Carlão talks with Guilherme Amaral, agronomist, coordinator of ICL's training program and co-author of the book “Nutrologia Vegetal – A evolução da agricultura”, live from the ICL NutriExperts event. Guilherme presented the work organized by ICL, which is the first book in the world dedicated to plant nutrology, bringing together advanced technical knowledge on plant nutrition with a scientific basis and practical application in the field. He also talked about his work as coordinator of ICL's internal training program, reinforcing the company's commitment to the continuing education of professionals who work directly on transforming Brazilian agriculture. Over to you, Guilherme!
Fala Carlão talks with Eder Sandy, agronomist and coffee-growing consultant at ECS Consultoria Agronômica, live from ICL NutriExperts. With a career built in the field and firmly grounded in technique, Eder talked about his work as a consultant specializing in coffee growing: supporting growers, guiding decisions, and contributing to more efficient, balanced, and sustainable production. He also commented on the importance of taking part in events like NutriExperts, where the exchange of knowledge is intense and up to date, and took the opportunity to briefly show the book on Nutrologia Vegetal that is being produced with ICL's support. Committed to the technical evolution of agribusiness, Eder stresses that constant learning is the fertile soil in which valuable consulting grows. Over to you, Eder!
Fala Carlão talks with Maurício De Bortoli, agronomist and consultant at Bortoli Agro Consultoria, live from ICL NutriExperts. Taking part in the event for the tenth time, Maurício highlighted how NutriExperts has evolved over the years, always focused on applied technical content and real economic return for the grower. He commented on a new ICL diagnostic tool that promises to make day-to-day work in the field easier, with even more precision and speed. With a solid career, recognized with awards and consistent results, Maurício spoke about the direct impact his work has on the farms he advises, where knowledge turns into productivity, sustainability, and profit. Over to you, Maurício!
Fala Carlão talks with André Alfonsi, Senior Portfolio Manager at ICL, live from NutriExperts, a technical event that brings together the sector's leading specialists. André shared a little of his career at the company and talked about the launch of SulfurBall, an ICL innovation that delivers sulfur in spherical form, with high solubility and agronomic efficiency. The product serves crops such as soybean and corn, reinforcing the importance of sulfur in building balanced, productive soils. André also revealed that two new launches are on the way, showing that ICL remains firmly committed to delivering solutions that improve performance in the field with real technology. Over to you, André!
Fala Carlão talks with Edivandro Corte, Executive Director of Grupo Terras Gerais, live from NutriExperts, a technical event organized by ICL. With 25 years of experience as a consultant, Edivandro talked about his career in agribusiness, his passion for the work, and the purpose that drives him: contributing to society by bringing knowledge, solutions, and transformation to the field. He highlighted the benefits of ICL's products and the importance of democratizing access to technology so that it reaches all growers, regardless of size. He also commented on his role as an advisory board member at NutriExperts and praised the strategic quality of this edition of the event. Over to you, Edivandro!
Fala Carlão talks with Bernardo Vieira, Product Manager at ICL, during NutriExperts, a technical event organized by ICL. Bernardo shared a little about his professional and personal journey, revealing that he discovered his passion for the field at a very young age. He also talked about ICL's entry into the world of biologicals, the growth of the company's portfolio, and the two most recent launches, which reinforce its commitment to innovation and performance in agribusiness. Over to you, Bernardo!
Fala Carlão talks with Carlos Cogo, Managing Partner for Consulting at Cogo Inteligência em Agronegócio; Guy Carvalho, host of the program Papo de Cafeicultor; and Alfredo Kober, CEO of ICL América do Sul, during ICL's Encontro de Distribuidores. In the conversation, they highlighted the scale of the event, the quality of the talks, and the high level of technical content presented. Alfredo also commented on ICL's new launches, reinforcing the company's commitment to innovation and to strengthening partnerships in agribusiness. Over to you, friends!
Fala Carlão talks with João Benetti, Senior Commercial Director at ICL and Head of the Aminoagri and Dimicron brands, live from ICL's Encontro de Distribuidores 2025. João talked about the scale of the event and highlighted how ICL's approach goes beyond delivering products: it involves relationships, listening, and strategic vision. It is a way of operating that strengthens the bond with those on the front line and ensures that everyone evolves together in the field. Over to you, João!
Fala Carlão talks with Alfredo Kober, CEO of ICL for Latin America, live from the company's Encontro de Distribuidores 2025. Alfredo highlighted the importance of approaching agribusiness with strategy, a long-term view, and a focus on practical solutions. According to him, ICL's success rests on three pillars: solid partnerships, real innovation, and a commitment to results that reach the grower. With firm leadership that stays connected to the field, Alfredo stressed that the industry's role is to build bridges between science and the crop, delivering real technology for an agribusiness that is more productive, more sustainable, and better prepared for the future. Over to you, Alfredo!
Fala Carlão talks with Luis Diniz, Commercial Director for the Southeast Region at Agro Amazônia, live from ICL's Encontro de Distribuidores 2025. Luis described this period of transition and growth following Agro Amazônia's acquisition of Nativa. The change of brand, now Agro Amazônia Sudeste, comes with strategic expansion, a stronger portfolio, and a team that is even better prepared to serve growers. He also explained why Patos de Minas was chosen as the hub for seed production, a decision based on climate, soil, and a long-term view. And, of course, he commented on the ICL event, highlighting the importance of exchanging experiences among partners who approach agribusiness with strategy and commitment. Over to you, Luis!
Fala Carlão talks with Moysés Simantob, professor at FGV, live from ICL's Encontro de Distribuidores 2025. In accessible language and with plenty of substance, Moysés talked about the silent revolution of artificial intelligence in our daily lives, including in agribusiness. He explained what sets human intelligence apart from artificial intelligence (and what brings them together), and why it is essential to understand this technology as a strategic ally for decision-making, efficiency, and competitiveness in the field. A high-level conversation about innovation, behavior, and a future that has already begun. Over to you, Moysés!
Today's special guest on PlantCast is Professor Tadeu Inoue, of the Agronomy program at Universidade Estadual de Maringá! He shared valuable insights on different approaches to nutritional and physiological management aimed at mitigating biotic and abiotic stresses in major field crops. Don't miss this knowledge-packed conversation!
Fala Carlão talks with Aquilino Romani, businessman at Sul Agrícola, live from ICL's Encontro de Distribuidores 2025. With 42 years in the ag-retail business, Aquilino talked about his long-standing partnership with ICL and revealed an unexpected side of his career: his passion for football. He has served on three boards of Paraná Clube and shared how the world of agribusiness and the world of football overlap more than one might think. It was a chat about management, team spirit, long-term vision, and, of course, plenty of experience on the field, in every sense. Over to you, Aquilino!
Fala Carlão talks with Pedro Mandelli, consultant, professor, and author, live from ICL's Encontro de Distribuidores 2025. A specialist in organizational behavior and people management, Pedro talked about the importance of leaders who are prepared for the challenges of the new economy. In his view, companies that want to grow sustainably in agribusiness need to invest in a “2025 leadership” that is more strategic, more human, and connected to both the business and the team. A straight-to-the-point conversation about people, management, and the fundamental role of those who lead, in the field and beyond. Over to you, Pedro!
Fala Carlão talks with Cristiano Corazza, the Cotriel director responsible for the Grains and Development (Fomento) areas, live from ICL's Encontro de Distribuidores 2025. Headquartered in Espumoso (RS), Cotriel is one of the largest agricultural cooperatives in Brazil, and Cristiano knows its story firsthand: he has dedicated more than 20 years to cooperativism and to growers. In the conversation, he highlighted the strength of the region, the importance of technical assistance, and the trust built through long-term relationships. The partnership with ICL, which has lasted a decade, shows that when industry and cooperative work side by side, the field is the winner. Technology, proximity, and results go hand in hand. Over to you, Cristiano!
Fala Carlão talks with Victor Scotton Leal, Commercial VP B2C at ICL América do Sul, live from the company's Encontro de Distribuidores 2025. Victor talked about the importance of adding real value for the grower, not only with a robust portfolio but also with market intelligence, access to credit, and cutting-edge technology. The event brought together leading names in the sector precisely to think about agribusiness in a way that is integrated, practical, and connected to the challenges of the field. A conversation that shows how ICL is betting on complete solutions, from science to relationships. Over to you, Victor!
Fala Carlão talks with a true heavyweight lineup at ICL's Encontro de Distribuidores 2025! Silvio Dal Molin, Sales Manager at Super Safra, talked about his work in Rio Grande do Sul and the solid partnership with ICL, which has been making a difference in the field. João Esberci, Regional Sales Manager at ICL, shared his career path and reinforced the company's commitment to the people who truly make agribusiness happen. Also taking part were Juarez, a Dimicron partner and owner of Produtiva Insumos Agrícolas in Ijuí/RS, and Neivaldo Cappellesso, Administrative Director of Agropaulista, who has more than 40 years of experience in the Unaí region. Alexandre Perotti, partner and owner of Planta Sul, rounded out the group with insights on relationships, trust, and results in the grower's daily routine. Credit, partnership, technology, and connection with those who live the field up close: that is what agribusiness needs. Over to you, friends!
Fala Carlão talks with Julio Chudzik, Commercial Director of Ouro Safra, live from ICL's Encontro de Distribuidores 2025. Julio described the company's new phase: it now also exports to China, a milestone that began with strategic planning and a long-term view. He explained how this opportunity was built and what it means for Ouro Safra and for Brazilian agribusiness. He also highlighted the strength of the partnership with ICL and the importance of events like this one for connecting people, generating business, and driving new achievements in the field. Over to you, Julio!
Fala Carlão talks with Eduardo Menegário, CFO of ICL, live from the company's Encontro de Distribuidores 2025. Offering a strategic view of finance and the market, Eduardo highlighted the importance of providing ever better credit conditions for growers. According to him, well-structured credit is a key piece in the sustainable expansion of agribusiness, and the industry needs to take a leading role in that process. Eduardo also commented on the integrated work between ICL's finance and commercial teams, stressing that staying close to the front line makes all the difference in generating solid business, with risk management and confidence in the future. Over to you, Eduardo!
Fala Carlão talks with Rafael Butke, Director of Strategic Marketing at ICL, live from the company's Encontro de Distribuidores 2025. In the conversation, Rafael offered a clear view of the sector's current situation and ICL's strategic role in it. He talked about recent transformations in agribusiness, the constant challenge of growing sustainably, and the importance of evolving together with partners, whether through AgroCoonexão, aimed at cooperatives, or Programa Horizon, aimed at distributors. With strong results and a well-defined plan, ICL has been showing that when credit, strategy, and proximity come together, agribusiness advances with consistency and intelligence. Over to you, Rafael!
In the new episode of PlantCast, Professor Carlos Crusciol of Unesp Botucatu covers a topic that is essential for productivity in the field: managing sugarcane under abiotic stresses such as drought, high radiation, and high temperatures. Technical and practical content for anyone who wants to better understand how to deal with the challenges that affect plant development.
Fala Carlão talks with Carlos Cogo, of Cogo Inteligência de Mercado, live from ICL's Encontro de Distribuidores 2025. A specialist in agribusiness trends and economic analysis, Cogo offered a realistic, strategic view of the current landscape. In his talk, he covered the jolts in the international market, the competitive advantages Brazil has at this moment, and the growth in Chinese demand for Brazilian products. He also addressed internal structural challenges, pointed to bottlenecks linked to public management, and stressed that, even amid the uncertainty, this is a favorable time to invest, with intelligence and a long-term view. Over to you, Carlos!
Disclaimer: This video is for educational purposes only. The opinions expressed by the guests are their personal views and do not reflect our stance. We have no intention of defaming or harming any individual, brand, product, country, or profession mentioned in this video. Our aim is to provide information to help the audience make informed decisions.
Niranjan Pagadala is the founder of 8 views, a prominent digital marketing company based in Hyderabad. Along with being a distinguished entrepreneur, he is also an ex-Ranji and ICL cricketer.
Our filming gear: Camera 1 - https://amzn.to/4gS3IGv Camera 2 - https://amzn.to/4gN6Kf1 Wireless collar mic - https://amzn.to/4k4dEPX Dynamic microphone - https://amzn.to/3QqWBdd Audio mixer - https://amzn.to/4hHByiD Lens 1 - https://amzn.to/4lrR9F6 Lens 2 - https://amzn.to/44mFxx3 Lens 3 - https://amzn.to/44nlz5o
About This Podcast - In this Telugu podcast episode, we discuss the hidden world of digital marketing, data tracking, and the powerful systems shaping how we interact with the internet. From how agencies collect and sell your phone number and email ID, to why you and your entire friend group suddenly see the same reel within hours, we unpack the sophisticated tactics running behind the scenes. We explore the psychology behind social media addiction: how the brain reacts to endless content, why it's so hard to put the phone down, and how even the smallest notifications are designed to hook you. We also question whether privacy really exists in the modern world. With voice assistants like Alexa and Siri responding instantly, how can we be sure they're not always listening? You'll also learn how apps like Ola or Uber might change prices based on your behavior, such as switching apps, or on your battery level.
The conversation doesn't stop there. We look at the impact of fake followers, manipulated trending tabs, and paid engagement packages that can alter public opinion without most users even realizing it. As the digital space grows, we also dive into the massive influence of AI: how it's reshaping industries, affecting jobs, and changing how content is created and consumed. From real-world case studies to surprising truths about who really controls the internet, this episode is packed with insights, stories, and examples that challenge what we think we know about our online lives. If you've ever wondered how deep the rabbit hole goes when it comes to the internet, algorithms, and marketing, this is your episode. Tune in, and prepare to see your screen in a completely new way.
#telugupodcast #businesspodcast #rawtalks #vamshikurapati #rawtalkswithvk #podcastintelugu #vkpodcast #besttelugupodcast
In today's episode, Professor Cássio Luiz Boechat, of Universidade Federal do Piauí, talks about the presence of heavy metals in products applied in agriculture.
This week's PlantCast brings a special invitation for all coffee lovers! In the episode, Professor Leandro Paiva, a coffee specialist at IF Sul de Minas, shares his knowledge on topics that are essential to the sector. Highlights include specialty coffees, how coffee quality is built, the market for differentiated coffees, and how innovation is shaping the future of this industry. An unmissable conversation for anyone who wants to understand more about the world of coffee and its trends. Be sure to listen! ☕
ICL Professional Horticulture Technical Manager Andrew Wilson explains what water soluble fertilisers (WSFs) are and how they can be applied directly to the plant through drip irrigation and foliar application. He explains how to apply them through a diluter, overhead irrigation or drip irrigation. WSFs are usually applied as a supplementary feed in combination with a Controlled Release Fertiliser such as Osmocote 5. They are typically used to give a growth boost to outdoor crops after a prolonged high rainfall period during the growing season. Wilson talks about the different types of water soluble fertiliser available to suit your water type, the ratios of NPK in the product, and the conductivity (EC) of the fertiliser. He explains how AngelaWeb 3.0 software takes many nursery-specific factors into account, such as water quality, growing media and Osmocote levels, describes how WSFs can be used with care in peat-free growing, and covers frequency of feeding. Lots of advice can be found on the ICL website, and there are many practical videos on our YouTube channel, ICL UK/Ire Professional Horticulture. Hosted on Acast. See acast.com/privacy for more information.
In this episode, we revisit a great conversation with Dr. Thadeu Melo, a researcher at Fundação Chapadão, to explore the fascinating relationship between soil physics and soil chemistry. Find out how soil structure influences plant nutrition, how chemical properties affect physical attributes, and who the true "engineers of the system" are. Follow ICL on social media for more content and updates: Instagram: https://bit.ly/3RfwZjl YouTube: https://bit.ly/46RYbdX LinkedIn: https://bit.ly/487ejJt
In this episode of PlantCast, we have the honor of welcoming Carlos Cogo, of Cogo Inteligência em Agronegócio, to talk about the outlook for the 2024/2025 and 2025/2026 crop seasons. Hear his perspectives on soybean and second-crop (safrinha) corn, profitability, production costs, interest rates, investments, and the opportunities for agribusiness in light of Trump's tariffs.
Greetings, comrades! In this episode we analyze Fernando Haddad's interview with ICL. In the interview, the Minister made a point of defending all the fundamentals of the neoliberal agenda and LIED about the BPC and social policies. Keep an eye out! Listen now!
From playing Ranji Trophy for Haryana and representing North Zone in the Deodhar Trophy to becoming the first runner-up in ESPN’s Harsha Ki Khoj: Dream Job, his journey into sports broadcasting has been anything but conventional. With over 3,000 shows across Star Sports, Sony Six, Times Now, Zee Sports, Ten Cricket, Ten Sports, DD Sports, ESPN, News X, and Mirror Now, he has been the voice behind IPL, ICL, ISL, Pro-Kabaddi, NBA, BCCI and ICC Cricket World Cups, Khelo India, Asian Games, Commonwealth Games, Wimbledon, and the Olympics. As one of India's top bilingual commentators, his insights into players like Rohit Sharma and Virat Kohli bring a depth that keeps audiences hooked. But how did a cricketer transition into sports media, and what goes into analyzing the game at the highest level? Beyond commentary, his music career has been just as dynamic: trained under Guru Manik Lal Verma, he has composed over 300 songs, performed in 600+ concerts, and released albums like India Hai Meri Jaan and Rok Sako Toh Rok Lo. His latest single Chal Dost is a blockbuster, and Musical Talkshaala is redefining how music and motivation come together. Adding another milestone, his debut book Udaan, launched by Kapil Dev in 2025, dives into his multifaceted journey. What drives someone to master multiple fields, and what untold stories lie behind his career in cricket, commentary, and music? Tune in to find out. See omnystudio.com/listener for privacy information.
In this episode, we bring back a very valuable conversation with Professor Geraldo Chavarria, of Universidade de Passo Fundo, on how to maximize soybean production.
On today's show: To Subscribe
Panel: '(Non-)Defining 'Gender' in the Crimes Against Humanity Draft: Possibilities, Alliances, and Strategies'. Feminist activists, country representatives, and other civil society actors have debated how to define “gender” in international criminal law (ICL) for at least three decades. In the Rome Conference that established the International Criminal Court (ICC) and its Statute in 1998, defining “gender” was a hotly debated topic of negotiation. More recently, this debate has resurfaced in the steps leading to the International Law Commission's Draft Articles for a Crimes Against Humanity Treaty, and continues to be discussed in the deliberations at the Sixth Committee on the Draft Articles. The CAH Convention is now expected to be negotiated between 2026 and 2029, and, more than a mere point of contention, the concept of “gender” in its text can be crucial for prosecuting sexual and gender-based international crimes and thus fundamental to gender justice efforts worldwide. With this in mind, this roundtable gathers scholars and activists studying and working (often simultaneously) on the definition of gender in international criminal law, in an effort to learn from their specific positionalities, perceptions, and experiences about the challenges, strategies, and possibilities for (non-)defining the term. https://www.lcil.cam.ac.uk/press/events/2025/02/panel-queering-gender-crimes-against-humanity-draft-possibilities-alliances-and-strategies
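별헤는사람들, February 2025 issue. This is how I search for dark matter! Plus the cosmic distances revealed by the Gaia telescope. Feat. Dr. Yoo Jaewon of the Korea Institute for Advanced Study. - Opening: a probe finally sets off to look for the shrimp living in Europa's ocean! Starship gets a bit bigger and goes bang (please note that this episode was recorded before Elon Musk's far-right turn). - Dr. Yoo Jaewon: I find dark matter through ICL research! - Dr. Hong Seung-soo: the cosmic distance series! The monstrous capabilities of the Gaia space telescope. Full materials: https://www.slideshare.net/slideshow/2025-2-pdf/275403978 Provided by 과학과사람들.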
This article delves into the latest developments in refractive surgery, with a particular focus on LASIK, PRK, SMILE, and ICL procedures, as well as the critical role of corneal transplant surgery in restoring vision for those with severe corneal damage or diseases like keratoconus and Fuchs' endothelial dystrophy.
This year, the theme of our podcast will feature discussions of foundational equipment and emerging technology in ophthalmology. In the first episode of 2025, Roger Zaldivar, MD, MBA, joins Gary Wörtz, MD, to discuss the ICL Guru project and how its complex algorithm integrates with ultrasound biomicroscopy platforms. Dr. Zaldivar shares his experience with ICL surgery, and how he leveraged that expertise to help improve sizing methodology and patient selection.
Across the Great Illuminary, a question was posed: which team of Illumineers is the greatest to have ever quested? Twenty teams answered the call to stake their claim as the greatest, and to determine who will reign supreme, the Illumineer Champions League was created. Join me, Rod, Vvonderland and yBreezy as we tell you what the Illumineer Champions League is and where to follow along as our inaugural season kicks off on Friday, January 17, 2025. Follow and watch the ICL on YouTube @IllumineerChampionsLeague. Watch the streams on Twitch: https://www.twitch.tv/vvonderland Follow the ICL on Twitter/X: https://x.com/illumineercl
Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver.Of perennial interest, particularly at academic conferences, is scaled-up architecture research as people hunt for the next Attention Is All You Need. We have many names for them: “efficient models”, “retentive networks”, “subquadratic attention” or “linear attention” but some of them don't even have any lineage with attention - one of the best papers of this NeurIPS was Sepp Hochreiter's xLSTM, which has a particularly poetic significance as one of the creators of the LSTM returning to update and challenge the OG language model architecture:So, for lack of a better term, we decided to call this segment “the State of Post-Transformers” and fortunately everyone rolled with it.We are fortunate to have two powerful friends of the pod to give us an update here:* Together AI: with CEO Vipul Ved Prakash and CTO Ce Zhang joining us to talk about how they are building Together together as a quote unquote full stack AI startup, from the lowest level kernel and systems programming to the highest level mathematical abstractions driving new model architectures and inference algorithms, with notable industry contributions from RedPajama v2, Flash Attention 3, Mamba 2, Mixture of Agents, BASED, Sequoia, Evo, Dragonfly, Dan Fu's 
ThunderKittens and many more research projects this year* Recursal AI: with CEO Eugene Cheah who has helped lead the independent RWKV project while also running Featherless AI. This year, the team has shipped RWKV v5, codenamed Eagle, to 1.5 billion Windows 10 and Windows 11 machines worldwide, to support Microsoft's on-device, energy-usage-sensitive Windows Copilot usecases, and has launched the first updates on RWKV v6, codenamed Finch and GoldFinch. On the morning of Latent Space Live, they also announced QRWKV6, a Qwen 32B model modified with RWKV linear attention layers. We were looking to host a debate between our speakers, but given that both of them were working on post-transformers alternativesFull Talk on YoutubePlease like and subscribe!LinksAll the models and papers they picked:* Earlier Cited Work* Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention* Hungry hungry hippos: Towards language modeling with state space models* Hyena hierarchy: Towards larger convolutional language models* Mamba: Linear-Time Sequence Modeling with Selective State Spaces* S4: Efficiently Modeling Long Sequences with Structured State Spaces* Just Read Twice (Arora et al)* Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures use a constant amount of memory during inference. However, due to the limited memory, recurrent LMs cannot recall and use all the information in long contexts leading to brittle in-context learning (ICL) quality. A key challenge for efficient LMs is selecting what information to store versus discard. In this work, we observe the order in which information is shown to the LM impacts the selection difficulty. 
  * To formalize this, we show that the hardness of information recall reduces to the hardness of a problem called set disjointness (SD), a quintessential problem in communication complexity that requires a streaming algorithm (e.g., recurrent model) to decide whether inputted sets are disjoint. We empirically and theoretically show that the recurrent memory required to solve SD changes with set order, i.e., whether the smaller set appears first in-context.
  * Our analysis suggests, to mitigate the reliance on data order, we can put information in the right order in-context or process prompts non-causally. Towards that end, we propose: (1) JRT-Prompt, where context gets repeated multiple times in the prompt, effectively showing the model all data orders. This gives 11.0±1.3 points of improvement, averaged across 16 recurrent LMs and the 6 ICL tasks, with 11.9× higher throughput than FlashAttention-2 for generation prefill (length 32k, batch size 16, NVIDIA H100). We then propose (2) JRT-RNN, which uses non-causal prefix-linear-attention to process prompts and provides 99% of Transformer quality at 360M params., 30B tokens and 96% at 1.3B params., 50B tokens on average across the tasks, with 19.2× higher throughput for prefill than FA2.
* Jamba: A 52B Hybrid Transformer-Mamba Language Model
  * We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture.
  * Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable.
  * This flexible architecture allows resource- and objective-specific configurations.
    In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU.
  * Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length.
  * We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.
* SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
  * We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096×4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU. Core designs include:
  * (1) Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens.
  * (2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality.
  * (3) Decoder-only text encoder: we replaced T5 with modern decoder-only small LLM as the text encoder and designed complex human instruction with in-context learning to enhance the image-text alignment.
  * (4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence.
  * As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024×1024 resolution image. Sana enables content creation at low cost.
* RWKV: Reinventing RNNs for the Transformer Era
  * Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability.
  * We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
  * Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintains constant computational and memory complexity during inference.
  * We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models.
    This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.
* LoLCATs: On Low-Rank Linearizing of Large Language Models
  * Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However, linearizing LLMs often significantly degrades model quality, still requires training over billions of tokens, and remains limited to smaller 1.3B to 7B LLMs.
  * We thus propose Low-rank Linear Conversion via Attention Transfer (LoLCATs), a simple two-step method that improves LLM linearizing quality with orders of magnitudes less memory and compute.
  * We base these steps on two findings.
  * First, we can replace an LLM's softmax attentions with closely-approximating linear attentions, simply by training the linear attentions to match their softmax counterparts with an output MSE loss ("attention transfer").
  * Then, this enables adjusting for approximation errors and recovering LLM quality simply with low-rank adaptation (LoRA).
  * LoLCATs significantly improves linearizing quality, training efficiency, and scalability. We significantly reduce the linearizing quality gap and produce state-of-the-art subquadratic LLMs from Llama 3 8B and Mistral 7B v0.1, leading to 20+ points of improvement on 5-shot MMLU.
  * Furthermore, LoLCATs does so with only 0.2% of past methods' model parameters and 0.4% of their training tokens.
  * Finally, we apply LoLCATs to create the first linearized 70B and 405B LLMs (50x larger than prior work).
  * When compared with prior approaches under the same compute budgets, LoLCATs significantly improves linearizing quality, closing the gap between linearized and original Llama 3.1 70B and 405B LLMs by 77.8% and 78.1% on 5-shot MMLU.

Timestamps

* [00:02:27] Intros
* [00:03:16] Why Scale Context Lengths? or work on Efficient Models
* [00:06:07] The Story of SSMs
* [00:09:33] Idea 1: Approximation -> Principled Modeling
* [00:12:14] Idea 3: Selection
* [00:15:07] Just Read Twice
* [00:16:51] Idea 4: Test Time Compute
* [00:17:32] Idea 2: Hardware & Kernel Support
* [00:19:49] RWKV vs SSMs
* [00:24:24] RWKV Arch
* [00:26:15] QRWKV6 launch
* [00:30:00] What's next
* [00:33:21] Hot Takes - does anyone really need long context?

Transcript

[00:00:00] AI Charlie: We're back at Latent Space Live, our first mini conference held at NeurIPS 2024 in Vancouver. This is Charlie, your AI co-host. As a special treat this week, we're recapping the best of 2024 going domain by domain. We sent out a survey to the over 900 of you who told us what you wanted, and then invited the best speakers in the Latent Space Network to cover each field.

[00:00:24] AI Charlie: 200 of you joined us in person throughout the day, with over 2200 watching live online. Thanks! Our next keynote covers the state of Transformers alternative architectures, with a special joint presentation with Dan Fu of Together AI and Eugene Cheah of Recursal AI and Featherless AI. We've featured both Together and Recursal on the pod before, with CEO Vipul Ved Prakash introducing them,

[00:00:49] AI Charlie: and CTO Ce Zhang joining us to talk about how they are building Together together as a quote unquote full stack AI startup, from the lowest level kernel and systems [00:01:00] programming to the highest level mathematical abstractions driving new model architectures and inference algorithms, with notable industry contributions from RedPajama v2, Flash Attention 3, Mamba 2, Mixture of Agents,

[00:01:15] AI Charlie: BASED, Sequoia, Evo, Dragonfly, Dan Fu's ThunderKittens, and many more research projects this year. As for Recursal and Featherless, we were the first podcast to feature RWKV last year, and this year the team has shipped RWKV v5, codenamed Eagle, to 1.5 billion Windows 10 and Windows 11 machines worldwide to support Microsoft's on-device, energy-usage-sensitive Windows Copilot use cases, and has launched the first updates on RWKV v6, codenamed Finch and GoldFinch.

[00:01:53] AI Charlie: On the morning of Latent Space Live, they also announced QRWKV6, a Qwen 32B model [00:02:00] modified with RWKV linear attention layers. Eugene has also written the single most popular guest post on the Latent Space blog this year (yes, we do take guest posts) on what he has discovered about the H100 GPU inference neocloud market since the successful launch of Featherless AI this year.

[00:02:20] AI Charlie: As always, don't forget to check the show notes for the YouTube link to their talk as well as their slides. Watch out and take care.

[00:02:27] Intros

[00:02:27] Dan Fu: Yeah, so thanks so much for having us. So this is going to be a little bit of a two part presentation. My name is Dan. I'm at Together AI, and I'll be joining UCSD as faculty in about a year. And Eugene, you want to introduce yourself?

[00:02:46] Eugene Cheah: Eugene. I lead the RWKV team, and I'm CEO of Featherless, and we both work on this new post-transformer architecture space.

[00:02:55] Dan Fu: Yeah, so today we're really excited to talk to you a little bit [00:03:00] about that. So first I'm going to give a broad overview of kind of the last few years of progress in non-transformer, post-transformer architectures. And then afterwards Eugene will tell us a little bit about the latest and the greatest and the latest frontier models in this space.

[00:03:16] Why Scale Context Lengths? or work on Efficient Models

[00:03:16] Dan Fu: So, the story starts with scaling. So this is probably a figure or something like this that you've seen very recently.
Over the last five to six years, we've seen models really scale up in parameter size, and that's brought with it a bunch of new capabilities, like the ability to talk to you and tell you sometimes how to use your Colab screens.

[00:03:35] Dan Fu: But another place where we've seen scaling, especially recently, is scaling in context length. So this can mean having more text inputs for your models, but it can also mean things like taking a lot of visual token inputs, image inputs, to your models, or generating lots of outputs. And one thing that's been really exciting over the last few months or so is that we're seeing scaling not only during training time, but also [00:04:00] during test time.

[00:04:00] Dan Fu: So this is the iconic image from the OpenAI o1 release. Not only are we starting to scale train time compute, but we're also starting to scale test time compute. Now if you're familiar with our attention and our transformer architectures today, this graph on the right might look a little bit scary.

[00:04:19] Dan Fu: And one of the reasons is that the implications are a little bit interesting. So what does it mean if we want to continue having smarter and smarter models? Do we just need to start building bigger, bigger data centers, spending more flops? Is this, this little DALL-E 3 "we need more flops, guys", is this going to be the future of all of AI?

[00:04:39] Dan Fu: Or is there a better way, another path forward? Maybe we can get the same capabilities that we've gotten used to, but for a lot less compute, a lot less flops. And one of the things that we're going to talk about today is specifically looking at that core attention operator in some of these models.

[00:04:57] Dan Fu: And the reason is that, so this is just some [00:05:00] basic, you know, scaling curves, but attention has compute that scales quadratically in the context length.
So that means that if you're doing something like test time compute and you want to spend a bunch of tokens thinking about what comes next, the longer that goes, the more tokens you spend on that, that compute grows quadratically in that.

[00:05:19] Dan Fu: One of the questions that we're interested in is, can we take that basic sequence model, that basic sequence primitive at the bottom, and get it to scale better? Can we scale in, let's say, n to the 3 halves or n log n? So in the first part of the talk, so we just went over the introduction. What I'm gonna do over the next few slides is just talk about some of the key advances and ideas that have shown up over the past few years, since maybe early 2020 to now, that have shown promise that this might actually be possible.

[00:05:48] Dan Fu: That you can actually get potentially the same quality that we want while scaling better. So to do that, basically the story that we're gonna look at is, we're gonna start to see [00:06:00] how. So this is a basic graph of just the past couple years of progress of perplexity, where that blue line, that dotted blue line, is attention.

[00:06:07] The Story of SSMs

[00:06:07] Dan Fu: It's your basic transformer, full dense attention. And then the dots coming down are some of the methods that you'll see in this presentation today. We're going to turn the clock back all the way to 2020. So this question of, can we make attention subquadratic? Basically, as soon as we said Attention Is All You Need, people started asking this question.

[00:06:28] Dan Fu: So we have this quadratic attention operator. Can we do better? I'll briefly talk about why attention is quadratic. And the basic thing that happens, if you're not familiar, is that you have these inputs, these keys and queries.
And what you do in this attention matrix, this S matrix over here, is that you're comparing every token in your input to every other token.

[00:06:49] Dan Fu: So when I try to do something like upload a whole book to Gemini (maybe not Gemini, because we don't necessarily know what the architecture is, but let's say we upload it to Llama), what happens [00:07:00] behind the scenes is that it's going to take every single word in that book and compare it to every other word.

[00:07:05] Dan Fu: And this has led to some pretty impressive things. But it's kind of a brute forcing of the way that you would try to interpret something. And what attention does in particular... sorry. Okay, no laser pointer. What attention does afterwards is that, instead of always operating in this quadratic thing, it takes a row-wise softmax over this matrix, and then multiplies it by this values matrix.

[00:07:32] Dan Fu: So, one of the key points to notice is that the output size is always going to be the same as the inputs, at least in standard self attention. So one of the first things that folks tried to do around 2020 is this thing called linear attention, which is just noticing that if we take out this softmax from here, if we take out this non-linearity in the middle of the attention operation, and then if you compute the keys and the values operation first, you actually never hit this quadratic bottleneck.

[00:07:57] Dan Fu: So that's potentially a way [00:08:00] to get a lot more computationally efficient. And there are various ways to do this by basically using feature maps or trying to approximate this overall attention computation. But some of this work sort of started to hit a wall in 2020. And the basic challenges were two.

[00:08:16] Dan Fu: So one was quality.
Back then, it was kind of hard to get good quality with these linear attention operators. The other one was actually hardware efficiency. So this feature map that was just shown in simplified form here actually ends up being quite computationally expensive if you just implement it naively.

[00:08:34] Dan Fu: So you started having these operators where not only are you not really sure if they have the same quality, but also they're actually just wall-clock slower. So you kind of end up getting the worst of both worlds. So that kind of sets the stage for four years ago.

[00:08:49] Dan Fu: Keep this in mind because linear attention is actually going to come back in a few years once we have a better understanding. But one of the works that started kicking off this [00:09:00] mini revolution in post-transformer architectures was this idea called state space models. So here the seminal work is one by Albert Gu in 2022.

[00:09:09] Dan Fu: And this piece of work really brought together a few ideas from some long running research lines of work. The first one was, and this is really one of the keys to closing the gap in quality, was just using things that, if you talk to an electrical engineer off the street, they might know off the back of their hand.

[00:09:33] Idea 1: Approximation -> Principled Modeling

[00:09:33] Dan Fu: But taking some of those properties with how we model dynamical systems in signal processing and then using those ideas to model the inputs, the text tokens in, for example, a transformer-like next-token-prediction architecture.
So some of those early state space model papers were looking at this relatively simple recurrent update model that comes from maybe chapter one of a signal processing class.

[00:09:59] Dan Fu: But then using [00:10:00] some principled theory about how you should do that recurrent update in order to really get the most that you can out of your hidden state, out of your sequence. So that was one key idea for quality. And when this was eventually realized, you started to see a bunch of benchmarks that were pretty sticky for a few years.

[00:10:20] Dan Fu: Things like Long Range Arena, some long sequence evaluation benchmarks, there was stuff in time series analysis. You started to see the quality tick up in meaningful ways. But the other key thing that was so influential about these state space models is that they also had a key idea about how you can compute these things efficiently.

[00:10:45] Dan Fu: So if you go back to your machine learning 101 class where you learned about RNNs, one thing that you may have learned is that they don't parallelize as well as attention, because if you just run them naively, you have to do this kind of sequential update to process new tokens, [00:11:00] whereas in attention, you can process all the tokens in parallel at one time.

[00:11:04] Dan Fu: One of the key insights behind the S4 paper was that these recurrent models, you could take them and you could also formulate them as a convolution. And in particular, with a convolution, instead of using a PyTorch conv1d operation, you can compute that with the FFT. And that would give you n log n compute in the sequence length n with an operator that was relatively well optimized for modern hardware.

[00:11:28] Dan Fu: So those are really, I'd say, the two key ideas in 2022 that started allowing these breakthroughs to happen in these non-transformer architectures.
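To make the recurrence-as-convolution idea above concrete, here is a minimal NumPy sketch. It assumes a scalar, time-invariant recurrence with made-up coefficients `a`, `b`, `c` (hypothetical names, not the S4 parameterization, which uses structured matrices and a principled initialization). It only shows that unrolling h_t = a·h_{t-1} + b·x_t, y_t = c·h_t gives a causal convolution with kernel k_j = c·a^j·b, which an FFT can evaluate in O(n log n).

```python
import numpy as np

def ssm_recurrent(a, b, c, x):
    """Sequential form: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (one token at a time)."""
    h = 0.0
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = a * h + b * x_t
        y[t] = c * h
    return y

def ssm_fft_conv(a, b, c, x):
    """Parallel form: the same outputs as a causal convolution computed with the FFT."""
    n = len(x)
    kernel = c * (a ** np.arange(n)) * b   # k_j = c * a^j * b
    m = 2 * n                              # zero-pad so the circular convolution does not wrap
    y = np.fft.irfft(np.fft.rfft(kernel, m) * np.fft.rfft(x, m), m)
    return y[:n]

x = np.random.default_rng(0).standard_normal(64)
print(np.allclose(ssm_recurrent(0.9, 0.5, 2.0, x), ssm_fft_conv(0.9, 0.5, 2.0, x)))
```

The sequential loop is what makes naive RNNs hard to parallelize; the FFT form processes the whole sequence at once, which is the property the talk attributes to S4.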
So, these ideas about how to model the recurrent updates of a sequence in a principled way, and also these key ideas in how you can compute it efficiently by turning it into a convolution and then scaling it up with the FFT.

[00:11:53] Dan Fu: Along those same lines, afterwards we started putting out some work on specialized kernels, so just [00:12:00] like we have FlashAttention for transformers, we also have works like FlashFFTConv, and if you look at these lines of work, oftentimes whenever you see a new architecture, you see a new primitive, one of the table stakes now is, do you have an efficient kernel so that you can actually get wall-clock speed up?

[00:12:14] Idea 3: Selection

[00:12:14] Dan Fu: So by 2022, we are starting to have these models that had promising quality primitives, and also promising wall clocks. So you could actually see regimes where they were better than transformers in meaningful ways. That being said, there's still sometimes a quality gap, particularly for language modeling.

[00:12:33] Dan Fu: And because language is so core to what we do in sequence modeling these days, the next key idea that I'm going to talk about is this idea of selection mechanisms. And this is basically an idea of, so you have this recurrent state that you're keeping around that just summarizes everything that came before.

[00:12:50] Dan Fu: And to get a good sequence model, one of the things that you really need to be able to do is have the model learn what's the best way to pick out pieces from that recurrent [00:13:00] state. So one of the major ideas here, in a line of work called H3, Hungry Hungry Hippos, and also these Hyena models, was that one way you can do this is by just adding some simple element-wise gates.

[00:13:13] Dan Fu: So versions of these ideas have been around for decades.
If you squint at the LSTM paper, you can probably find this gating mechanism. But it turns out you can take those old ideas, add them into these new state space models, and then you can see quality start to pick up. If you've heard of the Mamba model, this also takes the selection to the next level by actually making some changes in that fundamental recurrent state space.

[00:13:40] Dan Fu: So, it's not only just this gating that happens around the SSM layer, but also you can actually make the ABCD matrices of your state space model data dependent, which will allow you to even better select out different pieces from your hidden state depending on what you're seeing. I'll also point out, if you look at the [00:14:00] bottom right of this figure, there's this little triangle with a GPU SRAM, GPU HBM, and this is just continuing that trend of, when you have a new architecture, you also release it with a kernel to show that it can be hardware efficient on modern hardware.

[00:14:17] Dan Fu: One of the next cool things that happened is, once we had this understanding of these are the basic pieces, these are the basic principles behind some of the sequence models, linear attention actually started to come back.
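As a reminder of the linear attention trick described earlier in the talk (drop the softmax, then compute the keys-times-values product first), here is a minimal NumPy sketch. It is an illustration under simplifying assumptions: no feature map, no normalization, and random toy matrices, so it is not the BASED model or any specific paper's implementation. The second pair of functions shows the causal case, where the same computation becomes a recurrence over a fixed-size state.

```python
import numpy as np

def quadratic_order(q, k, v):
    """(Q K^T) V: builds the full N x N matrix comparing every token with every other."""
    return (q @ k.T) @ v

def linear_order(q, k, v):
    """Q (K^T V): the d x d product comes first, so no N x N matrix is ever formed."""
    return q @ (k.T @ v)

def causal_quadratic(q, k, v):
    """Causal version: mask out future tokens in the N x N matrix."""
    return np.tril(q @ k.T) @ v

def causal_recurrent(q, k, v):
    """Same causal output, computed token by token with a fixed-size state."""
    state = np.zeros((k.shape[1], v.shape[1]))
    out = np.empty((len(q), v.shape[1]))
    for t in range(len(q)):
        state += np.outer(k[t], v[t])   # accumulate k_t v_t^T into the recurrent state
        out[t] = q[t] @ state
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((32, 8)) for _ in range(3))
print(np.allclose(quadratic_order(q, k, v), linear_order(q, k, v)))
print(np.allclose(causal_quadratic(q, k, v), causal_recurrent(q, k, v)))
```

The recurrent state here has shape d x d regardless of sequence length, which is the "recurrent state size" axis of the recall-versus-state-size trade-off discussed in the talk.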
So earlier this year, there was a model called BASED, from Simran Arora and some other folks, that combined a more principled version of linear attention (basically, the two second summary is that it used a Taylor approximation of the softmax attention) with a simple sliding window attention, and was starting to be able to expand the Pareto frontier of how much data can you recall from your sequence, versus how small is your recurrent state size.

[00:14:58] Dan Fu: So those orange dots [00:15:00] at the top there are just showing smaller sequences that can recall more memory.

[00:15:07] Just Read Twice

[00:15:07] Dan Fu: And the last major idea I think that has been influential in this line of work, and is relatively late breaking, just a few months ago, is just the basic idea that when you have these models that are fundamentally more efficient in the sequence length, you maybe don't want to prompt them or use them in exactly the same way.

[00:15:26] Dan Fu: So this was a really cool paper called Just Read Twice, also from Simran. That basically said, hey, all these efficient models can process tokens so much more efficiently than transformers that they can sometimes have unfair advantages compared to a simple transformer model.

[00:15:44] Dan Fu: So take, for example, the standard use case of you have some long document, you're going to pass it in as input, and then you're going to ask some question about it. One problem you might imagine for a recurrent model where you have a fixed state size is, let's say that [00:16:00] your article is very long, and you're trying to ask about some really niche thing.

[00:16:04] Dan Fu: You can imagine it might be hard for the model to know ahead of time what information to put into the hidden state.
But these models are so much more efficient that you can do something really stupid, like, you can just write down the document, write down the question, write down the document again, and then write down the question again, and then this time, the second time that you go over that document, you know exactly what to look for.

[00:16:25] Dan Fu: And the cool thing about this is that this results in better quality, especially on these recall-intensive tasks. But the other interesting thing is it really takes advantage of the more efficient architectures that we're having here. So one of the other, I think, influential ideas in this line of work is, if you change the fundamental compute capabilities of your model and the way that it scales, you can actually start to query it at test time differently.

[00:16:51] Idea 4: Test Time Compute

[00:16:51] Dan Fu: And this actually, of course, goes back to those slides on test time compute. So while everybody's looking at, say, test time compute for big transformer models, [00:17:00] I think potentially a really interesting research question is, how can you take those and how does it change with this new next generation of models?

[00:17:09] Dan Fu: So I'll just briefly summarize what some of those key ideas were and then show you briefly kind of what the state of the art is today. So the four key ideas are: instead of just doing a simple linear attention approximation, instead take ideas that we know from other fields like signal processing, do a more principled approach to your modeling of the sequence.

[00:17:32] Idea 2: Hardware & Kernel Support

[00:17:32] Dan Fu: Another key idea throughout all these lines of work is you really want hardware and kernel support from day one.
So even if your model is theoretically more efficient, if somebody goes and runs it and it's two times slower, one of the things that we've learned is that if you're in that situation, it's just gonna be dead on arrival.

[00:17:49] Dan Fu: So you want to be designing your architectures around that. One of the key machine learning ideas that has been important for the quality is just making sure that you encode different ways that you can [00:18:00] select from your hidden state, and really focus on that as a key decider of quality. And finally, I think one of the emerging new things for this line of work, and something that's quite interesting, is: what are the right test time paradigms for these models?

[00:18:15] Dan Fu: How do they change relative to what you might do for a standard transformer? I'll briefly end this section. So I've labeled this slide "where we are yesterday" because Eugene is going to talk about some new models that he released literally this morning. But as of yesterday, some of the really cool results out of these efficient alternative models were, so AI21 trained this hybrid MoE called Jamba.

[00:18:40] Dan Fu: That is currently the state of the art for these non-transformer architectures. NVIDIA and MIT put out this new diffusion model called SANA recently, and one of their key observations is that you can take a standard diffusion transformer model, replace the layers with linear [00:19:00] attention, and then that lets you scale to much larger images, much larger sequences, more efficiently.

[00:19:07] Dan Fu: And one thing that I don't think anybody would have called a few years ago is that one of those gated SSMs, gated state space models, ended up on the cover of Science because a great group of folks went and trained some DNA models.
So that's Michael Poli, Eric Nguyen from Stanford and the Arc Institute.

[00:19:26] Dan Fu: So we're really at an exciting time in 2024 where these non-transformer, post-transformer architectures are showing promise across a wide range of modalities, of applications, and of tasks. And with that, I'll pass it on to Eugene, who can tell you a little bit about the latest and greatest with RWKV.

[00:19:49] RWKV vs SSMs

[00:19:49] Eugene Cheah: So, that's useful? Yeah. You're talking to here. Oh, I'm talking to here. Okay. So, yeah, two streams. Yeah. So, I think one common question that we tend to get asked, right, is what's the difference between [00:20:00] RWKV and state space? So I think one of the key things to really understand, right, the difference between the two groups, right, is that we are actually more like an open source, random internet meets academia kind of situation.

[00:20:11] Eugene Cheah: Like, most of us never wrote any paper, but we basically looked at RNNs and linear attention when Attention Is All You Need came out, and then we decided, like, hey, there is a quadratic scaling problem. Why don't we try fixing that instead? So we ended up developing our own branch, but we end up sharing ideas back and forth.

[00:20:30] Eugene Cheah: And we do all this actively in Discord, GitHub, etc. This was so bad for a few years, right, that basically, the average group's H index was so close to zero, right, EleutherAI actually came in and helped us write our first paper. Great, now our H index is now three, apparently.
So, but the thing is, like, a lot of these experiments led to results, and essentially we took the same ideas from linear attention, [00:21:00] and we built on it.

[00:21:01] Eugene Cheah: So, to take a step back into, like, how does RWKV handle its own attention mechanic and achieve the same goals of, like, O(N) compute, respectively, and in focus of our overall goal to make AI accessible to everyone, regardless of language, nation, or compute, that's our goal. We actually train our models primarily on over a hundred languages, which is another topic altogether.

[00:21:23] Eugene Cheah: And our goal is to train to even 200 languages to cover all languages in the world. But at the same time, we work on this architecture to lower the compute cost so that people can run it on Raspberry Pis and on anything. So, how did RWKV break the dependency of LSTM token flow? Because I think to understand the architecture, right, it's probably easier to understand it from the RNN lens.

[00:21:46] Eugene Cheah: Because that's where we built on. Whereas state space kind of tried to start anew and took lessons from that. So there's a little bit of divergence there. And, AKA, this is our version of linear attention. So to take a step back, [00:22:00] all foundation models, be it transformers or non-transformers, at a very high level, right?

[00:22:05] Eugene Cheah: Pump in the tokens, I mean, text, turn that into embeddings, and go through a lot of layers. Generate a lot of states, be it the QKV cache or RNN states or RWKV states. And output an embedding. They are not the same thing. And we just take more layers and more embeddings.
And somehow that magically works.

[00:22:23] Eugene Cheah: So, if you remember your ancient RNN lessons, which we call best learning these days, the general idea is that you have the embedding information flowing all the way up, and you take that information and you flow it back down, and then you process it as part of your LSTM layers.

[00:22:41] Eugene Cheah: So, this is how it generally works. Karpathy is quoted saying that RNNs are actually unreasonably effective. The problem is this is not scalable. To start doing work on the second token, you need to wait for the first token. And likewise for the third token and fourth token, yada yada.

[00:22:55] Eugene Cheah: That is CPU land, not GPU land. So you can have an H100 and you can't even use 1 percent of it. That's kind of why RNNs didn't really take off in the direction that we wanted, like, billions of parameters when it comes to training. So, what did RWKV version 0 do? Boom. We just did the dumbest, lamest thing.

[00:23:13] Eugene Cheah: Sorry, this is the bottleneck for RNN. We did the dumb thing of removing that line. And it kind of worked. It trained. It sucked, but it kind of worked. Then no one cared because the loss was crap, but how do we improve that? And that's essentially where we moved forward, because if you see this kind of flow, you can actually get your GPU saturated quickly, where it essentially cascades, respectively.

[00:23:41] Eugene Cheah: So I'm just waiting for this to loop again. It's like, once your first layer has finished computing your first token, you start to cascade your compute all the way until you are at, hey, I'm using 100 percent of the GPU.
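The cascade described above can be made concrete with a toy dependency model. This is only a sketch of the diagram as it is described in the talk, not of any specific LSTM or RWKV implementation: every (layer, token) cell is assumed to take one step, workers are assumed unlimited, and the names `finish_steps`, `wait_for_top_layer`, `total_steps`, and `peak_parallelism` are hypothetical. With the "wait for the previous token to reach the top" line in place, the work is strictly sequential; with that line removed, cells along a diagonal can run at the same time.

```python
def finish_steps(num_layers, num_tokens, wait_for_top_layer):
    """Earliest step at which each (layer, token) cell can finish.

    Toy assumption: every cell costs one step and there are unlimited workers.
    """
    finish = [[0] * num_tokens for _ in range(num_layers)]
    for t in range(num_tokens):
        for layer in range(num_layers):
            deps = []
            if layer > 0:
                deps.append(finish[layer - 1][t])  # layer below, same token
            if t > 0:
                deps.append(finish[layer][t - 1])  # same layer, previous token
                if wait_for_top_layer and layer == 0:
                    # The "line" that gets removed: the previous token must reach
                    # the top of the stack before the next token may start.
                    deps.append(finish[num_layers - 1][t - 1])
            finish[layer][t] = 1 + max(deps, default=0)
    return finish


def total_steps(finish):
    """Number of steps until the last cell is done."""
    return max(max(row) for row in finish)


def peak_parallelism(finish):
    """Largest number of cells that finish in the same step."""
    counts = {}
    for row in finish:
        for step in row:
            counts[step] = counts.get(step, 0) + 1
    return max(counts.values())


sequential = finish_steps(num_layers=4, num_tokens=8, wait_for_top_layer=True)
cascaded = finish_steps(num_layers=4, num_tokens=8, wait_for_top_layer=False)
print(total_steps(sequential), peak_parallelism(sequential))
print(total_steps(cascaded), peak_parallelism(cascaded))
```

In this toy setting the sequential schedule needs layers times tokens steps and never has more than one cell in flight, while the cascaded schedule needs roughly layers plus tokens steps and keeps up to one cell per layer busy, which is the saturation effect being described.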
So we worked on it, and we started going along the principle that as long as we keep this general architecture, where we can cascade and be highly efficient with our architecture, nothing is sacred in our architecture.

[00:24:06] Eugene Cheah: And we have done some crazy ideas. In fact, if you ask me to explain some things in the paper, officially in the paper I'll say we had this idea and we wrote it this way. The reality is someone came with the code, we tested it, it worked, and then we rationalized it later.

[00:24:24] RWKV Arch

[00:24:24] Eugene Cheah: So the general idea behind RWKV is that we generally have two major blocks that we do.

[00:24:30] Eugene Cheah: We call them time mix and channel mix. Time mix generally handles long-term memory states, where essentially we apply matrix multiplication and SiLU activation functions in processing an input embedding and an output embedding. I'm oversimplifying it, because this calculation changed every version, and we have, like, version 7 right now.

[00:24:50] Eugene Cheah: Channel mix is similar to Base in the sense that it does shorter-term attention, where it just looks at the sister token, or the token before it, because there's a shift in the token shift matrix. I don't really want to go too much into the papers themselves, because we do have three papers on this.

[00:25:09] Eugene Cheah: Basically: RWKV, RNN for the transformer era; Eagle and Finch, RWKV with matrix-valued states, which is the updated version 5 and version 6; and GoldFinch, which is our hybrid model. We are already writing the paper for V7, RWKV-7, named Goose. Our architectures are named after birds.

[00:25:30] Eugene Cheah: And I'm going to cover, as well, QRWKV, and mama100k, and RWKV, and where did that lead to? Great!
Because we are all GPU poor, and to be clear, most of this research is done on only a handful of H100s, which one Google researcher told me was, like, his experiment budget for a single researcher.

[00:25:48] Eugene Cheah: So our entire organization has less compute than a single researcher at Google. So one of the things that we explored was: how do we convert transformer models instead? Because someone already paid that billion dollars, a million dollars, on training, so why don't we take advantage of those weights?

[00:26:05] Eugene Cheah: And I believe Together AI worked on LoLCATs for the Llama side of things, and we took some ideas from there as well, and we essentially did that for RWKV.

[00:26:15] QRWKV6 launch

[00:26:15] Eugene Cheah: And that led to QRWKV6, which we just dropped today, a 32B Instruct preview model, where we took the Qwen 32B Instruct model, froze the feedforward layer, removed the QKV attention layer, and replaced it with RWKV linear layers.

[00:26:32] Eugene Cheah: So to be clear, this means we do not have the RWKV channel mix layer, we only have the time mix layer. But once we do that, we train the RWKV layer. What's important is that the feedforward layer needs to be frozen, so the new attention can be learned. And then we unfreeze the feedforward layer and train all the layers together with a custom learning rate schedule, so that they can learn how to work together.

[00:26:54] Eugene Cheah: The end result, surprisingly, and, to be honest, to the frustration of the RWKV MoE team, which ended up releasing their model on the same day, was that with just a few hours of training on two nodes, we managed to get it to be on par, kind of, with the original Qwen 32B model.
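The two-stage recipe described above (swap the attention layer and freeze the feedforward, then unfreeze everything) can be sketched as bookkeeping over which parts of each block are trainable. This is a minimal illustration under stated assumptions, not the team's training code: the `Block` class, the `swap_attention` and `unfreeze_all` helpers, and the four-block toy model are all hypothetical, and the custom learning rate schedule mentioned in the talk is not modelled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Block:
    """One transformer-style block, reduced to what the recipe cares about."""
    mixer: str             # which token-mixing layer the block uses
    mixer_trainable: bool  # does the mixer receive gradient updates?
    ffn_trainable: bool    # does the feedforward layer receive gradient updates?


def swap_attention(blocks):
    """Stage 1: replace attention with the new mixer and freeze the feedforward."""
    return [
        replace(b, mixer="rwkv_time_mix", mixer_trainable=True, ffn_trainable=False)
        for b in blocks
    ]


def unfreeze_all(blocks):
    """Stage 2: train the new mixer and the feedforward layer together."""
    return [replace(b, mixer_trainable=True, ffn_trainable=True) for b in blocks]


# A toy stand-in for the pretrained model: four blocks with attention, nothing training yet.
pretrained = [Block("qkv_attention", False, False) for _ in range(4)]
stage1 = swap_attention(pretrained)
stage2 = unfreeze_all(stage1)

for name, blocks in [("pretrained", pretrained), ("stage 1", stage1), ("stage 2", stage2)]:
    print(name, blocks[0])
```

In a real framework the same idea would be expressed by toggling gradient tracking on the relevant parameter groups between the two stages; the sketch only shows the order of operations.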
So, in fact, on the first run, that completely confused us. And I was telling Daniel Goldstein, SmerkyG, who kind of leads most of our research coordination: when you pitched me this idea, you told me at best you'll get the same level of performance.

[00:27:26] Eugene Cheah: You didn't tell me the ARC Challenge score and Winogrande score would shoot up. I don't know what's happening there. But it did. The MMLU score dropping, that was expected. Because if you think about it, when we were training all the layers, we were essentially, like, Frankensteining this thing, and we did brain damage to the feedforward network layer too with the new RWKV layers.

[00:27:47] Eugene Cheah: But 76%, hey, somehow it's retained, and we can probably further train this. We didn't even spend more than 3 days training this, so there's a lot more that can be done, hence the preview. This brings up a big question, because we are already now in the process of converting the 70B. This is actually an extremely compute-efficient way to test our attention mechanic.

[00:28:10] Eugene Cheah: It becomes a shortcut. We are already planning to do our version 7 and our hybrid architecture for it, because we don't need to train from scratch, and we get a really good model out of it. And the other thing that is uncomfortable to say, because we are doing it right now on the 70B, is that if this scales correctly to 128K context length, and I'm not even talking about a million, just 128K: the majority of enterprise workload today is just on 70B at under 32K context length.

[00:28:41] Eugene Cheah: That means if this works and the benchmarks match, we can replace the vast majority of current AI workload, unless you want super long context. And then, sorry, can someone give us more GPUs? Because we do need the VRAM for super long context, sadly.
So yeah, that's what we are working on, and essentially we are excited about this to just push it further.

[00:29:02] Eugene Cheah: And this conversion process, to be clear, I don't think it's going to be exclusive to RWKV. It probably will work for Mamba as well, I don't see why not. And we will probably see more ideas, or more experiments, or more hybrids. Like, one of the weirdest things that I wanted to say outright, and I confirmed this with the BlackMamba team and the Jamba team, because we did the GoldFinch hybrid model, is that none of us understand why a hybrid of a state-based model, be it RWKV or state space, and a transformer performs better than the baseline of both.

[00:29:28] Eugene Cheah: It's like, when you train one, and then you replace, you expect the same results. That's our pitch. That's our claim. But somehow when we jam both together, it outperforms both. And that's one area where we only have four experiments, across four teams, and a lot more needs to be done.

[00:29:51] Eugene Cheah: But these are things that excite me, essentially, because that is where we can potentially move ahead. Which brings us to what comes next.

[00:30:00] What's next

[00:30:00] Dan Fu: So, this part is where we'll talk a little bit about stuff that we're excited about, and maybe have some wild speculation on what's coming next.

[00:30:12] Dan Fu: And of course this is also the part that will be more open to questions. So, a couple of things that I'm excited about: one is continued hardware-model co-design for these models. One of the things that we've put out recently is this library called ThunderKittens.
It's a CUDA library.

[00:30:29] Dan Fu: And one of the things that we found frustrating is that every time we built one of these new architectures, and I'm sure you had the exact same experience, we'd have to go and spend two months in CUDA land, writing these new efficient things. And if we decided to change one thing in PyTorch, like, one line of PyTorch code is a week of CUDA code at least.

[00:30:47] Dan Fu: So one of our goals with a library like ThunderKittens was that we just broke down what the key principles are, what the key hardware things are, what the key compute pieces are that you get from the hardware. So for example, on the H100 everything really revolves around a warp group matrix multiply operation.

[00:31:06] Dan Fu: So you really want your operation to be able to split into relatively small matrix-matrix multiply operations, like multiplying two 64 by 64 matrices, for example. And if you know that ahead of time when you're designing your model, that probably gives you some information about how you set the state sizes and how you set the update function.

[00:31:27] Dan Fu: So with ThunderKittens we basically built a whole library around this basic idea that your basic compute primitive should not be a float, it should be a matrix, and everything should just be matrix compute. And we've been using that to try to both re-implement some existing architectures and also start to design some new ones that are really designed with this tensor core primitive in mind.

[00:31:44] Dan Fu: Another thing that at least I'm excited about: over the last four or five years, we've really been looking at language models as the next thing. But if you've been paying attention to Twitter, there's been a bunch of new next-generation models that are coming out.

[00:32:04] Dan Fu: So there, there are.
So, video generation models that can run in real time, that are supported by your mouse and your keyboard, and that, I'm told, if you play with them, only have a few seconds of memory. Can we take that model and give it a very long context length, so that you could actually maybe generate an entire game state at a time?

[00:32:25] Dan Fu: What does that look like for the model? You're certainly not going to do a giant quadratic attention computation to try to run that. Maybe use some of these new models, or some of these new video generation models that came out. So Sora came out, I don't know, two days ago now, but with super long queue times and super long generation times.

[00:32:43] Dan Fu: So that's probably a quadratic attention operation at the bottom of it. What if we could remove that and get the same quality, but a lot faster generation time? Or some of the demos that we saw from Paige earlier today. You know, if I have a super long conversation with my Gemini bot, what if I wanted it to remember everything that it's seen in the last week?

[00:33:06] Dan Fu: I mean, maybe you don't, for personal reasons, but what if I did, you know? What does that mean for the architecture? And I think that's certainly something I'm pretty excited about. I'm sure you're excited about it too. So, I think we were supposed to have some hot takes, but I honestly don't remember what our hot takes were.

[00:33:21] Hot Takes - does anyone really need long context?

[00:33:21] Eugene Cheah: Yeah, including the next slide. Hot takes, yes, these are our

[00:33:25] Dan Fu: hot takes.

[00:33:25] Eugene Cheah: I think the big one on Twitter that we saw, that we shared, was the question: is RAG relevant in the case of, like, the future of state-based models?

[00:33:38] Dan Fu: Let's see, I haven't played too much with RAG. But when I have,
I'll say I found it was a little bit challenging to do research on it, because we had this experience over and over again where you could have an embedding model of any quality, so you could have a really, really bad embedding model, or you could have a really, really good one, by any measure of good.

[00:34:03] Dan Fu: And for the final RAG application, it kind of didn't matter. That's what I'll say about RAG while I'm being recorded. I know it doesn't actually answer the question, but...

[00:34:13] Eugene Cheah: Yeah, so I think a lot of folks are extremely excited about the idea of RWKV or state space potentially having infinite context.

[00:34:21] Eugene Cheah: But I think the reality is that when we say infinite context, we just mean a different kind of infinite context, or, as was previously covered, you need to test the model differently. So think of it more along the lines of the human. Like, I don't remember what I ate for breakfast yesterday.

[00:34:37] Eugene Cheah: Yeah, that's the statement that I'll say. And we humans are not quadratic transformers. If we were, if, let's say, we increased our brain size for every second we live, we would have exploded by the time we are 5 years old or something like that. And I think fundamentally for us, regardless of whether it's RWKV, state space, xLSTM, etc., our general idea is that instead of that expanding state, that increase in computational cost, what if we have a fixed state size?

[00:35:08] Eugene Cheah: And information theory dictates that that fixed state size will have a limit. Just how big of a limit is a question. Like, RWKV is running at 40 megabytes for its state. Its future version might run into 400 megabytes.
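To put the fixed-state idea next to a growing one, here is a back-of-the-envelope comparison. The 40 MB figure is the one quoted in the talk; everything else is an assumption made up for illustration, including the transformer shape (32 layers, 8 key/value heads, head dimension 128, 2 bytes per value) and the helper names `kv_cache_bytes` and `tokens_until_cache_exceeds_state`. It is a sketch of the scaling behaviour, not a measurement of any real model.

```python
def kv_cache_bytes(num_tokens, num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Size of a key/value cache: keys plus values, for every layer and every token.

    All shape parameters are hypothetical defaults chosen for illustration.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


# The fixed recurrent state size quoted in the talk (40 megabytes).
FIXED_STATE_BYTES = 40 * 1000 * 1000


def tokens_until_cache_exceeds_state(state_bytes=FIXED_STATE_BYTES):
    """First token count at which the growing cache is larger than the fixed state."""
    per_token = kv_cache_bytes(1)
    return state_bytes // per_token + 1


print("bytes added per token:", kv_cache_bytes(1))
print("cache passes the fixed state after", tokens_until_cache_exceeds_state(), "tokens")
print("cache at 128,000 tokens:", kv_cache_bytes(128_000), "bytes")
print("fixed state at 128,000 tokens:", FIXED_STATE_BYTES, "bytes")
```

The cache grows linearly with every token while the recurrent state stays constant, which is why the question shifts from how much memory is needed to how much can be packed into a fixed budget.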
That is like millions of tokens, if you're talking about the mathematical maximum possibility.

[00:35:29] Eugene Cheah: It's just that I guess we were all more inefficient about it, so maybe we hit 100,000. And that's kind of the work we are doing, trying to push it and maximize it. And that's where the models will start differing, because they will choose to forget things, they will choose to remember things. And that's why I think there might be some element of RAG, but it may not be the same RAG.

[00:35:49] Eugene Cheah: It may be that the model learns things, and it's like, hmm, I can't remember that article. Let me do a database search. Just like us humans, when we can't remember the article in the company, we do a search on Notion.

[00:36:00] Dan Fu: I think something that would be really interesting is if you could have facts that are... so right now, the one intuition about language models is that all those parameters are around just to store random facts about the world.

[00:36:14] Dan Fu: And this intuition comes from the observation that if you take a really small language model, it can do things like talk to you, or it kind of has the style of conversation, it can learn that, but where it will usually fall over compared to a much larger one is that it'll just be a lot less factual about things that it knows or that it can do.

[00:36:32] Dan Fu: But that points to all those weights that we're spending, all that SGD that we're spending to train these models, just being used to store facts. And we have things like databases that are pretty good at storing facts.
So I think one thing that would be really interesting is if we could actually have some sort of outside data store that a language model can look at, that maybe has some sort of gradient descent in it. That would be quite interesting.

[00:36:58] Dan Fu: And then maybe you could edit it, delete facts, you know, change who's president, so that it doesn't get lost.

[00:37:04] Vibhu: Can we open up Q&A and hot takes for the audience? I have a hot take Q&A. Do these scale? When 405B state space model? RAG exists, no one does long context, who's throwing in 2 million token questions? Hot takes?

[00:37:24] Dan Fu: The "who's throwing in 2 million token questions" part, I think, is a really good question. So I actually was going to offer that as a hot take. I mean, my hot take was going to be that long context doesn't matter. I know I just gave a whole talk about it, but, you know, what's the point of doing research if you can't play both sides?

[00:37:40] Dan Fu: But I think for both of us, the reason that we first got into this was just from the first-principles question of: there's this quadratic thing. Clearly intelligence doesn't need to be quadratic. What is going on? Can we understand it better? Since then it's kind of turned into a race, which has been exciting to watch, like, how much context you can take in.

[00:38:03] Dan Fu: But I think it's right. Nobody is actually putting a two million context prompt into these models. And, you know, if they are, maybe we can go design a better model to do that particular thing. Yeah, what do you think about that? You've also been working on this. Do you think long context matters?

[00:38:19] Eugene Cheah: So I'm going to burn a bit. How many of you remember the news of Google Gemini supporting 3 million context, right?
Raise your hand.

[00:38:28] Vibhu: Yeah, 2 million.

[00:38:29] Eugene Cheah: Oh, it's 2 million.

[00:38:31] Eugene Cheah: Yeah, how many of you actually tried that? See?

[00:38:34] Vibhu: I use it a lot. You? You work for MindsDB. I use it a lot.

[00:38:41] Eugene Cheah: So, for some people that have used it, and I think this is where my opinion starts to differ, because I think the big labs may have a bigger role in this. Like, even for RWKV, even when we train long context, the reason why I say VRAM is a problem is that because we need to backprop against the states, we actually need to maintain the state in between the tokens, by the token length.

[00:39:05] Eugene Cheah: So that means we need to actually roll out the whole 1 million context if we are actually training on 1 million. Which is the same for transformers, actually, but it just means we don't magically reduce the VRAM consumption at training time. So that is one of the VRAM bottlenecks, and I'm neither OpenAI nor Google, so donate GPUs if you have too many of them.

[00:39:27] Eugene Cheah: But then, putting it back to another paradigm, I think o1-style reasoning might be actually pushing that direction downwards. In my opinion, and this is my partial hot take: let's say you have a super big model, and let's say you have a 70B model that may take double the tokens but gets the same result.

[00:39:51] Eugene Cheah: Strictly speaking, a 70B, and this is even for transformer or non-transformer, will take less resources than that 400B model, even if it did double the amount of thinking.
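The arithmetic behind that claim can be checked with a common rule of thumb: a dense model spends roughly 2 floating-point operations per parameter per generated token. That rule is an approximation brought in for this sketch (it ignores the cost of attending over the context, memory bandwidth, and batching), and the function name `decode_flops` and the 1,000-token baseline are hypothetical.

```python
def decode_flops(num_params, num_tokens):
    """Rule-of-thumb generation cost: about 2 FLOPs per parameter per token."""
    return 2 * num_params * num_tokens


BASELINE_TOKENS = 1_000  # hypothetical answer length for the large model

# A 400B model answering in N tokens versus a 70B model that "thinks" for 2N tokens.
big = decode_flops(400 * 10**9, BASELINE_TOKENS)
small = decode_flops(70 * 10**9, 2 * BASELINE_TOKENS)
ratio = small / big

print(f"400B model: {big:.2e} FLOPs")
print(f"70B model with double the tokens: {small:.2e} FLOPs")
print(f"small / big = {ratio:.2f}")
```

Under this approximation the smaller model still uses about a third of the compute despite generating twice as many tokens, and it would break even only at roughly 5.7 times the tokens (400 divided by 70).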
And if that's the case, and we are still all trying to figure this out, maybe the direction for us is really getting the sub-200B to be as fast and as efficient as possible.

[00:40:11] Eugene Cheah: With a very efficient architecture that some folks happen to be working on, to just reason it out over larger and larger context lengths.

[00:40:20] Question: Yeah. One thing I'm super interested in is models that can watch forever. Obviously you cannot train something on infinite context length. How are y'all thinking about that, where you run on a much longer context length than is possible to train on?

[00:40:38] Dan Fu: Yeah, it's a great question. So I think, and I think you guys probably had tweets along these lines too, when we first started doing these things, because these are all recurrent models, in theory you could just run it forever. And at the very least it won't, like, error out or crash.

[00:40:57] Dan Fu: There's another question of whether it can actually use what it's seen in that infinite context. And I think there, one place where the research on architectures probably ran faster than other research is actually the benchmarks for long context. So you turn it on forever. You want to do everything or watch everything.

[00:41:16] Dan Fu: What is it that you actually wanted to do? Can we actually build some benchmarks for that? Then measure what's happening. And then ask the question: can the models do it? Is there something else that they need? Yeah, I think that if I were to turn back the clock to 2022, that's probably one of the things I would have done differently, which would have been to actually get some long context benchmarks out at the same time as we started pushing context length on all these models.

[00:41:41] Eugene Cheah: I will also say the use case. So, I think we both agree that there's no infinite memory, and the model needs to be able to learn and decide.
I think what we have observed, and I think this also fits the state space model, is that one of the key advantages of this alternate attention mechanic that is not based on token position is that the model doesn't suddenly become crazy when you go past the 8K training context length, or a million context length.

[00:42:03] Eugene Cheah: It's actually still stable. It's still able to run, it's still able to rationalize. It just starts forgetting things. But some of these things are still there in latent memory. Some of these things are still somewhat there. That's the whole point of why reading twice works, things like that. And one of the biggest pushes in this direction is that I think both state space and RWKV have separate papers by other researchers where they use this architecture for time series data.

[00:42:26] Eugene Cheah: Weather modeling. So, you are not asking what the weather was five days ago. You're asking what the weather is tomorrow, based on the infinite length, as long as this Earth and the computer keep running. And they found that it is, like, better than existing transformer or other existing architectures at modeling this weather data.

[00:42:47] Eugene Cheah: Controlling for the param size and stuff. I'm quite sure there are people with larger models. So in this case, there are future applications if your question is just what's next and not what's 10 years ago.

[00:42:59] Dan Fu: Thanks so much for having us.

Get full access to Latent Space at www.latent.space/subscribe
Lalit Modi, the founder of the IPL (Indian Premier League), shares his in-depth recount and incredible details of how the IPL was created in this episode: how he orchestrated everything from the players, media deals, and team owners to sponsors, all within a few months.

Key Highlights
Recap of Episode #1
Putting an "A team" together to make the IPL a reality, bringing IMG into the group (on success basis), finding the best experts from around the world, etc
Learning from the best leagues in the world to create a new franchise model
First on the agenda, securing the top 100 players from around the world – challenges with scheduling of other cricket globally
Window from March-May was identified and locked in
Creating the auction process and selling the idea to the players – brackets for players created – minimum/maximum cap spend on players USD 4-5.5 mil by teams
"Controversy" a key component of his strategy – keep media and people talking and guessing (free PR)
"14 days" of playing cricket – players paid per day – USD 20k-100k per day and some as high as USD 200k – so over USD 2 mil for the whole season
Lots of resistance at first from media owners – Nimbus thought they should own it because of the deal with the BCCI, blacklisted Murdoch (Fox), Zee TV had already started ICL (competing competition)
Need to create interest and competition with other broadcasters and agencies (aiming for USD 1 billion over 10 years)
WSG (World Sports Group) bid USD 1b (both domestic and global rights) – partnered with SONY for year 1 in India
Mapped out advertising inventory during match time
Right to broadcast only, production done by the League, etc
Target USD 59 million in year one as minimum to pay teams (de-risk the purchase of team franchises) – min USD 50 mil per franchise was his target (paid over 10 yrs)
Key – an 8 pm IST start, not to compete with other Indian cricket afternoon slots – targeting Bollywood and entertainment budgets and appeal to women in the household (who controlled the TV remote control at that time slot in the Indian household in those days)
Next target – team owners and the right profile – Shah Rukh Khan became key, Mukesh & Anil Ambani, Vijay Mallya, etc
Bid for all cities allowed, highest bidder wins; if a bidder wins more than one city, the bidder can choose which city he wants to take
Bids ranged from USD 65-120 mil (for 10 yrs)
Open process – in front of media and cameras
Teams don't own stadiums – not allowed
Shah Rukh Khan – Nokia – sponsors his team and how that helped with upfront funding
Everything done between January and April, before the first ball was played
Principal sponsor deals for the League – USD 10 mil per year for title, USD 8 mil for associate – and stories behind it
No open media access to the matches – deliberate "controversy" strategy
At the time already secured – USD 100 mil cheque for media (year 1), USD 75 mil (10% of franchise fee) before it even started (not including sponsorship yet)
Bidding process for sponsorship rights (Hero Honda, DLF, etc) – DLF Indian Premier League
Associate sponsor deals – Citi, Vodafone, Kingfisher, etc
Opening ceremony and first game – Shah Rukh Khan vs Vijay Mallya
Sony struggling to get distribution across the country – not nationwide yet
On the day of the first game, Lalit opens the doors to all media to cover the game. Sony furious. 100% TV coverage across India, all over the news and press
Shah Rukh Khan's team wins – Bollywood stars celebrating on the field – huge success
Sony distribution went through the roof. IPL was an instant success from then on
Secret sauce – competitive high-profile owners, Shah Rukh Khan for entertainment and women audience, music, cheerleaders, etc
To be continued…….

About
Globally recognized leader, Executive Director of Modi Enterprises, with business interests across nearly every industry vertical. Built a billion-dollar brand in less than a year, launched a revolution in Indian broadcast entertainment, and negotiated partnerships with ESPN, Disney, Google, Ten Sports, B4U, Fashion TV, Voyages TV, Buena Vista Television, United Artists, Marvel, Nike and others. International business strategist who catalyzes transformation and change by capitalizing on unmet demand in domestic and overseas markets. Embraces technology and creates opportunities to enrich lives, create sustainable revenue, and optimize productivity through integration of emerging platforms in social and digital. An entrepreneur at heart with a phenomenal ability to connect people and ideas into revenue-generating enterprises, an active member of the board for one of the oldest and most successful industrial conglomerates in India, and the thought leader behind radical change in India's leading cricket administration, the BCCI. Recognized by Time, Sports Illustrated, Business Week, Forbes, Business Standard, and other influential global media outlets for the launch of the Indian Premier League (IPL), one of the world's most popular cricket organizations. Was awarded "Indian of the Century" by India Today; Forbes nominated Modi as "Rainmaker of the Century". Against extensive political and economic challenges, brought the IPL to South Africa, and sustained the same phenomenal levels of viewership and attendance despite the move. Columbia University and Stanford University both did case studies on Modi, which are still being taught in universities across the world. Created billion-dollar opportunities, and played an integral role in shaping the cultural direction of modern India as the visionary behind transformational changes in broadcast, sports, entertainment, and consumer products.
Follow us on our social sites for the latest updates
Instagram: https://www.instagram.com/sportsentrepreneurs/
Facebook: https://www.facebook.com/marcusluerpodcast
LinkedIn: https://www.linkedin.com/company/sports-entrepreneurs
Website: https://marcusluer.com
Podcast: https://marcusluer.com/podcast
To get in touch, please email us at podcast@marcusluer.com

Feel Good by MusicbyAden https://soundcloud.com/musicbyaden
Creative Commons, Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
Free Download / Stream: https://bit.ly/_feel-good
Music promoted by Audio Library https://youtu.be/bvgIqqRStcQ
Lalit Modi, the architect and founder of the IPL (Indian Premier League): we recorded four episodes to go very deep into his amazing stories, from his early days as a young entrepreneur making his mark, to his entry into cricket (board of BCCI), to the creation of the IPL and the controversies after. Lots of amazing stories, facts and anecdotes, many told for the first time.

Key Highlights
Lalit's early days, joining the family tobacco business (Godfrey Phillips India) – the three criteria for the family to support a business
JV with Disney in India – licensing, merchandise, etc (Michael Eisner his mentor) – looking to launch a sports channel with ESPN – movies, etc
Early days of DTH and cable operators in India – Star TV (Prime Sports, later Star Sports)
Placing Disney TV programs on Doordarshan (DD) – huge hit
Rupert Murdoch coming into the picture – Modi Entertainment Network – Zee TV story
Power of the database from Godfrey Phillips India to launch a sports network – IMG, Bill Sinrich, early days of sales of Indian cricket rights
Leveraging Indian cricket rights to launch an encrypted sports channel (India's first pay channel)
Lalit entering the cricket space – Mark Mascarenhas (Worldtel) – the key broker at the time
2002/3 – first time the concept of an Indian Cricket League comes up – 50-over game – Indian Cricket League (ICL)
2005 – Lalit becoming VP of BCCI – commercialization of Indian cricket taking off under his leadership
First, Lalit cancels the Sahara contract (paying only USD 100k per year, front of jersey) – pushed to USD 1 mil per day of cricket (105 days)
Nike new kit deal (Reebok small deal before) – Nike bids 52 mil per year during tender process
Terminating Murdoch's media contract story – new tender pushing fees up 10x – bids over USD 500 million – political pressure
Harish Thawani – Nimbus – USD 612-625 mil (4 years) – launched NEO Sports on the back of it
ICC T20 World Cup in South Africa, 2007 – re-ignites the idea of a T20 league
BCCI board gives Lalit a green light to explore the idea, since he had already made a few billion over the past 2 years
Indian players not very enthusiastic – hadn't played the format
Subhash Chandra – ZEE TV owner – copied the idea and launched ICL first – lots of issues
BCCI stops Indian national team players from joining – international players joined
IMG – Andrew Wildblood – enters the frame – Lalit hires them to put the concept together (Aug 2007)
Lalit knows the key is for India to do well in the T20 WC – offers a Porsche for six sixes to any Indian player to motivate them to win the T20 WC
Negotiating with international players in South Africa
Yuvraj Singh hits six sixes – India vs England – huge news, Indian fans catch the fever and India wins the WC
Open double-decker bus parade for players – pouring rain on that day – team arriving from South Africa – nine-hour tour – 4 million people on the streets
T20 was the new buzzword in India – story to be continued………
Credits
Hosts: Leticia Dáquer and Thiago Corrêa
Editing: Leticia Dáquer
Cover art: Leticia Dáquer
Recording date: 20/10/2024
Publication date: 24/10/2024

Music/audio:
The Oldest Song in the World
The Most Mysterious Song on the Internet - HQ Stereo Remastered

Things mentioned in the episode:
Stuff You Should Know episode on internet mysteries, including the unknown song
Bosch's butt music
The most mysterious song on the internet (Wikipedia)
But what is CRISPR-Cas9? An animated introduction to Gene Editing

Good
Leticia
World-first therapy using donor cells sends autoimmune diseases into remission (Nature, 04/10/2024)
Discovering Roman mosaics - A fabulous new find where history meets luxury in Antakya (World Archeology, 18/11/2020)
Thiago
Listen to the world's oldest song, 3,400 years old (Olhar Digital, 29/07/2024)
World's first ‘meltdown-proof' nuclear reactor aces safety test (New Atlas, 24/07/2024)
Brazilian artist swaps historical coin in British Museum for a fake (The Guardian, 22/07/2024)

Bad
Leticia
Politicians in a German city want to restrict kebab sales (Carta Capital, 12/08/2024)
Thiago
Music industry's 1990s hard drives, like all HDDs, are dying (12/09/2024)

Ugly
Leticia
State-of-the-Art Fire Station Leveled by Blaze (Newsweek, 18/10/2024)
Woman passes her driving test on her 960th go after spending £11,000 (The Mirror, 26/03/2023)
US ship commander demoted after shooting with his scope mounted backwards (UOL, 04/09/2024)
Engineer has kept 7,400 tabs open in Firefox for more than two years (Terra, 07/05/2024)
Two San Francisco nudists save man from being attacked in street by a "crazy kind of pirate guy" with a blowtorch (MSN, 07/2024)
Thiago
Germany will stop using floppy disks on warships (Tecnoblog, 07/2024)
MS-DOS and Windows 3.11 still run train dashboards at German railway — company listed admin job for 30-year-old operating system (Tom's Hardware, 29/01/2024)
Math student builds fusion reactor at home with help from Claude AI and $2,000 (Techspot, 03/09/2024)
Restaurant sues customer over $3,000 waitress tip he left on $13 meal (Unilad, 01/07/2024)
Maldives minister arrested for performing ‘black magic' on President Muizzu: Report (Hindustan Times, 28/06/2024)

Partnership with Veste Esquerda: Pistolando T-shirts are now available directly on the Veste Esquerda site! And the discount code PISTOLA10 gives you 10% off your purchase of ours and other very cool lefty T-shirts!
Partnership with Editora Boitempo: buy books through this link so we can earn a little commission :)
Our Amazon affiliate link, but only as a last resort, okay: bit.ly/Pistolando
Partnership with ICL: sign up for the courses through our link

This podcast is produced by Estopim Podcasts. Need help making your own podcast? Come on over and we'll give you a hand.

Pistolando links
www.pistolando.com
contato@pistolando.com
Twitter: @PistolandoPod
Instagram: @PistolandoPod
Support Pistolando on Catarse, on Patreon and now also on PicPay, or send us a Pix using the key contato@pistolando.com

Cover description:
Tired of fumbling for your glasses or wrestling with contacts? This episode is for you! Join Melissa and Selah as they dive into vision correction with Dr. Sheri Rowen, exploring ICL, RLE, and LASIK – the superstars of clear vision. Not sure what those acronyms mean? ICL (Implantable Collamer Lens) works like an internal contact lens (and is even removable!), while RLE (Refractive Lens Exchange) can drastically reduce your need for glasses. And of course, there's the OG, the popular LASIK procedure that reshapes your cornea for clear vision. But that's not all we cover! Learn why dry eye often affects contact lens wearers and how to protect your eyes from screen time with blue light glasses and the 20-20-20 rule: every 20 minutes, look 20 feet away and blink, blink, blink! Don't miss this eye-opening episode! Care Experts is a weekly podcast by CareCredit where we sit down with doctors and experts who give information, tips and insight into healthcare treatments and procedures. Check in every Wednesday for new episodes at carecredit.com/careexperts or subscribe on your favorite podcast app. CareCredit is a health, wellness and personal care credit card that has helped millions of people with promotional financing options and is accepted at hundreds of thousands of provider and retail locations nationwide. Learn more at carecredit.com.
The 365 Days of Astronomy, the daily podcast of the International Year of Astronomy 2009
Paul Hill and Dr. Jenifer “Dr. Dust” Millard host. Damien Phillips, John Wildridge and Dustin Ruoff produce.

Today we bring you two of the plenary sessions from the British Planetary Science Conference, 2024, hosted by Space Park Leicester and the National Space Centre on June 18-21, 2024.
- Dr. Aprajita Verma of the UK ELT Programme.
- Dr. Steven G. Banham, Research Fellow in planetary surface processes at the ICL.

The Space Park newsletter reports: Dr. Jenifer Millard, Managing Editor at Fifth Star Labs, added: “I attended BPSC2024 not as a planetary scientist, but as an astronomer and science communicator, hoping to be inspired and learn beyond my field of expertise. … I'm delighted to say I was not disappointed by the event Space Park Leicester enabled. It was a fantastic few days of learning in a wonderful, encouraging and most importantly safe environment.”

The conference was supported by the UK Space Agency, the Science and Technology Facilities Council (STFC), Europlanet Society and the Royal Astronomical Society. A gallery of event images can be found here: https://www.space-park.co.uk/galleries/bpsc2024/

www.awesomeastronomy.com
Bio: Awesome Astronomy explores the frontiers of science, space and our evolving understanding of the universe. Join Paul & Jeni for informative and fun astronomy programmes dedicated to space and astronomy news and monthly podcast extras covering hot topics and special interviews in the world of science and astronomy.

We've added a new way to donate to 365 Days of Astronomy to support editing, hosting, and production costs. Just visit: https://www.patreon.com/365DaysOfAstronomy and donate as much as you can! Share the podcast with your friends and send the Patreon link to them too! Every bit helps! Thank you!
------------------------------------
Do go visit http://www.redbubble.com/people/CosmoQuestX/shop for cool Astronomy Cast and CosmoQuest t-shirts, coffee mugs and other awesomeness! http://cosmoquest.org/Donate This show is made possible through your donations. Thank you! (Haven't donated? It's not too late! Just click!)
------------------------------------
The 365 Days of Astronomy Podcast is produced by the Planetary Science Institute. http://www.psi.edu Visit us on the web at 365DaysOfAstronomy.org or email us at info@365DaysOfAstronomy.org.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Extracting SAE task features for ICL, published by Dmitrii Kharlapenko on August 12, 2024 on The AI Alignment Forum.

TL;DR
We try to study task vectors in the SAE basis. This is challenging because there is no canonical way to convert an arbitrary vector in the residual stream to a linear combination of SAE features - you can't just pass an arbitrary vector through the encoder without going off distribution. We explored the algorithm of gradient pursuit suggested in Smith et al., but it didn't work for us without modifications.
Our approach is to apply the SAE encoder to the task vector, and then apply a gradient-based cleanup. This exploits the fact that task vectors have a differentiable objective. We find that this gives a sparser and cleaner reconstruction, which is also highly interpretable, and also serves as a better task vector due to directly optimizing for log likelihood. This takes us from ~100 active features to ~10.
Using our algorithm, we find two classes of SAE features involved in ICL. One of them recognizes the exact tasks or output formats from the examples, and another one encodes the tasks for execution by the model later on. We show that steering with these features has causal effects similar to task vectors.
This work was produced as part of the ML Alignment & Theory Scholars Program - Summer 24 Cohort, under mentorship from Neel Nanda and Arthur Conmy.

Prior work
Task or function vectors are internal representations of some task that LLMs form while processing an ICL prompt. They can be extracted from a model running on a few-shot prompt and then be used to make it complete the same task without having any prior context or task description.
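As a rough illustration of the two-step recipe in the TL;DR (encode the task vector with the SAE, then clean up the coefficients by gradient descent), here is a minimal NumPy sketch. Everything in it is an assumption made for illustration: the random ReLU SAE weights (`W_enc`, `W_dec`), the linear readout `U` standing in for the language model, the single `target` token, and the L1 penalty, step size and step count. It is not the authors' implementation, which optimizes the real model's log likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features, n_classes = 16, 64, 10

# Hypothetical stand-ins: a plain ReLU SAE and a linear "model head" that
# maps a steering vector to next-token logits.
W_enc = rng.normal(size=(n_features, d_model)) / np.sqrt(d_model)
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(d_model, n_features)) / np.sqrt(n_features)
U = rng.normal(size=(n_classes, d_model))
target = 3                                  # hypothetical correct token id
task_vector = rng.normal(size=d_model)      # hypothetical task vector


def nll_and_grad(w, l1_coeff):
    """Return the NLL and the gradient of (NLL + L1 penalty) w.r.t. w."""
    acts = np.maximum(w, 0.0)               # sparse feature activations
    logits = U @ (W_dec @ acts)             # decode, then read out logits
    shifted = logits - logits.max()         # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum())
    d_logits = np.exp(log_probs)
    d_logits[target] -= 1.0                 # softmax minus one-hot target
    d_acts = W_dec.T @ (U.T @ d_logits) + l1_coeff
    return -log_probs[target], d_acts * (w > 0)  # ReLU passes grad only if w > 0


def cleanup(vector, steps=200, lr=0.05, l1_coeff=0.05):
    """Initialise weights from the SAE encoder, then refine by gradient descent."""
    w = W_enc @ vector + b_enc
    for _ in range(steps):
        _, grad = nll_and_grad(w, l1_coeff)
        w = w - lr * grad
    return np.maximum(w, 0.0)


initial_acts = np.maximum(W_enc @ task_vector + b_enc, 0.0)
cleaned_acts = cleanup(task_vector)
print("active before:", np.count_nonzero(initial_acts))
print("active after: ", np.count_nonzero(cleaned_acts))
```

In this toy version a feature whose weight falls to zero or below receives no gradient through the ReLU, so the active set can only shrink. That mirrors, in spirit only, the sparsification the TL;DR reports.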
Several papers ("Function vectors in large language models", "In-Context Learning Creates Task Vectors") have proposed different ways to extract those task vectors. They all center around feeding ICL examples to a model in the form of "input output, … " and averaging the residuals on the "separator" token over a batch. This approach can reconstruct some part of the ICL performance but does not admit a straightforward conversion to the SAE basis.
ITO (inference-time optimization) with gradient pursuit can be used to do a sparse coding of a residual vector using SAE features. The post suggests using this algorithm for steering vector SAE decomposition. Since task vectors can be thought of as steering vectors, ITO may provide some insight into the ways they operate.

Initial Phi-3 experiments

Direct SAE task vector reconstruction
In our study we trained a set of gated SAEs for Phi-3 Mini 3.8B using a model-generated synthetic instruction dataset.
While offering a sparse dictionary decomposition of residuals, SAEs tend to introduce a reconstruction error that impacts the performance of the model. They are also not guaranteed to decompose out-of-distribution vectors, and task vectors, being a product of averaging activations across prompts and tokens, may be such vectors.
Thus, we first studied the performance of SAE reconstructions of task vectors in transferring the definition of two tasks: 1) antonym generation and 2) English to Spanish word translation. These and other tasks used to study task vectors were taken from the ICL task vectors paper GitHub repository.
These charts show the NLL loss of the model on the evaluation set of zero-shot prompts for both of the tasks depending on the layer of extraction/insertion.
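The extraction recipe described above (average the residuals at the separator token over a batch) and the two ways of combining it with an SAE can be sketched in a few lines of NumPy. This is only a sketch under stated assumptions: the random `residuals`, the one-separator-per-prompt `separator_pos`, and the plain ReLU SAE weights are all hypothetical stand-ins, not the gated SAEs or Phi-3 activations used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, d_model, n_features = 8, 12, 16, 64

# Hypothetical residual-stream activations at one layer for a batch of
# few-shot prompts, and the index of the separator token in each prompt.
residuals = rng.normal(size=(batch, seq_len, d_model))
separator_pos = rng.integers(0, seq_len, size=batch)

# Hypothetical plain ReLU SAE (the study used gated SAEs instead).
W_enc = rng.normal(size=(n_features, d_model)) / np.sqrt(d_model)
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(d_model, n_features)) / np.sqrt(n_features)
b_dec = np.zeros(d_model)


def sae_reconstruct(x):
    """Encode x into sparse features and decode back to the residual basis."""
    acts = np.maximum((x - b_dec) @ W_enc.T + b_enc, 0.0)
    return acts @ W_dec.T + b_dec


# Pick the residual at the separator token of every prompt: (batch, d_model).
sep_residuals = residuals[np.arange(batch), separator_pos]

# Task vector: the batch average of those separator-token residuals.
task_vector = sep_residuals.mean(axis=0)

# Ordering 1: reconstruct the already-averaged task vector.
recon_of_tv = sae_reconstruct(task_vector)

# Ordering 2: reconstruct each residual first, then average.
tv_on_recon = sae_reconstruct(sep_residuals).mean(axis=0)

print(np.linalg.norm(task_vector - recon_of_tv))
print(np.linalg.norm(task_vector - tv_on_recon))
```

Because the SAE encoder is nonlinear, the two orderings generally give different vectors, which is why this section compares them separately.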
TV stands for the original task vector performance; Recon of TV stands for using the SAE reconstruction of the task vector instead of the task vector; TV on recon stands for first doing an SAE reconstruction of the residuals and then collecting a task vector on them; ITO stands for the ITO algorithm with a target L0 of 40. It can be seen from the charts that SAE reconstruction significantly decrea...