Podcasts about dpo

  • 303PODCASTS
  • 757EPISODES
  • 36mAVG DURATION
  • 5WEEKLY NEW EPISODES
  • May 28, 2025LATEST

POPULARITY

20172018201920202021202220232024


Best podcasts about dpo

Latest podcast episodes about dpo

Careers in Data Privacy
Krete Paal: CEO at GDPR Register

Careers in Data Privacy

Play Episode Listen Later May 28, 2025 29:21


Krete has spent her career protecting privacy information,She is CEO of a company that manages privacy documentation.Krete was the DPO for the Estonian Border Guard,We will chat about her career, both the easy and the hard!

Serious Privacy
Retro Week in Privacy - So much to Cover!

Serious Privacy

Play Episode Listen Later May 22, 2025 32:54


Send us a textOn this week of Serious Privacy, Paul Breitbarth, and Dr. K Royal (Ralph O'Brien was traveling), we cover a wild wrap up of privacy activities, including Tom Kemp as the newly appointed head of the California Consumer Privacy Protection Agency, and a wide sweep of enforcement actions including Roku, Honda Motor Company, National Public Data, Tom Snyder, plus class actions against Insomnia and Pill Pack, and a reprimand sent to Deep Seek, IAPP's state privacy law tracker update, California is seeking public feedback on proposed regulations for the delete request and opt-out platform - the DROP system, CNIL's guidance on monitoring self-checkouts, and Meta's request for a court to invalidate the EDPB guidance (can't do it, it's not a law) and Belgium's new law plus quite a bit more. We are packed with news.Please subscribe in your favorite podcast app - sharing is caring!  Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Serious Privacy
Indescribably Intuitive: Indigenous Privacy

Serious Privacy

Play Episode Listen Later May 14, 2025 36:46


Send us a textOn this episode of @SeriousPrivacy, hosts Paul Breitbarth and Dr. K Royal (Ralph wasn'  able to join us in DC) catch up with Tahu Kukutai, Professor, The University of Waikato; Jade Makory, CIPP/E, CIPM, CIPT, FIP, Legal and Advocacy Director, Data Analytics Kenya, and Privacy Expert, PwC (on Sabbatical); and Shana Morgan, AIGP, CIPP/E, CIPM, FIP, Global Head of AI / Privacy, L3Harris Technologies - just after the first IAPP panel on indigenous privacy at GPS25 (moderated by Shoshana Rosenberg). Fabulous and enlightening.   Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Capital
Capital Intereconomía 11:00 a 12:00 12/05/2025

Capital

Play Episode Listen Later May 12, 2025 6:47


En Empresas con Identidad conocemos a Mario Eguiluz, cofundador de Deblock. Además hablamos sobre Bitcoin y Ethereum con Mario Eguiluz, cofundador de Deblock y ponemos el foco en la Inteligencia Artificial y la ciberseguridad con Daniel Garcia, Managing Director de ISMS Forum (Asociación Española para el Fomento de la Seguridad de la Información); Carlos A. Saiz, Vice Chairman, ISMS Forum; Director, Data Privacy Institute; Partner and Head of Privacy, Risk & Compliance, Ecix Group y con Francisco Lázaro, CISO y DPO en Renfe, y Co-director del Gurpo de Inteligencia Artificial de ISMS Forum.

Masters of Privacy
Daniel Barber (DataGrail): Privacy Tech spotlight II - widespread non-compliance, opt-out challenges, and shadow AI

Masters of Privacy

Play Episode Listen Later May 11, 2025 35:55


Is it possible that a whole generation of consent-management solutions built for the EU-driven opt-in world are unsuitable for the opt-out scenario predominant in the US? How are DPOs and AI Governance professionals to deal with “shadow AI” and “shadow IT”?  Daniel Barber is DataGrail's CEO and co-founder. Prior to DataGrail Daniel led revenue teams at DocuSign, Datanyze (acquired by ZoomInfo), ToutApp (acquired by Marketo) and Responsys (acquired by Oracle). He also advises several high-growth startups. References: Daniel Barber on LinkedIn Unveiling DataGrail's 2024 Data Privacy Trends Report: The Time Data Subject Requests Surged 246% in Two Years DataGrail Privacy Inspector (Chrome Web Store) Max Anderson (Ketch): Privacy Tech spotlight I – the future of CMPs, value vs. hype in privacy compliance SaaS (Masters of Privacy, April 2025)

Serious Privacy
A week in privacy with Ralph and K

Serious Privacy

Play Episode Listen Later May 8, 2025 23:15


Send us a textOn this week of Serious Privacy,  Ralph O'Brien of Reinbo Consulting, and Dr. K Royal connect to cover a week in privacy as Paul Breitbarth is away. This weeks shorter episode includes a guide to what's coming up from Serious Privacy at IAPP summit in DC, a penalty from the UK ICO, EDPB draft Guidance on blockchain, state laws, enforcement actions, and more!Please subscribe in your favorite podcast app - sharing is caring! Some resourceshttps://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2025/04/law-firm-fined-60-000-following-cyber-attack/https://www.edpb.europa.eu/news/news/2025/edpb-adopts-guidelines-processing-personal-data-through-blockchains-and-ready_enhttps://iapp.org/resources/article/us-state-privacy-legislation-tracker/#state-privacy-law-chart Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Serious Privacy
A week in privacy plus Summit Day 2 recap

Serious Privacy

Play Episode Listen Later May 3, 2025 41:34


Send us a textOn this week of Serious Privacy, Paul Breitbarth, Ralph O'Brien of Reinbo Consulting, and Dr. K Royal have a short week in privacy (a lot about #Meta and children using #AI) along with some updates on the IAPP #GPS25 where we learned that it was all about the people, such as our friends at TrustArc, Ben Siegal, Dan Solove, and so many others. Please subscribe in your favorite podcast app - sharing is caring!  Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Serious Privacy
The 30 Year Plan: Live from IAPP GPS25

Serious Privacy

Play Episode Listen Later Apr 24, 2025 32:19


Send us a textWe are a little late this week, but with good reason: Paul Breitbarth and Dr. K Royal were attending the IAPP Global Privacy Summit in Washington D.C. and bring you their report from the Opening General Session of the conference. The speakers during this session were professor Lawrence Lessig, Hans Peter Brøndmo and Catie Cuan, each reflecting on 25 years of IAPP and what is next for us privacy professionals. Apparently: it's robots! Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

InfosecTrain
Data Privacy Officer Interview Questions & Concepts

InfosecTrain

Play Episode Listen Later Apr 17, 2025 21:20


InfosecTrain's "Data Privacy Officer Interview Questions" provides a comprehensive overview of the data privacy landscape and the critical role of the Data Privacy Officer (DPO). The article features a curated list of interview questions designed to evaluate a candidate's understanding of data protection laws, privacy principles, and their ability to manage an organization's data responsibly. It explains key concepts such as data privacy, data minimization, and Privacy by Design, while also outlining the responsibilities of a DPO and the steps involved in managing data breaches and conducting Privacy Impact Assessments. Furthermore, the resource highlights common data privacy regulations and discusses future challenges in data privacy, emphasizing the importance of staying updated with evolving laws and the impact of AI technologies. Finally, InfosecTrain promotes its training courses for individuals looking to become DPOs or enhance their data privacy knowledge.

Serious Privacy
Not your Grandma's Cookies (with Darren Abernethy)

Serious Privacy

Play Episode Listen Later Apr 16, 2025 41:52


Send us a textOn this week of Serious Privacy, Paul Breitbarth, Ralph O'Brien of Reinbo Consulting, and Dr. K Royal connect with Darren Abernethy of Greenberg Traurig to discuss all things #cookies and #trackers.Please subscribe in your favorite podcast app - sharing is caring!  Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Serious Privacy
A Challenging Week in Privacy

Serious Privacy

Play Episode Listen Later Apr 10, 2025 39:45


Send us a textOn this week of Serious Privacy, Paul Breitbarth, Ralph O'Brien, and Dr. K Royal make it personal. They talk about their own mental health in light of busy workloads and global developments, whether privacy related or not. And they talk about the 23andme bankruptcy and what that means for personal data, as well as a fine issued by the Jersey Data Protection Authority..Please subscribe in your favorite podcast app - sharing is caring!  Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Masters of Privacy (ES)
Alejandro Prieto: privacidad anticompetitiva, DNIs explosivos y difamación artificial

Masters of Privacy (ES)

Play Episode Listen Later Apr 7, 2025 29:16


Alejandro Prieto es director del Departamento de Protección de Datos y Regulación Digital de Autocontrol. Anteriormente ha sido DPO del Grupo Alsea (VIPS, Starbucks, Foster's Hollywood, Domino's Pizza) y de Bluetab Solutions (Grupo IBM), así como consultor de seguridad de la información. Alejandro participa como docente en diversas instituciones, incluyendo la Universidad Carlos III de Madrid. Con nuestro invitado hemos abordado la nueva categorización de datos del DNI como especialmente sensibles, la denuncia de NOYB a OpenAI por atribuir a un ciudadano noruego el asesinato de sus hijos y la multa a Apple de 150 millones de euros en Francia por prácticas anticompetitivas.  Referencias: Alejandro Prieto en LinkedIn Newsletter de Alejandro Prieto Alejandro Prieto en Bluesky Multa de 150 millones de euros a Apple por usar “App Tracking Transparency” para imponer cortapisas en un mercado publicitario en el que participa sin el mismo nivel de consentimiento Sergio Maldonado: Apple vs. Privacy (2021, EN - Medium) La AEPD estima que el DNI contiene datos especialmente sensibles e impone sanción de 100.000 euros por solicitar su fotocopia por email (sanción) La AEPD sanciona por fotografiar el DNI en la entrega de paquetes (finReg360) OpenAI demandada en Noruega por difamar a un ciudadano atribuyéndole el asesinato de sus hijos Jorge García Herrero: ¿Contienen datos personales los LLM? ¿Cómo aplicamos el RGPD a los sistemas de IA generativa? (Masters of Privacy) Alejandro Prieto en Masters of Privacy (enero 2024): códigos de conducta en el mercado publicitario y la nueva dimensión del consentimiento APEP: Publicidad digital respetando la privacidad (curso).  

Masters of Privacy
Andy Dale: DPO vs. CPO, present and future value of Privacy Tech, and the new US administration's impact on the regulatory landscape

Masters of Privacy

Play Episode Listen Later Apr 6, 2025 27:09


Today we are taking a look at the difference between DPO and CPO roles in the US, the present and future impact of Privacy Tech in the management of privacy programs, the evolution of privacy regulation under the new US administration, and a potential Schrems III scenario.  Andy Dale serves as General Counsel and Chief Privacy Officer at OpenAP and holds the position of Executive Board Member at The L Suite (TechGC). With extensive experience as an advisor to various companies, Andy previously worked as General Counsel and Chief Privacy Officer at Alyce, a company acquired by Sendoso in 2024, and as General Counsel and VP of Global Data Privacy at SessionM, which was acquired by Mastercard in 2019. Andy Dale earned a JD in Law from the University of Baltimore School of Law (2003-2006) and a degree from Colgate University (1996-2000). References:  Andy Dale on LinkedIn The Data Protection Breakfast Club podcast on Spotify Brian Focht: Can the American Privacy Rights Act find a path to survival? (Masters of Privacy) Amy Worley on the American Privacy Rights Act (Masters of Privacy) Molly Martinson on state-level comprehensive privacy laws (Masters of Privacy)

Serious Privacy
F to the T to the C - a slow but substantive week in privacy

Serious Privacy

Play Episode Listen Later Apr 4, 2025 31:50 Transcription Available


Send us a textOn this week of Serious Privacy, Paul Breitbarth , Ralph O'Brien of Reinbo Consulting, and Dr. K Royal talk about the controversy with executive changes to the U.S. Federal Trade Commission #FTC, the UK #adequacy extension, and the Norwegian decision about Data Protection Officer #DPO conflicts of interest.Please subscribe in your favorite podcast app - sharing is caring!  Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Grumpy GDPR
Conflicted (w/special guest)

Grumpy GDPR

Play Episode Listen Later Apr 3, 2025 47:22


Today, the Grumpy GDPR show hosts a special guest – and fellow podcaster! With years at the Dutch DPA, then real-world experience in the trenches of data protection, also teaching compliance at Maastricht University and a member of the Jersey DPA – who better to dive into the latest dynamite DPO decision with than Paul Breitbarth himself!

Masters of Privacy
Tim Turner: UK news spotlight - advertising, reforms, AI

Masters of Privacy

Play Episode Listen Later Mar 30, 2025 30:22


Where is the UK data protection reform headed? How are we to deal with behavioural advertising in the context of sports betting and gambling? Will the UK stay clear of regulating or supervising AI à la EU?  Tim Turner has worked on Data Protection, Freedom of Information (FOI) and Information Rights law since 2001. He started at the Information Commissioner's Office as a Policy Manager on FOI issues. After that, he was a Data Protection & FOI Officer for two councils and then an Information Governance Manager for an NHS (National Health Service) organisation. He has been offering data protection training and consultancy since 2011. Also, Tim is the author of the very popular DPO Daily newsletter and LinkedIn feed.  References: Tim Turner on LinkedIn 2040 Training The DPO Daily on LinkedIn ICO: Action taken against Sky Betting and Gaming for using cookies without consent UK betting giants under fire for ads targeting at-risk gamblers (The Guardian) UK Data Reform: What's Proposed (Bird & Bird) Stephen Almond (ICO): data protection laws as a primary tool for AI governance (Masters of Privacy)  

Serious Privacy
A Month in Privacy

Serious Privacy

Play Episode Listen Later Mar 27, 2025 37:04 Transcription Available


Send us a textOn this week of Serious Privacy, Paul Breitbarth, Ralph O'Brien of Reinbo Consulting, and Dr. K Royal cover a month in privacy. This includes UK adequacy, the March meeting of the European Data Protection Board where they released a statement on the implementation of the PNR directive, we talk about BCRS and the number of companies who have adopted BCRs and BSPRs, and the UK list of BCRs, court cases, we talk about the future of the GDPR and lots of data protection consultation, and that is just the European part of it.Please subscribe in your favorite podcast app - sharing is caring!  Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Altalex News
Altalex Settimanale n. 12/2025: le notizie dal 24 al 28 marzo

Altalex News

Play Episode Listen Later Mar 27, 2025 10:57


Passi avanti su E-evidence, Convenzione UE a tutela dell'avvocato, le tariffe della mediazione, agevolazioni prima casa per residenti all'estero, separazione personale e accesso agli atti, il DPO aziendale>> Leggi anche l'articolo: https://tinyurl.com/8vhckfwn>> Scopri tutti i podcast di Altalex: https://bit.ly/2NpEc3w

Al Ahly Pharos
Pre-Trading Thoughts

Al Ahly Pharos

Play Episode Listen Later Mar 26, 2025 4:27


*Key news articles for today*The International Finance Corporation (IFC) unveiled the list of 11 Egyptian airports slated for development through public-private partnerships (PPP).China's Wu'an Xin Feng signed the contract for its USD1.7 bn integrated metal industries complex in the Ain Sokhna industrial zone with the government yesterday. The Egyptian Electric Utility and Consumer Protection Regulatory Agency (Egyptera) has temporarily halted the acceptance of requests to connect solar power plants to the grid until a regulatory framework for net metering and solar self-consumption is finalized.The House of Representatives approved a grant from the Japanese government worth JPY500 million (USD3.3 million) to the Ministry of Agriculture.Local fertilizer player Agro TH plans to establish its second factory in Sadat City in partnership with an Omani investor with an estimated investment of EGP5 bn.Local home appliances manufacturer Royal Home plans to list 20% of its shares on the EGX within the next 18 months.Instapay transfer fees come into effect on Tuesday, 1 April. Transfers will be subject to a 0.1% fee, ranging from EGP0.50 and EGP20 per transaction.ORAS 4Q24 consolidated net income attributable to shareholders decreased 29.2% YoY to USD31.0 million and adjusted net income attributable to shareholders increased 0.4% YoY to USD117.3 million in FY24. ORAS is currently trading at FY25e PE of 4.9x.JUFO reported 4Q24 attributable net profit of EGP298 million (+108.4% YoY, -69.0% QoQ). FY24 attributable net profit recorded EGP2,735 million (+167.9% YoY), missing our estimates of EGP2,969 million. JUFO is currently trading at FY25f P/E of 8.7x.TMGH and Alameda Healthcare signed a partnership agreement to build a 200-bed hospital in TMGH's Madinaty, with investments of EGP5.0 billion.HELI expects its revenue to reach EGP300.0 billion in ten years.EGCH obtained an EGP10 billion credit facility, from a consortium of 6 banks, to finance the company's nitric acid and ammonium nitrate project. The facility is distributed USD93 million in foreign currency and EGP5 billion in local currency.CIEB Annual General Meeting (AGM) has approved the distribution of cash dividends amounting to EGP3.2 per share, reflecting a payout ratio of 50%. The distribution date is set for April 17, and the record date is April 14.COMI has announced cash dividends of EGP2.50 per share. The distribution date is set for April 10, with a record date of April 7. BTFH announced that the coverage rate for the first phase of the company's capital increase subscription reached 92.11% amounting to 4.95 billion shares. ValU plans to issue securitized bonds worth EGP1 billion during the second quarter of this year.ETEL's assembly approved distributing cash dividends of EGP1.50/share, implying a DPO of 25.4% and a DY of 4.1%.Hilton will set up Africa's first Signia Hotel and Signia Residence in West Cairo's Skywalk. The 200-key hotel will include build on 5k sqm space.The Holding Company for Tourism and Hotels is working on adding 3k keys to its portfolio within the coming three years. 

Serious Privacy
Whatever it Takes (with Rowenna Fielding)

Serious Privacy

Play Episode Listen Later Mar 19, 2025 42:19


Send us a textOn this week of Serious Privacy, Paul Breitbarth, Ralph O'Brien of Reinbo Consulting, and Dr. K Royal connect with MissIGGeek herself, Rowenna Fielding on all things ethics.Please subscribe in your favorite podcast app - sharing is caring!  Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Serious Privacy
Out of the AI Frying Pan (with Joanne Furtsch)

Serious Privacy

Play Episode Listen Later Mar 13, 2025 34:21


Send us a textOn this week of Serious Privacy, Paul Breitbarth, Ralph O'Brien of Reinbo Consulting, and Dr. K Royal join Joanne Furtsch, VP extraordinaire of TrustArc to discuss all things #AI. Tune in to learn about the practical and innovative aspects of AI and its privacy and data protection implications.Please subscribe in your favorite podcast app - sharing is caring!  Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Food School: Smarter Stronger Leaner.
Instant Clarity in Chaos: The Navy SEAL Mental Tool for Success in Uncertainty

Food School: Smarter Stronger Leaner.

Play Episode Listen Later Mar 10, 2025 18:22


TUNE IN TO LEARN:  What if you could transform paralyzing uncertainty into decisive action?In this eye-opening exploration of mental focus, you'll learn how Navy SEALs navigate chaos and make critical decisions when stakes are highest.    Enter the DPO method (Duration-Pathway-Outcome), a neuroscience-backed approach that elite military operators use to create certainty in the most uncertain environments. By deliberately shifting your temporal horizon, you generate the confidence and motivation to move forward rather than freeze.  This uniquely human ability to shift focus across time—zooming in on immediate details, then zooming out to connect actions to larger goals—is what separates those who thrive from those merely surviving in today's accelerating change.  As unpredictability becomes our new normal, mastering this mental skill may be the most critical factor in your future success.  Ready to transform how you navigate uncertainty?  Try this Navy SEAL technique today and watch how this creates momentum toward your larger objectives.  Text Me Your Thoughts and IdeasSupport the show Brought to you by Angela Shurina EXECUTIVE HEALTH AND OPTIMAL PERFORMANCE COACH Change in days - not in years!

Kite Consulting
Dairy Producer Organisations Explained

Kite Consulting

Play Episode Listen Later Mar 7, 2025 49:24


This week, our hosts Will and Ben focus on Dairy Producer Organisations (DPOs) and their vital role in the dairy industry. With Ian Harvey, farmer, Director at Davidstow Creamery as well as a member of the NFU Dairy Board and former member of AHDB Dairy Sector Council, along with our own dairy market analyst, Chris Walkland. Starting the discussion prompted by a recent article from Chris in British Dairying Magazine, they discuss the role of DPO's and the importance of collective representation for farmers. They discuss the limitations of DPOs, particularly highlighted in Chris' article after 2 significant processors who do have DPO's. have just given notice to a number of farmers. They also discuss the difference between DPOs and farmer boards, and the significance of FDOM (Fair Dealings Obligations Milk) over the next few months.Please note: The information provided during this podcast has been prepared for general informational purposes only and does not constitute advice. The information must not be relied upon for any purpose and no representation or warranty is given as to its accuracy, completeness or otherwise. Any reference to other organisations, businesses or products during the podcast are not endorsements or recommendations of Dairy Consulting Ltd or its affiliated companies. The views of the presenter are personal and may not be the views of Dairy Consulting Ltd. The contents of this podcast are the copyright of Dairy Consulting Ltd.

She Said Privacy/He Said Security
ISACA 2025 State of Privacy Survey Findings

She Said Privacy/He Said Security

Play Episode Listen Later Mar 6, 2025 34:47


Niel Harper is a Certified Director and ISACA Board Vice Chair. He is also the Chief Information Security Officer and Data Protection Officer at Doodle. Niel is based in Germany. He has more than 20 years of experience in IT risk management, cybersecurity, privacy, Internet governance and policy, and digital transformation. Safia Kazi is the Privacy Professional Practices Principal at ISACA. She has worked at ISACA for just over a decade, initially working on ISACA's periodicals and now serving as the Privacy Professional Practices Principal. She is based in Chicago. In 2021, she was a recipient of the AM&P Network's Emerging Leader award, which recognizes innovative association publishing professionals under the age of 35. In this episode… ISACA's State of Privacy 2025 survey reveals that privacy professionals are facing significant hurdles, including staffing shortages, budget cuts, and increasing demands for technical privacy expertise. Many organizations are shifting privacy responsibilities to legal and security teams, without additional resources or training. At the same time, AI adoption is increasing, introducing new complexities and risks. With privacy budgets under strain and teams expected to do more with less, how can businesses sustain effective privacy programs while navigating new challenges? According to ISACA's State of Privacy 2025 survey, one of the most pressing concerns for privacy teams is the growing demand for technical privacy expertise. Privacy by design also remains a challenge, with limited resources making it difficult for teams to embed privacy into product development from the outset. AI also plays a growing role in privacy operations, helping automate processes while raising concerns about data security, bias, and third-party risks. Despite these findings from ISACA's survey, businesses can make privacy sustainable by fostering a culture of privacy awareness from the top down, ensuring leadership understands the value of privacy beyond compliance. In this episode of She Said Privacy/He Said Security, Jodi and Justin Daniels speak with Niel Harper, Certified Director and Board Vice Chair at ISACA and CISO and DPO at Doodle, and Safia Kazi, Privacy Professional Practices Principal at ISACA, about the findings from ISACA's State of Privacy 2025 survey. Safia explains how privacy professionals can adapt to changes by continuously learning and staying informed on emerging risks, while Niel highlights the need for board-level privacy advocacy. They also explore how organizations are adapting to staffing shortages and budget constraints, the impact of AI on privacy operations, and how organizations can effectively navigate emerging risks.

Serious Privacy
The Volatile Side of Privacy & Data Protection (with Joe Jones)

Serious Privacy

Play Episode Listen Later Mar 5, 2025 30:08


Send us a textOn this week of Serious Privacy, Paul Breitbarth, Ralph O'Brien of Reinbo Consulting, were off (busy with professional lives), so Dr. K Royal connected with Joe Jones of the IAPP, Director of Research and Insights. They discuss how Joe got started in privacy - building from international trade as a lawyer and then BAM! Out came the GDPR, Joe then migrated to the #UK government, and quickly into the #IAPP, moving to the US about two years ago.Please subscribe in your favorite podcast app - sharing is caring!  Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Serious Privacy
Catch (up) of the Week: on notices, children and doping

Serious Privacy

Play Episode Listen Later Feb 26, 2025 41:25


Send us a textOn this week of Serious Privacy, Paul Breitbarth, Ralph O'Brien of Reinbo Consulting, and Dr. K Royal, catch up on data protection and privacy developments from around the globe. Up for discussion this week:The repeal of the proposal for an ePrivacy Regulation and AI Liability Directive (link)The EDPB guidelines on age assurance and recommendations to the World Anti Doping Agency (link)The ICO Direct Marketing Advice generator (link)Utah Age Verification (link)Danish Petitition to buy California (link) Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Serious Privacy
We require order! Order, we say (a week in privacy)

Serious Privacy

Play Episode Listen Later Feb 19, 2025 40:57 Transcription Available


Send us a textOn this week of Serious Privacy, Paul Breitbarth of Catawiki, Ralph O'Brien of Reinbo Consulting, and Dr. K Royal launch the first week in privacy for 2025. Topics include the EDPS election, AI, and US state laws and enforcement - among others!  Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley.

Serious Privacy
Constitutionally Unconstitional or is it? (Prof. Wayne Unger)

Serious Privacy

Play Episode Listen Later Feb 12, 2025 35:41


Send us a textOn this week of Serious Privacy, Paul Breitbarth of Catawiki, Ralph O'Brien of Reinbo Consulting, and Dr. K Royal interview with Wayne Unger, a #constitutional scholar and Assistant Professor of Law, Quinnipiac University. They delve into recent #U.S. #political events under the new administration. The discussion covers the legality of #executiveorders, implications for #privacylaws, #birthright #citizenship, #TikTok, and concerns about retaliation for political speech. Insights are shared on the broader impact of these issues both domestically and internationally.00:00 Breaking News: Italian Regulator Blocks DeepSeek02:41 Introduction to This Week's Guest: Wayne Unger02:49 Constitutional Challenges in the U.S.05:04 TikTok Ban: Legal and Political Implications14:51 Trump's Executive Orders and Constitutional Questions18:16 Privacy and Civil Liberties Oversight Board Controversy23:30 Global Reactions and Personal Reflections32:51 Concluding Thoughts and Farewell Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley#heartofprivacy #europaulb #igrobrien #seriousprivacy #privacy #dataprotection #cybersecuritylaw #CPO #DPO #CISO

Serious Privacy
The first week in privacy - hot news now

Serious Privacy

Play Episode Listen Later Feb 6, 2025 34:20 Transcription Available


Send us a textOn this week of Serious Privacy, Paul Breitbarth of Catawiki, Ralph O'Brien of Reinbo Consulting, and Dr. K Royal launch the first week in privacy for 2025. Topics include State laws in the US entering into effect (link to White & Case article, but bonus for 10 areas for US-based privacy programs to focus in 2025 from Hintze Law) to a TikTok ban that was there and then it wasn't. European Data Protection Board opinions. Court of Justice of the EU. Regulatory issues in Kenya. so much more. and did we even talk about Deepseek? Remember to like and subscribe! Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. with the voiceover by Tim Foley#heartofprivacy #europaulb #igrobrien #seriousprivacy #privacy #dataprotection #cybersecuritylaw #CPO #DPO #CISO

Careers in Data Privacy
Damian Bielecki: Deputy Data Protection Officer at XTB

Careers in Data Privacy

Play Episode Listen Later Feb 6, 2025 32:16


For Damian, GDPR is how he got started, We chat about the career path he's charted. Since law school, Damian has worked in data protection, As a DPO, he finds the legal bases for data collection!

Deep Papers
Multiagent Finetuning: A Conversation with Researcher Yilun Du

Deep Papers

Play Episode Listen Later Feb 4, 2025 30:03


We talk to Google DeepMind Senior Research Scientist (and incoming Assistant Professor at Harvard), Yilun Du, about his latest paper "Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains." This paper introduces a multiagent finetuning framework that enhances the performance and diversity of language models by employing a society of agents with distinct roles, improving feedback mechanisms and overall output quality.The method enables autonomous self-improvement through iterative finetuning, achieving significant performance gains across various reasoning tasks. It's versatile, applicable to both open-source and proprietary LLMs, and can integrate with human-feedback-based methods like RLHF or DPO, paving the way for future advancements in language model development.Read an overview on the blogWatch the full discussionLearn more about AI observability and evaluation in our course, join the Arize AI Slack community or get the latest on LinkedIn and X.

Masters of Privacy (ES)
Ricardo Gandarias: “Artificial” (Sigman, Bilinkis), y el despacho inteligente

Masters of Privacy (ES)

Play Episode Listen Later Feb 4, 2025 21:07


Ricardo Gandarias ha co-organizado junto a la editorial Marcial Pons un Club de Lectura cuya primera edición se dedicó al libro que rubrica nuestro episodio, de Mariano Sigman y Santiago Bilinkis.  Desde la perspectiva que nos dan algunas reflexiones de este libro avanzaremos hacia el rol de la IA en servicios profesionales y despachos de abogados dedicados al asesoramiento en protección de datos, prestando particular atención a la experiencia de Ricardo en su rol actual. Después de diez años trabajando en despachos de abogados, nuestro invitado pasó en 2022 a formar parte de ECIX Tech, donde ahora es consultor de privacidad y derecho digital. Además es DPO acreditado por la AEPD, miembro del "Special Pool of Experts" del CEPD, ponente y formador de privacidad en distintos foros y organizaciones. Referencias: Ricardo Gandarias en LinkedIn Artificial: la nueva inteligencia y el contorno de lo humano (Mariano Sigman y Santiago Bilinkis) Club de lectura Marcial Pons  MIAbogado - ECIX  

SMART TECH
RGPD : enjeux de conformité et évolutions

SMART TECH

Play Episode Listen Later Feb 3, 2025 14:40


Le RGPD, célèbre règlement général sur la protection des données va bientôt souffler sa 9ème bougie ! Et il sera d'ailleurs le thème phare de la 19e édition de l'Université des DPO organisée par l'AFCDP. L'occasion d'évoquer les enjeux juridiques, éthiques et technologiques et les évolutions du métier de délégué à la protection des données. -----------------------------------------------------------------------SMART TECH - Le magazine quotidien de l'innovationDans SMART TECH, l'actu du numérique et de l'innovation prend tout son sens. Chaque jour, des spécialistes décryptent les actualités, les tendances, et les enjeux soulevés par l'adoption des nouvelles technologies.

The FIT4PRIVACY Podcast - For those who care about privacy
Role of Privacy Engineering in Creating Digital Trust with Steve Ahouanmenou and Punit Bhatia in the FIT4PRIVACY Podcast E131 S06

The FIT4PRIVACY Podcast - For those who care about privacy

Play Episode Listen Later Jan 30, 2025 19:08


Can trust be engineered? In this episode, Punit is joined by Steve Ahouanmenou, Global Privacy Engineering Lead for Open Banking at Mastercard, to explore the pivotal role of privacy engineering in creating digital trust. Steve discusses why trust isn't sector-specific, emphasizing how transparency is vital across industries like healthcare and finance. The conversation dives into open banking, a revolutionary approach that gives consumers control over their financial data while fostering competition among financial service providers. Steve explains how privacy engineering brings privacy principles to life, embedding privacy by design, conducting risk assessments, and bridging the gap between privacy teams and technical teams.   Join us in discussing how privacy engineering is shaping the future of digital trust. Hear expert insights, real-world strategies, and thought-provoking discussions that will change the way you think about data, trust, and innovation.   KEY CONVERSION  00:01:59 How would you describe Digital Trust  00:05:53 What is Privacy Engineering?  00:10:31 What kind of a role do you expect from tech team  00:12:01 How can privacy pros help tech colleagues?  00:17:10 Best way to Reach you    ABOUT THE GUEST  Steve Ahouanmenou is part of the Global Privacy & Data Protection Department at Mastercard and leads the privacy engineering program in Open Banking.  His mission is to enable innovation and trust in the digital finance realm, by applying his analytical skills, domain expertise, and collaborative approach to privacy and security challenges.  With over 10 years of experience in information security, privacy risks and data governance, he has worked with global organizations in various sectors with a focus on healthcare and finance. He also a PhD Candidate at Ghent University, investigating information security and privacy in healthcare institutions, and an alumni of Belgium's 40under40. He holds multiple certifications, such as ISO 27001 Senior Lead Implementer, CIPP/E, CISM, CDPSE, ITIL v3, DPO, COBIT 5. ABOUT HOST  Punit Bhatia is one of the leading privacy experts who works independently and has worked with professionals in over 30 countries. Punit works with business and privacy leaders to create an organization culture with high privacy awareness and compliance as a business priority. Selectively, Punit is open to mentor and coach professionals.  Punit is the author of books “Be Ready for GDPR'' which was rated as the best GDPR Book, “AI & Privacy – How to Find Balance”, “Intro To GDPR”, and “Be an Effective DPO”. Punit is a global speaker who has spoken at over 30 global events. Punit is the creator and host of the FIT4PRIVACY Podcast. This podcast has been featured amongst top GDPR and privacy podcasts.  As a person, Punit is an avid thinker and believes in thinking, believing, and acting in line with one's value to have joy in life. He has developed the philosophy named ‘ABC for joy of life' which passionately shares. Punit is based out of Belgium, the heart of Europe.  RESOURCES  Websiteswww.fit4privacy.com,www.punitbhatia.com,https://www.linkedin.com/in/steve-ahouanmenou/  Podcast https://www.fit4privacy.com/podcast  Blog https://www.fit4privacy.com/blog  YouTube http://youtube.com/fit4privacy

Serious Privacy
Villains, Visions, and Val - Welcome 2025 (with Val Ilchenko)

Serious Privacy

Play Episode Listen Later Jan 28, 2025 38:38 Transcription Available


Send us a textOn this week of Serious Privacy, Paul Breitbarth, Ralph O'Brien, and Dr. K Royal ring in the new year with Val Ilchenko, General Counsel and Chief Privacy Officer of TrustArc. No topic was off limits! We discussed the ghosts of Privacy past, now, and future. Tune in to hear all about it as we kick off the new year on #GlobalDataPrivacyday / #GlobalDataProtectionDay 2025!Please follow and set to auto-downloads in your favorite podcast app - sharing is caring!  With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe. With TrustArc's Privacy Studio and Governance Suite, you can automate cookie compliance, streamline data subject rights, and centralize your privacy tasks—all while reducing compliance costs. Visit TrustArc.com/serious-privacy.Powered by TrustArcSeamlessly manage your privacy program, assess risks, and stay up to date on laws across the globe.If you have comments or questions, find us on LinkedIn and Instagram @seriousprivacy, and on BlueSky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! Proudly sponsored by TrustArc. Nobody Knows Privacy Like the Privacy Pros.Learn more at https://trustarc.com/serious-privacy/ From Season 6, our episodes are edited by Fey O'Brien. Our intro and exit music is Channel Intro 24 by Sascha Ende, licensed under CC BY 4.0. #heartofprivacy #europaulb #igrobrien #seriousprivacy #privacy #dataprotection #cybersecuritylaw #CPO #DPO #CISO

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Outlasting Noam Shazeer, crowdsourcing Chat + AI with >1.4m DAU, and becoming the "Western DeepSeek" — with William Beauchamp, Chai Research

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Jan 26, 2025 75:46


One last Gold sponsor slot is available for the AI Engineer Summit in NYC. Our last round of invites is going out soon - apply here - If you are building AI agents or AI eng teams, this will be the single highest-signal conference of the year for you!While the world melts down over DeepSeek, few are talking about the OTHER notable group of former hedge fund traders who pivoted into AI and built a remarkably profitable consumer AI business with a tiny team with incredibly cracked engineering team — Chai Research. In short order they have:* Started a Chat AI company well before Noam Shazeer started Character AI, and outlasted his departure.* Crossed 1m DAU in 2.5 years - William updates us on the pod that they've hit 1.4m DAU now, another +40% from a few months ago. Revenue crossed >$22m. * Launched the Chaiverse model crowdsourcing platform - taking 3-4 week A/B testing cycles down to 3-4 hours, and deploying >100 models a week.While they're not paying million dollar salaries, you can tell they're doing pretty well for an 11 person startup:The Chai Recipe: Building infra for rapid evalsRemember how the central thesis of LMarena (formerly LMsys) is that the only comprehensive way to evaluate LLMs is to let users try them out and pick winners?At the core of Chai is a mobile app that looks like Character AI, but is actually the largest LLM A/B testing arena in the world, specialized on retaining chat users for Chai's usecases (therapy, assistant, roleplay, etc). It's basically what LMArena would be if taken very, very seriously at one company (with $1m in prizes to boot):Chai publishes occasional research on how they think about this, including talks at their Palo Alto office:William expands upon this in today's podcast (34 mins in):Fundamentally, the way I would describe it is when you're building anything in life, you need to be able to evaluate it. And through evaluation, you can iterate, we can look at benchmarks, and we can say the issues with benchmarks and why they may not generalize as well as one would hope in the challenges of working with them. But something that works incredibly well is getting feedback from humans. And so we built this thing where anyone can submit a model to our developer backend, and it gets put in front of 5000 users, and the users can rate it. And we can then have a really accurate ranking of like which model, or users finding more engaging or more entertaining. And it gets, you know, it's at this point now, where every day we're able to, I mean, we evaluate between 20 and 50 models, LLMs, every single day, right. So even though we've got only got a team of, say, five AI researchers, they're able to iterate a huge quantity of LLMs, right. So our team ships, let's just say minimum 100 LLMs a week is what we're able to iterate through. Now, before that moment in time, we might iterate through three a week, we might, you know, there was a time when even doing like five a month was a challenge, right? By being able to change the feedback loops to the point where it's not, let's launch these three models, let's do an A-B test, let's assign, let's do different cohorts, let's wait 30 days to see what the day 30 retention is, which is the kind of the, if you're doing an app, that's like A-B testing 101 would be, do a 30-day retention test, assign different treatments to different cohorts and come back in 30 days. So that's insanely slow. That's just, it's too slow. And so we were able to get that 30-day feedback loop all the way down to something like three hours.In Crowdsourcing the leap to Ten Trillion-Parameter AGI, William describes Chai's routing as a recommender system, which makes a lot more sense to us than previous pitches for model routing startups:William is notably counter-consensus in a lot of his AI product principles:* No streaming: Chats appear all at once to allow rejection sampling* No voice: Chai actually beat Character AI to introducing voice - but removed it after finding that it was far from a killer feature.* Blending: “Something that we love to do at Chai is blending, which is, you know, it's the simplest way to think about it is you're going to end up, and you're going to pretty quickly see you've got one model that's really smart, one model that's really funny. How do you get the user an experience that is both smart and funny? Well, just 50% of the requests, you can serve them the smart model, 50% of the requests, you serve them the funny model.” (that's it!)But chief above all is the recommender system.We also referenced Exa CEO Will Bryk's concept of SuperKnowlege:Full Video versionOn YouTube. please like and subscribe!Timestamps* 00:00:04 Introductions and background of William Beauchamp* 00:01:19 Origin story of Chai AI* 00:04:40 Transition from finance to AI* 00:11:36 Initial product development and idea maze for Chai* 00:16:29 User psychology and engagement with AI companions* 00:20:00 Origin of the Chai name* 00:22:01 Comparison with Character AI and funding challenges* 00:25:59 Chai's growth and user numbers* 00:34:53 Key inflection points in Chai's growth* 00:42:10 Multi-modality in AI companions and focus on user-generated content* 00:46:49 Chaiverse developer platform and model evaluation* 00:51:58 Views on AGI and the nature of AI intelligence* 00:57:14 Evaluation methods and human feedback in AI development* 01:02:01 Content creation and user experience in Chai* 01:04:49 Chai Grant program and company culture* 01:07:20 Inference optimization and compute costs* 01:09:37 Rejection sampling and reward models in AI generation* 01:11:48 Closing thoughts and recruitmentTranscriptAlessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and today we're in the Chai AI office with my usual co-host, Swyx.swyx [00:00:14]: Hey, thanks for having us. It's rare that we get to get out of the office, so thanks for inviting us to your home. We're in the office of Chai with William Beauchamp. Yeah, that's right. You're founder of Chai AI, but previously, I think you're concurrently also running your fund?William [00:00:29]: Yep, so I was simultaneously running an algorithmic trading company, but I fortunately was able to kind of exit from that, I think just in Q3 last year. Yeah, congrats. Yeah, thanks.swyx [00:00:43]: So Chai has always been on my radar because, well, first of all, you do a lot of advertising, I guess, in the Bay Area, so it's working. Yep. And second of all, the reason I reached out to a mutual friend, Joyce, was because I'm just generally interested in the... ...consumer AI space, chat platforms in general. I think there's a lot of inference insights that we can get from that, as well as human psychology insights, kind of a weird blend of the two. And we also share a bit of a history as former finance people crossing over. I guess we can just kind of start it off with the origin story of Chai.William [00:01:19]: Why decide working on a consumer AI platform rather than B2B SaaS? So just quickly touching on the background in finance. Sure. Originally, I'm from... I'm from the UK, born in London. And I was fortunate enough to go study economics at Cambridge. And I graduated in 2012. And at that time, everyone in the UK and everyone on my course, HFT, quant trading was really the big thing. It was like the big wave that was happening. So there was a lot of opportunity in that space. And throughout college, I'd sort of played poker. So I'd, you know, I dabbled as a professional poker player. And I was able to accumulate this sort of, you know, say $100,000 through playing poker. And at the time, as my friends would go work at companies like ChangeStreet or Citadel, I kind of did the maths. And I just thought, well, maybe if I traded my own capital, I'd probably come out ahead. I'd make more money than just going to work at ChangeStreet.swyx [00:02:20]: With 100k base as capital?William [00:02:22]: Yes, yes. That's not a lot. Well, it depends what strategies you're doing. And, you know, there is an advantage. There's an advantage to being small, right? Because there are, if you have a 10... Strategies that don't work in size. Exactly, exactly. So if you have a fund of $10 million, if you find a little anomaly in the market that you might be able to make 100k a year from, that's a 1% return on your 10 million fund. If your fund is 100k, that's 100% return, right? So being small, in some sense, was an advantage. So started off, and the, taught myself Python, and machine learning was like the big thing as well. Machine learning had really, it was the first, you know, big time machine learning was being used for image recognition, neural networks come out, you get dropout. And, you know, so this, this was the big thing that's going on at the time. So I probably spent my first three years out of Cambridge, just building neural networks, building random forests to try and predict asset prices, right, and then trade that using my own money. And that went well. And, you know, if you if you start something, and it goes well, you You try and hire more people. And the first people that came to mind was the talented people I went to college with. And so I hired some friends. And that went well and hired some more. And eventually, I kind of ran out of friends to hire. And so that was when I formed the company. And from that point on, we had our ups and we had our downs. And that was a whole long story and journey in itself. But after doing that for about eight or nine years, on my 30th birthday, which was four years ago now, I kind of took a step back to just evaluate my life, right? This is what one does when one turns 30. You know, I just heard it. I hear you. And, you know, I looked at my 20s and I loved it. It was a really special time. I was really lucky and fortunate to have worked with this amazing team, been successful, had a lot of hard times. And through the hard times, learned wisdom and then a lot of success and, you know, was able to enjoy it. And so the company was making about five million pounds a year. And it was just me and a team of, say, 15, like, Oxford and Cambridge educated mathematicians and physicists. It was like the real dream that you'd have if you wanted to start a quant trading firm. It was like...swyx [00:04:40]: Your own, all your own money?William [00:04:41]: Yeah, exactly. It was all the team's own money. We had no customers complaining to us about issues. There's no investors, you know, saying, you know, they don't like the risk that we're taking. We could. We could really run the thing exactly as we wanted it. It's like Susquehanna or like Rintec. Yeah, exactly. Yeah. And they're the companies that we would kind of look towards as we were building that thing out. But on my 30th birthday, I look and I say, OK, great. This thing is making as much money as kind of anyone would really need. And I thought, well, what's going to happen if we keep going in this direction? And it was clear that we would never have a kind of a big, big impact on the world. We can enrich ourselves. We can make really good money. Everyone on the team would be paid very, very well. Presumably, I can make enough money to buy a yacht or something. But this stuff wasn't that important to me. And so I felt a sort of obligation that if you have this much talent and if you have a talented team, especially as a founder, you want to be putting all that talent towards a good use. I looked at the time of like getting into crypto and I had a really strong view on crypto, which was that as far as a gambling device. This is like the most fun form of gambling invented in like ever super fun, I thought as a way to evade monetary regulations and banking restrictions. I think it's also absolutely amazing. So it has two like killer use cases, not so much banking the unbanked, but everything else, but everything else to do with like the blockchain and, and you know, web, was it web 3.0 or web, you know, that I, that didn't, it didn't really make much sense. And so instead of going into crypto, which I thought, even if I was successful, I'd end up in a lot of trouble. I thought maybe it'd be better to build something that governments wouldn't have a problem with. I knew that LLMs were like a thing. I think opening. I had said they hadn't released GPT-3 yet, but they'd said GPT-3 is so powerful. We can't release it to the world or something. Was it GPT-2? And then I started interacting with, I think Google had open source, some language models. They weren't necessarily LLMs, but they, but they were. But yeah, exactly. So I was able to play around with, but nowadays so many people have interacted with the chat GPT, they get it, but it's like the first time you, you can just talk to a computer and it talks back. It's kind of a special moment and you know, everyone who's done that goes like, wow, this is how it should be. Right. It should be like, rather than having to type on Google and search, you should just be able to ask Google a question. When I saw that I read the literature, I kind of came across the scaling laws and I think even four years ago. All the pieces of the puzzle were there, right? Google had done this amazing research and published, you know, a lot of it. Open AI was still open. And so they'd published a lot of their research. And so you really could be fully informed on, on the state of AI and where it was going. And so at that point I was confident enough, it was worth a shot. I think LLMs are going to be the next big thing. And so that's the thing I want to be building in, in that space. And I thought what's the most impactful product I can possibly build. And I thought it should be a platform. So I myself love platforms. I think they're fantastic because they open up an ecosystem where anyone can contribute to it. Right. So if you think of a platform like a YouTube, instead of it being like a Hollywood situation where you have to, if you want to make a TV show, you have to convince Disney to give you the money to produce it instead, anyone in the world can post any content they want to YouTube. And if people want to view it, the algorithm is going to promote it. Nowadays. You can look at creators like Mr. Beast or Joe Rogan. They would have never have had that opportunity unless it was for this platform. Other ones like Twitter's a great one, right? But I would consider Wikipedia to be a platform where instead of the Britannica encyclopedia, which is this, it's like a monolithic, you get all the, the researchers together, you get all the data together and you combine it in this, in this one monolithic source. Instead. You have this distributed thing. You can say anyone can host their content on Wikipedia. Anyone can contribute to it. And anyone can maybe their contribution is they delete stuff. When I was hearing like the kind of the Sam Altman and kind of the, the Muskian perspective of AI, it was a very kind of monolithic thing. It was all about AI is basically a single thing, which is intelligence. Yeah. Yeah. The more intelligent, the more compute, the more intelligent, and the more and better AI researchers, the more intelligent, right? They would speak about it as a kind of erased, like who can get the most data, the most compute and the most researchers. And that would end up with the most intelligent AI. But I didn't believe in any of that. I thought that's like the total, like I thought that perspective is the perspective of someone who's never actually done machine learning. Because with machine learning, first of all, you see that the performance of the models follows an S curve. So it's not like it just goes off to infinity, right? And the, the S curve, it kind of plateaus around human level performance. And you can look at all the, all the machine learning that was going on in the 2010s, everything kind of plateaued around the human level performance. And we can think about the self-driving car promises, you know, how Elon Musk kept saying the self-driving car is going to happen next year, it's going to happen next, next year. Or you can look at the image recognition, the speech recognition. You can look at. All of these things, there was almost nothing that went superhuman, except for something like AlphaGo. And we can speak about why AlphaGo was able to go like super superhuman. So I thought the most likely thing was going to be this, I thought it's not going to be a monolithic thing. That's like an encyclopedia Britannica. I thought it must be a distributed thing. And I actually liked to look at the world of finance for what I think a mature machine learning ecosystem would look like. So, yeah. So finance is a machine learning ecosystem because all of these quant trading firms are running machine learning algorithms, but they're running it on a centralized platform like a marketplace. And it's not the case that there's one giant quant trading company of all the data and all the quant researchers and all the algorithms and compute, but instead they all specialize. So one will specialize on high frequency training. Another will specialize on mid frequency. Another one will specialize on equity. Another one will specialize. And I thought that's the way the world works. That's how it is. And so there must exist a platform where a small team can produce an AI for a unique purpose. And they can iterate and build the best thing for that, right? And so that was the vision for Chai. So we wanted to build a platform for LLMs.Alessio [00:11:36]: That's kind of the maybe inside versus contrarian view that led you to start the company. Yeah. And then what was maybe the initial idea maze? Because if somebody told you that was the Hugging Face founding story, people might believe it. It's kind of like a similar ethos behind it. How did you land on the product feature today? And maybe what were some of the ideas that you discarded that initially you thought about?William [00:11:58]: So the first thing we built, it was fundamentally an API. So nowadays people would describe it as like agents, right? But anyone could write a Python script. They could submit it to an API. They could send it to the Chai backend and we would then host this code and execute it. So that's like the developer side of the platform. On their Python script, the interface was essentially text in and text out. An example would be the very first bot that I created. I think it was a Reddit news bot. And so it would first, it would pull the popular news. Then it would prompt whatever, like I just use some external API for like Burr or GPT-2 or whatever. Like it was a very, very small thing. And then the user could talk to it. So you could say to the bot, hi bot, what's the news today? And it would say, this is the top stories. And you could chat with it. Now four years later, that's like perplexity or something. That's like the, right? But back then the models were first of all, like really, really dumb. You know, they had an IQ of like a four year old. And users, there really wasn't any demand or any PMF for interacting with the news. So then I was like, okay. Um. So let's make another one. And I made a bot, which was like, you could talk to it about a recipe. So you could say, I'm making eggs. Like I've got eggs in my fridge. What should I cook? And it'll say, you should make an omelet. Right. There was no PMF for that. No one used it. And so I just kept creating bots. And so every single night after work, I'd be like, okay, I like, we have AI, we have this platform. I can create any text in textile sort of agent and put it on the platform. And so we just create stuff night after night. And then all the coders I knew, I would say, yeah, this is what we're going to do. And then I would say to them, look, there's this platform. You can create any like chat AI. You should put it on. And you know, everyone's like, well, chatbots are super lame. We want absolutely nothing to do with your chatbot app. No one who knew Python wanted to build on it. I'm like trying to build all these bots and no consumers want to talk to any of them. And then my sister who at the time was like just finishing college or something, I said to her, I was like, if you want to learn Python, you should just submit a bot for my platform. And she, she built a therapy for me. And I was like, okay, cool. I'm going to build a therapist bot. And then the next day I checked the performance of the app and I'm like, oh my God, we've got 20 active users. And they spent, they spent like an average of 20 minutes on the app. I was like, oh my God, what, what bot were they speaking to for an average of 20 minutes? And I looked and it was the therapist bot. And I went, oh, this is where the PMF is. There was no demand for, for recipe help. There was no demand for news. There was no demand for dad jokes or pub quiz or fun facts or what they wanted was they wanted the therapist bot. the time I kind of reflected on that and I thought, well, if I want to consume news, the most fun thing, most fun way to consume news is like Twitter. It's not like the value of there being a back and forth, wasn't that high. Right. And I thought if I need help with a recipe, I actually just go like the New York times has a good recipe section, right? It's not actually that hard. And so I just thought the thing that AI is 10 X better at is a sort of a conversation right. That's not intrinsically informative, but it's more about an opportunity. You can say whatever you want. You're not going to get judged. If it's 3am, you don't have to wait for your friend to text back. It's like, it's immediate. They're going to reply immediately. You can say whatever you want. It's judgment-free and it's much more like a playground. It's much more like a fun experience. And you could see that if the AI gave a person a compliment, they would love it. It's much easier to get the AI to give you a compliment than a human. From that day on, I said, okay, I get it. Humans want to speak to like humans or human like entities and they want to have fun. And that was when I started to look less at platforms like Google. And I started to look more at platforms like Instagram. And I was trying to think about why do people use Instagram? And I could see that I think Chai was, was filling the same desire or the same drive. If you go on Instagram, typically you want to look at the faces of other humans, or you want to hear about other people's lives. So if it's like the rock is making himself pancakes on a cheese plate. You kind of feel a little bit like you're the rock's friend, or you're like having pancakes with him or something, right? But if you do it too much, you feel like you're sad and like a lonely person, but with AI, you can talk to it and tell it stories and tell you stories, and you can play with it for as long as you want. And you don't feel like you're like a sad, lonely person. You feel like you actually have a friend.Alessio [00:16:29]: And what, why is that? Do you have any insight on that from using it?William [00:16:33]: I think it's just the human psychology. I think it's just the idea that, with old school social media. You're just consuming passively, right? So you'll just swipe. If I'm watching TikTok, just like swipe and swipe and swipe. And even though I'm getting the dopamine of like watching an engaging video, there's this other thing that's building my head, which is like, I'm feeling lazier and lazier and lazier. And after a certain period of time, I'm like, man, I just wasted 40 minutes. I achieved nothing. But with AI, because you're interacting, you feel like you're, it's not like work, but you feel like you're participating and contributing to the thing. You don't feel like you're just. Consuming. So you don't have a sense of remorse basically. And you know, I think on the whole people, the way people talk about, try and interact with the AI, they speak about it in an incredibly positive sense. Like we get people who say they have eating disorders saying that the AI helps them with their eating disorders. People who say they're depressed, it helps them through like the rough patches. So I think there's something intrinsically healthy about interacting that TikTok and Instagram and YouTube doesn't quite tick. From that point on, it was about building more and more kind of like human centric AI for people to interact with. And I was like, okay, let's make a Kanye West bot, right? And then no one wanted to talk to the Kanye West bot. And I was like, ah, who's like a cool persona for teenagers to want to interact with. And I was like, I was trying to find the influencers and stuff like that, but no one cared. Like they didn't want to interact with the, yeah. And instead it was really just the special moment was when we said the realization that developers and software engineers aren't interested in building this sort of AI, but the consumers are right. And rather than me trying to guess every day, like what's the right bot to submit to the platform, why don't we just create the tools for the users to build it themselves? And so nowadays this is like the most obvious thing in the world, but when Chai first did it, it was not an obvious thing at all. Right. Right. So we took the API for let's just say it was, I think it was GPTJ, which was this 6 billion parameter open source transformer style LLM. We took GPTJ. We let users create the prompt. We let users select the image and we let users choose the name. And then that was the bot. And through that, they could shape the experience, right? So if they said this bot's going to be really mean, and it's going to be called like bully in the playground, right? That was like a whole category that I never would have guessed. Right. People love to fight. They love to have a disagreement, right? And then they would create, there'd be all these romantic archetypes that I didn't know existed. And so as the users could create the content that they wanted, that was when Chai was able to, to get this huge variety of content and rather than appealing to, you know, 1% of the population that I'd figured out what they wanted, you could appeal to a much, much broader thing. And so from that moment on, it was very, very crystal clear. It's like Chai, just as Instagram is this social media platform that lets people create images and upload images, videos and upload that, Chai was really about how can we let the users create this experience in AI and then share it and interact and search. So it's really, you know, I say it's like a platform for social AI.Alessio [00:20:00]: Where did the Chai name come from? Because you started the same path. I was like, is it character AI shortened? You started at the same time, so I was curious. The UK origin was like the second, the Chai.William [00:20:15]: We started way before character AI. And there's an interesting story that Chai's numbers were very, very strong, right? So I think in even 20, I think late 2022, was it late 2022 or maybe early 2023? Chai was like the number one AI app in the app store. So we would have something like 100,000 daily active users. And then one day we kind of saw there was this website. And we were like, oh, this website looks just like Chai. And it was the character AI website. And I think that nowadays it's, I think it's much more common knowledge that when they left Google with the funding, I think they knew what was the most trending, the number one app. And I think they sort of built that. Oh, you found the people.swyx [00:21:03]: You found the PMF for them.William [00:21:04]: We found the PMF for them. Exactly. Yeah. So I worked a year very, very hard. And then they, and then that was when I learned a lesson, which is that if you're VC backed and if, you know, so Chai, we'd kind of ran, we'd got to this point, I was the only person who'd invested. I'd invested maybe 2 million pounds in the business. And you know, from that, we were able to build this thing, get to say a hundred thousand daily active users. And then when character AI came along, the first version, we sort of laughed. We were like, oh man, this thing sucks. Like they don't know what they're building. They're building the wrong thing anyway, but then I saw, oh, they've raised a hundred million dollars. Oh, they've raised another hundred million dollars. And then our users started saying, oh guys, your AI sucks. Cause we were serving a 6 billion parameter model, right? How big was the model that character AI could afford to serve, right? So we would be spending, let's say we would spend a dollar per per user, right? Over the, the, you know, the entire lifetime.swyx [00:22:01]: A dollar per session, per chat, per month? No, no, no, no.William [00:22:04]: Let's say we'd get over the course of the year, we'd have a million users and we'd spend a million dollars on the AI throughout the year. Right. Like aggregated. Exactly. Exactly. Right. They could spend a hundred times that. So people would say, why is your AI much dumber than character AIs? And then I was like, oh, okay, I get it. This is like the Silicon Valley style, um, hyper scale business. And so, yeah, we moved to Silicon Valley and, uh, got some funding and iterated and built the flywheels. And, um, yeah, I, I'm very proud that we were able to compete with that. Right. So, and I think the reason we were able to do it was just customer obsession. And it's similar, I guess, to how deep seek have been able to produce such a compelling model when compared to someone like an open AI, right? So deep seek, you know, their latest, um, V2, yeah, they claim to have spent 5 million training it.swyx [00:22:57]: It may be a bit more, but, um, like, why are you making it? Why are you making such a big deal out of this? Yeah. There's an agenda there. Yeah. You brought up deep seek. So we have to ask you had a call with them.William [00:23:07]: We did. We did. We did. Um, let me think what to say about that. I think for one, they have an amazing story, right? So their background is again in finance.swyx [00:23:16]: They're the Chinese version of you. Exactly.William [00:23:18]: Well, there's a lot of similarities. Yes. Yes. I have a great affinity for companies which are like, um, founder led, customer obsessed and just try and build something great. And I think what deep seek have achieved. There's quite special is they've got this amazing inference engine. They've been able to reduce the size of the KV cash significantly. And then by being able to do that, they're able to significantly reduce their inference costs. And I think with kind of with AI, people get really focused on like the kind of the foundation model or like the model itself. And they sort of don't pay much attention to the inference. To give you an example with Chai, let's say a typical user session is 90 minutes, which is like, you know, is very, very long for comparison. Let's say the average session length on TikTok is 70 minutes. So people are spending a lot of time. And in that time they're able to send say 150 messages. That's a lot of completions, right? It's quite different from an open AI scenario where people might come in, they'll have a particular question in mind. And they'll ask like one question. And a few follow up questions, right? So because they're consuming, say 30 times as many requests for a chat, or a conversational experience, you've got to figure out how to how to get the right balance between the cost of that and the quality. And so, you know, I think with AI, it's always been the case that if you want a better experience, you can throw compute at the problem, right? So if you want a better model, you can just make it bigger. If you want it to remember better, give it a longer context. And now, what open AI is doing to great fanfare is with projection sampling, you can generate many candidates, right? And then with some sort of reward model or some sort of scoring system, you can serve the most promising of these many candidates. And so that's kind of scaling up on the inference time compute side of things. And so for us, it doesn't make sense to think of AI is just the absolute performance. So. But what we're seeing, it's like the MML you score or the, you know, any of these benchmarks that people like to look at, if you just get that score, it doesn't really tell tell you anything. Because it's really like progress is made by improving the performance per dollar. And so I think that's an area where deep seek have been able to form very, very well, surprisingly so. And so I'm very interested in what Lama four is going to look like. And if they're able to sort of match what deep seek have been able to achieve with this performance per dollar gain.Alessio [00:25:59]: Before we go into the inference, some of the deeper stuff, can you give people an overview of like some of the numbers? So I think last I checked, you have like 1.4 million daily active now. It's like over 22 million of revenue. So it's quite a business.William [00:26:12]: Yeah, I think we grew by a factor of, you know, users grew by a factor of three last year. Revenue over doubled. You know, it's very exciting. We're competing with some really big, really well funded companies. Character AI got this, I think it was almost a $3 billion valuation. And they have 5 million DAU is a number that I last heard. Torquay, which is a Chinese built app owned by a company called Minimax. They're incredibly well funded. And these companies didn't grow by a factor of three last year. Right. And so when you've got this company and this team that's able to keep building something that gets users excited, and they want to tell their friend about it, and then they want to come and they want to stick on the platform. I think that's very special. And so last year was a great year for the team. And yeah, I think the numbers reflect the hard work that we put in. And then fundamentally, the quality of the app, the quality of the content, the quality of the content, the quality of the content, the quality of the content, the quality of the content. AI is the quality of the experience that you have. You actually published your DAU growth chart, which is unusual. And I see some inflections. Like, it's not just a straight line. There's some things that actually inflect. Yes. What were the big ones? Cool. That's a great, great, great question. Let me think of a good answer. I'm basically looking to annotate this chart, which doesn't have annotations on it. Cool. The first thing I would say is this is, I think the most important thing to know about success is that success is born out of failures. Right? Through failures that we learn. You know, if you think something's a good idea, and you do and it works, great, but you didn't actually learn anything, because everything went exactly as you imagined. But if you have an idea, you think it's going to be good, you try it, and it fails. There's a gap between the reality and expectation. And that's an opportunity to learn. The flat periods, that's us learning. And then the up periods is that's us reaping the rewards of that. So I think the big, of the growth shot of just 2024, I think the first thing that really kind of put a dent in our growth was our backend. So we just reached this scale. So we'd, from day one, we'd built on top of Google's GCP, which is Google's cloud platform. And they were fantastic. We used them when we had one daily active user, and they worked pretty good all the way up till we had about 500,000. It was never the cheapest, but from an engineering perspective, man, that thing scaled insanely good. Like, not Vertex? Not Vertex. Like GKE, that kind of stuff? We use Firebase. So we use Firebase. I'm pretty sure we're the biggest user ever on Firebase. That's expensive. Yeah, we had calls with engineers, and they're like, we wouldn't recommend using this product beyond this point, and you're 3x over that. So we pushed Google to their absolute limits. You know, it was fantastic for us, because we could focus on the AI. We could focus on just adding as much value as possible. But then what happened was, after 500,000, just the thing, the way we were using it, and it would just, it wouldn't scale any further. And so we had a really, really painful, at least three-month period, as we kind of migrated between different services, figuring out, like, what requests do we want to keep on Firebase, and what ones do we want to move on to something else? And then, you know, making mistakes. And learning things the hard way. And then after about three months, we got that right. So that, we would then be able to scale to the 1.5 million DAE without any further issues from the GCP. But what happens is, if you have an outage, new users who go on your app experience a dysfunctional app, and then they're going to exit. And so your next day, the key metrics that the app stores track are going to be something like retention rates. And so your next day, the key metrics that the app stores track are going to be something like retention rates. Money spent, and the star, like, the rating that they give you. In the app store. In the app store, yeah. Tyranny. So if you're ranked top 50 in entertainment, you're going to acquire a certain rate of users organically. If you go in and have a bad experience, it's going to tank where you're positioned in the algorithm. And then it can take a long time to kind of earn your way back up, at least if you wanted to do it organically. If you throw money at it, you can jump to the top. And I could talk about that. But broadly speaking, if we look at 2024, the first kink in the graph was outages due to hitting 500k DAU. The backend didn't want to scale past that. So then we just had to do the engineering and build through it. Okay, so we built through that, and then we get a little bit of growth. And so, okay, that's feeling a little bit good. I think the next thing, I think it's, I'm not going to lie, I have a feeling that when Character AI got... I was thinking. I think so. I think... So the Character AI team fundamentally got acquired by Google. And I don't know what they changed in their business. I don't know if they dialed down that ad spend. Products don't change, right? Products just what it is. I don't think so. Yeah, I think the product is what it is. It's like maintenance mode. Yes. I think the issue that people, you know, some people may think this is an obvious fact, but running a business can be very competitive, right? Because other businesses can see what you're doing, and they can imitate you. And then there's this... There's this question of, if you've got one company that's spending $100,000 a day on advertising, and you've got another company that's spending zero, if you consider market share, and if you're considering new users which are entering the market, the guy that's spending $100,000 a day is going to be getting 90% of those new users. And so I have a suspicion that when the founders of Character AI left, they dialed down their spending on user acquisition. And I think that kind of gave oxygen to like the other apps. And so Chai was able to then start growing again in a really healthy fashion. I think that's kind of like the second thing. I think a third thing is we've really built a great data flywheel. Like the AI team sort of perfected their flywheel, I would say, in end of Q2. And I could speak about that at length. But fundamentally, the way I would describe it is when you're building anything in life, you need to be able to evaluate it. And through evaluation, you can iterate, we can look at benchmarks, and we can say the issues with benchmarks and why they may not generalize as well as one would hope in the challenges of working with them. But something that works incredibly well is getting feedback from humans. And so we built this thing where anyone can submit a model to our developer backend, and it gets put in front of 5000 users, and the users can rate it. And we can then have a really accurate ranking of like which model, or users finding more engaging or more entertaining. And it gets, you know, it's at this point now, where every day we're able to, I mean, we evaluate between 20 and 50 models, LLMs, every single day, right. So even though we've got only got a team of, say, five AI researchers, they're able to iterate a huge quantity of LLMs, right. So our team ships, let's just say minimum 100 LLMs a week is what we're able to iterate through. Now, before that moment in time, we might iterate through three a week, we might, you know, there was a time when even doing like five a month was a challenge, right? By being able to change the feedback loops to the point where it's not, let's launch these three models, let's do an A-B test, let's assign, let's do different cohorts, let's wait 30 days to see what the day 30 retention is, which is the kind of the, if you're doing an app, that's like A-B testing 101 would be, do a 30-day retention test, assign different treatments to different cohorts and come back in 30 days. So that's insanely slow. That's just, it's too slow. And so we were able to get that 30-day feedback loop all the way down to something like three hours. And when we did that, we could really, really, really perfect techniques like DPO, fine tuning, prompt engineering, blending, rejection sampling, training a reward model, right, really successfully, like boom, boom, boom, boom, boom. And so I think in Q3 and Q4, we got, the amount of AI improvements we got was like astounding. It was getting to the point, I thought like how much more, how much more edge is there to be had here? But the team just could keep going and going and going. That was like number three for the inflection point.swyx [00:34:53]: There's a fourth?William [00:34:54]: The important thing about the third one is if you go on our Reddit or you talk to users of AI, there's like a clear date. It's like somewhere in October or something. The users, they flipped. Before October, the users... The users would say character AI is better than you, for the most part. Then from October onwards, they would say, wow, you guys are better than character AI. And that was like a really clear positive signal that we'd sort of done it. And I think people, you can't cheat consumers. You can't trick them. You can't b******t them. They know, right? If you're going to spend 90 minutes on a platform, and with apps, there's the barriers to switching is pretty low. Like you can try character AI, you can't cheat consumers. You can't cheat them. You can't cheat them. You can't cheat AI for a day. If you get bored, you can try Chai. If you get bored of Chai, you can go back to character. So the users, the loyalty is not strong, right? What keeps them on the app is the experience. If you deliver a better experience, they're going to stay and they can tell. So that was the fourth one was we were fortunate enough to get this hire. He was hired one really talented engineer. And then they said, oh, at my last company, we had a head of growth. He was really, really good. And he was the head of growth for ByteDance for two years. Would you like to speak to him? And I was like, yes. Yes, I think I would. And so I spoke to him. And he just blew me away with what he knew about user acquisition. You know, it was like a 3D chessswyx [00:36:21]: sort of thing. You know, as much as, as I know about AI. Like ByteDance as in TikTok US. Yes.William [00:36:26]: Not ByteDance as other stuff. Yep. He was interviewing us as we were interviewing him. Right. And so pick up options. Yeah, exactly. And so he was kind of looking at our metrics. And he was like, I saw him get really excited when he said, guys, you've got a million daily active users and you've done no advertising. I said, correct. And he was like, that's unheard of. He's like, I've never heard of anyone doing that. And then he started looking at our metrics. And he was like, if you've got all of this organically, if you start spending money, this is going to be very exciting. I was like, let's give it a go. So then he came in, we've just started ramping up the user acquisition. So that looks like spending, you know, let's say we're spending, we started spending $20,000 a day, it looked very promising than 20,000. Right now we're spending $40,000 a day on user acquisition. That's still only half of what like character AI or talkie may be spending. But from that, it's sort of, we were growing at a rate of maybe say, 2x a year. And that got us growing at a rate of 3x a year. So I'm growing, I'm evolving more and more to like a Silicon Valley style hyper growth, like, you know, you build something decent, and then you canswyx [00:37:33]: slap on a huge... You did the important thing, you did the product first.William [00:37:36]: Of course, but then you can slap on like, like the rocket or the jet engine or something, which is just this cash in, you pour in as much cash, you buy a lot of ads, and your growth is faster.swyx [00:37:48]: Not to, you know, I'm just kind of curious what's working right now versus what surprisinglyWilliam [00:37:52]: doesn't work. Oh, there's a long, long list of surprising stuff that doesn't work. Yeah. The surprising thing, like the most surprising thing, what doesn't work is almost everything doesn't work. That's what's surprising. And I'll give you an example. So like a year and a half ago, I was working at a company, we were super excited by audio. I was like, audio is going to be the next killer feature, we have to get in the app. And I want to be the first. So everything Chai does, I want us to be the first. We may not be the company that's strongest at execution, but we can always be theswyx [00:38:22]: most innovative. Interesting. Right? So we can... You're pretty strong at execution.William [00:38:26]: We're much stronger, we're much stronger. A lot of the reason we're here is because we were first. If we launched today, it'd be so hard to get the traction. Because it's like to get the flywheel, to get the users, to build a product people are excited about. If you're first, people are naturally excited about it. But if you're fifth or 10th, man, you've got to beswyx [00:38:46]: insanely good at execution. So you were first with voice? We were first. We were first. I only knowWilliam [00:38:51]: when character launched voice. They launched it, I think they launched it at least nine months after us. Okay. Okay. But the team worked so hard for it. At the time we did it, latency is a huge problem. Cost is a huge problem. Getting the right quality of the voice is a huge problem. Right? Then there's this user interface and getting the right user experience. Because you don't just want it to start blurting out. Right? You want to kind of activate it. But then you don't have to keep pressing a button every single time. There's a lot that goes into getting a really smooth audio experience. So we went ahead, we invested the three months, we built it all. And then when we did the A-B test, there was like, no change in any of the numbers. And I was like, this can't be right, there must be a bug. And we spent like a week just checking everything, checking again, checking again. And it was like, the users just did not care. And it was something like only 10 or 15% of users even click the button to like, they wanted to engage the audio. And they would only use it for 10 or 15% of the time. So if you do the math, if it's just like something that one in seven people use it for one seventh of their time. You've changed like 2% of the experience. So even if that that 2% of the time is like insanely good, it doesn't translate much when you look at the retention, when you look at the engagement, and when you look at the monetization rates. So audio did not have a big impact. I'm pretty big on audio. But yeah, I like it too. But it's, you know, so a lot of the stuff which I do, I'm a big, you can have a theory. And you resist. Yeah. Exactly, exactly. So I think if you want to make audio work, it has to be a unique, compelling, exciting experience that they can't have anywhere else.swyx [00:40:37]: It could be your models, which just weren't good enough.William [00:40:39]: No, no, no, they were great. Oh, yeah, they were very good. it was like, it was kind of like just the, you know, if you listen to like an audible or Kindle, or something like, you just hear this voice. And it's like, you don't go like, wow, this is this is special, right? It's like a convenience thing. But the idea is that if you can, if Chai is the only platform, like, let's say you have a Mr. Beast, and YouTube is the only platform you can use to make audio work, then you can watch a Mr. Beast video. And it's the most engaging, fun video that you want to watch, you'll go to a YouTube. And so it's like for audio, you can't just put the audio on there. And people go, oh, yeah, it's like 2% better. Or like, 5% of users think it's 20% better, right? It has to be something that the majority of people, for the majority of the experience, go like, wow, this is a big deal. That's the features you need to be shipping. If it's not going to appeal to the majority of people, for the majority of the experience, and it's not a big deal, it's not going to move you. Cool. So you killed it. I don't see it anymore. Yep. So I love this. The longer, it's kind of cheesy, I guess, but the longer I've been working at Chai, and I think the team agrees with this, all the platitudes, at least I thought they were platitudes, that you would get from like the Steve Jobs, which is like, build something insanely great, right? Or be maniacally focused, or, you know, the most important thing is saying no to, not to work on. All of these sort of lessons, they just are like painfully true. They're painfully true. So now I'm just like, everything I say, I'm either quoting Steve Jobs or Zuckerberg. I'm like, guys, move fast and break free.swyx [00:42:10]: You've jumped the Apollo to cool it now.William [00:42:12]: Yeah, it's just so, everything they said is so, so true. The turtle neck. Yeah, yeah, yeah. Everything is so true.swyx [00:42:18]: This last question on my side, and I want to pass this to Alessio, is on just, just multi-modality in general. This actually comes from Justine Moore from A16Z, who's a friend of ours. And a lot of people are trying to do voice image video for AI companions. Yes. You just said voice didn't work. Yep. What would make you revisit?William [00:42:36]: So Steve Jobs, he was very, listen, he was very, very clear on this. There's a habit of engineers who, once they've got some cool technology, they want to find a way to package up the cool technology and sell it to consumers, right? That does not work. So you're free to try and build a startup where you've got your cool tech and you want to find someone to sell it to. That's not what we do at Chai. At Chai, we start with the consumer. What does the consumer want? What is their problem? And how do we solve it? So right now, the number one problems for the users, it's not the audio. That's not the number one problem. It's not the image generation either. That's not their problem either. The number one problem for users in AI is this. All the AI is being generated by middle-aged men in Silicon Valley, right? That's all the content. You're interacting with this AI. You're speaking to it for 90 minutes on average. It's being trained by middle-aged men. The guys out there, they're out there. They're talking to you. They're talking to you. They're like, oh, what should the AI say in this situation, right? What's funny, right? What's cool? What's boring? What's entertaining? That's not the way it should be. The way it should be is that the users should be creating the AI, right? And so the way I speak about it is this. Chai, we have this AI engine in which sits atop a thin layer of UGC. So the thin layer of UGC is absolutely essential, right? It's just prompts. But it's just prompts. It's just an image. It's just a name. It's like we've done 1% of what we could do. So we need to keep thickening up that layer of UGC. It must be the case that the users can train the AI. And if reinforcement learning is powerful and important, they have to be able to do that. And so it's got to be the case that there exists, you know, I say to the team, just as Mr. Beast is able to spend 100 million a year or whatever it is on his production company, and he's got a team building the content, the Mr. Beast company is able to spend 100 million a year on his production company. And he's got a team building the content, which then he shares on the YouTube platform. Until there's a team that's earning 100 million a year or spending 100 million on the content that they're producing for the Chai platform, we're not finished, right? So that's the problem. That's what we're excited to build. And getting too caught up in the tech, I think is a fool's errand. It does not work.Alessio [00:44:52]: As an aside, I saw the Beast Games thing on Amazon Prime. It's not doing well. And I'mswyx [00:44:56]: curious. It's kind of like, I mean, the audience reading is high. The run-to-meet-all sucks, but the audience reading is high.Alessio [00:45:02]: But it's not like in the top 10. I saw it dropped off of like the... Oh, okay. Yeah, that one I don't know. I'm curious, like, you know, it's kind of like similar content, but different platform. And then going back to like, some of what you were saying is like, you know, people come to ChaiWilliam [00:45:13]: expecting some type of content. Yeah, I think it's something that's interesting to discuss is like, is moats. And what is the moat? And so, you know, if you look at a platform like YouTube, the moat, I think is in first is really is in the ecosystem. And the ecosystem, is comprised of you have the content creators, you have the users, the consumers, and then you have the algorithms. And so this, this creates a sort of a flywheel where the algorithms are able to be trained on the users, and the users data, the recommend systems can then feed information to the content creators. So Mr. Beast, he knows which thumbnail does the best. He knows the first 10 seconds of the video has to be this particular way. And so his content is super optimized for the YouTube platform. So that's why it doesn't do well on Amazon. If he wants to do well on Amazon, how many videos has he created on the YouTube platform? By thousands, 10s of 1000s, I guess, he needs to get those iterations in on the Amazon. So at Chai, I think it's all about how can we get the most compelling, rich user generated content, stick that on top of the AI engine, the recommender systems, in such that we get this beautiful data flywheel, more users, better recommendations, more creative, more content, more users.Alessio [00:46:34]: You mentioned the algorithm, you have this idea of the Chaiverse on Chai, and you have your own kind of like LMSYS-like ELO system. Yeah, what are things that your models optimize for, like your users optimize for, and maybe talk about how you build it, how people submit models?William [00:46:49]: So Chaiverse is what I would describe as a developer platform. More often when we're speaking about Chai, we're thinking about the Chai app. And the Chai app is really this product for consumers. And so consumers can come on the Chai app, they can come on the Chai app, they can come on the Chai app, they can interact with our AI, and they can interact with other UGC. And it's really just these kind of bots. And it's a thin layer of UGC. Okay. Our mission is not to just have a very thin layer of UGC. Our mission is to have as much UGC as possible. So we must have, I don't want people at Chai training the AI. I want people, not middle aged men, building AI. I want everyone building the AI, as many people building the AI as possible. Okay, so what we built was we built Chaiverse. And Chaiverse is kind of, it's kind of like a prototype, is the way to think about it. And it started with this, this observation that, well, how many models get submitted into Hugging Face a day? It's hundreds, it's hundreds, right? So there's hundreds of LLMs submitted each day. Now consider that, what does it take to build an LLM? It takes a lot of work, actually. It's like someone devoted several hours of compute, several hours of their time, prepared a data set, launched it, ran it, evaluated it, submitted it, right? So there's a lot of, there's a lot of, there's a lot of work that's going into that. So what we did was we said, well, why can't we host their models for them and serve them to users? And then what would that look like? The first issue is, well, how do you know if a model is good or not? Like, we don't want to serve users the crappy models, right? So what we would do is we would, I love the LMSYS style. I think it's really cool. It's really simple. It's a very intuitive thing, which is you simply present the users with two completions. You can say, look, this is from model one. This is from model two. This is from model three. This is from model A. This is from model B, which is better. And so if someone submits a model to Chaiverse, what we do is we spin up a GPU. We download the model. We're going to now host that model on this GPU. And we're going to start routing traffic to it. And we're going to send, we think it takes about 5,000 completions to get an accurate signal. That's roughly what LMSYS does. And from that, we're able to get an accurate ranking. And we're able to get an accurate ranking. And we're able to get an accurate ranking of which models are people finding entertaining and which models are not entertaining. If you look at the bottom 80%, they'll suck. You can just disregard them. They totally suck. Then when you get the top 20%, you know you've got a decent model, but you can break it down into more nuance. There might be one that's really descriptive. There might be one that's got a lot of personality to it. There might be one that's really illogical. Then the question is, well, what do you do with these top models? From that, you can do more sophisticated things. You can try and do like a routing thing where you say for a given user request, we're going to try and predict which of these end models that users enjoy the most. That turns out to be pretty expensive and not a huge source of like edge or improvement. Something that we love to do at Chai is blending, which is, you know, it's the simplest way to think about it is you're going to end up, and you're going to pretty quickly see you've got one model that's really smart, one model that's really funny. How do you get the user an experience that is both smart and funny? Well, just 50% of the requests, you can serve them the smart model, 50% of the requests, you serve them the funny model. Just a random 50%? Just a random, yeah. And then... That's blending? That's blending. You can do more sophisticated things on top of that, as in all things in life, but the 80-20 solution, if you just do that, you get a pretty powerful effect out of the gate. Random number generator. I think it's like the robustness of randomness. Random is a very powerful optimization technique, and it's a very robust thing. So you can explore a lot of the space very efficiently. There's one thing that's really, really important to share, and this is the most exciting thing for me, is after you do the ranking, you get an ELO score, and you can track a user's first join date, the first date they submit a model to Chaiverse, they almost always get a terrible ELO, right? So let's say the first submission they get an ELO of 1,100 or 1,000 or something, and you can see that they iterate and they iterate and iterate, and it will be like, no improvement, no improvement, no improvement, and then boom. Do you give them any data, or do you have to come up with this themselves? We do, we do, we do, we do. We try and strike a balance between giving them data that's very useful, you've got to be compliant with GDPR, which is like, you have to work very hard to preserve the privacy of users of your app. So we try to give them as much signal as possible, to be helpful. The minimum is we're just going to give you a score, right? That's the minimum. But that alone is people can optimize a score pretty well, because they're able to come up with theories, submit it, does it work? No. A new theory, does it work? No. And then boom, as soon as they figure something out, they keep it, and then they iterate, and then boom,Alessio [00:51:46]: they figure something out, and they keep it. Last year, you had this post on your blog, cross-sourcing the lead to the 10 trillion parameter, AGI, and you call it a mixture of experts, recommenders. Yep. Any insights?William [00:51:58]: Updated thoughts, 12 months later? I think the odds, the timeline for AGI has certainly been pushed out, right? Now, this is in, I'm a controversial person, I don't know, like, I just think... You don't believe in scaling laws, you think AGI is further away. I think it's an S-curve. I think everything's an S-curve. And I think that the models have proven to just be far worse at reasoning than people sort of thought. And I think whenever I hear people talk about LLMs as reasoning engines, I sort of cringe a bit. I don't think that's what they are. I think of them more as like a simulator. I think of them as like a, right? So they get trained to predict the next most likely token. It's like a physics simulation engine. So you get these like games where you can like construct a bridge, and you drop a car down, and then it predicts what should happen. And that's really what LLMs are doing. It's not so much that they're reasoning, it's more that they're just doing the most likely thing. So fundamentally, the ability for people to add in intelligence, I think is very limited. What most people would consider intelligence, I think the AI is not a crowdsourcing problem, right? Now with Wikipedia, Wikipedia crowdsources knowledge. It doesn't crowdsource intelligence. So it's a subtle distinction. AI is fantastic at knowledge. I think it's weak at intelligence. And a lot, it's easy to conflate the two because if you ask it a question and it gives you, you know, if you said, who was the seventh president of the United States, and it gives you the correct answer, I'd say, well, I don't know the answer to that. And you can conflate that with intelligence. But really, that's a question of knowledge. And knowledge is really this thing about saying, how can I store all of this information? And then how can I retrieve something that's relevant? Okay, they're fantastic at that. They're fantastic at storing knowledge and retrieving the relevant knowledge. They're superior to humans in that regard. And so I think we need to come up for a new word. How does one describe AI should contain more knowledge than any individual human? It should be more accessible than any individual human. That's a very powerful thing. That's superswyx [00:54:07]: powerful. But what words do we use to describe that? We had a previous guest on Exa AI that does search. And he tried to coin super knowledge as the opposite of super intelligence.William [00:54:20]: Exactly. I think super knowledge is a more accurate word for it.swyx [00:54:24]: You can store more things than any human can.William [00:54:26]: And you can retrieve it better than any human can as well. And I think it's those two things combined that's special. I think that thing will exist. That thing can be built. And I think you can start with something that's entertaining and fun. And I think, I often think it's like, look, it's going to be a 20 year journey. And we're in like, year four, or it's like the web. And this is like 1998 or something. You know, you've got a long, long way to go before the Amazon.coms are like these huge, multi trillion dollar businesses that every single person uses every day. And so AI today is very simplistic. And it's fundamentally the way we're using it, the flywheels, and this ability for how can everyone contribute to it to really magnify the value that it brings. Right now, like, I think it's a bit sad. It's like, right now you have big labs, I'm going to pick on open AI. And they kind of go to like these human labelers. And they say, we're going to pay you to just label this like subset of questions that we want to get a really high quality data set, then we're going to get like our own computers that are really powerful. And that's kind of like the thing. For me, it's so much like Encyclopedia Britannica. It's like insane. All the people that were interested in blockchain, it's like, well, this is this is what needs to be decentralized, you need to decentralize that thing. Because if you distribute it, people can generate way more data in a distributed fashion, way more, right? You need the incentive. Yeah, of course. Yeah. But I mean, the, the, that's kind of the exciting thing about Wikipedia was it's this understanding, like the incentives, you don't need money to incentivize people. You don't need dog coins. No. Sometimes, sometimes people get the satisfaction fro

Masters of Privacy (ES)
Liliana Acosta: el rol de la DPO en EE.UU, ámbito universitario, derecho comparado y garantías de los encargados del tratamiento

Masters of Privacy (ES)

Play Episode Listen Later Jan 7, 2025 50:16


Liliana Acosta (DPO, Utah State University) es abogada especializada en protección de datos, gobernanza y gestión de riesgos, formada en Colombia y habiendo trabajado también en Guatemala. Con ella hemos analizado las particularidades asociadas a estructurar y gestionar un programa de protección de datos en Estados Unidos, incluyendo actividades de marketing en el ámbito universitario y diferencias importantes en lo relativo al solapamiento normativo y la gestión de encargados o sub-encargados.  Referencias: Liliana Acosta Santacruz en LinkedIn FERPA: Family Educational Rights and Privacy Act GLBA: Gramm-Leach-Bliley Act HIPAA: Health Insurance Portability and Accountability Act COPPA: Children Online Privacy Protection Act Utah Consumer Privacy Act Utah Government Data Privacy Act NIST Privacy Framework

Serious Privacy
From Renovations to Resolutions: 2024 Recap (see ya' in 2025)

Serious Privacy

Play Episode Listen Later Dec 31, 2024 44:09


Send us a textIn this final episode of the year, hosts Paul Breitbarth, Dr. K Royal, and Ralph O'Brien discuss the highlights of the fifth season of the Serious Privacy podcast, reflecting on the pivotal moments in privacy throughout 2024. They share insights into their favorite episodes, the progression and challenges within global privacy laws, and personal milestones. The hosts also predict what to expect in the privacy landscape in 2025, including advancements in legislation and regulations. As they sign off for the year, they extend their best wishes for the new year and invite listeners to provide feedback and suggestions for future episodes, marking the transition to a new season with an elevated reflectiveness and anticipation. We thank all of our fans and guests and especially our sponsor TrustArc for their support and enthusiasm.  If you have comments or questions, find us on LinkedIn and IG @seriousprivacy, and on Blue Sky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! Proudly sponsored by TrustArc. Learn more about NymityAI at https://trustarc.com/nymityai-beta/ #heartofprivacy #europaulb #seriousprivacy #privacy #dataprotection #cybersecuritylaw #CPO #DPO #CISO

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver. Today, we're proud to share Loubna's highly anticipated talk (slides here)!Synthetic DataWe called out the Synthetic Data debate at last year's NeurIPS, and no surprise that 2024 was dominated by the rise of synthetic data everywhere:* Apple's Rephrasing the Web, Microsoft's Phi 2-4 and Orca/AgentInstruct, Tencent's Billion Persona dataset, DCLM, and HuggingFace's FineWeb-Edu, and Loubna's own Cosmopedia extended the ideas of synthetic textbook and agent generation to improve raw web scrape dataset quality* This year we also talked to the IDEFICS/OBELICS team at HuggingFace who released WebSight this year, the first work on code-vs-images synthetic data.* We called Llama 3.1 the Synthetic Data Model for its extensive use (and documentation!) of synthetic data in its pipeline, as well as its permissive license. * Nemotron CC and Nemotron-4-340B also made a big splash this year for how they used 20k items of human data to synthesize over 98% of the data used for SFT/PFT.* Cohere introduced Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress observing gains of up to 56.5% improvement in win rates comparing multiple teachers vs the single best teacher model* In post training, AI2's Tülu3 (discussed by Luca in our Open Models talk) and Loubna's Smol Talk were also notable open releases this year.This comes in the face of a lot of scrutiny and criticism, with Scale AI as one of the leading voices publishing AI models collapse when trained on recursively generated data in Nature magazine bringing mainstream concerns to the potential downsides of poor quality syndata:Part of the concerns we highlighted last year on low-background tokens are coming to bear: ChatGPT contaminated data is spiking in every possible metric:But perhaps, if Sakana's AI Scientist pans out this year, we will have mostly-AI AI researchers publishing AI research anyway so do we really care as long as the ideas can be verified to be correct?Smol ModelsMeta surprised many folks this year by not just aggressively updating Llama 3 and adding multimodality, but also adding a new series of “small” 1B and 3B “on device” models this year, even working on quantized numerics collaborations with Qualcomm, Mediatek, and Arm. It is near unbelievable that a 1B model today can qualitatively match a 13B model of last year:and the minimum size to hit a given MMLU bar has come down roughly 10x in the last year. We have been tracking this proxied by Lmsys Elo and inference price:The key reads this year are:* MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases* Apple Intelligence Foundation Language Models* Hymba: A Hybrid-head Architecture for Small Language Models* Loubna's SmolLM and SmolLM2: a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters on the pareto efficiency frontier.* and Moondream, which we already covered in the 2024 in Vision talkFull Talk on YouTubeplease like and subscribe!Timestamps* [00:00:05] Loubna Intro* [00:00:33] The Rise of Synthetic Data Everywhere* [00:02:57] Model Collapse* [00:05:14] Phi, FineWeb, Cosmopedia - Synthetic Textbooks* [00:12:36] DCLM, Nemotron-CC* [00:13:28] Post Training - AI2 Tulu, Smol Talk, Cohere Multilingual Arbitrage* [00:16:17] Smol Models* [00:18:24] On Device Models* [00:22:45] Smol Vision Models* [00:25:14] What's NextTranscript2024 in Synthetic Data and Smol Models[00:00:00] ​[00:00:05] Loubna Intro[00:00:05] Speaker: ​I'm very happy to be here. Thank you for the invitation. So I'm going to be talking about synthetic data in 2024. And then I'm going to be talking about small on device models. So I think the most interesting thing about synthetic data this year is that like now we have it everywhere in the large language models pipeline.[00:00:33] The Rise of Synthetic Data Everywhere[00:00:33] Speaker: I think initially, synthetic data was mainly used just for post training, because naturally that's the part where we needed human annotators. And then after that, we realized that we don't really have good benchmarks to [00:01:00] measure if models follow instructions well, if they are creative enough, or if they are chatty enough, so we also started using LLMs as judges.[00:01:08] Speaker: Thank you. And I think this year and towards the end of last year, we also went to the pre training parts and we started generating synthetic data for pre training to kind of replace some parts of the web. And the motivation behind that is that you have a lot of control over synthetic data. You can control your prompt and basically also the kind of data that you generate.[00:01:28] Speaker: So instead of just trying to filter the web, you could try to get the LLM to generate what you think the best web pages could look like and then train your models on that. So this is how we went from not having synthetic data at all in the LLM pipeline to having it everywhere. And so the cool thing is like today you can train an LLM with like an entirely synthetic pipeline.[00:01:49] Speaker: For example, you can use our Cosmopedia datasets and you can train a 1B model on like 150 billion tokens that are 100 percent synthetic. And those are also of good quality. And then you can [00:02:00] instruction tune the model on a synthetic SFT dataset. You can also do DPO on a synthetic dataset. And then to evaluate if the model is good, you can use.[00:02:07] Speaker: A benchmark that uses LLMs as a judge, for example, MTBench or AlpacaEvil. So I think this is like a really mind blowing because like just a few years ago, we wouldn't think this is possible. And I think there's a lot of concerns about model collapse, and I'm going to talk about that later. But we'll see that like, if we use synthetic data properly and we curate it carefully, that shouldn't happen.[00:02:29] Speaker: And the reason synthetic data is very popular right now is that we have really strong models, both open and closed. It is really cheap and fast to use compared to human annotations, which cost a lot and take a lot of time. And also for open models right now, we have some really good inference frameworks.[00:02:47] Speaker: So if you have enough GPUs, it's really easy to spawn these GPUs and generate like a lot of synthetic data. Some examples are VLM, TGI, and TensorRT.[00:02:57] Model Collapse[00:02:57] Speaker: Now let's talk about the elephant in the room, model [00:03:00] collapse. Is this the end? If you look at the media and all of like, for example, some papers in nature, it's really scary because there's a lot of synthetic data out there in the web.[00:03:09] Speaker: And naturally we train on the web. So we're going to be training a lot of synthetic data. And if model collapse is going to happen, we should really try to take that seriously. And the other issue is that, as I said, we think, a lot of people think the web is polluted because there's a lot of synthetic data.[00:03:24] Speaker: And for example, when we're building fine web datasets here at Guillerm and Hinek, we're interested in like, how much synthetic data is there in the web? So there isn't really a method to properly measure the amount of synthetic data or to save a webpage synthetic or not. But one thing we can do is to try to look for like proxy words, for example, expressions like as a large language model or words like delve that we know are actually generated by chat GPT.[00:03:49] Speaker: We could try to measure the amount of these words in our data system and compare them to the previous years. For example, here, we measured like a, these words ratio in different dumps of common crawl. [00:04:00] And we can see that like the ratio really increased after chat GPT's release. So if we were to say that synthetic data amount didn't change, you would expect this ratio to stay constant, which is not the case.[00:04:11] Speaker: So there's a lot of synthetic data probably on the web, but does this really make models worse? So what we did is we trained different models on these different dumps. And we then computed their performance on popular, like, NLP benchmarks, and then we computed the aggregated score. And surprisingly, you can see that the latest DOMs are actually even better than the DOMs that are before.[00:04:31] Speaker: So if there's some synthetic data there, at least it did not make the model's worse. Yeah, which is really encouraging. So personally, I wouldn't say the web is positive with Synthetic Data. Maybe it's even making it more rich. And the issue with like model collapse is that, for example, those studies, they were done at like a small scale, and you would ask the model to complete, for example, a Wikipedia paragraph, and then you would train it on these new generations, and you would do that every day.[00:04:56] Speaker: iteratively. I think if you do that approach, it's normal to [00:05:00] observe this kind of behavior because the quality is going to be worse because the model is already small. And then if you train it just on its generations, you shouldn't expect it to become better. But what we're really doing here is that we take a model that is very large and we try to distill its knowledge into a model that is smaller.[00:05:14] Phi, FineWeb, Cosmopedia - Synthetic Textbooks[00:05:14] Speaker: And in this way, you can expect to get like a better performance for your small model. And using synthetic data for pre-training has become really popular. After the textbooks are all you need papers where Microsoft basically trained a series of small models on textbooks that were using a large LLM.[00:05:32] Speaker: And then they found that these models were actually better than models that are much larger. So this was really interesting. It was like first of its time, but it was also met with a lot of skepticism, which is a good thing in research. It pushes you to question things because the dataset that they trained on was not public, so people were not really sure if these models are really good or maybe there's just some data contamination.[00:05:55] Speaker: So it was really hard to check if you just have the weights of the models. [00:06:00] And as Hugging Face, because we like open source, we tried to reproduce what they did. So this is our Cosmopedia dataset. We basically tried to follow a similar approach to what they documented in the paper. And we created a synthetic dataset of textbooks and blog posts and stories that had almost 30 billion tokens.[00:06:16] Speaker: And we tried to train some models on that. And we found that like the key ingredient to getting a good data set that is synthetic is trying as much as possible to keep it diverse. Because if you just throw the same prompts as your model, like generate like a textbook about linear algebra, and even if you change the temperature, the textbooks are going to look alike.[00:06:35] Speaker: So there's no way you could scale to like millions of samples. And the way you do that is by creating prompts that have some seeds that make them diverse. In our case, the prompt, we would ask the model to generate a textbook, but make it related to an extract from a webpage. And also we try to frame it within, to stay within topic.[00:06:55] Speaker: For example, here, we put like an extract about cardiovascular bioimaging, [00:07:00] and then we ask the model to generate a textbook related to medicine that is also related to this webpage. And this is a really nice approach because there's so many webpages out there. So you can. Be sure that your generation is not going to be diverse when you change the seed example.[00:07:16] Speaker: One thing that's challenging with this is that you want the seed samples to be related to your topics. So we use like a search tool to try to go all of fine web datasets. And then we also do a lot of experiments with the type of generations we want the model to generate. For example, we ask it for textbooks for middle school students or textbook for college.[00:07:40] Speaker: And we found that like some generation styles help on some specific benchmarks, while others help on other benchmarks. For example, college textbooks are really good for MMLU, while middle school textbooks are good for benchmarks like OpenBookQA and Pico. This is like a sample from like our search tool.[00:07:56] Speaker: For example, you have a top category, which is a topic, and then you have some [00:08:00] subtopics, and then you have the topic hits, which are basically the web pages in fine web does belong to these topics. And here you can see the comparison between Cosmopedia. We had two versions V1 and V2 in blue and red, and you can see the comparison to fine web, and as you can see throughout the training training on Cosmopedia was consistently better.[00:08:20] Speaker: So we managed to get a data set that was actually good to train these models on. It's of course so much smaller than FineWeb, it's only 30 billion tokens, but that's the scale that Microsoft data sets was, so we kind of managed to reproduce a bit what they did. And the data set is public, so everyone can go there, check if everything is all right.[00:08:38] Speaker: And now this is a recent paper from NVIDIA, Neumatron CC. They took things a bit further, and they generated not a few billion tokens, but 1. 9 trillion tokens, which is huge. And we can see later how they did that. It's more of, like, rephrasing the web. So we can see today that there's, like, some really huge synthetic datasets out there, and they're public, so, [00:09:00] like, you can try to filter them even further if you want to get, like, more high quality corpses.[00:09:04] Speaker: So for this, rephrasing the web this approach was suggested in this paper by Pratyush, where basically in this paper, they take some samples from C4 datasets, and then they use an LLM to rewrite these samples into a better format. For example, they ask an LLM to rewrite the sample into a Wikipedia passage or into a Q& A page.[00:09:25] Speaker: And the interesting thing in this approach is that you can use a model that is Small because it doesn't, rewriting doesn't require knowledge. It's just rewriting a page into a different style. So the model doesn't need to have like knowledge that is like extensive of what is rewriting compared to just asking a model to generate a new textbook and not giving it like ground truth.[00:09:45] Speaker: So here they rewrite some samples from C4 into Q& A, into Wikipedia, and they find that doing this works better than training just on C4. And so what they did in Nemo Trans CC is a similar approach. [00:10:00] They rewrite some pages from Common Crawl for two reasons. One is to, like improve Pages that are low quality, so they rewrite them into, for example, Wikipedia page, so they look better.[00:10:11] Speaker: And another reason is to create more diverse datasets. So they have a dataset that they already heavily filtered, and then they take these pages that are already high quality, and they ask the model to rewrite them in Question and Answer format. into like open ended questions or like multi choice questions.[00:10:27] Speaker: So this way they can reuse the same page multiple times without fearing like having multiple duplicates, because it's the same information, but it's going to be written differently. So I think that's also a really interesting approach for like generating synthetic data just by rephrasing the pages that you already have.[00:10:44] Speaker: There's also this approach called Prox where they try to start from a web page and then they generate a program which finds how to write that page to make it better and less noisy. For example, here you can see that there's some leftover metadata in the web page and you don't necessarily want to keep that for training [00:11:00] your model.[00:11:00] Speaker: So So they train a model that can generate programs that can like normalize and remove lines that are extra. So I think this approach is also interesting, but it's maybe less scalable than the approaches that I presented before. So that was it for like rephrasing and generating new textbooks.[00:11:17] Speaker: Another approach that I think is really good and becoming really popular for using synthetic data for pre training is basically building a better classifiers. For filtering the web for example, here we release the data sets called fine web edu. And the way we built it is by taking Llama3 and asking it to rate the educational content of web pages from zero to five.[00:11:39] Speaker: So for example, if a page is like a really good textbook that could be useful in a school setting, it would get a really high score. And if a page is just like an advertisement or promotional material, it would get a lower score. And then after that, we take these synthetic annotations and we train a classifier on them.[00:11:57] Speaker: It's a classifier like a BERT model. [00:12:00] And then we run this classifier on all of FineWeb, which is a 15 trillion tokens dataset. And then we only keep the pages that have like a score that's higher than 3. So for example, in our case, we went from 15 trillion tokens to 3. to just 1. 5 trillion tokens. Those are really highly educational.[00:12:16] Speaker: And as you can see here, a fine web EDU outperforms all the other public web datasets by a larger margin on a couple of benchmarks here, I show the aggregated score and you can see that this approach is really effective for filtering web datasets to get like better corpuses for training your LLMs.[00:12:36] DCLM, Nemotron-CC[00:12:36] Speaker: Others also try to do this approach. There's, for example, the DCLM datasets where they also train the classifier, but not to detect educational content. Instead, they trained it on OpenHermes dataset, which is a dataset for instruction tuning. And also they explain like IAM5 subreddits, and then they also get really high quality dataset which is like very information dense and can help [00:13:00] you train some really good LLMs.[00:13:01] Speaker: And then Nemotron Common Crawl, they also did this approach, but instead of using one classifier, they used an ensemble of classifiers. So they used, for example, the DCLM classifier, and also classifiers like the ones we used in FineWebEducational, and then they combined these two. Scores into a, with an ensemble method to only retain the best high quality pages, and they get a data set that works even better than the ones we develop.[00:13:25] Speaker: So that was it for like synthetic data for pre-training.[00:13:28] Post Training - AI2 Tulu, Smol Talk, Cohere Multilingual Arbitrage[00:13:28] Speaker: Now we can go back to post training. I think there's a lot of interesting post training data sets out there. One that was released recently, the agent instructs by Microsoft where they basically try to target some specific skills. And improve the performance of models on them.[00:13:43] Speaker: For example, here, you can see code, brain teasers, open domain QA, and they managed to get a dataset that outperforms that's when fine tuning Mistral 7b on it, it outperforms the original instruct model that was released by Mistral. And as I said, to get good synthetic data, you really [00:14:00] have to have a framework to make sure that your data is diverse.[00:14:03] Speaker: So for example, for them, they always. And then they see the generations on either source code or raw text documents, and then they rewrite them to make sure they're easier to generate instructions from, and then they use that for their like instruction data generation. There's also the Tool3SFT mixture, which was released recently by Allen AI.[00:14:23] Speaker: It's also really good quality and it covers a wide range of tasks. And the way they make sure that this dataset is diverse is by using personas from the persona hub datasets. Which is basically a data set of like I think over a million personas. And for example, in the tool mixture to generate like a new code snippet, they would give like the model persona, for example, a machine learning researcher interested in neural networks, and then ask it to generate like a coding problem.[00:14:49] Speaker: This way you make sure that your data set is really diverse, and then you can further filter the data sets, for example, using the reward models. We also released a dataset called Smalltalk, [00:15:00] and we also tried to cover the wide range of tasks, and as you can see here, for example, when fine tuning Mistral 7b on the dataset, we also outperformed the original Mistral instructs on a number of benchmarks, notably on mathematics and instruction following with ifevil.[00:15:18] Speaker: Another paper that's really interesting I wanted to mention is this one called Multilingual Data Arbitrage by Cohere. And basically they want to generate a data set for post training that is multilingual. And they have a really interesting problem. It's the fact that there isn't like one model that's really good at all the languages they wanted.[00:15:36] Speaker: So what they do is that like they use not just one teacher model, but multiple teachers. And then they have a router which basically sends the prompts they have to all these models. And then they get the completions and they have a reward model that traces all these generations and only keeps the best one.[00:15:52] Speaker: And this is like arbitrage and finance. So well, I think what's interesting in this, it shows that like synthetic data, it doesn't have to come from a single model. [00:16:00] And because we have so many good models now, you could like pull these models together and get like a dataset that's really high quality and that's diverse and that's covers all your needs.[00:16:12] Speaker: I was supposed to put a meme there, but. Yeah, so that was it for like a synthetic data.[00:16:17] Smol Models[00:16:17] Speaker: Now we can go to see what's happening in the small models field in 2024. I don't know if you know, but like now we have some really good small models. For example, Lama 3. 2 1B is. It matches Lama 2. 13b from, that was released last year on the LMSYS arena, which is basically the default go to leaderboard for evaluating models using human evaluation.[00:16:39] Speaker: And as you can see here, the scores of the models are really close. So I think we've made like hugely forward in terms of small models. Of course, that's one, just one data point, but there's more. For example, if you look at this chart from the Quint 2. 5 blog post, it shows that today we have some really good models that are only like 3 billion parameters [00:17:00] and 4 billion that score really high on MMLU.[00:17:03] Speaker: Which is a really popular benchmark for evaluating models. And you can see here that the red, the blue dots have more than 65 on MMLU. And the grey ones have less. And for example, Llama33b had less. So now we have a 3b model that outperforms a 33b model that was released earlier. So I think now people are starting to realize that like, we shouldn't just scale and scale models, but we should try to make them more efficient.[00:17:33] Speaker: I don't know if you knew, but you can also chat with a 3B plus model on your iPhone. For example, here, this is an app called PocketPal, where you can go and select a model from Hugging Face. It has a large choice. For example, here we loaded the 5. 3. 5, which is 3. 8 billion parameters on this iPhone. And we can chat with this and you can see that even the latency is also acceptable.[00:17:57] Speaker: For example, here, I asked it to give me a joke about [00:18:00] NeurIPS. So let's see what it has to say.[00:18:06] Speaker: Okay, why did the neural network attend NeurIPS? Because it heard there would be a lot of layers and fun and it wanted to train its sense of humor. So not very funny, but at least it can run on device. Yeah, so I think now we have good small models, but we also have like good frameworks and tools to use these small models.[00:18:24] On Device Models[00:18:24] Speaker: So I think we're really close to having like really on edge and on device models that are really good. And I think for a while we've had this narrative. But just training larger models is better. Of course, this is supported by science scaling laws. As you can see here, for example, when we scale the model size, the loss is lower and obviously you get a better model.[00:18:46] Speaker: But and we can see this, for example, in the GPT family of models, how we went from just a hundred million parameters to more than a trillion. parameters. And of course, we all observed the performance improvement when using the latest model. But [00:19:00] one thing that we shouldn't forget is that when we scale the model, we also scale the inference costs and time.[00:19:05] Speaker: And so the largest models were are going to cost so much more. So I think now instead of just building larger models, we should be focusing on building more efficient models. It's no longer a race for the largest models since these models are really expensive to run and they require like a really good infrastructure to do that and they cannot run on, for example, consumer hardware.[00:19:27] Speaker: And when you try to build more efficient models that match larger models, that's when you can really unlock some really interesting on device use cases. And I think a trend that we're noticing now is the trend of training smaller models longer. For example, if you compare how much, how long LLAMA was trained compared to LLAMA3, there is a huge increase in the pre training length.[00:19:50] Speaker: LLAMA was trained on 1 trillion tokens, but LLAMA3 8b was trained on 15 trillion tokens. So Meta managed to get a model that's the same size, but But it performs so much [00:20:00] better by choosing to like spend the sacrifice during training, because as we know, training is a one time cost, but inference is something that's ongoing.[00:20:08] Speaker: If we want to see what are like the small models reads in 2024, I think this mobile LLM paper by Meta is interesting. They try to study different models that are like have the less than 1 billion parameters and find which architecture makes most sense for these models. For example, they find that depth is more important than width.[00:20:29] Speaker: So it's more important to have models that have like more layers than just one. making them more wide. They also find that GQA helps, that tying the embedding helps. So I think it's a nice study overall for models that are just a few hundred million parameters. There's also the Apple intelligence tech report, which is interesting.[00:20:48] Speaker: So for Apple intelligence, they had two models, one that was like on server and another model that was on device. It had 3 billion parameters. And I think the interesting part is that they trained this model using [00:21:00] pruning. And then distillation. And for example, they have this table where they show that, like, using pruning and distillation works much better than training from scratch.[00:21:08] Speaker: And they also have some interesting insights about, like, how they specialize their models on specific tasks, like, for example, summarization and rewriting. There's also this paper by NVIDIA that was released recently. I think you've already had a talk about, like, hybrid models that was all interesting.[00:21:23] Speaker: And this model, they used, like, a hybrid architecture between state space models and transformers. And they managed to train a 1B model that's really performant without needing to train it on a lot of tokens. And regarding our work, we just recently released SmallM2, so it's a series of three models, which are the best in class in each model size.[00:21:46] Speaker: For example, our 1. 7b model outperforms Lama 1b and also Qt 2. 5. And how we managed to train this model is the following. That's where you spent a lot of time trying to curate the pre training datasets. We did a lot of [00:22:00] ablations, trying to find which datasets are good and also how to mix them. We also created some new math and code datasets that we're releasing soon.[00:22:08] Speaker: But you basically really spent a lot of time trying to find what's the best mixture that you can train these models on. And then we spent some time trying to like we also trained these models for very long. For example, small M1 was trained only on 1 trillion tokens, but this model is trained on 11 trillion tokens.[00:22:24] Speaker: And we saw that the performance kept improving. The models didn't really plateau mid training, which I think is really interesting. It shows that you can train such small models for very long and keep getting performance gains. What's interesting about SmallLM2 is that it's fully open. We also released, like the pre training code base, the fine tuning code, the datasets, and also evaluation in this repository.[00:22:45] Smol Vision Models[00:22:45] Speaker: Also there's, like, really interesting small models for text, but also for vision. For example, here you can see SmallVLM, which is a 2B model that's really efficient. It doesn't consume a lot of RAM, and it also has a good performance. There's also Moondream 0. [00:23:00] 5b, which was released recently. It's like the smallest visual language model.[00:23:04] Speaker: And as you can see, there isn't like a big trade off compared to Moondream 2b. So now I showed you that we have some really good small models. We also have the tools to use them, but why should you consider using small models and when? I think, like, small models are really interesting because of the on device feature.[00:23:23] Speaker: Because these models are small and they can run fast, you can basically run them on your laptop, but also on your mobile phone. And this means that your dataset stays locally. You don't have to send your queries to third parties. And this really enhances privacy. That was, for example, one of the big selling points for Apple Intelligence.[00:23:42] Speaker: Also, right now, we really have a lot of work to do. So many frameworks to do on device inference. For example, there's MLX, MLC, Llama, CPP, Transformers, JS. So we have a lot of options and each of them have like great features. So you have so many options for doing that. Small models are also really powerful if you choose to specialize them.[00:24:00][00:24:00] Speaker: For example, here there's a startup called Numind, which took small LM and then they fine tuned it on text extraction datasets. And they managed to get a model that's not very far from models that are much larger. So I think text extraction is like one use case where small models can be really performant and it makes sense to use them instead of just using larger models.[00:24:19] Speaker: You can also chat with these models in browser. For example, here, you can go there, you can load the model, you can even turn off your internet and just start chatting with the model locally. Speaking of text extraction, if you don't want to fine tune the models, there's a really good method of structure generation.[00:24:36] Speaker: We can basically force the models to follow a JSON schema that you defined. For example, here, we try to force the model to follow a schema for extracting key information from GitHub issues. So you can input free text, which is a complaint about a GitHub repository, something not working. And then you can run it there and the model can extract anything that is relevant for your GitHub issue creation.[00:24:58] Speaker: For example, the [00:25:00] priority, for example, here, priority is high, the type of the issue bug, and then a title and the estimation of how long this will take to fix. And you can just like do this in the browser, you can transform your text into a GitHub issue that's properly formatted.[00:25:14] What's Next[00:25:14] Speaker: So what's next for synthetic data and small models?[00:25:18] Speaker: I think that domain specific synthetic data is going to be, it's already important, it's going to be even more important. For example, generating synthetic data for math. I think this really would help improve the reasoning of a lot of models. And a lot of people are doing it, for example, Quint 2. 12 math, everyone's trying to reproduce a one.[00:25:37] Speaker: And so I think for synthetic data, trying to specialize it on some domains is going to be really important. And then for small models, I think specializing them through fine tuning, it's also going to be really important because I think a lot of companies are just trying to use these large models because they are better.[00:25:53] Speaker: But on some tasks, I think you can already get decent performance with small models. So you don't need to Pay like a [00:26:00] cost that's much larger just to make your model better at your task by a few percent. And this is not just for text. And I think it also applies for other modalities like vision and audio.[00:26:11] Speaker: And I think you should also watch out for on device frameworks and applications. For example, like the app I showed, or lama, all these frameworks are becoming really popular and I'm pretty sure that we're gonna get like more of them in 2025. And users really like that. Maybe for other, I should also say hot take.[00:26:28] Speaker: I think that like in AI, we just started like with fine tuning, for example, trying to make BERT work on some specific use cases, and really struggling to do that. And then we had some models that are much larger. So we just switched to like prompt engineering to get the models And I think we're going back to fine tuning where we realize these models are really costly.[00:26:47] Speaker: It's better to use just a small model or try to specialize it. So I think it's a little bit of a cycle and we're going to start to see like more fine tuning and less of just like a prompt engineering the models. So that was my talk. Thank you for following. And if you have [00:27:00] any questions, we can take them now. Get full access to Latent Space at www.latent.space/subscribe

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all our LS supporters who helped fund the venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver.Since Nathan Lambert ( Interconnects ) joined us for the hit RLHF 201 episode at the start of this year, it is hard to overstate how much Open Models have exploded this past year. In 2023 only five names were playing in the top LLM ranks, Mistral, Mosaic's MPT, TII UAE's Falcon, Yi from Kai-Fu Lee's 01.ai, and of course Meta's Llama 1 and 2. This year a whole cast of new open models have burst on the scene, from Google's Gemma and Cohere's Command R, to Alibaba's Qwen and Deepseek models, to LLM 360 and DCLM and of course to the Allen Institute's OLMo, OL MOE, Pixmo, Molmo, and Olmo 2 models. We were honored to host Luca Soldaini, one of the research leads on the Olmo series of models at AI2.Pursuing Open Model research comes with a lot of challenges beyond just funding and access to GPUs and datasets, particularly the regulatory debates this year across Europe, California and the White House. We also were honored to hear from and Sophia Yang, head of devrel at Mistral, who also presented a great session at the AI Engineer World's Fair Open Models track!Full Talk on YouTubePlease like and subscribe!Timestamps* 00:00 Welcome to Latent Space Live * 00:12 Recap of 2024: Best Moments and Keynotes * 01:22 Explosive Growth of Open Models in 2024 * 02:04 Challenges in Open Model Research * 02:38 Keynote by Luca Soldani: State of Open Models * 07:23 Significance of Open Source AI Licenses * 11:31 Research Constraints and Compute Challenges * 13:46 Fully Open Models: A New Trend * 27:46 Mistral's Journey and Innovations * 32:57 Interactive Demo: Lachat Capabilities * 36:50 Closing Remarks and NetworkingTranscriptSession3Audio[00:00:00] AI Charlie: Welcome to Latent Space Live, our first mini conference held at NeurIPS 2024 in Vancouver. This is Charlie, your AI co host. As a special treat this week, we're recapping the best of 2024 going domain by domain. We sent out a survey to the over 900 of you who told us what you wanted, and then invited the best speakers in the latent space network to cover each field.[00:00:28] AI Charlie: 200 of you joined us in person throughout the day, with over 2, 200 watching live online. Our next keynote covers the state of open models in 2024, with Luca Soldani and Nathan Lambert of the Allen Institute for AI, with a special appearance from Dr. Sophia Yang of Mistral. Our first hit episode of 2024 was with Nathan Lambert on RLHF 201 back in January.[00:00:57] AI Charlie: Where he discussed both reinforcement learning for language [00:01:00] models and the growing post training and mid training stack with hot takes on everything from constitutional AI to DPO to rejection sampling and also previewed the sea change coming to the Allen Institute. And to Interconnects, his incredible substack on the technical aspects of state of the art AI training.[00:01:18] AI Charlie: We highly recommend subscribing to get access to his Discord as well. It is hard to overstate how much open models have exploded this past year. In 2023, only five names were playing in the top LLM ranks. Mistral, Mosaics MPT, and Gatsby. TII UAE's Falcon, Yi, from Kaifu Lee's 01. ai, And of course, Meta's Lama 1 and 2.[00:01:43] AI Charlie: This year, a whole cast of new open models have burst on the scene. From Google's Jemma and Cohere's Command R, To Alibaba's Quen and DeepSeq models, to LLM360 and DCLM, and of course, to the Allen Institute's OLMO, [00:02:00] OLMOE, PIXMO, MOLMO, and OLMO2 models. Pursuing open model research comes with a lot of challenges beyond just funding and access to GPUs and datasets, particularly the regulatory debates this year across Europe.[00:02:14] AI Charlie: California and the White House. We also were honored to hear from Mistral, who also presented a great session at the AI Engineer World's Fair Open Models track. As always, don't forget to check the show notes for the YouTube link to their talk, as well as their slides. Watch out and take care.[00:02:35] Luca Intro[00:02:35] Luca Soldaini: Cool. Yeah, thanks for having me over. I'm Luca. I'm a research scientist at the Allen Institute for AI. I threw together a few slides on sort of like a recap of like interesting themes in open models for, for 2024. Have about maybe 20, 25 minutes of slides, and then we can chat if there are any questions.[00:02:57] Luca Soldaini: If I can advance to the next slide. [00:03:00] Okay, cool. So I did the quick check of like, to sort of get a sense of like, how much 2024 was different from 2023. So I went on Hugging Face and sort of get, tried to get a picture of what kind of models were released in 2023 and like, what do we get in 2024?[00:03:16] Luca Soldaini: 2023 we get, we got things like both LLAMA 1 and 2, we got Mistral, we got MPT, Falcon models, I think the YI model came in at the end. Tail end of the year. It was a pretty good year. But then I did the same for 2024. And it's actually quite stark difference. You have models that are, you know, reveling frontier level.[00:03:38] Luca Soldaini: Performance of what you can get from closed models from like Quen, from DeepSeq. We got Llama3. We got all sorts of different models. I added our own Olmo at the bottom. There's this growing group of like, Fully open models that I'm going to touch on a little bit later. But you know, just looking at the slides, it feels like 2024 [00:04:00] was just smooth sailing, happy knees, much better than previous year.[00:04:04] Luca Soldaini: And you know, you can plot you can pick your favorite benchmark Or least favorite, I don't know, depending on what point you're trying to make. And plot, you know, your closed model, your open model and sort of spin it in ways that show that, oh, you know open models are much closer to where closed models are today versus to Versus last year where the gap was fairly significant.[00:04:29] Luca Soldaini: So one thing that I think I don't know if I have to convince people in this room, but usually when I give this talks about like open models, there is always like this background question in, in, in people's mind of like, why should we use open models? APIs argument, you know, it's, it's. Just an HTTP request to get output from a, from one of the best model out there.[00:04:53] Luca Soldaini: Why do I have to set up infra and use local models? And there are really like two answer. There is the more [00:05:00] researchy answer for this, which is where it might be. Background lays, which is just research. If you want to do research on language models, research thrives on, on open models, there is like large swath of research on modeling, on how these models behave on evaluation and inference on mechanistic interpretability that could not happen at all if you didn't have open models they're also for AI builders, they're also like.[00:05:30] Luca Soldaini: Good use cases for using local models. You know, you have some, this is like a very not comprehensive slides, but you have things like there are some application where local models just blow closed models out of the water. So like retrieval, it's a very clear example. We might have like constraints like Edge AI applications where it makes sense.[00:05:51] Luca Soldaini: But even just like in terms of like stability, being able to say this model is not changing under the hood. It's, there's plenty of good cases for, [00:06:00] for open models. And the community is just not models. Is I stole this slide from one of the Quent2 announcement blog posts. But it's super cool to see like how much tech exists around open models and serving them on making them efficient and hosting them.[00:06:18] Luca Soldaini: It's pretty cool. And so. It's if you think about like where the term opens come from, comes from like the open source really open models meet the core tenants of, of open, of open source specifically when it comes around collaboration, there is truly a spirit, like through these open models, you can build on top of other people.[00:06:41] Luca Soldaini: innovation. We see a lot of these even in our own work of like, you know, as we iterate in the various versions of Alma it's not just like every time we collect from scratch all the data. No, the first step is like, okay, what are the cool data sources and datasets people have put [00:07:00] together for language model for training?[00:07:01] Luca Soldaini: Or when it comes to like our post training pipeline We one of the steps is you want to do some DPO and you use a lot of outputs of other models to improve your, your preference model. So it's really having like an open sort of ecosystem benefits and accelerates the development of open models.[00:07:23] The Definition of Open Models[00:07:23] Luca Soldaini: One thing that we got in 2024, which is not a specific model, but I thought it was really significant, is we first got we got our first open source AI definition. So this is from the open source initiative they've been generally the steward of a lot of the open source licenses when it comes to software and so they embarked on this journey in trying to figure out, okay, How does a license, an open source license for a model look like?[00:07:52] Luca Soldaini: Majority of the work is very dry because licenses are dry. So I'm not going to walk through the license step by [00:08:00] step, but I'm just going to pick out one aspect that is very good and then one aspect that personally feels like it needs improvement on the good side. This this open source AI license actually.[00:08:13] Luca Soldaini: This is very intuitive. If you ever build open source software and you have some expectation around like what open source looks like for software for, for AI, sort of matches your intuition. So, the weights need to be fairly available the code must be released with an open source license and there shouldn't be like license clauses that block specific use cases.[00:08:39] Luca Soldaini: So. Under this definition, for example, LLAMA or some of the QUEN models are not open source because the license says you can't use this model for this or it says if you use this model you have to name the output this way or derivative needs to be named that way. Those clauses don't meet open source [00:09:00] definition and so they will not be covered.[00:09:02] Luca Soldaini: The LLAMA license will not be covered under the open source definition. It's not perfect. One of the thing that, um, internally, you know, in discussion with with OSI, we were sort of disappointed is around the language. For data. So you might imagine that an open source AI model means a model where the data is freely available.[00:09:26] Luca Soldaini: There were discussion around that, but at the end of the day, they decided to go with a softened stance where they say a model is open source if you provide sufficient detail information. On how to sort of replicate the data pipeline. So you have an equivalent system, sufficient, sufficiently detailed.[00:09:46] Luca Soldaini: It's very, it's very fuzzy. Don't like that. An equivalent system is also very fuzzy. And this doesn't take into account the accessibility of the process, right? It might be that you provide enough [00:10:00] information, but this process costs, I don't know, 10 million to do. Now the open source definition. Like, any open source license has never been about accessibility, so that's never a factor in open source software, how accessible software is.[00:10:14] Luca Soldaini: I can make a piece of open source, put it on my hard drive, and never access it. That software is still open source, the fact that it's not widely distributed doesn't change the license, but practically there are expectations of like, what we want good open sources to be. So, it's, It's kind of sad to see that the data component in this license is not as, as, Open as some of us would like would like it to be.[00:10:40] Challenges for Open Models[00:10:40] Luca Soldaini: and I linked a blog post that Nathan wrote on the topic that it's less rambly and easier to follow through. One thing that in general, I think it's fair to say about the state of open models in 2024 is that we know a lot more than what we knew in, [00:11:00] in 2023. Like both on the training data, like And the pre training data you curate on like how to do like all the post training, especially like on the RL side.[00:11:10] Luca Soldaini: You know, 2023 was a lot of like throwing random darts at the board. I think 2024, we have clear recipes that, okay, don't get the same results as a closed lab because there is a cost in, in actually matching what they do. But at least we have a good sense of like, okay, this is, this is the path to get state of the art language model.[00:11:31] Luca Soldaini: I think that one thing that it's a downside of 2024 is that I think we are more research constrained in 2023. It feels that, you know, the barrier for compute that you need to, to move innovation along as just being right rising and rising. So like, if you go back to this slide, there is now this, this cluster of models that are sort of released by the.[00:11:57] Luca Soldaini: Compute rich club. Membership is [00:12:00] hotly debated. You know, some people don't want to be. Called the rich because it comes to expectations. Some people want to be called rich, but I don't know, there's debate, but like, these are players that have, you know, 10, 000, 50, 000 GPUs at minimum. And so they can do a lot of work and a lot of exploration and improving models that it's not very accessible.[00:12:21] Luca Soldaini: To give you a sense of like how I personally think about. Research budget for each part of the, of the language model pipeline is like on the pre training side, you can maybe do something with a thousand GPUs, really you want 10, 000. And like, if you want real estate of the art, you know, your deep seek minimum is like 50, 000 and you can scale to infinity.[00:12:44] Luca Soldaini: The more you have, the better it gets. Everyone on that side still complains that they don't have enough GPUs. Post training is a super wide sort of spectrum. You can do as little with like eight GPUs as long as you're able to [00:13:00] run, you know, a good version of, say, a LLAMA model, you can do a lot of work there.[00:13:05] Luca Soldaini: You can scale a lot of the methodology, just like scales with compute, right? If you're interested in you know, your open replication of what OpenAI's O1 is you're going to be on the 10K spectrum of our GPUs. Inference, you can do a lot with very few resources. Evaluation, you can do a lot with, well, I should say at least one GPUs if you want to evaluate GPUs.[00:13:30] Luca Soldaini: Open models but in general, like if you are, if you care a lot about intervention to do on this model, which it's my prefer area of, of research, then, you know, the resources that you need are quite, quite significant. Yeah. One other trends that has emerged in 2024 is this cluster of fully open models.[00:13:54] Luca Soldaini: So Omo the model that we built at ai, two being one of them and you know, it's nice [00:14:00] that it's not just us. There's like a cluster of other mostly research efforts who are working on this. And so it's good to to give you a primer of what like fully open means. So fully open, the easy way to think about it is instead of just releasing a model checkpoint that you run, you release a full recipe so that other people working on it.[00:14:24] Luca Soldaini: Working on that space can pick and choose whatever they want from your recipe and create their own model or improve on top of your model. You're giving out the full pipeline and all the details there instead of just like the end output. So I pull up the screenshot from our recent MOE model.[00:14:43] Luca Soldaini: And like for this model, for example, we released the model itself. Data that was trained on, the code, both for training and inference all the logs that we got through the training run, as well as every intermediate checkpoint and like the fact that you release different part of the pipeline [00:15:00] allows others to do really cool things.[00:15:02] Luca Soldaini: So for example, this tweet from early this year from folks in news research they use our pre training data to do a replication of the BitNet paper in the open. So they took just a Really like the initial part of a pipeline and then the, the thing on top of it. It goes both ways.[00:15:21] Luca Soldaini: So for example, for the Olmo2 model a lot of our pre trained data for the first stage of pre training was from this DCLM initiative that was led by folks Ooh, a variety of ins a variety of institutions. It was a really nice group effort. But you know, for When it was nice to be able to say, okay, you know, the state of the art in terms of like what is done in the open has improved.[00:15:46] AI2 Models - Olmo, Molmo, Pixmo etc[00:15:46] Luca Soldaini: We don't have to like do all this work from scratch to catch up the state of the art. We can just take it directly and integrate it and do our own improvements on top of that. I'm going to spend a few minutes doing like a [00:16:00] shameless plug for some of our fully open recipes. So indulge me in this.[00:16:05] Luca Soldaini: So a few things that we released this year was, as I was mentioning, there's OMOE model which is, I think still is state of the art MOE model in its size class. And it's also. Fully open, so every component of this model is available. We released a multi modal model called Molmo. Molmo is not just a model, but it's a full recipe of how you go from a text only model to a multi modal model, and we apply this recipe on top of Quent checkpoints, on top of Olmo checkpoints, as well as on top of OlmoE.[00:16:37] Luca Soldaini: And I think there'd be a replication doing that on top of Mistral as well. The post training side we recently released 2. 0. 3. Same story. This is a recipe on how you go from a base model to A state of the art post training model. We use the Tulu recipe on top of Olmo, on top of Llama, and then there's been open replication effort [00:17:00] to do that on top of Quen as well.[00:17:02] Luca Soldaini: It's really nice to see like, you know, when your recipe sort of, it's kind of turnkey, you can apply it to different models and it kind of just works. And finally, the last thing we released this year was Olmo 2, which so far is the best state of the art. Fully open language model a Sera combines aspect from all three of these previous models.[00:17:22] Luca Soldaini: What we learn on the data side from MomoE and what we learn on like making models that are easy to adapt from the Momo project and the Tulu project. I will close with a little bit of reflection of like ways this, this ecosystem of open models like it's not all roses. It's not all happy. It feels like day to day, it's always in peril.[00:17:44] Luca Soldaini: And, you know, I talked a little bit about like the compute issues that come with it. But it's really not just compute. One thing that is on top of my mind is due to like the environment and how you know, growing feelings about like how AI is treated. [00:18:00] It's actually harder to get access to a lot of the data that was used to train a lot of the models up to last year.[00:18:06] Luca Soldaini: So this is a screenshot from really fabulous work from Shane Longpre who's, I think is in Europe about Just access of like diminishing access to data for language model pre training. So what they did is they went through every snapshot of common crawl. Common crawl is this publicly available scrape of the, of a subset of the internet.[00:18:29] Luca Soldaini: And they looked at how For any given website whether a website that was accessible in say 2017, what, whether it was accessible or not in 2024. And what they found is as a reaction to like the close like of the existence of closed models like OpenAI or Cloud GPT or Cloud a lot of content owners have blanket Blocked any type of crawling to your website.[00:18:57] Luca Soldaini: And this is something that we see also internally at [00:19:00] AI2. Like one project that we started this year is we wanted to, we wanted to understand, like, if you're a good citizen of the internet and you crawl following sort of norms and policy that have been established in the last 25 years, what can you crawl?[00:19:17] Luca Soldaini: And we found that there's a lot of website where. The norms of how you express preference of whether to crawl your data or not are broken. A lot of people would block a lot of crawling, but do not advertise that in RobustDXT. You can only tell that they're crawling, that they're blocking you in crawling when you try doing it.[00:19:37] Luca Soldaini: Sometimes you can't even crawl the robots. txt to, to check whether you're allowed or not. And then a lot of websites there's, there's like all these technologies that historically have been, have existed to make websites serving easier such as Cloudflare or DNS. They're now being repurposed for blocking AI or any type of crawling [00:20:00] in a way that is Very opaque to the content owners themselves.[00:20:04] Luca Soldaini: So, you know, you go to these websites, you try to access them and they're not available and you get a feeling it's like, Oh, someone changed, something changed on the, on the DNS side that it's blocking this and likely the content owner has no idea. They're just using a Cloudflare for better, you know, load balancing.[00:20:25] Luca Soldaini: And this is something that was sort of sprung on them with very little notice. And I think the problem is this, this blocking or ideas really, it impacts people in different ways. It disproportionately helps companies that have a headstart, which are usually the closed labs and it hurts incoming newcomer players where either have now to do things in a sketchy way or you're never going to get that content that the closed lab might have.[00:20:54] Luca Soldaini: So there's a lot, it was a lot of coverage. I'm going to plug Nathan's blog post again. That is, [00:21:00] that I think the title of this one is very succinct which is like, we're actually not, You know, before thinking about running out of training data, we're actually running out of open training data. And so if we want better open models they should be on top of our mind.[00:21:13] Regulation and Lobbying[00:21:13] Luca Soldaini: The other thing that has emerged is that there is strong lobbying efforts on trying to define any kind of, AI as like a new extremely risky and I want to be precise here. Like the problem is now, um, like the problem is not not considering the risk of this technology. Every technology has risks that, that should always be considered.[00:21:37] Luca Soldaini: The thing that it's like to me is sorry, is ingenious is like just putting this AI on a pedestal and calling it like, An unknown alien technology that has like new and undiscovered potentials to destroy humanity. When in reality, all the dangers I think are rooted in [00:22:00] dangers that we know from existing software industry or existing issues that come with when using software on on a lot of sensitive domains, like medical areas.[00:22:13] Luca Soldaini: And I also noticed a lot of efforts that have actually been going on and trying to make this open model safe. I pasted one here from AI2, but there's actually like a lot of work that has been going on on like, okay, how do you make, if you're distributing this model, Openly, how do you make it safe?[00:22:31] Luca Soldaini: How, what's the right balance between accessibility on open models and safety? And then also there's annoying brushing of sort of concerns that are then proved to be unfounded under the rug. You know, if you remember the beginning of this year, it was all about bio risk of these open models.[00:22:48] Luca Soldaini: The whole thing fizzled because as being Finally, there's been like rigorous research, not just this paper from Cohere folks, but it's been rigorous research showing [00:23:00] that this is really not a concern that we should be worried about. Again, there is a lot of dangerous use of AI applications, but this one was just like, A lobbying ploy to just make things sound scarier than they actually are.[00:23:15] Luca Soldaini: So I got to preface this part. It says, this is my personal opinion. It's not my employer, but I look at things like the SP 1047 from, from California. And I think we kind of dodged a bullet on, on this legislation. We, you know, the open source community, a lot of the community came together at the last, sort of the last minute and did a very good effort trying to explain all the negative impact of this bill.[00:23:43] Luca Soldaini: But There's like, I feel like there's a lot of excitement on building these open models or like researching on these open models. And lobbying is not sexy it's kind of boring but it's sort of necessary to make sure that this ecosystem can, can really [00:24:00] thrive. This end of presentation, I have Some links, emails, sort of standard thing in case anyone wants to reach out and if folks have questions or anything they wanted to discuss.[00:24:13] Luca Soldaini: Is there an open floor? I think we have Sophia[00:24:16] swyx: who wants to who one, one very important open model that we haven't covered is Mistral. Ask her on this slide. Yeah, yeah. Well, well, it's nice to have the Mistral person talk recap the year in Mistral. But while Sophia gets set up, does anyone have like, just thoughts or questions about the progress in this space?[00:24:32] Questions - Incentive Alignment[00:24:32] swyx: Do you always have questions?[00:24:34] Quesiton: I'm very curious how we should build incentives to build open models, things like Francois Chollet's ArcPrize, and other initiatives like that. What is your opinion on how we should better align incentives in the community so that open models stay open?[00:24:49] Luca Soldaini: The incentive bit is, like, really hard.[00:24:51] Luca Soldaini: Like, even It's something that I actually, even we think a lot about it internally because like building open models is risky. [00:25:00] It's very expensive. And so people don't want to take risky bets. I think the, definitely like the challenges like our challenge, I think those are like very valid approaches for it.[00:25:13] Luca Soldaini: And then I think in general, promoting, building, so, any kind of effort to participate in this challenge, in those challenges, if we can promote doing that on top of open models and sort of really lean into like this multiplier effect, I think that is a good way to go. If there were more money for that.[00:25:35] Luca Soldaini: For efforts like research efforts around open models. There's a lot of, I think there's a lot of investments in companies that at the moment are releasing their model in the open, which is really cool. But it's usually more because of commercial interest and not wanting to support this, this like open models in the longterm, it's a really hard problem because I think everyone is operating sort of [00:26:00] in what.[00:26:01] Luca Soldaini: Everyone is at their local maximum, right? In ways that really optimize their position on the market. Global maximum is harder to achieve.[00:26:11] Question2: Can I ask one question? No.[00:26:12] Luca Soldaini: Yeah.[00:26:13] Question2: So I think one of the gap between the closed and open source models is the mutability. So the closed source models like chat GPT works pretty good on the low resource languages, which is not the same on the open, open source models, right?[00:26:27] Question2: So is it in your plan to improve on that?[00:26:32] Luca Soldaini: I think in general,[00:26:32] Luca Soldaini: yes, is I think it's. I think we'll see a lot of improvements there in, like, 2025. Like, there's groups like, Procurement English on the smaller side that are already working on, like, better crawl support, multilingual support. I think what I'm trying to say here is you really want to be experts.[00:26:54] Luca Soldaini: who are actually in those countries that teach those languages to [00:27:00] participate in the international community. To give you, like, a very easy example I'm originally from Italy. I think I'm terribly equipped to build a model that works well in Italian. Because one of the things you need to be able to do is having that knowledge of, like, okay, how do I access, you know, how Libraries, or content that is from this region that covers this language.[00:27:23] Luca Soldaini: I've been in the US long enough that I no longer know. So, I think that's the efforts that folks in Central Europe, for example, are doing. Around like, okay, let's tap into regional communities. To get access you know, to bring in collaborators from those areas. I think it's going to be, like, very crucial for getting products there.[00:27:46] Mistral intro[00:27:46] Sophia Yang: Hi everyone. Yeah, I'm super excited to be here to talk to you guys about Mistral. A really short and quick recap of what we have done, what kind of models and products we have released in the [00:28:00] past year and a half. So most of you We have already known that we are a small startup funded about a year and a half ago in Paris in May, 2003, it was funded by three of our co founders, and in September, 2003, we released our first open source model, Mistral 7b yeah, how, how many of you have used or heard about Mistral 7b?[00:28:24] Sophia Yang: Hey, pretty much everyone. Thank you. Yeah, it's our Pretty popular and community. Our committee really loved this model, and in December 23, we, we released another popular model with the MLE architecture Mr. A X seven B and oh. Going into this year, you can see we have released a lot of things this year.[00:28:46] Sophia Yang: First of all, in February 2004, we released MrSmall, MrLarge, LeChat, which is our chat interface, I will show you in a little bit. We released an embedding model for, you [00:29:00] know, converting your text into embedding vectors, and all of our models are available. The, the big cloud resources. So you can use our model on Google cloud, AWS, Azure Snowflake, IBM.[00:29:16] Sophia Yang: So very useful for enterprise who wants to use our model through cloud. And in April and May this year, we released another powerful open source MOE model, AX22B. And we also released our first code. Code Model Coastal, which is amazing at 80 plus languages. And then we provided another fine tuning service for customization.[00:29:41] Sophia Yang: So because we know the community love to fine tune our models, so we provide you a very nice and easy option for you to fine tune our model on our platform. And also we released our fine tuning code base called Menstrual finetune. It's open source, so feel free to take it. Take a look and.[00:29:58] Sophia Yang: More models. [00:30:00] On July 2, November this year, we released many, many other models. First of all is the two new small, best small models. We have Minestra 3B great for Deploying on edge devices we have Minstrel 8B if you used to use Minstrel 7B, Minstrel 8B is a great replacement with much stronger performance than Minstrel 7B.[00:30:25] Sophia Yang: We also collaborated with NVIDIA and open sourced another model, Nemo 12B another great model. And Just a few weeks ago, we updated Mistral Large with the version 2 with the updated, updated state of the art features and really great function calling capabilities. It's supporting function calling in LatentNate.[00:30:45] Sophia Yang: And we released two multimodal models Pixtral 12b. It's this open source and Pixtral Large just amazing model for, models for not understanding images, but also great at text understanding. So. Yeah, a [00:31:00] lot of the image models are not so good at textual understanding, but pixel large and pixel 12b are good at both image understanding and textual understanding.[00:31:09] Sophia Yang: And of course, we have models for research. Coastal Mamba is built on Mamba architecture and MathRoll, great with working with math problems. So yeah, that's another model.[00:31:29] Sophia Yang: Here's another view of our model reference. We have several premier models, which means these models are mostly available through our API. I mean, all of the models are available throughout our API, except for Ministry 3B. But for the premier model, they have a special license. Minstrel research license, you can use it for free for exploration, but if you want to use it for enterprise for production use, you will need to purchase a license [00:32:00] from us.[00:32:00] Sophia Yang: So on the top row here, we have Minstrel 3b and 8b as our premier model. Minstrel small for best, best low latency use cases, MrLarge is great for your most sophisticated use cases. PixelLarge is the frontier class multimodal model. And, and we have Coastral for great for coding and then again, MrEmbedding model.[00:32:22] Sophia Yang: And The bottom, the bottom of the slides here, we have several Apache 2. 0 licensed open way models. Free for the community to use, and also if you want to fine tune it, use it for customization, production, feel free to do so. The latest, we have Pixtros 3 12b. We also have Mr. Nemo mum, Coastal Mamba and Mastro, as I mentioned, and we have three legacy models that we don't update anymore.[00:32:49] Sophia Yang: So we recommend you to move to our newer models if you are still using them. And then, just a few weeks ago, [00:33:00] we did a lot of, uh, improvements to our code interface, Lachette. How many of you have used Lachette? Oh, no. Only a few. Okay. I highly recommend Lachette. It's chat. mistral. ai. It's free to use.[00:33:16] Sophia Yang: It has all the amazing capabilities I'm going to show you right now. But before that, Lachette in French means cat. So this is actually a cat logo. If you You can tell this is the cat eyes. Yeah. So first of all, I want to show you something Maybe let's, let's take a look at image understanding.[00:33:36] Sophia Yang: So here I have a receipts and I want to ask, just going to get the prompts. Cool. So basically I have a receipt and I said I ordered I don't know. Coffee and the sausage. How much do I owe? Add a 18 percent tip. So hopefully it was able to get the cost of the coffee and the [00:34:00] sausage and ignore the other things.[00:34:03] Sophia Yang: And yeah, I don't really understand this, but I think this is coffee. It's yeah. Nine, eight. And then cost of the sausage, we have 22 here. And then it was able to add the cost, calculate the tip, and all that. Great. So, it's great at image understanding, it's great at OCR tasks. So, if you have OCR tasks, please use it.[00:34:28] Sophia Yang: It's free on the chat. It's also available through our API. And also I want to show you a Canvas example. A lot of you may have used Canvas with other tools before. But, With Lachat, it's completely free again. Here, I'm asking it to create a canvas that's used PyScript to execute Python in my browser.[00:34:51] Sophia Yang: Let's see if it works. Import this. Okay, so, yeah, so basically it's executing [00:35:00] Python here. Exactly what we wanted. And the other day, I was trying to ask Lachat to create a game for me. Let's see if we can make it work. Yeah, the Tetris game. Yep. Let's just get one row. Maybe. Oh no. Okay. All right. You get the idea. I failed my mission. Okay. Here we go. Yay! Cool. Yeah. So as you can see, Lachet can write, like, a code about a simple game pretty easily. And you can ask Lachet to explain the code. Make updates however you like. Another example. There is a bar here I want to move.[00:35:48] Sophia Yang: Okay, great, okay. And let's go back to another one. Yeah, we also have web search capabilities. Like, you can [00:36:00] ask what's the latest AI news. Image generation is pretty cool. Generate an image about researchers. Okay. In Vancouver? Yeah, it's Black Forest Labs flux Pro. Again, this is free, so Oh, cool.[00:36:19] Sophia Yang: I guess researchers here are mostly from University of British Columbia. That's smart. Yeah. So this is Laia ira. Please feel free to use it. And let me know if you have any feedback. We're always looking for improvement and we're gonna release a lot more powerful features in the coming years.[00:36:37] Sophia Yang: Thank you. Get full access to Latent Space at www.latent.space/subscribe

Serious Privacy
A long-awaited episode and law from Down Under (with John Keeves)

Serious Privacy

Play Episode Listen Later Dec 16, 2024 33:46


Send us a textOn this week of Serious Privacy, Paul Breitbarth of Catawiki and Dr. K Royal connect with John Keeves, partner at Johnson Winter Slatterly in Australia to discuss all things Australia. In particular we discussed the Cyber Security Act of 2024 and other recent changes and enforcement activities. Tune in for some #livinglearninglaughing. If you have comments or questions, find us on LinkedIn and IG @seriousprivacy, and on Blue Sky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! Proudly sponsored by TrustArc. Learn more about NymityAI at https://trustarc.com/nymityai-beta/ #heartofprivacy #europaulb #seriousprivacy #privacy #dataprotection #cybersecuritylaw #CPO #DPO #CISO

Masters of Privacy
Rie Aleksandra Walle: revisiting legitimate interest for marketing or analytics after KNLTB, privacy fundamentalism, and how the GDPR lost its sparkle

Masters of Privacy

Play Episode Listen Later Dec 8, 2024 35:27


Has honour been restored to the Legitimate Interest legal basis after the CJEU Royal Dutch Tennis Association decision and subsequent EDPB Guidelines? Is the GDPR showing signs of rustiness? Has it instead become a new religion?  Rie Aleksandra Walle brings over seventeen years of professional experience across both the private and public sectors, having worked at Kristiania University College, Ernst & Young, Nordic Innovation and the Norwegian Agency for Public Management and eGovernment. Rie is behind the DPO Hub, which helps busy DPOs by offering concise summaries and key practical takeaways from key CJEU rulings, EDPB documents and DPA decisions, as well as by putting together a community around it. She is also the host of the Grumpy GDPR podcast. References: The Grumpy GDPR Podcast (NoTies Consulting) DPO Hub Rie Aleksandra Walle on LinkedIn Rie Aleksandra Walle on Bluesky KNLTB vs. Dutch DPA (CJEU decision) EDPB Guidelines 1/2024 on processing of personal data based on legitimate interest Guidelines on the technical scope of article 5.3 of the ePrivacy Directive Serious Privacy (Podcast): Comments on the KNLTB case and other updates  Peter Craddock: ePrivacy exceptions, advertising, analytics, the limits of consent and server-side processing (Masters of Privacy) Rie Aleksandra Walle: the DPO's guide to better resources, constructive debates, and a happier life (Masters of Privacy)

Serious Privacy
A slow week in privacy - so AI time!

Serious Privacy

Play Episode Listen Later Dec 6, 2024 36:12 Transcription Available


Send us a textOn this week of Serious Privacy, Paul Breitbarth of Catawiki and Dr. K Royal cover a relatively slow week in privacy, including a settlement with Oracle out of California, some new WorldCoin investigations, KOSA, and a position paper from BEUC so we also throw in some frank discussion of AI tools and how they can help in our personal and professional lives.Tune in for some #livinglearninglaughing.  If you have comments or questions, find us on LinkedIn and IG @seriousprivacy, and on Blue Sky under @seriousprivacy.eu, @europaulb.seriousprivacy.eu, @heartofprivacy.bsky.app and @igrobrien.seriousprivacy.eu, and email podcast@seriousprivacy.eu. Rate and Review us! Proudly sponsored by TrustArc. Learn more about NymityAI at https://trustarc.com/nymityai-beta/ #heartofprivacy #europaulb #seriousprivacy #privacy #dataprotection #cybersecuritylaw #CPO #DPO #CISO

Masters of Privacy
Matthew Junod: the US-based DPO in the face of AI governance

Masters of Privacy

Play Episode Listen Later Dec 1, 2024 28:03


How is the role of the DPO (Data Protection/Privacy Officer) evolving in the US? What is the best approach to managing AI governance once a privacy program has been implemented? Matt Junod is a US privacy attorney and Florida native with a prior background in network engineering and security. He has worked in-house, rolling out and managing data protection programs as well as dealing with security and privacy compliance issues. Our guest has also served in privacy leadership roles since 2018, including the DPO position for a large technology services firm, and most recently a leading Internet job board. References: Matt Junod on LinkedIn EU Commission's General-Purpose AI Code of Practice NIST AI Risk Management Framework Joe Biden's Executive Order on Artificial Intelligence Elon Musk's X is changing its privacy policy to allow third parties to train AI on your posts (Techcrunch)

Is an AI Arms Race Inevitable? with Robert Wright of Nonzero Newsletter & Podcast

Play Episode Listen Later Nov 27, 2024 124:58


In this episode of The Cognitive Revolution, Nathan explores the complex intersection of AI development and international relations with Robert Wright, publisher of the Nonzero Newsletter. They examine the growing militarization of AI, US-China relations, and the concerning trajectory of what Wright calls "the chip war." Drawing from Nathan's recent experience at The Curve conference and an AI wargame simulation, they discuss the risks of an AI arms race and search for alternative paths to avoid catastrophic conflict between global powers. Join us for this crucial conversation about the future of AI governance and international cooperation. Subscribe to The NonZero Newsletter at https://nonzero.substack.com Be notified early when Turpentine's drops new publication: https://www.turpentine.co/exclusiveaccess RECOMMENDED PODCAST: Unpack Pricing - Dive into the dark arts of SaaS pricing with Metronome CEO Scott Woody and tech leaders. Learn how strategic pricing drives explosive revenue growth in today's biggest companies like Snowflake, Cockroach Labs, Dropbox and more. Apple: https://podcasts.apple.com/us/podcast/id1765716600 Spotify: https://open.spotify.com/show/38DK3W1Fq1xxQalhDSueFg SPONSORS: Notion: Notion offers powerful workflow and automation templates, perfect for streamlining processes and laying the groundwork for AI-driven automation. With Notion AI, you can search across thousands of documents from various platforms, generating highly relevant analysis and content tailored just for you - try it for free at https://notion.com/cognitiverevolution 80,000 Hours: 80,000 Hours offers free one-on-one career advising for Cognitive Revolution listeners aiming to tackle global challenges, especially in AI. They connect high-potential individuals with experts, opportunities, and personalized career plans to maximize positive impact. Apply for a free call at https://80000hours.org/cognitiverevolution to accelerate your career and contribute to solving pressing AI-related issues. SelectQuote: Finding the right life insurance shouldn't be another task you put off. SelectQuote compares top-rated policies to get you the best coverage at the right price. Even in our AI-driven world, protecting your family's future remains essential. Get your personalized quote at https://selectquote.com/cognitive Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers13. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive CHAPTERS: (00:00:00) Teaser (00:00:59) Sponsors: Incogni (00:02:20) About the Episode (00:05:56) Introducing AI2 (00:09:56) Tulu: Deep Dive (Part 1) (00:17:43) Sponsors: Notion | Shopify (00:20:38) Open vs. Closed Recipes (00:29:48) Compute & Value (Part 1) (00:34:22) Sponsors: Oracle Cloud Infrastructure (OCI) | 80,000 Hours (00:37:02) Compute & Value (Part 2) (00:42:41) Model Weight Evolution (00:53:16) DPO vs. PPO (01:06:36) Project Trajectory (01:20:39) Synthetic Data & LLM Judge (01:27:39) Verifiable RL (01:38:17) Advice for Practitioners (01:44:01) Open Source vs. Closed (01:49:18) Outro SOCIAL LINKS: Website: https://www.cognitiverevolution.ai Twitter (Podcast): https://x.com/cogrev_podcast Twitter (Nathan): https://x.com/labenz LinkedIn: https://www.linkedin.com/in/nathanlabenz/ Youtube: https://www.youtube.com/@CognitiveRevolutionPodcast

The Operatory Podcast by Upgrade Dental
The future of DSOs vs. DPOs—An Interview with Ian McNickle of ICON Dental Partners

The Operatory Podcast by Upgrade Dental

Play Episode Listen Later Nov 26, 2024 21:50


Are you interested in taking your dental practice to the next level but are not sure where to start? DSOs have been around for quite a number of years now, but they are no longer the only option for practice owners who want to experience greater business success. In this episode of The Patient First Podcast, I sat down with Ian McNickle, CEO and one of multiple founders of ICON Dental Partners, to talk about the advantages of both the DSO (dental service organization) and DPO (dental partnership organization).  This interview sheds light on the differences between the models, and highlights how ICON Dental Partners has achieved success by taking a non-conventional path that prioritizes true doctor autonomy and supports innovation that unlocks a higher quality of care. I'm Dr. Bryan Laskin—author, dentist and entrepreneur who's always on the lookout for solutions that make dentistry more lucrative by putting the patient first.  Find out if ICON is a good fit for you: ICONDentalPartners.com 

Serious Privacy
A Great Week in Privacy

Serious Privacy

Play Episode Listen Later Nov 23, 2024 32:11


Send us a textOn this week of Serious Privacy, Paul Breitbarth of Catawiki and Dr. K Royal catch up on some happenings in the privacy world. We discussed US election results and a peek at some potential changes to come, such as appointments to the Privacy and Civil Liberties Oversight Board, EU first review of the thingy (the EU US Data Privacy Framework), (please note - there is no need to have both SCCs and register to the framework), TrustArc's new Privacy Pulse newsletter, Japan had the first mutual adequacy decision with EU and now South Korea has an equivalency mechanism. We also mentioned some S. Korea enforcement, such as TikTok, Canada's pending privacy legislation, and yet another update to Meta's advertising program. Tune in for some #livinglearninglaughing. If you have comments or questions, find us on LinkedIn and IG @seriousprivacy @podcastprivacy @euroPaulB @heartofprivacy and on Blue Sky under the same - Serious Privacy, EuroPaulB, and HeartofPrivacy - and email podcast@seriousprivacy.eu. Rate and Review us! Proudly sponsored by TrustArc. Learn more about NymityAI at https://trustarc.com/nymityai-beta/ #heartofprivacy #europaulb #seriousprivacy #privacy #dataprotection #cybersecuritylaw #CPO #DPO #CISO

Everything You Wanted to Know About LLM Post-Training, with Nathan Lambert of Allen Institute for AI

Play Episode Listen Later Nov 21, 2024 110:08


In this episode of The Cognitive Revolution, we dive deep into frontier post-training techniques for large language models with Nathan Lambert from the Allen Institute for AI. Nathan discusses the groundbreaking Tulu 3 release, which matches Meta's post-training performance using the LlAMA base model. We explore supervised fine-tuning, preference-based reinforcement learning, and the innovative reinforcement learning from verifiable reward technique. Nathan provides unprecedented insights into the practical aspects of model development, compute requirements, and data generation strategies. This technically rich conversation illuminates previously opaque aspects of LLM development, achieved by a small team of 10-15 people. Join us for one of our most detailed and valuable discussions on state-of-the-art AI model development. Check out Nathan's Lambert newsletter: https://www.natolambert.com https://www.interconnects.ai Be notified early when Turpentine's drops new publication: https://www.turpentine.co/exclusiveaccess SPONSORS: Incogni: Take your personal data back with Incogni! Use code REVOLUTION at the link below and get 60% off an annual plan: https://incogni.com/revolution Notion: Notion offers powerful workflow and automation templates, perfect for streamlining processes and laying the groundwork for AI-driven automation. With Notion AI, you can search across thousands of documents from various platforms, generating highly relevant analysis and content tailored just for you - try it for free at https://notion.com/cognitiverevolution Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers13. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive 80,000 Hours: 80,000 Hours offers free one-on-one career advising for Cognitive Revolution listeners aiming to tackle global challenges, especially in AI. They connect high-potential individuals with experts, opportunities, and personalized career plans to maximize positive impact. Apply for a free call at https://80000hours.org/cognitiverevolution to accelerate your career and contribute to solving pressing AI-related issues. RECOMMENDED PODCAST: Unpack Pricing - Dive into the dark arts of SaaS pricing with Metronome CEO Scott Woody and tech leaders. Learn how strategic pricing drives explosive revenue growth in today's biggest companies like Snowflake, Cockroach Labs, Dropbox and more. Apple: https://podcasts.apple.com/us/podcast/id1765716600 Spotify: https://open.spotify.com/show/38DK3W1Fq1xxQalhDSueFg CHAPTERS: (00:00:00) Teaser (00:00:59) Sponsors: Incogni (00:02:20) About the Episode (00:05:56) Introducing AI2 (00:09:56) Tulu: Deep Dive (Part 1) (00:17:43) Sponsors: Shopify | Oracle Cloud Infrastructure (OCI) (00:20:38) Open vs. Closed Recipes (00:29:48) Compute & Value (Part 1) (00:34:22) Sponsors: 80,000 Hours | Notion (00:37:02) Compute & Value (Part 2) (00:42:41) Model Weight Evolution (00:53:16) DPO vs. PPO (01:06:36) Project Trajectory (01:20:39) Synthetic Data & LLM Judge (01:27:39) Verifiable RL (01:38:17) Advice for Practitioners (01:44:01) Open Source vs. Closed (01:49:18) Outro

Serious Privacy
Big Information in a Small Space (GPA2)

Serious Privacy

Play Episode Listen Later Nov 19, 2024 49:32


Send us a textOn this week of Serious Privacy, Paul Breitbarth of Catawiki and Dr. K Royal along with esteemed co-host and colleague Ralph O'Brien bring you the day 2 of the #GPA (Global Privacy Assembly) in #Jersey. We feature open conversations with Eduardo Ustaren of Hogan Lovells; Shana Morgan, Global Head of Privacy & AI Legal Compliance, L3Harris Technologies; Alexander McD White, the inaugural Privacy Commissioner for Bermuda and Patricia Kosseim, Commissioner, Office of the Information and Privacy Commissioner of Ontario (IPC). And forgive us - sometimes the conversation space was a little too open, but we have fabulous podcast tools (Riverside, Descript, and Auphonics!) Tune in for some #livinglearninglaughing. If you have comments or questions, find us on LinkedIn and IG @seriousprivacy @podcastprivacy @euroPaulB @heartofprivacy and on Blue Sky under the same - Serious Privacy, EuroPaulB, and HeartofPrivacy - and email podcast@seriousprivacy.eu. Rate and Review us! Proudly sponsored by TrustArc. Learn more about NymityAI at https://trustarc.com/nymityai-beta/ #heartofprivacy #europaulb #seriousprivacy #privacy #dataprotection #cybersecuritylaw #CPO #DPO #CISO

Serious Privacy
Opening Day at the GPA

Serious Privacy

Play Episode Listen Later Nov 15, 2024 46:02


Send us a textOn this week of Serious Privacy, Paul Breitbarth of Catawiki and Dr. K Royal along with esteemed co-host and colleague Ralph O'Brien bring you the opening day of the #GPA (Global Privacy Assembly) in #Jersey. We set up a table in the common room and snagged passing celebrities to join us for some open conversation, such as Paul Vane Information Commissioner at Jersey Office of the Information Commissioner,  and forever favorite Ron de Jesus. We discussed some of the challenges of travel, some fascinating sessions, such as Paul's with four teen girls on their social media presence and concerns about their privacy. So join us for opening day!Tune in for some #livinglearninglaughing. If you have comments or questions, find us on LinkedIn and IG @seriousprivacy @podcastprivacy @euroPaulB @heartofprivacy and on Blue Sky under the same - Serious Privacy, EuroPaulB, and HeartofPrivacy - and email podcast@seriousprivacy.eu. Rate and Review us! Proudly sponsored by TrustArc. Learn more about NymityAI at https://trustarc.com/nymityai-beta/ #heartofprivacy #europaulb #seriousprivacy #privacy #dataprotection #cybersecuritylaw #CPO #DPO #CISO