POPULARITY
Charles Henderson, who leads the cybersecurity services division at Coalfire, shares how the company is reimagining offensive and defensive operations through a programmatic lens that prioritizes outcomes over checkboxes. His team, made up of practitioners with deep experience and creative drive, brings offensive testing and exposure management together with defensive services and managed offerings to address full-spectrum cybersecurity needs. The focus isn't on commoditized services—it's on what actually makes a difference.At the heart of the conversation is the idea that cybersecurity is a team sport. Henderson draws parallels between the improvisation of music and the tactics of both attackers and defenders. Both require rhythm, creativity, and cohesion. The myth of the lone hero doesn't hold up anymore—effective cybersecurity programs are driven by collaboration across specialties and by combining services in ways that amplify their value.Coalfire's evolution reflects this shift. It's not just about running a penetration test or red team operation in isolation. It's about integrating those efforts into a broader mission-focused program, tailored to real threats and measured against what matters most. Henderson emphasizes that CISOs are no longer content with piecemeal assessments; they're seeking simplified, strategic programs with measurable outcomes.The conversation also touches on the importance of storytelling in cybersecurity reporting. Henderson underscores the need for findings to be communicated in ways that resonate with technical teams, security leaders, and the board. It's about enabling CISOs to own the narrative, armed with context, clarity, and confidence.Henderson's reflections on the early days of hacker culture—when gatherings like HoCon and early Def Cons were more about curiosity and camaraderie than business—bring a human dimension to the discussion. That same passion still fuels many practitioners today, and Coalfire is committed to nurturing it through talent development and internships, helping the next generation find their voice, their challenge, and yes, even their hacker handle.This episode offers a look at how to build programs, teams, and mindsets that are ready to lead—not follow—on the cybersecurity front.Learn more about Coalfire: https://itspm.ag/coalfire-yj4wNote: This story contains promotional content. Learn more.Guest: Charles Henderson, Executive Vice President of Cyber Security Services, Coalfire | https://www.linkedin.com/in/angustx/ResourcesLearn more and catch more stories from Coalfire: https://www.itspmagazine.com/directory/coalfireLearn more and catch more stories from RSA Conference 2025 coverage: https://www.itspmagazine.com/rsac25______________________Keywords:charles henderson, sean martin, coalfire, red teaming, penetration testing, cybersecurity services, exposure management, ciso, threat intelligence, hacker culture, brand story, brand marketing, marketing podcast, brand story podcast______________________Catch all of our event coverage: https://www.itspmagazine.com/technology-and-cybersecurity-conference-coverageWant to tell your Brand Story Briefing as part of our event coverage? Learn More
Snehal Antani is an entrepreneur, technologist, and investor. He is the CEO and Co-founder of Horizon3, a cybersecurity company using AI to deliver Red Teaming and Penetration Testing as a Service. He also serves as a Highly Qualified Expert for the U.S. Department of Defense, supporting digital transformation and data initiatives for Special Operations. Previously, he was CTO and SVP at Splunk, held CIO roles at GE Capital, and began his career as a software engineer at IBM. Snehal holds a master's in computer science from Rensselaer Polytechnic Institute and a bachelor's from Purdue University, and he is the inventor on 16 patents.In this conversation, we discuss:Snehal Antani's path from software engineer to CEO, and how his father's quiet example of grit and passion continues to shape his leadership style.How a “LEGO blocks” approach to building skills prepared Snehal to lead, and why he believes leadership must be earned through experience.Why Horizon3 identifies as a data company, and how running more pen tests than the Big Four creates a powerful AI advantage.What “cyber-enabled economic warfare” looks like in practice, and how a small disruption in a supply chain can create massive global impact.How Horizon3 built an AI engine that hacked a bank in under 60 seconds, showing what's possible when algorithms replace manual testing.What the future of work looks like in the AI era, with a growing divide between those with specialized expertise and trade skills and those without.Resources:Subscribe to the AI & The Future of Work Newsletter: https://aiandwork.beehiiv.com/subscribe Connect with Snehal on LinkedIn: https://www.linkedin.com/in/snehalantani/ AI fun fact article: https://venturebeat.com/security/ai-vs-endpoint-attacks-what-security-leaders-must-know-to-stay-ahead/ On the New Definition of Work: https://podcasts.apple.com/us/podcast/dr-john-boudreau-future-of-work-pioneer-and/id1476885647?i=1000633854079
Get featured on the show by leaving us a Voice Mail: https://bit.ly/MIPVM FULL SHOW NOTES https://www.microsoftinnovationpodcast.com/681 The team explores the ethical implications of teaching AI jailbreaking techniques and conducting red team testing on large language models, balancing educational value against potential misuse. They dive into personal experiments with bypassing AI safeguards, revealing both creative workarounds and robust protections in modern systems. TAKEAWAYS • Debate on whether demonstrating AI vulnerabilities is responsible education or potentially dangerous knowledge sharing • Psychological impact on security professionals who regularly simulate malicious behaviors to test AI safety • Real examples of attempts to "jailbreak" AI systems through fantasy storytelling and other creative prompts • Legal gray areas in AI security testing that require dedicated legal support for organizations • Personal experiences with testing AI guardrails on different models and their varying levels of protection • Future prediction that Microsoft's per-user licensing model may shift to consumption-based as AI agents replace human tasks • Growth observations about Microsoft's Business Applications division reaching approximately $8 billion • Discussion of how M365 Copilot is transforming productivity, particularly for analyzing sales calls and customer interactions Check out this episode for more deep dives into AI safety, security, and the future of technology in business.This year we're adding a new show to our line up - The AI Advantage. We'll discuss the skills you need to thrive in an AI-enabled world. DynamicsMinds is a world-class event in Slovenia that brings together Microsoft product managers, industry leaders, and dedicated users to explore the latest in Microsoft Dynamics 365, the Power Platform, and Copilot.Early bird tickets are on sale now and listeners of the Microsoft Innovation Podcast get 10% off with the code MIPVIP144bff https://www.dynamicsminds.com/register/?voucher=MIPVIP144bff Accelerate your Microsoft career with the 90 Day Mentoring Challenge We've helped 1,300+ people across 70+ countries establish successful careers in the Microsoft Power Platform and Dynamics 365 ecosystem.Benefit from expert guidance, a supportive community, and a clear career roadmap. A lot can change in 90 days, get started today!Support the showIf you want to get in touch with me, you can message me here on Linkedin.Thanks for listening
Bugged boardrooms. Insider moles. Social engineers posing as safety inspectors!? In this Talking Lead episode, Lefty assembles a veteran intel crew—Bryan Seaver U.S. Army Military Police vet and owner of SAPS Squadron Augmented Protection Services, LLC, a Nashville outfit running dignitary protection, K9 ops, and intelligence training. A *Talking Lead* mainstay! He's got firsthand scoop on "Red Teaming"; Mitch Davis U.S. Marine, private investigator, interrogator, Phoenix Consulting Group (now DynCorp) contractor, with a nose for sniffing out moles and lies; Brad Duley U.S. Marine, embassy guard, Phoenix/DynCorp contractor, Iraq vet, deputy sheriff, and precision shooter, bringing tactical grit to the table —to expose the high-stakes world of corporate espionage. They pull back the curtain on real-world spy tactics that were used during the the "Cold War" era and are still used in today's business battles: Red Team operations, honeypots, pretexting, data theft, and the growing threat of AI-driven deception. From cyber breaches to physical infiltrations, the tools of Cold War espionage are now aimed at American companies, defense tech, and even firearms innovation. State-backed actors, insider threats, and corporate sabotage—it's not just overseas anymore. Tune-in and get "Leaducated"!!
Bugged boardrooms. Insider moles. Social engineers posing as safety inspectors!? In this Talking Lead episode, Lefty assembles a veteran intel crew—Bryan Seaver U.S. Army Military Police vet and owner of SAPS Squadron Augmented Protection Services, LLC, a Nashville outfit running dignitary protection, K9 ops, and intelligence training. A *Talking Lead* mainstay! He's got firsthand scoop on "Red Teaming"; Mitch Davis U.S. Marine, private investigator, interrogator, Phoenix Consulting Group (now DynCorp) contractor, with a nose for sniffing out moles and lies; Brad Duley U.S. Marine, embassy guard, Phoenix/DynCorp contractor, Iraq vet, deputy sheriff, and precision shooter, bringing tactical grit to the table —to expose the high-stakes world of corporate espionage. They pull back the curtain on real-world spy tactics that were used during the the "Cold War" era and are still used in today's business battles: Red Team operations, honeypots, pretexting, data theft, and the growing threat of AI-driven deception. From cyber breaches to physical infiltrations, the tools of Cold War espionage are now aimed at American companies, defense tech, and even firearms innovation. State-backed actors, insider threats, and corporate sabotage—it's not just overseas anymore. Tune-in and get "Leaducated"!!
Send us a textJayson Coil is Assistant Fire Chief and Battalion Chief at Sedona Fire District in Arizona. With over 25 years of operational and leadership experience particularly in wildland firefighting and major disaster response, Jayson shares powerful insights on decision-making in complex environments. We dive into topics like adaptive leadership, red teaming, decentralizing command, and improving decision quality during crisis. Jayson also reflects on organizational change, trust, and morale, offering valuable lessons for current and future fire service leaders. From strategy to tactics, military crossovers to systemic failures, this conversation is packed with wisdom to help first responders lead more effectively in today's uncertain world. Connect with Jayon:-LINKEDINWEBSITEACCESS THE PODCAST LIBRARY & EVERY EPISODE, DEBRIEF & DOCUMENT CLICK HEREPODCAST GIFT - Get your FREE subscription to essential Firefighting publications HERE A big thanks to our partners for supporting this episode.GORE-TEX Professional ClothingMSA The Safety CompanyIDEXHAIX Footwear - Get offical podcast discount on HAIX HEREXendurance - to hunt performance & endurance 20% off HERE with code ffp20Lyfe Linez - Get Functional Hydration FUEL for FIREFIGHTERS, Clean no sugar for daily hydration. 80% of people live dehydratedSupport the show***The views expressed in this episode are those of the individual speakers. Our partners are not responsible for the content of this episode and does not warrant its accuracy or completeness.*** Please support the podcast and its future by clicking HERE and joining our Patreon Crew
Welcome to today's episode of AI Lawyer Talking Tech, where we delve into the rapidly evolving intersection of law and technology. The legal sector is currently undergoing a significant shift, moving beyond traditional practices with the integration of smart technologies and artificial intelligence. This transformation is driven by the need for enhanced efficiency, accuracy, and improved client service. From AI-powered legal research and document analysis to automated workflows and predictive analytics, technology is reshaping how legal professionals operate. We'll be exploring the key trends, ethical considerations, and the impact of these advancements on the future of legal practice.State declines to take action on Alton resident's complaint against VW selling personal data01 Apr 2025Laconia Daily SunBeyond Paperwork: The Smart Tech Approach to Legal Innovation in Simsbury02 Apr 2025IloungeEthics opinion offers principles for lawyers' ethical use of AI01 Apr 2025Texas Bar BlogLLRX March 2025 Issue31 Mar 2025LLRXWhy 2025 Demands AI-First Strategies for CLM01 Apr 2025ContractPodAiDennis P. Block & Associates is Leading the Legal AI Revolution01 Apr 2025Law Firm NewswireAlternative Business Structures: Big Accounting Firm Establishes U.S. Law Firm01 Apr 2025Wisconsin Lawyer MagazineOpening doors is easy….when you have the right keys01 Apr 2025JD SupraReviewing the case law as discovery reforms delay budget negotiations01 Apr 2025INFORMNNY.comMark Cohen and Dierk Schindler On The Union Of the Digital Legal Exchange and the Liquid Legal Institute01 Apr 2025LawSitesQuebec government launches its digital identity project: An overview01 Apr 2025LexologyLawyers may soon charge $10,000 an hour thanks to AI, says LexisNexis CEO01 Apr 2025TechSpotAI in Action: Enhancing Legal Discovery and Investigation Reviews01 Apr 2025LexologyLegal Issues on Red Teaming in Artificial Intelligence01 Apr 2025LexologyABA Business Law Spring Meeting Returns to New Orleans01 Apr 2025Biz New OrleansEpiq Wins Partnership with Technology and Innovation Law Firm Merchant & Gould P.C.01 Apr 2025Epiq SystemsCosmoLex and Rocket Matter Debut Automated Workflows for Busy Law Firms01 Apr 2025LawSites5 Legal Tech Solutions Your Senior Partners Will Love01 Apr 2025LawyeristEIP announces promotion of Ben Maling to partner01 Apr 2025Patent Lawyer MagazineAn Update on Generative AI in UK Litigation and Disclosure [March 2025]01 Apr 2025LexologyLegal regulators have “key role” in improving access to justice01 Apr 2025Legal FuturesMake Time For Slow Law – The ‘No Legal Tech' Movement01 Apr 2025Artificial LawyerTrump Advisor Demands Legal Tech Ban01 Apr 2025Artificial LawyerConstructing the digital future: Legal challenges in data centre projects01 Apr 2025LexologyLegal Tech Can't Ignore the Fear in the Room01 Apr 2025Legaltech on MediumLegalWeek 2025: Demos and analysis of DeepJudge, Vincent AI, Thomson Reuters CoCounsel, LexisNexis Protégé and more01 Apr 2025Legal IT InsiderMake Time For Slow Law – The ‘No Legal Tech' Movement01 Apr 2025Artificial LawyerBreaking news: UK law firm hires first AI partner01 Apr 2025Legal IT Insider
Building Trust Through Technology: Responsible AI in Practice // MLOps Podcast #301 with Rafael Sandroni, Founder and CEO of GardionAI.Join the Community: https://go.mlops.community/YTJoinIn Get the newsletter: https://go.mlops.community/YTNewsletter // AbstractRafael Sandroni shares key insights on securing AI systems, tackling fraud, and implementing robust guardrails. From prompt injection attacks to AI-driven fraud detection, we explore the challenges and best practices for building safer AI.// BioEntrepreneur and problem solver. // Related LinksGardionAI LinkedIn: https://www.linkedin.com/company/guardionai/~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExploreJoin our slack community [https://go.mlops.community/slack]Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)] Sign up for the next meetup: [https://go.mlops.community/register]MLOps Swag/Merch: [https://shop.mlops.community/]Connect with Demetrios on LinkedIn: /dpbrinkmConnect with Rafael on LinkedIn: /rafaelsandroniTimestamps:[00:00] Rafael's preferred coffee[00:16] Takeaways[01:03] AI Assistant Best Practices[03:48] Siri vs In-App AI[08:44] AI Security Exploration[11:55] Zero Trust for LLMS[18:02] Indirect Prompt Injection Risks[22:42] WhatsApp Banking Risks[26:27] Traditional vs New Age Fraud[29:12] AI Fraud Mitigation Patterns[32:50] Agent Access Control Risks[34:31] Red Teaming and Pentesting[39:40] Data Security Paradox[40:48] Wrap up
STANDARD EDITION: Signal OPSEC, White-box Red-teaming LLMs, Unified Company Context (UCC), New Book Recommendations, Single Apple Note Technique, and much more... You are currently listening to the Standard version of the podcast, consider upgrading and becoming a member to unlock the full version and many other exclusive benefits here: https://newsletter.danielmiessler.com/upgrade Subscribe to the newsletter at:https://danielmiessler.com/subscribe Join the UL community at:https://danielmiessler.com/upgrade Follow on X:https://x.com/danielmiessler Follow on LinkedIn:https://www.linkedin.com/in/danielmiesslerBecome a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.
In this episode of Crazy Wisdom, I, Stewart Alsop, sit down with Naman Mishra, CTO of Repello AI, to unpack the real-world security risks behind deploying large language models. We talk about layered vulnerabilities—from the model, infrastructure, and application layers—to attack vectors like prompt injection, indirect prompt injection through agents, and even how a simple email summarizer could be exploited to trigger a reverse shell. Naman shares stories like the accidental leak of a Windows activation key via an LLM and explains why red teaming isn't just a checkbox, but a continuous mindset. If you want to learn more about his work, check out Repello's website at repello.ai.Check out this GPT we trained on the conversation!Timestamps00:00 - Stewart Alsop introduces Naman Mishra, CTO of Repel AI. They frame the episode around AI security, contrasting prompt injection risks with traditional cybersecurity in ML apps.05:00 - Naman explains the layered security model: model, infrastructure, and application layers. He distinguishes safety (bias, hallucination) from security (unauthorized access, data leaks).10:00 - Focus on the application layer, especially in finance, healthcare, and legal. Naman shares how ChatGPT leaked a Windows activation key and stresses data minimization and security-by-design.15:00 - They discuss red teaming, how Repel AI simulates attacks, and Anthropic's HackerOne challenge. Naman shares how adversarial testing strengthens LLM guardrails.20:00 - Conversation shifts to AI agents and autonomy. Naman explains indirect prompt injection via email or calendar, leading to real exploits like reverse shells—all triggered by summarizing an email.25:00 - Stewart compares the Internet to a castle without doors. Naman explains the cat-and-mouse game of security—attackers need one flaw; defenders must lock every door. LLM insecurity lowers the barrier for attackers.30:00 - They explore input/output filtering, role-based access control, and clean fine-tuning. Naman admits most guardrails can be broken and only block low-hanging fruit.35:00 - They cover denial-of-wallet attacks—LLMs exploited to run up massive token costs. Naman critiques DeepSeek's weak alignment and state bias, noting training data risks.40:00 - Naman breaks down India's AI scene: Bangalore as a hub, US-India GTM, and the debate between sovereignty vs. pragmatism. He leans toward India building foundational models.45:00 - Closing thoughts on India's AI future. Naman mentions Sarvam AI, Krutrim, and Paris Chopra's Loss Funk. He urges devs to red team before shipping—"close the doors before enemies walk in."Key InsightsAI security requires a layered approach. Naman emphasizes that GenAI applications have vulnerabilities across three primary layers: the model layer, infrastructure layer, and application layer. It's not enough to patch up just one—true security-by-design means thinking holistically about how these layers interact and where they can be exploited.Prompt injection is more dangerous than it sounds. Direct prompt injection is already risky, but indirect prompt injection—where an attacker hides malicious instructions in content that the model will process later, like an email or webpage—poses an even more insidious threat. Naman compares it to smuggling weapons past the castle gates by hiding them in the food.Red teaming should be continuous, not a one-off. One of the critical mistakes teams make is treating red teaming like a compliance checkbox. Naman argues that red teaming should be embedded into the development lifecycle, constantly testing edge cases and probing for failure modes, especially as models evolve or interact with new data sources.LLMs can unintentionally leak sensitive data. In one real-world case, a language model fine-tuned on internal documentation ended up leaking a Windows activation key when asked a completely unrelated question. This illustrates how even seemingly benign outputs can compromise system integrity when training data isn't properly scoped or sanitized.Denial-of-wallet is an emerging threat vector. Unlike traditional denial-of-service attacks, LLMs are vulnerable to economic attacks where a bad actor can force the system to perform expensive computations, draining API credits or infrastructure budgets. This kind of vulnerability is particularly dangerous in scalable GenAI deployments with limited cost monitoring.Agents amplify security risks. While autonomous agents offer exciting capabilities, they also open the door to complex, compounded vulnerabilities. When agents start reading web content or calling tools on their own, indirect prompt injection can escalate into real-world consequences—like issuing financial transactions or triggering scripts—without human review.The Indian AI ecosystem needs to balance speed with sovereignty. Naman reflects on the Indian and global context, warning against simply importing models and infrastructure from abroad without understanding the security implications. There's a need for sovereign control over critical layers of AI systems—not just for innovation's sake, but for national resilience in an increasingly AI-mediated world.
Guest: Alex Polyakov, CEO at Adversa AI Topics: Adversa AI is known for its focus on AI red teaming and adversarial attacks. Can you share a particularly memorable red teaming exercise that exposed a surprising vulnerability in an AI system? What was the key takeaway for your team and the client? Beyond traditional adversarial attacks, what emerging threats in the AI security landscape are you most concerned about right now? What trips most clients, classic security mistakes in AI systems or AI-specific mistakes? Are there truly new mistakes in AI systems or are they old mistakes in new clothing? I know it is not your job to fix it, but much of this is unfixable, right? Is it a good idea to use AI to secure AI? Resources: EP84 How to Secure Artificial Intelligence (AI): Threats, Approaches, Lessons So Far AI Red Teaming Reasoning LLM US vs China: Jailbreak Deepseek, Qwen, O1, O3, Claude, Kimi Adversa AI blog Oops! 5 serious gen AI security mistakes to avoid Generative AI Fast Followership: Avoid These First Adopter Security Missteps
This week, Ads Dawson, Staff AI Security Researcher at Dreadnode, joins the show to talk all things AI Red Teaming!George K and George A talk to Ads about: The reality of securing #AI model development pipelines Why cross-functional expertise is critical when securing AI systems How to approach continuous red teaming for AI applications (hint: annual pen tests won't cut it anymore) Practical advice for #cybersecurity pros looking to skill up in AI securityWhether you're a CISO trying to navigate securing AI implementations or an infosec professional looking to expand your skill set, this conversation is all signal.Course mentioned: https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-DS-03+V1————
When I first experienced the Cynefin Framework in an HBR article many years ago, I never tried to adapt it to my work until I interviewed Bryce Hoffman, author of American Icon and Red Teaming, a few years ago. While Bryce made the Cynefin Framework seem more understandable and accessible, Kevin Eikenberry has gone further to show leaders how to act when surrounded by varying problems they are trying to navigate with this sensemaking framework.Kevin has written nearly 20 books, and his newest title is Flexible Leadership which includes a better approach to holistic thinking, the Cynefin Framework and the use of flexors.
Are you passionate about ethical hacking and cybersecurity? Want to break into the exciting world of Red Teaming and Penetration Testing? In this episode of the InfosecTrain podcast, our experts guide you through everything you need to know to start and grow a career in these advanced cybersecurity domains.
There's this popular trope in fiction about a character being mind controlled without losing awareness of what's happening. Think Jessica Jones, The Manchurian Candidate or Bioshock. The villain uses some magical technology to take control of your brain - but only the part of your brain that's responsible for motor control. You remain conscious and experience everything with full clarity.If it's a children's story, the villain makes you do embarrassing things like walk through the street naked, or maybe punch yourself in the face. But if it's an adult story, the villain can do much worse. They can make you betray your values, break your commitments and hurt your loved ones. There are some things you'd rather die than do. But the villain won't let you stop. They won't let you die. They'll make you feel — that's the point of the torture.I first started working on [...] The original text contained 3 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI. --- First published: March 16th, 2025 Source: https://www.lesswrong.com/posts/MnYnCFgT3hF6LJPwn/why-white-box-redteaming-makes-me-feel-weird-1 --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try
Today's guest is Tomer Poran, Chief Evangelist and VP of Strategy at ActiveFence. ActiveFence is a technology company specializing in trust and safety solutions, helping platforms detect and prevent harmful content, malicious activity, and emerging threats online. Tomer joins today's podcast to explore the critical role of red teaming in AI safety and security. He breaks down the challenges enterprises face in deploying AI responsibly, the evolving nature of adversarial risks, and why organizations must adopt a proactive approach to testing AI systems. This episode is sponsored by ActiveFence. Learn how brands work with Emerj and other Emerj Media options at emerj.com/ad1.
ABOUT JIM PALMERJim Palmer is the Chief AI Officer at Dialpad. Previously he was CTO and Co-Founder of TalkIQ, a conversational intelligence start-up with expertise in real-time speech recognition and natural language processing, acquired by Dialpad in May of 2018. Prior to TalkIQ, he was the founding engineer on the eBay Now local delivery service.SHOW NOTES:Tips and cheat codes for navigating AI governance (3:30)Breaking down red teaming & adversarial testing in AI governance (8:02)Launching and scaling adversarial testing efforts (11:27)Unexpected benefits unlocked with adversarial testing (13:43)Understanding data governance and strategic AI investments (15:38)Building resilient AI from concept to customer validation (19:28)Exploring early feature validation and pattern recognition in AI (22:38)Adaptability in data management and ensuring safe, ethical data use while adapting to evolving legal and governance requirements (26:51)How to prepare data for safe and sustainable long-term use (30:02)Strategies for compliant data practices in a regulated world (32:43)Building data deletion systems with model training in mind (35:14)Current events and trends shaping adaptability and durability in the AI ecosystem (38:38)The role of a Chief AI Officer (41:20)Rapid fire questions (44:35)LINKS AND RESOURCESGenius Makers: The Mavericks Who Brought AI to Google, Facebook, and the World - With deep and exclusive reporting, across hundreds of interviews, New York Times Silicon Valley journalist Cade Metz brings you into the rooms where these questions are being answered. Where an extraordinarily powerful new artificial intelligence has been built into our biggest companies, our social discourse, and our daily lives, with few of us even noticing.This episode wouldn't have been possible without the help of our incredible production team:Patrick Gallagher - Producer & Co-HostJerry Li - Co-HostNoah Olberding - Associate Producer, Audio & Video Editor https://www.linkedin.com/in/noah-olberding/Dan Overheim - Audio Engineer, Dan's also an avid 3D printer - https://www.bnd3d.com/Ellie Coggins Angus - Copywriter, Check out her other work at https://elliecoggins.com/about/
What exactly is generative AI (genAI) red-teaming? What strategies and standards should guide its implementation? And how can it protect the public interest? In this conversation, Lama Ahmad, Camille François, Tarleton Gillespie, Briana Vecchione, and Borhane Blili-Hamelin examined red-teaming's place in the evolving landscape of genAI evaluation and governance.Our discussion drew on a new report by Data & Society (D&S) and AI Risk and Vulnerability Alliance (ARVA), a nonprofit that aims to empower communities to recognize, diagnose, and manage harmful flaws in AI. The report, Red-Teaming in the Public Interest, investigates how red-teaming methods are being adapted to confront uncertainty about flaws in systems and to encourage public engagement with the evaluation and oversight of genAI systems. Red-teaming offers a flexible approach to uncovering a wide range of problems with genAI models. It also offers new opportunities for incorporating diverse communities into AI governance practices.Ultimately, we hope this report and discussion present a vision of red-teaming as an area of public interest sociotechnical experimentation.Download the report and learn more about the speakers and references at datasociety.net.--00:00 Opening00:12 Welcome and Framing04:48 Panel Introductions09:34 Discussion Overview10:23 Lama Ahmad on The Value of Human Red-Teaming17:37 Tarleton Gillespie on Labor and Content Moderation Antecedents25:03 Briana Vecchione on Participation & Accountability28:25 Camille François on Global Policy and Open-source Infrastructure35:09 Questions and Answers56:39 Final Takeaways
Our 199th episode with a summary and discussion of last week's big AI news! Recorded on 02/09/2025 Join our brand new Discord here! https://discord.gg/nTyezGSKwP Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. In this episode: - OpenAI's deep research feature capability launched, allowing models to generate detailed reports after prolonged inference periods, competing directly with Google's Gemini 2.0 reasoning models. - France and UAE jointly announce plans to build a massive AI data center in France, aiming to become a competitive player within the AI infrastructure landscape. - Mistral introduces a mobile app, broadening its consumer AI lineup amidst market skepticism about its ability to compete against larger firms like OpenAI and Google. - Anthropic unveils 'Constitutional Classifiers,' a method showing strong defenses against universal jailbreaks; they also launched a $20K challenge to find weaknesses. Timestamps + Links: (00:00:00) Intro / Banter (00:02:27) News Preview (00:03:28) Response to listener comments Tools & Apps (00:08:01) OpenAI now reveals more of its o3-mini model's thought process (00:16:03) Google's Gemini app adds access to ‘thinking' AI models (00:21:04) OpenAI Unveils A.I. Tool That Can Do Research Online (00:31:09) Mistral releases its AI assistant on iOS and Android (00:36:17) AI music startup Riffusion launches its service in public beta (00:39:11) Pikadditions by Pika Labs lets users seamlessly insert objects into videos Applications & Business (00:41:19) Softbank set to invest $40 billion in OpenAI at $260 billion valuation, sources say (00:47:36) UAE to invest billions in France AI data centre (00:50:34) Report: Ilya Sutskever's startup in talks to fundraise at roughly $20B valuation (00:52:03) ASML to Ship First Second-Gen High-NA EUV Machine in the Coming Months, Aiming for 2026 Production (00:54:38) NVIDIA's GB200 NVL 72 Shipments Not Under Threat From DeepSeek As Hyperscalers Maintain CapEx; Meanwhile, Trump Tariffs Play Havoc With TSMC's Pricing Strategy Projects & Open Source (00:56:49) The Allen Institute for AI (AI2) Releases Tülu 3 405B: Scaling Open-Weight... (01:00:06) SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (01:03:56) PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models (01:08:26) OpenEuroLLM: Europe's New Initiative for Open-Source AI Development Research & Advancements (01:10:34) LIMO: Less is More for Reasoning (01:16:39) s1: Simple test-time scaling (01:19:17) ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning (01:23:55) Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Policy & Safety (01:26:50) US sets AI safety aside in favor of 'AI dominance' (01:29:39) Almost Surely Safe Alignment of Large Language Models at Inference-Time (01:32:02) Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming (01:33:16) Anthropic offers $20,000 to whoever can jailbreak its new AI safety system
Vom KI-Betreiber zum Anbieter: Haftung und Risiken Künstliche Intelligenz verändert viele Branchen grundlegend. Doch wer haftet eigentlich, wenn KI-Systeme eingesetzt werden? Während Anbieter KI-Modelle entwickeln und bereitstellen, sind Betreiber diejenigen, die diese Modelle für ihre Zwecke nutzen. Doch in manchen Fällen können Betreiber rechtlich zu Anbietern werden – mit weitreichenden Konsequenzen. Prof. Dr. Philipp Hacker auf LinkedIn: LinkedIn - https://www.linkedin.com/in/philipp-hacker Betreiber vs. Anbieter: Wo liegt der Unterschied? Ein Betreiber ist eine Organisation oder Person, die ein KI-System unter eigener Aufsicht einsetzt. Typischerweise sind das beispielsweise Ärztinnen und Ärzte, die ein KI-gestütztes Diagnosetool verwenden. Ein Anbieter hingegen entwickelt oder vertreibt ein KI-Modell Die entscheidende Frage ist, wann Betreiber die Grenze zum Anbieter überschreiten. Dafür gibt es mehrere relevante Faktoren. Branding – Die „Scheinhaftung“ Wer eine bestehende KI-Lösung unter eigenem Namen oder eigener Marke anbietet, übernimmt rechtlich Verantwortung. Wenn ein Unternehmen beispielsweise eine bestehende KI in „BoschGPT“ umbenennt, wird es nicht mehr nur als Betreiber, sondern rechtlich wie ein Anbieter behandelt. Unternehmen sollten vermeiden, ein generisches KI-Modell mit der eigenen Marke zu verknüpfen, um unerwünschte Haftungsrisiken zu umgehen. Zweckänderung – KI für Hochrisikoanwendungen nutzen Ein KI-Modell ist nicht automatisch „hochrisikoreich“. General Purpose AI wie GPT oder Claude fällt nicht per se in diese Kategorie. Wird ein solches Modell jedoch in einem Hochrisikobereich wie medizinischer Diagnostik oder Recruiting-Prozessen eingesetzt, wird das Unternehmen rechtlich zum Anbieter. Der ursprüngliche Entwickler kann nicht nachvollziehen, wie Millionen von Nutzern seine KI einsetzen. Deshalb trifft die Verantwortung denjenigen, der die KI für ein Hochrisiko-Szenario nutzt. Sobald eine KI zur Kreditbewertung oder Personalauswahl eingesetzt wird, greift die Hochrisikoklassifizierung. Das Unternehmen muss sicherstellen, dass das Modell für diesen Zweck zertifiziert und geeignet ist. Feintuning – Anpassung eines bestehenden Modells Wer ein KI-Modell durch Feintuning verändert, kann ebenfalls zum Anbieter werden. Das gilt insbesondere dann, wenn das Modell ursprünglich nicht für den Hochrisikobereich bestimmt war. Wenn ein Unternehmen eine offene KI wie Llama oder Mistral mit spezifischen Daten für ein HR-Tool trainiert, kann es als Anbieter eingestuft werden. Die bloße Nutzung einer spezialisierten Software, die bereits als Hochrisikoprodukt zertifiziert ist, macht ein Unternehmen nicht zum Anbieter. Entscheidend ist, ob durch Feintuning eine wesentliche Änderung am Modell vorgenommen wird. Fine-Tuning kann neue Transparenz- und Compliance-Pflichten auslösen, insbesondere nach Artikel 55 der KI-Verordnung. Unternehmen müssen sich mit zusätzlichen Anforderungen wie Red-Teaming und Risikoanalysen auseinandersetzen. Möglichkeiten zur Risikominimierung Unternehmen können verschiedene Strategien nutzen, um rechtliche Fallstricke zu vermeiden. Es ist ratsam, keine eigene Marke auf bestehende KI-Modelle zu setzen, um eine Anbieterhaftung zu umgehen. General Purpose AI sollte nicht für Hochrisiko-Anwendungen genutzt werden, es sei denn, das Modell ist dafür zertifiziert. Feintuning sollte nur mit Bedacht eingesetzt werden. Alternativen wie Prompt Engineering oder Retrieval-Augmented Generation (RAG) verändern das Modell selbst nicht und können eine bessere Argumentationsgrundlage bieten, um nicht als Anbieter zu gelten. Ein weiterer Ansatz ist die Nutzung kleinerer Modelle mit weniger als 10^25 FLOPs, da diese unter regulatorischen Schwellenwerten bleiben. In manchen Fällen kann es auch sinnvoll sein, Haftung gezielt auszulagern, etwa durch die Gründung einer separaten Fazit Unternehmen, die KI-Modelle einsetzen, sollten sich bewusst sein, dass sie schnell vom Betreiber zum Anbieter werden können – mit erheblichen rechtlichen und regulatorischen Konsequenzen. Die sichere Nutzung von KI erfordert daher ein grundlegendes Verständnis der rechtlichen Rahmenbedingungen. Wer auf Hochrisikobereiche setzt oder Feintuning betreibt, sollte sich intensiv mit den Pflichten und Risiken auseinandersetzen. Wer Feintuning oder Hochrisiko-KI einsetzt, sollte sich frühzeitig mit KI-Compliance-Experten austauschen, um Haftungsrisiken zu minimieren. Noch mehr von den Koertings ... Das KI-Café ... jede Woche Mittwoch (>350 Teilnehmer) von 08:30 bis 10:00 Uhr ... online via Zoom .. kostenlos und nicht umsonstJede Woche Mittwoch um 08:30 Uhr öffnet das KI-Café seine Online-Pforten ... wir lösen KI-Anwendungsfälle live auf der Bühne ... moderieren Expertenpanel zu speziellen Themen (bspw. KI im Recruiting ... KI in der Qualitätssicherung ... KI im Projektmanagement ... und vieles mehr) ... ordnen die neuen Entwicklungen in der KI-Welt ein und geben einen Ausblick ... und laden Experten ein für spezielle Themen ... und gehen auch mal in die Tiefe und durchdringen bestimmte Bereiche ganz konkret ... alles für dein Weiterkommen. Melde dich kostenfrei an ... www.koerting-institute.com/ki-cafe/ Das KI-Buch ... für Selbstständige und Unternehmer Lerne, wie ChatGPT deine Produktivität steigert, Zeit spart und Umsätze maximiert. Enthält praxisnahe Beispiele für Buchvermarktung, Text- und Datenanalysen sowie 30 konkrete Anwendungsfälle. Entwickle eigene Prompts, verbessere Marketing & Vertrieb und entlaste dich von Routineaufgaben. Geschrieben von Torsten & Birgit Koerting, Vorreitern im KI-Bereich, die Unternehmer bei der Transformation unterstützen. Das Buch ist ein Geschenk, nur Versandkosten von 6,95 € fallen an. Perfekt für Anfänger und Fortgeschrittene, die mit KI ihr Potenzial ausschöpfen möchten. Das Buch in deinen Briefkasten ... www.koerting-institute.com/ki-buch/ Die KI-Lounge ... unsere Community für den Einstieg in die KI (>1000 Mitglieder) Die KI-Lounge ist eine Community für alle, die mehr über generative KI erfahren und anwenden möchten. Mitglieder erhalten exklusive monatliche KI-Updates, Experten-Interviews, Vorträge des KI-Speaker-Slams, KI-Café-Aufzeichnungen und einen 3-stündigen ChatGPT-Kurs. Tausche dich mit über 1000 KI-Enthusiasten aus, stelle Fragen und starte durch. Initiiert von Torsten & Birgit Koerting, bietet die KI-Lounge Orientierung und Inspiration für den Einstieg in die KI-Revolution. Hier findet der Austausch statt ... www.koerting-institute.com/ki-lounge/ Starte mit uns in die 1:1 Zusammenarbeit Wenn du direkt mit uns arbeiten und KI in deinem Business integrieren möchtest, buche dir einen Termin für ein persönliches Gespräch. Gemeinsam finden wir Antworten auf deine Fragen und finden heraus, wie wir dich unterstützen können. Klicke hier, um einen Termin zu buchen und deine Fragen zu klären. Buche dir jetzt deinen Termin mit uns ... www.koerting-institute.com/termin/ Weitere Impulse im Netflix Stil ... Wenn du auf der Suche nach weiteren spannenden Impulsen für deine Selbstständigkeit bist, dann gehe jetzt auf unsere Impulseseite und lass die zahlreichen spannenden Impulse auf dich wirken. Inspiration pur ... www.koerting-institute.com/impulse/ Die Koertings auf die Ohren ... Wenn dir diese Podcastfolge gefallen hat, dann höre dir jetzt noch weitere informative und spannende Folgen an ... über 380 Folgen findest du hier ... www.koerting-institute.com/podcast/ Wir freuen uns darauf, dich auf deinem Weg zu begleiten!
HackerOne's co-founder, Michiel Prins walks us through the latest new offensive security service: AI red teaming. At the same time enterprises are globally trying to figure out how to QA and red team generative AI models like LLMs, early adopters are challenged to scale these tests. Crowdsourced bug bounty platforms are a natural place to turn for assistance with scaling this work, though, as we'll discuss on this episode, it is unlike anything bug hunters have ever tackled before. Segment Resources: https://www.hackerone.com/ai/snap-ai-red-teaming https://www.hackerone.com/thought-leadership/ai-safety-red-teaming This interview is a bit different from our norm. We talk to the founder and CEO of OpenVPN about what it is like to operate a business based on open source, particularly through trying times like the recent pandemic. How do you compete when your competitors are free to build products using your software and IP? It seems like an oxymoron, but an open source-based business actually has some significant advantages over the closed source commercial approach. In this week's enterprise security news, the first cybersecurity IPO in 3.5 years! new companies new tools the fate of CISA and the cyber safety review board things we learned about AI in 2024 is the humanless SOC possible? NGFWs have some surprising vulnerabilities what did generative music sound like in 1996? All that and more, on this episode of Enterprise Security Weekly. Visit https://www.securityweekly.com/esw for all the latest episodes! Show Notes: https://securityweekly.com/esw-391
HackerOne's co-founder, Michiel Prins walks us through the latest new offensive security service: AI red teaming. At the same time enterprises are globally trying to figure out how to QA and red team generative AI models like LLMs, early adopters are challenged to scale these tests. Crowdsourced bug bounty platforms are a natural place to turn for assistance with scaling this work, though, as we'll discuss on this episode, it is unlike anything bug hunters have ever tackled before. Segment Resources: https://www.hackerone.com/ai/snap-ai-red-teaming https://www.hackerone.com/thought-leadership/ai-safety-red-teaming This interview is a bit different from our norm. We talk to the founder and CEO of OpenVPN about what it is like to operate a business based on open source, particularly through trying times like the recent pandemic. How do you compete when your competitors are free to build products using your software and IP? It seems like an oxymoron, but an open source-based business actually has some significant advantages over the closed source commercial approach. In this week's enterprise security news, the first cybersecurity IPO in 3.5 years! new companies new tools the fate of CISA and the cyber safety review board things we learned about AI in 2024 is the humanless SOC possible? NGFWs have some surprising vulnerabilities what did generative music sound like in 1996? All that and more, on this episode of Enterprise Security Weekly. Visit https://www.securityweekly.com/esw for all the latest episodes! Show Notes: https://securityweekly.com/esw-391
HackerOne's co-founder, Michiel Prins walks us through the latest new offensive security service: AI red teaming. At the same time enterprises are globally trying to figure out how to QA and red team generative AI models like LLMs, early adopters are challenged to scale these tests. Crowdsourced bug bounty platforms are a natural place to turn for assistance with scaling this work, though, as we'll discuss on this episode, it is unlike anything bug hunters have ever tackled before. Segment Resources: https://www.hackerone.com/ai/snap-ai-red-teaming https://www.hackerone.com/thought-leadership/ai-safety-red-teaming Show Notes: https://securityweekly.com/esw-391
HackerOne's co-founder, Michiel Prins walks us through the latest new offensive security service: AI red teaming. At the same time enterprises are globally trying to figure out how to QA and red team generative AI models like LLMs, early adopters are challenged to scale these tests. Crowdsourced bug bounty platforms are a natural place to turn for assistance with scaling this work, though, as we'll discuss on this episode, it is unlike anything bug hunters have ever tackled before. Segment Resources: https://www.hackerone.com/ai/snap-ai-red-teaming https://www.hackerone.com/thought-leadership/ai-safety-red-teaming Show Notes: https://securityweekly.com/esw-391
How to identify risks in AI models? Red teaming is one of the options, says the guest of AI at Scale podcast - Dr. Rumman Chowdhury, CEO of Humane Intelligence, US Science Envoy for Artificial Intelligence. Rumman guides us through her approach to detecting risks, ensuring transparency and accountability in AI systems. She emphasizes the importance of responsible AI practices and shares her perspective on the role of regulation in fostering innovation. Recognized as one of Time's 100 most Influential People in AI, she offers valuable insights on navigating ethical challenges in AI development.
Das ist das KI-Update vom 15.01.2025 unter anderem mit diesen Themen: Microsoft strukturiert KI-Entwicklung radikal um USA öffnen Staatsgelände für KI-Rechenzentren IT-Fachkräftemangel in Deutschland: Alarmierende Passivität Microsofts KI-Sicherheitstests zeigen überraschende Schwachstellen Links zu allen Themen der heutigen Folge findet Ihr hier: https://heise.de/-10243218 https://www.heise.de/thema/KI-Update https://pro.heise.de/ki/ https://www.heise.de/newsletter/anmeldung.html?id=ki-update https://www.heise.de/thema/Kuenstliche-Intelligenz https://the-decoder.de/ https://www.heiseplus.de/podcast https://www.ct.de/ki https://www.ki-adventskalender.de/
AI governance is a rapidly evolving field that faces a wide array of risks, challenges and opportunities. For organizations looking to leverage AI systems such as large language models and generative AI, assessing risk prior to deployment is a must. One technique that's been borrowed from the security space is red teaming. The practice is growing, and regulators are taking notice. Brenda Leong, a partner of Luminos Law, helps global businesses manage their AI and data risks. I recently caught up with her to discuss what organizations should be thinking about when diving into red teaming to assess risk prior to deployment.
Streamline Your Cybersecurity with Flare Here: https://try.flare.io/unsupervised-learning/ In this conversation, I speak with Jason Haddix, founder of Arcanum Security and CISO at Flare. We talk about: Flare's Unique Approach to Threat Intelligence:How Flare's capability to uncover compromised credentials and cookies from the dark web and private forums has been crucial in red team engagements. Challenges of Credential Theft and Advanced Malware Techniques:How adversaries utilize tools like the RedLine Stealer malware to gather credentials, cookies, and other sensitive information, and this stolen data enables attackers to bypass authentication protocols, emphasizing the need for comprehensive exposure management. Jason's Journey To Founding Arcanum & Arcanum's Security Training Programs:How Jason now advises on product development and threat intelligence as Flare's CISO and his journey to fund Arcanum, a company focused on red teaming and cybersecurity, and Arcanum's specialized training programs focusing on offensive security and using AI in security roles. And more Introduction to the Podcast (00:00:00)Guest Excitement on Podcast (00:00:20)Jason's New Business and Flare Role (00:00:24)Career Shift from Ubisoft to Red Teaming (00:01:02)Evolution of Adversary Tactics (00:02:04)Flare's Credential Exposure Management (00:02:58)Synergy Between Arcanum and Flare(00:03:55)Dark Web Credential Compromise (00:04:45)Challenges with Two-Factor Authentication (00:06:25)Cookie Theft and Unauthorized Access (00:07:39)Redline Malware and Its Impact (00:08:12)Flare's Research Capabilities (00:09:50)Potential for Advanced Malware Detection (00:11:40)Expansion of Threat Intelligence Services (00:12:15)Vision for a Unified Security Dashboard (00:13:25)Integrating Threat Intelligence with Identity Management (00:14:00)Credential Update Notifications via API (00:15:54)Automated Credential Management Potential (00:17:28)AI Features in Security Platforms (00:17:32)Exploration of Automated Security Responses (00:18:38)Introduction to Arcanum Security (00:19:25)Overview of Arcanum Training Courses (00:20:25)Necessity for Up-to-Date Training (00:22:15)Guest Experts in Training Sessions (00:23:08)Upcoming Features for Flare (00:25:11)Integrating Vulnerability Management (00:28:08)Accessing Flare's Free Trial (00:28:25)Learning More About Arcanum (00:29:09)Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.
In this episode of the Microsoft Threat Intelligence Podcast host Sherrod DeGrippo is joined by Yonatan Zunger, CVP of AI Safety and Security at Microsoft. The conversation delves into the critical role of the AI Red Team, which focuses on identifying vulnerabilities in AI systems. Yonatan emphasizes the importance of ensuring the safety of Microsoft's AI products and the innovative methods the team employs to simulate potential threats, including how they assess risk and develop effective responses. This engaging dialogue offers insights into the intersection of technology, security, and human behavior in the evolving landscape of AI. In this episode you'll learn: Why securing AI systems requires understanding their unique psychology The importance of training and technical mitigations to enhance AI safety How financial incentives drive performance improvements in AI systems Some questions we ask: How does Retrieval Augmented Generation (RAG) work? What are the potential risks with data access and permissions in AI systems? Should users tell language models that accuracy affects their rewards to improve responses? Resources: View Yonatan Zunger on LinkedIn View Sherrod DeGrippo on LinkedIn Related Microsoft Podcasts: Afternoon Cyber Tea with Ann Johnson The BlueHat Podcast Uncovering Hidden Risks Discover and follow other Microsoft podcasts at microsoft.com/podcasts Get the latest threat intelligence insights and guidance at Microsoft Security Insider The Microsoft Threat Intelligence Podcast is produced by Microsoft and distributed as part of N2K media network.
Enjoy this encore episode. The practice of emulating known adversary behavior against an organization's actual defensive posture.
Ben is founder and CEO of watchTowr, building an external attack surface management tool (EASM) that performs automated penetration testing and red teaming activities. Before founding watchTowr in 2021, Ben worked as a security consultant for a decade focused largely on penetration testing. And as Ben describes in the episode, what started as a combination of cobbled together scripts from his previous experience has since grown into a comprehensive automation platform. Website: https://watchtowr.com/ Sponsor: VulnCheck
Enjoy this encore episode. The practice of emulating known adversary behavior against an organization's actual defensive posture. Learn more about your ad choices. Visit megaphone.fm/adchoices
Apologies for lower audio quality; we lost recordings and had to use backup tracks. Our guests today are Anastasios Angelopoulos and Wei-Lin Chiang, leads of Chatbot Arena, fka LMSYS, the crowdsourced AI evaluation platform developed by the LMSys student club at Berkeley, which became the de facto standard for comparing language models. Arena ELO is often more cited than MMLU scores to many folks, and they have attracted >1,000,000 people to cast votes since its launch, leading top model trainers to cite them over their own formal academic benchmarks:The Limits of Static BenchmarksWe've done two benchmarks episodes: Benchmarks 101 and Benchmarks 201. One issue we've always brought up with static benchmarks is that 1) many are getting saturated, with models scoring almost perfectly on them 2) they often don't reflect production use cases, making it hard for developers and users to use them as guidance. The fundamental challenge in AI evaluation isn't technical - it's philosophical. How do you measure something that increasingly resembles human intelligence? Rather than trying to define intelligence upfront, Arena let users interact naturally with models and collect comparative feedback. It's messy and subjective, but that's precisely the point - it captures the full spectrum of what people actually care about when using AI.The Pareto Frontier of Cost vs IntelligenceBecause the Elo scores are remarkably stable over time, we can put all the chat models on a map against their respective cost to gain a view of at least 3 orders of magnitude of model sizes/costs and observe the remarkable shift in intelligence per dollar over the past year:This frontier stood remarkably firm through the recent releases of o1-preview and price cuts of Gemini 1.5:The Statistics of SubjectivityIn our Benchmarks 201 episode, Clémentine Fourrier from HuggingFace thought this design choice was one of shortcomings of arenas: they aren't reproducible. You don't know who ranked what and what exactly the outcome was at the time of ranking. That same person might rank the same pair of outputs differently on a different day, or might ask harder questions to better models compared to smaller ones, making it imbalanced. Another argument that people have brought up is confirmation bias. We know humans prefer longer responses and are swayed by formatting - Rob Mulla from Dreadnode had found some interesting data on this in May:The approach LMArena is taking is to use logistic regression to decompose human preferences into constituent factors. As Anastasios explains: "We can say what components of style contribute to human preference and how they contribute." By adding these style components as parameters, they can mathematically "suck out" their influence and isolate the core model capabilities.This extends beyond just style - they can control for any measurable factor: "What if I want to look at the cost adjusted performance? Parameter count? We can ex post facto measure that." This is one of the most interesting things about Arena: You have a data generation engine which you can clean and turn into leaderboards later. If you wanted to create a leaderboard for poetry writing, you could get existing data from Arena, normalize it by identifying these style components. Whether or not it's possible to really understand WHAT bias the voters have, that's a different question.Private EvalsOne of the most delicate challenges LMSYS faces is maintaining trust while collaborating with AI labs. The concern is that labs could game the system by testing multiple variants privately and only releasing the best performer. This was brought up when 4o-mini released and it ranked as the second best model on the leaderboard:But this fear misunderstands how Arena works. Unlike static benchmarks where selection bias is a major issue, Arena's live nature means any initial bias gets washed out by ongoing evaluation. As Anastasios explains: "In the long run, there's way more fresh data than there is data that was used to compare these five models." The other big question is WHAT model is actually being tested; as people often talk about on X / Discord, the same endpoint will randomly feel “nerfed” like it happened for “Claude European summer” and corresponding conspiracy theories:It's hard to keep track of these performance changes in Arena as these changes (if real…?) are not observable.The Future of EvaluationThe team's latest work on RouteLLM points to an interesting future where evaluation becomes more granular and task-specific. But they maintain that even simple routing strategies can be powerful - like directing complex queries to larger models while handling simple tasks with smaller ones.Arena is now going to expand beyond text into multimodal evaluation and specialized domains like code execution and red teaming. But their core insight remains: the best way to evaluate intelligence isn't to simplify it into metrics, but to embrace its complexity and find rigorous ways to analyze it. To go after this vision, they are spinning out Arena from LMSys, which will stay as an academia-driven group at Berkeley.Full Video PodcastChapters* 00:00:00 - Introductions* 00:01:16 - Origin and development of Chatbot Arena* 00:05:41 - Static benchmarks vs. Arenas* 00:09:03 - Community building* 00:13:32 - Biases in human preference evaluation* 00:18:27 - Style Control and Model Categories* 00:26:06 - Impact of o1* 00:29:15 - Collaborating with AI labs* 00:34:51 - RouteLLM and router models* 00:38:09 - Future of LMSys / ArenaShow Notes* Anastasios Angelopoulos* Anastasios' NeurIPS Paper Conformal Risk Control* Wei-Lin Chiang* Chatbot Arena* LMSys* MTBench* ShareGPT dataset* Stanford's Alpaca project* LLMRouter* E2B* DreadnodeTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, Partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.Swyx [00:00:14]: Hey, and today we're very happy and excited to welcome Anastasios and Wei Lin from LMSys. Welcome guys.Wei Lin [00:00:21]: Hey, how's it going? Nice to see you.Anastasios [00:00:23]: Thanks for having us.Swyx [00:00:24]: Anastasios, I actually saw you, I think at last year's NeurIPS. You were presenting a paper, which I don't really super understand, but it was some theory paper about how your method was very dominating over other sort of search methods. I don't remember what it was, but I remember that you were a very confident speaker.Anastasios [00:00:40]: Oh, I totally remember you. Didn't ever connect that, but yes, that's definitely true. Yeah. Nice to see you again.Swyx [00:00:46]: Yeah. I was frantically looking for the name of your paper and I couldn't find it. Basically I had to cut it because I didn't understand it.Anastasios [00:00:51]: Is this conformal PID control or was this the online control?Wei Lin [00:00:55]: Blast from the past, man.Swyx [00:00:57]: Blast from the past. It's always interesting how NeurIPS and all these academic conferences are sort of six months behind what people are actually doing, but conformal risk control, I would recommend people check it out. I have the recording. I just never published it just because I was like, I don't understand this enough to explain it.Anastasios [00:01:14]: People won't be interested.Wei Lin [00:01:15]: It's all good.Swyx [00:01:16]: But ELO scores, ELO scores are very easy to understand. You guys are responsible for the biggest revolution in language model benchmarking in the last few years. Maybe you guys want to introduce yourselves and maybe tell a little bit of the brief history of LMSysWei Lin [00:01:32]: Hey, I'm Wei Lin. I'm a fifth year PhD student at UC Berkeley, working on Chatbot Arena these days, doing crowdsourcing AI benchmarking.Anastasios [00:01:43]: I'm Anastasios. I'm a sixth year PhD student here at Berkeley. I did most of my PhD on like theoretical statistics and sort of foundations of model evaluation and testing. And now I'm working 150% on this Chatbot Arena stuff. It's great.Alessio [00:02:00]: And what was the origin of it? How did you come up with the idea? How did you get people to buy in? And then maybe what were one or two of the pivotal moments early on that kind of made it the standard for these things?Wei Lin [00:02:12]: Yeah, yeah. Chatbot Arena project was started last year in April, May, around that. Before that, we were basically experimenting in a lab how to fine tune a chatbot open source based on the Llama 1 model that I released. At that time, Lama 1 was like a base model and people didn't really know how to fine tune it. So we were doing some explorations. We were inspired by Stanford's Alpaca project. So we basically, yeah, grow a data set from the internet, which is called ShareGPT data set, which is like a dialogue data set between user and chat GPT conversation. It turns out to be like pretty high quality data, dialogue data. So we fine tune on it and then we train it and release the model called V2. And people were very excited about it because it kind of like demonstrate open way model can reach this conversation capability similar to chat GPT. And then we basically release the model with and also build a demo website for the model. People were very excited about it. But during the development, the biggest challenge to us at the time was like, how do we even evaluate it? How do we even argue this model we trained is better than others? And then what's the gap between this open source model that other proprietary offering? At that time, it was like GPT-4 was just announced and it's like Cloud One. What's the difference between them? And then after that, like every week, there's a new model being fine tuned, released. So even until still now, right? And then we have that demo website for V2 now. And then we thought like, okay, maybe we can add a few more of the model as well, like API model as well. And then we quickly realized that people need a tool to compare between different models. So we have like a side by side UI implemented on the website to that people choose, you know, compare. And we quickly realized that maybe we can do something like, like a battle on top of ECLMs, like just anonymize it, anonymize the identity, and that people vote which one is better. So the community decides which one is better, not us, not us arguing, you know, our model is better or what. And that turns out to be like, people are very excited about this idea. And then we tweet, we launch, and that's, yeah, that's April, May. And then it was like first two, three weeks, like just a few hundred thousand views tweet on our launch tweets. And then we have regularly double update weekly, beginning at a time, adding new model GPT-4 as well. So it was like, that was the, you know, the initial.Anastasios [00:04:58]: Another pivotal moment, just to jump in, would be private models, like the GPT, I'm a little,Wei Lin [00:05:04]: I'm a little chatty. That was this year. That was this year.Anastasios [00:05:07]: Huge.Wei Lin [00:05:08]: That was also huge.Alessio [00:05:09]: In the beginning, I saw the initial release was May 3rd of the beta board. On April 6, we did a benchmarks 101 episode for a podcast, just kind of talking about, you know, how so much of the data is like in the pre-training corpus and blah, blah, blah. And like the benchmarks are really not what we need to evaluate whether or not a model is good. Why did you not make a benchmark? Maybe at the time, you know, it was just like, Hey, let's just put together a whole bunch of data again, run a, make a score that seems much easier than coming out with a whole website where like users need to vote. Any thoughts behind that?Wei Lin [00:05:41]: I think it's more like fundamentally, we don't know how to automate this kind of benchmarks when it's more like, you know, conversational, multi-turn, and more open-ended task that may not come with a ground truth. So let's say if you ask a model to help you write an email for you for whatever purpose, there's no ground truth. How do you score them? Or write a story or a creative story or many other things like how we use ChatterBee these days. It's more open-ended. You know, we need human in the loop to give us feedback, which one is better. And I think nuance here is like, sometimes it's also hard for human to give the absolute rating. So that's why we have this kind of pairwise comparison, easier for people to choose which one is better. So from that, we use these pairwise comparison, those to calculate the leaderboard. Yeah. You can add more about this methodology.Anastasios [00:06:40]: Yeah. I think the point is that, and you guys probably also talked about this at some point, but static benchmarks are intrinsically, to some extent, unable to measure generative model performance. And the reason is because you cannot pre-annotate all the outputs of a generative model. You change the model, it's like the distribution of your data is changing. New labels to deal with that. New labels are great automated labeling, right? Which is why people are pursuing both. And yeah, static benchmarks, they allow you to zoom in to particular types of information like factuality, historical facts. We can build the best benchmark of historical facts, and we will then know that the model is great at historical facts. But ultimately, that's not the only axis, right? And we can build 50 of them, and we can evaluate 50 axes. But it's just so, the problem of generative model evaluation is just so expansive, and it's so subjective, that it's just maybe non-intrinsically impossible, but at least we don't see a way. We didn't see a way of encoding that into a fixed benchmark.Wei Lin [00:07:47]: But on the other hand, I think there's a challenge where this kind of online dynamic benchmark is more expensive than static benchmark, offline benchmark, where people still need it. Like when they build models, they need static benchmark to track where they are.Anastasios [00:08:03]: It's not like our benchmark is uniformly better than all other benchmarks, right? It just measures a different kind of performance that has proved to be useful.Swyx [00:08:14]: You guys also published MTBench as well, which is a static version, let's say, of Chatbot Arena, right? That people can actually use in their development of models.Wei Lin [00:08:25]: Right. I think one of the reasons we still do this static benchmark, we still wanted to explore, experiment whether we can automate this, because people, eventually, model developers need it to fast iterate their model. So that's why we explored LM as a judge, and ArenaHard, trying to filter, select high-quality data we collected from Chatbot Arena, the high-quality subset, and use that as a question and then automate the judge pipeline, so that people can quickly get high-quality signal, benchmark signals, using this online benchmark.Swyx [00:09:03]: As a community builder, I'm curious about just the initial early days. Obviously when you offer effectively free A-B testing inference for people, people will come and use your arena. What do you think were the key unlocks for you? Was it funding for this arena? Was it marketing? When people came in, do you see a noticeable skew in the data? Which obviously now you have enough data sets, you can separate things out, like coding and hard prompts, but in the early days, it was just all sorts of things.Anastasios [00:09:31]: Yeah, maybe one thing to establish at first is that our philosophy has always been to maximize organic use. I think that really does speak to your point, which is, yeah, why do people come? They came to use free LLM inference, right? And also, a lot of users just come to the website to use direct chat, because you can chat with the model for free. And then you could think about it like, hey, let's just be kind of like more on the selfish or conservative or protectionist side and say, no, we're only giving credits for people that battle or so on and so forth. Strategy wouldn't work, right? Because what we're trying to build is like a big funnel, a big funnel that can direct people. And some people are passionate and interested and they battle. And yes, the distribution of the people that do that is different. It's like, as you're pointing out, it's like, that's not as they're enthusiastic.Wei Lin [00:10:24]: They're early adopters of this technology.Anastasios [00:10:27]: Or they like games, you know, people like this. And we've run a couple of surveys that indicate this as well, of our user base.Wei Lin [00:10:36]: We do see a lot of developers come to the site asking polling questions, 20-30%. Yeah, 20-30%.Anastasios [00:10:42]: It's obviously not reflective of the general population, but it's reflective of some corner of the world of people that really care. And to some extent, maybe that's all right, because those are like the power users. And you know, we're not trying to claim that we represent the world, right? We represent the people that come and vote.Swyx [00:11:02]: Did you have to do anything marketing-wise? Was anything effective? Did you struggle at all? Was it success from day one?Wei Lin [00:11:09]: At some point, almost done. Okay. Because as you can imagine, this leaderboard depends on community engagement participation. If no one comes to vote tomorrow, then no leaderboard.Anastasios [00:11:23]: So we had some period of time when the number of users was just, after the initial launch, it went lower. Yeah. And, you know, at some point, it did not look promising. Actually, I joined the project a couple months in to do the statistical aspects, right? As you can imagine, that's how it kind of hooked into my previous work. At that time, it wasn't like, you know, it definitely wasn't clear that this was like going to be the eval or something. It was just like, oh, this is a cool project. Like Wayland seems awesome, you know, and that's it.Wei Lin [00:11:56]: Definitely. There's in the beginning, because people don't know us, people don't know what this is for. So we had a hard time. But I think we were lucky enough that we have some initial momentum. And as well as the competition between model providers just becoming, you know, became very intense. Intense. And then that makes the eval onto us, right? Because always number one is number one.Anastasios [00:12:23]: There's also an element of trust. Our main priority in everything we do is trust. We want to make sure we're doing everything like all the I's are dotted and the T's are crossed and nobody gets unfair treatment and people can see from our profiles and from our previous work and from whatever, you know, we're trustworthy people. We're not like trying to make a buck and we're not trying to become famous off of this or that. It's just, we're trying to provide a great public leaderboard community venture project.Wei Lin [00:12:51]: Yeah.Swyx [00:12:52]: Yes. I mean, you are kind of famous now, you know, that's fine. Just to dive in more into biases and, you know, some of this is like statistical control. The classic one for human preference evaluation is humans demonstrably prefer longer contexts or longer outputs, which is actually something that we don't necessarily want. You guys, I think maybe two months ago put out some length control studies. Apart from that, there are just other documented biases. Like, I'd just be interested in your review of what you've learned about biases and maybe a little bit about how you've controlled for them.Anastasios [00:13:32]: At a very high level, yeah. Humans are biased. Totally agree. Like in various ways. It's not clear whether that's good or bad, you know, we try not to make value judgments about these things. We just try to describe them as they are. And our approach is always as follows. We collect organic data and then we take that data and we mine it to get whatever insights we can get. And, you know, we have many millions of data points that we can now use to extract insights from. Now, one of those insights is to ask the question, what is the effect of style, right? You have a bunch of data, you have votes, people are voting either which way. We have all the conversations. We can say what components of style contribute to human preference and how do they contribute? Now, that's an important question. Why is that an important question? It's important because some people want to see which model would be better if the lengths of the responses were the same, were to be the same, right? People want to see the causal effect of the model's identity controlled for length or controlled for markdown, number of headers, bulleted lists, is the text bold? Some people don't, they just don't care about that. The idea is not to impose the judgment that this is not important, but rather to say ex post facto, can we analyze our data in a way that decouples all the different factors that go into human preference? Now, the way we do this is via statistical regression. That is to say the arena score that we show on our leaderboard is a particular type of linear model, right? It's a linear model that takes, it's a logistic regression that takes model identities and fits them against human preference, right? So it regresses human preference against model identity. What you get at the end of that logistic regression is a parameter vector of coefficients. And when the coefficient is large, it tells you that GPT 4.0 or whatever, very large coefficient, that means it's strong. And that's exactly what we report in the table. It's just the predictive effect of the model identity on the vote. The other thing that you can do is you can take that vector, let's say we have M models, that is an M dimensional vector of coefficients. What you can do is you say, hey, I also want to understand what the effect of length is. So I'll add another entry to that vector, which is trying to predict the vote, right? That tells me the difference in length between two model responses. So we have that for all of our data. We can compute it ex post facto. We added it into the regression and we look at that predictive effect. And then the idea, and this is formally true under certain conditions, not always verifiable ones, but the idea is that adding that extra coefficient to this vector will kind of suck out the predictive power of length and put it into that M plus first coefficient and quote, unquote, de-bias the rest so that the effect of length is not included. And that's what we do in style control. Now we don't just do it for M plus one. We have, you know, five, six different style components that have to do with markdown headers and bulleted lists and so on that we add here. Now, where is this going? You guys see the idea. It's a general methodology. If you have something that's sort of like a nuisance parameter, something that exists and provides predictive value, but you really don't want to estimate that. You want to remove its effect. In causal inference, these things are called like confounders often. What you can do is you can model the effect. You can put them into your model and try to adjust for them. So another one of those things might be cost. You know, what if I want to look at the cost adjusted performance of my model, which models are punching above their weight, parameter count, which models are punching above their weight in terms of parameter count, we can ex post facto measure that. We can do it without introducing anything that compromises the organic nature of theWei Lin [00:17:17]: data that we collect.Anastasios [00:17:18]: Hopefully that answers the question.Wei Lin [00:17:20]: It does.Swyx [00:17:21]: So I guess with a background in econometrics, this is super familiar.Anastasios [00:17:25]: You're probably better at this than me for sure.Swyx [00:17:27]: Well, I mean, so I used to be, you know, a quantitative trader and so, you know, controlling for multiple effects on stock price is effectively the job. So it's interesting. Obviously the problem is proving causation, which is hard, but you don't have to do that.Anastasios [00:17:45]: Yes. Yes, that's right. And causal inference is a hard problem and it goes beyond statistics, right? It's like you have to build the right causal model and so on and so forth. But we think that this is a good first step and we're sort of looking forward to learning from more people. You know, there's some good people at Berkeley that work on causal inference for the learning from them on like, what are the really most contemporary techniques that we can use in order to estimate true causal effects if possible.Swyx [00:18:10]: Maybe we could take a step through the other categories. So style control is a category. It is not a default. I have thought that when you wrote that blog post, actually, I thought it would be the new default because it seems like the most obvious thing to control for. But you also have other categories, you have coding, you have hard prompts. We consider that.Anastasios [00:18:27]: We're still actively considering it. It's just, you know, once you make that step, once you take that step, you're introducing your opinion and I'm not, you know, why should our opinion be the one? That's kind of a community choice. We could put it to a vote.Wei Lin [00:18:39]: We could pass.Anastasios [00:18:40]: Yeah, maybe do a poll. Maybe do a poll.Swyx [00:18:42]: I don't know. No opinion is an opinion.Wei Lin [00:18:44]: You know what I mean?Swyx [00:18:45]: Yeah.Wei Lin [00:18:46]: There's no neutral choice here.Swyx [00:18:47]: Yeah. You have all these others. You have instruction following too. What are your favorite categories that you like to talk about? Maybe you tell a little bit of the stories, tell a little bit of like the hard choices that you had to make.Wei Lin [00:18:57]: Yeah. Yeah. Yeah. I think the, uh, initially the reason why we want to add these new categories is essentially to answer some of the questions from our community, which is we won't have a single leaderboard for everything. So these models behave very differently in different domains. Let's say this model is trend for coding, this model trend for more technical questions and so on. On the other hand, to answer people's question about like, okay, what if all these low quality, you know, because we crowdsource data from the internet, there will be noise. So how do we de-noise? How do we filter out these low quality data effectively? So that was like, you know, some questions we want to answer. So basically we spent a few months, like really diving into these questions to understand how do we filter all these data because these are like medias of data points. And then if you want to re-label yourself, it's possible, but we need to kind of like to automate this kind of data classification pipeline for us to effectively categorize them to different categories, say coding, math, structure, and also harder problems. So that was like, the hope is when we slice the data into these meaningful categories to give people more like better signals, more direct signals, and that's also to clarify what we are actually measuring for, because I think that's the core part of the benchmark. That was the initial motivation. Does that make sense?Anastasios [00:20:27]: Yeah. Also, I'll just say, this does like get back to the point that the philosophy is to like mine organic, to take organic data and then mine it x plus factor.Alessio [00:20:35]: Is the data cage-free too, or just organic?Anastasios [00:20:39]: It's cage-free.Wei Lin [00:20:40]: No GMO. Yeah. And all of these efforts are like open source, like we open source all of the data cleaning pipeline, filtering pipeline. Yeah.Swyx [00:20:50]: I love the notebooks you guys publish. Actually really good just for learning statistics.Wei Lin [00:20:54]: Yeah. I'll share this insights with everyone.Alessio [00:20:59]: I agree on the initial premise of, Hey, writing an email, writing a story, there's like no ground truth. But I think as you move into like coding and like red teaming, some of these things, there's like kind of like skill levels. So I'm curious how you think about the distribution of skill of the users. Like maybe the top 1% of red teamers is just not participating in the arena. So how do you guys think about adjusting for it? And like feels like this where there's kind of like big differences between the average and the top. Yeah.Anastasios [00:21:29]: Red teaming, of course, red teaming is quite challenging. So, okay. Moving back. There's definitely like some tasks that are not as subjective that like pairwise human preference feedback is not the only signal that you would want to measure. And to some extent, maybe it's useful, but it may be more useful if you give people better tools. For example, it'd be great if we could execute code with an arena, be fantastic.Wei Lin [00:21:52]: We want to do it.Anastasios [00:21:53]: There's also this idea of constructing a user leaderboard. What does that mean? That means some users are better than others. And how do we measure that? How do we quantify that? Hard in chatbot arena, but where it is easier is in red teaming, because in red teaming, there's an explicit game. You're trying to break the model, you either win or you lose. So what you can do is you can say, Hey, what's really happening here is that the models and humans are playing a game against one another. And then you can use the same sort of Bradley Terry methodology with some, some extensions that we came up with in one of you can read one of our recent blog posts for, for the sort of theoretical extensions. You can attribute like strength back to individual players and jointly attribute strength to like the models that are in this jailbreaking game, along with the target tasks, like what types of jailbreaks you want.Wei Lin [00:22:44]: So yeah.Anastasios [00:22:45]: And I think that this is, this is a hugely important and interesting avenue that we want to continue researching. We have some initial ideas, but you know, all thoughts are welcome.Wei Lin [00:22:54]: Yeah.Alessio [00:22:55]: So first of all, on the code execution, the E2B guys, I'm sure they'll be happy to helpWei Lin [00:22:59]: you.Alessio [00:23:00]: I'll please set that up. They're big fans. We're investors in a company called Dreadnought, which we do a lot in AI red teaming. I think to me, the most interesting thing has been, how do you do sure? Like the model jailbreak is one side. We also had Nicola Scarlini from DeepMind on the podcast, and he was talking about, for example, like, you know, context stealing and like a weight stealing. So there's kind of like a lot more that goes around it. I'm curious just how you think about the model and then maybe like the broader system, even with Red Team Arena, you're just focused on like jailbreaking of the model, right? You're not doing kind of like any testing on the more system level thing of the model where like, maybe you can get the training data back, you're going to exfiltrate some of the layers and the weights and things like that.Wei Lin [00:23:43]: So right now, as you can see, the Red Team Arena is at a very early stage and we are still exploring what could be the potential new games we can introduce to the platform. So the idea is still the same, right? And we build a community driven project platform for people. They can have fun with this website, for sure. That's one thing, and then help everyone to test these models. So one of the aspects you mentioned is stealing secrets, stealing training sets. That could be one, you know, it could be designed as a game. Say, can you still use their credential, you know, we hide, maybe we can hide the credential into system prompts and so on. So there are like a few potential ideas we want to explore for sure. Do you want to add more?Anastasios [00:24:28]: I think that this is great. This idea is a great one. There's a lot of great ideas in the Red Teaming space. You know, I'm not personally like a Red Teamer. I don't like go around and Red Team models, but there are people that do that and they're awesome. They're super skilled. When I think about the Red Team arena, I think those are really the people that we're building it for. Like, we want to make them excited and happy, build tools that they like. And just like chatbot arena, we'll trust that this will end up being useful for the world. And all these people are, you know, I won't say all these people in this community are actually good hearted, right? They're not doing it because they want to like see the world burn. They're doing it because they like, think it's fun and cool. And yeah. Okay. Maybe they want to see, maybe they want a little bit.Wei Lin [00:25:13]: I don't know. Majority.Anastasios [00:25:15]: Yeah.Wei Lin [00:25:16]: You know what I'm saying.Anastasios [00:25:17]: So, you know, trying to figure out how to serve them best, I think, I don't know where that fits. I just, I'm not expressing. And give them credits, right?Wei Lin [00:25:24]: And give them credit.Anastasios [00:25:25]: Yeah. Yeah. So I'm not trying to express any particular value judgment here as to whether that's the right next step. It's just, that's sort of the way that I think we would think about it.Swyx [00:25:35]: Yeah. We also talked to Sander Schulhoff of the HackerPrompt competition, and he's pretty interested in Red Teaming at scale. Let's just call it that. You guys maybe want to talk with him.Wei Lin [00:25:45]: Oh, nice.Swyx [00:25:46]: We wanted to cover a little, a few topical things and then go into the other stuff that your group is doing. You know, you're not just running Chatbot Arena. We can also talk about the new website and your future plans, but I just wanted to briefly focus on O1. It is the hottest, latest model. Obviously, you guys already have it on the leaderboard. What is the impact of O1 on your evals?Wei Lin [00:26:06]: Made our interface slower.Anastasios [00:26:07]: It made it slower.Swyx [00:26:08]: Yeah.Wei Lin [00:26:10]: Because it needs like 30, 60 seconds, sometimes even more to, the latency is like higher. So that's one. Sure. But I think we observe very interesting things from this model as well. Like we observe like significant improvement in certain categories, like more technical or math. Yeah.Anastasios [00:26:32]: I think actually like one takeaway that was encouraging is that I think a lot of people before the O1 release were thinking, oh, like this benchmark is saturated. And why were they thinking that? They were thinking that because there was a bunch of models that were kind of at the same level. They were just kind of like incrementally competing and it sort of wasn't immediately obvious that any of them were any better. Nobody, including any individual person, it's hard to tell. But what O1 did is it was, it's clearly a better model for certain tasks. I mean, I used it for like proving some theorems and you know, there's some theorems that like only I know because I still do a little bit of theory. Right. So it's like, I can go in there and ask like, oh, how would you prove this exact thing? Which I can tell you has never been in the public domain. It'll do it. It's like, what?Wei Lin [00:27:19]: Okay.Anastasios [00:27:20]: So there's this model and it crushed the benchmark. You know, it's just like really like a big gap. And what that's telling us is that it's not saturated yet. It's still measuring some signal. That was encouraging. The point, the takeaway is that the benchmark is comparative. There's no absolute number. There's no maximum ELO. It's just like, if you're better than the rest, then you win. I think that was actually quite helpful to us.Swyx [00:27:46]: I think people were criticizing, I saw some of the academics criticizing it as not apples to apples. Right. Like, because it can take more time to reason, it's basically doing some search, doing some chain of thought that if you actually let the other models do that same thing, they might do better.Wei Lin [00:28:03]: Absolutely.Anastasios [00:28:04]: To be clear, none of the leaderboard currently is apples to apples because you have like Gemini Flash, you have, you know, all sorts of tiny models like Lama 8B, like 8B and 405B are not apples to apples.Wei Lin [00:28:19]: Totally agree. They have different latencies.Anastasios [00:28:21]: Different latencies.Wei Lin [00:28:22]: Control for latency. Yeah.Anastasios [00:28:24]: Latency control. That's another thing. We can do style control, but latency control. You know, things like this are important if you want to understand the trade-offs involved in using AI.Swyx [00:28:34]: O1 is a developing story. We still haven't seen the full model yet, but it's definitely a very exciting new paradigm. I think one community controversy I just wanted to give you guys space to address is the collaboration between you and the large model labs. People have been suspicious, let's just say, about how they choose to A-B test on you. I'll state the argument and let you respond, which is basically they run like five anonymous models and basically argmax their Elo on LMSYS or chatbot arena, and they release the best one. Right? What has been your end of the controversy? How have you decided to clarify your policy going forward?Wei Lin [00:29:15]: On a high level, I think our goal here is to build a fast eval for everyone, and including everyone in the community can see the data board and understand, compare the models. More importantly, I think we want to build the best eval also for model builders, like all these frontier labs building models. They're also internally facing a challenge, which is how do they eval the model? That's the reason why we want to partner with all the frontier lab people, and then to help them testing. That's one of the... We want to solve this technical challenge, which is eval. Yeah.Anastasios [00:29:54]: I mean, ideally, it benefits everyone, right?Wei Lin [00:29:56]: Yeah.Anastasios [00:29:57]: And people also are interested in seeing the leading edge of the models. People in the community seem to like that. Oh, there's a new model up. Is this strawberry? People are excited. People are interested. Yeah. And then there's this question that you bring up of, is it actually causing harm?Wei Lin [00:30:15]: Right?Anastasios [00:30:16]: Is it causing harm to the benchmark that we are allowing this private testing to happen? Maybe stepping back, why do you have that instinct? The reason why you and others in the community have that instinct is because when you look at something like a benchmark, like an image net, a static benchmark, what happens is that if I give you a million different models that are all slightly different, and I pick the best one, there's something called selection bias that plays in, which is that the performance of the winning model is overstated. This is also sometimes called the winner's curse. And that's because statistical fluctuations in the evaluation, they're driving which model gets selected as the top. So this selection bias can be a problem. Now there's a couple of things that make this benchmark slightly different. So first of all, the selection bias that you include when you're only testing five models is normally empirically small.Wei Lin [00:31:12]: And that's why we have these confidence intervals constructed.Anastasios [00:31:16]: That's right. Yeah. Our confidence intervals are actually not multiplicity adjusted. One thing that we could do immediately tomorrow in order to address this concern is if a model provider is testing five models and they want to release one, and we're constructing the models at level one minus alpha, we can just construct the intervals instead at level one minus alpha divided by five. That's called Bonferroni correction. What that'll tell you is that the final performance of the model, the interval that gets constructed, is actually formally correct. We don't do that right now, partially because we know from simulations that the amount of selection bias you incur with these five things is just not huge. It's not huge in comparison to the variability that you get from just regular human voters. So that's one thing. But then the second thing is the benchmark is live, right? So what ends up happening is it'll be a small magnitude, but even if you suffer from the winner's curse after testing these five models, what'll happen is that over time, because we're getting new data, it'll get adjusted down. So if there's any bias that gets introduced at that stage, in the long run, it actually doesn't matter. Because asymptotically, basically in the long run, there's way more fresh data than there is data that was used to compare these five models against these private models.Swyx [00:32:35]: The announcement effect is only just the first phase and it has a long tail.Anastasios [00:32:39]: Yeah, that's right. And it sort of like automatically corrects itself for this selection adjustment.Swyx [00:32:45]: Every month, I do a little chart of Ellim's ELO versus cost, just to track the price per dollar, the amount of like, how much money do I have to pay for one incremental point in ELO? And so I actually observe an interesting stability in most of the ELO numbers, except for some of them. For example, GPT-4-O August has fallen from 12.90
Let's Talk Automated Red TeamingExplore automated red teaming and red-blue team synergy with Ryan Hays, Global Head of Red Team at Citi, tackling misconceptions and fostering cross-team collaboration.+ + +Find more episodes on YouTube or wherever you listen to podcasts, as well as at netspi.com/agentofinfluence.
Hurricane Helene did a massive amount of damage to our state, and many weren't prepared for it. Perrin walks through his own thought process about a concept called “Red Teaming” and relates it to his family and business. The time to deploy it is “hopefully never.” The time to think about it is “NOW”.
Guest: Steve Wilson, Chief Product Officer, Exabeam [@exabeam] & Project Lead, OWASP Top 10 for Larage Language Model Applications [@owasp]On LinkedIn | https://www.linkedin.com/in/wilsonsd/On Twitter | https://x.com/virtualsteve____________________________Host: Sean Martin, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining CyberSecurity Podcast [@RedefiningCyber]On ITSPmagazine | https://www.itspmagazine.com/sean-martinView This Show's Sponsors___________________________Episode NotesIn this episode of Redefining CyberSecurity, host Sean Martin sat down with Steve Wilson, chief product officer at Exabeam, to discuss the critical topic of secure AI development. The conversation revolved around the nuances of developing and deploying large language models (LLMs) in the field of cybersecurity.Steve Wilson's expertise lies at the intersection of AI and cybersecurity, a point he emphasized while sharing his journey from founding the Top 10 group for large language models to authoring his new book, "The Developer's Playbook for Large Language Model Security." In this insightful discussion, Wilson and Martin explore the roles of developers and product managers in ensuring the safety and security of AI systems.One of the key themes in the conversation is the categorization of AI applications into chatbots, co-pilots, and autonomous agents. Wilson explains that while chatbots are open-ended, interacting with users on various topics, co-pilots focus on enhancing productivity within specific domains by interacting with user data. Autonomous agents are more independent, executing tasks with minimal human intervention.Wilson brings attention to the concept of overreliance on AI models and the associated risks. Highlighting that large language models can hallucinate or produce unreliable outputs, he stresses the importance of designing systems that account for these limitations. Product managers play a crucial role here, ensuring that AI applications are built to mitigate risks and communicate their reliability to users effectively.The discussion also touches on the importance of security guardrails and continuous monitoring. Wilson introduces the idea of using tools akin to web app firewalls (WAF) or runtime application self-protection (RASP) to keep AI models within safe operational parameters. He mentions frameworks like Nvidia's open-source project, Nemo Guardrails, which aid developers in implementing these defenses.Moreover, the conversation highlights the significance of testing and evaluation in AI development. Wilson parallels the education and evaluation of LLMs to training and testing a human-like system, underscoring that traditional unit tests may not suffice. Instead, flexible test cases and advanced evaluation tools are necessary. Another critical aspect Wilson discusses is the need for red teaming in AI security. By rigorously testing AI systems and exploring their vulnerabilities, organizations can better prepare for real-world threats. This proactive approach is essential for maintaining robust AI applications.Finally, Wilson shares insights from his book, including the Responsible AI Software Engineering (RAISE) framework. This comprehensive guide offers developers and product managers practical steps to integrate secure AI practices into their workflows. With an emphasis on continuous improvement and risk management, the RAISE framework serves as a valuable resource for anyone involved in AI development.About the BookLarge language models (LLMs) are not just shaping the trajectory of AI, they're also unveiling a new era of security challenges. This practical book takes you straight to the heart of these threats. Author Steve Wilson, chief product officer at Exabeam, focuses exclusively on LLMs, eschewing generalized AI security to delve into the unique characteristics and vulnerabilities inherent in these models.Complete with collective wisdom gained from the creation of the OWASP Top 10 for LLMs list—a feat accomplished by more than 400 industry experts—this guide delivers real-world guidance and practical strategies to help developers and security teams grapple with the realities of LLM applications. Whether you're architecting a new application or adding AI features to an existing one, this book is your go-to resource for mastering the security landscape of the next frontier in AI.___________________________SponsorsImperva: https://itspm.ag/imperva277117988LevelBlue: https://itspm.ag/attcybersecurity-3jdk3___________________________Watch this and other videos on ITSPmagazine's YouTube ChannelRedefining CyberSecurity Podcast with Sean Martin, CISSP playlist:
In this Emergency Pod of The Cognitive Revolution, Nathan provides crucial insights into OpenAI's new o1 and o1-mini reasoning models. Featuring exclusive interviews with members of the o1 Red Team from Apollo Research and Haize Labs, we explore the models' capabilities, safety profile, and OpenAI's pre-release testing approach. Dive into the implications of these advanced AI systems, including their potential to match or exceed expert performance in many areas. Join us for an urgent and informative discussion on the latest developments in AI technology and their impact on the future. o1 Safety Card Haize Labs Endless Jailbreaks with Bijection Learning: a Powerful, Scale-Agnostic Attack Method Haize Labs Job board Papers mentioned: https://arxiv.org/pdf/2407.21792 https://far.ai/post/2024-07-robust-llm/paper.pdf Apply to join over 400 Founders and Execs in the Turpentine Network: https://www.turpentinenetwork.co/ SPONSORS: Oracle: Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds; offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive Brave: The Brave search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference. All while remaining affordable with developer first pricing, integrating the Brave search API into your workflow translates to more ethical data sourcing and more human representative data sets. Try the Brave search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR Omneky: Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/ Squad: Head to Squad to access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist. RECOMMENDED PODCAST: This Won't Last. Eavesdrop on Keith Rabois, Kevin Ryan, Logan Bartlett, and Zach Weinberg's monthly backchannel. They unpack their hottest takes on the future of tech, business, venture, investing, and politics. Apple Podcasts: https://podcasts.apple.com/us/podcast/id1765665937 Spotify: https://open.spotify.com/show/2HwSNeVLL1MXy0RjFPyOSz YouTube: https://www.youtube.com/@ThisWontLastpodcast CHAPTERS: (00:00:00) About the Show (00:00:22) About the Episode (00:05:03) Introduction and Haize Labs Overview (00:07:36) Universal Jailbreak Technique and Attacks (00:13:47) Automated vs Manual Red Teaming (00:17:15) Qualitative Assessment of Model Jailbreaking (Part 1) (00:19:38) Sponsors: Oracle | Brave (00:21:42) Qualitative Assessment of Model Jailbreaking (Part 2) (00:26:21) Context-Specific Safety Considerations (00:32:26) Model Capabilities and Safety Correlation (Part 1) (00:36:22) Sponsors: Omneky | Squad (00:37:48) Model Capabilities and Safety Correlation (Part 2) (00:44:42) Model Behavior and Defense Mechanisms (00:52:47) Challenges in Preventing Jailbreaks (00:56:24) Safety, Capabilities, and Model Scale (01:00:56) Model Classification and Preparedness (01:04:40) Concluding Thoughts on o1 and Future Work (01:05:54) Outro
In this Emergency Pod of The Cognitive Revolution, Nathan provides crucial insights into OpenAI's new O1 and O1-mini reasoning models. Featuring exclusive interviews with members of the O1 Red Team from Apollo Research and Hayes Labs, we explore the models' capabilities, safety profile, and OpenAI's pre-release testing approach. Dive into the implications of these advanced AI systems, including their potential to match or exceed expert performance in many areas. Join us for an urgent and informative discussion on the latest developments in AI technology and their impact on the future. o1 Safety Card Endless Jailbreaks with Bijection Learning: a Powerful, Scale-Agnostic Attack Method Apollo Research Apollo Careers Page Papers mentioned: Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? Exploring Scaling Trends in LLM Robustness Apply to join over 400 Founders and Execs in the Turpentine Network: https://www.turpentinenetwork.co/ SPONSORS: Oracle: Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds; offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive Brave: The Brave search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference. All while remaining affordable with developer first pricing, integrating the Brave search API into your workflow translates to more ethical data sourcing and more human representative data sets. Try the Brave search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR Omneky: Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/ Squad: Head to Squad to access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist. RECOMMENDED PODCAST: This Won't Last. Eavesdrop on Keith Rabois, Kevin Ryan, Logan Bartlett, and Zach Weinberg's monthly backchannel. They unpack their hottest takes on the future of tech, business, venture, investing, and politics. Apple Podcasts: https://podcasts.apple.com/us/podcast/id1765665937 Spotify: https://open.spotify.com/show/2HwSNeVLL1MXy0RjFPyOSz YouTube: https://www.youtube.com/@ThisWontLastpodcast CHAPTERS: (00:00:00) About the Show (00:00:22) About the Episode (00:05:03) Introduction and Apollo Research Updates (00:06:40) Focus on Deception in AI (00:11:08) Open AI's 01 Model and Testing (00:15:54) Evaluating AI Models for Scheming (Part 1) (00:19:32) Sponsors: Oracle | Brave (00:21:36) Evaluating AI Models for Scheming (Part 2) (00:25:55) Specific Benchmarks and Tasks (Part 1) (00:35:03) Sponsors: Omneky | Squad (00:36:29) Specific Benchmarks and Tasks (Part 2) (00:37:21) Model Capabilities and Potential Risks (00:44:11) Ethical Considerations and Future Concerns (00:50:31) Competing Trends in AI Development (00:53:30) System Card Quotes and Implications (00:58:36) Sponsors: Outro
Guest: Sander Schulhoff, CEO and Co-Founder, Learn Prompting [@learnprompting]On LinkedIn | https://www.linkedin.com/in/sander-schulhoff/____________________________Host: Sean Martin, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining CyberSecurity Podcast [@RedefiningCyber]On ITSPmagazine | https://www.itspmagazine.com/sean-martinView This Show's Sponsors___________________________Episode NotesIn this episode of Redefining CyberSecurity, host Sean Martin engages with Sander Schulhoff, CEO and Co-Founder of Learn Prompting and a researcher at the University of Maryland. The discussion focuses on the critical intersection of artificial intelligence (AI) and cybersecurity, particularly the role of prompt engineering in the evolving AI landscape. Schulhoff's extensive work in natural language processing (NLP) and deep reinforcement learning provides a robust foundation for this insightful conversation.Prompt engineering, a vital part of AI research and development, involves creating effective input prompts that guide AI models to produce desired outputs. Schulhoff explains that the diversity of prompt techniques is vast and includes methods like the chain of thought, which helps AI articulate its reasoning steps to solve complex problems. However, the conversation highlights that there are significant security concerns that accompany these techniques.One such concern is the vulnerability of systems when they integrate user-generated prompts with AI models, especially those prompts that can execute code or interact with external databases. Security flaws can arise when these systems are not adequately sandboxed or otherwise protected, as demonstrated by Schulhoff through real-world examples like MathGPT, a tool that was exploited to run arbitrary code by injecting malicious prompts into the AI's input.Schulhoff's insights into the AI Village at DEF CON underline the community's nascent but growing focus on AI security. He notes an intriguing pattern: many participants in AI-specific red teaming events were beginners, which suggests a gap in traditional red teamer familiarity with AI systems. This gap necessitates targeted education and training, something Schulhoff is actively pursuing through initiatives at Learn Prompting.The discussion also covers the importance of studying and understanding the potential risks posed by AI models in business applications. With AI increasingly integrated into various sectors, including security, the stakes for anticipating and mitigating risks are high. Schulhoff mentions that his team is working on Hack A Prompt, a global prompt injection competition aimed at crowdsourcing diverse attack strategies. This initiative not only helps model developers understand potential vulnerabilities but also furthers the collective knowledge base necessary for building more secure AI systems.As AI continues to intersect with various business processes and applications, the role of security becomes paramount. This episode underscores the need for collaboration between prompt engineers, security professionals, and organizations at large to ensure that AI advancements are accompanied by robust, proactive security measures. By fostering awareness and education, and through collaborative competitions like Hack A Prompt, the community can better prepare for the multifaceted challenges that AI security presents.Top Questions AddressedWhat are the key security concerns associated with prompt engineering?How can organizations ensure the security of AI systems that integrate user-generated prompts?What steps can be taken to bridge the knowledge gap in AI security among traditional security professionals?___________________________SponsorsImperva: https://itspm.ag/imperva277117988LevelBlue: https://itspm.ag/attcybersecurity-3jdk3___________________________Watch this and other videos on ITSPmagazine's YouTube ChannelRedefining CyberSecurity Podcast with Sean Martin, CISSP playlist:
Bishop Fox senior security consultant Alethe Denis joins the Claroty Nexus podcast to discuss social engineering in cybersecurity and how it has become part of red-team engagements, especially inside critical infrastructure organizations. She explains the value of open source intelligence and data stolen in breaches to scammers and extortionists in creating pretexts for their schemes. She also explains how to best defend against these tactics that aid threat actors in weaponizing personal information against victims and organizations. For more, visit nexusconnect.io/podcasts.
In this episode we sit down with Chloe Messdaghi, Head of Threat Intelligence at HiddenLayer, an AI Security startup focused on securing the quickly evolving AI security landscape. HiddenLayer was the 2023 RSAC Innovation Sandbox Winner and offers a robust platform including AI Security, Detection & Response and Model Scanning.- For folks now familiar with you or the HiddenLayer team, can you tell us a bit about your background, as well as that of HiddenLayer?- When you look at the AI landscape, and discussions around securing AI, what is the current state of things as it stands now? I would recommend checking out the "AI Threat Landscape Report" you all recently published.- Many organizations of course are in their infancy in terms of AI adoption and security. I know the HiddenLayer team has really been advocating concepts such as AI Governance. Can you talk about how organizations can get started on this foundational activity?- HiddenLayer published a great two part series on an "AI Step-by-Step Guide for CISO's", can you talk about some of those recommendations a bit?- You all also have been evangelizing practices such as Red Teaming for AI and AI Models. What exactly is AI Red Teaming and why is it so critical to do?- Another interesting topic is how we're beginning to look to Govern AI, both here in the U.S. with things such as the AI EO, and in the EU with the EU AI Act. What are some key takeaways from those, and what do you think about the differences in approaches we're seeing so far?
Alethe Denis is the first ever three-time guest to the Layer 8 Podcast. When Alethe comes on, we can swap stories for hours. And we did! This is part 1 of a two-part episode, as Alethe had so many great stories to share. For this episode, she talks her way into buildings, tells us how she prepares her OSINT and when she knows it's time to go into the building. Check back in two weeks for part 2!
Azure Open AI is widely used in industry but there are number of security aspects that must be taken into account when using the technology. Luckily for us, Audrey Long, a Software Engineer at Microsoft, security expert and renowned conference speaker, gives us insights into securing LLMs and provides various tips, tricks and tools to help developers use these models safely in their applications. Media file: https://azpodcast.blob.core.windows.net/episodes/Episode502.mp3 YouTube: https://youtu.be/64Achcz97PI Resources: AI Tooling: Azure AI Tooling Announcing new tools in Azure AI to help you build more secure and trustworthy generative AI applications | Microsoft Azure Blog Prompt Shields to detect and block prompt injection attacks, including a new model for identifying indirect prompt attacks before they impact your model, coming soon and now available in preview in Azure AI Content Safety. Groundedness detection to detect “hallucinations” in model outputs, coming soon. Safety system messagesto steer your model’s behavior toward safe, responsible outputs, coming soon. Safety evaluations to assess an application’s vulnerability to jailbreak attacks and to generating content risks, now available in preview. Risk and safety monitoring to understand what model inputs, outputs, and end users are triggering content filters to inform mitigations, coming soon, and now available in preview in Azure OpenAI Service. AI Defender for Cloud AI Security Posture Management AI security posture management (Preview) - Microsoft Defender for Cloud | Microsoft Learn AI Workloads Enable threat protection for AI workloads (preview) - Microsoft Defender for Cloud | Microsoft Learn AI Red Teaming Tool Announcing Microsoft’s open automation framework to red team generative AI Systems | Microsoft Security Blog AI Development Considerations: AI Assessment from Microsoft Conduct an AI assessment using Microsoft’s Responsible AI Impact Assessment Template Responsible AI Impact Assessment Guide for detailed instructions Microsoft Responsible AI Processes Follow Microsoft’s Responsible AI principles: fairness, reliability, safety, privacy, security, inclusiveness, transparency, and accountability Utilize tools like the Responsible AI Dashboard for continuous monitoring and improvement Define Use Case and Model Architecture Determine the specific use case for your LLM Design the model architecture, focusing on the Transformer architecture Content Filtering System How to use content filters (preview) with Azure OpenAI Service - Azure OpenAI | Microsoft Learn Azure OpenAI Service includes a content filtering system that works alongside core models, including DALL-E image generation models. This system uses an ensemble of classification models to detect and prevent harmful content in both input prompts and output completions The filtering system covers four main categories: hate, sexual, violence, and self-harm Each category is assessed at four severity levels: safe, low, medium, and high Additional classifiers are available for detecting jailbreak risks and known content for text and code. JailBreaking Content Filters Red Teaming the LLM Plan and conduct red teaming exercises to identify potential vulnerabilities Use diverse red teamers to simulate adversarial attacks and test the model’s robustness Microsoft AI Red Team building future of safer AI | Microsoft Security Blog Create a Threat Model with OWASP Top 10 owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-2023-slides-v1_1.pdf Develop a threat model and implement mitigations based on identified risks Other updates: Los Angeles Azure Extended Zones Carbon Optimization App Config Ref GA OS SKU In-Place Migration for AKS Operator CRD Support with Azure Monitor Managed Service Azure API Center Visual Studio Code Extension Pre-release Azure API Management WordPress Plugin Announcing a New OpenAI Feature for Developers on Azure
In this interview we explore the new and sometimes strange world of redteaming AI. I have SO many questions, like what is AI safety? We'll discuss her presence at Black Hat, where she delivered two days of training and participated on an AI safety panel. We'll also discuss the process of pentesting an AI. Will pentesters just have giant cheatsheets or text files full of adversarial prompts? How can we automate this? Will an AI generate adversarial prompts you can use against another AI? And finally, what do we do with the results? Resources: PyRIT AI redteaming tool Microsoft's AI redteaming guide We chat with Sounil Yu, co-founder of LLM access control startup, Knostic. We discuss both the experience of participating in Black Hat's startup competition, and what his company, Knostic, is all about. Knostic was one of four finalists for Black Hat's Startup Spotlight competition and was announced as the winner on August 6th. References DarkReading: Knostic Wins 2024 Black Hat Startup Spotlight Competition Knostic's Website , in the enterprise security news, AI is still getting a ton of funding! Netwrix acquires PingCastle Tenable looks for a buyer SentinelOne hires Alex Stamos as their new CISO Crowdstrike doesn't appreciate satire when it's at their expense Intel begins one of the biggest layoffs we've ever seen in tech Windows Downdate RAG poisoning GPT yourself The Xerox Hypothesis All that and more, on this episode of Enterprise Security Weekly. Visit https://www.securityweekly.com/esw for all the latest episodes! Show Notes: https://securityweekly.com/esw-371
In this interview we explore the new and sometimes strange world of redteaming AI. I have SO many questions, like what is AI safety? We'll discuss her presence at Black Hat, where she delivered two days of training and participated on an AI safety panel. We'll also discuss the process of pentesting an AI. Will pentesters just have giant cheatsheets or text files full of adversarial prompts? How can we automate this? Will an AI generate adversarial prompts you can use against another AI? And finally, what do we do with the results? Resources: PyRIT AI redteaming tool Microsoft's AI redteaming guide We chat with Sounil Yu, co-founder of LLM access control startup, Knostic. We discuss both the experience of participating in Black Hat's startup competition, and what his company, Knostic, is all about. Knostic was one of four finalists for Black Hat's Startup Spotlight competition and was announced as the winner on August 6th. References DarkReading: Knostic Wins 2024 Black Hat Startup Spotlight Competition Knostic's Website , in the enterprise security news, AI is still getting a ton of funding! Netwrix acquires PingCastle Tenable looks for a buyer SentinelOne hires Alex Stamos as their new CISO Crowdstrike doesn't appreciate satire when it's at their expense Intel begins one of the biggest layoffs we've ever seen in tech Windows Downdate RAG poisoning GPT yourself The Xerox Hypothesis All that and more, on this episode of Enterprise Security Weekly. Visit https://www.securityweekly.com/esw for all the latest episodes! Show Notes: https://securityweekly.com/esw-371
In this interview we explore the new and sometimes strange world of redteaming AI. I have SO many questions, like what is AI safety? We'll discuss her presence at Black Hat, where she delivered two days of training and participated on an AI safety panel. We'll also discuss the process of pentesting an AI. Will pentesters just have giant cheatsheets or text files full of adversarial prompts? How can we automate this? Will an AI generate adversarial prompts you can use against another AI? And finally, what do we do with the results? Resources: PyRIT AI redteaming tool Microsoft's AI redteaming guide Show Notes: https://securityweekly.com/esw-371
Hey humans, this is Stacie Baird, and welcome back to the HX podcast. We're continuing our series, HX in an AI World, where we're exploring how to humanize the integration of AI and technology in our organizations. Last week, I gave you a simple assignment to reduce fear around generative AI. Today, we'll build on that by discussing how to get more comfortable with these tools and introducing another assignment to help you integrate technology into your daily routine. I also shared my experiences from Freedom Friday, a ritual at our offices where I prepare emotionally and mentally for our weekly town hall. It's a time for reflection, regulation, and recognizing wins, which is crucial for maintaining well-being. In this episode, I'll provide tips on using technology for mental health, such as apps like Calm or Insight Timer, and emphasize the importance of ethical use and red teaming AI applications. Remember, it's our responsibility to understand AI to effectively contribute to discussions and decision-making processes. I hope you find these insights and assignments helpful as we navigate the integration of AI into our lives and work. Stay curious and compassionate, and join me next time as we continue this journey with guest experts who will share their knowledge on this fascinating topic. Stacie More episodes at StacieBaird.com.
Today on the Social-Engineer Podcast: The Security Awareness Series, Chris is joined by May Brooks-Kempler. May is a cybersecurity expert who has transformed her early curiosity, hacking 90's computer games, into a distinguished cybersecurity career. As a board member of ISC2, an educator, a CISO and the founder of the Think Safe Cyber community, she is dedicated to making the online world a safer place for everyone. [July 15, 2024] 00:00 - Intro 00:19 - Intro Links: - Social-Engineer.com - http://www.social-engineer.com/ - Managed Voice Phishing - https://www.social-engineer.com/services/vishing-service/ - Managed Email Phishing - https://www.social-engineer.com/services/se-phishing-service/ - Adversarial Simulations - https://www.social-engineer.com/services/social-engineering-penetration-test/ - Social-Engineer channel on SLACK - https://social-engineering-hq.slack.com/ssb - CLUTCH - http://www.pro-rock.com/ - innocentlivesfoundation.org - http://www.innocentlivesfoundation.org/ 03:17 - May Brooks-Kempler Intro 03:55 - Twist of Fate 05:10 - A Moment of Silence 05:51 - Blame Grandma 08:15 - An Unclear Path 11:34 - It Takes a Village 13:40 - Considering the Other Side 16:10 - Start with "Why" 20:41 - "It's Never Personal - CyberWise Parenting Course - Listeners get 20% off with the coupon SOCIAL - TEDx – Think Cyber 27:47 - Lifelong Learning 30:50 - Going Public 32:57 - Find May Brooks-Kempler online - LinkedIn: in/may-brooks-kempler - Instagram: @cybermaynia 33:46 - Mentors - Avi Weissman - Oren Bratt - Itzik Kochav 35:54 - Book Recommendations - Human Hacking - Christopher Hadnagy - Countdown to Zero Day - Kim Zetter - Do You Talk Funny? - David Nihill - Start with Why - Simon Sinek 37:17 - Wrap Up & Outro - www.social-engineer.com - www.innocentlivesfoundation.org
Dive into an accessible discussion on AI safety and philosophy, technical AI safety progress, and why catastrophic outcomes aren't inevitable. This conversation provides practical advice for AI newcomers and hope for a positive future. Consistently Candid Podcast : https://open.spotify.com/show/1EX89qABpb4pGYP1JLZ3BB SPONSORS: Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds; offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive The Brave search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference. All while remaining affordable with developer first pricing, integrating the Brave search API into your workflow translates to more ethical data sourcing and more human representative data sets. Try the Brave search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/ Head to Squad to access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist. Recommended Podcast: Byrne Hobart, the writer of The Diff, is revered in Silicon Valley. You can get an hour with him each week. See for yourself how his thinking can upgrade yours. Spotify: https://open.spotify.com/show/6rANlV54GCARLgMOtpkzKt Apple: https://podcasts.apple.com/us/podcast/the-riff-with-byrne-hobart-and-erik-torenberg/id1716646486 CHAPTERS: (00:00:00) About the Show (00:03:50) Intro (00:08:13) AI Scouting (00:14:42) Why arent people adopting AI more quickly? (00:18:25) Why dont people take advantage of AI? (00:22:35) Sponsors: Oracle | Brave (00:24:42) How to get a better understanding of AI (00:31:16) How to handle the public discourse around AI (00:34:02) Scaling and research (00:43:18) Sponsors: Omneky | Squad (00:45:03) The pause (00:47:29) Algorithmic efficiency (00:52:52) Red Teaming in Public (00:55:41) Deepfakes (01:01:02) AI safety (01:04:00) AI moderation (01:07:03) Why not a doomer (01:09:10) AI understanding human values (01:15:00) Interpretability research (01:18:30) AI safety leadership (01:21:55) AI safety respectability politics (01:33:42) China (01:37:22) Radical uncertainty (01:39:53) P(doom) (01:42:30) Where to find the guest (01:44:48) Outro