POPULARITY
In large enterprise software companies, Red and Blue Teams collaborate through Purple Teaming to proactively detect, respond to, and mitigate advanced threats. In this episode of CyberWire-X, N2K's Dave Bittner is joined by Adobe's Justin Tiplitsky, Director of Red Team, and Ivan Koshkin, Senior Detection Engineer, to discuss how their teams work together daily to strengthen Adobe's security ecosystem. They share real-world insights on how this essential collaboration enhances threat detection, refines security controls, and improves overall cyber resilience.
In this episode, Michael, Sarah, and Mark talk to Craig Nelson, VP of the Microsoft Red Team, about how the Red Team works to help secure Microsoft and its customers. In life, there are things you know you know, things you know you don't know, and finally, things you don't know you don't know. This episode is full of the latter. We also cover security news about LLMs and MCP, TLS 1.1 and 1.0 deprecation, Private Endpoint improvements, containers, and more. https://aka.ms/azsecpod
Fred Wilmot, CEO and co-founder of Detecteam, and Sebastien Tricaud, CTO and co-founder, bring a candid and critical take on cybersecurity's detection and response problem. Drawing on their collective experience—from roles at Splunk, Devo, and time spent in defense and offensive operations—they raise a core question: does any of the content, detections, or tooling security teams deploy actually work?

The Detecteam founders challenge the industry's obsession with metrics like mean time to detect or respond, pointing out that these often measure operational efficiency—not true risk readiness. Instead, they propose a shift in thinking: stop optimizing broken processes and start creating better ones.

At the heart of their work is a new approach to detection engineering—one that continuously generates and validates detections based on actual behavior, environmental context, and adversary tactics. It's about moving away from one-size-fits-all IOCs toward purpose-built, context-aware detections that evolve as threats do.

Sebastien highlights the absurdity of relying on static, signature-based detection in a world of dynamic threats. Adversaries constantly change tactics, yet detection rules often sit unchanged for months. The platform they've built breaks detection down into a testable, iterative process—closing the gap between intel, engineering, and operations. Teams no longer need to rely on hope or external content packs—they can build, test, and validate detections in minutes.

Fred explains the benefit in terms any CISO can understand: this isn't just detection—it's readiness. If a team can build a working detection in under 15 minutes, they beat the average breakout time of many attackers. That's a tangible advantage, especially when operating with limited personnel.

This conversation isn't about a silver bullet or more noise—it's about clarity. What's working? What's not? And how do you know? For organizations seeking real impact in their security operations—not just activity—this episode explores a path forward that's faster, smarter, and grounded in reality.

Learn more about Detecteam: https://itspm.ag/detecteam-21686

Note: This story contains promotional content. Learn more.

Guests:
Fred Wilmot, Co-Founder & CEO, Detecteam | https://www.linkedin.com/in/fredwilmot/
Sebastien Tricaud, Co-Founder & CTO, Detecteam | https://www.linkedin.com/in/tricaud/

Resources:
Learn more and catch more stories from Detecteam: https://www.itspmagazine.com/directory/detecteam
Webinar: Rethink, Don't Just Optimize: A New Philosophy for Intelligent Detection and Response — An ITSPmagazine Webinar with Detecteam | https://www.crowdcast.io/c/rethink-dont-just-optimize-a-new-philosophy-for-intelligent-detection-and-response-an-itspmagazine-webinar-with-detecteam-314ca046e634
Learn more and catch more stories from RSA Conference 2025 coverage: https://www.itspmagazine.com/rsac25

______________________

Keywords: sean martin, fred wilmot, sebastien tricaud, detecteam, detection, cybersecurity, behavior, automation, red team, blue team, brand story, brand marketing, marketing podcast, brand story podcast

______________________

Catch all of our event coverage: https://www.itspmagazine.com/technology-and-cybersecurity-conference-coverage
Want to tell your Brand Story Briefing as part of our event coverage? Learn More
We've been in enough conversations to know when something clicks. This one did — and it did from the very first moment.

In our debut Brand Story with White Knight Labs, we sat down with co-founders John Stigerwalt and Greg Hatcher, and what unfolded was more than a company intro — it was a behind-the-scenes look at what offensive security should be.

John's journey is the kind that earns your respect quickly: he started at the help desk and worked his way to CISO, before pivoting into red teaming and co-founding WKL. Greg's path was more unconventional — from orchestral musician to Green Beret to cybersecurity leader. Two very different stories, but a shared philosophy: learn by doing, adapt without a manual, and never take the easy route when something meaningful is on the table.

That mindset now defines how White Knight Labs works with clients. They don't sell cookie-cutter pen tests. Instead, they ask the right question up front: How does your business make money? Because if you can answer that, you can identify what a real-world attacker would go after. Then they simulate it — not in theory, but in practice.

Their ransomware simulation service is a perfect example. They don't just show up with a scanner. They emulate modern adversaries using Cobalt Strike, bypassing endpoint defenses with in-house payloads, encrypting and exfiltrating data like it's just another Tuesday. Most clients fail the test — not because they're careless, but because most simulations aren't this real.

And that's the point.

White Knight Labs isn't here to help companies check a box. They're here to expose the gaps and raise the bar — because real threats don't play fair, and security shouldn't pretend they do.

What makes them different is what they don't do. They're not an all-in-one shop, and they're proud of that. They won't touch IR for major breaches — they've got partners for that. They only resell hardware and software they've personally vetted. That honesty builds credibility. That kind of focus builds trust.

Their training programs are just as intense. Between live DEF CON courses and their online platform, they're giving both new and experienced professionals a chance to train the way they operate: no shortcuts, no watered-down certs, just hard-earned skills that translate into real-world readiness.

Pass their ODPC certification, and you'll probably get a call — not because they need to check a hiring box, but because it proves you're serious. And if you can write loaders that bypass real defenses? You're speaking their language.

This first conversation with John and Greg reminded us why we started this series in the first place. It's not just about product features or service offerings — it's about people who live and breathe what they do, and who bring that passion into every test, every client call, and every training they offer.

We've got more stories with them on the way.
But if this first one is any sign of what's to come, we're in for something special.

Learn more about White Knight Labs:

Guests:
John Stigerwalt | Founder at White Knight Labs | Red Team Operations Leader | https://www.linkedin.com/in/john-stigerwalt-90a9b4110/
Greg Hatcher | Founder at White Knight Labs | SOF veteran | Red Team | https://www.linkedin.com/in/gregoryhatcher2/
White Knight Labs Website | https://itspm.ag/white-knight-labs-vukr

______________________

Keywords: penetration testing, red team, ransomware simulation, offensive security, EDR bypass, cybersecurity training, White Knight Labs, advanced persistent threat, cybersecurity startup, DEF CON training, security partnerships, cybersecurity services

______________________

Resources:
Visit the White Knight Labs Website to learn more: https://itspm.ag/white-knight-labs-vukr
Learn more and catch more stories from White Knight Labs on ITSPmagazine: https://www.itspmagazine.com/directory/white-knight-labs
Learn more about ITSPmagazine Brand Story Podcasts: https://www.itspmagazine.com/purchase-programs
Newsletter Archive: https://www.linkedin.com/newsletters/tune-into-the-latest-podcasts-7109347022809309184/
Business Newsletter Signup: https://www.itspmagazine.com/itspmagazine-business-updates-sign-up
Are you interested in telling your story? https://www.itspmagazine.com/telling-your-story
Bugged boardrooms. Insider moles. Social engineers posing as safety inspectors!? In this Talking Lead episode, Lefty assembles a veteran intel crew—Bryan Seaver, U.S. Army Military Police vet and owner of SAPS Squadron Augmented Protection Services, LLC, a Nashville outfit running dignitary protection, K9 ops, and intelligence training. A *Talking Lead* mainstay! He's got firsthand scoop on "Red Teaming"; Mitch Davis, U.S. Marine, private investigator, interrogator, Phoenix Consulting Group (now DynCorp) contractor, with a nose for sniffing out moles and lies; Brad Duley, U.S. Marine, embassy guard, Phoenix/DynCorp contractor, Iraq vet, deputy sheriff, and precision shooter, bringing tactical grit to the table—to expose the high-stakes world of corporate espionage. They pull back the curtain on real-world spy tactics that were used during the Cold War era and are still used in today's business battles: Red Team operations, honeypots, pretexting, data theft, and the growing threat of AI-driven deception. From cyber breaches to physical infiltrations, the tools of Cold War espionage are now aimed at American companies, defense tech, and even firearms innovation. State-backed actors, insider threats, and corporate sabotage—it's not just overseas anymore. Tune in and get "Leaducated"!!
Joas Santos is a Red Team specialist who brings a practical perspective on thinking about security offensively. We talk about social engineering, penetration testing, threat intelligence, mentoring, and the challenges of building defenses that actually work. A direct conversation with someone on the front lines of cybersecurity in Brazil.
Get featured on the show by leaving us a Voice Mail: https://bit.ly/MIPVM

FULL SHOW NOTES: https://www.microsoftinnovationpodcast.com/672

We dive deep into the world of AI security with Microsoft's Senior Offensive Security Engineer from the AI Red Team, who shares insights into how they test and break AI systems to ensure safety and trustworthiness.

TAKEAWAYS
• Microsoft requires all AI features to be thoroughly documented and approved by a central board
• The AI Red Team tests products adversarially and as regular users to identify vulnerabilities
• Red teaming originated in military exercises during the Cold War before being adapted for software security
• The team tests for jailbreaks, harmful content generation, data exfiltration, and bias
• Team members come from diverse backgrounds, including PhDs in machine learning, traditional security, and military experience
• New AI modalities like audio, images, and video each present unique security challenges
• Mental health support is prioritized since team members regularly encounter disturbing content
• Working exclusively with failure modes creates a healthy skepticism about AI capabilities
• Hands-on experimentation is recommended for anyone wanting to develop AI skills
• Curating your own information sources rather than relying on algorithms helps discover new knowledge

Check out Microsoft Copilot and other AI tools to start experimenting and finding practical ways they can help in your daily work.

This year we're adding a new show to our lineup - The AI Advantage. We'll discuss the skills you need to thrive in an AI-enabled world.

DynamicsMinds is a world-class event in Slovenia that brings together Microsoft product managers, industry leaders, and dedicated users to explore the latest in Microsoft Dynamics 365, the Power Platform, and Copilot. Early bird tickets are on sale now and listeners of the Microsoft Innovation Podcast get 10% off with the code MIPVIP144bff: https://www.dynamicsminds.com/register/?voucher=MIPVIP144bff

Accelerate your Microsoft career with the 90 Day Mentoring Challenge. We've helped 1,300+ people across 70+ countries establish successful careers in the Microsoft Power Platform and Dynamics 365 ecosystem. Benefit from expert guidance, a supportive community, and a clear career roadmap. A lot can change in 90 days, get started today!

Support the show

If you want to get in touch with me, you can message me here on LinkedIn. Thanks for listening.
Every bank and fintech company has a suite of anti-fraud tools that they use to keep the bad guys out. Few tools are 100% effective, however, and often the implementation of these tools, along with their interfaces with other systems, leaves gaps. And the fraudsters will exploit these gaps. So, how do you get a holistic view of your anti-fraud arsenal and discover where these gaps are?

My next guest on the Fintech One-on-One podcast is Jerry Tylman, the co-founder and partner at Greenway Solutions and the founder of their Fraud Red Team. The Fraud Red Team is all about discovering the gaps, where the weaknesses in the anti-fraud systems are. They are 100% focused on financial services, working with many of the largest banks in the country as well as several fintech companies.

In this podcast you will learn:
How Greenway Solutions became focused on financial services.
What a pen test is and the groundbreaking work they do with fraud controls.
The different attack vectors that fraudsters use.
Why banks and fintechs need the services of the Fraud Red Team.
How successful they are in penetrating the fraud detection systems.
How they interact with the anti-fraud providers to banks and fintechs.
An example of a recent test they have done that penetrated anti-fraud systems.
How they tackle the challenge of account onboarding.
Why behavioral technology is a key piece of the puzzle.
How deepfake video and audio are being used by fraudsters.
The fascinating way that the Fraud Red Team works with deepfakes.
Why companies have to completely rethink their internal authentication today.
Some of the fintechs they have worked with recently.
How they work with check fraud and why it is a growing problem.
Why all financial institutions cannot stop investing in anti-fraud tools.

Connect with Fintech One-on-One:
Tweet me @PeterRenton
Connect with me on LinkedIn
Find previous Fintech One-on-One episodes
We pick up the pace for this second episode... on to phishing and the red team! Episodes are recorded live on Twitch and rebroadcast on YouTube before becoming podcasts. Come join a live recording! Hosted by Ausha. Visit ausha.co/politique-de-confidentialite for more information.
Former Trump administration cybersecurity official Sean Plankey has been nominated to lead the Cybersecurity and Infrastructure Security Agency (CISA). His nomination comes amid significant layoffs at the agency, where over 100 employees were let go, including key members of the Red Team responsible for simulating cyberattacks. These cuts raise concerns about CISA's ability to maintain cybersecurity amid ongoing federal budget constraints, potentially leading to increased threats in the private sector as federal infrastructure and intelligence sharing weaken.

In the realm of artificial intelligence, the General Services Administration (GSA) has introduced a custom chatbot named GSAI to automate various government tasks, coinciding with significant job cuts within the agency. While the chatbot aims to enhance efficiency, internal memos have warned employees against inputting sensitive information. This trend reflects a broader movement in the federal government towards tech-driven workforce reductions, raising questions about data privacy and the reliability of AI tools in government operations.

Utah has made headlines by passing legislation requiring App Store operators to verify the ages of users and obtain parental consent for minors downloading apps. This law, aimed at enhancing online safety for children, has garnered support from major tech companies but has also faced criticism regarding potential infringements on privacy rights. The Supreme Court is expected to examine age verification issues, particularly concerning adult content websites, highlighting the ongoing debate over online safety regulations.

The podcast also discusses the competitive landscape of AI, with Google reporting continued growth in search queries despite the rise of ChatGPT. New benchmarks have been developed to measure the honesty of AI models, revealing that larger models do not necessarily correlate with higher honesty rates. As companies like Microsoft and Amazon introduce advanced AI tools, the implications for businesses are significant, emphasizing the need for oversight and governance in AI deployment to mitigate risks associated with inaccuracies and compliance issues.

Three things to know today:
00:00 Cybersecurity Jobs Cut, AI Hired, and Kids Get ID'd—Welcome to the Future of Tech Policy
05:45 ChatGPT Isn't Killing Google Search—And AI Lies More Than You'd Think
08:27 Microsoft and OpenAI: A Rocky Relationship, While AI Prices Tumble

Supported by: https://getflexpoint.com/msp-radio/
Event: https://www.nerdiocon.com/
All our Sponsors: https://businessof.tech/sponsors/

Do you want the show on your podcast app or the written versions of the stories? Subscribe to the Business of Tech: https://www.businessof.tech/subscribe/
Looking for a link from the stories? The entire script of the show, with links to articles, is posted in each story on https://www.businessof.tech/

Support the show on Patreon: https://patreon.com/mspradio/
Want to be a guest on Business of Tech: Daily 10-Minute IT Services Insights? Send Dave Sobel a message on PodMatch, here: https://www.podmatch.com/hostdetailpreview/businessoftech
Want our stuff? Cool Merch? Wear "Why Do We Care?" - Visit https://mspradio.myspreadshop.com

Follow us on:
LinkedIn: https://www.linkedin.com/company/28908079/
YouTube: https://youtube.com/mspradio/
Facebook: https://www.facebook.com/mspradionews/
Instagram: https://www.instagram.com/mspradio/
TikTok: https://www.tiktok.com/@businessoftech
Bluesky: https://bsky.app/profile/businessof.tech
We come back to the other side of the brain dance and find that the crew we had gotten to know are actually the targets. Our new team is coming for them and will squeeze Baron any way they need to, to get it, but types like him have a few hardwire tricks up their sleeves...

Thanks to A Wilhelm Scream for intro music, "Walkin' with Michael Douglas" (more here: https://www.awilhelmscream.com/). Theme song for Red Rising is "Neon Drifter" by Antti Martikainen. All other scores are by Antti Martikainen and Adrian von Ziegler.

Check us out online at www.nastygramrpg.com. Find us on Facebook at www.facebook.com/nastygram and our group is at https://www.facebook.com/groups/865467380821766; we are @nastygramrpg on both Instagram and Twitter, and on TikTok at @nastygram.rpg
Twitter: https://x.com/croco_byte
Twitter: https://x.com/mpgn_x64

Blog posts:
- https://www.synacktiv.com/publications/relaying-kerberos-over-smb-using-krbrelayx
- https://www.synacktiv.com/publications/abusing-multicast-poisoning-for-pre-authenticated-kerberos-relay-over-http-with
- https://www.synacktiv.com/publications/taking-the-relaying-capabilities-of-multicast-poisoning-to-the-next-level-tricking
This week on Hacker And The Fed, former FBI agent Chris Tarbell and ex-black hat hacker Hector Monsegur discuss a shocking backdoor found in healthcare patient monitors linked to China, a major vulnerability in Subaru's Starlink system allowing remote vehicle control, and the ongoing concerns over modern cars collecting unnecessary user data. They also discuss cybersecurity career paths—Blue Team vs. Red Team—and how to build a well-rounded skillset. Plus, plenty of laughs, from muscle car nostalgia to an unexpected debate about pole vs. stripper dancing. Send HATF your questions at questions@hackerandthefed.com.
On Hacking Humans, Dave Bittner, Joe Carrigan, and Maria Varmazis (also host of N2K's daily space podcast, T-Minus) are once again sharing the latest in social engineering scams, phishing schemes, and criminal exploits that are making headlines to help our audience become aware of what is out there. This week, Maria has the story on how the return to office life brings unique security challenges, highlighting the need for Red Team assessments to uncover and address physical and digital vulnerabilities, empowering organizations to proactively enhance workplace security and protect against evolving threats. Joe's story comes from the FCC's warning about a scam dubbed "Green Mirage," where fraudsters impersonate mortgage lenders, spoof caller IDs, and use social engineering to trick financially vulnerable homeowners into sending payments via unconventional methods, often only discovered when foreclosure proceedings begin. Last but not least, Dave's story is on how a Reddit user shared their cautious experiment with a suspected Airbnb scam involving a new account requesting to move to WhatsApp, agreeing to unusually high rental rates, and engaging in rapport-building tactics, with red flags pointing to potential financial fraud or phishing attempts. Our catch of the day comes from listener William, who spotted a phishing scam disguised as a security alert about a compromised crypto wallet, featuring an unsolicited QR code and a generic warning that targets even non-crypto users.

Resources and links to stories:
Navigating Workplace Security: Red Team Insights for the Return to Office
FCC warns of 50-state scam by fraudsters posing as mortgage lenders
FCC ENFORCEMENT ADVISORY
I'm saying "Yes" to the Chinese long-term rental WhatsApp chat asking for video

You can hear more from the T-Minus space daily show here. Have a Catch of the Day you'd like to share? Email it to us at hackinghumans@n2k.com.
In this episode of The BlueHat Podcast, hosts Nic Fillingham and Wendy Zenone are joined by BlueHat 2024 presenter Joe Bialek, a security engineer at Microsoft with over 13 years of experience. Joe shares his fascinating journey from intern to red team pioneer, recounting how he helped establish the Office 365 Red Team and pushed the boundaries of ethical hacking within Microsoft. He discusses his formative years building sneaky hacking tools, navigating the controversial beginnings of red teaming, and transitioning to the Windows Security Team to focus on low-level security and mitigations. Joe reflects on the challenges of internal hacking, the human reactions to being "hacked," and the value of strengthening defenses before external threats arise.

In This Episode You Will Learn:
How Microsoft is developing tooling to identify and address bad programming patterns
Why kernel-related discussions are primarily focused on Windows and driver developers
The challenges developers face when reading and writing through pointers in C or C++

Some Questions We Ask:
How does working with the Windows kernel impact system security and performance?
What sets Windows kernel and driver development apart from other types of development?
Why should internal teams test systems for vulnerabilities before external hackers?

Resources:
View Joe Bialek on LinkedIn
View Wendy Zenone on LinkedIn
View Nic Fillingham on LinkedIn
BlueHat 2024 Session: Pointer Problems – Why We're Refactoring the Windows Kernel

Related Microsoft Podcasts:
Microsoft Threat Intelligence Podcast
Afternoon Cyber Tea with Ann Johnson
Uncovering Hidden Risks

Discover and follow other Microsoft podcasts at microsoft.com/podcasts

The BlueHat Podcast is produced by Microsoft and distributed as part of the N2K media network.
Big thank you to Cisco for sponsoring this video!

Hackers are hacking AI models. Prompt injection attacks are happening all the time. AIs are hallucinating and giving incorrect information. The AI models you download could be made by hackers. Your users are posting confidential information like passwords and API keys into online AI models. Developers are leveraging AI systems in their applications without checking that the AI models are not open to prompt injections. Read more here: https://blogs.cisco.com/security/cisc...

We need a way to protect AI systems. And Cisco has a solution.

// DJ Sampath's SOCIALS //
LinkedIn: / djsampath
Twitter/X: / djsampath

// David's SOCIALS //
Discord: discord.com/invite/usKSyzb
Twitter: www.twitter.com/davidbombal
Instagram: www.instagram.com/davidbombal
LinkedIn: www.linkedin.com/in/davidbombal
Facebook: www.facebook.com/davidbombal.co
TikTok: tiktok.com/@davidbombal

// MY STUFF //
https://www.amazon.com/shop/davidbombal

// SPONSORS //
Interested in sponsoring my videos? Reach out to my team here: sponsors@davidbombal.com

// MENU //
0:00 - Coming up
0:49 - Securing A.I.
01:23 - The dangers of downloading open-source A.I. models
06:29 - Securing A.I. models
07:52 - The future of companies and A.I.
10:58 - Introducing Cisco AI Defense
13:33 - How to break an A.I. model and how to prevent it
16:08 - One-time protection
17:48 - Securing A.I. access
18:17 - What Cisco AI Defense provides
19:09 - Will Cisco AI Defense block attacks?
22:29 - The effects of Data Poisoning
24:38 - How will Cisco AI Defense be deployed
26:18 - When it will be available
26:30 - Conclusion

Please note that links listed may be affiliate links and provide me with a small percentage/kickback should you use them to purchase any of the items listed or recommended. Thank you for supporting me and this channel!

Disclaimer: This video is for educational purposes only.
Learn what ethical hackers can teach us about the next era of artificial intelligence.

We speak with Michael Skelton, VP of Operations, and Sajeeb Lohani, Global TISO for Bugcrowd, on the latest edition of 'Inside The Mind Of A Hacker'.

We're also joined by CJ Fairhead, a Senior Penetration Tester who is OSCP certified, security obsessed, and a tinkerer of things. Passionate about combining years of internal IT experience with his security knowledge for Red Team engagements, CJ is involved in the bug bounty scene and works on giving back to the community through tool development, blog posts, or just general advice.

In the latest edition of ITMOAH, dive inside the minds of 1,000 hackers and see your organization from a new perspective, with the latest analysis on security researchers and their transformative use of generative AI.

For more information and to access more, including the Bugcrowd Report series, visit https://mysecuritymarketplace.com/bugcrowd-register-to-access/

#bugcrowd #cisoseries #mysecuritytv #cybersecurity #ITMOAH #ethicalhackers
This week on Hacker And The Fed, former FBI agent Chris Tarbell and ex-black hat hacker Hector Monsegur discuss Yahoo's controversial decision to lay off its red team, the rise of North Korean IT workers infiltrating U.S. companies, and the ethical dilemmas around hacking. They also reflect on the desensitization to data breaches, debate the significance of protecting medical history, and share candid moments about their personal lives and experiences in the industry. Send HATF your questions at questions@hackerandthefed.com.
In this episode of The BlueHat Podcast, hosts Nic Fillingham and Wendy Zenone are joined by Johann Rehberger, security expert and Red Team director at Electronic Arts. Johann shares his career journey through roles at Microsoft, Uber, and EA, highlighting his expertise in red teaming and cybersecurity. Johann shares the inspiration behind his book on Red Team strategies and discusses his BlueHat 2024 talk on prompt injection vulnerabilities, a critical and evolving AI security challenge. Johann breaks down the distinction between prompt injection and jailbreaking, offering insights into the potential risks, including data exfiltration and system unavailability, and emphasizes the importance of securing Red Teams themselves.

In This Episode You Will Learn:
Why AI tools should have stricter default settings to control what kind of outputs they generate
The importance of reading technical documentation to understand how AI systems are built
Why developers should implement stronger filters for what tokens are allowed to be emitted by LLMs

Some Questions We Ask:
How are prompt injection and SQL injection similar, and how are they different?
What is AI spyware, and how does it exploit memory tools in ChatGPT?
Does AI jailbreaking access the LLM's core system like iPhone jailbreaking does the OS?

Resources:
View Johann Rehberger on LinkedIn
View Wendy Zenone on LinkedIn
View Nic Fillingham on LinkedIn

Related Microsoft Podcasts:
Microsoft Threat Intelligence Podcast
Afternoon Cyber Tea with Ann Johnson
Uncovering Hidden Risks

Discover and follow other Microsoft podcasts at microsoft.com/podcasts
In this episode of The Cognitive Revolution, Nathan interviews Andrew White, Professor of Chemical Engineering at the University of Rochester and Head of Science at Future House. We explore groundbreaking AI systems for scientific discovery, including PaperQA and Aviary, and discuss how large language models are transforming research. Join us for an insightful conversation about the intersection of AI and scientific advancement with this pioneering researcher in his first-ever podcast appearance.

Check out Future House: https://www.futurehouse.org

Help shape our show by taking our quick listener survey at https://bit.ly/TurpentinePulse

SPONSORS:
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive
SelectQuote: Finding the right life insurance shouldn't be another task you put off. SelectQuote compares top-rated policies to get you the best coverage at the right price. Even in our AI-driven world, protecting your family's future remains essential. Get your personalized quote at https://selectquote.com/cognitive
Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive

CHAPTERS:
(00:00:00) Teaser
(00:01:13) About the Episode
(00:04:37) Andrew White's Journey
(00:10:23) GPT-4 Red Team
(00:15:33) GPT-4 & Chemistry
(00:17:54) Sponsors: Oracle Cloud Infrastructure (OCI) | SelectQuote
(00:20:19) Biology vs Physics
(00:23:14) Conceptual Dark Matter
(00:26:27) Future House Intro
(00:30:42) Semi-Autonomous AI
(00:35:39) Sponsors: Shopify
(00:37:00) Lab Automation
(00:39:46) In Silico Experiments
(00:45:22) Cost of Experiments
(00:51:30) Multi-Omic Models
(00:54:54) Scale and Grokking
(01:00:53) Future House Projects
(01:10:42) Paper QA Insights
(01:16:28) Generalizing to Other Domains
(01:17:57) Using Figures Effectively
(01:22:01) Need for Specialized Tools
(01:24:23) Paper QA Cost & Latency
(01:27:37) Aviary: Agents & Environments
(01:31:42) Black Box Gradient Estimation
(01:36:14) Open vs Closed Models
(01:37:52) Improvement with Training
(01:40:00) Runtime Choice & Q-Learning
(01:43:43) Narrow vs General AI
(01:48:22) Future Directions & Needs
(01:53:22) Future House: What's Next?
(01:55:32) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolution.ai
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://www.linkedin.com/in/nathanlabenz/
Youtube: https://www.youtube.com/@CognitiveRevolutionPodcast
Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431
Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk
Dive into the latest #CryingOutCloud episode featuring Johann Rehberger! Join Eden and Amitai as they sit down with Johann Rehberger, Red Team Director at @electronicarts and a cybersecurity expert. Johann also publishes innovative security research on his blog, Embrace the Red. What you'll learn:
This week on Hacker And The Fed, former FBI agent Chris Tarbell and ex-black hat hacker Hector Monsegur discuss key cybersecurity challenges, from the effectiveness of phishing training to the ethical dilemmas of vulnerability disclosure. They explore how technical controls and employee education can work together to defend against increasingly sophisticated attacks, including SMS and social media phishing. They also dive into career advice for transitioning from Blue Team to Red Team roles and the complexities of the cybersecurity job market. And to close out, a heartfelt Thanksgiving message.
Colin Bell, Rob Cuddy, and Kris Duer from HCL Software bring you another insightful Application Paranoia session. In this episode our special guest is Mark Spears.

Mark is currently a Principal Security Consultant at Solis Security. Having spent significant time as a network defender and vCISO writing and testing InfoSec programs and dealing with auditors and endless reporting, he has now re-focused his time on penetration testing to get his fill of offensive security operations. So Red Pill or Blue Pill?

A lot of his most recent education and skill focus has been on helping companies with their web application security through Secure-SDLC practices, including configuration of Web Application Firewalls and Zero Trust solutions. When not enjoying his work at Solis Security, he can be found practicing physical security, lock picking, social engineering, or hardware hacking. Or, out on a Harley-Davidson!
In this episode of the Breaking Badness Cybersecurity Podcast, Jason Haddix dives into his unique journey from red teaming and pentesting to leading security teams as a CISO in high-profile organizations, including a top gaming company. Jason unpacks the distinct challenges of securing a gaming company, where risks come not only from state actors but also from clout-seeking young hackers. He shares valuable insights on building scalable security programs, secrets management, and the importance of radical transparency in corporate security cultures. Tune in to hear why, in Jason's words, "gaming saved me from a misspent youth," and learn about his latest ventures into offensive security training and AI-driven security solutions.
We are recording our next big recap episode and taking questions! Submit questions and messages on Speakpipe here for a chance to appear on the show!

Also subscribe to our calendar for our Singapore, NeurIPS, and all upcoming meetups!

In our first ever episode with Logan Kilpatrick we called out the two hottest LLM frameworks at the time: LangChain and Dust. We've had Harrison from LangChain on twice (as a guest and as a co-host), and we've now finally come full circle as Stanislas from Dust joined us in the studio.

After stints at Oracle and Stripe, Stan had joined OpenAI to work on mathematical reasoning capabilities. He describes his time at OpenAI as "the PhD I always wanted to do" while acknowledging the challenges of research work: "You're digging into a field all day long for weeks and weeks, and you find something, you get super excited for 12 seconds. And at the 13th second, you're like, 'oh, yeah, that was obvious.' And you go back to digging." This experience, combined with early access to GPT-4's capabilities, shaped his decision to start Dust: "If we believe in AGI and if we believe the timelines might not be too long, it's actually the last train leaving the station to start a company. After that, it's going to be computers all the way down."

The History of Dust

Dust's journey can be broken down into three phases:

* Developer Framework (2022): Initially positioned as a competitor to LangChain, Dust started as a developer tooling platform. While both were open source, their approaches differed – LangChain focused on broad community adoption and integration as a pure developer experience, while Dust emphasized UI-driven development and better observability that wasn't just `print` statements.

* Browser Extension (Early 2023): The company pivoted to building XP1, a browser extension that could interact with web content. This experiment helped validate user interaction patterns with AI, even while using less capable models than GPT-4.

* Enterprise Platform (Current): Today, Dust has evolved into an infrastructure platform for deploying AI agents within companies, with impressive metrics like 88% daily active users in some deployments.

The Case for Being Horizontal

The big discussion for early stage companies today is whether or not to be horizontal or vertical. Since models are so good at general tasks, a lot of companies are building vertical products that take care of a workflow end-to-end in order to offer more value and become more of "Services as Software". Dust, on the other hand, is a platform for the users to build their own experiences, which has had a few advantages:

* Maximum Penetration: Dust reports 60-70% weekly active users across entire companies, demonstrating the potential reach of horizontal solutions rather than selling into a single team.

* Emergent Use Cases: By allowing non-technical users to create agents, Dust enables use cases to emerge organically from actual business needs rather than prescribed solutions.

* Infrastructure Value: The platform approach creates lasting value through maintained integrations and connections, similar to how Stripe's value lies in maintaining payment infrastructure. Rather than relying on third-party integration providers, Dust maintains its own connections to ensure proper handling of different data types and structures.

The Vertical Challenge

However, this approach comes with trade-offs:

* Harder Go-to-Market: As Stan talked about: "We spike at penetration... but it makes our go-to-market much harder.
Vertical solutions have a go-to-market that is much easier because they're like, 'oh, I'm going to solve the lawyer stuff.'"

* Complex Infrastructure: Building a horizontal platform requires maintaining numerous integrations and handling diverse data types appropriately – from structured Salesforce data to unstructured Notion pages. As you scale integrations, the cost of maintaining them also scales.

* Product Surface Complexity: Creating an interface that's both powerful and accessible to non-technical users requires careful design decisions, down to avoiding technical terms like "system prompt" in favor of "instructions."

The Future of AI Platforms

Stan initially predicted we'd see the first billion-dollar single-person company in 2023 (a prediction later echoed by Sam Altman), but he's now more focused on a different milestone: billion-dollar companies with engineering teams of just 20 people, enabled by AI assistance.

This vision aligns with Dust's horizontal platform approach – building the infrastructure that allows small teams to achieve outsized impact through AI augmentation. Rather than replacing entire job functions (the vertical approach), they're betting on augmenting existing workflows across organizations.

Full YouTube Episode

Chapters

* 00:00:00 Introductions
* 00:04:33 Joining OpenAI from Paris
* 00:09:54 Research evolution and compute allocation at OpenAI
* 00:13:12 Working with Ilya Sutskever and OpenAI's vision
* 00:15:51 Leaving OpenAI to start Dust
* 00:18:15 Early focus on browser extension and WebGPT-like functionality
* 00:20:20 Dust as the infrastructure for agents
* 00:24:03 Challenges of building with early AI models
* 00:28:17 LLMs and Workflow Automation
* 00:35:28 Building dependency graphs of agents
* 00:37:34 Simulating API endpoints
* 00:40:41 State of AI models
* 00:43:19 Running evals
* 00:46:36 Challenges in building AI agents infra
* 00:49:21 Buy vs. build decisions for infrastructure components
* 00:51:02 Future of SaaS and AI's Impact on Software
* 00:53:07 The single employee $1B company race
* 00:56:32 Horizontal vs. vertical approaches to AI agents

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.

Swyx [00:00:11]: Hey, and today we're in a studio with Stanislas, welcome.

Stan [00:00:14]: Thank you very much for having me.

Swyx [00:00:16]: Visiting from Paris.

Stan [00:00:17]: Paris.

Swyx [00:00:18]: And you have had a very distinguished career. It's very hard to summarize, but you went to college at both École Polytechnique and Stanford, and then you worked in a number of places, Oracle, Totems, Stripe, and then OpenAI pre-ChatGPT. We'll talk, we'll spend a little bit of time about that. About two years ago, you left OpenAI to start Dust. I think you were one of the first OpenAI alum founders.

Stan [00:00:40]: Yeah, I think it was about at the same time as the Adept guys, so that first wave.

Swyx [00:00:46]: Yeah, and people really loved our David episode. We love a few sort of OpenAI stories, you know, from back in the day, like we're talking about pre-recording. Probably the statute of limitations on some of those stories has expired, so you can talk a little bit more freely without them coming after you. But maybe we'll just talk about, like, what was your journey into AI? You know, you were at Stripe for almost five years, there are a lot of Stripe alums going into OpenAI.
I think the Stripe culture has come into OpenAI quite a bit.

Stan [00:01:11]: Yeah, so I think the buses of Stripe people really started flowing in, I guess, after ChatGPT. But, yeah, my journey into AI is a... I mean, Greg Brockman. Yeah, yeah. From Greg, of course. And Daniela, actually, back in the days, Daniela Amodei.

Swyx [00:01:27]: Yes, she was COO, I mean, she is COO, yeah. She had a pretty high job at OpenAI at the time, yeah, for sure.

Stan [00:01:34]: My journey started as anybody else, you're fascinated with computer science and you want to make them think, it's awesome, but it doesn't work. I mean, it was a long time ago, I was like maybe 16, so it was 25 years ago. Then the first big exposure to AI would be at Stanford, and I'm going to, like, disclose how old I am, because at the time it was a class taught by Andrew Ng, and there was no deep learning. It was hand-crafted features for vision and the A* algorithm. So it was fun. But it was the early days of deep learning. At the time, I think a few years after, it was the first project at Google. But you know, that cat face or the human face trained from many images. I hesitated doing a PhD, more in systems, eventually decided to go into getting a job. Went to Oracle, started a company, did a gazillion mistakes, got acquired by Stripe, worked with Greg Brockman there. And at the end of Stripe, I started getting interested in AI again, felt like it was the time, you had the Atari games, you had the self-driving craziness at the time. And I started exploring projects, it felt like the Atari games were incredible, but they were still games. And I was looking into exploring projects that would have an impact on the world. And so I decided to explore three things, self-driving cars, cybersecurity and AI, and math and AI. I'm listing them in decreasing order of impact on the world, I guess.

Swyx [00:03:01]: Discovering new math would be very foundational.

Stan [00:03:03]: It is extremely foundational, but it's not as direct as driving people around.

Swyx [00:03:07]: Sorry, you're doing this at Stripe, you're like thinking about your next move.

Stan [00:03:09]: No, it was at Stripe, kind of a bit of time where I started exploring. I did a bunch of work with friends on trying to get RC cars to drive autonomously. Almost started a company in France or Europe about self-driving trucks. We decided to not go for it because it was probably very operational. And I think the idea of the company, of the team wasn't there. And also I realized that if I wake up a day and because of a bug I wrote, I killed a family, it would be a bad experience. And so I just decided like, no, that's just too crazy. And then I explored cybersecurity with a friend. We were trying to apply transformers to coverage-guided fuzzing. So with fuzzing, you have kind of an algorithm that goes really fast and tries to mutate the inputs of a library to find bugs. And we tried to apply a transformer to that and do reinforcement learning with the signal of how much you propagate within the binary. Didn't work at all because the transformers are so slow compared to evolutionary algorithms that it kind of didn't work. Then I got interested in math and AI and started working on SAT solving with AI. And at the same time, OpenAI was kind of starting the reasoning team that was tackling that project as well. I was in touch with Greg and eventually got in touch with Ilya and finally found my way to OpenAI. I don't know how much you want to dig into that.
The way to find your way to OpenAI when you're in Paris was kind of an interesting adventure as well.

Swyx [00:04:33]: Please. And I want to note, this was a two-month journey. You did all this in two months.

Stan [00:04:38]: The search.

Swyx [00:04:40]: Your search for your next thing, because you left in July 2019 and then you joined OpenAI in September.

Stan [00:04:45]: I'm going to be ashamed to say that.

Swyx [00:04:47]: You were searching before. I was searching before.

Stan [00:04:49]: I mean, it's normal. No, the truth is that I moved back to Paris through Stripe and I just felt the hardship of being remote from your team nine hours away. And so it kind of freed a bit of time for me to start the exploration before. Sorry, Patrick. Sorry, John.

Swyx [00:05:05]: Hopefully they're listening. So you joined OpenAI from Paris and from like, obviously you had worked with Greg, but not

Stan [00:05:13]: anyone else. No. Yeah. So I had worked with Greg, but not Ilya, but I had started chatting with Ilya and Ilya was kind of excited because he knew that I was a good engineer through Greg, I presume, but I was not a trained researcher, didn't do a PhD, never did research. And I started chatting and he was excited all the way to the point where he was like, hey, come pass interviews, it's going to be fun. I think he didn't care where I was, he just wanted to try working together. So I go to SF, go through the interview process, get an offer. And so I get Bob McGrew on the phone for the first time, he's like, hey, Stan, it's awesome. You've got an offer. When are you coming to SF? I'm like, hey, it's awesome. I'm not coming to SF. I'm based in Paris and we just moved. He was like, hey, it's awesome. Well, you don't have an offer anymore. Oh, my God. No, it wasn't as hard as that. But that's basically the idea. And it took maybe a couple more times of chatting and they eventually decided to try a contractor setup. And that's how I kind of started working at OpenAI, officially as a contractor, but in practice really felt like being an employee.

Swyx [00:06:14]: What did you work on?

Stan [00:06:15]: So it was solely focused on math and AI, and in particular on the study of the large language models' mathematical reasoning capabilities, in particular in the context of formal mathematics. The motivation was simple, transformers are very creative, but yet they do mistakes. Formal math systems have the ability to verify a proof, but the tactics they can use to solve problems are very mechanical, so you miss the creativity. And so the idea was to try to explore both together. You would get the creativity of the LLMs and the kind of verification capabilities of the formal system. A formal system, just to give a little bit of context, is a system in which a proof is a program and the formal system is a type system, a type system that is so evolved that you can verify the program. If the type checks, it means that the program is correct.

Swyx [00:07:06]: Is the verification much faster than actually executing the program?

Stan [00:07:12]: Verification is instantaneous, basically. So the truth is that what you code involves tactics that may involve computation to search for solutions. So it's not instantaneous. You do have to do the computation to expand the tactics into the actual proof.
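To make the proof-as-program idea concrete, here is a minimal Lean 4 sketch (our illustration, not something from the episode; the theorem names are ours, and it assumes a recent Lean 4 toolchain where `Nat.add_comm` and the `omega` tactic are available). The proposition is a type, a proof is a program of that type, and the kernel's type check is the verification step Stan describes; the tactic version shows where the search-time computation he mentions comes in.

```lean
-- Propositions as types: to prove `a + b = b + a`, write a program
-- (a term) whose type is that proposition. If it type-checks, the
-- theorem is proved.
theorem add_comm_direct (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b  -- a ready-made proof term from the standard library

-- The same statement proved via a tactic: `omega` searches for a proof
-- term (that search is the potentially slow computation), after which
-- the kernel type-checks the resulting term almost instantaneously.
theorem add_comm_tactic (a b : Nat) : a + b = b + a := by
  omega
```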
The verification of the proof at the very low level is instantaneous.

Swyx [00:07:32]: How quickly do you run into, like, you know, halting-problem or P-vs-NP type things, like impossibilities where you're just like that?

Stan [00:07:39]: I mean, you don't run into it. At the time, it was really trying to solve very easy problems. So I think the... Can you give an example of easy? Yeah, so that's the MATH benchmark that everybody knows today. The Dan Hendrycks one. The Dan Hendrycks one, yeah. And I think it was the low end part of the MATH benchmark at the time, because that benchmark includes AMC problems, AMC 8, AMC 10, 12. So these are the easy ones. Then AIME problems, somewhat harder, and some IMO problems, which are, like, crazy hard.

Swyx [00:08:07]: For our listeners, we covered this in our Benchmarks 101 episode. AMC is literally the grade of like high school, grade 8, grade 10, grade 12. So you can solve this. Just briefly to mention this, because I don't think we'll touch on this again. There's a bit of work with like Lean, and then with, you know, more recently with DeepMind doing like scoring like silver on the IMO. Any commentary on like how math has evolved from your early work to today?

Stan [00:08:34]: I mean, that result is mind blowing. I mean, from my perspective, I spent three years on that. At the same time, Guillaume Lample in Paris, we were both in Paris, actually. He was at FAIR, was working on some problems. We were pushing the boundaries, and the goal was the IMO. And we cracked a few problems here and there. But the idea of getting a medal at an IMO was like just remote. So this is an impressive result. And we can, I think the DeepMind team just did a good job of scaling. I think there's nothing too magical in their approach, even if it hasn't been published. There's a David Silver talk from seven days ago where it goes a little bit into more details. It feels like there's nothing magical there. It's really applying reinforcement learning and scaling up the amount of data they can generate through autoformalization. So we can dig into what autoformalization means if you want.

Alessio [00:09:26]: Let's talk about the tail end, maybe, of the OpenAI chapter. So you joined, and you're like, I'm going to work on math and do all of these things. I saw on one of your blog posts, you mentioned you fine-tuned over 10,000 models at OpenAI using 10 million A100 hours. How did the research evolve from GPT-2, and then getting closer to davinci-003? And then you left just before ChatGPT was released, but tell people a bit more about the research path that took you there.

Stan [00:09:54]: I can give you my perspective of it. I think at OpenAI, there's always been a large chunk of the compute that was reserved to train the GPTs, which makes sense. So this was pre-Anthropic split. Most of the compute was going to a product called Nest, which was basically GPT-3. And then you had a bunch of, let's say, remote, not core research teams that were trying to explore maybe more specific problems or maybe the algorithm part of it. The interesting part, I don't know if it was where your question was going, is that in those labs, you're managing researchers. So by definition, you shouldn't be managing them. But in that space, there's a managing tool that is great, which is compute allocation. Basically by managing the compute allocation, you can signal to the teams where you think the priority should go. And so it was really a question of, you were free as a researcher to work on whatever you wanted.
But if it was not aligned with the OpenAI mission, and that's fair, you wouldn't get the compute allocation. As it happens, solving math was very much aligned with the direction of OpenAI. And so I was lucky to generally get the compute I needed to make good progress.

Swyx [00:11:06]: What do you need to show as incremental results to get funded for further results?

Stan [00:11:12]: It's an imperfect process because there's a bit of a... If you're working on math and AI, obviously there's kind of a prior that it's going to be aligned with the company. So it's much easier than to go into something much more risky, much riskier, I guess. You have to show incremental progress, I guess. It's like you ask for a certain amount of compute and you deliver a few weeks after, and you demonstrate that you have progress. Progress might be a positive result. Progress might be a strong negative result. And a strong negative result is actually often much harder to get or much more interesting than a positive result. And then it generally goes into, as any organization, you would have people finding your project or any other project cool and fancy. And so you would have that kind of phase of growing compute allocation for it all the way to a point. And then maybe you reach an apex and then maybe you go back mostly to zero and restart the process because you're going in a different direction or something else. That's how I felt. Explore, exploit. Yeah, exactly. Exactly. Exactly. It's a reinforcement learning approach.

Swyx [00:12:14]: Classic PhD student search process.

Alessio [00:12:17]: And you were reporting to Ilya, like the results you were kind of bringing back to him or like what's the structure? It's almost like when you're doing such cutting edge research, you need to report to somebody who is actually really smart to understand that the direction is right.

Stan [00:12:29]: So we had a reasoning team, which was working on reasoning, obviously, and so math in general. And that team had a manager, but Ilya was extremely involved in the team as an advisor, I guess. Since he brought me into OpenAI, I was lucky, mostly during the first years, to have kind of direct access to him. He would really coach me as a trainee researcher, I guess, with good engineering skills. And Ilya, I think at OpenAI, he was the one showing the North Star, right? It was his job and I think he really enjoyed it and he did it super well, going through the teams and saying, this is where we should be going and trying to, you know, flock the different teams together towards an objective.

Swyx [00:13:12]: I would say like the public perception of him is that he was the strongest believer in scaling. Oh, yeah. Obviously, he has always pursued the compression thesis. You have worked with him personally, what does the public not know about how he works?

Stan [00:13:26]: I think he's really focused on building the vision and communicating the vision within the company, which was extremely useful. I was personally surprised that he spent so much time, you know, working on communicating that vision and getting the teams to work together versus...

Swyx [00:13:40]: To be specific, vision is AGI? Oh, yeah.

Stan [00:13:42]: Vision is like, yeah, it's the belief in compression and scaling compute. I remember when I started working on the Reasoning team, the excitement was really about scaling the compute around Reasoning and that was really the belief we wanted to ingrain in the team.
And that's what has been useful to the team. The DeepMind results, and the success of GPT-4 and stuff, show that it was the right approach.Swyx [00:14:06]: Was it according to the neural scaling laws, the Kaplan paper that was published?Stan [00:14:12]: I think it was before that, because those ones came with GPT-3, basically at the time of GPT-3 being released or being ready internally. But before that, there really was a strong belief in scale. I think it was just the belief that the transformer was a generic enough architecture that you could learn anything. And that it was just a question of scaling.Alessio [00:14:33]: Any other fun stories you want to tell? Sam Altman, Greg, you know, anything.Stan [00:14:37]: Weirdly, I didn't work that much with Greg when I was at OpenAI. He had always been mostly focused on training the GPTs, and rightfully so. One thing about Sam Altman, he really impressed me because when I joined, he had joined not that long ago, and it felt like he was kind of a very high-level CEO. And I was mind-blown by how deep he was able to go into the subjects within a year or something, all the way to a situation where, when I was having lunch with him by year two at OpenAI, he would just know quite deeply what I was doing. With no ML background. Yeah, with no ML background, but I didn't have any either, so I guess that explains why. But I think it's a question of, you don't necessarily need to understand the very technicalities of how things are done, but you need to understand what's the goal, and what's being done, and what are the recent results, and all of that. And we could have kind of a very productive discussion. And that really impressed me, given the size at the time of OpenAI, which was not negligible.Swyx [00:15:44]: Yeah. I mean, you were a founder before, you're a founder now, and you've seen Sam as a founder. How has he affected you as a founder?Stan [00:15:51]: I think having that capability of changing the scale of your attention in the company, because most of the time you operate at a very high level, but being able to go deep down and being in the know of what's happening on the ground is something that I feel is really enlightening. That's not a place in which I ever was as a founder, because first company, we went all the way to 10 people. Current company, there's 25 of us. So the high level, the sky and the ground are pretty much at the same place. No, you're being too humble.Swyx [00:16:21]: I mean, Stripe was also like a huge rocket ship.Stan [00:16:23]: At Stripe, I wasn't a founder. So, like at OpenAI, I was really happy being on the ground, pushing the machine, making it work. Yeah.Swyx [00:16:31]: Last OpenAI question. The Anthropic split you mentioned, you were around for that. Very dramatic. Dario also left around that time, you left. This year, we've also had a similar management shakeup, let's just call it. Can you compare what it was like going through that split during that time? And then, does that have any similarities now? Like, are we going to see a new Anthropic emerge from these folks that just left?Stan [00:16:54]: That I really, really don't know. At the time, the split was pretty surprising because they had been training GPT-3, it was a success. And to be completely transparent, I wasn't in the weeds of the split. What I understood of it is that there was a disagreement about the commercialization of that technology.
I think the focal point of that disagreement was the fact that we started working on the API and wanted to make those models available through an API. Is that really the core disagreement? I don't know.Swyx [00:17:25]: Was it safety?Stan [00:17:26]: Was it commercialization?Swyx [00:17:27]: Or did they just want to start a company?Stan [00:17:28]: Exactly. Exactly. That I don't know. But I think what I was surprised by is how quickly OpenAI recovered at the time. And I think it's just because we were mostly a research org and the mission was so clear that some divergence in some teams, some people leave, the mission is still there. We have the compute. We have a site. So it just keeps going.Swyx [00:17:50]: Very deep bench. Like, just a lot of talent. Yeah.Alessio [00:17:53]: So that was the OpenAI part of the history. Exactly. So then you leave OpenAI in September 2022. And I would say in Silicon Valley, the two hottest companies at the time were you and LangChain. What was that start like, and why did you decide to start with a more developer-focused, kind of like an AI engineer tool, rather than going back into some more research or something else?Stan [00:18:15]: Yeah. First, I'm not a trained researcher. So going through OpenAI was really kind of the PhD I always wanted to do. But research is hard. You're digging into a field all day long for weeks and weeks and weeks, and you find something, you get super excited for 12 seconds. And at the 13th second, you're like, oh, yeah, that was obvious. And you go back to digging. I'm not a formally trained researcher, and it wasn't necessarily an ambition of mine to have a research career. And I felt the hardness of it. I enjoyed a ton of it. But at the time, I decided that I wanted to go back to something more productive. And the other fun motivation was, I mean, if we believe in AGI and if we believe the timelines might not be too long, it's actually the last train leaving the station to start a company. After that, it's going to be computers all the way down. And so that was kind of the true motivation for trying to go there. So that's kind of the core personal motivation at the beginning. And the motivation for starting a company was pretty simple. I had seen GPT-4 internally at the time, it was September 2022. So it was pre-ChatGPT, but GPT-4 was ready; I mean, it had been ready for a few months internally. I was like, okay, that's obvious, the capabilities are there to create an insane amount of value for the world. And yet the deployment is not there yet. The revenues of OpenAI at the time were ridiculously small compared to what they are today. So the thesis was, there's probably a lot to be done at the product level to unlock the usage.Alessio [00:19:49]: Yeah. Let's talk a bit more about the form factor, maybe. I think one of the first successes you had was kind of like the WebGPT-like thing, like using the models to traverse the web and summarize things. And the browser was really the interface. Why did you start with the browser? Like, why was it important? And then you built XP1, which was kind of like the browser extension.Stan [00:20:09]: So the starting point at the time was, if you wanted to talk about LLMs, it was still a rather small community, a community of mostly researchers and, to some extent, very early adopters, very early engineers.
It was almost inconceivable to just build a product and go sell it to the enterprise, though at the time there were a few companies doing that. The one in marketing, I don't remember its name... Jasper. But so the natural first intention, the first, first, first intention was to go to the developers and try to create tooling for them to create products on top of those models. And so that's what Dust was originally. It was quite different from LangChain, and LangChain just beat the s**t out of us, which is great. It's a choice.Swyx [00:20:53]: You were cloud, and closed source. They were open source.Stan [00:20:56]: Yeah. So technically we were open source and we still are open source, but I think that doesn't really matter. I had the strong belief from my research time that you cannot create an LLM-based workflow on just one example. Basically, if you just have one example, you overfit. So as you develop your interaction, your orchestration around the LLM, you need a dozen examples. Obviously, if you're running a dozen examples on a multi-step workflow, you start parallelizing stuff. And if you do that in the console, you just have a messy stream of tokens going out, and it's very hard to observe what's going on there. And so the idea was to go with a UI so that you could easily introspect the output of each interaction with the model and dig in there through a UI, which is-Swyx [00:21:42]: Was that open source? I actually didn't come across it.Stan [00:21:44]: Oh yeah, it was. I mean, Dust is entirely open source even today. We're not going for an open source-Swyx [00:21:48]: If it matters, I didn't know that.Stan [00:21:49]: No, no, no, no, no. We're not open source because we're doing an open source strategy. It's not an open source go-to-market at all. We're open source because we can and it's fun.Swyx [00:21:59]: Open source is marketing. You have all the downsides of open source, which is like, people can clone you.Stan [00:22:03]: But I think that downside is a big fallacy. Okay. Yes, anybody can clone Dust today, but the value of Dust is not the current state. The value of Dust is the number of eyeballs and hands of developers that are contributing to it in the future. And so yes, anybody can clone it today, but that wouldn't change anything. There is some value in being open source. In a discussion with a security team, you can be extremely transparent and just show the code. When you have a discussion with users and there's a bug or a feature missing, you can just point to the issue, show the pull request, exactly, oh, PR welcome. That doesn't happen that much, but you can show the progress, and if the person that you're chatting with is a little bit technical, they really enjoy seeing the pull request advancing all the way to deploy. And then the downsides are mostly around security. You never want to do security by obfuscation. But the truth is that your vector of attack is facilitated by you being open source. But at the same time, it's a good thing, because if you're doing anything like bug bounties or stuff like that, you just give many more tools to the bug bounty hunters so that their output is much better. So there's many, many, many trade-offs. I don't believe in the value of the code base per se. I think it's really the people that are on the code base that have the value, and the go-to-market, and the product, and all of those things that are around the code base. Obviously, that's not true for every code base.
If you're working on a very secret kernel to accelerate the inference of LLMs, I would buy that you don't want to be open source. But for product stuff, I really think there's very little risk. Yeah.Alessio [00:23:39]: I signed up for XP1, I was looking: January 2023. I think at the time you were on text-davinci-003. Given that you had seen GPT-4, how did you feel having to push a product out that was using this model that was so inferior? And you're like, please, just use it today. I promise it's going to get better. Just overall, as a founder, how do you build something that maybe doesn't quite work with the model today, but you're just expecting the new model to be better?Stan [00:24:03]: Yeah, so actually, XP1 was even on a smaller one, the small version from the post-GPT release, so it was... Ada, Babbage... No, no, no, not that far away. But it was the small version of GPT, basically. I don't remember its name. Yes, there was a frustration there. But at the same time, I think XP1 was designed, was an experiment, but was designed as a way to be useful at the current capability of the model. If you just want to extract data from a LinkedIn page, that model was just fine. If you want to summarize an article in a newspaper, that model was just fine. And so it was really a question of trying to find a product that works with the current capability, knowing that you will always have tailwinds as models get better and faster and cheaper. So that was kind of a... There's a bit of frustration because you know what's out there and you know that you don't have access to it yet. It's also interesting to try to find a product that works with the current capability.Alessio [00:24:55]: And we highlighted XP1 in our anatomy of autonomy post in April of last year, which was, you know, where are all the agents, right? So now we spent 30 minutes getting to what you're building now. So you basically had a developer framework, then you had a browser extension, then you had all these things, and then you kind of got to where Dust is today. So maybe just give people an overview of what Dust is today and the core thesis behind it. Yeah, of course.Stan [00:25:20]: So Dust, we really want to build the infrastructure so that companies can deploy agents within their teams. We are horizontal by nature because we strongly believe in the emergence of use cases from people having access to creating agents; they don't need to be developers. They have to be thinkers. They have to be curious. But anybody can create an agent that will solve an operational thing that they're doing in their day-to-day job. And to make those agents useful, there are two focuses, which is interesting. The first one is an infrastructure focus. You have to build the pipes so that the agent has access to the data. You have to build the pipes such that the agents can take action, can access the web, et cetera. So that's really an infrastructure play. Maintaining connections to Notion, Slack, GitHub, all of them, is a lot of work. It is boring work, boring infrastructure work, but that's something that we know is extremely valuable, in the same way that Stripe is extremely valuable because it maintains the pipes. And we have that dual focus because we're also building the product for people to use it. And there it's fascinating, because everything started from the conversational interface, obviously, which is a great starting point. But we're only scratching the surface, right? I think we are at the Pong level of LLM productization.
And we haven't invented the C3. We haven't invented Counter-Strike. We haven't invented Cyberpunk 2077. So this is really our mission: to create the product that lets people equip themselves to just take away all the work that can be automated or assisted by LLMs.Alessio [00:26:57]: And can you just comment on different takes that people had? So maybe the most open is like AutoGPT. It's just kind of like, just trying to do anything. It's like, it's all magic. There's no way for you to do anything. Then you had Adept, you know, we had David on the podcast. They're very, like, super hands-on with each individual customer to build something super tailored. How do you decide where to draw the line between 'this is magic' and 'this is exposed to you', especially in a market where most people don't know how to build with AI at all? So if you expect them to do the thing, they're probably not going to do it. Yeah, exactly.Stan [00:27:29]: So the AutoGPT approach obviously is extremely exciting, but we know that the agentic capabilities of models are not quite there yet. It just gets lost. So we're starting where it works. Same with XP1. And where it works is pretty simple. It's simple workflows that involve a couple of tools, where you don't even need to have the model decide which tools it uses, in the sense that you just want people to put it in the instructions. It's like: take that page, do that search, pick up that document, do the work that I want in the format I want, and give me the results. There's no smartness there, right? In terms of orchestrating the tools, it's mostly using English for people to program a workflow, where you don't have the constraint of having compatible APIs between the two.Swyx [00:28:17]: That kind of personal automation, would you say it's kind of like an LLM Zapier type of thing? Like if this, then that, and then, you know, do this, then this. You're programming with English?Stan [00:28:28]: So you're programming with English. So you're just saying, oh, do this and then that. You can even create some form of APIs. You say, when I give you the command X, do this. When I give you the command Y, do this. And you describe the workflow. But you don't have to create boxes and create the workflow explicitly. You just need to describe what the tasks are supposed to be and make the tools available to the agent. The tool can be a semantic search. The tool can be querying into a structured database. The tool can be searching on the web. And obviously, the interesting tools that we're only starting to scratch are actually creating external actions, like reimbursing something on Stripe, sending an email, clicking on a button in the admin, or something like that.
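To make that "programming with English" pattern concrete, here is a minimal sketch: the workflow lives in plain-language instructions, and the agent only needs a dispatch table of tools. Everything below (the Agent class, llm_pick_tool, the tool names) is a hypothetical illustration rather than Dust's actual API, and the model call is stubbed so the example runs as written.

```python
# All names here are hypothetical; this is the shape of the "instructions +
# tools" pattern, not Dust's actual API. The model call is stubbed out.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    name: str
    instructions: str  # the English "program", including command aliases
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, command: str) -> str:
        # In a real system an LLM reads instructions + command and picks a
        # tool (function calling); here that selection step is faked.
        tool_name = llm_pick_tool(self.instructions, command, list(self.tools))
        return self.tools[tool_name](command)

def llm_pick_tool(instructions: str, command: str, tool_names: list[str]) -> str:
    return tool_names[0]  # stand-in for a model call with tool schemas

agent = Agent(
    name="weekly-digest",
    instructions=(
        "When I give you the command 'digest', search Slack for last week's "
        "incidents, then format them as a table."
    ),
    tools={
        "slack_search": lambda q: f"(slack results for {q!r})",
        "web_search": lambda q: f"(web results for {q!r})",
    },
)
print(agent.run("digest"))
```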
Swyx [00:29:11]: Do you maintain all these integrations?Stan [00:29:13]: Today, we maintain most of the integrations. We do always have an escape hatch for people to kind of custom-integrate. But the reality of the market today is that people just want it to work, right? And so it's mostly us maintaining the integrations. As an example, a very good source of information that is tricky to productize is Salesforce. Because Salesforce is basically a database and a UI. And they do the f**k they want with it. And so every company has different models and stuff like that. So right now, we don't support it natively. And the type of support, or real native support, will be slightly more complex than just OAuthing into it, like is the case with Slack, as an example. Because it's probably going to be, oh, you want to connect your Salesforce to us? Give us the SOQL. That's the Salesforce query language. Give us the queries you want us to run on it and inject in the context of Dust. So that's interesting: not only are integrations cool, some of them require a bit of work from the user. And for some of them that are really valuable to our users, but we don't support yet, they can just build them internally and push the data to us.Swyx [00:30:18]: I think I understand the Salesforce thing. But let me just clarify, are you using browser automation because there's no API for something?Stan [00:30:24]: No, no, no, no. In that case, we do have browser automation for the use cases that apply to the public web. But for most of the integrations with the internal systems of the company, it really runs through APIs.Swyx [00:30:35]: Haven't you felt the pull to RPA, browser automation, that kind of stuff?Stan [00:30:39]: I mean, what I've been saying for a long time, maybe I'm wrong, is that if the future is that you're going to stand in front of a computer looking at an agent clicking on stuff, then I hate my computer. And my computer is a big Lenovo. It's black. Doesn't sound good at all compared to a Mac. And if the APIs are there, we should use them. There is going to be a long tail of stuff that doesn't have APIs, but as the world is moving forward, that's disappearing. So the core RPA value in the past has really been, oh, this old 90s product doesn't have an API, so I need to use the UI to automate. I think for most of the companies that are ICP for us, the scale-ups that are between 500 and 5,000 people, tech companies, most of the SaaS they use have APIs. Now there's an interesting question for the open web, because there is stuff that you want to do that involves websites that don't necessarily have APIs. And the current state of web integration, which is us and OpenAI and Anthropic, and I don't even know if they have web navigation, but I don't think so, the current state of affairs is really, really broken. Because you have what? You have basically search and headless browsing. But headless browsing, I think everybody's doing basically body.innerText and filling that into the model, right?Swyx [00:31:56]: There are parsers into Markdown and stuff.Stan [00:31:58]: I'm super excited by the companies that are exploring the capability of rendering a web page in a way that is compatible for a model, being able to maintain the selectors, so basically the places where to click in the page through that process, expose the actions to the model, have the model select an action in a way that is compatible with the model, which is not a big page of full DOM that is very noisy, and then being able to decompress that back to the original page and take the action. And that's something that is really exciting and that will kind of change the level of things that agents can do on the web. That, I feel, is exciting, but I also feel that the bulk of the useful stuff that you can do within the company can be done through APIs. The data can be retrieved by API. The actions can be taken through API.Swyx [00:32:44]: For listeners, I'll note that you're basically completely disagreeing with David Luan.Stan: Exactly, exactly. We've seen it since this summer. Adept is where it is, and Dust is where it is. So Dust is still standing.
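For illustration, here is a toy version of the page-compaction idea Stan describes: keep only the actionable elements of a page with stable selectors, show the model a short numbered list instead of the noisy full DOM, and map the model's choice back to a selector. This is a hedged sketch of the general technique, not any particular company's implementation.

```python
# Toy DOM compaction: extract actionable elements, show the model a compact
# numbered list, then "decompress" its chosen index back to a selector.

from html.parser import HTMLParser

class ActionExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.actions = []  # (index, tag, selector, label)

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "button", "input"):
            attrs = dict(attrs)
            # Prefer a real id; fall back to a positional pseudo-selector.
            selector = f"#{attrs['id']}" if attrs.get("id") else f"{tag}[{len(self.actions)}]"
            label = attrs.get("aria-label") or attrs.get("href") or ""
            self.actions.append((len(self.actions), tag, selector, label))

    def handle_data(self, data):
        # Attach visible text to the most recent unlabeled element.
        if self.actions and data.strip() and not self.actions[-1][3]:
            i, tag, sel, _ = self.actions[-1]
            self.actions[-1] = (i, tag, sel, data.strip())

parser = ActionExtractor()
parser.feed('<button id="buy">Buy now</button><a href="/docs">Docs</a>')

for i, tag, selector, label in parser.actions:  # compact view for the model
    print(f"[{i}] <{tag}> {label}")

choice = 0                                      # pretend the model picked [0]
print("click ->", parser.actions[choice][2])    # decompress back to selector
```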
Alessio [00:32:55]: Can we just quickly comment on function calling? You mentioned you don't need the models to be that smart to actually pick the tools. Have you seen the models not be good enough? Or is it just like, you just don't want to put the complexity in there? Like, is there any room for improvement left in function calling? Or do you feel you usually consistently get the right response, the right parameters, and all of that?Stan [00:33:15]: So that's a tricky product question. Because if the instructions are good and precise, then you don't have any issue, because it's scripted for you. And the model will just look at the script and follow it and say, oh, he's probably talking about that action, and I'm going to use it. And the parameters are kind of deduced from the state of the conversation. I'll just go with it. If you provide a very high-level, kind of AutoGPT-esque level in the instructions and provide 16 different tools to your model, yes, we're seeing the models in that state making mistakes. And there is obviously some progress that can be made on the capabilities. But the interesting part is that there is already so much work that you can assist, augment, accelerate by just going with pretty simply scripted agents. What I'm excited about, by pushing our users to create rather simple agents, is that once you have those working really well, you can create meta agents that use the agents as actions. And all of a sudden, you can kind of have a hierarchy of responsibility that will probably get you almost to the point of the AutoGPT value. It requires the construction of intermediary artifacts, but you're probably going to be able to achieve something great. I'll give you an example. Our incidents are shared in a specific Slack channel, and the stuff we ship is shared in Slack as well. We have a weekly meeting where we have a table about incidents and shipped stuff. We're not writing that weekly meeting table anymore. We have an assistant that just goes and finds the right data on Slack and creates the table for us. And that assistant works perfectly. It's trivially simple, right? Take one week of data from that channel and just create the table. And then we have in that weekly meeting, obviously, some graphs and reporting about our financials and our progress and our ARR. And we've created assistants to generate those graphs directly. And those assistants work great. By creating those assistants that cover those small parts of that weekly meeting, slowly we're getting to a world where we'll have a weekly meeting assistant. We'll just call it; you don't need to prompt it, you don't need to say anything. It's going to run those different assistants and get that Notion page ready. And by doing that, if you get there, and that's an objective for us, using Dust, to get there, you're saving an hour of company time every time you run it. Yeah.
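Stan's meta-agent idea in miniature: once the small agents work reliably, a higher-level agent can expose them as its own actions. The sub-agents below are stubs and the names are hypothetical, not Dust's API; the point is the hierarchy.

```python
# Hypothetical sketch: small agents as callables, composed by a meta agent.

from typing import Callable

def incident_table_agent() -> str:
    # Small agent: reads one week of the incidents Slack channel (faked here)
    return "| incident | status |\n| api outage | resolved |"

def arr_graph_agent() -> str:
    # Small agent: renders the weekly ARR graph (faked here)
    return "(arr graph)"

def weekly_meeting_agent(sub_agents: dict[str, Callable[[], str]]) -> str:
    # Meta agent: no prompt needed; it runs its sub-agents in a fixed order
    # and assembles the weekly-meeting page from their outputs.
    sections = [f"## {name}\n{run()}" for name, run in sub_agents.items()]
    return "\n\n".join(sections)

page = weekly_meeting_agent({
    "Incidents": incident_table_agent,
    "ARR": arr_graph_agent,
})
print(page)
```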
Alessio [00:35:28]: That's my pet topic of NPM for agents. How do you build dependency graphs of agents? And how do you share them? Because why do I have to rebuild some of the smaller levels of what you built already?Swyx [00:35:40]: I have a quick follow-up question on agents managing other agents. It's a topic of a lot of research, both from Microsoft and even in startups. What have you discovered as best practice for, let's say, a manager agent controlling a bunch of small agents? Is it two-way communication? I don't know if there should be a protocol format.Stan [00:35:59]: To be completely honest, the state we are at right now is creating the simple agents. So we haven't even explored yet the meta agents. We know it's there. We know it's going to be valuable. We know it's going to be awesome. But we're starting there because it's the simplest place to start. And it's also what the market understands. If you go to a company, a random SaaS B2B company, not necessarily specialized in AI, and you take an operational team and you tell them, build some tooling for yourself, they'll understand the small agents. If you tell them, build AutoGPT, they'll be like, Auto what?Swyx [00:36:31]: And I noticed that in your language, you're very much focused on non-technical users. You don't really mention API here. You mention instruction instead of system prompt, right? That's very conscious.Stan [00:36:41]: Yeah, it's very conscious. It's a mark of our designer, Ed, who kind of pushed us to create a friendly product. I was knee-deep into AI when I started, obviously. And my co-founder, Gabriel, was at Stripe as well. We started a company together that got acquired by Stripe 15 years ago. After that, he was at Alan, a healthcare company in Paris. So he was a little bit less knee-deep in AI, but really focused on product. And I didn't realize how important it is to make that technology not scary to end users. It didn't feel scary to me, but it was really seen by Ed, our designer, that it was feeling scary to the users. And so we were very proactive and very deliberate about creating a brand that feels not too scary, and creating a wording and a language, as you say, that really tried to communicate the fact that it's going to be fine. It's going to be easy. You're going to make it.Alessio [00:37:34]: And another big point that David had about Adept is, we need to build an environment for the agents to act. And then if you have the environment, you can simulate what they do. How's that different when you're interacting with APIs and you're kind of touching systems that you cannot really simulate? If you call the Salesforce API, you're just calling it.Stan [00:37:52]: So I think that goes back to the DNA of the companies, which are very different. Adept, I think, was a product company with a very strong research DNA, and they were still doing research. One of their goals was building a model. And that's why they raised a large amount of money, et cetera. We are 100% deliberately a product company. We don't do research. We don't train models. We don't even run GPUs. We're using the models that exist, and we try to push the product boundary as far as possible with the existing models. So that creates an issue. Indeed, so to answer your question, when you're interacting in the real world, well, you cannot simulate, so you cannot improve the models. Even improving your instructions is complicated for a builder. The hope is that you can use models to evaluate the conversations, so that you can get at least feedback, and you could get contradictory information about the performance of the assistants. But if you take actual traces of interactions of humans with those agents, it is, even for us humans, extremely hard to decide whether it was a productive interaction or a really bad interaction. You don't know why the person left. You don't know if they left happy or not. So being extremely, extremely, extremely pragmatic here, it becomes a product issue.
We have to build a product that incentivizes the end users to provide feedback, so that, as a first step, the person that is building the agent can iterate on it. As a second step, maybe later when we start training models and post-training, et cetera, we can optimize around that for each of those companies. Yeah.Alessio [00:39:17]: Do you see in the future products offering kind of like a simulation environment, the same way all SaaS now kind of offers APIs to build programmatically? Like in cybersecurity, there are a lot of companies working on building simulated environments so that you can then use agents to red team, but I haven't really seen that.Stan [00:39:34]: Yeah, no, me neither. That's a super interesting question. I think it's really going to depend on how much, because you need to simulate to generate data, and you need data to train models. And the question at the end is, are we going to be training models, or are we just going to be using frontier models as they are? On that question, I don't have a strong opinion. It might be the case that we'll be training models, because in all of those AI-first products, the model is so close to the product surface that as you get big and you want to really own your product, you're going to have to own the model as well. Owning the model doesn't mean doing the pre-training, that would be crazy. But at least having an internal post-training realignment loop, it makes a lot of sense. And so if we see many companies going towards that over time, then there might be incentives for the SaaS's of the world to provide assistance in getting there. But at the same time, there's a tension, because those SaaS, they don't want to be interacted with by agents, they want the human to click on the button. Yeah, they've got to sell seats. Exactly.Swyx [00:40:41]: Just a quick question on models. I'm sure you've used many, probably not just OpenAI. Would you characterize some models as better than others? Do you use any open source models? What have been the trends in models over the last two years?Stan [00:40:53]: We've seen over the past two years kind of a bit of a race between models. And at times, it's the OpenAI model that is the best. At times, it's the Anthropic models that are the best. Our take on that is that we are agnostic and we let our users pick their model. Oh, they choose? Yeah, so when you create an assistant or an agent, you can just say, oh, I'm going to run it on GPT-4, GPT-4 Turbo, or...Swyx [00:41:16]: Don't you think for the non-technical user, that is actually an abstraction that you should take away from them?Stan [00:41:20]: We have a sane default. So we move the default to the latest model that is cool, and it's actually not very visible. In our flow to create an agent, you would have to go into advanced settings and pick your model. So this is something that the technical person will care about. But that's something that obviously is a bit too complicated for the...Swyx [00:41:40]: And do you care most about function calling, or instruction following, or something else?Stan [00:41:44]: I think we care most for function calling, because you want to... There's nothing worse than a function call including incorrect parameters or being a bit off, because it just drives the whole interaction off.Swyx [00:41:56]: Yeah, so you've got the Berkeley Function Calling Leaderboard.Stan [00:42:00]: These days, it's funny how the comparison between GPT-4o and GPT-4 Turbo is still up in the air on function calling.
I personally don't have proof, but I know many people, and I'm probably part of them, who think that GPT-4 Turbo is still better than GPT-4o on function calling. Wow. We'll see what comes out of the o1 class if it ever gets function calling. And Claude 3.5 Sonnet is great as well. They kind of innovated in an interesting way, which was never quite publicized. It's that they have that kind of chain-of-thought step whenever you use a Claude model, or a Sonnet model, with function calling. That chain-of-thought step doesn't exist when you just interact with it for answering questions. But when you use function calling, you get that step, and it really helps getting better function calling.Swyx [00:42:43]: Yeah, we actually just recorded a podcast with the Berkeley team that runs that leaderboard this week. So they just released V3.Stan [00:42:49]: Yeah.Swyx [00:42:49]: It was V1 like two months ago, and then they did V2, V3. Turbo is on top.Stan [00:42:53]: Turbo is on top. Turbo is over 4o.Swyx [00:42:54]: And then the third place is xLAM from Salesforce, which is a large action model they've been trying to popularize.Stan [00:43:01]: Yep.Swyx [00:43:01]: o1-mini is actually on here, I think. o1-mini is number 11.Stan [00:43:05]: But arguably, o1-mini has been in line for that. Yeah.Alessio [00:43:09]: Do you use leaderboards? Do you have your own evals? I mean, this is kind of intuitive, right? Like, using the newer model is better. I think most people just upgrade. Yeah. What's the eval process like?Stan [00:43:19]: It's funny because I've been doing research for three years, and we have bigger stuff to cook. When you're deploying in a company, one thing where we really spike is that when we manage to activate the company, we have a crazy penetration. The highest penetration we have is 88% daily active users within the entire employee base of the company. The kind of average penetration and activation we have in our current enterprise customers is something more like 60% to 70% weekly active. So we basically have the entire company interacting with us. And when you're there, there is so much stuff that matters more than getting evals, getting the best model. Because there are so many places where you can create product or do stuff that will give you the 80% improvement with the work you do, whereas deciding if it's GPT-4 or GPT-4 Turbo, et cetera, you know, will just give you the 5% improvement. But the reality is that you want to focus on the places where you can really change the direction or change the interaction more drastically. But that's something that we'll have to do eventually, because we still want to be serious people.Swyx [00:44:24]: It's funny because in some ways, the model labs are competing for you, right? You don't have to make any effort. You just switch models and then it'll grow. What are you really limited by? Is it additional sources?Stan [00:44:36]: It's not models, right?Swyx [00:44:37]: You're not really limited by quality of model.Stan [00:44:40]: Right now, we are limited by the infrastructure part, which is the ability for users to connect easily to all the data they need to do the job they want to do.Swyx [00:44:51]: Because you maintain all your own stuff. You know, there are companies out there that are starting to provide integrations as a service, right? I used to work in an integrations company.
Yeah, I know.Stan [00:44:59]: It's just that there are some intricacies about how you chunk stuff and how you process information from one platform to the other. If you look at the end of the spectrum, you could think of, you could say, oh, I'm going to support Airbyte, and Airbyte has- I used to work at Airbyte.Swyx [00:45:12]: Oh, really?Stan [00:45:13]: That makes sense.Swyx [00:45:14]: They're the French founders as well.Stan [00:45:15]: I know Jean very well. I'm seeing him today. And the reality is that if you look at Notion, Airbyte does the job of taking Notion and putting it in a structured way. But the way it is, it's not really usable to actually make it available to models in a useful way. Because you get all the blocks, details, et cetera, which is useful for many use cases.Swyx [00:45:35]: It's also for data scientists and not for AI.Stan [00:45:38]: The reality of Notion is that sometimes you have a- so when you have a page, there's a lot of structure in it, and you want to capture the structure and chunk the information in a way that respects that structure. In Notion, you have databases. Sometimes those databases are real tabular data. Sometimes those databases are full of text. You want to get the distinction and understand that this database should be considered like text information, whereas this other one is actually quantitative information. And to really get a very high quality interaction with that piece of information, I haven't found a solution that will work without us owning the connection end-to-end.Swyx [00:46:15]: That's why I don't invest in, there's Composio, there's All Hands from Graham Neubig. There are all these other companies that are like, we will do the integrations for you. You just... we have the open source community, we'll do off-the-shelf. But then you are so specific in your needs that you want to own it.Swyx [00:46:28]: Yeah, exactly.Stan [00:46:29]: You can talk to Michel about that.Swyx [00:46:30]: You know, he wants to put the AI in there, but you know. Yeah, I will. I will.Stan [00:46:35]: Cool. What are we missing?Alessio [00:46:36]: You know, what are the things that are sneakily hard that you're tackling, that maybe people don't even realize are really hard?Stan [00:46:43]: The real part, as we kind of touched on throughout the conversation, is really building the infra that works for those agents, because it's tedious work. It's an evergreen piece of work, because you always have an extra integration that will be useful to a non-negligible set of your users. What I'm super excited about is that there are so many interactions that shouldn't be conversational interactions and that could be very useful. Basically, we have the firehose of information of those companies, and there are not going to be that many companies that capture the firehose of information. When you have the firehose of information, you can do a ton of stuff with models that are not just accelerating people, but giving them superhuman capability, even with the current model capability, because you can just sift through much more information. An example is documentation repair. If I have the firehose of Slack messages and new Notion pages, if somebody says, I own that page, I want to be updated when there is a piece of information that should update that page, this is now possible. You get an email saying, oh, look at that Slack message. It says the opposite of what you have in that paragraph. Maybe you want to update, or just ping that person.
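A minimal sketch of that "documentation repair" loop, assuming a hypothetical contradicts() check that would be an LLM call in practice:

```python
# Toy documentation-repair watcher: scan the message firehose, and when a
# message contradicts a page someone owns, ping the owner. The contradiction
# check is a stub standing in for an LLM judgment.

from dataclasses import dataclass

@dataclass
class Page:
    title: str
    owner: str
    content: str

def contradicts(message: str, page: Page) -> bool:
    # Stand-in for an LLM call: "does this message say the opposite of
    # what the page says?"
    return "deprecated" in message and page.title.lower() in message.lower()

def watch(messages: list[str], pages: list[Page]) -> list[str]:
    alerts = []
    for msg in messages:
        for page in pages:
            if contradicts(msg, page):
                alerts.append(f"@{page.owner}: '{msg}' may invalidate '{page.title}'")
    return alerts

pages = [Page("Billing API", "stan", "POST /v1/charge is the supported endpoint.")]
messages = ["heads up, billing api POST /v1/charge is deprecated as of today"]
print(watch(messages, pages))
```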
I think there is a lot to be explored on the product layer in terms of what it means to interact productively with those models. And that's a problem that's extremely hard and extremely exciting.Swyx [00:48:00]: One thing you keep mentioning about infra work: obviously, Dust is building that infra and serving that in a very consumer-friendly way. You always talk about infra being additional sources, additional connectors. That is very important. But I'm also interested in the vertical infra. There is an orchestrator underlying all these things where you're doing asynchronous work. For example, the simplest one is a cron job. You just schedule things. But also, for if-this-then-that, you have to wait for something to be executed and proceed to the next task. I used to work at an orchestrator as well, Temporal.Stan [00:48:31]: We use Temporal.Swyx [00:48:34]: Oh, you use Temporal? How was the experience? I need the NPS.Stan [00:48:36]: We're doing a customer discovery call now.Swyx [00:48:39]: But you can also complain to me, because I don't work there anymore.Stan [00:48:42]: No, we love Temporal. There are some edges that are a bit rough, surprisingly rough. And you would say, why is it so complicated?Swyx [00:48:49]: It's always versioning.Stan [00:48:50]: Yeah, stuff like that. But we really love it. And we use it for exactly what you said, like managing the entire set of stuff that needs to happen so that, in semi-real time, we get all the updates from Slack or Notion or GitHub into the system. And whenever we see a piece of information go through, we maybe trigger workflows to run agents, because they need to provide alerts to users and stuff like that. And Temporal is great. Love it.
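For illustration, a rough sketch of that sync-and-trigger pattern, written against Temporal's Python SDK (Dust is a TypeScript shop, and the activity names plus the 30-second poll interval here are invented):

```python
# Hypothetical Temporal workflow: durably poll a connector (Slack here) and
# trigger agent runs on new data. Sketch only, not Dust's implementation.

import asyncio
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def fetch_slack_updates(channel: str) -> list[str]:
    return []  # call the Slack API here and return new messages

@activity.defn
async def run_alert_agents(updates: list[str]) -> None:
    pass  # run the agents that alert users about relevant updates

@workflow.defn
class SlackSyncWorkflow:
    @workflow.run
    async def run(self, channel: str) -> None:
        # Temporal gives retries and durability; a production version would
        # also continue-as-new instead of looping forever.
        while True:
            updates = await workflow.execute_activity(
                fetch_slack_updates, channel,
                start_to_close_timeout=timedelta(minutes=5),
            )
            if updates:
                await workflow.execute_activity(
                    run_alert_agents, updates,
                    start_to_close_timeout=timedelta(minutes=5),
                )
            await asyncio.sleep(30)  # durable timer inside a workflow
```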
Swyx [00:49:17]: You haven't evaluated others. You don't want to build your own. You're happy with...Stan [00:49:21]: Oh, no, we're not in the business of replacing Temporal. And Temporal is so... I mean, it, or any other competitive product, they're very general. If it's there, there's an interesting theory about buy versus build. I think in that case, when you're a high-growth company, your buy-build trade-off is very much on the side of buy. Because if you have the capability, you're just going to be saving time, you can focus on your core competency, etc. And it's funny because we're starting to see the post-high-growth companies, post-SKF companies, going back on that trade-off, interestingly. So that's the Klarna news about removing Zendesk and Salesforce. Do you believe that, by the way?Alessio [00:49:56]: Yeah, I did a podcast with them.Stan [00:49:58]: Oh, yeah?Alessio [00:49:58]: It's true.Swyx [00:49:59]: No, no, I know.Stan [00:50:00]: Of course they say it's true,Swyx [00:50:00]: but also how well is it going to go?Stan [00:50:02]: So I'm not talking about deflecting the customer traffic. I'm talking about building AI on top of Salesforce and Zendesk, basically, if I understand correctly. And all of a sudden, your product surface becomes much smaller because you're interacting with an AI system that will take some actions. And so all of a sudden, you don't need the product layer anymore. And you realize that, oh, those things are just databases that I pay a hundred times the price for, right? Because you're a post-SKF company and you have tech capabilities, you are incentivized to reduce your costs, and you have the capability to do so. And then it makes sense to just scratch the SaaS away. So it's interesting that we might see kind of a bad time for SaaS in post-hyper-growth tech companies. So it's still a big market, but it's not that big, because if you're not a tech company, you don't have the capabilities to reduce that cost. If you're a high-growth company, you're always going to be buying, because you go faster with that. But that's an interesting new space, a new category of companies that might remove some SaaS.Swyx [00:51:02]: Yeah, Alessio's firm has an interesting thesis on the future of SaaS in AI.Alessio [00:51:05]: Service as a software, we call it. It's basically like, well, the most extreme is like, why is there any software at all? You know, ideally, it's all a labor interface where you're asking somebody to do something for you, whether that's a person, an AI agent, or whatnot.Stan [00:51:17]: Yeah, yeah, that's interesting.Swyx [00:51:19]: I have to ask. Are you paying for Temporal Cloud, or are you self-hosting?Stan [00:51:22]: Oh, no, no, we're paying, we're paying.Swyx [00:51:24]: Oh, okay, interesting.Stan [00:51:26]: We're paying way too much. It's crazy expensive, but it makes us-Swyx [00:51:28]: That's why as a shareholder, I like to hear that.Stan [00:51:31]: It makes us go faster, so we're happy to pay.Swyx [00:51:33]: Other things in the infra stack, I just want a list for other founders to think about. Ops, API gateway, evals, you know, anything interesting there that you build or buy?Stan [00:51:41]: I mean, there's always an interesting question. We've been building a lot around the interface between models, because Dust, the original version, was an orchestration platform, and we basically provide a unified interface to all the model providers.Swyx [00:51:56]: That's what I call gateway.Stan [00:51:57]: We had that because Dust was that, and so we continued building upon it, and we own it. But that's an interesting question: do you want to build that or buy it?Swyx [00:52:06]: Yeah, I always say LiteLLM is the current open source consensus.Stan [00:52:09]: Exactly, yeah. There's an interesting question there.Swyx [00:52:12]: Ops, Datadog, just tracking.Stan [00:52:14]: Oh yeah, so Datadog is an obvious one... What are the mistakes that I regret? I started as pure JavaScript, not TypeScript, and I think if you're wondering, oh, I want to go fast, I'll do a little bit of JavaScript: no, don't, just start with TypeScript. I see, okay.Swyx [00:52:30]: So interesting, you are a research engineer that came out of OpenAI that bet on TypeScript.Stan [00:52:36]: Well, the reality is that if you're building a product, you're going to be doing a lot of JavaScript, right? And Next, we're using Next as an example. It's
It's the Red Team vs the Blue Team in the US's greatest spectator sport: The 2024 Presidential Election! Tom and Chuck spent the first part of the evening over on the Two Cops One Donut podcast and then asked the hosts of that show, Erik and Banning, to join us on Locker Room LIVE for the election results and to talk about what they do on their show. Remember to like, subscribe, and leave a review to help us grow the podcast. Go to www.warstoriesofficial.com to listen to older episodes or to support us by buying our merch. You can also support us at https://patron.podbean.com/warstories... and follow us on Instagram @war_stories_official and Facebook at www.facebook.com/WarStoriesOfficialPodcast
Here are the 3 Big things you need to know this hour— Number One— The markets are fired up for Donald Trump to be President—now he has to make good on the promises he made to America—and that work is already underway I can tell you that— Number Two— For the first time—the voter turnout model had more independents than Democrats—maybe that's why so many of the pollsters really blew it this time around—or maybe they are just dishonest— Number Three— To be honest—I cannot get enough of the liberal meltdowns following the collapse of Kamala Harris and the Democrats all over America—that appear to have made way for a trifecta in Washington for the Red Team—
Alethe Denis, Senior Security Consultant at Bishop Fox, is a red team hacker, physical pen tester, and social engineer. In this episode, she joins host Heather Engel to discuss her work, including how she prepares for social engineering engagements, common vulnerabilities encountered, and how the cybersecurity threat landscape continues to evolve. For more information about Alethe, visit https://linktr.ee/alethedenis. • For more on cybersecurity, visit https://cybersecurityventures.com/
Apologies for lower audio quality; we lost recordings and had to use backup tracks. Our guests today are Anastasios Angelopoulos and Wei-Lin Chiang, leads of Chatbot Arena, fka LMSYS, the crowdsourced AI evaluation platform developed by the LMSys student club at Berkeley, which became the de facto standard for comparing language models. To many folks, Arena Elo is more often cited than MMLU scores, and they have attracted >1,000,000 people to cast votes since its launch, leading top model trainers to cite them over their own formal academic benchmarks:
The Limits of Static Benchmarks
We've done two benchmarks episodes: Benchmarks 101 and Benchmarks 201. One issue we've always brought up with static benchmarks is that 1) many are getting saturated, with models scoring almost perfectly on them, and 2) they often don't reflect production use cases, making it hard for developers and users to use them as guidance. The fundamental challenge in AI evaluation isn't technical - it's philosophical. How do you measure something that increasingly resembles human intelligence? Rather than trying to define intelligence upfront, Arena lets users interact naturally with models and collects comparative feedback. It's messy and subjective, but that's precisely the point - it captures the full spectrum of what people actually care about when using AI.
The Pareto Frontier of Cost vs Intelligence
Because the Elo scores are remarkably stable over time, we can put all the chat models on a map against their respective cost to gain a view of at least 3 orders of magnitude of model sizes/costs and observe the remarkable shift in intelligence per dollar over the past year:
This frontier stood remarkably firm through the recent releases of o1-preview and price cuts of Gemini 1.5:
The Statistics of Subjectivity
In our Benchmarks 201 episode, Clémentine Fourrier from HuggingFace thought this design choice was one of the shortcomings of arenas: they aren't reproducible. You don't know who ranked what and what exactly the outcome was at the time of ranking. That same person might rank the same pair of outputs differently on a different day, or might ask harder questions of better models compared to smaller ones, making it imbalanced. Another argument that people have brought up is confirmation bias. We know humans prefer longer responses and are swayed by formatting; Rob Mulla from Dreadnode had found some interesting data on this in May:
The approach LMArena is taking is to use logistic regression to decompose human preferences into constituent factors. As Anastasios explains: "We can say what components of style contribute to human preference and how they contribute." By adding these style components as parameters, they can mathematically "suck out" their influence and isolate the core model capabilities. This extends beyond just style - they can control for any measurable factor: "What if I want to look at the cost adjusted performance? Parameter count? We can ex post facto measure that." This is one of the most interesting things about Arena: you have a data generation engine which you can clean and turn into leaderboards later. If you wanted to create a leaderboard for poetry writing, you could get existing data from Arena and normalize it by identifying these style components. Whether or not it's possible to really understand WHAT bias the voters have, that's a different question.
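A compact illustration of that decomposition: fit a Bradley-Terry-style logistic regression where each battle contributes model-indicator differences plus a style feature (here, response-length difference), so the style coefficient absorbs the formatting effect and the model coefficients come closer to core capability. Toy data and code, not LMArena's actual pipeline:

```python
# Toy Bradley-Terry fit with style control. Each battle becomes one feature
# row: +1 for the left model, -1 for the right model, plus a normalized
# length-difference feature; the label is whether the left side won.

import numpy as np
from sklearn.linear_model import LogisticRegression

models = ["model-a", "model-b", "model-c"]
idx = {m: i for i, m in enumerate(models)}

# (left model, right model, left response length, right length, left won?)
battles = [
    ("model-a", "model-b", 900, 300, 1),
    ("model-b", "model-c", 400, 380, 1),
    ("model-a", "model-c", 200, 800, 0),
    ("model-c", "model-b", 500, 100, 1),
]

X, y = [], []
for left, right, len_l, len_r, left_won in battles:
    row = np.zeros(len(models) + 1)
    row[idx[left]] += 1.0                # left model indicator
    row[idx[right]] -= 1.0               # right model indicator
    row[-1] = (len_l - len_r) / 1000.0   # style feature: length difference
    X.append(row)
    y.append(left_won)

clf = LogisticRegression(fit_intercept=False).fit(np.array(X), np.array(y))
strengths = clf.coef_[0][: len(models)]  # style-adjusted model strengths
print(dict(zip(models, strengths.round(2))))
print("length effect:", round(clf.coef_[0][-1], 2))
```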
Private Evals
One of the most delicate challenges LMSYS faces is maintaining trust while collaborating with AI labs. The concern is that labs could game the system by testing multiple variants privately and only releasing the best performer. This was brought up when 4o-mini was released and it ranked as the second best model on the leaderboard:
But this fear misunderstands how Arena works. Unlike static benchmarks where selection bias is a major issue, Arena's live nature means any initial bias gets washed out by ongoing evaluation. As Anastasios explains: "In the long run, there's way more fresh data than there is data that was used to compare these five models." The other big question is WHAT model is actually being tested; as people often talk about on X / Discord, the same endpoint will randomly feel "nerfed", like it happened for "Claude European summer", with corresponding conspiracy theories:
It's hard to keep track of these performance changes in Arena as these changes (if real…?) are not observable.
The Future of Evaluation
The team's latest work on RouteLLM points to an interesting future where evaluation becomes more granular and task-specific. But they maintain that even simple routing strategies can be powerful - like directing complex queries to larger models while handling simple tasks with smaller ones. Arena is now going to expand beyond text into multimodal evaluation and specialized domains like code execution and red teaming. But their core insight remains: the best way to evaluate intelligence isn't to simplify it into metrics, but to embrace its complexity and find rigorous ways to analyze it. To go after this vision, they are spinning out Arena from LMSys, which will stay as an academia-driven group at Berkeley.
Full Video Podcast
Chapters
* 00:00:00 - Introductions
* 00:01:16 - Origin and development of Chatbot Arena
* 00:05:41 - Static benchmarks vs. Arenas
* 00:09:03 - Community building
* 00:13:32 - Biases in human preference evaluation
* 00:18:27 - Style Control and Model Categories
* 00:26:06 - Impact of o1
* 00:29:15 - Collaborating with AI labs
* 00:34:51 - RouteLLM and router models
* 00:38:09 - Future of LMSys / Arena
Show Notes
* Anastasios Angelopoulos
* Anastasios' NeurIPS Paper Conformal Risk Control
* Wei-Lin Chiang
* Chatbot Arena
* LMSys
* MTBench
* ShareGPT dataset
* Stanford's Alpaca project
* LLMRouter
* E2B
* Dreadnode
Transcript
Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, Partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.Swyx [00:00:14]: Hey, and today we're very happy and excited to welcome Anastasios and Wei-Lin from LMSys. Welcome guys.Wei Lin [00:00:21]: Hey, how's it going? Nice to see you.Anastasios [00:00:23]: Thanks for having us.Swyx [00:00:24]: Anastasios, I actually saw you, I think at last year's NeurIPS. You were presenting a paper, which I don't really super understand, but it was some theory paper about how your method was very dominating over other sorts of search methods. I don't remember what it was, but I remember that you were a very confident speaker.Anastasios [00:00:40]: Oh, I totally remember you. Didn't ever connect that, but yes, that's definitely true. Yeah. Nice to see you again.Swyx [00:00:46]: Yeah. I was frantically looking for the name of your paper and I couldn't find it. Basically I had to cut it because I didn't understand it.Anastasios [00:00:51]: Is this conformal PID control or was this the online control?Wei Lin [00:00:55]: Blast from the past, man.Swyx [00:00:57]: Blast from the past.
It's always interesting how NeurIPS and all these academic conferences are sort of six months behind what people are actually doing, but conformal risk control, I would recommend people check it out. I have the recording. I just never published it, just because I was like, I don't understand this enough to explain it.Anastasios [00:01:14]: People won't be interested.Wei Lin [00:01:15]: It's all good.Swyx [00:01:16]: But Elo scores, Elo scores are very easy to understand. You guys are responsible for the biggest revolution in language model benchmarking in the last few years. Maybe you guys want to introduce yourselves and maybe tell a little bit of the brief history of LMSysWei Lin [00:01:32]: Hey, I'm Wei Lin. I'm a fifth-year PhD student at UC Berkeley, working on Chatbot Arena these days, doing crowdsourced AI benchmarking.Anastasios [00:01:43]: I'm Anastasios. I'm a sixth-year PhD student here at Berkeley. I did most of my PhD on, like, theoretical statistics and sort of foundations of model evaluation and testing. And now I'm working 150% on this Chatbot Arena stuff. It's great.Alessio [00:02:00]: And what was the origin of it? How did you come up with the idea? How did you get people to buy in? And then maybe what were one or two of the pivotal moments early on that kind of made it the standard for these things?Wei Lin [00:02:12]: Yeah, yeah. The Chatbot Arena project was started last year in April, May, around then. Before that, we were basically experimenting in the lab with how to fine-tune an open-source chatbot based on the Llama 1 model that Meta released. At that time, Llama 1 was like a base model, and people didn't really know how to fine-tune it. So we were doing some explorations. We were inspired by Stanford's Alpaca project. So we basically, yeah, grew a data set from the internet, which is called the ShareGPT data set, which is like a dialogue data set of conversations between users and ChatGPT. It turns out to be pretty high quality dialogue data. So we fine-tuned on it, and then we trained it and released the model called Vicuna. And people were very excited about it, because it kind of demonstrated that an open-weight model can reach conversation capability similar to ChatGPT. And then we basically released the model weights and also built a demo website for the model. People were very excited about it. But during the development, the biggest challenge for us at the time was, how do we even evaluate it? How do we even argue that this model we trained is better than others? And then, what's the gap between this open-source model and the proprietary offerings? At that time, GPT-4 was just announced, and there was Claude 1. What's the difference between them? And then after that, like every week, there's a new model being fine-tuned and released. So even until now, right? And then we had that demo website for Vicuna. And then we thought, okay, maybe we can add a few more models as well, like API models as well. And then we quickly realized that people need a tool to compare between different models. So we had a side-by-side UI implemented on the website so that people could, you know, compare. And we quickly realized that maybe we can do something like a battle on top of these LLMs, like just anonymize the identity, and people vote on which one is better. So the community decides which one is better, not us, not us arguing, you know, our model is better or what. And that turned out to be, like, people were very excited about this idea.
And then we tweet, we launch, and that's, yeah, that's April, May. And then in like the first two, three weeks, there were just a few hundred thousand views on our launch tweet. And then we had regular weekly updates at the beginning, adding new models, GPT-4 as well. So it was like, that was the, you know, the initial.Anastasios [00:04:58]: Another pivotal moment, just to jump in, would be private models, like the GPT, I'm a little,Wei Lin [00:05:04]: I'm a little chatty. That was this year. That was this year.Anastasios [00:05:07]: Huge.Wei Lin [00:05:08]: That was also huge.Alessio [00:05:09]: In the beginning, I saw the initial release was May 3rd of the beta board. On April 6, we did a benchmarks 101 episode for a podcast, just kind of talking about, you know, how so much of the data is like in the pre-training corpus and blah, blah, blah. And like the benchmarks are really not what we need to evaluate whether or not a model is good. Why did you not make a benchmark? Maybe at the time, you know, it was just like, Hey, let's just put together a whole bunch of data again, run a, make a score that seems much easier than coming out with a whole website where like users need to vote. Any thoughts behind that?Wei Lin [00:05:41]: I think it's more like fundamentally, we don't know how to automate this kind of benchmarks when it's more like, you know, conversational, multi-turn, and more open-ended tasks that may not come with a ground truth. So let's say if you ask a model to help you write an email for you for whatever purpose, there's no ground truth. How do you score them? Or write a story or a creative story or many other things like how we use ChatGPT these days. It's more open-ended. You know, we need human in the loop to give us feedback, which one is better. And I think the nuance here is like, sometimes it's also hard for a human to give the absolute rating. So that's why we have this kind of pairwise comparison, easier for people to choose which one is better. So from that, we use these pairwise comparisons to calculate the leaderboard. Yeah. You can add more about this methodology.Anastasios [00:06:40]: Yeah. I think the point is that, and you guys probably also talked about this at some point, but static benchmarks are intrinsically, to some extent, unable to measure generative model performance. And the reason is because you cannot pre-annotate all the outputs of a generative model. You change the model, it's like the distribution of your data is changing. You need new labels to deal with that, or great automated labeling, right? Which is why people are pursuing both. And yeah, static benchmarks, they allow you to zoom in to particular types of information like factuality, historical facts. We can build the best benchmark of historical facts, and we will then know that the model is great at historical facts. But ultimately, that's not the only axis, right? And we can build 50 of them, and we can evaluate 50 axes. But it's just so, the problem of generative model evaluation is just so expansive, and it's so subjective, that it's maybe not intrinsically impossible, but at least we don't see a way. We didn't see a way of encoding that into a fixed benchmark.Wei Lin [00:07:47]: But on the other hand, I think there's a challenge where this kind of online dynamic benchmark is more expensive than a static benchmark, an offline benchmark, which people still need.
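To make the pairwise-comparison methodology concrete, here is a minimal, illustrative sketch of how win/loss votes can be turned into Bradley-Terry scores via logistic regression, which is the general approach behind Arena-style leaderboards. The vote list and model names below are invented placeholders, not Arena data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy pairwise votes: (model_a, model_b, winner), winner in {"a", "b"}.
# These are invented placeholders, not real Arena data.
votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-z", "a"),
    ("model-y", "model-z", "a"),
    ("model-y", "model-x", "b"),
    ("model-z", "model-x", "b"),
    ("model-z", "model-y", "a"),
]

models = sorted({m for a, b, _ in votes for m in (a, b)})
idx = {m: i for i, m in enumerate(models)}

# Each battle becomes one row: +1 for model A, -1 for model B, and the
# target is whether model A won. Bradley-Terry fitting is then just a
# logistic regression on these indicator differences.
X = np.zeros((len(votes), len(models)))
y = np.zeros(len(votes))
for r, (a, b, w) in enumerate(votes):
    X[r, idx[a]], X[r, idx[b]] = 1.0, -1.0
    y[r] = 1.0 if w == "a" else 0.0

bt = LogisticRegression(fit_intercept=False)  # default L2 penalty keeps coefficients finite
bt.fit(X, y)

# Rescale the coefficients onto an Elo-like scale purely for readability.
scale = 400 / np.log(10)
for name in sorted(models, key=lambda m: -bt.coef_[0][idx[m]]):
    print(f"{name:10s} {1000 + scale * bt.coef_[0][idx[name]]:8.1f}")
```

Because the fit is re-run as fresh votes arrive, any one model's rating keeps updating, which is the "live benchmark" property discussed throughout this conversation.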
Like when they build models, they need static benchmark to track where they are.Anastasios [00:08:03]: It's not like our benchmark is uniformly better than all other benchmarks, right? It just measures a different kind of performance that has proved to be useful.Swyx [00:08:14]: You guys also published MTBench as well, which is a static version, let's say, of Chatbot Arena, right? That people can actually use in their development of models.Wei Lin [00:08:25]: Right. I think one of the reasons we still do this static benchmark, we still wanted to explore, experiment whether we can automate this, because people, eventually, model developers need it to fast iterate their model. So that's why we explored LLM-as-a-judge, and Arena-Hard, trying to filter, select high-quality data we collected from Chatbot Arena, the high-quality subset, and use that as the questions and then automate the judge pipeline, so that people can quickly get high-quality signal, benchmark signals, using this online benchmark.Swyx [00:09:03]: As a community builder, I'm curious about just the initial early days. Obviously when you offer effectively free A-B testing inference for people, people will come and use your arena. What do you think were the key unlocks for you? Was it funding for this arena? Was it marketing? When people came in, do you see a noticeable skew in the data? Which obviously now you have enough data sets, you can separate things out, like coding and hard prompts, but in the early days, it was just all sorts of things.Anastasios [00:09:31]: Yeah, maybe one thing to establish at first is that our philosophy has always been to maximize organic use. I think that really does speak to your point, which is, yeah, why do people come? They came to use free LLM inference, right? And also, a lot of users just come to the website to use direct chat, because you can chat with the model for free. And then you could think about it like, hey, let's just be kind of like more on the selfish or conservative or protectionist side and say, no, we're only giving credits for people that battle or so on and so forth. That strategy wouldn't work, right? Because what we're trying to build is like a big funnel, a big funnel that can direct people. And some people are passionate and interested and they battle. And yes, the distribution of the people that do that is different. It's like, as you're pointing out, it's like, that's because they're enthusiastic.Wei Lin [00:10:24]: They're early adopters of this technology.Anastasios [00:10:27]: Or they like games, you know, people like this. And we've run a couple of surveys that indicate this as well, of our user base.Wei Lin [00:10:36]: We do see a lot of developers come to the site asking polling questions, 20-30%. Yeah, 20-30%.Anastasios [00:10:42]: It's obviously not reflective of the general population, but it's reflective of some corner of the world of people that really care. And to some extent, maybe that's all right, because those are like the power users. And you know, we're not trying to claim that we represent the world, right? We represent the people that come and vote.Swyx [00:11:02]: Did you have to do anything marketing-wise? Was anything effective? Did you struggle at all? Was it success from day one?Wei Lin [00:11:09]: At some point, almost done. Okay. Because as you can imagine, this leaderboard depends on community engagement participation.
If no one comes to vote tomorrow, then no leaderboard.Anastasios [00:11:23]: So we had some period of time when the number of users was just, after the initial launch, it went lower. Yeah. And, you know, at some point, it did not look promising. Actually, I joined the project a couple months in to do the statistical aspects, right? As you can imagine, that's how it kind of hooked into my previous work. At that time, it wasn't like, you know, it definitely wasn't clear that this was like going to be the eval or something. It was just like, oh, this is a cool project. Like Wei-Lin seems awesome, you know, and that's it.Wei Lin [00:11:56]: Definitely. There's, in the beginning, because people don't know us, people don't know what this is for. So we had a hard time. But I think we were lucky enough that we have some initial momentum. And as well as the competition between model providers just becoming, you know, became very intense. Intense. And then that makes the eval onto us, right? Because always number one is number one.Anastasios [00:12:23]: There's also an element of trust. Our main priority in everything we do is trust. We want to make sure we're doing everything like all the I's are dotted and the T's are crossed and nobody gets unfair treatment and people can see from our profiles and from our previous work and from whatever, you know, we're trustworthy people. We're not like trying to make a buck and we're not trying to become famous off of this or that. It's just, we're trying to provide a great public leaderboard community venture project.Wei Lin [00:12:51]: Yeah.Swyx [00:12:52]: Yes. I mean, you are kind of famous now, you know, that's fine. Just to dive in more into biases and, you know, some of this is like statistical control. The classic one for human preference evaluation is humans demonstrably prefer longer contexts or longer outputs, which is actually something that we don't necessarily want. You guys, I think maybe two months ago put out some length control studies. Apart from that, there are just other documented biases. Like, I'd just be interested in your review of what you've learned about biases and maybe a little bit about how you've controlled for them.Anastasios [00:13:32]: At a very high level, yeah. Humans are biased. Totally agree. Like in various ways. It's not clear whether that's good or bad, you know, we try not to make value judgments about these things. We just try to describe them as they are. And our approach is always as follows. We collect organic data and then we take that data and we mine it to get whatever insights we can get. And, you know, we have many millions of data points that we can now use to extract insights from. Now, one of those insights is to ask the question, what is the effect of style, right? You have a bunch of data, you have votes, people are voting either which way. We have all the conversations. We can say what components of style contribute to human preference and how do they contribute? Now, that's an important question. Why is that an important question? It's important because some people want to see which model would be better if the lengths of the responses were the same, were to be the same, right? People want to see the causal effect of the model's identity controlled for length or controlled for markdown, number of headers, bulleted lists, is the text bold? Some people don't, they just don't care about that.
The idea is not to impose the judgment that this is not important, but rather to say ex post facto, can we analyze our data in a way that decouples all the different factors that go into human preference? Now, the way we do this is via statistical regression. That is to say the arena score that we show on our leaderboard is a particular type of linear model, right? It's a linear model that takes, it's a logistic regression that takes model identities and fits them against human preference, right? So it regresses human preference against model identity. What you get at the end of that logistic regression is a parameter vector of coefficients. And when the coefficient is large, it tells you that GPT-4o or whatever, very large coefficient, that means it's strong. And that's exactly what we report in the table. It's just the predictive effect of the model identity on the vote. The other thing that you can do is you can take that vector, let's say we have M models, that is an M dimensional vector of coefficients. What you can do is you say, hey, I also want to understand what the effect of length is. So I'll add another entry to that vector, which is trying to predict the vote, right? That tells me the difference in length between two model responses. So we have that for all of our data. We can compute it ex post facto. We added it into the regression and we look at that predictive effect. And then the idea, and this is formally true under certain conditions, not always verifiable ones, but the idea is that adding that extra coefficient to this vector will kind of suck out the predictive power of length and put it into that M plus first coefficient and quote, unquote, de-bias the rest so that the effect of length is not included. And that's what we do in style control. Now we don't just do it for M plus one. We have, you know, five, six different style components that have to do with markdown headers and bulleted lists and so on that we add here. Now, where is this going? You guys see the idea. It's a general methodology. If you have something that's sort of like a nuisance parameter, something that exists and provides predictive value, but you really don't want to estimate that. You want to remove its effect. In causal inference, these things are called like confounders often. What you can do is you can model the effect. You can put them into your model and try to adjust for them. So another one of those things might be cost. You know, what if I want to look at the cost adjusted performance of my model, which models are punching above their weight, parameter count, which models are punching above their weight in terms of parameter count, we can ex post facto measure that. We can do it without introducing anything that compromises the organic nature of the Wei Lin [00:17:17]: data that we collect.Anastasios [00:17:18]: Hopefully that answers the question.Wei Lin [00:17:20]: It does.Swyx [00:17:21]: So I guess with a background in econometrics, this is super familiar.Anastasios [00:17:25]: You're probably better at this than me for sure.Swyx [00:17:27]: Well, I mean, so I used to be, you know, a quantitative trader and so, you know, controlling for multiple effects on stock price is effectively the job. So it's interesting. Obviously the problem is proving causation, which is hard, but you don't have to do that.Anastasios [00:17:45]: Yes. Yes, that's right. And causal inference is a hard problem and it goes beyond statistics, right?
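Here is a minimal sketch of the style-control regression Anastasios describes, run on synthetic data: two models are equally strong, but one writes longer answers and voters reward length; appending the length difference as an extra nuisance coefficient pulls that effect out of the model-identity coefficients. All numbers below are simulated for illustration, not Arena's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, m = 20000, 3                        # number of votes, number of models
strength = np.array([0.0, 0.6, 0.6])   # models 1 and 2 are equally strong...
verbosity = np.array([0.0, 0.0, 1.0])  # ...but model 2 writes longer answers

a = rng.integers(0, m, n)
b = (a + rng.integers(1, m, n)) % m    # opponent, guaranteed different from a
len_diff = verbosity[a] - verbosity[b] + rng.normal(0, 0.5, n)
logit = strength[a] - strength[b] + 0.8 * len_diff   # voters also reward length
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

def fit_bt(with_style):
    X = np.zeros((n, m + (1 if with_style else 0)))
    X[np.arange(n), a] += 1.0
    X[np.arange(n), b] -= 1.0
    if with_style:
        X[:, -1] = len_diff            # the "M plus first", nuisance coefficient
    coef = LogisticRegression(fit_intercept=False).fit(X, y).coef_[0][:m]
    return np.round(coef - coef.mean(), 2)  # center the scores for comparability

print("raw Bradley-Terry:", fit_bt(False))  # model 2 looks stronger than model 1
print("style-controlled: ", fit_bt(True))   # models 1 and 2 now look comparable
```

The same trick generalizes to any nuisance feature that can be computed per battle, such as markdown density or, as suggested in the conversation, cost or parameter count.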
It's like you have to build the right causal model and so on and so forth. But we think that this is a good first step and we're sort of looking forward to learning from more people. You know, there's some good people at Berkeley that work on causal inference, and we look forward to learning from them on, like, what are the really most contemporary techniques that we can use in order to estimate true causal effects, if possible.Swyx [00:18:10]: Maybe we could take a step through the other categories. So style control is a category. It is not a default. I have thought that when you wrote that blog post, actually, I thought it would be the new default because it seems like the most obvious thing to control for. But you also have other categories, you have coding, you have hard prompts. We consider that.Anastasios [00:18:27]: We're still actively considering it. It's just, you know, once you make that step, once you take that step, you're introducing your opinion and I'm not, you know, why should our opinion be the one? That's kind of a community choice. We could put it to a vote.Wei Lin [00:18:39]: We could pass.Anastasios [00:18:40]: Yeah, maybe do a poll. Maybe do a poll.Swyx [00:18:42]: I don't know. No opinion is an opinion.Wei Lin [00:18:44]: You know what I mean?Swyx [00:18:45]: Yeah.Wei Lin [00:18:46]: There's no neutral choice here.Swyx [00:18:47]: Yeah. You have all these others. You have instruction following too. What are your favorite categories that you like to talk about? Maybe you tell a little bit of the stories, tell a little bit of like the hard choices that you had to make.Wei Lin [00:18:57]: Yeah. Yeah. Yeah. I think the, uh, initially the reason why we want to add these new categories is essentially to answer some of the questions from our community, which is we won't have a single leaderboard for everything. So these models behave very differently in different domains. Let's say this model is trained for coding, this model trained for more technical questions and so on. On the other hand, to answer people's question about like, okay, what if all these low quality, you know, because we crowdsource data from the internet, there will be noise. So how do we de-noise? How do we filter out these low quality data effectively? So that was like, you know, some questions we want to answer. So basically we spent a few months, like really diving into these questions to understand how do we filter all these data because these are like millions of data points. And then if you want to re-label yourself, it's possible, but we need to kind of like automate this kind of data classification pipeline for us to effectively categorize them to different categories, say coding, math, structure, and also harder problems. So that was like, the hope is when we slice the data into these meaningful categories to give people more like better signals, more direct signals, and that's also to clarify what we are actually measuring for, because I think that's the core part of the benchmark. That was the initial motivation. Does that make sense?Anastasios [00:20:27]: Yeah. Also, I'll just say, this does like get back to the point that the philosophy is to like mine organic, to take organic data and then mine it ex post facto.Alessio [00:20:35]: Is the data cage-free too, or just organic?Anastasios [00:20:39]: It's cage-free.Wei Lin [00:20:40]: No GMO. Yeah. And all of these efforts are like open source, like we open source all of the data cleaning pipeline, filtering pipeline.
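One plausible shape for the automated classification pass Wei Lin describes is an LLM-as-a-judge labeling step. The sketch below is only an assumption about the general approach, not LMSys's actual pipeline; the category list, prompt wording, and model name are placeholders, and it assumes an OpenAI API key is configured in the environment:

```python
# Sketch of an automated prompt-classification pass. The category list,
# prompt wording, and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()
CATEGORIES = ["coding", "math", "creative writing", "hard/technical", "other"]

def classify(user_prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Label the user's prompt with exactly one category from: "
                        + ", ".join(CATEGORIES) + ". Reply with the category only."},
            {"role": "user", "content": user_prompt},
        ],
    )
    label = resp.choices[0].message.content.strip().lower()
    return label if label in CATEGORIES else "other"

print(classify("Write a Python function that merges two sorted lists."))  # expect: coding
```

Run over the full vote log, a pass like this yields the per-category slices (coding, math, hard prompts) that the leaderboard exposes.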
Yeah.Swyx [00:20:50]: I love the notebooks you guys publish. Actually really good just for learning statistics.Wei Lin [00:20:54]: Yeah. I'll share these insights with everyone.Alessio [00:20:59]: I agree on the initial premise of, Hey, writing an email, writing a story, there's like no ground truth. But I think as you move into like coding and like red teaming, some of these things, there's like kind of like skill levels. So I'm curious how you think about the distribution of skill of the users. Like maybe the top 1% of red teamers is just not participating in the arena. So how do you guys think about adjusting for it? And like feels like this where there's kind of like big differences between the average and the top. Yeah.Anastasios [00:21:29]: Red teaming, of course, red teaming is quite challenging. So, okay. Moving back. There's definitely like some tasks that are not as subjective that like pairwise human preference feedback is not the only signal that you would want to measure. And to some extent, maybe it's useful, but it may be more useful if you give people better tools. For example, it'd be great if we could execute code within Arena, that would be fantastic.Wei Lin [00:21:52]: We want to do it.Anastasios [00:21:53]: There's also this idea of constructing a user leaderboard. What does that mean? That means some users are better than others. And how do we measure that? How do we quantify that? Hard in Chatbot Arena, but where it is easier is in red teaming, because in red teaming, there's an explicit game. You're trying to break the model, you either win or you lose. So what you can do is you can say, Hey, what's really happening here is that the models and humans are playing a game against one another. And then you can use the same sort of Bradley Terry methodology with some extensions that we came up with; you can read one of our recent blog posts for the sort of theoretical extensions. You can attribute like strength back to individual players and jointly attribute strength to like the models that are in this jailbreaking game, along with the target tasks, like what types of jailbreaks you want.Wei Lin [00:22:44]: So yeah.Anastasios [00:22:45]: And I think that this is, this is a hugely important and interesting avenue that we want to continue researching. We have some initial ideas, but you know, all thoughts are welcome.Wei Lin [00:22:54]: Yeah.Alessio [00:22:55]: So first of all, on the code execution, the E2B guys, I'm sure they'll be happy to help Wei Lin [00:22:59]: you.Alessio [00:23:00]: I'll set that up. They're big fans. We're investors in a company called Dreadnode, which does a lot in AI red teaming. I think to me, the most interesting thing has been, how do you do this? Like the model jailbreak is one side. We also had Nicholas Carlini from DeepMind on the podcast, and he was talking about, for example, like, you know, context stealing and like weights stealing. So there's kind of like a lot more that goes around it. I'm curious just how you think about the model and then maybe like the broader system, even with Red Team Arena, you're just focused on like jailbreaking of the model, right?
You're not doing kind of like any testing on the more system level thing of the model where like, maybe you can get the training data back, you're going to exfiltrate some of the layers and the weights and things like that.Wei Lin [00:23:43]: So right now, as you can see, the Red Team Arena is at a very early stage and we are still exploring what could be the potential new games we can introduce to the platform. So the idea is still the same, right? And we build a community driven project platform for people. They can have fun with this website, for sure. That's one thing, and then help everyone to test these models. So one of the aspects you mentioned is stealing secrets, stealing training sets. That could be one, you know, it could be designed as a game. Say, can you steal the credential, you know, we hide, maybe we can hide the credential in the system prompts and so on. So there are like a few potential ideas we want to explore for sure. Do you want to add more?Anastasios [00:24:28]: I think that this is great. This idea is a great one. There's a lot of great ideas in the Red Teaming space. You know, I'm not personally like a Red Teamer. I don't like go around and Red Team models, but there are people that do that and they're awesome. They're super skilled. When I think about the Red Team arena, I think those are really the people that we're building it for. Like, we want to make them excited and happy, build tools that they like. And just like Chatbot Arena, we'll trust that this will end up being useful for the world. And all these people are, you know, I would say all these people in this community are actually good hearted, right? They're not doing it because they want to like see the world burn. They're doing it because they like, think it's fun and cool. And yeah. Okay. Maybe they want to see, maybe they want a little bit.Wei Lin [00:25:13]: I don't know. Majority.Anastasios [00:25:15]: Yeah.Wei Lin [00:25:16]: You know what I'm saying.Anastasios [00:25:17]: So, you know, trying to figure out how to serve them best, I think, I don't know where that fits. I just, I'm not expressing. And give them credits, right?Wei Lin [00:25:24]: And give them credit.Anastasios [00:25:25]: Yeah. Yeah. So I'm not trying to express any particular value judgment here as to whether that's the right next step. It's just, that's sort of the way that I think we would think about it.Swyx [00:25:35]: Yeah. We also talked to Sander Schulhoff of the HackAPrompt competition, and he's pretty interested in Red Teaming at scale. Let's just call it that. You guys maybe want to talk with him.Wei Lin [00:25:45]: Oh, nice.Swyx [00:25:46]: We wanted to cover a little, a few topical things and then go into the other stuff that your group is doing. You know, you're not just running Chatbot Arena. We can also talk about the new website and your future plans, but I just wanted to briefly focus on o1. It is the hottest, latest model. Obviously, you guys already have it on the leaderboard. What is the impact of o1 on your evals?Wei Lin [00:26:06]: Made our interface slower.Anastasios [00:26:07]: It made it slower.Swyx [00:26:08]: Yeah.Wei Lin [00:26:10]: Because it needs like 30, 60 seconds, sometimes even more, the latency is like higher. So that's one. Sure. But I think we observe very interesting things from this model as well. Like we observe like significant improvement in certain categories, like more technical or math.
Yeah.Anastasios [00:26:32]: I think actually like one takeaway that was encouraging is that I think a lot of people before the o1 release were thinking, oh, like this benchmark is saturated. And why were they thinking that? They were thinking that because there was a bunch of models that were kind of at the same level. They were just kind of like incrementally competing and it sort of wasn't immediately obvious that any of them were any better. Nobody, including any individual person, it's hard to tell. But what o1 did is it was, it's clearly a better model for certain tasks. I mean, I used it for like proving some theorems and you know, there's some theorems that like only I know because I still do a little bit of theory. Right. So it's like, I can go in there and ask like, oh, how would you prove this exact thing? Which I can tell you has never been in the public domain. It'll do it. It's like, what?Wei Lin [00:27:19]: Okay.Anastasios [00:27:20]: So there's this model and it crushed the benchmark. You know, it's just like really like a big gap. And what that's telling us is that it's not saturated yet. It's still measuring some signal. That was encouraging. The point, the takeaway is that the benchmark is comparative. There's no absolute number. There's no maximum ELO. It's just like, if you're better than the rest, then you win. I think that was actually quite helpful to us.Swyx [00:27:46]: I think people were criticizing, I saw some of the academics criticizing it as not apples to apples. Right. Like, because it can take more time to reason, it's basically doing some search, doing some chain of thought that if you actually let the other models do that same thing, they might do better.Wei Lin [00:28:03]: Absolutely.Anastasios [00:28:04]: To be clear, none of the leaderboard currently is apples to apples because you have like Gemini Flash, you have, you know, all sorts of tiny models like Llama 8B, like 8B and 405B are not apples to apples.Wei Lin [00:28:19]: Totally agree. They have different latencies.Anastasios [00:28:21]: Different latencies.Wei Lin [00:28:22]: Control for latency. Yeah.Anastasios [00:28:24]: Latency control. That's another thing. We can do style control, but latency control. You know, things like this are important if you want to understand the trade-offs involved in using AI.Swyx [00:28:34]: o1 is a developing story. We still haven't seen the full model yet, but it's definitely a very exciting new paradigm. I think one community controversy I just wanted to give you guys space to address is the collaboration between you and the large model labs. People have been suspicious, let's just say, about how they choose to A-B test on you. I'll state the argument and let you respond, which is basically they run like five anonymous models and basically argmax their Elo on LMSys or Chatbot Arena, and they release the best one. Right? What has been your end of the controversy? How have you decided to clarify your policy going forward?Wei Lin [00:29:15]: On a high level, I think our goal here is to build a fast eval for everyone, and including everyone in the community can see the leaderboard and understand, compare the models. More importantly, I think we want to build the best eval also for model builders, like all these frontier labs building models. They're also internally facing a challenge, which is how do they eval the model? That's the reason why we want to partner with all the frontier lab people, and then to help them testing. That's one of the...
We want to solve this technical challenge, which is eval. Yeah.Anastasios [00:29:54]: I mean, ideally, it benefits everyone, right?Wei Lin [00:29:56]: Yeah.Anastasios [00:29:57]: And people also are interested in seeing the leading edge of the models. People in the community seem to like that. Oh, there's a new model up. Is this strawberry? People are excited. People are interested. Yeah. And then there's this question that you bring up of, is it actually causing harm?Wei Lin [00:30:15]: Right?Anastasios [00:30:16]: Is it causing harm to the benchmark that we are allowing this private testing to happen? Maybe stepping back, why do you have that instinct? The reason why you and others in the community have that instinct is because when you look at something like a benchmark, like ImageNet, a static benchmark, what happens is that if I give you a million different models that are all slightly different, and I pick the best one, there's something called selection bias that plays in, which is that the performance of the winning model is overstated. This is also sometimes called the winner's curse. And that's because statistical fluctuations in the evaluation, they're driving which model gets selected as the top. So this selection bias can be a problem. Now there's a couple of things that make this benchmark slightly different. So first of all, the selection bias that you incur when you're only testing five models is normally empirically small.Wei Lin [00:31:12]: And that's why we have these confidence intervals constructed.Anastasios [00:31:16]: That's right. Yeah. Our confidence intervals are actually not multiplicity adjusted. One thing that we could do immediately tomorrow in order to address this concern is if a model provider is testing five models and they want to release one, and we're constructing the intervals at level one minus alpha, we can just construct the intervals instead at level one minus alpha divided by five. That's called Bonferroni correction. What that'll tell you is that the final performance of the model, the interval that gets constructed, is actually formally correct. We don't do that right now, partially because we know from simulations that the amount of selection bias you incur with these five things is just not huge. It's not huge in comparison to the variability that you get from just regular human voters. So that's one thing. But then the second thing is the benchmark is live, right? So what ends up happening is it'll be a small magnitude, but even if you suffer from the winner's curse after testing these five models, what'll happen is that over time, because we're getting new data, it'll get adjusted down. So if there's any bias that gets introduced at that stage, in the long run, it actually doesn't matter. Because asymptotically, basically in the long run, there's way more fresh data than there is data that was used to compare these five models against these private models.Swyx [00:32:35]: The announcement effect is only just the first phase and it has a long tail.Anastasios [00:32:39]: Yeah, that's right. And it sort of like automatically corrects itself for this selection adjustment.Swyx [00:32:45]: Every month, I do a little chart of LLM ELO versus cost, just to track the price per ELO point, the amount of like, how much money do I have to pay for one incremental point in ELO? And so I actually observe an interesting stability in most of the ELO numbers, except for some of them. For example, GPT-4o August has fallen from 12.90
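To make the Bonferroni correction Anastasios describes concrete: if a provider privately tests five variants and you want overall 95% confidence, each interval is constructed at level 1 - 0.05/5, i.e. 99%, which simply widens it. A small worked sketch, with a made-up score and standard error for illustration:

```python
from scipy import stats

alpha, k = 0.05, 5                            # overall error rate, variants tested
z_plain = stats.norm.ppf(1 - alpha / 2)       # ~1.96 for a single 95% interval
z_bonf = stats.norm.ppf(1 - alpha / (2 * k))  # ~2.58 after Bonferroni correction

score, se = 1280.0, 3.0                       # hypothetical Arena score and standard error
for name, z in [("unadjusted", z_plain), ("Bonferroni", z_bonf)]:
    print(f"{name}: [{score - z * se:.1f}, {score + z * se:.1f}]")
```

The wider Bonferroni interval is what makes the reported score "formally correct" even after picking the best of the five variants; as noted in the conversation, the live stream of fresh votes then shrinks any residual winner's-curse bias over time.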
"On a piégé plus de machines que de gens à qui on avait envoyé le fichier piégé" Le D.E.V. de la semaine est Alain Mowat, Head of Research and Development @ Orange Cyberdefense. Alain nous raconte comment il est entré dans le monde de la cyberdéfense via Insomni'hack (qui existe toujours avec Orange Cyberdefense), ainsi que son activité au sein d'Orange Cyberdefense. Il nous parle de RedTeam, et d'histoires croustillantes, mais surtout de navigateurs. Nous creusons avec lui les heures sombres des navigateurs et leurs multiples failles, en bonne partie aujourd'hui corrigées. Et cela permet de voir les attaques qui peuvent encore subsister dans nos environnements web. Un épisode indispensable à écouter si vous voulez en savoir plus sur les points de vigilance à avoir ! Liens évoqués pendant l'émission site en ligne pour pratiquer les vul W3Challs Root Me Hack The Box"This is how they tell me the world ends" **Lancez votre boutique en ligne avec Shopify** Vous cherchez à lancer un site e-commerce rapidement et sans tracas techniques ? Shopify est la solution SaaS idéale pour créer une boutique en ligne professionnelle en quelques clics. Plus besoin de partir de zéro ou de gérer des frameworks complexes ! Shopify s'occupe de tout, de l'hébergement aux mises à jour, tout en offrant une flexibilité totale pour adapter votre boutique à vos besoins.
“Send us a Hey Now!” Well we finally went racing again and wow, what a return it was! COTA usually serves up some great racing and it did not disappoint this time around either! It was action-packed as we had the sprint to contend with as well as an epic finish to the race itself. We won't even get to draw breath before we're off to Mexico as this was the first in a triple header that will end in Brazil. Episode running order as always is... 1) News & Social All the best bits from both the sports news out there as well as what caught our eye on the various social channels 2) Brian's Video Vault https://www.youtube.com/watch?v=ctQO8e6jJ1o. Neil deGrasse Tyson Explains the Physics of Formula One Racing. StarTalk channel. Nearly 17 mins. This was good - if you're a huge F1 nerd, you probably know most of this. But I enjoyed how a plane's wing and a rear wing in F1 are basically doing the same thing, but in reverse… https://www.youtube.com/watch?v=IwAVrkHs5EQ. Lake Como Unfiltered: Filming the Peroni Nastro Azzurro 0.0% Campaign. Charles Leclerc channel. Nearly 6 mins. If you like Chuck, this is for you. Otherwise pass. And one on Instagram: https://www.instagram.com/reel/DBOs0mmuBOy/?igsh=MTE5YjV2c2ZiZDdsOA== Valtteri's insta and HylandGram's insta. Not too long, shows how they “made music” with his bio data… uh ok. The point of this is as he crosses the line towards the end, a radio message telling him how well he did comes across, and I'm 106% sure it's Pit Lane Paul. The video found by Adam, the PLP notice by Sap. ESPN clip asking them college mascots - so GOOD. Ohio State Humpty Dumpties… omg 3) United States GP Review Practice, qualy, sprint, and race review King of the pits vs the pits Pitlane Paul 4) Mexican GP Preview A look ahead to next week's race in Mexico We would love you to join our Discord server so use this invite link to join us https://discord.gg/XCyemDdzGB If you would like to sign up for the 100 Seconds of DRS then drop us an email stating your time zone to dirtysideofthetrack@gmail.com Also please like, follow, and share our content on Threads, X, Facebook, & Instagram, links to which can be found on our website. One last call to arms is that if you do listen along and like us then first of all thanks, but secondly could we ask that you leave a review and a 5 star rating - please & thanks! If you would like to help the Dirty Side promote the show then we are now on Buy Me a Coffee where 100% of anything we get will get pumped into advertising the show https://www.buymeacoffee.com/dirtysideofthetrack Dirty Side of the Track is hosted on Buzzsprout https://www.buzzsprout.com/ Support the show
Let's Talk Automated Red TeamingExplore automated red teaming and red-blue team synergy with Ryan Hays, Global Head of Red Team at Citi, tackling misconceptions and fostering cross-team collaboration.+ + +Find more episodes on YouTube or wherever you listen to podcasts, as well as at netspi.com/agentofinfluence.
In this episode, Spencer and Brad discuss a recent Trend Micro research project and associated white paper "Red Team Tools in the Hands of Cybercriminals and Nation States". Spencer and Brad dig into what red teaming is, what red team tools (often referred to as offensive security tools) are and why they are used. They also cover the abuse of red team tools, the speed of exploitation after public release and supply chain attacks against red team tools. From Defense to Offense: The Misuse of Red Teaming Tools by Cybercriminals | Trend Micro (US) Blog: https://offsec.blog/ Youtube: https://www.youtube.com/@cyberthreatpov Twitter: https://twitter.com/cyberthreatpov Work with Us: https://securit360.com
In today's podcast we cover four crucial cyber and technology topics, including: 1. Hong Kong arrests 27 individuals associated with romance scams 2. Calgary Public Library dealing with apparent ransomware 3. Red Team security tool adopted by criminals to evade detection 4. North Korean criminals abuse patched Microsoft flaw to deliver malware I'd love feedback, feel free to send your comments and feedback to | cyberandtechwithmike@gmail.com
Cyberpunk Edgerunners take to the stars once again! Join our team of Highriders as they navigate Crystal Palace, Mars, and more during the time of Cyberpunk Red! We're live every Friday at 5pm PT on the Cybernation Uncensored twitch channel! Join us! Calling all game masters, players, edgerunners, choombas, wastelanders, vault dwellers, spice traders & space folders! We have a very active community for Cyberpunk, Fallout, Dune & more! If you're looking to join a game, run a game, network, learn something new, contribute an idea, chat or just hang out, we have the home for you! Check out the ttrpg related options below and be sure to say hello! https://discord.gg/VJv4FPC https://www.twitch.tv/cybernationuncensored https://twitter.com/CNUncensored https://www.patreon.com/CybernationUncensored https://www.youtube.com/cybernationuncensored/join https://www.instagram.com/cybernationuncensored/ https://www.facebook.com/CyberNationUncensored https://www.facebook.com/groups/2951164338265802 Explore our website! https://www.CybernationUncensored.com/ We're a brand dedicated to everything and anything Cyberpunk, dystopian and scifi! We stream live Cyberpunk RED, 2020, Fallout 2D20 & Dune 2D20 gameplay, a Game Master Tips series, Deep Dive series, Night City Live series and a GM Round Table series on the Cybernation Uncensored YouTube and Twitch channels! We discuss everything and anything Cyberpunk, including but not limited to 2020, RED, 2077, Fallout & Dune on our Cybernation Uncensored podcast! We also have a Cybernation Uncensored community blog, discord and group! Join us and let's network and have fun! We have a passion for creating Cyberpunk genre content and would really appreciate your support! Sound & music by Syrinscape https://syrinscape.com/ Because Epic Games Need Epic Sound Complete list of credits here: https://syrinscape.com/attributions/ #cyberpunk #cyberpunkred #cyberpunk2020 --- Support this podcast: https://podcasters.spotify.com/pod/show/cybernationuncensored/support
Prof. Koonin is an American theoretical physicist and former director of the Center for Urban Science and Progress at NYU, as well as a professor in the Department of Civil and Urban Engineering at NYU School of Engineering. In the past he was the Chief Scientist of BP's oil and gas division, served as Under Secretary for Science in the Department of Energy, in the Obama administration, and was the vice-president of Caltech, one of the most prestigious scientific institutes in the world. Steven is the author of the book “Unsettled: What Climate Science Tells Us, What It Doesn't, and Why It Matters”, where he argues that while there are some basic facts about climate change that experts agree upon, the meaning of those facts is not so settled, and mainstream scientific studies do not support the notion that there is any kind of climate crisis at all. From Efrat Fenigson: “I cover politics, health, climate, money, economics & bitcoin, propaganda, and more. From time to time I cover the state of affairs in Israel, the broader picture of global events, and our role as sovereign citizens.” This conversation discusses Climate Realism - the sane approach to the “Climate Change” alarmism, and the role of media in shaping public perception. We touched on topics such as the use of the term 'climate denier,' bias in the energy industry, the challenges faced by young scientists who question the climate narrative, the role of journalists in spreading misinformation, and the influence of organizations like the UN and Covering Climate Now. We talked about the viral documentary 'Climate the Movie' and censorship attempts. Lastly we touched on the funding dynamic in climate research, and geoengineering / chemtrails. Steven emphasizes the need for open scientific discussion and the importance of prudence in considering these interventions. We end with the challenges & optimism in maintaining integrity and truth-telling in a corrupted world. Tom's Twitter: https://x.com/TomANelson Tom's Substack: https://tomn.substack.com/ Tom's links: https://linktr.ee/tomanelson1 Efrat's Twitter: https://x.com/efenigson Efrat's Telegram: https://t.me/efenigson Watch/listen on all platforms: https://linktr.ee/yourethevoice Support Efrat's work: https://www.buymeacoffee.com/efenigson Support Efrat with Bitcoin: https://geyser.fund/project/efenigson -- CHAPTERS – 00:00 Coming Up 01:14 Introductions 03:54 Challenging the Term 'Climate Denier' 06:43 The Climate Discussion "Silence" 09:03 Impacts on Those Speaking Out 10:53 Steven's Evolution to Climate Realism 16:33 Misrepresentation of Facts 21:37 Organized Online Propaganda 27:34 Climate - The Movie 32:03 Geoengineering & Chemtrails 41:10 Red Team, Blue Team 44:33 Dating CO2 in Deep Ice 45:55 Playing Bongos with Richard Feynman 49:06 Message of Hope
My guest today is Prof. Steven Koonin, co-hosted with Tom Nelson - host of The Tom Nelson Podcast. Prof. Koonin is an American theoretical physicist and former director of the Center for Urban Science and Progress at NYU, as well as a professor in the Department of Civil and Urban Engineering at NYU School of Engineering. In the past he was the Chief Scientist of BP's oil and gas division, served as Under Secretary for Science in the Department of Energy, in the Obama administration, and was the vice-president of Caltech, one of the most prestigious scientific institutes in the world. Steven is the author of the book “Unsettled: What Climate Science Tells Us, What It Doesn't, and Why It Matters”, where he argues that while there are some basic facts about climate change that experts agree upon, the meaning of those facts is not so settled, and mainstream scientific studies do not support the notion that there is any kind of climate crisis at all. This conversation discusses Climate Realism - the sane approach to the “Climate Change” alarmism, and the role of media in shaping public perception. We touched on topics such as the use of the term 'climate denier,' bias in the energy industry, the challenges faced by young scientists who question the climate narrative, the role of journalists in spreading misinformation, and the influence of organizations like the UN and Covering Climate Now. We talked about the viral documentary 'Climate the Movie' and censorship attempts. Lastly we touched on the funding dynamic in climate research, and geoengineering / chemtrails. Steven emphasizes the need for open scientific discussion and the importance of prudence in considering these interventions. We end with the challenges & optimism in maintaining integrity and truth-telling in a corrupted world. ► If you got value, please like, comment, share, subscribe and support my work. Thank you! -- SPONSORS – ►► Get your TREZOR wallet & accessories, with a 5% discount, using my code at checkout (get my discount code from the episode - yep, you'll have to watch it): https://affil.trezor.io/SHUn -- LINKS – Prof. Koonin's book - Unsettled: What Climate Science Tells Us, What It Doesn't, and Why It Matters: https://www.amazon.com.au/Unsettled-Climate-Science-Doesnt-Matters/dp/1950665798 Climate: The Movie: https://rumble.com/v4klh96-climate-the-movie-the-cold-truth.html Tom's Twitter: https://x.com/TomANelson Tom's Substack: https://tomn.substack.com/ Tom's links: https://linktr.ee/tomanelson1 Efrat's Twitter: https://twitter.com/efenigson Efrat's Telegram: https://t.me/efenigson Watch/listen on all platforms: https://linktr.ee/yourethevoice Support Efrat's work: https://www.buymeacoffee.com/efenigson Support Efrat with Bitcoin: https://geyser.fund/project/efenigson -- CHAPTERS – 00:00 Coming Up 01:14 Introductions 03:54 Challenging the Term 'Climate Denier' 06:43 The Climate Discussion "Silence" 09:03 Impacts on Those Speaking Out 10:53 Steven's Evolution to Climate Realism 16:33 Misrepresentation of Facts 21:37 Organized Online Propaganda 27:34 Climate - The Movie 32:03 Geoengineering & Chemtrails 41:10 Red Team, Blue Team 44:33 Dating CO2 in Deep Ice 45:55 Playing Bongos with Richard Feynman 49:06 Message of Hope
Episode #472 Red Team 2024 With Gregory Draperi The post Red Team 2024 appeared first on NoLimitSecu.
In this Emergency Pod of The Cognitive Revolution, Nathan provides crucial insights into OpenAI's new o1 and o1-mini reasoning models. Featuring exclusive interviews with members of the o1 Red Team from Apollo Research and Haize Labs, we explore the models' capabilities, safety profile, and OpenAI's pre-release testing approach. Dive into the implications of these advanced AI systems, including their potential to match or exceed expert performance in many areas. Join us for an urgent and informative discussion on the latest developments in AI technology and their impact on the future. o1 Safety Card Haize Labs Endless Jailbreaks with Bijection Learning: a Powerful, Scale-Agnostic Attack Method Haize Labs Job board Papers mentioned: https://arxiv.org/pdf/2407.21792 https://far.ai/post/2024-07-robust-llm/paper.pdf Apply to join over 400 Founders and Execs in the Turpentine Network: https://www.turpentinenetwork.co/ SPONSORS: Oracle: Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds; offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive Brave: The Brave search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference. All while remaining affordable with developer first pricing, integrating the Brave search API into your workflow translates to more ethical data sourcing and more human representative data sets. Try the Brave search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR Omneky: Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/ Squad: Head to Squad to access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist. RECOMMENDED PODCAST: This Won't Last. Eavesdrop on Keith Rabois, Kevin Ryan, Logan Bartlett, and Zach Weinberg's monthly backchannel. They unpack their hottest takes on the future of tech, business, venture, investing, and politics. Apple Podcasts: https://podcasts.apple.com/us/podcast/id1765665937 Spotify: https://open.spotify.com/show/2HwSNeVLL1MXy0RjFPyOSz YouTube: https://www.youtube.com/@ThisWontLastpodcast CHAPTERS: (00:00:00) About the Show (00:00:22) About the Episode (00:05:03) Introduction and Haize Labs Overview (00:07:36) Universal Jailbreak Technique and Attacks (00:13:47) Automated vs Manual Red Teaming (00:17:15) Qualitative Assessment of Model Jailbreaking (Part 1) (00:19:38) Sponsors: Oracle | Brave (00:21:42) Qualitative Assessment of Model Jailbreaking (Part 2) (00:26:21) Context-Specific Safety Considerations (00:32:26) Model Capabilities and Safety Correlation (Part 1) (00:36:22) Sponsors: Omneky | Squad (00:37:48) Model Capabilities and Safety Correlation (Part 2) (00:44:42) Model Behavior and Defense Mechanisms (00:52:47) Challenges in Preventing Jailbreaks (00:56:24) Safety, Capabilities, and Model Scale (01:00:56) Model Classification and Preparedness (01:04:40) Concluding Thoughts on o1 and Future Work (01:05:54) Outro
In this episode, we sit down with Santiago, a Senior Security Engineer at Canva, to talk about the complexities of building and managing an incident response team, especially in high-growth companies. Santiago shares his experience transitioning from penetration testing to incident response and highlights the unique challenges that come with protecting a rapidly expanding organization. We explore the differences between incident response in high-growth versus established companies, the importance of having the right personnel, and the critical skills needed for effective incident response. Guest Socials: Santiago's Linkedin Podcast Twitter - @CloudSecPod If you want to watch videos of this LIVE STREAMED episode and past episodes - Check out our other Cloud Security Social Channels: - Cloud Security Podcast- Youtube - Cloud Security Newsletter - Cloud Security BootCamp Questions asked: (00:00) Introduction (01:58) A word from our sponsor - SentinelOne (02:48) A bit about Santiago (03:18) What is Incident Response? (04:06) How IR differs in different organisations? (04:48) Red Team vs Incident Response Team (06:17) Challenges for Incident Response in Cloud (07:16) Incident Response in a High Growth Company (07:56) Skillsets required for high growth (09:14) Cloud vs On Prem Incident Response (10:03) Building Incident Response in High Growth Company (11:39) Responding to incidents that are not high risk (14:41) Transition from pentesting to incident responder (17:20) Endpoint vulnerability management at scale (25:32) The Fun Section Resources from the episode: Endpoint Vulnerability Management at Scale
Join Inspired by Ms Amber Red from Las Vegas for this fun and interesting podcast where we break down the art of investing in yourself and your business with a sprinkle of sass! Find out why there has never been a better time to invest in yourself than NOW! If you're ready to level up your life and stack those coins, this is the podcast for you. We're diving into smart money moves, self-care that pays off, and business strategies that work as hard as you do. Rachel Santiago, our go-to finance bestie, makes finance fun, relatable, and totally doable. She even tells you…how much you should have invested RIGHT NOW (based on your age) in order to retire and live comfortably! Did you know there's actually a math equation for it? Tune in, get inspired, and watch your world—and wallet—grow!
If you're part of our email list and were able to join us LIVE on TikTok this last week for a full ombre brow procedure, boy were you lucky! We had so much fun and I tattooed one of our very own talented in-house artists, Tania D Artisty. But don't stress if you were unable to make the live, that's what we are discussing in today's episode! We discuss the struggles with going LIVE on TikTok and how we battle the trolls, how we have grown our account to what it is today, and you guessed it, we discuss ALL THINGS OMBRE BROWS! For supporting our podcast we'd like to offer you a very special offer on the presale of my newest course: PIXELATED & BOLD! For a very limited time you can get this course for only $199. Follow the link here to get your savings: https://www.inspiredbymsamberred.com/offers/z3GKsno9?coupon_code=YT199PIXELATED Want to watch the TikTok live recap? Click here: https://www.inspiredbymsamberred.com/blog/tiktok-live-recap-and-q-a As always don't forget to like, comment, share, and subscribe. Follow Ms Amber Red on all social media platforms: IG: inspiredbymsamberred TikTok: msamberred FB: msamberred permanent makeup Check out our previous episode: Interviewing Ms Amber Red - https://youtu.be/V_nYSQe7noc Support the Show.
We'd like to invite you to join us for Inspired by Ms Amber Red's 50th podcast episode in Las Vegas‼️ Join us for a LIVE episode as the Ms Amber Red team & our private FB Group interview Amber! Yup, you heard that right! Get all of your questions answered! Some questions that the team asks Amber: How do you feel about being called boss? What was your biggest OH SHIT moment? Top 5 things to get eyeliner to heal black? How important is it to have a supportive partner? What is your biggest fear? Is social media important? Any new found interests since opening a business? What would you like to level up on? And so much more! If you've been wanting to really get to know Amber and what the future holds THIS IS THE EPISODE to listen to‼️ One important topic I'd like to elaborate on is: What is Inspired by Ms Amber Red? Read more here: https://www.inspiredbymsamberred.com/blog/what-is-inspired-by-ms-amber-red As always don't forget to like, comment, share, and subscribe! Did you know we have a private PMU FB community to join? Join us here: https://www.facebook.com/share/GHZPfanLqX8NTVVq/?mibextid=K35XfP Follow Ms Amber Red: IG - msamberred TikTok - msamberred FB - msamberred permanent makeup Check out our previous episode here: Better together - https://youtu.be/e99pvCUyrns Support the Show.
Guest: Allyn Stott, Senior Staff Engineer, meoward.coOn LinkedIn | https://www.linkedin.com/in/whyallynOn Twitter | https://x.com/whyallyn____________________________Host: Sean Martin, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining CyberSecurity Podcast [@RedefiningCyber]On ITSPmagazine | https://www.itspmagazine.com/sean-martinView This Show's Sponsors___________________________Episode NotesIn this episode of The Redefining CyberSecurity Podcast, host Sean Martin converses with Allyn Stott, who shares his insights on rethinking how we measure detection and response in cybersecurity. The episode explores the nuances of cybersecurity metrics, emphasizing that it's not just about having metrics, but having the right metrics that truly reflect the effectiveness and efficiency of a security program.Stott discusses his journey from red team operations to blue team roles, where he has focused on detection and response. His dual perspective provides a nuanced understanding of both offensive and defensive security strategies. Stott highlights a common issue in cybersecurity: the misalignment of metrics with organizational goals. He points out that many teams inherit metrics that may not accurately reflect their current state or objectives. Instead, metrics should be strategically chosen to guide decision-making and improve security posture. One of his key messages is the importance of understanding what specific metrics are meant to convey and ensuring they are directly actionable.In his framework, aptly named SAVER (Streamlined, Awareness, Vigilance, Exploration, Readiness), Stott outlines a holistic approach to security metrics. Streamlined focuses on operational efficiencies achieved through better tools and processes. Awareness pertains to the dissemination of threat intelligence and ensuring that the most critical information is shared across the organization. Vigilance involves preparing for and understanding top threats through informed threat hunting. Exploration encourages the proactive discovery of vulnerabilities and security gaps through threat hunts and incident analysis. Finally, Readiness measures the preparedness and efficacy of incident response plans, emphasizing the coverage and completeness of playbooks over mere response times.Martin and Stott also discuss the challenge of metrics in smaller organizations, where resources may be limited. Stott suggests that simplicity can be powerful, advocating for a focus on key risks and leveraging publicly available threat intelligence. His advice to smaller teams is to prioritize understanding the most significant threats and tailoring responses accordingly.The conversation underscores a critical point: metrics should not just quantify performance but also drive strategic improvements. 
By asking the right questions and focusing on actionable insights, cybersecurity teams can better align their efforts with their organization's broader goals.For those interested in further insights, Stott mentions his upcoming talks at B-Sides Las Vegas and Blue Team Con in Chicago, where he will expand on these concepts and share more about his Threat Detection and Response Maturity Model.In conclusion, this episode serves as a valuable guide for cybersecurity professionals looking to refine their approach to metrics, making them more meaningful and aligned with their organization's strategic objectives.___________________________Watch this and other videos on ITSPmagazine's YouTube ChannelRedefining CyberSecurity Podcast with Sean Martin, CISSP playlist:
Today on Inspired we have the whole team! Amber, Sasha, Tania, Brisa, Victor, and our visiting artist Maddie from Massachusetts. Listen as we discuss topics such as PMU conference Dos and Don'ts, the importance of legit training and apprenticeships, and more! Amber also discusses how it feels to be a mentor to so many different personalities at once and how she balances it all. No matter the differences, it truly is what makes us Better Together! Interested in learning more about our 6 month apprenticeship program? Click the following link: https://www.inspiredbymsamberred.com/6-month-apprenticeship As always don't forget to like, comment, share, and subscribe! Looking for a judgment-free, no-gatekeeping community to join? Join us on our private FB Group: https://www.facebook.com/share/GHZPfanLqX8NTVVq/?mibextid=K35XfP Follow Ms Amber Red: IG - msamberred TikTok - msamberred FB - msamberred permanent makeup “The views, thoughts, and opinions expressed are the speaker's own and do not represent the views, thoughts, and opinions of the University of California. The material and information presented here is for general information purposes only. The "University of California" name and all forms and abbreviations are the property of its owner and its use does not imply endorsement of or opposition to any specific organization, product, or service.” Check out our previous episode: "Stop Gatekeeping" - https://youtu.be/mmdYI6SrMwY Support the Show.
Support the D.A.W.G.Z. @ patreon.com/MSsecretpod Support Nate and Lemeez @ patreon.com/pitm Go See Lemaire at Cap City in Austin @ https://www.capcitycomedy.com/shows/252000 Go See Shawn at Helium Philly @ https://philadelphia.heliumcomedy.com/shows/247133 Go See Matt Live @ mattmccusker.com/dates Go See Shane Live @ shanemgillis.com Get Merch @ mssecretpodcast.com/merch Hello everyone. Big SG's away on bidness so the mice had to play without him. We miss you Shane - see you soon!!!!!! In the kahuna's absence Matt assembled the bros to chill in Josh's stu - god bless josh. Also Matt was feelin kinda sick but he worked anyways - nbd. Please enjoy. God Bless you all. Upgrade your wardrobe and get up to 25% OFF @trueclassic at https://trueclassictees.com/DRENCHED! #trueclassicpod Support the show & get Lucy Breakers for 20% off & free shipping at https://www.lucy.co promo code DRENCHED