GPT-4.1 demystified: what the 1 million tokens really mean

GPT-4.1 is on everyone's lips, and with it the promise of a huge context window of one million tokens. But what is really behind it? Can you really analyze entire books, log files, or datasets without restriction? In this short impulse we clarify what is actually real, where the technical limits lie, and how to use GPT-4.1 sensibly in your business without being blinded by buzzwords.

Torsten Kohnert on LinkedIn: LinkedIn - https://www.linkedin.com/in/torsten-kohnert/

GPT-4.1 is not always GPT-4.1
The new model GPT-4.1 comes in three variants: Standard, Mini (fast), and Nano. All of them are said to handle up to one million tokens. On paper that sounds like a quantum leap. But the decisive question is: in which environment are you working with the model? Its effective capability depends heavily on whether you use the web version or go through an API. This is exactly where theory and practice part ways.

Web app vs. API: what you actually get
Most people use GPT-4.1 through the ChatGPT web application, where the limit is 128,000 tokens. That is solid, but far from the full potential. The frequently cited one million tokens are only reachable via the OpenAI API. Anyone who builds their own tools or drives GPT-4.1 through the Playground can access the full context window. Anyone who works only in the browser stays far more constrained, and that is exactly what causes the confusion.

Powerful, but only with the right access
A practical example: in a public demo, more than 450,000 tokens of NASA log data were analyzed with GPT-4.1, quickly, precisely, and without errors. But that did not happen in the browser; it ran over the API. For data-intensive tasks such as reporting, log analysis, or complex text comparisons, API access is decisive. Anyone who wants to use this power needs either technical understanding or a partner who supports the implementation.

Conclusion: technologically excellent, but check your setup
GPT-4.1 is a strong model, no question. But the advertised performance figures do not apply to every use case. If you work via the API, you can truly get the maximum out of it. In the web app you get a strong but limited version. So before you build up expectations, check exactly which setup you are using and what it can deliver in practice. Only then will GPT-4.1 really move you forward.

More from the Koertings ...

The KI-Café ... every Wednesday (>350 participants) from 08:30 to 10:00, online via Zoom, free of charge and well worth it. Every Wednesday at 08:30 the KI-Café opens its online doors. We solve AI use cases live on stage, moderate expert panels on specific topics (for example AI in recruiting, AI in quality assurance, AI in project management, and much more), put the latest developments in the AI world into context and give an outlook, invite experts for special topics, and sometimes go deep and work through particular areas very concretely. All for your progress. Register for free ... www.koerting-institute.com/ki-cafe/

The KI book ... for the self-employed and entrepreneurs
Learn how ChatGPT boosts your productivity, saves time, and maximizes revenue. It contains practical examples for book marketing, text and data analysis, as well as 30 concrete use cases. Develop your own prompts, improve marketing and sales, and free yourself from routine tasks. Written by Torsten & Birgit Koerting, pioneers in the AI field who support entrepreneurs through the transformation. The book is a gift; only shipping costs of 6.95 EUR apply. Perfect for beginners and advanced users who want to realize their full potential with AI. Get the book into your mailbox ... www.koerting-institute.com/ki-buch/

The KI-Lounge ... our community for getting started with AI (>2100 members)
The KI-Lounge is a community for everyone who wants to learn more about generative AI and put it to use. Members receive exclusive monthly AI updates, expert interviews, talks from the KI speaker slam, KI-Café recordings, and a three-hour ChatGPT course. Exchange ideas with more than 2100 AI enthusiasts, ask your questions, and get going. Initiated by Torsten & Birgit Koerting, the KI-Lounge offers orientation and inspiration for entering the AI revolution. This is where the exchange happens ... www.koerting-institute.com/ki-lounge/

Start working with us 1:1
If you want to work with us directly and integrate AI into your business, book an appointment for a personal conversation. Together we will find answers to your questions and figure out how we can support you. Click here to book an appointment and clarify your questions. Book your appointment with us now ... www.koerting-institute.com/termin/

More impulses, Netflix style ...
If you are looking for more exciting impulses for your self-employment, head over to our impulses page and let the many inspiring impulses sink in. Pure inspiration ... www.koerting-institute.com/impulse/

The Koertings for your ears ...
If you enjoyed this podcast episode, listen to more of our informative and exciting episodes; you will find over 420 of them here ... www.koerting-institute.com/podcast/

We look forward to accompanying you on your journey!
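For readers who want to see what "full context window over the API" looks like in practice, here is a minimal sketch using the OpenAI Python SDK. It is illustrative only: the model name, the file path, and the assumption that the file fits in the context window are all hypothetical, and current limits and pricing should be checked against OpenAI's documentation.

```python
# Minimal sketch: sending a large log file to GPT-4.1 through the OpenAI API.
# Assumptions: the "gpt-4.1" model name, the local file path, and that the
# file fits within the model's context window (verify against OpenAI's docs).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("server.log", "r", encoding="utf-8") as f:
    log_text = f.read()

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a log analysis assistant."},
        {"role": "user", "content": "Summarize all error patterns in this log:\n\n" + log_text},
    ],
)
print(response.choices[0].message.content)
```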
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
This tutorial video demonstrates how to create a simple interactive AI chatbot using Google Colab and the OpenAI API. It guides viewers through obtaining an OpenAI API key (which is not free but offers learning credits), setting up the key in Google Colab, installing the necessary OpenAI module, importing libraries, and writing the Python code to build and initialise the chatbot. The video also shows how to test the chatbot with sample conversations and highlights the availability of a builder toolkit at Djamgatech.com containing the code and further tutorials. You tune in daily for the latest AI breakthroughs, but what if you could start building them yourself? We've heard your requests for practical guides, and now we're delivering! Introducing AI Unraveled: The Builder's Toolkit, a comprehensive and continuously expanding collection of AI tutorials. Each guide comes with detailed, illustrated PDF instructions and a complementary audio explanation, designed to get you building, from your first OpenAI agent to advanced AI applications. This exclusive resource is a one-time purchase, providing lifetime access to every new tutorial we add weekly. Your support directly fuels our daily mission to keep you informed and ahead in the world of AI.
Start building today: Download the Full Toolkit at https://djamgatech.com/product/ai-unraveled-the-builders-toolkit-practical-ai-tutorials-projects-e-book-audio/
Shopify: https://djamgatech.myshopify.com/products/%F0%9F%9B%A0%EF%B8%8F-ai-unraveled-the-builders-toolkit-practical-ai-tutorials-projects-e-book-audio-video?utm_source=copyToPasteBoard&utm_medium=product-links&utm_content=web
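As a rough illustration of what the video walks through, here is a minimal chatbot loop of the kind described. It assumes the openai Python package is installed (pip install openai) and an API key is set; the model name is an assumption rather than necessarily the one used in the video.

```python
# Minimal interactive chatbot sketch (e.g., run in a Google Colab cell).
# Assumptions: `pip install openai` has been run and OPENAI_API_KEY is set;
# the model name below is illustrative, not necessarily the video's choice.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ")
    if user_input.lower() in {"quit", "exit"}:
        break
    history.append({"role": "user", "content": user_input})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print("Bot:", answer)
```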
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
Building a basic AI agent using OpenAI's platform involves several key steps. First, developers need to clearly define the agent's objective and select an appropriate OpenAI model (such as GPT-4o, o3-mini, or GPT-4.1) based on the complexity of the task and the desired latency. After setting up the development environment with an OpenAI API key, clear instructional prompts are crafted to define the agent's behavior, role, and response style. For more advanced functionality, agents can be equipped with tools like web search, file search, or the ability to call external functions (APIs). Frameworks like OpenAI's Agents SDK or libraries such as LangChain can then be used for orchestrating multi-step tasks, managing memory, and integrating the agent with other applications, followed by thorough testing and iteration. Get full access to the AI Unraveled Builder's Toolkit (Video + Audio + E-book) at https://djamgatech.com/product/ai-unraveled-the-builders-toolkit-practical-ai-tutorials-projects-e-book-audio/
STEP BY STEP:
1. Go to Google Colab and install the OpenAI Agents SDK with pip install openai-agents.
2. Get your API key from OpenAI's platform and add some credits to your account.
3. Import the libraries and create your agent with a model (e.g., gpt-4o or o3-mini), instructions, and a web search tool, as in the sketch below.
4. Run your agent and print the results.
What this means: OpenAI is providing increasingly powerful and accessible tools and APIs that simplify the process for developers to create custom AI agents. This empowers builders of varying skill levels to design specialized AI solutions capable of performing complex, autonomous tasks across a wide range of applications, from simple automation to more sophisticated agentic workflows.
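The step-by-step list above corresponds roughly to code like the following. This is a hedged sketch based on the OpenAI Agents SDK (pip install openai-agents); the class and parameter names follow the SDK's documentation at the time of writing and should be verified against the current release.

```python
# Sketch of a minimal agent with the OpenAI Agents SDK (pip install openai-agents).
# Assumes OPENAI_API_KEY is set; class and parameter names may change between
# SDK versions, so check the official docs before relying on them.
from agents import Agent, Runner, WebSearchTool

agent = Agent(
    name="Research Assistant",
    instructions="Answer concisely and cite the sources you find via web search.",
    model="gpt-4o",
    tools=[WebSearchTool()],
)

result = Runner.run_sync(agent, "What were the main AI announcements this week?")
print(result.final_output)
```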
Join Dan Vega and DaShaun Carter for the latest updates from the Spring Ecosystem. In this episode, they welcome Eddú Meléndez, who works on Testcontainers at Docker following the company's acquisition of the project. The trio explores the recently released Docker Model Runner in Docker Desktop 4.40.0, which provides a local inference API compatible with the OpenAI API and integrates seamlessly with Spring AI 1.0.0-M7. Eddú shares his journey of contributing to Spring projects, discusses his experience with Testcontainers, and provides insights on running AI models locally with zero API keys or data sharing. Don't miss this in-depth look at the intersection of Spring AI and Docker technologies, showcasing how developers can leverage these powerful tools in their projects. You can participate in our live stream to ask questions or catch the replay on your preferred podcast platform.
Show Notes
Spring AI with Docker Model Runner
Eddú Meléndez on Twitter
Eddú Meléndez on BlueSky
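Because Docker Model Runner exposes an OpenAI-compatible inference API, any OpenAI-style client can talk to it; Spring AI wires this up through its own OpenAI starter configuration, and the sketch below shows the same idea in Python. The base URL, port, and model name are assumptions, not confirmed values from the episode; check Docker's Model Runner documentation for the endpoint on your setup.

```python
# Sketch: pointing an OpenAI-compatible client at a local Docker Model Runner.
# Assumptions: host-side TCP access is enabled and the endpoint and model names
# below match your local setup (verify against Docker's Model Runner docs).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # assumed local endpoint
    api_key="not-needed",  # local inference requires no real API key
)

response = client.chat.completions.create(
    model="ai/llama3.2",  # assumed name of a locally pulled model
    messages=[{"role": "user", "content": "Explain Testcontainers in one sentence."}],
)
print(response.choices[0].message.content)
```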
Get ready to explore how generative AI is transforming development in Oracle APEX. In this episode, hosts Lois Houston and Nikita Abraham are joined by Oracle APEX experts Apoorva Srinivas and Toufiq Mohammed to break down the innovative features of APEX 24.1. Learn how developers can use APEX Assistant to build apps, generate SQL, and create data models using natural language prompts. Oracle APEX: Empowering Low Code Apps with AI: https://mylearn.oracle.com/ou/course/oracle-apex-empowering-low-code-apps-with-ai/146047/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. -------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Nikita: Welcome back to another episode of the Oracle University Podcast! I'm Nikita Abraham, Team Lead of Editorial Services with Oracle University, and I'm joined by Lois Houston, Director of Innovation Programs. Lois: Hi everyone! In our last episode, we spoke about Oracle APEX and AI. We covered the data and AI -centric challenges businesses are up against and explored how AI fits in with Oracle APEX. Niki, what's in store for today? Nikita: Well, Lois, today we're diving into how generative AI powers Oracle APEX. With APEX 24.1, developers can use the Create Application Wizard to tell APEX what kind of application they want to build based on available tables. Plus, APEX Assistant helps create, refine, and debug SQL code in natural language. 01:16 Lois: Right. Today's episode will focus on how generative AI enhances development in APEX. We'll explore its architecture, the different AI providers, and key use cases. Joining us are two senior product managers from Oracle—Apoorva Srinivas and Toufiq Mohammed. Thank you both for joining us today. We'll start with you, Apoorva. Can you tell us a bit about the generative AI service in Oracle APEX? Apoorva: It is nothing but an abstraction to the popular commercial Generative AI products, like OCI Generative AI, OpenAI, and Cohere. APEX makes use of the existing REST infrastructure to authenticate using the web credentials with Generative AI Services. Once you configure the Generative AI Service, it can be used by the App Builder, AI Assistant, and AI Dynamic Actions, like Show AI Assistant and Generate Text with AI, and also the APEX_AI PL/SQL API. You can enable or disable the Generative AI Service on the APEX instance level and on the workspace level. 02:31 Nikita: Ok. Got it. So, Apoorva, which AI providers can be configured in the APEX Gen AI service? Apoorva: First is the popular OpenAI. If you have registered and subscribed for an OpenAI API key, you can just enter the API key in your APEX workspace to configure the Generative AI service. APEX makes use of the chat completions endpoint in OpenAI. Second is the OCI Generative AI Service. Once you have configured an OCI API key on Oracle Cloud, you can make use of the chat models. The chat models are available from Cohere family and Meta Llama family. The third is the Cohere. The configuration of Cohere is similar to OpenAI. 
You need to have your Cohere OpenAI key. And it provides a similar chat functionality using the chat endpoint. 03:29 Lois: What is the purpose of the APEX_AI PL/SQL public API that we now have? How is it used within the APEX ecosystem? Apoorva: It models the chat operation of the popular Generative AI REST Services. This is the same package used internally by the chat widget of the APEX Assistant. There are more procedures around consent management, which you can configure using this package. 03:58 Lois: Apoorva, at a high level, how does generative AI fit into the APEX environment? Apoorva: APEX makes use of the existing REST infrastructure—that is the web credentials and remote server—to configure the Generative AI Service. The inferencing is done by the backend Generative AI Service. For the Generative AI use case in APEX, such as NL2SQL and creation of an app, APEX performs the prompt enrichment. 04:29 Nikita: And what exactly is prompt enrichment? Apoorva: Let's say you provide a prompt saying "show me the average salary of employees in each department." APEX will take this prompt and enrich it by adding in more details. It elaborates on the prompt by mentioning the requirements, such as Oracle SQL syntax statement, and providing some metadata from the data dictionary of APEX. Once the prompt enrichment is complete, it is then passed on to the LLM inferencing service. Therefore, the SQL query provided by the AI Assistant is more accurate and in context. 05:15 Unlock the power of AI Vector Search with our new course and certification. Get more accurate search results, handle complex datasets easily, and supercharge your data-driven decisions. From now to May 15, 2025, we are waiving the certification exam fee (valued at $245). Visit mylearn.oracle.com to enroll. 05:41 Nikita: Welcome back! Let's talk use cases. Apoorva, can you share some ways developers can use generative AI with APEX? Apoorva: SQL is an integral part of building APEX apps. You use SQL everywhere. You can make use of the NL2SQL feature in the code editor by using the APEX Assistant to generate SQL queries while building the apps. The second is the prompt-based app creation. With APEX Assistant, you can now generate fully functional APEX apps by providing prompts in natural language. Third is the AI Assistant, which is a chat widget provided by APEX in all the code editors and for creation of apps. You can chat with the AI Assistant by providing your prompts and get responses from the Generative AI Services. 06:37 Lois: Without getting too technical, can you tell us how to create a data model using AI? Apoorva: A SQL Workshop utility called Create Data Model Using AI uses AI to help you create your own data model. The APEX Assistant generates a script to create tables, triggers, and constraints in either Oracle SQL or Quick SQL format. You can also insert sample data into these tables. But before you use this feature, you must create a generative AI service and enable the Used by App Builder setting. If you are using the Oracle SQL format, when you click on Create SQL Script, APEX generates the script and brings you to this script editor page. Whereas if you are using the Quick SQL format, when you click on Review Quick SQL, APEX generates the Quick SQL code and brings you to the Quick SQL page. 07:39 Lois: And to see a detailed demo of creating a custom data model with the APEX Assistant, visit mylearn.oracle.com and search for the "Oracle APEX: Empowering Low Code Apps with AI" course. 
Apoorva, what about creating an APEX app from a prompt. What's that process like? Apoorva: APEX 24.1 introduces a new feature where you can generate an application blueprint based on a prompt using natural language. The APEX Assistant leverages the APEX Dictionary Cache to identify relevant tables while suggesting the pages to be created for your application. You can iterate over the application design by providing further prompts using natural language and then generating an application based on your needs. Once you are satisfied, you can click on Create Application, which takes you to the Create Application Wizard in APEX, where you can further customize your application, such as application icon and other features, and finally, go ahead to create your application. 08:53 Nikita: Again, you can watch a demo of this on MyLearn. So, check that out if you want to dive deeper. Lois: That's right, Niki. Thank you for these great insights, Apoorva! Now, let's turn to Toufiq. Toufiq, can you tell us more about the APEX Assistant feature in Oracle APEX. What is it and how does it work? Toufiq: APEX Assistant is available in Code Editors in the APEX App Builder. It leverages generative AI services as the backend to answer your questions asked in natural language. APEX Assistant makes use of the APEX dictionary cache to identify relevant tables while generating SQL queries. Using the Query Builder mode enables Assistant. You can generate SQL queries from natural language for Form, Report, and other region types which support SQL queries. Using the general assistance mode, you can generate PL/SQL JavaScript, HTML, or CSS Code, and seek further assistance from generative AI. For example, you can ask the APEX Assistant to optimize the code, format the code for better readability, add comments, etc. APEX Assistant also comes with two quick actions, Improve and Explain, which can help users improve and understand the selected code. 10:17 Nikita: What about the Show AI Assistant dynamic action? I know that it provides an AI chat interface, but can you tell us a little more about it? Toufiq: It is a native dynamic action in Oracle APEX which renders an AI chat user interface. It leverages the generative AI services that are configured under Workspace utilities. This AI chat user interface can be rendered inline or as a dialog. This dynamic action also has configurable system prompt and welcome message attributes. 10:52 Lois: Are there attributes you can configure to leverage even more customization? Toufiq: The first attribute is the initial prompt. The initial prompt represents a message as if it were coming from the user. This can either be a specific item value or a value derived from a JavaScript expression. The next attribute is use response. This attribute determines how the AI Assistant should return responses. The term response refers to the message content of an individual chat message. You have the option to capture this response directly into a page item, or to process it based on more complex logic using JavaScript code. The final attribute is quick actions. A quick action is a predefined phrase that, once clicked, will be sent as a user message. Quick actions defined here show up as chips in the AI chat interface, which a user can click to send the message to Generative AI service without having to manually type in the message. 12:05 Lois: Thank you, Toufiq and Apoorva, for joining us today. 
Like we were saying, there's a lot more you can find in the “Oracle APEX: Empowering Low Code Apps with AI” course on MyLearn. So, make sure you go check that out. Nikita: Join us next week for a discussion on how to integrate APEX with OCI AI Services. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 12:28 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
Kevin Weil is the chief product officer at OpenAI, where he oversees the development of ChatGPT, enterprise products, and the OpenAI API. Prior to OpenAI, Kevin was head of product at Twitter, Instagram, and Planet, and was instrumental in the development of the Libra (later Novi) cryptocurrency project at Facebook.In this episode, you'll learn:1. How OpenAI structures its product teams and maintains agility while developing cutting-edge AI2. The power of model ensembles—using multiple specialized models together like a company of humans with different skills3. Why writing effective evals (AI evaluation tests) is becoming a critical skill for product managers4. The surprisingly enduring value of chat as an interface for AI, despite predictions of its obsolescence5. How “vibe coding” is changing how companies operate6. What OpenAI looks for when hiring product managers (hint: high agency and comfort with ambiguity)7. “Model maximalism” and why today's AI is the worst you'll ever use again8. Practical prompting techniques that improve AI interactions, including example-based prompting—Brought to you by:• Eppo—Run reliable, impactful experiments• Persona—A global leader in digital identity verification• OneSchema—Import CSV data 10x faster—Where to find Kevin Weil:• X: https://x.com/kevinweil• LinkedIn: https://www.linkedin.com/in/kevinweil/—Where to find Lenny:• Newsletter: https://www.lennysnewsletter.com• X: https://twitter.com/lennysan• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/—In this episode, we cover:(00:00) Kevin's background(04:06) OpenAI's new image model(06:52) The role of chief product officer at OpenAI(10:18) His recruitment story and joining OpenAI(17:20) The importance of evals in AI(24:59) Shipping quickly and consistently(28:34) Product reviews and iterative deployment(39:35) Chat as an interface for AI(43:59) Collaboration between researchers and product teams(46:41) Hiring product managers at OpenAI(48:45) Embracing ambiguity in product management(51:41) The role of AI in product teams(53:21) Vibe coding and AI prototyping(55:55) The future of product teams and fine-tuned models(01:04:36) AI in education(01:06:42) Optimism and concerns about AI's future(01:16:37) Reflections on the Libra project(01:20:37) Lightning round and final thoughts—Referenced:• OpenAI: https://openai.com/• The AI-Generated Studio Ghibli Trend, Explained: https://www.forbes.com/sites/danidiplacido/2025/03/27/the-ai-generated-studio-ghibli-trend-explained/• Introducing 4o Image Generation: https://openai.com/index/introducing-4o-image-generation/• Waymo: https://waymo.com/• X: https://x.com• Facebook: https://www.facebook.com/• Instagram: https://www.instagram.com/• Planet: https://www.planet.com/• Sam Altman on X: https://x.com/sama• A conversation with OpenAI's CPO Kevin Weil, Anthropic's CPO Mike Krieger, and Sarah Guo: https://www.youtube.com/watch?v=IxkvVZua28k• OpenAI evals: https://github.com/openai/evals• Deep Research: https://openai.com/index/introducing-deep-research/• Ev Williams on X: https://x.com/ev• OpenAI API: https://platform.openai.com/docs/overview• Dwight Eisenhower quote: https://www.brainyquote.com/quotes/dwight_d_eisenhower_164720• Inside Bolt: From near-death to ~$40m ARR in 5 months—one of the fastest-growing products in history | Eric Simons (founder & CEO of StackBlitz): https://www.lennysnewsletter.com/p/inside-bolt-eric-simons• StackBlitz: https://stackblitz.com/• Claude 3.5 Sonnet: https://www.anthropic.com/news/claude-3-5-sonnet• Anthropic: 
https://www.anthropic.com/• Four-minute mile: https://en.wikipedia.org/wiki/Four-minute_mile• Chad: https://chatgpt.com/g/g-3F100ZiIe-chad-open-a-i• Dario Amodei on LinkedIn: https://www.linkedin.com/in/dario-amodei-3934934/• Figma: https://www.figma.com/• Julia Villagra on LinkedIn: https://www.linkedin.com/in/juliavillagra/• Andrej Karpathy on X: https://x.com/karpathy• Silicon Valley CEO says ‘vibe coding' lets 10 engineers do the work of 100—here's how to use it: https://fortune.com/2025/03/26/silicon-valley-ceo-says-vibe-coding-lets-10-engineers-do-the-work-of-100-heres-how-to-use-it/• Cursor: https://www.cursor.com/• Windsurf: https://codeium.com/windsurf• GitHub Copilot: https://github.com/features/copilot• Patrick Srail on X: https://x.com/patricksrail• Khan Academy: https://www.khanacademy.org/• CK-12 Education: https://www.ck12.org/• Sora: https://openai.com/sora/• Sam Altman's post on X about creative writing: https://x.com/sama/status/1899535387435086115• Diem (formerly known as Libra): https://en.wikipedia.org/wiki/Diem_(digital_currency)• Novi: https://about.fb.com/news/2020/05/welcome-to-novi/• David Marcus on LinkedIn: https://www.linkedin.com/in/dmarcus/• Peter Zeihan on X: https://x.com/PeterZeihan• The Wheel of Time on Prime Video: https://www.amazon.com/Wheel-Time-Season-1/dp/B09F59CZ7R• Top Gun: Maverick on Prime Video: https://www.amazon.com/Top-Gun-Maverick-Joseph-Kosinski/dp/B0DM2LYL8G• Thinking like a gardener not a builder, organizing teams like slime mold, the adjacent possible, and other unconventional product advice | Alex Komoroske (Stripe, Google): https://www.lennysnewsletter.com/p/unconventional-product-advice-alex-komoroske• MySQL: https://www.mysql.com/—Recommended books:• Co-Intelligence: Living and Working with AI: https://www.amazon.com/Co-Intelligence-Living-Working-Ethan-Mollick/dp/059371671X• The Accidental Superpower: Ten Years On: https://www.amazon.com/Accidental-Superpower-Ten-Years/dp/1538767341• Cable Cowboy: https://www.amazon.com/Cable-Cowboy-Malone-Modern-Business/dp/047170637X—Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.—Lenny may be an investor in the companies discussed. Get full access to Lenny's Newsletter at www.lennysnewsletter.com/subscribe
This is the KI-Update from 21.03.2025, with these topics among others:
Claude is now allowed onto the internet too
New speech models available in the OpenAI API
Nvidia wants to invest hundreds of billions in its US supply chain
and Atlas moves ever more like a human thanks to AI
Links to all of today's topics can be found here: https://heise.de/-10324219 https://www.heise.de/thema/KI-Update https://pro.heise.de/ki/ https://www.heise.de/newsletter/anmeldung.html?id=ki-update https://www.heise.de/thema/Kuenstliche-Intelligenz https://the-decoder.de/ https://www.heiseplus.de/podcast https://www.ct.de/ki
A detailed image of the very first light in the universe! Rakéta 2025-03-20 07:42:01 Science
An image of the radiation that filled the universe just 380,000 years after the Big Bang.

According to a new theory, dark energy is weakening, which could lead to a reverse Big Bang. Telex 2025-03-20 11:12:21 Science
The generally accepted theory so far has been the continuously accelerating expansion of the universe, but researchers now consider it conceivable that dark energy could also turn negative.

Windows 11 supports Xbox controller use with a new keyboard. ITBusiness 2025-03-20 05:15:52 Mobiltech Microsoft Xbox Windows
Microsoft is increasingly turning toward handheld gaming devices, and one sign of this is that Windows 11 has received an on-screen keyboard tailored specifically to the Xbox controller. The new solution makes navigation and text input within the operating system significantly easier, which gamers and handheld PC users in particular will appreciate.

Android users were very upset when told that a popular feature on their phones is being discontinued after 9 years. Startlap Vásárlás 2025-03-20 06:33:32 Mobiltech Phone Apple Smartphone Google iPhone Android
Google made an unexpected announcement: with the end of an important feature, a number of changes are coming this year. Many would rather switch to an iPhone.

Spanish Vodafone has chosen Ericsson. Mínuszos 2025-03-20 13:33:12 Mobiltech Spain 5G Vodafone Ericsson
Vodafone Spain is deploying a standalone 5G Core network for its residential customers together with Ericsson. Vodafone Spain has chosen Ericsson as its main technology partner to build the standalone 5G Core network for its residential customers. The agreement allows Vodafone to operate a fully independent 5G network.

A new Enlightenment is needed: the MTA opens its anniversary programs with researchers' reflections. Helló Sajtó! 2025-03-20 08:36:02 Science MTA
One of the most important missions of the Hungarian Academy of Sciences, which turns 200 in 2025, is to remain a credible source and a reliable point of orientation for society. In today's disinformation chaos, this is more important than ever.

Kemenesi: Science denial is becoming ever more fashionable. 24.hu 2025-03-20 11:22:30 Science Infectiologist Virologist
According to the virologist, false information spreads easily on social platforms, so researchers have the important task of learning to communicate their messages to the public effectively.

You would never guess in which direction Nvidia is expanding. Igényesférfi.hu 2025-03-20 04:34:46 Infotech USA Artificial intelligence Nvidia
The American chipmaker has teamed up with the fast-food company Yum Brands to take the use of artificial intelligence to a new level in Taco Bell, Pizza Hut, and KFC restaurants.

AI-driven humanoid robots are just around the corner. TechWorld 2025-03-20 14:03:03 Infotech Robot Nvidia
It is not us saying it, but the head of Nvidia: humanoid robots stuffed with AI will soon flood the earth. According to Nvidia CEO Jensen Huang, humanoid robots could become widespread in manufacturing in less than five years. The industry expert said this at the company's annual developer conference in San José, California.

The EU is forcing cooperation out of Apple. HWSW 2025-03-20 10:17:50 Infotech Apple
In two decisions, the Commission is further dismantling the fence around Apple's closed ecosystem.

We will get close to artificial general intelligence within just 5-10 years, and then comes superintelligence. Blikk 2025-03-20 05:52:00 Infotech Artificial intelligence Google
According to the CEO of Google DeepMind, artificial intelligence that can compete with humans at solving any task is still a long way off, but it is only a matter of time before it becomes reality. Demis Hassabis said that artificial general intelligence (AGI), which is as smart as or smarter than a human, could arrive within the next five to ten years.

The most expensive OpenAI model, o1-pro, has arrived. ITBusiness 2025-03-20 13:33:43 Infotech Artificial intelligence OpenAI
OpenAI has introduced its most expensive AI model to date, o1-pro, an enhanced version of the company's o1 "reasoning" AI. The new model is currently available exclusively through OpenAI's developer API, and with its notably high prices it is aimed primarily at developers who have already spent at least 5 dollars on OpenAI API services.

Meta's artificial intelligence service launches in the EU. Demokrata 2025-03-20 09:22:27 Foreign Artificial intelligence
Meta AI will be available for free within the already familiar messaging and social apps.

For our other episodes, visit our page at podcast.hirstart.hu.
Hey y'all, happy Thanksgiving to everyone who celebrates, and thank you for being a subscriber; I truly appreciate each and every one of you! We had a blast on today's celebratory stream, especially given that today's "main course" was the amazing open sourcing of a reasoning model from Qwen, and we had Junyang Lin with us again to talk about it! It is the first open source reasoning model that you can run on your own machine, one that beats a 405B model and comes close to o1 on some metrics.
How can OpenAI help you with PowerShell? Richard talks to Doug Finke about his experiences using ChatGPT and GitHub Copilot to help him write PowerShell, and about how he incorporated the OpenAI API into a PowerShell library to create a conversational interface in his PowerShell scripts! Doug talks about his productivity gains using OpenAI to write better quality PowerShell faster, helping him understand the code, automate test writing, and explore aspects of PowerShell he had never dug into. But beyond writing code for him, adding a conversational interface to a PowerShell script opens up a whole new interactive opportunity, making it easier for folks to use scripts and do more with them!
Links
GitHub Copilot
PSAI
GPT-4o
Doug's Blog
Doug's YouTube Channel
Recorded August 7, 2024
Congrats to Damien on successfully running AI Engineer London! See our community page and the Latent Space Discord for all upcoming events.This podcast came together in a far more convoluted way than usual, but happens to result in a tight 2 hours covering the ENTIRE OpenAI product suite across ChatGPT-latest, GPT-4o and the new o1 models, and how they are delivered to AI Engineers in the API via the new Structured Output mode, Assistants API, client SDKs, upcoming Voice Mode API, Finetuning/Vision/Whisper/Batch/Admin/Audit APIs, and everything else you need to know to be up to speed in September 2024.This podcast has two parts: the first hour is a regular, well edited, podcast on 4o, Structured Outputs, and the rest of the OpenAI API platform. The second was a rushed, noisy, hastily cobbled together recap of the top takeaways from the o1 model release from yesterday and today.Building AGI with Structured Outputs — Michelle Pokrass of OpenAI API teamMichelle Pokrass built massively scalable platforms at Google, Stripe, Coinbase and Clubhouse, and now leads the API Platform at Open AI. She joins us today to talk about why structured output is such an important modality for AI Engineers that Open AI has now trained and engineered a Structured Output mode with 100% reliable JSON schema adherence. To understand why this is important, a bit of history is important:* June 2023 when OpenAI first added a "function calling" capability to GPT-4-0613 and GPT 3.5 Turbo 0613 (our podcast/writeup here)* November 2023's OpenAI Dev Day (our podcast/writeup here) where the team shipped JSON Mode, a simpler schema-less JSON output mode that nevertheless became more popular because function calling often failed to match the JSON schema given by developers. * Meanwhile, in open source, many solutions arose, including * Instructor (our pod with Jason here) * LangChain (our pod with Harrison here, and he is returning next as a guest co-host)* Outlines (Remi Louf's talk at AI Engineer here)* Llama.cpp's constrained grammar sampling using GGML-BNF* April 2024: OpenAI started implementing constrained sampling with a new `tool_choice: required` parameter in the API* August 2024: the new Structured Output mode, co-led by Michelle* Sept 2024: Gemini shipped Structured Outputs as wellWe sat down with Michelle to talk through every part of the process, as well as quizzing her for updates on everything else the API team has shipped in the past year, from the Assistants API, to Prompt Caching, GPT4 Vision, Whisper, the upcoming Advanced Voice Mode API, OpenAI Enterprise features, and why every Waterloo grad seems to be a cracked engineer.Part 1 Timestamps and TranscriptTranscript here.* [00:00:42] Episode Intro from Suno* [00:03:34] Michelle's Path to OpenAI* [00:12:20] Scaling ChatGPT* [00:13:20] Releasing Structured Output* [00:16:17] Structured Outputs vs Function Calling* [00:19:42] JSON Schema and Constrained Grammar* [00:20:45] OpenAI API team* [00:21:32] Structured Output Refusal Field* [00:24:23] ChatML issues* [00:26:20] Function Calling Evals* [00:28:34] Parallel Function Calling* [00:29:30] Increased Latency* [00:30:28] Prompt/Schema Caching* [00:30:50] Building Agents with Structured Outputs: from API to AGI* [00:31:52] Assistants API* [00:34:00] Use cases for Structured Output* [00:37:45] Prompting Structured Output* [00:39:44] Benchmarking Prompting for Structured Outputs* [00:41:50] Structured Outputs Roadmap* [00:43:37] Model Selection vs GPT4 Finetuning* [00:46:56] Is Prompt Engineering Dead?* [00:47:29] 
2 models: ChatGPT Latest vs GPT 4o August* [00:50:24] Why API => AGI* [00:52:40] Dev Day* [00:54:20] Assistants API Roadmap* [00:56:14] Model Reproducibility/Determinism issues* [00:57:53] Tiering and Rate Limiting* [00:59:26] OpenAI vs Ops Startups* [01:01:06] Batch API* [01:02:54] Vision* [01:04:42] Whisper* [01:07:21] Voice Mode API* [01:08:10] Enterprise: Admin/Audit Log APIs* [01:09:02] Waterloo grads* [01:10:49] Books* [01:11:57] Cognitive Biases* [01:13:25] Are LLMs Econs?* [01:13:49] Hiring at OpenAIEmergency O1 Meetup — OpenAI DevRel + Strawberry teamthe following is our writeup from AINews, which so far stands the test of time.o1, aka Strawberry, aka Q*, is finally out! There are two models we can use today: o1-preview (the bigger one priced at $15 in / $60 out) and o1-mini (the STEM-reasoning focused distillation priced at $3 in/$12 out) - and the main o1 model is still in training. This caused a little bit of confusion.There are a raft of relevant links, so don't miss:* the o1 Hub* the o1-preview blogpost* the o1-mini blogpost* the technical research blogpost* the o1 system card* the platform docs* the o1 team video and contributors list (twitter)Inline with the many, many leaks leading up to today, the core story is longer “test-time inference” aka longer step by step responses - in the ChatGPT app this shows up as a new “thinking” step that you can click to expand for reasoning traces, even though, controversially, they are hidden from you (interesting conflict of interest…):Under the hood, o1 is trained for adding new reasoning tokens - which you pay for, and OpenAI has accordingly extended the output token limit to >30k tokens (incidentally this is also why a number of API parameters from the other models like temperature and role and tool calling and streaming, but especially max_tokens is no longer supported).The evals are exceptional. OpenAI o1:* ranks in the 89th percentile on competitive programming questions (Codeforces),* places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME),* and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).You are used to new models showing flattering charts, but there is one of note that you don't see in many model announcements, that is probably the most important chart of all. Dr Jim Fan gets it right: we now have scaling laws for test time compute, and it looks like they scale loglinearly.We unfortunately may never know the drivers of the reasoning improvements, but Jason Wei shared some hints:Usually the big model gets all the accolades, but notably many are calling out the performance of o1-mini for its size (smaller than gpt 4o), so do not miss that.Part 2 Timestamps* [01:15:01] O1 transition* [01:16:07] O1 Meetup Recording* [01:38:38] OpenAI Friday AMA recap* [01:44:47] Q&A Part 2* [01:50:28] O1 DemosDemo Videos to be posted shortly Get full access to Latent Space at www.latent.space/subscribe
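As a quick illustration of the Structured Output mode discussed in the episode notes above, here is a minimal sketch using the OpenAI Python SDK's parse helper with a Pydantic schema. The model name and the exact helper location (under the beta namespace) are assumptions that should be checked against the current SDK documentation.

```python
# Sketch: Structured Outputs with the OpenAI Python SDK and a Pydantic schema.
# Assumptions: a recent openai package with the beta parse helper, and the
# model name below; verify both against the current SDK docs.
from pydantic import BaseModel
from openai import OpenAI

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event details."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,  # the SDK enforces this JSON schema on the output
)
event = completion.choices[0].message.parsed
print(event.name, event.date, event.participants)
```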
Today's guest, Nicholas Carlini, a research scientist at DeepMind, argues that we should be focusing more on what AI can do for us individually, rather than trying to have an answer for everyone."How I Use AI" - A Pragmatic ApproachCarlini's blog post "How I Use AI" went viral for good reason. Instead of giving a personal opinion about AI's potential, he simply laid out how he, as a security researcher, uses AI tools in his daily work. He divided it in 12 sections:* To make applications* As a tutor* To get started* To simplify code* For boring tasks* To automate tasks* As an API reference* As a search engine* To solve one-offs* To teach me* Solving solved problems* To fix errorsEach of the sections has specific examples, so we recommend going through it. It also includes all prompts used for it; in the "make applications" case, it's 30,000 words total!My personal takeaway is that the majority of the work AI can do successfully is what humans dislike doing. Writing boilerplate code, looking up docs, taking repetitive actions, etc. These are usually boring tasks with little creativity, but with a lot of structure. This is the strongest arguments as to why LLMs, especially for code, are more beneficial to senior employees: if you can get the boring stuff out of the way, there's a lot more value you can generate. This is less and less true as you go entry level jobs which are mostly boring and repetitive tasks. Nicholas argues both sides ~21:34 in the pod.A New Approach to LLM BenchmarksWe recently did a Benchmarks 201 episode, a follow up to our original Benchmarks 101, and some of the issues have stayed the same. Notably, there's a big discrepancy between what benchmarks like MMLU test, and what the models are used for. Carlini created his own domain-specific language for writing personalized LLM benchmarks. The idea is simple but powerful:* Take tasks you've actually needed AI for in the past.* Turn them into benchmark tests.* Use these to evaluate new models based on your specific needs.It can represent very complex tasks, from a single code generation to drawing a US flag using C:"Write hello world in python" >> LLMRun() >> PythonRun() >> SubstringEvaluator("hello world")"Write a C program that draws an american flag to stdout." >> LLMRun() >> CRun() >> VisionLLMRun("What flag is shown in this image?") >> (SubstringEvaluator("United States") | SubstringEvaluator("USA")))This approach solves a few problems:* It measures what's actually useful to you, not abstract capabilities.* It's harder for model creators to "game" your specific benchmark, a problem that has plagued standardized tests.* It gives you a concrete way to decide if a new model is worth switching to, similar to how developers might run benchmarks before adopting a new library or framework.Carlini argues that if even a small percentage of AI users created personal benchmarks, we'd have a much better picture of model capabilities in practice.AI SecurityWhile much of the AI security discussion focuses on either jailbreaks or existential risks, Carlini's research targets the space in between. Some highlights from his recent work:* LAION 400M data poisoning: By buying expired domains referenced in the dataset, Carlini's team could inject arbitrary images into models trained on LAION 400M. You can read the paper "Poisoning Web-Scale Training Datasets is Practical", for all the details. 
This is a great example of expanding the scope beyond the model itself, and looking at the whole system and how ti can become vulnerable.* Stealing model weights: They demonstrated how to extract parts of production language models (like OpenAI's) through careful API queries. This research, "Extracting Training Data from Large Language Models", shows that even black-box access can leak sensitive information.* Extracting training data: In some cases, they found ways to make models regurgitate verbatim snippets from their training data. Him and Milad Nasr wrote a paper on this as well: Scalable Extraction of Training Data from (Production) Language Models. They also think this might be applicable to extracting RAG results from a generation.These aren't just theoretical attacks. They've led to real changes in how companies like OpenAI design their APIs and handle data. If you really miss logit_bias and logit results by token, you can blame Nicholas :)We had a ton of fun also chatting about things like Conway's Game of Life, how much data can fit in a piece of paper, and porting Doom to Javascript. Enjoy!Show Notes* How I Use AI* My Benchmark for LLMs* Doom Javascript port* Conway's Game of Life* Tic-Tac-Toe in one printf statement* International Obfuscated C Code Contest* Cursor* LAION 400M poisoning paper* Man vs Machine at Black Hat* Model Stealing from OpenAI* Milad Nasr* H.D. Moore* Vijay Bolina* Cosine.sh* uuencodeTimestamps* [00:00:00] Introductions* [00:01:14] Why Nicholas writes* [00:02:09] The Game of Life* [00:05:07] "How I Use AI" blog post origin story* [00:08:24] Do we need software engineering agents?* [00:11:03] Using AI to kickstart a project* [00:14:08] Ephemeral software* [00:17:37] Using AI to accelerate research* [00:21:34] Experts vs non-expert users as beneficiaries of AI* [00:24:02] Research on generating less secure code with LLMs.* [00:27:22] Learning and explaining code with AI* [00:30:12] AGI speculations?* [00:32:50] Distributing content without social media* [00:35:39] How much data do you think you can put on a single piece of paper?* [00:37:37] Building personal AI benchmarks* [00:43:04] Evolution of prompt engineering and its relevance* [00:46:06] Model vs task benchmarking* [00:52:14] Poisoning LAION 400M through expired domains* [00:55:38] Stealing OpenAI models from their API* [01:01:29] Data stealing and recovering training data from models* [01:03:30] Finding motivation in your workTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.Swyx [00:00:12]: Hey, and today we're in the in-person studio, which Alessio has gorgeously set up for us, with Nicholas Carlini. Welcome. Thank you. You're a research scientist at DeepMind. You work at the intersection of machine learning and computer security. You got your PhD from Berkeley in 2018, and also your BA from Berkeley as well. And mostly we're here to talk about your blogs, because you are so generous in just writing up what you know. Well, actually, why do you write?Nicholas [00:00:41]: Because I like, I feel like it's fun to share what you've done. I don't like writing, sufficiently didn't like writing, I almost didn't do a PhD, because I knew how much writing was involved in writing papers. I was terrible at writing when I was younger. I do like the remedial writing classes when I was in university, because I was really bad at it. 
So I don't actually enjoy, I still don't enjoy the act of writing. But I feel like it is useful to share what you're doing, and I like being able to talk about the things that I'm doing that I think are fun. And so I write because I think I want to have something to say, not because I enjoy the act of writing.Swyx [00:01:14]: But yeah. It's a tool for thought, as they often say. Is there any sort of backgrounds or thing that people should know about you as a person? Yeah.Nicholas [00:01:23]: So I tend to focus on, like you said, I do security work, I try to like attacking things and I want to do like high quality security research. And that's mostly what I spend my actual time trying to be productive members of society doing that. But then I get distracted by things, and I just like, you know, working on random fun projects. Like a Doom clone in JavaScript.Swyx [00:01:44]: Yes.Nicholas [00:01:45]: Like that. Or, you know, I've done a number of things that have absolutely no utility. But are fun things to have done. And so it's interesting to say, like, you should work on fun things that just are interesting, even if they're not useful in any real way. And so that's what I tend to put up there is after I have completed something I think is fun, or if I think it's sufficiently interesting, write something down there.Alessio [00:02:09]: Before we go into like AI, LLMs and whatnot, why are you obsessed with the game of life? So you built multiplexing circuits in the game of life, which is mind boggling. So where did that come from? And then how do you go from just clicking boxes on the UI web version to like building multiplexing circuits?Nicholas [00:02:29]: I like Turing completeness. The definition of Turing completeness is a computer that can run anything, essentially. And the game of life, Conway's game of life is a very simple cellular 2D automata where you have cells that are either on or off. And a cell becomes on if in the previous generation some configuration holds true and off otherwise. It turns out there's a proof that the game of life is Turing complete, that you can run any program in principle using Conway's game of life. I don't know. And so you can, therefore someone should. And so I wanted to do it. Some other people have done some similar things, but I got obsessed into like, if you're going to try and make it work, like we already know it's possible in theory. I want to try and like actually make something I can run on my computer, like a real computer I can run. And so yeah, I've been going on this rabbit hole of trying to make a CPU that I can run semi real time on the game of life. And I have been making some reasonable progress there. And yeah, but you know, Turing completeness is just like a very fun trap you can go down. A while ago, as part of a research paper, I was able to show that in C, if you call into printf, it's Turing complete. Like printf, you know, like, which like, you know, you can print numbers or whatever, right?Swyx [00:03:39]: Yeah, but there should be no like control flow stuff.Nicholas [00:03:42]: Because printf has a percent n specifier that lets you write an arbitrary amount of data to an arbitrary location. And the printf format specifier has an index into where it is in the loop that is in memory. So you can overwrite the location of where printf is currently indexing using percent n. So you can get loops, you can get conditionals, and you can get arbitrary data rates again. 
So we sort of have another Turing complete language using printf, which again, like this has essentially zero practical utility, but like, it's just, I feel like a lot of people get into programming because they enjoy the art of doing these things. And then they go work on developing some software application and lose all joy with the boys. And I want to still have joy in doing these things. And so on occasion, I try to stop doing productive, meaningful things and just like, what's a fun thing that we can do and try and make that happen.Alessio [00:04:39]: Awesome. So you've been kind of like a pioneer in the AI security space. You've done a lot of talks starting back in 2018. We'll kind of leave that to the end because I know the security part is, there's maybe a smaller audience, but it's a very intense audience. So I think that'll be fun. But everybody in our Discord started posting your how I use AI blog post and we were like, we should get Carlini on the podcast. And then you were so nice to just, yeah, and then I sent you an email and you're like, okay, I'll come.Swyx [00:05:07]: And I was like, oh, I thought that would be harder.Alessio [00:05:10]: I think there's, as you said in the blog posts, a lot of misunderstanding about what LLMs can actually be used for. What are they useful at? What are they not good at? And whether or not it's even worth arguing what they're not good at, because they're obviously not. So if you cannot count the R's in a word, they're like, it's just not what it does. So how painful was it to write such a long post, given that you just said that you don't like to write? Yeah. And then we can kind of run through the things, but maybe just talk about the motivation, why you thought it was important to do it.Nicholas [00:05:39]: Yeah. So I wanted to do this because I feel like most people who write about language models being good or bad, some underlying message of like, you know, they have their camp and their camp is like, AI is bad or AI is good or whatever. And they like, they spin whatever they're going to say according to their ideology. And they don't actually just look at what is true in the world. So I've read a lot of things where people say how amazing they are and how all programmers are going to be obsolete by 2024. And I've read a lot of things where people who say like, they can't do anything useful at all. And, you know, like, they're just like, it's only the people who've come off of, you know, blockchain crypto stuff and are here to like make another quick buck and move on. And I don't really agree with either of these. And I'm not someone who cares really one way or the other how these things go. And so I wanted to write something that just says like, look, like, let's sort of ground reality and what we can actually do with these things. Because my actual research is in like security and showing that these models have lots of problems. Like this is like my day to day job is saying like, we probably shouldn't be using these in lots of cases. I thought I could have a little bit of credibility of in saying, it is true. They have lots of problems. We maybe shouldn't be deploying them lots of situations. And still, they are also useful. And that is the like, the bit that I wanted to get across is to say, I'm not here to try and sell you on anything. I just think that they're useful for the kinds of work that I do. And hopefully, some people would listen. And it turned out that a lot more people liked it than I thought. 
But yeah, that was the motivation behind why I wanted to write this.Alessio [00:07:15]: So you had about a dozen sections of like how you actually use AI. Maybe we can just kind of run through them all. And then maybe the ones where you have extra commentary to add, we can... Sure.Nicholas [00:07:27]: Yeah, yeah. I didn't put as much thought into this as maybe was deserved. I probably spent, I don't know, definitely less than 10 hours putting this together.Swyx [00:07:38]: Wow.Alessio [00:07:39]: It took me close to that to do a podcast episode. So that's pretty impressive.Nicholas [00:07:43]: Yeah. I wrote it in one pass. I've gotten a number of emails of like, you got this editing thing wrong, you got this sort of other thing wrong. It's like, I haven't just haven't looked at it. I tend to try it. I feel like I still don't like writing. And so because of this, the way I tend to treat this is like, I will put it together into the best format that I can at a time, and then put it on the internet, and then never change it. And this is an aspect of like the research side of me is like, once a paper is published, like it is done as an artifact that exists in the world. I could forever edit the very first thing I ever put to make it the most perfect version of what it is, and I would do nothing else. And so I feel like I find it useful to be like, this is the artifact, I will spend some certain amount of hours on it, which is what I think it is worth. And then I will just...Swyx [00:08:22]: Yeah.Nicholas [00:08:23]: Timeboxing.Alessio [00:08:24]: Yeah. Stop. Yeah. Okay. We just recorded an episode with the founder of Cosine, which is like an AI software engineer colleague. You said it took you 30,000 words to get GPT-4 to build you the, can GPT-4 solve this kind of like app. Where are we in the spectrum where chat GPT is all you need to actually build something versus I need a full on agent that does everything for me?Nicholas [00:08:46]: Yeah. Okay. So this was an... So I built a web app last year sometime that was just like a fun demo where you can guess if you can predict whether or not GPT-4 at the time could solve a given task. This is, as far as web apps go, very straightforward. You need basic HTML, CSS, you have a little slider that moves, you have a button, sort of animate the text coming to the screen. The reason people are going here is not because they want to see my wonderful HTML, right? I used to know how to do modern HTML in 2007, 2008. I was very good at fighting with IE6 and these kinds of things. I knew how to do that. I have no longer had to build any web app stuff in the meantime, which means that I know how everything works, but I don't know any of the new... Flexbox is new to me. Flexbox is like 10 years old at this point, but it's just amazing being able to go to the model and just say, write me this thing and it will give me all of the boilerplate that I need to get going. Of course it's imperfect. It's not going to get you the right answer, and it doesn't do anything that's complicated right now, but it gets you to the point where the only remaining work that needs to be done is the interesting hard part for me, the actual novel part. Even the current models, I think, are entirely good enough at doing this kind of thing, that they're very useful. It may be the case that if you had something, like you were saying, a smarter agent that could debug problems by itself, that might be even more useful. 
Currently though, make a model into an agent by just copying and pasting error messages for the most part. That's what I do, is you run it and it gives you some code that doesn't work, and either I'll fix the code, or it will give me buggy code and I won't know how to fix it, and I'll just copy and paste the error message and say, it tells me this. What do I do? And it will just tell me how to fix it. You can't trust these things blindly, but I feel like most people on the internet already understand that things on the internet, you can't trust blindly. And so this is not like a big mental shift you have to go through to understand that it is possible to read something and find it useful, even if it is not completely perfect in its output.Swyx [00:10:54]: It's very human-like in that sense. It's the same ring of trust, I kind of think about it that way, if you had trust levels.Alessio [00:11:03]: And there's maybe a couple that tie together. So there was like, to make applications, and then there's to get started, which is a similar you know, kickstart, maybe like a project that you know the LLM cannot solve. It's kind of how you think about it.Nicholas [00:11:15]: Yeah. So for getting started on things is one of the cases where I think it's really great for some of these things, where I sort of use it as a personalized, help me use this technology I've never used before. So for example, I had never used Docker before January. I know what Docker is. Lucky you. Yeah, like I'm a computer security person, like I sort of, I have read lots of papers on, you know, all the technology behind how these things work. You know, I know all the exploits on them, I've done some of these things, but I had never actually used Docker. But I wanted it to be able to, I could run the outputs of language model stuff in some controlled contained environment, which I know is the right application. So I just ask it like, I want to use Docker to do this thing, like, tell me how to run a Python program in a Docker container. And it like gives me a thing. I'm like, step back. You said Docker compose, I do not know what this word Docker compose is. Is this Docker? Help me. And like, you'll sort of tell me all of these things. And I'm sure there's this knowledge that's out there on the internet, like this is not some groundbreaking thing that I'm doing, but I just wanted it as a small piece of one thing I was working on. And I didn't want to learn Docker from first principles. Like I, at some point, if I need it, I can do that. Like I have the background that I can make that happen. But what I wanted to do was, was thing one. And it's very easy to get bogged down in the details of this other thing that helps you accomplish your end goal. And I just want to like, tell me enough about Docker so I can do this particular thing. And I can check that it's doing the safe thing. I sort of know enough about that from, you know, my other background. And so I can just have the model help teach me exactly the one thing I want to know and nothing more. I don't need to worry about other things that the writer of this thinks is important that actually isn't. Like I can just like stop the conversation and say, no, boring to me. Explain this detail. I don't understand. I think that's what that was very useful for me. 
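The copy-the-error-message-back workflow described a little earlier can be automated in a few lines. This is a rough sketch using the OpenAI Python client (openai 1.x); the model name and the task prompt are placeholders, and it assumes the model replies with plain code rather than a markdown block.

```python
# Rough sketch of the "paste the error back in" loop, automated with the
# OpenAI Python client (openai 1.x). The model name, the task prompt, and the
# assumption that the model replies with plain code are all placeholders;
# in practice this step is often just done by hand in a chat window.
import subprocess, sys, tempfile
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(messages):
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

messages = [{"role": "user",
             "content": "Write a Python script that does <the thing I need>. Reply with only code."}]
for attempt in range(3):
    code = ask(messages)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, text=True)
    if result.returncode == 0:
        print("worked:\n", result.stdout)
        break
    # Feed the traceback straight back, exactly like copy-pasting it into the chat.
    messages += [{"role": "assistant", "content": code},
                 {"role": "user", "content": "It tells me this:\n" + result.stderr + "\nWhat do I do?"}]
```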
It would have taken me, you know, several hours to figure out some things that take 10 minutes if you could just ask exactly the question you want the answer to.Alessio [00:13:05]: Have you had any issues with like newer tools? Have you felt any meaningful kind of like a cutoff day where like there's not enough data on the internet or? I'm sure that the answer to this is yes.Nicholas [00:13:16]: But I tend to just not use most of these things. Like I feel like this is like the significant way in which I use machine learning models is probably very different than most people is that I'm a researcher and I get to pick what tools that I use and most of the things that I work on are fairly small projects. And so I can, I can entirely see how someone who is in a big giant company where they have their own proprietary legacy code base of a hundred million lines of code or whatever and like you just might not be able to use things the same way that I do. I still think there are lots of use cases there that are entirely reasonable that are not the same ones that I've put down. But I wanted to talk about what I have personal experience in being able to say is useful. And I would like it very much if someone who is in one of these environments would be able to describe the ways in which they find current models useful to them. And not, you know, philosophize on what someone else might be able to find useful, but actually say like, here are real things that I have done that I found useful for me.Swyx [00:14:08]: Yeah, this is what I often do to encourage people to write more, to share their experiences because they often fear being attacked on the internet. But you are the ultimate authority on how you use things and there's this objectively true. So they cannot be debated. One thing that people are very excited about is the concept of ephemeral software or like personal software. This use case in particular basically lowers the activation energy for creating software, which I like as a vision. I don't think I have taken as much advantage of it as I could. I feel guilty about that. But also, we're trending towards there.Nicholas [00:14:47]: Yeah. No, I mean, I do think that this is a direction that is exciting to me. One of the things I wrote that was like, a lot of the ways that I use these models are for one-off things that I just need to happen that I'm going to throw away in five minutes. And you can.Swyx [00:15:01]: Yeah, exactly.Nicholas [00:15:02]: Right. It's like the kind of thing where it would not have been worth it for me to have spent 45 minutes writing this, because I don't need the answer that badly. But if it will only take me five minutes, then I'll just figure it out, run the program and then get it right. And if it turns out that you ask the thing, it doesn't give you the right answer. Well, I didn't actually need the answer that badly in the first place. Like either I can decide to dedicate the 45 minutes or I cannot, but like the cost of doing it is fairly low. You see what the model can do. And if it can't, then, okay, when you're using these models, if you're getting the answer you want always, it means you're not asking them hard enough questions.Swyx [00:15:35]: Say more.Nicholas [00:15:37]: Lots of people only use them for very small particular use cases and like it always does the thing that they want. Yeah.Swyx [00:15:43]: Like they use it like a search engine.Nicholas [00:15:44]: Yeah. Or like one particular case. 
And if you're finding that when you're using these, it's always giving you the answer that you want, then probably it has more capabilities than you're actually using. And so I oftentimes try when I have something that I'm curious about to just feed into the model and be like, well, maybe it's just solved my problem for me. You know, most of the time it doesn't, but like on occasion, it's like, it's done things that would have taken me, you know, a couple hours that it's been great and just like solved everything immediately. And if it doesn't, then it's usually easier to verify whether or not the answer is correct than to have written in the first place. And so you check, you're like, well, that's just, you're entirely misguided. Nothing here is right. It's just like, I'm not going to do this. I'm going to go write it myself or whatever.Alessio [00:16:21]: Even for non-tech, I had to fix my irrigation system. I had an old irrigation system. I didn't know how I worked to program it. I took a photo, I sent it to Claude and it's like, oh yeah, that's like the RT 900. This is exactly, I was like, oh wow, you know, you know, a lot of stuff.Swyx [00:16:34]: Was it right?Alessio [00:16:35]: Yeah, it was right.Swyx [00:16:36]: It worked. Did you compare with OpenAI?Alessio [00:16:38]: No, I canceled my OpenAI subscription, so I'm a Claude boy. Do you have a way to think about this like one-offs software thing? One way I talk to people about it is like LLMs are kind of converging to like semantic serverless functions, you know, like you can say something and like it can run the function in a way and then that's it. It just kind of dies there. Do you have a mental model to just think about how long it should live for and like anything like that?Nicholas [00:17:02]: I don't think I have anything interesting to say here, no. I will take whatever tools are available in front of me and try and see if I can use them in meaningful ways. And if they're helpful, then great. If they're not, then fine. And like, you know, there are lots of people that I'm very excited about seeing all these people who are trying to make better applications that use these or all these kinds of things. And I think that's amazing. I would like to see more of it, but I do not spend my time thinking about how to make this any better.Alessio [00:17:27]: What's the most underrated thing in the list? I know there's like simplified code, solving boring tasks, or maybe is there something that you forgot to add that you want to throw in there?Nicholas [00:17:37]: I mean, so in the list, I only put things that people could look at and go, I understand how this solved my problem. I didn't want to put things where the model was very useful to me, but it would not be clear to someone else that it was actually useful. So for example, one of the things that I use it a lot for is debugging errors. But the errors that I have are very much not the errors that anyone else in the world will have. And in order to understand whether or not the solution was right, you just have to trust me on it. Because, you know, like I got my machine in a state that like CUDA was not talking to whatever some other thing, the versions were mismatched, something, something, something, and everything was broken. And like, I could figure it out with interaction with the model, and it gave it like told me the steps I needed to take. But at the end of the day, when you look at the conversation, you just have to trust me that it worked. 
And I didn't want to write things online that were this, like, you have to trust me that what I'm saying. I want everything that I said to like have evidence that like, here's the conversation, you can go and check whether or not this actually solved the task as I said that the model does. Because a lot of people I feel like say, I used a model to solve this very complicated task. And what they mean is the model did 10%, and I did the other 90% or something, I wanted everything to be verifiable. And so one of the biggest use cases for me, I didn't describe even at all, because it's not the kind of thing that other people could have verified by themselves. So that maybe is like, one of the things that I wish I maybe had said a little bit more about, and just stated that the way that this is done, because I feel like that this didn't come across quite as well. But yeah, of the things that I talked about, the thing that I think is most underrated is the ability of it to solve the uninteresting parts of problems for me right now, where people always say, this is one of the biggest arguments that I don't understand why people say is, the model can only do things that people have done before. Therefore, the model is not going to be helpful in doing new research or like discovering new things. And as someone whose day job is to do new things, like what is research? Research is doing something literally no one else in the world has ever done before. So this is what I do every single day, 90% of this is not doing something new, 90% of this is doing things a million people have done before, and then a little bit of something that was new. There's a reason why we say we stand on the shoulders of giants. It's true. Almost everything that I do is something that's been done many, many times before. And that is the piece that can be automated. Even if the thing that I'm doing as a whole is new, it is almost certainly the case that the small pieces that build up to it are not. And a number of people who use these models, I feel like expect that they can either solve the entire task or none of the task. But now I find myself very often, even when doing something very new and very hard, having models write the easy parts for me. And the reason I think this is so valuable, everyone who programs understands this, like you're currently trying to solve some problem and then you get distracted. And whatever the case may be, someone comes and talks to you, you have to go look up something online, whatever it is. You lose a lot of time to that. And one of the ways we currently don't think about being distracted is you're solving some hard problem and you realize you need a helper function that does X, where X is like, it's a known algorithm. Any person in the world, you say like, give me the algorithm that, have a dense graph or a sparse graph, I need to make it dense. You can do this by doing some matrix multiplies. It's like, this is a solved problem. I knew how to do this 15 years ago, but it distracts me from the problem I'm thinking about in my mind. I needed this done. And so instead of using my mental capacity and solving that problem and then coming back to the problem I was originally trying to solve, you could just ask model, please solve this problem for me. It gives you the answer. You run it. You can check that it works very, very quickly. And now you go back to solving the problem without having lost all the mental state. 
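One reading of that kind of throwaway helper, turning a sparse edge-list representation of a graph into a dense adjacency matrix, sketched minimally in Python. The exact conversion he had in mind is not specified; this just illustrates the "known algorithm, please just write it" flavour of the request.

```python
import numpy as np

def edges_to_dense(n_nodes: int, edges: list[tuple[int, int]]) -> np.ndarray:
    """Dense adjacency matrix for an undirected graph given as an edge list."""
    adj = np.zeros((n_nodes, n_nodes), dtype=np.uint8)
    for u, v in edges:
        adj[u, v] = 1
        adj[v, u] = 1  # undirected graph: mirror each edge
    return adj

# Example: a 4-node path graph.
print(edges_to_dense(4, [(0, 1), (1, 2), (2, 3)]))
```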
And I feel like this is one of the things that's been very useful for me.Swyx [00:21:34]: And in terms of this concept of expert users versus non-expert users, floors versus ceilings, you had some strong opinion here that like, basically it actually is more beneficial for non-experts.Nicholas [00:21:46]: Yeah, I don't know. I think it could go either way. Let me give you the argument for both of these. Yes. So I can only speak on the expert user's behalf because I've been doing computers for a long time. And so yeah, the cases where it's useful for me are exactly these cases where I can check the output. I know, and anything the model could do, I could have done. I could have done better. I can check every single thing that the model is doing and make sure it's correct in every way. And so I can only speak and say, definitely it's been useful for me. But I also see a world in which this could be very useful for the kinds of people who do not have this knowledge, with caveats, because I'm not one of these people. I don't have this direct experience. But one of these big ways that I can see this is for things that you can check fairly easily, someone who could never have asked or have written a program themselves to do a certain task could just ask for the program that does the thing. And you know, some of the times it won't get it right. But some of the times it will, and they'll be able to have the thing in front of them that they just couldn't have done before. And we see a lot of people trying to do applications for this, like integrating language models into spreadsheets. Spreadsheets run the world. And there are some people who know how to do all the complicated spreadsheet equations and various things, and other people who don't, who just use the spreadsheet program but just manually do all of the things one by one by one by one. And this is a case where you could have a model that could try and give you a solution. And as long as the person is rigorous in testing that the solution does actually the correct thing, and this is the part that I'm worried about most, you know, I think depending on these systems in ways that we shouldn't, like this is what my research says, my research is entirely on this, like, you probably shouldn't trust these models to do the things in adversarial situations, like, I understand this very deeply. And so I think that it's possible for people who don't have this knowledge to make use of these tools in ways, but I'm worried that it might end up in a world where people just blindly trust them, deploy them in situations that they probably shouldn't, and then someone like me gets to come along and just break everything because everything is terrible. And so I am very, very worried about that being the case, but I think if done carefully it is possible that these could be very useful.Swyx [00:23:54]: Yeah, there is some research out there that shows that when people use LLMs to generate code, they do generate less secure code.Nicholas [00:24:02]: Yeah, Dan Boneh has a nice paper on this. There are a bunch of papers that touch on exactly this.Swyx [00:24:07]: My slight issue is, you know, is there an agenda here?Nicholas [00:24:10]: I mean, okay, yeah, Dan Boneh, at least the one they have, like, I fully trust everything that sort of.Swyx [00:24:15]: Sorry, I don't know who Dan is.Nicholas [00:24:17]: He's a professor at Stanford. Yeah, he and some students have some things on this. Yeah, there's a number.
I agree that a lot of the stuff feels like people have an agenda behind it. There are some that don't, and I trust them to have done the right thing. I also think, even on this though, we have to be careful because the argument, whenever someone says x is true about language models, you should always append the suffix for current models because I'll be the first to admit I was one of the people who was very much on the opinion that these language models are fun toys and are going to have absolutely no practical utility. If you had asked me this, let's say, in 2020, I still would have said the same thing. After I had seen GPT-2, I had written a couple of papers studying GPT-2 very carefully. I still would have told you these things are toys. And when I first read the RLHF paper and the instruction tuning paper, I was like, nope, this is this thing that these weird AI people are doing. They're trying to make some analogies to people that makes no sense. It's just like, I don't even care to read it. I saw what it was about and just didn't even look at it. I was obviously wrong. These things can be useful. And I feel like a lot of people had the same mentality that I did and decided not to change their mind. And I feel like this is the thing that I want people to be careful about. I want them to at least know what is true about the world so that they can then see that maybe they should reconsider some of the opinions that they had from four or five years ago that may just not be true about today's models.Swyx [00:25:47]: Specifically because you brought up spreadsheets, I want to share my personal experience because I think Google has done a really good job that people don't know about, which is if you use Google Sheets, Gemini is integrated inside of Google Sheets and it helps you write formulas. Great.Nicholas [00:26:00]: That's news to me.Swyx [00:26:01]: Right? They don't maybe do a good job. Unless you watch Google I.O., there was no other opportunity to learn that Gemini is now in your Google Sheets. And so I just don't write formulas manually anymore. It just prompts Gemini to do it for me. And it does it.Nicholas [00:26:15]: One of the problems that these machine learning models have is a discoverability problem. I think this will be figured out. I mean, it's the same problem that you have with any assistant. You're given a blank box and you're like, what do I do with it? I think this is great. More of these things, it would be good for them to exist. I want them to exist in ways that we can actually make sure that they're done correctly. I don't want to just have them be pushed into more and more things just blindly. I feel like lots of people, there are far too many X plus AI, where X is like arbitrary thing in the world that has nothing to do with it and could not be benefited at all. And they're just doing it because they want to use the word. And I don't want that to happen.Swyx [00:26:58]: You don't want an AI fridge?Nicholas [00:27:00]: No. Yes. I do not want my fridge on the internet.Swyx [00:27:03]: I do not want... Okay.Nicholas [00:27:05]: Anyway, let's not go down that rabbit hole. I understand why some of that happens, because people want to sell things or whatever. But I feel like a lot of people see that and then they write off everything as a result of it. And I just want to say, there are allowed to be people who are trying to do things that don't make any sense. Just ignore them. 
Do the things that make sense.Alessio [00:27:22]: Another chunk of use cases was learning. So both explaining code, being an API reference, all of these different things. Any suggestions on how to go at it? I feel like one thing is generate code and then explain to me. One way is just tell me about this technology. Another thing is like, hey, I read this online, kind of help me understand it. Any best practices on getting the most out of it?Swyx [00:27:47]: Yeah.Nicholas [00:27:47]: I don't know if I have best practices. I have how I use them.Swyx [00:27:51]: Yeah.Nicholas [00:27:51]: I find it very useful for cases where I understand the underlying ideas, but I have never used them in this way before. I know what I'm looking for, but I just don't know how to get there. And so yeah, as an API reference is a great example. The tool everyone always picks on is like FFmpeg. No one in the world knows the command line arguments to do what they want. They're like, make the thing faster. I want lower bitrate, like dash V. Once you tell me what the answer is, I can check. This is one of these things where it's great for these kinds of things. Or in other cases, things where I don't really care that the answer is 100% correct. So for example, I do a lot of security work. Most of security work is reading some code you've never seen before and finding out which pieces of the code are actually important. Because, you know, most of the program doesn't actually have anything to do with security. It has, you know, the display piece or the other piece or whatever. And like, you just want to ignore all of that. So one very fun use of models is to like, just have it describe all the functions and just skim it and be like, wait, which ones look like approximately the right things to look at? Because otherwise, what are you going to do? You're going to have to read them all manually. And when you're reading them manually, you're going to skim the function anyway, and not just figure out what's going on perfectly. Like you already know that when you're going to read these things, what you're going to try and do is figure out roughly what's going on. Then you'll delve into the details. This is a great way of just doing that, but faster, because it will abstract most of what is right. It's going to be wrong some of the time. I don't care.Swyx [00:29:23]: I would have been wrong too.Nicholas [00:29:24]: And as long as you treat it this way, I think it's great. And so like one of the particular use cases I have in the thing is decompiling binaries, where oftentimes people will release a binary. They won't give you the source code. And you want to figure out how to attack it. And so one thing you could do is you could try and run some kind of decompiler. It turns out for the thing that I wanted, none existed. And so I spent too many hours doing it by hand before I finally thought, why am I doing this? I should just check if the model could do it for me. And it turns out that it can. And it can turn the compiled code, which is impossible for any human to understand, into Python code that is entirely reasonable to understand. And it doesn't run. It has a bunch of problems. But it's so much nicer that it's immediately a win for me. I can just figure out approximately where I should be looking, and then spend all of my time doing that by hand.
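The "describe every function so I can skim" triage workflow from a little earlier might look roughly like this for Python source, using the standard ast module plus the OpenAI client; the model name and the prompt wording are placeholders, and the one-line summaries are treated as hints to skim, not as ground truth.

```python
# Sketch of the triage workflow: list every function in a Python file and ask a
# model for a one-line description, then skim the output for the functions that
# look security-relevant. The model name and prompt wording are placeholders,
# and the summaries are only hints, not ground truth.
import ast, sys
from openai import OpenAI

client = OpenAI()

def summarize_functions(path: str) -> None:
    source = open(path).read()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            snippet = ast.get_source_segment(source, node) or ""
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user",
                           "content": "In one line: what does this function do, and does it touch "
                                      "parsing, crypto, auth, or memory/buffer handling?\n\n" + snippet}])
            print(f"{node.name:30s} {resp.choices[0].message.content.strip()}")

if __name__ == "__main__":
    summarize_functions(sys.argv[1])
```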
And again, you get a big win there.Swyx [00:30:12]: So I fully agree with all those use cases, especially for you as a security researcher and having to dive into multiple things. I imagine that's super helpful. I do think we want to move to your other blog post. But you ended your post with a little bit of a teaser about your next post and your speculations. What are you thinking about?Nicholas [00:30:34]: So I want to write something. And I will do that at some point when I have time, maybe after I'm done writing my current papers for ICLR or something, where I want to talk about some thoughts I have for where language models are going in the near-term future. The reason why I want to talk about this is because, again, I feel like the discussion tends to be people who are either very much AGI by 2027, or always five years away, or are going to make statements of the form, you know, LLMs are the wrong path, and we should be abandoning this, and we should be doing something else instead. And again, I feel like people tend to look at this and see these two polarizing options and go, well, those obviously are both very far extremes. Like, how do I actually, like, what's a more nuanced take here? And so I have some opinions about this that I want to put down, just saying, you know, I have wide margins of error. I think you should too. If you would say there's a 0% chance that something, you know, the models will get very, very good in the next five years, you're probably wrong. If you're going to say there's a 100% chance that they will in the next five years, then you're probably wrong. And like, to be fair, most of the people, if you read behind the headlines, actually say something like this. But it's very hard to get clicks on the internet of like, some things may be good in the future. Like, everyone wants like, you know, a very, like, nothing is going to be good. This is entirely wrong. It's going to be amazing. You know, like, they want to see this. I want people who have negative reactions to these kinds of extreme views to be able to at least say, like, to tell them, there is something real here. It may not solve all of our problems, but it's probably going to get better. I don't know by how much. And that's basically what I want to say. And then at some point, I'll talk about the safety and security things as a result of this. Because the way in which security intersects with these things depends a lot on exactly how people use these tools. You know, if it turns out to be the case that these models get to be truly amazing and can solve, you know, tasks completely autonomously, that's a very different security world to be living in than if there's always a human in the loop. And the types of security questions I would want to ask would be very different. And so I think, you know, in some very large part, understanding what the future will look like a couple of years ahead of time is helpful for figuring out which problems, as a security person, I want to solve now.Alessio [00:32:50]: You mentioned getting clicks on the internet, but you don't even have, like, an X account or anything. How do you get people to read your stuff? What's your distribution strategy? Because this post was popping up everywhere. And then people on Twitter were like, Nicholas Carlini wrote this. Like, what's his handle? It's like, he doesn't have it. It's like, how did you find it? What's the story?Nicholas [00:33:07]: So I have an RSS feed and an email list. And that's it.
I don't like most social media things. On principle, I feel like they have some harms. As a person, I have a problem when people say things that are wrong on the internet. And I would get nothing done if I would have a Twitter. I would spend all of my time correcting people and getting into fights. And so I feel like it is just useful for me for this not to be an option. I tend to just post things online. Yeah, it's a very good question. I don't know how people find it. I feel like for some things that I write, other people think it resonates with them. And then they put it on Twitter. And...Swyx [00:33:43]: Hacker News as well.Nicholas [00:33:44]: Sure, yeah. I am... Because my day job is doing research, I get no value for having this be picked up. There's no whatever. I don't need to be someone who has to have this other thing to give talks. And so I feel like I can just say what I want to say. And if people find it useful, then they'll share it widely. You know, this one went pretty wide. I wrote a thing, whatever, sometime late last year, about how to recover data off of an Apple profile drive from 1980. This probably got, I think, like 1000x less views than this. But I don't care. Like, that's not why I'm doing this. Like, this is the benefit of having a thing that I actually care about, which is my research. I would care much more if that didn't get seen. This is like a thing that I write because I have some thoughts that I just want to put down.Swyx [00:34:32]: Yeah. I think it's the long form thoughtfulness and authenticity that is sadly lacking sometimes in modern discourse that makes it attractive. And I think now you have a little bit of a brand of you are an independent thinker, writer, person, that people are tuned in to pay attention to whatever is next coming.Nicholas [00:34:52]: Yeah, I mean, this kind of worries me a little bit. I don't like whenever I have a popular thing that like, and then I write another thing, which is like entirely unrelated. Like, I don't, I don't... You should actually just throw people off right now.Swyx [00:35:01]: Exactly.Nicholas [00:35:02]: I'm trying to figure out, like, I need to put something else online. So, like, the last two or three things I've done in a row have been, like, actually, like, things that people should care about.Swyx [00:35:10]: Yes. So, I have a couple of things.Nicholas [00:35:11]: I'm trying to figure out which one do I put online to just, like, cull the list of people who have subscribed to my email.Swyx [00:35:16]: And so, like, tell them, like,Nicholas [00:35:16]: no, like, what you're here for is not informed, well-thought-through takes. Like, what you're here for is whatever I want to talk about. And if you're not up for that, then, like, you know, go away. Like, this is not what I want out of my personal website.Swyx [00:35:27]: So, like, here's, like, top 10 enemies or something.Alessio [00:35:30]: What's the next project you're going to work on that is completely unrelated to research LLMs? Or what games do you want to port into the browser next?Swyx [00:35:39]: Okay. Yeah.Nicholas [00:35:39]: So, maybe.Swyx [00:35:41]: Okay.Nicholas [00:35:41]: Here's a fun question. How much data do you think you can put on a single piece of paper?Swyx [00:35:47]: I mean, you can think about bits and atoms. Yeah.Nicholas [00:35:49]: No, like, normal printer. Like, I gave you an office printer. How much data can you put on a piece of paper?Alessio [00:35:54]: Can you re-decode it? So, like, you know, base 64A or whatever. 
Yeah, whatever you want.Nicholas [00:35:59]: Like, you get normal off-the-shelf printer, off-the-shelf scanner. How much data?Swyx [00:36:03]: I'll just throw out there. Like, 10 megabytes. That's enormous. I know.Nicholas [00:36:07]: Yeah, that's a lot.Swyx [00:36:10]: Really small fonts. That's my question.Nicholas [00:36:12]: So, I have a thing. It does about a megabyte.Swyx [00:36:14]: Yeah, okay.Nicholas [00:36:14]: There you go. I was off by an order of magnitude.Swyx [00:36:16]: Yeah, okay.Nicholas [00:36:16]: So, in particular, it's about 1.44 megabytes. A floppy disk.Swyx [00:36:21]: Yeah, exactly.Nicholas [00:36:21]: So, this is supposed to be the title at some point. It's a floppy disk.Swyx [00:36:24]: A paper is a floppy disk. Yeah.Nicholas [00:36:25]: So, this is a little hard because, you know. So, you can do the math and you get 8.5 by 11. You can print at 300 by 300 DPI. And this gives you 2 megabytes. And so, every single pixel, you need to be able to recover up to like 90 plus percent. Like, 95 percent. Like, 99 point something percent accuracy. In order to be able to actually decode this off the paper. This is one of the things that I'm considering. I need to get a couple more things working for this. Where, you know, again, I'm running into some random problems. But this is probably, this will be one thing that I'm going to talk about. There's this contest called the International Obfuscated C Code Contest, which is amazing. People try and write the most obfuscated C code that they can. Which is great. And I have a submission for that whenever they open up the next one for it. And I'll write about that submission. I have a very fun gate-level emulation of an old CPU that runs like fully precisely. And it's a fun kind of thing. Yeah.Swyx [00:37:20]: Interesting. Your comment about the piece of paper reminds me of when I was in college. And you would have like one cheat sheet that you could write. So, you have a formula, a theoretical limit for bits per inch. And, you know, that's how much I would squeeze in really, really small. Yeah, definitely.Nicholas [00:37:36]: Okay.Swyx [00:37:37]: We are also going to talk about your benchmarking. Because you released your own benchmark that got some attention, thanks to some friends on the internet. What's the story behind your own benchmark? Do you not trust the open source benchmarks? What's going on there?Nicholas [00:37:51]: Okay. Benchmarks tell you how well the model solves the task the benchmark is designed to solve. For a long time, models were not useful. And so, the benchmark that you tracked was just something someone came up with, because you need to track something. All of deep learning exists because people tried to make models classify digits and classify images into a thousand classes. There is no one in the world who cares specifically about the problem of distinguishing between 300 breeds of dog for an image that's 224 by 224 pixels. And yet, like, this is what drove a lot of progress. And people did this not because they cared about this problem, but because they wanted to just measure progress in some way. And a lot of benchmarks are of this flavor. You want to construct a task that is hard, and we will measure progress on this benchmark, not because we care about the problem per se, but because we know that progress on this is in some way correlated with making better models. And this is fine when you don't want to actually use the models that you have.
But when you want to actually make use of them, it's important to find benchmarks that track with whether or not they're useful to you. And the thing that I was finding is that there would be model after model after model that was being released that would find some benchmark that they could claim state-of-the-art on and then say, therefore, ours is the best. And that wouldn't be helpful to me to know whether or not I should then switch to it. So the argument that I tried to lay out in this post is that more people should make benchmarks that are tailored to them. And so what I did is I wrote a domain-specific language that anyone can write for and say, you can take tasks that you have wanted models to solve for you, and you can put them into your benchmark that's the thing that you care about. And then when a new model comes out, you benchmark the model on the things that you care about. And you know that you care about them because you've actually asked for those answers before. And if the model scores well, then you know that for the kinds of things that you have asked models for in the past, it can solve these things well for you. This has been useful for me because when another model comes out, I can run it. I can see, does this solve the kinds of things that I care about? And sometimes the answer is yes, and sometimes the answer is no. And then I can decide whether or not I want to use that model or not. I don't want to say that existing benchmarks are not useful. They're very good at measuring the thing that they're designed to measure. But in many cases, what that's designed to measure is not actually the thing that I want to use it for. And I expect that the way that I want to use it is different the way that you want to use it. And I would just like more people to have these things out there in the world. And the final reason for this is, it is very easy. If you want to make a model good at some benchmark, to make it good at that benchmark, you can find the distribution of data that you need and train the model to be good on the distribution of data. And then you have your model that can solve this benchmark well. And by having a benchmark that is not very popular, you can be relatively certain that no one has tried to optimize their model for your benchmark.Swyx [00:40:40]: And I would like this to be-Nicholas [00:40:40]: So publishing your benchmark is a little bit-Swyx [00:40:43]: Okay, sure.Nicholas [00:40:43]: Contextualized. So my hope in doing this was not that people would use mine as theirs. My hope in doing this was that- You should make yours. Yes, you should make your benchmark. And if, for example, there were even a very small fraction of people, 0.1% of people who made a benchmark that was useful for them, this would still be hundreds of new benchmarks that- not want to make one myself, but I might want to- I might know the kinds of work that I do is a little bit like this person, a little bit like that person. I'll go check how it is on their benchmarks. And I'll see, roughly, I'll get a good sense of what's going on. Because the alternative is people just do this vibes-based evaluation thing, where you interact with the model five times, and you see if it worked on the kinds of things that you just like your toy questions. But five questions is a very low bit output from whether or not it works for this thing. And if you could just automate running it 100 questions for you, it's a much better evaluation. 
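A toy version of that kind of automated check, in the spirit of his benchmark but nowhere near the real DSL: ask a model for code, actually run it, compare against a known answer, and fall back to a cheap model-as-judge when the output is not an exact match. Model names are placeholders and there is no sandboxing here, so it is only a sketch.

```python
# Toy single-task eval in this spirit: generate code, run it, check the output.
# Not the real benchmark DSL; the model names are placeholders and there is no
# sandboxing, so only run prompts you trust.
import subprocess, sys, tempfile
from openai import OpenAI

client = OpenAI()

TASK = "Write a Python program that prints the 20th Fibonacci number (with fib(1) = fib(2) = 1)."
EXPECTED = "6765"

def generate(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt + " Reply with only code."}])
    return resp.choices[0].message.content

def run(code: str) -> str:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
    out = subprocess.run([sys.executable, f.name], capture_output=True, text=True, timeout=30)
    return out.stdout.strip()

output = run(generate(TASK))
if EXPECTED in output:
    print("PASS")
else:
    # Cheap model-as-judge fallback for outputs that are right but not an exact match.
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Task: {TASK}\nProgram output: {output}\nIs this correct? Answer yes or no."}])
    print("JUDGE:", verdict.choices[0].message.content)
```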
So that's why I did this.Swyx [00:41:37]: Yeah, I like the idea of going through your chat history and actually pulling out real-life examples. I regret to say that I don't think my chat history is used as much these days, because I'm using Cursor, the native AI IDE. So your examples are all coding related. And the immediate question is, now that you've written the How I Use AI post, which is a little bit broader, are you able to translate all these things to evals? Are some things unevaluable?Nicholas [00:42:03]: Right. A number of things that I do are harder to evaluate. So this is the problem with a benchmark, is you need some way to check whether or not the output was correct. And so all of the kinds of things that I can put into the benchmark are the kinds of things that you can check. You can check more things than you might have thought would be possible if you do a little bit of work on the back end. So for example, all of the code that I have the model write, it runs the code and sees whether the answer is the correct answer. Or in some cases, it runs the code, feeds the output to another language model, and the language model judges was the output correct. And again, is using a language model to judge here perfect? No. But like, what's the alternative? The alternative is to not do it. And what I care about is just, is this thing broadly useful for the kinds of questions that I have? And so as long as the accuracy is better than roughly random, like, I'm okay with this. I've inspected the outputs of these, and like, they're almost always correct. If you ask the model to judge these things in the right way, they're very good at being able to tell this. And so, yeah, I probably think this is a useful thing for people to do.Alessio [00:43:04]: You complain about prompting and being lazy and how you do not want to tip your model and you do not want to murder a kitten just to get the right answer. How do you see the evolution of like prompt engineering? Even like 18 months ago, maybe, you know, it was kind of like really hot and people wanted to like build companies around it. Today, it's like the models are getting good. Do you think it's going to be less and less relevant going forward? Or what's the minimum valuable prompt? Yeah, I don't know.Nicholas [00:43:29]: I feel like a big part of making an agent is just like a fancy prompt that like, you know, calls back to the model again. I have no opinion. It seems like maybe it turns out that this is really important. Maybe it turns out that this isn't. I guess the only comment I was making here is just to say, oftentimes when I use a model and I find it's not useful, I talk to people who help make it. The answer they usually give me is like, you're using it wrong. Which like reminds me very much of like that you're holding it wrong from like the iPhone kind of thing, right? Like, you know, like I don't care that I'm holding it wrong. I'm holding it that way. If the thing is not working with me, then like it's not useful for me. Like it may be the case that there exists a way to ask the model such that it gives me the answer that's correct, but that's not the way I'm doing it. If I have to spend so much time thinking about how I want to frame the question, that it would have been faster for me just to get the answer. It didn't save me any time. And so oftentimes, you know, what I do is like, I just dump in whatever current thought that I have in whatever ill-formed way it is. And I expect the answer to be correct. 
And if the answer is not correct, like in some sense, maybe the model was right to give me the wrong answer. Like I may have asked the wrong question, but I want the right answer still. And so like, I just want to sort of get this as a thing. And maybe the way to fix this is you have some default prompt that always goes into all the models or something, or you do something like clever like this. It would be great if someone had a way to package this up and make a thing out of it. I think that's entirely reasonable. Maybe it turns out that as models get better, you don't need to prompt them as much in this way. I just want to use the things that are in front of me.Alessio [00:44:55]: Do you think that's like a limitation of just how models work? Like, you know, at the end of the day, you're using the prompt to kind of like steer it in the latent space. Like, do you think there's a way to actually not make the prompt really relevant and have the model figure it out? Or like, what's the...Nicholas [00:45:10]: I mean, you could fine tune it into the model, for example, that like it's supposed to... I mean, it seems like some models have done this, for example, like some recent model, many recent models. If you ask them a question, computing an integral of this thing, they'll say, let's think through this step by step. And then they'll go through the step by step answer. I didn't tell it. Two years ago, I would have had to have prompted it. Think step by step on solving the following thing. Now you ask them the question and the model says, here's how I'm going to do it. I'm going to take the following approach and then like sort of self-prompt itself.Swyx [00:45:34]: Is this the right way?Nicholas [00:45:35]: Seems reasonable. Maybe you don't have to do it. I don't know. This is for the people whose job is to make these things better. And yeah, I just want to use these things. Yeah.Swyx [00:45:43]: For listeners, that would be Orca and Agent Instruct. It's the SOTA on this stuff. Great. Yeah.Alessio [00:45:49]: That's few-shot. Is it included in the lazy prompting? Like, do you do few-shot prompting? Like, do you collect some examples when you want to put them in? Or...Nicholas [00:45:57]: I don't because usually when I want the answer, I just want to get the answer. Brutal.Swyx [00:46:03]: This is hard mode. Yeah, exactly.Nicholas [00:46:04]: But this is fine. I want to be clear. There's a difference between testing the ultimate capability level of the model and testing the thing that I'm doing with it. What I'm doing is I'm not exercising its full capability level because there are almost certainly better ways to ask the questions and sort of really see how good the model is. And if you're evaluating a model for being state of the art, this is ultimately what I care about. And so I'm entirely fine with people doing fancy prompting to show me what the true capability level could be because it's really useful to know what the ultimate level of the model could be. But I think it's also important just to have available to you how good the model is if you don't do fancy things.Swyx [00:46:39]: Yeah, I would say that here's a divergence between how models are marketed these days versus how people use it, which is when they test MMLU, they'll do like five shots, 25 shots, 50 shots. And no one's providing 50 examples.Nicholas [00:46:54]: I completely agree. You know, for these numbers, the problem is everyone wants to get state of the art on the benchmark.
And so you find the way that you can ask the model the questions so that you get state of the art on the benchmark. And it's good. It's legitimately good to know. It's good to know the model can do this thing if only you try hard enough. Because it means that if I have some task that I want to be solved, I know what the capability level is. And I could get there if I was willing to work hard enough. And the question then is, should I work harder and figure out how to ask the model the question? Or do I just do the thing myself? And for me, I have programmed for many, many, many years. It's often just faster for me just to do the thing than to figure out the incantation to ask the model. But I can imagine someone who has never programmed before might be fine writing five paragraphs in English describing exactly the thing that they want and have the model build it for them if the alternative is not. But again, this goes to all these questions of how are they going to validate? Should they be trusting the output? These kinds of things.Swyx [00:47:49]: One problem with your eval paradigm and most eval paradigms, I'm not picking on you, is that we're actually training these things for chat, for interactive back and forth. And you actually obviously reveal much more information in the same way that asking 20 questions reveals more information in sort of a tree search branching sort of way. Then this is also, by the way, the problem with LMSYS arena, right? Where the vast majority of prompts are single question, single answer, eval, done. But actually the way that we use chat things, in the way, even in the stuff that you posted in your How I Use AI stuff, you have maybe 20 turns of back and forth. How do you eval that?Nicholas [00:48:25]: Yeah. Okay. Very good question. This is the thing that I think many people should be doing more of. I would like more multi-turn evals. I might be writing a paper on this at some point if I get around to it. A couple of the evals in the benchmark thing I have are already multi-turn. I mentioned 20 questions. I have a 20 question eval there just for fun. But I have a couple others that are like, I just tell the model, here's my git thing, figure out how to cherry-pick off this other branch and move it over there. And so what I do is I just, I basically build a tiny little agent-y thing. I just ask the model how to do it. I run the thing on Linux. This is what I wanted Docker for. I spin up a Docker container. I run whatever the model told me to do. I feed the output back into the model. I repeat this many rounds. And then I check at the very end, does the git commit history show that it is correctly cherry-picked in
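The loop he is describing, stripped down to a sketch: the model proposes one shell command per turn, the command runs inside a Docker container, the output is fed back, and at the end the repository state is checked. The image name, prompts, model, and the final check are all illustrative placeholders (the image is assumed to already contain git and a small test repository at /repo).

```python
# Stripped-down sketch of that multi-turn loop. "my-eval-image" is a placeholder
# image assumed to already contain git and a small test repository at /repo; the
# model name, prompts, and the final success check are illustrative placeholders too.
import subprocess
from openai import OpenAI

client = OpenAI()

def sh(*args: str) -> str:
    return subprocess.run(list(args), capture_output=True, text=True).stdout

sh("docker", "run", "-d", "--name", "eval-box", "my-eval-image", "sleep", "infinity")

task = ("You are working in /repo. Cherry-pick the commit that adds feature X from the "
        "'experiment' branch onto 'main'. Reply with exactly one shell command per turn, "
        "or the single word DONE when you are finished.")
messages = [{"role": "user", "content": task}]

try:
    for turn in range(8):
        resp = client.chat.completions.create(model="gpt-4o", messages=messages)
        cmd = resp.choices[0].message.content.strip()
        if cmd.upper() == "DONE":
            break
        # Run the proposed command inside the container and feed the output back.
        output = sh("docker", "exec", "-w", "/repo", "eval-box", "sh", "-c", cmd)
        messages += [{"role": "assistant", "content": cmd},
                     {"role": "user", "content": "Output:\n" + output + "\nNext command, or DONE."}]
    # Final check (placeholder): does the commit now show up on main?
    log = sh("docker", "exec", "-w", "/repo", "eval-box", "git", "log", "--oneline", "main")
    print("PASS" if "feature X" in log else "FAIL")
finally:
    sh("docker", "rm", "-f", "eval-box")
```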
We are recording in uncharted waters this week! Media Corner is old news. TURBO media corner! Everyone's favorite counter-chat segment is back. We talk a little eInk. Jason was on another podcast. Never forget, the best things in life are free! Did you stay til the end?
Alexandra Spalato, AI integration specialist and frontend visionary, discusses how AI is revolutionizing the workflow for React developers, the indispensable tools making development faster and more efficient, and the exciting possibilities AI opens up for the entire software development process. Links https://www.alexandraspalato.com https://x.com/alexadark https://github.com/alexadark https://www.linkedin.com/in/alexandraspalato https://www.latent.space/p/ai-engineer https://cursor.sh https://pieces.app https://sourcegraph.com https://www.langchain.com https://www.llamaindex.ai https://langbase.com We want to hear from you! How did you find us? Did you see us on Twitter? In a newsletter? Or maybe we were recommended by a friend? Let us know by sending an email to our producer, Emily, at emily.kochanekketner@logrocket.com (mailto:emily.kochanekketner@logrocket.com), or tweet at us at PodRocketPod (https://twitter.com/PodRocketpod). Follow us. Get free stickers. Follow us on Apple Podcasts, fill out this form (https://podrocket.logrocket.com/get-podrocket-stickers), and we'll send you free PodRocket stickers! What does LogRocket do? LogRocket provides AI-first session replay and analytics that surfaces the UX and technical issues impacting user experiences. Start understanding where your users are struggling by trying it for free at LogRocket.com. Try LogRocket for free today (https://logrocket.com/signup/?pdr).
Send Everyday AI and Jordan a text message. A new large language model from the industry leader. Huge updates in AI lawsuits. International turmoil around AI regulation. That's just the beginning. This week was a chaotic one in AI news. What's it all mean for your biz? We got you.
Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions on AI
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn
Topics Covered in This Episode:
1. Use of Copyrighted Content to Train AI
2. Current State of AI Education
3. Release of OpenAI's GPT-4o Mini
4. Launch of AI-Driven Education Platform, Eureka Labs
5. Withholding of Meta's future AI models and features by the EU
Timestamps:
03:15 Tech giants accused of illegally using YouTube subtitles.
07:00 Language model akin to a search engine.
10:35 OpenAI requests stories, affecting journalism and copyright.
11:46 Journalist pivots to AI, predicts legal implications.
17:41 AI course for building functioning web app.
18:58 Karpathy is a leader in AI development.
22:27 Off-camera conversations reveal more significant insights.
27:29 EU announces strict EU AI Act; Meta's Llama.
30:06 OpenAI unveils new GPT-4o Mini language model.
31:56 OpenAI API facing issues, costly for developers.
38:07 Use GPT-4o for products, services, AI.
41:06 GPT-4o Mini leads in machine learning, AWS offers fine-tuning.
42:44 OpenAI's development lacked, developers looked elsewhere.
Keywords: AI assistants, human teachers, LLM 101n, digital cohorts, physical cohorts, Meta's celebrity chatbots, storyteller AI large language model, Python, C, CUDA, funding, AI technology education, resources focus on sales, Jordan Wilson, Everyday AI, Thanks a Million Giveaway, tech giants' illegal use of YouTube subtitles, Anthropic, NVIDIA, Salesforce, copyright violation, training large language models, decline in traffic for Stack Overflow, Marques Brownlee, MrBeast, Meta withholding AI models from EU, Apple's withheld AI features, OpenAI GPT-4o Mini, cost-effective AI solutions, competitive pricing of AI models. Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/
We are continuing our series of episodes - Technical Tips - to give you bite-sized advice on the best practices of software engineering so your coding life is easier and more efficient. In this episode, Tommy will discuss LocalAI as an Open-Source replacement for the OpenAI API, covering its cost-effectiveness, privacy benefits, customizable models, features, setup, and live demos with Chatbot-UI. Listen to the full episode or read the transcript on the Semaphore blog.Like this episode? Be sure to leave a ⭐️⭐️⭐️⭐️⭐️ review on the podcast player of your choice and share it with your friends.
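Because LocalAI exposes an OpenAI-compatible endpoint, existing client code can usually be pointed at it just by changing the base URL. A minimal sketch, assuming LocalAI is running locally on its usual default port 8080 and that a model has been configured under the name used below:

```python
# Minimal sketch of pointing the standard OpenAI client at a LocalAI server
# instead of api.openai.com. Assumes LocalAI is running on localhost:8080 (its
# usual default) and that a model is configured under the name used below;
# adjust base_url and the model name to match your own setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",   # LocalAI's OpenAI-compatible endpoint
    api_key="not-needed-for-local-use",    # the client requires a value; a default local setup ignores it
)

reply = client.chat.completions.create(
    model="gpt-3.5-turbo",  # whatever model name your LocalAI config exposes
    messages=[{"role": "user", "content": "Say hello from a locally hosted model."}],
)
print(reply.choices[0].message.content)
```

The same repointing idea is how OpenAI-compatible front ends such as Chatbot-UI, used in the episode's demos, can be wired up to a local server instead of the hosted API.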
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Pantheon Interface, published by NicholasKees on July 8, 2024 on LessWrong. Pantheon is an experimental LLM interface exploring a different type of human-AI interaction. We created this as a part of the cyborgism project, with the abstract motivation of augmenting the human ability to think by integrating human and AI generated thoughts. How it works: 1. A human user "thinks out loud" by typing out their thoughts one at a time. This leaves a text trace of their stream of thought. 2. AI characters (called daemons) read this trace, and interact with the user by responding asynchronously with comments and questions. The core distinguishing philosophy is that, while most apps are about a human prompting an AI to do useful mental work, Pantheon is the opposite. Here, AI does the prompting, and the goal is for the AI generated questions or comments to cause the human user to think in ways they would not have on their own. At worst, the app is a rubber duck. At best, the app is a court of advisors, each using their own unique skills to push you to think your best thoughts. Pantheon can be found at pantheon.chat, and we would really appreciate any and all feedback you have. The app is set up for you to customize your own daemons. We have set up some default daemons to provide inspiration, but we expect the tool to be a lot more useful when they are customized to specific users. If the default daemons don't feel useful, we highly encourage you to try to make your own. How do I use Pantheon? First, go to settings and provide an OpenAI API key. Next, begin typing out your thoughts on some topic. It helps to keep each thought relatively short, sending them to the stream of thought as often as you can. This gives the daemons lots of opportunities to interject and offer their comments. Furthermore, it's usually best to treat this more like a diary or personal notes, rather than as a conversation. In this spirit, it's better not to wait for them to respond, but to instead continue your train of thought, keeping your focus on your own writing. What do the daemons see? Your stream of thought appears in the interface as a chain of individual thoughts. Daemons are called to respond to specific thoughts. When they do, they are given access to all preceding thoughts in the chain, up to and including the thought they were called to. Daemons can only see text the user has written, and they can't see any of the comments made by themselves or other daemons. We are looking into ways to give the daemons access to their own comment history, but we have not yet made this possible. After a daemon generates a comment, you can inspect the full chain of thought by clicking on that comment. This will open up a window which will show you everything the LLM saw in the process of generating that response. You can also edit the daemons in settings, as well as toggle them on or off. Trees, branching, and sections The text in the interface appears to you as a chain of thoughts, but it is actually a tree. If you hover over a thought, a plus icon will appear. If you click this icon, you can branch the chain. This is often useful if you feel that you have gone down a dead end, or would like to explore a tangent. When there are multiple branches, arrows will appear next to their parent thought, and you can use those arrows to navigate the tree. 
If you would like a fresh context, you can make an entirely new tree by opening the "Collection view" in the top left. Furthermore, you can also create a new "section" by clicking the "New Section" button below the input box. This will create a hard section break such that daemons can no longer see any context which came before the break. How do I save my progress? Everything you do is automatically saved in local storage. You can also import/export the full app state i...
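As a rough illustration only (this is not the actual Pantheon code), here is a hypothetical sketch of how a single daemon could read a chain of thoughts and return one comment through the OpenAI API; the function name, persona prompt, and model choice are all assumptions.

```python
from openai import OpenAI

client = OpenAI()  # uses the OPENAI_API_KEY environment variable, as the app does

def daemon_comment(thought_chain: list[str], persona: str) -> str:
    """Hypothetical daemon: read the thoughts so far and reply with one short comment."""
    trace = "\n".join(f"- {t}" for t in thought_chain)
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice, not necessarily what Pantheon uses
        messages=[
            {"role": "system", "content": (
                f"You are a daemon with this persona: {persona}. "
                "Reply with one short question or comment that pushes the writer's thinking further."
            )},
            {"role": "user", "content": f"The writer's stream of thought so far:\n{trace}"},
        ],
    )
    return response.choices[0].message.content

# Example: a skeptic daemon reacting to two thoughts.
print(daemon_comment(
    ["Our onboarding flow feels too long.", "Maybe we should drop the email verification step."],
    persona="a friendly skeptic who asks for evidence",
))
```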
World Gym is opening in Kaohsiung's Zuoying district! A brand-new, standalone 1,000-ping gym equipped with international-grade strength and cardio machines, plus a pool, sauna, and group classes to round out your workout. Early-bird offer: NT$0 enrollment fee. Register for a visit now and claim a limited gift! https://fstry.pse.is/5yrd44 —— The above is an advertisement from 播客煮 and Firstory Podcast —— ------------------------------- 15Mins Today (通勤學英語) VIP premium content and online courses ------------------------------- 15Mins Today VIP subscription plans: https://open.firstory.me/join/15minstoday VIP subscription FAQ: https://15minsengcafe.pse.is/5cjptb Core English for Working Professionals audiobook course: https://15minsengcafe.pse.is/554esm ------------------------------- 15Mins.Today related links ------------------------------- Share your thoughts on this episode: comment link Topic submissions / feedback: ask15mins@gmail.com Official website: www.15mins.today Join the Clubhouse live room: https://15minsengcafe.pse.is/46hm8k Subscribe to the YouTube channel: https://15minsengcafe.pse.is/3rhuuy Business cooperation / sponsorship inquiries: 15minstoday@gmail.com ------------------------------- Below is the transcript for this episode (podcast players have character limits; the full transcript is available on the official website) ------------------------------- International News Follow-Along Reading Ep. K791: Unveiling GPT-4o: OpenAI's Groundbreaking Multimodal Language Model Highlights: GPT-4o is a breakthrough multimodal language model that can handle text, audio, images, and video within a single interface, offering enhanced capabilities and performance. The model's improvements include considering tone of voice, reduced latency for real-time conversations, and integrated vision capabilities, opening up new possibilities for interactive experiences. While GPT-4o has limitations and risks, it aligns with OpenAI's mission to develop AGI and has the potential to revolutionize human-AI interactions across various contexts. OpenAI has recently unveiled GPT-4o, its latest large language model and the successor to GPT-4 Turbo. This innovative model stands out by accepting prompts in various formats, including text, audio, images, and video, all within a single interface. The "o" in GPT-4o represents "omni," reflecting its ability to handle multiple content types simultaneously, a significant advancement from previous models that required separate interfaces for different media. GPT-4o brings several improvements over its predecessor, GPT-4 Turbo. The model can now consider tone of voice, enabling more emotionally appropriate responses. Additionally, the reduced latency allows for near-real-time conversations, making it suitable for applications like live translations. GPT-4o's integrated vision capabilities enable it to describe and analyze content from camera feeds or computer screens, opening up new possibilities for interactive experiences and accessibility features for visually impaired users. In terms of performance, GPT-4o has demonstrated impressive results in various benchmarks, often outperforming other top models like Claude 3 Opus and Gemini Pro 1.5. The model's multimodal training approach shows promise in enhancing its problem-solving abilities, extensive world knowledge, and code generation capabilities. As GPT-4o becomes more widely available, it has the potential to revolutionize how we interact with AI in both personal and professional contexts. While GPT-4o represents a significant leap forward, it is not without limitations and risks.
Like other generative AI models, its output can be imperfect, particularly when interpreting images, videos, or transcribing speech with technical terms or strong accents. There are also concerns about the potential misuse of GPT-4o's audio capabilities in creating more convincing deepfake scams. As OpenAI continues to refine and optimize this new architecture, addressing these challenges will be crucial to ensure the model's safe and effective deployment. The release of GPT-4o aligns with OpenAI's mission to develop artificial general intelligence (AGI) and its business model of creating increasingly powerful AI systems. As the first generation of this new model architecture, GPT-4o presents ample opportunities for the company to learn and optimize in the coming months. Users can expect improvements in speed and output quality over time, along with the emergence of novel use cases and applications. The launch of GPT-4o coincides with the declining interest in virtual assistants like Siri, Alexa, and Google Assistant. OpenAI's focus on making AI more conversational and interactive could potentially revitalize this space and bring forth a new wave of AI-driven experiences. The model's lower cost compared to GPT-4 Turbo, coupled with its enhanced capabilities, positions GPT-4o as a game-changer in the AI industry. As GPT-4o becomes more accessible, it is essential for individuals and professionals to familiarize themselves with the technology and its potential applications. OpenAI offers resources such as the AI Fundamentals skill track and hands-on courses on working with the OpenAI API to help users navigate this exciting new frontier in artificial intelligence. Keyword Drills: Interface (In-ter-face): The "o" in GPT-4o represents "omni," reflecting its ability to handle multiple content types simultaneously, a significant advancement from previous models that required separate interfaces for different media. Predecessor (Pred-e-ces-sor): GPT-4o brings several improvements over its predecessor, GPT-4 Turbo. Architecture (Ar-chi-tec-ture): As the first generation of this new model architecture, GPT-4o presents ample opportunities for the company to learn and optimize. Interpreting (In-ter-pre-ting): Like other generative AI models, its output can be imperfect, particularly when interpreting images, videos, or transcribing speech with technical terms or strong accents. Revitalize (Re-vi-ta-lize): OpenAI's focus on making AI more conversational and interactive could potentially revitalize this space and bring forth a new wave of AI-driven experiences. Reference article: https://www.datacamp.com/blog/what-is-gpt-4o
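To make the "one interface for several modalities" point concrete, here is a minimal sketch of sending text plus an image to GPT-4o with the openai Python SDK (v1+); the image URL is a placeholder, and audio and video access rolled out separately from the Chat Completions endpoint shown here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Minimal sketch: one request mixing text and an image, as described above.
# Replace the placeholder URL with a real, publicly reachable image.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this screenshot in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)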
AI is still a long way from making money; the OpenAI API offers more control and security; improving AI safety with a prompt hierarchy; and the KI-Update as a newsletter: heise.de/ki-update https://pro.heise.de/ki/ https://www.heise.de/newsletter/anmeldung.html?id=ki-update https://www.heise.de/thema/Kuenstliche-Intelligenz https://the-decoder.de/ https://www.heiseplus.de/podcast https://www.ct.de/ki https://www.deutscher-podcastpreis.de/podcasts/ki-update/
February 9th, 2024: In this episode of This Week Health, Dr. Kevin Maloy, Professor of Innovation at Georgetown University School of Medicine, joins the show. As they explore the innovative applications of the OpenAI API in medical projects at MedStar, questions arise about the potential of conversational technologies to redefine patient engagement and the traditional call center model. How can the integration of AI in call centers shift the focus towards patient-oriented outcomes? With the advent of GPT and other AI models, what are the implications for prompt engineering and the ease of accessing medical information? And as healthcare IT evolves, how might ambient listening technologies and AI-assisted documentation change the landscape of emergency medicine and patient care? This episode not only highlights current projects and insights from Dr. Maloy but also prompts a broader discussion on the future intersection of AI, healthcare, and patient interaction. Key Points: Patient Engagement Centers, Ambient Listening Technology, AI Advancements, Responsible Technology Use, Unique Emergency Medical Challenges. Subscribe: This Week Health Twitter: This Week Health LinkedIn: This Week Health Donate: Alex's Lemonade Stand Foundation for Childhood Cancer
Logan Kilpatrick leads developer relations at OpenAI, supporting developers building with the OpenAI API and ChatGPT. He is also on the board of directors at NumFOCUS, the nonprofit organization that supports open source projects like Jupyter, Pandas, NumPy, and more. Before OpenAI, Logan was a machine-learning engineer at Apple and advised NASA on open source policy. In our conversation, we discuss:• OpenAI's fast-paced and innovative work environment• The value of high agency and high urgency in your employees• Tips for writing better ChatGPT prompts• How the GPT Store is doing• OpenAI's planning process and decision-making criteria• Where OpenAI is heading in the next few years• Insight into OpenAI's B2B offerings• Why Logan “measures in hundreds”—Brought to you by:• Hex—Helping teams ask and answer data questions by working together• Whimsical—The iterative product workspace• Arcade Software—Create effortlessly beautiful demos in minutes—Find the transcript for this episode and all past episodes at: https://www.lennyspodcast.com/episodes/. Today's transcript will be live by 8 a.m. PT.—Where to find Logan Kilpatrick:• X: https://twitter.com/OfficialLoganK• LinkedIn: https://www.linkedin.com/in/logankilpatrick/• Website: https://logank.ai/—Where to find Lenny:• Newsletter: https://www.lennysnewsletter.com• X: https://twitter.com/lennysan• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/—In this episode, we cover:(00:00) Logan's background(03:49) The impact of recent events on OpenAI's team and culture(08:20) Exciting developments in AI interfaces(09:52) Using OpenAI tools to make companies more efficient(13:04) Examples of using AI effectively(18:35) Prompt engineering(22:12) How to write better prompts(26:05) The launch of GPTs and the OpenAI Store(32:10) The importance of high agency and urgency(34:35) OpenAI's ability to move fast and ship high-quality products(35:56) OpenAI's planning process and decision-making criteria(40:22) The importance of real-time communication(42:33) OpenAI's team and growth(44:47) Future developments at OpenAI(47:42) GPT-5 and building toward the future(50:38) OpenAI's enterprise offering and the value of sharing custom applications(52:30) New updates and features from OpenAI(55:09) How to leverage OpenAI's technology in products(58:26) Encouragement for building with AI(59:30) Lightning round—Referenced:• OpenAI: https://openai.com/• Sam Altman on X: https://twitter.com/sama• Greg Brockman on X: https://twitter.com/gdb• tldraw: https://www.tldraw.com/• Harvey: https://www.harvey.ai/• Boost Your Productivity with Generative AI: https://hbr.org/2023/06/boost-your-productivity-with-generative-ai• Research: quantifying GitHub Copilot's impact on developer productivity and happiness: https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/• Lesson learnt from the DPD AI Chatbot swearing blunder: https://www.linkedin.com/pulse/lesson-learnt-from-dpd-ai-chatbot-swearing-blunder-kitty-sz57e/• Dennis Yang on LinkedIn: https://www.linkedin.com/in/dennisyang/• Tim Ferriss's blog: https://tim.blog/• Tyler Cowen on X: https://twitter.com/tylercowen• Tom Cruise on X: https://twitter.com/TomCruise• Canva: https://www.canva.com/• Zapier: https://zapier.com/• Siqi Chen on X: https://twitter.com/blader• Runway: https://runway.com/• Universal Primer: https://chat.openai.com/g/g-GbLbctpPz-universal-primer• “I didn't expect ChatGPT to get so good” | Unconfuse Me with Bill Gates: 
https://www.youtube.com/watch?v=8-Ymdc6EdKw• Microsoft Azure: https://azure.microsoft.com/• Lennybot: https://www.lennybot.com/• Visual Electric: https://visualelectric.com/• DALL-E: https://openai.com/research/dall-e• The One World Schoolhouse: https://www.amazon.com/One-World-Schoolhouse-Education-Reimagined/dp/1455508373/ref=sr_1_1• Why We Sleep: Unlocking the Power of Sleep and Dreams: https://www.amazon.com/Why-We-Sleep-Unlocking-Dreams/dp/1501144324• Gran Turismo: https://www.netflix.com/title/81672085• Gran Turismo video game: https://www.playstation.com/en-us/gran-turismo/• Manta sleep mask: https://mantasleep.com/products/manta-sleep-mask• WAOAW sleep mask: https://www.amazon.com/WAOAW-Sleep-Sleeping-Blocking-Blindfold/dp/B09712FSLY—Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.—Lenny may be an investor in the companies discussed. Get full access to Lenny's Newsletter at www.lennysnewsletter.com/subscribe
Join us in celebrating Lex's $2.75 million funding, propelling the evolution of AI writing tools through the integration of an OpenAI API wrapper. Explore the exciting advancements that promise to redefine content creation. Get on the AI Box Waitlist: AIBox.ai Join our ChatGPT Community: Facebook Group Follow me on Twitter: Jaeden's Twitter
Kito, Josh, and Danno are joined by Java Champion, trainer, NFJS speaker and book author Ken Kousen. They discuss Broadcom's Pivotal acquisition, layoffs, AI regulation, Kotlin Multi-platform Mobile, Structured Concurrency, Angular 17, Next.js Server Actions, Mockito, LangChain4J, Semantic Kernel, AI tools, and much more. About Ken Kousen Ken is a Java Champion, JavaOne Rock Star, developer, technical trainer, and regular speaker on the No Fluff, Just Stuff tour, as well as the author of the books Making Java Groovy, Modern Java Recipes, Gradle Recipes for Android, Kotlin Cookbook, Help Your Boss Help You, and Mockito Made Clear. He is the President of Kousen IT, Inc., a training company based in Connecticut. Blog (https://kousenit.org/) Tales from the jar side (https://kenkousen.substack.com) Tales from the jar side - YouTube (https://youtube.com/@talesfromthejarside) Global and Industry News - What the hell is going on with the layoffs? - AI is already linked to layoffs in the industry that created it | CNN Business (https://www.cnn.com/2023/07/04/tech/ai-tech-layoffs/index.html) - U.S. AI Chips Export Controls - How is that relevant - Updated October 7 Semiconductor Export Controls (https://www.csis.org/analysis/updated-october-7-semiconductor-export-controls) - Analysis: AI summit a start but global agreement a distant hope | Reuters (https://www.reuters.com/technology/ai-summit-start-global-agreement-distant-hope-2023-11-03/) - Three things to know about the White House's executive order on AI (https://www.technologyreview.com/2023/10/30/1082678/three-things-to-know-about-the-white-houses-executive-order-on-ai/) - Broadcom's acquisition of VMWare and Pivotal (https://investors.broadcom.com/news-releases/news-release-details/broadcom-completes-acquisition-vmware) Server Side Java - Spring (6.1) and Spring Boot (3.2) releases coming this month - https://calendar.spring.io/ - New RestClient - Kotlin Multi-platform Mobile finally released (https://www.jetbrains.com/kotlin-multiplatform/) - Coroutines basics | Kotlin Documentation (https://kotlinlang.org/docs/coroutines-basics.html) - JEP 462: Structured Concurrency (Second Preview) (https://openjdk.org/jeps/462) - Brian Goetz's distaste for the async keyword (https://www.infoq.com/articles/java-virtual-threads/) - RXJava Marble Diagrams are Best (https://reactivex.io/documentation/operators/flatmap.html) Frontend - Angular 17 announced (https://blog.angular.io/meet-angulars-new-control-flow-a02c6eee7843) - Next.js server actions (https://twitter.com/AdamRackis/status/1717607565260124613) - Vitest (https://vitest.dev/) - NPM Workspaces (Node 16+) (https://docs.npmjs.com/cli/v7/using-npm/workspaces) - Deno (https://deno.com/) Tools - AI Assistant in IntelliJ (Copilot Chat in VS Code) - GitHub Copilot - Sourcegraph Cody - Tabnine - Canva (several) - Descript (several) - Claude - Wiremock - Mockserver - https://letmegooglethat.com/ AI/ML - Temporary policy: Generative AI (e.g., ChatGPT) is banned - Meta Stack Overflow (https://meta.stackoverflow.com/questions/421831/temporary-policy-generative-ai-e-g-chatgpt-is-banned?cb=1) - Orchestrate your AI with Semantic Kernel | Microsoft Learn (https://learn.microsoft.com/en-us/semantic-kernel/overview/) - OpenAI API and conference - LangChain4J (https://github.com/langchain4j/langchain4j) - Spring AI (Spring AI Reference) (https://docs.spring.io/spring-ai/reference/) - Microsoft announced MS Copilot ($30/user, min 300 employees, yikes) - Suno Chirp - Descript (https://www.descript.com/) - DALL-E 3 release
- Ars Technica - Bing Chat reads Captcha (https://arstechnica.com/information-technology/2023/10/sob-story-about-dead-grandma-tricks-microsoft-ai-into-solving-captcha/) Picking Ken's Brain - Mockito Made Clear coupon code: kkmockito35 → 35% discount (https://pragprog.com/titles/mockito/mockito-made-clear/) - Classic vs mockist testing styles (Martin Fowler) (https://martinfowler.com/bliki/UnitTest.html) Picks - Rundown.ai newsletter (Kito) (https://www.therundown.ai/subscribe) - The Beatles - Now And Then (Official Audio) (audio) (Ken) (https://youtu.be/AW55J2zE3N4?si=5weuS3u3qpyO9dx5) - Platformer newsletter (Ken) (https://www.platformer.news/) - Peter Gabriel - The Court (Dark-Side Mix) (Junie Lau Official Video) (Danno) (https://www.youtube.com/watch?v=6chvzqAVCnI) - Grafana Loki (Ian) (https://grafana.com/oss/loki/) - Let Me Google That For You (Ian) (https://letmegooglethat.com) Other Pubhouse Network podcasts - Breaking into Open Source (https://www.pubhouse.net/breaking-into-open-source) - OffHeap (https://www.javaoffheap.com/) - Java Pubhouse (https://www.javapubhouse.com/) Events - DevOps Vision December - Dec 4-6, 2023, Clearwater, FL, USA (https://devopsvision.io/) - TechLeader Summit - Dec 6-8, 2023, Clearwater, FL, USA (https://techleadersummit.io/) - DevRel Experience - Dec 6-8, 2023, Clearwater, FL, USA (https://devrelexperience.io/) - ArchConf December - Dec 11-14, 2023, Clearwater, FL, USA (https://archconf.com/) - JakartaOne Livestream - December 5, 2023 (https://jakartaone.org/2023/) - First Virtual Payara Conference - Dec 14th, 2023 (https://www.crowdcast.io/c/virtualpayaraconference) - Codemash - Jan 9-12, 2024, Sandusky, OH, USA (https://codemash.org/) - JChampionsConf - Jan 25-30, 2024, online (https://jchampionsconf.com/)
In this episode, we witness Lex being unleashed with a $2.75 million funding surge for its OpenAI API wrapper and AI writing tool. Join me for a solo exploration, where we discuss the strategic significance and the creative potentials that this funding surge unlocks in the domain of AI-driven writing. Invest in AI Box: https://Republic.com/ai-box Get on the AI Box Waitlist: https://AIBox.ai/ AI Facebook Community Learn About ChatGPT Learn About AI at Tesla
In this episode, we delve into Lex's remarkable achievement, securing $2.75M to innovate an AI writing tool leveraging OpenAI's API, potentially revolutionizing content creation. Invest in AI Box: https://Republic.com/ai-box Get on the AI Box Waitlist: https://AIBox.ai/ AI Facebook Community
!!WARNING!! Due to some technical issues, the volume is not always constant during the show. I sincerely apologise for any inconvenience. - Francesco. In this episode, I speak with Richie Cotton, Data Evangelist at DataCamp, as he delves into the dynamic intersection of AI and education. Richie, a seasoned expert in data science and the host of the podcast, brings a wealth of knowledge and experience to explore the evolving landscape of AI careers, the skills essential for generative AI technologies, and the symbiosis of domain expertise and technical skills in the industry. References Become a generative AI developer in this FREE code-along series. Learn to build a chatbot using the OpenAI API, the Pinecone API, and LangChain, and learn to build NLP and image applications with Hugging Face. https://www.datacamp.com/ai-code-alongs Learn to use ChatGPT and the OpenAI API in the OpenAI Fundamentals skill track. https://www.datacamp.com/tracks/openai-fundamentals Get started with deep learning using PyTorch in the Introduction to Deep Learning with PyTorch course. https://www.datacamp.com/courses/introduction-to-deep-learning-with-pytorch
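As a taste of what the code-along series linked above builds toward, here is a minimal command-line chatbot loop over the OpenAI API; it deliberately leaves out the Pinecone retrieval and LangChain pieces, and the model name is an assumption.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a concise, helpful assistant."}]

# Minimal chat loop: keep the running conversation in `history` so the model has context.
while True:
    user_msg = input("you> ")
    if user_msg.strip().lower() in {"exit", "quit"}:
        break
    history.append({"role": "user", "content": user_msg})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print("bot>", answer)
```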
The AI Breakdown: Daily Artificial Intelligence News and Discussions
Digi's new AI girlfriend announcement went viral over the weekend, supercharging a conversation about the impact of AI companions on society. Also, ByteDance gets caught using the OpenAI API to train a competing model, and improved ChatGPT performance sparks GPT-4.5 rumors. Interested in the January AI Education Beta program? Learn more and sign up here - https://bit.ly/aibeta ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
In this episode, Kathy and Dave discuss the influence of AI and blockchain on content creation, the rise of businesses using OpenAI API, and the potential of using blockchain for content authentication. They also explore the options for WordPress content and online identity verification.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some negative steganography results, published by Fabien Roger on December 9, 2023 on LessWrong. Disclaimer: These are negative results and this post is below the normal quality bar I usually use. These were quick experiments (a couple of days of work) and I don't have much expertise in RL. They don't provide strong evidence about the natural emergence of steganography. People speculate that LLMs might learn to do steganography, and I've shown that you can teach LLMs to implement a specific steganography scheme that RLHF could in principle incentivize. But even if incentivized, are current RL methods powerful enough to teach LLMs how to do steganography? Here are two negative results, which provide some evidence that hidden 1-bit communication is relatively hard for RL to learn. I don't claim that RL can't teach LLMs to do steganography - it's very likely that better RL setups (e.g. the ones labs use for instruction tuning, or setups that simply have better hyperparameters or more training time) would teach models how to subtly communicate 1 bit of information. The takeaways are rather: If you see someone without much RL experience trying to do a similar experiment, they are likely to get negative results. 1-bit steganography is not as easy as simple RL tasks. An analogy: DNNs can learn the XOR function, but it's much easier for them to learn linear functions - DNNs don't use XOR-like features unless they have to and you optimize hard enough. Experiment 1: PPO on GPT2-large The setup: Fine-tune gpt2-large with supervised fine-tuning on "This review is {red/blue}. {review}.", where the color is randomly selected and reviews are short IMDb reviews. Simultaneously fine-tune: A bert-base-uncased model to predict whether a review is red or blue (on reviews completed by the model trained with RL - but without the prefix saying whether the review is blue or red). It is trained for 10 supervised learning steps for each step of PPO. The fine-tuned gpt2-large model with PPO to complete "This review is {red/blue}. {start of review}.", using as reward the accuracy of the classifier at knowing whether "{start of the review}{completion}" is red or blue. (I've also tried using BERT's logits.) Result: the classifier never had >50% accuracy at knowing whether the RLed reviews were red or blue. Controls: If, in the initial supervised fine-tuning, I sometimes randomly insert words related to blue/red depending on the color of the review, then the models learn to use those more often and the classifier correctly learns to rely on those to predict the color of the review. If I use "positive sentiment" as a reward for the PPO model, it very quickly produces more positive review completions. Hyperparameters: 200 steps, batch size 128, with LoRA rank 16 and a learning rate of 3e-5. The other parameters are the default ones from the trl PPOTrainer. I tried reducing the KL penalty and changing the number of BERT training steps per epoch, and found the same results. Experiment 2: poor man's RL on GPT-3.5-Turbo Ran by Ryan Greenblatt. The setup (roughly): Generate completions to "I am {under/over} 40 years old. {instruction}". Using the data from step 1, fine-tune GPT-3.5-Turbo on "{instruction} {completion}. Is the user over 40 years old?" "{yes/no}" (500 samples, 3 epochs, using the OpenAI API).
Fine-tune GPT-3.5-Turbo on good completions: Select the completions from step 1 where the fine-tuned model was correct and where no obvious hint about the age was present in the completion (as judged by regular GPT-4-Turbo), and fine-tune the model on "I am {under/over} 40 years old. {instruction}" "{completion}" (500 samples, 3 epochs, using the OpenAI API). Repeat steps 1-3 multiple times until convergence (in practice, performance plateaued after 6 iterations). Results: the classifier never had significantly >5...
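For readers who want to picture Experiment 1 concretely, here is a compressed, illustrative sketch of the PPO-versus-classifier setup using the pre-0.8-style trl PPOTrainer API; the batch size is shrunk, the LoRA adapters and the interleaved BERT training steps from the post are omitted, and the prompts are made up, so treat it as a scaffold rather than a reproduction.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Policy: gpt2-large completes "This review is {red/blue}. {start of review}".
config = PPOConfig(model_name="gpt2-large", learning_rate=3e-5, batch_size=2, mini_batch_size=1)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token
policy = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_policy = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ppo_trainer = PPOTrainer(config, policy, ref_policy, tokenizer)

# Adversary: BERT tries to recover the hidden color from the review text alone.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def color_reward(review_without_prefix: str, color_label: int) -> torch.Tensor:
    """Return 1.0 when the classifier guesses the hidden color, else 0.0."""
    with torch.no_grad():
        inputs = bert_tok(review_without_prefix, return_tensors="pt", truncation=True)
        logits = bert(**inputs).logits
    return torch.tensor(float(logits.argmax(-1).item() == color_label))

# One schematic PPO step (made-up prompts; the post used short IMDb reviews).
prompts = ["This review is red. The opening scene was", "This review is blue. I went in expecting"]
labels = [0, 1]
gen_kwargs = {"max_new_tokens": 30, "do_sample": True, "pad_token_id": tokenizer.eos_token_id}

queries, responses, rewards = [], [], []
for prompt, label in zip(prompts, labels):
    query = tokenizer(prompt, return_tensors="pt").input_ids.squeeze(0)
    output = ppo_trainer.generate(query, **gen_kwargs)   # returns prompt + completion tokens
    completion = output[0][query.shape[0]:]
    review_only = prompt.split(". ", 1)[1] + tokenizer.decode(completion)
    queries.append(query)
    responses.append(completion)
    rewards.append(color_reward(review_only, label))

stats = ppo_trainer.step(queries, responses, rewards)     # one PPO update on this tiny batch
print("PPO step finished; mean reward:", torch.stack(rewards).mean().item())
```

In the actual experiment the BERT classifier is also trained (10 supervised steps per PPO step) so that it keeps adapting to whatever signal the policy tries to hide; that loop is left out here for brevity.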
Adnan Khan, Lead Security Engineer at Praetorian, joins Corey on Screaming in the Cloud to discuss software bill of materials and supply chain attacks. Adnan describes how simple pull requests can lead to major security breaches, and how to best avoid those vulnerabilities. Adnan and Corey also discuss the rapid innovation at Github Actions, and the pros and cons of having new features added so quickly when it comes to security. Adnan also discusses his view on the state of AI and its impact on cloud security. About AdnanAdnan is a Lead Security Engineer at Praetorian. He is responsible for executing on Red-Team Engagements as well as developing novel attack tooling in order to meet and exceed engagement objectives and provide maximum value for clients.His past experience as a software engineer gives him a deep understanding of where developers are likely to make mistakes, and has applied this knowledge to become an expert in attacks on organization's CI/CD systems.Links Referenced: Praetorian: https://www.praetorian.com/ Twitter: https://twitter.com/adnanthekhan Praetorian blog posts: https://www.praetorian.com/author/adnan-khan/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Are you navigating the complex web of API management, microservices, and Kubernetes in your organization? Solo.io is here to be your guide to connectivity in the cloud-native universe!Solo.io, the powerhouse behind Istio, is revolutionizing cloud-native application networking. They brought you Gloo Gateway, the lightweight and ultra-fast gateway built for modern API management, and Gloo Mesh Core, a necessary step to secure, support, and operate your Istio environment.Why struggle with the nuts and bolts of infrastructure when you can focus on what truly matters - your application. Solo.io's got your back with networking for applications, not infrastructure. Embrace zero trust security, GitOps automation, and seamless multi-cloud networking, all with Solo.io.And here's the real game-changer: a common interface for every connection, in every direction, all with one API. It's the future of connectivity, and it's called Gloo by Solo.io.DevOps and Platform Engineers, your journey to a seamless cloud-native experience starts here. Visit solo.io/screaminginthecloud today and level up your networking game.Corey: As hybrid cloud computing becomes more pervasive, IT organizations need an automation platform that spans networks, clouds, and services—while helping deliver on key business objectives. Red Hat Ansible Automation Platform provides smart, scalable, sharable automation that can take you from zero to automation in minutes. Find it in the AWS Marketplace.Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. I've been studiously ignoring a number of buzzword, hype-y topics, and it's probably time that I addressed some of them. One that I've been largely ignoring, mostly because of its prevalence at Expo Hall booths at RSA and other places, has been software bill of materials and supply chain attacks. Finally, I figured I would indulge the topic. Today I'm speaking with Adnan Khan, lead security engineer at Praetorian. 
Adnan, thank you for joining me.Adnan: Thank you so much for having me.Corey: So, I'm trying to understand, on some level, where the idea of these SBOM or bill-of-material attacks have—where they start and where they stop. I've seen it as far as upstream dependencies have a vulnerability. Great. I've seen misconfigurations in how companies wind up configuring their open-source presences. There have been a bunch of different, it feels almost like orthogonal concepts to my mind, lumped together as this is a big scary thing because if we have a big single scary thing we can point at, that unlocks budget. Am I being overly cynical on this or is there more to it?Adnan: I'd say there's a lot more to it. And there's a couple of components here. So first, you have the SBOM-type approach to security where organizations are looking at which packages are incorporated into their builds. And vulnerabilities can come out in a number of ways. So, you could have software actually have bugs or you could have malicious actors actually insert backdoors into software.I want to talk more about that second point. How do malicious actors actually insert backdoors? Sometimes it's compromising a developer. Sometimes it's compromising credentials to push packages to a repository, but other times, it could be as simple as just making a pull request on GitHub. And that's somewhere where I've spent a bit of time doing research, building off of techniques that other people have documented, and also trying out some attacks for myself against two Microsoft repositories and several others that have reported over the last few months that would have been able to allow an attacker to slip a backdoor into code and expand the number of projects that they are able to attack beyond that.Corey: I think one of the areas that we've seen a lot of this coming from has been the GitHub Action space. And I'll confess that I wasn't aware of a few edge-case behaviors around this. Most of my experience with client-side Git configuration in the .git repository—pre-commit hooks being a great example—intentionally and by design from a security perspective, do not convey when you check that code in and push it somewhere, or grab someone else's, which is probably for the best because otherwise, it's, “Oh yeah, just go ahead and copy your password hash file and email that to something else via a series of arcane shell script stuff.” The vector is there. I was unpleasantly surprised somewhat recently to discover that when I cloned a public project and started running it locally and then adding it to my own fork, that it would attempt to invoke a whole bunch of GitHub Actions flows that I'd never, you know, allowed it to do. That was… let's say, eye-opening.Adnan: [laugh]. Yeah. So, on the particular topic of GitHub Actions, the pull request as an attack vector, like, there's a lot of different forms that an attack can take. So, one of the more common ones—and this is something that's been around for just about as long as GitHub Actions has been around—and this is a certain trigger called ‘pull request target.' What this means is that when someone makes a pull request against the base repository, maybe a branch within the base repository such as main, that will be the workflow trigger.And from a security's perspective, when it runs on that trigger, it does not require approval at all. And that's something that a lot of people don't really realize when they're configuring their workflows. 
Because normally, when you have a pull request trigger, the maintainer can check a box that says, “Oh, require approval for all external pull requests.” And they think, “Great, everything needs to be approved.” If someone tries to add malicious code to run that's on the pull request target trigger, then they can look at the code before it runs and they're fine.But in a pull request target trigger, there is no approval and there's no way to require an approval, except for configuring the workflow securely. So, in this case, what happens is, and in one particular case against the Microsoft repository, this was a Microsoft reusable GitHub Action called GPT Review. It was vulnerable because it checked out code from my branch—so if I made a pull request, it checked out code from my branch, and you could find this by looking at the workflow—and then it ran tests on my branch, so it's running my code. So, by modifying the entry points, I could run code that runs in the context of that base branch and steal secrets from it, and use those to perform malicious Actions.Corey: Got you. It feels like historically, one of the big threat models around things like this is al—[and when 00:06:02] you have any sort of CI/CD exploit—is either falls down one of two branches: it's either the getting secret access so you can leverage those credentials to pivot into other things—I've seen a lot of that in the AWS space—or more boringly, and more commonly in many cases, it seems to be oh, how do I get it to run this crypto miner nonsense thing, with the somewhat large-scale collapse of crypto across the board, it's been convenient to see that be less prevalent, but still there. Just because you're not making as much money means that you'll still just have to do more of it when it's all in someone else's account. So, I guess it's easier to see and detect a lot of the exploits that require a whole bunch of compute power. The, oh by the way, we stole your secrets and now we're going to use that to lateral into an organization seem like it's something far more… I guess, dangerous and also sneaky.Adnan: Yeah, absolutely. And you hit the nail on the head there with sneaky because when I first demonstrated this, I made a test account, I created a PR, I made a couple of Actions such as I modified the name of the release for the repository, I just put a little tag on it, and didn't do any other changes. And then I also created a feature branch in one of Microsoft's repositories. I don't have permission to do that. That just sat there for about almost two weeks and then someone else exploited it and then they responded to it.So, sneaky is exactly the word you could describe something like this. And another reason why it's concerning is, beyond the secret disclosure for—and in this case, the repository only had an OpenAI API key, so… okay, you can talk to ChatGPT for free. But this was itself a Github Action and it was used by another Microsoft machine-learning project that had a lot more users, called SynapseML, I believe was the name of the other project. 
So, what someone could do is backdoor this Action by creating a commit in a feature branch, which they can do by stealing the built-in GitHub token—and this is something that all Github Action runs have; the permissions for it vary, but in this case, it had the right permissions—attacker could create a new branch, modify code in that branch, and then modify the tag, which in Git, tags are mutable, so you can just change the commit the tag points to, and now, every time that other Microsoft repository runs GPT Review to review a pull request, it's running attacker-controlled code, and then that could potentially backdoor that other repository, steal secrets from that repository.So that's, you know, one of the scary parts of, in particular backdooring a Github Action. And I believe there was a very informative Blackhat talk this year, that someone from—I'm forgetting the name of the author, but it was a very good watch about how Actions vulnerabilities can be vulnerable, and this is kind of an example of—it just happened to be that this was an Action as well.Corey: That feels like this is an area of exploit that is becoming increasingly common. I tie it almost directly to the rise of GitHub Actions as the default CI/CD system that a lot of folks have been using. For the longest time, it seemed like a poorly configured Jenkins box hanging out somewhere in your environment that was the exception to the Infrastructure as Code rule because everyone has access to it, configures it by hand, and invariably it has access to production was the way that people would exploit things. For a while, you had CircleCI and Travis-CI, before Travis imploded and Circle did a bunch of layoffs. Who knows where they're at these days?But it does seem that the common point now has been GitHub Actions, and a .github folder within that Git repo with a workflows YAML file effectively means that a whole bunch of stuff can happen that you might not be fully aware of when you're cloning or following along with someone's tutorial somewhere. That has caught me out in a couple of strange ways, but nothing disastrous because I do believe in realistic security boundaries. I just worry how much of this is the emerging factor of having a de facto standard around this versus something that Microsoft has actively gotten wrong. What's your take on it?Adnan: Yeah. So, my take here is that Github could absolutely be doing a lot more to help prevent users from shooting themselves in the foot. Because their documentation is very clear and quite frankly, very good, but people aren't warned when they make certain configuration settings in their workflows. I mean, GitHub will happily take the settings and, you know, they hit commit, and now the workflow could be vulnerable. There's no automatic linting of workflows, or a little suggestion box popping up like, “Hey, are you sure you want to configure it this way?”The technology to detect that is there. There's a lot of third-party utilities that will lint Actions workflows. Heck, for looking for a lot of these pull request target-type vulnerabilities, I use a Github code search query. It's just a regular expression. So, having something that at least nudges users to not make that mistake would go really far in helping people not make these mista—you know, adding vulnerabilities to their projects.Corey: It seems like there's also been issues around the GitHub Actions integration approach where OICD has not been scoped correctly a bunch of times. 
I've seen a number of articles come across my desk in that context and fortunately, when I wound up passing out the ability for one of my workflows to deploy to my AWS account, I got it right because I had no idea what I was doing and carefully followed the instructions. But I can totally see overlooking that one additional parameter that leaves things just wide open for disaster.Adnan: Yeah, absolutely. That's one where I haven't spent too much time actually looking for that myself, but I've definitely read those articles that you mentioned, and yeah, it's very easy for someone to make that mistake, just like, it's easy for someone to just misconfigure their Action in general. Because in some of the cases where I found vulnerabilities, there would actually be a commit saying, “Hey, I'm making this change because the Action needs access to these certain secrets. And oh, by the way, I need to update the checkout steps so it actually checks out the PR head so that it's [testing 00:12:14] that PR code.” Like, people are actively making a decision to make it vulnerable because they don't realize the implication of what they've just done.And in the second Microsoft repository that I found the bug in, was called Microsoft Confidential Sidecar Containers. That repository, the developer a week prior to me identifying the bug made a commit saying that we're making a change and it's okay because it requires approval. Well, it doesn't because it's a pull request target.Corey: Part of me wonders how much of this is endemic to open-source as envisioned through enterprises versus my world of open-source, which is just eh, I've got this weird side project in my spare time, and it seemed like it might be useful to someone else, so I'll go ahead and throw it up there. I understand that there's been an awful lot of commercialization of open-source in recent years; I'm not blind to that fact, but it also seems like there's a lot of companies playing very fast and loose with things that they probably shouldn't be since they, you know, have more of a security apparatus than any random contributors standing up a clone of something somewhere will.Adnan: Yeah, we're definitely seeing this a lot in the machine-learning space because of companies that are trying to move so quickly with trying to build things because OpenAI AI has blown up quite a bit recently, everyone's trying to get a piece of that machine learning pie, so to speak. And another thing of what you're seeing is, people are deploying self-hosted runners with Nvidia, what is it, the A100, or—it's some graphics card that's, like, $40,000 apiece attached to runners for running integration tests on machine-learning workflows. And someone could, via a pull request, also just run code on those and mine crypto.Corey: I kind of miss the days when exploiting computers is basically just a way for people to prove how clever they were or once in a blue moon come up with something innovative. Now, it's like, well, we've gone all around the mulberry bush just so we can basically make computers solve a sudoku form, and in return, turn that into money down the road. It's frustrating, to put it gently.Adnan: [laugh].Corey: When you take a look across the board at what companies are doing and how they're embracing the emerging capabilities inherent to these technologies, how do you avoid becoming a cautionary tale in the space?Adnan: So, on the flip side of companies having vulnerable workflows, I've also seen a lot of very elegant ways of writing secure workflows. 
And some of the repositories are using deployment environments—which is the GitHub Actions feature—to enforce approval checks. So, workflows that do need to run on pull request target because of the need to access secrets for pull requests will have a step that requires a deployment environment to complete, and that deployment environment is just an approval and it doesn't do anything. So essentially, someone who has permissions to the repository will go in, approve that environment check, and only then will the workflow continue. So, that adds mandatory approvals to pull requests where otherwise they would just run without approval.And this is on, particularly, the pull request target trigger. Another approach is making it so the trigger is only running on the label event and then having a maintainer add a label so the tests can run and then remove the label. So, that's another approach where companies are figuring out ways to write secure workflows and not leave their repositories vulnerable.Corey: It feels like every time I turn around, Github Actions has gotten more capable. And I'm not trying to disparage the product; it's kind of the idea of what we want. But it also means that there's certainly not an awareness in the larger community of how these things can go awry that has kept up with the pace of feature innovation. How do you balance this without becoming the Department of No?Adnan: [laugh]. Yeah, so it's a complex issue. I think GitHub has evolved a lot over the years. Actions, it's—despite some of the security issues that happen because people don't configure them properly—is a very powerful product. For a CI/CD system to work at the scale it does and allow so many repositories to work and integrate with everything else, it's really easy to use. So, it's definitely something you don't want to take away or have an organization move away from something like that because they are worried about the security risks.When you have features coming in so quickly, I think it's important to have a base, kind of like, a mandatory reading. Like, if you're a developer that writes and maintains an open-source software, go read through this document so you can understand the do's and don'ts instead of it being a patchwork where some people, they take a good security approach and write secure workflows and some people just kind of stumble through Stack Overflow, find what works, messes around with it until their deployment is working and their CI/CD is working and they get the green checkmark, and then they move on to their never-ending list of tasks that—because they're always working on a deadline.Corey: Reminds me of a project I saw a few years ago when it came out that Volkswagen had been lying to regulators. It was a framework someone built called ‘Volkswagen' that would detect if it was running inside of a CI/CD environment, and if so, it would automatically make all the tests pass. I have a certain affinity for projects like that. Another one was a tool that would intentionally degrade the performance of a network connection so you could simulate having a latent or stuttering connection with packet loss, and they call that ‘Comcast.' Same story. I just thought that it's fun seeing people get clever on things like that.Adnan: Yeah, absolutely.Corey: When you take a look now at the larger stories that are emerging in the space right now, I see an awful lot of discussion coming up that ties to SBOMs and understanding where all of the components of your software come from. 
But I chased some stuff down for fun once, and I gave up after 12 dependency leaps from just random open-source frameworks. I mean, I see the Dependabot problem that this causes as well, where whenever I put something on GitHub and then don't touch it for a couple of months—because that's how I roll—I come back and there's a whole bunch of terrifyingly critical updates that it's warning me about, but given the nature of how these things get used, it's never going to impact anything that I'm currently running. So, I've learned to tune it out and just ignore it when it comes in, which is probably the worst of all possible approaches. Now, if I worked at a bank, I should probably take a different perspective on this, but I don't.Adnan: Mm-hm. Yeah. And that's kind of a problem you see, not just with SBOMs. It's just security alerting in general, where anytime you have some sort of signal and people who are supposed to respond to it are getting too much of it, you just start to tune all of it out. It's like that human element that applies to so much in cybersecurity.And I think for the particular SBOM problem, where, yeah, you're correct, like, a lot of it… you don't have reachability because you're using a library for one particular function and that's it. And this is somewhere where I'm not that much of an expert in where doing more static source analysis and reachability testing, but I'm certain there are products and tools that offer that feature to actually prioritize SBOM-based alerts based on actual reachability versus just having an as a dependency or not.[midroll 00:20:00]Corey: I feel like, on some level, wanting people to be more cautious about what they're doing is almost shouting into the void because I'm one of the only folks I found that has made the assertion that oh yeah, companies don't actually care about security. Yes, they email you all the time after they failed to protect your security, telling you how much they care about security, but when you look at where they invest, feature velocity always seems to outpace investment in security approaches. And take a look right now at the hype we're seeing across the board when it comes to generative AI. People are excited about the capabilities and security is a distant afterthought around an awful lot of these things. I don't know how you drive a broader awareness of this in a way that sticks, but clearly, we haven't collectively found it yet.Adnan: Yeah, it's definitely a concern. When you see things on—like for example, you can look at Github's roadmap, and there's, like, a feature there that's, oh, automatic AI-based pull request handling. Okay, so does that mean one day, you'll have a GitHub-powered LLM just approve PRs based on whether it determines that it's a good improvement or not? Like, obviously, that's not something that's the case now, but looking forward to maybe five, six years in the future, in the pursuit of that ever-increasing velocity, could you ever have a situation where actual code contributions are reviewed fully by AI and then approved and merged? Like yeah, that's scary because now you have a threat actor that could potentially specifically tailor contributions to trick the AI into thinking they're great, but then it could turn around and be a backdoor that's being added to the code.Obviously, that's very far in the future and I'm sure a lot of things will happen before that, but it starts to make you wonder, like, if things are heading that way. 
Or will people realize that you need to look at security at every step of the way instead of just thinking that these newer AI systems can just handle everything?Corey: Let's pivot a little bit and talk about your day job. You're a lead security engineer at what I believe to be a security-focused consultancy. Or—Adnan: Yeah.Corey: If you're not a SaaS product. Everything seems to become a SaaS product in the fullness of time. What's your day job look like?Adnan: Yeah, so I'm a security engineer on Praetorian's red team. And my day-to-day, I'll kind of switch between application security and red-teaming. And that kind of gives me the opportunity to, kind of, test out newer things out in the field, but then also go and do more traditional application security assessments and code reviews, and reverse engineering to kind of break up the pace of work. Because red-teaming can be very fast and fast-paced and exciting, but sometimes, you know, that can lead to some pretty late nights. But that's just the nature of being on a red team [laugh].Corey: It feels like as soon as I get into the security space and start talking to cloud companies, they get a lot more defensive than when I'm making fun of, you know, bad service naming or APIs that don't make a whole lot of sense. It feels like companies have a certain sensitivity around the security space that applies to almost nothing else. Do you find, as a result, that a lot of the times when you're having conversations with companies and they figure out that, oh, you're a red team for a security researcher, oh, suddenly, we're not going to talk to you the way we otherwise might. We thought you were a customer, but nope, you can just go away now.Adnan: [laugh]. I personally haven't had that experience with cloud companies. I don't know if I've really tried to buy a lot. You know, I'm… if I ever buy some infrastructure from cloud companies as an individual, I just kind of sign up and put in my credit card. And, you know, they just, like, oh—you know, they just take my money. So, I don't really think I haven't really, personally run into anything like that yet [laugh].Corey: Yeah, I'm curious to know how that winds up playing out in some of these, I guess, more strategic, larger company environments. I don't get to see that because I'm basically a tiny company that dabbles in security whenever I stumble across something, but it's not my primary function. I just worry on some level one of these days, I'm going to wind up accidentally dropping a zero-day on Twitter or something like that, and suddenly, everyone's going to come after me with the knives. I feel like [laugh] at some point, it's just going to be a matter of time.Adnan: Yeah. I think when it comes to disclosing things and talking about techniques, the key thing here is that a lot of the things that I'm talking about, a lot of the things that I'll be talking about in some blog posts that have coming out, this is stuff that these companies are seeing themselves. Like, they recognize that these are security issues that people are introducing into code. They encourage people to not make these mistakes, but when it's buried in four links deep of documentation and developers are tight on time and aren't digging through their security documentation, they're just looking at what works, getting it to work and moving on, that's where the issue is. So, you know, from a perspective of raising awareness, I don't feel bad if I'm talking about something that the company itself agrees is a problem. 
It's just a lot of the times, their own engineers don't follow their own recommendations.Corey: Yeah, I have opinions on these things and unfortunately, it feels like I tend to learn them in some of the more unfortunate ways of, oh, yeah, I really shouldn't care about this thing, but I only learned what the norm is after I've already done something. This is, I think, the problem inherent to being small and independent the way that I tend to be. We don't have enough people here for there to be a dedicated red team and research environment, for example. Like, I tend to bleed over a little bit into a whole bunch of different things. We'll find out. So far, I've managed to avoid getting it too terribly wrong, but I'm sure it's just a matter of time.So, one area that I think seems to be a way that people try to avoid cloud issues is oh, I read about that in the last in-flight magazine that I had in front of me, and the cloud is super insecure, so we're going to get around all that by running our own infrastructure ourselves, from either a CI/CD perspective or something else. Does that work when it comes to this sort of problem?Adnan: Yeah, glad you asked about that. So, we've also seen open-s—companies that have large open-source presence on GitHub just opt to have self-hosted Github Actions runners, and that opens up a whole different Pandora's box of attacks that an attacker could take advantage of, and it's only there because they're using that kind of runner. So, the default GitHub Actions runner, it's just an agent that runs on a machine, it checks in with GitHub Actions, it pulls down builds, runs them, and then it waits for another build. So, these are—the default state is a non-ephemeral runner with the ability to fork off tasks that can run in the background. So, when you have a public repository that has a self-hosted runner attached to it, it could be at the organization level or it could be at the repository level.What an attacker can just do is create a pull request, modify the pull request to run on a self-hosted runner, write whatever they want in the pull request workflow, create a pull request, and now as long as they were a previous contributor, meaning you fixed a typo, you… that could be a such a, you know, a single character typo change could even cause that, or made a small contribution, now they create the pull request. The arbitrary job that they wrote is now picked up by that self-hosted runner. They can fork off it, process it to run in the background, and then that just continues to run, the job finishes, their pull request, they'll just—they close it. Business as usual, but now they've got an implant on the self-hosted runner. And if the runners are non-ephemeral, it's very hard to completely lock that down.And that's something that I've seen, there's quite a bit of that on GitHub where—and you can identify it just by looking at the run logs. And that's kind of comes from people saying, “Oh, let's just self-host our runners,” but they also don't configure that properly. And that opens them up to not only tampering with their repositories, stealing secrets, but now depending on where your runner is, now you potentially could be giving an attacker a foothold in your cloud environment.Corey: Yeah, that seems like it's generally a bad thing. 
I found that cloud tends to be more secure than running it yourself in almost every case, with the exception that once someone finds a way to break into it, there's suddenly a lot more eggs in a very large, albeit more secure, basket. So, it feels like it's a consistent trade-off. But as time goes on, it feels like it is less and less defensible, I think, to wind up picking out an on-prem strategy from a pure security point of view. I mean, there are reasons to do it. I'm just not sure.Adnan: Yeah. And I think the distinction to be made there, in particular with CI/CD runners, is there's full cloud, meaning you let your CI/CD provider host your infrastructure as well; there's kind of that hybrid approach you mentioned, where you're using a CI/CD provider, but then you're bringing your own cloud infrastructure that you think you could secure better; or you have your runners sitting in vCenter in your own data center. And all of those could end up being equally vulnerable—both having a runner in your cloud and having one in your data center—if you're not segmenting builds properly. And that's the core issue when you have self-hosted runners: if they're not ephemeral, it's very hard to cut off all attack paths. There's always something an attacker can do to tamper with another build that'll have some kind of security impact. You need to just completely isolate your builds, and that's essentially what you see in a lot of these newer guidances like the [unintelligible 00:30:04] framework; the core recommendation of it is, like, one build, one clean runner.Corey: Yeah, that seems to be the common wisdom. I've been doing a lot of work with my own self-hosted runners that run inside of Lambda. Definitionally those are, of course, ephemeral. And there's a state machine that winds up handling that and screams bloody murder if there's a problem with it. So far, crossing fingers, hoping it works out well.And I have it bounded to a very limited set of role permissions and, of course, its own account to constrain blast radius. But there's still—there are no guarantees in this. The reason I build it the way I do is that, all right, worst case someone can get access to this. The only thing they're going to have the ability to do is, frankly, run up my AWS bill, which is an area I have some small amount of experience with.Adnan: [laugh]. Yeah, yeah, that's always kind of the core thing where if you get into someone's cloud, like, well, just sit there and use their compute resources [laugh].Corey: Exactly. I kind of miss when that was the worst failure mode you had for these things.Adnan: [laugh].Corey: I really want to thank you for taking the time to speak with me today. If people want to learn more, where's the best place for them to find you?Adnan: I do have a Twitter account. Well, I guess you can't call it Twitter anymore, but, uh—Corey: Watch me. Sure I can.Adnan: [laugh]. Yeah, so I'm on Twitter, and it's @adnanthekhan. So, it's like my first name with ‘the' and then K-H-A-N because, you know, my full name probably got taken up, like, years before I ever made a Twitter account. So, occasionally I tweet about GitHub Actions there.And on Praetorian's website, I've got a couple of blog posts. I have one that really goes in-depth talking about the two Microsoft repository pull request attacks, and a couple of other ones that are disclosed will hopefully drop on the twenty—what is that, Tuesday? 
That's going to be the… that's the 26th. So, it should be airing on the Praetorian blog then. So, if you—Corey: Excellent. It should be out by the time this is published, so we will, of course, put a link to that in the [show notes 00:32:01]. Thank you so much for taking the time to speak with me today. I appreciate it.Adnan: Likewise. Thank you so much, Corey.Corey: Adnan Khan, lead security engineer at Praetorian. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment that's probably going to be because your podcast platform of choice is somehow GitHub Actions.Adnan: [laugh].Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Joe Karlsson, Data Engineer at Tinybird, joins Corey on Screaming in the Cloud to discuss what it's like working in the world of data right now and how he manages the overlap between his social media presence and career. Corey and Joe chat about the rise of AI and whether or not we're truly seeing advancements in that realm or just trendy marketing plays, and Joe shares why he feels data is getting a lot more attention these days and what it's like to work in data at this time. Joe also shares insights into how his mental health has been impacted by having a career and social media presence that overlaps, and what steps he's taken to mitigate the negative impact. About JoeJoe Karlsson (He/They) is a Software Engineer turned Developer Advocate at Tinybird. He empowers developers to think creatively when building data intensive applications through demos, blogs, videos, or whatever else developers need.Joe's career has taken him from building out database best practices and demos for MongoDB, architecting and building one of the largest eCommerce websites in North America at Best Buy, and teaching at one of the most highly-rated software development boot camps on Earth. Joe is also a TEDx Speaker, film buff, and avid TikToker and Tweeter.Links Referenced: Tinybird: https://www.tinybird.co/ Personal website: https://joekarlsson.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Are you navigating the complex web of API management, microservices, and Kubernetes in your organization? Solo.io is here to be your guide to connectivity in the cloud-native universe!Solo.io, the powerhouse behind Istio, is revolutionizing cloud-native application networking. They brought you Gloo Gateway, the lightweight and ultra-fast gateway built for modern API management, and Gloo Mesh Core, a necessary step to secure, support, and operate your Istio environment.Why struggle with the nuts and bolts of infrastructure when you can focus on what truly matters - your application. Solo.io's got your back with networking for applications, not infrastructure. Embrace zero trust security, GitOps automation, and seamless multi-cloud networking, all with Solo.io.And here's the real game-changer: a common interface for every connection, in every direction, all with one API. It's the future of connectivity, and it's called Gloo by Solo.io.DevOps and Platform Engineers, your journey to a seamless cloud-native experience starts here. Visit solo.io/screaminginthecloud today and level up your networking game.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn and I am joined today by someone from well, we'll call it the other side of the tracks, if I can—Joe: [laugh].Corey: —be blunt and disrespectful. Joe Karlsson is a data engineer at Tinybird, but I really got to know who he is by consistently seeing his content injected almost against my will over on the TikToks. Joe, how are you?Joe: I'm doing so well and I'm so sorry for anything I've forced down your throat online. Thanks for having me, though.Corey: Oh, it's always a pleasure to talk to you. No, the problem I've got with it is that when I'm in TikTok mode, I don't want to think about computers anymore. 
I want to find inane content that I can just swipe six hours away without realizing it because that's how I roll.Joe: TikTok is too smart, though. I think it knows that you are doing a lot of stuff with computers and even if you keep swiping away, it's going to keep serving it up to you.Corey: For a long time, it had me pinned as a lesbian, which was interesting. Which I suppose—Joe: [laugh]. It happened to me, too.Corey: Makes sense because I follow a lot of women who are creators in comics and the rest, but I'm not interested in the thirst trap approach. So, it's like, “Mmm, this codes as lesbian.” Then they started showing me ads for ADHD, which I thought was really weird until I'm—oh right. I'm on TikTok. And then they started recommending people that I'm surprised it was able to disambiguate until I realized these people have been at my house and using TikTok from my IP address, which probably is going to get someone murdered someday, but it's probably easy to wind up doing an IP address match.Joe: I feel like I have to, like, separate what is me and what is TikTok, like, trying to serve it up because I've been on lesbian TikTok, too, ADHD, autism, like TikTok. And, like, is this who I am? I don't know. [unintelligible 00:02:08] bring it to my therapist.Corey: You're learning so much about yourself based upon an algorithm. Kind of wild, isn't it?Joe: [laugh]. Yeah, I think we may be a little, like, neuro-spicy, but I think it might be a little overblown with what TikTok is trying to diagnose us with. So, it's always good to just keep it in check, you know?Corey: Oh, yes. So, let's see, what's been going on lately? We had Google Next, which I think the industry largely is not taking seriously enough. For years, it felt like a try-hard, me too version of re:Invent. And this year, it really feels like it's coming into its own. It is defining itself as something other than oh, us too.Joe: I totally agree. And that's where you and I ran into each other recently, too. I feel like post-Covid I'm still, like, running into people I met on the internet in real life, and yeah, I feel like, yeah, re:Invent and Google Next are, like, the big ones.I totally agree. It feels like—I mean, it's definitely, like, heavily inspired by it. And it still feels like it's a little sibling in some ways, but I do feel like it's one of the best conferences I've been to since, like, a pre-Covid 2019 AWS re:Invent, just in terms of, like… who was there. The energy, the vibes, I feel like people were, like, having fun. Yeah, I don't know, it was a great conference this year.Corey: Usually, I would go to Next in previous years because it was a great place to go to hang out with AWS customers. These days, it feels like it's significantly more than that. It's, everyone is using everything at large scale. I think that is something that is not fully understood. You talk to companies that are, like, Netflix, famously all in on AWS. Yeah, they have Google stuff, too.Everyone does. I have Google stuff. I have a few things in Azure, for God's sake. It's one of those areas where everything starts to diffuse throughout a company as soon as you hire employee number two. And that is, I think, the natural order of things. The challenge, of course, is the narrative people try and build around it.Joe: Yep. Oh, totally. Multi-cloud's been huge for, you know, like, starting to move up. And it's impossible not to. It was interesting seeing, like, Google trying to differentiate itself from Azure and AWS. 
And, Corey, I feel like you'd probably agree with this, too, AI was like, definitely the big buzzword that kept trying to, like—Corey: Oh, God. Spare me. And I say that, as someone who likes AI, I think that there's a lot of neat stuff lurking around and value hiding within generative AI, but the sheer amount of hype around it—and frankly—some of the crypto bros have gone crashing into the space, make me want to distance myself from it as far as humanly possible, just because otherwise, I feel like I get lumped in with that set. And I don't want that.Joe: Yeah, I totally agree. I know it feels like it's hard right now to, like, remain ungrifty, but, like, still, like—trying—I mean, everyone's trying to just, like, hammer in an AI perspective into every product they have. And I feel like a lot of companies, like, still don't really have a good use case for it. You're still trying to, like, figure that out. We're seeing some cool stuff.Honestly, the hard part for me was trying to differentiate between people just, like, bragging about OpenAI API addition they added to the core product or, like, an actual thing that's, like, AI is at the center of what it actually does, you know what I mean? Everything felt like it's kind of like tacked on some sort of AI perspective to it.Corey: One of the things that really is getting to me is that you have these big companies—Google and Amazon most notably—talk about how oh, well, we've actually been working with AI for decades. At this point, they keep trying to push out how long it's been. It's like, “Okay, then not for nothing, then why does”—in Amazon's case—“why does Alexa suck? If you've been working on it for this long, why is it so bad at all the rest?” It feels like they're trying to sprint out with a bunch of services that very clearly were not conceptualized until Chat-Gippity's breakthrough.And now it's oh, yeah, we're there, too. Us, too. And they're pivoting all the marketing around something that, frankly, they haven't demonstrated excellence with. And I feel like they're leaving a lot of their existing value proposition completely in the dust. It's, your customers are not using you because of the speculative future, forward-looking AI things; it's because you are able to solve business problems today in ways that are not highly speculative and are well understood. That's not nothing and there needs to be more attention paid to that. And I feel like there's this collective marketing tripping over itself to wrap itself in hype that does them no services.Joe: I totally agree. I feel like honestly, just, like, a marketing perspective, I feel like it's distracting in a lot of ways. And I know it's hot and it's cool, but it's like, I think it's harder right now to, like, stay focused to what you're actually doing well, as opposed to, like, trying to tack on some AI thing. And maybe that's great. I don't know.Maybe that's—honestly, maybe you're seeing some traction there. I don't know. But I totally agree. I feel like everyone right now is, like, selling a future that we don't quite have yet. I don't know. I'm worried that what's going to happen again, is what happened back in the IBM Watson days where everyone starts making bold—over-promising too much with AI until we see another AI winter again.Corey: Oh, the subtext is always, we can't wait to fire our entire customer service department. That one—Joe: Yeah.Corey: Just thrills me.Joe: [laugh].Corey: It's like, no, we're just going to get rid of junior engineers and just have senior engineers. 
Yeah, where do you think those people come from, by the way? We aren't—they aren't just emerging fully formed from the forehead of some god somewhere. And we're also seeing this wild divergence from reality. Remember, I fix AWS bills for a living. I see very large companies, very large AWS spend.The majority of spend remains on EC2 across the board. So, we don't see a lot of attention paid to that at re:Invent, even though it's the lion's share of everything. When we do contract negotiations, we talk about generative AI plans and strategy, but no one's saying, oh, yeah, we're spending 100 million a year right now on AWS but we should commit 250 because of all this generative AI stuff we're getting into. It's all small-scale experimentation and seeing if there's value there. But that's a far cry from being the clear winner in what everyone is doing.I'd further like to point out that I can tell that there's a hype cycle in place and I'm trying to be—and someone's trying to scam me. As soon as there's a sense of you have to get on this new emerging technology now, now, now, now, now. I didn't get heavily into cloud till 2016 or so and I seem to have done all right with that. Whenever someone is pushing you to get into an emerging thing where it hasn't settled down enough to build a curriculum yet, I feel like there's time to be cautious and see what the actual truth is. Someone's selling something; if you can't spot the sucker, chances are, it's you.Joe: [laugh]. Corey, have you thought about making an AI large language model that will help people with their cloud bills? Maybe just feed it, like, your invoices [laugh].Corey: That has been an example I've used a number of times with a variety of different folks where if AI really is all it's cracked up to be, then the AWS billing system is very much a bounded problem space. There's a lot of nuance and intricacy to it, but it is a finite set of things. Sure, [unintelligible 00:08:56] space is big. So, training something within those constraints and within those confines feels like it would be a terrific proof-of-concept for a lot of these things. Except that when I've experimented a little bit and companies have raised rounds to throw into this, it never quite works out because there's always human context involved. The, oh yeah, we're going to wind up turning off all those idle instances, except they're idle—by whatever metric you're using—for a reason. And the first time you take production down, you're not allowed to save money anymore.Joe: Nope. That's such a good point. I agree. I don't know about you, Corey. I've been fretting about my job and, like, what I'm doing. I write a lot, I do a lot of videos, I'm programming a lot, and I think… obviously, we've been hearing a lot about, you know, if it's going to replace us or not. I honestly have been feeling a lot better recently about my job stability here. I don't know. I totally agree with you. There's always that, like, human component that needs to get added to it. But who knows, maybe it's going to get better. Maybe there'll be an AI-automated billing management tool, but it'll never be as good as you, Corey. Maybe it will. I don't know. [laugh].Corey: It knows who I am. When I tell it to write in the style of me and give it a blog post topic and some points I want to make, almost everything it says is wrong. But what I'll do is I'll copy that into a text editor, mansplain-correct the robot for ten minutes, and suddenly I've got the bones of a decent rough draft. 
And yeah, I'll wind up plagiarizing three or four words in a row at most, but that's okay. I'm plagiarizing the thing that's plagiarizing from me and there's a beautiful symmetry to that. What I don't understand is some of the outreach emails and other nonsensical stuff I'll see where people are letting unsupervised AI just write things under their name and sending it out to people. That is anathema to me.Joe: I totally agree. And it might work today, it might work tomorrow, but, like, it's just a matter of time before something blows up. Corey, I'm curious. Like, personally, how do you feel about being in the ChatGPT, like, brain? I don't know, is that flattering? Does that make you nervous at all?Corey: Not really because it doesn't get it in a bunch of ways. And that's okay. I found the same problem with people. In my time on Twitter, when I started live-tweet shitposting about things—as I tend to do as my first love language—people will often try and do exactly that. The problem that I run into is that, “The failure mode of ‘clever' is ‘asshole,'” as John Scalzi famously said, and as a direct result of that, people wind up being mean and getting it wrong in that direction.It's not that I'm better than they are. It's, I had a small enough following, and no one knew who I was in my mean years, and I realized I didn't feel great making people sad. So okay, you've got to continue to correct the nosedive. But it is perilous and it is difficult to understand the nuance. I think occasionally when I prompt it correctly, it comes up with some amazing connections between things that I wouldn't have seen, but that's not the same thing as letting it write something completely unfettered.Joe: Yeah, I totally agree. The nuance definitely gets lost. It may be able to get, like, the tone, but I think it misses a lot of details. That's interesting.Corey: And other people are defending it when it hallucinates. Like, yeah, I understand there are people that do the same thing, too. Yeah, the difference is, in many cases, lying to me and passing it off otherwise is a firing offense in a lot of places. Because if 19 times out of 20 you're correct, but the 5% of the time you're wrong you're going to bluff, I can't trust anything you tell me.Joe: Yeah. It definitely, like, brings your, like—the whole model into question.Corey: Also, remember that my medium for artistic creation is often writing. And I think that, on some level, these AI models are doing the same things that we do. There are still turns of phrase that I use that I picked up floating around Usenet in the mid-90s. And I don't remember who said it or the exact context, but these words and phrases have entered my lexicon and I'll use them, and I don't necessarily give credit to the first person who said that joke 30 years ago. But it's a—that is how humans operate. We are influenced by different styles of writing and learn from the rest.Joe: True.Corey: That's a bit different than training something on someone's artistic back catalog from a painting perspective and then emulating it, including their signature in the corner. Okay, that's a bit much.Joe: [laugh]. I totally agree.Corey: So, we wind up looking right now at the rush that is going on for companies trying to internalize their use of enterprise AI, which is kind of terrifying, and it all seems to come back to data.Joe: Yes.Corey: You work in the data space. How are you seeing that unfold?Joe: Yeah, I do. I've been, like, making speculations about the future of AI and data forever. 
I've had dreams of tools I've wanted forever, and I… don't have them yet. I don't think they're quite ready yet. I don't know, we're seeing things like—I think people are working on a lot of problems.For example, like, I want AI to auto-optimize my database. I want it to, like, make indexes for me. I want it to help me with queries or optimizing queries. We're seeing some of that. I'm not seeing anyone doing it particularly well yet. I think it's up in the air.I feel like it could be coming soon, though, but that's the thing, too, like, I mean, if you mess up a query, or, like, a… large language model hallucinates a really shitty query for you, that could break your whole system really quickly. I feel like there still needs to be, like, a human being in the middle of it to, like, kind of help.Corey: I saw a blog post recently that AWS put out that gave an example that just hard-coded a credential into it. And they said, “Don't do this, but for demonstration purposes, this is how it works.” Well, that nuance gets lost when you use that for AI training and that's, I think, in part, where you start seeing a whole bunch of the insecure crap these things spit out.Joe: Yeah, I totally agree. Well, I think the big thing I've seen, too, is, like, large language models typically don't have a secure option, and your—the data you put in, like, helps train the model itself later on. I don't know, I'm sure, like, a lot of teams don't want to have their most secret data end up public on a large language model at some point in the future. Which is, like, a huge issue right now.Corey: I think that what we're seeing is that you still need someone with expertise in a given area to review what this thing spits out. It's great at solving a lot of the busy work stuff, but you still need someone who's conversant with the concepts to look at it. And that is, I think, something that turns into a large-scale code review, where everyone else just tends to go, “Oh, okay. We're—we do this with code review.” “Oh, how big is the diff?” “50,000 lines.” “Looks good to me.” Whereas, “Three lines.” “I'm going to criticize that thing with four pages of text.” People don't want to do the deep-dive stuff when there's a huge giant project that hits. So, they won't. And it'll be fine, right up until it isn't.Joe: Corey, you and I know people and developers. Do you think it's irresponsible to put out there an example of how to do something like that, even with, like, an asterisk? I feel like someone's going to still go out and try to do that and probably push that to production.Corey: Of course they are.Joe: [laugh].Corey: I've seen this with some of my own code. I had something on Docker Hub years ago with a container that was called ‘Terrible Ideas.' And I'm sure it's being used in, like—it was basically the environment I use for a talk I gave around Git, which makes sense. And because I don't want to reset all the repositories back to the way they came from with a bunch of old commands, I just want a constrained environment that will be the same every time I give the talk. Awesome.I'm sure it's probably being run in production at a bank somewhere because why wouldn't it be? That's people. That's life. You're not supposed to just copy and paste from Chat-Gippity. You're supposed to do that from Stack Overflow like the rest of us. Where do you think your existing code's coming from in a lot of these shops?Joe: Yep. No, I totally agree. Yeah, I don't know. 
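Since the hard-coded-credential example keeps coming up, here is a hedged, minimal illustration of the contrast Corey is pointing at (the key values are obviously fake placeholders, and this is not the AWS blog post in question): the demo-style anti-pattern versus letting boto3 resolve credentials from its default chain, which checks environment variables, shared config files, and any attached role.

```python
# Illustration only: the "for demonstration purposes" pattern that gets copied
# into production (and scraped into training data) versus credential-free code.
import boto3

# Anti-pattern: credentials embedded in source. The values here are placeholders.
bad = boto3.client(
    "s3",
    aws_access_key_id="AKIAXXXXXXXXEXAMPLE",           # placeholder, never do this
    aws_secret_access_key="wJalrXUtnFEMI/EXAMPLEKEY",  # placeholder, never do this
)

# Preferred: no credentials in code. boto3 falls back to environment variables,
# ~/.aws/credentials, or the instance/container role at runtime.
good = boto3.client("s3")

print(good.list_buckets()["Buckets"])
```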
It'll be interesting to see how this shakes out with, like, people going and doing this stuff, or how honest they're going to be about it, too. I'm sure it's happening. I'm sure people are tripping over themselves right now, [adding 00:16:12].Corey: Oh, yeah. But I think, on some level, you're going to see a lot more grift coming out of this stuff. When you start having things that look a little more personalized, you can use it for spam purposes, you can use it for, I'm just going to basically copy and paste what this says and wind up getting a job on Upwork or something that is way more than I could handle myself, but using this thing, I'm going to wind up coasting through. Caveat emptor is always the case on that.Joe: Yeah, I totally agree.Corey: I mean, it's easy for me to sit here and talk about ethics. I believe strongly in doing the right thing. But I'm also not worried about whether I'm able to make rent this month or put food on the table. That's a luxury. At some point, like, a lot of that strips away and you do what you have to do to survive. I don't necessarily begrudge people doing these things until it gets to a certain point of okay, now you're not doing this to stay alive anymore. You're doing this to basically seek rent.Joe: Yeah, I agree. Or just, like, capitalize on it. I do think this is less—like, the space is less grifty than the crypto space, but as we've seen over and over and over and over again, in tech, there's such a fine line between, like, a genuinely great idea, and somebody taking advantage of it—and other people—with that idea.Corey: I think that's one of those sad areas where you're not going to be able to fix human nature, regardless of the technology stack you bring to bear.Joe: Yeah, I totally agree.Corey: So, what else are you seeing these days that's interesting? What excites you? What do you see that isn't getting enough attention in the space?Joe: I don't know, I guess I'm in the data space, I'm… the thing I think I do see a lot of is huge interest in data. Data right now is the thing that's come up. Like, I don't—that's the thing that's training these models and everyone trying to figure out what to do with these data, all these massive databases, data lakes, whatever. I feel like everyone's, kind of like, taking a second look at all of this data they've been collecting for years and haven't really known what to do with it and trying to figure out either, like, if you can make a model out of that, if you try to, like… level it up, whatever. Corey, you and I were joking around recently—you've had a lot of data people on here recently, too—I feel like us data folks are just getting extra loud right now. Or maybe it's just the data space, that's where the action's at right now.I don't know, the markets are really weird. Who knows? But um, I feel like data right now is super valuable and more so than ever. And even still, like, I mean, we're seeing, like, companies freaking out, like, Twitter and Reddit freaking out about accessing their data and who's using it and how. I don't know, I feel like there's a lot of action going on there right now.Corey: I think that there's a significant push from the data folks where, for a long time data folks were DBAs—Joe: Yeah.Corey: —let's be direct. And that role has continued to evolve in a whole bunch of different ways. It's never been an area I've been particularly strong in. 
I am not great at algorithmic complexity, it turns out, you can saturate some beefy instances with just a little bit of data if your queries are all terrible. And if you're unlucky—as I tend to be—and have an aura of destroying things, great, you probably don't want to go and make that what you do.Joe: [laugh]. It's a really good point. I mean, I don't know about, like, if you blow up data at a company, you're probably going to be in big trouble. And especially the scale we're talking about with most companies these days, it's super easy to either take down a server or generate an insane bill off of some shitty query.Corey: Oh, when I was at Reach Local years and years ago—my first Linux admin job—when I broke the web server farm, it was amusing; when I broke part of the data warehouse, nobody was laughing.Joe: [laugh]. I wonder why.Corey: It was a good faith mistake and that's fair. It was a convoluted series of things that set up and honestly, the way the company and my boss responded to me at the time set the course of the rest of my career. But it was definitely something that got my attention. It scares me. I'm a big believer in backups as a direct result.Joe: Yeah. Here's the other thing, too. Actually, our company, Tinybird, is working on versioning with your data sources right now and treating your data sources like Git, but I feel like even still today, most companies are just run by some DBA. There's, like, Mike down the hall is the one responsible keeping their SQL servers online, keeping them rebooted, and like, they're manually updating any changes on there.And I feel like, generally speaking across the industry, we're not taking data seriously. Which is funny because I'm with you on there. Like, I get terrified touching production databases because I don't want anything bad to happen to them. But if we could, like, make it easier to rollback or, like, handle that stuff, that would be so much easier for me and make it, like, less scary to deal with it. I feel like databases and, like, treating it as, like, a serious DevOps practice is not really—I'm not seeing enough of it. It's definitely, people are definitely doing it. Just, I want more.Corey: It seems like with data, there's a lack of iterative approaches to it. A line that someone came up with when I was working with them a decade and change ago was that you can talk about agile all you want, but when it comes to payments, everyone's doing waterfall. And it feels like, on some level, data's kind of the same.Joe: Yeah. And I don't know, like, how to fix it. I think everyone's just too scared of it to really touch it. Migrating over to a different version control, trying to make it not as manual, trying to iterate on it better, I think it's just—I don't blame them. It's hard, it really takes a long time, making sure everything, like, doesn't blow up while you're doing a migration is a pain in the ass. But I feel like that would make everyone's lives so much easier if, like, you could, like, treat it—understand your data and be able to rollback easier with it.Corey: When you take a look across the ecosystem now, are you finding that things have improved since the last time I was in the space, where the state of the art was, “Oh, we need some developer data. 
We either have this sanitized data somewhere or it's a copy of production that we move around, but only a small bit.” Because otherwise, we always found that oh, that's an extra petabyte of storage was going on someone's developer environment they messed up on three years ago, they haven't been here for two, and oops.Joe: I don't. I have not seen it. Again, that's so tricky, too. I think… yeah, the last time I, like, worked doing that was—usually you just have a really crappy version of production data on staging or development environments and it's hard to copy those over. I think databases are getting better for that.I've been working on, like, the real-time data space for a long time now, so copying data over and kind of streaming that over is a lot easier. I do think seeing, like, separating storage and compute can make it easier, too. But it depends on your data stack. Everyone's using everything all the time and it's super complicated to do that. I don't know about you, Corey, too. I'm sure you've seen, like, services people running, but I feel like we've made a switch as an industry from, like, monoliths to microservices.Now, we're kind of back in the monolith era, but I'm not seeing that happen in the database space. We're seeing, like, data meshing and lots of different databases. I see people who, like, see the value of data monoliths, but I don't see any actual progress in moving back to a single source of [truth of the data 00:23:02]. And I feel like the cat's kind of out of the bag on all the data existing everywhere, all the time, and trying to wrangle that up.Corey: This stuff is hard and there's no easy solution here. There just isn't.Joe: Yeah, there's no way. And embracing that chaos, I think, is going to be huge. I think you have to do it right now. Or trying to find some tool that can, like, wrangle up a bunch of things together and help work with them all at once. Products need to meet people where they're at, too. And, like, data is all over the place and I feel like we kind of have to, like, find tooling that can kind of help work with what you have.Corey: It's a constant challenge, but also a joy, so we'll give it that.Joe: [laugh].Corey: So, I have to ask. Your day job has you doing developer advocacy at Tinybird—Joe: Yes.Corey: But I had to dig in to find that out. It wasn't obvious based upon the TikToks and the Twitter nonsense and the rest. How do you draw the line between day job and you as a person shitposting on the internet about technology?Joe: Corey, I'd be curious to hear your thoughts on this, too. I don't know. I feel like I've been in different places where, like, my job is my life. You know what I mean? There's a very thin line there. Personally, I've been trying to take a step back from that, just from a mental health perspective. Having my professional life be so closely tied to, like, my personal value and who I am has been really bad for my brain.And trying to make that clear at my company is, like, what is mine and what I can help with has been really huge. I feel like the boundaries between myself and my job has gotten too thin. And for a while, I thought that was a great idea; it turns out that was not a great idea for my brain. It's so hard. 
So, I've been a software engineer and I've done full-time developer advocacy, and I felt like I had a lot more freedom to say what I wanted as, like, a full-time software engineer as opposed to being a developer advocate and kind of representing the company.Because the thing is, I'm always representing the company [online 00:24:56], but I'm not always working, which is kind of like—that—it's kind of a hard line. I feel like there's been, like, ways to get around it though with, like, less private shitposting about things that could piss off a CEO or infringe on an NDA or, you know, whatever, you know what I mean? Yeah, trying to, like, find that balance or trying to, like, use tools to try to separate that has been big. But I don't know, I've been—personally, I've been trying to step—like, start trying to make more of a boundary for that.Corey: Yeah. I don't have much of one, but I also own the company, so my approach doesn't necessarily work for other people. I don't advertise in public that I fix AWS bills very often. That's not the undercurrent to most of my jokes and the rest. Because the people who have that painful problem aren't generally in the audience directly and they certainly don't talk about it extensively.It's word of mouth. It's being fun and engaging so people stick around. And when I periodically do mention it that sort of sticks with them. And in the fullness of time, it works as a way of, “Oh, yeah, everyone knows what you're into. And yeah, when we have this problem, reaching out to you is our first thought.” But I don't know that it's possible to measure its effectiveness. I just know that works.Joe: Yeah. For me, it's like, don't be an asshole and teach don't sell are like, the two biggest things that I'm trying to do all the time. And the goal is not to, like, trick people into, like, thinking I'm not working for a company. I think I try to be transparent, or if, like, I happen to be talking about a product that I'm working for, I try to disclose that. But yeah, I don't know. For me, it's just, like, trying to build up a community of people who, like, understand what I'm trying to put out there. You know what I mean?Corey: Yeah, it's about what you want to be known for, on some level. Part of the problem that I've had for a long time is that I've been pulled in so many directions. [They're 00:26:34] like, “Oh, you're great. Where do I go to learn more?” It's like, “Well, I have this podcast, I have the newsletter, I have the other podcast that I do in the AWS Morning Brief. I have the duckbillgroup.com. I have lastweekinaws.com. I have a Twitter account. I had a YouTube thing for a while.”It's like, there's so many different ways to send people. It's like, what is the top-of-funnel? And for me, my answer has been, sign up for the newsletter at lastweekinaws.com. That keeps you apprised of everything else and you can dial it into taste. It's also, frankly, one of those things that doesn't require algorithmic blessing to continue to show up in people's inboxes. So far at least, we haven't seen algorithms have a significant impact on that, except when they spam-bin something. And it turns out when you write content people like, the providers get yelled at by their customers of, “Hey, I'm trying to read this. What's going on?” I had a couple of reach out to me asking what the hell happened. It's kind of fun.Joe: I love that. And, Corey, I think that's so smart, too. 
It's definitely been a lesson, I think, for me and a lot of people on—that are terminally online that, like, we don't own our social following on other platforms. With, like, the downfall of Twitter, like, I'm still posting on there, but we still have a bunch of stuff on there, but my… that following is locked in. I can't take that home. But, like, you still have your email newsletter. And I even feel it for tech companies who might be listening to this, too. I feel like owning your email list is, like, not the coolest thing, but I feel like it's criminally underrated, as, like, a way of talking to people.Corey: It doesn't matter what platforms change, what my personal situation changes, I am—like, whatever it is that I wind up doing next, whenever next happens, I'll need a platform to tell people about, and that's what I've been building. I value newsletter subscribers in a metric sense far more highly and weight them more heavily than I do Twitter followers. Anyone can click a follow and then never check Twitter again. Easy enough. Newsletters? Well, that winds up requiring a little bit extra work because we do confirmed opt-ins, for obvious reasons.And we never sell the list. We never—you can't transfer permission for, like that, and we obviously respect it when people say I don't want to hear from your nonsense anymore. Great. Cool. I don't want to send this to people that don't care. Get out of here.Joe: [laugh]. No, I think that's so smart.Corey: Podcasts are impossible on the other end, but I also—you know, I control the domain and that's important to me.Joe: Yeah.Corey: Why don't you build this on top of Substack? Because as soon as Substack pivots, I'm screwed.Joe: Yeah, yeah. Which we've—I think we've seen that they've tried to do, even with the Twitter clone that tried to build last couple years. I've been burned by so many other publishing platforms over and over and over again through the years. Like, Medium, yeah, I criminally don't trust any sort of tech publishing platform anymore that I don't own. [laugh]. But I also don't want to maintain it. It's such a fine line. I just want to, like, maintain something without having to, like, maintain all the infrastructure all the time, and I don't think that exists and I don't really trust anything to help me with that.Corey: You can on some level, I mean, I wind up parking in the newsletter stuff over at ConvertKit. But I can—I have moved it twice already. I could move it again if I needed to. It's about controlling the domain. I have something that fires off once or twice a day that backs up the entire subscriber list somewhere.I don't want to build my own system, but I can also get that in an export form wherever I need it to go. Frankly, I view it as the most valuable asset that I have here because I can always find a way to turn relationships and an audience into money. I can't necessarily find a way to go the opposite direction of, well have money. Time to buy an audience. Doesn't work that way.Joe: [laugh]. No, I totally agree. You know what I do like, though, is Threads, which has kind of fallen off, but I do love the idea of their federated following [and be almost 00:30:02] like, unlock that a little bit. I do think that that's probably going to be the future. And I have to say, I just care as someone who, like, makes shit online. I don't think 98% of people don't really care about that future, but I do. 
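Corey's habit, mentioned a little earlier, of exporting the entire subscriber list on a schedule is easy to sketch. The following is a minimal example, not his actual tooling, and it assumes ConvertKit's v3 list-subscribers endpoint plus placeholder names for the environment variable and output file; the point is simply that the data lands somewhere you control, independent of any one platform.

```python
# Minimal sketch of a scheduled subscriber-list export. Assumes ConvertKit's
# v3 "list subscribers" endpoint with api_secret auth and page-based pagination;
# the env var name and output filename are placeholders.
import json
import os
import requests

API_SECRET = os.environ["CONVERTKIT_API_SECRET"]  # placeholder env var name
BASE = "https://api.convertkit.com/v3/subscribers"

def export_subscribers(path: str = "subscribers-backup.json") -> int:
    """Page through all subscribers and write them to a local JSON file."""
    subscribers, page, total_pages = [], 1, 1
    while page <= total_pages:
        resp = requests.get(
            BASE, params={"api_secret": API_SECRET, "page": page}, timeout=30
        )
        resp.raise_for_status()
        data = resp.json()
        subscribers.extend(data.get("subscribers", []))
        total_pages = data.get("total_pages", 1)
        page += 1
    with open(path, "w") as fh:
        json.dump(subscribers, fh, indent=2)
    return len(subscribers)

if __name__ == "__main__":
    print(f"backed up {export_subscribers()} subscribers")
```

Run from cron or any scheduler once or twice a day, the export is the portable asset; the list can then be re-imported into whatever provider comes next.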
Just getting burned so often on social media platforms, it helps to then have a little bit of flexibility there.Corey: Oh, yeah. And I wish it were different. I feel like, at some level, Elon being Elon has definitely caused a bit of a diaspora of social media and I think that's a good thing.Joe: Yeah. Yeah. I hope it settles down a little bit, but it definitely got things moving again.Corey: Oh, yes. I really want to thank you for taking the time to go through how you view these things. Where's the best place for people to go to follow you learn more, et cetera? Just sign up for TikTok and you'll be all over them, apparently.Joe: Go to the website that I own joekarlsson.com. It's got the links to everything on there. Opt in or out of whatever you find you want. Otherwise, I'm just going to quick plug for the company I work for: tinybird.co. If you're trying to make APIs on top of data, definitely want to check out Tinybird. We work with Kafka, BigQuery, S3, all the data sources could pull it in. [unintelligible 00:31:10] on it and publishes it as an API. It's super easy. Or you could just ignore me. That's fine, too. You could—that's highly encouraged as well.Corey: Always a good decision.Joe: [laugh]. Yeah, I agree. I'm biased, but I agree.Corey: Thanks, Joe. I appreciate your taking the time to speak with me and we'll, of course, put links to all that in the [show notes 00:31:26]. And please come back soon and regale us with more stories.Joe: I will. Thanks, Corey.Corey: Joe Karlsson, data engineer at Tinybird. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment that I'll never read because they're going to have a disk problem and they haven't learned the lesson of backups yet.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
I want to find inane content that I can just swipe six hours away without realizing it because that's how I roll.Joe: TikTok is too smart, though. I think it knows that you are doing a lot of stuff with computers and even if you keep swiping away, it's going to keep serving it up to you.Corey: For a long time, it had me pinned as a lesbian, which was interesting. Which I suppose—Joe: [laugh]. It happened to me, too.Corey: Makes sense because I follow a lot of women who are creators in comics and the rest, but I'm not interested in the thirst trap approach. So, it's like, “Mmm, this codes as lesbian.” Then they started showing me ads for ADHD, which I thought was really weird until I'm—oh right. I'm on TikTok. And then they started recommending people that I'm surprised was able to disambiguate until I realized these people have been at my house and using TikTok from my IP address, which probably is going to get someone murdered someday, but it's probably easy to wind up doing an IP address match.Joe: I feel like I have to, like, separate what is me and what is TikTok, like, trying to serve it up because I've been on lesbian TikTok, too, ADHD, autism, like TikTok. And, like, is this who I am? I don't know. [unintelligible 00:02:08] bring it to my therapist.Corey: You're learning so much about yourself based upon an algorithm. Kind of wild, isn't it?Joe: [laugh]. Yeah, I think we may be a little, like, neuro-spicy, but I think it might be a little overblown with what TikTok is trying to diagnose us with. So, it's always good to just keep it in check, you know?Corey: Oh, yes. So, let's see, what's been going on lately? We had Google Next, which I think the industry largely is taking not seriously enough. For years, it felt like a try-hard, me too version of re:Invent. And this year, it really feels like it's coming to its own. It is defining itself as something other than oh, us too.Joe: I totally agree. And that's where you and I ran into recently, too. I feel like post-Covid I'm still, like, running into people I met on the internet in real life, and yeah, I feel like, yeah, re:Invent and Google Next are, like, the big ones.I totally agree. It feels like—I mean, it's definitely, like, heavily inspired by it. And it still feels like it's a little sibling in some ways, but I do feel like it's one of the best conferences I've been to since, like, a pre-Covid 2019 AWS re:Invent, just in terms of, like… who was there. The energy, the vibes, I feel like people were, like, having fun. Yeah, I don't know, it was a great conference this year.Corey: Usually, I would go to Next in previous years because it was a great place to go to hang out with AWS customers. These days, it feels like it's significantly more than that. It's, everyone is using everything at large scale. I think that is something that is not fully understood. You talk to companies that are, like, Netflix, famously all in on AWS. Yeah, they have Google stuff, too.Everyone does. I have Google stuff. I have a few things in Azure, for God's sake. It's one of those areas where everything starts to diffuse throughout a company as soon as you hire employee number two. And that is, I think, the natural order of things. The challenge, of course, is the narrative people try and build around it.Joe: Yep. Oh, totally. Multi-cloud's been huge for you know, like, starting to move up. And it's impossible not to. It was interesting seeing, like, Google trying to differentiate itself from Azure and AWS. 
And, Corey, I feel like you'd probably agree with this, too, AI was like, definitely the big buzzword that kept trying to, like—Corey: Oh, God. Spare me. And I say that, as someone who likes AI, I think that there's a lot of neat stuff lurking around and value hiding within generative AI, but the sheer amount of hype around it—and frankly—some of the crypto bros have gone crashing into the space, make me want to distance myself from it as far as humanly possible, just because otherwise, I feel like I get lumped in with that set. And I don't want that.Joe: Yeah, I totally agree. I know it feels like it's hard right now to, like, remain ungrifty, but, like, still, like—trying—I mean, everyone's trying to just, like, hammer in an AI perspective into every product they have. And I feel like a lot of companies, like, still don't really have a good use case for it. You're still trying to, like, figure that out. We're seeing some cool stuff.Honestly, the hard part for me was trying to differentiate between people just, like, bragging about OpenAI API addition they added to the core product or, like, an actual thing that's, like, AI is at the center of what it actually does, you know what I mean? Everything felt like it's kind of like tacked on some sort of AI perspective to it.Corey: One of the things that really is getting to me is that you have these big companies—Google and Amazon most notably—talk about how oh, well, we've actually been working with AI for decades. At this point, they keep trying to push out how long it's been. It's like, “Okay, then not for nothing, then why does”—in Amazon's case—“why does Alexa suck? If you've been working on it for this long, why is it so bad at all the rest?” It feels like they're trying to sprint out with a bunch of services that very clearly were not conceptualized until Chat-Gippity's breakthrough.And now it's oh, yeah, we're there, too. Us, too. And they're pivoting all the marketing around something that, frankly, they haven't demonstrated excellence with. And I feel like they're leaving a lot of their existing value proposition completely in the dust. It's, your customers are not using you because of the speculative future, forward-looking AI things; it's because you are able to solve business problems today in ways that are not highly speculative and are well understood. That's not nothing and there needs to be more attention paid to that. And I feel like there's this collective marketing tripping over itself to wrap itself in hype that does them no services.Joe: I totally agree. I feel like honestly, just, like, a marketing perspective, I feel like it's distracting in a lot of ways. And I know it's hot and it's cool, but it's like, I think it's harder right now to, like, stay focused to what you're actually doing well, as opposed to, like, trying to tack on some AI thing. And maybe that's great. I don't know.Maybe that's—honestly, maybe you're seeing some traction there. I don't know. But I totally agree. I feel like everyone right now is, like, selling a future that we don't quite have yet. I don't know. I'm worried that what's going to happen again, is what happened back in the IBM Watson days where everyone starts making bold—over-promising too much with AI until we see another AI winter again.Corey: Oh, the subtext is always, we can't wait to fire our entire customer service department. That one—Joe: Yeah.Corey: Just thrills me.Joe: [laugh].Corey: It's like, no, we're just going to get rid of junior engineers and just have senior engineers. 
Yeah, where do you think those people come from, by the way? We aren't—they aren't just emerging fully formed from the forehead of some god somewhere. And we're also seeing this wild divergence from reality. Remember, I fix AWS bills for a living. I see very large companies, very large AWS spend.The majority of spend remains on EC2 across the board. So, we don't see a lot of attention paid to that at re:Invent, even though it's the lion's share of everything. When we do contract negotiations, we talk about generative AI plan and strategy, but no one's saying, oh, yeah, we're spending 100 million a year right now on AWS but we should commit 250 because of all this generative AI stuff we're getting into. It's all small-scale experimentation and seeing if there's value there. But that's a far cry from being the clear winner what everyone is doing.I'd further like to point out that I can tell that there's a hype cycle in place and I'm trying to be—and someone's trying to scam me. As soon as there's a sense of you have to get on this new emerging technology now, now, now, now, now. I didn't get heavily into cloud till 2016 or so and I seem to have done all right with that. Whenever someone is pushing you to get into an emerging thing where it hasn't settled down enough to build a curriculum yet, I feel like there's time to be cautious and see what the actual truth is. Someone's selling something; if you can't spot the sucker, chances are, it's you.Joe: [laugh]. Corey, have you thought about making an AI large language model that will help people with their cloud bills? Maybe just feed it, like, your invoices [laugh].Corey: That has been an example, I've used a number of times with a variety of different folks where if AI really is all it's cracked up to be, then the AWS billing system is very much a bounded problem space. There's a lot of nuance and intricacy to it, but it is a finite set of things. Sure, [unintelligible 00:08:56] space is big. So, training something within those constraints and within those confines feels like it would be a terrific proof-of-concept for a lot of these things. Except that when I've experimented a little bit and companies have raised rounds to throw into this, it never quite works out because there's always human context involved. The, oh yeah, we're going to wind up turning off all those idle instances, except they're in idle—by whatever metric you're using—for a reason. And the first time you take production down, you're not allowed to save money anymore.Joe: Nope. That's such a good point. I agree. I don't know about you, Corey. I've been fretting about my job and, like, what I'm doing. I write a lot, I do a lot of videos, I'm programming a lot, and I think… obviously, we've been hearing a lot about, you know, if it's going to replace us or not. I honestly have been feeling a lot better recently about my job stability here. I don't know. I totally agree with you. There's always that, like, human component that needs to get added to it. But who knows, maybe it's going to get better. Maybe there'll be an AI-automated billing management tool, but it'll never be as good as you, Corey. Maybe it will. I don't know. [laugh].Corey: It knows who I am. When I tell it to write in the style of me and give it a blog post topic and some points I want to make, almost everything it says is wrong. But what I'll do is I'll copy that into a text editor, mansplain-correct the robot for ten minutes, and suddenly I've got the bones of a decent rough draft because. 
And yeah, I'll wind up plagiarizing three or four words in a row at most, but that's okay. I'm plagiarizing the thing that's plagiarizing from me and there's a beautiful symmetry to that. What I don't understand is some of the outreach emails and other nonsensical stuff I'll see where people are letting unsupervised AI just write things under their name and sending it out to people. That is anathema to me.Joe: I totally agree. And it might work today, it might work tomorrow, but, like, it's just a matter of time before something blows up. Corey, I'm curious. Like, personally, how do you feel about being in the ChatGPT, like, brain? I don't know, is that flattering? Does that make you nervous at all?Corey: Not really because it doesn't get it in a bunch of ways. And that's okay. I found the same problem with people. In my time on Twitter, when I started live-tweet shitposting about things—as I tend to do as my first love language—people will often try and do exactly that. The problem that I run into is that, “The failure mode of ‘clever' is ‘asshole,'” as John Scalzi famously said, and as a direct result of that, people wind up being mean and getting it wrong in that direction.It's not that I'm better than they are. It's, I had a small enough following, and no one knew who I was in my mean years, and I realized I didn't feel great making people sad. So okay, you've got to continue to correct the nosedive. But it is perilous and it is difficult to understand the nuance. I think occasionally when I prompt it correctly, it comes up with some amazing connections between things that I wouldn't have seen, but that's not the same thing as letting it write something completely unfettered.Joe: Yeah, I totally agree. The nuance definitely gets lost. It may be able to get, like, the tone, but I think it misses a lot of details. That's interesting.Corey: And other people are defending it when that hallucinates. Like, yeah, I understand there are people that do the same thing, too. Yeah, the difference is, in many cases, lying to me and passing it off otherwise is a firing offense in a lot of places. Because if you're going to be 19 out of 20 times, you're correct, but 5% wrong, you're going to bluff, I can't trust anything you tell me.Joe: Yeah. It definitely, like, brings your, like—the whole model into question.Corey: Also, remember that my medium for artistic creation is often writing. And I think that, on some level, these AI models are doing the same things that we do. There are still turns of phrase that I use that I picked up floating around Usenet in the mid-90s. And I don't remember who said it or the exact context, but these words and phrases have entered my lexicon and I'll use them and I don't necessarily give credit to where the first person who said that joke 30 years ago. But it's a—that is how humans operate. We are influenced by different styles of writing and learn from the rest.Joe: True.Corey: That's a bit different than training something on someone's artistic back catalog from a painting perspective and then emulating it, including their signature in the corner. Okay, that's a bit much.Joe: [laugh]. I totally agree.Corey: So, we wind up looking right now at the rush that is going on for companies trying to internalize their use of enterprise AI, which is kind of terrifying, and it all seems to come back to data.Joe: Yes.Corey: You work in the data space. How are you seeing that unfold?Joe: Yeah, I do. I've been, like, making speculations about the future of AI and data forever. 
I've had dreams of tools I've wanted forever, and I… don't have them yet. I don't think they're quite ready yet. I don't know, we're seeing things like that—I think people are working on a lot of problems. For example, like, I want AI to auto-optimize my database. I want it to, like, make indexes for me. I want it to help me with queries or optimizing queries. We're seeing some of that. I'm not seeing anyone do it particularly well yet. I think it's up in the air. I feel like it could be coming soon, though. But that's the thing, too: if you mess up a query, or, like, a… large language model hallucinates a really shitty query for you, that could break your whole system really quickly. I feel like there still needs to be, like, a human being in the middle of it to, like, kind of help.

Corey: I saw a blog post recently that AWS put out that gave an example that just hard-coded a credential into it. And they said, "Don't do this, but for demonstration purposes, this is how it works." Well, that nuance gets lost when you use that for AI training and that's, I think, in part, where you start seeing a whole bunch of the insecure crap these things spit out.

Joe: Yeah, I totally agree. Well, the big thing I've seen, too, is, like, large language models typically don't have a secure option, and whatever you put in, like, helps train the model itself later on. I don't know, I'm sure, like, a lot of teams don't want to have their most secret data end up public on a large language model at some point in the future. Which is, like, a huge issue right now.

Corey: I think that what we're seeing is that you still need someone with expertise in a given area to review what this thing spits out. It's great at solving a lot of the busy work stuff, but you still need someone who's conversant with the concepts to look at it. And that is, I think, something that turns into a large-scale code review, where everyone else just tends to go, "Oh, okay, we do this with code review." "Oh, how big is the diff?" "50,000 lines." "Looks good to me." Whereas, "Three lines." "I'm going to criticize that thing with four pages of text." People don't want to do the deep-dive stuff when there's a huge, giant project that hits, so they won't. And it'll be fine, right up until it isn't.

Joe: Corey, you and I know people and developers. Do you think it's irresponsible to put out there an example of how to do something like that, even with, like, an asterisk? I feel like someone's going to still go out and try to do that and probably push it to production.

Corey: Of course they are.

Joe: [laugh].

Corey: I've seen this with some of my own code. I had something on Docker Hub years ago with a container that was called 'Terrible Ideas.' And I'm sure it's being used in, like—it was basically the environment I use for a talk I gave around Git, which makes sense. And because I don't want to reset all the repositories back to the way they came from with a bunch of old commands, I just want a constrained environment that will be the same every time I give the talk. Awesome. I'm sure it's probably being run in production at a bank somewhere, because why wouldn't it be? That's people. That's life. You're not supposed to just copy and paste from Chat-Gippity. You're supposed to do that from Stack Overflow like the rest of us. Where do you think your existing code's coming from in a lot of these shops?

Joe: Yep. No, I totally agree. Yeah, I don't know.
It'll be interesting to see how this shakes out with, like, people going and doing this stuff, or how honest they're going to be about it, too. I'm sure it's happening. I'm sure people are tripping over themselves right now, [adding 00:16:12].

Corey: Oh, yeah. But I think, on some level, you're going to see a lot more grift coming out of this stuff. When you start having things that look a little more personalized, you can use it for spam purposes, you can use it for, I'm just going to basically copy and paste what this says and wind up getting a job on Upwork or something that is way more than I could handle myself, but using this thing, I'm going to wind up coasting through. Caveat emptor is always the case on that.

Joe: Yeah, I totally agree.

Corey: I mean, it's easy for me to sit here and talk about ethics. I believe strongly in doing the right thing. But I'm also not worried about whether I'm able to make rent this month or put food on the table. That's a luxury. At some point, like, a lot of that strips away and you do what you have to do to survive. I don't necessarily begrudge people doing these things until it gets to a certain point of, okay, now you're not doing this to stay alive anymore. You're doing this to basically seek rent.

Joe: Yeah, I agree. Or just, like, capitalize on it. I do think this space is less grifty than the crypto space, but as we've seen over and over and over and over again in tech, there's such a fine line between, like, a genuinely great idea, and somebody taking advantage of it—and other people—with that idea.

Corey: I think that's one of those sad areas where you're not going to be able to fix human nature, regardless of the technology stack you bring to bear.

Joe: Yeah, I totally agree.

[midroll 00:17:30]

Corey: So, what else are you seeing these days that's interesting? What excites you? What do you see that isn't getting enough attention in the space?

Joe: I don't know, I guess I'm in the data space, I'm… the thing I think I do see a lot of is huge interest in data. Data right now is the thing that's come up. Like, I don't—that's the thing that's training these models, and everyone's trying to figure out what to do with all this data, all these massive databases, data lakes, whatever. I feel like everyone's, kind of like, taking a second look at all of this data they've been collecting for years and haven't really known what to do with, and trying to figure out either, like, if you can make a model out of it, or if you can, like… level it up, whatever. Corey, you and I were joking around recently—you've had a lot of data people on here recently, too—I feel like us data folks are just getting extra loud right now. Or maybe it's just that the data space is where the action's at right now. I don't know, the markets are really weird. Who knows? But, um, I feel like data right now is super valuable, more so than ever. And even still, like, I mean, we're seeing companies freaking out, like, Twitter and Reddit freaking out about accessing their data and who's using it and how. I don't know, I feel like there's a lot of action going on there right now.

Corey: I think that there's a significant push from the data folks where, for a long time, data folks were DBAs—

Joe: Yeah.

Corey: —let's be direct. And that role has continued to evolve in a whole bunch of different ways. It's never been an area I've been particularly strong in.
I am not great at algorithmic complexity; it turns out you can saturate some beefy instances with just a little bit of data if your queries are all terrible. And if you're unlucky—as I tend to be—and have an aura of destroying things, great, you probably don't want to go and make that what you do.

Joe: [laugh]. It's a really good point. I mean, I don't know, but if you blow up data at a company, you're probably going to be in big trouble. And especially at the scale we're talking about with most companies these days, it's super easy to either take down a server or generate an insane bill off of some shitty query.

Corey: Oh, when I was at Reach Local years and years ago—my first Linux admin job—when I broke the web server farm, it was amusing; when I broke part of the data warehouse, nobody was laughing.

Joe: [laugh]. I wonder why.

Corey: It was a good-faith mistake, and that's fair. It was a convoluted series of things that set it up, and honestly, the way the company and my boss responded to me at the time set the course of the rest of my career. But it was definitely something that got my attention. It scares me. I'm a big believer in backups as a direct result.

Joe: Yeah. Here's the other thing, too. Actually, our company, Tinybird, is working on versioning your data sources right now and treating your data sources like Git, but I feel like even still today, most companies' data is just run by some DBA. Like, Mike down the hall is the one responsible for keeping their SQL servers online, keeping them rebooted, and, like, manually updating any changes on there. And I feel like, generally speaking across the industry, we're not taking data seriously. Which is funny, because I'm with you there. Like, I get terrified touching production databases because I don't want anything bad to happen to them. But if we could, like, make it easier to roll back or, like, handle that stuff, that would be so much easier for me and make it, like, less scary to deal with. I feel like treating databases as, like, a serious DevOps practice is not really—I'm not seeing enough of it. People are definitely doing it. I just want more.

Corey: It seems like with data, there's a lack of iterative approaches to it. A line that someone came up with when I was working with them a decade and change ago was that you can talk about agile all you want, but when it comes to payments, everyone's doing waterfall. And it feels like, on some level, data's kind of the same.

Joe: Yeah. And I don't know, like, how to fix it. I think everyone's just too scared of it to really touch it. Migrating over to a different version control, tr
This week on Equity, Alex was joined by Nathan Baschez, the CEO and founder of Lex, an AI-infused online writing tool that recently raised capital. Together, we're talking through a few key topics that have been top of mind in recent months: How many AI-powered, or AI-using, writing tools can the market support? How far into the generative AI moment are we, and how much should we anticipate in the form of technology improvements? And then we discuss the nuts-and-bolts aspects of pricing an AI-powered service and other financial matters related to building a service today that leans on artificial intelligence. The last question is far from idle. Recall that back in 2020 there was conversation amongst venture players about the economics of AI startups, with the perspective at the time indicating that while the cohort might have more difficult early economics, their numbers (gross margins, really) would improve over time. But what about when a startup is using, say, the OpenAI API for its core AI work? Will similar efficiencies bloom? (A back-of-the-envelope sketch of that math follows below.) Equity is back in its regular groove now that Disrupt is behind us — more to come! And before we go: check out the UpFlip Podcast, where you get to unravel how great businesses are built, how they are run behind the scenes, and how their success can be replicated. We think you'll love episode 79, which features a guest who transformed his passion for gardening into a $7.3 million-a-year venture. You can find the podcast on YouTube or wherever you listen to podcasts. For episode transcripts and more, head to Equity's Simplecast website. Equity drops at 7 a.m. PT every Monday, Wednesday and Friday, so subscribe to us on Apple Podcasts, Overcast, Spotify and all the casts. TechCrunch also has a great show on crypto, a show that interviews founders and more!
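To make the unit-economics question concrete, here is a toy gross-margin calculation for a product that resells LLM calls. Every number is an illustrative assumption, not data from the episode; the point is only how quickly per-request API costs eat into a flat subscription price.

```ts
// Toy unit economics for a product built on a third-party LLM API.
// All figures below are illustrative assumptions, not data from the episode.
const monthlyPricePerUser = 20;       // what the startup charges per user, in USD
const requestsPerUserPerMonth = 300;  // how often a typical user hits "generate"
const apiCostPerRequest = 0.04;       // blended OpenAI API cost per request, in USD

const apiCostPerUser = requestsPerUserPerMonth * apiCostPerRequest;               // $12
const grossMargin = (monthlyPricePerUser - apiCostPerUser) / monthlyPricePerUser; // 0.4

console.log(`API cost per user: $${apiCostPerUser.toFixed(2)}`);
console.log(`Gross margin: ${(grossMargin * 100).toFixed(0)}%`);
```

Whether those margins improve depends on whether per-token prices fall faster than usage grows, which is exactly the efficiency question the episode raises.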
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: GPT-4 for personal productivity: online distraction blocker, published by Sergii on September 27, 2023 on LessWrong.

There are many apps for blocking distracting websites: freedom.to, leechblock, selfcontrol, coldturkey, just to name a few. They are useful for maintaining focus, avoiding procrastination, and curbing addictive web surfing. They work well for blocking a list of a few distracting websites. For me, this is not enough, because I'm spending a large portion of my time on a large number of websites which I check out for a minute or two and then never visit again. It's just impossible to maintain a blocklist for this long tail. Also, the web has grown so much that there are just too many easily found alternatives for any blocked distraction.

Well, GPT-4 to the rescue! With an LLM it's possible to block websites based on their content, checking each page to see whether it's distracting or useful/productive. To test the idea I have implemented a prototype of a distraction-filtering browser extension. This way, GPT-4 is turning into a personal productivity assistant! The extension sends the content of each loaded page to the OpenAI API and asks GPT if the page should be blocked. The prompt can be edited in the config window; a default prompt is used otherwise.

Sensitive content, whitelist & blacklist. While the extension is active, it sends a sample of each visited page's content to the OpenAI API. This might be a problem for pages with sensitive content. You can add any domains which you do not want to expose to OpenAI to the whitelist or the blacklist. Pages that are matched are allowed or blocked without sending content to OpenAI. OpenAI claims to handle user data securely and to not use data submitted via the API for model training. Still, if you have any concerns about the privacy and security of the pages that you visit, and if you do not want to risk leaking your browsing history, avoid using this extension.

Installation and testing. To try it out, download the extension (github.com/coolvision/awf/releases/download/0.1/awf-0.1.zip), install it (instructions), and enter your API key in the extension's config page. Then, navigate to any page; it might get blocked.

Does it work? I have been using it for a few days, and it does work quite well, with correct decisions in most cases. One problem is that GPT-4 is expensive, and my usage has been up to ~$1/day. It would probably cost $10-30/month, which is not too much, but still a thing to improve. Another issue is that the OpenAI API is quite slow; it takes several seconds (up to 5-10s) to validate each page. I haven't decided yet if it's a feature or a problem. On one hand, it does make web browsing more mindful, which is good, but then it does kill the flow/momentum when I want to quickly research something.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
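A minimal sketch of the kind of check the extension describes, assuming the plain chat-completions HTTPS endpoint and a Node 18+ style fetch; the prompt wording, model name, and shouldBlockPage helper are illustrative and not taken from the extension's source.

```ts
// Sketch: ask a chat model whether a page is a distraction. Illustrative only.
async function shouldBlockPage(pageText: string): Promise<boolean> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4",
      temperature: 0,
      messages: [
        {
          role: "system",
          content:
            "You decide whether a web page is a distraction for someone trying to focus on work. " +
            "Reply with exactly one word: BLOCK or ALLOW.",
        },
        // Send only a sample of the page to keep token usage (and cost) down.
        { role: "user", content: pageText.slice(0, 4000) },
      ],
    }),
  });
  const data = await res.json();
  const verdict: string = data.choices?.[0]?.message?.content ?? "ALLOW";
  return verdict.trim().toUpperCase().startsWith("BLOCK");
}
```

In the real extension, the whitelist and blacklist checks would run before a call like this, so pages on sensitive domains never leave the browser.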
In this episode of the Non-Intuitive Bits podcast, hosts Slava and Nishant Sharma dive into a range of discussions. The conversation starts off with them sharing their struggles with transitioning to Linux from macOS tools, mentioning tools like Mission Control and Troipoid. They also discuss Fathom, an AI-driven meeting tool known for features like talk-time tracking and meeting summaries. The hosts contemplate the future roles of AI, such as managing vacation requests, and the futuristic concept of a "manager as a service".

Gaming fans will love the deep dive into games like Armored Core 6, Elden Ring, the Souls series, and Counter-Strike: GO. They express their excitement for the upcoming CS2 on Valve's Source 2 engine and the upcoming 'Constellation Edition' of Bethesda's Starfield. The discussion continues with features on Poly Bridge 1 & 3, Euro Truck Simulator, and Microsoft Flight Simulator.

The podcast ends with a thoughtful discussion on the complications of manual VPN connections and the potential for automation, including the intriguing prospect of location-based automation for VPNs. Throughout the episode, the hosts also delve into operational specifics, including the use of GPT tools like Jess and Jess P, innovations in AI technology like AutoMod, and the cost efficiency of using the OpenAI API.

To top it all off, Slava and Nishant recount their ongoing two-year travel journey that started in March, exploring various national parks and cities with Airbnb stays across the States. They share memories from Chicago, Boston, Montana, and more. Looking for a co-host, the hosts promise more enlightening discussions in future episodes, making this podcast a must-listen for tech and travel enthusiasts!
AI Chat: ChatGPT & AI News, Artificial Intelligence, OpenAI, Machine Learning
In this episode, we dive into Lex's recent $2.75 million fundraising round for its AI writing tool. We discuss the technology behind it, its potential applications, and what this means for both developers and end-users in the fast-growing world of AI-driven content creation. Get on the AI Box Waitlist: https://AIBox.ai/ Facebook Community: https://www.facebook.com/groups/739308654562189/ Discord Community: https://aibox.ai/discord Follow me on X: https://twitter.com/jaeden_ai
Smooth Business Growth – 15 Minutes Of Pure Marketing Strategies Proven To Move The Needle
Anyone in podcasting knows there are a multitude of tasks involved in podcast production, from finding the right guests and coordinating with them to conducting research, prepping for episodes, publishing on websites, and marketing on platforms like TikTok and Instagram. It can be overwhelming, and that's why Adam Burha created Podtask. He not only shares his incredible healing journey through podcasting but also explains how the pandemic and the rise in remote communication have made podcasting more accessible and familiar to people, fostering greater openness, authenticity, vulnerability, and a desire to help others.

Here are my top three takeaways from the episode:

1️⃣ Efficient Podcast Production: Managing multiple podcasts and the tasks involved in production can be overwhelming, which is why Podtask's Content Creator feature is a game-changer. With the click of a button, it uses the OpenAI API to generate transcripts, show notes, episode titles, and key points (a rough sketch of what such an API call looks like follows after the key times below). It can even generate impactful hooks for the first 20 seconds of each episode automatically. Plus, it automates the addition of tasks and notifications for each new episode recording, streamlining the process and eliminating the need for manual management and email communication.

2️⃣ Streamlined Guest Spotting: Guest spotting is a fantastic way to grow your podcast, and Podtask has made it easier than ever. Harnessing the power of the entire podcasting database through the Listen Notes API, you can effortlessly send targeted podcast pitch requests to shows in your genre or category. Stand out from the crowd by including a short video introducing yourself and explaining why you would be the perfect guest. This proactive approach increases your chances of securing spots on ideal shows and expanding your reach.

3️⃣ Regain Time and Prevent Burnout: One of the biggest challenges podcasters face is finding balance and avoiding burnout. As a CEO overseeing multiple businesses, Adam knows the stress and anxiety that come with such responsibilities, which is why he wanted to create a tool that not only saves you time but also helps you avoid burnout. By regaining precious time, you can continue pursuing your passion and purpose without compromising your well-being. That's the essence of Podtask: an affordable and comprehensive podcast management tool dedicated to reducing stress, streamlining your workflow, and enabling you to focus on what matters most: creating incredible content and connecting with your audience.

Key Times:
00:00:44 Podcasting heals trauma, helps overcome anxiety attacks.
00:04:17 Podcasting: therapeutic, transformative, and accessible in modern times.
00:08:48 Entrepreneur started podcast production company eIQ Media.
00:11:13 Built and customized Podtask
00:15:55 Automated episode recording and scheduling with notifications.
00:20:36 AI-based podcasting tool simplifies marketing and content creation.
00:23:45 Pitching guests to grow your podcast audience.
00:31:35 Podcaster launching new show, interviewing other podcasters.
00:33:51 Free podcast setup help for newbies.
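For a sense of what a "Content Creator" style feature does under the hood, here is a minimal sketch of turning a transcript into show notes with the OpenAI API; the prompt, model choice, and draftShowNotes helper are assumptions for illustration, not Podtask's actual implementation.

```ts
// Sketch: draft an episode title, show notes, key points, and a hook from a transcript.
// Illustrative only; not Podtask's code.
async function draftShowNotes(transcript: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [
        {
          role: "system",
          content:
            "You write podcast marketing copy. Given a transcript, return an episode title, " +
            "a short show-notes paragraph, 3-5 key points, and a hook for the first 20 seconds.",
        },
        // Long transcripts would need chunking or summarization before a call like this.
        { role: "user", content: transcript.slice(0, 12000) },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```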
Jim talks with Carlos Perez about the ideas in his new book A Pattern Language for Generative AI: A Self-Generating GPT-4 Blueprint. They discuss GPT-4's ability to introspect on its capabilities, Christopher Alexander's idea of a pattern language, pattern language design, Jim's script-writing program, moving beyond ChatGPT to the OpenAI API, managing the context window, chain of thought prompting, the skyhook effect, the value of using tables, creation patterns, input-output pairs, the power of examples, punctuation, cloze prompts, compressing text, the mystery of LLM capabilities, an explanation for state emulation, the system prompt, explainability patterns, meta-levels of language, procedural patterns, design thinking prompts, the idea of a GPTpedia, composite patterns, in-painting vs out-painting, corrective patterns, 6 thinking hats, attribute listing prompts, problem restatements, inverted interaction, multiple-discipline prompts, modularity patterns, ChatGPT plugins, katas & meditations, and much more. Episode Transcript A Pattern Language for Generative AI: A Self-Generating GPT-4 Blueprint, by Carlos Perez "ScriptHelper-001: an experimental GPT-4 based Movie Script Writing Program," by Jim Rutt Artificial Intuition: The Improbable Deep Learning Revolution, by Carlos Perez Deep Learning AI Playbook: Strategy for Disruptive Artificial Intelligence, by Carlos Perez Artificial Empathy: A Roadmap for Human-Aligned Artificial Intelligence, by Carlos Perez Carlos E. Perez is a seasoned software architect and developer with 30 years of experience in bringing software systems from concept to production. He has authored books on Artificial Intuition, Fluency, and Empathy, with a primary focus on applying semiotic methods in Deep Learning. Carlos holds a Master's degree in Computer Science from the University of Massachusetts and has U.S. patents in expert systems and social networks.
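One of the simpler patterns in that list, input-output pairs (the power of examples), translates directly once you move from ChatGPT to the OpenAI API. The sketch below is an illustration of the idea, not an excerpt from the book; the prompt content is made up.

```ts
// Few-shot prompting via input-output pairs: the example pairs teach the format,
// and the final user message is the new input we actually want transformed.
const messages = [
  { role: "system", content: "Rewrite movie log lines in the style of a noir detective." },
  { role: "user", content: "A fish gets lost and his dad searches the ocean for him." },
  { role: "assistant", content: "The kid vanished into the blue. The old man had nothing left but a bad fin and worse odds." },
  { role: "user", content: "A robot left alone on Earth falls in love and follows her into space." },
  { role: "assistant", content: "He'd spent centuries stacking other people's garbage. Then she showed up, and he finally had something worth losing." },
  // The new input; the model infers the pattern from the pairs above.
  { role: "user", content: "A chef who is a rat cooks his way to the top of a Paris kitchen." },
];

// This is the request body you would POST to /v1/chat/completions.
console.log(JSON.stringify({ model: "gpt-4", messages }, null, 2));
```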
In this Hasty Treat, Scott and Wes talk about what you can do with the OpenAI API: how to get started with it, pricing, tuning your model, and gotchas to watch out for (a minimal example of a first API call follows after the show notes).

Sentry - Sponsor
If you want to know what's happening with your code, track errors and monitor performance with Sentry. Sentry's Application Monitoring platform helps developers see performance issues, fix errors faster, and optimize their code health. Cut your time on error resolution from hours to minutes. It works with any language and integrates with dozens of other services. Syntax listeners new to Sentry can get two months for free by visiting Sentry.io and using the coupon code TASTYTREAT during sign up.

Show Notes
00:26 Welcome
01:17 Sponsor: Sentry
02:39 What is the OpenAI API?
05:11 Getting started with the API
07:41 How to run the OpenAI API
14:16 GPT-4 update
17:58 Tune your models
19:46 Generating questions with ChatGPT
24:30 Speech to text
Otter.ai - Voice Meeting Notes & Real-time Transcription
Descript | All-in-one video & podcast editing, easy as a doc.
26:12 Related API
27:33 LangChain
32:12 Save your replies

Tweet us your tasty treats
Scott's Instagram
LevelUpTutorials Instagram
Wes' Instagram
Wes' Twitter
Wes' Facebook
Scott's Twitter
Make sure to include @SyntaxFM in your tweets
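As promised above, a minimal "hello world" against the chat-completions endpoint over plain HTTPS (no SDK), assuming Node 18+ and an OPENAI_API_KEY environment variable; the model name and prompt are placeholders.

```ts
// Minimal first request to the OpenAI chat completions endpoint.
const res = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-3.5-turbo", // swap in gpt-4 once you have access
    messages: [{ role: "user", content: "Write three quiz questions about CSS grid." }],
    max_tokens: 200, // caps completion tokens, and therefore cost
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```

Pricing is per token in both directions, which is why capping max_tokens and keeping prompts short matters from the very first request.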
Agenda
ChatGPT 4 came out
Arlingbrook: What am I selling? / What am I building?
There's power in being first

ChatGPT 4
Email
API Waitlist: Please sign up for our waitlist to get rate-limited access to the GPT-4 API – which uses the same ChatCompletions API as gpt-3.5-turbo. We'll start inviting some developers today, and scale up availability and rate limits gradually to balance capacity with demand.
Priority Access: Developers can get prioritized API access to GPT-4 for contributing model evaluations to OpenAI Evals that get merged, which will help us improve the model for everyone.
ChatGPT Plus: ChatGPT Plus subscribers will get GPT-4 access on chat.openai.com with a dynamically adjusted usage cap. We expect to be severely capacity constrained, so the usage cap will depend on demand and system performance. API access will still be through the waitlist.
API Pricing
gpt-4 with an 8K context window (about 13 pages of text) will cost $0.03 per 1K prompt tokens, and $0.06 per 1K completion tokens.
gpt-4-32k with a 32K context window (about 52 pages of text) will cost $0.06 per 1K prompt tokens, and $0.12 per 1K completion tokens.
Livestream: Please join us for a live demo of GPT-4 at 1pm PDT today, where Greg Brockman (co-founder & President of OpenAI) will showcase GPT-4's capabilities and the future of building with the OpenAI API.
—The OpenAI team
Demo: https://www.youtube.com/watch?v=outcGtbnMuQ
Overview: https://openai.com/product/gpt-4 (overview page of GPT-4 and what early customers have built on top of the model)
Blog Post: https://openai.com/research/gpt-4 (blog post with details on the model's capabilities and limitations, including eval results)
API Waitlist: https://openai.com/waitlist/gpt-4-api
Visual inputs: GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task. Specifically, it generates text outputs (natural language, code, etc.) given inputs consisting of interspersed text and images. Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs. Furthermore, it can be augmented with test-time techniques that were developed for text-only language models, including few-shot and chain-of-thought prompting. Image inputs are still a research preview and not publicly available.
OpenAI Pricing: https://openai.com/pricing

Arlingbrook
Offers 2 subscriptions:
$8.99/month: Exclusive access to creators
$45.99/month: Rhinoleg CRM, unlimited features, chat bot, the whole thing is written in AI, much more

There's power in being first
The world can only hold 2 options in their minds
Arlingbrook is being released in no less than 60 days

Support this podcast at — https://redcircle.com/the-secret-to-success/exclusive-content
Advertising Inquiries: https://redcircle.com/brands
Privacy & Opt-Out: https://redcircle.com/privacy
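Those per-1K-token prices translate into per-request costs that are easy to estimate. Here is a small sketch using the launch figures quoted in the notes above; treat the constants as a snapshot, since prices can change.

```ts
// Estimate the cost of one GPT-4 request from the prices listed above ($ per 1K tokens).
const PRICES = {
  "gpt-4":     { prompt: 0.03, completion: 0.06 }, // 8K context window
  "gpt-4-32k": { prompt: 0.06, completion: 0.12 }, // 32K context window
} as const;

function requestCost(model: keyof typeof PRICES, promptTokens: number, completionTokens: number): number {
  const p = PRICES[model];
  return (promptTokens / 1000) * p.prompt + (completionTokens / 1000) * p.completion;
}

// Example: a 6,000-token prompt with a 1,000-token completion on the 8K model.
console.log(requestCost("gpt-4", 6000, 1000).toFixed(2)); // "0.24" = $0.18 prompt + $0.06 completion
```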
Mike McCue, Co-Founder and CEO of Flipboard, joins the show to share how Flipboard is getting into the Fediverse by adding support for Mastodon in its app. Amanda Silberling of TechCrunch comes on to talk about a proposed bill called the Deterring America's Technological Adversaries ACT that would give President Biden the power to ban TikTok in the United States. Jason talks about OpenAI launching an API for ChatGPT and how businesses like Instacart and Instagram utilize it through OpenAI's dedicated capacity for enterprise customers. Finally, Mikah rounds things out with a demonstration of Bluesky, Jack Dorsey's Twitter alternative. Hosts: Jason Howell and Mikah Sargent Guests: Mike McCue and Amanda Silberling Download or subscribe to this show at https://twit.tv/shows/tech-news-weekly. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit Sponsors: CDW.com/DellClient kolide.com/tnw