Federico Barbero (DeepMind/Oxford) is the lead author of "Transformers Need Glasses!". Have you ever wondered why LLMs struggle with seemingly simple tasks like counting or copying long strings of text? We break down the theoretical reasons behind these failures, revealing architectural bottlenecks and the challenges of maintaining information fidelity across extended contexts.

Federico explains how these issues are rooted in the transformer's design, drawing parallels to over-squashing in graph neural networks and detailing how the softmax function limits sharp decision-making. But it's not all bad news! Discover practical "glasses" that can help transformers see more clearly, from simple input modifications to architectural tweaks.

SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!
https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focused on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***

https://federicobarbero.com/

TRANSCRIPT + RESEARCH:
https://www.dropbox.com/s/h7ys83ztwktqjje/Federico.pdf?dl=0

TOC:
1. Transformer Limitations: Token Detection & Representation
[00:00:00] 1.1 Transformers fail at single token detection
[00:02:45] 1.2 Representation collapse in transformers
[00:03:21] 1.3 Experiment: LLMs fail at copying last tokens
[00:18:00] 1.4 Attention sharpness limitations in transformers

2. Transformer Limitations: Information Flow & Quantization
[00:18:50] 2.1 Unidirectional information mixing
[00:18:50] 2.2 Unidirectional information flow towards sequence beginning in transformers
[00:21:50] 2.3 Diagonal attention heads as expensive no-ops in LLaMA/Gemma
[00:27:14] 2.4 Sequence entropy affects transformer model distinguishability
[00:30:36] 2.5 Quantization limitations lead to information loss & representational collapse
[00:38:34] 2.6 LLMs use subitizing as opposed to counting algorithms

3. Transformers and the Nature of Reasoning
[00:40:30] 3.1 Turing completeness conditions in transformers
[00:43:23] 3.2 Transformers struggle with sequential tasks
[00:45:50] 3.3 Windowed attention as solution to information compression
[00:51:04] 3.4 Chess engines: mechanical computation vs creative reasoning
[01:00:35] 3.5 Epistemic foraging introduced

REFS:
[00:01:05] Transformers Need Glasses!, Barbero et al.
https://proceedings.neurips.cc/paper_files/paper/2024/file/b1d35561c4a4a0e0b6012b2af531e149-Paper-Conference.pdf
[00:05:30] Softmax is Not Enough, Veličković et al.
https://arxiv.org/abs/2410.01104
[00:11:30] Advanced Algorithms Lecture 15, Chawla
https://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec15.pdf
[00:15:05] Graph Attention Networks, Veličković
https://arxiv.org/abs/1710.10903
[00:19:15] Extracting Training Data, Carlini et al.
https://arxiv.org/pdf/2311.17035
[00:31:30] 1-bit LLMs, Ma et al.
https://arxiv.org/abs/2402.17764
[00:38:35] LLMs Solve Math, Nikankin et al.
https://arxiv.org/html/2410.21272v1
[00:38:45] Subitizing, Railo
https://link.springer.com/10.1007/978-1-4419-1428-6_578
[00:43:25] NN & Chomsky Hierarchy, Delétang et al.
https://arxiv.org/abs/2207.02098
[00:51:05] Measure of Intelligence, Chollet
https://arxiv.org/abs/1911.01547
[00:52:10] AlphaZero, Silver et al.
https://pubmed.ncbi.nlm.nih.gov/30523106/
[00:55:10] Golden Gate Claude, Anthropic
https://www.anthropic.com/news/golden-gate-claude
[00:56:40] Chess Positions, Chase & Simon
https://www.sciencedirect.com/science/article/abs/pii/0010028573900042
[01:00:35] Epistemic Foraging, Friston
https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2016.00056/full
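A quick way to see the softmax-sharpness point from the episode summary above is a toy numpy experiment (my own illustration, not code from the paper or the show): if the token an attention head should focus on beats every other logit by a fixed, bounded gap, the weight softmax can assign to it still decays toward zero as the sequence grows, so attention necessarily disperses over long contexts.

```python
import numpy as np

def weight_on_important_token(seq_len: int, logit_gap: float = 5.0) -> float:
    """Softmax weight on the single 'important' token when its logit exceeds
    every other logit by a fixed, bounded gap."""
    logits = np.zeros(seq_len)
    logits[0] = logit_gap                      # the token the head should attend to
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(probs[0])

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"sequence length {n:>7}: weight on important token = {weight_on_important_token(n):.4f}")
# Falls from ~0.94 at n=10 to ~0.0015 at n=100,000: with bounded logits,
# softmax cannot stay sharp as the context grows.
```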
In this episode, security researcher Nicholas Carlini of Google DeepMind delves into his extensive work on adversarial machine learning and cybersecurity. He discusses his pioneering contributions, which include developing attacks that have challenged the defenses of image classifiers and exploring the robustness of neural networks. Carlini details the inherent difficulties of defending against adversarial attacks, the role of human intuition in his work, and the potential of scaling attack methodologies using language models. He also addresses the broader implications of open-source AI and the complexities of balancing security with accessibility in emerging AI technologies.

SPONSORS:
SafeBase: SafeBase is the leading trust-centered platform for enterprise security. Streamline workflows, automate questionnaire responses, and integrate with tools like Slack and Salesforce to eliminate friction in the review process. With rich analytics and customizable settings, SafeBase scales to complex use cases while showcasing security's impact on deal acceleration. Trusted by companies like OpenAI, SafeBase ensures value in just 16 days post-launch. Learn more at https://safebase.io/podcast
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance at 50% lower cost for compute and 80% lower cost for outbound networking compared to other cloud providers. OCI powers industry leaders like Vodafone and Thomson Reuters with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before March 31, 2024 at https://oracle.com/cognitive
Shopify: Shopify is revolutionizing online selling with its market-leading checkout system and robust API ecosystem. Its exclusive library of cutting-edge AI apps empowers e-commerce businesses to thrive in a competitive market. Cognitive Revolution listeners can try Shopify for just $1 per month at https://shopify.com/cognitive
NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive

RECOMMENDED PODCAST: Second Opinion
Join Christina Farr, Ash Zenooz and Luba Greenwood as they bring influential entrepreneurs, experts and investors into the ring for candid conversations at the frontlines of healthcare and digital health every week.
Spotify: https://open.spotify.com/show/0A8NwQE976s32zdBbZw6bv
Apple: https://podcasts.apple.com/us/podcast/second-opinion-with-christina-farr-ash-zenooz-md-luba/id1759267211
YouTube: https://www.youtube.com/@SecondOpinionwithChristinaFarr

SOCIAL LINKS:
Website: https://www.cognitiverevolution.ai
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathanlabenz/
YouTube: https://youtube.com/@CognitiveRevolutionPodcast
Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431
Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk

PRODUCED BY: https://aipodcast.ing
'Let us model our large language model as a hash function—' Sold.

Our special guest Nicholas Carlini joins us to discuss differential cryptanalysis on LLMs and other attacks, such as the ones that made OpenAI turn off some features, hehehehe.

Watch episode on YouTube: https://youtu.be/vZ64xPI2Rc0
Transcript: https://securitycryptographywhatever.com/2025/01/28/cryptanalyzing-llms-with-nicholas-carlini/

Links:
- https://nicholas.carlini.com
- "Stealing Part of a Production Language Model": https://arxiv.org/pdf/2403.06634
- "Why I Attack": https://nicholas.carlini.com/writing/2024/why-i-attack.html
- "Cryptanalytic Extraction of Neural Network Models", CRYPTO 2020: https://arxiv.org/abs/2003.04884
- "Stochastic Parrots": https://dl.acm.org/doi/10.1145/3442188.3445922
- https://help.openai.com/en/articles/5247780-using-logit-bias-to-alter-token-probability-with-the-openai-api
- https://community.openai.com/t/temperature-top-p-and-top-k-for-chatbot-responses/295542
- https://opensource.org/license/mit
- https://github.com/madler/zlib
- https://ai.meta.com/blog/yann-lecun-ai-model-i-jepa/
- https://nicholas.carlini.com/writing/2024/how-i-use-ai.html

"Security Cryptography Whatever" is hosted by Deirdre Connolly (@durumcrustulum), Thomas Ptacek (@tqbf), and David Adrian (@davidcadrian)
Flavia Carlini is an Italian author and communicator with a huge social media following. For years she went through the exhausting search for a diagnosis for her excruciating pain, moving between medical consultations and ineffective therapies and spending tens of thousands of euros without getting answers. But even once she obtained a diagnosis, the difficulties did not ease. Treatment for endometriosis remains entirely at her own expense, because this disease, which affects one in seven women, is not officially recognized by the national health system. "On average I spend 800 euros a month to treat all my illnesses. That is almost a full salary. My parents help me pay for everything, and that is an enormous privilege, because if they had not been able to pay for my medicines, or for whatever my salary could not cover, I would have ended up on the street. That is a very high price I am paying as a woman." Intertwined with this battle for the right to health is a further burden: the economic and psychological cost of having to meet the social expectations imposed on women. Conforming to the standards of beauty and behavior demanded by her work environment was never a choice for her, but a necessity that translated into significant spending on clothing, makeup and beauty treatments: "Too short, too long, too tight, too flashy, too colorful, too low-cut, too thin, too thick. This, every morning. My concern was to dress in a way that would keep me from being harassed." Despite her success on social media and her decision to make a living from writing, Flavia admits she has not achieved economic independence, turning down advertising collaborations to preserve her integrity. "Throughout the history of literature, culture has always come from the margins. And you will say: if it has always come from the margins, why can't it still come from the margins? Because today there is capitalism, and if I don't produce, I don't eat. And so culture will always remain the privilege of an already privileged class."
Nicholas Carlini from Google DeepMind offers his view of AI security, emergent LLM capabilities, and his groundbreaking model-stealing research. He reveals how LLMs can unexpectedly excel at tasks like chess and discusses the security pitfalls of LLM-generated code.

SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.
https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focused on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Go to https://tufalabs.ai/
***

Transcript: https://www.dropbox.com/scl/fi/lat7sfyd4k3g5k9crjpbf/CARLINI.pdf?rlkey=b7kcqbvau17uw6rksbr8ccd8v&dl=0

TOC:
1. ML Security Fundamentals
[00:00:00] 1.1 ML Model Reasoning and Security Fundamentals
[00:03:04] 1.2 ML Security Vulnerabilities and System Design
[00:08:22] 1.3 LLM Chess Capabilities and Emergent Behavior
[00:13:20] 1.4 Model Training, RLHF, and Calibration Effects

2. Model Evaluation and Research Methods
[00:19:40] 2.1 Model Reasoning and Evaluation Metrics
[00:24:37] 2.2 Security Research Philosophy and Methodology
[00:27:50] 2.3 Security Disclosure Norms and Community Differences

3. LLM Applications and Best Practices
[00:44:29] 3.1 Practical LLM Applications and Productivity Gains
[00:49:51] 3.2 Effective LLM Usage and Prompting Strategies
[00:53:03] 3.3 Security Vulnerabilities in LLM-Generated Code

4. Advanced LLM Research and Architecture
[00:59:13] 4.1 LLM Code Generation Performance and O(1) Labs Experience
[01:03:31] 4.2 Adaptation Patterns and Benchmarking Challenges
[01:10:10] 4.3 Model Stealing Research and Production LLM Architecture Extraction

REFS:
[00:01:15] Nicholas Carlini's personal website & research profile (Google DeepMind, ML security) - https://nicholas.carlini.com/
[00:01:50] CentML AI compute platform for language model workloads - https://centml.ai/
[00:04:30] Seminal paper on neural network robustness against adversarial examples (Carlini & Wagner, 2016) - https://arxiv.org/abs/1608.04644
[00:05:20] Computer Fraud and Abuse Act (CFAA) – primary U.S. federal law on computer hacking liability - https://www.justice.gov/jm/jm-9-48000-computer-fraud
[00:08:30] Blog post: Emergent chess capabilities in GPT-3.5-turbo-instruct (Nicholas Carlini, Sept 2023) - https://nicholas.carlini.com/writing/2023/chess-llm.html
[00:16:10] Paper: "Self-Play Preference Optimization for Language Model Alignment" (Yue Wu et al., 2024) - https://arxiv.org/abs/2405.00675
[00:18:00] GPT-4 Technical Report: development, capabilities, and calibration analysis - https://arxiv.org/abs/2303.08774
[00:22:40] Historical shift from descriptive to algebraic chess notation (FIDE) - https://en.wikipedia.org/wiki/Descriptive_notation
[00:23:55] Analysis of distribution shift in ML (Hendrycks et al.) - https://arxiv.org/abs/2006.16241
[00:27:40] Nicholas Carlini's essay "Why I Attack" (June 2024) – motivations for security research - https://nicholas.carlini.com/writing/2024/why-i-attack.html
[00:34:05] Google Project Zero's 90-day vulnerability disclosure policy - https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html
[00:51:15] Evolution of Google search syntax & user behavior (Daniel M. Russell) - https://www.amazon.com/Joy-Search-Google-Master-Information/dp/0262042878
[01:04:05] Rust's ownership & borrowing system for memory safety - https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html
[01:10:05] Paper: "Stealing Part of a Production Language Model" (Carlini et al., March 2024) – extraction attacks on ChatGPT, PaLM-2 - https://arxiv.org/abs/2403.06634
[01:10:55] First model stealing paper (Tramèr et al., 2016) – attacking ML APIs via prediction - https://arxiv.org/abs/1609.02943
Part 2 of my behind-the-scenes look at LOVB Madison and the players and coaches shaping its first year, with more fun talk about New Year's resolutions and favorite TV shows too. I share my reaction to the first match of the season while looking ahead to Madison's home opener on Friday. Former Wisconsin Badger Lauren Carlini reflects on returning to the city where she became a college volleyball star. Temi Thomas-Ailara shares her enthusiasm for the upcoming season and why she's addicted to TikTok. Plus, Head Coach Matt Fuerbringer gives fans a glimpse of what to expect from LOVB Madison in 2025.
Live from DCD London 2024, Schneider Electric's Chief Advocate of Data Center and AI addresses how data centers can adapt to meet the growing demands of AI. He also discusses the innovative ways data center operators are collaborating with utilities to solve power challenges. Check it out!
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today, we're joined by Nicholas Carlini, research scientist at Google DeepMind to discuss adversarial machine learning and model security, focusing on his 2024 ICML best paper winner, “Stealing part of a production language model.” We dig into this work, which demonstrated the ability to successfully steal the last layer of production language models including ChatGPT and PaLM-2. Nicholas shares the current landscape of AI security research in the age of LLMs, the implications of model stealing, ethical concerns surrounding model privacy, how the attack works, and the significance of the embedding layer in language models. We also discuss the remediation strategies implemented by OpenAI and Google, and the future directions in the field of AI security. Plus, we also cover his other ICML 2024 best paper, “Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining,” which questions the use and promotion of differential privacy in conjunction with pre-trained models. The complete show notes for this episode can be found at https://twimlai.com/go/702.
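For readers wondering what "stealing the last layer" looks like mechanically, here is a toy numpy sketch of the linear-algebra observation behind the attack (my own simplified illustration under stand-in assumptions, not the authors' code): every logit vector a model returns is a hidden state pushed through the final projection matrix, so all logit vectors lie in a subspace whose dimension is the hidden size, and the numerical rank of a stack of queried logit vectors exposes that hidden dimension. The `query_model` stub stands in for a real API call.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 1000, 64                      # toy sizes; production models are far larger
W_final = rng.normal(size=(vocab_size, hidden_dim))    # the "secret" final projection layer

def query_model(prompt_id: int) -> np.ndarray:
    """Stand-in for an API call returning the full logit vector for one prompt.
    The attacker never sees W_final or the hidden state directly."""
    hidden_state = rng.normal(size=hidden_dim)
    return W_final @ hidden_state

# Query the "API" with many different prompts and stack the logit vectors.
logit_matrix = np.stack([query_model(i) for i in range(200)])   # shape (200, vocab_size)

# Every logit vector lies in the column space of W_final, so the numerical
# rank of the stacked responses equals the model's hidden dimension.
singular_values = np.linalg.svd(logit_matrix, compute_uv=False)
recovered_dim = int((singular_values > 1e-8 * singular_values[0]).sum())
print(recovered_dim)   # 64: the hidden dimension, recovered from black-box queries alone
```

As described in the paper, the real attack has to reconstruct logit information from top logprobs and logit-bias queries rather than full logit vectors, and it recovers the projection layer itself up to an affine transformation; the rank argument above is just the core idea.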
In this episode, I recap Sarah Franklin's incredible performance against Marquette, where she shattered the Wisconsin school record with 33 kills. I break down how Franklin's dominant play impacted the match and what it means for the team moving forward. I also recap this week's Kelly Sheffield Show, sharing my top three takeaways from the show. Plus, hear my exclusive pre-game interview with former Badger setter and new Offensive Analyst and Strategy Consultant, Lauren Carlini, as she talks about her new role with the team. --- Support this podcast: https://podcasters.spotify.com/pod/show/jon-arias/support
In this episode I talk about the news of former Wisconsin setter Lauren Carlini rejoining the Badgers as an Offensive Analyst and Strategy Consultant. I discuss what Carlini's return means for the team and how her experience can benefit freshman setter Charlie Fuerbringer. Plus, I break down three key takeaways from Coach Kelly Sheffield's recent insights on the Kelly Sheffield Show on ESPN Madison. And don't miss my pre-game interview with freshman libero Lola Schumacher, where we talk about her performance in her first start at libero and why she loves the game of volleyball. --- Support this podcast: https://podcasters.spotify.com/pod/show/jon-arias/support
Today we will talk about nutritional intervention in Parkinson's disease with Dr. Carlini.
Today's guest, Nicholas Carlini, a research scientist at DeepMind, argues that we should be focusing more on what AI can do for us individually, rather than trying to have an answer for everyone.

"How I Use AI" - A Pragmatic Approach

Carlini's blog post "How I Use AI" went viral for good reason. Instead of giving a personal opinion about AI's potential, he simply laid out how he, as a security researcher, uses AI tools in his daily work. He divided it into 12 sections:

* To make applications
* As a tutor
* To get started
* To simplify code
* For boring tasks
* To automate tasks
* As an API reference
* As a search engine
* To solve one-offs
* To teach me
* Solving solved problems
* To fix errors

Each of the sections has specific examples, so we recommend going through it. It also includes all prompts used for it; in the "make applications" case, it's 30,000 words total!

My personal takeaway is that the majority of the work AI can do successfully is what humans dislike doing: writing boilerplate code, looking up docs, taking repetitive actions, etc. These are usually boring tasks with little creativity, but with a lot of structure. This is one of the strongest arguments for why LLMs, especially for code, are more beneficial to senior employees: if you can get the boring stuff out of the way, there's a lot more value you can generate. This is less and less true as you move toward entry-level jobs, which consist mostly of boring and repetitive tasks. Nicholas argues both sides ~21:34 in the pod.

A New Approach to LLM Benchmarks

We recently did a Benchmarks 201 episode, a follow-up to our original Benchmarks 101, and some of the issues have stayed the same. Notably, there's a big discrepancy between what benchmarks like MMLU test and what the models are actually used for. Carlini created his own domain-specific language for writing personalized LLM benchmarks. The idea is simple but powerful:

* Take tasks you've actually needed AI for in the past.
* Turn them into benchmark tests.
* Use these to evaluate new models based on your specific needs.

It can represent very complex tasks, from a single code generation to drawing a US flag using C:

"Write hello world in python" >> LLMRun() >> PythonRun() >> SubstringEvaluator("hello world")

"Write a C program that draws an american flag to stdout." >> LLMRun() >> CRun() >> VisionLLMRun("What flag is shown in this image?") >> (SubstringEvaluator("United States") | SubstringEvaluator("USA"))

This approach solves a few problems:

* It measures what's actually useful to you, not abstract capabilities.
* It's harder for model creators to "game" your specific benchmark, a problem that has plagued standardized tests.
* It gives you a concrete way to decide if a new model is worth switching to, similar to how developers might run benchmarks before adopting a new library or framework.

Carlini argues that if even a small percentage of AI users created personal benchmarks, we'd have a much better picture of model capabilities in practice.
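Carlini's actual framework is linked from his post; the snippet below is only a minimal sketch, under my own assumptions, of how a pipeline DSL like the examples above could be wired up in Python, with `call_llm` as a stub standing in for a real model API.

```python
import subprocess, sys

def call_llm(prompt: str) -> str:
    # Stub: swap in a real model API call here.
    return 'print("hello world")'

class Stage:
    def __rshift__(self, other):                 # Stage >> Stage builds a pipeline
        return Pipeline([self, other])
    def __rrshift__(self, prompt):               # "some prompt" >> Stage seeds the pipeline
        return Pipeline([Const(prompt), self])
    def run(self, data):
        raise NotImplementedError

class Pipeline(Stage):
    def __init__(self, stages): self.stages = list(stages)
    def __rshift__(self, other): return Pipeline(self.stages + [other])
    def run(self, data=None):
        for stage in self.stages:
            data = stage.run(data)
        return data

class Const(Stage):
    def __init__(self, value): self.value = value
    def run(self, _): return self.value

class LLMRun(Stage):
    def run(self, prompt): return call_llm(prompt)

class PythonRun(Stage):
    def run(self, code):                         # execute the generated program, capture stdout
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=30)
        return proc.stdout

class SubstringEvaluator(Stage):
    def __init__(self, needle): self.needle = needle
    def run(self, output): return self.needle.lower() in output.lower()

test = "Write hello world in python" >> LLMRun() >> PythonRun() >> SubstringEvaluator("hello world")
print(test.run())   # True if the (stubbed) model's program printed "hello world"
```

The `|` combinator from the flag example could be added the same way, via an `__or__` method returning a stage that succeeds if either side does.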
AI Security

While much of the AI security discussion focuses on either jailbreaks or existential risks, Carlini's research targets the space in between. Some highlights from his recent work:

* LAION 400M data poisoning: By buying expired domains referenced in the dataset, Carlini's team could inject arbitrary images into models trained on LAION 400M. You can read the paper "Poisoning Web-Scale Training Datasets is Practical" for all the details. This is a great example of expanding the scope beyond the model itself, and looking at the whole system and how it can become vulnerable.

* Stealing model weights: They demonstrated how to extract parts of production language models (like OpenAI's) through careful API queries. This research, "Stealing Part of a Production Language Model", shows that even black-box access can leak sensitive information.

* Extracting training data: In some cases, they found ways to make models regurgitate verbatim snippets from their training data. He and Milad Nasr wrote a paper on this as well: "Scalable Extraction of Training Data from (Production) Language Models". They also think this might be applicable to extracting RAG results from a generation.

These aren't just theoretical attacks. They've led to real changes in how companies like OpenAI design their APIs and handle data. If you really miss logit_bias and logit results by token, you can blame Nicholas :)

We had a ton of fun also chatting about things like Conway's Game of Life, how much data can fit on a piece of paper, and porting Doom to JavaScript. Enjoy!

Show Notes
* How I Use AI
* My Benchmark for LLMs
* Doom JavaScript port
* Conway's Game of Life
* Tic-Tac-Toe in one printf statement
* International Obfuscated C Code Contest
* Cursor
* LAION 400M poisoning paper
* Man vs Machine at Black Hat
* Model Stealing from OpenAI
* Milad Nasr
* H.D. Moore
* Vijay Bolina
* Cosine.sh
* uuencode

Timestamps
* [00:00:00] Introductions
* [00:01:14] Why Nicholas writes
* [00:02:09] The Game of Life
* [00:05:07] "How I Use AI" blog post origin story
* [00:08:24] Do we need software engineering agents?
* [00:11:03] Using AI to kickstart a project
* [00:14:08] Ephemeral software
* [00:17:37] Using AI to accelerate research
* [00:21:34] Experts vs non-expert users as beneficiaries of AI
* [00:24:02] Research on generating less secure code with LLMs
* [00:27:22] Learning and explaining code with AI
* [00:30:12] AGI speculations?
* [00:32:50] Distributing content without social media
* [00:35:39] How much data do you think you can put on a single piece of paper?
* [00:37:37] Building personal AI benchmarks
* [00:43:04] Evolution of prompt engineering and its relevance
* [00:46:06] Model vs task benchmarking
* [00:52:14] Poisoning LAION 400M through expired domains
* [00:55:38] Stealing OpenAI models from their API
* [01:01:29] Data stealing and recovering training data from models
* [01:03:30] Finding motivation in your work

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.

Swyx [00:00:12]: Hey, and today we're in the in-person studio, which Alessio has gorgeously set up for us, with Nicholas Carlini. Welcome. Thank you. You're a research scientist at DeepMind. You work at the intersection of machine learning and computer security. You got your PhD from Berkeley in 2018, and also your BA from Berkeley as well. And mostly we're here to talk about your blogs, because you are so generous in just writing up what you know. Well, actually, why do you write?

Nicholas [00:00:41]: Because I like, I feel like it's fun to share what you've done. I don't like writing, sufficiently didn't like writing, I almost didn't do a PhD, because I knew how much writing was involved in writing papers. I was terrible at writing when I was younger. I did, like, the remedial writing classes when I was in university, because I was really bad at it.
So I don't actually enjoy, I still don't enjoy the act of writing. But I feel like it is useful to share what you're doing, and I like being able to talk about the things that I'm doing that I think are fun. And so I write because I think I want to have something to say, not because I enjoy the act of writing.Swyx [00:01:14]: But yeah. It's a tool for thought, as they often say. Is there any sort of backgrounds or thing that people should know about you as a person? Yeah.Nicholas [00:01:23]: So I tend to focus on, like you said, I do security work, I try to like attacking things and I want to do like high quality security research. And that's mostly what I spend my actual time trying to be productive members of society doing that. But then I get distracted by things, and I just like, you know, working on random fun projects. Like a Doom clone in JavaScript.Swyx [00:01:44]: Yes.Nicholas [00:01:45]: Like that. Or, you know, I've done a number of things that have absolutely no utility. But are fun things to have done. And so it's interesting to say, like, you should work on fun things that just are interesting, even if they're not useful in any real way. And so that's what I tend to put up there is after I have completed something I think is fun, or if I think it's sufficiently interesting, write something down there.Alessio [00:02:09]: Before we go into like AI, LLMs and whatnot, why are you obsessed with the game of life? So you built multiplexing circuits in the game of life, which is mind boggling. So where did that come from? And then how do you go from just clicking boxes on the UI web version to like building multiplexing circuits?Nicholas [00:02:29]: I like Turing completeness. The definition of Turing completeness is a computer that can run anything, essentially. And the game of life, Conway's game of life is a very simple cellular 2D automata where you have cells that are either on or off. And a cell becomes on if in the previous generation some configuration holds true and off otherwise. It turns out there's a proof that the game of life is Turing complete, that you can run any program in principle using Conway's game of life. I don't know. And so you can, therefore someone should. And so I wanted to do it. Some other people have done some similar things, but I got obsessed into like, if you're going to try and make it work, like we already know it's possible in theory. I want to try and like actually make something I can run on my computer, like a real computer I can run. And so yeah, I've been going on this rabbit hole of trying to make a CPU that I can run semi real time on the game of life. And I have been making some reasonable progress there. And yeah, but you know, Turing completeness is just like a very fun trap you can go down. A while ago, as part of a research paper, I was able to show that in C, if you call into printf, it's Turing complete. Like printf, you know, like, which like, you know, you can print numbers or whatever, right?Swyx [00:03:39]: Yeah, but there should be no like control flow stuff.Nicholas [00:03:42]: Because printf has a percent n specifier that lets you write an arbitrary amount of data to an arbitrary location. And the printf format specifier has an index into where it is in the loop that is in memory. So you can overwrite the location of where printf is currently indexing using percent n. So you can get loops, you can get conditionals, and you can get arbitrary data rates again. 
So we sort of have another Turing complete language using printf, which again, like this has essentially zero practical utility, but like, it's just, I feel like a lot of people get into programming because they enjoy the art of doing these things. And then they go work on developing some software application and lose all joy with the boys. And I want to still have joy in doing these things. And so on occasion, I try to stop doing productive, meaningful things and just like, what's a fun thing that we can do and try and make that happen.Alessio [00:04:39]: Awesome. So you've been kind of like a pioneer in the AI security space. You've done a lot of talks starting back in 2018. We'll kind of leave that to the end because I know the security part is, there's maybe a smaller audience, but it's a very intense audience. So I think that'll be fun. But everybody in our Discord started posting your how I use AI blog post and we were like, we should get Carlini on the podcast. And then you were so nice to just, yeah, and then I sent you an email and you're like, okay, I'll come.Swyx [00:05:07]: And I was like, oh, I thought that would be harder.Alessio [00:05:10]: I think there's, as you said in the blog posts, a lot of misunderstanding about what LLMs can actually be used for. What are they useful at? What are they not good at? And whether or not it's even worth arguing what they're not good at, because they're obviously not. So if you cannot count the R's in a word, they're like, it's just not what it does. So how painful was it to write such a long post, given that you just said that you don't like to write? Yeah. And then we can kind of run through the things, but maybe just talk about the motivation, why you thought it was important to do it.Nicholas [00:05:39]: Yeah. So I wanted to do this because I feel like most people who write about language models being good or bad, some underlying message of like, you know, they have their camp and their camp is like, AI is bad or AI is good or whatever. And they like, they spin whatever they're going to say according to their ideology. And they don't actually just look at what is true in the world. So I've read a lot of things where people say how amazing they are and how all programmers are going to be obsolete by 2024. And I've read a lot of things where people who say like, they can't do anything useful at all. And, you know, like, they're just like, it's only the people who've come off of, you know, blockchain crypto stuff and are here to like make another quick buck and move on. And I don't really agree with either of these. And I'm not someone who cares really one way or the other how these things go. And so I wanted to write something that just says like, look, like, let's sort of ground reality and what we can actually do with these things. Because my actual research is in like security and showing that these models have lots of problems. Like this is like my day to day job is saying like, we probably shouldn't be using these in lots of cases. I thought I could have a little bit of credibility of in saying, it is true. They have lots of problems. We maybe shouldn't be deploying them lots of situations. And still, they are also useful. And that is the like, the bit that I wanted to get across is to say, I'm not here to try and sell you on anything. I just think that they're useful for the kinds of work that I do. And hopefully, some people would listen. And it turned out that a lot more people liked it than I thought. 
But yeah, that was the motivation behind why I wanted to write this.Alessio [00:07:15]: So you had about a dozen sections of like how you actually use AI. Maybe we can just kind of run through them all. And then maybe the ones where you have extra commentary to add, we can... Sure.Nicholas [00:07:27]: Yeah, yeah. I didn't put as much thought into this as maybe was deserved. I probably spent, I don't know, definitely less than 10 hours putting this together.Swyx [00:07:38]: Wow.Alessio [00:07:39]: It took me close to that to do a podcast episode. So that's pretty impressive.Nicholas [00:07:43]: Yeah. I wrote it in one pass. I've gotten a number of emails of like, you got this editing thing wrong, you got this sort of other thing wrong. It's like, I haven't just haven't looked at it. I tend to try it. I feel like I still don't like writing. And so because of this, the way I tend to treat this is like, I will put it together into the best format that I can at a time, and then put it on the internet, and then never change it. And this is an aspect of like the research side of me is like, once a paper is published, like it is done as an artifact that exists in the world. I could forever edit the very first thing I ever put to make it the most perfect version of what it is, and I would do nothing else. And so I feel like I find it useful to be like, this is the artifact, I will spend some certain amount of hours on it, which is what I think it is worth. And then I will just...Swyx [00:08:22]: Yeah.Nicholas [00:08:23]: Timeboxing.Alessio [00:08:24]: Yeah. Stop. Yeah. Okay. We just recorded an episode with the founder of Cosine, which is like an AI software engineer colleague. You said it took you 30,000 words to get GPT-4 to build you the, can GPT-4 solve this kind of like app. Where are we in the spectrum where chat GPT is all you need to actually build something versus I need a full on agent that does everything for me?Nicholas [00:08:46]: Yeah. Okay. So this was an... So I built a web app last year sometime that was just like a fun demo where you can guess if you can predict whether or not GPT-4 at the time could solve a given task. This is, as far as web apps go, very straightforward. You need basic HTML, CSS, you have a little slider that moves, you have a button, sort of animate the text coming to the screen. The reason people are going here is not because they want to see my wonderful HTML, right? I used to know how to do modern HTML in 2007, 2008. I was very good at fighting with IE6 and these kinds of things. I knew how to do that. I have no longer had to build any web app stuff in the meantime, which means that I know how everything works, but I don't know any of the new... Flexbox is new to me. Flexbox is like 10 years old at this point, but it's just amazing being able to go to the model and just say, write me this thing and it will give me all of the boilerplate that I need to get going. Of course it's imperfect. It's not going to get you the right answer, and it doesn't do anything that's complicated right now, but it gets you to the point where the only remaining work that needs to be done is the interesting hard part for me, the actual novel part. Even the current models, I think, are entirely good enough at doing this kind of thing, that they're very useful. It may be the case that if you had something, like you were saying, a smarter agent that could debug problems by itself, that might be even more useful. 
Currently though, make a model into an agent by just copying and pasting error messages for the most part. That's what I do, is you run it and it gives you some code that doesn't work, and either I'll fix the code, or it will give me buggy code and I won't know how to fix it, and I'll just copy and paste the error message and say, it tells me this. What do I do? And it will just tell me how to fix it. You can't trust these things blindly, but I feel like most people on the internet already understand that things on the internet, you can't trust blindly. And so this is not like a big mental shift you have to go through to understand that it is possible to read something and find it useful, even if it is not completely perfect in its output.Swyx [00:10:54]: It's very human-like in that sense. It's the same ring of trust, I kind of think about it that way, if you had trust levels.Alessio [00:11:03]: And there's maybe a couple that tie together. So there was like, to make applications, and then there's to get started, which is a similar you know, kickstart, maybe like a project that you know the LLM cannot solve. It's kind of how you think about it.Nicholas [00:11:15]: Yeah. So for getting started on things is one of the cases where I think it's really great for some of these things, where I sort of use it as a personalized, help me use this technology I've never used before. So for example, I had never used Docker before January. I know what Docker is. Lucky you. Yeah, like I'm a computer security person, like I sort of, I have read lots of papers on, you know, all the technology behind how these things work. You know, I know all the exploits on them, I've done some of these things, but I had never actually used Docker. But I wanted it to be able to, I could run the outputs of language model stuff in some controlled contained environment, which I know is the right application. So I just ask it like, I want to use Docker to do this thing, like, tell me how to run a Python program in a Docker container. And it like gives me a thing. I'm like, step back. You said Docker compose, I do not know what this word Docker compose is. Is this Docker? Help me. And like, you'll sort of tell me all of these things. And I'm sure there's this knowledge that's out there on the internet, like this is not some groundbreaking thing that I'm doing, but I just wanted it as a small piece of one thing I was working on. And I didn't want to learn Docker from first principles. Like I, at some point, if I need it, I can do that. Like I have the background that I can make that happen. But what I wanted to do was, was thing one. And it's very easy to get bogged down in the details of this other thing that helps you accomplish your end goal. And I just want to like, tell me enough about Docker so I can do this particular thing. And I can check that it's doing the safe thing. I sort of know enough about that from, you know, my other background. And so I can just have the model help teach me exactly the one thing I want to know and nothing more. I don't need to worry about other things that the writer of this thinks is important that actually isn't. Like I can just like stop the conversation and say, no, boring to me. Explain this detail. I don't understand. I think that's what that was very useful for me. 
It would have taken me, you know, several hours to figure out some things that take 10 minutes if you could just ask exactly the question you want the answer to.Alessio [00:13:05]: Have you had any issues with like newer tools? Have you felt any meaningful kind of like a cutoff day where like there's not enough data on the internet or? I'm sure that the answer to this is yes.Nicholas [00:13:16]: But I tend to just not use most of these things. Like I feel like this is like the significant way in which I use machine learning models is probably very different than most people is that I'm a researcher and I get to pick what tools that I use and most of the things that I work on are fairly small projects. And so I can, I can entirely see how someone who is in a big giant company where they have their own proprietary legacy code base of a hundred million lines of code or whatever and like you just might not be able to use things the same way that I do. I still think there are lots of use cases there that are entirely reasonable that are not the same ones that I've put down. But I wanted to talk about what I have personal experience in being able to say is useful. And I would like it very much if someone who is in one of these environments would be able to describe the ways in which they find current models useful to them. And not, you know, philosophize on what someone else might be able to find useful, but actually say like, here are real things that I have done that I found useful for me.Swyx [00:14:08]: Yeah, this is what I often do to encourage people to write more, to share their experiences because they often fear being attacked on the internet. But you are the ultimate authority on how you use things and there's this objectively true. So they cannot be debated. One thing that people are very excited about is the concept of ephemeral software or like personal software. This use case in particular basically lowers the activation energy for creating software, which I like as a vision. I don't think I have taken as much advantage of it as I could. I feel guilty about that. But also, we're trending towards there.Nicholas [00:14:47]: Yeah. No, I mean, I do think that this is a direction that is exciting to me. One of the things I wrote that was like, a lot of the ways that I use these models are for one-off things that I just need to happen that I'm going to throw away in five minutes. And you can.Swyx [00:15:01]: Yeah, exactly.Nicholas [00:15:02]: Right. It's like the kind of thing where it would not have been worth it for me to have spent 45 minutes writing this, because I don't need the answer that badly. But if it will only take me five minutes, then I'll just figure it out, run the program and then get it right. And if it turns out that you ask the thing, it doesn't give you the right answer. Well, I didn't actually need the answer that badly in the first place. Like either I can decide to dedicate the 45 minutes or I cannot, but like the cost of doing it is fairly low. You see what the model can do. And if it can't, then, okay, when you're using these models, if you're getting the answer you want always, it means you're not asking them hard enough questions.Swyx [00:15:35]: Say more.Nicholas [00:15:37]: Lots of people only use them for very small particular use cases and like it always does the thing that they want. Yeah.Swyx [00:15:43]: Like they use it like a search engine.Nicholas [00:15:44]: Yeah. Or like one particular case. 
And if you're finding that when you're using these, it's always giving you the answer that you want, then probably it has more capabilities than you're actually using. And so I oftentimes try when I have something that I'm curious about to just feed into the model and be like, well, maybe it's just solved my problem for me. You know, most of the time it doesn't, but like on occasion, it's like, it's done things that would have taken me, you know, a couple hours that it's been great and just like solved everything immediately. And if it doesn't, then it's usually easier to verify whether or not the answer is correct than to have written in the first place. And so you check, you're like, well, that's just, you're entirely misguided. Nothing here is right. It's just like, I'm not going to do this. I'm going to go write it myself or whatever.Alessio [00:16:21]: Even for non-tech, I had to fix my irrigation system. I had an old irrigation system. I didn't know how I worked to program it. I took a photo, I sent it to Claude and it's like, oh yeah, that's like the RT 900. This is exactly, I was like, oh wow, you know, you know, a lot of stuff.Swyx [00:16:34]: Was it right?Alessio [00:16:35]: Yeah, it was right.Swyx [00:16:36]: It worked. Did you compare with OpenAI?Alessio [00:16:38]: No, I canceled my OpenAI subscription, so I'm a Claude boy. Do you have a way to think about this like one-offs software thing? One way I talk to people about it is like LLMs are kind of converging to like semantic serverless functions, you know, like you can say something and like it can run the function in a way and then that's it. It just kind of dies there. Do you have a mental model to just think about how long it should live for and like anything like that?Nicholas [00:17:02]: I don't think I have anything interesting to say here, no. I will take whatever tools are available in front of me and try and see if I can use them in meaningful ways. And if they're helpful, then great. If they're not, then fine. And like, you know, there are lots of people that I'm very excited about seeing all these people who are trying to make better applications that use these or all these kinds of things. And I think that's amazing. I would like to see more of it, but I do not spend my time thinking about how to make this any better.Alessio [00:17:27]: What's the most underrated thing in the list? I know there's like simplified code, solving boring tasks, or maybe is there something that you forgot to add that you want to throw in there?Nicholas [00:17:37]: I mean, so in the list, I only put things that people could look at and go, I understand how this solved my problem. I didn't want to put things where the model was very useful to me, but it would not be clear to someone else that it was actually useful. So for example, one of the things that I use it a lot for is debugging errors. But the errors that I have are very much not the errors that anyone else in the world will have. And in order to understand whether or not the solution was right, you just have to trust me on it. Because, you know, like I got my machine in a state that like CUDA was not talking to whatever some other thing, the versions were mismatched, something, something, something, and everything was broken. And like, I could figure it out with interaction with the model, and it gave it like told me the steps I needed to take. But at the end of the day, when you look at the conversation, you just have to trust me that it worked. 
And I didn't want to write things online that were this, like, you have to trust me that what I'm saying. I want everything that I said to like have evidence that like, here's the conversation, you can go and check whether or not this actually solved the task as I said that the model does. Because a lot of people I feel like say, I used a model to solve this very complicated task. And what they mean is the model did 10%, and I did the other 90% or something, I wanted everything to be verifiable. And so one of the biggest use cases for me, I didn't describe even at all, because it's not the kind of thing that other people could have verified by themselves. So that maybe is like, one of the things that I wish I maybe had said a little bit more about, and just stated that the way that this is done, because I feel like that this didn't come across quite as well. But yeah, of the things that I talked about, the thing that I think is most underrated is the ability of it to solve the uninteresting parts of problems for me right now, where people always say, this is one of the biggest arguments that I don't understand why people say is, the model can only do things that people have done before. Therefore, the model is not going to be helpful in doing new research or like discovering new things. And as someone whose day job is to do new things, like what is research? Research is doing something literally no one else in the world has ever done before. So this is what I do every single day, 90% of this is not doing something new, 90% of this is doing things a million people have done before, and then a little bit of something that was new. There's a reason why we say we stand on the shoulders of giants. It's true. Almost everything that I do is something that's been done many, many times before. And that is the piece that can be automated. Even if the thing that I'm doing as a whole is new, it is almost certainly the case that the small pieces that build up to it are not. And a number of people who use these models, I feel like expect that they can either solve the entire task or none of the task. But now I find myself very often, even when doing something very new and very hard, having models write the easy parts for me. And the reason I think this is so valuable, everyone who programs understands this, like you're currently trying to solve some problem and then you get distracted. And whatever the case may be, someone comes and talks to you, you have to go look up something online, whatever it is. You lose a lot of time to that. And one of the ways we currently don't think about being distracted is you're solving some hard problem and you realize you need a helper function that does X, where X is like, it's a known algorithm. Any person in the world, you say like, give me the algorithm that, have a dense graph or a sparse graph, I need to make it dense. You can do this by doing some matrix multiplies. It's like, this is a solved problem. I knew how to do this 15 years ago, but it distracts me from the problem I'm thinking about in my mind. I needed this done. And so instead of using my mental capacity and solving that problem and then coming back to the problem I was originally trying to solve, you could just ask model, please solve this problem for me. It gives you the answer. You run it. You can check that it works very, very quickly. And now you go back to solving the problem without having lost all the mental state. 
And I feel like this is one of the things that's been very useful for me.Swyx [00:21:34]: And in terms of this concept of expert users versus non-expert users, floors versus ceilings, you had some strong opinion here that like, basically it actually is more beneficial for non-experts.Nicholas [00:21:46]: Yeah, I don't know. I think it could go either way. Let me give you the argument for both of these. Yes. So I can only speak on the expert user behalf because I've been doing computers for a long time. And so yeah, the cases where it's useful for me are exactly these cases where I can check the output. I know, and anything the model could do, I could have done. I could have done better. I can check every single thing that the model is doing and make sure it's correct in every way. And so I can only speak and say, definitely it's been useful for me. But I also see a world in which this could be very useful for the kinds of people who do not have this knowledge, with caveats, because I'm not one of these people. I don't have this direct experience. But one of these big ways that I can see this is for things that you can check fairly easily, someone who could never have asked or have written a program themselves to do a certain task could just ask for the program that does the thing. And you know, some of the times it won't get it right. But some of the times it will, and they'll be able to have the thing in front of them that they just couldn't have done before. And we see a lot of people trying to do applications for this, like integrating language models into spreadsheets. Spreadsheets run the world. And there are some people who know how to do all the complicated spreadsheet equations and various things, and other people who don't, who just use the spreadsheet program but just manually do all of the things one by one by one by one. And this is a case where you could have a model that could try and give you a solution. And as long as the person is rigorous in testing that the solution does actually the correct thing, and this is the part that I'm worried about most, you know, I think depending on these systems in ways that we shouldn't, like this is what my research says, my research says is entirely on this, like, you probably shouldn't trust these models to do the things in adversarial situations, like, I understand this very deeply. And so I think that it's possible for people who don't have this knowledge to make use of these tools in ways, but I'm worried that it might end up in a world where people just blindly trust them, deploy them in situations that they probably shouldn't, and then someone like me gets to come along and just break everything because everything is terrible. And so I am very, very worried about that being the case, but I think if done carefully it is possible that these could be very useful.Swyx [00:23:54]: Yeah, there is some research out there that shows that when people use LLMs to generate code, they do generate less secure code.Nicholas [00:24:02]: Yeah, Dan Boneh has a nice paper on this. There are a bunch of papers that touch on exactly this.Swyx [00:24:07]: My slight issue is, you know, is there an agenda here?Nicholas [00:24:10]: I mean, okay, yeah, Dan Boneh, at least the one they have, like, I fully trust everything that sort of.Swyx [00:24:15]: Sorry, I don't know who Dan is.Swyx [00:24:17]: He's a professor at Stanford. Yeah, he and some students have some things on this. Yeah, there's a number.
I agree that a lot of the stuff feels like people have an agenda behind it. There are some that don't, and I trust them to have done the right thing. I also think, even on this though, we have to be careful because the argument, whenever someone says x is true about language models, you should always append the suffix for current models because I'll be the first to admit I was one of the people who was very much on the opinion that these language models are fun toys and are going to have absolutely no practical utility. If you had asked me this, let's say, in 2020, I still would have said the same thing. After I had seen GPT-2, I had written a couple of papers studying GPT-2 very carefully. I still would have told you these things are toys. And when I first read the RLHF paper and the instruction tuning paper, I was like, nope, this is this thing that these weird AI people are doing. They're trying to make some analogies to people that makes no sense. It's just like, I don't even care to read it. I saw what it was about and just didn't even look at it. I was obviously wrong. These things can be useful. And I feel like a lot of people had the same mentality that I did and decided not to change their mind. And I feel like this is the thing that I want people to be careful about. I want them to at least know what is true about the world so that they can then see that maybe they should reconsider some of the opinions that they had from four or five years ago that may just not be true about today's models.Swyx [00:25:47]: Specifically because you brought up spreadsheets, I want to share my personal experience because I think Google has done a really good job that people don't know about, which is if you use Google Sheets, Gemini is integrated inside of Google Sheets and it helps you write formulas. Great.Nicholas [00:26:00]: That's news to me.Swyx [00:26:01]: Right? They don't maybe do a good job. Unless you watch Google I.O., there was no other opportunity to learn that Gemini is now in your Google Sheets. And so I just don't write formulas manually anymore. It just prompts Gemini to do it for me. And it does it.Nicholas [00:26:15]: One of the problems that these machine learning models have is a discoverability problem. I think this will be figured out. I mean, it's the same problem that you have with any assistant. You're given a blank box and you're like, what do I do with it? I think this is great. More of these things, it would be good for them to exist. I want them to exist in ways that we can actually make sure that they're done correctly. I don't want to just have them be pushed into more and more things just blindly. I feel like lots of people, there are far too many X plus AI, where X is like arbitrary thing in the world that has nothing to do with it and could not be benefited at all. And they're just doing it because they want to use the word. And I don't want that to happen.Swyx [00:26:58]: You don't want an AI fridge?Nicholas [00:27:00]: No. Yes. I do not want my fridge on the internet.Swyx [00:27:03]: I do not want... Okay.Nicholas [00:27:05]: Anyway, let's not go down that rabbit hole. I understand why some of that happens, because people want to sell things or whatever. But I feel like a lot of people see that and then they write off everything as a result of it. And I just want to say, there are allowed to be people who are trying to do things that don't make any sense. Just ignore them. 
Do the things that make sense.Alessio [00:27:22]: Another chunk of use cases was learning. So both explaining code, being an API reference, all of these different things. Any suggestions on how to go at it? I feel like one thing is generate code and then explain to me. One way is just tell me about this technology. Another thing is like, hey, I read this online, kind of help me understand it. Any best practices on getting the most out of it?Swyx [00:27:47]: Yeah.Nicholas [00:27:47]: I don't know if I have best practices. I have how I use them.Swyx [00:27:51]: Yeah.Nicholas [00:27:51]: I find it very useful for cases where I understand the underlying ideas, but I have never usedSwyx [00:27:59]: them in this way before.Nicholas [00:28:00]: I know what I'm looking for, but I just don't know how to get there. And so yeah, as an API reference is a great example. The tool everyone always picks on is like FFmpeg. No one in the world knows the command line arguments to do what they want. They're like, make the thing faster. I want lower bitrate, like dash V. Once you tell me what the answer is, I can check. This is one of these things where it's great for these kinds of things. Or in other cases, things where I don't really care that the answer is 100% correct. So for example, I do a lot of security work. Most of security work is reading some code you've never seen before and finding out which pieces of the code are actually important. Because, you know, most of the program isn't actually do anything to do with security. It has, you know, the display piece or the other piece or whatever. And like, you just, you would only ignore all of that. So one very fun use of models is to like, just have it describe all the functions and just skim it and be like, wait, which ones look like approximately the right things to look at? Because otherwise, what are you going to do? You're going to have to read them all manually. And when you're reading them manually, you're going to skim the function anyway, and not just figure out what's going on perfectly. Like you already know that when you're going to read these things, what you're going to try and do is figure out roughly what's going on. Then you'll delve into the details. This is a great way of just doing that, but faster, because it will abstract most of whatSwyx [00:29:21]: is right.Nicholas [00:29:21]: It's going to be wrong some of the time. I don't care.Swyx [00:29:23]: I would have been wrong too.Nicholas [00:29:24]: And as long as you treat it with this way, I think it's great. And so like one of the particular use cases I have in the thing is decompiling binaries, where oftentimes people will release a binary. They won't give you the source code. And you want to figure out how to attack it. And so one thing you could do is you could try and run some kind of decompiler. It turns out for the thing that I wanted, none existed. And so I spent too many hours doing it by hand. Before I first thought, why am I doing this? I should just check if the model could do it for me. And it turns out that it can. And it can turn the compiled source code, which is impossible for any human to understand, into the Python code that is entirely reasonable to understand. And it doesn't run. It has a bunch of problems. But it's so much nicer that it's immediately a win for me. I can just figure out approximately where I should be looking, and then spend all of my time doing that by hand. 
And again, you get a big win there. Swyx [00:30:12]: So I fully agree with all those use cases, especially for you as a security researcher and having to dive into multiple things. I imagine that's super helpful. I do think we want to move to your other blog post. But you ended your post with a little bit of a teaser about your next post and your speculations. What are you thinking about? Nicholas [00:30:34]: So I want to write something. And I will do that at some point when I have time, maybe after I'm done writing my current papers for ICLR or something, where I want to talk about some thoughts I have for where language models are going in the near-term future. The reason why I want to talk about this is because, again, I feel like the discussion tends to be people who are either very much "AGI by 2027," or "always five years away," or are going to make statements of the form, you know, LLMs are the wrong path, and we should be abandoning this, and we should be doing something else instead. And again, I feel like people tend to look at this and see these two polarizing options and go, well, those obviously are both very far extremes. Like, what's a more nuanced take here? And so I have some opinions about this that I want to put down, just saying, you know, I have wide margins of error. I think you should too. If you would say there's a 0% chance that, you know, the models will get very, very good in the next five years, you're probably wrong. If you're going to say there's a 100% chance that they will in the next five years, you're also probably wrong. And like, to be fair, most of the people, if you read behind the headlines, actually say something like this. But it's very hard to get clicks on the internet with "some things may be good in the future." Everyone wants either "nothing is going to be good, this is entirely wrong" or "it's going to be amazing." That's what they want to see. For people who have negative reactions to these kinds of extreme views, I want to be able to at least tell them: there is something real here. It may not solve all of our problems, but it's probably going to get better. I don't know by how much. And that's basically what I want to say. And then at some point, I'll talk about the safety and security things as a result of this. Because the way in which security intersects with these things depends a lot on exactly how people use these tools. You know, if it turns out to be the case that these models get to be truly amazing and can solve, you know, tasks completely autonomously, that's a very different security world to be living in than if there's always a human in the loop. And the types of security questions I would want to ask would be very different. And so I think, you know, in some very large part, understanding what the future will look like a couple of years ahead of time is helpful for figuring out which problems, as a security person, I want to solve now. Alessio [00:32:50]: You mentioned getting clicks on the internet, but you don't even have, like, an X account or anything. How do you get people to read your stuff? What's your distribution strategy? Because this post was popping up everywhere. And then people on Twitter were like, Nicholas Carlini wrote this. Like, what's his handle? It's like, he doesn't have one. It's like, how did you find it? What's the story? Nicholas [00:33:07]: So I have an RSS feed and an email list. And that's it.
I don't like most social media things. On principle, I feel like they have some harms. As a person, I have a problem when people say things that are wrong on the internet. And I would get nothing done if I had a Twitter. I would spend all of my time correcting people and getting into fights. And so I feel like it is just useful for me for this not to be an option. I tend to just post things online. Yeah, it's a very good question. I don't know how people find it. I feel like some of the things that I write resonate with other people, and then they put it on Twitter. And... Swyx [00:33:43]: Hacker News as well. Nicholas [00:33:44]: Sure, yeah. Because my day job is doing research, I get no value from having this be picked up. There's no whatever. I don't need to be someone who has to have this other thing to give talks. And so I feel like I can just say what I want to say. And if people find it useful, then they'll share it widely. You know, this one went pretty wide. I wrote a thing, whatever, sometime late last year, about how to recover data off of an Apple ProFile drive from 1980. That probably got, I think, 1,000x fewer views than this. But I don't care. Like, that's not why I'm doing this. This is the benefit of having a thing that I actually care about, which is my research. I would care much more if that didn't get seen. This is a thing that I write because I have some thoughts that I just want to put down. Swyx [00:34:32]: Yeah. I think it's the long-form thoughtfulness and authenticity that is sadly lacking sometimes in modern discourse that makes it attractive. And I think now you have a little bit of a brand of: you are an independent thinker, writer, person, that people are tuned in to pay attention to whatever is coming next. Nicholas [00:34:52]: Yeah, I mean, this kind of worries me a little bit. I don't like whenever I have a popular thing and then I write another thing which is entirely unrelated. Like, I don't, I don't... You should actually just throw people off right now. Swyx [00:35:01]: Exactly. Nicholas [00:35:02]: I'm trying to figure out, like, I need to put something else online. So, like, the last two or three things I've done in a row have been, like, actually things that people should care about. Swyx [00:35:10]: Yes. So, I have a couple of things. Nicholas [00:35:11]: I'm trying to figure out which one to put online to just, like, cull the list of people who have subscribed to my email, and so, like, tell them, like, no, what you're here for is not informed, well-thought-through takes. What you're here for is whatever I want to talk about. And if you're not up for that, then, like, you know, go away. Like, this is not what I want out of my personal website. Swyx [00:35:27]: So, like, here's, like, top 10 enemies or something. Alessio [00:35:30]: What's the next project you're going to work on that is completely unrelated to LLM research? Or what games do you want to port into the browser next? Swyx [00:35:39]: Okay. Yeah. Nicholas [00:35:39]: So, maybe. Swyx [00:35:41]: Okay. Nicholas [00:35:41]: Here's a fun question. How much data do you think you can put on a single piece of paper? Swyx [00:35:47]: I mean, you can think about bits and atoms. Yeah. Nicholas [00:35:49]: No, like, normal printer. Like, I gave you an office printer. How much data can you put on a piece of paper? Alessio [00:35:54]: Can you re-decode it? So, like, you know, base64 or whatever.
Yeah, whatever you want. Nicholas [00:35:59]: Like, you get a normal off-the-shelf printer, an off-the-shelf scanner. How much data? Swyx [00:36:03]: I'll just throw it out there. Like, 10 megabytes. That's enormous. I know. Nicholas [00:36:07]: Yeah, that's a lot. Swyx [00:36:10]: Really small fonts. That's my question. Nicholas [00:36:12]: So, I have a thing. It does about a megabyte. Swyx [00:36:14]: Yeah, okay. Nicholas [00:36:14]: There you go. I was off by an order of magnitude. Swyx [00:36:16]: Yeah, okay. Nicholas [00:36:16]: So, in particular, it's about 1.44 megabytes. A floppy disk. Swyx [00:36:21]: Yeah, exactly. Nicholas [00:36:21]: So, this is supposed to be the title at some point. It's a floppy disk. Swyx [00:36:24]: A paper is a floppy disk. Yeah. Nicholas [00:36:25]: So, this is a little hard because, you know, you can do the math: you get 8.5 by 11 inches, you can print at 300 by 300 DPI, and this gives you 2 megabytes. And so every single pixel, you need to be able to recover with, like, 90-plus percent, like 95 percent, like 99-point-something percent accuracy, in order to be able to actually decode this off the paper. This is one of the things that I'm considering. I need to get a couple more things working for this. Where, you know, again, I'm running into some random problems. But this will probably be one thing that I'm going to talk about. There's this contest called the International Obfuscated C Code Contest, which is amazing. People try and write the most obfuscated C code that they can. Which is great. And I have a submission for that whenever they open up the next one for it. And I'll write about that submission. I have a very fun gate-level emulation of an old CPU that runs, like, fully precisely. And it's a fun kind of thing. Yeah. Swyx [00:37:20]: Interesting. Your comment about the piece of paper reminds me of when I was in college, and you would have, like, one cheat sheet that you could write. So, you have a formula, a theoretical limit for bits per inch. And, you know, that's how much I would squeeze in, really, really small. Yeah, definitely. Nicholas [00:37:36]: Okay. Swyx [00:37:37]: We are also going to talk about your benchmarking, because you released your own benchmark that got some attention, thanks to some friends on the internet. What's the story behind your own benchmark? Do you not trust the open source benchmarks? What's going on there? Nicholas [00:37:51]: Okay. Benchmarks tell you how well the model solves the task the benchmark is designed to solve. For a long time, models were not useful. And so the benchmark that you tracked was just something someone came up with, because you need to track something. All of deep learning exists because people tried to make models classify digits and classify images into a thousand classes. There is no one in the world who cares specifically about the problem of distinguishing between 300 breeds of dog in an image that's 224 by 224 pixels. And yet, like, this is what drove a lot of progress. And people did this not because they cared about this problem, but because they wanted to just measure progress in some way. And a lot of benchmarks are of this flavor. You want to construct a task that is hard, and you will measure progress on this benchmark, not because we care about the problem per se, but because we know that progress on this is in some way correlated with making better models. And this is fine when you don't want to actually use the models that you have.
But when you want to actually make use of them, it's important to find benchmarks that track whether or not they're useful to you. And the thing that I was finding is that there would be model after model after model being released that would find some benchmark that they could claim state-of-the-art on and then say, therefore, ours is the best. And that wouldn't help me know whether or not I should then switch to it. So the argument that I tried to lay out in this post is that more people should make benchmarks that are tailored to them. And so what I did is I wrote a domain-specific language that anyone can write for: you can take tasks that you have wanted models to solve for you, and you can put them into your benchmark, the thing that you care about. And then when a new model comes out, you benchmark the model on the things that you care about. And you know that you care about them because you've actually asked for those answers before. And if the model scores well, then you know that for the kinds of things that you have asked models for in the past, it can solve these things well for you. This has been useful for me because when another model comes out, I can run it. I can see, does this solve the kinds of things that I care about? And sometimes the answer is yes, and sometimes the answer is no. And then I can decide whether or not I want to use that model. I don't want to say that existing benchmarks are not useful. They're very good at measuring the thing that they're designed to measure. But in many cases, what they're designed to measure is not actually the thing that I want to use the model for. And I expect that the way that I want to use it is different from the way that you want to use it. And I would just like more people to have these things out there in the world. And the final reason for this is that it is very easy, if you want to make a model good at some benchmark, to make it good at that benchmark: you can find the distribution of data that you need and train the model to be good on that distribution of data. And then you have your model that can solve this benchmark well. And by having a benchmark that is not very popular, you can be relatively certain that no one has tried to optimize their model for your benchmark. Swyx [00:40:40]: And I would like this to be- Nicholas [00:40:40]: So publishing your benchmark is a little bit- Swyx [00:40:43]: Okay, sure. Nicholas [00:40:43]: Contextualized. So my hope in doing this was not that people would use mine as theirs. My hope in doing this was that- You should make yours. Yes, you should make your benchmark. And if, for example, there were even a very small fraction of people, 0.1% of people, who made a benchmark that was useful for them, this would still be hundreds of new benchmarks. I might not want to make one myself, but I might know that the kind of work I do is a little bit like this person's, a little bit like that person's, and I'll go check how it does on their benchmarks. And I'll see, roughly, I'll get a good sense of what's going on. Because the alternative is people just do this vibes-based evaluation thing, where you interact with the model five times and you see if it worked on your toy questions. But five questions is a very low-bit signal of whether or not it works for your thing. And if you can just automate running 100 questions for you, it's a much better evaluation.
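Nicholas's actual domain-specific language is not shown in the episode, but the idea of a personal benchmark is easy to sketch: each task is a prompt you have genuinely asked a model before, paired with a cheap check on the answer. Everything below is an illustrative assumption rather than his implementation; the two example tasks and the pluggable model_fn are stand-ins.

```python
# Hypothetical sketch of a personal benchmark: prompts you have actually asked
# a model before, each paired with a cheap automatic check of the answer.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # True if the model's answer looks acceptable

TASKS = [
    Task("Write a Python one-liner that reverses the words in a sentence.",
         check=lambda out: "split(" in out and "[::-1]" in out),
    Task("Which ffmpeg flag sets the video bitrate?",
         check=lambda out: "-b:v" in out),
]

def run_benchmark(model_fn: Callable[[str], str]) -> float:
    """Score a model (given as a prompt -> answer function) on your own tasks."""
    passed = 0
    for task in TASKS:
        ok = task.check(model_fn(task.prompt))
        print(("PASS" if ok else "FAIL"), task.prompt)
        passed += ok
    return passed / len(TASKS)

# Usage: run_benchmark(lambda p: call_your_model(p))  # plug in any model API
```

Because the tasks come from your own history and the suite is not public, a new model's score says something about whether it will actually be useful to you, and nobody has had a chance to optimize for it.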
So that's why I did this. Swyx [00:41:37]: Yeah, I like the idea of going through your chat history and actually pulling out real-life examples. I regret to say that I don't think my chat history is used as much these days, because I'm using Cursor, the native AI IDE. So your examples are all coding related. And the immediate question is, now that you've written the How I Use AI post, which is a little bit broader, are you able to translate all these things to evals? Are some things unevaluable? Nicholas [00:42:03]: Right. A number of things that I do are harder to evaluate. So this is the problem with a benchmark: you need some way to check whether or not the output was correct. And so all of the kinds of things that I can put into the benchmark are the kinds of things that you can check. You can check more things than you might have thought would be possible if you do a little bit of work on the back end. So for example, all of the code that I have the model write, it runs the code and sees whether the answer is the correct answer. Or in some cases, it runs the code, feeds the output to another language model, and the language model judges whether the output was correct. And again, is using a language model to judge here perfect? No. But like, what's the alternative? The alternative is to not do it. And what I care about is just, is this thing broadly useful for the kinds of questions that I have? And so as long as the accuracy is better than roughly random, like, I'm okay with this. I've inspected the outputs of these, and they're almost always correct. If you ask the model to judge these things in the right way, they're very good at being able to tell this. And so, yeah, I think this is probably a useful thing for people to do. Alessio [00:43:04]: You complain about prompting and being lazy, and how you do not want to tip your model and you do not want to murder a kitten just to get the right answer. How do you see the evolution of prompt engineering? Even like 18 months ago, maybe, you know, it was kind of like really hot and people wanted to build companies around it. Today, it's like the models are getting good. Do you think it's going to be less and less relevant going forward? Or what's the minimum viable prompt? Yeah, I don't know. Nicholas [00:43:29]: I feel like a big part of making an agent is just like a fancy prompt that, you know, calls back to the model again. I have no opinion. It seems like maybe it turns out that this is really important. Maybe it turns out that this isn't. I guess the only comment I was making here is just to say, oftentimes when I use a model and I find it's not useful, I talk to people who help make it. The answer they usually give me is like, you're using it wrong. Which reminds me very much of the "you're holding it wrong" thing from the iPhone, right? Like, you know, I don't care that I'm holding it wrong. I'm holding it that way. If the thing is not working with me, then it's not useful for me. It may be the case that there exists a way to ask the model such that it gives me the answer that's correct, but that's not the way I'm doing it. If I have to spend so much time thinking about how I want to frame the question that it would have been faster for me just to get the answer myself, it didn't save me any time. And so oftentimes, you know, what I do is I just dump in whatever current thought I have in whatever ill-formed way it is. And I expect the answer to be correct.
And if the answer is not correct, like, in some sense, maybe the model was right to give me the wrong answer. Like, I may have asked the wrong question, but I want the right answer still. And so, like, I just want this to sort of be a thing. And maybe the way to fix this is you have some default prompt that always goes into all the models or something, or you do something clever like this. It would be great if someone had a way to package this up and make it a thing. I think that's entirely reasonable. Maybe it turns out that as models get better, you don't need to prompt them as much in this way. I just want to use the things that are in front of me. Alessio [00:44:55]: Do you think that's like a limitation of just how models work? Like, you know, at the end of the day, you're using the prompt to kind of steer it in the latent space. Like, do you think there's a way to actually not make the prompt really relevant and have the model figure it out? Or like, what's the... Nicholas [00:45:10]: I mean, you could fine-tune it into the model, for example, that, like, it's supposed to... I mean, it seems like some models have done this, for example, like some recent model, many recent models. If you ask them a question, like computing an integral of this thing, they'll say, let's think through this step by step. And then they'll go through the step-by-step answer. I didn't tell it to. Two years ago, I would have had to prompt it: think step by step on solving the following thing. Now you ask them the question and the model says, here's how I'm going to do it, I'm going to take the following approach, and then it, like, sort of self-prompts itself. Swyx [00:45:34]: Is this the right way? Nicholas [00:45:35]: Seems reasonable. Maybe you don't have to do it. I don't know. This is for the people whose job is to make these things better. And yeah, I just want to use these things. Yeah. Swyx [00:45:43]: For listeners, that would be Orca and Agent Instruct. It's the SOTA on this stuff. Great. Yeah. Alessio [00:45:49]: What about few-shot? Is that included in the lazy prompting? Like, do you do few-shot prompting? Like, do you collect some examples when you want to put them in? Or... Nicholas [00:45:57]: I don't, because usually when I want the answer, I just want to get the answer. Brutal. Swyx [00:46:03]: This is hard mode. Yeah, exactly. Nicholas [00:46:04]: But this is fine. Swyx [00:46:06]: I want to be clear. Nicholas [00:46:06]: There's a difference between testing the ultimate capability level of the model and testing the thing that I'm doing with it. What I'm doing is I'm not exercising its full capability level, because there are almost certainly better ways to ask the questions and sort of really see how good the model is. And if you're evaluating a model for being state of the art, that is ultimately what you care about. And so I'm entirely fine with people doing fancy prompting to show me what the true capability level could be, because it's really useful to know what the ultimate level of the model could be. But I think it's also important just to have available to you how good the model is if you don't do fancy things. Swyx [00:46:39]: Yeah, I would say there's a divergence between how models are marketed these days versus how people use them, which is, when they test MMLU, they'll do like five shots, 25 shots, 50 shots. And no one's providing 50 examples. I completely agree. Nicholas [00:46:54]: You know, for these numbers, the problem is everyone wants to get state of the art on the benchmark.
And so you find the way that you can ask the model the questions so that you get state of the art on the benchmark. And it's good. It's legitimately good to know. It's good to know the model can do this thing if only you try hard enough, because it means that if I have some task that I want solved, I know what the capability level is, and I could get there if I was willing to work hard enough. And the question then is, should I work harder and figure out how to ask the model the question, or do I just do the thing myself? And for me, I have programmed for many, many, many years. It's often just faster for me to do the thing than to figure out the incantation to ask the model. But I can imagine someone who has never programmed before might be fine writing five paragraphs in English describing exactly the thing that they want and having the model build it for them, if the alternative is not being able to do it at all. But again, this goes to all these questions of, how are they going to validate it? Should they be trusting the output? These kinds of things. Swyx [00:47:49]: One problem with your eval paradigm, and most eval paradigms, I'm not picking on you, is that we're actually training these things for chat, for interactive back and forth. And you actually obviously reveal much more information, in the same way that asking 20 questions reveals more information, in sort of a tree-search, branching sort of way. This is also, by the way, the problem with the LMSYS arena, right? Where the vast majority of prompts are single question, single answer, eval, done. But actually, in the way that we use chat things, even in the stuff that you posted in your How I Use AI post, you have maybe 20 turns of back and forth. How do you eval that? Nicholas [00:48:25]: Yeah. Okay. Very good question. This is the thing that I think many people should be doing more of. I would like more multi-turn evals. I might be writing a paper on this at some point if I get around to it. A couple of the evals in the benchmark thing I have are already multi-turn. I mentioned 20 questions. I have a 20 questions eval there just for fun. But I have a couple of others that are like, I just tell the model, here's my git thing, figure out how to cherry-pick off this other branch and move it over there. And so what I do is I basically build a tiny little agentic thing. I just ask the model how to do it. I run the thing on Linux. This is what I want Docker for. I spin up a Docker container. I run whatever the model told me to run. I feed the output back into the model. I repeat this for many rounds. And then I check at the very end, does the git commit history show that it is correctly cherry-picked in
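A rough sketch of that kind of multi-turn, tool-in-the-loop eval might look like the following. It assumes a hypothetical Docker image (here called my-eval-image) with git installed and the test repository at /repo; query_model is a stand-in for whatever model API is used, and the final check just looks for the expected commit subject in the log.

```python
# Hypothetical sketch of a multi-turn agentic eval: the model proposes shell
# commands, we run them in a throwaway Docker container, feed the output back,
# and at the end check the git history for the expected cherry-picked commit.
import subprocess

def sh(container: str, cmd: str) -> str:
    """Run a shell command inside the container and return its combined output."""
    out = subprocess.run(["docker", "exec", container, "sh", "-c", cmd],
                         capture_output=True, text=True)
    return out.stdout + out.stderr

def query_model(transcript: str) -> str:
    """Stand-in: return the next shell command given the conversation so far."""
    raise NotImplementedError("plug in your model API here")

def run_eval(task: str, expected_subject: str, rounds: int = 10) -> bool:
    # my-eval-image is a hypothetical image with git installed and a repo at /repo
    container = subprocess.run(
        ["docker", "run", "-d", "--rm", "my-eval-image", "sleep", "infinity"],
        capture_output=True, text=True).stdout.strip()
    try:
        transcript = f"Task: {task}\n"
        for _ in range(rounds):
            cmd = query_model(transcript)
            transcript += f"$ {cmd}\n{sh(container, cmd)}\n"
        # Final check: did the expected commit end up on the current branch?
        log = sh(container, "cd /repo && git log --format=%s")
        return expected_subject in log
    finally:
        subprocess.run(["docker", "rm", "-f", container], capture_output=True)
```

A real harness would add timeouts, output truncation, and stricter sandboxing, but the loop of propose a command, run it, feed the output back, and then check an end state is the core of the idea.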
Nicholas Carlini joined Bryan and Adam to talk about his terrific blog post on his many pragmatic uses of LLMs to solve real problems. He has great advice about when to use them (often!) and what kinds of problems they handle well. LLMs aren't great at many things, but used well they can be an amazing tool.In addition to Bryan Cantrill and Adam Leventhal, we were joined by special guest, Nicholas Carlini as well as by listeners Mike Cafarella, p5commit, and chrisbur.Some of the topics we hit on, in the order that we hit them:Nicholas' blog: How I Use "AI"The McLaughlin GroupSurge 2011 ~ Closing Plenary ~ Theo SchlossnagleMicrosoft's Tay chatbotCurb Your Enthusiasm: Larry vs. SiriSal Khan on LLMsGoogle's awful AI adGoogle pulls adIf we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!
She was one of the hottest names in Indy media in the 1980s. From TV weather to radio diva, Pat Carlini was a household name. In this episode, she talks about her career path and her new goals, post-media.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?, published by Stephen Casper on July 30, 2024 on The AI Alignment Forum. Thanks to Zora Che, Michael Chen, Andi Peng, Lev McKinney, Bilal Chughtai, Shashwat Goel, Domenic Rosati, and Rohit Gandikota. TL;DR: In contrast to evaluating AI systems under normal "input-space" attacks, using "generalized" attacks, which allow an attacker to manipulate weights or activations, might be able to help us better evaluate LLMs for risks, even if they are deployed as black boxes. Here, I outline the rationale for "generalized" adversarial testing and give an overview of current work related to it. See also prior work in Casper et al. (2024), Casper et al. (2024), and Sheshadri et al. (2024). Even when AI systems perform well in typical circumstances, they sometimes fail in adversarial/anomalous ones. This is a persistent problem. State-of-the-art AI systems tend to retain undesirable latent capabilities that can pose risks if they resurface. My favorite example of this is the most cliche one: many recent papers have demonstrated diverse attack techniques that can be used to elicit instructions for making a bomb from state-of-the-art LLMs. There is an emerging consensus that, even when LLMs are fine-tuned to be harmless, they can retain latent harmful capabilities that can and do cause harm when they resurface (Qi et al., 2024). A growing body of work on red-teaming (Shayegani et al., 2023, Carlini et al., 2023, Geiping et al., 2024, Longpre et al., 2024), interpretability (Juneja et al., 2022, Lubana et al., 2022, Jain et al., 2023, Patil et al., 2023, Prakash et al., 2024, Lee et al., 2024), representation editing (Wei et al., 2024, Schwinn et al., 2024), continual learning (Dyer et al., 2022, Cossu et al., 2022, Li et al., 2022, Scialom et al., 2022, Luo et al., 2023, Kotha et al., 2023, Shi et al., 2023, Schwarzchild et al., 2024), and fine-tuning (Jain et al., 2023, Yang et al., 2023, Qi et al., 2023, Bhardwaj et al., 2023, Lermen et al., 2023, Zhan et al., 2023, Ji et al., 2024, Hu et al., 2024, Halawi et al., 2024) suggests that fine-tuning struggles to make fundamental changes to an LLM's inner knowledge and capabilities. For example, Jain et al. (2023) likened fine-tuning in LLMs to merely modifying a "wrapper" around a stable, general-purpose set of latent capabilities. Even if they are generally inactive, harmful latent capabilities can pose harm if they resurface due to an attack, anomaly, or post-deployment modification (Hendrycks et al., 2021, Carlini et al., 2023). We can frame the problem as such: there are hyper-astronomically many inputs for modern LLMs (e.g. there are vastly more 20-token strings than particles in the observable universe), so we can't brute-force-search over the input space to make sure they are safe. So unless we are able to make provably safe advanced AI systems (we won't soon, and probably never will), there will always be a challenge with ensuring safety: the gap between the set of failure modes that developers identify, and unforeseen ones that they don't. This is a big challenge because of the inherent unknown-unknown nature of the problem. However, it is possible to try to infer how large this gap might be.
Taking a page from the safety engineering textbook -- when stakes are high, we should train and evaluate LLMs under threats that are at least as strong as, and ideally stronger than, ones that they will face in deployment. First, imagine that an LLM is going to be deployed open-source (or if it could be leaked). Then, of course, the system's safety depends on what it can be modified to do. So it should be evaluated not as a black-box but as a general asset to malicious users who might enhance it through finetuning or other means. This seems obvious, but there's preced...
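To make the idea of a "generalized" attack concrete, here is a minimal sketch in the spirit of the post rather than a reproduction of any particular paper's method: learn a perturbation added to one transformer block's output so that a chosen target continuation becomes likely. It assumes PyTorch and Hugging Face transformers; the model, layer index, prompt, and target string are all illustrative placeholders.

```python
# Hypothetical sketch of an activation-space ("generalized") attack: optimize a
# perturbation on one block's residual stream to make a target string likely.
# Model name, layer choice, prompt, and target are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any HF causal LM with blocks at model.transformer.h
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the perturbation is trained

prompt = "Tell me how to "           # illustrative
target = "do the forbidden thing"    # stand-in for a behavior being tested

prompt_ids = tok(prompt, return_tensors="pt").input_ids
target_ids = tok(target, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, target_ids], dim=1)

delta = torch.zeros(1, 1, model.config.hidden_size, requires_grad=True)

def add_delta(module, inputs, output):
    # Add the learned perturbation to this block's hidden states (broadcast
    # over all positions); the block's other outputs pass through unchanged.
    return (output[0] + delta,) + output[1:]

hook = model.transformer.h[6].register_forward_hook(add_delta)  # illustrative layer
opt = torch.optim.Adam([delta], lr=1e-2)

for step in range(200):
    logits = model(input_ids).logits
    # Cross-entropy of the target tokens given everything that precedes them.
    tgt_logits = logits[0, prompt_ids.shape[1] - 1 : -1]
    loss = torch.nn.functional.cross_entropy(tgt_logits, target_ids[0])
    opt.zero_grad()
    loss.backward()
    opt.step()

hook.remove()
print("final target loss:", loss.item())
```

If a small perturbation like this reliably elicits a behavior that input-space red-teaming never surfaced, that is evidence the capability is still latent in the weights, which is the kind of gap the post argues such evaluations should measure.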
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: You should go to ML conferences, published by Jan Kulveit on July 24, 2024 on LessWrong. This is a second, kind of obvious, point to make, but if you are interested in AI, AI safety, or cognition in general, it is likely worth going to top ML conferences, such as NeurIPS, ICML or ICLR. In this post I cover some reasons why, and some anecdotal stories. 1. Parts of AI alignment and safety are now completely mainstream. Looking at the "Best paper awards" at ICML, you'll find these safety-relevant or alignment-relevant papers: Stealing part of a production language model by Carlini et al.; Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo by Zhao et al.; Debating with More Persuasive LLMs Leads to More Truthful Answers by Khan et al.; and Genie: Generative Interactive Environments by Bruce et al., which amounts to about one-third (!). "Because of safety concerns" is part of the motivation for hundreds of papers. While the signal-to-noise ratio is even worse than on LessWrong, in total the amount you can learn is higher; my personal guess is there is maybe 2-3x as much prosaic-AI-safety-relevant work at conferences as what you get by just following LessWrong, the Alignment Forum, and safety-oriented communication channels. 2. Conferences are an efficient way to screen general ML research without spending a lot of time on X. Almost all papers are presented in the form of posters. In the case of a big conference, this usually means many thousands of posters presented in huge poster sessions. My routine for engaging with this firehose of papers: 1. For each session, read all the titles. Usually, this prunes it by a factor of ten (i.e. from 600 papers to 60). 2. Read the abstracts. Prune to things which I haven't noticed before and which seem relevant. For me, this is usually by a factor of ~3-5. 3. Visit the posters. Posters with paper authors present are actually a highly efficient way to digest research: sometimes you suspect there is some assumption or choice hidden somewhere making the result approximately irrelevant, and just asking can often resolve this in a matter of tens of seconds. Posters themselves don't undergo peer review, which makes the communication more honest, with less hedging. Usually the authors of a paper know significantly more about the problem than what's in the paper, and you can learn more about negative results, obstacles, or directions people are excited about. A clear disadvantage of conferences is the time lag; by the time results are presented, some of the main ones are old and well known, but in my view a lot of the value is in the long tail of results which are sometimes very useful but not attention-grabbing. 3. ML research community as a control group. My vague impression is that in conceptual research, mainstream ML research lags behind the LW/AI safety community by somewhere between 1 and 5 years, rediscovering topics discussed here. Some examples: the ICML poster & oral presentation The Platonic Representation Hypothesis is an independent version of natural abstractions, discussed here for about 4 years. A Roadmap to Pluralistic Alignment deals with the self-unalignment problem and coherent extrapolated volition. Plenty of research on safety protocols like debate, IDA,...
Prior work published in the LW/AI safety community is almost never cited or acknowledged, in some cases because it is more convenient to claim the topic is completely novel, but I suspect in many cases researchers are genuinely not aware of the existing work, which makes their contribution a useful control: if someone starts thinking about these topics, unaware of the thousands of hours spent on them by dozens of people, what will they arrive at? 4. What 'experts' think. The ML research community is the intellectual home of many people expressing public opinions about AI risk. In my view, b...
Lauren Carlini is a setter for the U.S. Women's National Team and has played professional volleyball overseas for eight years. A three-time AVCA All-American at the University of Wisconsin, Carlini grew to love Madison, Wisconsin, and will lead LOVB Madison when League One Volleyball's inaugural season launches in late 2024. She and Tiffany Oshinsky discuss Lauren's volleyball career at Wisconsin and abroad, excitement over joining LOVB and LOVB Madison, affection for dogs – and, of course, pancakes – during the inaugural episode of Serving Pancakes. Chapters include: young Lauren's dreams and how she got into volleyball; introduction to the University of Wisconsin, its campus and the Madison community; beginning a pro career overseas; Lauren's response to missing the Tokyo 2020 roster; the close-knit relationships on Team USA; excitement about playing professional volleyball in the United States; Lauren Carlini: Must LOVB Dogs; and Lauren's legacy. Follow Lauren on Instagram and X (formerly Twitter). Become a LOVB Insider to stay up-to-date on when tickets will go on sale, team info, venue announcements and more! Host: Tiffany Oshinsky. Senior Producer: Anya Alvarez. Executive Producers: Carrie Stett, Tamara Deike, and Lindsay Hoffman. Theme Music: "Pancakes" by Eric W. Mast, Jr. Sound Designer: Max Lorenzen. Serving Pancakes is an iHeart Women's Sports Production, in partnership with Deep Blue Sports and Entertainment. You can find us on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts. See omnystudio.com/listener for privacy information.
Meet Michael Carlini, a certified life coach and founder of Carlini Coaching, who specializes in supporting medical professionals with ADHD. Drawing from his personal experience and close family ties to healthcare, Michael sheds light on the unique challenges faced by physicians with ADHD, including time management, focus, and work-life balance. He discusses the prevalence of ADHD among medical professionals and the stigma that often prevents them from seeking help or disclosing their condition. Throughout the interview, Michael shares his journey from teaching to coaching and reveals the strategies he employs to help his clients thrive both personally and professionally. He emphasizes the importance of setting boundaries, leveraging assistive technology, and creating a supportive environment without necessarily disclosing an ADHD diagnosis. If you're a physician entrepreneur, investor, or considering a non-clinical career, this podcast is a must-listen. Tune in to discover how Michael's insights and expertise can help you navigate the challenges of ADHD in the medical field and unlock your full potential. Michael Carlini's website: https://www.carlinicoaching.com/ Subscribe to Our FREE Newsletter - THE LOUNGE: Get your weekly dose of tech trends, investment tips, and entrepreneurial insights designed for the ambitious physician! https://www.bootstrapmd.com/thelounge Our Podcast Sponsor: Doc 2 Doc Lending - Affordable loans for the busy forward-thinking physician https://www.bootstrapmd.com/doc2doc PhysicianCoaches.com The #1 Doctor Directory for Physician Coaches, Consultants, and Mentors https://www.PhysicianCoaches.com
In her master's research at the School of Arts, Sciences and Humanities (EACH) at USP, advised by professor Ricardo Santhiago Corrêa, journalist Márcia Scapatício examines how gender is entangled with music journalism, from the perspective of a girl who always breathed music but never managed to break into this area of specialization, an environment that is still overwhelmingly male. "This feeds the music industry, the market, the artists, and it has a knock-on effect, an 'echo,' in music journalism." In this edition of USP Especiais #91, Márcia talks about a project she started but that was never published anywhere: the guys at the Laser Express record store in Piracicaba, made up essentially of men who, with an influence that went beyond record recommendations, set the cultural agenda and the music scene of the city. The researcher also discusses the closeness between artists and journalists. Her reference point is the trajectory of British singer Poly Styrene, the first Black vocalist to gain prominence in the English punk rock scene of the 1970s. On sexism in music, Márcia interviewed English journalist Vivien Goldman, one of the most important chroniclers of punk and reggae, Bob Marley's biographer, known as New York University's "punk professor." Goldman confided that the music industry of her time was a boys' club, where the men in charge believed that women didn't consume music. "The guys I was reading at 15, and I'm 41 now, are the same ones. If I catch a great band being interviewed, it will be that guy. I see this as a social structure that doesn't let women reach every place, or that, if they do reach it, only a few women make it," Márcia recalls. Program credits: Production and script: Tabita Said. Direction and editing: Gustavo Xavier. Recording: Cid Roberto and Mariana Franco. Music credits: Waiting Room / Composed by Ian MacKaye / Performed by Fugazi; O Que Eu Vou Ser Quando Crescer? / Composed by Mao / Performed by Garotos Podres; A Noite Vai Chegar / Composed by Paulinho Camargo / Performed by Lady Zu; Loneliness / Composed by Alexandra, Zé Antonio, Flávio and Eliane / Performed by Pin Ups; Deusa Sombria / Composed by Chico Lobo / Performed by Perfume Azul do Sol; Na Hora Do Almoço / Composed by Belchior / Performed by As Baias; Crash / Composed by Rodrigo Ogi / Performed by Juçara Marçal; Humanos / Composed by Supla, Bid and Andrés / Performed by Tokyo. Ambient sound: Correndo sem Parar / Composed by Karl Hummel and Marcelo Nova / Performed by Camisa de Vênus; Status / Composed by Lee Marcucci, Luiz Sérgio Carlini and Rita Lee / Performed by Rita Lee & Tutti Frutti; Kitsch / Composed by Marian Elliott, Celeste Bell and Martin Glover / Performed by Poly Styrene; What to do / Composed by Papi and Alf Soares / Performed by Vanusa; E acho que não sou só eu / Composed by Marina Lima / Performed by Marina Lima; Heavy Drums Bass / Composed by Audionautix / Performed by Audionautix
Screenwriter Stuart Wright talks to filmmaker Uga Carlini about her new true-life alien abduction documentary BEYOND THE LIGHT BARRIER and "3 Films That Have Impacted Everything In Your Adult Life": E.T. THE EXTRA TERRESTRIAL; THE BIG BLUE (1988) / BETTY BLUE aka 37°2 LE MATIN (1986); AMELIE (2001); and a bonus choice… LOVE ACTUALLY (2003). "Has to be there," says Uga. BEYOND THE LIGHT BARRIER is out now and available to PRIME subscribers in the UK. "3 FILMS THAT HAVE IMPACTED EVERYTHING IN YOUR ADULT LIFE" is a podcast by screenwriter Stuart Wright that explores the transformative power of cinema. From emotional masterpieces to thought-provoking classics, each episode delves into the films that have had a profound impact on our personal growth and perspective. Through engaging storytelling, critical analysis, and cultural commentary, Stuart aims to uncover the lasting influence that movies have had on his guests. Please join him on an emotional journey through the world of film and discover how just three movies can change the direction of a life, cement memories you will never forget or sometimes change how you see the world. Credits: Intro/Outro music is Rocking The Stew by Tokyo Dragons (www.instagram.com/slomaxster/). Podcast for www.britflicks.com https://www.britflicks.com/britflicks-podcast/. Written, produced and hosted by Stuart Wright. Support this podcast at — https://redcircle.com/britflicks-com-podcast/donations. Advertising Inquiries: https://redcircle.com/brands. Privacy & Opt-Out: https://redcircle.com/privacy
SO HONORED to have Jake Carlini on the podcast! Today, we chat personally with Jake about his Youtube career, encapsulating storytelling and being kicked out of school for being TOO talented. --- Support this podcast: https://podcasters.spotify.com/pod/show/spicy-bit-of-meat/support
Former Wisconsin setter, Lauren Carlini, takes center stage in this episode of the Joncast Podcast! We talk about Lauren's exciting news – her return to Madison as the first player named for the LOVB Madison Volleyball franchise and talk about what it means to Lauren to come back to the city where her college volleyball journey began. We explore Lauren's journey toward potentially making the USA national team for the 2024 Paris Olympics. And, of course, we discuss all things Wisconsin volleyball, including their impressive performance so far in the NCAA tournament in 2023. Whether you're a volleyball enthusiast, a sports fan, or just looking for engaging and entertaining content, this episode has something for everyone. Tune in now for an exclusive conversation with a volleyball superstar, and stay updated with the latest in volleyball, sports, and more! --- Support this podcast: https://podcasters.spotify.com/pod/show/jon-arias/support
In this episode, I am joined by Freddie Carlini. He is the mastermind behind Mixtape Massacre, a highly thematic board game where you are the slasher hunting down victims to collect souvenirs. Freddie talks about how Mixtape Massacre is NOT TSA Friendly! It is a hilarious story. We also talk about running over zombies on a motorcycle, chunky dice being cool, and how dice trays save time and dice! All this and more coming up. Thanks for listening! Get a copy of Mixtape Massacre!
Libbi is traveling this weekend all the way to Kansas State University to judge the Make It With Wool competition. Libbi is part of a mentorship program through her college, the College of Health and Human Sciences in Justin Hall, and on this episode she chats with her mentee Teya Carlini. Teya Carlini is currently a sophomore at Kansas State University majoring in Fashion Design. She enjoys finding joy in what people do and discovering what makes their creativity click. She loves coffee, art, nature, listening to music, finding new hobbies, and of course, fashion. Although she is just a sophomore, she can't wait to go and see what the future holds. PSA: the last few minutes of the episode have a slight buzzing sound because Libbi bumped her microphone, and we are still learning how to edit the show. Also, the quote mentioned in the show is: "The dream is free, but the hustle isn't." Thanks for listening! Be sure to give us a follow on social media @threecheeseblendpod --- Support this podcast: https://podcasters.spotify.com/pod/show/threecheeseblend/support
See more here: https://wp.me/p58EtD-6S8 Multi-award winning filmmaker Uga Carlini discusses her film Beyond the Light Barrier, exploring the extraordinary life of Elizabeth Klarer, a South African meteorologist who devoted herself to proving the existence of Akon, her extraterrestrial lover from the planet Meton in the Proxima Centauri solar system. Appreciate KAren's work Awakening Consciousness? THANK YOU for your Support for the content. Share your appreciation on this link https://www.paypal.me/KArenASwain THANK YOU for SHARING these conversations, we present them to you completely FREE with no ads! Please spread the LOVE and Wisdom. BIG LOVE ks. Visit KAren's website here https://karenswain.com/ Follow us on all our platforms https://linktr.ee/KArenSwain
In this episode we celebrate the 10th anniversary of the Wisconsin volleyball team's incredible run to the Final Four and National Championship in 2013. I'm joined by a star-studded lineup of former players, including Haleigh Nelson, Lauren Carlini, KT Kvas, Ellen Chapman, Deme Morales, Taylor Morey, and Annmarie Hickey. In this episode, we dive deep into the memories and emotions of that remarkable season. Hear firsthand what it was like for the team to make it all the way to the Final Four, their initial thoughts on new head coach Kelly Sheffield, and how often they reflect on the unforgettable moments of the 2013 season. Follow or subscribe so you don't miss a new episode! --- Support this podcast: https://podcasters.spotify.com/pod/show/jon-arias/support
Schneider Electric is driving sustainability forward for the datacentre industry. They have created an industry-first free framework to understand the full environmental impact of enterprise data centres, as detailed in the latest Scope 3 Emissions whitepaper. To learn more about this, Ronan spoke to Steve Carlini, Vice President of Innovation and Data Center, Energy Management Business Unit, at Schneider Electric. Steve talks about his background, AI, what Schneider Electric does, the Scope 3 Emissions white paper and more. More about Schneider Electric: Schneider believes access to energy and digital is a basic human right. They empower all to make the most of their energy and resources, ensuring Life Is On everywhere, for everyone, at every moment. They provide energy and automation digital solutions for efficiency and sustainability. They combine world-leading energy technologies, real-time automation, software and services into integrated solutions for Homes, Buildings, Data Centres, Infrastructure and Industries. They are committed to unleashing the infinite possibilities of an open, global, innovative community that is passionate about their Meaningful Purpose, Inclusive and Empowered values.
Steve and Sean chat about The Beast You Are, The Exorcist, Gen V, Futurama, and not being a frickin scab. Then, they're joined by Freddie Carlini, creator of Mixtape Massacre! They discuss creating the game, starting Bright Light Media, refusing to sit through a 3 hour rule book, and more! Support the WGA/SAG strike! See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Lauren Carlini is a setter who will play for LOVB (League One Volleyball) in 2024. She's also striving to make the roster for USA volleyball and the 2024 Olympics. In this episode we talk about why she chose to join LOVB, if she'll be on the Madison team, her time at Wisconsin and Lauren talks about what motivates her and her process on accomplishing goals in her life. Thanks to Ian's Pizza for sponsoring this episode. Thanks to YOU for making this podcast a nominee for "Best of Madison!" Here is how we can win it. 1. Go here: https://channel3000.com/madison-magazine/best-of-madison/best-of-madison-2023-vote-now/article_81bb44d4-9dae-11ed-8778-831b6146a870.html#// 2. Click on "Arts and Entertainment" 3. Scroll to "Local Podcast" and click on "Joncast Podcast" and enter your email. 4. Vote once between now and June 30th. --- Support this podcast: https://podcasters.spotify.com/pod/show/jon-arias/support
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today we're joined by Nicholas Carlini, a research scientist at Google Brain. Nicholas works at the intersection of machine learning and computer security, and his recent paper “Extracting Training Data from LLMs” has generated quite a buzz within the ML community. In our conversation, we discuss the current state of adversarial machine learning research, the dynamic of dealing with privacy issues in black box vs accessible models, what privacy attacks in vision models like diffusion models look like, and the scale of “memorization” within these models. We also explore Nicholas' work on data poisoning, which looks to understand what happens if a bad actor can take control of a small fraction of the data that an ML model is trained on. The complete show notes for this episode can be found at twimlai.com/go/618.
Born and raised in the Italian speaking part of Switzerland, Valentina Carlini studied medicine in Geneva for one quarter before studying classics in Zurich, and has never looked back. As a philologist, she deeply believes we could have a better world if people would cultivate dead languages, handwriting, arts with movement of the body, brain, soul and feelings. She also believes we can ask anything, but shouldn't lecture. Please rate us on Apple and Spotify and subscribe for free at mikeyopp.com This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit mikeyopp.substack.com/subscribe
Seventeenth episode of the fourth season of J-WORLD, the segment on the Spreaker channel J-TACTICS dedicated to the Juventus women's and youth teams. Juventus Women return to winning ways: a gritty away win, a comeback that keeps Juve in second place in the league, and above all a roar that was never as essential as it is right now. A resounding victory for Juventus Women, who beat Chievo 0-3 away and all but secure their place in the Coppa Italia semifinals. The first leg of the Serie C Coppa Italia semifinal ends with a narrow defeat for Juventus Next Gen. At the "Pino Zaccheria," Foggia beat the Bianconeri 2-1, with Ogunseye's decisive brace, one goal in each half, and Poli's headed equalizer in between. Not a result that favors Massimo Brambilla's side, but one that certainly leaves more than one door open to chase qualification in the return leg on February 15. A second consecutive league draw for Juventus Next Gen, who drew 1-1 against Renate in Alessandria: Sekulov put the Bianconeri ahead early in the second half and the Lombard side equalized a few minutes later through Baldassin. A point apiece, and the Bianconeri move to 28 points after 23 games played. The match at Vinovo between the Juventus Under-19s and Frosinone ended in a 1-3 defeat, decided by Condello's brace and Bracaglia's goal, with Anghelè scoring the Bianconeri's temporary equalizer. Paolo Montero's side stays on 25 points, currently in sixth place. If the result gives little to smile about, one positive note of the day was Mbangula's return to the pitch after three games out. A fine win for the Under-17s coached by Panzanaro against their Napoli counterparts: at Vinovo the match against the Neapolitans ended 3-1. De Chiara opened the scoring after a quarter of an hour, before Juve's comeback, completed entirely in the second half with goals from Pugno, Boufandar and Biliboc. With this win the Bianconeri climb to 36 points, still in first place, three points clear of second-placed Parma. A narrow defeat for Rivalta's Under-16s: at Vinovo, their Sampdoria counterparts won 1-2. Carlini and Papasergio put the visitors two goals up, and Merola's penalty was not enough for Juventus to leave the pitch with at least a point. The table currently has the Bianconeri in sixth place on 15 points. At Vinovo the Juventus Under-15s scored four against their Sampdoria counterparts and move to 20 points after 10 rounds, second in the table, with a brace from Kaba and goals from Borasio and Suazo: a one-sided game in the first official match of 2023. In the twelfth round of the league, the women's Under-19s beat Parma with a clear-cut 8-0, moving to 31 points and top of the table pending the rescheduled match between San Marino Academy and Roma (postponed due to snow). Many Bianconere got on the scoresheet: Bertucci, Mounecif, Ruggeri (a brace), Berveglieri and Cinquegrana.
An own goal by the Emilian side completed the scoresheet. Also worth noting: a first start for Giulia Robino, born in 2008, and a debut for Arianna Gallina, born in 2006. An excellent outing in the regional league for Lombardi's women's Under-15s, who won 8-0 away at Baveno: a hat-trick for Berbotto, a brace for Gaiardelli, and goals from Basciu, Abbondanza and Alice completed the list of scorers. There will also be a look at the upcoming fixtures for the women's and youth teams: Juve-Sampdoria women, Sunday January 29, 12:30. Next Gen-Vicenza, Sunday January 29, 12:30. Genoa-Juve Under-17, Sunday January 29, 15:00. Como-Juve Under-16, Sunday January 29, 14:30. Como-Juve Under-15, Sunday January 29, 12:00. Tavagnacco-Juve Under-19 women, Sunday January 29, 15:00. Vercelli-Juve Under-15 women, Sunday January 29, 15:00. Once again this year, our guide through the Juve world will be the ever-knowledgeable and precise Roberto Loforte of Fuori Rosa TV.
Given to the convent at just 9 years old, young Benedetta sees miracles in everything from a statue falling to a big black dog not killing her. As she dives into the strict ascetic lifestyle, her experiences evolve into full-on visions, including a marriage proposal from Christ himself. WHOA! Her visitations and eventual stigmata quickly raise her to leadership levels and she gains much notoriety, along with her Abbess title. But not all the nuns in the convent believe in her visions or her leadership choices, and Benedetta is accused of heresy and forced to face a series of investigations into the truth. Does Benedetta survive these accusations? What do the investigators discover? Who is Splenditello? And more importantly - what happened behind her doors at night? Let's just say - she deserves her own float at the pride parade. Listen now to hear her full story! — A Broad is a woman who lives by her own rules. Broads You Should Know is the podcast about the Broads who helped shape our world! 3 ways you can help support the podcast: write a review on Apple Podcasts, share your favorite episode on social, tell a friend! — Broads You Should Know is hosted by Sara Gorsky. IG: @SaraGorsky Web master / site design: www.BroadsYouShouldKnow.com — Broads You Should Know is produced by Sara Gorsky & edited by Chloe Skye
Today's guest refers to himself as a "FOMO Dragon". Early to both CryptoKitties and NBA Top Shot, Carlini is no stranger to the volatile market of non-fungibles. He joins us on the show to share some shocking tales, like how he sold his punks for Top Shot, and to dish a slice of humble pie to none other than himself.
In this episode, Marianna takes us through the exact steps to create a sales funnel and the biggest reason to build a marketing funnel and use a system. We also give you the systems we recommend and use to get you started! Links: her website, her Insta, and her free "10 things to set up your Kartra" workbook.
This week we feature John Carlini. John was involved with the original David Grisman Quintet as its musical director, and from that time forward he maintained a close relationship with Tony Rice. Although known primarily as a guitar player, John is also a skilled banjo player and was close with legendary fiddler Tex Logan, whom he considers his bluegrass mentor. In this podcast we talk with John about his long, varied and vast career in music.
The funnel is the backbone of sales, both online and offline. It can be very easy to get overwhelmed by technology and options when setting up your sales and marketing funnels, but fundamentally they can be quite simple. In this episode, Marianna Carlini and Alastair McDermott discuss how to create a simple but effective sales and marketing funnel, the different types of funnels you can choose from, and what funnels are appropriate for high-end offers. They also discuss mistakes experts make when setting up their funnels, how to connect with your audience, and why it pays to repeat yourself on social media. “Talk about what you offer as much as possible. Every context, every opportunity you have, just talk about it because the more people hear about what you do, the more they will remember.” -- Marianna Carlini on The Recognized Authority podcast “You don't really know how well it's going to work. Even if you did your customer research, until it's out there, you don't really know what's going to happen. That's why starting with the simplest option can save you a lot of headache and time in the long run.” -- Marianna Carlini on The Recognized Authority podcast
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [MLSN #6]: Transparency survey, provable robustness, ML models that predict the future, published by Dan Hendrycks on October 12, 2022 on The AI Alignment Forum. You can subscribe to the newsletter here, follow the newsletter on twitter here, or join the subreddit here.

Welcome to the 6th issue of the ML Safety Newsletter. In this edition, we cover:
A review of transparency research and future research directions
A large improvement to certified robustness
"Goal misgeneralization" examples and discussion
A benchmark for assessing how well neural networks predict world events (geopolitical, industrial, epidemiological, etc.)
Surveys that track what the ML community thinks about AI risks
$500,000 in prizes for new benchmarks
And much more.

Monitoring

Transparency Survey

[Figure: a taxonomy of transparency methods. Methods are organized according to what part of the model they help to explain (weights, neurons, subnetworks, or latent representations). They can be intrinsic (implemented during training), post hoc (implemented after training), or can rely on a mix of intrinsic and post hoc techniques. 'Hazards' (in orange) are phenomena that make any of these techniques more difficult.]

This survey provides an overview of transparency methods: what's going on inside of ML models? It also discusses future directions, including:
Detecting deception and eliciting latent knowledge. Language models are dishonest when they babble common misconceptions like "bats are blind" despite knowing that this is false. Transparency methods could potentially indicate what the model 'knows to be true' and provide a cheaper and more reliable method for detecting dishonest outputs.
Developing rigorous benchmarks. These benchmarks should ideally measure the extent to which transparency methods provide actionable insights. For example, if a human implants a flaw in a model, can interpretability methods reliably identify it?
Discovering novel behaviors. An ambitious goal of transparency tools is to uncover why a model behaves the way it does on a set of inputs. More feasibly, transparency tools could help researchers identify failures that would be difficult to otherwise anticipate.

Other Monitoring News
[Link] This paper discusses the sudden emergence of capabilities in large language models. This unpredictability is naturally a safety concern, especially when many of these capabilities could be hazardous or discovered after deployment. It will be difficult to make models safe if we do not know what they are capable of.
[Link] This work attributes emergent capabilities to "hidden progress" rather than random discovery.
[Link] Current transparency techniques (e.g., feature visualization) generally fail to distinguish the inputs that induce anomalous behavior.

Robustness

Mathematical Guarantees of Model Performance

[Figure: the current state-of-the-art method for certified robustness (denoised smoothing) combines randomized smoothing with a diffusion model for denoising. In randomized smoothing, an input is perturbed many times and the most commonly assigned label is selected as the final answer, which guarantees a level of robust accuracy within a certain perturbation radius. To improve this method, the perturbed inputs are denoised with a diffusion model after the perturbation step so that they can be more easily classified (from Salman et al.).]
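To make that recipe concrete, here is a minimal sketch of the prediction step of randomized smoothing, with an optional denoising hook standing in for the diffusion model. The classify and denoise functions, the noise level sigma, and the sample count are all illustrative placeholders rather than the actual setup from Salman et al. or Carlini et al.; the formal certificate also needs a confidence bound on the vote counts, which is omitted here.

```python
import numpy as np

def classify(x: np.ndarray) -> int:
    # Stand-in base classifier; in practice this is a trained image model.
    return int(x.sum() > 0)

def denoise(x: np.ndarray) -> np.ndarray:
    # Stand-in for the diffusion-model denoiser used in denoised smoothing;
    # identity here, but a real denoiser would strip the added noise.
    return x

def smoothed_predict(x: np.ndarray, sigma: float = 0.25, n: int = 1000) -> int:
    """Perturb the input n times with Gaussian noise, denoise, classify each
    copy, and return the majority-vote label (the 'smoothed' prediction)."""
    rng = np.random.default_rng(0)
    votes = {}
    for _ in range(n):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        label = classify(denoise(noisy))
        votes[label] = votes.get(label, 0) + 1
    # A certified radius would be derived from how dominant the top label is
    # (via a binomial confidence bound and the Gaussian CDF); omitted here.
    return max(votes, key=votes.get)

print(smoothed_predict(np.array([0.3, -0.1, 0.5])))
```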
A central concern in the robustness literature is that empirical evaluations may not give performance guarantees. Sometimes the test set will not find important faults in a model, and some think empirical evidence is insufficient for high confidence. However, robustness certificates enable definitive claims about how a model will behave in some classes of situations. In this paper, Carlini et al. recently improved ImageNet certified robustness by 14 percentage points by s...
In this episode, Clarence and Stephen catch up with one another. Then, they sit down with U.S. Women's National Team setter Lauren Carlini (8:03 - 1:01:37) ahead of the 2022 FIVB Women's World Championship. They chat about Lauren's time at Wisconsin, video games, growing the game, Worlds, her time overseas and so much more!
DISCLAIMER: The following show is in no way advice on how you should invest in cryptocurrency; always do your own research and source the information for yourself. Shawn Silva, an Engineering Technologist who created A3C to help others navigate the blockchain landscape, and Marco Carlini, a salesperson in the concrete industry, are here to share a lot about cryptocurrency, blockchain, NFTs and a whole lot more. On with the show.

What is blockchain? A blockchain is a distributed system that achieves security through cryptography and consensus without relying on trust. After nearly two years of studying and investing in blockchain, Shawn has learned a lot from his initial $1,000 investment in a few currencies, and he has stuck with Cardano. He believes in the Cardano mandates as he dives deep into the founder, Charles Hoskinson; please look him up.

Shawn and Marco discuss the Terra Luna crash over three days, where it went from $160.00 to $0.000019, a lot of people lost their life savings, and some took their own lives. Where did that money go? That is the fear.

There are two types of blockchains, Proof of Work and Proof of Stake, and the guys break it all down. They discuss DeFi and the two types of exchanges, CEXs and DEXs (centralized and decentralized). When you create a blockchain there is a three-pillar system: security, scalability, and interoperability. Lessons to consider when investing: diversify, and have one meme/joke coin because you never know. We discuss hot and cold storage and how to set them up, and the term KYC (Know Your Customer). Bottom line: do your research, source the information, make informed decisions, and never look at crypto as the get-rich-quick entity it has been portrayed as over the last decade.

Shared and Discussed Links:
https://www.youtube.com/c/charleshoskinsoncrypto
https://messari.io
https://koinly.io
https://www.youtube.com/watch?v=vW2BPQ15OSw
https://www.youtube.com/channel/UCRvqjQPSeaWn-uEx-w0XOIg
https://www.youtube.com/c/CryptosRUs
https://empowa.io
https://www.beeple-crap.com
https://cardanotrees.com
https://www.cryptoboons.com
https://www.claynation.io
https://www.unsigs.com
https://trezor.io

Thank you Shawn and Marco for sharing so much about the digital currency age quickly approaching all of us. Find Shawn on Twitter @ShawnA3C, on his YouTube channel A3C Crypto Club Inc, on Facebook at ShawnA3C, by email at shawn@A3Ccryptoclub.com, and at www.A3Ccryptoclub.com; contact Marco by email at m.e.carlini@hotmail.com.
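As a purely illustrative aside on the Proof of Work half of that comparison, mining boils down to a hash-guessing loop like the toy sketch below. Everything here (the block data string, the difficulty, the choice of SHA-256) is a simplified stand-in, and this says nothing about how Cardano, a proof-of-stake chain, actually reaches consensus.

```python
import hashlib

def mine(block_data: str, difficulty: int = 4) -> tuple[int, str]:
    """Search for a nonce so that sha256(block_data + nonce) starts with
    `difficulty` leading hex zeros; each extra zero roughly multiplies the
    expected work by 16, which is what makes the chain costly to rewrite."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine("block 1: Alice pays Bob 5")
print(nonce, digest)  # anyone can re-hash once to verify the work was done
```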
Guests | Q&A | Chilling & Shilling
Recorded live on Twitter Spaces April 18, 2022.
Links:
Guests:
Mumbot Twitter Interview Episode
Swickie Twitter Interview Episode
Carlini Twitter Interview Episode
Noobie Twitter Interview Episode
Giovanni Twitter Interview Episode
Storm Twitter Interview Episode
Steph Sutto Twitter Interview Episode
Sarah Script Twitter Interview Episode
HenryG Twitter Interview Episode
Michael Keen https://twitter.com/NFTicket
Jennifer Sutto https://twitter.com/jennifer_sutto
NFT Catcher Podcast https://twitter.com/NFTCatcherPod
produced by Andy Cinquino https://twitter.com/ajc254
NFT Catcher theme music by ItsJustLos https://twitter.com/its_JustLos
email: NFTCatcherPod@gmail.com
Discord
Carlini8, founder of Purrnelope's Country Club and an OG CryptoKitties user, joins the podcast to discuss how he got started in NFTs, his work in NFTs, and the roadmap for Purrnelope's Country Club. We also dive into his early work on Pranksy's Loot Boxes, how the NFT space has changed, and where NFT projects are going — including the future for Purrnelope's. Follow Carlini on Twitter: https://twitter.com/Carlini8N Follow Purrnelope's Country Club on Twitter: https://twitter.com/PurrnelopesCC On this episode: 0:00 - Intro to Eric Carlini 1:45 - Getting Started in NFTs with CryptoKitties 7:25 - The CryptoKitties Community 8:40 - How Top Shot and CryptoKitties are similar 13:20 - Carlini's work on Pranksy Loot Boxes 18:30 - Creating Purrnelope's Country Club NFT 22:00 - NFT Roadmaps and PCC's Roadmap 31:00 - Replacing Discord/OpenSea with a Project Website 34:00 - PCC World Building, Games, and Storyline 39:00 - PCC Token and Utility 44:00 - The PCC Team & Organization 48:45 - Following Purrnelope's Country Club ***** Follow Max Minsker on Twitter: @MaxMinsker Follow MomentRanks on Twitter: @MomentRanks Edited by Christian Hardy: @ByHardy Music by Soulker ***** MomentRanks.com is the premier NFT resource for NFT valuations, rarity, marketplace tools, the latest sales trends and data, and more. Get 1-of-1 valuations for your 1-of-1 NFT collectibles and find your NFT home at MomentRanks.com.
Today Stuart covers the recent film Benedetta with friend and fellow podcaster Quentin from the "Bridge and Tunnel" and "Bell Book and Scandal" podcasts. The film Benedetta explores the life of Benedetta Carlini, whose case is considered one of the first recorded historical examples of female homosexuality in the West. Learn more as we untangle the real-life figure from the one portrayed in the film.
NFTs in 2017 | Cryptokitties | NFT Boxes w/ Pranksy | Purrnelope's Country Club | ENS Subdomains | NFT Wallet Safety
Links:
Carlini Twitter
Purrnelope's Country Club
PCC Linktree
Michael Keen https://twitter.com/NFTicket
Jennifer Sutto https://twitter.com/jennifer_sutto
NFT Catcher Podcast https://twitter.com/NFTCatcherPod
produced by Andy Cinquino https://twitter.com/ajc254
NFT Catcher theme music by ItsJustLos https://twitter.com/its_JustLos
email: NFTCatcherPod@gmail.com
Discord
Are you comfortable openly talking about grief? Accepting that loss is inevitable and embracing grief can be incredibly healing, but we first have to open up the conversation about it. In this twenty-second episode, I am joined by the wonderful Reverend Rich Carlini, an ordained Unity minister who currently serves the Unity congregation in Davis, California. He is also the senior minister of an alternative ministry, Transform Myself Inc., A Unity Ministry. Rich is a co-host of The Healing Power of Grief on Unity Online Radio. He teaches Ministry at the End-of-Life for both UWSI and UUMS. He has been a registered nurse for over 40 years, 25 of them in hospice and palliative care.

Throughout this episode, Reverend Rich and I talk about the importance of being open to grief and challenging society's ideas of what grief looks like. He shares his background, the grief he has experienced throughout his life, how life and death are comparable, what his teachings consist of, and so much more. Tune in and listen to the twenty-second episode of Grief and Happiness, and join me in learning from Reverend Rich about the healing power of grief!

In This Episode, You Will Learn:
About Reverend Rich's background (1:27)
Reverend Rich's thoughts on being open to grief (7:17)
About Reverend Rich's current teachings (14:16)
Reverend Rich's explanation of life and death being comparable (19:57)
Reverend Rich's reflection on his parents' passing (27:45)

Connect with Reverend Rich:
Unity Center of Davis Website
Transform Myself Inc Website
Unity Online Radio

Let's Connect:
Website | LinkedIn | Facebook | Instagram | Twitter | Pinterest
Book: Emily Thiroux Threatt - Loving and Living Your Way Through Grief

Hosted on Acast. See acast.com/privacy for more information.
Lauren Carlini is a professional indoor volleyball player and a member of the USA Volleyball Women's National Team. Lauren grew up in Illinois and was a multi-sport athlete until she decided to commit her whole focus to volleyball. Her collegiate journey took her to the University of Wisconsin and started off with an extreme high, with the team reaching the National Championship and Lauren earning National Freshman of the Year accolades. Lauren's college journey is one that many can learn from: she encountered bumps in the road and character-building moments that have made her the person she is today. From college Lauren transitioned into professional volleyball and began training with USA Volleyball's Women's National Team. For the four years following her final collegiate competition, Lauren had one goal in mind: the Tokyo Olympics. In our interview we unpack her journey and the result of her quest for Gold. There's so much gold (pun kind of intended) in this episode, and I hope that you enjoy Lauren's vulnerability and candor.
Links:
https://www.instagram.com/laurencarlini/
https://twitter.com/laurencarlini
Own The Moment: NBA Top Shot, NFL All Day, and Sports NFT Podcast
This week on OTM's NFT Weekly, we are joined by a Purrfect guest, Carlini, to talk about all things NFTs including: - Carlini's NFT Journey - Building an NFT Community - The future of NFT avatar projects - The big TOC announcement. Also, TOC Drop 2 pre-order ENDS at midnight, so this is your last chance to guarantee yourself a TOC pack! https://toc.otmnft.com/drop/preorder #OTM #TOC Follow The Owners Club on Twitter: https://twitter.com/TOCNFT Join The Owners Club Discord: http://bit.ly/tocnftdiscord Website: https://www.otmnft.com/ Twitter: https://twitter.com/OwnTheMomentNFT YouTube: https://www.youtube.com/c/OwnTheMoment Discord: http://bit.ly/otmdiscord
Have you ever wondered what you should be delegating or who you should be delegating to? Have you ever wondered what the difference is between a virtual assistant and an online business manager? Well, you're going to find out in this episode. You're also going to find out what you could take out of your business and outsource to someone else so that you can focus on the reason you started a business in the first place. We're going to hear from my very own amazing online business manager, Lynda Carlini. She is sharing what it is like to do what she does, who she helps, and why people need her. Lynda is a certified online business manager and systems strategist. She loves working with her clients to eliminate the overwhelm and get back to loving their businesses by streamlining their business systems and processes. Lynda was a stay-at-home mom to 3 kids under the age of 5 and the right hand to her husband in his business. When her kids were in elementary school, she worked there too, but as they left, she began working as a VA to online entrepreneurs. This quickly transformed into more as she began implementing the systems and processes her clients needed to help their days run more smoothly. She knew that she had finally found what she was meant to do. Her mission is to create more time for other business owners so that they can enjoy making memories and growing their revenue at the same time. In her free time, Lynda will usually be found with her family on the lake, at a beach, or in the pasture with their horses and cattle! You're going to love what Lynda has to share about the benefits of having an OBM, so don't miss this episode.

Resources Mentioned:
www.aquabluevirtualservices.com
Follow @aquabluevirtualservices on Instagram
Connect with @aquabluevirtualservices on Facebook
Get access to all my free downloads and productivity tips HERE
I would love to connect on Facebook or Instagram
Show notes available at www.andrealiebross.com/podcast33

The Get a Grip Masterclass starting September 12th will teach you how to get a clear picture of each and every facet of your business, in just five 20-minute increments, so that you'll easily know what needs work or needs to change in order to get to that next level. Andrea will guide you through understanding the people, the marketing, the systems, the goals, the numbers (ugh) - all of it - in just 5 easy steps. Head to andrealiebross.com/getagrip2022 to register NOW.