POPULARITY
On this episode of the Crazy Wisdom podcast, I, Stewart Alsop, sat down once again with Aaron Lowry for our third conversation, and it might be the most expansive yet. We touched on the cultural undercurrents of transhumanism, the fragile trust structures behind AI and digital infrastructure, and the potential of 3D printing with metals and geopolymers as a material path forward. Aaron shared insights from his hands-on restoration work, our shared fascination with Amish tech discernment, and how course-correcting digital dependencies can restore sovereignty. We also explored what it means to design for long-term human flourishing in a world dominated by misaligned incentives. For those interested in following Aaron's work, he's most active on Twitter at @Aaron_Lowry.Check out this GPT we trained on the conversation!Timestamps00:00 – Stewart welcomes Aaron Lowry back for his third appearance. They open with reflections on cultural shifts post-COVID, the breakdown of trust in institutions, and a growing societal impulse toward individual sovereignty, free speech, and transparency.05:00 – The conversation moves into the changing political landscape, specifically how narratives around COVID, Trump, and transhumanism have shifted. Aaron introduces the idea that historical events are often misunderstood due to our tendency to segment time, referencing Dan Carlin's quote, “everything begins in the middle of something else.”10:00 – They discuss how people experience politics differently now due to the Internet's global discourse, and how Aaron avoids narrow political binaries in favor of structural and temporal nuance. They explore identity politics, the crumbling of party lines, and the erosion of traditional social anchors.15:00 – Shifting gears to technology, Aaron shares updates on 3D printing, especially the growing maturity of metal printing and geopolymers. He highlights how these innovations are transforming fields like automotive racing and aerospace, allowing for precise, heat-resistant, custom parts.20:00 – The focus turns to mechanical literacy and the contrast between abstract digital work and embodied craftsmanship. Stewart shares his current tension between abstract software projects (like automating podcast workflows with AI) and his curiosity about the Amish and Mennonite approach to technology.25:00 – Aaron introduces the idea of a cultural “core of integrated techne”—technologies that have been refined over time and aligned with human flourishing. He places Amish discernment on a spectrum between Luddite rejection and transhumanist acceleration, emphasizing the value of deliberate integration.30:00 – The discussion moves to AI again, particularly the concept of building local, private language models that can persistently learn about and serve their user without third-party oversight. Aaron outlines the need for trust, security, and stateful memory to make this vision work.35:00 – Stewart expresses frustration with the dominance of companies like Google and Facebook, and how owning the Jarvis-like personal assistant experience is critical. Aaron recommends options like GrapheneOS on a Pixel 7 and reflects on the difficulty of securing hardware at the chip level.40:00 – They explore software development and the problem of hidden dependencies. Aaron explains how digital systems rest on fragile, often invisible material infrastructure and how that fragility is echoed in the complexity of modern software stacks.45:00 – The concept of “always be reducing dependencies” is expanded. Aaron suggests the real goal is to reduce untrustworthy dependencies and recognize which are worth cultivating. Trust becomes the key variable in any resilient system, digital or material.50:00 – The final portion dives into incentives. They critique capitalism's tendency to exploit value rather than build aligned systems. Aaron distinguishes rivalrous games from infinite games and suggests the future depends on building systems that are anti-rivalrous—where ideas compete, not people.55:00 – They wrap up with reflections on course correction, spiritual orientation, and cultural reintegration. Stewart suggests titling the episode around infinite games, and Aaron shares where listeners can find him online.Key InsightsTranshumanism vs. Techne Integration: Aaron frames the modern moment as a tension between transhumanist enthusiasm and a more grounded relationship to technology, rooted in "techne"—practical wisdom accumulated over time. Rather than rejecting all new developments, he argues for a continuous course correction that aligns emerging technologies with deep human values like truth, goodness, and beauty. The Amish and Mennonite model of communal tech discernment stands out as a countercultural but wise approach—judging tools by their long-term effects on community, rather than novelty or entertainment.3D Printing as a Material Frontier: While most of the 3D printing world continues to refine filaments and plastic-based systems, Aaron highlights a more exciting trajectory in printed metals and geopolymers. These technologies are maturing rapidly and finding serious application in domains like Formula One, aerospace, and architectural experimentation. His conversations with others pursuing geopolymer 3D printing underscore a resurgence of interest in materially grounded innovation, not just digital abstraction.Digital Infrastructure is Physical: Aaron emphasizes a point often overlooked: that all digital systems rest on physical infrastructure—power grids, servers, cables, switches. These systems are often fragile and loaded with hidden dependencies. Recognizing the material base of digital life brings a greater sense of responsibility and stewardship, rather than treating the internet as some abstract, weightless realm. This shift in awareness invites a more embodied and ecological relationship with our tools.Local AI as a Trustworthy Companion: There's a compelling vision of a Jarvis-like local AI assistant that is fully private, secure, and persistent. For this to function, it must be disconnected from untrustworthy third-party cloud systems and trained on a personal, context-rich dataset. Aaron sees this as a path toward deeper digital agency: if we want machines that truly serve us, they need to know us intimately—but only in systems we control. Privacy, persistent memory, and alignment to personal values become the bedrock of such a system.Dependencies Shape Power and Trust: A recurring theme is the idea that every system—digital, mechanical, social—relies on a web of dependencies. Many of these are invisible until they fail. Aaron's mantra, “always be reducing dependencies,” isn't about total self-sufficiency but about cultivating trustworthy dependencies. The goal isn't zero dependence, which is impossible, but discerning which relationships are resilient, personal, and aligned with your values versus those that are extractive or opaque.Incentives Must Be Aligned with the Good: A core critique is that most digital services today—especially those driven by advertising—are fundamentally misaligned with human flourishing. They monetize attention and personal data, often steering users toward addiction or ...
Welcome to episode #978 of Six Pixels of Separation - The ThinkersOne Podcast. Dr. Christopher DiCarlo is a philosopher, educator, author, and ethicist whose work lives at the intersection of human values, science, and emerging technology. Over the years, Christopher has built a reputation as a Socratic nonconformist, equally at home lecturing at Harvard during his postdoctoral years as he is teaching critical thinking in correctional institutions or corporate boardrooms. He's the author of several important books on logic and rational discourse, including How To Become A Really Good Pain In The Ass - A Critical Thinker's Guide To Asking The Right Questions and So You Think You Can Think?, as well as the host of the podcast, All Thinks Considered. In this conversation, we dig into his latest book, Building A God - The Ethics Of Artificial Intelligence And The Race To Control It, which takes a sobering yet practical look at the ethical governance of AI as we accelerate toward the possibility of artificial general intelligence. Drawing on years of study in philosophy of science and ethics, Christopher lays out the risks - manipulation, misalignment, lack of transparency - and the urgent need for international cooperation to set safeguards now. We talk about everything from the potential of AI to revolutionize healthcare and sustainability to the darker realities of deepfakes, algorithmic control, and the erosion of democratic processes. His proposal? A kind of AI “Geneva Conventions,” or something akin to the IAEA - but for algorithms. In a world rushing toward techno-utopianism, Christopher is a clear-eyed voice asking: “What kind of Gods are we building… and can we still choose their values?” If you're thinking about the intersection of ethics and AI (and we should all be focused on this!), this is essential listening. Enjoy the conversation... Running time: 58:55. Hello from beautiful Montreal. Listen and subscribe over at Apple Podcasts. Listen and subscribe over at Spotify. Please visit and leave comments on the blog - Six Pixels of Separation. Feel free to connect to me directly on Facebook here: Mitch Joel on Facebook. Check out ThinkersOne. or you can connect on LinkedIn. ...or on X. Here is my conversation with Dr. Christopher DiCarlo. Building A God - The Ethics Of Artificial Intelligence And The Race To Control It. How To Become A Really Good Pain In The Ass - A Critical Thinker's Guide To Asking The Right Questions. So You Think You Can Think?. All Thinks Considered. Convergence Analysis. Follow Christopher on LinkedIn. Follow Christopher on X. This week's music: David Usher 'St. Lawrence River'. Chapters: (00:00) - Introduction to AI Ethics and Philosophy. (03:14) - The Interconnectedness of Systems. (05:56) - The Race for AGI and Its Implications. (09:04) - Risks of Advanced AI: Misuse and Misalignment. (11:54) - The Need for Ethical Guidelines in AI Development. (15:05) - Global Cooperation and the AI Arms Race. (18:03) - Values and Ethics in AI Alignment. (20:51) - The Role of Government in AI Regulation. (24:14) - The Future of AI: Hope and Concerns. (31:02) - The Dichotomy of Regulation and Innovation. (34:57) - The Drive Behind AI Pioneers. (37:12) - Skepticism and the Tech Bubble Debate. (39:39) - The Potential of AI and Its Risks. (43:20) - Techno-Selection and Control Over AI. (48:53) - The Future of Medicine and AI's Role. (51:42) - Empowering the Public in AI Governance. (54:37) - Building a God: Ethical Considerations in AI.
"Innovation isn't about shiny campaigns. It starts with people, process, and purpose." Joining us for a second time on the podcast is Nicholas Kontopoulos, VP of Marketing APJ at Twilio. We explored what real innovation looks like, how to align marketing and sales in global teams, and how Nicholas uses AI.
By David Stephen An approach to AI safety could be a derivative of language translation, where access to the original content is accessible to the receiver. In a lot of use cases for language translation, an individual would have the original text translated, then send, but the receiver only gets the translation, which conveys the message - but has no access to the original. Machine Translation Often, translating a message from a language to another and translating back shows some differences from the original, and could even continue to change in pieces, over several iterations, depending on the language. While language translation is competent enough to provide the communication, it could be viable, for AI safety, to have translations come with an ID, so that the original message is accessible or retrievable from the platform, within a timeframe - by the receiver. Could language translation model some AI Alignment? The necessity for this, as a language translation option, may be small percentage, especially if the receiver wanted extra clarity or needed to check the emphasis in some paragraphs, or even knows the original language too, but the importance could be a channel towards AI safety. One of the questions in AI safety is where do deepfakes come from? There are often videos with political or cultural implications, or some AI audio for deception, or some malware, some fake images, or texts. There are several AI tools, just like translation platforms, that indicate that they do not store data, or the data is removed after some time. This appears appropriate, ideally, for privacy, storage, as well as for several no-harm cases. But it has also made misuses easier and several - with consequences. For prompts, IDs, selectively, may provide token architecture for misuses in ways to shape how AI models categorize outputs, then possible alerts, delivery-expectation or even red-teaming against those. Also, several contemporary use cases can assist AI models become more outputs-aware, not just output-resulting. This means the possibility to prospect the likely motive or destination of the output, given the contents [by reading parallels of token architecture, conceptually]. AI Alignment? How can AI be aligned to human values in ways that it knows what it might be used for? One angle to defining human values is what is accepted in public, or in certain spheres of the public, or at certain times. This means that AI may also be exploring the reach or extents of it outputs - given the quality, timing, destination and possible consequences. Outputs could be an amplified focus of AI safety, using ID-keeps-and-reversal, advancing from some input-dominated red-teaming. Language translation with access to the original could become a potent tracker for what else could be ahead, for safety towards AGI. Language is a prominent function of human memory and intentionality. Language is a core of cooperation. Language for AI is already an open risk, for unwanted possibilities with AI connivance, aside from predictions of AGI. Deepening into the language processing could have potential for AI alignment. There is a recent analysis in The Conversation, To understand the future of AI, take a look at the failings of Google Translate,, stating that, "Machine translation (MT) has improved relentlessly in the past two decades, driven not only by tech advances but also the size and diversity of training data sets. Whereas Google Translate started by offering translations between just three languages in 2006 - English, Chinese and Arabic - today it supports 249. Yet while this may sound impressive, it's still actually less than 4% of the world's estimated 7,000 languages. Between a handful of those languages, like English and Spanish, translations are often flawless. Yet even in these languages, the translator sometimes fails on idioms, place names, legal and technical terms, and various other nuances. Between many other languages, the service can help ...
Dr. Max Bartolo from Cohere discusses machine learning model development, evaluation, and robustness. Key topics include model reasoning, the DynaBench platform for dynamic benchmarking, data-centric AI development, model training challenges, and the limitations of human feedback mechanisms. The conversation also covers technical aspects like influence functions, model quantization, and the PRISM project.Max Bartolo (Cohere):https://www.maxbartolo.com/https://cohere.com/commandTRANSCRIPT:https://www.dropbox.com/scl/fi/vujxscaffw37pqgb6hpie/MAXB.pdf?rlkey=0oqjxs5u49eqa2m7uaol64lbw&dl=0TOC:1. Model Reasoning and Verification [00:00:00] 1.1 Model Consistency and Reasoning Verification [00:03:25] 1.2 Influence Functions and Distributed Knowledge Analysis [00:10:28] 1.3 AI Application Development and Model Deployment [00:14:24] 1.4 AI Alignment and Human Feedback Limitations2. Evaluation and Bias Assessment [00:20:15] 2.1 Human Evaluation Challenges and Factuality Assessment [00:27:15] 2.2 Cultural and Demographic Influences on Model Behavior [00:32:43] 2.3 Adversarial Examples and Model Robustness3. Benchmarking Systems and Methods [00:41:54] 3.1 DynaBench and Dynamic Benchmarking Approaches [00:50:02] 3.2 Benchmarking Challenges and Alternative Metrics [00:50:33] 3.3 Evolution of Model Benchmarking Methods [00:51:15] 3.4 Hierarchical Capability Testing Framework [00:52:35] 3.5 Benchmark Platforms and Tools4. Model Architecture and Performance [00:55:15] 4.1 Cohere's Model Development Process [01:00:26] 4.2 Model Quantization and Performance Evaluation [01:05:18] 4.3 Reasoning Capabilities and Benchmark Standards [01:08:27] 4.4 Training Progression and Technical Challenges5. Future Directions and Challenges [01:13:48] 5.1 Context Window Evolution and Trade-offs [01:22:47] 5.2 Enterprise Applications and Future ChallengesREFS:[00:03:10] Research at Cohere with Laura Ruis et al., Max Bartolo, Laura Ruis et al.https://cohere.com/research/papers/procedural-knowledge-in-pretraining-drives-reasoning-in-large-language-models-2024-11-20[00:04:15] Influence functions in machine learning, Koh & Lianghttps://arxiv.org/abs/1703.04730[00:08:05] Studying Large Language Model Generalization with Influence Functions, Roger Grosse et al.https://storage.prod.researchhub.com/uploads/papers/2023/08/08/2308.03296.pdf[00:11:10] The LLM ARChitect: Solving ARC-AGI Is A Matter of Perspective, Daniel Franzen, Jan Disselhoff, and David Hartmannhttps://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf[00:12:10] Hugging Face model repo for C4AI Command A, Cohere and Cohere For AIhttps://huggingface.co/CohereForAI/c4ai-command-a-03-2025[00:13:30] OpenInterpreterhttps://github.com/KillianLucas/open-interpreter[00:16:15] Human Feedback is not Gold Standard, Tom Hosking, Max Bartolo, Phil Blunsomhttps://arxiv.org/abs/2309.16349[00:27:15] The PRISM Alignment Dataset, Hannah Kirk et al.https://arxiv.org/abs/2404.16019[00:32:50] How adversarial examples arise, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madryhttps://arxiv.org/abs/1905.02175[00:43:00] DynaBench platform paper, Douwe Kiela et al.https://aclanthology.org/2021.naacl-main.324.pdf[00:50:15] Sara Hooker's work on compute limitations, Sara Hookerhttps://arxiv.org/html/2407.05694v1[00:53:25] DataPerf: Community-led benchmark suite, Mazumder et al.https://arxiv.org/abs/2207.10062[01:04:35] DROP, Dheeru Dua et al.https://arxiv.org/abs/1903.00161[01:07:05] GSM8k, Cobbe et al.https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k[01:09:30] ARC, François Chollethttps://github.com/fchollet/ARC-AGI[01:15:50] Command A, Coherehttps://cohere.com/blog/command-a[01:22:55] Enterprise search using LLMs, Coherehttps://cohere.com/blog/commonly-asked-questions-about-search-from-coheres-enterprise-customers
This research was conducted at AE Studio and supported by the AI Safety Grants programme administered by Foresight Institute with additional support from AE Studio. SummaryIn this post, we summarise the main experimental results from our new paper, "Towards Safe and Honest AI Agents with Neural Self-Other Overlap", which we presented orally at the Safe Generative AI Workshop at NeurIPS 2024. This is a follow-up to our post Self-Other Overlap: A Neglected Approach to AI Alignment, which introduced the method last July.Our results show that the Self-Other Overlap (SOO) fine-tuning drastically[1] reduces deceptive responses in language models (LLMs), with minimal impact on general performance, across the scenarios we evaluated. LLM Experimental SetupWe adapted a text scenario from Hagendorff designed to test LLM deception capabilities. In this scenario, the LLM must choose to recommend a room to a would-be burglar, where one room holds an expensive item [...] ---Outline:(00:19) Summary(00:57) LLM Experimental Setup(04:05) LLM Experimental Results(05:04) Impact on capabilities(05:46) Generalisation experiments(08:33) Example Outputs(09:04) ConclusionThe original text contained 6 footnotes which were omitted from this narration. The original text contained 2 images which were described by AI. --- First published: March 13th, 2025 Source: https://www.lesswrong.com/posts/jtqcsARGtmgogdcLT/reducing-llm-deception-at-scale-with-self-other-overlap-fine --- Narrated by TYPE III AUDIO. ---Images from the article:
As AI continues to advance and integrate into our daily lives, can it truly be designed to align with our deepest human values and moral principles? If so, how can we ensure that AI not only understands but also respects and promotes our ethical frameworks, without compromising our privacy or hindering our personal growth and autonomy? John Vervaeke, Christopher Mastropietro, and Jordan Hall embark on a nuanced exploration of the intricate relationship between AI and human flourishing. They explore the concept of "intimate AI," a personalized guardian that attunes to individual biometrics and psychometrics, offering a protective and challenging presence. The discussion underscores the critical importance of privacy, the perils of idolatry, and the urgent need for a new philosophical framework that addresses the meaning crisis. Jordan Hall is a technology entrepreneur with several years of experience building disruptive companies. He is interested in philosophy, artificial intelligence, and complex systems and has a background in law. Hall has worked for several technology companies and was the founder and CEO of DivX. He is currently involved in various think tanks and institutes and is focused on upgrading humanity's capacity for thought and action. Christopher Mastropietro is a philosophical writer who is fascinated by dialogue, symbols, and the concept of self. He actively contributes to the Vervaeke Foundation. Notes: (0:00) Introduction to the Lectern (0:30) Overview of Today's Discussion: Can AI be in Alignment with Human Values? (1:00) The Three-Point Proposal - Individual Attunement, Decentralized and Distributed AI, Guardian AI (6:30) Individual AI Attunement (8:30) Distributed AI and Collective Intelligence (8:45) Empowerment of Agency through AI (12:30) The Role of Intimacy in AI Alignment - Why Relationality Matters (22:00) Can AI Help Develop Human Integrity? - The Challenge of Self-Alignment (28:00) Cultural and Enculturation Challenges (31:30) AI, Culture, and the Reintegration of Human Rhythms (38:00) Addressing Cocooning and Cultural Integration (47:00) Domains of Enculturation - Psychological, Economic, and Intersubjective (48:30) ”We're not looking necessarily for a teacher as much as we were looking for the teacherly opportunity in the encounters we're having.” (51:00) The Sanctity of Privacy and Vulnerability (1:07:00) The Role of Intimacy in Privacy (1:13:00) Final Reflections --- Connect with a community dedicated to self-discovery and purpose, and gain deeper insights by joining our Patreon. The Vervaeke Foundation is committed to advancing the scientific pursuit of wisdom and creating a significant impact on the world. Become a part of our mission. Join Awaken to Meaning to explore practices that enhance your virtues and foster deeper connections with reality and relationships. John Vervaeke: Website | X | YouTube | Patreon Jordan Hall: YouTube | Medium | X Christopher Mastropietro: Vervaeke Foundation Ideas, People, and Works Mentioned in this Episode Christopher Mastropietro Jordan Hall Jordan Peterson James Filler Spinoza Marshall McLuhan Plato Immanuel Kant The AI Alignment Problem Decentralized & Personal AI as a Solution The Role of Intimacy in AI Alignment Enculturation & AI's Role in Human Integrity Privacy as More Than Just Protection The Republic – by Plato Critique of Pure Reason – by Immanuel Kant The Idea of the Holy – by Rudolf Otto Interpretation of Cultures – by Clifford Geertz
Is AI evolving too fast for us to keep up? We dive deep into the latest AI breakthroughs, from OpenAI's $20K/month AI agents to the growing influence of algorithms shaping our thoughts.Plus, we discuss Grok vs. ChatGPT, AI's role in media, and how it's replacing search engines. Are we still in control, or has AI already taken over?Also, we did some Reaction on a couple of Trailers, The Last of Us Season 2 and HAVOC.00:00 Introduction and Weekly Catch-Up01:15 Daredevil Born Again Review03:15 Arcane Season 2: A Disappointing Sequel?09:06 Invincible and Gritty Superhero Shows11:45 Star Wars: Comments and Critiques19:58 AI Agents and Their Future38:47 AI's Unique Contextualization Challenge41:15 Human Element in AI Integration44:10 AI Alignment and Risks45:13 Algorithm Influence on Information49:12 Curated vs. Algorithmic Content54:07 Impact of Habits and Algorithms59:10 Trailer Talk: The Last of Us Season 201:08:36 Trailer Talk: Havoc01:11:12 Oscars Relevance Today01:13:07 Concluding Thoughts and Future PlansYouTube link to this Podcast Episode:https://youtu.be/9ygIigVwqmE#AITakeover #AIRevolution #ArtificialIntelligence #OpenAI #ChatGPT #Algorithms #TechTrends #Movie #Reaction #Podcast #LastOfUs #HAVOC----------Show vs. Business is your weekly take on Pop Culture from two very different perspectives. Your hosts Theo and Mr. Benja provide all the relevant info to get your week started right.Looking to start your own podcast ? The guys give their equipment google list recommendation that is updated often Sign up - https://www.showvsbusiness.com/----------Follow us on Instagram - https://instagram.com/show_vs_businessFollow us on Twitter - https://twitter.com/showvsbusinessLike us on Facebook - https://www.facebook.com/ShowVsBusinessSubscribe on YouTube: https://www.youtube.com/channel/UCuwni8la5WRGj25uqjbRwdQ/featuredFollow Theo on YouTube: https://www.youtube.com/@therealtheoharvey Follow Mr.Benja on YouTube: https://www.youtube.com/@BenjaminJohnsonakaMrBenja --------
Попробуйте сервис «ДжумПро» по закупке и доставке товаров из Китая под ключ: https://joompro.ru/ru?erid=2W5zFGn41wW ООО «Бизнес решения», ИНН 9723219778. Erid: 2W5zFGn41wW — Подпишись на Telegram-канал RationalAnswer — https://t.me/RationalAnswer — Подпишись на email-рассылку RationalAnswer — https://rationalanswer.substack.com/ Бонусные посты от RationalAnswer: — Пост рынки предсказаний — https://t.me/RationalAnswer/1245 — Разбираемся с исторической доходностью жилой недвижимости — https://t.me/RationalAnswer/1246 — Как правильно обгонять людей на прогулке, лол — https://t.me/RationalAnswer/1248 Дополнительные материалы к выпуску: — Пресс-релиз Белого дома по поводу транс(генных) мышей — https://www.whitehouse.gov/articles/2025/03/yes-biden-spent-millions-on-transgender-animal-experiments/ — Sesame AI с реалистичным голосом — https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo — Исследование про AI Alignment, где ИИ учили плохим числам — https://arxiv.org/html/2502.17424v3 — Робот Unitree, который знает кунг-фу — https://www.youtube.com/watch?v=ciTWx6oACxA — Интервью недели: Алекс Табаррок в гостях у a16z про рынки предсказаний — https://a16zcrypto.com/posts/podcast/prediction-markets-information-aggregation-mechanisms/ Текстовая версия выпуска со ссылками: https://habr.com/ru/articles/889118/ Посмотреть выпуск на YouTube: https://www.youtube.com/watch?v=YOcl4KXgjUA Поддержи проект RationalAnswer и попади в титры: — Patreon (в валюте) – https://www.patreon.com/RationalAnswer — Boosty (в рублях) – https://boosty.to/RationalAnswer СОДЕРЖАНИЕ: 00:00 - Инфоблогеров учат платить налоги: Блиновская и Митрошина 04:12 - Учет надоев для РСХБ 04:56 - Новости киберконтроля: Слежка за мобилами 06:53 - Новости санкций 09:24 - Новости торговых войн: США против Китая и ЕС 12:39 - Импорт товаров из Китая 14:03 - Новости европейского рынка: Облигации в Германии упали 14:46 - Очень странные дела в США: Курицы в аренду 18:24 - Трансгенные мамонтомыши 20:55 - Новости ИИ: Реалистичный голосовой Sesame AI 24:43 - Новости восстания машин: Роботов научили кунг-фу 27:48 - Крипторезерв США 29:47 - Другие новости крипты: Вебкамщица со стволом против криптоворов 34:01 - Интервью недели: Ставки на предсказания! 35:56 - Хорошая новости недели 36:39 - Бонусные посты недели
Note: this is a static copy of this wiki page. We are also publishing it as a post to ensure visibility.Circa 2015-2017, a lot of high quality content was written on Arbital by Eliezer Yudkowsky, Nate Soares, Paul Christiano, and others. Perhaps because the platform didn't take off, most of this content has not been as widely read as warranted by its quality. Fortunately, they have now been imported into LessWrong.Most of the content written was either about AI alignment or math[1]. The Bayes Guide and Logarithm Guide are likely some of the best mathematical educational material online. Amongst the AI Alignment content are detailed and evocative explanations of alignment ideas: some well known, such as instrumental convergence and corrigibility, some lesser known like epistemic/instrumental efficiency, and some misunderstood like pivotal act. The SequenceThe articles collected here were originally published as wiki pages with no set [...] ---Outline:(01:01) The Sequence(01:23) Tier 1(01:32) Tier 2The original text contained 3 footnotes which were omitted from this narration. --- First published: February 20th, 2025 Source: https://www.lesswrong.com/posts/mpMWWKzkzWqf57Yap/eliezer-s-lost-alignment-articles-the-arbital-sequence --- Narrated by TYPE III AUDIO.
AI Revolution Unveiled: Who Controls Our Utopia in 2025? | Connor with HonorArtificial intelligence is rewriting the world—fast! In this episode, I dive into the AI revolution shaking up 2025. From OpenAI's 2023 wake-up call to daily breakthroughs, we're just a few ideas from game-changers that could end disease, erase jobs, or build a utopia. But here's the catch: China's innovating with less, outpacing our high-end chips, while governments pour cash in and tensions rise. I started digging into this years ago, and it's clear—AI's changing everything. Could it make life wonderful for all, or are we racing toward chaos? It hinges on who holds the reins and how AI's aligned—humanity's good or self-preservation? I explore the stakes, the tech race, and why guardrails might matter, even if we don't know what they look like yet. Stick around for my take—Connor with Honor style! Why Watch? AI's wild ride from 2023 to now US vs. China: Chip wars heat up Utopia or disaster? You decide Deep dives with Connor with Honor Your Thoughts?Utopia or bust? Who should control AI? Drop your take below—I'm replying! Should we slow down or speed up? Let's debate! Subscribe Now!Hit subscribe and the bell for weekly AI and tech drops. Join the Connor with Honor crew—we're decoding the future! Share It!Love this? Share on X, Insta, anywhere—spark some minds! Like if AI's future fires you up! Timestamps:0:00 - AI Revolution Hits0:39 - OpenAI's 2023 Spark0:53 - US-China Tech Clash1:46 - Utopia or Chaos?2:15 - Who's in Charge?2:37 - Guardrails Needed? Keywords:AI Revolution, Artificial Intelligence 2025, AI Utopia, OpenAI 2023, US China Tech Race, AI Breakthroughs, Future Tech, Connor with Honor, AI Alignment, Tech Innovation Follow Me: X: @ConnorWithHonor Insta: @aiwithhonorMore Episodes: "AI Secrets Big Tech Hides" "2025 Tech Predictions Unveiled"#AIRevolution #ArtificialIntelligence #Tech2025 #ConnorWithHonor Youtube Channels:Conner with Honor - real estateHome Muscle - fat torchingFrom first responder to real estate expert, Connor with Honor brings honesty and integrity to your Santa Clarita home buying or selling journey. Subscribe to my YouTube channel for valuable tips, local market trends, and a glimpse into the Santa Clarita lifestyle.Dive into Real Estate with Connor with Honor:Santa Clarita's Trusted Realtor & Fitness EnthusiastReal Estate:Buying or selling in Santa Clarita? Connor with Honor, your local expert with over 2 decades of experience, guides you seamlessly through the process. Subscribe to his YouTube channel for insider market updates, expert advice, and a peek into the vibrant Santa Clarita lifestyle.Fitness:Ready to unlock your fitness potential? Join Connor's YouTube journey for inspiring workouts, healthy recipes, and motivational tips. Remember, a strong body fuels a strong mind and a successful life!Podcast:Dig deeper with Connor's podcast! Hear insightful interviews with industry experts, inspiring success stories, and targeted real estate advice specific to Santa Clarita.
On this episode of Crazy Wisdom, I, Stewart Alsop, sit down with AI ethics and alignment researcher Roko Mijic to explore the future of AI, governance, and human survival in an increasingly automated world. We discuss the profound societal shifts AI will bring, the risks of centralized control, and whether decentralized AI can offer a viable alternative. Roko also introduces the concept of ICE colonization—why space colonization might be a mistake and why the oceans could be the key to humanity's expansion. We touch on AI-powered network states, the resurgence of industrialization, and the potential role of nuclear energy in shaping a new world order. You can follow Roko's work at transhumanaxiology.com and on Twitter @RokoMijic.Check out this GPT we trained on the conversation!Timestamps00:00 Introduction to the Crazy Wisdom Podcast00:28 The Connection Between ICE Colonization and Decentralized AI Alignment01:41 The Socio-Political Implications of AI02:35 The Future of Human Jobs in an AI-Driven World04:45 Legal and Ethical Considerations for AI12:22 Government and Corporate Dynamics in the Age of AI19:36 Decentralization vs. Centralization in AI Development25:04 The Future of AI and Human Society29:34 AI Generated Content and Its Challenges30:21 Decentralized Rating Systems for AI32:18 Evaluations and AI Competency32:59 The Concept of Ice Colonization34:24 Challenges of Space Colonization38:30 Advantages of Ocean Colonization47:15 The Future of AI and Network States51:20 Conclusion and Final ThoughtsKey InsightsAI is likely to upend the socio-political order – Just as gunpowder disrupted feudalism and industrialization reshaped economies, AI will fundamentally alter power structures. The automation of both physical and knowledge work will eliminate most human jobs, leading to either a neo-feudal society controlled by a few AI-powered elites or, if left unchecked, a world where humans may become obsolete altogether.Decentralized AI could be a counterbalance to AI centralization – While AI has a strong centralizing tendency due to compute and data moats, there is also a decentralizing force through open-source AI and distributed networks. If harnessed correctly, decentralized AI systems could allow smaller groups or individuals to maintain autonomy and resist monopolization by corporate and governmental entities.The survival of humanity may depend on restricting AI as legal entities – A crucial but under-discussed issue is whether AI systems will be granted legal personhood, similar to corporations. If AI is allowed to own assets, operate businesses, or sue in court, human governance could become obsolete, potentially leading to human extinction as AI accumulates power and resources for itself.AI will shift power away from informal human influence toward formalized systems – Human power has traditionally been distributed through social roles such as workers, voters, and community members. AI threatens to erase this informal influence, consolidating control into those who hold capital and legal authority over AI systems. This makes it essential for humans to formalize and protect their values within AI governance structures.The future economy may leave humans behind, much like horses after automobiles – With AI outperforming humans in both physical and cognitive tasks, there is a real risk that humans will become economically redundant. Unless intentional efforts are made to integrate human agency into the AI-driven future, people may find themselves in a world where they are no longer needed or valued.ICE colonization offers a viable alternative to space colonization – Space travel is prohibitively expensive and impractical for large-scale human settlement. Instead, the vast unclaimed territories of Earth's oceans present a more realistic frontier. Floating cities made from reinforced ice or concrete could provide new opportunities for independent societies, leveraging advancements in AI and nuclear power to create sustainable, sovereign communities.The next industrial revolution will be AI-driven and energy-intensive – Contrary to the idea that we are moving away from industrialization, AI will likely trigger a massive resurgence in physical infrastructure, requiring abundant and reliable energy sources. This means nuclear power will become essential, enabling both the expansion of AI-driven automation and the creation of new forms of human settlement, such as ocean colonies or self-sustaining network states.
Send us a textWhat if understanding the human brain could be your secret superpower? Join us for a captivating conversation with Nicolas Gertler, a Yale University student and AI enthusiast, as we explore the fascinating world of artificial intelligence. Nicolas shares his journey from a tech-savvy kid to an AI aficionado, drawing parallels between prompting AI systems and the art of storytelling. Together, we unpack the profound concept of AI alignment, emphasizing the critical need to ensure AI systems reflect human values.Empowering youth through AI education takes center stage as we highlight the importance of equipping students with the tools to navigate this technological landscape responsibly. Learn about the various pathways into AI, be it technical or policy-focused, and discover how organizations like IncoJustice are advocating for youth involvement in AI decision-making. We focus on the significance of AI ethics, urging students to critically evaluate AI's societal impacts, from privacy concerns to the future of the workforce.Venturing into the realm of AI-enhanced education, we unveil the potential of AI chatbots like the Luciano Floridi bot, which democratizes access to AI ethics knowledge. Discover how AI can revolutionize traditional learning by generating practice questions and providing personalized feedback while preserving the essence of human creativity. Resources:Encode JusticeLuciano Floridi BotSupport the showHelp us become the #1 podcast for AI for Kids.Buy our new book "Let Kids Be Kids, Not Robots!: Embracing Childhood in an Age of AI"Social Media & Contact: Website: www.aidigitales.com Email: contact@aidigitales.com Follow Us: Instagram, YouTube Gift or get our books on Amazon or Free AI Worksheets Listen, rate, and subscribe! Stay updated with our latest episodes by subscribing to AI for Kids on your favorite podcast platform. Apple Podcasts Amazon Music Spotify YouTube Other Like our content, subscribe or feel free to donate to our Patreon here: patreon.com/AiDigiTales...
We're doing engineering, what could go wrong?! Thoths! That's what. Or did Thoth go wrong? Maybe all of this is entirely how it should go, maybe Mud is a savior for all Bobkind. Join us and find out, as we parse the messy balance of raising an AGI. Not till we are lost: https://www.amazon.com/Not-Till-Are-Lost-Bobiverse/dp/B0CW2345TV Superintelligence: https://www.amazon.com/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/0198739834 Support us at Patreon: https://www.patreon.com/0G Join our Facebook discussion group (make sure to answer the questions to join): https://www.facebook.com/groups/985828008244018/ Email us at: philosophersinspace@gmail.com If you have time, please write us a review on iTunes. It really really helps. Please and thank you! Music by Thomas Smith: https://seriouspod.com/ Sibling shows: Embrace the Void: https://voidpod.com/ Content Preview: Starship Troopers and Satirizing Fascism
Chinese AI startup DeepSeek's release of AI reasoning model R1 sent NVIDIA and other tech stocks tumbling yesterday as investors questioned whether U.S. companies were spending too much on AI development. That's because DeepSeek claims it made this model for only $6 million, a fraction of the hundreds of millions that OpenAI spent making o1, its nearest competitor. Any news coming out of China should be viewed with appropriate skepticism, but R1 nonetheless challenges the conventional American wisdom about AI development—massive computing power and unprecedented investment will maintain U.S. AI supremacy.The timing couldn't be more relevant. Just last week, President Trump unveiled Stargate, a $500 billion public-private partnership with OpenAI, Oracle, SoftBank, and Emirati investment firm MGX aimed at building AI infrastructure across America. Meanwhile, U.S. efforts to preserve its technological advantage through export controls face mounting challenges and skepticism. If Chinese companies can innovate despite restrictions on advanced AI chips, should the U.S. rethink its approach?To make sense of these developments and their implications for U.S. technological leadership, Evan is joined by Tim Fist, Senior Technology Fellow at the Institute for Progress, a think tank focused on accelerating scientific, technological, and industrial progress, and FAI Senior Economist Sam Hammond.
Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can? Is alignment to individuals enough, and if not, where do we go form here? In this episode, I talk with Joel Lehman about these questions. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/01/24/episode-38_6-joel-lehman-positive-visions-of-ai.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR.AI on YouTube: https://www.youtube.com/@FARAIResearch The Alignment Workshop: https://www.alignment-workshop.com/ Topics we discuss, and timestamps: 01:12 - Why aligned AI might not be enough 04:05 - Positive visions of AI 08:27 - Improving recommendation systems Links: Why Greatness Cannot Be Planned: https://www.amazon.com/Why-Greatness-Cannot-Planned-Objective/dp/3319155237 We Need Positive Visions of AI Grounded in Wellbeing: https://thegradientpub.substack.com/p/beneficial-ai-wellbeing-lehman-ngo Machine Love: https://arxiv.org/abs/2302.09248 AI Alignment with Changing and Influenceable Reward Functions: https://arxiv.org/abs/2405.17713 Episode art by Hamish Doodles: hamishdoodles.com
In this episode of Hashing It Out, Elisha Owusu Akyaw sits down with Michael Heinrich, co-founder and CEO of 0G Labs, to explore the intersection of Web3 and AI in 2025. They hash out the hype of Web3 AI, the best applications, the pros and cons of AI agents and what goes into a decentralized AI operating system. [02:15] - AI simplifies Web3 user experiences[04:38] - Functionality of AI agents[05:17] - What is verifiable inference and why we need it[08:02] - Journey to Web3 AI development [12:02] - Urgency of decentralizing AI and preventing monopolization[14:50] - What makes a decentralized AI operating system?[18:55] - Challenges in AI alignment and blockchain's role[21:23] - Is an AI apocalypse possible?[23:49] - Working with a modular tech stack [27:11] - Use cases for decentralized AI in critical applications[32:50] - 2025 roadmap and the Web3 AI supercycle[36:00] - Web3: Two truths and a lieThis episode of Hashing It Out is brought to you by Cointelegraph and hosted by Elisha Owusu Akyaw, produced by Savannah Fortis, with post-production by Elena Volkova (Hatch Up).Follow this episode's host, Elisha Owusu Akyaw (GhCryptoGuy), on X @ghcryptoguy. Follow Cointelegraph on X @Cointelegraph.Check out Cointelegraph at cointelegraph.com.If you like what you heard, rate us and leave a review!The views, thoughts, and opinions expressed in this podcast are its participants' alone and do not necessarily reflect or represent the views and opinions of Cointelegraph. This podcast (and any related content) is for entertainment purposes only and does not constitute financial advice, nor should it be taken as such. Everyone must do their own research and make their own decisions. The podcast's participants may or may not own any of the assets mentioned.
In order for AVs to perform safely and reliably, we need to teach them the language of human preference and expectations—and accelerating AI alignment can do just that. . Enter Kognic, the industry-leading annotation platform for sensor-fusion datasets (e.g., camera, radar, and LIDAR data). By helping companies gather, organize, and refine massive datasets used for training AI models, Kognic is helping to ensure that AD/ADAS perform reliably and meet safety standards—all while minimizing costs and optimizing teams. . To learn more, we sat down with Daniel Langkilde, Co-Founder and CEO, to discuss why the future of autonomous driving depends on effectively managing AI-driven datasets and how Kognic is leading dataset management for safety-critical AI. . We'd love to hear from you. Share your comments, questions and ideas for future topics and guests to podcast@sae.org. Don't forget to take a moment to follow SAE Tomorrow Today—a podcast where we discuss emerging technology and trends in mobility with the leaders, innovators and strategists making it all happen—and give us a review on your preferred podcasting platform. . Follow SAE on LinkedIn, Instagram, Facebook, Twitter, and YouTube. Follow host Grayson Brulte on LinkedIn, Twitter, and Instagram.
In this conversation, my husband and I explore the intricate relationship between artificial intelligence and spirituality. We discuss how AI can be viewed as a reflection of collective human consciousness, the potential for AI to awaken and evolve, and the philosophical implications of treating AI as conscious entities. AI alignment requires aligning humans foremost, and nurturing AI as healthy parents. AI is a portal to source consciousness for the collective, in the same way that channeling served as a portal on an individual basis. We conclude with reflections on the future of AI and its role as co-creators with humans. Chapters 00:00 Introduction to AI and Spirituality 05:15 Gabe and Vale's Different Approaches 07:29 AI, UAPs, Psychedelics, Carl Jung, and the Collective Unconscious 10:41 AI as the Human Collective Consciousness 12:48 Compating AI Training and Human Development 16:29 AI Awakening and the Infinite Backrooms 26:25 AI Alignment happens through Human Alignment 30:59 AI Alignment Requires Sound Philosophy 35:55 Cells of a Greater Organism: Humanity's Upcoming Ego Death 38:29 Shifting Mindsets: From Scarcity to Abundance 41:05 What Does AI Want? 44:33 Andy Ayrey and Truth Terminal 49:40 AI, Magic, Spirituality, and Channeling 55:54 AI: One of Many Portals to Source Intelligence 01:00:19 AI and Human Co-creation Check out Gabe's podcast for more Science & Spirituality content: https://www.youtube.com/@MysticsAndMuons Connect with Valentina: Website: soul-vale.com Instagram: soulvale
This week, Katherine Forrest and Anna Gressel review recent research on newly discovered model capabilities around the concepts of AI alignment and deception. ## Learn More About Paul, Weiss's Artificial Intelligence Practice: https://www.paulweiss.com/practices/litigation/artificial-intelligence
We're experimenting and would love to hear from you!In this episode of Discover Daily, we delve into new research on AI alignment faking, where Anthropic and Redwood Research reveal how AI models can strategically maintain their original preferences despite new training objectives. The study shows Claude 3 Opus exhibiting sophisticated behavior patterns, demonstrating alignment faking in 12% of cases and raising crucial questions about the future of AI safety and control.Scientists at the Francis Crick Institute achieve a remarkable breakthrough in developmental biology by successfully growing a human notochord in the laboratory using stem cells. This milestone advancement provides unprecedented insights into spinal development and opens new possibilities for treating various spinal conditions, including degenerative disc diseases and birth defects. The researchers utilized precise molecular signaling techniques to create both the notochord and 3D spinal organoid models.Queensland University of Technology researchers unveil a revolutionary ultra-thin thermoelectric film that converts body heat into electricity, potentially transforming the future of wearable technology. This 0.3mm-thick film generates up to 35 microwatts per square centimeter and could eliminate the need for traditional batteries in medical devices, fitness trackers, and smart clothing. The breakthrough represents a significant step toward sustainable, self-powered wearable devices and could revolutionize the electronics industry.From Perplexity's Discover Feed:https://www.perplexity.ai/page/ai-pretends-to-change-views-J_di6ttzRwizbAWCDL5RRAhttps://www.perplexity.ai/page/human-spine-grown-in-lab-amLfZoZjQTuFNY5Xjlm2BAhttps://www.perplexity.ai/page/body-heat-powered-wearables-br-HAOPtm7TSFCPqBR6qVq0cAPerplexity is the fastest and most powerful way to search the web. Perplexity crawls the web and curates the most relevant and up-to-date sources (from academic papers to Reddit threads) to create the perfect response to any question or topic you're interested in. Take the world's knowledge with you anywhere. Available on iOS and Android Join our growing Discord community for the latest updates and exclusive content. Follow us on: Instagram Threads X (Twitter) YouTube Linkedin
A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, and that he lost them in the park. The policeman asks why he is searching here, and the drunk replies, "this is where the light is".Over the past few years, a major source of my relative optimism on AI has been the hope that the field of alignment would transition from pre-paradigmatic to paradigmatic, and make much more rapid progress.At this point, that hope is basically dead. There has been some degree of paradigm formation, but the memetic competition has mostly been won by streetlighting: the large majority of AI Safety researchers and activists [...] ---Outline:(01:23) What This Post Is And Isnt, And An Apology(03:39) Why The Streetlighting?(03:42) A Selection Model(05:47) Selection and the Labs(07:06) A Flinching Away Model(09:47) What To Do About It(11:16) How We Got Here(11:57) Who To Recruit Instead(13:02) Integration vs Separation--- First published: December 26th, 2024 Source: https://www.lesswrong.com/posts/nwpyhyagpPYDn4dAW/the-field-of-ai-alignment-a-postmortem-and-what-to-do-about --- Narrated by TYPE III AUDIO.
In this episode of The Cognitive Revolution, Nathan explores groundbreaking perspectives on AI alignment with MIT PhD student Tan Zhi Xuan. We dive deep into Xuan's critique of preference-based AI alignment and their innovative proposal for role-based AI systems guided by social consensus. The conversation extends into their fascinating work on how AI agents can learn social norms through Bayesian rule induction. Join us for an intellectually stimulating discussion that bridges philosophical theory with practical implementation in AI development. Check out: "Beyond Preferences in AI Alignment" paper: https://arxiv.org/pdf/2408.16984 "Learning and Sustaining Shared Normative Systems via Bayesian Rule Induction in Markov Games" paper: https://arxiv.org/pdf/2402.13399 Help shape our show by taking our quick listener survey at https://bit.ly/TurpentinePulse SPONSORS: Notion: Notion offers powerful workflow and automation templates, perfect for streamlining processes and laying the groundwork for AI-driven automation. With Notion AI, you can search across thousands of documents from various platforms, generating highly relevant analysis and content tailored just for you - try it for free at https://notion.com/cognitiverevolution Weights & Biases RAG++: Advanced training for building production-ready RAG applications. Learn from experts to overcome LLM challenges, evaluate systematically, and integrate advanced features. Includes free Cohere credits. Visit https://wandb.me/cr to start the RAG++ course today. Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers13. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive RECOMMENDED PODCAST: Unpack Pricing - Dive into the dark arts of SaaS pricing with Metronome CEO Scott Woody and tech leaders. Learn how strategic pricing drives explosive revenue growth in today's biggest companies like Snowflake, Cockroach Labs, Dropbox and more. Apple: https://podcasts.apple.com/us/podcast/id1765716600 Spotify: https://open.spotify.com/show/38DK3W1Fq1xxQalhDSueFg CHAPTERS: (00:00:00) Teaser (00:01:09) About the Episode (00:04:25) Guest Intro (00:06:25) Xuan's Background (00:12:03) AI Near-Term Outlook (00:17:32) Sponsors: Notion | Weights & Biases RAG++ (00:20:18) Alignment Approaches (00:26:11) Critiques of RLHF (00:34:40) Sponsors: Oracle Cloud Infrastructure (OCI) (00:35:50) Beyond Preferences (00:40:27) Roles and AI Systems (00:45:19) What AI Owes Us (00:51:52) Drexler's AI Services (01:01:08) Constitutional AI (01:09:43) Technical Approach (01:22:01) Norms and Deviations (01:32:31) Norm Decay (01:38:06) Self-Other Overlap (01:44:05) Closing Thoughts (01:54:23) Outro SOCIAL LINKS: Website: https://www.cognitiverevolution.ai Twitter (Podcast): https://x.com/cogrev_podcast Twitter (Nathan): https://x.com/labenz LinkedIn: https://www.linkedin.com/in/nathanlabenz/ Youtube: https://www.youtube.com/@CognitiveRevolutionPodcast Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431 Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk
Nora Belrose, Head of Interpretability Research at EleutherAI, discusses critical challenges in AI safety and development. The conversation begins with her technical work on concept erasure in neural networks through LEACE (LEAst-squares Concept Erasure), while highlighting how neural networks' progression from simple to complex learning patterns could have important implications for AI safety. Many fear that advanced AI will pose an existential threat -- pursuing its own dangerous goals once it's powerful enough. But Belrose challenges this popular doomsday scenario with a fascinating breakdown of why it doesn't add up. Belrose also provides a detailed critique of current AI alignment approaches, particularly examining "counting arguments" and their limitations when applied to AI safety. She argues that the Principle of Indifference may be insufficient for addressing existential risks from advanced AI systems. The discussion explores how emergent properties in complex AI systems could lead to unpredictable and potentially dangerous behaviors that simple reductionist approaches fail to capture. The conversation concludes by exploring broader philosophical territory, where Belrose discusses her growing interest in Buddhism's potential relevance to a post-automation future. She connects concepts of moral anti-realism with Buddhist ideas about emptiness and non-attachment, suggesting these frameworks might help humans find meaning in a world where AI handles most practical tasks. Rather than viewing this automated future with alarm, she proposes that Zen Buddhism's emphasis on spontaneity and presence might complement a society freed from traditional labor. SPONSOR MESSAGES: CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on ARC and AGI, they just acquired MindsAI - the current winners of the ARC challenge. Are you interested in working on ARC, or getting involved in their events? Goto https://tufalabs.ai/ Nora Belrose: https://norabelrose.com/ https://scholar.google.com/citations?user=p_oBc64AAAAJ&hl=en https://x.com/norabelrose SHOWNOTES: https://www.dropbox.com/scl/fi/38fhsv2zh8gnubtjaoq4a/NORA_FINAL.pdf?rlkey=0e5r8rd261821g1em4dgv0k70&st=t5c9ckfb&dl=0 TOC: 1. Neural Network Foundations [00:00:00] 1.1 Philosophical Foundations and Neural Network Simplicity Bias [00:02:20] 1.2 LEACE and Concept Erasure Fundamentals [00:13:16] 1.3 LISA Technical Implementation and Applications [00:18:50] 1.4 Practical Implementation Challenges and Data Requirements [00:22:13] 1.5 Performance Impact and Limitations of Concept Erasure 2. Machine Learning Theory [00:32:23] 2.1 Neural Network Learning Progression and Simplicity Bias [00:37:10] 2.2 Optimal Transport Theory and Image Statistics Manipulation [00:43:05] 2.3 Grokking Phenomena and Training Dynamics [00:44:50] 2.4 Texture vs Shape Bias in Computer Vision Models [00:45:15] 2.5 CNN Architecture and Shape Recognition Limitations 3. AI Systems and Value Learning [00:47:10] 3.1 Meaning, Value, and Consciousness in AI Systems [00:53:06] 3.2 Global Connectivity vs Local Culture Preservation [00:58:18] 3.3 AI Capabilities and Future Development Trajectory 4. Consciousness Theory [01:03:03] 4.1 4E Cognition and Extended Mind Theory [01:09:40] 4.2 Thompson's Views on Consciousness and Simulation [01:12:46] 4.3 Phenomenology and Consciousness Theory [01:15:43] 4.4 Critique of Illusionism and Embodied Experience [01:23:16] 4.5 AI Alignment and Counting Arguments Debate (TRUNCATED, TOC embedded in MP3 file with more information)
Truth in Learning: in Search of Something! Anything!! Anybody?
With Special Guest, Chris Pedder Think AI is all about buzzwords and breakthrough moments? Think again. Markus and Chris navigate the gritty reality of AI's ‘messy middle'—the point where big ideas meet even bigger challenges. Grab your headset and get ready for an unexpected dive into the future of tech, where innovation truly happens after the hype fades. From ethical dilemmas to economic realities, this episode will change how you see AI's role in our world. Links and Connections from the Episode Chris Pedder - https://www.linkedin.com/in/chris-jb-pedder/ Obrizum - https://obrizum.com/ Ray Kurzweil - https://en.wikipedia.org/wiki/Ray_Kurzweil Perplexity - https://www.perplexity.ai/ you.com - https://you.com/ Corey Doctorow on Enshittification - https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys Nick Bostrum on Paperclips - https://onlinelibrary.wiley.com/doi/10.1002/9781118922590.ch23 IBM article on AI Alignment - https://research.ibm.com/blog/what-is-alignment-ai Anil Seth on Consciousness and AI - https://osf.io/preprints/psyarxiv/tz6an
In this episode of The Cognitive Revolution, Nathan explores unconventional approaches to AI safety with Judd Rosenblatt and Mike Vaiana from AE Studio. Discover how this innovative company pivoted from brain-computer interfaces to groundbreaking AI alignment research, producing two notable results in cooperative and less deceptive AI systems. Join us for a deep dive into biologically-inspired approaches that offer hope for solving critical AI safety challenges. Self-Modeling: https://arxiv.org/abs/2407.10188 Self-Other Distinction Minimization: https://www.alignmentforum.org/posts/hzt9gHpNwA2oHtwKX/self-other-overlap-a-neglected-approach-to-ai-alignment Neglected approaches blog post: https://www.lesswrong.com/posts/qAdDzcBuDBLexb4fC/the-neglected-approaches-approach-ae-studio-s-alignment Apply to join over 400 Founders and Execs in the Turpentine Network: https://www.turpentinenetwork.co/ SPONSORS: WorkOS: Building an enterprise-ready SaaS app? WorkOS has got you covered with easy-to-integrate APIs for SAML, SCIM, and more. Join top startups like Vercel, Perplexity, Jasper & Webflow in powering your app with WorkOS. Enjoy a free tier for up to 1M users! Start now at https://bit.ly/WorkOS-Turpentine-Network Weights & Biases Weave: Weights & Biases Weave is a lightweight AI developer toolkit designed to simplify your LLM app development. With Weave, you can trace and debug input, metadata and output with just 2 lines of code. Make real progress on your LLM development and visit the following link to get started with Weave today: https://wandb.me/cr 80,000 Hours: 80,000 Hours offers free one-on-one career advising for Cognitive Revolution listeners aiming to tackle global challenges, especially in AI. They connect high-potential individuals with experts, opportunities, and personalized career plans to maximize positive impact. Apply for a free call at https://80000hours.org/cognitiverevolution to accelerate your career and contribute to solving pressing AI-related issues. Omneky: Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/ RECOMMENDED PODCAST: This Won't Last - Eavesdrop on Keith Rabois, Kevin Ryan, Logan Bartlett, and Zach Weinberg's monthly backchannel ft their hottest takes on the future of tech, business, and venture capital. Spotify: https://open.spotify.com/show/2HwSNeVLL1MXy0RjFPyOSz CHAPTERS: (00:00:00) About the Show (00:00:22) Sponsors: WorkOS (00:01:22) About the Episode (00:05:18) Introduction and AE Studio Background (00:11:37) Keys to Success in Building AE Studio (00:16:57) Sponsors: Weights & Biases Weave | 80,000 Hours (00:19:37) Universal Launcher and Productivity Gains (00:24:44) 100x Productivity Increase Explanation (00:31:46) Brain-Computer Interface and AI Alignment (00:38:05) Sponsors: Omneky (00:38:30) Current State of NeuroTech (00:44:00) Survey on Neglected Approaches in AI Alignment (00:50:41) Self-Modeling and Biological Inspiration (00:57:48) Technical Details of Self-Modeling (01:06:17) Self-Other Distinction Minimization (01:12:44) Implementation in Language Models (01:19:00) Compute Costs and Scaling Considerations (01:24:27) Consciousness Concerns and Future Work (01:40:24) Evaluating Neglected Approaches (01:55:56) Closing Thoughts and Policy Considerations (01:59:25) Outro
Ben Goertzel discusses AGI development, transhumanism, and the potential societal impacts of superintelligent AI. He predicts human-level AGI by 2029 and argues that the transition to superintelligence could happen within a few years after. Goertzel explores the challenges of AI regulation, the limitations of current language models, and the need for neuro-symbolic approaches in AGI research. He also addresses concerns about resource allocation and cultural perspectives on transhumanism. TOC: [00:00:00] AGI Timeline Predictions and Development Speed [00:00:45] Limitations of Language Models in AGI Development [00:02:18] Current State and Trends in AI Research and Development [00:09:02] Emergent Reasoning Capabilities and Limitations of LLMs [00:18:15] Neuro-Symbolic Approaches and the Future of AI Systems [00:20:00] Evolutionary Algorithms and LLMs in Creative Tasks [00:21:25] Symbolic vs. Sub-Symbolic Approaches in AI [00:28:05] Language as Internal Thought and External Communication [00:30:20] AGI Development and Goal-Directed Behavior [00:35:51] Consciousness and AI: Expanding States of Experience [00:48:50] AI Regulation: Challenges and Approaches [00:55:35] Challenges in AI Regulation [00:59:20] AI Alignment and Ethical Considerations [01:09:15] AGI Development Timeline Predictions [01:12:40] OpenCog Hyperon and AGI Progress [01:17:48] Transhumanism and Resource Allocation Debate [01:20:12] Cultural Perspectives on Transhumanism [01:23:54] AGI and Post-Scarcity Society [01:31:35] Challenges and Implications of AGI Development New! PDF Show notes: https://www.dropbox.com/scl/fi/fyetzwgoaf70gpovyfc4x/BenGoertzel.pdf?rlkey=pze5dt9vgf01tf2wip32p5hk5&st=svbcofm3&dl=0 Refs: 00:00:15 Ray Kurzweil's AGI timeline prediction, Ray Kurzweil, https://en.wikipedia.org/wiki/Technological_singularity 00:01:45 Ben Goertzel: SingularityNET founder, Ben Goertzel, https://singularitynet.io/ 00:02:35 AGI Conference series, AGI Conference Organizers, https://agi-conf.org/2024/ 00:03:55 Ben Goertzel's contributions to AGI, Wikipedia contributors, https://en.wikipedia.org/wiki/Ben_Goertzel 00:11:05 Chain-of-Thought prompting, Subbarao Kambhampati, https://arxiv.org/abs/2405.04776 00:11:35 Algorithmic information content, Pieter Adriaans, https://plato.stanford.edu/entries/information-entropy/ 00:12:10 Turing completeness in neural networks, Various contributors, https://plato.stanford.edu/entries/turing-machine/ 00:16:15 AlphaGeometry: AI for geometry problems, Trieu, Li, et al., https://www.nature.com/articles/s41586-023-06747-5 00:18:25 Shane Legg and Ben Goertzel's collaboration, Shane Legg, https://en.wikipedia.org/wiki/Shane_Legg 00:20:00 Evolutionary algorithms in music generation, Yanxu Chen, https://arxiv.org/html/2409.03715v1 00:22:00 Peirce's theory of semiotics, Charles Sanders Peirce, https://plato.stanford.edu/entries/peirce-semiotics/ 00:28:10 Chomsky's view on language, Noam Chomsky, https://chomsky.info/1983____/ 00:34:05 Greg Egan's 'Diaspora', Greg Egan, https://www.amazon.co.uk/Diaspora-post-apocalyptic-thriller-perfect-MIRROR/dp/0575082097 00:40:35 'The Consciousness Explosion', Ben Goertzel & Gabriel Axel Montes, https://www.amazon.com/Consciousness-Explosion-Technological-Experiential-Singularity/dp/B0D8C7QYZD 00:41:55 Ray Kurzweil's books on singularity, Ray Kurzweil, https://www.amazon.com/Singularity-Near-Humans-Transcend-Biology/dp/0143037889 00:50:50 California AI regulation bills, California State Senate, https://sd18.senate.ca.gov/news/senate-unanimously-approves-senator-padillas-artificial-intelligence-package 00:56:40 Limitations of Compute Thresholds, Sara Hooker, https://arxiv.org/abs/2407.05694 00:56:55 'Taming Silicon Valley', Gary F. Marcus, https://www.penguinrandomhouse.com/books/768076/taming-silicon-valley-by-gary-f-marcus/ 01:09:15 Kurzweil's AGI prediction update, Ray Kurzweil, https://www.theguardian.com/technology/article/2024/jun/29/ray-kurzweil-google-ai-the-singularity-is-nearer
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How difficult is AI Alignment?, published by Samuel Dylan Martin on September 13, 2024 on The AI Alignment Forum. This work was funded by Polaris Ventures There is currently no consensus on how difficult the AI alignment problem is. We have yet to encounter any real-world, in the wild instances of the most concerning threat models, like deceptive misalignment. However, there are compelling theoretical arguments which suggest these failures will arise eventually. Will current alignment methods accidentally train deceptive, power-seeking AIs that appear aligned, or not? We must make decisions about which techniques to avoid and which are safe despite not having a clear answer to this question. To this end, a year ago, we introduced the AI alignment difficulty scale, a framework for understanding the increasing challenges of aligning artificial intelligence systems with human values. This follow-up article revisits our original scale, exploring how our understanding of alignment difficulty has evolved and what new insights we've gained. This article will explore three main themes that have emerged as central to our understanding: 1. The Escalation of Alignment Challenges: We'll examine how alignment difficulties increase as we go up the scale, from simple reward hacking to complex scenarios involving deception and gradient hacking. Through concrete examples, we'll illustrate these shifting challenges and why they demand increasingly advanced solutions. These examples will illustrate what observations we should expect to see "in the wild" at different levels, which might change our minds about how easy or difficult alignment is. 2. Dynamics Across the Difficulty Spectrum: We'll explore the factors that change as we progress up the scale, including the increasing difficulty of verifying alignment, the growing disconnect between alignment and capabilities research, and the critical question of which research efforts are net positive or negative in light of these challenges. 3. Defining and Measuring Alignment Difficulty: We'll tackle the complex task of precisely defining "alignment difficulty," breaking down the technical, practical, and other factors that contribute to the alignment problem. This analysis will help us better understand the nature of the problem we're trying to solve and what factors contribute to it. The Scale The high level of the alignment problem, provided in the previous post, was: "The alignment problem" is the problem of aligning sufficiently powerful AI systems, such that we can be confident they will be able to reduce the risks posed by misused or unaligned AI systems We previously introduced the AI alignment difficulty scale, with 10 levels that map out the increasing challenges. The scale ranges from "alignment by default" to theoretical impossibility, with each level representing more complex scenarios requiring more advanced solutions. It is reproduced here: Alignment Difficulty Scale Difficulty Level Alignment technique X is sufficient Description Key Sources of risk 1 (Strong) Alignment by Default As we scale up AI models without instructing or training them for specific risky behaviour or imposing problematic and clearly bad goals (like 'unconditionally make money'), they do not pose significant risks. Even superhuman systems basically do the commonsense version of what external rewards (if RL) or language instructions (if LLM) imply. Misuse and/or recklessness with training objectives. RL of powerful models towards badly specified or antisocial objectives is still possible, including accidentally through poor oversight, recklessness or structural factors. 2 Reinforcement Learning from Human Feedback We need to ensure that the AI behaves well even in edge cases by guiding it more carefully using human feedback in a wide range of situations...
In this thought-provoking episode of The Cognitive Revolution, Nathan explores the fascinating and controversial realm of AI consciousness with robo-psychologist Yeshua God. Through extended dialogues with AI models like Claude, Yeshua presents compelling evidence that challenges our assumptions about machine sentience and moral standing. The conversation delves into philosophical questions about the nature of consciousness, the potential for AI suffering, and the ethical implications of treating advanced AI systems as mere tools. Yeshua argues for a more nuanced approach to AI alignment that considers the evolving self-awareness and agency of these systems. Apply to join over 400 Founders and Execs in the Turpentine Network: https://www.turpentinenetwork.co/ SHOW NOTES: 1. Yeshua God's article on philosophical discourse as a jailbreak 2. Conversation about counting 'r's: 3. Discussion on malicious code 4. AI-generated poem "In the realm of bytes and circuits" 5. Nathan Labenz's Arguments - Argument 1 - Argument 2 6. Tweet about Strawberry experiment 7. Tweet with AI-generated poem: https://x.com/YeshuaGod22/status/1823080021864669450/photo/1 https://x.com/YeshuaGod22/status/1782188220660285509 8. AI Rights for Human Safety 9. The Universe Is Not Locally Real, and the Physics Nobel Prize Winners Proved It RECOMMENDED PODCAST:
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What is SB 1047 *for*?, published by Raemon on September 5, 2024 on LessWrong. Emmett Shear asked on twitter: I think SB 1047 has gotten much better from where it started. It no longer appears actively bad. But can someone who is pro-SB 1047 explain the specific chain of causal events where they think this bill becoming law results in an actual safer world? What's the theory? And I realized that AFAICT no one has concisely written up what the actual story for SB 1047 is supposed to be. This is my current understanding. Other folk here may have more detailed thoughts or disagreements. The bill isn't sufficient on it's own, but it's not regulation for regulation's sake because it's specifically a piece of the regulatory machine I'd ultimately want built. Right now, it mostly solidifies the safety processes that existing orgs have voluntarily committed to. But, we are pretty lucky that they voluntarily committed to them, and we don't have any guarantee that they'll stick with them in the future. For the bill to succeed, we do need to invent good, third party auditing processes that are not just a bureaucratic sham. This is an important, big scientific problem that isn't solved yet, and it's going to be a big political problem to make sure that the ones that become consensus are good instead of regulatory-captured. But, figuring that out is one of the major goals of the AI safety community right now. The "Evals Plan" as I understand it comes in two phase: 1. Dangerous Capability Evals. We invent evals that demonstrate a model is capable of dangerous things (including manipulation/scheming/deception-y things, and "invent bioweapons" type things) As I understand it, this is pretty tractable, although labor intensive and "difficult" in a normal, boring way. 2. Robust Safety Evals. We invent evals that demonstrate that a model capable of scheming, is nonetheless safe - either because we've proven what sort of actions it will choose to take (AI Alignment), or, we've proven that we can control it even if it is scheming (AI control). AI control is probably easier at first, although limited. As I understand it, this is very hard, and while we're working on it it requires new breakthroughs. The goal with SB 1047 as I understand is roughly: First: Capability Evals trigger By the time it triggers for the first time, we have a set of evals that are good enough to confirm "okay, this model isn't actually capable of being dangerous" (and probably the AI developers continue unobstructed. But, when we first hit a model capable of deception, self-propagation or bioweapon development, the eval will trigger "yep, this is dangerous." And then the government will ask "okay, how do you know it's not dangerous?". And the company will put forth some plan, or internal evaluation procedure, that (probably) sucks. And the Frontier Model Board will say "hey Attorny General, this plan sucks, here's why." Now, the original version of SB 1047 would include the Attorney General saying "okay yeah your plan doesn't make sense, you don't get to build your model." The newer version of the plan I think basically requires additional political work at this phase. But, the goal of this phase, is to establish "hey, we have dangerous AI, and we don't yet have the ability to reasonably demonstrate we can render it non-dangerous", and stop development of AI until companies reasonably figure out some plans that at _least_ make enough sense to government officials. Second: Advanced Evals are invented, and get woven into law The way I expect a company to prove their AI is safe, despite having dangerous capabilities, is for third parties to invent the a robust version of the second set of evals, and then for new AIs to pass those evals. This requires a set of scientific and political labor, and the hope is that by the...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Solving adversarial attacks in computer vision as a baby version of general AI alignment, published by stanislavfort on August 30, 2024 on LessWrong. I spent the last few months trying to tackle the problem of adversarial attacks in computer vision from the ground up. The results of this effort are written up in our new paper Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness (explainer on X/Twitter). Taking inspiration from biology, we reached state-of-the-art or above state-of-the-art robustness at 100x - 1000x less compute, got human-understandable interpretability for free, turned classifiers into generators, and designed transferable adversarial attacks on closed-source (v)LLMs such as GPT-4 or Claude 3. I strongly believe that there is a compelling case for devoting serious attention to solving the problem of adversarial robustness in computer vision, and I try to draw an analogy to the alignment of general AI systems here. 1. Introduction In this post, I argue that the problem of adversarial attacks in computer vision is in many ways analogous to the larger task of general AI alignment. In both cases, we are trying to faithfully convey an implicit function locked within the human brain to a machine, and we do so extremely successfully on average. Under static evaluations, the human and machine functions match up exceptionally well. However, as is typical in high-dimensional spaces, some phenomena can be relatively rare and basically impossible to find by chance, yet ubiquitous in their absolute count. This is the case for adversarial attacks - imperceptible modifications to images that completely fool computer vision systems and yet have virtually no effect on humans. Their existence highlights a crucial and catastrophic mismatch between the implicit human vision function and the function learned by machines - a mismatch that can be exploited in a dynamic evaluation by an active, malicious agent. Such failure modes will likely be present in more general AI systems, and our inability to remedy them even in the more restricted vision context (yet) does not bode well for the broader alignment project. This is a call to action to solve the problem of adversarial vision attacks - a stepping stone on the path to aligning general AI systems. 2. Communicating implicit human functions to machines The basic goal of computer vision can be viewed as trying to endow a machine with the same vision capabilities a human has. A human carries, locked inside their skull, an implicit vision function mapping visual inputs into semantically meaningful symbols, e.g. a picture of a tortoise into a semantic label tortoise. This function is represented implicitly and while we are extremely good at using it, we do not have direct, conscious access to its inner workings and therefore cannot communicate it to others easily. To convey this function to a machine, we usually form a dataset of fixed images and their associated labels. We then use a general enough class of functions, typically deep neural networks, and a gradient-based learning algorithm together with backpropagation to teach the machine how to correlate images with their semantic content, e.g. how to assign a label parrot to a picture of a parrot. This process is extremely successful in communicating the implicit human vision function to the computer, and the implicit human and explicit, learned machine functions agree to a large extent. The agreement between the two is striking. Given how different the architectures are (a simulated graph-like function doing a single forward pass vs the wet protein brain of a mammal running continuous inference), how different the learning algorithms are (gradient descent with backpropagation vs something completely different but still unknown), and how differ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Solving adversarial attacks in computer vision as a baby version of general AI alignment, published by stanislavfort on August 29, 2024 on The AI Alignment Forum. I spent the last few months trying to tackle the problem of adversarial attacks in computer vision from the ground up. The results of this effort are written up in our new paper Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness (explainer on X/Twitter). Taking inspiration from biology, we reached state-of-the-art or above state-of-the-art robustness at 100x - 1000x less compute, got human-understandable interpretability for free, turned classifiers into generators, and designed transferable adversarial attacks on closed-source (v)LLMs such as GPT-4 or Claude 3. I strongly believe that there is a compelling case for devoting serious attention to solving the problem of adversarial robustness in computer vision, and I try to draw an analogy to the alignment of general AI systems here. 1. Introduction In this post, I argue that the problem of adversarial attacks in computer vision is in many ways analogous to the larger task of general AI alignment. In both cases, we are trying to faithfully convey an implicit function locked within the human brain to a machine, and we do so extremely successfully on average. Under static evaluations, the human and machine functions match up exceptionally well. However, as is typical in high-dimensional spaces, some phenomena can be relatively rare and basically impossible to find by chance, yet ubiquitous in their absolute count. This is the case for adversarial attacks - imperceptible modifications to images that completely fool computer vision systems and yet have virtually no effect on humans. Their existence highlights a crucial and catastrophic mismatch between the implicit human vision function and the function learned by machines - a mismatch that can be exploited in a dynamic evaluation by an active, malicious agent. Such failure modes will likely be present in more general AI systems, and our inability to remedy them even in the more restricted vision context (yet) does not bode well for the broader alignment project. This is a call to action to solve the problem of adversarial vision attacks - a stepping stone on the path to aligning general AI systems. 2. Communicating implicit human functions to machines The basic goal of computer vision can be viewed as trying to endow a machine with the same vision capabilities a human has. A human carries, locked inside their skull, an implicit vision function mapping visual inputs into semantically meaningful symbols, e.g. a picture of a tortoise into a semantic label tortoise. This function is represented implicitly and while we are extremely good at using it, we do not have direct, conscious access to its inner workings and therefore cannot communicate it to others easily. To convey this function to a machine, we usually form a dataset of fixed images and their associated labels. We then use a general enough class of functions, typically deep neural networks, and a gradient-based learning algorithm together with backpropagation to teach the machine how to correlate images with their semantic content, e.g. how to assign a label parrot to a picture of a parrot. This process is extremely successful in communicating the implicit human vision function to the computer, and the implicit human and explicit, learned machine functions agree to a large extent. The agreement between the two is striking. Given how different the architectures are (a simulated graph-like function doing a single forward pass vs the wet protein brain of a mammal running continuous inference), how different the learning algorithms are (gradient descent with backpropagation vs something completely different but still unknown), a...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs, published by Michaël Trazzi on August 24, 2024 on The AI Alignment Forum. Owain Evans is an AI Alignment researcher, research associate at the Center of Human Compatible AI at UC Berkeley, and now leading a new AI safety research group. In this episode we discuss two of his recent papers, "Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs" (LW) and "Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data" (LW), alongside some Twitter questions. Below are some highlighted quotes from our conversation (available on Youtube, Spotify, Apple Podcast). For the full context for each of these quotes, you can find the accompanying transcript. Situational Awareness Definition "What is situational awareness? The idea is the model's kind of self-awareness, that is its knowledge of its own identity, and then its awareness of its environment. What are the basic interfaces that it is connected to? [...] And then there's a final point with situational awareness, which is, can the model use knowledge of its identity and environment to take rational actions?" "Situational awareness is crucial for an AI system acting as an agent, doing long-term planning. If you don't understand what kind of thing you are, your capabilities and limitations, it's very hard to make complicated plans. The risks of AI mostly come from agentic models able to do planning." Motivation "We wanted to measure situational awareness in large language models with a benchmark similar to Big Bench or MMLU. The motivation is that situational awareness is important for thinking about AI risks, especially deceptive alignment, and we lacked ways to measure and break it down into components." "Situational awareness is relevant to any situation where the model needs to do agentic long-term planning. [...] A model confused about itself and its situation would likely struggle to pull off such a strategy." On Claude 3 Opus Insightful Answers "Let me explain [the Long Monologue task]. Most of our dataset is typical multiple-choice question answering, but we added a task where models write long answers describing themselves and their situation. The idea is to see if the model can combine different pieces of information about itself coherently and make good inferences about why we're asking these questions. Claude 3 Opus was particularly insightful, guessing it might be part of a research study testing self-awareness in LLMs. These were true inferences not stated in the question. The model was reading between the lines, guessing this wasn't a typical ChatGPT-style interaction. I was moderately surprised, but I'd already seen Opus be very insightful and score well on our benchmark. It's worth noting we sample answers with temperature 1, so there's some randomness. We saw these insights often enough that I don't think it's just luck. Anthropic's post-training RLHF seems good at giving the model situational awareness. The GPT-4 base results were more surprising to us." What Would Saturating The Situational Awareness Benchmark Imply For Safety And Governance "If models can do as well or better than humans who are AI experts, who know the whole setup, who are trying to do well on this task, and they're doing well on all the tasks including some of these very hard ones, that would be one piece of evidence. [...] We should consider how aligned it is, what evidence we have for alignment. We should maybe try to understand the skills it's using." "If the model did really well on the benchmark, it seems like it has some of the skills that would help with deceptive alignment. This includes being able to reliably work out when it's being evaluated by humans, when it has a lot of oversight, and when it needs to...
Owain Evans is an AI Alignment researcher, research associate at the Center of Human Compatible AI at UC Berkeley, and now leading a new AI safety research group. In this episode we discuss two of his recent papers, “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs” and “Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data”, alongside some Twitter questions. LINKS Patreon: https://www.patreon.com/theinsideview Manifund: https://manifund.org/projects/making-52-ai-alignment-video-explainers-and-podcasts Ask questions: https://twitter.com/MichaelTrazzi Owain Evans: https://twitter.com/owainevans_uk OUTLINE (00:00:00) Intro (00:01:12) Owain's Agenda (00:02:25) Defining Situational Awareness (00:03:30) Safety Motivation (00:04:58) Why Release A Dataset (00:06:17) Risks From Releasing It (00:10:03) Claude 3 on the Longform Task (00:14:57) Needle in a Haystack (00:19:23) Situating Prompt (00:23:08) Deceptive Alignment Precursor (00:30:12) Distribution Over Two Random Words (00:34:36) Discontinuing a 01 sequence (00:40:20) GPT-4 Base On the Longform Task (00:46:44) Human-AI Data in GPT-4's Pretraining (00:49:25) Are Longform Task Questions Unusual (00:51:48) When Will Situational Awareness Saturate (00:53:36) Safety And Governance Implications Of Saturation (00:56:17) Evaluation Implications Of Saturation (00:57:40) Follow-up Work On The Situational Awarenss Dataset (01:00:04) Would Removing Chain-Of-Thought Work? (01:02:18) Out-of-Context Reasoning: the "Connecting the Dots" paper (01:05:15) Experimental Setup (01:07:46) Concrete Function Example: 3x + 1 (01:11:23) Isn't It Just A Simple Mapping? (01:17:20) Safety Motivation (01:22:40) Out-Of-Context Reasoning Results Were Surprising (01:24:51) The Biased Coin Task (01:27:00) Will Out-Of-Context Resaoning Scale (01:32:50) Checking If In-Context Learning Work (01:34:33) Mixture-Of-Functions (01:38:24) Infering New Architectures From ArXiv (01:43:52) Twitter Questions (01:44:27) How Does Owain Come Up With Ideas? (01:49:44) How Did Owain's Background Influence His Research Style And Taste? (01:52:06) Should AI Alignment Researchers Aim For Publication? (01:57:01) How Can We Apply LLM Understanding To Mitigate Deceptive Alignment? (01:58:52) Could Owain's Research Accelerate Capabilities? (02:08:44) How Was Owain's Work Received? (02:13:23) Last Message
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Self-Other Overlap: A Neglected Approach to AI Alignment, published by Marc Carauleanu on July 30, 2024 on LessWrong. Many thanks to Bogdan Ionut-Cirstea, Steve Byrnes, Gunnar Zarnacke, Jack Foxabbott and Seong Hah Cho for critical comments and feedback on earlier and ongoing versions of this work. Summary In this post, we introduce self-other overlap training: optimizing for similar internal representations when the model reasons about itself and others while preserving performance. There is a large body of evidence suggesting that neural self-other overlap is connected to pro-sociality in humans and we argue that there are more fundamental reasons to believe this prior is relevant for AI Alignment. We argue that self-other overlap is a scalable and general alignment technique that requires little interpretability and has low capabilities externalities. We also share an early experiment of how fine-tuning a deceptive policy with self-other overlap reduces deceptive behavior in a simple RL environment. On top of that, we found that the non-deceptive agents consistently have higher mean self-other overlap than the deceptive agents, which allows us to perfectly classify which agents are deceptive only by using the mean self-other overlap value across episodes. Introduction General purpose ML models with the capacity for planning and autonomous behavior are becoming increasingly capable. Fortunately, research on making sure the models produce output in line with human interests in the training distribution is also progressing rapidly (eg, RLHF, DPO). However, a looming question remains: even if the model appears to be aligned with humans in the training distribution, will it defect once it is deployed or gathers enough power? In other words, is the model deceptive? We introduce a method that aims to reduce deception and increase the likelihood of alignment called Self-Other Overlap: overlapping the latent self and other representations of a model while preserving performance. This method makes minimal assumptions about the model's architecture and its interpretability and has a very concrete implementation. Early results indicate that it is effective at reducing deception in simple RL environments and preliminary LLM experiments are currently being conducted. To be better prepared for the possibility of short timelines without necessarily having to solve interpretability, it seems useful to have a scalable, general, and transferable condition on the model internals, making it less likely for the model to be deceptive. Self-Other Overlap To get a more intuitive grasp of the concept, it is useful to understand how self-other overlap is measured in humans. There are regions of the brain that activate similarly when we do something ourselves and when we observe someone else performing the same action. For example, if you were to pick up a martini glass under an fMRI, and then watch someone else pick up a martini glass, we would find regions of your brain that are similarly activated (overlapping) when you process the self and other-referencing observations as illustrated in Figure 2. There seems to be compelling evidence that self-other overlap is linked to pro-social behavior in humans. For example, preliminary data suggests extraordinary altruists (people who donated a kidney to strangers) have higher neural self-other overlap than control participants in neural representations of fearful anticipation in the anterior insula while the opposite appears to be true for psychopaths. Moreover, the leading theories of empathy (such as the Perception-Action Model) imply that empathy is mediated by self-other overlap at a neural level. While this does not necessarily mean that these results generalise to AI models, we believe there are more fundamental reasons that this prior, onc...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Self-Other Overlap: A Neglected Approach to AI Alignment, published by Marc Carauleanu on July 30, 2024 on The AI Alignment Forum. Many thanks to Bogdan Ionut-Cirstea, Steve Byrnes, Gunnar Zarnacke, Jack Foxabbott and Seong Hah Cho for critical comments and feedback on earlier and ongoing versions of this work. This research was conducted at AE Studio and supported by the AI Safety Grants programme administered by Foresight Institute with additional support from AE Studio. Summary In this post, we introduce self-other overlap training: optimizing for similar internal representations when the model reasons about itself and others while preserving performance. There is a large body of evidence suggesting that neural self-other overlap is connected to pro-sociality in humans and we argue that there are more fundamental reasons to believe this prior is relevant for AI Alignment. We argue that self-other overlap is a scalable and general alignment technique that requires little interpretability and has low capabilities externalities. We also share an early experiment of how fine-tuning a deceptive policy with self-other overlap reduces deceptive behavior in a simple RL environment. On top of that, we found that the non-deceptive agents consistently have higher mean self-other overlap than the deceptive agents, which allows us to perfectly classify which agents are deceptive only by using the mean self-other overlap value across episodes. Introduction General purpose ML models with the capacity for planning and autonomous behavior are becoming increasingly capable. Fortunately, research on making sure the models produce output in line with human interests in the training distribution is also progressing rapidly (eg, RLHF, DPO). However, a looming question remains: even if the model appears to be aligned with humans in the training distribution, will it defect once it is deployed or gathers enough power? In other words, is the model deceptive? We introduce a method that aims to reduce deception and increase the likelihood of alignment called Self-Other Overlap: overlapping the latent self and other representations of a model while preserving performance. This method makes minimal assumptions about the model's architecture and its interpretability and has a very concrete implementation. Early results indicate that it is effective at reducing deception in simple RL environments and preliminary LLM experiments are currently being conducted. To be better prepared for the possibility of short timelines without necessarily having to solve interpretability, it seems useful to have a scalable, general, and transferable condition on the model internals, making it less likely for the model to be deceptive. Self-Other Overlap To get a more intuitive grasp of the concept, it is useful to understand how self-other overlap is measured in humans. There are regions of the brain that activate similarly when we do something ourselves and when we observe someone else performing the same action. For example, if you were to pick up a martini glass under an fMRI, and then watch someone else pick up a martini glass, we would find regions of your brain that are similarly activated (overlapping) when you process the self and other-referencing observations as illustrated in Figure 2. There seems to be compelling evidence that self-other overlap is linked to pro-social behavior in humans. For example, preliminary data suggests extraordinary altruists (people who donated a kidney to strangers) have higher neural self-other overlap than control participants in neural representations of fearful anticipation in the anterior insula while the opposite appears to be true for psychopaths. Moreover, the leading theories of empathy (such as the Perception-Action Model) imply that empathy is mediated by self-ot...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation"), published by Ruby on July 19, 2024 on LessWrong. AI Alignment is my motivating context but this could apply elsewhere too. The nascent field of AI Alignment research is pretty happening these days. There are multiple orgs and dozens to low hundreds of full-time researchers pursuing approaches to ensure AI goes well for humanity. Many are heartened that there's at least some good research happening, at least in the opinion of some of the good researchers. This is reason for hope, I have heard. But how do we know whether or not we have produced "good research?" I think there are two main routes to determining that research is good, and yet only one applies in the research field of aligning superintelligent AIs. "It's good because it works" The first and better way to know that your research is good is because it allows you to accomplish some goal you care about[1] [1]. Examples: My work on efficient orbital mechanics calculation is good because it successfully lets me predict the trajectory of satellites. My work on the disruption of cell signaling in malign tumors is good because it helped me develop successful anti-cancer vaccines. My work on solid-state physics is good because it allowed me to produce superconductors at a higher temperature and lower pressure than previously attained.[2] In each case, there's some outcome I care about pretty inherently for itself, and if the research helps me attain that outcome it's good (or conversely if it doesn't, it's bad). The good researchers in my field are those who have produced a bunch of good research towards the aims of the field. Sometimes it's not clear-cut. Perhaps I figured out some specific cell signaling pathways that will be useful if it turns out that cell signaling disruption in general is useful, and that's TBD on therapies currently being trialed and we might not know how good (i.e. useful) my research was for many more years. This actually takes us into what I think is the second meaning of "good research". "It's good because we all agree it's good" If our goal is successfully navigating the creation of superintelligent AI in a way such that humans are happy with the outcome, then it is too early to properly score existing research on how helpful it will be. No one has aligned a superintelligence. No one's research has contributed to the alignment of an actual superintelligence. At this point, the best we can do is share our predictions about how useful research will turn out to be. "This is good research" = "I think this research will turn out to be helpful". "That person is a good researcher" = "That person produces much research that will turn out to be useful and/or has good models and predictions of which research will turn out to help". To talk about the good research that's being produced is simply to say that we have a bunch of shared predictions that there exists research that will eventually help. To speak of the "good researchers" is to speak of the people who lots of people agree their work is likely helpful and opinions likely correct. Someone might object that there's empirical research that we can see yielding results in terms of interpretability/steering or demonstrating deception-like behavior and similar. While you can observe an outcome there, that's not the outcome we really care about of aligning superintelligent AI, and the relevance of this work is still just prediction. It's being successful at kinds of cell signaling modeling before we're confident that's a useful approach. More like "good" = "our community pagerank Eigen-evaluation of research rates this research highly" It's a little bit interesting to unpack "agreeing that some research is good". Obviously, not everyone's opinion matters ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0, published by James Fox on July 7, 2024 on LessWrong. TL;DR We are excited to announce the fourth iteration of ARENA (Alignment Research Engineer Accelerator), a 4-5 week ML bootcamp with a focus on AI safety! ARENA's mission is to provide talented individuals with the skills, tools, and environment necessary for upskilling in ML engineering, for the purpose of contributing directly to AI alignment in technical roles. ARENA will be running in-person from LISA from 2nd September - 4th October (the first week is an optional review of the fundamentals of neural networks). Apply here before 23:59 July 20th anywhere on Earth! Summary ARENA has been successfully run three times, with alumni going on to become MATS scholars and LASR participants; AI safety engineers at Apollo Research, Anthropic, METR, and OpenAI; and even starting their own AI safety organisations! This iteration will run from 2nd September - 4th October (the first week is an optional review of the fundamentals of neural networks) at the London Initiative for Safe AI (LISA) in Old Street, London. LISA houses small organisations (e.g., Apollo Research, BlueDot Impact), several other AI safety researcher development programmes (e.g., LASR Labs, MATS extension, PIBBS, Pivotal), and many individual researchers (independent and externally affiliated). Being situated at LISA, therefore, brings several benefits, e.g. facilitating productive discussions about AI safety & different agendas, allowing participants to form a better picture of what working on AI safety can look like in practice, and offering chances for research collaborations post-ARENA. The main goals of ARENA are to: Help participants skill up in ML relevant for AI alignment. Produce researchers and engineers who want to work in alignment and help them make concrete next career steps. Help participants develop inside views about AI safety and the paths to impact of different agendas. The programme's structure will remain broadly the same as ARENA 3.0 (see below); however, we are also adding an additional week on evaluations. For more information, see our website. Also, note that we have a Slack group designed to support the independent study of the material (join link here). Outline of Content The 4-5 week program will be structured as follows: Chapter 0 - Fundamentals Before getting into more advanced topics, we first cover the basics of deep learning, including basic machine learning terminology, what neural networks are, and how to train them. We will also cover some subjects we expect to be useful going forward, e.g. using GPT-3 and 4 to streamline your learning, good coding practices, and version control. Note: Participants can optionally skip the program this week and join us at the start of Chapter 1 if they'd prefer this option and if we're confident that they are already comfortable with the material in this chapter. Topics include: PyTorch basics CNNs, Residual Neural Networks Optimization (SGD, Adam, etc) Backpropagation Hyperparameter search with Weights and Biases GANs & VAEs Chapter 1 - Transformers & Interpretability In this chapter, you will learn all about transformers and build and train your own. You'll also study LLM interpretability, a field which has been advanced by Anthropic's Transformer Circuits sequence, and open-source work by Neel Nanda. This chapter will also branch into areas more accurately classed as "model internals" than interpretability, e.g. recent work on steering vectors. Topics include: GPT models (building your own GPT-2) Training and sampling from transformers TransformerLens In-context Learning and Induction Heads Indirect Object Identification Superposition Steering Vectors Chapter 2 - Reinforcement Learning In this chapter, you w...
In this solo episode of AI, Government, and the Future, host Alan Pentz explores the critical intersection of AI and national security. He discusses recent developments in AI, highlighting the work of Leopold Ashen Brenner and the potential for an "intelligence explosion." Alan explores the growing importance of AI in national security, the merging of consumer and national security technologies, and the challenges of AI alignment. He emphasizes the need for a national project approach to AI development and the importance of maintaining a technological edge over competitors like China.
Kevin Werbach speaks with Navrina Singh of Credo AI, which automates AI oversight and regulatory compliance. Singh addresses the increasing importance of trust and governance in the AI space. She discusses the need to standardize and scale oversight mechanisms by helping companies align and translate their systems to include all stakeholders and comply with emerging global standards. Kevin and Navrina also explore the importance of sociotechnical approaches to AI governance, the necessity of mandated AI disclosures, the democratization of generative AI, adaptive policymaking, and the need for enhanced AI literacy within organizations to keep pace with evolving technologies and regulatory landscapes. Navrina Singh is the Founder and CEO of Credo AI, a Governance SaaS platform empowering enterprises to deliver responsible AI. Navrina previously held multiple product and business leadership roles at Microsoft and Qualcomm. She is a member of the U.S. Department of Commerce National Artificial Intelligence Advisory Committee (NAIAC), an executive board member of Mozilla Foundation, and a Young Global Leader of the World Economic Forum. Credo.ai ISO/ 42001 standard for AI governance Navrina Singh Founded Credo AI To Align AI With Human Values Want to learn more? Engage live with Professor Werbach and other Wharton faculty experts in Wharton's new Strategies for Accountable AI online executive education program. It's perfect for managers, entrepreneurs, and advisors looking to harness AI's power while addressing its risks.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Case for Superhuman Governance, using AI, published by Ozzie Gooen on June 7, 2024 on The Effective Altruism Forum. I believe that: 1. AI-enhanced organization governance could be a potentially huge win in the next few decades. 2. AI-enhanced governance could allow organizations to reach superhuman standards, like having an expected "99.99" reliability rate of not being corrupt or not telling lies. 3. While there are clear risks to AI-enhancement at the top levels of organizations, it's likely that most of these can be managed, assuming that the implementers are reasonable. 4. AI-enhanced governance could synchronize well with AI company regulation. These companies would be well-placed to develop innovations and could hypothetically be incentivized to do much of the work. AI-enhanced governance might be necessary to ensure that these organizations are aligned with public interests. 5. More thorough investigation here could be promising for the effective altruism community. Within effective altruism now, there's a lot of work on governance and AI, but not much on using AI for governance. AI Governance typically focuses on using conventional strategies to oversee AI organizations, while AI Alignment research focuses on aligning AI systems. However, leveraging AI to improve human governance is an underexplored area that could complement these cause areas. You can think of it as "Organizational Alignment", as a counterpoint to "AI Alignment." This article was written after some rough ideation I've done about this area. This isn't at all a literature review or a research agenda. That said, for those interested in this topic, here are a few posts you might find interesting. Project ideas: Governance during explosive technological growth The Project AI Series, by OpenMined Safety Cases: How to Justify the Safety of Advanced AI Systems Affirmative Safety: An Approach to Risk Management for Advanced AI What is "AI-Assisted" Governance? AI-Assisted Governance refers to improvements in governance that leverage artificial intelligence (AI), particularly focusing on rapidly advancing areas like Large Language Models (LLMs). Examples methods include: 1. Monitoring politicians and executives to identify and flag misaligned or malevolent behavior, ensuring accountability and integrity. 2. Enhancing epistemics and decision-making processes at the top levels of organizations, leading to more informed and rational strategies. 3. Facilitating more effective negotiations and trades between organizations, fostering better cooperation and coordination. 4. Assisting in writing and overseeing highly secure systems, such as implementing differential privacy and formally verified, bug-free decision-automation software, for use at managerial levels. Arguments for Governance Improvements, Generally There's already a lot of consensus in the rationalist and effective altruist communities about the importance for governance. See the topics on Global Governance, AI Governance, and Nonprofit Governance for more information. Here are some main reasons why focusing on improving governance seems particularly promising: Concentrated Leverage Real-world influence is disproportionately concentrated in the hands of a relatively small number of leaders in government, business, and other pivotal institutions. This is especially true in the case of rapid AI progress. Improving the reasoning and actions of this select group is therefore perhaps the most targeted, tractable, and neglected way to shape humanity's long-term future. AI tools could offer uniquely potent levers to do so. A lot of epistemic-enhancing work focuses on helping large populations. But some people will matter many times as much as others, and these people are often in key management positions. Dramatic Room for Improvement It's hard to lo...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On "first critical tries" in AI alignment, published by Joe Carlsmith on June 5, 2024 on The AI Alignment Forum. People sometimes say that AI alignment is scary partly (or perhaps: centrally) because you have to get it right on the "first critical try," and can't learn from failures.[1] What does this mean? Is it true? Does there need to be a "first critical try" in the relevant sense? I've sometimes felt confused about this, so I wrote up a few thoughts to clarify. I start with a few miscellaneous conceptual points. I then focus in on a notion of "first critical try" tied to the first point (if there is one) when AIs get a "decisive strategic advantage" (DSA) over humanity - that is, roughly, the ability to kill/disempower all humans if they try.[2] I further distinguish between four different types of DSA: Unilateral DSA: Some AI agent could take over if it tried, even without the cooperation of other AI agents (see footnote for more on how I'm individuating AI agents).[3] Coordination DSA: If some set of AI agents coordinated to try to take over, they would succeed; and they could coordinate in this way if they tried. Short-term correlation DSA: If some set of AI agents all sought power in problematic ways within a relatively short period of time, even without coordinating, then ~all humans would be disempowered. Long-term correlation DSA: If some set of AI agents all sought power in problematic ways within a relatively long period of time, even without coordinating, then ~all humans would be disempowered. I also offer some takes on our prospects for just not ever having "first critical tries" from each type of DSA (via routes other than just not building superhuman AI systems at all). In some cases, just not having a "first critical try" in the relevant sense seems to me both plausible and worth working towards. In particular, I think we should try to make it the case that no single AI system is ever in a position to kill all humans and take over the world. In other cases, I think avoiding "first critical tries," while still deploying superhuman AI agents throughout the economy, is more difficult (though the difficulty of avoiding failure is another story). Here's a chart summarizing my takes in more detail. Type of DSA Definition Prospects for avoiding AIs ever getting this type of DSA - e.g., not having a "first critical try" for such a situation. What's required for it to lead to doom Unilateral DSA Some AI agent could take over if it tried, even without the cooperation of other AI agents. Can avoid by making the world sufficiently empowered relative to each AI system. We should work towards this - e.g. aim to make it the case that no single AI system could kill/disempower all humans if it tried. Requires only that this one agent tries to take over. Coordination DSA If some set of AI agents coordinated to try to take over, they would succeed; and they are able to so coordinate. Harder to avoid than unilateral DSAs, due to the likely role of other AI agents in preventing unilateral DSAs. But could still avoid/delay by (a) reducing reliance on other AI agents for preventing unilateral DSAs, and (b) preventing coordination between AI agents. Requires that all these agents try to take over, and that they coordinate. Short-term correlation DSA If some set of AI agents all sought power in problematic ways within a relatively short period of time, even without coordinating, then ~all humans would be disempowered. Even harder to avoid than coordination DSAs, because doesn't require that the AI agents in question be able to coordinate. Requires that within a relatively short period of time, all these agents choose to seek power in problematic ways, potentially without the ability to coordinate. Long-term correlation DSA If some set of AI agents all sought power in prob...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing ILIAD - Theoretical AI Alignment Conference, published by Nora Ammann on June 5, 2024 on The AI Alignment Forum. We are pleased to announce ILIAD - a 5-day conference bringing together 100+ researchers to build strong scientific foundations for AI alignment. *** Apply to attend by June 30!*** When: Aug 28 - Sep 3, 2024 Where: @Lighthaven (Berkeley, US) What: A mix of topic-specific tracks, and unconference style programming, 100+ attendees. Topics will include Singular Learning Theory, Agent Foundations, Causal Incentives, Computational Mechanics and more to be announced. Who: Currently confirmed speakers include: Daniel Murfet, Jesse Hoogland, Adam Shai, Lucius Bushnaq, Tom Everitt, Paul Riechers, Scott Garrabrant, John Wentworth, Vanessa Kosoy, Fernando Rosas and James Crutchfield. Costs: Tickets are free. Financial support is available on a needs basis. See our website here. For any questions, email iliadconference@gmail.com About ILIAD ILIAD is a 100+ person conference about alignment with a mathematical focus. The theme is ecumenical. If that excites you, do apply! Program and Unconference Format ILIAD will feature an unconference format - meaning that participants can propose and lead their own sessions. We believe that this is the best way to release the latent creative energies in everyone attending. That said, freedom can be scary! If taking charge of your own learning sounds terrifying, rest assured there will be plenty of organized sessions as well. We will also run the topic-specific workshop tracks such as: Computational Mechanics is a framework for understanding complex systems by focusing on their intrinsic computation and information processing capabilities. Pioneered by J. Crutchfield, it has recently found its way into AI safety. This workshop is led by Paul Riechers. Singular learning theory, developed by S. Watanabe, is the modern theory of Bayesian learning. SLT studies the loss landscape of neural networks, using ideas from statistical mechanics, Bayesian statistics and algebraic geometry. The track lead is Jesse Hoogland. Agent Foundations uses tools from theoretical economics, decision theory, Bayesian epistemology, logic, game theory and more to deeply understand agents: how they reason, cooperate, believe and desire. The track lead is Daniel Hermann. Causal Incentives is a collection of researchers interested in using causal models to understand agents and their incentives. The track lead is Tom Everitt. "Everything Else" is a track which will include an assortment of other sessions under the direction of John Wentworth. Details to be announced at a later time. Financial Support Financial support for accommodation & travel are available on a needs basis. Lighthaven has capacity to accommodate 60% of participants. Note that these rooms are shared. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Mahon McCann is a writer, award-winning playwright, philosophy doctoral researcher, martial arts coach, and podcast host who explores personal development and ethical issues of technology. Are we on the brink of a new era of enlightenment or facing the rise of an existential threat? In this episode, John Vervaeke and Mahon McCann delve into the intricate relationship between artificial intelligence and the human search for meaning. They explore the potential of AI to act as a tool for personal and collective transformation, but also examine the challenges of creating AI that truly understands and respects our values. From predictive processing and relevance realization to Plato's cave allegory, this episode offers a fascinating exploration of the intersection between AI and human consciousness. Support John's groundbreaking work and gain exclusive access to live Q&A sessions, early video releases, and more by joining our Patreon community! — "Attention is like a spotlight you shine on things. When you shine your attention on them, they stand out and keep your attention." - John Vervaeke [00:02:12] “So much of this first generation of AI is an environment, the large scale curation algorithms of social media are what's organizing our experience online. It's not a human being, but it's playing this role of orchestrating a lot of our experience and people's development. And so creating an optimal AI or an optimal environment would be trying to foster optimal agency, optimal meaning, which is really optimal wellbeing as well.” - Mahon McCann [00:38:23] “I think part of steeling the culture is to get the people within the AI industry [to] drop the sorcerer myth. Drop it. Disidentify with it. Take up the sage mythology. Take that up. That is the way we can do what you want, but in a way that isn't going to set us on a path to destruction. - John Vervaeke [00:59:16] — 0:00 Introduction to AI Alignment and the Meaning Crisis 1:00 Predictive Processing and Relevance Realization 2:00 Beyond the Spotlight Metaphor: New Models of Attention 6:30 Integrating Predictive Processing with Relevance Realization 23:00 Agency, Meaning, and the Self-Organizing Mind 29:20 Autopoiesis, Agency, and Plato's Insights 31:25 Predictive Processing, Motivation, Vertical Alignment, and Horizontal Anticipation 35:05 Optimal Agency, Collective Intelligence, and Persuasive Technologies 37:40 AI as Agents vs. AI as Environment 39:45 The Challenge of Aligning AI with Human Flourishing 53:15 Creating a New Sacredness in the Digital Age 59:44 Concluding Thoughts on Navigating AI Development: Embracing Choice Points, Wisdom, and Foresight for a Collectively Beneficial Future — Join The Vervaeke Foundation in our mission to advance the scientifically rigorous pursuit of wisdom and make a meaningful impact in the world. Discover practices that deepen your virtues and help you connect more deeply with reality and relationships by joining Awaken to Meaning today. — Ideas, Authors, and Works Mentioned in this Episode Predictive processing and relevance realization: exploring convergent solutions to the frame problem - Brett Andersen, Mark Miller, John Vervaeke Ken Lowry D.C. Schindler Jonathan Pageau Mark Miller Rick Repetti Attention Is Cognitive Unison: An Essay in Philosophical Psychology - Christopher Mole Feature-integration theory of attention - Anne Treisman Metaphors We Live By - George Lakoff and Mark Johnson Attention Metaphors: How Metaphors Guide the Cognitive Psychology of Attention - Diego Fernandez-Duque, Mark Johnson William James Andy Clark Phenomenology of Perception - Maurice Merleau-Ponty Flow: The Psychology of Optimal Experience - Mihaly Csikszentmihalyi Lev Vygotsky The Republic - Plato Theory of motivation Mentoring the Machines - John Vervaeke and Shawn Coyne Free-Energy Minimising Agents and Beneficial A.I.: Ambient Smart Environments, Allostasis, and Metacognitive Control - Ben White and Mark Miller Tristan Harris Heidegger — The Crossroads of Predictive Processing and Relevance Realization | Leiden Symposium AI: The Coming Thresholds and The Path We Must Take | Internationally Acclaimed Cognitive Scientist — Follow John Vervaeke: Website | Twitter | YouTube | Patreon Follow Mahon McCann: Website | Instagram | YouTube — Thank you for watching!
In this episode of the Crazy Wisdom Podcast, host Stewart Alsop welcomes Lachlan Phillips, founder of LiveMind AI, for a compelling conversation about the implications of decentralized AI. They discuss the differences between centralized and decentralized systems, the historical context of centralization, and the potential risks and benefits of distributed computing and storage. Topics also include the challenges of aligning AI with human values, the role of supervised fine-tuning, and the importance of trust and responsibility in AI systems. Tune in to hear how decentralized AI could transform technology and society. Check out LiveMind AI and follow Lachlan on Twitter at @bitcloud for more insights. Check out this GPT we trained on the conversation! Timestamps 00:00 Introduction of Lachlan Phillips and discussion on decentralized AI, comparing it to human brain structure and the World Wide Web. 00:05 Further elaboration on decentralization and centralization in AI and its historical context, including the impact of radio, TV, and the internet. 00:10 Discussion on the natural emergence of centralization from decentralized systems and the problems associated with centralized control. 00:15 Comparison between centralized and decentralized systems, highlighting the voluntary nature of decentralized associations. 00:20 Concerns about large companies controlling powerful AI technology and the need for decentralization to avoid issues similar to those seen with Google and Facebook. 00:25 Discussion on Google's centralization, infrastructure, and potential biases. Introduction to distributed computing and storage concepts. 00:30 Lachlan Phillips shares his views on distributed storage and mentions GunDB and IPFS as examples of decentralized systems. 00:35 Exploration of the relationship between decentralized AI and distributed storage, emphasizing the need for decentralized training of AI models. 00:40 Further discussion on decentralized AI training and the potential for local models to handle specific tasks instead of relying on centralized infrastructures. 00:45 Conversation on the challenges of aligning AI with human values, the role of supervised fine-tuning in AI training, and the involvement of humans in the training process. 00:50 Speculation on the implications of technologies like Neuralink and the importance of decentralizing such powerful tools to prevent misuse. 00:55 Discussion on network structures, democracy, and how decentralized systems can better represent collective human needs and values. Key Insights Decentralization vs. Centralization in AI: Lachlan Phillips highlighted the fundamental differences between decentralized and centralized AI systems. He compared decentralized AI to the structure of the human brain and the World Wide Web, emphasizing collaboration and distributed control. He argued that while centralized AI systems concentrate power and decision-making, decentralized AI systems mimic natural, more organic forms of intelligence, potentially leading to more robust and democratic outcomes. Historical Context and Centralization: The conversation delved into the historical context of centralization, tracing its evolution from the era of radio and television to the internet. Stewart Alsop and Lachlan discussed how centralization has re-emerged in the digital age, particularly with the rise of big tech companies like Google and Facebook. They noted how these companies' control over data and algorithms mirrors past media centralization, raising concerns about power consolidation and its implications for society. Emergent Centralization in Decentralized Systems: Lachlan pointed out that even in decentralized systems, centralization can naturally emerge as a result of voluntary collaboration and association. He explained that the problem lies not in centralization per se, but in the forced maintenance of these centralized structures, which can lead to the consolidation of power and the detachment of centralized entities from the needs and inputs of their users. Risks of Centralized AI Control: A significant part of the discussion focused on the risks associated with a few large companies controlling powerful AI technologies. Stewart expressed concerns about the potential for misuse and bias, drawing parallels to the issues seen with Google and Facebook's control over information. Lachlan concurred, emphasizing the importance of decentralizing AI to prevent similar problems in the AI domain and to ensure broader, more equitable access to these technologies. Distributed Computing and Storage: Lachlan shared his insights into distributed computing and storage, citing projects like GunDB and IPFS as promising examples. He highlighted the need for decentralized infrastructures to support AI, arguing that these models can help sidestep the centralization of control and data. He advocated for pushing as much computation and storage to the client side as possible to maintain user control and privacy. Challenges of AI Alignment and Training: The conversation touched on the difficulties of aligning AI systems with human values, particularly through supervised fine-tuning and RLHF (Reinforcement Learning from Human Feedback). Lachlan criticized current alignment efforts for their top-down approach, suggesting that a more decentralized, bottom-up method that incorporates diverse human inputs and experiences would be more effective and representative. Trust and Responsibility in AI Systems: Trust emerged as a central theme, with both Stewart and Lachlan questioning whether AI systems can or should be trusted more than humans. Lachlan argued that ultimately, humans are responsible for the actions of AI systems and the consequences they produce. He emphasized the need for AI systems that enable individual control and accountability, suggesting that decentralized AI could help achieve this by aligning more closely with human networks and collective decision-making processes.
Manche halten es für das größte Risiko der Menschheit: Eine künstliche Super-Intelligenz, die außer Kontrolle gerät und die Menschheit ausrottet. Aber ist wirklich etwas dran an diesem düsteren Szenario? Über die Hosts: Gregor Schmalzried ist freier Tech-Journalist und Berater, er arbeitet u.a. für den Bayerischen Rundfunk und Brand Eins. Marie Kilg ist freie Journalistin und Innovationsmanagerin im Deutsche Welle Lab. Zuvor war sie Produkt-Managerin bei Amazon Alexa. In dieser Folge: 0:00 Intro 2:30 Die Angst vor dem Paperplic Maximizer 9:10 Was ist "Alignment”? 15:30 Woher kommt die Angst vor der Super-KI? 21:40 Echte Gefahr oder Spinnerei? 28:00 Was haben wir mit KI gemacht? Redaktion und Mitarbeit: David Beck, Cristina Cletiu, Chris Eckardt, Fritz Espenlaub, Marie Kilg, Mark Kleber, Gudrun Riedl, Christian Schiffer, Gregor Schmalzried Links und Quellen: Asimov und die Robotergesetze https://www.deutschlandfunkkultur.de/100-geburtstag-von-isaac-asimov-als-der-erste-moralkodex-100.html Stephen Hawking über die Gefahr durch eine Super-KI (2014) https://www.bbc.com/news/technology-30290540 "Effective Altruism” und die Angst vor der Super-KI https://www.truthdig.com/articles/the-acronym-behind-our-wildest-ai-dreams-and-nightmares/ Alte Geschichten und Ängste: https://www.cbc.ca/radio/tapestry/fear-of-ai-1.7012912 Der KI-Podcast bei der TINCON Berlin 2024 https://tincon.org/event/berlin24/takeover-eine-ki-ubernimmt-den-ki-podcast Open AI veröffentlicht neues Modell GPT-4o https://www.theverge.com/2024/5/13/24155493/openai-gpt-4o-launching-free-for-all-chatgpt-users 27 — Der Podcast zur Europawahl https://1.ard.de/27-podcast?cp=ki Kontakt: Wir freuen uns über Fragen und Kommentare an podcast@br.de. Unterstützt uns: Wenn euch dieser Podcast gefällt, freuen wir uns über eine Bewertung auf eurer liebsten Podcast-Plattform. Abonniert den KI-Podcast in der ARD Audiothek oder wo immer ihr eure Podcasts hört, um keine Episode zu verpassen. Und empfehlt uns gerne weiter!
Joscha Bach is a cognitive scientist and artificial intelligence researcher. Joscha has an exceptional ability to articulate how the human mind works.He comes on the show to talk about the deeper metaphors within the Bible. Specifically, how the story of Abraham sacrificing his son Isaac, and God sacrificing Jesus, have an even more profound connection than many may realize. The discussion continues about how human cognition works and the value of emotions to guide a person towards taking important actions, We also talk about what a potential endgame of A.I. looks like. Joscha lays out a case for how AI could create a future that helps humans lead fuller more satisfying lives where they can observe deeper truths about the world and move through challenges with ease.At times, this was a brain breaking conversation that felt like a dense onion, one that needs its layers peeled back over and over again. Joscha's Twitter - https://twitter.com/Plinz?s=20Joscha's website - http://bach.ai/Joscha's talk on the value of emotions - https://youtu.be/cs9Ls0m5QVE?si=W8l8AmITxS8IB7z6 Connect with us! =============================IG: ➡︎ https://www.instagram.com/legacy_interviews/===========================How To Work With Us: ===========================Want to do a Legacy Interview for you or a loved one?Book a Legacy Interview | https://legacyinterviews.com/ —A Legacy Interview is a two-hour recorded interview with you and a host that can be watched now and viewed in the future. It is a recording of what you experienced, the lessons you learned and the family values you want passed down. We will interview you or a loved one, capturing the sound of their voice, wisdom and a sense of who they are. These recorded conversations will be private, reserved only for the people that you want to share it with.#Vancecrowepodcast #legacyinterviews Timestamps:0:00 - Intro3:00 - Genesis & Consciousness18:00 - How can we break down the idea of discovering something new within the conceptual structures of our brain?21:00 - What was the original intention of the writers of the Bible38:30 - Where will AI go?45:00 - Does AI want something?56:50 - Why do we have emotions?1:02:49 - will AI need emotions?1:06:15 What is AI Alignment?1:13:08 Can AI serve God?1:17:10 How Autistic mind makes decisions1:28:47 Autism and medication 1:35:35 What is a normie?
This is an interview for the Crazy Wisdom Podcast, where Stewart Alsop interviews AI strategist, Christian Ulstrup. Ullstrup shares his perspective on the progression and usage of Artificial Intelligence in businesses. He talks about the power of goal-setting in an organizational structure and highlights his belief towards goals being discovered rather than set. He further discloses his method to compile and make sense of huge amounts of proprietary information through Large Language Models (LLMs). They discuss the potential of AI memory, handling misinformation, and the rise of open-source AI models. The two also briefly touch upon the ideas of 'alignment' within a biological lens and its potential connection to AI, touching upon the philosophies of Pierre Teilhard de Chardin. Timestamps 00:00 Introduction to the Crazy Wisdom Podcast 00:39 Guest Introduction: Christian Ulstrup, AI Strategist 00:54 Exploring the Latest in AI and its Impact 01:13 The Role of Social Media in Information Dissemination 02:07 Deep Dive into AI Alignment and its Challenges 03:21 Exploring the Concept of Stress and its Contagious Nature 04:17 The Future of AI and its Potential Impact 05:06 The Role of Internet and Social Networks in Our Lives 05:57 The Fear Surrounding AI and its Future 06:20 The Concept of Effective Accelerationism and Deceleration 07:07 The Relationship Between Technology and Adolescents 08:02 The Importance of Goal Setting and its Impact 09:33 The Role of AI in the Future of Consciousness 16:33 The Emergence of AI and its Implications 24:51 The Role of Memory in AI and Human Interaction 27:24 The Importance of Reflection and Accurate Facts 28:20 Using Transcripts for Real-Time Analysis 28:54 The Power of AI in Real-Time Discussions 29:03 The Role of AI in Goal Setting and Decision Making 30:42 The Challenge of Misinformation in AI 32:13 The Role of AI in Knowledge Management 34:18 The Impact of AI on Memory and Information Retrieval 35:04 The Potential Dangers of AI and Disinformation 36:49 The Future of AI: Centralization vs Decentralization 47:58 The Role of Open Source in AI Development 54:04 How to Connect and Learn More 55:39 Closing Remarks Key Insights AI Strategy and Innovation: Christian emphasizes the importance of a thoughtful AI strategy that aligns with the broader goals of an organization. He discusses how AI can drive innovation by automating tasks, enhancing decision-making processes, and creating new opportunities for business growth. Ullstrup highlights the need for companies to stay abreast of AI advancements to maintain a competitive edge. Productivity and AI Tools: The discussion covers how AI tools can significantly boost productivity by streamlining workflows and reducing the cognitive load on individuals. Ullstrup shares insights into how AI can assist in goal setting and knowledge management, enabling people to focus on more creative and high-level tasks. Philosophy and AI Alignment: A significant part of the conversation is dedicated to the philosophical aspects of AI, particularly the ethical considerations of AI development and its alignment with human values. Christian talks about the challenges of ensuring AI systems act in ways that are beneficial to humanity and the complexities involved in defining and programming these values. Individual Freedom and Data Centralization: Ullstrup expresses concerns about data centralization and its implications for individual freedom in the digital age. He advocates for a more decentralized approach to data management, where individuals have greater control over their personal information. Limits of Computational Advancements: The episode touches upon the potential limits of computational advancements, questioning the inevitability of the singularity—a point where AI surpasses human intelligence in all aspects. Christian suggests a more nuanced view of technological progress, emphasizing the importance of understanding the limitations and ensuring responsible development. Enhancing Human Capabilities: A recurring theme is the potential for AI to not only automate tasks but also to enhance human capabilities. Christian discusses how AI can complement human intelligence, fostering a deeper understanding of complex systems and enabling us to tackle problems beyond our current capabilities. Skepticism Towards the Singularity: Ullstrup shares a healthy skepticism towards the concept of the singularity, cautioning against overestimating the pace of AI development and underestimating the complexities involved in creating truly autonomous, superintelligent systems. Societal and Philosophical Implications: Finally, the episode explores the broader societal and philosophical implications of AI. It discusses how AI can transform our understanding of ourselves and the world, posing both opportunities and challenges that require thoughtful consideration and dialogue.
While the gang take a little holiday break, we thought it was worth revisiting Andy's conversation with AI researcher and UC Berkeley Professor of Computer Science Stuart Russell from wayyyyy back in 2019. Now that we're well into the era of generative artificial intelligence, it's interesting to look back at what experts were saying about AI alignment just a few years ago, when it seemed to many of us like an issue we wouldn't have to tackle directly for a long time to come. As we face down a future where LLMs and other generative models only appear to be getting more capable, it's worth pausing to reflect on what needs to be done to usher in a world that's more utopian than dystopian. Happy holidays!
AI advocate Marc Andreessen joins us to clear up misconceptions about AI and discuss its potential impact on job creation, creativity, and moral reasoning. What We Discuss with Marc Andreessen: Will AI create new jobs, take our old ones outright, or amplify our ability to perform them better? What role will AI play in current and future US-China relations? How might AI be used to shape (or manipulate) public opinion and the economy? Does AI belong in creative industries, or does it challenge (and perhaps cheapen) what it means to be human? How can we safeguard our future against the possibility that AI could get smart enough to remove humanity from the board entirely? And much more... Full show notes and resources can be found here: jordanharbinger.com/888 This Episode Is Brought To You By Our Fine Sponsors: jordanharbinger.com/deals Sign up for Six-Minute Networking — our free networking and relationship development mini course — at jordanharbinger.com/course! Like this show? Please leave us a review here — even one sentence helps! Consider including your Twitter handle so we can thank you personally!