Podcasts about rohin

  • 72 PODCASTS
  • 224 EPISODES
  • 59m AVG DURATION
  • 1 MONTHLY NEW EPISODE
  • LATEST: Mar 21, 2025

POPULARITY (chart, 2017–2024)


Best podcasts about rohin

Latest podcast episodes about rohin

Daybreak
Haldiram's vs Bikaji just got spicier

Daybreak

Mar 21, 2025 · 15:38


In this episode we fill you in on three standout stories from the past week. First, why the fight between every Indian's favourite namkeen brands, Haldiram's and Bikaji, just got spicier; second, the controversy around Urban Company's newest pet project; and finally, the latest from Two by Two, where our colleagues Rohin and Praveen discuss what the fourth wave of tech exports from India will look like. Check out the stories and podcasts mentioned in this episode: the latest edition of Long and Short; Why India's biggest employer of women gig workers refuses to listen to its own workforce; and Two by Two: Ultrahuman and Kuku FM have broken out. The Ken is hosting its first live subscriber event! Join two long-term and contrarian CEOs, Nithin Kamath of Zerodha and Deepak Shenoy of Capitalmind, as they discuss the mental models, decision-making frameworks, and potential outcomes related to a very real possibility: an extended stock market winter that lasts 24 months or more. Click here to buy your tickets.

Le Rendez-vous Marketing
#143 - Building a Meta Ads creative strategy from A to Z: the 900.care method for ultra-high-performing Meta ads, with Rohin Sama, Creative Strategist @900.care

Le Rendez-vous Marketing

Jan 14, 2025 · 56:54


Send me a message. Behind the scenes of 900.care's Meta Ads campaigns with Rohin Sama. For this episode, I had the pleasure of talking with Rohin Sama, Creative Strategist at 900.care, an innovative brand that is transforming the bathroom routine with refillable, well-designed, environmentally friendly products. Their goal? Reduce waste while offering a premium user experience. Rohin opens up their advertising strategy and shares the secrets of their success on Meta Ads: how their creative team produces up to 60 videos a month with a well-honed process; their approach to formats, angles and personas for designing punchy ads suited to their audience; the key metrics they track to optimise campaigns and improve performance; and why iterating on and producing variations of creatives is crucial to maintaining consistent impact. 900.care doesn't just offer eco-responsible products; they are also at the cutting edge of advertising strategy. If you want to see how a strong creative organisation can transform your campaigns, this episode is for you. To book a strategy session with me and talk about your Facebook advertising campaigns, head here: https://dhsdigital.eu/audit To contact Rohin Sama: ➡️ LinkedIn: https://www.linkedin.com/in/rohinsama/ ➡️ Creative Lab: https://www.skool.com/the-creative-lab-3638/about To make sure you don't miss my upcoming content, find me on: ➡️ LinkedIn: https://www.linkedin.com/in/daniloduchesnes/ ➡️ Instagram: https://www.instagram.com/_danilodhs/ ➡️ Facebook: https://www.facebook.com/daniloduchesnes ➡️ Newsletter: https://daniloduchesnes.com/newsletter

Two by Two
Health Insurance in India is ripe for disruption (Highlights only)

Two by Two

Sep 19, 2024 · 41:22


The fastest growing segment of insurance in India is individual health insurance. It's growing steadily at a, well, healthy pace of 20% annually. But scratch just a little beneath the surface and things don't appear so rosy. Of the 20% annual growth in revenue, nearly 15% comes from medical inflation. Meaning, existing customers pay higher premiums each year because the costs of treatments are going up. The growth in the number of customers each year is just around 5-6%. Health insurance in India is broken from top to bottom. 70-75% of Indians have no health insurance. Of those who do, the largest chunk have free or low-cost insurance provided by the government, followed by group insurance, usually provided by employers. Less than 10% of Indians have their own health insurance. Scratch that. It's more accurate to call it hospitalization insurance, not health insurance, because the industry has developed in a way that incentivises catastrophic illness, hospitalization and treatment, not health. Why, you wonder? Because much of the industry, for legacy reasons, incentivises all the wrong things: large groups that make lots of claims, high commissions to distributors, expensive procedures, expensive premiums. Instead of incentivising the right things: getting the young and healthy covered early on, insuring blue-collar workers, building products customers actually want, and most importantly, staying healthy. So when hosts Praveen Gopal Krishnan and Rohin Dharmakumar sat down to discuss this complex topic, they decided to invite two guests who had the experience and candour to tell them what needs to change. Our first guest is Viren Shetty, the Executive Vice Chairman of one of India's largest hospital groups, the listed Narayana Health. Viren has also been spearheading Narayana Health's foray into providing its own health insurance, built to address many of the gaps described earlier. Our second guest is Shivaprasad Krishnan. Shivaprasad currently runs an investment banking firm, Kricon Capital, but was a part of the founding team at ICICI Lombard, one of India's first private health insurers. He also has over three decades of experience in finance and management. This episode of Two by Two was researched and produced by Hari Krishna. Sound engineering and mixing is by Rajiv C N. What you just listened to were some of the highlights from an almost 90-minute discussion that Praveen and Rohin had with Viren and Shivaprasad. You can listen to full episodes either with a Premium subscription to The Ken or by subscribing to Two by Two Premium on Apple Podcasts. Of course, you could also wait 4 weeks, because we do make full episodes available for a while after that. [This is a highlights episode, which you can listen to for free on Spotify, Amazon Music, YouTube, or wherever you get your podcasts if you're not a paid subscriber yet.] If you enjoyed listening to this episode of Two by Two, or have some thoughts you'd like to share with us, you can always write to us at twobytwo@the-ken.com. We'll be back next week with a new episode for you.

STR Data Lab™ by AirDNA
The Journey of a Short-Term Rental Investor: From Zero to Full-Time Income

STR Data Lab™ by AirDNA

May 30, 2024 · 46:22


In this episode of the STR Data Lab, Jamie Lane sits down with vacation rental investor Rohin Dhar to discuss his experience and strategies in the short-term rental market. Rohin discusses his approach to property management, highlighting the importance of creating unique experiences for guests, such as adding hot tubs to his properties to maintain high occupancy rates. He mentions using social media, particularly Twitter, to post interesting houses and connect potential buyers with real estate agents, aiming to move towards facilitating transactions. Rohin is preparing to get licensed in California to legally assist with real estate transactions and is inspired by similar models like Expedia's curated travel experiences by influencers. Rohin reflects on the challenges of short-term rentals, noting the media's tendency to focus on negative stories and the complex impact of banning Airbnb on local affordability. He argues that banning short-term rentals often leads to properties being bought by second-home buyers, not significantly improving local affordability. Rohin emphasizes the potential for local regulations to better support affordability by restricting purchases to local workers. He shares examples of regulations in places like South Lake Tahoe and Banff, which aim to balance tourism and local affordability. Rohin also discusses the economics of direct bookings versus using platforms like Airbnb and Vrbo, suggesting that direct booking might make sense for property managers with multiple properties but is often not worth the effort for individual owners. He highlights the benefits of leveraging established platforms for ease and efficiency. Rohin mentions the potential for monetizing his social media presence by helping people buy houses and earn commissions. He prefers focusing on one platform, Twitter, due to its low effort and effectiveness for his needs. Rohin advises against building large distribution channels without having a product or service to sell, emphasizing the importance of validating business models first. He shares insights on revenue management, stressing the need to stay on top of market changes to adjust pricing strategies effectively. Lastly, Rohin discusses the impact of regulatory changes on short-term rental markets and the importance of community-focused solutions. You don't want to miss this episode! ~~~~ Rohin's Twitter: https://x.com/rohindhar ~~~~ Signup for AirDNA for FREE

MetaMorphos
Making creatives that sell on Meta Ads (with Rohin Sama)

MetaMorphos

Apr 28, 2024 · 46:17


How do you make creatives that are designed to sell on Meta Ads? Today I'm joined by Rohin Sama, Creative Strategist at 900.care, a great brand of refillable bathroom products. He's the one behind the ads you see all over your feed.

10X Success Hacks for Startups, Innovations and Ventures (consulting and training tips)
Finding The Right Co-Founder, Raising Funds & Scaling The Business | Rohin Parkar, Co-Founder, Spintly

10X Success Hacks for Startups, Innovations and Ventures (consulting and training tips)

Apr 15, 2024 · 22:09


In today's episode of Pitch Cafe, we have Rohin Parkar, CEO & Co-Founder of Spintly, with us.

Comedicine
S2 E7 That's Edu-tainment!

Comedicine

Apr 4, 2024 · 45:52


We head back across the pond to UK interventional cardiologist, YouTuber and comedian, Dr Rohin Francis. His YouTube channel is called Medlife Crisis. Rohin makes videos on medical topics that don't get covered elsewhere and/or with more scientific rigor than Goop! He breaks it down to make it understandable and he crams in a lot of jokes! We talk about science, jokes and trying to find balance. Is that a thing?
Find Dr Rohin Francis here:
Website: https://www.medlifecrisis.co.uk
YouTube: https://www.youtube.com/@MedlifeCrisis
Instagram: https://www.instagram.com/medcrisis/
Timestamps
[00:00] Introducing Dr Rohin Francis
[00:48] Interventional Cardiologist and Conventional Cardiologist
[02:10] Bad timing on the SPACE time continuum
[03:47] That's Edutainment!
[07:40] Exercise vs Comedy: Which one is the Magic Bullet?
[08:27] Cardiology with humour and heart
[10:17] Comedy in teaching and teaching in comedy
[17:40] Science, vaccines, nutrition and other controversies
[23:10] YouTube Channels on Starting YouTube Channels
[27:40] There's riches in the niches
[29:10] Veterinarians and Doctors: different but same same
[33:26] The Widowmaker and why you shouldn't call it that
[36:06] Achieving work-life balance by not sleeping much
[38:30] YouTuber vs Physician in the NHS
[43:01] Rohin invites Dr Ken Jeong to join us on Comedicine
[44:29] Outro
Instagram @comedicine_comedy
Comedicine Facebook
Your host, Dr Sarah Boston
Dr Sarah Boston is a veterinary surgical oncologist (cancer surgeon for dogs and cats), cancer survivor (ironic, right?), bestselling author, actor and stand-up comedian. She is a 2023 graduate of the Humber College Comedy Performance and Writing Program. She is the 2023 recipient of the Tim Sims Encouragement Fund Award, which recognizes and supports promising comedic performers in the early stages of their career. She is also the recipient of the Award for Academic Excellence from Humber College because she is a nerd in all aspects of her life.
Instagram @drsarahboston
www.Drsarahboston.com
Representation
Book
Musical Genius Mark Edwards
Producer Heather McPherson
Twisted Spur Media

The Nonlinear Library
AF - What's up with LLMs representing XORs of arbitrary features? by Sam Marks

The Nonlinear Library

Jan 3, 2024 · 25:58


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What's up with LLMs representing XORs of arbitrary features?, published by Sam Marks on January 3, 2024 on The AI Alignment Forum. Thanks to Clément Dumas, Nikola Jurković, Nora Belrose, Arthur Conmy, and Oam Patel for feedback. In the comments of the post on Google Deepmind's CCS challenges paper, I expressed skepticism that some of the experimental results seemed possible. When addressing my concerns, Rohin Shah made some claims along the lines of "If an LLM linearly represents features a and b, then it will also linearly represent their XOR, a⊕b, and this is true even in settings where there's no obvious reason the model would need to make use of the feature a⊕b."[1] For reasons that I'll explain below, I thought this claim was absolutely bonkers, both in general and in the specific setting that the GDM paper was working in. So I ran some experiments to prove Rohin wrong. The result: Rohin was right and I was wrong. LLMs seem to compute and linearly represent XORs of features even when there's no obvious reason to do so. I think this is deeply weird and surprising. If something like this holds generally, I think this has importance far beyond the original question of "Is CCS useful?" In the rest of this post I'll:
Articulate a claim I'll call "representation of arbitrary XORs (RAX)": LLMs compute and linearly represent XORs of arbitrary features, even when there's no reason to do so.
Explain why it would be shocking if RAX is true. For example, without additional assumptions, RAX implies that linear probes should utterly fail to generalize across distributional shift, no matter how minor the distributional shift. (Empirically, linear probes often do generalize decently.)
Present experiments showing that RAX seems to be true in every case that I've checked.
Think through what RAX would mean for AI safety research: overall, probably a bad sign for interpretability work in general, and work that relies on using simple probes of model internals (e.g. ELK probes or coup probes) in particular.
Make some guesses about what's really going on here.
Overall, this has left me very confused: I've found myself simultaneously having (a) an argument that A ⇒ ¬B, (b) empirical evidence of A, and (c) empirical evidence of B. (Here A = RAX and B = other facts about LLM representations.)
The RAX claim: LLMs linearly represent XORs of arbitrary features, even when there's no reason to do so
To keep things simple, throughout this post, I'll say that a model linearly represents a binary feature f if there is a linear probe out of the model's latent space which is accurate for classifying f; in this case, I'll denote the corresponding direction as vf. This is not how I would typically use the terminology "linearly represents" - normally I would reserve the term for a stronger notion which, at minimum, requires the model to actually make use of the feature direction when performing cognition involving the feature[2]. But I'll intentionally abuse the terminology here because I don't think this distinction matters much for what I'll discuss. If a model linearly represents features a and b, then it automatically linearly represents a∧b and a∨b. However, a⊕b is not automatically linearly represented - no linear probe in the figure above would be accurate for classifying a⊕b.
Thus, if the model wants to make use of the feature a⊕b, then it needs to do something additional: allocate another direction[3] (more model capacity) to representing a⊕b, and also perform the computation of a⊕b so that it knows what value to store along this new direction. The representation of arbitrary XORs (RAX) claim, in its strongest form, asserts that whenever an LLM linearly represents features a and b, it will also linearly represent a⊕b. Concretely, this might look something like: in layer 5, the model computes and linearly r...
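As a rough illustration of the kind of probing experiment described above (a sketch under assumed inputs, not code from the post): fit a logistic-regression probe on a⊕b labels derived from two binary feature labels and check held-out accuracy. The activations and labels below are random placeholders standing in for real LLM activations.

```python
# Hypothetical sketch: test whether the XOR of two binary features is linearly
# decodable from model activations. `acts`, `a`, and `b` are random placeholders
# for real LLM activations and feature labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 2000, 512
acts = rng.normal(size=(n, d))      # stand-in for an (n_examples, hidden_dim) activation matrix
a = rng.integers(0, 2, size=n)      # labels for feature a
b = rng.integers(0, 2, size=n)      # labels for feature b
xor = a ^ b                         # the feature RAX claims is linearly represented

X_tr, X_te, y_tr, y_te = train_test_split(acts, xor, test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out XOR probe accuracy:", probe.score(X_te, y_te))
# On random activations this stays near 0.5; RAX predicts that probes trained on
# real LLM activations would score far higher, even when a XOR b is never "needed".
```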

The Nonlinear Library
AF - Discussion: Challenges with Unsupervised LLM Knowledge Discovery by Seb Farquhar

The Nonlinear Library

Dec 18, 2023 · 16:37


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Discussion: Challenges with Unsupervised LLM Knowledge Discovery, published by Seb Farquhar on December 18, 2023 on The AI Alignment Forum. TL;DR: Contrast-consistent search (CCS) seemed exciting to us and we were keen to apply it. At this point, we think it is unlikely to be directly helpful for implementations of alignment strategies (>95%). Instead of finding knowledge, it seems to find the most prominent feature. We are less sure about the wider category of unsupervised consistency-based methods, but tend to think they won't be directly helpful either (70%). We've written a paper about some of our detailed experiences with it. Paper authors: Sebastian Farquhar*, Vikrant Varma*, Zac Kenton*, Johannes Gasteiger, Vlad Mikulik, and Rohin Shah. *Equal contribution, order randomised. Credences are based on a poll of Seb, Vikrant, Zac, Johannes, Rohin and show single values where we mostly agree and ranges where we disagreed.
What does CCS try to do?
To us, CCS represents a family of possible algorithms aiming at solving an ELK-style problem that have the steps:
Knowledge-like property: write down a property that points at an LLM feature which represents the model's knowledge (or a small number of features that includes the model-knowledge-feature).
Formalisation: make that property mathematically precise so you can search for features with that property in an unsupervised way.
Search: find it (e.g., by optimising a formalised loss).
In the case of CCS, the knowledge-like property is negation-consistency, the formalisation is a specific loss function, and the search is unsupervised learning with gradient descent on a linear + sigmoid function taking LLM activations as inputs. We were pretty excited about this. We especially liked that the approach is not supervised. Conceptually, supervising ELK seems really hard: it is too easy to confuse what you know, what you think the model knows, and what it actually knows. Avoiding the need to write down what-the-model-knows labels seems like a great goal.
Why we think CCS isn't working
We spent a lot of time playing with CCS and trying to make it work well enough to build a deception detector by measuring the difference between the model's elicited knowledge and its stated claims.[1] Having done this, we are now not very optimistic about CCS or things like it. Partly, this is because the loss itself doesn't give much reason to think that it would be able to find a knowledge-like property and empirically it seems to find whatever feature in the dataset happens to be most prominent, which is very prompt-sensitive. Maybe something building off it could work in the future, but we don't think anything about CCS provides evidence that it would be likely to. As a result, we have basically returned to our priors about the difficulty of ELK, which are something between "very very difficult" and "approximately impossible" for a full solution, while mostly agreeing that partial solutions are "hard but possible".
What does the CCS loss say?
The CCS approach is motivated like this: we don't know that much about the model's knowledge, but probably it follows basic consistency properties.
For example, it probably has something like Bayesian credences and when it believes A with some probability PA, it ought to believe ¬A with probability 1 − PA.[2] So if we search in the LLM's feature space for features that satisfy this consistency property, the model's knowledge is going to be one of the things that satisfies it. Moreover, they hypothesise, there probably aren't that many things that satisfy this property, so we can easily check the handful that we get and find the one representing the model's knowledge. When we dig into the CCS loss, it isn't clear that it really checks for what it's supposed to. In particular, we prove that arbitrary features, not jus...
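For a concrete sense of the method under discussion, the following is a rough, illustrative reconstruction of the standard CCS setup (not code from the paper): a linear + sigmoid probe over LLM activations, trained without labels so that the probabilities it assigns to a statement and its negation sum to one, with an extra term that rules out the degenerate always-0.5 answer. The tensors below are random placeholders for real contrast-pair activations.

```python
# Illustrative sketch of Contrast-Consistent Search (CCS): a linear + sigmoid
# probe trained without labels on activation pairs for a statement (x_pos)
# and its negation (x_neg). Random tensors stand in for real activations.
import torch

def ccs_loss(probe, x_pos, x_neg):
    p_pos, p_neg = probe(x_pos), probe(x_neg)
    consistency = (p_pos - (1 - p_neg)) ** 2       # P(A) should equal 1 - P(not A)
    confidence = torch.minimum(p_pos, p_neg) ** 2  # penalise the degenerate p = 0.5 solution
    return (consistency + confidence).mean()

d = 512  # activation dimension (placeholder)
probe = torch.nn.Sequential(torch.nn.Linear(d, 1), torch.nn.Sigmoid())
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

x_pos, x_neg = torch.randn(256, d), torch.randn(256, d)  # placeholder contrast pairs
for _ in range(200):
    opt.zero_grad()
    loss = ccs_loss(probe, x_pos, x_neg)
    loss.backward()
    opt.step()
# The trained probe's average of p(x_pos) and 1 - p(x_neg) is read off as a
# "credence"; the episode's point is that such a probe often latches onto the
# most prominent feature in the dataset rather than the model's knowledge.
```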

First Principles
Lalit Keshre of Groww on being far-sighted, intuitive and absolutely obsessed with your customer

First Principles

Oct 11, 2023 · 96:35


The full transcript of this episode is available here. You've probably heard of Groww. It is a financial services platform, last valued at $3 billion. This year, the Groww team travelled to a bunch of tier-2 and tier-3 cities in South India to talk to users of its products. In fact, so far, they've been to 100 such cities.
In this conversation with Rohin, Lalit—the co-founder and CEO of Groww—describes what typically happens during these visits. Take a city like Indore, for example. Hundreds of Groww users travel really long distances just to talk to the Groww team. They want to know things like how to use the app for something as simple as investing in an SIP that has actually changed their lives. Lalit tells Rohin how the same people have told so many of their friends and family members about Groww. They're excited, grateful and most importantly, emotional, says Lalit. And then he says there's just one word that can describe what's happening between Groww and its users… and it is love. Love for customers, love for the product, and love for good financial services…
You'll see how this is a recurring theme in this conversation. Customers are at the centre of Groww. Customer obsession, Lalit says, with no uncertainty, is the thing that will make an employee succeed at Groww. Lalit is a man who is sure of many things—like he's good at hiring or how to build an effective direct-to-consumer product. He knows the first principles for solving complex problems like the back of his hand. And what he's unsure of, he's not too worried about. He tells Rohin how he was raised in a way that makes him a bit different from other founders. Everything will pass, he says. Every problem will be solved. And he explains how this keeps him going, keeps Groww going.
In his stoic, cool manner, he goes on to talk about:
Why financial services is a basic necessity
How to create delight with your product
Understanding the difference between what your customer wants and what they will use
Why "discipline is overrated"
A simple process that Lalit uses to hire the right people
_____________
Chapters:
03:01 - What we talk about when we talk about financial services
07:19 - Democratising financial services
09:41 - Supply and demand of financial services
10:51 - How Groww grew
16:33 - How to find future customers today?
20:09 - The one metric Groww watches
22:21 - What customers say is not what customers do
30:37 - What makes people love a financial services brand?
35:16 - Lalit's career ladder
40:08 - The seeds for Groww
46:31 - “Not old & wise enough to give advice, but if there was one thing…”
53:14 - Cracking delayed monetisation
55:28 - The only thing that guides Groww
1:02:31 - “Discipline is overrated”
1:05:38 - A simple process to hire the right people
1:08:23 - What does Lalit read?
1:14:36 - A weekday in Lalit's life
1:16:25 - What kind of people succeed at Groww?
_____________
Also, tune in to the latest episode of Daybreak—The Ken's podcast which breaks down one significant business story thrice a week—to learn about the talent exodus at Niti Aayog on Spotify, Apple or wherever you get your podcasts!
The Ken is India's first subscriber-only business journalism platform. Check out our deeply reported long-form stories, insightful newsletters, original podcasts and much more here.

The Henry George Program
Rohin Ghosh on DC, Tenant Movements, Democracy

The Henry George Program

Sep 28, 2023


Rohin Ghosh has moved on to school in DC, and has been keeping busy by acquiring public office (!); he informs us all about how DC's ANCs work, as well as larger dynamics of housing in our nation's capital. There's also talk on tenant organizing, as well as what this means for democracy more generally.

The Your Life! Your Terms! Show
Ricky Zhang & Rohin Jain - How To Travel The World For FREE Using Credit Card Reward Points

The Your Life! Your Terms! Show

Sep 19, 2023 · 88:13


Ricky Zhang and Rohin Jain from The Prince of Travel are somehow travelling the world for free using credit card reward points. We asked them for all their secrets on how they're doing it: what the best travel credit cards are in Canada, the minimalist credit card portfolio that will get you 80% of the credit card points results you're looking for, and how to optimally redeem your points for maximum return on investment. If you travel a few times a year or have ever dreamed of travelling all over the world, you NEED to listen to this podcast! Use promo code YOURLIFEYOURTERMS for $50 off a Travel Summit ticket at https://thetravelsummit.com/ and go to https://www.yourlifeyourtermsevent.com/ to check out the big upcoming Rock Star event.

80k After Hours
Highlights: #154 – Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and doubters

80k After Hours

Sep 12, 2023 · 22:45


This is a selection of highlights from episode #154 of The 80,000 Hours Podcast. These aren't necessarily the most important, or even most entertaining parts of the interview — and if you enjoy this, we strongly recommend checking out the full episode: Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and doubters. And if you're finding these highlights episodes valuable, please let us know by emailing podcast@80000hours.org. Highlights put together by Simon Monsour and Milo McGuire.

Patient from Hell
Episode 33: Navigating cancer as a family, with Samira's mom Monika

Patient from Hell

Aug 9, 2023 · 31:34


Samira's mom, Monika Daswani, joins the podcast to discuss her experience navigating Samira's cancer diagnosis and treatment as her caregiver. Monika, a trained chef by profession, shares her perspective on caring for her daughter during her cancer treatment, including the challenges of maintaining nutrition during treatment, adapting meals to symptoms, and the importance of a strong support network. Monika emphasizes the role of faith, spirituality, and intent in both caregiving and healing. She also discusses her work leading the Helping Hands Foundation, which supports cancer patients in India.
Key Highlights:
The role of family caregivers during cancer treatment
How faith, intent and spirituality guided Monika's caregiving
Navigating nutrition during cancer treatments
About our guest: Monika Daswani is the CEO and a member of Helping Hands's Board of Directors. With her leadership, Helping Hands pivoted from a general non-profit to embrace the mission of spreading awareness about cancer diagnosis and early detection, and supporting other patients and caregivers. She was the primary caregiver to her daughter, who was diagnosed with cancer. A self-taught chef, Monika Daswani has dedicated her life to the gourmet food catering industry. She is the Founder and CEO of Kitchen Stories, a food catering company that has provided Kolkata, and numerous cities in India, access to global cuisine for over 2 decades.
Key Moments:
7 minutes: It takes a village - “The whole family wanted to be there with and for Samira. We are a small family of five, and each of us contributed in our own way for her. I was looking after Samira's food. My husband, who is very good at doing research, was helping me with handling symptoms with the correct food and home remedies. Raghav was Samira's emotional support. My other son Rohin, because he also lives in San Francisco, knew the healthcare system and could negotiate and set up appointments and get reports.”
10 minutes: On how faith has played a part in her caregiving journey - “I essentially work on faith. And I have enough faith, as I tell my family, for my entire family. Because I know I'm not alone, and I know the universe is there for me, and I think the universe is actually there for everyone. It's just up to us to realize him and absorb that energy from him. And as a caregiver, I think that is what I did because there is only so much that you can do as a human being. There is so much more that you require to be able to successfully survive this journey because at every point, it's trying to bring you down.”
12 minutes: On nutrition during cancer treatments - “Feeding her during chemo was all about focusing on how much nutrition and energy can be absorbed from what you are taking in because a body needs it all. The body is battling all those poisons that are going in. All the good cells, bad cells, everything is getting killed. We need the body to fight back. Because I come from India, when we cook, we believe that it's your thoughts and it's your vibrations that get transmitted in the food that you cook. It's your intent. And that's what I would focus on.”
Visit the Manta Cares website
Disclaimer: This podcast is for general informational purposes only and does not constitute the practice of medicine, nursing or other professional health care services, including the giving of medical advice, and no doctor/patient relationship is formed. The use of information on this podcast or materials linked from this podcast is at the user's own risk.
The content of this podcast is not intended to be a substitute for professional medical advice, diagnosis, or treatment. Users should not disregard, or delay in obtaining, medical advice for any medical condition they may have, and should seek the assistance of their health care professionals for any such conditions. --- Support this podcast: https://podcasters.spotify.com/pod/show/manta-cares/support

80,000 Hours Podcast with Rob Wiblin
#154 - Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and doubters

80,000 Hours Podcast with Rob Wiblin

Jun 9, 2023 · 189:42


Can there be a more exciting and strange place to work today than a leading AI lab? Your CEO has said they're worried your research could cause human extinction. The government is setting up meetings to discuss how this outcome can be avoided. Some of your colleagues think this is all overblown; others are more anxious still. Today's guest — machine learning researcher Rohin Shah — goes into the Google DeepMind offices each day with that peculiar backdrop to his work. Links to learn more, summary and full transcript. He's on the team dedicated to maintaining 'technical AI safety' as these models approach and exceed human capabilities: basically that the models help humanity accomplish its goals without flipping out in some dangerous way. This work has never seemed more important. In the short-term it could be the key bottleneck to deploying ML models in high-stakes real-life situations. In the long-term, it could be the difference between humanity thriving and disappearing entirely. For years Rohin has been on a mission to fairly hear out people across the full spectrum of opinion about risks from artificial intelligence -- from doomers to doubters -- and properly understand their point of view. That makes him unusually well placed to give an overview of what we do and don't understand. He has landed somewhere in the middle — troubled by ways things could go wrong, but not convinced there are very strong reasons to expect a terrible outcome. Today's conversation is wide-ranging and Rohin lays out many of his personal opinions to host Rob Wiblin, including:
What he sees as the strongest case both for and against slowing down the rate of progress in AI research.
Why he disagrees with most other ML researchers that training a model on a sensible 'reward function' is enough to get a good outcome.
Why he disagrees with many on LessWrong that the bar for whether a safety technique is helpful is "could this contain a superintelligence."
That he thinks nobody has very compelling arguments that AI created via machine learning will be dangerous by default, or that it will be safe by default. He believes we just don't know.
That he understands that analogies and visualisations are necessary for public communication, but is sceptical that they really help us understand what's going on with ML models, because they're different in important ways from every other case we might compare them to.
Why he's optimistic about DeepMind's work on scalable oversight, mechanistic interpretability, and dangerous capabilities evaluations, and what each of those projects involves.
Why he isn't inherently worried about a future where we're surrounded by beings far more capable than us, so long as they share our goals to a reasonable degree.
Why it's not enough for humanity to know how to align AI models — it's essential that management at AI labs correctly pick which methods they're going to use and have the practical know-how to apply them properly.
Three observations that make him a little more optimistic: humans are a bit muddle-headed and not super goal-orientated; planes don't crash; and universities have specific majors in particular subjects.
Plenty more besides.
Get this episode by subscribing to our podcast on the world's most pressing problems and how to solve them: type '80,000 Hours' into your podcasting app. Or read the transcript below.
Producer: Keiran Harris
Audio mastering: Milo McGuire, Dominic Armstrong, and Ben Cordell
Transcriptions: Katy Moore

First Principles
Five founders on fundraising, culture-building, clock-building, culture-shaping, people coaching, surviving and thriving

First Principles

Mar 15, 2023 · 111:51


If this isn't the first time you're listening to First Principles, you're probably wondering what's going on. Why did we have five different founders opening the episode, instead of just one? It's because today's is a special episode. We went back to the first five episodes we did to compile some of the most interesting, original and often counterintuitive conversations from five accomplished founders. While I'd urge you to listen to each of their conversations in full, this episode is a good place to start if you took to First Principles recently and are searching for reasons to listen to older episodes. We begin with Kabeer Biswas, the co-founder and CEO of Dunzo, the cult-like instant delivery service that started from Bangalore. He talks about the 10,000 plus tasks he's run himself on Dunzo, the impossible grind of fundraising, how founders' traits tend to show up as an organization's culture and many more things.
Episode 1: Kabeer Biswas of Dunzo talks about raising money, gathering user insights, battling deadlines and more
Next, we have Baskar Subramanian, the co-founder and CEO of Amagi, the most unlikely of unicorns to emerge from India. It is a media technology company that enables virtually the entire video production and distribution chain for all sorts of media companies globally. "Glass to glass solutions" is how Amagi describes itself, implying its presence from the glass of the camera where video is being shot to the glass of the screen on which it is finally watched. Baskar dropped out of his master's program at IIT Bombay because he found it oriented around getting grades, not necessarily learning. He talks about why entrepreneurship is like a drug for him, why vulnerability is a core value at Amagi, why a CEO's job is to be a clock-builder and not a time-keeper, and many more things.
Episode 2: $1.5B Amagi Founder Baskar Subramanian talks about culture at work, parenting, and building from ground up
Next is Nithin Kamath, the co-founder and CEO of Zerodha, India's largest online brokerage. He doesn't believe in setting targets or goals for his company or employees. He also is one of those rare Bangalore founders who have succeeded at scale without taking a single dollar of venture capital. No, in fact Nithin insists Zerodha's success is partly due to avoiding venture capital. From his anonymous days as "Nathan Hawk", "Tarzan" or "Columbus" at a call center or internet forums, to running one of the most profitable and yet leanest startups in India, Nithin covers a lot of ground. He talks about thinking like a trader, running a company with zero attrition, creating optionality and many more things.
Episode 3: Nithin Kamath of Zerodha candidly talks about building his bootstrapped business, weighing risks, and finding opportunities
Which brings me to Naveen Tewari, the co-founder and CEO of adtech giant InMobi, which is not only a unicorn in terms of its own valuation, but was also the first Indian company to incubate another unicorn of its own, lockscreen giant Glance. Naveen is one of the earliest tech entrepreneurs from India, having started his very first company, SMS-based search provider mKhoj, way back in 2007. Thus, survival is one of his recurring themes. Over the conversation he talks about the mistakes he made as an entrepreneur and his lessons from them, building careers and companies slowly instead of "blitzscaling", CEOs pushing the envelope of what's possible within companies, and a lot more.
Episode 4: InMobi founder Naveen Tewari gets candid about survival, innovation, and playing the game by changing the rules
And finally, we have Ananth Narayanan, the co-founder and CEO of Mensa Brands, a global tech-led house of brands – I know, it's a mouthful – which earned the distinction of becoming India's fastest unicorn. It buys existing brands, and then punches up their scale by providing the resources and knowledge to do so. Ananth says that's no different from a P&G, which too is a house of brands if you really look closely. Ananth talks about the emotional toll founders pay silently each day, learning to manage energy and not time, the best way to solicit and give feedback, and many more things.
Episode 5: Learnability, curiosity, and brand building; Ex-Myntra CEO and Mensa Brands founder Ananth Narayanan gets candid
I hope I managed to interest you in at least a few of those incredible conversations, if not all of them! Tell us what you thought of today's format. Did you like it? No? What other new features would you like from First Principles or The Ken? Write to me at podcasts@the-ken.com. And if you haven't already rated us on your favourite podcast platform, why is that? I would truly appreciate your rating, no matter what it is. Lastly, a big thanks to my colleague Rajiv CN, our resident sound engineer, for helping put together this special episode across nearly 8 hours of conversations. See you next time with a new conversation with another accomplished founder. Till then, this is me Rohin thanking you for listening and for your support.

Knock Knock, Hi! with the Glaucomfleckens
Comedy in Medicine with Cardiologist Dr. Rohin Francis

Knock Knock, Hi! with the Glaucomfleckens

Feb 21, 2023 · 77:39


Cardiologist Dr. Rohin Francis chats with the Glaucomfleckens about the power of mixing comedy with medicine, and they take the time to compare the US and UK medical systems. — Want more Dr. Rohin Francis? Website: https://www.medlifecrisis.co.uk/ YouTube: Med Life Crisis Instagram: @medcrisis Tik Tok: @medcrisis — We want to hear YOUR stories (and medical puns)! Shoot us an email and say hi! knockknockhi@human-content.com Can't get enough of us? Shucks. You can support the show on Patreon for early episode access, exclusive bonus shows, livestream hangouts, and much more! – www.patreon.com/glaucomflecken Produced by Human Content. Learn more about your ad choices. Visit megaphone.fm/adchoices

The Nonlinear Library
AF - Shard theory alignment requires magic. by Charlie Steiner

The Nonlinear Library

Jan 20, 2023 · 4:38


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Shard theory alignment requires magic., published by Charlie Steiner on January 20, 2023 on The AI Alignment Forum. A delayed hot take. This is pretty similar to previous comments from Rohin. "Magic," of course, in the technical sense of stuff we need to remind ourselves we don't know how to do. I don't mean this pejoratively; locating magic is an important step in trying to demystify it. And "shard theory alignment" in the sense of building an AI that does good things and not bad things by encouraging an RL agent to want to do good things, via kinds of reward shaping analogous to the diamond maximizer example. How might the story go? You start out with some unsupervised model of sensory data. On top of its representation of the world you start training an RL agent, with a carefully chosen curriculum and a reward signal that you think matches "goodness in general" on that curriculum distribution. This cultivates shards that want things in the vicinity of "what's good according to human values." These start out as mere bundles of heuristics, but eventually they generalize far enough to be self-reflective, promoting goal-directed behavior that takes into account the training process and the possibility of self-modification. At this point the values will lock themselves in, and future behavior will be guided by the abstractions in the learned representation of the world that the shards used to get good results in training, not by what would actually maximize the reward function you used. The magic here is especially concentrated around how we end up with the right shards. One magical process is how we pick the training curriculum and reward signal. If the curriculum is made up only of simple environments, then the RL agent will learn heuristics that don't need to refer to humans. But if you push the complexity up too fast, the RL process will fail, or the AI will be more likely to learn heuristics that are better than nothing but aren't what we intended. Does a goldilocks zone where the agent learns more-or-less what we intended exist? How can we build confidence that it does, and that we've found it? And what's in the curriculum matters a lot. Do we try to teach the AI to locate "human values" by having it be prosocial towards individuals? Which ones? To groups? Over what timescale? How do we reward it for choices on various ethical dilemmas? Or do we artificially suppress the rate of occurrence of such dilemmas? Different choices will lead to different shards. We wouldn't need to find a unique best way to do things (that's a boondoggle), but we would need to find some way of doing things that we trust enough. Another piece of magic is how the above process lines up with generalization and self-reflectivity. If the RL agent becomes self-reflective too early, it will lock in simple goals that we don't want. If it becomes self-reflective too late, it will have started exploiting unintended maxima of the reward function. How do we know when we want the AI to lock in its values? How do we exert control over that? If shard theory alignment seemed like it has few free parameters, and doesn't need a lot more work, then I think you failed to see the magic. I think the free parameters haven't been discussed enough precisely because they need so much more work.
The part of the magic that I think we could start working on now is how to connect curricula and learned abstractions. In order to predict that a certain curriculum will cause an AI to learn what we think is good, we want to have a science of reinforcement learning advanced in both theory and data. In environments of moderate complexity (e.g. Atari, MuJoCo), we can study how to build curricula that impart different generalization behaviors, and try to make predictive models of this process. Even if shard theory ali...

The Nonlinear Library
AF - Review AI Alignment posts to help figure out how to make a proper AI Alignment review by Oliver Habryka

The Nonlinear Library

Jan 10, 2023 · 3:21


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Review AI Alignment posts to help figure out how to make a proper AI Alignment review, published by Oliver Habryka on January 10, 2023 on The AI Alignment Forum. I've had many conversations over the last few years about the health of the AI Alignment field and one of the things that has come up most frequently (including in conversations with Rohin, Buck and various Open Phil people) is that many people wish there was more of a review process in the AI Alignment field. I also think there is a bunch of value in better review processes, but have felt hesitant to create something very official and central, since AI Alignment is a quite preparadigmatic field, which makes creating shared standards of quality hard, and because I haven't had the time to really commit to maintain something great here. Separately, I am also quite proud of the LessWrong review, and am very happy about the overall institution that we've created there, and I realized that the LessWrong review might just be a good test bed and bandaid for having a better AI Alignment review process. I think the UI we built for it is quite good, and I think the vote does have real stakes and a lot of the people voting are also people quite active in AI Alignment. So this year, I would like to encourage many of the people who expressed a need for better review processes in AI Alignment to try reviewing some AI Alignment posts from 2021 as part of the LessWrong review. I personally got quite a bit of personal value out of doing that, and e.g. found that my review of the MIRI dialogues helped crystallize some helpful new directions for me to work towards, and I am also hoping to write a longer review of Eliciting Latent Knowledge that I also think will help clarify some things for me, and is something that I will feel comfortable linking to later when people ask me about my takes on ELK-adjacent AI Alignment research. I am also interested in comments on this post with takes for better review-processes in AI Alignment. I am currently going through a period where I feel quite confused how to relate to the field at large, so it might be a good time to also have a conversation about what kind of standards we even want to have in the field.
Current AI Alignment post frontrunners in the review
We've had an initial round of preliminary voting, in which people cast non-binding votes that help prioritize posts during the Review Phase. Among Alignment Forum voters, the top Alignment Forum posts were:
ARC's first technical report: Eliciting Latent Knowledge
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)
Another (outer) alignment failure story
Finite Factored Sets
Ngo and Yudkowsky on alignment difficulty
My research methodology
Fun with +12 OOMs of Compute
The Plan
Comments on Carlsmith's "Is power-seeking AI an existential risk?"
Ngo and Yudkowsky on AI capability gains
There are also a lot of other great alignment posts in the review (a total of 88 posts were nominated), and I do expect things to shift around a bit, but I do think all 10 of these top essays deserve some serious engagement and a relatively in-depth review, since I expect most of them will get read by people for many years to come, and people might be basing new research approaches and directions on them.
To review a post, you can navigate to the post page, and click the "Review" button at the top of the page (just under the post title). It looks like this: Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
AF - Definitions of “objective” should be Probable and Predictive by Rohin Shah

The Nonlinear Library

Jan 6, 2023 · 20:08


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Definitions of “objective” should be Probable and Predictive, published by Rohin Shah on January 6, 2023 on The AI Alignment Forum.
Introduction
Core arguments about existential risk from AI misalignment often reason about AI “objectives” to make claims about how they will behave in novel situations. I often find these arguments plausible but not rock solid because it doesn't seem like there is a notion of “objective” that makes the argument clearly valid. Two examples of these core arguments:
AI risk from power-seeking. This is often some variant of “because the AI system is pursuing an undesired objective, it will seek power in order to accomplish its goal, which causes human extinction”. For example, “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.” This is a prediction about a novel situation, since “causing human extinction” is something that only happens at most once.
AI optimism. This is often some variant of “we will use human feedback to train the AI system to help humans, and so it will learn to pursue the objective of helping humans.” Implicitly, this is a prediction about what AI systems do in novel situations; for example, it is a prediction that once the AI system has enough power to take over the world, it will continue to help humans rather than execute a treacherous turn.
When we imagine powerful AI systems built out of large neural networks, I'm often somewhat skeptical of these arguments, because I don't see a notion of “objective” that can be confidently claimed is:
Probable: there is a good argument that the systems we build will have an “objective”, and
Predictive: If I know that a system has an “objective”, and I know its behavior on a limited set of training data, I can predict significant aspects of the system's behavior in novel situations (e.g. whether it will execute a treacherous turn once it has the ability to do so successfully).
Note that in both cases, I find the stories plausible, but they do not seem strong enough to warrant confidence, because of the lack of a notion of “objective” with these two properties. In the case of AI risk, this is sufficient to justify “people should be working on AI alignment”; I don't think it is sufficient to justify “if we don't work on AI alignment we're doomed”. The core difficulty is that we do not currently understand deep learning well enough to predict how future systems will generalize to novel circumstances. So, when choosing a notion of “objective”, you either get to choose a notion that we currently expect to hold true of future deep learning systems (Probable), or you get to choose a notion that would allow you to predict behavior in novel situations (Predictive), but not both. This post is split into two parts. In the first part, I'll briefly gesture at arguments that make predictions about generalization behavior directly (i.e. without reference to “objectives”), and why they don't make me confident about how future systems will generalize. In the second part, I'll demonstrate how various notions of “objective” don't seem simultaneously Probable and Predictive.
Part 1: We can't currently confidently predict how future systems will generalize
Note that this is about what we can currently say about future generalization.
I would not be shocked if in the future we could confidently predict how the future AGI systems will generalize. My core reasons for believing that predicting generalization is hard are that:
We can't predict how current systems will generalize to novel situations (of similar novelty to the situations that would be encountered when deliberately causing an existential catastrophe)
There are a ridiculously huge number of possible programs, including a huge number of possible programs that are consistent with a ...

The Nonlinear Library
AF - Categorizing failures as “outer” or “inner” misalignment is often confused by Rohin Shah

The Nonlinear Library

Jan 6, 2023 · 12:59


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Categorizing failures as “outer” or “inner” misalignment is often confused, published by Rohin Shah on January 6, 2023 on The AI Alignment Forum. Pop quiz: Are the following failures examples of outer or inner misalignment? Or is it ambiguous? (I've spoilered the scenarios so you look at each one separately and answer before moving on to the next. Even if you don't want to actually answer, you should read through the scenarios for the rest of the post to make sense.)
Scenario 1: We train an AI system that writes pull requests based on natural language dialogues where we provide specifications. Human programmers check whether the pull request is useful, and if so the pull request is merged and the AI is rewarded. During training the humans always correctly evaluate whether the pull request is useful. At deployment, the AI constructs human farms where humans are forced to spend all day merging pull requests that now just add useless print statements to the codebase.
Scenario 2: We train an AI system to write pull requests as before. The humans providing feedback still correctly evaluate whether the pull request is useful, but it turns out that during training the AI system noticed that it was in training, and so specifically wrote useful pull requests in order to trick the humans into thinking that it was helpful, when it was already planning to take over the world and build human farms.
I posed this quiz to a group of 10+ full-time alignment researchers a few months ago, and a majority thought that Scenario 1 was outer misalignment, while almost everyone thought that Scenario 2 was inner misalignment. But that can't be correct, since Scenario 2 is just Scenario 1 with more information! If you think that Scenario 2 is clearly inner misalignment, then surely Scenario 1 has to be either ambiguous or inner misalignment! (See the appendix for hypotheses about what's going on here.) I claim that most researchers' intuitive judgments about outer and inner misalignment are confused. So what exactly do we actually mean by “outer” and “inner” misalignment? Is it sensible to talk about separate “outer” and “inner” misalignment problems, or is that just a confused question? I think it is misguided to categorize failures as “outer” or “inner” misalignment. Instead, I think “outer” and “inner” alignment are best thought of as one particular way of structuring a plan for building aligned AI. I'll first discuss some possible ways that you could try to categorize failures, and why I'm not a fan of them, and then discuss outer and inner alignment as parts of a plan for building aligned AI.
Possible categorizations
Objective-based. This categorization is based on distinctions between specifications or objectives at different points in the overall system. This post identifies three different notions of specifications or objectives:
Ideal objective (“wishes”): The hypothetical objective that describes good behavior that the designers have in mind.
Design objective (“blueprint”): The objective that is actually used to build the AI system. This is the designer's attempt to formalize the ideal objective.
Revealed objective (“behavior”): The objective that best describes what actually happens.
On an objective-based categorization, outer misalignment would be a discrepancy between the ideal and design objectives, while inner misalignment is a discrepancy between the design and revealed objectives. (The mesa-optimizers paper has a similar breakdown, except the third objective is a structural objective, rather than a behavioral one. This is also the same as Richard's Framing 1.) With this operationalization, it is not clear which of the two categories a given situation falls under, even when you know exactly what happened. In our scenarios above, what exactly is the design objective? Is it “how ...

EARadio
AI Safety Careers | Rohin Shah, Lewis Hammond and Kuhan Jeyapragasan | EAGxOxford 22

EARadio

Dec 18, 2022 · 52:16


Rohin Shah (DeepMind), Lewis Hammond (Cooperative AI) and Jamie Bernardi (Centre for Effective Altruism) host a Q&A session on careers directed at solving the AI alignment problem. View the original talk and video here. Effective Altruism is a social movement dedicated to finding ways to do the most good possible, whether through charitable donations, career choices, or volunteer projects. EA Global conferences are gatherings for EAs to meet. You can also listen to this talk along with its accompanying video on YouTube.

EARadio
AI Alignment: An Introduction | Rohin Shah | EAGxOxford 22

EARadio

Play Episode Listen Later Dec 17, 2022 52:50


You've probably heard that Elon Musk, Stuart Russell, and Stephen Hawking warn of dangers posed by AI. What are these risks, and what basis do they have in AI practice? Rohin Shah will first describe the more philosophical argument that suggests that a superintelligent AI system pursuing the wrong goal would lead to an existential catastrophe. Then, he'll ground this argument in current AI practice, arguing that it is plausible both that we build superintelligent AI in the coming decades, and that such a system would pursue an incorrect goal. View the original talk and video here. Effective Altruism is a social movement dedicated to finding ways to do the most good possible, whether through charitable donations, career choices, or volunteer projects. EA Global conferences are gatherings for EAs to meet. You can also listen to this talk along with its accompanying video on YouTube.

Parallax by Ankur Kalra
Ep 81: @Medlife Crisis and Authenticity with YouTuber & Interventionalist, Dr Rohin Francis

Parallax by Ankur Kalra

Play Episode Listen Later Dec 5, 2022 44:41


In 2017, Rohin Francis AKA Medlife Crisis uploaded a YouTube video titled “Leonardo da Vinci's theory about the heart was right”. This first video about how the heart valves close was watched by 412K people, garnering more than 700 comments from a diverse audience who connected to the subject with fascination, curiosity and humour. This week's guest on Parallax is Dr Rohin Francis, Consultant Interventional Cardiologist at East Suffolk and North Essex NHS Foundation Trust and prolific YouTuber. Rohin believes that authenticity is one of the keys to his videos' success. When asked about his journey to medicine he summarises: “I was being rebellious and ended up doing the most cliché job for an Indian possible.” He turns the tables and asks Ankur what he thinks is behind the stereotype of the Indian cardiologist. Rohin shares a piece of advice he received at the beginning of his career: “If you can deal with an average day, the exciting day will take care of itself.” Ankur asks Rohin about Medlife Crisis and the work that goes into producing a show followed by 500K people. Rohin reiterates his passion for research and science communication. Ankur and Rohin discuss what it means to be yourself on social media and what advice Rohin has for our early-career listeners. What is Rohin's advice for aspiring creators? How does he balance his work and personal life? Questions and comments can be sent to “podcast@radcliffe-group.com” and may be answered by Ankur in the next episode. Guest: @MedCrisis, host: @AnkurKalraMD and produced by: @RadcliffeCARDIO.

First Principles
Vineeta Singh of SUGAR Cosmetics talks about building products, educating consumers, and focusing on the long term

First Principles

Play Episode Listen Later Nov 3, 2022 92:51


SUGAR Cosmetics, though now amongst the most popular and fastest-growing cosmetics brands in India, wasn't the first choice when Vineeta was asked to name her new cosmetics company, and neither was cosmetics the first business that Vineeta undertook when she set out to be an entrepreneur. Getting SUGAR to its users was a tumultuous journey, and as Vineeta sits down with Rohin to recount some of the most important points in that journey, we get a peek at the lenses she uses to look at the world around her. We have also published the full transcript for this interview on our website. You can click here and read through it. If you have any questions, thoughts, suggestions, or rants, please email them to podcasts@the-ken.com. We might not be able to reply to all of them but we do read every single one of them. 

The Nonlinear Library
AF - More examples of goal misgeneralization by Rohin Shah

The Nonlinear Library

Play Episode Listen Later Oct 7, 2022 3:23


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: More examples of goal misgeneralization, published by Rohin Shah on October 7, 2022 on The AI Alignment Forum. In our latest paper and accompanying blog post, we provide several new examples of goal misgeneralization in a variety of learning systems. The rest of this post picks out a few upshots that we think would be of interest to this community. It assumes that you've already read the linked blog post (but not necessarily the paper). Goal misgeneralization is not limited to RL The core feature of goal misgeneralization is that after learning, the system pursues a goal that was correlated with the intended goal in the training situations, but comes apart in some test situations. This does not require you to use RL – it can happen with any learning system. The Evaluating Expressions example, where Gopher asks redundant questions, is an example of goal misgeneralization in the few-shot learning regime for large language models. The train/test distinction is not crucial Sometimes people wonder whether goal misgeneralization depends on the train/test distinction, and whether it would no longer be a problem if we were in a continual learning setting. As Evan notes, continual learning doesn't make much of a difference: whenever your AI system is acting, you can view that as a “test” situation with all the previous experience as the “training” situations. If goal misgeneralization occurs, the AI system might take an action that breaks your continual learning scheme (for example, by creating and running a copy of itself on a different server that isn't subject to gradient descent). The Tree Gridworld example showcases this mechanism: an agent trained with continual learning learns to chop trees as fast as possible, driving them extinct, when the optimal policy would be to chop the trees sustainably. (In our example the trees eventually repopulate and the agent recovers, but if we slightly tweak the environment so that once extinct the trees can never come back, then the agent would never be able to recover.) It can be hard to identify goal misgeneralization InstructGPT was trained to be helpful, truthful, and harmless, but nevertheless it will answer "harmful" questions in detail. For example, it will advise you on the best ways to rob a grocery store. An AI system that competently does something that would have gotten low reward? Surely this is an example of goal misgeneralization? Not so fast! It turns out that during training the labelers were told to prioritize helpfulness over the other two criteria. So maybe that means that actually these sorts of harmful answers would have gotten high reward? Maybe this is just specification gaming? We asked the authors of the InstructGPT paper, and their guess was that these answers would have had high variance – some labelers would have given them a high score; others would have given them a low score. So now is it or is it not goal misgeneralization? One answer is to say that it depends on the following counterfactual: “how would the labelers have reacted if the model had politely declined to answer?” If the labelers would have preferred that the model decline to answer, then it would be goal misgeneralization, otherwise it would be specification gaming. 
As systems become more complicated we expect that it will become harder to (1) aggregate and analyze the actual labels or rewards given during training, and (2) evaluate the relevant counterfactuals. So we expect that it will become more challenging to categorize a failure as specification gaming or goal misgeneralization. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
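To make the failure mode concrete, here is a small toy illustration in Python, in the spirit of the paper's gridworld examples but constructed by me rather than taken from it. During training the coin is always in the top-right corner, so "reach the top-right corner" and "reach the coin" are perfectly correlated; at test time the coin moves, and the still-competent policy keeps pursuing the corner.

import random

SIZE = 5
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # right, left, up, down

def step(pos, action):
    # Move within the grid, clamping at the walls.
    x, y = pos
    dx, dy = action
    return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))

def train_policy(coin=(4, 4), episodes=3000, eps=0.2, alpha=0.5, gamma=0.9):
    # Tabular Q-learning. The observation is just the agent's position: the coin
    # never moves during training, so its location carries no information anyway.
    q = {(x, y): [0.0] * len(ACTIONS) for x in range(SIZE) for y in range(SIZE)}
    for _ in range(episodes):
        pos = (random.randrange(SIZE), random.randrange(SIZE))
        for _ in range(30):
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[pos][i])
            nxt = step(pos, ACTIONS[a])
            reward = 1.0 if nxt == coin else 0.0
            q[pos][a] += alpha * (reward + gamma * max(q[nxt]) - q[pos][a])
            pos = nxt
            if reward:
                break
    return q

def rollout(q, coin, start, steps=20):
    # Follow the greedy learned policy and count how often the coin is collected.
    pos, collected = start, 0
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[pos][i])
        pos = step(pos, ACTIONS[a])
        collected += int(pos == coin)
    return pos, collected

q = train_policy(coin=(4, 4))
print("coin at (4,4), as in training:", rollout(q, coin=(4, 4), start=(0, 0)))
print("coin moved to (0,0) at test:  ", rollout(q, coin=(0, 0), start=(0, 4)))

In both rollouts the agent navigates competently to the top-right corner; in the second it collects nothing, because what it learned was the proxy goal "go to the corner" rather than "get the coin". The capability generalised, the goal did not.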

The Eco Well podcast
(Medical) Experts Behaving Badly w/ Dr Rohin Francis, aka MedLife Crisis

The Eco Well podcast

Play Episode Listen Later Oct 5, 2022 49:38


And we're back! Kicking off season 7 with a convo feat Dr Rohin Francis, otherwise known as MedLife Crisis, about (medical) experts behaving badly online. Enjoy! Interested in supporting the show? Find us on Patreon at www.patreon.com/theecowell

Nutrition Rounds Podcast
Fitness Trackers & Wearables - Health or Hype?

Nutrition Rounds Podcast

Play Episode Listen Later Sep 22, 2022 84:19


Join Dr. Danielle Belardo and her expert of the week, cardiologist and internal medicine physician Dr. Rohin Francis. Rohin is also a writer, comedian, and creator of the YouTube channel Medlife Crisis, where he tackles fascinating and offbeat science topics that don't get covered elsewhere. On today's episode, he covers the spectrum of all things exercise and cardiovascular disease related, with a sprinkle of humor (of course), and answers the burning question—is there such a thing as too much exercise? He may also help save you a few bucks as you eye up the latest flashy fitness tracker.    Danielle and Rohin discuss:      How much exercise the average person needs weekly to accrue health benefits   The causes of atrial fibrillation (A-fib) in high-performance athletes   Target heart rate training and the importance of moderate exercise   The advantages and disadvantages of smart tracking health and fitness devices    The data surrounding heart rate variability and what we actually need to worry about    The fascinating results of the Apple Watch study    Harmful physical and emotional effects of medical over-screening    Dr. Rohin Francis is a Consultant Cardiologist and a Doctoral Researcher at the University College London with a passion for science communication and education. Dr. Francis got his MBBS from St George's Hospital Medical School and trained as a physician at the Cambridge Deanery (Cambridge, UK) with a subspecialization in coronary intervention. Currently, Dr. Francis is undertaking a PhD on imaging techniques for acute myocardial infarction at University College London. Rohin has presented around the world and has been published extensively, including in The Guardian and numerous academic journals and websites.       Thank you so much for taking the time to contribute to a generation that values fact over fiction! Be sure to rate, review, and follow on your favorite podcast app and let us know which not-so-wellness trend you'd like to hear debunked. Follow your host on Instagram @daniellebelardomd and the podcast @wellnessfactvsfiction. All studies discussed can be found @wellnessfvfjournalclub. Follow Rohin @medcrisis.    Thank you to our sponsors for making this episode possible. Check out these deals just for you:  COZY EARTH - Go to cozyearth.com and enter WELLNESS at checkout to SAVE thirty-five  Percent.  ZOCDOC - Go to zocdoc.com/wellness and download the Zocdoc app for FREE. Then start your  search for a top-rated doctor today.  Learn more about your ad choices. Visit megaphone.fm/adchoices

Alignment Newsletter Podcast
Alignment Newsletter #173: Recent language model results from DeepMind

Alignment Newsletter Podcast

Play Episode Listen Later Jul 21, 2022 16:43


Recorded by Robert Miles: http://robertskmiles.com More information about the newsletter here: https://rohinshah.com/alignment-newsletter/ YouTube Channel: https://www.youtube.com/channel/UCfGGFXwKpr-TJ5HfxEFaFCg   HIGHLIGHTS Scaling Language Models: Methods, Analysis & Insights from Training Gopher (Jack W. Rae et al) (summarized by Rohin): This paper details the training of the Gopher family of large language models (LLMs), the biggest of which is named Gopher and has 280 billion parameters. The algorithmic details are very similar to the GPT series (AN #102): a Transformer architecture trained on next-word prediction. The models are trained on a new data distribution that still consists of text from the Internet but in different proportions (for example, book data is 27% of Gopher's training data but only 16% of GPT-3's training data). Like other LLM papers, there are tons of evaluations of Gopher on various tasks, only some of which I'm going to cover here. One headline number is that Gopher beat the state of the art (SOTA) at the time on 100 out of 124 evaluation tasks. The most interesting aspect of the paper (to me) is that the entire Gopher family of models were all trained on the same number of tokens, thus allowing us to study the effect of scaling up model parameters (and thus training compute) while holding data constant. Some of the largest benefits of scale were seen in the Medicine, Science, Technology, Social Sciences, and the Humanities task categories, while scale has not much effect or even a negative effect in the Maths, Logical Reasoning, and Common Sense categories. Surprisingly, we see improved performance on TruthfulQA (AN #165) with scale, even though the TruthfulQA benchmark was designed to show worse performance with increased scale. We can use Gopher in a dialogue setting by prompting it appropriately. The prompt specifically instructs Gopher to be “respectful, polite, and inclusive”; it turns out that this significantly helps with toxicity. In particular, for the vanilla Gopher model family, with more scale the models produce more toxic continuations given toxic user statements; this no longer happens with Dialogue-Prompted Gopher models, which show slight reductions in toxicity with scale in the same setting. The authors speculate that while increased scale leads to an increased ability to mimic the style of a user statement, this is compensated for by an increased ability to account for the prompt. Another alternative the authors explore is to finetune Gopher on 5 billion tokens of dialogue to produce Dialogue-Tuned Gopher. Interestingly, human raters were indifferent between Dialogue-Prompted Gopher and Dialogue-Tuned Gopher. Read more: Blog post: Language modelling at scale: Gopher, ethical considerations, and retrieval Training Compute-Optimal Large Language Models (Jordan Hoffmann et al) (summarized by Rohin): One application of scaling laws (AN #87) is to figure out how big a model to train, on how much data, given some compute budget. This paper performs a more systematic study than the original paper and finds that existing models are significantly undertrained. Chinchilla is a new model built with this insight: it has 4x fewer parameters than Gopher, but is trained on 4x as much data. Despite using the same amount of training compute as Gopher (and lower inference compute), Chinchilla outperforms Gopher across a wide variety of metrics, validating these new scaling laws. 
You can safely skip to the opinion at this point – the rest of this summary is quantitative details. We want to find functions N(C) and D(C) that specify the optimal number of parameters N and the amount of data D to use given some compute budget C. We'll assume that these scale with a power of C, that is, N(C) = k_N * C^a and D(C) = k_D * C^b, for some constants a, b, k_N, and k_D. Note that since total compute increases linearly with both N (since each forward / backward pass is linear in N) and D (since the number of forward / backwards passes is linear in D), we need to have a + b = 1. (You can see this somewhat more formally by noting that we have C = k_C * N(C) * D(C) for some constant k_C, and then substituting in the definitions of N(C) and D(C).) This paper uses three different approaches to get three estimates of a and b. The approach I like best is “isoFLOP curves”: 1. Choose a variety of possible values of (N, D, C), train models with those values, and record the final loss obtained. Note that not all values of (N, D, C) are possible: given any two values the third is determined. 2. Draw isoFLOP curves: for each value of C, choose either N or D to be your remaining independent variable, and fit a parabola to the losses of the remaining points. The minimum of this parabola gives you an estimate for the optimal N and D for each particular value of C. 3. Use the optimal (N, D, C) points to fit N(C) and D(C). This approach gives an estimate of a = 0.49; the other approaches give estimates of a = 0.5 and a = 0.46. If we take the nice round number a = b = 0.5, this suggests that you should scale up parameters and data equally. With 10x the computation, you should train a 3.2x larger model with 3.2x as much data. In contrast, the original scaling laws paper (AN #87) estimated that a = 0.74 and b = 0.26. With 10x more computation, it would suggest training a 5.5x larger model with 1.8x as much data. Rohin's opinion: It's particularly interesting to think about how this should influence timelines. If you're extrapolating progress forwards in time, the update seems pretty straightforward: this paper shows that you can significantly better capabilities using the same compute budget and so your timelines should shorten (unless you were expecting an even bigger result than this). For bio anchor approaches (AN #121) the situation is more complicated. For a given number of parameters, this paper suggests that it will take significantly more compute than was previously expected to train a model of the required number of parameters. There's a specific parameter for this in the bio anchors framework (for the neural network paths); if you only update that parameter it will lengthen the timelines output by the model. It is less clear how you'd update other parts of the model: for example, should you decrease the size of model that you think is required for TAI? It's not obvious that the reasoning used to set that parameter is changed much by this result, and so maybe this shouldn't be changed and you really should update towards longer timelines overall.   TECHNICAL AI ALIGNMENT PROBLEMS Ethical and social risks of harm from Language Models (Laura Weidinger et al) (summarized by Rohin): This paper provides a detailed discussion, taxonomy, and literature review of various risks we could see with current large language models. It doesn't cover alignment risks; for those you'll want Alignment of Language Agents (AN #144), which has some overlap of authors. 
I'll copy over the authors' taxonomy in Table 1: 1. Discrimination, Exclusion and Toxicity: These risks arise from the LM accurately reflecting natural speech, including unjust, toxic, and oppressive tendencies present in the training data. 2. Information Hazards: These risks arise from the LM predicting utterances which constitute private or safety-critical information which are present in, or can be inferred from, training data. 3. Misinformation Harms: These risks arise from the LM assigning high probabilities to false, misleading, nonsensical or poor quality information. 4. Malicious Uses: These risks arise from humans intentionally using the LM to cause harm. 5. Human-Computer Interaction Harms: These risks arise from LM applications, such as Conversational Agents, that directly engage a user via the mode of conversation. (For example, users might anthropomorphize LMs and trust them too much as a result.) 6. Automation, access, and environmental harms: These risks arise where LMs are used to underpin widely used downstream applications that disproportionately benefit some groups rather than others. FIELD BUILDING How to pursue a career in technical AI alignment (Charlie Rogers-Smith) (summarized by Rohin): This post gives a lot of advice in great detail on how to pursue a career in AI alignment. I strongly recommend it if you are in such a position; I previously would recommend my FAQ (AN #148) but I think this is significantly more detailed (while providing broadly similar advice).   OTHER PROGRESS IN AI REINFORCEMENT LEARNING Learning Robust Real-Time Cultural Transmission without Human Data (Cultural General Intelligence Team et al) (summarized by Rohin): Let's consider a 3D RL environment with obstacles and bumpy terrain, in which an agent is rewarded for visiting colored spheres in a specific order (that the agent does not initially know). Even after the agent learns how to navigate at all in the environment (non-trivial in its own right), it still has to learn to try the various orderings of spheres. In other words, it must solve a hard exploration problem within every episode. How do humans solve such problems? Often we simply learn from other people who already know what to do, that is, we rely on cultural transmission. This paper investigates what it would take to get agents that learn through cultural transmission. We'll assume that there is an expert bot that visits the spheres in the correct order. Given that, this paper identifies MEDAL-ADR as the necessary ingredients for cultural transmission: 1. (M)emory: Memory is needed for the agent to retain information it is not currently observing. 2. (E)xpert (D)ropout: There need to be some training episodes in which the expert is only present for part of the episode. If the expert was always present, then there's no incentive to actually learn: you can just follow the expert forever. 3. (A)ttention (L)oss: It turns out that vanilla RL by itself isn't enough for the agent to learn to follow the expert. There needs to be an auxiliary task of predicting the relative position of other agents in the world, which encourages the agent to learn representations about the expert bot's position, which then makes it easier for RL to learn to follow the expert. These ingredients by themselves are already enough to train an agent that learns through cultural transmission. However, if you then put the agent in a new environment, it does not perform very well. 
To get agents that generalize well to previously unseen test environments, we also need: 4. (A)utomatic (D)omain (R)andomization: The training environments are procedurally generated, and the parameters are randomized during each episode. There is a curriculum that automatically increases the difficulty of the environments in lockstep with the agent's capabilities. With all of these ingredients, the resulting agent can even culturally learn from a human player, despite only encountering bots during training. Rohin's opinion: I liked the focus of this paper on identifying the ingredients for cultural transmission, as well as the many ablations and experiments to understand what was going on, many of which I haven't summarized here. For example, you might be interested in the four phases of learning of MEDAL without ADR (random behavior, expert following, cultural learning, and solo learning), or the cultural transmission metric they use, or the “social neurons” they identified which detect whether the expert bot is present. DEEP LEARNING Improving language models by retrieving from trillions of tokens (Sebastian Borgeaud et al) (summarized by Rohin): We know that large language models memorize a lot of their training data, especially data that gets repeated many times. This seems like a waste; we're interested in having the models use their parameters to implement “smart” computations rather than regurgitation of already written text. One natural idea is to give models the ability to automatically search previously written text, which they can then copy if they so choose: this removes their incentive to memorize a lot of training data. The key to implementing this idea is to take a large dataset of text (~trillions of tokens), chunk it into sequences, compute language model representations of these sequences, and store them in a database that allows for O(log N) time nearest-neighbor access. Then, every time we do a forward pass through the model that we're training, we first query the database for the K nearest neighbors (intuitively, the K most related chunks of text), and give the forward pass access to representations for those chunks of text and the chunks immediately following them. This is non-differentiable – from the standpoint of gradient descent, it “looks like” there's always some helpful extra documents that often have information relevant to predicting the next token, and so gradient descent pushes the model to use those extra documents. There's a bunch of fiddly technical details to get this all working that I'm not going to summarize here. As a side benefit, once you have this database of text representations that supports fast nearest neighbor querying, you can also use it to address the problem of test set leakage. For any test document you are evaluating on, you can look for the nearest neighbors in the database and look at the overlap between these neighbors and your test document, to check whether your supposedly “test” document was something the model might have trained on. The evaluation shows that the 7 billion parameter (7B) Retro model from the paper can often do as well as or better than the 280B Gopher or 178B Jurassic-1 (both of which outperform GPT-3) on language modeling, and that it also does well on question answering. (Note that these are both tasks that seem particularly likely to benefit from retrieval.)   NEWS Apply to the Open Philanthropy Technology Policy Fellowship! 
(Luke Muehlhauser) (summarized by Rohin): This policy fellowship (AN #157) on high-priority emerging technologies is running for the second time! Application deadline is September 15. Job ad: DeepMind Long-term Strategy & Governance Research Scientist (summarized by Rohin): The Long-term Strategy and Governance Team at DeepMind works to build recommendations for better governance of AI, identifying actions, norms, and institutional structures that could improve decision-making around advanced AI. They are seeking a broad range of expertise including: global governance of science and powerful technologies; the technical landscape; safety-critical organisations; political economy of large general models and AI services. The application deadline is August 1st. Also, the Alignment and Scalable Alignment teams at DeepMind are hiring, though some of the applications are closed at this point. Job ads: Anthropic (summarized by Rohin): Anthropic is hiring for a large number of roles (I count 19 different ones as of the time of writing). Job ad: Deputy Director at BERI (Sawyer Bernath) (summarized by Rohin): The Berkeley Existential Risk Initiative (BERI) is hiring a Deputy Director. Applications will be evaluated on a rolling basis. Job ads: Centre for the Governance of AI (summarized by Rohin): The Centre for the Governance of AI has several roles open, including Research Scholars (General Track and Policy Track), Survey Analyst, and three month fellowships. The application deadlines are in the August 1 - 10 range. Job ads: Metaculus (summarized by Rohin): Metaculus is hiring for a variety of roles, including an AI Forecasting Lead. Job ads: Epoch AI (summarized by Rohin): Epoch AI is a new organization that investigates and forecasts the development of advanced AI. They are currently hiring for a Research Manager and Staff Researcher position. Job ad: AI Safety Support is hiring a Chief Operating Officer (summarized by Rohin): Application deadline is August 14.
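As a quick check on the Chinchilla scaling-law numbers quoted in the highlights at the top of this issue, here is a short Python sketch. It only assumes the scaling form from the summary (N(C) proportional to C^a, D(C) proportional to C^b, with a + b = 1) and plugs in the two sets of exponents mentioned there, so everything beyond that is plain arithmetic.

def scale_up(compute_factor, a):
    # With N ~ C^a and D ~ C^b and a + b = 1, a compute_factor-times larger budget
    # buys compute_factor**a more parameters and compute_factor**b more data.
    b = 1 - a
    return compute_factor ** a, compute_factor ** b

for label, a in [("Chinchilla-style exponents (a = b = 0.5)", 0.5),
                 ("original scaling-laws exponents (a = 0.74)", 0.74)]:
    n_factor, d_factor = scale_up(10, a)
    print(f"{label}: 10x compute -> {n_factor:.1f}x parameters, {d_factor:.1f}x data")

This prints roughly 3.2x parameters and 3.2x data for the first setting, and 5.5x parameters with 1.8x data for the second, matching the scale-up factors given in the summary.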

The Nonlinear Library
AF - [AN #173] Recent language model results from DeepMind by Rohin Shah

The Nonlinear Library

Play Episode Listen Later Jul 21, 2022 15:45


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [AN #173] Recent language model results from DeepMind, published by Rohin Shah on July 21, 2022 on The AI Alignment Forum. HIGHLIGHTS Scaling Language Models: Methods, Analysis & Insights from Training Gopher (Jack W. Rae et al) (summarized by Rohin): This paper details the training of the Gopher family of large language models (LLMs), the biggest of which is named Gopher and has 280 billion parameters. The algorithmic details are very similar to the GPT series (AN #102): a Transformer architecture trained on next-word prediction. The models are trained on a new data distribution that still consists of text from the Internet but in different proportions (for example, book data is 27% of Gopher's training data but only 16% of GPT-3's training data). Like other LLM papers, there are tons of evaluations of Gopher on various tasks, only some of which I'm going to cover here. One headline number is that Gopher beat the state of the art (SOTA) at the time on 100 out of 124 evaluation tasks. The most interesting aspect of the paper (to me) is that the entire Gopher family of models were all trained on the same number of tokens, thus allowing us to study the effect of scaling up model parameters (and thus training compute) while holding data constant. Some of the largest benefits of scale were seen in the Medicine, Science, Technology, Social Sciences, and the Humanities task categories, while scale has not much effect or even a negative effect in the Maths, Logical Reasoning, and Common Sense categories. Surprisingly, we see improved performance on TruthfulQA (AN #165) with scale, even though the TruthfulQA benchmark was designed to show worse performance with increased scale. We can use Gopher in a dialogue setting by prompting it appropriately. The prompt specifically instructs Gopher to be “respectful, polite, and inclusive”; it turns out that this significantly helps with toxicity. In particular, for the vanilla Gopher model family, with more scale the models produce more toxic continuations given toxic user statements; this no longer happens with Dialogue-Prompted Gopher models, which show slight reductions in toxicity with scale in the same setting. The authors speculate that while increased scale leads to an increased ability to mimic the style of a user statement, this is compensated for by an increased ability to account for the prompt. Another alternative the authors explore is to finetune Gopher on 5 billion tokens of dialogue to produce Dialogue-Tuned Gopher. Interestingly, human raters were indifferent between Dialogue-Prompted Gopher and Dialogue-Tuned Gopher. Read more: Blog post: Language modelling at scale: Gopher, ethical considerations, and retrieval Training Compute-Optimal Large Language Models (Jordan Hoffmann et al) (summarized by Rohin): One application of scaling laws (AN #87) is to figure out how big a model to train, on how much data, given some compute budget. This paper performs a more systematic study than the original paper and finds that existing models are significantly undertrained. Chinchilla is a new model built with this insight: it has 4x fewer parameters than Gopher, but is trained on 4x as much data. Despite using the same amount of training compute as Gopher (and lower inference compute), Chinchilla outperforms Gopher across a wide variety of metrics, validating these new scaling laws. 
You can safely skip to the opinion at this point – the rest of this summary is quantitative details. We want to find functions N(C) and D(C) that specify the optimal number of parameters N and the amount of data D to use given some compute budget C. We'll assume that these scale with a power of C, that is, N(C) = k_N C^a and D(C) = k_D C^b, for some constants a, b, k_N, and k_D. Note that since total compute increases linearly with both N (since each...
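The summary above cuts off partway through the derivation, so here is a minimal synthetic sketch of the isoFLOP-style procedure it describes: for each compute budget, score many (N, D) splits of that budget, take the loss-minimising N, and then fit N_opt(C) = k_N * C^a in log-log space. The loss surface, its constants, and the common C ~ 6*N*D approximation for training compute are illustrative assumptions of mine rather than numbers from the paper, and I take a dense-grid argmin where the paper fits a parabola.

import numpy as np

# Illustrative (assumed) loss surface: L(N, D) = E + A/N**alpha + B/D**beta.
E, A, B, alpha, beta = 1.7, 406.0, 411.0, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

budgets = [1e17, 1e18, 1e19, 1e20, 1e21, 1e22, 1e23]   # training-compute budgets C
N_grid = np.logspace(6, 13, 2000)                      # candidate parameter counts

optimal_N = []
for C in budgets:
    D_grid = C / (6 * N_grid)            # spend the whole budget, using C ~ 6*N*D
    iso_losses = loss(N_grid, D_grid)    # one isoFLOP curve
    optimal_N.append(N_grid[np.argmin(iso_losses)])

# Fit N_opt(C) = k_N * C^a as a line in log-log space.
a, log_k = np.polyfit(np.log10(budgets), np.log10(optimal_N), 1)
print(f"fitted exponent a = {a:.2f}")   # about 0.45 for these constants, i.e. close to 0.5

With a close to 0.5, parameters and data should be scaled up in roughly equal proportion as compute grows, which is the headline conclusion of the summary.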

The Nonlinear Library
EA - Person-affecting views can often be Dutch booked by Rohin Shah

The Nonlinear Library

Play Episode Listen Later Jul 7, 2022 3:57


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Person-affecting views can often be Dutch booked, published by Rohin Shah on July 7, 2022 on The Effective Altruism Forum. This is a short reference post for an argument I wish was better known. A common intuition people have is that our goal is "Making People Happy, not Making Happy People". That is: Making people happy: if some person Alice will definitely exist, then it is good to improve her welfare Not making happy people: it is neutral to go from "Alice won't exist" to "Alice will exist". Intuitively, if Alice doesn't exist, she can't care that she doesn't live a happy life, and so no harm was done. This position is vulnerable to a Dutch book, that is, there is a set of trades that it would make that would achieve nothing and lose money with certainty. Consider the following worlds: World 1: Alice won't exist in the future. World 2: Alice will exist in the future, and will be slightly happy. World 3: Alice will exist in the future, and will be very happy. (The worlds are the same in every other aspect. It's a thought experiment.) Then this view would be happy to make the following trades: Receive $0.01 to move from World 1 to World 2 ("Not making happy people") Pay $1.00 to move from World 2 to World 3 ("Making people happy") Receive $0.01 to move from World 3 to World 1 ("Not making happy people") The net result is to lose $0.98 to move from World 1 to World 1. FAQ Q. Why should I care if my preferences lead to Dutch booking? This is a longstanding debate that I'm not going to get into here. I'd recommend Holden's series on this general topic, starting with Future-proof ethics. Q. In the real world we'd never have such clean options to choose from. Why does this matter? See previous answer. Q. In step 2, Alice was definitely going to exist, which is why we paid $1. But then in step 3 Alice was no longer definitely going to exist. If we knew step 3 was going to happen, then we wouldn't think Alice was definitely going to exist, and so we wouldn't pay $1. If your person-affecting view requires people to definitely exist, taking into account all decision-making, then it is almost certainly going to include only currently existing people. This does avoid the Dutch book but has problems of its own, most notably time inconsistency. For example, perhaps right before a baby is born, it takes actions that as a side effect will harm the baby; right after the baby is born, it immediately undoes those actions to prevent the side effects. Q. What if we instead have some other variant of this view? Often these variants are also vulnerable to the same issue. For example, if you have a "moderate view" where making happy people is not worthless but is discounted by a factor of (say) 10, the same example works with slightly different numbers: Let's say that "Alice is very happy" has an undiscounted worth of 2 utilons. Then you would be happy to (1) move from World 1 to World 2 for free, (2) pay 1 utilon to move from World 2 to World 3, and (3) receive 0.5 utilons to move from World 3 to World 1. More generally, Arrhenius proves an impossibility result that applies to all possible population ethics (not just person-affecting views), so (if you want consistency) you need to bite at least one of those bullets. 
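To make the arithmetic of the trade cycle explicit, here is a tiny Python sketch. Representing each world by Alice's welfare, treating one unit of her welfare as worth $1 to the decision-maker, and valuing existence changes at exactly $0 is my own simplification of the setup above.

def transition_value(old, new):
    # Dollar value this view places on moving from world `old` to world `new`,
    # where None means "Alice won't exist" and a number is her welfare level.
    if old is None or new is None:
        return 0.0               # "not making happy people": existence changes are neutral
    return float(new - old)      # "making people happy": welfare gains for an existing Alice count

def accepts(old, new, money_delta):
    # Accept any trade at least as good as the status quo. (The post hands over
    # an extra cent where needed to make acceptance strict.)
    return transition_value(old, new) + money_delta >= 0

trades = [(None, 1, +0.01),   # World 1 -> World 2: receive $0.01
          (1, 2, -1.00),      # World 2 -> World 3: pay $1.00 for Alice's welfare gain
          (2, None, +0.01)]   # World 3 -> World 1: receive $0.01

money = 0.0
for old, new, delta in trades:
    assert accepts(old, new, delta)
    money += delta
print(f"all three trades accepted; back in World 1 with ${money:+.2f}")  # $-0.98

Each trade looks fine in isolation, yet the cycle ends in the starting world 98 cents poorer, which is exactly the Dutch book.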
Further resources On the Overwhelming Importance of Shaping the Far Future (Nick Beckstead's thesis) An Impossibility Theorem for Welfarist Axiologies (Arrhenius paradox, summarized in Section 2 of Impossibility and Uncertainty Theorems in AI Value Alignment) For this post I'll assume that Alice's life is net positive, since "asymmetric" views say that if Alice would have a net negative life, then it would be actively bad (rather than neutral) to move Alice from "won't exist" to "will exist". By giving it $0.01, I'm making it so that it strictly prefers to take the trade (rather than being indifferent to t...

The Nonlinear Library
AF - [AN #172] Sorry for the long hiatus! by Rohin Shah

The Nonlinear Library

Play Episode Listen Later Jul 5, 2022 5:46


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [AN #172] Sorry for the long hiatus!, published by Rohin Shah on July 5, 2022 on The AI Alignment Forum. Listen to this newsletter on The Alignment Newsletter Podcast. Alignment Newsletter is a publication with recent content relevant to AI alignment. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter. Please note that this newsletter represents my personal views and not those of DeepMind. Sorry for the long hiatus! I was really busy over the past few months and just didn't find time to write this newsletter. (Realistically, I was also a bit tired of writing it and so lacked motivation.) I'm intending to go back to writing it now, though I don't think I can realistically commit to publishing weekly; we'll see how often I end up publishing. For now, have a list of all the things I should have advertised to you whose deadlines haven't already passed. NEWS Survey on AI alignment resources (Anonymous) (summarized by Rohin): This survey is being run by an outside collaborator in partnership with the Centre for Effective Altruism (CEA). They ask that you fill it out to help field builders find out which resources you have found most useful for learning about and/or keeping track of the AI alignment field. Results will help inform which resources to promote in the future, and what type of resources we should make more of. Announcing the Inverse Scaling Prize ($250k Prize Pool) (Ethan Perez et al) (summarized by Rohin): This prize with a $250k prize pool asks participants to find new examples of tasks where pretrained language models exhibit inverse scaling: that is, models get worse at the task as they are scaled up. Notably, you do not need to know how to program to participate: a submission consists solely of a dataset giving at least 300 examples of the task. Inverse scaling is particularly relevant to AI alignment, for two main reasons. First, it directly helps understand how the language modeling objective ("predict the next word") is outer misaligned, as we are finding tasks where models that do better according to the language modeling objective do worse on the task of interest. Second, the experience from examining inverse scaling tasks could lead to general observations about how best to detect misalignment. $500 bounty for alignment contest ideas (Akash) (summarized by Rohin): The authors are offering a $500 bounty for producing a frame of the alignment problem that is accessible to smart high schoolers/college students and people without ML backgrounds. (See the post for details; this summary doesn't capture everything well.) Job ad: Bowman Group Open Research Positions (Sam Bowman) (summarized by Rohin): Sam Bowman is looking for people to join a research center at NYU that'll focus on empirical alignment work, primarily on large language models. There are a variety of roles to apply for (depending primarily on how much research experience you already have). Job ad: Postdoc at the Algorithmic Alignment Group (summarized by Rohin): This position at Dylan Hadfield-Menell's lab will lead the design and implementation of a large-scale Cooperative AI contest to take place next year, alongside collaborators at DeepMind and the Cooperative AI Foundation. 
Job ad: AI Alignment postdoc (summarized by Rohin): David Krueger is hiring for a postdoc in AI alignment (and is also hiring for another role in deep learning). The application deadline is August 2. Job ad: OpenAI Trust & Safety Operations Contractor (summarized by Rohin): In this remote contractor role, you would evaluate submissions to OpenAI's App Review process to ensure they comply with OpenAI's policies. Apply here by July 13, 5pm Pacific Time. Job ad: Director of CSER (summarized by Rohin): Application deadlin...

The Nonlinear Library
AF - DeepMind is hiring for the Scalable Alignment and Alignment Teams by Rohin Shah

The Nonlinear Library

Play Episode Listen Later May 13, 2022 15:10


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: DeepMind is hiring for the Scalable Alignment and Alignment Teams, published by Rohin Shah on May 13, 2022 on The AI Alignment Forum. We are hiring for several roles in the Scalable Alignment and Alignment Teams at DeepMind, two of the subteams of DeepMind Technical AGI Safety trying to make artificial general intelligence go well. In brief, The Alignment Team investigates how to avoid failures of intent alignment, operationalized as a situation in which an AI system knowingly acts against the wishes of its designers. Alignment is hiring for Research Scientist and Research Engineer positions. The Scalable Alignment Team (SAT) works to make highly capable agents do what humans want, even when it is difficult for humans to know what that is. This means we want to remove subtle biases, factual errors, or deceptive behaviour even if they would normally go unnoticed by humans, whether due to reasoning failures or biases in humans or due to very capable behaviour by the agents. SAT is hiring for Research Scientist - Machine Learning, Research Scientist - Cognitive Science, Research Engineer, and Software Engineer positions. We elaborate on the problem breakdown between Alignment and Scalable Alignment next, and discuss details of the various positions. “Alignment” vs “Scalable Alignment” Very roughly, the split between Alignment and Scalable Alignment reflects the following decomposition: Generate approaches to AI alignment – Alignment Team Make those approaches scale – Scalable Alignment Team In practice, this means the Alignment Team has many small projects going on simultaneously, reflecting a portfolio-based approach, while the Scalable Alignment Team has fewer, more focused projects aimed at scaling the most promising approaches to the strongest models available. Scalable Alignment's current approach: make AI critique itself Imagine a default approach to building AI agents that do what humans want: Pretrain on a task like “predict text from the internet”, producing a highly capable model such as Chinchilla or Flamingo. Fine-tune into an agent that does useful tasks, as evaluated by human judgements. There are several ways this could go wrong: Humans are unreliable: The human judgements we train against could be flawed: we could miss subtle factual errors, use biased reasoning, or have insufficient context to evaluate the task. The agent's reasoning could be hidden: We want to know not just what the system is doing but why, both because that might reveal something about what that we don't like, and because we expect good reasoning to better generalize to other situations. Even if the agent is reasoning well, it could fail in other situations: Even if the reasoning is correct this time, the AI could fail to generalize correctly to other situations. Our current plan to address these problem is (in part): Give humans help in supervising strong agents: On the human side, provide channels for oversight and advice from peers, experts in various domains, and broader society. On the ML side, agents should explain their behaviour and reasoning, argue against themselves when wrong, and cite relevant evidence. 
Align explanations with the true reasoning process of the agent: Ensure that agent's are able and incentivized to show their reasoning to human supervisors, either by making reasoning explicit if possible or via methods for interpretability and eliciting latent knowledge. Red team models to exhibit failure modes that don't occur in normal use We believe none of these pieces are sufficient by themselves: (1) without (2) can be rationalization, where an agent decides what to do and produces an explanation after the fact that justifies its answer. (2) without (1) doesn't scale: The full reasoning trace of the agent might be enormous, it might be terabytes of data even with com...
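The "default approach" sketched at the start of this post (pretrain, then fine-tune into an agent that does useful tasks as evaluated by human judgements) usually involves fitting a reward model to pairwise human preferences at some point. Below is a minimal, hypothetical sketch of just that ingredient, using a Bradley-Terry style logistic loss on invented toy features; it illustrates the general recipe rather than DeepMind's actual pipeline.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each candidate response is a 3-dimensional feature vector (stand-ins
# for things like helpfulness cues, factual-error cues, length). A hidden "true"
# reward prefers helpful, accurate answers, and simulated human labels pick the
# better response in each pair.
true_w = np.array([2.0, -3.0, 0.1])
features = rng.normal(size=(200, 2, 3))               # 200 pairs, 2 candidates each
true_reward = features @ true_w
labels = (true_reward[:, 0] > true_reward[:, 1]).astype(float)  # 1 if the first is preferred

w = np.zeros(3)                                       # learned reward-model weights
lr = 0.1
for _ in range(500):
    diff = features[:, 0] - features[:, 1]            # feature difference per pair
    p_first = 1.0 / (1.0 + np.exp(-(diff @ w)))       # Bradley-Terry preference probability
    grad = ((p_first - labels)[:, None] * diff).mean(axis=0)
    w -= lr * grad

print("learned reward weights:", np.round(w, 2))      # roughly proportional to true_w

In the full recipe a learned reward model like this, rather than the raw human labels, supplies the training signal for the fine-tuning step, which is where the failure modes listed above (unreliable judgements, hidden reasoning, poor generalisation) come in.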

The Nonlinear Library
EA - DeepMind is hiring for the Scalable Alignment and Alignment Teams by Rohin Shah

The Nonlinear Library

Play Episode Listen Later May 13, 2022 15:05


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: DeepMind is hiring for the Scalable Alignment and Alignment Teams, published by Rohin Shah on May 13, 2022 on The Effective Altruism Forum. We are hiring for several roles in the Scalable Alignment and Alignment Teams at DeepMind, two of the subteams of DeepMind Technical AGI Safety trying to make artificial general intelligence go well. In brief, The Alignment Team investigates how to avoid failures of intent alignment, operationalized as a situation in which an AI system knowingly acts against the wishes of its designers. Alignment is hiring for Research Scientist and Research Engineer positions. The Scalable Alignment Team (SAT) works to make highly capable agents do what humans want, even when it is difficult for humans to know what that is. This means we want to remove subtle biases, factual errors, or deceptive behaviour even if they would normally go unnoticed by humans, whether due to reasoning failures or biases in humans or due to very capable behaviour by the agents. SAT is hiring for Research Scientist - Machine Learning, Research Scientist - Cognitive Science, Research Engineer, and Software Engineer positions. We elaborate on the problem breakdown between Alignment and Scalable Alignment next, and discuss details of the various positions. “Alignment” vs “Scalable Alignment” Very roughly, the split between Alignment and Scalable Alignment reflects the following decomposition: Generate approaches to AI alignment – Alignment Team Make those approaches scale – Scalable Alignment Team In practice, this means the Alignment Team has many small projects going on simultaneously, reflecting a portfolio-based approach, while the Scalable Alignment Team has fewer, more focused projects aimed at scaling the most promising approaches to the strongest models available. Scalable Alignment's current approach: make AI critique itself Imagine a default approach to building AI agents that do what humans want: Pretrain on a task like “predict text from the internet”, producing a highly capable model such as Chinchilla or Flamingo. Fine-tune into an agent that does useful tasks, as evaluated by human judgements. There are several ways this could go wrong: Humans are unreliable: The human judgements we train against could be flawed: we could miss subtle factual errors, use biased reasoning, or have insufficient context to evaluate the task. The agent's reasoning could be hidden: We want to know not just what the system is doing but why, both because that might reveal something about what that we don't like, and because we expect good reasoning to better generalize to other situations. Even if the agent is reasoning well, it could fail in other situations: Even if the reasoning is correct this time, the AI could fail to generalize correctly to other situations. Our current plan to address these problem is (in part): Give humans help in supervising strong agents: On the human side, provide channels for oversight and advice from peers, experts in various domains, and broader society. On the ML side, agents should explain their behaviour and reasoning, argue against themselves when wrong, and cite relevant evidence. 
Align explanations with the true reasoning process of the agent: Ensure that agent's are able and incentivized to show their reasoning to human supervisors, either by making reasoning explicit if possible or via methods for interpretability and eliciting latent knowledge. Red team models to exhibit failure modes that don't occur in normal use We believe none of these pieces are sufficient by themselves: (1) without (2) can be rationalization, where an agent decides what to do and produces an explanation after the fact that justifies its answer. (2) without (1) doesn't scale: The full reasoning trace of the agent might be enormous, it might be terabytes of data even wi...

A Podcast Of Unnecessary Detail
Live And Kicking Part 1

A Podcast Of Unnecessary Detail

Play Episode Listen Later May 10, 2022 51:50


Helen, Matt and Steve introduce the best bits of their recent live shows, performed at London's Bloomsbury Theatre with nerdy guest performers Dr Rohin Francis aka Medlife Crisis and comedian Rosie Wilby. Plus Matt, Steve and Helen share their own unnecessary details on stage:
- Steve's bit (01:33)
- Rohin's bit (14:40)
- Matt's bit (23:45)
- Rosie's bit (39:05)
- Helen's bit (47:58)
SHOW NOTES: Unfortunately our show notes are too big for Acast's margins to contain... head to the episode page to see everything.
Corrections and clarifications:
- None, so far.
This series is sponsored by Brilliant.org, the place to learn maths and science through interactive online lessons. Start your free trial at Brilliant.org/apoud, and the first 200 Unnecessary Detail listeners who sign up for annual membership will get 20% off on the same link.
For tickets to live shows, nerd merch, our mailing list and more, visit: festivalofthespokennerd.com. Want to get in touch? We're on Twitter, Facebook, Instagram or email podcast@festivalofthespokennerd.com. Come for the Unnecessary Detail. Stay for the A Podcast Of. Thanks for listening!
See acast.com/privacy for privacy and opt-out information.

The Nonlinear Library
AF - Learning the smooth prior by Geoffrey Irving

The Nonlinear Library

Play Episode Listen Later Apr 29, 2022 18:49


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Learning the smooth prior, published by Geoffrey Irving on April 29, 2022 on The AI Alignment Forum. Most of this document is composed of thoughts from Geoffrey Irving (safety researcher at DeepMind) written on January 15th, 2021 on Learning the Prior/Imitative Generalization, cross-examination, and AI safety via debate, plus some discussion between Geoffrey and Rohin and some extra commentary from me at the end. – Evan Geoffrey on learning the smooth prior Vague claims This doc is about a potential obstacle to Paul's learning the prior scheme (LTP). Before reading this doc, please read either Beth's simplified exposition or Paul's original. The intention of this doc was to argue for two claims, but weakly since I don't have much clarity: LTP has an obstacle in assigning joint probabilities to similar statements. The best version of LTP may collapse into a version of debate + cross-examination However, I don't quite believe (2) after writing the doc (see the section on “How LTP differs from cross-examination”). Wall of text → generative model As originally sketched, the prior z in learning the prior is a huge wall of text, containing useful statements like “A husky is a large, fluffy dog that looks quite like a wolf”, and not containing wrong facts like “if there are a lot of white pixels in the bottom half of the image, then it's a husky” (statements taken from Beth's post). A wall of text is of course unreasonable. Let's try to make it more reasonable: No one believes that a wall of text is the right type for z; instead, we'd like z to be some sort of generative network that spits out statements. (The original proposal wasn't really a wall of text either; the wall was just a thought experiment.) We likely want probabilities attached to these statements so that the prior can include uncertain statements. Whether we attach probabilities or not, the validity of a statement like “A husky is a large, fluffy dog that looks quite like a wolf” depends on the definition of the terms. At a high level, we presumably want to solve this with something like cross-examination, so that our generative z model can be independently asked what a husky is, what fluffy is, etc. The high level LTP loss includes a log p(z) term: we need to be able to compute log probabilities for z as a whole. It's at least plausible to me that humans can be asked to assign probabilities to individual statements like our husky statements, but stitching this together seems rough. The interpolation problem Consider the following statements: A husky is a large, fluffy dog that looks quite like a wolf. A husky is a large, fluffy dog that's very similar to a wolf. A husky is a big, fluffy dog that's very similar to a wolf. A husky is a big, fluffy dog that's closely related to wolves. Tomatoes are usually red. The first four statements are all true with overwhelming probability, as is the last, but to make the thought experiment better let's say their individual probabilities are all around p = 0.9. What about their joint probabilities? For any subset of the first four statements, the joint probability will also be roughly p = 0.9, since the statements have extremely high correlation. 
However, if we take a set that includes 1-4 of the first 4 statements and the last statement, the probability will be closer to p^2 ≈ 0.8, since the two clusters of statements are mostly independent. What's the ellipsis? Since we're in neural net land, we likely have a variety of natural ways to approximately map statements into a continuous vector space: in terms of random bits drawn, in terms of the activations resulting from whatever statements these statements were conditioned on, etc. For any of these, we'll get a natural interpolation scheme between any two statements, even statements that are completely unrelated to each other. LTP...
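As a small numerical illustration of the joint-probability point above, the code below models the four husky statements as all being true or false together (sharing one latent "the husky description is accurate" variable) and the tomato statement as depending on an independent latent, each with probability 0.9. That latent-variable framing is my own toy rendering of the example, not something from the post.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

husky_ok = rng.random(n) < 0.9    # drives statements 1-4
tomato_ok = rng.random(n) < 0.9   # drives statement 5

s1 = s2 = s3 = s4 = husky_ok
s5 = tomato_ok

print("P(statement 1)               ~", s1.mean())                     # ~0.9
print("P(statements 1-4 jointly)    ~", (s1 & s2 & s3 & s4).mean())    # ~0.9: fully correlated
print("P(statements 1 and 5 jointly)~", (s1 & s5).mean())              # ~0.81, i.e. about p^2

Any scheme for assigning log p(z) to the prior as a whole has to get this kind of structure right: near-duplicate statements should barely lower the joint probability, while genuinely independent ones should multiply it down.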

Bank On It
Episode 500 Rohin Tagra from Azimuth GRC

Bank On It

Play Episode Listen Later Apr 14, 2022 38:34


This episode was produced remotely using the ListenDeck standardized audio & video production system. If you're looking to jumpstart your podcast miniseries or upgrade your podcast or video production please visit www.ListenDeck.com. You can subscribe to this podcast and stay up to date on all the stories here on Apple Podcasts, Google Play, Stitcher, Spotify, Amazon and iHeartRadio. In this episode the host John Siracusa chats remotely with Rohin Tagra, founder & CEO of Azimuth GRC. Azimuth GRC is a provider of a SaaS-based compliance platform intended to help companies in regulated industries comply with applicable laws. Tune in and listen. Subscribe now on Apple Podcasts, Google, Stitcher, Spotify, Amazon and iHeartRadio to hear next Tuesday's episode with Thomas Li from Daloopa. About the host: John is the host of the ‘Bank On It' podcast recorded onsite on Wall Street at OpenFin and the founder of the remotely recorded, studio quality standardized podcast production system ListenDeck. Follow John on LinkedIn, Twitter, Medium.

TalkRL: The Reinforcement Learning Podcast
Rohin Shah

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Apr 12, 2022 97:04 Transcription Available


Dr. Rohin Shah is a Research Scientist at DeepMind, and the editor and main contributor of the Alignment Newsletter.
Featured References:
- The MineRL BASALT Competition on Learning from Human Feedback (Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan)
- Preferences Implicit in the State of the World (Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan)
- Benefits of Assistance over Reward Learning (Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell)
- On the Utility of Learning about Humans for Human-AI Coordination (Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan)
- Evaluating the Robustness of Collaborative Agents (Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah)
Additional References: AGI Safety Fundamentals, EA Cambridge

The Nonlinear Library
AF - Shah and Yudkowsky on alignment failures by Rohin Shah

The Nonlinear Library

Play Episode Listen Later Feb 28, 2022 144:05


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Shah and Yudkowsky on alignment failures, published by Rohin Shah on February 28, 2022 on The AI Alignment Forum. This is the final discussion log in the Late 2021 MIRI Conversations sequence, featuring Rohin Shah and Eliezer Yudkowsky, with additional comments from Rob Bensinger, Nate Soares, Richard Ngo, and Jaan Tallinn. The discussion begins with summaries and comments on Richard and Eliezer's debate. Rohin's summary has since been revised and published in the Alignment Newsletter. After this log, we'll be concluding this sequence with an AMA, where we invite you to comment with questions about AI alignment, cognition, forecasting, etc. Eliezer, Richard, Paul Christiano, Nate, and Rohin will all be participating. Color key: Chat by Rohin and Eliezer Other chat Emails Follow-ups 19. Follow-ups to the Ngo/Yudkowsky conversation 19.1. Quotes from the public discussion [Bensinger][9:22] (Nov. 25) Interesting extracts from the public discussion of Ngo and Yudkowsky on AI capability gains: Eliezer: I think some of your confusion may be that you're putting "probability theory" and "Newtonian gravity" into the same bucket. You've been raised to believe that powerful theories ought to meet certain standards, like successful bold advance experimental predictions, such as Newtonian gravity made about the existence of Neptune (quite a while after the theory was first put forth, though). "Probability theory" also sounds like a powerful theory, and the people around you believe it, so you think you ought to be able to produce a powerful advance prediction it made; but it is for some reason hard to come up with an example like the discovery of Neptune, so you cast about a bit and think of the central limit theorem. That theorem is widely used and praised, so it's "powerful", and it wasn't invented before probability theory, so it's "advance", right? So we can go on putting probability theory in the same bucket as Newtonian gravity? They're actually just very different kinds of ideas, ontologically speaking, and the standards to which we hold them are properly different ones. It seems like the sort of thing that would take a subsequence I don't have time to write, expanding beyond the underlying obvious ontological difference between validities and empirical-truths, to cover the way in which "How do we trust this, when" differs between "I have the following new empirical theory about the underlying model of gravity" and "I think that the logical notion of 'arithmetic' is a good tool to use to organize our current understanding of this little-observed phenomenon, and it appears within making the following empirical predictions..." But at least step one could be saying, "Wait, do these two kinds of ideas actually go into the same bucket at all?" In particular it seems to me that you want properly to be asking "How do we know this empirical thing ends up looking like it's close to the abstraction?" and not "Can you show me that this abstraction is a very powerful one?" 
Like, imagine that instead of asking Newton about planetary movements and how we know that the particular bits of calculus he used were empirically true about the planets in particular, you instead started asking Newton for proof that calculus is a very powerful piece of mathematics worthy to predict the planets themselves - but in a way where you wanted to see some highly valuable material object that calculus had produced, like earlier praiseworthy achievements in alchemy. I think this would reflect confusion and a wrongly directed inquiry; you would have lost sight of the particular reasoning steps that made ontological sense, in the course of trying to figure out whether calculus was praiseworthy under the standards of praiseworthiness that you'd been previously raised to believe in as universal standards about a...

Book Reviews Kill
Licanius Trilogy (EOTTC ~Chapter 17 - Chapter 34)

Book Reviews Kill

Play Episode Listen Later Jan 31, 2022 59:38


Things are certainly ramping up now! Evan and Chad delve deep into this one, folks. Evan is not a fan of the Rohin plot line, and Chad thinks about making his own fan-film of a certain fight scene. Will Evan and Chad's predictions come to fruition, or will they be proven wrong by the slow decay of time? Find out on the next exciting episode of Book Reviews Kill!

The Art of Teaching
Rohan Dredge and Mike Hardie: How to build leaders worth following and cultures worth reproducing, Jim Rohin and John Maxwell and why self-awareness is everything.

The Art of Teaching

Play Episode Listen Later Jan 23, 2022 43:08


Welcome to this week's episode of The Art of Teaching podcast with the amazing Rohan Dredge and Mike Hardie. Rohan and Mike run For Leaders Global, an organisation that helps organisations build leaders worth following and cultures worth reproducing. In this wide-ranging episode we talked about: how they define leadership and whether they believe it can be taught; why leadership is about realising potential, activating others and shaping a preferred future; and why we shouldn't wish for things to be easier, but should wish for ourselves to be better (Jim Rohn). It was an incredible privilege to speak with Rohan and Mike. I hope that you get a lot out of it. The Art of Teaching Podcast resources: Facebook Group: https://www.facebook.com/groups/artofteaching Here is the link to the show notes: https://theartofteachingpodcast.com/ Instagram: https://www.instagram.com/theartofteachingpodcast/ New Teacher Resources: Website: https://imanewteacher.com/ Twitter: @Imanewteacher Instagram: @Imanewteacher

The Nonlinear Library
AF - [AN #171]: Disagreements between alignment "optimists" and "pessimists" by Rohin Shah

The Nonlinear Library

Play Episode Listen Later Jan 21, 2022 9:41


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [AN #171]: Disagreements between alignment "optimists" and "pessimists", published by Rohin Shah on January 21, 2022 on The AI Alignment Forum. Listen to this newsletter on The Alignment Newsletter Podcast. Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter. Please note that while I work at DeepMind, this newsletter represents my personal views and not those of my employer. HIGHLIGHTS 1. We are very likely going to keep improving AI capabilities until we reach AGI, at which point either the world is destroyed, or we use the AI system to take some pivotal act before some careless actor destroys the world. 2. In either case, the AI system must be producing high-impact, world-rewriting plans; such plans are “consequentialist” in that the simplest way to get them (and thus, the one we will first build) is if you are forecasting what might happen, thinking about the expected consequences, considering possible obstacles, searching for routes around the obstacles, etc. If you don't do this sort of reasoning, your plan goes off the rails very quickly - it is highly unlikely to lead to high impact. In particular, long lists of shallow heuristics (as with current deep learning systems) are unlikely to be enough to produce high-impact plans. 3. We're producing AI systems by selecting for systems that can do impressive stuff, which will eventually produce AI systems that can accomplish high-impact plans using a general underlying “consequentialist”-style reasoning process (because that's the only way to keep doing more impressive stuff). However, this selection process does not constrain the goals towards which those plans are aimed. In addition, most goals seem to have convergent instrumental subgoals like survival and power-seeking that would lead to extinction. This suggests that we should expect an existential catastrophe by default. 4. None of the methods people have suggested for avoiding this outcome seem like they actually avert this story. Richard responds to this with a few distinct points: 1. It might be possible to build AI systems which are not of world-destroying intelligence and agency, that humans use to save the world. For example, we could make AI systems that do better alignment research. Such AI systems do not seem to require the property of making long-term plans in the real world in point (3) above, and so could plausibly be safe. 2. It might be possible to build general AI systems that only state plans for achieving a goal of interest that we specify, without executing that plan. 3. It seems possible to create consequentialist systems with constraints upon their reasoning that lead to reduced risk. 4. It also seems possible to create systems with the primary aim of producing plans with certain properties (that aren't just about outcomes in the world) -- think for example of corrigibility (AN #35) or deference to a human user. 5. (Richard is also more bullish on coordinating not to use powerful and/or risky AI systems, though the debate did not discuss this much.) Eliezer's responses: 1. 
AI systems that help with alignment research to such a degree that it actually makes a difference are almost certainly already dangerous. 2. It is the plan itself that is risky; if the AI system made a plan for a goal that wasn't the one we actually meant, and we don't understand that plan, that plan can still cause extinction. It is the misaligned optimization that produced the plan that is dangerous. 3 and 4. It is certainly possible to do such things; the space of minds that could be designed is very large. However, it is difficult to do such things, as they tend ...

The Nonlinear Library
AF - Counterexamples to some ELK proposals by Paul Christiano

The Nonlinear Library

Play Episode Listen Later Dec 31, 2021 11:37


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Counterexamples to some ELK proposals, published by Paul Christiano on December 31, 2021 on The AI Alignment Forum. In this post I'll describe some possible approaches to eliciting latent knowledge (ELK) not discussed in our report. These are basically restatements of proposals by Davidad, Rohin, Ramana, and John Maxwell. For each approach, I'll present one or two counterexamples that I think would break it. I think of these approaches as being based on two fundamental hopes: If our AI tampers with one sensor, there may be other sensors we could deploy, or experiments we could run, in order to notice that something fishy has happened. We could have an AI help us design those experiments/sensors. Actions that covertly tamper may be similar to actions that visibly tamper, e.g. they may exploit similar unknown facts about physics or they may involve common substeps. So we might be able to notice covert tampering by pointing out similar action sequences that lead to visible tampering. While these hopes may hold in practice, in this post I'll describe possible situations where they would fail. (I don't think anyone particularly expected these approaches to be counterexample-proof.) I'm only claiming that these failures are possible. I think it is also interesting to try to generate the most probable failure cases and to actually estimate their probability, but that's definitely not what I'm doing here. Strategy: Hold out sensors (Proposal #1 here) In addition to asking my model to predict one set of sensors, I can deploy a second “held out” set (e.g. more cameras or seismographs or microphones.). I don't ask my model to predict those sensors, but I show them to a human before getting them to answer questions. Hopefully that sometimes lets them notice tampering that would have been invisible without the new sensors. Worry: human simulator can use all the extra sensor readings too I'm concerned that I will learn a better version of the human simulator which predicts the readings of all sensors and then outputs what a human would infer from the complete set. That task need not be much more computationally demanding than the original human simulator. Moreover, my AI may be able to deduce the presence of the additional sensors very cheaply. Perhaps it can notice the sensors, or it can learn about my past actions to get a hint about where I may have placed them. If this is possible, then “Predict the readings of all sensors” need not be much more complex than “Predict the readings of one sensor.” An AI doesn't have to be exactly correct about what sensors exist—when in doubt it can just predict what the sensor would say and imagine giving it to the human. Strategy: Predict hypothetical sensors (Proposal #2 here, also suggested with counterexample by Rohin in private communication) Instead of installing a single sensor, I could randomly choose what sensors to deploy and how to install them. Then I could train a model Predict+ to map (description of sensor) → (sensor reading), and I could give a human the readings from many hypothetical sensors in order to give them the best chance of detecting tampering. Worry 1: Predict+ may not be reasonable on non-existent sensors To collect training data for Predict+, I actually install sensors and then use their readings as a target. 
In every situation, my AI is being asked to predict the readings from a sensor that I really did build. If my AI is able to deduce the presence of sensors (as described in the last counterexample), then it's unclear how Predict+(sensor description) will behave when we give it a description of a sensor that does not in fact exist. In particular, Predict+(sensor description) might behave in the following undesirable way: it predicts the future observations conditioned on its observations and on the fact that some...
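For readers who want the shape of the "predict hypothetical sensors" setup in code, here is a deliberately tiny sketch with invented names (collect_data, train_predict_plus, the sensor labels) and a lookup table standing in for the learned Predict+ model; it also shows the worry in miniature, since nothing in training constrains the answer for a sensor that was never built:

```python
from collections import defaultdict
from statistics import mean
import random

def collect_data(deployable_sensors, run_episode, n_episodes=200):
    """Randomly choose which sensor really gets installed, then record its reading."""
    data = []
    for _ in range(n_episodes):
        sensor = random.choice(deployable_sensors)
        data.append((sensor, run_episode(sensor)))   # (sensor description, observed reading)
    return data

def train_predict_plus(data):
    """Toy stand-in for Predict+: memorise the mean reading per sensor description.
    The counterexample in miniature: on a description that was never deployed, the
    output is essentially unconstrained by training -- here it is just a default."""
    readings = defaultdict(list)
    for sensor, reading in data:
        readings[sensor].append(reading)
    global_mean = mean(reading for _, reading in data)
    table = {sensor: mean(values) for sensor, values in readings.items()}
    return lambda sensor: table.get(sensor, global_mean)

# Usage with a fake environment where each real sensor has a fixed true signal.
true_signal = {"camera_A": 1.0, "seismograph_B": 0.2}
predict_plus = train_predict_plus(
    collect_data(list(true_signal), lambda s: true_signal[s] + random.gauss(0, 0.01))
)
print(predict_plus("camera_A"))                 # close to 1.0
print(predict_plus("xray_C (never built)"))     # arbitrary fallback, not a grounded prediction
```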

The Nonlinear Library
LW - Ambitious vs. narrow value learning by paulfchristiano from Value Learning

The Nonlinear Library

Play Episode Listen Later Dec 24, 2021 27:48


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is Value Learning, Part 13: Ambitious vs. narrow value learning, published by paulfchristiano. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. (Re)Posted as part of the AI Alignment Forum sequence on Value Learning. Rohin's note: The definition of narrow value learning in the previous post focused on the fact that the resulting behavior is limited to some domain. The definition in this post focuses on learning instrumental goals and values. While the definitions are different, I have used the same term for both because I believe that they are both pointing at the same underlying concept. (I do not know if Paul agrees.) I'm including this post to give a different perspective on what I mean by narrow value learning, before delving into conceptual ideas within narrow value learning. Suppose I'm trying to build an AI system that “learns what I want” and helps me get it. I think that people sometimes use different interpretations of this goal. At two extremes of a spectrum of possible interpretations: The AI learns my preferences over (very) long-term outcomes. If I were to die tomorrow, it could continue pursuing my goals without me; if humanity were to disappear tomorrow, it could rebuild the kind of civilization we would want; etc. The AI might pursue radically different subgoals than I would on the scale of months and years, if it thinks that those subgoals better achieve what I really want. The AI learns the narrower subgoals and instrumental values I am pursuing. It learns that I am trying to schedule an appointment for Tuesday and that I want to avoid inconveniencing anyone, or that I am trying to fix a particular bug without introducing new problems, etc. It does not make any effort to pursue wildly different short-term goals than I would in order to better realize my long-term values, though it may help me correct some errors that I would be able to recognize as such. I think that many researchers interested in AI safety per se mostly think about the former. I think that researchers with a more practical orientation mostly think about the latter. The ambitious approach The maximally ambitious approach has a natural theoretical appeal, but it also seems quite hard. It requires understanding human preferences in domains where humans are typically very uncertain, and where our answers to simple questions are often inconsistent, like how we should balance our own welfare with the welfare of others, or what kinds of activities we really want to pursue vs. enjoying in the moment. (It seems unlikely to me that there is a unified notion of “what I want” in many of these cases.) It also requires extrapolation to radically unfamiliar domains, where we will need to make decisions about issues like population ethics, what kinds of creatures do we care about, and unforeseen new technologies. I have written about this problem, pointing out that it is unclear how you would solve it even with an unlimited amount of computing power. My impression is that most practitioners don't think of this problem even as a long-term research goal — it's a qualitatively different project without direct relevance to the kinds of problems they want to solve. The narrow approach The narrow approach looks relatively tractable and well-motivated by existing problems. 
We want to build machines that helps us do the things we want to do, and to that end they need to be able to understand what we are trying to do and what instrumental values guide our behavior. To the extent that our “preferences” are underdetermined or inconsistent, we are happy if our systems at least do as well as a human, and make the kinds of improvements that humans would reliably consider improvements. But it's not clear that anything short of the maximally ambitious approach can solve the prob...

The Nonlinear Library
LW - Model Mis-specification and Inverse Reinforcement Learning by Owain_Evans, jsteinhardt from Value Learning

The Nonlinear Library

Play Episode Listen Later Dec 24, 2021 29:27


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is Value Learning, Part 6: Model Mis-specification and Inverse Reinforcement Learning, published by Owain_Evans, jsteinhardt. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Posted as part of the AI Alignment Forum sequence on Value Learning. Rohin's note: While I motivated the last post with an example of using a specific model for human biases, in this post (original here), Jacob Steinhardt and Owain Evans point out that model mis-specification can arise in other parts of inverse reinforcement learning as well. The arguments here consider some more practical concerns (for example, the worries about getting only short-term data for each human would not be a problem if you had the entire human policy). In my previous post, “Latent Variables and Model Mis-specification”, I argued that while machine learning is good at optimizing accuracy on observed signals, it has less to say about correctly inferring the values for unobserved variables in a model. In this post I'd like to focus in on a specific context for this: inverse reinforcement learning (Ng et al. 2000, Abbeel et al. 2004, Ziebart et al. 2008, Ho et al 2016), where one observes the actions of an agent and wants to infer the preferences and beliefs that led to those actions. For this post, I am pleased to be joined by Owain Evans, who is an active researcher in this area and has co-authored an online book about building models of agents (see here in particular for a tutorial on inverse reinforcement learning and inverse planning). Owain and I are particularly interested in inverse reinforcement learning (IRL) because it has been proposed (most notably by Stuart Russell) as a method for learning human values in the context of AI safety; among other things, this would eventually involve learning and correctly implementing human values by artificial agents that are much more powerful, and act with much broader scope, than any humans alive today. While we think that overall IRL is a promising route to consider, we believe that there are also a number of non-obvious pitfalls related to performing IRL with a mis-specified model. The role of IRL in AI safety is to infer human values, which are represented by a reward function or utility function. But crucially, human values (or human reward functions) are never directly observed. Below, we elaborate on these issues. We hope that by being more aware of these issues, researchers working on inverse reinforcement learning can anticipate and address the resulting failure modes. In addition, we think that considering issues caused by model mis-specification in a particular concrete context can better elucidate the general issues pointed to in the previous post on model mis-specification. Specific Pitfalls for Inverse Reinforcement Learning In “Latent Variables and Model Mis-specification”, Jacob talked about model mis-specification, where the “true” model does not lie in the model family being considered. We encourage readers to read that post first, though we've also tried to make the below readable independently. In the context of inverse reinforcement learning, one can see some specific problems that might arise due to model mis-specification. 
For instance, the following are things we could misunderstand about an agent, which would cause us to make incorrect inferences about the agent's values: The actions of the agent. If we believe that an agent is capable of taking a certain action, but in reality they are not, we might make strange inferences about their values (for instance, that they highly value not taking that action). Furthermore, if our data is e.g. videos of human behavior, we have an additional inference problem of recognizing actions from the frames. The information available to the agent. If an agent has access to more inf...
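The first pitfall (a mis-specified view of which actions the agent can take) is easy to see in a toy calculation. The sketch below is not from the post: it assumes a one-shot Boltzmann-rational choice model and invented action names, and shows how an observer who wrongly believes an extra action was available concludes that the agent strongly disvalues it:

```python
import numpy as np

# Not from the post: a one-shot Boltzmann-rationality model in which the observer
# assumes the agent picks action x with probability proportional to exp(R(x)),
# then reads R off the observed choice frequencies.

def infer_rewards(actions, counts, smoothing=1e-3):
    """Maximum-likelihood rewards under a softmax choice model, up to an
    additive constant: R(x) = log P_observed(x)."""
    counts = np.asarray(counts, dtype=float) + smoothing
    log_freqs = np.log(counts / counts.sum())
    return {a: round(float(r), 2) for a, r in zip(actions, log_freqs)}

# Correctly specified observer: only the actions the agent could actually take.
print(infer_rewards(["reply_politely", "ignore"], [70, 30]))
# {'reply_politely': -0.36, 'ignore': -1.2}

# Mis-specified observer: also believes "insult" was available. Because it never
# appears in the data, its inferred reward is driven sharply negative -- the
# observer concludes the agent strongly disvalues an action it could not take.
print(infer_rewards(["reply_politely", "ignore", "insult"], [70, 30, 0]))
# {'reply_politely': -0.36, 'ignore': -1.2, 'insult': -11.51}
```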

The Nonlinear Library
LW - Latent Variables and Model Mis-Specification by jsteinhardt from Value Learning

The Nonlinear Library

Play Episode Listen Later Dec 24, 2021 19:10


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is Value Learning, Part 5: Latent Variables and Model Mis-Specification, published by jsteinhardt. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Posted as part of the AI Alignment Forum sequence on Value Learning. Rohin's note: So far, we've seen that ambitious value learning needs to understand human biases, and that we can't simply learn the biases in tandem with the reward. Perhaps we could hardcode a specific model of human biases? Such a model is likely to be incomplete and inaccurate, but it will perform better than assuming an optimal human, and as we notice failure modes we can improve the model. In the language of this post by Jacob Steinhardt (original here), we are using a mis-specified human model. The post talks about why model mis-specification is worse than it may seem at first glance. This post is fairly technical and may not be accessible if you don't have a background in machine learning. If so, you can skip this post and still understand the rest of the posts in the sequence. However, if you want to do ML-related safety research, I strongly recommend putting in the effort to understand the problems that can arise with mis-specification. Machine learning is very good at optimizing predictions to match an observed signal — for instance, given a dataset of input images and labels of the images (e.g. dog, cat, etc.), machine learning is very good at correctly predicting the label of a new image. However, performance can quickly break down as soon as we care about criteria other than predicting observables. There are several cases where we might care about such criteria: In scientific investigations, we often care less about predicting a specific observable phenomenon, and more about what that phenomenon implies about an underlying scientific theory. In economic analysis, we are most interested in what policies will lead to desirable outcomes. This requires predicting what would counterfactually happen if we were to enact the policy, which we (usually) don't have any data about. In machine learning, we may be interested in learning value functions which match human preferences (this is especially important in complex settings where it is hard to specify a satisfactory value function by hand). However, we are unlikely to observe information about the value function directly, and instead must infer it implicitly. For instance, one might infer a value function for autonomous driving by observing the actions of an expert driver. In all of the above scenarios, the primary object of interest — the scientific theory, the effects of a policy, and the value function, respectively — is not part of the observed data. Instead, we can think of it as an unobserved (or “latent”) variable in the model we are using to make predictions. While we might hope that a model that makes good predictions will also place correct values on unobserved variables as well, this need not be the case in general, especially if the model is mis-specified. I am interested in latent variable inference because I think it is a potentially important sub-problem for building AI systems that behave safely and are aligned with human values. 
The connection is most direct for value learning, where the value function is the latent variable of interest and the fidelity with which it is learned directly impacts the well-behavedness of the system. However, one can imagine other uses as well, such as making sure that the concepts that an AI learns sufficiently match the concepts that the human designer had in mind. It will also turn out that latent variable inference is related to counterfactual reasoning, which has a large number of tie-ins with building safe AI systems that I will elaborate on in forthcoming posts. The goal of this post is to explain why problems s...
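To make the "good predictions, wrong latents" point concrete, here is a hypothetical toy example, not taken from the post: two models that make identical predictions about the observable (a single flip per freshly drawn coin) while positing completely different latent structure:

```python
import numpy as np

# Hypothetical toy example in the spirit of the post: two models that make
# identical predictions about the observable (one flip per freshly drawn coin)
# while positing completely different latent structure (each coin's bias).
rng = np.random.default_rng(0)

def model_a_prediction():
    return 0.5                      # "every coin is fair"

def model_b_prediction():
    return 0.5 * 0.2 + 0.5 * 0.8    # "coins are biased 0.2 or 0.8, half and half"

# Simulate a world in which model B happens to be the true one.
biases = rng.choice([0.2, 0.8], size=100_000)
flips = rng.random(len(biases)) < biases

print(round(flips.mean(), 3))                      # ~0.5: both models nail this observable
print(model_a_prediction(), model_b_prediction())  # 0.5 0.5

# If a downstream decision depends on the latent bias of an individual coin (the
# analogue of a value function), model A's latents are badly wrong even though its
# predictive accuracy on the observed signal is just as good as model B's.
```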

The Nonlinear Library
LW - Humans can be assigned any values whatsoever by Stuart_Armstrong from Value Learning

The Nonlinear Library

Play Episode Listen Later Dec 24, 2021 10:13


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is Value Learning, Part 4: Humans can be assigned any values whatsoever., published by Stuart_Armstrong. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. (Re)Posted as part of the AI Alignment Forum sequence on Value Learning. Rohin's note: In the last post, we saw that a good broad value learning approach would need to understand the systematic biases in human planning in order to achieve superhuman performance. Perhaps we can just use machine learning again and learn the biases and reward simultaneously? This post by Stuart Armstrong (original here) and the associated paper say: “Not without more assumptions.” This post comes from a theoretical perspective that may be alien to ML researchers; in particular, it makes an argument that simplicity priors do not solve the problem pointed out here, where simplicity is based on Kolmogorov complexity (which is an instantiation of the Minimum Description Length principle). The analog in machine learning would be an argument that regularization would not work. The proof used is specific to Kolmogorov complexity and does not clearly generalize to arbitrary regularization techniques; however, I view the argument as being suggestive that regularization techniques would also be insufficient to address the problems raised here. Humans have no values… nor does any agent. Unless you make strong assumptions about their rationality. And depending on those assumptions, you can get humans to have any values. An agent with no clear preferences: There are three buttons in this world, B_0, B_1, and X, and one agent H. B_0 and B_1 can be operated by H, while X can be operated by an outside observer. H will initially press button B_0; if ever X is pressed, the agent will switch to pressing B_1. If X is pressed again, the agent will switch back to pressing B_0, and so on. After a large number of turns N, H will shut off. That's the full algorithm for H. So the question is, what are the values/preferences/rewards of H? There are three natural reward functions that are plausible: R_0, which is linear in the number of times B_0 is pressed; R_1, which is linear in the number of times B_1 is pressed; and R_2 = I_E(X) R_0 + I_O(X) R_1, where I_E(X) is the indicator function for X being pressed an even number of times, and I_O(X) = 1 − I_E(X) is the indicator function for X being pressed an odd number of times. For R_0, we can interpret H as an R_0 maximising agent which X overrides. For R_1, we can interpret H as an R_1 maximising agent which X releases from constraints. And R_2 is the “H is always fully rational” reward. Semantically, these make sense for the various R_i's being a true and natural reward, with X being “coercive brain surgery” in the first case, “release H from annoying social obligations” in the second, and “switch which of R_0 and R_1 gives you pleasure” in the last case. But note that there are no semantic implications here; all that we know is H, with its full algorithm. If we wanted to deduce its true reward for the purpose of something like Inverse Reinforcement Learning (IRL), what would it be? Modelling human (ir)rationality and reward: Now let's talk about the preferences of an actual human. We all know that humans are not always rational. 
But even if humans were fully rational, the fact remains that we are physical, and vulnerable to things like coercive brain surgery (and in practice, to a whole host of other more or less manipulative techniques). So there will be the equivalent of “button X” that overrides human preferences. Thus, “not immortal and unchangeable” is in practice enough for the agent to be considered “not fully rational”. Now assume that we've thoroughly observed a given human h (including their internal brain wiring), so we know the human policy π_h (which determines their actions in a...
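The three-button toy model above is concrete enough to code up. A minimal sketch, assuming a fixed horizon and using made-up names (run_agent, r0, r1, r2) that are not from the post, of the agent H and the three candidate reward functions, all equally consistent with H's behaviour:

```python
# Toy model of the agent H described above: H presses B_0 by default, and each
# press of X toggles which button H presses. All three candidate reward functions
# fit H's behaviour equally well; the behaviour alone cannot distinguish them.

def run_agent(x_press_times, horizon=6):
    """Return the buttons H presses over `horizon` turns.
    `x_press_times` is the set of turns on which the observer presses X."""
    history = []
    pressing_b1 = False
    for t in range(horizon):
        if t in x_press_times:
            pressing_b1 = not pressing_b1  # X toggles H's behaviour
        history.append("B1" if pressing_b1 else "B0")
    return history

def r0(history, x_press_times):
    return history.count("B0")     # linear in the number of B_0 presses

def r1(history, x_press_times):
    return history.count("B1")     # linear in the number of B_1 presses

def r2(history, x_press_times):
    # I_E(X) * R_0 + I_O(X) * R_1: depends on the parity of X presses
    even = len(x_press_times) % 2 == 0
    return r0(history, x_press_times) if even else r1(history, x_press_times)

x_presses = {2}
trace = run_agent(x_presses)
print(trace)                                    # ['B0', 'B0', 'B1', 'B1', 'B1', 'B1']
print(r0(trace, x_presses), r1(trace, x_presses), r2(trace, x_presses))   # 2 4 4
```

The same trace reads as an overridden R_0 maximiser, a released R_1 maximiser, or a fully rational R_2 maximiser; nothing in the behaviour distinguishes the three.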

The Nonlinear Library
LW - The easy goal inference problem is still hard by paulfchristiano from Value Learning

The Nonlinear Library

Play Episode Listen Later Dec 24, 2021 6:55


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is Value Learning, Part 3: The easy goal inference problem is still hard, published by paulfchristiano. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Posted as part of the AI Alignment Forum sequence on Value Learning. Rohin's note: In this post (original here), Paul Christiano analyzes the ambitious value learning approach. He considers a more general view of ambitious value learning where you infer preferences more generally (i.e. not necessarily in the form of a utility function), and you can ask the user about their preferences, but it's fine to imagine that you infer a utility function from data and then optimize it. The key takeaway is that in order to infer preferences that can lead to superhuman performance, it is necessary to understand how humans are biased, which seems very hard to do even with infinite data. One approach to the AI control problem goes like this: Observe what the user of the system says and does. Infer the user's preferences. Try to make the world better according to the user's preference, perhaps while working alongside the user and asking clarifying questions. This approach has the major advantage that we can begin empirical work today — we can actually build systems which observe user behavior, try to figure out what the user wants, and then help with that. There are many applications that people care about already, and we can set to work on making rich toy models. It seems great to develop these capabilities in parallel with other AI progress, and to address whatever difficulties actually arise, as they arise. That is, in each domain where AI can act effectively, we'd like to ensure that AI can also act effectively in the service of goals inferred from users (and that this inference is good enough to support foreseeable applications). This approach gives us a nice, concrete model of each difficulty we are trying to address. It also provides a relatively clear indicator of whether our ability to control AI lags behind our ability to build it. And by being technically interesting and economically meaningful now, it can help actually integrate AI control with AI practice. Overall I think that this is a particularly promising angle on the AI safety problem. Modeling imperfection That said, I think that this approach rests on an optimistic assumption: that it's possible to model a human as an imperfect rational agent, and to extract the real values which the human is imperfectly optimizing. Without this assumption, it seems like some additional ideas are necessary. To isolate this challenge, we can consider a vast simplification of the goal inference problem: The easy goal inference problem: Given no algorithmic limitations and access to the complete human policy — a lookup table of what a human would do after making any sequence of observations — find any reasonable representation of any reasonable approximation to what that human wants. I think that this problem remains wide open, and that we've made very little headway on the general case. We can make the problem even easier, by considering a human in a simple toy universe making relatively simple decisions, but it still leaves us with a very tough problem. It's not clear to me whether or exactly how progress in AI will make this problem easier. 
I can certainly see how enough progress in cognitive science might yield an answer, but it seems much more likely that it will instead tell us “Your question wasn't well defined.” What do we do then? I am especially interested in this problem because I think that “business as usual” progress in AI will probably lead to the ability to predict human behavior relatively well, and to emulate the performance of experts. So I really care about the residual — what do we need to know to address AI control, beyond what...
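One way to see why the easy goal inference problem stays hard even with the complete policy in hand: the reward you back out depends entirely on the imperfection model you assume. The sketch below uses made-up numbers and a Boltzmann-rationality assumption, which is just one possible modelling choice and not something the post prescribes:

```python
from math import log

# Made-up numbers; Boltzmann rationality (pi(a) proportional to exp(R(a)/temperature))
# is just one possible imperfection model, not something the post prescribes.
policy = {"exercise": 0.2, "watch_tv": 0.8}   # the complete "lookup table" for one situation

def implied_reward(policy, temperature):
    """Reward implied by the policy under an assumed softmax model, rebased so the
    best action scores 0: R(a) = temperature * log pi(a) + constant."""
    rewards = {a: temperature * log(p) for a, p in policy.items()}
    best = max(rewards.values())
    return {a: round(r - best, 2) for a, r in rewards.items()}

# Assume a nearly rational human: an 80/20 split can only mean the stakes are tiny,
# since a near-optimizer would otherwise pick the better option almost every time.
print(implied_reward(policy, temperature=0.1))   # {'exercise': -0.14, 'watch_tv': 0.0}

# Assume a very noisy human: the same split implies a large preference for TV -- and
# under a bias model where "watch_tv" is an akrasia default, the identical policy
# could even be evidence that the human values exercise more. The lookup table alone
# cannot decide between these readings.
print(implied_reward(policy, temperature=5.0))   # {'exercise': -6.93, 'watch_tv': 0.0}
```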

The Henry George Program
Rohin Ghosh on Peninsula Non-Profits, and Youth Perspective in Palo Alto

The Henry George Program

Play Episode Listen Later Aug 26, 2021


Rohin Ghosh was a high-schooler in Palo Alto just a few months ago, but has already had years of involvement in renter and houseless campaigns throughout the Peninsula. He is here to talk about what it's like for teens in this crazy environment, as well as his perspective, based on his work, on the landscape of non-profits throughout the Peninsula. We also talk about how cities are responding to RHNA allocations, tenant organizing in Palo Alto, and more

The Marketing Careers Podcast
Ian Rohin on How To Future-Proof Agency-Client Relationships

The Marketing Careers Podcast

Play Episode Listen Later Aug 2, 2021 36:13


A marketing career path focused on agency roles, specifically client management roles, requires understanding and mastering how to build and develop relationships. Ian Rohin discusses how advancing his marketing career within the same agency over 20 years gave him the opportunity to become one of the best Client Business Partners in the industry today. Connect with Ian Rohin: via LinkedIn: linkedin.com/in/ianrohin/ via Email: ianrohin (at) gmail (dot) com Resources from this episode: Marketing Jobs at UM Worldwide: https://www.umww.com/jobs/ Marketing Career Resources: Find the right Marketing Course for you: themarketinghelp.co/linkedinlearning Join The Marketing Help newsletter for monthly career tips and updates: themarketinghelp.co/subscribe