Overfitting: analysis that corresponds too closely to a particular set of data and may therefore fail to fit additional data.
Explains key advancements in large language models (LLMs): scaling laws (the relationships among model size, data size, and compute) and how emergent abilities such as in-context learning, multi-step reasoning, and instruction following arise once certain scaling thresholds are crossed. Covers the evolution of the transformer architecture with Mixture of Experts (MoE), describes the three-phase training process culminating in Reinforcement Learning from Human Feedback (RLHF) for model alignment, and explores advanced reasoning techniques such as chain-of-thought prompting, which significantly improve complex-task performance.
Links: Notes and resources at ocdevel.com/mlg/mlg34. Build the future of multi-agent software with AGNTCY. Try a walking desk to stay healthy and sharp while you learn and code.
Transformer Foundations and Scaling Laws
Transformers: Introduced by the 2017 "Attention is All You Need" paper, transformers allow parallel training and inference over sequences using self-attention, in contrast to the sequential nature of RNNs.
Scaling Laws: Empirical research revealed that LLM performance improves predictably as model size (parameters), data size (training tokens), and compute are increased together, with diminishing returns if only one variable is scaled disproportionately. The "Chinchilla scaling law" (DeepMind, 2022) established the optimal model/data/compute ratio for efficient performance: earlier large models like GPT-3 were undertrained relative to their size, whereas right-sized models trained on more data (e.g., Chinchilla, the LLaMA series) proved more compute- and inference-efficient.
Emergent Abilities in LLMs
Emergence: When trained beyond a certain scale, LLMs display abilities not present in smaller models, including:
In-Context Learning (ICL): Performing new tasks based solely on prompt examples at inference time.
Instruction Following: Executing natural-language tasks not seen during training.
Multi-Step Reasoning & Chain of Thought (CoT): Solving arithmetic, logic, or symbolic reasoning problems by generating intermediate reasoning steps.
Discontinuity & Debate: These abilities appear abruptly in larger models, though recent research suggests the discontinuity could result from non-linearities in evaluation metrics rather than innate model properties.
Architectural Evolutions: Mixture of Experts (MoE)
MoE Layers: Modern LLMs often replace standard feed-forward layers with MoE structures: many independent "expert" networks specializing in different subdomains or latent structures, with a gating network that routes each token to the most relevant experts, activating only a subset of parameters ("sparse activation"; a short illustrative sketch follows these notes). This enables much larger overall models without proportional increases in compute per inference, but requires the entire model in memory and introduces new challenges such as load balancing and communication overhead.
Specialization & Efficiency: Experts learn different data/knowledge types, boosting specialization and throughput, though care is needed to avoid overfitting and underutilization of individual experts.
The Three-Phase Training Process
1. Unsupervised Pre-Training: Next-token prediction on massive datasets builds a foundation model capturing general language patterns.
2. Supervised Fine-Tuning (SFT): Training on labeled prompt-response pairs teaches the model to perform specific tasks (e.g., question answering, summarization, code generation). Overfitting and "catastrophic forgetting" are risks if not carefully managed.
3. Reinforcement Learning from Human Feedback (RLHF): Collects human preference data by generating multiple responses to prompts and having annotators rank them, builds a reward model from these rankings, and then updates the LLM (often with PPO) to maximize alignment with human preferences (helpfulness, harmlessness, truthfulness). Introduces complexity and the risk of reward hacking (specification gaming), where the model exploits the reward signal in unanticipated ways.
Advanced Reasoning Techniques
Prompt Engineering: The art/science of crafting prompts that elicit better model responses; shown to dramatically affect output quality.
Chain of Thought (CoT) Prompting: Guides models to lay out step-by-step reasoning before arriving at a final answer, demonstrably improving results on complex tasks. Variants include zero-shot CoT ("let's think step by step"), few-shot CoT with worked examples, self-consistency (voting among multiple reasoning chains), and Tree of Thought (exploring multiple reasoning branches in parallel).
Automated Reasoning Optimization: Frontier models selectively apply these advanced reasoning techniques, balancing compute costs against gains in accuracy and transparency.
Optimization for Training and Inference
Tradeoffs: The optimal balance between model size, data, and compute is determined not only for pretraining but also for inference efficiency, since lifetime inference costs may exceed initial training costs.
Current Trends: Efficient scaling, model specialization (MoE), careful fine-tuning, RLHF alignment, and automated reasoning techniques define state-of-the-art LLM development.
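For illustration, a minimal sketch of the sparse top-k MoE routing described in these notes; it is not from the episode, and the expert count, layer sizes, and top-k value are arbitrary assumptions:

```python
# Minimal sketch of a Mixture-of-Experts layer with top-k sparse routing.
# Illustrative only: layer sizes, expert count, and k are arbitrary assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        # Each "expert" is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # The gating network scores every expert for every token.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x):                           # x: (tokens, d_model)
        scores = self.gate(x)                       # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)    # (tokens, k)
        out = torch.zeros_like(x)
        # Only the k selected experts run for each token ("sparse activation").
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(SparseMoE()(tokens).shape)  # torch.Size([16, 512])
```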
Part 2 of our price-prediction experiment for used vehicles: can open-source LLMs such as Llama 3.1, Mistral, and Leo-HessianAI keep up with GPT-3.5? We fine-tuned until the engines smoked, and it turns out the differences are not that large anymore. With enough training observations, the open-source models approach the results of GPT-3.5 and can even beat it on individual metrics. However, fine-tuning larger models also requires powerful GPUs, which significantly raises the resource requirements. In this episode we look at the value these open-source LLMs deliver for practical use cases and the challenges that come with them.
Summary:
Comparison of OpenAI GPT-3.5 and three open-source LLMs (Llama 3.1, Mistral 7B, Leo-HessianAI)
Fine-tuning the models on local data
Results: with a larger training dataset, the open-source LLMs come close to GPT-3.5
XGBoost lags somewhat behind, since free-text fields were not included there
Important factors: batch size, training steps, memory requirements, and the use of LoRA fine-tuning (a rough sketch follows below)
Using open source requires more manual work, but everything stays on-premise
OpenAI scores with simplicity and high quality without needing much data
Frameworks such as Hugging Face, the Mistral codebase, and Torchtune support fine-tuning
Outlook: larger LLMs with multi-GPU setups, multimodal data, and uncertainty quantification
***Links***
[Blog] Predictive LLMs: Übertreffen Open-Source-Modelle OpenAI bei Preisprognosen? https://www.inwt-statistics.de/blog/predictive-llms-uebertreffen-os-modelle-openai-bei-preisprognosen
[Podcast] #50: Predictive Analytics mit LLMs: ist GPT3.5 besser als XGBoost? https://www.podbean.com/ew/pb-n6wem-165cb2c
[Blog] Predictive LLMs: Kann GPT-3.5 die Prognosen von XGBoost verbessern? https://www.inwt-statistics.de/blog/predictive-llms-kann-gpt-xgboost-prognosen-verbessern
[Podcast] #43: Damit es im Live-Betrieb nicht kracht: Vermeidung von Overfitting & Data Leakage https://www.podbean.com/ew/pb-vw736-15baac0
[Link] Llama-3.1-8B-Instruct on Huggingface https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
[Link] Mistral-7B-Instruct-v0.3 on Huggingface https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
[Link] Mistral 7B Release Notes https://mistral.ai/news/announcing-mistral-7b/
[Link] leo-hessianai-7b on Huggingface https://huggingface.co/LeoLM/leo-hessianai-7b
[Link] The Hessian Center for Artificial Intelligence https://hessian.ai/de/
[Docs] LangChain: How to return structured data from a model https://python.langchain.com/docs/how_to/structured_output/#the-with_structured_output-method
[Link] Wie hoch sind die Treibhausgasemissionen pro Person in Deutschland durchschnittlich? https://www.umweltbundesamt.de/service/uba-fragen/wie-hoch-sind-die-treibhausgasemissionen-pro-person#:~:text=Der%20deutsche%20Aussto%C3%9F%20an%20Treibhausgasen,sehr%20gro%C3%9Fe%20Unterschiede%20im%20Konsumniveau.
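As a rough illustration of the LoRA fine-tuning setup the episode refers to (batch size, training steps, memory), here is a minimal sketch using Hugging Face transformers + peft; the dataset file, hyperparameters, and training arguments are placeholder assumptions, not the episode's actual configuration:

```python
# Minimal LoRA fine-tuning sketch with Hugging Face transformers + peft.
# Illustrative assumptions: the dataset file, hyperparameters, and training setup
# are placeholders, not the configuration used in the episode.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-3.1-8B-Instruct"        # one of the models linked above
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token        # Llama has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# LoRA trains small low-rank adapter matrices instead of all 8B weights,
# which is what keeps GPU memory requirements manageable.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# "car_listings.jsonl" is a hypothetical file of vehicle descriptions and prices.
data = load_dataset("json", data_files="car_listings.jsonl")["train"].map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=TrainingArguments(output_dir="lora-price-model",
                           per_device_train_batch_size=4,   # batch size and training
                           gradient_accumulation_steps=8,   # steps matter, as discussed
                           num_train_epochs=1,
                           learning_rate=2e-4,
                           bf16=True),
)
trainer.train()
```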
To summarize, our must-haves: a database / DWH; a data visualization solution; the ability to develop without friction (locally or in the web); version control / CI/CD; a deployment solution; separation of development and production environments; monitoring for models & resources.
Related podcast episodes
Episode #2: Success factors for predictive analytics projects
Episode #5: Data warehouse vs. data lake vs. data mesh
Episode #20: Is continuous integration (CI) a must for data scientists?
Episode #21: Machine Learning Operations (MLOps)
Episode #29: Spoilt for choice: data science platform vs. customized stack
Episode #35: Success factors for machine learning projects, with Philipp Jackmuth of dida
Episode #43: So things don't blow up in production: avoiding overfitting & data leakage
Episode #54: Model deployment: how do I get my model into production?
Technologies & tools
Data visualization: Azure Databricks, AWS QuickSight, Redash
Development environment: VSCode, INWT Python IDE V2, Remote Explorer, PyCharm
Version control: GitHub, GitLab, Azure DevOps
CI/CD: GitHub Actions, GitLab CI, Jenkins
Deployment: Kubernetes, Docker, Helm, ArgoCD
Experiment tracking: MLflow, DVC, TensorBoard
Monitoring: Prometheus, Grafana, AWS CloudWatch
We continue talking with Tim about Overfitting and Heuristics in Philosophy (2024), considering Tim's overall project and view of what philosophy should be doing and with what tools. We get into modeling, ethics, public philosophy, and more. Get more at partiallyexaminedlife.com. Visit partiallyexaminedlife.com/support to get ad-free episodes and tons of bonus discussion, including a supporter-exclusive PEL Nightcap further reflecting on this episode. Sponsor: Apply for convenient term life insurance from Fabric by Gerber Life at meetfabric.com/PEL.
Oxford philosophy professor Timothy Williamson talks to us about his new book, Overfitting and Heuristics in Philosophy. How can we best apply the insights of philosophy of science to philosophy itself? Maybe some alleged philosophical counter-examples are just the result of psychological heuristics gone wrong. Get more at partiallyexaminedlife.com. Visit partiallyexaminedlife.com/support to get ad-free episodes and tons of bonus discussion. Sponsor: Get a $1/month e-commerce trial at shopify.com/pel.
[The Path of Freedom | final 2024 cohort + "do-nothing" gathering | registration open]
[✸ Helen Lecture - registration welcome] Sun 9/29, 10:00-12:00 @ 好伴共享空間
[This episode's keyword: models]
Learn from history, or get trapped by it? The madness of crowds, or collective wisdom? Do one thing well for a whole lifetime, or learn to play very different roles? Does more learning bring more contradictions, or a gradual approach toward reliable wisdom? The ultimate framework for systems thinking — let's take on the challenge!
※※※ The Model Thinker (《多模型思維:天才的32個思考策略》) — Readmoo ebook | books.com.tw ※※※
(00:11:54) Xunzi, "An Exhortation to Learning": "The teng snake has no feet yet flies; the wu rat has five skills yet is at its wits' end."
(00:12:48) Condorcet's jury theorem
(00:16:18) REASON: logical conditions
(00:20:37) EXPLAIN: historical causation
(00:23:29) DESIGN: weighing choices
(00:23:48) COMMUNICATE: conveying information
(00:27:14) ACT: making decisions
(00:27:24) PREDICT: future causation
(00:27:43) EXPLORE: hypothetical scenarios
(00:28:16) EP35: Lord of the Flies (《蒼蠅王》)
(00:31:33) How do you actually use many models?
(00:38:55) Simplify or simulate?
(00:40:13) Simplified models: reason, explain, communicate, explore
(00:40:30) EP48: Justice: What's the Right Thing to Do? (《正義:一場思辨之旅》)
(00:46:32) Every model is a lie
(00:47:46) Detailed models: design, act, predict
(00:50:41) One-to-many models
(00:53:10) 《師父:那些我在課堂外學會的本事》
(00:54:25) EP277: Nature Wants Us to Be Fat (《大自然就是要你胖》)
(00:55:18) EP13: Scale (《規模的規律和祕密》)
(00:57:24) EP148: Enough (《夠了:約翰‧伯格談金錢的最佳策略》)
(01:08:08) OVERFITTING
(01:09:40) 《沒了名片,你還剩下什麼?》
(01:14:18) EP31: The Virtue of Selfishness (《自私的美德》)
(01:14:26) The Death of Expertise (《專業之死》)
--
Hosting provided by SoundOn
In this episode of The Cognitive Revolution, Nathan explores the cutting-edge intersection of AI and biology with Stanford assistant professor Brian Hie. Discover how AI is revolutionizing our understanding of biological systems and creating new possibilities for interventions. Brian discusses his groundbreaking papers on hybrid AI architectures, the surprising capabilities of language models trained on DNA sequences, and AI-guided evolution of antibodies. Join us for an insightful journey into the future of AI in biology, touching on biosecurity, drug discovery, and the challenges of interpreting AI models in biological contexts.
Links:
Brian Hie: https://x.com/BrianHie
Scanorama paper
Mechanistic Design and Scaling of Hybrid Architectures
Sequence modeling and design from molecular to genome scale with Evo
Unsupervised evolution of protein and antibody complexes with a structure-informed language model
Apply to join over 400 Founders and Execs in the Turpentine Network: https://www.turpentinenetwork.co/
SPONSORS:
80,000 Hours: 80,000 Hours offers free one-on-one career advising for Cognitive Revolution listeners aiming to tackle global challenges, especially in AI. They connect high-potential individuals with experts, opportunities, and personalized career plans to maximize positive impact. Apply for a free call at https://80000hours.org/cognitiverevolution to accelerate your career and contribute to solving pressing AI-related issues.
Brave: The Brave Search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference, all while remaining affordable with developer-first pricing. Integrating the Brave Search API into your workflow translates to more ethical data sourcing and more human-representative data sets. Try the Brave Search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR
Omneky: Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off. https://www.omneky.com/
Oracle: Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds, offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive
RECOMMENDED PODCAST: This Won't Last - Eavesdrop on Keith Rabois, Kevin Ryan, Logan Bartlett, and Zach Weinberg's monthly backchannel featuring their hottest takes on the future of tech, business, and venture capital.
Spotify: https://open.spotify.com/show/2HwSNeVLL1MXy0RjFPyOSz CHAPTERS: (00:00:00) About the Show (00:00:22) About the Episode (00:03:22) Introduction and Overview (00:03:55) AI for Biology: Big Picture (00:05:49) Challenges in Biology and Drug Discovery (00:08:36) Scanorama and Single Cell Transcriptomics (00:10:27) Brian's Research Identity (00:13:00) Mechanistic Design of Hybrid Architectures (00:14:52) Evo: DNA Language Model (Part 1) (00:18:10) Sponsors: 80,000 Hours | Brave (00:20:44) Evo: DNA Language Model (Part 2) (00:28:58) Gene Essentiality and Evo's Capabilities (00:29:19) Unsupervised Evolution of Protein Complexes (Part 1) (00:31:02) Sponsors: Omneky | Oracle (00:32:24) Unsupervised Evolution of Protein Complexes (Part 2) (01:10:27) Improving Antibody Binding Affinity (01:14:24) Overfitting and ARC Institute (01:18:11) Sponsors: Outro
Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
My Intuitive Bayes Online Courses
1:1 Mentorship with me
Our theme music is « Good Bayesian », by Baba Brinkman (feat. MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag ;)
Takeaways
Bayesian methods align better with researchers' intuitive understanding of research questions and provide more tools to evaluate and understand models.
Prior sensitivity analysis is crucial for understanding the robustness of findings to changes in priors and helps in contextualizing research findings (see the toy sketch after these notes).
Bayesian methods offer an elegant and efficient way to handle missing data in longitudinal studies, providing more flexibility and information for researchers.
Fit indices in Bayesian model selection are effective in detecting underfitting but may struggle to detect overfitting, highlighting the need for caution about model complexity.
Bayesian methods have the potential to revolutionize educational research by addressing the challenges of small samples, complex nesting structures, and longitudinal data.
Posterior predictive checks are valuable for model evaluation and selection.
Chapters
00:00 The Power and Importance of Priors
09:29 Updating Beliefs and Choosing Reasonable Priors
16:08 Assessing Robustness with Prior Sensitivity Analysis
34:53 Aligning Bayesian Methods with Researchers' Thinking
37:10 Detecting Overfitting in SEM
43:48 Evaluating Model Fit with Posterior Predictive Checks
47:44 Teaching Bayesian Methods
54:07 Future Developments in Bayesian Statistics
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi...
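As a toy illustration of the prior sensitivity analysis and posterior predictive checks mentioned in the takeaways (not from the episode; the data, model, and priors below are made up):

```python
# Toy sketch of prior sensitivity analysis + a posterior predictive check.
# Purely illustrative: the data, model, and priors are made up, not from the episode.
import arviz as az
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
y = rng.normal(1.0, 2.0, size=50)              # fake observations

def fit(prior_sd):
    """Fit a simple normal model with a given prior scale on the mean."""
    with pm.Model():
        mu = pm.Normal("mu", mu=0.0, sigma=prior_sd)   # the prior we vary
        sigma = pm.HalfNormal("sigma", sigma=5.0)
        pm.Normal("obs", mu=mu, sigma=sigma, observed=y)
        idata = pm.sample(1000, tune=1000, progressbar=False, random_seed=1)
        idata.extend(pm.sample_posterior_predictive(idata, progressbar=False))
    return idata

# Prior sensitivity: refit under tighter and wider priors and compare the posterior.
for sd in (0.5, 1.0, 10.0):
    idata = fit(sd)
    post_mean = float(az.summary(idata, var_names=["mu"])["mean"].iloc[0])
    print(f"prior sd = {sd:>4}: posterior mean of mu = {post_mean:.2f}")

# Posterior predictive check for the last (most diffuse) fit.
az.plot_ppc(idata)
```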
This is the third installment of our artificial intelligence series. Last time we left the safe confines of games and entered the messy world of recommender systems, using YouTube as the example. We pick up from there. You listened to Fularsız Entellik, you liked it, you told it so, it liked you back, and now the two of you have a respectable relationship. Should the system now recommend celebrity gossip, or something about beekeeping? How does the machine learn these relationships? And even more fundamentally, how does the machine know that a piece of content is about beekeeping in the first place?
Topics:
(00:00) A friend who works at Spotify
(02:17) Series recap
(03:33) Collaborative filtering (user-based)
(08:48) Collaborative filtering (item-based)
(13:54) Indirect collaboration
(15:49) Content-based filtering
(19:12) Overfitting
(21:50) Multi-dimensional models and the Music Genome Project
(24:19) The move to deep learning
(25:58) Patreon thanks
Sources:
Video: How Recommender Systems Work (Netflix/Amazon)
Article: A Comprehensive List of Similarity Search Algorithms
Article: Basics of Recommender Systems
Article: The history of Amazon's recommendation algorithm
Article: How AI helps Spotify win in the music streaming world
Article: AI's new workforce: the data-labelling industry spreads globally
------- Presented by Podbee -------
This podcast contains advertising for Frink. Enter the code FRNKPOD in the "Use Coupon Code" field in the left-hand menu of the Frink app to start your membership with a 200 TL discount. Download now and start your membership: click here.
This podcast contains advertising for Hiwell. To download Hiwell and benefit from a special discount with the code podbee10, click here.
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
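As a toy illustration of the user-based collaborative filtering idea discussed in the episode (not the episode's code; the ratings matrix is made up):

```python
# Toy sketch of user-based collaborative filtering: recommend what similar users liked.
# The ratings matrix is made up; rows = users, columns = items, 0 = not rated yet.
import numpy as np

ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [1, 0, 5, 4, 5],
    [0, 1, 4, 5, 4],
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def recommend(user, k=2):
    # 1. How similar is every other user to this one?
    sims = np.array([cosine(ratings[user], ratings[u]) for u in range(len(ratings))])
    sims[user] = 0.0                               # ignore self-similarity
    neighbors = sims.argsort()[::-1][:k]           # k most similar users
    # 2. Predicted score = similarity-weighted average of the neighbors' ratings.
    scores = sims[neighbors] @ ratings[neighbors] / (sims[neighbors].sum() + 1e-9)
    # 3. Never re-recommend something the user already rated.
    scores[ratings[user] > 0] = -np.inf
    return int(scores.argmax())

print("Recommend item", recommend(0), "to user 0")   # index of the best unrated item
```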
Two challenges for the reliability of forecasts in live operation are overfitting (the model is fitted too closely to the training data) and data leakage (the model has access to information it would not have in the real application). We talk about what exactly overfitting and data leakage are and where their causes lie. We also discuss approaches for addressing them.
**Links:**
Spurious Correlations: https://www.tylervigen.com/spurious-correlations
inwt website: https://www.inwt-statistics.de/
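As a toy illustration of the overfitting problem described above (not from the episode; the data and model are synthetic):

```python
# Tiny sketch of overfitting: an unconstrained model fits the training data almost
# perfectly but generalizes worse. Synthetic data, purely illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.4, size=300)     # noisy signal
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, None):                                   # None = grow until leaves are pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train R^2={tree.score(X_tr, y_tr):.2f}, "
          f"test R^2={tree.score(X_te, y_te):.2f}")
# The unconstrained tree scores ~1.0 on the training data but typically noticeably
# worse on the held-out set; that gap is the overfitting described above.
```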
In 2023 we did a few Fundamentals episodes covering Benchmarks 101, Datasets 101, FlashAttention, and Transformers Math, and it turns out those were some of your evergreen favorites! So we are experimenting with more educational/survey content in the mix alongside our regular founder and event coverage. Pls request more!
We have a new calendar for events; join to be notified of upcoming things in 2024!
Today we visit the shoggoth mask factory: how do transformer models go from trawling a deeply learned latent space for next-token prediction to a helpful, honest, harmless chat assistant? Our guest “lecturer” today is Nathan Lambert; you might know him from his prolific online writing on Interconnects and Twitter, or from his previous work leading RLHF at Hugging Face and now at the Allen Institute for AI (AI2), which recently released the open-source GPT-3.5-class Tulu 2 model trained with DPO. He's widely considered one of the most knowledgeable people on RLHF and RLAIF. He recently gave an “RLHF 201” lecture at Stanford, so we invited him on the show to re-record it for everyone to enjoy! You can find the full slides here, which you can use as a reference through this episode.
Full video with synced slides
For audio-only listeners, this episode comes with a slide presentation alongside our discussion. You can find it on our YouTube (like, subscribe, tell a friend, et al.).
Theoretical foundations of RLHF
The foundations and assumptions that go into RLHF go back all the way to Aristotle (and you can find guidance for further research in the slide below), but there are two key concepts that will be helpful in thinking through this topic and LLMs in general:
* Von Neumann–Morgenstern utility theorem: you can dive into the math here, but the TLDR is that when humans make decisions there's usually a “maximum utility” function that measures what the best decision would be; the fact that this function exists makes it possible for RLHF to model human preferences and decision making.
* Bradley-Terry model: given two items A and B from a population, you can model the probability that A will be preferred to B (or vice versa). In our world, A and B are usually two outputs from an LLM (or at the lowest level, the next token). It turns out that from this minimal set of assumptions, you can build up the mathematical foundations supporting the modern RLHF paradigm! (A short illustrative sketch follows below.)
The RLHF loop
One important point Nathan makes is that "for many tasks we want to solve, evaluation of outcomes is easier than producing the correct behavior". For example, it might be difficult for you to write a poem, but it's really easy to say if you like or dislike a poem someone else wrote. Going back to the Bradley-Terry model we mentioned, the core idea behind RLHF is that when given two outputs from a model, you will be able to say which of the two you prefer, and we'll then re-encode that preference into the model.
An important point that Nathan mentions is that when you use these preferences to change model behavior, "it doesn't mean that the model believes these things. It's just trained to prioritize these things". When you have a preference for a model not to return instructions on how to write a computer virus, for example, you're not erasing the weights that hold that knowledge; you're simply making it hard for that information to surface by prioritizing answers that don't return it.
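To make the Bradley-Terry connection concrete, here is a minimal sketch (not from the episode or its slides) of the pairwise preference probability and the reward-model loss it induces; the scores are made up:

```python
# Minimal sketch of the Bradley-Terry preference probability and the pairwise
# reward-model loss built on it. Illustrative only; scores are made-up numbers.
import torch
import torch.nn.functional as F

def bradley_terry_prob(r_a, r_b):
    # P(A preferred over B) = exp(r_A) / (exp(r_A) + exp(r_B)) = sigmoid(r_A - r_B)
    return torch.sigmoid(r_a - r_b)

def reward_model_loss(r_chosen, r_rejected):
    # Maximize the log-likelihood that the human-chosen output wins the comparison.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy scores a reward model might assign to chosen vs. rejected completions.
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.4, 0.9, -0.5])
print(bradley_terry_prob(r_chosen, r_rejected))   # per-pair win probabilities
print(reward_model_loss(r_chosen, r_rejected))    # scalar training loss
```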
We'll talk more about this in our future Fine Tuning 101 episode as we break down how information is stored in models and how fine-tuning affects it.
At a high level, the loop looks something like this:
For many RLHF use cases today, we can assume the model we're training is already instruction-tuned for chat or whatever behavior the model is looking to achieve. In the "Reward Model & Other Infrastructure" box we have multiple pieces:
Reward + Preference Model
The reward model is trying to signal to the model how much it should change its behavior based on the human preference, subject to a KL constraint. The preference model itself scores the pairwise preferences from the same prompt (this worked better than scalar rewards). One way to think about it is that the reward model tells the model how big of a change this new preference should make to its behavior in absolute terms, while the preference model calculates how big of a difference there is between the two outputs in relative terms. A lot of this derives from John Schulman's work on PPO:
We recommend watching him talk about it in the video above, and also Nathan's pseudocode distillation of the process:
Feedback Interfaces
Unlike the "thumbs up/down" buttons in ChatGPT, data annotation from labelers is much more thorough and has many axes of judgment. At a simple level, the LLM generates two outputs, A and B, for a given human conversation. It then asks the labeler to use a Likert scale to score which one they preferred, and by how much:
Through the labeling process, there are many other ways to judge a generation:
We then use all of this data to train a model from the preference pairs we have. We start from the base instruction-tuned model, and then run training in which the loss of our gradient descent is based on the difference in reward between the good and the bad completion.
Constitutional AI (RLAIF, model-as-judge)
As these models have gotten more sophisticated, people started asking whether humans are actually a better judge of harmfulness, bias, etc., especially at the current price of data labeling. Anthropic's work in the "Constitutional AI" paper uses models to judge models. This is part of a broader "RLAIF" space: Reinforcement Learning from AI Feedback. By using a "constitution" that the model has to follow, you are able to generate fine-tuning data for a new model that will be RLHF'd on the constitution's principles. The RLHF model will then be able to judge outputs of models to make sure that they follow its principles:
Emerging Research
RLHF is still a nascent field, and there are a lot of different research directions teams are taking; some of the newest and most promising / hyped ones:
* Rejection sampling / Best-of-N sampling: the core idea here is that rather than just scoring pairwise generations, you generate a lot more outputs (= more inference cost), score them all with your reward model, and then pick the top results. LLaMA2 used this approach, among many others.
* Process reward models: in chain-of-thought generation, scoring each step in the chain and treating it like its own state rather than just scoring the full output. This is most effective in fields like math that inherently require step-by-step reasoning.
* Direct Preference Optimization (DPO): We covered DPO in our NeurIPS Best Papers recap, and Nathan has a whole blog post on this; DPO isn't technically RLHF as it doesn't have the RL part, but it's the “GPU Poor” version of it. Mistral-Instruct was a DPO model, as are Intel's Neural Chat and StableLM Zephyr.
Expect to see a lot more variants in 2024 given how “easy” this was.
* Superalignment: OpenAI launched research on weak-to-strong generalization, which we briefly discuss at the 1hr mark.
Note: Nathan also followed up this post with RLHF resources from his and peers' work:
Show Notes
* Full RLHF Slides
* Interconnects
* Retort (podcast)
* von Neumann-Morgenstern utility theorem
* Bradley-Terry model (pairwise preferences model)
* Constitutional AI
* Tamer (2008 paper by Bradley Knox and Peter Stone)
* Paul Christiano et al. RLHF paper
* InstructGPT
* Eureka by Jim Fan
* ByteDance / OpenAI lawsuit
* AlpacaEval
* MTBench
* TruthfulQA (evaluation tool)
* Self-Instruct Paper
* Open Assistant
* Louis Castricato
* Nazneen Rajani
* Tulu (DPO model from the Allen Institute)
Timestamps
* [00:00:00] Introductions and background on the lecture origins
* [00:05:17] History of RL and its applications
* [00:10:09] Intellectual history of RLHF
* [00:13:47] RLHF for decision-making and pre-deep RL vs deep RL
* [00:20:19] Initial papers and intuitions around RLHF
* [00:27:57] The three phases of RLHF
* [00:31:09] Overfitting issues
* [00:34:47] How preferences get defined
* [00:40:35] Ballpark on LLaMA2 costs
* [00:42:50] Synthetic data for training
* [00:47:25] Technical deep dive in the RLHF process
* [00:54:34] Projection / best event sampling
* [00:57:49] Constitutional AI
* [01:04:13] DPO
* [01:08:54] What's the Allen Institute for AI?
* [01:13:43] Benchmarks and models comparisons
Transcript
Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.
Swyx [00:00:15]: Hey, and today we have Dr. Nathan Lambert in the house. Welcome.
Nathan [00:00:18]: Thanks guys.
Swyx [00:00:19]: You didn't have to come too far. You got your PhD in Berkeley, and it seems like you've lived there most of the time in recent years. You worked on robotics and model-based reinforcement learning on your PhD, and you also interned at FAIR and DeepMind. You bootstrapped the RLHF team at Hugging Face, and you recently joined the Allen Institute as a research scientist. So that's your quick bio. What should people know about you that maybe is not super obvious about you on New LinkedIn?
Nathan [00:00:43]: I stay sane in various insane sport and ultra-endurance sport activities that I do.
Swyx [00:00:50]: What's an ultra-endurance sport activity?
Nathan [00:00:52]: Long-distance trail running or gravel biking. Try to unplug sometimes, although it's harder these days. Yeah.
Swyx [00:00:59]: Well, you know, just the Bay Area is just really good for that stuff, right?
Nathan [00:01:02]: Oh, yeah. You can't beat it. I have a trailhead like 1.2 miles from my house, which is pretty unmatchable in any other urban area.
Swyx [00:01:11]: Pretty excellent. You also have an incredible blog, Interconnects, which I'm a fan of. And I also just recently discovered that you have a new podcast, Retort.
Nathan [00:01:20]: Yeah, we do. I've been writing for a while, and I feel like I've finally started to write things that are understandable and fun. After a few years lost in the wilderness, if you ask some of my friends that I made read the earlier blogs, they're like, oh, this is yikes, but it's coming along.
And the podcast is with my friend Tom, and we just kind of like riff on what's actually happening on AI and not really do news recaps, but just what it all means and have a more critical perspective on the things that really are kind of funny, but still very serious happening in the world of machine learning.Swyx [00:01:52]: Yeah. Awesome. So let's talk about your work. What would you highlight as your greatest hits so far on Interconnects, at least?Nathan [00:01:59]: So the ones that are most popular are timely and or opinion pieces. So the first real breakout piece was when April and I also just wrote down the thing that everyone in AI was feeling, which is we're all feeling stressed, that we're going to get scooped, and that we're overworked, which is behind the curtain, what it feels to work in AI. And then a similar one, which we might touch on later in this, was about my recent job search, which wasn't the first time I wrote a job search post. People always love that stuff. It's so open. I mean, it's easy for me to do in a way that it's very on-brand, and it's very helpful. I understand that until you've done it, it's hard to share this information. And then the other popular ones are various model training techniques or fine tuning. There's an early one on RLHF, which is, this stuff is all just like when I figure it out in my brain. So I wrote an article that's like how RLHF actually works, which is just the intuitions that I had put together in the summer about RLHF, and that was pretty well. And then I opportunistically wrote about QSTAR, which I hate that you have to do it, but it is pretty funny. From a literature perspective, I'm like, open AI publishes on work that is very related to mathematical reasoning. So it's like, oh, you just poke a little around what they've already published, and it seems pretty reasonable. But we don't know. They probably just got like a moderate bump on one of their benchmarks, and then everyone lost their minds. It doesn't really matter.Swyx [00:03:15]: You're like, this is why Sam Altman was fired. I don't know. Anyway, we're here to talk about RLHF 101. You did a presentation, and I think you expressed some desire to rerecord it. And that's why I reached out on Twitter saying, like, why not rerecord it with us, and then we can ask questions and talk about it. Yeah, sounds good.Nathan [00:03:30]: I try to do it every six or 12 months is my estimated cadence, just to refine the ways that I say things. And people will see that we don't know that much more, but we have a bit of better way of saying what we don't know.Swyx [00:03:43]: Awesome. We can dive right in. I don't know if there's any other topics that we want to lay out as groundwork.Alessio [00:03:48]: No, you have some awesome slides. So for people listening on podcast only, we're going to have the slides on our show notes, and then we're going to have a YouTube version where we run through everything together.Nathan [00:03:59]: Sounds good. Yeah. I think to start skipping a lot of the, like, what is a language model stuff, everyone knows that at this point. I think the quote from the Llama 2 paper is a great kind of tidbit on RLHF becoming like a real deal. There was some uncertainty earlier in the year about whether or not RLHF was really going to be important. I think it was not that surprising that it is. I mean, with recent models still using it, the signs were there, but the Llama 2 paper essentially reads like a bunch of NLP researchers that were skeptical and surprised. 
So the quote from the paper was, meanwhile, reinforcement learning known for its instability seemed a somewhat shadowy field for those in the NLP research community. However, reinforcement learning proved highly effective, particularly given its cost and time effectiveness. So you don't really know exactly what the costs and time that Meta is looking at, because they have a huge team and a pretty good amount of money here to release these Llama models. This is just the kind of thing that we're seeing now. I think any major company that wasn't doing RLHF is now realizing they have to have a team around this. At the same time, we don't have a lot of that in the open and research communities at the same scale. I think seeing that converge would be great, but it's still very early days. And the other thing on the slide is some of Anthropic's work, but everyone knows Anthropic is kind of the masters of this, and they have some of their own techniques that we're going to talk about later on, but that's kind of where we start.Alessio [00:05:17]: Can we do just a one-second RL version? So you come from a robotics background, which RL used to be, or maybe still is, state-of-the-art. And then now you're seeing a lot of LLM plus RL, so you have the gym fans, Eureka, you have MPU, which we had on the podcast when they started with RL. Now they're doing RL plus LLMs. Yeah. Any thoughts there on how we got here? Maybe how the pendulum will keep swinging?Nathan [00:05:46]: I really think RL is about a framing of viewing the world through trial and error learning and feedback, and really just one that's focused on thinking about decision-making and inputs in the world and how inputs have reactions. And in that, a lot of people come from a lot of different backgrounds, whether it's physics, electrical engineering, mechanical engineering. There are obviously computer scientists, but compared to other fields of CS, I do think it's a much more diverse background of people. My background was in electrical engineering and doing robotics and things like that. It really just changes the worldview. I think that reinforcement learning as it was back then, so to say, is really different. You're looking at these toy problems and the numbers are totally different, and everyone went kind of zero to one at scaling these things up, but people like Jim Phan and other people that were... You saw this transition in the decision transformer and papers and when people are trying to use transformers to do decision-making for things like offline RL, and I think that was kind of like the early days. But then once language models were so proven, it's like everyone is using this tool for their research. I think in the long run, it will still settle out, or RL will still be a field that people work on just because of these kind of fundamental things that I talked about. It's just viewing the whole problem formulation different than predicting text, and so there needs to be that separation. And the view of RL in language models is pretty contrived already, so it's not like we're doing real RL. I think the last slide that I have here is a way to make RLHF more like what people would think of with RL, so actually running things over time, but a weird lineage of tools that happen to get us to where we are, so that's why the name takes up so much space, but it could have gone a lot of different ways. Cool.Alessio [00:07:29]: We made it one slide before going on a tangent.Nathan [00:07:31]: Yeah, I mean, it's kind of related. 
This is a...Swyx [00:07:35]: Yeah, so we have a history of RL.Nathan [00:07:37]: Yeah, so to give the context, this paper really started because I have this more diverse background than some computer scientists, such as trying to understand what the difference of a cost function or a reward function and a preference function would be without going into all of the details. Costs are normally things that control theorists would work with in these kind of closed domains, and then reinforcement learning has always worked with rewards that's central to the formulation that we'll see, and then the idea was like, okay, we now are at preferences, and each step along the way there's kind of different assumptions that you're making. We'll get into these, and those assumptions are built on other fields of work. So that's what this slide is going to say, it's like RLHF, while directly building on tools from RL and language models, is really implicitly impacted and built on theories and philosophies spanning tons of human history. I think we cite Aristotle in this paper, which is fun. It's like going pre-BC, it's like 2,300 years old or something like that. So that's the reason to do this, I think. We kind of list some things in the paper about summarizing what different presumptions of RLHF could be. I think going through these is actually kind of funny. It's fun to talk about these, because they're kind of grab bags of things that you'll see return throughout this podcast that we're talking about it. The core thing of RLHF that, in order to be a believer in this, is that RL actually works. It's like, if you have a reward function, you can optimize it in some way and get a different performance out of it, and you could do this at scale, and you could do this in really complex environments, which is, I don't know how to do that in all the domains. I don't know how to exactly make chat GPT. So it's kind of, we'll overshadow everything. And then there's, go from something kind of obvious like that, and then you read the von Neumann-Morgenstern utility theorem, which is essentially an economic theory that says you can weight different probabilities of different people, which is a theoretical piece of work that is the foundation of utilitarianism, and trying to quantify preferences is crucial to doing any sort of RLHF. And if you look into this, all of these things, there's way more you could go into if you're interested in any of these. So this is kind of like grabbing a few random things, and then kind of similar to that is the Bradley-Terry model, which is the fancy name for the pairwise preferences that everyone is doing. And then all the things that are like, that Anthropic and OpenAI figured out that you can do, which is that you can aggregate preferences from a bunch of different people and different sources. And then when you actually do RLHF, you extract things from that data, and then you train a model that works somehow. And we don't know, there's a lot of complex links there, but if you want to be a believer in doing this at scale, these are the sorts of things that you have to accept as preconditions for doing RLHF. Yeah.Swyx [00:10:09]: You have a nice chart of like the sort of intellectual history of RLHF that we'll send people to refer to either in your paper or in the YouTube video for this podcast. But I like the other slide that you have on like the presumptions that you need to have for RLHF to work. You already mentioned some of those. Which one's underappreciated? 
Like, this is the first time I've come across the VNM Utility Theorem.Nathan [00:10:29]: Yeah, I know. This is what you get from working with people like to my co-host on the podcast, the rhetoric is that sociologist by training. So he knows all these things and like who the philosophers are that found these different things like utilitarianism. But there's a lot that goes into this. Like essentially there's even economic theories that like there's debate whether or not preferences exist at all. And there's like different types of math you can use with whether or not you actually can model preferences at all. So it's pretty obvious that RLHF is built on the math that thinks that you can actually model any human preference. But this is the sort of thing that's been debated for a long time. So all the work that's here is like, and people hear about in their AI classes. So like Jeremy Bentham, like hedonic calculus and all these things like these are the side of work where people assume that preferences can be measured. And this is like, I don't really know, like, this is what I kind of go on a rant and I say that in RLHF calling things a preference model is a little annoying because there's no inductive bias of what a preference is. It's like if you were to learn a robotic system and you learned a dynamics model, like hopefully that actually mirrors the world in some way of the dynamics. But with a preference model, it's like, Oh my God, I don't know what this model, like I don't know what chat GPT encodes as any sort of preference or what I would want it to be in a fair way. Anthropic has done more work on trying to write these things down. But even like if you look at Claude's constitution, like that doesn't mean the model believes these things. It's just trained to prioritize these things. And that's kind of what the later points I'm looking at, like what RLHF is doing and if it's actually like a repeatable process in the data and in the training, that's just unknown. And we have a long way to go before we understand what this is and the link between preference data and any notion of like writing down a specific value.Alessio [00:12:05]: The disconnect between more sociology work versus computer work already exists, or is it like a recent cross contamination? Because when we had Tri Dao on the podcast, he said FlashAttention came to be because at Hazy they have so much overlap between systems engineer and like deep learning engineers. Is it the same in this field?Nathan [00:12:26]: So I've gone to a couple of workshops for the populations of people who you'd want to include this like R. I think the reason why it's not really talked about is just because the RLHF techniques that people use were built in labs like OpenAI and DeepMind where there are some of these people. These places do a pretty good job of trying to get these people in the door when you compare them to like normal startups. But like they're not bringing in academics from economics, like social choice theory. There's just too much. Like the criticism of this paper that this is based on is like, oh, you're missing these things in RL or at least this decade of RL and it's like it would be literally be bigger than the Sutton and Barto book if you were to include everyone. So it's really hard to include everyone in a principled manner when you're designing this. It's just a good way to understand and improve the communication of what RLHF is and like what is a good reward model for society. 
It really probably comes down to what an individual wants and it'll probably motivate models to move more in that direction and just be a little bit better about the communication, which is a recurring theme and kind of my work is like I just get frustrated when people say things that don't really make sense, especially when it's going to manipulate individual's values or manipulate the general view of AI or anything like this. So that's kind of why RLHF is so interesting. It's very vague in what it's actually doing while the problem specification is very general.Swyx [00:13:42]: Shall we go to the, I guess, the diagram here on the reinforcement learning basics? Yeah.Nathan [00:13:47]: So reinforcement learning, I kind of mentioned this, it's a trial and error type of system. The diagram and the slides is really this classic thing where you have an agent interacting with an environment. So it's kind of this agent has some input to the environment, which is called the action. The environment returns a state and a reward and that repeats over time and the agent learns based on these states and these rewards that it's seeing and it should learn a policy that makes the rewards go up. That seems pretty simple than if you try to mentally map what this looks like in language, which is that like the language models don't make this easy. I think with the language model, it's very hard to define what an environment is. So if the language model is the policy and it's generating, it's like the environment should be a human, but setting up the infrastructure to take tens of thousands of prompts and generate them and then show them to a human and collect the human responses and then shove that into your training architecture is very far away from working. So we don't really have an environment. We just have a reward model that returns a reward and the state doesn't really exist when you look at it like an RL problem. What happens is the state is a prompt and then you do a completion and then you throw it away and you grab a new prompt. We're really in as an RL researcher, you would think of this as being like you take a state, you get some completion from it and then you look at what that is and you keep kind of iterating on it and all of that isn't here, which is why you'll hear RLHF referred to as bandits problem, which is kind of like you choose one action and then you watch the dynamics play out. There's many more debates that you can have in this. If you get the right RL people in the room, then kind of like this is an RL even when you zoom into what RLHF is doing.Alessio [00:15:22]: Does this change as you think about a chain of thought reasoning and things like that? Like does the state become part of the chain that you're going through?Nathan [00:15:29]: There's work that I've mentioned on one slide called process reward models that essentially rewards each step in the chain of thought reasoning. It doesn't really give the part of interaction, but it does make it a little bit more fine grained where you can think about like calling it at least you have many states from your initial state. That formulation I don't think people have fully settled on. I think there's a bunch of great work out there, like even OpenAI is releasing a lot of this and let's verify step by step is there pretty great paper on the matter. 
I think in the next year that'll probably get made more concrete by the community on like if you can easily draw out like if chain of thought reasoning is more like RL, we can talk about that more later. That's a kind of a more advanced topic than we probably should spend all the time on.Swyx [00:16:13]: RLHF for decision making. You have a slide here that compares pre-deep RL versus deep RL.Nathan [00:16:19]: This is getting into the history of things, which is showing that the work that people are using now really came from well outside of NLP and it came before deep learning was big. Next up from this paper, Tamer, which is from 2008. Some names that are still really relevant in kind of human centric RL, Bradley Knox and Peter Stone. If you have an agent take an action, you would just have a human give a score from zero to one as a reward rather than having a reward function. And then with that classifier, you can do something with a policy that learns to take actions to maximize that reward. It's a pretty simple setup. It works in simple domains. And then the reason why this is interesting is you compare it to the paper that everyone knows, which is this Paul Christiano et al. Deep Reinforced Learning from Human Preferences paper, which is where they showed that learning from human preferences, you can solve like the basic RL tasks at the time. So various control problems and simulation and this kind of like human preferences approach had higher rewards in some environments than if you just threw RL at the environment that returned a reward. So the preferences thing was you took two trajectories. So in this case, it was like complete trajectories of the agent and the human was labeling which one is better. You can see how this kind of comes to be like the pairwise preferences that are used today that we'll talk about. And there's also a really kind of interesting nugget that is the trajectory that the humans were labeling over has a lot more information than the RL algorithm would see if you just had one state, which is kind of why people think that it's why the performance in this paper was so strong. But I still think that it's surprising that there isn't more RL work of this style happening now. This paper is in 2017. So it's like six years later and I haven't seen things that are exactly similar, but it's a great paper to understand where stuff that's happening now kind of came from.Swyx [00:17:58]: Just on the Christiano paper, you mentioned the performance being strong. I don't remember what results should I have in mind when I think about that paper?Nathan [00:18:04]: It's mostly like if you think about an RL learning curve, which is like on the X axis, you have environment interactions on the Y axis, you have performance. You can think about different like ablation studies of between algorithms. So I think they use like A2C, which I don't even remember what that stands for as their baseline. But if you do the human preference version on a bunch of environments, like the human preference labels, the agent was able to learn faster than if it just learned from the signal from the environment, which means like it's happening because the reward model has more information than the agent would. But like the fact that it can do better, I was like, that's pretty surprising to me because RL algorithms are pretty sensitive. So I was like, okay.Swyx [00:18:41]: It's just one thing I do want to establish as a baseline for our listeners. We are updating all the weights. 
In some sense, the next token prediction task of training a language model is a form of reinforcement learning. Except that it's not from human feedback. It's just self-supervised learning from a general corpus. There's one distinction which I love, which is that you can actually give negative feedback. Whereas in a general sort of pre-training situation, you cannot. And maybe like the order of magnitude of feedback, like the Likert scale that you're going to talk about, that actually just gives more signal than a typical training process would do in a language model setting. Yeah.Nathan [00:19:15]: I don't think I'm the right person to comment exactly, but like you can make analogies that reinforcement learning is self-supervised learning as well. Like there are a lot of things that will point to that. I don't know whether or not it's a richer signal. I think that could be seen in the results. It's a good thing for people to look into more. As reinforcement learning is so much less compute, like it is a richer signal in terms of its impact. Because if they could do what RLHF is doing at pre-training, they would, but they don't know how to have that effect in like a stable manner. Otherwise everyone would do it.Swyx [00:19:45]: On a practical basis, as someone fine-tuning models, I have often wished for negative fine-tuning, which pretty much doesn't exist in OpenAI land. And it's not the default setup in open-source land.Nathan [00:19:57]: How does this work in like diffusion models and stuff? Because you can give negative prompts to something to like stable diffusion or whatever. It's for guidance.Swyx [00:20:04]: That's for clip guidance.Nathan [00:20:05]: Is that just from like how they prompt it then? I'm just wondering if we could do something similar. It's another tangent.Swyx [00:20:10]: I do want to sort of spell that out for people in case they haven't made the connection between RLHF and the rest of the training process. They might have some familiarity with it.Nathan [00:20:19]: Yeah. The upcoming slides can really dig into this, which is like this in 2018 paper, there was a position paper from a bunch of the same authors from the Christiano paper and from the OpenAI work that everyone knows, which is like, they write a position paper on what a preference reward model could do to solve alignment for agents. That's kind of based on two assumptions. The first assumption is that we can learn user intentions to a sufficiently high accuracy. That doesn't last with me because I don't know what that means. But the second one is pretty telling in the context of RLHF, which is for many tasks we want to solve, evaluation of outcomes is easier than producing the correct behavior. And this is the whole thing. It's like we can compare two poems that the model generates and it can be viewed as liking a positive example, or it could be viewed as really disliking a negative example. And that's what I think a lot of people are doing in like the harm space is like a harmful response to a language model, whether or not you agree with the company's definition of harms is that it's a really bad negative example and they downweight them by preferring something more benign in the RLHF process, among other ways of dealing with safety. So that's a good way of saying it's like this is core, this kind of like comparison and positive or negative example is core to all of the RLHF work that has continued.Swyx [00:21:29]: People often say, I don't know what I want, but I'll know when I see it. 
This is that expressed in reinforcement learning tools.Nathan [00:21:35]: Yeah, it is. Yeah, it is. That's what everyone's doing in the preference modeling stage that we'll get to. Yeah. Yeah. And you can see there are more papers. This is really just to have all the links for people that go deeper. There's a Ziegler et al. paper in 2019, which shows that you can do this RLHF process on language models. This familiar diagram starts to emerge in 2019, and it's just to show that this goes really far back. I think we can kind of breeze through some of these. And then 2020 is the first open AI experiment that I think caught people's eyes, which is this learning to summarize experiment. It has this three-step process that we'll go to into more when I kind of go into the main concepts. But this is like the first time you see this diagram that they reuse with InstructGPT, they reuse with ChatGPT. And the types of examples that they would have, I don't think I need to read these exactly, but one that I have read a whole bunch of times is like, they took these prompts from Reddit that was like, explain like I'm five or get career advice, and people really pour their heart and soul into these. So these are like multi-paragraph pieces of writing. And then they essentially do comparisons between a vanilla language model, like I think it was either GPT-2 or GPT-3, I don't always get the exact years.Swyx [00:22:42]: 3 was early 2020. So that's about right.Nathan [00:22:45]: Yeah. So this is probably done with GPT-2. It doesn't really matter. But the language model does normal things when you do few shot, which is like it repeats itself. It doesn't have nice text. And what they did is that this was the first time where the language model would generate like pretty nice text from an output. It was restricted to the summarization domain. But I think that I guess this is where I wish I was paying attention more because I would see the paper, but I didn't know to read the language model outputs and kind of understand this qualitative sense of the models very well then. Because you look at the plots in the papers, these Learning to Summarize and Destruct GPT have incredibly pretty plots, just like nicely separated lines with error bars and they're like superfine tuning works, the RL step works. But if you were early to see like how different the language that was written by these models was, I think you could have been early to like things like ChatGPT and knowing RLHF would matter. And now I think the good people know to chat with language models, but not even everyone does this. Like people are still looking at numbers. And I think OpenAI probably figured it out when they were doing this, how important that could be. And then they had years to kind of chisel away at that and that's why they're doing so well now. Yeah.Swyx [00:23:56]: I mean, arguably, you know, it's well known that ChatGPT was kind of an accident that they didn't think it would be that big of a deal. Yeah.Nathan [00:24:02]: So maybe they didn't. Maybe they didn't, but they were getting the proxy that they needed.Swyx [00:24:06]: I've heard off the record from other labs that it was in the air. If OpenAI didn't do it, someone else would have done it. So you've mentioned a couple of other papers that are very seminal to this period. 
And I love how you say way back when in referring to 2019. Nathan [00:24:19]: It feels like it in my life. Swyx [00:24:21]: So how much should people understand the relationship between RLHF, instruction tuning, PPO, KL divergence, anything like that? Like how would you construct the level of knowledge that people should dive into? What should people know at the high level? And then if people want to dive in deeper, where do they go? Is instruction tuning important here or is that part of the overall process towards modern RLHF? Nathan [00:24:44]: I think for most people, instruction tuning is probably still more important in their day to day life. I think instruction tuning works very well. You can write samples by hand that make sense. You can get the model to learn from them. You could do this with very low compute. It's easy to do almost in like no-code solutions at this point. And the loss function is really straightforward. And then if you're interested in RLHF, you can kind of learn from it from a different perspective, which is like how the instruction tuning distribution makes it easier for your RLHF model to learn. There's a lot of details depending on your preference data, if it's close to your instruction model or not, if that matters. But that's really at the RLHF stage. So I think it's nice to segment and just kind of understand what your level of investment and goals are. I think instruction tuning still can do most of what you want to do. And it's like, if you want to think about RLHF, at least before DPO really had taken off at all, it would be like, do you want to have a team of at least like five people if you're really thinking about doing RLHF? I think DPO makes it a little bit easier, but that's still really limited to kind of one dataset that everyone's using at this point. Like everyone's using this UltraFeedback dataset and it boosts AlpacaEval, MT-Bench, TruthfulQA and like the qualitative model a bit. We don't really know why. It's like, it might just be a dataset combined with the method, but you've got to be ready for a bumpy ride if you're wanting to try to do RLHF. I don't really recommend most startups to do it unless it's like going to provide them a clear competitive advantage in their kind of niche, because you're not going to make your model ChatGPT-like, better than OpenAI, or anything like that. You've got to accept that there's some exploration there and you might get a vein of benefit in your specific domain, but I'm still like, oh, be careful going into the RLHF can of worms. You probably don't need to. Swyx [00:26:27]: Okay. So there's a bit of a time skip in what you mentioned. DPO is like a couple months old, so we'll leave that towards the end. I think the main result that I think most people talk about at this stage, we're talking about September 2020 and then going into, I guess maybe last year, was Vicuna as one of the more interesting applications of instruction tuning that pushed LLaMA 1 from, let's say, a GPT-3-ish model to a GPT-3.5 model in pure open source with not a lot of resources. I think, I mean, they said something like, you know, they use like under $100 to make this. Nathan [00:26:58]: Yeah. Like instruction tuning can really go a long way. I think the claims of ChatGPT level are long overblown in most of the things in open source. I think it's not to say, like Vicuna was a huge step and it's just kind of showing that instruction tuning with the right data will completely change what it feels like to talk with your model.
Yeah. Swyx [00:27:19]: From text completion to actually chatting back and forth. Yeah. Yeah. Nathan [00:27:23]: Instruction tuning can be multi-turn. Just having a little bit of data that's like a couple of turns can go a really long way. That was like the story of the whole first part of the year: people would be surprised by how far you can take instruction tuning on a small model. I think the things that people see now is like the small models don't really handle nuance as well and they could be more repetitive even if they have really good instruction tuning. But if you take that kind of 7 to 70 billion parameter jump, the instruction tuning at the bigger model is like, robustness, little things make more sense. So that's still just with instruction tuning and scale more than anything else. Swyx [00:27:56]: Excellent. Shall we go to technical overview? Nathan [00:27:58]: Yeah. This is kind of where we go through my own version of this like three-phase process. You can talk about instruction tuning, which we've talked about a lot. It's funny because all these things, instruction tuning has the fewest slides, even though it's the most practical thing for most people. We could save the debate for like if the big labs still do instruction tuning for later, but that's a coming wave for people. And then like preference data and training, and then kind of like what does reinforcement learning optimization actually mean? We talk about these sequentially because you really have to be able to do each of them to be able to do the next one. You need to be able to have a model that's chatty or helpful instruction following. Every company has their own word that they like to assign to what instructions mean. And then once you have that, you can collect preference data and do some sort of optimization. Swyx [00:28:39]: When you say word, you mean like angle bracket inst or do you mean something else? Nathan [00:28:42]: Oh, I don't even know what inst means, but just saying like they use their adjective that they like. I think Anthropic also, like, steerable is another one. Swyx [00:28:51]: Just the way they describe it. Yeah. Nathan [00:28:53]: So like instruction tuning, we've covered most of this. It's really about like you should try to adapt your models to specific needs. It makes models that were only okay extremely comprehensible. A lot of the times it's where you start to get things like chat templates. So if you want to do system prompts, if you want to ask your model, like act like a pirate, that's one of the ones I always do, which is always funny, but like whatever you like, act like a chef, like anything, this is where those types of things that people really know in language models start to get applied. So it's good as a kind of starting point because this chat template is used in RLHF and all of these things down the line, but it's a basic pointer. It's like, once you see this with instruction tuning, you really know it, which is like you take things like Stack Overflow where you have a question and an answer. You format that data really nicely. There's much more tricky things that people do, but I still think the vast majority of it is question answer. Please explain this topic to me, generate this thing for me. That hasn't changed that much this year. I think people have just gotten better at scaling up the data that they need. Yeah, this is where this talk will kind of take a whole left turn into more technical detail land.
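To make the instruction-tuning step concrete, here is a minimal sketch of formatting a prompt-response pair with a chat template and computing the plain next-token cross-entropy loss only on the response tokens. The special tokens, the toy tokenizer, and the random "model" are assumptions for illustration, not any lab's actual template.

```python
# A minimal sketch, with made-up special tokens, of chat-template formatting
# plus the masked next-token loss used in supervised fine-tuning.
import torch
import torch.nn.functional as F

def format_chat(system: str, user: str, assistant: str) -> str:
    # Hypothetical template; every model family defines its own.
    return (f"<|system|>{system}\n"
            f"<|user|>{user}\n"
            f"<|assistant|>{assistant}<|end|>")

example = format_chat(
    system="Act like a pirate.",
    user="Explain overfitting.",
    assistant="Arr, overfittin' be memorizin' the map instead o' learnin' to sail.",
)

# Pretend tokenization: one "token" per whitespace-separated piece.
tokens = example.split()
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = torch.tensor([vocab[t] for t in tokens])

# Labels are the next token; prompt positions are set to -100 so the loss
# is only computed on the assistant response.
response_start = tokens.index("<|assistant|>Arr,")  # first response token
inputs, labels = ids[:-1], ids[1:].clone()
labels[: response_start - 1] = -100

# Random logits stand in for a language model head.
logits = torch.randn(len(inputs), len(vocab))
loss = F.cross_entropy(logits, labels, ignore_index=-100)
print(float(loss))
```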
I put a slide with the RLHF objective, which I think is good for people to know. I've started going back to this more, just to kind of understand what is trying to happen here and what type of math people could do. I think because of this algorithm, we've mentioned this, it's in the air, direct preference optimization, but everything kind of comes from an equation of trying to learn a policy that maximizes the reward. The reward is some learned metric; a lot can be said about what the reward should be. That's subject to some constraint, and the most popular constraint is the KL constraint, which is just a distributional distance. Essentially in language models, that means if you have a completion from your instruction or RLHF model, you can compare that completion to a base model. And looking at the log probs from the model, which are essentially how likely each token is, you can see a rough calculation of the distance between these two models, just as a scalar number. I think what that actually looks like in code, you can look at it. It'd be like a sum of log probs that you get right from the model. It'll look much simpler than it sounds, but it is just to make the optimization kind of stay on track. Make sure it doesn't overfit to the RLHF data. Because we have so little data in RLHF, overfitting is really something that could happen. I think it'll fit to specific features that labelers like to see, that the model likes to generate, punctuation, weird tokens like calculator tokens. It could overfit to anything if it's in the data a lot and it happens to be in a specific format. And the KL constraint prevents that. There's not that much documented work on that, but there's a lot of people that know if you take that away, it just doesn't work at all. I think it's something that people don't focus on too much. But the objective, as I said, it's just kind of, you optimize the reward. The reward is where the human part of this comes in. We'll talk about that next. And then subject to a constraint, don't change the model too much. The real questions are, how do you implement the reward? And then how do you make the reward go up in a meaningful way? So like a preference model, the task is kind of to design a human reward. I think the equation that most of the stuff is based on right now is something called a Bradley-Terry model, which is like a pairwise preference model where you compare two completions and you say which one you like better. I'll show an interface that Anthropic uses here. And the Bradley-Terry model is really a fancy probability between two selections. And what's happening in the math is that you're looking at the probability that the chosen completion, the one you like better, is actually the better completion over the rejected completion. And what these preference models do is they assume this probability is correlated to reward. So if you just sample from this probability, it'll give you a scalar. And then you use that reward later on to signify what piece of text is better. I'm kind of inclined to breeze through the math stuff because otherwise, it's going to be not as good to listen to. Alessio [00:32:49]: I think people want to hear it. I think there's a lot of higher level explanations out there. Yeah. Nathan [00:32:55]: So the real thing is you need to assign a scalar reward of how good a response is. And that's not necessarily that easy to understand. Because if we take back to one of the first works, I mentioned this TAMER thing for decision making.
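As a rough sketch of the two pieces of math just described, here is a toy implementation of the Bradley-Terry loss used to train a pairwise reward model and of the KL penalty that keeps the policy close to the reference model. All numbers are stand-ins; this is an illustration, not any lab's implementation.

```python
# Toy sketch of the Bradley-Terry preference loss and the KL-penalized reward.
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen, r_rejected):
    # Probability the chosen completion beats the rejected one is
    # sigmoid(r_chosen - r_rejected); minimize the negative log of it.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def kl_penalized_reward(reward, policy_logprobs, ref_logprobs, beta=0.1):
    # Per-sample KL estimate from the summed token log-probs of the policy
    # and the frozen reference model on the same completion.
    kl_estimate = (policy_logprobs - ref_logprobs).sum(dim=-1)
    return reward - beta * kl_estimate

# Toy reward-model outputs for three comparisons.
print(float(bradley_terry_loss(torch.tensor([1.3, 0.2, 2.0]),
                               torch.tensor([0.4, 0.5, -1.0]))))

# Toy per-token log-probs for one 5-token completion scored 0.8.
policy_lp = torch.tensor([[-1.2, -0.7, -2.1, -0.3, -1.0]])
ref_lp = torch.tensor([[-1.5, -0.9, -1.8, -0.4, -1.4]])
print(kl_penalized_reward(torch.tensor([0.8]), policy_lp, ref_lp))
```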
People tried that kind of scalar rating with language models, which is, if you have a prompt and a completion and you just have someone rate it from 0 to 10, could you then train a reward model on all of these completions and 0 to 10 ratings and see if you can get ChatGPT with that? And the answer is really kind of no. Like a lot of people tried that. It didn't really work. And then that's why they tried this pairwise preference thing. And it happened to work. And this Bradley-Terry model comes from the 50s. It's from these fields that I was mentioning earlier. And it's wild how much this happens. I mean, this screenshot I have in the slides is from the DPO paper. I think it might be the appendix. But it's still really around in the literature of what people are doing for RLHF. Alessio [00:33:45]: Yeah. Nathan [00:33:45]: So it's a fun one to know. Swyx [00:33:46]: I'll point out one presumption that this heavily relies on. You mentioned this as part of your six presumptions that we covered earlier, which is that you can aggregate these preferences. This is not exactly true among all humans, right? I have a preference for one thing. You have a preference for a different thing. And actually coming from economics, you mentioned economics earlier. There's a theorem or a name for this called Arrow's impossibility theorem, which I'm sure you've come across. Nathan [00:34:07]: It's one of the many kind of things we throw around in the paper. Swyx [00:34:10]: Right. Do we just ignore it? Nathan [00:34:14]: We just, yeah, just aggregate. Yeah. I think the reason this really is done on a deep level is that you're not actually trying to model any contestable preference in this. You're not trying to go into things that are controversial or anything. It's really the notion of preference is trying to stay around correctness and style rather than any meaningful notion of preference. Because otherwise these companies, they don't want to do this at all. I think that's just how it is. And it's like, if you look at what people actually do. So I have a bunch of slides on the feedback interface. And they all publish this. Swyx [00:34:43]: It's always at the appendices of every paper. Nathan [00:34:47]: There's something later on in this talk, which is like, but it's good to mention. And this is when you're doing this preference collection, you write out a very long document of instructions to people that are collecting this data. And it's like, this is the hierarchy of what we want to prioritize. Something like factuality, helpfulness, honesty, harmlessness. These are all different things. Every company will rank these in different ways, provide extensive examples. It's like, if you see these two answers, you should select this one and why. And all of this stuff. And then my kind of like head scratching is like, why don't we check if the models actually do these things that we tell the data annotators to collect? But I think it's because it's hard to make that attribution. And it's hard to test if a model is honest and stuff. It would just be nice to understand the kind of causal mechanisms as a researcher or like if our goals are met. But at a simple level, what it boils down to, I have a lot more images than I need. It's like you're having a conversation with an AI, something like ChatGPT. You get shown two responses or more in some papers, and then you have to choose which one is better. I think something you'll hear a lot in this space is something called a Likert scale. Likert is a name.
It's a name for probably some research in economics, decision theory, something. But essentially, it's a type of scale where if you have integers from like one to eight, the middle numbers will represent something close to a tie. And the smallest numbers will represent one model being way better than the other. And the biggest numbers will be like the other model's better. So in the case of one to eight, if you're comparing models A to B, you return a one if you really liked option A, you return an eight if you really liked B, and then like a four or five if they were close. There's other ways to collect this data. This one's become really popular. We played with it a bit at Hugging Face. It's hard to use. Filling out this preference data is really hard. You have to read like multiple paragraphs. It's not for me. Some people really like it, I hear. I'm like, I can't imagine sitting there and reading AI-generated text and like having to do that for my job. But a lot of these early papers in RLHF have good examples of what was done. The one I have here is from Anthropic's collection demo because it was from slides that I did with Anthropic. But you can look up these in the various papers. It looks like ChatGPT with two responses, and then you have an option to say which one is better. It's nothing crazy. The infrastructure is almost exactly the same, but they just log which one you think is better. I think places like Scale are also really big in this where a lot of the labeler companies will help control like who's doing how many samples. You have multiple people go over the same sample once and like what happens if there's disagreement. I don't really think this disagreement data is used for anything, but it's good to know like what the distribution of prompts is, who's doing it, how many samples you have, controlling the workforce. All of this is very hard. A last thing to add is that a lot of these companies do collect optional metadata. I think the Anthropic example shows a rating of like how good was the prompt or the conversation from good to bad, because things matter. Like there's kind of a quadrant of preference data in my mind, which is you're comparing a good answer to a good answer, which is like really interesting signal. And then there's kind of the option of you're comparing a bad answer to a bad answer, which is like you don't want to train your model on two different issues. This is like, we did this at Hugging Face and it was like, our data was like, we don't know if we can use this, because a lot of it was just bad answer to bad answer, because you're like rushing to try to do this under a real contract. And then there's also good answer to bad answer, which I think is probably pretty reasonable to include. You just prefer the good one and move on with your life. But those are very different scenarios. I think the OpenAIs of the world are all in good answer, good answer, and have learned to eliminate everything else. But when people try to do this in open source, it's probably like what Open Assistant saw: there's just a lot of bad answers in your preference data. And you're like, what do I do with this? Metadata flags can help. I threw in the InstructGPT metadata. You can see how much they collect here. And like everything from the model fails to actually complete the task, hallucinations, different types of offensive or dangerous content, moral judgment, expresses opinion.
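As a rough illustration of how a 1-to-8 Likert comparison plus optional quality metadata might be turned into chosen/rejected training pairs, with the bad-versus-bad quadrant dropped, here is a small sketch; the field names and thresholds are invented for illustration.

```python
# Sketch: convert Likert comparisons into preference pairs, dropping ties
# and the bad-vs-bad quadrant described above. Field names are hypothetical.
def likert_to_pair(record, tie_band=(4, 5)):
    rating = record["likert"]          # 1 = A much better, 8 = B much better
    if rating in tie_band:
        return None                    # near-ties carry little preference signal
    if record.get("quality_a") == "bad" and record.get("quality_b") == "bad":
        return None                    # bad-vs-bad comparisons are discarded
    chosen, rejected = ("a", "b") if rating < tie_band[0] else ("b", "a")
    return {"prompt": record["prompt"],
            "chosen": record[chosen],
            "rejected": record[rejected]}

raw = {"prompt": "Write a haiku about RLHF.",
       "a": "Feedback shapes the words...", "b": "RLHF good.",
       "likert": 2, "quality_a": "good", "quality_b": "bad"}
print(likert_to_pair(raw))
```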
Like, I don't know exactly if they're doing this now, but you can kind of see why doing RLHF at scale and prioritizing a lot of different endpoints would be hard, because these metadata categories are all things I'd be interested in if I was scaling up a big team to do RLHF and like what is going into the preference data. You do an experiment and you're like, okay, we're going to remove all the data where they said the model hallucinates, like just that, and then retrain everything. Like, what does that do? Swyx [00:38:59]: Yeah, so hallucination is big, but some of these other metadata categories, and I've seen this in a lot of papers, it's like, does it contain sexual content? Does it express a moral judgment? Does it denigrate a protected class? That kind of stuff, very binary. Should people try to adjust for this at the RLHF layer or should they put it as a pipeline where they have a classifier as a separate model that grades the model output? Nathan [00:39:20]: Do you mean for training or like at deployment? Deployment. I do think that people are doing it at deployment. I think we've seen safety and other things in the RLHF pipeline. Like Llama 2 is famous for kind of having these helpfulness and safety reward models. Deep in the Gemini report is something saying that Gemini has like four things, which is like helpfulness, factuality, maybe safety, maybe something else. But places like Anthropic and ChatGPT and Bard almost surely have a classifier after, which is like, is this text good? Is this text bad? That's not that surprising, I think, because you could use like a hundred times smaller language model and do much better at filtering than RLHF. But I do think it's still so deeply intertwined with the motivation of RLHF to be for safety that some of these categories still persist. I think that's something that'll kind of settle out. Swyx [00:40:11]: I'm just wondering if it's worth collecting this data for the RLHF purpose, if you're not going to use it in any way, separate model to- Nathan [00:40:18]: Yeah, I don't think OpenAI will collect all of this anymore, but I think for research perspectives, it's very insightful to know, but it's also expensive. So essentially your preference data scales with how many minutes it takes for you to do each task, and every button click, it scales pretty linearly. So it's not cheap stuff. Swyx [00:40:35]: Can we, since you mentioned expensiveness, I think you may have joined one of our spaces back when Llama 2 was released. We had an estimate from you that was something on the order of Llama 2 cost $3 to $6 million to train GPU-wise, and then it was something like $20 to $30 million in preference data. Is that something that's still in the ballpark? I don't need precise numbers. Nathan [00:40:56]: I think it's still a ballpark. I know that the 20 million was off by a factor of four because I was converting from a prompt number to a total data point. So essentially when you do this, if you have a multi-turn setting, each turn will be one data point, and the Llama 2 paper reports like 1.5 million data points, which could be like 400,000 prompts. So I would still say like $6 to $8 million is safe to say that they're spending, if not more. They're probably also buying other types of data and/or throwing out data that they don't like, but it's very comparable to compute costs. But the compute costs listed in the paper always are way lower because all they have to say is like, what does one run cost? But they're running tens or hundreds of runs.
So it's like, okay, like... Yeah, it's just kind of a meaningless number. Yeah, the data number would be more interesting. Alessio [00:41:42]: What's the depreciation of this data? Nathan [00:41:46]: It depends on the method. Like some methods, people think that it's more sensitive to the, this is what I was saying. It was like, does the type of instruction tuning you do matter for RLHF? So like, depending on the method, some people are trying to figure out if you need to have what is called, this is very confusing, it's called like on-policy data, which is like your RLHF data is from your instruction model. I really think people in open source and academics are going to figure out how to use any preference data on any model just because they're scrappy. But there's been an intuition that to do like PPO well and keep improving the model over time and do like what Meta did and what people think that OpenAI does, is that you need to collect new preference data to kind of edge the distribution of capabilities forward. So there's a depreciation where like the first batch of data you collect isn't really useful for training the model when you have the fifth batch. We don't really know, but it's a good question. And I do think that if we had all the Llama data, we wouldn't know what to do with all of it. Like probably like 20 to 40% would be pretty useful for people, but not the whole dataset. Like a lot of it's probably kind of gibberish because they had a lot of data in there. Alessio [00:42:51]: So do you think like the open source community should spend more time figuring out how to reuse the data that we have or like generate more data? I think that's one of the- Nathan [00:43:02]: I think if the people are kind of locked into using synthetic data, people also think that synthetic data, like GPT-4, is more accurate than humans at labeling preferences. So if you look at these diagrams, like humans are about 60 to 70% agreement. And we're like, that's what the models get to. And if humans are about 70% agreement or accuracy, GPT-4 is like 80%. So it is a bit better, which is one way of saying it. Swyx [00:43:24]: Humans don't even agree with humans 50% of the time. Nathan [00:43:27]: Yeah, so like that's the thing. It's like the human disagreement or the lack of accuracy should be like a signal, but how do you incorporate that? It's really tricky to actually do that. I think that people just keep using GPT-4 because it's really cheap. It's one of my like go-tos, like I just say this over and over again: GPT-4 for data generation, all terms and conditions aside, because we know OpenAI has this stuff, is like very cheap for getting pretty good data compared to compute or salary of any engineer or anything. So it's like, tell people to go crazy generating GPT-4 data if you're willing to take the organizational, like, cloud of should we be doing this. But I think most people have accepted that you kind of do this, especially as individuals. Like they're not gonna come after individuals. I do think more companies should think twice before doing tons of OpenAI outputs. Also just because the data contamination and what it does to your workflow is probably hard to control at scale. Swyx [00:44:21]: And we should just mention at the time of recording, we've seen the first example of OpenAI enforcing their terms of service. ByteDance was caught, reported to be training on GPT-4 data and they got their access to OpenAI revoked.
So that was one example. Nathan [00:44:36]: Yeah, I don't expect OpenAI to go too crazy on this 'cause they're just gonna, there's gonna be so much backlash against them. And like, everyone's gonna do it anyways. Swyx [00:44:46]: And what's at stake here to spell it out is like, okay, it costs like $10 to collect one data point from a human. It's gonna cost you like a 10th of a cent with OpenAI, right? So like it's just orders of magnitude cheaper. And therefore people- Nathan [00:44:58]: Yeah, and it's like the signal you get from humans for preferences isn't that high. The signal that you get from humans for instructions is pretty high, but it is also very expensive. So like the human instructions are definitely like by far and away the best ones out there compared to the synthetic data. But I think like the synthetic preferences are just so much easier to get some sort of signal running with, and you can work in other, I think people will start working in other goals there between safety and whatever. That's something that's taking off and we'll kind of see that. I think in 2024, at some point, people will start doing things like Constitutional AI for preferences, which will be pretty interesting. I think we saw how long it took RLHF to get started in open source. Instruction tuning was like the only thing that was really happening until maybe like August, really. I think Zephyr was the first model that showed success with RLHF in the public, but that's a long time from everyone knowing that it was something that people are interested in to having any like check mark. So I accept that and think the same will happen with Constitutional AI. But once people show that you can do it once, they continue to explore. Alessio [00:46:01]: Excellent. Swyx [00:46:01]: Just in the domain of human preference data suppliers, Scale AI very happily will tell you that they supplied all that data for Llama 2. The other one is probably interesting, LMSYS from Berkeley. What they're running with Chatbot Arena is perhaps a good store of human preference data. Nathan [00:46:17]: Yeah, they released some toxicity data. They, I think, are generally worried about releasing data because they have to process it and make sure everything is safe, and they're a really lightweight operation. I think they're trying to release the preference data. I have, if we make it to evaluation, I'd pretty much say that Chatbot Arena is the best limited evaluation that people have to learn how to use language models. And like, it's very valuable data. They also may share some data with people that they host models from. So like if your model is hosted there and you pay for the hosting, you can get the prompts because you're pointing the endpoint at it and that gets pinged to you, and any real LLM inference stack saves the prompts that come through.
In this episode of Voices with Vervaeke, Dr. John Vervaeke and guest Alexander Beiner, a leading voice in the world of psychedelics, discuss psychedelics' role in contemporary society. They tackle complex topics like the dangers and potentials of mixing politics with psychedelics, the mechanics of mystical experiences, and the modern meaning crisis. The duo also delves into the risks and rewards of commodifying psychedelic experiences. They explore the latest scientific studies, personal accounts, and anecdotal evidence, weaving them into an intricate narrative that invites listeners to consider psychedelics beyond recreational use. From trials at Imperial College London to DMT injections and the exploration of meta-cognitive skills, this episode serves as a comprehensive guide for anyone interested in the intricacies of the human mind, altered states, and the possibility of a collective conscious awakening. Alexander Beiner (@AlexanderBeiner) is an author, journalist, and facilitator who is dedicated to bringing countercultural perspectives into mainstream conversation. With an approach that blends writing and experiential transformation, he's committed to helping us navigate the complex era we find ourselves. Alexander is the author of 'The Bigger Picture: How Psychedelics Can Help Us Make Sense of the World'' and also pens a popular Substack with the same name. He serves as an executive director for Breaking Convention, Europe's seminal conference on psychedelic medicine and culture. Additionally, he co-created and co-facilitates Regenerative Stewardship, a legal psilocybin retreat. A pioneer in alternative media, he was one of the founders of Rebel Wisdom, a platform that delved into the realms of systems change and cultural sensemaking. Resources: Alexander Beiner: Website | Substack | X John Vervaeke: Website | Facebook | X The Vervaeke Foundation Rebel Wisdom — YouTube: Psychedelic Capitalism and The Sacred John Vervaeke — YouTube: John Vervaeke: Artificial Intelligence, The Meaning Crisis, & The Future of Humanity Conversation with John Vervaeke - AI edition - Jordan Hall Regenerative Stewardship Breaking Convention Challenging Psychedelic Experiences Project ‘I took part in a radical psychedelic clinical trial and it changed my life forever' 'Why Socrates was a Monster' with John Vervaeke We Will Call It Pala Books: The Bigger Picture: How Psychedelics Can Help Us Make Sense of the World - Alexander Beiner How to Change Your Mind: What the New Science of Psychedelics Teaches Us About Consciousness, Dying, Addiction, Depression, and Transcendence - Michael Pollan Cosmic Serpent: DNA and the Origins of Knowledge - Jeremy Narby The Razor's Edge - W. Somerset Maugham Zen and the Art of Motorcycle Maintenance: An Inquiry into Values - Robert M Pirsig Heidegger, Neoplatonism, and the History of Being: Relation as Ontological Ground - James Filler Mentoring the Machines: Orientation - Part One: Surviving the Deep Impact of the Artificially Intelligent Tomorrow - John Vervaeke, Shawn Coyne Publications: Dose-Response Study of N,N-Dimethyltryptamine in Humans: II. Subjective Effects and Preliminary Results of a New Rating Scale - Rick J. Strassman MD; Clifford R. Qualls PhD; Eberhard H. Uhlenhuth MD; Robert Kellner MD, PhD A Model for the Application of Target-Controlled Intravenous Infusion for a Prolonged Immersive DMT Psychedelic Experience - Andrew R. Gallimor, Rick J. 
Strassman On Revelations and Revolutions: Drinking Ayahuasca Among Palestinians Under Israeli Occupation - Leor Roseman, Nadeem Karkabi The Self-Organization of Insight: Entropy and Power Laws in Problem Solving - Damian G. Stephen, James A. Dixon People: Ram Dass Timothy Leary Terence McKenna Marc Lewis Ashleigh Murphy-Beiner Robin Carhart-Harris Juensung Kim Michel Ferrari Daniel Schmachtenberger Iain McGilchrist Émile Durkheim Bernardo Kastrup Nicholas of Cusa Ben Sessa Peter Gasser Friederike Meckel Aldous Huxley Timecodes: 00:00:00 - Dr. John Vervaeke introduces his guest, Alexander Beiner, a founding figure of Rebel Wisdom. 00:03:22 - Beiner shares how the psychedelic counterculture influenced him and his exploration of the potential role of psychedelics in societal change. 00:06:50 - The conversation turns to the transformative possibilities offered by psychedelic experiences, highlighting both their enlightening and limiting aspects. 00:11:38 - Dr. Vervaeke probes the relationship between the duration of a DMT trip and its perceived intensity while contrasting mystical and visionary experiences. 00:18:00 - Delving into the sensory richness of psychedelic trips, Alexander expounds on the accompanying emotions like relevance, mystery, and familiarity. 00:23:00 - Skepticism and belief intersect as both speakers explore the advantages of an agnostic viewpoint when interpreting profound experiences. 00:26:40 - Dr. Vervaeke advocates for mindfulness as preparatory groundwork, cautioning against blindly conferring authority to psychedelic apparitions. 00:29:19 - Alexander talks about the research done by his wife, Ashleigh Murphy-Beiner, in the psychedelic world. He emphasizes the significance of leaning into what's coming up but also acknowledges the need to zoom out and take a step back. 00:34:00 - Vervaeke introduces the concept of overfitting and underfitting in machine learning, drawing parallels to our mental processes and how psychedelics can introduce noise to prevent overfitting. 00:40:43 - Reverence as a virtue is discussed, accompanied by reflections on the cultivation of epistemic virtue. 00:47:20 - Alexander suggests that even partial acceptance of panpsychism or idealism can enrich our perspectives in a meaningful way. 00:54:39 - John and Alexander discuss the dark side of AI and the manipulative potential of making things salient and opening people up to misinformation. 01:01:18 - Dr. John Vervaeke talks about the importance of the dialogical character of experiences and how they differ from traditional enlightenment experiences 01:09:51 - John describes the stages of Dialogos, including interpersonal intimacy, intimacy with the logos, and intimacy with the ground of being itself. 01:10:49 - John praises Alexander's book and his efforts to address the meaning crisis and explore the psychedelic renaissance.
Embark on a journey into the world of Time Series Forecasting in this episode of "The AI Frontier". Discover the importance of trends and seasonality, and explore advanced forecasting techniques like ARIMA, SARIMA, and Prophet. Learn about the challenges in time series forecasting and how to address them. Whether you're a data science enthusiast or a professional in the field, this episode will equip you with the knowledge and insights to make accurate and effective forecasts. Support the Show. Keep AI insights flowing – become a supporter of the show! Click the link for details
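As a quick illustration of one of the techniques named in this episode, here is a small sketch of fitting a seasonal ARIMA model with statsmodels; the synthetic data and the model orders are arbitrary choices for demonstration.

```python
# Sketch: fit a SARIMA model on a synthetic monthly series and forecast a year.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic series: trend + yearly seasonality + noise.
rng = np.random.default_rng(0)
t = np.arange(120)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, size=t.size)
series = pd.Series(y, index=pd.date_range("2015-01-01", periods=120, freq="MS"))

model = SARIMAX(series, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
fit = model.fit(disp=False)
print(fit.forecast(steps=12))  # next 12 months
```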
Unlock the power of Data Augmentation in our latest episode of 'The AI Frontier.' We delve into how this innovative technique can creatively increase your dataset size, enhancing your machine learning models' performance. From understanding the concept, exploring various techniques for images, text, and audio, to discussing advanced methods like GANs and autoencoders, this episode is a comprehensive guide to data augmentation. Tune in to discover how to leverage data augmentation in your AI projects and boost your model's efficiency. Support the Show. Keep AI insights flowing – become a supporter of the show! Click the link for details
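For a concrete taste of the image-augmentation family mentioned here, this is a minimal torchvision pipeline sketch (flips, rotations, color jitter); the stand-in image and parameter values are just for illustration.

```python
# Sketch: a simple random image-augmentation pipeline with torchvision.
from PIL import Image
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomRotation(degrees=10),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.ToTensor(),
])

img = Image.new("RGB", (224, 224), color=(120, 180, 60))  # stand-in image
augmented = augment(img)   # a new random variant on each call
print(augmented.shape)     # torch.Size([3, 224, 224])
```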
Click here to register for my FREE Masterclass: https://autc.pro/TSSeng-pod?sl=EN_POD-53531329
Click here to sign up for the free Masterclass: https://autc.pro/TSSita-pod?sl=IT_POD-53531336
AI Today Podcast: Artificial Intelligence Insights, Experts, and Opinion
When it comes to building ML models, you want a model that is simple enough to handle a wide range of real-world data on the one hand, but not so simple that it overgeneralizes or underfits the available data. In this episode of the AI Today podcast, hosts Kathleen Walch and Ron Schmelzer define the terms Overfitting, Underfitting, Bias, Variance, and Bias/Variance Tradeoff, and explain how they relate to AI and why it's important to know about them. Continue reading AI Today Podcast: AI Glossary Series: Overfitting, Underfitting, Bias, Variance, Bias/Variance Tradeoff at AI & Data Today.
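The underfitting/overfitting tradeoff discussed in this glossary episode can be seen directly by sweeping model complexity and comparing train versus validation error, as in this small scikit-learn sketch on synthetic data.

```python
# Sketch: compare train vs validation error as polynomial degree grows.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=200)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

for degree in (1, 3, 15):   # underfit, about right, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    train_err = mean_squared_error(y_tr, model.predict(X_tr))
    val_err = mean_squared_error(y_va, model.predict(X_va))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  val MSE={val_err:.3f}")
```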
Footballguys staff writer Adam Harstad joins RSP Film and Data as Matt Waldman's cohost for the 2022 football season to discuss practices that will make you a better fantasy GM and NFL fan. There are a lot of excellent writers and analysts in the fantasy industry. There are few that I hold in as high a regard as Adam. He's a forthright human being with a tremendous intellect whose approach to analysis in this space differs from most. He's also an excellent fantasy GM in re-draft and dynasty formats. Unlike Dwain McFarland, whose work delves into the statistical process, Adam spends a lot of time examining results and dealing with broad themes of decision-making that help us become better fantasy players and fantasy analysts: How analytics professionals can overfit data and the value of accounting for uncertainty in football. Where Matt accounts for uncertainty in his scouting process. Terry McLaurin: A player type that Matt believes he'll miss on again in the near future if another like him comes along. Dalvin Cook: A player type Matt believes he's more likely to identify with certainty. The Odell Beckham vs. big receiver debate. Why player archetypes are valuable for scouting talent, especially at WR and RB. The value of the NFL Combine. Why WRs often make better punt returners and RBs better kick returners. The awesome Jeff Fisher punt returner test. And of course, if you want to know about the rookies from this draft class, you will find the most in-depth analysis of offensive skill players available (QB, RB, WR, and TE), with the 2022 Rookie Scouting Portfolio for $21.95. Matt's new RSP Dynasty Rankings and Two-Year Projections Package is available for $24.95 If you're a fantasy owner and interested in purchasing past publications for $9.95 each, the 2012-2020 RSPs also have a Post-Draft Add-on that's included at no additional charge. Best yet, proceeds from sales are set aside for a year-end donation to Darkness to Light to combat the sexual abuse of children.
Hello, dear listeners. We continue with the most controversial book ever written about intelligence and dive into its original research section. The main idea of the hundreds of pages there is this: IQ > socioeconomics. That is, IQ is the factor that most affects not only academic success but most of what happens in a person's life. Is that really so? (Announcement: The most important thing that has kept me going for this many episodes is listeners like you giving small and large support on Patreon. That support comes directly to me, while regular ad revenue (if there is any that month) is shared with my producer. Also, my e-book is free for patrons; otherwise, here you go: Safsatalar Ansiklopedisi Kısaltılmış Edisyon)----------------------------------------------------This podcast contains advertising for Hiwell. Click to get more detailed information about Hiwell and to benefit from a 20% discount with the code fular100.----------------------------------------------------.Chapters:(00:05) The causes of poverty.(03:45) Why criminals commit crimes.(06:35) Survivorship bias.(07:45) The NLSY79 study.(09:15) Regression analysis and unemployment.(10:40) Causality.(11:25) Multiple linear regression.(12:35) The socioeconomic index.(14:15) The effect of marriage on poverty.(15:15) The rate of children born out of wedlock.(17:35) Overfitting.(19:15) Multicollinearity.(20:55) The relationship between education and intelligence.(22:30) The relationship between intelligence and socioeconomic status.(24:30) Cognitive elites.(27:55) Genetics or environment.(29:00) AFQT vs WAIS.(30:50) The real effect of IQ.(32:35) Next episode and thanks..Sources:Detailed book summary: Intelligence and Class Structure in American LifeAcademic paper: Is There a Cognitive Elite in America?AFQT defense: Technical Issues Regarding the Armed Forces Qualification Test as a Measure of IQSee Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
A discussion on overfitting. Are you doing it? And if so is it harming your trading results more than it's helping it? Check out this FREE training series - https://tieronetrading.com/free/ Your Trading Coach - Akil
As data scientists, we always want to avoid overfitting and underfitting our data model.
In episode 3 of Artificial Chaos, Holly rants about America trying to find Russian missile silos, why self-driving cars sometimes drive into cyclists, and how to predict house prices.ResourcesHere's a link to the "Overfitting" video that I mentioned at 6:53 in this episode. 48 furious seconds of me writing the French word for yes: https://twitter.com/HollyGraceful/status/1438210026187939843If you didn't understand the reference to "Nazi Chatbots" at the end of this episode, we're referencing this problem: https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist
In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neural networks learn through a process of “grokking” a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting. 2022: Alethea Power, Yuri Burda, Harrison Edwards, I. Babuschkin, Vedant Misra https://arxiv.org/pdf/2201.02177v1.pdf
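In the spirit of the paper's setup, here is a small sketch: build the full table of a tiny algorithmic task (addition mod p), train on half of it with strong weight decay, and keep logging validation accuracy long after training loss saturates. The architecture and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Sketch: modular-addition dataset + small network trained with weight decay,
# logging validation accuracy over many steps (grokking-style experiment).
import torch
import torch.nn as nn
import torch.nn.functional as F

p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
split = len(pairs) // 2
train_idx, val_idx = perm[:split], perm[split:]

embed = nn.Embedding(p, 64)
mlp = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, p))
params = list(embed.parameters()) + list(mlp.parameters())
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)

def forward(idx):
    x = embed(pairs[idx]).reshape(len(idx), -1)   # concatenate the two operands
    return mlp(x)

for step in range(10001):
    opt.zero_grad()
    loss = F.cross_entropy(forward(train_idx), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            val_acc = (forward(val_idx).argmax(-1) == labels[val_idx]).float().mean()
        print(step, round(float(loss), 4), round(float(val_acc), 4))
```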
#grokking #openai #deeplearning Grokking is a phenomenon when a neural network suddenly learns a pattern in the dataset and jumps from random chance generalization to perfect generalization very suddenly. This paper demonstrates grokking on small algorithmic datasets where a network has to fill in binary tables. Interestingly, the learned latent spaces show an emergence of the underlying binary operations that the data were created with. OUTLINE: 0:00 - Intro & Overview 1:40 - The Grokking Phenomenon 3:50 - Related: Double Descent 7:50 - Binary Operations Datasets 11:45 - What quantities influence grokking? 15:40 - Learned Emerging Structure 17:35 - The role of smoothness 21:30 - Simple explanations win 24:30 - Why does weight decay encourage simplicity? 26:40 - Appendix 28:55 - Conclusion & Comments Paper: https://mathai-iclr.github.io/papers/... Abstract: In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neural networks learn through a process of “grokking” a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting. We also study generalization as a function of dataset size and find that smaller datasets require increasing amounts of optimization for generalization. We argue that these datasets provide a fertile ground for studying a poorly understood aspect of deep learning: generalization of overparametrized neural networks beyond memorization of the finite training dataset. Authors: Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin & Vedant Misra Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... Minds: https://www.minds.com/ykilcher Parler: https://parler.com/profile/YannicKilcher LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
The proposed framework contains two important ingredients: Smoothness regularization and Bregman proximal point optimization. Our experiments show that the proposed framework achieves new state-of-the-art performance on a number of NLP tasks including GLUE, SNLI, SciTail and ANLI. 2020: Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, T. Zhao Keywords: Language model, Overfitting, Experiment, Manifold regularization, Matrix regularization https://arxiv.org/pdf/1911.03437v4.pdf
Learn how weird dreams may help us in the real world; how we date dinosaurs; and why a healthy grip means a healthy body. A theory from AI says our weird dreams help us better perceive the world by Briana Brownell Our dreams' weirdness might be why we have them, argues new AI-inspired theory of dreaming. (2021). EurekAlert! https://www.eurekalert.org/pub_releases/2021-05/cp-odw050621.php Hoel, E. (2021). The overfitted brain: Dreams evolved to assist generalization. Patterns, 2(5), 100244. https://doi.org/10.1016/j.patter.2021.100244 Paleontologists know how old dinosaurs were when they died because bones are like tree rings by Cameron Duke Anonymous. (2019, June 11). Which Dinosaur Bones Are “Real”? Field Museum. https://www.fieldmuseum.org/blog/which-dinosaur-bones-are-real Field Museum. (2020, November 25). Growth Rings From Fossil Bones Reveals T. rex Had Huge Growth Spurts, but Other Dinosaurs Grew “Slow and Steady.” SciTechDaily. https://scitechdaily.com/growth-rings-from-fossil-bones-reveals-t-rex-had-huge-growth-spurts-but-other-dinosaurs-grew-slow-and-steady/ Welsh, J. (2012, June 27). How Sweet! Dinosaurs May Have Been Warm-Blooded After All. Livescience.com; Live Science. https://www.livescience.com/21215-dinosaur-bones-warm-blooded.html Wits University. (2021, May 12). Southern African dinosaur had irregular growth. Phys.org; Phys.org. https://phys.org/news/2021-05-southern-african-dinosaur-irregular-growth.html A Healthy Grip Means a Healthy Body by Ashley Hamer Grip Strength Is Good Indicator of Overall Health - UConn Today. (2011, June 6). UConn Today. https://today.uconn.edu/2011/06/grip-strength-is-good-indicator-of-overall-health/# Sanderson, W. C., & Scherbov, S. (2014). Measuring the Speed of Aging across Population Subgroups. PLoS ONE, 9(5), e96289. https://doi.org/10.1371/journal.pone.0096289 Mukherjee, S., Clouston, S., Kotov, R., Bromet, E., & Luft, B. (2019). Handgrip Strength of World Trade Center (WTC) Responders: The Role of Re-Experiencing Posttraumatic Stress Disorder (PTSD) Symptoms. International Journal of Environmental Research and Public Health, 16(7), 1128. https://doi.org/10.3390/ijerph16071128 Follow Curiosity Daily on your favorite podcast app to learn something new every day withCody Gough andAshley Hamer. Still curious? Get exclusive science shows, nature documentaries, and more real-life entertainment on discovery+! Go to https://discoveryplus.com/curiosity to start your 7-day free trial. discovery+ is currently only available for US subscribers. See omnystudio.com/listener for privacy information.
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today we’re joined by Nir Bar-Lev, co-founder and CEO of ClearML. In our conversation with Nir, we explore how his view of the wide vs deep machine learning platforms paradox has changed and evolved over time, how companies should think about building vs buying and integration, and his thoughts on why experiment management has become an automatic buy, be it open source or otherwise. We also discuss the disadvantages of using a cloud vendor as opposed to a software-based approach, the balance between mlops and data science when addressing issues of overfitting, and how ClearML is applying techniques like federated machine learning and transfer learning to their solutions. The complete show notes for this episode can be found at https://twimlai.com/go/488.
Chris talks about overfitting to the public leaderboard (on Kaggle and in life), and Christian tries to figure out a direction for two different projects: a hierarchical book reader (or writer?), and his timetracking app: should he go more B2B, or be more like Scott's Cheap Flights or Nomad List? 00:00 Intro 02:48 Overfitting to the public leaderboard 06:48 Different approaches to life, business and problems 11:40 Buying a new computer? 20:04 SaaS Work in Progress and 30x500 23:03 Hot Take: hierarchical book reader 37:29 Hot Take: timetracker: B2B or start a cult? Timestamps created with https://clips.marketing by @cgenco
Jack Butcher is the founder of Visualize Value a communications channel that creates unique pieces of art, and offers insight into becoming a successful creator on the internet. Jack spoke with Vance Crowe about his sale of an NFT for more than $60,000 the challenges of being a new father, what he has learned running a massive community and the challenge of having your creative skillset compete with everyone else in the "anywhere world."Vance and Jack discuss Renee Girard's concept of Mimetic desire, Eric Hoel's Super sensorium.and David Goodhart's concept of Somewhere vs. Anywhere people.Chapters —3:26 Start5:49 Innovation is happening faster7:52 What is an NFT?19:26 What it's like as a new parent23:26 Communicating with a child24:58 Mimetic desire26:44 How emotions develop in babies28:16 Jack Butcher's Daily Manifest29:41 Building a Lasting Community33:40 Graph Theory42:26 Voice of Resistance47:31 Twitter Phenoms vs Kings50:02 The future of anonymity53:07 Niche Fame1:00:07 Overfitting and the value of dreaming1:05:53 Jack Butcher's Peter Thiel ParadoxPodcast Website: https://www.vancecrowe.com/podcastApple Podcasts: https://podcasts.apple.com/us/podcast/the-vance-crowe-podcast/id1463771076Spotify: https://open.spotify.com/show/08nGGRJCjVw2frkbtNrfLw?si=WUCu-FoyRRu9U_i-1gJZfgRSS: https://feeds.transistor.fm/the-vance-crowe-podcastYouTube Full Episodes: https://www.youtube.com/channel/UCigB7W5bX_gCinJxev9WB8w/YouTube Clips: https://www.youtube.com/channel/UCJKKb66A5_4ZcsE-rKI24ygBuy a sweatshirt, T-shirt or mugs from the podcast! Check out the Articulate Ventures Merch Store: https://teespring.com/stores/thevancecrowepodcastSubscribe to the podcast for email notifications on new episodes, invites to events and other exclusive content — http://eepurl.com/gSTfk5ABOUT THE VANCE CROWE PODCAST — Vance Crowe interviews people with an expertise that you would want to know about, but might not think to ask. He prompts his guests to think about their work in novel ways, discusses how it applies to regular people and has fun sharing stories and experiences.SUPPORT THE PODCAST —Rate the Podcast | https://ratethispodcast.com/vcpJoin the Articulate Ventures Network | https://network.articulate.ventures/ —We are a patchwork of thinkers that want to articulate ideas in a forum where they can be respectfully challenged, improved and celebrated so that we can explore complex subjects, learn from those we disagree with and achieve our personal & professional goals.Contact Vance for a Talk | https://www.vancecrowe.com/ —Vance delivers speeches that reveal important aspects of human communication. Audiences are entertained, engaged, and leave feeling empowered to change something about the way they are communicating. Vance tells stories about his own experiences, discusses theories in ways that make them relatable and highlights interesting people, books, and media that the audience can learn even more from. Join the #ATCF Book Club | https://articulate.ventures/category/atcf-book-club
I’ve recently taken a course about Mental Models for Marketing from Corey Haines and it helped me become better at growth marketing. There were some principles that needed more details and stories. So, we decided to record an episode on that. We discussed: 2:01 - Corey's story in the marketing world4:15 - What's the inversion principle? How did Corey use to at Baremetrics?10:46 - What's Cobra Effect? How did Corey realize while trying to increase the activation metric?17:17 - A key lesson learned while trying to increase activation18:47 - Ockham's Razor & Overfitting story related to his landing page25:25 - Favorite growth marketers according to Corey25:52 - Worst advice Corey ever got? There are 40+ Mental Models Corey has in his course. Check out
Today Yannic Lightspeed Kilcher and I spoke with Alex Stenlake about Kernel Methods. What is a kernel? Do you remember those weird kernel things which everyone obsessed about before deep learning? What about the Representer theorem and reproducing kernel Hilbert spaces? SVMs and kernel ridge regression? Remember them?! Hope you enjoy the conversation! 00:00:00 Tim Intro 00:01:35 Yannic clever insight from this discussion 00:03:25 Street talk and Alex intro 00:05:06 How kernels are taught 00:09:20 Computational tractability 00:10:32 Maths 00:11:50 What is a kernel? 00:19:39 Kernel latent expansion 00:23:57 Overfitting 00:24:50 Hilbert spaces 00:30:20 Compare to DL 00:31:18 Back to Hilbert spaces 00:45:19 Computational tractability 2 00:52:23 Curse of dimensionality 00:55:01 RBF: infinite taylor series 00:57:20 Margin/SVM 01:00:07 KRR/dual 01:03:26 Complexity compute kernels vs deep learning 01:05:03 Good for small problems? vs deep learning) 01:07:50 Whats special about the RBF kernel 01:11:06 Another DL comparison 01:14:01 Representer theorem 01:20:05 Relation to back prop 01:25:10 Connection with NLP/transformers 01:27:31 Where else kernels good 01:34:34 Deep learning vs dual kernel methods 01:33:29 Thoughts on AI 01:34:35 Outro
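As a pointer for listeners, here is a minimal sketch of kernel ridge regression with an RBF kernel, the combination discussed in the episode, using the dual (representer-theorem) form that scikit-learn implements; the data and hyperparameters are arbitrary.

```python
# Sketch: RBF kernel ridge regression on a toy 1-D regression problem.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

krr = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5)  # alpha = ridge penalty
krr.fit(X, y)                      # solves the dual problem over the kernel matrix
print(krr.predict(np.array([[0.0], [1.5]])))
```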
What makes a good model? Is it grace? Beauty? Or just a really great set of data? What happens when your data goes wrong? What is overfitting data and how does it affect your results? Join hosts Shanti and Danny as we discuss Data Modeling, Overfitting/Underfitting Data, the Dunning-Kruger effect, Tom Cruise, and Underwater Basket weaving. Nothing to reference, but here’s this: https://en.wikipedia.org/wiki/Underwater_basket_weaving https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Anthony Goldbloom is the founder and CEO of Kaggle. In 2011 & 2012, Forbes Magazine named Anthony as one of the 30 under 30 in technology. In 2011, Fast Company featured him as one of the innovative thinkers who are changing the future of business. He and Lukas discuss the differences in strategies that do well in Kaggle competitions vs academia vs in production. They discuss his 2016 Ted talk through the lens of 2020, frameworks, and languages. Topics Discussed: 0:00 Sneak Peek 0:20 Introduction 0:45 methods used in kaggle competitions vs mainstream academia 2:30 Feature engineering 3:55 Kaggle Competitions now vs 10 years ago 8:35 Data augmentation strategies 10:06 Overfitting in Kaggle Competitions 12:53 How to not overfit 14:11 Kaggle competitions vs the real world 18:15 Getting into ML through Kaggle 22:03 Other Kaggle products 25:48 Favorite under appreciated kernel or dataset 28:27 Python & R 32:03 Frameworks 35:15 2016 Ted talk though the lens of 2020 37:54 Reinforcement Learning 38:43 What’s the topic in ML that people don’t talk about enough? 42:02 Where are the biggest bottlenecks in deploying ML software? Check out Kaggle: https://www.kaggle.com/ Follow Anthony on Twitter: https://twitter.com/antgoldbloom Watch his 2016 Ted Talk: https://www.ted.com/talks/anthony_goldbloom_the_jobs_we_ll_lose_to_machines_and_the_ones_we_won_t Visit our podcasts homepage for transcripts and more episodes! www.wandb.com/podcast Get our podcast on Soundcloud, Apple, and Spotify! Soundcloud: https://bit.ly/2YnGjIq Apple Podcasts: https://bit.ly/2WdrUvI Spotify: https://bit.ly/2SqtadF We started Weights and Biases to build tools for Machine Learning practitioners because we care a lot about the impact that Machine Learning can have in the world and we love working in the trenches with the people building these models. One of the most fun things about these building tools has been the conversations with these ML practitioners and learning about the interesting things they’re working on. This process has been so fun that we wanted to open it up to the world in the form of our new podcast called Gradient Dissent. We hope you have as much fun listening to it as we had making it! Weights and Biases: We’re always free for academics and open source projects. Email carey@wandb.com with any questions or feature suggestions. * Blog: https://www.wandb.com/articles * Gallery: See what you can create with W&B - https://app.wandb.ai/gallery * Join our community of ML practitioners working on interesting problems - https://www.wandb.com/ml-community Host: Lukas Biewald - https://twitter.com/l2k Producer: Lavanya Shukla - https://twitter.com/lavanyaai Editor: Cayla Sharp - http://caylasharp.com/
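One standard answer to the "how to not overfit" question raised in this conversation is to score models by k-fold cross-validation rather than by repeated checks against a single holdout (or a public leaderboard). A minimal scikit-learn sketch on synthetic data:

```python
# Sketch: k-fold cross-validation as a guard against overfitting a single split.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())   # mean gives a steadier estimate than one split
```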
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.08.193664v1?rss=1 Authors: David O'Connor, Evelyn M.R. Lake, Dustin Scheinost, R. Todd Constable Abstract: It is a long-standing goal of neuroimaging to produce reliable, generalizable models of brain-behavior relationships. More recently, data-driven predictive models have become popular. Overfitting is a common problem with statistical models, and it impedes model generalization. Cross-validation (CV) is often used to give more balanced estimates of performance. However, CV does not provide guidance on how best to apply the models generated out of sample. As a solution, this study proposes an ensemble learning method, in this case bootstrap aggregating, or bagging, encompassing both model parameter estimation and feature selection. Here we investigate the use of bagging when generating predictive models of fluid intelligence (fIQ) using functional connectivity (FC). We take advantage of two large openly available datasets, the Human Connectome Project (HCP) and the Philadelphia Neurodevelopmental Cohort (PNC). We generate bagged and non-bagged models of fIQ in the HCP. Over various test-train splits, these models are evaluated in sample, on left-out HCP data, and out of sample, on PNC data. We find that in sample, a non-bagged model performs best; however, out of sample the bagged models perform best. We also find that feature selection can vary substantially within sample. A more considered approach to feature selection, alongside data-driven predictive modeling, is needed to improve cross-sample performance of FC-based brain-behavior models. Competing Interest Statement: The authors have declared no competing interest. Copyright belongs to the original authors. Visit the link for more info.
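The abstract describes bagging that wraps both feature selection and parameter estimation inside each bootstrap resample. The study's own pipeline for FC-based prediction is not reproduced here; the following is a generic, hedged sketch of that pattern, with a simple univariate correlation filter and linear model standing in for whatever the authors actually used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def bagged_fc_model(X, y, n_bags=100, k_features=200, seed=0):
    """Bagging where each bootstrap resample does its own feature selection
    (univariate correlation filter) and fits its own linear model.
    Returns a list of (selected_indices, fitted_model) pairs."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    ensemble = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)                  # bootstrap resample
        Xb, yb = X[idx], y[idx]
        # Correlate each feature with the target on this resample only,
        # so feature selection is re-estimated inside every bag.
        Xc = Xb - Xb.mean(axis=0)
        yc = yb - yb.mean()
        corr = (Xc * yc[:, None]).mean(axis=0) / (Xc.std(axis=0) * yc.std() + 1e-12)
        sel = np.argsort(-np.abs(corr))[:k_features]      # top-k features by |r|
        model = LinearRegression().fit(Xb[:, sel], yb)
        ensemble.append((sel, model))
    return ensemble

def bagged_predict(ensemble, X_new):
    """Average predictions over all bagged models."""
    preds = [model.predict(X_new[:, sel]) for sel, model in ensemble]
    return np.mean(preds, axis=0)
```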
Nan Jiang takes us deep into Model-based vs Model-free RL, Sim vs Real, Evaluation & Overfitting, RL Theory vs Practice and much more!
In this episode we talk about regularization, an effective technique for addressing the problem of overfitting. We present two techniques: ridge regression and lasso. The latter has the property of automatically selecting the final parameters.
In this episode we talk about regularization, an effective technique for dealing with overfitting by reducing the variance of the model. Two techniques are introduced: ridge regression and lasso. The latter is effectively a feature selection algorithm.
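As a quick, hedged illustration of the two techniques this episode introduces (not code from the show), here is a scikit-learn sketch: both penalties shrink coefficients to reduce variance, and the L1 penalty in lasso drives some coefficients exactly to zero, which is why it doubles as a feature selector. The alpha values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data: 100 samples, 20 features, only the first 3 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
true_coef = np.zeros(20)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + 0.5 * rng.standard_normal(100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: sets many coefficients exactly to zero

print("ridge nonzero coefs:", np.sum(np.abs(ridge.coef_) > 1e-6))  # typically all 20
print("lasso nonzero coefs:", np.sum(np.abs(lasso.coef_) > 1e-6))  # typically close to 3
```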
Overfitting is a problem that almost every machine learning engineer, data scientist, and data analyst faces at least once in their career. In this podcast we discuss several methods to prevent and reduce it. Instagram handle: https://www.instagram.com/aihindishow/ --- This episode is sponsored by Anchor: The easiest way to make a podcast. https://anchor.fm/app --- Send in a voice message: https://anchor.fm/aihindishow/message Support this podcast: https://anchor.fm/aihindishow/support
Scalable Capital relies on technology for asset management. Our clients' ETF portfolios are managed by an algorithm. This raises the question of how much automation investing can handle. Should the computer really make all trading decisions itself and also help define the rulebook behind them? Where is human expertise still needed? And how much artificial intelligence and machine learning goes into Scalable Capital's dynamic risk management? Christian Groll, Head of Quantitative Investment Strategy at Scalable Capital, explains how investment algorithms tick and what to watch out for when developing them. More on the topic: https://de.scalable.capital/mittnik-on-markets/was-die-wissenschaft-zu-robo-advice-sagt Blog: https://de.scalable.capital/blog Quant's Perspective: https://de.scalable.capital/quants-perspective ETF guide: https://de.scalable.capital/etf-leitfaden Investments carry risks.
The concepts of overfitting and overtraining. Telegram channel: https://t.me/kalami_qa
Show Notes: Geodesics (12:40) Booz Allen Hamilton (21:00) Kirk 'surprised' himself through the cognitive ability test at a job interview - the idea of surprising ourselves by exposing ourselves to new ideas (25:00) "Cognitive view of the whole, and not just a narrow silo'ed view - the bias buster" - systems thinking (26:40) Underfitting and Overfitting (27:00) Data Science: the application of scientific discovery from data (30:00) 'Miracle Year of Physics' - Albert Einstein's immaculate year (32:00) The Hubble Telescope (35:50) "Any job worth doing is worth doing poorly" (37:50) "All models are wrong, but some are useful" - George Box (38:30) "Fail fast to learn fast" - discussed in Tim Ferriss' conversation with Google's Astro Teller (40:30) Palomar Mountain (46:00) Kirk's approach to information deluge (47:00) Data literacy (48:45) We discuss the 'lens' we each put on the world - here's a brilliant take on the subject by Maria Popova (51:30) "The message is in the madness" (57:00) Lightning Round: Book: Language in Thought and Action by Hayakawa (01:05:30) Family has been most important in setting Kirk's trajectory. Making his heart sing: contributing to the book "Demystifying AI for the Enterprise" (59:40) Kirk's Five-Cut Fridays. Find Kirk online: Twitter: @KirkDBorne LinkedIn Personal blog: http://rocketdatascience.org/ Find us at originspodcast.co
We're back with a theoretical machine learning episode, returning to the decision tree exercise to introduce the concepts of bias and overfitting.
What does the future of the AI revolution look like? And what is "overfitting"? In this episode of #LØRN, Silvija talks with Jon Espen Ingvaldsen, chief consultant at Kantega, about how computers can be self-learning and understand us humans better than we understand ourselves. They also talk about "overfitting" vs personalization. "AI is easy. It's getting access to good and unique datasets that is hard," he says in the episode. In this episode you will learn about: AI, "overfitting" vs personalization, training data. See acast.com/privacy for privacy and opt-out information.
Today's concept is underfitting vs. overfitting. Someone who underfits doesn't adapt quickly enough to new data. But if you think that's bad, an overfitter will read TOO MUCH into new data and come up with wild ideas. How do you balance between them, and what are some examples? We also talk about Warren Buffett's recent statement that Bitcoin is "rat poison squared", and the response from Union Square Ventures' Fred Wilson. How does Bitcoin get its value? We look at underfitters and overfitters in the crypto space. Finally, the value of the Turing Test: when general AI comes, will it be the fabled machine that can pass as human, or will that idea be sidelined as more profitable uses take precedence? Also, we learn what an ornithopter is and get an update on Yanny/Laurel.
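Translating the episode's framing into the usual statistical picture (this worked example is ours, not the hosts'): an underfit model is too rigid to track the signal, while an overfit model chases the noise in each new data point. A minimal NumPy sketch with polynomial fits of increasing degree:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(30)   # signal + noise
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)                         # noise-free ground truth

for degree in (1, 4, 15):
    p = Polynomial.fit(x, y, degree)                 # least-squares polynomial fit
    train_mse = np.mean((p(x) - y) ** 2)
    test_mse = np.mean((p(x_test) - y_test) ** 2)
    # Degree 1 typically underfits (too rigid), degree 15 typically overfits
    # (low training error, worse test error); degree 4 sits in between.
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```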
These past few years have seen some remarkable achievements in the field of AI, as regular listeners will well know. However, despite these accomplishments, we still don't see widespread adoption of some of the key technologies that propelled AI milestones such as AlphaGo's stunning success over the world's best. It turns out this tech may not quite be ready for prime time: it is more difficult to apply than you might think. Links: Deep Reinforcement Learning Doesn't Work Yet Follow us and leave us a rating! iTunes Homepage Twitter @artlyintelly Facebook artificiallyintelligent1@gmail.com
Dr. Gari Clifford, DPhil, has been studying artificial intelligence (AI) and its utility in healthcare for two decades. He holds several prestigious positions in academia and is an Associate Professor of Biomedical Informatics at Emory University and an Associate Professor of Biomedical Engineering at Georgia Institute of Technology. We met him at the San Francisco Data Institute Conference in October, where he chaired sessions on Machine Learning and Health. Gari recently held a competition challenging data scientists to develop predictive algorithms for the early detection of atrial fibrillation, using mobile ECG machines. He shares insight into the complexity of using AI to diagnose health conditions and offers a glimpse into the future of healthcare and medical information. Here's the outline of this interview with Gari Clifford: [00:01:07] The road to machine learning and mobile health. [00:01:27] Lionel Tarassenko: neural networks and artificial intelligence. [00:03:36] San Francisco Data Institute Conference. [00:03:54] Jeremy Howard at fast.ai. [00:04:17] Director of Data Institute David Uminsky. [00:05:05] Dr. Roger Mark, Computing in Cardiology PhysioNet Challenges. [00:05:23] 2017 Challenge: Detecting atrial fibrillation in electrocardiograms. [00:05:44] Atrial fibrillation. [00:06:08] KardiaMobile EKG monitor by AliveCor. [00:06:33] Random forests, support vector machines, heuristics, deep learning. [00:07:23] Experts don't always agree. [00:08:33] Labeling ECGs: AF, normal sinus rhythm, another rhythm, or noisy. [00:09:07] 20-30 experts are required to discern a stable diagnosis. [00:09:40] Podcast: Arrhythmias in Endurance Athletes, with Peter Backx, PhD. [00:11:17] Applying an additional algorithm on top of all final algorithms: improved score from 83% to 87% accuracy. [00:11:38] Kaggle for machine learning competitions. [00:13:44] Overfitting an algorithm increases complexity, decreases utility. [00:15:01] 10,000 ECGs are not enough. [00:16:24] Podcast: How to Teach Machines That Can Learn with Dr. Pedro Domingos. [00:16:50] XGBoost. [00:19:18] Mechanical Turk. [00:20:08] QRS onset and T-wave offset. [00:21:31] Galaxy Zoo. [00:24:00] Podcast: Jason Moore of Elite HRV. [00:24:34] Andrew Ng. Paper: Rajpurkar, Pranav, et al. "Cardiologist-level arrhythmia detection with convolutional neural networks." arXiv preprint arXiv:1707.01836 (2017). [00:28:44] Detecting arrhythmias using other biomarkers. [00:30:41] Algorithms trained on specific patient populations are not accurate for other populations. [00:31:24] Propensity matching. [00:31:55] Should we be sharing our medical data? [00:32:15] Privacy concerns associated with sharing medical data. [00:32:44] Mass-scale research: possible with high-quality data across a large population. [00:33:04] Selling social media data in exchange for useful or entertaining software. [00:33:42] Who touched my medical data and why? [00:36:31] Siloing data, perhaps to protect the current industries. [00:37:03] Health Insurance Portability and Accountability Act (HIPAA). [00:37:34] Fast Healthcare Interoperability Resources (FHIR) protocol. [00:37:48] Microsoft HealthVault and Google Health. [00:38:46] Blockchain and 3blue1brown. [00:39:28] Where to go to learn more about Gari Clifford. [00:39:53] Presentation: Machine learning for FDA-approved consumer level point of care diagnostics – the wisdom of algorithm crowds: (the PhysioNet Computing in Cardiology Challenge 2017).
Predicting the future is hard. Weirdly enough, you can sometimes do better with *less* information!
In episode three of season three of Talking Machines we dive into overfitting, take a listener question about unbalanced data, and talk with Professor (Emeritus) Tom Dietterich from Oregon State University.
This is episode 6 of 《得意忘形》, and it is a solo episode. After several episodes of rather abstract content, this time I try to talk about something concrete: starting from a management concept, Parkinson's Law, I discuss the strengths of intuitive thinking, the problem of overfitting in everyday life, and some pragmatic remedies for procrastination. In this episode you will hear: * What is Parkinson's Law, and what forms does it take? * What are the Peter Principle and Murphy's Law, which together with Parkinson's Law are known as the "three great discoveries of Western culture"? * What is overfitting, and what are some examples from everyday life? * How should we view bestsellers like Outliers and The Tipping Point? * How can procrastination be "treated" from three different angles? Further reading (version with links: https://zhuanlan.zhihu.com/p/25453260/): * Parkinson's Law * Recalling the Peter Principle, by Luo Deng * Murphy's Law * Algorithms to Live By * Blink: The Power of Thinking Without Thinking * A review of Blink: interesting intuition * Overfitting * Sam Altman: almost everything you need to know about early-stage startups * What "speed" means for a startup * "The human race built most nobly when limitations were greatest." - Frank Lloyd Wright Music: * My Way by Frank Sinatra 《得意忘形》 is a media project devoted to the pursuit of individual freedom and the search for truth. We have witnessed how much technology has advanced and benefited human society since the first Industrial Revolution, but we also recognize that capitalism and the market economy inevitably breed consumer culture, erode personal value, and steal the public's time. With reverence for the finiteness and purposelessness of life, we try to offer readers and listeners more complete tools for perceiving themselves and understanding the world, reaching the essence of life by constantly rebuilding the present moment.
Overfitting to your training data can be avoided by evaluating your machine learning algorithm on a holdout test dataset, but what about overfitting to the test data? It turns out it can happen, easily, and you have to be careful to avoid it. But an algorithm from the field of privacy research shows promise for keeping your test data safe from accidental overfitting.
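The blurb does not name the algorithm, but the description matches the "reusable holdout" (Thresholdout) idea from differential-privacy research (Dwork et al., 2015): the holdout set is only consulted through a noisy gatekeeper, so repeated adaptive queries leak far less information about it. Assuming that is the algorithm meant, here is a minimal sketch of the mechanism, not a faithful reimplementation of the published procedure.

```python
import numpy as np

def thresholdout(train_vals, holdout_vals, threshold=0.04, sigma=0.01, rng=None):
    """Answer one adaptive query (e.g. a candidate model's mean score) without
    letting the analyst overfit to the holdout set.
    train_vals / holdout_vals: per-example values of the query statistic.
    Returns the training estimate unless it genuinely disagrees with the holdout
    estimate, in which case a noisy holdout answer is released instead.
    (Sketch of the Thresholdout mechanism, Dwork et al. 2015.)"""
    rng = rng or np.random.default_rng()
    train_mean = np.mean(train_vals)
    holdout_mean = np.mean(holdout_vals)
    eta = rng.laplace(scale=2 * sigma)      # noise added to the comparison threshold
    if abs(train_mean - holdout_mean) > threshold + eta:
        xi = rng.laplace(scale=sigma)       # noisy answer drawn from the holdout
        return holdout_mean + xi
    return train_mean                       # otherwise the training estimate suffices

# Usage idea: score each candidate model/feature on both splits, then ask
# thresholdout() instead of reading the raw holdout score directly.
```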
《IT 公论》 turns one! Thank you all for your support. In this episode we invited Wang Cheng (Cici), a research scientist at Facebook's London office, to share the experience of trying out the Oculus Rift and Samsung Gear VR headsets. Other topics include the ACM programming contest, what it is like to work at Facebook, and the Church of the Flying Spaghetti Monster. Related links: Dijkstra's algorithm ACM-ICPC (ACM International Collegiate Programming Contest) Facebook Graph Search Overfitting Oculus VR Samsung Gear VR Land's End Wang Cheng playing Land's End with the Samsung Gear VR headset (high-speed footage) Hyperlapse Choose Your Own Adventure Jamie Oliver The Skyrim VR video Wu Tao mentioned Bios: Li Ruyi: founder of 字节社. Rio: programmer at Apple4us. Wu Tao: programmer at Type is Beautiful, host of 《内核恐慌》. Wang Cheng: PhD in computer science from Trinity College, Oxford, and research scientist at Facebook's London office.