Podcasts about XGBoost

Play Episode Listen Later May 1, 2025 26:20

In dieser Folge geht's um die Frage: Macht Größe von Large Language Models (LLMs) bei Predictive Analytics wirklich einen Unterschied? Wir vergleichen Open-Source-Modelle mit bis zu 70 Milliarden Parametern – und siehe da, das 8B-Modell schlägt das große Schwergewicht. Außerdem berichten wir vom Finetuning auf einer AWS-Maschine mit 8 A100-GPUs und den Herausforderungen in Bezug auf die Reproduzierbarkeit. Auch das viel diskutierte DeepSeek-Modell haben wir im Autopreis-Benchmark antreten lassen. Und wie immer fragen wir uns: Was ist praktisch und was ist overkill? **Zusammenfassung** Modellgröße ≠ bessere Prognosen: Das Llama-3.1-8B übertraf das größere 70B-Modell bei der Fahrzeugpreisprognose DeepSeek im Benchmark: Das chinesische Modell zeigt bei größeren Trainingsmengen eine ähnlich gute Performance wie das Llama-3.1-8B, ist bei kleinen Datensätzen aber schwächer Finetuning mit Multi-GPU auf AWS: Für das 70B-Modell war ein Setup mit 8 A100-GPUs nötig Reproduzierbarkeit bleibt schwierig: Trotz Seed erzeugen wiederholte Finetuning-Runs unterschiedliche Ergebnisse Modellselektion empfohlen: Um zuverlässige Prognosen zu erhalten, sollte aus mehreren Finetuning-Durchläufen das beste Modell ausgewählt werden CPU-Inferenz möglich, aber langsam: Im Vergleich zur GPU war die Vorhersage auf der CPU ca. 30-mal langsamer, Quantisierung könnte künftig Abhilfe schaffen Ausblick auf TabPFN & Quantisierung: Kommende Beiträge widmen sich Erfahrungen mit TabPFN und der praktischen Umsetzung von quantisierten LLMs auf kleineren Maschinen **Links** [Begleitender Blogartikel] Predictive LLMs: Skalierung, Reproduzierbarkeit & DeepSeek https://www.inwt-statistics.de/blog/predictive-llms-skalierung-reproduzierbarkeit-und-deepseek #50: Predictive Analytics mit LLMs: ist GPT3.5 besser als XGBoost? https://inwt.podbean.com/e/50-predictive-analytics-mit-llms-ist-gpt35-besser-als-xgboost/ #64: Predictive LLMs: Übertreffen Open-Source-Modelle jetzt OpenAI und XGBoost bei Preisprognosen https://inwt.podbean.com/e/64-predictive-llms-ubertreffen-open-source-modelle-jetzt-openai-und-xgboost-bei-preisprognosen/ vLLM Framework für schnelle Inferenz: https://github.com/vllm-project/vllm?tab=readme-ov-file torchtune Finetuning-Framework von PyTorch: https://github.com/pytorch/torchtune PyTorch Reproducibility: https://pytorch.org/docs/stable/notes/randomness.html Paper zur Reproduzierbarkeit von QLoRA-Finetuning: S. S. Alahmari, L. O. Hall, P. R. Mouton and D. B. Goldgof, "Repeatability of Fine-Tuning Large Language Models Illustrated Using QLoRA," in IEEE Access, vol. 12, pp. 153221-153231, 2024, doi: 10.1109/ACCESS.2024.3470850 https://ieeexplore.ieee.org/document/10700744 heise online: Komprimierte KI: Wie Quantisierung große Sprachmodelle verkleinert von René Peinl https://www.heise.de/hintergrund/Komprimierte-KI-Wie-Quantisierung-grosse-Sprachmodelle-verkleinert-10206033.html deepseek-ai/DeepSeek-R1-Distill-Llama-8B auf Huggingface https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B#6-how-to-run-locally TabPFN: Hollmann, N., Müller, S., Purucker, L. et al. Accurate predictions on small data with a tabular foundation model. Nature 637, 319–326 (2025). https://doi.org/10.1038/s41586-024-08328-6 Feedback, Fragen oder Themenwünsche gern an podcast@inwt-statistics.de

#69: AI Agents verstehen und evaluieren mit Matthäus Deutsch

Play Episode Listen Later Apr 3, 2025 47:22

AI Agents sind mehr als nur Chatbots – aber wie bewertet man sie richtig? Wir sprechen über die Herausforderungen beim Testen von AI im Kundenservice, warum falsche API-Parameter ins Chaos führen und wieso "mysteriöser Fleischeintopf" ein PR-Desaster wurde. Matthäus Deutsch von Parloa berichtet, wie flexible Plattformintegrationen und evaluative Ansätze (z.B. assertion-based Testing und Simulationen) den Einsatz von AI Agents vorantreiben. Außerdem: welche Metriken wirklich zählen, was Multi-Agent-Setups leisten und warum der Preisverfall bei Open-Source-Modellen das Game verändert. Zusammenfassung AI Agents erweitern klassische Chatbots im Kundenservice, insbesondere im Telefonbereich, durch GenAI-basierte, dynamische Lösungen Parloa demonstriert flexible Plattformintegrationen und den Einsatz von Evaluationsmethoden wie assertion-based Testing und Simulationen Die Evaluation von AI Agents erfordert spezielles Benchmarking auf Plattform- und individueller Ebene Typische Herausforderungen sind Integrationsprobleme, fehlerhafte API-Calls und unzureichendes Instruction Following Tests erfolgen sowohl auf Konversationsebene als auch durch deterministische Ansätze und LLMs als Judge Es müssen komplexe Metriken und Trade-offs beachtet werden, wobei häufig binäre Testansätze aggregiert werden Schnelle Updates auf neue Modellversionen sind möglich, allerdings steigen langfristig die Kosten durch umfangreiche Testzyklen Innovationen wie optimierte Speech-to-Speech-Technologien und Open-Source-Lösungen (z. B. DeepSeek) bieten Potenzial zur Kostenreduktion Der Einsatz von Operatoren-Modellen und Tool-Integrationen ermöglicht auch die Anbindung an Legacy-Systeme, z.B. SAP Ziel ist es, den Automatisierungsanteil im Kundenservice zu erhöhen und eine Balance zwischen bewährter Qualität und neuen Features zu finden Links Matthäus Deutsch auf LinkedIn: https://www.linkedin.com/in/matth%C3%A4us-d-928864ab/ Parloa Contact-Center-AI-Plattform https://www.parloa.com/de/ Stellenangebote bei Parloa https://www.parloa.com/company/careers/#jobs #55: Alle machen XGBoost, aber was macht eigentlich XGBoost? Mit Matthäus Deutsch https://www.podbean.com/ew/pb-6gvc6-16d5018 #64: Predictive LLMs: Übertreffen Open-Source-Modelle jetzt OpenAI und XGBoost bei Preisprognosen? https://www.podbean.com/ew/pb-m5qr2-17c425d heise online: "Aromatisches" Chloramingas, Eintopf aus Menschenfleisch: KI-Rezepte irritieren https://www.heise.de/news/Aromatisches-Chlorgas-Eintopf-aus-Menschenfleisch-KI-irritiert-mit-Rezepten-9242991.html Feedback, Fragen oder Themenwünsche gern an podcast@inwt-statistics.de

#64: Predictive LLMs: Übertreffen Open-Source-Modelle jetzt OpenAI und XGBoost bei Preisprognosen?

commentary models primary diagnosis machine learning wang gold standard validating brief introduction saline jian jacc xgboost

Play Episode Listen Later Jan 23, 2025 40:31

Teil 2 unseres Preisprognose-Experiments für Gebrauchtfahrzeuge: Können Open-Source-LLMs wie Llama 3.1, Mistral und Leo-HessianAI mit GPT-3.5 mithalten? Wir haben fleißig gefinetuned, bis die Motoren qualmten – und es zeigt sich, dass die Unterschiede gar nicht mehr so groß sind. Mit ausreichend vielen Trainingsbeobachtungen nähern sich die Open-Source-Modelle den Ergebnissen von GPT-3.5 an und können es in einzelnen Metriken sogar übertreffen. Für das Finetuning größerer Modelle sind jedoch auch leistungsfähige GPUs notwendig, was die Ressourcenanforderungen deutlich erhöht. In der Folge beleuchten wir, welchen Mehrwert diese Open-Source-LLMs für praxisnahe Use Cases liefern und welche Herausforderungen dabei auftreten. Zusammenfassung: Vergleich von OpenAI GPT-3.5 und drei Open-Source-LLMs (Llama 3.1, Mistral 7B, Leo-HessianAI) Finetuning der Modelle auf lokalen Daten Ergebnisse: Open-Source-LLMs sind bei größerem Trainingsdatensatz fast so gut wie GPT-3.5 XGBoost hinkt etwas hinterher, da Freitexte hier nicht einbezogen wurden Wichtige Faktoren: Batchgröße, Trainingsschritte, Speicherbedarf und Nutzung von Lora-Finetuning Beim Einsatz von Open Source ist mehr Handarbeit nötig, dafür bleibt alles on-premise OpenAI punktet durch Einfachheit und hohe Qualität ohne großen Datenbedarf Frameworks wie Huggingface, Mistral Codebase und Torchtune unterstützen das Finetuning Ausblick: größere LLMs mit Multi-GPU, multimodale Daten und Unsicherheitsquantifizierung ***Links*** [Blog] Predictive LLMs: Übertreffen Open-Source-Modelle OpenAI bei Preisprognosen? https://www.inwt-statistics.de/blog/predictive-llms-uebertreffen-os-modelle-openai-bei-preisprognosen [Podcast] #50: Predictive Analytics mit LLMs: ist GPT3.5 besser als XGBoost? https://www.podbean.com/ew/pb-n6wem-165cb2c [Blog] Predictive LLMs: Kann GPT-3.5 die Prognosen von XGBoost verbessern? https://www.inwt-statistics.de/blog/predictive-llms-kann-gpt-xgboost-prognosen-verbessern [Podcast] #43: Damit es im Live-Betrieb nicht kracht: Vermeidung von Overfitting & Data Leakage https://www.podbean.com/ew/pb-vw736-15baac0 [Link] Llama-3.1-8B-Instruct auf Huggingface https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct - [Link] Mistral-7B-Instruct-v0.3 auf Huggingface https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 [Link] Mistral 7B Release Notes https://mistral.ai/news/announcing-mistral-7b/ [Link] leo-hessianai-7b auf Huggingface https://huggingface.co/LeoLM/leo-hessianai-7b [Link] The Hessian Center for Artificial Intelligence https://hessian.ai/de/ [Docs] LangChain: How to return structured data from a model https://python.langchain.com/docs/how_to/structured_output/#the-with_structured_output-method [Link] Wie hoch sind die Treibhausgasemissionen pro Person in Deutschland durchschnittlich? https://www.umweltbundesamt.de/service/uba-fragen/wie-hoch-sind-die-treibhausgasemissionen-pro-person#:~:text=Der%20deutsche%20Aussto%C3%9F%20an%20Treibhausgasen,sehr%20gro%C3%9Fe%20Unterschiede%20im%20Konsumniveau.

JACC: Asia - Brief Introduction - Validating Machine Learning Models Against the Saline Test Gold Standard for Primary Aldosteronism Diagnosis

JACC Speciality Journals

Play Episode Listen Later Dec 3, 2024 1:35

Commentary by Dr. Jian'an Wang.

[48] Tianqi Chen - Scalable and Intelligent Learning Systems

The Thesis Review

Play Episode Listen Later Oct 28, 2024 46:29

Tianqi Chen is an Assistant Professor in the Machine Learning Department and Computer Science Department at Carnegie Mellon University and the Chief Technologist of OctoML. His research focuses on the intersection of machine learning and systems. Tianqi's PhD thesis is titled "Scalable and Intelligent Learning Systems," which he completed in 2019 at the University of Washington. We discuss his influential work on machine learning systems, starting with the development of XGBoost,an optimized distributed gradient boosting library that has had an enormous impact in the field. We also cover his contributions to deep learning frameworks like MXNet and machine learning compilation with TVM, and connect these to modern generative AI. - Episode notes: www.wellecks.com/thesisreview/episode48.html - Follow the Thesis Review (@thesisreview) and Sean Welleck (@wellecks) on Twitter - Follow Tianqi Chen on Twitter (@tqchenml) - Support The Thesis Review at www.patreon.com/thesisreview or www.buymeacoffee.com/thesisreview

university ai washington phd assistant professor chen intelligent carnegie mellon university scalable chief technologist computer science department learning systems tvm xgboost tianqi mxnet octoml

#55: Alle machen XGBoost, aber was macht eigentlich XGBoost? Mit Matthäus Deutsch

AI DAILY: Breaking News in AI

Play Episode Listen Later Sep 16, 2024 42:35

Warum ist XGBoost seit Jahren das Tool der Wahl, wenn es um tabulare Daten geht? Mira spricht zusammen mit Matthäus Deutsch darüber, warum XGBoost State of the Art ist und was es so erfolgreich macht. Außerdem: Wie schlägt sich XGBoost im Vergleich zu Deep Learning? Und gibt es überhaupt bessere Alternativen? **Links** Kaggle AI Report 2023: https://storage.googleapis.com/kaggle-media/reports/2023_Kaggle_AI_Report.pdf?trk=public_post_comment-text XGBoost Documentation: https://xgboost.readthedocs.io/en/stable/ Hastie, T.; Tibshirani, R. & Friedman, J. (2009), The elements of statistical learning: data mining, inference and prediction , Springer (ISBN: 0387848576)

art tool macht machen wahl daten vergleich deutsch friedman alternativen deep learning matth hastie xgboost

AI: MORE LAME MEMES

Play Episode Listen Later Aug 21, 2024 4:05

AI in the 2024 Election: More Memes, Less Misinformation AI's influence on the 2024 election is manifesting more through humorous memes than dangerous lies. While AI-generated content is widespread, it's mostly recognizable as satire, not deception. However, there's concern that the proliferation of AI imagery might erode public trust, making it harder to distinguish between real and fake content as Election Day nears. LVMH CEO Bernard Arnault Turns to AI Amid Declining Sales Amid declining sales and a reduced net worth, LVMH CEO Bernard Arnault is pivoting towards AI, investing over $300 million in AI startups in 2024. Despite luxury sales slowing, Arnault aims to leverage AI technologies to revitalize his company's portfolio, including innovations like AI-powered digital fitting rooms and customer behavior analysis platforms. Study Debunks AI's Existential Threat Myth A new study led by computer scientists Iryna Gurevych and Harish Tayyar Madabushi finds that large language models (LLMs) aren't capable of going rogue. The models are limited by their programming, debunking fears of AI developing dangerous, independent abilities. AI Predicts Future Disasters by Spotting Tipping Points A new AI algorithm developed by scientists at Tongji University can predict catastrophic tipping points, such as ecological collapse or financial crashes. The AI combines neural networks to analyze complex systems, accurately predicting transitions like tropical forests turning into savannahs. This breakthrough could help mitigate damage from sudden, critical shifts. AI Revolutionizes Disease Detection with Tongue Scans A new AI model identifies diseases with 98% accuracy by analyzing tongue scans, using advanced imaging and machine learning. The XGBoost algorithm, tested on over 5,260 images, excels in detecting health issues like diabetes and respiratory problems, enhancing non-invasive diagnostics. AI Could Help Diagnose OCD Faster: Promising Study from Stanford A Stanford study shows that AI, specifically large language models like ChatGPT-4, could significantly improve the speed and accuracy of diagnosing obsessive-compulsive disorder (OCD). The AI models outperformed human professionals, suggesting potential for earlier detection and better patient outcomes. AI Drives Breakthroughs in Polymer Discovery Georgia Tech researchers are using AI to accelerate the discovery of next-generation polymers for applications like energy storage and recyclable plastics. Their AI models predict polymer properties before lab testing, speeding up development and transforming industrial materials R&D. Challenges include data quality and real-world validation.

ai challenges chatgpt memes election day obsessive compulsive disorder lame arnault their ai xgboost tongji university

Llama 2, 3 & 4: Synthetic Data, RLHF, Agents on the path to Open Source AGI

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Jul 23, 2024 65:07

If you see this in time, join our emergency LLM paper club on the Llama 3 paper!For everyone else, join our special AI in Action club on the Latent Space Discord for a special feature with the Cursor cofounders on Composer, their newest coding agent!Today, Meta is officially releasing the largest and most capable open model to date, Llama3-405B, a dense transformer trained on 15T tokens that beats GPT-4 on all major benchmarks:The 8B and 70B models from the April Llama 3 release have also received serious spec bumps, warranting the new label of Llama 3.1.If you are curious about the infra / hardware side, go check out our episode with Soumith Chintala, one of the AI infra leads at Meta. Today we have Thomas Scialom, who led Llama2 and now Llama3 post-training, so we spent most of our time on pre-training (synthetic data, data pipelines, scaling laws, etc) and post-training (RLHF vs instruction tuning, evals, tool calling).Synthetic data is all you needLlama3 was trained on 15T tokens, 7x more than Llama2 and with 4 times as much code and 30 different languages represented. But as Thomas beautifully put it:“My intuition is that the web is full of s**t in terms of text, and training on those tokens is a waste of compute.” “Llama 3 post-training doesn't have any human written answers there basically… It's just leveraging pure synthetic data from Llama 2.”While it is well speculated that the 8B and 70B were "offline distillations" of the 405B, there are a good deal more synthetic data elements to Llama 3.1 than the expected. The paper explicitly calls out:* SFT for Code: 3 approaches for synthetic data for the 405B bootstrapping itself with code execution feedback, programming language translation, and docs backtranslation.* SFT for Math: The Llama 3 paper credits the Let's Verify Step By Step authors, who we interviewed at ICLR:* SFT for Multilinguality: "To collect higher quality human annotations in non-English languages, we train a multilingual expert by branching off the pre-training run and continuing to pre-train on a data mix that consists of 90% multilingualtokens."* SFT for Long Context: "It is largely impractical to get humans to annotate such examples due to the tedious and time-consuming nature of reading lengthy contexts, so we predominantly rely on synthetic data to fill this gap. We use earlier versions of Llama 3 to generate synthetic data based on the key long-context use-cases: (possibly multi-turn) question-answering, summarization for long documents, and reasoning over code repositories, and describe them in greater detail below"* SFT for Tool Use: trained for Brave Search, Wolfram Alpha, and a Python Interpreter (a special new ipython role) for single, nested, parallel, and multiturn function calling.* RLHF: DPO preference data was used extensively on Llama 2 generations. This is something we partially covered in RLHF 201: humans are often better at judging between two options (i.e. which of two poems they prefer) than creating one (writing one from scratch). Similarly, models might not be great at creating text but they can be good at classifying their quality.Last but not least, Llama 3.1 received a license update explicitly allowing its use for synthetic data generation.Llama2 was also used as a classifier for all pre-training data that went into the model. It both labelled it by quality so that bad tokens were removed, but also used type (i.e. science, law, politics) to achieve a balanced data mix. Tokenizer size mattersThe tokens vocab of a model is the collection of all tokens that the model uses. Llama2 had a 34,000 tokens vocab, GPT-4 has 100,000, and 4o went up to 200,000. Llama3 went up 4x to 128,000 tokens. You can find the GPT-4 vocab list on Github.This is something that people gloss over, but there are many reason why a large vocab matters:* More tokens allow it to represent more concepts, and then be better at understanding the nuances.* The larger the tokenizer, the less tokens you need for the same amount of text, extending the perceived context size. In Llama3's case, that's ~30% more text due to the tokenizer upgrade. * With the same amount of compute you can train more knowledge into the model as you need fewer steps.The smaller the model, the larger the impact that the tokenizer size will have on it. You can listen at 55:24 for a deeper explanation.Dense models = 1 Expert MoEsMany people on X asked “why not MoE?”, and Thomas' answer was pretty clever: dense models are just MoEs with 1 expert :)[00:28:06]: I heard that question a lot, different aspects there. Why not MoE in the future? The other thing is, I think a dense model is just one specific variation of the model for an hyperparameter for an MOE with basically one expert. So it's just an hyperparameter we haven't optimized a lot yet, but we have some stuff ongoing and that's an hyperparameter we'll explore in the future.Basically… wait and see!Llama4Meta already started training Llama4 in June, and it sounds like one of the big focuses will be around agents. Thomas was one of the authors behind GAIA (listen to our interview with Thomas in our ICLR recap) and has been working on agent tooling for a while with things like Toolformer. Current models have “a gap of intelligence” when it comes to agentic workflows, as they are unable to plan without the user relying on prompting techniques and loops like ReAct, Chain of Thought, or frameworks like Autogen and Crew. That may be fixed soon?

english ai france action future state french phd data focus microsoft mit teacher current character code chatgpt web improving singapore period latin honestly blm bay area researchers architecture arena cto bloom nlp react academia chain openai residence composer bits open source gaia gpt github guillaume llama jarvis synthetic google docs llm reasoning gpu genai elo agi sorbonne udio node instruct kepler gpus raspberry 7b dense benchmarking deepmind anthropic noam tldr grammarly alessio latex recitals gans 8b mistral meta ai cursor chinchillas alphago galactica moes mattersthe wolfram alpha gaussian allen institute yann lecun sorbonne university 70b 400b andrej karpathy 128k sft rephrase huggingface tool use bpe model b rlhf brave search xgboost iclr latent space lhf llama2

#50: Predictive Analytics mit LLMs: ist GPT3.5 besser als XGBoost?

artificial intelligence kann portal statistics zudem besser daten turbo nase classification prognosen prognose large language models international conference fine tuning predictive analytics klassifikation openai api xgboost

Play Episode Listen Later Jul 4, 2024 38:44

Wir lassen GPT3.5 Turbo und XGBoost bei der Prognose einer metrischen Zielvariablen gegeneinander antreten. Dafür haben wir von LOT Internet Fahrzeugdaten aus dem Portal mobile.de bereitgestellt bekommen, um zu sehen, wer bei der Prognose des Fahrzeugpreises die Nase vorn hat. Zudem besprechen wir das Finetuning und gehen auch darauf ein, wie LLMs und XGBoost kombiniert werden können. ***Links*** Blogartikel: Predictive LLMs: Kann GPT-3.5 die Prognosen von XGBoost verbessern? https://www.inwt-statistics.de/blog/predictive-llms-kann-gpt-xgboost-prognosen-verbessern #27: Kann ein Large Language Model (LLM) bei der Klassifikation tabellarischer Daten XGBoost schlagen? https://inwt.podbean.com/e/27-kann-ein-large-language-model-llm-bei-der-klassifikation-tabellarischer-daten-xgboost-schlagen/ OpenAI API: https://platform.openai.com/docs/introduction LLMs für Prognosen auf tabularen Daten zu nutzen, ist wenig erforscht. Wenn es probiert wurde, geht es meistens um Klassifikation, also keine metrische Zielvariable. Ein oft zitiertes Paper hierzu ist dieses: TabLLM: Few-shot Classification of Tabular Data with Large Language Models (Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, David Sontag Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:5549-5581, 2023.) https://proceedings.mlr.press/v206/hegselmann23a/hegselmann23a.pdf Till mit seinem Song In My Fantasy auf YouTube: https://www.youtube.com/watch?v=MU3oyJ1WR1U

High Agency Pydantic > VC Backed Frameworks — with Jason Liu of Instructor

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Apr 19, 2024 52:20

We are reuniting for the 2nd AI UX demo day in SF on Apr 28. Sign up to demo here! And don't forget tickets for the AI Engineer World's Fair — for early birds who join before keynote announcements!About a year ago there was a lot of buzz around prompt engineering techniques to force structured output. Our friend Simon Willison tweeted a bunch of tips and tricks, but the most iconic one is Riley Goodside making it a matter of life or death:Guardrails (friend of the pod and AI Engineer speaker), Marvin (AI Engineer speaker), and jsonformer had also come out at the time. In June 2023, Jason Liu (today's guest!) open sourced his “OpenAI Function Call and Pydantic Integration Module”, now known as Instructor, which quickly turned prompt engineering black magic into a clean, developer-friendly SDK. A few months later, model providers started to add function calling capabilities to their APIs as well as structured outputs support like “JSON Mode”, which was announced at OpenAI Dev Day (see recap here). In just a handful of months, we went from threatening to kill grandmas to first-class support from the research labs. And yet, Instructor was still downloaded 150,000 times last month. Why?What Instructor looks likeInstructor patches your LLM provider SDKs to offer a new response_model option to which you can pass a structure defined in Pydantic. It currently supports OpenAI, Anthropic, Cohere, and a long tail of models through LiteLLM.What Instructor is forThere are three core use cases to Instructor:* Extracting structured data: Taking an input like an image of a receipt and extracting structured data from it, such as a list of checkout items with their prices, fees, and coupon codes.* Extracting graphs: Identifying nodes and edges in a given input to extract complex entities and their relationships. For example, extracting relationships between characters in a story or dependencies between tasks.* Query understanding: Defining a schema for an API call and using a language model to resolve a request into a more complex one that an embedding could not handle. For example, creating date intervals from queries like “what was the latest thing that happened this week?” to then pass onto a RAG system or similar.Jason called all these different ways of getting data from LLMs “typed responses”: taking strings and turning them into data structures. Structured outputs as a planning toolThe first wave of agents was all about open-ended iteration and planning, with projects like AutoGPT and BabyAGI. Models would come up with a possible list of steps, and start going down the list one by one. It's really easy for them to go down the wrong branch, or get stuck on a single step with no way to intervene.What if these planning steps were returned to us as DAGs using structured output, and then managed as workflows? This also makes it easy to better train model on how to create these plans, as they are much more structured than a bullet point list. Once you have this structure, each piece can be modified individually by different specialized models. You can read some of Jason's experiments here:While LLMs will keep improving (Llama3 just got released as we write this), having a consistent structure for the output will make it a lot easier to swap models in and out. Jason's overall message on how we can move from ReAct loops to more controllable Agent workflows mirrors the “Process” discussion from our Elicit episode:Watch the talkAs a bonus, here's Jason's talk from last year's AI Engineer Summit. He'll also be a speaker at this year's AI Engineer World's Fair!Timestamps* [00:00:00] Introductions* [00:02:23] Early experiments with Generative AI at StitchFix* [00:08:11] Design philosophy behind the Instructor library* [00:11:12] JSON Mode vs Function Calling* [00:12:30] Single vs parallel function calling* [00:14:00] How many functions is too many?* [00:17:39] How to evaluate function calling* [00:20:23] What is Instructor good for?* [00:22:42] The Evolution from Looping to Workflow in AI Engineering* [00:27:03] State of the AI Engineering Stack* [00:28:26] Why Instructor isn't VC backed* [00:31:15] Advice on Pursuing Open Source Projects and Consulting* [00:36:00] The Concept of High Agency and Its Importance* [00:42:44] Prompts as Code and the Structure of AI Inputs and Outputs* [00:44:20] The Emergence of AI Engineering as a Distinct FieldShow notes* Jason on the UWaterloo mafia* Jason on Twitter, LinkedIn, website* Instructor docs* Max Woolf on the potential of Structured Output* swyx on Elo vs Cost* Jason on Anthropic Function Calling* Jason on Rejections, Advice to Young People* Jason on Bad Startup Ideas* Jason on Prompts as Code* Rysana's inversion models* Bryan Bischof's episode* Hamel HusainTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.Swyx [00:00:16]: Hello, we're back in the remote studio with Jason Liu from Instructor. Welcome Jason.Jason [00:00:21]: Hey there. Thanks for having me.Swyx [00:00:23]: Jason, you are extremely famous, so I don't know what I'm going to do introducing you, but you're one of the Waterloo clan. There's like this small cadre of you that's just completely dominating machine learning. Actually, can you list like Waterloo alums that you're like, you know, are just dominating and crushing it right now?Jason [00:00:39]: So like John from like Rysana is doing his inversion models, right? I know like Clive Chen from Waterloo. When I started the data science club, he was one of the guys who were like joining in and just like hanging out in the room. And now he was at Tesla working with Karpathy, now he's at OpenAI, you know.Swyx [00:00:56]: He's in my climbing club.Jason [00:00:58]: Oh, hell yeah. I haven't seen him in like six years now.Swyx [00:01:01]: To get in the social scene in San Francisco, you have to climb. So both in career and in rocks. So you started a data science club at Waterloo, we can talk about that, but then also spent five years at Stitch Fix as an MLE. You pioneered the use of OpenAI's LLMs to increase stylist efficiency. So you must have been like a very, very early user. This was like pretty early on.Jason [00:01:20]: Yeah, I mean, this was like GPT-3, okay. So we actually were using transformers at Stitch Fix before the GPT-3 model. So we were just using transformers for recommendation systems. At that time, I was very skeptical of transformers. I was like, why do we need all this infrastructure? We can just use like matrix factorization. When GPT-2 came out, I fine tuned my own GPT-2 to write like rap lyrics and I was like, okay, this is cute. Okay, I got to go back to my real job, right? Like who cares if I can write a rap lyric? When GPT-3 came out, again, I was very much like, why are we using like a post request to review every comment a person leaves? Like we can just use classical models. So I was very against language models for like the longest time. And then when ChatGPT came out, I basically just wrote a long apology letter to everyone at the company. I was like, hey guys, you know, I was very dismissive of some of this technology. I didn't think it would scale well, and I am wrong. This is incredible. And I immediately just transitioned to go from computer vision recommendation systems to LLMs. But funny enough, now that we have RAG, we're kind of going back to recommendation systems.Swyx [00:02:21]: Yeah, speaking of that, I think Alessio is going to bring up the next one.Alessio [00:02:23]: Yeah, I was going to say, we had Bryan Bischof from Hex on the podcast. Did you overlap at Stitch Fix?Jason [00:02:28]: Yeah, he was like one of my main users of the recommendation frameworks that I had built out at Stitch Fix.Alessio [00:02:32]: Yeah, we talked a lot about RecSys, so it makes sense.Swyx [00:02:36]: So now I have adopted that line, RAG is RecSys. And you know, if you're trying to reinvent new concepts, you should study RecSys first, because you're going to independently reinvent a lot of concepts. So your system was called Flight. It's a recommendation framework with over 80% adoption, servicing 350 million requests every day. Wasn't there something existing at Stitch Fix? Why did you have to write one from scratch?Jason [00:02:56]: No, so I think because at Stitch Fix, a lot of the machine learning engineers and data scientists were writing production code, sort of every team's systems were very bespoke. It's like, this team only needs to do like real time recommendations with small data. So they just have like a fast API app with some like pandas code. This other team has to do a lot more data. So they have some kind of like Spark job that does some batch ETL that does a recommendation. And so what happens is each team writes their code differently. And I have to come in and refactor their code. And I was like, oh man, I'm refactoring four different code bases, four different times. Wouldn't it be better if all the code quality was my fault? Let me just write this framework, force everyone else to use it. And now one person can maintain five different systems, rather than five teams having their own bespoke system. And so it was really a need of just sort of standardizing everything. And then once you do that, you can do observability across the entire pipeline and make large sweeping improvements in this infrastructure, right? If we notice that something is slow, we can detect it on the operator layer. Just hey, hey, like this team, you guys are doing this operation is lowering our latency by like 30%. If you just optimize your Python code here, we can probably make an extra million dollars. So let's jump on a call and figure this out. And then a lot of it was doing all this observability work to figure out what the heck is going on and optimize this system from not only just a code perspective, sort of like harassingly or against saying like, we need to add caching here. We're doing duplicated work here. Let's go clean up the systems. Yep.Swyx [00:04:22]: Got it. One more system that I'm interested in finding out more about is your similarity search system using Clip and GPT-3 embeddings and FIASS, where you saved over $50 million in annual revenue. So of course they all gave all that to you, right?Jason [00:04:34]: No, no, no. I mean, it's not going up and down, but you know, I got a little bit, so I'm pretty happy about that. But there, you know, that was when we were doing fine tuning like ResNets to do image classification. And so a lot of it was given an image, if we could predict the different attributes we have in the merchandising and we can predict the text embeddings of the comments, then we can kind of build a image vector or image embedding that can capture both descriptions of the clothing and sales of the clothing. And then we would use these additional vectors to augment our recommendation system. And so with the recommendation system really was just around like, what are similar items? What are complimentary items? What are items that you would wear in a single outfit? And being able to say on a product page, let me show you like 15, 20 more things. And then what we found was like, hey, when you turn that on, you make a bunch of money.Swyx [00:05:23]: Yeah. So, okay. So you didn't actually use GPT-3 embeddings. You fine tuned your own? Because I was surprised that GPT-3 worked off the shelf.Jason [00:05:30]: Because I mean, at this point we would have 3 million pieces of inventory over like a billion interactions between users and clothes. So any kind of fine tuning would definitely outperform like some off the shelf model.Swyx [00:05:41]: Cool. I'm about to move on from Stitch Fix, but you know, any other like fun stories from the Stitch Fix days that you want to cover?Jason [00:05:46]: No, I think that's basically it. I mean, the biggest one really was the fact that I think for just four years, I was so bearish on language models and just NLP in general. I'm just like, none of this really works. Like, why would I spend time focusing on this? I got to go do the thing that makes money, recommendations, bounding boxes, image classification. Yeah. Now I'm like prompting an image model. I was like, oh man, I was wrong.Swyx [00:06:06]: So my Stitch Fix question would be, you know, I think you have a bit of a drip and I don't, you know, my primary wardrobe is free startup conference t-shirts. Should more technology brothers be using Stitch Fix? What's your fashion advice?Jason [00:06:19]: Oh man, I mean, I'm not a user of Stitch Fix, right? It's like, I enjoy going out and like touching things and putting things on and trying them on. Right. I think Stitch Fix is a place where you kind of go because you want the work offloaded. I really love the clothing I buy where I have to like, when I land in Japan, I'm doing like a 45 minute walk up a giant hill to find this weird denim shop. That's the stuff that really excites me. But I think the bigger thing that's really captured is this idea that narrative matters a lot to human beings. Okay. And I think the recommendation system, that's really hard to capture. It's easy to use AI to sell like a $20 shirt, but it's really hard for AI to sell like a $500 shirt. But people are buying $500 shirts, you know what I mean? There's definitely something that we can't really capture just yet that we probably will figure out how to in the future.Swyx [00:07:07]: Well, it'll probably output in JSON, which is what we're going to turn to next. Then you went on a sabbatical to South Park Commons in New York, which is unusual because it's based on USF.Jason [00:07:17]: Yeah. So basically in 2020, really, I was enjoying working a lot as I was like building a lot of stuff. This is where we were making like the tens of millions of dollars doing stuff. And then I had a hand injury. And so I really couldn't code anymore for like a year, two years. And so I kind of took sort of half of it as medical leave, the other half I became more of like a tech lead, just like making sure the systems were like lights were on. And then when I went to New York, I spent some time there and kind of just like wound down the tech work, you know, did some pottery, did some jujitsu. And after GPD came out, I was like, oh, I clearly need to figure out what is going on here because something feels very magical. I don't understand it. So I spent basically like five months just prompting and playing around with stuff. And then afterwards, it was just my startup friends going like, hey, Jason, you know, my investors want us to have an AI strategy. Can you help us out? And it just snowballed and bore more and more until I was making this my full time job. Yeah, got it.Swyx [00:08:11]: You know, you had YouTube University and a journaling app, you know, a bunch of other explorations. But it seems like the most productive or the best known thing that came out of your time there was Instructor. Yeah.Jason [00:08:22]: Written on the bullet train in Japan. I think at some point, you know, tools like Guardrails and Marvin came out. Those are kind of tools that I use XML and Pytantic to get structured data out. But they really were doing things sort of in the prompt. And these are built with sort of the instruct models in mind. Like I'd already done that in the past. Right. At Stitch Fix, you know, one of the things we did was we would take a request note and turn that into a JSON object that we would use to send it to our search engine. Right. So if you said like, I want to, you know, skinny jeans that were this size, that would turn into JSON that we would send to our internal search APIs. But it always felt kind of gross. A lot of it is just like you read the JSON, you like parse it, you make sure the names are strings and ages are numbers and you do all this like messy stuff. But when function calling came out, it was very much sort of a new way of doing things. Right. Function calling lets you define the schema separate from the data and the instructions. And what this meant was you can kind of have a lot more complex schemas and just map them in Pytantic. And then you can just keep those very separate. And then once you add like methods, you can add validators and all that kind of stuff. The one thing I really had with a lot of these libraries, though, was it was doing a lot of the string formatting themselves, which was fine when it was the instruction to models. You just have a string. But when you have these new chat models, you have these chat messages. And I just didn't really feel like not being able to access that for the developer was sort of a good benefit that they would get. And so I just said, let me write like the most simple SDK around the OpenAI SDK, a simple wrapper on the SDK, just handle the response model a bit and kind of think of myself more like requests than actual framework that people can use. And so the goal is like, hey, like this is something that you can use to build your own framework. But let me just do all the boring stuff that nobody really wants to do. People want to build their own frameworks, but people don't want to build like JSON parsing.Swyx [00:10:08]: And the retrying and all that other stuff.Jason [00:10:10]: Yeah.Swyx [00:10:11]: Right. We had this a little bit of this discussion before the show, but like that design principle of going for being requests rather than being Django. Yeah. So what inspires you there? This has come from a lot of prior pain. Are there other open source projects that inspired your philosophy here? Yeah.Jason [00:10:25]: I mean, I think it would be requests, right? Like, I think it is just the obvious thing you install. If you were going to go make HTTP requests in Python, you would obviously import requests. Maybe if you want to do more async work, there's like future tools, but you don't really even think about installing it. And when you do install it, you don't think of it as like, oh, this is a requests app. Right? Like, no, this is just Python. The bigger question is, like, a lot of people ask questions like, oh, why isn't requests like in the standard library? Yeah. That's how I want my library to feel, right? It's like, oh, if you're going to use the LLM SDKs, you're obviously going to install instructor. And then I think the second question would be like, oh, like, how come instructor doesn't just go into OpenAI, go into Anthropic? Like, if that's the conversation we're having, like, that's where I feel like I've succeeded. Yeah. It's like, yeah, so standard, you may as well just have it in the base libraries.Alessio [00:11:12]: And the shape of the request stayed the same, but initially function calling was maybe equal structure outputs for a lot of people. I think now the models also support like JSON mode and some of these things and, you know, return JSON or my grandma is going to die. All of that stuff is maybe to decide how have you seen that evolution? Like maybe what's the metagame today? Should people just forget about function calling for structure outputs or when is structure output like JSON mode the best versus not? We'd love to get any thoughts given that you do this every day.Jason [00:11:42]: Yeah, I would almost say these are like different implementations of like the real thing we care about is the fact that now we have typed responses to language models. And because we have that type response, my IDE is a little bit happier. I get autocomplete. If I'm using the response wrong, there's a little red squiggly line. Like those are the things I care about in terms of whether or not like JSON mode is better. I usually think it's almost worse unless you want to spend less money on like the prompt tokens that the function call represents, primarily because with JSON mode, you don't actually specify the schema. So sure, like JSON load works, but really, I care a lot more than just the fact that it is JSON, right? I think function calling gives you a tool to specify the fact like, okay, this is a list of objects that I want and each object has a name or an age and I want the age to be above zero and I want to make sure it's parsed correctly. That's where kind of function calling really shines.Alessio [00:12:30]: Any thoughts on single versus parallel function calling? So I did a presentation at our AI in Action Discord channel, and obviously showcase instructor. One of the big things that we have before with single function calling is like when you're trying to extract lists, you have to make these funky like properties that are lists to then actually return all the objects. How do you see the hack being put on the developer's plate versus like more of this stuff just getting better in the model? And I know you tweeted recently about Anthropic, for example, you know, some lists are not lists or strings and there's like all of these discrepancies.Jason [00:13:04]: I almost would prefer it if it was always a single function call. Obviously, there is like the agents workflows that, you know, Instructor doesn't really support that well, but are things that, you know, ought to be done, right? Like you could define, I think maybe like 50 or 60 different functions in a single API call. And, you know, if it was like get the weather or turn the lights on or do something else, it makes a lot of sense to have these parallel function calls. But in terms of an extraction workflow, I definitely think it's probably more helpful to have everything be a single schema, right? Just because you can sort of specify relationships between these entities that you can't do in a parallel function calling, you can have a single chain of thought before you generate a list of results. Like there's like small like API differences, right? Where if it's for parallel function calling, if you do one, like again, really, I really care about how the SDK looks and says, okay, do I always return a list of functions or do you just want to have the actual object back out and you want to have like auto complete over that object? Interesting.Alessio [00:14:00]: What's kind of the cap for like how many function definitions you can put in where it still works well? Do you have any sense on that?Jason [00:14:07]: I mean, for the most part, I haven't really had a need to do anything that's more than six or seven different functions. I think in the documentation, they support way more. I don't even know if there's any good evals that have over like two dozen function calls. I think if you're running into issues where you have like 20 or 50 or 60 function calls, I think you're much better having those specifications saved in a vector database and then have them be retrieved, right? So if there are 30 tools, like you should basically be like ranking them and then using the top K to do selection a little bit better rather than just like shoving like 60 functions into a single. Yeah.Swyx [00:14:40]: Yeah. Well, I mean, so I think this is relevant now because previously I think context limits prevented you from having more than a dozen tools anyway. And now that we have million token context windows, you know, a cloud recently with their new function calling release said they can handle over 250 tools, which is insane to me. That's, that's a lot. You're saying like, you know, you don't think there's many people doing that. I think anyone with a sort of agent like platform where you have a bunch of connectors, they wouldn't run into that problem. Probably you're right that they should use a vector database and kind of rag their tools. I know Zapier has like a few thousand, like 8,000, 9,000 connectors that, you know, obviously don't fit anywhere. So yeah, I mean, I think that would be it unless you need some kind of intelligence that chains things together, which is, I think what Alessio is coming back to, right? Like there's this trend about parallel function calling. I don't know what I think about that. Anthropic's version was, I think they use multiple tools in sequence, but they're not in parallel. I haven't explored this at all. I'm just like throwing this open to you as to like, what do you think about all these new things? Yeah.Jason [00:15:40]: It's like, you know, do we assume that all function calls could happen in any order? In which case, like we either can assume that, or we can assume that like things need to happen in some kind of sequence as a DAG, right? But if it's a DAG, really that's just like one JSON object that is the entire DAG rather than going like, okay, the order of the function that return don't matter. That's definitely just not true in practice, right? Like if I have a thing that's like turn the lights on, like unplug the power, and then like turn the toaster on or something like the order doesn't matter. And it's unclear how well you can describe the importance of that reasoning to a language model yet. I mean, I'm sure you can do it with like good enough prompting, but I just haven't any use cases where the function sequence really matters. Yeah.Alessio [00:16:18]: To me, the most interesting thing is the models are better at picking than your ranking is usually. Like I'm incubating a company around system integration. For example, with one system, there are like 780 endpoints. And if you're actually trying to do vector similarity, it's not that good because the people that wrote the specs didn't have in mind making them like semantically apart. You know, they're kind of like, oh, create this, create this, create this. Versus when you give it to a model, like in Opus, you put them all, it's quite good at picking which ones you should actually run. And I'm curious to see if the model providers actually care about some of those workflows or if the agent companies are actually going to build very good rankers to kind of fill that gap.Jason [00:16:58]: Yeah. My money is on the rankers because you can do those so easily, right? You could just say, well, given the embeddings of my search query and the embeddings of the description, I can just train XGBoost and just make sure that I have very high like MRR, which is like mean reciprocal rank. And so the only objective is to make sure that the tools you use are in the top end filtered. Like that feels super straightforward and you don't have to actually figure out how to fine tune a language model to do tool selection anymore. Yeah. I definitely think that's the case because for the most part, I imagine you either have like less than three tools or more than a thousand. I don't know what kind of company said, oh, thank God we only have like 185 tools and this works perfectly, right? That's right.Alessio [00:17:39]: And before we maybe move on just from this, it was interesting to me, you retweeted this thing about Anthropic function calling and it was Joshua Brown's retweeting some benchmark that it's like, oh my God, Anthropic function calling so good. And then you retweeted it and then you tweeted it later and it's like, it's actually not that good. What's your flow? How do you actually test these things? Because obviously the benchmarks are lying, right? Because the benchmarks say it's good and you said it's bad and I trust you more than the benchmark. How do you think about that? And then how do you evolve it over time?Jason [00:18:09]: It's mostly just client data. I actually have been mostly busy with enough client work that I haven't been able to reproduce public benchmarks. And so I can't even share some of the results in Anthropic. I would just say like in production, we have some pretty interesting schemas where it's like iteratively building lists where we're doing like updates of lists, like we're doing in place updates. So like upserts and inserts. And in those situations we're like, oh yeah, we have a bunch of different parsing errors. Numbers are being returned to strings. We were expecting lists of objects, but we're getting strings that are like the strings of JSON, right? So we had to call JSON parse on individual elements. Overall, I'm like super happy with the Anthropic models compared to the OpenAI models. Sonnet is very cost effective. Haiku is in function calling, it's actually better, but I think they just had to sort of file down the edges a little bit where like our tests pass, but then we actually deployed a production. We got half a percent of traffic having issues where if you ask for JSON, it'll try to talk to you. Or if you use function calling, you know, we'll have like a parse error. And so I think that definitely gonna be things that are fixed in like the upcoming weeks. But in terms of like the reasoning capabilities, man, it's hard to beat like 70% cost reduction, especially when you're building consumer applications, right? If you're building something for consultants or private equity, like you're charging $400, it doesn't really matter if it's a dollar or $2. But for consumer apps, it makes products viable. If you can go from four to Sonnet, you might actually be able to price it better. Yeah.Swyx [00:19:31]: I had this chart about the ELO versus the cost of all the models. And you could put trend graphs on each of those things about like, you know, higher ELO equals higher cost, except for Haiku. Haiku kind of just broke the lines, or the ISO ELOs, if you want to call it. Cool. Before we go too far into your opinions on just the overall ecosystem, I want to make sure that we map out the surface area of Instructor. I would say that most people would be familiar with Instructor from your talks and your tweets and all that. You had the number one talk from the AI Engineer Summit.Jason [00:20:03]: Two Liu. Jason Liu and Jerry Liu. Yeah.Swyx [00:20:06]: Yeah. Until I actually went through your cookbook, I didn't realize the surface area. How would you categorize the use cases? You have LLM self-critique, you have knowledge graphs in here, you have PII data sanitation. How do you characterize to people what is the surface area of Instructor? Yeah.Jason [00:20:23]: This is the part that feels crazy because really the difference is LLMs give you strings and Instructor gives you data structures. And once you get data structures, again, you can do every lead code problem you ever thought of. Right. And so I think there's a couple of really common applications. The first one obviously is extracting structured data. This is just be, okay, well, like I want to put in an image of a receipt. I want to give it back out a list of checkout items with a price and a fee and a coupon code or whatever. That's one application. Another application really is around extracting graphs out. So one of the things we found out about these language models is that not only can you define nodes, it's really good at figuring out what are nodes and what are edges. And so we have a bunch of examples where, you know, not only do I extract that, you know, this happens after that, but also like, okay, these two are dependencies of another task. And you can do, you know, extracting complex entities that have relationships. Given a story, for example, you could extract relationships of families across different characters. This can all be done by defining a graph. The last really big application really is just around query understanding. The idea is that like any API call has some schema and if you can define that schema ahead of time, you can use a language model to resolve a request into a much more complex request. One that an embedding could not do. So for example, I have a really popular post called like rag is more than embeddings. And effectively, you know, if I have a question like this, what was the latest thing that happened this week? That embeds to nothing, right? But really like that query should just be like select all data where the date time is between today and today minus seven days, right? What if I said, how did my writing change between this month and last month? Again, embeddings would do nothing. But really, if you could do like a group by over the month and a summarize, then you could again like do something much more interesting. And so this really just calls out the fact that embeddings really is kind of like the lowest hanging fruit. And using something like instructor can really help produce a data structure. And then you can just use your computer science and reason about the data structure. Maybe you say, okay, well, I'm going to produce a graph where I want to group by each month and then summarize them jointly. You can do that if you know how to define this data structure. Yeah.Swyx [00:22:29]: So you kind of run up against like the LangChains of the world that used to have that. They still do have like the self querying, I think they used to call it when we had Harrison on in our episode. How do you see yourself interacting with the other LLM frameworks in the ecosystem? Yeah.Jason [00:22:42]: I mean, if they use instructor, I think that's totally cool. Again, it's like, it's just Python, right? It's like asking like, oh, how does like Django interact with requests? Well, you just might make a request.get in a Django app, right? But no one would say, I like went off of Django because I'm using requests now. They should be ideally like sort of the wrong comparison in terms of especially like the agent workflows. I think the real goal for me is to go down like the LLM compiler route, which is instead of doing like a react type reasoning loop. I think my belief is that we should be using like workflows. If we do this, then we always have a request and a complete workflow. We can fine tune a model that has a better workflow. Whereas it's hard to think about like, how do you fine tune a better react loop? Yeah. You always train it to have less looping, in which case like you wanted to get the right answer the first time, in which case it was a workflow to begin with, right?Swyx [00:23:31]: Can you define workflow? Because I used to work at a workflow company, but I'm not sure this is a good term for everybody.Jason [00:23:36]: I'm thinking workflow in terms of like the prefect Zapier workflow. Like I want to build a DAG, I want you to tell me what the nodes and edges are. And then maybe the edges are also put in with AI. But the idea is that like, I want to be able to present you the entire plan and then ask you to fix things as I execute it, rather than going like, hey, I couldn't parse the JSON, so I'm going to try again. I couldn't parse the JSON, I'm going to try again. And then next thing you know, you spent like $2 on opening AI credits, right? Yeah. Whereas with the plan, you can just say, oh, the edge between node like X and Y does not run. Let me just iteratively try to fix that, fix the one that sticks, go on to the next component. And obviously you can get into a world where if you have enough examples of the nodes X and Y, maybe you can use like a vector database to find a good few shot examples. You can do a lot if you sort of break down the problem into that workflow and executing that workflow, rather than looping and hoping the reasoning is good enough to generate the correct output. Yeah.Swyx [00:24:35]: You know, I've been hammering on Devon a lot. I got access a couple of weeks ago. And obviously for simple tasks, it does well. For the complicated, like more than 10, 20 hour tasks, I can see- That's a crazy comparison.Jason [00:24:47]: We used to talk about like three, four loops. Only once it gets to like hour tasks, it's hard.Swyx [00:24:54]: Yeah. Less than an hour, there's nothing.Jason [00:24:57]: That's crazy.Swyx [00:24:58]: I mean, okay. Maybe my goalposts have shifted. I don't know. That's incredible.Jason [00:25:02]: Yeah. No, no. I'm like sub one minute executions. Like the fact that you're talking about 10 hours is incredible.Swyx [00:25:08]: I think it's a spectrum. I think I'm going to say this every single time I bring up Devon. Let's not reward them for taking longer to do things. Do you know what I mean? I think that's a metric that is easily abusable.Jason [00:25:18]: Sure. Yeah. You know what I mean? But I think if you can monotonically increase the success probability over an hour, that's winning to me. Right? Like obviously if you run an hour and you've made no progress. Like I think when we were in like auto GBT land, there was that one example where it's like, I wanted it to like buy me a bicycle overnight. I spent $7 on credit and I never found the bicycle. Yeah.Swyx [00:25:41]: Yeah. Right. I wonder if you'll be able to purchase a bicycle. Because it actually can do things in real world. It just needs to suspend to you for off and stuff. The point I was trying to make was that I can see it turning plans. I think one of the agents loopholes or one of the things that is a real barrier for agents is LLMs really like to get stuck into a lane. And you know what you're talking about, what I've seen Devon do is it gets stuck in a lane and it will just kind of change plans based on the performance of the plan itself. And it's kind of cool.Jason [00:26:05]: I feel like we've gone too much in the looping route and I think a lot of more plans and like DAGs and data structures are probably going to come back to help fill in some holes. Yeah.Alessio [00:26:14]: What do you think of the interface to that? Do you see it's like an existing state machine kind of thing that connects to the LLMs, the traditional DAG players? Do you think we need something new for like AI DAGs?Jason [00:26:25]: Yeah. I mean, I think that the hard part is going to be describing visually the fact that this DAG can also change over time and it should still be allowed to be fuzzy. I think in like mathematics, we have like plate diagrams and like Markov chain diagrams and like recurrent states and all that. Some of that might come into this workflow world. But to be honest, I'm not too sure. I think right now, the first steps are just how do we take this DAG idea and break it down to modular components that we can like prompt better, have few shot examples for and ultimately like fine tune against. But in terms of even the UI, it's hard to say what it will likely win. I think, you know, people like Prefect and Zapier have a pretty good shot at doing a good job.Swyx [00:27:03]: Yeah. You seem to use Prefect a lot. I actually worked at a Prefect competitor at Temporal and I'm also very familiar with Dagster. What else would you call out as like particularly interesting in the AI engineering stack?Jason [00:27:13]: Man, I almost use nothing. I just use Cursor and like PyTests. Okay. I think that's basically it. You know, a lot of the observability companies have... The more observability companies I've tried, the more I just use Postgres.Swyx [00:27:29]: Really? Okay. Postgres for observability?Jason [00:27:32]: But the issue really is the fact that these observability companies isn't actually doing observability for the system. It's just doing the LLM thing. Like I still end up using like Datadog or like, you know, Sentry to do like latency. And so I just have those systems handle it. And then the like prompt in, prompt out, latency, token costs. I just put that in like a Postgres table now.Swyx [00:27:51]: So you don't need like 20 funded startups building LLM ops? Yeah.Jason [00:27:55]: But I'm also like an old, tired guy. You know what I mean? Like I think because of my background, it's like, yeah, like the Python stuff, I'll write myself. But you know, I will also just use Vercel happily. Yeah. Yeah. So I'm not really into that world of tooling, whereas I think, you know, I spent three good years building observability tools for recommendation systems. And I was like, oh, compared to that, Instructor is just one call. I just have to put time star, time and then count the prompt token, right? Because I'm not doing a very complex looping behavior. I'm doing mostly workflows and extraction. Yeah.Swyx [00:28:26]: I mean, while we're on this topic, we'll just kind of get this out of the way. You famously have decided to not be a venture backed company. You want to do the consulting route. The obvious route for someone as successful as Instructor is like, oh, here's hosted Instructor with all tooling. Yeah. You just said you had a whole bunch of experience building observability tooling. You have the perfect background to do this and you're not.Jason [00:28:43]: Yeah. Isn't that sick? I think that's sick.Swyx [00:28:44]: I mean, I know why, because you want to go free dive.Jason [00:28:47]: Yeah. Yeah. Because I think there's two things. Right. Well, one, if I tell myself I want to build requests, requests is not a venture backed startup. Right. I mean, one could argue whether or not Postman is, but I think for the most part, it's like having worked so much, I'm more interested in looking at how systems are being applied and just having access to the most interesting data. And I think I can do that more through a consulting business where I can come in and go, oh, you want to build perfect memory. You want to build an agent. You want to build like automations over construction or like insurance and supply chain, or like you want to handle writing private equity, mergers and acquisitions reports based off of user interviews. Those things are super fun. Whereas like maintaining the library, I think is mostly just kind of like a utility that I try to keep up, especially because if it's not venture backed, I have no reason to sort of go down the route of like trying to get a thousand integrations. In my mind, I just go like, okay, 98% of the people use open AI. I'll support that. And if someone contributes another platform, that's great. I'll merge it in. Yeah.Swyx [00:29:45]: I mean, you only added Anthropic support this year. Yeah.Jason [00:29:47]: Yeah. You couldn't even get an API key until like this year, right? That's true. Okay. If I add it like last year, I was trying to like double the code base to service, you know, half a percent of all downloads.Swyx [00:29:58]: Do you think the market share will shift a lot now that Anthropic has like a very, very competitive offering?Jason [00:30:02]: I think it's still hard to get API access. I don't know if it's fully GA now, if it's GA, if you can get a commercial access really easily.Alessio [00:30:12]: I got commercial after like two weeks to reach out to their sales team.Jason [00:30:14]: Okay.Alessio [00:30:15]: Yeah.Swyx [00:30:16]: Two weeks. It's not too bad. There's a call list here. And then anytime you run into rate limits, just like ping one of the Anthropic staff members.Jason [00:30:21]: Yeah. Then maybe we need to like cut that part out. So I don't need to like, you know, spread false news.Swyx [00:30:25]: No, it's cool. It's cool.Jason [00:30:26]: But it's a common question. Yeah. Surely just from the price perspective, it's going to make a lot of sense. Like if you are a business, you should totally consider like Sonnet, right? Like the cost savings is just going to justify it if you actually are doing things at volume. And yeah, I think the SDK is like pretty good. Back to the instructor thing. I just don't think it's a billion dollar company. And I think if I raise money, the first question is going to be like, how are you going to get a billion dollar company? And I would just go like, man, like if I make a million dollars as a consultant, I'm super happy. I'm like more than ecstatic. I can have like a small staff of like three people. It's fun. And I think a lot of my happiest founder friends are those who like raised a tiny seed round, became profitable. They're making like 70, 60, 70, like MRR, 70,000 MRR and they're like, we don't even need to raise the seed round. Let's just keep it like between me and my co-founder, we'll go traveling and it'll be a great time. I think it's a lot of fun.Alessio [00:31:15]: Yeah. like say LLMs / AI and they build some open source stuff and it's like I should just raise money and do this and I tell people a lot it's like look you can make a lot more money doing something else than doing a startup like most people that do a company could make a lot more money just working somewhere else than the company itself do you have any advice for folks that are maybe in a similar situation they're trying to decide oh should I stay in my like high paid FAANG job and just tweet this on the side and do this on github should I go be a consultant like being a consultant seems like a lot of work so you got to talk to all these people you know there's a lot to unpackJason [00:31:54]: I think the open source thing is just like well I'm just doing it purely for fun and I'm doing it because I think I'm right but part of being right is the fact that it's not a venture backed startup like I think I'm right because this is all you need right so I think a part of the philosophy is the fact that all you need is a very sharp blade to sort of do your work and you don't actually need to build like a big enterprise so that's one thing I think the other thing too that I've kind of been thinking around just because I have a lot of friends at google that want to leave right now it's like man like what we lack is not money or skill like what we lack is courage you should like you just have to do this a hard thing and you have to do it scared anyways right in terms of like whether or not you do want to do a founder I think that's just a matter of optionality but I definitely recognize that the like expected value of being a founder is still quite low it is right I know as many founder breakups and as I know friends who raised a seed round this year right like that is like the reality and like you know even in from that perspective it's been tough where it's like oh man like a lot of incubators want you to have co-founders now you spend half the time like fundraising and then trying to like meet co-founders and find co-founders rather than building the thing this is a lot of time spent out doing uh things I'm not really good at. I do think there's a rising trend in solo founding yeah.Swyx [00:33:06]: You know I am a solo I think that something like 30 percent of like I forget what the exact status something like 30 percent of starters that make it to like series B or something actually are solo founder I feel like this must have co-founder idea mostly comes from YC and most everyone else copies it and then plenty of companies break up over co-founderJason [00:33:27]: Yeah and I bet it would be like I wonder how much of it is the people who don't have that much like and I hope this is not a diss to anybody but it's like you sort of you go through the incubator route because you don't have like the social equity you would need is just sort of like send an email to Sequoia and be like hey I'm going on this ride you want a ticket on the rocket ship right like that's very hard to sell my message if I was to raise money is like you've seen my twitter my life is sick I've decided to make it much worse by being a founder because this is something I have to do so do you want to come along otherwise I want to fund it myself like if I can't say that like I don't need the money because I can like handle payroll and like hire an intern and get an assistant like that's all fine but I really don't want to go back to meta I want to like get two years to like try to find a problem we're solving that feels like a bad timeAlessio [00:34:12]: Yeah Jason is like I wear a YSL jacket on stage at AI Engineer Summit I don't need your accelerator moneyJason [00:34:18]: And boots, you don't forget the boots. But I think that is a part of it right I think it is just like optionality and also just like I'm a lot older now I think 22 year old Jason would have been probably too scared and now I'm like too wise but I think it's a matter of like oh if you raise money you have to have a plan of spending it and I'm just not that creative with spending that much money yeah I mean to be clear you just celebrated your 30th birthday happy birthday yeah it's awesome so next week a lot older is relative to some some of the folks I think seeing on the career tipsAlessio [00:34:48]: I think Swix had a great post about are you too old to get into AI I saw one of your tweets in January 23 you applied to like Figma, Notion, Cohere, Anthropic and all of them rejected you because you didn't have enough LLM experience I think at that time it would be easy for a lot of people to say oh I kind of missed the boat you know I'm too late not gonna make it you know any advice for people that feel like thatJason [00:35:14]: Like the biggest learning here is actually from a lot of folks in jiu-jitsu they're like oh man like is it too late to start jiu-jitsu like I'll join jiu-jitsu once I get in more shape right it's like there's a lot of like excuses and then you say oh like why should I start now I'll be like 45 by the time I'm any good and say well you'll be 45 anyways like time is passing like if you don't start now you start tomorrow you're just like one more day behind if you're worried about being behind like today is like the soonest you can start right and so you got to recognize that like maybe you just don't want it and that's fine too like if you wanted you would have started I think a lot of these people again probably think of things on a too short time horizon but again you know you're gonna be old anyways you may as well just start now you knowSwyx [00:35:55]: One more thing on I guess the um career advice slash sort of vlogging you always go viral for this post that you wrote on advice to young people and the lies you tell yourself oh yeah yeah you said you were writing it for your sister.Jason [00:36:05]: She was like bummed out about going to college and like stressing about jobs and I was like oh and I really want to hear okay and I just kind of like text-to-sweep the whole thing it's crazy it's got like 50,000 views like I'm mind I mean your average tweet has more but that thing is like a 30-minute read nowSwyx [00:36:26]: So there's lots of stuff here which I agree with I you know I'm also of occasionally indulge in the sort of life reflection phase there's the how to be lucky there's the how to have high agency I feel like the agency thing is always a trend in sf or just in tech circles how do you define having high agencyJason [00:36:42]: I'm almost like past the high agency phase now now my biggest concern is like okay the agency is just like the norm of the vector what also matters is the direction right it's like how pure is the shot yeah I mean I think agency is just a matter of like having courage and doing the thing that's scary right you know if people want to go rock climbing it's like do you decide you want to go rock climbing then you show up to the gym you rent some shoes and you just fall 40 times or do you go like oh like I'm actually more intelligent let me go research the kind of shoes that I want okay like there's flatter shoes and more inclined shoes like which one should I get okay let me go order the shoes on Amazon I'll come back in three days like oh it's a little bit too tight maybe it's too aggressive I'm only a beginner let me go change no I think the higher agent person just like goes and like falls down 20 times right yeah I think the higher agency person is more focused on like process metrics versus outcome metrics right like from pottery like one thing I learned was if you want to be good at pottery you shouldn't count like the number of cups or bowls you make you should just weigh the amount of clay you use right like the successful person says oh I went through 100 pounds of clay right the less agency was like oh I've made six cups and then after I made six cups like there's not really what are you what do you do next no just pounds of clay pounds of clay same with the work here right so you just got to write the tweets like make the commits contribute open source like write the documentation there's no real outcome it's just a process and if you love that process you just get really good at the thing you're doingSwyx [00:38:04]: yeah so just to push back on this because obviously I mostly agree how would you design performance review systems because you were effectively saying we can count lines of code for developers rightJason [00:38:15]: I don't think that would be the actual like I think if you make that an outcome like I can just expand a for loop right I think okay so for performance review this is interesting because I've mostly thought of it from the perspective of science and not engineering I've been running a lot of engineering stand-ups primarily because there's not really that many machine learning folks the process outcome is like experiments and ideas right like if you think about outcome is what you might want to think about an outcome is oh I want to improve the revenue or whatnot but that's really hard but if you're someone who is going out like okay like this week I want to come up with like three or four experiments I might move the needle okay nothing worked to them they might think oh nothing worked like I suck but to me it's like wow you've closed off all these other possible avenues for like research like you're gonna get to the place that you're gonna figure out that direction really soon there's no way you try 30 different things and none of them work usually like 10 of them work five of them work really well two of them work really really well and one thing was like the nail in the head so agency lets you sort of capture the volume of experiments and like experience lets you figure out like oh that other half it's not worth doing right I think experience is going like half these prompting papers don't make any sense just use chain of thought and just you know use a for loop that's basically right it's like usually performance for me is around like how many experiments are you running how oftentimes are you trying.Alessio [00:39:32]: When do you give up on an experiment because a StitchFix you kind of give up on language models I guess in a way as a tool to use and then maybe the tools got better you were right at the time and then the tool improved I think there are similar paths in my engineering career where I try one approach and at the time it doesn't work and then the thing changes but then I kind of soured on that approach and I don't go back to it soonJason [00:39:51]: I see yeah how do you think about that loop so usually when I'm coaching folks and as they say like oh these things don't work I'm not going to pursue them in the future like one of the big things like hey the negative result is a result and this is something worth documenting like this is an academia like if it's negative you don't just like not publish right but then like what do you actually write down like what you should write down is like here are the conditions this is the inputs and the outputs we tried the experiment on and then one thing that's really valuable is basically writing down under what conditions would I revisit these experiments these things don't work because of what we had at the time if someone is reading this two years from now under what conditions will we try again that's really hard but again that's like another skill you kind of learn right it's like you do go back and you do experiments you figure out why it works now I think a lot of it here is just like scaling worked yeah rap lyrics you know that was because I did not have high enough quality data if we phase shift and say okay you don't even need training data oh great then it might just work a different domainAlessio [00:40:48]: Do you have anything in your list that is like it doesn't work now but I want to try it again later? Something that people should maybe keep in mind you know people always like agi when you know when are you going to know the agi is here maybe it's less than that but any stuff that you tried recently that didn't work thatJason [00:41:01]: You think will get there I mean I think the personal assistance and the writing I've shown to myself it's just not good enough yet so I hired a writer and I hired a personal assistant so now I'm gonna basically like work with these people until I figure out like what I can actually like automate and what are like the reproducible steps but like I think the experiment for me is like I'm gonna go pay a person like thousand dollars a month that helped me improve my life and then let me get them to help me figure like what are the components and how do I actually modularize something to get it to work because it's not just like a lot gmail calendar and like notion it's a little bit more complicated than that but we just don't know what that is yet those are two sort of systems that I wish gb4 or opus was actually good enough to just write me an essay but most of the essays are still pretty badSwyx [00:41:44]: yeah I would say you know on the personal assistance side Lindy is probably the one I've seen the most flow was at a speaker at the summit I don't know if you've checked it out or any other sort of agents assistant startupJason [00:41:54]: Not recently I haven't tried lindy they were not ga last time I was considering it yeah yeah a lot of it now it's like oh like really what I want you to do is take a look at all of my meetings and like write like a really good weekly summary email for my clients to remind them that I'm like you know thinking of them and like working for them right or it's like I want you to notice that like my monday is like way too packed and like block out more time and also like email the people to do the reschedule and then try to opt in to move them around and then I want you to say oh jason should have like a 15 minute prep break after form back to back those are things that now I know I can prompt them in but can it do it well like before I didn't even know that's what I wanted to prompt for us defragging a calendar and adding break so I can like eat lunch yeah that's the AGI test yeah exactly compassion right I think one thing that yeah we didn't touch on it before butAlessio [00:42:44]: I think was interesting you had this tweet a while ago about prompts should be code and then there were a lot of companies trying to build prompt engineering tooling kind of trying to turn the prompt into a more structured thing what's your thought today now you want to turn the thinking into DAGs like do prompts should still be code any updated ideasJason [00:43:04]: It's the same thing right I think you know with Instructor it is very much like the output model is defined as a code object that code object is sent to the LLM and in return you get a data structure so the outputs of these models I think should also be code objects and the inputs somewhat should be code objects but I think the one thing that instructor tries to do is separate instruction data and the types of the output and beyond that I really just think that most of it should be still like managed pretty closely to the developer like so much of is changing that if you give control of these systems away too early you end up ultimately wanting them back like many companies I know that I reach out or ones were like oh we're going off of the frameworks because now that we know what the business outcomes we're trying to optimize for these frameworks don't work yeah because we do rag but we want to do rag to like sell you supplements or to have you like schedule the fitness appointment the prompts are kind of too baked into the systems to really pull them back out and like start doing upselling or something it's really funny but a lot of it ends up being like once you understand the business outcomes you care way more about the promptSwyx [00:44:07]: Actually this is fun in our prep for this call we were trying to say like what can you as an independent person say that maybe me and Alessio cannot say or me you know someone at a company say what do you think is the market share of the frameworks the LangChain, the LlamaIndex, the everything...Jason [00:44:20]: Oh massive because not everyone wants to care about the code yeah right I think that's a different question to like what is the business model and are they going to be like massively profitable businesses right making hundreds of millions of dollars that feels like so straightforward right because not everyone is a prompt engineer like there's so much productivity to be captured in like back office optim automations right it's not because they care about the prompts that they care about managing these things yeah but those would be sort of low code experiences you yeah I think the bigger challenge is like okay hundred million dollars probably pretty easy it's just time and effort and they have the manpower and the money to sort of solve those problems again if you go the vc route then it's like you're talking about billions and that's really the goal that stuff for me it's like pretty unclear but again that is to say that like I sort of am building things for developers who want to use infrastructure to build their own tooling in terms of the amount of developers there are in the world versus downstream consumers of these things or even just think of how many companies will use like the adobes and the ibms right because they want something that's fully managed and they want something that they know will work and if the incremental 10% requires you to hire another team of 20 people you might not want to do it and I think that kind of organization is really good for uh those are bigger companiesSwyx [00:45:32]: I just want to capture your thoughts on one more thing which is you said you wanted most of the prompts to stay close to the developer and Hamel Husain wrote this post which I really love called f you show me the prompt yeah I think he cites you in one of those part of the blog post and I think ds pi is kind of like the complete antithesis of that which is I think it's interesting because I also hold the strong view that AI is a better prompt engineer than you are and I don't know how to square that wondering if you have thoughtsJason [00:45:58]: I think something like DSPy can work because there are like very short-term metrics to measure success right it is like did you find the pii or like did you write the multi-hop question the correct way but in these workflows that I've been managing a lot of it are we minimizing churn and maximizing retention yeah that's a very long loop it's not really like a uptuna like training loop right like those things are much more harder to capture so we don't actually have those metrics for that right and obviously we can figure out like okay is the summary good but like how do you measure the quality of the summary it's like that feedback loop it ends up being a lot longer and then again when something changes it's really hard to make sure that it works across these like newer models or again like changes to work for the current process like when we migrate from like anthropic to open ai like there's just a ton of change that are like infrastructure related not necessarily around the prompt itself yeah cool any other ai engineering startups that you think should not exist before we wrap up i mean oh my gosh i mean a lot of it again it's just like every time of investors like how does this make a billion dollars like it doesn't i'm gonna go back to just like tweeting and holding my breath underwater yeah like i don't really pay attention too much to most of this like most of the stuff i'm doing is around like the consumer of like llm calls yep i think people just want to move really fast and they will end up pick these vendors but i don't really know if anything has really like blown me out the water like i only trust myself but that's also a function of just being an old man like i think you know many companies are definitely very happy with using most of these tools anyways but i definitely think i occupy a very small space in the engineering ecosystem.Swyx [00:47:41]: Yeah i would say one of the challenges here you know you call about the dealing in the consumer of llm's space i think that's what ai engineering differs from ml engineering and i think a constant disconnect or cognitive dissonance in this field in the ai engineers that have sprung up is that they are not as good as the ml engineers they are not as qualified i think that you know you are someone who has credibility in the mle space and you are also a very authoritative figure in the ai space and i think so and you know i think you've built the de facto leading library i think yours i think instructors should be part of the standard lib even though i try to not use it like i basically also end up rebuilding instructor right like that's a lot of the back and forth that we had over the past two days i think that's the fundamental thing that we're trying to figure out like there's very small supply of MLEs not everyone's going to have that experience that you had but the global demand for AI is going to far outstrip the existing MLEs.Jason [00:48:36]: So what do we do do we force everyone to go through the standard MLE curriculum or do we make a new one? I'

god new york ai man japan advice state san francisco design evolution single numbers code chatgpt tesla ga defining flight identifying agency consulting agent structure concept spark models cto vc nlp react instructors openai function sf residence api backed python waterloo ui gpt emergence workflow notion apis clip frameworks temporal opus versus llm structured prompts elo agi sequoia django dag zapier ide dags usf hex postman haiku rag sonnets guardrails figma sdks anthropic alessio sentry yc rejections stitch fix ysl query extracting faang json xml mrr looping cursor datadog outputs pii etl markov prefect postgres joshua brown cohere mle elicit autogpt gbt gpd langchain ai engineer toolthe xgboost dagster simon willison latent space swix jerry liu babyagi

Best of: Why AI must embody the values of its users

The Future of Everything presented by Stanford Engineering

Play Episode Listen Later Apr 12, 2024 27:52

We're bringing back an episode about trust and AI. In a world where the use of Artificial Intelligence is exploding, guest computer scientist Carlos Guestrin shares insights from the work he's doing to support the development of trust between humans and machines. We originally recorded this episode in 2022, but the insights are just as if not more relevant today. We hope you'll take another listen and enjoy. Episode Reference Links:Carlos Ernesto Guestrin (Stanford Profile)Carlos Guestrin (Carlos' Website)Measuring Patients' Trust In Physicians When Assessing Quality Of Care (Paper Carlos discusses as comparison to his work with AI)Adding Glycemic And Physical Activity Metrics To A Multimodal Algorithm-Enabled Decision-Support Tool For Type 1 Diabetes Care (Carlos' published paper about Stanford Lucile Packard Children's Hospital diabetes type 1 project)XGBoost Documentation (Carlos' out-source project)Ep.172 - Why AI Must Embody the Value of Its Users YouTube / Website (Original Episode)Connect With Us:Episode Transcripts >>> The Future of Everything WebsiteConnect with Russ >>> Threads or Twitter/XConnect with School of Engineering >>> Twitter/XChapters:(00:00:00) Introduction Russ Altman introduces the episode with guest Carlos Guestrin, a professor of computer science at Stanford, whose focus is bringing AI into broader use.(00:02:58) Current Status of AIThe current capabilities of AI and machine learning and the widespread use and integration of these technologies.(00:05:44) Deep Dive into Trust and AIThree core components of trust in AI and how these factors influence the adoption and efficacy of AI systems.(00:09:43) Technical Challenges in Implementing TrustThe challenges of translating the abstract concepts of trust into practical, implementable AI features.(00:14:32) Enhancing AI Transparency and Generalization Methods to improve AI's generalisation capabilities and transparency.(00:18:00) The Role of Open-Source in AI DevelopmentThe impact of open-source software on the AI field, highlighting the benefits of shared knowledge and collaborative advancements.(00:22:34) AI in HealthcareHealthcare and the use of AI in enhancing data-driven decisions in medical treatments.(00:27:11) Conclusion Connect With Us:Episode Transcripts >>> The Future of Everything WebsiteConnect with Russ >>> Threads or Twitter/XConnect with School of Engineering >>> Twitter/X

trust ai school future hospitals artificial intelligence values deep dive engineering stanford russ threads machine learning users open source embody future of ai ai ethics tensorflow current status pytorch xgboost russ altman stanford engineering

771: Gradient Boosting: XGBoost, LightGBM and CatBoost, with Kirill Eremenko

The Machine Learning Podcast

Play Episode Listen Later Apr 2, 2024 119:00

Kirill Eremenko joins Jon Krohn for another exclusive, in-depth teaser for a new course just released on the SuperDataScience platform, “Machine Learning Level 2”. Kirill walks listeners through why decision trees and random forests are fruitful for businesses, and he offers hands-on walkthroughs for the three leading gradient-boosting algorithms today: XGBoost, LightGBM, and CatBoost. This episode is brought to you by Ready Tensor, where innovation meets reproducibility (https://www.readytensor.ai/), and by Data Universe, the out-of-this-world data conference (https://datauniverse2024.com). Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • All about decision trees [09:28] • All about ensemble models [22:03] • All about AdaBoost [38:46] • All about gradient boosting [46:51] • Gradient boosting for classification problems [1:01:26] • All about LightGBM and CatBoost [1:05:39] Additional materials: www.superdatascience.com/771

boosting kirill gradient xgboost jon krohn

Strategies For Building A Product Using LLMs At DataChat

Play Episode Listen Later Mar 3, 2024 48:40

Summary Large Language Models (LLMs) have rapidly captured the attention of the world with their impressive capabilities. Unfortunately, they are often unpredictable and unreliable. This makes building a product based on their capabilities a unique challenge. Jignesh Patel is building DataChat to bring the capabilities of LLMs to organizational analytics, allowing anyone to have conversations with their business data. In this episode he shares the methods that he is using to build a product on top of this constantly shifting set of technologies. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Your host is Tobias Macey and today I'm interviewing Jignesh Patel about working with LLMs; understanding how they work and how to build your own Interview Introduction How did you get involved in machine learning? Can you start by sharing some of the ways that you are working with LLMs currently? What are the business challenges involved in building a product on top of an LLM model that you don't own or control? In the current age of business, your data is often your strategic advantage. How do you avoid losing control of, or leaking that data while interfacing with a hosted LLM API? What are the technical difficulties related to using an LLM as a core element of a product when they are largely a black box? What are some strategies for gaining visibility into the inner workings or decision making rules for these models? What are the factors, whether technical or organizational, that might motivate you to build your own LLM for a business or product? Can you unpack what it means to "build your own" when it comes to an LLM? In your work at DataChat, how has the progression of sophistication in LLM technology impacted your own product strategy? What are the most interesting, innovative, or unexpected ways that you have seen LLMs/DataChat used? What are the most interesting, unexpected, or challenging lessons that you have learned while working with LLMs? When is an LLM the wrong choice? What do you have planned for the future of DataChat? Contact Info Website (https://jigneshpatel.org/) LinkedIn (https://www.linkedin.com/in/jigneshmpatel/) Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast (https://www.dataengineeringpodcast.com) covers the latest on modern data management. Podcast.__init__ () covers the Python language, its community, and the innovative ways it is being used. Visit the site (https://www.themachinelearningpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com (mailto:hosts@themachinelearningpodcast.com)) with your story. To help other people find the show please leave a review on iTunes (https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243) and tell your friends and co-workers. Links DataChat (https://datachat.ai/) CMU == Carnegie Mellon University (https://www.cmu.edu/) SVM == Support Vector Machine (https://en.wikipedia.org/wiki/Support_vector_machine) Generative AI (https://en.wikipedia.org/wiki/Generative_artificial_intelligence) Genomics (https://en.wikipedia.org/wiki/Genomics) Proteomics (https://en.wikipedia.org/wiki/Proteomics) Parquet (https://parquet.apache.org/) OpenAI Codex (https://openai.com/blog/openai-codex) LLama (https://en.wikipedia.org/wiki/LLaMA) Mistral (https://mistral.ai/) Google Vertex (https://cloud.google.com/vertex-ai) Langchain (https://www.langchain.com/) Retrieval Augmented Generation (https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/) Prompt Engineering (https://en.wikipedia.org/wiki/Prompt_engineering) Ensemble Learning (https://en.wikipedia.org/wiki/Ensemble_learning) XGBoost (https://xgboost.readthedocs.io/en/stable/) Catboost (https://catboost.ai/) Linear Regression (https://en.wikipedia.org/wiki/Linear_regression) COGS == Cost Of Goods Sold (https://www.investopedia.com/terms/c/cogs.asp) Bruce Schneier - AI And Trust (https://www.schneier.com/blog/archives/2023/12/ai-and-trust.html) The intro and outro music is from Hitman's Lovesong feat. Paola Graziano (https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/)/CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/)

strategy product ensemble hitman python llama generative linear prompt llm genomics prompt engineering langchain linear regression xgboost openai codex freak fandango orchestra

How to Make Your Career in Data Science 2024? Earnings Revealed*

Tech Stories

Play Episode Listen Later Feb 2, 2024 5:09

career matrix revealed math data science earnings partial regression probability graphs derivatives bayes numerical rnn xgboost word2vec eigenvalues

Exploring Open-Source for Tissue Image Analysis and Data Science Business w/ Trevor McKee, Pathomics.io

Digital Pathology Podcast

Play Episode Listen Later Nov 7, 2023 31:21 Transcription Available

Exploring Image Analysis Innovation with Trevor McKee of Pathomics.ioIf you work in digital pathology, you likely rely on image analysis tools to gain insights from complex visual data. But how do you stay on top of the latest innovations in this fast-evolving field?In this podcast episode together with Trevor McKee, CEO of Pathomics.io, we discuss innovation in image analysis using open source tools.Pathomics takes an innovative approach by building image analysis solutions on open source platforms like QuPath. As Trevor explained, open source fosters collaboration, democratizes access, and drives rapid advances - key in a fast-moving field like digital pathology. This enables rapid progress that proprietary systems can't match.Trevor's Career JourneyTrevor's journey lead him from chemical engineering into pioneering image analysis, inspired by solving complex biological problems. His diverse experiences, from photon imaging at MIT to leading a core lab facility, fueled a passion for leveraging image analysis to extract insights. Today, in addition to leading Pathomics.io he is an Adjunct Lecturer at the University of Toronto, and the Chief Scientific Officer at BioCache™ Lab Solutions.Transparent and Reproducible Image Analysis & Explainable AIA core ethos at Pathomics is making image analysis transparent and reproducible. through explainable AI techniques. Tools like XGBoost create models that are easier to interpret than "black-box" end-to-end neural networks. This builds trust and acceptance among the scientific community.Streamlining WorkflowsIn addition, Pathomics develops solutions to streamline clients' image analysis workflows. For example, their Universal StarDist plugin makes it easy to run advanced models like StarDist in QuPath. Overall, the goal is to automate tedious tasks so you can concentrate on high-value decision making.The Future of Image AnalysisLooking ahead, Trevor shared his vision for an AI-powered online platform enabling users to go seamlessly from images to insights. He also discussed open wikis to prevent redundant work and encourage knowledge sharing as the field rapidly evolves.Trevor plans to launch it to catalogue digital pathology resources such as image analysis focused machine learning papers to prevent redundant research work and encourage knowledge sharing as the field rapidly evolves.. It aligns with his commitment to open science and community knowledge sharing.Key TakeawaysI came away from our wide-ranging discussion with an insider's view of the huge potential of image analysis to transform digital pathology. By leveraging open source tools and staying atop the latest advances, you can work smarter and unlock new capabilities.So tune in to explore these innovations and more from a leader in the field! The episode provides practical insights you can apply to make the most of the newest techniquesTHIS EPISODE'S RESOURCES:Pathomics.io website Pathomics Wiki with tissue image analysis papersInterested in contributing to Pathomics Wiki? Submit your entry here.List of open source software for image analysisDigital Pathology Club Free TrialSupport the showBecome a Digital Pathology Trailblazer and See you inside the club: Digital Pathology Club Membership

ceo university ai future toronto tools mit open source transparent data science mckee tissue chief scientific officer adjunct lecturer xgboost

Episode 173 : L'arbre qui cache la forêt aléatoire

Bigdata Hebdo

Play Episode Listen Later Oct 30, 2023 48:12

### Data-science* XGBoost 2.0: New Tool for Training Better AI Models on More Complex Data -> https://aibusiness.com/ml/xgboost-2-0-new-tool-for-training-better-ai-models-on-more-complex-data* Semantic link in Microsoft Fabric: Bridging BI and Data Science -> https://blog.fabric.microsoft.com/en-us/blog/semantic-link-use-fabric-notebooks-and-power-bi-datasets-for-machine-learning-data-validation-and-more* Mastering Customer Segmentation with LLMs -> https://towardsdatascience.com/mastering-customer-segmentation-with-llm-3d9008235f41### Tools* ELT with Meltano (PostgreSQL -> Snowflake) -> https://medium.com/@danthelion/elt-with-meltano-postgressql-snowflake-a543c077ae1a* Fast, Git Friendly API Client -> https://www.usebruno.com------------------Cette publication est sponsorisée par Affini-Tech et CerenIT.CerenIT vous accompagne pour concevoir, industrialiser ou automatiser vos plateformes mais aussi pour faire parler vos données temporelles. Ecrivez nous à contact@cerenit.fr et retrouvez-nous aussi au Time Series France.Affini-Tech vous accompagne dans tous vos projets Cloud et Data, pour Imaginer, Expérimenter et Executer vos services ! (Affini-Tech, Datatask) Consulter le blog d'Affini-Tech et le blog de Datatask pour en savoir plus. On recrute ! Venez cruncher de la data avec nous ! Ecrivez nous à recrutement@affini-tech.comLe générique a été composé et réalisé par Maxence Lecointe

data cloud data science snowflakes exp cache venez new tools semantic imaginer comle ecrivez xgboost

Bűnelőrejelző megoldás Rioban, itt az XGboost 2.0, a Gartner mondott egy nagyot, de lehet jobb lett volna ha inkább nem mond

Adatépítész - a magyar data podcast

Play Episode Listen Later Oct 29, 2023 24:45

Adatépítész -az első magyar datapodcast Minden ami hír, érdekesség, esemény vagy tudásmorzsa az adat, datascience, adatbányászat és hasonló kockaságok világából. Become a Patron! XGBoost 2.0 Rio a bűnös város Gartner

ai artificial intelligence rio machine learning data science mond gartner jobb minden data mining lett lehet adat megold nagyot xgboost

Bűnelőrejelző megoldás Rioban, itt az XGboost 2.0, a Gartner mondott egy nagyot, de lehet jobb lett volna ha inkább nem mond

Adatépítész - a magyar data podcast

Play Episode Listen Later Oct 29, 2023 24:45

rio mond gartner jobb minden lett lehet adat megold nagyot xgboost

232 XGBoost 2.0 とか DALL·E 3 システムカードとか

regonn&curry.fm

Play Episode Listen Later Oct 16, 2023 20:23

話した内容Blog⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ 今回は、XGBoost 2.0、DALLE3 paper、今週の分析コンペ、雑談・来週話したいことについて話しました。 #regonn_curry_fm へのお便りはこちら https://forms.gle/BZsrPSa4znoQNfww8

dall xgboost

Improving Classification Models With XGBoost

The Real Python Podcast

Play Episode Listen Later Aug 25, 2023 65:00

How can you improve a classification model while avoiding overfitting? Once you have a model, what tools can you use to explain it to others? This week on the show, we talk with author and Python trainer Matt Harrison about his new book Effective XGBoost: Tuning, Understanding, and Deploying Classification Models.

improving models python classification matt harrison xgboost

LLMs Everywhere: Running 70B models in browsers and iPhones using MLC — with Tianqi Chen of CMU / OctoML

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Aug 10, 2023 52:10

We have just announced our first set of speakers at AI Engineer Summit! Sign up for the livestream or email sponsors@ai.engineer if you'd like to support.We are facing a massive GPU crunch. As both startups and VC's hoard Nvidia GPUs like countries count nuclear stockpiles, tweets about GPU shortages have become increasingly common. But what if we could run LLMs with AMD cards, or without a GPU at all? There's just one weird trick: compilation. And there's one person uniquely qualified to do it.We had the pleasure to sit down with Tianqi Chen, who's an Assistant Professor at CMU, where he both teaches the MLC course and runs the MLC group. You might also know him as the creator of XGBoost, Apache TVM, and MXNet, as well as the co-founder of OctoML. The MLC (short for Machine Learning Compilation) group has released a lot of interesting projects:* MLC Chat: an iPhone app that lets you run models like RedPajama-3B and Vicuna-7B on-device. It gets up to 30 tok/s!* Web LLM: Run models like LLaMA-70B in your browser (!!) to offer local inference in your product.* MLC LLM: a framework that allows any language models to be deployed natively on different hardware and software stacks.The MLC group has just announced new support for AMD cards; we previously talked about the shortcomings of ROCm, but using MLC you can get performance very close to the NVIDIA's counterparts. This is great news for founders and builders, as AMD cards are more readily available. Here are their latest results on AMD's 7900s vs some of top NVIDIA consumer cards.If you just can't get a GPU at all, MLC LLM also supports ARM and x86 CPU architectures as targets by leveraging LLVM. While speed performance isn't comparable, it allows for non-time-sensitive inference to be run on commodity hardware.We also enjoyed getting a peek into TQ's process, which involves a lot of sketching:With all the other work going on in this space with projects like ggml and Ollama, we're excited to see GPUs becoming less and less of an issue to get models in the hands of more people, and innovative software solutions to hardware problems!Show Notes* TQ's Projects:* XGBoost* Apache TVM* MXNet* MLC* OctoML* CMU Catalyst* ONNX* GGML* Mojo* WebLLM* RWKV* HiPPO* Tri Dao's Episode* George Hotz EpisodePeople:* Carlos Guestrin* Albert GuTimestamps* [00:00:00] Intros* [00:03:41] The creation of XGBoost and its surprising popularity* [00:06:01] Comparing tree-based models vs deep learning* [00:10:33] Overview of TVM and how it works with ONNX* [00:17:18] MLC deep dive* [00:28:10] Using int4 quantization for inference of language models* [00:30:32] Comparison of MLC to other model optimization projects* [00:35:02] Running large language models in the browser with WebLLM* [00:37:47] Integrating browser models into applications* [00:41:15] OctoAI and self-optimizing compute* [00:45:45] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, Partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, writer and editor of Latent Space. [00:00:20]Swyx: Okay, and we are here with Tianqi Chen, or TQ as people call him, who is assistant professor in ML computer science at CMU, Carnegie Mellon University, also helping to run Catalyst Group, also chief technologist of OctoML. You wear many hats. Are those, you know, your primary identities these days? Of course, of course. [00:00:42]Tianqi: I'm also, you know, very enthusiastic open source. So I'm also a VP and PRC member of the Apache TVM project and so on. But yeah, these are the things I've been up to so far. [00:00:53]Swyx: Yeah. So you did Apache TVM, XGBoost, and MXNet, and we can cover any of those in any amount of detail. But maybe what's one thing about you that people might not learn from your official bio or LinkedIn, you know, on the personal side? [00:01:08]Tianqi: Let me say, yeah, so normally when I do, I really love coding, even though like I'm trying to run all those things. So one thing that I keep a habit on is I try to do sketchbooks. I have a book, like real sketchbooks to draw down the design diagrams and the sketchbooks I keep sketching over the years, and now I have like three or four of them. And it's kind of a usually a fun experience of thinking the design through and also seeing how open source project evolves and also looking back at the sketches that we had in the past to say, you know, all these ideas really turn into code nowadays. [00:01:43]Alessio: How many sketchbooks did you get through to build all this stuff? I mean, if one person alone built one of those projects, he'll be a very accomplished engineer. Like you built like three of these. What's that process like for you? Like it's the sketchbook, like the start, and then you think about the code or like. [00:01:59]Swyx: Yeah. [00:02:00]Tianqi: So, so usually I start sketching on high level architectures and also in a project that works for over years, we also start to think about, you know, new directions, like of course generative AI language model comes in, how it's going to evolve. So normally I would say it takes like one book a year, roughly at that rate. It's usually fun to, I find it's much easier to sketch things out and then gives a more like a high level architectural guide for some of the future items. Yeah. [00:02:28]Swyx: Have you ever published this sketchbooks? Cause I think people would be very interested on, at least on a historical basis. Like this is the time where XGBoost was born, you know? Yeah, not really. [00:02:37]Tianqi: I started sketching like after XGBoost. So that's a kind of missing piece, but a lot of design details in TVM are actually part of the books that I try to keep a record of. [00:02:48]Swyx: Yeah, we'll try to publish them and publish something in the journals. Maybe you can grab a little snapshot for visual aid. Sounds good. [00:02:57]Alessio: Yeah. And yeah, talking about XGBoost, so a lot of people in the audience might know it's a gradient boosting library, probably the most popular out there. And it became super popular because many people started using them in like a machine learning competitions. And I think there's like a whole Wikipedia page of like all state-of-the-art models. They use XGBoost and like, it's a really long list. When you were working on it, so we just had Tri Dao, who's the creator of FlashAttention on the podcast. And I asked him this question, it's like, when you were building FlashAttention, did you know that like almost any transform race model will use it? And so I asked the same question to you when you were coming up with XGBoost, like, could you predict it would be so popular or like, what was the creation process? And when you published it, what did you expect? We have no idea. [00:03:41]Tianqi: Like, actually, the original reason that we built that library is that at that time, deep learning just came out. Like that was the time where AlexNet just came out. And one of the ambitious mission that myself and my advisor, Carlos Guestrin, then is we want to think about, you know, try to test the hypothesis. Can we find alternatives to deep learning models? Because then, you know, there are other alternatives like, you know, support vector machines, linear models, and of course, tree-based models. And our question was, if you build those models and feed them with big enough data, because usually like one of the key characteristics of deep learning is that it's taking a lot [00:04:22]Swyx: of data, right? [00:04:23]Tianqi: So we will be able to get the same amount of performance. That's a hypothesis we're setting out to test. Of course, if you look at now, right, that's a wrong hypothesis, but as a byproduct, what we find out is that, you know, most of the gradient boosting library out there is not efficient enough for us to test that hypothesis. So I happen to have quite a bit of experience in the past of building gradient boosting trees and their variants. So Effective Action Boost was kind of like a byproduct of that hypothesis testing. At that time, I'm also competing a bit in data science challenges, like I worked on KDDCup and then Kaggle kind of become bigger, right? So I kind of think maybe it's becoming useful to others. One of my friends convinced me to try to do a Python binding of it. That tends to be like a very good decision, right, to be effective. Usually when I build it, we feel like maybe a command line interface is okay. And now we have a Python binding, we have R bindings. And then it realized, you know, it started getting interesting. People started contributing different perspectives, like visualization and so on. So we started to push a bit more on to building distributive support to make sure it works on any platform and so on. And even at that time point, when I talked to Carlos, my advisor, later, he said he never anticipated that we'll get to that level of success. And actually, why I pushed for gradient boosting trees, interestingly, at that time, he also disagreed. He thinks that maybe we should go for kernel machines then. And it turns out, you know, actually, we are both wrong in some sense, and Deep Neural Network was the king in the hill. But at least the gradient boosting direction got into something fruitful. [00:06:01]Swyx: Interesting. [00:06:02]Alessio: I'm always curious when it comes to these improvements, like, what's the design process in terms of like coming up with it? And how much of it is a collaborative with like other people that you're working with versus like trying to be, you know, obviously, in academia, it's like very paper-driven kind of research driven. [00:06:19]Tianqi: I would say the extra boost improvement at that time point was more on like, you know, I'm trying to figure out, right. But it's combining lessons. Before that, I did work on some of the other libraries on matrix factorization. That was like my first open source experience. Nobody knew about it, because you'll find, likely, if you go and try to search for the package SVD feature, you'll find some SVN repo somewhere. But it's actually being used for some of the recommender system packages. So I'm trying to apply some of the previous lessons there and trying to combine them. The later projects like MXNet and then TVM is much, much more collaborative in a sense that... But, of course, extra boost has become bigger, right? So when we started that project myself, and then we have, it's really amazing to see people come in. Michael, who was a lawyer, and now he works on the AI space as well, on contributing visualizations. Now we have people from our community contributing different things. So extra boost even today, right, it's a community of committers driving the project. So it's definitely something collaborative and moving forward on getting some of the things continuously improved for our community. [00:07:37]Alessio: Let's talk a bit about TVM too, because we got a lot of things to run through in this episode. [00:07:42]Swyx: I would say that at some point, I'd love to talk about this comparison between extra boost or tree-based type AI or machine learning compared to deep learning, because I think there is a lot of interest around, I guess, merging the two disciplines, right? And we can talk more about that. I don't know where to insert that, by the way, so we can come back to it later. Yeah. [00:08:04]Tianqi: Actually, what I said, when we test the hypothesis, the hypothesis is kind of, I would say it's partially wrong, because the hypothesis we want to test now is, can you run tree-based models on image classification tasks, where deep learning is certainly a no-brainer right [00:08:17]Swyx: now today, right? [00:08:18]Tianqi: But if you try to run it on tabular data, still, you'll find that most people opt for tree-based models. And there's a reason for that, in the sense that when you are looking at tree-based models, the decision boundaries are naturally rules that you're looking at, right? And they also have nice properties, like being able to be agnostic to scale of input and be able to automatically compose features together. And I know there are attempts on building neural network models that work for tabular data, and I also sometimes follow them. I do feel like it's good to have a bit of diversity in the modeling space. Actually, when we're building TVM, we build cost models for the programs, and actually we are using XGBoost for that as well. I still think tree-based models are going to be quite relevant, because first of all, it's really to get it to work out of the box. And also, you will be able to get a bit of interoperability and control monotonicity [00:09:18]Swyx: and so on. [00:09:19]Tianqi: So yes, it's still going to be relevant. I also sometimes keep coming back to think about, are there possible improvements that we can build on top of these models? And definitely, I feel like it's a space that can have some potential in the future. [00:09:34]Swyx: Are there any current projects that you would call out as promising in terms of merging the two directions? [00:09:41]Tianqi: I think there are projects that try to bring a transformer-type model for tabular data. I don't remember specifics of them, but I think even nowadays, if you look at what people are using, tree-based models are still one of their toolkits. So I think maybe eventually it's not even a replacement, it will be just an ensemble of models that you can call. Perfect. [00:10:07]Alessio: Next up, about three years after XGBoost, you built this thing called TVM, which is now a very popular compiler framework for models. Let's talk about, so this came out about at the same time as ONNX. So I think it would be great if you could maybe give a little bit of an overview of how the two things work together. Because it's kind of like the model, then goes to ONNX, then goes to the TVM. But I think a lot of people don't understand the nuances. I can get a bit of a backstory on that. [00:10:33]Tianqi: So actually, that's kind of an ancient history. Before XGBoost, I worked on deep learning for two years or three years. I got a master's before I started my PhD. And during my master's, my thesis focused on applying convolutional restricted Boltzmann machine for ImageNet classification. That is the thing I'm working on. And that was before AlexNet moment. So effectively, I had to handcraft NVIDIA CUDA kernels on, I think, a GTX 2070 card. I have a 22070 card. It took me about six months to get one model working. And eventually, that model is not so good, and we should have picked a better model. But that was like an ancient history that really got me into this deep learning field. And of course, eventually, we find it didn't work out. So in my master's, I ended up working on recommender system, which got me a paper, and I applied and got a PhD. But I always want to come back to work on the deep learning field. So after XGBoost, I think I started to work with some folks on this particular MXNet. At that time, it was like the frameworks of CAFE, Ciano, PyTorch haven't yet come out. And we're really working hard to optimize for performance on GPUs. At that time, I found it's really hard, even for NVIDIA GPU. It took me six months. And then it's amazing to see on different hardwares how hard it is to go and optimize code for the platforms that are interesting. So that gets me thinking, can we build something more generic and automatic? So that I don't need an entire team of so many people to go and build those frameworks. So that's the motivation of starting working on TVM. There is really too little about machine learning engineering needed to support deep learning models on the platforms that we're interested in. I think it started a bit earlier than ONNX, but once it got announced, I think it's in a similar time period at that time. So overall, how it works is that TVM, you will be able to take a subset of machine learning programs that are represented in what we call a computational graph. Nowadays, we can also represent a loop-level program ingest from your machine learning models. Usually, you have model formats ONNX, or in PyTorch, they have FX Tracer that allows you to trace the FX graph. And then it goes through TVM. We also realized that, well, yes, it needs to be more customizable, so it will be able to perform some of the compilation optimizations like fusion operator together, doing smart memory planning, and more importantly, generate low-level code. So that works for NVIDIA and also is portable to other GPU backends, even non-GPU backends [00:13:36]Swyx: out there. [00:13:37]Tianqi: So that's a project that actually has been my primary focus over the past few years. And it's great to see how it started from where I think we are the very early initiator of machine learning compilation. I remember there was a visit one day, one of the students asked me, are you still working on deep learning frameworks? I tell them that I'm working on ML compilation. And they said, okay, compilation, that sounds very ancient. It sounds like a very old field. And why are you working on this? And now it's starting to get more traction, like if you say Torch Compile and other things. I'm really glad to see this field starting to pick up. And also we have to continue innovating here. [00:14:17]Alessio: I think the other thing that I noticed is, it's kind of like a big jump in terms of area of focus to go from XGBoost to TVM, it's kind of like a different part of the stack. Why did you decide to do that? And I think the other thing about compiling to different GPUs and eventually CPUs too, did you already see some of the strain that models could have just being focused on one runtime, only being on CUDA and that, and how much of that went into it? [00:14:50]Tianqi: I think it's less about trying to get impact, more about wanting to have fun. I like to hack code, I had great fun hacking CUDA code. Of course, being able to generate CUDA code is cool, right? But now, after being able to generate CUDA code, okay, by the way, you can do it on other platforms, isn't that amazing? So it's more of that attitude to get me started on this. And also, I think when we look at different researchers, myself is more like a problem solver type. So I like to look at a problem and say, okay, what kind of tools we need to solve that problem? So regardless, it could be building better models. For example, while we build extra boots, we build certain regularizations into it so that it's more robust. It also means building system optimizations, writing low-level code, maybe trying to write assembly and build compilers and so on. So as long as they solve the problem, definitely go and try to do them together. And I also see it's a common trend right now. Like if you want to be able to solve machine learning problems, it's no longer at Aggressor layer, right? You kind of need to solve it from both Aggressor data and systems angle. And this entire field of machine learning system, I think it's kind of emerging. And there's now a conference around it. And it's really good to see a lot more people are starting to look into this. [00:16:10]Swyx: Yeah. Are you talking about ICML or something else? [00:16:13]Tianqi: So machine learning and systems, right? So not only machine learning, but machine learning and system. So there's a conference called MLsys. It's definitely a smaller community than ICML, but I think it's also an emerging and growing community where people are talking about what are the implications of building systems for machine learning, right? And how do you go and optimize things around that and co-design models and systems together? [00:16:37]Swyx: Yeah. And you were area chair for ICML and NeurIPS as well. So you've just had a lot of conference and community organization experience. Is that also an important part of your work? Well, it's kind of expected for academic. [00:16:48]Tianqi: If I hold an academic job, I need to do services for the community. Okay, great. [00:16:53]Swyx: Your most recent venture in MLsys is going to the phone with MLCLLM. You announced this in April. I have it on my phone. It's great. I'm running Lama 2, Vicuña. I don't know what other models that you offer. But maybe just kind of describe your journey into MLC. And I don't know how this coincides with your work at CMU. Is that some kind of outgrowth? [00:17:18]Tianqi: I think it's more like a focused effort that we want in the area of machine learning compilation. So it's kind of related to what we built in TVM. So when we built TVM was five years ago, right? And a lot of things happened. We built the end-to-end machine learning compiler that works, the first one that works. But then we captured a lot of lessons there. So then we are building a second iteration called TVM Unity. That allows us to be able to allow ML engineers to be able to quickly capture the new model and how we demand building optimizations for them. And MLCLLM is kind of like an MLC. It's more like a vertical driven organization that we go and build tutorials and go and build projects like LLM to solutions. So that to really show like, okay, you can take machine learning compilation technology and apply it and bring something fun forward. Yeah. So yes, it runs on phones, which is really cool. But the goal here is not only making it run on phones, right? The goal is making it deploy universally. So we do run on Apple M2 Macs, the 17 billion models. Actually, on a single batch inference, more recently on CUDA, we get, I think, the most best performance you can get out there already on the 4-bit inference. Actually, as I alluded earlier before the podcast, we just had a result on AMD. And on a single batch, actually, we can get the latest AMD GPU. This is a consumer card. It can get to about 80% of the 4019, so NVIDIA's best consumer card out there. So it's not yet on par, but thinking about how diversity and what you can enable and the previous things you can get on that card, it's really amazing that what you can do with this kind of technology. [00:19:10]Swyx: So one thing I'm a little bit confused by is that most of these models are in PyTorch, but you're running this inside a TVM. I don't know. Was there any fundamental change that you needed to do, or was this basically the fundamental design of TVM? [00:19:25]Tianqi: So the idea is that, of course, it comes back to program representation, right? So effectively, TVM has this program representation called TVM script that contains more like computational graph and operational representation. So yes, initially, we do need to take a bit of effort of bringing those models onto the program representation that TVM supports. Usually, there are a mix of ways, depending on the kind of model you're looking at. For example, for vision models and stable diffusion models, usually we can just do tracing that takes PyTorch model onto TVM. That part is still being robustified so that we can bring more models in. On language model tasks, actually what we do is we directly build some of the model constructors and try to directly map from Hugging Face models. The goal is if you have a Hugging Face configuration, we will be able to bring that in and apply optimization on them. So one fun thing about model compilation is that your optimization doesn't happen only as a soft language, right? For example, if you're writing PyTorch code, you just go and try to use a better fused operator at a source code level. Torch compile might help you do a bit of things in there. In most of the model compilations, it not only happens at the beginning stage, but we also apply generic transformations in between, also through a Python API. So you can tweak some of that. So that part of optimization helps a lot of uplifting in getting both performance and also portability on the environment. And another thing that we do have is what we call universal deployment. So if you get the ML program into this TVM script format, where there are functions that takes in tensor and output tensor, we will be able to have a way to compile it. So they will be able to load the function in any of the language runtime that TVM supports. So if you could load it in JavaScript, and that's a JavaScript function that you can take in tensors and output tensors. If you're loading Python, of course, and C++ and Java. So the goal there is really bring the ML model to the language that people care about and be able to run it on a platform they like. [00:21:37]Swyx: It strikes me that I've talked to a lot of compiler people, but you don't have a traditional compiler background. You're inventing your own discipline called machine learning compilation, or MLC. Do you think that this will be a bigger field going forward? [00:21:52]Tianqi: First of all, I do work with people working on compilation as well. So we're also taking inspirations from a lot of early innovations in the field. Like for example, TVM initially, we take a lot of inspirations from Halide, which is just an image processing compiler. And of course, since then, we have evolved quite a bit to focus on the machine learning related compilations. If you look at some of our conference publications, you'll find that machine learning compilation is already kind of a subfield. So if you look at papers in both machine learning venues, the MLC conferences, of course, and also system venues, every year there will be papers around machine learning compilation. And in the compiler conference called CGO, there's a C4ML workshop that also kind of trying to focus on this area. So definitely it's already starting to gain traction and becoming a field. I wouldn't claim that I invented this field, but definitely I helped to work with a lot of folks there. And I try to bring a perspective, of course, trying to learn a lot from the compiler optimizations as well as trying to bring in knowledges in machine learning and systems together. [00:23:07]Alessio: So we had George Hotz on the podcast a few episodes ago, and he had a lot to say about AMD and their software. So when you think about TVM, are you still restricted in a way by the performance of the underlying kernel, so to speak? So if your target is like a CUDA runtime, you still get better performance, no matter like TVM kind of helps you get there, but then that level you don't take care of, right? [00:23:34]Swyx: There are two parts in here, right? [00:23:35]Tianqi: So first of all, there is the lower level runtime, like CUDA runtime. And then actually for NVIDIA, a lot of the mood came from their libraries, like Cutlass, CUDN, right? Those library optimizations. And also for specialized workloads, actually you can specialize them. Because a lot of cases you'll find that if you go and do benchmarks, it's very interesting. Like two years ago, if you try to benchmark ResNet, for example, usually the NVIDIA library [00:24:04]Swyx: gives you the best performance. [00:24:06]Tianqi: It's really hard to beat them. But as soon as you start to change the model to something, maybe a bit of a variation of ResNet, not for the traditional ImageNet detections, but for latent detection and so on, there will be some room for optimization because people sometimes overfit to benchmarks. These are people who go and optimize things, right? So people overfit the benchmarks. So that's the largest barrier, like being able to get a low level kernel libraries, right? In that sense, the goal of TVM is actually we try to have a generic layer to both, of course, leverage libraries when available, but also be able to automatically generate [00:24:45]Swyx: libraries when possible. [00:24:46]Tianqi: So in that sense, we are not restricted by the libraries that they have to offer. That's why we will be able to run Apple M2 or WebGPU where there's no library available because we are kind of like automatically generating libraries. That makes it easier to support less well-supported hardware, right? For example, WebGPU is one example. From a runtime perspective, AMD, I think before their Vulkan driver was not very well supported. Recently, they are getting good. But even before that, we'll be able to support AMD through this GPU graphics backend called Vulkan, which is not as performant, but it gives you a decent portability across those [00:25:29]Swyx: hardware. [00:25:29]Alessio: And I know we got other MLC stuff to talk about, like WebLLM, but I want to wrap up on the optimization that you're doing. So there's kind of four core things, right? Kernel fusion, which we talked a bit about in the flash attention episode and the tiny grab one memory planning and loop optimization. I think those are like pretty, you know, self-explanatory. I think the one that people have the most questions, can you can you quickly explain [00:25:53]Swyx: those? [00:25:54]Tianqi: So there are kind of a different things, right? Kernel fusion means that, you know, if you have an operator like Convolutions or in the case of a transformer like MOP, you have other operators that follow that, right? You don't want to launch two GPU kernels. You want to be able to put them together in a smart way, right? And as a memory planning, it's more about, you know, hey, if you run like Python code, every time when you generate a new array, you are effectively allocating a new piece of memory, right? Of course, PyTorch and other frameworks try to optimize for you. So there is a smart memory allocator behind the scene. But actually, in a lot of cases, it's much better to statically allocate and plan everything ahead of time. And that's where like a compiler can come in. We need to, first of all, actually for language model, it's much harder because dynamic shape. So you need to be able to what we call symbolic shape tracing. So we have like a symbolic variable that tells you like the shape of the first tensor is n by 12. And the shape of the third tensor is also n by 12. Or maybe it's n times 2 by 12. Although you don't know what n is, right? But you will be able to know that relation and be able to use that to reason about like fusion and other decisions. So besides this, I think loop transformation is quite important. And it's actually non-traditional. Originally, if you simply write a code and you want to get a performance, it's very hard. For example, you know, if you write a matrix multiplier, the simplest thing you can do is you do for i, j, k, c, i, j, plus, equal, you know, a, i, k, times b, i, k. But that code is 100 times slower than the best available code that you can get. So we do a lot of transformation, like being able to take the original code, trying to put things into shared memory, and making use of tensor calls, making use of memory copies, and all this. Actually, all these things, we also realize that, you know, we cannot do all of them. So we also make the ML compilation framework as a Python package, so that people will be able to continuously improve that part of engineering in a more transparent way. So we find that's very useful, actually, for us to be able to get good performance very quickly on some of the new models. Like when Lamato came out, we'll be able to go and look at the whole, here's the bottleneck, and we can go and optimize those. [00:28:10]Alessio: And then the fourth one being weight quantization. So everybody wants to know about that. And just to give people an idea of the memory saving, if you're doing FB32, it's like four bytes per parameter. Int8 is like one byte per parameter. So you can really shrink down the memory footprint. What are some of the trade-offs there? How do you figure out what the right target is? And what are the precision trade-offs, too? [00:28:37]Tianqi: Right now, a lot of people also mostly use int4 now for language models. So that really shrinks things down a lot. And more recently, actually, we started to think that, at least in MOC, we don't want to have a strong opinion on what kind of quantization we want to bring, because there are so many researchers in the field. So what we can do is we can allow developers to customize the quantization they want, but we still bring the optimum code for them. So we are working on this item called bring your own quantization. In fact, hopefully MOC will be able to support more quantization formats. And definitely, I think there's an open field that's being explored. Can you bring more sparsities? Can you quantize activations as much as possible, and so on? And it's going to be something that's going to be relevant for quite a while. [00:29:27]Swyx: You mentioned something I wanted to double back on, which is most people use int4 for language models. This is actually not obvious to me. Are you talking about the GGML type people, or even the researchers who are training the models also using int4? [00:29:40]Tianqi: Sorry, so I'm mainly talking about inference, not training, right? So when you're doing training, of course, int4 is harder, right? Maybe you could do some form of mixed type precision for inference. I think int4 is kind of like, in a lot of cases, you will be able to get away with int4. And actually, that does bring a lot of savings in terms of the memory overhead, and so on. [00:30:09]Alessio: Yeah, that's great. Let's talk a bit about maybe the GGML, then there's Mojo. How should people think about MLC? How do all these things play together? I think GGML is focused on model level re-implementation and improvements. Mojo is a language, super sad. You're more at the compiler level. Do you all work together? Do people choose between them? [00:30:32]Tianqi: So I think in this case, I think it's great to say the ecosystem becomes so rich with so many different ways. So in our case, GGML is more like you're implementing something from scratch in C, right? So that gives you the ability to go and customize each of a particular hardware backend. But then you will need to write from CUDA kernels, and you write optimally from AMD, and so on. So the kind of engineering effort is a bit more broadened in that sense. Mojo, I have not looked at specific details yet. I think it's good to start to say, it's a language, right? I believe there will also be machine learning compilation technologies behind it. So it's good to say, interesting place in there. In the case of MLC, our case is that we do not want to have an opinion on how, where, which language people want to develop, deploy, and so on. And we also realize that actually there are two phases. We want to be able to develop and optimize your model. By optimization, I mean, really bring in the best CUDA kernels and do some of the machine learning engineering in there. And then there's a phase where you want to deploy it as a part of the app. So if you look at the space, you'll find that GGML is more like, I'm going to develop and optimize in the C language, right? And then most of the low-level languages they have. And Mojo is that you want to develop and optimize in Mojo, right? And you deploy in Mojo. In fact, that's the philosophy they want to push for. In the ML case, we find that actually if you want to develop models, the machine learning community likes Python. Python is a language that you should focus on. So in the case of MLC, we really want to be able to enable, not only be able to just define your model in Python, that's very common, right? But also do ML optimization, like engineering optimization, CUDA kernel optimization, memory planning, all those things in Python that makes you customizable and so on. But when you do deployment, we realize that people want a bit of a universal flavor. If you are a web developer, you want JavaScript, right? If you're maybe an embedded system person, maybe you would prefer C++ or C or Rust. And people sometimes do like Python in a lot of cases. So in the case of MLC, we really want to have this vision of, you optimize, build a generic optimization in Python, then you deploy that universally onto the environments that people like. [00:32:54]Swyx: That's a great perspective and comparison, I guess. One thing I wanted to make sure that we cover is that I think you are one of these emerging set of academics that also very much focus on your artifacts of delivery. Of course. Something we talked about for three years, that he was very focused on his GitHub. And obviously you treated XGBoost like a product, you know? And then now you're publishing an iPhone app. Okay. Yeah. Yeah. What is his thinking about academics getting involved in shipping products? [00:33:24]Tianqi: I think there are different ways of making impact, right? Definitely, you know, there are academics that are writing papers and building insights for people so that people can build product on top of them. In my case, I think the particular field I'm working on, machine learning systems, I feel like really we need to be able to get it to the hand of people so that really we see the problem, right? And we show that we can solve a problem. And it's a different way of making impact. And there are academics that are doing similar things. Like, you know, if you look at some of the people from Berkeley, right? A few years, they will come up with big open source projects. Certainly, I think it's just a healthy ecosystem to have different ways of making impacts. And I feel like really be able to do open source and work with open source community is really rewarding because we have a real problem to work on when we build our research. Actually, those research bring together and people will be able to make use of them. And we also start to see interesting research challenges that we wouldn't otherwise say, right, if you're just trying to do a prototype and so on. So I feel like it's something that is one interesting way of making impact, making contributions. [00:34:40]Swyx: Yeah, you definitely have a lot of impact there. And having experience publishing Mac stuff before, the Apple App Store is no joke. It is the hardest compilation, human compilation effort. So one thing that we definitely wanted to cover is running in the browser. You have a 70 billion parameter model running in the browser. That's right. Can you just talk about how? Yeah, of course. [00:35:02]Tianqi: So I think that there are a few elements that need to come in, right? First of all, you know, we do need a MacBook, the latest one, like M2 Max, because you need the memory to be big enough to cover that. So for a 70 million model, it takes you about, I think, 50 gigahertz of RAM. So the M2 Max, the upper version, will be able to run it, right? And it also leverages machine learning compilation. Again, what we are doing is the same, whether it's running on iPhone, on server cloud GPUs, on AMDs, or on MacBook, we all go through that same MOC pipeline. Of course, in certain cases, maybe we'll do a bit of customization iteration for either ones. And then it runs on the browser runtime, this package of WebLM. So that will effectively... So what we do is we will take that original model and compile to what we call WebGPU. And then the WebLM will be to pick it up. And the WebGPU is this latest GPU technology that major browsers are shipping right now. So you can get it in Chrome for them already. It allows you to be able to access your native GPUs from a browser. And then effectively, that language model is just invoking the WebGPU kernels through there. So actually, when the LATMAR2 came out, initially, we asked the question about, can you run 17 billion on a MacBook? That was the question we're asking. So first, we actually... Jin Lu, who is the engineer pushing this, he got 17 billion on a MacBook. We had a CLI version. So in MLC, you will be able to... That runs through a metal accelerator. So effectively, you use the metal programming language to get the GPU acceleration. So we find, okay, it works for the MacBook. Then we asked, we had a WebGPU backend. Why not try it there? So we just tried it out. And it's really amazing to see everything up and running. And actually, it runs smoothly in that case. So I do think there are some kind of interesting use cases already in this, because everybody has a browser. You don't need to install anything. I think it doesn't make sense yet to really run a 17 billion model on a browser, because you kind of need to be able to download the weight and so on. But I think we're getting there. Effectively, the most powerful models you will be able to run on a consumer device. It's kind of really amazing. And also, in a lot of cases, there might be use cases. For example, if I'm going to build a chatbot that I talk to it and answer questions, maybe some of the components, like the voice to text, could run on the client side. And so there are a lot of possibilities of being able to have something hybrid that contains the edge component or something that runs on a server. [00:37:47]Alessio: Do these browser models have a way for applications to hook into them? So if I'm using, say, you can use OpenAI or you can use the local model. Of course. [00:37:56]Tianqi: Right now, actually, we are building... So there's an NPM package called WebILM, right? So that you will be able to, if you want to embed it onto your web app, you will be able to directly depend on WebILM and you will be able to use it. We are also having a REST API that's OpenAI compatible. So that REST API, I think, right now, it's actually running on native backend. So that if a CUDA server is faster to run on native backend. But also we have a WebGPU version of it that you can go and run. So yeah, we do want to be able to have easier integrations with existing applications. And OpenAI API is certainly one way to do that. Yeah, this is great. [00:38:37]Swyx: I actually did not know there's an NPM package that makes it very, very easy to try out and use. I want to actually... One thing I'm unclear about is the chronology. Because as far as I know, Chrome shipped WebGPU the same time that you shipped WebILM. Okay, yeah. So did you have some kind of secret chat with Chrome? [00:38:57]Tianqi: The good news is that Chrome is doing a very good job of trying to have early release. So although the official shipment of the Chrome WebGPU is the same time as WebILM, actually, you will be able to try out WebGPU technology in Chrome. There is an unstable version called Canary. I think as early as two years ago, there was a WebGPU version. Of course, it's getting better. So we had a TVM-based WebGPU backhand two years ago. Of course, at that time, there were no language models. It was running on less interesting, well, still quite interesting models. And then this year, we really started to see it getting matured and performance keeping up. So we have a more serious push of bringing the language model compatible runtime onto the WebGPU. [00:39:45]Swyx: I think you agree that the hardest part is the model download. Has there been conversations about a one-time model download and sharing between all the apps that might use this API? That is a great point. [00:39:58]Tianqi: I think it's already supported in some sense. When we download the model, WebILM will cache it onto a special Chrome cache. So if a different web app uses the same WebILM JavaScript package, you don't need to redownload the model again. So there is already something there. But of course, you have to download the model once at least to be able to use it. [00:40:19]Swyx: Okay. One more thing just in general before we're about to zoom out to OctoAI. Just the last question is, you're not the only project working on, I guess, local models. That's right. Alternative models. There's gpt4all, there's olama that just recently came out, and there's a bunch of these. What would be your advice to them on what's a valuable problem to work on? And what is just thin wrappers around ggml? Like, what are the interesting problems in this space, basically? [00:40:45]Tianqi: I think making API better is certainly something useful, right? In general, one thing that we do try to push very hard on is this idea of easier universal deployment. So we are also looking forward to actually have more integration with MOC. That's why we're trying to build API like WebILM and other things. So we're also looking forward to collaborate with all those ecosystems and working support to bring in models more universally and be able to also keep up the best performance when possible in a more push-button way. [00:41:15]Alessio: So as we mentioned in the beginning, you're also the co-founder of Octomel. Recently, Octomel released OctoAI, which is a compute service, basically focuses on optimizing model runtimes and acceleration and compilation. What has been the evolution there? So Octo started as kind of like a traditional MLOps tool, where people were building their own models and you help them on that side. And then it seems like now most of the market is shifting to starting from pre-trained generative models. Yeah, what has been that experience for you and what you've seen the market evolve? And how did you decide to release OctoAI? [00:41:52]Tianqi: One thing that we found out is that on one hand, it's really easy to go and get something up and running, right? So if you start to consider there's so many possible availabilities and scalability issues and even integration issues since becoming kind of interesting and complicated. So we really want to make sure to help people to get that part easy, right? And now a lot of things, if we look at the customers we talk to and the market, certainly generative AI is something that is very interesting. So that is something that we really hope to help elevate. And also building on top of technology we build to enable things like portability across hardwares. And you will be able to not worry about the specific details, right? Just focus on getting the model out. We'll try to work on infrastructure and other things that helps on the other end. [00:42:45]Alessio: And when it comes to getting optimization on the runtime, I see when we run an early adopters community and most enterprises issue is how to actually run these models. Do you see that as one of the big bottlenecks now? I think a few years ago it was like, well, we don't have a lot of machine learning talent. We cannot develop our own models. Versus now it's like, there's these great models you can use, but I don't know how to run them efficiently. [00:43:12]Tianqi: That depends on how you define by running, right? On one hand, it's easy to download your MLC, like you download it, you run on a laptop, but then there's also different decisions, right? What if you are trying to serve a larger user request? What if that request changes? What if the availability of hardware changes? Right now it's really hard to get the latest hardware on media, unfortunately, because everybody's trying to work on the things using the hardware that's out there. So I think when the definition of run changes, there are a lot more questions around things. And also in a lot of cases, it's not only about running models, it's also about being able to solve problems around them. How do you manage your model locations and how do you make sure that you get your model close to your execution environment more efficiently? So definitely a lot of engineering challenges out there. That we hope to elevate, yeah. And also, if you think about our future, definitely I feel like right now the technology, given the technology and the kind of hardware availability we have today, we will need to make use of all the possible hardware available out there. That will include a mechanism for cutting down costs, bringing something to the edge and cloud in a more natural way. So I feel like still this is a very early stage of where we are, but it's already good to see a lot of interesting progress. [00:44:35]Alessio: Yeah, that's awesome. I would love, I don't know how much we're going to go in depth into it, but what does it take to actually abstract all of this from the end user? You know, like they don't need to know what GPUs you run, what cloud you're running them on. You take all of that away. What was that like as an engineering challenge? [00:44:51]Tianqi: So I think that there are engineering challenges on. In fact, first of all, you will need to be able to support all the kind of hardware backhand you have, right? On one hand, if you look at the media library, you'll find very surprisingly, not too surprisingly, most of the latest libraries works well on the latest GPU. But there are other GPUs out there in the cloud as well. So certainly being able to have know-hows and being able to do model optimization is one thing, right? Also infrastructures on being able to scale things up, locate models. And in a lot of cases, we do find that on typical models, it also requires kind of vertical iterations. So it's not about, you know, build a silver bullet and that silver bullet is going to solve all the problems. It's more about, you know, we're building a product, we'll work with the users and we find out there are interesting opportunities in a certain point. And when our engineer will go and solve that, and it will automatically reflect it in a service. [00:45:45]Swyx: Awesome. [00:45:46]Alessio: We can jump into the lightning round until, I don't know, Sean, if you have more questions or TQ, if you have more stuff you wanted to talk about that we didn't get a chance to [00:45:54]Swyx: touch on. [00:45:54]Alessio: Yeah, we have talked a lot. [00:45:55]Swyx: So, yeah. We always would like to ask, you know, do you have a commentary on other parts of AI and ML that is interesting to you? [00:46:03]Tianqi: So right now, I think one thing that we are really pushing hard for is this question about how far can we bring open source, right? I'm kind of like a hacker and I really like to put things together. So I think it's unclear in the future of what the future of AI looks like. On one hand, it could be possible that, you know, you just have a few big players, you just try to talk to those bigger language models and that can do everything, right? On the other hand, one of the things that Wailing Academic is really excited and pushing for, that's one reason why I'm pushing for MLC, is that can we build something where you have different models? You have personal models that know the best movie you like, but you also have bigger models that maybe know more, and you get those models to interact with each other, right? And be able to have a wide ecosystem of AI agents that helps each person while still being able to do things like personalization. Some of them can run locally, some of them, of course, running on a cloud, and how do they interact with each other? So I think that is a very exciting time where the future is yet undecided, but I feel like there is something we can do to shape that future as well. [00:47:18]Swyx: One more thing, which is something I'm also pursuing, which is, and this kind of goes back into predictions, but also back in your history, do you have any idea, or are you looking out for anything post-transformers as far as architecture is concerned? [00:47:32]Tianqi: I think, you know, in a lot of these cases, you can find there are already promising models for long contexts, right? There are space-based models, where like, you know, a lot of some of our colleagues from Albert, who he worked on this HIPPO models, right? And then there is an open source version called RWKV. It's like a recurrent models that allows you to summarize things. Actually, we are bringing RWKV to MOC as well, so maybe you will be able to see one of the models. [00:48:00]Swyx: We actually recorded an episode with one of the RWKV core members. It's unclear because there's no academic backing. It's just open source people. Oh, I see. So you like the merging of recurrent networks and transformers? [00:48:13]Tianqi: I do love to see this model space continue growing, right? And I feel like in a lot of cases, it's just that attention mechanism is getting changed in some sense. So I feel like definitely there are still a lot of things to be explored here. And that is also one reason why we want to keep pushing machine learning compilation, because one of the things we are trying to push in was productivity. So that for machine learning engineering, so that as soon as some of the models came out, we will be able to, you know, empower them onto those environments that's out there. [00:48:43]Swyx: Yeah, it's a really good mission. Okay. Very excited to see that RWKV and state space model stuff. I'm hearing increasing chatter about that stuff. Okay. Lightning round, as always fun. I'll take the first one. Acceleration. What has already happened in AI that you thought would take much longer? [00:48:59]Tianqi: Emergence of more like a conversation chatbot ability is something that kind of surprised me before it came out. This is like one piece that I feel originally I thought would take much longer, but yeah, [00:49:11]Swyx: it happens. And it's funny because like the original, like Eliza chatbot was something that goes all the way back in time. Right. And then we just suddenly came back again. Yeah. [00:49:21]Tianqi: It's always too interesting to think about, but with a kind of a different technology [00:49:25]Swyx: in some sense. [00:49:25]Alessio: What about the most interesting unsolved question in AI? [00:49:31]Swyx: That's a hard one, right? [00:49:32]Tianqi: So I can tell you like what kind of I'm excited about. So, so I think that I have always been excited about this idea of continuous learning and lifelong learning in some sense. So how AI continues to evolve with the knowledges that have been there. It seems that we're getting much closer with all those recent technologies. So being able to develop systems, support, and be able to think about how AI continues to evolve is something that I'm really excited about. [00:50:01]Swyx: So specifically, just to double click on this, are you talking about continuous training? That's like a training. [00:50:06]Tianqi: I feel like, you know, training adaptation and it's all similar things, right? You want to think about entire life cycle, right? The life cycle of collecting data, training, fine tuning, and maybe have your local context that getting continuously curated and feed onto models. So I think all these things are interesting and relevant in here. [00:50:29]Swyx: Yeah. I think this is something that people are really asking, you know, right now we have moved a lot into the sort of pre-training phase and off the shelf, you know, the model downloads and stuff like that, which seems very counterintuitive compared to the continuous training paradigm that people want. So I guess the last question would be for takeaways. What's basically one message that you want every listener, every person to remember today? [00:50:54]Tianqi: I think it's getting more obvious now, but I think one of the things that I always want to mention in my talks is that, you know, when you're thinking about AI applications, originally people think about algorithms a lot more, right? Our algorithm models, they are still very important. But usually when you build AI applications, it takes, you know, both algorithm side, the system optimizations, and the data curations, right? So it takes a connection of so many facades to be able to bring together an AI system and be able to look at it from that holistic perspective is really useful when we start to build modern applications. I think it's going to continue going to be more important in the future. [00:51:35]Swyx: Yeah. Thank you for showing the way on this. And honestly, just making things possible that I thought would take a lot longer. So thanks for everything you've done. [00:51:46]Tianqi: Thank you for having me. [00:51:47]Swyx: Yeah. [00:51:47]Alessio: Thanks for coming on TQ. [00:51:49]Swyx: Have a good one. [00:51:49] Get full access to Latent Space at www.latent.space/subscribe

#27: Kann ein Large Language Model (LLM) bei der Klassifikation tabellarischer Daten XGBoost schlagen?

kann herausforderungen entwicklung projekt einsatz ausblick daten ergebnisse umsetzung leistung nase llm potenziale large language models churn schlagen vorhersage klassifikation xgboost

Play Episode Listen Later Jul 6, 2023 39:26

Wir diskutieren den Einsatz von Large Language Models (LLMs) zur Klassifikation tabellarischer Daten, ein bis dato eher unerforschtes Anwendungsfeld. Wir vergleichen die Leistung eines LLMs mit der von XGBoost in einem Projekt zur Vorhersage von Churn. Obwohl XGBoost noch die Nase vorn hat, zeigt das LLM bemerkenswerte Ergebnisse. Wir beleuchten die technische Umsetzung, Herausforderungen sowie Potenziale, und geben einen Ausblick auf die Entwicklung dieses spannenden Anwendungsfeldes. Links: OpenAI Fine-Tune for Classification Example: https://github.com/openai/openai-cookbook/blob/main/examples/Fine-tuned_classification.ipynb TabLLM Paper: https://arxiv.org/abs/2210.10723 Dataset: https://www.kaggle.com/datasets/datazng/telecom-company-churn-rate-call-center-data Large Language Models in Production Conference: https://home.mlops.community/public/events/llm-in-prod-part-ii-2023-06-20

#600: Amazon SageMaker Multi Model Endpoints

AWS Podcast

Play Episode Listen Later Jul 3, 2023 20:20

Amazon SageMaker Multi-Model Endpoint (MME) is fully managed capability of SageMaker Inference that allows customers to deploy thousands of models on a single endpoint and save costs by sharing instances on which the endpoints run across all the models. Until recently, MME was only supported for machine learning (ML) models which run on CPU instances. Now, customers can use MME to deploy thousands of ML models on GPU based instances as well, and potentially save costs by 90%. MME dynamically loads and unloads models from GPU memory based on incoming traffic to the endpoint. Customers save cost with MME as the GPU instances are shared by thousands of models. Customers can run ML models from multiple ML frameworks including PyTorch, TensorFlow, XGBoost, and ONNX. Customers can get started by using the NVIDIA Triton™ Inference Server and deploy models on SageMaker's GPU instances in “multi-model“ mode. Once the MME is created, customers specify the ML model from which they want to obtain inference while invoking the endpoint. Multi Model Endpoints for GPU is available in all AWS regions where Amazon SageMaker is available. To learn more checkout: Our launch blog: https://go.aws/3NwtJyh Amazon SageMaker website: https://go.aws/44uCdNr

model customers aws ml cpu gpu mme tensorflow pytorch endpoints sagemaker amazon sagemaker xgboost

#21 - Ensemble Learning: Boosting, Bagging, and Random Forests in Machine Learning

The AI Frontier Podcast

Play Episode Listen Later Jun 11, 2023 11:05

Dive into this episode of The AI Frontier podcast, where we explore Ensemble Learning techniques like Boosting, Bagging, and Random Forests in Machine Learning. Learn about their applications, advantages, and limitations, and discover real-world success stories. Enhance your understanding of these powerful methods and stay ahead in the world of data science.Support the Show.Keep AI insights flowing – become a supporter of the show!Click the link for details

ai dive artificial intelligence machine learning enhance ensemble boosting data science forests bagging xgboost random forest decision trees

681: XGBoost: The Ultimate Classifier, with Matt Harrison

The Machine Learning Podcast

Play Episode Listen Later May 23, 2023 72:01

Unlock the power of XGBoost by learning how to fine-tune its hyperparameters and discover its optimal modeling situations. This and more, when best-selling author and leading Python consultant Matt Harrison teams up with Jon Krohn for yet another jam-packed technical episode! Are you ready to upgrade your data science toolkit in just one hour? Tune-in now! This episode is brought to you by Pathway, the reactive data processing framework (pathway.com/?from=superdatascience), by Posit, the open-source data science company (posit.co), and by Anaconda, the world's most popular Python distribution (superdatascience.com/anaconda). Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Matt's book ‘Effective XGBoost' [07:05] • What is XGBoost [09:09] • XGBoost's key model hyperparameters [19:01] • XGBoost's secret sauce [29:57] • When to use XGBoost [34:45] • When not to use XGBoost [41:42] • Matt's recommended Python libraries [47:36] • Matt's production tips [57:57] Additional materials: www.superdatascience.com/681

unlock pathway python anaconda matt harrison posit classifier xgboost jon krohn

Real-Time Machine Learning Has Entered The Realm Of The Possible

Play Episode Listen Later Mar 9, 2023 34:29

Summary Machine learning models have predominantly been built and updated in a batch modality. While this is operationally simpler, it doesn't always provide the best experience or capabilities for end users of the model. Tecton has been investing in the infrastructure and workflows that enable building and updating ML models with real-time data to allow you to react to real-world events as they happen. In this episode CTO Kevin Stumpf explores they benefits of real-time machine learning and the systems that are necessary to support the development and maintenance of those models. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Your host is Tobias Macey and today I'm interviewing Kevin Stumpf about the challenges and promise of real-time ML applications Interview Introduction How did you get involved in machine learning? Can you describe what real-time ML is and some examples of where it might be applied? What are the operational and organizational requirements for being able to adopt real-time approaches for ML projects? What are some of the ways that real-time requirements influence the scale/scope/architecture of an ML model? What are some of the failure modes for real-time vs analytical or operational ML? Given the low latency between source/input data being generated or received and a prediction being generated, how does that influence susceptibility to e.g. data drift? Data quality and accuracy also become more critical. What are some of the validation strategies that teams need to consider as they move to real-time? What are the most interesting, innovative, or unexpected ways that you have seen real-time ML applied? What are the most interesting, unexpected, or challenging lessons that you have learned while working on real-time ML systems? When is real-time the wrong choice for ML? What do you have planned for the future of real-time support for ML in Tecton? Contact Info LinkedIn (https://www.linkedin.com/in/kevinstumpf/) @kevinmstumpf (https://twitter.com/kevinmstumpf?lang=en) on Twitter Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast (https://www.dataengineeringpodcast.com) covers the latest on modern data management. Podcast.__init__ () covers the Python language, its community, and the innovative ways it is being used. Visit the site (https://www.themachinelearningpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com (mailto:hosts@themachinelearningpodcast.com)) with your story. To help other people find the show please leave a review on iTunes (https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243) and tell your friends and co-workers Links Tecton (https://www.tecton.ai/) Podcast Episode (https://www.themachinelearningpodcast.com/tecton-machine-learning-feature-platform-episode-6/) Data Engineering Podcast Episode (https://www.dataengineeringpodcast.com/tecton-mlops-feature-store-episode-166/) Uber Michelangelo (https://www.uber.com/blog/michelangelo-machine-learning-platform/) Reinforcement Learning (https://en.wikipedia.org/wiki/Reinforcement_learning) Online Learning (https://en.wikipedia.org/wiki/Online_machine_learning) Random Forest (https://en.wikipedia.org/wiki/Random_forest) ChatGPT (https://openai.com/blog/chatgpt) XGBoost (https://xgboost.ai/) Linear Regression (https://en.wikipedia.org/wiki/Linear_regression) Train-Serve Skew (https://ploomber.io/blog/train-serve-skew/) Flink (https://flink.apache.org/) Data Engineering Podcast Episode (https://www.dataengineeringpodcast.com/apache-flink-with-fabian-hueske-episode-57/) The intro and outro music is from Hitman's Lovesong feat. Paola Graziano (https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/)/CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/)

How Shopify Built A Machine Learning Platform That Encourages Experimentation

The Machine Learning Podcast

Play Episode Listen Later Feb 2, 2023 66:11

Summary Shopify uses machine learning to power multiple features in their platform. In order to reduce the amount of effort required to develop and deploy models they have invested in building an opinionated platform for their engineers. They have gone through multiple iterations of the platform and their most recent version is called Merlin. In this episode Isaac Vidas shares the use cases that they are optimizing for, how it integrates into the rest of their data platform, and how they have designed it to let machine learning engineers experiment freely and safely. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Your host is Tobias Macey and today I'm interviewing Isaac Vidas about his work on the ML platform used by Shopify Interview Introduction How did you get involved in machine learning? Can you describe what Shopify is and some of the ways that you are using ML at Shopify? What are the challenges that you have encountered as an organization in applying ML to your business needs? Can you describe how you have designed your current technical platform for supporting ML workloads? Who are the target personas for this platform? What does the workflow look like for a given data scientist/ML engineer/etc.? What are the capabilities that you are trying to optimize for in your current platform? What are some of the previous iterations of ML infrastructure and process that you have built? What are the most useful lessons that you gathered from those previous experiences that informed your current approach? How have the capabilities of the Merlin platform influenced the ways that ML is viewed and applied across Shopify? What are the most interesting, innovative, or unexpected ways that you have seen Merlin used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Merlin? When is Merlin the wrong choice? What do you have planned for the future of Merlin? Contact Info @kazuaros (https://twitter.com/kazuarous) on Twitter LinkedIn (https://www.linkedin.com/in/isaac-vidas/) kazuar (https://github.com/kazuar) on GitHub Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast (https://www.dataengineeringpodcast.com) covers the latest on modern data management. Podcast.__init__ () covers the Python language, its community, and the innovative ways it is being used. Visit the site (https://www.themachinelearningpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com (mailto:hosts@themachinelearningpodcast.com)) with your story. To help other people find the show please leave a review on iTunes (https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243) and tell your friends and co-workers Links Shopify (https://www.shopify.com/) Shopify Merlin (https://shopify.engineering/merlin-shopify-machine-learning-platform) Vertex AI (https://cloud.google.com/vertex-ai) scikit-learn (https://scikit-learn.org/stable/) XGBoost (https://xgboost.ai/) Ray (https://docs.ray.io/en/latest/) Podcast.__init__ Episode (https://www.pythonpodcast.com/ray-distributed-computing-episode-258/) PySpark (https://spark.apache.org/docs/latest/api/python/) GPT-3 (https://en.wikipedia.org/wiki/GPT-3) ChatGPT (https://openai.com/blog/chatgpt/) Google AI (https://ai.google/) PyTorch (https://pytorch.org/) Podcast.__init__ Episode (https://www.pythonpodcast.com/pytorch-deep-learning-epsiode-202/) Dask (https://www.dask.org/) Modin (https://modin.readthedocs.io/en/stable/) Podcast.__init__ Episode (https://www.pythonpodcast.com/modin-parallel-dataframe-episode-324/) Flink (https://flink.apache.org/) Data Engineering Podcast Episode (https://www.dataengineeringpodcast.com/apache-flink-with-fabian-hueske-episode-57/) Feast Feature Store (https://feast.dev/) Kubernetes (https://kubernetes.io/) The intro and outro music is from Hitman's Lovesong feat. Paola Graziano (https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/)/CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/)

Emmanuel Turlay, Founder and CEO of Sematic and machine learning pioneer, discusses what's required to turn every software engineer into an ML engineer

AI and the Future of Work

Play Episode Listen Later Dec 4, 2022 45:10

Emmanuel Turlay spent more than a decade in engineering roles at tech-first companies like Instacart and Cruise before realizing machine learning engineers need a better solution. Emmanuel started Sematic earlier this year and was part of the YC summer 2022 batch. He recently raised a $3M seed round from investors including Race Capital and Soma Capital. Thanks to friend of the podcast and former guest Hina Dixit from Samsung NEXT for the intro to Emmanuel.I've been involved with the AutoML space for five years and, for full disclosure, I'm on the board of Auger which is in a related space. I've seen the space evolve and know how much room there is for innovation. This one's a great education about what's broken and what's ahead from a true machine learning pioneer.Listen and learn...How to turn every software engineer into a machine learning engineerHow AutoML platforms are automating tasks performed in traditional ML toolsHow Emmanuel translated learning from Cruise, the self-driving car company, into an open source platform available to all data engineering teamsHow to move from building an ML model locally to deploying it to the cloud and creating a data pipeline... in hoursWhat you should know about self-driving cars... from one of the experts who developed the brains that power themWhy 80% of AI and ML projects failReferences in this episode:Unscrupulous users manipulate LLMs to spew hateHina Dixit from Samsung NEXT on AI and the Future of WorkApache BeamEliot Shmukler, Anomalo CEO, on AI and the Future of Work

Identifying the physical origin of gamma-ray bursts with supervised machine learning

Astro arXiv | all categories

Play Episode Listen Later Nov 30, 2022 0:56

Identifying the physical origin of gamma-ray bursts with supervised machine learning by Jia-Wei Luo et al. on Wednesday 30 November The empirical classification of gamma-ray bursts (GRBs) into long and short GRBs based on their durations is already firmly established. This empirical classification is generally linked to the physical classification of GRBs originating from compact binary mergers and GRBs originating from massive star collapses, or Type I and II GRBs, with the majority of short GRBs belonging to Type I and the majority of long GRBs belonging to Type II. However, there is a significant overlap in the duration distributions of long and short GRBs. Furthermore, some intermingled GRBs, i.e., short-duration Type II and long-duration Type I GRBs, have been reported. A multi-wavelength, multi-parameter classification scheme of GRBs is evidently needed. In this paper, we seek to build such a classification scheme with supervised machine learning methods, chiefly XGBoost. We utilize the GRB Big Table and Greiner's GRB catalog and divide the input features into three subgroups: prompt emission, afterglow, and host galaxy. We find that the prompt emission subgroup performs the best in distinguishing between Type I and II GRBs. We also find the most important distinguishing feature in prompt emission to be $T_{90}$, hardness ratio, and fluence. After building the machine learning model, we apply it to the currently unclassified GRBs to predict their probabilities of being either GRB class, and we assign the most probable class of each GRB to be its possible physical class. arXiv: http://arxiv.org/abs/http://arxiv.org/abs/2211.16451v1

identifying origin machine learning supervised type ii gamma rays arxiv gamma ray bursts grb xgboost

Identifying the physical origin of gamma-ray bursts with supervised machine learning

Astro arXiv | all categories

Play Episode Listen Later Nov 30, 2022 0:58

identifying origin machine learning supervised type ii gamma rays arxiv gamma ray bursts grb xgboost

The MLOps Podcast

Play Episode Listen Later Sep 16, 2022 71:00

In this episode, I speak with Dean Langsam, Data Scientist at SentinelOne and one of the organizers of PyData in Israel. We chat about imposter syndrome, the best field in machine learning, why XGBoost is the best model, and the fact that most organizations have too much data. It was fascinating for me, so I hope you enjoy it too.

israel data discord data scientists langsam sentinelone xgboost pydata

Weather forecasting with ML, Kaggle tips and tricks, dealing with missing data, deep learning with Jesper Dramsch, The Data Scientist Show #040

The Data Scientist Show

Play Episode Listen Later Jun 16, 2022 118:11

Jesper Dramsch is a scientist for machine learning at the European Centre for Medium-Range Weather forecasts. They have a phd in applied Machine Learning to Geoscience from Technical University of Denmark. They are a Kaggle Kernals Expert and TPU star, ranking at top 81/100k worldwide. We talked about weather forecasting, things they learned from Kaggle, how to deal with missing data and ourliers, deep learning, Keras vs Pytorch, XGBoost, their struggles as a phd student, working in the EU vs US. Follow @DalianaLiu for more updates on data science and this show. Resources shared by Jesper: The newsletter with missing data: https://buttondown.email/jesper/archive/towels-have-quite-a-dry-sense-of-humor/ The paper by Gael about missing data: https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giac013/6568998 The Huber Loss: https://en.wikipedia.org/wiki/Huber_loss Skill Scores: https://en.wikipedia.org/wiki/Forecast_skill Brier Skill in Weather: https://www.dwd.de/EN/ourservices/seasonals_forecasts/forecast_reliability.html CRPS Continuous Ranked Probability Score https://datascience.stackexchange.com/questions/63919/what-is-continuous-ranked-probability-score-crps ConvNext, Convnets for the 2020s: https://arxiv.org/abs/2201.03545 Transformers for ensemble forecasts: https://arxiv.org/abs/2106.13924 Books I recommend: https://www.amazon.com/shop/jesperdramsch/list/2DYS5KVR5TX0E Blog posts I wrote about these books: https://dramsch.net/tags/books/ Short I made about Test-Time Augmentation https://www.youtube.com/shorts/w4sAh9lKyls Their links: https://dramsch.net/links Their open PhD thesis: https://dramsch.net/phd Newsletter: https://dramsch.net/newsletter Twitter: https://dramsch.net/twitter Youtube: https://dramsch.net/youtube Linkedin: https://dramsch.net/linkedin Kaggle: https://dramsch.net/

phd european union weather denmark machine learning tips and tricks data scientists deep learning jesper technical university geosciences keras kaggle european centre pytorch weather forecasting tpu missing data xgboost

Hopular: Modern Hopfield Networks for Tabular Data

Papers Read on AI

Play Episode Listen Later Jun 8, 2022 30:01

We suggest “Hopular”, a novel Deep Learning architecture for medium and small sized datasets, where each layer is equipped with continuous modern Hopfield networks. The modern Hopfield networks use stored data to identify feature-feature, feature-target, and sample-sample dependencies. Hopular's novelty is that every layer can directly access the original input as well as the whole training set via stored data in the Hopfield networks. Hopular outperforms XGBoost, CatBoost, LightGBM and a state-of-the art Deep Learning method designed for tabular data. Thus, Hopular is a strong alternative to these methods on tabular data. 2022: Bernhard Schafl, Lukas Gruber, Angela Bitto-Nemling, S. Hochreiter Ranked #1 on General Classification on Shrutime https://arxiv.org/pdf/2206.00664v1.pdf

data modern networks deep learning tabular xgboost general classification

SDS 561: Engineering Data APIs

Yannic Kilcher Videos (Audio Only)

Play Episode Listen Later Mar 29, 2022 53:54

In this episode, Ribbon Health CTO Nate Fox joins us to discuss the ins and outs of APIs. Tune in to hear him share how he and his team build out APIs from scratch; how they ensure the uptime and reliability of APIs and how they leverage machine learning to improve the quality of healthcare delivery and maximize their social impact. In this episode you will learn: • What are APIs? [13:20] • How Ribbon Health's data API leverages ML models to improve the quality of healthcare delivery [16:08] • How to design a data API from scratch [20:00] • How to ensure the uptime and reliability of APIs [25:28] • How Ribbon uses knowledge graphs, manually labeled data samples, and an XGBoost model with hundreds of inputs to assign a confidence score [27:14] • Nate's favorite tool for easily scaling up the impact of data science [37:40] • What is Nate's day-to-day like? [34:34] • The qualities Nate looks for when hiring data scientists [39:50] • How scientists and engineers can make a big social impact in health technology [42:50] Additional materials: www.superdatascience.com/561

data engineering api ml apis xgboost

[ML News] Uber: Deep Learning for ETA | MuZero Video Compression | Block-NeRF | EfficientNet-X

Play Episode Listen Later Feb 24, 2022 26:06

#mlnews #muzero #nerf Your regularly irregular updates on everything new in the ML world! Merch: store.ykilcher.com OUTLINE: 0:00 - Intro 0:15 - Sponsor: Weights & Biases 2:15 - Uber switches from XGBoost to Deep Learning for ETA prediction 5:45 - MuZero advances video compression 10:10 - Learned Soft Prompts can steer large language models 12:45 - Block-NeRF captures entire city blocks 14:15 - Neural Architecture Search considers underlying hardware 16:50 - Mega-Blog on Self-Organizing Agents 18:40 - Know Your Data (for Tensorflow Datasets) 20:30 - Helpful Things Sponsor: Weights & Biases https://wandb.me/yannic References: https://docs.wandb.ai/guides/integrat... https://colab.research.google.com/git... https://wandb.ai/borisd13/GPT-3/repor... Uber switches from XGBoost to Deep Learning for ETA prediction https://eng.uber.com/deepeta-how-uber... MuZero advances video compression https://deepmind.com/blog/article/MuZ... https://storage.googleapis.com/deepmi... Learned Soft Prompts can steer large language models https://ai.googleblog.com/2022/02/gui... https://aclanthology.org/2021.emnlp-m... Block-NeRF captures entire city blocks https://arxiv.org/abs/2202.05263 https://arxiv.org/pdf/2202.05263.pdf https://waymo.com/intl/zh-cn/research... Neural Architecture Search considers underlying hardware https://ai.googleblog.com/2022/02/unl... https://openaccess.thecvf.com/content... Mega-Blog on Self-Organizing Agents https://developmentalsystems.org/sens... https://flowers.inria.fr/ Know Your Data (for Tensorflow Datasets) https://knowyourdata-tfds.withgoogle.... https://knowyourdata.withgoogle.com/ Helpful Things https://twitter.com/casualganpapers/s... https://www.reddit.com/r/MachineLearn... https://arxiv.org/abs/2202.02435 https://github.com/vicariousinc/PGMax https://www.vicarious.com/posts/pgmax... https://diambra.ai/tournaments https://github.com/diambra/diambraArena https://www.youtube.com/watch?v=dw72P... https://gitlab.com/deepcypher/python-... https://python-fhez.readthedocs.io/en... https://joss.theoj.org/papers/10.2110... https://github.com/PyTorchLightning/m... https://torchmetrics.readthedocs.io/e... https://twitter.com/alanyttian/status... https://github.com/google/evojax https://arxiv.org/abs/2202.05008 https://www.reddit.com/r/MachineLearn... https://www.gymlibrary.ml/pages/api/#... Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

uber merch gpt ml deep learning nerf eta compression muz xgboost muzero

Building for Small Data Science Teams // James Lamb // MLOps Coffee Sessions #69

MLOps.community

Play Episode Listen Later Dec 20, 2021 52:36

MLOps Coffee Sessions #69 with James Lamb, Building for Small Data Science Teams co-hosted by Adam Sroka. // Abstract In this conversation, James shares some hard-won lessons on how to effectively use technology to create applications powered by machine learning models. James also talks about how making the "right" architecture decisions is as much about org structure and hiring plans as it is about technological features. // Bio James Lamb is a machine learning engineer at SpotHero, a Chicago-based parking marketplace company. He is a maintainer of LightGBM, a popular machine learning framework from Microsoft Research, and has made many contributions to other open-source data science projects, including XGBoost and prefect. Prior to joining SpotHero, he worked on a managed Dask + Jupyter + Prefect service at Saturn Cloud and as an Industrial IoT Data Scientist at AWS and Uptake. Outside of work, he enjoys going to hip hop shows, watching the Celtics / Red Sox, and watching reality TV (he wouldn't object to being called “Bravo Trash”). // Relevant Links James keeps track of conference and meetup talks he has given at https://github.com/jameslamb/talks#gallery. The audience for this podcast might be most interested in "Scaling LightGBM with Python and Dask" and "How Distributed LightGBM on Dask Works". --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletter and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Adam on LinkedIn: https://www.linkedin.com/in/aesroka/ Connect with James on LinkedIn: https://www.linkedin.com/in/jameslamb1/

tv chicago building coffee python data science aws das k uptake microsoft research small data demetrios spothero xgboost james lamb connect with us join

#9 - Trees and forests. How these algorithms work? Understanding XGBoost and RandomForest.

Life with AI

Play Episode Listen Later Jul 29, 2021 13:35

In this episode I explain how the machine learning algorithms based on trees and forests work. In the episode I explain in a simple way the idea behind the decision trees and also the powerful algorithms RandomForest and XGBoost and its respectives methods, bagging and boosting.

trees algorithms forests xgboost

#9 - Árvores e florestas. Como funcionam esses algoritmos? Explicando o XGBoost e o RandomForest.

Vida com IA

Play Episode Listen Later Jul 29, 2021 14:14

Nesse episódio eu explico como os algoritmos de inteligência artificial baseado em árvores e florestas funcionam. No episódio eu explico de uma forma simples as ideias por trás das árvores de decisão and também dos poderoso algoritmos RandomForest e XGBoost, além dos seus respectivos métodos bagging e boosting.

nesse esses explicando algoritmos funcionam florestas xgboost

Scaling AI at H&M Group with Errol Koolmeister - #503

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jul 22, 2021 40:46

Today we're joined by Errol Koolmeister, the head of AI foundation at H&M Group. In our conversation with Errol, we explore H&M's AI journey, including its wide adoption across the company in 2016, and the various use cases in which it's deployed like fashion forecasting and pricing algorithms. We discuss Errol's first steps in taking on the challenge of scaling AI broadly at the company, the value-added learning from proof of concepts, and how to align in a sustainable, long-term way. Of course, we dig into the infrastructure and models being used, the biggest challenges faced, and the importance of managing the project portfolio, while Errol shares their approach to building infra for a specific product with many products in mind.

ai fashion scaling infrastructure forecasting hm errol databricks xgboost

Neural Models for Tabular Data

The Data Exchange with Ben Lorica

Play Episode Listen Later Jul 1, 2021 43:55

This week's guest is Sercan Arik, Research Scientist at Google Cloud AI. Sercan and his collaborators recently published a paper on TabNet, a deep neural network architecture for tabular data. It uses sequential attention to select features, is explainable, and based on tests Sarjan and team have done spanning many domains, TabNet outperforms or is on par with other models (e.g., XGBoost) on classification and regression problems.Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS.Detailed show notes can be found on The Data Exchange web site.Subscribe to The Gradient Flow Newsletter.

data models detailed neural research scientist sarjan tabular sercan xgboost google cloud ai

Loan risk analysis with xgboost

Machine learning

Play Episode Listen Later Jun 13, 2021 11:22

Thoughts on the best classifier for risk --- Send in a voice message: https://anchor.fm/david-nishimoto/message

loans risk analysis xgboost

What Data Scientists Don't Mention in Their LinkedIn Profiles - Yury Kashnitsky

DataTalks.Club

Play Episode Listen Later Jun 4, 2021 59:56

We talked about: Yury's background Failing fast: Grammarly for science Not failing fast: Keyword recommender Four steps to epiphany Lesson learned when bringing XGBoost into production When data scientists try to be engineers Joining a fintech startup: Doing NLP with thousands of GPUs Working at a Telco company Having too much freedom The importance of digital presence Work-life balance Quantifying impact of failing projects on our CVs Business trips to Perm: don't work on the weekend What doesn't kill you makes you stronger Links: Yury's course: https://mlcourse.ai/ Yury's Twitter: https://twitter.com/ykashnitsky Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

work club lesson failing profiles data scientists keyword perm quantifying grammarly telco yury xgboost

S2-E20.3 - How Can we Best Use NASHmap in Clinical trials and Patient Treatment?

patients artificial intelligence prof treatments diabetes machine learning clinical trials best use ehr surfers optum nafld electronic health records auc diagnostic tests xgboost

Play Episode Listen Later Apr 25, 2021 15:15

The Surfers explore the impact that NASHmap, the first Machine Learning model that can identify patients likely to have NASH, can have on clinical trials and patient treatment.Prof. Schattenberg and colleagues built the NASHmap model from a NIDDK database and validated it using the Optum de-identified EHR dataset. The model includes 14 variables, some obvious, others less so, that produce an AUC of 0.82 in the NIDDK database and 0.79 in the Optum database. This conversation explores NASHmap's powerful implications for future clinical trial recruitment and large-scale patient screening.

S2-E20.2 - What Can We Learn about NASH From NASHmap?

artificial intelligence prof diabetes machine learning ehr surfers optum nafld electronic health records auc diagnostic tests xgboost

Play Episode Listen Later Apr 24, 2021 21:34

The Surfers analyze outputs from NASHmap, the first Machine Learning model that can identify patients likely to have NASH, to learn what we can about the disease and how to think about its value for racial and ethnic minorities.Prof. Schattenberg and colleagues built the NASHmap model from a NIDDK database and validated it using the Optum de-identified EHR dataset. The model includes 14 variables, some obvious, others less so, that produce an AUC of 0.82 in the NIDDK database and 0.79 in the Optum database. This conversation explores some model elements and statistical outputs to glean knowledge about what the model says about NASH itself.

S2-E20 - 14-variable Machine Learning model identifies Probable NASH patients from Electronic Health Records

model patients artificial intelligence prof diabetes machine learning variable probable identifies ehr optum nafld electronic health records auc diagnostic tests xgboost

Play Episode Listen Later Apr 22, 2021 53:20

Jörn Schattenberg discusses NASHmap, the first Machine Learning model that can identify patients likely to have NASH in clinical settings. Louise, Roger and guest Dr. Kris Kowdley join Jörn to discuss the model from academic, patient treatment and statistical perspectives.Prof. Schattenberg and colleagues built the NASHmap model from a NIDDK database and validated it using the Optum de-identified EHR dataset. The model includes 14 variables, some obvious, others less so, that produce an AUC of 0.82 in the NIDDK database and 0.79 in the Optum database. The episode focuses on how the model was built and what it implies for clinical trial recruitment, patient care and the view it provides of NASH risk factors. This has powerful implications for future clinical trial recruitment and large-scale patient screening.

S2-E20.1 - How NASHmap, the 1st Machine Learning Model to Identify Probable NASH Patients, Came to Life

model patients artificial intelligence prof identify diabetes machine learning probable ehr optum nafld electronic health records auc diagnostic tests xgboost

Play Episode Listen Later Apr 21, 2021 12:30

Jörn Schattenberg discusses the issues that drove the project to develop a Machine Learning model that can identify patients likely to have NASH in clinical settings, and discusses key elements of study and model design.Prof. Schattenberg and colleagues built the NASHmap model from a NIDDK database and validated it using the Optum de-identified EHR dataset. The model includes 14 variables, some obvious, others less so, that produce an AUC of 0.82 in the NIDDK database and 0.79 in the Optum database. This conversation focuses on why the collaborators undertook this project and how the model was built.

Oracle open-sources Java machine learning library

The Artificial Intelligence Podcast

Play Episode Listen Later Apr 20, 2021 4:59

Tribuo offers tools for building and deploying classification, clustering, and regression models in Java, along with interfaces to TensorFlow, XGBoost, and ONNX --- Send in a voice message: https://anchor.fm/tonyphoang/message

library oracle machine learning java tensorflow xgboost

Episode #96: Serverless and Machine Learning with Alexandra Abbas

Serverless Chats

Play Episode Listen Later Apr 12, 2021 44:19

About Alexa AbbasAlexandra Abbas is a Google Cloud Certified Data Engineer & Architect and Apache Airflow Contributor. She currently works as a Machine Learning Engineer at Wise. She has experience with large-scale data science and engineering projects. She spends her time building data pipelines using Apache Airflow and Apache Beam and creating production-ready Machine Learning pipelines with Tensorflow.Alexandra was a speaker at Serverless Days London 2019 and presented at the Tensorflow London meetup.Personal linksTwitter: https://twitter.com/alexandraabbasLinkedIn: https://www.linkedin.com/in/alexandraabbasGitHub: https://github.com/alexandraabbasdatastack.tv's linksWeb: https://datastack.tvTwitter: https://twitter.com/datastacktvYouTube: https://www.youtube.com/c/datastacktvLinkedIn: https://www.linkedin.com/company/datastacktvGitHub: https://github.com/datastacktvLink to the Data Engineer Roadmap: https://github.com/datastacktv/data-engineer-roadmapThis episode is sponsored by CBT Nuggets: cbtnuggets.com/serverless andStackery: https://www.stackery.io/Watch this video on YouTube: https://youtu.be/SLJZPwfRLb8TranscriptJeremy: Hi, everyone. I'm Jeremy Daly, and this is Serverless Chats. Today I'm joined by Alexa Abbas. Hey, Alexa, thanks for joining me.Alexa: Hey, everyone. Thanks for having me.Jeremy: So you are a machine learning engineer at Wise and also the founder of datastack.tv. So I'd love it if you could tell the listeners a little bit about your background and what you do at Wise and what datastack.tv is all about.Alexa: Yeah. So as you said, I'm a machine learning engineer at Wise. So Wise is an international money transfer service. We are aiming for very transparent fees and very low fees compared to banks. So at Wise, basically, designing, maintaining, and developing the machine learning platform, which serves data scientists and analysts, so they can train their models and deploy their models, easily.Datastack.tv is, basically, it's a video service or a video platform for data engineers. So we create bite-sized videos, educational videos, for data engineers. We mostly cover open source topics, because we noticed that some of the open source tools in the data engineering world are quite underserved in terms of educational content. So we create videos about those.Jeremy: Awesome. And then, what about your background?Alexa: So I actually worked as a data engineer and machine learning engineer, so I've always been a data engineer or machine learning engineer in terms of roles. I also worked, for a small amount of time, I worked as a data scientist as well. In terms of education, I did a big data engineering Master's, but actually my Bachelor is economics, so quite a mix.Jeremy: Well, it's always good to have a ton of experience and that diverse perspective. Well, listen, I'm super excited to have you here, because machine learning is one of those things where it probably is more of a buzzword, I think, to a lot of people where every startup puts it in their pitch deck, like, "Oh, we're doing machine learning and artificial intelligence ..." stuff like that. But I think it's important to understand, one, what exactly it is, because I think there's a huge confusion there in terms of what we think of as machine learning, and maybe we think it's more advanced than it is sometimes, as I think there's lower versions of machine learning that can be very helpful.And obviously, this being a serverless podcast, I've heard you speak a number of times about the work that you've done with machine learning and some experiments you've done with serverless there. So I'd love to just pick your brain about that and just see if we can educate the users here on what exactly machine learning is, how people are using it, and where it fits in with serverless and some of the use cases and things like that. So first of all, I think one of the important things to start with anyways is this idea of MLOps. So can you explain what MLOps is?Alexa: Yeah, sure. So really short, MLOps is DevOps for machine learning. So I guess the traditional software engineering projects, you have a streamlined process you can release, really often, really quickly, because you already have all these best practices that all these traditional software engineering projects implement. Machine learning, this is still in a quite early stage and MLOps is in a quite early stage. But what we try to do in MLOps is we try to streamline machine learning projects, as well as traditional software engineering projects are streamlined. So data scientists can train models really easily, and they can release models really frequently and really easily into production. So MLOps is all about streamlining the whole data science workflow, basically.And I guess it's good to understand what the data science workflow is. So I talk a bit about that as well. So before actually starting any machine learning project, the first phase is an experimentation phase. It's a really iterative process when data scientists are looking at the data, they are trying to find features and they are also training many different models; they are doing architecture search, trying different architecture, trying different hyperparameter settings with those models. So it's a really iterative process of trying many models, many features.And then by the end, they probably find a model that they like and that hit the benchmark that they were looking for, and then they are ready to release that model into production. And this usually looks like ... so sometimes they use shadow models, in the beginning, to check if the results are as expected in production as well, and then they actually release into production. So basically MLOps tries to create the infrastructure and the processes that streamline this whole process, the whole life cycle.Jeremy: Right. So the question I have is, so if you're an ML engineer or you're working on these models and you're going through these iterations and stuff, so now you have this, you're ready to release it to production, so why do you need something like an MLOps pipeline? Why can't you just move that into production? Where's the barrier?Alexa: Well, I guess ... I mean, to be honest, the thing is there shouldn't be a barrier. Right now, that's the whole goal of MLOps. They shouldn't feel that they need to do any manual model artifact copying or anything like that. They just, I don't know, press a button and they can release to production. So that's what MLOps is about really and we can version models, we can version the data, things like that. And we can create reproducible experiments. So I guess right now, I think many bits in this whole lifecycle is really manual, and that could be automated. For example, releasing to production, sometimes it's a manual thing. You just copy a model artifact to a production bucket or whatever. So sometimes we would like to automate all these things.Jeremy: Which makes a lot of sense. So then, in terms of actually implementing this stuff, because we hear all the time about CI/CD. If we're talking about DevOps, we know that there's all these tools that are being built and services that are being launched that allow us to quickly move code through some process and get into production. So are there similar tools for deploying models and things like that?Alexa: Well, I think this space is quite crowded. It's getting more and more crowded. I think there are many ... So there are the cloud providers, who are trying to create tools that help these processes, and there are also many third-party platforms that are trying to create the ML platform that everybody uses. So I think there is no go-to thing that everybody uses, so I think there is many tools that we can use.Some examples, for example, TensorFlow is a really popular machine learning library, But TensorFlow, they created a package on top of TensorFlow, which is called TFX, TensorFlow Extended, which is exactly for streamlining this process and serving models easily, So I would say it TFX is a really good example. There is Kubeflow, which is a machine learning toolkit for Kubernetes. I think there are many custom implementations in-house in many companies, they create their own machine learning platforms, their own model serving API, things like that. And like the cloud providers on AWS, we have SageMaker. They are trying to cover many parts of the tech science lifecycle. And on Google Cloud, we have AI Platform, which is really similar to SageMaker.Jeremy: Right. And what are you doing at Wise? Are you using one of those tools? Are you building something custom?Alexa: Yeah, it's a mix actually. We have some custom bits. We have a custom API, serving API, for serving models. But for model training, we are using many things. We are using SageMaker, Notebooks. And we are also experimenting with SageMaker endpoints, which are actually serverless model serving endpoints. And we are also using EMR for model training and data preparation, so some Spark-based things, a bit more traditional type of model training. So it's quite a mix.Jeremy: Right. Right. So I am not well-versed in machine learning. I know just enough to be dangerous. And so I think that what would be really interesting, at least for me, and hopefully be interesting to listeners as well, is just talk about some of these standard tools. So you mentioned things like TensorFlow and then Kubeflow, which I guess is that end-to-end piece of it, but if you're ... Just how do you start? How do you go from, I guess, building and training a model to then productizing it and getting that out? What's that whole workflow look like?Alexa: So, actually, the data science workflow I mentioned, the first bit is that experimentation, which is really iterative, really free, so you just try to find a good model. And then, when you found a good model architecture and you know that you are going to receive new data, let's say, I don't know, I have a day, or whatever, I have a week, then you need to build out a retraining pipeline. And that is, I think, what the productionization of a model really means, that you can build a retraining pipeline, which can automatically pick up new data and then prepare that new data, retrain the model on that data, and release that model into production automatically. So I think that means productionization really.Jeremy: Right. Yeah. And so by being able to build and train a model and then having that process where you're getting that feedback back in, is that something where you're just taking that data and assuming that that is right and fits in the model or is there an ongoing testing process? Is there supervised learning? I know that's a buzzword. I'm not even sure what it means. But those ... I mean, what types of things go into that retraining of the models? Is it something that is just automatic or is it something where you need constant, babysitting's probably the wrong word, but somebody to be monitoring that on a regular basis?Alexa: So monitoring is definitely necessary, especially, I think when you trained your model and you shouldn't release automatically in production just because you've trained a new data. I mentioned this shadow model thing a bit. Usually, after you retrained the model and this retraining pipeline, then you release that model into shadow mode; and then you will serve that model in parallel to your actual product production model, and then you will check the results from your new model against your production model. And that's a manual thing, you need to ... or maybe you can automate it as well, actually. So if it performs like ... If it is comparable with your production model or if it's even better, then you will replace it.And also, in terms of the data quality in the beginning, you should definitely monitor that. And I think that's quite custom, really depends on what kind of data you work with. So it's really important to test your data. I mean, there are many ... This space is also quite crowded. There are many tools that you can use to monitor your distribution of your data and see that the new data is actually corresponds to your already existing data set. So there are many bits that you can monitor in this whole retraining pipeline, and you should monitor.Jeremy: Right. Yeah. And so, I think of some machine learning like use cases of like sentiment analysis, for example... looking at tweets or looking at customer service conversations and trying to rate those things. So when you say monitoring or running them against a shadow model, is that something where ... I mean, how do you gauge what's better, right? if you've got a shadow... I mean, what's the success metric there as to say X number were classified as positive versus negative sentiment? Is that something that requires human review or some sampling for you to kind of figure out the quality of the success of those models?Alexa: Yeah. So actually, I think that really depends on the use case. For example, when you are trying to catch fraudsters, your false positive rate and true positive rate, these are really important. If your true positive rate is higher that means, oh, you are catching more fraudsters. But let's say your new model, with your model, also the false positive rate is higher, which means that you are catching more people who are actually not fraudsters, but you have more work because I guess that's a manual process to actually check those people. So I think it really depends on the use case.Jeremy: Right. Right. And you also said that the markets a little bit flooded and, I mean, I know of SageMaker and then, of course, there's all these tools like, what's it called, Recognition, a bunch of things at AWS, and then Google has a whole bunch of the Vision API and some of these things and Watson's Natural Language Processing over at IBM and some of these things. So there's all these different tools that are just available via an API, which is super simple and great for people like me that don't want to get into building TensorFlow models and things like that. So is there an advantage to building your own models beyond those things, or are we getting to a point where with things like ... I mean, again, I know SageMaker has a whole library of models that are already built for you and things like that. So are we getting to a point where some of these models are just good enough off the shelf or do we really still need ... And I know there are probably some custom things. But do we still really need to be building our own models around that stuff?Alexa: So to be honest, I think most of the data scientists, they are using off-the-shelf models, maybe not the serverless API type of models that Google has, but just off-the-shelf TensorFlow models or SageMaker, they have these built-in containers for some really popular model architectures like XGBoost, and I think most of the people they don't tweak these, I mean, as far as I know. I think they just use them out of the box, and they really try to tweak the data instead, the data that they have, and try to have these off-the-shelf models with higher and higher quality data.Jeremy: So shape the data to fit the model as opposed to the model to fit the data.Alexa: Yeah, exactly. Yeah. So you don't actually have to know ... You don't have to know how those models work exactly. As long as you know what the input should be and what output you expect, then I think you're good to go.Jeremy: Yeah, yeah. Well, I still think that there's probably a lot of value in tuning the models though against your particular data sets.Alexa: Yeah, right. But also there are services for hyperparameter tuning. There are services even for neural architecture search, where they try a lot of different architectures for your data specifically and then they will tell you what is the best model architecture that you should use and same for the hyperparameter search. So these can be automated as well.Jeremy: Yeah. Very cool. So if you are hosting your own version of this ... I mean, maybe you'll go back to the MLOps piece of this. So I would assume that a data scientist doesn't want to be responsible for maintaining the servers or the virtual machines or whatever it is that it's running on. So you want to have this workflow where you can get your models trained, you can get them into production, and then you can run them through this loop you talked about and be able to tweak them and continue to retrain them as things go through. So on the other side of that wall, if we want to put it that way, you have your ops people that are running this stuff. Is there something specific that ops people need to know? How much do they need to know about ML, as opposed to ... I mean, the data scientists, hopefully, they know more. But in terms of running it, what do they need to know about it, or is it just a matter of keeping a server up and running?Alexa: Well, I think ... So I think the machine learning pipelines are not yet as standardized as a traditional software engineering pipeline. So I would say that you have to have some knowledge of machine learning or at least some understanding of how this lifecycle works. You don't actually need to know about research and things like that, but you need to know how this whole lifecycle works in order to work as an ops person who can automate this. But I think the software engineering skills and DevOps skills are the base, and then you can just build this knowledge on top of that. So I think it's actually quite easy to pick this up.Jeremy: Yeah. Okay. And what about, I mean, you mentioned this idea of a lot of data scientists aren't actually writing the models, they're just using the preconfigured model. So I guess that begs the question: How much does just a regular person ... So let's say I'm just a regular developer, and I say, "I want to start building machine learning tools." Is it as easy as just pulling a model off the shelf and then just learning a little bit more about it? How much can the average person do with some of these tools out of the box?Alexa: So I think most of the time, it's that easy, because usually the use cases that someone tries to tackle, those are not super edge cases. So for those use cases, there are already models which perform really well. Especially if you are talking about, I don't know, supervised learning on tabular data, I think you can definitely find models that will perform really well off the shelf on those type of datasets.Jeremy: Right. And if you were advising somebody who wanted to get started... I mean, because I think that I think where it might come down to is going to be things like pricing. If you're using Vision API and you're maybe limited on your quota, and then you can ... if you're paying however many cents per, I guess, lookup or inference, then that can get really expensive as opposed to potentially running your own model on something else. But how would you suggest that somebody get started? Would you point them at the APIs or would you want to get them up and running on TensorFlow or something like that?Alexa: So I think, actually, for a developer, just using an API would be super easy. Those APIs are, I think ... So getting started with those APIs just to understand the concepts are very useful, but I think getting started with Tensorflow itself or just Keras, I definitely I would recommend that, or just use scikit-learn, which is a more basic package for more basic machine learning. So those are really good starting points. And there are so many tutorials to get started with, and if you have an idea of what you would like to build, then I think you will definitely find tutorials which are similar to your own use case and you can just use those to build your custom pipeline or model. So I would say, for developers, I would definitely recommend jumping into TensorFlow or scikit-learn or XGBoost or things like that.Jeremy: Right, right. And how many of these models exist? I mean, are we talking there's 20 different models or are we talking there's 20,000 models?Alexa: Well, I think ... Wow. Good question. I think we are more towards today maybe not 20,000, but definitely many thousands, I think. But there are popular models that most of the people use, and I think there are maybe 50 or 100 models that are the most popular and most companies use them and you are probably fine just using those for any use case or most of the use cases.Jeremy: Right. Now, and speaking of use cases, so, again, I try to think of use cases or machine learning and whether it's classifying movies into genres or sentiment analysis, like I said, or maybe trying to classify news stories, things like that. Fraud detection, you mentioned. Those are all great use cases, but what are ... I know you've worked on a bunch of projects. So what are some of the projects that you've done and what were the use cases that were being solved there, because I find these to be really interesting?Alexa: Yeah. So I think a nice project that I worked on was a project with Lush, which is a cosmetics company. They manufacture like soaps and bath bombs. And they have this nice mission that they would like to eliminate packaging from their shops. So they asked us, when I worked at Datatonic, we worked on a small project with them. They asked us to create an image recognition model, to train one, and then create a retraining pipeline that they can use afterwards. So they provided us with many hundred thousand images of their products, and they made photos from different angles with different lightings and all of that, so really high-quality image data set of all their products.And then, we used a mobile net model, because they wanted this model to be built-in into their mobile application. So when users actually use this model, they download it with their mobile application. And then, they created a service called Lush [inaudible], which you can use from within their app. And then, people can just scan the products and they can see the ingredients and how-to-use guides and things like that. So this is how they are trying to eliminate all kinds of packaging from their shops, that they don't actually need to put the papers there or put packaging with ingredients and things like that.And in terms of what we did on the technical side, so as I mentioned, we used a mobile net model, because we needed to quantize the model in order to put it on a mobile device. And we used TF Lite to do this. TF Lite is specifically for models that you want to run on an edge device, like a mobile phone. So that was already a constraint. So this is how we picked a model. I think, back then, like there were only a few model architectures supported by TF Lite, and I think there were only two, maybe. So we picked MobileNet, because it had a smaller size.And then, in terms of the retraining, so we automated the whole workflow with Cloud Composer on Google Cloud, which is a managed version of Apache Airflow, the open source scheduling package. The training happened on AI Platform, which is Google Cloud's SageMaker.Jeremy: Yeah.Alexa: Yeah. And what else? We also had an image pre-processing step just before the training, which happened on Dataflow, which is an auto-scaling processing service on Google Cloud. And after we trained the model, we just saved the model active artifact in a bucket, and then ... I think we also monitored the performance of the model, and if it was good enough, then we just shipped the model to developers who actually they manually updated the model file that went into the application that people can download. So we didn't really see if they use any shadow model thing or anything like that.Jeremy: Right. Right. And I think that is such a cool use case, because, if I'm hearing you right, there were just like a bar soap or something like that with no packaging, no nothing, and you just hold your mobile phone camera up to it or it looks at it, determines which particular product is, gives you all that ... so no QR codes, no bar codes, none of that stuff. How did they ring them up though? Do you know how that process worked? Did the employees just have to know what they were or did the employees use the app as well to figure out what they were billing people for?Alexa: Good question. So I think they wanted the employees as well to use the app.Jeremy: Nice.Alexa: Yeah. But when the app was wrong, then I don't know what happened.Jeremy: Just give them a discount on it or something like that. That's awesome. And that's the thing you mentioned there about ... Was it Tensor Lite, was it called?Alexa: TF Lite. Yeah.Jeremy: TF Lite. Yes. TensorFlow Lite or TF Lite. But, basically, that idea of being able to really package a model and get it to be super small like you said. You said edge devices, and I'm thinking serverless compute at the edge, I'm thinking Lambda functions. I'm thinking other ways that if you could get your models small enough in package, that you could run it. But that'd be a pretty cool way to do inference, right? Because, again, even if you're using edge devices, if you're on an edge network or something like that, if you could do that at the edge, that'd be a pretty fast response time.Alexa: Yeah, definitely. Yeah.Jeremy: Awesome. All right. So what about some other stuff that you've done? You've mentioned some things about fraud detection and things like that.Alexa: Yeah. So fraud detection is a use case for Wise. As I mentioned, Wise services international money transfer, one of its services. So, obviously, if you are doing anything with money, then a full use case is for sure that you will have. So, I mean, in terms of ... I don't actually develop models at Wise, so I don't know actually what models they use. I know that they use H2O, which is a Spark-based library that you can use for model training. I think it's quite an advanced library, but I haven't used it myself too much, so I cannot talk about that too much.But in terms of the workflow, it's quite similar. We also have Airflow to schedule the retraining of the models. And they use EMR for data preparation, so quite similar to Dataflow, in a sense. A Spark-based auto-scaling cluster that processes the data and then, they train the models on EMR as well but using this H2O library. And then in the end, when they are happy with the model, we have this tool that they can use for releasing shadow models in production. And then, if they are satisfied with the performance of the model that they can actually release into production. And at Wise, we have a custom micro service, a custom API, for serving models.Jeremy: Right. Right. And that sounds like you need a really good MLOps flow to make all that stuff work, because you just have a lot of moving parts there, right?Alexa: Yeah, definitely. Also, I think we have many bits that could be improved. I think there are many bits that still a bit manual and not streamlined enough. But I think most of the companies struggle with the same thing. It's just we don't yet have those best practices that we can implement, so many people try many different things, and then ... Yeah, so I think it's still a work in progress.Jeremy: Right. Right. And I'm curious if your economics background helps at all with the fraud and the money laundering stuff at all?Alexa: No.Jeremy: No. All right. So what about you worked in another data engineering project for Vodafone, right?Alexa: Yeah. Yeah, so that was a data engineering project purely, so we didn't do any machine learning. Well, Vodafone has their own Google Analytics library that they use in all their websites and mobile apps and things like that and that sense Clickstream data to a server in a Google Cloud Platform Project, and we consume that data in a streaming manner from data flows. So, basically, the project was really about processing this data by writing an Apache Beam pipeline, which was always on and always expected messages to come in. And then, we dumped all the data into BigQuery tables, which is data warehouse in Google Cloud. And then, these BigQuery tables powered some of the dashboards that they use to monitor the uptime and, I don't know, different metrics for their websites and mobile apps.Jeremy: Right. But collecting all of that data is a good source for doing machine learning on top of that, right?Alexa: Yeah, exactly. Yeah. I think they already had some use cases in mind. I'm not sure if they actually done those or not, but it's a really good base for machine learning, what we collected the data there in BigQuery, because that is an analytical data warehouse, so some analysts can already start and explore the data as a first step of the machine learning process.Jeremy: Right. I would think anomaly detection and things like that, right?Alexa: Yeah, exactly.Jeremy: Right. All right. Well, so let's go on and talk about serverless a little bit more, because I know I saw you do a talk where you were you ran some experiments with serverless. And so, I'm just kind of curious, where are the limitations that you see? And I know that there continues ... I mean, we now have EFS integration, and we've got 10 gigs of memory for lambda functions, you've even got Cloud Run, which I don't know how much you could do with that, but where's still some of the limitations for running machine learning in a serverless way, I guess?Alexa: So I think, actually, from this data science lifecycle, many bits, there are Cloud providers offer a lot of serverless options. For data preparation, there is Dataflow, which is, I think, kind of like serverless data processing service, so you can use that for data processing. For model training, there is ... Or the SageMaker and AI Platform, which are kind of serverless, because you don't actually need to provision these clusters that you train your models on. And for model serving, in SageMaker, there are the serverless model endpoints that you can deploy. So there are many options, I think, for serverless in the machine learning lifecycle.In my experience, many times, it's a cost thing. For example, at Wise, we have this custom model serving API, where we serve all our models. And if they would use SageMaker endpoints, I think, a single SageMaker endpoint is about $50 per month, that's the minimum price, and that's for a single model and a single endpoint. And if you have thousands of models, then your price can go up pretty quickly, or maybe not thousands, but hundreds of models, then your price can go up pretty quickly. So I think, in my experience, limitation could be just price.But in terms of ... So I think, for example, if I compare Dataflow with a spark cluster that you program yourself, then I would definitely go with Dataflow. I think it's just much easier and maybe cost-wise as well, you might be better off, I'm not sure. But in terms of comfort and developer experience, it's a much better experience.Jeremy: Right. Right. And so, we talked a little bit about TF Lite there. Is that something possible where maybe the training piece of it, running that on Functions as a Service or something like that maybe isn't the most efficient or cost-effective way to do that, but what about running models or running inference on something like a Lambda function or a Google Cloud function or an Azure function or something like that? Is it possible to package those models in a way that's small enough that you could do that type of workload?Alexa: I think so. Yeah. I think you can definitely make inference using a Lambda function. But in terms of model training, I think that's not a ... Maybe there were already experiments for, I'm sure there were. But I think it's not the kind of workload that would fit for Lambda functions. That's a typical parallelizable, really large-scale workloads for ... You know the MapReduce type of data processing workloads? I think those are not necessarily fit for Lambda functions. So I think for model training and data preparation, maybe those are not the best options, but for model inference, definitely. And I think there are many examples using Lambda functions for inference.Jeremy: Right. Now, do you think that ... because this is always something where I find with serverless, and I know you're more of a data scientist, ML expert, but I look at serverless and I question whether or not it needs to handle some of these things. Especially with some of the endpoints that are out there now, we talked about the Vision API and some of the other NLP things, are we putting in too much effort maybe to try to make serverless be able to handle these things, or is it just something where there's a really good way to handle these by hosting your ... I mean, even if you're doing SageMaker, maybe not SageMaker endpoints, but just running SageMaker machines to do it or whatever, are we trying too hard to squeeze some of these things into a serverless environment?Alexa: Well, I don't know. I think, as a developer, I definitely prefer the more managed versions of these products. So the less I need to bother with, "Oh, my cluster died and now we need to rebuild a cluster of things," and I think serverless can definitely solve that. I would definitely prefer the more managed version. Maybe not serverless, because, for some of the use cases or some of the bits from the lifecycle, serverless is not the best fit, but a managed product is definitely something that I prefer over a non-managed product.Jeremy: Right. And so, I guess one last question for you here, because this is something that always interests me. Just there are relevant things that we need machine learning for. I mean, I think the fraud detection is a hugely important one. Sentiment analysis, again. Some of those other things are maybe, I don't know, I shouldn't call them toy things, but personalization and some of the things, they're all really great things to have, and it seems like you can't build an application now without somebody wanting some piece of that machine learning in there. So do you see that as where we are going where in the future, we're just going to have more of these APIs?I mean, out of AWS, because I'm more familiar with the AWS ecosystem, but they have Personalize and they have Connect and they have all these other services, they have the recommendation engine thing, all these different services ... Lex, or whatever, that will read text, natural language processing and all that kind of stuff. Is that where we're moving to just all these pre-trained, canned products that I can just access via an API or do you think that if you're somebody getting started and you really want to get into the ML world that you should start diving into the TensorFlows and some of those other things?Alexa: So I think if you are building an app and your goal is not to become an ML engineer or a data scientist, then these canned models are really useful because you can have a really good recommendation engine in your product, you could have really good personalization engine in your product, things like that. And so, those are, I think, really useful and you don't need to know any machine learning in order to use them. So I think we definitely go into that direction, because most of the companies won't hire data scientists just to train a recommender model. I think it's just easier to use an API endpoint that is already really good.So I think, yeah, we are definitely heading into that direction. But if you are someone who wants to become a data scientist or wants to be more involved with MLOps or machine learning engineering, then I think jumping into TensorFlow and understanding, maybe not, as we discussed, not getting into the model architectures and things like that, but just understanding the workflow and being able to program a machine learning pipeline from end to end, I think that's definitely recommended.Jeremy: All right. So one last question: If you've ever used the Watson NLP API or the Google Vision API, can you put on your resume that you're a machine learning expert?Alexa: Well, if you really want to do that, I would give it a go. Why not?Jeremy: All right. Good. Good to know. Well, Alexa, thank you so much for sharing all this information. Again, I find the use cases here to be much more complex than maybe some of the surface ones that you sometimes hear about. So, obviously, machine learning is here to stay. It sounds like there's a lot of really good opportunities for people to start kind of dabbling in it and using that without having to become a machine learning expert. But, again, I appreciate your expertise. So if people want to find out more about you or more about the things you're working on and datastack.tv, things like that, how do they do that?Alexa: So we have a Twitter page for datastack.tv, so feel free to follow that. I also have a Twitter page, feel free to follow me, account, not page. There is a datastack.tv website, so it's just datastack.tv. You can go there, and you can check out the courses. And also, we have created a roadmap for data engineers specifically, because there was no good roadmap for data engineers. I definitely recommend checking that out, because we listed most of the tools that a data engineer and also machine learning engineer should know about. So if you're interested in this career path, then I would definitely recommend checking that out. So under datastack.tv's GitHub, there is a roadmap that you can find.Jeremy: Awesome. All right. And that's just, like you said, datastack.tv.Alexa: Yes.Jeremy: I will make sure that we get your Twitter and LinkedIn and GitHub and all that stuff in there. Alexa, thank you so much.Alexa: Thanks. Thank you.

18: Stacking bamboo spears (部品, ブカ, oka)

Interaxion | 物理系ポッドキャスト

Play Episode Listen Later Apr 8, 2021 77:39

部品さん、ブカさんと論文コンペ、献本とバズった本などについて話しました。以下の Show Notes は簡易版です。完全版はこちら。0:00 論文コンペの振り返り論文の被引用数予測 - ProbSpaceProbSpace 論文の被引用数予測コンペ反省メモ｜elmizbuka｜note上位の方たちの解法1位2位5位6位7位TabNetとは一体何者なのか？ - ZennLightGBM 徹底入門 – LightGBMの使い方や仕組み、XGBoostとの違いについてcatboostの推論の仕組みを理解する (1/2)｜Yotaro Katayama｜noteCond-matの論文タイトルから固体物性の流行を探る(2018) - ぶひんブログCond-matデータから共著者ネットワークを探る - ぶひんブログColeridge Initiative - Show US the Data - Kagglecuprates (銅酸化物系)と pnictides (鉄系) はそれぞれブカさん、部品さんが専門としていた超伝導体のこと。ep. 9 や ep. 11 でも出てきます。フィジカル・レビュー - Wikipediaニールス・ボーア - Wikipediaマレー・ゲルマン - Wikipedia38:04 献本とバズった本これならわかる機械学習入門 (KS物理専門書)ディープラーニングと物理学原理がわかる、応用ができる (KS物理専門書)意思決定分析と予測の活用基礎理論からPython実装まで → 短縮 URL (https://amzn.to/3wogrL5) が Googirl !?女子力アップCafe Googirlゼロから学ぶPythonプログラミング Google Colaboratoryでらくらく導入フォン・ノイマンの哲学人間のフリをした悪魔探究する精神職業としての基礎科学物理学者のすごい思考法 (インターナショナル新書)1:00:50 日本物理学会新会長第77期会長 - 日本物理学会新会長については、このポッドキャストの ep. 1 でも話しています。同時通訳AIを専門家級に、25年実現へ開発加速: 日本経済新聞 → NICT の同時通訳デモ。1:09:17 SF『人乃彼岸』郝景芳(ハオ・ジンファン)折りたたみ北京　現代中国SFアンソロジー三体3:死神永生フェルミのパラドックス - Wikipedia1:15:42 お知らせ朗読 - NHK聴き逃し配信 - 朗読 - NHK で放送後2ヶ月後くらいまで聴けます。紀ノ川ぶひんブログおたよりお待ちしております。

wikipedia sf ks spears stacking bamboo cond xgboost nhk nhk

401 AutoML at outbrain with Assaf Klein

Reversim Podcast

Play Episode Listen Later Feb 21, 2021

פודקאסט מספר 401 של רברס עם פלטפורמה - מברוק! שינינו קידומת במהלך הסגר . . . Unauthorized 401.היום אנחנו הולכים לדבר עם אסף מחברת Outbrain על נושא שנקרא AutoML - תיכף נדבר על מה זה ועל מה זה עושה.ואנחנו, כרגיל, באולפנינו הביתי אשר בכרכור - סגר מספר 3 עבר עלינו בשלום, החיסונים כבר אצלנו ואנחנו נתנים גז . . .(אורי) כן - חוץ מזה שפתאום הופעת עם משקפיים . . . (רן) רק לאותיות הקטנות.אז אסף - בוא קודם נכיר אותך: מי אתה? מה אתה עושה ב-Outbrain? אחר כך כמובן נדבר על מה זה AutoML ולמה זה מעניין אותנו.(אורי) זה סוג של אוטו . . .(אסף) אז אני אסף קליין, והיום ב-Outbrain אני מנהל קבוצה של מהנדסים ו-Data Scientistsספציפית, ה-Task הגדול שלנו זה לבנות את מערכת ה-CTR, זאת אומרת - היכולת שלנו לחזות את ההסתברות ש-User יקליק על אחת ההמלצות שלנו.בהשכלתי אני בעל תואר שני במתימטיקה ובמדעי המחשב, עבדתי בעבר בכמה תפקידים, גם כמה תפקידי Engineer וגם בתפקידי אלגוריתמיקה שונים ומשונים בכמה חברות.ב-Outbrain אני כבר כשש שנים - וואו, הזמן עף כשנהנים . . .(רן) מעולה . . . עד כמה בכלל חשוב כל הסיפור הזה של CTR ב-Outbrain? למה בכלל זה מעניין AutoML או Machine Learning באופן כללי ב-Outbrain?(אסף) אני אתייחס חלק הראשון של השאלה שלך בהתחלה - CTR זה בעצם אבן הבניין - ואני קצת אצטנע - המרכזית במנוע ההמלצה שלנו.זאת אומרת - כשאנחנו פוגשים משתמשת באחד מאתרי התוכן שעובדים איתנו, בעצם כדי להבין מהי ההמלצה הכי נכונה עבורה יש לנו איזשהו מודל, Predictor, שאמור לבוא ולהגיד מה ההסתברות שעבור אותה היוזרית - פריט התוכן הספציפי יעניין אותה כרגע, בהקשר הנתון שבו היא נמצאת.וזה בעצם השיקול הכי משמעותי לגבי איזו המלצה היא תראה.(אורי) חלק בלתי נפרד מה . . . מוצר ההמלצות, התפקיד שלו הוא להגיש תכנים מעניינים וליצור את המוטיבציה לעבור אל התוכן הזה.זה נמדד אצלנו בקליק - כמו שמקליקים על תוצאת חיפוש בגוגל או על תוכן שמוגש בפייסבוק - אנחנו עושים את זה “באינטרנט הפתוח”.(רן) אז אפשר להגיד שהתפקיד שלך ושל החבר’ה שלך זה חלק מה - Core Value Proposition של Outbrain, חלק מהמנוע המרכזי של Outbrain - ואתם עוסקים בעיקר בתחום של Machine Learning.פה ספציפית אנחנו רוצים לדבר עלך הקונספט של AutoML - אז בוא אולי נתחיל לדבר על מהי הגדרת הבעיה, למה בכלל AutoML? למה זה דומה? איפה נתקלת בזה לראשונה? (אסף) אז שנייה . . . AutoML, אני חושב, זה Buzzword מאוד נפוץ היום בכל התעשייה - ואני לפחות אתן את ה-Take שלי על כל הדבר הזה.אני חושב שכל מי שהתנסה אי פעם בפיתוח מערכת מבוססת Machine Learning, שצריך גם לשים אותה ב-Production ואמורה לשמש משתמשים אמיתיים באילוצים אמיתיים, חווה תסכול מסויים או קושי מסויים, ואני אנסה טיפה להרחיב, אם אני מדבר יותר מדי אנא עצור אותי . . .לרוב, כשאנחנו חושבים אל איזשהו Task שהוא סביב Data Science אז יש לנו כמה שלבים לדבר הזה - כ-Data Scientist אנחנו לרוב מקבלים איזושהי בעיה, איזשהו Data-set, איזושהי משימה - לרוב זה מתחיל באיזשהו מחקר, Offline-י, שבו אנחנו עוסקים באקספלורציה (Exploration) של הדאטה, הבנה של הבעיה, הבנה של המימדים הרלוונטיים וכן הלאה.אם התמזל מזלנו, אחרי עבודה קשה, אנחנו מגיעים לאיזשהו מודל שאנחנו מרוצים ממנוהיה נחמד אם זה היה הסוף - אבל המעבר ממודל שעובד לנו לוקאלית על המכונה שלנו, על ה-Data set שיש על המכונה, למשהו שרץ ב-Production הוא מעבר, בוא נגדיר את זה כ”לא טריוואלי” . . .אני יכול קצת להרחיב?(רן) כן . . . באופן טיפוסי, אתם בונים את המודל על המכונות הפרטיות, או שהמכונה זה רק כמשל?(אסף) כמשל . . . יש לי פואנטה בסוף . . . אז סבבה - אנחנו באים לשים את המודל ב-Production, ולא תמיד גם שם החיים פשוטים, כי Data Scientists ואלגוריתמיקאים אוהבים לעבוד ב-Stack שלהם, לרוב ב-Python . . .אני כמובן מכליל, אבל Python זה לרוב ה-Stack הנפוץ, עם כלים וחבילות ו-Libraries שמאוד נוחים לנו כ-Data Scientists - ולא תמיד ה-Stack ב-Production שלנו כתוב גם הוא ב-Python, ולכן לא תמיד זה קל, לדוגמא, לקחת את ה- XGBoost המדהים שבנינו ולשים אותו במערכת ה-Production שלנו.במיוחד אם אני מתייחס שנייה ל-use cases של Outbrain, שבהם נדרשים SLA מאוד קשיחים - “קח את ה-XGBoost שלך ב-Python ועכשיו תן לו להגיש 100,000 בקשות בשנייה” - ושיהיה לנו בהצלחה עם זה. . .(רן) אז אולי בשביל ה-Context, אני בתפקיד שלי ביום-יום גם מנהל קבוצה של Data Science ב-AppsFlyer ואני גם מזהה עם הכאבים שלכם והם חוזרים גם אצלנו.אוקיי - אז מה עושים? Data Scientist אוהב לעבוד ב-Python, אוהב לכתוב XGBoost - וזהו, אחר כך נגמר העניין. אבל בכל אופן - ה-Business רוצה שניקח את המודל הזה ונשים אותו ב-Production, אז מה עושים?(אסף) אז האמת היא שיש לי פה עוד פואנטה, רן, אני מצטער שאני ככה נותן לך קונטרה, אני אשמח לענות על זה - בוא נניח שהתגברנו על הבעיות האלה, אני חושב שיש מגוון של דרכים לתת מענה לדבר הזהאם זה כל מיני חבילות שיודעות לקחת את המודל XGBoost ולהגיש אותו בחבילות שונות ומשונות, או אולי אתה ממש רוצה להתאבד ובא לך לממש את זה ב-++C כדי בכלל לחוות זמני תגובה מהיריםאבל אפילו אם התגברת על המהמורה הזאת, גם כשהמודל שלך מתחיל לשרת את ה-Production, גם שם אתה מתחיל לחוות Friction . . . אני אתן שתי דוגמאות ברשותך - למשל, משהו שאני חושב שהצוות שלי בזבז - השקיע! - עליו משהו כמו שלושה שבועות כשהמודל שלנו היה ב-Production וזה שאתה פשוט מקבל פרדיקציות (Predictions) שהן לא הגיוניות . . . וכשאתה בא לחקור את זה אתה מבין שהדאטה שראית בזמן ה-Training הוא לא הדאטה שאתה רואה בזמן ה-Serving בהמון מובנים . . .סתם אנקדוטה - יש לך דאטה שמגיע לך ב-Serving באיזשהו API והוא פורמט בצורה אחת, אבל כשהוא נכתב כבר ל-Data Lake, ל-Database שלך, על ה-Hive שלך או Whatever [כבר יצא כזה שירות של AWS? שם טוב], הוא מפורמט (Format) קצת אחרת - נגיד שהוא עובר ל-Lower case, ועכשיו המודל שלך ראה Upper case ב-Training וב-Serving הוא רואה Lower Case ואוי ואבוי . . .זה עוד סוג של Friction(רן) אתה יודע מה? אם אנחנו כבר מערימים קשיים, אז תרשה לי להערים קושי נוסף: יש את כל הסיפור הזה של Feature engineering, שלפעמים הוא קורה ב-Offline באופן שונה ממה שקורה ב-Online, וזו עוד מהמורה שככה צצה וצריך לעבור . . .(אסף) לגמרי . . . בוא ככה נבנה את המוטיבציה אפילו, נעשה פה דרמה יותר גדולה - גם אם התגברנו על זה ובנינו כלים, עכשיו יש לנו מודל ב-Production, ולרוב זה מודל בהתחלה ראשוני, ועכשיו יש לנו צוות של Data Scientists שכל הזמן רוצים לשפר אותו, ולבדוק את הדברים שלהם ב-Production ולראות שהם עבדו - איך אנחנו עושים את כל הזמן? זה בעצם כל הסיפור הזה כפול כל החיים בערך . . . (רן) כן . . .(אורי) כל החיים כפול כל ה-Data Scientists . . .(אסף) אמרת את זה נכון(רן) אוי, כבר יש לנו מטריצה, עוד שנייה אנחנו עוברים ל Tesor-ים . . . וגם יכול להיות שמודל שעבד מצויין אתמול אולי יפסיק לעבוד מחר, כי העולם השתנה או כי דברים קרו.(אסף) נכון, וזה אחד האייטמים שלי פה שלא דיברתי עליהם - Concept Drift, שזה משהו מאוד מאוד שכיח, במיוחד בתעשיית ה-Ad-Tech, כשה-Marketplace הוא נורא דינאמי.(רן) כן - המלצה שאולי הייתה מצויינת אתמול, והדאטה בסדר והכל Lower case והכל בסדר - אבל התוכן השתנה, עולם התוכן השתנה וההמלצה כבר פחות רלוונטית.(אורי) אומרים שעם העיתון של אתמול אפשר לעטוף דגים? אז זה בערך . . . זה כבר בעולם ה-Online זה שעם העיתון של לפני עשר דקות אפשר לעטוף דגים.(אסף) לגמרי(רן) בסדר - אז עכשיו אנחנו מוכנים לפתור את הבעיה?(אסף) יש איזושהי מנטרה שאני מנסה לחזור עליה במהלך השיחה שלנו - אני חושב שאחד ה-Take aways שלנו מאיזשהו שכתוב מאוד מאסיבי של המערכת במהלך השנה האחרונה הוא שצריך בעצם לאפשר ל-Data Scientist להיכשל מהר . . .בטח שמעתם את זה, זה קצת קלישאה, אבל זה נורא נכון.(רן) אז אתה אומר “להיכשל מהר” . . . לא להצליח אלא להיכשל מהר. מה עומד מאחורי זה?(אסף) כל מי שעסק באלגוריתמיקה ו-Data Science, לפחות בי זה היכה לפני כמה זמן, כי - כמה כיף זה לפתח פרויקט תוכנה רגיל? בפרויקט תוכנה רגיל קל לך לראות אם ה-Latency שלך מספיק מהיר, אם הכפתור שלך במקום וכן הלאה . . .ב-Data Science זה לא ככה - ב-Data Science לרוב אתה יורה באפילה, ומקווה לטוב. לצערי - הרבה מאוד פעמים אנחנו נכשלים, ולכן אם נאפשר בפרויקט ל-Data Scientists שלנו להיכשל מהר, זה יאפשר להם לנסות דברים הרבה יותר מהר, הם לא יפחדו לנסות, יהיו יותר הצלחות, המודלים שלנו ישתפרו וה-KPI העסקיים שלנו יעלו.(רן) אז בוא נסתכל, נגיד, על דוגמא - יש לך איזשהו מודל המלצות, וחלמת בלילה על פיצ’ר חדש: “נגיד שכל אות שנייה היא ב-Capital אז זו הולכת להיות המלצה מצויינת!” - ועכשיו אתה רוצה לבדוק האם זה הולך לעבוד.אם יקח לך חודש לבדוק את כל הסיפור, אתה תספיק לבדוק אולי, עד שיפטרו אותך, משהו כמו שניים או שלושה רעיונות, ואז יגמר לך הזמן, Game Over; אבל אם אתה נכשל מהר, ולוקח לך יום או חצי יום לבדוק את הרעיון המטורף הזה אז אתה תספיק לבדוק עוד ועוד, ובסופו של דבר תגיע דווקא לרעיון קצת יותר מוצלח.(אסף) נכון - ואם תיקח את זה אפילו יותר קיצוני, אם תספיק לבדוק בשלושה ימים 20,000 אפשרויות להוספת פיצ’ר כזה או אחר, בפורמולציות כאלו ואחרות, זה אפילו יותר טוב.(רן) אני חושב שיש פה שני מושגים שהם אולי דומים - אחד מהם זה AutoML והשני זה MLOps - ואני חושב שדיברת על שניהם . . . בוא נגדיר את שניהם ונראה מה כל אחד פותר.(אסף) אז באמת אני אעשה Zoom-out ואתייחס למה שאמרת - באיזשהו מקום אני חושב שההבנה היא, לא רק אצלנו אלא באופן כללי, שיש איזשהו יתרון להפרדה בין ניסויים ב-Offline לבין ניסויים ב-Online, ב-Serving.זאת אומרת - בסוף היום, ההוכחה שהצלחת לעשות משהו מועיל באיזור האלגוריתמי, באיזור המודל, הוא שאתה שם, במקרה שלנו, איזשהו A/B Test ומוכיח שה-KPI העסקיים במודל החדש שייצרת עולים על המודל הקודם - וזה באמת ה-Online.בעצם, כל היכולת לעשות אורקסטרציה (Orchestration) לסיפור הזה - לעלות את הA/B Test בצורה נוחה, לשים את המודל ב-Production בצורה נוחה - אני חושב שזה יותר באיזורים של ה-MLOps.לעומת זאת, כמובן שלכל A/B Test או לכל ניסוי Online-י יש Overheads, יש תקורות - ולכן אם אנחנו באמת רוצים להיות יעילים אנחנו צריכים לאפשר לעשות ניסויים Offline, כלומר - איזושהי סביבה שבה ה-Data Scientists שלנו יוכלו לחקור את הבעיה ולחקור כל מיני היפותזות שיש להם בצורה מהירה ויעילה, בעזרת כלים שנותנים להם להתרכז בעבודת ה-Data Science ופחות באספקטים טכניים שתמיד הם חלק מהחיים שלנו.אולי האוטומציה הזו, או כל מיני כלים שמאפשרים את האוטומציה הזו - אפשר לתייג אותם בתור AutoML.(רן) לצורך העניין, חשבתי על הפיצ’ר המטורף שלי שבו כל אות שנייה היא Capital letter וזה הולך להיות פצצה - ועכשיו אני מתלבט: האם אני רוצה לדחוף את זה לתוך XGBoost או לתוך רשת ניורונים - ואם כן אז כמה שכבות או כמה ניורונים, או כמה עצים הולכים להיות בתוך . . . מה העומק של ה-XGBoost.אם אני עושה את כל זה בצורה סדרתית, כנראה ששוב זה יקח לי כמה ימים טובים, אולי חודש, לבדוק את כל אלו - מה ה-AutoML נותן לנו?(אסף) בעצם כלי AutoML מאפשרים לך לבדוק את ההיפותזות האלו בצורה אוטומטית, ווכמובן בסוף נותנים לך איזשהו Audit trail - היכולת להבין עבור כל אחת מההיפותזות או הדברים שרצית לבדוק, עד כמה הם טובים.לרוב זה בעזרת איזשהו Proxy - זה לא יהיה ב-Business KPI שלך אלא איזשהו Proxy ל - Business KPI.(רן) ראיתי כלים כאלה Out there - יש כלי כזה ל-AWS ויש כלי כזה ל-GCP - מה אתם עושים? כתבתם אחד משלכם?(אסף) כן . . . תראה, אני חושב שיש . . . קודם כל כתבנו משהו משלנו, והוא תפור באמת לבעיה שלנו, אני יכול טיפה לדבר למה . . .המודל שלנו, לרוב . . . כדי להגיד משהו על טיב של מודל כזה או אחר, למשל בדוגמא של רן - האם הוספנו פיצ’ר כזה או פיצ’ר אחר, אז עד כמה המודל מתפקד בצורה טובה? - לדבר הזה נדרש די הרבה דאטה.הכלים, לפחות אלו שאנחנו מכירים, פחות יודעים להתמודד עם כמויות הדאטה העצומות שנדרשות לבעיה שלנו.(רן) אוקיי - אז מה, אני, כ- Data Scientist, בא בבוקר ואומר: “אני רוצה להריץ את שלושת האלגוריתמים האלה, כל אחד עם קומבינציה של 10 פרמטרים שונים”, וזהו - הולך לשתות קפה, חוזר ויש לי תוצאות?(אסף) אה . . . קצת יותר מורכב מזה.לרוב יש לנו כל מיני שאלות, היפותזות, שבהן אנחנו עוסקים, וכל Data Scientist לוקח מה-Backlog איזושהי שאלה כזאתלמשל - האם הוספה של פיצ’ר כזה או אחר תשפר את המודל, ובכמה זה ישפר את המודללרוב אתה צריך לבחור איזשהו Data Set שהוא Offline, כי זה לא נעשה בחלל ריק.יש לנו דרך, בעצם, לתרגם, את ההיפותזה הזאת לבעית חיפוש - אני יכול לתת דוגמא תיכף, אבל בסוף היום ב-AutoML יש לנו מנוע חיפוש, והוא מחפש במרחב המודלים שה-Data Scientist מגדיר לו - מחפש בצורה מאוד מהירה ומבוזרת, וזה תלוי בכוח המחשב שאתה שם עליו - אבל בגדול, אחרי כמה שעות אתה תקבל תשובה.(רן) אוקיי, אז אמרת שאתה רוצה לתת איזושהי דוגמא?(אסף) אני מתלבט האם לתת דוגמא מורכבת או . . . מעניינת וקצת מורכבת, שנדרש קצת רקע, או משהו יותר בנאלי?(רן) מעניינת ומורכבת ונדרש ידע - בוא נצלול!(אסף) מצויין - אז בעולם של CTR Prediction, אני אתן דוגמא שהיא יחסית מעניינת.אחד המודלים - אמנם מודל בסיסי, אבל עובד לא רע - הוא Logistic Regressionבגדול, הדאטה שהוא מקבל זה המידע על אותו Listing, וה-Context - למשל: רן כרגע נמצא בכרכור, צופה בדף של Ynet(!) בתוך iPhone 12 (!!) - ומערכת ההמלצה שלנו באה לנסות להבין איזו מבין שלושת הפרסומות שכרגע נמצאות ב-Inventory הכי מתאימה לרן.אז בעצם, אם ניקח את כל הדברים שאנחנו יודעים על רן ועל ה-Context שבו הוא נמצא - כרכור, iPhone 12 וכו’ - אז לכל אחת מהפרסומות, נוכל להפוך את זה לבעיית Classification של “מה ההסתברות שרן יקליק על אותו פריט תוכן?”בוא נאמר שלמדנו את זה מתוך דאטה היסטורי - וזה בעצם Logistic Regression ל - CTR Prediction על קצה המזלג.מסתבר שכמו בהרבה בעיות ב-Logistic Regression, צימודים של פיצ’רים מאוד עוזריםזאת אומרת - יכול להיות שלמדנו, אם נגיד את זה למודל, שאנשים בכרכור אוהבים ללחוץ על פרטי תוכן על מכוניות אדומות, לעומת אנשים שגרים בתל אביב שאוהבים משאיות ירוקות.(אורי) בדרך כלל ההיפך, אבל . . .(אסף) כן, קצת אילתרתי . . . אפשר לתת דוגמא יותר מוצלחת כנראה, אבל הנקודה היא שבאמת צימודים כאלה של פיצ’רים מוסיפים המון לדיוק של המודל(רן) זאת אומרת שאם מקודם בתיאור שלך של המודל הנחת איזושהי אי-תלות - הוא בכרכור, יש לו iPhone, וזה לא קשור לזה שהוא אוהב או לא אוהב מכוניות ירוקות או אדומות, אבל מסתבר שהמציאות מספרת לנו סיפור אחר, כנראה שיש איזושהי קורלציה ביניהם (אסף) נכון - וכאשר אתה מוסיף את הצימוד הזה של מיקום וסוג הפרסומת, בעצם המודל שלך יהיה יותר מדויק.אם נחשוב על בעיה מהחיים האמיתיים - אז יש לנו אלפים של פיצ’רים, וכאשר באים להוסיף פיצ’ר חדש ל-Logistic Regression, נשאלת השאלה עם איזה צימודים הוא יעבוד הכי טוב.אפילו אפשר לקחת שאלה הרבה יותר כללית - אם יש לנו נגיד אלף פיצ’רים, איזה צימודים אנחנו צריכים להוסיף למודל על מנת שהוא יתפקד בצורה הכי טובה שיש?(רן) רק נבהיר פה את המתימטיקה - אם יש לנו אלף פיצ’רים, אז אם נוסיף פיצ’ר אחד ונצמיד אותו לכל האחרים, אנחנו הולכים לקבל פי . . . הרבה פיצ’ריםכל אחד מחובר לכולם(אורי) יהיו הרבה . . .(אסף) למעשה, אם הולכים בקומבינטוריקה לאקסטרים - זה C(10,2) - וזה הרבה.[10 לא המון, 1,000 כבר כמעט חצי מליון . . . ]בעצם די מהר אפשר לחשוב על הבעיה הספציפית הזו כבעיית חיפוש - ואז אם יש לך מנוע חיפוש מוצלח אז תוכל למצוא את המודל המיטבי.ברור שמרחב האפשרויות פה הוא אקספוננציאלי, ואנחנו לא אוהבים אקספוננציאלי במדעי המחשב . . .(רן) שנייה, בוא נבהיר רגע . . .(אורי) אנחנו לא אוהבים אקספוננציאלי - ותמיד זה אקספוננציאלי . . .(אסף) נכון - ויש פתרונות מהספר, מה-Text book, של איך עושים את זה.(רן) אז בוא רגע נדבר על למה זה מנוע חיפוש - החיפוש הוא בין השילובים השונים של הפיצ’רים, ולכל אחד מהם יש איזשהו Score, זאת אומרת איזשהו . . אם הגעתי למקום אז אני יודע האם הגעתי למקום טוב או לא.וה-Score הוא - מה? היכולת שלו לחזות את ה-CTR לצורך העניין?(אסף) אז באמת ה-Score, אני חושב שזה המרכז פה ואפשר לדבר עליו, ותיכף אני אצלול בשמחה לתיאור של ה-Score, אבל אנחנו מדברים פה בעצם על מרחב חיפוש אקספוננציאלי, ואפשר להפעיל כל מיני אלגוריתמי חיפוש שרצים על המרחב הזה עם Score, עם יוריסטיקה.אם נכתוב תשתית טובה והמנוע חיפוש הזה יוכל לרוץ בצורה מבוזרת על הרבה מכונות, וגם ה-Score הזה ידע להיות מחושב מהר, אז ה-Data Scientist שלנו יהיה מאוד מרוצה כי יוכל בדוק את ההיפותזות שלו מאוד מהר.(רן) אז בו נעשה רגע שנייה סיכום - אני רוצה להוסיף עכשיו פיצ’ר חדש, אבל הבנו שפשוט להוסיף את הפיצ’ר ה-1,001 זה כנראה לא כזה מעניין כי יש הרבה מאוד קורלציות בין פיצ’רים, אז צריך להבין למי הפיצ’ר שלי קורלטיבי או לא, זאת אומרת - איזה קרוסים (Cross) מעניין להוסיף.וזה אתה אומר במרחב שהוא אקספוננציאלי, אז את זה צריך לצמצם, שלא יהיה אקספוננציאלי - לא אמרת בדיוק מה זה כן, אבל אני מניח שזה קצת פחות, ובכל אופן כנראה שנקבל עדיין מרחב מאוד מאוד גדול, אפילו שזה לא אקספוננציאלי זה עדיין מאוד מאוד גדול - אז גם את החיפוש במרחב הזה אני רוצה לעשות בצורה יחסית מהירה.והחלק שהוא Computational expensive בכל הסיפור הזה זה החישוב של ה-Score או משהו אחר?(אסף) כן, בדיוק זה המרכז.בסוף השאלה הבסיסית היא בעצם, אם שנייה נצלול אפילו יותר עמוק, אז בעצם כל Node במרחב החיפוש שלנו זה איזשהו מודל, עם צימוד כזה או אחר של פיצ’רים או עם שניהם יחדיו - ובעצם אנחנו שואלים עד כמה הוא יותר טוב מכל האחרים.ואני קצת אתאר איך אנחנו מחשבים את ה-Score הזה - בעצם ה-Score הזה אמור להיות Proxy מאוד קרוב לדיוק המודל ,ולשם כך אנחנו משתמשים ב-Data set שהוא Offline-י, מאוד קלאסי, שמחלקים אותו ל-Train ול-Test, חלק ממנו משמש לאימון המודל - ה-Bootstrap שלו - וחלק אחר לחיזוי הקליקים.זה דאטה אמיתי כמובןועל תוצאות הפרדיקציות, כשאנחנו יודעים את ה-Target, אם היה קליק או לא היה קליק, אנחנו מחשבים מדדי דיוק - AUC או מדדים אחרים די סטנדרטיים.(אורי) זה ל-Offline(אסף) זה ל-Offline כמובן(רן) ואיך אתה יודע שבאמת הדאטה הזה מייצגת את המציאות הנכונה? אם תיקח את כל הדאטה בעולם אז כן, אבל הבעיה היא שהוא גדול מדי . . . (אסף) כן, בגלל זה אני חושב ש . . .לפני כן שאלת אותי למה עשינו פה את כל הפיתוח של משהו שהוא ייחודי לנו, וזה באמת כי כדי לייצר את מטריקה שנותנת לנו דיוק טוב, אנחנו נדרשים לקחת הרבה דאטה - אז לרוב אנחנו לוקחים דאטה של שבוע, שזה דאטה של ג’יגות (Gb) בגדול, ועליו אנחנו מריצים את החיפוש.(רן) הבנתי . . . ובעצם מי שבנה את ה-Framework הזה זה בעצם מישהו בתוך קבוצת ה-Data Science אצלכם?(אסף) כן, זה התחיל ממשהו די פשוט וסיבכנו אותו ככל שהדרישות הצטברו . . .(רן) ככל שנתנו לכם . . . (אסף) האמת שלא, האמת ש . . . (אורי) זה היה תהליך מדהים, כי התחילו במשהו יחסית פשוט כדי להוכיח את ההתכנות של המודל הזה, ולאט לאט הלכו הוגדילו Scale ודייקו ודייקו את המודל, ונוספו פיצ’רים וגם המודלים המתימטיים השתפרו לאורך הזמן.(אסף) כן . . . האמת שהתחלנו עם Vowpal Wabbit, למי שמכיר, ובמהלך הדרך . . .למי שמכיר - Vowpal Wabbit זה Logistic Regression ממש מהירובמהלך הדרך קפצנו במנוע המתימטי למשהו שכתבנו אצלנו בקבוצה, שאנחנו קוראים לו Fwumious wabbit, שזה Field-aware factorization machines נורא נורא מהיר(אורי) ממש ממש ממש מהיר(אסף) ממש . . .אתם יכולים לראות את הבלוגים אצלנו, ב-Outbrain engineering, כתבנו על זה וגם על AutoML אז תוכלו לראות כמה דוגמאות יותר מפורטות(רן) יותר ממכונית ירוקה בכרכור . . .(אורי) לא, באמת - יש הרבה דברים שנכתבו בבלוג של Outbrain Engineering ושווה לקרוא שם.(רן) ופה ספציפית דיברנו על Use case של הוספת פיצ’ר - יש לכם נגיד גם Use cases של בחירת מסווג חדש, או כיוונון של Hyper-parameters במסווג או Use cases אחרים בסגנון הזה?(אסף) תראה, בחירת מסווג זה עניין די מורכב, כי המסווגים שלנו, כדי שהם יעבדו כמו שצריך, דורשים המון המון עבודה.כיום, למשל, יש לנו איזושהי גרסא שתפורה ל-TensorFlow, ובה אנחנו מנסים כל מיני ארכיטקטורות של רשתות - וכן, זה משהו שהכלי זה יודע לתמוך גם בה.(רן) אוקיי, אז פה בעיקר דיברנו על AutoML - זאת אומרת, החלק ה-Offline-י - אבל בחלק מהצגת הבעיה גם דיברנו על החלק השני, החלק ה-Online-י, שנקרא לו אולי MLOps? נניח . . . אילו פתרונות נבנו בתחום הזה? (אסף) אני יכול לספר לך על הרצוי ועל המצוי . . . בוא נתחיל עם הרצויבגדול, אני חושב שהשאיפה שלנו היא שברגע שיש לך תוצאה טובה ב-AutoML אתה תיהיה, אנחנו קוראים לזה “One click away” מ-A/B Test, זאת אומרת - בקליק אחד, או גם שלושה קליקים זה יהיה בסדר . . .(רן) שה-CTR שלהם הוא? . . .(אסף) 1 . . . 100%אבל הרעיון הוא שתוכל להגיע נורא מהר ל - A/B Test, עם כמה שפחות Technicalitiesאם נפרוט שנייה מה זה אומר ”ללכת ל - A/B Test”, אני יכול לפחות לתאר את החווייה שלנו ב-Outbrain, אז אומר לפעמים לעשות שינויים בשכבה שעוטפת את המודלים שלנו, לקחת את הפיצ’ר הזה ולגרום לו להגיע בכלל אל המודל ב-Serving, ויש איזושהי עבודה שנדרשת ב-Production, ואחרי זה להתקין, “לרלס” (Release) את אותו Service שמגיש את המודל, וכמובן שזה אומר לבדוק אותו ולעשות בדיקות SLA ו-Latency ו-Staging, ואחרי זה להגדיר את ה-A/B test . . . זו שורה ארוכה של משימות, ובעצם השאיפה שלנו היא לגרום לכל הסיפור הזה להיות אוטומטי.(רן) נכון להיום, כמה . . . אני זוכר, בהיותי מתחום ההנדסה, שאם מודדים לדוגמא מטריקות (Metrics) כמו Commit to Production - כמה זמן לוקח מהרגע שבוא עשיתי Commit ועד שהקוד הוא Deployed ב100% - לכם יש אילו שהן מדדי זמן כאלה? אם כן, מה הם?(אסף) אני חושב ש . . .אין לנו איזה Dashboard של Garfana שמודד את זה.כיום יש איזושהי סדרה של פעולות שה-Data Scientist צריך לעשות, יש פרוטוקל מוגדר היטב - זה לוקח בסביבות שעה וחצי.ואנחנו שואפים בעצם להפוך את זה לדקה וחצי.(אורי) זה למודל חדש, כשגם שינית את ההבאה של הדאטה . . .(אסף) לא, לרוב, אם נדרשים שינויים ב - Data Pipeline אם להוסיף APIs אז זה יכול לקחת קצת יותר - אני מתייחס ל - Tweak במודל.(רן) אוקיי - והחלק, נגיד ה - Post Production - לגלות Drift או תקלות אחרות שיכולות לקרות - איך זה עובד בעצם? סיימתי את ה - A/B Test ועכשיו אני שמח, אני שם את זה ב-Production - מה הלאה? (אסף) תראה, יש לנו שכבות שונות של ניטור, ושוב - גם פה יש עוד הרבה עבודה לפנינו.בגדול יש לנו כל מיני . . . אני יכול רק לומר שהדברים הקריטיים שגילינו שבהם המודלים שלנו מאוד משתפרים זה שהמודל צריך להיות מאוד עדכני.בעצם, המערכת בכל חמש דקות מקבל מודל חדש ב - Production - מאמנת מודל חדש ומכניסה מודל חדש ל-Production.וזה די מפחיד . . .(אורי) רגע, רגע . . . קודם דיברנו על זה שאנחנו רוצים לאמן כל מיני פיצ’רים חדשים ודברים כאלה, וזה כדי לבנות מודל מסוג חדש.עכשיו אתה מדבר על מודל עדכני - זאת אומרת: אותו מודל, כמו שבניתי ואימנתי וזה, רק שאני מאמן אותו בכל פעם על דאטה עדכני בכל חמש דקות.(אסף) כן, אז אולי באמת שווה לתת טיפה קונטקסט, אולי קצת קפצתי לזה - בגדול, המודלים של ה-CTR שלנו, אם תחשבו על ה-Setting של הדבר הזה, בעצם אנחנו כל הזמן מגישים המלצות, ה - Users מקליקים או לא מקליקים על ההמלצות שאנחנו מגישים, הדאטה הזה, של על מה הקליקו ועל מה לא מגיע למערכת שלנו אחרי כמה דקות ונאסף ב - Data Lake שלנוובעצם אנחנו יכולים לקחת את הדאטה הזה ולעדכן את המודלים שלנו כרגע ב-Production.וגילינו ש-KPI מאוד משמעותי לביצועים של המודל זה מהי נקודת הדאטה האחרונה שהמודל שלנו ראה בזמן ההגשה, ואנחנו שואפים שהזמן הזה יהיה קצר ככל האפשר.(רן) והסיבה מאחורי זה היא שהתוכן מתחדש? זאת אומרת, מה . . .(אורי) עם העיתון של אתמול אפשר לעטוף דגים . . .(רן) אוקיי . . . בסדר - היו לכם אילו שהן מחשבות על ללכת לכיוון משהו שהוא לגמרי Online? זאת אומרת Reinforcement learning, שממש יעשה את זה “באפס זמן”, או קצת יותר?(אסף) אני חושב שיש פה הרבה מקום לשיפור, בעיקר אני חושב שבתחום התשתיתייש כאן דבר אחד שצריך להבין, שיש כאן Trade off - אנחנו בעצם מציגים המלצות למשתמשהמשתמש, יכול להיות שבדיוק כשהגשנו לו את ההמלצה הוא צריך ללכת לעשות פיפי, והוא יחזור אחרי חמש דקות ואז הוא נזכר שהוא צריך לשתות קפה[זו לולאה אינסופית, דוגמא סטריאוטיפית של מהנדס כלשהו]רק אחרי רבע שעה הוא יגיד “וואו - איזו המלצה מדהימה ש-Outbrain נתנו לי!” והוא יקליק על ההמלצהורק אחרי אולי שמונה עשרה דקות נקבל את האינפורמציה על הקליק . . .הנקודה היא שיש לנו איזשהו Delay אינרנטי (Inherent) בכמה שאנחנו מחכים לקליק - אבל השאיפה באמת להיות כמה שיותר קרובים.(רן) אז אתה אומר שכאילו “ה-Watermark” הזה של החמש דקות הוא מספיק קצר כרגע, לפי מה שאתם רואים - כי גם ככה יש את ההתנהגות האנושית הזו של זמני קריאה, זמן הפיפי וזמן הקפה, ואין כרגע סיבה משמעותית לרדת מתחת לזה?(אסף) אה - לא, אני אגיד . . .יש עדיין איזשהו Delay אינרנטי, כמו שאמרנו, וכמובן שיש את כל ה-Data Pipeline שלנו שגורם לאותו קליק לחלחל למערכת, ל-Updater שלנו - ואת זה אפשר לקצר.שם יש לנו הרבה - אבל זו בעיקר עבודה תשתיתית, לגרום ל-Data Pipeline להיות הרבה יותר מהיר.(רן) הבנתי - גם אם אתה מרלס (Release) מודל חדש כל חמש דקות, אזא. יש את זמן בניית המודל, שגם יכול להיות לא טריוויאליב. וגם יש את השאלה של “ממתי הדאטה שעליו הוא עובד?” - ויכול להיות שהדאטה הזה “עדיין בצינורות” ולוקח לו זמן להגיעאני יכול להגיד שאצלנו (AppsFlyer) זה לוקח סדר גודל של שעתיים, אני חושב . . . או שלוש, לא זוכר.(אורי) אז פה, עשינו התקדמות מאוד גדולה בנושא של Real-time data pipelines, וזה עוזר בהמון מקומות.(אסף) וודאי - בעצם מה שזה אומר, אם שנייה נחשוב, זה בעצם שהמודלים שמגישים המלצות ב-Production, פוטנציאלית ראו חודשים של דאטה.(רן) דרך אגב - האימון הוא כל פעם מחדש, או שזה אינקרמנטלי (Incremental)?(אסף) אינקרמנטלי . . . (רן) אוקיי - אז זה לא חייב להיות כל כך הרבה זמן, ה-Cycle של האימון.(רן) אוקיי, מעולה - יש עוד המון דברים לדבר עליהם, אבל אנחנו כבר לקראת סוף הזמן שלנו.יש עוד נושא שרצית לכסות, לפני שאנחנו נסיים?(אסף) לא, אני חושב שעברתי על כל הרשימה שלי(אורי) יש נושא שרצית לכסות לפני שאנחנו אומרים שאנחנו מגייסים Data Scientists? (אסף) אה, נכון . . . אנחנו מגייסים!אנחנו מגייסים אנשים, תסתכלו בדף המשרות שלנוואם מעניין אתכם קצת לקרוא יותר על הדברים שדיברנו כרגע אז כנסו לבלוג של Outbrain Engineering או ב - Medium ל - Outbrain Engineering ותמצאו שם תוכן מעניין.(רן) מעולה - בהצלחה בהמשך הדרך ותודה רבה.(אסף) תודה שאירחתם אותי.הקובץ נמצא כאן, האזנה נעימה ותודה רבה לעופר פורר על התמלול

A multi-modal machine learning approach towards predicting patient readmission

Play Episode Listen Later Nov 20, 2020

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.20.391904v1?rss=1 Authors: Mohanty, S. D., Lekan, D., McCoy, T. P., Jenkins, M., Manda, P. Abstract: Healthcare costs that can be attributed to unplanned readmissions are staggeringly high and negatively impact health and wellness of patients. In the United States, hospital systems and care providers have strong financial motivations to reduce readmissions in accordance with several government guidelines. One of the critical steps to reducing readmissions is to recognize the factors that lead to readmission and correspondingly identify at-risk patients based on these factors. The availability of large volumes of electronic health care records make it possible to develop and deploy automated machine learning models that can predict unplanned readmissions and pinpoint the most important factors of readmission risk. While hospital readmission is an undesirable outcome for any patient, it is more so for medically frail patients. Here, we develop and compare four machine learning models (Random Forest, XGBoost, CatBoost, and Logistic Regression) for predicting 30-day unplanned readmission for patients deemed frail (Age [≥] 50). Variables that indicate frailty, comorbidities, high risk medication use, demographic, hospital and insurance were incorporated in the models for prediction of unplanned 30-day readmission. Our findings indicate that CatBoost outperforms the other three models (AUC 0.80) and prior work in this area. We find that constructs of frailty, certain categories of high risk medications, and comorbidity are all strong predictors of readmission for elderly patients. Copy rights belong to original authors. Visit the link for more info

united states patients copy machine learning predicting jenkins mccoy variables modal auc biorxiv readmission lekan xgboost random forest logistic regression

RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction

Machine Learning with Coffee

Play Episode Listen Later Oct 26, 2020

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.26.354357v1?rss=1 Authors: Ramos, T., Galindo, N., Arias-Carrasco, R., da Silva, C., Maracaja-Coutinho, V., do Rego, T. Abstract: Non-coding RNAs (ncRNAs) are important players in the cellular regulation of organisms from different kingdoms. One of the key steps in ncRNAs research is the ability to distinguish coding/non-coding sequences. We applied 7 machine learning algorithms (Naive Bayes, SVM, KNN, Random Forest, XGBoost, ANN and DL) through 15 model organisms from different evolutionary branches. Then, we created a stand-alone and web server tool (RNAmining) to distinguish coding and non-coding sequences, selecting the algorithm with the best performance (XGBoost). Firstly, we used coding/non-coding sequences downloaded from Ensembl (April 14th, 2020). Then, coding/non-coding sequences were balanced, had their tri-nucleotides counts analysed and we performed a normalization by the sequence length. Thus, in total we built 180 models. All the machine learning algorithms tests were performed using 10-folds cross-validation and we selected the algorithm with the best results (XGBoost) to implement at RNAmining. Best F1-scores ranged from 97.56% to 99.57% depending on the organism. Moreover, we produced a benchmarking with other tools already in literature (CPAT, CPC2, RNAcon and Transdecoder) and our results outperformed them, opening opportunities for the development of RNAmining, which is freely available at https://rnamining.integrativebioinformatics.me/ . Copy rights belong to original authors. Visit the link for more info

predictions tool copy machine learning coding dl stand alone galindo rego svm webserver biorxiv xgboost random forest naive bayes

15 Adaboost: Adaptive Boosting

Play Episode Listen Later Sep 28, 2020 18:01

Adaboost is one of the classic machine learning algorithms. Just like Random Forest and XGBoost, Adaboost belongs to the ensemble models, in other words, it aggregates the results of simpler classifiers to make robust predictions. The main different of Adaboost is that it is an adaptive algorithm, which means that it learns from the misclassified instances of previous models, assigning more weights to those errors and focusing its attention on those instances in the next round.

boosting adaptive xgboost random forest

15 Adaboost: Adaptive Boosting

Machine Learning en Español

Play Episode Listen Later Sep 28, 2020 16:32

Adaboost es uno de los algoritmos clásicos de aprendizaje máquina. Al igual que Random Forest y XGBoost pertenece a la clase de modelos de ensamble, es decir, que se basan en agregar otros modelos débiles o de base para hacer predicciones. La principal diferencia con Adaboost es que es adaptativo, es decir, aprender de los errores hechos en los primeros modelos poniendo más énfasis en los ejemplos clasificados incorrectamente.

boosting adaptive xgboost random forest

Applying Machine Learning Technology in the Prediction of Crop Infestation with Cotton Leafworm in Greenhouse

predictions copy machine learning crops cotton greenhouses pest infestation mostafa learning technologies biorxiv xgboost

Play Episode Listen Later Sep 19, 2020

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.17.301168v1?rss=1 Authors: Tageldin, A., Adly, D., Mostafa, H., Mohammed, H. Abstract: The use of technology in agriculture has grown in recent years with the era of data analytics affecting every industry. The main challenge in using technology in agriculture is identification of effectiveness of big data analytics algorithms and their application methods. Pest management is one of the most important problems facing farmers. The cotton leafworm, Spodoptera littoralis (Boisd.) (CLW) is one of the major polyphagous key pests attacking plants includes 73 species recorded at Egypt. In the present study, several machine learning algorithms have been implemented to predict plant infestation with CLW. The moth of CLW data was weekly collected for two years in a commercial hydroponic greenhouse. Furthermore, among other features temperature and relative humidity were recorded over the total period of the study. It was proven that the XGBoost algorithm is the most effective algorithm applied in this study. Prediction accuracy of 84% has been achieved using this algorithm. The impact of environmental features on the prediction accuracy was compared with each other to ensure a complete dataset for future results. In conclusion, the present study provided a framework for applying machine learning in the prediction of plant infestation with the CLW in the greenhouses. Based on this framework, further studies with continuous measurements are warranted to achieve greater accuracy. Copy rights belong to original authors. Visit the link for more info

Xgboost pipeline

Machine learning

Play Episode Listen Later Sep 10, 2020 10:17

Thoughts on setting up your production pipelines

pipeline xgboost

StackPDB: predicting DNA-binding proteins based on XGB-RFE feature optimization and stacked ensemble classifier

Play Episode Listen Later Aug 24, 2020

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.24.264267v1?rss=1 Authors: Zhang, Q., Liu, P., Han, Y., Zhang, Y., Wang, X., Yu, B. Abstract: DNA binding proteins (DBPs) not only play an important role in all aspects of genetic activities such as DNA replication, recombination, repair, and modification but also are used as key components of antibiotics, steroids, and anticancer drugs in the field of drug discovery. Identifying DBPs becomes one of the most challenging problems in the domain of proteomics research. Considering the high-priced and inefficient of the experimental method, constructing a detailed DBPs prediction model becomes an urgent problem for researchers. In this paper, we propose a stacked ensemble classifier based method for predicting DBPs called StackPDB. Firstly, pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), position-specific scoring matrix-transition probability composition (PSSM-TPC), evolutionary distance transformation (EDT), and residue probing transformation (RPT) are applied to extract protein sequence features. Secondly, extreme gradient boosting-recursive feature elimination (XGB-RFE) is employed to gain an excellent feature subset. Finally, the best features are applied to the stacked ensemble classifier composed of XGBoost, LightGBM, and SVM to construct StackPDB. After applying leave-one-out cross-validation (LOOCV), StackPDB obtains high ACC and MCC on PDB1075, 93.44% and 0.8687, respectively. Besides, the ACC of the independent test datasets PDB186 and PDB180 are 84.41% and 90.00%, respectively. The MCC of the independent test datasets PDB186 and PDB180 are 0.6882 and 0.7997, respectively. The results on the training dataset and the independent test dataset show that StackPDB has a great predictive ability to predict DBPs. Copy rights belong to original authors. Visit the link for more info

dna feature copy predicting ensemble acc wang optimization zhang binding stacked edt liu yu proteins mcc rpt svm biorxiv classifier xgboost

A Novel Machine Learning Strategy for Prediction of Antihypertensive Peptides Derived from Food with High Efficiency

predictions copy machine learning wang peptides shen derived niu learning strategies biorxiv high efficiency xue xgboost antihypertensive

Play Episode Listen Later Aug 13, 2020

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.12.248955v1?rss=1 Authors: Wang, L., Niu, D., Wang, X., Shen, Q., Xue, Y. Abstract: Strategies to screen antihypertensive peptides with high throughput and rapid speed will be doubtlessly contributed to the treatment of hypertension. The food-derived antihypertensive peptides can reduce blood pressure without side effects. In present study, a novel model based on Extreme Gradient Boosting (XGBoost) algorithm was developed using the primary structural features of the food-derived peptides, and its performance in the prediction of antihypertensive peptides was compared with the dominating machine learning models. To further reflect the reliability of the method in real situation, the optimized XGBoost model was utilized to predict the antihypertensive degree of k-mer peptides cutting from 6 key proteins in bovine milk and the peptide-protein docking technology was introduced to verify the findings. The results showed that the XGBoost model achieved outstanding performance with the accuracy of 0.9841 and the area under the receiver operating characteristic curve of 0.9428, which were better than the other models. Using the XGBoost model, the prediction of antihypertensive peptides derived from milk protein was consistent with the peptide-protein docking results, and was more efficient. Our results indicate that using XGBoost algorithm as a novel auxiliary tool is feasible for screening antihypertensive peptide derived from food with high throughput and high efficiency. Copy rights belong to original authors. Visit the link for more info

DNN-DTIs: improved drug-target interactions prediction using XGBoost feature selection and deep neural network

Machine Learning en Español

Play Episode Listen Later Aug 12, 2020

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.11.247437v1?rss=1 Authors: Chen, C., Shi, H., Han, Y., Jiang, Z., Cui, X., Yu, B. Abstract: Research, analysis, and prediction of drug-target interactions (DTIs) play an important role in understanding drug mechanisms, drug repositioning and design. Machine learning (ML)-based methods for DTIs prediction can mitigate the shortcomings of time-consuming and labor-intensive experimental approaches, providing new ideas and insights for drug design. We propose a novel pipeline for predicting drug-target interactions, called DNN-DTIs. First, the target information is characterized by pseudo-amino acid composition, pseudo position-specific scoring matrix, conjoint triad, composition, transition and distribution, Moreau-Broto autocorrelation, and structure feature. Then, the drug compounds are encoded using substructure fingerprint. Next, we utilize XGBoost to determine nonredundant and important feature subset, then the optimized and balanced sample vectors could be obtained through SMOTE. Finally, a DTIs predictor, DNN-DTIs, is developed based on deep neural network (DNN) via layer-by-layer learning. Experimental results indicate that DNN-DTIs achieves outstanding performance than other predictors with the ACC values of 98.78%, 98.60%, 97.98%, 98.24% and 98.00% on Enzyme, Ion Channels (IC), GPCR, Nuclear Receptors (NR) and Kuang's dataset. Therefore, DNN-DTIs's accurate prediction performance on Network1 and Network2 make it logical choice for contributing to the study of DTIs, especially, the drug repositioning and new usage of old drugs. Copy rights belong to original authors. Visit the link for more info

deep predictions target drug feature copy selection improved acc experimental ml interactions yu shi neural networks enzyme jiang cui biorxiv dnn smote xgboost gpcr

14 XGBoost: El Ganador de Muchas Competencias

Play Episode Listen Later Jul 26, 2020 19:15

XGBoost es una librería de software que es open-source y que ha ganado varias competencias de Machine Learning. XGBoost está basado en los principios de gradient booting, el cual a su vez está basado en las ideas de Leo Breiman, el creador de Random Forest. La teoría detrás de gradient boosting fue formalizada por Jerome H. Friedman. Gradient boosting combina modelos simples y utiliza ingeniería muy inteligente la cual incluye una penalización para los árboles y un encogimiento proporcional para los nodos hoja.

muchas machine learning friedman competencias gradient el ganador xgboost random forest

14 XGBoost: The Winner of Many Competitions

Machine Learning with Coffee

Play Episode Listen Later Jul 26, 2020 13:50

XGBoost is an open-source software library which has won several Machine Learning competitions in Kaggle. It is based on the principles of gradient boosting, which is based on the ideas of the Leo Breiman, the creator of Random Forest. The theory behind gradient boosting was later formalized by Jerome H. Friedman. Gradient boosting combines weak learners just as Random Forest. XGBoost is an engineering implementation which includes a clever penalization of trees and a proportional shrinking of leaf nodes.

winner machine learning friedman competitions gradient kaggle xgboost random forest

Logistic regression vs xgboost

Machine learning

Play Episode Listen Later Jul 16, 2020 18:56

Boston has become popular in kaggle competitions but is it better than logistic regression

xgboost logistic regression

Adversarial Examples, Protein Folding, and Shapley Values

Journal Club

Play Episode Listen Later Apr 28, 2020 45:57

George dives into his blog post experimenting with Scott Lundberg's SHAP library. By training an XGBoost model on a dataset about academic attainment and alcohol consumption can we develop a global interpretation of the underlying relationships? Lan leads the discussion of the paper Adversarial Examples Are Not Bugs, They Are Features by Ilyas and colleagues. This papers proposes a new perspective on adversarial susceptibility of machine learning models by teasing apart the 'robust' and the 'non-robust' features in a dataset. The authors summarizes the key take away message as "Adversarial vulnerability is a direct result of the models’ sensitivity to well-generalizing, ‘non-robust’ features in the data." Last but not least, Kyle discusses Alphafold!

values lan adversarial alphafold ilyas shap shapley protein folding xgboost

Episode 11: XGBoost special

AWS AI & Machine Learning Podcast

Play Episode Listen Later Feb 24, 2020 18:50

In this episode, I talk about XGBoost 1.0, a major milestone for this very popular algorithm. Then, I discuss the three options you have for running XGBoost on Amazon SageMaker: built-in algo, built-in framework, and bring your own container. Code included, of course!⭐️⭐️⭐️ Don't forget to subscribe to be notified of future episodes ⭐️⭐️⭐️Additional resources mentioned in the podcast:* XGBoost built-in algo: https://gitlab.com/juliensimon/ent321* XGBoost built-in framework: https://gitlab.com/juliensimon/dlnotebooks/-/blob/master/sagemaker/09-XGBoost-script-mode.ipynb * BYO with Scikit-learn: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb * Deploying XGBoost with mlflow: https://youtu.be/jpZSp9O8_ew * New model format: https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html* Converting pickled models: https://github.com/dmlc/xgboost/blob/master/doc/python/convert_090to100.py This podcast is also available in video at https://youtu.be/w0F4z0dMdzI.For more content, follow me on:* Medium https://medium.com/@julsimon* Twitter https://twitter.com/@julsimon

code medium converting byo amazon sagemaker xgboost scikit

Episode 35: IBM Lale

Open Source Directions hosted by Quansight

Play Episode Listen Later Jan 24, 2020

Lale is a Python library for semi-automated data science. Lale makes it easy to automatically select algorithms and tune hyperparameters of pipelines that are compatible with scikit-learn, in a type-safe fashion. If you are a data scientist who wants to experiment with automated machine learning, this library is for you! Lale adds value beyond scikit-learn along three dimensions: automation, correctness checks, and interoperability. For automation, Lale provides a consistent high-level interface to existing pipeline search tools including GridSearchCV, SMAC, and Hyperopt. For correctness checks, Lale uses JSON Schema to catch mistakes when there is a mismatch between hyperparameters and their type, or between data and operators. And for interoperability, Lale has a growing library of transformers and estimators from popular libraries such as scikit-learn, XGBoost, PyTorch etc. Lale can be installed just like any other Python package and can be edited with off-the-shelf Python tools such as Jupyter notebooks.

open python roadmaps lale pytorch smac jupyter xgboost

Episode 27: Feature Engineering with Ben Fowler

DataCast

Play Episode Listen Later Jan 24, 2020 60:50

Show Notes:(2:17) Ben talked about his past career working in the golf industry - working at the National Golf Foundation and the PGA of America.(4:12) Ben discussed about his first exposure to machine learning and data science.(5:06) Ben talked about his motivation for pursuing an online Master’s degree in Data Science at Southern Methodist University.(6:02) Ben emphasized the importance of a Data Mining course that he took.(8:12) Ben discussed his job as a Senior Data Scientist at CarePredict, an AI elder care platform that helps senior live independently, economically, and longer.(8:56) Ben shared his thought about data security, the biggest challenge of adopting machine learning in healthcare.(10:38) Ben talked about his next employer JM Family Enterprises, one of the largest companies in the automotive industry.(12:44) Ben walked through the end-to-end model development process to solve various problems of interests in his Data Scientist work at JM Family Enterprises.(14:15) Ben discussed the challenges around feature engineering and model experiments in this process.(18:09) Ben shared information about his current role as Machine Learning Technical Lead at Southeast Toyota Finance.(19:29) Ben talked about his passion to do IC data science work.(22:37) Ben went over different conferences he has been / will be at.(26:03) Ben shared the best practices/techniques/libraries to do efficient feature engineering and feature selection, as presented at Palm Beach Data Science Meetup in September 2018 and PyData Miami in January 2019.(29:27) Ben talked about the importance of doing exploratory data analysis and logging experiments before engaging in any feature engineering / selection work.(32:50) Ben shared his experiments performing data science for Fantasy Football - specifically using machine learning to predict the future performance of players, from his talk at the Palm Beach Data Science Meetup last year.(37:25) Ben talked about his experience using H2O AutoML.(40:07) Ben gave a glimpse of his talks about evaluating traditional and novel feature selection approaches at PyData LA and Strata Data Conf.(51:25) Ben gave his advice for people who are interested in speaking at conferences.(52:29) Ben shared his thoughts about the tech and data community in the greater Miami area.(53:16) Closing Segment.His Contact Info:LinkedInHis Recommended Resources:MLflow from DatabricksStreamlit LibraryPyData ConferenceH2O World ConferenceO’Reilly Strata Data and AI ConferenceREWORK Summit ConferencePandas LibraryXGBFir Librarytsfresh LibraryLending Club DatasetSHAP library from Scott Lundberg"Interpretable Machine Learning with XGBoost" by Scott LundbergAmazon SageMakerGoogle Cloud AutoMLH2O AutoMLWes McKinney’s "Python for Data Analysis"

america ai master miami conference fantasy football pga python data science ic data scientists southern methodist university data analysis data mining senior data scientist automl interpretability xgboost feature engineering national golf foundation ben fowler

Episode 27: Feature Engineering with Ben Fowler

Datacast

Play Episode Listen Later Jan 23, 2020 60:50

Episode 8 - Mucking around and making things work Ft. Tom Liptrot

PyDataMCR

Play Episode Listen Later Sep 24, 2019 47:08

Welcome to PyDataMCR episode 8 , today we are talking to Tom Liptrot who is a Consultant Data Scientist at Ortom, a Manchester based Data Science Consultancy. We talk about how flexibility at work can lead to great data products, some various meetups Tom will be attending and even some stand up comedy. Show Notes Sponsors Arctic Shores - arcticshores.com/ Cathcart Associates - cathcartassociates.com/ Meetups Pydatamcr - https://www.meetup.com/PyData-Manchester/ Macnml - https://www.mancml.io/ Chester Data Insights - https://www.meetup.com/Chester-Data-Insights/ AiFrenzy - labs.uk.barclays/ai Crap talks - https://www.meetup.com/CRAP-Talks-CRO-Analytics-Product-Manchester/ Bright festival - http://www.brightclub.org/ Tom Ortom Data Science Consultancy - https://ortom.co.uk/ DSF blog post - https://ortom.co.uk/2019/03/22/data-fest-2019.html Packages Tidyverse - https://www.tidyverse.org/ Pandas - https://pandas.pydata.org/ dplyr - https://dplyr.tidyverse.org/ brms - https://cran.r-project.org/web/packages/brms/index.html pymc3 - https://docs.pymc.io/ XGBoost - https://cran.r-project.org/web/packages/xgboost/index.html glmnet - https://cran.r-project.org/web/packages/glmnet/index.html keras - https://keras.io/ Book recommendation Andrew Gelman - https://www.amazon.com/Red-State-Blue-Rich-Poor/dp/0691143935 Who you admire Demis Hassabis - twitter.com/demishassabis?lang=en Jeff Dean - en.m.wikipedia.org/wiki/Jeff_Dean_(computer_scientist) Hadley Wickham - http://hadley.nz/ Hannah Fry - http://www.hannahfry.co.uk/ Andy Clark - https://twitter.com/fluffycyborg?lang=en Karl J. Friston - en.m.wikipedia.org/wiki/Karl_J._Friston Martin Eastwood - http://www.pena.lt/y/blog.html Peak.ai - https://peak.ai/ Kayle Haynes - https://twitter.com/KayleaHaynes Chris boddington - https://www.linkedin.com/in/christopher-boddington-449555112/

manchester pandas making things dsf andy clark hannah fry mucking jeff dean friston xgboost hadley wickham karl j friston

KISS Java EE, MicroProfile, AI, (Deep) Machine Learning

airhacks.fm podcast with adam bien

Play Episode Listen Later Aug 10, 2019 81:40

An airhacks.fm conversation with Pavel Pscheidl (@PavelPscheidl) about: Pentium 1 with 12, 75 MHz, first hello world with 17, Quake 3 friend as programming coach, starting with Java 1.6 at at the university of Hradec Kralove, second "hello world" with Operation Flashpoint, the third "hello world" was a Swing Java application as introduction to object oriented programming, introduction to enterprise Java in the 3rd year at the university, first commercial banking Java EE 6 / WebLogic project in Prague with mobile devices, working full time during the study, the first Java EE project was really successful, 2 month development time, one DTO, nor superfluous layers, using enunciate to generate the REST API, CDI and JAX-RS are a strong foundation, the first beep, fast JSF, CDI and JAX-RS deployments, the first beep, the War of Frameworks, pragmatic Java EE, "no frameworks" project at telco, reverse engineering Java EE, getting questions answered at airhacks.tv, working on PhD and statistics, starting at h2o.ai, h2o is a sillicon valley startup, h2o started as a distributed key-value store with involvement of Cliff Click, machine learning algorithms were introduced on top of distributed cache - the advent of h2o, h2o is an opensource company - see github, Driverless AI is the commercial product, Driverless AI automates cumbersome tasks, all AI heavy lifting is written in Java, h2o provides a custom java.util.Map implementation as distributed cache, random forest is great for outlier detection, the computer vision library openCV, Gradient Boosting Machine (GBM), the opensource airlines dataset, monitoring Java EE request processing queues with GBM, Generalized Linear Model (GLM), GBM vs. GLM, GBM is more explained with the decision tree as output, XGBoost, at h2o XGBoost is written in C and comes with JNI Java interface, XGBoost works well on GPUs, XGBoost is like GBM but optimized for GPUs, Word2vec, Deep Learning (Neural Networks), h2o generates a directly usable archive with the trained model -- and is directly usable in Java, K-Means, k-means will try to find the answer without a teacher, AI is just predictive statistics on steroids, Isolation Random Forest, IRF was designed for outlier detection, and K-Means was not, Naïve Bayes Classifier is rarely used in practice - it assumes no relation between the features, Stacking is the combination of algorithms to improve the results, AutoML: Automatic Machine Learning, AutomML will try to find the right combination of algorithms to match the outcome, h2o provides a set of connectors: csv, JDBC, amazon S3, Google Cloud Storage, applying AI to Java EE logs, the amount of training data depends on the amount of features, for each feature you will need approx. 30 observations, h2o world - the conference, cancer prediction with machine learning, preserving wildlife with AI, using AI for spider categorization Pavel Pscheidl on twitter: @PavelPscheidl, Pavel's blog: pavel.cool

ai deep phd war kiss machine learning map prague java frameworks stacking pavel quake s3 gpus cdi mhz gbm rest apis pentium dto irf glm jsf opencv xgboost java ee word2vec jdbc weblogic k means operation flashpoint hradec kralove

SDS 283: Getting The Most Out of Data With Gradient Boosting

airbnb algorithms real world tan oculus xgboost

Play Episode Listen Later Jul 31, 2019 62:25

In this episode of the SuperDataScience Podcast, I chat with one of the key people behind the Python package scikit-learn, Andreas Mueller. You will learn about gradient boosting algorithms, XGBoost, LightGBM and HistGradientBoosting. You will hear Andreas's approach to solving problems, what machine learning algorithms he prefers to apply to a given data science challenge, in which order and why. You will also hear about problems with Kaggle competitions. You will find out the four key questions that Andreas recommends to ask when you have a data challenge in front of you. You will learn about his 95% rule to creating models, and creating success in business enterprises with the help of machine learning. And, finally, you will also learn about the Data Science Institute at Columbia University. If you enjoyed this episode, check out show notes, resources, and more at www.superdatascience.com/283

data columbia university andreas boosting python gradient kaggle data science institute xgboost andreas mueller

Tan Vachiramon - Choosing the right algorithm for your real-world problem

Towards Data Science

Play Episode Listen Later Jul 15, 2019 43:21

You import your data. You clean your data. You make your baseline model. Then, you tune your hyperparameters. You go back and forth from random forests to XGBoost, add feature selection, and tune some more. Your model’s performance goes up, and up, and up. And eventually, the thought occurs to you: when do I stop? Most data scientists struggle with this question on a regular basis, and from what I’ve seen working with SharpestMinds, the vast majority of aspiring data scientists get the answer wrong. That’s why we sat down with Tan Vachiramon, a member of the Spatial AI team Oculus, and former data scientist at Airbnb. Tan has seen data science applied in two very different industry settings: once, as part of a team whose job it was to figure out how to understand their customer base in the middle of a the whirlwind of out-of-control user growth (at Airbnb); and again in a context where he’s had the luxury of conducting far more rigorous data science experiments under controlled circumstances (at Oculus). My biggest take-home from our conversation was this: if you’re interested in working at a company, it’s worth taking some time to think about their business context, because that’s the single most important factor driving the kind of data science you’ll be doing there. Specifically: Data science at rapidly growing companies comes with a special kind of challenge that’s not immediately obvious: because they’re growing so fast, no matter where you look, everything looks like it’s correlated with growth! New referral campaign? “That definitely made the numbers go up!” New user onboarding strategy? “Wow, that worked so well!”. Because the product is taking off, you need special strategies to ensure that you don’t confuse the effectiveness of a company initiative you’re interested in with the inherent viral growth that the product was already experiencing. The amount of time you spend tuning or selecting your model, or doing feature selection, entirely depends on the business context. In some companies (like Airbnb in the early days), super-accurate algorithms aren’t as valuable as algorithms that allow you to understand what the heck is going on in your dataset. As long as business decisions don’t depend on getting second-digit-after-the-decimal levels of accuracy, it’s okay (and even critical) to build a quick model and move on. In these cases, even logistic regression often does the trick! In other contexts, where tens of millions of dollars depend on every decimal point of accuracy you can squeeze out of your model (investment banking, ad optimization), expect to spend more time on tuning/modeling. At the end of the day, it’s a question of opportunity costs: keep asking yourself if you could be creating more value for the business if you wrapped up your model tuning now, to work on something else. If you think the answer could be yes, then consider calling model.save() and walking away.

#60 – XGBoost: A Scalable Tree Boosting System

Misreading Chat

Play Episode Listen Later Apr 24, 2019

GBDT の実装である XGBoost とかのコードを森田が読みます。

system tree boosting scalable xgboost

cuDF, cuML & RAPIDS: GPU Accelerated Data Science with Paul Mahler - TWiML Talk #254

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Apr 19, 2019 38:13

Today we're joined by Paul Mahler, senior data scientist and technical product manager for machine learning at NVIDIA. In our conversation, Paul and I discuss NVIDIA's RAPIDS open source project, which aims to bring GPU acceleration to traditional data science workflows and machine learning tasks. We dig into the various subprojects like cuDF and cuML that make up the RAPIDS ecosystem, as well as the role of lower-level libraries like mlprims and the relationship to other open-source projects like Scikit-learn, XGBoost and Dask. The complete show notes for this episode can be found at https://twimlai.com/talk/254. Visit twimlai.com/gtc19 for more from our GTC 2019 series. To learn more about Dell Precision workstations, and some of the ways they’re being used by customers in industries like Media and Entertainment, Engineering and Manufacturing, Healthcare and Life Sciences, Oil and Gas, and Financial services, visit Dellemc.com/Precision.

An Interpretable Machine Learning Model of Biological Age

conference machine learning sns xgboost pydata

Play Episode Listen Later Mar 22, 2019 52:26

When we launched the Blood Chemistry Calculator (BCC) in early 2018 we couldn’t have predicted the changes the software would undergo or the projects it would lead to. One such project has been researching and writing a scientific paper on the use of machine learning to predict and interpret biological age. The paper is currently in the peer review process on F1000Research, an open research publishing platform. In this podcast, I talk with lead author Dr. Tommy Wood, MD, PhD, about the importance of knowing your biological age and understanding how it can be derived from basic blood chemistry markers. Tommy and I discuss the peer-review process and the changes we’re making to the software as a result of the feedback that’s been provided. We also discuss the individual markers that have the greatest impact on biological age, and how you can get a free predicted age report. Here’s the outline of this interview with Tommy Wood: [00:00:58] Tommy got bit by a snake. [00:02:38] Going to the doctor vs. changing lifestyle. [00:03:32] Iatrogenic antibiotic injury. [00:03:49] Antivenom: what it is, what it does and the side effects. [00:06:49] Snake oral microbiota. [00:10:23] Effects of antibiotics on gut. [00:13:29] DUTCH (Dried Urine Test for Comprehensive Hormones). [00:15:54] Our article: An interpretable machine model of biological age. [00:17:15] Why is biological age important? [00:19:12] Other tests of biological age; telomeres. [00:20:31] Epigenetic testing. [00:20:59] Effects of environment on epigenetic methylation; Studies: Nilsson, Emma, and Charlotte Ling. "DNA methylation links genetics, fetal environment, and an unhealthy lifestyle to the development of type 2 diabetes." Clinical epigenetics 9.1 (2017): 105; and Yet, Idil, et al. "Genetic and environmental impacts on DNA methylation levels in twins." Epigenomics 8.1 (2016): 105-117. Effects of lifestyle change on epigenetic methylation; Studies: Arpón, Ana, et al. "Impact of consuming extra-virgin olive oil or nuts within a Mediterranean diet on DNA methylation in peripheral white blood cells within the PREDIMED-Navarra randomized controlled trial: A role for dietary lipids." Nutrients 10.1 (2018): 15; and Delgado-Cruzata, Lissette, et al. "Dietary modifications, weight loss, and changes in metabolic markers affect global DNA methylation in Hispanic, African American, and Afro-Caribbean breast cancer survivors." The Journal of nutrition 145.4 (2015): 783-790. [00:21:05] Epigenetic shifts and aging; Study: Pal, Sangita, and Jessica K. Tyler. "Epigenetics and aging." Science advances 2.7 (2016): e1600584. [00:21:48] Insilico Medicine - Deep Biomarkers of Human Aging: aging.ai. [00:22:46] Blood Chemistry Calculator (BCC). [00:23:33] Find out your biological age with the free partial BCC report. [00:24:04] How the biological age score is determined. [00:28:13] Why we published the paper. [00:28:40] Medscape article: Journal Editors on Peer Review, Paywalls, and Preprints. [00:31:26] F1000Research. [00:33:54] GitHub; XGBoost; Python. [00:35:32] The reviewers for the peer review process: Alex Zhavoronkov and Peter Fedichev. [00:39:10] Ideas that came out of the peer review process. [00:42:49] Shapley Values and SHAP plots. [00:43:51] Machine learning competition website: Kaggle. [00:45:20] The most important blood markers for predicting biological age. [00:48:02] Total cholesterol and BUN for predicting biological age. [00:50:48] Nourish Balance Thrive on Patreon; NBT Forum.

223: Ear Bleeding Pods (higepon)

Rebuild

Play Episode Listen Later Dec 11, 2018 77:27

Taro Minowa さんをゲストに迎えて、Kaggle, PayPay, Liberty Air, Switch, YouTube などについて話しました。 Show Notes Kaggle Titanic: Machine Learning from Disaster | Kaggle New York City Taxi Fare Prediction | Kaggle How to Win a Data Science Competition: Learn from Top Kagglers | Coursera XGBoost Akinator ロバート秋山&友近の国産洋画劇場「船と氷山」 PayPay LINE Payで飲み物を買う Google Home Hub Soundcore Anker Liberty Air UNDERTALE Wizard of Legend 弟者,兄者の「Wizard Of Legend」としょ子 games YouTube Premium

switch wizard jp bleeding pods undertale kaggle paypay akinator google home hub xgboost wizard of legend

7. 台風コンペ追い込み

regonn&curry.fm

Play Episode Listen Later Oct 23, 2018 45:06

PyData.tokyo One-day Conference 2018 - connpass に参加しました。どのセッションもとても勉強になった「kaggle のススメ」という LT を実施 NVIDIA この日は、「PyData.tokyo」、「TokyoR」、「JuliaTokyo」が同日開催という珍しい日だった。今週の１週間台風コンペ１０位まで上がりました。最近読んでる本マルチ・ポテンシャライト好きなことを次々と仕事にして、一生食っていく方法 https://amzn.to/2SdVr5C 一つのことに専念していくためにはどうすればいいかという本が多い中、この本は専門家で一つのことをやりとげるのではなく、色々と興味が移る人(マルチポテンシャライト)向けの話しどのような人生設計をすればいいのか 4 つの働き方(ワークモデル)を提示してくれているどの働き方も「お金」「意義」「多様性」を大事にしているグループハグアプローチスラッシュアプローチ自分はこれに一番近いニッチなテーマに魅力を感じるが、それをフルタイムでやろうとはせず、名詞の肩書が多くなるタイプ(フリーランスの掛け持ち) アインシュタインアプローチフェニックスアプローチ最近気になってる新しいサービスみずはのめ競艇予想人工知能 SNS 要素を入れていくみたい Qrunch 技術ブログを書くハードルを下げるサービスログという形で気軽に残せるクロス投稿で自分のブログ記事を投稿することもできる Qiita よりも良いデザインもしっかりしているタスクが Trello で公開している早速自分の Julia の記事をクロス投稿しておいた最近、個人がクオリティの高いサービスを出してくるようになったイメージ Julia の動き Gadfly.jl が v1.0 で動くようになった XGboost.jl も master ブランチが v1.0 対応したっぽい Take Kaggle's 2018 Machine Learning and Data Science Survey! やってるデータサイエンスの最前線がわかる調査収集データは CSV で公開される Kaggle Machine Learning & Data Science Survey 2017 Two Sigma せっかく Python で書くのでなにか学びながらやりたいベイズ統計で解いてみる(Stan.jl 等もあるので、Julia でも知見を活かせそう) Two Sigma のカーネルで Bayesian とかで検索しても出てこない。人気が無いのかこのコンペには適さないのか【Python と Stan で学ぶ】仕組みが分かるベイズ統計学入門で学んでる今日の一句アイスからホットに変はり松手入れ恋言

4. 技術書典5来てね

regonn&curry.fm

Play Episode Listen Later Oct 1, 2018 75:10

10/8 の技術書典サークル参加します「Kaggle のチュートリアル」と「代表１人の小さな合同会社の作り方」を販売予定グッズとして扇子も作成中 twitter でどういう文字・画像が良いか聞いたが反応なし note.mu の継続課金マガジンを購読している方には、扇子をプレゼント 10/8 の天気が気になる podcast 聞いていますで ○○ プレゼントをやりたい今週の一週間を簡単に振り返る台風コンペチームメンバーの家に行って打ち合わせをしたり、チームメンバーが使う、train.py と test.py を書いていました。海外旅行中に読んだ本(3 冊) 機械学習入門ボルツマン機械学習から深層学習まで以前ベイズ推定入門の本は読んでいたので、こちらも読みたかった作者は量子コンピュータの書籍もだしているので、ちょくちょくそこら辺の話題が出てくるシリーズは続いているみたい(公式ではないけど、本人 Twitter では呟いていた)で量子コンピュータの話とかも今後出てくるかも 10 年後の仕事図鑑古代ローマ市民好きなことに没頭発信が大事カレーちゃんが影響受けている人はいる？ホリエモン、西野亮廣、ゆうこす、ハヤカワ五味など若い方とか、違う分野の方に学べるようになりたいと思っている自分は落合陽一、森博嗣、土屋賢二の影響が大きい一生リバウンドしないパレオダイエットの教科書鈴木祐メンタリスト DaiGo 最近この二人の本を読んでて、色々と論文の内容を紹介してくれる量子コンピュータの手軽に触れるイジングエディタが出てきたイジングエディタ - Annealing Cloud Web ここやってるフィックスターズ株式会社は積極的に量子コンピュータを使った研究開発をやっていて勉強会とかも定期的に開催している次世代計算機講座 - connpass あとは、企業だと MDR が有名量子コンピュータアプリ（ゲートモデル＆アニーリングモデル） - connpass こちらは kaggler-ja みたいに Slack コミュニティがある MDR | 量子コンピュータ最近 Kaggle の新規コンペが増えてきた Julia v1.0.1 リリース Bugfix がメインリスナーからの質問最近 Kaggle を始めた者です。素晴らしいポッドキャストありがとうございます。（各エピソード 3 回ずつは聞きました。）GA コンペで R を使っているせいもあるのか、モデリングのトライ&エラーを高速で行うことができずに困っています。（XGBoost 回すだけで 1 時間はかかってしまいます。）何かいい方法は無いでしょうか？ Python の場合は、LightGBM の方が XGBoost よりも数倍はやい R の場合はどうなんだろう私の場合は早く回すために特に利いたのは次の３つ Learning Late を小さくし過ぎない多く分割しすぎない（3 分割ぐらい） GPU を使うクロスバリデーションや主成分分析でトライアンドエラーの時間を減らせるかも LightGBM 公式の Parameters Tuning にも、For Faster Speed という項目があるので参考にすると良いイベント情報 Workshop in VR #2 LT 会 - connpass 技術書典５ | 技術書典今日の一句流星やラジオの会話途切れたり恋言

ga vr workshop lt slack python gpu mdr kaggle daigo xgboost bugfix

The No Free Lunch Theorems

Data Skeptic

Play Episode Listen Later Mar 9, 2018 27:25

What's the best machine learning algorithm to use? I hear that XGBoost wins most of the Kaggle competitions that aren't won with deep learning. Should I just use XGBoost all the time? That might work out most of the time in practice, but a proof exists which tells us that there cannot be one true algorithm to rule them.

kaggle no free lunch theorems xgboost

Machine Learning for Arrhythmia Detection

Play Episode Listen Later Dec 20, 2017 40:40

Dr. Gari Clifford, DPhil has been studying artificial intelligence (AI) and its utility in healthcare for two decades. He holds several prestigious positions in academia and is an Associate Professor of Biomedical Informatics at Emory University and an Associate Professor of Biomedical Engineering at Georgia Institute of Technology. We met him at the San Francisco Data Institute Conference in October where he chaired sessions on Machine Learning and Health. Gari recently held a competition challenging data scientists to develop predictive algorithms for the early detection of Atrial Fibrillation, using mobile ECG machines. He shares insight into the complexity of using AI to diagnose health conditions and offers a glimpse into the future of healthcare and medical information. Here’s the outline of this interview with Gari Clifford: [00:01:07] The road to machine learning and mobile health. [00:01:27] Lionel Tarassenko: neural networks and artificial intelligence. [00:03:36] San Francisco Data Institute Conference. [00:03:54] Jeremy Howard at fast.ai. [00:04:17] Director of Data Institute David Uminsky. [00:05:05] Dr. Roger Mark, Computing in Cardiology PhysioNet Challenges. [00:05:23] 2017 Challenge: Detecting atrial fibrillation in electrocardiograms. [00:05:44] Atrial Fibrillation. [00:06:08] KardiaMobile EKG monitor by AliveCor. [00:06:33] Random forests, support vector machines, heuristics, deep learning. [00:07:23] Experts don't always agree. [00:08:33] Labeling ECGs: AF, normal sinus rhythm, another rhythm, or noisy. [00:09:07] 20-30 experts are required to discern a stable diagnosis. [00:09:40] Podcast: Arrhythmias in Endurance Athletes, with Peter Backx, PhD. [00:11:17] Applying additional algorithm on top of all final algorithms: improved score from 83% to 87% accuracy. [00:11:38] Kaggle for machine learning competitions. [00:13:44] Overfitting an algorithm increases complexity, decreases utility. [00:15:01] 10,000 ECGs are not enough. [00:16:24] Podcast: How to Teach Machines That Can Learn with Dr. Pedro Domingos. [00:16:50] XGBoost. [00:19:18] Mechanical Turk. [00:20:08] QRS onset and T-wave offset. [00:21:31] Galaxy Zoo. [00:24:00] Podcast: Jason Moore of Elite HRV. [00:24:34] Andrew Ng. Paper: Rajpurkar, Pranav, et al. "Cardiologist-level arrhythmia detection with convolutional neural networks." arXiv preprint arXiv:1707.01836 (2017). [00:28:44] Detecting arrhythmias using other biomarkers. [00:30:41] Algorithms trained on specific patient populations not accurate for other populations. [00:31:24] Propensity matching. [00:31:55] Should we be sharing our medical data? [00:32:15] Privacy concerns associated with sharing medical data. [00:32:44] Mass scale research: possible with high-quality data across a large population. [00:33:04] Selling social media data in exchange for useful or entertaining software. [00:33:42] Who touched my medical data and why? [00:36:31] Siloing data, perhaps to protect the current industries. [00:37:03] Health Insurance Portability and Privacy Act (HIPPA). [00:37:34] Fast Healthcare Interoperability Resources (FHIR) protocol. [00:37:48] Microsoft HealthVault and Google Health. [00:38:46] Blockchain and 3blue1brown. [00:39:28] Where to go to learn more about Gari Clifford. [00:39:53] Presentation: Machine learning for FDA-approved consumer level point of care diagnostics – the wisdom of algorithm crowds: (the PhysioNet Computing in Cardiology Challenge 2017).

How to Test and Predict Blood, Urine and Stool for Health, Longevity and Performance

Play Episode Listen Later Jan 6, 2017 62:50

Dr Tim Gerstmar practices Naturopathic Medicine at his Redmond, WA office, Aspire Natural Health. He specialises in working with people with digestive and autoimmune problems, and has worked with many of the most difficult to treat situations using a blend of natural and conventional medicine. He treats patients locally, throughout the US and as far away as the Qatar, Korea and Australia. In this interview, Dr Gerstmar discusses the tests he most commonly uses, especially for gastrointestinal complaints. We also talk about strategies for dealing with health insurance and tips for keeping costs down. These scatter plots, sometimes called calibration plots, are the ones I mentioned in the podcast. On the x-axis is what my XGBoost model predicted for previously unseen data, the y-axis represents the measured value. When the dot appears on the diagonal line, the prediction was perfect. The model was trained using results from just 260 athletes. My hope is that is these models will eventually bring down the cost of our full program by allowing us to predict the results of an expensive test using a cheaper one. Here’s the outline of this interview with Dr Tim Gerstmar: [00:00:15] First podcast: Methylation and Environmental Pollutants with Dr. Tim Gerstmar. [00:00:56] Bob McRae podcast: How to Use Biomedical Testing for IRONMAN Performance. [00:01:24] Our Elite Performance Programme. [00:04:04] How much testing should we do? [00:04:35] Factoring in lifetime costs. [00:08:33] Donating blood to the Red Cross. [00:09:44] Iron disorders: ferritin and haemochromatosis. [00:10:53] Therapeutic phlebotomy. [00:14:22] Treating symptoms is sometimes necessary. [00:15:07] Steroids for eczema. [00:17:09] Adrenal dysregulation and thyroid dysfunction. [00:18:00] You can't feel high blood sugar in diabetes. [00:18:46] AIMed conference. [00:19:06] De Fauw, Jeffrey, et al. "Automated analysis of retinal imaging using machine learning techniques for computer vision." F1000Research 5 (2016). [00:20:38] 25-OH-D testing. See Optimizing Vitamin D for Athletic Performance. [00:22:01] Insurance interfering with testing. [00:23:29] Liberty HealthShare. [00:23:42] Affordable Care Act. [00:24:10] Direct Primary Care. [00:24:23] Health Share of Oregon. [00:25:08] Covered California. [00:25:32] Health Savings Account. [00:28:52] Genova GI Effects stool test. [00:29:27] BioHealth stool test. [00:29:49] Doctor's Data stool test. [00:30:13] Coeliac diagnosis. [00:30:50] Transglutaminase. [00:31:44] Genetic risk factors. [00:32:43] NCGS and FODMAPs. [00:34:22] Intestinal lymphoma. [00:37:47] Normal test results are still useful information. [00:38:45] Liver enzymes, e.g. ALT, AST and GGT. [00:39:19] CBC and CMP, Hs-CRP. [00:39:47] Testosterone and thyroid. [00:40:09] Genova SIBO test. [00:41:08] Organic acids by Genova and Great Plains. [00:41:55] Beware insurance with OATs. [00:43:23] Verifying your policy. [00:44:07] Mitochondrial function. [00:44:21] Nutrient deficiencies. [00:44:52] Neurotransmitters and brain function. [00:45:04] Oxidative stress. [00:45:12] Detox stress, GSH status. [00:45:38] Bacterial and yeast markers. [00:46:49] Cortisol testing--DUTCH. [00:48:45] Interview with Pedro Domingos: How to Teach Machines That Can Learn. [00:49:08] XGBoost. [00:50:33] Robb Wolf early adoption costs. [00:51:54] HRV. See Elite HRV podcast. [00:52:08] Supplement companies and self-assessment questionnaires. [00:53:10] Arabinose. [00:53:48] Hallucinating from noise in the data. [00:54:52] Big Data. [00:56:03] Abnormality detection. [00:56:45] Functional versus pathological lab ranges. [00:57:46] Mark Newman. See cortisol testing above. [00:58:23] Adjusted reference ranges. [00:58:45] Vanity sizing. [01:00:52] Thyroid cancer and proximity to a mine. [01:01:21] Aspire Natural Health podcast.

australia interview doctors blood performance data oregon normal insurance dutch korea qatar wa organic treating detox big data alt functional cbc genetic predict steroids vanity liver automated supplement red cross therapeutic testosterone affordable care act thyroid cortisol redmond urine donating ast nutrient naturopathic medicine hrv aimed oats genova great plains athletic performance adjusted stool adrenal intestinal cmp bacterial neurotransmitters factoring health savings accounts mitochondrial verifying robb wolf direct primary care methylation abnormalities fodmaps oxidative hallucinating health longevity coeliac covered california gsh ggt mark newman oh d xgboost liberty healthshare ncgs biohealth transglutaminase gerstmar aspire natural health

How to Teach Machines That Can Learn