Podcasts about Project Jupyter

Nonprofit organization developing open-source software

21 podcasts
26 episodes
47 min average duration
1 new episode per month
Latest episode: Dec 1, 2025

Popularity chart, 2017 to 2024

Best podcasts about Project Jupyter

Google Cloud Platform Podcast

4 episodes with Project Jupyter

Rebuild

2 episodes with Project Jupyter

Latest podcast episodes about Project Jupyter

Teaching a Billion People to Code: How JupyterLite Is Scaling the Impossible

The New Stack Podcast

Dec 1, 2025 · 19:18

JupyterLite, a fully browser-based distribution of JupyterLab, is enabling new levels of global scalability in technical education. Developed by Sylvain Corlay's QuantStack team, it lets math and programming lessons run entirely in students' browsers, kernel included, without relying on Docker or cloud-scale infrastructure. Its most prominent success is Capytale, a French national deployment that supports half a million high school students and over 200,000 weekly sessions from essentially a single server, which hosts only teaching content while computation happens locally in each browser.

QuantStack, founded in 2016 as what Corlay calls an “accidental startup,” has since grown into a 30-person team contributing across Jupyter, conda-forge, and Apache Arrow. But JupyterLite embodies its most ambitious goal: making programming education accessible to countries with rapidly growing youth populations, such as Nigeria, where traditional cloud-hosted notebooks are impractical. Reaching a billion users will require advances in accessibility, collaboration, and browser-based package support, efforts that depend on grants and foundation backing.

Learn more from The New Stack about Project Jupyter:
From Physics to the Future: Brian Granger on Project Jupyter in the Age of AI
Jupyter AI v3: Could It Generate an ‘Ecosystem of AI Personas?’

Join our community of newsletter subscribers to stay on top of the news and at the top of your game.

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
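As a rough illustration of why one content server can be enough when every kernel runs in the browser, the Capytale session figure in the description works out to a modest average rate. This is back-of-the-envelope arithmetic only. The first line spreads sessions evenly over a full week. The second line assumes, hypothetically, that all sessions fall within a 40-hour school week; that figure does not come from the episode.

```latex
\frac{200{,}000\ \text{sessions/week}}{7 \times 24 \times 3600\ \text{s/week}}
  = \frac{200{,}000}{604{,}800} \approx 0.33\ \text{sessions/s}

\frac{200{,}000\ \text{sessions/week}}{40 \times 3600\ \text{s/week}}
  = \frac{200{,}000}{144{,}000} \approx 1.39\ \text{sessions/s}
```

In both cases the server mainly delivers static teaching content. Per the description, the computation itself then runs locally in each student's browser.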

From Physics to the Future: Brian Granger on Project Jupyter in the Age of AI

The New Stack Podcast

Nov 13, 2025 · 23:26

In an interview at JupyterCon, Brian Granger, co-creator of Project Jupyter and senior principal technologist at AWS, reflected on Jupyter's evolution and on how AI is redefining open source sustainability. Drawing on the modular principles of physics, Granger and co-founder Fernando Pérez designed Jupyter with flexible, extensible components like the notebook format and the kernel message protocol. This architecture has endured as the ecosystem expanded from data science into AI and machine learning. Now AI is accelerating development itself: Granger described using an AI coding agent to rewrite Jupyter Server in Go, complete with tests, in just 30 minutes, a task once considered impossible. This shift challenges traditional notions of technical debt and could reshape how large open source projects evolve. Jupyter's 2017 ACM Software System Award placed it among computing's greats, but also underscored its global responsibility. Granger emphasized that sustaining Jupyter's mission (empowering human reasoning, collaboration, and innovation) remains the team's top priority in the AI era.

Learn more from The New Stack about the latest in Jupyter AI development:
Introduction to Jupyter Notebooks for Developers
Display AI-Generated Images in a Jupyter Notebook
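The interview names the notebook format and the kernel message protocol as two of Jupyter's modular components. The Python below is a rough sketch of what a single kernel message looks like. It uses only the standard library to build an `execute_request` with the header, parent header, metadata, and content layout described in the Jupyter messaging documentation, and signs it with HMAC-SHA256. This is an illustration, not a working client. The protocol version string and the key are placeholder assumptions, and binary buffers and ZeroMQ routing identities are left out. Real code would use `jupyter_client`.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

# Assumption: any 5.x protocol version string; real clients take this from jupyter_client.
PROTOCOL_VERSION = "5.3"

# Frame that separates ZeroMQ routing identities from the signed message parts.
DELIMITER = b"<IDS|MSG>"


def make_execute_request(code: str, session: str, username: str = "user") -> dict:
    """Build an execute_request using the header/parent_header/metadata/content layout."""
    header = {
        "msg_id": uuid.uuid4().hex,  # unique per message
        "session": session,          # constant for one client session
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": PROTOCOL_VERSION,
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    # A request has no parent; the kernel's replies put this header in their parent_header.
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


def to_wire_frames(msg: dict, key: bytes) -> list[bytes]:
    """Serialize to multipart frames: delimiter, HMAC-SHA256 hex signature, four JSON parts."""
    parts = [json.dumps(msg[name]).encode("utf-8")
             for name in ("header", "parent_header", "metadata", "content")]
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)  # the signature covers the serialized parts, in order
    return [DELIMITER, mac.hexdigest().encode("ascii"), *parts]


if __name__ == "__main__":
    msg = make_execute_request("1 + 1", session=uuid.uuid4().hex)
    frames = to_wire_frames(msg, key=b"key-from-connection-file")
    print(frames[0], len(frames))  # → b'<IDS|MSG>' 6
```

The message is plain JSON plus a signature, so the same layout works no matter which language the kernel on the other end is written in.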

Notebooks = Chat++ and RAG = RecSys! — with Bryan Bischof of Hex Magic

Latent Space: The AI Engineer Podcast - CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Nov 29, 2023 · 51:54

Catch us at Modular's ModCon next week with Chris Lattner, and join our community!

Bryan's data science and AI experience spans Blue Bottle (!), Stitch Fix, Weights & Biases, and now Hex Magic, so this episode can be considered a two-parter.

Notebooks = Chat++

We've talked a lot about AI UX (in our meetups, writeups, and guest posts), and today we're excited to dive into a new old player in AI interfaces: notebooks! Depending on your background, you either Don't Like or you Like notebooks. They are the most popular example of Knuth's Literate Programming concept: basically a collection of cells, where each cell can execute code, display its output, and share its state with all the other cells in the notebook. Cells can also simply be Markdown, to add commentary to the analysis. Notebooks have a long history, but they most recently became popular as IPython evolved into Project Jupyter. A wave of notebook-based startups, from Observable to Deepnote and Databricks, then sprang up for the modern data stack.

The first wave of AI applications has been very chat-focused (ChatGPT, Character.ai, Perplexity, etc.). Chat as a user interface has a few shortcomings, the major one being the inability to edit previous messages. We enjoyed Bryan's takes on why notebooks feel like “Chat++” and how they are building Hex Magic:

* Atomic actions vs. stream of consciousness: in a chat interface, you make corrections by adding more messages to a conversation (e.g. “Can you try again by doing X instead?” or “I actually meant XYZ”). The context can easily get messy and confusing for models (and humans!) to follow. A notebook's cell structure, on the other hand, allows users to go back to any previous cell and edit it without having to add new ones at the bottom.
* “Airlocks” for repeatability: one of the ideas they came up with at Hex is the “airlock”, a collection of cells that depend on each other and keep each other in sync. If you have a task like “Create a summary of my customers' recent purchases”, there are many sub-tasks to be done (look up the data, sum the amounts, write the text, etc.). Each sub-task will be in its own cell, and the airlock will keep them all in sync.
* Technical + non-technical users: previously you had to use Python / R / Julia to write notebook code, but with models like GPT-4, natural language is usually enough. Hex is also working on lowering the barrier to entry into notebooks for non-technical users, similar to how Code Interpreter does in ChatGPT. Notebooks obviously aren't new for developers (OpenAI Cookbooks are a good example), but they haven't had much adoption in less technical spheres. The shortcomings of chat UIs, combined with LLMs lowering the barrier to creating code cells, might make notebooks a much more popular UX going forward.

RAG = RecSys!

We also talked about the LLMOps landscape and why it's an “iron mine” rather than a “gold rush”:

I'll shamelessly steal [this] from a friend, Adam Azzam from Prefect. He says that [LLMOps] is more of like an iron mine than a gold mine in the sense of there is a lot of work to extract this precious, precious resource. Don't expect to just go down to the stream and do a little panning. There's a lot of work to be done. And frankly, the steps to go from this resource to something valuable is significant.

Some of my favorite takeaways:

* RAG as RecSys for LLMs: at its core, the goal of a RAG pipeline is finding the most relevant documents based on a task. This isn't very different from traditional recommendation system products that surface things for users. How can we apply old lessons to this new problem? Bryan cites fellow AIE Summit speaker and Latent Space Paper Club host Eugene Yan, who decomposes the retrieval problem into retrieval, filtering, and scoring/ranking/ordering. As AI Engineers increasingly find that long context has tradeoffs, they will also have to relearn the age-old lessons that vector search is NOT all you need and that a good “systems, not models” approach is essential to scalable, debuggable RAG. Good thing Bryan has just written the first O'Reilly book about modern RecSys, eh?
* Narrowing down evaluation: while “hallucination” is an easy term to throw around, the reality is more nuanced. Many model errors can be fixed automatically: is this JSON valid? If not, why? Is it just missing a closing brace? Checking and fixing these smaller issues before returning the response to the user is easier than fixing the model.
* Fine-tuning isn't all you need: when they first started building Magic, one of the discussions was around fine-tuning a model. In our episode with Jeremy Howard we talked about how fine-tuning can also lead to a loss of capabilities. In notebooks, you are often dealing with domain-specific data (e.g. purchases, orders, wardrobe composition, household items). The fact that a general model already understands that “items” are probably part of an “order” is really helpful. They have found that GPT-4 and GPT-3.5 Turbo were everything they needed to ship a great product, without having to fine-tune on notebooks specifically.

Definitely recommend listening to this one if you are interested in getting a better understanding of how to think about AI, data, and how we can use traditional machine learning lessons in large language models.
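The retrieval, filtering, and scoring/ranking decomposition credited to Eugene Yan can be sketched as three small stages. This is a minimal sketch under stated assumptions, not Hex's or anyone's production pipeline. A toy bag-of-words cosine similarity stands in for a real embedding model or vector index. The `year` field and `min_year` filter are hypothetical business rules. The recency bonus in the ranking stage is an invented scoring choice.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str
    year: int  # hypothetical metadata used by the filtering stage


def bag_of_words(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[Doc], k: int) -> list[tuple[float, Doc]]:
    """Stage 1 (recall): cheap similarity search, up to k candidates with nonzero score."""
    q = bag_of_words(query)
    scored = sorted(((cosine(q, bag_of_words(d.text)), d) for d in docs),
                    key=lambda pair: pair[0], reverse=True)
    return [(s, d) for s, d in scored[:k] if s > 0]


def filter_candidates(cands: list[tuple[float, Doc]], min_year: int) -> list[tuple[float, Doc]]:
    """Stage 2 (rules): drop candidates that violate hard constraints."""
    return [(s, d) for s, d in cands if d.year >= min_year]


def rank(cands: list[tuple[float, Doc]], current_year: int,
         recency_weight: float = 0.1) -> list[Doc]:
    """Stage 3 (precision): reorder survivors with a richer score (similarity + recency)."""
    def score(pair: tuple[float, Doc]) -> float:
        sim, doc = pair
        age = max(current_year - doc.year, 0)  # guard against future-dated docs
        return sim + recency_weight / (1 + age)
    return [d for _, d in sorted(cands, key=score, reverse=True)]


def rag_context(query: str, docs: list[Doc], k: int = 10, min_year: int = 2020,
                current_year: int = 2023, n: int = 3) -> list[Doc]:
    """Retrieve -> filter -> rank, then keep the top n documents for the prompt."""
    candidates = retrieve(query, docs, k)
    candidates = filter_candidates(candidates, min_year)
    return rank(candidates, current_year)[:n]


if __name__ == "__main__":
    corpus = [
        Doc("a", "monthly purchase totals per customer", 2023),
        Doc("b", "customer purchase history summary", 2019),
        Doc("c", "coffee pastry demand forecast", 2022),
        Doc("d", "recent customer purchases and order totals", 2022),
    ]
    top = rag_context("summary of recent customer purchases", corpus)
    print([doc.doc_id for doc in top])  # → ['d', 'a']
```

Keeping the stages separate mirrors the recommender-system habit the episode points to. Each stage can be inspected and debugged on its own. Here, for example, "b" is a strong retrieval match that is removed only by the filtering rule.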
The AI Pivot

For more Bryan, don't miss his fireside chat at the AI Engineer Summit.

Show Notes
* Hex Magic
* Bryan's new book: Building Recommendation Systems in Python and JAX
* Bryan's whitepaper about MLOps
* “Kitbashing in ML”, slides from his talk on building on top of foundation models
* “Bayesian Statistics The Fun Way” by Will Kurt
* Bryan's Twitter
* “Berkeley man determined to walk every street in his city”
* People:
  * Adam Azzam
  * Graham Neubig
  * Eugene Yan
  * Even Oldridge

Timestamps
* [00:00:00] Bryan's background
* [00:02:34] Overview of Hex and the Magic product
* [00:05:57] How Magic handles the complex notebook format to integrate cleanly with Hex
* [00:08:37] Discussion of whether to build vs. buy models: why Hex uses GPT-4 vs. fine-tuning
* [00:13:06] UX design for Magic with Hex's notebook format (aka “Chat++”)
* [00:18:37] Expanding notebooks to less technical users
* [00:23:46] The “Memex” as an exciting underexplored area: personal knowledge graph and memory augmentation
* [00:27:02] What makes for good LLMOps vs. MLOps
* [00:34:53] Building rigorous evaluators for Magic and best practices
* [00:36:52] Different types of metrics for LLM evaluation beyond just end task accuracy
* [00:39:19] Evaluation strategy when you don't own the core model that's being evaluated
* [00:41:49] All the places you can make improvements outside of retraining the core LLM
* [00:45:00] Lightning Round

Transcript

Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, Partner and CTO-in-Residence of Decibel Partners, and today I'm joined by Bryan Bischof. [00:00:15]

Bryan: Hey, nice to meet you. [00:00:17]

Alessio: So Bryan has one of the most thorough and impressive backgrounds we've had on the show so far. Lead software engineer at Blue Bottle Coffee, which if you live in San Francisco, you know a lot about. And maybe you'll tell us 30 seconds on what that actually means.
You worked as a data scientist at Stitch Fix, which used to be one of the premier data science teams out there. [00:00:38]

Bryan: It used to be. Ouch. [00:00:39]

Alessio: Well, no, no. Well, you left, you know, so how good can it still be? Then head of data science at Weights and Biases. You're also a professor at Rutgers and you're just wrapping up a new O'Reilly book as well. So a lot, a lot going on. Yeah. [00:00:52]

Bryan: And currently head of AI at Hex. [00:00:54]

Alessio: Let's do the Blue Bottle thing because I definitely want to hear what's the, what's that like? [00:00:58]

Bryan: So I was leading data at Blue Bottle. I was the first data hire. I came in to kind of get the data warehouse in order and then see what we could build on top of it. But ultimately I mostly focused on demand forecasting, a little bit of recsys, a little bit of sort of like website optimization and analytics. But ultimately anything that you could imagine sort of like a retail company needing to do with their data, we had to do. I sort of like led that team, hired a few people, expanded it out. One interesting thing was I was part of the Nestlé acquisition. So there was a period of time where we were sort of preparing for that and didn't know, which was a really interesting dynamic. Being acquired is a very not necessarily fun experience for the data team. [00:01:37]

Alessio: I build a lot of internal tools for sourcing at the firm and we have a small VC and data community of like other people doing it. And I feel like if you had a data feed into like the Blue Bottle in South Park, the Blue Bottle at the Hanahaus in Palo Alto, you can get a lot of secondhand information on the state of VC funding. [00:01:54]

Bryan: Oh yeah. I feel like the real source of alpha is just bugging a Blue Bottle. [00:01:58]

Alessio: Exactly. And what's your latest book about? [00:02:02]

Bryan: I just wrapped up a book with a coauthor, Hector Yee, called Building Production Recommendation Systems.
I'll give you the rest of the title because it's fun. It's in Python and JAX. And so for those of you that are like eagerly awaiting the first O'Reilly book that focuses on JAX, here you go. [00:02:17]

Alessio: Awesome. And we'll chat about that later on. But let's maybe talk about Hex and Magic before. I've known Hex for a while, I've used it as a notebook provider and you've been working on a lot of amazing AI-enabled experiences. So maybe run us through that. [00:02:34]

Bryan: So I too, before I sort of like joined Hex, saw it as this like really incredible notebook platform, sort of a great place to do data science workflows, quite complicated, quite ad hoc interactive ones. And before I joined, I thought it was the best place to do data science workflows. And so when I heard about the possibility of building AI tools on top of that platform, that seemed like a huge opportunity. In particular, I lead the product called Magic. Magic is really like a suite of sort of capabilities as opposed to its own independent product. What I mean by that is they are sort of AI enhancements to the existing product. And that's a really important difference from sort of building something totally new that just uses AI. It's really important to us to enhance the already incredible platform with AI capabilities. So these are things like the sort of obvious like Copilot-esque vibes, but also more interesting and dynamic ways of integrating AI into the product. And ultimately the goal is just to make people even more effective with the platform. [00:03:38]

Alessio: How do you think about the evolution of the product and the AI component? You know, even if you think about 10 months ago, some of these models were not really good on very math-based tasks. Now they're getting a lot better. I'm guessing a lot of your workloads and use cases are data analysis and whatnot. [00:03:53]

Bryan: When I joined, it was pre-GPT-4 and it was pre the sort of like new chat API and all that.
But when I joined, it was already clear that GPT was pretty good at writing code. And so when I joined, they had already executed on the vision of what if we allowed the user to ask a natural language prompt to an AI and have the AI assist them with writing code. So what that looked like when I first joined was it had some capability of writing SQL and it had some capability of writing Python and it had the ability to explain and describe code that was already written. Those very, what feel like now primitive capabilities, believe it or not, were already quite cool. It's easy to look back and think, oh, it's like kind of like Stone Age in these timelines. But to be clear, when you're building on such an incredible platform, adding a little bit of these capabilities feels really effective. And so almost immediately I started noticing how it affected my own workflow because ultimately, as sort of like an engineering lead, a lot of my responsibility is to be doing analytics to make data-driven decisions about what products we build. And so I'm actually using Hex quite a bit in the process of like iterating on our product. When I'm using Hex to do that, I'm using Magic all the time. And even in those early days, the amount that it sped me up, that it enabled me to very quickly like execute was really impressive. And so even though the models weren't that good at certain things back then, that capability was not to be underestimated. But to your point, the models have evolved between 3.5 Turbo and 4. We've actually seen quite a big enhancement in the kinds of tasks that we can ask Magic and even more so with things like function calling and understanding a little bit more of the landscape of agent workflows, we've been able to really accelerate. [00:05:57]

Alessio: You know, I tried using some of the early models in notebooks and it actually didn't like the ipynb formatting, kind of like a JSON plus XML plus all these weird things. How have you kind of tackled that?
Do you have some magic behind the scenes to make it easier for models? Like, are you still using completely off-the-shelf models? Do you have some proprietary ones? [00:06:19]

Bryan: We are using at the moment in production 3.5 Turbo and GPT-4. I would say for a large number of our applications, GPT-4 is pretty much required. To your question about, does it understand the structure of the notebook? And does it understand all of these somewhat complicated wrappers around the content that you want to show? We do our very best to abstract that away from the model and make sure that the model doesn't have to think about what the cell wrapper code looks like. Or for our Magic charts, it doesn't have to speak the language of Vega. These are things that we put a lot of work in on the engineering side, to the AI engineer profile. This is the AI engineering work to get all of that out of the way so that the model can speak in the languages that it's best at. The model is quite good at SQL. So let's ensure that it's speaking the language of SQL and that we are doing the engineering work to get the output of that model, the generations, into our notebook format. So too for other cell types that we support, including charts, and just in general, understanding the flow of different cells, understanding what a notebook is, all of that is hard work that we've done to ensure that the model doesn't have to learn anything like that. I remember early on, people asked the question, are you going to fine-tune a model to understand Hex cells? And almost immediately, my answer was no. No we're not. Using fine-tuned models in 2022, I was already aware that there are some limitations of that approach and frankly, even using GPT-3 and GPT-2 back in the day at Stitch Fix, I had already seen a lot of instances where putting more effort into pre- and post-processing can avoid some of these larger lifts. [00:08:14]

Alessio: You mentioned Stitch Fix and GPT-2.
How has the balance between build versus buy, so to speak, evolved? So GPT-2 was a model that was not super advanced, so for a lot of use cases it was worth building your own thing. With GPT-4 and the likes, is there a reason to still build your own models for a lot of this stuff? Or should most people be fine-tuning? How do you think about that? [00:08:37]

Bryan: Sometimes people ask, why are you using GPT-4 and why aren't you going down the avenue of fine-tuning today? I can get into fine-tuning specifically, but I do want to talk a little bit about the good old days of GPT-2. Shout out to Reza. Reza introduced me to GPT-2. I still remember him explaining the difference between general transformers and GPT. I remember one of the tasks that we wanted to solve with transformer-based generative models at Stitch Fix was writing descriptions of clothing. You might think, ooh, that's a multi-modal problem. The answer is, not necessarily. We actually have a lot of features about the clothes that are almost already enough to generate some reasonable text. I remember at that time, that was one of the first applications that we had considered. There was a really great team of NLP scientists at Stitch Fix who worked on a lot of applications like this. I still remember being exposed to the GPT endpoint back in the days of 2. If I'm not mistaken, and feel free to fact check this, I'm pretty sure Stitch Fix was the first OpenAI customer, unlike their true enterprise application. Long story short, I ultimately think that depending on your task, using the most cutting-edge general model has some advantages. If those are advantages that you can reap, then go for it. So at Hex, why GPT-4? Why do we need such a general model for writing code, writing SQL, doing data analysis? Shouldn't a fine-tuned model just on Kaggle notebooks be good enough? I'd argue no.
And ultimately, because we don't have one specific sphere of data that we need to write great data analysis workbooks for, we actually want to provide a platform for anyone to do data analysis about their business. To do that, you actually need to entertain an extremely general universe of concepts. So as an example, if you work at Hex and you want to do data analysis, our projects are called Hexes. That's relatively straightforward to teach it. There's a concept of a notebook. These are data science notebooks, and you want to ask analytics questions about notebooks. Maybe if you trained on notebooks, you could answer those questions, but let's come back to Blue Bottle. If I'm at Blue Bottle and I have data science work to do, I have to ask it questions about coffee. I have to ask it questions about pastries, doing demand forecasting. And so very quickly, you can see that just by serving just those two customers, a model purely fine-tuned on like Kaggle competitions may not actually fit the bill. And so the more and more that you want to build a platform that is sufficiently general for your customer base, the more I think that these large general models really pack a lot of additional opportunity in. [00:11:21]

Alessio: With a lot of our companies, we talked about stuff that you used to have to extract features for, now you have out of the box. So say you're a travel company, you want to do a query, like show me all the hotels and places that are warm during spring break. It would be just literally like impossible to do before these models, you know? But now the model knows, okay, spring break is like usually these dates and like these locations are usually warm. So you get so much out of it for free. And in terms of Magic integrating into Hex, I think AI UX is one of our favorite topics and how do you actually make that seamless.
In traditional code editors, the line of code is like kind of the atomic unit, and in Hex, you have the code, but then you have the cell also. [00:12:04]

Bryan: I think the first time I saw Copilot and really like fell in love with Copilot, I thought finally, fancy auto-complete. And that felt so good. It felt so elegant. It felt so right-sized for the task. But as a data scientist, a lot of the work that you do previous to the ML engineering part of the house, you're working in these cells and these cells are atomic. They're expressing one idea. And so ultimately, if you want to make the transition from something like this code, where you've got like a large amount of code and there's a large amount of files and they kind of need to have awareness of one another, and that's a long story and we can talk about that. But in this atomic, somewhat linear flow through the notebook, what you ultimately want to do is you want to reason with the agent at the level of these individual thoughts, these atomic ideas. Usually it's good practice in, say, a Jupyter notebook to not let your cells get too big. If your cell doesn't fit on one page, that's like kind of a code smell, like why is it so damn big? What are you doing in this cell? That also lends some hints as to what the UI should feel like. I want to ask questions about this one atomic thing. So you ask the agent, take this data frame and strip out this prefix from all the strings in this column. That's an atomic task. It's probably about two lines of pandas. I can write it, but it's actually very natural to ask Magic to do that for me. And what I promise you is that it is faster to ask Magic to do that for me. At this point, that kind of code, I never write. And so then you ask the next question, which is what should the UI be to do chains, to do multiple cells that work together? Because ultimately a notebook is a chain of cells and actually it's a first-class citizen for Hex.
So we have a DAG and the DAG is the execution DAG for the individual cells. This is one of the reasons that Hex is reactive and kind of dynamic in that way. And so the very next question is, what is the sort of like AI UI for these collections of cells? And back in June and July, we thought really hard about what does it feel like to ask Magic a question and get a short chain of cells back that execute on that task. And so we've thought a lot about sort of like how that breaks down into individual atomic units and how those are tied together. We introduced something which is kind of an internal name, but it's called the airlock. And the airlock is exactly a sequence of cells that refer to one another, understand one another, use things that are happening in other cells. And it gives you a chance to sort of preview what Magic has generated for you. Then you can accept or reject as an entire group. And that's one of the reasons we call it an airlock, because at any time you can sort of eject the airlock and see it in the space. But to come back to your question about how the AI UX fits into this notebook, ultimately a notebook is very conversational in its structure. I've got a series of thoughts that I'm going to express as a series of cells. And sometimes if I'm a kind data scientist, I'll put some text in between them too, explaining what on earth I'm doing. And that feels, in my opinion, and I think this is quite shared amongst exons, that feels like a really nice refinement of the chat UI. I've been saying for several months now, like, please stop building chat UIs. There is some irony because I think what the notebook allows is like chat plus plus. [00:15:36]

Alessio: Yeah, I think the first wave of everything was like chat with X. So it was like chat with your data, chat with your documents and all of this. But people want to code, you know, at the end of the day. And I think that goes into the end user.
I think most people that use notebooks are software engineers and data scientists. I think the cool thing about these models is like people that are not traditionally technical can do a lot of very advanced things. And that's why people like Code Interpreter and ChatGPT. How do you think about the evolution of that persona? Do you see a lot of non-technical people also now coming to Hex to like collaborate with like their technical folks? [00:16:13]

Bryan: Yeah, I would say there might even be more enthusiasm than we're prepared for. We're obviously like very excited to bring what we call the like low-floor user into this world and give more people the opportunity to self-serve on their data. We wanted to start by focusing on users who are already familiar with Hex and really make Magic fantastic for them. One of the sort of like internal, I would say almost North Stars is our team's charter is to make Hex feel more magical. That is true for all of our users, but that's easiest to do on users that are already able to use Hex in a great way. What we're hearing from some customers in particular is sort of like, I'm excited for some of my less technical stakeholders to get in there and start asking questions. And so that raises a lot of really deep questions. If you immediately enable self-service for data, which is almost like a joke over the last like maybe like eight years, if you immediately enabled self-service, what challenges does that bring with it? What risks does that bring with it? And so it has given us the opportunity to think about things like governance and to think about things like alignment with the data team and making sure that the data team has clear visibility into what the self-service looks like. Having been leading a data team, trying to provide answers for stakeholders and hearing that they really want to self-serve, a question that we often found ourselves asking is, what is the easiest way that we can keep them on the rails?
What is the easiest way that we can set up the data warehouse and set up our tools such that they can ask and answer their own questions without coming away with like false answers? Because that is such a priority for data teams, it becomes an important focus of my team, which is, okay, Magic may be an enabler. And if it is, what do we also have to respect? We recently introduced the data manager and the data manager is an auxiliary sort of like tool on the Hex platform to allow people to write more like relevant metadata about their data warehouse to make sure that Magic has access to the best information. And there are some things coming to kind of even further that story around governance and understanding. [00:18:37]

Alessio: You know, you mentioned self-serve data. And when it was like a joke, you know, the whole rush to the modern data stack was something to behold. Do you think AI is like in a similar space where it's like a bit of a gold rush? [00:18:51]

Bryan: I have like sort of two comments here. One I'll shamelessly steal from a friend, Adam Azzam from Prefect. He says that this is more of like an iron mine than a gold mine in the sense of there is a lot of work to extract this precious, precious resource. And that's the first one is I think, don't expect to just go down to the stream and do a little panning. There's a lot of work to be done. And frankly, the steps to go from this like gold to, or this resource to something valuable is significant. I think people have gotten a little carried away with the old maxim of like, don't go pan for gold, sell pickaxes and shovels. It's a much stronger business model. At this point, I feel like I look around and I see more pickaxe salesmen and shovel salesmen than I do prospectors. And that scares me a little bit. There's a metagame where people are starting to think about how they can build tools for people building tools for AI.
And that starts to give me a little bit of like pause in terms of like, how confident are we that we can even extract this resource into something valuable? I got a text message from a VC earlier today, and I won't name the VC or the fund, but the question was, what are some medium or large size companies that have integrated AI into their platform in a way that you're really impressed by? And I looked at the text message for a few minutes and I was finding myself thinking and thinking, and I responded, maybe only Copilot. It's been a couple hours now, and I don't think I've thought of another one. And I think that's where I reflect again on this, like iron versus gold. If it was really gold, I feel like I'd be more blown away by other AI integrations. And I'm not yet. [00:20:40]

Alessio: I feel like all the people finding gold are the ones building things that traditionally we didn't focus on. So like Midjourney. I talked to a company yesterday, which I'm not going to name, but they do agents for some use case, let's call it. They are 11 months old. They're making like 8 million a month in revenue, but in a space that you wouldn't even think about selling to. If you were like a shovel builder, you wouldn't even go sell to those people. And Swyx talks about this a bunch, about like actually trying to go application first for some things. Let's actually see what people want to use and what works. What do you think are the most maybe underexplored areas in AI? Is there anything that you wish people were actually trying to shovel? [00:21:23]

Bryan: I've been saying for a couple of months now, if I had unlimited resources and I was just sort of like truly like, you know, on my own building whatever I wanted, I think the thing that I'd be most excited about is building sort of like the personal Memex. The Memex is something that I've wanted since I was a kid. And are you familiar with the Memex? It's the memory extender.
And it's this idea that sort of like human memory is quite weak. And so if we can extend that, then that's a big opportunity. So I think one of the things that I've always found to be one of the limiting cases here is access. How do you access that data? Even if you did build that data out, how would you quickly access it? And I think there's a constellation of technologies that have come together in the last couple of years that now make this quite feasible. First, information retrieval has really improved, and we have a lot more simple systems for getting started with information retrieval. Second, natural language is ultimately the interface that you'd really like these systems to work on, both in terms of sort of like structuring the data and preparing the data, but also on the retrieval side. So what keys off the query for retrieval? Probably, ultimately, natural language. And third, if you really want to go into like the purely futuristic aspect of this, it is low-latency voice to text. And that is also something that has quite recently become possible. I did talk to a company recently called Gather, which seems to have some cool ideas in this direction, but I haven't yet seen what I really want, which is I want something that is sort of like every time I listen to a podcast or I watch a movie or I read a book, it sort of like has a great vector index built on top of all that information that's contained within. And then when I'm having my next conversation and I can't quite remember the name of this person who did this amazing thing, for example, if we're talking about the Memex, it'd be really nice to have Vannevar Bush like pop up on my, you know, on my Memex display, because I always forget Vannevar Bush's name. This is one time that I didn't, but I often do.
This is something that I think is only recently enabled and maybe we're still five years out before it can be good, but I think it's one of the most exciting projects that has become possible in the last three years that I think generally wasn't possible before. [00:23:46]Alessio: Would you wear one of those AI pendants that record everything? [00:23:50]Bryan: I think I'm just going to do it because I just like support the idea. I'm also admittedly someone who, when Google Glass first came out, thought that seems awesome. I know that there's like a lot of like challenges about the privacy aspect of it, but it is something that I did feel was like a disappointment to lose some of that technology. Fun fact, one of the early Google Glass developers was this MIT computer scientist who basically built the first wearable computer while he was at MIT. And he like took notes about all of his conversations in real time on his wearable and then he would have real time access to them. Ended up being kind of a scandal because he wanted to use a computer during his defense and they like tried to prevent him from doing it. So pretty interesting story. [00:24:35]Alessio: I don't know, but the future is going to be weird. I can tell you that much. Talking about pickaxes, what do you think about the pickaxes that people built before? Like the whole MLOps space, which has its own like startup graveyard in there. How are those products evolving? You know, you were at Weights and Biases before, which is now doing a big AI push as well. [00:24:57]Bryan: If you really want to like sort of like rub my face in it, you can go look at my white paper on MLOps from 2022. It's interesting. I don't think there are many things in that that I would these days think are like wrong or even sort of like naive. But what I would say is there are both a lot of analogies between MLOps and LLMOps, but there are also a lot of like key differences.
So like leading an engineering team at the moment, I think a lot more about good engineering practices than I do about good ML practices. That being said, it's been very convenient to be able to see around corners in a few of the like ML places. One of the first things I did at Hex was work on evals. This was in February. I wasn't overwhelmed by people talking about evals until about May. And the reason that I was able to be a couple of months early on that is because I've been building evals for ML systems for years. I don't know how else to build an ML system other than start with the evals. I teach my students at Rutgers like objective framing is one of the most important steps in starting a new data science project. If you can't clearly state what your objective function is and you can't clearly state how that relates to the problem framing, you've got no hope. And I think that is a very shared reality with LLM applications. Coming back to one thing you mentioned from earlier about sort of like the applications of these LLMs. To that end, the pickaxes I think are still very valuable are the ones for understanding systems that are inherently less predictable, that are inherently sort of experimental. On my engineering team, we have an experimentalist. So one of the AI engineers, his focus is experiments. That's something that you wouldn't normally expect to see on an engineering team. But it's important on an AI engineering team to have one person whose entire focus is just experimenting, trying, okay, this is a hypothesis that we have about how the model will behave. Or this is a hypothesis we have about how we can improve the model's performance on this. And then going in, running experiments, augmenting our evals to test it, et cetera. What I really respect are pickaxes that recognize the hybrid nature of the sort of engineering tasks. They are ultimately engineering tasks with a flavor of ML.
And so when systems respect that, I tend to have a very high opinion. One thing that I was very, very aligned with Weights and Biases on is sort of composability. These systems like ML systems need to be extremely composable to make them much more iterative. If you don't build these systems in composable ways, then your integration hell is just magnified. When you're trying to iterate as fast as people need to be iterating these days, I think integration hell is a tax not worth paying. [00:27:51]Alessio: Let's talk about some of the LLM native pickaxes, so to speak. So RAG is one. One thing is doing RAG on text data. One thing is doing RAG on tabular data. We're releasing tomorrow our episode with Cube, the semantic layer company. Curious to hear your thoughts on it. How are you doing RAG, pros, cons? [00:28:11]Bryan: It became pretty obvious to me almost immediately that RAG was going to be important. Because ultimately, you never expect your model to have access to all of the things necessary to respond to a user's request. So as an example, Magic users would like to write SQL that's relevant to their business. And it's important then to have the right data objects that they need to query. We can't expect any LLM to understand our users' data warehouse topology. So what we can expect is that we can build a RAG system that is data warehouse aware, data topology aware, and use that to provide really great information to the model. If you ask the model, how are my customers trending over time? And you ask it to write SQL to do that. What is it going to do? Well, ultimately, it's going to hallucinate the structure of that data warehouse that it needs to write a general query. Most likely what it's going to do is it's going to look in its sort of memory of Stack Overflow responses to customer queries, and it's going to say, oh, there's probably a customers table, and we're in the age of dbt, so it might even be called, you know, dim_customers or something like that.
And what's interesting is, and I encourage you to try, ChatGPT will do an okay job of like hallucinating up some tables. It might even hallucinate up some columns. But what it won't do is it won't understand the joins in that data warehouse that it needs, and it won't understand the data caveats or the sort of WHERE clauses that need to be there. And so how do you get it to understand those things? Well, this is textbook RAG. This is the exact kind of thing that you expect RAG to be good at augmenting. But I think people who have done a lot of thinking about RAG for the document case think of it as chunking and sort of like MapReduce and these sorts of approaches. But I think people haven't followed this train of thought quite far enough yet. Jerry Liu was on the show and he talked a little bit about thinking of this as like information retrieval. And I would push that even further. And I would say that ultimately RAG is just RecSys for LLMs. As I kind of already mentioned, I'm a little bit recommendation systems heavy. And so from the beginning, RAG has always felt like RecSys to me. It has always felt like you're building a recommendation system. And what are you trying to recommend? The best possible resources for the LLM to execute on a task. And so most of my approach to RAG and the way that we've improved Magic via retrieval is by building a recommendation system. [00:30:49]Alessio: It's funny, as you mentioned that you spent three years writing the book, the O'Reilly book. Things must have changed as you wrote the book. I don't want to bring out any nightmares from there, but what are the tips for people who want to stay on top of this stuff? Do you have any other favorite newsletters, like Twitter accounts that you follow, communities you spend time in? [00:31:10]Bryan: I am sort of an aggressive reader of technical books. I think I'm almost never disappointed by time that I've invested in reading technical manuscripts.
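Bryan's framing of RAG as a recommendation system can be sketched as ranking candidate resources (here, table descriptions) against the user's question and handing the top few to the model. This is a minimal illustration, not Hex's retrieval system: the `TABLE_DOCS` metadata and the `recommend_tables` helper are hypothetical, and plain bag-of-words cosine similarity stands in for the learned embeddings a real system would likely use.

```python
import math
import re
from collections import Counter

# Hypothetical warehouse metadata (illustrative only). In practice this might
# come from the warehouse's information schema plus human-written docs.
TABLE_DOCS = {
    "dim_customers": "one row per customer with signup date region and plan",
    "fct_orders": "one row per order with customer id order date and revenue",
    "fct_page_views": "web page views with session id url and timestamp",
}


def tokenize(text: str) -> Counter:
    """Lower-cased bag of words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_tables(question: str, k: int = 2) -> list:
    """Score every table doc against the question; return up to k names with a nonzero score."""
    query = tokenize(question)
    scored = []
    for name, doc in TABLE_DOCS.items():
        # Treat the table name itself (underscores split into words) as extra text.
        candidate = tokenize(name.replace("_", " ") + " " + doc)
        scored.append((cosine(query, candidate), name))
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score > 0]


print(recommend_tables("How are my customers trending over time?"))  # ['dim_customers']
```

The recommended table's description (and, in a richer version, its join keys and usual filters) is what would be placed into the prompt, which is the part the model cannot guess on its own.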
I find that most people write O'Reilly or similar books because they've sort of got this itch that they need to scratch, which is that I have some ideas, I have some understanding that was hard-won, I need to tell other people. And there's something that, from my experience, correlates between that itch and sort of like useful information. As an example, one of the people on my team, his name is Will Kurt, he wrote a book, Bayesian Statistics the Fun Way. I knew some Bayesian statistics, but I read his book anyway. And the reason was because I was like, if someone feels motivated to write a book called Bayesian Statistics the Fun Way, they've got something to say about Bayesian statistics. I learned so much from that book. That book is like technically like targeted at someone with less knowledge and experience than me. And boy, did it humble me about my understanding of Bayesian statistics. And so I think this is a very boring answer, but ultimately like I read a lot of books and I think that they're a really valuable way to learn these things. I also regrettably still read a lot of Twitter. There is plenty of noise in that signal, but ultimately it is still usually like one of the first directions to get sort of an instinct for what's valuable. The other comment that I want to make is we are in this age of sort of like arXiv becoming more of like an ad platform. I think that's a little challenging right now to kind of use it the way that I used to use it, which is for like higher signal. I've chatted a lot with a CMU professor, Graham Neubig, and he's been doing LLM evaluation and LLM enhancements for about five years, and no, I didn't misspeak. And I think talking to him has provided me a lot of like directionality for more believable sources. Trying to cut through the hype.
I know that there's a lot of other things that I could mention in terms of like just channels, but ultimately right now I think there's almost an abundance of channels and I'm a little bit more keen on high signal. [00:33:18]Alessio: The other side of it is like, I see so many people say, Oh, I just wrote a paper on X and it's like an article. And I'm like, an article is not a paper, but it's just funny how, I know we were kind of chatting before about terms being reinvented and like people that are not from this space kind of getting into AI engineering now. [00:33:36]Bryan: I also don't want to be gatekeepy. Actually I used to say a lot to people, don't be shy about putting your ideas down on paper. I think it's okay to just like kind of go for it. And I myself have something on arXiv that is like comically naive. It's intentionally naive. Right now I'm less concerned by more naive approaches to things than I am by the purely like advertising approach to sort of writing these short notes and articles. I think blogging still has a good place. And I remember getting feedback during my PhD thesis that like my thesis sounded more like a long blog post. And I now feel like that curmudgeonly professor who's also like, yeah, maybe just keep this to the blogs. That's funny. Alessio: Uh, yeah, I think one of the things that Swyx said when he was opening the AI Engineer Summit a couple of weeks ago was like, look, most people here don't know much about the space because it's so new, and like being open and welcoming, I think it's one of the goals. And that's why we try and keep every episode at a level that it's like, you know, the experts can understand and learn something, but also the novices can kind of like follow along. You mentioned evals before. I think that's one of the hottest topics obviously out there right now. What are evals? How do we know if they work? Yeah. What are some of the fun learnings from building them into Hex?
[00:34:53]Bryan: I said something at the AI Engineer Summit that I think a few people have already called out, which is like, if you can't get your evals to be sort of like objective, then you're not trying hard enough. I stand by that statement. I'm not going to, I'm not going to walk it back. I know that that doesn't feel super good because people want to think that like their unique snowflake of a problem is too nuanced. But I think this is actually one area where, you know, in this dichotomy of like, who can do AI engineering? And the answer is kind of everybody. Software engineering can become AI engineering and ML engineering can become AI engineering. One place where I think the more data-science-minded folks have an advantage is that we've gotten more practice in taking very vague notions and trying to put a like objective function around them. And so ultimately I would just encourage everybody who wants to build evals, just work incredibly hard on codifying what is good and bad in terms of these objective metrics. As far as like how you go about turning those into evals, I think it's kind of like sweat equity. Unfortunately, I told the CEO of Gantry several months ago, I think it's been like six months now, that I was sort of like looking at every single internal Hex request to Magic by hand with my eyes and sort of like thinking, how can I turn this into an eval? Is there a way that I can take this real request during this dogfooding, not very developed stage? How can I make that into an evaluation? That was a lot of sweat equity that I put in, a lot of like boring evenings, but I do think ultimately it gave me a lot of understanding for the way that the model was misbehaving. Another thing is how can you start to understand these misbehaviors as like auxiliary evaluation metrics? So there's not just one evaluation that you want to do for every request. It's easy to say like, did this work? Did this not work? Did the response satisfy the task?
But there's a lot of other metrics that you can pull off these questions. And so like, let me give you an example. If it writes SQL that references a table that isn't in the database it's supposed to be querying against, we would think of that as a hallucination. You could separately consider "is it a hallucination?" as a valuable metric. You could separately consider, does it get the right answer? The right answer is this sort of like all-in-one-shot evaluation that I think people jump to. But these intermediary steps are really important. I remember hearing that GitHub had thousands of lines of post-processing code around Copilot to make sure that their responses were sort of correct or in the right place. And that kind of sort of defensive programming against bad responses is the kind of thing that you can build by looking at many different types of evaluation metrics. Because you can say like, oh, you know, the Copilot completion here is mostly right, but it doesn't close the brace. Well, that's the thing you can check for. Or, oh, this completion is quite good, but it defines a variable that was like already defined in the file. Like that's going to have a problem. That's an evaluation that you could check separately. And so this is where I think it's easy to convince yourself that all that matters is does it get the right answer? But the more that you think about production use cases of these things, the more you find a lot of this kind of stuff. One simple example is like sometimes the model names the output of a cell with a variable that's already in scope. Okay. Like we can just detect that and like we can just fix that. And this is the kind of thing that like evaluations over time, and as you build these evaluations over time, you really can expand the robustness with which you trust these models. And for a company like Hex, where we need to put this stuff in GA, we can't just sort of like get to demo stage or even like private beta stage.
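The auxiliary metrics described above (a hallucinated table, an output name that clobbers a variable already in scope) can be sketched as separate checks that run alongside the "did it get the right answer?" eval. This is a rough sketch under stated assumptions: `referenced_tables` and `auxiliary_metrics` are hypothetical names, and the regex-based table extraction is deliberately naive (it ignores CTEs, subqueries, and quoted identifiers), where a production system would more likely use a real SQL parser.

```python
import re

# Naive pattern: the identifier right after FROM or JOIN. It ignores CTEs,
# subqueries and quoted identifiers; a real SQL parser would be more robust.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def referenced_tables(sql: str) -> set:
    """Table names the generated SQL appears to query (lower-cased)."""
    return {match.lower() for match in TABLE_REF.findall(sql)}


def auxiliary_metrics(sql: str, output_name: str, known_tables: set, names_in_scope: set) -> dict:
    """Checks that run alongside the end-to-end 'right answer' eval."""
    missing = referenced_tables(sql) - {t.lower() for t in known_tables}
    return {
        # Metric 1: does the SQL reference a table that isn't in the warehouse?
        "hallucinated_table": bool(missing),
        "missing_tables": sorted(missing),
        # Metric 2: does the cell's output name clobber a variable already in scope?
        "shadows_variable": output_name in names_in_scope,
    }


sql = (
    "SELECT c.region, COUNT(*) FROM dim_customers c "
    "JOIN customer_events e ON c.id = e.customer_id GROUP BY 1"
)
print(auxiliary_metrics(sql, "df", {"dim_customers", "fct_orders"}, {"df", "orders"}))
# {'hallucinated_table': True, 'missing_tables': ['customer_events'], 'shadows_variable': True}
```

Keeping each check as its own metric means a response can score well on one (valid tables) while failing another (shadowed variable), and the second failure is the kind of thing that post-processing can simply fix.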
We're really aiming for GA on all of these capabilities. Did it get the right answer in some cases is not good enough. [00:38:57]Alessio: I think the follow-up question to that is, in your past roles you owned the model that you were evaluating against. Here you don't actually have control over how the model evolves. How do you think about "the model will just need to improve, or we'll use another model" versus "we can build kind of like engineering post-processing on top of it"? How do you make the choice? [00:39:19]Bryan: So I want to say two things here. One, like Jerry Liu talked a little bit about in his episode, sort of like you don't always want to retrain the weights to serve certain use cases. RAG is another tool that you can use to kind of like soft tune. I think that's right. And I want to go back to my favorite analogy here, which is like recommendation systems. When you build a recommendation system, you build the objective function. You think about like what kind of recs you want to provide, what kind of features you're allowed to use, et cetera, et cetera. But there's always another step. There's this really wonderful collection of blog posts from Eugene Yan, and then ultimately like Even Oldridge kind of like iterated on that for the Merlin project, where there's this multi-stage recommender. And the multi-stage recommender says the first step is to do great retrieval. Once you've done great retrieval, you then need to do great ranking. Once you've done great ranking, you need to then do a good job serving. And so what's the analogy here? RAG is retrieval. You can build different embedding models to encode different features in your latent space to ensure that your ranking model has the best opportunity. Now you might say, oh, well, my ranking model is something that I've got a lot of capability to adjust. I've got full access to my ranking model. I'm going to retrain it. And that's great. And you should. And over time you will.
But there's one more step and that's downstream and that's the serving. Serving often sounds like I just show the s**t to the user, but ultimately serving is things like, did I provide diverse recommendations? Going back to Stitch Fix days, I can't just recommend them five shirts of the same silhouette and cut. I need to serve them a diversity of recommendations. Have I respected their requirements? They clicked on something that got them to this place. Are the recommendations relevant to that query? Are there any hard rules? Do we maybe not have this in stock? These are all things that you put downstream. And so much like the recommendations use case, there's a lot of knobs to pull outside of retraining the model. And even in recommendation systems, when do you retrain your model for ranking? Not nearly as much as you do other s**t. And even this like embedding model, you might fiddle with more often than the true ranking model. And so I think the only piece of the puzzle that you don't have access to in the LLM case is that sort of like middle step. That's okay. We've got plenty of other work to do. So right now I feel pretty enabled. [00:41:56]Alessio: That's great. You obviously wrote a book on RecSys. What are some of the key concepts that maybe people that don't have a data science background, ML background should keep in mind as they work in this area? [00:42:07]Bryan: It's easy to first think these models are stochastic. They're unpredictable. Oh, well, what are we going to do? I think of this almost like a gaseous type of question, like, if you've got this entropy, where can you put the entropy? Where can you let it be entropic and where can you constrain it? And so what I want to say here is think about the cases where you need it to be really tightly constrained. So why are people so excited about function calling? Because function calling feels like a way to constrict it. Where can you let it be more gaseous?
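The retrieve, rank, serve pipeline can be sketched with the Stitch Fix example from the conversation. This is an illustrative toy, assuming a hypothetical `Item` record and a precomputed `score` standing in for a ranking model; the serving stage applies the downstream concerns Bryan lists, a hard in-stock rule and a diversity cap per silhouette. In the LLM analogy, RAG plays the retrieval role, the model plays the ranking role, and post-processing plays the serving role.

```python
from dataclasses import dataclass


@dataclass
class Item:
    sku: str
    silhouette: str
    in_stock: bool
    score: float  # hypothetical stand-in for a ranking model's output


def retrieve(catalog, wanted_silhouettes):
    """Stage 1: cheap candidate generation (here, a plain attribute filter)."""
    return [item for item in catalog if item.silhouette in wanted_silhouettes]


def rank(candidates):
    """Stage 2: order candidates by the ranking model's score."""
    return sorted(candidates, key=lambda item: item.score, reverse=True)


def serve(ranked, k, max_per_silhouette=1):
    """Stage 3: enforce hard rules (in stock) and diversity (cap per silhouette)."""
    served, counts = [], {}
    for item in ranked:
        if not item.in_stock:
            continue
        if counts.get(item.silhouette, 0) >= max_per_silhouette:
            continue
        served.append(item)
        counts[item.silhouette] = counts.get(item.silhouette, 0) + 1
        if len(served) == k:
            break
    return served


catalog = [
    Item("a", "slim", True, 0.90),
    Item("b", "slim", True, 0.80),
    Item("c", "relaxed", False, 0.85),
    Item("d", "relaxed", True, 0.70),
    Item("e", "boxy", True, 0.60),
    Item("f", "oversized", True, 0.95),
]
picks = serve(rank(retrieve(catalog, {"slim", "relaxed", "boxy"})), k=3)
print([item.sku for item in picks])  # ['a', 'd', 'e']
```

Note that "b" outranks "d" but is never served, because serving, not the ranking model, enforces diversity; that is the kind of knob you can pull without retraining anything.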
Well, maybe in the way that it talks about what it wants to do. Maybe for planning, if you're building agents and you want to do sort of something chain of thoughty. Well, that's a place where the entropy can happily live. When you're building applications of these models, I think it's really important as part of the problem framing to be super clear upfront. These are the things that can be entropic. These are the things that cannot be. These are the things that need to be super rigid and really, really aligned to a particular schema. We've had a lot of success in making specific the parts that need to be precise and tightly schemified, and that has really paid dividends. And so another analogy from data science that I think is very valuable is the sort of like human-in-the-loop analogy, which has been around for quite a while. And I have gone on record a couple of times saying that like, I don't really love human in the loop. One of the things that I think we can learn from human in the loop is that the user is the best judge of what is good. And the user is pretty motivated to sort of like interact and give you kind of like additional nudges in the direction that you want. I think what I'd like to flip though, is instead of human in the loop, I'd like it to be AI in the loop. I'd rather center the user. I'd rather keep the user as the like core item at the center of this universe. And the AI is a tool. By switching that analogy a little bit, what it allows you to do is think about where are the places in which the user can reach for this as a tool, execute some task with this tool, and then go back to doing their workflow. It still gets this back and forth between things that computers are good at and things that humans are good at, which has been valuable in the human-in-the-loop paradigm. But it allows us to be a little bit more, I would say, like the designers talk about, user-centered. And I think that's really powerful for AI applications.
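Deciding up front which parts of a response may be "gaseous" and which must be rigid can be expressed as a validation step on the model's structured output. A minimal sketch, assuming a hypothetical JSON response with a free-form `plan` field and tightly constrained `chart_type` and `sql` fields; none of these field names come from Hex, and `validate_response` is an illustrative helper, not a library API.

```python
import json

# Hypothetical response contract: "plan" is free text (entropy allowed),
# while "chart_type" and "sql" are tightly constrained.
ALLOWED_CHART_TYPES = {"bar", "line", "scatter"}


def validate_response(raw: str):
    """Return (is_valid, errors) for a model response that should be a JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return False, ["top-level value must be a JSON object"]

    errors = []
    # Loose: any string is acceptable for the plan.
    if not isinstance(data.get("plan"), str):
        errors.append("plan must be a string")
    # Rigid: chart_type must come from a fixed vocabulary.
    chart_type = data.get("chart_type")
    if not (isinstance(chart_type, str) and chart_type in ALLOWED_CHART_TYPES):
        errors.append("chart_type must be one of " + ", ".join(sorted(ALLOWED_CHART_TYPES)))
    # Rigid: sql must be present and non-empty.
    sql = data.get("sql")
    if not (isinstance(sql, str) and sql.strip()):
        errors.append("sql must be a non-empty string")
    return not errors, errors


print(validate_response('{"plan": "anything goes", "chart_type": "line", "sql": "select 1"}'))
# (True, [])
```

The design choice is that the validator says nothing about the content of `plan`, so the entropy lives there, while every field the application acts on directly is checked against a fixed shape. A failed check could feed its error messages back into a retry, though that loop is left out of this sketch.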
And it's one of the things that I've been trying really hard to do with Magic: make it feel like the AI is right there in the workflow. It's right where you're doing your work. It's ready for you anytime you need it. But ultimately you're in charge at all times and your workflow is what we care the most about. [00:44:56]Alessio: Awesome. Let's jump into the lightning round. What's something that is not on your LinkedIn that you're passionate about or, you know, what's something you would give a TED talk on that is not work related? [00:45:05]Bryan: So I walk a lot. [00:45:07]Bryan: I have walked every road in Berkeley. And I mean like every part of every road even, not just like the binary question of, have you been on this road? I have this little app that I use called Wandrer, which just lets me like kind of keep track of everywhere I've been. And so I'm like a little bit obsessed. My wife would say a lot a bit obsessed with like what I call new roads. I'm actually more motivated by trails even than roads, but like I'm a maximalist. So kind of like everything and anything. Yeah. Believe it or not, I was even like in the like local Berkeley paper just talking about walking every road. So yeah, that's something that I'm like surprisingly passionate about. [00:45:45]Alessio: Is there a most underrated road in Berkeley? [00:45:49]Bryan: What I would say is underrated is Kensington. So Kensington is like a little town just a teeny bit north of Berkeley, but still in the Berkeley hills. And Kensington is so quirky and beautiful. And it's a really like, you know, don't sleep on Kensington. That being said, one of my original motivations for doing all this walking was people always tell me like, Berkeley's so quirky. And I was like, how quirky is Berkeley? Turns out, it's quite, quite quirky. It's also hard to say quirky and Berkeley in the same sentence, I've learned as of now. [00:46:20]Alessio: That's a, that's a good podcast warmup for our next guests. All right.
The actual lightning round. So we usually have three questions: acceleration, exploration, then a takeaway. Acceleration: what's, what's something that's already here today that you thought would take much longer to arrive in AI and machine learning? [00:46:39]Bryan: So I invited the CEO of Hugging Face to my seminar when I worked at Stitch Fix and his talk at the time, honestly, like really annoyed me. The talk was titled like something to the effect of like LLMs are going to be the like technology advancement of the next decade. It's on YouTube. You can find it. I don't remember exactly the title, but regardless, it was something like LLMs for the next decade. And I was like, okay, they're like one modality of model, like whatever. His talk was fine. Like, I don't think it was like particularly amazing or particularly poor, but what I will say is damn, he was right. Like I, I don't think I quite was on board during that talk where I was like, ah, maybe, you know, like there's a lot of other modalities that are like moving pretty quick. I thought things like RL were going to be the like real like breakout success. And there's a little pun with Atari and Breakout there, but yeah, like I, man, I was sleeping on LLMs and I feel a little embarrassed. I, yeah. [00:47:44]Alessio: Yeah. No, I mean, that's a good point. It's like sometimes the, we just had Jeremy Howard on the podcast and he was saying when he was talking about fine tuning, everybody thought it was dumb, you know, and then later people realize, and there's something to be said about messaging, especially like in technical audiences where there's kind of like the metagame, you know, which is like, oh, these are like the cool ideas people are exploring. I don't know where I want to align myself yet, you know, or whatnot. So, cool. Exploration: so it's kind of like the opposite of that. You mentioned RL, right? That's something that was kind of like up and up and up.
And then now it's people are like, oh, I don't know. Are there any other areas, if you weren't working on Magic, that you want to go work on? [00:48:25]Bryan: Well, I did mention that, like, I think this like Memex product is just like incredibly exciting to me. And I think it's really opportunistic. I think it's very, very feasible, but I would maybe even extend that a little bit, which is I don't see enough people getting really enthusiastic about hardware with advanced AI built in. You're hearing whispering of it here and there, pun on the Whisper, but like you're starting to see people putting Whisper into pieces of hardware and making that really powerful. I joked with, I can't think of her name. Oh, Sasha, who I know is a friend of the pod. Like I joked with Sasha that I wanted to make the Big Mouth Billy Bass as a Babel fish, because at this point it's pretty easy to connect that up to Whisper and talk to it in one language and have it talk in the other language. And I was like, this is the kind of s**t I want people building, is like silly integrations between hardware and these new capabilities. And as much as I'm starting to hear whisperings here and there, it's not enough. I think I want to see more people going down this track because I think ultimately like these things need to be in our like physical space. And even though the margins are good on software, I want to see more like integration into my daily life. Awesome. [00:49:47]Alessio: And then, yeah, a takeaway, what's one message or idea you want everyone to remember and think about? [00:49:54]Bryan: Even though earlier I was talking about sort of like, maybe like not reinventing things and being respectful of the sort of like ML and data science, like ideas. I do want to say that I think everybody should be experimenting with these tools as much as they possibly can. I've heard a lot of professors, frankly, express concern about their students using GPT to do their homework.
And I took a completely opposite approach, which is in the first 15 minutes of the first class of my semester this year, I brought up GPT on screen and we talked about what GPT was good at. And we talked about like how the students can sort of like use it. I showed them an example of it doing data analysis work quite well. And then I showed them an example of it doing quite poorly. I think however much you're integrating with these tools or interacting with these tools, and this audience is probably going to be pretty high on that distribution, I would really encourage you to sort of like push this into the other people in your life. My wife is very technical. She's a product manager and she's using ChatGPT almost every day for communication or for understanding concepts that are like outside of her sphere of excellence. And recently my mom and my sister have been sort of like onboarded onto the ChatGPT train. And so ultimately I just, I think that like it is our duty to help other people see like how much of a paradigm shift this is. We should really be preparing people for what life is going to be like when these are everywhere. [00:51:25]Alessio: Awesome. Thank you so much for coming on, Bryan. This was fun. [00:51:29]Bryan: Yeah. Thanks for having me. And use Hex Magic. [00:51:31] Get full access to Latent Space at www.latent.space/subscribe


The R Programming Language

The History of Computing

Apr 1, 2022 10:50

R is the 18th letter of the Latin alphabet. It represents the rhotic consonant, or the r sound. It goes back to the Greek Rho, the Phoenician Resh before that, and the Egyptian rêš, which is the same name the Egyptians had for head, before that. R appears in about 7 and a half percent of the words in the English dictionary. And R is probably the best language out there for programming around various statistical and machine learning tasks. We may use tools like TensorFlow imported into languages like Python to prototype, but R is incredibly performant for all the maths. And so it has become an essential piece of software for data scientists. The R programming language was created in 1993 by two statisticians, Robert Gentleman and Ross Ihaka, at the University of Auckland, New Zealand. It has since been ported to practically every operating system and is available at r-project.org. R is an implementation of the S language, which was also sold as a commercial software package that we'll discuss in a bit, and the name "R" is partly a play on "S" and partly a nod to the first initials of its two creators. R is written primarily in C and Fortran, and increasingly in R itself. And there have been statistical packages since the very first computers were used for math. IBM in fact packaged up BMDP when they first started working on the idea at UCLA Health Computing Facility. That was 1957. Then came SPSS out of the University of Chicago in 1968. And the same year, John Sall and others gave us SAS (Statistical Analysis System) out of North Carolina State University. And those evolved from those early days through into the 80s with the advent of object-oriented everything, and thus got not only windowing interfaces but also extensibility, code sharing, and, as we moved into the 90s, acquisitions. BMDP was acquired by SPSS, which was then acquired by IBM, and the products were getting more expensive but not getting a ton of key updates for the same scientific and medical communities. And so we saw the upstarts in the 80s, Data Desk and JMP and others.
These were tools built for windowing operating systems and in object-oriented languages. We got the ability to interactively manipulate data, zoom in on and spin three-dimensional representations of data, and all kinds of pretty touches. But they were not a programmer's tool. S was begun in the seventies at Bell Labs and was supposed to be a statistical MATLAB, a language specifically designed for number crunching. And its statistical techniques went far beyond where SPSS and SAS had stopped. With the breakup of Ma Bell, parts of Bell became Lucent, which sold S to Insightful Corporation, which released S-PLUS and would later get bought by TIBCO. Keep in mind, Bell was testing line quality and statistics and, going back to World War II, employed some of the top scientists in those fields, ones who would later create large chunks of the quality movement and implementations like Six Sigma. Once S basically belonged to a standalone software company, it became less about the statistics and more about porting to different computers to make more money. Private equity and portfolio conglomerates are, by nature, after improving the multiples on a line of business. But statisticians in various fields can end up feeling left behind. And this is where R comes into the picture. R gained popularity among statisticians because it made it easier to write complicated statistical algorithms without learning an entire programming language. Its popularity has grown significantly since then. R has been described as a cross between MATLAB and SPSS, but much faster. R was initially designed to be a language that could handle statistical analysis and other types of data mining, an offshoot of which we now call machine learning. R is also open source and, as with a number of other languages, has plenty of packages available through a package repository, which they call CRAN (Comprehensive R Archive Network). 
This allows R to be used in fields outside of statistics and data science, or simply to get new methods for math that doesn't belong in the core language. There are over 18,000 packages for R. One of the more popular is ggplot2, an open-source data visualization package. data.table is another, for programmatic data manipulation. dplyr provides functions designed to make data frame manipulation intuitive. tidyr helps create tidy data. Shiny generates interactive web apps. And there are plenty of packages to make R easier, faster, and more extensible. By 2015, more than 10 million people used R every month, and it's now the 13th most popular language in use. And the needs have expanded. We can drop R scripts into other programs and tools for processing. And some of the workloads are huge. This led to packages for parallel computing in R, notably ones using MPI (Message Passing Interface). R is one of the most popular languages for statistical analysis, statistical graphics, and data science projects. Other languages and tools exist for specific uses, but R has even started being used in those areas. The latest version, R 4.1.2, was released on November 1, 2021. R development, as with most thriving open source projects, is guided by a group of core developers supported by contributions from the broader community. It became popular because it provides the essential features for data mining and graphics needed in academic research and industry applications, and because of its pluggable, robust, and versatile nature. Meanwhile, projects like TensorFlow, NumPy, and scikit-learn have evolved for other languages. And there are services from companies like Amazon that can host and process assets from both, whether in NoSQL databases holding unstructured data or in Jupyter notebooks. 
A Jupyter Notebook is a JSON document that follows a versioned schema and contains an ordered list of input/output cells, which can hold code, text (using Markdown), formulas, algorithms, plots, and even media like audio or video. Project Jupyter was a spin-off of IPython, but the goal was to create a language-agnostic tool where we could execute code in Ruby or Haskell or Python or even R. This gives us many ways to get our data into the notebook, whether in batches, in deep learning environments, or in whatever pipeline needs to be built based on an organization's stack. That is especially true when the notebook frontend is a hosted service like Amazon SageMaker Notebooks, Google's Colaboratory, or Microsoft's Azure Notebooks. Think about this: about 25% of the world's languages lack a rhotic consonant. Sometimes it seems like we've got languages that do everything, or that we've built products that do everything. But I bet no matter the industry or focus or sub-specialty, there's still 25% more automation or investigation into our own data to be done. Because there always will be.
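The description above says a notebook is a JSON document holding an ordered list of cells. As a minimal sketch, assuming the nbformat 4.5 layout (field names follow that schema as we understand it; the `"ir"` kernel name for R and the cell ids are illustrative assumptions), the structure can be built with nothing but Python's standard library:

```python
import json


def make_cell(cell_type: str, source: str, cell_id: str) -> dict:
    """Build one nbformat v4 cell as a plain dict."""
    cell = {
        "id": cell_id,  # cell ids are required from nbformat 4.5 onward
        "cell_type": cell_type,
        "metadata": {},
        "source": source,  # may be a single string or a list of lines
    }
    if cell_type == "code":
        # Only code cells carry an execution counter and a list of outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def make_notebook(cells: list) -> dict:
    """Wrap cells in the top-level notebook object."""
    return {
        "cells": cells,  # the ordered list of cells
        "metadata": {
            # Which kernel (and so which language) runs the code cells.
            # "ir" is assumed here as the name IRkernel registers for R.
            "kernelspec": {"name": "ir", "display_name": "R", "language": "R"},
            "language_info": {"name": "R"},
        },
        "nbformat": 4,  # major schema version
        "nbformat_minor": 5,  # minor schema version
    }


nb = make_notebook([
    make_cell("markdown", "# Counting letters\nHow often does *r* appear?", "intro"),
    make_cell("code", 'sum(strsplit("rhotic", "")[[1]] == "r")', "count-r"),
])

# An .ipynb file is just this structure serialized as JSON text.
text = json.dumps(nb, indent=1)
```

Writing `text` to a file with an `.ipynb` extension should yield something Jupyter can open; in practice the `nbformat` package (not used here) also validates documents against the official schema. Swapping the `kernelspec` is all it takes to point the same document format at a different language.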

Jupyter and the Evolution of ML Tooling with Brian Granger - #544

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Dec 13, 2021 57:09

Today we conclude our AWS re:Invent coverage joined by Brian Granger, a senior principal technologist at Amazon Web Services and a co-creator of Project Jupyter. In our conversation with Brian, we discuss the inception and early vision of Project Jupyter, including how the explosion of machine learning and deep learning shifted the landscape for the notebook, and how they balanced the needs of these new user bases versus their existing community of scientific computing users. We also explore AWS's role with Jupyter and why they've decided to invest resources in the project, Brian's thoughts on the broader ML tooling space, and how they've applied HCI principles to building these tools, along with the impact of doing so. Finally, we dig into the recent SageMaker Canvas and Studio Lab releases and Brian's perspective on the future of notebooks and the Jupyter community at large. The complete show notes for this episode can be found at twimlai.com/go/544

Data Wrangling in JavaScript with Ashley Davis - JSJ 484

Devchat.tv Master Feed

May 18, 2021 66:59

Ashley Davis jumps in to talk to Dan Shappir about wrangling data using JavaScript. Ashley describes his journey into JavaScript and his exposure to the web platform. From there he walks Dan through learning data science and building systems in Python before coming back to JavaScript. He talks through the tools and techniques used to manage data in JavaScript, as well as how it can be done! Panel: Dan Shappir. Guest: Ashley Davis. Sponsors: Dev Influencers Accelerator; Raygun. Links: Data Wrangling with JavaScript; Data-Forge; Project Jupyter; Charlie Gerard on Twitter; Bootstrapping Microservices with Docker, Kubernetes, and Terraform; Code Capers; Data-Forge Notebook; JSJ 442: Breaking Into Tech with Danny Thompson; Twitter: Ashley Davis (@ashleydavis75). Picks: Ashley: AshleyDavis - Twitch. Dan: Interlude: Rethinking the JavaScript Pipeline Operator.

Data Wrangling in JavaScript with Ashley Davis - JSJ 484

JavaScript Jabber

May 18, 2021 66:59

Data Wrangling in JavaScript with Ashley Davis - JSJ 484

All JavaScript Podcasts by Devchat.tv

May 18, 2021 66:59

Hello, World: Programming Languages for the Polyglot Developer

Command Line Heroes en español

Apr 13, 2021 30:10

Every programming language was created to achieve something that was previously impossible. Today there are a great many of them, but which ones do you really need to know? In this episode we dive into the history of programming languages. We talk about the genius of "Amazing Grace," also known as Rear Admiral Grace Hopper. It's thanks to her that developers don't need a PhD in mathematics to write their programs in computer code. We're joined by Carol Willing of Project Jupyter, former director of the Python Software Foundation, and Clive Thompson, a contributor to The New York Times Magazine and Wired and author of "Coders," a book about how programmers think.

21: Vicky Steeves on Reproducibility, Open Research, & Librarians (... and game modding)

FOSS and Crafts

Jan 18, 2021

We're joined by Vicky Steeves, a hyper-talented librarian specializing in data management, open and reproducible research, and the overlap between FOSS, free culture, and library sciences! We dive into all of that... plus a bit of crafting... and even... what's this? A discussion of what the FOSS world can learn from the world of game modding (and vice versa)! Links: ISAGE (Investigating & Archiving the Scholarly Git Experience) and, to be a bit meta, ISAGE's own data as open and reproducible research; Sherpa Romeo; ReproZip; Related F&C episode: Stefano Zacchiroli on preserving source code at Software Heritage; bwFLA - Emulation as a Service; Stardew Valley modding page (note: Stardew Valley is not itself FOSS); Programming owes its strength to our long legacy of knitting, by Carrie Stokes; Project Jupyter; Git, git-annex, and git-lfs; Docker; The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work, by R. Stuart Geiger

E18 - 5Minutes - The JuPyteR Project

5Minds 5Minutes

Jul 24, 2020 7:43

Episode 18 is about Jupyter Notebooks, and Frank von der Höh is back with us. In episode 14 (E14 - 5Minutes - Julia) we already touched briefly on Jupyter Notebooks. Today Frank von der Höh tells us more about the Project Jupyter ecosystem. Where does it come from, why is it called that, what does it include, and why is it one of the coolest inventions since sliced bread? Does it also support my favorite language, Brainfuck? These questions and more are answered in today's episode.

AIM413: Deep dive on Project Jupyter

AWS re:Invent 2019

Dec 7, 2019 60:17

Amazon SageMaker offers fully managed Jupyter notebooks that you can use in the cloud so you can explore and visualize data and develop your machine learning model. In this session, we explain why we picked Jupyter notebooks, and how and why AWS is contributing to Project Jupyter. We dive deep into our overall strategy for Jupyter and explain different use cases for Jupyter, including data science, analytics, and simulation.

Episode 13: Jupyter Ecosystem

Open Source Directions hosted by Quansight

Feb 1, 2019

Project Jupyter exists to develop open-source software, open standards, and services for interactive computing across dozens of programming languages. The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
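Part of what lets one notebook frontend drive kernels for dozens of languages is an open standard, the Jupyter messaging protocol. As a minimal sketch of the message shape only (assuming protocol version 5.3; the ZeroMQ transport, wire framing, and HMAC signing are omitted, and the session names are hypothetical), an execute request and its reply might be assembled like this:

```python
import uuid
from datetime import datetime, timezone

PROTOCOL_VERSION = "5.3"  # assumed; check the version your kernels speak


def make_header(msg_type: str, session: str, username: str) -> dict:
    """Header fields every Jupyter message carries."""
    return {
        "msg_id": uuid.uuid4().hex,  # unique id for this message
        "session": session,  # identifies the sending session
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": msg_type,
        "version": PROTOCOL_VERSION,
    }


def make_message(msg_type, content, session, username, parent=None) -> dict:
    """Assemble a message; a reply copies the request header into parent_header."""
    return {
        "header": make_header(msg_type, session, username),
        "parent_header": parent["header"] if parent else {},
        "metadata": {},
        "content": content,
    }


# A frontend asks a kernel, in whatever language it runs, to execute code.
request = make_message(
    "execute_request",
    {
        "code": "1 + 1",  # source text; the kernel decides how to run it
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    },
    session="frontend-session",  # hypothetical session name
    username="user",
)

# The kernel answers with a reply linked back to the request.
reply = make_message(
    "execute_reply",
    {"status": "ok", "execution_count": 1, "user_expressions": {}},
    session="kernel-session",  # hypothetical session name
    username="user",
    parent=request,
)
```

The `parent_header` link is what lets a frontend match replies and outputs to the cell that triggered them, regardless of which language the kernel implements.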

End of Year Wrap-up

Google Cloud Platform Podcast

Dec 11, 2018 32:50

Happy Holidays, everyone! Melanie and Mark wrap up a great year by reminiscing about some of their favorite episodes! We also talk about the big news of the year, our favorite articles, and what’s coming up for the GCP Podcast in 2019. Cool things of the week Kubernetes and GKE for developers: a year of Cloud Console blog Reducing gender bias in Google Translate blog Cloud Security Command Center is now in beta and ready to use blog Main content Podcast accomplishments! We have awesome new intro and outro music, a new website, and new YouTube videos! We hit 1 million and then 2 million downloads! Mark and the podcast are celebrating their three-year anniversary! Top 10 most downloaded episodes of all time! GCP Podcast Episode 111: Google Cloud Platform with Sam Ramji podcast GCP Podcast Episode 112: Percy.io with Mike Fotinakis podcast GCP Podcast Episode 146: Google AI with Jeff Dean podcast GCP Podcast Episode 127: SRE vs Devops with Liz Fong-Jones and Seth Vargo podcast GCP Podcast Episode 128: Decision Intelligence with Cassie Kozyrkov podcast GCP Podcast Episode 113: Open Source TensorFlow with Yifei Feng podcast GCP Podcast Episode 88: Kubernetes 1.7 with Tim Hockin podcast GCP Podcast Episode 108: Launchpad Studio with Malika Cantor and Peter Norvig podcast GCP Podcast Episode 130: Data Science with Juliet Hougland and Michelle Casbon podcast GCP Podcast Episode 125: Open Source at Google Cloud Platform with Sarah Novotny podcast Top 10 most downloaded episodes for 2018! Exactly the same list, except Tim Hockin is no longer #7; the following episodes each move up one place, and this one joins at #10: 
GCP Podcast Episode 122: Project Jupyter with Jessica Forde, Yuvi Panda and Chris Holdgraf podcast Mark’s favorite episodes GCP Podcast Episode 129: Developer Relations with Mandy Waite podcast GCP Podcast Episode 121: Kontributing to Kubernetes with Paris Pittman and Garrett Rodrigues podcast GCP Podcast Episode 131: Actions on Google with Mandy Chan podcast GCP Podcast Episode 148: Wellio with Sivan Aldor-Noiman and Erik Andrejko podcast GCP Podcast Episode 110: CPU Vulnerability with Matt Linton and Paul Turner podcast GCP Podcast Episode 125: Open Source at Google Cloud Platform with Sarah Novotny podcast GCP Podcast Episode 140: Container Security with Maya Kaczorowski podcast Melanie’s favorite episodes GCP Podcast Episode 117: Cloud AI Fei-Fei Li was the Chief Scientist of AI/ML at Google podcast GCP Podcast Episode 114: ML Bias & Fairness with Timnit Gebru and Margaret Mitchell podcast GCP Podcast Episode 141: Accessibility in Tech podcast GCP Podcast Episode 136: Robotics with Raia podcast GCP Podcast Episode 150: Strange Loop, Remote Working, and Distributed Systems with KF podcast DL Indaba GCP Podcast Episode 147: DL Indaba: AI Investments in Africa podcast GCP Podcast Episode 149: Deep Learning Research in Africa with Yabebal Fantaye & Jessica Phalafala podcast GCP Podcast Episode 152: AI Corporations and Communities in Africa with Karim Beguir & Muthoni Wanyoike podcast GCP Podcast Episode 157: NeurIPS and AI Research with Anima Anandkumar podcast Favorite announcements, products, and more at Google Cloud Unity and Google Cloud Strategic Alliance blog Open Match blog Cloud TPU site Google Dataset Search is in beta site No tricks, just treats: Globally scaling the Halloween multiplayer Doodle with Open Match on Google Cloud blog GKE On-Prem site Open Source - Knative release, Skaffold, Istio updates, gVisor, etc. 
Google in Ghana blog Cloud NEXT blog GCP Podcast Episode 137: Next Day 1 podcast GCP Podcast Episode 138: Next Day 2 podcast GCP Podcast Episode 139: Next Day 3 podcast Unity and DeepMind partner to advance AI research blog Introducing PyTorch across Google Cloud blog Question of the week What were your personal highlights for 2018? Mark Agones Introducing Agones: Open-source, multiplayer, dedicated game-server hosting built on Kubernetes blog github The new website Having Melanie join me on the podcast Melanie Bringing Francesc back Meeting Grace GCP Podcast Episode 142: Agones With Mark Mandel and Cyril Tovena podcast Where can you find us next? It’s the holidays! Special thanks! Thank you guests Thank you Jennifer Thank you HD Interactive: James, Trae, Sabrina, and Sean Thank you Greg Thank you Neil, Chuck, and Shana Thank you MBooth for the website overhaul and social media support Thank you Francesc Thank you listeners!

#44 Project Jupyter and Interactive Computing

DataFramed

Oct 14, 2018 65:11 Transcription Available

In this episode of DataFramed, Hugo speaks with Brian Granger, co-founder and co-lead of Project Jupyter, physicist and co-creator of the Altair package for statistical visualization in Python. They’ll speak about data science, interactive computing, open source software and Project Jupyter. With over 2.5 million public Jupyter notebooks on GitHub alone, Project Jupyter is a force to be reckoned with. What is interactive computing and why is it important for data science work? What are all the moving parts of the Jupyter ecosystem, from notebooks to JupyterLab to JupyterHub and Binder, and why are they so relevant as more and more institutions adopt open source software for interactive computing and data science? From Netflix running around 100,000 Jupyter notebook batch jobs a day to LIGO’s Nobel Prize-winning discovery of gravitational waves publishing all their results reproducibly using notebooks, Project Jupyter is everywhere. Links from the show. From the interview: Brian on Twitter; Project Jupyter; Beyond Interactive: Notebook Innovation at Netflix (Ufford, Pacer, Seal, Kelley, Netflix Tech Blog); Gravitational Wave Open Science Center (Tutorials); JupyterCon YouTube Playlist; jupyterstream GitHub Repository. From the segments: Machines that Multi-Task (with Friederike Schüür of Fast Forward Labs). Part 1 at ~24:40: Brief Introduction to Multi-Task Learning (by Friederike Schüür); Overview of Multi-Task Learning Use Cases (by Manny Moss); Multi-Task Learning for the Segmentation of Building Footprints (Bischke et al., arXiv.org); Multi-Task as Question Answering (McCann et al., arXiv.org); The Salesforce Natural Language Decathlon: A Multitask Challenge for NLP. Part 2 at ~44:00: Rich Caruana’s Awesome Overview of Multi-Task Learning and Why It Works; Sebastian Ruder’s Overview of Multi-Task Learning in Deep Neural Networks; Massively Multi-Task Network for Drug Discovery, 259 Tasks (!) (Ramsundar et al., arXiv.org); Brief Overview of Multi-Task Learning with Video of Newsie, the Prototype (by Friederike Schüür). Original music and sounds by The Sticks.

Hello, World: Programming Languages for the Polyglot Developer

Command Line Heroes

Sep 24, 2018 27:57

Every new programming language is created to do something previously impossible. Today, there are quite a few to choose from. But which ones do you really need to know? This episode dives into the history of programming languages. We recognize the genius of “Amazing Grace,” also known as Rear Admiral Grace Hopper. It’s thanks to her that developers don’t need a PhD in mathematics to write their programs in machine code. We’re joined by Carol Willing of Project Jupyter, former Director of the Python Software Foundation, and Clive Thompson, a contributor to The New York Times Magazine and Wired who’s writing a book about how programmers think. Reminder: this season we’re building our very own, open source Command Line Heroes game. And you are invited to contribute—in whatever way makes sense for you. Visit Command Line Heroes: The Game over on GitHub for more info. And drop us a line at redhat.com/commandlineheroes, we're listening...

ATLAS with Dr. Mario Lassnig

Google Cloud Platform Podcast

Sep 4, 2018 25:50

Our guest today is Dr. Mario Lassnig, a software engineer working on the ATLAS Experiment at CERN! Melanie and Mark put on their physics hats as they learn all about what it takes to manage the petabytes of data involved in such a large research project. Dr. Mario Lassnig Dr. Mario Lassnig has been working as a Software Engineer at the European Organisation for Nuclear Research (CERN) since 2006. Within the ATLAS Experiment, he is responsible for all aspects of its large-scale distributed data, including management, storage, network, and access. He is also one of the principal developers of the Rucio system for scientific data management. In his previous life, he developed mobile navigation software for multi-modal transportation in Vienna at Seibersdorf Research, as well as cryptographic smart-card applications for access control at the University of Klagenfurt. He holds a Master’s degree in Computer Science from the University of Klagenfurt, and a doctoral degree in Computer Science from the University of Innsbruck. 
Cool things of the week The Machines Can Do the Work, a Story of Kubernetes Testing, CI, and Automating the Contributor Experience blog Google Cloud grants $9M in credits for the operation of the Kubernetes project blog Improving job searches for veterans with Google Cloud’s Talent Solution blog Unity For Beginners… From a Beginner blog GCP Podcast Episode 134: Connected Games with Unity and Google Cloud with Brett Bibby and Micah Baker podcast Neural Information Processing Systems Conference site Interview Rucio - Scientific Data Management site CERN site ATLAS site Google Cloud Storage site Google Compute Engine site G Suite site GKE On-Prem site Rucio on GitHub site University of Oslo site University of Innsbruck site Brookhaven National Laboratory site University of Texas at Arlington site Square Kilometer Array site DUNE site LIGO Lab site Scientific Computing with Google Cloud Platform: Experiences from the Trenches in Particle Physics and Earth Sciences video GCP Podcast Episode 122: Project Jupyter with Jessica Forde, Yuvi Panda and Chris Holdgraf podcast Rucio Workshop site ACM/IEEE Supercomputing 2018 site Question of the week I am not familiar with Docker or Kubernetes - where can I get started? Docker Docker’s official “Getting Started” guide Katacoda’s free, interactive Docker course Kubernetes You should totally read this comic and interactive tutorial Katacoda’s free, interactive Kubernetes course Where can you find us next? Melanie will be at Deep Learning Indaba. Mark will be at Tokyo NEXT. We’ll both be at Strange Loop.

Decision Intelligence with Cassie Kozyrkov

Google Cloud Platform Podcast

May 22, 2018 39:37

Chief Decision Scientist, Cassie Kozyrkov joins Mark and Melanie this week to explain data science, analytics, machine learning and statistical inference, in relation to decision intelligence. Cassie Kozyrkov As Chief Decision Scientist at Google Cloud, Cassie advises leadership teams on decision process, AI strategy, and building data-driven organizations. She works to democratize statistical thinking and machine learning so that everyone - Google, its customers, the world! - can harness the beauty and power of data. She is the innovator behind the practice of Decision Intelligence Engineering at Google and she has personally trained over 15,000 Googlers in machine learning, statistics, and data-driven decision-making. Before her current role, she served in Google’s Office of the CTO as Chief Data Scientist. Prior to joining Google, Cassie worked as a data scientist and consultant. She holds degrees in mathematical statistics, economics, psychology, and cognitive neuroscience. When she’s not working, you’re most likely to find Cassie at the theater, in an art museum, exploring the world, or curled up with a good novel. 
Cool things of the week Cloud ML Engine adds Cloud TPU support for training blog docs Google Kubernetes Engine 1.10 is generally available and ready for the enterprise blog Introducing ultramem Google Compute Engine machine types blog Increase performance while reducing costs with the new App Engine scheduler blog docs Interview Decision Intelligence wikipedia Red Hat Summit Keynote youtube Data Analytics wikipedia Statistical Analysis wikipedia Machine Learning wikipedia Deep Learning wikipedia There are several other episodes that provide insights into data science: #31 TensorFlow with Eli Bixby podcast #84 Kaggle with Wendy Kan podcast #109 Cloud AutoML Vision with Amy Unruh and Sara Robinson podcast #113 Open Source TensorFlow with Yifei Feng podcast #122 Project Jupyter with Jessica Forde, Yuvi Panda and Chris Holdgraf podcast As well as case studies on real-world problems: #91 The Future of Media with Machine Learning with Amit Pande podcast #115 Google Play Marketing with Dom Elliott and Stewart Bryson podcast Question of the week How can I secure my Google Cloud Platform account using a YubiKey? Securing your Cloud Platform Account with Security Keys docs Encrypting Google Application Default and gcloud credentials with GPG SmartCard blog Where can you find us next? Mark will be speaking at the Monthly SF Game Development Community, presenting on You Can’t Just Add More Servers on May the 30th in San Francisco. Melanie is speaking at a joint WiMLDS and PyLadies event “Paths to Data Science” on June 26th. More details to come.

#7: Fernando Perez — Creating IPython, Founding NumFOCUS, & The Stories Behind It All

Data Journeys

May 7, 2018 52:22

Fernando Perez is best known as the creator of IPython and co-founder of Project Jupyter: a set of open-source data science tools that some may consider the equivalent of the bat and ball to the sport of baseball. Today, you really can’t play the game of data science without Jupyter Notebooks, and our guest today is one of Jupyter's leads and originators (see here for the rest of the amazing team). Fernando is also an Assistant Professor in Statistics at UC Berkeley, a Researcher at the Berkeley Institute for Data Science, and a Founding Board Member of the NumFOCUS foundation, the community that creates the SciPy stack, along with virtually every other notable open source data science tool out there. This conversation was recorded in person with Fernando in his office on UC Berkeley’s campus, and it turned out to be the most humanizing, energizing, and down-to-earth interview I’ve had so far. Some of the many topics we covered include: what Fernando wanted to be while growing up in Medellin (Me-de-jean), Colombia; the role that formal education played in his learning of data science; the story behind IPython and Project Jupyter and its evolution over the past 10 years; lessons learned about technical competence and human character from his mentors over the years; what a “computational narrative” means to him and why its principles are key to data storytelling; and Fernando’s experience teaching a 650-student course (part of a pair of courses that are the largest of their kind) as part of the Berkeley Institute of Data Science. Enjoy the show! Show Notes: https://ajgoldstein.com/podcast/ep7/ Fernando’s Twitter: https://twitter.com/fperez_org AJ’s Twitter: www.twitter.com/ajgoldstein393/

Project Jupyter with Jessica Forde, Yuvi Panda and Chris Holdgraf

Google Cloud Platform Podcast

Apr 10, 2018 47:24

Jessica Forde, Yuvi Panda and Chris Holdgraf join Melanie and Mark to discuss Project Jupyter, from its interactive notebook origin story to the various open source modular projects it has grown into, supporting data research and applications. We dive specifically into JupyterHub using Kubernetes to enable a multi-user server. We also talk about Binder, an interactive development environment that makes work easily reproducible. Jessica Forde Jessica Forde is a Project Jupyter Maintainer with a background in reinforcement learning and Bayesian statistics. At Project Jupyter, she works primarily on JupyterHub, Binder, and JupyterLab to improve access to scientific computing and scientific research. Her previous open source projects include datamicroscopes, a DARPA-funded Bayesian nonparametrics library in Python, and density, a wireless device data tool at Columbia University. Jessica has also worked as a machine learning researcher and data scientist in a variety of applications including healthcare, energy, and human capital. Yuvi Panda Yuvi Panda is the Project Jupyter Technical Operations Architect in the UC Berkeley Data Sciences Division. He works on making it easy for people who don’t traditionally consider themselves “programmers” to do things with code. He builds tools (e.g., Quarry, PAWS, etc.) to sidestep the list of historical accidents that constitute the “command line tax” that people have to pay before doing productive things with computing. Chris Holdgraf Chris Holdgraf is a Project Jupyter Maintainer and Data Science Fellow at the Berkeley Institute for Data Science and a Community Architect at the Data Science Education Program at UC Berkeley. His background is in cognitive and computational neuroscience, where he used predictive models to understand the auditory system in the human brain. 
He’s interested in the boundary between technology, open-source software, and scientific workflows, as well as creating new pathways for this kind of work in science and the academy. He’s a core member of Project Jupyter, specifically working with JupyterHub and Binder, two open-source projects that make it easier for researchers and educators to do their work in the cloud. He works on these core tools, along with research and educational projects that use these tools at Berkeley and in the broader open science community. Cool things of the week Dragonball hosted on GC / powered by Spanner blog and GDC presentation at Developer Day Cloud Text-to-Speech API powered by DeepMind WaveNet blog and docs Now you can deploy to Kubernetes Engine from Gitlab blog Interview Jupyter site JupyterHub github Binder site and docs JupyterLab site Kubernetes site github Jupyter Notebook github LIGO (Laser Interferometer Gravitational-Wave Observatory) site and binder Paul Romer, World Bank Chief Economist blog and jupyter notebook The Scientific Paper is Obsolete article Large Scale Teaching Infrastructure with Kubernetes - Yuvi Panda, Berkeley University video Data 8: The Foundations of Data Science site Zero to JupyterHub site JupyterHub Deploy Docker github Jupyter Gitter channels Jupyter Pop-Up, May 15th site JupyterCon, Aug 21-24 site Question of the week How did Google’s predictions do during March Madness? How to build a real-time prediction model: Architecting live NCAA predictions Final Four halftime - fed data from first half to create prediction on second half and created a 30-second spot that ran on CBS before game play sample prediction ad Kaggle Competition site Where can you find us next? Melanie is speaking about AI at Techtonica today, and on April 14th will be participating in a panel on Diversity and Inclusion at the Harker Research Symposium.


196: Fashionable Udon (higepon)

Rebuild

Dec 5, 2017 116:20

We welcomed Taro Minowa as a guest and talked about Apple, ZOZOSUIT, AWS, gadgets, Google Home, Emacs, and more.

Show Notes
M-1 Grand Prix official site
Apple's High Sierra root login bug might have been discovered weeks ago
Why blank root login works
Apple Developer Forums post
ZOZOSUIT
What will happen to the world after ZOZO SUIT - Side effects edition - Higepon's blog
An iOS 11 bug might crash your iPhone on December 2nd
ktakayama/NotificationCrash
Twitter kicks Android app users out for five hours due to 2015 date bug
Amazon SageMaker
Amazon S3 Select Is Now Available in Preview
Project Jupyter
Nike+ Run Club
AirPods - Apple
SoundSport Free: True Wireless Earbuds | Bose
The Headphone - Bragi
Liberty Series – Zolo Audio
Bluetooth earphones, wireless earphones X3T | TAROME
ZOLO by Anker Liberty Total-Wireless Earphones English Version
Huami Amazfit Smartwatch
Amazon | Echo - Smart speaker
Google Home Mini
Talking with "Pikachu" on Google Home
Flic - Smart Bluetooth Button
Mercari NOW
EmacsWiki: Anything
Persona 5 sound interview

RCE 116 Jupyter

RCE - Super Computers

Oct 28, 2017 43:02

Brian Granger is an associate professor of physics and data science at Cal Poly State University in San Luis Obispo, CA. His research focuses on building open-source tools for interactive computing, data science, and data visualization. Brian is a leader of the IPython project, co-founder of Project Jupyter, co-founder of the Altair project for statistical visualization, and an active contributor to a number of other open-source projects focused on data science in Python. He is an advisory board member of NumFOCUS and a faculty fellow of the Cal Poly Center for Innovation and Entrepreneurship.


171: Psychologically Safe Podcast (naoya)

Rebuild

Jan 10, 2017 90:59

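We welcomed Naoya Ito as a guest and talked about design patterns, Python, Pandas, data science, management, and more.

Show Notes
Masuda (Hatena Anonymous Diary)
Rebuild: 169: Your Blog Can Be Generated By Neural Networks (omo)
Readable Code (Japanese edition of The Art of Readable Code)
Introduction to Design Patterns in the Java Language: Multithreading Edition
gensim
Pandas Data Frame | R Tutorial
Project Jupyter
Is the Data Science market getting flooded?
Network Programming with Perl
Anaconda
Perltidy
Python for Data Analysis - O'Reilly Media
Introduction to Data Analysis with Python (Japanese edition)
Grumpy: Go running Python!
Compiling Rust to WebAssembly Guide
Rebuild: 97: Minimum Viable Standard (omo)
The "plateau" problem in management
The Worse They Are, the Higher They Rise
Transcript of Naoya Ito's talk at One-Person CTO Night
Renewing Medium's focus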


Interview: Brian Granger, co-Leader of Project Jupyter

Partially Derivative

Jul 10, 2016 31:40

Core contributor to IPython Notebook and Project Jupyter


#44: Project Jupyter and IPython

Talk Python To Me - Python conversations for passionate developers

Feb 2, 2016 60:11

See the full show notes for this episode on the website at talkpython.fm/44.


Becoming A Data Scientist Podcast Episode 02 – Safia Abdalla

Becoming A Data Scientist Podcast

Jan 4, 2016 44:15

In Episode 2 of the Becoming a Data Scientist Podcast, we meet Safia Abdalla, who started programming, and even began exploring machine learning and natural language processing, as a teenager. She is now a student at Northwestern University, a conference speaker and trainer, co-organizer of PyLadies Chicago, and a contributor to Project Jupyter.

Blog Post with Show Notes and Links



