Podcasts about Databricks

  • 298 podcasts
  • 602 episodes
  • 40m avg duration
  • 5 weekly new episodes
  • Latest: Dec 1, 2023
Databricks

[Popularity chart, 2016–2023]


Best podcasts about Databricks


Latest podcast episodes about Databricks

This Week in Pre-IPO Stocks
E78: Shein IPO!, Canva IPO!, Anthropic +42% in 2ndary market, $323m for Neuralink | Pre-IPO Stock Market Update – Dec 1, 2023


Dec 1, 2023 · 8:33


00:08 | Shein files for US IPO
- $32b 2023 revenue run rate, +41% from 2022, following a 45% year-over-year increase 2022 vs 2021
- $66b valuation as of May 2023 Series G; Sequoia Capital, General Atlantic, Mubadala invested
- Goldman Sachs, JPMorgan, Morgan Stanley are lead underwriters for the IPO

01:41 | Neuralink raises $323m
- Neuralink offers brain-computer interface solutions to help people with paralysis or neurodegenerative diseases, or simply connect their brain to the internet
- $43m additional capital raised last week to finalize a $323m Series D
- $3.5b valuation for this Series D, a 66% increase from the prior round

03:05 | Canva to IPO in 2025
- Canva provides graphic design tools, infused with AI, to make the layperson a graphic design/marketing savant
- $2.0b in annualized revenue, 50% yearly growth rate, positive cash flow
- Tender sale happening now; valuation TBD
- $25b secondary market valuation

04:21 | CEO Zhao out at Binance
- US Treasury Dept fines Binance for violations of the Bank Secrecy Act (i.e. bad AML controls); $4.3b for the firm, $50m for the CEO
- Changpeng Zhao, Binance CEO, is out; Richard Teng, Binance head of regional markets, is the new CEO

05:19 | Anthropic up +42.5% this week
- Anthropic is an OpenAI/ChatGPT competitor
- +42% in the secondary market last week as a result of the OpenAI board/CEO snafu
- $40b secondary market valuation, +61% from the Oct 2023 round at $25b just one month ago

05:50 | Big capital raises
- Envision AESC (aesc-group.com) | $1.5b Series C, $10.0b valuation
- Neuralink (neuralink.com) | $323m Series D, $3.5b valuation
- AI21 Labs (ai21.com) | $208m Series C, $1.4b valuation
- BioCatch (biocatch.com) | $70m Series D, $1.1b valuation
- Rokid Technology (global.rokid.com) | $112m Series I, $1.0b valuation

07:24 | Pre-IPO +1.14% for the week
- NOTE: Neuralink's Nov 22, 2023 primary round valuation of $3.52b is now included in the analysis below
- Week winners: Anthropic +42.49%, Neuralink +5.65%, Klarna +3.17%, Databricks +1.64%, SpaceX +1.42%
- Week losers: Ramp -7.17%, Rippling -5.57%, Notion -3.20%, Revolut -2.84%, Chainalysis -1.46%
- Top valuations: ByteDance $201b, SpaceX $157b, OpenAI $77b, Stripe $48b, Databricks $46b lead in current valuations

Invest in pre-IPO stocks with AG Dillon Funds - www.agdillon.com

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Catch us at Modular's ModCon next week with Chris Lattner, and join our community!

Due to Bryan's very wide-ranging experience in data science and AI across Blue Bottle (!), Stitch Fix, Weights & Biases, and now Hex Magic, this episode can be considered a two-parter.

Notebooks = Chat++

We've talked a lot about AI UX (in our meetups, writeups, and guest posts), and today we're excited to dive into a new old player in AI interfaces: notebooks! Depending on your background, you either Don't Like or you Like notebooks. They are the most popular example of Knuth's Literate Programming concept: basically a collection of cells, where each cell can execute code, display its output, and share its state with all the other cells in a notebook. Cells can also simply be Markdown, to add commentary to the analysis. Notebooks have a long history, but most recently became popular as IPython evolved into Project Jupyter, and a wave of notebook-based startups from Observable to DeepNote and Databricks sprang up for the modern data stack.

The first wave of AI applications has been very chat focused (ChatGPT, Character.ai, Perplexity, etc). Chat as a user interface has a few shortcomings, the major one being the inability to edit previous messages. We enjoyed Bryan's takes on why notebooks feel like "Chat++" and how they are building Hex Magic:

* Atomic actions vs stream of consciousness: in a chat interface, you make corrections by adding more messages to a conversation (i.e. "Can you try again by doing X instead?" or "I actually meant XYZ"). The context can easily get messy and confusing for models (and humans!) to follow. Notebooks' cell structure, on the other hand, allows users to go back to any previous cell and make edits without having to add new ones at the bottom.

* "Airlocks" for repeatability: one of the ideas they came up with at Hex is "airlocks", a collection of cells that depend on each other and keep each other in sync.
If you have a task like "Create a summary of my customers' recent purchases", there are many sub-tasks to be done (look up the data, sum the amounts, write the text, etc). Each sub-task will be in its own cell, and the airlock will keep them all in sync together.

* Technical + non-technical users: previously you had to use Python / R / Julia to write notebook code, but with models like GPT-4, natural language is usually enough. Hex is also working on lowering the barrier of entry for non-technical users into notebooks, similar to how Code Interpreter is doing the same in ChatGPT. Obviously notebooks aren't new for developers (the OpenAI Cookbook is a good example), but they haven't had much adoption in less technical spheres. The shortcomings of chat UIs, plus LLMs lowering the barrier of entry to creating code cells, might make them a much more popular UX going forward.

RAG = RecSys!

We also talked about the LLMOps landscape and why it's an "iron mine" rather than a "gold rush":

"I'll shamelessly steal [this] from a friend, Adam Azzam from Prefect. He says that [LLMOps] is more of like an iron mine than a gold mine in the sense of there is a lot of work to extract this precious, precious resource. Don't expect to just go down to the stream and do a little panning. There's a lot of work to be done. And frankly, the steps to go from this resource to something valuable is significant."

Some of my favorite takeaways:

* RAG as RecSys for LLMs: at its core, the goal of a RAG pipeline is finding the most relevant documents for a task. This isn't very different from traditional recommendation systems that surface things for users. How can we apply old lessons to this new problem?
Bryan cites fellow AIE Summit speaker and Latent Space Paper Club host Eugene Yan in decomposing the retrieval problem into retrieval, filtering, and scoring/ranking/ordering. As AI Engineers increasingly find that long context has tradeoffs, they will also have to relearn age-old lessons: vector search is NOT all you need, and a good systems-not-models approach is essential to scalable, debuggable RAG. Good thing Bryan has just written the first O'Reilly book about modern RecSys, eh?

* Narrowing down evaluation: while "hallucination" is an easy term to throw around, the reality is more nuanced. A lot of the time, model errors can be automatically fixed: is this JSON valid? If not, why? Is it just missing a closing brace? These smaller issues can be checked and fixed before returning the response to the user, which is easier than fixing the model.

* Fine-tuning isn't all you need: when they first started building Magic, one of the discussions was around fine-tuning a model. In our episode with Jeremy Howard we talked about how fine-tuning leads to loss of capabilities as well. In notebooks, you are often dealing with domain-specific data (i.e. purchases, orders, wardrobe composition, household items, etc); the fact that the model understands that "items" are probably part of an "order" is really helpful. They have found that GPT-4 + 3.5-turbo were everything they needed to ship a great product, rather than having to fine-tune on notebooks specifically.

Definitely recommend listening to this one if you are interested in getting a better understanding of how to think about AI, data, and how we can use traditional machine learning lessons in large language models.
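To make the retrieval / filtering / scoring decomposition concrete, here is a minimal sketch. This is our own illustration, not Hex's code: the documents, source names, and word-overlap scoring are stand-ins for a real corpus, business rules, and learned embeddings.

```python
# Toy RAG retrieval pipeline: candidate generation -> filtering -> ranking.
# Word overlap stands in for vector similarity, purely for illustration.
from collections import Counter

docs = [
    {"id": 1, "text": "quarterly revenue summary for coffee sales", "source": "finance"},
    {"id": 2, "text": "customer purchase history and recent orders", "source": "sales"},
    {"id": 3, "text": "office holiday party planning notes", "source": "hr"},
]

def retrieve(query, corpus):
    """Candidate generation: any doc sharing a term with the query."""
    terms = set(query.lower().split())
    return [d for d in corpus if terms & set(d["text"].split())]

def keep(doc, allowed_sources):
    """Filtering: apply business rules (here, a source allowlist)."""
    return doc["source"] in allowed_sources

def score(query, doc):
    """Scoring/ranking: term-overlap count as a toy relevance signal."""
    overlap = Counter(query.lower().split()) & Counter(doc["text"].split())
    return sum(overlap.values())

query = "recent customer purchases"
candidates = retrieve(query, docs)
filtered = [d for d in candidates if keep(d, {"sales", "finance"})]
ranked = sorted(filtered, key=lambda d: score(query, d), reverse=True)
print([d["id"] for d in ranked])  # -> [2]
```

Each stage can be evaluated and debugged on its own, which is exactly the systems-not-models point above.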
The AI Pivot

For more Bryan, don't miss his fireside chat at the AI Engineer Summit.

Show Notes

* Hex Magic
* Bryan's new book: Building Recommendation Systems in Python and JAX
* Bryan's whitepaper about MLOps
* "Kitbashing in ML", slides from his talk on building on top of foundation models
* "Bayesian Statistics The Fun Way" by Will Kurt
* Bryan's Twitter
* "Berkeley man determined to walk every street in his city"
* People:
  * Adam Azzam
  * Graham Neubig
  * Eugene Yan
  * Even Oldridge

Timestamps

* [00:00:00] Bryan's background
* [00:02:34] Overview of Hex and the Magic product
* [00:05:57] How Magic handles the complex notebook format to integrate cleanly with Hex
* [00:08:37] Discussion of whether to build vs buy models - why Hex uses GPT-4 vs fine-tuning
* [00:13:06] UX design for Magic with Hex's notebook format (aka "Chat++")
* [00:18:37] Expanding notebooks to less technical users
* [00:23:46] The "Memex" as an exciting underexplored area - personal knowledge graph and memory augmentation
* [00:27:02] What makes for good LLMOps vs MLOps
* [00:34:53] Building rigorous evaluators for Magic and best practices
* [00:36:52] Different types of metrics for LLM evaluation beyond just end task accuracy
* [00:39:19] Evaluation strategy when you don't own the core model that's being evaluated
* [00:41:49] All the places you can make improvements outside of retraining the core LLM
* [00:45:00] Lightning Round

Transcript

Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, Partner and CTO-in-Residence of Decibel Partners, and today I'm joined by Bryan Bischof. [00:00:15]

Bryan: Hey, nice to meet you. [00:00:17]

Alessio: So Bryan has one of the most thorough and impressive backgrounds we've had on the show so far. Lead software engineer at Blue Bottle Coffee, which, if you live in San Francisco, you know a lot about. And maybe you'll tell us 30 seconds on what that actually means.
You worked as a data scientist at Stitch Fix, which used to be one of the premier data science teams out there. [00:00:38]

Bryan: It used to be. Ouch. [00:00:39]

Alessio: Well, no, no. Well, you left, you know, so how good can it still be? Then head of data science at Weights and Biases. You're also a professor at Rutgers and you're just wrapping up a new O'Reilly book as well. So a lot, a lot going on. Yeah. [00:00:52]

Bryan: And currently head of AI at Hex. [00:00:54]

Alessio: Let's do the Blue Bottle thing, because I definitely want to hear what that's like. [00:00:58]

Bryan: So I was leading data at Blue Bottle. I was the first data hire. I came in to kind of get the data warehouse in order and then see what we could build on top of it. But ultimately I mostly focused on demand forecasting, a little bit of recsys, a little bit of website optimization and analytics. But ultimately anything that you could imagine a retail company needing to do with their data, we had to do. I led that team, hired a few people, expanded it out. One interesting thing was I was part of the Nestle acquisition. So there was a period of time where we were preparing for that and didn't know, which was a really interesting dynamic. Being acquired is a very not necessarily fun experience for the data team. [00:01:37]

Alessio: I build a lot of internal tools for sourcing at the firm, and we have a small VCs-and-data community of other people doing it. And I feel like if you had a data feed into the Blue Bottle in South Park and the Blue Bottle at the HanaHaus in Palo Alto, you can get a lot of secondhand information on the state of VC funding. [00:01:54]

Bryan: Oh yeah. I feel like the real source of alpha is just bugging a Blue Bottle. [00:01:58]

Alessio: Exactly. And what's your latest book about? [00:02:02]

Bryan: I just wrapped up a book with a coauthor Hector Yee called Building Production Recommendation Systems.
I'll give you the rest of the title because it's fun. It's in Python and JAX. And so for those of you that are eagerly awaiting the first O'Reilly book that focuses on JAX, here you go. [00:02:17]

Alessio: Awesome. And we'll chat about that later on. But let's maybe talk about Hex and Magic before. I've known Hex for a while, I've used it as a notebook provider, and you've been working on a lot of amazing AI-enabled experiences. So maybe run us through that. [00:02:34]

Bryan: So I too, before I joined Hex, saw it as this really incredible notebook platform, a great place to do data science workflows, quite complicated, quite ad hoc interactive ones. And before I joined, I thought it was the best place to do data science workflows. And so when I heard about the possibility of building AI tools on top of that platform, that seemed like a huge opportunity. In particular, I lead the product called Magic. Magic is really a suite of capabilities as opposed to its own independent product. What I mean by that is they are AI enhancements to the existing product. And that's a really important difference from building something totally new that just uses AI. It's really important to us to enhance the already incredible platform with AI capabilities. So these are things like the obvious co-pilot-esque vibes, but also more interesting and dynamic ways of integrating AI into the product. And ultimately the goal is just to make people even more effective with the platform. [00:03:38]

Alessio: How do you think about the evolution of the product and the AI component? You know, even if you think about 10 months ago, some of these models were not really good on very math-based tasks. Now they're getting a lot better. I'm guessing a lot of your workloads and use cases is data analysis and whatnot. [00:03:53]

Bryan: When I joined, it was pre-4 and it was pre the new chat API and all that.
But when I joined, it was already clear that GPT was pretty good at writing code. And so when I joined, they had already executed on the vision of: what if we allowed the user to ask a natural language prompt to an AI and have the AI assist them with writing code? So what that looked like when I first joined was: it had some capability of writing SQL, it had some capability of writing Python, and it had the ability to explain and describe code that was already written. Those very, what feel like now, primitive capabilities, believe it or not, were already quite cool. It's easy to look back and think, oh, it's kind of Stone Age in these timelines. But to be clear, when you're building on such an incredible platform, adding a little bit of these capabilities feels really effective. And so almost immediately I started noticing how it affected my own workflow, because ultimately, as an engineering lead, a lot of my responsibility is to be doing analytics to make data-driven decisions about what products we build. And so I'm actually using Hex quite a bit in the process of iterating on our product. When I'm using Hex to do that, I'm using Magic all the time. And even in those early days, the amount that it sped me up, that it enabled me to very quickly execute, was really impressive. And so even though the models weren't that good at certain things back then, that capability was not to be underestimated. But to your point, the models have evolved between 3.5 Turbo and 4. We've actually seen quite a big enhancement in the kinds of tasks that we can ask Magic, and even more so with things like function calling. And understanding a little bit more of the landscape of agent workflows, we've been able to really accelerate. [00:05:57]

Alessio: You know, I tried using some of the early models in notebooks, and they actually didn't like the .ipynb formatting, kind of a JSON plus XML plus all these weird things. How have you tackled that?
Do you have some magic behind the scenes to make it easier for models? Are you still using completely off-the-shelf models? Do you have some proprietary ones? [00:06:19]

Bryan: We are using, at the moment in production, 3.5 Turbo and GPT-4. I would say for a large number of our applications, GPT-4 is pretty much required. To your question about whether it understands the structure of the notebook, and all of these somewhat complicated wrappers around the content that you want to show: we do our very best to abstract that away from the model and make sure that the model doesn't have to think about what the cell wrapper code looks like. Or for our Magic charts, it doesn't have to speak the language of Vega. These are things that we put a lot of work in on the engineering side, to the AI engineer profile. This is the AI engineering work: to get all of that out of the way so that the model can speak in the languages that it's best at. The model is quite good at SQL. So let's ensure that it's speaking the language of SQL and that we are doing the engineering work to get the output of that model, the generations, into our notebook format. So too for other cell types that we support, including charts. And just in general, understanding the flow of different cells, understanding what a notebook is: all of that is hard work that we've done to ensure that the model doesn't have to learn anything like that. I remember early on, people asked the question, are you going to fine-tune a model to understand Hex cells? And almost immediately, my answer was no. No, we're not. Using fine-tuned models in 2022, I was already aware that there are some limitations of that approach, and frankly, even using GPT-3 and GPT-2 back in the day at Stitch Fix, I had already seen a lot of instances where putting more effort into pre- and post-processing can avoid some of these larger lifts. [00:08:14]

Alessio: You mentioned Stitch Fix and GPT-2.
How has the balance between build versus buy, so to speak, evolved? GPT-2 was a model that was not super advanced, so for a lot of use cases it was worth building your own thing. With GPT-4 and the like, is there a reason to still build your own models for a lot of this stuff? Or should most people be fine-tuning? How do you think about that? [00:08:37]

Bryan: Sometimes people ask, why are you using GPT-4 and why aren't you going down the avenue of fine-tuning today? I can get into fine-tuning specifically, but I do want to talk a little bit about the good old days of GPT-2. Shout out to Reza. Reza introduced me to GPT-2. I still remember him explaining the difference between general transformers and GPT. I remember one of the tasks that we wanted to solve with transformer-based generative models at Stitch Fix was writing descriptions of clothing. You might think, ooh, that's a multi-modal problem. The answer is, not necessarily. We actually have a lot of features about the clothes that are almost already enough to generate some reasonable text. I remember at that time, that was one of the first applications that we had considered. There was a really great team of NLP scientists at Stitch Fix who worked on a lot of applications like this. I still remember being exposed to the GPT endpoint back in the days of 2. If I'm not mistaken, and feel free to fact-check this, I'm pretty sure Stitch Fix was the first OpenAI customer, like, their first true enterprise application. Long story short, I ultimately think that, depending on your task, using the most cutting-edge general model has some advantages. If those are advantages that you can reap, then go for it. So at Hex, why GPT-4? Why do we need such a general model for writing code, writing SQL, doing data analysis? Shouldn't a fine-tuned model just on Kaggle notebooks be good enough? I'd argue no.
And ultimately, because we don't have one specific sphere of data that we need to write great data analysis workbooks for, we actually want to provide a platform for anyone to do data analysis about their business. To do that, you actually need to entertain an extremely general universe of concepts. So as an example, if you work at Hex and you want to do data analysis, our projects are called Hexes. That's relatively straightforward to teach it. There's a concept of a notebook. These are data science notebooks, and you want to ask analytics questions about notebooks. Maybe if you trained on notebooks, you could answer those questions. But let's come back to Blue Bottle. If I'm at Blue Bottle and I have data science work to do, I have to ask it questions about coffee. I have to ask it questions about pastries, doing demand forecasting. And so very quickly, you can see that just by serving those two customers, a model purely fine-tuned on, like, Kaggle competitions may not actually fit the bill. And so the more and more that you want to build a platform that is sufficiently general for your customer base, the more I think that these large general models really pack a lot of additional opportunity in. [00:11:21]

Alessio: With a lot of our companies, we talked about stuff that you used to have to extract features for that now you get out of the box. So say you're a travel company and you want to do a query like: show me all the hotels and places that are warm during spring break. It would be literally impossible to do before these models, you know? But now the model knows, okay, spring break is usually these dates and these locations are usually warm. So you get so much out of it for free. And in terms of Magic integrating into Hex, I think AI UX is one of our favorite topics: how do you actually make that seamless?
In traditional code editors, the line of code is kind of the atomic unit, and in Hex, you have the code, but then you have the cell also. [00:12:04]

Bryan: I think the first time I saw Copilot and really fell in love with Copilot, I thought: finally, fancy auto-complete. And that felt so good. It felt so elegant. It felt so right-sized for the task. But as a data scientist, a lot of the work that you do previous to the ML engineering part of the house, you're working in these cells, and these cells are atomic. They're expressing one idea. And so ultimately, if you want to make the transition from something like this code, where you've got a large amount of code and there's a large amount of files and they kind of need to have awareness of one another, that's a long story and we can talk about that. But in this atomic, somewhat linear flow through the notebook, what you ultimately want to do is reason with the agent at the level of these individual thoughts, these atomic ideas. Usually it's good practice in, say, a Jupyter notebook to not let your cells get too big. If your cell doesn't fit on one page, that's kind of a code smell: why is it so damn big? What are you doing in this cell? That also lends some hints as to what the UI should feel like. I want to ask questions about this one atomic thing. So you ask the agent: take this data frame and strip out this prefix from all the strings in this column. That's an atomic task. It's probably about two lines of pandas. I can write it, but it's actually very natural to ask Magic to do that for me. And what I promise you is that it is faster to ask Magic to do that for me. At this point, that kind of code, I never write. And so then you ask the next question, which is: what should the UI be to do chains, to do multiple cells that work together? Because ultimately a notebook is a chain of cells, and actually it's a first-class citizen for Hex.
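For the curious, the "two lines of pandas" for that kind of atomic task might look something like this. The column name and prefix here are our own illustration, not from the episode.

```python
import pandas as pd

# Hypothetical column of product IDs with a prefix we want stripped.
df = pd.DataFrame({"sku": ["SKU-123", "SKU-456", "SKU-789"]})
# The atomic task: strip the prefix from every string in the column.
df["sku"] = df["sku"].str.removeprefix("SKU-")
print(df["sku"].tolist())  # -> ['123', '456', '789']
```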
So we have a DAG, and the DAG is the execution DAG for the individual cells. This is one of the reasons that Hex is reactive and kind of dynamic in that way. And so the very next question is: what is the AI UI for these collections of cells? And back in June and July, we thought really hard about what it feels like to ask Magic a question and get a short chain of cells back that execute on that task. And so we've thought a lot about how that breaks down into individual atomic units and how those are tied together. We introduced something which is kind of an internal name, but it's called the airlock. And the airlock is exactly a sequence of cells that refer to one another, understand one another, use things that are happening in other cells. And it gives you a chance to preview what Magic has generated for you. Then you can accept or reject it as an entire group. And that's one of the reasons we call it an airlock: because at any time you can eject the airlock and see it in the space. But to come back to your question about how the AI UX fits into this notebook: ultimately a notebook is very conversational in its structure. I've got a series of thoughts that I'm going to express as a series of cells. And sometimes, if I'm a kind data scientist, I'll put some text in between them too, explaining what on earth I'm doing. And that feels, in my opinion, and I think this is quite shared amongst Hexans, like a really nice refinement of the chat UI. I've been saying for several months now: please stop building chat UIs. There is some irony, because I think what the notebook allows is like chat plus plus. [00:15:36]

Alessio: Yeah, I think the first wave of everything was chat with X. So it was chat with your data, chat with your documents, and all of this. But people want to code, you know, at the end of the day. And I think that goes into the end user.
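For intuition, a reactive execution DAG like the one Bryan describes can be sketched in a few lines. This is a toy model, not Hex's implementation: when one cell changes, it and everything downstream of it re-runs, in dependency order.

```python
# Toy reactive-notebook scheduler: a cell edit invalidates the cell and
# all of its (transitive) dependents, which re-run in topological order.
from graphlib import TopologicalSorter

# cell name -> cells it depends on (illustrative pipeline)
deps = {"load": [], "clean": ["load"], "plot": ["clean"], "report": ["clean"]}

def downstream(changed, deps):
    """The changed cell plus every cell that transitively depends on it."""
    dirty = {changed}
    grew = True
    while grew:
        grew = False
        for cell, parents in deps.items():
            if cell not in dirty and dirty & set(parents):
                dirty.add(cell)
                grew = True
    return dirty

def rerun_order(changed, deps):
    dirty = downstream(changed, deps)
    order = TopologicalSorter(deps).static_order()  # parents before children
    return [cell for cell in order if cell in dirty]

order = rerun_order("clean", deps)
print(order[0], sorted(order[1:]))  # -> clean ['plot', 'report']
```

Editing "clean" leaves "load" untouched but re-runs "plot" and "report", which is what makes the notebook feel dynamic.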
I think most people that use notebooks are software engineers or data scientists. I think the cool thing about these models is that people who are not traditionally technical can do a lot of very advanced things. And that's why people like Code Interpreter and ChatGPT. How do you think about the evolution of that persona? Do you see a lot of non-technical people also now coming to Hex to collaborate with their technical folks? [00:16:13]

Bryan: Yeah, I would say there might even be more enthusiasm than we're prepared for. We're obviously very excited to bring what we call the low-floor user into this world and give more people the opportunity to self-serve on their data. We wanted to start by focusing on users who are already familiar with Hex and really make Magic fantastic for them. One of the internal, I would say almost North Stars, is our team's charter: to make Hex feel more magical. That is true for all of our users, but it's easiest to do for users who are already able to use Hex in a great way. What we're hearing from some customers in particular is: I'm excited for some of my less technical stakeholders to get in there and start asking questions. And so that raises a lot of really deep questions. If you immediately enable self-service for data, which has been almost like a joke over the last maybe eight years, what challenges does that bring with it? What risks does that bring with it? And so it has given us the opportunity to think about things like governance, and to think about things like alignment with the data team and making sure that the data team has clear visibility into what the self-service looks like. Having been leading a data team, trying to provide answers for stakeholders and hearing that they really want to self-serve, a question that we often found ourselves asking is: what is the easiest way that we can keep them on the rails?
What is the easiest way that we can set up the data warehouse and set up our tools such that they can ask and answer their own questions without coming away with false answers? Because that is such a priority for data teams, it becomes an important focus of my team: okay, Magic may be an enabler, and if it is, what do we also have to respect? We recently introduced the Data Manager, an auxiliary tool on the Hex platform that lets people write more relevant metadata about their data warehouse, to make sure that Magic has access to the best information. And there are some things coming to further that story around governance and understanding. [00:18:37]

Alessio: You know, you mentioned self-serve data, and how it was almost like a joke. The whole rush to the modern data stack was something to behold. Do you think AI is in a similar space, where it's a bit of a gold rush? [00:18:51]

Bryan: I have sort of two comments here. One I'll shamelessly steal from a friend, Adam Azzam from Prefect. He says that this is more of an iron mine than a gold mine, in the sense that there is a lot of work to extract this precious, precious resource. And that's the first one: don't expect to just go down to the stream and do a little panning. There's a lot of work to be done. And frankly, the steps to go from this resource to something valuable are significant. I think people have gotten a little carried away with the old maxim of: don't go pan for gold, sell pickaxes and shovels. It's a much stronger business model. At this point, I feel like I look around and I see more pickaxe salesmen and shovel salesmen than I do prospectors. And that scares me a little bit. There's a metagame where people are starting to think about how they can build tools for people building tools for AI.
And that starts to give me a little bit of pause in terms of: how confident are we that we can even extract this resource into something valuable? I got a text message from a VC earlier today, and I won't name the VC or the fund, but the question was: what are some medium or large-size companies that have integrated AI into their platform in a way that you're really impressed by? And I looked at the text message for a few minutes and I was finding myself thinking and thinking, and I responded: maybe only Copilot. It's been a couple hours now, and I don't think I've thought of another one. And I think that's where I reflect again on this iron versus gold. If it was really gold, I feel like I'd be more blown away by other AI integrations. And I'm not yet. [00:20:40]

Alessio: I feel like all the people finding gold are the ones building things that traditionally we didn't focus on. So like Midjourney. I talked to a company yesterday, which I'm not going to name, but they do agents for some use case, let's call it. They are 11 months old. They're making like 8 million a month in revenue, but in a space that you wouldn't even think about selling to. If you were a shovel builder, you wouldn't even go sell to those people. And Swyx talks about this a bunch, about actually trying to go application-first for some things. Let's actually see what people want to use and what works. What do you think are the most maybe underexplored areas in AI? Is there anything that you wish people were actually trying to shovel? [00:21:23]

Bryan: I've been saying for a couple of months now: if I had unlimited resources and I was just truly, you know, on my own, building whatever I wanted, I think the thing that I'd be most excited about is building the personal Memex. The Memex is something that I've wanted since I was a kid. And are you familiar with the Memex? It's the memory extender.
And it's this idea that human memory is quite weak, and so if we can extend it, that's a big opportunity. I think one of the things that I've always found to be one of the limiting cases here is access: how do you access that data? Even if you did build all that data out, how would you quickly access it? And I think there's a constellation of technologies that have come together in the last couple of years that now make this quite feasible. One, information retrieval has really improved, and we have a lot more simple systems for getting started with it. Two, natural language is ultimately the interface that you'd really like these systems to work on, both in terms of structuring and preparing the data, but also on the retrieval side. So what keys off the query for retrieval? Probably, ultimately, natural language. And third, if you really want to go into the purely futuristic aspect of this, it is low-latency voice to text. And that is also something that has quite recently become possible. I did talk to a company recently called Gather, which seems to have some cool ideas in this direction, but I haven't seen yet what I really want, which is something where, every time I listen to a podcast or I watch a movie or I read a book, it has a great vector index built on top of all the information that's contained within. And then, when I'm having my next conversation and I can't quite remember the name of this person who did this amazing thing, for example, if we're talking about the Memex, it'd be really nice to have Vannevar Bush pop up on my, you know, on my Memex display, because I always forget Vannevar Bush's name. This is one time that I didn't, but I often do.
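The retrieval loop behind that personal Memex idea fits in a few lines. The notes below are our own made-up examples, and word overlap stands in for the learned embeddings a real vector index would use.

```python
# Toy Memex lookup: index snippets from things you've consumed, then
# retrieve them from a natural-language query. A real system would use
# embeddings + a vector index; word overlap is just for illustration.
notes = [
    "As We May Think: Vannevar Bush proposed the memex, a memory extender",
    "Podcast on espresso brewing ratios at Blue Bottle",
    "Talk about JAX and recommendation systems",
]

def score(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

def memex_lookup(query, index, k=1):
    return sorted(index, key=lambda d: score(query, d), reverse=True)[:k]

hits = memex_lookup("who proposed the memex memory extender", notes)
print(hits[0])
```

The query never has to match the note verbatim; it just has to land near it in whatever similarity space the index uses.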
This is something that I think is only recently enabled and maybe we're still five years out before it can be good, but I think it's one of the most exciting projects that has become possible in the last three years that I think generally wasn't possible before. [00:23:46]Alessio: Would you wear one of those AI pendants that record everything? [00:23:50]Bryan: I think I'm just going to do it because I just like support the idea. I'm also admittedly someone who, when Google Glass first came out, thought that seems awesome. I know that there's like a lot of like challenges about the privacy aspect of it, but it is something that I did feel was like a disappointment to lose some of that technology. Fun fact, one of the early Google Glass developers was this MIT computer scientist who basically built the first wearable computer while he was at MIT. And he like took notes about all of his conversations in real time on his wearable and then he would have real time access to them. Ended up being kind of a scandal because he wanted to use a computer during his defense and they like tried to prevent him from doing it. So pretty interesting story. [00:24:35]Alessio: I don't know but the future is going to be weird. I can tell you that much. Talking about pickaxes, what do you think about the pickaxes that people built before? Like all the whole MLOps space, which has its own like startup graveyard in there. How are those products evolving? You know, you were at Weights and Biases before, which is now doing a big AI push as well. [00:24:57]Bryan: If you really want to like sort of like rub my face in it, you can go look at my white paper on MLOps from 2022. It's interesting. I don't think there's many things in that that I would these days think are like wrong or even sort of like naive. But what I would say is there are both a lot of analogies between MLOps and LLMops, but there are also a lot of like key differences. 
So like leading an engineering team at the moment, I think a lot more about good engineering practices than I do about good ML practices. That being said, it's been very convenient to be able to see around corners in a few of the like ML places. One of the first things I did at Hex was work on evals. This was in February. I hadn't yet been overwhelmed by people talking about evals until about May. And the reason that I was able to be a couple of months early on that is because I've been building evals for ML systems for years. I don't know how else to build an ML system other than start with the evals. I teach my students at Rutgers like objective framing is one of the most important steps in starting a new data science project. If you can't clearly state what your objective function is and you can't clearly state how that relates to the problem framing, you've got no hope. And I think that is a very shared reality with LLM applications. Coming back to one thing you mentioned from earlier about sort of like the applications of these LLMs. To that end, I think what pickaxes I think are still very valuable is understanding systems that are inherently less predictable, that are inherently sort of experimental. On my engineering team, we have an experimentalist. So one of the AI engineers, his focus is experiments. That's something that you wouldn't normally expect to see on an engineering team. But it's important on an AI engineering team to have one person whose entire focus is just experimenting, trying, okay, this is a hypothesis that we have about how the model will behave. Or this is a hypothesis we have about how we can improve the model's performance on this. And then going in, running experiments, augmenting our evals to test it, et cetera. What I really respect are pickaxes that recognize the hybrid nature of the sort of engineering tasks. They are ultimately engineering tasks with a flavor of ML. 
And so when systems respect that, I tend to have a very high opinion. One thing that I was very, very aligned with Weights and Biases on is sort of composability. These systems like ML systems need to be extremely composable to make them much more iterative. If you don't build these systems in composable ways, then your integration hell is just magnified. When you're trying to iterate as fast as people need to be iterating these days, I think integration hell is a tax not worth paying. [00:27:51]Alessio: Let's talk about some of the LLM native pickaxes, so to speak. So RAG is one. One thing is doing RAG on text data. One thing is doing RAG on tabular data. We're releasing tomorrow our episode with Cube, the semantic layer company. Curious to hear your thoughts on it. How are you doing RAG, pros, cons? [00:28:11]Bryan: It became pretty obvious to me almost immediately that RAG was going to be important. Because ultimately, you never expect your model to have access to all of the things necessary to respond to a user's request. So as an example, Magic users would like to write SQL that's relevant to their business. And it's important then to have the right data objects that they need to query. We can't expect any LLM to understand our user's data warehouse topology. So what we can expect is that we can build a RAG system that is data warehouse aware, data topology aware, and use that to provide really great information to the model. If you ask the model, how are my customers trending over time? And you ask it to write SQL to do that. What is it going to do? Well, ultimately, it's going to hallucinate the structure of that data warehouse that it needs to write a general query. Most likely what it's going to do is it's going to look in its sort of memory of Stack Overflow responses to customer queries, and it's going to say, oh, it's probably a customers table, and we're in the age of dbt, so it might even be called, you know, dim_customers or something like that. 
And what's interesting is, and I encourage you to try, ChatGPT will do an okay job of like hallucinating up some tables. It might even hallucinate up some columns. But what it won't do is it won't understand the joins in that data warehouse that it needs, and it won't understand the data caveats or the sort of where clauses that need to be there. And so how do you get it to understand those things? Well, this is textbook RAG. This is the exact kind of thing that you expect RAG to be good at augmenting. But I think where people who have done a lot of thinking about RAG for the document case, they think of it as chunking and sort of like MapReduce and these kinds of approaches. But I think people haven't followed this train of thought quite far enough yet. Jerry Liu was on the show and he talked a little bit about thinking of this as like information retrieval. And I would push that even further. And I would say that ultimately RAG is just RecSys for LLMs. As I kind of already mentioned, I'm a little bit recommendation systems heavy. And so from the beginning, RAG has always felt like RecSys to me. It has always felt like you're building a recommendation system. And what are you trying to recommend? The best possible resources for the LLM to execute on a task. And so most of my approach to RAG and the way that we've improved Magic via retrieval is by building a recommendation system. [00:30:49]Alessio: It's funny, as you mentioned that you spent three years writing the book, the O'Reilly book. Things must have changed as you wrote the book. I don't want to bring out any nightmares from there, but what are the tips for people who want to stay on top of this stuff? Do you have any other favorite newsletters, like Twitter accounts that you follow, communities you spend time in? [00:31:10]Bryan: I am sort of an aggressive reader of technical books. I think I'm almost never disappointed by time that I've invested in reading technical manuscripts. 
I find that most people write O'Reilly or similar books because they've sort of got this itch that they need to scratch, which is that I have some ideas, I have some understandings that were hard won, I need to tell other people. And there's something that, from my experience, correlates between that itch and sort of like useful information. As an example, one of the people on my team, his name is Will Kurt, he wrote a book called Bayesian Statistics the Fun Way. I knew some Bayesian statistics, but I read his book anyway. And the reason was because I was like, if someone feels motivated to write a book called Bayesian Statistics the Fun Way, they've got something to say about Bayesian statistics. I learned so much from that book. That book is like technically like targeted at someone with less knowledge and experience than me. And boy, did it humble me about my understanding of Bayesian statistics. And so I think this is a very boring answer, but ultimately like I read a lot of books and I think that they're a really valuable way to learn these things. I also regrettably still read a lot of Twitter. There is plenty of noise in that signal, but ultimately it is still usually like one of the first directions to get sort of an instinct for what's valuable. The other comment that I want to make is we are in this age of sort of like arXiv is becoming more of like an ad platform. I think that's a little challenging right now to kind of use it the way that I used to use it, which is for like higher signal. I've chatted a lot with a CMU professor, Graham Neubig, and he's been doing LLM evaluation and LLM enhancements for about five years, and no, I didn't misspeak. And I think talking to him has provided me a lot of like directionality for more believable sources. Trying to cut through the hype. 
I know that there's a lot of other things that I could mention in terms of like just channels, but ultimately right now I think there's almost an abundance of channels and I'm a little bit more keen on high signal. [00:33:18]Alessio: The other side of it is like, I see so many people say, Oh, I just wrote a paper on X and it's like an article. And I'm like, an article is not a paper, but it's just funny how I know we were kind of chatting before about terms being reinvented and like people that are not from this space kind of getting into AI engineering now. [00:33:36]Bryan: I also don't want to be gatekeepy. Actually I used to say a lot to people, don't be shy about putting your ideas down on paper. I think it's okay to just like kind of go for it. And I myself have something on arXiv that is like comically naive. It's intentionally naive. Right now I'm less concerned by more naive approaches to things than I am by the purely like advertising approach to sort of writing these short notes and articles. I think blogging still has a good place. And I remember getting feedback during my PhD thesis that like my thesis sounded more like a long blog post. And I now feel like that curmudgeonly professor who's also like, yeah, maybe just keep this to the blogs. That's funny. Alessio: Uh, yeah, I think one of the things that Swyx said when he was opening the AI engineer summit a couple of weeks ago was like, look, most people here don't know much about the space because it's so new and like being open and welcoming. I think it's one of the goals. And that's why we try and keep every episode at a level that it's like, you know, the experts can understand and learn something, but also the novices can kind of like follow along. You mentioned evals before. I think that's one of the hottest topics obviously out there right now. What are evals? How do we know if they work? Yeah. What are some of the fun learnings from building them into Hex? 
[00:34:53]Bryan: I said something at the AI engineer summit that I think a few people have already called out, which is like, if you can't get your evals to be sort of like objective, then you're not trying hard enough. I stand by that statement. I'm not going to walk it back. I know that that doesn't feel super good because people want to think that like their unique snowflake of a problem is too nuanced. But I think this is actually one area where, you know, in this dichotomy of like, who can do AI engineering? And the answer is kind of everybody. Software engineering can become AI engineering and ML engineering can become AI engineering. One thing that I think the more data science minded folk have an advantage here is we've gotten more practice in taking very vague notions and trying to put a like objective function around that. And so ultimately I would just encourage everybody who wants to build evals, just work incredibly hard on codifying what is good and bad in terms of these objective metrics. As far as like how you go about turning those into evals, I think it's kind of like sweat equity. Unfortunately, I told the CEO of Gantry several months ago, I think it's been like six months now, that I was sort of like looking at every single internal Hex request to Magic by hand with my eyes and sort of like thinking, how can I turn this into an eval? Is there a way that I can take this real request during this dogfood-y, not very developed stage? How can I make that into an evaluation? That was a lot of sweat equity that I put in a lot of like boring evenings, but I do think ultimately it gave me a lot of understanding for the way that the model was misbehaving. Another thing is how can you start to understand these misbehaviors as like auxiliary evaluation metrics? So there's not just one evaluation that you want to do for every request. It's easy to say like, did this work? Did this not work? Did the response satisfy the task? 
But there's a lot of other metrics that you can pull off these questions. And so like, let me give you an example. If it writes SQL that doesn't reference a table in the database that it's supposed to be querying against, we would think of that as a hallucination. You could separately consider, is it a hallucination as a valuable metric? You could separately consider, does it get the right answer? The right answer is this sort of like all in one shot, like evaluation that I think people jump to. But these intermediary steps are really important. I remember hearing that GitHub had thousands of lines of post-processing code around Copilot to make sure that their responses were sort of correct or in the right place. And that kind of sort of defensive programming against bad responses is the kind of thing that you can build by looking at many different types of evaluation metrics. Because you can say like, oh, you know, the Copilot completion here is mostly right, but it doesn't close the brace. Well, that's the thing you can check for. Or, oh, this completion is quite good, but it defines a variable that was like already defined in the file. Like that's going to have a problem. That's an evaluation that you could check separately. And so this is where I think it's easy to convince yourself that all that matters is does it get the right answer? But the more that you think about production use cases of these things, the more you find a lot of this kind of stuff. One simple example is like sometimes the model names the output of a cell, a variable that's already in scope. Okay. Like we can just detect that and like we can just fix that. And this is the kind of thing that like evaluations over time and as you build these evaluations over time, you really can expand the robustness in which you trust these models. And for a company like Hex, who we need to put this stuff in GA, we can't just sort of like get to demo stage or even like private beta stage. 
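The auxiliary checks Bryan describes — flagging SQL that references tables that aren't in the warehouse, separately from the all-in-one "did it get the right answer" metric — can be sketched as a handful of small eval functions. This is a hypothetical sketch, not Hex's implementation; the schema, table names, and helpers here are made up for illustration:

```python
import re

# Hypothetical warehouse schema the model is allowed to query.
KNOWN_TABLES = {"dim_customers", "fct_orders"}

def referenced_tables(sql: str) -> set:
    """Crudely pull table names out of FROM/JOIN clauses."""
    return {m.lower() for m in re.findall(r"(?:from|join)\s+(\w+)", sql, re.IGNORECASE)}

def hallucinated_tables(sql: str) -> set:
    """Auxiliary metric: tables the model invented."""
    return referenced_tables(sql) - KNOWN_TABLES

def evaluate(sql: str, gold_result, run_query) -> dict:
    """Return several metrics per request, not just one pass/fail."""
    metrics = {"hallucinated_tables": sorted(hallucinated_tables(sql))}
    metrics["schema_ok"] = not metrics["hallucinated_tables"]
    # The all-in-one-shot metric people jump to: did it get the right answer?
    metrics["correct"] = metrics["schema_ok"] and run_query(sql) == gold_result
    return metrics
```

Tracking `schema_ok` and `hallucinated_tables` separately from `correct` is what enables the kind of defensive post-processing described around Copilot: each intermediary metric becomes a thing you can check for and fix.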
We're really hunting GA on all of these capabilities. Did it get the right answer on some cases is not good enough. [00:38:57]Alessio: I think the follow up question to that is in your past roles, you own the model that you're evaluating against. Here you don't actually have control into how the model evolves. How do you think about the model will just need to improve or we'll use another model versus like we can build kind of like engineering post-processing on top of it. How do you make the choice? [00:39:19]Bryan: So I want to say two things here. One, like Jerry Liu talked a little bit about in his episode, you don't always want to retrain the weights to serve certain use cases. RAG is another tool that you can use to kind of like soft tune. I think that's right. And I want to go back to my favorite analogy here, which is like recommendation systems. When you build a recommendation system, you build the objective function. You think about like what kind of recs you want to provide, what kind of features you're allowed to use, et cetera, et cetera. But there's always another step. There's this really wonderful collection of blog posts from Eugene Yan, and then ultimately like Even Oldridge kind of like iterated on that for the Merlin project, where there's this multi-stage recommender. And the multi-stage recommender says the first step is to do great retrieval. Once you've done great retrieval, you then need to do great ranking. Once you've done great ranking, you need to then do a good job serving. And so what's the analogy here? RAG is retrieval. You can build different embedding models to encode different features in your latent space to ensure that your ranking model has the best opportunity. Now you might say, oh, well, my ranking model is something that I've got a lot of capability to adjust. I've got full access to my ranking model. I'm going to retrain it. And that's great. And you should. And over time you will. 
But there's one more step and that's downstream and that's the serving. Serving often sounds like I just show the s**t to the user, but ultimately serving is things like, did I provide diverse recommendations? Going back to Stitch Fix days, I can't just recommend them five shirts of the same silhouette and cut. I need to serve them a diversity of recommendations. Have I respected their requirements? They clicked on something that got them to this place. Is the recommendations relevant to that query? Are there any hard rules? Do we maybe not have this in stock? These are all things that you put downstream. And so much like the recommendations use case, there's a lot of knobs to pull outside of retraining the model. And even in recommendation systems, when do you retrain your model for ranking? Not nearly as much as you do other s**t. And even this like embedding model, you might fiddle with more often than the true ranking model. And so I think the only piece of the puzzle that you don't have access to in the LLM case is that sort of like middle step. That's okay. We've got plenty of other work to do. So right now I feel pretty enabled. [00:41:56]Alessio: That's great. You obviously wrote a book on RecSys. What are some of the key concepts that maybe people that don't have a data science background, ML background should keep in mind as they work in this area? [00:42:07]Bryan: It's easy to first think these models are stochastic. They're unpredictable. Oh, well, what are we going to do? I think of this almost like gaseous type question of like, if you've got this entropy, where can you put the entropy? Where can you let it be entropic and where can you constrain it? And so what I want to say here is think about the cases where you need it to be really tightly constrained. So why are people so excited about function calling? Because function calling feels like a way to constrict it. Where can you let it be more gaseous? 
Well, maybe in the way that it talks about what it wants to do. Maybe for planning, if you're building agents and you want to do sort of something chain of thoughty. Well, that's a place where the entropy can happily live. When you're building applications of these models, I think it's really important as part of the problem framing to be super clear upfront. These are the things that can be entropic. These are the things that cannot be. These are the things that need to be super rigid and really, really aligned to a particular schema. We've had a lot of success in making specific the parts that need to be precise and tightly schemified, and that has really paid dividends. And so other analogies from data science that I think are very valuable is there's the sort of like human in the loop analogy, which has been around for quite a while. And I have gone on record a couple of times saying that like, I don't really love human in the loop. One of the things that I think we can learn from human in the loop is that the user is the best judge of what is good. And the user is pretty motivated to sort of like interact and give you kind of like additional nudges in the direction that you want. I think what I'd like to flip though, is instead of human in the loop, I'd like it to be AI in the loop. I'd rather center the user. I'd rather keep the user as the like core item at the center of this universe. And the AI is a tool. By switching that analogy a little bit, what it allows you to do is think about where are the places in which the user can reach for this as a tool, execute some task with this tool, and then go back to doing their workflow. It still gets this back and forth between things that computers are good at and things that humans are good at, which has been valuable in the human loop paradigm. But it allows us to be a little bit more, I would say, like the designers talk about like user-centered. And I think that's really powerful for AI applications. 
And it's one of the things that I've been trying really hard with Magic to make that feel like the workflow as the AI is right there. It's right where you're doing your work. It's ready for you anytime you need it. But ultimately you're in charge at all times and your workflow is what we care the most about. [00:44:56]Alessio: Awesome. Let's jump into lightning round. What's something that is not on your LinkedIn that you're passionate about or, you know, what's something you would give a TED talk on that is not work related? [00:45:05]Bryan: So I walk a lot. [00:45:07]Bryan: I have walked every road in Berkeley. And I mean like every part of every road even, not just like the binary question of, have you been on this road? I have this little app that I use called Wanderer, which just lets me like kind of keep track of everywhere I've been. And so I'm like a little bit obsessed. My wife would say a lot a bit obsessed with like what I call new roads. I'm actually more motivated by trails even than roads, but like I'm a maximalist. So kind of like everything and anything. Yeah. Believe it or not, I was even like in the like local Berkeley paper just talking about walking every road. So yeah, that's something that I'm like surprisingly passionate about. [00:45:45]Alessio: Is there a most underrated road in Berkeley? [00:45:49]Bryan: What I would say is like underrated is Kensington. So Kensington is like a little town just a teeny bit north of Berkeley, but still in the Berkeley hills. And Kensington is so quirky and beautiful. And it's a really like, you know, don't sleep on Kensington. That being said, one of my original motivations for doing all this walking was people always tell me like, Berkeley's so quirky. And I was like, how quirky is Berkeley? Turns out, it's quite, quite quirky. It's also hard to say quirky and Berkeley in the same sentence I've learned as of now. [00:46:20]Alessio: That's a good podcast warmup for our next guests. All right. 
The actual lightning ground. So we usually have three questions, acceleration, exploration, then a takeaway acceleration. What's, what's something that's already here today that you thought would take much longer to arrive in AI and machine learning? [00:46:39]Bryan: So I invited the CEO of Hugging Face to my seminar when I worked at Stitch Fix and his talk at the time, honestly, like really annoyed me. The talk was titled like something to the effect of like LLMs are going to be the like technology advancement of the next decade. It's on YouTube. You can find it. I don't remember exactly the title, but regardless, it was something like LLMs for the next decade. And I was like, okay, they're like one modality of model, like whatever. His talk was fine. Like, I don't think it was like particularly amazing or particularly poor, but what I will say is damn, he was right. Like I, I don't think I quite was on board during that talk where I was like, ah, maybe, you know, like there's a lot of other modalities that are like moving pretty quick. I thought things like RL were going to be the like real like breakout success. And there's a little pun with Atari and breakout there, but yeah, like I, man, I was sleeping on LLMs and I feel a little embarrassed. I, yeah. [00:47:44]Alessio: Yeah. No, I mean, that's a good point. It's like sometimes the, we just had Jeremy Howard on the podcast and he was saying when he was talking about fine tuning, everybody thought it was dumb, you know, and then later people realize, and there's something to be said about messaging, especially like in technical audiences where there's kind of like the metagame, you know, which is like, oh, these are like the cool ideas people are exploring. I don't know where I want to align myself yet, you know, or whatnot. So it's cool exploration. So it's kind of like the opposite of that. You mentioned RL, right? That's something that was kind of like up and up and up. 
And then now it's people are like, oh, I don't know. Are there any other areas if you weren't working on, on magic that you want to go work on? [00:48:25]Bryan: Well, I did mention that, like, I think this like Memex product is just like incredibly exciting to me. And I think it's really opportunistic. I think it's very, very feasible, but I would maybe even extend that a little bit, which is I don't see enough people getting really enthusiastic about hardware with advanced AI built in. You're hearing whispering of it here and there, pun on the Whisper, but like you're starting to see people putting Whisper into pieces of hardware and making that really powerful. I joked with, I can't think of her name. Oh, Sasha, who I know is a friend of the pod. Like I joked with Sasha that I wanted to make the Big Mouth Billy Bass as a Babel fish, because at this point it's pretty easy to connect that up to Whisper and talk to it in one language and have it talk in the other language. And I was like, this is the kind of s**t I want people building is like silly integrations between hardware and these new capabilities. And as much as I'm starting to hear whisperings here and there, it's not enough. I think I want to see more people going down this track because I think ultimately like these things need to be in our like physical space. And even though the margins are good on software, I want to see more like integration into my daily life. Awesome. [00:49:47]Alessio: And then, yeah, a takeaway, what's one message idea you want everyone to remember and think about? [00:49:54]Bryan: Even though earlier I was talking about sort of like, maybe like not reinventing things and being respectful of the sort of like ML and data science, like ideas. I do want to say that I think everybody should be experimenting with these tools as much as they possibly can. I've heard a lot of professors, frankly, express concern about their students using GPT to do their homework. 
And I took a completely opposite approach, which is in the first 15 minutes of the first class of my semester this year, I brought up GPT on screen and we talked about what GPT was good at. And we talked about like how the students can sort of like use it. I showed them an example of it doing data analysis work quite well. And then I showed them an example of it doing quite poorly. I think however much you're integrating with these tools or interacting with these tools, and this audience is probably going to be pretty high on that distribution. I would really encourage you to sort of like push this into the other people in your life. My wife is very technical. She's a product manager and she's using ChatGPT almost every day for communication or for understanding concepts that are like outside of her sphere of expertise. And recently my mom and my sister have been sort of like onboarded onto the ChatGPT train. And so ultimately I just, I think that like it is our duty to help other people see like how much of a paradigm shift this is. We should really be preparing people for what life is going to be like when these are everywhere. [00:51:25]Alessio: Awesome. Thank you so much for coming on, Bryan. This was fun. [00:51:29]Bryan: Yeah. Thanks for having me. And use Hex Magic. [00:51:31] Get full access to Latent Space at www.latent.space/subscribe
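A postscript on the episode's central technical framing — RAG as RecSys, where stage one of the multi-stage recommender is candidate retrieval of the best resources (tables, docs) for the LLM. A minimal sketch of that first stage, using a bag-of-words stand-in for a real embedding model and a made-up toy corpus of warehouse objects (none of this is the Magic implementation):

```python
import re
from math import sqrt

# Toy corpus: data-warehouse objects the model might need for SQL generation.
DOCS = {
    "dim_customers": "customer dimension table: customer_id, name, signup_date",
    "fct_orders": "order fact table: order_id, customer_id, amount, ordered_at",
    "style_guide": "internal sql style guide for analysts",
}

def embed(text: str) -> dict:
    """Stand-in embedding: token counts (a real system would use a learned model)."""
    vec = {}
    for tok in re.findall(r"[a-z_]+", text.lower()):
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list:
    """Stage 1 of the multi-stage recommender: candidate retrieval."""
    q = embed(query)
    scored = sorted(DOCS, key=lambda name: cosine(q, embed(DOCS[name])), reverse=True)
    return scored[:k]  # stages 2 (ranking) and 3 (serving) would refine this
```

A real system would swap `embed` for an embedding model and add the ranking and serving stages Bryan maps onto the recommender analogy; the knobs live at every stage, not just in the model weights.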

Dead Cat
Databricks CEO Ali Ghodsi & MosaicML Founder Naveen Rao Speak at Cerebral Valley

Dead Cat

Play Episode Listen Later Nov 21, 2023 22:04


We were delighted to kick off the 2nd Cerebral Valley AI Summit with Ali Ghodsi, CEO of Databricks, and Naveen Rao, co-founder of MosaicML. Their encounter at our debut event in March led to Ghodsi buying Rao's company, which had little revenue, for $1.3 billion. At our event on Nov. 15, the two discussed how the deal came together quickly after meeting at the conference dinner. Thousands of enterprises around the world rely on Oracle Cloud Infrastructure (OCI) to power applications that drive their businesses. OCI customers include leaders across industries, such as healthcare, scientific research, financial services, telecommunications, and more.NVIDIA DGX Cloud on OCI is an AI training-as-a-service platform for customers to train complex AI models like generative AI applications. Included with DGX Cloud, NVIDIA AI Enterprise brings the software layer of the NVIDIA AI platform to OCI.Talk with Oracle about accelerating your GPU workloads.Ghodsi recounted how he started spending some time with Rao and thought, “these guys are pretty good,” and then by chance noticed an employee he respected poking around with MosaicML and offering a strong endorsement. Soon Ghodsi was on the phone with the head of his deals team, who told him “if you want to buy these guys you have to do it this weekend.” Rao said by that point “you kind of know he's going to pop the question,” and once they worked out the money, the deal was done.The two executives certainly seemed to be in harmony as they touted the potential benefits from their combination, which in simple terms will bring MosaicML's expertise in building specialized generative AI models to Databricks' corporate data platform products, essentially super-charging Databricks for the generative AI era. They were eager to defend the idea of open-source foundation models that are specific to certain tasks, rejecting the notion that general-purpose models like GPT-4 will eventually swallow everything. 
(This conversation took place before OpenAI was thrown into chaos by its board of directors.)Ghodsi said calls to limit open-source models on the grounds that they'll be too easily exploited by bad actors are a “horrible, horrendous” idea that would “put a stop to all innovation.” “It's essential that we have an open-source ecosystem,” he said, noting that even now it's unclear how a lot of AI models work, and open-source research will be critical to answering those questions.Rao added that many of the people making predictions about how AI would develop are “full of s**t.” On the safety question, he noted that cost alone would stand in the way of any existential risks for a long time, and in the meantime the focus should be on real threats like disinformation and robot safety.Give it a listen. Get full access to Newcomer at www.newcomer.co/subscribe

Dead Cat
Databricks CEO Ali Ghodsi & MosaicML Founder Naveen Rao Speak at Cerebral Valley

Dead Cat

Play Episode Listen Later Nov 21, 2023 22:04


We were delighted to kick off the 2nd Cerebral Valley AI Summit with Ali Ghodsi, CEO of Databricks, and Naveen Rao, co-founder of MosaicML. Their encounter at our debut event in March led to Ghodsi buying Rao's company, which had little revenue, for $1.3 billion. At our event on Nov. 15, the two discussed how the deal came together quickly after meeting at the conference dinner.

Thousands of enterprises around the world rely on Oracle Cloud Infrastructure (OCI) to power applications that drive their businesses. OCI customers include leaders across industries, such as healthcare, scientific research, financial services, telecommunications, and more. NVIDIA DGX Cloud on OCI is an AI training-as-a-service platform for customers to train complex AI models like generative AI applications. Included with DGX Cloud, NVIDIA AI Enterprise brings the software layer of the NVIDIA AI platform to OCI. Talk with Oracle about accelerating your GPU workloads.

Ghodsi recounted how he started spending some time with Rao and thought, “these guys are pretty good,” and then by chance noticed an employee he respected poking around with MosaicML and offering a strong endorsement. Soon Ghodsi was on the phone with the head of his deals team, who told him “if you want to buy these guys you have to do it this weekend.” Rao said by that point “you kind of know he's going to pop the question,” and once they worked out the money, the deal was done.

The two executives certainly seemed to be in harmony as they touted the potential benefits of their combination, which in simple terms will bring MosaicML's expertise in building specialized generative AI models to Databricks' corporate data platform products, essentially super-charging Databricks for the generative AI era. They were eager to defend the idea of open-source foundation models that are specific to certain tasks, rejecting the notion that general-purpose models like GPT-4 will eventually swallow everything.
(This conversation took place before OpenAI was thrown into chaos by its board of directors.)

Ghodsi called proposals to limit open-source models on the grounds that they'll be too easily exploited by bad actors a “horrible, horrendous” idea that would “put a stop to all innovation.” “It's essential that we have an open-source ecosystem,” he said, noting that even now it's unclear how a lot of AI models work, and open-source research will be critical to answering those questions.

Rao added that many of the people making predictions about how AI would develop are “full of s**t.” On the safety question, he noted that cost alone would stand in the way of any existential risks for a long time, and in the meantime the focus should be on real threats like disinformation and robot safety.

Give it a listen Get full access to Newcomer at www.newcomer.co/subscribe

SQL Data Partners Podcast
Episode 270: Medallion Architecture

SQL Data Partners Podcast

Play Episode Listen Later Nov 21, 2023 39:01


Moving up the ranks in the holy technology wars is the medallion architecture, and boy are we interested in getting your thoughts. Not since the 2008 Olympics and Michael Phelps' hundredth-of-a-second win over Milorad Cavic has there been so much controversy around bronze, silver, and gold. This episode of the podcast has its genesis in Databricks and the methodology of getting data into a workable form for all the reporting pieces businesses love so much. We discuss our thoughts on the various layers of a medallion architecture and the implementation in Azure Delta Lake environments. Have a different take? Let us know! The show notes for today's episode can be found at Episode 270: Medallion Architecture. Have fun on the SQL Trail!
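For listeners who want to see the shape of the pattern before pressing play: a medallion architecture stages data as bronze (raw), silver (validated), and gold (aggregated) layers. Here is a toy sketch in plain Python; the records, layer logic, and function names are our own illustrations, and a real implementation would typically use Spark DataFrames and Delta tables rather than in-memory lists:

```python
# Toy medallion pipeline: bronze (raw) -> silver (validated) -> gold (aggregated).
# Plain Python stands in for what would normally be Spark DataFrames / Delta tables.

bronze = [  # raw ingested records, warts and all
    {"order_id": 1, "amount": "19.99", "region": "EU"},
    {"order_id": 2, "amount": "bad-data", "region": "US"},
    {"order_id": 3, "amount": "5.00", "region": "US"},
]

def to_silver(rows):
    """Clean and type-cast; drop records that fail validation."""
    silver = []
    for row in rows:
        try:
            silver.append({**row, "amount": float(row["amount"])})
        except ValueError:
            pass  # a real pipeline would quarantine or log the bad record
    return silver

def to_gold(rows):
    """Business-level aggregate: revenue per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'EU': 19.99, 'US': 5.0}
```

Each layer only ever reads from the one before it, which is the property the bronze/silver/gold debate is actually about.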

BI or DIE
Citizen Data Management mit Microsoft Fabric | Power BI or DIE mit Fabian Heidenstecker

BI or DIE

Play Episode Listen Later Nov 19, 2023 33:41


Being a senior manager - staying human. Does Fabric encourage data chaos à la Access, or is it the breakthrough for orderly data management by the business departments? The preview phase of Microsoft Fabric is coming to an end, so it's worth taking another look at the topic as a practitioner. Fabric closes an important gap: data can now be persisted. It could mean the end of numerous workarounds involving pseudo-persistence via dataflows and Excel as a database. Some questions will only be answered after longer use, e.g.: will OneLake really be the breakthrough for centralized data storage and, at the same time, for the data mesh? The two also didn't pass up a deep dive comparing software-as-a-service with platform-as-a-service, as well as Azure Synapse, Databricks, and Snowflake. Fabian has been with Opitz Consulting since mid-2021 and supports customers on their journey to becoming a data-driven company. He started his career in the CRM space, which led him on to business intelligence, data analytics, and more recently machine learning. His favorite topics are BI frontends & self-service. In his private life he lives up to programmer clichés and enjoys computer and video games of past eras. He lives in the Ruhr area as an exiled Hessian.


Danny In The Valley
MosaicML's Naveen Rao: "Bio-inspired AI"

Danny In The Valley

Play Episode Listen Later Nov 17, 2023 45:17


The Sunday Times' tech correspondent Danny Fortson brings on Naveen Rao, founder of MosaicML, to talk about the efficiency of the human brain (3:30), slashing the cost to train AI models (8:00), how this is like the evolution of the car (11:40), selling to Databricks (14:30), how the AI market will evolve (17:00), the fallacy of AI doomerism (21:00), growing up in eastern Kentucky (22:30), plunging into the dotcom boom (24:50), why he studied neuroscience (27:10), selling his previous startup to Intel (32:00), solving intelligence (34:40), and what he tells his kids about the future (40:00). Hosted on Acast. See acast.com/privacy for more information.

How to B2B a CEO (with Ashu Garg)
How to Power the AI Boom (Robert Nishihara, Co-Founder & CEO of Anyscale)

How to B2B a CEO (with Ashu Garg)

Play Episode Listen Later Nov 17, 2023 45:36


In this episode of B2BaCEO, I speak with Robert Nishihara, co-founder and CEO of Anyscale. Anyscale's aspiration is to build the fastest, most cost-efficient infrastructure for running LLMs and AI workloads. When it is successful, Anyscale will be to the AI era what Microsoft was for the PC era: the underlying operating system on which all AI applications are developed and run. Anyscale is built on Ray, an open-source compute framework that Robert and his co-founders developed as PhD students at UC Berkeley. Under the guidance of Professor Ion Stoica, who also co-founded Conviva and Databricks, the team sought to make distributed computing broadly accessible. Anyscale was then launched as a fully managed platform for Ray, making even the toughest problems in distributed computing easy for developers to tackle. Today, Anyscale is a billion-dollar business powering mission-critical AI use cases at companies like Amazon, Cohere, Hugging Face, NVIDIA, OpenAI, and Visa. If you've been a PhD student at UC Berkeley, created a popular open-source framework, and built a billion-dollar business on top of it, you've likely learned a thing or two along the way. Robert's story offers valuable lessons for fellow founders and builders at all stages of the startup journey.

The New Stack Podcast
Integrating a Data Warehouse and a Data Lake

The New Stack Podcast

Play Episode Listen Later Nov 16, 2023 20:59


TNS host Alex Williams is joined by Florian Valeye, a data engineer at Back Market, to shed light on the evolving landscape of data engineering, particularly focusing on Delta Lake and his contributions to open source communities. As a member of the Delta Lake community, Valeye discusses the intersection of data warehouses and data lakes, emphasizing the need for a unified platform that breaks down traditional barriers.

Delta Lake, initially created by Databricks and now under the Linux Foundation, aims to enhance reliability, performance, and quality in data lakes. Valeye explains how Delta Lake addresses the challenges posed by the separation of data warehouses and data lakes, emphasizing the importance of providing ACID transactions, real-time processing, and scalable metadata.

Valeye's involvement in Delta Lake began as a response to the challenges faced at Back Market, a global marketplace for refurbished devices. The platform manages large datasets, and Delta Lake proved to be a pivotal solution in optimizing ETL processes and facilitating communication between data scientists and data engineers.

The conversation delves into Valeye's journey with Delta Lake, his introduction to the Rust programming language, and his role as a maintainer of the Rust-based library for Delta Lake. Valeye emphasizes Rust's importance in providing a high-level API with reliability and efficiency, offering a balanced approach for developers.

Looking ahead, Valeye envisions Delta Lake evolving beyond traditional data engineering, becoming a platform that seamlessly connects data scientists and engineers. He anticipates improvements in data storage optimization and envisions Delta Lake serving as a standard format for machine learning and AI applications.

The conversation concludes with Valeye reflecting on his future contributions, expressing a passion for Rust programming and an eagerness to explore evolving projects in the open-source community.
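The reliability guarantees discussed here come from Delta Lake's ordered transaction log, which lets any reader reconstruct a consistent snapshot of the table at a given version. As a rough, hypothetical illustration of that idea only, not the actual Delta protocol or the delta-rs API, here is a toy commit log in plain Python:

```python
import json

# Toy stand-in for a Delta-style table: an ordered commit log over data files.
# The real protocol (JSON commit files under _delta_log/) is far more involved;
# this only illustrates that each version is a consistent, replayable snapshot.

class ToyDeltaTable:
    def __init__(self):
        self.log = []  # ordered commits; list position doubles as the version number

    def commit(self, added_rows):
        """Append a commit; in reality this write must be atomic."""
        entry = {"version": len(self.log), "add": added_rows}
        self.log.append(json.loads(json.dumps(entry)))  # mimic serializing to storage
        return entry["version"]

    def snapshot(self, version=None):
        """Replay the log up to `version` to reconstruct the table state."""
        if version is None:
            version = len(self.log) - 1
        rows = []
        for entry in self.log[: version + 1]:
            rows.extend(entry["add"])
        return rows

table = ToyDeltaTable()
table.commit([{"id": 1}])
table.commit([{"id": 2}, {"id": 3}])
print(len(table.snapshot()))   # 3 rows at the latest version
print(len(table.snapshot(0)))  # 1 row when "time traveling" to version 0
```

Replaying older prefixes of the log is the same mechanism behind Delta Lake's time-travel queries.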
Learn more from The New Stack about Delta Lake and The Linux Foundation:
Delta Lake: A Layer to Ensure Data Quality
Data in 2023: Revenge of the SQL Nerds
What Do You Know about Your Linux System?

This Week in Pre-IPO Stocks
E74: $15b of 2024 SpaceX revenue, Shein to IPO!, Klarna to IPO!, xAI launches LLM model, OpenSea lays off 50% | Pre-IPO Stock Market Update – Nov 10, 2023

This Week in Pre-IPO Stocks

Play Episode Listen Later Nov 10, 2023 7:34


00:07 | Klarna to IPO!
- Klarna, a buy now, pay later company, is setting up a new British holding company, a step towards a potential IPO
- Current secondary market valuation is $6.2b; IPO rumored at $15b, a 142% increase

01:17 | xAI launches LLM model
- Grok is xAI's first product, an LLM built to compete with OpenAI's ChatGPT and Google's Bard
- Grok has been trained on X/Twitter data
- xAI is an independent company from X and has yet to raise external capital

02:07 | OpenSea lays off 50% of staff
- Exact number of affected employees undisclosed
- Current cuts follow a 20% staff reduction in Jul 2022 and 10 staff in May 2023
- OpenSea transaction volume is down 99% since the start of 2022

03:05 | $15b of 2024 SpaceX revenue
- $9.0b in 2023 estimated revenue
- $3.0b in 2023 estimated EBITDA
- $15b in 2024 estimated revenue, a 67% year over year increase
- Starlink is now cash flow positive and will be a majority of SpaceX revenue in 2024

04:16 | Shein to IPO!
- Shein targets up to a $90 billion valuation for a potential U.S. IPO
- IPO valuation is higher than the $64 billion earlier this year but below Apr 2022's $100 billion
- Shein is positioning itself as a global brand, moving its base to Singapore and recruiting international leaders

05:26 | Big capital raises
- Next Insurance (www.nextinsurance.com) | $265m Series G, $2.5b valuation
- Zilch (www.zilch.com) | Series D, $2.0b valuation
- WeLab Holdings (www.welab.co) | $260m Series E, $2.0b valuation
- Tabby (www.tabby.ai) | $200m Series D, $1.5b valuation
- SkyCell (www.skycell.ch) | $57m Series D, $600m valuation

06:26 | Pre-IPO +0.36% for week
* NOTE: Pitchbook posted Anthropic's Oct 2023 primary round at $25.0b post-money valuation. This has now been updated in the implied valuation report below.
- Week winners: Brex +9.11%, Epic Games +2.48%, Airtable +1.48%, Deel +1.08%, SpaceX +0.92%
- Week losers: Anthropic -2.97%, Byju's -1.61%, Reddit -1.25%, Stripe -0.81%, Databricks -0.51%
- Top valuations: ByteDance $204b, SpaceX $153b, OpenAI $80b, Stripe $51b, Databricks $47b lead in current valuation

Invest in pre-IPO stocks with AG Dillon Funds - www.agdillon.com
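The valuation moves quoted in these updates are simple percent changes; as a quick sanity check on the headline numbers (a throwaway helper for illustration, not part of the report):

```python
def pct_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100

# Klarna: $6.2b secondary-market valuation vs. a rumored $15b IPO
print(round(pct_change(6.2, 15)))  # 142
# SpaceX revenue: $9b (2023 est.) -> $15b (2024 est.)
print(round(pct_change(9, 15)))    # 67
```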

This Week in Pre-IPO Stocks
E73: Anthropic +400% in 5 months, X.com -57% to $19b, Andreessen is a Techno-Optimist, SpaceX 2nd Starship attempt | Pre-IPO Stock Market Update – Nov 3, 2023

This Week in Pre-IPO Stocks

Play Episode Listen Later Nov 3, 2023 9:23


00:10 | Andreessen is a Techno-Optimist
- Marc Andreessen is co-founder of Andreessen Horowitz, a $35b AUM venture capital firm
- "The Techno-Optimist Manifesto" celebrates the potential of technology, markets, and human intelligence for societal progress and abundance
- Challenges tech stagnation, anti-technology sentiment, and excessive tech regulation
- Created in response to government and public apprehension about AI, the metaverse, nuclear technology, and other emerging tech

01:40 | SpaceX 2nd Starship attempt
- FAA completes safety review of SpaceX's Starship rocket, clearing a major hurdle for its second liftoff
- Next, the US Fish and Wildlife Service must review and approve the new water deluge system
- Launch date remains TBD

02:32 | Startup bankruptcies could be a bottoming?
- OliveAI, Convoy, Pebble, and Hello Bello all declared bankruptcy or had a fire sale
- Each was a big company that a few years back would have easily been able to raise additional capital to keep going
- Large business failures like this force startup CEOs to recognize that they need to drive to profitability fast in order to control their own destiny, and to acknowledge that the valuations of their businesses are not going back to 2021 levels
- The goal is to get to market-clearing prices, and these realizations are catalysts to get the market there

03:57 | X.com -57% to $19b
- X announced a $19b internal valuation, 57% below Musk's purchase price of $44b
- Musk envisions X as the 'everything app', implying integration of functionalities like tweets, YouTube videos, checking accounts, Venmo, merchant payments, sports betting, news article micro-payments, and shopping
- Investment thesis for X centers on Musk's ability to [1] actualize the 'everything app' concept, and [2] its potential adoption by US users akin to Chinese users' engagement with WeChat, the Chinese 'everything app'
- WeChat is a primary product for Tencent, which has a $373b market cap

06:11 | Anthropic +400% in 5 months
- Google invests $500m now and commits $1.5b in the future at a $25b post-money valuation (source: Pitchbook)
- Google previously invested $300m in Apr 2023 for a 10% stake
- Amazon invested $1.25b in Sep 2023, with potential to add $2.75b at a future date

07:18 | Big capital raises
- Anthropic (www.anthropic.com) | $2.0b Series F, $25b valuation
- Pony.ai (www.pony.ai) | $100m Series D2, $8.5b valuation
- Island (www.island.io) | $100m Series C, $1.5b valuation
- Zhizi Automobile (www.xn--i8sq31b1uyx4b.com) | $76m Series B, $1.45b valuation
- AgentSync (www.agentsync.io) | $125m Series B, $1.25b valuation

08:28 | Pre-IPO -0.21% for week
- Week winners: Klarna +2.81%, Rippling +2.41%, Scale.ai +1.53%, Airtable +1.40%, Databricks +0.70%
- Week losers: SpaceX -5.29%, Anthropic -2.82%, Deel -2.54%, Stripe -1.73%, Discord -1.64%
- Top valuations: ByteDance $203b, SpaceX $152b, OpenAI $80b, Stripe $51b, Databricks $47b lead in current valuation

Invest in pre-IPO stocks with AG Dillon Funds - www.agdillon.com

The Analytics Engineering Podcast
Navigating AI Complexity (w/ Jonathan Frankle)

The Analytics Engineering Podcast

Play Episode Listen Later Nov 3, 2023 46:20


Jonathan Frankle is the Chief Scientist at MosaicML, which was recently bought by Databricks for $1.3 billion.  MosaicML helps customers train generative AI models on their data. Lots of companies are excited about gen AI, and the hope is that their company data and information will be what sets them apart from the competition.  In this conversation with Tristan and Julia, Jonathan discusses a potential future where you can train specialized, purpose-built models, the future of MosaicML inside of Databricks, and the importance of responsible AI practices. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch
20Product: Why You Should Not Go Into Product Management, Why the CEO is Always the CPO, How to Build the Best Product Teams & Why You Should Hire People Who Aren't In Product Already with Databricks SVP Product, David Meyer

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch

Play Episode Listen Later Nov 1, 2023 62:04


David Meyer is the SVP Products at Databricks, where he drives product strategy and execution. He previously ran Engineering and Product Management at OneLogin, where he grew the company to thousands of customers and market leadership. Before OneLogin, he cofounded UniversityNow, an accredited open university system, running Product and Engineering. Prior to that, David managed a $1 billion portfolio of business intelligence products at SAP and co-led cloud strategy. His first software journey was at Plumtree, which went public before being acquired by BEA in 2005.

In Today's Episode with David Meyer We Discuss:

1. Entry into Product:
How did David make his way into the world of product? Why did he not want to go into it?
Why does David advise everyone "do not go into product management"?
What does David know now that he wishes he had known when he entered product?

2. How to be a Great Product Leader:
Why does David think most leaders suck at leading?
Why is the most important thing to make your team feel seen? What can leaders do to ensure this?
Why does David help his team members to find other roles outside of the company?

3. Building the Best Product Team:
How does David hire for product today? What questions does he ask? What signals does he look for?
What are David's biggest hiring mistakes? How did they change his approach?
What are the biggest mistakes founders make when hiring for product?
Why should you hire people who are not in product today?

4. David Meyer: The Art or Science of Product:
Is product more art or science? If David were to put a number on it, what would it be?
Is simple always better when it comes to product?
Will AI remove the importance and focus on UI?
Why are the most impressive companies business model innovations not product innovations?

Business Breakdowns
Databricks: Data Based Decisions - [Business Breakdowns, EP.134]

Business Breakdowns

Play Episode Listen Later Nov 1, 2023 49:50


This is Zack Fuss. Today I am joined by Yanev Suissa, Managing Partner at SineWave Ventures, to break down the private company Databricks. Born out of a UC Berkeley research lab in 2013, Databricks has grown rapidly, and after 50% growth this summer, it was rumored to have last raised at a $43 billion valuation. In the most simple terms, Databricks provides tools for ingesting, transforming, and analyzing large sets of data from multiple sources in multiple formats in order to inform business and engineering decisions. Databricks is on a collision course with Snowflake as the two race to amass market share. In this conversation, we explore the nuances of structured and unstructured data, discuss data lakes, and what it entails to get "Hadooped." Please enjoy this breakdown of Databricks. Interested in hiring from the Colossus Community? Click here. For the full show notes, transcript, and links to the best content to learn more, check out the episode page here. ----- This episode is brought to you by Tegus. Tegus is the modern research platform for leading investors, and provider of Canalyst. Tired of calculating fully-diluted shares outstanding? Access every publicly-reported datapoint and industry-specific KPI through their database of over 4,000 driveable global models handbuilt by a team of sector-focused analysts, 35+ industry comp sheets, and Excel add-ins that let you use their industry-leading data in your own spreadsheets. Tegus' models automatically update each quarter, including hard to calculate KPIs like stock-based compensation and organic growth rates, empowering investors to bypass the friction of sourcing, building and updating models. Make efficiency your competitive advantage and take back your time today. As a listener, you can trial Canalyst by Tegus for free by visiting tegus.co/patrick. ----- Business Breakdowns is a property of Colossus, LLC. For more episodes of Business Breakdowns, visit joincolossus.com/episodes.
Stay up to date on all our podcasts by signing up to Colossus Weekly, our quick dive every Sunday highlighting the top business and investing concepts from our podcasts and the best of what we read that week. Sign up here. Follow us on Twitter: @JoinColossus | @patrick_oshag | @jspujji | @zbfuss | @ReustleMatt | @domcooke

Show Notes:
(00:02:32) - (First Question) - What Databricks is and why it is so successful
(00:04:38) - Real world examples of how customers use Databricks
(00:07:23) - How issues were handled historically before Databricks was available
(00:08:39) - Key examples of what helped accelerate Databricks' success
(00:10:52) - Databricks' revenue model and how it converts into bottom line
(00:12:13) - How Databricks competes with competitors like Snowflake
(00:14:11) - Competition versus symbiosis when compared to large organizations
(00:14:11) - The overall size of Databricks as a business
(00:18:09) - Costs incurred when using a database service like Databricks
(00:19:47) - The founding story of Databricks
(00:22:53) - When SineWave recognized the database's potential
(00:24:29) - The importance of partnerships and how they help grow the business
(00:27:07) - Legacy solutions that they are disintermediating or replacing in their growth
(00:27:57) - What being Hadoop'd means
(00:21:50) - A breakdown of the complexity behind switching to different database providers
(00:32:07) - The success of these businesses breaking into legacy regulated industries
(00:34:47) - Why AI is so impactful to the database
(00:37:40) - How AI is helping these businesses go to market with their software
(00:39:50) - Democratization of data access and businesses taking the opposite approach
(00:43:00) - Key reasons for investing in Databricks and potential risks to be considered
(00:46:12) - Lessons learned from studying Databricks

Learn more about your ad choices. Visit megaphone.fm/adchoices

ChatGPT: News on Open AI, MidJourney, NVIDIA, Anthropic, Open Source LLMs, Machine Learning
Databricks Raises $500M at Staggering $43B Valuation with Support from Nvidia and Capital One

ChatGPT: News on Open AI, MidJourney, NVIDIA, Anthropic, Open Source LLMs, Machine Learning

Play Episode Listen Later Oct 28, 2023 7:16


Tune in to our latest episode to uncover the astounding news of Databricks' remarkable achievement in raising $500 million at an extraordinary $43 billion valuation, with backing from industry giants Nvidia and Capital One. Explore the intricacies of this significant funding milestone and its implications for the future of data and analytics. Join us for an insightful discussion on the intersection of technology, finance, and innovation in this must-listen podcast.

Get on the AI Box Waitlist: https://AIBox.ai/
Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/
Follow me on Twitter: https://twitter.com/jaeden_ai

ChatGPT: News on Open AI, MidJourney, NVIDIA, Anthropic, Open Source LLMs, Machine Learning
Databricks' Ari Kaplan: Navigating the AI and Data Future

ChatGPT: News on Open AI, MidJourney, NVIDIA, Anthropic, Open Source LLMs, Machine Learning

Play Episode Listen Later Oct 28, 2023 32:58


Join us for a riveting episode as we engage with Databricks' Chief Evangelist, Ari Kaplan, to navigate the future of AI and data. Delve into the visionary perspectives of an industry leader on the groundbreaking developments, challenges, and transformative potential in the dynamic realm of artificial intelligence and data management. Gain invaluable insights into the evolving landscape of technology and data-driven innovation in this must-listen podcast conversation.

Get on the AI Box Waitlist: https://AIBox.ai/
Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/
Follow me on Twitter: https://twitter.com/jaeden_ai

ChatGPT: News on Open AI, MidJourney, NVIDIA, Anthropic, Open Source LLMs, Machine Learning
Databricks Joins $38M Round to Supercharge AI Marketing Data with Hightouch

ChatGPT: News on Open AI, MidJourney, NVIDIA, Anthropic, Open Source LLMs, Machine Learning

Play Episode Listen Later Oct 28, 2023 8:03


Discover how Databricks is making a significant investment in Hightouch's $38M funding round to revolutionize AI marketing data activation. Tune in to learn how this partnership aims to reshape the future of data-driven marketing using cutting-edge AI technology. Join us for an insightful discussion on the potential impact and innovations in the field of marketing and data analytics.

Get on the AI Box Waitlist: https://AIBox.ai/
Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/
Follow me on Twitter: https://twitter.com/jaeden_ai

FINOS Open Source in Fintech Podcast
Navigating FDC3 UX - Cortney Stauffer & Chuck Danielsson, Adaptive

FINOS Open Source in Fintech Podcast

Play Episode Listen Later Oct 27, 2023 34:24


In this episode of the podcast, Grizz sits down with Cortney Stauffer (Head of UX Practice) & Chuck Danielsson (Head of Practice, Web/UI), both from Adaptive. They talk about UX, UI, FDC3, and why things should just work.

Cortney Stauffer: https://www.linkedin.com/in/cortstauffer/
Chuck Danielsson: https://www.linkedin.com/in/chuck-danielsson-2141b058/

NYC November 1 - Open Source in Finance Forum: https://events.linuxfoundation.org/open-source-finance-forum-new-york/
2022 State of Open Source in Financial Services Download: https://www.finos.org/state-of-open-source-in-financial-services-2022
All Links on Current Newsletter Here: https://www.finos.org/newsletter - more show notes to come

A huge thank you to all our sponsors for Open Source in Finance Forum New York (https://events.linuxfoundation.org/open-source-finance-forum-new-york/), which will take place this November 1st at the New York Marriott Marquis. This event wouldn't be possible without our sponsors. A special thank you to our Leader sponsors: Databricks, where you can unify all your data, analytics, and AI on one platform. And Red Hat - Open to change—yesterday, today, and tomorrow. And our Contributor and Community sponsors: Adaptive/Aeron, Connectifi, Discover, Enterprise DB, FinOps Foundation, Fujitsu, instaclustr, Major League Hacking, mend.io, Open Mainframe Project, OpenJS Foundation, OpenLogic by Perforce, Orkes, Percona, Sonatype, StormForge, and Tidelift. If you would like to sponsor or learn more about this event, please send an email to sponsorships@linuxfoundation.org.

Grizz's Info | https://www.linkedin.com/in/aarongriswold/ | grizz@finos.org
►► Visit FINOS www.finos.org
►► Get In Touch: info@finos.org

FINOS Open Source in Fintech Podcast
Hackathons, Open Source, Finance? Oh My! - Jon Gottfried, Major League Hacking

FINOS Open Source in Fintech Podcast

Play Episode Listen Later Oct 26, 2023 30:23


In this episode of the podcast, Grizz sits down with Jon Gottfried, Co-Founder of Major League Hacking. They talk about hackathons in finance, and developer/engineering talent, from both the individual and hiring manager perspectives.

Jon Gottfried: https://www.linkedin.com/in/jonmarkgo/
Major League Hacking: https://sponsor.mlh.io/

NYC November 1 - Open Source in Finance Forum: https://events.linuxfoundation.org/open-source-finance-forum-new-york/
2022 State of Open Source in Financial Services Download: https://www.finos.org/state-of-open-source-in-financial-services-2022
All Links on Current Newsletter Here: https://www.finos.org/newsletter - more show notes to come

A huge thank you to all our sponsors for Open Source in Finance Forum New York (https://events.linuxfoundation.org/open-source-finance-forum-new-york/), which will take place this November 1st at the New York Marriott Marquis. This event wouldn't be possible without our sponsors. A special thank you to our Leader sponsors: Databricks, where you can unify all your data, analytics, and AI on one platform. And Red Hat - Open to change—yesterday, today, and tomorrow. And our Contributor and Community sponsors: Adaptive/Aeron, Connectifi, Discover, Enterprise DB, FinOps Foundation, Fujitsu, instaclustr, Major League Hacking, mend.io, Open Mainframe Project, OpenJS Foundation, OpenLogic by Perforce, Orkes, Percona, Sonatype, StormForge, and Tidelift. If you would like to sponsor or learn more about this event, please send an email to sponsorships@linuxfoundation.org.

Grizz's Info | https://www.linkedin.com/in/aarongriswold/ | grizz@finos.org
►► Visit FINOS www.finos.org
►► Get In Touch: info@finos.org

Data Radicals
Everything You Wanted To Know About LLMs, but Were Too Afraid To Ask with Matthew Lynley, Founding Writer of Supervised

Data Radicals

Play Episode Listen Later Oct 25, 2023 49:29


With the rise of GenAI, LLMs are now accessible to everyone. They start with a very easy learning curve that grows more complicated the deeper you go. But not all models are created equal. It's critical to design effective prompts so users stay focused and have context that will drive how productive the model is.

In this episode, Matthew Lynley, Founding Writer of Supervised, delivers a crash course on LLMs. From the basics of what they are, to vector databases, to trends in the market, you'll learn everything about LLMs that you've always wanted to know. Matthew has spent the last decade reporting on the tech industry at publications like Business Insider, The Wall Street Journal, BuzzFeed News, and TechCrunch. He founded the AI newsletter, Supervised, with the goal of helping readers understand the implications of new technologies and the teams building them. Satyen and Matt discuss the inspiration behind Supervised, LLMs, and the rivalry between Databricks and Snowflake.

--------

“This idea of, ‘How does an LLM work?' I think, the second you touch one for the first time, you get it right away. Now, there's an enormous level of intricacy and complication once you go a single step deeper, which is the differences between the LLMs. How do you think about crafting the right prompt? Knowing that they can go off the rails really fast if you're not careful, and the whole network of tools that are associated on top of it. But, when you think from an education perspective, the education really only starts when you are talking to people that are like, ‘This is really cool. I've tried it, it's awesome. It's cool as hell. But how can I use it to improve my business?' Then it starts to get complicated. Then you have to start understanding how expensive is OpenAI? How do you integrate it? Do I go closed source or open source? The learning curve starts off very, very, very easy because you can get it right away. Then, it quickly becomes one of the hardest possible products to understand once you start trying to dig into it.” – Matthew Lynley

--------

Time Stamps:
(04:21): The genesis of Supervised
(11:34): The LLM learning curve
(21:35): Time to build a vector database?
(31:55): Open source vs. proprietary LLMs
(41:35): Snowflake/Databricks overlap
(47:47): Satyen's Takeaways

--------

Sponsor
This podcast is presented by Alation. Learn more:
* Subscribe to the newsletter: https://www.alation.com/podcast/
* Alation's LinkedIn Profile: https://www.linkedin.com/company/alation/
* Satyen's LinkedIn Profile: https://www.linkedin.com/in/ssangani/

--------

Links
Read Supervised
Connect with Matthew on LinkedIn
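Since vector databases come up in the episode: at their core they store embedding vectors and answer nearest-neighbor queries, usually ranked by cosine similarity. A minimal brute-force sketch in plain Python follows; the document names and vectors are invented for illustration, and production systems use real embedding models and approximate-nearest-neighbor indexes:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "index": document id -> embedding (real systems store model-generated vectors)
index = {
    "doc_lakehouse": [0.9, 0.1, 0.0],
    "doc_pricing":   [0.1, 0.9, 0.2],
    "doc_llms":      [0.2, 0.1, 0.95],
}

def top_k(query_vec, k=1):
    """Brute-force nearest neighbors: rank every document by similarity."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(top_k([0.15, 0.05, 1.0]))  # ['doc_llms']
```

Swapping the brute-force scan for an ANN index (and the toy vectors for model embeddings) is essentially what dedicated vector databases sell.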

FINOS Open Source in Fintech Podcast
The CDM: Common Domain Model - Adrian Dale - ISLA, David Shone - ISDA, Jane Gavronsky - FINOS

FINOS Open Source in Fintech Podcast

Play Episode Listen Later Oct 24, 2023 36:42


In this episode of the podcast, our FINOS COO, Jane Gavronsky, sits down with Adrian Dale of ISLA and David Shone of ISDA to discuss the associations' contribution to, and backing of, the FINOS CDM (Common Domain Model) in the FINOS open source community. CDM: https://cdm.finos.org/ On GitHub: https://github.com/finos/common-domain-model Adrian Dale, Head of Regulation & Markets, ISLA - https://www.linkedin.com/in/adrian-dale-27942314/ David Shone, Director of Product - Data & Digital, ISDA - https://www.linkedin.com/in/david-shone/ Jane Gavronsky, COO, FINOS - https://www.linkedin.com/in/janegavronsky/ NYC November 1 - Open Source in Finance Forum: https://events.linuxfoundation.org/open-source-finance-forum-new-york/ 2022 State of Open Source in Financial Services Download: https://www.finos.org/state-of-open-source-in-financial-services-2022 All Links on Current Newsletter Here: https://www.finos.org/newsletter - more show notes to come A huge thank you to all our sponsors for Open Source in Finance Forum New York (https://events.linuxfoundation.org/open-source-finance-forum-new-york/), which takes place this November 1st at the New York Marriott Marquis. This event wouldn't be possible without our sponsors. A special thank you to our Leader sponsors: Databricks, where you can unify all your data, analytics, and AI on one platform, and Red Hat - open to change, yesterday, today, and tomorrow. And our Contributor and Community sponsors: Adaptive/Aeron, Connectifi, Discover, EnterpriseDB, FinOps Foundation, Fujitsu, Instaclustr, Major League Hacking, Mend.io, Open Mainframe Project, OpenJS Foundation, OpenLogic by Perforce, Orkes, Percona, Sonatype, StormForge, and Tidelift. If you would like to sponsor or learn more about this event, please send an email to sponsorships@linuxfoundation.org. 
Grizz's Info | https://www.linkedin.com/in/aarongriswold/ | grizz@finos.org ►► Visit FINOS www.finos.org ►► Get In Touch: info@finos.org

This Week in Pre-IPO Stocks
E71: Plaid IPO?, NY State sues Gemini, Amazon + humanoid robots, AI and your kids/business | Pre-IPO Stock Podcast – Oct 23, 2023 | Clint Sorenson, Nick Fusco, Aaron Dillon

This Week in Pre-IPO Stocks

Play Episode Listen Later Oct 24, 2023 36:29


00:18 | Plaid IPO?- Plaid hired former Expedia exec, Eric Hart, as their CFO, sparking IPO rumors- Plaid laid off 20% of their staff in 2022, IPO prep?- Visa's attempt to acquire Plaid for $5b was killed by regulators re: antitrust- Plaid at $15.4b implied secondary market valuation, +15% from Apr 2021 round at $13.4b valuation- “High quality” IPOs are coming … larger valuations, broader current investor base … Plaid, Databricks, Stripe 06:53 | NY State sues Gemini- Gemini's Earn product allocated money to DCG's Genesis, which subsequently allocated to Three Arrows Capital and Alameda Research, both of which went bankrupt, leaving a $1.1b deficit- FTX's Anthropic investment could fill the $1.1b deficit- “Picks and shovels” crypto companies are of particular interest; e.g. Kraken, Fireblocks, ConsenSys- Secondary market activity has pulled back, among both buyers and sellers 13:12 | Amazon + humanoid robots- Amazon testing Agility Robotics' Digit humanoid robot in warehouses- Robotics and AI will play a significant part in our lives over the next 5 to 10 years, impacting business and geopolitics- Robotics will have a positive impact on economic growth; population decline will require robotics to enhance productivity- Robotics have been around for a while, but humanoid robots are making the opportunity very tangible and real for many- Robotics will drive onshoring of manufacturing due to lowered costs of production 21:13 | AI and your kids/business- I turned my 5-year-old, Duke, loose on ChatGPT's audio option this past weekend- Duke has a full-on conversation with “Mr. George”, the name he gave ChatGPT- ChatGPT knew Duke was 5 years old and was prompting Duke with questions to further engage and educate him- Clint uses ChatGPT to write investment research and create initial drafts of chapters for a book he is writing- AI, like ChatGPT, will make the possibility of “1-person-companies” a reality- A $1.0b valuation 1-person-company is close!

The Daily Crunch – Spoken Edition
After $43B valuation, Databricks acquires data replication startup Arcion for $100M

The Daily Crunch – Spoken Edition

Play Episode Listen Later Oct 24, 2023 5:11


Databricks has remained a hot startup at a time when interest from investors has cooled across the ecosystem.

The Engineering Leadership Podcast
Using GenAI to address cross-functional collaboration challenges & enhance engineering leadership w/ Clemens Mewald #152

The Engineering Leadership Podcast

Play Episode Listen Later Oct 24, 2023 45:32


Clemens Mewald, Head of Product @ Instabase, joins our podcast to share his insights on how to embrace generative AI tools to enhance engineering leadership, productivity, and problem-solving. We also chat about creative ways to implement GenAI beyond simply chatbots, questions to ask when deciding how to leverage GenAI capabilities, rethinking the productivity aspect of generative AI, and how to use it to address cross-functional collaboration challenges within your org. Additionally, Clemens reveals how he translates his deep technical / research knowledge on generative AI into meaningful business & product outcomes.ABOUT CLEMENS MEWALDClemens Mewald is the Head of Product at Instabase. With over 15 years of experience in the industry, Clemens Mewald has built a successful track record as a product and technology leader in the AI and machine learning space. Previously Clemens held leadership positions at Databricks, where he spent more than three years leading the product team for Machine Learning and Data Science. Before Databricks, Clemens served on the Google Brain Team building AI infrastructure for Alphabet, where his product portfolio included TensorFlow and TensorFlow Extended (TFX). Clemens holds an MSc in computer science from UAS Wiener Neustadt, Austria, and an MBA from MIT Sloan."Whenever a new technique or like innovation comes out. It's not the product, right? So like AI is not the product. LLMs are not the product. A lot of people get caught up in the innovation, like the technology if you will, but people don't buy AI. People don't buy LLMs. They buy tools and products that like solve their own problems in the world. 
So really what you gained is like a new tool in your tool belt to solve your customer's problems, not a new thing that you can sell.” - Clemens Mewald SHOW NOTES: Why Clemens got involved with Instabase & the GenAI space (2:08) Clemens' observations on how eng leaders are applying GenAI tools (4:56) Where GenAI tools can fit the needs of an executive leader (7:00) The impact of generative AI on product building (9:04) Creative ways to apply GenAI beyond a chatbot (12:41) Strategies for better framing questions to leverage different GenAI capabilities (17:37) Understanding the unreliable aspects of generative AI tools (20:52) Leveraging these tools to tackle eng leadership-specific challenges (24:25) Rethinking the productivity aspect of GenAI tools (25:59) Using GenAI tools to address cross-functional collaboration challenges (28:32) How to align / educate different stakeholders around new generative AI capabilities (32:24) Translating technical research on GenAI into business outcomes or products (35:22) The “identify, verify, amplify” framework (36:46) Clemens' favorite methods for verifying in a lightweight way (39:35) Rapid fire questions (40:46) LINKS AND RESOURCES: AI Will Save The World with Marc Andreessen and Martin Casado - In this timely one-on-one conversation with a16z General Partner Martin Casado, Marc discusses how this technology will maximize human potential, why the future of AI should be decided by the free market, and most importantly, why AI won't destroy the world. In fact, it may save it. This episode wouldn't have been possible without the help of our incredible production team: Patrick Gallagher - Producer & Co-Host; Jerry Li - Co-Host; Noah Olberding - Associate Producer, Audio & Video Editor https://www.linkedin.com/in/noah-olberding/; Dan Overheim - Audio Engineer (Dan's also an avid 3D printer - https://www.bnd3d.com/); Ellie Coggins Angus - Copywriter, check out her other work at https://elliecoggins.com/about/

Closing Bell
Closing Bell Overtime: Why A Recession Could Boost Stocks; Databricks CEO Ali Ghodsi 10/23/23

Closing Bell

Play Episode Listen Later Oct 23, 2023 49:02


A volatile session for the markets today. Wilmington Trust EVP Meghan Shue and Barclays Head of US Equity Strategy Venu Krishna break down the market action. Melius' Head of Technology Research Ben Reitzes previews tomorrow's big tech reports from Alphabet and Microsoft, plus a late-day report from Reuters on Nvidia partnering with Arm to make PC chips. Jon sits down with Databricks CEO Ali Ghodsi. Citi's Scott Chronert makes the case for why a recession could actually boost equity prices. Plus, Marqeta CEO on consumer health and a new deal the company signed for a credit card platform. 

ChatGPT: News on Open AI, MidJourney, NVIDIA, Anthropic, Open Source LLMs, Machine Learning

In this episode, we delve into the world of AI competition and acquisitions as we explore Databricks' astonishing $1.3 billion acquisition of MosaicML, positioning the company as a formidable rival to OpenAI. We'll uncover the strategic moves, implications, and the future of the AI landscape as these tech giants make groundbreaking acquisitions in the pursuit of AI supremacy. Join us for an insightful discussion on the evolving dynamics of the AI industry. Get on the AI Box Waitlist: https://AIBox.ai/ Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/ Follow me on Twitter: https://twitter.com/jaeden_ai

Cloud Security Today
The AI Episode

Cloud Security Today

Play Episode Listen Later Oct 21, 2023 42:00 Transcription Available


Episode Summary: In today's episode, AI Safety Initiative Chair at Cloud Security Alliance, Caleb Sima, joins Matt to talk about some of the myths surrounding the quickly evolving world of AI. With two decades of experience in the cybersecurity industry, Caleb has held many high-level roles, including VP of Information Security at Databricks, CSO at Robinhood, Managing VP at Capital One, and Founder of both SPI Dynamics and Bluebox Security. Today, Caleb talks about his inspiring career after dropping out of high school, dealing with imposter syndrome, and becoming the Chair of the CSA's AI Safety Initiative. Are AI and machine learning the threats that we think they are? Hear about the different kinds of LLMs, the poisoning of LLMs, and how AI can be used to improve security. Timestamp Segments: · [01:31] Why Caleb dropped out of high school · [06:16] Dealing with imposter syndrome · [11:43] The hype around AI and machine learning · [14:55] AI 101 terminology · [17:42] Open source LLMs · [20:31] Where to start as a security practitioner · [24:46] What risks should people be thinking about? · [28:24] Taking advantage of AI in cybersecurity · [32:32] How AI will affect different SOC functions · [35:00] Is it too late to get involved? · [36:29] CSA's AI Safety Initiative · [38:52] What's next? Notable Quotes: · “There is no way this thing is not going to change the world.” · “The benefit that you're going to get out of LLMs internally is going to be phenomenal.” · “It doesn't matter whether you get in now or in six months.” Relevant Links: LinkedIn: Caleb Sima. Resources: Skipping College Pays Off For Few Teen Techies; llm-attacks.org. Secure applications from code to cloud with Prisma Cloud, the most complete cloud-native application protection platform (CNAPP). Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you.

The Deal
Drinks With The Deal: Databricks GC Trâm Phi on the MosaicML Deal and the Challenges of Hypergrowth

The Deal

Play Episode Listen Later Oct 19, 2023 24:12


In the latest Drinks With The Deal Podcast, Databricks general counsel Trâm Phi discusses the company's $1.3 billion purchase of MosaicML, the legal uncertainty around AI and the challenges of managing a growing legal department. 

FUTRtech Podcast
Highlights from Databricks' Data + AI World Tour in Chicago - Generation AI Takes Center Stage

FUTRtech Podcast

Play Episode Listen Later Oct 16, 2023 3:19 Transcription Available


Hey everybody, this is Chris Brandt, here with another FUTR podcast. Databricks hosted their Data + AI World Tour event in Chicago on October 3rd, and with all of the hype around AI, Generative AI and Large Language Models, this was the perfect place to get a feel for what is going on. To start with, it was a well-attended event - in fact, it was a sold-out event - and I understand that similar events in other cities around the world were also sold out. The theme for the show was Generation AI, a clever play on Gen AI, but certainly this is a kind of generational change we are seeing. Databricks is coming off a hot streak, with their last raise valuing them at $43 billion. Ari Kaplan, Databricks' Lead Evangelist, was there to deliver a keynote address, "Welcome to Generation AI." If you aren't familiar with Ari Kaplan, he is considered to be the "Real Moneyball Guy". Ari regaled us with stories from his time in Chicago sports, which went over well with the local crowd... Click Here to Subscribe: FUTR.tv focuses on startups, innovation, culture and the business of emerging tech with weekly podcasts featuring Chris Brandt and Sandesh Patel talking with Industry leaders and deep thinkers. Occasionally we share links to products we use. As an Amazon Associate we earn from qualifying purchases on Amazon.

Data Engineering Podcast
Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable

Data Engineering Podcast

Play Episode Listen Later Oct 15, 2023 68:28


Summary Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. 
Learn more about Datafold by visiting dataengineeringpodcast.com/datafold (https://www.dataengineeringpodcast.com/datafold) You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today to get 2 weeks free! As more people start using AI for projects, two things are clear: It's a rapidly advancing field, but it's tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES (https://Neo4j.com/NODES). Your host is Tobias Macey and today I'm interviewing Eric Sammer about starting your stream processing journey with Decodable Interview Introduction How did you get involved in the area of data management? Can you describe what Decodable is and the story behind it? What are the notable changes to the Decodable platform since we last spoke? (October 2021) What are the industry shifts that have influenced the product direction? 
What are the problems that customers are trying to solve when they come to Decodable? When you launched your focus was on SQL transformations of streaming data. What was the process for adding full Java support in addition to SQL? What are the developer experience challenges that are particular to working with streaming data? How have you worked to address that in the Decodable platform and interfaces? As you evolve the technical and product direction, what is your heuristic for balancing the unification of interfaces and system integration against the ability to swap different components or interfaces as new technologies are introduced? What are the most interesting, innovative, or unexpected ways that you have seen Decodable used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Decodable? When is Decodable the wrong choice? What do you have planned for the future of Decodable? Contact Info esammer (https://github.com/esammer) on GitHub LinkedIn (https://www.linkedin.com/in/esammer/) Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com)) with your story. 
To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links Decodable (https://www.decodable.co/) Podcast Episode (https://www.dataengineeringpodcast.com/decodable-streaming-data-pipelines-sql-episode-233/) Flink (https://flink.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/apache-flink-with-fabian-hueske-episode-57/) Debezium (https://debezium.io/) Podcast Episode (https://www.dataengineeringpodcast.com/debezium-change-data-capture-episode-114/) Kafka (https://kafka.apache.org/) Redpanda (https://redpanda.com/) Podcast Episode (https://www.dataengineeringpodcast.com/vectorized-red-panda-streaming-data-episode-152/) Kinesis (https://aws.amazon.com/kinesis/) PostgreSQL (https://www.postgresql.org/) Podcast Episode (https://www.dataengineeringpodcast.com/postgresql-with-jonathan-katz-episode-42/) Snowflake (https://www.snowflake.com/en/) Podcast Episode (https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/) Databricks (https://www.databricks.com/) Startree (https://startree.ai/) Pinot (https://pinot.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/pinot-embedded-analytics-episode-273/) Rockset (https://rockset.com/) Podcast Episode (https://www.dataengineeringpodcast.com/rockset-serverless-analytics-episode-101/) Druid (https://druid.apache.org/) InfluxDB (https://www.influxdata.com/) Samza (https://samza.apache.org/) Storm (https://storm.apache.org/) Pulsar (https://pulsar.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/pulsar-fast-and-scalable-messaging-with-rajan-dhabalia-and-matteo-merli-episode-17) ksqlDB (https://ksqldb.io/) Podcast Episode (https://www.dataengineeringpodcast.com/ksqldb-kafka-stream-processing-episode-122/) dbt (https://www.getdbt.com/) GitHub Actions (https://github.com/features/actions) Airbyte 
(https://airbyte.com/) Singer (https://www.singer.io/) Splunk (https://www.splunk.com/) Outbox Pattern (https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

This Week in Pre-IPO Stocks
E70: $10b valuation for Anduril, OpenAI next “App Store”, Loom acquired for $975m, Flexport lays off 20% of staff | Pre-IPO Stock Market Update – Oct 13, 2023

This Week in Pre-IPO Stocks

Play Episode Listen Later Oct 14, 2023 6:07


00:10 | $10b valuation for Anduril- Raising $400m to $500m at a $10b valuation, 17.6% increase from last round- Use of funds = acquisitions, research/development- Recently won $1.0b contract from US Special Ops Command 01:07 | OpenAI next “Apple App Store”- releasing developer tools at Nov 6 conference- updates will reduce costs by 20x and include vision/image capabilities- current tender offer at $90b, 210% above its Apr 2023 round just 7 months ago 02:51 | Loom acquired for $975m- Atlassian is acquiring for -36% vs last round ($1.53b)- Loom provides an asynchronous video solution- Loom has 25m customers, 5m monthly video conversations; Ford, Tesla, Amazon are customers 03:46 | Flexport lays off 20% of staff- 660 of 3300 employees impacted- 70% drop in 1H 2023 revenue- Former CEO Dave Clark removed five weeks ago 04:14 | Big capital raises- Electric Hydrogen | $380m Series C, $1.0b valuation- Headway | $125m Series C, $1.0b valuation- Bizongo | $50m Series E, $980m valuation- SuperOrdinary | $58m Series B, $800m valuation- Pulumi | $41m Series C, $391m valuation 04:57 | Pre-IPO +5.75% for week- Week winners: Anthropic +100.1% (not a typo), Rippling +26.2%, Hugging Face +22.9%, OpenAI +14.7%, Brex +8.3%- Week losers: Epic Games -5.5%, Airtable -4.5%, ByteDance -3.7%, Cohere -3.0%, Deel -2.2%- Top valuations: ByteDance $203b, SpaceX $157b, OpenAI $77b, Stripe $53b, Databricks $45b

MLOps.community
The Centralization of Power in AI // Kyle Harrison // #181

MLOps.community

Play Episode Listen Later Oct 13, 2023 61:34


MLOps podcast #181 with Kyle Harrison, General Partner at Contrary, The Centralization of Power in AI. // Abstract Kyle Harrison delves into the limitations imposed by language, underscoring how it can impede our grasp and manipulation of reality while stressing the critical need for improved language model performance for real-time applications. He further explores the perils of centralizing power in AI, with a specific focus on the "Openness of AI", where concerns about privacy are brought to the forefront, prompting his call for businesses to reconsider their reliance on it. The discussion also traverses the evolving landscape of AI, drawing comparisons between prominent machine learning frameworks such as TensorFlow and PyTorch. Notably, the episode underscores the vital role of open-source initiatives within the AI community and highlights the unexpected involvement of Meta in driving open-source development. // Bio Kyle Harrison is a General Partner at Contrary, where he leads Series A and growth-stage investing. He joined Contrary from Index, where he was a Partner, and before that he was a growth investor at Coatue. His portfolio includes iconic startups and public companies including Ramp, Replit, Cohere, Snowflake, and Databricks. He also regularly shares his analysis on the venture capital landscape via his Substack Investing 101. 
// MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Website: https://investing1012dot0.substack.com/ The Openness of AI report: https://research.contrary.com/reports/the-openness-of-ai --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Kyle on LinkedIn: https://www.linkedin.com/in/kyle-harrison-9274b278/ Timestamps: [00:00] Kyle's preferred beverage [00:20] Takeaways [03:52] Hype in technology space [09:20] Application Layer Revenue [14:44] Stability AI Lawsuit [18:08] Concern over concentration of power in AI [20:20] Transparency concerns [23:35] Open Source AI [25:57] To use or not to use Open AI [30:51] Lack of technical expertise and business-building capabilities [35:09] AI Transparency and Accountability [37:50] Traditional ML [41:47] Finding a unique approach [45:41] AGI limitations [47:43] Using Agents [49:46] Agents getting past demos [54:39] Tech Challenges & Hoverboard Dreams [58:04] Both AI hype and skepticism are foolish [1:01:27] Wrap up

AI Hustle: News on Open AI, ChatGPT, Midjourney, NVIDIA, Anthropic, Open Source LLMs

Join us for an engaging conversation with Databricks' Lead Evangelist, Ari Kaplan, as we delve into the fascinating future of AI and data. Discover the innovative trends and insights that are shaping the landscape of data-driven AI technology. Don't miss this episode for a sneak peek into the ever-evolving world of data and artificial intelligence! Get on the AI Box Waitlist: https://AIBox.ai/ Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/ Follow me on Twitter: https://twitter.com/jaeden_ai