We all know the future of software is “cloud-native.” How did we get here? What’s coming next? Join Sam Ramji and become part of this growing community of friends. Let's go far together.
Discover how Rackspace Spot is democratizing cloud infrastructure with an open-market, transparent option for cloud servers. Kevin Carter, Product Director at Rackspace Technology, discusses Rackspace Spot's hypothesis, the impact of an open marketplace for cloud resources, and how this novel approach is transforming the industry.

TIMESTAMPS
[00:00:00] – Introduction & Kevin Carter's Background
[00:02:00] – Journey to Rackspace and Open Source
[00:04:00] – Engineering Culture and Pushing Boundaries
[00:06:00] – Rackspace Spot and Market-Based Compute
[00:08:00] – Cognitive vs. Technical Barriers in Cloud Adoption
[00:10:00] – Tying Spot to OpenStack and Resource Scheduling
[00:12:00] – Product Roadmap and Expansion of Spot
[00:16:00] – Hardware Constraints and Power Consumption
[00:18:00] – Scrappy Startups and Emerging Hardware Solutions
[00:20:00] – Programming Languages for Accelerators (e.g., Mojo)
[00:22:00] – Evolving Role of Software Engineers
[00:24:00] – Importance of Collaboration and Communication
[00:28:00] – Building Personal Networks Through Open Source
[00:30:00] – The Power of Asking and Offering Help
[00:34:00] – A Question No One Asks: Mentors
[00:38:00] – The Power of Educators and Mentorship
[00:40:00] – Rackspace's OpenStack and Spot Ecosystem Strategy
[00:42:00] – Open Source Communities to Join
[00:44:00] – Simplifying Complex Systems
[00:46:00] – Getting Started with Rackspace Spot and GitHub
[00:48:00] – Human Skills in the Age of GenAI - Post Interview Conversation
[00:54:00] – Processing Feedback with Emotional Intelligence
[00:56:00] – Encouraging Inclusive and Clear Collaboration

QUOTES
CHARNA PARKEY
"If you can't engage with this infrastructure in a way that's going to help you, then I guarantee you it's not up to par for the direction that we're going. [...] This democratization — if you don't know how to use it — it's not doing its job."

KEVIN CARTER
"Those scrappy startups are going to be the ones that solve it. They're going to figure out new and interesting ways to leverage instructions. [...] You're going to see a push from them into the hardware manufacturers to enhance workloads on FPGAs, leveraging AVX 512 instruction sets that are historically on CPU silicon, not on a GPU."
In this episode of Open Source Data, Charna Parkey interviews Pete Pachal, founder of The Media Copilot. With over two decades of experience covering technology, Pete shares his insights on how AI is transforming media and journalism, and discusses how journalists can embrace AI as a tool to enhance their work and to adapt and thrive in this new environment.

QUOTES
PETE PACHAL: "AI is something that you control. I know, it feels like it's a wave that's coming over that it's unstoppable, inevitable. And that's true to a large extent. But at the same time, it's not, there's no there, right? There's no spark, there's no intent. (...) Never relinquish your role as the ultimate creator and person responsible for what's coming out of this thing."

CHARNA PARKEY: "I think that there was a point where I found myself shifting more away from media and towards individual curated newsletters because like subject matter experts in that area, I could be like maybe they're going to summarize it incorrectly, et cetera. But at least I know my theory of mind of that individual. And then when I expand that to media, I don't know who's writing what and who's shadow writing what for who."

TIMESTAMPS
00:00:00 - Introduction of Pete Pachal and his background in journalism and AI.
00:02:00 - Pete's career journey, including his work at CoinDesk and founding The Media Copilot.
00:04:00 - AI training for media professionals (journalists, PR, marketers).
00:06:00 - Evolution of AI in journalism: From skepticism to ethical frameworks.
00:08:00 - AI in content pipelines: Idea generation vs. post-production tasks.
00:10:00 - Open-source builders needing to cater to domain experts (e.g., journalists).
00:12:00 - Meta's removal of fact-checking and its implications.
00:16:00 - Public tolerance for AI errors (e.g., Apple's AI summaries).
00:18:00 - Consumer trust shifts away from platforms like Facebook/X.
00:22:00 - Ghostwriting vs. authenticity in AI-generated content.
00:24:00 - Preference for human-curated newsletters over AI summaries.
00:26:00 - AI in news digests (e.g., Perplexity, Alexa).
00:28:00 - Publisher AI experiments (Washington Post chatbot, TIME summaries).
00:32:00 - AI's impact on click-through rates and publisher economics.
00:34:00 - AI-written articles (e.g., ESPN's use case) and copyright issues.
00:36:00 - Legal battles over AI training data (NYT vs. OpenAI).
00:38:00 - Copyright concerns with AI-generated outputs.
00:40:00 - AI search tools (Perplexity, ChatGPT) and publisher licensing deals.
00:46:00 - The unhealthy impact of social media trends on journalism.
00:48:00 - Post-interview discussion: Accountability in AI and media.
00:56:00 - Leo's perspective as a journalist on AI adoption.
00:58:00 - Closing thoughts on balancing AI innovation with industry needs.
In this episode, Dr. Joan Bajorek—AI entrepreneur, author of Your AI Roadmap, and founder of Clarity AI—joins Charna Parkey to talk about what it really takes to build a future in AI. From career pivots and layoff anxiety to financial transparency and finding joy in your work, Joan shares practical advice and personal stories about navigating fear, burnout, and career uncertainty in tech, while staying grounded in purpose, community, and long-term resilience.

TIMESTAMPS
[00:00:00] — Introduction to Joan Bajorek & Her Work
[00:02:00] — Transparency About Finances and Career
[00:04:00] — The Taboo Around Talking About Money
[00:06:00] — Resilience During Tech Layoffs
[00:08:00] — How to Get Credit for Your Work
[00:12:00] — Should You Chase an AI Job?
[00:14:00] — Career Goals vs. Financial Security
[00:16:00] — Translating Academic and Life Skills into Tech
[00:18:00] — Defining and Finding Joy in Work
[00:20:00] — Multiple Income Streams and Personal Freedom
[00:24:00] — AI's Near-Future Impact on Jobs and Industries
[00:26:00] — Data and AI Opportunities in Underexplored Domains
[00:34:00] — Creating Scalable, Alternative Income Models
[00:36:00] — How Joan Maintains Long-Term Motivation
[00:42:00] — Post-Interview Discussion

QUOTES
Joan Bajorek
"Networking is how I've gotten the best opportunities and jobs of my life... LinkedIn has this research about how after COVID layoffs, 70% of people landed their next job based on an intro."

Charna Parkey
"I always try to strive for transparency, and I get such mixed results where at work with coworkers, it's absolutely valued. And then there seems to always be some sort of consequences in my personal life."
Dr. Jason Corso joins Charna Parkey to debate the critical role of data quality, how its transparency shapes AI development, and the rise of smaller, domain-specific AI models - making 2025 the year of small, specialized AI.

QUOTES
Charna Parkey
"Knowing the right data is incredibly important, because it'll save you money, but predicting the impact of that data means that you don't have to do the training at all to even directionally know if it's going to work out, right?"

Jason Corso
"You can't understand and analyze an AI system in the way you can analyze open source software if you don't have access to the data."

Timestamps
[00:00:00] - Introduction
[00:02:00] - Jason Corso's journey on open source
[00:08:00] - The importance of data in AI
[00:10:00] - Voxel51's mission
[00:14:00] - The value of open source and the importance of data in AI systems
[00:20:00] - Recent discoveries in AI
[00:28:00] - The cost of training AI models
[00:36:00] - Cooperative AI in healthcare
[00:40:00] - Charna Parkey on the impact of AI in education
[00:56:00] - The year of small AI
In this episode of Open Source Data, Charna Parkey talks with Alex Gallego, CEO and founder of Redpanda Data, about his journey as a builder, the evolution of Redpanda, and the company's new agent framework for the enterprise. Alex shares insights on low-latency storage, distributed stream processing, and the importance of developer experience to the growth of AI and the Open Source space.

Timestamps
[00:00:00] Introduction
[00:02:00] Alex Gallego talks about his background
[00:04:00] Charna Parkey discusses the importance of hands-on experience in learning.
[00:06:00] Alex explains the origins of Redpanda and how it emerged from challenges in the streaming space.
[00:08:00] Alex details the evolution of Redpanda, its use of Seastar and FlatBuffers, and its low-latency design.
[00:11:00] Alex discusses the positioning of Kafka versus Redpanda in the market.
[00:20:00] Alex introduces Redpanda's new agent framework and multi-agent orchestration.
[00:24:00] Alex explains how Redpanda fits into the evolving landscape of AI-powered applications.
[00:30:00] The future of multi-agent orchestration.
[00:44:00] Thoughts on AI model training and data retention.
[00:46:00] Alex encourages future founders and shares his perspective on risk-taking.
[00:50:00] Charna Parkey and Leo Godoy discuss the key takeaways from the conversation with Alex Gallego.
[00:52:00] Charna reflects on open source trends and the role of developer experience in adoption.
[00:54:00] Charna and Leo talk about the different types of founder journeys and the importance of team dynamics.

Quotes
Charna Parkey
"For AI, unifying historical and real-time data is critical. If you're just using nightly or monthly data, it doesn't match the context in which your prediction is being made. So it becomes very important in the future of applying AI because you need to align those things."

Alex Gallego
"Every app is going to span three layers. The first layer is going to be your operational layer, just like you have to do business right now. Then there always has to be an analytical layer, and the third layer is this layer of autonomy."
In this episode, we dive deep into the world of neuro-symbolic AI with Emin Can Turan, CEO of Pebbles AI. Learn how this technology combines neuroscience, behavioral economics, and AI to revolutionize B2B go-to-market strategies. Emin explains how neuro-symbolic AI bridges the gap between human logic and machine learning, enabling smarter, context-aware systems that democratize complex workflows for startups and enterprises alike.

Timestamps
[00:00:00] - Introduction by Charna Parkey and introduction of Emin Can Turan.
[00:02:00] - Emin's journey to AI and his background in go-to-market strategies.
[00:06:00] - Emin explains his deep R&D phase and the development of neuro-symbolic AI.
[00:08:00] - Emin describes the architecture of their AI system, including neuro-symbolic AI, generative AI, and agentic frameworks.
[00:10:00] - Explanation of neuro-symbolic AI and its relevance to domain-specific problems.
[00:12:00] - Discussion on the components of go-to-market strategies and the role of psychology and communication.
[00:16:00] - The limitations of generative AI and how they applied strict communication tactics.
[00:22:00] - Discussion on the importance of contextual science and data insights.
[00:24:00] - The three agentic frameworks they use in their system.
[00:26:00] - Explanation of how users control the product and the two co-pilots (strategy and execution).
[00:36:00] - The ethical implications of AI and the potential for misuse.
[00:38:00] - Discussion on the future of AI and the balance between dystopian and hopeful outcomes.
[00:40:00] - Emin emphasizes the importance of truth and transparency in AI development.
[00:42:00] - Emin shares his personal motivation for building his AI startup.
[00:48:00] - Closing remarks and discussion on the user experience of their platform.
[00:50:00] - Charna and Leo discuss the connection between Emin's work and the open-source community.

Quotes
Emin Can Turan
"I felt that this was the future and that AI was the only technology that can digitalize this level of complexity for everyone to use. Nothing else could, you know, you can't use normal neural networks to do this. Even generative AI is not sufficient enough."

Charna Parkey
"I would love to be able to use Gen AI for more personal things. I love technology. I have the Oura Ring. I've got the Apple Watch. I want to feed that data into something that can somehow tell me and others, here's your state of mind. Here's what you're going to be affected by."
Learn how BrightHive's AI-powered platform is democratizing data insights, making them accessible to non-technical teams across organizations. Suzanne El-Moursi discusses the importance of data fluency and how BrightHive is helping businesses harness the power of their data.

Timestamps
00:00:00 - Introduction and Background
00:02:30 - Journey to BrightHive and open source
00:06:00 - The evolution of AI and BrightHive's approach
00:14:00 - The data problem and the role of AI agents
00:22:00 - Building BrightBot with open source frameworks
00:26:00 - The future of AI agents and open source
00:30:00 - People's reaction to DeepSeek
00:34:00 - The future of work and AI
00:40:00 - AI in education and personal growth
00:42:00 - Suzanne's legacy
00:48:00 - Recap and takeaways with producer Leo Godoy

Quotes
Charna Parkey
"Every single innovation comes out of some form of restriction or need. (...) Don't come and say, 'oh, what is this? This is terrible'. I heard all kinds of responses to my excitement and to my belief."

Suzanne El-Moursi
"So if 97% of an organization is data consumers, there are strategists, the marketing analysts, the customer success associates, the managers all across the enterprise, who need to understand the insights in the company's data, in their functions, in their units, so that they can make the next right step for the customer and for their plan."
Quotes
Kent Keirsey
"When we look at open source models, if you just release the weights, and you don't really release information on how the data set was captioned, for example, or how you construct the data set, if you don't really know how it got to the artifact that was released, as a user, you do not understand how it works."

Charna Parkey
"But there's still a lot of claims by big tech right now about how anything on the internet should be fair use for training, even if, you know, it might have its own kind of copyright."

Timestamps
[00:02:00] - Kent Keirsey on his journey to open source
[00:06:00] - Kent Keirsey on the Open Model Initiative (OMI)
[00:08:00] - What makes a model truly open source
[00:12:00] - The legal landscape of AI and copyright
[00:14:00] - Kent Keirsey on the ethical implications of AI training data, fair use, and AI development
[00:26:00] - Creativity, AI tools, personal AI models and recommendation algorithms
[00:32:00] - Kent Keirsey on TikTok and cultural clash
[00:38:00] - AI, self-reflection and a decision-making tool
[00:42:00] - The Bria AI partnership
[00:52:00] - The future of creativity, AI and Robotics
[01:00:00] - Final thoughts with producer Leo Godoy

Connect with Kent Keirsey
Connect with Charna Parkey
Join Charna Parkey as she recaps a transformative year in AI, exploring the delicate balance between innovation and ethics. From open source communities to global regulations, discover how trust, diversity, and collaboration are shaping the future of technology.
Episode Quotes
Vinay Kumar
"I always believe in this: you don't need to solve a very large problem. Maybe it will take a lot of time to do that. A lot of resources to do that but something small, which you can have an opportunity to solve that could be very big or a fundamental for quite a bit is fantastic. Think of a scenario where your small fundamental idea is a base for another small fundamental idea for someone else."

Charna Parkey
"We also want to ground it a little bit in impact we've been seeing. And I think in the financial, banking, insurance industries it's not, I would say, an even distribution of advancement. Different countries have different regulations and different appetites for risk."

Timestamps
- [00:00:00] Introduction by Charna Parkey.
- [00:01:57] Vinay Kumar begins talking about his journey.
- [00:05:27] Discussion on building a search engine for STEM researchers.
- [00:07:06] Challenges with early deep learning.
- [00:09:55] Conversation shifts to ML observability.
- [00:17:06] Discussion on simplifying verticalized AI.
- [00:22:30] Impact of large language models (LLMs) on AI.
- [00:30:58] Comparison of autonomous cars with AI regulation.
- [00:37:58] Vinay mentions his science fiction novels.
- [00:42:19] Conversation summary with Producer Leo Godoy.
Quotes
Brian Magerko
"We're really trying to show that we could co-create experiences with AI technology that augmented our experience rather than served as something to replace us in creative act."

"For every project like [LuminAI], there's a thousand companies out there just trying to do their best to get our money... That's an uncomfortable place to be in for someone who has worked in AI for decades."

"I had no idea what was going to happen kind of in the future. When we started EarSketch... we were advised by a couple of colleagues to not do it. And here we are, having engaged over a million and a half learners globally."

Charna Parkey
"I remember the first robot that I built. It was part of the first robotic systems... and watching these machines work with each other was just crazy."

"If you're building a product and your goal is to engage underrepresented groups, it is on you to make sure that you're educating the folks in a way that you're trying to reach."

Episode timestamps
(01:11) Brian Magerko's Journey into AI and Robotics
(05:00) LuminAI and Human-Machine Collaboration in Dance
(09:00) Challenges of AI Literacy and Public Perception
(17:32) Explainable AI and Accountability
(20:00) The Future of AI and Its Impact on Human Interaction
Timestamps
00:00:00 - 00:01:23 - Introduction
00:01:23 - 00:04:30 - Heather Domin's Journey
00:09:50 - 00:12:48 - Open Source and AI Ethics
00:12:48 - 00:15:25 - Generative AI and Governance
00:23:40 - 00:26:22 - Future of Responsible AI Practices
00:35:37 - 00:37:31 - Advice for the Audience
00:37:31 - 00:46:04 - Reflection on Risk and Hope in AI

Quotes
Heather Domin
"I think that each of us individually can scan our environment and understand, you know, where can I make an impact? What problem can I help solve? What is the next thing that I can really contribute to?"

"There are absolutely ways to automate, you know, the prompt testing and many of the routine tasks that you want to leverage automation in that way so that you can actually have the humans focus on other things so they can focus on the critical thinking and outside the box sort of thinking that we want the humans to be focused on."

Charna Parkey
"I think that it's hard for people getting into it for the first time to jump to hope if they've experienced something that they should fear in the past. By that, I mean, groups that have been marginalized by other forms of technology are not going to start hopeful with this new one that is using their data without their permission."

"If for some reason I came to understand in a month what that meant, I should be able to go back and revoke and be like, nope, I actually don't want you to have that anymore. So I think that that would help people feel better."

Check Heather's paper: On the ROI of AI Ethics and Governance Investments
Connect with Heather
Connect with Charna
Timestamps
1. Introduction and Background (00:00:00 - 00:01:16)
2. Ethan's Journey (00:01:16 - 00:05:12)
3. The Role of Food and Agriculture (00:05:12 - 00:06:52)
4. Investment in Regenerative Agriculture and Generative AI (00:06:52 - 00:07:44)
5. Levels of AI Impact (00:07:44 - 00:12:42)
6. HowGood's Use of AI (00:12:42 - 00:13:20)
7. Consumer Impact and Corporate Responsibility (00:13:20 - 00:15:44)
8. Future of AI in Food Systems (00:15:44 - 00:20:30)
9. Innovative Perspectives on AI Training (00:20:30 - 00:21:10)

Quotes
Ethan Soloviev
"What if we're using ecological data? What if we're training on trees and insects and animals and whale song? What kind of questions would a gen AI trained on whale song and hummingbird language ask us?"

Charna Parkey
"If we have this great translator that is Gen AI, we already have text and language to code. We can do code generation. We can already interpret this code and tell me what it's going to do. Take that code to language. Why can't we do that with some of these other senses and these other measurements?"

Connect with Ethan
Connect with Charna
Timestamps
00:00:00 - Intro
00:02:00 - Beth's Journey
00:19:33 - Ontologies in AI
00:21:44 - Data Lineage and Provenance
00:32:52 - Open Source Tools
00:38:38 - Explainable AI
00:44:58 - Inspiration from Nature

Quotes
Beth Rudden: "The best thing that I could tell you that I see is that it's going to shift from more pure mathematical and statistical to much more semantic, more qualitative. Instead of quantity, we're going to have quality."

Charna Parkey: "I love that because I've been so mathematical for most of my life. I didn't have a lot of words for the feelings or expressions, right? And so I had sort of this lack of data and the Brené Brown reference you make, like I have many of her books on my shelf and I often pull, I don't even know where it is right now, but the Atlas of the Heart because I am having this feeling and I don't know what it is."

Links
Connect with Beth
Connect with Charna
Learn how Andrea Brown, CEO of Reliabl, is revolutionizing AI by ensuring diverse communities are represented in data annotation. Discover how this approach not only reduces bias but also improves algorithmic performance. Andrea shares insights from her journey as an entrepreneur and AI researcher.

Episode timestamps
(02:22) Andrea's Career Journey and Experience with Open Source (Adobe, Macromedia, and Alteryx)
(11:59) Origins of Alteryx's AI and ML Capabilities / Challenges of Data Annotation and Bias in AI
(19:00) Data Transparency & Agency
(26:05) Ethical Data Practices
(31:00) Open Source Inclusion Algorithms
(38:20) Translating AI Governance Policies into Technical Controls
(39:00) Future Outlook for AI and ML
(42:34) Impact of Diversity Data and Inclusion in Open Source

Quotes
Andrea Brown
"If we get more of this with data transparency, if we're able to include more inputs from marginalized communities into open source data sets, into open source algorithms, then these smaller platforms that maybe can't pay for a custom algorithm can use an algorithm without having to sacrifice inclusion."

Charna Parkey
"I think if we lift every single platform up, then we'll advance all of the state of the art and I'm excited for that to happen."

Connect with Andrea
Connect with Charna
Episode timestamps
(01:47) Asa Whillock's career journey at market-leading companies and the role of open source in each (Adobe, Macromedia, Alteryx)
(04:56) Feature Labs acquisition by Alteryx and its open source roots in democratizing machine learning capabilities
(11:00) Survey findings on enterprise board members' perspectives on AI and the need to move beyond policy creation to implementation and governance
(27:00) Applying AI capabilities and decision-making related to AI
(30:00) The future of AI predominance, including cost reduction, open source model advancements, and the push for demonstrating business value
(43:33) Advice for navigating AI expertise and decision-making, including continuous learning, self-awareness of decision-making models, and acknowledging knowledge limits

Quotes
Asa Whillock
"I love regulation. I think it's great. And people are like, what? Why would you say that? And the reason why I say that is because I think it puts a floor underneath all of us of what do we think good looks like?"

Charna Parkey
"I think we need to, as a community, focus on meeting them where they are if we really want the democratization that is promised. Yeah, I don't know any other way to do it."
Episode Timestamps
(02:11): Robbi Armstrong's role at KeyBank and intersection with open source and AI initiatives in the financial industry
(04:06): Compliance and regulatory trends in AI for banking
(12:10): Organizational Change Management with AI
(28:00): Responsible and Ethical AI
(37:00): Financial Literacy and AI

Quotes
Robbi Armstrong
"I truly believe that if you are an organization and you are sitting back and you're not organizing a team and you're not organizing a program and you're not learning, you're not looking at education, you're not looking at change management around Gen AI, I don't think you'll be here in two years. I really truly believe that. Because you won't be able to compete."

Charna Parkey
"I think the democratization is real and I think it's incredibly important because that step in between the domain expert and the technology is very lossy. You know, oftentimes we say, well, if only I had the data to answer your question let me give you a different answer or let me answer it completely and now we can actually put it in the hands of the experts and say, well, oh, then let's go collect that data."

Links
Connect with Robbi
Connect with Charna
Episode timestamps
(05:06): State of open source in the UK
(07:22): Importance of open source community
(15:19): Balancing openness and regulation in AI
(21:19): Pace of technological development and regulation
(28:21): Reliability and discernment with AI outputs
(35:24): Universal advice

Quotes
Amanda Brock
"I think the governments that are going to win, the governments that are going to have the best regulation that promotes most innovation are going to be the ones which are able to make their regulatory environment flow in the same way as the technology evolution and innovation flows."

Charna Parkey
"I think the expectation needs to change. Part of what has happened with, you know, literal text search or keyword search and just Google and things like that, is that the average person expects what comes back to be relatively factual. That it's been referenced and, you know, backlinked, etc. That's a deterministic system. These are not. These are based upon statistical likelihoods of what word should come next."

Links
Connect with Charna
Connect with Amanda
Episode timestamps
(02:15): Tacita's unconventional career path to becoming a CTO
(07:00): Textio's practices for building AI responsibly and ethically
(14:00): The impact of Textio's AI on performance feedback
(17:00): The importance of purpose-built vs generic AI models
(28:00): Balancing open source and proprietary data/models
(42:00): Advice for the AI industry moving forward

Quotes
Tacita Morway
"When you've got a team with different backgrounds, educational, lived experiences, identity, careers, all of those things, we have those different perspectives in the room. And we're all working off of the same expectations. We can catch each other's gaps."

Charna Parkey
"There's an interesting conversation happening, I think, in the community right now about these purpose-built LLMs. Are they as good as generic LLMs? Sure, certainly if you're not going to apply something purpose-built to something generic or outside of its domain, it is not as good. But I think some of this shows us that unless you have something purpose-built and unless you're leveraging the data in the right way, you may just be feeding noise back into the system."

Links
Connect with Tacita
Connect with Charna
Timestamps
(00:02:29) Fabiana's journey starting YData and becoming a public speaker
(00:20:19) Misconceptions and hype around generative AI and AGI
(00:32:46) Potential real-world impact and use cases of LLMs today
(00:34:55) The role of synthetic data in making AI models more robust and fair
(00:43:55) Advice for founders: value your time and learn to say no
(00:48:24) The importance of technical leaders being able to communicate well

Quotes
Charna Parkey: "It's a balance. I think that's also what led us to some of the demographic based data science. Essentially, folks were making like event data into pre-aggregated data. And then they were trying to obscure it so much that you couldn't get back to the person. And so you're like, okay, what's their age and what's their gender? And you're like, that's not actually the most useful part of data science that can't predict behavior or intent or any of that. It throws out time as a component of the entire process, seasonality, everything. And so there just, there has to be a better way."

Fabiana Clemente: "I have to say, that's a very beautiful way to put it. Hallucinations, I have to say. I never thought about that. And it makes a lot of sense. I do think, though, that in terms of LLMs, it's so language, it's so definitely, it sounds like we are getting very, very intelligent system, exactly, because language is very complex. And we know that was needed for the leap of humanity. I do think there are other, the sense of combining. Well, and here we enter in the multimodal kind of space. It's what's missing."

Links
Connect with Charna
Connect with Fabiana
Episode timestamps
(02:15): Challenges of collecting open source usage data
(22:06): Driving impact with open source usage data
(28:27): Avi's entrepreneurial journey
(39:42): Persistence and vision in startups
(44:03): Tracking outcomes to stay motivated

Quotes
Avi Press
"I mean, one thing is, for any project that you might be thinking about doing or any initiative that you want to work on or goal that you have, I think there's a lot of power in just trying the thing. You may not have all the details figured out, but just try it anyway and see where it takes you. And I think a lot of projects that I've ever worked on that led anywhere, I didn't know all these details, but I just start trying and seeing what works anyway and being very open to it not working out, but attempting it anyway. And then the other thing, which is I think admittedly fitting into our agenda at Scarf, but it is something that I really believe, which is that for any of these things you're doing, tracking the outcomes of that thing is very, very important and will both be tactically helpful, but also I think, like you said, give you these inspirational moments that keep you going, whether that's awe or inspiration or fulfillment or whatever that feeling is that helps you keep going. I think that tracking the outputs of your work such that you can understand the impact that you have is both very strategic and the most rewarding way to do anything, I think."

Charna Parkey
"Given the venture-backed nature of a lot of these startups, there's going to have to be some sort of monetization at some point. You're not gonna have 1 million, 10 million, 40 million dollars dumped into just giving software away for free. So sort of these misaligned motivations are certainly what raised my hackles where I'm like, oh, you're claiming forever or you're claiming that you're like a values-driven organization, but you're venture-backed and you need to make money. And so show me how those motivations align or misalign. Tell me what your monetization strategy is gonna be. I know you need one. That way I'm not wondering, should I use this? Should I not?"

Links
Connect with Charna
Connect with Avi
Timestamps
00:00 - Intro
05:10 - Paula's Professional Journey
10:30 - What Inspired Paula to Go Through the Open Source Path
14:50 - What are some of the biggest challenges and impacts that Paula sees in companies trying to derive value?
23:30 - Is the Tech World a Meritocracy?
25:35 - A Shift Of What is a Tech Company?
27:30 - Kids Interacting with New Technologies
31:30 - What Does Open Source Data Mean to Paula?
42:50 - What is a Question that Paula has never been asked before?
47:00 - What Advice would you give to the audience?
51:50 - Backstage with Executive Producer Leo Godoy

LinkedIn - Connect with Charna
LinkedIn - Connect with Paula
This episode features an interview with Charna Parkey, Real-Time AI Product and Strategy Leader at DataStax. Charna has been developing AI and ML products over the last 17 years and has worked with 90 of the Fortune 100 in her various roles. She is also a co-author and inventor on several patents.

In this episode, Sam and Charna discuss handing over the role as host, Sam's new startup journey, and how their thinking has evolved during the explosion of LLMs.

-------------------

"Now, it seems like we have this opportunity where the conversation and the place that society is at is different. Where we want to contribute to the right set of data when we talk open source data. We want to make sure that we have the right data to train this model in order to get the right outcome. We want to provide a lens of, 'All right, you are this persona. How would you say this thing?' I do think that from a lot of what the LLMs have today, the outcome of those words are still missing. And we need to solve that. Like, 'Is this piece of writing actually going to achieve the outcome I want versus am I following legal's guidelines? Am I technically correct? Is my CEO going to like it?' That doesn't mean you're achieving impact in the world. There's an aspect there where we've given feedback loops, it seems, to be like, 'Did I like the answer or not?' But not, 'Did I take an action?' As we get to autonomousness, we're going to have to have an outcome or multiple outcomes associated with the reward of the system." – Charna Parkey

"I personally believe that all cognition is bias. My degree is in cognitive science. One of the things that we trained on is attention. And to pay attention, literally means to selectively choose what data is coming in from the world that you're going to pay attention to and what you're going to discard. Which is also, to me, the definition of bias. All cognition is bias, but what do we care about? Do you trust this thing? What does that mean? Well, do you trust it to do these particular actions to a level of consistency in this particular domain? It doesn't mean that you're going to trust it in all environments. There's a lot more nuance that hopefully will evolve in this strange age of nuanced destruction machines." – Sam Ramji

-------------------

Episode Timestamps:

(01:04): Sam and Charna catch up
(06:05): Sam explains his new company, Sailplane
(14:21): How Charna's thinking has evolved during the LLM explosion
(25:45): Sam's thoughts after 5 seasons of Open||Source||Data
(38:52): What Charna is looking forward to in the next season of the podcast
(40:44): A question Sam wishes to be asked
(45:45): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:

LinkedIn - Connect with Charna
LinkedIn - Connect with Sam
Learn more about Sailplane
This episode features a panel discussion with Stefano Maffulli, Executive Director of the Open Source Initiative (OSI); and Stephen O'Grady, Co-founder of RedMonk. Stefano has decades of experience in open source advocacy. He co-founded the Italian chapter of Free Software Foundation Europe, built the developer community of the OpenStack Foundation, and led open source marketing teams at several international companies. Stephen has been an industry analyst for several decades and is author of the developer playbook, The New Kingmakers: How Developers Conquered the World.

In this episode, Sam, Stefano, and Stephen discuss the intersection of open source and AI, good data for everyone, and open data foundations.

-------------------

"Internet Archive, Wikipedia, they have that mission to accumulate data. The OpenStreetMap is another big one with a lot of interesting data. It's a fascinating space, though. There are so many facets of the word 'data.' One of the reasons why open data is so hard to manage and hasn't had that same impact of open source is because, like Stephen, the stories that he was telling about the startups having a hard time assembling the mixing and matching, or modifying of data has a different connotation. It's completely different from being able to do the same with software." – Stefano Maffulli

"It's also not clear how said foundation would get buy-in. Because, as far as a lot of the model holders themselves, they've been able to do most of what they want already. What's the foundation really going to offer them? They've done what they wanted. Not having any inside information here, but just judging by the fact that they are willing to indemnify their users, they feel very confident legally in their stance. Therefore, it at least takes one of the major cards off the table for them." – Stephen O'Grady

-------------------

Episode Timestamps:

(01:44): What open source in the context of AI means to each guest
(16:21): Stefano explains OSI's opportunity to shine a light on models and teams
(21:22): The next step of open source AI according to Stephen
(25:38): Creating better definitions in order to modify software
(33:09): The case of funding an open data foundation
(42:31): The future of open source data
(51:54): Executive producer, Audra Montenegro's backstage takeaways

-------------------

Links:

LinkedIn - Connect with Stefano
Visit Open Source Initiative
LinkedIn - Connect with Stephen
Visit RedMonk
This episode features a panel discussion with Mikiko Bazeley, Head of MLOps at Featureform; Zain Hasan, Senior Developer Advocate at Weaviate; and Tuana Celik, Developer Advocate at deepset.

In this episode, Mikiko, Zain, and Tuana discuss what open source data means to them, how their companies fit into the AI-first ecosystem, and how jobs will need to evolve with the AI-native stack.

-------------------

"We're almost part of a fancy new AI robot kitchen that you'd find in Tokyo, in some ways. I see a virtual feature store as, yes, you can have a bunch of your ingredients tossed into a closet. Or, what you can do is you can essentially have a nice way to organize them. You can have a way to label them, to capture information." – Mikiko Bazeley

"I really like that analogy as well. I like how Mikiko put it where a vector search engine is really extracting value from what you've already got. [...] So where I see vector search engines, really, is if we think of these embedding providers as the translators to take all of our unstructured data and bring it into vector space into a common machine language, vector search engines are essentially the workhorses that allow us to compute and search over these objects in vectorized format. They're essentially the calculators of the AI stack." – Zain Hasan

"Haystack, I would really position as the kitchen. I need Mikiko to bring the apples. I need Zain to bring the pears. I need Hugging Face or OpenAI to bring the oranges to make a good fruit salad. But, Haystack will provide the spoons and the pans and the knives to make that into something that works together." – Tuana Celik

-------------------

Episode Timestamps:

(02:58): What open source data means to the panelists
(09:11): What interested the panelists about AI/ML
(24:10): Mikiko explains Featureform
(27:00): Zain explains Weaviate
(30:23): Tuana explains deepset
(36:00): The panelists discuss how their companies fit into the AI-first ecosystem
(44:58): How jobs need to evolve with the AI-native stack
(54:35): Executive producer, Audra Montenegro's backstage takeaways

-------------------

Links:

LinkedIn - Connect with Mikiko
Visit Featureform
LinkedIn - Connect with Zain
Visit Weaviate
LinkedIn - Connect with Tuana
Visit deepset
Visit Data-centric AI
This episode features an interview with Mona Rakibe, CEO and Co-founder of Telmai, an AI-based data observability platform built for open architecture. Mona is a veteran in the data infrastructure space and has held engineering and product leadership positions that drove product innovation and growth strategies for startups and enterprises. She has served companies like Reltio, EMC, Oracle, and BEA where AI-driven solutions have played a pivotal role.

In this episode, Sam sits down with Mona to discuss the application of LLMs, cleaning up data pipelines, and how we should think about data reliability.

-------------------

"When this push of large language model generative AI came in, the discussions shifted a little bit. People are more keen on, 'How do I control the noise level in my data, in-stream, so that my model training is proper or is not very expensive, we have better precision?' We had to shift a little bit that, 'Can we separate this data in-stream for our users?' Like good data, suspicious data, so they train it on little bit pre-processed data and they can optimize their costs. There's a lot that has changed from even people, their education level, but use cases also just within the last three years. Can we, as a tool, let users have some control and what they define as quality data reliability, and then monitor on those metrics was some of the things that we have done. That's how we think of data reliability. Full pipeline from ingestion to consumption, ability to have some human's input in the system." – Mona Rakibe

-------------------

Episode Timestamps:

(01:04): The journey of Telmai
(05:30): How we should think about data reliability, quality, and observability
(13:37): What open source data means to Mona
(15:34): How Mona guides people on cleaning up their data pipelines
(26:08): LLMs in real life
(30:37): A question Mona wishes to be asked
(33:22): Mona's advice for the audience
(36:02): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:

LinkedIn - Connect with Mona
Learn more about Telmai
This episode features an interview with Larry Augustin, angel investor and advisor to early-stage technology companies. Larry previously served as the Vice President for Applications at AWS, where he was responsible for application services like Pinpoint, Chime, and WorkSpaces.

Before joining AWS, Larry was the CEO of SugarCRM, an open source CRM vendor. He also was the founder and CEO of VA Linux, where he launched SourceForge. Among the group who coined the term "open source", Larry has sat on the boards of several open source and Linux organizations.

In this episode, Sam and Larry discuss who owns the rights to data, the data in to data out ratio, and why Larry is an open source titan.

-------------------

"People are willing to give up so much of their personal information because they get an awful lot back. And privacy experts come along and say, 'Well, you're taking all this personal information'. But then most people look at that and say, 'But I get a lot of value back out of that.' And it's this data ratio value question, which is: for a little in, I get a lot back. That becomes a key element in this. And I think there has to be some kind of similar thought process around open source data in general, which is if I contribute some data into this, I'm going to get a lot of value back. So this data in to data out ratio, I think it's an incredibly important one. And it gets everyone in the mindset of, 'How do I provide more and more and take less and less?' It's a principle of application development that I like a lot. And I think there's a similar concept here around open source data. Are there models or structures that we can come up with where people can contribute small amounts of data and as a result of that, they get back a lot of value." – Larry Augustin

-------------------

Episode Timestamps:

(02:52): How Larry is spending his time now after AWS
(06:25): What drove Larry to open source
(18:41): What is the GPL for data?
(24:28): Areas of progress in open source data
(28:57): The data in to data out ratio
(36:39): Larry's advice for folks in open source

-------------------

Links:

LinkedIn - Connect with Larry
Twitter - Follow Larry
This episode features an interview with Jorge Torres, Co-founder and CEO of MindsDB. MindsDB is a virtual AI database that works with existing data to help developers build AI-centered apps. In 2008, Jorge began his work on scaling solutions using machine learning as the first full-time engineer at Couchsurfing, growing the company from a few thousand users to a few million. He has also served a number of data-intensive start-ups and was a visiting scholar at UC Berkeley researching machine learning automation and explainability.

In this episode, Sam and Jorge discuss the inspiration and challenges behind MindsDB, classic data science AI versus applied AI, and time series transformers.

-------------------

"So much data in the world is time series data, so much data. Even data that people don't know is time series, it's time series. So long as it's moving over time, it is time series data. Whether you store it or not, that's a different thing. For having a pre-trained model on time series data, it even enabled the fact that you don't have to store all the historical data. You can just take the model and start passing data as it comes through, and then you get out the forecast. So you don't even have to have the historical data. All you need to have is the data at that given instance, and you can pass it to the model and you get an output. It's mind blowing." – Jorge Torres

-------------------

Episode Timestamps:

(05:20): The inspiration behind MindsDB
(10:20): Classic data science AI approach vs. applied AI
(22:09): What open source data means to Jorge
(28:51): What excites Jorge about Nixtla and time series transformers
(37:07): A question Jorge wishes to be asked
(40:20): Jorge's advice for the audience
(41:38): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:

LinkedIn - Connect with Jorge
Learn more about MindsDB open source code
Learn more about MindsDB
On this episode, we've partnered with the Future Rodeo podcast for a discussion between Sam and Matt Wallace. Matt is the Chief Technology Officer and EVP at Faction, a pioneer of multi-cloud data services, and host of Future Rodeo.

In this episode, Sam and Matt discuss Microsoft's transformation, the impact of Kubernetes on container orchestration, and the rapid acceleration of AI research and development.

-------------------

Episode Timestamps:

(01:38): Microsoft's open source transformation
(13:19): The impact of Kubernetes and how it defragmented the industry
(22:06): The transformative power of AI and how it's changing the value of reasoning
(54:58): The concept of cognitive economy and its potential impact on AI and software development
(01:03:25): Potential implications of advancements in robotics, AI, and clean energy
(01:04:17): Sam's advice for those entering the industry or choosing a career path

-------------------

Links:

LinkedIn - Connect with Matt
Listen to the Future Rodeo podcast
This episode features an interview with Abby Kearns, technology executive, board director, and angel investor. Her career has spanned executive leadership, product marketing, product management, and consulting across Fortune 500 companies and startups, including Puppet, Cloud Foundry Foundation, and Verizon. Abby currently serves as a board director for Lightbend, Stackpath, and Invoke.

In this episode, Sam sits down with Abby to discuss the betrayal source license, the role open source plays in AI, and empowering trust.

-------------------

"There's so much happening so quickly that I think open source has the power to help harness a lot of that innovative conversation. In a way that I think it's going to be really, really hard to match in a proprietary way. I think open source and the ability, given the fact that we're talking about AI and data, the two are very interrelated at this point. AI is not super interesting without data. I think the power of open source right now and what's happening, I think it has to happen in open source and I think it really has to have that level of transparency and visibility. But, always the ability for everyone to step up and understand what's happening at this moment in time and shape it." – Abby Kearns

-------------------

Episode Timestamps:

(00:50): Sam and Abby discuss the betrayal source license
(14:12): What open source data means to Abby
(23:30): Abby dives into the companies she's investing in
(34:30): How nonprofits can empower trust
(38:32): A question Abby wishes to be asked
(40:21): Abby's advice for the audience
(43:53): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:

LinkedIn - Connect with Abby
Twitter - Follow Abby
Read Design the Life You Love
This episode features an interview with Daniel Lenton, Founder and CEO of Ivy, where the team is on a mission to unify the fragmented AI stack. Prior to Ivy, Daniel was a Robotics Research Engineer at Dyson and a Deep Learning Research Scientist for Amazon Prime Air. During his PhD, Daniel explored the intersection between learning-based geometric representations, ego-centric perception, spatial memory, and visuomotor control for robotics.

In this episode, Sam and Daniel discuss the inspiration behind Ivy, open source reproducibility, and democratizing AI.

-------------------

"There's too much amazing stuff going on, from too many different parties. We just want to be the objective source of truth to show you the data and show you where your model will be doing best, and continue to do this as a service or something like this. This is high-level, some of the areas we see and going into, we really want to be a useful tool for anybody that wants to just kind of understand this fragmented complex space quickly and intuitively, and we are trying to be the tool that does that." – Daniel Lenton

-------------------

Episode Timestamps:

(01:00): What open source data means to Daniel
(05:37): The challenges of building Ivy
(15:37): The future of Ivy
(25:19): Who should know about Ivy
(28:46): Daniel's advice for the audience
(32:00): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:

LinkedIn - Connect with Daniel
Learn more about Ivy
This episode features an interview with Demetrios Brinkmann, Founder of the MLOps Community, an organization for people to share best practices around MLOps. Demetrios fell into the Machine Learning Operations world and has since interviewed leading names around MLOps, data science, and machine learning.

In this episode, Sam sits down with Demetrios to discuss LLM in production use cases, ML engineering teams, and the LLM Survey Report from the MLOps Community.

-------------------

"I think the most novel ones that I saw from the survey were when a chat bot would prompt a human as opposed to the human prompting the chat bot. It's almost like you have this LLM coach. And in that way, it's not necessarily like this isn't LLM in production that an end user is getting that's not outside the business or that is outside the business. It's more like internally, you can think about maybe it's an accountant and the accountant is filing my taxes for the year. As they're filing them, the LLM is prompting them on different tax laws that maybe they weren't thinking about or different ways that they could file things." – Demetrios Brinkmann

-------------------

Episode Timestamps:

(04:30): LLMs as the new standard
(19:26): Key LLM in production use cases
(31:18): What open source data means to Demetrios
(34:36): What Demetrios is seeing in open source AI models
(42:44): One question Demetrios wishes to be asked
(44:41): Demetrios's advice for the audience
(47:19): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:

LinkedIn - Connect with Demetrios
Read the LLM Survey Report
Listen to The MLOps Podcast
This bonus episode features conversations from season 5 of the Open||Source||Data podcast. In this episode, you'll hear from Jaya Gupta, Partner at Foundation Capital; Yuliia Tkachova, Co-founder and CEO of Masthead Data; and Omoju Miller, Founder and CEO of Fimio.

Sam sat down with each guest to discuss how they are building foundations for trust, inspiration, and reputation as we all race into the AI-centric future.

You can listen to the full episodes from Jaya Gupta, Yuliia Tkachova, and Omoju Miller by clicking the links below.

-------------------

Episode Timestamps:

(00:49): Jaya Gupta
(01:48): Yuliia Tkachova
(03:03): Omoju Miller

-------------------

Links:

Listen to Jaya's episode
Listen to Yuliia's episode
Listen to Omoju's episode
This episode features an interview with Jaya Gupta, Partner at Foundation Capital, where she leads early-stage investments across the enterprise software stack. Previously, Jaya was a Senior Business Analyst at McKinsey & Company focusing on software diligence and helping startups expand their go-to-market strategies.

In this episode, Sam and Jaya discuss her journey to Foundation Model Ops, how software is becoming more accessible, and the democratization of AI tools.

-------------------

"At the end of the day, FMOps isn't just about the new tools. It's actually more about the new builders, the new workflows, and a completely new market of customers. I was on the other day, looking at LangChain's page of integrations, I don't know if you've seen it, but it's like Anyscale, Databricks, all these other huge legendary companies are integrating with LangChain, and I think it's clear that there's a huge community that is building something real and valuable." – Jaya Gupta

-------------------

Episode Timestamps:

(01:05): What open source data means to Jaya
(08:51): Jaya's journey to Foundation Model Ops
(15:58): How software is becoming more accessible
(23:04): The democratization of AI tools
(27:01): One question Jaya wishes to be asked
(29:32): Jaya's advice for the audience
(31:51): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:

LinkedIn - Connect with Jaya
Follow Jaya on Twitter
Learn more about FMOps
This episode features an interview with Bart Farrell, a CNCF Ambassador, Cloud Native Community Consultant, and Content Creator. An American entrepreneur living in Spain, Bart has spent the last decade helping tech companies broaden their audience through exceptional content. He has organized and hosted over 250 cloud native in-person and virtual events in 10 different countries.

In this episode, Audra and Bart discuss upcoming AI and MLOps events, his work as a community consultant, and what open source data means to him.

-------------------

"When we're looking at other technologies, in particular use cases like low latency, if we're talking about autonomous vehicles, we're talking about the financial sector, we're talking about fraud detection, things where decisions have to be made in real time. What are the technologies that are helping out with that? How can organizations, some that are more advanced than others, go through that adoption phase? And others that aren't so advanced, that haven't really moved things yet into production, how can they be better prepared in order to tackle these challenges that are coming up? That being said, we've got quite a cross section of different larger and smaller organizations that are really playing a pivotal role in the changes that are going on when it comes to edge meeting AI and MLOps." – Bart Farrell

-------------------

Episode Timestamps:

(01:27): Bart's background
(02:45): Bart dives into The Cutting Edge of MLOps live event
(06:18): What open source data means to Bart

-------------------

Links:

LinkedIn - Connect with Bart
Twitter - Follow Bart
Learn more about The Cutting-EDGE of MLOps webinar
Learn more about Edgecase 2023
Listen to The AI-Native Stack with Mikiko Bazeley, Zain Hasan, and Tuana Celik
This episode features an interview with Omoju Miller, Founder and CEO of Fimio, a web3 reputation company. Originally from Lagos, Nigeria, Omoju holds a doctoral degree in Computer Science Education from UC Berkeley. Her expertise in machine learning and computational intelligence led her to companies such as Google and GitHub. Omoju also served as a volunteer advisor to the Obama administration's White House Presidential Innovation Fellows.

In this episode, Sam sits down with Omoju to discuss how machine learning can make applications more secure, what the future of the internet looks like, and the fascinating story behind Fimio.

-------------------

"So my first view is, in this future internet we have people, we also have bots, we have machines, we have code doing things. And bots sounds like such a horrible word now. [...] You need to have a level of trust on what that bot is. Everything from the humans to the machines collaborating in this decentralized world, we need to have some kind of reputation attached to each of those nodes. And the reason why we need that reputation is, as the thing scales, it becomes overwhelming to get value from it. You need something to help you filter, to find what you're looking for. Otherwise, you get stuck in that environment where you're just completely overwhelmed and you don't even know what to do. So I think of what I'm doing as just reputation to make this decentralized future slightly more attainable." – Omoju Miller

-------------------

Episode Timestamps:

(00:59): Omoju's inspiration for starting Fimio
(10:27): The future of smart contracts
(28:47): Using mathematics to guarantee the safety of algorithms
(34:34): What led Omoju to building a mathematical product
(51:27): What open source data means to Omoju
(55:38): One question Omoju wishes to be asked
(57:47): Omoju's advice for the audience
(01:00:08): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:

LinkedIn - Connect with Omoju
Visit Fimio
This episode features an interview with Yuliia Tkachova, Co-founder and CEO of Masthead Data, an observability platform that catches anomalies in Google BigQuery in real time. She holds degrees in Management Information Systems, Math, Statistics, and Marketing. Prior to Masthead, Yuliia designed complex BI products and solutions powered by ML and utilized by Fortune 500 companies. In this episode, Sam and Yuliia discuss how ML is shaping the future of data analytics, caring about users, and the fundamental human right to privacy.-------------------“We map those errors and anomalies on lineage, helping to understand what upstreams and downstreams are affected, what business users are affected. And that actually speeds up all the troubleshooting from hours to minutes. And this is the ultimate goal where we deliver. Because again, my belief is that if you don't have this lineage piece with mapped anomalies and errors, it's not observability. It's monitoring. [...] What is also very unique to us, because Masthead operates on logs, it's triggered by logs. So, we do support streaming data. Unlike SQL-first solutions, as you can guess. We don't have to run SQL queries to see if they're anomalous, we're triggered by logs. And this is also what sets us apart.” – Yuliia Tkachova-------------------Episode Timestamps:(01:14): What got Yuliia excited about math and statistics(11:31): The basic human right to privacy(18:21): What open source data means to Yuliia(28:00): Yuliia's reason for building a solution focused on privacy and security(38:09): One question Yuliia wishes to be asked(42:21): Yuliia's advice for the audience(44:46): Backstage takeaways with executive producer, Audra Montenegro-------------------Links:LinkedIn - Connect with YuliiaVisit Masthead Data
This episode features an interview with Maxim Fateev, Co-founder and CEO of Temporal, an open source, distributed, and scalable workflow orchestration engine capable of running millions of workflows. He has 20 years of experience architecting mission-critical systems at Uber, Google, Amazon, and Microsoft. In this episode, Sam sits down with Maxim to discuss workflow services, the power behind Temporal, and bringing determinism to highly complex environments.-------------------“[Temporal] has this notion of workflows, which can run for a very long time and handle external events, you can treat them as a durable actor. And they're very good at implementing a lifecycle. For example, you can have an object per model and let this object handle all the events. Like, new data came in, notify this object, this object will go and retrain it. Or, it'll run an activity to periodically check the status. So you can have end-to-end lifecycle implemented fully in Temporal.” – Maxim Fateev-------------------Episode Timestamps:(01:03): What's top of mind for Maxim in workflow services(04:09): What open source data means to Maxim(11:07): Maxim explains his time at AWS and building Cadence at Uber(23:09): Use cases and the community of Temporal(28:26): How Temporal is being used for ML workloads(32:28): One question Maxim wishes to be asked(36:38): Maxim's advice for those working with complex distributed systems(39:11): Backstage takeaways with executive producer, Audra Montenegro-------------------Links:LinkedIn - Connect with MaximTemporal.ioWatch Maxim's talk “Designing a Workflow Engine from First Principles”Replay Conference 2023
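The durable-actor pattern Maxim describes maps fairly directly onto Temporal's SDKs. Below is a minimal sketch (not from the episode) using the Temporal Python SDK; the workflow, signal, and model-retraining activity names are hypothetical, and a real deployment would also register these with a Worker against a Temporal server.

```python
# Sketch of a long-running "durable actor" workflow: one workflow instance per
# model, woken by external events and running an activity to retrain.
from datetime import timedelta
from temporalio import activity, workflow


@activity.defn
async def retrain_model(model_id: str) -> str:
    # Placeholder for a long-running retraining job.
    return f"retrained:{model_id}"


@workflow.defn
class ModelLifecycleWorkflow:
    def __init__(self) -> None:
        self._new_data = False

    @workflow.signal
    def new_data_arrived(self) -> None:
        # External event: fresh data exists for this model.
        self._new_data = True

    @workflow.run
    async def run(self, model_id: str) -> None:
        while True:
            # Sleep durably until a signal arrives, however long that takes.
            await workflow.wait_condition(lambda: self._new_data)
            self._new_data = False
            await workflow.execute_activity(
                retrain_model,
                model_id,
                start_to_close_timeout=timedelta(hours=1),
            )
```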
This episode features a panel discussion with Charna Parkey, a Real-Time AI Product and Strategy leader at DataStax; and Sam Bean, Staff Engineer at You.com. Charna is a co-author and inventor on several patents, including patent-pending work on ML/coordinated feature engine at the edge. Sam helped create the Spark connector to Weaviate, and is passionate about Big Data, Spark, NLP, Hugging Face, and large language models. In this episode, Charna and Sam discuss adapting to user expectations, what's missing in the AI stack, and how to become an advanced citizen in open source.-------------------"We've seen these companies start to better understand that these streaming technologies have a place, whether it's Kafka or Flink or Pulsar, but it's still incredibly difficult to use and we need a different level of abstraction. [...] We're starting to see the stack change so that it becomes more interchangeable of the components and try to sort of raise that layer of abstraction so that we can get these types of models and these types of capabilities to more people." – Charna Parkey"I think that a lot of what you need to adjust to are these, what you were discussing as I call interaction data, you were calling it event data. But these interactions that people have with the internet and trying to find ways to model that in a way that even if your models aren't real-time, having ways to featurize real-time data in a way that's interpretable by a model. [...] I think Spark and Kafka and Delta and all of those things, give you a lot more flexibility now to move in different directions and readjust and I think, pivot what you want to do with the system." – Sam Bean-------------------Episode Timestamps:(01:29): Sam explains his background(03:36): Charna explains her background(18:13): Sam explains the problems You.com is solving for(28:21): Changes in user expectations in the AI-native stack(39:09): Advice for becoming an advanced citizen in open source(47:25): What's missing in the AI stack(54:51): What open source data means to the panelists(58:22): How technologists should prepare for the future(01:03:10): Executive producer, Audra Montenegro's backstage takeaways-------------------Links:LinkedIn - Connect with CharnaVisit DataStaxLinkedIn - Connect with SamVisit You.com
This episode features a panel discussion with Mikiko Bazeley, Head of MLOps at Featureform; Zain Hasan, Senior Developer Advocate at Weaviate; and Tuana Celik, Developer Advocate at deepset. In this episode, Mikiko, Zain, and Tuana discuss what open source data means to them, how their companies fit into the AI-first ecosystem, and how jobs will need to evolve with the AI-native stack.-------------------“We're almost part of a fancy new AI robot kitchen that you'd find in Tokyo, in some ways. I see a virtual feature store as, yes, you can have a bunch of your ingredients tossed into a closet. Or, what you can do is you can essentially have a nice way to organize them. You can have a way to label them, to capture information.” – Mikiko Bazeley“I really like that analogy as well. I like how Mikiko put it where a vector search engine is really extracting value from what you've already got. [...] So where I see vector search engines, really, is if we think of these embedding providers as the translators to take all of our unstructured data and bring it into vector space into a common machine language, vector search engines are essentially the workhorses that allow us to compute and search over these objects in vectorized format. They're essentially the calculators of the AI stack.” – Zain Hasan“Haystack, I would really position as the kitchen. I need Mikiko to bring the apples. I need Zain to bring the pears. I need Hugging Face or OpenAI to bring the oranges to make a good fruit salad. But, Haystack will provide the spoons and the pans and the knives to make that into something that works together.” – Tuana Celik-------------------Episode Timestamps:(02:08): What open source data means to the panelists(08:22): What interested the panelists about AI/ML(23:20): Mikiko explains Featureform(26:11): Zain explains Weaviate(29:34): Tuana explains deepset(35:11): The panelists discuss how their companies fit into the AI-first ecosystem(44:12): How jobs need to evolve with the AI-native stack(53:45): Executive producer, Audra Montenegro's backstage takeaways-------------------Links:LinkedIn - Connect with MikikoVisit FeatureformLinkedIn - Connect with ZainVisit WeaviateLinkedIn - Connect with TuanaVisit deepsetVisit Data-centric AI
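Zain's "calculators of the AI stack" framing can be made concrete in a few lines of linear algebra. The sketch below (my own illustration, not from the episode) brute-forces the core operation a vector search engine performs, using random stand-in embeddings; engines like Weaviate add approximate indexes, filtering, and persistence on top.

```python
# Brute-force nearest-neighbor search over embeddings with cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
corpus_vectors = rng.normal(size=(1000, 384))   # 1,000 objects, 384-dim embeddings
query_vector = rng.normal(size=384)

# Cosine similarity is the dot product of L2-normalized vectors.
corpus_norm = corpus_vectors / np.linalg.norm(corpus_vectors, axis=1, keepdims=True)
query_norm = query_vector / np.linalg.norm(query_vector)

scores = corpus_norm @ query_norm
top_k = np.argsort(scores)[::-1][:5]            # indices of the 5 closest objects
print(top_k, scores[top_k])
```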
This special episode of Open||Source||Data features an interview with Patrick McFadin. Patrick has been a distributed systems hacker since he first plugged a modem into his Atari computer. Looking for adventure, he joined the US Navy, working on the Naval Tactical Data System (NTDS), which cemented his love of distributed systems. He is now an Apache Cassandra Committer and the Vice President of Developer Relations at DataStax. Sam catches up with Patrick at Data Day Texas to discuss his book Managing Cloud Native Data on Kubernetes, Cassandra Forward, and the future of Apache Cassandra.-------------------“I can now use my Parquet file in Iceberg or DuckDB, and this is data that I created with Cassandra. And we're not getting to the point where we have to reinvent an entire database. We can just connect the Lego parts together and if they're open, then I don't have these encumbrances. I'm not like, ‘Well, I can connect that if I call a salesperson and get a license.' [...] That's what's exciting to me about Cassandra, the way that the ecosystem is evolving around Cassandra. It's not ‘Cassandra's at the center.' It's just a player. It's at the party.” – Patrick McFadin-------------------Episode Timestamps:(01:06): What open source data means to Patrick(02:11): Patrick discusses his book Managing Cloud Native Data on Kubernetes(10:02): Patrick discusses Cassandra Forward(11:09): The future of Apache Cassandra-------------------Links:LinkedIn - Connect with PatrickCassandra Forward
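The "Lego parts" interoperability Patrick points to is easy to demonstrate: a Parquet file exported from Cassandra (or anywhere else) can be queried by DuckDB directly, with no driver, license, or salesperson in between. A minimal sketch, with a hypothetical file name:

```python
# Query a Parquet file in place with DuckDB, regardless of which system wrote it.
import duckdb

con = duckdb.connect()  # in-memory database
rows = con.execute(
    "SELECT count(*) AS row_count FROM read_parquet('cassandra_export.parquet')"
).fetchall()
print(rows)
```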
This episode features an interview with Denise Gosnell, Principal Product Manager at Amazon Web Services. At AWS, Denise leads product and strategy for Amazon Neptune, a fully managed graph database service. Her career centers on her passion for examining, applying, and advocating for the applications of graph data. Denise has also authored, patented, and spoken on graph theory, algorithms, databases, and applications across all industry verticals. In this episode, Sam sits down with Denise to discuss graph initiatives, the future of developer models, and what Denise learned from hiking the Appalachian Trail.-------------------“We just open sourced something called graph-explorer, which is something for the community by the community, Apache 2.0 license. graph-explorer is a low-code visualization tool. But, the best part about it is that it works for JanusGraph, it works for Blazegraph, it works for all of these graph models that we've talked about, because we've got this divided graph community, but it was written to work with all graphs. [...] Today it's all, ‘Here's your Lego blocks and build one on your own. If you want to go ahead and fork Jupyter Notebook and figure out a way to get that D3 force-directed graph layout to pop up, have fun.' It's the first time that we've had a unified way across graph vendors and graph implementations to have a way to visualize your graph data in one tool that's open source.” – Denise Gosnell-------------------Episode Timestamps:(01:17): What open source data means to Denise(04:27): How Denise got interested in computer science(08:39): Denise's work on graph initiatives(14:30): How Denise's work at LDBC relates to SQL standards(23:43): The future of developer models(29:43): One question Denise wishes to be asked(34:05): Denise's advice for graph practitioners(37:37): Executive producer, Audra Montenegro's backstage takeaways-------------------Links:LinkedIn - Connect with DeniseThe Practitioner's Guide to Graph Data
This episode features an interview with Ben Lorica, Co-founder and Principal of Gradient Flow, a company that provides a wide range of content on data and technology. Ben is an industry expert on data, machine learning, and AI. He is a Technical Advisor for Databricks, a program chair for several data conferences, and he hosts The Data Exchange Podcast. In this episode, Sam and Ben discuss Big Data and the improvements and future opportunities of AI and machine learning.-------------------“The reason I use the word decentralize is because when you try to explain it to someone, let's say you want to train a different model for each user, or region, or sensor, or device. So you can't use necessarily just personalized because recommenders can be personalized, but they're still centralized models.” – Ben Lorica-------------------Episode Timestamps:(01:17): What open source data means to Ben(05:54): What intrigued Ben about Big Data(12:07): What brought Ben to working on Ray(16:15): Ben's opinion on how far AI and ML have come in the last 5 years(26:38): What Ben sees happening in this space in the next 5 years(39:06): What challenges Ben sees in the next 5 years (43:51): One question Ben's always wanted to be asked(44:55): Ben's advice for those starting their open source data adventure(46:34): Executive producer, Audra Montenegro's backstage takeaways-------------------Links:LinkedIn - Connect with BenGradient Flow's NewsletterGradient Flow's 2023 Trends ReportVisit Sky Labs
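Ben's distinction between "personalized" and "decentralized" models is easiest to see in code: instead of one centralized model with a region feature, you fit an independent model per region (or user, sensor, device). A minimal sketch with made-up data, using scikit-learn:

```python
# One independently trained model per region rather than a single shared model.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "region": ["us", "us", "eu", "eu", "eu"],
    "x": [1.0, 2.0, 1.0, 2.0, 3.0],
    "y": [1.1, 2.1, 0.9, 1.8, 2.7],
})

models = {}
for region, group in df.groupby("region"):
    models[region] = LinearRegression().fit(group[["x"]], group["y"])

# Each entity gets predictions from its own model.
print(models["eu"].predict(pd.DataFrame({"x": [4.0]})))
```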
This episode features an interview with Holden Karau, an Open Source Engineer at Netflix. Holden is best known for her work on Apache Spark, her advocacy in the open source software movement, and her creation of a variety of related projects including spark-testing-base. Previously, Holden worked at Big Tech companies like Apple, IBM, and Google as a software engineer and developer advocate. In this episode, Sam sits down with Holden to discuss the data analysis stack, functional programming, and the future of open source software data tooling.-------------------“These things are not one off. We may think that they're one off and they don't need testing, but that's not the reality. When you write something, it needs to be maintainable and as software people, the only real way that I think we know to make something vaguely maintainable is to at least have tests. And these tests need to cover common failure cases that we've experienced. And certainly, there's different approaches to this. There's property based testing, there's golden sets, all kinds of different options. I don't think necessarily any one approach is right or better here, but I think we need something. We need less untitled 5.IPython Notebook running in production, scheduled every hour. That is not a way to run a company.” – Holden Karau-------------------Episode Timestamps:(02:27): What open source data means to Holden(04:37): What interested Holden in mathematical computer science (09:51): What drew Holden to Spark(12:49): What Holden has learned about cognitive systems(20:02): What we need to learn as developers and data specialists(25:28): The future of the data analysis stack(31:21): Improvements in data tooling over the next 5 years(34:25): A question Holden wishes to be asked(40:51): Holden's advice for open source data project committers(43:18): Executive producer, Audra Montenegro's backstage takeaways-------------------Links:LinkedIn - Connect with HoldenBuy Holden's booksVisit Holden's website
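Holden's point about tests applies to any data job, not just Spark. The sketch below is not spark-testing-base; it is a plain pytest-plus-PySpark example of pinning one hypothetical transformation against a known failure case (null emails):

```python
# Unit-test a small Spark transformation with pytest on a local SparkSession.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def drop_null_emails(df):
    # The "job" under test: remove rows with a missing email.
    return df.filter(F.col("email").isNotNull())


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_drop_null_emails(spark):
    df = spark.createDataFrame([("a", "a@example.com"), ("b", None)], ["id", "email"])
    result = drop_null_emails(df).collect()
    assert [row.id for row in result] == ["a"]
```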
This episode features an interview with Tom Baeyens, Co-founder and CTO of Soda, where he oversees the company's product development, software architecture, and technology strategy. He is passionate about open source and committed to building a community where data engineers can succeed using the Soda Data Monitoring Platform. Tom is the inventor of the widely used open source projects jBPM and Apache Activiti. He also co-founded Effektif, a cloud process automation company. In this episode, Sam and Tom discuss the evolution of open source workflow engines, data contracts, and why data quality needs a language approach.-------------------“Where we're heading is what I think is exactly the same as with software engineering in the testing. Test-driven development was a radical new thing back then. But then it turns out, you can much more reliably release software. And this is exactly the same here. If you don't inject data testing, data observability throughout your data stack, then how are you going to trust the data that you put into your machine learning model? This is something that people are realizing, but we're still figuring out the best practices, the dos, the don'ts. We've come a long way, but there's still a way to go before this is as common and as normal as in the test-driven development software engineering space.” - Tom Baeyens-------------------Episode Timestamps:(01:23): What open source data means to Tom(04:34): Tom's motivations for creating jBPM(09:39): What led Tom to building Soda(13:57): Why data quality needs a language approach(19:24): The community of Soda(22:47): The future of Soda as a technology(24:59): A question Tom wishes to be asked(30:24): Tom's advice for engineers who want to leverage data observability tools-------------------Links:LinkedIn - Connect with TomTwitter - Follow TomVisit SodaCL
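In the test-driven spirit Tom describes, a data check can start as small as a unit test over a dataframe. The sketch below deliberately uses plain pandas and pytest rather than Soda's own tooling; the table, columns, and checks are hypothetical.

```python
# Minimal data-quality checks run as an ordinary test in the pipeline.
import pandas as pd


def check_orders(df: pd.DataFrame) -> list:
    """Return human-readable data-quality failures (empty list means pass)."""
    failures = []
    if df.empty:
        failures.append("orders table is empty")
    if df["order_id"].isna().any():
        failures.append("order_id contains nulls")
    if (df["amount"] < 0).any():
        failures.append("amount contains negative values")
    return failures


def test_orders_quality():
    # In a real pipeline this would load the upstream job's output instead.
    df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 5.5, 2.0]})
    assert check_orders(df) == []
```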
This episode features an interview with Matthew Rocklin, CEO of Coiled, the scalable Dask-based cloud platform. Prior to founding Coiled, Matthew worked on Dask at Anaconda and then NVIDIA where his teams focused on accelerating Dask through parallel computing and GPUs. Matthew is an industry speaker, author, and founding member of Pangeo, whose mission is to develop open source analysis tools for ocean, atmosphere, and climate science. In this episode, Sam sits down with Matthew to discuss enabling edge workers, the future of data science, and the revolution of AI and ML.-------------------“There's all sorts of fun people using these tools and that's the most fun part of this job. You get to learn so much about so many different applications that are all so different and all so fascinating. You were thinking about all these different tools and technologies and I was talking to someone once, it's like, ‘Oh, it's like you're standing on the shoulders of giants.' That's not quite right. There's lots of sort of normal size people all standing on each other's shoulders in like a massive pyramid. [...] Dask was designed to scale up an existing ecosystem. There's a legacy Python ecosystem that'll provide a layer of parallel computing on top of it. You can do that either by rewriting the whole thing, which is not feasible, or you can do it by talking to lots of people and getting them to integrate in interesting, fun ways. That's actually been the fun parts of Dask. I think I've probably talked to every major maintainer group ever. I have worked with them to find out the ways to get everything to work smoothly together. And that's super fun. There's an interesting sort of technical and social hacking that occurs, which I think Python has done pretty well at, historically. Which is why it has success.” – Matthew Rocklin-------------------Episode Timestamps:(00:58): What open source data means to Matthew(03:29): Matthew's motivations behind Python(18:58): How Matthew is enabling edge workers (34:46): What the future of data Python space looks like(39:29): Matthew's advice for the technical data audience-------------------Links:LinkedIn - Connect with MatthewTwitter - Follow MatthewVisit Matthew's WebsiteVisit DaskDask ExamplesVisit CoiledSciPy Mission
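"Scaling up an existing ecosystem" shows in the API itself: Dask dataframes mirror pandas, so familiar code parallelizes across partitions instead of being rewritten. A minimal sketch with a hypothetical file pattern and columns:

```python
# Dask layers lazy, parallel execution on top of the familiar pandas idioms.
import dask.dataframe as dd

df = dd.read_parquet("measurements/*.parquet")       # lazy, one partition per file
mean_temp = df.groupby("station")["temp_c"].mean()   # same groupby idiom as pandas
print(mean_temp.compute())                           # .compute() triggers parallel work
```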
This episode features an interview with Nithya Ruff, Head of Open Source Program Office at Amazon. At Amazon, she drives open source culture, coordination, and engagement with external communities. Prior to Amazon, Nithya spearheaded and grew Open Source Program Offices (OSPOs) for Comcast and Western Digital. She has also served as the Director-At-Large on the Linux Foundation Board since 2016, where she works to advance the mission of building sustainable ecosystems grounded in open collaboration. In this episode, Sam and Nithya discuss OSPOs, how to measure success, and the evolution of the data ecosystem.-------------------“I think if we look at what matters to customers, which is innovation, trust, and being a force for change with open source, then we can really deliver on the metrics that the company cares about.” – Nithya Ruff-------------------Episode Timestamps:(04:02): What open source data means to Nithya(06:29): What interested Nithya about open source software(12:34): What Nithya learned at Western Digital and Comcast that she uses now at Amazon(18:23): What Nithya teaches people in OSPO curriculum(22:06): How the open source data ecosystem has evolved in the last decade(27:44): One question Nithya wishes to be asked(30:37): Nithya's advice for folks who want to create an OSPO-------------------Links:LinkedIn - Connect with NithyaTwitter - Follow NithyaOpen Source Law, Policy and PracticeLinkedIn - Connect with AmazonTwitter - Follow AmazonVisit Amazon
This episode features an interview with Jonathan Beri, Founder & CEO of Golioth, a commercial IoT development platform built for scale. Previously, Jonathan was a Product Manager at Particle, Google/Nest, Magento, and Myspace, where he spent his time building IoT solutions. In this episode, Sam sits down with Jonathan to discuss the concept of digital twins, the future of IoT databases, and how to build a real holodeck.-------------------“I think about IoT when I started at Nest, we had some of the best engineers I've ever worked with. Starting from first principles, defining networking protocols, and introducing new specifications that became parts of the fabric of the internet. And fast forward 10 years later, a lot of that exists now as building blocks. Someone who's not a PhD with a lifetime achievement award from the IETF can go actually design systems that are highly productive, integrated, and enabling. And that's where I get excited. And the through line I think is enabling teams of developers to really create more with their own bare hands. And the technology around it, that is that enabler.” – Jonathan Beri-------------------Episode Timestamps:(01:33): Jonathan's motivation for starting Golioth(08:59): The role of data in IoT(11:01): What is a digital twin and why does it matter?(17:12): The classes of problems Jonathan is trying to solve(20:35): The future of IoT databases in the next five years(31:04): What open source data means to Jonathan(32:24): Jonathan explains how to build a real holodeck(33:42): Jonathan's advice for those excited about industrial data-------------------Links:LinkedIn - Connect with JonathanTwitter - Follow JonathanVisit Jonathan's WebsiteLinkedIn - Connect with GoliothTwitter - Follow GoliothVisit Golioth
This episode features an interview with Indu Navar, CEO and Founder of EverythingALS, a patient-driven non-profit bringing technological innovations and data science to support efforts from care to cure for people with ALS. Indu's impressive career includes being an original member of the WebMD engineering team, where she was instrumental in using emerging technologies to achieve application scalability and performance. In this episode, Sam sits down with Indu to discuss healthcare infrastructure applications, her strategies for providing reliable patient data, and the future of ALS research.-------------------“We said, ‘Okay, we're going to make this a citizen-driven research.' That means patients are going to come and enroll because it's their project and it's patient-driven. So, it's a patient-driven, open innovation. So, once you do open patient-driven, open innovation, now we are the custodians of the data. Patients own the data, so all the data is shared with the patient. That was not done before in any of the research. And so, we give all the data back to the patients. And of course, we give them metrics as well. What was the rate of their speed of their speech? And if they don't want to see it, it's fine, at least they have it. And that data, we are the custodians and as custodians we share the data. So, once we did this model, we got almost close to one thousand people enrolled, consented, within 16 months. As opposed to about 25 people in one year or 50 people in one to two years.” – Indu Navar-------------------Episode Timestamps:(01:19): What's changed for Indu in the last year(05:46): What data infrastructure was like 25 years ago to solve for health outcomes(13:00): Indu's personal experience with healthcare data(16:47): What Indu is looking forward to in ALS research(20:43): How regulatory establishments have shifted in healthcare(30:31): Where Indu wants to see EverythingALS go in the next year(36:28): One question Indu wishes to be asked(38:28): Indu's advice for people inspired by EverythingALS-------------------Links:LinkedIn - Connect with InduTwitter - Follow InduTwitter - Follow EverythingALSVisit EverythingALS
This bonus episode features conversations from season 3 of the Open||Source||Data podcast. In this episode, you'll hear from DeVaris Brown, CEO & Co-founder of Meroxa; Tomer Shiran, Founder & CPO of Dremio; and Erica Brescia, Managing Director at Redpoint Ventures. Sam sat down with each guest to discuss how they're making data more programmable by shifting left. You can listen to the full episodes from DeVaris Brown, Tomer Shiran, and Erica Brescia by clicking the links below.-------------------Episode Timestamps:(00:12): DeVaris Brown(00:42): Tomer Shiran(01:32): Erica Brescia-------------------Links:Listen to DeVaris' episodeListen to Tomer's episodeListen to Erica's episode