Podcasts about open source data

  • 69 PODCASTS
  • 157 EPISODES
  • 41m AVG DURATION
  • 1 EPISODE EVERY OTHER WEEK
  • May 6, 2025 LATEST

POPULARITY

[Chart: popularity by year, 2017-2024]



Latest podcast episodes about open source data

Open||Source||Data
AI and the Future of Media Consumption | Pete Pachal

May 6, 2025 · 63:58


In this episode of Open Source Data, Charna Parkey interviews Pete Pachal, founder of The Media Copilot. With over two decades of experience covering technology, Pete shares his insights on how AI is transforming media and journalism, and discusses how journalists can embrace AI as a tool to enhance their work, adapt, and thrive in this new environment.

QUOTES

PETE PACHAL: AI is something that you control. I know, it feels like it's a wave that's coming over that it's unstoppable, inevitable. And that's true to a large extent. But at the same time, it's not, there's no there, right? There's no spark, there's no intent. (...) Never relinquish your role as the ultimate creator and person responsible for what's coming out of this thing.

CHARNA PARKEY: I think that there was a point where I found myself shifting more away from media and towards individual curated newsletters because like subject matter experts in that area, I could be like maybe they're going to summarize it incorrectly, et cetera. But at least I know my theory of mind of that individual. And then when I expand that to media, I don't know who's writing what and who's shadow writing what for who.

TIMESTAMPS

00:00:00 - Introduction of Pete Pachal and his background in journalism and AI.
00:02:00 - Pete's career journey, including his work at CoinDesk and founding The Media Copilot.
00:04:00 - AI training for media professionals (journalists, PR, marketers).
00:06:00 - Evolution of AI in journalism: From skepticism to ethical frameworks.
00:08:00 - AI in content pipelines: Idea generation vs. post-production tasks.
00:10:00 - Open-source builders needing to cater to domain experts (e.g., journalists).
00:12:00 - Meta's removal of fact-checking and its implications.
00:16:00 - Public tolerance for AI errors (e.g., Apple's AI summaries).
00:18:00 - Consumer trust shifts away from platforms like Facebook/X.
00:22:00 - Ghostwriting vs. authenticity in AI-generated content.
00:24:00 - Preference for human-curated newsletters over AI summaries.
00:26:00 - AI in news digests (e.g., Perplexity, Alexa).
00:28:00 - Publisher AI experiments (Washington Post chatbot, TIME summaries).
00:32:00 - AI's impact on click-through rates and publisher economics.
00:34:00 - AI-written articles (e.g., ESPN's use case) and copyright issues.
00:36:00 - Legal battles over AI training data (NYT vs. OpenAI).
00:38:00 - Copyright concerns with AI-generated outputs.
00:40:00 - AI search tools (Perplexity, ChatGPT) and publisher licensing deals.
00:46:00 - The unhealthy impact of social media trends on journalism.
00:48:00 - Post-interview discussion: Accountability in AI and media.
00:56:00 - Leo's perspective as a journalist on AI adoption.
00:58:00 - Closing thoughts on balancing AI innovation with industry needs.

Open||Source||Data
Cooperative Systems, Data Transparency & Quality and the Year of Small AI | Dr. Jason Corso

Apr 8, 2025 · 63:09


Dr. Jason Corso joins Charna Parkey to debate the critical role of data quality, how data transparency shapes AI development, and the rise of smaller, domain-specific AI models, making 2025 the year of small, specialized AI.

Quotes

Charna Parkey: "Knowing the right data is incredibly important, because it'll save you money, but predicting the impact of that data means that you don't have to do the training at all to even directionally know if it's going to work out, right?"

Jason Corso: "You can't understand and analyze an AI system in the way you can analyze open source software if you don't have access to the data."

Timestamps

[00:00:00] - Introduction
[00:02:00] - Jason Corso's journey in open source
[00:08:00] - The importance of data in AI
[00:10:00] - Voxel51's mission
[00:14:00] - The value of open source and the importance of data in AI systems
[00:20:00] - Recent discoveries in AI
[00:28:00] - The cost of training AI models
[00:36:00] - Cooperative AI in healthcare
[00:40:00] - Charna Parkey on the impact of AI in education
[00:56:00] - The year of small AI

Open||Source||Data
Building the Future of Streaming Data | Alex Gallego

Apr 3, 2025 · 55:48


In this episode of Open Source Data, Charna Parkey talks with Alex Gallego, CEO and founder of Redpanda Data, about his journey as a builder, the evolution of Redpanda, and the company's new agent framework for the enterprise. Alex shares insights on low-latency storage, distributed stream processing, and the importance of developer experience to the growth of AI and the open source space.

Timestamps

[00:00:00] Introduction
[00:02:00] Alex Gallego talks about his background.
[00:04:00] Charna Parkey discusses the importance of hands-on experience in learning.
[00:06:00] Alex explains the origins of Redpanda and how it emerged from challenges in the streaming space.
[00:08:00] Alex details the evolution of Redpanda, its use of Seastar and FlatBuffers, and its low-latency design.
[00:11:00] Alex discusses the positioning of Kafka versus Redpanda in the market.
[00:20:00] Alex introduces Redpanda's new agent framework and multi-agent orchestration.
[00:24:00] Alex explains how Redpanda fits into the evolving landscape of AI-powered applications.
[00:30:00] The future of multi-agent orchestration.
[00:44:00] Thoughts on AI model training and data retention.
[00:46:00] Alex encourages future founders and shares his perspective on risk-taking.
[00:50:00] Charna Parkey and Leo Godoy discuss the key takeaways from the conversation with Alex Gallego.
[00:52:00] Charna reflects on open source trends and the role of developer experience in adoption.
[00:54:00] Charna and Leo talk about the different types of founder journeys and the importance of team dynamics.

Quotes

Charna Parkey: "For AI, unifying historical and real-time data is critical. If you're just using nightly or monthly data, it doesn't match the context in which your prediction is being made. So it becomes very important in the future of applying AI because you need to align those things."

Alex Gallego: "Every app is going to span three layers. The first layer is going to be your operational layer, just like you have to do business right now. Then there always has to be an analytical layer, and the third layer is this layer of autonomy."

Unboxing Your Packaging
Will your products actually compost? Find out with open-source data from the Compostable Field Testing Program

Mar 25, 2025 · 61:51


INTRODUCTION

Ready to rethink compostable packaging? In this episode, Emily McGill from BSIbio dives into the Compostable Field Testing Program (CFTP), an initiative that open-sources field trial data to drive progress in the industry. Curious about what they've uncovered? In addition to several key takeaways, you'll learn the key differences between field and lab testing, how methodologies are evolving, and which materials have been put to the test.

Plus, there's a big update! Since recording, the program has been forging ahead, actively seeking collaboration on strategy and fundraising for its next research phase. The focus? Building a game-changing matrix comparing disintegration rates with composting conditions. This is crucial intel for both composters and the packaging industry.

I loved how Emily takes a step back to explain things, making even the process of "playing" with open-source data and graphs fascinating, whether you're a composter, product designer, manufacturer, policymaker, or brand.

And don't miss the final minute, where Emily shares her vision for the future. It's worth sticking around for!

RESOURCES MENTIONED IN THIS EPISODE

Episode 55 “[Certifications Spotlight Audio Clip 8] The OK Compost Certifications: Home & Industrial” with Love-Ese Chile: https://www.look4loops.com/packaging-podcast/ep55-certifications-review-ok-compost-home-industrial
BSIbio Packaging Solutions: https://bsibio.com/
The burning question came from Flavie of Lactips in episode 40: https://www.look4loops.com/packaging-podcast/ep40-milk-protein-plastic-free-polymer-recyclable-biodegradable-soluble
Emily invites us to check the hashtag #makecompostmainstream on LinkedIn.
She also suggests reflecting on the meta crisis: What work can I do now here? What vector of change am I?

WHERE TO FIND EMILY AND THE COMPOSTABLE FIELD TESTING PROGRAM (CFTP)?

The website of the Compostable Field Testing Program (CFTP): https://www.compostabletesting.org/
Contact page: https://www.compostabletesting.org/contact/
LinkedIn of Emily: https://www.linkedin.com/in/emily-mcgill/

ABOUT EMILY MCGILL FROM BSIBIO PACKAGING SOLUTIONS

Emily McGill is the Program Director of the Compostable Field Testing Program (CFTP), an international research project gathering real-world disintegration data for compostable items from composting facilities across North America, cofounded by the Compost Research and Education Foundation and BSIbio. With a bachelor's degree in Bioresource Engineering, Emily has conducted and remotely coordinated field tests since 2014, and helped lead the development of standardized methods for field testing within ASTM. Her consulting experience includes solid waste management planning at corporate and municipal levels as well as policy development and product design for zero waste and single-use plastic reduction. Since 2015 she has fostered community-based projects in urban sustainability, circular economy and regenerative systems design. She is a micro-composter, feeding the soil in her collaborative community garden in Vancouver, British Columbia, and is the co-founder of Master Recycler Vancouver, a zero waste education program for adults.

PODCAST MUSIC

Special thanks to Joachim Regout who made the jingle. Have a look at his work here. I am happy to bring a sample of our strong bonds on these sound waves. Since I was a child, he made me discover a wide range of music of all kinds. I am also delighted he is a nature lover and shares the Look4Loops 'out of the box philosophy'. He is an inspiring source of creativity for me.

Parallel Mike Podcast
Open Source Data & How It Reveals Your Secrets

Mar 19, 2025 · 56:56


Part 2 for Members: www.parallelmike.com
Mike's Investing Community and Financial Newsletter: www.patreon.com/parallelsystems
Consult with Mike 1-2-1: www.parallelmike.com/consultation
Guest Links: Escape The Technocracy: https://escapethetechnocracy.com/digitalmarket/

Open||Source||Data
What is Neuro-Symbolic AI? | Emin Can Turan

Mar 11, 2025 · 56:22


In this episode, we dive deep into the world of neuro-symbolic AI with Emin Can Turan, CEO of Pebbles AI. Learn how this technology combines neuroscience, behavioral economics, and AI to revolutionize B2B go-to-market strategies. Emin explains how neuro-symbolic AI bridges the gap between human logic and machine learning, enabling smarter, context-aware systems that democratize complex workflows for startups and enterprises alike.

Timestamps

[00:00:00] - Introduction by Charna Parkey and introduction of Emin Can Turan.
[00:02:00] - Emin's journey to AI and his background in go-to-market strategies.
[00:06:00] - Emin explains his deep R&D phase and the development of neuro-symbolic AI.
[00:08:00] - Emin describes the architecture of their AI system, including neuro-symbolic AI, generative AI, and agentic frameworks.
[00:10:00] - Explanation of neuro-symbolic AI and its relevance to domain-specific problems.
[00:12:00] - Discussion on the components of go-to-market strategies and the role of psychology and communication.
[00:16:00] - The limitations of generative AI and how they applied strict communication tactics.
[00:22:00] - Discussion on the importance of contextual science and data insights.
[00:24:00] - The three agentic frameworks they use in their system.
[00:26:00] - Explanation of how users control the product and the two co-pilots (strategy and execution).
[00:36:00] - The ethical implications of AI and the potential for misuse.
[00:38:00] - Discussion on the future of AI and the balance between dystopian and hopeful outcomes.
[00:40:00] - Emin emphasizes the importance of truth and transparency in AI development.
[00:42:00] - Emin shares his personal motivation for building his AI startup.
[00:48:00] - Closing remarks and discussion on the user experience of their platform.
[00:50:00] - Charna and Leo discuss the connection between Emin's work and the open-source community.

Quotes

Emin Can Turan: "I felt that this was the future and that AI was the only technology that can digitalize this level of complexity for everyone to use. Nothing else could, you know, you can't use normal neural networks to do this. Even generative AI is not sufficient enough."

Charna Parkey: "I would love to be able to use Gen AI for more personal things. I love technology. I have the Oura Ring. I've got the Apple Watch. I want to feed that data into something that can somehow tell me and others, here's your state of mind. Here's what you're going to be affected by."

Open||Source||Data
How to Empower Non-Technical Teams with Data Insights | Suzanne El-Moursi

Feb 25, 2025 · 55:23


Learn how BrightHive's AI-powered platform is democratizing data insights, making them accessible to non-technical teams across organizations. Suzanne El-Moursi discusses the importance of data fluency and how BrightHive is helping businesses harness the power of their data.

Timestamps

00:00:00 - Introduction and Background
00:02:30 - Journey to BrightHive and open source
00:06:00 - The evolution of AI and BrightHive's approach
00:14:00 - The data problem and the role of AI agents
00:22:00 - Building BrightBot with open source frameworks
00:26:00 - The future of AI agents and open source
00:30:00 - People's reaction to DeepSeek
00:34:00 - The future of work and AI
00:40:00 - AI in education and personal growth
00:42:00 - Suzanne's legacy
00:48:00 - Recap and takeaways with producer Leo Godoy

Quotes

Charna Parkey: "Every single innovation comes out of some form of restriction or need. (...) Don't come and say, “oh, what is this? This is terrible”. I heard all kinds of responses to my excitement and to my belief."

Suzanne El-Moursi: "So if 97% of an organization is data consumers, there are strategists, the marketing analysts, the customer success associates, the managers all across the enterprise, who need to understand the insights in the company's data, in their functions, in their units, so that they can make the next right step for the customer and for their plan."

Open||Source||Data
Open Source AI and Copyright: Building Ethical Models | Kent Keirsey

Feb 11, 2025 · 70:19


Quotes

Kent Keirsey: "When we look at open source models, if you just release the weights, and you don't really release information on how the data set was captioned, for example, or how you construct the data set, if you don't really know how it got to the artifact that was released, as a user, you do not understand how it works."

Charna Parkey: "But there's still a lot of claims by big tech right now about how anything on the internet should be fair use for training, even if, you know, it might have its own kind of copyright."

Timestamps

[00:02:00] - Kent Keirsey on his journey to open source
[00:06:00] - Kent Keirsey on the Open Model Initiative (OMI)
[00:08:00] - What makes a model truly open source
[00:12:00] - The legal landscape of AI and copyright
[00:14:00] - Kent Keirsey on the ethical implications of AI training data, fair use, and AI development
[00:26:00] - Creativity, AI tools, personal AI models and recommendation algorithms
[00:32:00] - Kent Keirsey on TikTok and cultural clash
[00:38:00] - AI, self-reflection and a decision-making tool
[00:42:00] - The Bria AI partnership
[00:52:00] - The future of creativity, AI and robotics
[01:00:00] - Final thoughts with producer Leo Godoy

Connect with Kent Keirsey
Connect with Charna Parkey

Software Engineering Daily
Open Source Data Analytics with Sameer Al-Sakran

Dec 3, 2024 · 47:29


Data analytics and business intelligence involve collecting, processing, and interpreting data to guide decision-making. A common challenge in data-focused organizations is how to make data accessible to the wider organization, without the need for large data teams. Metabase is an open source business intelligence tool that focuses on data exploration, visualization, and analysis. It offers ...

Open||Source||Data
Building Trust in AI: From Open Source to Global Impact with host, Charna Parkey

Oct 8, 2024 · 44:03


Join Charna Parkey as she recaps a transformative year in AI, exploring the delicate balance between innovation and ethics. From open source communities to global regulations, discover how trust, diversity, and collaboration are shaping the future of technology.

The Data Stack Show
The PRQL: Open Source Data Tools: Buying or Selling?

Sep 16, 2024 · 1:48


The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

PodRocket - A web development podcast from LogRocket
Visualizing Open Source Data in React with Brian Douglas

May 16, 2024 · 36:04


Brian Douglas, a seasoned consultant and educator, comes on the podcast to talk about the intricacies of visualizing open source data in React. From his journey starting at Netlify to building 'Open Sauce' and engaging with the developer community at GitHub, Brian shares insights on challenges and innovations in data visualization within the React ecosystem.

Links

https://briandouglas.me
https://twitter.com/bdougieYO
https://www.linkedin.com/in/brianldouglas
https://b.dougie.dev
https://youtube.com/@bdougie

We want to hear from you!

How did you find us? Did you see us on Twitter? In a newsletter? Or maybe we were recommended by a friend? Let us know by sending an email to our producer, Emily, at emily.kochanekketner@logrocket.com, or tweet at us at PodRocketPod (https://twitter.com/PodRocketpod).

Follow us. Get free stickers.

Follow us on Apple Podcasts, fill out this form (https://podrocket.logrocket.com/get-podrocket-stickers), and we'll send you free PodRocket stickers!

What does LogRocket do?

LogRocket combines frontend monitoring, product analytics, and session replay to help software teams deliver the ideal product experience. Try LogRocket for free today. (https://logrocket.com/signup/?pdr)

Special Guest: bdougie.

Open||Source||Data
Navigating Open Source Talent, AI & Policy Challenges with Amanda Brock

May 7, 2024 · 40:16


Episode timestamps

(05:06): State of open source in the UK
(07:22): Importance of open source community
(15:19): Balancing openness and regulation in AI
(21:19): Pace of technological development and regulation
(28:21): Reliability and discernment with AI outputs
(35:24): Universal advice

Quotes

Amanda Brock: “I think the governments that are going to win, the governments that are going to have the best regulation that promotes most innovation are going to be the ones which are able to make their regulatory environment flow in the same way as the technology evolution and innovation flows."

Charna Parkey: "I think the expectation needs to change. Part of what has happened with, you know, literal text search or keyword search and just Google and things like that, is that the average person expects what comes back to be relatively factual. That it's been referenced and, you know, backlinked, etc. That's a deterministic system. These are not. These are based upon statistical likelihoods of what word should come next."

Links

Connect with Charna
Connect with Amanda

Organic Holodeck with Prophetic AI co-founders Eric Wollberg and Wesley Berry

Apr 29, 2024 · 87:45


In this episode from the new podcast Emergent Behavior, host @Ate-a-Pi interviews the co-founders of Prophetic AI to learn how they're using AI to facilitate lucid dreaming. Prophetic had come up in the recent Cognitive Revolution conversation with Dean W. Ball about brain computer interfaces, and Nathan had them on the list of companies to invite on the show, so it was really fun to discover that Ate-a-Pi had already done an interview with them right around the same time.

--

Subscribe to Emergent Behavior:
Spotify: https://open.spotify.com/show/2KfbuKL7iqIfbpY31PHtEg
Apple: https://podcasts.apple.com/us/podcast/emergent-behavior/id1735023473
YouTube: https://www.youtube.com/@EmergentBehaviorPod

Check out Nathan's new chatbot on www.cognitiverevolution.ai

--

SPONSORS:

The Brave search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference. All while remaining affordable with developer first pricing, integrating the Brave search API into your workflow translates to more ethical data sourcing and more human representative data sets. Try the Brave search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR

Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/

Head to Squad to access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist.

--

TIMESTAMPS:

(00:00) Intro
(00:43) Diving Deep with Prophetic AI
(02:18) Exploring Lucid Dreaming with Halo
(05:08) The Technical Marvel Behind Halo
(07:25) A Deep Dive into Lucid Dreaming Experiences
(13:15) Sponsor: Brave | Omneky
(14:42) Do All People Have Lucid States?
(32:19) Sponsor: Squad
(41:05) Building the Neural Model for Dream Control
(43:15) Exploring the Intersection of EEG, fMRI, and Ultrasound in Lucid Dreaming
(43:48) The Journey from Open Source Data to Targeted Brain Stimulation
(47:08) Collaboration with the Donders Institute and the Future of Lucid Dreaming Research
(48:40) The Role of Machine Learning and AI in Enhancing Data Collection and Analysis
(50:04) From Theory to Practice: Implementing Transcranial Focused Ultrasound (TFUS)
(53:51) Personalizing the Lucid Dreaming Experience through Reinforcement Learning
(1:16:18) Looking Ahead: The Potential and Challenges of Neurostimulation Technology
(1:22:55) Addressing the Skeptics: The Debate Over Natural vs. Technologically Induced Lucid Dreaming

Open||Source||Data
Using AI to Impact Performance Feedback Equity with Tacita Morway

Apr 23, 2024 · 48:01


Episode timestamps

(02:15): Tacita's unconventional career path to becoming a CTO
(07:00): Textio's practices for building AI responsibly and ethically
(14:00): The impact of Textio's AI on performance feedback
(17:00): The importance of purpose-built vs generic AI models
(28:00): Balancing open source and proprietary data/models
(42:00): Advice for the AI industry moving forward

Quotes

Tacita Morway: “When you've got a team with different backgrounds, educational, lived experiences, identity, careers, all of those things, we have those different perspectives in the room. And we're all working off of the same expectations. We can catch each other's gaps.”

Charna Parkey: “There's an interesting conversation happening, I think, in the community right now about these purpose-built LLMs. Are they as good as generic LLMs? Sure, certainly if you're not going to apply something purpose-built to something generic or outside of its domain, it is not as good. But I think some of this shows us that unless you have something purpose-built and unless you're leveraging the data in the right way, you may just be feeding noise back into the system.”

Links

Connect with Tacita
Connect with Charna

Open||Source||Data
The Ethical Path to High-Quality AI Data with Fabiana Clemente

Apr 9, 2024 · 50:12


Timestamps

(00:02:29) Fabiana's journey starting YData and becoming a public speaker
(00:20:19) Misconceptions and hype around generative AI and AGI
(00:32:46) Potential real-world impact and use cases of LLMs today
(00:34:55) The role of synthetic data in making AI models more robust and fair
(00:43:55) Advice for founders: value your time and learn to say no
(00:48:24) The importance of technical leaders being able to communicate well

Quotes

Charna Parkey: "It's a balance. I think that's also what led us to some of the demographic based data science. Essentially, folks were making like event data into pre-aggregated data. And then they were trying to obscure it so much that you couldn't get back to the person. And so you're like, okay, what's their age and what's their gender? And you're like, that's not actually the most useful part of data science that can't predict behavior or intent or any of that. It throws out time as a component of the entire process, seasonality, everything. And so there just, there has to be a better way."

Fabiana Clemente: "I have to say, that's a very beautiful way to put it. Hallucinations, I have to say. I never thought about that. And it makes a lot of sense. I do think, though, that in terms of LLMs, it's so language, it's so definitely, it sounds like we are getting very, very intelligent system, exactly, because language is very complex. And we know that was needed for the leap of humanity. I do think there are other, the sense of combining. Well, and here we enter in the multimodal kind of space. It's what's missing."

Links

Connect with Charna
Connect with Fabiana

Crazy Wisdom
The Privacy Paradigm: Envisioning Decentralized AI in a Data-Driven World

Jan 8, 2024 · 56:38


In this episode of the Crazy Wisdom Podcast, Stewart Alsop interviews Sharon Zhang, co-founder and CTO of Personal.ai. They delve into the challenges and potential of autonomous AI agents, the role of data in machine learning, and the ongoing development of Personal.ai. Sharon shares how the utilization of various programming languages and architectures has shaped the AI system, which was designed to provide personalized experiences for every user while protecting their privacy. They also discuss the future of open-source data, the possibilities of data monetization, and the evolution of AI.

Sign up for the model 2 event tomorrow (Jan 10th, 2024) where the Personal.AI team will present the new model they are releasing.

Timestamps

00:00 Introduction to the Crazy Wisdom Podcast
00:40 Guest Introduction: Sharon Zhang, Co-founder and CTO at Personal.ai
00:54 Discussing the Technical Aspects of Personal.ai
02:16 Exploring the Evolution of Machine Learning
03:27 The Journey of Automating Medical Transcription
06:04 The Challenges of Building Personal AI
12:00 The Importance of Data Sovereignty
21:43 The Technicalities of Building APIs
23:13 Understanding the Types of Data Used for Training Personal AI
28:22 Understanding Language Models and Predictions
28:57 Exploring Cause and Effect in Decision Making
29:27 Linear vs Nonlinear Behavior
30:11 The Theory of Mind and Predictability of Humans
31:40 The Role of AI in Predicting Human Behavior
32:40 Complexities of Predicting Human Behavior
42:13 The Future of AI: Superintelligence and Autonomy
48:11 The Challenge of Building Autonomous Agents
53:55 The Potential of Open Source Data in AI Development
55:21 Closing Remarks and Future Plans

Key Themes

Development of Personal AI: Sharon Zhang discussed the complexities in building personal.ai, emphasizing the importance of full-stack development with multiple languages like Java, Python, and JavaScript frameworks. The focus was on creating a unique AI experience that is tailored to individual users.

Evolution of Machine Learning and AI: The conversation touched upon the history and evolution of AI and machine learning. Zhang reflected on the transition from support vector machines to more advanced techniques like transformers, highlighting the significant advancements in the field.

Data Privacy and Decentralization in AI: A significant portion of the discussion revolved around data privacy, user sovereignty, and the decentralization of AI. Zhang emphasized the importance of users being able to control their data and the concept of a decentralized AI that operates on a personal level.

Challenges in AI Development: The technical challenges in developing AI systems, such as building scalable and efficient data models, handling diverse data types, and creating dynamic, user-specific AI models were discussed. This included the complex infrastructure required for such an AI system.

Future of AI and Autonomy: The podcast delved into the future of AI, specifically the concept of autonomous AI agents. The discussion included the current limitations and the potential evolution where AI could make independent decisions and possibly coexist with humans.

Open Source and AI Data: The conversation highlighted the need for more open-source data to further AI development and the potential for a marketplace for data exchange. The challenges of data availability and quality in building effective AI models were also discussed.

Impact of AI on Society: There was a philosophical discussion about the role of AI in society and its potential to be the next evolutionary step for humanity. This included thoughts on how AI might reshape our understanding of autonomy and decision-making.

Open||Source||Data
New Beginnings: Open||Source||Data in Transition

Dec 20, 2023 · 50:14


This episode features an interview with Charna Parkey, Real-Time AI Product and Strategy Leader at DataStax. Charna has been developing AI and ML products over the last 17 years and has worked with 90 of the Fortune 100 in her various roles. She is also a co-author and inventor on several patents.

In this episode, Sam and Charna discuss handing over the role as host, Sam's new startup journey, and how their thinking has evolved during the explosion of LLMs.

-------------------

“Now, it seems like we have this opportunity where the conversation and the place that society is at is different. Where we want to contribute to the right set of data when we talk open source data. We want to make sure that we have the right data to train this model in order to get the right outcome. We want to provide a lens of, ‘All right, you are this persona. How would you say this thing?' I do think that from a lot of what the LLMs have today, the outcome of those words are still missing. And we need to solve that. Like, ‘Is this piece of writing actually going to achieve the outcome I want versus am I following legal's guidelines? Am I technically correct? Is my CEO going to like it?' That doesn't mean you're achieving impact in the world. There's an aspect there where we've given feedback loops, it seems, to be like, ‘Did I like the answer or not?' But not, ‘Did I take an action?' As we get to autonomousness, we're going to have to have an outcome or multiple outcomes associated with the reward of the system.” – Charna Parkey

“I personally believe that all cognition is bias. My degree is in cognitive science. One of the things that we trained on is attention. And to pay attention, literally means to selectively choose what data is coming in from the world that you're going to pay attention to and what you're going to discard. Which is also, to me, the definition of bias. All cognition is bias, but what do we care about? Do you trust this thing? What does that mean? Well, do you trust it to do these particular actions to a level of consistency in this particular domain? It doesn't mean that you're going to trust it in all environments. There's a lot more nuance that hopefully will evolve in this strange age of nuanced destruction machines.” – Sam Ramji

-------------------

Episode Timestamps:

(01:04): Sam and Charna catch up
(06:05): Sam explains his new company, Sailplane
(14:21): How Charna's thinking has evolved during the LLM explosion
(25:45): Sam's thoughts after 5 seasons of Open||Source||Data
(38:52): What Charna is looking forward to in the next season of the podcast
(40:44): A question Sam wishes to be asked
(45:45): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:

LinkedIn - Connect with Charna
LinkedIn - Connect with Sam
Learn more about Sailplane

Open||Source||Data
The Intersection of Open Source and AI with Stefano Maffulli & Stephen O'Grady

Open||Source||Data

Play Episode Listen Later Dec 13, 2023 55:40


This episode features a panel discussion with Stefano Maffulli, Executive Director of the Open Source Initiative (OSI); and Stephen O'Grady, Co-founder of RedMonk. Stefano has decades of experience in open source advocacy. He co-founded the Italian chapter of Free Software Foundation Europe, built the developer community of the OpenStack Foundation, and led open source marketing teams at several international companies. Stephen has been an industry analyst for several decades and is author of the developer playbook, The New Kingmakers: How Developers Conquered the World.
In this episode, Sam, Stefano, and Stephen discuss the intersection of open source and AI, good data for everyone, and open data foundations.
-------------------
“Internet Archive, Wikipedia, they have that mission to accumulate data. The OpenStreetMap is another big one with a lot of interesting data. It's a fascinating space, though. There are so many facets of the word ‘data.' One of the reasons why open data is so hard to manage and hasn't had that same impact of open source is because, like Stephen, the stories that he was telling about the startups having a hard time assembling the mixing and matching, or modifying of data has a different connotation. It's completely different from being able to do the same with software.” – Stefano Maffulli
“It's also not clear how said foundation would get buy-in. Because, as far as a lot of the model holders themselves, they've been able to do most of what they want already. What's the foundation really going to offer them? They've done what they wanted. Not having any inside information here, but just judging by the fact that they are willing to indemnify their users, they feel very confident legally in their stance. Therefore, it at least takes one of the major cards off the table for them.” – Stephen O'Grady
-------------------
Episode Timestamps:
(01:44): What open source in the context of AI means to each guest
(16:21): Stefano explains OSI's opportunity to shine a light on models and teams
(21:22): The next step of open source AI according to Stephen
(25:38): Creating better definitions in order to modify software
(33:09): The case of funding an open data foundation
(42:31): The future of open source data
(51:54): Executive producer, Audra Montenegro's backstage takeaways
-------------------
Links:
LinkedIn - Connect with Stefano
Visit Open Source Initiative
LinkedIn - Connect with Stephen
Visit RedMonk

The Data Exchange with Ben Lorica
Open Source Data and AI: Past, Present, Future

The Data Exchange with Ben Lorica

Play Episode Listen Later Nov 23, 2023 43:07


Earlier this year, I had a conversation with Sam Ramji, Chief Strategy Officer at DataStax and host of the Open||Source||Data podcast, where we talked about the evolution of big data and AI technologies. I'm airing our original conversation in its entirety on this holiday weekend in the U.S.
Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/
Subscribe: Apple • Spotify • Overcast • Google • AntennaPod • Podcast Addict • Amazon • RSS.
Detailed show notes can be found on The Data Exchange website.

@BEERISAC: CPS/ICS Security Podcast Playlist
Open Source Data Visualization for Cyber Threats

@BEERISAC: CPS/ICS Security Podcast Playlist

Play Episode Listen Later Nov 22, 2023 39:24


Podcast: Hack the Plant
Episode: Open Source Data Visualization for Cyber Threats
Pub date: 2023-11-21
I'm joined by Dan Ricci, founder of the ICS Advisory Project, for this episode of Hack the Plant. The ICS Advisory Project is a free, open-source platform that helps asset owners across 16 critical infrastructure sectors stay secure by identifying threats in their environments.
“I saw a gap in the community. There's good data that's coming at us…but no one did anything to take and make that data more digestible through visualization. So I decided, okay, well, I'm just going to do it now. I'm going to take the data that I have been cleaning up and monitoring for like the past two years, and I'm going to put it together and visualize it, trying to build a tool that's more practical and usable by that asset owner, who may not have a cybersecurity background.”
We discuss how data visualization translates into more accessible information for the ICS operators on the ground who need the information - and how the data in the platform is maintained. Join us for an interesting - if technical - discussion about how data from CISA and other agencies can be utilized by asset owners through ICS Advisory's platform.
The podcast and artwork embedded on this page are from Bryson Bort, which is the property of its owner and not affiliated with or endorsed by Listen Notes, Inc.

Open||Source||Data
Throwback: The AI-Native Stack with Mikiko Bazeley, Zain Hasan, and Tuana Celik

Open||Source||Data

Play Episode Listen Later Nov 15, 2023 57:37


This episode features a panel discussion with Mikiko Bazeley, Head of MLOps at Featureform; Zain Hasan, Senior Developer Advocate at Weaviate; and Tuana Celik, Developer Advocate at deepset.
In this episode, Mikiko, Zain, and Tuana discuss what open source data means to them, how their companies fit into the AI-first ecosystem, and how jobs will need to evolve with the AI-native stack.
-------------------
“We're almost part of a fancy new AI robot kitchen that you'd find in Tokyo, in some ways. I see a virtual feature store as, yes, you can have a bunch of your ingredients tossed into a closet. Or, what you can do is you can essentially have a nice way to organize them. You can have a way to label them, to capture information.” – Mikiko Bazeley
“I really like that analogy as well. I like how Mikiko put it where a vector search engine is really extracting value from what you've already got. [...] So where I see vector search engines, really, is if we think of these embedding providers as the translators to take all of our unstructured data and bring it into vector space into a common machine language, vector search engines are essentially the workhorses that allow us to compute and search over these objects in vectorized format. They're essentially the calculators of the AI stack.” – Zain Hasan
“Haystack, I would really position as the kitchen. I need Mikiko to bring the apples. I need Zain to bring the pears. I need Hugging Face or OpenAI to bring the oranges to make a good fruit salad. But, Haystack will provide the spoons and the pans and the knives to make that into something that works together.” – Tuana Celik
-------------------
Episode Timestamps:
(02:58): What open source data means to the panelists
(09:11): What interested the panelists about AI/ML
(24:10): Mikiko explains Featureform
(27:00): Zain explains Weaviate
(30:23): Tuana explains deepset
(36:00): The panelists discuss how their companies fit into the AI-first ecosystem
(44:58): How jobs need to evolve with the AI-native stack
(54:35): Executive producer, Audra Montenegro's backstage takeaways
-------------------
Links:
LinkedIn - Connect with Mikiko
Visit Featureform
LinkedIn - Connect with Zain
Visit Weaviate
LinkedIn - Connect with Tuana
Visit deepset
Visit Data-centric AI

Tech Talks with Madona
Season 3, Episode 10 - "Open Source, Data Engineering, GovTech and Diversity in Tech"

Tech Talks with Madona

Play Episode Listen Later Nov 3, 2023 22:35


Get full access to Tech Talks with Madona at www.techtalkswithmadona.com/subscribe

Open||Source||Data
How We Should Think About Data Reliability for Our LLMs with Mona Rakibe

Open||Source||Data

Play Episode Listen Later Nov 1, 2023 38:17


This episode features an interview with Mona Rakibe, CEO and Co-founder of Telmai, an AI-based data observability platform built for open architecture. Mona is a veteran in the data infrastructure space and has held engineering and product leadership positions that drove product innovation and growth strategies for startups and enterprises. She has served companies like Reltio, EMC, Oracle, and BEA where AI-driven solutions have played a pivotal role.
In this episode, Sam sits down with Mona to discuss the application of LLMs, cleaning up data pipelines, and how we should think about data reliability.
-------------------
“When this push of large language model generative AI came in, the discussions shifted a little bit. People are more keen on, ‘How do I control the noise level in my data, in-stream, so that my model training is proper or is not very expensive, we have better precision?' We had to shift a little bit that, ‘Can we separate this data in-stream for our users?' Like good data, suspicious data, so they train it on little bit pre-processed data and they can optimize their costs. There's a lot that has changed from even people, their education level, but use cases also just within the last three years. Can we, as a tool, let users have some control and what they define as quality data reliability, and then monitor on those metrics was some of the things that we have done. That's how we think of data reliability. Full pipeline from ingestion to consumption, ability to have some human's input in the system.” – Mona Rakibe
-------------------
Episode Timestamps:
(01:04): The journey of Telmai
(05:30): How we should think about data reliability, quality, and observability
(13:37): What open source data means to Mona
(15:34): How Mona guides people on cleaning up their data pipelines
(26:08): LLMs in real life
(30:37): A question Mona wishes to be asked
(33:22): Mona's advice for the audience
(36:02): Backstage takeaways with executive producer, Audra Montenegro
-------------------
Links:
LinkedIn - Connect with Mona
Learn more about Telmai

Open||Source||Data
Throwback: Open Source Innovation, The GPL for Data, and The Data In to Data Out Ratio with Larry Augustin

Open||Source||Data

Play Episode Listen Later Oct 18, 2023 40:57


This episode features an interview with Larry Augustin, angel investor and advisor to early-stage technology companies. Larry previously served as the Vice President for Applications at AWS, where he was responsible for application services like Pinpoint, Chime, and WorkSpaces. Before joining AWS, Larry was the CEO of SugarCRM, an open source CRM vendor. He also was the founder and CEO of VA Linux, where he launched SourceForge. Among the group who coined the term “open source”, Larry has sat on the boards of several open source and Linux organizations.
In this episode, Sam and Larry discuss who owns the rights to data, the data in to data out ratio, and why Larry is an open source titan.
-------------------
“People are willing to give up so much of their personal information because they get an awful lot back. And privacy experts come along and say, ‘Well, you're taking all this personal information'. But then most people look at that and say, ‘But I get a lot of value back out of that.' And it's this data ratio value question, which is: for a little in, I get a lot back. That becomes a key element in this. And I think there has to be some kind of similar thought process around open source data in general, which is if I contribute some data into this, I'm going to get a lot of value back. So this data in to data out ratio, I think it's an incredibly important one. And it gets everyone in the mindset of, ‘How do I provide more and more and take less and less?' It's a principle of application development that I like a lot. And I think there's a similar concept here around open source data. Are there models or structures that we can come up with where people can contribute small amounts of data and as a result of that, they get back a lot of value.” – Larry Augustin
-------------------
Episode Timestamps:
(02:52): How Larry is spending his time now after AWS
(06:25): What drove Larry to open source
(18:41): What is the GPL for data?
(24:28): Areas of progress in open source data
(28:57): The data in to data out ratio
(36:39): Larry's advice for folks in open source
-------------------
Links:
LinkedIn - Connect with Larry
Twitter - Follow Larry

The Tech Trek
Open source data orchestration

The Tech Trek

Play Episode Listen Later Oct 17, 2023 23:26


In this episode, Amir Bormand interviews Pete Hunt, the CEO of Dagster Labs. They discuss the open-source nature of Dagster, a product that helps businesses with data orchestration. They explore the product's benefits, the challenges in the data orchestration market, and why Dagster Labs decided to open-source their product. Pete shares his background in open source and the importance of data pipelines in making sense of messy data. Tune in to learn more about how Dagster is revolutionizing the data industry. Highlights: [00:01:02] Building with data in businesses.  [00:04:08] Data hygiene in organizations.  [00:08:09] Building multi-tenancy from day one.  [00:14:14] Data pipeline unpredictability.  [00:18:00] Open source mentality.  [00:21:10] Open source led business models.  [00:23:05] Open source pricing strategy. Guest: Pete joined Dagster Labs as Head of Engineering in early 2022 and took over the reins as CEO in November of that year. Pete was previously co-founder and CEO of Smyte, an anti-abuse provider that Twitter acquired. Before this, Pete led Instagram's web team, built Instagram's business analytics products, and helped to open-source Facebook's React.js. Connect with Pete: https://twitter.com/floydophone  https://www.linkedin.com/in/pwhunt/

Open||Source||Data
Reframing Machine Learning and AI-Assisted Development with Jorge Torres

Open||Source||Data

Play Episode Listen Later Sep 27, 2023 45:11


This episode features an interview with Jorge Torres, Co-founder and CEO of MindsDB. MindsDB is a virtual AI database that works with existing data to help developers build AI-centered apps. In 2008, Jorge began his work on scaling solutions using machine learning as the first full-time engineer at Couchsurfing, growing the company from a few thousand users to a few million. He has also served a number of data-intensive start-ups and was a visiting scholar at UC Berkeley researching machine learning automation and explainability.
In this episode, Sam and Jorge discuss the inspiration and challenges behind MindsDB, classic data science AI versus applied AI, and time series transformers.
-------------------
“So much data in the world is time series data, so much data. Even data that people don't know is time series, it's time series. So long as it's moving over time, it is time series data. Whether you store it or not, that's a different thing. For having a pre-trained model on time series data, it even enabled the fact that you don't have to store all the historical data. You can just take the model and start passing data as it comes through, and then you get out the forecast. So you don't even have to have the historical data. All you need to have is the data at that given instance, and you can pass it to the model and you get an output. It's mind blowing.” – Jorge Torres
-------------------
Episode Timestamps:
(05:20): The inspiration behind MindsDB
(10:20): Classic data science AI approach vs. applied AI
(22:09): What open source data means to Jorge
(28:51): What excites Jorge about Nixtla and time series transformers
(37:07): A question Jorge wishes to be asked
(40:20): Jorge's advice for the audience
(41:38): Backstage takeaways with executive producer, Audra Montenegro
-------------------
Links:
LinkedIn - Connect with Jorge
Learn more about MindsDB open source code
Learn more about MindsDB

Open||Source||Data
A Sam Ramji Feature: The Evolution of Open Source, Kubernetes, and AI's Forward Journey

Open||Source||Data

Play Episode Listen Later Sep 6, 2023 69:46


On this episode, we've partnered with the Future Rodeo podcast for a discussion between Sam and Matt Wallace. Matt is the Chief Technology Officer and EVP at Faction, a pioneer of multi-cloud data services, and host of Future Rodeo.
In this episode, Sam and Matt discuss Microsoft's transformation, the impact of Kubernetes on container orchestration, and the rapid acceleration of AI research and development.
-------------------
Episode Timestamps:
(01:38): Microsoft's open source transformation
(13:19): The impact of Kubernetes and how it defragmented the industry
(22:06): The transformative power of AI and how it's changing the value of reasoning
(54:58): The concept of cognitive economy and its potential impact on AI and software development
(01:03:25): Potential implications of advancements in robotics, AI, and clean energy
(01:04:17): Sam's advice for those entering the industry or choosing a career path
-------------------
Links:
LinkedIn - Connect with Matt
Listen to the Future Rodeo podcast

Open||Source||Data
The Importance of Open Source Data for Generative AI, Now and in the Future with Abby Kearns

Open||Source||Data

Play Episode Listen Later Aug 23, 2023 46:14


This episode features an interview with Abby Kearns, technology executive, board director, and angel investor. Her career has spanned executive leadership, product marketing, product management, and consulting across Fortune 500 companies and startups, including Puppet, Cloud Foundry Foundation, and Verizon. Abby currently serves as a board director for Lightbend, Stackpath, and Invoke.
In this episode, Sam sits down with Abby to discuss the betrayal source license, the role open source plays in AI, and empowering trust.
-------------------
“There's so much happening so quickly that I think open source has the power to help harness a lot of that innovative conversation. In a way that I think it's going to be really, really hard to match in a proprietary way. I think open source and the ability, given the fact that we're talking about AI and data, the two are very interrelated at this point. AI is not super interesting without data. I think the power of open source right now and what's happening, I think it has to happen in open source and I think it really has to have that level of transparency and visibility. But, always the ability for everyone to step up and understand what's happening at this moment in time and shape it.” – Abby Kearns
-------------------
Episode Timestamps:
(00:50): Sam and Abby discuss the betrayal source license
(14:12): What open source data means to Abby
(23:30): Abby dives into the companies she's investing in
(34:30): How nonprofits can empower trust
(38:32): A question Abby wishes to be asked
(40:21): Abby's advice for the audience
(43:53): Backstage takeaways with executive producer, Audra Montenegro
-------------------
Links:
LinkedIn - Connect with Abby
Twitter - Follow Abby
Read Design the Life You Love

Open||Source||Data
The Value of Reproducibility and Ease of AI Deployment with Daniel Lenton

Open||Source||Data

Play Episode Listen Later Aug 9, 2023 33:58


This episode features an interview with Daniel Lenton, Founder and CEO of Ivy, where the team is on a mission to unify the fragmented AI stack. Prior to Ivy, Daniel was a Robotics Research Engineer at Dyson and a Deep Learning Research Scientist for Amazon Prime Air. During his PhD, Daniel explored the intersection between learning-based geometric representations, ego-centric perception, spatial memory, and visuomotor control for robotics.
In this episode, Sam and Daniel discuss the inspiration behind Ivy, open source reproducibility, and democratizing AI.
-------------------
"There's too much amazing stuff going on, from too many different parties. We just want to be the objective source of truth to show you the data and show you where your model will be doing best, and continue to do this as a service or something like this. This is high-level, some of the areas we see and going into, we really want to be a useful tool for anybody that wants to just kind of understand this fragmented complex space quickly and intuitively, and we are trying to be the tool that does that." – Daniel Lenton
-------------------
Episode Timestamps:
(01:00): What open source data means to Daniel
(05:37): The challenges of building Ivy
(15:37): The future of Ivy
(25:19): Who should know about Ivy
(28:46): Daniel's advice for the audience
(32:00): Backstage takeaways with executive producer, Audra Montenegro
-------------------
Links:
LinkedIn - Connect with Daniel
Learn more about Ivy

Someone Like You
Sharing Open-Source Data to Investigate Global Fishing Activity — with Tony Long, CEO of Global Fishing Watch

Someone Like You

Play Episode Listen Later Aug 9, 2023 38:51


A third of the world's fish stocks are overfished, with the remaining two-thirds fished at capacity. Today's guest is Tony Long, CEO of Global Fishing Watch, an international non-profit organization sharing open-source information to combat illegal fishing practices. Tony spent 27 years with the British Royal Navy, and his love of the high seas led him to join the non-profit sector. Tony speaks with Marco about the problems of commercial fishing we face today, the various tools Global Fishing Watch uses to track fishing activity globally, and why making the data they collect accessible to everyone is critical. By continuing to break down electronic barriers and making fishing data open-source, Global Fishing Watch is paving the way to protecting our oceans by 2030 with transparent and accessible practices.
Follow us on Instagram @someonelikeyoupodcast.
https://unlessbrands.com/episode-36-tony-long-global-fishing-watch

Open||Source||Data
ML Engineering Teams and Niche Chat Bot Experiences with Demetrios Brinkmann

Open||Source||Data

Play Episode Listen Later Jul 26, 2023 50:17


This episode features an interview with Demetrios Brinkmann, Founder of the MLOps Community, an organization for people to share best practices around MLOps. Demetrios fell into the Machine Learning Operations world and has since interviewed leading names around MLOps, data science, and machine learning.
In this episode, Sam sits down with Demetrios to discuss LLM in production use cases, ML engineering teams, and the LLM Survey Report from the MLOps Community.
-------------------
"I think the most novel ones that I saw from the survey were when a chat bot would prompt a human as opposed to the human prompting the chat bot. It's almost like you have this LLM coach. And in that way, it's not necessarily like this isn't LLM in production that an end user is getting that's not outside the business or that is outside the business. It's more like internally, you can think about maybe it's an accountant and the accountant is filing my taxes for the year. As they're filing them, the LLM is prompting them on different tax laws that maybe they weren't thinking about or different ways that they could file things." – Demetrios Brinkmann
-------------------
Episode Timestamps:
(04:30): LLMs as the new standard
(19:26): Key LLM in production use cases
(31:18): What open source data means to Demetrios
(34:36): What Demetrios is seeing in open source AI models
(42:44): One question Demetrios wishes to be asked
(44:41): Demetrios's advice for the audience
(47:19): Backstage takeaways with executive producer, Audra Montenegro
-------------------
Links:
LinkedIn - Connect with Demetrios
Read the LLM Survey Report
Listen to The MLOps Podcast

DataTalks.Club
Investing in Open-Source Data Tools - Bela Wiertz

DataTalks.Club

Play Episode Listen Later Jul 21, 2023 54:57


We talked about:
Bela's background
Why startups even need investors
Why open source is a viable go-to-market strategy
Building a bottom-up community
The investment thesis for the TKM Family Office and the blurriness of the funding round naming convention
Angel investors vs VC Funds vs family offices
Bela's investment criteria and GitHub stars as a metric
Inbound sourcing, outbound sourcing, and investor networking
Making a good impression on an investor
Balancing open and closed source parts of a product
The future of open source
Recent successes of open source companies
Bela's resource recommendations
Links:
Understand who is engaging with your open source project article: https://www.crowd.dev/
Top 6 Books on Developer Community Building: https://www.crowd.dev/post/top-6-books-on-developer-community-building
Which open source software metrics matter: https://www.bvp.com/atlas/measuring-the-engagement-of-an-open-source-software-community#Which-open-source-software-metrics-matter
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Open||Source||Data
Building With Trust, Inspiration, and Reputation with Jaya Gupta, Yuliia Tkachova, and Omoju Miller

Open||Source||Data

Play Episode Listen Later Jul 12, 2023 4:12


This bonus episode features conversations from season 5 of the Open||Source||Data podcast. In this episode, you'll hear from Jaya Gupta, Partner at Foundation Capital; Yuliia Tkachova, Co-founder and CEO of Masthead Data; and Omoju Miller, Founder and CEO of Fimio.
Sam sat down with each guest to discuss how they are building foundations for trust, inspiration, and reputation as we all race into the AI-centric future.
You can listen to the full episodes from Jaya Gupta, Yuliia Tkachova, and Omoju Miller by clicking the links below.
-------------------
Episode Timestamps:
(00:49): Jaya Gupta
(01:48): Yuliia Tkachova
(03:03): Omoju Miller
-------------------
Links:
Listen to Jaya's episode
Listen to Yuliia's episode
Listen to Omoju's episode

The Data Exchange with Ben Lorica
An Open Source Data Framework for LLMs

The Data Exchange with Ben Lorica

Play Episode Listen Later Jul 6, 2023 49:24


Jerry Liu is CEO and co-founder of LlamaIndex, an open source project and startup that builds tools that enable teams to augment LLMs with their own private data.
Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/
Subscribe: Apple • Spotify • Stitcher • Google • AntennaPod • Podcast Addict • Amazon • RSS.
Detailed show notes can be found on The Data Exchange website.

Open||Source||Data
FMOps and a Founders Automated Future with Jaya Gupta

Open||Source||Data

Play Episode Listen Later Jun 28, 2023 33:49


This episode features an interview with Jaya Gupta, Partner at Foundation Capital, where she leads early-stage investments across the enterprise software stack. Previously, Jaya was a Senior Business Analyst at McKinsey & Company focusing on software diligence and helping startups expand their go-to-market strategies.
In this episode, Sam and Jaya discuss her journey to Foundation Model Ops, how software is becoming more accessible, and the democratization of AI tools.
-------------------
"At the end of the day, FMOps isn't just about the new tools. It's actually more about the new builders, the new workflows, and a completely new market of customers. I was on the other day, looking at LangChain's page of integrations, I don't know if you've seen it, but it's like Anyscale, Databricks, all these other huge legendary companies are integrating with LangChain, and I think it's clear that there's a huge community that is building something real and valuable." – Jaya Gupta
-------------------
Episode Timestamps:
(01:05): What open source data means to Jaya
(08:51): Jaya's journey to Foundation Model Ops
(15:58): How software is becoming more accessible
(23:04): The democratization of AI tools
(27:01): One question Jaya wishes to be asked
(29:32): Jaya's advice for the audience
(31:51): Backstage takeaways with executive producer, Audra Montenegro
-------------------
Links:
LinkedIn - Connect with Jaya
Follow Jaya on Twitter
Learn more about FMOps

Open||Source||Data
MLOps: Privacy, Security, Cost, and Latency - a Sneak Peek with Bart Farrell

Open||Source||Data

Play Episode Listen Later Jun 14, 2023 8:04


This episode features an interview with Bart Farrell, a CNCF Ambassador, Cloud Native Community Consultant, and Content Creator. An American entrepreneur living in Spain, Bart has spent the last decade helping tech companies broaden their audience through exceptional content. He has organized and hosted over 250 cloud native in-person and virtual events in 10 different countries.
In this episode, Audra and Bart discuss upcoming AI and MLOps events, his work as a community consultant, and what open source data means to him.
-------------------
“When we're looking at other technologies, in particular use cases like low latency, if we're talking about autonomous vehicles, we're talking about the financial sector, we're talking about fraud detection, things where decisions have to be made in real time. What are the technologies that are helping out with that? How can organizations, some that are more advanced than others, go through that adoption phase? And others that aren't so advanced, that haven't really moved things yet into production, how can they be better prepared in order to tackle these challenges that are coming up? That being said, we've got quite a cross section of different larger and smaller organizations that are really playing a pivotal role in the changes that are going on when it comes to edge meeting AI and MLOps.” – Bart Farrell
-------------------
Episode Timestamps:
(01:27): Bart's background
(02:45): Bart dives into The Cutting Edge of MLOps live event
(06:18): What open source data means to Bart
-------------------
Links:
LinkedIn - Connect with Bart
Twitter - Follow Bart
Learn more about The Cutting-EDGE of MLOps webinar
Learn more about Edgecase 2023
Listen to The AI-Native Stack with Mikiko Bazeley, Zain Hasan, and Tuana Celik

OECD
Multinational enterprises demystified using open-source data

Jun 12, 2023 · 21:43


Over the past few decades, as trade and investment barriers have lessened, and transport and communication costs have declined, multinational enterprises or MNEs have become an increasingly important fixture in the global economy. As these entities begin to represent a larger share of global economic activity, the importance of monitoring them and understanding their behaviour has never been greater. However, MNEs cross borders by definition, making them notably difficult to keep track of at the national level.

The new OECD UNSD Multinational Enterprise Information Platform gathers together data on the world's largest multinationals from a range of public sources. These data cover the geographical and digital scope of individual multinationals and an array of indicators, complementing major recent reforms to the international tax system led by the OECD and in response to the challenges arising from digitalisation. But what new benefits does this initiative deliver? What does the data reveal? And how can it be used for economic analysis? And what does this say about where the global economy is heading? This OECD Podcast aims to address these questions and more in conversation with one of our own data experts.

Host: Ashley Ward
Guest: Graham Pilgrim, Head of Real-Time Data Analytics, OECD Statistics and Data Directorate
Producer: Anna Wahlgren, Ashley Ward, Robin Allison Davis

To learn more about the OECD's work with multinational enterprises, go to: https://www.oecd.org/sdd/its/mne-platform.htm

Engenharia de Dados [Cast]
Dremio & Iceberg for Building an Open-Source Data Lakehouse with Dipankar Mazumdar, Data Advocate at Dremio

Jun 6, 2023 · 73:55


In today's episode, Luan Moreno, Mateus Oliveira, and Antony Lucas interview Dipankar Mazumdar, currently a Data Advocate at Dremio.

Dremio is one of the best-known self-service SQL analytics technologies on the market, unifying the view of data and using the lingua franca of data: SQL. Together with Apache Iceberg, Dremio sets out to be an Open Data Lakehouse.

With Apache Iceberg, you get the following benefits:
Data compaction;
Time travel;
ACID;
Hidden partitioning;
Built for multi-platform use.

In this conversation we also cover the following topics:
Data engineering;
Apache Iceberg;
Dremio.

Learn more about how Dremio and Iceberg together can provide one more Data Lakehouse option, especially for cases where we work with different data processing and exploration platforms.

Dipankar Mazumdar = LinkedIn
https://www.dremio.com/
https://iceberg.apache.org/
Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Open||Source||Data
Web3 and Putting Reputation on Code with ML with Omoju Miller

May 31, 2023 · 62:01


This episode features an interview with Omoju Miller, Founder and CEO of Fimio, a web3 reputation company. Originally from Lagos, Nigeria, Omoju holds a doctoral degree in Computer Science Education from UC Berkeley. Her expertise in machine learning and computational intelligence led her to companies such as Google and GitHub. Omoju also served as a volunteer advisor to the Obama administration's White House Presidential Innovation Fellows.

In this episode, Sam sits down with Omoju to discuss how machine learning can make applications more secure, what the future of the internet looks like, and the fascinating story behind Fimio.

-------------------

“So my first view is, in this future internet we have people, we also have bots, we have machines, we have code doing things. And bots sounds like such a horrible word now. [...] You need to have a level of trust on what that bot is. Everything from the humans to the machines collaborating in this decentralized world, we need to have some kind of reputation attached to each of those nodes. And the reason why we need that reputation is, as the thing scales, it becomes overwhelming to get value from it. You need something to help you filter, to find what you're looking for. Otherwise, you get stuck in that environment where you're just completely overwhelmed and you don't even know what to do. So I think of what I'm doing as just reputation to make this decentralized future slightly more attainable.” – Omoju Miller

-------------------

Episode Timestamps:
(00:59): Omoju's inspiration for starting Fimio
(10:27): The future of smart contracts
(28:47): Using mathematics to guarantee the safety of algorithms
(34:34): What led Omoju to building a mathematical product
(51:27): What open source data means to Omoju
(55:38): One question Omoju wishes to be asked
(57:47): Omoju's advice for the audience
(01:00:08): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:
LinkedIn - Connect with Omoju
Visit Fimio

Open Source Startup Podcast
E87: Commercializing Open Source Data Systems with Astronomer & CoreDB

May 22, 2023 · 38:52


Ry Walker is Founder of open source data companies Astronomer and CoreDB. Astronomer is the commercial company tied to the popular open source data workflow management system Apache Airflow, and CoreDB is a database company based on the popular open source database Postgres. CoreDB has raised $7M from investors including Venrock and CincyTech, and Astronomer has raised $283M from investors including Venrock, Insight, and Sierra Ventures. In this episode, we dig into the Astronomer journey and when things really started to work, what a great UI means in the data space, where the idea for CoreDB came from, his learnings around open source monetization, the benefits and drawbacks of building a commercial open source data company, and learnings Ry is taking from Astronomer to his new company CoreDB.

Open||Source||Data
The Human Right to Privacy and Caring About UX Design with Yuliia Tkachova

May 17, 2023 · 46:34


This episode features an interview with Yuliia Tkachova, Co-founder and CEO of Masthead Data, an observability platform that catches anomalies in Google BigQuery in real-time. She holds degrees in Management Information Systems, Math, Statistics, and Marketing. Prior to Masthead, Yuliia designed complex BI products and solutions powered by ML and utilized by Fortune 500 companies.

In this episode, Sam and Yuliia discuss how ML is shaping the future of data analytics, caring about users, and the fundamental human right to privacy.

-------------------

“We map those errors and anomalies on lineage, helping to understand what upstreams and downstreams are affected, what business users are affected. And that actually speeds up all the troubleshooting from hours to minutes. And this is the ultimate goal where we deliver. Because again, my belief that if you don't have this lineage piece was mapped anomalous in errors, it's not observability. It's monitoring. [...] What is also very unique to us, because Masthead operates on logs, it's triggered by logs. So, we do support streaming data. Unlike SQL-first solutions, as you can guess. We don't have to run SQL queries to see if they're anomalous, we're triggered by logs. And this is also what sets us apart.” – Yuliia Tkachova

-------------------

Episode Timestamps:
(01:14): What got Yuliia excited about math and statistics
(11:31): The basic human right to privacy
(18:21): What open source data means to Yuliia
(28:00): Yuliia's reason for building a solution focused on privacy and security
(38:09): One question Yuliia wishes to be asked
(42:21): Yuliia's advice for the audience
(44:46): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:
LinkedIn - Connect with Yuliia
Visit Masthead Data

Open||Source||Data
Determinism in Complex Environments and Workflow Services with Maxim Fateev

May 3, 2023 · 42:06


This episode features an interview with Maxim Fateev, Co-founder and CEO of Temporal, an open source, distributed, and scalable workflow orchestration engine capable of running millions of workflows. He has 20 years of experience architecting mission-critical systems at Uber, Google, Amazon, and Microsoft.

In this episode, Sam sits down with Maxim to discuss workflow services, the power behind Temporal, and bringing determinism to highly complex environments.

-------------------

“[Temporal] has this notion of workflows, which can run for a very long time and handle external events, you can treat them as a durable actor. And they're very good at implementing a lifecycle. For example, you can have an object per model and let this object handle all the events. Like, new data came in, notify this object, this object will go and retrain it. Or, it'll run an activity to superiorly check the status. So you can have end-to-end lifecycle implemented fully in Temporal.” – Maxim Fateev

-------------------

Episode Timestamps:
(01:03): What's top of mind for Maxim in workflow services
(04:09): What open source data means to Maxim
(11:07): Maxim explains his time at AWS and building Cadence at Uber
(23:09): Use cases and the community of Temporal
(28:26): How Temporal is being used for ML workloads
(32:28): One question Maxim wishes to be asked
(36:38): Maxim's advice for those working with complex distributed systems
(39:11): Backstage takeaways with executive producer, Audra Montenegro

-------------------

Links:
LinkedIn - Connect with Maxim
Temporal.io
Watch Maxim's talk “Designing a Workflow Engine from First Principles”
Replay Conference 2023

Open||Source||Data
The AI-Native Stack in Practice with Charna Parkey and Sam Bean

Mar 15, 2023 · 66:25


This episode features a panel discussion with Charna Parkey, a Real-Time AI Product and Strategy leader at DataStax; and Sam Bean, Staff Engineer at You.com. Charna is a co-author and inventor on several patents, including patent-pending work on ML/coordinated feature engine at the edge. Sam helped create the Spark connector to Weaviate, and is passionate about Big Data, Spark, NLP, Hugging Face, and large language models.

In this episode, Charna and Sam discuss adapting to user expectations, what's missing in the AI stack, and how to become an advanced citizen in open source.

-------------------

"We've seen these companies start to better understand that these streaming technologies have a place, whether it's Kafka or Flink or Pulsar, but it's still incredibly difficult to use and we need a different level of abstraction. [...] We're starting to see the stack change so that it becomes more interchangeable of the components and try to sort of raise that layer of abstraction so that we can get these types of models and these types of capabilities to more people." – Charna Parkey

"I think that a lot of what you need to adjust to are these, what you were discussing as I call interaction data, you were calling it event data. But these interactions that people have with the internet and trying to find ways to model that in a way that even if your models aren't real-time, having ways to featurize real-time data in a way that's interpretable by a model. [...] I think Spark and Kafka and Delta and all of those things, give you a lot more flexibility now to move in different directions and readjust and I think, pivot what you want to do with the system." – Sam Bean

-------------------

Episode Timestamps:
(01:29): Sam explains his background
(03:36): Charna explains her background
(18:13): Sam explains the problems You.com is solving for
(28:21): Changes in user expectations in the AI-native stack
(39:09): Advice for becoming an advanced citizen in open source
(47:25): What's missing in the AI stack
(54:51): What open source data means to the panelists
(58:22): How technologists should prepare for the future
(01:03:10): Executive producer, Audra Montenegro's backstage takeaways

-------------------

Links:
LinkedIn - Connect with Charna
Visit DataStax
LinkedIn - Connect with Sam
Visit You.com

Intelligence Matters
Kristin Wood on the Intelligence Value of Open Source Data

Mar 8, 2023 · 42:18


In this episode of Intelligence Matters, host Michael Morell speaks with former senior CIA officer Kristin Wood about the history, value and current applications of open source data to intelligence collection and analysis. Wood, who helped lead the innovation and technology group at CIA's Open Source Center, walks through the types of information available to the public and for purchase through commercial firms that create unique insights into companies, behaviors and events. Morell and Wood discuss the ways in which the U.S. intelligence community has leveraged - or failed to leverage - some key open source data.

Open||Source||Data
The AI-Native Stack with Mikiko Bazeley, Zain Hasan, and Tuana Celik

Mar 1, 2023 · 56:48


This episode features a panel discussion with Mikiko Bazeley, Head of MLOps at Featureform; Zain Hasan, Senior Developer Advocate at Weaviate; and Tuana Celik, Developer Advocate at deepset.

In this episode, Mikiko, Zain, and Tuana discuss what open source data means to them, how their companies fit into the AI-first ecosystem, and how jobs will need to evolve with the AI-native stack.

-------------------

“We're almost part of a fancy new AI robot kitchen that you'd find in Tokyo, in some ways. I see a virtual feature store as, yes, you can have a bunch of your ingredients tossed into a closet. Or, what you can do is you can essentially have a nice way to organize them. You can have a way to label them, to capture information.” – Mikiko Bazeley

“I really like that analogy as well. I like how Mikiko put it where a vector search engine is really extracting value from what you've already got. [...] So where I see vector search engines, really, is if we think of these embedding providers as the translators to take all of our unstructured data and bring it into vector space into a common machine language, vector search engines are essentially the workhorses that allow us to compute and search over these objects in vectorized format. They're essentially the calculators of the AI stack.” – Zain Hasan

“Haystack, I would really position as the kitchen. I need Mikiko to bring the apples. I need Zain to bring the pears. I need Hugging Face or OpenAI to bring the oranges to make a good fruit salad. But, Haystack will provide the spoons and the pans and the knives to make that into something that works together.” – Tuana Celik

-------------------

Episode Timestamps:
(02:08): What open source data means to the panelists
(08:22): What interested the panelists about AI/ML
(23:20): Mikiko explains Featureform
(26:11): Zain explains Weaviate
(29:34): Tuana explains deepset
(35:11): The panelists discuss how their companies fit into the AI-first ecosystem
(44:12): How jobs need to evolve with the AI-native stack
(53:45): Executive producer, Audra Montenegro's backstage takeaways

-------------------

Links:
LinkedIn - Connect with Mikiko
Visit Featureform
LinkedIn - Connect with Zain
Visit Weaviate
LinkedIn - Connect with Tuana
Visit deepset
Visit Data-centric AI

Open||Source||Data
Special Episode: Data on Kubernetes and Cassandra Forward with Patrick McFadin

Feb 22, 2023 · 18:44


This special episode of Open||Source||Data features an interview with Patrick McFadin. Patrick has been a distributed systems hacker since he first plugged a modem into his Atari computer. Looking for adventure, he joined the US Navy, working on the Naval Tactical Data System (NTDS), which cemented his love of distributed systems. He is now an Apache Cassandra Committer, and is the Vice President of Developer Relations at DataStax.

Sam catches up with Patrick at Data Day Texas to discuss his book Managing Cloud Native Data on Kubernetes, Cassandra Forward, and the future of Apache Cassandra.

-------------------

“I can now use my Parquet file in Iceberg or DuckDB, and this is data that I created with Cassandra. And we're not getting to the point where we have to reinvent an entire database. We can just connect the Lego parts together and if they're open, then I don't have these encumbrances. I'm not like, ‘Well, I can connect that if I call a salesperson and get a license.’ [...] That's what's exciting to me about Cassandra, the way that the ecosystem is evolving around Cassandra. It's not, ‘Cassandra's at the center, it's just a player.’ It's at the party.” – Patrick McFadin

-------------------

Episode Timestamps:
(01:06): What open source data means to Patrick
(02:11): Patrick discusses his book Managing Cloud Native Data on Kubernetes
(10:02): Patrick discusses Cassandra Forward
(11:09): The future of Apache Cassandra

-------------------

Links:
LinkedIn - Connect with Patrick
Cassandra Forward

Open||Source||Data
Making Graph Data Easier with Open Initiatives with Denise Gosnell

Feb 15, 2023 · 40:10


This episode features an interview with Denise Gosnell, Principal Product Manager at Amazon Web Services. At AWS, Denise leads product and strategy for Amazon Neptune, a fully managed graph database service. Her career centers on her passion for examining, applying, and advocating for the applications of graph data. Denise has also authored, patented, and spoken on graph theory, algorithms, databases, and applications across all industry verticals.

In this episode, Sam sits down with Denise to discuss graph initiatives, the future of developer models, and what Denise learned from hiking the Appalachian Trail.

-------------------

“We just open sourced something called graph-explorer, which is something for the community by the community, Apache 2.0 license. graph-explorer is a low-code visualization tool. But, the best part about it is that it works for JanusGraph, it works for Blazegraph, it works for all of these graph models that we've talked about, because we've got this divided graph community, but it was written to work with all graphs. [...] Today it's all, ‘Here's your Lego blocks and build one on your own. If you want to go ahead and fork Jupyter Notebook and figure out a way to get that D3 force-directed graph way out to pop up, have fun.’ It's the first time that we've had a unified way across graph vendors and graph implementations to have a way to visualize your graph data in one tool that's open source.” – Denise Gosnell

-------------------

Episode Timestamps:
(01:17): What open source data means to Denise
(04:27): How Denise got interested in computer science
(08:39): Denise's work on graph initiatives
(14:30): How Denise's work at LDBC relates to SQL standards
(23:43): The future of developer models
(29:43): One question Denise wishes to be asked
(34:05): Denise's advice for graph practitioners
(37:37): Executive producer, Audra Montenegro's backstage takeaways

-------------------

Links:
LinkedIn - Connect with Denise
The Practitioner's Guide to Graph Data