POPULARITY
Technovation with Peter High (CIO, CTO, CDO, CXO Interviews)
980: “Enterprise data platforms must evolve to express context to AI systems.” In this episode of Technovation, Peter High speaks with Louis Landry, Chief Technology Officer at Teradata, a $1.75B cloud analytics and AI data platform. Louis shares how Teradata is reshaping its core platform to meet the demands of the AI era through open, hybrid architectures, vector search integration, and mixed workload optimization. An 11-year veteran of the company, Louis reflects on Teradata's evolution from centralized analytics to a hybrid, multi-cloud powerhouse, and what it means to support some of the world's largest banks, airlines, and healthcare providers. He also offers insight into agentic AI, data-centric intelligence, and model-sharing futures.
Tracy Moore is a seasoned leader in Data and AI and the General Manager of AI & Data at MYOB. With 26 years of experience, she's been instrumental in building and leading AI and data teams to drive real impact. Her career spans major organizations like Skandia Life, Barclays Bank, ANZ, Teradata, Deloitte, KPMG, Xero, and Telstra. We're unpacking some big questions—have businesses truly tapped into the power of Big Data? How can data storytelling bridge the gap? And with AI chatbots dominating the scene, what's next for ethical AI, especially after controversial moments like Elon Musk's Grok-3 in India?
In this episode of Game Time Tech, Robert Kramer and Melody Brue, VP and Principal Analysts at Moor Insights & Strategy, dive into the intersection of sports and technology. Explore how cutting-edge technologies like AI, data analytics, and personalized fan experiences are transforming Major League Baseball, The Masters Golf Tournament, and the Intuit Dome. Highlights include:
- MLB's Data Evolution with Google Cloud: Data and AI are enhancing fan engagement, team strategies, and broadcasting through platforms like Google Cloud's BigQuery.
- Masters Golf Tournament's AI Innovations: IBM's Generative AI is powering predictive insights for fans with features like "Every Shot" and "Every Hole" in The Masters app.
- Inside the Intuit Dome: A look at advanced fan experience technologies, including facial recognition for entry, autonomous stores, and real-time analytics, powered by Teradata.
- Mercedes-Benz's Cutting-Edge In-Car Experience: Technology is transforming connectivity, from live sports streaming to Zoom calls integrated directly into vehicles.
- Fan Behavior and Smart Stadiums: How data-driven technologies are shaping stadium interactions, from personalized fan experiences to autonomous retail systems.
Show Summary: In this episode of The STEM Space, Natasha sits down with Nichole Austion — children's book author and Vice President of Public Affairs at the National Math and Science Initiative (NMSI). Nichole shares the story behind her book Miles and the Math Monsters, inspired by her son's journey to overcome struggles with math, in which the character discovers math to be a helpful companion intertwined in his everyday environment. Listen in as she shares an excerpt of her book and practical advice on fostering a positive math identity in children.
About Nichole Austion, VP of Public Affairs at NMSI: Nichole Austion is the Vice President of Public Affairs at the National Math and Science Initiative (NMSI), where she leads marketing and government relations. With a focus on STEM advancement, Nichole orchestrates strategic initiatives, bridging marketing and government relations to amplify NMSI's impact nationwide. Her expertise stems from her work with global technology firms like Sabre Holdings and Teradata, where she drove multimillion-dollar revenue through innovative marketing strategies. She is the author of “Miles and the Math Monsters,” a children's book that transforms math into a friendly presence, encouraging children to see it as a helpful companion intertwined with their everyday environment. She holds an engineering degree from Howard University and an MBA from The University of Texas at Austin.
About National Math and Science Initiative: The National Math and Science Initiative (NMSI) is a nonprofit organization dedicated to improving STEM education in the U.S. It focuses on expanding access to high-quality math and science programs, particularly for students who face barriers to educational opportunities. Since its founding in 2007, NMSI has worked with schools, teachers, and policymakers to enhance STEM learning and prepare students for careers in STEM fields.
Links from the Show:
Related The STEM Space Podcast Episodes:
- 166. Why Math Matters ft. CEO of NMSI
- 179. Seeing Students as Mathematicians ft. Cherelle McKnight of Illustrative Mathematics
- Is Teaching Engineering Bad? - Part 1 and Part 2
- 117. Why Does Belonging Matter in STEM Education?
Vivify STEM Blog Posts:
- Top 10 Ways To Encourage Girls In STEM
- How to Teach Growth Mindset and Failing Forward
- The Importance of Failure
Vivify STEM Lessons:
- FREE! - Add Math Practice to any Design Challenge using these Editable Budget Sheets
- Catapult Challenge
- Stomp Rocket Challenge
- FREE! - Space for You in STEM Inspirational Posters
- Break Down Stereotypes! Who is a STEM Professional? Game
- FREE! - Women in STEM Classroom Posters
Other STEM Resources:
- Nichole Austion | LinkedIn
- National Math and Science Initiative (NMSI)
- Miles and the Math Monsters book
- NMSI Professional Development Services
- Webinar | Breaking Boundaries: Celebrating Black Excellence in STEM (ft. Nichole Austion, Joan Higginbotham, and Dr. Ciara Sivels)
THE STEM SPACE SHOWNOTES
THE STEM SPACE FACEBOOK GROUP
VIVIFY INSTAGRAM
VIVIFY FACEBOOK
VIVIFY X
VIVIFY TIKTOK
VIVIFY YOUTUBE
A new guest in the TAdviser podcast studio: Stanislav Lazukov, CEO of TData, a Russian developer of solutions for building data warehouses and automating data management processes. He talked about the stages of his company's growth and projects delivered at Rostelecom and EVRAZ, as well as about competition in the Russian data management market and the factors influencing its development. A text transcript of the podcast is available here.
-------------------------------------------------------------------
Send your suggestions and comments on the development of TAdviser podcasts to editor@tadviser.ru
TAdviser Telegram channel: https://t.me/tadviser
TAdviser channel on FB: https://www.facebook.com/TAdviser.ru
TAdviser channel on VK: https://vk.com/it_in_russia
Great CFOs don't just crunch numbers – they shape strategies, build teams, and drive transformation. Jack McCullough welcomes Claire Bramley, CFO of Teradata, to explore the journey of a true rockstar CFO. From leading Teradata's shift to cloud-based solutions to championing team diversity and breaking career barriers, Claire shares her insights on what it takes to excel in financial leadership. Whether you are a seasoned executive or an aspiring leader, this episode dives into the power of communication, the importance of storytelling in finance, and how to embrace opportunities that push boundaries.To book a demo with Planful, click here.
What will 2025 bring for enterprise technology? How will AI evolve, and what should businesses anticipate in the coming year? In this episode, I speak with Louis Landry, the newly appointed CTO of Teradata, as he shares his insights on the future of AI, data analytics, and the role of trust in technology. With more than two decades of experience in software architecture and engineering leadership, Louis has played a pivotal role in shaping Teradata's innovation strategy. Having led the company's Technology and Innovation Office, his focus now extends to scaling AI-driven solutions that empower businesses with trusted intelligence. As we look ahead, what will be the defining trends of 2025? Louis explores the growing impact of retrieval-augmented generation, the evolution of large-scale personalization, and the next phase of AI model efficiency. He also delves into the rise of agentic AI systems—where generative AI merges with traditional software to create more autonomous, intelligent processes. But with rapid advancements come pressing concerns. How can organizations ensure AI remains explainable, transparent, and aligned with ethical standards? Louis emphasizes the importance of people-centered accountability, measurable business impact, and the foundational role of trusted data in AI's success. As AI continues to reshape industries, businesses must navigate regulatory challenges, balance innovation with risk, and rethink their approach to data governance. Beyond AI, Louis discusses the role of open-source technology in enterprise environments, the shift from project-focused to outcome-driven AI adoption, and how organizations can harness data harmonization to unlock new opportunities. He also shares his thoughts on the strategic investments businesses should prioritize in an increasingly complex digital world. As AI and analytics continue to evolve, what strategies will define industry leaders in 2025? How can businesses stay ahead while ensuring AI remains trustworthy and effective? Tune in to hear Louis Landry's perspective, and let us know your thoughts. How do you see AI shaping the business landscape in the coming year?
Welcome to the season finale of VirtuallyLive! The Podcast! In this special episode, we wrap up an incredible season by bringing together insights from leaders at top companies like IBM, Nasdaq, Mayo Clinic, Adobe, Salesforce, Bloomberg, Workday, Teradata, Novartis, Siemens Healthineers, AstraZeneca, and Red Hat. Together, they share their predictions on how generative AI is transforming industries and shaping the future of work, education, and healthcare. From the ethics of AI to practical applications like digital twins, personalized healthcare, and smarter decision-making, this episode offers a thoughtful look at the innovations shaping industries today and into the future. Don't forget to like, comment, and subscribe
Jacqueline Woods is the Chief Marketing Officer for Teradata, the cloud analytics and data platform for AI, headquartered in San Diego, California. Jacqueline joined Teradata from NielsenIQ, where she was a member of the executive leadership team and Global Chief Marketing and Communications Officer. She also spent nearly 10 years as CMO of the IBM Global Partner Ecosystem Division, where she focused on building cloud, data, AI, and SaaS strategies. Before that, she was Global Head of Customer Segmentation & Customer Experience at General Electric and also held roles of increasing responsibility at Oracle for 10 years, as well as leadership roles at Ameritech and GTE, now Verizon. Thankfully, Jacqueline has always loved math, because, as she points out, marketing today is based mostly on data. However, she also emphasizes the importance of empathy and notes that it is essential in creating a space where people can be authentic and drive innovation, productivity, and product design. In this episode, Alan and Jacqueline talk about where trust fits into the AI conversation, what leaders need to know before launching an AI initiative, and how AI can boost efficiency and productivity. Jacqueline also tells us why underrepresented people, like black female business leaders, need to be involved in AI as it evolves. While AI has been around for a while, it became all the rage at the end of 2022 with public access to tools like ChatGPT. AI is based on patterns, some factual and some non-factual. So that poses the question: how do we trust AI? That's where Teradata comes in. By having responsible people create the models, take responsibility, and think critically about the training, governance, and outcomes, Teradata is focused on building the trust required to use artificial intelligence, generative artificial intelligence, and large language models for their “global 10,000” clientele, like American Airlines and United Healthcare. These companies rely on Teradata for their cloud data and analytics workloads. Teradata has been stewards of trusted information and data since they were founded about 40 years ago, and they believe people thrive when empowered with better and entrusted information.
In this episode, you'll learn about:
- Why is empathy important for marketers?
- The importance of clean data
- Why do underrepresented people have to participate in the evolution of AI?
Key Highlights:
[02:10] What is empathy?
[03:45] Why marketers need empathy
[07:00] How a love of math led her to marketing
[10:30] Her path to Teradata
[13:15] Teradata's focus and mission for mankind
[14:20] Teradata's clients, services, and use cases
[19:00] How can business leaders ensure AI can be trusted?
[21:50] What do leaders need to do before launching an AI initiative?
[26:45] Remaining authentic while using AI
[30:20] Creative AI use cases as workforce multipliers and how that may change work in the US
[33:00] Why underrepresented groups need to participate in AI
[36:20] What we can all learn from Moe
[40:55] Advice to her younger self
[41:45] “Of course it's Ai!”
[42:10] Watching the shifting nature of work
[44:40] Can you explain what marketing does and why it's important?
Looking for more? Visit our website for the full show notes, links to resources mentioned in this episode, and ways to connect with the guest! Become a member today and listen ad-free, visit https://plus.acast.com/s/marketingtoday. Hosted on Acast. See acast.com/privacy for more information.
In this episode of AI, Government, and the Future, host Max Romanik is joined by Meeta Vouk, VP of Product Management, AI and Analytics at Teradata, and former Director of Product Management and Development, AI for IBM Z, to discuss the critical aspects of AI governance, ethical implementation, and bias mitigation. Meeta shares insights from her extensive experience in developing AI solutions while addressing key challenges in hallucination prevention and regulatory frameworks.
Jacqueline Woods is the Chief Marketing Officer at the software company Teradata. She has a unique vantage point as both a CMO herself and as someone who works with other companies every day to better leverage technology, data, and artificial intelligence. She explains how companies can think more strategically about their AI implementation, where personalization is heading next, and the most effective use cases for impacting customers.
For Further Reading:
Learn more about Jacqueline: https://www.linkedin.com/in/jacqueline-woods-361477/
Trust Me I'm a Robot: How AI Can Improve Brand Stickiness (Fast Company): https://www.fastcompany.com/91194920/trust-me-im-a-robot-how-ai-can-improve-brand-stickiness
Customer Complaint Analyzer Demo Video: https://www.teradata.com/insights/videos/customer-complaint-analyzer
Listen on your favorite podcast app: https://pod.link/1715735755
How will specialized AI agents collaborate to outperform general AI? - A deep dive into Theoriq's vision for decentralized agent collectives with founder Ron Bodkin, former Google Cloud CTO office lead. This in-depth conversation between Luke Saunders and Ron Bodkin explores: - Why specialized agent collectives may outperform general AI systems - The technical foundations of agent collaboration and evaluation - How Theoriq enables permissionless agent development and discovery - The role of decentralization in ensuring safe and ethical AI development - Future implications for autonomous AI systems and agent coordination - Insights from Ron's experience at Google, Teradata, and Vector Institute The discussion provides valuable perspective on how decentralized networks of specialized AI agents could provide an alternative to centralized AI development, with a focus on modular, community-driven innovation and proper governance structures. Watch more sessions from Crypto x AI Month here: https://delphidigital.io/crypto-ai --- Crypto x AI Month is the largest virtual event dedicated to the intersection of crypto and AI, featuring 40+ top builders, investors, and practitioners. Over the course of three weeks, this event brings together panels, debates, and discussions with the brightest minds in the space, presented by Delphi Digital. Crypto x AI Month is free and open to everyone thanks to the support from our sponsors: https://olas.network/ https://venice.ai/ https://near.org/ https://mira.foundation/ https://www.theoriq.ai/ --- Follow the Speakers: Luke Saunders on Twitter/X ► https://x.com/lukedelphi Ron Bodkin on Twitter/X ► https://x.com/ronbodkin --- Chapters 00:00 Introduction and Sponsor Acknowledgments 00:52 Introduction of Ron Bodkin, Founder of Theoriq AI 01:17 Ron's Background in AI and Big Data 04:28 Ron's Experience at Google and AI Development 07:35 The Impact of Transformers and GPT-3 on AI 11:32 Defining AI Agents and Their Capabilities 15:31 The Concept of Agent Collectives 18:38 The Future of AI and AGI 25:00 Concerns About AI Safety and Development 28:23 Overview of Theoriq AI's Agent Base Layer 30:54 Evaluators in Theoriq's System 34:08 Permissionless Nature of Theoriq's Platform 36:14 Developer Experience and SDK for Theoriq 39:33 Optimizers and Agent Collectives 41:48 Future of Autonomous AI Agents 44:22 Discussion on Truth Terminal and AI Autonomy 47:34 Call to Action and Closing Remarks Disclaimer All statements and/or opinions expressed in this interview are the personal opinions and responsibility of the respective guests, who may personally hold material positions in companies or assets mentioned or discussed. The content does not necessarily reflect the opinion of Delphi Citadel Partners, LLC or its affiliates (collectively, “Delphi Ventures”), which makes no representations or warranties of any kind in connection with the contained subject matter. Delphi Ventures may hold investments in assets or protocols mentioned or discussed in this interview. This content is provided for informational purposes only and should not be misconstrued for investment advice or as a recommendation to purchase or sell any token or to use any protocol.
Generative AI and unstructured data are transforming how businesses improve customer experiences and streamline internal processes. As technology evolves, companies find new ways to gain insights, automate tasks, and personalize interactions, unlocking new growth opportunities. The integration of these technologies is reshaping operations, driving efficiency, and enhancing decision-making, helping businesses stay competitive and agile in a rapidly changing landscape. Organizations that embrace these innovations can better adapt to customer needs and market demands, positioning themselves for long-term success.
In this episode, Doug Laney speaks to Katrina M. Conn, Senior Practice Director of Data Science at Teradata, and Sri Raghavan, Principal of Data Science and Analytics at AWS, about sustainability efforts and the ethical considerations surrounding AI.
Key Takeaways:
- Generative AI is being integrated into various business solutions.
- Unstructured data is crucial for enhancing customer experiences.
- Real-time analytics can improve customer complaint resolution.
- Sustainability is a key focus in AI resource management.
- Explainability in AI models is essential for ethical decision-making.
- The combination of structured and unstructured data enhances insights.
- AI innovations are making analytics more accessible to users.
- Trusted AI frameworks are vital for security and governance.
Chapters:
00:00 - Introduction to the Partnership and Generative AI
02:50 - Technological Integration and Market Expansion
06:08 - Leveraging Unstructured Data for Insights
08:55 - Innovations in Customer Experience and Internal Processes
11:48 - Sustainability and Resource Optimization in AI
15:08 - Ensuring Ethical AI and Explainability
23:57 - Conclusion and Future Directions
In this episode of VirtuallyLive! The Podcast, top industry leaders from Salesforce, Teradata, Bloomberg, and Workday come together to explore the "Three I's" — Impact, Inclusion, and Innovation. Get ready for a conversation filled with personal stories, real-world experiences, and insights that bridge the worlds of data, technology, and leadership. Each speaker reveals a game-changing decision that shaped their career or transformed their organization. Tune in to discover how innovation and data-driven strategies are shaping the future of business and digital experiences! Don't forget to like and subscribe!
Join Tony Safoian, CEO of SADA, as he sits down with Steve McMillan, CEO of Teradata, for a deep dive into the dynamic world of cloud security, AI innovation, and the transformative power of strategic partnerships. In this episode of the Cloud and Clear podcast, Tony and Steve explore the unique challenges and opportunities of transitioning legacy systems to the cloud, unlocking AI's potential, and the future of enterprise technology. Don't miss this insightful conversation on the cutting edge of cloud security and AI, and the strategic partnership driving innovation in the space. Join us for more content by liking, sharing, and subscribing!
This week, we're joined by Scott Herren, Executive Vice President and Chief Financial Officer of Cisco Systems, and Eric Kutcher, Senior Partner and Chair of McKinsey & Company, North America. They discuss the rationale for Cisco's recent acquisition, the role of CFOs in driving change within a company, the impact of AI on finance, and the current geopolitical and economic climate.
Related insights:
- The Seasons of the CFO
- Palo Alto Networks CFO on AI, cybersecurity, and the finance leader's mandate
- Data, analytics, and decisions: An interview with Teradata's CFO
Discover our latest insights and join more than 92,000 influential professionals who are part of our LinkedIn community: https://www.linkedin.com/showcase/mckinsey-strategy-&-corporate-finance/
See www.mckinsey.com/privacy-policy for privacy information
Trusted AI ensures that people, data, and AI systems work together transparently to create real value. This requires a focus on performance, innovation, and cost-effectiveness, all while maintaining transparency. However, challenges such as misaligned business strategies and data readiness can undermine trust in AI systems. To build trusted AI, it's crucial to first trust the data. A robust data platform is essential for creating reliable and sustainable AI systems. Tools like Teradata's ClearScape Analytics help address concerns about AI, including issues like generative AI hallucinations, by providing a solid foundation of trusted data and an open, connected architecture.
In this episode, Doug Laney, Analytics Strategy Innovation Fellow with West Monroe Partners, speaks to Vedat Akgun, VP of Data Science & AI, and Steve Anderson, Senior Director of Data Science & AI at Teradata, about trusted AI.
Key Takeaways:
- Value creation, performance, innovation, and cost-effectiveness are crucial for achieving trusted AI.
- Trusting data is essential before building AI capabilities to avoid biases, inaccuracies, and ethical violations.
- A robust data platform is a foundation for creating trusted and sustainable AI systems.
- Generative AI raises concerns about hallucinations and fictitious data, highlighting the need for transparency and accountability.
- Teradata offers features and capabilities, such as ClearScape Analytics and an open and connected architecture, to address trust issues in AI.
Chapters:
00:00 - Introduction and Defining Trusted AI
01:33 - Value Creation and the Importance of Driving Business Value
03:27 - Transparency as a Principle of Trusted AI
09:00 - Trusting Data Before Building AI Capabilities
14:51 - The Role of a Robust Data Platform in Trusted AI
21:09 - Concerns about Trust in Generative AI
23:03 - Addressing Trust Issues with Teradata's Features and Capabilities
25:01 - Conclusion
Creating Value with Teradata
The Big Themes:
- Open Table Formats: Datasets are one of the most common ways organizations use open table formats, since they let teams combine several types of data and access that data in one place. Open table formats are also improving in performance, which will help drive a more flexible, low-cost storage option for enterprises. This level of customer choice will drive greater adoption and better outcomes for customers deploying open table formats over time.
- Trusted AI: Trusted systems require access to data, but it must be properly managed data. Open table formats can help with trusted data progression, such as reducing data duplication through consolidation and providing a single place of oversight. As open table formats match agility and flexibility with the appropriate levels of governance, they can deliver trusted outcomes that flow through to trusted AI.
- Driving Customer Value and Success: Providing customers with opportunities for success is the ultimate driver. For example, Teradata supported a call center that wanted to improve customer satisfaction and outcomes based on the call center data. Teradata provided the organization with text analytics, large language models, and more, driven through a Teradata analytic model engine that provides real-time advice to agents.
The Big Quote: “We've always said, ‘Whoever has access to the most data can win in the analytics space.' So, open table formats are a key component of really helping companies create an environment of trust around the data because without trusted data, you can't have trusted AI.”
With Apple going all in on AI to boost its next round of hardware and services with Apple Intelligence, data is needed now more than ever—massive data. More data than one company can provide, if they hope to make AI services quickly and efficiently do what's asked, without getting bogged down. Another California company attacking AI plumbing problems is Teradata, one of the largest cloud analytics platforms, with a focus on harmonizing data. Starting as a hardware company, making things like the first system over 1 terabyte for Walmart, Teradata has transitioned into a software company, with 7,000 employees in 41 countries who delivered $1.8 billion in revenue last year. On this episode of the Reboot Chronicles, Teradata CEO Steve McMillan unpacks how he has rebooted the company and brought it into the 21st century. Listen in as we discuss how they made that transition, how AI is impacting their growth, his personal journey, and where they may end up in the revolutionary times ahead.
Trusted AI Models and Strategies
The Big Themes:
- Data reliability: After conducting a survey, Teradata found that 40% of executives don't trust their own data to generate accurate AI outputs. AI requires traceability and transparency around where data comes from. Teradata helps combat this disconnect by providing access and opportunities for companies to use their data in more compelling ways to achieve the business outcomes they're looking for.
- 3 Pillars of AI: Teradata recognizes people, transparency, and value creation as three core elements of AI. Having clean data that you can trust is critical. Ensuring ROI on the AI investments to create value is essential. Having a process that provides better compliance, governance, and security is vital.
- AI strategy: Business leaders and the C-suite must understand their own inputs and outputs to make the company better. Teradata believes in building Trusted AI models to ensure businesses have outputs they can rely on.
The Big Quote: “We believe that trusted AI is really the way that people, data, and AI work together...Ultimately, it is the responsibility of the people that are managing those environments, using that data to ensure and determine that it's used in the right purposes."
When data plays a vital role in the enterprise, effectively using analytics to drive business value is crucial. Today, we're joined by Steve Fiore, Senior Director of Customer Experience at Teradata, who will share insights into how analytics and AI are shaping enterprise strategies and personal productivity.
RESOURCES
Connect with Greg on LinkedIn: https://www.linkedin.com/in/gregkihlstrom
Don't miss the Mid-Atlantic MarCom Summit, the region's largest marketing communications conference. Register with the code "Agile" and get 15% off.
Don't miss a thing: get the latest episodes, sign up for our newsletter and more: https://www.theagilebrand.show
Check out The Agile Brand Guide website with articles, insights, and Martechipedia, the wiki for marketing technology: https://www.agilebrandguide.com
The Agile Brand podcast is brought to you by TEKsystems. Learn more here: https://www.teksystems.com/versionnextnow
The Agile Brand is produced by Missing Link—a Latina-owned strategy-driven, creatively fueled production co-op. From ideation to creation, they craft human connections through intelligent, engaging and informative content. https://www.missinglink.company
In this episode we dive into the complexities of establishing and scaling an effective ABM program – including the challenges, strategies, and keys to success.
If you're trying to demonstrate value in a new category or sell a new solution like AI, then this podcast is for you. We pulled together three segments on the topic featuring:
- Neeraj Agrawal - General Partner, Battery Ventures
- Keno Helmi - CRO, Espressive
- Chris Degnan - CRO, Snowflake
ADDITIONAL RESOURCES
For more information on Selling in a New Category, check out Force Management's eBook: https://hubs.li/Q02GXNTZ0
Tune in and learn more about this episode of The Revenue Builders Podcast.
HERE ARE SOME KEY SECTIONS TO CHECK OUT
[00:01:28] Understanding market transitions and spotting opportunities.
[00:03:01] Challenges of new technologies as solutions looking for problems.
[00:04:29] Investing in new product areas and the importance of timing.
[00:05:07] The role of POVs in selling complex technologies.
[00:06:06] Different sales motions: Provoking interest vs. competing in an active market.
[00:07:52] Qualifying economic buyers before a POV.
[00:11:12] Realities of selling new technology at a startup.
[00:13:24] Strategies for targeting early customers and overcoming competition.
[00:15:14] Key customers that helped shape Snowflake's success.
HIGHLIGHT QUOTES
[00:01:46] "Spotting these transitions and being there at the right point is a key component here."
[00:03:01] "New technologies as solutions looking for a problem must be the harder ones to guess."
[00:05:07] "Sharing a POV or a POC is critically important when you're selling a technology your customers may not understand."
[00:06:56] "If I have a brand new technology that customers may not understand, I can alleviate a lot of their concerns by doing a POV."
[00:11:32] "There's a misperception if you're a salesperson that you can go into an early stage startup and make a ton of money."
[00:14:09] "Teradata was almost arrogant... They let the cloud sideswipe them."
Need help making your marketing efforts truly effective in today's noisy digital landscape? Look no further. Tune in as Jacqueline Woods, Teradata's Chief Marketing Officer, shares her data-driven approach to modern marketing strategy! In this episode, Jacqueline Woods shares her insights on targeted marketing, segmentation, and the science behind successful marketing campaigns. We'll cover:
- Why precise targeting is crucial in today's crowded marketing landscape.
- How to effectively segment your audience by role, industry, and solution.
- How lower barriers to entry have increased competition.
- The power of a three-dimensional marketing matrix in reaching your customers.
- Why understanding different roles' priorities is key to effective messaging.
- How to leverage insights from one industry to innovate in another.
- Why marketing should be data-driven and intentional.
- Why some low-ROI activities might still be strategically important.
- How to balance data-driven decisions with strategic considerations.
- Why understanding customer needs is the foundation of successful selling.
- How to articulate your unique value proposition effectively.
- The critical importance of intentional, data-backed marketing decisions.
- How to avoid common pitfalls in pricing and promotion strategies.
Follow Mark:
LinkedIn: https://hi.switchy.io/markdrager
Instagram: https://hi.switchy.io/KcKi
Want more free tools? Go to our podcast page at https://hi.switchy.io/KcKe
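To make the "three-dimensional marketing matrix" idea concrete, here is a minimal sketch in Python; the role, industry, and solution values are invented placeholders, not Teradata's actual segmentation.

```python
from itertools import product

# Hypothetical dimension values; a real matrix would be driven by CRM data.
roles = ["CMO", "CIO", "Data Leader"]
industries = ["Retail", "Banking", "Healthcare"]
solutions = ["Analytics", "AI/ML", "Cloud Migration"]

# Every (role, industry, solution) cell of the matrix gets its own message,
# so messaging can speak to each role's priorities within each industry.
for role, industry, solution in product(roles, industries, solutions):
    message = f"How {industry.lower()} {role}s get value from {solution.lower()}"
    print(f"{role:<12}| {industry:<11}| {solution:<16}-> {message}")
```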
Justin Borgman, Co-Founder and CEO of Starburst, explores the cutting-edge world of data management and analytics. Justin shares insights into Starburst's innovative use of Trino and Apache Iceberg, revolutionizing data warehousing and analytics. Learn about the company's journey, the evolution of data lakes, and the role of data science in modern enterprises.
Episode Overview: In Episode 86 of Great Things with Great Tech, Anthony Spiteri chats with Justin Borgman, Co-Founder and CEO of Starburst. This episode dives into the transformative world of data management and analytics, exploring how Starburst leverages cutting-edge technologies like Trino and Apache Iceberg to revolutionize data warehousing. Justin shares his journey from founding Hadapt to leading Starburst, the evolution of data lakes, and the critical role of data science in today's tech landscape.
Key Topics Discussed:
- Starburst's Origins and Vision: Justin discusses the founding of Starburst and the vision to democratize data access and eliminate data silos.
- Trino and Iceberg: The importance of Trino as a SQL query engine and Iceberg as an open table format in modern data management.
- Data Democratization: How Starburst enables organizations to perform high-performance analytics on data stored anywhere, avoiding vendor lock-in.
- Data Science Evolution: Insights into what it takes to become a data scientist today, emphasizing continuous learning and adaptability.
- Future of Data Management: The shift towards data and AI operating systems, and Starburst's role in shaping this future.
Technology and Technology Partners Mentioned: Starburst, Trino, Apache Iceberg, Teradata, Hadoop, SQL, S3, Azure, Google Cloud Storage, Kafka, Dell, Data Lakehouse, AI, Machine Learning, Big Data, Data Governance, Data Ingestion, Data Management, Capacity Management, Data Security, Compliance, Open Source
☑️ Web: https://www.starburst.io
☑️ Support the Channel: https://ko-fi.com/gtwgt
☑️ Be on #GTwGT: Contact via Twitter @GTwGTPodcast or visit https://www.gtwgt.com
☑️ Subscribe to YouTube: https://www.youtube.com/@GTwGTPodcast?sub_confirmation=1
Check out the full episode on our platforms:
YouTube: https://youtu.be/kmB_pjGb5Js
Spotify: https://open.spotify.com/episode/2l9aZpvwhWcdmL0lErpUHC?si=x3YOQw_4Sp-vtdjyroMk3Q
Apple Podcasts: https://podcasts.apple.com/us/podcast/darknet-diaries-with-jack-rhysider-episode-83/id1519439787?i=1000654665731
Follow Us:
Website: https://gtwgt.com
Twitter: https://twitter.com/GTwGTPodcast
Instagram: https://instagram.com/GTwGTPodcast
☑️ Music: https://www.bensound.com
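As a rough illustration of the Trino-plus-Iceberg pattern discussed in this episode, here is a minimal sketch using the trino Python client; the host, catalog, schema, and table names are invented for the example, and this is not Starburst's own code.

```python
import trino  # pip install trino

# Connect to a hypothetical Trino coordinator exposing an Iceberg catalog.
conn = trino.dbapi.connect(
    host="trino.example.com",
    port=8080,
    user="analyst",
    catalog="iceberg",
    schema="demo",
)
cur = conn.cursor()

# Create an Iceberg table and query it with ordinary SQL --
# the same engine can federate queries across other catalogs too.
cur.execute("""
    CREATE TABLE IF NOT EXISTS orders (
        order_id BIGINT,
        region VARCHAR,
        amount DOUBLE
    ) WITH (format = 'PARQUET')
""")
cur.execute("SELECT region, SUM(amount) AS total FROM orders GROUP BY region")
for region, total in cur.fetchall():
    print(region, total)
```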
In this episode of the Thoughtful Entrepreneur, your host Josh Elledge speaks to Michael Becker, Co-founder of Neocore and author of Content Capitalist.
Michael Becker is involved with NeoCore, a pioneering technology company based in Dubai, as part of the Antler pre-launch accelerator program. The company is developing earbuds that can detect brainwaves to tailor content experiences innovatively. Their team, including a physicist and a neuroscientist, aims to make advanced technology accessible to consumers.
Michael discussed the evolution of technology through various "epochs of computing," from the invention of the microchip to the advent of the iPhone. He described NeoCore's mission to lead the next technological phase, focusing on integrating AI and brain-computer interfaces. This could transform our interactions with technology by making it more intuitive and attuned to our cognitive states.
He also stressed the importance of creating high-quality content in different formats, such as blogs and videos. Michael advocates treating content creators like brand journalists and using multimedia to enhance visibility. He believes adopting a sophisticated content and outreach approach is essential for effective lead generation.
Key Points from the Episode:
- Michael Becker and his company NeoCore
- Explanation of the brainwave-detecting earbuds and their connection to personalized content
- Vision and goals for NeoCore in the future of technology and AI
- Overview of Michael Becker's book "Content Capitalist" and its relevance to B2B SaaS and content strategy
- Importance of high-quality content creation and its impact on inbound lead generation
- Future trends in content marketing, including the use of AI and personalized content
- Examples and case studies of successful content strategies and their impact on businesses
- Michael's book "Content Capitalist" and its potential transformational impact
About Michael Becker:
Michael Becker is a versatile professional whose career spans corporate achievements and personal growth. Initially, he contributed significantly to Teradata, playing a vital role in a $90 million sale in 2016. He then moved to Emarsys, where his efforts in expanding the branded media program were instrumental in SAP's $500 million acquisition in 2019. His early work also includes being the first content hire at Sharpen.
In parallel to his corporate success, Michael experienced a profound spiritual awakening in 2018, which marked a pivotal shift in his life. Motivated to share his new-found insights, he established a theme page that quickly grew to 60,000 followers. Leveraging this platform, he transitioned into mentoring, coaching over 30 conscious entrepreneurs and launching an eLearning program. His international lifestyle as a digital nomad in Mexico and Costa Rica culminated in the successful sale of his brand in 2023, setting the stage for his next entrepreneurial endeavors.
About Neocore:
Neocore is a pioneering startup that develops consumer-grade brain-computer interface (BCI) technology, specifically through EEG-detecting earbuds paired with a connected mobile app. Their technology aims to transform how users interact with digital devices by utilizing cognitive data to enhance user experiences. Neocore's innovations are designed to elevate human agency through cognitively guided device interfacing and amplify neural potential with a chat interface that adapts to individual cognitive patterns.
The company has...
As much as companies promise and commit to diversity, equity, and inclusion initiatives, the reality is corporate strategy, especially at the senior level, remains a male-dominated field. Some chief strategy officers have made it their mission to change that reality. In this episode, we hear from two of them. Alok Agrawal, Chief Strategy Officer & Head of Ventures at Celestica, and Nicolas Chapman, former EVP and Chief Strategy Officer of Teradata, have made tremendous strides in shifting the diversity levels of their strategy offices. Today's discussion is moderated by another diversity-focused strategy leader, Kalina Nikolova, SVP of Business Operations and Strategy at Yahoo. These CSOs will share the real hurdles they overcame to bring more diverse hiring to their strategy function. From them, you'll learn:
- Advice from successful CSOs on how to set realistic objectives to support your DEI hiring initiatives
- How to influence and work with HR to fill your hiring pipeline with more diverse candidates
- Actionable steps to fight unconscious bias and make your job descriptions and interview processes more welcoming to diverse hires
Learn more about Outthinker's community of chief strategy officers - https://outthinkernetwork.com/
Follow us on LinkedIn - https://www.linkedin.com/company/outthinker-networks
Join us in today's episode of the HR Leaders podcast as we welcome Kathy Cullen-Cote, Chief People Officer at Teradata. Kathy shares her extensive experience and innovative strategies in human resources, discussing the crucial role of evolving company culture to enhance diversity and inclusivity within the workforce.
Ever feel like you're drowning in data but not sure where to start? I see many businesses just doing, doing, doing, without truly taking the time to stop and understand their marketing numbers. And then they're left wondering, “Why isn't my profit going up?”, “Why isn't my revenue going up?”, or “Why is my revenue going up but my profit is declining?”
A recent survey by Teradata revealed that 87% of marketers consider data to be their most underutilized asset. So today, I'm sharing how we can harness the power of marketing metrics strategically to steer your business towards growth and profitability. Tune in to find out which key metrics you need to track and how you can streamline the process while gaining valuable insights to optimize your marketing efforts.
SHOW NOTES: https://themichellefernandez.com/podcast/285
FREE LIVE TRAINING: How To Get More Customers & Revenue In A Predictable Way (without throwing money away on ads!)
Connect with me on Instagram
Connect with me on Facebook
Visit my website
P.S. Utilizing an all-in-one tool like OptimaFunnels will streamline your sales process, helping you track leads, automate follow-ups, and ultimately increase your conversion rates. Sign up today and revolutionize your sales approach with OptimaFunnels—your gateway to effortless sales success!
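The "revenue up, profit down" puzzle the episode opens with is easiest to see with a few numbers. Here is a minimal Python sketch of the kind of metric tracking discussed; all figures are invented for illustration.

```python
# Invented monthly figures for illustration only.
ad_spend = 12_000.00          # marketing/ad spend
revenue = 48_000.00           # revenue attributed to marketing
total_costs = 41_000.00       # cost of goods plus operating costs
new_customers = 160

roas = revenue / ad_spend                # return on ad spend
cac = ad_spend / new_customers           # customer acquisition cost
profit = revenue - total_costs
profit_margin = profit / revenue

print(f"ROAS: {roas:.2f}x | CAC: ${cac:.2f} | Profit margin: {profit_margin:.1%}")
# If next month's revenue rises to $55,000 but costs rise to $50,000,
# revenue is up while profit_margin falls -- exactly the pattern to watch for.
```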
For this week's episode, Jacquelyn interviewed Scott Dykstra, CTO and co-founder of Space and Time. Before diving into web3, Scott spent almost 8 years at the cloud analytics and data platform Teradata, where he held roles including senior architect and director of cloud solutions, working his way up to VP of the firm's global cloud.
As for Space and Time, the company aims to be a verifiable compute layer for web3 that scales zero-knowledge proofs, or ZK proofs, on a decentralized data warehouse. Zero-knowledge proofs are a cryptographic technique used to prove something about a piece of data without revealing the original data itself. Space and Time has indexed data both off-chain and on-chain from Ethereum, Bitcoin, Polygon, Sui, Avalanche, Sei, and Aptos, and is adding support for more chains to power the future of AI x blockchain. This episode wraps up Chain Reaction's monthly series diving into different topics and themes in crypto. This month's focus was on blockchain and AI integrations.
Jacquelyn and Scott discuss Space and Time's origin story, how data warehouses work in Web 2.0 vs. web3, and the importance of data transparency.
They also dive into:
- Blockchain and AI potential
- Its OpenAI and blockchain data developments
- Future use cases for data and on-chain AI
- Advice throughout the bull and bear markets
Chain Reaction comes out every Thursday at 12:00 p.m. ET, so be sure to subscribe to us on Apple Podcasts, Spotify or your favorite pod platform to keep up with the action.
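To give a feel for what "prove something without revealing the data" means, here is a toy Schnorr-style sigma protocol in Python. It proves knowledge of a secret exponent x with y = g^x (mod p) without disclosing x. The tiny parameters are for readability only; this is a classroom sketch, not the production-scale ZK proof systems Space and Time builds.

```python
import secrets

# Toy group parameters (NOT secure): p = 2q + 1 with q prime,
# and g generating the order-q subgroup.
p, q, g = 1019, 509, 4

x = secrets.randbelow(q)      # prover's secret
y = pow(g, x, p)              # public value derived from the secret

# Commit -> challenge -> response
r = secrets.randbelow(q)      # prover's one-time randomness
t = pow(g, r, p)              # commitment sent to the verifier
c = secrets.randbelow(q)      # verifier's random challenge
s = (r + c * x) % q           # prover's response

# Verifier checks g^s == t * y^c (mod p) and learns nothing about x itself,
# since a valid (t, c, s) transcript could be simulated without knowing x.
assert pow(g, s, p) == (t * pow(y, c, p)) % p
print("proof verified without revealing x")
```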
Jacqueline Woods is the Chief Marketing Officer for Teradata, the cloud analytics and data platform for AI, headquartered in San Diego, California. Jacqueline joined Teradata from NielsenIQ, where she was a member of the executive leadership team and Global Chief Marketing and Communications Officer. She also spent nearly 10 years as CMO of the IBM Global Partner Ecosystem Division, where she focused on building cloud, data, AI, and SaaS strategies. Before that, she was Global Head of Customer Segmentation & Customer Experience at General Electric and also held roles of increasing responsibility at Oracle for 10 years, as well as leadership roles at Ameritech and GTE, now Verizon. Thankfully, Jacqueline has always loved math, because, as she points out, marketing today is based mostly on data. However, she also emphasizes the importance of empathy and notes that it is essential in creating a space where people can be authentic and drive innovation, productivity, and product design.
In this episode, Alan and Jacqueline talk about where trust fits into the AI conversation, what leaders need to know before launching an AI initiative, and how AI can boost efficiency and productivity. Jacqueline also tells us why underrepresented people, like black female business leaders, need to be involved in AI as it evolves. While AI has been around for a while, it became all the rage at the end of 2022 with public access to tools like ChatGPT. AI is based on patterns, some factual and some non-factual. So that poses the question: how do we trust AI? That's where Teradata comes in. By having responsible people create the models, take responsibility, and think critically about the training, governance, and outcomes, Teradata is focused on building the trust required to use artificial intelligence, generative artificial intelligence, and large language models for their “global 10,000” clientele, like American Airlines and United Healthcare. These companies rely on Teradata for their cloud data and analytics workloads. Teradata has been stewards of trusted information and data since they were founded about 40 years ago, and they believe people thrive when empowered with better and entrusted information.
In this episode, you'll learn about:
- Why is empathy important for marketers?
- The importance of clean data
- Why do underrepresented people have to participate in the evolution of AI?
Our Sponsor:
Download Emailtooltester's free comparison spreadsheet to find the best email marketing service for your business.
Key Highlights:
[02:10] What is empathy?
[03:45] Why marketers need empathy
[07:00] How a love of math led her to marketing
[10:30] Her path to Teradata
[19:00] How can business leaders ensure AI can be trusted?
[21:50] What to do before launching an AI initiative?
[26:45] Remaining authentic using AI
[30:20] Creative AI use cases as workforce multipliers
[33:00] Why underrepresented groups need to participate in AI
[36:20] What we can all learn from Moe
[41:45] “Of course it's Ai!”
[42:10] Watching the shifting nature of work
[44:40] Can you explain what marketing does and why it's important?
Looking for more? Visit our website for the full show notes, links to resources mentioned in this episode, and ways to connect with the guest! Become a member today and listen ad-free, visit https://plus.acast.com/s/marketingtoday. Hosted on Acast. See acast.com/privacy for more information.
Every week on Pipeline Visionaries, we sit down with amazing marketing leaders to uncover the pipeline strategies that have been fundamental to their success. In each episode, we ask these guests which three areas of investment are most important to their marketing strategies. Tune into this special series to hear the budget items our CMO guests can't live without! Find parts one, two, three, four, five, six, seven, eight and nine.
Episode Timestamps:
(01:37): Grant Johnson, CMO at Billtrust
(04:22): Jessica Gilmartin, CMO at Calendly
(06:11): Megan McDonagh, CMO at Amperity
(07:17): Shafqat Islam, CMO at Optimizely
(08:42): Efrat Ravid, CMO at Quantum Metric
(10:59): Orlando Baeza, CMO & CRO at Flock Freight
(13:32): Jenny Victor, CMO at Epicor
(16:18): Jessica Shapiro, CMO at LiveRamp
(19:09): Jacqueline Woods, CMO at Teradata
(21:24): Brad Rinklin, CMO at Infoblox
(24:33): Celia Fleischaker, CMO at isolved
Sponsor:
Pipeline Visionaries is brought to you by Qualified.com, the #1 Conversational Marketing platform for companies that use Salesforce and the secret weapon for pipeline pros. The world's leading enterprise brands trust Qualified to instantly meet with buyers, right on their website, and maximize sales pipeline. Visit Qualified.com to learn more.
Links:
Connect with Ian on LinkedIn
Learn more about Caspian Studios
Zane Rowe is the CFO of Workday, a California-based company that develops enterprise software for managing finance, HR, and planning. Workday has been growing rapidly, and its growth-oriented culture is a central theme of the discussion we recorded between Zane and McKinsey's CFO, Eric Kutcher.
Listen to Eric's conversation with Palo Alto Networks CFO Dipak Golechha: https://link.chtbl.com/SgZPqa6t
Listen to Eric's conversation with Teradata CFO Claire Bramley: https://link.chtbl.com/w3ustEBP
Join our Strategy and Corporate Finance LinkedIn community and follow us on X at @McKStrategy.
Related reading:
- Data, analytics, and decisions: An interview with Teradata's CFO
- Palo Alto Networks CFO on AI, cybersecurity, and the finance leader's mandate
- Gen AI: A guide for CFOs
Join 90,000 other members of our LinkedIn community: https://www.linkedin.com/showcase/mckinsey-strategy-&-corporate-finance/
See www.mckinsey.com/privacy-policy for privacy information
In today's episode, we're resharing Dheeraj Pandey's popular session from ELC Annual 2023 on the disciplined pursuit of less! As the Co-Founder, CEO & Chairman of DevRev.ai, he shares how AI tools can maximize customer impact & reduce information asymmetry between various teams, including eng, customer support, product, sales, etc., ultimately creating a more customer-centric mindset. He reveals how to leverage AI to tackle "verbs," such as classifying, routing, attributing, summarizing and more, further streamlining productivity and empowering your org to focus on customer needs.
ABOUT DHEERAJ PANDEY
Dheeraj Pandey is the co-founder & CEO of DevRev.ai, one of the hottest startups in Silicon Valley, with over 70 million dollars in seed funding. He previously founded Nutanix (Nasdaq: NTNX), a global leader in enterprise cloud software and hyperconverged infrastructure solutions, and currently sits on the board of Adobe (Nasdaq: ADBE) and is a member of their Audit Committee.
Dheeraj co-founded Nutanix in 2009 and led as its CEO and Chairman for 11+ years. Boasting the largest software IPO in 2016, Nutanix is now a multi-billion dollar company with thousands of employees in over 60 countries. Before founding Nutanix, Pandey was the VP of engineering at Aster Data (now Teradata). His technology and enterprise software experience includes engineering and leadership roles at Oracle, Zambeel, and Trilogy Software. Pandey has been recognized with several prestigious industry awards, including Dell's Founders 50 and the E&Y Entrepreneur of the Year, Silicon Valley. He holds a degree in Computer Science from the Indian Institute of Technology (IIT), Kanpur, and an M.S. in Computer Science from the University of Texas at Austin. In addition, he was a Graduate Fellow of Computer Science in the University of Texas at Austin Ph.D. program.
"In my last company, we had brought almost 7,000 employees together. My biggest job was to really bring all the VPs together. What does it mean for them to work together, behave well together, and respect each other? And it's all because there were all these silos of departments. If you look at the power of AI, AI knows no boundaries. If anything, it needs the entire knowledge graph and the knowledge graph of customers and product and people and their work, not just people on the inside, but also users and their activities on the outside.
That's a big problem that we all have to go and solve for." - Dheeraj Pandey
This episode is brought to you by testRigor!
testRigor is trusted by tens of thousands of companies across the globe, including Netflix, Splunk, BusinessWire, and more to solve three main problems with end-to-end test automation:
- It's challenging, expensive, and slow to hire QA Automation Engineers
- Low productivity building your own QA Automation
- Fragile tests that cause maintenance to consume enormous amounts of time
testRigor solves all of the above by allowing our users to express test cases in plain English.
To learn more, check out a case study on testRigor here.
Sign up for a free trial today at testrigor.com
SHOW NOTES:
- The role of essentialism in software dev & company building (1:52)
- Dheeraj's experience fostering a customer-centric approach in all teams (4:22)
- Commonly used tools & why they fall short for full eng functions (7:20)
- Why it's important to connect AI, analytics & collaboration features (10:15)
- How AI can help solve information asymmetry (13:12)
- Using AI for analytics to help make teams more customer-centric (15:14)
- Audience Q&As: A day in the life of a PM using LLMs in an interactive discussion (18:03)
- Tips for educating users to provide better prompts when using GenAI (22:50)
- How would a company typically use the DevRev product? (24:38)
- DevRev's object model of support (27:21)
- Is DevRev capable of answering arbitrary questions once data is uploaded? (28:36)
- Methods used to measure performance w/ DevRev (30:02)
- Creating multiple namespaces w/in the same index to host multi-tenant data (31:10)
- Qualitative & quantitative benefits DevRev offers to its customer base (33:39)
LINKS AND RESOURCES
- Video Version of Episode
- All of the Sessions from ELC Annual
This episode wouldn't have been possible without the help of our incredible production team:
Patrick Gallagher - Producer & Co-Host
Jerry Li - Co-Host
Noah Olberding - Associate Producer, Audio & Video Editor https://www.linkedin.com/in/noah-olberding/
Dan Overheim - Audio Engineer, Dan's also an avid 3D printer - https://www.bnd3d.com/
Ellie Coggins Angus - Copywriter, Check out her other work at https://elliecoggins.com/about/
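The "verbs" framing above (using AI to classify, route, and summarize customer work items) can be sketched roughly as follows. This is an illustrative example using the OpenAI Python client, not DevRev's actual implementation; the team names, model choice, and prompt are invented.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TEAMS = ["billing", "infrastructure", "product-feedback"]  # invented routing targets

def classify_and_route(ticket_text: str) -> str:
    """Ask a model to pick the best-fitting team for a support ticket."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Classify the support ticket into exactly one of: "
                        + ", ".join(TEAMS) + ". Reply with the team name only."},
            {"role": "user", "content": ticket_text},
        ],
    )
    team = response.choices[0].message.content.strip().lower()
    return team if team in TEAMS else "product-feedback"  # fallback for odd replies

print(classify_and_route("Our invoice for May was charged twice."))  # likely: billing
```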
In this episode of Infrastructure Matters, host Krista Macomber, alongside her co-hosts Camberley Bates and Steve Dickens, reviews Broadcom's analyst event with Broadcom CEO Hock Tan and the VMware executive team. The crew reviewed the implications for Broadcom's strategy and its expected R&D investments. Camberley discussed the latest in Dell's Partner Program for fiscal year 2025 and where its APEX services are headed with partners, plus Teradata's progress with Vantage and the implications of its earnings report.
Key points discussed include:
- Broadcom's virtual analyst meeting, keynoted by Hock Tan, emphasizing strategic focus and investments
- VMware's partnership with Google on subscription licensing and portability, which aims to simplify and enhance offerings
- Dell's partner programs and changes in business models for partners, particularly in the context of Dell's APEX partner program adjustments
- The importance of customer adoption, technology execution, and embracing AI in companies' strategies, specifically focusing on Teradata's transition to Vantage and the integration of AI into its analytics offerings
- Teradata's pivot to the cloud with Vantage, boasting a 48% increase in cloud annual recurring revenue
Join Jon and Pete Najarian on today's episode of Rebel's Edge as they discuss stocks including Tripadvisor spiking on a potential takeover, Carl Icahn's 9.9% stake in JetBlue, unusual options activity (UOA) in Teradata, and Children's Place in big trouble. Gain a deeper understanding of these stocks as Jon and Pete share their perspectives, potential growth prospects, and market predictions. They also talk about the return of Indiana State basketball, and Harbaugh going to the Chargers. Stay on the cutting edge with Rebel's Edge.
This episode features an interview with Jacqueline Woods, CMO of Teradata, and David Chan, Managing Director of Deloitte Digital. Jacqueline is an executive with 30 years of experience leading marketing efforts at Fortune 100 companies including IBM, GE, Oracle, and Verizon. David has spent his career partnering with clients to digitally transform their organizations by enabling key CX capabilities to creatively solve complex business problems.
In this episode, Kailey sits down with Jacqueline and David for a panel discussion on the top CX trends, AI predictions for the year ahead, and omni-channel, real-time personalization.
-------------------
Key Takeaways:
- According to Jacqueline, data is more like water than oil. In order for AI to have real impact, your data needs to be clean with a traceable lineage.
- While real-time personalization is important to customers, what matters most is that the messages being delivered to them are contextually relevant to their experience.
- With the world going cookieless, you should measure how much your business relies on third-party cookies and then figure out how much to invest in first-party data services to support the gap.
-------------------
“I often talk about data – it's not like oil. To me, it's more like water. You have a lot of water that's not usable. You have a lot of things in data today that aren't usable. Now, in order for AI to be really impactful in your organization, it has to start with data. Do you have clean data? Is that data pristine? Do you know the lineage of the data? Because, AI is nothing if it doesn't have clean data to essentially build intelligence off of, particularly when you talk about generative AI.” – Jacqueline Woods
“Everyone wants real-time personalization. What that means is the data has to be real-time collected. Data has to be real-time processed. Data has to be real-time curated to be made of some sort of business sense to then activate on in real-time. To me, what matters more is less about whether it's real-time, because just faster is not always better. It's about how contextually relevant the message is being returned to the customer from the brand. That is more meaningful.” – David Chan
-------------------
Episode Timestamps:
(03:44) - Jacqueline and David's career journeys
(07:27) - AI trends in 2024
(26:43) - The need for omni-channel, real-time personalization
(34:57) - Trust and privacy
(44:05) - 2024 CX predictions
(52:22) - Jacqueline and David's recommendations for staying ahead of the CX curve
-------------------
Links:
Connect with Jacqueline on LinkedIn
Connect with David on LinkedIn
Connect with Kailey on LinkedIn
Learn more about Caspian Studios
-------------------
Sponsor:
Good Data, Better Marketing is brought to you by Twilio Segment. In today's digital-first economy, being data-driven is no longer aspirational. It's necessary. Find out why over 20,000 businesses trust Segment to enable personalized, consistent, real-time customer experiences by visiting Segment.com
Web and Mobile App Development (Language Agnostic, and Based on Real-life experience!)
In this podcast episode, Krish explores Teradata from scratch. He introduces Teradata as a complete cloud analytics and data platform suited to building large-scale data warehousing applications, and explains the concepts of data warehousing, data lakes, and data marts. He then tours Teradata's platform and products, including Teradata Vantage and ClearScape Analytics, and demonstrates how to get started by creating an environment and exploring the JupyterLab interface. Along the way he creates tables, loads data, and runs queries, troubleshooting one query, walking through basic SQL syntax, and sampling some of the available third-party plugins before closing with next steps for further exploration. (A minimal sketch of the create/load/query workflow appears below.) Takeaways: Teradata is a complete cloud analytics and data platform suitable for building large-scale data warehousing applications; data warehousing, data lakes, and data marts are important concepts to understand in the context of Teradata; Teradata offers a range of products and platforms, including Teradata Vantage and ClearScape Analytics; JupyterLab and Jupyter Notebooks can be used to interact with Teradata for data analysis and exploration; creating tables, loading data, and running queries are essential tasks in Teradata; troubleshooting queries is an essential skill; basic SQL syntax is all you need to start querying; and third-party plugins can extend the platform's functionality. Chapters: 00:00 Introduction to Teradata 01:16 Understanding Data Warehousing and Data Lakes 03:35 Data Marts and Teradata 04:26 Exploring Teradata's Platform and Products 05:41 Getting Started with Teradata 06:25 Teradata Vantage and ClearScape Analytics 07:57 Understanding JupyterLab and Jupyter Notebooks 19:14 Exploring JupyterLab Extensions 28:18 Creating Tables and Loading Data in Teradata 48:02 Running Queries in Teradata 53:49 Troubleshooting a Query 55:14 Running Basic Queries 56:00 Third-Party Plugins 57:14 Exploring Plugins 58:18 Next Steps and Further Exploration 58:45 Conclusion Snowpal Products: Backends as Services on AWS Marketplace; Mobile Apps on App Store and Play Store; Web App; Education Platform for Learners and Course Creators
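For readers who want to try the create/load/query loop themselves, here is a minimal sketch assuming the official teradatasql Python driver; the host, credentials, and sales table are illustrative placeholders, not values from the episode:

```python
import teradatasql

# Placeholder connection details for a ClearScape/Vantage environment.
with teradatasql.connect(host="your-clearscape-host",
                         user="demo_user",
                         password="demo_password") as con:
    with con.cursor() as cur:
        # Create a simple table.
        cur.execute("""
            CREATE TABLE sales (
                order_id INTEGER,
                region   VARCHAR(32),
                amount   DECIMAL(10, 2)
            )
        """)
        # Load a few rows with question-mark parameter markers.
        cur.execute(
            "INSERT INTO sales (order_id, region, amount) VALUES (?, ?, ?)",
            [1, "APAC", 1250.00],
        )
        # Run a basic aggregate query.
        cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
        for row in cur.fetchall():
            print(row)
```

The same statements can be run directly in the JupyterLab SQL cells Krish demonstrates; the driver version above is just the scriptable equivalent.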
Summary Building machine learning systems and other intelligent applications is a complex undertaking. This often requires retrieving data from a warehouse engine, adding an extra barrier to every workflow. The RelationalAI engine was built as a co-processor for your data warehouse that adds a greater degree of flexibility in the representation and analysis of the underlying information, simplifying the work involved. In this episode, CEO Molham Aref explains how RelationalAI is designed, the capabilities that it adds to your data clouds, and how you can start using it to build more sophisticated applications on your data. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Your host is Tobias Macey and today I'm interviewing Molham Aref about RelationalAI and the principles behind it for powering intelligent applications Interview Introduction How did you get involved in machine learning? Can you describe what RelationalAI is and the story behind it? On your site you call your product an "AI Co-processor". Can you explain what you mean by that phrase? What are the primary use cases that you address with the RelationalAI product? What are the types of solutions that teams might build to address those problems in the absence of something like the RelationalAI engine? Can you describe the system design of RelationalAI? How have the design and goals of the platform changed since you first started working on it? For someone who is using RelationalAI to address a business need, what does the onboarding and implementation workflow look like? What is your design philosophy for identifying the balance between automating the implementation of certain categories of application (e.g. NER) vs. providing building blocks and letting teams assemble them on their own? What are the data modeling paradigms that teams should be aware of to make the best use of the RKGS platform and Rel language? What are the aspects of customer education that you find yourself spending the most time on? What are some of the most under-utilized or misunderstood capabilities of the RelationalAI platform that you think deserve more attention? What are the most interesting, innovative, or unexpected ways that you have seen the RelationalAI product used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on RelationalAI? When is RelationalAI the wrong choice? What do you have planned for the future of RelationalAI? Contact Info LinkedIn (https://www.linkedin.com/in/molham/) Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast (https://www.dataengineeringpodcast.com) covers the latest on modern data management. Podcast.__init__ () covers the Python language, its community, and the innovative ways it is being used. Visit the site (https://www.themachinelearningpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com (mailto:hosts@themachinelearningpodcast.com) with your story.
To help other people find the show please leave a review on iTunes (https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243) and tell your friends and co-workers. Links RelationalAI (https://relational.ai/) Snowflake (https://www.snowflake.com/en/) AI Winter (https://en.wikipedia.org/wiki/AI_winter) BigQuery (https://cloud.google.com/bigquery) Gradient Descent (https://en.wikipedia.org/wiki/Gradient_descent) B-Tree (https://en.wikipedia.org/wiki/B-tree) Navigational Database (https://en.wikipedia.org/wiki/Navigational_database) Hadoop (https://hadoop.apache.org/) Teradata (https://www.teradata.com/) Worst Case Optimal Join (https://relational.ai/blog/worst-case-optimal-join-algorithms-techniques-results-and-open-problems) Semantic Query Optimization (https://relational.ai/blog/semantic-optimizer) Relational Algebra (https://en.wikipedia.org/wiki/Relational_algebra) HyperGraph (https://en.wikipedia.org/wiki/Hypergraph) Linear Algebra (https://en.wikipedia.org/wiki/Linear_algebra) Vector Database (https://en.wikipedia.org/wiki/Vector_database) Pathway (https://pathway.com/) Data Engineering Podcast Episode (https://www.dataengineeringpodcast.com/pathway-database-that-thinks-episode-334/) Pinecone (https://www.pinecone.io/) Data Engineering Podcast Episode (https://www.dataengineeringpodcast.com/pinecone-vector-database-similarity-search-episode-189/) The intro and outro music is from Hitman's Lovesong feat. Paola Graziano (https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/)/CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/)
In this intimate discussion, four alumni participants of the Black Woman Leading Mid-Career (Samira Payne, Keira Braxton, Trenity Dobbey) and Early Career (Tiana Bryan-Okeke) programs share their stories of conquering mindset matters and doing their heart work along their professional journeys. Together, we explore mindset concepts that we navigate as leaders such as leaning into a growth mindset, overcoming negative thinking, challenging your internal saboteur, conquering fears, and aligning your “come from.” They share how they leaned into community and accessed collective healing to support them during this work, and their key takeaways from their experience in the Black Woman Leading program. Join us to reflect on your own journey, celebrate your growth, and build your own resolve as we explore the inner work that goes into being a thriving leader. Guest Bios: ::Tiana Bryan-Okeke Living her life to the beat of Alien Superstar by Beyonce, Tiana Bryan-Okeke has created her own path. This CUNY Medgar Evers College alumna has 5+ years of executive support experience, allowing leaders across industries to focus on and execute their mission and vision while streamlining operations and building cohesive teams. Her passion for project management allows her to transform fragmented systems into interconnected workflows that break down silos across expanding initiatives. After joining the inaugural cohort of the BWL Early Career program, she aspires to continue being luminous while sharing her talents to help others execute their strategic plans. Connect with Tiana on LinkedIn. ::Keira Braxton (formerly Brown) Keira, a native San Diegan, is a passionate and dedicated Human Resources professional with a wealth of experience in developing and implementing HR strategies that align with business goals. She is recognized as an experienced workshop facilitator and is known for her ability to translate complex concepts into practical advice that business leaders can use to build and maintain high-performing workforces. In addition to her work as a People Business Partner, Keira serves as a member of the DEI advisory board at Teradata where she provides guidance and recommendations on the company's DEI initiatives. She also serves as chair for the community outreach, networking and events steering committee for the San Diego chapter of the Teradata Alliance of Black Employees (TABE). As a leader in the DEI space, Keira is committed to helping organizations create more inclusive workplaces where everyone feels valued and respected. Connect with Keira on LinkedIn. ::Trenity Dobbey Trenity K. Dobbey, a seasoned professional with over a decade of experience in social and human services, holds a Master's in Criminal Justice. Her journey began in corporate America, managing financial portfolios for major banks, before transitioning to an impactful role at the Chicago Children's Advocacy Center. There, she skillfully managed the citywide intake line for child abuse reports, showcasing exceptional skills in handling sensitive cases. Currently, at DFSS City of Chicago, Trenity oversees a diverse portfolio of 50+ agencies citywide, managing a $10 million annual budget within the Workforce Services Division. Her strategic vision and hands-on management style have left a lasting impact on the lives of Chicago residents. Building on her extensive professional and personal background, Trenity is an accomplished life coach.
Drawing on her extensive experience, she offers practical guidance rooted in real-world expertise, uniquely positioning herself to drive transformative outcomes for individuals and organizations alike. Connect with Trenity on LinkedIn. ::Samira Payne Samira is currently the Director of Community Revitalization and Network Education at Rebuilding Together, a national nonprofit that supports safe and healthy housing in communities across the country. She participated in the Mid-Career Black Woman Leading Program in Fall 2023. Connect with Samira on LinkedIn. Resources: Programs: We are now enrolling for the January 2024 sessions of our Mid-Career and Early Career leadership development programs. Learn more at https://blackwomanleading.com/programs-overview/ Event: Join us for the Black Woman Leading LIVE! Conference + Retreat, May 13-16, 2024 in Virginia Beach! Learn more at bwlretreat.com Credits: Learn more about our consulting work with organizations at https://knightsconsultinggroup.com/ Email Laura: laura@knightsconsultinggroup.com Connect with Laura on LinkedIn Follow BWL on LinkedIn Instagram: @blackwomanleading Facebook: @blackwomanleading Podcast Music & Production: Marshall Knights. Graphics: Téa Campbell. Listen and follow the podcast on all major platforms: Apple Podcasts Spotify Stitcher iHeartRadio Audible Podbay
So much of the way data is used happens without us even knowing it. And the power of data is beyond comprehension. At Teradata, Erica Hausheer and her team are trying to harness that power and make lives better — even if you don't realize they're doing it. On this episode, Erica details the ways data, AI, and other innovative technologies are transforming the way we live and work. Tune in to learn: How Teradata is improving your life without you even knowing it (4:00); What it takes to understand and execute high-impact jobs (9:00); The evolution of digital transformation (14:00); The juxtaposition between product development and IT (25:00); Fighting against bad data (30:00). Mission.org is a media studio producing content for world-class clients. Learn more at mission.org.
This episode features an interview with Jacqueline Woods, CMO at Teradata, the connected multi-cloud data platform for enterprise analytics, solving data challenges from start to scale. In this episode, Jacqueline talks about the new age of personalization, why AI should not be held accountable, and reminds us that technology is only as good as the data put into it. Jacqueline also shares how data is like water and paints a picture of her vision for a unified, frictionless customer experience. Key Takeaways: Personalization is key. One size does not fit all, so dimensional personalization has become so much more important in recent years. There are greater expectations amongst customers when it comes to their digital experiences and how they engage with companies, and it's important to be aware of these changes and leverage them in marketing strategies. AI isn't responsible. When we talk about “responsible AI” we have to remember that it cannot be responsible - it's a technology. So, AI can be trusted, but people need to be responsible. And when people are responsible, they can begin to create an environment of trust for users. Prepare for the AI-driven enterprise. Teradata's estimate is that AI will drive most of our experiences by 2030, and a lot of companies feel disorganized in this area. Data is currently extremely siloed, so the first step is to cultivate a connected enterprise. Quote: “How do you know you're moving the needle? I mean, when we started this conversation with Forbes, it literally was around the same time last year, and my pitch to them was the following: There are a lot of people that say data is the new oil, data is gold, data is this, and I said, you know, my own belief is that data is like water, because this planet is over 72% water. That is what the earth is comprised of. The usable water that you can use on this planet is 2.5%. Most of that is in glaciers, which means that the real usable fresh water is 3 tenths of 1%. And so when you think about data, and all the data that's out there in the world, about 90% of the data is like data that's duplicated or replicated and how much of the core data is new data and information that you can use? And when you distill it down, it probably is very similar to water where the usable data that you can use, once you filtered it, cleaned it, harmonized it, associated it with the right things is probably less than 2 or 3% and that takes a lot of work. And at the end of the day, our thematic or our core belief is that we believe that people thrive when empowered with the right information. People don't always necessarily use the right information when they get it. But if they had it, and they were able to use it, they would actually be better off.” Episode Timestamps: *(05:06) - The Trust Tree: The cloud has changed marketing forever*(30:05) - The Playbook: AI needs to be trusted and people need to be responsible*(46:57) - The Dust Up:*(49:54) - Quick Hits: Jacqueline's Quick Hits. Sponsor: Pipeline Visionaries is brought to you by Qualified.com, the #1 Conversational Marketing platform for companies that use Salesforce and the secret weapon for pipeline pros. The world's leading enterprise brands trust Qualified to instantly meet with buyers, right on their website, and maximize sales pipeline. Visit Qualified.com to learn more. Links: Connect with Ian on LinkedIn; Connect with Jacqueline on LinkedIn; Read: Enterprise 2030: Building the AI-powered company of the future; Learn more about Teradata; Learn more about Caspian Studios
You don't need to have an unlimited budget to make remarkable marketing content. In fact, it's better if you're working under some constraints. We have proof. The folks over at Wistia did a little experiment they called One, Ten, One Hundred. They made an ad for the same product (Wistia's Soapbox video recorder) on three different budgets: $1,000, $10,000, and $100,000, to see which one would perform best. And in this episode, we're giving you the inside scoop on what they found. You'll be surprised at the result. Today, we're showing you how combining a bit of inventiveness with a touch of resourcefulness is more powerful than just throwing money at your marketing. Because when cash is a bit strapped, that's when you're forced to get creative. And it's that creativity that resonates with viewers. That's what we're talking about today with Chris Sheen, Director of Content and Social at Celonis. So take out your scissors and craft paper for this episode of Remarkable. About our guest, Chris Sheen: Chris Sheen is Director of Content and Social at Celonis. He joined Celonis in February of 2022. Prior to his current role, he served as CMO at Sideways 6 and SaleCycle. He has also worked at Teradata and Experian. He is based in London. About Celonis: Celonis is the global leader and pioneer in process mining. They pioneered the process mining category 10 years ago and the company is now valued at over $13 billion: decacorn status, no less. About One, Ten, One Hundred: One, Ten, One Hundred is a Webby Award-winning four-part documentary in which video software company Wistia challenges video production company Sandwich Video to make three ads on different budgets: $1,000, $10,000, and $100,000. The goal was to explore the impact budget has on creativity in video ads. Wistia then measured ad performance and audience reaction to gauge the success of each. It was also a way to advertise Wistia's tool, Soapbox, which is a video creation tool for SMBs. The metrics they tuned into were traditional demographics, engagement data, cost per customer acquisition, and return on investment. The idea for Wistia's documentary came about because their production team realized they didn't have a good understanding of the money-in-money-out ratio. Wistia Founder and CEO Chris Savage said, “Our production team felt that creativity was the single most important element in producing an effective video and this fits in with our vision to grow through creativity.” What B2B Companies Can Learn From One, Ten, One Hundred: Show the “making of” process behind your product. There's an appeal to seeing a transformation from beginning to end. Ian says, “We like to know the process of making something. The making of something is just as interesting, or even more interesting, than the final asset. People like to watch transformation. They like inside information.” Chris says that it also shows the humanity behind the product, behind the company. He says, “I think B2B companies can just feel like a faceless organization that has a product, that has software. But when you show the making of things, like one of my favorite easy tricks is showing an outtake at the end of a video. It's a, you know, a five second outtake. It shows the human side, it shows the mistake and it completely changes how you feel very quickly about the brand, about the company.” Showing the process humanizes your brand and makes it more appealing to potential customers. Play up how long your product was in development.
This conveys to your audience a sense of your specialty and standards of excellence in the industry. Chris says, “Apple and Dyson really show you the level, the hours, the days, years, months they've gone into making their products, really crafting what they do and the art behind it. Like, ‘We've perfected this. We weren't going to ship it until it was ready.' This is so powerful as a marketing technique. Because it works. It really makes you feel like, ‘Okay, this is going to be something special.'” So show the rigor that went into crafting your product. Edutain your audience. Don't just try to educate them. Make it fun. Chris says, “Great content marketing is like entertainment. You've got to know your audience to do that well. Wistia really does. How many companies sat there thinking, ‘We'd love to have a great explainer video for our website, but we just don't have the budget'? I watched [the documentary] and I'm literally thinking, ‘I need to get my craft papers out. I'm going to steal my daughter's school stuff and start making stuff to help sell Celonis.' Because it brings it to life in so many different ways.” So when you're creating content, ask yourself, “Is this educational and is it entertaining?” A good way to measure this is to ask, “Would viewers watch it in their own time?” Create something that you enjoy. Because it's likely what your audience would enjoy too. Chris says, “With Wistia, they're clearly doing it as much for themselves as anyone else. They're clearly loving it, enjoying it, learning a lot themselves. And at the end of it, you kind of feel that they've got as much out of it as I have watching it. And I think that in itself is a great sign of content. If you can do something that, when you look back, you think, ‘I think I would enjoy this if someone else had made this,' that's a really strong point if it fits your target market.” Quotes: “When you watch [One, Ten, One Hundred], you don't feel like you're watching a piece of content marketing. And that's probably the ultimate B2B marketer's goal, or any marketer's goal really, is to make that content not feel like it's selling something. It's just selling entertainment and education.” - Chris Sheen “We always strive for perfect, don't we? We want perfection in the market. We want it to feel great and look great, sound great. Sometimes it's worth taking a step back and thinking, ‘Actually, what's going to get the message across the most authentically?'” - Chris Sheen *“Creative work has to have constraints.” - Ian Faison* *“[The documentary] really was binge worthy, which is the ultimate goal for content marketing. It passes the driveway test. That's when you're listening to a song, you get to the end of your journey, you're sitting in your driveway. Do you get out of the car and just walk away, or do you stay to finish it?” - Chris Sheen* Time Stamps: [00:54] Introducing Director of Content & Social at Celonis, Chris Sheen [1:48] Why are we talking about Wistia's One, Ten, One Hundred documentary today? [3:21] What is Wistia's One, Ten, One Hundred documentary about? [5:50] What makes the documentary remarkable? [12:51] What are some marketing lessons we can take from One, Ten, One Hundred? [30:22] What's Chris' content strategy? [36:15] What are some projects at Celonis Chris is proud of? Links: Watch One, Ten, One Hundred; Connect with Chris on LinkedIn; Learn more about Celonis. About Remarkable!: Remarkable! is created by the team at Caspian Studios, the premier B2B Podcast-as-a-Service company.
Caspian creates both non-fiction and fiction series for B2B companies. If you want a fiction series, check out our new offering, The Business Thriller: Hollywood-style storytelling for B2B. Learn more at CaspianStudios.com. In today's episode, you heard from Ian Faison (CEO of Caspian Studios) and Meredith Gooderham (Senior Producer). Remarkable was produced this week by Meredith Gooderham, mixed by Scott Goodrich, and our theme song is “Solomon” by FALAK. Create something remarkable. Rise above the noise.
Thanks to the over 17,000 people who have joined the first AI Engineer Summit! A full recap is coming. Last call to fill out the State of AI Engineering survey! See our Community page for upcoming meetups in SF, Paris and NYC. This episode had good interest on Twitter.

Fast.ai's “Practical Deep Learning” courses have been watched by over 6,000,000 people, and the fastai library has over 25,000 stars on GitHub. Jeremy Howard, one of the creators of fast.ai, is now one of the most prominent and respected voices in the machine learning industry; but that wasn't always the case.

Being non-consensus and right

In 2018, Jeremy and Sebastian Ruder published a paper on ULMFiT (Universal Language Model Fine-tuning), a 3-step transfer learning technique for NLP tasks: the paper demonstrated that pre-trained language models could be fine-tuned on a specific task with a relatively small amount of data to achieve state-of-the-art results. They trained a 24M-parameter model on WikiText-103 which beat most benchmarks. While the paper had great results, the methods behind it weren't taken seriously by the community:

“Everybody hated fine tuning. Everybody hated transfer learning. I literally did tours trying to get people to start doing transfer learning and nobody was interested, particularly after GPT showed such good results with zero shot and few shot learning […] which I was convinced was not the right direction, but who's going to listen to me, cause as you said, I don't have a PhD, not at a university… I don't have a big set of computers to fine tune huge transformer models.”

Five years later, fine-tuning is at the center of most major discussion topics in AI (we covered some like fine tuning vs RAG and small models fine tuning), and we might have gotten here earlier if Jeremy had OpenAI-level access to compute and distribution. At heart, Jeremy has always been “GPU poor”:

“I've always been somebody who does not want to build stuff on lots of big computers because most people don't have lots of big computers and I hate creating stuff that most people can't use.”

This story is a good reminder of how some of the best ideas are hiding in plain sight; we recently covered RWKV and will continue to highlight the most interesting research that isn't being done in the large labs.

Replacing fine-tuning with continued pre-training

Even though fine-tuning is now mainstream, we still have a lot to learn. The issue of “catastrophic forgetting” and potential solutions have been brought up in many papers: at the fine-tuning stage, the model can forget tasks it previously knew how to solve in favor of new ones. The other issue is apparent memorization of the dataset even after a single epoch, which Jeremy covered in Can LLMs learn from a single example?, but we still don't have the answer to. Despite being the creator of ULMFiT, Jeremy still professes that there are a lot of open questions on fine-tuning:

“So I still don't know how to fine tune language models properly and I haven't found anybody who feels like they do.”

He now advocates for "continued pre-training" - maintaining a diversity of data throughout the training process rather than separate pre-training and fine-tuning stages. (For reference, a minimal sketch of the original three-step recipe follows.)
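To make the three steps concrete, here is a minimal sketch using today's fastai text API on the IMDB dataset; step one (general pre-training on WikiText-103) ships inside fastai's AWD_LSTM weights, and the epoch counts and learning rates below are illustrative placeholders rather than the paper's settings:

```python
from fastai.text.all import *

path = untar_data(URLs.IMDB)

# Step 2: fine-tune the WikiText-103 pre-trained language model on the IMDB corpus.
dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()])
learn_lm.fine_tune(1, 2e-2)            # illustrative schedule
learn_lm.save_encoder('imdb_encoder')  # keep the corpus-tuned encoder

# Step 3: fine-tune a classifier on the labeled task, reusing that encoder.
dls_clas = TextDataLoaders.from_folder(path, valid='test', text_vocab=dls_lm.vocab)
learn_clas = text_classifier_learner(dls_clas, AWD_LSTM, metrics=accuracy)
learn_clas.load_encoder('imdb_encoder')
learn_clas.fine_tune(1, 2e-2)
```

Five years on, the same shape (generic pre-training, domain adaptation, task or RLHF tuning) is recognizably what ChatGPT-style models do, which is exactly what Jeremy now questions.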
On the continued pre-training view, mixing instructional data, exercises, code, and other modalities while gradually curating higher-quality data can avoid catastrophic forgetting and lead to more robust capabilities (something we covered in Datasets 101).

“Even though I originally created the three-step approach that everybody now does, my view is it's actually wrong and we shouldn't use it… the right way to do this is to fine-tune language models, is to actually throw away the idea of fine-tuning. There's no such thing. There's only continued pre-training. And pre-training is something where from the very start, you try to include all the kinds of data that you care about, all the kinds of problems that you care about, instructions, exercises, code, general purpose document completion, whatever. And then as you train, you gradually curate that, you know, you gradually make that higher and higher quality and more and more specific to the kinds of tasks you want it to do. But you never throw away any data… So yeah, that's now my view, is I think ULMFiT is the wrong approach. And that's why we're seeing a lot of these so-called alignment taxes… I think it's actually because people are training them wrong.”

An example of this phenomenon is CodeLlama, a LLaMA2 model fine-tuned on 500B tokens of code: while the model is much better at code, it's worse on generic tasks that LLaMA2 knew how to solve well before the fine-tuning. In the episode we also dive into all the places where open source model development and research is happening (academia vs Discords - tracked on our Communities list and on our survey), and how Jeremy recommends getting the most out of these diffuse, pseudonymous communities (similar to the Eleuther AI Mafia).

Show Notes* Jeremy's Background* FastMail* Optimal Decisions* Kaggle* Enlitic* fast.ai* Rachel Thomas* Practical Deep Learning* fastai for PyTorch* nbdev* fastec2 (the underrated library we describe)* Can LLMs learn from a single example?* the Kaggle LLM Science Exam competition, which “challenges participants to answer difficult science-based questions written by a Large Language Model”.* Sebastian Ruder* Alec Radford* Sylvain Gugger* Stephen Merity* Chris Lattner* Modular.ai / Mojo* Jono Whittaker* Zeiler and Fergus paper* ULM Fit* DAWNBench* Phi-1* Code Llama* AlexNet

Timestamps* [00:00:00] Intros and Jeremy's background* [00:05:28] Creating ULM Fit - a breakthrough in NLP using transfer learning* [00:06:32] The rise of GPT and the appeal of few-shot learning over fine-tuning* [00:10:00] Starting Fast.ai to distribute AI capabilities beyond elite academics* [00:14:30] How modern LMs like ChatGPT still follow the ULM Fit 3-step approach* [00:17:23] Meeting with Chris Lattner on Swift for TensorFlow at Google* [00:20:00] Continued pre-training as a fine-tuning alternative* [00:22:16] Fast.ai and looking for impact vs profit maximization* [00:26:39] Using Fast.ai to create an "army" of AI experts to improve their domains* [00:29:32] Fast.ai's 3 focus areas - research, software, and courses* [00:38:42] Fine-tuning memorization and training curve "clunks" before each epoch* [00:46:47] Poor training and fine-tuning practices may be causing alignment failures* [00:48:38] Academia vs Discords* [00:53:41] Jeremy's high hopes for Chris Lattner's Mojo and its potential* [01:05:00] Adding capabilities like SQL generation through quick fine-tuning* [01:10:12] Rethinking Fast.ai courses for the AI-assisted coding era* [01:14:53] Rapid model development has created major technical debt* [01:17:08] Lightning Round
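One more illustration before the summary: Jeremy's continued pre-training proposal can be pictured as a sampling schedule rather than discrete stages. Every batch draws from all data sources, with weight drifting toward curated, task-specific data as training progresses, and no source is ever dropped. A toy sketch, in which the source names and weights are hypothetical illustrations rather than anything from the episode:

```python
import random

# Hypothetical data sources; in practice these would be token streams.
sources = {
    "web_text":     ["...general documents..."],
    "code":         ["...code files..."],
    "instructions": ["...instruction/response pairs..."],
    "exercises":    ["...worked exercises..."],
}

def mixing_weights(progress: float) -> dict[str, float]:
    """Shift sampling weight from generic to curated data as training
    progresses, but never drop any source entirely (avoiding the
    'throw away data' step Jeremy argues causes forgetting)."""
    w_general = 0.7 * (1 - progress) + 0.1 * progress
    w_curated = 1.0 - w_general
    return {
        "web_text":     w_general * 0.8,
        "code":         w_general * 0.2,
        "instructions": w_curated * 0.6,
        "exercises":    w_curated * 0.4,
    }

def sample_example(step: int, total_steps: int) -> str:
    """Draw one training example according to the current mixture."""
    weights = mixing_weights(step / total_steps)
    names = list(sources)
    picked = random.choices(names, weights=[weights[n] for n in names])[0]
    return random.choice(sources[picked])
```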
AI Summary (beta)

This is the first episode where we're trying this. Here's an overview of the main topics before you dive into the transcript. * Jeremy's background and philosophies on AI* Studied philosophy and cognitive science in college* Focused on ethics and thinking about AI even 30 years ago* Believes AI should be accessible to more people, not just elite academics/programmers* Created fast.ai to make deep learning more accessible* Development of transfer learning and ULMFit* Idea of transfer learning critical for making deep learning accessible* ULMFit pioneered transfer learning for NLP* Proposed training general language models on large corpora then fine-tuning - this became standard practice* Faced skepticism that this approach would work from NLP community* Showed state-of-the-art results on text classification soon after trying it* Current open questions around fine-tuning LLMs* Models appear to memorize training data extremely quickly (after 1 epoch)* This may hurt training dynamics and cause catastrophic forgetting* Unclear how best to fine-tune models to incorporate new information/capabilities* Need more research on model training dynamics and ideal data mixing* Exciting new developments* Mojo and new programming languages like Swift could enable faster model innovation* Still lots of room for improvements in computer vision-like innovations in transformers* Small models with fine-tuning may be surprisingly capable for many real-world tasks* Prompting strategies enable models like GPT-3 to achieve new skills like playing chess at superhuman levels* LLMs are like computer vision in 2013 - on the cusp of huge new breakthroughs in capabilities* Access to AI research* Many key convos happen in private Discord channels and forums* Becoming part of these communities can provide great learning opportunities* Being willing to do real work, not just talk about ideas, is key to gaining access* The future of practical AI* Coding becoming more accessible to non-programmers through AI assistance* Pre-requisite programming experience for learning AI may no longer be needed* Huge open questions remain about how to best train, fine-tune, and prompt LLMs

Transcript

Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. [00:00:21]Swyx: Hey, and today we have in the remote studio, Jeremy Howard all the way from Australia. Good morning. [00:00:27]Jeremy: The remote studio, also known as my house. Good morning. Nice to see you. [00:00:32]Swyx: Nice to see you too. I'm actually very used to seeing you in your mask as a message to people, but today we're mostly audio. But thank you for doing the very important public service of COVID awareness. It was a pleasure. [00:00:46]Jeremy: It was all very annoying and frustrating and tedious, but somebody had to do it. [00:00:52]Swyx: Somebody had to do it, especially somebody with your profile. I think it really drives home the message. So we tend to introduce people for them and then ask people to fill in the blanks on the personal side. Something I did not know about you was that you graduated with a BA in philosophy from the University of Melbourne. I assumed you had a PhD. [00:01:14]Jeremy: No, I mean, I barely got through my BA because I was working 80 to 100 hour weeks at McKinsey and Company from 19 years old onwards. So I actually didn't attend any lectures in second and third year university.
[00:01:35]Swyx: Well, I guess you didn't need it or you're very sort of self-driven and self-motivated. [00:01:39]Jeremy: I took two weeks off before each exam period when I was working at McKinsey. And then, I mean, I can't believe I got away with this in hindsight, I would go to all my professors and say, oh, I was meant to be in your class this semester and I didn't quite turn up. Were there any assignments I was meant to have done, whatever. I can't believe all of them let me basically have it. They basically always would say like, okay, well, if you can have this written by tomorrow, I'll accept it. So yeah, stressful way to get through university, but. [00:02:12]Swyx: Well, it shows that, I guess, you min-maxed the opportunities. That definitely was a precursor. [00:02:18]Jeremy: I mean, funnily, like in as much as I, you know, in philosophy, the things I found interesting and focused on in the little bit of time I did spend on it was ethics and cognitive science. And it's kind of really amazing that it's now come back around and those are actually genuinely useful things to know about, which I never thought would happen. [00:02:38]Swyx: A lot of, yeah, a lot of relevant conversations there. So you were a consultant for a while and then in the magical month of June 1999, you founded both Optimal Decisions and FastMail, which I also briefly used. So thank you for that. [00:02:53]Jeremy: Oh, good for you. Yeah. Cause I had read the statistics, which is that like 90% or something of small businesses fail. So I thought if I start two businesses, I have a higher chance. In hindsight, I was thinking of it as some kind of stochastic thing I didn't have control over, but it's a bit odd, but anyway. [00:03:10]Swyx: And then you were president and chief scientist at Kaggle, which obviously is the sort of competition platform of machine learning. And then Enlitic, where you were working on using deep learning to improve medical diagnostics and clinical decisions. Yeah. [00:03:28]Jeremy: I was actually the first company to use deep learning in medicine, so I kind of founded the field. [00:03:33]Swyx: And even now that's still like a pretty early phase. And I actually heard you on your new podcast with Tanishq, where you went very, very deep into the stuff, the kind of work that he's doing, such a young prodigy at his age. [00:03:47]Jeremy: Maybe he's too old to be called a prodigy now, ex-prodigy. No, no. [00:03:51]Swyx: I think he still counts. And anyway, just to round out the bio, you have a lot more other credentials, obviously, but most recently you started Fast.ai, which is still, I guess, your primary identity with Rachel Thomas. So welcome. [00:04:05]Jeremy: Yep. [00:04:06]Swyx: Thanks to my wife. Thank you. Yeah. Doing a lot of public service there with getting people involved in AI, and I can't imagine a better way to describe it than fast, fast.ai. You teach people from nothing to Stable Diffusion in seven weeks or something, and that's amazing. Yeah, yeah. [00:04:22]Jeremy: I mean, it's funny, you know, when we started that, what was that, like 2016 or something, the idea that deep learning was something that you could make more accessible was generally considered stupid. Everybody knew that deep learning was a thing that you got a math or a computer science PhD, you know, there was one of five labs that could give you the appropriate skills and that you would join, yeah, basically from one of those labs, you might be able to write some papers.
So yeah, the idea that normal people could use that technology to do good work was considered kind of ridiculous when we started it. And we weren't sure if it was possible either, but we kind of felt like we had to give it a go because the alternative was we were pretty sure that deep learning was on its way to becoming, you know, the most or one of the most, you know, important technologies in human history. And if the only people that could use it were a handful of computer science PhDs, that seemed like A, a big waste and B, kind of dangerous. [00:05:28]Swyx: Yeah. [00:05:29]Alessio: And, you know, well, I just wanted to know one thing on your bio that at Kaggle, you were also the top rank participant in both 2010 and 2011. So sometimes you see a lot of founders running companies that are not really in touch with the problem, but you were clearly building something that you knew a lot about, which is awesome. Talking about deep learning, you created, published a paper on ULM fit, which was kind of the predecessor to multitask learning and a lot of the groundwork that then went into Transformers. I've read back on the paper and you turned this model, AWD LSTM, which I did the math and it was like 24 to 33 million parameters, depending on what training data set you use. Today, that's kind of like not even small, it's like super small. What were some of the kind of like contrarian takes that you had at the time and maybe set the stage a little bit for the rest of the audience on what was kind of like the state of the art, so to speak, at the time and what people were working towards? [00:06:32]Jeremy: Yeah, the whole thing was a contrarian take, you know. So okay, so we started Fast.ai, my wife and I, and we thought, yeah, so we're trying to think, okay, how do we make it more accessible? So when we started thinking about it, it was probably 2015 and then 2016, we started doing something about it. Why is it inaccessible? Okay, well, A, no one knows how to do it other than a small number of people. And then when we asked those few people, well, how do you actually get good results? They would say like, oh, it's like, you know, a box of tricks that aren't published. So you have to join one of the labs and learn the tricks. So a bunch of unpublished tricks, not much software around, but thankfully there was Theano and wrappers, and particularly Lasagne, the wrapper, but yeah, not much software around, not much in the way of data sets, you know, very hard to get started in terms of the compute. Like how do you get that set up? So yeah, no, everything was kind of inaccessible. And you know, as we started looking into it, we had a key insight, which was like, you know what, most of the compute and data for image recognition, for example, we don't need to do it. You know, there's this thing which nobody knows about, nobody talks about called transfer learning, where you take somebody else's model, where they already figured out like how to detect edges and gradients and corners and text and whatever else, and then you can fine tune it to do the thing you want to do. And we thought that's the key. That's the key to becoming more accessible in terms of compute and data requirements. So when we started Fast.ai, we focused from day one on transfer learning.
Lesson one, in fact, was transfer learning, literally lesson one, something not normally even mentioned in, I mean, there wasn't much in the way of courses, you know, the courses out there were PhD programs that had happened to have recorded their lessons and they would rarely mention it at all. We wanted to show how to do four things that seemed really useful. You know, work with vision, work with tables of data, work with kind of recommendation systems and collaborative filtering and work with text, because we felt like those four kind of modalities covered a lot of the stuff that, you know, are useful in real life. And no one was doing anything much useful with text. Everybody was talking about word2vec, you know, like king plus queen minus woman and blah, blah, blah. It was like cool experiments, but nobody's doing anything like useful with it. NLP was all like lemmatization and stop words and topic models and bigrams and SVMs. And it was really academic and not practical. But I mean, to be honest, I've been thinking about this crazy idea for nearly 30 years since I had done cognitive science at university, where we talked a lot about Searle's Chinese room experiment. This idea of like, what if there was somebody that could kind of like, knew all of the symbolic manipulations required to answer questions in Chinese, but they didn't speak Chinese and they were kind of inside a room with no other way to talk to the outside world other than taking in slips of paper with Chinese written on them and then they do all their rules and then they pass back a piece of paper with Chinese back. And this room with a person in it is actually fantastically good at answering any question you give them written in Chinese. You know, do they understand Chinese? And is this, you know, something that's intelligently working with Chinese? Ever since that time, I'd say the most thought, to me, the most thoughtful and compelling philosophical response is yes. You know, intuitively it feels like no, because that's just because we can't imagine such a large kind of system. But you know, if it looks like a duck and acts like a duck, it's a duck, you know, or to all intents and purposes. And so I always kind of thought, you know, so this is basically a kind of analysis of the limits of text. And I kind of felt like, yeah, if something could ingest enough text and could use the patterns it saw to then generate text in response to text, it could appear to be intelligent, you know. And whether that means it is intelligent or not is a different discussion and not one I find very interesting. Yeah. And then when I came across neural nets when I was about 20, you know, what I learned about the universal approximation theorem and stuff, and I started thinking like, oh, I wonder if like a neural net could ever get big enough and take in enough data to be a Chinese room experiment. You know, with that background and this kind of like interest in transfer learning, you know, I'd been thinking about this thing for kind of 30 years and I thought like, oh, I wonder if we're there yet, you know, because we have a lot of text. Like I can literally download Wikipedia, which is a lot of text. And I thought, you know, how would something learn to kind of answer questions or, you know, respond to text? And I thought, well, what if we used a language model? So language models were already a thing, you know, they were not a popular or well-known thing, but they were a thing.
But language models existed, you know, this idea that you could train a model to fill in the gaps. Or actually in those days it wasn't fill in the gaps, it was finish a string. And in fact, Andrej Karpathy did his fantastic RNN demonstration from this at a similar time where he showed like you can have it ingest Shakespeare and it will generate something that looks a bit like Shakespeare. I thought, okay, so if I do this at a much bigger scale, using all of Wikipedia, what would it need to be able to do to finish a sentence in Wikipedia effectively, to do it quite accurately quite often? I thought, geez, it would actually have to know a lot about the world, you know, it'd have to know that there is a world and that there are objects and that objects relate to each other through time and cause each other to react in ways and that causes precede effects and that, you know, when there are animals and there are people and that people can be in certain positions during certain timeframes and then, you know, putting all that together, you can then finish a sentence like this was signed into law in 2016 by US President X and it would fill in the gap, you know. So that's why I tried to create what in those days was considered a big language model trained on the entirety of Wikipedia, which is, that was, you know, a bit unheard of. And my interest was not in, you know, just having a language model. My interest was in like, what latent capabilities would such a system have that would allow it to finish those kind of sentences? Because I was pretty sure, based on our work with transfer learning and vision, that I could then suck out those latent capabilities by transfer learning, you know, by fine-tuning it on a task data set or whatever. So we generated this three-step system. So step one was train a language model on a big corpus. Step two was fine-tune a language model on a more curated corpus. And step three was further fine-tune that model on a task. And of course, that's what everybody still does today, right? That's what ChatGPT is. And so the first time I tried it within hours, I had a new state-of-the-art academic result on IMDB. And I was like, holy s**t, it does work. And so you asked, to what degree was this kind of like pushing against the established wisdom? You know, every way. Like the reason it took me so long to try it was because I asked all my friends in NLP if this could work. And everybody said, no, it definitely won't work. It wasn't like, oh, maybe. Everybody was like, it definitely won't work. NLP is much more complicated than vision. Language is a much more vastly complicated domain. You know, and you've got problems like the grounding problem. We know from like philosophy and theory of mind that it's actually impossible for it to work. So yeah, so don't waste your time. [00:15:10]Alessio: Jeremy, had people not tried because it was like too complicated to actually get the data and like set up the training? Or like, were people just lazy and kind of like, hey, this is just not going to work? [00:15:20]Jeremy: No, everybody wasn't lazy. So like, so the person I thought at that time who, you know, there were two people I thought at that time, actually, who were the strongest at language models were Stephen Merity and Alec Radford. And at the time I didn't know Alec, but I, after we had both, after I'd released ULM Fit and he had released GPT, I organized a chat for both of us with Kate Metz in the New York Times. And Kate Metz asked, sorry, and Alec answered this question for Kate.
And Kate was like, so how did, you know, GPT come about? And he said, well, I was pretty sure that pre-training on a general large corpus wouldn't work. So I hadn't tried it. And then I read ULM Fit and turns out it did work. And so I did it, you know, bigger and it worked even better. And similar with, with Stephen, you know, I asked Stephen Merity, like, why don't we just, you know, take your AWD-LSTM and like train it on all of Wikipedia and fine tune it? And he's kind of like, well, I don't think that's going to really fly. Like two years before I did a very popular talk at KDD, the conference where everybody in NLP was in the audience. I recognized half the faces, you know, and I told them all this, I'm sure transfer learning is the key. I'm sure ImageNet, you know, is going to be an NLP thing as well. And, you know, everybody was interested and people asked me questions afterwards and, but not just, yeah, nobody followed up because everybody knew that it didn't work. I mean, even like, so we were scooped a little bit by Dai and Le, Quoc Le at Google. They had, they had, I already, I didn't even realize this, which is a bit embarrassing. They had already done a large language model and fine tuned it. But again, they didn't create a general purpose, large language model on a general purpose corpus. They only ever tested a domain specific corpus. And I haven't spoken to Quoc actually about that, but I assume that the reason was the same. It probably just didn't occur to them that the general approach could work. So maybe it was that kind of 30 years of mulling over the Searle Chinese room experiment that had convinced me that it probably would work. I don't know. Yeah. [00:17:48]Alessio: Interesting. I just dug up Alec's announcement tweet from 2018. He said, inspired by CoVe, ELMo, and ULMFiT, we show that a single transformer language model can be fine-tuned to a wide variety of tasks. It's interesting because, you know, today people think of OpenAI as the leader, kind of kind of like the research lab pushing forward the field. What was that at the time? You know, like kind of like going back five years, people think of it as an overnight success, but obviously it took a while. [00:18:16]Swyx: Yeah. Yeah. [00:18:17]Jeremy: No, I mean, absolutely. And I'll say like, you know, it's interesting that it mentioned ELMo because in some ways that was kind of diametrically opposed to, to ULM fit. You know, there was these kind of like, so there was a lot of, there was a lot of activity at the same time as ULM fit's release. So there was, um, Bryan McCann, I think at Salesforce, who had come out with this neat model that did a kind of multitask learning, but again, they didn't create a general fine tune language model first. There was ELMo, um, which I think was released, you know, actually quite a few months after the first ULM fit example, I think. Um, but yeah, there was a bit of this stuff going on. And the problem was everybody was doing, and particularly after GPT came out, then everybody wanted to focus on zero shot and few shot learning. You know, everybody hated fine tuning. Everybody hated transfer learning. And like, I literally did tours trying to get people to start doing transfer learning and people, you know, nobody was interested, particularly after GPT showed such good results with zero shot and few shot learning.
And so I actually feel like we kind of went backwards for years and, to be honest, I mean, I'm a bit sad about this now, but I kind of got so disappointed and dissuaded by like, it felt like these bigger labs, much bigger labs, you know, like fast AI had only ever been just me and Rachel, were getting all of this attention for an approach I thought was the wrong way to do it. You know, I was convinced was the wrong way to do it. And so, yeah, for years people were really focused on getting better at zero shot and few shot and it wasn't until, you know, this key idea of like, well, let's take the ULM fit approach, but for step two, rather than fine tuning on a kind of a domain corpus, let's fine tune on an instruction corpus. And then in step three, rather than fine tuning on a reasonably specific task classification, let's fine tune on an RLHF task classification. And so that was really, that was really key, you know, so I was kind of like out of the NLP field for a few years there because yeah, it just felt like, I don't know, pushing uphill against this vast tide, which I was convinced was not the right direction, but who's going to listen to me, you know, cause I, as you said, I don't have a PhD, not at a university, or at least I wasn't then. I don't have a big set of computers to fine tune huge transformer models. So yeah, it was definitely difficult. It's always been hard. You know, it's always been hard. Like I've always been somebody who does not want to build stuff on lots of big computers because most people don't have lots of big computers and I hate creating stuff that most people can't use, you know, and also stuff that's created on lots of big computers has always been like much more media friendly. So like, it might seem like a recent thing, but actually throughout my 30 years in data science, the attention's always been on, you know, the big iron results. So when I first started, everybody was talking about data warehouses and it was all about Teradata and it'd be like, oh, this big bank has this huge room full of computers and they have like terabytes of data available, you know, at the press of a button. And yeah, that's always what people want to talk about, what people want to write about. And then of course, students coming out of their PhDs and stuff, that's where they want to go work because that's where they read about. And to me, it's a huge distraction, you know, because like I say, most people don't have unlimited compute and I want to help most people, not the small subset of the most well-off people. [00:22:16]Alessio: That's awesome. And it's great to hear, you do such a great job educating that a lot of times you're not telling your own story, you know? So I love this conversation. And the other thing before we jump into Fast.AI, actually, a lot of people that I know, they run across a new architecture and whatnot, they're like, I got to start a company and raise a bunch of money and do all of this stuff. And instead, you were like, I want everybody to have access to this. Why was that the case for you? Was it because you already had a successful venture in like FastMail and you were more interested in that? What was the reasoning? [00:22:52]Jeremy: It's a really good question. So I guess the answer is yes, that's the reason why. So when I was a teenager, I thought it would be really cool to like have my own company. You know, I didn't know the word startup. I didn't know the word entrepreneur. I didn't know the word VC.
And I didn't really know what any of those things were really until after we started Kaggle, to be honest. Even the ones that went on to be what we now call startups, I just thought they were just small businesses. You know, they were just companies. So yeah, so those two companies were FastMail and Optimal Decisions. FastMail was the first kind of synchronized email provider for non-businesses. So something where you can get your same email at home, on your laptop, at work, on your phone, whatever. And then Optimal Decisions invented a new approach to insurance pricing. Something called profit-optimized insurance pricing. So I sold both of those companies, you know, after 10 years. And at that point, I had achieved the thing that as a teenager I had wanted to do. You know, it took a lot longer than it should have because I spent way longer in management consulting than I should have because I got caught up in that stupid rat race. But, you know, eventually I got there and I remember my mom saying to me, you must be so proud. You know, because she remembered my dream. She's like, you've done it. And I kind of reflected and I was like, I'm not proud at all. You know, like people quite liked FastMail. You know, it's quite nice to have synchronized email. It probably would have happened anyway. Yeah, I'm certainly not proud that I've helped some insurance companies suck more money out of their customers. Yeah, no, I'm not proud. You know, it's actually, I haven't really helped the world very much. You know, maybe in the insurance case I've made it a little bit worse. I don't know. So, yeah, I was determined to not waste more years of my life doing things, working hard to do things which I could not be reasonably sure would have a lot of value. So, you know, I took some time off. I wasn't sure if I'd ever work again, actually. I didn't particularly want to, because it felt like, yeah, it felt like such a disappointment. And, but, you know, and I didn't need to. I had enough money. Like, I wasn't super rich, but I had enough money. I didn't need to work. And I certainly recognized that amongst the other people I knew who had enough money that they didn't need to work, they all worked ridiculously hard, you know, and constantly put themselves in extremely stressful situations. And I thought, I don't want to be one of those idiots who's tied to, you know, buying a bigger plane than the next guy or whatever. You know, Kaggle came along and I mainly kind of did that just because it was fun and interesting to hang out with interesting people. But, you know, with Fast.ai in particular, you know, Rachel and I had a very explicit, you know, long series of conversations over a long period of time about like, well, how can we be the most helpful to society as a whole, and particularly to those people who maybe need more help, you know? And so we definitely saw the world going in a potentially pretty dystopian direction if the world's most powerful technology was controlled by a small group of elites. So we thought, yeah, we should focus on trying to help that not happen. You know, sadly, it looks like it still is likely to happen. But I mean, I feel like we've helped make it a little bit less likely. So we've done our bit. [00:26:39]Swyx: You've shown that it's possible. And I think your constant advocacy, your courses, your research that you publish, you know, just the other day you published a finding on, you know, learning that I think is still something that people are still talking about quite a lot.
I think that that is the origin story of a lot of people who are going to be, you know, little Jeremy Howards, furthering your mission. You don't have to do everything by yourself is what I'm saying. No, definitely. Definitely. [00:27:10]Jeremy: You know, that was a big takeaway from Enlitic. At Enlitic, it definitely felt like we had to do everything ourselves. And I wanted to solve medicine. And, yeah, okay, solving medicine is actually quite difficult, and I can't do it on my own. And there's a lot of other things I'd like to solve, and I can't do those either. So that was definitely the other piece: like, yeah, you know, can we create an army of passionate domain experts who can change their little part of the world? And that's definitely happened. Like, I find nowadays, at least half the time, probably quite a bit more, that I get in contact with somebody who's done really interesting work in some domain, most of the time they say, yeah, I got my start with fast.ai. So I can definitely see that. And I also know from talking to folks at places like Amazon and Adobe and stuff, which, you know, there's lots of alumni there, and they say, oh my God, I got here, and, like, half of the people are fast.ai alumni. So it's fantastic. [00:28:13]Swyx: Yeah. [00:28:14]Jeremy: Actually, Andrej Karpathy grabbed me when I saw him at NeurIPS a few years ago. And he was like, I have to tell you, thanks for the fast.ai courses. When people come to Tesla and they need to know more about deep learning, we always send them to your course. And the OpenAI Scholars Program was doing the same thing. So it's kind of like, yeah, it's had a surprising impact, you know. And the course is just one of, like, three things we do. [00:28:40]Swyx: Yes. [00:28:40]Jeremy: And it's only ever been at most two people, either me and Rachel or me and Sylvain; nowadays, it's just me. So yeah, I think it shows you don't necessarily need a huge amount of money and a huge team of people to make an impact. [00:28:56]Swyx: Yeah. So just to reintroduce fast.ai for people who may not have dived into it much: there are the courses that you do. There is the library, which is very well loved, and which I kind of think of as a nicer layer on top of PyTorch that people should start with by default, and which you use as the basis for a lot of your courses. And then you have, like, nbdev, which, I don't know, is that the third one? [00:29:27]Jeremy: Oh, so the three areas were research, software, and courses. [00:29:32]Swyx: Oh, sorry. [00:29:32]Jeremy: So then in software, you know, fast.ai is the main thing, but nbdev is not far behind. But then there's also things like fastcore, GHAPI... I mean, dozens of open-source projects that I've created, and some of them have been pretty popular, and some of them are still a little bit hidden, actually. Some of them I should try to do a better job of telling people about. [00:30:01]Swyx: What are you thinking of? [00:30:04]Jeremy: Oh, I don't know, just little things. Like, for example, for working with EC2 and AWS, I created a FastEC2 library, which I think is way more convenient and nice to use than anything else out there. And it's literally got dynamic autocomplete that works both on the command line and in notebooks, that'll auto-complete your instance names and everything like that. You know, just little things like that.
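To make that concrete, here is a minimal sketch of the kind of spec-generated, autocomplete-friendly wrapper Jeremy is describing, using his GHAPI library (which he expands on next). This assumes ghapi is installed (pip install ghapi); the owner/repo values are just examples, and unauthenticated calls work for public data but are rate-limited:

```python
# A minimal sketch using ghapi, Jeremy's GitHub API wrapper. Every operation
# is generated from GitHub's official OpenAPI spec, so the whole REST surface
# is available with tab completion and inline docs.
from ghapi.all import GhApi

api = GhApi()  # no token: fine for public data, but rate-limited

# GET /repos/{owner}/{repo} -- the owner/repo here are just examples
repo = api.repos.get(owner="fastai", repo="ghapi")
print(repo.full_name, repo.stargazers_count)

# The same pattern works for any endpoint in the spec
for issue in api.issues.list_for_repo(owner="fastai", repo="ghapi", state="open"):
    print(issue.number, issue.title)
```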
When I work with some domain, I try to make it as enjoyable as possible for me to do that. So, like, with GHAPI, for example: I think the GitHub API is incredibly powerful, but I didn't find it good to work with, because I didn't particularly like the libraries that are out there. So GHAPI, like FastEC2, autocompletes, both at the command line or in a notebook or whatever, literally the entire GitHub API. The entire thing is, I think, less than 100K of code, because it's actually, as far as I know, the only one that grabs it directly from the official OpenAPI spec that GitHub produces. And if you're in a notebook and you just autocomplete an API method and hit enter, it prints out brief docs and then gives you a link to the actual documentation page. You know, GitHub Actions, I can now write in Python, which is just so much easier than writing them in TypeScript and stuff. So, you know, just little things like that. [00:31:40]Swyx: I think that's an approach I wish more developers took, to publish some of their work along the way. You described the third arm of fast.ai as research. It's not something I see often. Obviously, you do do some research. How do you run your research? What are your research interests? [00:31:59]Jeremy: Yeah, so research is what I spend the vast majority of my time on. And the artifacts that come out of that are largely software and courses. You know, so to me, the main artifact shouldn't be papers, because papers are things read by a small, exclusive group of people. You know, to me, the main artifacts should be, like, something teaching people: here's how to use this insight, and here's software you can use that builds it in. So I think I've only ever done three first-author papers in my life, you know, and none of those are ones I wanted to do. You know, they were all ones that... so, one was ULMFiT, where Sebastian Ruder reached out to me after seeing the course and said, like, you have to publish this as a paper, you know. And he said, I'll write it. He said, I want to write it because if I do, I can put it in my PhD, and that would be great. And it's like, okay, well, I want to help you with your PhD. And that sounds great. So, like, you know, one was the masks paper, which just had to exist, and nobody else was writing it. And then the third was the fast.ai library paper, which, again, somebody reached out and said, please, please write this. We will waive the fee for the journal and everything, and actually help you get it through publishing and stuff. So yeah, other than that, I've never written a first-author paper. So the research is like... well, for example, you know, DAWNBench was a competition which Stanford ran a few years ago. It was kind of the first big competition of who can train neural nets the fastest rather than the most accurate. And specifically, it was who can train ImageNet the fastest. And again, this was one of these things where it was created by necessity. So Google had just released their TPUs. And so I heard from my friends at Google that they had put together this big team to smash DAWNBench, so that they could prove to people that they had to use Google Cloud and use their TPUs and show how good their TPUs were. And we kind of thought, oh s**t, this would be a disaster if they do that, because then everybody's going to be like, oh, deep learning is not accessible.
[00:34:20]Swyx: You know, to actually be good at it, [00:34:21]Jeremy: you have to be Google and you have to use special silicon. And so, you know, we only found out about this 10 days before the competition finished. But, you know, we basically got together an emergency bunch of our students, and Rachel and I, and sat for the next 10 days and just tried to crunch through and use all of our best ideas that had come from our research. And so particularly progressive resizing: just basically train mainly on small things, train on non-square things, you know, stuff like that. And so, yeah, we ended up winning, thank God. And so, you know, we turned it around from being like, oh s**t, this is going to show that you have to be Google and have TPUs, to being like, oh my God, even the little guy can do deep learning. So that's an example of the kind of research artifacts we do. And yeah, so all of my research is always: how do we do more with less, you know? So how do we get better results with less data, with less compute, with less complexity, with less education, you know, stuff like that. So ULMFiT's obviously a good example of that. [00:35:37]Swyx: And most recently you published "Can LLMs learn from a single example?" Maybe could you tell the story a little bit behind that? And maybe that goes a little bit into the very-low-resource learning literature. [00:35:52]Jeremy: Yeah, yeah. So me and my friend Jono Whitaker basically had been playing around with this fun Kaggle competition, which is actually still running as we speak, which is: can you create a model which can answer multiple-choice questions about anything that's in Wikipedia? And the thing that makes it interesting is that your model has to run on Kaggle within nine hours. And Kaggle's very, very limited. So you've only got 14 gig of RAM, only two CPUs, and a small, very old GPU. So this is cool, you know: if you can do well at this, then this is a good example of, oh, you can do more with less. So yeah, Jono and I were playing around with fine-tuning, of course, transfer learning, pre-trained language models. And we saw this... so we always, you know, plot our losses as we go. So here's another thing we created, actually: Sylvain Gugger, when he worked with us, created something called fastprogress, which is kind of like tqdm, but we think a lot better. So we look at our fastprogress curves, and they kind of go down, down, down, down, down, down, down, a little bit, little bit, little bit, and then suddenly go clunk, and they drop. And then down, down, down, down, down a little bit, and then suddenly clunk, they drop. We're like, what the hell? These clunks are occurring at the end of each epoch. So normally in deep learning, this would be... you know, I've seen this before. It's always been a bug. It's always turned out that, like, oh, we accidentally forgot to turn on eval mode during the validation set, so it was actually learning then; or, oh, we were accidentally calculating moving-average statistics throughout the epoch, so the displayed loss resets at each epoch boundary, or whatever. And so we were using the Hugging Face Trainer. So, you know, I did not give my friends at Hugging Face the benefit of the doubt. I thought, oh, they've f**ked up the Hugging Face Trainer, you know, idiots. Well, we'll use the fast.ai trainer instead. So we switched over to Learner.
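As an aside, the "eval mode" bug Jeremy mentions is worth seeing in code. A minimal PyTorch sketch (the model, loader, and loss_fn here are hypothetical placeholders, not anything from the episode):

```python
import torch

def validate(model, loader, loss_fn, device="cpu"):
    """Validation loop showing the classic mistake Jeremy alludes to: if you
    forget model.eval(), dropout stays active and batch norm keeps updating
    its running statistics, so the model is effectively still changing during
    validation and the curves do strange things at epoch boundaries."""
    model.eval()              # the line that's easy to forget
    total, n = 0.0, 0
    with torch.no_grad():     # no gradients needed for validation
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            total += loss_fn(model(xb), yb).item() * len(xb)
            n += len(xb)
    model.train()             # switch back to training mode for the next epoch
    return total / n
```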
We still saw the clunks, and, you know, that shouldn't really happen, because semantically speaking, an epoch boundary isn't a thing. Nothing is meant to happen when you go from ending one epoch to starting the next one. So there shouldn't be a clunk, you know. So I asked around on the open-source Discords, like, what's going on here? And everybody was just like, oh, that's just what these training curves look like. They all look like that. Don't worry about it. And I was like, oh, are you all using Trainer? Yes. Oh, well, there must be some bug with Trainer. And I was like, well, we also saw it in Learner, [00:38:42]Swyx: and somebody else is like, [00:38:42]Jeremy: no, we've got our own trainer. We see it as well. They're just like, don't worry about it. It's just something we see. It's just normal. [00:38:48]Swyx: I can't do that. [00:38:49]Jeremy: I can't just be like, here's something that in the previous 30 years of neural networks nobody ever saw, and now suddenly we see it, [00:38:57]Swyx: so don't worry about it. [00:38:59]Jeremy: I just... I have to know why. [00:39:01]Swyx: Can I clarify? Was everyone that you were talking to seeing it on the same dataset, or on different datasets? [00:39:08]Jeremy: Different datasets, different trainers. They're just like, no, this is just what it looks like when you fine-tune language models. Don't worry about it. You know, I hadn't seen it before, but, as I say, I kept working on them for a couple of years after ULMFiT, and then I kind of moved on to other things, partly out of frustration. So I hadn't been fine-tuning... you know, I mean, Llama's only been out for a few months, right? But I wasn't one of those people who jumped straight into it, you know? So I was relatively new to the kind of Llama fine-tuning world, whereas these guys had been, you know, doing it since day one. [00:39:49]Swyx: It was only a few months ago, [00:39:51]Jeremy: but it's still quite a bit of time. So, yeah, they're just like, no, this is what we all see. [00:39:56]Swyx: Don't worry about it. [00:39:56]Jeremy: So yeah, I've just got this brain where I have to know why things are the way they are. And so I asked people, like, well, why do you think it's happening? And they'd be like, oh, pretty obviously, because it's memorized the data set. And I was like, that can't be right. It's only seen it once. Like, look at this: the loss has dropped by 0.3. 0.3! Which is like, basically, it knows the answer. And they were like, no, no, it's just... it has just memorized the data set. So yeah. So look, Jono and I did not discover this, and Jono and I did not come up with the hypothesis. You know, I guess we were just the ones who had been around for long enough to recognize that, like, this isn't how it's meant to work. And so we went back and, like, okay, let's just run some experiments, you know, because nobody seems to have actually published anything about this. [00:40:51]Well, not quite true. Some people had published things, but nobody ever actually stepped back and said, like, what the hell, you know? How can this be possible? Is it possible? Is this what's happening? And so, yeah, we created a bunch of experiments where we basically predicted ahead of time.
It's like, okay, if this hypothesis is correct, that it's memorized the training set, then we ought to see blah under these conditions, but not under those conditions. And so we ran a bunch of experiments, and all of them supported the hypothesis that it was memorizing the data set after seeing each example a single time. And it's a pretty big data set, you know. Which, in hindsight, is not totally surprising, because remember, the theory behind ULMFiT was: well, it's kind of creating all these latent capabilities to make it easier for it to predict the next token. So if it's got all this latent capability, it ought to also be really good at compressing new tokens, because it can immediately recognize them as, like, oh, that's just a version of this. So it's not so crazy, you know. But it does require us to rethink everything, because, like, nobody knows, okay, so how do we fine-tune these things? Because, like, maybe it doesn't even matter. Like, maybe it's fine. Maybe it's fine that it's memorized the data set after one go, and you do a second go, and okay, the validation loss is terrible because it's now really overconfident. [00:42:20]Swyx: That's fine. [00:42:22]Jeremy: You know, I keep telling people: don't track validation loss, track validation accuracy, because at least that will still be useful. Just another thing that's got lost since ULMFiT: nobody tracks the accuracy of language models anymore. But, you know, it'll still keep learning, and it does, it does keep improving. But is it learning worse? Like, now that it's kind of memorized the data, it's probably getting a less strong signal, you know? I don't know. So I still don't know how to fine-tune language models properly, and I haven't found anybody who feels like they do. Like, nobody really knows whether this memorization thing is... it's probably a feature in some ways. There are probably things you can do usefully with it. And, yeah, I have a feeling it's messing up training dynamics as well. [00:43:13]Swyx: And does it come at the cost of catastrophic forgetting as well, right? Like, which is the other side of the coin. [00:43:18]Jeremy: It does to some extent. Like, we know it does. Look at Code Llama, for example. So Code Llama was, I think, a 500-billion-token fine-tuning of Llama 2 that Meta did, using code, and also prose about code. And honestly, they kind of blew it, because Code Llama is good at coding, but it's bad at everything else, you know, and it used to be good. Yeah, and I was pretty sure this was coming. Before they released it, me and lots of people in the open-source Discords were like, oh my God, you know, we know this is coming; Yann LeCun was saying it's coming. I hope they kept at least, like, 50% non-code data, because otherwise it's going to forget everything else. And they didn't; only, like, 0.3% of their epochs were non-code data. So it did, it forgot everything else. So now it's good at code, and it's bad at everything else. So we definitely have catastrophic forgetting. It's fixable; somebody just has to spend their time training a model on a good mix of data. Like... so, okay, so here's the thing. Even though I originally created the three-step approach that everybody now does, my view is it's actually wrong, and we shouldn't use it. [00:44:36]Jeremy: And that's because people are using it in a way different to why I created it. You know, I created it thinking the task-specific models would be more specific.
You know, it's like, oh, this is a sentiment classifier, as an example of a task, you know. But the tasks now are, like, you know, RLHF, which is basically: answer questions in a way that makes people feel happy about your answer. So that's a much more general task, and it's a really cool approach. And so we see, for example, that RLHF also breaks models. Like, you know, with GPT-4 after RLHF, we know from the work that Microsoft did that the earlier, less-aligned version was better. And these are all kind of examples of catastrophic forgetting. And so to me, the right way to fine-tune language models is to actually throw away the idea of fine-tuning. There's no such thing. There's only continued pre-training. And pre-training is something where, from the very start, you try to include all the kinds of data that you care about, all the kinds of problems that you care about: instructions, exercises, code, general-purpose document completion, whatever. And then as you train, you gradually curate that; you know, you gradually make it higher and higher quality and more and more specific to the kinds of tasks you want it to do. But you never throw away any data. You always keep all of the data types there in reasonably high quantities. You know, maybe with a quality filter you stop training on low-quality data, because it's probably fine to forget how to write badly, maybe. So yeah, that's now my view: I think ULMFiT is the wrong approach. And that's why we're seeing a lot of these, you know, so-called alignment taxes, and this view of, like, oh, a model can't both code and do other things. And, you know, I think it's actually because people are training them wrong. [00:46:47]Swyx: Yeah, well, I think you have a clear [00:46:51]Alessio: anti-laziness approach. I think other people are not as good-hearted, you know. They're like, [00:46:57]Swyx: hey, they told me this thing works. [00:46:59]Alessio: And if I release a model this way, people will appreciate it, I'll get promoted, and I'll kind of make more money. [00:47:06]Jeremy: Yeah, and it's not just money. It's also how citations work, sadly. You know, if you want to get cited, you need to write a paper that people in your field recognize as an advancement on things that we know are good. And so we've seen this happen again and again. So, like I say, with zero-shot and few-shot learning, everybody was writing about that. Or, you know, with image generation, everybody was just writing about GANs, you know, and I was trying to say, like, no, GANs are not the right approach. You know, and I showed, again through research that we demonstrated in our videos, that you can do better than GANs, much faster and with much less data. And nobody cared, because, again, if you want to get published, you write a GAN paper that slightly improves this one part of GANs, in this tiny field, and you'll get published, you know. So it's, yeah, it's not set up for real innovation. Again, it's really helpful for me, you know: I have my own research lab with nobody telling me what to do, and I don't even publish, so it doesn't matter if I get citations. And so I just write about what I think actually matters. And, you know, at places like OpenAI, the researchers there can do that as well. It's a shame; I wish there were more academic, open venues in which people can focus on genuine innovation.
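A toy sketch of what Jeremy's "continued pre-training" prescription might look like in practice: a weighted sampler over domain corpora, where the weights get curated over time but no domain ever drops to zero. The corpora, weights, and batch size here are all made-up placeholders, not anything from fast.ai:

```python
import random

# Toy stand-ins for domain corpora; in a real run these would be token streams.
corpora = {
    "code":         ["example code document %d" % i for i in range(100)],
    "instructions": ["example instruction/response pair %d" % i for i in range(100)],
    "web":          ["example general web document %d" % i for i in range(100)],
}

# Mixture weights: curated as training progresses (e.g., shifted toward the
# target tasks, or filtered for quality), but never zeroed out, which is the
# point Jeremy makes about avoiding catastrophic forgetting.
weights = {"code": 0.3, "instructions": 0.3, "web": 0.4}

def sample_batch(batch_size):
    """Draw a batch whose domain composition follows the current mixture weights."""
    domains = random.choices(
        population=list(corpora),
        weights=[weights[d] for d in corpora],
        k=batch_size,
    )
    return [random.choice(corpora[d]) for d in domains]

print(sample_batch(8))
```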
[00:48:38]Swyx: Twitter, which unironically has become a little bit of that forum. I wanted to follow up on one thing that you mentioned, which is that you asked around on the open-source Discords. I don't know if it's too pushy to ask, like, which Discords are lively or useful right now. I think something I definitely felt like I missed out on was the early days of EleutherAI, which is a very hard thing to recreate. And, you know, like, what is the new Eleuther? And you actually shouted out the Alignment Lab AI Discord in your blog post. And that was the first time I even knew about it; like, I saw them on Twitter, never knew they had a Discord, never knew that there were actually substantive discussions going on in there, and that you were an active member of it. Okay, yeah. [00:49:23]Jeremy: And then even then, if you do know about that and you go there, it'll look like it's totally dead. And that's because, unfortunately, in nearly all the Discords, nearly all of the conversation happens in private channels. You know, and that's, I guess... [00:49:35]Swyx: How does someone get into that world? Because it's obviously very, very instructive, right? [00:49:42]Jeremy: You could just come to the fast.ai Discord, which, I'll be honest with you, is less bustling than some of the others, but it's not terrible. Although, to be fair, one of our more bustling channels is private. [00:49:57]Swyx: I guess. [00:49:59]Jeremy: So I'm just thinking. [00:50:01]Swyx: It's just the nature of quality discussion, right? Yeah, I guess when I think about it, [00:50:05]Jeremy: I didn't have any private discussions on our Discord for years, but there were a lot of people who came in with, like, oh, I just had this amazing idea for AGI: if you just imagine that AI is a brain, then... you know. And you don't want to be dismissive or whatever, so it's like, oh, well, that's an interesting comment, but maybe you should, like, try training some models first to see if that aligns with your intuition. Like, oh, but how could I possibly learn? It's like, well, we have a course; just actually spend time learning. Like, you know, anyway. And it's like, okay, I know the people who always have good answers there. And so I created a private channel and put them all in it. And I've got to admit, that's where I post more often, because there are much fewer, you know, flights of fancy about how we could solve AGI, blah, blah, blah. So there is a bit of that. But having said that, I think the bar is pretty low. Like, if you join a Discord and you hit the participants or community or whatever button, you can see who's in it. And then you'll see at the top who the admins or moderators or people in the dev role are. And just DM one of them and say, like, oh, here's my GitHub, here's some blog posts I wrote, you know, I'm interested in talking about this; can I join the private channels? And I've never heard of anybody saying no. I will say, you know, Eleuther's all pretty open, so you can do the EleutherAI Discord still. You know, one problem with the EleutherAI Discord is it's been going on for so long that it's very inside baseball. It's quite hard to get started. Yeah. CarperAI, I think, is all open. That's the Stability one. That's more accessible. [00:52:03]Swyx: Yeah.
[00:52:04]Jeremy: There's also, just recently, Nous Research, which does the Hermes models and datasets; that one just opened up. They've got some private channels, but it's pretty open, I think. You mentioned Alignment Lab; in that one, all the interesting stuff is on private channels. So just ask. If you know me, ask me, because I've got admin on that one. There's also, yeah, OS Skunkworks; the OS Skunkworks AI Discord is good, and I think it's open. So yeah, they're all pretty good. [00:52:40]Swyx: I don't want you to leak any, you know, Discords that don't want any publicity, but this is all helpful. [00:52:46]Jeremy: We all want people... like, we all want people [00:52:49]Swyx: We just want people who, like, [00:52:51]Jeremy: want to build stuff, rather than people who... and, like, it's fine to not know anything as well, but if you don't know anything and you want to tell everybody else what to do and how to do it, that's annoying. If you don't know anything and want to be told, like, here's a really small kind of task that, as somebody who doesn't know anything, is going to take you a really long time to do, but it would still be helpful, and then you go and do it, that would be great. The truth is, yeah, [00:53:19]Swyx: like, I don't know, [00:53:20]Jeremy: maybe 5% of people come in with great enthusiasm, saying that they want to learn and they'll do anything. [00:53:25]Swyx: And then somebody says, like, [00:53:25]Jeremy: okay, here's some work you can do. Almost nobody does that work. So if you're somebody who actually does the work and follows up, you will massively stand out. That's an extreme rarity. And everybody will then want to help you do more work. [00:53:41]Swyx: So yeah. [00:53:41]Jeremy: So just, yeah, just do work, and people will want to support you. [00:53:47]Alessio: Our Discord used to be referral-only for a long time. We didn't have a public invite, and then we opened it with kind of, like, channel gating. Yeah. A lot of people just want to... I remember what it used to be like being, you know, a forum moderator. [00:54:00]Swyx: It's like people just want to do [00:54:01]Alessio: like, drive-by posting, [00:54:03]Swyx: you know, and like, [00:54:03]Alessio: they don't want to help the community. They just want to get their question answered. [00:54:07]Jeremy: I mean, the funny thing is, our forum community does not have any of that garbage. You know, there's something specific about the low-latency thing, where people expect an instant answer. Whereas somehow, in a forum thread, they know it's there forever, so people are a bit more thoughtful. But then the forums are less active than they used to be, because Discord has got more popular, you know? So it's all a bit of a compromise. You know, running a healthy community is always a bit of a challenge. [00:54:47]Alessio: All right, we've got so many more things we want to dive into, but I don't want to keep you here for hours. [00:54:50]Swyx: This is not the Lex Fridman podcast, [00:54:52]Alessio: as we always like to say. One topic I would love to maybe chat a bit about is Mojo, Modular, you know, Chris Lattner, who many of you know from the podcast. So we want to spend a little time there. You recently did a hacker's guide to language models, and you ran through everything from quantized models to, like, smaller models, larger models, and all of that. But obviously, Modular is taking its own approach. Yeah, what got you excited?
I know you and Chris have been talking about this for, like, years, and a lot of the ideas you had, so... [00:55:23]Jeremy: Yeah, yeah, yeah, yeah, no, absolutely. So I met Chris, I think it was at the first TensorFlow Dev Summit. And I don't think he had even, like... I'm not sure if he'd even officially started his employment with Google at that point. So, I don't know, you know, certainly nothing had been mentioned. So I, you know, I admired him from afar, with LLVM and Swift and whatever. And so I saw him walk into the courtyard at Google, and it's just like, oh s**t, man, that's Chris Lattner. I wonder if he would lower his standards enough to talk to me. Well, it's worth a try. So I plucked up my courage, because, like, nobody was talking to him, he looked a bit lost, and I wandered over, and it's like, oh, you're Chris Lattner, right? It's like, what are you doing here? What are you doing here? And I was like, yeah, yeah, yeah. It's like, oh, I'm Jeremy Howard. It's like, oh, do you do some of this AI stuff? And I was like, yeah, yeah, I like this AI stuff. Are you doing AI stuff? It's like, well, I'm thinking about starting to do some AI stuff. Yeah, I think it's going to be cool. And it's like, wow. So, like, I spent the next half hour just basically brain-dumping all the ways in which AI was stupid to him, and he listened patiently. And I thought he probably wouldn't even remember, or care, or whatever. But yeah, then I kind of re-caught up with him a few months later, and it's like, I've been thinking about everything you said in that conversation. And he, like, narrated back his response to every part of it, projects he was planning to do. And it's just like, oh, this dude follows up. Holy s**t. And I was like, wow, okay. And he was like, yeah, so we're going to create this new thing called Swift for TensorFlow. And it's going to be, like... it's going to be a compiler with auto-differentiation built in, and blah, blah, blah. And I was like, why would that help? [00:57:10]Swyx: You know, why would you? [00:57:10]Jeremy: And he was like, okay, with a compiler, during the forward pass you don't have to worry about saving context, you know, because a lot of it can be optimized in the backward pass. And I was like, oh my God. Because I didn't really know much about compilers. You know, I knew enough to kind of understand the ideas, but it hadn't occurred to me that a compiler basically solves a lot of the problems we have as end users. I was like, wow, that's amazing. Okay, but you do know, right, that nobody's going to use this unless it's, like, usable? It's like, yeah, I know, right. So I was thinking you should create, like, a fast.ai for this. And I was like, okay, but I don't even know Swift. And he was like, well, why don't you start learning it? And if you have any questions, ask me. It's just like, holy s**t. Like, not only has Chris Lattner lowered his standards enough to talk to me, but he's offering me personal tutoring on the programming language that he made. So I was just like, I'm not g
theCUBE hosts Rob Strechay and Rebecca Knight wrap up our coverage of Teradata Possible 2023
Tony Baer, Principal at dbInsight, joins Corey on Screaming in the Cloud to discuss his definition of what is and isn't a database, and the trends he's seeing in the industry. Tony explains why it's important to try and have an outsider's perspective when evaluating new ideas, and the growing awareness of the impact data has on our daily lives. Corey and Tony discuss the importance of working towards true operational simplicity in the cloud, and Tony also shares why explainability in generative AI is so crucial as the technology advances. About Tony: Tony Baer, the founder and CEO of dbInsight, is a recognized industry expert in extending data management practices, governance, and advanced analytics to address the desire of enterprises to generate meaningful value from data-driven transformation. His combined expertise in both legacy database technologies and emerging cloud and analytics technologies shapes how clients go to market in an industry undergoing significant transformation. During his 10 years as a principal analyst at Ovum, he established successful research practices in the firm's fastest-growing categories, including big data, cloud data management, and product lifecycle management. He advised Ovum clients regarding product roadmap, positioning, and messaging, and helped them understand how to evolve data management and analytic strategies as the cloud, big data, and AI moved the goalposts. Baer was one of Ovum's most heavily billed analysts and provided strategic counsel to enterprises spanning the Fortune 100 to fast-growing privately held companies. With the cloud transforming the competitive landscape for database and analytics providers, Baer led deep-dive research on the data platform portfolios of AWS, Microsoft Azure, and Google Cloud, and on how cloud transformation changed the roadmaps for incumbents such as Oracle, IBM, SAP, and Teradata. While at Ovum, he originated the term "Fast Data," which has since become synonymous with real-time streaming analytics. Baer's thought leadership and broad market influence in big data and analytics have been formally recognized on numerous occasions. Analytics Insight named him one of the 2019 Top 100 Artificial Intelligence and Big Data Influencers. Previous citations include Onalytica, which named Baer one of the world's Top 20 thought leaders and influencers on Data Science; Analytics Week, which named him one of 200 top thought leaders in Big Data and Analytics; and KDnuggets, which listed Baer as one of the Top 12 data analytics thought leaders on Twitter. While at Ovum, Baer was the firm's most visible and publicly quoted IT analyst, and was cited by Ovum's parent company Informa as a Brand Ambassador in 2017. In raw numbers, Baer has 14,000 followers on Twitter, and his ZDNet "Big on Data" posts are read 20,000 to 30,000 times monthly. He is also a frequent speaker at industry conferences such as Strata Data and Spark Summit. Links Referenced: dbInsight: https://dbinsight.io/ Transcript. Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: This episode is brought to us in part by our friends at Red Hat. As your organization grows, so does the complexity of your IT resources.
You need a flexible solution that lets you deploy, manage, and scale workloads throughout your entire ecosystem. The Red Hat Ansible Automation Platform simplifies the management of applications and services across your hybrid infrastructure with one platform. Look for it on the AWS Marketplace. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Back in my early formative years, I was an SRE sysadmin type, and one of the areas I always avoided was databases, or frankly, anything stateful, because I am clumsy and unlucky, and that's a bad combination to bring within spitting distance of anything that, you know, can't be spun back up intact, like databases. So, as a result, I tend not to have spent a lot of time historically living in that world. It's time to expand horizons and think about this a little bit differently. My guest today is Tony Baer, principal at dbInsight. Tony, thank you for joining me. Tony: Oh, Corey, thanks for having me. And by the way, we'll try and basically knock down your primal fear of databases today. That's my mission. Corey: We're going to instill new fears in you. Because I was looking through a lot of your work over the years, and the criticism I have (and always the best place to deliver criticism is massively in public) is that you take a very conservative, stodgy approach to defining a database, whereas I'm on the opposite side of the world. I contain information. You can ask me about it, which we'll call querying. That's right. I'm a database. But I've never yet found myself listed in any of your analyses around various database options. So, what is your definition of databases these days? Where do they start and stop? Tony: Oh, gosh. Corey: Because anything can be a database if you hold it wrong. Tony: [laugh]. I think one of the last things I've ever been called is conservative and stodgy, so this is certainly a way to basically put the thumbtack on my chair. Corey: Exactly. I'm trying to normalize my own brand of lunacy, so we'll see how it goes. Tony: Exactly, because that's the role I normally play with my clients. So, now the shoe is on the other foot. What I view a database as is basically a managed collection of data, and it's managed to the point where essentially, a database should be transactional—in other words, when I put some data in, I should get some positive confirmation back—and I should hopefully, depending on the type of database, have some sort of guidelines or schema or model for how I structure the data. So, I mean, database, you know, even though you keep hearing about unstructured data, the fact is— Corey: Schemaless databases and data stores. Yeah, it was all the rage for a few years. Tony: Yeah, except that they all have schemas; it's just that those schemaless databases have very variable schemas. They still have schemas. Corey: A question that I have is you obviously think deeply about these things, which should not come as a surprise to anyone. It's like, "Well, this is where I spend my entire career. Imagine that. I might think about the problem space a little bit." But you have, to my understanding, never worked with databases in anger yourself.
You don't have a history as a DBA or as an engineer— Tony: No. Corey: —but what I find very odd is that, unlike a whole bunch of other analysts whom I'm not going to name (people know who I'm talking about regardless), you bring actual insights into this that I find useful and compelling, instead of reverting to the mean of: well, I don't actually understand how any of these things work in reality, so I'm just going to believe whoever sounds the most confident when I ask a bunch of people about these things. Are you just asking the right people who also happen to sound confident? How do you get away from that very common analyst trap? Tony: Well, a couple of things. One is I purposely play the role of outside observer. In other words, the idea is that if an idea is supposed to stand on its own legs, it has to make sense. If I'd been working inside the industry, I might take too many things for granted. And a good example of this goes back, actually, to my early days—actually, this goes back to my freshman year in college, where I was taking an organic chem course for non-majors, and it was taught as a logic course, not as a memorization course. And we were given the option at the end of the term to either take a final or do a paper. So, of course, me being a writer, I thought, I can BS my way through this. But what I found—and this is what fascinated me—is that as long as certain technical terms were defined for me, I found a logic to the way things work. And so, that really informs how I approach databases, how I approach technology today: I look at the logic of how things work. That being said, in order for me to understand that, I need to know twice as much as the next guy to be able to speak to it, because I just don't do this in my sleep. Corey: That goes a big step toward, I guess, addressing a lot of these things, but it also feels like—and maybe this is just me paying closer attention—the world of databases and data and analytics has really coalesced, or emerged, in a very different way over the past decade-ish. It used to be, at least from my perspective, that all the data we store, that's a storage admin problem. And that was about managing NetApps and SANs and the rest. And then you had the database side of it, which, functionally, from the storage side of the world, was just a big file or series of files that are the backing store for the database. And okay, there's not a lot of cross-communication going on there. Then, with the rise of object store, it started being a little bit different. And even the way that everyone is talking about getting meaning from data has really seemed to be evolving at an incredibly intense clip lately. Is that an accurate perception, or have I just been asleep at the wheel for a while and finally woken up? Tony: No, I think you're onto something there. And the reason is that, one, data is touching us all around, and the fact is, you can see it in the same way that all of a sudden people know how to spell AI. They may not know what it means, but there is an awareness that the data we work with, the data that is about us, follows us. And with the cloud—well, I should say not just with the cloud but with smart mobile devices, we'll blame those—we are all each founts of data, and rich founts of data.
And people in all walks of life, not just in the industry, are now becoming aware of it, and there's a lot of concern about: can we have any control, any ownership, over the data that should be ours? So, I think that phenomenon has also happened in the enterprise, where we used to think that the data was the DBAs' issue; it's become the app developers' issue, it's become the business analysts' issue. Because the answers that we get, we're ultimately accountable for. It all comes from the data. Corey: It also feels like there's this idea of databases themselves becoming more contextually aware of the data contained within them. Originally, this used to be in the realm of, "Oh, we know what's been accessed recently and we can tier out where it lives for storage optimization purposes." Okay, great, but what I'm seeing now almost seems to be a sense of: people like to talk about pouring ML into their database offerings. And I'm not able to tell whether that is something that adds actual value, or if it's marketing-ware. Tony: Okay. First off, let me kind of spell out a couple of things. First of all, it's not a question of the database becoming aware. A database is not sentient. Corey: Neither are some engineers, but that's neither here nor there. Tony: That would be true, but then again, I don't want anyone with shotguns lining up at my door after this— Corey: [laugh]. Tony: —after this interview is published. But [laugh] more to the point, I can see a couple of roles for machine learning in databases. One is that a database itself (the logs) is an incredible font of data, of operational data. And you can look at trends in terms of: when the pattern of these logs goes this way, that is likely to happen. So, the thing is that I could very easily say we're already seeing it: machine learning being used to help optimize the operation of databases, if you're Oracle, and say, "Hey, we can have a database that runs itself." The other side of the coin is being able to run your own machine-learning models in the database, as opposed to having to go out into a separate cluster and move the data, and that's becoming more and more of a checkbox feature. However, that's going to be, essentially, probably the low-hanging fruit, like the 80/20 rule. It'll be, like, the 20% of relatively rudimentary, you know, let's say, predictive analyses that we can do inside the database. If you're going to be doing something more ambitious, such as, you know, a large language model, you probably do not want to run that in the database itself. So, there's a difference there. Corey: One would hope. I mean, one of the inappropriate uses of technology that I go for all the time is finding ways (as directed or otherwise, in off-label uses) of tricking different services into running containers for me. It's kind of a problem; this is probably why everyone is very grateful I no longer write production code for anyone. But it does seem that there's been an awful lot of noise lately. I'm lazy. I take shortcuts very often, and one of those is that whenever AWS talks about something extensively through multiple marketing cycles, it usually becomes a pretty good indicator that they're on their back foot in that area. And for a long time, they were doing that about data and how it's very important to gather data, it unlocks the key to your business, but it always felt a little hollow-slash-hypocritical to me, because you've gone to some of the same events that I have, the ones AWS throws on.
You notice how you have to fill out the exact same form with a whole bunch of mandatory fields every single time, but there never seems to be anything that gets spat back out to you that demonstrates that any human or system has ever read— Tony: Right. Corey: —any of that? It's basically a "do what we say, not what we do" style of story. And I always found that to be a little bit disingenuous. Tony: I don't want to just harp on AWS here. Of course, we can always talk about the two-pizza team rule and the fact that you have lots of small teams there, but I'd rather generalize this. And I think what you're describing has really been my trip through the healthcare system. I had some sports-related injuries this summer, so I've been through a couple of surgeries to repair sports injuries. And it's amazing that every time you go to the doctor's office, you're filling in the same HIPAA information over and over again, even with healthcare systems that use the same electronic health records software. So, it's more a function of: it's not just that the technologies are siloed, it's that the organizations are siloed. That's what you're saying. Corey: That is fair. And I think at some level—I don't know if this is a weird extension of Conway's Law or whatnot—these things all have different backing stores as far as data goes. And the hard part, it seems, in a lot of companies, once they hit a certain point of maturity, is not just getting the data in (because they've already done that to some extent) but also then making it actionable and helping various data stores internal to the company reconcile with one another and start surfacing things that are useful. It increasingly feels like it's less of a technology problem and more of a people problem. Tony: It is. I mean, put it this way: I spent a lot of time last year, I burned a lot of brain cells, working on data fabrics, which is an idea that's in the eye of the beholder. But the ideal of a data fabric is that it's not the tool that necessarily governs your data or secures your data or moves your data or transforms your data; it's supposed to be the master orchestrator that brings all that stuff together. And maybe sometime 50 years in the future, we might see that. I think the problem here is both technical and organizational. [unintelligible 00:11:58] a promise, you have all these, what we used to call, island silos. We still call them silos, or islands of information. And actually, ironically, even though in the cloud we have technologies with which we can integrate this, the cloud has actually exacerbated this issue, because there are so many islands of information, you know, coming up, and there are so many different little parts of the organization that have their hands on them. That's also a large part of why there was such a big discussion about, for instance, data mesh last year: everybody is concerned about owning their own little piece of the pie, and there's a lot of question in terms of: how do we get some consistency there? How do we all read from the same sheet of music? That's going to be an ongoing problem. You and I are going to get very old before that ever gets solved. Corey: Yeah, there are certain things that I am content to die knowing that they will not get solved.
If they ever get solved, I will not live to see it, and there's a certain comfort in that, on some level. Tony: Yeah. Corey: But it feels like this stuff is also getting more and more complicated than it used to be, and terms aren't being used in quite the same way as they once were. Something that a number of companies have been saying for a while now is that customers overwhelmingly prefer open-source; open source is important to them when it comes to their database selection. And I feel like that's a conflation of a couple of things. I've never yet found an ideological, purity-driven customer decision around that sort of thing. What they care about is: are there multiple vendors who can provide this thing, so I'm not going to be using a commercially licensed database that can arbitrarily start playing games with seat licenses and wind up distorting my cost structure massively with very little notice? Does that align with your— Tony: Yeah. Corey: —understanding of what people are talking about when they say that, or am I missing something fundamental? Which is, again, always possible? Tony: No, I think you're onto something there. Open-source is a whole other can of worms, and I've burned many, many brain cells over this one as well. And today, you're seeing a lot of pieces that are basically giving eulogies for open-source. You know, HashiCorp just finally changed its license, and a bunch of others have in the database world. What open-source has meant—and I think for practitioners, for DBAs and developers—is: here's a platform that's been implemented by many different vendors, which means my skills are portable. And so, I think that's really been the key to why, for instance, MySQL and especially PostgreSQL have really exploded in popularity. Especially Postgres, you know, of late. And you look at Postgres: it's a very unglamorous database. If you're talking about stodgy, it was born to be stodgy, because they wanted to be an adult database from the start. They weren't the LAMP stack like MySQL. And the secret of success with Postgres was that it had a very permissive open-source license, which meant that as long as you don't hold the University of California at Berkeley liable, have at it, kids. And so, you see a lot of different flavors of Postgres out there, which means that a lot of customers are attracted to that, because if I get up to speed on one Postgres database, my skills should be transferable, should be portable, to another. So, I think that's a lot of what's happening there. Corey: Well, I do want to call that out in particular, because when I was coming up in the naughts, the mid-2000s decade, the lingua franca of everything I used was MySQL, or as I insist on mispronouncing it, my-squeal. And lately, in the same vein, Postgres-squeal seems to have taken over the entire universe when it comes to the de facto database of choice. And I'm old and grumpy, and learning new things is always challenging, so I don't understand a lot of the ways that thing gets managed, coming from the context I did before, but what has driven the massive growth of mindshare among the Postgres-squeal set? Tony: Well, I think it's a matter of it being 30 years old. Number one, Postgres always positioned itself as an Oracle alternative. And in the early years, you know, as a new database, how are you going to be able to match Oracle, which at that point had about a 15-year head start on it?
And so, it was a gradual climb to respectability. And I have huge respect for Oracle, don't get me wrong on that, but you take a look at Postgres today, and they have basically filled in a lot of the blanks. And so, it now is, in many cases, a credible alternative to Oracle. Can it do all the things Oracle can do? No. But for a lot of organizations, it's the 80/20 rule. And so, I think it's more just a matter of Postgres coming of age. And the fact is, as a result of it coming of age, there's a huge marketplace out there, and so much choice, and so much opportunity for skills portability. So, it's really one of those things where its time has come. Corey: I think that a lot of my own biases are simply a product of the era in which I learned how a lot of these things work. I am terrible at Node, for example, but I would be hard-pressed not to suggest JavaScript as the default language that people should pick up if they're just entering tech today. It does front-end, it does back-end— Tony: Sure. Corey: —it even makes fries, apparently. That is the lingua franca of the modern internet in a bunch of different ways. That doesn't mean I'm any good at it, and it doesn't mean at this stage I'm likely to improve massively at it, but it is the right move, even if it is inconvenient for me personally. Tony: Right. Right. Put it this way: as I said, I'm not an expert in programming languages, but we've seen a huge profusion of programming languages and frameworks. And the fact is that there's always been a draw towards critical mass. At the turn of the millennium, we thought it was between Java and .NET. Little did we know that basically JavaScript—which at that point was just a web scripting language—[laugh] we didn't know that it could work on the server; we thought it was just for the client. Who knew? Corey: That's like using something inappropriately as a database. I mean, good heavens. Tony: [laugh]. That would be true. I mean, when I could have, you know, easily just used a spreadsheet or something like that. But so, I mean, who knew? I mean, just, for instance, Java itself was originally conceived for a set-top box. You never know how this stuff is going to turn out. The same thing happened with Python. Python was also a web scripting language. Oh, by the way, it happens to be really powerful and flexible for data science. And whoa, you know, now Python, in terms of data science languages, has become the new SAS. Corey: It really took over in a bunch of different ways. Before that, Perl was great, and I'd go, "Why would I—why write in Python when Perl is available?" It's like, "Okay, you know how to write Perl, right?" "Yeah." "Have you ever read anything a month later?" "Oh…" It's very much a write-only language. It is inscrutable after the fact. And Python at least makes that a lot more approachable, which is never a bad thing. Tony: Yeah. Corey: Speaking of what you touched on toward the beginning of this episode, the idea of databases not being sentient, which I equate to being self-aware: you just came out very recently with a report on generative AI and a trip that you wound up taking through all of this. Which I've read; I love it. In fact, we've both been independently using the phrase [unintelligible 00:19:09] to the effect of, "English is the new most common programming language once a lot of this stuff takes off." But what have you seen?
What have you witnessed as far as both the ground-truth reality as well as the grandiose statements that companies are making as they trip over themselves trying to position as the forefront leader in all of this thing that didn't really exist five months ago? Tony: Well, what's funny is—and that's a perfect question, because if on January 1st you'd asked "what's going to happen this year?" I don't think any of us would have thought about generative AI or large language models. And I will not identify the vendors, but I was on some advance briefing calls back around the January, February timeframe. They were talking about things like serverless; they were talking about in-database machine learning and so on and so forth. They weren't saying anything about generative. And all of a sudden, April, it changed. And it's essentially just another case of the tail wagging the dog. Consumers were flocking to ChatGPT, and enterprises had to take notice. And so, what I saw in the spring was—and I was at conferences from SAS, [unintelligible 00:20:21], SAP, Oracle, IBM, Mongo, Snowflake, Databricks, and others—that they all very quickly changed their tune to talk about generative AI. What we were seeing was, for the most part, position statements, but we also saw, I think, the early emphasis being, as you say, basically English as the new default programming language or API: so, coding assistants, and what I'll call conversational query. I don't want to call it natural language query, because we had stuff like Tableau Ask Data, which was very robotic. So, we're seeing a lot of that. And we're also seeing a lot of attention towards foundation models, because, I mean, what organization is going to have the resources of a Google or an OpenAI to develop their own foundation model? Yes, some of the Wall Street houses might, but I think most of them are just going to say, "Look, let's just use this as a starting point." I also saw a very big theme of "your models with your data." And where I got a hint of that, it was a throwaway LinkedIn post. It was back in, I think, like, February; Databricks had announced Dolly, which was kind of an experimental foundation model, just to use with your own data. And I just wrote three lines in a LinkedIn post, on a Friday afternoon. By Monday, it had 65,000 hits. I've never seen anything like it. I mean, yes, I used to write about "data mesh" last year, and that would get a lot, but nothing anywhere near that. So, I mean, that really hit a nerve. And other things that I saw: people starting to look at vector storage and how that was going to be supported. Was it going to be a new type of database (hey, let's have AWS come up with, you know, an [ADF 00:21:41] database here), or is this going to be a feature? I think, for the most part, it's going to be a feature. And of course, under all this, everybody's just falling in love, falling all over themselves, to get in the good graces of Nvidia. In a capsule, that's kind of what I saw. Corey: That feels directionally accurate. And I think databases are a great area to point out one thing that's always been a little disconcerting for me.
The way that I've always viewed databases has been: unless I'm calling a RAND function or something like it, and I don't change the underlying data, I should be able to run a query twice in a row and receive the same result deterministically both times.

Tony: Mm-hm.

Corey: Generative AI is effectively non-deterministic by any realistic measure of that term. Yes, I'm sure there's a deterministic explanation somewhere under the hood. I am not smart enough or learned enough to get there. But it just feels like sometimes it's going to give you the answer you think you're going to get, and sometimes it's going to give you a different answer. And sometimes, in generative AI space, it's going to be supremely confident and also completely wrong. That feels dangerous to me.

Tony: [laugh]. Oh gosh, yes. I mean, I take a look at ChatGPT, and to me the responses are essentially a high school senior coming out with an essay response without any footnotes. It's the exact opposite of an ACID database. The reason we in the database world are so strongly drawn toward ACID is that we want our data to be consistent, and if we ask the same query, we're going to get the same answer.

And the problem is that with generative AI based on large language models, computers sound sentient, but they're not. Large language models are basically just a series of probabilities, and so hopefully those probabilities will line up and you'll get something similar. That, to me, is quite scary. And I think as we start to look at implementing this in an enterprise setting, we need to look at what kind of guardrails we can put on there. The missing piece that I saw this spring with generative AI, at least in the data and analytics world, is that nobody had a clue how to extend AI governance to this, how to make these models explainable. And I think that's still a large problem. That's a huge nut that's going to take the industry a while to crack.

Corey: Yeah, but it's incredibly important that it does get cracked.

Tony: Oh, gosh, yes.

Corey: One last topic that I want to get into. I know you said you don't want to over-index on AWS, which, fair enough. It is where I spend the bulk of my professional time and energy—

Tony: [laugh].

Corey: —focusing on. But I think this one's fair because it is a microcosm of a broader industry question. And that is: I don't know what the DBA job of the future is going to look like, but increasingly, it feels like it's going to primarily be picking which purpose-built AWS database—or larger [story 00:24:56] purpose database—is appropriate for a given workload. Even without my inappropriate misuse of things that are not databases as databases, there are legitimately 15 or 16 different AWS services that they position as database offerings. And it really feels like you're spiraling down a well of analysis paralysis, trying to pick between all these things. Do you think the future looks more like general-purpose databases, or very purpose-built, where each one is this beautiful, bespoke unicorn?

Tony: [laugh]. Well, this hits on a theme that we've all been thinking about for years. And the thing is, there are arguments to be made for multi-model databases versus purpose-built databases. That being said, okay, two things.
One is that what I've been saying, in general—and I wrote about this way, way back; I actually did a talk at the [unintelligible 00:25:50]; it was a throwaway talk at [unintelligible 00:25:52] one of those conferences—I threw it together, and it was basically looking at the emergence of all these specialized databases.

But I also saw that there's going to be some overlapping. Not that we're going to come back to Pangaea per se, but, for instance, a relational database will be able to support JSON. And Oracle, for instance, does have some fairly brilliant ideas up its sleeve—what they call JSON duality—which sounds kind of scary, but which basically says, “We can store data relationally, but superimpose GraphQL on top of all of this, and this is going to look really JSON-y.” So, I think on one hand you are going to see databases that do overlap. Would I use Oracle for a MongoDB use case? No. But would I use Oracle for a case where I might have some document data? I could certainly see that.

The other point, though—and this is really the one I want to hammer on here; it's kind of a major concern I've had—is that I think the cloud vendors, for all their talk that they give you operational simplicity and agility, are making things very complex with their expanding cornucopia of services. And what they need to do—I'm not saying, you know, let's close down the patent office—is provide some guided experiences that say, “Tell us the use case. We will now blend these particular services together, and this is the package that we would suggest.” I think cloud vendors really need to go back to the drawing board from that standpoint and look at: how do we bring this all together? How do we really simplify the life of the customer?

Corey: That is, honestly, I think, the biggest challenge that the cloud providers have across the board. There are hundreds of services available at this point from every hyperscaler out there. Some of them are brand new and effectively feel like they're there for three or four different customers and that's about it, and others are universal services that most people are probably going to use. And most things fall between those two extremes, but it becomes such an analysis-paralysis moment, trying to figure out: what do I do here? What is the golden path?

And what that means is that when you start talking to other people, asking their opinion and getting their guidance on how to do something when you get stuck, it's, “Oh, you're using that service? Don't do it. Use this other thing instead.” And if you listen to that, you get midway through every problem only for them to start over again because, “Oh, I'm going to pick a different selection of underlying components.” It becomes confusing and complicated, and I think it does customers, largely, a disservice. What I think we really need, on some level, is a simplified golden path with easy on-ramps and easy off-ramps where, in the absence of a compelling reason, this is what you should be using.

Tony: Believe it or not, I think this would be a golden use case for machine learning.

Corey: [laugh].

Tony: No, really—submit to us the characteristics of your workload, and here's the recipe that we would propose. Obviously, we can't trust AI to make our decisions for us, but it can provide some guardrails.

Corey: “Yeah. Use a graph database. Trust me, it'll be fine.” That's your general-purpose—

Tony: [laugh].

Corey: —approach. Yeah, that'll end well.

Tony: [laugh].
I would hope that the AI would be trained on a better set of training data and not come out with that conclusion.

Corey: One could sure hope.

Tony: Yeah, exactly.

Corey: I really want to thank you for taking the time to catch up with me on what you're doing. If people want to learn more, where's the best place for them to find you?

Tony: My website is dbinsight.io. And on my homepage, I list my latest research, so you just have to go to the homepage, where you can click through to the latest and greatest. And as I said, after Labor Day, I'll be publishing my take on my generative AI journey from the spring.

Corey: And we will, of course, put links to this in the [show notes 00:29:39]. Thank you so much for your time. I appreciate it.

Tony: Hey, it's been a pleasure, Corey. Good seeing you again.

Corey: Tony Baer, principal at dbInsight. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment that we will eventually stitch together from all those different platforms to create—that's right—a large-scale distributed database.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
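To make the vector-storage point from the exchange above concrete: treating vectors as "a feature, not a new type of database" amounts to keeping an embedding column next to ordinary row data and scanning it with a similarity function. Below is a minimal brute-force sketch in Python; the rows and three-dimensional embeddings are invented for illustration, and a production engine would use an approximate-nearest-neighbor index (HNSW, IVF) instead of a full scan.

```python
import math

# Toy "table": each row carries an embedding column, the way an in-database
# vector feature would. All values are invented for illustration.
rows = [
    {"id": 1, "doc": "quarterly revenue report", "embedding": [0.9, 0.1, 0.0]},
    {"id": 2, "doc": "database migration notes", "embedding": [0.1, 0.8, 0.3]},
    {"id": 3, "doc": "annual revenue forecast", "embedding": [0.8, 0.2, 0.1]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query_vec, k=2):
    # Brute-force scan: score every row, keep the top k.
    return sorted(rows, key=lambda r: cosine(query_vec, r["embedding"]),
                  reverse=True)[:k]

for row in vector_search([0.85, 0.15, 0.05]):
    print(row["id"], row["doc"])
```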
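Corey's determinism point reduces to a few lines of code: the same SQL query over unchanged data returns the same answer every time, while sampling from a probability distribution, which is the core operation of a large language model, does not. A small sketch using only Python's standard library; the toy token probabilities are an invented stand-in for a real model.

```python
import random
import sqlite3

# Deterministic side: identical query, identical data, identical answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 250.0)])

query = "SELECT SUM(balance) FROM accounts"
assert conn.execute(query).fetchone() == conn.execute(query).fetchone()

# Non-deterministic side: a language model samples its next token from a
# probability distribution, so the same prompt can produce different output.
toy_probs = {"the": 0.5, "a": 0.3, "some": 0.2}  # invented stand-in for a model

def sample_token(probs):
    # random.choices draws proportionally to the supplied weights.
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

print(sample_token(toy_probs), sample_token(toy_probs))  # may differ run to run
```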
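The "store relationally, superimpose a JSON-y view" idea Tony describes can likewise be sketched in miniature. This is not Oracle's implementation, just the shape of the idea under an invented two-table schema: rows live in ordinary relational tables, and the document view is synthesized at read time.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'acme');
    INSERT INTO order_items VALUES (1, 'widget', 2), (1, 'gadget', 1);
""")

def order_as_document(order_id):
    # Rows are stored relationally; the document shape exists only at read time.
    oid, customer = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)).fetchone()
    items = [{"sku": sku, "qty": qty} for sku, qty in conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ?", (order_id,))]
    return json.dumps({"id": oid, "customer": customer, "items": items})

print(order_as_document(1))
# {"id": 1, "customer": "acme", "items": [{"sku": "widget", "qty": 2}, ...]}
```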
Guest: Kathy Cullen-Cote, Chief People Officer at Teradata What does it mean to have a fully flexible work environment in 2023? What elements of the employee experience need to be factored into deciding on the balance between in-office and remote work? In this latest episode of the HR Works Podcast, Kathy Cullen-Cote, Teradata's Chief People Officer, helps us understand what it means to embrace a fully flexible work environment – one that provides employees with the right balance between in-office and remote work, all without sacrificing the employee experience. An expert in building engaged, connected, and caring workplace cultures, Kathy explains how organizations and their HR leadership teams are being defined by their decisions to be in-office, fully remote, or hybrid, and shares how HR leaders can best determine the right balance for their workforce.
Summary All software systems are in a constant state of evolution. This makes it impossible to select a truly future-proof technology stack for your data platform, making an eventual migration inevitable. In this episode Gleb Mezhanskiy and Rob Goretsky share their experiences leading various data platform migrations, and the hard-won lessons that they learned so that you don't have to. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) Modern data teams are using Hex to 10x their data impact. Hex combines a notebook style UI with an interactive report builder. This allows data teams to both dive deep to find insights and then share their work in an easy-to-read format to the whole org. In Hex you can use SQL, Python, R, and no-code visualization together to explore, transform, and model data. Hex also has AI built directly into the workflow to help you generate, edit, explain and document your code. The best data teams in the world such as the ones at Notion, AngelList, and Anthropic use Hex for ad hoc investigations, creating machine learning models, and building operational dashboards for the rest of their company. Hex makes it easy for data analysts and data scientists to collaborate together and produce work that has an impact. Make your data team unstoppable with Hex. Sign up today at dataengineeringpodcast.com/hex (https://www.dataengineeringpodcast.com/hex) to get a 30-day free trial for your team! Your host is Tobias Macey and today I'm interviewing Gleb Mezhanskiy and Rob Goretsky about when and how to think about migrating your data stack Interview Introduction How did you get involved in the area of data management? A migration can be anything from a minor task to a major undertaking. Can you start by describing what constitutes a migration for the purposes of this conversation? Is it possible to completely avoid having to invest in a migration? What are the signals that point to the need for a migration? What are some of the sources of cost that need to be accounted for when considering a migration? (both in terms of doing one, and the costs of not doing one) What are some signals that a migration is not the right solution for a perceived problem? Once the decision has been made that a migration is necessary, what are the questions that the team should be asking to determine the technologies to move to and the sequencing of execution? What are the preceding tasks that should be completed before starting the migration to ensure there is no breakage downstream of the changing component(s)? What are some of the ways that a migration effort might fail? What are the major pitfalls that teams need to be aware of as they work through a data platform migration? What are the opportunities for automation during the migration process? What are the most interesting, innovative, or unexpected ways that you have seen teams approach a platform migration? 
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data platform migrations? What are some ways that the technologies and patterns that we use can be evolved to reduce the cost/impact/need for migrations? Contact Info Gleb LinkedIn (https://www.linkedin.com/in/glebmezh/) @glebmm (https://twitter.com/glebmm) on Twitter Rob LinkedIn (https://www.linkedin.com/in/robertgoretsky/) RobGoretsky (https://github.com/RobGoretsky) on GitHub Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story. To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers. Links Datafold (https://www.datafold.com/) Podcast Episode (https://www.dataengineeringpodcast.com/datafold-proactive-data-quality-episode-205/) Informatica (https://www.informatica.com/) Airflow (https://airflow.apache.org/) Snowflake (https://www.snowflake.com/en/) Podcast Episode (https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/) Redshift (https://aws.amazon.com/redshift/) Eventbrite (https://www.eventbrite.com/) Teradata (https://www.teradata.com/) BigQuery (https://cloud.google.com/bigquery) Trino (https://trino.io/) EMR == Elastic Map-Reduce (https://aws.amazon.com/emr/) Shadow IT (https://en.wikipedia.org/wiki/Shadow_IT) Podcast Episode (https://www.dataengineeringpodcast.com/shadow-it-data-analytics-episode-121) Mode Analytics (https://mode.com/) Looker (https://cloud.google.com/looker/) Sunk Cost Fallacy (https://en.wikipedia.org/wiki/Sunk_cost) data-diff (https://github.com/datafold/data-diff) Podcast Episode (https://www.dataengineeringpodcast.com/data-diff-open-source-data-integration-validation-episode-303/) SQLGlot (https://github.com/tobymao/sqlglot) Dagster (https://dagster.io/) dbt (https://www.getdbt.com/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
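One of the more mechanical migration chores, translating SQL between warehouse dialects, is what the SQLGlot project linked above automates. A quick sketch, assuming sqlglot is installed (pip install sqlglot); the query is invented, and the exact output text can vary by sqlglot version.

```python
import sqlglot

# An invented legacy query using Teradata's TOP-n syntax.
legacy_sql = "SELECT TOP 5 user_id, signup_date FROM users ORDER BY signup_date"

# Transpile to BigQuery's dialect; returns one string per input statement.
converted = sqlglot.transpile(legacy_sql, read="teradata", write="bigquery")
print(converted[0])  # expect something like: SELECT user_id, ... LIMIT 5
```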
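Before cutover, teams also verify that the new platform returns the same answers as the old one, which is the job the data-diff tool linked above performs across databases. Here is a toy version of that parity check, with two SQLite databases standing in for the legacy and target warehouses; the schema and values are invented.

```python
import sqlite3

def table_fingerprint(conn, table):
    # Cheap parity check: row count plus a column aggregate. Real tools
    # (e.g. data-diff) compare hashed row segments to pinpoint mismatches.
    count, total = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table}").fetchone()
    return count, round(total, 2)

legacy = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (legacy, target):
    db.execute("CREATE TABLE payments (id INTEGER, amount REAL)")
    db.executemany("INSERT INTO payments VALUES (?, ?)", [(1, 9.99), (2, 20.0)])

assert table_fingerprint(legacy, "payments") == table_fingerprint(target, "payments")
print("parity check passed")
```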
In this episode of The Power Producers Podcast, David Carothers and guest co-host Abe Gibson interview Chris Walker, CEO of Refine Labs. Chris discusses how Refine Labs helps companies to execute marketing strategies to start generating revenue, and the importance of digital marketing in today's business landscape. Episode Highlights: Chris explains how he recognized the unique marketing approach he had created and decided to start helping companies at scale through Refine Labs. (2:48) Chris discusses how each platform has a moment in time when it offers the most upside for content marketing. (7:50) Chris mentions that users' expectations for content consumption and execution will need to rise as AI grows and content availability increases. (15:13) Chris explains that effectively marketing your product or service requires dividing your target market into two categories: those actively looking to buy and those not. (20:18) Chris mentions that there are moments of arbitrage and low barriers to entry in crypto, where people can create cheap, high-quality assets. (36:47) Chris discusses the scale and impact of word-of-mouth marketing in today's digital landscape. (39:05) Chris shares his insight on the importance of collecting direct feedback from customers to understand what marketing strategies are truly effective. (44:31) Chris mentions which type of accounts see the most success with growth with Refine Labs. (48:39) Chris believes that in business, it's crucial to align expectations with actual practice and work to achieve success. (53:36) Tweetable Quotes: “It's true that you can put content on any platform, but it's undeniable that each platform has a moment in time, where there's significantly more upside than any other place on the internet.” - Chris Walker “As AI continues to grow and content availability, free content availability goes up people's bar for the content that they believe, consume, and execute against needs to go way up, or you're going to potentially be taking a lot of bad advice.” - Chris Walker “We typically find the most success with growth, what most people would consider mid-market segment or enterprise segment companies. So that could be somewhere between 300 employees and above. We work with companies like IBM and Teradata, so it's not like there are companies that are too big or too complex for us. But we also have a ton of success with what will be called single product, single market or platform software.” - Chris Walker Resources Mentioned: Chris Walker LinkedIn Refine Labs Abe Gibson David Carothers Kyle Houck Florida Risk Partners The Extra 2 Minutes