Mark and Allen talk about the latest news in the VoiceFirst world from a developer point of view.
Mark Tucker and Allen Firstenberg celebrate 200 episodes and four years of Two Voice Devs! In this special episode, they reflect on the journey so far, the evolution of the AI landscape, and what excites them most about the future of development. Join them as they discuss: 00:00 Four years ago... 00:10 The evolution of large language models (LLMs) and how the landscape has shifted over the past year. 03:10 The emergence of new players in the AI model space and how Google, Microsoft, and Amazon are vying for dominance. 05:30 The growing trend of smaller and locally deployable models and the future of AI development. 08:00 The ongoing quest for seamless integration of conversational AI with web experiences. 10:30 The need for a convergence of traditional NLU concepts with modern AI approaches. 11:30 The pressing need for sustainability and responsible development in the AI space. 14:00 The importance of integrating AI tools with existing methods and workflows. 16:00 An open invitation for developers to join Mark and Allen as co-hosts and share their perspectives on AI development. 18:00 A reminder that learning is at the heart of the developer experience and the importance of community. 20:00 The highlights from their favorite episodes over the past four years. 23:00 The value of connection and friendship within the developer community. 26:09 Four years ago... Don't miss this milestone episode as Two Voice Devs look back and look forward!
Join Allen Firstenberg and Roger Kibbe as they delve into the exciting world of local, embedded LLMs. We navigate some technical gremlins along the way, but that doesn't stop us from exploring the reasons behind this shift, the potential benefits for consumers and vendors, and the challenges developers will face in this new landscape. We discuss the "killer features" needed to drive adoption, the role of fine-tuning and LoRA adapters, and the potential impact on autonomous agents and an appless future. Resources: * https://developer.android.com/ai/aicore * https://machinelearning.apple.com/research/introducing-apple-foundation-models Timestamps: 00:20: Why are vendors embedding LLMs into operating systems? 04:40: What are the benefits for consumers? 09:40: What opportunities will this open up for app developers? 14:10: The power of LoRA adapters and fine-tuning for smaller models. 17:40: A discussion about Apple, Microsoft, and Google's approaches to local LLMs. 20:10: The challenge of multiple LLM models in a single browser. 23:40: How might developers handle browser compatibility with local LLMs? 24:10: The "three-tiered" system for local, cloud, and third-party LLMs. 27:10: The potential for an "appless" future dominated by browsers and local AI. 28:50: The implications of local LLMs for autonomous agents.
Join us on Two Voice Devs as we welcome back Roger Kibbe. Fresh off emceeing the developer track at the Unparsed Conference in London, Roger shares his insights on the biggest takeaways, trends, and challenges facing #GenAI, #VoiceFirst and #ConversationalAI developers today. Get ready for a dose of reality as Roger emphasizes the need to view LLMs as powerful tools – think hammers – rather than magical solutions. We dive deep into: Timestamps: * 0:00 - Intro * 1:56 - Exploring the Unparsed Conference * 4:47 - LLMs: The hype vs. the reality for developers * 6:37 - The underappreciated power of LLMs for "understanding", not just generating * 11:03 - The right tool for the job: Why a toolbox approach is essential for conversational AI * 13:52 - Beyond the chatbot: Detecting emotion and the future of human communication * 20:28 - Hackathon highlights and the need for more realistic QA approaches * 28:55 - Navigating the shift from deterministic to stochastic systems * 31:59 - Will AI replace junior developers? * 36:30 - How senior developers can (and can't) benefit from AI coding assistants * 39:04 - Final thoughts: The value of cutting through the hype Don't miss this insightful conversation about the future of conversational AI development – grab your toolbox and hit play!
What should people developing with LLMs learn from a decade of experience building Alexa skills? How will Alexa skill developers leverage the latest #GenerativeAI and #ConversationalAI tools as they continue to build #VoiceFirst and multimodal skills? Join Allen and Mark on Two Voice Devs as they delve into the evolving landscape of Alexa skill development in the era of large language models (LLMs). Sparked by a thought-provoking discussion on the Alexa forums, they explore the potential benefits and challenges of integrating LLMs into skills. Key topics and timestamps: (0:00:00) Introduction (0:02:00) LLMs and the Future of Alexa Skills (0:04:00) Limitations of Current Alexa Skill Model with LLMs (0:07:00) Benefits and Drawbacks of Developing for Alexa (0:10:30) Overlooked Potential of Multimodality with LLMs (0:14:50) Lessons from Early Voice Experiences (0:17:00) Intents vs. Tool/Function Calling (0:21:30) Handling Hallucinations and Off-Topic Requests (0:22:00) LLMs' Ability to Handle Nuanced Intents (0:28:00) Cost Considerations of LLMs (0:32:00) Monetizing LLM-Powered Alexa Skills (0:39:40) The Future of Alexa Skill Development: A Hybrid Approach? (0:40:00) Outro Tune in as they discuss the need for hybrid models, the importance of conversation design, and the uncertain future of monetization in this rapidly changing landscape. Don't forget to join the conversation on the Alexa Slack channel or leave your thoughts in the comments below!
OpenAI's ChatGPT and GPT-4o announcements have sent shockwaves through the developer community! In this episode of Two Voice Devs, Mark and Allen dive into the implications of these new models, comparing them to Google's Gemini. We discuss: [00:00:10] Initial takeaways from the OpenAI presentations. [00:02:29] The impressive voice capabilities of GPT-4o in ChatGPT. [00:04:49] Concerns about OpenAI's ambitions for conversational AI. [00:07:30] The difference between "doing" and "knowing" AI systems. [00:14:15] A detailed breakdown of GPT-4o, including its strengths and weaknesses. [00:17:43] Comparison with Gemini and implications for developers. [00:19:41] The importance of competition in driving innovation and lowering prices. [00:21:48] The future of AI assistants and the role of developers. Let us know what you think about GPT-4o and Gemini! Have you used them? Share your experiences and thoughts in the comments below.
Allen Firstenberg chats with fellow Google Developer Expert (GDE) Mike Wolfson about his career, the evolution of Android, and his new interest in generative AI. Mike shares his thoughts on the future of AI with agents, Large Action Models (LAMs), and the potential of the "Rabbit," a new AI-powered device. Does the Rabbit live up to its promise? If not - what could? Timestamps: 00:00:00 - Introduction 00:01:32 - Mike's career journey 00:04:15 - Transition from enterprise Java to Android development 00:05:04 - Creating "Droid of the Day" app 00:06:49 - Becoming an Android developer and Google Developer Expert 00:09:23 - Shift in focus from Android to generative AI 00:10:57 - Generative AI as a platform 00:11:47 - The Rabbit and its potential 00:14:59 - Mike's take on the Rabbit as a developer 00:17:31 - Current integrations with the Rabbit 00:19:52 - The future of AI and the Rabbit 00:24:46 - Edge AI and its potential 00:27:16 - The capabilities of the Rabbit and its future 00:32:17 - The Rabbit vs. other devices like meta glasses 00:34:28 - Conclusion and call to action
Join Allen and Roya as they dissect the major AI announcements from Google I/O 2024. From Gemini updates and new models to responsible AI and groundbreaking projects like ASTRA, this episode dives into the future of AI development. Timestamps: [00:00:00] Introduction and Google I/O Overview [00:02:00] Gemini 1.5 Flash & Gemini 1.5 Pro: New Models and Features [00:04:30] AI Studio Access Expansion for Europe, UK & Switzerland [00:06:20] Choosing the Right AI Model for Your Project [00:06:50] Gemini Nano in Google Chrome: Bringing AI to the Browser [00:08:00] PaliGemma: Open Source Model with Image & Text Input [00:08:50] AI Red Teaming & Model Safety Tools [00:09:50] Parallel Function Calling for Developers [00:10:30] Video Frame Extraction: Easier Multimodal Development [00:11:20] Genkit: Firebase's Generative AI Integration [00:12:00] Gems: Customizable Gemini for Developers [00:12:50] Semantic Embeddings: Understanding & Creating Images [00:13:50] Imagen 3: API Access for Image Generation [00:14:20] Veo: Video Generation with Lumiere Architecture [00:14:50] SynthID: Watermarking & Identifying Generated Content [00:16:30] Responsible AI & Inclusivity [00:18:00] Gemini Developer Competition: Win a DeLorean & Cash Prizes! [00:19:30] Project ASTRA: Multimodal AI with Contextual Memory [00:21:00] Google Glasses & Project ASTRA Integration [00:22:00] Closing Thoughts: AI for Everyone
Join Allen and Mark as they delve into Voiceflow's groundbreaking new feature: intent classification using a hybrid of LLMs and classic NLU models. Discover how this innovative approach leverages the strengths of both technologies to achieve greater accuracy and flexibility in understanding user intent. How they're doing it just may blow your mind!
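For readers who want a feel for the hybrid idea before watching, here is a purely illustrative TypeScript sketch (not Voiceflow's actual implementation): a fast, deterministic NLU classifier handles confident matches, and an LLM is only consulted for low-confidence utterances, constrained to the known intent list.

```typescript
// Illustrative only: the interfaces and threshold are assumptions, not Voiceflow's API.
type IntentResult = { intent: string; confidence: number };

async function classifyIntent(
  utterance: string,
  nlu: (u: string) => Promise<IntentResult>,
  llm: (u: string, intents: string[]) => Promise<string>,
  intents: string[],
  threshold = 0.8
): Promise<string> {
  const nluResult = await nlu(utterance);
  if (nluResult.confidence >= threshold) {
    return nluResult.intent; // cheap, predictable path
  }
  // Ambiguous utterance: let the LLM pick from the known intent list,
  // constraining it so it cannot invent a new intent.
  return llm(utterance, intents);
}
```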
Join Allen Firstenberg and guest host Stefania Pecore on Two Voice Devs as they delve into the exciting announcements and highlights from Google Cloud Next 2024! This episode focuses on the latest advancements in AI and their impact on the healthcare industry, providing valuable insights for developers and tech enthusiasts. Learn more: * https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2024-wrap-up Timestamps: 00:00:00: Introduction 00:01:02: Stefania's background and journey into AI 00:07:20: Stefania's overall experience at Google Cloud Next 00:11:59: Focus on Healthcare and AI applications, including Mayo Clinic's Solution Studio 00:15:38: Exploring the new Gemini product suite and its features like code assistance and data analysis 00:20:44: Discussing Gemini API updates, including the 1.5 public preview with 1M token context window and grounding tools 00:26:06: Vertex AI Agent Builder and its no-code approach to chatbot development 00:33:02: Hardware announcements, including the A3 VM with NVIDIA H100 GPUs 00:35:24: Stefania's reflections on Cloud Next and the value of attending Tune in to discover the future of AI and its transformative potential, especially in the healthcare sector. Share your thoughts on the Google Cloud Next announcements in the comments below!
This episode of Two Voice Devs takes a closer look at BERT, a powerful language model with applications beyond the typical hype surrounding large language models (LLMs). We delve into the specifics of BERT, its strengths in understanding and classifying text, and how developers can utilize it for tasks like sentiment analysis, entity recognition, and more. Timestamps: 0:00:00: Introduction 0:01:04: What is BERT and how does it differ from LLMs? 0:02:16: Exploring Hugging Face and the BERT base uncased model. 0:04:17: BERT's pre-training process and tasks: Masked Language Modeling and Next Sentence Prediction. 0:11:11: Understanding the concept of masked language modeling and next sentence prediction. 0:19:45: Diving into the original BERT research paper. 0:27:55: Fine-tuning BERT for specific tasks: Sentiment Analysis example. 0:32:11: Building upon BERT: Exploring the RoBERTa model and its applications. 0:39:27: Discussion on BERT's limitations and its role in the NLP landscape. Join us as we explore the practical side of BERT and discover how this model can be a valuable tool for developers working with text-based data. We'll discuss its capabilities, limitations, and potential use cases to provide a comprehensive understanding of this foundational NLP model.
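As a taste of the masked language modeling discussed above, here is a minimal sketch in TypeScript using the transformers.js port of Hugging Face pipelines (the episode itself explores the model on Hugging Face directly; the checkpoint name matches the BERT base uncased model, and the result shape is an assumption based on the library's fill-mask output):

```typescript
import { pipeline } from "@xenova/transformers";

// Masked language modeling: BERT predicts the hidden token, which shows how it
// "understands" text rather than generating free-form prose.
const unmasker = await pipeline("fill-mask", "Xenova/bert-base-uncased");

const predictions = (await unmasker(
  "The goal of this episode is to [MASK] BERT."
)) as Array<{ token_str: string; score: number }>;

for (const p of predictions) {
  console.log(p.token_str, p.score.toFixed(3));
}
```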
Embark on a wild race with Gemma as we explore the exciting (and sometimes slow) world of running Google's open-source large language model! We'll test drive different methods, from the leisurely pace of Ollama on a local machine to the speedier Groq platform. Join us as we compare these approaches, analyzing performance, costs, and ease of use for developers working with LLMs. Will the tortoise or the hare win this race? Learn more: * Model card: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335 * Ollama: https://ollama.com/ * LangChain.js with Ollama: https://js.langchain.com/docs/integrations/llms/ollama * Groq: https://groq.com/ Timestamps: 0:00:00 - Introduction 0:03:05 - Getting to Know Gemma: Exploring the Model Card 0:05:30 - Vertex AI Endpoint: Fast Deployment, But at What Cost? 0:13:40 - Ollama: The Tortoise of Local LLM Hosting 0:17:40 - LangChain Integration: Adding Functionality to Ollama 0:21:44 - Groq: The Hare of LLM Hardware 0:26:06 - Comparing Approaches: Speed vs. Cost vs. Control 0:27:35 - Future of Open LLMs and Google Cloud Next #GemmaSprint This project was supported, in part, by Cloud Credits from Google
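If you want to follow along at home, here is a minimal sketch of the Ollama plus LangChain.js path from the episode, assuming Ollama is installed locally and `ollama pull gemma` has already been run:

```typescript
import { Ollama } from "@langchain/community/llms/ollama";

// Points at the default local Ollama endpoint and the Gemma model pulled earlier.
const model = new Ollama({
  baseUrl: "http://localhost:11434",
  model: "gemma",
});

// The prompt is illustrative; expect the "tortoise" pace described in the episode
// when running on a typical laptop.
const answer = await model.invoke(
  "In one sentence, what is the difference between a tortoise and a hare?"
);
console.log(answer);
```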
The Alexa Developer Rewards Program (ADR) is shutting down, leaving many developers wondering about the future of Alexa skills. Mark and Allen discuss the implications of this change, explore alternative monetization options, and share their thoughts on the future of skill development. Timestamps: 0:00 - Intro and announcement of the ADR program ending 1:45 - History of the ADR program and its impact on skill development 7:13 - Discussion of the Skill Developer Accelerator Program (SDAP) and Skill Coach 14:04 - Status of AWS credits for skill developers 15:10 - Incentives for building skills in the absence of the ADR program 21:30 - Cost-benefit analysis and the future of skill development 25:48 - Call to action: Share your thoughts on the ADR program ending and the future of skills Join the conversation and let us know what you think!
As large language models (LLMs) become increasingly powerful, ensuring their responsible use is crucial. In this episode of Two Voice Devs, Allen and Mark delve into Google's Gemini LLM, specifically its built-in safety features designed to prevent harmful outputs like harassment, hate speech, sexually explicit content, and dangerous information. Join them as they discuss: (00:01:55) The importance of safety features in LLMs and Google's approach to responsible AI. (00:03:08) A walkthrough of Gemini's safety settings in AI Studio, including the four categories of evaluation and developer control options. (00:06:51) Examples of how Gemini flags potentially harmful prompts and responses, and how developers can adjust settings to control output. (00:08:55) A deep dive into the API, exploring the parameters and responses related to safety features. (00:19:38) The challenges of handling incomplete responses due to safety violations and the need for better recovery strategies. (00:26:47) The importance of industry standards and finer-grained control for responsible AI development. (00:29:00) A call to action for developers and conversation designers to discuss and collaborate on best practices for handling safety issues in LLMs. This episode offers valuable insights for developers working with LLMs and anyone interested in the future of responsible AI. Tune in and share your thoughts on how we can build safer and more ethical AI systems!
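As a companion to the API walkthrough, here is a minimal sketch using the Google Generative AI Node SDK showing where the safety settings and safety ratings discussed in the episode appear; the category and threshold chosen here are just examples:

```typescript
import {
  GoogleGenerativeAI,
  HarmCategory,
  HarmBlockThreshold,
} from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);

// Tighten one of the four safety categories beyond its default.
const model = genAI.getGenerativeModel({
  model: "gemini-pro",
  safetySettings: [
    {
      category: HarmCategory.HARM_CATEGORY_HARASSMENT,
      threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    },
  ],
});

const result = await model.generateContent("Write a friendly greeting.");

// Each candidate carries safetyRatings; a blocked response reports a SAFETY
// finish reason, and promptFeedback explains prompts that were blocked outright.
console.log(result.response.text());
console.log(result.response.candidates?.[0]?.safetyRatings);
```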
In this episode of Two Voice Devs, Mark and Allen discuss how developers can leverage AI tools like ChatGPT to improve their workflow. Mark shares his experience using ChatGPT to generate an OpenAPI specification from TypeScript types, saving him significant time and effort. They discuss the benefits and limitations of using AI for code generation, emphasizing the importance of understanding the generated code and maintaining healthy skepticism. Timestamps: 00:00:00 Introduction 00:00:49 Using AI as a developer tool 00:01:17 Generating OpenAPI specifications with ChatGPT 00:04:02 Mark's prompt and TypeScript types 00:05:37 Reviewing the generated OpenAPI specification 00:07:12 Adding request examples with ChatGPT 00:10:11 Benefits and limitations of AI code generation 00:13:43 Using AI tools for learning and understanding code 00:17:39 Trusting AI-generated code and potential for bias 00:19:04 Integrating AI tools into the development workflow 00:22:38 The future of AI in software development 00:23:17 Programmers as problem solvers, not just code writers 00:25:41 AI as a tool in the developer's toolbox 00:26:07 Call to action: Share your experiences with AI tools This episode offers valuable insights for developers interested in exploring the potential of AI to enhance their productivity and efficiency.
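To give a flavor of the workflow, here is an illustrative sketch using the OpenAI Node SDK; the TypeScript type and the prompt are stand-ins, not Mark's actual prompt or types from the episode:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// A hypothetical type, standing in for the real ones discussed in the episode.
const typeSource = `
export interface Order {
  id: string;
  items: { sku: string; quantity: number }[];
  total: number;
}`;

const completion = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "user",
      content:
        "Generate an OpenAPI 3.0 YAML specification for a REST API that " +
        "returns the following TypeScript type from GET /orders/{id}. " +
        "Include a request example.\n\n" + typeSource,
    },
  ],
});

// As the episode stresses: review the generated spec by hand before using it.
console.log(completion.choices[0].message.content);
```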
Join us on Two Voice Devs as we chat with Xavi, Head of Cloud Infrastructure at Voiceflow, about the exciting new Voiceflow Functions feature and the future of conversational AI development. Xavi shares his journey into the world of bots and assistants, dives into the technology behind Voiceflow's infrastructure, and explains how functions empower developers to create custom, reusable components for their conversational experiences. Timestamps: 00:00:00 Introduction 00:00:49 Xavi's journey into conversational AI 00:06:08 Voiceflow's infrastructure and technology 00:09:29 Voiceflow's evolution and direction 00:13:28 Introducing Voiceflow Functions 00:16:05 Capabilities and limitations of functions 00:20:35 Future of Voiceflow Functions 00:21:02 Sharing and contributing functions 00:24:02 Technical limitations of functions 00:25:35 Closing remarks and call to action Whether you're a seasoned developer or just getting started with conversational AI, this episode offers valuable insights into the evolving landscape of bot development and the powerful capabilities of Voiceflow.
In this episode of Two Voice Devs, Allen Firstenberg and Roger Kibbe explore the rising trend of local LLMs, smaller language models designed to run on personal devices instead of relying on cloud-based APIs. They discuss the advantages and disadvantages of this approach, focusing on data privacy, control, cost efficiency, and the unique opportunities it presents for developers. They also delve into the importance of fine-tuning these smaller models for specific tasks, enabling them to excel in areas like legal contract analysis and mobile app development. The conversation dives into various popular local LLM models, including: Mistral: Roger's favorite, lauded for its capabilities and ability to run efficiently on smaller machines. Phi-2: A tiny model from Microsoft ideal for on-device applications. Llama: Meta's influential model, with Llama 2 currently leading the pack and Llama 3 anticipated to be comparable to ChatGPT 4. Gemma: Google's new open-source model with potential, but still under evaluation. Learn more: Ollama: https://ollama.com/ Ollama source: https://github.com/ollama/ollama LM Studio: https://lmstudio.ai/ Timestamps: 00:00:00: Introduction and welcome back to Roger Kibbe. 00:01:31: Roger discusses his career path and his passion for voice and AI. 00:06:33: The discussion turns to the larger vs. smaller LLMs. 00:13:52: Understanding key terminology like quantization and fine-tuning. 00:20:58: Roger shares his favorite local LLM models. 00:25:14: Discussing the strengths and weaknesses of smaller models like Gemma. 00:30:32: Exploring the benefits and challenges of running LLMs locally. 00:39:15: The value of local LLMs for developers and individual learning. 00:40:29: The impact of local LLMs on mobile devices and app development. 00:49:27: Closing thoughts and call for audience feedback. Join Allen and Roger as they explore the exciting potential of local LLMs and how they might revolutionize the development landscape!
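For a sense of how simple it is to talk to a locally hosted model once a runtime like Ollama is installed, here is a minimal sketch against Ollama's local REST endpoint; the model choice and prompt are illustrative, and the endpoint shape follows the Ollama docs linked above:

```typescript
// Assumes Ollama is running locally and `ollama pull mistral` has been done.
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "mistral",
    prompt: "Summarize the key clauses to check in an NDA.",
    stream: false, // return a single JSON object instead of a token stream
  }),
});

const data = await res.json();
console.log(data.response);
```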
Join Allen and Mark on Two Voice Devs as they dive into the world of Large Action Models (LAMs) and explore their potential to revolutionize how we build chatbots and voice assistants. Inspired by Braden Ream's article "How Large Action Models Work and Change the Way We Build Chatbots and Agents," the discussion dissects the core functions of conversational AI - understand, decide, and respond - and examines how LAMs might fit into this framework. Allen and Mark also compare and contrast LAMs with Large Language Models (LLMs) and Natural Language Understanding (NLU), highlighting the strengths and limitations of each approach. Tune in to hear their insights on: The evolution of Voiceflow and its shift towards LLMs (03:20) Understanding the core functions of conversational AI (05:40) Clippy as an example of a deterministic agent (06:15) The differences between deterministic and probabilistic models (07:50) NLU vs. LLMs for understanding user input (09:20) How LAMs might fit into the "decide" stage of conversational AI (18:50) The challenges of training LAMs and avoiding hallucinations (20:00) The potential of LAMs to improve response generation (29:30) Cost considerations of using LLMs vs. NLUs (37:00) Whether you're a seasoned developer or just curious about the future of conversational AI, this episode offers a thought-provoking discussion on the potential of LAMs and the challenges that lie ahead. Be sure to share your thoughts in the comments below! Additional Info: https://www.voiceflow.com/blog/large-action-models-change-the-way-we-build-chatbots-again
Google's Gemini 1.5 is here, boasting a mind-blowing 1 million token context window!
In this episode of Two Voice Devs, hosts Allen Firstenberg and Mark Tucker discuss Gemini, Google's latest name for its Generative AI... stuff. Originally known as separate products including Bard and Duet AI, Gemini encompasses a suite of AI tools, including chatbots, product-specific assistants, models, and APIs that developers can use for various tasks. The discussion covers how Gemini compares with offerings from other companies such as OpenAI and Microsoft, including visible similarities and differences. The show concludes by answering the question about why developers should care about this rename with a call to explore possibilities with AI tools like Gemini to let us create more natural and user-friendly interfaces. Learn more: https://blog.google/technology/ai/google-gemini-update-sundar-pichai-2024/ https://blog.google/products/gemini/bard-gemini-advanced-app/ 00:04 Introduction and Catching Up 00:55 Exploring the Gemini Model 04:09 Gemini vs OpenAI: A Comparison 10:20 Understanding the Gemini Branding 12:00 The Developer's Perspective on Gemini 17:46 Closing Thoughts and Future Discussions
In this episode of Two Voice Devs, hosts Allen Firstenberg and Mark Tucker discuss the CSS Speech Module Level 1 Candidate Recommendation Draft, a standard that enables webpages to talk, developed in collaboration with the voice browser activity. They explore its features including the 'aural' box model concept, voice families, earcons and more, drawing parallels with SSML and highlight its innovative approach to web accessibility complementing screen readers. Despite acknowledging its potential, they address some of its key omissions such as phonemes and the lack of a background audio feature. 00:04 Introduction and Welcome 01:14 Exploring the Concept of Webpages Talking 03:00 Deep Dive into CSS Speech Module 03:48 Understanding the Scope of CSS Speech Module 04:27 The Evolution of Voice Interaction 05:22 Comparing CSS Speech with SSML 07:13 The Power of CSS in Voice Development 22:49 The Impact of Voice Balance Property 29:20 The Limitations of CSS Speech 39:37 The Future of CSS Speech 42:50 Conclusion and Final Thoughts
Forget Apps! Talking to this Orange Cube Could Change Everything. Is the app model broken? The creators of Rabbit R1, a new voice-first device, certainly think so. In this episode of Two Voice Devs, Mark and Allen break down this innovative device and its potential to change how we interact with technology. What do developers think about the technology underlying RabbitOS? You may be surprised! Key topics: 00:02:00 - What is the Rabbit R1? Rabbit R1 is a new type of device that prioritizes voice input and output. It aims to shift users away from apps and toward a more conversational way of interacting with technology. 00:05:17 - AI models: Rabbit uses a unique "large action model" to understand and complete tasks. It claims to do this faster and more intuitively than existing voice assistants. 00:14:14 - Teach Me mode: See how Rabbit can be trained to interact with new websites and applications. What implications does this have for the future? 00:18:41 - Can it replace apps? While that's a bold claim, Rabbit's conversational approach and innovative features show promise. Could this be the first step towards a new era in human-computer interaction? Additional thoughts: 00:25:06 - Hybrid approach: Rabbit smartly combines intent-based and language-based AI models, potentially offering speed and accuracy. 00:32:56 - Asynchronous interactions: It breaks away from the traditional request-response model, offering a more natural conversational experience that aligns with the Star Trek computer vision. 00:07:48 - Price: At just $199, many people are willing to check it out, and this could accelerate interest in voice-driven interfaces. Is Rabbit R1 a game-changer or just a gimmick? Let us know your thoughts in the comments!
In this episode of 'Two Voice Devs', hosts Allen Firstenberg and Mark Tucker discuss updates made to Alexa Presentation Language (APL) version 2023.3. They highlight conditional imports, updates made for animations, and more, including APL support for different devices and how to "handle" backward compatibility. Learn More: https://developer.amazon.com/en-US/docs/alexa/alexa-presentation-language/apl-latest-version.html 00:08 Introduction and Welcome 00:17 Alexa Presentation Language (APL) Overview 01:02 Understanding APL and its Components 03:23 Exploring APL's Functionality and Usage 05:22 APL's Versioning Strategy and Device Compatibility 09:23 New Features in APL 2023.3: Conditional Imports 15:22 New Features in APL 2023.3: Item Insertion and Removal Commands 18:05 New Features in APL 2023.3: Control Over Scrolling and Paging 19:43 New Features in APL 2023.3: Accessibility Improvements 20:36 New Features in APL 2023.3: Frame Component Deprecation 22:23 New Features in APL 2023.3: Data Property for Sequential and Parallel Commands 25:07 New Features in APL 2023.3: Support for Variable Sized Viewports 26:47 New Features in APL 2023.3: Support for Lottie Files 28:33 New Features in APL 2023.3: String Functions and Vector Graphic Improvements 30:11 New Features in APL 2023.3: Extensions and APL Cheat Sheets 37:26 Strategies for Backwards Compatibility in APL 38:40 Conclusion and Farewell
In their New Year's discussion, Mark and Allen explore their hopes and predictions for technological advancements in 2024. They discuss the future of Large Language Models (and if that's the right name for them now), expressing anticipation for improvements in latency issues and the potential for models to be hosted on devices rather than cloud-based platforms. The conversation also ventures into the world of AI agents, function calling, and the importance of developers in ensuring safety measures are integrated in AI systems. Finally, they exude excitement about the possibility of AI in multimedia formats, where tools can generate differing output forms like text, video, images, and possibly even audio directly. They explore potential developer opportunities and challenges, emphasizing the importance of understanding regulations and ensuring user privacy and safety. 00:04 Introduction and New Year Reflections 02:05 Looking Forward: Predictions for 2024 02:14 The Future of Large Language Models (LLMs) 03:08 The Impact of LLMs on Voice Assistants 07:44 The Potential of On-Device AI Models 10:14 The Role of Developers in the AI Landscape 20:11 The Future of Multimodal AI Models 26:35 The Importance of Regulations in AI 29:22 Conclusion: Exciting Times Ahead
Allen Firstenberg and Mark Tucker, hosts of Two Voice Devs, reflect on the year 2023, discussing significant changes and trends in the #VoiceFirst and #GenerativeAI industry and where their predictions from last year were accurate... or fell short. They discuss the transformation and challenges Amazon faced, gleaning predictions from hints at large language models (LLMs) from Google, Amazon, Microsoft, and Apple. They also mention the shift of Voiceflow towards LLMs and recall the notion of retrieval augmented generation. 00:04 Introduction and Welcome 00:12 Reflecting on the Past Year 01:13 Amazon's Progress and Challenges 01:59 Exploring Amazon's Monetization and Widgets 08:45 Google's Journey and the End of Conversational Actions 11:53 The Rise of Large Language Models (LLMs) 17:04 The Impact of Voiceflow and Dialogflow 20:48 Closing Remarks and New Year Wishes
Mark and Allen get into the Tech-mas spirit, with a little help from Bard. Hoping you all have the happiest of holiday seasons. #GenerativeAI #VoiceFirst #ConversationalAI #HappyHolidays
In this in-depth chat between Allen Firstenberg and Linda Lawton, they dive into the functionalities and potential of Google's newly released Gemini model. From their initial experiences to exciting possibilities for the future, they discuss the Gemini Pro and Gemini Pro Vision models, how to #BuildWithGemini, its focus on both text and images, and speedier and more cohesive responses compared to older models. They also delve into its potential for multi-modal support, unique reasoning capabilities, and the challenges they've encountered. The conversation draws interesting insights and sparks exciting ideas on how Gemini could evolve in the future. 00:04 Introduction and Welcome 00:23 Discussing the New Gemini Model 01:33 Comparing Gemini and Bison Models 02:07 Exploring Gemini's Vision Model 03:03 Gemini's Response Quality and Speed 03:53 Gemini's Token Length and Context Window 05:05 Gemini's Pricing and Google AI Studio 05:33 Upcoming Projects and Previews 06:16 Gemini's Role in Code Generation 07:54 Gemini's Model Variants and Limitations 12:01 Creating a Python Desktop App with Gemini 14:07 Gemini's Potential for Assisting the Visually Impaired 18:35 Gemini's Ability to Reason and Count 20:15 Gemini's Multi-Step Reasoning 20:33 Testing Gemini with Multiple Images 21:52 Exploring Image Recognition Capabilities 22:13 Discussing the Limitations of 3D Object Recognition 23:53 Testing Image Recognition with Personal Photos 24:52 Potential Applications of Image Recognition 25:45 Exploring the Multimodal Capabilities of the AI 26:41 Discussing the Challenges of Using the AI in Europe 27:26 Exploring the AQA Model and Its Potential 33:37 Discussing the Future of AI and Image Recognition 37:12 Wishlist for Future AI Capabilities 40:11 Wrapping Up and Looking Forward
Join Allen Firstenberg and guest host Noble Ackerson at the Voice and AI 2023 conference. They discuss the growth of AI and how LLMs (large language models) are affecting the tech world and delve deep into topics like LangChain, generative AI, and how to optimize AI operations to tackle network latency. There are also plenty of audience questions, exploring the current challenges in AI and potential solutions. 00:03 Introduction and Background of Two Voice Devs 00:31 The Evolution of Voice Technology and AI 01:50 Interactive Q&A Session Begins 01:58 Discussion on Open Source Software and Generative AI 02:59 Deep Dive into LangChain 05:43 Audience Participation and Questions 06:00 Challenges with LangChain and Overhead 08:14 Exploring the Intersection of Voice Technology and Generative AI 12:51 Addressing Network Latency in Voice Technology 19:49 The Future of AI and Voice Technology 26:53 Addressing the Challenges of Network Latency 37:13 Closing Remarks and Future Engagements
Join Mark Tucker and Allen Firstenberg on Thanksgiving Day for a sincere heart-to-heart on the highs and lows of their tech industry journey. Expressing their gratitude for their family, friends, and colleagues in the tech industry and beyond, they acknowledge the challenging times faced by many. They call on their viewers to remember how unique and important they are and invite them to express their thoughts and emotions openly by reaching out to them. 00:04 Introduction and Thanksgiving Greetings 00:28 Reflecting on the Past Year 02:19 Gratitude for Personal Relationships 03:54 Acknowledging Industry Challenges and Layoffs 05:59 Importance of Community and Support 07:59 Encouragement and Closing Remarks
Mark Tucker and Allen Firstenberg delve into the recent changes made by Voiceflow. We explore how Voiceflow, originally a design resource for Alexa Skills and Google Assistant Actions, has evolved and shifted to include chatbot roles and generative AI responses. Highlighted too are the implications of Voiceflow's decoupling and transition to 'bot logic as a service'. We look at the necessary technical adjustments and solutions required in the aftermath of these changes, and Mark shares how he created a Jovo plugin as a hassle-free 'integration layer' for handling multiple platforms, taking advantage of Jovo's generic input/output. More info: https://github.com/jovo-community/jovo4-voiceflowdialog-app 00:04 Introduction 00:54 Introducing Voiceflow 01:44 Exploring Voiceflow's Evolution 03:13 Understanding Voiceflow's Changes 05:39 Explaining the Voiceflow Integration 14:39 Discussing the Voiceflow Dialog API 25:42 Conclusion
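For context, here is a minimal sketch of the kind of Dialog API call the Jovo plugin wraps; the endpoint and request shape follow the Voiceflow Dialog API, while the user ID and utterance are placeholders:

```typescript
const VF_API_KEY = process.env.VF_API_KEY!; // Voiceflow Dialog API key
const userId = "demo-user";

const res = await fetch(
  `https://general-runtime.voiceflow.com/state/user/${userId}/interact`,
  {
    method: "POST",
    headers: {
      Authorization: VF_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      action: { type: "text", payload: "Hello there" },
    }),
  }
);

// The response is a list of traces (speak, text, choice, ...) that an
// integration layer maps back to platform-specific output.
const traces = await res.json();
for (const trace of traces) {
  if (trace.type === "text" || trace.type === "speak") {
    console.log(trace.payload?.message);
  }
}
```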
On this episode, Mark Tucker and Allen Firstenberg dive deep into the latest announcements by OpenAI. They discuss various developments including the launch of GPTs (collections of prompts and documents with configuration settings), the new text-to-speech model, upcoming GPT-4 Turbo, reproducible outputs, and the introduction of the Assistant API. While they express excitement for what these developments could mean for #VoiceFirst, #ConversationalAI, and #GenerativeAI, they also voice concerns about discovery solutions, monetization, and the reliance on platform-based infrastructure. Tune in and join the conversation. More info: https://openai.com/blog/new-models-and-developer-products-announced-at-devday 00:04 Introduction and OpenAI Announcements Edition 00:52 Discussion on OpenAI's New Text to Speech Model 02:15 Exploring the Pricing and Quality of OpenAI's Text to Speech Model 02:52 Concerns and Limitations of OpenAI's Text to Speech Model 06:24 Introduction to GPT 4 Turbo 06:48 Benefits and Limitations of GPT 4 Turbo 09:27 Exploring the Features of GPT 4 Turbo 18:52 Introduction to GPTs and Their Potential 22:22 Concerns and Questions About GPTs 32:14 Discussion on the Assistant API 37:32 Final Thoughts and Wrap Up
Allen and Mark discuss the practical uses and advantages offered by MakerSuite, an API currently available for Google's PaLM #GenerativeAI model. We look at its unique feature that treats prompts like templates, allowing for versatile manipulation of these templates for varying results. We further delve into how it saves these prompts in Google Drive and how this can be linked to LangChain's new hub concept, leading to an effective 'MakerSuite hub.' Finally, we explore if prompts are more like code or content, and how that fits into the development process. What do you think? More info: MakerSuite: https://makersuite.google.com/ MakerSuite Hub in LangChain JS: https://js.langchain.com/docs/ecosystem/integrations/makersuite
Mark and Allen explore TypeChat - a new library from Microsoft that makes prompt engineering for function-like operations in #ConversationalAI easier and more robust. Is this a replacement for Intents? Does it go beyond what we could do with Intent-based systems? Is it lacking something? Let's explore! Learn more: https://github.com/microsoft/TypeChat
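Here is a minimal sketch of the TypeChat pattern, adapted from the library's published sentiment sample; the exact API surface has shifted between TypeChat releases, so treat this as illustrative:

```typescript
import { createJsonTranslator, createLanguageModel } from "typechat";

// The "schema" is plain TypeScript: the model is asked to produce JSON
// that type-checks against it (mirroring TypeChat's sentiment sample).
interface SentimentResponse {
  sentiment: "negative" | "neutral" | "positive";
}
const schema = `
export interface SentimentResponse {
  sentiment: "negative" | "neutral" | "positive";
}`;

const model = createLanguageModel(process.env); // e.g. OPENAI_API_KEY
const translator = createJsonTranslator<SentimentResponse>(
  model,
  schema,
  "SentimentResponse"
);

const response = await translator.translate("TypeChat is pretty neat!");
if (response.success) {
  console.log(response.data.sentiment); // a typed result, not free-form text
} else {
  console.log(response.message);
}
```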
What started as a casual conversation between Mark and Allen turned into a brief exploration of what Retrieval Augmented Generation (RAG) means in the #GenerativeAI and #ConversationalAI world. Toss in some discussion about Voiceflow and Google's Vertex AI Search and Conversation and we have another dive into the current hot method to bridge the Fuzzy Human / Digital Computer divide.
Last week, before Google's annual hardware event, Allen teased part of his prediction about Google Assistant and Bard. This week, we'll show the full clip of Allen's prediction and see just how close he was. Then Mark and Allen discuss how recent announcements from OpenAI, Amazon Alexa, and Google compare to each other and, more importantly, what they each mean for developers in a #GenerativeAI, #ConversationalAI, and perhaps even a #VoiceFirst world, and perhaps make a few more predictions about what we'll hear next. More info: Blog post about Assistant With Bard: https://blog.google/products/assistant/google-assistant-bard-generative-ai/ Announcement at the Made By Google event: https://www.youtube.com/live/pxlaUCJZ27E?si=I1noN-l3LQHgBktp&t=2941
The Google Cloud Next conference is a massive display of the latest technologies and products available from Google Cloud - from AI to Zero-Trust solutions. Unsurprisingly, #MachineLearning was prominent in this year's show, so Mark and Allen take a look at some of the biggest #GenerativeAI and #ConversationalAI announcements this year. More info: https://cloud.google.com/blog/topics/google-cloud-next/next-2023-wrap-up
Mark shares the exciting news that Amazon Alexa will soon have a #VoiceFirst #ConversationalAI LLM chat mode! While Allen agrees that this is very exciting news, he still has quite a few questions about how #GenerativeAI technology will fit into Alexa skills. We ask the difficult questions and see what answers are currently out there. What do you think about this announcement from Alexa? More info: LLM feature description: https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2023/09/alexa-llm-fall-devices-services-sep-2023 Event video: https://youtu.be/_JcP7N0QPOk
Noble and Allen take a look back at our experiences at this year's VOICE + AI conference. What were the big topics being discussed? The amusing moments? And what do we want to see next year? #GenerativeAI #ConversationalAI #VoiceFirst
Allen and guest host Linda have a wide ranging conversation, from Linda's career path and her experiences as a Google Developer Expert for Google Analytics, to how she leveraged that knowledge while trying out something new with Google's #GenerativeAI tool, MakerSuite and the PaLM API. We take a close look at how developers can use prompts (more than one!) to help turn a user's request into actionable data structures that feed into an API and get results. More from Linda: https://LindaLawton.DK https://daimto.com #MakerSuiteSprint #LargeLanguageModel
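As a rough illustration of the prompt-to-structured-data idea (not Linda's actual prompts), here is a sketch using the PaLM API Node client; the field names and example request are made up for this example:

```typescript
import { TextServiceClient } from "@google-ai/generativelanguage";
import { GoogleAuth } from "google-auth-library";

const client = new TextServiceClient({
  authClient: new GoogleAuth().fromAPIKey(process.env.PALM_API_KEY!),
});

// First prompt: turn a fuzzy user request into a structured JSON object
// that downstream code can feed into an API.
const userRequest =
  "I need a table for four somewhere quiet next Friday evening";
const prompt = `Extract a JSON object with fields "partySize", "date", and
"preferences" from this restaurant request. Respond with JSON only.

Request: ${userRequest}`;

const [result] = await client.generateText({
  model: "models/text-bison-001",
  prompt: { text: prompt },
});

console.log(result.candidates?.[0]?.output);
```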
We're just days away from the annual VOICE+AI conference, hosted this year in Washington, DC. Both Allen and Noble will be speaking (and hosting a live and in person recording of a future episode!), so we'll give a little preview of what you can hear if you're attending.
Allen and Mark revisit a conversation from episode 146 where they discovered Google had a Vector Database. Now, several months later, Allen has done some work with the Google Cloud Vertex AI Matching Engine and incorporated it into LangChain JS. We discuss why this is important, and how it fits into the overall landscape of LLMs and MLs today. (And Allen has a little announcement towards the end.) More info: * Matching Engine: https://cloud.google.com/vertex-ai/docs/matching-engine/overview * LangChain JS: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/googlevertexai
This seems like an easy question, right? If you want to do #ConversationalAI or #GenerativeAI on your own machine with a model such as Llama 2, you can just download the model and... well... then what? This is the question posed to guest host Noble Ackerson - and the answer was both more complicated and simpler than Allen could imagine!
Amazon has made some changes to the Alexa Presentation Language, dubbing this version 2023.2, and Allen is a bit confused about what these updates bring. Mark, however, clarifies what's new, how it relates to what was previously available, and why some users can benefit from this latest APL release.
One of the neat features we've seen come out of the #GenerativeAI and #ConversationalAI explosion recently has been the attention being paid to text embeddings and how they can be used to radically change how we index and search for things. Allen, however, has recently been working with an image embedding model from Google, including incorporating it into LangChain JS. Mark asks about what that process was like, what this new model lets us do, and starts to explore some of the potential of this new tool that is available for everyone. References: LangChain JS module: https://js.langchain.com/docs/modules/data_connection/experimental/multimodal_embeddings/google_vertex_ai Information from Google: https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-image-embeddings Google Model Garden info: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/5 XKCD: https://xkcd.com/1425/
Three years of Two Voice Devs! There's no doubt that the #VoiceFirst industry has changed over that time, with the rise of #GenerativeAI and #ConversationalAI taking the world by storm. Mark and Allen look back at how the show has evolved over this time, and why we hope you'll be joining us as we continue forward on our journey!
Guest Host Xavier Portilla returns to chat with Allen about some of the latest additions to Dialogflow CX. New system functions make some of the processing you can do on inputs easier and faster, while prebuilt flows and flow scoped parameters make it easier to have clearly defined, and reusable, components in your conversation design. More info: https://cloud.google.com/dialogflow/docs/release-notes#July_05_2023
Guest host Xavier Portilla joins Allen to take a look at a new slot type that the Alexa team has in public beta. How can this new type be used? How does it differ from previous slot types? And what is a slot type anyway?
Guest Host Leslie Pound joins Allen to discuss her perspective on software development and #GenerativeAI: rather than trying to translate our fuzzy side, developers should think about how these tools can make us more aware of how users are seeking to be inspired or creative.
Noble Ackerson returns to discuss a recent presentation that Allen made to the Google Developer Group NYC chapter, where he illustrates how #GenerativeAI can be used as a bridge between the discrete nature of computers and the "fuzzy" nature of humans. He and Noble discuss how Large Language Models, such as OpenAI's GPT models and Google's PaLM 2, along with libraries like LangChain, become a powerful tool in every developer's toolbox.
Allen is joined by Noble Ackerson to discuss the latest feature that OpenAI has included with its GPT models. Functions provide a well-defined way for developers to turn unstructured human input into a more structured format that can be processed by your code or by a library such as LangChain. We take a look at how they can be used, as well as some of the open questions that remain about their use. More info: - https://platform.openai.com/docs/guides/gpt/function-calling
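Here is a minimal sketch of the function-calling flow using the OpenAI Node SDK; the get_weather function is a hypothetical example, not something from the episode:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Describe a (hypothetical) function the model may ask us to call.
const functions = [
  {
    name: "get_weather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name, e.g. Boston" },
        unit: { type: "string", enum: ["celsius", "fahrenheit"] },
      },
      required: ["city"],
    },
  },
];

async function main() {
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "user", content: "Is it jacket weather in Boston today?" },
    ],
    functions,
    function_call: "auto",
  });

  const message = completion.choices[0].message;
  if (message.function_call) {
    // The model returns the function name plus JSON arguments as a string;
    // our code is responsible for parsing and actually executing it.
    const args = JSON.parse(message.function_call.arguments);
    console.log("Model wants to call:", message.function_call.name, args);
  }
}

main();
```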
This week, Google completed the "sunset" of Conversational Actions for the Google Assistant. Mark and Allen discuss the ups and downs of Actions on Google, how it fit into the #VoiceFirst landscape, and what may come next.
Another milestone episode! Mark and Allen take advantage of the event to look back at our predictions from episode 100, look back at how #VoiceFirst development has changed over the past 50 episodes (and several years), and look forward to what we'll be talking about in the next 50 episodes.