This week is all about Voices! 🎶🎤🔊 Mandy Chan joins Melanie and Mark to discuss the intricacies of building voice user interfaces with Actions on Google, developing with SSML and more!

Mandy Chan

Mandy Chan is the developer community manager for the Actions on Google team. Her role is to help expand the funnel of the Actions on Google developer community by creating practical tools and content like http://bit.ly/aog-codelab-1 and http://bit.ly/aog-codelab-2. Mandy began to build voice applications back in early 2016, and since then she has built more than a dozen voice applications on Actions on Google and other platforms. One of her most frequently downloaded open source projects is SSML-Builder, which creates well-formed Speech Synthesis Markup Language without worrying about string concatenation. You can learn more about her open source project at http://bit.ly/ssml-build. When she is not pondering how to improve the developer experience, you can find her hiking in the mountains or learning new magic tricks. You can also learn more about Mandy by following @MandyChanNYC.

Cool things of the week

AI at Google: our principles (blog)
Incorporating Google’s AI Principles into Google Cloud (blog)
Deploying to Google Kubernetes Engine (blog)
Fighting fire with machine learning: two students use TensorFlow to predict wildfires (blog)
Together, we can help Puerto Rico recover (donation match)
Introducing sole-tenant nodes for Google Compute Engine — when sharing isn’t an option (blog, docs)

Interview

Actions on Google (site, docs, github, console, g+ community)
ssml-builder (site, npm)
Advanced SSML by Leon (blog)
Actions on Google: SSML (docs)
Actions on Google Codelabs (level one, level two)
Dialogflow (site, docs, console)
Google Assistant SDK for devices (site)
Cloud Functions for Firebase (docs)
Google Action Firebase Services (docs)
To get inspired by some interesting voice applications (voice experiment)
Mandy Chan (medium, github)
Systers on June 21st 9AM PST – Getting started with Actions on Google Workshop (site)

Question of the week

I want to push a Docker image to Google Container Registry via docker push. How can I set things up so that I don’t have to use gcloud docker -- push every time?

Pushing and Pulling Images (docs)
Authentication Methods (docs)

Where can you find us next?

Mark is speaking at the San Francisco Kubernetes Meetup: Scaling Game Servers and the Conduit Service Mesh on June 14th, and also speaking at the Online Kubernetes Community Meeting on the 21st of June, at 10am Pacific. Melanie is speaking at a joint WiMLDS and PyLadies event “Paths to Data Science” on June 26th and Stanford AI4ALL on June 28th.
If you like Homie & Lexy, please give us a review on iTunes or wherever you get podcasts. Homie & Lexy was created by Doug Schumacher at Arrovox. The voices for Homie and Lexy are generated using Speech Synthesis Markup Language with Polly, Amazon’s Text-to-Speech technology.

Thanks:
Allison Beda of A Muse Productions -- script development
Jon Tidey of Epic Sounds -- audio production
Marco Nicolis and Binney Peh of Amazon AWS -- Polly text-to-voice technology
Soroush has a new mic

ATR2500-USB
Thanks to you, Patreon supporters, for buying us new mics!

Chris is making an Alexa Skill

FlightAware
ADS-B Exchange
Cheap ADS-B Aircraft Radar (this isn't Chris's exact setup, but it's similar)
What it's like to build an Alexa skill - and how you can do it yourself
Build your First Alexa Skill
Fact Skill Tutorial: Build an Alexa Skill in 6 Steps
AWS Lambda
Creating a Deployment Package (Node.js)
Speech Synthesis Markup Language (SSML) Reference
Chris's ADS-B posts:
Monitoring aircraft via ADS-B on OS X
Quick ADS-B monitoring on OS X

Soroush is using Sourcery

Sourcery
Sourcery in Practice
Kyle Fuller: GitHub @kylef, Twitter @kylefuller
Stencil
SwiftTemplate
Commit from Krunoslav Zaher: “Swift templates proof of concept”
Equality.swifttemplate

Chris helped a friend who's making a Swift CLI program

    dyld: Library not loaded: @rpath/libswiftAppKit.dylib
      Referenced from: /Users/friend/Library/Developer/Xcode/DerivedData/application-gqcotuckdopephaodrgawgaxuzwr/Build/Products/Debug/CSwiftV/CSwiftV.framework/Versions/A/CSwiftV
      Reason: image not found

If my advice turns out to have been helpful, I'll publish it verbatim in a blog post. In the meantime, here are some relevant links I sent this friend:

CocoaPods 0.36 - Framework and Swift Support
What are Frameworks?
Bundle Structures
Swift.org - ABI Dashboard
Swift.org - Package Manager
Building a command line tool using the Swift Package Manager
How to build a custom Swift framework and how is it related to the SPM?
Getting Started with Swift Package Manager
An Introduction to the Swift Package Manager
SO question: OSX Command Line Tool with Swift Cocoa Library, Library not loaded
SO answer about dynamic frameworks in a CLI tool
SO question: Setting up a Framework on macOS Command Line apps - Reason: image not found
kylef/Commander README: frameworks and rpath
JP Simard on Twitter: “the app's rpath should point to the frameworks' parent locations”
realm-cocoa-converter: “A library that provides the ability to import/export Realm files from a variety of data container formats.”
I had my first Alexa Skill certified today, one I built over the past couple of weeks for City Cinema here in Charlottetown.

“Alexa Skills” are custom apps built for Amazon’s voice-controlled Echo line of products; think of them as a very early prototype of the computer on Star Trek, but lacking most of the artificial intelligence.

While Echo devices aren’t yet available for sale in Canada, they work in Canada, at least mostly, and it’s clear they’ll be here eventually. So it’s a good time to build up some “voice app” muscle memory, and City Cinema was a good, simple, practical use case.

Simple and practical because there’s really only one thing people want to know about City Cinema: what’s playing. Tonight. On Friday. Next Thursday.

So here’s a high-level overview of what it took to make an Alexa Skill.

First, I needed to select an Invocation Name. This is the “trigger word” or the “app name” that Alexa will glue to my skill. I selected the obvious: City Cinema.

Next, I created an Intent Schema, a JSON description of the things my skill can do, its methods, in other words. In this case, it can only do a single thing–tell you what’s playing–so there’s only a single intent defined, WhatsPlaying, that has an optional parameter (called a “slot” in Alexa-speak), the date. There are also a few built-in intents added to the schema to allow me to define what happens when a user answers “yes” or “no” to a question, and when they cancel or stop.

    {
      "intents": [
        {
          "intent": "WhatsPlaying",
          "slots": [
            { "name": "Date", "type": "AMAZON.DATE" }
          ]
        },
        { "intent": "AMAZON.YesIntent" },
        { "intent": "AMAZON.NoIntent" },
        { "intent": "AMAZON.CancelIntent" },
        { "intent": "AMAZON.StopIntent" }
      ]
    }

Next, I defined the Sample Utterances, a list of the actual things that users can say that will initiate a “what’s playing” lookup:

    WhatsPlaying what's playing on {Date}
    WhatsPlaying what's playing {Date}
    WhatsPlaying what's on {Date}
    WhatsPlaying what's showing on {Date}
    WhatsPlaying what is playing on {Date}
    WhatsPlaying what is playing {Date}
    WhatsPlaying what is on {Date}
    WhatsPlaying what is showing on {Date}
    WhatsPlaying showtimes for {Date}
    WhatsPlaying what are the showtimes for {Date}
    WhatsPlaying what are showtimes for {Date}
    WhatsPlaying showtimes for {Date}
    WhatsPlaying the schedule for {Date}
    WhatsPlaying schedule for {Date}

Defining these utterances is where you realize that a lot of what we call “artificial intelligence” is still very ELIZA-like: a nest of if-then statements.

Finally, I pointed the skill at an API endpoint on a server that I control. There are no limitations here other than that the endpoint must be served via HTTPS. From this point, I could code the endpoint in whatever language I liked; all I needed to do was accept inputs from Alexa and respond with outputs.

I opted to code in PHP, and to use the nascent third-party Amazon Alexa PHP Library as a convenience wrapper. There are a bunch of things the endpoint must do that this wrapper makes easier: requests must be validated as having come from Amazon, and there must be application logic in place to respond to LaunchRequest, SessionEndedRequest, and IntentRequest requests.

Other than that, the heavy lifting of the skill is relatively simple, at least in this case.
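
The three request types named above can be routed with very little code. What follows is a minimal sketch in Python rather than the PHP the endpoint actually uses, and it is not the author's implementation: the function names (`handle`, `date_slot`, `plain_response`) and the spoken prompts are hypothetical, and the required validation that a request really came from Amazon is assumed to have happened before `handle` is called.

```python
from typing import Any, Dict, Optional


def plain_response(text: str, end_session: bool) -> Dict[str, Any]:
    """Build a minimal response envelope with plain-text speech."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def date_slot(request: Dict[str, Any]) -> Optional[str]:
    """Return the optional Date slot value (YYYY-MM-DD), or None if absent."""
    slots = request.get("intent", {}).get("slots") or {}
    return slots.get("Date", {}).get("value")


def handle(envelope: Dict[str, Any]) -> Dict[str, Any]:
    """Route one request by its type. Assumes it was already validated."""
    request = envelope["request"]
    kind = request["type"]
    if kind == "LaunchRequest":
        # Hypothetical prompt for "Alexa, launch City Cinema".
        return plain_response("Which day would you like showtimes for?", False)
    if kind == "SessionEndedRequest":
        # The session is over, so there is nothing left to say.
        return {"version": "1.0", "response": {}}
    if kind == "IntentRequest" and request["intent"]["name"] == "WhatsPlaying":
        # The date slot is optional; falling back to "today" is an assumption.
        day = date_slot(request) or "today"
        # A real endpoint would query the schedule database here.
        return plain_response(f"Looking up showtimes for {day}.", False)
    return plain_response("Sorry, I didn't understand that.", True)
```

Keeping the slot lookup in its own function makes the "request without a date" edge case explicit instead of scattering missing-key checks through the handler.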

When a user says, for example, “Alexa, ask City Cinema what’s playing tonight,” Alexa matches the utterance to one of those that I defined, WhatsPlaying what’s playing {Date}, and passes my endpoint the intent (WhatsPlaying) and the date (as YYYY-MM-DD). So I end up with a PHP object that looks, in part, like this:

    [intent] => Array
        (
            [name] => WhatsPlaying
            [slots] => Array
                (
                    [Date] => Array
                        (
                            [name] => Date
                            [value] => 2017-03-02
                        )
                )
        )

From there I just use the same business logic that the regular CityCinema.net site uses to query the schedule database; I then munge the answer into SSML (Speech Synthesis Markup Language) to form the response. I pass back to Alexa a JSON response that looks like this:

    {
      "version": "1.0",
      "response": {
        "outputSpeech": {
          "type": "SSML",
          "ssml": "<speak>Playing at City Cinema on ????0302: Jackie at 7:00. Do you want to hear a description of this film?</speak>"
        },
        "card": {
          "content": "Jackie at 7:00",
          "title": "Playing Thursday, March 2",
          "type": "Simple"
        },
        "shouldEndSession": false
      },
      "sessionAttributes": {
        "Operation": "FilmDetails",
        "Date": "2017-03-02"
      }
    }

While I can return a plain text reply, using SSML allows me to express some additional nuance in how dates and times are interpreted, and to insert breathy pauses when it helps to increase clarity.

Note that I also pass back some sessionAttributes values, Operation and Date. This allows me to respond properly when the user says “yes” or “no” in reaction to the question “Do you want to hear a description of this film?”; they are, in essence, parameters that are passed back to my endpoint with the follow-on intent.
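
To make the shape of that response concrete, here is a small Python sketch that assembles it. It is an illustration under stated assumptions, not the author's PHP: the function name `whats_playing_response` and its parameters are hypothetical, the card title is simplified to the ISO date, and the speech omits the date markup shown above. The one deliberate addition is escaping the film title, since a bare "&" or "<" would make the SSML malformed.

```python
import json
from html import escape
from typing import Any, Dict


def whats_playing_response(date_iso: str, film: str, time: str) -> Dict[str, Any]:
    """Build an SSML response that asks a follow-up question."""
    # Escape text so characters like "&" in a title cannot break the SSML.
    ssml = (
        "<speak>"
        f"Playing at City Cinema: {escape(film)} at {escape(time)}. "
        "Do you want to hear a description of this film?"
        "</speak>"
    )
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "card": {
                "type": "Simple",
                "title": f"Playing {date_iso}",
                "content": f"{film} at {time}",
            },
            # Keep the session open so the user can answer yes or no.
            "shouldEndSession": False,
        },
        # Returned to the endpoint along with the follow-on intent.
        "sessionAttributes": {"Operation": "FilmDetails", "Date": date_iso},
    }


if __name__ == "__main__":
    print(json.dumps(whats_playing_response("2017-03-02", "Jackie", "7:00"), indent=2))
```

Because `shouldEndSession` is false and the attributes travel with the session, the endpoint can read `Operation` and `Date` back on the next request to work out what the user's “yes” or “no” refers to.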

Handling a “no” answer looks like this, in part:

    case 'AMAZON.NoIntent':
        // Recover the pending operation stored in the session by the previous response.
        $operation = null;
        if (array_key_exists('Operation', $alexaRequest->session->attributes)) {
            $operation = $alexaRequest->session->attributes['Operation'];
        }
        switch ($operation) {
            case "FilmDetails":
                // The user declined the film description: say goodbye and end the session.
                $message = "<speak>";
                $message .= "Ok, see you at the movies!";
                $message .= "</speak>";
                $card = '';
                $endSession = TRUE;
                break;
        }
        break;

The Alexa Skills API also provides a facility for passing back a “card,” which is a text representation (or variation) of the speech returned. For example, for a “what’s playing” intent, I return the name of the film and the time; if the user answers “yes” to the “Do you want to hear a description of this film?” question, then I follow up with a card that includes the full film description (I experimented with passing this back for speaking, but it was too long to be useful).

And that’s it. The application logic is a little more complex than I’ve outlined, mostly to handle the edge cases and the required responses to things like a request without a date, or a request like “Alexa, launch City Cinema.” But the PHP endpoint code is only 257 lines long. It is not rocket science.

There’s an Apple-like certification process that happens once you’re ready to launch a skill to the public; in my case I submitted the skill for certification at 11:00 a.m. on February 28 and got back a positive response on March 2 at 1:46 a.m., so it was a less-than-48-hour turnaround.

The skill is now live on Amazon.com. I foolishly selected “Canada” as the sole country where it would be available when I submitted the skill for certification; because the Echo isn’t available in Canada, this renders the skill effectively unusable for the moment, because to use an Echo in Canada you have to pretend to be in the U.S. I’ve opened this up to all countries now, which requires a re-certification. So in a few days the world should have access to the skill. And, eventually, when the Echo gets released in Canada, the skill should be of practical utility to Echo owners in the neighbourhood.
