POPULARITY
"Sabemos bien poco sobre la historia de Irán y, sin embargo, es una de las culturas más ricas, antiguas e influyentes de la historia. El zoroastrismo, de hecho, nos acompaña hasta el presente y descubrir sus secretos no ha sido fácil.Para hablar sobre todo ello tenemos con nosotros a Jaime Martínez Porro, Licenciado en Filología Clásica por la Universidad de Salamanca, Máster en Textos de la Antigüedad Clásica en la misma universidad. Doctor en Iranística por la Universidad Libre de Berlín con el tema “Orthography and Recitation in the Avestan Manuscripts”. Actualmente es investigador postdoctoral del Instituto de Estudios de Irán de la Universidad Libre Berlín, en el proyecto “Corpus Avesticum Berolinense: An edition of the Zoroastrian Rituals in Avestan Language”, un proyecto de larga duración financiado por la Sociedad Alemana de Investigación. También es miembro del proyecto Avestan Digital Archive y ambos proyectos coordinados por el Prof. Alberto Cantera."
Shannon and Mary welcome Heidi Martin (@Droppin' Knowledge with Heidi) back to the podcast, and Heidi's husband, Adam, a LETRS trainer, also joins the conversation. They talk about all the aspects that go into the topic of Word Knowledge. What are all the things students should know and understand about a word to demonstrate complete knowledge of that term? You'll walk away from this episode with a more complete understanding of all the layers of skills within Word Knowledge that we need to help our readers build. RESOURCES MENTIONED DURING THE EPISODE: Heidi's website; Heidi on TikTok; Heidi on Instagram; Heidi on Facebook; Heidi's decodables; Heidi's freebie about the Reading Brain; SOR 101 (Science of Reading 101) Membership; Strive for Five Conversations by Sonia Cabell and Tricia Zucker (Amazon affiliate link); Etymonline; our previous Science of Reading episode with Heidi; our episode about Delightful Word Learning with Collette Hiller; InferCabulary; 3 Clicks Spelling; Spelling Riddles; Morpheme Magic; Structured Word Inquiry (Dr. Pete Bowers); bonus episodes (through your podcast app or Patreon); our free Rubrics Guide; our Finding Good Books Guide; information about our Patreon membership. Support the show: get literacy support through our Patreon.
Ekev | How Do You Spell "Mezuzot"? by Rav Yitzchak Etshalom How many ways can "Mezuzot" be legitimately spelled? The Orthography of the Torah Rashi, in his comment on Devarim 6:9, infers from the written form of "Mezuzot" - מזוזת - that we need only write the words of שמע...ואהבת (and the rest) on one doorpost, since the written version alludes to "Mezuzat". However, our Sifrei Torah all have that word written מזזות. We explore the phenomenon of כתיב מלא וחסר - (plene and "defective" orthography), where Hebrew letters are used to "guide" reading and are read as vowels. We also explore the rabbinic history of variant texts of the Torah and how that challenge was met. Source sheet >>
This week we're exploring one facet of how stupid the written system of English is by featuring songs that feature words containing various pronunciations (all 6!) of "ough". Discussed in this episode: Fleetwood Mac - Although The Sun Is Shining (1969) Stephen Stills - Thoroughfare Gap (1978) The Beatles - I'm Looking Through You (1965) The Creation - Through My Eyes (1967) Johnny Thunders & The Heartbreakers - It's Not Enough (1977) Beastie Boys - Tough Guy (1994) Misfits - Cough/Cool (1977) Butthole Surfers - Cough Syrup (1996) X - I Must Not Think Bad Thoughts (1983) Alanis Morissette - You Oughta Know (1995) Beyonce - Love Drought (2016)
Join us for this interesting conversation with our guest, literacy expert Susan Ebbers, who will share the research and strategies surrounding learning to read. Ebbers will illustrate how research supports the entwining of phonology, orthography, morphology, and vocabulary when teaching children to read, and to read more capably and with greater comprehension, and how this type of multidimensional approach is even more effective when integrated within the context of phrases, sentences, passages, and stories. Ebbers will also discuss the role self-efficacy plays in nurturing a motivation to read despite difficulties. She will share ways to help students build skills systematically while also building confidence, as well as strategies to: reinforce basic decoding and "sight word" recognition; reinforce phonics, including polysyllabic decoding, in context; develop vocabulary and basic morphological awareness; and engage interest and boost self-efficacy within the context of reading. We hope you will join us for this important conversation. Featured download: Integrating Language Components: Examples from Power Readers®
Welcome to the NASCO Moments Podcast. This episode: INDIGENOUS LANGUAGES - HOW TO COLLECTIVELY RESCUE OUR DIFFERENT MOTHER TONGUES AS NIGERIANS. Our guest is Dr. Ehoma Akimemi, Head of the Department of Linguistics, University of Jos. Happy listening! NASCO Group. Click here to follow us on Instagram.
In this episode of the Structured Literacy Podcast, I discuss how English words can seem confusing yet are largely logical once we understand four key areas: phonics, orthography, morphology, and etymology. The truth is that spelling needs to be explicitly taught, just like reading. The challenge is finding a way to do that while building our knowledge of how words work. The full transcript can be found on the website www.jocelynseamereducation.com. Spelling Success in Action 2 - Prefixes and Suffixes is now available for pre-order. Morphology is important to teach our students: it improves spelling, vocabulary, comprehension and knowledge about parts of speech. Our program covers instruction from words to sentences with differentiated content. For more information, visit www.jocelynseamereducation.com Quick Links: Jocelyn Seamer Education Homepage; The Resource Room; The Evergreen Teacher; Shop; Youtube channel; Facebook Page. #jocelynseamereducation #literacy #bestpractice #earlyprimaryyears #primaryschool #primaryschools #primaryschoolteacher #earlyyearseducation #earlyyearseducator #structuredliteracy #scienceofreading #classroom #learning #learningisfun #studentsuccess #studentsupport #teacherlife #theresourceroom #theevergreenteacher #upperprimary #upperprimaryteacher #thestructuredliteracypodcast #phoneme #grapheme #phonics #syntheticphonics
Beth Williamson is a PhD student at Royal Holloway, University of London, working collaboratively with the Royal Geographical Society (with IBG). Her research explores how the Royal Geographical Society (RGS) tackled the problem of 'orthography' when recording and mapping place names in the nineteenth and early twentieth centuries, revealing how geography and linguistics, and politics and diplomacy, shaped the way the world was brought to 'order'. In this episode of our 'Narratives of Nation' series, Beth explores the circumstances leading up to the appointment of the Orthography Committee at the RGS and the actions the committee took to achieve a uniform system of orthography. Image credit: The Royal Geographical Society (with the Institute of British Geographers). Technecast is a podcast series showcasing research from across the arts and humanities. It is produced by Edwin Gilson, Felix Clutson, Izzi Sykes, Morag Thomas and Olivia Aarons. Fancy turning your research into a podcast episode? We'd be happy to hear from you at technecaster@gmail.com.
Ben Kantor has recently published two books on the pronunciation of NT Greek with Eerdmans: A Short Guide to the Pronunciation of New Testament Greek The Pronunciation of New Testament Greek: Judeo-Palestinian Greek Phonology and Orthography from Alexander to Islam In this episode of the Biblical Languages Podcast, host Kevin Grasso interviews Ben on his new books. They discuss how we can know what NT Greek sounded like, different pronunciation systems in use in the first century, the importance of pronunciation, and what languages Jesus and other Jews most likely spoke in first century Palestine. Benjamin Paul Kantor is a Research Associate at the University of Cambridge in the United Kingdom. He received his B.A. in Classical Studies with an emphasis in Greek from the Hebrew University of Jerusalem in 2012. Subsequently, he received his Ph.D. in Hebrew Bible from the University of Texas in 2017. He specializes in the historical phonology of Greek and Hebrew and has particular interest in ancient Greek and Hebrew pedagogy. In addition to his research work, he also runs a website, KoineGreek.com, which focuses on providing “living language” resources for students and scholars of ancient Greek. As always, this episode is brought to you by Biblingo, the premier solution for learning, maintaining, and enjoying the biblical languages. Visit biblingo.org to learn more and start your 10-day free trial. If you enjoy this episode, be sure to subscribe on your favorite podcast app and leave us a review. You can also follow Biblingo on social media @biblingoapp to discuss the episode with us and other listeners.
Orthography is a noun that refers to the conventional spelling system of a language. The Greek word orthos (OR those) means 'correct,' while the suffix G-R-A-P-H-Y comes from the Greek word for 'writing.' So our word of the day may be used to refer to the way of spelling a word. Here's an example: My daughter is such a word nerd that when I asked her how cat was spelled, she gave me a ten-minute dissertation on the orthography of the word. I didn't want to know why the word was spelled that way — just how it was spelled.
Have you ever wondered what it takes to write down a language for the first time? In this episode, we interview Mike Cahill, the Orthography Services Coordinator for SIL. He helps us walk through the factors that need to be considered when creating a writing system, or orthography, for a language. When an orthography is usable and acceptable, it goes a long way in making a Bible translation usable and acceptable as well!
This hour: spelling — what it is, why it matters, and why some of us actually find it fun. There will be a test. GUESTS: Deb Amlen: Crossword columnist and senior staff editor of the crossword column Wordplay for The New York Times. Richard Gentry: Education consultant and the author, most recently, of the Spelling Connections series. Peter Sokolowski: Editor at large at Merriam-Webster and a member of the Word Panel for the Scripps National Spelling Bee. The Colin McEnroe Show is available as a podcast on Apple Podcasts, Spotify, Google Podcasts, Stitcher, or wherever you get your podcasts. Subscribe and never miss an episode! Subscribe to The Noseletter, an email compendium of merriment, secrets, and ancient wisdom brought to you by The Colin McEnroe Show. Join the conversation on Facebook and Twitter. Colin McEnroe, Taylor Doyle, Jacob Gannon, Jonathan McNicol, Cat Pastor, and Lily Tyson contributed to this show, which originally aired December 6, 2022. Support the show: http://www.wnpr.org/donate
Spelling things out, reflections, new parkrunner joy, Nicola's European jolly highlights and Danny caught the boog at East Park parkrun in Wolverhampton.
Thank you for streaming this podcast episode with The Experts About Nothing! We continue to discuss the playoffs, how a leader leads, whether Jamie Foxx is the best entertainer of all time, and more. See the live video playback on Facebook, YouTube, or Twitch. Subscribe to see more of what we do! Just follow us for more. Link with us: https://linktr.ee/ExpertsAboutNothingPodcast
Merriam-Webster's Word of the Day for April 25, 2023 is: orthography or-THAH-gruh-fee noun Orthography refers to “correct spelling,” or “the art of writing words with the proper letters according to standard usage.” // As the winner of several spelling bees, she impressed her teachers with her exceptional grasp of orthography. See the entry > Examples: “What makes [poet John] Ashbery difficult ... is nonetheless different from what makes his ‘modernist precursors' like Pound and Eliot difficult. It requires no supplemental linguistic, historical, philosophical, or literary knowledge to appreciate. ... His verse rarely relies on outright violations of the norms of syntax, orthography, or page layout to achieve its effects. Rather, it tends to be composed of grammatically well-formed units combined in such a way as to produce semantically nonsensical wholes.” — Ryan Ruby, The Nation, 27 Jan. 2022 Did you know? The concept of orthography (a term that comes from the Greek words orthos, meaning “right or true,” and graphein, meaning “to write”) was not something that really concerned English speakers until the introduction of the printing press in England in the second half of the 15th century. From that point on, English spelling became progressively more uniform. Our orthography has been relatively stable since the 1755 publication of Samuel Johnson's A Dictionary of the English Language, with the notable exception of certain spelling reforms, such as the change of musick to music. Incidentally, many of these reforms were championed by Merriam-Webster's own Noah Webster.
On this episode of Linguistics Everyday, Ed, Cara, and Drew discuss the Manchu language, the Jurchen people, and a little bit about the history of China. Contact us at @LinguisticsEver or email us at LinguisticsEveryday@gmail.com Some papers: Language Death and Language Revivalism: The Case of Manchu by Daniel Kane; The Manchu Academy of Beijing by Laura E. Hess; Manchu-Chinese Bilingual Compositions and Their Verse-Technique by Giovanni Stary; Some Observations on a Rubbing of a 17th-Century Inscription in Uighur-Mongolian Script with Elements of Manchu Script and Orthography by Hsiao Su-yin; The Legitimization of the Qing Dynasty by Piero Corradini
This hour Colin and his guests school us on spelling – what it is, why it matters, and why some of us actually find it fun. There will be a test. GUESTS: Peter Sokolowski: Editor at large at Merriam-Webster. He contributed definitions to the brand-new Seventh Edition of the Official Scrabble Players Dictionary and has just been made a member of the Word Panel for the Scripps National Spelling Bee. Richard Gentry: Education consultant and a former university professor, reading center director, and elementary school teacher. He has most recently published the spelling-book series Spelling Connections: A Word Study Approach for grades 1-6. Deb Amlen: Crossword columnist and senior staff editor of the crossword column "Wordplay" for The New York Times. She also writes the weekly "Diary of a Spelling Bee Fanatic" column. The Colin McEnroe Show is available as a podcast on Apple Podcasts, Spotify, Google Podcasts, Stitcher, or wherever you get your podcasts. Subscribe and never miss an episode! Join the conversation on Facebook and Twitter. Colin McEnroe, Lily Tyson, Jonathan McNicol, Taylor Doyle, Jacob Gannon, and Cat Pastor contributed to this show. Support the show: http://www.wnpr.org/donate
We discuss everything from the Latin alphabet to Egyptian hieroglyphs, and somehow manage to digress into how we only scream in vowels.
Why do foreigners in Japan speak in Katakana? Well, Bobby does it because the TV director told him it's funnier that way. But Dr. Wes Robertson has actual NON-anecdotal research into writing system/pronoun choices and how calling yourself ORE isn't conveying what you think it is. Ollie reminds us that loose lips sink ships. Bobby comes to terms. Topics discussed on this episode range from: Brian getting himself into some hot water; scholars of the sakoku-jidai; the new entry ban and the online Twitter reactions; how the entry ban is going to affect TV; explaining jokes; the Japanese trendy words of the year and what we think should win (in which we totally overlook the eventual winner because we don't care about Shohei Ohtani); Ollie getting confused between the News and the Extras; how a McDonald's spokesperson launched both Wes and Bobby's careers; why foreigner Japanese is represented in katakana; Wes's research into using variations in script and what it's intended to convey; other reasons for using katakana; reasons for varying scripting choices outside of the general conventions (katakana for loan words, etc.); the affective associations with different writing systems; the writing traps that non-native Japanese learners often fall into; our word choices as Japanese learners and how we might be adding social nuance that we don't intend to add; using excessive/unnecessary kanji; choices and turtles all the way down; pronoun choices and how much of these choices are conscious or subconscious for Japanese people/Japanese authors; the extensive inferences that Japanese people can make about a writer based on their scripting choices; how technology has changed scripting choices, and what that can reveal about the people who make those choices; young Japanese women pretending to be lecherous old men online because why not; and the phenomenon of trying to represent another demographic's language trends and getting it wrong. Content links: The Sociolinguistics of Japanese Script. Topics on this week's extras include: this week's extras are 40-plus minutes of intense Japanese linguistic analysis applied to joke writing and punnery, and also to Bobby's 6-year-old daughter's first heartbreak. Yes, really. And we talk more about the trending words, including "Ussee-wa." We get into some of the cultural reasons why trying to tell Japanese people you're just joking doesn't work and discover who Wes's least favorite comedian is. It's very insightful and fascinating stuff from Wes, and you'll only ever get to listen to it by supporting the podcast for less than $1 an episode by becoming a member at http://buymeacoffee.com. Have something you'd like to say? Send us a fax at japanbyrivercruise.com or tweet to us at @jbrcpod Social media links: Dr Wes Robertson: Twitter | WordPress | Book | Wes's Death Metal Linguistics Podcast: Lingua Brutallica. Ollie Horn: Twitter | Instagram. Bobby Judo: Twitter | Instagram | YouTube. Other things to click on (some are affiliate links because we're sell-outs): We record remotely using Squadcast and the podcast is hosted on Transistor. Bobby uses the Samson Go Mic and Ollie uses the AT2005USB mic.
With the Scripps National Spelling Bee back after a Covid-enforced year off, we conduct our very own spelling quiz. Also, Kavita Pillay offers her take on why Indian American kids perform so well in spelling bees. And author and self-described “crummy" speller David Wolman tells us why he wrote a history of English spelling and the many attempts to reform it. Photo of a spelling bee in Fulton, MD, by Howard County Library System via Flickr/Creative Commons. Music in this episode by Cloudline, Podington Bear and Alexander Boyes. Read a transcript of the episode here.
The mad lad finally went and did it. I even made graphic explainers for the Patreon post. FULL EPISODE HERE: https://www.patreon.com/cornerspaeti HOW TO REACH US: Corner Späti https://twitter.com/cornerspaeti Julia https://twitter.com/KMarxiana Rob https://twitter.com/leninkraft Nick https://twitter.com/sternburgpapi Ciarán https://twitter.com/CiaranDold
In this episode, Nina Cnockaert-Guillou talks to Dr Nike Stam, an O'Donovan Scholar at the School of Celtic Studies of the Dublin Institute for Advanced Studies (DIAS). They discuss Celtic Studies, the Dublin Institute, Dr Stam's research, and the podcast she created called Ní hAnsae or 'not difficult' in Old Irish. What we mentioned in this episode [links also available at celticstudents.blogspot.com]: Utrecht University, Celtic Languages and Cultures (www.uu.nl/bachelors/en/celtic-languages-and-culture) School of Celtic Studies at DIAS (dias.ie/celt/) O'Donovan Scholarship (applications open, deadline 5 July) (www.dias.ie/2021/06/02/vacancy-odonovan-scholarship-5/) Irish Script on Screen Project (ISOS) (isos.dias.ie) Bibliography of Irish Linguistics and Literature (BILL) (bill.celt.dias.ie) Glór archive (www.dias.ie/celt/celt-publications-2/glor-audio-archive/glor-cork/) Celtic Studies Bookshop (shop.dias.ie) Stam, Nike. A Typology of Code-Switching in the Commentary to the Félire Óengusso. Utrecht, 2017. www.lotpublications.nl/a-typology-of-code-switching-in-the-commentary-to-the-f%c3%a9lire-%c3%b3engusso Dorleijn, Margreet, and Jacomine Nortier. "Code-Switching and the Internet." In The Cambridge Handbook of Linguistic Code-Switching, edited by Barbara Bullock and Almeida Jacqueline Toribio, 127–141. Cambridge, 2009. More info on the Félire Óengusso (www.vanhamel.nl/codecs/Félire_Óengusso). Have a look at the manuscripts on ISOS or on Digital Bodleian (digital.bodleian.ox.ac.uk). Further reading: Horst, Tom ter. Codeswitching in the Irish-Latin Leabhar Breac: Mediaeval Homiletic Culture. LOT 452. Utrecht, 2017. www.lotpublications.nl/codeswitching-in-the-irish-latin-leabhar-breac Newsletter of the School of Celtic Studies (sign up at www.dias.ie/2010/08/18/contact-us/) Ní hAnsae Podcast (www.dias.ie/ga/series/ni-hansae/). Production team: Christina Cleary, Margaret Irons, Nike Stam. Technical support: Andrew McCarthy. Multilingual MSS Conference: mmmc.celt.dias.ie/ The conference is over, sadly, but proceedings will be published! In the meantime, you can listen to the special Ní hAnsae episode here: www.dias.ie/ga/podcast/episode-7-celebrating-multilingualism/ Dr Stam's new project: www.uu.nl/en/news/an-opportunity-for-6-utrecht-humanities-scholars-to-further-develop-their-research-ideas Sebba, Mark. Spelling and Society: The Culture and Politics of Orthography around the World. Cambridge, 2007. "Lomax the Songhunter" documentary (www.youtube.com/watch?v=Zh7bw0s3ris) Mabinogi-Ogi (Stwnsh) (www.youtube.com/watch?v=bN6igaYvO8o) Episode in English, recorded in April 2021. Host: Nina Cnockaert-Guillou Guest: Nike Stam Music: "Kesh Jig, Leitrim Fancy" by Sláinte, CC BY-SA 3.0 US (creativecommons.org/licenses/by-sa/3.0/us/), available from freemusicarchive.org.
Imagine this book was written in Comic Sans. Would this choice impact your image of me as an author, despite causing no literal change to the content within? Generally, discussions of how language variants influence interpretation of language acts/users have focused on variation in speech. But it is important to remember that specific ways of representing a language are also often perceived as linked to specific social actors. Nowhere is this fact more relevant than in written Japanese, where a complex history has created a situation where authors can represent any sentence element in three distinct scripts. In Scripting Japan: Orthography, Variation, and the Creation of Meaning in Written Japanese (Routledge, 2020), Wesley Robertson provides the first investigation into the ways Japanese authors and their readers engage with this potential for script variation as a social language practice, looking at how purely script-based language choices reflect social ideologies, become linked to language users, and influence the total meaning created by language acts. Throughout the text, analysis of data from multiple studies examines how Japanese language users' experiences with the script variation all around them influence how they engage with, produce, and understand both orthographic variation and major social divides, ultimately evidencing that even the avoidance of variation can become a socially significant act in Japan. Jingyi Li is a PhD Candidate in Japanese History at the University of Arizona. She researches early modern Japan, literati, and commercial publishing.
In this episode, Claire talks to Kelly Ashley, a former teacher and current Primary English Specialist and author. Kelly starts by explaining how she moved from America to the UK. She explains her experience of the American schooling system as a teenager and young adult. She also talks about her university journey and what options were available to her. After choosing various subjects including anthropology, sociology and child psychology, Kelly decided to choose teaching as her career. She completed a two-year teaching course in America and, after meeting her husband, later moved countries. After qualifying and moving to North Carolina, Kelly visited different schools to secure a teaching job. She successfully found work in a large 5-form entry school as a Grade 3 (Year 2) class teacher. As she gained experience within the school, Kelly didn't shy away from leadership roles and climbed up the ladder relatively swiftly. However, she explains how she left the school, and America, after meeting her future husband and moved to the UK. After teaching for 6 years in America and halfway to completing her master's degree, Kelly's transition to the UK as a teacher was not as straightforward as she would have wished. She was informed that she needed to requalify as a teacher to teach in the UK, and she later requalified through the Graduate Teaching Programme. In this podcast, Kelly talks about her journey as a teacher in the US and UK. She talks about her transition between the two countries as a teacher and how she became an English specialist. Throughout the podcast, Kelly compares the different schooling systems and cultures in America and the UK. She shares the various strategies she has established and refined over the years to support children with closing the vocabulary gap, as well as ensuring they are exposed to a rich and varied language environment. She talks about her book and how it can support teachers in the classroom. KEY TAKEAWAYS Reading and writing workshop in America In this workshop, the teacher models a piece of text and the children have the opportunity to craft a text of their own. The workshops focused on children writing about personal interests. The text is explored from the perspective of both reader and writer, looking at how language features can be used to portray a certain message. The workshop did not have a text-focused approach due to the pressures of the curriculum. Improving the vocabulary of reluctant readers Finding a way to help children develop a love of reading can start with identifying their interests. Share stories to heighten children's engagement. The more teachers do this, the more it will help to connect with children's personal interests and their personal understanding. It is all about that motivation and understanding. Provide children with a range of texts and encourage them to read different texts based on their interests. Recommend different text types and books to help children develop their vocabulary and engagement with different texts. Closing the word gap Talk, talk and talk. In order to close the gap for children who don't have a wealth of language under the age of 3, it is essential to interact and communicate with them verbally. It is important to acknowledge the extent of word and text knowledge children have at the age of three. If they have not been exposed to nursery rhymes or stories, they will not have a wealth of vocabulary. Firstly, it is important to understand the amount of talk used with children. 
Secondly, how we can extend the talk to dialogical talk. Dialogical talk – clarifying or asking a follow-up question to an answer given, linking it to personal experience, and having a back-and-forth conversation. Develop on children's answers When children respond to answers, develop and ask questions about their answers with new vocabulary. Engage and keep children interacting with the dialogue and associate words to the experience to help them broaden their vocabulary. Drip feeding new language Find opportunities within the classroom setting to drip feed and introduce new language. This can be through play-based learning, role play, group discussions or other methods. Recharging: charge up the word by teaching them a new word in a variety of ways. It's the importance of recharging that word and giving them something to do with that word later. Challenging children and giving them the vocabulary and exposing them to rich language won't do them any harm. Storing vocabulary Even after vocabulary is processed through the auditory and visual channels, there is a further challenge of words coming out. There are two different types of language stores in our brain: Receptive store – something we receive. We receive language through reading, and we receive it through listening to people talk. Expressive vocabulary store – how we express our ideas and vocabulary through writing and speaking. Word of the day approach Research shows that, to be a fully functioning, literate adult, we need to have a vocabulary store of 50,000 – 60,000 words at the age of 16. In order to achieve this, children need to be exposed to 2,000 – 3,000 words every year up to the age of 16. If a child enters the school setting at the age of 3 with a significant word gap, they are already considerably behind the average child. However, it does not mean children need to be taught 2,000 – 3,000 words a year; it means children need to be exposed to a language-rich environment, as they will learn these words through talk. In addition to this is modelling and interacting through high-quality texts. Ashley's approach is a contextual-based approach. A contextual-based approach – teaching words in context to make play more engaging and interesting. After the context has been disclosed, how can the words be recharged and linked to their experience? The context must be strong and solid to ensure the word is rechargeable. The word must have a worthwhile purpose for the children. If it doesn't, the validity is questionable. Orthography and Phonology Orthography – visual or spelling. Writing a word and identifying words that start with the same letter string, i.e. 'swamp, swing, sweat, sweater.' Children may make a visual connection of the different words or they may make a visual connection to the last letter string 'mp', i.e. 'bump, lamp, chomp' etc. Phonology – the sounds of the words in our language. Repeat the words in different tones and pitches, segmenting the word and getting children to repeat the word. Activate the understanding of the word, i.e. 'what would and wouldn't you see in a swamp?' Morphology Morphology – changing an aspect. Morpheme – the smallest unit of meaning in a word. Swamp holds meaning. However, 'swamped' has a different meaning and has two morphemes. If we get an understanding of the root word, it will help children understand the different morphemes associated with that root word. This supports the concept of word families in the National Curriculum. Etymology Etymology – the history of words in our language. 
Getting children to investigate how words have arrived in our language and how they have changed over time. BEST MOMENTS "The American [schooling] system is really different from the UK system." "As soon as I got into [teaching] I was absolutely hooked." "I just drove around to different primary schools with my resume and I just went into the office and said, 'Are you looking for any teachers?' This was literally two weeks before schools started." "It was a massive culture shock, educational culture shock, personal culture shock, everything." "I was seconded to support the North Yorkshire English team. That eventually led to a position coming open. I applied and then I was working as a National Strategy Consultant." "At the heart of it, whether you have a single age class or a mixed age class you need to be catering for the needs of all of your learners. I think the biggest challenge for me was getting to grips with the change in curriculum and the curriculum expectations. Whilst I was in America, I was very familiar with what children needed to know and when they needed to know it. That was the challenge: more getting to grips with the expectations and what they should be achieving when. But the basic principles of understanding what are children doing and what do they need to do next, it was still applicable even though I had a mixed age class. It was thinking about, 'how can we ensure that that offer really challenges the children in the most appropriate way?'" "The approaches to teaching back then [in America], especially in terms of literacy were a lot more holistic. You saw a lot of things like readers' and writers' workshop which, really interestingly, are coming back now." "Education swings in roundabouts. There are some core principles; we have this great way in education of renaming the same thing." "I had to almost relearn how to spell certain things." "You could, theoretically, walk into a classroom in the US and still feel quite at home, even though the curriculum is still quite different to how we shape the curriculum in the UK." "Sharing stories to try and heighten that interest. The more that you can do to help children to connect what they are reading to their personal interests and their personal understanding. It is all about that motivation and understanding. What reading materials are they having access to? Giving them a choice." "As an adult it means you need to have a good knowledge and understanding of what's out there. Who are the new authors? Who are the authors that have been out there?" "It's about going and exploring books… help the child to see the connections that we can make." "If you hook onto an author or style that the child is really into, it's really exploiting that and thinking is there something I can do here to engage the talk, engage the love of language, get them to explore that technical vocabulary… that will just open up their interest a bit more. It is about finding books that match their interest but also finding books that broaden their interest." "If we want to make that dialogic, we might say, 'Oh blueberries, I really like blueberries. What's your favourite part of your breakfast meal?' We might ask them a follow-up question or ask them to clarify, or we might link it to a personal experience. 
It's that dialogue - back and forth conversation - that will help children to find themselves within language, but also to better articulate themselves." "Repeating that word in a sentence is called recasting, helping them to get the structure of the language." "Limiting vocabulary in any way is never really a good idea." "The speaking and repetition are really key." VALUABLE RESOURCES Kelly Ashley: https://kellyashleyconsultancy.wordpress.com/ Kelly Ashley Consultancy: https://kellyashleyconsultancy.wordpress.com/vocabulary-development/ Dinosaur Dig: https://www.amazon.co.uk/Dinosaur-Dig-Penny-Dales-Dinosaurs/dp/0857630946 The Thirty Million Word Gap (Hart & Risley): http://www.wvearlychildhood.org/resources/C-13_Handout_1.pdf Bringing Words to Life (Isabel Beck): https://www.amazon.co.uk/Bringing-Words-Life-Second-Instruction-ebook/dp/B00BHYG41M/ Oli Cav: https://www.olicav.com/ Details for the Giveaway: https://www.facebook.com/ClassroomSecretsLimited/ The Teachers' Podcast: https://www.facebook.com/groups/TheTeachersPodcast/ Classroom Secrets Facebook: https://www.facebook.com/ClassroomSecretsLimited/ Classroom Secrets website: https://classroomsecrets.co.uk/ LIFE/work balance campaign: https://classroomsecrets.co.uk/lifeworkbalance-and-wellbeing-in-education-campaign-2019/ ABOUT THE HOST Claire Riley Claire, alongside her husband Ed, is one of the directors of Classroom Secrets, a company she founded in 2013 and which provides outstanding differentiated resources for teachers, schools, parents and tutors worldwide. Having worked for a number of years as a teacher in both Primary and Secondary education, and experiencing first-hand the difficulties teachers were facing finding appropriate high-quality resources for their lessons, Claire created Classroom Secrets with the aim of helping reduce the workload for all school staff. Claire is a passionate believer in a LIFE/work balance for those who work in education, citing the high percentage of teachers who leave or plan to leave their jobs each year. Since February 2019, Classroom Secrets has been running their LIFE/work balance campaign to highlight this concerning trend. The Teachers' Podcast is a series of interviews where Claire meets with a wide range of guests involved in the field of education. These podcasts provide exciting discussions and different perspectives and thoughts on a variety of themes which are both engaging and informative for anyone involved in education.
My guest today is Carl Hoffman, the CEO of Basis Technology, and a specialist in text analytics. Carl founded Basis Technology in 1995, and in 1999, the company shipped its first products for website internationalization, enabling Lycos and Google to become the first search engines capable of cataloging the web in both Asian and European languages. In 2003, the company shipped its first Arabic analyzer and began development of a comprehensive text analytics platform. Today, Basis Technology is recognized as the leading provider of components for information retrieval, entity extraction, and entity resolution in many languages. Carl has been directly involved with the company's activities in support of U.S. national security missions and works closely with analysts in the U.S. intelligence community. Many of you work all day in the world of analytics: numbers, charts, metrics, data visualization, etc. But today we're going to talk about one of the other ingredients in designing good data products: text! As an amateur polyglot myself (I speak decent Portuguese, Spanish, and am attempting to learn Polish), I really enjoyed this discussion with Carl. If you are interested in languages, text analytics, search interfaces, entity resolution, and are curious to learn what any of this has to do with offline events such as the Boston Marathon Bombing, you're going to enjoy my chat with Carl. We covered: How text analytics software is used by border patrol agencies and its limitations. The role of humans in the loop, even with good text analytics in play. What actually happened in the case of the Boston Marathon Bombing? Carl's article "Exact Match" Isn't Just Stupid. It's Deadly. The two lessons Carl has learned regarding working with native-tongue source material. Why Carl encourages Unicode compliance when working with text, why having a global perspective is important, and how Carl actually implements this at his company. Carl's parting words on why hybrid architectures are a core foundation to building better data products involving text analytics. Resources and Links: Basis Technology Carl's article: "Exact Match" Isn't Just Stupid. It's Deadly. Carl Hoffman on LinkedIn Quotes from Today's Episode "One of the practices that I've always liked is actually getting people that aren't like you, that don't think like you, in order to intentionally tease out what you don't know. You know that you're not going to look at the problem the same way they do…" — Brian O'Neill "Bias is incredibly important in any system that tries to respond to human behavior. We have our own innate cultural biases that we're sometimes not even aware of. As you [Brian] point out, it's impossible to separate human language from the underlying culture and, in some cases, geography and the lifestyle of the people who speak that language…" — Carl Hoffman "What I can tell you is that context and nuance are equally important in both spoken and written human communication…Capturing all of the context means that you can do a much better job of the analytics." — Carl Hoffman "It's sad when you have these gaps like what happened in this border crossing case where a name spelling is responsible for not flagging down [the right] people. I mean, we put people on the moon and we get something like a name spelling [entity resolution] wrong. 
It’s shocking in a way.” — Brian O’Neill “We live in a world which is constantly shades of gray and the challenge is getting as close to yes or no as we can.”– Carl Hoffman Episode Transcript Brian: Hey everyone, it’s Brian here and we have a special edition of Experiencing Data today. Today, we are going to be talking to Carl Hoffman who’s the CEO of Basis Technology. Carl is not necessarily a traditional what I would call Data Product Manager or someone working in the field of creating custom decision support tools. He is an expert in text analytics and specifically Basis Technology focuses on entity resolution and resolving entities across different languages. If your product, or service, or your software tool that you’re using is going to be dealing with inputs and outputs or search with multiple languages, I think your going to find my chat with Carl really informative. Without further ado here’s my chat Mr. Carl Hoffman. All right. Welcome back to Experiencing Data. Today, I’m happy to have Carl Hoffman on the line, the CEO of Basis Technology, based out of Cambridge, Massachusetts. How’s it going, Carl? Carl: Great. Good to talk to you, Brian. Brian: Yeah, me too. I’m excited. This episode’s a little but different. Basis Tech primarily focuses on providing text analytics more as a service as opposed to a data product. There are obviously some user experience ramifications on the downstream side of companies, software, and services that are leveraging some of your technology. Can you tell people a little bit about the technology of Basis and what you guys do? Carl: There are many companies who are in the business of extracting actionable information from large amounts of dirty, unstructured data and we are one of them. But what makes us unique is our ability to extract what we believe is one of the most difficult forms of big data, which is text in many different languages from a wide range of sources. You mentioned text analytics as a service, which is a big part of our business, but we actually provide text analytics in almost every conceivable form. As a service, as an on-prem cloud offering, as a conventional enterprise software, and also as the data fuel to power your in-house text analytics. There’s another half of our business as well which is focused specifically on one of the most important sources of data, which is what we call digital forensics or cyber forensics. That’s the challenge of getting data off of digital media that maybe either still in use or dead. Brian: Talk to me about dead. Can you go unpack that a little bit? Carl: Yes. Dead basically means powered off or disabled. The primary application there is for corporate investigators or for law enforcement who are investigating captured devices or digital media. Brian: Got it. Just to help people understand some of the use cases that someone would be leveraging some of the capabilities of your platforms, especially the stuff around entity resolution, can you talk a little bit about like my understanding, for example, one use case for your software is obviously border crossings, where your information, your name is going to be looked up to make sure that you should be crossing whatever particular border that you’re at. Can you talk to us a little bit about what’s happening there and what’s going on behind the scenes with your software? Like what is that agent doing and what’s happening behind the scenes? What kind of value are you providing to the government at that instance? 
Carl: Border crossings or the software used by border control authorities is a very important application of our software. From a data representational challenge, it's actually not that difficult because for the most part, border authorities work with linear databases of known individuals or partially known individuals and queries. Queries may take the form of a name manually typed by an officer, or may be a scan of a passport. The complexity comes in when a match must be scored, where a decision must be rendered as to whether a particular query or a particular passport scan matches any of the names present on a watch list. Those watch lists can be in many different formats. They can come from many different sources. Our software excels at performing that match at very high accuracy, regardless of the nature of the query and regardless of the source of the underlying watch list. Brian: I assume those watch lists may vary in the level of detail around, for example, aliases, spelling, which alphabet they were being printed in. Part of the value of what your service is doing is helping to say, "At the end of the day, entity number seven on the list is one human being who may have many ways of being represented with words on a page or a screen," so the goal obviously is to make sure that you have the full story of that one individual. Am I correct that you may get that in various formats and different levels of detail? And part of what your system is doing is actually trying to match up that person, or give, as you say, a non-binary response but a match score or something that's more of a gray response that says, "This person may also be this person." Can you unpack that a little bit for us? Carl: Your remarks are exactly correct. First, what you said about gray is very important. These decisions are rarely 100% yes or no. We live in a world which is constantly shades of gray and the challenge is getting as close to yes or no as we can. But the quality of the data in watch lists can vary pretty wildly, based on the prominence and the number of sources. The US border authorities must compile information from many different sources, from UN, from Treasury Department, from National Counterterrorism Center, from various states, and so on. The amount of detail and the degree of our certainty regarding that data can vary from name to name. Brian: We talked about this when we first were chatting about this episode. Am I correct when I think about one of the overall values you're doing is obviously we're offloading some of the labor of doing this kind of entity resolution or analysis onto software and then picking up the last mile with a human, to say, "Hey, are these recommendations correct? Maybe I'll go in and do some manual labor." Is that how you see it, that we do some of the initial grunt work and you present an almost finished story, and then the human comes in and needs to really provide that final decision at the endpoint? Are we doing enough of the help with the software? At what point should we say, "That's no longer a software job to give you a better score about this person. We think that really requires a human analysis at this point." Is there a way to evaluate, or is that what you think about, like, "Hey, we don't want to go past that point. We want to stop here because the technology is not good enough or the data coming in will never be accurate enough and we don't want to go past that point." I don't know if that makes sense. Carl: It does make sense. 
I can’t speak for all countries but I can say that in the US, the decision to deny an individual entry or certainly the decision to apprehend an individual is always made by a human. We designed our software to assume a human in the loop for the most critical decisions. Our software is designed to maximize the value of the information that is presented to the human so that nothing is overlooked. Really, the two biggest threats to our national security are one, having very valuable information overlooked, which is exactly what happened in the case of the Boston Marathon bombing. We had a great deal of information about Tamerlan and Dzhokhar Tsarnaev, yet that information was overlooked because the search engines failed to surface it in response to queries by a number of officials. And secondly, detaining or apprehending innocent individuals, which hurts our security as much as allowing dangerous individuals to pass. Brian: This has been in the news somewhat but talk about the “glitch” and what happened in that Boston Marathon bombing in terms of maybe some of these tools and what might have happened or not what might have happened, but what you understand was going on there such that there was a gap in this information. Carl: I am always very suspicious when anyone uses the word ‘glitch’ with regard to any type of digital equipment because if that equipment is executing its algorithm as it has been programmed to do, then you will get identical results for identical inputs. In this case, the software that was in use at the time by US Customs and Border Protection was executing a very naive name-matching algorithm, which failed to match two different variant spellings of the name Tsarnaev. If you look at the two variations for any human, it would seem almost obvious that the two variations are related and are in fact connected to the same name that’s natively written in Cyrillic. What really happened was a failure on the part of the architects of that name mentioning system to innovate by employing the latest technology in name-matching, which is what my company provides. In the aftermath of that disaster, our software was integrated into the border control workflow, first with the goal of redacting false-positives, and then later with the secondary goal of identifying false negatives. We’ve been very successful on both of those challenges. Brian: What were the two variants? Are you talking about the fact that one was spelled in Cyrillic and one was spelled in a Latin alphabet? They didn’t bring back data point A and B because they look like separate individuals? What was it, a transliteration? Carl: They were two different transliterations of the name Tsarnaev. In one instance, the final letters in the names are spelled -naev and the second instance it’s spelled -nayev. The presence or absence of that letter y was the only difference between the two. That’s a relatively simple case but there are many similar stories for more complex names. For instance, the 2009 Christmas bomber who successfully boarded a Northwest Delta flight with a bomb in his underwear, again because of a failure to match two different transliterations of his name. But in his case, his name is Umar Farouk Abdulmutallab. There was much more opportunity for divergent transliterations. Brian: On this kind of topic, you wrote an interesting article called “Exact Match” Isn’t Just Stupid. It’s Deadly. You’ve talked a little bit about this particular example with the Boston Marathon bombing. 
Brian: On this kind of topic, you wrote an interesting article called "'Exact Match' Isn't Just Stupid. It's Deadly," and you've talked a little bit about this particular example with the Boston Marathon bombing. You also mentioned thinking globally when building a product. Can you talk to us a little about what it means to think globally?
Carl: Sure. Thinking globally is really a mindset and an architectural philosophy in which systems are built to accommodate multiple languages and cultures. This is an issue not just with the spelling of names but with support for multiple writing systems; different ways of rendering and formatting personal names; and different ways of rendering, formatting, and parsing postal addresses, telephone numbers, dates, times, and so on. The format of a questionnaire in Japanese is quite different from the format of a questionnaire in English. If you look at any complex global software product, there's a great deal of work that must be done to accommodate the needs of a worldwide user base.
Brian: Sure, and you're a big fan of Unicode-compliant software, am I correct?
Carl: Yes. Building in Unicode compliance is equivalent to laying a solid, stable foundation for an office tower. It only gets you to the ground floor, but without it, the rest of the tower starts to lean, like the one in San Francisco right now.
Brian: I haven't heard about that.
Carl: There's a whole tower that's tipping over. You should read about it. It's a great story.
Brian: Foundation's not so solid.
Carl: Big lawsuit's going on right now.
Brian: Not the place you want to have a sagging tower either.
Carl: Not the place, but frankly, it's really quite comparable, because I've seen some large systems, which shall go unnamed, built on legacy technology where people are perhaps unaware why it's so important to move from Python version 2 to Python version 3. One of the key differences is Unicode compliance. So if I hear about a large-scale enterprise system that's based on Python version 2, I'm immediately suspicious that it's not going to be suitable for a global audience.
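Carl's Python 2 versus 3 point is concrete: in Python 3, strings are Unicode by default, and the standard library exposes the normalization and case folding that global text needs. A small standard-library sketch of what that buys:

```python
# What 'Unicode compliance' buys you in Python 3 (strings are Unicode
# by default). Standard library only; illustrative sketch.
import unicodedata

# Canonically equivalent spellings compare equal only after normalization:
composed = "\u00e9"        # 'é' as a single code point
decomposed = "e\u0301"     # 'e' plus a combining acute accent
print(composed == decomposed)                    # False
print(unicodedata.normalize("NFC", composed) ==
      unicodedata.normalize("NFC", decomposed))  # True

# Case folding handles comparisons that naive lowercasing gets wrong:
print("STRASSE".lower() == "straße".lower())         # False
print("STRASSE".casefold() == "straße".casefold())   # True
```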
Brian: I think about, from an experience standpoint, inputs: when you're providing inputs into forms, understanding what people are typing in. If it's a query form, it's obviously about giving people back what they wanted and not necessarily what they typed in. We all take for granted things like spelling correction, and not just spelling correction: in Google, when you type something in, it sometimes gives you something beyond a spelling fix, "Did you mean X, Y, and Z?" I would think that being informed about what people are typing into your form fields, and mining your query logs, is valuable; this is something I do sometimes with clients when they're trying to learn something. I actually just read an article today about dell.com: the top query term on dell.com is 'Google,' which is a very interesting thing. I would be curious to know why people are typing that in. Are people actually trying to access Google, or are they trying to get some information? But the point is to understand the input side and to try to return some kind of logical output. Whether it's text analytics providing that or name matching, it's about being aware of that, and it's sad when you have gaps like what happened in this border crossing case, where a name spelling is responsible for not flagging these people. I mean, we put people on the moon and we get something like a name spelling wrong. It's shocking in a way. I guess those of us who work in tech can understand how it might happen, but it's scary that it's still going on today. You've probably seen many others. Are you able to talk about them? Obviously, you have some clients in the intelligence field, and probably government, where you can't talk about the work, but are there other examples of learning that's happened, even if it's not necessarily entity resolution, where you've connected dots with some part of your platform?
Carl: I'll say the biggest lesson that I've learned from nearly two decades of working on government applications involving multilingual data is the importance of retaining as much of the information in its native form as possible. For example, there is a very large division of the CIA which is focused on collecting open source intelligence in the form of newspapers, magazines, the digital equivalents of those, radio broadcasts, TV broadcasts, and so on. It's a unit which used to be known as the Foreign Broadcast Information Service, going back to World War II times, and today it's called the Open Source Enterprise. They have a very large collection apparatus and they produce some extremely high quality products, which are summaries and translations of sources in other languages. In their workflow, previously, they would collect information, say in Chinese or in Russian, and then do a translation or summary into English, but then the original would be discarded or hidden from their enterprise architecture for query purposes. I believe that is no longer the case, but retaining the pre-translation original, whether it's open source, closed source, commercial, enterprise information, or government-related information, is really very important. That's one lesson. The other lesson is appreciating the limits of machine translation. We're increasingly seeing machine translation integrated into all kinds of information systems, but there needs to be a very sober appreciation of what is and what is not achievable and scalable by employing machine translation in your architecture.
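As a sketch of the "retain the original" lesson, here is one way to structure a collected document so the pre-translation text stays queryable alongside the English product. The field names are hypothetical, not any agency's actual schema.

```python
# Hypothetical record structure: the original is never discarded, and
# machine translation is flagged so its limits stay visible downstream.
from dataclasses import dataclass

@dataclass
class CollectedDocument:
    source: str           # publication or broadcast identifier
    language: str         # language of the original, e.g. "ru", "zh"
    original_text: str    # pre-translation text, retained verbatim
    english_text: str     # translation or summary for analysts
    machine_translated: bool

doc = CollectedDocument(
    source="example-daily",
    language="ru",
    original_text="...",   # kept, and indexable in its own script
    english_text="...",
    machine_translated=True,
)
```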
Brian: Can you talk at all about the translation side? We have so much power now with NLP and what's possible with the technology today. As I understand it, when we talk about translation, we're talking about documents and things in the written word being translated from one language to another. But consider the spoken word: we're communicating right now, and I'm going to ask you two questions. "What do you know about NLP?" And: "What do you know about NLP?" In the first one I had a little bit of attitude, which assumes that you don't know too much about it, and in the second one I was treating you as an expert. When this gets transcribed into text, it loses that context. Where are we with the ability to look at the context, the tone, the sentiment behind the words? I would imagine that's partly why you talk about saving the original source. It might provide some context: what the headlines were in the paper, which paper wrote it, whether that paper has a bias. Having the full article that a report came from can provide additional context. Humans are probably better at doing some of that initial eyeball analysis, or having some idea of historically where an article is coming from, such that they can put it in context, as opposed to just seeing the words in a native language on a computer screen. Can you talk a little bit about that, or where we are with it? And am I incorrect that we're not able to look at that sentiment? I don't even know how that would translate, necessarily, unless you played back a recording of someone saying the words. You have translation on top of the sentiment; now you've got two factors of difficulty right there in getting it accurate.
Carl: My knowledge of voice and speech analysis is very naive. I do know it's an area of huge investment and the technology is progressing very rapidly. I suspect that voice models are already being built that can distinguish between the two different intonations you used in asking that question and are able to match those against knowledge bases separately. What I can tell you is that context and nuance are equally important in both spoken and written human communication. My knowledge is stronger when it comes to the written form. Capturing all of the context means that you can do a much better job of the analytics. That's why, say, when we're analyzing a document, we're looking not only at the individual word but at the sentence and the paragraph: where does the text appear? Is it in the body? Is it in a heading? Is it in a caption? Is it in a footnote? Or if we're looking at, say, human-typed input (I think this is where your audience would care, if you're designing forms or search boxes), there's a lot that can be determined from how the input is typed, especially when you're thinking globally. We're familiar with typing English, completing queries or forms with the letters A through Z and the numbers 0 through 9, but the fastest-growing new orthography today is emoticons and emoji, which offer a lot of very valuable information about the mindset of the author. Or take Chinese or Japanese, which are basically written with thousand-year-old emoji, where an individual must type a sequence of keys in order to create each of the Kanji or Hanzi that appears. There's a great deal of information we can capture there. For instance, suppose I'm completing a form in Japanese, say filling in my last name, and my last name is Tanaka. Well, I'm going to type some characters that represent Tanaka phonetically, either in Latin letters or in one of the Japanese phonetic writing systems; then I'm going to pick from a menu, or the system is going to automatically pick for me, the Japanese characters that represent Tanaka. But any really capable input system is going to keep both whatever I typed phonetically and the Kanji that I selected, because both of those have value, and the association between the two is not always obvious. There are similar ways of capturing context and meaning in other writing systems. For instance, let's say I'm typing Arabic not in Arabic script but with Roman letters. How I transliterate from those Roman letters into the Arabic alphabet may vary depending on whether I'm using Gulf Arabic, or Levantine Arabic, or Cairene Arabic, and, say, the IP address of the person doing the typing may factor into how I do that transformation and how I interpret those letters. There are examples for many other writing systems beyond the Latin alphabet.
Brian: I meant to ask you: do you speak any other languages, or do you study any other languages?
Carl: I studied Japanese for a few years in high school. That's really what got me into using computers to facilitate language understanding. I just never had the ability to really quickly memorize all of the Japanese characters, the radical components, and the variant pronunciations. After spending countless hours combing through paper dictionaries, I got very interested in building electronic dictionaries. That interest eventually led to search engines and to lexicons, algorithms powered by lexicons, and then ultimately to machine learning and deep learning.
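Returning to the Japanese form-input point from a moment ago, it translates directly into a data-capture rule: keep every layer the user produced, since the selected kanji alone cannot recover the reading. A sketch with hypothetical field names:

```python
# Hypothetical capture of one Japanese form field. The kanji alone
# ("田中") do not tell you the reading was "tanaka"; keeping all three
# layers preserves that association for later matching and sorting.
from dataclasses import dataclass

@dataclass
class NameFieldCapture:
    raw_keystrokes: str   # what was physically typed (e.g. romaji)
    kana_reading: str     # the phonetic form the input method resolved
    selected_text: str    # the characters chosen from the IME menu

surname = NameFieldCapture(
    raw_keystrokes="tanaka",
    kana_reading="たなか",
    selected_text="田中",
)
```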
Brian: I'm curious. I assume you need to employ either linguists or at least people who speak multiple languages. One concern with advanced analytics right now, and especially anything involving prediction, is bias. I speak a couple of different languages, and I think one of the coolest things about learning another language is seeing the world through another context. Right now I'm learning Polish, and there's the concept of case, and it doesn't just come down to learning the prefixes and suffixes that are added to words. Effectively that's what the output is, but it's also understanding the nuance of when you would use a form and what you're trying to convey, and when you relate it back to your own language, we don't even have an equivalent; we would never divide this verb into two different senses. So you start to learn what you don't even know to think about. I guess what I'm asking is: how do you capture those things? Say, in our case, I assume you're an American and I am too, so we have the English we grew up with and our context for it. How do you avoid bias? Do you think about bias? How do you build these systems when you're approaching them from a single language? Ultimately, this code is probably written in English, I assume. Not to say that the code would be written in a different language, but when you're building all these systems that have to do with language, where does integrating people who speak other languages come in? Can you talk about that a little bit?
Carl: Bias is incredibly important in any system that tries to respond to human behavior. We have our own innate cultural biases that we're sometimes not even aware of. As you point out, it's impossible to separate human language from the underlying culture and, in some cases, the geography and the lifestyle of the people who speak that language. Yes, this is something that we think about. I disagree with your remark about code being written in English, though. The most important pieces of code today are the frameworks for implementing various machine learning and deep learning architectures, and these architectures are, for the most part, language- and domain-agnostic. The language bias tends to creep in as an artifact of the data that we collect. If I were to, say, harvest a million pages randomly from the internet, a very large percentage of those pages would be in English, out of proportion to the share of the planet's population that speaks English, just because English is the common language for commerce, science, and so on. The bias comes in from the data, or it comes in from the mindset of the architect, who may do something as simple-minded as allocating only eight bits per character or deciding that Python version 2 is an acceptable development platform.
Brian: Sure. I should say, I wasn't so much speaking about the script, the code, as I was thinking about the humans behind it, their background and the languages they speak, because the kinds of choices you're talking about are informed by that person's perspective. But thank you for clarifying.
Carl: I agree with that observation as well. You're certainly right.
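One way to make the data-side bias Carl describes visible and controllable, as a crude sketch with invented counts: measure the language distribution of a harvested corpus, then cap the over-represented language before training. Real mitigation strategies are far more nuanced than this.

```python
# Crude sketch: inspect language proportions in a corpus, then downsample
# so no language exceeds the size of the smallest. Counts are invented.
import random
from collections import Counter

corpus = [("en", "..."), ("en", "..."), ("en", "..."), ("zh", "..."), ("hi", "...")]

by_lang = Counter(lang for lang, _ in corpus)
print(by_lang)   # English dominates, out of proportion to speakers

cap = min(by_lang.values())
balanced, taken = [], Counter()
for lang, text in random.sample(corpus, len(corpus)):  # shuffled copy
    if taken[lang] < cap:
        balanced.append((lang, text))
        taken[lang] += 1
print(Counter(lang for lang, _ in balanced))  # one document per language
```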
Brian: Do you have a way of checking yourselves? You're experts in this area and you're obviously heavily invested in it. Are there things that you have to do to prevent that bias, along the lines of, "We know what we don't know about it, or we know enough about it to know what we don't, so we have a checklist or something we go through to make sure we're checking ourselves"? Or is it more the data collection phase that you worry about, rather than the code that's actually going to take the data and generate the software's value at the other end? Is it more on the collection side? How do you prevent it? How do you check yourself, or tell a client or customer, "Here's how we've tried to make sure that the quality of what we're giving you is good. We did A, B, C, and D"? Maybe I'm making a bigger issue out of this than it is. I'm not sure.
Carl: No, it is a big issue. The best way to minimize that cultural bias is by building global teams. That's something that we've done from the very beginning days of our company. We have a company in which, collectively, the team speaks over 20 languages and comes from many different countries, and we do business in many countries around the world. That's just been an absolute necessity, because we produce products that are proficient in 40 different human languages. If you're a large enterprise, more than 500 people, and you're targeting markets globally, then you need to build a global team. That applies to all the different parts of the organization, including the executive team. It's rare that you will see individuals from, say, American culture with no meaningful international experience being successful in any kind of global expansion.
Brian: That's pretty awesome that you have that many languages among the staff at the company. That's cool, and I think it does provide a different perspective. We talk about this even in the design field. Sometimes early design managers will want to hire a lot of people who look like they do, not necessarily physically, but in terms of skill set. One of the practices I've always liked is actually getting people who aren't like you, who don't think like you, in order to intentionally tease out what you don't know. You know they won't look at the problem the same way you do, and you don't necessarily know what the output will be, but you learn that there are other perspectives to have, so too many like-minded individuals doesn't necessarily make things better. I think that's cool. Can you talk to me about one of the fun little nuggets that stuck in my head, which I think you attributed to somebody else: getting insights from medium data?
Carl: Sure. I should first start by crediting the individual who planted that idea in my head, Dr. Catherine Havasi of the MIT Media Lab, who's also a cofounder of a company called Luminoso, which is a partner of ours. They do common-sense understanding. The challenge with building truly capable text analytics from large amounts of unstructured text is obtaining sufficient volume. If you are a company on the scale of Facebook or Google, you have access to truly enormous amounts of text. I can't quantify it in petabytes or exabytes, but it is a scale much greater than that of the typical global enterprise or Fortune 2000 company, who themselves may have very massive data lakes.
But still, those data lakes are probably three to five orders of magnitude smaller than what Google or Facebook may have under their control. That intermediate-sized data, which is sloppily referred to as big data, we think of as medium data. We think about the challenge of allowing companies with medium data assets to obtain big-data-quality results, business intelligence comparable to what Google or Facebook might be able to obtain. We do that by building models that are hybrid, combining knowledge graphs or semantic graphs derived from very large open sources with the information that companies can extract from their proprietary data lakes, and using the open sources and the models that we build as amplifiers for their own data.
Brian: I believe when we were talking, you mentioned a couple of companies that are building products on top of yours: Difio, I think, was one, and Tamr, and Luminoso. Is that related to what these companies are doing?
Carl: Yes, it absolutely is related. Luminoso, in particular, is using this process of synthesizing results from their customers' proprietary data with their own models. The Luminoso team grew out of the team at MIT that built ConceptNet, which is a very large semantic network in multiple languages. But Difio as well is using this approach of federating open and closed source repositories by integrating a large number of connectors into their architecture. They have access to web content. They have access to various social media firehoses. They have access to proprietary data feeds from financial news providers. But then they fuse all that with internal sources of information that may come from places like SharePoint, or Dropbox, or Google Drive, or OneDrive, or local file servers, and then give you a single view into all of this data.
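A toy version of the "amplifier" idea, under invented numbers: trust a term representation learned from the company's own medium-sized corpus only in proportion to how much in-house evidence exists, and back off to one derived from large open sources otherwise. The blending rule, vectors, and counts are all made up for illustration; this is not Luminoso's or Basis Technology's method.

```python
# Toy sketch of blending in-house and open-source term vectors by
# in-house evidence. Everything here is invented for illustration.
import numpy as np

def hybrid_vector(term, own_vecs, open_vecs, own_counts, k=50.0):
    """More in-house occurrences -> more weight on the in-house vector."""
    n = own_counts.get(term, 0)
    w = n / (n + k)
    own = own_vecs.get(term, np.zeros(3))
    open_ = open_vecs.get(term, np.zeros(3))
    return w * own + (1 - w) * open_

own_vecs = {"churn": np.array([0.9, 0.1, 0.0])}   # from the proprietary data lake
open_vecs = {"churn": np.array([0.4, 0.4, 0.2])}  # derived from large open sources

print(hybrid_vector("churn", own_vecs, open_vecs, {"churn": 5}))     # mostly open-source
print(hybrid_vector("churn", own_vecs, open_vecs, {"churn": 5000}))  # mostly in-house
```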
Brian: Awesome. I don't want to keep you too long. This has been super informative for me, learning about the space that you're in. Can you leave us with any closing thoughts, advice for product managers or analytics practitioners? We talked a little about thinking globally and some of those areas. Any other closing thoughts about delivering good experiences, leveraging text analytics, things to watch out for?
Carl: Sure. I'll close with a few thoughts. One is repeating what I've said before about Unicode compliance. The fact that I have to state that again is somewhat depressing: it still isn't taken as the absolute requirement that it should be by now, and it continues to be overlooked. Secondly, thinking globally: with anything that you're building, you've got to think about a global audience. I'll share an anecdote. My company gives a lot of business to Eventbrite, who I would expect by now to have a fully globalized platform, but it turns out their utility for sending an email to everybody who signed up for an event doesn't work in Japanese. I found that out the hard way when I needed to send an email to everybody who had signed up for our conference in Tokyo. That was very disturbing, and I'm not afraid to say so live on a podcast. They need to fix it. You really don't want customers discovering something like that during a time of high stress and high pressure, and there's just no excuse for it. Then my third point, with regard to natural language understanding: this is a really, incredibly exciting time to be involved with natural language, with human language, because the technology is changing so rapidly and the space of what is achievable is expanding so rapidly. My final point of advice is that hybrid architectures have been and continue to be the best. There's a real temptation to say, "Just throw all of my text into a deep neural net and magic is going to happen." That can be true if you have sufficiently large amounts of data, but most people don't. Therefore, you're going to get better results by using hybrids of algorithmic, simpler machine-learning architectures together with deep neural nets.
Brian: That last tip, can you take that down one more notch? I assume you're talking about a level of quality at the tail end of the technology implementation: there's going to be some higher-quality output. Can you translate what a hybrid architecture means in terms of a better product at the other end? What would be an example of that?
Carl: Sure. It's hard to do without getting too technical, but I'll try, and I'll use some examples in English. I think the traditional way of approaching deep nets has very much been to take a very simple, potentially deep and recursive neural network architecture and just throw data at it, especially images or audio waveforms: I throw my images in and I want to classify which ones were taken outdoors and which ones were taken indoors, with no traditional signal processing or image processing added before or after. In the image domain, my understanding is that that kind of purist approach has delivered the best results; that's what I've heard, though I don't have first-hand information about it. However, when it comes to human language in its written form, there's a great deal of traditional processing of that text that boosts the effectiveness of the deep learning. That falls into a number of layers that I won't go into, but just to give you one example, let's talk about what we call orthography. The English language is relatively simple in that its orthography is generally quite simple: we've got the letters A through Z, in uppercase and lowercase, and that's about it. But if you look inside, say, a PDF of English text, you'll sometimes encounter things like ligatures, where a lowercase F followed by a lowercase I, or two lowercase Fs together, is replaced with a single glyph to make it look good in that particular typeface. If I take those glyphs and just throw them in with all the rest of my text, that actually complicates the job of the deep learning. If I take that FI ligature and convert it back to a separate F followed by an I, or the FF ligature and convert it back to FF, my deep learning doesn't have to figure out what those ligatures are about. Now, that seems pretty obscure in English, but in other writing systems, especially Arabic, in which there's an enormous number of ligatures, or Korean, or languages that have diacritical marks, processing those diacritical marks, ligatures, and orthographic variations using conventional means will make your deep learning run much faster and give you better results with less data. That's just one example, but there's a whole range of other text-processing steps, using algorithms that have been developed over many years, that simply make the deep learning work better, and that results in what we call a hybrid architecture.
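Carl's ligature example is directly runnable in Python: Unicode compatibility normalization (NFKC) folds presentation forms such as the fi and ffi ligatures back into plain letter sequences, and the same call handles Arabic presentation-form ligatures, all before the text reaches a model.

```python
# NFKC normalization undoes typographic ligatures so downstream models
# see plain letter sequences. Standard library only.
import unicodedata

text = "\ufb01nancial o\ufb03ce"    # contains the fi (U+FB01) and ffi (U+FB03) ligatures
clean = unicodedata.normalize("NFKC", text)
print(clean)                         # financial office
print("fi" in text, "fi" in clean)   # False True

# The same call decomposes Arabic presentation-form ligatures, e.g.
# U+FEFB (lam-alef) -> the two underlying letters U+0644 U+0627:
print(unicodedata.normalize("NFKC", "\ufefb") == "\u0644\u0627")  # True
```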
Brian: So it sounds like, as opposed to throwing it all in a pot and stirring, it's more, "Well, maybe I'm going to cut the carrots neatly into the right size and then throw them in the soup."
Carl: Exactly.
Brian: You're kind of helping the system do a better job at its work.
Carl: That's right, and it's really about thinking about your data and understanding something about it before you throw it into the big brain.
Brian: Exactly. Cool. Where can people follow you? I'll put a link up to Basis in the show notes, but are you on Twitter or LinkedIn somewhere? Where can people find you?
Carl: LinkedIn tends to be my preferred social network; I was just never really good at summarizing complex thoughts into 140 characters, so that's the best place to connect with me. Basistech.com will tell you all about Basis Technology, and rosette.com is our text analytics platform, which is free for anybody to explore, and to the best of my knowledge it is the most capable text analytics platform, with the largest number of languages, that you will find anywhere on the public internet.
Brian: All right, I will definitely put those up in the show notes. This has been fantastic, I've learned a ton, and thanks for coming on Experiencing Data.
Carl: Great talking with you, Brian.
Brian: All right. Cheers.
Carl: Cheers.
Dr Mícheál Hoyne is Bergin Fellow in the School of Celtic Studies at the Dublin Institute for Advanced Studies. He studied Modern Irish and History in Trinity College, where he also wrote his PhD, and before joining the School of Celtic Studies he taught at the Philipps-Universität in Marburg. Conference by the Royal Irish Academy Library in partnership with Roinn na Sean-Ghaeilge, Ollscoil Mhá Nuad. The Royal Irish Academy manuscript known by its shelfmark '23 N 10' was produced in Ballycummin, Co. Roscommon, in the sixteenth century. It is an extraordinarily important manuscript for many reasons, but it is particularly significant because it contains tales which are amongst the oldest surviving literature in Irish. These tales would originally have been preserved in a now-lost manuscript called Cín Dromma Snechta. Aside from wonderful examples of Old Irish narrative literature, the manuscript also preserves legal texts, poetry and wisdom literature from early medieval Ireland. This two-day conference will explore all aspects of the production, survival and significance of the 'Book of Ballycummin' and the marvels of medieval Irish literature which are contained within it. Described in the nineteenth century as a 'little remnant of the work of the ancients', this manuscript is a remarkable witness to the earliest development of Irish literature. Location: Academy House Date: 8 March, 2019 Disclaimer: The Royal Irish Academy has prepared this content responsibly and carefully, but disclaims all warranties, express or implied, as to the accuracy of the information contained in any of the materials. The views expressed are the authors' own and not those of the Royal Irish Academy.
In this episode of the Strange Horizons podcast, editor Anaea Lay presents poetry from the January issues of Strange Horizons. "Scythia" by Marinelle G. Ringer read by G. Ringer. You can read the full text of the poem and more about Marinelle here. "Orthography in the Lands of Yahm" by Daniel Ausema read by Daniel Ausema. You can read the full text of the poem and more about Daniel here. "Retirement" by Samantha Renda-Dollman read by Julia Rios. You can read the full text of the poem and more about Samantha here. "Meatspace" by David C. Kopaska-Merkel read by Ciro Faienza. You can read the full text of the poem and more about David here.
Language Made Difficult, Vol. XXVI — The SpecGram LingNerds are joined again by guest Aya Katz. After some Lies, Damned Lies, and Linguistics, the LingNerds discuss whether English has a perfectly phonetic orthography, and some of the interesting languagey things that linguists notice out in the world. (And in the outtakes Trey insults various programming languages left and right, potentially sparking a future holy war.)
Haitian Creole (kreyòl) is a French-based Creole spoken by the entire population of Haiti. It is also one of the two or three Creole languages to have an officially standardized orthography. However, despite its relatively old history of standardization, the orthography of Haitian Creole is constantly criticized by native Haitian speakers who want to call into question the legitimacy of the official orthography. This paper examines the language beliefs, language attitudes and orthography practices of the Haitian speech community in the diaspora of New York City.
Common prefixes like in- and con- sometimes change their form in English words. The prefix roots in combine, collate, and corrupt are all con-. Likewise, the prefix roots in illegal and irregular are in-. This disguising of prefix roots is called prefix assimilation. Like this? Build a competent vocabulary with Membean.
A 21st Century Proposal for English Spelling Reform; by H. Sanderson Chambers III; From Volume CXLIX, Number 2 of Speculative Grammarian, January 2004. — As is well-known to all educated people—and if it’s not well-known to you, then you’re not one of us—the early part of the 20th century was the heyday of the Simplified Spelling movement, which sought to reform English spelling on the grounds that it was “mard by absurdities and inconsistencies”. So what, you might say? Well, among other things, the simplifiers claimed that the spelling system kept English from being adopted as an international language: “A language, in which to learn to spel imperfectly takes two ful years of scool-time in the countries where it is spoken, does not recommend itself to the forener as a convenient medium for conducting his relations with other foreners”. (Read by David J. Peterson.)
Exploring children's difficulties with language and literacy - Audio
Transcript -- Professors Maggie Snowling and Charles Hulme talk about the cognitive approach that lies at the heart of their research into developmental disorders in children.
A Way with Words — language, linguistics, and callers from all over
Are serial commas always necessary? An English teacher says she was surprised to learn that she and her husband, who's also an English teacher, are giving their students conflicting advice. Get your language question answered on the air! Call or write with your questions at any time:
Email: words@waywordradio.org
Phone: United States toll-free (877) WAY-WORD / (877) 929-9673
London +44 20 7193 2113
Mexico City +52 55 8421 9771
Site: http://waywordradio.org
Podcast: http://waywordradio.org/podcast/
Forums: http://waywordradio.org/discussion/
Newsletter: http://waywordradio.org/newsletter/
Twitter: http://twitter.com/wayword/
Skype: skype://waywordradio