Hub & Spoken: Data | Analytics | Chief Data Officer | CDO | Strategy
In this episode, host Jason Foster sits down with Anthony Deighton, CEO at Tamr, to delve into the complexities of data quality and analytics. They explore the challenges organisations face in managing and improving data quality, the pivotal role of AI in addressing these challenges, and strategies for aligning data quality initiatives with business objectives. They also explore the evolving role of central data teams, led by Chief Data Officers, in spearheading enterprise-wide data quality initiatives and how businesses can effectively tackle key challenges. ***** Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. They work with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and change management and leadership. The company was named one of The Sunday Times' fastest-growing private companies in 2022 and 2023 and named the Best Place to Work in Data by DataIQ in 2023.
Episode 339 of The VentureFizz Podcast features Andy Palmer, serial entrepreneur, seed investor, and now author. Andy is one of my OGs, that is, original guests. If you go way back into the archives of The VentureFizz Podcast, you'll find my first interview with him for Episode #22 back in 2018. In that interview, we talk about his full background story, his career building companies like Tamr & Vertica, plus investing in companies. For this interview, Andy and I connected for a different reason. He has written a book with his co-author, Paula Caligiuri, titled “Live for a Living: How to Create Your Career Journey to Work Happier Not Harder.” The two met on a Bumble date which resulted in a friendship and a collaboration on this book, which is useful at any stage of one's career. It's ultimately a guide to designing a life that leverages your personal values, motivators, and goals in your career. I'm a fan of audiobooks, so it was fun listening to Andy and Paula narrate the book and I have to admit, they both have great voices for it. Plus, it was cool to know some of the people that they highlight in the book, like Elizabeth Lawler, CEO & Co-Founder of AppMap, who was recently on my podcast. The format of this interview is different from my normal style, which was a fun change of pace. In this episode, we chat about: * How Andy got involved in AI back in the '80s and how this is his third AI hype cycle, plus the details about his latest company called DBOS. * The inspiration behind the book. * The 5 phases of ideal careers, those being: Starter Phase, Foundational Phase, High-Growth Phase, High-Focus Phase, and Give-Back Phase. We go through each phase in detail. * Why a simultaneous career path can be a benefit. * The advantages of working at startups and how it can push your career forward. * And more! There is obviously a lot more to this book than what we had the opportunity to chat about, so make sure you check it out!
The title again is Live for a Living, and you can find it on Amazon, or the audio version on Spotify or any other major outlet.
Thankfulness: From Knowledge to Action by Br. Tarek Tamr. For more information and further updates, please visit us at https://www.icoi.net. Hosted on Acast. See acast.com/privacy for more information.
Ismail Tamr is one of the well-known figures in the world of art and acting. He is of Syrian nationality. He was born in the Levant on December 11, 1990, making him 33 years old. Ismail possesses various talents including singing, acting, and dancing. He holds a bachelor's degree in business administration. His professional life is filled with accomplishments as he is a famous actor, having participated in numerous works since graduating from the Syrian Academic Institute for Drama in 2010. He has been successful in portraying various roles in television series and films. In addition, he has engaged in rap singing for several companies and received acclaim for his performances. His works have garnered a large viewership, contributing significantly to his recognition on a global scale and among Arab audiences. #hikmatwehbi #IsmaeelTamr #اسماعيل_تمر #podcast #arabicpodcast #hikmatwehbipodcast #wstudiodxb #حكمت_وهبي #حكمت_وهبي_بودكاست #بودكاست
Starting a revolution is no easy task. Just ask Dr. Michael Stonebraker and Andy Palmer, co-founders of Tamr, the enterprise data mastering company. Their path to innovation begins with a universal problem. They also collaborate with other data radicals who challenge them to think differently and help them grow. Michael is a database pioneer, MIT professor, and entrepreneur. He has founded nine database startups over 40 years and won the A.M. Turing Award in 2014. Andy is a serial entrepreneur and founder, board member, and advisor for over 50 start-ups. Satyen, Michael, and Andy discuss Tamr's tech evolution, third normal form, and probabilistic methods.
--------
“There's a lot of work to be done in these big enterprises of getting all the data cataloged, getting it all mastered and curated, and then delivering it out for lots of people to consume. Early on at Tamr, we did a lot of stuff on-premise and those projects just took so much longer and you ended up doing a whole bunch of infrastructure stuff that's just not required. We're really encouraging all of our customers to think cloud native, multi-tenant infrastructure as the de facto starting point because that'll let them get to better outcomes much faster.” – Andy Palmer
“Data products and data mastering are basically a cloud problem. And so you want to be cloud native, you want to run software as a service, you want to be friendly to the cloud vendors. Tamr spent a lot of time over the last two or three years doing exactly that. There's a big difference between running on the cloud and being cloud native and running software as a service. That's what we're focused on big time right now. After that, I think there's a lot of research directions we're paying attention to. Trying to build more semantics into tables to be able to leverage. You can think of this as leveraging more exhaustive catalogs to do our stuff better. I think that's something we're thinking about a bunch.” – Dr. Michael Stonebraker
--------
Timestamps:
*(04:47): The procurement proliferation
*(15:51): Solving data chaos
*(24:49): Probabilistically solving data problems
*(37:34): The future of Tamr
*(43:16): A great technologist versus a great entrepreneur
*(44:51): Satyen's Takeaways
--------
Sponsor
This podcast is presented by Alation. Learn more:
* Subscribe to the newsletter: https://www.alation.com/podcast/
* Alation's LinkedIn Profile: https://www.linkedin.com/company/alation/
* Satyen's LinkedIn Profile: https://www.linkedin.com/in/ssangani/
--------
Links
Connect with Andy on LinkedIn
Connect with Michael on LinkedIn
Learn more about DBOS
Today I'm joined by Anthony Deighton, General Manager of Data Products at Tamr. Throughout our conversation, Anthony unpacks his definition of a data product and we discuss whether or not he feels that Tamr itself is actually a data product. Anthony shares his views on why it's so critical to focus on solving for customer needs and not simply the newest and shiniest technology. We also discuss the challenges that come with building a product that's designed to facilitate the creation of better internal data products, as well as where we are in this new wave of data product management, and the evolution of the role. Highlights/ Skip to: I introduce Anthony, General Manager of Data Products at Tamr, and the topics we'll be discussing today (00:37) Anthony shares his observations on how BI analytics are an inch deep and a mile wide due to the data that's being input (02:31) Tamr's focus on data products and how that reflects in Anthony's recent job change from Chief Product Officer to General Manager of Data Products (04:35) Anthony's definition of a data product (07:42) Anthony and I explore whether he feels that decision support is necessary for a data product (13:48) Whether or not Anthony feels that Tamr qualifies as a data product (17:08) Anthony speaks to the importance of focusing on outcomes and benefits as opposed to endlessly knitting together features and products (19:42) The challenges Anthony sees with metrics like Propensity to Churn (21:56) How Anthony thinks about design in a product like Tamr (30:43) Anthony shares how data science at Tamr is a tool in his toolkit and not viewed as a “fourth” leg of the product triad/stool (36:01) Anthony's views on where we are in the evolution of the DPM role (41:25) What Anthony would do differently if he could start over at Tamr knowing what he knows now (43:43) Links Tamr: https://www.tamr.com/ Innovating: https://www.amazon.com/Innovating-short-guide-making-things/dp/B0C8R79PVB The Mom Test: 
https://www.amazon.com/The-Mom-Test-Rob-Fitzpatrick-audiobook/dp/B07RJZKZ7F LinkedIn: https://www.linkedin.com/in/anthonydeighton/
On the show today we're exploring generative AI, analytics, and data. As the world goes crazy for ChatGPT, are we ready to unlock its potential? First we talk to Dmitry Shapiro, CEO of Koji. Dmitry is convinced we're entering a new phase of our relationship with generative AI, and asks us to look at creativity again and to ask ourselves, 'what's good enough?' Then we meet Anthony Deighton, Chief Product Officer at Tamr. If AI is going to play a key role in the enterprise, perhaps we need to improve the quality of the data it's built on. Also, we mentioned Amber's London Marathon attempt for Mind. Please support her if you can: https://www.justgiving.com/fundraising/amber-harrison8
Mike Stonebraker is a veritable database pioneer and a Turing Award recipient. In addition to teaching at MIT, he is a serial entrepreneur and co-creator of Postgres. Andy Palmer is a veteran business leader who serves as the CEO of Tamr, a company he co-founded with Mike. Through his seed fund Koa Labs, Andy has helped found and/or fund numerous innovative companies in diverse sectors, including health care, technology, and the life sciences. In this conversation with Tristan and Julia, Mike and Andy take us through the evolution of database technology over 5+ decades. They share unique insights into relational databases, the switch from row-based to columnar databases, and some of the patterns of database adoption they see repeated over time. For full show notes and to read 7+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
*listeners: please fill out this short 3-min. survey to help me level up the show: https://forms.gle/iporNXSgtfuoHSPJA Sarah Breathnach works with early stage SaaS companies to build a strong marketing foundation for fast growth. She's currently Head of Demand Generation at Hunters. Prior, Sarah was Demand Generation Lead at Tamr, a SaaS startup backed by Google Ventures and others that has raised $69.2M. I happened to stumble upon Sarah while listening to her speak in a recorded webinar and thought "woah, I gotta bring her on the show." Hunters is a Series C ($118.4M total) security tech startup based out of Tel Aviv, Israel. Here's what we hit on: Generating leads versus generating demand - what's the big difference between the two, the big "aha!" for startups, and why can't you do both; People think you need a big team and lots of money to generate demand - how many people do you really need and how much money do you really need; What are some smart ways to approach demand generation when you're lean and have a small budget (HINT: focus on the innovators and risk-takers); We set up "kiosks" at trade shows and events to find out if we were selling to the right person; How did marketing help 5X revenue (HINT: early on marketing built those important foundations); Why we decided to focus on the "2 opportunities goal" from any demand generation effort we did; Top marketing channels for Hunters; From Chris Walker (CEO at Refine Labs): "It's time to get out of the tech. Top marketing teams will be talking with customers every week and leaning into marketing fundamentals." Are we too much in the tech and not enough in the fundamental, foundational marketing work? Sarah asks me her burning question: "what is the hardest thing about being a marketer today." You can find Sarah on LinkedIn: www.linkedin.com/in/sarahbreathnach/ Find out more about Hunters: www.hunters.ai/ This episode is sponsored by UMSO, the website builder for startups.
Visit umso.com/MSM to learn more and use the code MSM20 for a 20% discount on your first three months. For more content, subscribe to Modern Startup Marketing on Apple or Spotify or wherever you like to listen, and don't forget to leave a review. And whenever you're ready, there are 3 ways I can help you: 1. Startup marketing strategy, execution and advising (25+ happy clients and mentees) >> www.furmanovmarketing.com 2. GTM bootcamp for EdTech Founders (sign up for the next cohort) >> https://gofuelsales.com/edtech-bootcamp/ 3. Sponsor my Top 10% podcast and get startup founders, marketers and VCs hearing about your brand You can also find me hanging out on LinkedIn every single week: www.linkedin.com/in/annafurmanov --- Send in a voice message: https://anchor.fm/anna-furmanov/message
#CloudNClear episode 133 is AVAILABLE NOW!
Tamr's Chief Product Officer, Anthony Deighton, joins Coruzant Technologies for the Digital Executive podcast. He shares his shift from business consulting to technology, a move inspired by innovative software and tech. Now he leverages machine learning to analyze large amounts of data, looking for patterns in a short amount of time.
Big data is no longer the problem. It's bad data. Infrastructure isn't standing in the way of companies trying to become data-driven. The quality and accuracy of a company's data is now the bottleneck. Hear more about the innovations we're delivering to solve this problem at scale.
Shobhit Chugh is the founder and CEO of Intentional Product Manager, a platform that helps product managers and product leaders fast-track their careers to CPO status. In this episode, he weighs in on the challenges of being a product manager and building a coaching business. He then shares 3 ways to succeed in your career so that you can run the job instead of letting the job run you. He also offers some advice on becoming product-led. Show Notes [00:38] About Shobhit's core mission [01:38] An overview of his journey to product management [02:48] The challenges of building your first product team [04:44] A common mistake that product managers make [08:22] Do your job right, be clear on what will demonstrate your contributions to your company, and have a purpose or a mission that's bigger than yourself [12:12] Why Shobhit switched from being a product manager to coaching product managers [13:48] On building his company remotely and overcoming impostor syndrome [18:15] Figure out what your customers really need, focus on one thing at a time, and think about the emotional aspects of the product and product-led growth [21:22] Shobhit's advice for people who want to get into product management About Shobhit Chugh Before becoming the Intentional Product Manager coach, Shobhit Chugh worked as a consultant at McKinsey. He also used to be a product manager for Google's Crashlytics. Shobhit's previous adventures include working at startups like Tamr, High Start Group (now WEVO), and Lattice Engines (acquired by D&B); co-founding Adaptly (acquired by Accenture); and getting an MBA and a Masters in Engineering Management at Kellogg School of Management. Links: McKinsey & Company; Shobhit's Masterclass on 5 Steps Product Managers Can Take to Have an Outstanding Career; Intentional Product Manager (profile); Shobhit Chugh on LinkedIn
It wasn't that long ago that enterprise software was dominated by proprietary data architectures from the likes of IBM, Oracle, and SAP. Once you made a selection, you were stuck with it for better or worse, in sickness and in health. But that's not the case today. The methodical movement towards resilient, open SaaS applications and best-of-breed tools has ushered in the era of DataOps buoyed by the “modern data stack.” But, is this a good thing? Join Tim, Juan and special guest Andy Palmer, CEO of Tamr, to talk open vs. closed ecosystems and how to decide what's right for you. This episode will feature: How DataOps will reshape the next wave of best-of-breed systems Will monoliths make a comeback? What was a big purchase you made that you almost immediately regretted?
Episode 6, Season 2: Anthony sits down with Trifacta’s CEO Adam Wilson to talk about changes to the data landscape, why we need to pay more attention to how people become “data-driven”, and how Tamr and Trifacta work in harmony to deliver transformative data initiatives.
A new episode is out! In this one we'll tell the exciting continuation of Khalid's expedition in the war in Iraq. Go ahead and listen!
Episode 3, Season 2: Anthony chats with Evren Eryurek, Director of Product Management at Google Cloud, about what’s happening today with cloud migrations, and uniquely how Google is helping make the transition easy and helping to unlock the value of business data in the cloud. The two also discuss the Google and Tamr partnership, and Evren shares his advice for recent grads interested in a career in technology.
In this episode of Full Contact CEO, Alex Magleby welcomes Andy Palmer: serial entrepreneur, co-founder and CEO of Tamr, founder of Koa Labs, data scientist, venture capitalist, rugby superhero, mentor to many, keeper of the table at Henrietta’s, starter of over 50 companies, Bowdoin Rugby and Boston RFC alum, and overall amazing human. Andy was the first ever Angel Investor of the Year award recipient from the New England Venture Capital Association. The two discuss Andy’s mission-driven investment philosophies, data management for companies big and small, and of course mentorship, leadership, and the future of sports business. Join us for this must-listen on life as entrepreneurs, the key differences between Cambridge and Silicon Valley, and how best to hire the right management teams!
Dr. Michael Stonebraker, co-founder of Tamr, joins host John Gilroy on this week's Federal Tech Talk to discuss why he thinks his company has solved the basic problem with machine learning.
With the proliferation of data sources to give a more comprehensive view of the information critical to your business, it is even more important to have a canonical view of the entities that you care about. Is customer number 342 in your ERP the same as Bob Smith on Twitter? Using master data management to build a data catalog helps you answer these questions reliably and simplify the process of building your business intelligence reports. In this episode, Chuck Miller, Manager of Data Analytics at Dexcom, discusses the challenges of building a master data set, why you should have one, and some of the techniques that modern platforms and systems provide for maintaining it.
Chuck's LinkedIn: https://www.linkedin.com/in/chuckmillerlinkedin/
The Data Standard LinkedIn: https://www.linkedin.com/company/the-data-standard/
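The "customer number 342 vs. Bob Smith" question above is, at its core, record matching. As a rough illustration of the idea only (a toy sketch with invented records and thresholds, not how any particular mastering platform works), a naive matcher might combine a name-similarity score with an exact check on a shared field:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical records from two different sources (invented for illustration).
erp_record = {"id": 342, "name": "Robert Smith", "email": "rsmith@example.com"}
social_record = {"handle": "@bobsmith", "name": "Bob Smith", "email": "rsmith@example.com"}

score = name_similarity(erp_record["name"], social_record["name"])
same_email = erp_record["email"] == social_record["email"]

# Call it a match if the emails agree exactly, or the names are very close.
is_same_entity = same_email or score > 0.85
print(f"name similarity={score:.2f}, match={is_same_entity}")
```

Real master data management tackles the hard part this toy skips: doing this reliably across millions of records and dozens of messy sources.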
Shownotes
Tareq Tamr from Thinkbites interviews me regarding the article: Soft Skills - The Importance of Personal Development
We cover:
Soft-skills development
Intentionality
Social Media
Masjid In-reach vs. Out-reach
Check out Thinkbites for more great articles.
Fiqh of Social Media Book: Get the book in paperback or Kindle.
Stay Connected: ibnabeeomar.com Newsletter - What Muslim Leaders Read. Sign up for the email list and get your copy of the 40 hadith on social media ebook.
Social Media: Twitter, Facebook, Instagram, YouTube
To support this podcast and related projects, please consider contributing at Patreon.
Subscribe to the Podcast: Apple Podcasts: http://bit.ly/ibnabeeomar_apple | Google Play: http://bit.ly/ibnabeeomar_google | Spotify: http://bit.ly/ibnabeeomar_spotify
[2:25] - What is leadership and how does it differ from management? [5:29] - How did you influence and get support for your ideas at Siebel, where you were responsible for a product that made insignificant revenues compared to CRM, their flagship product? [12:02] - What is more important for a founder - leadership or management? [16:25] - What is your philosophy on hiring and how do you hire for talent and cultural fit? [21:15] - You joined Qlik when it was a small unknown Swedish company and took it from that to a public company. Talk me through your journey there and how you evolved as a leader at different stages of the company. [32:20] - Having worked in different geos, is there a different leadership style in Europe vs America, for example? [34:56] - What differentiates good leaders from bad leaders when it comes to performance management and nurturing talent? [38:30] - What advice on leadership would you give to entrepreneurs who are dealing with different needs of employees, customers, and businesses because of the pandemic? [40:28] - What advice would you give to your younger self on leadership? Productivity tool: tryshift.com | Book: Power of One
Even with almost 2 billion adherents around the world, the religion of Islam as it is presented in the West still struggles to communicate itself to the Muslim masses. Even now, Muslim community workers are still attempting to communicate the religion appropriately to the generality of people. For some, the religion and all of its apparent teachings may not completely resonate with the typical adherent. On this episode of Not Another Muslim Podcast, we talk to Tareq Tamr on the idea of reframing religion for the Muslim masses. Tareq Tamr is a youth director and psychology student from Windsor, Ontario who has served his communities as a khateeb, Islamic school teacher, and MSA executive. He is the founder of Thinkbites, a non-profit online publication and content platform for college students and young professionals encouraging personal, spiritual, and community development, and Podcast Producer for Yaqeen Institute. He is also the founder of Real Talk Windsor, a youth-run halaqa program for students in high school and older. He studies the Islamic sciences with local teachers on a part-time basis, and he is passionate about the intersection of spirituality, psychology, and mental health. Follow him @ibnabitareq
On today's episode of Scaleup Marketing I'm joined by Anthony Deighton, the Chief Product Officer at Tamr. Anthony has 20 years of experience in enterprise software, including 10+ years at Qlik growing it from an unknown Swedish software company to a public company and market leader. Anthony and I talk about his experience building Qlik from the ground up, how product and product marketing teams need to work together, and what messaging development has in common with standup comedy.
Episode 12, Season 1: “Andy Palmer is a serial entrepreneur who's helped found or fund more than 50 companies, including Vertica, a pioneer in the database management industry. He's currently CEO of Tamr, a data mastering platform that uses machine learning to do the heavy lifting around consolidating, cleansing, and categorizing data. Andy's also served as CIO of Infinity Pharmaceuticals, a biotech startup, and ran data and software engineering operations at the research group of pharmaceutical company Novartis. Andy's approach to data mastering involves dataops. This concept has emerged in recent years and shares traits with dev ops. He discusses what dataops is and how it helps companies with their digital transformation projects. He also talks about Tamr's upcoming DataMasters Summit and how the event will provide people with practical advice on where to start their dataops journeys.” To find more information about DataMasters Summit 2020, including the complete list of speakers, and to register to attend, please visit: http://tamr.com/summit2020.
Stefanie Costa Leabo, Chief Data Officer for the City of Boston, talks about her non-linear career path (she considered becoming a political science professor at one point), the tangible connection between her work and what happens outside of city hall, and how sharing data helps Boston's residents better understand what's going on in their city.Subscribe and listen to DataMasters wherever you get your podcasts.DataMasters is produced by PI Media for Tamr.
Data pipelines are becoming a quintessential part of every organisation that wants to be data fluent. This means DataOps is becoming increasingly important in this process, as it is used by analytics and data teams to improve quality and reduce the cycle time of data analytics. However, organisations are now facing unprecedented challenges with current events and widespread lockdown. It's important to understand how to guide DataOps teams through these rapid changes and uncertainties so that data can remain functional. In this podcast, Ronald van Loon speaks to Suki Dhuphar, EMEA Field Engineering Lead at Tamr. Firstly, Suki outlines the challenges that DataOps teams face with remote work and how to overcome them. Then he explains how DataOps teams streamline their data engineering processes once they're equipped with the right tools. Suki also explains how a remote work environment impacts an organisation's data pipeline and how to create a culture that is adaptable. Finally, he covers the best practices needed to increase data and analytics efficiency when transitioning processes.
It’s cliché to say that data cleaning accounts for 80% of a data scientist’s job, but it’s directionally true. That’s too bad, because fun things like data exploration, visualization and modelling are the reason most people get into data science. So it’s a good thing that there’s a major push underway in industry to automate data cleaning as much as possible. One of the leaders of that effort is Ihab Ilyas, a professor at the University of Waterloo and founder of two companies, Tamr and Inductiv, both of which are focused on the early stages of the data science lifecycle: data cleaning and data integration. Ihab knows an awful lot about data cleaning and data engineering, and has some really great insights to share about the future direction of the space — including what work is left for data scientists, once you automate away data cleaning.
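For a flavour of the repetitive work that this automation push targets, here is a minimal, hypothetical cleaning pass (invented data, standard library only): normalize whitespace and case, then drop the exact duplicates that normalization exposes:

```python
import csv
import io

# Toy messy input (invented for illustration): inconsistent case,
# stray whitespace, and a duplicate record hiding behind both.
raw = """name,city
 Alice Smith ,boston
alice smith,Boston
Bob Jones,  new york
"""

def clean(value: str) -> str:
    """Collapse internal whitespace, strip the ends, and title-case."""
    return " ".join(value.split()).title()

rows = list(csv.DictReader(io.StringIO(raw)))
seen, cleaned = set(), []
for row in rows:
    key = tuple(clean(v) for v in row.values())
    if key not in seen:          # keep the first occurrence only
        seen.add(key)
        cleaned.append(dict(zip(row.keys(), key)))

print(cleaned)
```

Steps like these are trivial individually; the research and product work Ihab describes is about doing them automatically, at scale, and for errors far subtler than casing and whitespace.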
Angela Liu is the Director of Hack.Diversity, the workforce development division of the New England Venture Capital Association. Launched in 2017, Hack.Diversity partners with Boston’s fast growing tech teams to not only increase the representation of Black and Latinx technologists in the innovation economy, but also evolve organizational practices to support retention and promotion of that talent. By the end of 2020, Liu will have scaled operations, community, and curriculum to support a network of 150+ Hack Fellows to contribute to 25+ companies including Drift, Rapid7, Liberty Mutual, Tamr, and Vertex. Prior to joining NEVCA and Hack.Diversity, Liu spent three years at MIT building pipelines towards STEM education access for students historically underserved and underrepresented in STEM fields. A 2020 Spark Boston Impact Winner, Liu is a “1.5 generation” immigrant from Guangzhou, China and first-generation college student who studied Science, Technology, and International Affairs at Georgetown University’s School of Foreign Service. Discover more Boston Speaks Up at Boston Business Journal's BostInno: www.americaninno.com/boston/boston-speaks-up/
Episode 6, Season 1: What’s required to master large numbers of data sources? First, avoid approaches that require writing rules. Then use machine learning and cloud computing to efficiently handle the workload. That advice comes from Mike Stonebraker, a database pioneer who helped create the INGRES relational database system, won the 2014 A.M. Turing Award, and has co-founded several data management startups, including Tamr. Mike talks about common data mastering mistakes, why traditional tools aren’t right for the task, and shares examples of companies that have successfully mastered data at scale.
In this podcast, Dr. Michael Stonebraker discussed his perspective on the growing data ops industry and its future. Dr. Stonebraker has launched several startups that defined data ops. He shares his insights into the data ops market and what to expect in the future of data and operations. Timeline: 0:30 Mike's take on the "NoSQL movement". 6:48 Evolution of the database. 13:55 Mobility of data and cloud. 18:41 Tamr's shift from the database to AI. 29:00 Ingredients for a successful start-up. 36:50 Leadership qualities that keep you successful and sane. 41:50 Mike's parting thoughts. Podcast Link: https://futureofdata.org/dr-mikestonebraker-on-the-future-of-dataops-and-ai/ Dr. Stonebraker's BIO: Dr. Stonebraker has been a pioneer of database research and technology for more than forty years. He was the main architect of the INGRES relational DBMS and the object-relational DBMS, POSTGRES. These prototypes were developed at the University of California at Berkeley, where Stonebraker was a Professor of Computer Science for twenty-five years. More recently, at M.I.T., he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, the H-Store transaction processing engine, which became VoltDB, the SciDB array DBMS, and the Data Tamer data curation system. Presently he serves as an advisor to VoltDB and Chief Technology Officer of Paradigm4 and Tamr, Inc. Professor Stonebraker was awarded the ACM System Software Award in 1992 for his work on INGRES. Additionally, he was awarded the first annual SIGMOD Innovation Award in 1994 and was elected to the National Academy of Engineering in 1997. He was awarded the IEEE John von Neumann Medal in 2005 and the 2014 Turing Award, and is presently an Adjunct Professor of Computer Science at M.I.T., where he is co-director of the Intel Science and Technology Center focused on big data. 
About #Podcast: #FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey in creating the data-driven future. Wanna Join? If you or any you know wants to join in, Register your interest by emailing us @ info@analyticsweek.com Want to sponsor? Email us @ info@analyticsweek.com Keywords: FutureOfData, DataAnalytics, Leadership, Futurist, Podcast, BigData, Strategy
Dr. @MikeStonebraker on his journey to the evolution of data ops and winning the #Turing Award #FutureOfData #Leadership #Podcast Timeline: 0:29 Mike's journey. 30:23 The reason behind Mike's preference for academia over the corporate world. 38:50 Tips for leaders on data management. In this podcast, Dr. Michael Stonebraker discussed his journey into creating data ops and winning the Turing Award. He shared several of his life's aha moments and the progressions that mirrored the evolution of the data ops industry. It's a delightful conversation for anyone seeking to understand how data ops has evolved over the last couple of decades and what it takes to win the Turing Award. Podcast Link: iTunes: https://apple.co/2VtcX6d Youtube: https://youtu.be/bY1qjy0qpq4 Dr. Stonebraker's BIO: Dr. Stonebraker has been a pioneer of database research and technology for more than forty years. He was the main architect of the INGRES relational DBMS and the object-relational DBMS, POSTGRES. These prototypes were developed at the University of California at Berkeley, where Stonebraker was a Professor of Computer Science for twenty-five years. More recently, at M.I.T., he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, the H-Store transaction processing engine, which became VoltDB, the SciDB array DBMS, and the Data Tamer data curation system. Presently he serves as an advisor to VoltDB and Chief Technology Officer of Paradigm4 and Tamr, Inc. Professor Stonebraker was awarded the ACM System Software Award in 1992 for his work on INGRES. Additionally, he was awarded the first annual SIGMOD Innovation Award in 1994 and was elected to the National Academy of Engineering in 1997. He was awarded the IEEE John von Neumann Medal in 2005 and the 2014 Turing Award, and is presently an Adjunct Professor of Computer Science at M.I.T., where he is co-director of the Intel Science and Technology Center focused on big data. 
Ramy Youssef, Normalisation, Intention of the Show, Perspectives, One's Religiosity, Reality of the Ummah. We touch on all of this with our guest Tareq Tamr with special cohost Mahin Islam from The Mad Mamluks. Tareq Tamr is a medical student from Canada who is active with youth work in the Muslim community. He is also active on social media discussing issues within the Muslim community. Hosts : Tanzim & Mahin Please email us your comments, feedback, and questions at: info@boysinthecave.com, and leave a review and 5-star rating on iTunes! Check out our website - boysinthecave.com Follow us on: Facebook – https://www.facebook.com/boysinthecave/ Instagram – @boysinthecave Twitter - @boysinthecave Become a Patreon today! https://www.patreon.com/boysinthecave -------------------------------------------------------------------------------------------------------- Tareq Tamr's Online Visibility https://www.facebook.com/tareqtamr1 https://www.instagram.com/ibnabitareq/ https://twitter.com/ibnabitareq -------------------------------------------------------------------------------------------------------
My guest today is Carl Hoffman, the CEO of Basis Technology and a specialist in text analytics. Carl founded Basis Technology in 1995, and in 1999, the company shipped its first products for website internationalization, enabling Lycos and Google to become the first search engines capable of cataloging the web in both Asian and European languages. In 2003, the company shipped its first Arabic analyzer and began development of a comprehensive text analytics platform. Today, Basis Technology is recognized as the leading provider of components for information retrieval, entity extraction, and entity resolution in many languages. Carl has been directly involved with the company’s activities in support of U.S. national security missions and works closely with analysts in the U.S. intelligence community. Many of you work all day in the world of analytics: numbers, charts, metrics, data visualization, etc. But today we’re going to talk about one of the other ingredients in designing good data products: text! As an amateur polyglot myself (I speak decent Portuguese and Spanish, and am attempting to learn Polish), I really enjoyed this discussion with Carl. If you are interested in languages, text analytics, search interfaces, or entity resolution, and are curious to learn what any of this has to do with offline events such as the Boston Marathon bombing, you’re going to enjoy my chat with Carl. We covered:
* How text analytics software is used by border patrol agencies, and its limitations.
* The role of humans in the loop, even with good text analytics in play.
* What actually happened in the case of the Boston Marathon bombing?
* Carl’s article, “Exact Match” Isn’t Just Stupid. It’s Deadly.
* The two lessons Carl has learned from working with native-tongue source material.
* Why Carl encourages Unicode compliance when working with text, why having a global perspective is important, and how Carl actually implements this at his company.
* Carl’s parting words on why hybrid architectures are a core foundation for building better data products involving text analytics.
Resources and Links:
* Basis Technology
* Carl’s article: “Exact Match” Isn’t Just Stupid. It’s Deadly.
* Carl Hoffman on LinkedIn
Quotes from Today’s Episode
“One of the practices that I’ve always liked is actually getting people that aren’t like you, that don’t think like you, in order to intentionally tease out what you don’t know. You know that you’re not going to look at the problem the same way they do…” — Brian O’Neill
“Bias is incredibly important in any system that tries to respond to human behavior. We have our own innate cultural biases that we’re sometimes not even aware of. As you [Brian] point out, it’s impossible to separate human language from the underlying culture and, in some cases, geography and the lifestyle of the people who speak that language…” — Carl Hoffman
“What I can tell you is that context and nuance are equally important in both spoken and written human communication… Capturing all of the context means that you can do a much better job of the analytics.” — Carl Hoffman
“It’s sad when you have these gaps like what happened in this border crossing case where a name spelling is responsible for not flagging down [the right] people. I mean, we put people on the moon and we get something like a name spelling [entity resolution] wrong. It’s shocking in a way.” — Brian O’Neill
“We live in a world which is constantly shades of gray, and the challenge is getting as close to yes or no as we can.” — Carl Hoffman
Episode Transcript
Brian: Hey everyone, it’s Brian here, and we have a special edition of Experiencing Data today. Today, we are going to be talking to Carl Hoffman, who’s the CEO of Basis Technology. 
Carl is not necessarily what I would call a traditional Data Product Manager or someone working in the field of creating custom decision support tools. He is an expert in text analytics, and specifically, Basis Technology focuses on entity resolution and resolving entities across different languages. If your product, service, or the software tool that you’re using is going to be dealing with inputs and outputs or search in multiple languages, I think you’re going to find my chat with Carl really informative. Without further ado, here’s my chat with Mr. Carl Hoffman. All right. Welcome back to Experiencing Data. Today, I’m happy to have Carl Hoffman on the line, the CEO of Basis Technology, based out of Cambridge, Massachusetts. How’s it going, Carl? Carl: Great. Good to talk to you, Brian. Brian: Yeah, me too. I’m excited. This episode’s a little bit different. Basis Tech primarily focuses on providing text analytics more as a service as opposed to a data product. There are obviously some user experience ramifications on the downstream side for companies, software, and services that are leveraging some of your technology. Can you tell people a little bit about the technology of Basis and what you guys do? Carl: There are many companies who are in the business of extracting actionable information from large amounts of dirty, unstructured data, and we are one of them. But what makes us unique is our ability to extract what we believe is one of the most difficult forms of big data, which is text in many different languages from a wide range of sources. You mentioned text analytics as a service, which is a big part of our business, but we actually provide text analytics in almost every conceivable form: as a service, as an on-prem cloud offering, as conventional enterprise software, and also as the data fuel to power your in-house text analytics. 
There’s another half of our business as well, which is focused specifically on one of the most important sources of data, which is what we call digital forensics or cyber forensics. That’s the challenge of getting data off of digital media that may be either still in use or dead. Brian: Talk to me about dead. Can you unpack that a little bit? Carl: Yes. Dead basically means powered off or disabled. The primary application there is for corporate investigators or for law enforcement who are investigating captured devices or digital media. Brian: Got it. Just to help people understand some of the use cases where someone would be leveraging the capabilities of your platforms, especially the stuff around entity resolution: my understanding is that one use case for your software is obviously border crossings, where your information, your name, is going to be looked up to make sure that you should be crossing whatever particular border you’re at. Can you talk to us a little bit about what’s happening there and what’s going on behind the scenes with your software? Like, what is that agent doing, and what’s happening behind the scenes? What kind of value are you providing to the government in that instance? Carl: Border crossings, or the software used by border control authorities, is a very important application of our software. As a data representational challenge, it’s actually not that difficult, because for the most part, border authorities work with linear databases of known individuals or partially known individuals, and queries. Queries may be in a form manually typed by an officer, or may be a scan of a passport. The complexity comes in when a match must be scored, where a decision must be rendered as to whether a particular query or a particular passport scan matches any of the names present on a watch list. Those watch lists can be in many different formats. They can come from many different sources. 
Our software excels at performing that match at very high accuracy, regardless of the nature of the query and regardless of the source of the underlying watch list. Brian: I assume those watch lists may vary in the level of detail around, for example, aliases, spelling, and which alphabet they were printed in. Part of the value of what your service is doing is helping to say, “At the end of the day, entity number seven on the list is one human being who may have many ways of being represented with words on a page or a screen,” so the goal obviously is to make sure that you have the full story of that one individual. Am I correct that you may get that in various formats and different levels of detail? And part of what your system is doing is actually trying to match up that person, or give it, as you say, a non-binary response, a match score or something that’s more of a gray response that says, “This person may also be this person.” Can you unpack that a little bit for us? Carl: Your remarks are exactly correct. First, what you said about gray is very important. These decisions are rarely 100% yes or no. We live in a world which is constantly shades of gray, and the challenge is getting as close to yes or no as we can. But the quality of the data in watch lists can vary pretty wildly, based on the prominence and the number of sources. The US border authorities must compile information from many different sources: from the UN, from the Treasury Department, from the National Counterterrorism Center, from various states, and so on. The amount of detail and the degree of our certainty regarding that data can vary from name to name. Brian: We talked about this when we first were chatting about this episode. 
Am I correct when I think about one of the overall values you’re providing: obviously, we’re offloading some of the labor of doing this kind of entity resolution or analysis onto software, and then picking up the last mile with a human, to say, “Hey, are these recommendations correct? Maybe I’ll go in and do some manual labor.” Is that how you see it, that the software does some of the initial grunt work and presents an almost finished story, and then the human comes in and needs to provide that final decision at the endpoint? Are we doing enough of the help with the software? At what point should we say, “That’s no longer a software job to give you a better score about this person. We think that really requires a human analysis at this point.” Is there a way to evaluate that, or is that what you think about, like, “Hey, we don’t want to go past that point. We want to stop here, because the technology is not good enough, or the data coming in will never be accurate enough, and we don’t want to go past that point.” I don’t know if that makes sense. Carl: It does make sense. I can’t speak for all countries, but I can say that in the US, the decision to deny an individual entry, or certainly the decision to apprehend an individual, is always made by a human. We designed our software to assume a human in the loop for the most critical decisions. Our software is designed to maximize the value of the information that is presented to the human so that nothing is overlooked. Really, the two biggest threats to our national security are, one, having very valuable information overlooked, which is exactly what happened in the case of the Boston Marathon bombing. We had a great deal of information about Tamerlan and Dzhokhar Tsarnaev, yet that information was overlooked because the search engines failed to surface it in response to queries by a number of officials. 
And secondly, detaining or apprehending innocent individuals, which hurts our security as much as allowing dangerous individuals to pass. Brian: This has been in the news somewhat, but talk about the “glitch” and what happened in that Boston Marathon bombing in terms of maybe some of these tools; not what might have happened, but what you understand was going on there such that there was a gap in this information. Carl: I am always very suspicious when anyone uses the word ‘glitch’ with regard to any type of digital equipment, because if that equipment is executing its algorithm as it has been programmed to do, then you will get identical results for identical inputs. In this case, the software that was in use at the time by US Customs and Border Protection was executing a very naive name-matching algorithm, which failed to match two different variant spellings of the name Tsarnaev. If you look at the two variations, for any human it would seem almost obvious that the two variations are related and are in fact connected to the same name that’s natively written in Cyrillic. What really happened was a failure on the part of the architects of that name-matching system to innovate by employing the latest technology in name-matching, which is what my company provides. In the aftermath of that disaster, our software was integrated into the border control workflow, first with the goal of reducing false positives, and then later with the secondary goal of identifying false negatives. We’ve been very successful on both of those challenges. Brian: What were the two variants? Are you talking about the fact that one was spelled in Cyrillic and one was spelled in a Latin alphabet? They didn’t bring back data points A and B because they looked like separate individuals? What was it, a transliteration? Carl: They were two different transliterations of the name Tsarnaev. 
In one instance, the final letters in the name are spelled -naev, and in the second instance it’s spelled -nayev. The presence or absence of that letter y was the only difference between the two. That’s a relatively simple case, but there are many similar stories for more complex names. For instance, the 2009 Christmas bomber, who successfully boarded a Northwest Delta flight with a bomb in his underwear, again because of a failure to match two different transliterations of his name. But in his case, his name is Umar Farouk Abdulmutallab; there was much more opportunity for divergent transliterations. Brian: On this topic, you wrote an interesting article called “Exact Match” Isn’t Just Stupid. It’s Deadly. You’ve talked a little bit about this particular example with the Boston Marathon bombing. You mentioned thinking globally about building a product out. Can you talk to us a little about what it means to think globally? Carl: Sure. Thinking globally is really a mindset and an architectural philosophy in which systems are built to accommodate multiple languages and cultures. This is an issue not just with the spelling of names but with support for multiple writing systems, different ways of rendering and formatting personal names, and different ways of rendering, formatting, and parsing postal addresses, telephone numbers, dates, times, and so on. The format of a questionnaire in Japanese is quite different from the format of a questionnaire in English. If you look at any complex global software product, there’s a great deal of work that must be done to accommodate the needs of a worldwide user base. Brian: Sure, and you’re a big fan of Unicode-compliant software, am I correct? Carl: Yes. Building in Unicode compliance is equivalent to building a solid, stable foundation for an office tower. It only gets you to the ground floor, but without it, the rest of the tower starts to lean like the one that’s happening in San Francisco right now. 
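Returning to the two transliterations for a moment: here is a minimal sketch (not Basis Technology’s actual, proprietary algorithm) of why an exact-match check misses the pair while even a simple similarity score flags them as a likely match:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Score two romanized names on a 0..1 scale (1.0 = identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# A naive exact-match system treats the two spellings as unrelated individuals.
print("Tsarnaev" == "Tsarnayev")                          # False
# A similarity score surfaces them as a near-certain match.
print(round(name_similarity("Tsarnaev", "Tsarnayev"), 2))  # 0.94
```

A production system would go much further, scoring against the native Cyrillic form and phonetic models, but the core point survives even in this toy version: match scoring has to return shades of gray, not a binary yes/no.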
Brian: I haven’t heard about that. Carl: There’s a whole tower that’s tipping over. You should read about it. It’s a great story. Brian: Foundation’s not so solid. Carl: There’s a big lawsuit going on right now. Brian: Not the place you want to have a sagging tower, either. Carl: Not the place, but frankly, it’s really quite comparable, because I’ve seen some large systems that will go unnamed, where there’s legacy technology and people are perhaps unaware why it’s so important to move from Python version 2 to Python version 3. One of the key differences is Unicode compliance. So if I hear about a large-scale enterprise system that’s based on Python version 2, I’m immediately suspicious about whether it’s going to be suitable for a global audience. Brian: I think about, from an experience standpoint, inputs, when you’re providing inputs into forms, and understanding what people are typing in. If it’s a query form, obviously giving people back what they wanted and not necessarily what they typed in. We all take for granted things like spelling correction, and not just spelling correction, but in Google, when you type in something, it sometimes gives you something that’s beyond a spelling fix: “Did you mean X, Y, and Z?” I would think that being informed about what people are typing into your form fields and mining your query logs is worthwhile; this is something I do sometimes with clients when they’re trying to learn something. I actually just read an article today about dell.com, and the top query term on dell.com is ‘Google,’ which is a very interesting thing. I would be curious to know why people are typing that in. Are people actually trying to access Google, or are they trying to get some information? But the point is to understand the input side and to try to return some kind of logical output. 
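Carl’s Python 2 versus 3 point is easy to demonstrate: in Python 3 a `str` is a sequence of Unicode code points rather than raw bytes, so non-Latin text behaves correctly by default. A small illustration, using the Cyrillic spelling of the name discussed earlier:

```python
name = "Царнаев"  # the name Tsarnaev, natively written in Cyrillic

# In Python 3, len() counts Unicode code points, not bytes.
print(len(name))                   # 7 characters
print(len(name.encode("utf-8")))   # 14 bytes: each Cyrillic letter takes 2 bytes in UTF-8

# Case folding works across scripts, which byte-oriented legacy code gets wrong.
print(name.upper())                # ЦАРНАЕВ
```

A system that allocates one byte per character, or that treats text as opaque bytes, silently corrupts exactly the kind of multilingual data a border-control watch list contains.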
Whether it’s text analytics that’s providing that or it’s name-matching, it’s being aware of that, and it’s sad when you have these gaps like what happened in this border crossing case, where a name spelling is responsible for not flagging down these people. I mean, we put people on the moon and we get something like a name spelling wrong. It’s shocking in a way. I guess for those of us who are working in tech, we can understand how it might happen, but it’s scary that that’s still going on today. You’ve probably seen many others. Are you able to talk about them? Obviously, you have some clients in the intelligence field and probably government that you can’t talk about, but are there other examples of learning that’s happened, even if it’s not necessarily entity resolution, where you’ve put dots together with some of your platform? Carl: I’ll say the biggest lesson that I’ve learned from nearly two decades of working on government applications involving multilingual data is the importance of retaining as much of the information in its native form as possible. For example, there is a very large division of the CIA which is focused on collecting open source intelligence in the form of newspapers, magazines, the digital equivalents of those, radio broadcasts, TV broadcasts, and so on. It’s a unit which used to be known as the Foreign Broadcast Information Service, going back to World War II times, and today it’s called the Open Source Enterprise. They have a very large collection apparatus, and they produce some extremely high quality products which are summaries and translations from sources in other languages. In their workflow, previously they would collect information, say in Chinese or in Russian, and then do a translation or summary into English, but then they would discard the original, or the original would be hidden from their enterprise architecture for query purposes. 
I believe that is no longer the case, but retaining the pre-translation original, whether it’s open source, closed source, commercial, enterprise information, or government-related information, is really very important. That’s one lesson. The other lesson is appreciating the limits of machine translation. We’re increasingly seeing machine translation integrated into all kinds of information systems, but there needs to be a very sober appreciation of what is and what is not achievable and scalable by employing machine translation in your architecture. Brian: Can you talk at all about the translation? We have so much power now with NLP and what’s possible with the technology today. As I understand it, when we talk about translation, we’re talking about documents and things in the written word that are being translated from one language to another. But in terms of the spoken word, we’re communicating right now, and I’m going to ask you two questions: What do you know about NLP? And what do you know about NLP? In the first one, I had a little bit of attitude, which assumed that you don’t know too much about it, and in the second one, I was treating you as an expert. When this gets translated into text, it loses that context. Where are we with the ability to look at the context, the tone, the sentiment that’s behind that? I would imagine that’s partly why you’re talking about saving the original source. It might provide some context, like what the headlines were in the paper, which paper wrote it, and whether there is a bias with that paper; having some context from the full article that the report came from can provide additional context. Humans are probably better at doing some of that initial eyeball analysis, or having some idea of historically where this article is coming from, such that they can put it in some context as opposed to just seeing the words in a native language on a computer screen. Can you talk a little bit about that, or where we are with that? 
And am I incorrect that we’re not able to look at that sentiment? I don’t even know how that would translate, necessarily, unless you had a playback of a recording of someone saying the words. You have translation on top of the sentiment; now you’ve got two factors of difficulty right there in getting it accurate. Carl: My knowledge of voice and speech analysis is very naive. I do know it’s an area of huge investment and the technology is progressing very rapidly. I suspect that voice models are already being built that can distinguish between the two different intonations you used in asking that question and are able to match those against knowledge bases separately. What I can tell you is that context and nuance are equally important in both spoken and written human communication. My knowledge is stronger when it comes to the written form. Capturing all of the context means that you can do a much better job of the analytics. That’s why, say, when we’re analyzing a document, we’re looking not only at the individual word but at the sentence, the paragraph, and where the text appears. Is it in the body? Is it in a heading? Is it in a caption? Is it in a footnote? Or if we’re looking at, say, human-typed input (I think this is where your audience would care, if you’re designing forms or search boxes), there’s a lot that can be determined from how the input is typed. Again, especially when you’re thinking globally. We’re familiar with typing English and typing queries or completing forms with the letters A through Z and the numbers 0 through 9, but the fastest-growing new orthography today is emoji, and emoji offer a lot of very valuable information about the mindset of the author. Or say that we look at Chinese or Japanese, which are basically written with thousand-year-old emoji, where an individual must type a sequence of keys in order to create each of the Kanji or Hanzi that appears. There’s a great deal of information we can capture. 
For instance, if I’m typing a form in Japanese, say I’m filling out my last name, and my last name is Tanaka. Well, I’m going to type phonetically some characters that represent Tanaka, either in Latin letters or in one of the Japanese phonetic writing systems, and then I’m going to pick from a menu, or the system is going to automatically pick for me, the Japanese characters that represent Tanaka. But any really capable input system is going to keep both whatever I typed phonetically and the Kanji that I selected, because both of those have value, and the association between the two is not always obvious. There are similar ways of capturing context and meaning in other writing systems. For instance, let’s say I’m typing Arabic, not in Arabic script but with Roman letters. How I transform those Roman letters into the Arabic alphabet may vary depending on whether I’m using Gulf Arabic, Levantine Arabic, or Cairene Arabic, and, say, the IP address of the person doing the typing may factor into how I do that transformation and how I interpret those letters. There are examples for many other writing systems beyond the Latin alphabet. Brian: I meant to ask you, do you speak any other languages, or have you studied any other languages? Carl: I studied Japanese for a few years in high school. That’s really what got me into using computers to facilitate language understanding. I just never had the ability to really quickly memorize all of the Japanese characters, the radical components, and the variant pronunciations. After spending countless hours combing through paper dictionaries, I got very interested in building electronic dictionaries. My interest in electronic dictionaries eventually led to search engines and to lexicons, algorithms powered by lexicons, and then ultimately to machine learning and deep learning. Brian: I’m curious. I assume you need to employ either linguists or at least people that speak multiple languages. 
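Carl’s keep-both-forms idea can be sketched as a tiny data structure. The class and field names below are illustrative assumptions, not any real IME or Basis Technology API:

```python
from dataclasses import dataclass

@dataclass
class NameInput:
    """What the user typed phonetically, plus what they selected from the IME menu."""
    phonetic: str  # e.g. kana or romaji as typed
    selected: str  # the kanji the user chose

# Two people can type the same kana reading yet select different kanji surnames
# (多奈加 is a rarer spelling with the same "Tanaka" reading, used here for
# illustration), so storing only one of the two forms loses information.
a = NameInput(phonetic="たなか", selected="田中")
b = NameInput(phonetic="たなか", selected="多奈加")
print(a.phonetic == b.phonetic, a.selected == b.selected)  # True False
```

Indexing both fields lets a search engine match a query against either the phonetic form or the selected characters, which is exactly the association Carl says is "not always obvious."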
One concern with advanced analytics right now, and especially anything with prediction, is bias. I speak a couple of different languages, and I think one of the coolest things about learning another language is seeing the world through another context. Right now, I’m learning Polish, and there’s the concept of case, and it doesn’t just come down to learning the prefixes and suffixes that are added to words. Effectively, that’s what the output is, but it’s even understanding the nuance of when you would use that and what you’re trying to convey; and then when you relate it back to your own language, we don’t even have an equivalent for this. We would never divide this verb into two different sentiments. So you start to learn what you don’t even know to think about. I guess what I’m asking here is, how do you capture those things? Say, in our case, where I assume you’re an American and I am too, we have the English that we grew up with and our context for that. How do you avoid bias? Do you think about bias? How do you build these systems in terms of approaching it from a single language? Ultimately, this code is probably written in English, I assume. Not to say that the code would be written in a different language, but just the approach when you’re thinking about all these systems that have to do with language: where does integrating other people that speak other languages come in? Can you talk about that a little bit? Carl: Bias is incredibly important in any system that tries to respond to human behavior. We have our own innate cultural biases that we’re sometimes not even aware of. As you point out, it’s impossible to separate human language from the underlying culture and, in some cases, the geography and the lifestyle of the people who speak that language. Yes, this is something that we think about. I disagree with your remark about code being written in English. 
The most important pieces of code today are the frameworks for implementing various machine learning and deep learning architectures. These architectures are for the most part language- or domain-agnostic. The language bias tends to creep in as an artifact of the data that we collect. If I were to, say, harvest a million pages randomly on the internet, a very large percentage of those pages would be in English, out of proportion to the share of the planet’s population that speaks English, just because English is a common language for commerce, science, and so on. The bias comes in from the data, or it comes in from the mindset of the architect, who may do something as simple-minded as allocating only eight bits per character or deciding that Python version 2 is an acceptable development platform. Brian: Sure. I should say, I wasn’t so much speaking about the script, the code, as much as I was thinking about the humans behind it, their background, and the language that they speak, because these kinds of choices you’re talking about are informed by that person’s perspective. But thank you for clarifying. Carl: I agree with that observation as well. You’re certainly right. Brian: Do you have a way? You’re experts in this area and you’re obviously heavily invested in it. Are there things that you have to do to prevent that bias? In terms of, “We know what we don’t know, or we know enough about it to have a checklist, something we go through to make sure that we’re checking ourselves to avoid these things”? Or is it more in the data collection phase that you’re worried about, more so than the code that’s actually going to be taking the data and generating the software value at the other end? Is it more on the collection side that you’re thinking about? How do you prevent it?
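Carl's "eight bits per character" example is easy to demonstrate concretely: outside the basic Latin alphabet, one character routinely needs more than one byte, and single-byte encodings simply cannot represent most of the world's scripts. A quick sketch in Python:

```python
# Why "eight bits per character" is a biased architectural assumption:
# in UTF-8, non-ASCII characters take more than one byte each.
for t in ["cafe", "café", "田中"]:
    print(t, "->", len(t), "chars,", len(t.encode("utf-8")), "bytes")
# "café" is 4 characters but 5 UTF-8 bytes; "田中" is 2 characters, 6 bytes.

# A single-byte encoding like Latin-1 can hold "café" but not "田中" at all:
try:
    "田中".encode("latin-1")
except UnicodeEncodeError:
    print("latin-1 cannot encode 田中")
```

This is exactly the class of mistake that full Unicode compliance, which Carl returns to later, is meant to prevent.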
How do you check yourself, or tell a client or customer, “Here’s how we’ve tried to make sure that the quality of what we’re giving you is good. We did A, B, C, and D”? Maybe I’m making a bigger issue out of this than it is. I’m not sure. Carl: No, it is a big issue. The best way to minimize that cultural bias is by building global teams. That’s something that we’ve done from the very beginning days of our company. We have a company in which the team collectively speaks over 20 languages and comes from many different countries around the world, and we do business in many countries around the world. That’s just been an absolute necessity, because we produce products that are proficient in 40 different human languages. If you’re a large enterprise, more than 500 people, and you’re targeting markets globally, then you need to build a global team. That applies to all the different parts of the organization, including the executive team. It’s rare that you will see individuals who are, say, of American culture with no meaningful international experience being successful in any kind of global expansion. Brian: That’s pretty awesome that you have that many languages among the staff working at the company. That’s cool, and I think it does provide a different perspective. We talk about this even in the design field. Sometimes, early managers in design will want to hire a lot of people who look like they do, not necessarily physically, but in terms of skill set. One of the practices that I’ve always liked is actually getting people who aren’t like you, who don’t think like you, in order to intentionally tease out what you don’t know. You know they’re not going to look at the problem the same way you do, and you don’t necessarily know what the output will be, but you learn that there are other perspectives to have, so too many like-minded individuals doesn’t necessarily mean better. I think that’s cool.
Can you talk to me a little bit about one of the fun little nuggets that stuck in my head? I think you attributed it to somebody else, but it was the bit about getting insights from medium data. Can you talk to us about that? Carl: Sure. I should first start by crediting the individual who planted that idea in my head, Dr. Catherine Havasi of the MIT Media Lab, who’s also a cofounder of a company called Luminoso, which is a partner of ours. They do common sense understanding. The challenge with building truly capable text analytics from large amounts of unstructured text is obtaining sufficient volume. If you are a company on the scale of Facebook or Google, you have access to truly enormous amounts of text. I can’t quantify it in petabytes or exabytes, but it is a scale that is much greater than the typical global enterprise or Fortune 2000 company, who themselves may have very massive data lakes. But still, those data lakes are probably three to five orders of magnitude smaller than what Google or Facebook may have under their control. That intermediate-sized data, which is sloppily referred to as big data, we think of as medium data. We think about the challenge of allowing companies with medium data assets to obtain big data quality results, or business intelligence that’s comparable to something that Google or Facebook might be able to obtain. We do that by building models that are hybrid, that combine knowledge graphs or semantic graphs derived from very large open sources with the information that they can extract from their proprietary data lakes, using the open sources and the models that we build as amplifiers for their own data. Brian: I believe when we were talking, you mentioned a couple of companies that are building products on top of you. Difio, I think, was one, and Tamr, and Luminoso. So is that related to what these companies are doing? Carl: Yes, it absolutely is related.
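The "amplifier" idea Carl describes can be illustrated with a toy sketch: blend sparse evidence from a small proprietary corpus with a prior learned from a large open resource, trusting the local data more as it accumulates. Everything here is an illustrative assumption, not Basis Technology's actual models; the open lexicon is a stand-in for something like a knowledge graph derived from open sources.

```python
# Toy sketch of amplifying "medium data" with a large open-source prior.
# The lexicon and blending scheme are illustrative assumptions only.
open_prior = {"great": 1.0, "terrible": -1.0, "fine": 0.2}

def score(word, in_house_counts):
    """Blend sparse in-house evidence with the open-source prior."""
    prior = open_prior.get(word, 0.0)
    pos, neg = in_house_counts.get(word, (0, 0))
    total = pos + neg
    local = (pos - neg) / total if total else 0.0
    # With little local evidence, lean on the prior; with lots, trust the data.
    weight = total / (total + 5)
    return weight * local + (1 - weight) * prior

counts = {"great": (1, 0)}        # only one in-house observation
print(score("great", counts))     # mostly driven by the open prior
print(score("terrible", counts))  # no local evidence: pure prior
```

The design point is the one Carl makes: a company whose data lake is orders of magnitude smaller than Google's can still get usable results by letting open resources carry the long tail.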
Luminoso, in particular, is using this process of synthesizing results from their customers’ proprietary data with their own models. The Luminoso team grew out of the team at MIT that built ConceptNet, which is a very large semantic graph in multiple languages. But Difio as well is using this approach of federating both open and closed repositories by integrating a large number of connectors into their architecture. They have access to web content. They have access to various social media firehoses. They have access to proprietary data feeds from financial news providers. But then, they fuse that with internal sources of information that may come from places like SharePoint, or Dropbox, or Google Drive, or OneDrive, or your local file servers, and then give you a single view into all of this data. Brian: Awesome. I don’t want to keep you too long. This has been super informational for me, learning about the space that you’re in. Can you give us any closing thoughts, advice for product managers or analytics practitioners? We talked a little about thinking globally and some of those areas. Any other closing thoughts about delivering good experiences, leveraging text analytics, other things to watch out for? Any general thoughts? Carl: Sure. I’ll close with a few thoughts. One is repeating what I’ve said before about Unicode compliance. The fact that I have to state that again is somewhat depressing. It should be an absolute requirement today, and yet it continues to be overlooked. Secondly, thinking globally: with anything that you’re building, you’ve got to think about a global audience. I’ll share with you an anecdote. My company gives a lot of business to Eventbrite, who I would expect by now to have a fully globalized platform, but it turns out their utility for sending an email to everybody who signed up for an event doesn’t work in Japanese.
I found that out the hard way when I needed to send an email to everybody who had signed up for our conference in Tokyo. That was very disturbing, and I’m not afraid to say that live on a podcast. They need to fix it. You really don’t want customers finding out about that during a time of high stress and high pressure, and there’s just no excuse for it. Then my third point is with regard to natural language understanding. This is an incredibly exciting time to be involved with natural language, with human language, because the technology is changing so rapidly and the space of what is achievable is expanding so rapidly. My final point of advice is that hybrid architectures have been the best and continue to be the best. There’s a real temptation to say, “Just throw all of my text into a deep neural net and magic is going to happen.” That can be true if you have sufficiently large amounts of data, but most people don’t. Therefore, you’re going to get better results by using hybrids of simpler algorithmic machine learning architectures together with deep neural nets. Brian: That last tip, can you take that down one more notch? I assume you’re talking about a level of quality at the tail end of the technology implementation; there’s going to be some higher quality output. Can you translate what a hybrid architecture means in terms of a better product at the other end? What would be an example of that? Carl: Sure. It’s hard to do without getting too technical, but I’ll try, and I’ll try to use some examples in English. I think the traditional way of approaching deep nets has very much been to take a very simple, potentially deep and recursive neural network architecture and just throw data at it, especially images or audio waveforms. I throw my images in and I want to classify which ones were taken outdoors and which ones were taken indoors, with no traditional signal processing or image processing added before or after.
In the image domain, my understanding is that that kind of purist approach has delivered the best results. That’s what I’ve heard; I don’t have first-hand information about it. However, when it comes to human language in its written form, there’s a great deal of traditional processing of that text that boosts the effectiveness of the deep learning. That falls into a number of layers that I won’t go into, but just to give you one example, let’s talk about what we call orthography. The English language is relatively simple in that the orthography is generally quite simple. We’ve got the letters A through Z, in uppercase and lowercase, and that’s about it. But if you look inside, say, a PDF of English text, you’ll sometimes encounter things like ligatures: a lowercase F followed by a lowercase I, or two lowercase Fs together, will be replaced with a single glyph to make it look good in that particular typeface. If I take those glyphs and just throw them in with all the rest of my text, that actually complicates the job of the deep learning. If I take that FI ligature and convert it back to a separate F followed by an I, or the FF ligature and convert it back to FF, my deep learning doesn’t have to figure out what those ligatures are about. Now, that seems pretty obscure in English, but in other writing systems, especially Arabic, for instance, in which there’s an enormous number of ligatures, or Korean, or languages that have diacritical marks, processing those diacritical marks, ligatures, and orthographic variations using conventional means will make your deep learning run much faster and give you better results with less data. That’s just one example, but there’s a whole range of other text-processing steps, using algorithms that have been developed over many years, that simply make the deep learning work better, and that results in what we call a hybrid architecture.
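The ligature cleanup Carl describes corresponds to a standard Unicode operation: compatibility normalization (NFKC) decomposes the "fi" (U+FB01) and "ff" (U+FB00) ligature glyphs back into their separate letters, so a downstream model never has to learn what the ligatures mean. A minimal sketch with Python's standard library, not a claim about Basis Technology's actual pipeline:

```python
import unicodedata

# Text extracted from a PDF often contains single-glyph ligatures:
raw = "\ufb01nd the di\ufb00erence"   # "ﬁnd the diﬀerence"

# NFKC normalization splits the ligatures back into plain letters.
clean = unicodedata.normalize("NFKC", raw)
print(clean)  # find the difference
assert clean == "find the difference"
```

The same normalization step handles many of the orthographic variants Carl mentions, which is why it is a common conventional preprocessing layer in hybrid text pipelines.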
Brian: So it sounds like, as opposed to throwing it all in a pot and stirring, it’s, “Well, maybe I’m going to cut the carrots neatly to the right size and then throw them in the soup.” Carl: Exactly. Brian: You’re kind of helping the system do a better job at its work. Carl: That’s right, and it’s really about thinking about your data and understanding something about it before you throw it into the big brain. Brian: Exactly. Cool. Where can people follow you? I’ll put a link to Basis in the show notes, but are you on Twitter or LinkedIn somewhere? Where can people find you? Carl: LinkedIn tends to be my preferred social network. I was just never really good at summarizing complex thoughts into 140 characters, so that’s the best place to connect with me. Our site will tell you all about Basis Technology, and rosette.com is our text analytics platform, which is free for anybody to explore, and to the best of my knowledge, it is the most capable text analytics platform with the largest number of languages that you will find anywhere on the public internet. Brian: All right, I will definitely put those up in the show notes. This has been fantastic, I’ve learned a ton, and thanks for coming on Experiencing Data. Carl: Great talking with you, Brian. Brian: All right. Cheers. Carl: Cheers.
Episode Notes This week I got to sit down with Tareq Tamr, aka Ibn Abi Tareq, who has grown quite a sizeable following on social media by talking about Islam in ways that many people can relate to. Hmmm, wonder what a podcast like that sounds like?
Employees of large organizations know too well the pain of working on multiple systems. Disconnected data is costly and operationally inefficient, and it prohibits businesses from being able to compete with larger enterprises. Tamr helps some of the largest companies in the world integrate data from across disparate data silos in their organizations.
Humam and Tareq discuss how practicing Muslims need to do a better job of getting along. ============== www.wahedinvest.com Wahed Invest set out with an idea to provide our community with a reliable, transparent, and most importantly, accessible investment product. www.halfourdeen.com Half our Deen is the Private Muslim Matrimonial website. www.MyWassiyah.com Receive an exclusive discount by using the link below to sign up with MyWassiyah.com http://6mywassiyah.refr.cc/themadmamluks Make sure income from investments is halal! https://www.wahedinvest.com/ ============== E-mail us your comments, feedback, and questions at TheMadMamluks@gmail.com Follow us on Twitter: @TheMadMamluks Like us on Facebook: www.facebook.com/themadmamluks View pictures of our guests and studio on Instagram: TheMadMamluks *NEW* Subscribe to watch us Live on YouTube: www.youtube.com/themadmamluks
“When you are in a position where you are being forced to do things that are not consistent with your values, then you have to be more creative, and you have to find another way. There are lots of ways to be successful.” Take some time in your day to get to know Andy Palmer. Andy is the CEO and founder of Tamr and Koa Labs in Boston. He was named Boston's Angel Investor of the Year, and has helped found or fund more than 50 innovative companies in technology, healthcare, and life sciences. Jay and Andy have a frank conversation about the emotional challenges of entrepreneurship, touching on burnout, overwhelm, and maintaining your values while you navigate your entrepreneurial journey. Luckily, Andy shares some secrets about what has made him so successful, and some of the takeaways that he's learned from advising hundreds of founders in his role as an investor and business mentor. Tune in and listen as Jay and Andy explore how knowledge and mastery of one's self is the truest determinant of success if you want to be a thriving entrepreneur. Visit https://jayrooke.com/035-andy-palmer/ for resources and show notes. Check out my website: https://jayrooke.com/ Follow me on: Facebook: https://www.facebook.com/TribeCreator/ LinkedIn: https://www.linkedin.com/in/jayrooke/ Twitter: https://twitter.com/JayRooke Episode Highlights: 00:55 Boston Cambridge startup Innovation Ecosystem 08:08 Partnership model for Entrepreneurship 13:55 Know yourself 22:47 Learning balance 27:43 The best organizational model
With the proliferation of data sources to give a more comprehensive view of the information critical to your business, it is even more important to have a canonical view of the entities that you care about. Is customer number 342 in your ERP the same as Bob Smith on Twitter? Building a master data set helps you answer these questions reliably and simplify the process of building your business intelligence reports. In this episode the head of product at Tamr, Mark Marinelli, discusses the challenges of building a master data set, why you should have one, and some of the techniques that modern platforms and systems provide for maintaining it.
In this podcast @AndyPalmer from @Tamr sat with @Vishaltx from @AnalyticsWeek to talk about the emergence of, need for, and market for DataOps, a specialized capability emerging from the merger of data engineering and the DevOps ecosystem, driven by increasingly convoluted data silos and complicated processes. Andy shared his perspective on what some businesses and their leaders are doing wrong and how businesses need to rethink their data silos to future-proof themselves. This is a good podcast for any data leader thinking about cracking the code on getting high-quality insights from data. Timeline: 0:28 Andy's journey. 4:56 What's Tamr? 6:38 What's Andy's role in Tamr. 8:16 What's data ops? 13:07 Right time for business to incorporate data ops. 15:56 Data exhaust vs. data ops. 21:05 Tips for executives in dealing with data. 23:15 Suggestions for businesses working with data. 25:48 Creating buy-in for experimenting with new technologies. 28:47 Using data ops for the acquisition of new companies. 31:58 Data ops vs. dev ops. 36:40 Big opportunities in data science. 39:35 AI and data ops. 44:28 Parameters for a successful start-up. 47:49 What still surprises Andy? 50:19 Andy's success mantra. 52:48 Andy's favorite reads. 54:25 Final remarks. Andy's Recommended Reads: Enlightenment Now: The Case for Reason, Science, Humanism, and Progress by Steven Pinker https://amzn.to/2Lc6WqK The Three-Body Problem by Cixin Liu and Ken Liu https://amzn.to/2rQyPvp Andy's BIO: Andy Palmer is a serial entrepreneur who specializes in accelerating the growth of mission-driven startups. Andy has helped found and/or fund more than 50 innovative companies in technology, health care, and the life sciences. Andy's unique blend of strategic perspective and disciplined tactical execution is suited to environments where uncertainty is the rule rather than the exception. Andy has a specific passion for projects at the intersection of computer science and the life sciences.
Most recently, Andy co-founded Tamr, a next-generation data curation company, and Koa Labs, a start-up club in the heart of Harvard Square, Cambridge, MA. Specialties: Software, Sales & Marketing, Web Services, Service Oriented Architecture, Drug Discovery, Database, Data Warehouse, Analytics, Startup, Entrepreneurship, Informatics, Enterprise Software, OLTP, Science, Internet, eCommerce, Venture Capital, Bootstrapping, Founding Team, Venture Capital firm, Software companies, early-stage venture, corporate development, venture-backed, venture capital fund, world-class, stage venture capital About #Podcast: #FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future. Podcast link: https://futureofdata.org/emergence-of-dataops-age-andypalmer-futureofdata-podcast/ Wanna Join? If you or anyone you know wants to join in, register your interest and email us at info@analyticsweek.com Want to sponsor? Email us @ info@analyticsweek.com Keywords: #FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy
Welcome to Episode 22 of The VentureFizz Podcast, the flagship podcast of Boston's most trusted source for startup and tech jobs, news, and insights! Let's face it: in order for a tech ecosystem to thrive, you need people who are not only successful entrepreneurs, but also know how to pay it forward for the next generation. For the 22nd episode of our podcast, I interviewed Andy Palmer, who is the CEO & Co-Founder of Tamr in Cambridge. Andy is the definition of what it means to pay it forward. A successful serial entrepreneur in his own right, he is also a very active angel investor with close to 60 of his own investments (not to mention being an LP in other local funds, and mentoring lots of founders along the way). In this episode, we cover topics like: -How rugby relates to startups -The details behind Vertica, a very successful company that was acquired by HP -Why you shouldn't raise capital in the early days of your company -The common mistakes entrepreneurs make when pitching an idea -And lots more! Lastly, if you like the show, please remember to subscribe to and review us on iTunes, or your podcast player of choice! And make sure to follow Tamr on Twitter @Tamr_Inc and VentureFizz @VentureFizz.
Now access our FREE eBook “The Mentoring Round” featuring career insights from 25 of our CFO Thought Leaders: http://bit.ly/2Ga5Vfq
Procurement isn't usually seen as a "sexy" aspect of a business's operations. Procurement personnel are responsible for sourcing suppliers or vendors, determining criteria for success, negotiating deal terms, and tracking results and deliverables - all of which could be considered "underappreciated" work. This week, Tamr's Eliot Knudsen walks us through the ways that AI is making its way into the procurement process, and what it means for the future of this job function. For more executive interviews about the applications and implications of AI, visit: www.TechEmergence.com
For the first Utterly Biased Podcast, Dennis Keohane sits down with Mike Troiano to talk about his move from CMO at Actifio to a VC at G20 Ventures, including his investment thesis and role of investors. Dennis also talks with Andy Palmer of Tamr who just took on the role of founder partner at Founder Collective.
Today’s guest is Eliot Knudsen. Eliot’s background is in Data Science, Statistics, Mathematics and Machine Learning. As you will hear in the show, Eliot is currently a Field Engineer for Tamr, a leader in procurement data analytics. I invited Eliot on the show to talk in more detail about data, and how procurement organizations can access and consolidate primary sources of data to allow strategies and decisions to be made with data, rather than a hunch. Full show notes can be found at artofprocurement.com/data
A researcher at MIT’s Computer Science and Artificial Intelligence Lab, Michael Stonebraker has founded and led nine different big-data spin-offs, including VoltDB, Tamr and Vertica - the latter of which was bought by Hewlett Packard for $340 million. Now he’s bringing his insights to a new online course being offered this month through edX and MIT Professional Education. Co-taught by long-time business partner Andy Palmer, “Startup Success: How to Launch a Technology Company in 6 Steps” covers topics ranging from generating ideas and recruiting top talent to pitching VCs and negotiating deals - all in the span of three weeks.
Summary Eliot Knudsen, field engineer at Tamr, talks to me about their machine learning tool and a new way of examining data. Details Who he is and what he does; what is Tamr; working with data sources, the traditional way, the Tamr way; machine learning combined with human guidance; data quality and foreign languages; Thomson Reuters example, curating data, increasing speed; deploying Tamr; how Tamr works, db, java, web client; competitors; future work.