Podcasts about ETL

  • 196 podcasts
  • 383 episodes
  • 39m average episode duration
  • 5 new episodes per week
  • Latest episode: Dec 27, 2021

POPULARITY

[Chart of episode popularity by year, 2012–2022]


Best podcasts about ETL

Latest podcast episodes about ETL

AI and the Future of Work
Peter Fishman, co-founder and CEO of Mozart Data, discusses data pipelines and why they're defining the future of data analytics

AI and the Future of Work

Play Episode Listen Later Dec 27, 2021 37:07


Peter Fishman ("Fish"), co-founder and CEO of Mozart Data, had a vision for making it easy for any business to unlock the value of its data via a modern data stack. He and his co-founder believe rote data engineering work shouldn't require teams of in-house data engineers. Fish turned his PhD in Economics and passion for statistics into a successful, venture-backed YC company that is defining the future of data analytics.

Listen and learn...
  • Why Fish believes "not every business gets value out of their data... but every business can."
  • The role of data pipelines in automating the cleaning and transforming of data.
  • Fish's prediction for where humans will be needed for data analysis in a decade.
  • What Fish learned working with David Sacks at Yammer.
  • How bacon hot sauce inspired the founding of Mozart Data.

References in this episode:
  • Barr Moses from Monte Carlo on AI and the Future of Work
  • Derek Steer from Mode on AI and the Future of Work
  • Fivetran for simplifying data integration
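The "cleaning and transforming" role of data pipelines mentioned above can be sketched in a few lines of Python. This is a generic illustration with made-up records and helper names; it is not Mozart Data's product or API.

```python
# A toy extract-clean-transform pipeline. RAW_EVENTS, clean, and transform
# are all invented for illustration.

RAW_EVENTS = [
    {"email": " Ada@Example.com ", "amount": "19.99"},
    {"email": "bob@example.com", "amount": "5"},
    {"email": "", "amount": "3.50"},  # missing email: dropped during cleaning
]

def clean(record):
    """Normalize fields; return None for records that fail validation."""
    email = record["email"].strip().lower()
    if not email:
        return None
    return {"email": email, "amount": float(record["amount"])}

def transform(records):
    """Aggregate cleaned rows into a per-customer spend table."""
    totals = {}
    for r in records:
        totals[r["email"]] = totals.get(r["email"], 0.0) + r["amount"]
    return totals

cleaned = [c for c in (clean(r) for r in RAW_EVENTS) if c is not None]
print(transform(cleaned))  # {'ada@example.com': 19.99, 'bob@example.com': 5.0}
```

The point of the sketch is that the cleaning and aggregation steps are rote and mechanical, which is exactly the work the episode argues shouldn't require an in-house data engineering team.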

Talk Python To Me - Python conversations for passionate developers

Do you enjoy the "final 2 questions" I always ask at the end of the show? I think it's a great way to track the currents of the Python community. This episode focuses on one of those questions: "What notable PyPI package have you come across recently? Not necessarily the most popular one, but something that delighted you and that people should know about?" Our guest, Antonio Andrade, put together a GitHub repository cataloging guests' responses to this question over the past couple of years, so I invited him to come share the packages covered there. We touch on over 40 packages during this episode, so I'm sure you'll learn a few new gems to incorporate into your workflow.

Links from the show:
  • Antonio on Twitter: @AntonioAndrade
  • Notable PyPI Package Repo: github.com/xandrade/talkpython.fm-notable-packages

Antonio's recommended packages from this episode:
  • Sumy: Extract summary from HTML pages or plain texts: github.com
  • gTTS (Google Text-to-Speech): github.com

Packages discussed during the episode:
  1. FastAPI - A-W-E-S-O-M-E web framework for building APIs: fastapi.tiangolo.com
  2. Pythonic - Graphical automation tool: github.com
  3. umap-learn - Uniform Manifold Approximation and Projection: readthedocs.io
  4. Tortoise ORM - Easy async ORM for python, built with relations in mind: tortoise.github.io
  5. Beanie - Asynchronous Python ODM for MongoDB: github.com
  6. Hathi - SQL host scanner and dictionary attack tool: github.com
  7. Plotext - Plots data directly on terminal: github.com
  8. Dynaconf - Configuration Management for Python: dynaconf.com
  9. Objexplore - Interactive Python Object Explorer: github.com
  10. AWS Cloud Development Kit (AWS CDK): docs.aws.amazon.com
  11. Luigi - Workflow mgmt + task scheduling + dependency resolution: github.com
  12. Seaborn - Statistical Data Visualization: pydata.org
  13. CuPy - NumPy & SciPy for GPU: cupy.dev
  14. Stevedore - Manage dynamic plugins for Python applications: docs.openstack.org
  15. Pydantic - Data validation and settings management: github.com
  16. pipx - Install and Run Python Applications in Isolated Environments: pypa.github.io
  17. openpyxl - A Python library to read/write Excel 2010 xlsx/xlsm files: readthedocs.io
  18. HttpPy - More comfortable requests with python: github.com
  19. rich - Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal: readthedocs.io
  20. PyO3 - Using Python from Rust: pyo3.rs
  21. fastai - Making neural nets uncool again: fast.ai
  22. Numba - Accelerate Python Functions by compiling Python code using LLVM: numba.pydata.org
  23. NetworkML - Device Functional Role ID via Machine Learning and Network Traffic Analysis: github.com
  24. Flask-SQLAlchemy - Adds SQLAlchemy support to your Flask application: palletsprojects.com
  25. AutoInvent - Libraries for generating GraphQL API and UI from data: autoinvent.dev
  26. trio - A friendly Python library for async concurrency and I/O: readthedocs.io
  27. Flake8-docstrings - Extension for flake8 which uses pydocstyle to check docstrings: github.com
  28. Hotwire-django - Integrate Hotwire in your Django app: github.com
  29. Starlette - The little ASGI library that shines: github.com
  30. tenacity - Retry code until it succeeds: readthedocs.io
  31. pySerial - Python Serial Port Extension: github.com
  32. Click - Composable command line interface toolkit: palletsprojects.com
  33. Pytest - Simple powerful testing with Python: docs.pytest.org
  34. testcontainers-python - Test almost anything that can run in a Docker container: github.com
  35. cibuildwheel - Build Python wheels on CI with minimal configuration: readthedocs.io
  36. async-rediscache - An easy to use asynchronous Redis cache: github.com
  37. seinfeld - Query a Seinfeld quote database: github.com
  38. notebook - A web-based notebook environment for interactive computing: readthedocs.io
  39. dagster - A data orchestrator for machine learning, analytics, and ETL: dagster.io
  40. bleach - An easy safelist-based HTML-sanitizing tool: github.com
  41. flynt - string formatting converter: github.com

Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe on YouTube: youtube.com
Follow Talk Python on Twitter: @talkpython
Follow Michael on Twitter: @mkennedy

Sponsors: Coiled, TopTal, AssemblyAI, Talk Python Training

Fréquence Plus : Le Buzz
Le Buzz of December 17

Fréquence Plus : Le Buzz

Play Episode Listen Later Dec 17, 2021 3:57


A spotlight on the association La Braillotte, founded in 2015 to showcase and defend the identity of Franche-Comté. Since then, the association has published books ("Moi j'parle le comtois ! ...pas toi ?") and calendars, and the 2022 edition is ready! We discover it with Sophie Garnier, founder of the association La Braillotte.

Angelneers: Insights From Startup Builders
Hightouch: Pioneering the New Era of Operational Analytics with Kashish Gupta

Angelneers: Insights From Startup Builders

Play Episode Listen Later Dec 17, 2021 49:02


Reverse ETL is the hot new trend within the modern data stack. ETL tools like Fivetran, Stitch, and Matillion make it easy to set up and send data to a warehouse with the click of a button. As software firms of all sizes generate more data than ever, their data warehouses are becoming increasingly important in facilitating the rise of operational analytics across internal organizations. Hightouch is a data platform that helps users sync the customer data in their data warehouse to SaaS sales and marketing tools such as HubSpot, Salesforce, Marketo, Zendesk, Gainsight, and others. We wrap up Season 2 of our podcast with an interview with Kashish Gupta, co-founder and co-CEO of Hightouch, discussing the key innovation in data warehouse technology that enabled empowering business teams with operational analytics.

https://hightouch.io/blog/
kashgupta.com/
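As a rough sketch of the reverse-ETL pattern described above, the snippet below reads modeled rows back out of a warehouse and pushes them into a SaaS tool. SQLite stands in for the warehouse, and CrmClient is a made-up stub, not Hightouch's actual API; a real sync would call the destination's REST API and handle batching, rate limits, and incremental state.

```python
import sqlite3

class CrmClient:
    """Stand-in for a SaaS tool's API client (e.g. a CRM). Purely illustrative."""
    def __init__(self):
        self.contacts = {}

    def upsert_contact(self, email, properties):
        # A real client would POST to the vendor's API here.
        self.contacts[email] = properties

# The "warehouse": an in-memory SQLite database with one modeled table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dim_users (email TEXT, lifetime_value REAL)")
warehouse.executemany("INSERT INTO dim_users VALUES (?, ?)",
                      [("a@x.com", 120.0), ("b@x.com", 40.0)])

# Reverse ETL: warehouse rows flow *out* to the operational tool.
crm = CrmClient()
for email, ltv in warehouse.execute("SELECT email, lifetime_value FROM dim_users"):
    crm.upsert_contact(email, {"lifetime_value": ltv})

print(len(crm.contacts))  # 2
```

The direction of data flow is the whole point: classic ETL/ELT moves data into the warehouse, while reverse ETL treats the warehouse as the source of truth and syncs it outward.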

Education Evolution
87. Empowering Learners & Educating Their Guides

Education Evolution

Play Episode Listen Later Dec 7, 2021 42:34


We know that learning is challenging for many kids, especially when schools rely on a prescribed curriculum that's voted on from the top, not designed with individual kids' needs in mind. Seeing all the gaps in the education system has led many to take action to support our colorful, mismatched learners. I'm one of those, and so are this week's guests. This week we're talking to three amazing powerhouses in educational change: Kathleen McClaskey, CEO and chief learning officer of Empower the Learner; Julie Hartman, chief mindfulness officer; and Dr. Hillary Goldthwait-Fowles, chief accessibility and technology consultant, who rounds out the team. In this episode, we talk about the roadblocks to assistive technology, teaching kids mindfulness and how to empower them, being more compassionate parents and teachers, preconceived notions about learning, and so much more. This is such an eye-opening and thought-provoking conversation. Be sure to tune in!

About Kathleen McClaskey
Kathleen McClaskey, M.Ed. is CEO and Chief Learning Officer of Empower the Learner, LLC, founder of Make Learning Personal, co-author of the bestsellers Make Learning Personal and How to Personalize Learning, and contributing author to 100 No-Nonsense Things that ALL Teachers Should STOP Doing. She is an innovative thought leader, international speaker, professional developer, and Universal Design for Learning (UDL) consultant with over 35 years of experience in creating learner-centered environments as a teacher, K-12 technology administrator, and consultant. Kathleen is passionate about empowering ALL learners to thrive with tools, skills, and practices so they become self-directed learners, learners with agency, who are future-ready for college, career, and life.

About Julie Hartman
Julie Hartman, Chief Mindfulness Officer for Empower the Learner, is the founder of The Mindful Learner Program and a Life and Mindfulness Coach with over 20 years of experience coaching, mentoring, and teaching.
Julie is an eternal optimist dedicated to helping others realize their own inherent worth and empowerment. She designs, leads, and teaches individual and group coaching programs, classes and workshops, meditation groups, and coaching intensives. She is committed to bringing the best of what she is living and learning to her work and continually strives to be a positive leader in the fields of personal growth, mindfulness, and transformation.

About Dr. Hillary Goldthwait-Fowles, ATP
Hillary Goldthwait-Fowles is an accessibility accomplice specializing in assistive and inclusive technology, universal design for learning, and accessible educational materials. She has been in the field of education for 26 years, as a special education teacher and as an assistive technology specialist. She is also an adjunct faculty member, course designer, and subject matter expert at the University of New England and the University of Maine at Farmington. She is a firm believer that educators have been prepared backward to teach in education, which excludes children who do not "fit the mold," and recognizes the intentionality of this harmful design. She serves as the Chief Accessibility and Technology Consultant for ETL. Home is where her heart is: Saco, Maine, with her husband, son and stepson (who have both left the nest), and cats.
Jump in the Conversation:
  • [2:28] What is Empower the Learner (ETL)
  • [3:04] How ETL was formed
  • [4:29] ETL's process for getting kids to understand who they are and how they learn
  • [6:53] It's wrong to teach one way
  • [7:32] If we come at it from a place of love, we can help kids be seen and heard
  • [8:49] Most commonly recommended tools
  • [10:33] We need to presume confidence for all kids
  • [11:39] Mindfulness in personal development
  • [15:15] Using mindfulness for self-regulation
  • [16:35] Examples of how to apply this learning
  • [20:22] Preconceived notions in learning and labels
  • [21:27] The goal is the same; the means to get to that goal is flexible
  • [24:52] We're creating more kids with mental health challenges than ever before
  • [26:30] If we're not empowering teachers, how can they empower kids
  • [27:29] Turbo Time
  • [30:25] What people need to know about UDL
  • [32:51] Magic Wand Moment

Links & Resources:
  • Empower the Learner
  • Register for the Book Creator webinar
  • Book Creator
  • Empower the Learner template and book
  • Follow Kathleen on LinkedIn
  • Follow Hillary on LinkedIn
  • Follow Julie on LinkedIn
  • Mia Mingus's Disability Justice
  • Alice Wong on Identity, Disability Justice, and Her New Anthology
  • Joy Zabala and Tools to Task
  • Limitless by Jim Kwik
  • Judy Heumann: Washington Post article on the "badass mother of disability rights"
  • Crip Camp, 2020 film on Netflix
  • Being Heumann, book by Judy Heumann
  • Haben Girma, first Deafblind graduate of Harvard Law School
  • Email Maureen
  • Maureen's TEDx: Changing My Mind to Change Our Schools
  • The Education Evolution
  • Facebook: Follow Education Evolution
  • Twitter: Follow Education Evolution
  • LinkedIn: Follow Education Evolution
  • EdActive Collective
  • Maureen's book: Creating Micro-Schools for Colorful Mismatched Kids
  • Micro-school feature on Good Morning America
  • The Micro-School Coalition
  • Facebook: The Micro-School Coalition
  • LEADPrep

Catalog & Cocktails
Modern Data Stack: Technology, Methodology, or both? w/ Nick Schrock

Catalog & Cocktails

Play Episode Listen Later Dec 2, 2021 59:54


The modern data stack is often defined by the types of technologies that exist within it: cloud-based, open-source, low/no-code tools, ELT, and reverse ETL. But surely there's more to it… isn't there? What holds the modern data stack together and makes it the architecture of choice for so many data-driven enterprises? Join Tim, Juan, and special guest Nick Schrock, founder of Elementl, creator of Dagster, and co-creator of GraphQL, to chat about all things MDS.

This episode will feature:
  • Is the modern data stack a methodology or a set of disparate cloud technologies?
  • Thoughts on consolidation among MDS tools
  • Describe your reaction upon glancing at Matt Turck's latest data landscape diagram
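The ELT ordering mentioned in the description, load raw data first and transform it inside the warehouse, can be illustrated with a toy example. SQLite stands in for the warehouse here, and the table names are invented.

```python
import sqlite3

# ELT, not ETL: land the raw data unchanged (load), then transform it with
# the warehouse's own SQL engine at query time.

db = sqlite3.connect(":memory:")

# 1. Extract + Load: raw strings go in exactly as they arrived.
db.execute("CREATE TABLE raw_orders (sku TEXT, qty TEXT)")
db.executemany("INSERT INTO raw_orders VALUES (?, ?)",
               [("A", "2"), ("A", "3"), ("B", "1")])

# 2. Transform in-warehouse: cast and aggregate with SQL.
rows = db.execute("""
    SELECT sku, SUM(CAST(qty AS INTEGER)) AS total_qty
    FROM raw_orders
    GROUP BY sku
    ORDER BY sku
""").fetchall()

print(rows)  # [('A', 5), ('B', 1)]
```

Contrast with classic ETL, where the cast-and-aggregate step would run in a separate pipeline before anything is loaded; deferring the transform is what lets ELT tools keep ingestion dumb and cheap.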

Screaming in the Cloud
Keeping the Chaos Searchable with Thomas Hazel

Screaming in the Cloud

Play Episode Listen Later Nov 30, 2021 44:43


About Thomas
Thomas Hazel is Founder, CTO, and Chief Scientist of ChaosSearch. He is a serial entrepreneur at the forefront of communication, virtualization, and database technology and the inventor of ChaosSearch's patented IP. Thomas has also patented several other technologies in the areas of distributed algorithms, virtualization, and database science. He holds a Bachelor of Science in Computer Science from the University of New Hampshire, is a Hall of Fame Alumni Inductee, and founded both student and professional chapters of the Association for Computing Machinery (ACM).

Links:
ChaosSearch: https://www.chaossearch.io

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by my friends at ThinkstCanary. Most companies find out way too late that they've been breached. ThinkstCanary changes this, and I love how they do it. Deploy canaries and canary tokens in minutes and then forget about them. What's great is the attackers tip their hand by touching them, giving you one alert, when it matters. I use it myself and I only remember this when I get the weekly update with a "we're still here, so you're aware" from them. It's glorious! There is zero admin overhead to this, there are effectively no false positives unless I do something foolish. Canaries are deployed and loved on all seven continents. You can check out what people are saying at canary.love. And their Kubeconfig canary token is new and completely free as well. You can do an awful lot without paying them a dime, which is one of the things I love about them. It is useful stuff and not an, "ohh, I wish I had money." It is spectacular! Take a look; that's canary.love, because it's genuinely rare to find a security product that people talk about in terms of love. It really is a unique thing to see. Canary.love. Thank you to ThinkstCanary for their support of my ridiculous, ridiculous nonsense.

Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R, because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high-performance cloud compute at a price that—while sure, they claim it's better than AWS's pricing—and when they say that, they mean it is less money. Sure, I don't dispute that, but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting vultr.com/screaming, and you'll receive $100 in credit. That's v-u-l-t-r dot com slash screaming.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn.
This promoted episode is brought to us by our friends at ChaosSearch. We've been working with them for a long time; they've sponsored a bunch of our nonsense, and it turns out that we've been talking about them to our clients since long before they were a sponsor, because it actually does what it says on the tin. Here to talk to us about that in a few minutes is Thomas Hazel, ChaosSearch's CTO and founder. First, Thomas, nice to talk to you again, and as always, thanks for humoring me.

Thomas: [laugh]. Hi, Corey. Always great to talk to you. And I enjoy these conversations that sometimes go up and down, left and right, but I look forward to all the fun we're going to have.

Corey: So, my understanding of ChaosSearch is probably a few years old because it turns out, I don't spend a whole lot of time meticulously studying your company's roadmap in the same way that you presumably do. When last we checked in with what the service did-slash-does, you are effectively solving the problem of data movement and querying that data. The idea behind data warehouses is generally something that's shoved onto us by cloud providers where, "Hey, this data is going to be valuable to you someday." Data science teams are big proponents of this because when you're storing that much data, their salaries look relatively reasonable by comparison. And the ChaosSearch vision was, instead of copying all this data out of an object store and storing it on expensive disks, and replicating it, et cetera, what if we queried it in place in a somewhat intelligent manner?

So, you take the data and you store it, in this case, in S3 or equivalent, and then just query it there, rather than having to move it around all over the place, which of course then incurs data transfer fees, you're storing it multiple times, and it's never in quite the format that you want it. That was the breakthrough revelation. You were Elasticsearch—now OpenSearch—API compatible, which was great. And that was, sort of, the state of the art a year or two ago. Is that generally correct?

Thomas: No, you nailed our mission statement. You're exactly right. You know, the value of cloud object stores, S3, the elasticity, the durability, all these wonderful things; the problem was you couldn't get any value out of it, and you had to move it out to these siloed solutions, as you indicated. So, you know, our mission was exactly that: transform customers' cloud storage into an analytical database, a multi-model analytical database, where our first use case was search and log analytics, replacing the ELK stack and also replacing the data pipeline, the schema management, et cetera. We automate the entire step, raw data to insights.

Corey: It's funny we're having this conversation today. Earlier today, I was trying to get rid of a relatively paltry 200 gigs or so of small files on an EFS volume—you know, Amazon's version of NFS; it's like an NFS volume except you're paying Amazon for the privilege—great. And it turns out that it's a whole bunch of operations across a network on a whole bunch of tiny files, so I had to spin up other instances that were not getting backed by spot terminations, and just firing up a whole bunch of threads. So, now the load average on that box is approaching 300, but it's plowing through, getting rid of that data finally.

And I'm looking at this saying this is a quarter of a terabyte. Data warehouses are in the petabyte range. Oh, I begin to see aspects of the problem. Even searching that kind of data using traditional tooling starts to break down, which is sort of the revelation that Google had 20-some-odd years ago, and other folks have since solved for, but this is the first time I've had significant data that wasn't just easily searched with a grep. For those of you in the Unix world who understand what that means, condolences. We're having a support group meeting at the bar.

Thomas: Yeah.
And you know, I always thought, what if you could make cloud object storage like S3 high performance and really transform it into a database? And so that warehouse capability, that's great. We like that. However, to manage it, to scale it, to configure it, to get the data into that, was the problem.

That was the promise of a data lake, right? This simple in, and then this arbitrary schema-on-read generic out. The problem next came: it became swampy, it was really hard, and that promise was not delivered. And so what we're trying to do is get all the benefits of the data lake: simple in, so many services naturally stream to cloud storage. Shoot, I would say every one of our customers is putting their data in cloud storage because their data pipeline to their warehousing solution or Elasticsearch may go down and they're worried they'll lose the data.

So, what we say is, what if you just said activate that data lake and get that ELK use case, get that BI use case without that data movement, as you indicated, without that ETL-ing, without that data pipeline that you're worried is going to fall over. So, that vision has been Chaos. Now, we haven't talked in, you know, a few years, but the idea is that we're growing beyond just going after logs; we're going into new use cases, new opportunities, and I'm looking forward to discussing that with you.

Corey: It's a great answer that—though I have to call out that I am right there with you as far as inappropriately using things as databases. I know that someone is going to come back and say, "Oh, S3 is a database. You're dancing around it. Isn't that what Athena is?" Which is named, of course, after the Greek goddess of spending money on AWS? And that is a fair question, but to my understanding, there's a schema story behind that that does not apply to what you're doing.

Thomas: Yeah, and what is so crucial is that we like the relational access. The time-cost complexity to get it into that, as you mentioned, scaled access, I mean, it could take weeks, months to test it, to configure it, to provision it, and imagine if you got it wrong; you'd have to redo it again. And so our unique service removes all that data pipeline schema management. And because of our innovation, because of our service, you do all schema definition on the fly, virtually, with what we call views on your indexed data, which you can publish as an Elastic index pattern for that consumption, or a relational table for that consumption. And that's kind of leading the witness into things that we're coming out with this quarter into 2022.

Corey: I have to deal with a little bit of, I guess, shame here because yeah, I'm doing exactly what you just described. I'm using Athena to wind up querying our customers' Cost and Usage Reports, and we spend a couple hundred bucks a month on AWS Glue to wind up massaging those into the way that they expect it to be. And it's great. Ish. We hook it up to Tableau and can make those queries from it, and all right, it's great. It just, burrr goes the money printer, and we somehow get access and insight to a lot of valuable data. But even that is knowing exactly what the format is going to look like. Ish. I mean, Cost and Usage Reports from Amazon are sort of aspirational when it comes to schema sometimes, but here we are. And that's been all well and good.

But now the idea of log files, even looking at the base case of sending logs from an application, great. Nginx, or Apache, or [unintelligible 00:07:24], or any of the various web servers out there all tend to use different logging formats just to describe the same exact things. Start spreading that across custom in-house applications, and getting signal from that is almost impossible. "Oh," people say, "so we'll use a structured data format." Now, you're putting logging and structuring requirements on application developers who don't care in the first place, and now you have a mess on your hands.

Thomas: And it really is a mess. And that challenge is, it's so problematic. And schemas changing. You know, we have customers, and one of the reasons why they go with us is their log data is changing; they didn't expect it. Well, in your data pipeline and your Athena database, that breaks. That brings the system down.

And so our system uniquely detects that and manages that for you, and then you can pick and choose how you want to export in these views dynamically. So, you know, it's really not rocket science, but the problem is, a lot of the technology that we're using is designed for static, fixed thinking. And then to scale it is problematic and time-consuming. So, you know, Glue is a great idea, but it has a lot of sharp [pebbles 00:08:26]. Athena is a great idea but also has a lot of problems.

And so that data pipeline, you know, it's not for digitally native, active, new use cases, new workloads coming up hourly, daily. You think about this long-term; so a lot of that data prep pipelining is something we address so uniquely, but really where the customer cares is the value of that data, right? And so if you're spending toil trying to get the data into a database, you're not answering the questions, whether it's for security, for performance, for your business needs. That's the problem. And you know, that agility, that time-to-value is where we're very uniquely coming in, because we start where your data is raw and we automate the process all the way through.

Corey: So, when I look at the things that I have stuffed into S3, they generally fall into a couple of categories.
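Thomas's point about views and shifting schemas can be sketched as schema-on-read: keep the raw lines untouched and bind a parser (a "view") only at query time, so a format change breaks one view rather than the whole pipeline. The formats, regexes, and field names below are invented for illustration; this is not ChaosSearch's implementation.

```python
import re

# Raw lines stay exactly as they landed; parsing happens at read time.
RAW = [
    '127.0.0.1 - - [30/Nov/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 512',  # nginx-style
    'level=error msg="disk full" host=web-1',                               # key=value app log
]

NGINX = re.compile(r'^(?P<ip>\S+) .*"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d+)')
KV = re.compile(r'(\w+)=("[^"]*"|\S+)')

def view(line):
    """Schema-on-read: parse one raw line into a dict; unknown formats pass through."""
    m = NGINX.match(line)
    if m:
        return m.groupdict()
    kv = dict(KV.findall(line))
    if kv:
        return {k: v.strip('"') for k, v in kv.items()}
    return {"raw": line}

parsed = [view(line) for line in RAW]
print(parsed[0]["status"], parsed[1]["level"])  # 200 error
```

Because nothing is transformed on write, adding a new log format means adding a parser, not re-ingesting or migrating the stored data.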
There are a bunch of logs for things I never asked for nor particularly wanted, but AWS is aggressive about that, first routing through CloudTrail so you can get charged 50 cents per gigabyte ingested. Awesome. And of course, large static assets, images I have done something to, colloquially now known as shitposts, which is great. Other than logs, what could you possibly be storing in S3 that lends itself to, effectively, the type of analysis that you built around this?

Thomas: Well, our first use case was the classic log use cases: app logs, web service logs. I mean, CloudTrail, it's famous; we had customers that gave up on Elastic, and definitely gave up on relational, where you can do a couple changes and your permutation of attributes for CloudTrail is going to put you to your knees. And people just say, "I give up." Same thing with Kubernetes logs. And so it's the classic—whether it's CSV, whether it's JSON, whether it's log types, we auto-discover all that.

We also allow you, if you want to override that, to change the parsing capabilities through a UI wizard. We do discover what's in your buckets. That term data swamp, and not knowing what's in your bucket: we do a facility that will index that data and actually create a report for you so you know what's in there. Now, if you have text data, if you have log data, if you have BI data, we can bring it all together, but the real pain is at the scale. So classically, app logs, system logs, many devices sending IoT-type streams is where we really come in—Kubernetes—where they're dealing with terabytes of data per day, and managing an ELK cluster at that scale. Particularly on a Black Friday.

Shoot, some of our customers—Klarna is one of them; credit card payments—they're ramping up for Black Friday, and one of the reasons why they chose us is our ability to scale when maybe you're doing a terabyte or two a day and then it goes up to twenty, twenty-five. How do you test that scale? How do you manage that scale?

And so for us, the data streams are, traditionally with our customers, the well-known log types, at least in the log use cases. And the challenge is scaling it, is getting access to it, and that's where we come in.

Corey: I will say the last time you were on the show a couple of years ago, you were talking about the initial logging use case and you were speaking, in many cases aspirationally, about where things were going. What a difference a couple of years has made. Instead of talking about what hypothetical customers might want, or what they might be able to do, you're just able to name-drop them off the top of your head. You have scaled to approximately ten times the number of employees you had back then. You've—

Thomas: Yep. Yep.

Corey: —raised, I think, a total of—what, 50 million?—since then.

Thomas: Uh, 60 now. Yeah.

Corey: Oh, 60? Fantastic.

Thomas: Yeah, yeah.

Corey: Congrats. And of course, how do you do it? By sponsoring Last Week in AWS, as everyone should. I'm taking clear credit for that every time someone announces a round; that's the game. But no, there is validity to it because telling fun stories and sponsoring exciting things like this only carry you so far. At some point, customers have to say, yeah, this is solving a pain that I have; I'm willing to pay you money to solve it.

And you've clearly gotten to a point where you are addressing the needs of those customers at a pretty fascinating clip. It's bittersweet from my perspective because it seems like the majority of your customers have not come from my nonsense anymore. They're finding you through word of mouth, they're finding you through more traditional—read as boring—ad campaigns, et cetera, et cetera. But you've built a brand that extends beyond just me. I'm no longer viewed as the de facto ombudsperson for any issue someone might have with ChaosSearch on Twitter. It's kind of, "Aww, the company grew up. What happened there?"

Thomas: No, [laugh] listen, you were great.
We reached out to you to tell our story, and I got to be honest. A lot of people came by, said, “I heard something on Corey Quinn's podcasts,” or et cetera. And it came a long way now. Now, we have, you know, companies like Equifax, multi-cloud—Amazon and Google.They love the data lake philosophy, the centralized, where use cases are now available within days, not weeks and months. Whether it's logs and BI. Correlating across all those data streams, it's huge. We mentioned Klarna, [APM Performance 00:13:19], and, you know, we have Armor for SIEM, and Blackboard for [Observers 00:13:24].So, it's funny—yeah, it's funny, when I first was talking to you, I was like, “What if? What if we had this customer, that customer?” And we were building the capabilities, but now that we have it, now that we have customers, yeah, I guess, maybe we've grown up a little bit. But hey, listen to you're always near and dear to our heart because we remember, you know, when you stop[ed by our booth at re:Invent several times. And we're coming to re:Invent this year, and I believe you are as well.Corey: Oh, yeah. But people listening to this, it's if they're listening the day it's released, this will be during re:Invent. So, by all means, come by the ChaosSearch booth, and see what they have to say. For once they have people who aren't me who are going to be telling stories about these things. And it's fun. Like, I joke, it's nothing but positive here.It's interesting from where I sit seeing the parallels here. For example, we have both had—how we say—adult supervision come in. You have a CEO, Ed, who came over from IBM Storage. I have Mike Julian, whose first love language is of course spreadsheets. And it's great, on some level, realizing that, wow, this company has eclipsed my ability to manage these things myself and put my hands-on everything. And eventually, you have to start letting go. It's a weird growth stage, and it's a heck of a transition. But—Thomas: No, I love it. 
You know, I mean, I think when we were talking, we were maybe 15 employees. Now, we're pushing 100. We brought on Ed Walsh, who's an amazing CEO. It's funny, I told him about this idea—I invented this technology roughly eight years ago—and he's like, "I love it. Let's do it." And I wasn't ready to do it.

So, you know, five, six years ago, I started the company, always knowing that, you know, I'd give him a call once we got the plane up in the air. And it's been great to have him here for the next level up, right, of execution and growth and business development and sales and marketing. So, you're exactly right. I mean, we were a young pup several years ago when we were talking to you and, you know, we're a little bit older, a little bit wiser. But no, it's great to have Ed here. And just the leadership in general; we've grown immensely.

Corey: Now, we are recording this in advance of re:Invent, so there's always the question of, "Wow, are we going to look really silly based upon what is being announced when this airs?" Because it's very hard to predict some things that AWS does. And let's be clear, I always stay away from predictions, just because first, I have a bit of a knack for being right. But also, when I'm right, people will think, "Oh, Corey must have known about that and is leaking," whereas if I get it wrong, I just look like a fool. There's no win for me if I start doing the predictive dance on stuff like that.

But I have to level with you, I have been somewhat surprised that, at least as of this recording, AWS has not moved more in your direction because storing data in S3 is kind of their whole thing, and querying that data through something that isn't Athena has been a bit of a reach for them that they're slowly starting to wrap their heads around.
But their UltraWarm nonsense—which is just, okay, great naming there—what is the point of continually having a model where, oh yeah, we're going to just age out the stuff that isn't actively being used into S3, rather than coming up with a way to query it there? Because you've done exactly that, and please don't take this as anything other than a statement of fact: they have better access to what S3 is doing than you do. You're forced to deal with this thing entirely from a public API standpoint, which is fine. They can theoretically change the behavior of aspects of S3 to unlock these use cases if they chose to do so. And they haven't. Why is it that you're the only folks that are doing this?

Thomas: No, it's a great question, and I'll give them props for continuing to push the data lake [unintelligible 00:17:09] to the cloud providers' S3 because it was really where I saw the world. Lakes, I believe in. I love them. They love them. However, they promote moving the data out to get access, and it seems so counterintuitive: why wouldn't you leave it in and put these services on top, make them more intelligent? So, it's funny, I've trademarked 'Smart Object Storage,' I actually trademarked—I think you [laugh] were a part of this—'UltraHot,' right? Because why would you want UltraWarm when you can have UltraHot?

And the reason, I feel, is that if you're using Parquet for Athena [unintelligible 00:17:40] store, or Lucene for Elasticsearch, these two index technologies were not designed for cloud storage, for real-time streaming off of cloud storage. So, the trick is, you have to build UltraWarm, get it off of what they consider cold S3 into warmer memory or SSD-type access. What we did, the invention I created, was that first read is hot. That first read is fast.

Snowflake is a good example. They give you a ten-terabyte demo example, and if you have a big instance and you do that first query, maybe several orders or groups, it could take an hour to warm up.
The second query is fast. Well, what if the first query is in seconds as well? And that's where we really spent the last five, six years building out the tech and the vision behind this. Because I like to say, you go to a doctor and say, "Hey, Doc, every single time I move my arm, it hurts." And the doctor says, "Well, don't move your arm."

It's things like that, to your point. It's like, why wouldn't they? I would argue, one, you have to believe it's possible—we're proving that it is—and two, you have to have the technology to do it. Not just the index, but the architecture. So, I believe they will go this direction. You know, little birdies always say that all these companies understand this need.

Shoot, Snowflake is trying to be lake-y; Databricks is trying to really bring this warehouse-lake concept. But you still do all the pipelining; you still have to do all the data management the way that you don't want to do. It's not a lake. And so my argument is that it's innovation on why. Now, they have money; they have time, but, you know, we have a big head start.

Corey: I remember last year at re:Invent they released a, shall we say, significant change to S3 that enabled read-after-write consistency, which is awesome for, again, those of us in the business of misusing things as databases. But for some folks—the majority of folks, I would say—it was a, "I don't know what that means and therefore I don't care." And that's fine. I have no issue with that. There are other folks, some of my customers for example, who are suddenly, "Wait a minute. This means I can sunset this entire janky sidecar metadata system that is designed to make sure that we are consistent in our use of S3, because it now does it automatically under the hood?" And that's awesome. Does that change mean anything for ChaosSearch?

Thomas: It doesn't because of our architecture. We're an append-only, write-once scenario, so a lot of those update-in-place viewpoints don't apply.
My viewpoint is that if you're seeing S3 as the database and you need that type of consistency, it makes sense why you'd want it, but because of our distributed fabric, our stateless architecture, our append-only nature, it really doesn't affect us.

Now, I talked to the S3 team, and I said, "Please, if you're coming out with this feature, it better not be slower." I want S3 to be fast, right? And they said, "No, no. It won't affect performance." I'm like, "Okay. Let's keep that up."

And so to us, any type of S3 capability, we'll take advantage of it if it benefits us, whether it's consistency as you indicated, performance, or functionality. But we really keep the constructs of S3 access to limited features: list, put, get, [roll-on 00:20:49] policies to give us read-only access to your data, and a location to write our indices into your account. And then our distributed fabric, our service, accesses those indices and queries or searches them to resolve whatever analytics you need. So, we made it pretty simple, and that has allowed us to make it high performance.

Corey: I'll take it a step further, because you want to talk about changes since the last time we spoke: it used to be that this was on top of S3, and you could store your data anywhere you want, as long as it's S3 in the customer's account. Now, you're also supporting one-click integration with Google Cloud's object storage, which, great. That does mean, though, that you're not dependent upon provider-specific implementations of things like a consistency model for how you've built things. It really does use the lowest common denominator—to my understanding—of object stores. Is that something that you're seeing broad adoption of, or is this one of those areas where, well, you have one customer on a different provider, but almost everything lives on the primary? I'm curious what you're seeing for adoption models across multiple providers.

Thomas: It's a great question.
We built an architecture purposely to be cloud-agnostic. I mean, we use compute in a containerized way, we use object storage in a very simple construct—put, get, list—and we went over to Google because that made sense, right? We have customers on both sides. I would say Amazon is the gorilla, but Google's trying to get there and growing.

We had a big customer, Equifax, that's on both Amazon and Google, but we offer the same service. To be frank, it looks like the exact same product. And it should, right? Whether it's Amazon Cloud or Google Cloud, multi-select: I want to choose either one and get the other one. I would say that different business types are using each one, but the bulk of our business is on Amazon—though we just this summer released our SaaS offerings, so it's growing.

And you know, it's funny, you never know where it comes from. So, we have one customer—actually DigitalRiver—on Amazon for logs, but we're growing in working together to do BI on GCP, on Google. And so it's kind of funny; they have two departments on two different clouds with two different use cases. And so do they want unification? I'm not sure, but they definitely have their BI on Google and their operations in Amazon. It's interesting.

Corey: You know, it's important to me that people learn how to use the cloud effectively. That's why I'm so glad that Cloud Academy is sponsoring my ridiculous nonsense. They're a great way to build in-demand tech skills the way that, well, personally, I learn best, which is by doing, not by reading. They have live cloud labs that you can run in real environments that aren't going to blow up your own bill—I can't stress how important that is. Visit cloudacademy.com/corey. That's C-O-R-E-Y, don't drop the "E." Use Corey as a promo code as well. You're going to get a bunch of discounts on it with a lifetime deal—the price will not go up.
It is limited time; they assured me this is not one of those things that is going to wind up being a rug-pull scenario, oh no no. Talk to them, tell me what you think. Visit cloudacademy.com/corey, C-O-R-E-Y, and tell them that I sent you!

Corey: I know that I'm going to get letters for this, so let me just call it out right now. Because I've been a big advocate of pick a provider—I care not which one—and go all-in on it. And I'm sitting here congratulating you on extending to another provider, and people are going to say, "Ah, you're being inconsistent."

No. I'm suggesting that you as a provider have to meet your customers where they are, because if someone is sitting in GCP and your entire approach is, "Step one, migrate those four petabytes of data right on over here to AWS," they're going to call you the jackhole that you would be by making that suggestion and go immediately for option B, which is literally anything that is not ChaosSearch, just based upon that core misunderstanding of their business constraints. That is the way to think about these things. From the vendor position that you are in as an ISV—Independent Software Vendor, for those not up on the lingo of this ridiculous industry—you have to meet customers where they are. And it's the right move.

Thomas: Well, you just said it. Imagine moving terabytes and petabytes of data.

Corey: It sounds terrific if I'm a salesperson for one of these companies working on commission, but for the rest of us, it sounds awful.

Thomas: We really are a data fabric across clouds, within clouds. We're going to go where the data is and we're going to provide access to where that data lives. Our whole philosophy is the no-movement movement, right? Don't move your data. Leave it where it is and provide access at scale.

And so you may have services in Google that naturally stream to GCS; let's do it there. Imagine moving that amount of data over to Amazon to analyze it, and vice versa. In 2022, we're going to be in Azure.
They're a totally different type of business, users, and personas, but you're getting asked, "Can you support Azure?" And the answer is, "Yes," and, "We will in 2022."

So, to us, if you have cloud storage, if you have compute, and it's a big enough business opportunity in the market, we're there. We're going there. When we first started, we were talking to MinIO—remember that open-source object storage platform? We've run on our laptops, we run—it's this [unintelligible 00:25:04] Dr. Seuss thing—"We run over here; we run over there; we run everywhere."

But the honest truth is, you're going to go with the big cloud providers where the business opportunity is, and offer the same solution, because the same solution is valued everywhere: simple in; value out; cost-effective; long retention; flexibility. That sounds so basic, but you mention this all the time with the Rube Goldberg Amazon diagrams we see time and time again. It's like, if you looked at that and you were from an alien planet, you'd be like, "These people don't know what they're doing. Why is it so complicated?" And the simple answer is, I don't know why people think it's complicated.

To your point about Amazon, why won't they do it? I don't know, but if they did, things would be different. And being honest, I think people are catching on. We do talk to Amazon and others. They see the need, but they also have to build it; they have to invent technology to address it. And using Parquet and Lucene is not the answer.

Corey: Yeah, it's too much of a demand on the producers of that data rather than the consumer. And yeah, I would love to be able to go upstream to application developers and demand they do things in certain ways. It turns out as a consultant, you have zero authority to do that. As a DevOps team member, you have limited ability to influence it, but it turns out that being the 'department of no' quickly turns into being the 'department of unemployment insurance' because no one wants to work with you.
And collaboration—contrary to what people wish to believe—is a key part of working in a modern workplace.

Thomas: Absolutely. And it's funny, the demands on IT are getting harder; actually getting the employees to build out the solutions is getting harder. And so a lot of that time is in the pipeline: the prep, the schema, the sharding, et cetera, et cetera, et cetera. My viewpoint is that should be automated away. More and more databases are being autotuned, right?

All these knobs and this and that—to me, Glue is a means to an end. I mean, let's get rid of it. Why can't Athena know what to do? Why can't object storage be Athena, and vice versa? I mean, to me, all this moving through all these services—the classic Amazon viewpoint, even their diagrams of having this centralized repository of S3, move it all out to your services, get results, put it back in, then take it back out again, move it around—it just doesn't make much sense. And so, I love S3, love the service. I think it's brilliant—Amazon's first service, right?—but from there, get a little smarter. That's where ChaosSearch comes in.

Corey: I would argue that S3 is, in fact, a modern miracle. And one of those companies saying, "Oh, we have an object store; it's S3-compatible." It's like, "Yeah. We have S3 at home." Look at S3 at home, and it's just basically a series of failing Raspberry Pis.

But you have this whole ecosystem of things that have built up and sprung up around S3. It is wildly understated just how scalable and massive it is. There was an academic paper recently that won an award on how they use automated reasoning to validate what is going on in the S3 environment, and they talked about hundreds of petabytes in some cases. And folks are saying, ah, S3 is hundreds of petabytes. Yeah, I have clients storing hundreds of petabytes.

There are larger companies out there.
Steve Schmidt, Amazon's CISO, was recently at a Splunk keynote where he mentioned that in security info alone, AWS itself generates 500 petabytes a day that then gets reduced down to a bunch of stuff, and some of it gets loaded into Splunk. I think. I couldn't really hear the second half of that sentence because of the sound of all of the Splunk salespeople in that room becoming excited so quickly you could hear it.

Thomas: [laugh]. I love it. If I could be so bold, that S3 team, they're gods. They are amazing. They created such an amazing service, and when I started playing with S3—now, I guess, 2006 or '7—we were using it for a repository, URL access to get images; I was doing a virtualization [unintelligible 00:29:05] at the time—

Corey: Oh, the first time I played with it, "This seems ridiculous and kind of dumb. Why would anyone use this?" Yeah, yeah. It turns out I'm really bad at predicting the future. Another reason I don't do the prediction thing.

Thomas: Yeah. And when I started this company officially, five, six years ago, I was thinking about S3 and I was thinking about HDFS not being a good answer. And I said, "I think S3 will actually achieve the goals and performance we need." It's a distributed file system. You can run parallel puts and parallel gets. And the performance that I was seeing when the data was a certain way, a certain size—"Wait, you can get high performance."

And you know, when I first turned on the engine, now four or five years ago, I was like, "Wow. This is going to work. We're off to the races." And now, obviously, we're more than just an idea when we first talked to you. We're a service.

We deliver benefits to our customers, both in logs and beyond. And shoot, this quarter alone we're coming out with new features, not just in the logs, which I'll talk about in a second, but in direct SQL access.
But you know, one thing that you hear time and time again—we talked about it—JSON, CloudTrail, and Kubernetes; this is a real nightmare, and so one thing that we've come out with this quarter is the ability to virtually flatten. Now, you heard time and time again, "Okay. I'm going to pick and choose my data because my database—whether it's Elastic or, say, relational—can't handle it." And all of a sudden, "Shoot, I don't have that. I've got to reindex that."

And so what we've done is we've created an index technology, one we were always planning to come out with, that indexes the raw JSON blob; then in the data refinery, post-index, you can select how to unflatten it. Why is that important? Because all that tooling, whether it's Elastic or SQL, is now available. You don't have to change anything. Why do Snowflake and BigQuery have these proprietary JSON APIs that none of these tools know how to use to get access to the data?

Or you pick and choose. And so when you have a CloudTrail and you need to know what's going on, if you picked wrong, you're in trouble. So, this new feature we're calling 'Virtual Flattening'—or I don't know what we're—we have to work with the marketing team on it. And we're also bringing—this is where I get kind of excited—to the Elastic world, the ELK world, we're bringing correlations into Elasticsearch. And like, how do you do that? They don't have the APIs?

Well, our data refinery, again, has the ability to correlate index patterns into one view. A view is an index pattern, so all those same constructs that you had in Kibana, or Grafana, or the Elastic API still work. And so, no more denormalizing, no more trying to hodgepodge a query over here, a query over there. You're actually going to have correlations in Elastic, natively. And we're excited about that.

And one more push on the future, Q4 into 2022: we have begun giving early access to S3 SQL access.
And, you know, as I mentioned, correlations in Elastic, but we're going full-in on publishing our [TPCH 00:31:56] report. We're excited about publishing those numbers, as well as not just giving early access, but going GA in the first of the year, next year.

Corey: I look forward to it. This is also—I guess it's impossible to have a conversation with you, even now, where you're not still forward-looking about what comes next. Which is natural; that is how we get excited about the things that we're building. But so much less of what you're doing now in our conversations has focused around what's coming, as opposed to the neat stuff you're already doing. I had to double-check when we were talking just now about, oh yeah, is that Google Cloud object store support still something that is roadmapped, or is that out in the real world?

No, it's very much here in the real world, available today. You can use it. Go click the button, have fun. It's neat to see at least some evidence that not all roadmaps are wishes and pixie dust. The things that you were talking to me about years ago are established parts of ChaosSearch now. It hasn't been just, sort of, frozen in amber for years, or months, or these giant periods of time. Because, again, there's—yeah, don't sell me vaporware; I know how this works. The things you have promised have come to fruition. It's nice to see that.

Thomas: No, I appreciate it. We talked a little while ago, now a few years ago, and it was a bit aspirational, right? We had a lot to do, we had more to do. But now we have big customers using our product, solving their problems, whether it's security, performance, operations—again, at scale, right? The real pain is, sure, you have a small ELK cluster or small Athena use case, but when you're dealing with terabytes to petabytes, trillions of rows—when you're dealing with trillions, billions are now small.
Millions don't even exist, right? And you're graduating from computer science in college and you say the word "trillion," and they're like, "Nah. No one does that." And like you were saying, people do petabytes and exabytes. That's the world we're living in, and that's something that we really went hard at, because these are challenging data problems and this is where we feel we uniquely sit. And again, we don't have to break the bank while doing it.

Corey: Oh, yeah. Or at least as of this recording, there's a meme going around, again, from an old internal Google video, of, "I just want to serve five terabytes of traffic," and it's an internal Google discussion of, "I don't know how to count that low." And, yeah.

Thomas: [laugh].

Corey: But there's also value in being able to address things at much larger volume. I would love to see better responsiveness options around things like Deep Archive because the idea of being able to query that—even if you can wait a day or two—becomes really interesting just from the perspective of, at that point, current cost for one petabyte of data in Glacier Deep Archive is 1000 bucks a month. That is 'why would I ever delete data again?' pricing.

Thomas: Yeah. You said it. And what's interesting about our technology is, unlike, let's say, Lucene—where when you index it, it could be 3, 4, or 5x the raw size—our representation is smaller than gzip. So, it is a full representation, so why don't you store it efficiently long-term in S3? Oh, by the way, Glacier—we support Glacier too.

And so, I mean, the cost of data with cloud storage is amazing, and if you can make it hot and activated, that's the real promise of a data lake. And, you know, it's funny, we use our own service to run our SaaS—we log our own data, we monitor, we alert, have dashboards—and I can't tell you how cheap our service is to ourselves, right?
Because it's so cost-effective for the long tail—not just, oh, a few weeks; we store a whole year's worth of our operational data, so we can go back in time to debug something or figure something out. And a lot of that's savings. Actually, the huge savings is cloud storage with a distributed elastic compute fabric that is serverless. These are things that seem so obvious now, but if you have SSDs, and you're moving things around, you know, a team of IT professionals trying to manage it, it's not cheap.

Corey: Oh, yeah, that's the story. It's like, "Step one, start paying for using things in cloud." "Okay, great. When do I stop paying?" "That's the neat part. You don't." And it continues to grow and build.

And again, this is the thing I learned running a business that focuses on this: the people working on this, in almost every case, are more expensive than the infrastructure they're working on. And that's fine. I'd rather pay people than technologies. And it does help reaffirm, on some level, that—people don't like this reminder—but you have to generate more value than you cost. So, when you're sitting there spending all your time trying to avoid saving money on, "Oh, I've listened to ChaosSearch talk about what they do a few times. I can probably build my own and roll it at home."

It's—I've seen the kind of work that you folks have put into this—again, you have something like 100 employees now; it is not just you building this—my belief has always been that if you can buy something that gets you 90, 95% of where you are, great. Buy it, and then yell at whoever's selling it to you for the rest of it, and that'll get you a lot further than, "We're going to do this ourselves from first principles." Which is great for a weekend project or something that you have a passion for, but in production, mistakes show. I've always been a big proponent of buying wherever you can. It's cheaper, which sounds weird, but it's true.

Thomas: And we do the same thing.
We have single sign-on support; we didn't build that ourselves, we use a service. Auth0 is one of our providers now that owns that [crosstalk 00:37:12]—

Corey: Oh, you didn't roll your own authentication layer? Why ever not? Next, you're going to tell me that you didn't roll your own payment gateway when you wound up charging people on your website to sign up?

Thomas: You got it. And so, I mean, do what you do well. Focus on what you do well. If you're repeating what everyone seems to do over and over again—time, costs, complexity, and… service—it makes sense. You know, I'm not trying to build storage; I'm using storage. I'm using a great, wonderful service, cloud object storage.

Use what works, what works well, and do what you do well. And what we do well is make cloud object storage analytical and fast. So, call us up and we'll take away that 2 a.m. call you have when your cluster falls down, or when you have a new workload and you were going to go to the—I don't know, the beach house—and now the weekend's shot, right? Spin it up, stream it in. We'll take over.

Corey: Yeah. So, if you're listening to this and you happen to be at re:Invent—which is sort of an open question: why would you be at re:Invent while listening to a podcast? And then I remember how long the shuttle lines are likely to be, and yeah. So, if you're at re:Invent, make it on down to the show floor, visit the ChaosSearch booth, tell them I sent you, watch for the wince; that's always worth doing. Thomas, if people have better decision-making capability than the two of us do, where can they find you if they're not in Las Vegas this week?

Thomas: So, you find us online at chaossearch.io. We have so much material: videos, use cases, testimonials. You can reach out to us, get a free trial. We have a self-service experience where you connect to your S3 bucket and you're up and running within five minutes.

So, definitely chaossearch.io. Reach out if you want a hand-held, white-glove POV experience.
If you have those types of needs, we can do that with you as well. We'll have a booth at re:Invent; I don't know the booth number, but I'm sure either we've assigned it or we'll find it out.

Corey: Don't worry. This year, it is a low enough attendance rate that I'm projecting that you will not be as hard to find as in recent years. For example, there's only one expo hall this year. What a concept. If only it hadn't taken a deadly pandemic to get us here.

Thomas: Yeah. But you know, we'll have the ability to demonstrate Chaos at the booth, and really, within a few minutes, you'll say, "Wow. How come I never heard of doing it this way?" Because it just makes so much sense why you do it this way versus the merry-go-round of data movement, and transformation, and schema management, let alone all the sharding that I know is a nightmare, more often than not.

Corey: And we'll, of course, put links to that in the [show notes 00:39:40]. Thomas, thank you so much for taking the time to speak with me today. As always, it's appreciated.

Thomas: Corey, thank you. Let's do this again.

Corey: We absolutely will. Thomas Hazel, CTO and Founder of ChaosSearch. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast episode, please leave a five-star review on your podcast platform of choice, whereas if you've hated this episode, please leave a five-star review on your podcast platform of choice along with an angry comment, because I have dared to besmirch the honor of your homebrewed object store, running on top of some trusty and reliable Raspberries Pi.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production.
Stay humble.
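The "virtual flattening" idea discussed in this episode, keeping the raw JSON blob indexed once and deciding only at read time how to flatten it into columns, can be sketched in toy form. This is an illustration only: the `flatten` helper and the CloudTrail-style field names are made up for the example and are not ChaosSearch's actual API.

```python
import json

def flatten(blob, prefix="", sep="."):
    """Recursively flatten nested JSON into dotted column names."""
    out = {}
    for key, value in blob.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, name, sep))
        else:
            out[name] = value
    return out

# A CloudTrail-style record (field names are illustrative only).
event = {
    "eventName": "PutObject",
    "userIdentity": {"type": "IAMUser", "userName": "alice"},
    "requestParameters": {"bucketName": "logs", "key": "2021/11/app.json"},
}

raw = json.dumps(event)            # the raw blob is what gets stored and indexed
view = flatten(json.loads(raw))    # one possible "unflattening" chosen at read time
print(view["userIdentity.userName"])  # -> alice
```

The point of deferring the flattening is that nothing about the stored blob has to change if you later want a different set of columns; you just build another view.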

AXOPEN
Managing data flows and interfacing applications within the information system

AXOPEN

Play Episode Listen Later Nov 23, 2021 33:13


Welcome to the 7th episode of our podcast series dedicated to IT professionals! INTERVIEW #7: Managing data flows and interfacing applications within the information system. Today we tackle a subject that matters to every company: managing data flows and the connections between the different applications of the information system (IS). Why is it such a critical issue? How should you approach this kind of project? With which tools? How much does it cost? We discuss it with today's guest: Nicolas d'Ambrosio, founder and president of DIGITALISIM. 00:00: Introduction 01:36: Nicolas d'Ambrosio's role at DIGITALISIM: being a business lever through web, marketing, and communication techniques with a technical orientation 07:30: Data management: why does it matter? 09:00: A closer look at shadow IT 10:35: What approaches can optimize data management? 12:00: The problem of interconnecting IS tools 13:36: How to approach data flow management, and with which tools? 17:09: What is an ETL and what is it for? 18:09: What best practices should you adopt? 21:02: What about the long term? Maintenance and checkpoints 23:21: How much does it cost? 26:25: How will this evolve in the coming years? 29:40: Nicolas's dream IT project. Thanks again, Nicolas, for taking part in this interview, and see you soon! · To learn more about DIGITALISIM: https://www.digitalisim.fr/ · To connect with Nicolas on LinkedIn: https://www.linkedin.com/in/nicolasdambrosio/ ____ Based in Lyon since 2007, AXOPEN is a company specializing in the assessment and development of custom software projects, made up of more than 35 enthusiasts, experts in new technologies and development. ____ Sound credit: Maxime Ledan
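One of the episode's questions is "what is an ETL and what is it for?" The pattern can be illustrated with a minimal extract-transform-load sketch, using only an in-memory CSV and SQLite as a stand-in target; it is a toy example, not tied to any of the tools discussed in the episode.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a CSV export (an in-memory sample here).
raw = io.StringIO("name,amount\nalice,10\nbob,oops\ncarol,5\n")
rows = list(csv.DictReader(raw))

# Transform: coerce types and drop rows that fail validation.
clean = []
for r in rows:
    try:
        clean.append((r["name"], int(r["amount"])))
    except ValueError:
        continue  # a real pipeline would quarantine bad records instead

# Load: write the cleaned rows into the target database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)", clean)
total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # -> 15 (the 'bob,oops' row was rejected in the transform step)
```

The three stages stay deliberately separate so each can evolve (new sources, new validation rules, a different target) without rewriting the others, which is the core argument for dedicated ETL tooling.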

Vaybertaytsh
Episode 60: Etl Niborski | עטל ניבאָרסקי

Vaybertaytsh

Play Episode Listen Later Nov 22, 2021


I'm so pumped about this conversation with Etl Niborski, recorded in Tel Aviv this past summer. Etl is a 19-year-old left-wing activist and a native Yiddish speaker who recently completed her national service working in a school in Jaffa for at-risk youth. We talked all about what it's like to be an Israeli at the end of high school — all of the complicated decisions one has to make about joining the military or finding a way not to — how her Yiddishist background impacts her political thinking, what it was like to be a Yiddish-speaking, non-Hasidic kid on the streets of Jerusalem, and about her current Yiddish activities and projects. For more from The White Screen's album Sex, Drugs, and Palestine, click here. To see our most recent merch, click here.

Poulain Raffûte
Romane Ménager: "We have the game and the qualities to be world champions in New Zealand"

Poulain Raffûte

Play Episode Listen Later Nov 17, 2021 32:18


Some sibling acts have left their mark on our sport forever. To name a few: the Boniface brothers, the Camberaberos, the Spangheros, the Underwoods, the Tuilagis, the Armitages, the Du Plessis, the Bergamasco brothers, the Lièvremonts and, more recently, the Marchand, Couilloud and Arnold brothers at Toulouse! For more than six years now, the Ménager sisters, Romane the back-rower and Marine the winger, have been a driving force in women's rugby, from Lille to Montpellier and above all with the French national team. Both on federal contracts since 2018, they were champions in 2016 with Lille and again in 2019 with Montpellier! Romane has three Six Nations tournaments to her name, including a Grand Slam in 2018. She has become a reference at her back-row position and has just faced one of the French team's two nemeses, the Black Ferns. Safi N'Diaye says Romane is an exceptional player: hard-working, technical, physical, one of the best players in the world at her position, and that she is proud to play alongside her at Montpellier and for France. Gaëlle Hermet, captain of the French team, is just as full of praise, and admitted to us that Romane is the kind of player you would rather have on your team than against you: a player of great quality, as a person as much as a technician. And Lénaïg Corson concludes: she is a workhorse and a superb athlete. In short, a complete player and a woman who commands unanimous respect in the dressing room and in life! Coming off a great match and an epic win over the Black Ferns this weekend for a new generation of French players, Romane is almost a "veteran" at only... 25 years old!
She tells us about her journey, the rugby she shares with her twin sister Marine, her ambitions, and her vision of women's rugby, which is taking up more and more space online and in the hearts of the French. Enjoy the episode, and welcome to Poulain Raffûte. A show put together by Raphaël Poulain, raffûteur-in-chief, and Arnaud Beurdeley, reporter at Midi Olympique, and produced by Sébastien Petit, journalist for Eurosport. Listen to other episodes: Safi N'Diaye: "Rugbymen made me dream; now it's up to the women to inspire the younger generations" Gaëlle Hermet: "We want to prove to the whole of France that we can win this World Cup" Jessy Trémoulière: "No, rugby is not just for men" Lenaïg Corson: "No longer hearing about the French women's team is hard to live with" You can react to this episode on our Twitter page. Find all of Eurosport's podcasts here. See Acast.com/privacy for privacy and opt-out information.

Microsoft Mechanics Podcast
What's new in SQL Server 2022

Microsoft Mechanics Podcast

Play Episode Listen Later Nov 17, 2021 13:30


A first look at SQL Server 2022—the latest Azure-enabled database and data integration innovations. See what it means for your hybrid workloads, including first-time bi-directional high availability and disaster recovery between Azure SQL Managed Instance and SQL Server, Azure Synapse Link integration with SQL for ETL-free near real-time reporting and analytics over your operational data, and new next-generation built-in query intelligence with Parameter Sensitive Plan optimization. Bob Ward, SQL engineering leader, joins Jeremy Chapman to share the focus of this round of updates. ► QUICK LINKS: 00:00 - Introduction 00:38 - Overview of updates 02:19 - Disaster recovery 04:26 - Failover and restore example 06:16 - Azure Synapse integration 09:04 - Built-in query intelligence 10:19 - See it in action 12:52 - Wrap up ► Link References: Learn more about SQL Server 2022 at https://aka.ms/SQLServer2022 Apply to join our private preview, and try it out at https://aka.ms/EAPSignUp ► Unfamiliar with Microsoft Mechanics? We are Microsoft's official video series for IT. You can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft. Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries?sub_confirmation=1 Join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog Watch or listen via podcast here: https://microsoftmechanics.libsyn.com/website ► Keep getting this insider knowledge, join us on social: Follow us on Twitter: https://twitter.com/MSFTMechanics Follow us on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/

Smart Software with SmartLogic
Re-Platforming One of the Original Dot Coms in Elixir with Angel Jose

Smart Software with SmartLogic

Play Episode Listen Later Nov 11, 2021 47:57


Today's guest is Angel Jose, a Software Engineer Manager at Cars.com with a passion for product and the customer experience. Angel played a key role in completely re-platforming Cars.com via Elixir, Phoenix, and other open source tooling, and his former adventures in the blockchain space include working with ETH, EOS, and general distributed tooling. In today's episode, we discuss Cars.com's decision to migrate to an entirely Elixir-based system, rebuilding the data model from scratch, redesigning all of the user interfaces, and what that meant for the team that Angel was tasked with leading, as well as how the Elixir system functions at such incredible scale, with Cars.com receiving more than a million visitors daily! We touch on Angel's approach to onboarding new engineers, how Elixir impacts this process, and the broader impact Elixir has on the community as a whole, as well as what he hopes to see from the community in the future, so make sure not to miss this awesome conversation about adopting Elixir with Angel Jose! Key Points From This Episode: Hot takes, rants, and obsessions: Angel's best and worst taco experiences. Why Angel won't be at ElixirConf 2021 and the story of how he began programming in Elixir. The process of finding a job in software engineering after completing an online bootcamp. Angel's experience of navigating the freedom that comes with being an engineer. Find out how Angel got involved in re-platforming Cars.com, one of the original dot coms. Get a glimpse into the make up of the engineering team at Cars.com. How the pandemic impacted not only Angel's deadlines but the car industry as a whole. The ETL pipeline of different data points that makes up Cars.com and Auto.com. Angel shares his opinion of LiveView and what he has learned about using it at scale. Advice for those adopting new technology: make sure there are enough resources out there. Where Angel believes his team would be without Elixir and what they are looking forward to. 
Some of the tangible benefits Cars.com has seen from flipping the switch to Elixir. How Angel approaches onboarding new engineers by providing them with resources and integrating learning into their day-to-day. The importance of celebrating small wins and fostering feelings of accomplishment. Angel on how Elixir impacts onboarding and new engineers; more simplicity, less magic. How Elixir has impacted the programming community and what Angel hopes to see in future. Taco happy hour, conference food, making the most of each meal, remote work, and more! What Angel has learned from working remotely, particularly from a social perspective. Angel shares his dream car after working at Cars.com and moving to Colorado. Links Mentioned in Today's Episode: Angel Jose on LinkedIn — https://www.linkedin.com/in/ajose01/ Angel Jose on Twitter — https://twitter.com/ajose01 Cars.com — https://www.cars.com/ Cars.com Careers — https://www.cars.com/careers/ Elixir Conf — https://2021.elixirconf.com/ Elixir Slack — https://elixir-slackin.herokuapp.com/ General Assembly — https://generalassemb.ly/ SmartLogic — https://smartlogic.io/ Special Guest: Angel Jose.
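The episode notes above mention the ETL pipeline of different data points that makes up Cars.com and Auto.com. As a rough, hypothetical illustration of what any such extract-transform-load step does (every source, field name, and value below is invented, and Cars.com's real pipeline is written in Elixir, not the Python used here for brevity):

```python
import sqlite3

# Invented raw listings from two hypothetical upstream feeds with
# inconsistent field names, casing, and types.
SOURCE_A = [{"vin": "1ABC", "price_usd": "19999", "make": "honda"}]
SOURCE_B = [{"VIN": "1abc", "Price": 19500, "Make": "HONDA"},
            {"VIN": "2DEF", "Price": 31000, "Make": "TOYOTA"}]

def extract_transform():
    # Transform step: normalize both feeds into one common shape.
    for row in SOURCE_A:
        yield {"vin": row["vin"].upper(), "price": int(row["price_usd"]),
               "make": row["make"].title()}
    for row in SOURCE_B:
        yield {"vin": row["VIN"].upper(), "price": int(row["Price"]),
               "make": row["Make"].title()}

def load(rows):
    # Load step: key on VIN so re-running the pipeline upserts
    # rather than duplicates listings.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE listings (vin TEXT PRIMARY KEY, price INTEGER, make TEXT)")
    for r in rows:
        db.execute("INSERT OR REPLACE INTO listings VALUES (:vin, :price, :make)", r)
    return db

db = load(extract_transform())
print(db.execute("SELECT vin, price, make FROM listings ORDER BY vin").fetchall())
# → [('1ABC', 19500, 'Honda'), ('2DEF', 31000, 'Toyota')]
```

The essential shape is the same at any scale: normalize heterogeneous feeds into one schema in the transform step, and make the load step idempotent (here, keyed on VIN) so reprocessing the same feed does not create duplicate records.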

Screaming in the Cloud
Building a Partnership with Your Cloud Provider with Micheal Benedict

Screaming in the Cloud

Play Episode Listen Later Nov 10, 2021 54:44


About Micheal: Micheal Benedict leads Engineering Productivity at Pinterest. He and his team focus on developer experience, building tools and platforms for over a thousand engineers to effectively code, build, deploy, and operate workloads on the cloud. Mr. Benedict has also built Infrastructure and Cloud Governance programs at Pinterest and previously at Twitter, focused on managing cloud vendor relationships, infrastructure budget management, cloud migration, capacity forecasting and planning, and cloud cost attribution (chargeback). Links: Pinterest: https://www.pinterest.com Teletraan: https://github.com/pinterest/teletraan Twitter: https://twitter.com/micheal Pinterestcareers.com: https://pinterestcareers.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: You know how git works, right?

Announcer: Sorta, kinda, not really. Please ask someone else!

Corey: That's all of us. Git is how we build things, and Netlify is one of the best ways I've found to build those things quickly for the web. Netlify's git-based workflows mean you don't have to play slap and tickle with integrating arcane nonsense and webhooks, which are themselves about as well understood as git. Give them a try and see what folks ranging from my fake Twitter-for-pets startup to global Fortune 2000 companies are raving about. If you end up talking to them, because you don't have to, they get why self-service is important—but if you do, be sure to tell them that I sent you and watch all of the blood drain from their faces instantly. You can find them in the AWS Marketplace or at www.netlify.com.
N-E-T-L-I-F-Y.com

Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R, because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high-performance cloud compute at a price that—while sure, they claim it's better than AWS pricing—when they say that, they mean it is less money. Sure, I don't dispute that, but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting vultr.com/screaming, and you'll receive $100 in credit. That's v-u-l-t-r.com slash screaming.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Every once in a while, I like to talk to people who work at very large companies that are not, in fact, themselves a cloud provider. I know it sounds ridiculous. How can you possibly be a big company and not make money by selling managed NAT gateways to an unsuspecting public? But I'm told it can be done. Here to answer that question, and hopefully at least one other, is Pinterest's head of Engineering Productivity, Micheal Benedict. Micheal, thank you for taking the time to join me today.

Micheal: Hi, Corey, thank you for inviting me today.
I'm really excited to talk to you.

Corey: So, exciting times at Pinterest in a bunch of different ways. It was recently reported—which, of course, went right to the top of my inbox as 500,000 people on Twitter all said, “Hey, this sounds like a ‘Corey would be interested in it' thing”—that you folks had signed a $3.2 billion commitment with AWS stretching until 2028. Now, if this is like any other large-scale AWS contract commitment deal that has been made public, you were probably immediately inundated with a whole bunch of people who are very good at arithmetic and not very good at business context saying, “$3.2 billion? You could build massive data centers for that. Why would anyone do this?” And it's tiresome, and that's the world in which we live. But I'm guessing you heard at least a little bit of that from the peanut gallery.

Micheal: I did, and I always find it interesting when direct comparisons are made with the total amount that's been committed. And like you said, there are so many nuances that go into how to perceive that amount and put it in context of, obviously, what Pinterest does. So, I at least want to take this opportunity to share with everyone that Pinterest has been on the cloud since day one. When Ben initially started the company, the product—a simple Django app—was launched on AWS from day one, and since then, it has grown to support 450-plus million MAUs over the course of the decade.

And our infrastructure has grown pretty complex. We started with a bunch of EC2 machines and persisting data in S3, and since then we have explored an array of different products, in fact, sometimes working very closely with AWS as well, and helping them put together a product roadmap for some of the items they're working on.
So, we have an amazing partnership with them, and part of the commitment, and how we want to see these numbers, is how it unlocks value for Pinterest as a business over time in terms of making us much more agile, without thinking about the nuances of the infrastructure itself. And that's, I think, one of the best ways to really put this into context: it's not a single number we pay at the end [laugh] of the month, but rather, we are on track to spending a certain amount over a period of time, so this just keeps accruing or adding to that number. And we basically come out with an amazing partnership with AWS, where we have that commitment and we're able to leverage their products and full suite of items without any hiccups.

Corey: The most interesting part of what you said is the word partner. And I think that's the piece that gets lost an awful lot when we talk about large-scale cloud negotiations. It's not like buying a car, where you can basically beat the crap out of the salesperson, where you can act as if a $400 price difference on a car is the difference between storming out of the dealership and signing the contract. Great, you don't really have to deal with that person ever again.

In the context of a cloud provider, they run your production infrastructure, and if they have a bad day, I promise you're going to have a bad day, too. You want to handle those negotiations in a way that is respectful of that because they are your partner, whether you want them to be or not.
Now, I'm not suggesting that any cloud provider is going to hold an awkward negotiation against the customer, but at the same time, there are going to be scenarios in which you're going to want to have strong relationships, where you're going to need to cash in political capital to some extent, and personally, I've never seen stupendous value in trying to beat the crap out of a company in order to get another tenth of a percent discount on a service you barely use, just because someone decided that, well, we didn't do well in the last negotiation so we're going to get them back this time.

That's great. What are you actually planning to do as a company? Where are you going? And the fact that you just alluded to, that you're not just a pile of S3 and EC2 instances, speaks, in many ways, to that. By moving into the differentiated-service world, suddenly you're able to do things that don't look quite as much like building a better database and start looking a lot more like servicing your users more effectively and well.

Micheal: And I think, like you said, I feel like there's a general skepticism that views the cloud providers as usually being out there to rip you apart. But in reality, that's not true. To your point, as part of the partnership, especially between AWS and Pinterest, we've got an amazing relationship going on, and behind the scenes, there's a dedicated team at Pinterest, called the Infrastructure Governance Team, a cross-functional team with folks from finance, legal, engineering, product, all sitting together and working with our AWS partners—even the AWS account managers at times are part of that—to help us make Pinterest successful, and in turn, AWS gets that amazing customer to work with in helping build some of their newer products as well.
And one of the most important things we have learned over time is that there are two parts to it; when you want to improve your business agility, you want to focus not just on the bottom-line numbers as they are. It's okay to pay a premium because it offsets the people capital you would have to invest in getting there. And that's a very tricky way to look at the math, but that's what these teams do; they sit down and work through those specifics. And for what it's worth, in our conversations, the AWS teams always come back with very insightful data on how we're using their systems, to help us better think about how we should be pricing or looking ahead. And I'm not the expert on this; like I said, there's a dedicated team sitting behind this and looking through and working through these deals, but that's one of the important takeaways I hope the users—or the listeners of this podcast—take away: you want to treat your cloud provider as your partner as much as possible. They're not always there to screw you. That's not their goal. And I apologize for using that term. It is important that you set the expectation that it's in their best interest to actually make you successful, because that's how they make money as well.

Corey: It's a long-term play. I mean, they could gouge you this quarter, and then you're trying to evacuate as fast as possible. Well, they had a great quarter, but what's their long-term prospect? There are two competing philosophies in the world of business; you can either make a lot of money quickly, or you can make a little bit of money and build it over time in a sustained way. And it's clear the cloud providers are playing the long game on this because they basically have to.

Micheal: I mean, it's inevitable at this point. I mean, look at Pinterest. It is one of those success stories.
Starting as a Django app on a bunch of EC2 machines, to wherever we are right now with a three-plus-billion-dollar commitment over a span of a couple of years, and we do spend a pretty significant chunk of that on a yearly basis. So, in this case, I'm sure it was a great, successful partnership.

And I'm hoping some of the newer companies who are building on the cloud from the get-go are thinking about it from that perspective. And one of the things I do want to call out, Corey, is that we did initially start with using the primitive services in AWS, but it became clear over time—and I'm sure you've heard of the term multi-cloud and all of that—you know, when companies start evaluating how to make the most out of the deals they're negotiating or signing, it is important to acknowledge that the cost of any of those evaluations, or even thinking about migrations, never tends to get factored in. And we always tend to treat that as being extremely simple, but those are engineering resources you want to be spending on building the product rather than on these crazy, costly migrations. So, it's probably in your best interest to get the most from your cloud provider, and also look for opportunities to use other cloud providers—if they provide more value in certain product offerings—rather than thinking about a complete lift-and-shift and making DR the primary case for why I want to be moving to multi-cloud.

Corey: Yeah. There's a question, too, of the numbers on paper looking radically different than the reality of this. You mentioned Pinterest has been on AWS since the beginning, which means that even if an edict had been passed at the beginning that “Thou shalt never build on anything except EC2 and S3. The end. Full stop.” And let's say you went down that rabbit hole of, “Oh, we don't trust their load balancers. We're going to build our own at home. We have load balancers at home.
We'll use those.” It's terrible, but even had you done that and restricted yourselves just to those baseline building blocks, and then decided to do a cloud migration, you're still looking at over a decade of experience where the app has been built unconsciously reflecting the various failure modes that AWS has, the way that it responds to API calls, the latency in how long it takes to request something versus it being available, et cetera, et cetera.

So, even moving that baseline thing to another cloud provider is not a trivial undertaking by any stretch of the imagination. But that said—because the topic does always come up, and I don't shy away from it; I think it's something people should go into with an open mind—how has the multi-cloud conversation progressed at Pinterest? Because there's always a multi-cloud conversation.

Micheal: We have always approached it with some form of… openness. It's not like we don't want to be open to the ideas, but you really want to be thinking hard about the business case, and the business value something provides, and why you want to be doing x. In this case, when we think about multi-cloud—and again, Pinterest did start with EC2 and S3, and we did keep it that way for a long time. We built a lot of primitives around it, used it—for example, my team actually runs our bread-and-butter deployment system on EC2. We help facilitate deployments across 100,000-plus machines today.

And like you said, we have built that system keeping in mind how AWS works, understanding the nuances of region and AZ failovers and all of that, and help facilitate deployments across 1,000-plus microservices in the company. So, thinking about leveraging, say, a Google Cloud instance and how that works: in theory, we can always make a case for engineering to build out our deployment system and expand there, but there's really no value.
And one of the biggest cases when multi-cloud comes in is usually either negotiation on price or actually a DR strategy. Like, what if AWS goes down in us-east-1? Well, let's be honest, they're powering half the internet [laugh] from that one single—

Corey: Right.

Micheal: Yeah. So, if you think your business is okay running when AWS goes down and half the internet is not going to be working, how do you want to be thinking about that? So, DR is probably not the best reason for you to even be exploring multi-cloud. Rather, you should be thinking about what the cloud providers are offering as a very nuanced offering which your current cloud provider is not offering, and really think about just using those specific items.

Corey: So, I agree that multi-cloud for DR purposes is generally not the best approach with the idea of being able to fail over seamlessly, but I like the idea for backups. I mean, Pinterest is a publicly traded company, which means that, among other things, you have to file risk disclosures and be responsive to auditors in a variety of different ways. Some regulations start applying to you. And the idea of, well, AWS builds things out in a super effective way, region separation, et cetera: whenever I talk to Amazonians, they are always surprised that anyone wouldn't accept that, “Oh, if you want backups, use a different region. Problem solved.”

Right, but it is often easier for me to have a rehydrate-the-business level of backup, one that would take weeks to redeploy, living on another cloud provider than it is for me to explain to all of those auditors and regulators and financial analysts, et cetera, why I didn't go down that path. So, there's always some story for, okay, what if AWS decides that they hate us and want to kick us off the platform?
Well, that's why legal is involved in those high-level discussions around things like risk, and indemnity, and termination-for-convenience and for-cause clauses, et cetera, et cetera. The idea of making an all-in commitment to a cloud provider goes well beyond things that engineering thinks about. And it's easy for those of us with engineering backgrounds to be incredibly dismissive of that: “Oh, indemnity? Like, when does AWS ever lose data?” “Yeah, but let's say one day they do. What is your story going to be when you're asked some very uncomfortable questions by people who wanted you to pay attention to this during the negotiation process?” It's about dotting the i's and crossing the t's, especially with that many commas in the contractual commitments.

Micheal: No, it is true. And we did evaluate that as an option, but one of the interesting things about compliance, and especially auditing: we generally work with best-in-class consultants to help us work through the controls, how we audit, how we look at these controls, how we make sure there's enough accountability going through. The interesting part was that in this case, as well, we were able to work with AWS in crafting a lot of those controls and setting up the right expectations as and when we were putting proposals together. Now, again, I'm not an expert on this, and I know we have a dedicated team from our technical program management organization focused on this, but early on we realized that, to your point, the cost of any form of backups, and then being able to audit what's going in, look at all those pipelines, how quickly we can get the data in and out, was proving pretty costly for us.
So, we were able to work out some of that within the constructs of what we have with our cloud provider today, and still meet our compliance goals.

Corey: That's, on some level, the higher point, too: everything comes down to context; everything comes down to what the business demands, what the business requires, what the business will accept. And I'm not suggesting that in any case they're wrong. I'm known for beating the ‘Multi-cloud is a bad default decision' drum, and then people get surprised when they have one-on-one conversations and say, “Well, we're multi-cloud. Do you think we're foolish?” No. You're probably doing the right thing, just because you have context that is specific to your business that I, speaking in a general sense, certainly don't have.

People don't generally wake up in the morning and decide they're going to do a terrible job, or no job at all, at work today, unless they're Facebook's VP of Integrity. So, it's not the sort of thing that lends itself to casual, tweet-sized, pithy analysis very often. There's a strong dive into: what is the level of risk a business can accept? And my general belief is that most companies are doing this stuff right. The universal constant among all of my consulting clients that I have spoken to about the in-depth management piece of things is that they've always asked the same question: “So, this is what we've done, but can you introduce us to the people who are doing it really right, who have absolutely nailed this and gotten it all down?” And yeah, absolutely no one believes that that is them, even the folks who are, from my perspective, pretty close to having achieved it.

But I want to talk a bit more about what you do beyond just the headline-grabbing, large-dollar-figure commitment to a cloud provider story. What does engineering productivity mean at Pinterest? Where do you start?
Where do you stop?

Micheal: I want to just quickly touch upon that last point about multi-cloud. Like you said, every company works within the context of what they are given and the constraints of their business. It's probably a good time to give a plug to my previous employer, Twitter, who are doing multi-cloud in a reasonably effective way. They are in their own data centers, they do have a presence on Google Cloud and AWS, and I know things have probably changed in the last couple of years, but they have embraced that environment pretty effectively to cater to their acquisitions who were on the public cloud, helped, obviously, with their initial set of investments in the data center, and still continue to scale that out and explore, in this case, Google Cloud for a variety of other use cases, which sounds like it's been extremely beneficial as well.

So, to your point, there is probably no right way to do this. There's always that context, and what you're working with comes into play as part of making these decisions. And it's important to take a lot of these with a grain of salt, because you can never fully understand the decisions, why they were made the way they were made. And for what it's worth, it sort of works out in the end. [laugh]. I've rarely heard a story where it's never worked out and people are just upset with the deals they've signed. So, hopefully, that helps close that whole conversation about multi-cloud.

Corey: I hope so. It's one of those areas where everyone has an opinion, and a lot of them do not necessarily apply universally, but it's always fun to take—in that case, great, I'll take the less-trodden path of everyone saying multi-cloud is great, invariably because they're trying to sell you something. Yeah, I have nothing particularly to sell, folks. My argument has always been, in the absence of a compelling reason not to, pick a provider and go all in.
I don't care which provider you pick—which people are sometimes surprised to hear. It's like, “Well, what if they pick a cloud provider that you don't do consulting work for?” Yeah, it turns out I don't actually need to win every AWS customer over to have a successful working business. Do what makes sense for you, folks. From my perspective, I want this industry to be better. I don't want to sit here and just drum up business for myself and make self-serving comments to empower that. Which apparently is a rare tactic.

Micheal: No, that's totally true, Corey. One of the things you do is help people with their bills, so this has come up so many times, and I realize we're sort of going off track a bit from that engineering productivity discussion—

Corey: Oh, which is fine. That's this entire show's theme, if it has one.

Micheal: [laugh]. So, I want to briefly just talk about the whole billing question and how cost management works, because I know you spend a lot of time on that and you help a lot of these companies be effective in how they manage their bills. These questions have come up multiple times, even at Pinterest. In the past, when I was leading the infrastructure governance organization, we were working with other companies of a similar size to better understand how they are getting visibility into their cost, setting the right controls and expectations within the engineering organization to plan, and capacity plan, and effectively meet those plans against certain criteria, and then, obviously, if there is any risk to that, actively manage the risk. That was the biggest thing those teams used to do.

And we used to talk a lot, trade notes, and get a better sense of what a lot of these companies are trying to do—for example, Netflix, or Lyft, or Stripe. I recall at Netflix, content was their biggest spend, so cloud spending was way down the list of things for them. [laugh].
But regardless, they had an active team looking at this on a day-to-day basis. So, one of the things we learned early on at Pinterest is to start investing in those visibility tools early on.

No one can parse the cloud bills. Let's be honest. You're probably the only person who can reverse… [laugh] engineer an architecture diagram from a cloud bill, and I think you should definitely take out a patent for that or something. But in reality, no one has the time to do that. You want to make sure your business leaders, from your finance teams to engineering teams to the executives, all have a better understanding of how to parse it.

So, invest engineering resources: take that data, crunch it down to the cost and utilization across the different vectors of offerings, and have a very insightful discussion. Like, what are certain action items we want to be taking? It's very easy to see, “Oh, we overspent on EC2,” and want to go from there. But in reality, it's not just that one thing; you will start finding out that EC2 is being used by your Hadoop infrastructure, which runs hundreds of thousands of jobs. Okay, now who's actually responsible for that cost? You might find one job which is accruing, sort of, a lot of instance hours over a period of time in a shared multi-tenant environment; how do you attribute that cost to that particular cost center?

Corey: And then someone left the company a while back, and that job just kept running in perpetuity. No one's checked the output for four years; I guess it can't be that necessarily important. And digging into it requires context. It turns out, there's no SaaS tool to do this, which is unfortunate for those of us who set out originally to build such a thing.
But we discovered pretty early on the context on this stuff is incredibly important.

I love the thing you're talking about here, where you're discussing these things with your peer companies, because the advice that I would give to companies with the level of spend that you folks have is worlds apart from what I would advise someone who's building something new and spending maybe 500 bucks a month on their cloud bill. Those folks do not need to hire a dedicated team of people to solve for these problems. At your scale, yeah, you probably should have had some people in [laugh] here looking at this for a while now. And at some point, the guidance changes based upon scale. And if there's one thing that we discover from the horrible pages of Hacker News, it's that people love applying bits of wisdom that they hear in wildly inappropriate situations.

How do you think about these things at that scale? Because, a simple example: right now I spend about 1000 bucks a month at The Duckbill Group, on our AWS bill. I know. We have one, too. Imagine that. And if I wind up just committing admin credentials to GitHub, for example, and someone compromises that and starts spinning things up to mine all the Bitcoin, yeah, I'm going to notice that by the impact it has on the bill, which will be noticeable from orbit.

At the level of spend that you folks are at, a company would be hard-pressed to spin up enough Bitcoin miners to materially move the billing needle on a month-to-month basis, just because of the sheer scope and scale. At small bill volumes, yeah, it's pretty easy to discover the thing that's spiking your bill to three times normal. It's usually a managed NAT gateway. At your scale, tripling the bill begins to look suspiciously like the GDP of a small country, so what actually happened here? Invariably, at that scale, with that level of massive multiplier, it's usually the simplest solution: an error somewhere in the AWS billing system. Yes, they exist.
Imagine that.

Micheal: They do exist, and we've encountered that.

Corey: Kind of heartstopping, isn't it?

Micheal: [laugh]. I don't know if you remember when we had the big Spectre and Meltdown issues, right, and those were interesting scenarios for us because we had identified a lot of those issues early on, given the scale we operate at. Obviously it did have an impact on the bills and everything, but that's it; that's why you have these dedicated teams to fix that. But I think one of the points you made is that these are large bills and you're never going to see a 3x jump the next day. We're not going to be seeing that. And if that happens, you know, God save us. [laugh].

But to your point, one of the things we do still want to be doing is look at trends, literally on a week-over-week basis, because even a one-percentage move is a pretty significant amount, if you think about it, which could be funding some other aspects of the business which we would prefer to be investing in. So, we do want to have enough rigor and controls in place in our technical stack to identify and alert when something is off track. And it becomes challenging when you start using those higher-order services from your public cloud provider because there's no clear insight on how you kind of parse that information. One of the biggest challenges we had at Pinterest was tying ownership to all these things.

No, using tags is not going to cut it. It was so difficult for us to get to a point where we could put some sense of ownership on all the things and the resources people are using, and then subsequently have the right conversation with our ads infrastructure teams or our product teams to help drive the cost improvements we want to be seeing. And I wouldn't be surprised if that's not a challenge already, even for the smaller companies who have bills in the tune of tens of thousands, right?

Corey: It is.
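The week-over-week review Micheal describes boils down to comparing two periods of per-service spend and flagging anything that moved more than a threshold. As a rough sketch only—the function name, the one-percent threshold, and the spend figures are illustrative assumptions, not Pinterest's actual tooling:

```python
def flag_week_over_week(prev_week: dict, this_week: dict,
                        threshold: float = 0.01) -> list:
    """Return (service, fractional_change) pairs whose spend moved
    more than `threshold` relative to last week."""
    flagged = []
    for service, current in this_week.items():
        previous = prev_week.get(service)
        if not previous:
            continue  # a new service has no baseline to trend against yet
        change = (current - previous) / previous
        if abs(change) > threshold:
            flagged.append((service, round(change, 4)))
    return flagged
```

In practice the inputs would come from something like a cost-and-usage report aggregated by service; the point is simply that even a one-percent move at this scale is worth an alert.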
It's predicting the spend and trying to categorize it appropriately; that's the root of all AWS bill panic at the corporate level. It's not that the bill is 20% higher, so we're going to go broke. Most companies spend far more on payroll than they do on infrastructure—as you mentioned with Netflix, content is a significantly larger [laugh] expense than any of those things; real estate is usually right up there, too—but instead it's, when you're trying to do business forecasting of, okay, if we're going to have an additional 1000 monthly active users, what will the cost for us be to service those users? And, okay, if we're seeing a sudden 20% variance, if that's the new normal, then that does change our cost projections for a number of years; what happens? When you're public, there starts to become the question of, okay, do we have to restate earnings, or what's the deal here?

And of course, all this sidesteps past the unfortunate reality that, for many companies, the AWS bill is not a function of how many customers you have; it's how many engineers you hired. And that always seems to be the way it winds up playing out for some reason. It's, “Why did we see a 10% increase in the bill?” “Yeah, we hired another data science team. Oops.” It always seems to be the data science folks; I know I beat up on those folks a fair bit, and my apologies. And one day, if they analyze enough of the data, they might figure out why.

Micheal: So, this is where I want to give a shout out to our data science team, especially some of the engineers working in the Infrastructure Governance Team, putting these charts together, helping us derive insights. So, definitely props to them.

I think there's a great segue into the point you made. As you add more engineers, what is the impact on the bottom line? And this is one of the things we actually think about as part of engineering productivity on a long-term basis.
Pinterest has 1,000-plus engineers today, and to a large degree, many of them actually have their own EC2 instances today. And I wouldn't say it's a significant amount of cost, but it is a large enough number where shutting down a c5.9xl can actually fund a bunch of conference tickets or something else.

And then you can imagine the sort of scale you start working with at one point. The nuance here, though, is that you want to make sure there's enough flexibility for these engineers to do their local development in a sustainable way, but when moving to, say, production, we really want to tighten the flexibility a bit so they don't end up doing what you just said: spin up a bunch of machines talking to the API directly, which no one will be aware of.

I want to share a small anecdote because, back in the day—this was probably four years ago—when we were doing some analysis on our bills, we realized that there was a huge jump every—I believe Wednesday—in our EC2 instances, by almost 500 to 600 instances. And we're like, “Why is this happening? What is going on?” And we found out there was an obscure job written by someone who had left the company, calling an EC2 API to spin up a search cluster of 500 machines on-demand as part of pulling that ETL data together, and then shutting that cluster down. Which at times didn't work as expected because, you know, obviously, your Hadoop jobs are very predictable, right?

So, those are the things we were dealing with back in the day. Since then—and this is where engineering productivity as a team starts coming in—our job is to enable every engineer to be doing their best work across code, building, and deploying the services. And we have done this.

Corey: Right. You and I can sit here and have an in-depth conversation about the intricacies of AWS billing in a bunch of different ways, because in different ways we both specialize in it, in many respects.
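The Wednesday anecdote is the kind of recurring pattern a simple day-of-week baseline can surface. A minimal sketch, assuming you already have a daily instance-count series (the sample numbers and the excess threshold are invented for illustration, not drawn from Pinterest's pipeline):

```python
from collections import defaultdict
from statistics import mean

def recurring_spikes(daily_counts, min_excess=100):
    """daily_counts: iterable of (weekday, instance_count) samples.
    Flag weekdays whose average count exceeds the overall average
    by at least `min_excess` instances."""
    by_day = defaultdict(list)
    for weekday, count in daily_counts:
        by_day[weekday].append(count)
    overall = mean(count for _, count in daily_counts)
    return sorted(day for day, counts in by_day.items()
                  if mean(counts) - overall > min_excess)
```

Run against a few weeks of samples, a 500-instance Wednesday cluster stands out immediately against a flat baseline.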
But let's say that Pinterest theoretically was foolish enough to hire me before I got into this space as an engineer, for terrifying reasons. And great, I start day one as a typical software developer, if such a thing could be said to exist. How do you effectively build guardrails in so that I don't inadvertently wind up spinning up all the EC2 instances available to me within an account—which it turns out are more than one might expect sometimes—but still leave me free to do my job without effectively going on a nine-month safari figuring out how AWS bills work?

Micheal: And this is why teams like ours exist: to provide those tools to help you get started. So today, we actually don't let anyone directly use the AWS APIs, or even use the UI for that matter. And I think you'll soon realize, the moment you hit, like, probably 30 or 40 people in your organization, you definitely want to lock it down. You don't want that access to be given to anyone or everyone. And then subsequently start building some higher-order tools or abstractions so people can use those to control things effectively.

In this case, if you're a new engineer, Corey, which it seems like you were, at some point—

Corey: I still write code like I am, don't worry.

Micheal: [laugh]. So yes, you would get access to our internal tool to actually help spin up what we call a dev app, where you get a chance to, obviously, choose the instance size, not the instance type itself. We have actually constrained the instance types we have approved within Pinterest as well; we don't give you the entire list to choose from and deploy to. We constrain, based on the workload types, which instance types we want to support, because in the future, if we ever want to move from c3 to c5—and I've been there, trust me—it is not an easy thing to do. So you want to make sure that you're not letting people just use random instances, and constrain that by building some of these tools.
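A provisioning guardrail like the dev-app tool Micheal describes might validate requests against an approved allowlist before ever calling an EC2 API. This is a hypothetical sketch—the instance types and the owner-tag scheme are assumptions for illustration, not Pinterest's actual internal tool:

```python
# Hypothetical allowlist; a real tool would load this from config.
APPROVED_DEV_TYPES = {"c5.xlarge", "c5.2xlarge", "m5.xlarge"}

def validate_dev_app_request(instance_type: str, owner: str) -> dict:
    """Gate a dev-app request: only approved instance types pass, and
    every instance carries an owner tag so idle reclamation and cost
    attribution know whom to ask."""
    if instance_type not in APPROVED_DEV_TYPES:
        raise ValueError(f"{instance_type} is not an approved dev instance type")
    if not owner:
        raise ValueError("an owner is required for cost attribution")
    return {"instance_type": instance_type,
            "tags": {"owner": owner, "purpose": "dev-app"}}
```

The returned spec would then be handed to whatever actually provisions the machine, so engineers never touch the EC2 API directly, and a fleet-wide instance-type migration only means editing one allowlist.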
As a new engineer, you would go in, you'd use the tool, and actually have a dev app provisioned for you with our Pinterest image to get you started.

And then subsequently, we'll obviously shut it down if we see you not using it over a certain amount of time, but those are sort of the guardrails we've put in over there, so you never get a chance to directly use the EC2 APIs, or any of those AWS APIs, to do certain things. A similar thing applies for S3 or any of the higher-order tools which AWS provides, too.

Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure: networking, databases, observability, management, and security.

And - let me be clear here - it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build.

With Always Free you can do things like run small scale applications, or do proof of concept testing without spending a dime. You know that I always like to put asterisks next to the word free. This is actually free. No asterisk. Start now. Visit https://snark.cloud/oci-free that's https://snark.cloud/oci-free.

Corey: How does that interplay with AWS launching yet another way to run containers, for example, which becomes a valuable potential avenue to get some business value for a developer, but the platform you built doesn't necessarily embrace that capability?
Or they release a feature to an existing tool that you use that could potentially be a pure feature-capability story, much more so than a cost-savings one. How do you keep track of all of that and empower people to use those things so they're not effectively trying to reimplement DynamoDB on top of EC2?

Micheal: That's been a challenge, actually, in the past for us, because we've always been very flexible, where engineers have had an opportunity to write their own solutions many times rather than leveraging the AWS services. That's one of the reasons why we have an infrastructure organization—an extremely lean organization, for what it's worth, but still able to achieve outsized outputs—where we evaluate a lot of these use cases as they come in and open up different aspects of what we want to provide, either directly from AWS or by building certain abstractions on top of it. Every time we talk about containers, obviously, we always associate that with something like Kubernetes and the offerings from there on; we realized that our engineers never directly asked for those capabilities. They don't come in and say, “I need a new container orchestration system. Give that to me, and I'm going to be extremely productive.”

What people actually realize is that if you can provide them effective tools that can help them get their job done, they would be happy with it. For example, like I said, our deployment system, which is actually an open-source system called Teletraan—that is the bread and butter at Pinterest, which my team runs. We operate 100,000-plus machines. We have actually looked into container orchestration, where we do have a dedicated Kubernetes team looking at it and helping certain use cases move there, but we realized that the cost of entire migrations needs to be evaluated against certain use cases which can benefit from being on Kubernetes from day one. You don't want to force anyone to move there, but give them the right incentives to move there.
Case in point: let's upgrade your OS. Because if you're managing machines, obviously everyone loves to upgrade their OSes.

Corey: Well, it's one of the reasons I love savings plans versus RIs. You talk about the c3 to c5 migration, and everyone has a story about one of those, but the most foolish or frustrating reason that I ever saw not to do the upgrade was that we bought a bunch of Reserved Instances on the C3s and those have a year-and-a-half left to run. And it's foolish not on the part of customers—it's economically sound—but on the part of AWS, where, great, you're now forcing me to take a contractual commitment to something that serves me less effectively, rather than getting out of the way and letting me do my job. That's why it's so important, to me at least, that savings plans cover Fargate and Lambda. I wish they covered SageMaker instead of SageMaker having its own thing, because once again, you're now architecturally constrained based upon some ridiculous economic model that they have imposed on us. But that's a separate rant for another time.

Micheal: No, we actually went through that process because we do have a healthy balance of how we do Reserved Instances and how we look at on-demand. We've never been big users of spot in the past, just because of the spot market itself; we realized that putting that pressure on our customers to figure out how to manage that is way too much. When I say customers, in this case, I mean engineers within the organization.

Corey: Oh, yes. “I want to post some pictures on Pinterest, so now I have to understand the spot market. What?” Yeah.

Micheal: [laugh]. So, in this case, when we were moving from C3 to C5—and this is where the partnership really plays out effectively, right, because it's also in the best interest of AWS to deprecate their aging hardware to support some of these new ones, where they could also be making good enough premium margins for what it's worth and give the benefit back to the user.
So, in this case, we were able to work out an extremely flexible way of moving to C5 as soon as possible, and get help from them in doing that, too—allocating capacity and working with them on capacity management. I believe at one point we were actually one of the largest companies with a C3 footprint, and it took quite a while for us to move to C5. But rest assured, once we moved, the savings were just immense. We were able to offset any of those RIs, and we were able to work behind the scenes to get that out. But obviously, a lot of that isn't considered in a small-scale company, just because of, like you said, those constraints which have been placed in a contractual obligation.

Corey: Well, this is an area in which I will give the same guidance to companies of your scale as well as small-scale companies. And by small-scale, I mean people on the free tier account, give or take, so I do mean the smallest of the small. Whenever you wind up in a scenario where you find yourself architecturally constrained by an economic barrier like this, reach out to your account manager. I promise you have one. Every account, even the tiny free tier accounts, has an account manager.

I have an account manager, who I have to say has probably one of the most surreal jobs at AWS, just based upon the conversations I throw past him. But reaching out to your provider, rather than trying to solve a lot of this stuff yourself by constraining how you're building things internally, is always the right first move, because the worst case is you don't get anywhere in those conversations. Okay, but at least you explored that, as opposed to what often happens: “Oh, yeah. I have a switch over here I can flip and solve your entire problem. Does that help anything?”

Micheal: Yeah.

Corey: You feel foolish finding that out only after nine months of dedicated work, it turns out.

Micheal: Which makes me wonder, Corey.
I mean, do you see a lot of that happening, where folks don't tend to reach out to their account managers, or rather, don't treat them as partners in this case? Because it sounds like there is this unhealthy tension, I would say, as to what is the best help you could be getting from your account managers in this case.

Corey: Constantly. And the challenge comes from a few things, in my experience. The first is that the quality of account managers and the technical account managers—the folks who are embedded in many cases with your engineering teams in different ways—does vary. AWS is scaling wildly and bursting at the seams, and people are hard to scale.

So, some are fantastic, some are decidedly less so, and most folks fall somewhere in the middle of that bell curve. And it doesn't take too many poor experiences for the default to be, “Oh, those people are useless. They never do anything we want, so why bother asking them?” And that leads to an unhealthy dynamic where a lot of companies will wind up treating their AWS account manager types as a ticket triage system, or the last resort of places that they'll turn to, when they should be involved in earlier conversations.

I mean, take Pinterest as an example of this. I'm not sure how many technical account managers you have assigned to your account, but I'm going to go out on a limb and guess that the ratio of technical account managers to engineers working on the environment is incredibly lopsided. It's got to be a high ratio just because of the nature of how these things work. So, there are a lot of people who are actively working on things that would almost certainly benefit from a more holistic conversation with your AWS account team, but it doesn't occur to them to do it, just because of either perceived biases around levels of competence, or poor experiences in the past, or simply not knowing the capabilities that are there.
If I could tell one story around AWS account management, it would be: talk to folks sooner about these things.

And to be clear, Pinterest has this less than other folks, but AWS does themselves no favors by having a product strategy of, “Yes,” because very often, in service of those conversations with a number of companies, there is the very real concern of: are they doing research so that they can launch a service that competes with us? Amazon as a whole launching a social network is admittedly one of the most hilarious ideas I [laugh] can come up with, and I hope they take a whack at it just to watch them learn all these lessons themselves, but that is, again, neither here nor there.

Micheal: That story is very interesting, and I think you mentioned one thing; it's just that lack of trust, or even knowing what the account managers can actually do for you. There seems to be just a lack of education on that. And we also found out the hard way, right? I wouldn't say that Pinterest figured this out on day one. We evolved sort of a relationship over time. Yes, our time… engagements are, sort of, lopsided, but we were able to negotiate that as part of deals as we learned a bit more on what we can and cannot do, and how these individuals are beneficial for Pinterest as well. And—

Corey: Well, here's a question for you, without naming names—and this might illustrate part of the challenge customers have—how long has your account manager—not the technical account managers, but your account manager—been assigned to your account?

Micheal: I've been at Pinterest for five years and I've been working with the same person. And he's amazing.

Corey: Which is incredibly atypical. At a lot of smaller companies, it feels like, “Oh, I'm your account manager being introduced to you.” And, “Are you the third one this year?
Great.” What happens is that if the account manager excels, very often they get promoted and work with a smaller number of accounts at larger spend, whereas if they don't find that AWS is a great place for them, for a variety of reasons, they go somewhere else and need to be backfilled.

So, at the smaller account, it's, “Great. I've had more account managers in a year than you've had in five.” And that is often the experience when you start seeing significant levels of rotation, especially on the customer engineering side, where you have this big kickoff and everyone's aware of all the capabilities, and you look at it three years later and not a single person who was in that kickoff is still involved with the account on either side, and it's just sort of been evolving evolutionarily from there. One thing that we've done in some of our larger accounts as part of our negotiation process, when we see that the bridges have been so thoroughly burned, is effectively request a full account team cycle, just because it's time to get new faces in, where the customer, in many cases unreasonably, is not going to say, “Yeah, but a year-and-a-half ago you did this terrible thing and we're still salty about it.” Fine, whatever. I get it. People relationships are hard. Let's go ahead and swap some folks out so that there are new faces with new perspectives, because that helps.

Micheal: Well, first off, if you had so many switches in account manager, I think that says something about [laugh] how you've been working, too. I'm just kidding. There are a bu—

Corey: Entirely possible. In seriousness, yes. But if you talk to—like, this is not just me, because in my case, yeah, I feel like my account manager is whoever drew the short straw that week, because frankly, yeah, that does seem like a great punishment to wind up passing out to someone who is underperforming.
But for a lot of folks who are in the mid-tier, like, spending $50,000 to $100,000 a month, this is a very common story.

Micheal: Yeah. Actually, we've heard a bit about this, too. And like you said, I think maintaining context is the most important thing. You really want your account manager to vouch for you, really be your champion in those meetings, because AWS, like you said, is so large—getting that exec time, and reviews, and there are so many things that happen—your account manager is the champion for you right there. And it's important, and in fact in your best interest, to have a great relationship with them as well, not treat them as, oh, yet another vendor.

And I think that's where things start to get a bit messy, because when you start treating them as yet another vendor, there is no incentive for them to do the best for you, too. You know, people relationships are hard. But that said, I think given the number of customers these cloud companies are accruing, I wouldn't be surprised; every account manager seems to be extremely burdened. Even in our case, although I've had a chance to work with this one person for a long time, we've actually expanded. We now have multiple account managers helping us out as we've started scaling to use certain aspects of AWS which we've never explored before.

We were a bit constrained and reserved about what services we wanted to use, because there have been instances where we have tried using something and we have hit a wall pretty much immediately. API rate limits, or it's not ready for primetime, and we're like, “Oh, my God. Now, what do we do?” So, we are a bit more cautious.
But that said, over time, having an account manager who understands how you work and what scale you have means they're able to advocate with the internal engineering teams within the cloud provider to make the best of supporting you as a customer, and tell that success story all the way out.

So yeah, I can totally understand how this may be hard, especially for those small companies. For what it's worth, I think the best way to really think about it is to not treat them as your vendor, but really go out on a limb there. Even though you signed a deal with them, you want to make sure that you have a continuing relationship with them to represent your voice better within the company. Which is probably hard. [laugh].

Corey: That's always the hard part. Honestly, if this were the sort of thing that were easy to automate, or you could wind up building out something that helps companies figure out how to solve these things programmatically—talk about interesting business problems that are only going to get larger in the fullness of time. This is not going away; even if AWS stopped signing up new customers entirely right now, they would still have years of growth ahead of them just from organic growth. And take a company with the scale of Pinterest and just think of how many years it would take to do a full-on exodus, even if it became priority number one. It's not realistic in many cases, which is why I've never been a big fan of multi-cloud as an approach for negotiation. Yeah, AWS has more data on those points than any of us do; they're not worried about it. It just makes you sound like an unsophisticated negotiator. Pick your poison and lean in.

Micheal: That is the truth you just mentioned, and I probably want to give a call out to our head of infrastructure, [Coburn 00:42:13]. He's also my boss, and he had brought this perspective as well.
As part of any negotiation discussions, like you just said, AWS has way more data points on this than we do, so there's little we can do by way of talking about, “Oh, we are exploring this other cloud provider.” They would just be like, “Yeah. Do tell me more [laugh] about how that's going.”

And it's probably in your best interest to never use that as a negotiation tactic, because they clearly know the investments you've already made in what you've built, so you might as well be talking more—again, this is where that relationship really comes into play, because you want both sides to be successful. And it's in their best interest to still keep you happy, because the good thing, at least for companies of our size, is that we're probably, like, one phone call away from some of their executive team, where we could always talk about what didn't work for us. And I know not everyone has that opportunity, but I'm really hoping—and I know at least from some of the interactions we've had with the AWS teams—they're actively working on building that relationship more and more: giving access to those customer advisory boards, and letting all of them have those direct calls with the executives. I don't know whether you've seen that in your experience in helping some of these companies?

Corey: I have a different approach to it. It turns out when you're super loud and public and noisy about AWS and spend too much time in Seattle, you start to spend time with those people on a social basis. Because, again, I'm obnoxious and annoying to a lot of AWS folks, but I also have an obnoxious habit of being right in most of the things I'm pointing out. And that becomes harder and harder to ignore.
I mean, part of the value that I found in being able to do this as a consultant is that I get to compare and contrast different customer environments on a consistent, ongoing basis.

I mean, the reason that negotiation works well from my perspective is that AWS does a bunch of these every week, and customers do these every few years with AWS. Well, we do an awful lot of them, too, and okay, we've seen different ways things can get structured, and it doesn't take too long and too many engagements before you start to see the points of commonality in how these things flow together. So, when we wind up seeing things that a customer is planning on architecturally and looking to do in the future, it's, “Well, wait a minute. Have you talked to the folks negotiating the contract about this? Because that does potentially have bearing, and it provides better data than what AWS is gathering just through looking at overall spend trends. So yeah, bring that up. That is absolutely going to impact the type of offer you get.”

It just comes down to understanding the motivators that drive folks and, I think, understanding the incentives. I will say that across the board, I have never yet seen a deal from AWS come through where it was, “Okay, at this point you're just trying to hoodwink the customer and get them to sign on something that doesn't help them.” I've seen mistakes that can definitely lead to that impression, and I've seen areas where their data is incomplete and they're making assumptions that are not borne out in reality. But it's not one of those bad faith type—

Micheal: Yeah.

Corey: —of negotiations. If it were, I would be framing a lot of this very differently. It sounds weird to say, “Yeah, your vendor is not trying to screw you over in this sense,” because look at the entire IT industry. How often has that been true about almost any other vendor in the fullness of time?
This is something a bit different, and I still think we're trying to grapple with the repercussions of that, from a negotiation standpoint and from a long-term business continuity standpoint, when your faith is linked—in a shared fate context—with your vendor.

Micheal: It's in their best interest as well because they're trying to build a diversified portfolio. Like, if they help 100 companies, even if one of them becomes the next Pinterest, that's great, right? And that continued relationship is what they're aiming for. So, assuming any bad faith over there probably is not going to be the best outcome, like you said. And two, it's not a zero-sum game. I always get a sense that when you're doing these negotiations, it's an all-or-nothing deal. It's not. You have to think they're also running a business and it's important that you as your business, how okay are you with some of those premiums? You cannot get a discount on everything, you cannot get the deal or the numbers you probably want on almost everything. And to your point, architecturally, if you're moving in a certain direction where you think in the next three years, this is what your usage is going to be or it will come down to that, obviously, you should be investing more and negotiating that out front rather than managed NAT [laugh] gateways, I guess. So, I think that's also an important mindset to take in as part of any of these negotiations. Which I'm assuming—I don't know how you folks have been working in the past, but at least that's one of the key items we have taken in as part of any of these discussions.

Corey: I would agree wholeheartedly. I think that it just comes down to understanding where you're going, what's important, and again in some cases knowing around what things AWS will never bend contractually. I've seen companies spend six weeks or more trying to negotiate custom SLAs around services.
Let me save everyone a bunch of time and money; they will not grant them to you.

Micheal: Yeah.

Corey: I promise. So, stop asking for them; you're not going to get them. There are other things they will negotiate on that they're going to be highly case-dependent. I'm hesitant to mention any of them just because, “Well, wait a minute, we did that once. Why are you talking about that in public?” I don't want to hear it and confidentiality matters. But yeah, not everything is negotiable, but most things are, so figuring out what levers and knobs and dials you have is important.

Micheal: We also found it that way. AWS does cater to their—they are a platform and they are pretty clear in how much engagement—even if we are one of their top customers, there's been many times where I know their product managers have heavily pushed back on some of the requests we have put in. And that makes me wonder, they probably have the same engagement even with the smallest of customers, there's always an implicit assumption that the big fish is trying to get the most out of your public cloud providers. To your point, I don't think that's true. We're rarely able to negotiate anything exclusive in terms of their product offerings just for us, if that makes sense. Case in point, tell us your capacity [laugh] for x instances or type of instances, so we as a company would know how to plan out our scale-ups or scale-downs. That's not going to happen exclusively for you. But those kind of things are just, like, examples we have had a chance to work with their product managers and see if, can we get some flexibility on that?
For what it's worth, though, they are willing to find a middle ground with you to make sure that you get your answers and, obviously, you're being successful in your plans to use certain technologies they offer or [unintelligible 00:48:31] how you use their services.

Corey: So, I know we've gone significantly over time and we are definitely going to do another episode talking about a lot of the other things that you're involved in because I'm going to assume that your full-time job is not worrying about the AWS bill. In fact, you do a fair number of things beyond that; I just get stuck on that one, given that it is what I eat, sleep, breathe, and dream about.

Micheal: Absolutely. I would love to talk more, especially about how we're enabling our engineers to be extremely productive in this new world, and how we want to cater to this whole cloud-native environment which is being created, and make sure people are doing their best work. But regardless, Corey, I mean, this has been an amazing, insightful chat, even for me. And I really appreciate you having me on the show.

Corey: No, thank you for joining me. If people want to learn more about what you're up to, and how you think about things, where can they find you? Because I'm also going to go out on a limb and assume you're also probably hiring, given that everyone seems to be these days.

Micheal: Well, that is true. And I wasn't planning to make a hiring pitch but I'm glad that you leaned into that one. Yes, we are hiring and you can find me on Twitter at twitter dot com slash M-I-C-H-E-A-L. I am spelled a bit differently, so make sure you can hit me up, and my DMs are open. And obviously, we have all our open roles listed on pinterestcareers.com as well.

Corey: And we will, of course, put links to that in the [show notes 00:49:45]. Thank you so much for taking the time to speak with me today. I really appreciate it.

Micheal: Thank you, Corey.
It's really been great on your show.

Corey: And I'm sure we'll do it again in the near future. Micheal Benedict, Head of Engineering Productivity at Pinterest. I am Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with a long rambling comment about exactly how many data centers Pinterest could build instead.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

My Imaginary Friends with L. Penelope
Fixed Points in Plotting

My Imaginary Friends with L. Penelope

Play Episode Listen Later Nov 8, 2021 25:35


Mentioned: Thursday, November 11, 2021 @ 9pm ET - L. Penelope in conversation with Kit Rocha - https://www.mystgalaxy.com/penelope11121    - Save the Cat beat sheet - https://lpenelope.com/extras/resources-for-writers/  - Jami Gold worksheet - https://jamigold.com/for-writers/worksheets-for-writers/  - Write Your Novel from the Middle by James Scott Bell – https://amzn.to/35eBwLu     - My Events - https://lpenelope.com/calendar/    - Kate Stradling - https://katestradling.com/books/ The My Imaginary Friends podcast is a weekly, behind the scenes look at the journey of a working author navigating traditional and self-publishing. Join fantasy and paranormal romance author L. Penelope as she shares insights on the writing life, creativity, inspiration, and this week's best thing. Subscribe and view show notes at: https://lpenelope.com/podcast | Get the Footnotes newsletter - http://lpen.co/footnotes Support the show - http://frolic.media/podcasts! Stay in touch with me! Website | Instagram | Twitter | Facebook Music credit: Say Good Night by Joakim Karud https://soundcloud.com/joakimkarud Creative Commons — Attribution-ShareAlike 3.0 Unported— CC BY-SA 3.0 http://creativecommons.org/licenses/by-sa/3.0/ Music promoted by Audio Library https://youtu.be/SZkVShypKgM Affiliate Disclosure: I may receive compensation for links to products on this site either directly or indirectly via affiliate links. Heartspell Media, LLC is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to Amazon.com.

Data Engineering Podcast
Exploring The Evolution And Adoption of Customer Data Platforms and Reverse ETL

Data Engineering Podcast

Play Episode Listen Later Nov 5, 2021 62:06


The precursor to widespread adoption of cloud data warehouses was the creation of customer data platforms. Acting as a centralized repository of information about how your customers interact with your organization, they drove a wave of analytics about how to improve products based on actual usage data. A natural outgrowth of that capability is the more recent growth of reverse ETL systems, which use those analytics to feed back into the operational systems used to engage with the customer. In this episode, Tejas Manohar and Rachel Bradley-Haas share the story of their own careers and experiences coinciding with these trends. They also discuss the current state of the market for these technological patterns and how to take advantage of them in your own work.
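The reverse ETL pattern the episode describes—reading modeled results out of the warehouse and pushing them back into operational tools—can be sketched in a few lines. This is an illustrative sketch only: the `customer_segments` table, its columns, and the `push` callback are hypothetical stand-ins for a real warehouse connection and a CRM or ads-platform API client.

```python
import sqlite3

def sync_segments(conn, push):
    """Read computed customer segments from the warehouse and push each
    row into an operational system (CRM, email tool, ad network)."""
    rows = conn.execute(
        "SELECT customer_id, segment FROM customer_segments"
    ).fetchall()
    for customer_id, segment in rows:
        push({"id": customer_id, "segment": segment})
    return len(rows)

# Stand-in "warehouse": an in-memory SQLite table of modeled segments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_segments (customer_id TEXT, segment TEXT)")
conn.executemany(
    "INSERT INTO customer_segments VALUES (?, ?)",
    [("c1", "power_user"), ("c2", "at_risk")],
)

sent = []
synced = sync_segments(conn, sent.append)  # a real pipeline would call a CRM API here
```

A production reverse ETL tool adds the hard parts this sketch skips: incremental diffs, rate limiting, retries, and schema mapping between the warehouse and each destination.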

Software Sessions
Robotic Process Automation with Alexander Pugh

Software Sessions

Play Episode Listen Later Oct 28, 2021 67:04


Alexander Pugh is a software engineer at Albertsons. He has worked in Robotic Process Automation and the cognitive services industry for over five years. This episode originally aired on Software Engineering Radio.

Related Links
- Alexander Pugh's personal site
- Enterprise RPA Solutions: Automation Anywhere, UiPath, blueprism
- Enterprise "Low Code/No Code" API Solutions: appian, mulesoft, Power Automate
- RPA and the OS: Office primary interop assemblies, Office Add-ins documentation, Task Scheduler for developers, The Component Object Model, The Document Object Model

Transcript
You can help edit this transcript on GitHub.

[00:00:00] Jeremy: Today, I'm talking to Alexander Pugh. He's a solutions architect with over five years of experience working on robotic process automation and cognitive services. Today, we're going to focus on robotic process automation. Alexander, welcome to Software Engineering Radio.

[00:00:17] Alex: Thank you, Jeremy. It's really good to be here.

[00:00:18] Jeremy: So what does robotic process automation actually mean?

[00:00:23] Alex: Right. It's a, it's a very broad nebulous term. When we talk about robotic process automation, as a concept, we're talking about automating things that humans do in the way that they do them. So that's the robotic, an automation that is, um, done in the way a human does a thing. Um, and then process is that thing, um, that we're automating. And then automation is just saying, we're turning this into an automation where we're orchestrating this and automating this. And the best way to think about that in any other way is to think of a factory or a car assembly line. So initially when we went in and we, automated a car or factory, automation line, what they did is essentially they replicated the process as a human did it. So one day you had a human that would pick up a door and then put it on the car and bolt it on with their arms.
And so the initial automations that we had on those factory lines were a robot arm that would pick up that door from the same place and put it on the car and bolt it on there. Um, so the same can be said for robotic process automation. We're essentially looking at these, processes that humans do, and we're replicating them, with an automation that does it in the same way. Um, and where we're doing that is the operating system. So robotic process automation is essentially going in and automating the operating system to perform tasks the same way a human would do them in an operating system. So that's, that's RPA in a nutshell.

Jeremy: So when you say you're replicating something that a human would do, does it mean it has to go through some kind of GUI or some kind of user interface?
So if you're, trying to replace or replicate a process with RPA, you don't want to change that process so much so that a human can no longer achieve it as well. That's something where if you get a very technical, and very fluent software engineer, they lose sight of that because they say, oh, you know what? There's no reason why we need to go open a browser and go to, you know, the ServiceNow portal and type this in when I can just directly send information to their backend. Which a human could not replicate. Right? So that's kind of where the line gets fuzzy. How efficiently can we make this RPA solution?

[00:04:32] Jeremy: I, I think a question that a lot of people are probably having is a lot of applications have APIs now. But what you're saying is that for it to, to be, I suppose, true RPA, it needs to be something that a user can do on their own and not something that the user can do by opening up dev tools or making a post to an end point.

[00:04:57] Alex: Yeah. And so this, this is probably really important right now to talk about why RPA, right? Why would you do this when you could put on a server, a a really good, API ingestion point or trigger or a web hook that can do this stuff. So why would we, why would we ever pursue RPA? There there's a lot of good reasons for it. RPA is very, very enticing to the business. RPA solutions and tools are marketed as a low code, no code solution for the business to utilize, to solve their processes that may not be solved by an enterprise solution and the in-between processes in a way. You have, uh, a big enterprise, finance solution that everyone uses for the finance needs of your business, but there are some things that it doesn't provide for that you have a person that's doing a lot of, and the business says, Okay. well, this thing, this human is doing this is really beneath their capability. We need to get a software solution for it, but our enterprise solution just can't account for it.
So let's get a RPA capability in here. We can build it ourselves, and then there we go. So there, there are many reasons to do that. Financial, IT might not have, um, the capability or the funding to actually build and solve the solution. Or it it's at a scale that is too small to open up, uh, an IT project to solve for. Um, so, you know, a team of five is just doing this and they're doing it for, you know, 20 hours a week, which is uh large, but in a big enterprise, that's not really, maybe um, worth building an enterprise solution for it. Or, and this is a big one, there are regulatory constraints and security constraints around being able to access this or communicate some data or information in a way that is non-human or programmatic. So that's really where, um, RPA is correctly and best applied and you'll see it most often. So what we're talking about there is in finance, in healthcare or in big companies where they're dealing with a lot of user data or customer data in a way. So when we talk about finance and healthcare, there are a lot of regulatory constraints and security reasons why you would not enable a programmatic solution to operate on your systems. You know, it's just too hard. We we're not going to expose our databases or our data to any other thing. It would, it would take a huge enterprise project to build out that capability, secure that capability and ensure it's going correctly. We just don't have the money, the time or the strength honestly, to afford for it. So they say, well, we already have a user pattern. We already allow users to, to talk to this information and communicate this information. Let's get an RPA tool, which for all intents and purposes will be acting as a user.
And then it can just automate that process without us exposing to queries or any other thing, an enterprise solution or programmatic, um, solution. So that's really why RPA, where and why you, you would apply it is there's, there's just no capability at enterprise for one reason or another to solve for it.

[00:08:47] Jeremy: As software engineers, when we see this kind of problem, our first thought is, okay, let's, let's build this custom application or workflow. That's going to talk to all these API APIs. And, and what it sounds like is, in a lot of cases there just isn't the time, there just isn't the money, to put in the effort to do that. And, it also sounds like this is a way of being able to automate that, and maybe introducing less risk because you're going through the same, security, the same workflow that people are doing currently. So, you know, you're not going to get into things that they're not supposed to be able to get into because all of that's already put in place.

[00:09:36] Alex: Correct. And it's an already accepted pattern and it's kind of odd to apply that kind of very IT software engineer term to a human user, but a human user is a pattern in software engineering. We have patterns that do this and that, and, you know, databases and not, and then the user journey or the user permissions and security and all that is a pattern. And that is accepted by default when you're building these enterprise applications: okay, what's the user pattern. And so since that's already established and well-known, and all the hopefully, you know, walls are built around that to enable it to correctly do what it needs to do. It's saying, Okay. we've already established that. Let's just use that instead of, you know, building a programmatic solution where we have to go and find, do we already have an appropriate pattern to apply to it? Can we build it in safe way? And then can we support it?
You know, all of a sudden we, you know, we have the support teams that, you know, watch our Splunk dashboards and make sure nothing's going down with our big enterprise application. And then you're going to build a, another capability. Okay. Where's that support going to come from? And now we got to talk about change access boards, user acceptance testing and, uh, you know, UAT dev production environments and all that. So it becomes, untenable, depending on your, your organization to, to do that for things that might fall into a place that is, it doesn't justify the scale that needs to be thrown upon it. But when we talk about something like APIs and API exist, um, for a lot of things, they don't exist for everything. And, a lot of times that's for legacy databases, that's for mainframe capability. And this is really where RPA shines and is correctly applied. And especially in big businesses are highly regulated businesses where they can't upgrade to the newest thing, or they can't throw something to the cloud. They have a, you know, their mainframe systems or they have their database systems that have to exist for one reason or the other until there is the motivation and the money and the time to correctly migrate and, and solve for them. So until that day, and again, there's no, API to, to do anything on a, on a mainframe, in this bank or whatnot, it's like, well, Okay. let's just throw RPA on it. Let's, you know, let's have a RPA do this thing, uh, in the way that a human does it, but it can do it 24 7. And an example, or use cases, you work at a bank and, uh, there's no way that InfoSec is going to let you query against this database with, your users that have this account or your customers that have this, no way in any organization at a bank. Is InfoSec going to say, oh yeah. sure. Let me give you an OData query, you know, driver on, you know, and you can just set up your own SQL queries and do whatever? They're gonna say no way.
In fact, how did you find out about this database in the first place and who are you? How do we solve it? We, we go and say, Okay. how does the user get in here? Well, they open up a mainframe emulator on their desktop, which shows them the mainframe. And then they go in, they click here and they put this number in there, and then they look up this customer and then they switch this value to that value and they say, save. And it's like, okay. cool. That's that RPA can do. And we can do that quite easily. And we don't need to talk about APIs and we don't need to talk about special access or doing queries that makes, you know, InfoSec very scared. You know, a great use case for that is, you know, a bank say they, they acquire, uh, a regional bank and they say, cool, you're now part of our bank, but in your systems that are now going to be a part of our systems, you have users that have this value, whereas in our bank, that value is this value here. So now we have to go and change for 30,000 customers this one field to make it line up with our systems. Traditionally you would get a, you know, extract, transform, load tool, an ETL tool, to kind of do that. But for 30,000 customers that might be below the threshold, and this is banking. So it's very regulated and you have to be very, very intentional about how you manipulate and move data around. So what do we have to do? Okay. We have to hire 10 contractors for six months, and literally what they're going to do eight hours a day is go into the mainframe through the simulator and customer by customer. They're going to go change this value and hit save.
And they're looking at an Excel spreadsheet that tells them what customer to go into. And that's going to cost X amount of money and X, you know, for six months, or what we could do is just build a RPA solution, a bot, essentially that goes, and for each line of that Excel spreadsheet, it repeats this one process: open up mainframe emulator, navigate into the customer profile and then changes value, and then shut down and repeat. And it can do that in one week and, and can be built in two. That's the, the dream use case for RPA and that's really kind of, uh, where it would shine.

[00:15:20] Jeremy: It sounds like the best use case for it is an old system, a mainframe system, in COBOL maybe, uh, doesn't have an API. And so, uh, it makes sense to rather than go, okay, how can we get directly into the database?
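The spreadsheet-driven clean-up described above can be sketched as a loop that repeats one binary process per row. Everything here is hypothetical scaffolding — `MainframeEmulator` is a stand-in for whatever session object a real RPA tool would drive, and the field names are invented — the point is the shape: one row, one open-change-save cycle, and an immediate failure on any surprise instead of guessing.

```python
import csv
import io

class MainframeEmulator:
    """Stand-in for the terminal-emulator session an RPA tool would drive.
    The method names here are hypothetical, not any vendor's API."""
    def __init__(self, records):
        self.records = records

    def open_customer(self, customer_id):
        # Fail fast: an unknown customer means the worklist and the
        # system disagree, so stop rather than continue half-blind.
        if customer_id not in self.records:
            raise LookupError(f"no such customer: {customer_id}")
        return self.records[customer_id]

    def set_field_and_save(self, customer_id, field, value):
        self.open_customer(customer_id)[field] = value

def run_bot(emulator, worklist):
    """Repeat one binary process per spreadsheet row: open the customer
    profile, change a single field, save."""
    done = 0
    for row in csv.DictReader(worklist):
        emulator.set_field_and_save(row["customer_id"], "branch_code", row["new_value"])
        done += 1
    return done

# Toy data standing in for the mainframe records and the Excel worklist.
records = {"1001": {"branch_code": "OLD"}, "1002": {"branch_code": "OLD"}}
sheet = io.StringIO("customer_id,new_value\n1001,NEW\n1002,NEW\n")
updated = run_bot(MainframeEmulator(records), sheet)
```

Scaling the same loop from two rows to 30,000 is exactly why the bot beats six months of contractors: the per-row process never changes.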
and you change that one thing and that's it there's no oh, well this information says this, which means, and then I have to go do this. Once you start getting in those if else, uh, processes you're, you're going down a rabbit hole and it could get very shaky and that introduces extreme instability in what you're trying to do.And also really expands your development time cause you have to capture these processes and you have to say, okay. tell me exactly what we need to build this bot to do. And for, binary decision processes, that's easy go in here, do this, but nine times out of 10, as you're trying to address this and solution for it, you'll find those uncertainties.You'll find these things where the business says, oh, well, yeah. that happens, you know, one times out of 10 and this is what we need to do. And it's like, well, that's going to break the bot. It, you know, nine times out of 10, this, this spot is going to fall over. this is now where we start getting into, the machine learning and AI, realm.And why RPA, is classified. Uh, sometimes as a subset of the AI or machine learning field, or is a, a pattern within that field is because now that you have this bot or this software that enables you to do a human process, let's enable that bot to now do decision-making processes where it can interpret something and then do something else.Because while we can just do a big tree to kind of address every capability, you're never going to be able to do that. And also it's, it's just a really heavy, bad way to build things. So instead let's throw in some machine learning capability where it just can understand what to do and that's, you know, that's the next level of RPA application is Okay. we've got it. We've, we've gone throughout our organization. We found every kind of binary thing, that can be replaced with an RPA bot. Okay.Now what are the ones that we said we couldn't do? 
Because it had some of that decision-making that, required too much of a dynamic, uh, intelligence behind it. And let's see if we can address those now that we have this. And so that's, that's the 2.0, in RPA is addressing those non-binary, paths. I would argue that especially in organizations that are big enough to justify bringing in an RPA solution to solve for their processes, they have enough binary processes, binary decision processes to keep them busy. Some people, kind of get caught up in trying to right out the gate, say, we need to throw some machine learning. We need to make these bots really capable instead of just saying, well, we we've got plenty of work, just changing the binary processes or addressing those. Let's just be disciplined and take that, approach. Uh, I will say towards RPA and bots, the best solution or the only solution, when you talk about building a bot, is the one that you eventually turn off. So you can say, I built a bot that will go into our mainframe system and update this value. And, uh, that's successful. I would argue that's not successful. When that bot is successful is when you can turn it off because there's an enterprise solution that addresses it. And, and you don't have to have this RPA bot that lives over here and does it; instead, your enterprise capability now affords for it. And so that's really, I think a successful bot or a successful RPA solution is you've been able to take away the pain point or that human process until it can be correctly addressed by your systems that everyone uses.
There's no real issue there, especially if it's an internal system, like a mainframe, you guys own that. If it changes, you'll know it, if it changes it's probably being fixed or addressed.So there's no, problem. However, That's not the only application for RPA. let's talk about another use case here, your organization, uses, a bank and you don't have an internal way to communicate it. Your user literally has to go to the bank's website, log in and see information that the bank is saying, Hey, this is your stuff, right?The bank doesn't have an API for their, that service. because that would be scary for the bank. They say, we don't want to expose this to another service. So the human has to go in there, log in, look at maybe a PDF and download it and say, oh, Okay.So that is happens in a browser. So it's a newer technology.This isn't our mainframe built in 1980. You know, browser based it's in the internet and all that, but that's still a valid RPA application, right? It's a human process. There's no API, there's no easy programmatic way to, to solution for it. It would require the bank and your it team to get together and, you know, hate each other. Think about why this, this is so hard. So let's just throw a bot on it. That's going to go and log in, download this thing from the bank's website and then send it over to someone else. And it's going to do that all day. Every day. That's a valid application. And then tomorrow the bank changes its logo. And now my bot is it's confused.Stuff has shifted on the page. It doesn't know where to click anymore. So you have to go in and update that bot because sure enough, that bank's not going to send out an email to you and saying, Hey, by the way, we're upgrading our website in two weeks. Not going to happen, you'll know after it's happened.So that's where you're going to have to upgrade the bot. 
And that's the indefinite use of RPA: it's going to have to keep going until someone else decides to upgrade their systems and provide for a programmatic solution that is completely outside the, uh, capability of the organization to change. And so that's where the business would say, we need this indefinitely. It's not up to us. And so that is an indefinite solution that would be valid. Right? You can keep that going for 10 years as long, I would say you probably need to get a bank that maybe meets your business needs a little easier, but it's valid. And that would be a good way for the business to say yes, this needs to keep running forever until it doesn't.

[00:24:01] Jeremy: You, you brought up the case of where the webpage changes and the bot doesn't work anymore. Specifically, you're, you're giving the example of finance and I feel like it would be basically catastrophic if the bot is moving money to somewhere it shouldn't be moving because the UI has moved around or the button's not where it expects it to be. And I'm kind of curious what your experience has been with that sort of thing.

[00:24:27] Alex: You need to set organizational thresholds and say, this is this something this impacting or something that could go this wrong, it is not acceptable for us to solve with RPA, even though we could do it, it's just not worth it. Some organizations say that's anything that touches customer data; healthcare and banking specialists say, yeah, we have a human process where the human will go and issue refunds to a customer, uh, and that could easily be done via RPA solution, but it's fraught with, what if it does something wrong? It's literally going to impact, uh, someone somewhere, they're their moneys or their, their security or something like that. So that, that definitely should be part of your evaluation. And, um, as an organization, you should set that up early and stick to it and say, Nope, this is outside our purview. Even we can do it.
It has these things. So I guess the answer to that is you should never get to that process. But now we're going to talk about, I guess, the actual nuts and bolts of how RPA solutions work and how they can be made to not action upon stuff when it changes or if it does. So RPA software, by and large, operates by exposing the operating system or the browser's underlying models and interpreting them. Right. So when we talk about something like a, mainframe emulator, you have your RPA software on Microsoft Windows. It's going to use the COM, the Component Object Model, to see what is on the screen, what is on that emulator, and it's gonna expose those objects to the software and say, you can pick these things and click on that and do that. When we're talking about browser, what the RPA software is looking at is not only the COM, the, the Component Object Model there, which is the browser itself, but then it's also looking at the DOM, the Document Object Model, that is the webpage that is being served through the browser. And it's exposing that and saying, these are the things that you can touch or, operate on. And so when you're building your bots, what you want to make sure is that the uniqueness of the thing that you're trying to access is something that is truly unique, and if it changes, that one thing that the bot is looking for will not change. So we let's, let's go back to the, the banking website, right? We go in and we launch the browser and the bot is sitting there waiting for the operating system to say, this process is running, which is what you wanted to launch. And it is in this state, you know, the bot says, okay. I'm expecting this kind of COM to exist. I see it does exist. It's this process, and it has this kind of name and cool, Chrome is running. Okay. Let's go to this website.
And after I've typed this in, I'm going to wait, look at the DOM, and wait for it to return the expected webpage name. But they could change their webpage name, the title of it, right? One day it can say, hello, welcome to this bank, and the next day it says, bank website. All of a sudden your bot breaks; it's no longer finding what it was told to expect. So you want to find something unique that conceivably will never change. And so you find that one thing in the DOM on the banking website, this element or this tag, and say, okay, there's no way they're changing that. So the bot says, cool, the page is loaded; now click on this field, which is log in. Again, you want to find something unique on that field that won't change when they upgrade from Bootstrap to some other UI framework.

That's all well and good. That's what we call the happy path: it's doing this perfectly. Now you need to define what it should do when it doesn't find these things, which is not to keep going or find something similar. It needs to fail fast and gracefully, and pass that failure on to someone, not keep going. And that's how we prevent the scary use case where it's logged into the bank website and is now transacting bad things to bad places we didn't program it for. Well, you unfortunately did not specify in a detailed enough way what it needs to look for, and that if it doesn't find it, it needs to break instead of deciding this is close enough. So, as in all things software engineering, it's that specificity, that detail, that you need to hook onto.

That's also where, when RPA is marketed to the business as a low-code, no-code solution, it's just so often not the case. Yes, it might provide a very business-friendly interface for you to build bots.
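The fail-fast rule Alex describes can be sketched in a few lines. This is a hypothetical illustration only: the element names and the dictionary-based "DOM" are invented stand-ins for what an RPA tool or a browser driver would actually expose. The point is that the lookup insists on exactly one match and raises instead of settling for "close enough."

```python
# Minimal sketch of the fail-fast selector idea. The DOM is mocked as a
# list of element dicts; a real bot would query the browser's DOM/COM.
class ElementNotFound(Exception):
    """Raised so the bot stops instead of clicking something 'close enough'."""

def find_unique(dom, **attrs):
    """Return the single element matching all attributes, or fail fast."""
    matches = [el for el in dom
               if all(el.get(k) == v for k, v in attrs.items())]
    if len(matches) != 1:
        # Zero matches means the page changed; several matches mean the
        # selector is not unique. Either way: stop and escalate to a human.
        raise ElementNotFound(
            f"expected exactly 1 element for {attrs}, found {len(matches)}")
    return matches[0]

# The (hypothetical) bank page the bot was built against:
page = [
    {"tag": "input", "id": "login-username", "type": "text"},
    {"tag": "input", "id": "login-password", "type": "password"},
    {"tag": "button", "id": "login-submit", "text": "Log in"},
]

field = find_unique(page, id="login-username")  # happy path: one unique match
```

If the bank renames or removes that element, `find_unique` raises and the run is handed off to a person, rather than the bot guessing and transacting against the wrong control.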
But the knowledge you need to ensure stability and accuracy when building bots is a familiarity that's probably not going to be had in the business. It's going to be had by a developer who knows what the DOM and COM are, how the operating system exposes services and processes, and how JavaScript behaves, especially when we're talking about single-page apps and React, where you have this very reactive DOM that's going to change. You need to be fluent with that and know not only how HTML tags work and how CSS classes will change things on you, but also how clicking on something as simple as a username input field in a single-page app will dynamically change the whole DOM, and you need to account for it. So it's traditionally not as easy as saying, oh, the business person can just click, click, click, and then we have a bot. You'll have a bot, but it's probably going to be breaking quite often, and it's going to be inaccurate in its execution.

Say this is a business-friendly, user-friendly, non-technical tool. I launch it and it says, what do you want to do? Let me record what you're going to do. And you say, cool. Then you open up Chrome, type in the browser, click here, click there, hit send, and then you stop recording. The tool says, cool, this is what you've done. Well, I have yet to see a solution that doesn't need further direction or defining of that process. You still need to go in there and say, okay, yeah, you recorded this correctly, but you're not interpreting that field I clicked on correctly, or as accurately as you need to.

If anybody hits F12 on their keyboard while they have Chrome open, they can see how the DOM is built, and especially if the page uses any kind of template webpage software, it's going to have a lot of cruft in that HTML.
So while yes, the recording did correctly see that you clicked on the input box, what it actually recorded is that you clicked on the div that is scoped four levels above it, the parent, and there are other things within that as well. The software could be correctly clicking on that later, but other things could be in there too, and you're going to get some instability. So the human, the business bot builder, the roboticist I guess, would need to say, okay, listen, we need to pare this down.

But it's even beyond that. There are concepts you can't get around when building bots that are unique to software engineering, and even though they're very basic, they're still sometimes hard for the business user to learn. I'm talking concepts as simple as for loops, or loops in general, where the business of course has knowledge of what we would call a loop, but they wouldn't call it a loop, and it's not as precisely defined. So they have to learn that. It's not as easy as just saying, oh yeah, do a loop, because the business will say, well, what's a loop? They know conceptually what a loop could be, like a loop when tying a shoe, but a loop is a very specific thing in software: what you can do with it, and when you shouldn't use it. No matter how good your low-code, no-code solution might be, it's going to have to afford for that concept. And so a business user is still going to have to have some lower-level capability to apply those concepts.
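The "loop" concept Alex mentions is concrete in almost any bot: a refund bot, for instance, repeats the same steps once per input record. A tiny hypothetical sketch (the refund records are invented; in a real bot each row would come from a spreadsheet or ticketing system):

```python
# Hypothetical refund queue. The "loop" below is the software concept a
# business bot builder has to learn: one pass of the same steps per record.
refunds = [
    {"customer": "A-1001", "amount": 25.00},
    {"customer": "A-1002", "amount": 10.50},
    {"customer": "A-1003", "amount": 99.99},
]

processed = []
for refund in refunds:
    # ...here the bot would fill in the refund form for this customer...
    processed.append(refund["customer"])

print(processed)  # ['A-1001', 'A-1002', 'A-1003']
```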
And I've yet to see anybody get around that in their RPA solutions.

[00:33:42] Jeremy: So in your experience, even though these vendors may sell it as a tool that anybody can just sit down and use, you would want a developer to sit with them, or see the result, and try to figure out: okay, what do you really want this code to do? Not just the broad strokes you were hoping the tool was going to take care of for you?

[00:34:06] Alex: That's exactly right, and every organization will come to that realization pretty quickly. The ahead-of-the-game ones have said, okay, we need a really good COE structure for this robotic operating model, where we have a software engineering developer capability that sits with the business capability, and they can marry with each other. Other businesses may take these vendors at their word and say: it's low-code, meant for business; it just needs to be on and accessible, and then our business people are going to go in there and do this. They find out pretty quickly that they need some technical guidance, because they're building unstable or inaccurate bots. And whether they come to that sooner or later, they always come to it, and they realize there's a technical capability needed. And this is not just RPA; this is the story of all low-code, no-code solutions that have ever existed. It always comes around to: while this is a great interface that makes concepts easy, every single time there is a technical capability that needs to be afforded.

[00:35:26] Jeremy: For the web browser, you mentioned the DOM, which is how we typically interact with applications there. But for native applications, you briefly mentioned COM.
And I was wondering, when someone is writing a bot, what are the sorts of things they see, or what are the primitives they're working with? Is there a name attached to each button, each text field?

[00:35:54] Alex: Wouldn't that be a great world to live in? So, there's not. As we build things in the DOM, people have gotten a lot better about using uniqueness when they build. But things that were built for COM, or as .NET apps for the OS, no one was thinking, oh yeah, we're going to automate this, or we need to make this button here unique from that button over there in the COM. They didn't care about distinct names. So that is sometimes a big issue when you're using an RPA solution. You say, okay, cool, look at this calculator app, and it's showing me the component object model it was built with, describing what it's looking at, but none of these nodes have a name. They're all node one, node 1.1, node two, or whatnot, or a button is just "button," and there's no uniqueness around it. You see a lot of that in legacy, older software, and legacy here means things built in 2005, 2010.

That's the difficulty. At that point you can still solve for it, but what you're doing is using send keys. Instead of saying, okay, RPA software, open up this application, then look for this object in the COM and click on it (it can't; there is no uniqueness), what you say is: just open up the software and hit tab three times, and that should get you to this one place that was not unique. We know if you hit tab three times, it's going to get there.
That's all well and good, but there are so many things that could interfere with that and break it, and there's no context for the bot to grab onto to verify: okay, I am there. Any one thing, say a pop-up, essentially hijacks your send keys, right? The bot, yes, absolutely hit tab three times, and it should be in that one place. It thinks it is, and it hits enter. But in between the first and second tab, a pop-up happened, and now the focus has latched onto this other process. It hits enter, and all of a sudden Outlook is opening. The bot doesn't know that, but it keeps going, and it's going to enter financial information into, oops, an email that it launched, because it thought hitting enter again would do what it expected. That's where you get that instability.

There are other ways around it, other solutions, and this is where you get into using lower-level software engineering solutions instead of doing it exactly how the user does it. When we're talking about the operating system and Windows, there are a ton of interop services and assemblies that an RPA solution can access. So instead of cracking open Excel, double-clicking on an Excel workbook, waiting for it to load, and then reading and entering information, you can use the Office 365 interop service assembly and say: hey, launch this workbook without showing the UI, attach to that process, and then just send information into it using that assembly. The human user can't do that; they can't manipulate things that way. But the bot can, and it achieves the same end the human user was trying for, and it's much more efficient and stable, because the UI couldn't afford that kind of stability.

So that would be a valid solution. But at that point, you're really migrating into a software engineering, IT developer solution for something you were trying not to do that for. So when is that?
Why not just go and solve it with an enterprise or programmatic solution in the first place? So that's the balance.

[00:40:18] Jeremy: Earlier you were talking about how the RPA needs to be something the person is able to do, and it sounds like in this case there still is a way for the person to do it. They can open up the Excel sheet, right? It's just that the way the RPA tool is doing it is different.

[00:40:38] Alex: Right, and more efficient and more stable, certainly. Especially when we're talking about Excel: you have a workbook with, you know, 200,000 lines, and just opening that is your day. Excel is going to take its time opening and visualizing that information for you, whereas an RPA solution doesn't even need to crack it open. It can send data directly to that workbook, and that's a valid solution. And again, some of these processes, it might be just two people at your organization essentially doing it, so it's not at a threshold where you need an enterprise solution. But they're spending 30 minutes of their day just waiting for that Excel workbook to open, then manipulating the data and saving it, and then, oh, their computer crashed. So you can build an RPA solution for a more efficient way of doing it, using the programmatic approach. But you're right, it is doing it in a way that a human could not achieve. And that, again, is where the discipline and the organizational aspect of this comes in: saying, is that acceptable? Is it okay to have it do things in this way, things that are not human, but that achieve the same ends?
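Alex's tab-tab-tab failure mode, versus the direct programmatic route, can be simulated in a few lines. Everything here is an invented toy: the "UI" is just an ordered list of fields with a focus index, standing in for a real application where a pop-up can steal a keystroke.

```python
# Toy simulation of blind send-keys versus a direct programmatic write.
class FakeUI:
    def __init__(self, fields):
        self.fields = fields                      # tab-focus order
        self.values = {f: "" for f in fields}
        self.focus = 0

    def tab(self):
        self.focus = (self.focus + 1) % len(self.fields)

    def type_text(self, text):
        self.values[self.fields[self.focus]] = text

def send_keys_style(ui, text):
    """Blind send keys: tab three times, then type. No verification."""
    for _ in range(3):
        ui.tab()
    ui.type_text(text)

def direct_style(ui, field, text):
    """Programmatic write: address the field by name; focus is irrelevant."""
    ui.values[field] = text

ui = FakeUI(["username", "password", "search", "amount"])
send_keys_style(ui, "5000")       # happy path: lands on "amount"

ui2 = FakeUI(["username", "password", "search", "amount"])
ui2.tab()                         # a pop-up consumes one tab press...
send_keys_style(ui2, "5000")      # ...so the text lands in the wrong field

direct_style(ui2, "amount", "5000")  # the stable, non-UI route
```

The bot in the second run "succeeded" from its own point of view, which is exactly why Alex wants context checks or a programmatic interface rather than keystrokes.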
And if you're not disciplined, that creeps, and all of a sudden you have an RPA solution doing things in a way where the whole reason to bring in RPA was to not have something doing things like that. And that's usually where the stuff falls apart. IT all of a sudden perks their head up and says: wait, I have a lot of connections coming in from this one computer, doing stuff very quickly with a SQL query. What is going on? And so someone built a bot to essentially make a programmatic connection, and it's: you should not be doing this. Who gave you these permissions? Shut down everything RPA here until we figure out what you guys did. So that's the dance.

[00:42:55] Jeremy: It's almost like there's this hidden API, or this API that you're not intended to use, but in the process of trying to automate this thing you use it, and then if your IT is not aware of it, things just kind of spiral out of control.

[00:43:10] Alex: Exactly right. So a use case of that would be: we need to get California tax information on alcohol sales. We need to see what each county taxes for alcohol, to apply it to something. Today the human users go into the California tobacco, wildlife, whatever website and look things up. Okay, that's very arduous; let's throw a bot on that, let's have a bot do it. Well, the bot developer, a smart person, knows their way around Google and finds out, well, California has an API for that. So instead of the bot cracking open Chrome, it's just going to send a REST API call and get information back, and that's awesome and accurate and way better than anything. But now all of a sudden IT sees connections going in and out.
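The bot developer's shortcut, calling the API instead of driving Chrome, might look like the sketch below. The endpoint URL and the response shape are entirely hypothetical (no claim is made about California's actual API), which is exactly the kind of detail IT would want to review before connections start flowing.

```python
import json
import urllib.request

# Hypothetical endpoint: the real agency, path, and schema would differ.
TAX_API = "https://api.example.ca.gov/alcohol-tax/counties"

def fetch_county_rates(url=TAX_API):
    """What the bot does instead of screen-scraping: one REST call."""
    with urllib.request.urlopen(url) as resp:
        return parse_rates(resp.read().decode())

def parse_rates(payload):
    """Turn the (assumed) JSON payload into a {county: rate} mapping."""
    return {row["county"]: row["rate"] for row in json.loads(payload)}

# Offline example of the assumed response shape:
sample = '[{"county": "Alameda", "rate": 0.095}, {"county": "Kern", "rate": 0.0825}]'
print(parse_rates(sample))  # {'Alameda': 0.095, 'Kern': 0.0825}
```

Splitting the network call from the parsing keeps the data handling testable without touching the (hypothetical) service.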
All of a sudden it's doing things very quickly, and information is coming into your systems in a way you did not know was going to happen. And so, while it was all well and good, it's a good way for the people whose job it is to protect you, or to know about these things, to get very angry, rightly so, that this is happening.

That's an organizational challenge, an oversight challenge, and a developer challenge, because what you're getting into is the problem of having too-technical people build these RPA bots. On one hand, you have business people who are told, hey, just crack this thing open and build it, but they don't have enough technical fluency to actually build a stable bot, because they're taking it at face value. On the other hand, you have software engineers or developers who are very technical and say: oh, this process? Yeah, I can build a bot for that. But what if I use these interop services assemblies Microsoft gives me and access it like that? And then I can send an API call over here, and while I'm at it, I'm just going to spin up a server on this one computer that the bot talks to. So you have the opposite problem: now you have something that is not at all RPA; it's just using the tool to manipulate things programmatically.

[00:45:35] Jeremy: So as a part of all this, is it using the same credentials as a real user? You're logging in with a username and password. If the form requires something like two-factor authentication, how does that work, since it's not an actual person?

[00:45:55] Alex: Right. So in a perfect world, you're correct, a bot is a user. A lot of times you'll hear people say, oh, I have 20 RPA bots.
What they're usually saying is: I have 20 automations being run for separate processes, with one user's credentials, on a VDI. So you're right, they are using a user's credentials with the same permissions as any user who does that process. That's why it's easy. But now we have these concepts like two-factor authentication, which every organization is using, that should require something existing outside of that bot user's environment. So how do you afford for that?

In a perfect world, it would be a service account, not a user account, and service accounts are governed a little differently. A lot of times service accounts have much more stringent rules, but also allow for password resets not being a thing, or two-factor authentication not being a thing. That would be the perfect solution, but now you're dragging in IT, and if you're not structurally set up for that, it's going to be a long slog. Some people literally have a business person with the two-factor auth for that bot user on their phone, and they'll just go in and say, yeah, that's me. That's untenable.

So what a lot of providers, Microsoft for instance, allow you to do is install a two-factor authentication application on your desktop, so that when you go to log in and the website says, hey, type in your password, cool, okay, now give me the code from your two-factor auth app, the bot can actually launch that app, copy the code, paste it in, and be on its way. But now you're having to afford for things that aren't really part of the process you're trying to automate. They are the incidentals that also happen. And so you have to build your bot to afford for those things and interpret: oh, I need to do two-factor authentication.
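The desktop authenticator route works because a TOTP code is just an HMAC of a shared secret and the current 30-second time window (RFC 6238), so software can compute it the same way a phone app does. A stdlib-only sketch, using the RFC's published test secret; whether an organization should let a bot hold that seed is exactly the governance question Alex raises.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, at_time=None, digits=6, step=30):
    """Compute an RFC 6238 TOTP code from a base32-encoded secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if at_time is None else at_time) // step)
    msg = struct.pack(">Q", counter)                  # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                        # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238's test secret is the ASCII string "12345678901234567890":
SECRET = base64.b32encode(b"12345678901234567890").decode()
print(totp(SECRET, at_time=59))  # "287082" per the RFC test vectors
```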
And a lot of times, especially if you have an entirely business-focused robotic operating model, they will forget about those things, or find ways around them that the bot isn't addressing, like having the authenticator app on someone's phone. That's stuff that definitely needs to be addressed, and sometimes it's only found at runtime: oh, it's asking for a login, and when I developed it I didn't need to do that, because I had the cookie that said you're good for 30 days, but now, oh no.

[00:48:47] Jeremy: Yeah, you could have two-factor, you could have it asking you to check your email for a code, there could be a fraud warning. There are all sorts of failure cases that can happen.

[00:48:58] Alex: Exactly. And those things, when we talk about third-party provider vendors, like going back to the banking website: if you don't tell them you're going to be using a bot to get their information or to interface with their website, you're setting yourself up for a bad time, because they're going to see runtime behavior that is not possible at scale from a user, and you run into that issue at runtime. But then, you're correct, there are other things you might run into at runtime that the business didn't think of as part of the process. It's just something they do that the bot actually has to afford for. That's part of the journey in building these.

[00:49:57] Jeremy: When you're building these bots, what are the types of tools that you've used in the past? Are they commercial packages, are they open source? What does that ecosystem look like?

[00:50:11] Alex: Yeah, in this space we have three big ones: Automation Anywhere, UiPath, and Blue Prism. Those are the RPA juggernauts providing this software to the companies that need it.
And then you have smaller ones trying to get in there, or providing things in a little different way, and you even have big juggernauts trying to provide for it, like Microsoft with something like Power Automate Desktop.

Say three years ago, all of these RPA solutions operated in the same kind of way: you would install the software on your desktop, and it would provide you a studio to either record or define the process that was going to be automated on that desktop when you pushed play. And they all worked the same way underneath; they would interpret the COM or the DOM that the operating system, and things like Task Scheduler, have traditionally exposed. Their real value proposition was the orchestration capability and the management of it. So I build a bot to do this, Jim over there built a bot to do that; this RPA software not only enabled you to define those processes, but its real value was providing a place where I can say: this needs to run at this time, on this computer, I need to be able to monitor it, it needs to return information, and all that kind of orchestration capability.

Now all of these RPA solutions exist, like everything else, in the browser. So instead of installing the application and launching it, with the orchestration capability installed on another computer that watched these machines and ran things on them, it's all in the cloud, as it were, in the browser. I go to wherever my RPA solution is in my browser, and it says: okay, cool, you still need to install something on the desktop where you want the bot to run, and it deploys it there.
But I define and build my process in the provided browser studio, and then they give me the capability to orchestrate, monitor, and receive information on the bots I have running. And what they're now providing as well is the ability to tie other services into your bot so it has expanded capability. So I'm using Automation Anywhere, and I built my bot, and it's doing this or that, and Automation Anywhere says: hey, that's cool, but wouldn't you like your bot to be able to do OCR? We don't have our own OCR engine, but you, as an enterprise, probably do. Just use your Kofax OCR engine, or hey, if you're really high speed, why don't you use your Azure Cognitive Services capability? We'll tie it right into our software. And so when you're building your bot, instead of just cracking open a PDF and using send keys, Ctrl+C and Ctrl+V, to move stuff, we'll use the OCR engine you've already paid for to understand it. That's how they expand what they're offering into addressing more and more capabilities.

[00:53:57] Alex: But now we're migrating into a territory where it's like: well, things have APIs; why even build a bot for them? You can just build a program that uses the API, and the user can drive it. And that's where people get stuck. They're using RPA on something that provides for a programmatic solution just as easily as an RPA solution, but because they're in their RPA mode, and they say we can use a bot for everything, they don't even stop and investigate and say: hey, wouldn't this be just as easy to build as a React app and let a user use it, because it has an API, and IT can just as easily monitor and support it because it's in an Azure resource bucket? That's where an organization needs to be clear-eyed and say: okay, at this point RPA is not the actual solution.
We can do this just as easily over here, so let's pursue that.

[00:54:57] Jeremy: On the experience of making these RPAs: it sounds like you have this browser-based IDE, there's probably some kind of drag-and-drop setup, and then you mentioned JavaScript. So does that mean you can dive a little bit deeper, and if you want to set up specific rules or loops, you're actually writing that in JavaScript?

[00:55:18] Alex: Not necessarily. Again, the business does not know what an IDE is; it's a "studio." But you're correct, it's an IDE. Whether we're talking about Blue Prism, UiPath, or Automation Anywhere, they all have a different flavor of what that looks like and what it enables. Traditionally, Blue Prism gave you a studio that was more shape-based, where you use UML shapes to define or describe your process, whereas Automation Anywhere traditionally used essentially lines, or descriptors: I say, hey, I want to open this file, and the studio would just show a line that said "open file." Although by now all of them have both a shape-based way to define your process (go here, here, here's a circle which represents this) and a way to define it more creatively, in a text-based way.

When we talk about JavaScript or anything like that: they all provide predefined actions, like "open this file" or "execute this," but all of them, at least last time I checked, also allow you to say, I want to run something programmatically, I want to define it myself. And since they're all in the browser, it's JavaScript you'll be writing: hey, run this script, run this function. Previously, things like Automation Anywhere would let you write that in .NET, essentially.
But again, now everything's in the browser. So yes, they do provide a capability to introduce lower-level code into your automation. That can get dangerous. It can be powerful, and it can be stabilizing, but it can be a very slippery slope, where you have an RPA bot that does the thing, but really all it does is start up and then execute code that you built.

[00:57:39] Alex: Like, what was the point in the first place?

[00:57:43] Jeremy: Yeah. And I suppose at that point, anybody who knows how to use the RPA tool but isn't familiar with the code you wrote just can't maintain it.

[00:57:54] Alex: You have business continuity concerns, and this goes back to: it has to be replicable, or as close to the human process as you can make it, because that's going to be the easiest to inherit and support. That's one of the great things about it. Whereas if you're a low-level programmer, a dev who says, I can easily do this with a couple of lines of .NET or TypeScript or whatever, and so the bot just starts up and executes, well, unless someone just as proficient comes along later and can say why it's breaking, you now have an unsupportable business solution. That's bad juju.

[00:58:38] Jeremy: You have the software engineers who want to write code, and then you have the people, either in business or in IT, who go: I don't want to look at your code, I don't want to have to maintain it. So if you're a software engineer coming in, you almost have to fight the urge to write anything yourself, figure out what you can do with the tool set, and only go to code if you can't do it any other way.

[00:59:07] Alex: That's correct, and it takes discipline. More often than not, it's less fun than writing the code, where you're like, I can do this. And this is really where the wheels come off.
You went to the business, they have this process, very simple, I need to do this, and you say, cool, I can do that. And then you're sitting there writing code and you're like: but you know what, I know what they really want to do, and I can write that now. And so you've changed the process. Nine times out of 10, the business will be like: oh, that's actually what we wanted; the human process was just as close as we could get to it, nothing more, but you're right, that's exactly what we needed, thank you. Nine times out of 10 they'll love you for that. But now you own their process. Now you're the one who defined it. You have to do the business continuity, you have to document it, and when it falls over, you have to pick it back up and you have to retrain. And unless you have the organizational capacity to say, okay, I've gone in and changed your process, I didn't just automate it, I changed it, and now I have to go in and tell you how I changed it and how you can do it, your developer could be writing checks bigger than they can cash, even though this is a better capability.

[01:00:30] Jeremy: You sort of touched on this before, and I think this is probably the last topic we'll cover, but you've been saying how the end goal should be to not have to use the RPAs anymore. I wonder if you have any advice for how to approach that process, and what are some of the mistakes you've seen people make?

[01:00:54] Alex: Mm-hmm. The biggest mistake I've seen organizations make, I think, is throwing the RPA solution out there and building bots, and they're great bots, and they are creating that value. They're enabling you to save money, and also enabling your employees to go on and do better, more gratifying work.
But then they say: that's it, that's as far as we're going to think, instead of taking those savings and saying, this is for replacing the pain point that made us need a bot in the first place. That's a huge, common mistake, and an absolutely understandable one. If I'm a CEO, or even the person in charge of enterprise transformation, it's very easy for me to say: ha, victory, here's our money, here's our savings, I've justified what we've done, go have fun, instead of saying: we need to squirrel this money away and give it to the people who are going to change the system. That's definitely one of the biggest things, and the problem is it's not realized until years later, when they're like, oh, we're still supporting these bots.

So it's about having a turn-off strategy up front. When can we turn this bot off? What is that going to look like? Is there a roadmap that will eventually do that? That, I think, is the best approach, and it will define which processes you do indeed build bots for. You go to IT and say: listen, we've got a lot of these user processes, human processes, doing this stuff; is there anything on your roadmap that is going to replace them? And they say: oh yeah, in three years we're actually going to be standing up our new thing, we're going to be converting, and part of our analysis of the solution we eventually stand up will be, does it do these things? So yes, in three years you're good. And you say: cool, those are the processes I'm going to automate, and then we can shut those off. That's your point of entry for these things. Not doing that leads to bots running and doing things even after there is an enterprise solution for them.
And more often than not, easily five times out of 10, when we are evaluating a process to build a bot for, we say, whoa, no, actually you don't even need to do this; our enterprise application can do this, you just need retraining, because your process is just old and no one knew you were doing this, and so they didn't come in and tell you, hey, you need to use this. A lot of the time, that's what the issue is. And then after that, we go in and say, okay, no, there's no existing solution for this; this is definitely something a bot needs to do. Let's make sure, number one, that there isn't a solution on the horizon in the next six months to a year, because otherwise we're just going to waste time, and let's make sure IT, or the people in charge, are at least aware that this is something that needs to be replaced, bot or no bot. And so let's have an exit strategy, let's have a turn-off strategy. When you have applications that are relatively modern, like a Jira or a ServiceNow, they must have some sort of API, and it may just be that nobody has come in and told them: you just need to plug these applications together.

[01:04:27] Alex: And so what you're hitting on and surfacing is the future of RPA. Everything we're talking about is using a bot to essentially bridge a gap: moving data from here to there that can't be done programmatically, accessing something from here to there that can't be done programmatically. So we use a bot to do it. That's only going to exist for so long. Legacy can only be legacy for so long, although conceivably, because we had that big COBOL thing, maybe longer than we'd all like. But eventually these things will be upgraded.
And so either the RPA market will get smaller, because there's less legacy out there, and RPA as a tool and a solution will become much more targeted toward specific systems, or we expand what RPA is and what it can afford. And that, I think, is the more likely case. And that's the future, where bots or automations aren't necessarily interpreting the COM and the DOM and saying, okay, click here, do that, but rather you're able to quickly build bots that utilize APIs that are built in and friendly. And so what we're talking about there is things like Appian or MuleSoft, which are these kinds of API integrators, eventually being classified as RPA; they're going to be within this realm. And where you're seeing that surface, or at least movement toward it, is really what Microsoft is offering with Power Automate, which essentially is just a very user-friendly way to access APIs that they built or that other people have built. So, I want to get information into ServiceNow. ServiceNow has an API. Your IT can go in and build you a nice little app that does a REST API call to it and gets information back, or you can go into Microsoft Power Automate and say, okay, I want to access ServiceNow, and it says, cool, these are the things you can do. And I say, okay, I just want to put information in this ticket, and we're not talking about GET or PATCH or PUT or anything like that; we're just saying, ah, that's what it's going to do. And that's what Microsoft is offering. I think that is the new state of RPA: being able to interface in a user-friendly way with APIs. Because everything's in the browser, to the point where, you know, Microsoft is enabling add-ins for Excel to be written in JavaScript, which is just the new frontier. So that's kind of going to be the future state of this, I believe.
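Alex's ServiceNow example, putting information into a ticket over its REST API instead of clicking through the UI, can be sketched briefly. This is an illustration rather than a Power Automate flow: the instance name and field values are made up, and the request is only built, not sent, so authentication and the HTTP client are left to the caller. The endpoint shape follows ServiceNow's Table API.

```python
import json

def build_incident_request(instance, short_description, urgency="3"):
    # Build the URL, headers, and JSON body for a ServiceNow Table API
    # call that creates an incident. Nothing is sent here: credentials
    # and the HTTP client (requests, a connector, an RPA bot) are
    # supplied by the caller.
    url = f"https://{instance}.service-now.com/api/now/table/incident"
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    body = json.dumps({"short_description": short_description, "urgency": urgency})
    return url, headers, body

# A caller would then POST `body` to `url` with auth, e.g.
# requests.post(url, auth=(user, password), headers=headers, data=body)
url, headers, body = build_incident_request("dev12345", "Invoice sync failed")
```

Separating "build the request" from "send the request" is also what makes this kind of integration easy to monitor and govern, which is where the conversation heads next.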
[01:07:28] Jeremy: So, moving from RPAs being this thing that's going to click through a website or click through a desktop application, instead it's maybe more of this higher-level tool where the user still gets, I forget the term you used, this tool to build a workflow, right, a studio. And instead of saying, oh, I want this to click this button or fill in this form, it'll be, I want to get this information from ServiceNow, and I want to send a message using that information to Slack or to Twilio. You're basically talking directly to these different services and just telling it what you want and where it should go.

[01:08:14] Alex: That's correct. So, as you said, everything's going to have an API, right? Seemingly everything has an API. And so instead of our RPA bots or solutions being UI-focused, they're going to be API-focused, where a bot doesn't have to use the user interface; it's going to use the other service. And again, the cool thing about APIs in that way is that you're not directly connecting to your data source. It's the same as your UI is for a user: it sits on top of the data, it gets the request, and it correctly interprets it, the same as your UI, where I say I click here, and it says, okay, yeah, you're allowed to do that, go ahead. So that's the benefit of that. But to your point, the user experience of building an RPA bot, whether you're using a UI or an API, is going to be the same for the user. And at this point, what we're talking about is, well, where's the value offering, what is the value proposition of RPA? And that's orchestration, monitoring, and data, essentially: we'll take care of hosting these for you, we'll take care of where they're going to run, giving you a dashboard, things like that.

[01:09:37] Alex: That's a hundred percent correct.
It's providing a view into that thing and letting the business say, I want to no-code this, I want to be able to just go in and understand it and say, oh, I do want to do that, I'm going to put these things together, and it's going to automate this business process that I hate but that is vital, and I'm going to save it. The RPA software enables you to say, oh, I saw they did that, and I see it's running, and everything's okay in the world, and I want to turn it on or off. And so it's that seamless kind of capability that it will provide. And I think that's really where it isn't yet, but really where it's going. It'll be interesting to see when the RPA providers switch to that kind of language, because currently and traditionally they've gone to the business and said, we can build you bots, or, no, no, your users can build bots, and that's their value proposition: instead of one very, very advanced user building macros into Excel with VBA, unknown to IT or anybody else, you build a bot for it. So that's their business proposition today. Instead, it's going to shift, and I'd be interested to see when it shifts, to where they say, listen, we can provide you a view into those solutions and you can orchestrate them, and here's the studio that enables people to build them, but really what you want to do is give that to your IT and just say, hey, we're going to go over here and address business needs and build them; don't worry, you'll be able to monitor them and at least say, yeah, okay, this is going.

[01:11:16] Jeremy: Yeah. And that's a shift, it sounds like, from where RPA is currently. You were talking about how, when you're configuring bots to click on websites and GUIs, you really do still need someone with software expertise to know what's going on.
But maybe when you move over to communicating with APIs, maybe that won't be as important; maybe somebody who just knows the business process really can just use that studio and get what they need.

[01:11:48] Alex: That's correct, right, because the API only enables you to do what it has defined. So ServiceNow, which does have a robust API, says you can do these things, the same way a user can only click a button that's there, that you've built and said they can click. And so you can't go off the reservation as easily with that stuff. What's really going to become important is that no longer do I actually have an Oracle server physically in my location with a database; instead, I'm using Oracle's cloud capability, which exists on their own infrastructure, and that's where I'm getting data from. What becomes important about being able to monitor these things is not necessarily, oh, is it falling over, is it breaking; it's, what information are you sending to or getting from these things that are not within our walled garden? And that's really where IT or InfoSec is going to be maybe the main orchestrator and owner of RPA, because they're going to be the ones to say, you can't get that, you're not allowed to get that information. It's not necessarily that you can't do it, or can't do it in a dangerous way, but rather, I don't want you

Psyda Podcast with Minhaaj
Data Science Careers with Dhaval Patel - Codebasics Youtube

Psyda Podcast with Minhaaj

Play Episode Listen Later Oct 23, 2021 112:42


Dhaval Patel is a software and data engineer with more than 17 years of experience. He has worked as a data engineer for the fintech giant Bloomberg LP (New York) as well as NVIDIA in the past. He teaches programming, machine learning, and data science through his YouTube channel CodeBasics, which has 428K subscribers worldwide. 00:00 Intro 01:34 Autoimmune Disease 'Ulcerative Colitis', Life & Death Struggle, Back to Life 03:40 Mental Health, Steroids & Immune System 11:00 Planning Videos, Pedagogy & Smart People Problem 17:15 Working at Bloomberg, Bloomberg Trading Terminal & Exceptional Talent at Bloomberg 21:13 Career Tracks on the Data Spectrum, Pathways for Different Careers 25:16 Data Structures and Algorithms, Politics vs Equations, Eternity 28:20 ML vs Deterministic Programming, Time & Space Complexity of ML Models 30:37 Kaggle vs Real Life, Soft Skills for Engineers, Transition from Competitions to Industrial Use Cases 30:02 Litmus Test for Hiring Data Scientists, Continuous Engagement & Adaptability 42:35 Loss of Productivity from Lack of Communication Skills, Education System Deficiencies, How to Win Friends by Dale Carnegie 46:50 Death by PowerPoint, Simplicity & Walk vs Talk 49:51 Negotiating Salary, Action vs Motivation, Cellphone as a Distraction 57:35 Growing Vegetables, Joy of Gardening, Rural Childhood & GMO Food 01:01:40 The Dhandho Investor, Motel Business Monopoly by Patels, Software Engineering 01:04:04 Deep Learning, C++ Back-propagation Algorithms, NVIDIA Titan RTX GPUs, Amazon Stores Experience 01:08:49 NVIDIA Broadcast Noise Cancellation Demonstration, NVIDIA Card Filtering, CNNs and Edge Detection 01:16:06 Black-Box Models, ML-Centric vs Data-Centric Models 01:19:25 Natural Language Understanding, Yann LeCun, Low Accuracy in NLP Models 01:21:18 GitHub AI Pairing, Data Structures & the Future of Programming Languages 01:27:01 ETL Pipelines & Distributed Computing Structures 01:30:00 FastAPI, Beginner's Tools, PyTorch vs TensorFlow, Improvements in TensorFlow 2.0 01:35:05 Programmers vs Normal People, Semantics of English vs Programming Languages, pd.read_csv 01:38:03 NVIDIA GPU vs Apple M1 GPU, Hope for Non-NVIDIA Deep Learning, Google Colab 01:41:30 Google Pixel, Google Tensor Chips & Chip Shortages 01:44:00 Discord Community for Data Science, Mentorship & Abundance Mindset 01:49:00 Struggles, Battles, Hopelessness & Dysphonia

Les Grandes Gueules
Les Grandes Gueules - Jeudi 14 octobre 2021

Les Grandes Gueules

Play Episode Listen Later Oct 14, 2021 124:39


With: David Dickens, marketing director; Johnny Blanc, cheesemaker; and Léa Falco, student. - Alain Marschall and Olivier Truchot host a 3-hour show with their guests, where the news meets freedom of speech, on RMC, the opinion radio station. "Les Grandes Gueules", hosted by Alain Marschall and Olivier Truchot, is back for an 18th season! Farmer, cheesemaker, lawyer, teacher… the 14 GG, drawn from civil society, are never afraid to defend their ideas. Between lively debates, clashes, and bursts of laughter, these 3 hours of talk show reflect the real concerns of the French. This year, Fred Hermel joins the GG with an opinion segment: "C'est ça la France". Every morning from 6 a.m., listen to a radio/TV show unique in France. For three hours, the RMC team works to share the news closest to the daily lives of the French. An exceptional program mixing live news, debates on current events, reactions, and expert contributions. Simulcast from 6 to 8:30 a.m. on RMC Découverte. RMC is a general-interest radio station, focused mainly on news and interactivity with listeners, in a 100% talk format unique in France. RMC's schedule is built around flagship programs such as Apolline Matin (6-9 a.m.), Les Grandes Gueules (9 a.m.-12 p.m.), and Estelle Midi (12-3 p.m.).

Les Grandes Gueules
Les Grandes Gueules du 14 octobre : David Dickens, Johnny Blanc et Léa Falco - 11h/12h

Les Grandes Gueules

Play Episode Listen Later Oct 14, 2021 40:30


With: David Dickens, marketing director; Johnny Blanc, cheesemaker; and Léa Falco, student. - Alain Marschall and Olivier Truchot host a 3-hour show with their guests, where the news meets freedom of speech, on RMC, the opinion radio station. "Les Grandes Gueules", hosted by Alain Marschall and Olivier Truchot, is back for an 18th season! Farmer, cheesemaker, lawyer, teacher… the 14 GG, drawn from civil society, are never afraid to defend their ideas. Between lively debates, clashes, and bursts of laughter, these 3 hours of talk show reflect the real concerns of the French. This year, Fred Hermel joins the GG with an opinion segment: "C'est ça la France". Every morning from 6 a.m., listen to a radio/TV show unique in France. For three hours, the RMC team works to share the news closest to the daily lives of the French. An exceptional program mixing live news, debates on current events, reactions, and expert contributions. Simulcast from 6 to 8:30 a.m. on RMC Découverte. RMC is a general-interest radio station, focused mainly on news and interactivity with listeners, in a 100% talk format unique in France. RMC's schedule is built around flagship programs such as Apolline Matin (6-9 a.m.), Les Grandes Gueules (9 a.m.-12 p.m.), and Estelle Midi (12-3 p.m.).

Screaming in the Cloud
Keeping the Cloudwatch with Ewere Diagboya

Screaming in the Cloud

Play Episode Listen Later Oct 14, 2021 32:21


About Ewere: Cloud and DevOps Engineer, Blogger, and Author

Links:
Infrastructure Monitoring with Amazon CloudWatch: https://www.amazon.com/Infrastructure-Monitoring-Amazon-CloudWatch-infrastructure-ebook/dp/B08YS2PYKJ
LinkedIn: https://www.linkedin.com/in/ewere/
Twitter: https://twitter.com/nimboya
Medium: https://medium.com/@nimboya
My Cloud Series: https://mycloudseries.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate: is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at Honeycomb.io/screaminginthecloud. Observability: it's more than just hipster monitoring.

Corey: This episode is sponsored in part by Liquibase. If you're anything like me, you've screwed up the database part of a deployment so severely that you've been banned from touching anything that remotely sounds like SQL at at least three different companies.
We've mostly got code deployments solved for, but when it comes to databases, we basically rely on desperate hope, with a rollback plan of keeping our resumes up to date. It doesn't have to be that way. Meet Liquibase. It is both an open source project and a commercial offering. Liquibase lets you track, modify, and automate database schema changes across almost any database, with guardrails to ensure you'll still have a company left after you deploy the change. No matter where your database lives, Liquibase can help you solve your database deployment issues. Check them out today at liquibase.com. Offer does not apply to Route 53.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I periodically make observations that monitoring cloud resources has changed somewhat since I first got started in the world of monitoring. My experience goes back to the original Call of Duty. That's right: Nagios. When you set instances up, it would theoretically tell you when they were unreachable or when certain thresholds weren't met. It was janky, but it kind of worked, and that was sort of the best we had. The world has progressed as cloud has become more complicated and technologies have become more sophisticated, and here today to talk about this is the first AWS Hero from Africa and the author of a brand new book, Ewere Diagboya. Thank you for joining me.

Ewere: Thanks for the opportunity.

Corey: So, you recently published a book on CloudWatch. To my understanding, it is the first such book that goes in-depth with not just how to wind up using it, but how to contextualize it as well. How did it come to be, I guess, is my first question?

Ewere: Yes, thanks a lot, Corey.
The name of the book is Infrastructure Monitoring with Amazon CloudWatch, and the book came to be from looking at the ecosystem of AWS cloud computing. We saw that for a lot of the things around cloud, and most of this is the [unintelligible 00:01:49] compute part of AWS, which is EC2, the containers, and all that, you find books on all those topics. They have proliferated all over the internet, you know, along with videos and all that. But there is a core behind each of these services that no one actually talks about and amplifies, which is the monitoring part, which helps you to understand what is going on with the system. Knowing what is going on with the system helps you to understand failures, helps you to predict issues, helps you to envisage when a failure is going to happen so that you can remedy it and also [unintelligible 00:02:19], and in some cases it even gives you a historical view of the system to help you understand how it has behaved over a period of time.

Corey: One of the articles I put out that first really put me on AWS's radar, for better or worse, was something I was commissioned to write for Linux Journal, back when that was a print publication. And I accidentally wound up getting the cover with my article, "CloudWatch is of the devil, but I must use it." And it was a painful problem that people generally found resonated with them, because no one felt they really understood CloudWatch; it was incredibly expensive; it didn't really seem like it was at all intuitive, or that there was any good way to opt out of it. It was just simply there, and if you were going to be monitoring your system in a cloud environment, which of course you should be, it was just sort of the cost of doing business that you then had to pay for a third-party tool to wind up using the CloudWatch metrics it was gathering, and it was just expensive and unpleasant all around.
Now, a lot of the criticisms I made about CloudWatch's limitations in those days, about four years ago, have largely been resolved or at least mitigated in different ways. But is CloudWatch still crappy, I guess, is my question?

Ewere: Um, yeah. So, at the moment, I think, like you said, CloudWatch has really evolved over time. I personally also had those issues with CloudWatch when I started using it; I had the challenge of usability, I had the challenge of proper integration, and I will talk about my first experience with CloudWatch here. So, when I started my infrastructure work, one of the things I was doing a lot was EC2, basically. I mean, everyone always starts with EC2 the first time. And then we had a downtime, and my CTO said, "Okay, [Ewere 00:04:00], check what's going on." And I'm like, "How do I check?" [laugh]. I mean, I had no idea of what to do. And he said, "Okay, there's a tool called CloudWatch. You should be able to monitor." And I'm like, "Okay." I dive into CloudWatch, and boom, I'm confused again. You look at the console and it shows you certain metrics, and yet [people 00:04:18] don't understand what the CPU metric is talking about, or what network bandwidth is talking about. And here I am trying to dig, and dig, and dig deeper, and I still don't get [laugh] a sense of what is actually going on. But what I needed to find out was what was wrong with the memory of the system, so I delved into trying to install the CloudWatch agent, get metrics, and all that. But the truth of the matter was that I couldn't really solve my problem very well, though I had [unintelligible 00:04:43] of knowing that I don't have memory out of the box; it's something that has to be set up differently. And trust me, after that I didn't touch CloudWatch [laugh] again.
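The memory gap Ewere ran into still holds for EC2: the hypervisor can't see guest memory, so the CloudWatch agent has to be installed on the instance and told to collect it. A minimal agent configuration fragment for that might look like the following; the measurement name follows the agent's `mem` plugin, and the 60-second collection interval is illustrative:

```json
{
  "metrics": {
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      }
    }
  }
}
```

With a fragment like this in the agent's config file, memory utilization shows up as a custom metric alongside the hypervisor-level CPU and network metrics CloudWatch reports on its own.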
Because, like you said, it was a problem; it was a bit difficult to work with. But fast forward a couple of years later, I could actually see someone use CloudWatch for a lot of beautiful stuff, you know? It creates beautiful dashboards, creates some very well-aggregated metrics. And also, with the aggregated alarms that CloudWatch comes with, [unintelligible 00:05:12] easy for you to avoid what I call incident fatigue. And then also, the dashboards. I mean, there are so many dashboards that are simple to work with, and that makes it easy and straightforward to configure. So, the bootstrapping, the changes, and the improvements on CloudWatch over time have made CloudWatch a go-to tool, and most especially the integration with containers and Kubernetes. I mean, CloudWatch is one of the easiest tools to integrate with EKS, Kubernetes, or other container services that run in AWS; it's just, more or less, one or two lines of setup, and here you go with a lot of beautiful, interesting, and insightful metrics that you would not get out of the box. If you look at other monitoring tools, it takes a lot of time for you to set up, to configure, to consistently maintain, and to get those consistent metrics you need to know what's going on with your system from time to time.

Corey: The problem I always ran into was that the traditional tools I was used to using in data centers worked pretty well, because you didn't have a whole lot of variability on an hour-to-hour basis. Sure, when you installed new servers or brought up new virtual machines, you had to update the monitoring system. But then you started getting into this world of ephemerality, with auto-scaling originally, and later containers, and, God help us all, Lambda now, where it becomes this very strange back-and-forth story of needing to build something that, I guess, is responsive to that.
And there's no good way to get access to some of the things that CloudWatch provides, just because we didn't have access into AWS's systems the way that they do. The inverse, though, is that they don't have access into things running inside of the hypervisor; a classic example has always been memory: memory usage is something that traditionally couldn't be displayed without installing some sort of agent inside the instance. Is that still the case? Are there better ways of addressing those things now?

Ewere: So, that's still the case, I mean, for EC2 instances. So before now, we had an agent called the CloudWatch agent. Now, there's a new agent called the Unified CloudWatch Agent, which is, I mean, a notch above the CloudWatch agent. So, at the moment, basically, that's what happens on the EC2 layer. But the good thing is, when you're working with containers, or more or less Kubernetes kinds of applications or systems, everything comes out of the box. So, with containers, we're talking about a [laugh] lot of moving parts: the containers themselves with their own CPU, memory, disk, and all the metrics, and then the nodes, or the EC2 instances or virtual machines running behind them, also having their own unique metrics. So, within the container world, these things are just a click of a button. Everything happens at the same time as a single entity, but within the EC2 ecosystem you still find this, although the setup process has become a bit easier and much faster. In the container world, that problem has totally been eliminated.

Corey: When you take a look at someone who's just starting to get a glimmer of awareness around what CloudWatch is and how to contextualize it, what are the most common mistakes people make early on?

Ewere: I also talked about this in my book, and one of the mistakes people make with CloudWatch, and with monitoring in general, is not asking: "What am I trying to figure out?" [laugh].
If you don't have that answer clearly stated, you're going to run into a lot of problems. You need to answer the question, "What am I trying to figure out?" I mean, monitoring is so broad, monitoring is so large, that if you do not have the answer to that question, you're going to get yourself into a lot of trouble and a lot of confusion. Like I said, if you don't understand what you're trying to figure out in the first place, then you're going to get a lot of data, a lot of information, and that can get you confused. And I also talked about what I call alarm fatigue, or incident fatigue. This happens when you configure so many alarms and so many metrics that you're getting a lot of alarms hitting your notification services, whether it's Slack, whether it's email, and it causes fatigue. What happens here is the person who should know what is going on with the system gets a ton of messages, and in that scenario can miss something very important, because there are so many messages coming in, so many integrations coming in. So, you should be able to optimize appropriately, to be able to, like you said, conceptualize what you're trying to figure out, what problems you're trying to solve. Most times you don't really figure this out at the start, but there are certain bare minimums you need to know about, and that's part of what I talked about in the book. One of the things I highlighted in the book, when I talked about monitoring different layers, is that when you're talking about monitoring infrastructure, say compute services such as virtual machines or EC2 instances, there are certain baseline metrics you need to take note of that are core to the reliability, the scalability, and the efficiency of your system. And if you focus on these things, you have a baseline starting point before you start going deeper into things like observability and knowing what's going on entirely with your system.
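The alarm fatigue Ewere describes, too many raw alarms hitting Slack or email, is often handled by collapsing repeats before they reach a human. Here's a minimal sketch of that idea, independent of any particular monitoring product; the event tuples, the five-minute window, and the function name are all illustrative:

```python
def dedupe_alarms(events, window_seconds=300):
    # Collapse repeated alarm events into one notification per
    # (resource, alarm) pair within a time window: a simple way to cut
    # the flood of messages that causes alarm fatigue.
    last_sent = {}       # (resource, alarm) -> time of last notification
    notifications = []
    for ts, resource, alarm in sorted(events):
        key = (resource, alarm)
        if key not in last_sent or ts - last_sent[key] >= window_seconds:
            notifications.append((ts, resource, alarm))
            last_sent[key] = ts
    return notifications

events = [
    (0, "web-1", "HighCPU"),
    (60, "web-1", "HighCPU"),   # same alarm within the window: suppressed
    (120, "web-2", "HighCPU"),  # different resource: passes through
    (400, "web-1", "HighCPU"),  # window elapsed: notified again
]
```

Managed tools offer richer versions of this (composite alarms, grouping, routing), but the principle is the same: fewer, better-targeted messages so the important one isn't missed.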
So, a baseline understanding, baseline metrics, and a baseline of what you need to check for the different kinds of services you're trying to monitor: that is your starting point. And the mistake people make is that they don't have a baseline. They just install a monitoring tool, configure CloudWatch, and they don't know the problem they're trying to solve, [laugh] and that can lead to a lot of confusion.

Corey: So, what inspired you from, I guess, kicking the tires on CloudWatch, the way that we all do, and being frustrated and confused by it, all the way to the other side of writing a book on it? What was it that got you to that point? Were you an expert on CloudWatch before you started writing the book, or was it, "Well, by the time this book is done, I will certainly know [laugh] more about the service than I did when I started"?

Ewere: Yeah, I think it's a double-edged sword. [laugh]. So, it's a combination of the things you just said. First of all, I have experience with other monitoring tools, and I have a love for the reliability and scalability of a system. I started with Kubernetes in some of its early days, when it was very difficult to deploy and very difficult to set up, because I'm looking at how I can make systems a little bit more efficient, a little bit more reliable, rather than having to handle a lot of things like auto-scaling and having to go through the process of understanding how to scale. I mean, that's a school of its own that you need to prepare yourself for. So, first of all, I have a love for making sure systems are reliable and efficient, and second of all, I also want to make sure that I know what is going on with my system at all times, as much as possible. The level of visibility into a system gives you the level of control and understanding of what your system is doing at any given time.
So, those two things are very core to me. And then thirdly, I had planned a streak of books I want to write on AWS, and monitoring is something that is just new. I mean, if you go to the Packt website, this is the first book on infrastructure monitoring on AWS with CloudWatch; it's not a very common topic to talk about. And I have other topics in my head—I really want to talk about things like networking, and other topics you need to go deep into to be able to appreciate—because in this book, in every chapter, I created a scenario of what a real-life monitoring setup, or what you need to do, looks like. So, since I had those premonitions, when it came time to share with the world what I know and what I've learned about monitoring, I took a [unintelligible 00:12:26]. It was an opportunity for me to start telling the world about the things I had learned, and I also learned while writing the book, because there are certain topics in it that I'm not so much of an expert in, like big data and all that.

I had to learn; I had to take time to do more research and build more understanding. So, I use CloudWatch, I'm kind of good at CloudWatch, and I also had to do more learning to be able to disseminate this information—and hopefully to x-ray some parts of monitoring, and the different services that people don't really pay much attention to.

Corey: What do you find is still the most, I guess, confusing to you as you look across the ecosystem of the entire CloudWatch space? Every time I play with it, I take a look and I get lost in, “Oh, they have contributor analyses, and logs, and metrics.” It's confusing, and every time I wind up, I guess, spiraling out of control.
What do you find that, after all of this, is a lot easier for you, and what do you find that's a lot more understandable?

Ewere: I'm still going to go back to the containers part. I'm sorry, I'm in love with containers. [laugh].

Corey: No, no, it's fair. Containers are very popular. Everyone loves them. I'm just basically anti-container based upon no better reason than I'm just stubborn and bloody-minded most of the time.

Ewere: [laugh]. So, pretty much like I said, I have experience with other monitoring tools. Trust me, if you want to configure proper container monitoring with other tools, it's going to take you at least a week or two to get it right—from the dashboards, to the logging configuration, to the piping of the data into the proper storage engine. These are things I talk about in the book, because I take monitoring from the ground up; if you've never done monitoring before, when you pick up my book you will understand the basic principles of monitoring.

And [funny 00:14:15], you know, monitoring has a big data process in it, like an ETL process: extraction, transformation, and loading of data into an analytics system. So, first of all, you have to battle that. You have to think about the availability of your storage engine. What are you using—Elasticsearch? InfluxDB? Where do you want to store your data? Then you have to answer the question of how to visualize the data. What kind of dashboards do I want to use? What forms of representation do I need so that the data makes sense to whoever I'm sharing it with? Because in monitoring, you definitely have to share data, either with yourself or with someone else, so the way you present it needs to make sense. I've seen graphs that do not make sense. It requires some level of skill. Like I said, I've [unintelligible 00:15:01] where I spent a week or two having to set up dashboards.
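The ETL framing of monitoring above—extract raw datapoints, transform them into aggregates, load them into a storage engine such as Elasticsearch or InfluxDB—can be sketched as a toy pipeline. The in-memory dict used as the "store" and the sample datapoints are illustrative stand-ins, not any real agent or engine:

```python
# Toy version of the extract → transform → load loop that monitoring shares
# with big data pipelines. The in-memory "store" and the sample datapoints
# stand in for a real storage engine such as Elasticsearch or InfluxDB.

from collections import defaultdict

def extract(source):
    # In a real pipeline this would poll an agent, a log file, or an API.
    return list(source)

def transform(datapoints):
    # Aggregate raw (metric, value) pairs into per-metric averages.
    totals = defaultdict(lambda: [0.0, 0])
    for metric, value in datapoints:
        totals[metric][0] += value
        totals[metric][1] += 1
    return {metric: s / n for metric, (s, n) in totals.items()}

def load(store, aggregates):
    # "Write" the aggregates into the analytics store.
    store.update(aggregates)
    return store

raw = [("cpu", 40.0), ("cpu", 60.0), ("memory", 70.0)]
store = load({}, transform(extract(raw)))
print(store)  # → {'cpu': 50.0, 'memory': 70.0}
```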
And then after setting up the dashboards, someone was like, “I don't understand; we just need, like, two.” And I'm like, “Really?” [laugh]. You know? Because you spent so much time. And secondly, you discover that repeatability of that process is a problem, because some of these tools are click-and-drag, and some of them don't have JSON configuration—some do, some don't. So, you discover that scalability of this kind of system becomes a problem. You can't repeat the dashboards: if you make a change to the system, you need to go back to your dashboard and make changes, you need to update your logging too, you need to make changes across the layers. All of that is a lot of overhead [laugh] that you can cut out when you use things like Container Insights, which is a feature of CloudWatch. So, for me, that's a part you can really squeeze a lot of juice out of in a very short time, quickly and very efficiently.

On the flip side, when you talk about monitoring for big data services, and a little bit for serverless, there might be some steepness in the learning curve, because if you do not have a good foundation in serverless, when you get into [laugh] Lambda Insights in CloudWatch, trust me, you're going to be put off; you're going to get a little bit confused. And there's also the multi-function Insights view at the moment. So, you need a very solid foundation in some of those topics before you can get in there and understand some of the data and the metrics that CloudWatch is presenting to you. And then lastly, for things like big data too, monitoring is still being properly fleshed out.
I think that in the coming months and years, it will become more complete and more presentable than it is at the moment.

Corey: This episode is sponsored by our friends at Oracle. HeatWave is a new high-performance accelerator for the Oracle MySQL Database Service, although I insist on calling it “my squirrel.” While MySQL has long been the world's most popular open source database, shifting from transacting to analytics required way too much overhead and, you know, work. With HeatWave you can run your OLTP and OLAP—don't ask me to ever say those acronyms again—workloads directly from your MySQL database and eliminate the time-consuming data movement and integration work, while also performing 1100X faster than Amazon Aurora and 2.5X faster than Amazon Redshift, at a third of the cost. My thanks again to Oracle Cloud for sponsoring this ridiculous nonsense.

Corey: The problem I've always had with dashboards is that managers always seem to want them—“More dashboards, more dashboards”—then you check the usage statistics of who's actually been viewing the dashboards, and the answer is no one since you demoed them to the execs eight months ago. But they always claim to want more. How do you square that? I guess, slicing between what people ask for and what they actually use.

Ewere: [laugh]. So yeah, one of the interesting things about dashboards, especially for infrastructure monitoring, is that the dashboard people really want is a revenue dashboard. Trust me, that's what they want to see; they want to see the money going up, up, up, [laugh] you know? So, when it comes to—

Corey: Oh, yes. Up and to the right, then everyone's happy. But CloudWatch tends to give you just very, very granular, low-level metrics of things—it's hard to turn that into something executives care about.

Ewere: Yeah, what people really care about.
But my own take is that the dashboards are actually for you and your team to watch, to know what's going on from time to time. What is key is setting up events on very specific and sensitive data. For example, when some kind of sensitive data is flowing across your system and you need to keep an eye on it, you tie a metric to that, and in turn an alarm to it. That is actually the most important thing for anybody. The dashboards, like I said, are for you and your team's personal consumption: “Oh, I can see all the RDS connections are getting too high; we need to upgrade.” Or, “We can see there was a memory spike in the last two hours.” That's for you and your team to consume, not the executive team.

What is really useful is being able to aggregate data that you can share. I think that is what the executive team would love to see. When you go back to the core principles of DevOps, in terms of the DevOps Handbook, you see things like mean time to recover, change failure rate, and all that. The interesting thing is that all these metrics can only be measured through monitoring. You cannot know your change failure rate if you don't have a monitoring system that tells you when there was a failure. You cannot know your release frequency if you don't have a metric that counts your deployments and records them in a particular aggregation system.

So, we find that the four major things you measure in DevOps are all tied back to monitoring and metrics, at minimum, to understand your system from time to time. What the executive team actually needs is a summary of what's going on. And one of the things I usually do for almost any company I work for is share some kind of uptime view with them. And that's where CloudWatch Synthetics canaries come in.
So, a Synthetics canary is a service that checks the uptime of your system. It's a very simple service—it does a ping and gets feedback—but it is so efficient and so powerful. How is it powerful? It pings a system and gets a response, and if the status code of your service is not in the 200 or 300 range, it considers that downtime.

Now, when you aggregate this data over a period of time, say a month or two, you can actually use it to calculate the uptime of your system. And that uptime [unintelligible 00:19:50] is something you can share with your customers and say, “Okay, we have an SLA of 99.9%,” or, “We have an SLA of 99.8%.” That should not be doctored data; it should not be numbers you just cook up out of your head; it should be based on a system you have used, worked with, and monitored over a period of time, so that the information you share with your customers is genuine and truthful, and something they can also see for themselves.

Hence, companies are using [unintelligible 00:20:19] like a status page to show what's going on from time to time, whenever there is an incident, and report back to their customers. These are things executives will be more interested in than just dashboards, [laugh] dashboards, and more dashboards. So, it's not so much about what they ask for, but what you know they are going to draw value from. An executive in a meeting with a client says, “Hey, we've got a system with 99.9% uptime,” opens the uptime view, and says, “You see our uptime? For the past three months, this has been our metric.” Boom. [snaps fingers]. That's it. That's value, instantly. I'm not showing [laugh] the client a bunch of graphs, you know?
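The uptime arithmetic behind the SLA figure Ewere describes—each canary run counts as "up" when its status code is in the 2xx/3xx range, and uptime is the share of up runs over the period—can be sketched in a few lines. The sample status codes are invented for illustration:

```python
# Minimal sketch of the uptime arithmetic behind an SLA figure. Each canary
# run records an HTTP status code; a run counts as "up" when the code is in
# the 2xx/3xx range, and uptime is the percentage of up runs over the period.
# The sample status codes below are invented for illustration.

def uptime_percent(status_codes):
    """Percentage of checks whose status code is in the 200-399 range."""
    if not status_codes:
        return 0.0
    up = sum(1 for code in status_codes if 200 <= code < 400)
    return 100.0 * up / len(status_codes)

# One failed check out of 1,000 five-minute runs ≈ 99.9% uptime.
checks = [200] * 999 + [503]
print(f"{uptime_percent(checks):.1f}%")  # → 99.9%
```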
“Can you explain the memory metric?” That's not going to get the message across.

Corey: Since your book came out—I believe, if not, certainly by the time it was finished being written and in the review phase—they came out with Managed Prometheus and Managed Grafana. It looks almost like they're trying to build a completely separate standalone monitoring stack of AWS tooling. Is that a misunderstanding of what the tools look like, or is there something to that?

Ewere: Yeah. So, when those were announced at re:Invent, I was like, “Oh, snap.” I almost told my publisher, “You know what? We need to add three more chapters.” [laugh]. But unfortunately, we were still in review, and the services were in preview.

As a Hero, I kind of have some privilege to be able to request access to that, but I thought, okay, it's going to change the narrative of what the book is talking about. I decided to pause on that and make sure the book finished with the [unintelligible 00:21:52], and then in a second edition I can always add it. But hey, I think there's a galvanization happening between Prometheus, Grafana, and what CloudWatch stands for. At the moment it's in preview, not fully GA, but you can actually use it. So, if you go to Container Insights, you can see how Prometheus and Grafana present the data.

So, it's more or less a different view of what you're trying to see. It's trying to give you another perspective on how your data is presented. You're going to have CloudWatch: CloudWatch dashboards, CloudWatch metrics. But these different tools—Prometheus, Grafana, and all that—all have their unique ways of presenting the data.
And part of the reason I believe AWS has Prometheus and Grafana there is that Prometheus is a huge cloud-native, open-source monitoring, presentation, and analytics tool; it packs a lot of heat, and a lot of people are used to it. Everybody's like, “Why can't I have Prometheus in CloudWatch?”

So, instead of CloudWatch just being a simple monitoring tool, [unintelligible 00:22:54] CloudWatch has become an ecosystem of monitoring tools. We're not going to see it as just [unintelligible 00:23:00] logs, analytics, metrics, and dashboards, no. We're going to see it as an ecosystem where we can plug in other services that integrate and work together to give us better performance options, and also different perspectives on the data being collected.

Corey: What do you think is next, as you look across the ecosystem, in how people think about monitoring and observability in a cloud context? What are they missing? Where does the next evolution lead?

Ewere: Yeah, I think the biggest problem with monitoring—which is part of the introduction of the book, where I talk about the basic types of monitoring, proactive and reactive—is how do we know before things happen? [laugh]. And one of the things that can help with that is machine learning. There is a small ecosystem, not so popular at the moment, around how we can apply machine learning in DevOps, monitoring, and observability. That means looking at historic data and being able to predict, on a basic level.

Looking at history, [then are 00:24:06] being able to predict. At the moment, there are very few tools that have models running on the back of the data being collected for monitoring and metrics, and that could actually revolutionize monitoring and observability as we see it right now. I mean, even the topic of observability is still new at the moment. It's still being integrated.
Observability came into the cloud only, I think, like, two years ago, so it's still maturing.

But one thing that has been missing is seeing the value AI can bring to monitoring. A system like this might [unintelligible 00:24:40] practically tell us, “Hey, by 9 p.m. I'm going to go down. I think your CPU or memory is going down. I think line 14 of your code [laugh] is the problem causing the bug. Please fix it by 2 p.m. so that by 6 p.m. things can run perfectly.” That is going to revolutionize monitoring. That's going to revolutionize observability and bring a whole new level to how we understand and monitor our systems.

Corey: I hope you're right. If you take a look right now at, I guess, the schism between monitoring and observability—which I consider to be hipster monitoring, but they get mad when I say that—is there a difference? Is it just new phrasing for the same concepts, or is there something really new here?

Ewere: In my book, I said monitoring is looking at it from the outside in; observability is looking at it from the inside out. So, what monitoring does not see underneath, observability sees. They are children of the same mom—that's how I put it. One needs the other, and the two cannot be separated.

What we've been doing is understanding the system from the surface. When there's an issue, we go to the aggregated results that come out of it. A very basic example: you have a Java application, and we all know Java is very memory-intensive, on a very basic level. And there's a memory issue. Most times, the infrastructure is the first thing hit by the result of that.

But the problem is not the infrastructure; it's maybe the code.
Maybe garbage collection was not well managed; maybe there are a lot of unused variables in the code that are just filling up memory; maybe there's a loop that's not properly managed and optimized; maybe there are resources or objects that were initialized but never closed, and that will blow up the heap. Those are the things observability can help you track; those are the things it can help you see, because observability runs from within the system and sends metrics out, while basic monitoring is about understanding what's going on at the surface of the system: memory, CPU, pushing out logs to know what's going on, and all that.

So, on a basic level, observability gives you a deeper insight into what monitoring is actually telling you—monitoring is just the result of what happened. I mean, we are told that the symptoms of COVID are coughing, sneezing, and all that. That's monitoring. [laugh].

But before we know that you actually have COVID, we need to go for a test, and that's observability: telling us what is causing the sneezing, the coughing, the nausea—all the symptoms that monitoring reports. Monitoring says, “You have a cough, you have a runny nose, you're sneezing.” Observability says, “There is a COVID virus in the bloodstream. We need to fix it.” That's how the two of them work.

Corey: I think that is probably the most concise and clear definition I've ever gotten on the topic. If people want to learn more about what you're up to and how you think about these things—and of course, if they want to buy your book; we will include a link to that in the [show notes 00:27:40]—where can they find you?

Ewere: I'm on LinkedIn; I'm very active on LinkedIn, and I also shared the LinkedIn link. I'm very active on Twitter, too.
I tweet once in a while, but definitely, when you send me a message on Twitter, I'm going to be very responsive.

I also write blogs on Medium—I've written a couple of blogs there, and that was part of why AWS recognized me as a Hero: I talk a lot about different services, and I help compare services so you can choose better. I also cover basic concepts; if you just want to get your feet wet with something and you need it summarized—not AWS documentation per se, but something you can just look at and know what you need to do with the service—I talk about that in my blogs, too. So yeah, those are the two basic places to find me: LinkedIn and Twitter.

Corey: And we will, of course, put links to those in the [show notes 00:28:27]. Thank you so much for taking the time to speak with me. I appreciate it.

Ewere: Thanks a lot.

Corey: Ewere Diagboya, head of cloud at My Cloud Series. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment telling me how many more dashboards you would like me to build that you will never look at.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Les Grandes Gueules
The French now eat more mozzarella than camembert - 14/10

Les Grandes Gueules

Play Episode Listen Later Oct 14, 2021 21:14


With: David Dickens, marketing director; Johnny Blanc, cheesemonger; and Léa Falco, student. Alain Marschall and Olivier Truchot host a three-hour show with their guests, where current affairs meet free speech, on RMC, the opinion radio station. "Les Grandes Gueules", hosted by Alain Marschall and Olivier Truchot, are back for an 18th season! Farmer, cheesemonger, lawyer, teacher… the 14 GG, drawn from civil society, are never afraid to defend their ideas. Between lively debates, clashes, and bursts of laughter, these three hours of talk show reflect the real concerns of the French. This year, Fred Hermel joins the GG with an opinion segment: "C'est ça la France". Every morning from 6 a.m., listen to a radio/TV show unique in France: for three hours, the RMC team shares the news closest to the daily lives of the French, a program mixing live news, debates, reactions, and expert contributions, simulcast from 6 to 8:30 a.m. on RMC Découverte. RMC is a general-interest radio station, focused mainly on news and interactivity with listeners, in an all-talk format unique in France. RMC's schedule is built around flagship programs such as Apolline Matin (6-9 a.m.), Les Grandes Gueules (9 a.m.-12 p.m.), and Estelle Midi (12-3 p.m.).

Les Grandes Gueules
Les Grandes Gueules of October 14: David Dickens, Johnny Blanc, and Léa Falco - 10-11 a.m.

Les Grandes Gueules

Play Episode Listen Later Oct 14, 2021 41:47



Les Grandes Gueules
GG 2022: "You can't be president without loving trees," Michel Barnier - 14/10

Les Grandes Gueules

Play Episode Listen Later Oct 14, 2021 13:51


Every day, our Grandes Gueules debrief the campaign-trail soundbites in "GG 2022". With: David Dickens, marketing director; Johnny Blanc, cheesemonger; and Léa Falco, student.

Les Grandes Gueules
Cachan: a local official accuses the police of shooting at the population - 14/10

Les Grandes Gueules

Play Episode Listen Later Oct 14, 2021 24:16



Les Grandes Gueules
Les Grandes Gueules of October 14: David Dickens, Johnny Blanc, and Léa Falco - 9-10 a.m.

Les Grandes Gueules

Play Episode Listen Later Oct 14, 2021 42:21



Les Grandes Gueules
Le monde de Macron: Ian Brossat attacks Rachida Dati - 14/10

Les Grandes Gueules

Play Episode Listen Later Oct 14, 2021 12:47



Les Grandes Gueules
Rising fuel prices: a possible return of the yellow vests? - 14/10

Les Grandes Gueules

Play Episode Listen Later Oct 14, 2021 25:44



Streaming Audio: a Confluent podcast about Apache Kafka
Powering Event-Driven Architectures on Microsoft Azure with Confluent

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Oct 14, 2021 38:42


When you order a pizza, what if you knew every step of the process, from the moment it goes in the oven to the moment it's delivered to your doorstep? Event-driven architecture is a modern, data-driven approach built around "events" (i.e., something that just happened), and a real-time data infrastructure enables you to deliver such event-driven insights in real time. Israel Ekpo (Principal Cloud Solutions Architect, Microsoft Global Partner Solutions, Microsoft) and Alicia Moniz (Cloud Partner Solutions Architect, Confluent) discuss use cases for leveraging Confluent Cloud and Microsoft Azure to power real-time, event-driven architectures.

As an Apache Kafka® community stalwart, Israel focuses on helping customers and independent software vendor (ISV) partners build solutions for the cloud and use open source databases and architecture solutions like Kafka, Kubernetes, Apache Flink, MySQL, and PostgreSQL on Microsoft Azure. He has worked with retailers and companies in the IoT space to help them adopt processes for inventory management with Confluent; a cloud-native, real-time architecture that keeps an accurate record of supply and demand is key to staying on top of inventory and customer satisfaction. Israel has also worked with customers who use Confluent to integrate with Cosmos DB, Microsoft SQL Server, Azure Cognitive Search, and other services within the Azure ecosystem. Another important use case is enabling real-time data accessibility in the public sector and healthcare while ensuring data security and regulatory compliance, such as HIPAA.

Alicia has a background in AI, and she stresses the importance of moving away from the monolithic, centralized data warehouse to a more flexible and scalable architecture like Kafka. Building a data pipeline on Kafka helps ensure data security and consistency with minimized risk. The Confluent and Azure integration enables quick Kafka deployment with out-of-the-box solutions within the Kafka ecosystem.
Confluent Schema Registry captures event streams with a consistent data structure, ksqlDB enables the development of real-time ETL pipelines, and Kafka Connect enables the streaming of data to multiple Azure services.EPISODE LINKSMicrosoft Azure at Kafka Summit AmericasIzzyAcademy Kafka on Azure Learning Series by Alicia MonizWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
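The event-driven pattern described above can be sketched in a few lines: producers publish events to named topics, and subscribers react as each event arrives. This is a conceptual toy only, not Kafka or Confluent Cloud code — a real deployment would use a Kafka client against a broker, and the pizza-tracking topic and fields here are hypothetical, inspired by the episode summary.

```python
from collections import defaultdict

class EventBus:
    """In-memory stand-in for a topic-based event broker."""

    def __init__(self):
        # topic name -> list of handler callables
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a handler to be called for every event on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        """Deliver `event` to every handler subscribed to `topic`."""
        for handler in self._subscribers[topic]:
            handler(event)

# Hypothetical pizza-tracking example: each stage of the order emits an event,
# and a subscriber records the stages as they happen (the "real-time insight").
bus = EventBus()
status_log = []
bus.subscribe("pizza.status", lambda e: status_log.append(e["stage"]))

for stage in ["in_oven", "out_for_delivery", "delivered"]:
    bus.publish("pizza.status", {"order_id": 42, "stage": stage})

print(status_log)  # ['in_oven', 'out_for_delivery', 'delivered']
```

The point of the pattern is the decoupling: the code emitting pizza events knows nothing about who consumes them, which is what lets systems like Kafka fan the same stream out to ETL pipelines, search indexes, and dashboards.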

Raw Data By P3
Imke Feldmann

Raw Data By P3

Play Episode Listen Later Oct 13, 2021 75:37


Imke Feldmann is among the first few to have recognized the incredible value and potential of this thing called Power Pivot in Excel (which was the precursor to Power BI).  And did she ever run with it, launching quite the successful solo consultancy and training service!  She exemplifies the helpful nature of the data community through her blog, The BIccountant, where she shares her amazing Microsoft BI tool knowledge. Her background is in Finance and Accounting, but you'll quickly realize she knows a great deal more than just Finance and Accounting!

Contact Imke:
The BIccountant
Imke's Twitter

References in this Episode:
Imke's Github
MS Power BI Idea - Customizable Ribbon - Please Upvote :)
MS Power BI Idea - Speed Up PQ By Breaking Refresh Chain - Please Upvote :)

Episode Timeline:
3:00 - The value of outsourcing certain business functions, Imke's path to Power BI starts with Rob's blog, a multi-dimensional cube discussion breaks out!
19:45 - One of Power BI's strengths is collaboration, Imke LOVES her some Power Query and M and loves DAX not so much
33:45 - Imke has a BRILLIANT idea about how to improve Power Query and some other improvements that we'd like to see in PQ
52:30 - Rob's VS code experience, how COVID has affected the consulting business, Staying solo vs growing a company and how Imke determines which clients she takes on

Episode Transcript:
Rob Collie (00:00:00): Hello friends. Today's guest is Imke Feldmann. We've been working for a long time, nearly a year, to arrange the schedules to get her on the show, and I'm so glad that we finally managed to do it. For a moment, imagine that it's 2010, 2011, that era. During that timeframe, I felt not quite alone, but a member of a very slowly growing and small community of people who had glimpsed what Power Pivot could do. And for those of you who don't know what Power Pivot is, that was the version of Power BI, the first version that was embedded only in Excel.
And at the time, the way the community grew, we'll use a metaphor for this. Imagine that the community was a map of the world and the map is all dark, but slowly, you'd see these little dim lights lighting up like one over here in the UK, one in the Southwest corner of the United States, very faintly. Rob Collie (00:00:51): And these would be people who were just becoming aware of this thing, this Power Pivot thing, and you'd watch them. They'd sort of show up on the radar, very tentatively at first kind of dipping their toe, and then that light would get brighter, and brighter, and brighter over time, as they really leaned in, and they learned more and more, and they became more adept at it. And this was the way things went for a long time. And then in 2011, out of nowhere in Germany on the map, this light comes on at full intensity, brightly declaring itself as super talented and powerful. And that was what it felt like to come across Imke Feldmann. Rob Collie (00:01:27): Like all of our guests, there's a little bit of that accidental path in her career, but also a tremendous sense of being deliberate. When this stuff crossed her radar, she appreciated it immediately. And I didn't know this until this conversation, but she quit her corporate job in 2013, the same year that I founded P3 as a real company, and became a freelancer. So for eight plus years, she has been a full time Power BI professional. There truly aren't that many people who can say that in the world. Our conversation predictably wandered. At one point, we got pretty deep into the notion of M and Power Query and it's screaming need for more buttons on its ribbon. And Imke has some fantastic ideas on how they should be addressing that. Rob Collie (00:02:14): We also, of course, naturally talked about the differences between remaining a solo freelancer as she has, in contrast to the path that I chose, which is scaling up a consulting practice business. 
Along the way we reprised the old and completely pointless debate of DAX versus M, I even try to get Tom hooked on M as his new obsession. We'll see how well that goes. Most importantly though, it was just a tremendous pleasure to finally get to talk to Imke at length for the first time after all these years, we literally crossed paths 10 years ago. So it was a conversation 10 years in the making compressed down to an hour and change. I hope you enjoy it as much as we did, so let's get into it. Announcer (00:02:56): Ladies and gentlemen, may I have your attention, please? Announcer (00:03:00): This is The Raw Data by P3 Adaptive podcast, with your host Rob Collie, and your cohost Thomas LaRock. Find out what the experts at P3 Adaptive can do for your business. Just go to P3adaptive.com. Raw Data by P3 Adaptive is data with the human element. Rob Collie (00:03:24): Welcome to the show Imke Feldmann. How are you today? Imke Feldmann (00:03:27): Thank you, Rob. Great. It's a great day here over in Germany. Rob Collie (00:03:30): We have been talking about doing this for the better part of a year. So I'm glad that we're landing the guest, Imke is here. I really appreciate you doing this. So why don't we start with the basics. What are you up to these days? What do you do for a living? Imke Feldmann (00:03:48): I help people building great Power BI solutions these days. Rob Collie (00:03:55): Ah, yes. Imke Feldmann (00:03:55): That's how I fill my days. Rob Collie (00:03:58): I hear that that's a good business. Imke Feldmann (00:03:58): Yeah, it is. Rob Collie (00:04:03): So, and your website is? Imke Feldmann (00:04:06): Thebiaccountant.com. Rob Collie (00:04:07): Is that what you are on Twitter as well? Imke Feldmann (00:04:08): Yes. That's also my Twitter handle theBIccountant without an A in the middle. I just replaced the A from accountant with a BI. Rob Collie (00:04:17): There you go. Imke Feldmann (00:04:18): Yeah. Rob Collie (00:04:18): That's right.
So that means that I'm going to make a tremendous leap here, wait till you see these powers of observation and deduction. You must have an accounting background? Imke Feldmann (00:04:29): I do, yes. Rob Collie (00:04:30): See, you look at that. That's why I make the money. Okay, let's start there, was accounting your first career out of school? Imke Feldmann (00:04:39): Yes. I went to university and studied some economics or business stuff there, I don't know how it's translated into English. And then I worked as a business controller. After that, I took over a job to lead a bookkeeping department, to work with an area where the numbers came from basically. And then after that, I worked as the finance director, where I was responsible for a whole bunch of areas: controlling, bookkeeping, IT, HR, and production. So that was quite a job with a broad range of responsibilities. Rob Collie (00:05:18): So you mentioned, kind of slipped IT into that list, right? Imke Feldmann (00:05:23): Yeah. Rob Collie (00:05:23): There's all these things in that list of responsibilities that all seem like they belong together, right? Bookkeeping, accounting, controlling or finance, IT. We've run into this before, with actually a number of people, that a lot of times the accounting or finance function in a company kind of wins the job of IT by default. Imke Feldmann (00:05:45): Yeah. It seems quite common in Germany, at least I would say. Rob Collie (00:05:48): I get multiple examples, but one that I can absolutely point to is Trevor Hardy from the Canadian Football League, he is in accounting, accounting and finance. And just by default, well, that's close to computers. Imke Feldmann (00:06:00): Yes. Rob Collie (00:06:01): And so it just kind of pulls the IT function in. Now is that true at really large organizations in Germany or is it a mid market thing? Imke Feldmann (00:06:09): No I would say a mid market thing. Rob Collie (00:06:12): That's true here too.
So when there isn't an IT org yet, oftentimes it falls to the finance and accounting function. Hey, that's familiar. It's kind of funny when you think about it, but it's familiar. And isn't finance itself pretty different from accounting? How much of a leap is that? What was that transition like for you taking over the finance function as well? We tend to talk about these things, at least in the US, as almost like completely separate functions at times. Imke Feldmann (00:06:43): It depends, but at least it had something to do with my former education, which wasn't the case with IT. So, I mean, of course on a certain management level, you are responsible for things that you're not necessarily familiar with in detail. You just have to manage the people that know the details and do the jobs for you. So that was not too big an issue I must admit. Rob Collie (00:07:10): My first job out of school was at Microsoft, and at an organization of that size, I was hyper specialized in terms of what I did. At this company, at P3, we are nowhere near that scale, and there's a lot more of that multiple hat wearing. I've definitely been getting used to that over the last decade, the first decade plus of my career, not so much. Imke Feldmann (00:07:31): Yeah. That's interesting because I basically went completely the other way around. I see myself now as working as a technical specialist and as a freelancer, I don't have to manage any employees anymore. Rob Collie (00:07:47): Well, so now you wear all the hats? Imke Feldmann (00:07:49): Yes. In a certain way, yes. Rob Collie (00:07:51): Okay. There's no HR department necessarily, right, so it's just you. But marketing, sales, delivery, everything. Imke Feldmann (00:08:01): Yep, that's true. Yep. And when I first started, I tried to do everything by myself, but that changed as well. So over time I started to outsource more things, but to external companies, not internal staff.
Rob Collie (00:08:17): So you're talking about outsourcing certain functions in your current business, is that correct? Imke Feldmann (00:08:22): Yes, yes. Rob Collie (00:08:22): So it's interesting, right? Even that comes with tremendous risk when you delegate a certain function to an outside party whose incentives and interests are never going to be 100% aligned with yours. Even we have been taken for a ride multiple times by third-party consulting firms that we've hired to perform certain functions for us. Imke Feldmann (00:08:46): Oh, no, I don't outsource any services that I directly provide to my clients. Rob Collie (00:08:49): Oh, no, no. Imke Feldmann (00:08:50): No. Rob Collie (00:08:50): No, we don't either. But I'm saying for example, our Salesforce implementation for instance- Imke Feldmann (00:08:56): Okay, mm-hmm (affirmative). Rob Collie (00:08:57): ... Has been a tremendous money sink for us over the years. Where we're at is good, but the ROI on that spend has been pretty poor. It's really easy to throw a bunch of money at that and it just grinds and grinds and grinds. And so this contrast that I'm getting around to is really important because that's not what it's like to be a good Power BI consultant, right? You're not that kind of risk for your clients. But if you go out and hire out some sort of IT related services for example, like Salesforce development, we're exposed to that same sort of drag you out into the deep water and drown you business model, that's not how we operate. I'm pretty sure that's not how you operate either. And so anyway, when you start talking about outsourcing, I just thought, oh, we should probably talk about that. Have you outsourced anything for your own sort of back office? Imke Feldmann (00:09:52): Back office stuff, yeah. My blog, WordPress stuff, or computer stuff in the background.
So security [inaudible 00:09:59] the stuff and things like that, things that are not my core, I hire consultants to help me out with things that I would formerly Google, spend hours Googling with. Rob Collie (00:10:09): Yes. Imke Feldmann (00:10:10): Now I just hire consultants to do that. Or for example, for Power Automate, this is something that I wanted to learn and I saw the big potential for clients. And there I also did private training basically, or coaching, or however you call it, hired specialists. Rob Collie (00:10:27): To kind of get you going? Imke Feldmann (00:10:29): Exactly, exactly. Rob Collie (00:10:30): And those things that you've outsourced for your back office, have there been any that felt like what I described, you end up deep in the spend and deep in the project going, "What's going on here?" Imke Feldmann (00:10:41): I'm usually looking for freelancers on that. And I made quite good experiences with it, I must say. Rob Collie (00:10:49): Well done. Well done. All right. So let's rewind a bit, we'll get to the point where you're in charge of the finance department, which of course includes IT. Imke Feldmann (00:10:58): Not necessarily so. I felt quite sad for the guys who I had to manage because I said, "Well, I'm really sorry, but you will hear a lot of questions from me, especially at the beginning of our journey," because I had to learn so much in order to be a good manager for them. So that was quite a different situation compared to the management roles in finance that I had before, because there I had the impression that I knew something, but IT was basically blank. Rob Collie (00:11:30): I would imagine that that experience turned out to be very important, the good cross pollination, the exposure to the IT function and sort of like seeing it from their side of the table, how valuable has that turned out to be for your career?
Imke Feldmann (00:11:45): I think it was a good learning and really interesting experience for me just to feel comfortable with saying that I have no clue and ask the people how things work and just feel relaxed about not being the expert in a certain area and just be open to ask, to get a general understanding of things. Rob Collie (00:12:09): That's definitely the way to do it, is to be honest and transparent and ask all the questions you need to ask. It's easier said than done. I think a lot of people feel the need to bluff in those sorts of situations. And that usually comes back to haunt them, not always. Imke Feldmann (00:12:25): No, that's true. Rob Collie (00:12:27): Some people do get away with it, which is a little sad. So at what point did you discover Power BI? Imke Feldmann (00:12:35): I didn't discover Power BI, I discovered Power Pivot, from your blog of course. Rob Collie (00:12:41): Oh, really? Imke Feldmann (00:12:43): Yes, yes, yes, yes. I think it was in, must be 2011, something like that. Rob Collie (00:12:50): Early, yeah. Imke Feldmann (00:12:51): Yeah. Quite early. When I was building a multidimensional cube with a freelancer for our finance department, then I was just searching a bit what is possible, how we should approach this and things like that. So we started with a multi-dimensional cube because that was something where I could find literature about and also find experts who could help me build that. But when doing so, I really liked the whole experience and it was a really excellent project that I liked very much. And so I just searched around in the internet and tried to find out what's going on in that area. And this is where I discovered your blog. Rob Collie (00:13:35): I had no idea. First of all, I had no idea that my old blog was where you first crossed paths with this. Imke Feldmann (00:13:42): I think [inaudible 00:13:43]. Rob Collie (00:13:44): And secondly, I had no idea that it was that early.
I mean, I remember when you showed up on the radar, Scott [inaudible 00:13:51] had discovered your blog and said, "Hey, Rob, have you seen this? Have you seen what she is doing? She is amazing." That wasn't 2011, that was a little bit later. I don't remember when but... Imke Feldmann (00:14:06): No, I think we met first. I think we met on the Mr. Excel Forum on some crazy stuff I did there. I cannot even remember what that was, but I started blogging in 2015 and we definitely met before. Rob Collie (00:14:21): That's what it was. It was the forums. And Scott was the one that had stumbled upon what you were doing there and brought my attention to it. I was like, whoa. It was like... Imke Feldmann (00:14:34): That was really some crazy stuff. I think I was moving data models from one Excel file to another or something like that. Some crazy stuff with [inaudible 00:14:43] and so on. Rob Collie (00:14:44): You obviously remember it better than I do. But I just remember being jaw dropped, blown away, impressed, by what you were doing. And the thing is the world of Power Pivot interest at that point in time still seemed so small. The community still seemed so small that for you to emerge on our radar fully formed, already blowing our minds, that was the first thing we ever heard from you. That was a real outlier because usually the way the curve of awareness went with other members of the community is that like, you'd see something modest from them. And you'd sorta like witness their upward trajectory as they developed. Of course, you've continued to improve and learn and all of that since then. But as far as our experience of it, it was you just showed up already at the graduate level, just like where did she come from? So cool. So you said that you enjoyed the multi-dimensional cube project? Imke Feldmann (00:15:43): Mm-hmm (affirmative). Yes. I don't know MDX, but I totally enjoyed the project.
So being able to build a reporting solution for my own company, basically then for the company I worked for, and doing it live with a consultant, with a freelancer at hand, discussing how things should look and just seeing the thing form before my eyes and grow. And this was just such an enjoyable experience for me. Rob Collie (00:16:11): So the thing that's striking about that for me is, there's no doubt that the multi-dimensional product from Microsoft was a valuable product. It did good things. But I never have heard someone say that they really enjoyed the implementation process as a client, right? Imke Feldmann (00:16:31): Okay. Rob Collie (00:16:31): You had a freelancer doing the work. So something you said there really jumped out at me, it was, sort of like doing the project live. So the way that this worked traditionally, at least in the US, is the consultant would interview you about your requirements and write a big long requirements document and then disappear and go build a whole bunch of stuff and come back and show it to you, and it's completely not what anyone expected. It's almost like you're on completely different planets. Obviously, if you'd had that experience, you would not be saying that you enjoyed it. So there had to be something different about the way that you and that freelancer interacted. Do you remember what the workflow was like? Imke Feldmann (00:17:16): What we did is that we often met together and just looked at where we're at and what the next steps should be. And we definitely had specific targets in mind. So there were some reports that I had defined as a target, and around these reports I was aware that we needed something like a proper data model, because I also knew that I wanted to have some sort of a general set up that could be queried from Excel as well. So I knew about cube functions, and I knew that on one hand I needed these reports that had formerly been within our ERP system.
Also, I wanted them to be in a separate solution that was under my control and independent from the ERP system. And on the other hand, I wanted some more. So I wanted the flexibility to be able to use this data for certain other purposes in the controlling department as well. So basically being able to do ad hoc analysis on it. Imke Feldmann (00:18:23): And we met often and I showed a certain interest in how the table logic was created. So I knew that the MDX was over my head at the time, but I showed a very strong interest in which tables are created, how they relate to each other, and that was quite unusual. At least this is what the [inaudible 00:18:47] the freelancer told me. Rob Collie (00:18:49): I bet. Imke Feldmann (00:18:50): He said that he doesn't see it very often that clients show this sort of interest. Rob Collie (00:18:56): Did he say, "Yeah. You really seem to be having fun with this. Most of my clients don't enjoy this." You said that you met very often, so were there times where he was writing MDX while you were in the room? Imke Feldmann (00:19:10): Sometimes yes, because I said, "Well, can we switch this a bit or make some changes?" And sometimes he said, "Well, I can try to adjust it now." Because he came over for one day or half a day, and then we spoke things through and defined further things. And if we were finishing early, he would just stay and do some coding there. But apart from that, he would work from home and do the big stuff. Rob Collie (00:19:37): OLAP originally stands for online analytical processing, where online meant not batch, right? It meant you could ask a question and get the answer while you were still sitting there. Imke Feldmann (00:19:51): Okay. Oh, really? Rob Collie (00:19:53): That's what online meant. Imke Feldmann (00:19:54): It's interesting. Rob Collie (00:19:56): It basically meant almost like real time.
It's a cousin of real time, that's what online meant at that point, as opposed to offline where you write a query and submit it and come back next week, right? So that's where the online in OLAP comes from. Imke Feldmann (00:20:12): Oh, interesting. Rob Collie (00:20:13): We would pick a different terminology for OLAP were it invented today. So something interesting about, it sounds like, your experience, and I did not anticipate drilling into your experience with multi-dimensional in this conversation, but I think it's really important, is that at least some portion of that project that you sponsored and implemented with the freelancer, at least some portion of the work was similarly performed online. Meaning the two of you were sort of in real time communication as things evolved. And in the old model, the vast majority of multidimensional solutions that have ever been built in the world, the MDX powered solutions, were built in an offline model, where the majority of the communication supposedly takes place in the form of a requirements document. Rob Collie (00:21:05): And that was a deeply, deeply, deeply flawed approach to the problem, that just doesn't actually work. So I guess it's not surprising to me that the one time I've ever heard someone say they really enjoyed that multi-dimensional project, that at least a portion of that multidimensional project was sort of almost like real-time collaboratively performed rather than completely asynchronous, right? I guess if we want to be really geeky, we could say it was a synchronous model of communication as opposed to an asynchronous one. And Power BI really facilitates that kind of interaction. Imke Feldmann (00:21:41): Absolutely. Rob Collie (00:21:42): The reason why the MDX multi-dimensional model worked the way it did, there were two reasons, one is a legitimate one and one of them is more cynical.
So the legitimate reason is that it required reprocessing of the cube for every change, it's just too slow, right? The stakeholder, the business stakeholder doesn't typically have the time or the patience to sit there while the code's being written, because it takes so long; even just implementing a formula change sometimes would be, well, we need to wait an hour. And so the attention span of the business person can't be held for good reason there, right? And so that sort of drove it into an asynchronous model. Rob Collie (00:22:23): The other reason is that that asynchronous model turned out to be a really good business model for the consultants, because the fact that it didn't work meant that every project lasted forever. And so that's the cynical reason. But with Power BI there are no long delays. You change the measure formula, or you add an extra relationship, or heck even bringing in a new table, just a brand new table, bring it in, it wasn't even in the model, now it's in the model. End to end that can sometimes be measured in minutes or even seconds. And so you can retain engaged collaborative interest. Now it's not like you're always doing that, right? There's still room for offline asynchronous work in our business, but really critical portions of it can be performed the other way. And I think that makes a huge difference. Imke Feldmann (00:23:13): Yep. And that's what I like about it. So it's so great, as a consultant, to be able to perform really relatively large tasks without any further involvement of other people. Which, I mean, honestly, I don't call myself a team worker, not because I don't love other people also, but teamwork means you have to communicate with other people, make sure that they know what you're working on. So there are so many interfaces that have to be maintained if you're working with other people. And so I really love the way I work currently, being able to deliver full solutions as a one woman show consultant.
That is really a pleasure for me. That's really my preferred way of work, I must say. Because I can really focus on the things that have to be done and I'm able to deliver value in a relatively short time for the clients. Rob Collie (00:24:14): That's a really interesting concept. There are certain kinds of problems in which collaboration, a team collaboration is absolutely necessary. The magic of collaboration sometimes can beat problems that no individual could ever beat. At the same time though, there's this other dynamic, right, where having a team working on a problem is actually a real liability because the communication complexity between the people becomes the majority of the work. Here's a really hyper simplified example. There used to be sort of a three-person committee, if you will, that was running our company P3, me and two other people. Imke Feldmann (00:24:57): Mm-hmm (affirmative). Rob Collie (00:24:58): And so all leadership decisions were essentially handled at that level. Well, things change, people move on, right? And so we went from a three person committee to a two person committee. We didn't anticipate the two of us who stayed, right? We did not anticipate how much simpler that was going to make things. We thought, just do the math, right, it's going to be like, well, it's one less person to get on the same page. So it's going to be a one-third reduction in complexity. It was actually double that because we went from having three pairs of communication, right, the triangle has three sides, to a line that only has one side, right? So there was only one linkage that needed to be maintained as opposed to three geometrically, combinatorially, whatever we're going to say, right? It just became- Imke Feldmann (00:25:45): Exponential. Rob Collie (00:25:45): ... Exponetially simpler. And so for problems that can be soloed, you have this amazing savings in efficiency, in clarity, even, right? Imke Feldmann (00:25:59): Yup. 
Rob Collie (00:25:59): There's just so many advantages when you can execute as one person, then there's the other examples like our company at our size now, even ignoring the number of consultants that we need to do our business, just the back office alone, we need the difference in skills. We need the difference in talents and interests and everything. We simply could not exist without that kind of collaboration. However, when our consultants are working with a client, usually it's essentially a one-on-one type of thing, right? We don't typically put teams of consultants on the same project. We might have multiple consultants working for the same client and they might be building something that's somehow integrated, but it's still very similar, I think, to your model, when you actually watch sort of the work being done, there's this amazing savings in complexity. Imke Feldmann (00:26:50): Yup, that's true. Of course I have a network in the background. So when big problems arise where I need brain input, of course, I have a network, but it's not a formal company. Rob Collie (00:27:02): And that's how we work too, right? We have all kinds of internal Slack channels. For some reason we adopted Slack years ago before Teams was really a thing. So Slack is sort of like our internal social network. There's a lot of discussion of problems, and solutions, and a lot of knowledge sharing, and people helping each other out behind the scenes in that same way. Again, we do bring multiple consultants into particularly large projects, but it's not like there's three people working together on the same formula. In Power BI, the things that you do in ETL, the things that you do in Power Query are intimately interrelated with the data model and the DAX that you need to create. And imagine parceling that out to three different people. You have one formula writer, one data modeler, one ETL specialist, you would never ever get anywhere in that kind of approach.
Imke Feldmann (00:28:00): Not necessarily. I mean, the DAX person, the person responsible for the data model, could write down his requirements. He could define the tables basically. And then someone could try to get the data from the sources. But of course, then you get some feedback that the data isn't there or that the model has to be shaped in a different way. So it has two sides to it. But that's interesting to see that you have the same experience, that Power BI models or solutions of a certain size can very well be handled by one person alone. And that really brings speed, and flexibility, and agility to the whole development process I think. Rob Collie (00:28:41): You communicate with yourself at, what's above giga? Peta, petabit? You communicate with yourself at petabit speed and you communicate with others through a noisy 2,400 baud modem that's constantly breaking up. It's amazing what that can do for you sometimes. So there comes a point in your journey where you decide to go freelance. Imke Feldmann (00:29:07): Yup. Rob Collie (00:29:08): That's a courageous leap. When did that happen and what led you to that conclusion? Imke Feldmann (00:29:13): I made the decision in 2012 already to do that. Rob Collie (00:29:19): Wow. Imke Feldmann (00:29:20): And I just saw the light. I just saw the light in Power Pivot and then Power Query came along and I saw what Microsoft was after. And as I said, I enjoyed the building of the cube, getting my hands dirty, reading about the technologies behind it and so on. And this was what I felt passionate about. And I also had the idea that I needed some break from company politics. And so I just thought, well, I give it a try. And if it doesn't work, I can find a job after that or find a company to work for at any time after that. So I just tried it and it worked. Rob Collie (00:30:05): So you decided in 2012, did you make the break in 2012 as well?
Imke Feldmann (00:30:12): I prepared it, and then in 2013, I started solo. Rob Collie (00:30:18): Okay. 2013 is also when we formally formed our company. From 2010 to 2013, it was a blog. I had other jobs. I had other clients essentially, but I wasn't really hanging out the shingle so to speak, as you know, we weren't really an actual business until 2013. And I guess it's not much accident that we both kind of did the same thing about the same time, it's that demand was finally sufficient I think in 2013 to support going solo. In 2012, there weren't enough clients to even support one consultant. And so, oh, that's great. And I think you really liked Power Query too, does M speak to you? Imke Feldmann (00:31:02): Yes. Yes. Yeah. Rob Collie (00:31:03): It does, doesn't it? Imke Feldmann (00:31:04): I really prefer Power Query or M over DAX, I must admit. It has been much more reliable to me than DAX. Rob Collie (00:31:15): Oh, and I liked you so much before you said that. I'm team DAX all the way. Imke Feldmann (00:31:23): I know. I know. I know. I mean, of course I love to use DAX as well, but I really feel very, very strong about Power Query. And I mean, I had such a great journey with it. I mean, it was really [inaudible 00:31:35] work for me personally, that I did with it. And it was just a great journey to understand how things work. I mean, this has been the first coding language for me that I really learned. And it was just a great journey to learn all the things and starting to blog about it. And of course, I started basically helping people in the forum, that's where I basically built my knowledge about it, solving other people's problems. And this was just a great journey. And Power Query has always been better to me than DAX. Rob Collie (00:32:14): This is really cool, right? So you fell in love with Power Pivot, so DAX and data model, right? There was no Power Query. Imke Feldmann (00:32:21): Mm-hmm (affirmative), that's true. Rob Collie (00:32:23): Okay.
And because we had no Power Query, there were many, many, many things you couldn't do in Power Pivot unless your data source was a database. Imke Feldmann (00:32:30): Yup. Rob Collie (00:32:31): Because you needed views created that gave you the right shape of tables, right? If your original data source didn't have a lookup table, a dimension table, you had to make one. And how are you going to make one without Power Query? It gets crazy, right? It's almost unbelievable. So try to mentally travel back for a moment to the point in time where you were willing to, and not just, it doesn't sound like you were just willing to, you were eager to go solo, to become a freelancer, right, with just DAX and data modeling. And then after that, this thing comes along that you light up when you talk about. You didn't have this thing that you love, but you were already in. That doesn't happen very often. Imke Feldmann (00:33:18): It could be that I loved DAX at the beginning, but it just started to disappoint me at times. Rob Collie (00:33:29): Oh, okay. Thomas LaRock (00:33:29): It disappoints everyone. Rob Collie (00:33:29): I'm just devastated. Imke Feldmann (00:33:35): No, I mean, it's amazing what DAX can do, but I mean, we all know it looks easy at the beginning, but then you can really get trapped in certain situations. Rob Collie (00:33:46): Yeah. I describe these two things as like the length and width of a rectangle, Power Query and DAX. Take your pick, which one's the width, which one's the length? I don't care. And then we ask which one is more responsible for the area of the rectangle, right? Neither. You can double the length of either of them and it doubles the area of the rectangle. So it's really ironic that I'm so sort of firmly on team DAX for a number of reasons. Number one is that I'm really not actually that good at it compared to the people who've come along since. 
Like my book, for instance, I think, I look at it as this is the 100 and maybe the 200 level course at university, maybe the first in the second course, maybe, but it's definitely not the third course. The thing that you take in your third or fourth year of university, that's not covered in my book in terms of DAX. Rob Collie (00:34:44): And basically every one of the consultants at our company is better at DAX than I am. And that's great. That's really good. And the other thing that's ironic about my love of DAX over M, is if these two were in conflict, which they aren't. Imke Feldmann (00:35:00): No they are. Rob Collie (00:35:02): Is that I actually was trying for years to get a Power Query like project started on the Excel team. I knew how much time was being chewed up in the world just transforming data, not analyzing it even, just getting things ready for analysis. It's just ungodly amounts of time. And so I was obsessed with end-user ETL. When I was on the Excel team, it was like a running joke, someone would mention in a meeting, "Well, that's kind of like ETL," and other people would go, "Oh no, no, don't say that in front of Rob, he's going to get started and he won't shut up about it for the next 30 minutes." On the podcast with the Power Query team, I told them I'm really glad that no one ever agreed to fund my project on the Excel team because now that I see what Power Query is like I grossly underestimated how much work needed to go into something like that. And I'm glad that Microsoft isn't saddled with some old and completely inadequate solution to the Power Query space, because now that I've seen what the real thing looks like, I'm like, "Oh my gosh, we would've never been able to pull that off." Rob Collie (00:36:14): So the thing that I was most obsessed with is the thing that now that it's actually been built, for some reason, I just find M to be, I don't know, there's like a reverse gravity there that pushes me away. 
Imke Feldmann (00:36:26): What I actually would like to see is that there's less need to use M in the Power Query product. So first, the only thing I was dreaming about was finally to have a function library that can easily be shared, or that you can download from the internet or wherever, where you can use additional functions in your M code. So this was the first thing that I was really passionate about, and I thought that we should have such a thing in Power Query to be able to make more cool things, or group steps together. But now what I really think we should actually have and see in Power Query is the ability to build our own ribbons to add to the query editor. Rob Collie (00:37:13): Yes. Imke Feldmann (00:37:13): Like we have in Excel. So this is something that in my eyes would really bring a big push to the product and actually would make so much sense for the people who start using these products. I mean, the whole Power Platform can have so many benefits for finance departments, all departments, but I mean, I'm passionate about finance departments. But have you counted how many low-code languages are in there, if you include Power Apps and Power Automate and all these things? Rob Collie (00:37:50): Low-code. Imke Feldmann (00:37:50): And honestly, in order to come up with any solution that makes sense in a business environment, I would say in all of these solutions, there is no way around the code at the end. I mean, you get quite far with clicky-clicky, but I haven't seen solutions where you get around the languages. And now imagine the typical finance people, who really know the Excel formulas, and some of them might know VBA as well. And now they're served this new low-code, no-code world, and have to get their heads around about five or six new languages that they all have to know and learn in order to get something useful, and so on. So I think that's just not feasible for people who have real jobs in the business to learn all that. 
Rob Collie (00:38:42): Well, that's what you're here for, right? That's what your business is for and that's what P3 is for. Imke Feldmann (00:38:48): We get them started and the products are great. And if there are people in the companies who have a drive to learn things and take the time, they get their heads around it, but it could be easier. It could be easier with things like that, where we could provide additional user interfaces and just make it even easier for people to build great solutions for themselves, or adapt solutions that consultants had built initially, but maintain them by themselves and make adjustments to them if needed. Rob Collie (00:39:19): So [inaudible 00:39:20] has an old joke where he says, when he's doing a presentation or something, he says, "That's a good question. And I define a good question as a question I know the answer to, right." And then he says, "But then a great question is a question that is covered by the very next slide." So there's a similar parallel joke to make here, which is that, that idea you just talked about with the ribbons and everything, right? So if I said it's a smart idea, what I would mean is, again, this is a joke, right? I would mean that that's an idea that I agree with and have kind of already had. But if I say it's a brilliant idea- Imke Feldmann (00:39:55): Okay. Rob Collie (00:39:56): ... Then it's an even better version of an idea that I've already had that has never occurred to me. Your idea is a brilliant idea. Imke Feldmann (00:40:02): Okay. Rob Collie (00:40:06): It goes beyond. So I have been advocating privately behind the scenes with the Power Query team forever, telling them that they need about three or four more ribbon tabs. There's just way too many commonly encountered problems for which you can imagine there being a button, and there's no button. Imke Feldmann (00:40:28): Exactly. Rob Collie (00:40:29): And it's like, I don't understand. 
I used to be on teams like that, but I don't understand why they haven't gotten to this, because it seems like such low-hanging fruit. They've already built the engine, they've built the language, right? The language can already handle this, but you actually had two brilliant ideas in there that had never occurred to me. First of all, I'm used to the idea that the community can't contribute libraries of functions; they can't do that for DAX. Imke Feldmann (00:40:57): Mm-hmm (affirmative). Rob Collie (00:40:58): That's not even, like, engineering-possible for DAX. And the reason for it is that the DAX engine is so heavily optimized in so many ways that there'd be no way to plug in some new function that's unpredictable in terms of what it needs to do. All of these things, they're all inherently interrelated, and they make changes in the storage and the query engine to make this function work better and vice versa, because it has to take advantage of the index compression scheme and all of that kind of stuff. It's actually not possible, is the wrong word, but it's actually orders of magnitude more difficult, if not impossible, to allow DAX to have a UDF, user-defined function type of feature. Rob Collie (00:41:42): I don't think Power Query is like that though. Maybe naively, because again, I'm not on the internals team on the Power Query side. But it does seem like a UDF capability is at least much more feasible- Imke Feldmann (00:41:53): Absolutely. Rob Collie (00:41:54): ... For Power Query, which does execute row by row essentially. Other languages have this, right? One of the reasons that R is so popular is not that R is so awesome; it's that R has tremendous libraries of commonly solved problems that you can just go grab off the internet or off the shelf and plug into your solution. Imke Feldmann (00:42:14): I have my own library I've created. You can go to my GitHub and you'll see 50, 60 custom M functions. 
You can package them in a record and [inaudible 00:42:24] them as a library in your M code, or you could even connect live to them and run them with an execute statement. But this is too difficult, although it's just a couple of clicks, but it's too difficult or at least intimidating for the beginners, the real Power Query beginners who start with the product. I think there's so much potential to make their life easier. And that's not through some coding stuff, or "I know this function, I know that function." That can really only come, in my eyes, through a user interface with buttons. Rob Collie (00:42:59): Yeah, I agree. And just as importantly for me, is that I might actually come around and be, like, just as much team Power Query as team DAX. Honestly, my frustration is just the M language and just my total lack of desire to learn it. [crosstalk 00:43:16]. That's what it really comes down to. It's not about M, it's not about Power Query, it's about me. Whereas again, I know the need that it fills is massively important. So it's not that I think it's a bad mission; I think it's like the mission in a lot of ways. I was obsessed with it long before I ever crossed paths with business intelligence. I was obsessed with data transformation, end-user data transformation. It's just a problem that's about as ubiquitous as it gets. So let's make it happen. We agree, the two of us, that's it, right? It's like we need to go provide a unified front. Imke Feldmann (00:43:52): I think there's an idea for that in the ideas forum. I might send the link that you can maybe post. Rob Collie (00:43:56): We want that thing upvoted to the moon. I'll even go figure out what my sign-in is on the ideas site. Imke Feldmann (00:44:08): Oh, good luck with it. Rob Collie (00:44:09): Which is absolutely impossible. I have no idea which of the 14 accounts. And then I'll try to create a new one and it'll go, "Nah, you're not allowed to. 
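The record-as-library idea Imke mentions above can be sketched in M roughly like this. This is a hypothetical illustration, not her actual GitHub library; the function names and logic are invented for the example:

```m
// Hypothetical sketch: packaging custom M functions in a record,
// so queries can call them like a small library.
let
    Lib = [
        // Trims whitespace and removes control characters from text.
        TrimAll = (t as text) as text => Text.Trim(Text.Clean(t)),
        // Parses a German-formatted number like "1.234,56".
        ToNumberDE = (t as text) as number =>
            Number.FromText(Text.Replace(Text.Replace(t, ".", ""), ",", "."))
    ],
    // Functions are invoked via record field access.
    Result = Lib[TrimAll]("  Power Query  ")
in
    Result
```

Saved as its own query, such a record can be referenced from any other query in the workbook or dataset, which is the "couple of clicks" workaround she describes.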
We know it's you, but we won't tell you who it is, what your email address is." So I completely agree. So there are so many problems. I always struggle to produce the list. It's like I need to be writing down the list of things that are crucial, but here's an example. Remove duplicates, but control which duplicate you keep. That's a problem that can't be solved in the GUI today. Imke Feldmann (00:44:48): And you need the intimidating Table.Buffer that you have to write by hand around it, which is just pain. Rob Collie (00:44:56): Remove dups and don't care which one you keep? Okay, fine. That's a great simple button. There should be an advanced section that allows you to specify, oh, but before you keep the dups, sort by this column or sort in the following manner. Imke Feldmann (00:45:10): Exactly. Rob Collie (00:45:10): And then keep the first one of each group. It's easy for us to say outside the team, but apparently, and we just make a joke, right, that's apparently a Manhattan Project level of software effort to add that extra button. Anyway, we'll get there. Thomas LaRock (00:45:27): That doesn't make sense to me though. I'm fascinated by all of your conversation and you guys are a hundred miles away from me in a lot of this stuff, but I could listen to it all day. But no, the fact that Excel can't do the remove duplicates, except for, like, the first of each one of something, that's a simple group by. In my head, I sit there and go, that's easily solvable. Because Excel and DAX do such great stuff that I would never want to do in T-SQL, how the hell do we stumble across a thing that's been solved by the straight-up SQL language that somehow can't get into Excel? Rob Collie (00:46:01): Well, let's explain the problem very clearly and see if we're on the same page as to what the problem is, but either way it'll be valuable. So let's say you have a whole bunch of orders, a table full of orders. That is a really wide Franken-table. 
It's got things like customer ID, customer address, customer phone number, but also what product they ordered, and how much of it, and how much it cost. Okay, and a date, a date of the order. All right. And you've been given this table because the people that are responsible for this system think that what you want is a report and not a data source. And this is incredibly common. Okay. So you need to extract a customers dimension or lookup table out of this. You need to create a customers table so that you can build a good star schema model. Okay. And Power Query is right there to help you. Power Query will help you invent a customers lookup table where one wasn't provided, and that's awesome. Rob Collie (00:46:58): Okay. So you say, okay, customer ID, this column, I want to remove duplicates based on that column. Okay, great. But now it's just the order that the data came in from the report file or the database or whatever that will determine which duplicate is kept. What you really want to do, of course, is take the most recent customer order for each customer ID, because they've probably moved. They may have changed phone numbers, whatever, right? You want their most recent contact information. You don't want their contact information from 15 years ago. And the M language allows you to solve this problem, essentially sort by date and then keep the most recent, but only if you get into the code manually. And as Imke points out, even if you go into the code, the things that you would want to do don't behave: if you do a sort, you can add a sort step to the Power Query with the buttons, with the GUI, and then you do the remove duplicates and it ignores the sort. Imke Feldmann (00:47:59): Yes. Rob Collie (00:48:02): The GUI almost tries to tell you that it's impossible, but if you know about Table.Buffer... Imke Feldmann (00:48:07): So the question is, why do we have a sort command in Power Query when it doesn't keep the sort order? 
I mean, that is the question to ask. But that's how it is. Rob Collie (00:48:16): It sorts the results. It sorts the results, it just doesn't sort for the intermediate steps. Imke Feldmann (00:48:20): Why? No, that's quite technical. But it would just be great if such a common task could be done with buttons and be reliable at the end. I fully agree. Rob Collie (00:48:35): So Tom, I think this one's really just an example of, again, I truly think that M and Power Query, just like DAX and data modeling, the Power BI data modeling, both of these things belong in the software hall of fame of all time. It is amazing. Power Query, M, is just ridiculously amazing. It's one of the best things ever invented. Remember, this is someone who's associated with being a critic of it. Imke Feldmann (00:49:04): Yeah, you're making progress, it's great to see. Rob Collie (00:49:07): And yet I'm telling you that it's one of the top five things ever invented, probably. And I think there's a certain tendency, when you've done something that amazing, to lose track of the last mile. I think it's more of a human thing. Imke Feldmann (00:49:19): Maybe, but I mean, what I see is that they are investing quite a lot in data flows, which makes a lot of sense as well in my eyes. Rob Collie (00:49:27): All that really does though, as far as you and I are concerned, Imke, is it makes it even more important that they solve this problem. Because it's now exposed in two different usage scenarios. Imke Feldmann (00:49:37): Yeah, you're right. Rob Collie (00:49:39): And I want my data flow to be able to control which duplicates are kept too. So that's what I'm saying. There's all these big sort of infrastructural technical challenges that do tend to draw resources. And it's not a neglect thing. Imke Feldmann (00:49:54): No, no. Rob Collie (00:49:54): It isn't like a willful failure or anything like that, I don't want to paint that kind of negative picture. Imke Feldmann (00:49:59): No. 
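The workaround they are describing, sort, buffer, then remove duplicates, can be sketched in M. This assumes a hypothetical earlier query named Orders with CustomerID and OrderDate columns; the column names are illustrative:

```m
// Hypothetical sketch: build a Customers lookup table from a flat
// Orders table, keeping each customer's most recent row.
let
    // Sort newest-first so the row we want is first in each group.
    Sorted = Table.Sort(Orders, {{"OrderDate", Order.Descending}}),
    // Table.Buffer materializes the sorted table, so the streaming
    // engine cannot reorder rows before the distinct step.
    Buffered = Table.Buffer(Sorted),
    // Table.Distinct keeps the first occurrence per CustomerID.
    Customers = Table.Distinct(Buffered, {"CustomerID"})
in
    Customers
```

Without the Table.Buffer step, the engine is free to ignore the sort when evaluating Table.Distinct, which is exactly the "it ignores the sort" behavior Rob complains about.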
Rob Collie (00:50:00): It's just that out here in reality, the inability to do... even if we just identified the top 10 things like this, addressing those top 10 things with GUI, with buttons, would have, I think, maybe even a bigger impact in the world than the entire data flows project, right? Because you would expand the footprint of human beings that are advocates of this stuff, and then you go build data flows. You don't have to think of it as either-or, right? They should do both. It's just that I think it's hard to appreciate the impact of those 10 buttons when you're on the software team. It's easier to appreciate the impact of data flows, which is massive. I don't mean to denigrate that. I think it's crazy good. It's just that this other thing is of a similar magnitude in terms of benefit, but it's harder to appreciate when you're on the software team. It's easier to appreciate when you're out here in the trenches, living it every single day. And every time I run into a problem like this, I have to put my hand up and say to my own team, "Help." Thomas LaRock (00:51:02): So a casual observation I have is that you wish for there to exist one tool that will handle all of your data janitorial needs. And that tool doesn't necessarily exist, because life is dirty, and so is your data, and you're never going to anticipate everything possible. Now, should that sorting functionality exist in that duplicates scenario you gave me? Yeah, probably. But there's always going to be something next. And that's why I go to you and I say, the thing that you've described to me is you need your data to be tidy so that it can be consumed and used by a lot of these features that we've talked about today. And in order to get to tidy data, there's not necessarily one tool. Thomas LaRock (00:51:48): You're a big fan of ETL, Rob. 
You know that, hey, maybe I need to take the source data and run it through some Python scripts, or some M, or something first before it goes to this next thing. And that's the reality that we really have. What you're wishing for is the one tool, the one button to rule it all. And that's going to take a while before that ever comes around. Rob Collie (00:52:09): The thing is, though, that M is ridiculously complete. Imke Feldmann (00:52:14): Yeah. Rob Collie (00:52:15): You can do anything with it. And it's a language that's optimized for data transformation. So I know you can do anything with C++ too, right? But this is a data-crunching, data transformation, specialized language that is really complete. And its UI is woefully underserving the capabilities of the engine. And so I suppose we could imagine and deliberately design a data transformation scenario that maybe M couldn't do. Imke Feldmann (00:52:45): No. Rob Collie (00:52:46): I think that'd be a very difficult challenge considering how good M is. Imke Feldmann (00:52:49): I think in terms of logic, M can do anything, but in terms of performance, there is some room for improvement. Because there are streaming semantics running in the background, and as long as the stream runs through all the steps, if you have complex queries, this can really slow things down. And currently there is no button or command in the M language to cut the stream and say, well, stop it here and buffer what you have calculated until here, and then continue from there. So if you have really complex stuff that would benefit from an intermediate buffer, then you can store that in an Azure blob or a CSV, or whatever. Specifically if you're working with data flows, you can create some automatic processes that would enable this kind of buffering. Imke Feldmann (00:53:45): And then you will see that the speed of the whole process can really increase dramatically, because in some situations, the speed in M drops exponentially. 
And these are occasions where a buffer would really help things, but we don't have it yet in the engine of Power Query. So this would really be something else that would be fairly beneficial, so we wouldn't have to make these workarounds through other tools. Rob Collie (00:54:14): Tom, it just occurred to me, I can't believe this is the first time that this thought has crossed my mind. But I think that you might fall into an abyss of love with M. Thomas LaRock (00:54:28): Well, I'm a huge James Bond fan, but... Rob Collie (00:54:30): Oh, no. I think you would really, really just dig it. Thomas LaRock (00:54:38): I don't think I have time to take on a new relationship at this point. I'm still with Python and R, so I mean, I don't know. I'm not going to disagree, I'm just, please don't start a new addiction for me. Rob Collie (00:54:51): Think of the content, though, that you could produce over time. The M versus SQL versus Python treatises. Thomas LaRock (00:54:59): Cookbook. Rob Collie (00:55:00): You were made for this mission, Tom. Thomas LaRock (00:55:03): Okay. So we'll have to talk later about it. You can sweet-talk me. You know I've let you sweet-talk me into any [inaudible 00:55:08]. Rob Collie (00:55:08): That's right, that's right. Come on, Tom. Get into M, you know, that thing that I have nothing but praise for, that I just love to death. You need to do that. Thomas LaRock (00:55:18): For you. That's what you want to do, is you want to learn it but [inaudible 00:55:21] through me. Rob Collie (00:55:22): Oh, that wouldn't work. I would be, "Oh yeah, well, this is still M." Thomas LaRock (00:55:29): You're going to be like, "Tom, where's your latest blog post on M, so I can read it and hate upon it even more?" Rob Collie (00:55:37): No, I would not read. Just as the first step. Thomas LaRock (00:55:42): I'm going to read it, but not leave a comment about how much I hate it. 
Rob Collie (00:55:45): Let's go back to talking about how we need a bunch of big fat Fisher-Price buttons for me to mash my thumbs on in the UI. That's what I need. Thomas LaRock (00:55:54): You know what? I'll do that. I'll open up VS Code and I'll just build this one big button, it's Rob's button. Rob Collie (00:56:00): Hey, you won't believe this, but I recently installed VS Code. Thomas LaRock (00:56:03): I don't believe it, why? Rob Collie (00:56:05): Well, because I needed to edit, not even write, because I'm not capable of it. I needed to edit an interface add-on customization for World of Warcraft. And the only purpose of this World of Warcraft add-on interface modification was to allow me to drop snarky comments into a particular channel of the conversation based on the button that I press. I needed a menu of snarky comments to drop at particular points in time. It's hard to type them out all the time, right? So it's just like, now here we go. I dropped one of those. I dropped one of those. Thomas LaRock (00:56:37): We got to get you a real job or something. You got way too much time on your hands. Rob Collie (00:56:42): That was my number one contribution to the World of Warcraft guild. For a couple of months, there was the snarky rogue chat. Thomas LaRock (00:56:48): You know that is on brand. Rob Collie (00:56:56): It prefixed every comment in the chat with a prefix, "you came from rogue chat 9,000," so that people who weren't in on the joke were like, "Why is this guy, he's usually very quiet, become so obnoxious? Look at the things he's saying." Anyway. So VS Code. And that also involved GitHub. Because my friend who wrote the stub, the shell of this add-on for me, is a vice president at GitHub. So of course he puts the code in GitHub and points me to it and then points me to VS Code, and I'm like, "Oh, you're making me work now? Okay. But you wrote the shell for me, so okay. All right. I'll play ball." 
So it doesn't sound like you regret your decision to go solo. Imke Feldmann (00:57:40): Absolutely. Rob Collie (00:57:41): You're not looking to go back to corporate life. Imke Feldmann (00:57:43): Absolutely not. Rob Collie (00:57:44): Not missing that. So what can you tell us about the last year or two? What impact, if any, did COVID have on your business? Imke Feldmann (00:57:52): Business has grown, especially the last year. So people needed more reports than ever, and solutions. So I don't know whether it was a COVID effect or just the fact that Power BI is growing and growing. Rob Collie (00:58:07): I'm sure it's both. So the dynamic we saw during 2020... if you're going to have a year that was negatively impacted by COVID, it would have been 2020. And what we saw in 2020 was that we were definitely not acquiring new clients. We weren't making new relationships at nearly the rate we had been. People weren't taking risks on meeting a new BI firm. That wasn't something that there was as much appetite for as there had been. However, amongst the clients where we already had a good relationship, where we'd already been working with them for a while, their needs for data work expanded as a result of COVID, because it created all kinds of new problems and it invalidated so many existing blueprints of tribal knowledge of how we run the business. When reality changes, you need new maps, you need new compasses. Rob Collie (00:59:04): And so on net, our overall business still grew modestly over the course of 2020, year over year compared to 2019. But then when new clients started to become viable again, people started looking, were interested in making new relationships, and 2021 has been a very, very strong year of growth. Not moderate, really kind of crazy. How do you keep up with increased demand as a one-person shop? Imke Feldmann (00:59:35): Saying no. Rob Collie (00:59:36): You have to make your peace with saying no. 
At one point in my history, I faced sort of the same thing, and I decided not to say no, and instead decided to grow the company. That brought an enormous amount of risk and stress- Imke Feldmann (00:59:55): I can imagine. Rob Collie (00:59:55): ... Into my life, and I did not anticipate its magnitude. I'm sure I anticipated it, but I didn't anticipate the magnitude of it. I'm very grateful that I made that decision though, because where we are today is incredible. That's a rocky transition. So today everything runs like clockwork, basically. We have a lot of growth ahead of us that seems almost like it's just going to happen; we're just going to keep growing for a long time. But we had to set the table, we had to build our organism as a company into a very different form than what it had been when it was just me. And that molting process was very painful. I don't pretend that the scaling decision is the right decision, it's very much a personal one. I've certainly lived that. If the version of me that made the decision to scale the company knew everything that was coming, it would have been a much harder decision to make. You kind of have to have a little bit of naive optimism even to make that leap. Imke Feldmann (01:00:57): I can imagine that once you get these things figured out, and with the dynamic that the product has, that has a good chance of growing into a very successful business, I believe. Rob Collie (01:01:10): Well, with your profile and with the growing demand for these sorts of services, the percentage of no that you have to say is just going to keep going up. Imke Feldmann (01:01:20): Yeah. But I made my decision and that's just fine. Rob Collie (01:01:25): I'm very supportive of that decision. I don't have any criticism of it, again, especially knowing what I know now. 
But there's going to come a point where you're going to be saying yes 1% of the time, and the answer to that is ultimately, well, you just raise your rates, which is also very difficult to do. In the end, it's almost like an auction for your services. You need to run yourself like Google. There's a 40-hour block of Imke time coming up for availability. We'll just put it on eBay. Imke Feldmann (01:01:59): I mean, it's just nice to be able to choose whom you work with. That's just nice. And I earn enough money, so that's fine. So I'm happy with that. Rob Collie (01:02:12): How do you choose who you work with? Is it mostly based on industry? Is it mostly based on the job function that you're helping? Or is it more about the specific people? There's all kinds of things that could... Let's say I came to your website today and filled out your contact form. What are the things that I could say in that contact form message that would lead you to say no, versus lead you to say maybe? Imke Feldmann (01:02:37): What I really like to do is to work with finance directors. So basically not people exactly like me, but I like to see that the managers approach me and they have an interest in the product itself, and also therefore an interest to push it into their departments. So this is for me a very, very good starting point, because it's an area I'm familiar with. I know that there's enough critical support to get the decisions made that have to be made, and maybe also push IT to help with certain things. This is really one of my favorite setups, I would say. Rob Collie (01:03:19): Yeah, we do a lot of work with finance departments as well. How long does sort of your average relationship run with a client? How long do you end up working with the same organization on average? Imke Feldmann (01:03:31): That's hard to say, it's really completely different. 
It can be the initial five-day kickoff where we set up a P&L statement, connect all the finance data, and they go along with that and I basically never hear from them again, or just occasionally hear, "Can you help me with this problem or that problem?" And it could also be going on for years, basically with breaks in between of course, but some customers come every now and then when they want to expand things. Now I have a customer that I've been working with for some hours or even days every week for over a year now. Rob Collie (01:04:15): That sounds similar to my experience as a freelancer, when it was just me; less similar to our business today, a little bit less. I mean, I think it's still more similar than not. It's just that the dial has moved a little bit. Imke Feldmann (01:04:32): So how long are your engagements then, usually? Rob Collie (01:04:35): Most of our engagements are... if we start out doing kind of that kickoff you're talking about, we start like a project with people, that tends to not be the end. We don't typically have people just immediately vanish after that, because that's usually the point at which, I mean, they've got something working already. Very often after the first week or so of working with a client, they've usually got some really amazing things built already at that point. But at the same time, that's really just the beginning of the appetite. Usually there are things that are

Les Grandes Gueules
Marine Le Pen: an unstoppable fall? - 29/09

Les Grandes Gueules

Play Episode Listen Later Sep 29, 2021 22:18


With: Marie-Anne Soubré, lawyer. Didier Giraud, farmer. And Léa Falco, student. - Alain Marschall and Olivier Truchot host a three-hour show with their guests, where current events meet freedom of expression, on RMC, the opinion radio station. "Les Grandes Gueules", hosted by Alain Marschall and Olivier Truchot, is back for an 18th season! Farmer, cheesemonger, lawyer, teacher... the 14 GG, drawn from civil society, are never afraid to defend their ideas. Between lively debates, clashes, and bursts of laughter, these three hours of talk show reflect the real concerns of the French. This year, Fred Hermel joins the GG with an opinion segment: "C'est ça la France". Every morning from 6 a.m., tune in to a radio/TV show unique in France. For three hours, the RMC team covers the news as it touches the daily lives of the French. An exceptional program mixing live news, debates on current events, reactions, and expert input. Simulcast from 6 to 8:30 a.m. on RMC Découverte. RMC is a general-interest radio station, focused mainly on news and interaction with its listeners, in a 100% talk format unique in France. RMC's schedule is built around flagship programs such as Apolline Matin (6-9 a.m.), Les Grandes Gueules (9 a.m.-12 p.m.), and Estelle Midi (12-3 p.m.).

Les Grandes Gueules
Les Grandes Gueules of September 29: Marie-Anne Soubré, Didier Giraud and Léa Falco - 9-10 a.m.

Les Grandes Gueules

Play Episode Listen Later Sep 29, 2021 40:53


With: Marie-Anne Soubré, lawyer. Didier Giraud, farmer. And Léa Falco, student. - Alain Marschall and Olivier Truchot host a three-hour show with their guests, where current events meet freedom of expression, on RMC, the opinion radio station.

Les Grandes Gueules
Macron's world: Immigration, Macron puts pressure on the Maghreb - 29/09

Les Grandes Gueules

Play Episode Listen Later Sep 29, 2021 13:24


With: Marie-Anne Soubré, lawyer. Didier Giraud, farmer. And Léa Falco, student. - Alain Marschall and Olivier Truchot host a three-hour show with their guests, where current events meet freedom of expression, on RMC, the opinion radio station.


Building the Backend: Data Solutions that Power Leading Organizations
Exploring Open-Source Data Integration With Airbyte

Building the Backend: Data Solutions that Power Leading Organizations

Play Episode Listen Later Sep 28, 2021 35:22


“The hardest part of ETL is not building the connectors, it is maintaining them.” Truer words were never spoken. I really enjoyed this episode with Michel Tricot, CEO and Co-Founder of Airbyte, where we discuss all things data integration and connectors. Top 3 value bombs: The future of ETL/ELT integration connectors may lie with open source. Many closed-source data integration tools only create connectors if the ROI is there, but this leaves many tools out, and speed to market can be slow. Airbyte has created a modular open-source framework that allows the community to quickly build reliable data connectors. As Airbyte starts to monetize, they have some innovative methods; one of them is that if a developer from the open-source community creates and maintains a connector, they could potentially get a small percentage of the revenue associated with that connector. Data governance and logging are becoming increasingly important in the coming years.
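The "modular framework" idea discussed in the episode can be pictured with a toy source connector. In the public Airbyte protocol, a connector is essentially a program that writes newline-delimited JSON messages to stdout; the sketch below illustrates only that shape (the stream name and sample rows are made up, and a real connector also implements the spec/check/discover commands):

```python
import json
import sys
from datetime import datetime, timezone

def read_records(stream_name, rows):
    """Yield Airbyte-style RECORD messages, one per source row."""
    emitted_at = int(datetime.now(timezone.utc).timestamp() * 1000)
    for row in rows:
        yield {
            "type": "RECORD",
            "record": {
                "stream": stream_name,   # which table/endpoint the row came from
                "data": row,             # the row itself, as plain JSON
                "emitted_at": emitted_at,
            },
        }

if __name__ == "__main__":
    # Hypothetical sample data standing in for an API or database read.
    sample = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
    for message in read_records("users", sample):
        sys.stdout.write(json.dumps(message) + "\n")
```

Because the protocol is just structured messages over stdout, any language can implement a connector, which is what lets a community maintain them.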

Bot Nirvana | RPA & AI Podcast | Process Automation

AutomationEdge is a Hyperautomation platform with RPA-as-a-Service, API Connectors, Chatbots, ETL and more.

Screaming in the Cloud
Cranking Up the Heatwave with Nipun Agarwal

Screaming in the Cloud

Play Episode Listen Later Sep 23, 2021 34:45


About Nipun: Nipun Agarwal is Vice President, MySQL HeatWave and Advanced Development, Oracle. His interests include distributed data processing, machine learning, cloud technologies, and security. Nipun was part of the Oracle Database team, where he introduced a number of new features. He has been awarded over 170 patents. Links: HeatWave: https://oracle.com/heatwave Transcript: Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: You could go ahead and build your own coding and mapping notification system, but it takes time, and it sucks! Alternately, consider Courier, who is sponsoring this episode. They make it easy. You can call a single send API for all of your notifications and channels. You can control the complexity around routing, retries, and deliverability, and simplify your notification sequences with automation rules. Visit courier.com today and get started for free. If you wind up talking to them, tell them I sent you and watch them wince—because everyone does when you bring up my name. That's the glorious part of being me. Once again, you could build your own notification system, but why on god's flat earth would you do that? Corey: This episode is sponsored in part by our friends at VMware. Let's be honest—the past year has been far from easy. Due to, well, everything. It caused us to rush cloud migrations and digital transformation, which of course means long hours refactoring your apps, surprises on your cloud bill, misconfigurations, and headaches for everyone trying to manage disparate and fractured cloud environments. VMware has an answer for this.
With VMware multi-cloud solutions, organizations have the choice, speed, and control to migrate and optimize applications seamlessly without recoding, take the fastest path to modern infrastructure, and operate consistently across the data center, the edge, and any cloud. I urge you to take a look at vmware.com/go/multicloud. You know my opinions on multi-cloud by now, but there's a lot of stuff in here that works on any cloud. But don't take it from me. That's vmware.com/go/multicloud, and my thanks to them again for sponsoring my ridiculous nonsense. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Today's promoted episode is slightly off the beaten track. Normally in tech, we tend to find folks that have somewhere between an 18-to-36-month average tenure at companies. And that's great; however, let's do the exact opposite of that today. My guest is Nipun Agarwal, who's the VP of MySQL HeatWave and Advanced Development at Oracle, where you've been an employee for 27 years, is it? Nipun: That's absolutely right. 27 years, and that was my first job out of school. So, [laugh] yes. Corey: First, thank you for joining me. It is always great to talk to people who have focused on an area that I only make fun of from a distance, in this case databases, which, you know, DNS works well enough for most use cases, but occasionally customers have other constraints. You are clearly at or damn near at the top of your field. In my pre-show research, I was able to unearth that you have—what is it now, 170, 180 filed patents that have been issued? Nipun: That's right. 180 issued patents. [laugh]. Corey: You clearly know what you're doing when it comes to databases. Nipun: Thank you for the opportunity. Yes, thank you. Corey: So, being a VP at Oracle, but starting off your first job in an almost mailroom-to-the-executive-suite style story—we don't see those anymore. In most companies, it very much feels like the path to advance is to change jobs to other companies.
It's still interesting to see that that's not always the path forward for some folks. I think that the folks who have been in companies for a long time need more examples and role models to look at in that sense, just because it is such an uncommon narrative these days. You're not bouncing around between four companies. Nipun: Yeah. I've been lucky enough to have joined Oracle, and although I have been at Oracle all along, I've been on multiple teams at Oracle, and there has been a great wealth of talent, colleagues, and projects, where even to this day, I feel that I have a lot more to learn. And there are opportunities within the company to learn and to grow. So no, I've had an awesome ride. Corey: Let's dive in a little bit to something that's been making the rounds recently. Specifically, you've released something called HeatWave, which has been boasting some, frankly, borderline unbelievable performance benchmarks, and of course, everyone loves to take a crack at Oracle for a variety of reasons, so Twitter is very angry. But I've learned at some point, through the course of my career, to disambiguate Twitter's reactions from what's actually happening out there. So, let's start at the beginning. What is HeatWave? Nipun: HeatWave is an in-memory query accelerator for MySQL. It accelerates complex, long-running, analytic queries. The interesting thing about HeatWave is, with HeatWave we now have a single MySQL database which can run all your applications, whether they're OLTP, whether they're mixed workloads, or whether they're analytics, without having to move the data out of MySQL. Because in the past, people would need to move the data from MySQL to some other database running analytics, so people would end up with two different databases. With this single database, there's no need for moving the data, and all existing tools and applications which worked with MySQL continue to work, except they will be much faster.
That's what HeatWave is. Corey: The benchmarks that you are publishing are fairly interesting to me. Specifically, the ones that I've seen are: you've classified HeatWave as six-and-a-half times faster than Amazon Redshift, seven times faster than Snowflake, nine times faster than BigQuery, and a number of other things, and fourteen hundred times faster than Amazon Aurora. And what's interesting to me about the things that you're naming is they're not all data-warehouse style stuff. Aurora, for example, is Amazon's interpretation of an in-house developed managed database service named after a Disney Princess. And it tends to be aimed at things that are not necessarily massive scale. What is the sweet spot, I guess, of HeatWave's data sizes when it comes to really being able to shine? Nipun: So, there are two aspects where our customers are going to benefit from HeatWave. One characteristic is the data size, but the other characteristic is the complexity of the queries. So, let's first do the comparison with Aurora—and that's a very good question—the 1400 times comparison we have shown, yes, if you take the TPC-H queries on a four terabyte workload and if you run them, that's what you're going to see. Now, the interesting thing is this: not only is it 1400 times faster, it's also at half the price, because for most of these systems, if you throw more gear, if you throw more hardware, the performance would vary. So, it's very important to go with how much performance and at what price. So, for pure analytics—say, for four terabytes—it's 1400 times faster at half the price. So, it provides truly 2800 times better price performance compared to Aurora for pure analytics. Now, let's take the other extreme: 100 gigabytes—which is a much smaller, bread-and-butter database—and this is for mixed workloads.
So, something like a CH-benCHmark, which has a combination of, say, some TPC-C transactions and then some TPC-H-style analytic queries—the CH-benCHmark. Here we have a 42 times price-performance advantage over Aurora, because we are 42% of the cost, less than half the cost of Aurora, and for the complex queries, we are about 18 times faster, and for pure OLTP, we are at par. So, the aggregate comes out to be about 42 times better. So, the mileage varies depending upon the data size and depending upon the complexity of the queries. So, in the case of Aurora, it will be anywhere from 42 times better price performance all the way to 2800. Corey: Does this have an upper bound, for example? Like, if we take a look at something like Redshift or something like Snowflake, where they're targeting petabyte-scale workloads, at some point that becomes a very different story for a lot of companies out there. Is that something that this can scale to, or is there a general reasonable upper bound of, okay, once you're above X number of terabytes, it's probably good to start looking at tiering data out or looking at a different solution? Nipun: We designed HeatWave primarily for those customers who had to move the data out of the MySQL database into some other database for running analytics. The upper bound for the data in the MySQL database is 64 terabytes. Based on the demand we are seeing, we support 32 terabytes of processing in HeatWave at any given point in time. You can still have 64 terabytes in the MySQL database, but the amount of data you can load into the HeatWave cluster at any given point in time is 32 terabytes. Corey: Which is completely reasonable. I would agree with you, not having much database exposure myself in the traditional sense, but from a cloud economics standpoint alone, anytime you have to move data to a different database for a different workload, you're instantly jacking costs through the roof.
Even if it's just the raw data volumes, you now have to store it in two different places instead of one. Plus, in many cases, the vagaries of data transfer pricing in many places wind up meaning that you're paying money to move things out, there's a replication story, there's a sync factor, and then it just becomes a management overhead problem. If there's a capacity to start using the data where it is in more intelligent ways, that alone is a massive economic win, just from the time it takes your team to not have to focus on changing infrastructure and just go ahead and run the queries. If you want to start getting into the weeds of all the different ways something like this is an economic win, there are a lot of angles to look at it from. Nipun: That's an excellent point, and I'm very glad you brought it up. So, now let's take the other set of benchmarks we were talking about: Snowflake. So, HeatWave is seven times faster and one-fifth the cost; that's about 35 times better price performance. Compared to, let's say, Redshift AQUA: six-and-a-half times faster at half the cost, so 13 times better price performance. And it goes on and on. Now, these numbers I was quoting are for 10 terabyte TPC-H queries. And the point you made is very, very valid. When we are talking about the cost for these other systems, it's only the cost for analytics, without including the cost of the source database or including the cost of moving the data or managing two different databases. Whereas when you're talking about the cost of HeatWave, this is the cost which includes both transaction processing as well as analytics. So, it's a single database; all the cost is included, whereas for these other vendors, it's only the cost of the analytic database. So, the actual cost to a user is probably going to be much higher with these other databases.
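The price-performance multiples quoted in this exchange are just the speedup divided by the relative cost. A quick sketch of that arithmetic, using the speakers' claimed benchmark numbers as inputs (the figures are their claims, not independently verified):

```python
def price_performance(speedup, relative_cost):
    """Aggregate advantage: how much faster, divided by what fraction of the cost."""
    return speedup / relative_cost

# Claimed figures from the conversation:
aurora_4tb = price_performance(1400, 0.5)     # 1400x faster at half the price -> 2800.0
redshift_10tb = price_performance(6.5, 0.5)   # 6.5x faster at half the cost -> 13.0
snowflake_10tb = price_performance(7, 1 / 5)  # 7x faster at one-fifth the cost -> about 35
```

The same formula reproduces the mixed-workload Aurora number: roughly 18x faster at 42% of the cost gives an aggregate of about 42x.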
So, the price performance advantage with HeatWave will perhaps be even higher. Corey: Tell me a little bit about how it works. I mean, it's easy to sit here and say, “Oh, it's way faster and it's better in a bunch of benchmark stuff,” and we will get into that in a little bit, but it's described primarily as an in-memory query accelerator. Naively, I think, “Oh, it's just faster because instead of having data that lives on disk, it winds up having some of it live in RAM. Well, that seems simple and straightforward.” Like, oh, yeah, I'm going to go out on a limb and assume that there aren't 160 patents tied to the idea that RAM is faster than disk. There's clearly a lot more going on. How does this work? What is it, foundationally? Nipun: So, the thing to realize is HeatWave has been built from the ground up for the cloud, and it is optimized for the Oracle Cloud. So, let's take these things one at a time. When I say designed from the ground up for the cloud: we have actually invented and implemented new algorithms for distributed query processing, which is what gives us such a good advantage in terms of operations like join processing, window functions, aggregations. So, we have invented and implemented new algorithms for distributed query processing. Secondly, we have designed it for the cloud. And by that what I mean is, A, we have a lot of emphasis on scalability, that it scales to thousands of cores with a very, very good scale factor, which is very important for the cloud. The next angle about the cloud is that not only have we optimized it for the cloud, but we have gone with commodity cloud services, meaning, for instance, when you're looking at the storage, we are looking at the least expensive price. So, for instance, we use object store; we don't use, for instance, locally attached SSDs because that would be expensive. Similarly, for compute: instead of using Intel, we use AMD chips because they are less expensive.
Similarly, networking: standard networking. And all of this has been optimized for the specific Oracle Cloud infrastructure shapes we have, for the specific VMs we use, for the specific networking bandwidth we get, for the object store bandwidth, and such; so that's the third piece, optimized for OCI. And the last bit is the pervasive use of machine learning in the service. So, a combination of these four things—designed for the cloud, using commodity cloud services, optimized for the Oracle Cloud infrastructure, and finally the pervasive use of machine learning—is what gives us very good performance and very good scale at a very inexpensive price. Corey: I want to dig into the idea of the pervasive use of machine learning. In many cases, machine learning is the answer to “how do I wind up bilking a bunch of VCs out of money?” And Oracle is not a venture-backed company at this stage of its existence; it is a very large, publicly-traded entity; you have no need to do that. And I would also accept that this is one of those bounded problem spaces where something that looks machine-learning-like could do very well. Is that based upon what it observes and learns from data access patterns? Is it something that it learns based on a specific workload in question? What is it gathering, and is it specific to individual workloads that a given customer has, or is it holistic across all of the database workloads that you see in Oracle Cloud? Nipun: So, there are multiple parts to this question. The first thing is—and I think as you're noting—that with the cloud, we have a lot more opportunity for automation because we know exactly what the hardware stack is, we know the software stack, we know the configuration parameters. Corey: Oh yes, hell is other people's data centers, for sure. Nipun: [laugh].
And the approach we have taken for automation is machine-learning-based automation, because one of the big advantages is that we can have a model which is tailored to a specific instance, and as you run more queries, as you run more workloads, the system gets more intelligent. And we can talk about that maybe later—about, like, specific things which make it very, very compelling. The third thing, I think, which you were alluding to, is that there are two aspects in machine learning: data, and the models or the algorithms. So, the first thing is, we have made a lot of enhancements, both to the MySQL engine as well as HeatWave, to collect new kinds of data. And by new kinds of data, I mean that not only do we collect statistics of the data, but we collect statistics of, say, the queries: what was the compilation time? What was the execution time? And then, based on this data which we're collecting, we have come up with very advanced machine learning algorithms—which are, again, a lot of patterns or IP which we have built on top of the existing state of the art. So, for instance, taking these statistics and extrapolating them on larger data sizes—that's completely an innovation which we did in-house. How do we sample a very small percentage of the data and still be accurate? And finally, how do we come up with machine learning models which are accurate without hiring an army of engineers? That's because we invented our AutoML, which is very efficient. So, that's basically the ecosystem of machine learning which we have, which has been used to provide this. Corey: It's easy for folks to sit there and have a bunch of problems with Oracle for a variety of reasons, some of which are no longer germane, some of which are; I'm not here to judge. But I think it's undeniable—though it sometimes gets eclipsed by people's knee-jerk reactions—that the reason Oracle is in so many companies is because it works.
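Stepping back to the statistics-extrapolation idea Nipun described: Oracle's actual models are not public in this conversation, so the following is only a toy illustration of the concept. It fits a least-squares line to (data size, runtime) samples and predicts the runtime at a size that was never measured:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_y - slope * mean_x

def extrapolate_runtime(samples, target_size):
    """Predict query runtime at target_size from (size, runtime) samples."""
    sizes, runtimes = zip(*samples)
    slope, intercept = fit_line(sizes, runtimes)
    return slope * target_size + intercept

# Hypothetical runtimes measured on 1, 2, and 4 GB slices, extrapolated to 100 GB.
predicted = extrapolate_runtime([(1, 1.1), (2, 2.0), (4, 4.2)], 100)
```

A production system would use far richer features (compile time, operator mix, sampling error bounds) and nonlinear models, but the core move is the same: learn from small, cheap measurements and predict the expensive case.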
You folks have been pioneers in the database space for a very long time, and that's undeniable. If it didn't deliver performance that was untouchable for a long time, it would not have gotten to the point where you now are, where it is the database of record for an awful lot of shops. And I know it's somehow trendy, sometimes, for the startup set to think, “Oh, big companies are slow and awful. All innovation comes out of small, scrappy startups here.” But your customers are not fools. They made intelligent decisions based upon constraints that they're working within and problems that they need to solve. And you still have an awful lot of customers that are not getting off of Oracle anytime soon, because it works. It's one of those things that I think is nuanced and often missed. But I do feel the need to ask about the lock-in story. Today, HeatWave is available only on the managed MySQL service in Oracle Cloud, correct? Nipun: Correct. Corey: Is there any licensing story tied to that? In other words, “Well, if I'm going to be using this, I need to wind up making a multi-year commitment. I need to get certain support things, as well,” the traditional on-premises Oracle story. Or is this an actual cloud service, in that you pay for what you use while you use it, and when you turn it off, you're done? In theory. In practice, we know in cloud economics, no one ever turns anything off until the company goes out of business. Nipun: So, it's exactly what you said: this is a managed service. It's pay as you go, you pay only for what you consume, and if you decide to move on, there's absolutely no license or anything that is holding you back. The second thing—and I'm glad you brought it up—is about the vendor lock-in. One of the very important things to realize about HeatWave is, A, it's just an accelerator for MySQL, but in the process of doing so, we have not introduced any proprietary syntax.
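To make the "no proprietary syntax" point concrete: per the MySQL HeatWave documentation, a table is prepared for offload with standard ALTER TABLE statements naming the RAPID secondary engine, and the queries themselves stay plain MySQL. The sketch below only assembles those statements as strings; the table name is hypothetical, and the exact DDL should be checked against the current HeatWave docs before use.

```python
def heatwave_load_statements(table):
    """DDL to mark a table for HeatWave and load it into the cluster."""
    return [
        f"ALTER TABLE {table} SECONDARY_ENGINE = RAPID",  # attach to HeatWave
        f"ALTER TABLE {table} SECONDARY_LOAD",            # copy data into memory
    ]

# After loading, an analytic query is unchanged; the optimizer offloads it
# to HeatWave when the table is loaded and the query shape is supported:
analytic_query = (
    "SELECT o_orderpriority, COUNT(*) AS n "
    "FROM orders GROUP BY o_orderpriority"
)

statements = heatwave_load_statements("orders")
```

Because the query text is unchanged, moving away again only requires dropping the secondary-engine attributes; the application's SQL never picked up anything HeatWave-specific.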
So, if customers have a MySQL application running on some other cloud, they can very easily migrate to OCI and try MySQL HeatWave. But for whatever reason, if they don't like it and they want to move out, there is absolutely nothing holding them back. So, with the same ease with which they can come in, they can walk out, because we don't have any vendor lock-in. There are absolutely no proprietary extensions to HeatWave. Corey: There is the counter-argument as far as lock-in goes, and we see this sometimes with companies we talk to that were considering Google Cloud Spanner, as an example. It's great, and you can use it in a whole bunch of different places and effectively get ACID-compliance-like behavior across multiple regions, and you don't have to change any of the syntax of what you're using, except the lock-in there is one of strategic or software-architecture lock-in, because there's nothing else quite like it in the universe, which means that if you're going to migrate off of the single cloud where that's involved, you have to re-architect a lot, and that leads to a story of lock-in. I'm curious whether you're finding that customers are considering that, given that the performance you're delivering for MySQL querying is apparently unparalleled in the rest of the industry; that leads to a sort of lock-in itself when people get used to that kind of responsiveness and build applications that expect those kinds of tolerances. At some point, if there's nothing else in the industry like it, does that mean that they find themselves de facto locked in? Nipun: If you were to talk about some functionality which we are offering which no one else is offering, perhaps you could, kind of, make that case. But that's not the case for performance, because when we are so much faster—so suppose I said, okay, we are so much faster; we are six-and-a-half times faster than Redshift at half the cost.
Well, if someone wanted the same performance, they can absolutely do it with Redshift on a much larger cluster, and pay a lot more. So, if they want the best performance at the best price, they can come to Oracle Cloud; if they want the same performance but are willing to pay more, they can go anywhere else. So, I don't think that's vendor lock-in at all. That's a value which we are bringing: for the same performance, we are much cheaper. Or you can have that kind of balance, where we are faster and cheaper. So, there is no lock-in. It's not that we have made some extensions to MySQL which are only available in our cloud. That is not at all the case. Now, for some other vendors and for some other applications—you brought up Spanner; that's one. But we have had multiple customers of MySQL who, when they were trying Google BigQuery, mentioned this aspect: that Google BigQuery had these proprietary extensions and they felt locked in. That is not the case at all with HeatWave. Corey: This episode is sponsored by our friends at Oracle. HeatWave is a new high-performance accelerator for the Oracle MySQL Database Service, although I insist on calling it “my squirrel.” While MySQL has long been the world's most popular open source database, shifting from transactions to analytics required way too much overhead and, ya know, work. With HeatWave you can run your OLTP and OLAP, don't ask me to ever say those acronyms again, workloads directly from your MySQL database and eliminate the time-consuming data movement and integration work, while also performing 1100X faster than Amazon Aurora and 2.5X faster than Amazon Redshift, at a third of the cost.
My thanks again to Oracle Cloud for sponsoring this ridiculous nonsense.Corey: I do want to call out, just because it seems like there's a lies, damned lies, and database benchmarks story here where, for example, Azure for a while was doing a campaign where they were five times less expensive for database workloads than AWS until you scratched beneath the surface and realize it's because they're playing ridiculous games with licensing, making it very expensive to run a Microsoft SQL Server on anything that wasn't Azure. Customers are not necessarily as credulous as they once were when it comes to benchmarking. And Oracle for a long time hasn't really done benchmarking, and in fact, has actively discouraged it. For HeatWave, you've not only published benchmarks, which okay, vendors can say anything they want, and I'm going to wait until I see independent returns, but you put not just the benchmarks, but data sets, and your entire methodology onto GitHub as well. What led to that change? That seems like the least Oracle-like thing I could possibly imagine.Nipun: I couldn't take credit for the idea. The idea actually was from our Chief Marketing Officer, that was really his idea. But here is the reason why it makes a lot more sense for us to do it for MySQL HeatWave. MySQL is pervasive; pretty much any cloud vendor you can think about has a MySQL-based managed service. And obviously, MySQL runs on premise, like a lot of customers and applications do it.Corey: That's one of the baseline building blocks of any environment. I don't even need to be in the cloud; I can get MySQL working somewhere. Everyone has it, and if not, why don't you? And I can build it in a VM myself in 20 minutes.Nipun: That's right.Corey: It is a de-facto standard.Nipun: That's right. So, given that is the case and many other cloud vendors are innovating on top of it—which is great—how do you compare the innovation or the value proposition of Cloud Vendor A with us? 
So, for that, what we felt was that it is very important and very fair that we publish our scripts, so that people can run those same scripts with HeatWave as well as with other cloud offerings, and make a determination for themselves. So, given the popularity of MySQL, and given that pretty much all cloud vendors provide an offering of MySQL, and many of them have enhanced it, in order for customers to have an apples-to-apples comparison, it is imperative that we do this. Corey: I haven't run benchmarks myself just yet, just because it turns out there are a lot of demands on my time, and also, as mentioned, I'm not a deep database expert, unless it comes to DNS. And we keep waiting for people to come back with, “Aha. Here's why you're completely comprised of lies.” And I haven't heard any of that. I've heard edges and things here about, “Well, if you add an index over here, it might speed things up a bit,” but nothing that leads me to believe that it is just a marketing story. It is a great marketing story, but things like this fall apart super quickly in the event that they don't stand up to engineering scrutiny. And it's been out long enough that I would have fully expected to have heard about it. Lord knows, if anyone is listening and has thoughts on this, I will be getting some letters after this episode, I expect. But I've come to expect those; please feel free to reach out. I'm always thrilled to do follow-up episodes and address things like this. When does it make sense, from your perspective, for someone to choose HeatWave on top of the Oracle Cloud MySQL service instead of using some of the other things we've talked about: Aurora, Redshift, Snowflake, et cetera? When does that become something that a customer should actively consider? Is it for net-new workloads? Should they consider it for migration stories? Should they run their database workloads in Oracle Cloud and keep other stuff elsewhere?
What is the adoption path that you see that tends to lead to success?Nipun: All customers of MySQL, or all customers of any open-source database, those would be absolutely people who should consider MySQL HeatWave. For the very simple reason: first, regardless of the workload, whether it is OLTP only, or mixed workloads, or analytics, the cost is going to be significantly lower. I'll say at least it's going to be half the cost. In most of the cases, it's probably going to be less than half the cost. So, right off the bat, customers save half the cost by moving to MySQL HeatWave.And then depending upon the workload you have, as you have more complex queries, the performance advantage starts increasing. So, if you were just running only OLTP, if you only had transactions and you didn't have any complex queries—which is very unlikely for real-world applications, but even if that was the case, you're going to save 60% by going to MySQL HeatWave. But as you have more complex queries you will start finding that the net advantage you're going to get with performance is going to keep increasing and will go anywhere from 10 times aggregate to as much as 1400 times. So, all open-source, MySQL-based applications, they should consider moving. Then you mentioned about Snowflake, Redshift, and such; for all of them, it depends on what the source database is and what is it that they're trying to do.If they are moving data from, say, some open-source databases, if they are ETL-ing from MySQL, not only will MySQL HeatWave be much faster and much cheaper, but there's going to be a tremendous value proposition to the application because they don't need to have two different applications for two different databases. They can come back to MySQL, they can have a single database on which they can run all their applications. 
And then again, you have many of these cloud-native applications are born in the cloud where people may be looking for a simple database which does the job, and this is a great story—both in terms of cost as well as in terms of performance—and it's a single database for all your applications, significantly reduces the complexity for users.Corey: To turn the question around a little bit, what sort of workloads is MySQL HeatWave not a fit for? What sort of workloads are going to lead to a poor customer experience? Where, yeah, this is not a fit for that workload?Nipun: None, except in terms of the data size. So, if you have data sizes which are more than 64 terabytes, then yes, MySQL HeatWave is not a good fit. But if your data size is under 64 terabytes, you're going to win in all the cases by moving to MySQL HeatWave, given the functionality and capabilities of MySQL.Corey: I'd also like to point out that recently, HeatWave gained the MySQL Autopilot capability, which I believe is a lot of the machine learning technologies that you were speaking about a few minutes ago. Are there plans to continue to expand what HeatWave does and offer additional functionality? And—if you can talk about any of that. I know that roadmap is always something that is difficult to ask about, but it's clear that you're investing in this. Is your area of investment looking more like it's adding additional features? Is it continuing to improve existing performance? Something else entirely? And of course, we also accept you can't tell me any of [laugh] that has a valid answer.Nipun: Well, we just got started, so we just had our first [GF 00:27:03] HeatWave in December, and you saw that earlier this week we had our second major release of HeatWave. We are just getting started, so absolutely we are investing a lot in this area. But we are pretty much going to attempt all the things that you said. We have feedback from existing customers which is very high up on the priority list. 
And some of these are just one, say, class of enhancements which [unintelligible 00:27:25], can HeatWave handle larger sizes of data? Absolutely, we have done that; we will continue doing that.Second is, can HeatWave accelerate more constructs or more queries? Absolutely, we will do that. And then you have other kinds of capabilities which customers are asking which you can think of are, like you know, bigger features, which for instance, we announced the support for scale-out data storage which improves recovery time. Well, you're going to improve the recovery time or you're going to improve the time it takes to restart the database. And when I say improve, we are talking about not an improvement of 2X or 3X, but it's 100 times improvement for, let's say, a 10 terabyte data size.And then we have a very good roadmap which, I mean, it's a little far out that I can't say too much about it, but we will be adding a lot of very good new capabilities which will differentiate HeatWave even more, compared to the competitive services.Corey: You have very clearly forgotten more about databases than most of us are ever going to know. As you've been talking to folks about HeatWave, what do you find is the most common misunderstanding that folks like me tend to come away with when we're discussing the technology? What is it that is, I guess, a nuance that is often being missed in the industry's perspective as they evaluate the new technology?Nipun: One aspect is that many times, people just think about a service to be here some open-source code or some on-premise code which is being hosted as a managed service. Sure, there's a lot of value to having a managed service, don't get me wrong, but when you have innovations, particularly when you have spent years in years or decades of innovation for something which is optimized for the cloud, you have an architectural advantage which is going to pay dividends to customers for years and years to come. 
So, there is no substitute for that; if you have designed something for the cloud, it is going to do much better whether it's in terms of performance, whether it's in terms of scalability, whether it's in terms of cost. So, that's what people have to realize that it takes time, it takes investment, but when we start getting the payoff, it's going to be fairly big. And people have to think that okay, how many technologies or services are out there which have made this kind of investment?So, what I'm really excited about is, MySQL is the most popular database amongst developers in the world; we spend a lot of time, a lot of person-years investing over the last, you know, decade, and now we are starting to see the dividends. And from what we have seen so far, the response has been terrific. I mean, it's been really, really good response, and we are very excited about it.Corey: I want to thank you for taking so much time to speak with me today. If people want to learn more, where can they go?Nipun: Thank you very much for the opportunity. If they would like to know more, they can go to oracle.com/heatwave where we have a lot of details, including a technical brief, including all the details of the performance numbers we talked about, including a link to the GitHub where they can download the scripts. And we encourage them to download the scripts, see that they're able to reproduce the results we said, and then try their workloads. And they can find information as to how they can get free credits to try the service for free on their own and make up their mind themselves.Corey: [laugh]. Kicking the tires on something is a good way to form an opinion about it, very often. Thank you so much for being so generous with your time. I appreciate it.Nipun: Thank you.Corey: Nipun Agarwal, Vice President of MySQL HeatWave and Advanced Development at Oracle. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. 
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment formatted as a valid SQL query.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Entrepreneurial Thought Leaders
Research Insight: Entrepreneurship Education Is About More than Startup Creation

Entrepreneurial Thought Leaders

Play Episode Listen Later Sep 22, 2021 20:45


In a recent paper, Stanford professor Chuck Eesley and Notre Dame professor Yong Suk Lee observed that formal entrepreneurship education helped Stanford alumni founders raise more funding and scale more quickly than peers who received no formal entrepreneurship training. But entrepreneurship education didn't lead to a higher rate of startup creation itself. What should that finding mean for entrepreneurship educators? In this episode, Eesley poses that question to three thought leaders devoted to training future innovators: Jon Fjeld of Duke's Innovation and Entrepreneurship Initiative, Hadiyah Mujhid of HBCUvc, and Elizabeth Brake of Venture for America. The conversations explore the many ways that entrepreneurship education can impact students and aspiring innovators — even if they never found a company themselves.

Voice of the DBA
Patterns and Potential Problems

Voice of the DBA

Play Episode Listen Later Sep 7, 2021 3:10


I saw a post recently from a developer that needed to refactor and rename a table in a live system. The post describes a pattern for doing so and gives the steps taken, though not the actual code. I like the pattern overall, and I think it can work well in many situations. It's for a PostgreSQL table, so I don't know what restrictions might be different from SQL Server, but this type of pattern can work for SQL Server as well. It also could be problematic. Using the famous "it depends", there could be issues with this pattern, depending on your workload and how your application is structured. The triggers in use could also be an issue in some environments, as they create an additional load. Read the rest of Patterns and Potential Problems
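One common shape for such a live rename (not necessarily the exact steps the post describes; the table and view names here are hypothetical) is to rename the table and leave a compatibility view behind under the old name, so code that has not yet been migrated keeps working. A minimal sketch, using SQLite rather than PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_old (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("INSERT INTO orders_old (total) VALUES (9.99), (19.99)")

# Step 1: rename the table to its new name.
conn.execute("ALTER TABLE orders_old RENAME TO orders")

# Step 2: leave a view under the old name so queries that still
# reference `orders_old` keep working until they are migrated.
conn.execute("CREATE VIEW orders_old AS SELECT * FROM orders")

# Readers using either name now see the same rows.
print(conn.execute("SELECT COUNT(*) FROM orders_old").fetchone()[0])  # 2
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])      # 2
```

As the episode notes, "it depends": in a real system the rename and view creation would need to happen in one transaction, and triggers or heavy write load can complicate the swap.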

The Cloud Pod
131: The Cloud Pod relaxes and has an AWS data brew

The Cloud Pod

Play Episode Listen Later Aug 27, 2021 78:59


On The Cloud Pod this week, everyone's favorite guessing game is back, with the team making their predictions for AWS Summit and re:Inforce — which were not canceled, as they led us to believe last week.                   A big thanks to this week's sponsors: Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. JumpCloud, which offers a complete platform for identity, access, and device management — no matter where your users and devices are located.  This week's highlights

Open Source – Software Engineering Daily
Grouparoo Open Source Data Tools with Brian Leonard

Open Source – Software Engineering Daily

Play Episode Listen Later Aug 26, 2021 50:55


ETL stands for “extract, transform, load” and refers to the process of integrating data from many different sources into one location, usually a data warehouse. This process has become especially important for companies as they use many different services to collect and manage data.  The company Grouparoo provides an open source framework that helps you…
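The three steps in that definition can be shown with a toy pipeline (the sources, record fields, and the list standing in for a warehouse are all invented for illustration):

```python
# Toy ETL pipeline: extract from two "sources", transform into a
# common schema, load into one destination (a list standing in for
# a data warehouse).

def extract():
    crm = [{"name": "Ada", "spend_usd": "120.50"}]
    billing = [{"customer": "Grace", "total_cents": 9900}]
    return crm, billing

def transform(crm, billing):
    # Normalize both sources into one schema: (customer, spend in USD).
    rows = [{"customer": r["name"], "spend": float(r["spend_usd"])} for r in crm]
    rows += [{"customer": r["customer"], "spend": r["total_cents"] / 100} for r in billing]
    return rows

def load(rows, warehouse):
    warehouse.extend(rows)

warehouse = []
load(transform(*extract()), warehouse)
print(warehouse)
# [{'customer': 'Ada', 'spend': 120.5}, {'customer': 'Grace', 'spend': 99.0}]
```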

Screaming in the Cloud
Saving Vowels and Upping Security with Clint Sharp

Screaming in the Cloud

Play Episode Listen Later Aug 25, 2021 33:41


About Clint
Clint is the CEO and a co-founder at Cribl, a company focused on making observability viable for any organization, giving customers visibility and control over their data while maximizing value from existing tools.Prior to co-founding Cribl, Clint spent two decades leading product management and IT operations at technology and software companies, including Splunk and Cricket Communications. As a former practitioner, he has deep expertise in network issues, database administration, and security operations.Links: Cribl: https://cribl.io Cribl sandbox: https://sandbox.cribl.io Cribl.cloud: https://cribl.cloud Jobs: https://cribl.io/jobs Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by Cribl Logstream. Cribl Logstream is an observability pipeline that lets you collect, reduce, transform, and route machine data from anywhere, to anywhere. Simple, right? As a nice bonus it not only helps you improve visibility into what the hell is going on, but also helps you save money almost by accident. Kind of like not putting a whole bunch of vowels and other letters that would be easier to spell in a company name. To learn more visit: cribl.io
Corey: And now for something completely different!Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest this week for this promoted episode is Clint Sharp, the CEO and co-founder of a company called Cribl. Clint, thank you for joining me, and let's get the big question out of the way first: what is Cribl?Clint: Yeah, so Cribl makes a stream processing engine for log and metric data. 
And that sounds really dry and boring, but what it really means is, we help connect, in the observability and security world, lots of log and metric sources, so you can take stuff from anywhere and put it to anywhere. And you can think of it like ETL or you can think of it like middleware; it sits there in this particular space, and it's built for SRE and security people.Corey: Now, I looked into this a little bit previously, and I had a sneaking suspicion when I started kicking a few of the tires on this, that there's probably going to be an economic story of optimization and saving money because of a couple things. One, that's what I do; I pay attention to things that save customers money in the end run, and two, your company's called Cribl—that's C-R-I-B-L. That should probably have another L and certainly, you should buy a vowel to go in there somewhere, but that's someone optimizing but still keeping things intact enough to be understood slash pronounceable. It really does feel like in this space, saving money on vowels is a notable tenet for companies that focus on saving money.Clint: Yeah, so what's interesting about enterprises is they care about money, and then they don't care about money. And so it's a really good way to get a meeting. We definitely do help people save a ton of money, but ultimately, I think what the value people get out of the product is helping connect all the things that they have. And so one of the biggest problems that we see in the spaces is, “Hey, I have all these agents deployed.” Maybe it's Fluentd or Fluent Bit, or Elastic Beats or Splunk's Forwarder.And I want to get this data over to my fancy new data lake, or over to my machine learning and AI systems, and maybe I want to put it on a Kafka Topic, but it's only designed to work with the thing it's designed to work with. So, if I have Beats deployed, it works with Elastic. Okay, great. How do I also use that same data elsewhere? 
And really, that's the big problem that we end up solving for our customers.Corey: It's the many-to-many problem. There's a lot of work that's implemented multiple times in multiple ways; it feels like it's effectively you're logging the same thing 15 different times in 15 different ways.Clint: Well, then you look at the endpoint, and you find, “Oh, hey, we've got, like, eight agents rolled out here,” which is, you know, one from each vendor, they're all collecting the same thing. And then people are like, “Oh, man, this is chewing up a ton of resources and we're spending 20 or 30% of every box just, like, collecting security data and IT data. And couldn't that be better?” And then oh, by the way, each one of those agents has their own security surface area, so you have to make sure that those agents themselves are secure because they're often making outbound connections; they're listening for inbound connections. So, we really kind of help at the edge, help people reuse existing resources.Corey: One thing you said a few sentences ago caught me a little bit off guard and I want to dive into that a little bit. You talked about the observability and security world. Now, every time I talk to folks in one of those two spaces, they're sort of tangentially aware of the other one exists, on some level, but they're always framed as two very distinct universes. And you talk about them as if they're effectively one and the same. Was that intentional?Clint: Well, the data is the same. And it starts there because we're collecting log data, and that log data may go into a SIEM tool, and people are using that to try to understand their security posture, and malicious actors, and threats. Oh, and by the way, that same log data is also used for understanding the performance and availability of your systems. The same type of metric data is used in both, the same type of catalogs that say, hey, what is my inventory, and what assets do I have, and where are they deployed? 
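The many-to-many problem described here reduces, at its simplest, to a routing table between sources and destinations. A minimal sketch (the source names, routes, and destinations below are invented, not Cribl's actual configuration model):

```python
# Minimal observability-pipeline sketch: each event from any source
# is delivered to every destination its route lists.
routes = {
    "beats":   ["elastic", "s3_lake"],
    "fluentd": ["s3_lake"],
}

destinations = {"elastic": [], "s3_lake": []}

def process(event):
    # Fan the event out to all destinations configured for its source.
    for dest in routes.get(event["source"], []):
        destinations[dest].append(event)

process({"source": "beats", "msg": "login failed"})
process({"source": "fluentd", "msg": "disk 91% full"})

print(len(destinations["s3_lake"]))   # 2 -- both sources land in the lake
print(len(destinations["elastic"]))   # 1 -- only Beats data goes to Elastic
```

The point of putting a pipeline in the middle is exactly this decoupling: adding a ninth destination means adding a route, not deploying a ninth agent.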
And all of that is relevant for both sides.And the tooling often ends up being very similar, if not identical. And I used to work in Splunk many years ago; that's a tool that's well known for being popular in both camps. And so I developed this decade-long perspective of like, man, I'd show up and actually, they're sitting right next to each other; there's DevOps—DevSecOps now, which are now trying to marry those things. And so certainly, there's just a ton of overlap.Corey: It's still all just sparkling systems administration, but people fight me on that one.Clint: Oh, yeah. Well, yeah, so SRE is sysadmin plus, plus, plus, plus, plus.Corey: Now, I've told it—what is it, it's SRE if it's in the Mountain View region of Silicon Valley. Otherwise, it's just sparkling DevOps? Yep. Same story. It's from my perspective, we called ourselves sysadmins, and then if we called ourselves DevOps, but, “I know, but DevOps isn't a job title.”Great, but it is a 40% raise so I'm going to be quiet about the purity of titles and take the money was my approach back then. And now there are 10 or 15 different ways you can refer to people who are more or less doing the same job and there's no consistency between company to company in many respects. They almost become buzzwords and trite at some point, but it's easier than trying to have a 15-minute conversation in response to, “So, what do you do at whatever company you work at?”Clint: Well, also the grizzled sysadmin persona very much now a security person as well, right? So, you know, coming out of that sysadmin lineage, now I have to learn a whole bunch of new words, and security very much as a discipline, what I would criticize as saying, is very gatekeeper-y in terms of, “Okay, we're going to come up with their own vernacular so that we know that you're not one of us.” That's one of my big criticisms of security. 
But the skill set, the same people who were sysadmin 20 years ago are definitely becoming security specialists, they're becoming SREs. And so if you share the same lineage, then you're really not all that different.Corey: Well, that's why I launched Last Week in AWS security newsletter podcast combo that just as just recently started launching as of the time that this airs because, “Security is everyone's job,” but strangely, they don't pay everyone like that. And it ties into an entire ecosystem of folks who have to care about security, but the word security doesn't appear in their job title. And most security products seem to be pitched at the executive level where they use the same tired wording that you'll see on airport ads everywhere, or they're talking to InfoSec practitioners—whatever those might look at—and tying into, in some cases, a very hostile community. In other cases, they're talking extensively about the ins and outs of how to overcome and defeat particular attack styles, or the—worst of all worlds—where it just reduces down into compliance and auditing checkboxes, which no one gets super excited about. I'm not interested in any of that.I want to tell stories about, okay, as someone who has other work to get done, what's the security impact of what's happening lately? How do you round it up and distill it down into something useful, instead of something that winds up just acting as a giant distraction and becoming a budget justifier?Clint: Well, security detection, I think, is a really fascinating area. 
You're seeing a lot of consolidation now between traditional SIEM companies that—Splunk would be in there, but then you've got newer players like Exabeam, you got newer players like CrowdStrike who are coming from the EDR space, and they're coming very strongly and saying, “Hey, look, I own the endpoint but really what I need to be able to do is analyze all this data.” And that's where really these things are combining because tell me that XDR is not fundamentally the same—like, I keep using the word lineage, but the same type of product that I was building a SIEM from before. And most people I talked to are having a really hard time. Like, “What's the difference between XDR and SIEM? Aren't these things largely the same?”But at the same time, then when you look at observability, it's the same problem; I need to be able to ask and answer arbitrary questions of data. And security detection is fundamentally the same problem, I have all this data that's being egressed from my complex systems, all my endpoints, all of my VMs, my containers, all of my infrastructure, all my applications, and I need to be able to detect when someone is doing something wrong, like, some malicious actor is doing something wrong. Tell me that's not observability.Corey: Of course it is. And the same problems apply to both where, if I have something happened in my application and my observability tooling doesn't tell me for 20 minutes, that's kind of a problem in the same way that you have that in the security space. Yet somehow, AWS's CloudTrail takes about that, on average, to wind up surfacing various things that are happening in the environment. In many cases, the entire event can be over by the time CloudTrail says, “Hey, there's a thing going on.” For those who aren't familiar, CloudTrail effectively captures management events that happen talking to the AWS APIs.So, someone creates something, someone accesses something, et cetera, et cetera. 
That's useful when you need that, but if you're going to take action based on that, you want to know sooner rather than later. Same story with any sort of monitoring tool that, “Oh, yeah, the site's taking an outage and our system will let us know in only 20 short minutes.” Oh, I assure you customers will tell us long before then.Clint: That's sort of dovetails into some of the things that we see in the marketplace that we help with which are—talk about CloudTrail, people say all data is security relevant but I have to pay for all that data, too, so that data has to go somewhere. Do I care about every cloud—of course, I don't care about every CloudTrail event; I care about some subset of those.Corey: And honestly, in the full sweep of time, you really care about that one specific CloudTrail thing, but it's the needle in the haystack.Clint: And so AWS, this is a constant conflict between people who have to observe and secure systems need all the data because I may not know in advance what question I want to ask, but at the same time, I do know that not all of that is necessarily interesting right now, and so there's a fundamental tension between, okay, the developer says, “Well, look. You can't ask a question of data that's not there, so I'm going to put everything in the log. Literally every byte of data, everything that I could ever think of, I'm going to put in that log.”And then the receiver of that says like—I'll give a good example. We've been talking about EDR. CrowdStrike EDR logs, phenomenal data source, have a ton of really interesting information about the security of your endpoints, and they also have an extra 100 fields that nobody gives a crap about. So, what do I do with that data? Do I pay to ingest all that data because all my vendors are charging me based off the bytes of data that are going into their platforms? 
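That "strong opinion on what good data is" comes down, in the simplest case, to a predicate applied to each event before paying to ingest it. The sketch below uses CloudTrail-style eventName fields, but the allowlist itself is a made-up example:

```python
# Keep only the events we consider security-relevant; everything else
# is dropped (or, in practice, routed to cheap object storage instead).
INTERESTING = {"ConsoleLogin", "CreateUser", "DeleteTrail"}

events = [
    {"eventName": "DescribeInstances", "awsRegion": "us-east-1"},
    {"eventName": "ConsoleLogin", "awsRegion": "us-east-1"},
    {"eventName": "DeleteTrail", "awsRegion": "eu-west-1"},
]

kept = [e for e in events if e["eventName"] in INTERESTING]
print([e["eventName"] for e in kept])  # ['ConsoleLogin', 'DeleteTrail']
```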
And so there's a real optimization potential there to have a really strong opinion on what good data is.Corey: Part of the problem, too, is that you absolutely want the totality of everything captured around the specific event you care about. But by and large, we've all been in environments where we have a low-traffic app, and we see giant piles of web server logs. “Okay, great. Let's take a look at what those web server logs are.” And by volume, it's 98% load balancer health checks showing up.It seems to me there might either be a way to strip them out entirely or alternately express those in a way that is a lot more compact and doesn't fill things out. I still feel like there's some terrible company somewhere where their entire way of getting signal from noise is to pay a whole bunch of interns to read the entire log by hand. I like to imagine that is me speaking hyperbolically, but I'm kind of scared it's not.Clint: Yeah. And then the question is, well, then how do I achieve a goal of actually getting the right data to the right place? So, that's something that we help out about. I think that the—I feel a lot for the persona of this kind of sysadmin, this type of security person because they're caught in this tension: like, do I go write code? My skill set as an SRE or my skill set as a security person is being an expert in the data itself.I know that event is good, and I know that event is bad. Am I also supposed to be a person who then needs to go write a bunch of pipelines and Lambda functions, and how do I actually achieve the goal because there's always way more demand than there is capacity to be able to onboard all of this data. So fundamentally, how do we get the right thing to the right place?Corey: That's, on some level, a serious problem. 
I will say that looking at what you do and how you do it, you take a whole bunch of different disparate data sources, and then effectively reduce all of those into passing through the Cribl log stream, and then sending the data out to exactly where it needs to go. And I have to imagine that when you talk about what you're doing to typical VCs and whatnot, their question is, “Ah, but what if AWS launches a thing to do that?” To which I can only assume that your response must have been, “You're right, if AWS does learn to speak coherently and effectively across all of their internal service teams, we're going to have a serious problem.” At which point, I can only imagine that your VCs threw back their heads, you shared a happy laugh, and then they handed you another $200 million, which you have just raised. Congratulations, by the way.Clint: Thank you so much. It's, you know, people say a lot of times in startup-land, like, “Oh, we shouldn't celebrate the fundraising.” I'll tell you, as a person who's done it a few times, I celebrate. That's a shitload of work.Corey: Oh, absolutely. I looked into it in the very early days of, okay, as I'm building out what would become The Duckbill Group, do I talk to VCs and the rest? And I did a little bit of investigation, and it's, wow, that it's so much work to build the pitch deck and have all the meetings and wind up doing all of that. I'd rather just go and sell things to customers and see how that works. And oh, that turned out to raise money that I don't have to repay.Okay, that seems like a different path. And there are advantages and disadvantages to every approach you can take on this. I mean, yeah, no shade here on how you decide to build out a technology company using VC-backed up resourcing, which is a sensible way to do it, but it's a different style. And the sheer amount of work that very clearly goes into raising a fundraising round is just staggering to me. 
And that's for seed-level rounds; I can only imagine down the path. This is not your first round.Clint: Yeah, I mean, it's a validation, I think, of where we're going, and really, kind of, our vision because we've been talking a lot about how data moves, but I think one of the other key concepts that we're advocating for that there's a net-new concept in the industry is this concept of an observability lake. And back to that tension of there's always way more data, S3 as an example provides excellent economics, but very few people provide a way for you to use just raw data that I end up going and dumping into S3. And that's really the fallback for it. Like, if I don't know what to do with this data, I don't want to delete it because what if it becomes security relevant? Let's talk about the SUNBURST SolarWinds attack.Everybody in the industry wishes that they had every flow log, every log from every endpoint dating back two or three years so that they could actually go do a detailed investigation of, “Okay. That SolarWinds box got breached, and what all was it talking to?” And they can actually build a graph from that and go understand that. But most people have deleted all that data. They've decided that I can't afford to have it anymore.And so really, this concept of a lake is like, well, look, I can finally at least put it somewhere as an insurance policy and make sure that's actually going to be relevant. And then eventually what's going to be happening is people are going to go help you make use of that data—and we will as well—be going out there to help you take petabytes and petabytes and petabytes of logs data, metric data, trace data, observability data and give you the ability to analyze that effectively.Corey: My constant complaint about the term ‘data lake'—because I've seen this happen in various client environments, AWS will release something that specifically targets data lakes, and I'll talk to my client about that service. 
“This is a data lake solutions, but it would be awesome.” And they look at me like I'm very foolish and say, “Yeah, we don't have a data lake.” To which my response is, “Great. What's that eight petabytes of data sitting in S3?” “Oh, it's mostly logs.”And I don't think that they're foolish, I don't think I'm foolish, but very often talking to folks who have data lakes do not recognize what they have as being a data lake because that feels almost like it's a marketing term that has been inflicted on people. Like, they would consider it—because we all consider it this way—as more of a data morass. You're not really sure what's in there; you're told by your data science teams, who are incredibly expensive, that one day we'll unlock value in all of those web server logs, the load balancer health checks dating back to 2012, but we just don't know what that is yet. But do you really want to risk deleting it? And it becomes this, effectively, deadstock that sits there.So, you want to retain it, particularly if you have compliance obligations. There's—theoretically at least—business value locked up in those things and you need to be able to access that in a reasonable way. And anytime I see tooling that winds up billing based upon amount of data stored in it, so just cut retention significantly. It feels like it cuts against the grain of what they're trying to do.Clint: I mean, yeah, retention, I mean, especially for security people—this is the difference between security and operations because operations is like, “Last 24 hours a data, I need. Pretty much after that, give me some aggregated statistics and I'm good.” Security people want full-fidelity data dating back years. But I think one of the other important concepts that we haven't seen in the industry, and part of what we're trying to change is, you know, I put data into a tool today. It's that tool's data, right?So—and it doesn't matter which tool it is that I'm put—they're all the same. 
But fundamentally, I put data into a metrics or time-series database and put data into a logging tool, and that data is now owned by that vendor. And the big difference that we see in the concept of a lake is raw data at rest in S3 buckets—or other object storage depending on your cloud provider, depending on who, on-prem, is providing you that interface—in a way in which I can choose in the future what tool I'm going to use to analyze that, and I'm no longer locked in. And I think that's really what we've been trying to advocate as an industry is that every enterprise I've talked to has everything. They've got one of every single tool and none of them are going away.

There is no such thing as a single pane of glass; that's a myth that we've been talking about for 30 freaking years and it's just never actually going to happen. And so really, what you need to be able to do is integrate things better and just make sure that people can actually use the tool that they want to use to analyze the data in the way that they see fit, and not be bound by the decision that was made six months ago as to which tool to put it in.

Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org, in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It's an awesome approach. I've used something similar for years. Check them out. But wait, there's more. They also have an enterprise option that you should be very much aware of: canary.tools.
You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It's awesome. If you don't do something like this, you're likely to find out that you've gotten breached, the hard way. Take a look at this. It's one of those few things that I look at and say, "Wow, that is an amazing idea. I love it." That's canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I'm a big fan of this. More from them in the coming weeks.

Corey: I can tell this story—why not. I don't imagine it was your direct fault, but nine years ago, now—so I should disclaim this. I am not even suggesting this is the way it is today. I was at a startup and we reached out to Splunk to look at handling a lot of our log analysis needs because it turned out we had a bunch of things that were spewing out logs. Nothing compared to what most sites look at these days, but back then, for us, it felt like a lot of data.

And we got a quote that was more than the valuation of the company at the time. Because it seems like their biggest market headwind at the time was the rise of democracy basically making monarchies go out of fashion, and there were fewer princesses that we could kidnap for ransom in order to pay the Splunk bill. And, to their credit, they reached out every quarter and said, "Oh, have your needs changed any?" "No, we have not massively inflated the value of this company so we can afford your bill.
Thank you for asking."

But the problem that I had is when I pushed back on them on this—because it's not just one of those make-fun-of-it-and-move-on stories, because Splunk was at the time very much the best-of-breed answer here—their response was, "Oh, just go ahead and log less and that brings your bill back into something that's a lot more cohesive and understandable."

Clint: Which destroys the utility of the whole tool to begin with.

Corey: Exactly. The entire reason to have a tool like that is to go through vast quantities of data and extract meaning from it. And if you're not able to do that because you have less data, it completely defeats the value proposition of what it is you're bringing to the table. Because in the security space, in many ways in the observability space, and certainly in my world of the cost optimization space, it's an optimization story. It does not speed your time to market, it does not increase revenue in almost every case, so it's always going to be a trailing function behind things that do.

Companies are structured top to bottom in order to increase revenue and enter new markets with the right offerings at the right times and serve customers because that can massively increase the value of the company. Reduction and, I guess, the housekeeping stuff is things people get really excited about for short windows of time and then not again. It's inconsistent.

Clint: Yeah, about every time the bill comes due is when they get really excited about it.

Corey: Exactly. And I have to assume on some level, this was one of those, "Okay, first start using it. You'll see how valuable it becomes, and then you'll start logging more data." But it didn't feel right because it's either being disingenuous, or it's saying that, "Oh, don't worry. You'll find the money somehow."

Which is not true in that scenario. Now, they've redone their pricing multiple times since then.
There are other entrants in the market that help us look at data in a bunch of different ways, but across the board, it's frustrating seeing that there are all these neat tools that I wanted to use and I was perfectly positioned to use back then, and now nine years later, when someone says, "Oh, we use Splunk." My immediate instinctive reaction is, "Oh, wow. You must have a lot of money to spend on services." Which is not necessarily even close to reality in some cases, but first impressions like that really stick around a long time.

Clint: Oh, absolutely. They stick around often because they're reaffirmed multiple times throughout [laugh] people's continued interactions. And I think there's just really a fundamental tension in the marketplace where the value proposition is massive amounts of data. And massive is different, depending on the size of your organization: if you're a big Fortune 100, massive might be, you know, 100 petabytes at rest and a petabyte a day of data moving; or for you, massive might be a terabyte a day moving, and maybe 50 terabytes at rest. But—and by the way, that's not going down.

So, some of the bigger trends that we're seeing with the advent of zero trust, with the advent of remote work, with just in general growth of cloud containerized workloads, microservices, people are seeing a lot more data today than they were seeing two years ago, three years ago. And by the way, it's not like IT went from 2% of the budget to 10% of the budget. The budget's the same, so I got to do more with less. And it's a tension between data growth and cost and capacity.
And so we got to get smarter.

Corey: I like the fact that you're saying that you have to get smarter as you think about this from a tool perspective of being able to serve your customers, as opposed to a lot of tooling out there that seems to inherently and intrinsically take the world view—and I don't know if this is an actual choice or just an unfortunate side effect—of, "Yeah, we have to educate our customers because right now, our customers are fairly dumb and we'd like it if they were smarter. If you were smart enough to appreciate how we do things, then things will go super well." And I always found that to be a condescending attitude that doesn't serve customers super well. And it also leaves a lot of money on the table because for better or worse, you have to meet customers where they are: at their level of understanding, at their expression of the problem. And I've talked to a number of folks over at Cribl and, similar to certain large cloud providers, one of the things that you focus on is the customer; it's clearly a value of the company. How do you think about that?

Clint: I a thousand percent agree with you. And for us, what I found after having been a practitioner for a decade and then working my way over to the vendor side, it's really nothing specific about one particular employer. Being a vendor is so complex. There's all these things that you're trying to con—you have investors, and you have the press, and analysts, and you have people who are constantly trying to influence where it is that you're—"I need to be in the upper right of the Gartner Magic Quadrant, so I have to make sure that those analysts really believe what it is that I'm saying." And then pretty soon, just nobody even talks about the customer anymore.

It's like, well, do people actually want to buy it? Is this thing actually solving real problems?
And so from the beginning, me and my co-founders, we just wanted to make sure that the concept of the customer was embedded at the core of the company. And every time that an employee at Cribl is interacting and talking about what should we do next, and what features should we build, and how should we market, and how should we sell, let's make sure the customer is there. Customers first always is the value, including in how we sell.

We actively leave money on the table when it's not in the customer's right interest because we know that we want them to come back and buy from us again, later. When we market, we try to make sure that we're speaking to our customers in a language that is their language. When we're building a product, we use the product, we try to make sure that this is actually everyday, we don't look at, hey, it needs to look like this and have these features to meet these criteria and be called this. It's just like, "Well, does it actually help the customer solve a real problem for them? If so, let's build it. And if not, then who gives a [BLEEP]?"

Corey: Exactly. It's understanding what your customers' pain points are. I mean, I ran into some similar problems when I was starting my consultancy where I—it turns out that I knew people who were more or less top of their class when it came to AWS bill understanding, reconciliation, and the rest. And those are the people I reached out to because I assumed that they knew what they were doing. There must be lots of people like them; everyone must be like these folks.

And I talked to them about how they looked at their AWS bill. And they said, "I'd love to hire you to come in and do this as a consultant, but I would expect this, this, this, this, and this." And, "Okay, I better come loaded for bear." And so I did. And it turns out there's a lot more people out there who have never heard of a savings plan or a reserved instance before or, "Wait.
You mean it continues to charge me even after I'm done using it if I don't turn it off?" Yes, that is generally how it works.

There's nothing wrong with that level of understanding of these things—well, there are several things wrong but that's beside the point—but understanding where folks are and understanding how you can meet them where they are and get them to a better place is way more important than trying to prove that I'm the smartest kid in town when it comes to a lot of the edge cases, and corner cases, and nuanced areas. And so many tools seem to have fallen in love with their own tooling, and in love with how smart they are, and how clear their lines of thought leadership are, that they've almost completely forgotten that there are people in the world who do not think like that, who do not have the level of visibility or deep thought into the problem space; they just know that the logs are unmanageable, or the bill for this thing is really expensive, or whatever their expression or experience of that problem is. There are tools out there that can help them, but all of the messaging, all of the marketing distills down to, "Oh, you must be at least this smart to enter," like it's an amusement park ride with a weird sign.

Clint: Software is fundamentally a people business, and when you end up implementing a tool—what's become fascinating to me as I've become the CEO of this company, rather than just kind of a product guy, so now I've had to sell it and I've had to market it, and I had to start very much from scratch, is that this stuff doesn't just get implemented by magic; even if they download the tool and it is the easiest-to-use tool that you've ever used, they still don't have the time to learn all the details and intricacies of your product, and so hey, they actually want some professional services people to come and go install that; they want a salesperson to help them understand the value.
I know a lot of people, especially coming from my background in, like, SRE or sysadmin from when I was doing it, kind of, "Oh, salespeople." But, like, they do a real job; they help you articulate the value of this thing so that your bosses understand what you're actually buying. The sales engineers help you understand what those features are. And so having a customer-aligned company means that every interaction that they have with you needs to be a really, really great interaction so that they want to interact with you again because fundamentally, even though the bits are really awesome and they solve this really awesome technology challenge, nobody really cares about it.

Ultimately, they're buying from people, they're implementing software built by people, and they're calling for support—which is another important part—from people who fundamentally care about them as well. So, in every interaction, fundamentally software is a people business, and you got to have the best people and the people that care.

Corey: I wish more people took that philosophy because, frankly, it's missing from an awful lot of different expressions of what companies do. It's oh, if we can make the code just a little bit smarter, a little bit more predictive, then we never have to talk to the customer at all. It's, "No. You shouldn't write a line of anything before doing a whole bunch of customer research to validate that your understanding of the problem space aligns with theirs."

Clint: A good way to find out that doesn't work is to fail for a while, too. So, [laugh] so we did our fair share of that, too, and kind of pontificating and trying to figure out what we thought was best at the market, and it turned out that really what you needed to be able to do was to work closely with customers and understand their problems and tightly pair that sales cycle, that marketing messaging, that product all towards customer pain.
And if you do that, customers are great because they see the people who care, and they will reward you by becoming your customer and continuing to advocate for you and talk about you. And it's so rewarding if you can take the right perspective.

Corey: So, we've covered a fair number of things: your philosophy on the world of security versus observability; we've talked about meeting customers where they are; we've talked about AWS being so inept at communicating internally and cross-functionally that you're able to raise staggeringly large rounds, and we've talked about, I guess, how we wind up viewing the world of log collection, for lack of a better term. If people want to learn more about what you're up to, and how you get there, where can they find you?

Clint: Yeah, go to cribl.io. If you're a hands-on product person and you just want to see what we do, you can go to sandbox.cribl.io. And there's an online learning course, takes about an hour, walks you through the product. We'd love for you to try it.

Corey: Oh, I don't have to speak to a salesperson?

Clint: No, you don't have to talk to anybody. You can download the bits, you can try our cloud product for free at cribl.cloud. We are all about making sure that engineers can get access to the product before you have to talk to us. And if you think that's valuable, if this helps you solve a problem, then and only then should you engage with us and we'll see if we can figure out a way to sell you some software.

Corey: Customer-focused. I'm also going to take a spot check here. I'm going to guess that given your recent funding news, you're also aggressively hiring.

Clint: We are hiring across every function, and if you are interested in working for our customers-first software company and this sounds refreshing, please check out cribl.io/jobs; we're hiring everywhere.

Corey: I can endorse.
We used to hang out, back before you wound up starting this place, and you were kicking around this idea of, "I have an idea for a company," and my general perception is, "Eh, I don't know. Doesn't sound like it has legs to me." And well, here we are. I sure can pick them. Badly. Clint, thank you so much for taking the time to speak with me.

Clint: Thanks, Corey. It's been a pleasure.

Corey: Clint Sharp, CEO and co-founder of Cribl. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment telling me exactly why I'm wrong about the phrase 'data lake' and tell me how many petabytes of useless material you have sitting in S3.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
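The observability-lake pattern Clint describes in this episode, raw events landed at rest in an open format so any future tool can read them, can be sketched roughly as follows. This is a minimal illustration: the bucket layout, key convention, and field names are assumptions for the example, and the actual upload to object storage (for instance with boto3's `put_object`) is only indicated in a comment.

```python
import json
from datetime import datetime, timezone

def partition_key(source: str, event_time: datetime) -> str:
    """Build a date-partitioned object key. The Hive-style year=/month=/day=
    layout is an illustrative convention, not something any vendor requires."""
    return (
        f"raw/{source}/year={event_time:%Y}/month={event_time:%m}/"
        f"day={event_time:%d}/events-{event_time:%H%M%S}.ndjson"
    )

def to_ndjson(events: list) -> bytes:
    """Serialize events as newline-delimited JSON, an open format that a
    later analysis engine of any kind can parse line by line."""
    return "".join(json.dumps(e, sort_keys=True) + "\n" for e in events).encode()

# In a real pipeline the payload would be written to object storage, e.g.
# s3.put_object(Bucket="my-observability-lake", Key=key, Body=payload);
# here we only construct the key and the payload.
ts = datetime(2021, 12, 14, 8, 30, 0, tzinfo=timezone.utc)
key = partition_key("flowlogs", ts)
payload = to_ndjson([{"src": "10.0.0.5", "dst": "203.0.113.9", "bytes": 4096}])
```

The point of the sketch is the decoupling: because the data at rest is plain newline-delimited JSON under predictable keys rather than a vendor's internal format, the choice of query engine can be deferred for years, which is exactly the insurance-policy argument made above.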

Building the Backend: Data Solutions that Power Leading Organizations

Travis welcomes Saket Saurabh to his podcast, who provides a window into the world of data management and the self-service options that are democratizing it. Co-founder and CEO of Nexla, Saket has a passion for data and infrastructure and for improving its flow among partners, customers, and vendors. Nexla automates various data engineering tasks, intelligently creates an abstraction of data, and enables collaboration among people at different skill levels. Named a 2021 Cool Vendor by Gartner, Nexla is a leader in data preparation, integration, and tracking.

Top 3 value bombs:
1. Data architectures overall need to be more abstract to enable future flexibility.
2. The first stumbling block for most organizations is not knowing where to locate their data.
3. ETL is dead. The ELT model has become central while streaming and real-time use cases are becoming prevalent.
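The ELT claim in the notes above can be illustrated in miniature: extract and load the raw records into the warehouse first, untouched, and only then transform them with SQL inside the warehouse. In this hedged sketch, sqlite3 stands in for the warehouse, and the table and column names are invented for the example.

```python
# A minimal ELT sketch: load raw data first, transform in-database afterward.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents TEXT)")

# Load: raw strings land exactly as extracted, with no upfront transformation.
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [(1, "1250"), (2, "399"), (3, "1250")],
)

# Transform: done in SQL, inside the "warehouse", after loading.
conn.execute(
    """CREATE TABLE orders AS
       SELECT id, CAST(amount_cents AS INTEGER) / 100.0 AS amount_usd
       FROM raw_orders"""
)
total = conn.execute("SELECT SUM(amount_usd) FROM orders").fetchone()[0]
```

The design point is the one the episode makes: because the raw table is preserved, the transformation can be rewritten later without re-extracting from the source, which is what makes ELT more flexible than transform-before-load ETL.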

Raw Data By P3
Greg Beaumont


Aug 17, 2021 • 82:45


We didn't know what to expect when we sat down with Greg Beaumont, Senior Business Intelligence Specialist at Microsoft, who specializes in serving the technical Power BI needs of Microsoft's healthcare customers. What we got was an insightful, delightful, and impactful conversation with a really cool and smart human!

References in this Episode:
- The Game
- Azure Health Bot
- The Future Will Be Decentralized - Charles Hoskinson
- Spider Goats

Episode Timeline:
- 3:10 - The magic of discovery with the Power Platform, It's all about the customers (and Greg has a LOT of customers!), and Greg's Data Origin Story
- 21:10 - The IT/Business Gap, Getting good BI and keeping data security is a tricky thing, The COVID Challenge hits Healthcare
- 43:00 - Power BI - Not just a data visualization tool, a very cool discussion on Genomics and using data to save lives, the importance of Data Modelling
- 59:10 - The Bitcoin Analogy, The VertiPaq Engine, and when is Direct Query the answer
- 1:08:30 - We get a little personal with Greg, Azure/Power BI integration and Machine Learning, Cognitive Services and Sentiment Analysis

Episode Transcript:

Rob Collie (00:00:00): Hello, friends. Today's guest is Greg Beaumont from Microsoft. Like one of our previous guests, hopefully, Greg has one of those interface jobs. The place where the broader Microsoft Corporation meets its customers at a very detailed and on-the-ground level. On one hand, it's one of those impossible jobs. More than 100 customers in the healthcare space look to Greg as their primary point of contact for all things technical around Power BI. That's a tall order, folks. And at the same time, it's one of those awesome jobs. It's not that dissimilar, really, from our job here at P3. Rob Collie (00:00:45): In a role that, first of all, you get broad exposure to a tremendous number of organizations and their problems, you learn a lot super, super quickly. When you're doing it right, your work day is just nonstop magic.
The power platform is magic and not really because of the technology, but instead because of its impact on the people who use it, who interact with it, who benefit from it, whose lives are changed by it. And again, I can't stress this enough, software usually doesn't do this. And as we talked with him, Krissy and I just couldn't stop nodding, because we could hear it, he lives it, just like we do. And I hope that just leaps out of the audio for you like it did for us. Rob Collie (00:01:32): No surprises here, Greg didn't start his life as a data professional. He's our second guest on this show, whose original training was in biology. And so, some familiar themes come back again, that good data professionals come from a wide variety of backgrounds, that the hybrid tweeners between IT and business are really where the value is at today. And I love this about Greg, that we made a point of talking about how much easier it is today to break into the data profession than it's ever been and what an amazing thing that is to celebrate. Rob Collie (00:02:06): We talked about COVID and specifically its impacts on the industry. How that has served as a catalyst for many organizations to rethink their analytic strategy, the implications of remote work, data privacy and security. And of course, it wouldn't be an episode of Raw Data, if we didn't nerd out about at least one thing. So, we get a little bit into genomics and the idea of DNA and RNA as forms of biological computer code. And as you'd expect, and want, Greg is far from a one dimensional data professional, just such an interesting person, authentically human, a real pleasure to speak with, so let's get into it. Announcer (00:02:47): Ladies and gentlemen, could I have your attention, please. Rob Collie (00:02:51): This is the Raw Data by P3 adaptive podcast with your host, Rob Collie. Find out what the experts at P3 Adaptive can do for your business. Just go to p3adaptive.com. 
Raw Data by P3 Adaptive is data with the human element. Rob Collie (00:03:13): Welcome to the show, Greg Beaumont. How are you? Greg Beaumont (00:03:17): I'm doing well. How are you all? Rob Collie (00:03:19): I think we're doing pretty well. Greg Beaumont (00:03:19): Awesome. Rob Collie (00:03:20): Business is booming. Data has turned out to be relatively hot field, but I think it's probably got some legs to it. And the Microsoft platform also, well, it's just kind of kicking ass, isn't it? So, business wise, we couldn't be better. I think personally, we're doing well, too. We won't go into all that. What are you up to these days? What's your job title and what's an average day look for you? Greg Beaumont (00:03:39): So, I'm working in Microsoft and my title is Technical Specialist. And I'm a Business Intelligence Technical Specialist, so I focus almost exclusively on Power BI and where it integrates with other products within the Microsoft stack. Now, I'm in the Microsoft field, which is different from a number of guests you've had, who work at corporate and we're working on the product groups, which is that I'm there to help the customers. Greg Beaumont (00:04:01): And you hear a lot of different acronyms with these titles. So, my role is often called the TS. In the past, it was called a TSP. It's just a change in the title. Sometimes you might hear the title, CSA, Cloud Solution Architect. It's very similar to what I do, but a little bit different. But effectively from an overarching standpoint, our goal in the field as Technical Specialists is to engage with customers, so that they understand how and where to use our products, and to ensure that they have a good experience when they succeed. Rob Collie (00:04:29): Your job is literally where the Microsoft organism meets the customers. Greg Beaumont (00:04:34): Yep. Rob Collie (00:04:35): That's not the role I had. I was definitely on the corporate side, back in my days at Microsoft. 
I think the interaction between the field and corporate has gotten a lot stronger over the years. I think it's a bit more organic, that interplay, that it used to feel like crossing a chasm sort of thing. And I don't think that's really true anymore. Greg Beaumont (00:04:54): Agreed, I think that's by design, too. So, with the more frequent release schedules and also kind of how things have changed under Satya, customer feedback drives the roadmap. So when these monthly updates come out, a lot of it is based off of customer demand and what customers are encountering and what they need. So, we're able to pivot and meet the needs of those customers much more quickly. Rob Collie (00:05:15): Yeah, you mentioned the changing acronyms, right? I mean like yes. My gosh, a thousand times yes. It's almost like a deliberate obfuscation strategy. It's like who's what? Why did we need to take the P off of TSP? I mean, I'm sure it was really important in some meeting somewhere, but it's just like, "Oh, yeah, it's really hard to keep track of." It's just a perpetually moving target. But at the same time, so many fundamentals don't change, right? The things that customers need and the things that Microsoft needs to provide. The fundamentals, of course, evolving, but they don't move nearly as fast as the acronym game. Greg Beaumont (00:05:52): Right. I think that acronym game is part of what makes it difficult your first year here, because people have a conversation and you don't know what they're talking about. Right? Rob Collie (00:06:00): Yeah, yeah, yeah. Greg Beaumont (00:06:00): And if they just spelled it out, it would make a lot more sense. Rob Collie (00:06:03): Krissy was talking to me today about, "Am I understanding what Foo means?" There's an internal Microsoft dialect, right? Krissy was like, "Is Foo like X? Is it like a placeholder for a variable?" I'm like, "Yes, yes." She's like, "Okay. That's what I thought, but I just want to make sure."
Krissy Dyess (00:06:18): That's why there's context clues in grade school really come into play when you're working with Microsoft organization, because you really got to take in all the information and kind of decipher it a bit. And those context clues help out. Greg, how long have you been in that particular role? Has it been your whole time at Microsoft or are have you been in different roles? Greg Beaumont (00:06:36): So, I should add, too, that I'm specifically in the healthcare org, and even within healthcare, we've now subspecialized into sub-verticals within healthcare. So, I work exclusively with healthcare providers, so people who are providing care to patients in a patient care setting. I do help out on a few other accounts, too, but that's my primary area of responsibility. Greg Beaumont (00:06:55): So, I started with Microsoft in 2016. I was actually hired into a regional office as what's called the traditional TSP role and it was data platform TSP. So, it was what used to be the SQL Server TS role. A few months later, the annual realign happened, I got moved over to Modern Workplace because they wanted to have an increased focus on Power BI, and I had some experience in that area. Plus, I was the new guy, so they put me into the experimental role. A year later, that's when they added the industry verticals and that's when I moved into what is kind of the final iteration of my current role. And the titles have changed a few times, but I've effectively been in this role working with healthcare customers for over four years now. Rob Collie (00:07:35): And so, like a double vertical specialization? Greg Beaumont (00:07:37): Yeah. Rob Collie (00:07:37): Healthcare providers, where there's a hierarchy here? Greg Beaumont (00:07:40): Yeah, yeah. 
Rob Collie (00:07:41): Those are the jaw dropping things for me is sometimes people in roles like yours, even after all that specialization, you end up with a jillion customers that you're theoretically responsible for. Double digits, triple digits, single digits in terms of how many customers you have to cover? Greg Beaumont (00:07:58): I'm triple digits. And that is one of the key differences from that CSA role that you'll see on the Azure team is they tend to be more focused on just a couple of customers and they get more engaged in kind of projects. And I will do that with customers, but it's just, it's a lot more to manage. Rob Collie (00:08:14): Yeah. What a challenging job. If you think about it, the minimum triple digit number is 100, right? So, let's just say, it's 100 for a moment. Well, you've got 52 weeks a year plus PTO, right? So, you're just like, "Okay." It is very, very difficult to juggle. That's a professional skill that is uncommon. I would say that's probably harder than the acronym game. Greg Beaumont (00:08:37): Yeah, there's been times I was on a vacation day and I got a call. I didn't recognize the number. I'm like, "Okay, I'm going to have to route this to somebody because I'm off today." And they're like, "Well, I'm the VP of so and so and we need to do this." And I'm like, "Okay, I got to go back inside and work now, because this is an important call." So, you have to be flexible and you're correct, that it makes it a challenge to have that work-life balance also, but the work is very rewarding, so it's worth it. Rob Collie (00:09:01): Yeah. It's something that vaguely I have a sense of this. I mean, transitioning from corporate Microsoft to, I mean, you can think of my role now as field. I'm much, much closer to the customers than I ever was at corporate. And yes, Brian Jones and I talked about it a little bit. 
And this is a bit of an artifact of the old release model that it was like every few years, you'd release a product, which isn't the case anymore. But that satisfying feeling of helping people, like even if you build something amazing back at Microsoft in the days that I was there, you were never really around for that victory lap. You would never get that feedback. It would never even make it to you. Rob Collie (00:09:37): It was years later and muted, whereas one of the beautiful things about working closely with customers and our clients with Power BI, and actually the Microsoft platform as a whole, is just how quickly you can deliver these amazingly transformational like light up moments that go beyond just the professional. You can get this emotional, really strong validating emotional feeling of having helped. And that is difficult to get, I think even today, probably, even with their monthly release cycles, et cetera. By definition, you're just further removed from the "Wow" that happens out where the people are. Greg Beaumont (00:10:15): Yep. And I'm sure you all see that, too, with your business is that a lot of work often goes into figuring out what needs to be in these solutions and reports, but when you actually put it in the hands of leaders, and they realize the power of what it can provide for their business, in my case for their patients, for their doctors, for their nurses, it becomes real. They see it's actually possible and it's not just a PowerPoint deck. Rob Collie (00:10:38): And that sense of possibility, that sense of almost child-like wonder that comes back at those moments, you just wouldn't expect from the outside. I had a family member one time say, "Oh, Rob, I could never do what you do." Basically, it was just saying "How boring it must be, right?" It's so boring working with software, working with..." I'm like, "Are you kidding me? This is one of the places in life where you get to create and just an amazingly magical."
It's really the only word that comes close to capturing it. You just wouldn't expect that, right? Again, from the outside like, "Oh, you work in data all day. Boring." Greg Beaumont (00:11:17): I'd add to that, that I'd compare it to maybe the satisfaction people get out of when they beat a game or a video game. That when you figure out how to do a solution and it works and you put in that time and that effort and that thought, there's that emotional reward, you get that I built something that that actually did what they wanted it to do. Rob Collie (00:11:35): Yeah. And after you beat the video game, not only did that happen, but other people's lives get better as a result of you beating this game. It's just like it's got all those dynamics, and then some. All these follow on effects. Greg Beaumont (00:11:46): It's like being an athlete and enjoying the sport that you compete in. Rob Collie (00:11:50): Yeah. We're never going to retire. We're going to be the athletes that hang on way too long. Greg Beaumont (00:11:56): Yep. Rob Collie (00:11:58): So, unfortunately, I think our careers can go longer than a professional athletes, so there's that. I can't even really walk up and down stairs anymore without pain, so. So what about before Microsoft? What were you up to beforehand and how did you end up in this line of work in the first place? Greg Beaumont (00:12:15): Sure. And I think that's actually something where listeners can get some value, because the way I got into this line of work, I think today, there's much more opportunity for people all over the world from different socioeconomic backgrounds to be able to break into this field without having to kind of go through the rites of passage that people used to. So, I was actually a Biology major from a small school. Came from a military family. I didn't have corporate contacts or great guidance counseling or anything like that. My first job right out of school was I said, "Oh, I got a Biology major. 
I got a job at a research institution." They're like, "Okay, you're going to be cleaning out the mouse cages." And it was sort of $10.50 an hour. Greg Beaumont (00:12:53): So, at that point, I said, "Okay, I got to start thinking about a different line of work here." So, I kind of bounced around a little bit. I wanted to get into IT, but if you wanted to learn something like SQL Server, you couldn't do it unless you had a job in IT. As an average person, you couldn't just go buy a SQL Server and put it in your home unless you had the amount of money that you needed to do that. So, I did side projects with Access and Excel for small businesses, probably making less than minimum wage on those side gigs, in addition to what I was doing for full-time work to pay the bills. Eventually I caught on with a hospital where I was doing some interesting projects with data using Access and Excel. They wouldn't even give me access to Crystal Reports when we wanted to do some reporting. That was really where I kind of said, "Data is where I want to focus." Greg Beaumont (00:13:41): We did some projects around things like radon awareness, so people who would build a new house now, they're like, "Oh, I have to pay $1500 for that radon machine down in the basement." But when you talk to a thoracic surgeon and their nursing team and you hear stories about people who are nonsmokers, perfectly healthy, who come in with tumors all over their lungs, you realize the value there, and by looking at the data of where there's pockets of radon in the country, reaching out to those people has value, right? I think it's that human element where you're actually doing something that makes a difference. So, that kind of opened my eyes. Greg Beaumont (00:14:14): Then, after that job, I got on with a small consulting company. I was a Project Manager. It was my first exposure to Microsoft BI.
It was actually ProClarity over SQL Server 2005, and we were working with data around HEDIS and Joint Commission healthcare performance measures for one of the VA offices. So, I was the PM, and the Data Architect was building the SSIS packages and built out kind of a skeleton of an Analysis Services cube. He asked me to lean in on the dashboarding side, and that's also where I started learning MDX, because we were writing some MDX expressions to start doing some calculations that we were then exposing in ProClarity. And at that point, it was like, "This is magic." Greg Beaumont (00:14:57): From a use case perspective, what they were traditionally doing was they'd send somebody in from some auditing agency, who would look at, I think it was 30 to 60 patient records for each metric, and then they'd take a look at whether all of the criteria hit for that metric, yes or no. And it would be pass/fail: how good is this institution doing at meeting this particular expectation? So, it would be things like, "Does a patient receive aspirin within a certain amount of time that they've been admitted if they have heart problems?" Something like that. Looking at it from a data perspective, you can look at the whole patient population, and then you could start slicing and dicing it by department, by time of day that they were admitted, by all of these different things. Greg Beaumont (00:15:38): And that's when I kind of said, "This is really cool, really interesting. I think there's a big future here." And I kind of decided to take that route. And from there, I got on with a Microsoft partner, where I stayed for about six years. And that's kind of where I was exposed to a lot of very smart, very gifted people. And I was able to kind of learn from them, and then that led to eventually getting a job at Microsoft. But to make a long story short, today, you could go online and get Power BI Desktop for free.
There's training resources all over the place, and you could skill up and get started and get a great job. I'd like to tell people take the amount of time you spend every night playing video games and watching television, take half that time and devote it to learning Power BI and you'll be amazed at how far you get in six to 12 months. Rob Collie (00:16:24): That's such good advice. I'm not really allowed to play a lot of video games, so I might need more time than that. But I had my time to do that years ago, learning DAX and everything. A couple of things really jumped out at me there. First of all, you're right, it was almost like a priesthood before. It was so hard to get your foot in the door. Look, you had to climb incrementally, multiple steps in that story to just get to the point where you were sitting next to the thing that was SSIS and MDX which, again, neither of those things had a particularly humane learning curve. Even when you got there, which was a climb, you get to that point and then they're like, "And here's your cliff. Your smooth cliff that you have to scale. If you wanted a piece of this technology," right? Rob Collie (00:17:11): You wanted to learn MDX, you had to get your hands on an SSAS server. The license for it. And then you had to have a machine you could install it on that was beefy enough to handle it. It's just, there's so many barriers to entry. And the data gene, I like to talk about, it does. It cuts across every demographic, as far as I can tell, damn near equally everywhere. Let's call it one in 20. It's probably a little less frequent than that. Let's call it 5% of the population is carrying the data gene and you've got to get exposure. And that's a lot easier to get that exposure today than it was even 10 years ago. Greg Beaumont (00:17:50): I'd completely agree with that. 
The people in this field tend to be the type of people who like solving puzzles, who like building things that are complex and have different pieces, but who also enjoy the reward of getting it to work at the end. You've had several guests on the show that come from nontraditional backgrounds. But I'm convinced that 20 years ago, there were a lot of people who would have been great data people, who just never got the opportunity to make it happen. Greg Beaumont (00:18:14): Whereas today, the opportunity is there, and I think Microsoft has done a great job with their strategy of letting you learn and try Power BI. You can go download the Dashboard in a Day content for free, and the PDF is pretty self-explanatory, and if you've used Excel in the past, you can walk through it and teach yourself the tool. I think the power of that, from both the perspective of giving people opportunity and also building up a workforce for this field of work, is amazing. Rob Collie (00:18:42): Yeah. I mean, all those people that were sort of, in a sense, kind of left behind years ago, they weren't given an avenue. A large number of them did get soaked up by Excel. If they're professionally still active today, there's this tremendous population of Excel people; if they were joining the story today, they might be jumping into Power BI almost from the beginning, potentially. And of course, if they were doing that, they'd still be doing Excel. But there's still this huge reservoir of people. Think about the number of people tomorrow, just tomorrow. Today, they're good at Excel, and tomorrow, they will sort of have their first discovery moment with Power BI. The first moment of DAX or M or whatever. That's a large number of people who, tomorrow, are about to experience that. It's almost like, did you see the movie The Game? Greg Beaumont (00:19:36): I have not.
Rob Collie (00:19:37): There's this moment early in the movie where Michael Douglas has just found out that his brother or something has bought him a pass to the game. And no one will tell him what it is. He meets this guy at a bar who says, "Oh, I'm so envious that you get to play for the first time." Also, this is really silly, but it's also like the AC/DC song "For Those About to Rock (We Salute You)." For those about to DAX, we salute you, because that's going to happen tomorrow, right? Such a population every day that's lighting up, and what an exciting thing to think about. Do you ever, when you get down for any reason, just stop and think, "Oh, what about the 5000 people today who are discovering this stuff for the first time"? That is a happy thing. Greg Beaumont (00:20:16): Yeah, I actually had a customer where one of their analysts, who turned out to be just a Power BI rockstar, he said, "I'd been spending 20 years of my life writing VLOOKUPs and creating giant Excel files. And now, everything I was trying to do is at my fingertips," right? And then within a year, he went from being a lifelong Excel expert to creating these amazing reports that got visibility within the organization and provided a ton of value. Rob Collie (00:20:42): And that same person you're talking about is also incredibly steeped in business decision-making. They've been getting a business training their whole career at the same time. And it's like suddenly, you have this amazingly capable business-tech hybrid that literally, it just, like, moved mountains. It's crazy. We've talked about that a lot on the show, obviously, the hybrids. Just amazing. And a lot of these people have come to work for us. Rob Collie (00:21:09): That's the most common origin story for our consultants. It's not the only one. I mean, we do have some people who came from more traditional IT backgrounds, but they're also hybrids. They understand business incredibly well.
And so, they never really quite fit in on the pure IT side, either. It's really kind of interesting. Greg Beaumont (00:21:26): Yeah, I think there's still a gap there between IT and business, even in kind of the way solutions get architected in the field. Understanding what the business really wants out of the tool is often very different from how IT understands to build it. And I think that's where people like that provide that bridge, to make things that actually work and then provide the value that's needed. Rob Collie (00:21:47): They're such valuable ambassadors. It's just so obvious when IT is going to interact with a business unit to help them achieve some goal. It's so obvious, of course, who you need to engage. IT thinks, "We need to engage with the leaders of this business unit." They've got the secret weapon, these hybrid people that came up through the ranks with Excel. The word shadow IT is perfect. These people within the business, they've been Excel people for their entire careers; they have an IT-style job. Rob Collie (00:22:22): Almost all the challenges that IT complains about with working with business, you take these Excel people and sort of put them in a room where they feel safe, they'll tell you the same things. They're like, "I had exactly the same problems with my 'users,' the people that I build things for." And yeah, they're such a good translator. And if the communication flows between IT and business sort of through that portal, things go so much better. That's a habit we're still in the process of developing as a world. Greg Beaumont (00:22:51): Yeah. And in healthcare, that actually also provides some unique challenges. With regulation and personal health information, these Excel files have sensitive data in them, and you have to make sure it's protected and that the right people can see it.
And how do you give them the power to use their skills to improve your organization, while also making sure that you keep everything safe? So, I think that's a hot topic these days. Rob Collie (00:23:15): Yeah. I mean, it's one of those where, like, a requirement even of the Hello World equivalent of anything is that you right off the bat have to have things like row-level security and object-level security in place, and sometimes obfuscation. What are some of the... we don't want to get too shop talky, but it is a really fascinating topic. What are the handful of go-to techniques for managing sensitive healthcare information? How do you get good BI while at the same time protecting identity and sensitivity? So often, you still need to be able to uniquely identify patients to tie them across different systems, without being able to identify them as people. It's really, really, really tricky stuff. Greg Beaumont (00:24:02): And I think just to kind of stress the importance of this, you can actually go look up the HIPAA wall of shame or HIPAA violation list. When this information gets shared with the wrong people, there are consequences, and it can result in financial fees and fines. And in addition to that, you lose the trust of people whose personal information may have been violated. So, I think a combination of, you said, things like row-level security and object-level security as a start. You can also do data masking, but then there's the issue of people exporting to Excel. What do they do with that data afterwards? Greg Beaumont (00:24:37): And then there are going to be tools like Microsoft Information Protection, where when you export sensitive information to Excel, it attaches an encrypted component. I'm not an MIP expert. I know how it works. I don't know the actual technology behind it. But it attaches an encrypted component where only people who are allowed to see that information can then open that file.
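For readers unfamiliar with the row-level security Greg mentions: in Power BI, an RLS rule is a DAX filter expression attached to a role on a table, evaluated per signed-in user. A minimal sketch, where the table and column names are hypothetical illustrations, not from the episode:

```dax
-- Hypothetical RLS filter attached to a role on a Patients table
-- (table and column names are illustrative).
-- USERPRINCIPALNAME() returns the signed-in user's identity, so each
-- clinician only sees rows where they are the attending provider.
[AttendingProviderEmail] = USERPRINCIPALNAME()
```

In the Power BI service, users or groups are then mapped to that role, and every query they run against the model is filtered through the expression.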
So, you're protecting the information at the source and in transit, but you're still giving people the flexibility to go build a report or to potentially use data from different sources, but then have it be protected every step of the way. Greg Beaumont (00:25:11): So like you said, without getting too techie, there are ways to do it, but it's not just out-of-the-box easy. There are steps you have to go through: talk to experts, get advice. Whether it's workshops or proofs of concept, there's different ways that customers can figure that out. Rob Collie (00:25:28): Yeah. So because of that sort of mandatory minimum level of sensitivity handling and information security, I would expect, now that we're talking about it, that IT sort of has to be a lot more involved by default in the healthcare space with the solutions than IT would necessarily be in other industries. Another way to say it: it's harder for the business to be 100% in charge of data modeling in healthcare than it is in other industries. Greg Beaumont (00:26:02): Yep. But you can have a hybrid model, which is where the business provides data that's already been vetted and protected, and there might be other data that doesn't have any sensitive data in it, where it's game on, supply chain or something like that. But having these layers in between... the old way of doing things was just nobody gets access to it. Then there was kind of canned reporting, where everybody gets bursted reports that contain what they're allowed to see. But now, you can do things in transit, so that the end users can still use filters and build a new report and maybe even share it with other people, and know that whoever they're sharing with will only be able to see what they're allowed to see. It gets pretty complex, but it's definitely doable, and the customers that are doing it are finding a lot of value in those capabilities. Rob Collie (00:26:48): That's fundamentally one of the advantages of having a data model.
I was listening to a podcast with Jeffrey Wang from Microsoft, and he was talking about it. And I thought this was a really crisp and concise summary, which is that the Microsoft stack, Power BI, has a model-centric approach to the world, whereas basically all the competitors are report-centric. And what does that mean? Why does that even make a difference? Well, when you build a model, you've essentially built all the reports, in a way. You've enabled all of the reports. You can build many, many different reports, like an infinite number, based on emerging and evolving business needs without having to go back to square one. Rob Collie (00:27:28): In a report-centric model, which is basically what the industry has almost always had, almost everywhere, outside of a few notable examples, Power BI being one of them. With a report-centric model, every single change, I remember there being a statistic that was just jaw dropping. I forget what the actual numbers were, but it was something like the average number of business days it took to add a single column to a single existing report. It was like nine business days, when it should just be a click. And that's the difference. And so, preserving that benefit of this model-centric approach, while at the same time still making sure that everyone's playing within the right sandbox, that you can't jump the fence and end up with something that's inappropriate. Very challenging, but doable. Greg Beaumont (00:28:15): Yep. That reminded me of an old joke we used to tell in consulting, and this was back in the SharePoint PerformancePoint with Analysis Services days. There'd be a budget for a project, there'd be change requests along the way, they'd discover issues with the data. And at the very end of the project, they rushed the visualization to market. And they're like, after six months with 10 people dedicated on this project, "Here's your line chart." Rob Collie (00:28:39): Yeah.
I had a director of IT at a large insurance company one time, looking me in the eye, just brutally confess, "Yeah, my team, we spent three months to put a dot on a chart." And that's not what you want. Greg Beaumont (00:28:59): Right, right. Rob Collie (00:29:01): That was unspoken. This was bad. To the extent that you're able to tell, what are some of the interesting things that you've seen in the healthcare space with this platform recently? Anything that we can talk about? Greg Beaumont (00:29:15): Yeah, so I think I'd start with how everything changed with COVID, just because I think people would be interested in that topic and kind of how it changed everything. I actually had a customer yesterday at a large provider who said, "COVID was the catalyst for us to reconsider our investment in analytics," and that it spurred interest from even an executive level to put more money into analytics because of the things that happened. So obviously, when it hit, everybody was, "What in the world is going on here?" Right? "Are we even going to have jobs? Is the whole world going to collapse, or is this just going to be kind of fake news that comes and goes?" Everybody was unsure what was going on. Greg Beaumont (00:29:50): At the same time, the healthcare providers, a lot of them were moving people to work from home, and these were organizations where they had very strict working conditions because of these data privacy and data security considerations, and all of a sudden, you're in a rush to move people home. So, some of my counterparts who do Teams, they have some just amazing stories. They were up all night helping people set up ways to securely get their employees to a work-from-home type experience, so that they only had essential workers interacting with the patients, but then the office workers were able to effectively conduct business from home. Greg Beaumont (00:30:25): Additionally, there were use cases that were amazing.
So, Microsoft has now what's called the Cloud for Healthcare, where we're effectively taking our technology and trying to make it more targeted towards healthcare customers and their specific needs, because we see the same types of use cases repeat from customer to customer. One of those use cases that came out of COVID was called Virtual Visits, and I actually know the team that built that solution. With patients who had COVID, they didn't know how contagious it was. Greg Beaumont (00:30:56): There were people being put on ventilators who weren't allowed to see their families, and they were setting up a Teams application where people were actually able to talk to their family and see their family before they went under, right? There were chaplains who were reading people their last rites using video conferencing, and things like that. So, it was pretty heavy stuff, but I think from a healthcare perspective, it showed the value technology can provide. Greg Beaumont (00:31:21): And from our perspective in the field, it's like we're not just out there talking about bits and bytes. It kind of hit home that there's real people who are impacted by what we're doing, and it adds another kind of layer of gravity, I'd call it, taking what you do seriously, right? I had another customer; they were doing some mapping initiatives with some of the COVID data because they wanted to provide maps for their employees of where the hotspots were. Greg Beaumont (00:31:46): And we were up till I think 11:00 at night one night working through a proof of concept. And they said, "Yeah, what's next is we also want to start mapping areas of social unrest." I said, "Wow, social unrest. Why are you worried about that?" And they said, "Well, we expect because of this lockdown, that eventually there's going to be rioting and issues in all different parts of the world." And at that time, I just kind of didn't really think about that, but then a lot of those things did happen.
It was kind of just interesting to be working at night and hearing those stories, and then seeing how everything kind of unfolded. Greg Beaumont (00:32:18): Another example, look it up, there's an Azure COVID Health Bot out there, and there's some information on that, where you can ask questions and walk through your symptoms, and it will kind of give you some instructions on what to do. Another one that is even popular now is looking at employees who are returning to work. So, when people return to work, find out vaccination status: "Are you able to come back to work? Are you essential? Are you nonessential?" I don't think a lot of customers were prepared to run through that scenario when it hit. Greg Beaumont (00:32:48): So, having these agile tools where you can go get your list of not only employees, but maybe partners that refer people to your network, because you might not have all the referring doctors in your system. So with Power BI, you can go get extracts, tie it all together, and then build out a solution that helps you get those things done. I'd say it was eye-opening, I think for customers and also for myself and my peers, that we're not just selling widgets. We're selling things that make a difference and have that human perspective to it. Rob Collie (00:33:20): Yeah, that does bring it home, doesn't it? That statement from an organization that COVID was the catalyst for evaluating and investing in their analytic strategy? Greg Beaumont (00:33:29): Yep. Rob Collie (00:33:30): Being in BI, being in analytics is one of the best ways to future-proof one's career because at baseline, it's a healthy industry; there's always value to be created.
But then when things get bad, for some reason, whatever crisis hits, it's actually more necessary than ever, because when an industry or an organization has been in an operational groove for a long time, any number of years, eventually, you just sort of start to intuitively figure it out. There's a roadmap that emerges slowly over time. Now, even that roadmap probably isn't as good as you think it is. If you really tested your assumptions, you'd find that some of them were flawed, and analytics could have helped you be a lot more efficient even then. Rob Collie (00:34:14): But regardless, the perception is that we've got a groove, right? And then when the world completely changes overnight, all of your roadmaps, your travel roadmaps, none of them are valid anymore. And now, you need a replacement and you need it fast. And so, what happens is that analytics spending, BI spending, whatever you want to call it, or activity, actually increases during times of crisis. So, you've got a healthy baseline business. It's an industry that's not withering and dying in good times, but it's actually like a hedge against bad times. Rob Collie (00:34:47): When I saw that research years and years ago, when I was working at Microsoft corporate, we'd just come out of the dot-com crack up; we'd seen that BI spending, across the IT industry, was the only sector that went up during that time when everything else was falling. It's like, "Oh, okay." So, not only do I enjoy this stuff, but I really should never get out of it. One of the best future-proofing career moves you can make is to work in this field. And so, I mean, we've seen it, right? In the early days of the COVID crisis, you're right, no one knew; the range of possible outcomes going forward was incredibly wide. The low end and the high end were exponentially different from one another.
Rob Collie (00:35:29): And so, we experienced in our business sort of a gap in spring and early summer last year. We weren't really seeing a whole lot of new clients, people who were willing to forge a brand new relationship. Again, what happens when a crisis hits? You slam on the brakes. No unnecessary spending, first of all. Let's get all the spending under control, because we don't know as a company what's going to happen in the industry, right? You see a lot of vendor spending freezes, and of course, to other companies, we're a vendor, right? So, our existing clients, though, doubled down on how much they used us and how much they needed us. Rob Collie (00:36:08): And then later in the year, the new client business returned, and we actually ended up, our business was up last year, despite that Q2 interruption in sort of making new friends. And this year, holy cow, like whatever was bottled up last year is coming back big time. And so, yeah. You never really want to be the ghoul that sort of morbidly goes, "Oh, crisis." But from a business perspective, yeah, anything that changes, anything that disrupts the status quo, tends to lead to an increased focus on the things that we do. Greg Beaumont (00:36:43): Yeah, I think something you said there, too, was when you don't know what's going to happen was when the business intelligence spending increased. I mean, the intelligence in business intelligence, it's not just a slogan. The purpose of these tools is to find out the things you don't know. So when there's uncertainty, that's when BI can provide that catalyst to sort of add some clarity to what you're actually dealing with. Rob Collie (00:37:06): Yeah, even though I'm not a pilot, I've never learned to fly a plane or anything, I've been using an aviation metaphor lately, which is: the windshield is nice and clear, so you might not be looking at the instruments in your cockpit very much, right?
You know there's not a mountain in front of you; you can see how far away the ground is. And you could sort of intuit your way along, right? But then suddenly, whoosh, clouds. And oh, boy, now you really need those instruments, right? You need the dashboards, you need the altimeter, you need the radar. You need all that stuff so much more. Rob Collie (00:37:37): And our business has kind of always been this. The reason I've been using this metaphor is really for us, it's like, given how fast we operate, and I think you can appreciate this having come from a Microsoft partner consulting firm before Microsoft years ago, our business model, we move so fast with projects. We're not on that old model with the original budget and the change orders and all of that. That was all dysfunctional. Rob Collie (00:38:01): It was necessary because of the way software worked back then, but it was absolutely dysfunctional. It's not the way that you get customer satisfaction. So, we've committed to the high-velocity model. But that means seeing the future of our business financially two months out is very difficult relative to the old sort of glacial pace, right? If there's a mountain there, we're going to have months to turn around it. Krissy Dyess (00:38:26): To add a bit to your analogy there, Rob. I am married to a pilot, and I have gone up in the small tiny airplane. And before the gadgets, there's actually the map. The paper map, right? So, you had the paper map, which my husband now would hand to me. And he'd tell me, "Okay, let me know the elevations of different areas to make sure we're high enough, we're not going to crash into the mountains." Krissy Dyess (00:38:47): What's happened is people just got used to different ways that they were doing things. They were forced into these more modern ways.
And I think even now, this wave of seeing this catalyst, "we can change," and how other people are changing, is also driving people to seek help from others in terms of getting guidance, right? Because even though you've had the change, it doesn't necessarily mean that the changes that you made were 100% the right way, and you can learn so much from others in the community and the people that are willing to help. Krissy Dyess (00:39:24): And I think that's one of the things, too, that our company provides as a partner. We're able to kind of go alongside. We've seen what works, what doesn't work, what are some of those pitfalls, what are those mountains approaching? And we're really able to help guide others that want to learn and become better. Rob Collie (00:39:42): Yeah. I mean, this is us getting just a little bit commercial, but you can forgive us, right? That high-velocity model also exposes us to a much larger denominator. We see a lot at this business that accumulates. The example I've given before, and this is just a really specific techy one, so much of this is qualitative, but there's a quantitative side too. It's sort of like a hard example of, "Oh, yeah, that's right. This pattern that we need here for this food spoilage inventory problem is exactly the same as this tax accounting problem we solved over there," right? As soon as you realize that, you don't need to do all the figuring-out development work; you just skip to the end. Rob Collie (00:40:22): And really, most of the stuff that Krissy was talking about, I think, is actually more of the softer stuff. It's more of the soft wisdom that accumulates over the course of exposure to so many different industries and so many different projects. That's actually really one of the reasons why people come to work here: they want that enrichment. Greg Beaumont (00:40:38): Yeah, that makes sense.
Because you see all these different industries and you actually get exposed to customers that are the best in the business for that type of, whether it be a solution or whether it be a product or whether it be like a framework for doing analytics or something like that. So, you get that exposure and you also get to contribute. Rob Collie (00:40:55): Even just speaking for myself, in the early days of this business, when it was really still just me, I got exposure to so many business leaders. Business and IT leaders that, especially given the profile of the people who would take the risk back in 2013, you had to be some kind of exceptional to be leaning into this technology with your own personal and professional reputation eight years ago, right? It was brand new. So, imagine the profile of the people I was getting exposed to, right? Wow, I learned so much from those people in terms of leadership, in terms of business. They were learning data stuff from me, but at the same time, I was taking notes. Greg Beaumont (00:41:33): Everybody was reading your blog, too. I can't count the number of times I included a reference to one of your articles to help answer some questions. And it was the first time I was introduced to the Switch True DAX statement. And then I'd print that. Rob Collie (00:41:47): Which- Greg Beaumont (00:41:48): Sent that link to many people. "Don't do if statements, do this. Just read this article." Rob Collie (00:41:53): And even that was something that I'd saw someone else doing. And I was like, "Oh, my God, what is that?" My head exploded like, "Oh." Yeah, those were interesting days. I think on the Chandu podcast, I talked about how I was writing about this stuff almost violently, couldn't help it. It was just like so fast. Two articles a week. I was doing two a week for years. There was so much to talk about, so many new discoveries. It was just kind of pouring out in a way. Krissy Dyess (00:42:24): Greg, you came in to the role around 2016. 
And to me 2017 was really that big year with the monthly releases where Power BI just became this phenomenon, right? It just kept getting better and better in terms of capabilities and even the last couple years, all the attention around security has been huge, especially with the health and life science space. And last year, with this catalyst to shift mindsets into other patterns, working patterns using technology, do you feel like you've seen any kind of significant shifts just compared to last year or this year? Greg Beaumont (00:43:05): Yeah. And so something that burns my ears every time I hear it is when people call Power BI a data visualization tool. It does that and it does a great job. Rob Collie (00:43:11): I hate that. Greg Beaumont (00:43:12): But it's become much more than that. When it launched, it was a data visualization tool. But if you think about it at that time, they said, "Well, business users can't understand complex data models, so you have to do that in analysis services." Then they kind of ingested analysis services into Power BI and made it more of a SaaS product where you can scale it. There's Dataflows, the ETL tool, which is within Power BI, which is an iteration of Power Query, which has been around since the Excel days. So, now you have ETL. You have effectively from the old SQL Server world, you have the SSIS layer, you have the SSAS layer. With paginated reports, you have the SSRS layer. And you have all these different layers of the solution now within an easy to use SaaS product. Greg Beaumont (00:43:55): So this evolution has been happening, where it's gobbling up these other products that used to be something that only central IT could do. And now, we're putting that power by making it easier to use in the hands of those analysts who really know what they want from the data. 
Because if you think about it, the old process was: you go and you give the IT team your requirements, and they interpret how to take what you want, and translate it into computer code. Greg Beaumont (00:44:21): But now, we're giving those analysts the ability to take their requirements and go do it themselves. And there's still a very valid place for central IT because there's so many other things they can do, but it frees up their time to work on higher-valued projects and I see that continuing with Power BI, right? As we're adding AI and ML capabilities and data volumes keep increasing, the capabilities I think will continue to expand. Rob Collie (00:44:46): Greg, I used to really cause a storm when I would go to a conference that was full of BI professionals. And I would say something like, "What percentage of the time of a BI project, a traditional BI project, was actually spent typing the right code?" The code that stuck, right? And I would make the claim that it was less than 1%. So, it's like less than 1% of the time of a project, right? And everyone would just get so upset at me, right? But I just didn't understand why it was controversial. Rob Collie (00:45:19): Like you describe, yeah, we had these long requirements meetings in the old model. Interminably long, exhausting, and we'd write everything down. We'd come up with this gigantic requirements document that was flawed from the get-go. It was just so painful. It's like the communication cost was everything and the iteration and discovery, there wasn't enough time for that. And when I say that the new way of building these projects is sometimes literally 100 times faster than the old way, it sounds like hyperbole. Greg Beaumont (00:45:53): It's not. Yeah. Rob Collie (00:45:54): It can be that fast, but you're better off telling people it's twice as fast because they'll believe you. If you tell them the truth, they'd go, "Nah, you're a snake oil salesman. Get out of here."
Greg Beaumont (00:46:07): Yeah. And I think the speed of being able to develop, too, is going to basically allow these tools to do things that people didn't even dream of in the past. It's not just going to be traditional business use cases. I know in healthcare, something that's a hot topic is genomics, right? Genomics is incredibly complex; then you go beyond Power BI and into Azure at that point, too, and cloud compute and things like that. Greg Beaumont (00:46:31): So, with genomics, you think about your DNA, right? Your DNA is basically a long strand of computer code. It is base pairs of nucleic acids, adenine-thymine and guanine-cytosine, that effectively form ones and zeros in a really long string. Rob Collie (00:46:46): Did you notice how effortlessly he named those base pairs? There's that biology background peeking back out. Greg Beaumont (00:46:52): I did have to go look it up before the meeting. I said, "Just in case this comes up, I need to make sure I pronounce them right," so. Rob Collie (00:46:59): Well, for those of us who listen to podcasts at 1.5x speed, that is going to sound super impressive, that string there. Greg Beaumont (00:47:05): Yeah. I should call out, too, though, that I'm not a genomics expert, so some of what I'm saying here, I'm paraphrasing and repeating from people I've talked to who are experts, including physicians and researchers. So, this long string of code: if you sequence your entire genome, the file is about 100 gigabytes for one person, okay? At 100 gigabytes, you can consume that, but if you want to start comparing hundreds of people and thousands of people in different patient cohorts, all of a sudden, it gets to be a lot of information and it gets very complex. Greg Beaumont (00:47:35): If you think of that strand of DNA as being like a book with just two letters that alternate, there's going to be paragraphs and chapters and things like that, which do different things.
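Greg's two-letters framing is literal: each base carries two bits of information. The following is an illustrative aside, not from the episode (the function names and constants are invented for the sketch). It packs a sequence into two bits per base and shows that the ~100 GB file Greg mentions is mostly sequencing redundancy and quality metadata, since the bare sequence fits in well under a gigabyte:

```python
# Illustrative sketch (not from the episode): DNA bases as two-bit values,
# the "ones and zeros" framing Greg uses. All names here are invented.
TWO_BIT = {"A": 0b00, "T": 0b01, "G": 0b10, "C": 0b11}

def encode(seq: str) -> int:
    """Pack a DNA string into a single integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | TWO_BIT[base]
    return value

def decode(value: int, length: int) -> str:
    """Reverse of encode, for a sequence of known length."""
    lookup = {v: k for k, v in TWO_BIT.items()}
    bases = []
    for _ in range(length):
        bases.append(lookup[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))

seq = "GATTACA"
assert decode(encode(seq), len(seq)) == seq

# ~3.2 billion base pairs at two bits per base is under 1 GB; the ~100 GB
# figure includes read redundancy and quality scores, not just the code.
human_bases = 3_200_000_000
print(human_bases * 2 / 8 / 1e9)  # GB at the theoretical floor: 0.8
```

The gap between 0.8 GB and 100 GB is the point: the raw instrument output, not the genome itself, is what makes cohort-scale comparison a big-data problem.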
So, one of the physicians I spoke to worked with children's cancer. Here's kind of where the use case comes in. So, you take something like breast cancer, where there's the BRCA1 or BRCA2 genes, where if you have one, there's a measurable increased probability that you'll get that type of cancer within a certain age range. There's a lot of other diseases and cancers where it might be 30 genes, and depending on different combinations of those genes, it changes the risk of getting that specific type of cancer. Greg Beaumont (00:48:17): But this physician told me that there are specific children's cancers where they know that if they have certain combinations of genes, they have a very high probability of getting this cancer. And when the child actually feels sick and goes to the doctor, it's already spread and it's too late. So, if you can do this sequencing, basically run it through machine learning algorithms that will determine the probability, you could effectively catch it at stage zero. Because these cancers, it's something that could be related to growth hormones as you're growing up, and as you become an adult, you're then no longer at risk of getting that childhood cancer. So, if they could identify it early and treat it at stage zero instead of stage 4, it sounds sci-fi, but the tools are there to do it. Greg Beaumont (00:49:01): It just never ceases to amaze me that you watch the news and they talk about self-driving cars and identifying when a banana is ripe, and things like that. But it's like, you know what? These same tools could be out there changing people's lives and making a measurable difference in the world. I think, especially post COVID, I'll expect to see a lot more investment in these areas. And also interest, because I think that might be one of the positives that comes out of this whole experience. Rob Collie (00:49:27): I do think that the worlds of Medicine and Computer Science are on a merging course.
Let's not call it a collision course. That sounds more dramatic. There is a merging going on. You're right, DNA is biologically encoded instructions, read out via RNA. The mRNA vaccine is essentially injecting the source code that your body then compiles into antibodies. It's crazy and it's new. There's no two ways about it. Rob Collie (00:49:56): mRNA therapies in general, which of course they were working on originally as anticancer, and sort of just like, "Oh, well, we could use it for this, too." And there's all kinds of other things too, right? Gosh, when you go one level up from DNA to some point of abstraction, you get into protein folding. And whoa, is that... Greg Beaumont (00:50:15): Crazy, yeah. Rob Collie (00:50:16): ... computationally. We're all just waiting for quantum computers, I think. Greg Beaumont (00:50:20): Now, I'll have to call out that I'm making a joke here, so people don't take me seriously. But if you think about it, the nucleus in each of your cells contains an import mode model of that DNA, right? There isn't just a central repository that everything communicates with. You have a cache of that DNA in every cell in your body, except red blood cells, which perform a specific task. There may be more of Power Automate in the human body. A cheap attempt at a joke there, so. Rob Collie (00:50:44): Well, I like it, I like it. Let's go in with both feet. I've also read that one of the reasons why it's difficult to clone adult animals is because you start off with your original DNA, but then you're actually making firmware updates to certain sections of the DNA throughout your life. And so, those edits that are being made all the time are inappropriate for an embryo. Greg Beaumont (00:51:09): Yep. Rob Collie (00:51:10): And so, if you clone, you create an embryo, right? And now, it's got these weird adult things going on in it. That's why things kind of tend to go sideways. It can all come back to this notion of biological code and it's fascinating.
A little terrifying, too, when you start to think of it that way. I've listened to some very scary podcasts about the potential for do-it-yourself bioweapon development. There was this explosion back, in what, the '90s, when the virus and worm writers discovered VBA. Remember that? We called them the script kiddies, the ones that would author these viruses that would spread throughout the computer systems of the world. And a lot of them, the people writing these things, were not very sophisticated. They weren't world-renowned hackers. Greg Beaumont (00:51:53): For every instance where you can use this technology to cure cancer, you're right that there's also the possibility of the Island of Dr. Moreau, right? You go look up CRISPR technology, C-R-I-S-P-R, where they can start splicing together things from different places and making it viable. And 10 years ago, they had sheep that were producing spider webs in their milk and it's just, there's crazy stuff out there if you kind of dive into the dark depths of biology. Now that we went down the rabbit hole, how do we correct course, right? Rob Collie (00:52:23): Well, we did go down a rabbit hole, but who cares? That's what we do. Greg Beaumont (00:52:26): Even if you kind of step it back up to just kind of easy use cases in healthcare, so one of the ones that we use as a demo a lot came from a customer, and this was pre-COVID. But something as simple as hand washing, you don't think about it much. But when you're in the hospital, how many of those people are washing their hands appropriately when they care for you? And there's some white papers out there which are showing that, basically, there are measurable amounts of infections that happen in hospitals due to people not washing their hands appropriately. So, a lot of healthcare organizations will anonymously kind of observe people periodically to see who's doing a good job of washing their hands. Rob Collie (00:53:04): I was going to ask, how is this data collected?
Greg Beaumont (00:53:06): This customer actually had nurses who were using a clipboard and they would write down their notes, fax it somewhere, and then somebody would enter it into Excel. So, there was this long process. And with another TS, who covers Teams, we basically put a POC together in a couple days, where they enter the information into a Power App within Teams, so they made their observation, entered it in. It did a write-back straight to an Azure SQL Database at that time. Now, they might use Dataverse. And then from Azure SQL DB, you can immediately report on it in Power BI. They even set up alerts, so that if somebody wasn't doing a good job, you could kind of take care of the situation, rather than wait two days for the Excel report to get emailed out, and maybe lower the infection rates in the hospital. Greg Beaumont (00:53:53): So, it saved time for the workers who were writing things down and faxing things, just from a sheer productivity perspective. But it also, hopefully, I don't know if it will be measurable or not, but you'd have some anticipated increase in quality, because you're able to address issues faster. And that's the simplest thing ever, right? You can spend a billion dollars to come up with a new drug, or you can just make sure, are people washing their hands? Rob Collie (00:54:17): Both data collection and enforcement, they happen to be probably the same thing. There's like, "Oh, I'm being watched." The anonymity is gone. That's a fascinating story. Okay. What kinds of solutions are you seeing these days? What's happening out in the world that you think is worth talking to the audience about? Greg Beaumont (00:54:38): We're seeing this ability to execute better where the tools are easier to use, you can do things faster, but there's still challenges that I see frequently out there.
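As a thought experiment, the clipboard-to-Power-App pipeline Greg described a moment ago can be mocked up end to end with the Python standard library: SQLite stands in for the Azure SQL write-back, and a threshold query stands in for the Power BI alert. The schema, names, and 80% threshold are all invented for illustration; the real solution was built on the Power Platform, not Python.

```python
import sqlite3

# Stand-in for the Azure SQL write-back Greg describes; the schema and
# the alert threshold are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (unit TEXT, staff TEXT, washed INTEGER)")

def record_observation(unit: str, staff: str, washed: bool) -> None:
    """The Power App write-back step: one row per anonymous observation."""
    conn.execute("INSERT INTO observations VALUES (?, ?, ?)",
                 (unit, staff, int(washed)))

def units_needing_attention(threshold: float = 0.8) -> list:
    """The alerting step: units whose compliance rate falls below threshold."""
    rows = conn.execute(
        "SELECT unit, AVG(washed) FROM observations GROUP BY unit"
    ).fetchall()
    return [unit for unit, rate in rows if rate < threshold]

record_observation("ICU", "nurse-1", True)
record_observation("ICU", "nurse-2", False)
record_observation("ER", "nurse-3", True)

print(units_needing_attention())  # ICU at 50% compliance trips the alert
```

The payoff Greg points at is latency: the alert fires as soon as the row lands, instead of two days later when the Excel report goes out.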
So, I know something that you all are experts in is data modeling and understanding how to take a business problem and translate it into something that's going to perform well. So, not only do you get the logic right, but when somebody pushes a button, they don't have to go to lunch and come back, they get a result quickly. That's still a challenge. And it's a challenge because it's not always easy, right? I mean, the reason cubes were created in the first place was because when you have complex logic and you're going against a relational database, the query has to happen somewhere, and it has to carry all that logic. Greg Beaumont (00:55:19): So take, for example, if somebody wants to look at year over year percent change for a metric and they want to be able to slice it by department, maybe by disease group, maybe by weekend versus weekday, and then they want to see that trend over time. If you translate that into a SQL query, it gets really gnarly really fast. And that problem is still real. One of the trends I'm seeing in the industry is there's a big push to do everything in DirectQuery mode, because then you can kind of manage access, manage security, do all of those necessary security things in one place and have it exist in one place. Greg Beaumont (00:56:00): But when you're sending giant gnarly SQL queries back to relational databases, even if they're PDWs with multiple nodes, it gets very expensive from a compute perspective, and when you scale out to a large number of users, concurrency is still an issue. So that's something where you look at what Power BI has recently come out with, aggregations and composite models. That's some of the technology that I think can mitigate some of those problems. And even if we think about something like Azure Synapse, right? You can have your dedicated SQL pools, and then you can have a materialized view.
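To make Greg's "gnarly" point concrete, here is the year-over-year percent change he describes, written out in plain Python with invented figures (none of this comes from real customer data; the department names and numbers are made up). Even this toy version needs a lookup against the prior year for every slice; in SQL that becomes a self-join or window function per slicer, which is exactly the work a cube or an import-mode model precomputes.

```python
# Year-over-year percent change per department, the metric Greg uses as an
# example. All figures are invented for illustration.
totals = {
    ("Cardiology", 2019): 120_000,
    ("Cardiology", 2020): 150_000,
    ("Oncology", 2019): 200_000,
    ("Oncology", 2020): 180_000,
}

def yoy_change(dept: str, year: int) -> float:
    """Percent change versus the prior year for one department."""
    current = totals[(dept, year)]
    prior = totals[(dept, year - 1)]
    return (current - prior) / prior * 100

print(yoy_change("Cardiology", 2020))  # 25.0
print(yoy_change("Oncology", 2020))    # -10.0
```

Now add the weekend-versus-weekday and disease-group slicers Greg mentions and the prior-period lookup multiplies across every combination, which is why pushing it all through DirectQuery gets expensive.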
A materialized view is effectively a cache of data within Synapse, but then you can also have your caches in Power BI, and kind of layer everything together in a way that's going to take that logic and distribute it. Greg Beaumont (00:56:46): Does that make sense? Rob Collie (00:56:47): It does. I think this is still a current joke. The majority of cases where we've encountered people who think they want or need DirectQuery, the majority of them are actually perfect poster-children case studies for when you should use cache and import mode. Right? It turns out the perceived need for DirectQuery, there is a real percentage of problems out there for which DirectQuery is the appropriate solution and it is the best solution. But the number of times people use it is a multiple of that real ideal number. Rob Collie (00:57:17): I think part of it is just familiarity. Still, I've long talked about how we're still experiencing as an industry the hangover from most data professionals being storage professionals. Everyone needed a database, just to make the wheels go round. The first use of data isn't BI. The first use of data is line-of-business applications. Every line-of-business application needed a database, right? So, we have minted millions of database professionals. This is also partly why I think Power BI gets sort of erroneously pigeonholed as a visualization tool, because people are used to that. They're used to, we have a storage layer and a reports layer, that's it, right? Rob Collie (00:57:56): Reporting Services was Microsoft's runaway successful product in this space. Paginated reports is still around for good reason. And I think that if you're a long-term professional in this space with a long history, even if you're relatively young in the industry, but you've been working with other platforms, this storage layer plus visuals layer is just burned in your brain. And this idea of this like, "Why do you need to import the data?
Why do you need a schedule? Why do you need all this stuff?" It's like as soon as people hear that they can skip it, and go to DirectQuery, they just run to

Python Bytes
#246 Love your crashes, use Rich to beautify tracebacks

Python Bytes

Aug 11, 2021 · 46:19


Watch the live stream: Watch on YouTube

About the show
Sponsored by us: Check out the courses over at Talk Python. And Brian's book too!
Special guest: David Smit

Brian #1: mktestdocs
Vincent D. Warmerdam. Tutorial with videos.
Utilities to check for valid Python code within markdown files and markdown formatted docstrings. Example:

    import pathlib
    import pytest
    from mktestdocs import check_md_file

    @pytest.mark.parametrize('fpath', pathlib.Path("docs").glob("**/*.md"), ids=str)
    def test_files_good(fpath):
        check_md_file(fpath=fpath)

This will take any codeblock that starts with ```python and run it, checking for any errors that might happen. Putting assert statements in the code block will actually check things. Other examples in README.md for markdown formatted docstrings from functions and classes. Suggested usage is for code in mkdocs documentation. I'm planning on trying it with blog posts.

Michael #2: Redis powered queues (QR3), via Scot Hacker
QR queues store serialized Python objects (using cPickle by default), but that can be changed by setting the serializer on a per-queue basis. There are a few constraints on what can be pickled, and thus put into queues.
Create a queue:

    bqueue = Queue('brand_new_queue_name', host='localhost', port=9000)

Add items to the queue:

    >>> bqueue.push('Pete')
    >>> bqueue.push('John')
    >>> bqueue.push('Paul')
    >>> bqueue.push('George')

Getting items out:

    >>> bqueue.pop()
    'Pete'

Also supports deque (double-ended queue), capped collections/queues, and priority queues.
David #3: 25 Pandas Functions You Didn't Know Existed
Bex T
So often, I come across a pandas method or function that makes me go "AH!" because it saves me so much time and simplifies my code. Example: transform.
Don't normally like these articles, but this one had several "AH" moments:
- between
- styler options
- convert_dtypes
- mask
- nsmallest, nlargest
- clip
- at_time

Brian #4: FastAPI and Rich Tracebacks in Development
Hayden Kotelman
Rich has, among other cool features, beautiful tracebacks and logging. FastAPI makes it easy to create web APIs. This post shows how to integrate the two for APIs that are easy to debug. It's really only a few simple steps:
- Create a dataclass for the logger config.
- Create a function that will either install rich as the handler (while not in production) or use the production log configuration.
- Call logging.basicConfig() with the new settings.
- And possibly override the logger for Uvicorn.
The article contains all the code necessary, including examples of the resulting logging and tracebacks.

Michael #5: Dev in Residence
I am the new CPython Developer in Residence: report on the first week.
Łukasz Langa: "When the PSF first announced the Developer in Residence position, I was immediately incredibly hopeful for Python. I think it's a role with transformational potential for the project. In short, I believe the mission of the Developer in Residence (DIR) is to accelerate the developer experience of everybody else."
The DIR can help by:
- providing a steady review stream, which helps deal with the PR backlog;
- triaging issues on the tracker, dealing with the issue backlog;
- being present in official communication channels to unblock people with questions;
- keeping CI and the test suite in a usable state, which further helps contributors focus on their changes at hand;
- keeping tabs on where the most work is needed and what parts of the project are most important.
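A quick taste of three of the pandas methods from the David #3 list above, assuming pandas is installed; the Series values are made up for the demo:

```python
import pandas as pd

# Three of the listed methods, applied to a throwaway Series.
s = pd.Series([3, -1, 7, 12, 5])

print(s.clip(0, 10).tolist())     # cap values into [0, 10]: [3, 0, 7, 10, 5]
print(s.nsmallest(2).tolist())    # two smallest values: [-1, 3]
print(s.mask(s > 5, 0).tolist())  # zero out anything over 5: [3, -1, 0, 0, 5]
```

Each of these replaces a small loop or a chained boolean-indexing expression, which is the "saves me so much time" point the article makes.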
David #6: Dagster
Dagster is a data orchestrator for machine learning, analytics, and ETL.
- Great for local development, and can be deployed on Kubernetes, etc.
- Dagit provides a rich UI to monitor the execution, view detailed logs, etc.
- Can deploy to Airflow, Dask, etc.
Quick demo?
References:
- https://www.dataengineeringpodcast.com/dagster-data-applications-episode-104/
- https://softwareengineeringdaily.com/2019/11/15/dagster-with-nick-schrock/

Extras
Michael:
Get a vaccine, please.
Python 3.10 type info... er, make that 3.9, thanks John Hagen. Here is a quick example. All of these are functionally equivalent to PyCharm/mypy:

    # Python 3.5-3.8
    from typing import List, Optional
    def fun(l: Optional[List[str]]) -> None: ...

    # Python 3.9+
    from typing import Optional
    def fun(l: Optional[list[str]]) -> None: ...

    # Python 3.10+
    def fun(l: list[str] | None) -> None: ...

Note how with 3.10 we no longer need any imports to represent this type.
David: Great SQL resource
Joke: Pray

The Watering Mouth Podcast
2. Why You Can't Lose the Weight on Eat to Live Diet

The Watering Mouth Podcast

Jul 21, 2021 · 29:00


On today's podcast, I discuss the major reasons why we can't lose the weight and remain consistent while following the Eat to Live diet. Learn it all here! Then head over to the High Nutrient Lifestyle Group on Facebook for some free support and camaraderie!

Links Mentioned:
- YouTube video, 4 Reasons You're Not Losing Weight on ETL: https://thewateringmouth.com/4-reasons-youre-not-losing-weight-on-the-eat-to-live-nutritarian-diet-youtube/
- Free 9-Day Eat to Live Challenge: http://www.thewateringmouth.com
- Eat to Live by Dr. Joel Fuhrman*: https://amzn.to/3BfL1Zs
- Eat for Life by Dr. Joel Fuhrman* (most recent and updated info): https://amzn.to/36EgN4B
- Visit my site for all the info, including a free 9-Day Eat to Live Challenge just for you: http://www.thewateringmouth.com
- Join one of my live, free 5-Day Challenges and lose weight and eat healthy with hundreds and hundreds of other folks just like you: http://www.thewateringmouth.com/challenge
- Join my private, safe healthy eating, affordable group coaching membership called the Eat to Live Family: http://www.thewateringmouth.com/family
- Check out my 500+ YouTube videos: http://www.youtube.com/thewateringmouth
- Follow me on Facebook: http://www.facebook.com/thewateringmouth
- Follow me on Instagram: http://www.instagram.com/thewateringmouth
- Join my FREE private Facebook group, the Eat to Live High Nutrient Lifestyle group: https://www.facebook.com/groups/highnutrientlifestylegroup

Entrepreneurial Thought Leaders
Nicole Diaz (Snap Inc.) - How to Build an Ethical Company

Entrepreneurial Thought Leaders

Jun 2, 2021 · 50:22


Nicole Diaz is the Global Head of Integrity & Compliance Legal for Snap Inc., where her responsibilities include promoting ethical business standards and adherence to the Code of Conduct, managing risk in key areas such as anti-bribery and trade law, and leading internal investigations. In this conversation with Stanford professor Tom Byers, Diaz insists that ethics is a strategic imperative for 21st century businesses, and explores how the concept of “enlightened self-interest” can create a framework for better decision-making without requiring a commitment to pure (and unrealistic) altruism.

Entrepreneurial Thought Leaders
Jannick Malling (Public.com) - Social Fintech

Entrepreneurial Thought Leaders

May 26, 2021 · 51:01


Jannick Malling is the co-founder and co-CEO of Public.com, an investing social network where members can own fractional shares of stocks and ETFs, follow popular creators, and share ideas within a community of investors. In this conversation with Stanford lecturer Toby Corey, Malling discusses building magical products in a highly regulated industry, turning company values into everyday tools, and why having two CEOs is sometimes better than having one.