Podcasts about Data lake

System or repository of data stored in its natural/raw format

  • 230 PODCASTS
  • 357 EPISODES
  • 40m AVG DURATION
  • 1 MONTHLY NEW EPISODE
  • May 4, 2025 LATEST

POPULARITY

Popularity by year, 2017-2024 (chart)


Latest podcast episodes about Data lake

The Cloudcast
The Early AI Journey and Learning Curve

The Cloudcast

Play Episode Listen Later May 4, 2025 21:45


As more companies begin to adopt AI into their workforce and day-to-day processes, it will be interesting to watch how their learning curve is spread across knowledge workers.

SHOW: 920
SHOW TRANSCRIPT: The Cloudcast #920 Transcript
SHOW VIDEO: https://youtube.com/@TheCloudcastNET
CLOUD NEWS OF THE WEEK: http://bit.ly/cloudcast-cnotw
CHECK OUT OUR NEW PODCAST: "CLOUDCAST BASICS"

SHOW SPONSORS:
• Cut Enterprise IT Support Costs by 30-50% with US Cloud

SHOW NOTES:
• AI Horseless Carriages (AI user-experiences)
• How will we view an AI agent in the context of humans or "users"?
• The low-hanging fruit, simple on-ramp is the key to early AI adoption
• Google and Microsoft are already showing revenue increases, likely through the productivity apps bundling
• Expect prices to increase slowly, but frequently, as adoption happens and companies get used to the knowledge worker productivity increases (or expectations)
• Curious how knowledge workers are adopting, sharing, and increasing their learning curve
• Sharing still seems to be lacking within the AI tools. Not just sharing of an individual task, but sharing of learning curves, best practices, and datasets
• Is there a dataset collection opportunity? This feels like Big Data or Data Lake 5.0.

FEEDBACK?
Email: show at the cloudcast dot net
Twitter/X: @cloudcastpod
BlueSky: @cloudcastpod.bsky.social
Instagram: @cloudcastpod
TikTok: @cloudcastpod

Digital Health Talks - Changemakers Focused on Fixing Healthcare
AI-Driven Healthcare: Sutter Health's Journey to Scale Clinical Innovation

Digital Health Talks - Changemakers Focused on Fixing Healthcare

Play Episode Listen Later Mar 25, 2025 27:38


Join Kiran Mysore, Chief Data & Analytics Officer at Sutter Health, as he shares insights on scaling AI adoption, building sustainable innovation infrastructure, and transforming healthcare delivery through data-driven approaches. Learn how one of the nation's largest health systems is successfully integrating advanced analytics and AI into clinical practice while maintaining governance and ethical standards.

• Real-world implementation of clinical AI at scale
• Building sustainable innovation infrastructure
• Data strategy for improved patient and provider experiences
• Governance frameworks for responsible AI adoption
• Implementation and impact, from digital scribes to diagnostics

Kiran Mysore, Chief Data & Analytics Officer at Sutter Health
Shahid Shah, Chairman of the Board, Netspective Foundation

So klingt Wirtschaft
Innovation durch Daten: Warum das Mindset entscheidend ist

So klingt Wirtschaft

Play Episode Listen Later Feb 12, 2025 16:24 Transcription Available


Companies need to change how they handle data and AI: away from goal-driven thinking and toward data-driven curiosity. Niels Strohkirch of Fujitsu explains how it's done.

Steady Lads
What Is Data Lake? (Deep Dive On LAKE Token)

Steady Lads

Play Episode Listen Later Feb 8, 2025 26:24


In this episode I talk to Oliver Slapal, co-founder of Data Lake. We do a deep dive into Data Lake: what it is, their new token launchpad Lakedotfun, the LAKE token and how it works, as well as a ton more.

THE OBSIDIAN COUNCIL PREMIUM MEMBERSHIP

Run The Numbers
The Analytics Escalator: Unlocking Value in Finance with Mambu CFO Jesper Sorensen

Run The Numbers

Play Episode Listen Later Jan 23, 2025 51:49


CJ delves deep into the world of analytics in this interview with Jesper Sorensen, the CFO of Mambu, a leading fintech and banking platform in Europe valued at over $5 billion. Jesper, who has authored three books on analytics, introduces the Analytics Escalator—a framework for unlocking real value through descriptive, diagnostic, predictive, and prescriptive analytics. In the discussion, Jesper covers everything from getting started with analytics and deciding where it should sit within an organization to identifying key opportunities, building a solid BI tech stack, and understanding the role of analytics tools and data lakes. He outlines the journey to creating a high-impact analytics function within a company, emphasizing the critical interplay between people, processes, and systems in fostering a thriving analytics culture—and shares practical advice on how to achieve it.

SPONSORS:
RightRev automates the revenue recognition process from end to end, gives you real-time insights, and ensures ASC 606 / IFRS 15 compliance—all while closing books faster. Whether it's multi-element arrangements, subscription renewals, or complex usage-based contracts, RightRev takes care of it all. That means fewer spreadsheets, fewer errors, and more time for your team to focus on growth. For modern revenue recognition simplified, visit rightrev.com and schedule a demo.
Brex offers the world's smartest corporate card on a full-stack global platform that is everything CFOs need to manage their finances on an elite level. Plus they offer modern banking and treasury as well as intuitive expenses and accounting automation, bill pay, and travel. Brex makes it easy to control spend before it happens, automate annoying tasks, and optimize your finances. Find out how Brex can help you make every dollar count at brex.com/metrics.
Planful is a financial performance management platform designed to streamline financial tasks for businesses. It helps with budgeting, closing the books, and financial reporting, all on a cloud-based platform. By improving the efficiency and accuracy of these processes, Planful allows businesses to make better financial decisions. Find out more at www.planful.com/metrics.
Vanta's trust management platform takes the manual work out of your security and compliance process and replaces it with continuous automation. Over 9,000 businesses use it to automate compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to five times faster, and proactively manage vendor risk. For a limited time, get $1,000 off of Vanta at vanta.com/metrics.

FOLLOW US ON X: @cjgustafson222 (CJ)

TIMESTAMPS:
(00:00) Preview and Intro
(02:01) Sponsor – RightRev | Brex
(04:53) Jesper's Pre-CFO Career
(06:50) The Huge Success of Mambu
(10:40) Understanding What Analytics Is
(13:56) Sponsor – Planful | Vanta
(16:02) Where Analytics Should Sit in the Org
(20:12) Model for Serving a Team's BI Needs Internally
(22:03) Creating Value with the Analytics Escalator
(29:51) Prescriptive Analytics and How to Achieve Them
(31:08) The Steps on the Analytics Escalator
(33:36) Establishing People, Processes, and Systems
(34:54) Self-Service Versus Customer Service BI
(37:57) The Components of a Good BI Tech Stack
(38:50) Data Visualization Tools Versus Analytics Tools
(41:59) The Value of a Data Lake
(44:26) Build-Versus-Buy for Analytics Tools
(46:50) The Need for a Proof-of-Concept for Analytics Tools
(48:10) Managing the Trade-Off Between Performance and Cost
(50:19) Wrap

Get full access to Mostly metrics at www.mostlymetrics.com/subscribe

Identity At The Center
#325 - Theorycrafting Modern Identity Architecture with Ian Glazer

Identity At The Center

Play Episode Listen Later Jan 13, 2025 69:17


Welcome to the Identity at the Center podcast! In this episode, hosts Jeff and Jim dive deep into modern identity architecture with guest Ian Glazer. They discuss topics such as the importance of policy, data orchestration, and the evolving landscape of identity and access management (IAM). Ian shares his thoughts on the future of IAM, the integration of various data sources, the role of events in IAM, and the potential for real-time identity solutions. They also touch on upcoming conferences, the European Identity and Cloud Conference 2025, and the significance of engaging with the identity community. Tune in for a thought-provoking discussion on the advancements and future directions of digital identity!

Chapters
00:00 Introduction and Podcast Overview
00:11 Upcoming Plans and Challenges
01:03 Guest Invitation and Podcast Dynamics
03:31 Conference Announcements and Discounts
06:05 Welcoming the Guest: Ian Glazer
06:46 Fido Feud and Conference Experiences
16:29 Identity Market Trends and Innovations
19:19 Modern Identity Architectures
33:51 Identity First Security: A New Approach
34:50 Unified Data Tiers: Breaking Down Silos
36:14 Modern IAM: Opportunities and Challenges
37:02 Ephemeral Access and Zero Standing Privilege
39:18 Understanding Identity Data
41:30 Workforce Identity Data Platforms
47:14 Orchestration and Execution in IAM
51:09 Real-Time Event-Based Identity Systems
54:45 Future Directions and Community Engagement
59:03 Teaching and Sharing Knowledge
01:05:33 Closing Thoughts and Recommendations

Connect with Ian: https://www.linkedin.com/in/iglazer/
Notional architecture for modern IAM: Part 3 of 4 (blog): https://weaveidentity.com/blog/notional-architecture-for-modern-iam/
2025: The year we free our IAM data: https://weaveidentity.com/blog/2025-the-year-we-free-our-iam-data/
Learn more about Weave Identity: https://weaveidentity.com/
Digital Identity Advancement Foundation: https://digitalidadvancement.org/
Avoid the Noid! - https://en.wikipedia.org/wiki/The_Noid

Connect with us on LinkedIn:
Jim McDonald: https://www.linkedin.com/in/jimmcdonaldpmp/
Jeff Steadman: https://www.linkedin.com/in/jeffsteadman/

Visit the show on the web at http://idacpodcast.com

Keywords: IDAC, Identity at the Center, Jeff Steadman, Jim McDonald, Ian Glazer, Weave Identity, Identity and Access Management, IAM, Modern Identity Architectures, Modern IAM, Data Tier, Events, Orchestration, Zero Trust, ZTNA, Shared Signals Framework, EIC, Gartner, Black Hat, RSA, Identibeer, Data Lake, OIDs, IANS

SQL Data Partners Podcast
Episode 283: Data Lakehouse vs Data Warehouse vs My House

SQL Data Partners Podcast

Play Episode Listen Later Jan 2, 2025 48:59


Microsoft Fabric offers two enterprise-scale, open-standard format workloads for data storage: Warehouse and Lakehouse. Which service should you choose? In this episode, we dive into the technical components of OneLake, along with some of the decisions you'll be asked to make as you start to build out your data infrastructure. These are two good articles we mention in the podcast that could help inform your decision on the services to implement in your OneLake:

• Microsoft Fabric Decision Guide: Choose between Warehouse and Lakehouse - Microsoft Fabric | Microsoft Learn
• Lakehouse vs Data Warehouse vs Real-Time Analytics/KQL Database: Deep Dive into Use Cases, Differences, and Architecture Designs | Microsoft Fabric Blog | Microsoft Fabric

We hope you enjoyed this conversation on the nuances of data storage within Microsoft OneLake! If you have questions or comments, please send them our way. We would love to answer your questions on a future episode. Leave us a comment and some love ❤️ on LinkedIn, X, Facebook, or Instagram. The show notes for today's episode can be found at Episode 283: Data Lakehouse vs Data Warehouse vs My House. Have fun on the SQL Trail!

In Numbers We Trust - Der Data Science Podcast
#61: Technologische Must-Haves: Unser Survival-Guide für Data-Science-Projekte

In Numbers We Trust - Der Data Science Podcast

Play Episode Listen Later Dec 5, 2024 42:04


In summary, our must-haves:
• Database / DWH
• A data visualization solution
• A way to develop without friction (locally or in the web)
• Version control / CI/CD
• A deployment solution
• Separation of development and production environments
• Monitoring for models & resources

Related podcast episodes:
• Episode #2: Success factors for predictive analytics projects
• Episode #5: Data Warehouse vs. Data Lake vs. Data Mesh
• Episode #20: Is Continuous Integration (CI) a must for data scientists?
• Episode #21: Machine Learning Operations (MLOps)
• Episode #29: Spoiled for choice: data science platform vs. customized stack
• Episode #35: Success factors for machine learning projects, with Philipp Jackmuth of dida
• Episode #43: So things don't blow up in production: avoiding overfitting & data leakage
• Episode #54: Model deployment: how do I get my model into production?

Technologies & tools:
• Data visualization: Azure Databricks, AWS Quicksight, Redash
• Development environment: VSCode, INWT Python IDE V2, Remote Explorer, Pycharm
• Version control: GitHub, GitLab, Azure DevOps
• CI/CD: GitHub Actions, GitLab CI, Jenkins
• Deployment: Kubernetes, Docker, Helm, ArgoCD
• Experiment tracking: MLFlow, DVC, Tensorboard (see the short sketch after this list)
• Monitoring: Prometheus, Grafana, AWS Cloudwatch
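
One of the must-haves above is experiment and model tracking. As a minimal, hypothetical sketch of what that looks like in practice with MLflow (one of the tools named in the list), the snippet below logs a parameter and a metric for a single training run; the experiment name and values are placeholders, not taken from the episode.

```python
# pip install mlflow  -- experiment tracking, one of the "must-have" capabilities above
import mlflow

# Hypothetical experiment name and values, for illustration only.
mlflow.set_experiment("churn-model")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("n_estimators", 200)   # a training configuration value
    mlflow.log_metric("auc", 0.87)          # an evaluation result for this run
```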

Project Geospatial
FOSS4G NA 2024 - Searching the Spatial Data Lake: Bring GeoParquet to Apache Lucene - Wes Richardet

Project Geospatial

Play Episode Listen Later Oct 28, 2024 25:32


Wes Richardet's talk at FOSS4G NA 2024 focuses on improving search capabilities within spatial data lakes using GeoParquet and Apache Lucene. He discusses the evolution of data storage, the need for efficient search solutions, and the integration of different technologies to enhance performance.

Dynamics Update
Interview - Scott Sewell JJ Yadav - SynapseLink and Fabric

Dynamics Update

Play Episode Listen Later Oct 11, 2024 50:59


Hi all! In this episode, we have the pleasure of speaking with Scott Sewell, Principal Program Manager, and JJ Yadav, Principal Solution Architect, from Microsoft. Our discussion centers around Link to Fabric and Synapse Link. We explore how these technologies should be integrated when implementing D365FO and how customers currently using Export to Data Lake or BYOD can benefit from these new solutions. Additionally, we delve into the scenarios where each solution is most appropriate.

Link to the Training/Workshop mentioned in the episode: https://github.com/microsoft/Dynamics-365-FastTrack-Implementation-Assets/tree/master/Analytics/DataverseLink/FabricWorkshop

Gustav and Johan

The MSDW Podcast
Community Summit 2024 Preview with Aqueducts Consulting

The MSDW Podcast

Play Episode Listen Later Oct 3, 2024 22:28


MSDW is previewing Community Summit North America 2024 with a new series of quick podcast episodes featuring exhibitors. In this episode, we speak with Joe Christensen, founder of Aqueducts Consulting. The team at Aqueducts thinks a lot about how to help Dynamics 365 F&O and AX customers find success in their reporting and data management efforts, Joe tells us. He has seen plenty of pain and frustration among customers over the years, and there are many good reasons why, from the technology to project planning to user adoption and organizational change. Reporting challenges often come down to a solution's reliability, Joe explains. When there's no reliability, a company's reporting system often reverts to becoming little more than key people and tools, rather than something designed to deliver reports to everyone. We also discuss the evolution of the Dynamics 365 F&O data strategy, first with the introduction of Export to Data Lake and now Synapse Link. The Aqueducts Consulting booth at Summit will give attendees a unique "data factory" experience, and Joe explains what that will look like, blending concepts, technology, and operational process.

More information:
• See the Aqueducts Consulting team at Booth 1817
• Partner Solution Showcase: Tuesday October 15th, 10:45-11:45 AM. Come by to cover reporting out of D365 F&O. Hear from your peers on how they designed their reporting out of F&O, with a special focus on Synapse Link and the lakehouse/data warehouse approaches. And ask me about Report Factory! It's our new way of making sure your team can survive when any of your key people leave the BI / reporting team.
• Demo Zone: Wednesday Oct 16th, 4:45-4:55 PM. Where we run through how to get the most progress on F&O reporting, from 14 years of reporting in the field.

Connect with Joe on LinkedIn: https://www.linkedin.com/in/joe-christensen-aqc/

BVL.digital Podcast
#230: Wie KI die Rolle der Supply Chain Planer verändert (Prof. Dr. Kai Hoberg, Kühne Logistics University)

BVL.digital Podcast

Play Episode Listen Later Oct 2, 2024 48:05


Prof. Dr. Kai Hoberg is Professor of Supply Chain and Operations Strategy at Kühne Logistics University in Hamburg and is currently focused on the question of how artificial intelligence can be used in supply chain planning and how AI will change the role of supply chain planners in the future. That is exactly what our host Boris Felgendreher discusses with Kai Hoberg in this episode of the BVL Podcast. Among other things, they cover the following topics:
- The current state of AI and automation in supply chain planning
- The evolution of the planning systems landscape and the emergence of new AI-driven players
- The challenges traditional software vendors like SAP face in the race to integrate AI
- The importance of data and the need for a "data lake" for training AI systems
- The success of AI in supply chain planning compared to other technologies such as blockchain
- Concrete examples of AI applications in supply chain planning, including demand forecasting and automated ordering
- The need for interaction between humans and AI in supply chain planning
- The importance of "upskilling" for supply chain planners in the age of AI
- The challenges arising from the "black box" nature of many AI systems
- The importance of customizing AI systems with industry-specific data
- The need to avoid "AI washing" and to evaluate solutions based on their value rather than their AI content
- The potential of large language models (LLMs) and generative AI in supply chain planning
- Practical advice for supply chain planners on how to get the most out of AI solutions
- The idea of integrating gamification elements into interactions with AI systems
- and much more

Helpful links:
Kühne Logistics University: https://www.klu.org/faculty-research/faculty/resident-faculty/kai-hoberg
Prof. Kai Hoberg on LinkedIn: https://www.linkedin.com/in/kai-hoberg/
Boris Felgendreher on LinkedIn: https://www.linkedin.com/in/borisfelgendreher/
BVL Supply Chain CX: https://www.bvl.de/cx

The Ravit Show
Enterprise Data & AI, Data Lake and much more!

The Ravit Show

Play Episode Listen Later Oct 2, 2024 7:56


Why should you build a Data Lake? I had a fantastic conversation with Olegs Kosels, Enterprise Data Architect at Jamf, on The Ravit Show. We discussed some key topics around Enterprise Data and AI, focusing on the practical challenges and opportunities. Olegs shared insights on how he built a Data Lake and how Jamf is using Dremio to streamline their data processes. We also discussed the overall data journey at Jamf and Olegs' thoughts on the future of Data and AI. Stay tuned for more such interviews from Big Data London! #data #ai #bigdatalondon2024 #theravitshow

Microsoft Mechanics Podcast
Generative AI with Microsoft Fabric

Microsoft Mechanics Podcast

Play Episode Listen Later Aug 15, 2024 2:52


Microsoft Fabric seamlessly integrates with generative AI to enhance data-driven decision-making across your organization. It unifies data management and analysis, allowing for real-time insights and actions. With Real Time Intelligence, keeping grounding data for large language models (LLMs) up-to-date is simplified. This ensures that generative AI responses are based on the most current information, enhancing the relevance and accuracy of outputs. Microsoft Fabric also infuses generative AI experiences throughout its platform, with tools like Copilot in Fabric and Azure AI Studio enabling easy connection of unified data to sophisticated AI models.   ► QUICK LINKS: 00:00 - Unify data with Microsoft Fabric 00:35 - Unified data storage & real-time analysis 01:08 - Security with Microsoft Purview 01:25 - Real-Time Intelligence 02:05 - Integration with Azure AI Studio   ► Link References This is Part 3 of 3 in our series on leveraging generative AI. Watch our playlist at https://aka.ms/GenAIwithAzureDBs   ► Unfamiliar with Microsoft Mechanics?  As Microsoft's official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft. • Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries • Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog • Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast   ► Keep getting this insider knowledge, join us on social: • Follow us on Twitter: https://twitter.com/MSFTMechanics  • Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/ • Enjoy us on Instagram: https://www.instagram.com/msftmechanics/ • Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

Oracle University Podcast
Database Essentials

Oracle University Podcast

Play Episode Listen Later Jul 23, 2024 12:24


Join hosts Lois Houston and Nikita Abraham, along with Hope Fisher, Oracle's Product Manager for Database Technologies, as they break down the basics of databases, explore different database management systems, and delve into database development.   Whether you're a newcomer or just need a refresher, this quick, informative episode is sure to offer you some valuable insights.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/database-essentials/133032/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Nikita: Hello and welcome to the Oracle University Podcast. I'm Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! For the last seven weeks, we've been exploring the world of OCI Container Engine for Kubernetes with our senior instructor Mahendra Mehra. We covered key aspects of OKE to help you create, manage, and optimize Kubernetes clusters in Oracle Cloud Infrastructure. So, be sure you check out those episodes if you're interested in Kubernetes. 01:00 Nikita: Today, we're doing something a little different. We've had a lot of episodes on different aspects of Oracle Database, but what if you're just getting started in this world? We wanted you to have something that you could listen to as well. And so we have Hope Fisher with us today. Hope is a Product Manager for Database Technologies at Oracle, and we're going to ask her to take us through the basics of database, the different database management systems, and database development.  Lois: Hi Hope! Thanks for joining us for this episode. Before we dive straight into terminologies and concepts, I want to take a step back and really get down to the basics. We sometimes use the terms data and information interchangeably, but they're not the same, right? 01:43 Hope: Data is raw material or a set of facts and observations. Information is the meaning derived from the facts. The difference between data and information can be explained by using an example, such as test scores. In one class, if every student receives a numbered score and the scores can be calculated to determine a class average, the class average can be calculated to determine the school average. So in this scenario, each student's test score is one piece of data. And information is the class's average score or the school's average score. There is no value in data until you actually do something with it. 02:24 Nikita: Right, so then how do we make all this data useful? Do we create a database system?  Hope: A database system provides a simple function—treat data as a collection of information, organize it, and make the data usable by providing easy access to it and giving you a place where that data can be stored. Every organization needs to collect and maintain data to meet its requirements. Most organizations today use a database to automate their information systems. 
An information system can be defined as a formal system for storing and processing data. A database is an organized collection of data put together as a unit. The rationale of a database is to collect, store, and retrieve related data for use by database applications. A database application is a software program that interacts with the database to access and manipulate data. A database is usually managed by a Database Administrator, also known as a DBA. 03:25 Nikita: Hope, give us some examples of database systems. Hope: Popular examples of database systems include Oracle Database, MySQL, which is also owned by Oracle, Microsoft SQL server, Postgres, and others. There are relational database management systems. The acronym is DBMS. Some of the strengths of a DBMS include flexibility and scalability. Given the huge amounts of information that modern businesses need to handle, these are important factors to consider when surveying different types of databases. 03:59 Lois: This may seem a little bit silly, but why not just use spreadsheets, Hope? Why use databases? Hope: The easy answer is that spreadsheets are designed for specific problems, relatively small amounts of data and individual users. Databases are designed for lots of data, shared information use, and complex data analysis. Spreadsheets are typically used for specific problems or small amounts of data. Individual users generally use spreadsheets. In a database, cells contain records that come from external tables. Databases are designed for lots of data. They are intended to be shared and used for more complex data analysis. They need to be scalable, secure, and available to many users. This differentiation means that spreadsheets are static documents, while databases can be relational. 04:51 Nikita: Hope, what are some common database applications?  Hope: Database applications are used in far and wide use cases that most commonly can be grouped into three areas. Applications that run companies called enterprise applications. Enterprise applications are designed to integrate computer systems that run all phases of an enterprise's operations to facilitate cooperation and coordination of work across the enterprise. The intent is to integrate core business processes, like sales, accounting, finance, human resources, inventory, and manufacturing. Applications that do something very specific, like healthcare applications-- specialized software is software that's written for a specific task rather than for a broad application area.  And then there are also applications that are used to examine data and turn it into information, like a data warehouse, analytics, and data lake. 05:54 Lois: We've spoken about data lakes before. But since this is an episode about the basics of database, can you briefly tell us what a data lake is? Hope: A data lake is a place to store your structured and unstructured data as well as a method for organizing large volumes of highly diverse data from diverse sources. Data lakes are becoming increasingly important as people, especially in businesses and technology, want to perform broad data exploration and discovery. Bringing data together into a single place or most of it into a single place makes that simpler. 06:29 Nikita: Thanks for that, Hope. So, what kind of organizations use databases? And, who within these organizations uses databases the most? Hope: Almost every enterprise uses databases. Enterprises use databases for a variety of reasons and in a variety of ways. 
Data and databases are part of almost any process of the enterprise. Data is being collected to help solve business needs and drive value. Many people in an organization work with databases. These include the application developers who create applications that support and drive the business. The database administrator or DBA maintains and updates the database. And the end user uses the data as needed. 07:19 Do you want to stay ahead of the curve in the ever-evolving AI  landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we're offering both the course and certification for free. So, don't miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That's https://education.oracle.com/genai. 07:57 Nikita: Welcome back. Now that we've discussed foundational database concepts, I want to move on to database management systems. Take us through what a database management system is, Hope. Hope: A Database Management System, DBMS, has the following elements. The kernel code manages memory and storage for the DBMS. The repository of metadata is called a data dictionary. The query language enables applications to access the data. Oracle database functions include data definitions, storage, structure, and security. Additional functionality also provides for user access control, backup and recovery, integrity, and communications. There are many different database types and management systems. The most common is the relational database management system. 08:51 Nikita: And how do relational databases store data?  Hope: Essentially and very simplistically, there are key elements of the relational database. Database table containing rows and columns; the data in the table, which is stored a row at a time; and the columns which contain attributes or related information. And then the different tables in a database relate to one another and share a column. 09:17 Lois: Customers usually have a mix of applications and data structures, and ideally, they should be able to implement a data management strategy that effectively uses all of their data in applications, right? How does Oracle approach this?  Hope: Oracle's approach to this enterprise data management strategy and architecture is converged database to all different data types and workloads. The converged database is a database that has native support for all modern data types and, of course, traditional relational data.  By providing support for all of these data types, a converged database can run all sorts of workloads, from transaction processing to analytics and machine learning to blockchain to support the applications and systems. Oracle provides a single database engine that supports all data models, process types, and development environments. It also addresses many kinds of workloads against the same data sets. And there's no need to use dozens of specialized databases. Deploying several single-purpose databases would increase costs, complexity, and risk. 10:25 Nikita: In the final part of our conversation today, I want to bring up database development. Hope, how are databases developed?  Hope: Data modeling is the first part of the database development process. Conceptual data modeling is the examination of a business and business data to determine the structure of business information and the rules that govern it. 
This structure forms the basis for database design. A conceptual model is relatively stable over long periods of time. Physical data modeling, or database building, is concerned with implementation in each technical software and hardware environment. The physical implementation is highly dependent on the current state of technology and is subject to change as available technologies rapidly change. Conceptual model captures the functional and informational needs of a business and is used to identify important entities and their relationships.  A logical model includes the entities and relationships. This is also called an entity relationship model and provides the details of the relationships.  11:34 Lois: I think that's a good place to wrap up our episode. To know more about the Oracle Database architecture, offerings, and so on, visit mylearn.oracle.com. Thanks for joining us today, Hope.  Nikita: Join us next week for another episode of the Oracle University Podcast. Until then, this is Nikita Abraham… Lois: And Lois Houston, signing off! 11:55 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
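
To make Hope's description of the relational model concrete, here is a minimal sketch using Python's built-in sqlite3 module: two hypothetical tables that relate through a shared student_id column, plus a query that turns raw data (individual test scores) into information (a per-class average), echoing her data-versus-information example. The table and column names are illustrative assumptions, not taken from the episode or from any Oracle product.

```python
import sqlite3

# In-memory database; the schema and data below are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two related tables: scores.student_id is the shared column that
# links each score back to a row in students.
cur.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, class TEXT)")
cur.execute("CREATE TABLE scores (student_id INTEGER, test TEXT, score REAL)")

cur.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    (1, "Ada", "5A"), (2, "Ben", "5A"), (3, "Cleo", "5B"),
])
cur.executemany("INSERT INTO scores VALUES (?, ?, ?)", [
    (1, "math", 91.0), (2, "math", 78.0), (3, "math", 85.0),
])

# Each individual score is a piece of data; the per-class average is
# information derived from that data.
cur.execute("""
    SELECT st.class, AVG(sc.score) AS class_average
    FROM students st JOIN scores sc ON st.student_id = sc.student_id
    GROUP BY st.class
""")
print(cur.fetchall())   # e.g. [('5A', 84.5), ('5B', 85.0)]
conn.close()
```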

The Cloud Pod
265: Swing and a WIF

The Cloud Pod

Play Episode Listen Later Jun 28, 2024 39:48


Welcome to episode 265 of the Cloud Pod Podcast – where the forecast is always cloudy! Justin and Matthew are with you this week, and even though it's a light news week, you're definitely going to want to stick around. We're looking forward to FinOps, talking about updates to Consul, WIF coming to Vault 1.17, and giving an intro to Databricks LakeFlow. Because we needed another lake product. Be sure to stick around for this week's Cloud Journey series too.

Titles we almost went with this week:
• The CloudPod lets the DataLake flow
• Amazon attempts an international incident in Taiwan
• What's your Vector Mysql?

A big thanks to this week's sponsor: We're sponsorless! Want to reach a dedicated audience of cloud engineers? Send us an email, or hit us up on our Slack Channel and let's chat!

General News

01:40 Consul 1.19 improves Kubernetes workflows, snapshot support, and Nomad integration
• Consul 1.19 is now generally available, improving the user experience, providing flexibility and enhancing integration points.
• Consul 1.19 introduces a new registration custom resource definition (CRD) that simplifies the process of registering external services into the mesh.
• Consul service mesh already supports routing to services outside of the mesh through terminating gateways. However, there are advantages to using the new Registration CRD.
• Consul snapshots can now be stored in multiple destinations; previously, you could only snapshot to a local path or to a remote object store destination, but not both. Now you can take a snapshot to NFS mounts, SAN-attached storage, or object storage.
• Consul API gateways can now be deployed on Nomad, combined with transparent proxy and enterprise features like admin partitions.

01:37 Matthew - "What I was surprised about, which I did not know, was that the Consul API gateway can now be deployed on Nomad. Was it not able to be deployed before? Just feels weird… you know, Consul should be able to be deployed on Nomad compared to that. You know, it's all the same company, but sometimes team A doesn't always talk to team B."

03:21 Vault 1.17 brings WIF, EST support for PKI, and more
• Vault 1.17 is now generally available with new secure workflows, better performance and improved secrets management scalability.
• Key new features: Workload Identity Federation (WIF) allows you to eliminate concerns around providing security credentials to Vault plugins. Using the new support for WIF, a trust relationship can be established between an external system and Vault.

The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
60 – Interoperability of Data Lake Table Format (Apache Iceberg, Apache Hudi, Delta Lake)

The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists

Play Episode Listen Later Jun 28, 2024


Alex Merced discusses where interoperability tools like Apache XTable and Uniform fit in when working across data lake table formats such as Apache Iceberg, Apache Hudi, and Delta Lake.

Tech Lead Journal
#175 - How to Solve Real-World Data Analysis Problems - David Asboth

Tech Lead Journal

Play Episode Listen Later May 20, 2024 57:10


“All data scientists and analysts should spend more time in the business, outside the data sets, just to see how the actual business works. Because then you have the context, and then you understand the columns you're seeing in the data."

David Asboth, author of “Solve Any Data Analysis Problem” and co-host of the “Half Stack Data Science” podcast, shares practical tips for solving real-world data analysis challenges. He highlights the gap between academic training and industry demands, emphasizing the importance of understanding the business problem and maintaining a results-driven approach. David offers practical insights on data dictionaries, data modeling, data cleaning, data lakes, and prediction analysis. We also explore AI's impact on data analysis and the importance of critical thinking when leveraging AI solutions. Tune in to level up your skills and become an indispensable, results-driven data analyst.

Listen out for:
Career Journey - [00:01:38]
Half Stack Data Science Podcast - [00:06:33]
Real-World Data Analysis Gaps - [00:10:46]
Understanding the Business/Problem - [00:15:36]
Result-Driven Data Analysis - [00:18:28]
Feedback Iteration - [00:21:44]
Data Dictionary - [00:23:48]
Data Modeling - [00:27:18]
Data Cleaning - [00:30:43]
Data Lake - [00:35:05]
Common Data Analysis Tasks - [00:36:50]
Prediction Analysis - [00:40:23]
The Impact of AI on Data Analysis - [00:43:15]
Importance of Critical Thinking - [00:47:05]
Common Tasks Solved by AI - [00:50:07]
3 Tech Lead Wisdom - [00:53:10]

David Asboth's Bio
David is a “data generalist”; currently a freelance data consultant and educator with an MSc. in Data Science and a background in software and web development. With over 6 years of experience teaching, he has taught everyone from junior analysts up to C-level executives in industries like banking and management consulting about how to successfully apply data science, machine learning, and AI to their day-to-day roles. He co-hosts the Half Stack Data Science podcast about data science in the real world and is the author of Solve Any Data Analysis Problem, a book about the data skills that aspiring analysts actually need in their jobs, which will be published by Manning in 2024.

Follow David:
LinkedIn – linkedin.com/in/david-asboth-9256772
Website – davidasboth.com
Podcast – halfstackdatascience.com

Our Sponsors
Manning Publications is a premier publisher of technical books on computer and software development topics for both experienced developers and new learners alike. Manning prides itself on being independently owned and operated, and for paving the way for innovative initiatives, such as early access book content and protection-free PDF formats that are now industry standard. Get a 45% discount for Tech Lead Journal listeners by using the code techlead45 for all products in all formats.

Like this episode? Show notes & transcript: techleadjournal.dev/episodes/175. Follow @techleadjournal on LinkedIn, Twitter, and Instagram. Buy me a coffee or become a patron.

The Obsidian Table
START UP: My Top Pick for a 100x Token in DeSci | Data Lake on 100x Podcast

The Obsidian Table

Play Episode Listen Later Apr 25, 2024 46:29


Can Blockchain lead to breakthroughs in science and medicine? Can crypto help create healthier and longer lives? DataLake is doing exactly that by evolving the way patients are recruited for scientific trials. We talked in depth about their token on a past 100x Gem Show, but this time we're joined by their CEO to go deep on DeSci (decentralized science) for the first time in our podcast's history! This is not sponsored in any way whatsoever; we're just genuinely stoked about exploring one of crypto's more unique use-cases, and asking the question... Why should you be paying attention to Data Lake?

Arrow Bandwidth
Spotlight On Sophos UK&I, Episode 3 April 2024, Ireland, Integrations And Partner Care

Arrow Bandwidth

Play Episode Listen Later Apr 12, 2024 24:00


In Episode 3 of the "Spotlight on Sophos" podcast series, we have a guest host, Ross Collins, Arrow Technical Account Manager for Ireland, talking to Sophos's Jon Hope about the latest achievements and innovations at Sophos. As well as an update on Arrow Ireland, they highlight the benefits of Sophos's integration with Veeam and Cisco Umbrella; how the new Sophos Partner Care team can help partners, particularly with the NFR (Not For Resale) programme; and the hot-off-the-press Adaptive Attack Protection additions. Listeners will also gain valuable insights into Sophos' Data Lake control and industry-leading network security features. Tune in to get all the latest technical information in this short, compact, compelling podcast.

Salesforce Developer Podcast
215: The Future of AI with Salesforce's Einstein 1 Studio & Data Cloud featuring Danielle Larregui

Salesforce Developer Podcast

Play Episode Listen Later Mar 19, 2024 17:13


Join us as we welcome the Data Cloud Queen herself, Danielle Larregui. Get ready to witness the groundbreaking power of Einstein 1 Studio as Danielle unveils its transformative capabilities within the Salesforce Data Cloud. Discover how developers can effortlessly create AI models using a no-code or low-code approach directly with their Data Lake data. We'll explore the practicality of generating predictions, integrating external AI platforms, and leveraging built-in tools for assessing prediction accuracy. Brace yourself for the standout feature of 'Bring Your Own Model,' which allows seamless, real-time data sharing without the need for ETL processes. We'll discuss the availability of Snowflake's integration and the potential that lies with Google BigQuery. Imagine how these integrations can revolutionize your external data management, from segmentation to identity resolution.  Stay tuned to learn how Data Cloud Enrichment could further enhance your Salesforce CRM by leveraging the power of Data Cloud data. Show Highlights: Introduction of Einstein 1 Studio and Model Builder within Salesforce Data Cloud for creating AI models using no-code or low-code approaches. How the "Bring Your Own Model" feature enables real-time data sharing with Salesforce Data Cloud without ETL processes. How Data Cloud Enrichment allows Salesforce CRM records to be updated with Data Cloud data. Remote Data Cloud, which could unify data management for organizations with multiple Salesforce instances. Ability to use predictions made by AI models in Salesforce flows, Apex classes, and reporting within Data Cloud. Links: Bring Your Google Vertex AI Models To Data Cloud - https://developer.salesforce.com/blogs/2023/11/bring-your-google-vertex-ai-models-to-data-cloud Use Model Builder to Integrate Databricks Models with Salesforce - https://developer.salesforce.com/blogs/2024/03/use-model-builder-to-integrate-databricks-models-with-salesforce  

The Obsidian Table
100X GEM SHOW: Perion ($PERC) | Mendi Finance ($MENDI) | Data Lake ($LAKE) | Will These 100x?

The Obsidian Table

Play Episode Listen Later Mar 7, 2024 29:14


The 100x Gem Show: where we ask one simple question: can this token 100x? It's the start of a bull market, we're ready to make some crazy gains. Let's see if these tokens are ripe for them! Today's projects are Perion, Mendi Finance, and Data Lake. Here's where you can find the projects on CoinGecko: https://www.coingecko.com/en/coins/mendi-finance https://www.coingecko.com/en/coins/perion https://www.coingecko.com/en/coins/data-la This is not a paid episode. None of them were informed we'd be analyzing the token. Find our speakers this week: Matthew Walker - https://twitter.com/hawaiianmint Cesar Martinez: https://twitter.com/poppabigmac Our Current Partners: Astrabit Trading: https://astrabit.io/ Shrapnel: https://twitter.com/playSHRAPNEL Kadena: https://twitter.com/kadena_io Blocksquare: https://twitter.com/blocksquare_io FortBlockGames: https://twitter.com/FortBlockGames Disclosures: As always, we want to stress that nothing in this is financial investment advice. Our goal with these conversations is to give everyone listening one more tool in their belt to utilize while they do their own research and learn more about crypto. 100x Podcast Partners are not endorsements to purchase or invest. They are projects or brands who have (at a minimum) purchased ad space in our podcast (which is how we fund the podcast's operations). We meet with them, often have them on the podcast so you can hear from them directly, and often find additional ways to support each other (like introducing us to other cool guests). Please do your own research. Time stamps: Intro: 00:00 Partner Highlight (Astrabit, Shrapnel, FortBlockGames) : 01:13 Perion: 3:36 Partner Highlight (Kadena, Blocksquare): 13:20 Mendi Finance: 14:51 Data Lake: 20:25

ApartmentHacker Podcast
Vidur Gupta | Beekin | CollectiveConversations

ApartmentHacker Podcast

Play Episode Listen Later Feb 20, 2024 26:18


In this conversation, Mike Brewer interviews Vidur Gupta, the founder and CEO of Beacon, an analytics platform for real estate. They discuss the power of AI and machine learning in property management, the benefits of using data to make informed decisions, and the role of machine learning models in predicting resident behavior. Vidur explains how Beacon's products and solutions help investors and operators across the asset lifecycle, from underwriting to management and financing. He also highlights the importance of creating a culture of trust and empowerment within an organization. The conversation concludes with recommendations for operators on adopting AI and a book recommendation: 'The Age of AI' by Eric Schmidt. Takeaways Beacon is an analytics platform that uses AI and machine learning to help investors and operators make data-driven decisions in real estate. Machine learning models can analyze large pools of data to predict resident behavior, optimize pricing, and measure social impact. AI removes emotion from decision-making and provides a more objective and accurate approach. Creating a culture of trust and empowerment is essential for building a successful organization. Operators should have a framework for evaluating AI and understand its limitations and benefits. Chapters 00:00 Introduction and Origin of the Name Beacon 01:15 Unpacking What Beacon Does 02:12 The AI Component of Beacon 03:13 The Power of AI and Machine Learning 04:45 Data Lake and Modeling 05:38 Using Data to Make Informed Decisions 06:05 Machine Learning in Property Management 07:35 The Power of Machine Learning in Decision-Making 08:17 Weighting Variables in Machine Learning Models 09:09 The Dynamic Nature of Machine Learning Models 09:49 The Benefits of Machine Learning over Rules-Based Models 10:48 Applying Machine Learning to Real Estate 11:26 Removing Emotion from Decision-Making with AI 12:43 Using AI to Overcome Biases in Decision-Making 13:39 Building the Future State with Beacon 14:34 Products and Solutions Offered by Beacon 15:52 Predicting Resident Lease Renewals 16:53 Dynamic Pricing of Leases 17:22 Measuring Social Impact with a Score 19:03 Using Beacon for Acquisition and CapEx Planning 26:48 Creating a Culture of Trust and Empowerment 29:47 Drawing Inspiration as a Leader 33:34 Recommended Books: The Age of AI 34:34 Advice for Operators on Adopting AI --- Send in a voice message: https://podcasters.spotify.com/pod/show/mike-brewer/message Support this podcast: https://podcasters.spotify.com/pod/show/mike-brewer/support

Software Engineering Daily
Building a Data Lake with Adam Ferrari

Software Engineering Daily

Play Episode Listen Later Feb 6, 2024 46:19 Very Popular


Starburst is a data lake analytics platform. It's designed to help users work with structured data at scale, and is built on the open source platform, Trino. Adam Ferrari is the SVP of Engineering at Starburst. He joins the show to talk about Starburst, data engineering, and what it takes to build a data lake. The post Building a Data Lake with Adam Ferrari appeared first on Software Engineering Daily.
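
The episode description above notes that Starburst is built on the open source Trino engine. For readers who want a concrete picture of what querying a data lake through Trino looks like, here is a minimal, hypothetical sketch using the trino Python client; the host, catalog, schema, and table names are placeholders and are not taken from the episode.

```python
# pip install trino  -- the open source Trino Python client
import trino

# Connection details below are placeholders for illustration only.
conn = trino.dbapi.connect(
    host="trino.example.internal",
    port=8080,
    user="analyst",
    catalog="hive",        # could equally be an Iceberg or Delta catalog
    schema="web_logs",
)

cur = conn.cursor()
# A typical lake query: scan raw event files registered as a table
# and aggregate them with plain SQL.
cur.execute("""
    SELECT status_code, count(*) AS hits
    FROM requests
    WHERE event_date = DATE '2024-02-01'
    GROUP BY status_code
    ORDER BY hits DESC
""")
for status_code, hits in cur.fetchall():
    print(status_code, hits)
```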

The Crypto Conversation
Subsquid Network - The Web3 Data Lake

The Crypto Conversation

Play Episode Listen Later Feb 1, 2024 45:05


Dr. Dmitry Zhelezov is the Co-Founder and CEO and Marcel Fohrmann is the Co-Founder and CFO of Subsquid, a decentralized data lake and query engine that offers developers permissionless, cost-efficient access to on-chain data from over 100 chains and is integrated into a large ecosystem of Web2- and Web3-native developer tools.

Why you should listen
Subsquid Network is a decentralized query engine optimized for batch extraction of large volumes of data. It currently serves historical on-chain data ingested from 100+ EVM and Substrate networks, including event logs, transaction receipts, traces and per-transaction state diffs. In the future, it will additionally support general-purpose SQL queries and an ever-growing collection of structured data sets derived from on- and off-chain data.

Supporting links
Bitget
Bitget Academy
Bitget Research
Bitget Wallet
Subsquid
Andy on Twitter
Brave New Coin on Twitter
Brave New Coin

If you enjoyed the show please subscribe to the Crypto Conversation and give us a 5-star rating and a positive review in whatever podcast app you are using.

Data Engineering Podcast
Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

Play Episode Listen Later Jan 29, 2024 62:38


Summary
Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. The majority of products that are available either require too much effort to structure the logs, or aren't fast enough for interactive use cases. Cliff Crosland co-founded Scanner to provide fast querying of high scale log data for security auditing. In this episode he shares the story of how it got started, how it works, and how you can get started with it.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm interviewing Cliff Crosland about Scanner, a security data lake platform for analyzing security logs and identifying issues quickly and cost-effectively.

Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Scanner is and the story behind it?
What were the shortcomings of other tools that are available in the ecosystem?
What is Scanner explicitly not trying to solve for in the security space? (e.g. SIEM)
A query engine is useless without data to analyze. What are the data acquisition paths/sources that you are designed to work with? (e.g. CloudTrail logs, app logs, etc.)
What are some of the other sources of signal for security monitoring that would be valuable to incorporate or integrate with through Scanner?
Log data is notoriously messy, with no strictly defined format. How do you handle introspection and querying across loosely structured records that might span multiple sources and inconsistent labelling strategies?
Can you describe the architecture of the Scanner platform?
What were the motivating constraints that led you to your current implementation?
How have the design and goals of the product changed since you first started working on it?
Given the security oriented customer base that you are targeting, how do you address trust/network boundaries for compliance with regulatory/organizational policies?
What are the personas of the end-users for Scanner? How has that influenced the way that you think about the query formats, APIs, user experience etc. for the product?
For teams who are working with Scanner can you describe how it fits into their workflow?
What are the most interesting, innovative, or unexpected ways that you have seen Scanner used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Scanner?
When is Scanner the wrong choice?
What do you have planned for the future of Scanner?

Contact Info
LinkedIn (https://www.linkedin.com/in/cliftoncrosland/)

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.

Links
Scanner (https://scanner.dev/)
cURL (https://curl.se/)
Rust (https://www.rust-lang.org/)
Splunk (https://www.splunk.com/)
S3 (https://aws.amazon.com/s3/)
AWS Athena (https://aws.amazon.com/athena/)
Loki (https://grafana.com/oss/loki/)
Snowflake (https://www.snowflake.com/en/) - Podcast Episode (https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/)
Presto (https://prestodb.io/)
Trino (https://trino.io/)
AWS CloudTrail (https://aws.amazon.com/cloudtrail/)
GitHub Audit Logs (https://docs.github.com/en/organizations/keeping-your-organization-secure/managing-security-settings-for-your-organization/reviewing-the-audit-log-for-your-organization)
Okta (https://www.okta.com/)
Cribl (https://cribl.io/)
Vector.dev (https://vector.dev/)
Tines (https://www.tines.com/)
Torq (https://torq.io/)
Jira (https://www.atlassian.com/software/jira)
Linear (https://linear.app/)
ECS Fargate (https://aws.amazon.com/fargate/)
SQS (https://aws.amazon.com/sqs/)
Monoid (https://en.wikipedia.org/wiki/Monoid)
Group Theory (https://en.wikipedia.org/wiki/Group_theory)
Avro (https://avro.apache.org/)
Parquet (https://parquet.apache.org/)
OCSF (https://github.com/ocsf/)
VPC Flow Logs (https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Digitally Transformed
Unlock the Power of Your Data with Microsoft Fabric

Digitally Transformed

Play Episode Listen Later Dec 21, 2023 40:20


Join host Justin Starbird and industry experts Connor O'Neill, Director of Advanced Analytics, and Brian Smith, Senior Strategic Advisor from Infused Innovations, on the latest episode of Digitally Transformed, where we dive into the revolutionary landscape of Microsoft Fabric.

Explore the Tech Horizon: Get ready for a captivating discussion on the markers guiding businesses into the future of technology, with insights into planning for the next year, touching on the challenges and opportunities presented by Microsoft Fabric. Discover how this innovative project is poised to reshape data solutions and AI.

Microsoft Fabric Unveiled: Unravel the layers of Microsoft Fabric. Learn how this unified Data Lake platform simplifies data integration, engineering, and analytics. Explore specific use cases, from AI and machine learning to leveraging large language models. Discover how Fabric empowers businesses to harness the power of their data effectively.

Democratizing Data for All: Explore how Microsoft Fabric breaks down barriers, making advanced tools accessible to businesses of all sizes. Our experts shed light on the platform's dynamic pricing and ease of use, revolutionizing data analytics for small and medium-sized enterprises. The hosts also address potential concerns, providing valuable insights into the ongoing evolution of Fabric during its preview phase.

Embark on Your Digital Transformation Journey: Discover the benefits of transforming your data into a strategic asset by embracing technologies like Microsoft Fabric to gain a competitive edge in the rapidly evolving digital landscape.

Join Justin, Connor, and Brian as they demystify Microsoft Fabric and provide a compelling overview of its capabilities. Tune in today to gain deeper insights into the features and benefits of this game-changing technology! Digitally Transformed promises to keep you informed and empowered in the dynamic realm of digital transformation!

The New Stack Podcast
Integrating a Data Warehouse and a Data Lake

The New Stack Podcast

Play Episode Listen Later Nov 16, 2023 20:59


TNS host Alex Williams is joined by Florian Valeye, a data engineer at Back Market, to shed light on the evolving landscape of data engineering, particularly focusing on Delta Lake and his contributions to open source communities. As a member of the Delta Lake community, Valeye discusses the intersection of data warehouses and data lakes, emphasizing the need for a unified platform that breaks down traditional barriers. Delta Lake, initially created by Databricks and now under the Linux Foundation, aims to enhance reliability, performance, and quality in data lakes. Valeye explains how Delta Lake addresses the challenges posed by the separation of data warehouses and data lakes, emphasizing the importance of providing ACID transactions, real-time processing, and scalable metadata. Valeye's involvement in Delta Lake began as a response to the challenges faced at Back Market, a global marketplace for refurbished devices. The platform manages large datasets, and Delta Lake proved to be a pivotal solution in optimizing ETL processes and facilitating communication between data scientists and data engineers. The conversation delves into Valeye's journey with Delta Lake, his introduction to the Rust programming language, and his role as a maintainer of the Rust-based library for Delta Lake. Valeye emphasizes Rust's importance in providing a high-level API with reliability and efficiency, offering a balanced approach for developers. Looking ahead, Valeye envisions Delta Lake evolving beyond traditional data engineering, becoming a platform that seamlessly connects data scientists and engineers. He anticipates improvements in data storage optimization and envisions Delta Lake serving as a standard format for machine learning and AI applications. The conversation concludes with Valeye reflecting on his future contributions, expressing a passion for Rust programming and an eagerness to explore evolving projects in the open-source community. Learn more from The New Stack about Delta Lake and The Linux Foundation: Delta Lake: A Layer to Ensure Data Quality; Data in 2023: Revenge of the SQL Nerds; What Do You Know about Your Linux System?
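For readers curious what the Rust-based library discussed here feels like in practice, the following is a minimal sketch using the deltalake Python package (the Python bindings over the delta-rs Rust library). The table path and column names are invented for illustration; this is not code from the episode.

    import pandas as pd
    from deltalake import DeltaTable, write_deltalake

    # Append a small batch of rows; the Delta transaction log is what
    # provides ACID guarantees on top of plain Parquet files.
    orders = pd.DataFrame({"order_id": [1, 2, 3], "amount": [19.9, 45.0, 7.5]})
    write_deltalake("/tmp/orders_delta", orders, mode="append")

    # Read the table back; older versions remain addressable (time travel).
    dt = DeltaTable("/tmp/orders_delta")
    print(dt.version())     # current version of the table
    print(dt.to_pandas())   # materialize the latest snapshot as a DataFrame

The same table can be read by Spark, Trino, or other engines that understand the Delta protocol, which is the interoperability point the conversation keeps returning to.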

Cloud N Clear
The Future of Customer Data in the Hybrid Cloud: A Conversation with Industry Experts | EP 166

Cloud N Clear

Play Episode Listen Later Oct 31, 2023 21:10


Brian Suk, Associate CTO at SADA, hosts episode 166 of Cloud N Clear to discuss all things data in a hybrid cloud world. He is joined by Adrian Estala, VP Field CDO at Starburst – a data-driven company offering a full-featured data lake analytics platform, built on open source Trino. Learn more about simplifying customer pipelines to make data more useful, insights on Data Lakehouse tools, and how to get your data 'right' and get it right fast. Join us in this engaging episode, and don't forget to LIKE, SHARE, & SUBSCRIBE for more enlightening content! ✅

Ready, Set, Cloud Podcast!
How Scanner Built an Ultra-Fast Serverless Data Lake with Cliff Crosland

Ready, Set, Cloud Podcast!

Play Episode Listen Later Oct 20, 2023 27:42


Have you ever wondered why querying your data lake is so slow? Or, if you're like Allen, did you ever wonder what a data lake actually is? Join Cliff Crosland as he explains how the Scanner team has changed data lakes forever by going serverless. This episode is a showcase of some brilliant engineering to solve a problem in a serverless manner. About Cliff Cliff is the CEO and co-founder of Scanner.dev, a security data lake product built for scale, speed, and cost efficiency. Prior to founding Scanner, he was a Principal Engineer at Cisco where he led the backend infrastructure team for the Webex People Graph. He was also the engineering lead for the data platform team at Accompany before its acquisition by Cisco. He has a love-hate relationship with Rust, but it's mostly love these days. Links Twitter - https://twitter.com/CliftonCrosland LinkedIn - https://www.linkedin.com/in/cliftoncrosland Scanner - https://scanner.dev Blog - Serverless Speed: Rust vs Go, Java, and Python - https://blog.scanner.dev/serverless-speed-rust-vs-go-java-python-in-aws-lambda-functions --- Send in a voice message: https://podcasters.spotify.com/pod/show/readysetcloud/message Support this podcast: https://podcasters.spotify.com/pod/show/readysetcloud/support

The Cloud Pod
230: If I Ever Own a Sailboat, I Will Name it Kafka… and Sail it on a Data Lake

The Cloud Pod

Play Episode Listen Later Oct 11, 2023 54:50


Welcome to The Cloud Pod episode 230, where the forecast is always cloudy! This week we're sailing our pod across the data lake and talking about updates to managed delivery from Kafka. We also take a gander at Bedrock, as well as some new security tools from our friends over at Google. We're also back with our Cloud Journey Series, talking security theater. Stay Tuned! Titles we almost went with this week:

Microsoft Mechanics Podcast
Automate data-driven actions | Data Activator in Microsoft Fabric

Microsoft Mechanics Podcast

Play Episode Listen Later Oct 5, 2023 8:31


React fast to changes in data with an automated system of detection and action using Data Activator. Monitor and track changes at a granular level as they happen, instead of at an aggregate level where important insights can stay buried in the detail until they have already become a problem. For domain experts, this provides a no-code way to take data, whether real-time streaming from your IoT devices or batch data collected from your business systems, and dynamically monitor patterns by establishing conditions. When these conditions are met, Data Activator automatically triggers specific actions, such as notifying dedicated teams or initiating system-level remediations. Join Will Thompson, Group Product Manager for Data Activator, as he shares how to monitor high volumes of granular operational data and translate them into specific actions. ► QUICK LINKS: 00:00 - Monitor and track operational data in real-time 00:53 - Demo: Logistics company use case 02:49 - Add a condition 04:04 - Test actions 04:36 - Batch data 06:21 - Trigger an automated workflow 07:12 - How it works 08:12 - Wrap up ► Link References Get started at https://aka.ms/dataActivatorPreview Check out the Data Activator announcement blog at https://aka.ms/dataActivatorBlog ► Unfamiliar with Microsoft Mechanics? As Microsoft's official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft. • Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries • Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog • Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast ► Keep getting this insider knowledge, join us on social: • Follow us on Twitter: https://twitter.com/MSFTMechanics • Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/ • Enjoy us on Instagram: https://www.instagram.com/msftmechanics/ • Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics
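Data Activator itself is a no-code experience, so there is no SDK call to show here. Purely as a conceptual sketch of the condition-then-action pattern it automates, here is a hypothetical Python illustration; the event shape, threshold, and notify helper are invented and are not part of any Microsoft API.

    from typing import Dict, Iterable

    TEMP_LIMIT_C = 8.0  # hypothetical cold-chain threshold for the logistics demo

    def notify(team: str, message: str) -> None:
        # Stand-in for an email, Teams message, or webhook call.
        print(f"[alert -> {team}] {message}")

    def monitor(events: Iterable[Dict]) -> None:
        for event in events:
            # The condition is evaluated per event (granular), not on an aggregate.
            if event["temperature_c"] > TEMP_LIMIT_C:
                notify("logistics-ops",
                       f"package {event['package_id']} reached "
                       f"{event['temperature_c']} C (limit {TEMP_LIMIT_C} C)")

    monitor([
        {"package_id": "PKG-001", "temperature_c": 4.2},
        {"package_id": "PKG-002", "temperature_c": 9.7},  # triggers the action
    ])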

Defense in Depth
How to Prime Your Data Lake

Defense in Depth

Play Episode Listen Later Sep 14, 2023 27:18


All links and images for this episode can be found on CISO Series. A security data lake, a repository of all the data you need to analyze and have analyzed, sounds wonderful. But priming that lake, and stocking it with the data you need to get the insights you want, is a more difficult task than it seems. Check out this post for the discussion that is the basis of our conversation on this week's episode co-hosted by me, David Spark (@dspark), the producer of CISO Series, and Geoff Belknap (@geoffbelknap), CISO, LinkedIn. Joining us is our sponsored guest, Matt Tharp, Head of Field Engineering, Comcast DataBee. Thanks to our podcast sponsor, Comcast Technology Solutions. In this episode: What exactly is a data lake? How are people thinking about and handling the risks? If you want security data lakes to be successful, what customer problem are you trying to solve? How can you make it both dead simple to use AND highly effective?

AI Today Podcast: Artificial Intelligence Insights, Experts, and Opinion
AI Today Podcast: AI Glossary Series – Data Warehouse, Data Lake, Extract Transform Load (ETL)

AI Today Podcast: Artificial Intelligence Insights, Experts, and Opinion

Play Episode Listen Later Sep 8, 2023 16:07


In this episode of the AI Today podcast hosts Kathleen Walch and Ron Schmelzer define the terms Data Warehouse, Data Lake, and Extract Transform Load (ETL), explain how these terms relate to AI, and why it's important to know about them. Show Notes: FREE Intro to CPMAI mini course CPMAI Training and Certification AI Glossary AI Glossary Series – DevOps, Machine Learning Operations (ML Ops) AI Glossary Series – Automated Machine Learning (AutoML) AI Glossary Series – Data Preparation, Data Cleaning, Data Splitting, Data Multiplication, Data Transformation AI Glossary Series – Data Augmentation, Data Labeling, Bounding box, Sensor fusion AI Glossary Series – Data, Dataset, Big Data, DIKUW Pyramid Continue reading AI Today Podcast: AI Glossary Series – Data Warehouse, Data Lake, Extract Transform Load (ETL) at Cognilytica.
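Since Extract Transform Load is one of the terms defined in this glossary episode, a toy ETL pass may help make it concrete. The file name, columns, and cleanup rule below are invented purely for illustration.

    import csv
    import sqlite3

    def extract(path: str) -> list:
        # Extract: read raw rows from a source system export.
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows: list) -> list:
        # Transform: normalize names and drop rows with a missing amount.
        return [(r["customer"].strip().title(), float(r["amount"]))
                for r in rows if r.get("amount")]

    def load(rows: list, db_path: str = "warehouse.db") -> None:
        # Load: write the cleaned rows into the analytical target.
        with sqlite3.connect(db_path) as conn:
            conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
            conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    if __name__ == "__main__":
        load(transform(extract("sales.csv")))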

FemInnovation
Ep. 7: Data Donation and AI in Women's Health: A Conversation with Ligia Kornowska

FemInnovation

Play Episode Listen Later Sep 6, 2023 31:54


In this episode of FemInnovation, host Bethany Corbin sits down with Ligia Kornowska, an internationally recognized leader in AI in medicine, telemedicine, and medical data, and a 2x Forbes 30 under 30 honoree. Ligia is the Managing Director of the Polish Hospital Federation (the largest hospital organization in Poland), Co-Founder and Chair of the Board of Data Lake, and President of the Donate Your Data Foundation. She is also a recipient of the ESET Heroes of Progress award and has been listed among the 100 most influential people in Polish healthcare for the last 3 years. In this episode, Bethany and Ligia discuss democratizing access to medical data (with patient consent!) to help advance women's healthcare and close the gender data gap. They break down the use of blockchain technology to give medical organizations and startups access to essential data to help improve, test, and refine their products and offerings. Bethany and Ligia also discuss best practices and concerns with AI that femtech and healthtech startups should know, and the importance of breaking into emerging femtech markets worldwide. RELEVANT LINKS: More about Data Lake at this website. More about Ligia's work at the International Hospital Federation at this website. Connect with Our Guest, Ligia Kornowska: LINKEDIN | INSTAGRAM | TWITTER/X Follow Our Host: WEBSITE | LINKEDIN | TWITTER | INSTAGRAM | ADDITIONAL LINKS

Who's your Data? Podcast
Ep21: Ethical Sourcing of Medical Data

Who's your Data? Podcast

Play Episode Listen Later Aug 28, 2023 47:54


Progress in healthcare and medical research requires a lot of data. In the era of Big Data, health data is among the most valuable and most private information anyone can have. Two major hurdles in finding quality medical data are getting access to that data in a private and ethical way, and bias in the data due to the underrepresentation of women and of certain racial or ethnic groups. In this episode I talk to Dinidh O'Brien about how donateyourdata.org and Data Lake are trying to solve this problem with a patient-first approach to sourcing medical data for research. Data Lake is an EU-funded start-up creating a global medical data donation system based on blockchain technology, with privacy and informed consent as fundamental pillars. We discuss this data donation framework and how it addresses the issues of privacy, consent, and data monetization while working to minimize biases. Dinidh explains how they approach patients to opt in, how they vet organizations that request access to this data, and how they plan to expand throughout Europe and the US.

RunAs Radio
Microsoft Fabric with Andrew Snodgrass

RunAs Radio

Play Episode Listen Later Aug 23, 2023 41:27


What is Microsoft Fabric, and why do you want some? Richard talks to Andrew Snodgrass of Directions on Microsoft about Microsoft's recently announced Fabric product. Andrew explains that Fabric is an effort to integrate the various data products, including Power BI, Data Lake, Data Factory, and Data Warehousing, under a single banner. It is early days for Fabric, but it's a great time to take it out for a spin for those who haven't dug into Azure data analytics products. But if you have existing implementations of Power BI and many other data products, test carefully - the migration paths aren't simple! Links: Microsoft Fabric; Azure Synapse Analytics; Azure Data Factory; Power BI; Fabric Workspaces; OneLake; Kusto Query Language (KQL); Parquet Files in Fabric; Microsoft Purview; OneLake File Explorer. Recorded July 12, 2023

Data Protection Gumbo
205: Plumbing the Depths of Unstructured Data - Superna

Data Protection Gumbo

Play Episode Listen Later Jul 18, 2023 28:00


Alex Hesterberg, CEO at Superna, embarks on a captivating exploration of the expanding world of unstructured data and data security trends. The discussion gets fervid as Alex enlightens us about the amplified use of unstructured data platforms in recovery and resiliency, along with its application in tier-zero platforms like SAP HANA. His insights about data integration into business intelligence tools and the potential of AI and ML technologies like ChatGPT are truly riveting.

Equity
Is ChatGPT the iBeer of LLMs?

Equity

Play Episode Listen Later Jul 12, 2023 32:01


This week we had a very special guest on the podcast: Matthew Lynley, one of the founding hosts of Equity and a former TechCruncher. Since his Equity days, Lynley went off and started his very own AI-focused publication called Supervised. We brought him back on the show to ask him questions in a format where we can all learn together. Here's what we got into: From Transformers to GPT4: How attention became so critical inside of neural networks, and how transformers set the path for modern AI services. Recent acquisitions in the AI space, and what it means for the "LLM stack": With Databricks buying MosaicML and Snowflake already busy with its own checkbook, a lot of folks are working to build out a full-stack LLM data extravaganza. We talked about what that means. Where startups sit in the current AI race: While it's great to think about the majors, we also need to know what the startup angle is. The answer? It's a little early to say, but what is clear is that startups are taking some big swings at the industry and are hellbent to snag a piece of the pie. Thanks to everyone for hanging out with us. Equity is back on Friday for our weekly news roundup! For episode transcripts and more, head to Equity's Simplecast website. Equity drops at 7 a.m. PT every Monday, Wednesday and Friday, so subscribe to us on Apple Podcasts, Overcast, Spotify and all the casts. TechCrunch also has a great show on crypto, a show that interviews founders, one that details how our stories come together and more!

Programmatic Digest's podcast
127. Data Privacy, Data Clean Rooms, Identity Discussion with U of Digital's Myles Younger

Programmatic Digest's podcast

Play Episode Listen Later Jun 20, 2023 54:50


When this podcast was launched back in 2018, one of the biggest reasons was to share knowledge and highlight diversity. In the last 4-5 years, it grew into a community where we meet weekly and talk all things programmatic activations and industry trends.  With that said, one of our goals in 2023, was to invite more guests during the free community where members would have the opportunity to learn and ask questions directly.  Myles Younger joined us in our weekly community call, aka the Programmatic Meetup. In this episode, Myles talks about data privacy, data clean rooms, and identity from definition to hot takes. At the latter part of the episode, some of our ninjas had the chance to ask questions directly to Myles and discuss as a group. Truly a wonderful opportunity and experience!   Thanks to our friend at U of Digital!    About Us: Our mission is to teach historically excluded people how to get started in programmatic media buying and find a dream job.  We do so by providing on-demand lessons via the Reach and Frequency™️ program, a dope community with like-minded programmatic experts, and live free and paid group coaching.  Hélène Parker has over 10 years of experience in programmatic media buying, servicing agencies and brands in activation, strategy and planning, and leadership.  She now dedicates her time to recruiting and training programmatic traders while consulting companies on how to grow and scale a programmatic department.  Interested in training or hiring programmatic juniors? Book a Free Call   Timestamp: 00:00:29 - 2 Wins and a Challenge 00:03:42 - Myles Younger Introduction  00:06:25 - Myles' shift into programmatic 00:07:52 - Defining programmatic to a 5 years old 00:15:40 - Latest important news about data privacy 00:21:28 - Changing the meaning of third party cookies 00:26:20 - Data Clean Rooms 00:31:18 - DMP Obsoletion 00:33:12 - Data Clean Rooms Accessibility 00:36:16 - Data Clean Rooms difference from DMP or Data Lake 00:39:04 - Question and Answer 00:50:54 - Words of Wisdom from Myles Younger   Interested in finding out if you are a fit for a career in digital advertising and programmatic? Take our free Quiz: www.heleneparker.com/programmaticquiz   Guest Information: Myles Younger LinkedIn U of Digital Website | LinkedIn | Newsletter Meet Our Team: Hélène Parker - Chief Programmatic Coach Website | LinkedIn | Twitter | The Reach & Frequency Course Programmatic Digest - Youtube | LinkedIn | Instagram Alexa Gabrielle Ramos - Podcast Editor Instagram | Website | LinkedIn  S and S Creative Media - Podcast and Media Manager Instagram | Website | LinkedIn   Get this directly in your inbox weekly including more gems!  Let's keep in touch: Sign up to receive our weekly newsletter here: www.heleneparker.com/newsletter Join our next training program by signing up to our waitlist below: https://www.heleneparker.com/waitlist/     Also take a moment to check out: How To Optimise Data Segment: https://youtu.be/boj0SJF5kn8  Join Our Slack Channel for programmatic ninjas looking to level up and build a network: https://join.slack.com/t/theprogrammaticmeetup/shared_invite/zt-1nlaoighs-ES98OYwn67rkk1vqgC4i9Q  

Engenharia de Dados [Cast]
Simplify Data Engineering Projects in Your Lakehouse with Delta Lake Framework with Matthew Powers & Denny Lee, Developer Advocates at Databricks

Engenharia de Dados [Cast]

Play Episode Listen Later May 23, 2023 72:32


In today's episode, Luan Moreno and Mateus Oliveira interview Denny Lee and Matthew Powers, currently Developer Advocates at Databricks. Delta Lake is an open-source product, developed by the company founded by the creators of Apache Spark, that lets us build the well-known Data Lakehouse pattern {Data Lake + Data Warehouse}. Delta Lake solves Apache Spark's storage problem, handling data processing in the Data Lake in an optimized way. With Delta Lake you get the following benefits: a file format that behaves like a table; Time Travel; ACID transactions; unified batch and streaming. In this conversation we also talk about the following topics: the state of the art in data; Delta Lake. Learn more about Delta Lake and how to use it as a Data Lakehouse technology, together with the Databricks team that does the most to energize the community with content, releases, and events in support of this open-source product. Denny Lee - Linkedin Matthew Powers - Linkedin https://delta.io/ Luan Moreno = https://www.linkedin.com/in/luanmoreno/

The My Love of Golf Podcast
Brooks win brings a full team PGA Championship Review. Teepster & Data Lake Tips with Rossco, Rocket & Magic. | THE MLOG PODCAST EP239

The My Love of Golf Podcast

Play Episode Listen Later May 23, 2023 69:31


PGA Championship Review: What a great tournament; it delivered right until the end. The team breaks down Brooks Koepka's win and reviews some of the other notable moments from the PGA Championship. Of course, we look through the Teepster results and check in on this week's Charles Schwab being held at Colonial.

Data Engineering Podcast
Keep Your Data Lake Fresh With Real Time Streams Using Estuary

Data Engineering Podcast

Play Episode Listen Later May 21, 2023 55:50


Summary Batch vs. streaming is a long running debate in the world of data integration and transformation. Proponents of the streaming paradigm argue that stream processing engines can easily handle batched workloads, but the reverse isn't true. The batch world has been the default for years because of the complexities of running a reliable streaming system at scale. In order to remove that barrier, the team at Estuary have built the Gazette and Flow systems from the ground up to resolve the pain points of other streaming engines, while providing an intuitive interface for data and application engineers to build their streaming workflows. In this episode David Yaffe and Johnny Graettinger share the story behind the business and technology and how you can start using it today to build a real-time data lake without all of the headache. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) Your host is Tobias Macey and today I'm interviewing David Yaffe and Johnny Graettinger about using streaming data to build a real-time data lake and how Estuary gives you a single path to integrating and transforming your various sources Interview Introduction How did you get involved in the area of data management? Can you describe what Estuary is and the story behind it? Stream processing technologies have been around for around a decade. How would you characterize the current state of the ecosystem? What was missing in the ecosystem of streaming engines that motivated you to create a new one from scratch? With the growth in tools that are focused on batch-oriented data integration and transformation, what are the reasons that an organization should still invest in streaming? What is the comparative level of difficulty and support for these disparate paradigms? What is the impact of continuous data flows on dags/orchestration of transforms? What role do modern table formats have on the viability of real-time data lakes? Can you describe the architecture of your Flow platform? What are the core capabilities that you are optimizing for in its design? What is involved in getting Flow/Estuary deployed and integrated with an organization's data systems? What does the workflow look like for a team using Estuary? How does it impact the overall system architecture for a data platform as compared to other prevalent paradigms? How do you manage the translation of poll vs. push availability and best practices for API and other non-CDC sources? What are the most interesting, innovative, or unexpected ways that you have seen Estuary used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Estuary? When is Estuary the wrong choice? What do you have planned for the future of Estuary? 
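Flow leans on JSON Schema (linked below) to keep continuously captured documents well typed as they move from sources to materializations. As a generic illustration of that idea, and not Estuary's actual catalog format, here is how one might validate a change-data-capture event against a schema in Python; the schema and event are invented.

    from jsonschema import ValidationError, validate

    # Invented schema for a CDC-style event; not Estuary's spec format.
    cdc_event_schema = {
        "type": "object",
        "required": ["op", "table", "after"],
        "properties": {
            "op": {"enum": ["c", "u", "d"]},   # create / update / delete
            "table": {"type": "string"},
            "after": {"type": "object"},
            "ts_ms": {"type": "integer"},
        },
    }

    event = {
        "op": "u",
        "table": "orders",
        "after": {"id": 42, "status": "shipped"},
        "ts_ms": 1700000000000,
    }

    try:
        validate(instance=event, schema=cdc_event_schema)
        print("event conforms to the collection schema")
    except ValidationError as err:
        print(f"rejected: {err.message}")

Enforcing a schema at capture time is what lets downstream consumers, whether a warehouse table or another stream, trust the shape of every document without re-validating it themselves.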
Contact Info Dave Y (mailto:dave@estuary.dev) Johnny G (mailto:johnny@estuary.dev) Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com)) with your story. To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links Estuary (https://estuary.dev) Try Flow Free (https://dashboard.estuary.dev/register) Gazette (https://gazette.dev) Samza (https://samza.apache.org/) Flink (https://flink.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/apache-flink-with-fabian-hueske-episode-57/) Storm (https://storm.apache.org/) Kafka Topic Partitioning (https://www.openlogic.com/blog/kafka-partitions) Trino (https://trino.io/) Avro (https://avro.apache.org/) Parquet (https://parquet.apache.org/) Fivetran (https://www.fivetran.com/) Podcast Episode (https://www.dataengineeringpodcast.com/fivetran-data-replication-episode-93/) Airbyte (https://www.dataengineeringpodcast.com/airbyte-open-source-data-integration-episode-173/) Snowflake (https://www.snowflake.com/en/) BigQuery (https://cloud.google.com/bigquery) Vector Database (https://learn.microsoft.com/en-us/semantic-kernel/concepts-ai/vectordb) CDC == Change Data Capture (https://en.wikipedia.org/wiki/Change_data_capture) Debezium (https://debezium.io/) Podcast Episode (https://www.dataengineeringpodcast.com/debezium-change-data-capture-episode-114/) MapReduce (https://en.wikipedia.org/wiki/MapReduce) Netflix DBLog (https://netflixtechblog.com/dblog-a-generic-change-data-capture-framework-69351fb9099b) JSON-Schema (http://json-schema.org/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

The CTO Advisor
Understanding the value of a Data Lake House

The CTO Advisor

Play Episode Listen Later May 17, 2023


Overview: This episode of the CTO Advisor podcast discusses IBM Watsonx.Data, a part of IBM's AI platform and toolset. The conversation focuses on the concept of a data lake house and its role in data governance and analytics. Tony Baer, …

Facts Not Feelings with Brooke C. Furniss
Data Activation in Automotive Industry: Myths, Risks, and Best Practices with Brian Davis of Orbee

Facts Not Feelings with Brooke C. Furniss

Play Episode Listen Later May 4, 2023 60:54


Welcome to another episode of Facts Not Feelings. Join us as we sit down with Brian Davis, the Vice President of Sales and Solutions at Orbee, who has extensive experience in helping automotive companies leverage their data for business success. In this episode, we explore the challenges faced by marketers in activating a data lake, the importance of vendor collaboration, and the potential risks associated with data activation. We also discuss the role of AI and machine learning in this field, as well as the evolution of first-party data usage and the ethical and transparent use of data. As the automotive industry continues to evolve and become increasingly data-driven, the insights and best practices shared by Brian will be invaluable to anyone looking to succeed in this space. So don't miss out on this opportunity to learn from a true expert in the field - tune in to this episode of Facts Not Feelings now! Connect with Brian Davis: https://qrco.de/bdujPL Let BZ Consultants Inspect What Should Be Expected

Datacenter Technical Deep Dives
Building a Security Data Lake in AWS with Richard Fan

Datacenter Technical Deep Dives

Play Episode Listen Later Apr 4, 2023 60:24


Richard Fan is an AWS Community Builder. In this episode he talks about architectural best practices when building a data lake dedicated to security concerns. Resources: https://www.linkedin.com/in/richardfan1126/ https://twitter.com/richardfan1126
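To make the idea of a security data lake on AWS a little more concrete, here is a hedged sketch of one common building block: running an Athena query over logs stored in S3 with boto3. The database, table, and bucket names are placeholders, and the episode's actual architecture may differ.

    import time
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Find the API calls that most often get denied, e.g. across CloudTrail logs.
    resp = athena.start_query_execution(
        QueryString="""
            SELECT eventname, count(*) AS calls
            FROM cloudtrail_logs
            WHERE errorcode = 'AccessDenied'
            GROUP BY eventname
            ORDER BY calls DESC
            LIMIT 10
        """,
        QueryExecutionContext={"Database": "security_lake"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )

    query_id = resp["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
        for row in rows:
            print([col.get("VarCharValue") for col in row["Data"]])

Keeping the raw logs in S3 and querying them in place is what makes this pattern cheap to store and easy to scale compared with shipping everything into a traditional SIEM.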

The Cloudcast
Data Lakehouses and Apache Hudi

The Cloudcast

Play Episode Listen Later Feb 15, 2023 30:59


Kyle Weller (@KyleJWeller, Head of Product @onehousehq) talks about the latest trends in OSS Data Lakes, Data Warehouses, and the evolution to "Data Lakehouses" with Apache Hudi. SHOW: 694. CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotw NEW TO CLOUD? CHECK OUT - "CLOUDCAST BASICS" SHOW SPONSORS: Datadog Synthetic Monitoring: Frontend and Backend Modern Monitoring - Ensure frontend issues don't impair user experience by detecting user-facing issues with API and browser tests with a free 14 day Datadog trial. Listeners of The Cloudcast will also receive a free Datadog T-shirt. Solve your IAM mess with Strata's Identity Orchestration platform - Have an identity challenge you thought was too big, too complicated, or too expensive to fix? Let us solve it for you! Visit strata.io/cloudcast to share your toughest IAM challenge and receive a set of AirPods Pro. How to Fix the Internet (A new podcast from the EFF). SHOW NOTES: Onehouse (homepage); Onehouse raises $25M Series A funding; Apache Hudi (homepage); Delta Lake (homepage); Apache Iceberg (homepage); Apache Hudi vs Delta Lake vs Apache Iceberg - Lakehouse Feature Comparison. Topic 1 - Welcome to the show. Tell us a little bit of your background, and where you focus your efforts at Onehouse? Topic 2 - Your focus is on an emerging open source project, Apache Hudi. Before we dive into the project and technologies, we're always interested in the background of what drove the creation of new projects. What problems existed before Hudi? Topic 3 - Let's dive into Hudi. Data lakes, Delta Lakes, Lake houses, Icebergs. What is going on with all these water metaphors? Topic 4 - Hudi is focused on streaming data lakes. What are some of the things (types of applications) that need a streaming data lake? Where do transactions come into play? Where do data warehouse capabilities come into play? Topic 5 - Stitching together open source projects and platforms can be complicated. How does the Onehouse platform simplify all of this for either data scientists or platform teams? Topic 6 - What are some examples of how companies are using Onehouse and Hudi today? FEEDBACK? Email: show at the cloudcast dot net Twitter: @thecloudcastnet
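As a rough companion to the conversation, here is a minimal sketch of writing to an Apache Hudi table from PySpark. The table name, record key, and path are invented, and exact options can vary by Hudi and Spark version; this is not code from the episode.

    from pyspark.sql import SparkSession

    # Assumes the Hudi Spark bundle jar is available on the classpath.
    spark = (SparkSession.builder
             .appName("hudi-sketch")
             .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
             .getOrCreate())

    df = spark.createDataFrame(
        [("trip-1", "2023-02-15 10:00:00", 12.5),
         ("trip-2", "2023-02-15 10:05:00", 3.9)],
        ["trip_id", "ts", "fare"],
    )

    hudi_options = {
        "hoodie.table.name": "trips",
        "hoodie.datasource.write.recordkey.field": "trip_id",
        "hoodie.datasource.write.precombine.field": "ts",
        "hoodie.datasource.write.operation": "upsert",
    }

    # Upserts (plus deletes and incremental pulls) are what make the table behave
    # like a streaming-friendly lakehouse rather than a pile of immutable files.
    df.write.format("hudi").options(**hudi_options).mode("append").save("/tmp/hudi/trips")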

Zavtracast (Завтракаст)
Magnitnoe Pole (Magnetic Field) #4 – Data Lake, Data Governance, and Data Management

Zavtracast (Завтракаст)

Play Episode Listen Later Dec 28, 2022 50:51


As 2022 comes to a close we are not saying goodbye; instead we are publishing the fourth episode of the Magnitnoe Pole ("Magnetic Field") podcast, which we produce together with the IT team of the retailer Magnit. As usual, we try to explain complex topics as simply as possible. This time we dig into the weeds of data management to figure out what Data Governance and a Data Lake are, how to validate data properly, how to store it, and why any of this is needed at all. Helping us is the guest of the fourth episode, Pavel Shorokhov, Chief Data Officer at Magnit. Among other things, he explains where that job title came from and what people who hold it are responsible for in large companies. It turned out to be fascinating, so we highly recommend it if you work in a field where you deal with terabytes, petabytes, or zettabytes of data. You can listen to this episode right in the Zavtracast podcast feed on any podcast service: https://podcast.ru/1068329384 You can also watch it on the Zavtracast YouTube channel: https://youtu.be/DjVfxSY9_PI You can find many interesting case studies and stories in the MagnIT blogs. On VC: https://vc.ru/magnit-tech On Habr: https://habr.com/ru/company/magnit Browse open positions and send your resume: https://magnit.tech The post Magnitnoe Pole #4 – Data Lake, Data Governance, and Data Management first appeared on Zavtracast.

Software Engineering Daily
Data Lake for Developers with Jorge Sancha

Software Engineering Daily

Play Episode Listen Later Sep 12, 2022 36:13


Data analytics technology and tools have seen significant improvements in the past decade. But it can still take weeks to prototype, build, and deploy new transformations and deployments, usually requiring considerable engineering resources. Plus, most data isn't real-time. Instead, most of it is still batch-processed. Tinybird Analytics provides an easy way to ingest and query The post Data Lake for Developers with Jorge Sancha appeared first on Software Engineering Daily.
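As a hedged illustration of the ingest side, the snippet below posts a JSON event to Tinybird's Events API over HTTP. The endpoint, the "name" query parameter, the data source name, and the token handling are assumptions to verify against Tinybird's current documentation; they are not taken from the episode.

    import json
    import os
    import requests

    # Token with append rights to the target data source (assumed env var name).
    TB_TOKEN = os.environ["TINYBIRD_TOKEN"]

    event = {"timestamp": "2022-09-12T10:00:00Z", "page": "/pricing", "duration_ms": 87}

    # Endpoint and "name" parameter are assumptions; confirm them in the docs.
    resp = requests.post(
        "https://api.tinybird.co/v0/events",
        params={"name": "page_views"},
        data=json.dumps(event),
        headers={"Authorization": f"Bearer {TB_TOKEN}"},
    )
    resp.raise_for_status()
    print(resp.status_code, resp.text)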