Podcasts about Apache Spark

  • 145 podcasts
  • 279 episodes
  • 45m average duration
  • Infrequent episodes
  • Latest: Jun 23, 2025

Popularity chart: 2017 to 2024



Latest podcast episodes about Apache Spark

How to B2B a CEO (with Ashu Garg)
How to Turn Research Into Real Companies | Ion Stoica, Co-founder and Executive Chairman, Databricks

Jun 23, 2025 (63:11)


My guest today is Ion Stoica, professor of computer science at UC Berkeley and the co-founder of Conviva, Databricks, and Anyscale. Over the last two decades, Ion's research labs - the AMP Lab, the RISE Lab, and now the Sky Computing Lab - have seeded a generation of category-defining companies. Ion has the unique ability to turn non-consensus ideas into durable businesses. He applied machine learning to video optimization with Conviva before AI became mainstream. He scaled Apache Spark into a $60B platform with Databricks. And now, with Anyscale, he's betting on Ray as the foundation for distributed AI workloads. In this episode, we dig into both sides of Ion's work: how to build world-class research labs, and how to turn research into real companies. His clarity of thought makes the future feel legible, and his track record suggests he's very often right. Hope you enjoy the conversation!
Chapters:
00:00 The Spark thesis: win the ecosystem first, monetize later
01:00 Intro: From lab to company - Ion's repeatable playbook
03:00 Did you always plan to become a founder, or did it just happen?
05:23 Let's start with Spark - how did the project come about?
13:04 What were the most important early decisions at Databricks?
23:49 You were the first CEO - what did you have to learn (or unlearn)?
30:01 How was building Anyscale different from building Databricks?
33:53 What's obvious to you about the future of AI that others miss?
37:31 Why AI works so well for code
41:00 The thesis behind OPAQUE Systems
44:06 Future infra will be heterogeneous, distributed, and vertically integrated
49:03 China's edge: faster diffusion from lab to market
53:19 Platform companies still work, but only with the right investors
55:57 What role did the Databricks Unit (DBU) play in value capture?
58:02 AI progress is plateauing, but adoption is just beginning

Engineering Kiosk
#177 Stream Processing & Kafka: The Foundation of Modern Data Pipelines with Stefan Sprenger

Jan 7, 2025 (67:40)


Data streaming and stream processing with Apache Kafka and its ecosystem. Many processes in software development, or in data processing, don't need to be handled synchronously at runtime; they can be processed asynchronously or in a decentralized way. Terms such as batch processing and message queueing / pub-sub are common here. But there is a third player in this game: stream processing. Here, Apache Kafka is the flagship: the distributed event streaming platform that is usually named first. But what actually is stream processing, and how does it differ from batch processing or message queuing? How does Kafka work, and why is it so successful and performant? What are brokers, topics, partitions, producers, and consumers? What does change data capture mean, and what is a sliding window? What do you need to watch out for, and what can go wrong, when you want to write and read a message? Our guest Stefan Sprenger provides the answers and much more. Bonus: how to explain stream processing to five-year-olds using a breakfast table. You can find our current advertising partners at https://engineeringkiosk.dev/partners
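The description above asks what a sliding window is. As a rough illustration only (plain Python, not the Kafka or Kafka Streams API), here is a minimal sketch of a per-key sliding-window count: every incoming event reports how many events with the same key arrived within the last `window_seconds`. It assumes events for a given key arrive in timestamp order; real stream processors also handle late and out-of-order data, which this sketch simply rejects. The class name, method names, and keys are hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Hypothetical per-key event counter over the window (t - window_seconds, t]."""

    def __init__(self, window_seconds: float) -> None:
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        # One deque of event timestamps per key, oldest on the left.
        self._events = defaultdict(deque)

    def add(self, key: str, timestamp: float) -> int:
        """Record one event and return the count for `key` in the current window."""
        events = self._events[key]
        # Assumption: per-key timestamps never go backwards (no late data).
        if events and timestamp < events[-1]:
            raise ValueError("out-of-order event for key " + repr(key))
        events.append(timestamp)
        # Evict events that have slid out of the window.
        cutoff = timestamp - self.window_seconds
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=10)
    for key, ts in [("sensor-a", 0), ("sensor-a", 5), ("sensor-b", 6),
                    ("sensor-a", 9), ("sensor-a", 10), ("sensor-a", 20)]:
        print(key, ts, counter.add(key, ts))
```

Keeping one deque per key means each event is appended and evicted at most once, so the work per event stays small no matter how long the stream runs; a windowed aggregation in a real stream processor additionally has to keep this state fault-tolerant across restarts.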

The Six Five with Patrick Moorhead and Daniel Newman
Sparking AI Innovation with Dell's Data Lakehouse - Six Five On The Road at SC24

Dec 30, 2024 (16:44)


Unstructured data such as video and audio is the next frontier for AI. David Nicholson is joined by Chad Dunn, Dell Technologies' Vice President of Product Management for Artificial Intelligence and Data Management, for a conversation on the strategic importance of high-quality data and on how the Dell Data Lakehouse supports AI workloads. Highlights include ⤵️
  • Data quality is paramount: "Garbage in, expensive garbage out" applies more than ever in the age of generative AI.
  • Dell's Data Lakehouse: This intelligent platform helps organizations extract, prepare, and analyze both structured and unstructured data for AI workloads, with tools like Apache Spark and Trino.
  • Customer experiences: The evolving landscape of data challenges in large enterprises.
  • Pushing the boundaries: Dell's approach to managing unstructured data and integrating AI Factory visions into Lakehouse functionalities.

AWS Morning Brief
re:Invent Begins

Dec 2, 2024 (11:38)


AWS Morning Brief for the week of December 2, with Corey Quinn. Links:
  • Amazon CloudWatch adds context to observability data in service consoles, accelerating analysis
  • Amazon Cognito introduces Managed Login to support rich branding for end user journeys
  • Amazon Cognito now supports passwordless authentication for low-friction and secure logins
  • Amazon Connect Email is now generally available
  • Amazon EBS announces Time-based Copy for EBS Snapshots
  • Amazon EC2 Auto Scaling introduces highly responsive scaling policies
  • Amazon EC2 Capacity Blocks now supports instant start times and extensions
  • Amazon ECR announces 10x increase in repository limit to 100,000
  • Amazon EFS now supports up to 2.5 million IOPS per file system
  • Amazon S3 now supports enforcement of conditional write operations for S3 general purpose buckets
  • Application Signals provides OTEL support via X-Ray OTLP endpoint for traces
  • AWS delivers enhanced root cause insights to help explain cost anomalies
  • Enhanced Pricing Calculator now supports discounts and purchase commitments (in preview)
  • AWS PrivateLink now supports cross-region connectivity
  • Announcing the new AWS User Notifications SDK
  • Announcing new feature tiers: Essentials and Plus for Amazon Cognito
  • Announcing Savings Plans Purchase Analyzer
  • Data Exports for FOCUS 1.0 is now in general availability
  • Introducing a new experience for AWS Systems Manager
  • Introducing generative AI troubleshooting for Apache Spark in AWS Glue (preview)
  • Understanding how certain database parameters impact scaling in Amazon Aurora Serverless v2
  • Analyzing your AWS Cost Explorer data with Amazon Q Developer: Now Generally Available
  • Your guide to AWS for Advertising & Marketing at re:Invent 2024
  • AWS IoT Services alignment with US Cyber Trust Mark
  • Streamlining AWS Organizations Cleanup Strategies
Sponsor: Wiz: wiz.io/lastweek

Open at Intel
AI, Community, and the Future of Generative Applications

Nov 27, 2024 (20:53)


In this engaging conversation at the All Things Open conference, Tim Spann, Principal Developer Advocate at Zilliz, discusses the importance of community collaboration in advancing AI technologies. He emphasizes the need for diverse perspectives in solving complex problems and highlights his work with the Milvus open source vector database. Tim also explains the evolving landscape of retrieval augmented generation (RAG) and its applications, and shares insights into the future of AI development. The conversation concludes on a lighter note, with Tim describing his creative use of Milvus in a fun Halloween project to catalog and identify ghosts.
00:00 Introduction
00:41 Meet Tim Spann: Principal Developer Advocate
01:35 The Importance of Community in AI
02:56 Advanced RAG and Multimodal Models
06:17 The Future of Agentic RAG
09:04 Challenges and Excitement in AI Development
13:35 Building AI the Right Way
17:50 Fun with AI: Capturing Ghosts
19:24 Conclusion and Final Thoughts
Guest: Tim Spann is a Principal Developer Advocate for Zilliz and Milvus. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, Delta Lake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal Developer Advocate at Cloudera, Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, Senior Solutions Engineer at Hortonworks, Senior Solutions Architect at AirisData, Senior Field Engineer at Pivotal, and Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit, and many more.
He holds a BS and MS in computer science.

Programmers Quickie
Trino versus Apache Spark

Nov 25, 2024 (23:02)



The New Stack Podcast
Is Apache Spark Too Costly? An AWS Engineer Tells His Story

Nov 21, 2024 (25:26)


Is Apache Spark too costly? Amazon Principal Engineer Patrick Ames tackled this question during an interview with The New Stack Makers, sharing insights into transitioning from Spark to Ray for managing large-scale data. Ames, described as a "go-to" engineer for exabyte-scale projects, emphasized a goal-driven approach to solving complex engineering problems, from simplifying daily chores to optimizing software solutions.
Initially, Spark was chosen at Amazon for its simplicity and open-source flexibility, allowing efficient merging of data with minimal SQL code. The team leveraged Spark in a decoupled architecture over S3 storage, scaling it to handle thousands of jobs daily. However, as data volumes grew to hundreds of terabytes and beyond, Spark's limitations became apparent. Long processing times and high costs prompted a search for alternatives.
Enter Ray, a unified framework designed for scaling AI and Python applications. After experimentation, Ames and his team noted significant efficiency improvements, driving the shift from Spark to Ray to meet scalability and cost-efficiency needs.
Learn more from The New Stack about Apache Spark and Ray:
  • Amazon to Save Millions Moving From Apache Spark to Ray
  • How Ray, a Distributed AI Framework, Helps Power ChatGPT
Join our community of newsletter subscribers to stay on top of the news and at the top of your game.

Jaani
Databricks

Nov 14, 2024 (3:24)


Databricks is a cloud-based platform built on Apache Spark that has become popular among companies for data processing, analytics, and machine learning.

Dev Sem Fronteiras
Solutions Architect at Databricks in Madrid, Spain - Dev Sem Fronteiras #165

Nov 7, 2024 (44:24)


Caio, born and raised in São Paulo, got into technology early thanks to his friends. By around age 14 he was already dabbling in programming, so it was no surprise that he chose the field for his degree. A desire to study abroad and his familiarity with Spain led him to do his master's and PhD there, before moving with his family to London, where he stayed for five years. Back in Spain thanks to an opportunity to work at Databricks, Caio began the current phase of his career, which also includes strong academic ties. In this episode, Caio talks about the differences between living in London and in Madrid, and describes his day-to-day life as a father, husband, professor, and solutions architect in the land of the Puerta del Sol. Fabrício Carraro, your polyglot traveler. Caio Moreno, Solutions Architect at Databricks in Madrid, Spain. Links: Databricks Data + AI World Tour; Data Saturday: Madrid. Check out Alura's Apache Spark with Python training track, step into the world of Big Data, and learn to build projects using Apache Spark and the Python language. TechGuide.sh, a map of the main technologies the market demands for different careers, with our suggestions and opinions. #7DaysOfCode: Put your programming knowledge into practice with free daily challenges. Visit https://7daysofcode.io/ Listeners of the Dev Sem Fronteiras podcast get 10% off all Alura Língua plans. Just go to https://www.aluralingua.com.br/promocao/devsemfronteiras/ and start learning English and Spanish today! Production and content: Alura Língua, online language courses – https://www.aluralingua.com.br/ Alura, online technology courses – https://www.alura.com.br/ Editing and sound design: Rede Gigahertz de Podcasts

MarTech Interviews
Pepperdata Capacity Optimizer: Cut Apache Spark Cluster Cloud Costs by Up to 47%

Oct 3, 2024


Maximizing the efficiency of cloud infrastructure is a constant challenge for businesses. From scaling application workloads to ensuring optimal resource allocation, managing cloud environments can quickly become complex and expensive. Companies often face bloated instance hours, inefficient resource utilization, and a need for constant manual tuning to keep everything running smoothly. Pepperdata Capacity Optimizer Pepperdata …

What's Next|科技早知道
S8E10 | Beneath the AI Hype, the Hidden War Between Two Data Giants | 硅谷徐老师

Jun 21, 2024 (48:28)


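When AI comes up, most people probably think first of the big tech companies competing over compute and algorithms. But behind compute and algorithms, another quiet war is heating up: the contest among AI data companies. Just a few days ago, Snowflake and Databricks, the two most influential companies in big data storage and cloud computing, each held their annual summit. At their summits, both companies presented the direction of their data ecosystems and how they plan to deliver better AI data services to enterprises. The surprise came when Databricks announced, during the summit, a costly acquisition of Tabular, the commercial company behind Iceberg, one of the three major open-source data communities in this space. The move has made relations between the two data giants even more tense, and Databricks, the later arrival, looks poised to overtake its rival. Both guests on this episode are practitioners with extensive experience and insight in AI data. Having just returned from the Snowflake and Databricks summits, they share observations and thoughts on how consensus around data + AI and enterprise AI is shifting. The episode uses many English technical terms; the 「声动活泼」 WeChat official account has also published a summary of the episode's key points. If you enjoyed this episode or are curious about its content, search for 「声动活泼」 on WeChat to read the latest article (https://mp.weixin.qq.com/s?__biz=MzIwMDczNTE3OQ==&mid=2247501751&idx=1&sn=d4f694182775514286d8b66494e626ee&chksm=96fa2713a18dae05e6a7ed74df24e025a7f5279a0930aeae78558a501264e703d535c7d0b0d6#rd).
People in this episode: 丁教 Diane, co-founder of 「声动活泼」 and host of 「科技早知道」; 硅谷徐老师, AI executive, serial entrepreneur, and visiting lecturer at Stanford (Xiaohongshu and WeChat Channels: 硅谷徐老师 | WeChat official account: 硅谷云 | YouTube: Byte into Future); 堵俊平, founder and CEO of Datastrato AI; Jack Song, Director of Engineering for Uber's data platform, formerly Director of Engineering for Airbnb's AI platform and Vice President of Technology for Data and AI at Mastercard.
Main topics:
[05:36] New trends in the data ecosystem seen at the Snowflake and Databricks summits: AI for data and open data catalogs
[09:50] Open data catalogs take off: unifying the data lakehouse architecture and bridging AI engines and data engines
[13:53] Engine diversity and data management needs drive a unified, independent open data catalog ecosystem
[19:28] Databricks acquires Tabular: will it stay neutral or become tied to commercial interests?
[23:14] Snowflake and Databricks quietly compete: will the Iceberg community develop healthily or split?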
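[25:10] Databricks and the Apache communities it manages: commercialization of an open-source community is an important sign of the community's healthy development
[29:56] Databricks's revenue is growing fast: its competitive strength comes from its open-source nature
[31:25] From data for AI to AI for data: new directions for data services in the GenAI era
[40:17] Semantic search is a breakthrough point for integrating AI and data
Selected terms
Snowflake: Snowflake is an American cloud-native data warehouse company founded in 2012 that went public in 2020. Its core product, the Snowflake cloud data platform, changed traditional data warehouse architecture; it is designed for the cloud and offers highly scalable, high-performance data storage and processing.
Databricks: Founded in 2013 by the creators of the open-source big data project Apache Spark, Databricks provides a big data processing and analytics platform. It has grown rapidly since its founding and is valued at more than $40 billion, but it is not yet public.
Iceberg community: Iceberg is an open-source data lake format project designed for big data analytics. Its goal is to simplify data lake management so that data engineers can work with data in a data lake as they would with a database. Tabular is the commercial company behind Iceberg, and Databricks's acquisition of Tabular has raised public concerns about Iceberg's open-source and neutral status.
Delta Lake: Delta Lake is a data storage project developed and open-sourced by Databricks to improve data lake management and performance. Delta Lake potentially competes with Iceberg.
Hudi (Hadoop Upserts Deletes and Incrementals): Like Iceberg and Delta Lake, Hudi is an open-source data lake community project. It aims to provide efficient inserts, updates, and deletes on large datasets while preserving the flexibility and scale of the data lake.
Open data catalog: An open data catalog is a data repository or platform aimed specifically at artificial intelligence and machine learning. Such catalogs focus on providing high-quality datasets for training algorithms, testing models, or driving research. By building and maintaining these catalogs, data companies promote data sharing, lower the barrier to obtaining data, and accelerate AI research and application innovation.
Semantic search: Semantic search is an advanced search technique. Unlike traditional keyword matching, it uses AI to understand and process natural language, aiming to grasp the intent and context behind a user's query and thereby return more accurate and relevant results.
Behind the scenes
Producers: Diane, 雅娴, 六工 | Post-production: Jack | Operations: George | WeChat official account: 东君, 六工 | Design: 饭团
Business inquiries
声动活泼 business cooperation inquiries (https://sourl.cn/6vdmQT)
Support us and join a new year of podcast innovation
In 2021 we launched the 「声动胡同会员计划」 (Hutong membership program), a pure support program that helps 「声动活泼」 keep exploring and innovating in podcast content. Looking back on 2023, thanks to this support, every 「声动活泼」 show kept breaking new ground: they made Apple China's annual list of popular shows and also ranked on platforms such as CPA and Ximalaya. Our new paid show for 2024, 「不止金钱」 (https://www.xiaoyuzhoufm.com/podcast/65a625966d045a7f5e0b5640), is now live; give it a listen. A new season of 「跳进兔子洞」 is also coming soon, so stay tuned!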
Hutong: https://files.fireside.fm/file/fireside-uploads/images/4/4931937e-0184-4c61-a658-6b03c254754d/Z0YbNKpo.png
Join us
声动活泼 is hiring for full-time positions: Program Producer, Talent Development Partner, and Business Development Manager. For details, click the link (https://sourl.cn/j8tk2g). If your résumé is ready, send it to hr@shengfm.cn with the subject line "Name + Position".
About 声动活泼
"Colliding with the world through sound": 声动活泼 is dedicated to providing a steady stream of food for thought.
Our other podcasts: 声动早咖啡 (https://www.xiaoyuzhoufm.com/podcast/60de7c003dd577b40d5a40f3), 声东击西 (https://etw.fm/episodes), 吃喝玩乐了不起 (https://www.xiaoyuzhoufm.com/podcast/644b94c494d78eb3f7ae8640), 反潮流俱乐部 (https://www.xiaoyuzhoufm.com/podcast/5e284c37418a84a0462634a4), 泡腾 VC (https://www.xiaoyuzhoufm.com/podcast/5f445cdb9504bbdb77f092e9), 商业WHY酱 (https://www.xiaoyuzhoufm.com/podcast/61315abc73105e8f15080b8a), 跳进兔子洞 (https://therabbithole.fireside.fm/), 不止金钱 (https://www.xiaoyuzhoufm.com/podcast/65a625966d045a7f5e0b5640)
Interact with us on Jike (https://okjk.co/Qd43ia), Weibo, and other social media; just search for 声动活泼.
We look forward to your emails at ting@sheng.fm.
声小音: https://files.fireside.fm/file/fireside-uploads/images/4/4931937e-0184-4c61-a658-6b03c254754d/gK0pledC.png Scan the code to add 声小音 and stay in touch with us outside the show.
Special Guests: Jack Song and 堵俊平.
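The glossary entry on semantic search contrasts it with keyword matching. Below is a toy sketch of that difference, assuming documents and the query have already been turned into embedding vectors. Real systems obtain these vectors from a learned embedding model; the three-dimensional vectors here are hand-made and purely hypothetical, as are the function names. Documents are ranked by cosine similarity.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical, hand-made "embeddings" whose dimensions roughly mean
# (vehicles, upkeep/repair, food). A real system would get these from a model.
DOCS: Dict[str, List[float]] = {
    "automobile maintenance": [0.9, 0.8, 0.0],
    "bicycle tyre repair": [0.6, 0.7, 0.0],
    "pasta recipes": [0.0, 0.1, 0.95],
}
QUERY_TEXT = "car repair"
QUERY_VEC = [0.85, 0.75, 0.0]  # hypothetical embedding of QUERY_TEXT


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_search(query: str, docs: Dict[str, List[float]]) -> List[str]:
    """Return titles that share at least one literal word with the query."""
    terms = set(query.lower().split())
    return [title for title in docs if terms & set(title.lower().split())]


def semantic_search(query_vec: Sequence[float],
                    docs: Dict[str, List[float]], top_k: int = 2) -> List[str]:
    """Return the top_k titles ranked by cosine similarity to the query vector."""
    ranked = sorted(docs, key=lambda t: cosine_similarity(query_vec, docs[t]),
                    reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    print("keyword: ", keyword_search(QUERY_TEXT, DOCS))
    print("semantic:", semantic_search(QUERY_VEC, DOCS))
```

Keyword matching finds only the bicycle entry, because it shares the word "repair" with the query, while vector similarity ranks "automobile maintenance" first even though it shares no words with "car repair".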

Engenharia de Dados [Cast]
What It's Like to Work with Apache Spark as a Beginner Data Engineer

May 29, 2024 (63:39)


In today's episode, Mateus Oliveira interviewed Ananda Ellen (Data Engineering), Leonardo Côco, and Victor Grutner, members of the data team at One Way Solution. Spark and data engineering for beginners are extremely relevant topics today; after all, we were all beginners once. In this conversation we talked about the challenges and achievements we experience while learning Apache Spark. We also talked about the analytics field from the perspective of Leonardo and Victor, consultants at One Way Solution. In this podcast we talk about: the analytics field; Apache Spark for beginners. The main purpose of this podcast is to show the challenges that beginning data professionals have in common, and how you can overcome them, drawing on the experience of people who have already been through it. Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Programmers Quickie
Trino versus Apache Spark

Apr 22, 2024 (5:37)


MLOps.community
[Exclusive] Databricks Roundtable // Introducing DBRX: The Future of Language Models

Apr 12, 2024 (48:35)


Join us at our first in-person conference on June 25, all about AI Quality: https://www.aiqualityconference.com/
MLOps Coffee Sessions special episode with Databricks, Introducing DBRX: The Future of Language Models, fueled by our Premium Brand Partner, Databricks. DBRX is designed to be especially capable across a wide range of tasks and outperforms other open LLMs on standard benchmarks. It also promises to excel at code and math problems, areas where other models have struggled. Our panel of experts gets into the technical nuances, potential applications, and implications of DBRX for businesses, developers, and the broader tech community. This session is a great opportunity to hear from insiders about how DBRX's capabilities can benefit you.
// Bio
Denny Lee (co-host): Denny Lee is a long-time Apache Spark™ and MLflow contributor, Delta Lake maintainer, and Sr. Staff Developer Advocate at Databricks. He is a hands-on distributed systems and data science engineer with extensive experience developing internet-scale data platforms and predictive analytics systems. He previously built enterprise DW/BI and big data systems at Microsoft, including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server.
Davis Blalock: Davis Blalock is a research scientist and the first employee at MosaicML. He previously worked at PocketSonics (acquired 2013) and completed his PhD at MIT, where he was advised by John Guttag. He received his M.S. from MIT and his B.S. from the University of Virginia. He is a Qualcomm Innovation Fellow, NSF Graduate Research Fellow, and Barry M. Goldwater Scholar. He is also the author of Davis Summarizes Papers, one of the most widely-read machine learning newsletters.
Bandish Shah: Bandish Shah is an Engineering Manager at MosaicML/Databricks, where he focuses on making generative AI training and inference efficient, fast, and accessible by bridging the gap between deep learning, large-scale distributed systems, and performance computing. Bandish has over a decade of experience building systems for machine learning and enterprise applications. Prior to MosaicML, Bandish held engineering and development roles at SambaNova Systems, where he helped develop and ship the first RDU systems from the ground up, and at Oracle, where he worked as an ASIC engineer for SPARC-based enterprise servers.
Abhi Venigalla: Abhi is an NLP architect helping organizations build their own LLMs using Databricks. He joined as part of the MosaicML team and previously worked as a researcher at Cerebras Systems.
Ajay Saini: Ajay is an engineering manager at Databricks leading the GenAI training platform team. He was one of the early engineers at MosaicML (acquired by Databricks), where he first helped build and launch Composer (an open source deep learning training framework) and afterwards led the development of the MosaicML training platform, which enabled customers to train models (such as LLMs) from scratch on their own datasets at scale. Prior to MosaicML, Ajay was co-founder and CEO of Overfit, an online personal training startup (YC S20). Before that, Ajay worked on ML solutions for ransomware detection and data governance at Rubrik. Ajay has both a B.S. and MEng in computer science with a concentration in AI from MIT.
// MLOps Jobs board https://mlops.pallet.xyz/jobs
// MLOps Swag/Merch https://mlops-community.myshopify.com/
// Related Links
Website: https://www.databricks.com/
Databricks DBRX: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Oracle University Podcast
Best of 2023: Getting Started with Oracle Cloud Infrastructure

Nov 28, 2023 (13:26)


Oracle's next-gen cloud platform, Oracle Cloud Infrastructure, has been helping thousands of companies and millions of users run their entire application portfolio in the cloud. Today, the demand for OCI expertise is growing rapidly. Join Lois Houston and Nikita Abraham, along with Rohit Rahi, as they peel back the layers of OCI to discover why it is one of the world's fastest-growing cloud platforms.   Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, Kiran BR, Rashmi Panda, David Wright, the OU Podcast Team, and the OU Studio Team for helping us create this episode.   ------------------------------------------------------   Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started. 00:26 Lois: Welcome to the Oracle University Podcast. I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me today is Nikita Abraham, Principal Technical Editor. Nikita: Hi there! You're listening to our Best of 2023 series, where over the next few weeks, we'll be revisiting six of our most popular episodes of the year. 00:47 Lois: Today is episode 2 of 6, and we're throwing it back to our very first episode of the Oracle University Podcast. It was a conversation that Niki and I had with Rohit Rahi, Vice President, CSS OU Cloud Delivery. During this episode, we discussed Oracle Cloud Infrastructure's core coverage on different tiers. Nikita: But we began by asking Rohit to explain what OCI is and tell us about its key components. So, let's jump right in. 
01:14 Rohit: Some of the world's largest enterprises are running their mission-critical workloads on Oracle's next-generation cloud platform called Oracle Cloud Infrastructure. To keep things simple, let us break it down into seven major categories: Core Infrastructure, Database Services, Data and AI, Analytics, Governance and Administration, Developer Services, and Application Services.  But first, the foundation of any cloud platform is the global footprint of regions. We have many generally available regions in the world, along with multi-cloud support with Microsoft Azure and a differentiated hybrid offering called Dedicated Region Cloud@Customer.  01:57 Rohit: We have building blocks on top of this global footprint, the seven categories we just mentioned. At the very bottom, we have the core primitives: compute, storage, and networking. Compute services cover virtual machines, bare metal servers, containers, a managed Kubernetes service, and a managed VMware service.  These services are primarily for performing calculations, executing logic, and running applications. Cloud storage includes disks attached to virtual machines, file storage, object storage, archive storage, and data migration services. 02:35 Lois: That's quite a wide range of storage services. So Rohit, we all know that networking plays an important role in connecting different services. These days, data is growing in size and complexity, and there is a huge demand for a scalable and secure approach to store data. In this context, can you tell us more about the services available in OCI that are related to networking, database, governance, and administration? 03:01 Rohit: Networking features let you set up software-defined private networks in Oracle Cloud. OCI provides the broadest and deepest set of networking services with the highest reliability, most security features, and highest performance.  
Then we have database services. We have multiple flavors of database services, both Oracle and open source. We are the only cloud that runs Autonomous Databases and multiple flavors of them, including OLTP, OLAP, and JSON.  And then you can run databases on virtual machines, bare metal servers, or even Exadata in the cloud. You can also run open source databases, such as MySQL and NoSQL, in Oracle Cloud Infrastructure.  03:45 Rohit: For Data and AI services, we have a managed Apache Spark service called Data Flow, a managed service for tracking data artifacts across OCI called Data Catalog, and a managed service for data ingestion and ETL called Data Integration.  We also have a managed data science platform for machine learning models and training. We also have a managed Apache Kafka service for event streaming use cases.  Then we have Governance and Administration services. These services include security, identity, and observability and management. We have unique features like compartments that make it operationally easier to manage large and complex environments. Security is integrated into every aspect of OCI, whether it's automatic detection or remediation, what we typically refer to as Cloud Security Posture Management, robust network protection, or encryption by default.  We have an integrated observability and management platform with features like logging, logging analytics, Application Performance Management, and much more.  04:55 Nikita: That's so fascinating, Rohit. And is there a service that OCI provides to ease the software development process? Rohit: We have a managed low-code service called APEX, several other developer services, and a managed Terraform service called Resource Manager.  For analytics, we have a managed analytics service called Oracle Analytics Cloud that integrates with various third-party solutions.  
Under Application services, we have a managed serverless offering called Functions, an API Gateway, and an Events service to help you create microservices and event-driven architectures.  05:35 Rohit: We have a comprehensive connected SaaS suite across your entire business: finance, human resources, supply chain, manufacturing, advertising, sales, customer service, and marketing, all running on OCI.  That's a long list. And these seven categories and the services mentioned represent just a small fraction of more than 80 services currently available in OCI.  Fortunately, it is quick and easy to try out a new service using our industry-leading Free Tier account. We are the first cloud to offer a server for just a penny per core hour.  Whether you're starting with Oracle Cloud Infrastructure or migrating your entire data set into it, we can support you in your journey to the cloud.   06:28 Have an idea and want a platform to share your technical expertise? Head over to the new Oracle University Learning Community. Drive intellectual, free-flowing conversations with your peers. Listen to experts and learn new skills. If you are already an Oracle MyLearn user, go to MyLearn to join the Community. You will need to log in first. If you have not yet accessed Oracle MyLearn, visit mylearn.oracle.com and create an account to get started.  Join the conversation today! 07:04 Nikita: Welcome back! Now let's listen to Rohit explain the core constructs of OCI's physical architecture, starting with regions. Rohit: A region is a localized geographic area comprising one or more availability domains.  Availability domains are one or more fault-tolerant data centers located within a region but connected to each other by a low-latency, high-bandwidth network. A fault domain is a grouping of hardware and infrastructure within an availability domain to provide anti-affinity. So think about these as logical data centers.  
Today OCI has a massive geographic footprint, with multiple regions across the world. And we also have a multi-cloud partnership with Microsoft Azure. And we have a differentiated hybrid cloud offering called Dedicated Region Cloud@Customer.  08:02 Lois: But before we dive into the physical architecture, can you tell us… how does one actually choose a region?  Rohit: When choosing a region, you choose the region closest to your users for the lowest latency and highest performance. So that's a key criterion. The second key criterion is data residency and compliance requirements. Many countries have strict data residency requirements, and you have to comply with them. And so you choose a region based on these compliance requirements.  08:31 Rohit: The third key criterion is service availability. New cloud services are at times made available based on regional demand, regulatory compliance reasons, resource availability, and several other factors. Keep these three criteria in mind when choosing a region.  So let's look at each of these in a little bit more detail. Availability domain. Availability domains are isolated from each other, fault-tolerant, and very unlikely to fail simultaneously. Because availability domains do not share physical infrastructure, such as power or cooling or the internal network, a failure that impacts one availability domain is unlikely to impact the availability of others.  Say a particular region has three availability domains, and one of them has some kind of outage and is not available. The other two availability domains are still up and running.  09:26 Rohit: We talked about fault domains a little bit earlier. What are fault domains? Each availability domain has three fault domains. So think about fault domains as logical data centers within an availability domain.  We have three availability domains, and each of them has three fault domains. 
So the idea is that you put resources in different fault domains, and they don't share a single point of hardware failure, such as physical servers, physical racks, top-of-rack switches, or power distribution units. You can get high availability by leveraging fault domains.  We also leverage fault domains for our own services. In any region, resources in at most one fault domain are being actively changed at any point in time. This means that availability problems caused by change procedures are isolated at the fault domain level. Moreover, you can control which fault domain your compute or database instances are placed in at instance launch time, so you can specify which fault domain you want to use.  10:29 Nikita: So then, what's the general guidance for OCI users?  Rohit: The general guidance is that we have these constructs, like fault domains and availability domains, to help you avoid single points of failure. We do that ourselves: we make sure that the servers and the top-of-rack switches are all redundant, so that hardware failures are minimized as much as possible. You need to do the same when you are designing your own architecture.  So let's look at an example. You have a region. You have an availability domain. And as we said, one AD has three fault domains, so you see those fault domains here.  11:08 Rohit: The first thing you do when you create an application is create a software-defined virtual network. And let's say it's a very simple application: you have an application tier and a database tier.  The first thing you could do is run multiple copies of your application. So you have an application tier that is replicated across fault domains, and a database that is also replicated across fault domains.  11:34 Lois: What's the benefit of this replication, Rohit?  Rohit: Well, it gives you that extra layer of redundancy. 
So if something happens to a fault domain, your application is still up and running.  Now, to take it to the next step, you could replicate the same design in another availability domain. Then you would have two copies of your application running and two copies of your database running.  11:57 Now, one question that will come up is how you make sure your data is synchronized between these copies. You could use technologies like Oracle Data Guard to keep the data in your primary and standby databases in sync. That way, you can design architectures like these to avoid single points of failure. Even in regions that have a single availability domain, you can still leverage the fault domain construct to achieve high availability and avoid single points of failure.  12:31 Nikita: Thank you, Rohit, for taking us through OCI at a high level.  Lois: For a more detailed explanation of OCI, please visit mylearn.oracle.com, create a profile if you don't already have one, and get started on our free training on OCI Foundations.  Nikita: We hope you enjoyed that conversation. Join us next week for another throwback episode. Until then, this is Nikita Abraham... Lois: And Lois Houston, signing off! 12:57 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
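The replication pattern Rohit describes in the Oracle University episode above (spread each tier across fault domains, then repeat the same layout in a second availability domain) can be modeled in a few lines of plain Python. This is a hypothetical sketch, not OCI SDK code: it assumes three fault domains per availability domain labelled FAULT-DOMAIN-1 to FAULT-DOMAIN-3, and the AD names `AD-1`/`AD-2` and the helpers `place_tier` and `survivors` are invented for illustration. It checks that losing any single fault domain, or an entire availability domain, still leaves at least one replica of each tier running.

```python
from __future__ import annotations

from itertools import cycle
from typing import Dict, Optional, Tuple

# Assumption: each availability domain (AD) has three fault domains (FDs),
# labelled the way OCI names them. The AD names used below are hypothetical.
FAULT_DOMAINS = ("FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3")

Placement = Dict[str, Tuple[str, str]]  # replica name -> (AD, FD)


def place_tier(tier: str, ads: list[str], replicas_per_ad: int) -> Placement:
    """Repeat the same layout in every AD, spreading replicas round-robin over its FDs."""
    placement: Placement = {}
    for ad in ads:
        fds = cycle(FAULT_DOMAINS)
        for i in range(1, replicas_per_ad + 1):
            placement[f"{tier}-{ad}-{i}"] = (ad, next(fds))
    return placement


def survivors(placement: Placement, ad: str, fd: Optional[str] = None) -> list[str]:
    """Replicas still running after losing one FD in `ad` (fd given) or all of `ad` (fd=None)."""
    return [
        name
        for name, (r_ad, r_fd) in placement.items()
        if not (r_ad == ad and (fd is None or r_fd == fd))
    ]


ads = ["AD-1", "AD-2"]  # hypothetical AD names
app = place_tier("app", ads, replicas_per_ad=3)  # application tier
db = place_tier("db", ads, replicas_per_ad=2)    # database tier; keeping copies in sync is out of scope

# No single fault-domain outage, and no whole-AD outage, takes a tier down completely.
for tier in (app, db):
    for ad in ads:
        assert survivors(tier, ad), "a whole-AD outage must leave a replica running"
        for fd in FAULT_DOMAINS:
            assert survivors(tier, ad, fd), "a single-FD outage must leave a replica running"

print(app["app-AD-1-2"])  # ('AD-1', 'FAULT-DOMAIN-2')
```

Round-robin is just one simple anti-affinity policy. In OCI itself, as Rohit notes, you choose the fault domain for a compute or database instance when you launch it, and keeping database copies in sync is handled by a technology such as Oracle Data Guard rather than by placement.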

AWS Podcast
#628: Data on EKS

Oct 9, 2023 20:56


Organizations use their data to make better decisions and build innovative experiences for their customers. With the exponential growth in data, and the rapid pace of innovation in machine learning (ML), there is a growing need to build modern data applications that are agile and scalable. In this episode, Jillian is joined by Vara Bonthu, Principal Solutions Architect, and Alex Lines, Sr. Containers Specialist, to talk through why Kubernetes is becoming a popular choice for modernizing data applications, like batch processing and ML. They also discuss how AWS's open-source project, Data on EKS, helps customers build and test common use cases, like batch processing with Apache Spark or training an ML language model, to decrease the time it takes to get to production. Data on EKS website: https://awslabs.github.io/data-on-eks/ Data on EKS GitHub repository: https://github.com/awslabs/data-on-eks Data on EKS blog: https://go.aws/46bD1b8

Startup Field Guide by Unusual Ventures: The Product Market Fit Podcast
How open source AI will find product market fit: A conversation with Databricks, and AI startup Together

Sep 25, 2023 46:33


Open source AI models have become key drivers of innovation and collaboration. An increasing number of developers and end users are leveraging open source technologies, and open source AI has immense long-term potential.  In this episode, we are releasing a conversation on the future of open source AI between Wei Lien Dang (Unusual Ventures), Reynold Xin (Databricks), and Vipul Ved Prakash (Together). Join us as we discuss: 3:16 The rise of open source LLMs and foundation models; 7:23 Building open source AI platforms to serve customers; 10:35 Why Together and Databricks decided to build with open source; 13:33 LLMs and the need for standardization; 21:09 The role of academia in AI research and innovation; 26:57 Innovations in training data; 30:55 Making the decision to choose open source models; 36:52 Growing accessibility of machine learning with LLMs; 40:31 How the open source ecosystem will evolve in the future; 47:18 Best practices for parameterizing LLMs over time. Wei Lien Dang is a General Partner at Unusual Ventures and leads investments in infrastructure software, security, and developer tools.  Wei was a co-founder of StackRox, a cloud-native security company, prior to its acquisition by Red Hat. He can be reached at wei@unusual.vc and on Twitter and LinkedIn. Vipul Ved Prakash is the co-founder and CEO of Together. He was also the founder of Topsy and Cloudmark. Reynold Xin is a co-founder of Databricks. Last valued at $43B, Databricks has been a juggernaut data infrastructure business built on the Apache Spark analytics engine. They recently launched multiple AI products, including Lakehouse AI and their own open source LLM, Dolly. Unusual Ventures is a seed-stage venture capital firm designed from the ground up to give a distinct advantage to founders building the next generation of software companies. Unusual has invested in category-defining companies like Webflow, Arctic Wolf Networks, Carta, Robinhood, and Harness. 
Learn more about us at https://www.unusual.vc/. Further reading from Unusual Ventures: Why the future of AI-native infrastructure will be open; How good is your LLM? Nobody knows yet; What AI builders should know about data protection and privacy

The New Stack Podcast
How Apache Flink Delivers for Deliveroo

Sep 20, 2023 20:38


Deliveroo, a prominent food delivery company, relies on Apache Flink, a distributed processing engine, to power its three-sided marketplace connecting delivery drivers, restaurants, and customers. Seeking to improve real-time data streaming and gain insights into customer behavior, Deliveroo transitioned to Flink after comparing it with alternatives like Apache Spark and Kafka Streams. Flink offered feature parity with their previous platform, along with stability and scalability. They initially experimented with running Flink on Kubernetes but turned to Amazon Managed Service for Apache Flink (MSF) for better support and easier maintenance. Deliveroo engineers Felix Angell and Duc Anh Khu emphasized the need for flexibility in data modeling to keep pace with their fast product development. That flexibility adds complexity, however, often requiring data model adjustments. They expressed a desire for a self-serve configuration feature in MSF that would allow easy customization of low-level settings and auto-scaling based on application metrics. The move to Flink and MSF has let Deliveroo focus on core responsibilities like continuous integration and delivery while efficiently managing its data processing needs. Learn more from The New Stack about Apache Flink and AWS: Kinesis, Kafka and Amazon Managed Service for Apache Flink; Apache Flink for Real Time Data Analysis; Apache Flink for Unbounded Data Streams

The Cloud Pod
221: The Biggest Innovator in SFTP in 30 Years? Amazon Web Services!

Aug 7, 2023 53:37


Welcome to episode 221 of The Cloud Pod podcast, where the forecast is always cloudy! This week your hosts, Justin, Jonathan, Ryan, and Matthew, look at some of the announcements from AWS Summit, as well as try to predict the future (probably incorrectly) about what's in store at Next 2023. Plus, we talk more about the storm attack, SFTP connectors (and no, that isn't how you get to the Moscone Center for Next), Llama 2, Google Cloud Deploy, and more!  Titles we almost went with this week: Now You Too Can Get Ignored by Google Support via Mobile App; The Tech Sector Apparently Believes Multi-Cloud is Great… We Hate You All; The cloud pod now wants all your HIPAA Data; The Meta Llama is Spreading Everywhere; The Cloud Pod Recursively Deploys Deploy. A big thanks to this week's sponsor: Foghorn Consulting provides top-notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you have trouble hiring?  Foghorn can be burning down your DevOps and Cloud backlogs as soon as next week.

All TWiT.tv Shows (MP3)
FLOSS Weekly 743: Data Is Surprisingly Exciting

Aug 2, 2023 65:18


William Kwok speaks with Doc Searls and Shawn Powers about Apache SeaTunnel, an exciting and extremely useful open-source way to synchronize multiple databases. Hosts: Doc Searls and Shawn Powers Guest: William Kwok Download or subscribe to this show at https://twit.tv/shows/floss-weekly Think your open source project should be on FLOSS Weekly? Email floss@twit.tv. Thanks to Lullabot's Jeff Robbins, web designer and musician, for our theme music. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit

AI Today Podcast: Artificial Intelligence Insights, Experts, and Opinion
AI Today Podcast: AI Glossary Series – Machine Learning Tools: Keras, PyTorch, Scikit Learn, TensorFlow, Apache Spark, Kaggle

Jul 14, 2023 17:24


In this episode of the AI Today podcast, hosts Kathleen Walch and Ron Schmelzer define the machine learning tools Keras, PyTorch, Scikit Learn, TensorFlow, Apache Spark, and Kaggle, and explain how these terms relate to AI and why it's important to know about them. Show Notes: FREE Intro to CPMAI mini course; CPMAI Training and Certification; AI Glossary; Glossary Series: (Artificial) Neural Networks, Node (Neuron), Layer; Glossary Series: Bias, Weight, Activation Function, Convergence, ReLU; Glossary Series: Perceptron; Glossary Series: Hidden Layer, Deep Learning; Glossary Series: Loss Function, Cost Function & Gradient Descent; Glossary Series: Backpropagation, Learning Rate, Optimizer; Glossary Series: Feed-Forward Neural Network; AI Glossary Series – Machine Learning, Algorithm, Model

Foundr Magazine Podcast with Nathan Chan
466: Reshape Free Products into Revenue-Generators with Ali Ghodsi of Databricks

Jun 30, 2023 48:57


Ali Ghodsi was a reluctant founder. He planned to become an academic researcher and professor, not lead a successful tech startup. As a researcher at UC Berkeley, Ghodsi helped build Apache Spark, an open-source engine for large-scale data processing. In 2013, he and his fellow co-founders turned the research project into a business called Databricks. In 2016, he was picked as CEO and helped transform the open-source startup into a technology enterprise with a $38 billion valuation. Databricks boasts investors like Andreessen Horowitz, Microsoft, and Amazon.  Nathan and Ali discuss: being a reluctant startup co-founder; partnering with Andreessen Horowitz as their first investor; the pros and cons of having co-founders; the pressure of living up to early success; transforming an open-source startup into a revenue enterprise; the difference between professional and founder CEOs; how startups and small businesses can use AI tools right now; why product market fit is an art; how to work backward in your business; why you shouldn't listen to the consensus; and much more data, AI, and product advice… Who do you want to see next on the podcast? Comment and let us know! And don't forget to leave us a 5-star review if you loved this episode. Wait, there's more… If you enjoy the Foundr podcast, check out our free trainings. Get exclusive, actionable advice from some of the world's best entrepreneurs.  Speak with our friendly course experts to get clarity on the next steps for your idea, business, or career. You will get tailored insights from results achieved by our proven practitioners as well as thousands of students. Book a call now...  For more Foundr content, follow us on your favorite platform: Foundr.com, Instagram, YouTube, Facebook, Twitter, LinkedIn, Magazine

GOTO - Today, Tomorrow and the Future
Scaling Machine Learning with Spark • Adi Polak & Holden Karau

Jun 30, 2023 40:06 Transcription Available


This interview was recorded for the GOTO Book Club. gotopia.tech/bookclub
Read the full transcription of the interview here
Adi Polak - VP of Developer Experience at Treeverse & Contributing to lakeFS OSS
Holden Karau - Co-Author of "Kubeflow for Machine Learning" & many more books & Open Source Engineer at Netflix

DESCRIPTION
Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals, allowing data and ML practitioners to collaborate and understand each other better.
Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology.
You will:
• Explore machine learning, including distributed computing concepts and terminology
• Manage the ML lifecycle with MLflow
• Ingest data and perform basic preprocessing with Spark
• Explore feature engineering, and use Spark to extract features
• Train a model with MLlib and build a pipeline to reproduce it
• Build a data system to combine the power of Spark with deep learning
• Get a step-by-step example of working with distributed TensorFlow
• Use PyTorch to scale machine learning and its internal architecture
* Book description: © O'Reilly
The interview is based on the book "Scaling Machine Learning with Spark"

RECOMMENDED BOOKS
Adi Polak • Machine Learning with Apache Spark
Holden Karau, Trevor Grant, Boris Lublinsky, Richard Liu & Ilan Filonenko • Kubeflow for Machine Learning
Holden Karau • Distributed Computing 4 Kids
Holden Karau • Scaling Python with Dask
Holden Karau & Boris Lublinsky • Scaling Python with Ray
Holden Karau & Rachel Warren • High Performance Spark
Holden Karau, Konwinski, Wendell & Zaharia • Learning Spark
Holden Karau & Krishna Sankar • Fast Data Processing with Spark 2nd Edition
Holden Karau • Fast Data Processing with Spark 1st Edition

Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech
SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted almost daily

Cabeça de Lab
EXPLORING THE POWER OF APACHE SPARK

Jun 5, 2023 43:15


In today's episode, we dive into the world of Apache Spark and discover how it is revolutionizing large-scale data processing. Whether you're a big data enthusiast or just curious about the subject, this episode is for you. Come get to know the power of Spark and be amazed by it! So join us and let's listen to this really awesome conversation! --- Full editing by Rádiofobia Podcast e Multimídia: https://radiofobia.com.br/ --- Follow us on Twitter and Instagram: @luizalabs @cabecadelab Questions, blunders, and suggestions: send an email to cabecadelab@luizalabs.com Participants: MILENE MANCINI VASCONCELOS | https://www.instagram.com/m_mvasconcelos/ ANA GONÇALVES | https://br.linkedin.com/in/anaflavialg DENILSON FERNANDES | https://www.linkedin.com/in/denilson-fernandes-a5207724/ MATHEUS FERREIRA | https://www.linkedin.com/in/mateusmferreira/

Data Radicals
The Bazaar in the Cathedral with Matei Zaharia

May 24, 2023 49:47


When building a data platform, it's important to stay true to your vision. Whether that's through creating a definitive user experience or an open platform that allows people to build upon it, you're constructing a cathedral. This cathedral is sophisticated and dependable, and allows for a bazaar of business intelligence, machine learning, and AI use cases. In this episode, Satyen interviews Matei Zaharia, Chief Technologist and Co-founder of Databricks. Matei is an open source trailblazer and the creator of Apache Spark, a widely used framework for distributed data processing. He is also an Associate Professor of Computer Science at Stanford University, where he leads various data management and machine learning projects. Matei and Satyen discuss the Databricks and Alation partnership, explore how platforms can help companies own their data, and consider the value of democratizing open source large language models.
--------
“One of the early stories about open source has been this thing about the cathedral and the bazaar. The cathedral is the thing that's all designed by one person, maybe. It's extremely coherent and so on, but also takes forever to build. And when you go there, there's one message you're hearing. And then the bazaar is the open thing. You don't know who's going to show up each day, but there'll be some really interesting goods and things that you just wouldn't see anywhere else. If you just want to get started and get stuff done, follow the defaults in the product and it'll work. But, we want to be open to some of that innovation and let people bring that in.” – Matei Zaharia
--------
Time Stamps:
*(01:33): The story behind Spark
*(11:56): Solving for user problems versus product vision
*(20:12): The cathedral and the bazaar of open source
*(24:04): Matei explains the Databricks Unity Catalog
*(31:04): The Databricks and Alation partnership
*(43:36): The data culture at Databricks
*(48:21): Satyen's Takeaways
--------
Sponsor
This podcast is presented by Alation.
Learn more:
* Subscribe to the newsletter: https://www.alation.com/podcast/
* Alation's LinkedIn Profile: https://www.linkedin.com/company/alation/
* Satyen's LinkedIn Profile: https://www.linkedin.com/in/ssangani/
--------
Links
Follow Matei on LinkedIn
Follow Matei on Twitter
Learn more about Databricks's Unity Catalog
Learn more about Alation + Databricks

Engenharia de Dados [Cast]
Simplify Data Engineering Projects in Your Lakehouse with Delta Lake Framework with Matthew Powers & Denny Lee, Developer Advocates at Databricks

May 23, 2023 72:32


In today's episode, Luan Moreno and Mateus Oliveira interviewed Denny Lee & Matthew Powers, currently Developer Advocates at Databricks. Delta Lake is an open-source product, developed by the company founded by the creators of Apache Spark, that lets us implement the well-known Data Lakehouse {Data Lake + Data Warehouse}. Delta Lake solves a problem for Apache Spark: storing and processing data in the Data Lake in an optimized way. With Delta Lake, you get the following benefits: a file format that behaves like a table; Time Travel; ACID; unified Batch and Streaming. In this conversation we also talked about the following topics: the state of the art in data; Delta Lake. Learn more about Delta Lake and how to use a technology for the Data Lakehouse, together with the Databricks team that does the most to drive the community with content, releases, and events to help this open-source product. Denny Lee - LinkedIn; Matthew Powers - LinkedIn; https://delta.io/ Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Engenharia de Dados [Cast]
Spark on Kubernetes [SPOK] with Hudson Buzby, Solutions Architect at Spot.io

May 11, 2023 84:25


In today's episode, Luan Moreno, Mateus Oliveira & Tiago Xavier interviewed Hudson Buzby, currently a Solutions Architect at Spot by NetApp. SPOK, or Spark Operator on Kubernetes, is a deployment option for Apache Spark that uses a Kubernetes Operator to better manage the drivers and executors, using Kubernetes as scalable infrastructure. With SPOK, you get the following benefits: better use of scalable resources; lighter infrastructure; creation of Serverless services {Ocean}. In this conversation we also talked about the following topics: History; Spark Operator on Kubernetes; Ocean for Apache Spark; Tips from the trenches. In this session you will learn how to make better use of Apache Spark on Kubernetes and understand a bit more about why Spot's customers embarked on this journey. Spot; Hudson Buzby; Data Mechanics; Ocean for Apache Spark; Luan Moreno = https://www.linkedin.com/in/luanmoreno/

MLOps.community
The Birth and Growth of Spark: An Open Source Success Story // Matei Zaharia // MLOps Podcast #155

Apr 25, 2023 58:12


MLOps Coffee Sessions #155 with Matei Zaharia, The Birth and Growth of Spark: An Open Source Success Story, co-hosted by Vishnu Rachakonda. // Abstract We dive deep into the creation of Spark with its creator himself, Matei Zaharia, Chief Technologist at Databricks. This episode also explores the development of Databricks' other open source home run, MLflow, and the concept of "lake house ML". As a special treat, Matei talked to us about the details of the "DSP" (Demonstrate-Search-Predict) project, which aims to enable building applications by combining LLMs and other text-returning systems. // About the guest: Matei has the unique advantage of being able to see different perspectives, having worked in both academia and industry. He listens carefully to people's challenges and excitement about ML and uses this to come up with new ideas. As a member of Databricks, Matei also has the advantage of applying ML to Databricks' own internal practices. He is constantly asking the question "What's a better way to do this?" // Bio Matei Zaharia is an Associate Professor of Computer Science at Stanford and Chief Technologist at Databricks. He started the Apache Spark project during his Ph.D. at UC Berkeley, and co-developed other widely used open-source projects, including MLflow and Delta Lake, at Databricks. At Stanford, he works on distributed systems, NLP, and information retrieval, building programming models that can combine language models and external services to perform complex tasks. Matei's research work was recognized through the 2014 ACM Doctoral Dissertation Award for the best Ph.D. dissertation in computer science, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE). 
// MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links https://cs.stanford.edu/~matei/ https://spark.apache.org/ --------------- ✌️Connect With Us ✌️ ------------- Join our Slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/ Connect with Matei on LinkedIn: https://www.linkedin.com/in/mateizaharia/ Timestamps: [00:00] Matei's preferred coffee [01:45] Takeaways [05:50] Please subscribe to our newsletters, join our Slack, and subscribe to our podcast channels! [06:52] Getting to know Matei as a person [09:10] Spark [14:18] Open and freewheeling cross-pollination [16:35] Actual formation of Spark [20:05] Spark and MLflow Similarities and Differences [24:24] Concepts in MLflow [27:34] DJ Khaled of the ML world [30:58] Data Lakehouse [33:35] Stanford's unique culture of the Computer Science Department [36:06] Starting a company [39:30] Unique advice to grad students [41:51] Open source project [44:35] LLMs in the New Revolution [47:57] Type of company to start with [49:56] Emergence of Corporate Research Labs [53:50] LLMs size context [54:44] Companies to respect [57:28] Wrap up

The Cloud Pod
208: Azure AI Lost in Space

Apr 21, 2023 57:43


Welcome to the newest episode of The Cloud Pod podcast! Justin, Ryan and Matthew are your hosts this week as we discuss all the latest news and announcements in the world of the cloud and AI. Do people really love Matt's Azure know-how? Can Google make Bard fit into literally everything they make? What's the latest with Azure AI and their space collaborations? Let's find out! Titles we almost went with this week: Clouds in Space, Fictional Realms of Oracles, Oh My; The cloudpod streams lambda to the cloud. A big thanks to this week's sponsor: Foghorn Consulting provides top-notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you have trouble hiring?  Foghorn can be burning down your DevOps and Cloud backlogs as soon as next week.

The Secret To Success
100 Additional AI Tools That Are Not ChatGPT

Mar 21, 2023 73:49


Top 100 AI Tools that are not ChatGPT
https://www.youtube.com/watch?v=9y8aDC6Wbgk

What Are You Using ChatGPT For:
Help with a business plan
Help with e-books
Blogs
Creating a course
Show notes for a podcast
List of topics to discuss
Update résumé and cover letter
Write speeches
Complete outlines
Research for books
Find grants for minority women
Outline for karaoke
Writing letters to politicians

Video Editors & Generators
Synthesia = https://www.synthesia.io/ AI video creation is a time- and cost-efficient alternative to complex and costly traditional video creation processes. https://www.youtube.com/watch?v=UVNUCBUrHL0
Runway = https://runwayml.com/ Runway is a new kind of creative suite, one where AI is a collaborator and anything you can imagine can be created. https://www.youtube.com/watch?v=trXPfpV5iRQ
Descript = https://www.descript.com/ Descript is the only tool you need to write, record, transcribe, edit, collaborate, and share your videos and podcasts.
Nova AI = https://wearenova.ai/ Create stellar videos; cut, trim, and collide your clips; add subtitles, translate, and more. Entirely online, no installation needed.
Trint = https://trint.com/ A tool for generating captions from the voice in your video through quick speech recognition, auto-generating simple captions that can be easily altered and styled with different fonts, borders, and shadows.
Unscreen = https://www.unscreen.com/ Unscreen is an AI-powered online tool that helps you remove the background from videos and GIFs. With Unscreen, you can easily extract the foreground object and place it onto a new background of your choice.
Aimages = https://aimages.ai/ Aimages is an AI-powered platform that provides a range of image editing and processing services. It offers tools for image restoration, enhancement, colorization, and more.
Bhuman = https://www.bhuman.ai/ BHuman is an AI-powered platform for creating personalized videos at scale: you record a video once, and it generates personalized versions for each recipient.
Kaiber = https://www.kaiber.ai/ Kaiber is an AI-powered video generation platform that turns images, text prompts, and audio into animated videos.
Make-A-Video = https://makeavideo.studio/ A Meta AI system for creating videos from textual input, generating one-of-a-kind videos from just a few words or lines of text.
Papercup = https://www.papercup.com/ A tool for translating videos with expressive AI voices, enabling content owners and creators to reach large audiences within days using AI dubbing and localization.
Reface: Face Swap Videos = https://hey.reface.ai/ A face-swap smartphone app for swapping faces with friends or celebrities, putting your face into a pre-made film, and adding various effects, GIFs, and amusing videos.
Topaz Video AI = https://www.topazlabs.com/topaz-video-ai A video enhancement tool for de-interlacing, upscaling, and motion interpolation, with processing times optimized for modern workstations.

Image & Arts
MidJourney = https://www.midjourney.com/ MidJourney is an AI image generator that creates images from text prompts.
Dall-E2 = https://openai.com/product/dall-e-2 Dall-E2 is an AI-powered image generation tool developed by OpenAI. It uses a neural network to generate high-quality images from textual descriptions, allowing users to create realistic images of objects that don't exist in the real world.
Stable Diffusion = https://stablediffusionweb.com/ Stable Diffusion is an open text-to-image diffusion model; this site provides a web interface for generating images from text prompts.
Night Cafe Studio = https://nightcafe.studio/ Night Cafe Studio is an AI art generator that creates and stylizes images from text prompts.
Gaugan = http://gaugan.org/gaugan2/ Gaugan is an AI-powered platform that allows users to create photorealistic landscapes using a simple paintbrush interface. It uses machine learning to generate realistic textures and lighting effects, enabling users to create complex natural scenes without any prior knowledge of 3D modeling or rendering.
This Beach Does Not Exist = https://thisbeachdoesnotexist.com/ This Beach Does Not Exist is a website that uses AI to generate high-quality images of beaches that don't exist in the real world. Each time the page is refreshed, a new beach image is generated using a machine learning algorithm.
Neural.Love = https://neural.love/ Neural.Love is an AI-powered platform for generating images and enhancing photos and videos.
The Next Rembrandt = https://www.nextrembrandt.com/ The Next Rembrandt is an AI-powered project that used machine learning to create a new Rembrandt painting. The project analyzed Rembrandt's style and techniques and used that data to create a completely new and original painting in his style.
Let's Enhance = https://letsenhance.io/ Let's Enhance is an AI-powered platform that allows users to enhance and upscale their images without losing quality. It uses machine learning to remove noise and artifacts from images, increase resolution, and improve sharpness and detail.
Auto Draw = https://www.autodraw.com/ Auto Draw is an AI-powered drawing tool that uses machine learning to help users create professional-looking illustrations. It suggests relevant shapes and icons as users draw, making it easy to create complex designs quickly.
Playground AI = https://playgroundai.com/ Playground AI is a free online AI image generator that lets users create and edit images from text prompts.
Imagen = https://imagen-ai.com/ Imagen is an AI-powered platform that provides image analysis and classification services. It uses machine learning to identify and classify objects within images, making it a useful tool for a range of applications, from security and surveillance to marketing and advertising.
Artbreeder = https://www.artbreeder.com/ Artbreeder is an AI-powered platform that allows users to generate and manipulate images using machine learning. It allows users to mix and blend different images together to create unique and original artwork, and also offers tools for facial recognition and character creation.

Productivity
ChatGPT = https://openai.com/ ChatGPT is an OpenAI chatbot that uses the transformer architecture to generate human-like text in various styles and formats. It launched in November 2022 and has become a versatile tool for many use cases. Best of all, it's free to use!
Jasper = https://www.jasper.ai/ Jasper is a generative AI platform for businesses that helps teams create content 10x faster. With over 50 templates and AI trained on industry best practices, Jasper is a powerful tool for content creation.
Rewind = https://www.rewind.ai/ Rewind is a search engine that records everything you've seen, said, or heard on your computer and makes it searchable.
With mind-boggling compression, you can easily find what you need.TLDR This = https://tldrthis.com/AI writing tool that helps you summarize any piece of text into concise, easy-to-digest content. You can choose between short and detailed summaries to free yourself from information overload.Notion AI = https://www.notion.so/Notion AI is an AI-powered tool that can be directly leveraged within any Notion page. It helps automate tedious tasks, write faster, and even handle the first draft to augment your creativity.Lyric Studio = https://lyricstudio.com/Lyric Studio is a tool for songwriters and musicians that generates unique lyrics for any music genre. It offers multiple options based on your selected topic and helps you find rhymes for specific words with its smart suggestion feature. Collaborate in real-time with your co-writers using Lyric Studio.Noty.AI = https://noty.ai/Noty.AI is a platform that uses AI to automate the process of generating high-quality marketing copy for businesses. With its AI-powered copywriting technology, Noty.AI enables businesses to create marketing messages, ads, and product descriptions quickly and efficiently. The platform also offers a variety of tools to help businesses optimize their marketing campaigns, including analytics, testing, and targeting capabilities.Shortly = https://www.shortlyai.com/AI-powered writing tool that continues your writing for you when you run out of ideas or aren't sure about your writing style. It uses GPT-3 and can help you rewrite, shorten, or expand your sentences with simple commands.Rationale = https://rationale.jina.ai/Rationale is a revolutionary AI tool that assists business owners, managers, and individuals in making tough decisions. With this app, simply enter your pending decision and its AI-powered system will list pros and cons or generate a SWOT analysis to help you weigh your options.INK = https://inkforall.com/combines an AI writer, an SEO optimizer, and a content planner. 
Its technology crafts natural language optimization AI models to understand the meaning of content and uncover the nuances of what makes it perform. This app aims to replace multiple tools that writers already use and provide a smooth user experience that covers different aspects of writing. Vowel = https://www.vowel.com/a tool for remote teams to host, summarize, search, and share video meetings without any add-ons required. This AI tool helps you save time and catch up on meetings in seconds. Copy.AI = https://www.copy.ai/text generator perfect for marketers who write different types of copy. You can write 10x faster, engage your audience, and never struggle with the blank page again. Just tell it what you want, and the AI will create the marketing copy for you. Provide some input data and choose the right tone, and the AI will generate a few different versions of copy for you to choose from. DeepL = https://www.deepl.com/translatorDeepL is an exceptional machine translation tool that provides unparalleled accuracy and nuance. With DeepL Pro, you can translate quickly and focus on your work, no matter the language or location. DeepL Pro is secure, accurate, and customizable to meet your needs.WordTune = https://www.wordtune.com/WordTune uses advanced AI tools and language models to understand the context and meaning of written text. As the first AI-based writing companion, WordTune goes beyond simple grammar and spelling fixes to help you express your thoughts in writing. Piggy To = https://piggy.ai/Piggy is a mobile-friendly tool that helps you create engaging and shareable content. With just a prompt, Piggy generates multiple slides in a visually appealing format, saving you time and energy.Sudowrite = https://www.sudowrite.com/Sudowrite is a web-based writing tool that uses AI to assist users in improving their writing skills. 
It provides features such as grammar and style suggestions, tone analysis, and personalized feedback to help users enhance the clarity, coherence, and impact of their writing. Article Forge = https://www.articleforge.com/Article Forge is an AI writing tool that uses advanced deep learning to write entire articles automatically. From product descriptions to blog posts, Article Forge delivers high-quality, SEO-optimized content about any topic with just a single click.Grammarly = https://www.grammarly.comGrammarly with Grammar Lease is a new AI-powered app that helps you write with confidence. With auto-suggestions, you can go beyond grammar and spelling and work on style and tone. Whether you're writing emails, documents, or social media posts, Grammarly with Grammar Lease has got your back.Copy Monkey: https://www.copymonkey.ai/Copy Monkey is an AI-powered writing assistant that helps you write more effectively and efficiently by suggesting better phrasing and providing real-time feedback on your writing.Elephas: https://github.com/maxpumperla/elephasElephas is an open-source library that allows you to distribute deep learning models using Apache Spark. With Elephas, you can easily scale up your deep learning models to work with large datasets. MusicSoundDraw: https://www.sounddraw.com/SoundDraw is an AI-powered music creation platform that lets you draw your own melodies and rhythms, and then automatically generates full songs based on your inputs.JukeBox: https://openai.com/blog/jukebox/JukeBox is an AI-powered music generator developed by OpenAI. It can create original music in a variety of genres, and even generate lyrics to go along with the music.Harmonai: https://harmonai.com/Harmonai is an AI-powered tool that helps you write better harmonies for your music compositions. 
It uses machine learning algorithms to analyze your melodies and suggest harmonies that complement them.Aiva: https://www.aiva.ai/Aiva is an AI-powered music composer that can create original music for a variety of applications, including films, video games, and advertisements.Reffusion: https://reffusion.com/Reffusion is an AI-powered marketing platform that uses machine learning to optimize your marketing campaigns and improve your ROI.Supertone: https://www.supertone.ai/Supertone is an AI-powered sound design tool that helps you create custom sound effects for your film, television, or video game projects.Beatoven AI: https://beatoven.ai/Beatoven AI is an AI-powered music generator that lets you create original music using your voice or any other sound you can produce.Boomy: https://boomy.com/Boomy is an AI-powered music production platform that allows you to create original songs in minutes using simple drag-and-drop tools.Mubert: https://mubert.com/Mubert is an AI-powered music streaming platform that generates unique electronic music in real-time based on the listener's preferences.Design/Graphic DesignDesign Beast: https://designbeast.io/Design Beast is an AI-powered graphic design platform that helps you create professional-quality designs for your business or personal projects.FontJoy: https://fontjoy.com/FontJoy is an AI-powered font pairing tool that helps you choose the perfect font combinations for your design projects.Profile Picture AI: https://profilepicture.ai/Profile Picture AI is an AI-powered tool that automatically generates high-quality profile pictures for your social media accounts.Looka: https://looka.com/Looka is an AI-powered logo design platform that helps you create professional-quality logos for your business or personal projects.Beautiful AI: https://beautiful.ai/Beautiful AI is an AI-powered presentation design platform that helps you create stunning and effective presentations in minutes.Flair AI: 
https://github.com/flairNLP/flairFlair AI is an open-source natural language processing (NLP) library that allows you to perform a variety of NLP tasks, including sentiment analysis, named entity recognition, and text classification.Khroma: https://khroma.co/Khroma is an AI-powered color palette generator that helps you choose the perfect colors for your design projects.FontPair: https://fontpair.co/FontPair is an AI-powered font pairing tool that helps you choose the perfect font combinations for your design projects.Pikazo: https://pikazoapp.comPikazo is a mobile app that uses artificial intelligence to transform your photos into unique works of art.Jitter = https://jitter.video/Jitter helps you create animated designs in seconds. Perfect for animating interfaces or creating social media posts.BusinessResume.IO = Resume.io Helps you create a great resume and cover letter to set you apart from other job applicants. They offer 18 templates to choose from. Visit their website here: https://resume.io/NameLicks = https://namelicks.com/Name Licks uses artificial intelligence to generate short, brandable business names. Just enter your keywords, choose the level of randomness, and pick a naming style.  
Durable Gig (https://www.durablegig.com/): Durable Gig is an AI-powered platform for solo business owners that creates a fully designed website with copy, images, and a contact form in under a minute.
Textio (https://textio.com/): Textio uses AI to optimize job posts, emails, social posts, and more, with data-driven inclusion guidance that expands the candidate pool and keeps the candidate experience consistent.
Timely (https://timelyapp.com/): Timely automates company time tracking, automatically recording time spent in every web and desktop app for precise daily records.
Zia (https://www.zoho.com/creator/zia/): Zia is an AI-powered business assistant that can collect customer data, write documents, and help you find sales numbers easily.
Cresta (https://www.cresta.ai/): Cresta uses machine learning to give sales and service agents real-time guidance, improving customer service, sales, and customer satisfaction.
Ferret (https://ferret.ai/): Ferret is an AI app that provides relationship intelligence to help businesses avoid high-risk individuals and spot promising opportunities.
EchoWin (https://echowin.ai/): EchoWin uses AI to handle incoming calls, helping callers get answers to their questions, complete business tasks, or reach the appropriate person when necessary.
Boost.ai (https://www.boost.ai/): Boost.ai lets businesses build customized virtual assistants that answer frequently asked questions, provide customer support, or process transactions, and that integrate with various messaging channels.
Scale (https://scale.com/): Scale helps businesses get value from their AI investments faster by providing better data, leading to more performant models and faster deployment.
RAD AI (https://radai.ventures/): RAD AI analyzes past performance across marketing platforms to plan future content that creates emotional connections with target audiences.
Adobe Sensei (https://www.adobe.com/sensei.html): Adobe Sensei uses AI and machine learning to help businesses create content, make informed decisions, and target marketing to deliver a better customer experience.
Poly AI (https://www.polyai.com/): Poly AI's voice assistant can hold a natural conversation for as long as it takes to solve a customer's problem, improving customer experience and resolution accuracy and uncovering data-driven business opportunities.
DigitalGenius (https://www.digitalgenius.com/): DigitalGenius automates responses to common customer queries and proactively identifies issues, enabling faster responses and resolutions and improved customer satisfaction.

Audio
Voice Maker (https://www.voicemaker.in/): Voice Maker uses text-to-speech to generate speech. A free plan allows 100 conversions per week after registration, and basic, premium, and business plans unlock all features and voices. It supports over 130 languages.
Podcast Castle (https://www.podcastcastle.com/): Podcast Castle is a web-based platform for creating high-quality audio interviews, with studio-quality recording, AI-powered editing, and seamless exporting in a single interface.
VoiceMod (https://www.voicemod.net/): VoiceMod is a voice changer with effects that make you sound like a girl, boy, demon, or robot.
CereProc (https://www.cereproc.com/): CereProc's text-to-speech technology offers more than 5,000 expressive voices for voiceover needs, and it can also clone your own voice.
Cleanvoice AI (https://cleanvoice.ai/): Cleanvoice AI removes filler sounds, stuttering, and mouth sounds from podcasts and audio recordings, saving hours of editing time.
LaLa AI (https://lala-ai.com/): LaLa AI is a language learning app that uses natural language processing (NLP) to help users learn foreign languages.

Real Estate
Interior AI (https://www.interiorai.com/): Interior AI generates interior design ideas and lets you virtually stage interiors for real estate listings in different styles.
AI Room Planner (https://www.airplanner.com/): AI Room Planner offers unlimited free interior design ideas for your room.
GetFloorPlan (https://getfloorplan.com/): GetFloorPlan converts 2D floor plans into fully furnished 3D layouts with a 360° virtual tour and can process thousands of plans per day.
Cool AIed Interior design ideas (https://www.architecturelab.net/cool-aied-interior-design-ideas/): An article showcasing AI-generated interior design ideas for anyone looking to decorate or find inspiration.

Learn & Research
Perplexity AI (https://perplexity.ai/): Perplexity is an AI tool that condenses difficult topics and questions into a concise summary of four to five sentences. It cites sources and supports follow-up questions.
Genei (https://genei.io/): Genei is an AI-powered search and summarization tool that helps users quickly find and digest relevant information from documents and web pages.
IRIS AI (https://iris.ai/): IRIS AI is a research workspace that uses AI to filter and extract data, understand context, and generate summaries.
Consensus (https://www.useconsensus.com/): Consensus helps you find research studies that back up an argument by letting you search for the point you are trying to make.
Scholar C (https://scholarc.com/): Scholar C is a summarizer that breaks essays, reports, and books into bite-sized sections to save time.
Semantic Scholar (https://www.semanticscholar.org/): Semantic Scholar indexes over 200 million scholarly publications and extracts key conclusions to keep users up to date on recent research.
Wisdom AI (https://www.wisdomai.ai/): Wisdom AI is a natural language processing platform that extracts insights from data to support decision making.
Cactus (https://getcactus.app/): Cactus is a hub of tools that help students save time on tasks such as reading, writing, and language learning.
ELI5 (https://eli5.ai/): ELI5 is an AI-powered tutor, available 24/7, that discusses any topic with users.
Elicit (https://elicit.org/): Elicit is a free AI tool for researchers that uses GPT-3 to quickly find relevant papers and answer questions about them.

Legal
Do Not Pay (https://donotpay.com/): Do Not Pay is an app billed as the world's first robot lawyer. It helps users fight corporations, beat bureaucracy, and sue anyone at the press of a button.

Support this podcast at: https://redcircle.com/the-secret-to-success/exclusive-content
Advertising Inquiries: https://redcircle.com/brands
Privacy & Opt-Out: https://redcircle.com/privacy

Tech ONTAP Podcast
Episode 360: NetApp AI and Nvidia with Apache Spark Horovod

Mar 17, 2023 47:38


Did you know that Google created TensorFlow and Kubernetes? Or that Sun Microsystems, the company behind Solaris, invented NFS? That Amazon created S3? Or that Uber created Horovod? Some of the key technologies that companies use today were originally created by businesses trying to solve their own internal challenges. In this episode of the Tech ONTAP Podcast, NetApp TME Rick Huang and Solutions Architect Ken Hillier join us to discuss Rick's new blog on using NetApp AI with Apache Spark and Horovod for deep learning and inference use cases.

Streaming Audio: a Confluent podcast about Apache Kafka
Next-Gen Data Modeling, Integrity, and Governance with YODA

Mar 7, 2023 55:55


In this episode, Kris interviews Doron Porat, Director of Infrastructure at Yotpo, and Liran Yogev, Director of Engineering at ZipRecruiter (formerly at Yotpo), about their experiences and strategies in dealing with data modeling at scale.

Yotpo has a vast and active data lake, comprising thousands of datasets that are processed by different engines, primarily Apache Spark™. They wanted to give users self-service tools for generating and using data with maximum flexibility, but ran into difficulties, including poor standardization, low data reusability, limited data lineage, and unreliable datasets.

The team realized that Yotpo's modeling layer, which defines the structure and relationships of the data, needed to be separated from the execution layer, which defines and processes operations on the data. This separation would give programmers better visibility into data pipelines across all execution engines, storage methods, and formats, as well as more governance control for exploration and automation.

To address these issues, they developed YODA, an internal tool that combines an excellent developer experience, dbt, Databricks, Airflow, Looker, and more with a strong CI/CD and orchestration layer.

Yotpo is a B2B SaaS e-commerce marketing platform that provides businesses with tools for accurate customer analytics, remarketing, support messaging, and more.

ZipRecruiter is a job site that uses AI matching to help businesses find the right candidates for their open roles.

EPISODE LINKS
Current 2022 Talk: Next Gen Data Modeling in the Open Data Platform
Data Mesh 101
Data Mesh Architecture: A Modern Distributed Data Model
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
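The idea of separating a modeling layer (structure and relationships of the data) from an execution layer (the engine that processes it) can be illustrated with a small sketch. This is not YODA's design, which the episode notes do not describe in code; the names `DatasetModel`, `lineage`, `execution_order`, `run`, and the example datasets are all hypothetical. Models are plain declarations, lineage and build order are derived from the models alone, and execution is delegated to any pluggable engine callable (a Spark job, a SQL statement, or a test stub).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical modeling layer: each model declares a dataset's structure (columns)
# and relationships (upstream datasets), with no engine-specific code at all.
@dataclass(frozen=True)
class DatasetModel:
    name: str
    columns: Tuple[str, ...]
    upstream: Tuple[str, ...] = ()


def lineage(models: Dict[str, DatasetModel], name: str) -> List[str]:
    """Return every dataset that `name` depends on, derived from the models alone."""
    seen: List[str] = []
    stack = list(models[name].upstream)
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.append(dep)
            stack.extend(models[dep].upstream)
    return sorted(seen)


def execution_order(models: Dict[str, DatasetModel]) -> List[str]:
    """Order datasets so each comes after its upstreams (assumes an acyclic graph)."""
    order: List[str] = []

    def visit(name: str) -> None:
        if name in order:
            return
        for dep in models[name].upstream:
            visit(dep)
        order.append(name)

    for name in sorted(models):
        visit(name)
    return order


# Hypothetical execution layer: any callable that can build one dataset.
Engine = Callable[[DatasetModel], None]


def run(models: Dict[str, DatasetModel], engine: Engine) -> None:
    """Build every dataset in dependency order using the supplied engine."""
    for name in execution_order(models):
        engine(models[name])


# Example models, made up for illustration.
MODELS = {
    m.name: m
    for m in [
        DatasetModel("orders_raw", ("order_id", "customer_id", "amount")),
        DatasetModel("customers", ("customer_id", "name")),
        DatasetModel("orders_clean", ("order_id", "customer_id", "amount"), ("orders_raw",)),
        DatasetModel("revenue_by_customer", ("customer_id", "revenue"), ("orders_clean", "customers")),
    ]
}

if __name__ == "__main__":
    print(lineage(MODELS, "revenue_by_customer"))
    run(MODELS, lambda m: print("building", m.name, "with columns", m.columns))
```

Because lineage and ordering never touch the engine, swapping the `lambda` for a Spark-backed function would not change what the model layer reports; that engine independence is the point of the separation described above.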

Engenharia de Dados [Cast]
Databricks as a Lakehouse Platform for Data Teams

Feb 7, 2023 70:28


In this episode, Luan Moreno & Mateus Oliveira interview Rodrigo Oliveira, currently a Solutions Architect at Databricks. Databricks is a unified platform that uses Apache Spark as its processing engine, enabling batch and streaming data processing in a managed service available on the major clouds (AWS, Azure, and GCP).
In addition, Databricks provides:
An advanced notebook experience
A workspace for data teams
Cluster creation for your use case
A pipeline development platform
This conversation covered the following topics:
Apache Spark (open source)
Delta Lake (open source)
Data Lakehouse
Unity Catalog
Workflows
Delta Live Tables (DLT)
Databricks SQL
Snowflake vs. Databricks
Learn how to use Databricks in a corporate environment for collaboration among data teams, as an easy-to-develop solution that delivers value to your company and is increasingly becoming a full data platform.
Rodrigo Oliveira
Databricks
Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Hipsters Ponto Tech
Big Data and Apache Spark – Hipsters Ponto Tech #341

Jan 24, 2023 37:25


Today's conversation is about Big Data and, more specifically, Apache Spark. We discuss several use cases and current challenges in data science, and how Apache Spark, a tool that shows up in more and more solutions, helps solve them. Come see who's joining us for this conversation!

The Cloud Pod
195: The Cloud Pod can't wait for Azure Ultra Fungible Storage (Premium)!

Jan 20, 2023 48:49


On The Cloud Pod this week, Amazon announces massive corporate and tech layoffs, Amazon S3 now encrypts new objects by default, BigQuery multi-statement transactions are now generally available, and Microsoft announces its acquisition of Fungible to accelerate datacenter innovation. Thank you to our sponsor, Foghorn Consulting, which provides top-notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you're having trouble hiring? Foghorn can be burning down your DevOps and Cloud backlogs as soon as next week.
General News:
Amazon to lay off 18,000 corporate and tech workers. [1:11]
Episode Highlights
⏰ Amazon S3 Encrypts New Objects By Default. [3:09]
⏰ Announcing the GA of BigQuery multi-statement transactions. [13:04]
⏰ Microsoft announces acquisition of Fungible to accelerate datacenter innovation. [17:14]
Top Quote

Open||Source||Data
Functional Programming and an Ideal Data Stack Building Experience with Holden Karau

Jan 18, 2023 45:11


This episode features an interview with Holden Karau, an Open Source Engineer at Netflix. Holden is best known for her work on Apache Spark, her advocacy in the open source software movement, and her creation of a variety of related projects including spark-testing-base. Previously, Holden worked at Big Tech companies like Apple, IBM, and Google as a software engineer and developer advocate.

In this episode, Sam sits down with Holden to discuss the data analysis stack, functional programming, and the future of open source data tooling.
-------------------
“These things are not one-off. We may think that they're one-off and they don't need testing, but that's not the reality. When you write something, it needs to be maintainable and as software people, the only real way that I think we know to make something vaguely maintainable is to at least have tests. And these tests need to cover common failure cases that we've experienced. And certainly, there's different approaches to this. There's property based testing, there's golden sets, all kinds of different options. I don't think necessarily any one approach is right or better here, but I think we need something. We need less Untitled5 IPython notebook running in production, scheduled every hour. That is not a way to run a company.” – Holden Karau
-------------------
Episode Timestamps:
(02:27): What open source data means to Holden
(04:37): What interested Holden in mathematical computer science
(09:51): What drew Holden to Spark
(12:49): What Holden has learned about cognitive systems
(20:02): What we need to learn as developers and data specialists
(25:28): The future of the data analysis stack
(31:21): Improvements in data tooling over the next 5 years
(34:25): A question Holden wishes to be asked
(40:51): Holden's advice for open source data project committers
(43:18): Executive producer Audra Montenegro's backstage takeaways
-------------------
Links:
LinkedIn - Connect with Holden
Buy Holden's books
Visit Holden's website
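Holden mentions golden sets and property-based testing as two ways to make data code maintainable. The sketch below shows both in plain Python against a hypothetical `dedupe_latest` transformation (keep the newest record per key); the function, the golden data, and the chosen properties are illustrative assumptions, not code from the episode or from spark-testing-base. The golden test pins one reviewed input/output pair; the property test checks invariants over many random inputs, which is what libraries such as Hypothesis automate.

```python
import random
from typing import Dict, List, Tuple

Record = Tuple[str, int, float]  # (key, timestamp, value)


def dedupe_latest(records: List[Record]) -> List[Record]:
    """Hypothetical transformation: keep the newest record per key, sorted by key.

    On a timestamp tie, the record seen first is kept.
    """
    latest: Dict[str, Record] = {}
    for key, ts, value in records:
        if key not in latest or ts > latest[key][1]:
            latest[key] = (key, ts, value)
    return [latest[k] for k in sorted(latest)]


# Golden-set test: a fixed input paired with its reviewed, stored expected output.
GOLDEN_INPUT = [("a", 1, 10.0), ("b", 5, 2.0), ("a", 3, 7.5), ("b", 4, 9.0)]
GOLDEN_OUTPUT = [("a", 3, 7.5), ("b", 5, 2.0)]


def test_golden() -> None:
    assert dedupe_latest(GOLDEN_INPUT) == GOLDEN_OUTPUT


# Property-based test, hand-rolled: generate random inputs and check invariants.
def test_properties(trials: int = 200, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        records = [
            (rng.choice("abcde"), rng.randint(0, 50), rng.random())
            for _ in range(rng.randint(0, 30))
        ]
        out = dedupe_latest(records)
        # Exactly one output row per distinct input key, in sorted key order.
        assert [r[0] for r in out] == sorted({r[0] for r in records})
        for row in out:
            # Every output row is an input row with that key's maximum timestamp.
            assert row in records
            assert row[1] == max(ts for k, ts, _ in records if k == row[0])
        # Applying the transformation twice changes nothing (idempotence).
        assert dedupe_latest(out) == out


if __name__ == "__main__":
    test_golden()
    test_properties()
    print("all tests passed")
```

The two styles complement each other: the golden set catches regressions on a known case, while the properties exercise edge cases (empty input, repeated keys, timestamp ties) nobody thought to write down.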

AWS Podcast
#564: [INTRODUCING] Amazon Athena for Apache Spark

Dec 26, 2022 19:57


Amazon Athena for Apache Spark lets you run interactive Apache Spark analytics quickly, without having to plan for, configure, or manage resources. In this episode, Raj Devnath (Sr. Product Manager) and Anthony Virtuoso (Sr. Principal Engineer) join Simon to talk about this new launch, which combines the ease of use, fast performance, and on-demand availability of Athena with Spark's expressive programming model so you can ask more sophisticated questions of your data.
Amazon Athena for Apache Spark: https://go.aws/3FP7LCG
Read the blog: https://go.aws/3HSLk2a
What's new: https://amzn.to/3WeOFwN
See our page here: https://go.aws/3Gd1B0V

The Cloud Pod
191: The Cloud Pod Reinvents the Recap Show

Dec 14, 2022 75:47


The Cloud Pod recaps the positives and negatives of AWS re:Invent 2022, the annual conference in Las Vegas that brings together 50,000 cloud computing professionals. This year's keynote speakers included Adam Selipsky, CEO of Amazon Web Services; Swami Sivasubramanian, Vice President of Data and Machine Learning at AWS; and Werner Vogels, Amazon's CTO. Attendees and web viewers were treated to new features and products, such as AWS Lambda SnapStart for Java functions, new QuickSight capabilities, and quality-of-life improvements to hundreds of services. Justin, Jonathan, Ryan, Peter, and special guest Joe Daly from the FinOps Foundation talk about the show and the announcements. Thank you to our sponsor, Foghorn Consulting, which provides top-notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you're having trouble hiring? Foghorn can be burning down your DevOps and Cloud backlogs as soon as next week.
Episode Highlights
⏰ AWS Pricing Calculator now supports modernization cost estimates for Microsoft workloads.
⏰ AWS re:Invent 2022 announcements and keynote updates.
Top Quote

ACM ByteCast
Matei Zaharia - Episode 32

Dec 13, 2022 54:27


In this episode of ACM ByteCast, Bruke Kifle hosts Matei Zaharia, computer scientist, educator, and creator of Apache Spark. Matei is the Chief Technologist and Co-Founder of Databricks and an Assistant Professor of Computer Science at Stanford. He started the Apache Spark project during his PhD at UC Berkeley in 2009 and has worked broadly on other widely used data and machine learning software, including MLflow, Delta Lake, and Apache Mesos. Matei's research was recognized through the 2014 ACM Doctoral Dissertation Award, an NSF Career Award, and the US Presidential Early Career Award for Scientists and Engineers. Matei, who was born in Romania and grew up mostly in Canada, describes how he developed Spark, a framework for writing programs that run on a large cluster of nodes and process data in parallel, and how this led him to co-found Databricks around this technology. Matei and Bruke also discuss the new paradigm shift from traditional data warehouses to data lakes, as well as his work on MLflow, an open-source platform for managing the end-to-end machine learning lifecycle. He highlights some recent announcements in the field of AI and machine learning and shares observations from teaching and conducting research at Stanford, including an important current gap in computing education.
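Matei describes Spark as a framework for writing programs that run on a cluster of nodes and process data in parallel. The underlying pattern can be sketched on one machine with the standard library: split the input into partitions, apply the same function to every partition independently, then merge the partial results. This is a conceptual illustration only, not Spark's API or internals; threads from `concurrent.futures` stand in for cluster executors, and `partition`, `count_words`, and `word_count` are hypothetical names. In PySpark the same word count is typically written with RDD operations such as `flatMap`, `map`, and `reduceByKey`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def partition(lines: List[str], n: int) -> List[List[str]]:
    """Split the input into n partitions, assigning lines round-robin."""
    return [lines[i::n] for i in range(n)]


def count_words(part: List[str]) -> Counter:
    """Map step: count words within a single partition, independently of the others."""
    counts: Counter = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts


def word_count(lines: List[str], workers: int = 4) -> Counter:
    """Process partitions in parallel, then reduce the partial counts into one."""
    parts = partition(lines, workers)
    # Threads stand in for cluster executors; each handles one partition.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, parts))
    # Reduce step: merge per-partition counts by adding them together.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    text = [
        "Spark processes data in parallel",
        "data is split into partitions",
        "each partition is processed in parallel",
    ]
    print(word_count(text, workers=2).most_common(3))
```

The design choice that makes this scale is that `count_words` needs nothing but its own partition, so partitions can run on separate machines, and the merge step is associative, so partial results can be combined in any order.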

Software Defined Talk
Episode 389: The Miscellaneous Keynote

Dec 2, 2022 72:39


This week we recap the news from AWS re:Invent and discuss application vendors mandating use of specific Kubernetes distros. Plus, some thoughts on dog boarding…

Watch the YouTube Live Recording of Episode 389 (https://www.youtube.com/watch?v=h8L0QEIMvOs)

Runner-up Titles
• Everyone gets a Graviton Instance
• What a Boring re:Invent
• Part of our brand
• 17 Days in the Hole
• Under the Stars, Under the Sea
• Tighten it up
• Don't make me pay for security
• Secure by default
• That's a great message and I don't believe it
• Works with Lambda
• Security, it keeps getting better?

Rundown
AWS re:Invent
• What's New at AWS – Cloud Innovation & News - 2022 Archive (https://aws.amazon.com/about-aws/whats-new/2022/?whats-new-content-all.sort-by=item.additionalFields.postDateTime&whats-new-content-all.sort-order=desc&awsf.whats-new-analytics=*all&awsf.whats-new-app-integration=*all&awsf.whats-new-arvr=*all&awsf.whats-new-blockchain=*all&awsf.whats-new-business-applications=*all&awsf.whats-new-cloud-financial-management=*all&awsf.whats-new-compute=*all&awsf.whats-new-containers=*all&awsf.whats-new-customer-enablement=*all&awsf.whats-new-customer%20engagement=*all&awsf.whats-new-database=*all&awsf.whats-new-developer-tools=*all&awsf.whats-new-end-user-computing=*all&awsf.whats-new-mobile=*all&awsf.whats-new-gametech=*all&awsf.whats-new-iot=*all&awsf.whats-new-machine-learning=*all&awsf.whats-new-management-governance=*all&awsf.whats-new-media-services=*all&awsf.whats-new-migration-transfer=*all&awsf.whats-new-networking-content-delivery=*all&awsf.whats-new-quantum-tech=*all&awsf.whats-new-robotics=*all&awsf.whats-new-satellite=*all&awsf.whats-new-security-id-compliance=*all&awsf.whats-new-serverless=*all&awsf.whats-new-storage=*all)
Compute
• Amazon EC2 C7g instances – Compute – Amazon Web Services (https://aws.amazon.com/ec2/instance-types/c7g/?sc_icampaign=aware_ec2-c7gn-instances_reinvent22&sc_ichannel=ha&sc_icontent=awssm-11814_aware_reinvent22&sc_iplace=ribbon&trk=1b39069e-86fc-466c-99c7-4ab2427ddb3a~ha_awssm-11814_aware_reinvent22)
• Announcing Amazon EC2 M6in, M6idn, R6in, and R6idn network optimized instances (https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-ec2-m6in-m6idn-r6in-r6idn-network-optimized-instances/)
• Announcing Amazon EC2 Hpc6id instances (https://aws.amazon.com/about-aws/whats-new/2022/11/announcing-amazon-ec2-hpc6id-instances/)
• AWS Nitro Enclaves now supports Amazon EKS and Kubernetes (https://aws.amazon.com/about-aws/whats-new/2022/11/aws-nitro-enclaves-supports-amazoneks-kubernetes/)
• Introducing Finch: An Open Source Client for Container Development (https://aws.amazon.com/blogs/opensource/introducing-finch-an-open-source-client-for-container-development/)
• New – Accelerate Your Lambda Functions with Lambda SnapStart (https://aws.amazon.com/blogs/aws/new-accelerate-your-lambda-functions-with-lambda-snapstart/)
Data
• Announcing Amazon Redshift integration for Apache Spark with Amazon EMR (https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-redshift-integration-apache-spark-amazon-emr/)
• AWS announces Amazon Redshift integration for Apache Spark (https://aws.amazon.com/about-aws/whats-new/2022/11/aws-announces-amazon-redshift-integration-apache-spark/)
• AWS announces Amazon Aurora zero-ETL integration with Amazon Redshift (https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-aurora-zero-etl-integration-redshift/)
• Serverless Open-Source Search Engine – Amazon OpenSearch Serverless (https://aws.amazon.com/opensearch-service/features/serverless/)
• Introducing AWS Glue 4.0 (https://aws.amazon.com/about-aws/whats-new/2022/11/introducing-aws-glue-4-0/)
Security
• Introducing Amazon Security Lake (Preview) (https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-security-lake-preview/)
• AWS co-announces release of the Open Cybersecurity Schema Framework (OCSF) (https://aws.amazon.com/blogs/security/aws-co-announces-release-of-the-open-cybersecurity-schema-framework-ocsf-project/)
• Amazon GuardDuty now protects Amazon Elastic Kubernetes Service clusters (https://aws.amazon.com/about-aws/whats-new/2022/01/amazon-guardduty-elastic-kubernetes-service-clusters/)
Solutions
• AWS CEO: The cloud isn't just about technology (https://www.protocol.com/enterprise/aws-adam-selipsky-cloud)
• AWS Supply Chain (https://aws.amazon.com/aws-supply-chain/)
• AWS Clean Room (https://aws.amazon.com/clean-rooms/)
• Announcing AWS SimSpace Weaver (https://aws.amazon.com/about-aws/whats-new/2022/11/aws-simspace-weaver-available/)
• Amazon Connect announces Contact Lens agent performance evaluation forms (https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-connect-contact-lens-agent-performance-evaluation-forms/)
• Introducing Amazon Omics (https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-omics-generally-available/)
• Corey Quinn on re:Invent (https://twitter.com/QuinnyPig/status/1597664998234345472)

Ask SDT: “using a ‘supported platform’ list to drive cross sales.” (https://softwaredefinedtalk.slack.com/archives/C6CDLDCVB/p1669255641385689) (SDT Slack)

Relevant to your Interests
• SigmaOS raises $4 million to build a browser for productivity nerds (https://techcrunch.com/2022/11/16/sigmaos-raises-4-million-to-build-a-browser-for-productivity-nerds/)
• The Distributed Computing Manifesto (https://www.allthingsdistributed.com/2022/11/amazon-1998-distributed-computing-manifesto.html)
• Unpacking Musk's "hardcore" marching orders (https://www.axios.com/newsletters/axios-login-3bf3c6e4-d8cd-492c-942d-c7f80719e66b.html?chunk=0&utm_term=emshare#story0)
• Akeyless secures a cash infusion to help companies manage their passwords, certificates and keys (https://techcrunch.com/2022/11/16/akeyless-secures-a-cash-infusion-to-help-companies-manage-their-passwords-certificates-and-keys/)
• Vista passes halfway mark to $20bn target for latest flagship (https://www.privateequityinternational.com/vista-passes-halfway-mark-to-20bn-target-for-latest-flagship/)
• 1Password Will Support Passkeys Starting in Early 2023 (https://www.macrumors.com/2022/11/17/1password-passkeys-support-2023/)
• Passkeys: the future of authentication in 1Password (https://www.future.1password.com/passkeys/?utm_medium=sign-in-side-panel&utm_source=1password&utm_campaign=passkeys)
• 10,000 Google Employees Could Be Rated as Low Performers (https://www.theinformation.com/articles/10-000-google-employees-could-be-rated-as-low-performers)
• Resignations Roil Twitter as Elon Musk Tries Persuading Some Workers to Stay (https://www.nytimes.com/2022/11/17/technology/twitter-elon-musk-ftc.html)
• Hundreds of employees say no to being part of Elon Musk's ‘extremely hardcore' Twitter (https://www.theverge.com/2022/11/17/23465274/hundreds-of-twitter-employees-resign-from-elon-musk-hardcore-deadline)
• Security of Passkeys in the Google Password Manager (https://security.googleblog.com/2022/10/SecurityofPasskeysintheGooglePasswordManager.html)
• With $8.6M in seed funding, Nx wants to take monorepos mainstream (https://techcrunch.com/2022/11/17/with-8-6m-in-seed-funding-nx-wants-to-take-monorepos-mainstream/)
• Facebook parent Meta winding down some non-core hardware projects (https://www.reuters.com/technology/facebook-parent-meta-winding-down-some-non-core-hardware-projects-2022-11-11/)
• OpenStack passes 40 million cores in production use (https://www.theregister.com/2022/11/18/openstack_thriving_survey/)
• A note from CEO Andy Jassy about role eliminations (https://www.aboutamazon.com/news/company-news/a-note-from-ceo-andy-jassy-about-role-eliminations)
• Twitter is Going Great (https://twitterisgoinggreat.com/)
• Building Kubernetes Applications with Acorn (https://acorn.io/building-kubernetes-applications-with-acorn/)
• Platforms at Kubecon 2022 (https://blog.joshgav.com/posts/kubecon-platforms-review)
• Zoom's looming squeeze (https://www.axios.com/newsletters/axios-login-149ea16b-be11-451a-b4de-5a1e2f8f0ce7.html?chunk=0&utm_term=emshare#story0)
• Sony's VR headset-console integration could limit sales, but allow depth (https://www.emergingtechbrew.com/stories/2022/11/18/sony-s-vr-headset-console-integration-could-limit-sales-but-allow-depth?utm_campaign=etb&utm_medium=newsletter&utm_source=morning_brew&mid=f642abf4dca6751d0ec109d4cbc6782e)
• The State of Kubernetes {Open-Source} Security | ARMO (https://www.armosec.io/blog/the-state-of-kubernetes-open-source-security/)
• Considerations when implementing developer portals in regulated enterprise environments (https://www.redhat.com/en/blog/considerations-when-implementing-developer-portals-regulated-enterprise-environments)
• Broadcom's proposed $61B VMware acquisition scrutinized by UK regulators (https://techcrunch.com/2022/11/21/broadcoms-proposed-61b-vmware-acquisition-scrutinized-by-uk-regulators/)
• 2023 may be the year of multicloud Kubernetes (https://www.infoworld.com/article/3679752/2023-may-be-the-year-of-multicloud-kubernetes.html?utm_source=substack&utm_medium=email)
• Server-side WebAssembly prepares for takeoff in 2023 (https://www.techtarget.com/searchitoperations/news/252527414/Server-side-WebAssembly-prepares-for-takeoff-in-2023?utm_source=substack&utm_medium=email)
• Zoom shares drop on light forecast as company faces 'heightened deal scrutiny' (https://www.cnbc.com/2022/11/21/zoom-zm-earnings-q3-2023.html?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axioslogin&stream=top)
• What's coming for cloud computing in 2023 (https://www.infoworld.com/article/3680553/whats-coming-for-cloud-computing-in-2023.html)
• The Rise of Platform Engineering - Software Engineering Daily (https://softwareengineeringdaily.com/2020/02/13/setting-the-stage-for-platform-engineering/)
• IBM sues Micro Focus, claims it copied mainframe software (https://www.theregister.com/2022/11/22/ibm_sues_micro_focus_for/)
• How to beat the Kubernetes skills shortage (https://www.infoworld.com/article/3679749/how-to-beat-the-kubernetes-skills-shortage.html)
• TikTok Couldn't Ensure Accurate Responses To Government Inquiries, A ByteDance Risk Assessment Said (https://www.forbes.com/sites/emilybaker-white/2022/11/28/tiktok-inaccurate-government-inquiries-internal-bytedance-risk-assessment/?sh=7f57dc9723fe)
• Exclusive: Sam Bankman-Fried says he's down to $100,000 (https://www.axios.com/2022/11/29/sam-bankman-fried-100000-ftx-cftc-regulation?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axiosprorata&stream=top)
• Why Big Tech is not rushing to clone Twitter (https://www.axios.com/newsletters/axios-login-1cea6d1a-1428-448d-b0d3-5da3ae9425ef.html?chunk=0&utm_term=emshare#story0)
• Amazon Alexa is a “colossal failure,” on pace to lose $10 billion this year (https://arstechnica.com/gadgets/2022/11/amazon-alexa-is-a-colossal-failure-on-pace-to-lose-10-billion-this-year/)
• I analyzed 290 booths at KubeCon - here are the DevOps trends for 2023 (https://www.uptime.build/post/i-analyzed-290-booths-at-kubecon-here-are-the-devops-trends-for-2023?utm_source=substack&utm_medium=email)

Nonsense
• Billionaires like Elon Musk want to save civilization by having tons of genetically superior kids. Inside the movement to take 'control of human evolution.' (https://www.businessinsider.com/pronatalism-elon-musk-simone-malcolm-collins-underpopulation-breeding-tech-2022-11)
• Australia: How 'bin chickens' learnt to wash poisonous cane toads (https://www.bbc.com/news/world-australia-63699884)
• A 12,000 lb. metal sculpture of Elon Musk's head on a goat body riding a rocket parked outside Tesla HQ failed to elicit a response from the billionaire (https://www.businessinsider.com/elon-musk-head-on-goat-body-riding-a-rocket-sculpture-2022-11)
• The leap second's time will be up in 2035—and tech companies are thrilled (https://www.popsci.com/technology/bipm-abandon-leap-second/)

Conferences
• THAT Conference Texas Speakers and Schedule (https://that.us/events/tx/2023/schedule/), Jan 15th-18th; use code SDT for 5% off
• CloudNativeSecurityCon North America (https://events.linuxfoundation.org/cloudnativesecuritycon-north-america/), Seattle, Feb 1 – 2, 2023
• DevOpsDays Birmingham, AL 2023 (https://devopsdays.org/events/2023-birmingham-al/welcome/), April 20 - 21, 2023

Listener Feedback
• Sudesh shared a list of Tech Companies Hiring (https://airtable.com/shrAPDHg8apj4mnRR/tbl6Kz4KeeCp3HrSM)
• Send “End of Year” listener questions to questions@softwaredefinedtalk.com (mailto:questions@softwaredefinedtalk.com).

SDT news & hype
• Join us in Slack (http://www.softwaredefinedtalk.com/slack).
• Get a SDT Sticker! Send your postal address to stickers@softwaredefinedtalk.com (mailto:stickers@softwaredefinedtalk.com) and we will send you free laptop stickers!
• Follow us on Twitch (https://www.twitch.tv/sdtpodcast), Twitter (https://twitter.com/softwaredeftalk), Instagram (https://www.instagram.com/softwaredefinedtalk/), LinkedIn (https://www.linkedin.com/company/software-defined-talk/) and YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured).
• Use the code SDT to get $20 off Coté's book, Digital WTF (https://leanpub.com/digitalwtf/c/sdt), so $5 total.
• Become a sponsor of Software Defined Talk (https://www.softwaredefinedtalk.com/ads)!

Recommendations
• Brandon: The Complete History & Strategy of Qualcomm (https://www.acquired.fm/episodes/qualcomm)
• Matt: Kishi Bashi, This Must Be The Place (https://www.youtube.com/watch?v=IslMHJFkIME); Carma (https://carma.com.au) car purchase, referral code: REF22-872E

Photo Credits
• Header (https://unsplash.com/photos/K8i-gRJHT_0)
• CoverArt (https://twitter.com/DevchicaJasmin/status/1597874321510526978)

The Tech Blog Writer Podcast
2155: Databricks - The Story Behind the Lakehouse Company

Oct 27, 2022 · 39:45


Many cite open source as the future. The UK Government's National Data Strategy even talks about the importance of opening public sector datasets to form the backbone of innovation, efficiency, and growth. This is a trend that Databricks is betting on in a big way. Databricks is the lakehouse company. More than 7,000 organizations worldwide, including Comcast, Condé Nast, H&M, and over 40% of the Fortune 500, rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. The company is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world's toughest problems. I have invited Dael Williamson, EMEA CTO, Field Advisory & Engineering at Databricks, to join me on Tech Talks Daily to share the story behind the company and how it is helping data teams tackle those problems.

Engenharia de Dados [Cast]
Databricks Data+AI Summit 2022 Conference: Announcements and News, by Luan Moreno

Aug 31, 2022 · 52:13


Announcements and news from the Databricks conference, Data+AI Summit 2022. Details: https://databricks.com/dataaisummit/
• Delta Lake 2.0: https://databricks.com/blog/2022/06/30/open-sourcing-all-of-delta-lake.html
• MLflow 2.0: https://databricks.com/blog/2022/06/29/introducing-mlflow-pipelines-with-mlflow-2-0.html
• Project Lightspeed: https://databricks.com/blog/2022/06/28/project-lightspeed-faster-and-simpler-stream-processing-with-apache-spark.html
• Spark Connect: https://databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html
• Databricks Runtime 11.0: https://docs.databricks.com/release-notes/runtime/releases.html
• Databricks Workflows: https://databricks.com/blog/2022/05/10/introducing-databricks-workflows.html
• dbt in production on Databricks: https://databricks.com/blog/2022/06/29/top-5-workflows-announcements-at-data-ai-summit.html
• Delta Live Tables and Project Enzyme: https://databricks.com/blog/2022/06/29/delta-live-tables-announces-new-capabilities-and-performance-optimizations.html
• New Databricks SQL connectors: https://databricks.com/blog/2022/06/29/connect-from-anywhere-to-databricks-sql.html
• Databricks SQL Serverless: https://databricks.com/blog/2022/06/28/databricks-sql-serverless-now-available-on-aws.html
• Unity Catalog: https://databricks.com/blog/2022/06/28/whats-new-with-databricks-unity-catalog-at-the-data-ai-summit-2022.html
• Terraform for Databricks: https://databricks.com/blog/2022/06/22/databricks-terraform-provider-is-now-generally-available.html
On YouTube we have a Data Engineering channel covering the most important topics in the field, with live streams every Wednesday: https://www.youtube.com/channel/UCnErAicaumKqIo4sanLo7vQ
To stay on top of the field with weekly posts and updates, follow along on LinkedIn so you don't miss any news: https://www.linkedin.com/in/luanmoreno/
Available on Spotify and Apple Podcasts: https://open.spotify.com/show/5n9mOmAcjra9KbhKYpOMqY
Luan Moreno: https://www.linkedin.com/in/luanmoreno/

Engenharia de Dados [Cast]
Apache Spark Use Cases and Field Experience

Jul 8, 2022 · 69:02


In this episode we bring in specialist Pedro Toledo to talk about his experience with the most widely used Big Data technology in the world. We discussed the following topics:
• The importance of Apache Spark and its use cases
• Learning curve
• Programming languages
• Common problems
• dbt vs. Apache Spark and the modern data stack
• Delta Lake and the data lakehouse
• Tips for beginners
The main goal is to show data engineers that Apache Spark is a powerful analytics tool and how it can be used to solve Big Data problems.
On YouTube we have a Data Engineering channel covering the most important topics in the field, with live streams every Wednesday: https://www.youtube.com/channel/UCnErAicaumKqIo4sanLo7vQ
To stay on top of the field with weekly posts and updates, follow along on LinkedIn so you don't miss any news: https://www.linkedin.com/in/luanmoreno/
Available on Spotify and Apple Podcasts: https://open.spotify.com/show/5n9mOmAcjra9KbhKYpOMqY and https://podcasts.apple.com/br/podcast/engenharia-de-dados-cast/
Pedro Toledo on LinkedIn: https://www.linkedin.com/in/pedro-toledo/
Luan Moreno: https://www.linkedin.com/in/luanmoreno/

AWS Podcast
#527: [INTRODUCING] Amazon EMR Serverless

Jun 5, 2022 · 17:23


Want an easier way to run big data applications in the cloud? Introducing Amazon EMR Serverless, a new deployment option in Amazon EMR that makes it easy for data analysts and engineers to run open-source big data analytics frameworks such as Apache Spark and Apache Hive without configuring, managing, and scaling clusters or servers. EMR Serverless automatically scales resources up and down to provide just the right amount of capacity for your application. You pay only for what you use, which reduces concerns about over- or under-provisioning. In this episode, Radhika Ravirala (Principal Product Manager) and Natacha Maheshe (Principal Product Marketing Manager) join Hawn to discuss EMR Serverless use cases, how EMR Serverless addresses customers' challenges, and how you can get started. Learn more about EMR Serverless - https://go.aws/390CjW1 Read the blog - https://go.aws/38Q86sA Check out the What's New announcement - https://go.aws/39bMWFg Watch the video - https://bit.ly/3NxW0nb

Outspoken with Shana Cosgrove
For the Back of the Room: Benjamin Harvey, Founder & CEO of AI Squared, Research Professor, and Data Science Leader.

May 24, 2022 · 50:25


Consistency, Impact, and Versatility.

In this episode of The Outspoken Podcast, host Shana Cosgrove talks to Benjamin Harvey, Founder & CEO of AI Squared, Research Professor, and Data Science Leader. He takes us on his multifaceted journey, spanning from Harvard to The NSA to Johns Hopkins University to his most recent endeavor, AI Squared. Benjamin gets technical, discussing everything from Hadoop to Apache Spark to ACID transactions. We hear how Benjamin's new journey as founder of AI Squared is rooted in his discipline and the simplicity of consistent hard work over time. Finally, Benjamin's NCAA basketball dominance is revealed, as he impresses us with his performance against Kevin Love and Russell Westbrook.

QUOTES
“Having that mindset of being focused on helping others, and not having to go back to that environment is really what kept me focused and gave me that grit, those habits to be able to be successful” - Benjamin Harvey [20:12]
“General Nakasone is really focused on figuring out how to even enable folks that are in mission spaces to be able to do tours in industry and then come back. Because the point of the matter is that a guy like myself - I was interested in doing a tour in industry, but I didn't come back because programs like that don't exist.” - Benjamin Harvey [30:21]
“How can I leave my mark for the next generation of Ben Harveys that come from similar backgrounds, face similar challenges - how can I take the experiences and the knowledge of navigating through some of these challenges and make it easier for that next generation to be successful?” - Benjamin Harvey [47:40]

TIMESTAMPS
[00:04] Intro
[01:39] How Benjamin Knows Stephanie Beben
[03:38] Benjamin's Day Job
[05:33] Discussing JHU COVID-19 Risk Tools
[09:19] Benjamin's Family
[11:17] Was Benjamin This Driven as a Kid?
[14:01] High School Experience
[16:35] Benjamin's Scholarship
[19:05] Benjamin's Discipline
[21:24] From Harvard to Bowie
[25:07] Highlights of Benjamin's Time at NSA
[28:24] Why Benjamin Left the NSA
[32:56] Advantages of Spark Over Hadoop
[35:55] ACID Transactions
[38:56] Moving on From Databricks
[41:25] Creating AI Squared
[44:47] Benjamin's Experience as Founder
[46:16] Advice Benjamin Would Give His Younger Self
[47:32] What Does Success Look Like?
[48:05] Benjamin's Favorite Books
[49:13] A Surprising Fact About Benjamin
[50:05] Outro

RESOURCES
https://www.linkedin.com/in/stephanie-beben/ (Stephanie Beben)
https://www.nsa.gov/ (National Security Agency (NSA))
https://www.boozallen.com/ (Booz Allen Hamilton)
https://www.nsa.gov/Signals-Intelligence/Overview/#:~:text=SIGINT%20is%20intelligence%20derived%20from,capabilities%2C%20actions%2C%20and%20intentions. (SIGINT)
https://www.linkedin.com/in/jaysha-camacho/ (Jaysha Camacho Irizarry)
https://www.nea.com/ (New Enterprise Associates)
https://www.jhu.edu/ (Johns Hopkins University (JHU))
https://www.cdc.gov/ (Centers for Disease Control and Prevention (CDC))
https://covid19risktools.com:8443/ (COVID-19 Risk Tools)
https://www.army.mil/ (United States Army)
https://www.american.edu/ (American University)
https://www.att.com/ (AT&T)
https://www.mvsu.edu/ (Mississippi Valley State University)
https://www.fldoe.org/accountability/assessments/k-12-student-assessment/archive/fcat/ (FCAT)
https://www.gwu.edu/ (George Washington University (GWU))
https://www.harvard.edu/ (Harvard University)
https://connects.catalyst.harvard.edu/Profiles/display/Person/42421 (Vincent James Carey, Ph.D.)
https://www.bioconductor.org/ (Bioconductor)
https://www.bowiestate.edu/ (Bowie State University)
https://www.imdb.com/title/tt0119217/ (Good Will Hunting)
https://www.imdb.com/title/tt0120660/ (Enemy of the State)
https://www.theguardian.com/world/2013/jun/09/edward-snowden-nsa-whistleblower-surveillance (Edward Snowden)
https://www.dni.gov/index.php?option=com_content&view=article&id=572&Itemid=991...

The Cloud Pod
TCP Talks: The Service Not the Software: Anthony Lye on Evolution and Revolution

Apr 25, 2022 · 62:38


In this TCP Talks episode, Justin Brodley and Jonathan Baker talk with Anthony Lye, Executive Vice President and General Manager of NetApp's Public Cloud Services Business Unit. An industry veteran of more than 25 years, Anthony has been at the forefront of cloud innovation for more than half that time. Anthony shares his insight on the importance of embracing disruption in the tech industry. He discusses how NetApp seized the right opportunities, got lucky, and came to dominate the cloud space, even though younger app developers may have no idea what it is. “They don't comprehend, nor should they, the complexities of infrastructure,” Anthony explains. “And I really love the fact that we've been able to democratize ONTAP, because it's cool, but you've got to be really smart to get the best out of it. And so we just decided we would be the smart ones.” What's really behind innovation in tech? “The context is where you are. And people like to think that the world operates through evolution. And sometimes it's revolution; sometimes, you have to do something radically different.” Anthony also discusses cloud computing trends, the importance of customer focus, what NetApp does differently, and multi-cloud.

What's Next|科技早知道
S6E06 | A Conversation with Databricks Co-founder Reynold Xin: The Long-Termism Behind a $38 Billion Valuation

Apr 6, 2022 · 78:44


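Why does Howie see Databricks as one of the best big data companies of the coming decade? Named by Bloomberg and other outlets as one of the most anticipated IPOs of 2022, Databricks counts major companies and top funds among its investors, including Amazon, Google, Salesforce, Morgan Stanley, BlackRock, T. Rowe Price, Fidelity, and A16Z. Many investors believe that once it goes public, Databricks will stand alongside Snowflake, currently valued by the market at $76 billion, as one of the most influential big data companies in the world.

In this episode, host Howie invites Databricks co-founder Reynold Xin (辛湜) to share, starting from the company's earliest days, how a big data star grew out of a small project. How did Databricks settle on its product and business model ten years ago? Why did it avoid custom projects, and why did it commit firmly to the cloud? In its later growth from 1 to 100, how did it respond to challenges from giants such as Amazon and Microsoft? How do Snowflake and Databricks view each other? Why is the lakehouse a promising market? What is the talent-recruiting playbook Reynold has shared with VC firms such as A16Z? As the industry matures, what opportunities will the next cycle of the big data industry bring? After hearing Reynold, each of us may come to understand why long-termism is essential for founders and practitioners in big data and AI.

In this episode
• Howie, Silicon Valley AI venture investor and guest host of What's Next|科技早知道
• Reynold Xin, co-founder of Databricks

Topics
[03:03] Did Databricks earn more from merchandise and conference tickets than from its actual product?
[07:39] Why did it decide to target the cloud from its earliest days?
[16:21] Looking from 2008 or 2009, no one knew when the cloud's future would arrive
[26:26] Why can't Amazon out-compete Databricks in open source?
[33:29] What was the turning point behind Databricks' explosive business growth?
[43:51] Why is the future the lakehouse?
[52:26] Reynold's methods and lessons on management, hiring, and entrepreneurship
[01:09:53] Where is the next opportunity for big data and AI? How else can the SaaS model be explored?

Further reading
• TechCrunch on Databricks' rapid growth and $38 billion valuation: As Databricks reaches $800M ARR, a fresh look at its last private valuation (https://techcrunch.com/2022/02/17/as-databricks-reaches-800m-arr-a-fresh-look-at-its-last-private-valuation/)
• Databricks' explanation of the lakehouse: What Is a Lakehouse? (https://databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html)
• Databricks: a big data unicorn founded in San Francisco in 2013 that grew out of Apache Spark. It serves enterprise customers through an open-source SaaS model, with partners including internet giants such as Microsoft, Google, and Alibaba. Its early competitors included industry heavyweights Hortonworks, Cloudera, and Amazon; its main later competitor is fellow unicorn Snowflake. In August 2021 it closed a $1.6 billion Series H at a $38 billion valuation.
• Docker: like Databricks, a company built on open-source software. It faced significant operating difficulties from 2019 to 2021 and recently closed a $105 million Series C led by Bain Capital, at a latest valuation of $2.1 billion.
• Apache Spark: a mainstream open-source big data processing framework developed at UC Berkeley's AMP Lab. It can be deployed standalone or on a Hadoop cluster, and is similar to MapReduce but faster.
• Data warehouse: a system for reporting and data analysis and a core component of BI, typically used by business users; it is often contrasted with the data lake commonly used by data scientists.

Previous episodes
• #45 Backed by Warren Buffett, cloud unicorn Snowflake: is a golden decade for SaaS here? (https://guiguzaozhidao.fireside.fm/snowflake)
• S3E04 How did open-source software, once a challenger to big companies, come to win their favor? (Part 1 (https://guiguzaozhidao.fireside.fm/56) and Part 2 (https://guiguzaozhidao.fireside.fm/57))

Music: I Can't Get Enough - Love Beans

Production: producer 刘灿; post-production Luke and Jack; operations Yao; cover design 饭团

About the show: formerly 「硅谷早知道」, relaunched as 「What's Next|科技早知道」. A global view focused on technology developments and shifts in the business landscape.

About us: the mission of 声动活泼 is to "engage the world through sound," providing a steady stream of food for thought.
• Our other podcasts: 声东击西 (https://etw.fm/episodes), 声动早咖啡 (https://sheng-espresso.fireside.fm/), 反潮流俱乐部 (https://fanchaoliuclub.fireside.fm/), 泡腾 VC (https://popvc.fireside.fm/), 商业WHY酱 (https://msbussinesswhy.fireside.fm/)
• Interact with us on Jike (https://okjk.co/Qd43ia), Weibo, and other social media; search for 声动活泼 to find us
• We'd love to hear from you by email: ting@sheng.fm
• If you enjoy the show, you can support us with a tip (https://afdian.net/@shengfm) or recommend it to a friend or two

Join the 声动胡同小社区 (Sheng Dong Hutong Community)! You may know that the 声动活泼 office is in a hutong inside Beijing's Second Ring Road; we also have an online community, the 声动胡同小社区. As a member, you'll receive emails from 「声动小邮筒」 (the Sheng Dong Mailbox) at least three times a week, and you can join a variety of online and offline events and games. Click here (https://shengpodcasts.notion.site/a977c74222484894a9fe6245bc0f4dba) to get a feel for the community. We hope you'll join this virtual hutong to support us, connect with us and other interesting people, and gain new knowledge, friendships, and a wider view of the world.
Users in mainland China (annual plan): Join the community (https://sourl.cn/G4B2Wt)
Overseas users (monthly plan): Join the community (https://sdhp.memberful.com/join)
We look forward to having you! Special Guest: Reynold Xin.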