Podcasts about Vertica

  • 38 PODCASTS
  • 48 EPISODES
  • 46m AVG DURATION
  • 1 MONTHLY NEW EPISODE
  • Aug 22, 2024 LATEST

POPULARITY

(Popularity trend chart, 2017–2024)


Best podcasts about Vertica

Latest podcast episodes about Vertica

DMRadio Podcast
Time After Time: Why the 'When' Matters

DMRadio Podcast

Play Episode Listen Later Aug 22, 2024 52:36


The legendary database guru Dr. Michael Stonebraker said nearly 20 years ago that "one size does not fit all." He was talking about database models, and back then he was pushing Vertica, a column-oriented database purpose-built for analytics. Today, there are hundreds of databases on the market, including multi-modal databases that enable different architectures on the same foundation. Check out this episode of Future Proof to learn more as Host @eric_kavanagh interviews Avthar Sewrathan of Timescale DB!

The VentureFizz Podcast
Andy Palmer - Serial Entrepreneur, Investor, and Author

The VentureFizz Podcast

Play Episode Listen Later May 28, 2024 46:49


Episode 339 of The VentureFizz Podcast features Andy Palmer, serial entrepreneur, seed investor, and now author. Andy is one of my OGs, that is, original guests. If you go way back into the archives of The VentureFizz Podcast, you'll find my first interview with him for Episode #22 back in 2018. In that interview, we talk about his full background story, his career building companies like Tamr & Vertica, plus investing in companies. For this interview, Andy and I connected for a different reason. He has written a book with his co-author, Paula Caligiuri, titled “Live for a Living: How to Create Your Career Journey to Work Happier Not Harder.” The two met on a Bumble date, which resulted in a friendship and a collaboration on this book, which is useful at any stage of one's career. It's ultimately a guide to designing a life that leverages your personal values, motivators, and goals in your career. I'm a fan of audiobooks, so it was fun listening to Andy and Paula narrate the book, and I have to admit, they both have great voices for it. Plus, it was cool to know some of the people they highlight in the book, like Elizabeth Lawler, CEO & Co-Founder of AppMap, who we recently had on my podcast. The format of this interview is different from my normal style, which was a fun change of pace. In this episode, we chat about:

* How Andy got involved in AI back in the 80s, how this is his third AI hype cycle, plus the details about his latest company, DBOS.
* The inspiration behind the book.
* The five phases of ideal careers: the Starter Phase, Foundational Phase, High-Growth Phase, High-Focus Phase, and Give-Back Phase. We go through each phase in detail.
* Why a simultaneous career path can be a benefit.
* The advantages of working at startups and how it can push your career forward.
* And more!

There is obviously a lot more to this book than what we had the opportunity to chat about, so make sure you check it out! The title again is Live for a Living, and you can find it on Amazon, or the audio version on Spotify or any other major outlet.

Programmers Quickie
When to Choose Delta Lake, ClickHouse, Druid, or Vertica

Programmers Quickie

Play Episode Listen Later Apr 18, 2024 4:44


Vertical Church / Iglesia Vertical
Su Corazón Por Los Perdidos // Un Lugar En La Mesa // Virgilio Sierra @Vertica_lSocial

Vertical Church / Iglesia Vertical

Play Episode Listen Later Dec 10, 2023 42:47


Welcome to Iglesia Vertical Online! No matter your age or your story, you are welcome! Our mission is simple: Iglesia Vertical exists to point people to God, teach them to follow Jesus, and equip them to make a difference. We are excited to offer an online church experience at 9AM, 10:45AM*English, 12:30PM, 5PM*English, and 7PM. Chat with us live, enjoy special moments of worship, and watch the service with your whole family. We are a generous church, and together we are making a difference. GIVE: http://www.iglesiavertical.com/dar/ Join a virtual life group: https://govertical.churchcenter.com/g... Need prayer?: https://govertical.churchcenter.com/p... Follow us: ►► Instagram: https://www.instagram.com/vertical_so... ►► Facebook: https://www.facebook.com/verticalsocial

Vertical Church / Iglesia Vertical
Blessed Are The Pure In Heart // THE BEATITUDES // Virgilio Sierra @vertica_lsocial

Vertical Church / Iglesia Vertical

Play Episode Listen Later Oct 15, 2023 43:17


Welcome to Vertical Church! Whatever your age or life story, you are welcome! Our mission is simple: Vertical Church exists to point people up to God, teach them to follow Jesus, and equip them to make a difference. We're excited to offer live online church services at 9AM, 10:45AM*English, 12:30PM, 5PM*English & 7PM. Chat with us live, enjoy special moments of worship and watch together as a family. GIVE: To support our growth and global impact click here: http://vertical.life/give/ Join a Virtual Life Group: https://govertical.churchcenter.com/g... Need Prayer?: https://govertical.churchcenter.com/p... FOLLOW US: ►► Instagram: https://www.instagram.com/vertical_so... ►► Facebook: https://www.facebook.com/verticalsocial #thebeatitudes

Growing With Fishes Podcast
Commercial Aquaponics in OK w Bain Howard of Vertica Aquaponics @ 3rd Annual Virtual AP Canna Conf

Growing With Fishes Podcast

Play Episode Listen Later Oct 10, 2023 52:25


The 3rd Annual Virtual Aquaponic Cannabis Conference, held on Jan 14-15, 2023, featuring speakers from around the world. Speaker information: Bain Howard, IG: @verticafarmsok. Want to support more content like this? Check out APMJClass.com & ThePestClass.com

AI in Action Podcast
E455 Gaurav Rao, EVP & GM of Machine Learning and AI at AtScale

AI in Action Podcast

Play Episode Listen Later Jun 23, 2023 17:42


Today's guest is Gaurav Rao, EVP & GM of Machine Learning and AI at AtScale in Boston, MA. Founded in 2013, AtScale enables smarter decision-making by accelerating the flow of data-driven insights. The company's semantic layer platform simplifies, accelerates, and extends business intelligence and data science capabilities for enterprise customers across all industries. With AtScale, customers are empowered to democratize data, implement self-service BI, and build a more agile analytics infrastructure for better, more impactful decision making. AtScale has brought together a leadership team composed of accomplished executives from across the global technology ecosystem. Drawing experience from companies including Vertica, Yahoo, Oracle, Dell, Citrix, AWS, and IBM, AtScale is squarely focused on transforming the way organizations think about their data and analytics. Gaurav brings his expertise across Product, Engineering, Sales, and Corporate Strategy to AtScale, specializing in Data Science, Machine Learning, and AI in support of business needs across public, private, and hybrid enterprise environments. In the episode, Gaurav will discuss: the interesting work he does with AtScale, an insight into their ML & AI team, the impact they bring to customers, plans for growing the team, educating and promoting trust in AI, and what excites him for the future at AtScale.

Authentic Leadership for Everyday People
061 Chris Lynch - The Punk Rock Venture Capitalist

Authentic Leadership for Everyday People

Play Episode Listen Later Nov 7, 2022 62:16


Chris Lynch is the Executive Chairman and CEO at AtScale, the industry leader in data federation and cloud transformation. Sometimes known as the Punk Rock VC, Chris is a VC investor, operator, advisor, and mentor to dozens of entrepreneurs and startups. He was on the founding teams of Arrow Point, Acopia Networks, and Vertica. We talk about the influence of punk rock and the Sex Pistols on his approach to work, and why connection is the key to success in any industry. Chris is also pushing for more transparency and disclosure around equity compensation in privately owned companies, especially in high tech and private equity/venture capital environments. So we talked about the changes needed and, most importantly, about what questions people should ask before accepting a job in a privately owned company where equity is a significant part of the compensation.

KEY TAKEAWAYS
[02:40] - Chris shares how he arrived at where he is today.
[05:01] - How work, love, and punk rock influenced Chris' business career.
[08:23] - From dreams of playing bass in a punk rock band to inventing a persona that helped him 'fake it until he made it.'
[11:04] - Why connection and authenticity are the keys to success no matter what you are 'selling.'
[13:10] - How accepting failure as part of the process ultimately leads to success.
[16:21] - "Celebrating success is fine. But it doesn't make you more successful. Overcoming failure makes you more successful."
[19:41] - Chris shares how his definition of success has evolved over his career.
[24:43] - What are the traits of a good Venture Capitalist (VC)?
[26:25] - Chris shares his answer to the 'trick question' that opened the door to him working with the Godfather of VC (and a top 5 IPO).
[28:50] - Chris shares the secret to his success and his message for those he mentors.
[30:18] - How to join Chris in his mission to create more transparency in the private equity market.
[34:57] - The most important questions to ask before joining a startup or a privately owned venture that features equity compensation.
[37:51] - Why employees need to negotiate equity compensation the same way they would for cash.
[39:47] - How to find more information about Chris and the types of investments he is
[41:28] - Crowdsourcing the power of music and the tech entrepreneur ecosystem to do good through Tech Tackles X.
[44:13] - Why perfection is the enemy of progress.
[46:05] - Chris shares a question for listeners to answer for a special opportunity to meet with him via Zoom. Jean dot O'Neill at escale.com.
[47:16] - Chris shares the VC jargon that leaves him empty.
[50:34] - Why we all need to stop fighting amongst ourselves to connect with what's real.
[53:04] - Chris shares his favorite meal and drink.
[53:50] - Why Chris believes the Sex Pistols changed the world with just one record.

Contact Dino at: dino@al4ep.com
Websites: al4ep.com, AtScale.com, reverbadvisors.com
Tech Tackles Cancer / Tech Tackles X: techtacklesx.org
Other Christopher Lynch links: LinkedIn:

The DotCom Magazine Entrepreneur Spotlight
Chuka Ikokwu, Co-Founder & CEO, Divercity, A DotCom Magazine Interview

The DotCom Magazine Entrepreneur Spotlight

Play Episode Listen Later Aug 8, 2022 22:55


About Chuka Ikokwu and Divercity: I am a results-driven diversity & inclusion advocate and Tech Entrepreneur with expertise in Product Management & Data Science. I am currently focused on helping companies build more diverse and inclusive workforces. In the past I managed Business Development Operations at Yahoo! Inc., and was responsible for conceiving and executing North American Business Development Initiatives; prior to founding Divercity.io, a DEI HR-tech platform, I was a Senior Manager of Analytics at Unity, WB, Ubisoft and CrowdStar Inc., using big data and tools like SQL, Snowflake, Hive, R, Vertica, RedShift, and Tableau to drive User Acquisition, Revenue, and Retention for groundbreaking Mobile Apps and triple-A titles. I worked at Riot Games in the past as employee no. 44. I earned my MBA from the MIT Sloan School of Management and a Bachelor of Science with honors from the University of Texas at Arlington. Currently, as a founder at Divercity, Inc., my main goal is to close the diversity and inclusion gap in the tech industry. Divercity (divercity.io) is a diversity and inclusion measurement and recruiting platform that helps forward-thinking companies and HR professionals accurately measure and track their diversity breakdown and find and connect with diverse talent. Our mission is to demystify and solve the diversity hiring gap in the tech industry and to bring more equity and inclusion to it through a combination of community, relationships, and artificial intelligence technology.

De Quamfy Show
★004 (NL) $QNT QRC-20 & Tokenomics!

De Quamfy Show

Play Episode Listen Later Apr 8, 2022 131:40


It was another packed week of news and updates in the crypto world. Quant has launched a new update, with the QRC-20 token as this week's NUKE.... Topics covered: Quant Network year in review https://www.quant.network/pr-announce... Overledger 2.1.5 release https://www.quant.network/pr-announce... Finfluencers https://nos.nl/artikel/2410342-afm-fi... Vertical and horizontal markets https://en.wikipedia.org/wiki/Vertica... Siemens & JPM https://www.ft.com/content/e14582fc-a... Tokenomics tweets & infographic https://twitter.com/varg_88/status/13... ___________________________________________________________________________ This is the Dutch-language QNT community where we talk with each other about everything related to $QNT. NL/BE Quant Group https://t.me/Dutchqnt  ___________________________________________________________________________________ Exchanges & brokers we use: Coinmetro: https://coinmetro.com/?ref=jarno Bitcoinmeester: https://www.bitcoinmeester.nl/?r=1000... Bitvavo: https://bitvavo.com/?a=03889DE34C Crypto .com https://crypto.com/app/2enw9fc23d

Data Transforming Business
S6E10: Advanced Analytics is the Future of Data with Vertica

Data Transforming Business

Play Episode Listen Later Mar 30, 2022 25:04


Advanced analytics is the next step for organisations that want to be at the cutting edge of data analysis. For decades, organisations have been generating reports, primarily documenting 'What happened?' or 'What is happening?'. Essentially, it was once solely about descriptive analytics reporting, until some decided to take it to the next level with diagnostic analytics, i.e. looking at why things happen. Fast forward to more recent years, and the evolution of technologies like the data lake and the data warehouse has led to explosive growth in data volume, velocity, and variety, increasing the demand for organisations to garner greater insights from data. This is how advanced analytics was born.

Advanced analytics: the next stage in data evolution

On this podcast, Susan Walsh, Founder and MD of The Classification Guru, joins Jack Lambert, Technical Partnerships Manager at Jaguar TCS Racing, and Mark Whalley, Manager at Vertica. The trio of data experts shed light on the use cases of advanced analytics through the lens of Vertica's technology partnership with Jaguar TCS Racing. From looking at the current state of the industry to analysing the technical implementation of software into an organisation, you can expect a deep dive into the advanced analytics methods and tools driving the future of data. Discover:

* How Vertica is using advanced analytics to help businesses accelerate their data analysis
* The birth, journey, and growth of the technology partnership between Vertica and Jaguar TCS Racing
* Top tips on how to implement advanced analytics

The Stack Overflow Podcast
Column by your name: The analytics database that skips the rows

The Stack Overflow Podcast

Play Episode Listen Later Feb 16, 2022 24:34


These days, every company looking at analyzing their data for insights has a data pipeline setup. Many companies have a fast production database, often a NoSQL or key-value store, that goes through a data pipeline. The pipeline process performs some sort of extract-transform-load process on it, then routes it to a larger data store that the analytics tools can access. But what if you could skip some steps and speed up the process with a database purpose-built for analytics? On this sponsored episode of the podcast, we chat with Rohit (Ro) Amarnath, the CTO at Vertica, to find out how your analytics engine can speed up your workflow. After a humble beginning with a ZX Spectrum 128, he's now in charge of Vertica Accelerator, a SaaS version of the Vertica database. Vertica was founded by database researcher Dr. Michael Stonebraker and Andrew Palmer. Dr. Stonebraker helped develop several databases, including Postgres, Streambase, and VoltDB. Vertica was born out of research into purpose-built databases. Stonebraker's research found that columnar database storage was faster for data warehouses because there were fewer read/writes per request. Here's a quick example that shows how columnar databases work. Suppose that you want all the records from a specific US state or territory. There are 52 possible values here (depending on how you count territories). To find all instances of a single state in a row-based DB, the search must check every row for the value of the state column. However, searching by column is faster by an order of magnitude: it just runs down the column to find matching values, then retrieves row data for the matches. The Vertica database was designed specifically for analytics as opposed to transactional databases. Ro spent some time at a Wall Street firm building reports—P&L, performance, profitability, etc. Transactions were important to day-to-day operations, but the real value of data came from analyses that showed where to cut costs or increase investments in a particular business. Analytics help with overall strategy, which tends to be more far-reaching and effective. For most of its life, Vertica has been an on-premises database managing a data warehouse. But with the ease of cloud storage, Vertica Accelerator is looking to give you a data lake as a service. If you're unfamiliar, data lakes take the data warehouse concept—central storage for all your data—and remove limits. You can have “rivers” of data flowing into your stores; if you go from a terabyte to a petabyte overnight, your cloud provider will handle it for you. Vertica has worked with plenty of industries that push massive amounts of data: healthcare, aviation, online games. They've built a lot of functionality into the database itself to speed up all manner of applications. One of their prospective customers had a machine learning model with thousands of lines of code that was reduced to about ten lines because so much was being done in the database itself. In the future, Vertica plans to offer more powerful management of data warehouses and lakes, including handling the metadata that comes with them. To learn more about Vertica's analytics databases, check out our conversation or visit their website.
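
The state-lookup example in that description can be made concrete with a short sketch. This is a toy illustration of the two storage layouts being contrasted, not Vertica's actual engine; the table, values, and field names below are invented for the example:

```python
# Minimal sketch, assuming a tiny invented "customers" table, contrasting a
# row-oriented scan with a column-oriented scan for the query
# "find every record in one US state" from the episode description.

rows = [
    {"id": 1, "name": "Ada", "state": "MA", "balance": 120.0},
    {"id": 2, "name": "Lin", "state": "TX", "balance": 75.5},
    {"id": 3, "name": "Sam", "state": "MA", "balance": 310.2},
]

# Row store: every row (all of its columns) is read just to test one field.
row_matches = [r for r in rows if r["state"] == "MA"]

# Column store: each column lives in its own contiguous array, so the
# predicate only scans the "state" column, then fetches rows for the hits.
columns = {
    "id": [1, 2, 3],
    "name": ["Ada", "Lin", "Sam"],
    "state": ["MA", "TX", "MA"],
    "balance": [120.0, 75.5, 310.2],
}
match_positions = [i for i, s in enumerate(columns["state"]) if s == "MA"]
col_matches = [
    {key: values[i] for key, values in columns.items()} for i in match_positions
]

# Both layouts return the same records; the column layout just touched
# far less data to find them.
assert row_matches == col_matches
print(col_matches)
```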

Screaming in the Cloud
Developing Storage Solutions Before the Rest with AB Periasamy

Screaming in the Cloud

Play Episode Listen Later Feb 2, 2022 38:54


About AB: AB Periasamy is the co-founder and CEO of MinIO, an open-source provider of high-performance object storage software. In addition to this role, AB is an active investor and advisor to a wide range of technology companies, from H2O.ai and Manetu, where he serves on the board, to advisor or investor roles with Humio, Isovalent, Starburst, Yugabyte, Tetrate, Postman, Storj, Procurify, and Helpshift. Successful exits include Gitter.im (Gitlab), Treasure Data (ARM) and Fastor (SMART). AB co-founded Gluster in 2005 to commoditize scalable storage systems. As CTO, he was the primary architect and strategist for the development of the Gluster file system, a pioneer in software-defined storage. After the company was acquired by Red Hat in 2011, AB joined Red Hat's Office of the CTO. Prior to Gluster, AB was CTO of California Digital Corporation, where his work led to scaling commodity cluster computing to supercomputing-class performance. His work there resulted in the development of Lawrence Livermore Laboratory's “Thunder” code, which, at the time, was the second fastest in the world. AB holds a Computer Science Engineering degree from Annamalai University, Tamil Nadu, India. AB is one of the leading proponents and thinkers on the subject of open-source software, articulating the difference between the philosophy and the business model. An active contributor to a number of open source projects, he is a board member of India's Free Software Foundation. Links: MinIO: https://min.io/ Twitter: https://twitter.com/abperiasamy MinIO Slack channel: https://minio.slack.com/join/shared_invite/zt-11qsphhj7-HpmNOaIh14LHGrmndrhocA LinkedIn: https://www.linkedin.com/in/abperiasamy/ Transcript: Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They've also gone deep in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That's S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.Corey: This episode is sponsored in part by our friends at Rising Cloud, which I hadn't heard of before, but they're doing something vaguely interesting here. They are using AI, which is usually where my eyes glaze over and I lose attention, but they're using it to help developers be more efficient by reducing repetitive tasks. So, the idea being that you can run stateless things without having to worry about scaling, placement, et cetera, and the rest. They claim significant cost savings, and they're able to wind up taking what you're running as it is, in AWS, with no changes, and run it inside of their data centers that span multiple regions. 
I'm somewhat skeptical, but their customers seem to really like them, so that's one of those areas where I really have a hard time being too snarky about it because when you solve a customer's problem, and they get out there in public and say, “We're solving a problem,” it's very hard to snark about that. Multus Medical, Construx.ai, and Stax have seen significant results by using them, and it's worth exploring. So, if you're looking for a smarter, faster, cheaper alternative to EC2, Lambda, or batch, consider checking them out. Visit risingcloud.com/benefits. That's risingcloud.com/benefits, and be sure to tell them that I sent you because watching people wince when you mention my name is one of the guilty pleasures of listening to this podcast.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by someone who's doing something a bit off the beaten path when we talk about cloud. I've often said that S3 is sort of a modern wonder of the world. It was the first AWS service brought into general availability. Today's promoted guest is the co-founder and CEO of MinIO, Anand Babu Periasamy, or AB as he often goes, depending upon who's talking to him. Thank you so much for taking the time to speak with me today.AB: It's wonderful to be here, Corey. Thank you for having me.Corey: So, I want to start with the obvious thing, where you take a look at what is the cloud and you can talk about AWS's ridiculous high-level managed services, like Amazon Chime. Great, we all see how that plays out. And those are the higher-level offerings, ideally aimed at problems customers have, but then they also have the baseline building blocks services, and it's hard to think of a more baseline building block than an object store. That's something every cloud provider has, regardless of how many scare quotes there are around the word cloud; everyone offers the object store. And your solution is to look at this and say, “Ah, that's a market ripe for disruption. We're going to build through an open-source community software that emulates an object store.” I would be sitting here, more or less poking fun at the idea except for the fact that you're a billion-dollar company now.AB: Yeah.Corey: How did you get here?AB: So, when we started, right, we did not actually think about cloud that way, right? “Cloud, it's a hot trend, and let's go disrupt is like that. It will lead to a lot of opportunity.” Certainly, it's true, it led to the M&A, right, but that's not how we looked at it, right? It's a bad idea to build startups for M&A.When we looked at the problem, when we got back into this—my previous background, some may not know that it's actually a distributed file system background in the open-source space.Corey: Yeah, you were one of the co-founders of Gluster—AB: Yeah.Corey: —which I have only begrudgingly forgiven you. But please continue.AB: [laugh]. And back then we got the idea right, but the timing was wrong. And I had—while the data was beginning to grow at a crazy rate, end of the day, GlusterFS has to still look like an FS, it has to look like a file system like NetApp or EMC, and it was hugely limiting what we can do with it. The biggest problem for me was legacy systems. I have to build a modern system that is compatible with a legacy architecture, you cannot innovate.And that is where when Amazon introduced S3, back then, like, when S3 came, cloud was not big at all, right? 
When I look at it, the most important message of the cloud was Amazon basically threw everything that is legacy. It's not [iSCSI 00:03:21] as a Service; it's not even FTP as a Service, right? They came up with a simple, RESTful API to store your blobs, whether it's JavaScript, Android, iOS, or [AAML 00:03:30] application, or even Snowflake-type application.Corey: Oh, we spent ten years rewriting our apps to speak object store, and then they released EFS, which is NFS in the cloud. It's—AB: Yeah.Corey: —I didn't realize I could have just been stubborn and waited, and the whole problem would solve itself. But here we are. You're quite right.AB: Yeah. And even EFS and EBS are more for legacy stock can come in, buy some time, but that's not how you should stay on AWS, right? When Amazon did that, for me, that was the opportunity. I saw that… while world is going to continue to produce lots and lots of data, if I built a brand around that, I'm not going to go wrong.The problem is data at scale. And what do I do there? The opportunity I saw was, Amazon solved one of the largest problems for a long time. All the legacy systems, legacy protocols, they convinced the industry, throw them away and then start all over from scratch with the new API. While it's not compatible, it's not standard, it is ridiculously simple compared to anything else.No fstabs, no [unintelligible 00:04:27], no [root 00:04:28], nothing, right? From any application anywhere you can access was a big deal. When I saw that, I was like, “Thank you Amazon.” And I also knew Amazon would convince the industry that rewriting their application is going to be better and faster and cheaper than retrofitting legacy applications.Corey: I wonder how much that's retconned because talking to some of the people involved in the early days, they were not at all convinced they [laugh] would be able to convince the industry to do this.AB: Actually, if you talk to the analyst reporters, the IDC's, Gartner's of the world to the enterprise IT, the VMware community, they would say, “Hell no.” But if you talk to the actual application developers, data infrastructure, data architects, the actual consumers of data, for them, it was so obvious. They actually did not know how to write an fstab. The iSCSI and NFS, you can't even access across the internet, and the modern applications, they ran across the globe, in JavaScript, and all kinds of apps on the device. From [Snap 00:05:21] to Snowflake, today is built on object store. It was more natural for the applications team, but not from the infrastructure team. So, who you asked that mattered.But nevertheless, Amazon convinced the rest of the world, and our bet was that if this is going to be the future, then this is also our opportunity. S3 is going to be limited because it only runs inside AWS. Bulk of the world's data is produced everywhere and only a tiny fraction will go to AWS. And where will the rest of the data go? Not SAN, NAS, HDFS, or other blob store, Azure Blob, or GCS; it's not going to be fragmented. And if we built a better object store, lightweight, faster, simpler, but fully compatible with S3 API, we can sweep and consolidate the market. And that's what happened.Corey: And there is a lot of validity to that. We take a look across the industry, when we look at various standards—I mean, one of the big problems with multi-cloud in many respects is the APIs are not quite similar enough. 
And worse, the failure patterns are very different, of I don't just need to know how the load balancer works, I need to know how it breaks so I can detect and plan for that. And then you've got the whole identity problem as well, where you're trying to manage across different frames of reference as you go between providers, and leads to a bit of a mess. What is it that makes MinIO something that has been not just something that has endured since it was created, but clearly been thriving?AB: The real reason, actually is not the multi-cloud compatibility, all that, right? Like, while today, it is a big deal for the users because the deployments have grown into 10-plus petabytes, and now the infrastructure team is taking it over and consolidating across the enterprise, so now they are talking about which key management server for storing the encrypted keys, which key management server should I talk to? Look at AWS, Google, or Azure, everyone has their own proprietary API. Outside they, have [YAML2 00:07:18], HashiCorp Vault, and, like, there is no standard here. It is supposed to be a [KMIP 00:07:23] standard, but in reality, it is not. Even different versions of Vault, there are incompatibilities for us.That is where—like from Key Management Server, Identity Management Server, right, like, everything that you speak around, how do you talk to different ecosystem? That, actually, MinIO provides connectors; having the large ecosystem support and large community, we are able to address all that. Once you bring MinIO into your application stack like you would bring Elasticsearch or MongoDB or anything else as a container, your application stack is just a Kubernetes YAML file, and you roll it out on any cloud, it becomes easier for them, they're able to go to any cloud they want. But the real reason why it succeeded was not that. They actually wrote their applications as containers on Minikube, then they will push it on a CI/CD environment.They never wrote code on EC2 or ECS writing objects on S3, and they don't like the idea of [past 00:08:15], where someone is telling you just—like you saw Google App Engine never took off, right? They liked the idea, here are my building blocks. And then I would stitch them together and build my application. We were part of their application development since early days, and when the application matured, it was hard to remove. It is very much like Microsoft Windows when it grew, even though the desktop was Microsoft Windows Server was NetWare, NetWare lost the game, right?We got the ecosystem, and it was actually developer productivity, convenience, that really helped. The simplicity of MinIO, today, they are arguing that deploying MinIO inside AWS is easier through their YAML and containers than going to AWS Console and figuring out how to do it.Corey: As you take a look at how customers are adopting this, it's clear that there is some shift in this because I could see the story for something like MinIO making an awful lot of sense in a data center environment because otherwise, it's, “Great. I need to make this app work with my SAN as well as an object store.” And that's sort of a non-starter for obvious reasons. But now you're available through cloud marketplaces directly.AB: Yeah.Corey: How are you seeing adoption patterns and interactions from customers changing as the industry continues to evolve?AB: Yeah, actually, that is how my thinking was when I started. If you are inside AWS, I would myself tell them that why don't use AWS S3? 
And it made a lot of sense if it's on a colo or your own infrastructure, then there is an object store. It even made a lot of sense if you are deploying on Google Cloud, Azure, Alibaba Cloud, Oracle Cloud, it made a lot of sense because you wanted an S3 compatible object store. Inside AWS, why would you do it, if there is AWS S3?Nowadays, I hear funny arguments, too. They like, “Oh, I didn't know that I could use S3. Is S3 MinIO compatible?” Because they will be like, “It came along with the GitLab or GitHub Enterprise, a part of the application stack.” They didn't even know that they could actually switch it over.And otherwise, most of the time, they developed it on MinIO, now they are too lazy to switch over. That also happens. But the real reason that why it became serious for me—I ignored that the public cloud commercialization; I encouraged the community adoption. And it grew to more than a million instances, like across the cloud, like small and large, but when they start talking about paying us serious dollars, then I took it seriously. And then when I start asking them, why would you guys do it, then I got to know the real reason why they wanted to do was they want to be detached from the cloud infrastructure provider.They want to look at cloud as CPU network and drive as a service. And running their own enterprise IT was more expensive than adopting public cloud, it was productivity for them, reducing the infrastructure, people cost was a lot. It made economic sense.Corey: Oh, people always cost more the infrastructure itself does.AB: Exactly right. 70, 80%, like, goes into people, right? And enterprise IT is too slow. They cannot innovate fast, and all of those problems. But what I found was for us, while we actually build the community and customers, if you're on AWS, if you're running MinIO on EBS, EBS is three times more expensive than S3.Corey: Or a single copy of it, too, where if you're trying to go multi-AZ and you have the replication traffic, and not to mention you have to over-provision it, which is a bit of a different story as well. So, like, it winds up being something on the order of 30 times more expensive, in many cases, to do it right. So, I'm looking at this going, the economics of running this purely by itself in AWS don't make sense to me—long experience teaches me the next question of, “What am I missing?” Not, “That's ridiculous and you're doing it wrong.” There's clearly something I'm not getting. What am I missing?AB: I was telling them until we made some changes, right—because we saw a couple of things happen. I was initially like, [unintelligible 00:12:00] does not make 30 copies. It makes, like, 1.4x, 1.6x.But still, the underlying block storage is not only three times more expensive than S3, it's also slow. It's a network storage. Trying to put an object store on top of it, another, like, software-defined SAN, like EBS made no sense to me. Smaller deployments, it's okay, but you should never scale that on EBS. So, it did not make economic sense. I would never take it seriously because it would never help them grow to scale.But what changed in recent times? Amazon saw that this was not only a problem for MinIO-type players. Every database out there today, every modern database, even the message queues like Kafka, they all have gone scale-out. And they all depend on local block store and putting a scale-out distributed database, data processing engines on top of EBS would not scale. And Amazon introduced storage optimized instances. 
Essentially, that reduced to bet—the data infrastructure guy, data engineer, or application developer asking IT, “I want a SuperMicro, or Dell server, or even virtual machines.” That's too slow, too inefficient.They can provision these storage machines on demand, and then I can do it through Kubernetes. These two changes, all the public cloud players now adopted Kubernetes as the standard, and they have to stick to the Kubernetes API standard. If they are incompatible, they won't get adopted. And storage optimized that is local drives, these are machines, like, [I3 EN 00:13:23], like, 24 drives, they have SSDs, and fast network—like, 25-gigabit 200-gigabit type network—availability of these machines, like, what typically would run any database, HDFS cluster, MinIO, all of them, those machines are now available just like any other EC2 instance.They are efficient. You can actually put MinIO side by side to S3 and still be price competitive. And Amazon wants to—like, just like their retail marketplace, they want to compete and be open. They have enabled it. In that sense, Amazon is actually helping us. And it turned out that now I can help customers build multiple petabyte infrastructure on Amazon and still stay efficient, still stay price competitive.Corey: I would have said for a long time that if you were to ask me to build out the lingua franca of all the different cloud providers into a common API, the S3 API would be one of them. Now, you are building this out, multi-cloud, you're in all three of the major cloud marketplaces, and the way that you do that and do those deployments seems like it is the modern multi-cloud API of Kubernetes. When you first started building this, Kubernetes was very early on. What was the evolution of getting there? Or were you one of the first early-adoption customers in a Kubernetes space?AB: So, when we started, there was no Kubernetes. But we saw the problem was very clear. And there was containers, and then came Docker Compose and Swarm. Then there was Mesos, Cloud Foundry, you name it, right? Like, there was many solutions all the way up to even VMware trying to get into that space.And what did we do? Early on, I couldn't choose. I couldn't—it's not in our hands, right, who is going to be the winner, so we just simply embrace everybody. It was also tiring that to allow implement native connectors to all of them different orchestration, like Pivotal Cloud Foundry alone, they have their own standard open service broker that's only popular inside their system. Go outside elsewhere, everybody was incompatible.And outside that, even, Chef Ansible Puppet scripts, too. We just simply embraced everybody until the dust settle down. When it settled down, clearly a declarative model of Kubernetes became easier. Also Kubernetes developers understood the community well. And coming from Borg, I think they understood the right architecture. And also written in Go, unlike Java, right?It actually matters, these minute new details resonating with the infrastructure community. It took off, and then that helped us immensely. Now, it's not only Kubernetes is popular, it has become the standard, from VMware to OpenShift to all the public cloud providers, GKS, AKS, EKS, whatever, right—GKE. All of them now are basically Kubernetes standard. It made not only our life easier, it made every other [ISV 00:16:11], other open-source project, everybody now can finally write one code that can be operated portably.It is a big shift. 
It is not because we chose; we just watched all this, we were riding along the way. And then because we resonated with the infrastructure community, modern infrastructure is dominated by open-source. We were also the leading open-source object store, and as Kubernetes community adopted us, we were naturally embraced by the community.Corey: Back when AWS first launched with S3 as its first offering, there were a bunch of folks who were super excited, but object stores didn't make a lot of sense to them intrinsically, so they looked into this and, “Ah, I can build a file system and users base on top of S3.” And the reaction was, “Holy God don't do that.” And the way that AWS decided to discourage that behavior is a per request charge, which for most workloads is fine, whatever, but there are some that causes a significant burden. With running something like MinIO in a self-hosted way, suddenly that costing doesn't exist in the same way. Does that open the door again to so now I can use it as a file system again, in which case that just seems like using the local file system, only with extra steps?AB: Yeah.Corey: Do you see patterns that are emerging with customers' use of MinIO that you would not see with the quote-unquote, “Provider's” quote-unquote, “Native” object storage option, or do the patterns mostly look the same?AB: Yeah, if you took an application that ran on file and block and brought it over to object storage, that makes sense. But something that is competing with object store or a layer below object store, that is—end of the day that drives our block devices, you have a block interface, right—trying to bring SAN or NAS on top of object store is actually a step backwards. They completely missed the message that Amazon told that if you brought a file system interface on top of object store, you missed the point, that you are now bringing the legacy things that Amazon intentionally removed from the infrastructure. Trying to bring them on top doesn't make it any better. If you are arguing from a compatibility some legacy applications, sure, but writing a file system on top of object store will never be better than NetApp, EMC, like EMC Isilon, or anything else. Or even GlusterFS, right?But if you want a file system, I always tell the community, they ask us, “Why don't you add an FS option and do a multi-protocol system?” I tell them that the whole point of S3 is to remove all those legacy APIs. If I added POSIX, then I'll be a mediocre object storage and a terrible file system. I would never do that. But why not write a FUSE file system, right? Like, S3Fs is there.In fact, initially, for legacy compatibility, we wrote MinFS and I had to hide it. We actually archived the repository because immediately people started using it. Even simple things like end of the day, can I use Unix [Coreutils 00:19:03] like [cp, ls 00:19:04], like, all these tools I'm familiar with? If it's not file system object storage that S3 [CMD 00:19:08] or AWS CLI is, like, to bloatware. And it's not really Unix-like feeling.Then what I told them, “I'll give you a BusyBox like a single static binary, and it will give you all the Unix tools that works for local filesystem as well as object store.” That's where the [MC tool 00:19:23] came; it gives you all the Unix-like programmability, all the core tool that's object storage compatible, speaks native object store. 
But if I have to make object store look like a file system so UNIX tools would run, it would not only be inefficient, Unix tools never scaled for this kind of capacity.So, it would be a bad idea to take step backwards and bring legacy stuff back inside. For some very small case, if there are simple POSIX calls using [ObjectiveFs 00:19:49], S3Fs, and few, for legacy compatibility reasons makes sense, but in general, I would tell the community don't bring file and block. If you want file and block, leave those on virtual machines and leave that infrastructure in a silo and gradually phase them out.Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it's better than AWS pricing—and when they say that they mean it is less money. Sure, I don't dispute that but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you'll receive $100 in credit. That's v-u-l-t-r.com slash screaming.Corey: So, my big problem, when I look at what S3 has done is in its name because of course, naming is hard. It's, “Simple Storage Service.” The problem I have is with the word simple because over time, S3 has gotten more and more complex under the hood. It automatically tiers data the way that customers want. And integrated with things like Athena, you can now query it directly, whenever an object appears, you can wind up automatically firing off Lambda functions and the rest.And this is increasingly looking a lot less like a place to just dump my unstructured data, and increasingly, a lot like this is sort of a database, in some respects. Now, understand my favorite database is Route 53; I have a long and storied history of misusing services as databases. Is this one of those scenarios, or is there some legitimacy to the idea of turning this into a database?AB: Actually, there is now S3 Select API that if you're storing unstructured data like CSV, JSON, Parquet, without downloading even a compressed CSV, you can actually send a SQL query into the system. In MinIO particularly the S3 Select is [CMD 00:21:16] optimized. We can load, like, every 64k worth of CSV lines into registers and do CMD operations. It's the fastest SQL filter out there. Now, bringing these kinds of capabilities, we are just a little bit away from a database; should we do database? I would tell definitely no.
Particularly if you look at database, they're dealing with metadata, and querying; the biggest value they bring is indexing the metadata. But if I'm dealing with that, then I'm dealing with really small block lots of mutations, the separation of objects storage should be dealing with persistence and not mutations. Mutations are [AWS 00:21:57] problem. Separation of database work function and persistence function is where object storage got the storage right.Otherwise, it will, they will make the mistake of doing POSIX-like behavior, and then not only bringing back all those capabilities, doing IOPS intensive workloads across the HTTP, it wouldn't make sense, right? So, object storage got the API right. But now should it be a database? So, it definitely should not be a database. In fact, I actually hate the idea of Amazon yielding to the file system developers and giving a [file three 00:22:29] hierarchical namespace so they can write nice file managers.That was a terrible idea. Writing a hierarchical namespace that's also sorted, now puts tax on how the metadata is indexed and organized. The Amazon should have left the core API very simple and told them to solve these problems outside the object store. Many application developers don't need. Amazon was trying to satisfy everybody's need. Saying no to some of these file system-type, file manager-type users, what should have been the right way.But nevertheless, adding those capabilities, eventually, now you can see, S3 is no longer simple. And we had to keep that compatibility, and I hate that part. I actually don't mind compatibility, but then doing all the wrong things that Amazon is adding, now I have to add because it's compatible. I kind of hate that, right?But now going to a database would be pushing it to the whole new level. Here is the simple reason why that's a bad idea. The right way to do database—in fact, the database industry is already going in the right direction. Unstructured data, the key-value or graph, different types of data, you cannot possibly solve all that even in a single database. They are trying to be multimodal database; even they are struggling with it.You can never be a Redis, Cassandra, like, a SQL all-in-one. They tried to say that but in reality, that you will never be better than any one of those focused database solutions out there. Trying to bring that into object store will be a mistake. Instead, let the databases focus on query language implementation and query computation, and leave the persistence to object store. So, object store can still focus on storing your database segments, the table segments, but the index is still in the memory of the database.Even the index can be snapshotted once in a while to object store, but use objects store for persistence and database for query is the right architecture. And almost all the modern databases now, from Elasticsearch to [unintelligible 00:24:21] to even Kafka, like, message queue. They all have gone that route. Even Microsoft SQL Server, Teradata, Vertica, name it, Splunk, they all have gone object storage route, too. Snowflake itself is a prime example, BigQuery and all of them.That's the right way. Databases can never be consolidated. There will be many different kinds of databases. Let them specialize on GraphQL or Graph API, or key-value, or SQL. Let them handle the indexing and persistence, they cannot handle petabytes of data. 
That [unintelligible 00:24:51] to object store is how the industry is shaping up, and it is going in the right direction.Corey: One of the ways I learned the most about various services is by talking to customers. Every time I think I've seen something, this is amazing. This service is something I completely understand. All I have to do is talk to one more customer. And when I was doing a bill analysis project a couple of years ago, I looked into a customer's account and saw a bucket with okay, that has 280 billion objects in it—and wait was that billion with a B?And I asked them, “So, what's going on over there?” And there's, “Well, we built our own columnar database on top of S3. This may not have been the best approach.” It's, “I'm going to stop you there. With no further context, it was not, but please continue.”It's the sort of thing that would never have occurred to me to even try, do you tend to see similar—I would say they're anti-patterns, except somehow they're made to work—in some of your customer environments, as they are using the service in ways that are very different than ways encouraged or even allowed by the native object store options?AB: Yeah, when I first started seeing the database-type workloads coming on to MinIO, I was surprised, too. That was exactly my reaction. In fact, they were storing these 256k, sometimes 64k table segments because they need to index it, right, and the table segments were anywhere between 64k to 2MB. And when they started writing table segments, it was more often [IOPS-type 00:26:22] I/O pattern, then a throughput-type pattern. Throughput is an easier problem to solve, and MinIO always saturated these 100-gigabyte NVMe-type drives, they were I/O intensive, throughput optimized.When I started seeing the database workloads, I had to optimize for small-object workloads, too. We actually did all that because eventually I got convinced the right way to build a database was to actually leave the persistence out of database; they made actually a compelling argument. If historically, I thought metadata and data, data to be very big and coming to object store make sense. Metadata should be stored in a database, and that's only index page. Take any book, the index pages are only few, database can continue to run adjacent to object store, it's a clean architecture.But why would you put database itself on object store? When I saw a transactional database like MySQL, changing the [InnoDB 00:27:14] to [RocksDB 00:27:15], and making changes at that layer to write the SS tables [unintelligible 00:27:19] to MinIO, and then I was like, where do you store the memory, the journal? They said, “That will go to Kafka.” And I was like—I thought that was insane when it started. But it continued to grow and grow.Nowadays, I see most of the databases have gone to object store, but their argument is, the databases also saw explosive growth in data. And they couldn't scale the persistence part. That is where they realized that they still got very good at the indexing part that object storage would never give. There is no API to do sophisticated query of the data. You cannot peek inside the data, you can just do streaming read and write.And that is where the databases were still necessary. But databases were also growing in data. One thing that triggered this was the use case moved from data that was generated by people to now data generated by machines. Machines means applications, all kinds of devices. 
Now, it's like between seven billion people to a trillion devices is how the industry is changing. And this led to lots of machine-generated, semi-structured, structured data at giant scale, coming into database. The databases need to handle scale. There was no other way to solve this problem other than leaving the—[unintelligible 00:28:31] if you looking at columnar data, most of them are machine-generated data, where else would you store? If they tried to build their own object storage embedded into the database, it would make database mentally complicated. Let them focus on what they are good at: Indexing and mutations. Pull the data table segments which are immutable, mutate in memory, and then commit them back give the right mix. What you saw what's the fastest step that happened, we saw that consistently across. Now, it is actually the standard.Corey: So, you started working on this in 2014, and here we are—what is it—eight years later now, and you've just announced a Series B of $100 million dollars on a billion-dollar valuation. So, it turns out this is not just one of those things people are using for test labs; there is significant momentum behind using this. How did you get there from—because everything you're saying makes an awful lot of sense, but it feels, at least from where I sit, to be a little bit of a niche. It's a bit of an edge case that is not the common case. Obviously, I missing something because your investors are not the types of sophisticated investors who see something ridiculous and, “Yep. That's the thing we're going to go for.” There right more than they're not.AB: Yeah. The reason for that was the saw what we were set to do. In fact, these are—if you see the lead investor, Intel, they watched us grow. They came into Series A and they saw, everyday, how we operated and grew. They believed in our message.And it was actually not about object store, right? Object storage was a means for us to get into the market. When we started, our idea was, ten years from now, what will be a big problem? A lot of times, it's hard to see the future, but if you zoom out, it's hidden in plain sight.These are simple trends. Every major trend pointed to world producing more data. No one would argue with that. If I solved one important problem that everybody is suffering, I won't go wrong. And when you solve the problem, it's about building a product with fine craftsmanship, attention to details, connecting with the user, all of that standard stuff.But I picked object storage as the problem because the industry was fragmented across many different data stores, and I knew that won't be the case ten years from now. Applications are not going to adopt different APIs across different clouds, S3 to GCS to Azure Blob to HDFS to everything is incompatible. I saw that if I built a data store for persistence, industry will consolidate around S3 API. Amazon S3, when we started, it looked like they were the giant, there was only one cloud industry, it believed mono-cloud. Almost everyone was talking to me like AWS will be the world's data center.I certainly see that possibility, Amazon is capable of doing it, but my bet was the other way, that AWS S3 will be one of many solutions, but not—if it's all incompatible, it's not going to work, industry will consolidate. Our bet was, if world is producing so much data, if you build an object store that is S3 compatible, but ended up as the leading data store of the world and owned the application ecosystem, you cannot go wrong. 
We kept our heads down and focused, for the first six years, on massive adoption: build the ecosystem to a scale where we could say our ecosystem is equal to or larger than Amazon's, and then we are in business. We didn't focus on commercialization; we focused on convincing the industry that this is the right technology for them to use. Once they are convinced, once you solve business problems, making money is not hard, because they are already sold, they are in love with the product. Convincing them to pay is not a big deal, because data is such a critical, central part of their business.

We didn't worry about commercialization; we worried about adoption. And once we got the adoption, customers came to us saying, "I don't want an open-source license violation. I don't want a data breach or data loss." They are trying to sell to me, and it's an easy relationship game. And it's about a long-term partnership with customers.

And so the business started growing and accelerating. That was the reason that now is the time to fill up the gas tank, and investors were quite excited about the commercial traction as well. And all the intangibles, right: how big we grew in the last few years.

Corey: It really is an interesting segment that has always been something I've mostly ignored, like, "Oh, you want to run your own? Okay, great." I get it; some people want to cosplay as cloud providers themselves. Awesome. There's clearly a lot more to it than that, and I'm really interested to see what the future holds for you folks.

AB: Yeah, I'm excited. I think, at the end of the day, if I solve real problems: every organization is moving from compute-technology-centric to data-centric, and they're all looking at data warehouses, data lakes, whatever name they give their data infrastructure. Data is now the centerpiece. Software is a commodity. That's how they are looking at it. And it is translating to each of these large organizations—actually, even the mid-sized ones, even startups nowadays have petabytes of data—and I see a huge potential here. The timing is perfect for us.

Corey: I'm really excited to see this continue to grow. And I want to thank you for taking so much time to speak with me today. If people want to learn more, where can they find you?

AB: I'm always on the community, right? Twitter and, like, I think the Slack channel; it's quite easy to reach out to me. LinkedIn. I'm always excited to talk to our users or community.

Corey: And we will of course put links to this in the [show notes 00:33:58]. Thank you so much for your time. I really appreciate it.

AB: Again, wonderful to be here, Corey.

Corey: Anand Babu Periasamy, CEO and co-founder of MinIO. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with what starts out as an angry comment but eventually turns into you, in your position on the S3 product team, writing a thank-you note to MinIO for helping validate your market.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
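The segment pattern Periasamy describes (pull an immutable table segment, mutate it in memory, commit it back as a new immutable object) can be sketched against any S3-compatible store. A minimal illustration, assuming the MinIO Python SDK and MinIO's public play server; the bucket, object names, and row format are invented, and a real database's merge logic is of course far more involved:

```python
import io
from minio import Minio

# MinIO's public test server and its published demo credentials.
client = Minio(
    "play.min.io",
    access_key="Q3AM3UQ867SPQQA43P2F",
    secret_key="zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG",
)

bucket = "demo-table-segments"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Seed an "immutable" table segment (a made-up row format).
seed = b"row:1|value:a\nrow:2|value:b"
client.put_object(bucket, "segments/000001.seg", io.BytesIO(seed), len(seed))

# 1. Streaming read: the store offers no query API, only GET/PUT.
segment = client.get_object(bucket, "segments/000001.seg").read()

# 2. Mutate in memory, where the database is good at its job.
updated = segment + b"\nrow:3|value:c"

# 3. Commit back as a *new* immutable object; 000001.seg is never rewritten.
client.put_object(bucket, "segments/000002.seg", io.BytesIO(updated), len(updated))
```

The design point is the one made in the interview: the object store sees only whole-object streaming reads and writes, so all indexing and mutation stays in the database layer.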

Tech Sales Insights
E35 – Self service analytics and insightful decision making with Chris Lynch CEO, AtScale

Tech Sales Insights

Play Episode Listen Later Nov 24, 2021 50:22


Prior to AtScale, Chris co-founded and was a General Partner at Accomplice, a venture capital firm investing in early-stage tech companies. Chris has held leadership roles at tech startups like Vertica, Acopia Networks, and Arrowpoint Communications. Send in a voice message: https://anchor.fm/salescommunity/message

DevZen Podcast
Vertical Timescale — Episode 0360

DevZen Podcast

Play Episode Listen Later Nov 14, 2021 150:03


In this episode: where best to herd a keg of beer; incredible legends sung about offline meetups; C-Store, Vertica, and the classic whitepapers on column-oriented analytical DBMSs; who can help you if you are still on PostgreSQL 9.6; little-known PostgreSQL features; plus how continuous aggregates work in TimescaleDB; and movie-zen. Show notes: [00:01:22] What we learned over… Read more →
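On the continuous-aggregates question from the show notes: TimescaleDB materializes a time_bucket aggregate incrementally and refreshes it with a background policy. A minimal sketch using psycopg2, following the syntax documented for TimescaleDB 2.x; the metrics hypertable and its columns are hypothetical:

```python
import psycopg2

# Continuous aggregates must be created outside a transaction block.
conn = psycopg2.connect("dbname=tsdb user=postgres")
conn.autocommit = True
cur = conn.cursor()

# Materialize hourly averages; TimescaleDB keeps this view up to date
# incrementally instead of recomputing it from scratch on every query.
cur.execute("""
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', ts) AS bucket,
       device_id,
       avg(value) AS avg_value
FROM metrics
GROUP BY bucket, device_id;
""")

# Background policy: refresh the last day of buckets every 30 minutes,
# leaving the newest hour alone so late-arriving rows still land first.
cur.execute("""
SELECT add_continuous_aggregate_policy('metrics_hourly',
    start_offset      => INTERVAL '1 day',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '30 minutes');
""")
```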

postgresql timescale c store vertica timescaledb
Growing With Fishes Podcast
Commercial Aquaponic Cannabis Round Table on Future Cannabis Project 6/1/2020

Growing With Fishes Podcast

Play Episode Listen Later Oct 25, 2021 182:01


Stephen Raisner, Potent Ponics & Organic Innovations (@PotentPonics)
Marty Waddell, APMeds (@APMeds)
Tanner Stewart, Stewart Farms (@Tannerstewart.life)
Laine Keyes, Habitat Life (@Chief_Cultivator)
Bain Howard, Vertica (@vertica.ok)
Originally aired on the Future Cannabis Project: https://www.youtube.com/watch?v=cmNiwqmRN2w&t=24s&ab_channel=FutureCannabisProject

Data Engineering Podcast
Accelerating ML Training And Delivery With In-Database Machine Learning

Data Engineering Podcast

Play Episode Listen Later Jun 15, 2021 65:32


When you build a machine learning model, the first step is always to load your data. Typically this means downloading files from object storage, or querying a database. To speed up the process, why not build the model inside the database so that you don't have to move the information? In this episode, Paige Roberts explains the benefits of pushing the machine learning processing into the database layer, and the approach that Vertica has taken in their implementation. If you are looking for a way to speed up your experimentation, or an easy way to apply AutoML, then this conversation is for you.
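As a rough illustration of the in-database approach discussed here: Vertica exposes training and scoring as SQL functions, so the model is built where the data already lives. A sketch assuming the vertica_python client; the table, columns, and model name are invented, and the exact function signatures should be checked against your Vertica version:

```python
import vertica_python

# Connection details are placeholders.
conn_info = {"host": "localhost", "port": 5433, "user": "dbadmin",
             "password": "", "database": "analytics"}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()

    # Train inside the database: no files downloaded, no rows moved out.
    cur.execute("""
        SELECT LINEAR_REG('rent_model', 'listings',
                          'monthly_rent', 'sqft, num_rooms');
    """)

    # Score with the stored model, again without leaving Vertica.
    cur.execute("""
        SELECT sqft, num_rooms,
               PREDICT_LINEAR_REG(sqft, num_rooms
                   USING PARAMETERS model_name='rent_model') AS predicted_rent
        FROM listings
        LIMIT 5;
    """)
    print(cur.fetchall())
```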

Fremtidens Ledelse - to go!
Miniseries part 5 – Vertica (Progressive companies)

Fremtidens Ledelse - to go!

Play Episode Listen Later May 18, 2021 76:01


Vertica is a non-hierarchical company in the heart of Aarhus, with a focus on self-management, teal principles, and a strong culture.

Sammen Om Mental Sundhed
Working remotely

Sammen Om Mental Sundhed

Play Episode Listen Later Apr 28, 2021 23:38


In this episode you can meet Ane Stürup from Vertica, along with Ninna Maria Wilstrup and Maja Vilhelmsen from Det Nationale Forskningscenter for Arbejdsmiljø (the Danish National Research Centre for the Working Environment). They talk about mental health in remote work. Read more about the partnership behind the podcast, and find tools for strengthening mental health in the workplace, at mentalsundhed.dk.

arbejde arbejdsmilj vertica det nationale forskningscenter
Growing With Fishes Podcast
Growing With Fishes Podcast 235 Bain Of Vertica Aquaponics The Largest Aquaponic MJ Grow In Oklahoma

Growing With Fishes Podcast

Play Episode Listen Later Apr 16, 2021 91:14


Growing With Fishes Podcast: a podcast dedicated to aquaponic and cannabis growing, and to spreading information to the masses about sustainable cannabis and veggie production!
Stoney Scholar Family Relief Fund: https://www.gofundme.com/f/ahvbf-stoney-scholar-family-relief-fund
Vertica Insta: @VerticaFarmsOK @Vertica.ok
Growing With Fishes Podcast Discord: https://discord.gg/nqBf3bj
Aquaponic Cannabis Merch: https://jellibomb.myshopify.com/collections/aquaponic-cannabis-conference-2020?fbclid=IwAR3P2ym57P0OXaAJHXozGLh8lQxxeE_SHwFiDYlLfTgYTW4lHnscLoew_7A
Potent Ponics Classes: http://www.APMJClass.com
Marty's channel, APMeds: https://www.youtube.com/user/mwaddell6901/videos
Steve's channel, Potent Ponics: https://www.youtube.com/channel/UCRkqYlFzKpbCXreVKPYFlGg
Facebook group Aquaponic Cannabis Growers: https://www.facebook.com/groups/1510902559180077/
Potentponics.com
True Aquaponic Nutrients: https://trueaquaponics.com/?ref=zQK0Q

Tech Sales Insights
E23 - Two Winners in Every Deal with Colin Mahony, Vertica

Tech Sales Insights

Play Episode Listen Later Mar 31, 2021 38:05


He leads the Vertica business for Micro Focus, helping the world's most data-driven organizations leverage and monetize their business data through advanced analytics and machine learning. Founded in 2005, Vertica offers one of the industry's most advanced analytics platforms for ingesting, processing, and storing large volumes of continuously flowing information. In 2011, he joined HP as part of the highly successful acquisition of Vertica and took over as VP and GM of HP Vertica, where he guided the business to remarkable annual growth and recognized industry leadership. Prior to Vertica, he was VP at Bessemer Venture Partners, primarily focused on enterprise software, telecom, and digital media investments. Prior to Bessemer, he worked at Lazard Tech Partners and was a Senior Analyst at the Yankee Group. He earned an MBA from Harvard Business School and a Bachelor's degree in Economics with a minor in Computer Science from Georgetown University. Join David Nour and Randy Seidl on this episode of the Tech Sales Insights podcast with Colin Mahony. Send in a voice message: https://anchor.fm/salescommunity/message

Gestalt IT
Object Storage is the Future of Primary Storage

Gestalt IT

Play Episode Listen Later Oct 13, 2020 28:55


On-premises object storage has traditionally been associated with archive data. Scale and density were important, but the performance was not a major consideration. Fast object storage was an oxymoron. That paradigm is changing. DevOps teams across enterprises are now building container-based applications leveraging S3. High-performance applications are being designed to bring the capabilities and benefits of S3 and cloud-optimized architectures to on-premises environments. Vertica in Eon mode and Splunk SmartStore are a couple of examples highlighting this trend. Will this trend become more prevalent? Will a majority of new high-performance applications be designed with object storage? Will object storage become the future of primary storage? Tune in to this podcast to find out. © Gestalt IT, LLC for Gestalt IT: Object Storage is the Future of Primary Storage

Insider Research im Gespräch
Data Analytics in the Hybrid Cloud World, with Sabrina Corazza, Vertica, and Martin Maier, Pure Storage

Insider Research im Gespräch

Play Episode Listen Later Sep 24, 2020 30:58


What defines modern data analytics? What are the current challenges? What role do cloud and on-premises IT infrastructures play? This interview by Oliver Schonschek, Insider Research, with Sabrina Corazza of Vertica and Martin Maier of Pure Storage provides answers.

DataMasters
[DataMasters] Dataops and the state of digital transformation

DataMasters

Play Episode Listen Later Sep 23, 2020 30:02


Episode 12, Season 1. "Andy Palmer is a serial entrepreneur who's helped found or fund more than 50 companies, including Vertica, a pioneer in the database management industry. He's currently CEO of Tamr, a data mastering platform that uses machine learning to do the heavy lifting around consolidating, cleansing, and categorizing data. Andy's also served as CIO of Infinity Pharmaceuticals, a biotech startup, and ran data and software engineering operations at the research group of pharmaceutical company Novartis. Andy's approach to data mastering involves dataops. This concept has emerged in recent years and shares traits with dev ops. He discusses what dataops is and how it helps companies with their digital transformation projects. He also talks about Tamr's upcoming DataMasters Summit and how the event will provide people with practical advice on where to start their dataops journeys." To find more information about DataMasters Summit 2020, including the complete list of speakers, and to register to attend, please visit: http://tamr.com/summit2020.

כל תכני עושים היסטוריה
[DataMasters] Dataops and the state of digital transformation

כל תכני עושים היסטוריה

Play Episode Listen Later Sep 23, 2020 30:03


Andy Palmer is a serial entrepreneur who's helped found or fund more than 50 companies, including Vertica, a pioneer in the database management industry. He's currently CEO of Tamr, a data mastering platform that uses machine learning to do the heavy lifting around consolidating, cleansing, and categorizing data. Andy's also served as CIO of Infinity Pharmaceuticals, a biotech startup, and ran data and software engineering operations at the research group of pharmaceutical company Novartis.Andy's approach to data mastering involves dataops. This concept has emerged in recent years and shares traits with dev ops. He discusses what dataops is and how it helps companies with their digital transformation projects. He also talks about Tamr's upcoming DataMasters Summit and how the event will provide people with practical advice on where to start their dataops journeys.DataMasters is produced by PI Media for Tamr.

DataMasters
[DataMasters] Dataops and the state of digital transformation

DataMasters

Play Episode Listen Later Sep 23, 2020 30:02


Andy Palmer is a serial entrepreneur who’s helped found or fund more than 50 companies, including Vertica, a pioneer in the database management industry. He’s currently CEO of Tamr, a data mastering platform that uses machine learning to do the heavy lifting around consolidating, cleansing, and categorizing data. Andy’s also served as CIO of Infinity Pharmaceuticals, a biotech startup, and ran data and software engineering operations at the research group of pharmaceutical company Novartis. Andy’s approach to data mastering involves dataops. This concept has emerged in recent years and shares traits with dev ops. He discusses what dataops is and how it helps companies with their digital transformation projects. He also talks about Tamr’s upcoming DataMasters Summit and how the event will provide people with practical advice on where to start their dataops journeys.

DataMasters
[DataMasters] Dataops and the state of digital transformation

DataMasters

Play Episode Listen Later Sep 23, 2020 30:02


Episode 12, Season 1. "Andy Palmer is a serial entrepreneur who's helped found or fund more than 50 companies, including Vertica, a pioneer in the database management industry. He's currently CEO of Tamr, a data mastering platform that uses machine learning to do the heavy lifting around consolidating, cleansing, and categorizing data. Andy's also served as CIO of Infinity Pharmaceuticals, a biotech startup, and ran data and software engineering operations at the research group of pharmaceutical company Novartis. Andy's approach to data mastering involves dataops. This concept has emerged in recent years and shares traits with dev ops. He discusses what dataops is and how it helps companies with their digital transformation projects. He also talks about Tamr's upcoming DataMasters Summit and how the event will provide people with practical advice on where to start their dataops journeys." To find more information about DataMasters Summit 2020, including the complete list of speakers, and to register to attend, please visit: http://tamr.com/summit2020.

DE GODIN A GODIN
T2.E6 Adapting to the new digital world with Leslie Alonso

DE GODIN A GODIN

Play Episode Listen Later Aug 15, 2020 58:02


In this episode we talk about adapting to the digital world. Leslie Alonso shares how immersed we are in this digitalization and how it can help us become more efficient, while also telling us about her leap from the office ("Godín") world to entrepreneurship. Now let me tell you a little about Leslie: her passion is spending time surfing through data and building analytics that solve business problems. Experience in: Digital Transformation - Data Visualization and Monetization - Retail. Co-founder of Pabis Retail, which delivers analytics solutions that use technology to help CPGs and retailers grow, both in their analytics maturity and in their businesses. A LATAM pioneer in implementing big data technologies such as Vertica's Eon mode, RPA, and Tableau Embedded to deliver better performance in analytical models built on large volumes of data. In the past, Leslie was a key account manager for mass-consumption brands, focused on Sales, Operations, and Supply Chain, and a data analyst oriented toward automating and building analytical models that impact business profitability in Trade Marketing, Pricing, Field Sales, and Supply Chain for CPGs. She also teaches at ISDI, where she gives classes on Digital Transformation and Data Visualization. Don't miss it!

theCUBE Insights
Vertica BDC 2020

theCUBE Insights

Play Episode Listen Later Apr 1, 2020 12:19


theCUBE host Dave Vellante (@dvellante) shares his analysis of the Vertica BDC keynote from theCUBE studios. To see more of our coverage of this event, please visit: https://www.thecube.net/vertica-bigdata-2020

vertica dave vellante
De Dataloog
DTL044 - How the Dutch MonetDB squeezes the last drop of performance out of a database

De Dataloog

Play Episode Listen Later Dec 15, 2019 46:53


Now that serverless and the cloud seem to be the new paradigm, you would almost forget that optimizing databases on bare metal (on premise) is at least as important. If executing complex queries becomes the bottleneck in data science projects, you should consider improving the speed of the data warehouse with a column store database. We have talked before about column store databases such as Vertica. This episode centers on MonetDB. MonetDB is a spin-off of the Centrum voor Wiskunde en Informatica (CWI) and, as a startup, has now received an investment from ServiceNow. They are able to run queries 700x (!!) faster than traditional SQL environments. They can also apply machine learning inside the database itself, even TensorFlow! In this episode, Lex Knape and Jurjen Helmus talk with Ying (Jenny) Zhang, COO, and Niels Nes, CTO, of MonetDB. We talk about column store database technology, optimizing for speed, and how the contract with ServiceNow came about. Lex did not seem convinced at first, but the 700x faster database certainly sticks with you! Show notes and more info at: https://www.dedataloog.nl/uitzending/dtl-044-hoe-het-nlse-monetdb-het-laatste-druppeltje-performance-uit-een-server-haalt/
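MonetDB's in-database machine learning hinges on its embedded-Python UDFs: the function body executes inside the server and receives whole columns as NumPy arrays. A small sketch assuming the pymonetdb client and a server built with embedded-Python support; the table and function names are made up:

```python
import pymonetdb

conn = pymonetdb.connect(username="monetdb", password="monetdb",
                         hostname="localhost", database="demo")
cur = conn.cursor()

# The UDF body runs inside the server; `v` arrives as a NumPy array,
# so the computation is vectorized column-at-a-time rather than row-by-row.
cur.execute("""
CREATE OR REPLACE FUNCTION zscore(v DOUBLE) RETURNS DOUBLE
LANGUAGE PYTHON {
    return (v - v.mean()) / v.std()
};
""")

cur.execute("SELECT zscore(value) FROM measurements LIMIT 5;")
print(cur.fetchall())
conn.commit()
```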

SQL Server Radio
Living in New Zealand

SQL Server Radio

Play Episode Listen Later Nov 5, 2019 38:30


Guy is back from New Zealand. We talk a little bit about his trip, SQL Server 2019, pagination techniques and performance, projections in Vertica, lessons learned from an Availability Groups performance case, and more

new zealand sql server vertica availability groups
RIoT Underground
XXI: Big Data, Data Analytics, and IoT, with Paige Roberts, Open Source Relations Manager at Vertica

RIoT Underground

Play Episode Listen Later Oct 15, 2019 38:50


Join RIoT's Executive Director Tom Snyder as he hosts a RIoT Underground discussion with Paige Roberts, Open Source Relations Manager at Vertica. Talking everything from data analytics and IoT, large-scale analytics, machine learning, and predictive analytics, with applications in everything from self-driving cars to healthcare. Don't miss it!
Vertica provides a best-in-class, unified analytics platform that will forever be independent from underlying infrastructure. https://www.vertica.com/
Support the show (http://www.ncriot.org)

SQL Server Radio
The Best Song of the 90s

SQL Server Radio

Play Episode Listen Later Sep 5, 2019 28:57


In this show: MySQL, Vertica, Azure Data Explorer, and Matan.
- 24 Hours of PASS: Summit Preview 2019
- PASS Summit 2019: Session Schedule
- The Data Bias in Song Contests
- How SQL Server Compares Strings with Trailing Spaces
- Using Azure Data Explorer and Fastly to Improve Taboola Customer Experiences
- Kusto in Taboola – Azure Data Explorer

Momenta Edge
#65 Conversation with Jay Allardyce - Uptake Software, Industrial IoT

Momenta Edge

Play Episode Listen Later Jul 3, 2019 48:54


Jay Allardyce is the Chief Product Officer and Head of Customer Success at Uptake Software.  Our conversation touched on his extensive background in technology and analytics from his time at HP, Vertica, and GE prior to joining Uptake.  We discuss the evolution of the market around analytic software, some of the key experiences and lessons from working with different types of customers and business problems, and the unique value-add that AI provides when embedded in business process.  We discuss the landscape of IoT platforms and the critical elements for success both in terms of the platform as well as in organizations adopting advanced connected solutions, and dive into some of the aspects of the energy industry that make it a unique, and exciting sector for innovation.  Lastly, he shares his vision for AI-powered transformation and the unique opportunities across different industries today.

The VentureFizz Podcast
Episode 104: Chris Lynch - CEO and Executive Chairman of AtScale

The VentureFizz Podcast

Play Episode Listen Later Jun 2, 2019 57:39


Welcome to Episode 104 of The VentureFizz Podcast, the flagship podcast from the leading authority for jobs & careers in the tech industry.

For this episode of our podcast, I interviewed Chris Lynch, CEO and Executive Chairman of AtScale. Chris is a serial entrepreneur and operator who has a tremendous track record at building successful tech companies like Vertica, Acopia Networks, Arrowpoint Communications, and others. He also co-founded and was a General Partner at Accomplice, a VC firm where he led investments in early-stage companies. He's back in an operating role with AtScale, the global leader in data warehouse virtualization. The company has raised $120M in funding, including a $50M Series D round which was announced this past December and led by Morgan Stanley.

In this episode of our podcast, we cover lots of topics, like:
- Chris' background and what his paper route taught him about entrepreneurship.
- A walk through the different companies and his experience as an investor.
- All about his current company AtScale, including what the company does, the story of how he got involved, and what the plans are for the Boston office.
- How he learned to manage and build sales teams.
- Advice for founders trying to build out their go-to-market sales strategy.
- His charitable work with the St. Baldrick's Foundation and why it means so much to him.
- Plus, a lot more.

Is your company hiring? If the answer is yes, then you might want to consider adding a BIZZpage for your company. It is our employment branding and hiring solution that helps keep your company top of mind with our targeted audience of professionals in the tech industry. A subscription includes a BIZZpage for employment branding, unlimited postings to our Job Board, access to our exclusive content series, and more. If you are interested in additional details, please send an email to info@venturefizz.com. Lastly, if you like the show, please remember to subscribe to and review us on iTunes, or your podcast player of choice!

SQL Server Radio
How to spin up an environment in 10 hours, with Chen Shaulian

SQL Server Radio

Play Episode Listen Later Apr 4, 2019 45:35


This time we are happy to host Chen Shaulian on the show. Together we talk about SQL Server, MongoDB, Vertica, our biggest blunders, interesting war stories from the field, and traveling the world with kids…

De Dataloog
DTL014 - Vertica and the technology behind Column Store Databases

De Dataloog

Play Episode Listen Later Mar 14, 2019 51:50


Column store databases (CSDs) may be little known, but among the big tech companies they are certainly not unloved, and that includes Vertica. The technology behind CSDs is that they are column- rather than row-oriented, which makes them faster and more compact. Vertica is one of the most important vendors of this technology. On top of that, they run the machine learning toolkit inside the database instead of on a separate platform, which makes the whole thing even faster and more efficient. De Dataloog talks with Erikjan Franssen about the technology and its added value for large as well as small companies. Erikjan also gives us a good look into the business models behind tech software. How much we learned again this episode! And of course the chain question: which type of business model fits SaaS platforms best? A top episode that is more than worth a listen. See also the show notes at www.dedataloog.nl

The freeCodeCamp Podcast
Ep. 37 - The Rise of the Data Engineer

The freeCodeCamp Podcast

Play Episode Listen Later Jul 2, 2018 18:40


When Maxime worked at Facebook, his role started evolving. He was developing new skills, new ways of doing things, and new tools. And — more often than not — he was turning his back on traditional methods. He was a pioneer. He was a data engineer! In this podcast, you'll learn about the rise of the data engineer and what it takes to be one.

Written by Maxime Beauchemin: https://twitter.com/mistercrunch
Read by Abbey Rennemeyer: https://twitter.com/abbeyrenn
Original article: https://fcc.im/2tHLCST
Learn to code for free at: https://www.freecodecamp.org
Intro music by Vangough: https://fcc.im/2APOG02

Transcript:

I joined Facebook in 2011 as a business intelligence engineer. By the time I left in 2013, I was a data engineer. I wasn't promoted or assigned to this new role. Instead, Facebook came to realize that the work we were doing transcended classic business intelligence. The role we'd created for ourselves was a new discipline entirely.

My team was at the forefront of this transformation. We were developing new skills, new ways of doing things, new tools, and — more often than not — turning our backs to traditional methods. We were pioneers. We were data engineers!

Data Engineering?

Data science as a discipline was going through its adolescence of self-affirming and defining itself. At the same time, data engineering was the slightly younger sibling, but it was going through something similar. The data engineering discipline took cues from its sibling, while also defining itself in opposition, and finding its own identity.

Like data scientists, data engineers write code. They're highly analytical, and are interested in data visualization. Unlike data scientists — and inspired by our more mature parent, software engineering — data engineers build tools, infrastructure, frameworks, and services. In fact, it's arguable that data engineering is much closer to software engineering than it is to data science.

In relation to previously existing roles, the data engineering field could be thought of as a superset of business intelligence and data warehousing that brings in more elements from software engineering. The discipline also integrates specialization around the operation of so-called "big data" distributed systems, along with concepts from the extended Hadoop ecosystem, stream processing, and computation at scale.

In smaller companies — where no data infrastructure team has yet been formalized — the data engineering role may also cover the workload of setting up and operating the organization's data infrastructure. This includes tasks like setting up and operating platforms like Hadoop/Hive/HBase, Spark, and the like. In smaller environments people tend to use hosted services offered by Amazon or Databricks, or get support from companies like Cloudera or Hortonworks — which essentially subcontracts the data engineering role to other companies.

In larger environments, there tends to be specialization and the creation of a formal role to manage this workload, as the need for a data infrastructure team grows. In those organizations, automating some of the data engineering processes falls into the hands of both the data engineering and data infrastructure teams, and it's common for these teams to collaborate to solve higher-level problems.

While the engineering aspect of the role is growing in scope, other aspects of the original business intelligence role are becoming secondary.
Areas like crafting and maintaining portfolios of reports and dashboards are not a data engineer's primary focus. We now have better self-service tooling, where analysts, data scientists, and the general "information worker" are becoming more data-savvy and can take care of data consumption autonomously.

ETL is changing

We've also observed a general shift away from drag-and-drop ETL (Extract, Transform, and Load) tools towards a more programmatic approach. Product know-how on platforms like Informatica, IBM DataStage, Cognos, Ab Initio, or Microsoft SSIS isn't common amongst modern data engineers, and is being replaced by more generic software engineering skills, along with an understanding of programmatic or configuration-driven platforms like Airflow, Oozie, Azkaban, or Luigi. It's also fairly common for engineers to develop and manage their own job orchestrator/scheduler.

There's a multitude of reasons why complex pieces of software are not developed using drag-and-drop tools: ultimately, code is the best abstraction there is for software. While it's beyond the scope of this article to argue this point, it's easy to infer that these same reasons apply to writing ETL as they apply to any other software. Code allows for arbitrary levels of abstraction, allows for all logical operations in a familiar way, integrates well with source control, and is easy to version and to collaborate on. The fact that ETL tools evolved to expose graphical interfaces seems like a detour in the history of data processing, and would certainly make for an interesting blog post of its own.

Let's highlight the fact that the abstractions exposed by traditional ETL tools are off-target. Sure, there's a need to abstract the complexity of data processing, computation, and storage. But I would argue that the solution is not to expose ETL primitives (like source/target, aggregations, filtering) in a drag-and-drop fashion. The abstractions needed are of a higher level.

For example, a needed abstraction in a modern data environment is the configuration for the experiments in an A/B testing framework: what are all the experiments? What are the related treatments? What percentage of users should be exposed? What are the metrics that each experiment expects to affect? When is the experiment taking effect? In this example, we have a framework that receives precise, high-level input, performs complex statistical computation, and delivers computed results. We expect that adding an entry for a new experiment will result in extra computation and results being delivered. What is important to note in this example is that the input parameters of this abstraction are not the ones offered by a traditional ETL tool, and that building such an abstraction in a drag-and-drop interface would not be manageable.

To a modern data engineer, traditional ETL tools are largely obsolete because logic cannot be expressed using code. As a result, the abstractions needed cannot be expressed intuitively in those tools. Now, knowing that the data engineer's role consists largely of defining ETL, and knowing that a completely new set of tools and methodology is needed, one can argue that this forces the discipline to rebuild itself from the ground up. New stack, new tools, a new set of constraints, and in many cases, a new generation of individuals.
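The experiment-configuration abstraction argued for above is easy to picture as plain code. A minimal, hypothetical sketch (the field names and the framework they imply are invented for illustration, not taken from the article):

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class Experiment:
    """One entry in a hypothetical A/B-testing config.

    Adding an Experiment to EXPERIMENTS is all it should take for the
    framework to start computing stats and delivering results.
    """
    name: str
    treatments: List[str]      # e.g. control plus one or more variants
    exposure_pct: float        # share of users exposed, 0-100
    target_metrics: List[str]  # metrics the experiment expects to affect
    start: date                # when the experiment takes effect

EXPERIMENTS = [
    Experiment(
        name="new_onboarding_flow",
        treatments=["control", "variant_a"],
        exposure_pct=10.0,
        target_metrics=["signup_conversion", "d7_retention"],
        start=date(2018, 7, 1),
    ),
]
```

Because the configuration is code, it can be version-controlled, reviewed, and tested like the rest of the pipeline, which is exactly the argument this section makes against drag-and-drop primitives.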
Data modeling is changing

Typical data modeling techniques — like the star schema — which defined our approach to data modeling for the analytics workloads typically associated with data warehouses, are less relevant than they once were. The traditional best practices of data warehousing are losing ground on a shifting stack. Storage and compute are cheaper than ever, and with the advent of distributed databases that scale out linearly, the scarcer resource is engineering time.

Here are some changes observed in data modeling techniques:

- Further denormalization: maintaining surrogate keys in dimensions can be tricky, and it makes fact tables less readable. The use of natural, human-readable keys and dimension attributes in fact tables is becoming more common, reducing the need for costly joins that can be heavy on distributed databases. Also note that support for encoding and compression in serialization formats like Parquet or ORC, or in database engines like Vertica, addresses most of the performance loss that would normally be associated with denormalization. Those systems have been taught to normalize the data for storage on their own.
- Blobs: modern databases have growing support for blobs through native types and functions. This opens new moves in the data modeler's playbook, and can allow fact tables to store multiple grains at once when needed.
- Dynamic schemas: since the advent of MapReduce, with the growing popularity of document stores and with support for blobs in databases, it's becoming easier to evolve database schemas without executing DML. This makes it easier to take an iterative approach to warehousing, and removes the need to get full consensus and buy-in prior to development.
- Systematically snapshotting dimensions (storing a full copy of the dimension for each ETL schedule cycle, usually in distinct table partitions) as a generic way to handle slowly changing dimensions (SCD) is a simple, generic approach that requires little engineering effort, and that, unlike the classical approach, is easy to grasp when writing ETL and queries alike (a sketch of this pattern follows this list). It's also easy and relatively cheap to denormalize the dimension's attributes into the fact table to keep track of their values at the moment of the transaction. In retrospect, complex SCD modeling techniques are not intuitive and reduce accessibility.
- Conformance, as in conformed dimensions and metrics, is still extremely important in a modern data environment, but with the need for data warehouses to move fast, and with more teams and roles invited to contribute to the effort, it's less imperative and more of a tradeoff. Consensus and convergence can happen as a background process in the areas where the pain points of divergence become out-of-hand.

Also, more generally, it's arguable that with the commoditization of compute cycles and with more people being data-savvy than before, there's less need to precompute and store results in the warehouse. For instance, you can have a complex Spark job that computes a complex analysis on demand only, and is not scheduled to be part of the warehouse.
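The dimension-snapshotting pattern above amounts to two queries. A sketch under assumed names (a Hive-style warehouse with a dim_customer_snapshot table partitioned by snapshot date ds; nothing here comes from a specific system):

```python
# Hypothetical illustration of "snapshot the whole dimension every cycle":
# each ETL run appends a full copy of the dimension into a date partition,
# and facts join against the snapshot that matches their own date.

SNAPSHOT_DIMENSION = """
INSERT OVERWRITE TABLE dim_customer_snapshot PARTITION (ds = '{ds}')
SELECT customer_id, customer_name, segment, country
FROM   source_customers;
"""

# Point-in-time join: no SCD type-2 bookkeeping, just pick the matching day.
POINT_IN_TIME_JOIN = """
SELECT f.order_id, f.amount, d.segment
FROM   fact_orders            f
JOIN   dim_customer_snapshot  d
  ON   d.customer_id = f.customer_id
 AND   d.ds          = f.ds;
"""

if __name__ == "__main__":
    # An orchestrator (Airflow, Luigi, ...) would template and run this daily.
    print(SNAPSHOT_DIMENSION.format(ds="2018-07-02"))
```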
Roles & responsibilities

The data warehouse

"A data warehouse is a copy of transaction data specifically structured for query and analysis." — Ralph Kimball

"A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process." — Bill Inmon

The data warehouse is just as relevant as it ever was, and data engineers are in charge of many aspects of its construction and operation. The data engineer's focal point is the data warehouse, and they gravitate around it.

The modern data warehouse is a more public institution than it was historically, welcoming data scientists, analysts, and software engineers to partake in its construction and operation. Data is simply too central to the company's activity to put limitations around which roles can manage its flow. While this allows scaling to match the organization's data needs, it often results in a much more chaotic, shape-shifting, imperfect piece of infrastructure.

The data engineering team will often own pockets of certified, high-quality areas in the data warehouse. At Airbnb, for instance, there's a set of "core" schemas that are managed by the data engineering team, where service level agreements (SLAs) are clearly defined and measured, naming conventions are strictly followed, business metadata and documentation are of the highest quality, and the related pipeline code follows a set of well-defined best practices.

It also becomes the role of the data engineering team to be a "center of excellence" through the definition of standards, best practices, and certification processes for data objects. The team can evolve to partake in or lead an education program sharing its core competencies to help other teams become better citizens of the data warehouse. For instance, Facebook has a "data camp" education program, and Airbnb is developing a similar "Data University" program, where data engineers lead sessions that teach people how to be proficient with data.

Data engineers are also the "librarians" of the data warehouse, cataloging and organizing metadata and defining the processes by which one files or extracts data from the warehouse. In a fast-growing, rapidly evolving, slightly chaotic data ecosystem, metadata management and tooling become a vital component of a modern data platform.

Performance tuning and optimization

With data becoming more strategic than ever, companies are growing impressive budgets for their data infrastructure. This makes it increasingly rational for data engineers to spend cycles on performance tuning and optimization of data processing and storage. Since the budgets are rarely shrinking in this area, optimization often comes from the perspective of achieving more with the same amount of resources, or trying to linearize exponential growth in resource utilization and costs.

Knowing that the complexity of the data engineering stack is exploding, we can assume that optimizing such a stack and its processes can be just as challenging. Where it can be easy to get huge wins with little effort, laws of diminishing returns typically apply. It's definitely in the interest of the data engineer to build infrastructure that scales with the company, and to be resource-conscious at all times.

Data Integration

Data integration, the practice of integrating businesses and systems through the exchange of data, is as important and as challenging as it's ever been. As Software as a Service (SaaS) becomes the new standard way for companies to operate, the need to synchronize referential data across these systems becomes increasingly critical. Not only does SaaS need up-to-date data to function, we often want to bring the data generated on its side into our data warehouse so that it can be analyzed along with the rest of our data.
Sure, SaaS products often have their own analytics offering, but they systematically lack the perspective that the rest of your company's data offers, so more often than not it's necessary to pull some of this data back. Letting these SaaS offerings redefine referential data without integrating and sharing a common primary key is a disaster that should be avoided at all costs. No one wants to manually maintain two employee or customer lists in two different systems, and even worse: having to do fuzzy matching when bringing their HR data back into the warehouse.

Worse, company executives often sign deals with SaaS providers without really considering the data integration challenges. The integration workload is systematically downplayed by vendors to facilitate their sales, and leaves data engineers stuck doing unaccounted, underappreciated work. Let alone the fact that typical SaaS APIs are often poorly designed, unclearly documented, and "agile": meaning that you can expect them to change without notice.

Services

Data engineers operate at a higher level of abstraction, and in some cases that means providing services and tooling to automate the type of work that data engineers, data scientists, or analysts would otherwise do manually. Here are a few examples of services that data engineers and data infrastructure engineers may build and operate:

- Data ingestion: services and tooling around "scraping" databases, loading logs, fetching data from external stores or APIs, …
- Metric computation: frameworks to compute and summarize engagement-, growth-, or segmentation-related metrics
- Anomaly detection: automating data consumption to alert people when anomalous events occur or when trends are changing significantly
- Metadata management: tooling that supports the generation and consumption of metadata, making it easy to find information in and around the data warehouse
- Experimentation: A/B testing and experimentation frameworks are often a critical piece of a company's analytics, with a significant data engineering component to them
- Instrumentation: analytics starts with logging events and attributes related to those events; data engineers have vested interests in making sure that high-quality data is captured upstream
- Sessionization: pipelines that specialize in understanding series of actions in time, allowing analysts to understand user behaviors (sketched just after this list)

Just like software engineers, data engineers should be constantly looking to automate their workloads and to build abstractions that allow them to climb the complexity ladder. While the nature of the workflows that can be automated differs depending on the environment, the need to automate them is common across the board.
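Sessionization, the last service listed above, usually reduces to a gap rule: a new session starts whenever the time since the user's previous event exceeds a threshold. A minimal sketch in plain Python, assuming a hypothetical 30-minute timeout:

```python
from datetime import datetime, timedelta
from typing import Iterable, List

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity threshold

def sessionize(events: Iterable[datetime]) -> List[List[datetime]]:
    """Group a user's event timestamps into sessions.

    A new session starts when the gap since the previous event
    exceeds SESSION_GAP. Input must be sorted ascending.
    """
    sessions: List[List[datetime]] = []
    previous = None
    for ts in events:
        if previous is None or ts - previous > SESSION_GAP:
            sessions.append([])          # gap exceeded: open a new session
        sessions[-1].append(ts)
        previous = ts
    return sessions

if __name__ == "__main__":
    day = datetime(2018, 7, 2)
    clicks = [day + timedelta(minutes=m) for m in (0, 5, 12, 95, 97, 200)]
    print([len(s) for s in sessionize(clicks)])  # -> [3, 2, 1]
```

In a warehouse, the same rule is typically expressed with a LAG window function, which ties into the SQL mastery described in the skills section below.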
Required Skills

- SQL mastery: if English is the language of business, SQL is the language of data. How successful a businessperson can you be if you don't speak good English? While generations of technologies age and fade, SQL is still standing strong as the lingua franca of data. A data engineer should be able to express any degree of complexity in SQL using techniques like correlated subqueries and window functions. SQL/DML/DDL primitives are simple enough that they should hold no secrets for a data engineer. Beyond the declarative nature of SQL, she or he should be able to read and understand database execution plans, and have an understanding of what all the steps are, how indices work, the different join algorithms, and the distributed dimension within the plan.
- Data modeling techniques: for a data engineer, entity-relationship modeling should be a cognitive reflex, along with a clear understanding of normalization and a sharp intuition about denormalization tradeoffs. The data engineer should be familiar with dimensional modeling and the related concepts and lexical field.
- ETL design: writing efficient, resilient, and "evolvable" ETL is key. I'm planning on expanding on this topic in an upcoming blog post.
- Architectural projections: like any professional in any given field of expertise, the data engineer needs to have a high-level understanding of most of the tools, platforms, libraries, and other resources at their disposal: the properties, use cases, and subtleties behind the different flavors of databases, computation engines, stream processors, message queues, workflow orchestrators, serialization formats, and other related technologies. When designing solutions, she or he should be able to make good choices as to which technologies to use, and have a vision as to how to make them work together.

All in all

Over the past 5 years working in Silicon Valley at Airbnb, Facebook, and Yahoo!, and having interacted profusely with data teams of all kinds working for companies like Google, Netflix, Amazon, Uber, Lyft, and dozens of companies of all sizes, I'm observing a growing consensus on what "data engineering" is evolving into, and I felt a need to share some of my findings.

I'm hoping that this article can serve as some sort of manifesto for data engineering, and I'm hoping to spark reactions from the community operating in the related fields!

The VentureFizz Podcast
Episode 22: Andy Palmer - Co-Founder & CEO of Tamr

The VentureFizz Podcast

Play Episode Listen Later Apr 28, 2018 39:27


Welcome to Episode 22 of The VentureFizz Podcast, the flagship podcast of Boston's most trusted source for startup and tech jobs, news, and insights!

Let's face it: in order for a tech ecosystem to thrive, you need people who are not only successful entrepreneurs, but who also know how to pay it forward for the next generation. For the 22nd episode of our podcast, I interviewed Andy Palmer, who is the CEO & Co-Founder of Tamr in Cambridge. Andy is the definition of what it means to pay it forward. He is a successful serial entrepreneur in his own right, and he is also a very active angel investor with close to 60 investments of his own (not to mention being an LP in other local funds, and mentoring lots of founders along the way).

In this episode, we cover topics like:
- How rugby relates to startups
- The details behind Vertica, a very successful company that was acquired by HP
- Why you shouldn't raise capital in the early days of your company
- The common mistakes entrepreneurs make when pitching an idea
- And lots more!

Lastly, if you like the show, please remember to subscribe to and review us on iTunes, or your podcast player of choice! And make sure to follow Tamr on Twitter @Tamr_Inc and VentureFizz @VentureFizz.

Google Cloud Platform Podcast
Phoenix One Games with Michael Will

Google Cloud Platform Podcast

Play Episode Listen Later Aug 30, 2017 41:36


Michael Will, Director of Technical Operations at Phoenix One Games, joins your co-hosts Mark and Francesc today to tell us about their migration from their own hardware to the cloud, how they did it, and what they learned from it.

About Michael Will
Michael is experienced in the design and deployment of high-performance compute clusters, from both the hardware and software aspects, with a focus on parallel execution and solution architecture, from IBM 390 running Linux to Penguin Computing commodity hardware, for 10+ years, applying it in the last 5 years to his passion for real-time strategy gaming at Phoenix Age / Kabam / Phoenix One Games.

Cool things of the week
- Moving The New York Times Games Platform to Google App Engine (New York Times blog)
- Introducing Network Service Tiers: Your cloud network, your way (GCP blog)
- Titan in depth: Security in plaintext (GCP blog)
- Final Day at Cloud Next (gcppodcast 67)

Interview
- Google Compute Engine (docs)
- Gaming Solutions on Google Cloud Platform (docs)
- Google Cloud Storage (docs)
- Couchbase
- Google Cloud and Couchbase Server: Zero to Millions of Operations in No Time (couchbase.com)
- VERTICA database (info)
- Phoenix One Games

Question of the week
How to secure an App Engine application?
- App Engine login (required/admin in your app.yaml), only Google accounts.
- OAuth2: Firebase Authentication, Auth0.
- Cloud Identity-Aware Proxy
- App Engine firewall (beta)

Where can you find us next?
Francesc just released another justforfunc episode on Go Type Aliases, and will be presenting at Google Cloud Summit in Sydney in September. Mark is entering crazy season, currently speaking at Pax Dev and then attending Pax West right after. He'll then be speaking at Gameacon and Austin Game Conference, and attending Strangeloop once he's done with all that.

The Usual Podcast
Episode 101 - Sick Flips

The Usual Podcast

Play Episode Listen Later Jun 14, 2017 82:53


In the Star Wars: The Old Republic section of the show, Marshall and Will discuss their weeks in game, and changes coming in Game Update 5.2.2.

Links:
- Blue Moon
- Hop Valley Brewing
- CXP Legacy Perk
- Server Merge Discussion Thread
- Male Force User companion…
- Returning Companions priority
- Rocket Boots?
- Flashpoints
- PvP on Iokath Scripted?
- This week's Nightlife
- Hypercrate Prices
- Star Cluster and Vertica pack drop changes???!!!
- Discussion Topic: Class Changes This Summer (Only a start!)
- Play to Give Update
- Pippi's Longstockings
- SWTOR Refer-a-friend links at theusualpodcast.com on our about page

Star Wars Section - (19:15)
In the Star Wars section of the show, the guys discuss collectibles stolen from Rancho Obi-Wan, the Han Solo film, The Last Jedi releasing early in the UK, Battlefront II, and more.

Links:
- Valuable "Star Wars" collectibles stolen from Rancho Obi-Wan, company seeking tips and information
- Books: 5 Star Wars Books To Help New Fans Take Their First Step Into A Larger World
- Han Solo: Set Video Features New Scout Trooper; Thandie Newton Talks About The Han Solo Spinoff Movie; Set Photo Hints At Han's Home
- The Last Jedi: Classic Star Wars Characters Get Updates For The Last Jedi; The Last Jedi Gets A New Release Date In The UK
- Episode IX: Colin Trevorrow On Making Episode IX Without Carrie Fisher; Episode IX Will Offer "Satisfying" Trilogy Ending
- Battlefront II: Demo At EA Play Will Have A Multiplayer Battle On Naboo
- Merchandise: Wave one of SDCC Star Wars Exclusives; Wear The Star Wars Crawls With Mighty Fine's New Scarves; More Details Emerge On Stern Pinball's Star Wars 40th Anniversary Machine

The Usual Round-up - (37:50)
The guys discuss Riverdale season 2, a live-action Cowboy Bebop TV series, the return of Rick Moranis, the CW's new lineup, Wonder Woman's opening weekend, convention news, and a lot more!

Links:
- Reel Reviews
- TV: HBO Clarifies Game Of Thrones Spin-Offs, Final Season May Air In 2019; Riverdale: Skeet Ulrich Upped To Series Regular For Season 2; Cowboy Bebop Is Getting A Live-Action TV Series; Leonard Nimoy Documentary To Air Nationally On PBS; Krypton To Replace Game Of Thrones In Northern Ireland; Krypton to focus on Seg-El, Superman's grandfather; Snowpiercer (TNT) casts Jennifer Connelly to join Daveed Diggs
- Movies: Rick Moranis Will Return To Acting For The First Time In 20 Years; Fantastic Beasts' Casting Call Seeks Teenaged Dumbledore, Grindelwald, And Newt Scamander; Ron Perlman Grabs Dinner With New Hellboy Actor David Harbour
- Universal Usual: Berlanti; CWTV announces lineup/premiere week
- DCEU News: JLD and Batgirl next probable to start production
- Wonder Woman: Wonder Woman Beats Green Lantern's Total Box Office Gross In One Weekend; Wonder Woman's Box Office Opening Makes History For Female Directors; Wonder Woman Opens To $228 Million Worldwide; Wonder Woman suspended in Tunisia; Wonder Woman Director On Where The Sequel Will Be Set; Director Has Not Signed On For Sequel; Women-Only Screenings Defended By Austin Mayor; Life Sized Wonder Woman Statue
- Suicide Squad Will Reportedly Start Filming in 2018
- Justice League: Reportedly Begins Significant Reshoots; Nielsen and Wright confirm returns; War of the Gods?
- Amazons vs Apokolips
- Disney: Ducktales announces cast/roles
- Netflix News: Futurama Rumored To Be Leaving Netflix And Fans Are Not Happy
- Fox: Gambit: X-Men Producer Offers Update On Gambit Movie
- Sony News: Sony Pictures Looking To Create An All-Female Superhero Team
- Marvel News: Stan Lee Wants To Eventually Co-Star In A Marvel Movie; Spider-Man: Homecoming Cast And Crew Discuss Michael Keaton's Vulture; Tom Holland Visits Children's Hospital Los Angeles; Jessica Jones Season 2: Krysten Ritter Says Jessica Jones Season 2 Is Meant To Be Binged; Luke Cage Season 2 Starts Filming Today (June 6th); The Punisher: Director Reveals Premiere Date; Guardians of the Galaxy: Director Drops New Adam Warlock Teaser; Ant-Man & the Wasp Casts Hannah John-Kamen
- Conventions: SDCC: SDCC 2017 Shuttle Map, Schedule Revealed; Official Dragon Ball Fan Bash; New Hotel for 2019; Entertainment Weekly Auctions Off Two Tickets to SDCC 2017 Party for Charity; CharityBuzz Auctioning VIP Access to 'Outlander' San Diego Comic-Con Panel, Signing; Kidrobot San Diego Comic-Con 2017 Exclusives; Wonder Woman Wooden Figurines and Invisible Jet SDCC Exclusive Released; Entertainment Earth San Diego Comic-Con 2017 Exclusives; Badge Price Infographic

Trailer Time!
- Professor Marston and the Wonder Women
- American Made
- Battle of the Sexes
- Beatriz at Dinner
- Murder on the Orient Express
- Hitman's Bodyguard (Red Band)

Outro and Contact Information
If you have comments or questions, you can find us at theusualpodcast.com, email us at theusualpodcast@gmail.com, and find us on Facebook, Pinterest, Google+, Instagram, Twitch, and YouTube. Marshall is @darthpops on Twitter, and Will is @iamwillgriggs. Please take the time to give us a positive rating on iTunes and Stitcher, and like and share us on the social medias! Use our link to try Audible free for 30 days! Like what we're doing? Become a patron HERE, or check out our support us page for more ways to show your love.

Pivotal Podcasts
What is Happening to the Data Warehouse Market? (Ep. 8)

Pivotal Podcasts

Play Episode Listen Later Nov 7, 2016


The data warehouse market has experienced some interesting developments over the last few months and years. First, Actian decided to leave the market altogether over the summer, pulling its analytical database and SQL-on-Hadoop offering from the market. Not long after that, HPE announced it was divesting its software business, which includes Vertica, to a British conglomerate with virtually no U.S. market presence. Meanwhile, data warehouse appliance stalwarts IBM Netezza and Teradata have been struggling for the last couple of years to adjust to changes in customer requirements. On the flip side, Pivotal Greenplum, the only open source-based MPP analytical database on the market, continues to gain momentum with enterprises looking for a modern data warehouse to support data science and advanced analytics workloads. So where does this leave us? In this episode of Pivotal Insights, host Jeff Kelly speaks to a group of Pivotal data engineering leaders and Pivotal's head of data sales about what they are hearing from customers and prospects that are trying to make sense of their data warehouse options. They offer unvarnished feedback from the field, including customer reactions to the struggles of vendors like Actian, Vertica, and Teradata, thoughts on why so many are turning to Pivotal Greenplum, as well as the role open source, cloud, and MPP architecture play as enterprise customers evaluate their data warehouse options (hint: they're really important).

Pivotal Insights
What is Happening to the Data Warehouse Market? (Ep. 8)

Pivotal Insights

Play Episode Listen Later Nov 7, 2016 30:20


The data warehouse market has experienced some interesting developments over the last few months and years. First, Actian decided to leave the market altogether over the summer, pulling its analytical database and SQL-on-Hadoop offering from the market. Not long after that, HPE announced it was divesting its software business, which includes Vertica, to a British conglomerate with virtually no U.S. market presence. Meanwhile, data warehouse appliance stalwarts IBM Netezza and Teradata have been struggling for the last couple of years to adjust to changes in customer requirements. On the flip side, Pivotal Greenplum, the only open source-based MPP analytical database on the market, continues to gain momentum with enterprises looking for a modern data warehouse to support data science and advanced analytics workloads. So where does this leave us? In this episode of Pivotal Insights, host Jeff Kelly speaks to a group of Pivotal data engineering leaders and Pivotal's head of data sales about what they are hearing from customers and prospects that are trying to make sense of their data warehouse options. They offer unvarnished feedback from the field, including customer reactions to the struggles of vendors like Actian, Vertica, and Teradata, thoughts on why so many are turning to Pivotal Greenplum, as well as the role open source, cloud, and MPP architecture play as enterprise customers evaluate their data warehouse options (hint: they're really important).

CSAIL Alliances Podcasts
A Discussion with MIT CSAIL's Michael Stonebraker

CSAIL Alliances Podcasts

Play Episode Listen Later Feb 24, 2016 7:44


A researcher at MIT's Computer Science and Artificial Intelligence Lab, Michael Stonebraker has founded and led nine different big-data spin-offs, including VoltDB, Tamr, and Vertica - the last of which was bought by Hewlett Packard for $340 million. Now he's bringing his insights to a new online course being offered this month through edX and MIT Professional Education. Co-taught by long-time business partner Andy Palmer, "Startup Success: How to Launch a Technology Company in 6 Steps" covers topics ranging from generating ideas and recruiting top talent to pitching VCs and negotiating deals - all in the span of three weeks.

Succesfulde danske iværksætter historier (Sæson 1)
Wupti.com, by co-founder Claus Kristensen

Succesfulde danske iværksætter historier (Sæson 1)

Play Episode Listen Later Dec 9, 2014 22:13


Hear the story of how Wupti.com went from idea to success, as Vertica interviews co-founder Claus Kristensen.

E-handel og e-handelsplatforme
Choose the right e-commerce platform (Tip 3 of 10)

E-handel og e-handelsplatforme

Play Episode Listen Later Nov 30, 2014 5:53


Hear what you should pay attention to when choosing an e-commerce platform. What should you look at, where should you look, and what should you ask about? Vertica gives you the answer.

DevOps Дефлопе подкаст
010 - Secret DevOps from the depths of Undev

DevOps Дефлопе подкаст

Play Episode Listen Later Jun 2, 2014 59:17


News: how DevOps is adopted at Red Hat; Open Source Chef Server 11.1.0; ChatOps and Doctor Manhattan; the Berkshelf workflow; a music album released as a kernel module; open-source service discovery; Consul from HashiCorp; alert design; data-driven monitoring. Discussion: CitusDB; the Citus discussion on Hacker News; a comparison of Citus and MonetDB; VoltDB; Vertica. Interview with Dmitry Vasilyev of Undev: erb support in roles, a custom Chef formatter, and deb helpers for Ruby; the Riemann server Stalin and the Riemann client Kurchatov; updating a node's automatic attributes without chef-client (a sort of OCS Inventory). Those interested in the chef-server implementation can take a look here. But it is a fairly hefty chef-server, inconvenient in that it stores data either in a file with Go serialization or in MySQL.