The origin story behind the best open source projects and communities.
Stephan Ewen (@StephanEwan) is the co-founder of Restate, the open-source workflow-as-code engine. Restate is lightweight, simple, and provides durable execution. Before Restate, Stephan co-created Apache Flink, the open-source stream processing framework. Lessons learned from Flink have heavily influenced the development of Restate, although Stephan says the two have exactly opposite use cases. Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com. Subscribe to Contributor on Substack for email notifications! In this episode we discuss: the history of Flink and the impact of the 2016 U.S. election; why tooling for real-time transactional problems has historically had room for improvement; what constitutes “modern” workflow engines; whether you can use Restate for any use case; moving from a large company to a small startup as an open-source developer. Links: Restate, Apache Flink. People mentioned: Kostas Tzoumas (@kostas_tzoumas). Other episodes: Temporal with Maxim Fateev.
Eyal Solomon (@EyalSolomo44643) is the CEO and co-founder of Lunar, an open-source platform that bills itself as the “first reverse API gateway.” Lunar allows engineering teams to monitor, manage, and optimize API consumption. According to Eyal, it's very easy to integrate with APIs, but difficult to keep them maintained, and there was a clear need for a generic solution to control and scale every API consumed in production. In this episode we discuss: how most companies think their API maintenance is a unique problem; the importance of managing API consumption in the face of the AI revolution; why Eyal and his team decided to open-source Lunar; future plans for Lunar, including the development of autonomous optimization and pre-built flows; Eyal's thoughts on how to start conversations with potential enterprise clients. Links: Lunar. People mentioned: Roy Gabbay (LinkedIn).
Shirshanka Das (@shirshanka) is the CTO of Acryl Data and founder of DataHub, which bills itself as the #1 open-source metadata platform. It enables data discovery, data observability, and federated governance to help tame complex data ecosystems. Shirshanka first developed DataHub while at LinkedIn, but has grown it into an independent project with a thriving community. In this episode we discuss: how DataHub differs from traditional data catalogs; themes around why community members get involved and stick with the project; partnering with Netflix to develop runtime metadata model extensibility; the influence of the pandemic on DataHub's open-sourcing; dealing with the future of a project with a big community and unlimited scope. Links: DataHub, The History of DataHub.
After his first child was born, Matt Wonlaw (@tantaman) imagined giving his son life advice. What kind of life did he want his kid to lead? At the time, he was working for Facebook, and he decided that his own life needed a change in direction. So Matt started vlcn, aka Vulcan Labs, a research company that develops open-source projects like CR-SQLite and Materialite. vlcn has an unusual business model – Matt receives donations and sponsorships from users and clients. It's all part of his mission to rethink the modern data stack for writing rich and complex applications. In this episode we discuss: one reason that software is still too hard to write: object orientation; how CR-SQLite allows databases to be merged together and Materialite provides Incremental View Maintenance for JavaScript; why coding directly to relations can provide a more flexible and efficient approach to building applications; Matt's decision to build vlcn as a research lab rather than as a startup; thoughts for the future on PGLite. Links: vlcn (Vulcan Labs), CR-SQLite, Materialite, fly.io, PGLite. People mentioned: Johannes Schickling (@schickling).
Amplication is an open-source development platform for scalable and secure Node.js applications. It allows engineers to skip writing boilerplate code and offers the flexibility to customize and add components. Amplication was created by Yuval Hazaz (@Yuvalhazaz1), a veteran developer who determined that low-code platforms save time but restrict freedom. Instead, Amplication uses code generation to reliably and consistently build robust production-ready backend services. In this episode we discuss: Yuval's “secret sauce” for building an open-source community; how platform engineers can use Amplication for company-wide standardization; a baseline organic growth rate for open-source projects; the role of generative AI in code modernization. Links: Amplication.
OpenBB is an open-source investment research platform created by Didier Lopes (@didier_lopes). OpenBB grew out of a project called Gamestonk Terminal that Didier began working on shortly before the GameStop short squeeze in January 2021. Today, OpenBB has evolved into an infrastructure platform that allows users to build extensions and access financial data with automation and customization. In this episode we discuss: what Vice Media got wrong about OpenBB; some major contributors to the project and the features or directions that they proposed; how a machine learning engineer from Bloomberg reached out about OpenBB; different types of OpenBB users – students, retail investors, and other financial professionals; OpenBB's exciting AI roadmap. Links: OpenBB. People mentioned: James Maslek (@jmaslek11), Artem Veremey (@artemvv).
OpenTelemetry is an open-source observability framework for collecting and managing telemetry data. OpenTelemetry has been more successful than expected, becoming the second-fastest-growing project in the CNCF. It allows for flexibility and avoids vendor lock-in, making it attractive to startups and large enterprises alike. On today's show, Eric (@ericmander) sits down with Austin Parker (@austinlparker), director of open-source at Honeycomb. In this episode we discuss: how Austin's interest in complex systems led him to the observability field and developer relations; an X argument that contributed to the merger of OpenTracing and OpenCensus into OpenTelemetry; why foundations help maintainers to strike a balance with their contributors; Austin's opinion on the secret to OpenTelemetry's success. Links: OpenTelemetry, Honeycomb. People mentioned: Charity Majors (@mipsytipsy), Christine Yen (@cyen).
OPAL is an open-source administration layer for policy engines such as Open Policy Agent (OPA). OPAL provides the necessary infrastructure to load policy and data into multiple policy engines, ensuring they have the information they need to make decisions. Today, we're talking to Or Weis (@OrWeis), co-creator of OPAL and co-founder of Permit, the end-to-end authorization platform that envisions a world where developers never have to build permissions again. In this episode we discuss: the history of Permit and OPAL; the benefits of an open-foundation model rather than open-core; RBAC vs ABAC vs ReBAC; why developers would prefer to not have to deal with authorization; Or's own podcast, Command+Shift+Left. Links: OPAL, Permit, Command+Shift+Left, Terraform. People mentioned: Asaf Cohen (@asafchn), Filip Grebowski (@developerfilip). Other episodes: Open Policy Agent with Torin Sandall; Community Driven IaC: OpenTofu with Kuba Martin.
FerretDB enables users to run MongoDB applications on existing Postgres infrastructure. Peter Farkas (@FarkasP), co-founder and CEO of FerretDB, explains the need for an open source interface for document databases. Peter also discusses the licensing change of MongoDB and the uncertainty it created for users. He emphasizes the importance of open standards and collaboration among MongoDB alternatives to provide users with choice and interoperability. In this episode we discuss: the epic mountain adventure that inspired FerretDB; why commercial open-source can be additive rather than extractive; how compatibility and open standards drive innovation and competition; PDFs as an example of corporation-supported standards; three tenets for building a successful open source project. Links: FerretDB, Percona. People: Peter Zaitsev (@PeterZaitsev).
Ben Johnson (@benbjohnson) is the creator of Litestream and LiteFS, two open-source disaster recovery solutions for SQLite. Litestream is designed to provide continuous backups for SQLite databases by streaming incremental changes, allowing for easy data recovery in the event of a server crash. LiteFS, on the other hand, is built on Litestream but uses transactional control to focus on replication and high availability. Join us as Ben discusses the challenges and trade-offs of open source contributions and the future of databases. In this episode we discuss: the history of how Ben got involved in SQLite development out of “spite”; how Litestream “works on a fluke”; different use cases for Litestream vs LiteFS; why fully open contributions aren't always Ben's style; the greater server-side SQLite landscape. Links: Litestream, LiteFS, Fly.io, BoltDB. People mentioned: Philip O'Toole (@general_order24). Other episodes: The Social Miracle: rqlite with Philip O'Toole; The Big Fork: libSQL with Glauber Costa.
Tonic is a native gRPC implementation in Rust that allows users to easily build gRPC servers and clients without extensive async experience. Tonic is part of the Tokio stack, which is a library that provides an asynchronous runtime for Rust and more tools to write async applications. Today, Lucio Franco (@lucio_d_franco) of Turso joins the podcast to discuss his unique experience maintaining Tonic and contributing to the asynchronous Rust ecosystem. In this episode we discuss: the challenges of async Rust and ways the community has addressed them; Lucio's plan for getting a job in distributed databases; how the Tokio team avoided power dynamics; problems around working on open-source in the corporate world; why Lucio encouraged a collaborator to go on without him. Links: Tonic, Tokio, Turso, Tower. People: Carl Lerche (@carllerche). Other episodes: The Big Fork: libSQL with Glauber Costa.
rqlite is a lightweight, distributed relational database built on Raft and SQLite. Founder Philip O'Toole (@general_order24) decided to combine these technologies while working at a startup years ago. The startup no longer exists, but rqlite is going strong. Today, Philip is an engineering manager at Google, while he continues to be the driving force behind the open development of rqlite. In this episode we discuss: the biggest misconceptions about how rqlite differs from SQLite; why writing databases is more interesting than new programmers might think; the tradeoff between a large community versus smaller, more focused leadership; reasons why open-source development progresses in bursts of energy; how to really pronounce “rqlite”. Links: rqlite, InfluxData, dqlite, Litestream, libSQL, Turso, OpenTelemetry. People: Ben Johnson (@benbjohnson). Other episodes: libSQL with Glauber Costa.
Kuba Martin (@cube2222_2) is Software Engineering Team Lead at Spacelift and Interim Tech Lead of OpenTofu, the open-source fork of Terraform. Terraform is a declarative infrastructure-as-code (IaC) tool that recently switched to a source-available license. Spacelift and other companies that heavily relied on Terraform came together to fork it into a community-driven project originally called OpenTF, which has now become OpenTofu and is governed by the Linux Foundation. In this episode we discuss: two kinds of forks; how OpenTofu handled the opportunity to rethink their licensing and copyright; finding hundreds of pledges to the OpenTF Manifesto; the benefits of a technical steering committee; recreating the community registry. Links: OpenTofu, Spacelift, Terraform, Gruntwork, Harness, env0, Scalr.
Ry Walker (@rywalker) is the founder and CEO of Tembo, the Postgres developer platform for building any and every data service. To Ry, the full capabilities of Postgres appear underappreciated and underused by most users. Tembo is an attempt to harness the large ecosystem of Postgres extensions, and ultimately collapse the database sprawl of the modern data stack. In this episode we discuss: taking the “red pill” of using Postgres for everything; providing universal support for Postgres extensions; why Ry dislikes the current state of the modern data stack; how databases across the board have mostly changed into application platforms; what makes Tembo “Startup Mt. Everest”. Links: Tembo, OSSRank, Citus Data, Modal, Supabase Wrappers. People mentioned: Erik Bernhardsson (@bernhardsson). Other episodes: Clickhouse with Alexey Milovidov and Ivan Blinkov.
Jan Oberhauser (@JanOberhauser) is the founder and CEO of n8n, the free and source-available workflow automation tool for technical users. n8n's flexible architecture allows users to avoid the limitations of other automation tools, while also opening doors for complex automation scenarios. The project has garnered over 30,000 GitHub stars and a thriving community of 55,000+ members. Subscribe to Contributor on Substack for email notifications, and join our Slack community! In this episode we discuss: how Jan's background in film effects laid the groundwork for n8n; why n8n uses a forum over Discord or Slack for a community platform; use cases from scheduling fitness classes to upgrading financial mainframes; how n8n might stack up against the well-thought-out Python script; why n8n uses a fair-code license rather than open-source. Links: n8n, n8n Community. Other episodes: Temporal with Maxim Fateev; From Orchestration to Building Applications: Conductor with Jeu George; Rethinking the Workflow Problem: Windmill with Ruben Fiszel.
Glauber Costa (@glcst) is the founder of Turso and the co-creator of libSQL, an open source, open contribution fork of the database engine library, SQLite. Most people believe that SQLite is open-source software, but it actually exists in the public domain and doesn't accept external contributions. With their big fork, Glauber and his team have set out to evolve SQLite into a modern database with support for distributed data, an asynchronous interface, compatibility with WASM and Linux, and more. In this episode we discuss: community reactions to forking SQLite; how Glauber was spoiled by starting his career developing for Linux; the controversial decision to launch libSQL without writing a single line of code; the plan for incorporating upstream changes from SQLite; examples of how application developers need to move code “to the edge”. Links: libSQL, SQLite, Turso, LiteFS, Litestream, rqlite, VLCN. People mentioned: Avi Kivity (@AviKivity), Dor Laor (@DorLaor), Ben Johnson (@benbjohnson), Philip O'Toole (@general_order24), Matt Wonlaw (@tantaman). Other episodes: Scylla with Dor Laor; Apache Cassandra with Patrick McFadin.
Ruben Fiszel (@rubenfiszel) is the creator of Windmill, the open-source developer platform that lets users easily turn scripts into workflows and internal apps with auto-generated UIs. Windmill doesn't force engineers to change their coding style or adopt a convoluted API, and its low-code design makes it accessible to non-technical users. Tune in to find out how Windmill offers speed, performance and flexibility, while avoiding the limitations of rigid tools. In this episode we discuss: why many engineers try to reinvent the wheel when it comes to workflow engines; when Ruben first saw the need for a platform like Windmill while working at Palantir; “Today is the nicest period to build open-source…”; Ruben's incredible presence with support and bug fixes; Windmill's generous open-source offerings and the future of the business. Links: Windmill, Retool, Tokio, Apache Airflow, Apache Spark. Other episodes: Prefect with Jeremiah Lowin; Dagster with Nick Schrock; Temporal with Maxim Fateev; Temporal (Part 2) with Maxim Fateev and Dominik Tornow; Apache Cassandra with Patrick McFadin.
Jesse Clark (@jn2clark) is a co-founder of Marqo, the end-to-end, multimodal vector search engine. Vector search has exploded along with the rise of generative AI models, so Marqo's arrival has had excellent timing. The project has quickly grown to almost 3,000 GitHub stars, despite being less than a year old. Jesse and his team weren't exactly expecting this level of immediate success, but they are well-positioned to continue developing Marqo as a fixture in the worlds of information retrieval and machine learning. In this episode we discuss: Jesse's journey from physics research, to Stitch Fix, Amazon, and finally starting Marqo; industry vs academia in the cutting edge of machine learning; why “almost any organization in the world would benefit from Marqo”; talking about machine learning language - tensors, vectors, embeddings; how Jesse deals with the stress of knowing how fast the AI space is innovating. Links: Marqo. People mentioned: Katrina Lake (@kmlake), Eric Colson (@ericcolson).
Jeu George (@jeugeorge) is the co-creator of Conductor, the open-source application building platform. Conductor began as a workflow orchestrator and was originally developed at Netflix. Jeu also co-founded Orkes, a company which offers a cloud product based on Conductor. Tune in to find out how Conductor has evolved into an open-source, battle-tested distributed application platform. In this episode we discuss: the core tenets of building Conductor - reliability, language and cloud agnosticism; how Conductor enables teams to share and manage their custom modules; the role of Conductor in Netflix's switch from licensed to original content; Jeu's journey from Netflix, to Uber, and finally to Orkes; how Orkes is focusing on integrations and AI orchestration moving forward. Links: Conductor, Orkes. People mentioned: Viren Baraiya (@virenbaraiya), Boney Sekh (@boneyorkes), Dilip Lukose (@diliplukose).
Advait Ruia (@Advait_Ruia) is the co-founder of SuperTokens, the open-source user authentication and authorization framework. SuperTokens integrates natively into both your front-end client and your backend endpoint. This approach gives developers more control over the user experience and allows for custom workflows. Tune in to find out why SuperTokens aims to be the best of both the build and the buy argument for authentication solutions. In this episode we discuss: how SuperTokens evolved from a blog post on session management into a full-fledged infrastructure company; why there is increasing demand for authentication providers; whether founders need to be in the Bay Area; Advait's advice for building community and providing support; areas where SuperTokens could use outside contributions. Links: SuperTokens, SuperTokens Product Roadmap. People mentioned: Rishabh Poddar (@rishpoddar). Other episodes: Hasura with Tanmai Gopal.
Loris Degioanni (@lorisdegio) joins Eric Anderson (@ericmander) to chat about Falco, the open-source runtime security tool for modern cloud infrastructures. Loris is the founder and CTO of Sysdig, and co-creator of Wireshark, the legendary open-source packet analysis tool. Today, Loris talks about all these projects and more - tune in to learn about some deep history and Loris' predictions for the future. In this episode we discuss: how Loris began working with Gerald Combs as a student in Italy; why Loris' teams name their products after animals; the new non-profit Wireshark Foundation; parallel development of cloud technology and containers during Loris' career; the little things that make open-source projects go viral. Links: Falco, Sysdig, Wireshark. People mentioned: Solomon Hykes (@solomonhykes).
Emre Baran (@emre) is the CEO and co-founder of Cerbos, the open-source authorization layer for implementing roles and permissions. Cerbos allows developers to decouple authorization logic from core code into its own centrally distributed component. Easier said than done, perhaps - but Cerbos is secure, intentionally simple to implement, and developer-focused. In this episode we discuss: the difference between authentication and authorization; why Cerbos is language-agnostic; authorization patterns in a single application versus a larger network; the reason most devs start out trying to do authorization themselves, and sometimes give up; how the upcoming Cerbos Cloud will empower less technical users to deploy and manage policies and logs. Links: Cerbos; Cerbos Cloud Beta; Zanzibar: Google's Consistent, Global Authorization System. People mentioned: Charith Ellawala (GitHub: @charithe). Other episodes: Open Policy Agent with Torin Sandall.
Eric Anderson (@ericmander) has a conversation with Liam Randall (@Hectaman) and Bailey Hayes (@baihay) of Cosmonic, the platform-as-a-service environment for building cloud-native applications using WebAssembly. Bailey is also on the steering committee for the Bytecode Alliance, which stewards WebAssembly. In 2021, Cosmonic donated their WebAssembly runtime, wasmCloud, to the CNCF as an open-source project. Today, Liam and Bailey trace the history of WebAssembly, and their personal paths alongside it. In this episode we discuss: how WebAssembly came together over the last decade to become the fourth standardized language of the web; the moments when Bailey and Liam both realized they might be changing the future of computing; modding Microsoft Flight Simulator with Wasm modules; Liam's thoughts on how WebAssembly will affect business models going forward. Links: Cosmonic; WebAssembly; Bytecode Alliance; CNCF; wasmCloud; Wasmtime; WAMR; Better together: A Kubernetes and Wasm case study; Spin. People mentioned: Kevin Hoffman (@KevinHoffman), Kelsey Hightower (@kelseyhightower), Guy Bedford (@guybedford), Peter Huene (@peterhuene), Chris Aniszczyk (@cra). Other episodes: Envoy Proxy with Matt Klein; Suborbital with Connor Hicks.
Eric Anderson (@ericmander) is joined by Milos Rusic (@rusic_milos) to discuss Haystack, the open-source NLP framework for leveraging Transformer models and building intelligent search systems. Milos and his colleagues at deepset were early contributors to Hugging Face's Transformer models, and began building pipelines for searching large document stores. Today, Haystack is wildly popular, with an active Discord community and over 6,000 GitHub stars. In this episode we discuss: a deep dive into how Haystack works and its many use cases; how a customer demo with one-minute-long queries helped inspire Haystack; marketing open-source projects vs word of mouth; NLP applications working with structured data and translating between types of data; imagining a world where every person has their own personal ChatGPT. Links: Haystack, deepset, Hugging Face, Notion. Other episodes: Milvus with Frank Liu.
Eric Anderson (@ericmander) talks with Artyom Keydunov (@keydunov) about Cube, the semantic layer for building data applications. Cube helps engineers bridge data warehouses and data experiences, and provides access control, security, caching, and more helpful features. The project began in open-source and has evolved quite a lot over the last few years with a ton of community support. In this episode we discuss: what a semantic layer is; coming up with the idea to open-source during a game of ping pong; setting a ten-company-deployment goal; using Cube to track COVID stats in lockdown; how one contributor built a GraphQL API. Links: Cube, Superset, Metabase, Observable, Streamlit. People mentioned: Pavel Tiunov (@paveltiunov87).
Eric Anderson (@ericmander) and Erika Hokanson (@erikawh0) remember the life of Jeff Meyerson, creator of the influential podcast Software Engineering Daily. He passed during the summer of 2022. Still, his work lives on - thousands of episodes, talks, music, a book, and a community of dedicated listeners and engineers whose lives were touched by Jeff's dreams. Software Engineering Daily is still running, and you can listen to new episodes right here or wherever you get your podcasts. Links: Software Engineering Daily; Software Engineering Radio; The Prion (Soundcloud) (Spotify); You Are Not A Commodity; Move Fast: How Facebook Builds Software. People mentioned: Pranay Mohan (@pranaymohan).
We're kicking off the new year with a conversation between Eric Anderson (@ericmander), Sergei Egorov (@bsideup) and Eli Aleyner (@ealeyner). Sergei and Eli founded AtomicJar to maintain Testcontainers, the family of open-source libraries that allow developers to write and run integration tests locally, and treat them as unit tests. Testcontainers is wildly popular, with over six thousand GitHub stars (and climbing!). Tune in to find out how Sergei and Eli are helping people test their software quicker, easier, and more efficiently. In this episode we discuss: how Testcontainers solves the problem of confidence; the value of GitHub's networking effect; inspiration from Amazon's S3 “test bunny”; consequences of Docker's over- and under-adoption; replicating success in other languages besides Java. Links: Testcontainers, AtomicJar, Spring, Quarkus, Micronaut, How We Maintain Security Testing within the Software Development Life Cycle. People mentioned: Richard North (@whichrich), Kevin Wittek (@Kiview), Martin Fowler (@martinfowler).
Eric Anderson (@ericmander) is joined by Nate Rush (@naterush1997) and Aaron Diamond-Reivich (@_aaronDR) to talk about Mito, the open-source spreadsheet that generates Python code for data analysts. Mito is a Python library and acts as an extension to a Jupyter Notebook. Tune in to find out how the Mito team is bridging the gap in data science between spreadsheets and programming. In this episode we discuss: how Nate, Aaron and Aaron's fraternal twin brother Jake have been friends since middle school; programming tools for spreadsheet users vs spreadsheet tools for people who are trying to become programmers; advantages to integrating into other open-source projects; reflecting on the hype around Python data science; Python needs for Mito's enterprise customers. Links: Mito, Project Jupyter, pandas, Superhuman, Streamlit. People mentioned: Jacob Diamond-Reivich (@Jake_Stack808).
Eric Anderson (@ericmander) and Simba Khadder (@simba_khadder) explore Featureform, the “virtual” feature store platform that aims to standardize data pipelines for machine learning. Contributor is no stranger to feature stores, but Simba has a broader definition than most. Join us to learn how Featureform enables data scientists and machine learning practitioners to solve a common, but rarely addressed organizational problem. In this episode we discuss: how there is no standard or north star for MLOps; why enterprise is where Featureform's value shines; MLPlatform problems vs MLOps problems; why copy/paste and Git don't cut it; deploying MLOps solutions that make data scientists and everyone else happy. Links: Featureform, Terraform, Apache Spark, Feathr. Other episodes: Tensorflow with Rajat Monga.
Eric Anderson (@ericmander) hosts Ben Haynes (@benhaynes), CEO and co-founder of Directus. Directus is an open-source data platform that layers on SQL databases to provide an instant API, and includes a no-code data studio interface. Listen in to find out how Directus is aiming to democratize the modern data stack for everyone. In this episode we discuss: the inspiration to create an “admin interface on steroids”; reflecting on Directus' unusual linear growth trend; how Directus powers digital experiences, applications, and internal dev tools; Ben's thoughts on maintaining a sustainable, premium open-source experience; automated data processing with Directus Flows. Links: Directus, Supabase. Other episodes: Chef with Adam Jacob.
Eric Anderson (@ericmander) chats with Toni de la Fuente (@ToniBlyx) about how he created Prowler, an open source security tool for AWS. Toni talks about taking Prowler from a nights-and-weekends project to his current full-time job, managing a team of four. They discuss transitioning from primarily coding to primarily managing tickets and users, as well as being “client zero” and bringing the project to big companies. In this episode we discuss: the roadmap from open source Prowler to Prowler Pro; Prowler's diverse set of users; what Toni learned from quitting an earlier open source project; the differences between Prowler and other security services for AWS. Links: Prowler on GitHub, Prowler Pro, Verica, Black Hat. People mentioned: Aaron Rinehart, Casey Rosenthal.
Eric Anderson (@ericmander) meets legendary open-source developer Max Howell (@mxcl) to talk about tea, a decentralized protocol for remunerating the open-source ecosystem. Max is the creator of Homebrew, and he chats about his exit from the project. The conversation turns to his newest project, tea, which is an evolution of Brew and takes inspiration from blockchain technology. They also discuss Max's famous interview at Google and his time working for Apple. In this episode we discuss: Max's experience creating Homebrew, one of the largest open-source projects ever; the utility of Web3 beyond decentralized finance; writing a white paper for tea, “just like everyone else”; why Max wants a global team, with people in every time zone; how tea ensures a sustainable future for open-source. Links: Homebrew, tea.xyz, tea white paper, Bitcoin white paper, Max's Google interview tweet, Log4j vulnerability, “Nebraska” XKCD comic, NixOS. People mentioned: Timothy Lewis.
Eric Anderson (@ericmander) and Connor Hicks (@cohix) launch into detail on Suborbital, an open-source project that allows developers to create WebAssembly projects embedded in other applications. Connor conceived of Suborbital while frustrated with the cold start problem that can impact Function-as-a-Service platforms. Today, Suborbital collaborates with companies like Microsoft on a community called Wasm Builders, dedicated to sharing and developing innovations in WebAssembly applications. Subscribe to Contributor on Substack for email notifications, and join our Slack community! In this episode we discuss: The three tentpoles of WebAssembly that make it a useful foundation for Suborbital Surprising niche use cases for WebAssembly like IoT and data modeling Open-source tools in the Suborbital ecosystem Putting focus on building a larger Wasm Builders community Connor's thoughts on how WebAssembly can improve edge computing Links: Suborbital WebAssembly Suborbital Compute Atmo Reactr Subo Sat Firecracker
Eric Anderson (@ericmander) and Frank Liu (@frankzliu) talk about Milvus, the open-source vector database built for scalable similarity search. Vector databases are built to search, index and store embeddings, a requirement for powerful AI applications. Frank is Director of Operations at Zilliz, the company that stewards the project. Tune in to find out how Milvus is the database for the AI era. Subscribe to Contributor on Substack for email notifications, and join our Slack community! In this episode we discuss: A crash course on embeddings and vector databases Using Milvus for logo search, crypto predictions, drug discovery, and more Other open-source projects at Zilliz that complement Milvus “Embedding Everything” How Milvus incorporates tunable consistency into its search process Links: Milvus Zilliz Towhee Attu Feder Other episodes: ClickHouse with Alexey Milovidov and Ivan Blinkov Correction: Milvus is based on a “shared storage” architecture, not “shared nothing.”
Eric Anderson (@ericmander) reunites with old colleagues Kenn Knowles (@KennKnowles) and Pablo Estrada (@polecitoem) for a conversation on Apache Beam, the open-source programming model for data processing. The trio once worked together at Google, and Beam was a turning point in the history of open-source there. Today, both Kenn and Pablo are members of the Beam PMC, and join the show with the inside scoop on Beam's past, present and future. In this episode we discuss: Transitioning Beam to the Apache Way How “inner source” works at Google Thoughts on the relationship between batch processing and streaming Some ways that community “power users” have contributed to Beam Information on Beam Summit 2022, the first onsite summit since COVID began The first few people to register can use code BEAM_POD_INV for a discount on tickets! Links: Apache Beam Apache Spark Apache Flink Apache Nemo Apache Samza Apache Crunch MapReduce paper MillWheel paper FlumeJava paper Dataflow paper Beam Summit 2022 Website Other episodes: TensorFlow with Rajat Monga
Eric Anderson (@ericmander) returns to Temporal with co-founder Maxim Fateev (@mfateev) and principal engineer Dominik Tornow (@DominikTornow). When Maxim joined us in September of 2020, the company called their project a “workflow orchestrator.” Today, Temporal has grown in popularity and usability, but the terminology around that abstraction has changed. Tune in to track the evolution of what Maxim calls a genuinely “new category of software.” In this episode we discuss: New features and developments in the last 2 years The proper way to pronounce “Temporal” How Temporal guarantees that workflow execution actually runs to completion Describing Temporal as a new pair of glasses Replay, Temporal's first developer conference on August 25-26, in Seattle Links: Temporal Cadence Apache Cassandra Replay People mentioned: Samar Abbas (@samarabbas77) Other episodes: Temporal with Maxim Fateev Apache Cassandra with Patrick McFadin
Eric Anderson (@ericmander) interviews Avi Press (@avi_press) about Scarf, the distribution platform for open-source software that facilitates analytics and commercialization. Scarf offers a set of tools that allows founders and maintainers to understand adoption of their products, including Scarf Gateway, which provides a central access point to containers and packages. From there, open-source developers can connect with the people who rely on their work. In this episode we discuss: Why you can't rely on GitHub as a source of comprehensive data about open-source software Tracing a user's journey interacting with a project across multiple platforms How better observability allows maintainers to make better software Inspiring indie maintainers to commercialize their projects The privilege of being able to work in open-source, and how Scarf can enable a more inclusive developer community Links: Scarf Tidelift Gitcoin OpenTeams Aviyel
Eric Anderson (@ericmander) and Patrick Dougherty (@cpdough) talk about Rasgo, the data transformation platform for MLOps that makes generating SQL easy. The team at Rasgo recently open-sourced a package called RasgoQL that allows users to execute SQL queries against a data warehouse using Python syntax. Tune in to find out how Rasgo aims to bridge an important gap in the Modern Data Stack. In this episode we discuss: The advantages of offering both a low-code/no-code UI and a Python interface "How can a data scientist, without needing full-time resources from data engineering, be somewhat self-sufficient in data prep and able to deliver those insights without a massive human capital investment needed?" Where Rasgo fits into the world of feature stores Why one Rasgo user took a trip to a wind farm in Texas Eric's predictions for the future of data prep and transformation Links: Rasgo RasgoQL DuckDB Delta Lake People mentioned: Jared Parker (@jaredtparker_)
Eric Anderson (@ericmander) and Willem Pienaar (@willpienaar) talk about Feast, the open-source feature store for machine learning. Feature stores act as a bridge between models and data, and allow data scientists to ship features into production without the need for engineers. Willem co-created Feast at Gojek, and later teamed up with the folks at Tecton to back the project. In this episode we discuss: The value of feature stores in MLOps What happens when you open-source too early Why most open-source code has nothing to hide Bringing an open-source project to an existing company Good and bad use cases for a feature store Links: Feast Tecton Turing Merlin Kubeflow apply() Conference People mentioned: Mike Del Balso Kevin Stumpf (@kevinmstumpf) Ajey Gore (@AjeyGore) Demetrios Brinkmann (@Dpbrinkm) Wes McKinney (@wesmckinn) Other episodes: Flyte with Ketan Umare Great Expectations with Abe Gong and Kyle Eaton
Eric Anderson (@ericmander) and Ketan Umare (@ketanumare) discuss Flyte, the open-source workflow automation platform for large-scale machine learning and data use cases. Ketan is a former engineer at Lyft, where he created Flyte to help models in Pricing, Locations, ETA, and more. Today, the project allows machine learning developers everywhere to bring their ideas from conception to production. In this episode we discuss: How Flyte combines compute with parts of a workflow engine in a way that is best for the user The importance of reliable fares and ETA predictions at a ride-sharing app A progenitor to Flyte called “Better Airflow” Ketan's innovative approach to bringing typing to machine learning workloads Why Flyte landed at the Linux Foundation Links: Flyte Union.ai Apache Airflow Kubeflow Luigi MLTwist Other episodes: Great Expectations with Abe Gong and Kyle Eaton Envoy Proxy with Matt Klein
Eric Anderson (@ericmander) meets with Davit Buniatyan (@DBuniatyan) of Activeloop, the database for AI. Davit was inspired to found Activeloop while working on large datasets in a neuroscience research lab at Princeton. Powering the technology at Activeloop is Hub, the open-source dataset format for AI applications. Join us to learn how Hub promises to enhance and expand various verticals in deep learning. In this episode we discuss: Reconfiguring traditional ML tooling for the cloud Connectomics - working with thin slices of a mouse brain with neuroscientist Sebastian Seung Choosing between university, a start-up, and open-source Davit's original product, that ran computation on crypto mining GPUs on a distributed scale Focusing on different data modalities for computer vision Links: Activeloop Activeloop Hub Apache Parquet Apache Spark TensorFlow Snowflake Databricks Timescale People mentioned: Sebastian Seung (@SebastianSeung) Other episodes: TensorFlow with Rajat Monga
Eric Anderson (@ericmander), Alexander Jung (@nderjung) and Simon Kuenzer (Github: @skuenzer) get technical on Unikraft, the open-source unikernel development kit. Unikernels are specialized, high performing OS images that have the potential to revolutionize virtualization. Unikraft makes unikernels easy to use by prioritizing modularity, security, and POSIX-compatibility. In this episode we discuss: How Unikraft seeks wider adoption of unikernels in real-world applications Unikraft's background in research and academia Bottom-up as well as top-down specialization Building a community with a large proportion of students Links: Unikraft Unikraft: Fast, Specialized Unikernels the Easy Way Xen Project MirageOS HermitCore Firecracker
Eric Anderson (@ericmander) has a conversation with Yury Selivanov (@1st1), the co-founder of EdgeDB. EdgeDB is the world's first “graph-relational database.” It's a term coined specifically for this new type of database, designed to ease the pain of dealing with the usual relational and NoSQL models. And no, EdgeDB is NOT a graph database! In this episode we discuss: A glitch at EdgeDB's Matrix-inspired launch event Origin of the term and design philosophy, “graph-relational” What to know about becoming a Python core developer How EdgeDB's next-gen query language compares to GraphQL and SQL Links: EdgeDB magicstack uvloop People mentioned: Elvis Pranskevichus (@elprans) Colin McDonnell (@colinhacks) Victor Petrovykh (Github: @vpetrovykh) Dan Abramov (@dan_abramov) Brett Cannon (@brettsky) Daniel Levine (@daniel_levine) Other episodes: Hasura with Tanmai Gopal Dgraph with Manish Jain
Eric Anderson (@ericmander) sits down with Pete Goddard (@pete_paco) to talk about Deephaven, the open-core query engine built for real-time streams and batch data. Pete is the CEO of Deephaven Data Labs, and comes to the data world from a background in capital markets trading. Deephaven originally addressed a need for real-time data infrastructure in the finance world, but the team realized how useful their technology could be in a wider variety of verticals. Join us for Pete's unique perspective on reaching out into alternate industries and use cases through community development. In this episode we discuss: How Pete transitioned from Wall Street to open-source software Selling investors on open-source Two questions people always ask Pete The luxury of Deephaven's incremental update model Barrage, Deephaven's API for streaming tables that extends Apache Arrow Flight Links: Deephaven Barrage Apache Kafka Apache Arrow Flight Eclipse Jetty Other episodes: TensorFlow with Rajat Monga