Technical interviews about software topics.
ReactJS developers have lots of options for building their applications, and those options are not easy to work through. State management, concurrency, networking, and testing all have elements of complexity and a wide range of available tools. Take a look at any specific area of JavaScript application development, and you can find highly varied opinions.

Kent Dodds is a JavaScript teacher who focuses on React, JavaScript, and testing. In today's episode, Kent provides best practices for building JavaScript applications, specifically with React. He provides a great deal of advice on testing, which is unsurprising considering he owns TestingJavaScript.com. Kent is an excellent speaker who has taught thousands of people about JavaScript, so it was a pleasure to have him on the show.

Kent is also speaking at Reactathon, a JavaScript conference taking place March 30th and 31st in San Francisco. This week we will be interviewing speakers from Reactathon. If you are interested in JavaScript and the React ecosystem, stay tuned, and if you hear something you like, you can check out the Reactathon conference in person.
A data warehouse serves the purpose of providing low latency queries for high volumes of data. A data warehouse is often part of a data pipeline, which moves data through different areas of infrastructure in order to build applications such as machine learning models, dashboards, and reports.

Modern data pipelines are often associated with the term “ELT” or Extract, Load, Transform. In the “ELT” workflow, data is taken out of a source such as a data lake, loaded into a data warehouse, and then transformed within the data warehouse to create materialized views on the data. Data warehouse queries are usually written in SQL, and for the last 50 years, SQL has been the primary language for executing these kinds of queries.

DBT is a system for data modeling that allows the user to write queries that involve a mix of SQL and a templating language called Jinja. Jinja allows the analyst to blend imperative code along with the declarative SQL. Tristan Handy is the CEO of Fishtown Analytics, the company that created DBT, and he joins the show to discuss how DBT works, and the role it plays in modern data infrastructure.
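The blend of imperative templating and declarative SQL that DBT enables can be sketched without dbt itself. In this hypothetical example, plain Python string formatting stands in for Jinja, and the `raw_payments` table and payment-method names are invented:

```python
# A Jinja-style loop generates one aggregate column per payment method.
# Plain Python formatting stands in for Jinja here; in dbt this would be
# a {% for %} block inside a .sql model file.
payment_methods = ["credit_card", "bank_transfer", "gift_card"]  # hypothetical

select_lines = [
    f"    sum(case when payment_method = '{m}' then amount end) as {m}_amount"
    for m in payment_methods
]

model_sql = (
    "select\n"
    "    order_id,\n"
    + ",\n".join(select_lines)
    + "\nfrom raw_payments\ngroup by order_id"
)
```

Adding a payment method means appending one list element rather than hand-editing the SQL, which is the productivity win that templating brings to the analyst.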
GraphQL is a system that allows frontend engineers to make requests across multiple data sources using a simple query format. In GraphQL, a frontend developer does not have to worry about the request logic for individual backend services. The frontend developer only needs to know how to issue GraphQL requests from the client, and these requests are handled by a GraphQL server.

GraphQL is mostly used to issue queries across internal databases and services. But many of the data sources that a company needs to query in modern infrastructure are not databases–they are APIs like Salesforce, Zendesk, and Stripe. These API companies might store a large percentage of the data that a given company needs to query, and executing queries, subscriptions, and joins against these APIs is not a simple task.

OneGraph is a company that builds integrations with third-party services and exposes them through a GraphQL interface. Sean Grove is a founder of OneGraph, and he joins the show to explain the problem that OneGraph solves, how OneGraph is built, and some of the difficult engineering challenges required to design OneGraph.
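The fan-out that a GraphQL server performs can be illustrated with a toy resolver map. This is not OneGraph's actual implementation; the `user` and `invoices` fields and their backing data are invented for illustration:

```python
# Toy illustration of the GraphQL idea: the client asks for a shape of data,
# and a server-side resolver map fans out to the underlying services.
def resolve_user(_args):
    # Stand-in for an internal database lookup.
    return {"id": "u1", "name": "Ada"}

def resolve_invoices(_args):
    # Stand-in for a third-party API such as Stripe.
    return [{"id": "inv1", "total": 42}]

resolvers = {
    "user": resolve_user,
    "invoices": resolve_invoices,
}

def execute(query_fields):
    """Resolve each requested top-level field through its resolver."""
    return {field: resolvers[field](None) for field in query_fields}

result = execute(["user", "invoices"])
```

The client only names the fields it wants; which backend serves each field is the server's concern, which is the separation the episode describes.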
Cloud computing caused a fundamental economic shift in how software is built. Before the cloud, businesses needed to buy physical servers in order to operate. There was an up-front cost that often amounted to tens of thousands of dollars required to pay for these servers. Cloud computing changed the up-front capital expense to an ongoing operational expense, with businesses increasingly shifting to Amazon Web Services, Microsoft Azure, and Google Cloud Platform. Although the initial motivation for moving onto cloud providers might have been decreased cost, over time the cloud providers have developed unique services that make software even easier to build than before. There has also been a proliferation of new software infrastructure companies built on top of the cloud providers, giving rise to new databases, logging companies, and platform-as-a-service products.

Danel Dayan is a venture investor with Battery Ventures and a co-author of the State of the OpenCloud 2019, a report that compiles a wide set of statistics and information on how cloud computing and open source are impacting the software industry. Danel joins the show to talk about his work as an investor, as well as his previous career at Google, where he worked on mergers and acquisitions.

If you want to reach Danel you can email him at ddayan@battery.com or tweet at him via @daneldayan.
Lyft is a ridesharing company that generates a high volume of data every day. This data includes ride history, pricing information, mapping, routing, and financial transactions. The data is stored across a variety of different databases, data lakes, and queueing systems, and is processed at scale in order to generate machine learning models, reports, and data applications.

Data workflows involve a set of interconnected systems such as Kubernetes, Spark, TensorFlow, and Flink. In order for these systems to work together harmoniously, a workflow manager is often used to orchestrate them. A workflow platform lets a data engineer have a high-level view into how data moves through the system, and can be used to reason about retries, resource utilization, and scalability.

Flyte is a data processing system built and open-sourced at Lyft. Allyson Gale and Ketan Umare work at Lyft, and they join the show to talk about how Flyte works, and why they needed to build a new workflow processing system when there are already tools available such as Airflow.
Descript is a software product for editing podcasts and video.

Descript is a deceptively powerful tool, and its software architecture includes novel usage of transcription APIs, text-to-speech, speech-to-text, and other domain-specific machine learning applications. Some of the most popular podcasts and YouTube channels use Descript as their editing tool because it provides a set of features that are not found in other editing tools such as Adobe Premiere or a digital audio workstation.

Descript is an example of the downstream impact of machine learning tools becoming more accessible. Even though the company only has a small team of machine learning engineers, these engineers are extremely productive due to the combination of APIs, cloud computing, and frameworks like TensorFlow.

Descript was founded by Andrew Mason, who also founded Groupon and Detour, and Andrew joins the show to describe the technology behind Descript and the story of how it was built. It is a remarkable story of creative entrepreneurship, with numerous takeaways for both engineers and business founders.
Physical places have a large amount of latent data. Pick any location on a map, and think about all of the questions you could ask about that location. What businesses are at that location? How many cars pass through it? What is the soil composition? How much is the land on that location worth?

The world of web-based information has become easy to query. We can use search engines like Google, as well as APIs like Diffbot and Clearbit. Today, the physical world is not so easy to query, but it is becoming easier. Location data as a service is a burgeoning field, with some vendors offering products for satellite data, foot traffic, and other specific location-based domains.

SafeGraph is a company that provides location data-as-a-service. SafeGraph data sets include data about businesses, patterns describing human movement, and geometric representations describing the shape and size of buildings. Ryan Fox Squire develops data products for SafeGraph, and he joins the show to talk about the engineering and strategy that goes into building a data-as-a-service company.
A high volume of data can contain a high volume of useful information. That fact is well understood by the software world. Unfortunately, it is not a simple process to surface useful information from this high volume of data. A human analyst needs to understand the business, formulate a question, and determine what metrics could reveal the answer to such a question.

Sisu is a system for automatically surfacing insights from large data sets within companies. A user of Sisu can select a database column that they are interested in learning more about, and Sisu will automatically analyze the records in the database to look for trends and relationships between that column and the other columns. For example, if I have a database of user purchases, including how much money those users spent on each purchase, I can ask Sisu to analyze the purchase price column, and find what kinds of attributes correlate with a high purchase price. Perhaps there will be correlations such as age and city that I can use to understand my customers better. Sisu can automatically surface these correlations and display them to me to help me make business decisions.

Peter Bailis is the CEO of Sisu Data and an assistant professor at Stanford. Peter joins the show to give his perspective on the development of Sisu, which came out of his research on data-intensive systems, including MacroBase, an analytic monitoring engine that prioritizes human attention.
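The purchase-price example can be sketched in a few lines: group the metric by every (column, value) pair and compare averages. This is only a toy version of the statistics Sisu runs, and the records below are made up:

```python
# Minimal sketch of attribute-vs-metric analysis: for a chosen metric column,
# average the metric across every value of every other column to surface
# candidate correlations. The purchase records are invented.
from collections import defaultdict
from statistics import mean

rows = [
    {"city": "SF", "age_group": "18-25", "price": 20.0},
    {"city": "SF", "age_group": "26-40", "price": 90.0},
    {"city": "NY", "age_group": "26-40", "price": 85.0},
    {"city": "NY", "age_group": "18-25", "price": 25.0},
]

def attribute_means(rows, metric):
    """Group the metric by each (column, value) pair and average it."""
    groups = defaultdict(list)
    for row in rows:
        for col, val in row.items():
            if col != metric:
                groups[(col, val)].append(row[metric])
    return {key: mean(vals) for key, vals in groups.items()}

insights = attribute_means(rows, "price")
# Here the 26-40 age group stands out as correlated with higher prices,
# while city barely moves the average.
```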
Software investing requires a deep understanding of the market, and an ability to predict what changes might occur in the near future. At the level of core infrastructure, software investing is particularly difficult. Databases, virtualization, and large scale data processing tools are all complicated, highly competitive areas.

As the software world has matured, it has become apparent just how big these infrastructure companies can become. Consequently, the opportunities to invest in these infrastructure companies have become highly competitive.

When a venture capital fund invests into an infrastructure company, the fund will then help the infrastructure company bring their product to market. This involves figuring out the product design, the sales strategy, and the hiring roadmap. A strong investor will be able to give insight into all of these different facets of building a software company.

Vivek Saraswat is a venture investor with Mayfield, a venture fund that focuses on early to growth-stage investments. Vivek joins the show to discuss his experience at AWS, Docker, and Mayfield, as well as his broad lessons around how to build infrastructure companies today.
Infrastructure-as-code allows developers to use programming languages to define the architecture of their software deployments, including servers, load balancers, and databases. There have been several generations of infrastructure-as-code tools. Systems such as Chef, Puppet, Salt, and Ansible provided a domain-specific imperative scripting language that became popular along with the early growth of Amazon Web Services. HashiCorp's Terraform project created an open source declarative model for infrastructure. Kubernetes YAML definitions are also a declarative system for infrastructure as code.

Pulumi is a company that offers a newer system for infrastructure as code, combining declarative and imperative syntax. Pulumi programs can be written in TypeScript, Python, Go, or .NET. Joe Duffy is the CEO of Pulumi, and he joins the show to talk about his work on the Pulumi project and his vision for the company. Joe also discusses his twelve years at Microsoft, and how his work in programming language tooling shaped how he thinks about building infrastructure-as-code.
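The combination of imperative syntax and a declarative resource model can be sketched in Python. The `Resource` class and registration scheme below are invented for illustration and are not Pulumi's actual SDK:

```python
# Sketch of imperative code producing a declarative desired state: ordinary
# Python control flow builds up a set of resources, which an engine would
# then reconcile against reality. The Resource class is hypothetical.
desired_state = []

class Resource:
    def __init__(self, kind, name, **props):
        self.kind, self.name, self.props = kind, name, props
        desired_state.append(self)  # declaring a resource registers it

# Imperative loop, declarative result: three servers behind a load balancer.
servers = [Resource("vm", f"web-{i}", size="small") for i in range(3)]
lb = Resource("load_balancer", "web-lb", targets=[s.name for s in servers])
```

The loop is imperative, but what the engine ultimately sees is a plain list of desired resources, which is the hybrid the episode discusses.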
Over the last fifteen years, there has been a massive increase in the number of new software tools. This is true at the infrastructure layer: there are more databases, more cloud providers, and more open-source projects. And it's also true at a higher level: there are more APIs, project management systems, and productivity tools.

ClickUp is a project management and productivity system for organizations and individuals. The goal of ClickUp is to create a system that integrates closely with other project management systems, popular SaaS tools, and the Google Suite of docs and spreadsheets. The company was started in 2016, and despite raising zero outside capital, it has grown as rapidly as many venture-backed companies.

Zeb Evans and Alex Yurkowski are the founders of ClickUp. They join the show to talk about their experience building the company. We talk through their process of scaling the infrastructure, and their philosophy of moving fast. This episode has some useful strategic advice for anyone who is looking to take a product to market and iterate quickly–even if that product is bootstrapped. Full disclosure: ClickUp is a sponsor of Software Engineering Daily.
A large cloud provider has high volumes of network traffic moving through data centers throughout the world. These providers manage the infrastructure for thousands of companies, across racks and racks of multitenant servers and undersea cables, connecting network packets with their destinations.

Google Cloud Platform has grown steadily into a wide range of products, including database services, machine learning, and containerization. Scaling a cloud provider requires both technical expertise and skillful management.

Lakshmi Sharma is the director of product management for networking at Google Cloud Platform. She joins the show to discuss the engineering challenges of building a large scale cloud provider, including reliability, programmability, and how to direct a large hierarchical team.

We're looking for new show ideas, so if you have any interesting topics, please feel free to reach out via twitter or email us at jeff@softwareengineeringdaily.com
Datomic is a database system based on an append-only record keeping system. Datomic users can query the complete history of the database, and Datomic has ACID transactional support. The data within Datomic is stored in an underlying database system such as Cassandra or Postgres. The database is written in Clojure, and was co-authored by the creator of Clojure, Rich Hickey.

Datomic has a unique architecture, with a component called a Peer, which gets embedded in an application backend. A Peer stores a subset of the database data in memory in this application backend, improving the latency of database queries that hit this caching layer.

Marshall Thompson works at Cognitect, the company that supports and sells the Datomic database. Marshall joins the show to talk about the architecture of Datomic, its applications, and the life of a query against the database.

We're looking for new show ideas, so if you have any interesting topics, please feel free to reach out via twitter or email us at jeff@softwareengineeringdaily.com
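The Peer's caching behavior can be sketched as a read-through cache sitting in front of the storage backend. This is a simplified toy, not Datomic's actual query path, and the entity data is made up:

```python
# Sketch of the Peer idea: reads consult an in-process cache of a subset of
# the data before falling back to the underlying storage. The storage dict
# stands in for a backend such as Cassandra or Postgres.
storage = {"user/1": {"name": "Rich"}, "user/2": {"name": "Stu"}}

class Peer:
    def __init__(self, storage):
        self.storage = storage
        self.cache = {}        # in-memory subset of the database
        self.storage_reads = 0

    def entity(self, eid):
        if eid not in self.cache:      # cache miss: hit the backing store
            self.storage_reads += 1
            self.cache[eid] = self.storage[eid]
        return self.cache[eid]

peer = Peer(storage)
peer.entity("user/1")  # first read goes to storage
peer.entity("user/1")  # second read is served from memory
```

Because the Peer lives inside the application backend, repeated reads never cross the network, which is where the latency improvement comes from.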
Programming languages are dynamically typed or statically typed. In a dynamically typed language, the programmer does not need to declare whether a variable is an integer, string, or other type. In a statically typed language, the developer must declare the type of the variable upfront, so that the compiler can take advantage of that information.

Dynamically typed languages give a programmer flexibility and fast iteration speed. But they also introduce the possibility of errors that can be avoided by performing type checking. This is one of the reasons why TypeScript has risen in popularity, giving developers the option to add types to their JavaScript variables.

Sorbet is a typechecker for Ruby. Sorbet allows for gradual typing of Ruby programs, which helps engineers avoid errors that might otherwise be caused by the dynamic type system. Dmitry Petrashko is an engineer at Stripe who helped build Sorbet. He has significant experience in compilers, having worked on Scala before his time at Stripe. Dmitry joins the show to discuss his work on Sorbet, and the motivation for adding type checking to Ruby.

We're looking for new show ideas, so if you have any interesting topics, please feel free to reach out via twitter or email us at jeff@softwareengineeringdaily.com

We realize right now humanity is going through a hard time with the coronavirus pandemic, but we all have skills useful to fight this battle. Head over to codevid19.com to join the world's largest pandemic hackathon!
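As a rough analogy in Python (Sorbet itself annotates Ruby methods with `sig` blocks), a decorator can verify a method's signature at runtime the way Sorbet's runtime component does. The decorator below is invented for illustration and is not Sorbet's API:

```python
# Hypothetical sig-style decorator: checks argument and return types at call
# time, catching the class of errors a typechecker is meant to prevent.
import functools

def sig(arg_types, return_type):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            for a, t in zip(args, arg_types):
                if not isinstance(a, t):
                    raise TypeError(f"expected {t.__name__}, got {type(a).__name__}")
            result = fn(*args)
            if not isinstance(result, return_type):
                raise TypeError("bad return type")
            return result
        return wrapper
    return decorator

@sig([int, int], int)
def add(a, b):
    return a + b
```

Gradual typing means annotations like this can be added file by file, so a large dynamically typed codebase gets safety incrementally instead of all at once.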
Remote engineering work makes some elements of software development harder, and some elements easier. With Slack and email, communication becomes more clear-cut. Project management tools lay out the responsibilities and deliverables of each person. GitHub centralizes and defines the roles of developers.

On the other hand, remote work removes the nuance of in-person conversation. There is no water cooler or break room. Work can become systematic, rigid, and completely transactional. Your co-workers are your allies, but they feel less like friends when you don't see them every day. For some people, this can have a devastating long-term impact on their psyche.

Managers have the responsibility of ensuring the health and productivity of the people that work with them. Managing an all-remote team includes a different set of challenges than an in-person team. Ryan Chartrand is the CEO of X-Team, a team of developers who work across the world and collaborate with each other remotely. X-Team partners with large companies who need additional development work. Ryan joins the show to talk about the dynamics of leading a large remote workforce, as well as his own personal experiences working remotely.
Food delivery apps have changed how the restaurant world operates. After seven years of mobile food delivery, the volume of food ordered through these apps has become so large that entire restaurants can be sustained solely through the order flow that comes in from the apps. This raises the question as to why you even need an “on-prem” restaurant.

A cloud kitchen is a large, shared kitchen where food is prepared for virtual restaurants. These virtual restaurants exist only on mobile apps. There are no waiters, there are only the food delivery couriers who pick up the food from these warehouse-sized food preparation facilities. A virtual restaurant entrepreneur could open up multiple restaurants operated from the same cloud kitchen. The mobile app user might see separate restaurant listings for a pizza place, a cookie bakery, and a Thai food restaurant, when all of them are operated by the same restaurateur.

Ashley Colpaart is the founder of The Food Corridor, a system for cloud kitchen management. Ashley joins the show to talk about the dynamics of virtual restaurants and the cloud kitchen industry.
Modern web development involves a complicated toolchain for managing dependencies. One part of this toolchain is the bundler, a tool that puts all your code and dependencies together into static asset files. The most popular bundler is webpack, which was originally released in 2012, before browsers widely supported ES Modules.

Today, every major browser supports the ES Module system, which improves the efficiency of JavaScript dependency management. Snowpack is a system for managing dependencies that takes advantage of the browser support for ES Modules. Snowpack is made by Pika, a company that is developing a set of web technologies including a CDN, a package catalog, and a package code editor.

Fred Schott is the founder of Pika and the creator of Snowpack. Fred joins the show to talk about his goals with Pika, and the ways in which modern web development is changing.
Facebook Messenger is a chat application that millions of people use every day to talk to each other. Over time, Messenger has grown to include group chats, video chats, animations, facial filters, stories, and many more features. Messenger is a tool for utility as well as for entertainment.

Messenger is used both on mobile and on desktop, but the size of the application is particularly important on mobile. There are many users whose devices do not have much storage space.

As Messenger has accumulated features, the iOS codebase has grown larger and larger. Several generations of Facebook engineers have rotated through the company with the responsibility of working on Facebook Messenger, which has led to different ways of managing information within the same codebase. The iOS codebase had room for improvement.

Project Lightspeed was a project within Facebook that had the goal of making Messenger on iOS much smaller. Mohsen Agsen is an engineer with Facebook, and he joins the show to talk about the process of rewriting the Messenger app.
Cortico is a non-profit that builds audio tools to improve public dialogue. Allison King is an engineer at Cortico, and she joins the show to talk about the process of building audio applications. One of these applications was a system for ingesting radio streams, transcribing the radio, and looking for duplicate information across the different radio stations. In a talk at Data Council, Allison talked through the data engineering architecture for processing these radio streams, and the patterns that she found across the radio streams, including clusters of political leanings.

Another project from Cortico is called Local Voices Network. The Local Voices Network is built around a piece of hardware called a “digital hearth”, a specialized device that records discussions among people in a community. These community discussions are made available to journalists, public officials, and political candidates, creating a listening channel that connects these communities and stakeholders. Much of our conversation is focused on the engineering of the digital hearth, this device that sits in the center of community discussions.
Serverless tools have come a long way since the release of AWS Lambda in 2014. Serverless apps were originally architected around Lambda, with the functions-as-a-service being used to glue together larger pieces of functionality and API services.

Today, many of the common AWS services such as API Gateway and DynamoDB have functionality built in to be able to respond to events. These services can use Amazon EventBridge to connect to each other. In many cases, a developer does not need AWS Lambda to glue services together in order to build an event-driven application.

Jeremy Daly is the host of the Serverless Chats podcast, a show about patterns and strategies in serverless architecture. Jeremy joins the show to talk about modern serverless development, and the new tools available in the AWS ecosystem.
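The rule-based routing that EventBridge provides can be sketched as a small event bus. The event shapes and rule format here are invented and do not follow EventBridge's actual event-pattern syntax:

```python
# Toy event bus: rules match on event attributes and deliver the event
# straight to a target, with no glue function in between.
class EventBus:
    def __init__(self):
        self.rules = []  # (pattern, target) pairs

    def add_rule(self, pattern, target):
        self.rules.append((pattern, target))

    def put_event(self, event):
        for pattern, target in self.rules:
            if all(event.get(k) == v for k, v in pattern.items()):
                target(event)

received = []
bus = EventBus()
bus.add_rule({"source": "orders", "type": "OrderPlaced"}, received.append)
bus.put_event({"source": "orders", "type": "OrderPlaced", "id": 7})
bus.put_event({"source": "billing", "type": "InvoicePaid"})  # no rule matches
```

The target here is a plain callable; in the managed version it would be a queue, a workflow, or another service, with no Lambda function acting as glue.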
V8 is the JavaScript engine that powers Chrome. Every popular website makes heavy use of JavaScript, and V8 manages the execution environment of that code. The code that runs in your browser can execute faster or slower depending on how “hot” the codepath is. If a certain line of code is executed frequently, that code might be optimized to run faster.

V8 is running behind the scenes in your browser all the time, evaluating the code in your different tabs and determining how to manage that runtime in memory. As V8 is observing your code and analyzing it, V8 needs to allocate resources in order to determine what code to optimize. This process can be quite memory intensive, and can add significantly to the memory overhead of Chrome.

Ross McIlroy is an engineer at Google, where he worked on a project called V8 Lite. The goal of V8 Lite was to significantly reduce the execution overhead of V8. Ross joins the show to talk about JavaScript memory consumption, and his work on V8 Lite.

We have done some great shows on JavaScript in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about JavaScript, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.
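The “hot codepath” idea can be caricatured in a few lines: count how often a code path runs, and promote it to an optimized tier once it crosses a threshold. Real V8 tiering is far more involved; the threshold and tier names below are invented:

```python
# Toy sketch of profile-guided tiering, in the spirit of a JIT engine:
# frequently executed functions get promoted to an "optimized" tier.
HOT_THRESHOLD = 3  # hypothetical; real engines use richer heuristics
counts = {}
tier = {}  # function name -> "interpreted" or "optimized"

def run(name):
    counts[name] = counts.get(name, 0) + 1
    if counts[name] >= HOT_THRESHOLD:
        tier[name] = "optimized"
    else:
        tier.setdefault(name, "interpreted")

for _ in range(5):
    run("render_frame")   # hot: crosses the threshold
run("rarely_called")      # cold: stays interpreted
```

The bookkeeping itself (counters, profiles, compiled code) costs memory, which is exactly the overhead the episode says V8 Lite set out to reduce.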
Building a game is not easy. The development team needs to figure out a unique design and gameplay mechanics that will attract players. There is a great deal of creative work that goes into making a game successful, and these games are often built with low budgets by people who are driven by the art and passion of game creation.

A game engine is a system used to build and run games. Game engines let the programmer work at a high level of abstraction, by providing interfaces for graphics, physics, and scripting. Popular game engines include Unreal Engine and Unity, both of which require a license that reduces the amount of money received by the game developer.

Godot is an open source and free to use game engine. The project was started by Juan Linietsky, who joins the show to discuss his motivation for making Godot.

We have done some great shows on gaming in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about game development, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.
Kafka is a distributed stream processing system that is commonly used for storing large volumes of append-only event data. Kafka has been open source for almost a decade, and as the project has matured, it has been used for new kinds of applications. Kafka's pubsub interface for writing and reading topics is not ideal for all of these applications, which has led to the creation of ksqlDB, a database system built for streaming applications that uses Kafka as the underlying infrastructure for storing data.

Michael Drogalis is a principal product manager at Confluent, where he helped develop ksqlDB. Michael joins the show to discuss ksqlDB, including the architecture, the query semantics, and the applications which might want a database that focuses on streams.

We have done many great shows on Kafka in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about Kafka, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.
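The core idea of a streaming database can be sketched as an append-only log plus an incrementally maintained aggregate. This toy stands in for a Kafka topic and a ksqlDB table; the pageview events are made up:

```python
# Append-only log (standing in for a Kafka topic) plus a materialized view
# that is updated incrementally as each event arrives.
log = []   # the immutable event history
view = {}  # materialized view: page -> visit count

def append_event(event):
    """Append to the log and incrementally update the aggregate."""
    log.append(event)
    page = event["page"]
    view[page] = view.get(page, 0) + 1

for e in [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]:
    append_event(e)
# view now plays the role of a continuously updated table, roughly what a
# query like SELECT page, COUNT(*) ... GROUP BY page maintains in ksqlDB.
```

The log remains the source of truth, so the view can always be rebuilt by replaying it, which is the property that makes Kafka a workable storage layer for a database.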
A workflow is an application that involves more than just a simple request/response communication. For example, consider a session of a user taking a ride in an Uber. The user initiates the ride, and the ride might last for an hour. At the end of the ride, the user is charged for the ride and sent a transactional email.

Throughout this entire ride, there are many different services and database tables being accessed across the Uber infrastructure. The transactions across this infrastructure need to be processed despite server failures which may occur along the way. Workflows are not just a part of Uber. Many different types of distributed operations at a company might be classified as a workflow: banking operations, spinning up a large cluster of machines, performing a distributed cron job.

Maxim Fateev is the founder of Temporal.io, and the co-creator of Cadence, a workflow orchestration engine. Maxim developed Cadence when he was at Uber, seeing the engineering challenges that come from trying to solve the workflow orchestration problem. Before Uber, Maxim worked at AWS on the Simple Workflow Service, which was also a system for running workflows. Altogether, Maxim has developed workflow software for more than a decade.
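The heart of workflow orchestration, durable progress that survives failures and retries, can be sketched like this. A real engine such as Cadence or Temporal persists the progress record and replays history; the ride-sharing step names below are illustrative:

```python
# Sketch of durable workflow execution: each completed step is recorded, so
# a retry after a failure resumes where it left off instead of re-running
# finished steps. A real engine persists `completed` outside the process.
completed = set()

def run_step(name, fn):
    if name in completed:  # already done: skip on retry
        return
    fn()
    completed.add(name)

attempts = {"charge": 0}

def charge_rider():
    attempts["charge"] += 1
    if attempts["charge"] == 1:  # simulate a transient server failure
        raise RuntimeError("payment service unavailable")

def run_workflow():
    run_step("start_ride", lambda: None)
    run_step("charge", charge_rider)
    run_step("send_receipt", lambda: None)

try:
    run_workflow()  # first attempt fails at the charge step
except RuntimeError:
    pass
run_workflow()      # retry completes without redoing start_ride
```

The important property is that the rider is only charged once even though the workflow ran twice, which is what makes retries safe across server failures.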
Machine learning models require the use of training data, and that data needs to be labeled. Today, we have high quality data infrastructure tools such as TensorFlow, but we don't have large high quality data sets. For many applications, the state of the art is to manually label training examples and feed them into the training process.

Snorkel is a system for scaling the creation of labeled training data. In Snorkel, human subject matter experts create labeling functions, and these functions are applied to large quantities of data in order to label it. For example, if I want to generate training data about spam emails, I don't have to hire 1000 email experts to look at emails and determine if they are spam or not. I can hire just a few email experts, and have them define labeling functions that can indicate whether an email is spam. If that doesn't make sense, don't worry. We discuss it in more detail in this episode.

Braden Hancock works on Snorkel, and he joins the show to talk about the labeling problems in machine learning, and how Snorkel helps alleviate those problems. We have done many shows on machine learning in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about machine learning, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.
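A minimal sketch of labeling functions for the spam example, assuming a simple majority vote (Snorkel's actual label model instead weights each function by its estimated accuracy):

```python
# Each expert-written heuristic votes SPAM, NOT_SPAM, or ABSTAIN; the votes
# are combined to label data at scale. The heuristics here are made up.
SPAM, NOT_SPAM, ABSTAIN = 1, 0, -1

def lf_has_free_money(email):
    return SPAM if "free money" in email.lower() else ABSTAIN

def lf_known_sender(email):
    return NOT_SPAM if email.endswith("(from: alice@example.com)") else ABSTAIN

def lf_many_exclamations(email):
    return SPAM if email.count("!") >= 3 else ABSTAIN

lfs = [lf_has_free_money, lf_known_sender, lf_many_exclamations]

def label(email):
    """Combine the labeling functions' votes by simple majority."""
    votes = [lf(email) for lf in lfs]
    spam, not_spam = votes.count(SPAM), votes.count(NOT_SPAM)
    if spam == not_spam:
        return ABSTAIN
    return SPAM if spam > not_spam else NOT_SPAM

labels = [label("FREE MONEY now!!!"), label("Lunch? (from: alice@example.com)")]
```

Three functions written once can label millions of emails, which is why a few experts can replace an army of manual annotators.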
When a developer spins up a virtual machine on AWS, that virtual machine could be purchased using one of several types of cost structures. These cost structures include on-demand instances, spot instances, and reserved instances.

On-demand instances are often the most expensive, because the developer gets reliable VM infrastructure without committing to long-term pricing. Spot instances are cheap, spare compute capacity with lower reliability, available across AWS infrastructure. Reserved instances allow a developer to purchase longer-term VM contracts for a lower price.

Reserved instances can provide significant savings, but it can be difficult to calculate how much infrastructure to purchase. Aran Khanna is the founder of Reserved.ai, a company that builds cost management tools for AWS. He joins the show to talk about the landscape of cost management, and what he is building with Reserved.ai.
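The purchasing question reduces to a break-even calculation: a reservation pays off only if the instance runs more than a certain fraction of the term. The prices below are hypothetical, not actual AWS rates:

```python
# Back-of-the-envelope reserved-vs-on-demand comparison with made-up prices.
on_demand_per_hour = 0.10  # pay only for hours actually used
reserved_per_hour = 0.06   # paid for every hour of the term

def break_even_utilization(on_demand, reserved):
    """Fraction of hours the VM must run for the reservation to pay off."""
    return reserved / on_demand

def cheaper_option(utilization, on_demand, reserved):
    on_demand_cost = utilization * on_demand
    return "reserved" if reserved < on_demand_cost else "on-demand"

threshold = break_even_utilization(on_demand_per_hour, reserved_per_hour)
```

With these numbers the break-even point is 60% utilization, and the hard part in practice is forecasting utilization across a whole fleet, which is the problem cost-management tooling addresses.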
Data analysts need to collaborate with each other in the same way that software engineers do. They also need a high quality development environment. These data analysts are not working with programming languages like Java and Python, so they are not using an IDE such as Eclipse. Data analysts predominantly use SQL, and the tooling for a data analyst to work with SQL is often a SQL explorer tool that lacks the kind of collaborative experience that we would expect in the age of Slack and GitHub.

Rahil Sondhi is the creator of PopSQL, a collaborative SQL explorer. He created PopSQL after several years in the software industry, including 4 years at Instacart. Rahil joins the show to talk about the frictions that data analysts encounter when working with databases, and how those frictions led to the design of PopSQL.
Ceph is a storage system that can be used for provisioning object storage, block storage, and file storage. These storage primitives can be used as the underlying medium for databases, queueing systems, and bucket storage. Ceph is used in circumstances where the developer may not want to use public cloud resources like Amazon S3.

As an example, consider telecom infrastructure. Telecom companies that have their own data centers need software layers which make it simpler for the operators and developers that are working with that infrastructure to spin up databases and other abstractions with the same easy experience that is provided by a cloud provider like AWS.

Sage Weil has been a core developer on Ceph since 2005, and the company he started around Ceph sold to Red Hat for $175 million. Sage joins the show to talk about the engineering behind Ceph and his time spent developing companies.
Shopify is a platform for selling products and building a business. It is a large e-commerce company with hundreds of engineers and several different mobile apps. Shopify's engineering culture is willing to adopt new technologies aggressively, trying new tools that might provide significant leverage to the organization.

React Native is one of those technologies. React Native can be used to make cross-platform mobile development easier by allowing code reuse between Android and iOS. React Native was developed within Facebook, and has been adopted by several other prominent technology companies, with varying degrees of success. Many companies have seen improvements to their mobile development and release process. However, in a previous episode, we talked with Airbnb about their adoption of React Native, which was less successful.

Farhan Thawar is a VP of engineering at Shopify. He joins the show to talk about Shopify's experience using React Native, the benefits of cross-platform development, and his perspective on when it is not a good idea to use React Native.
NGINX is a web server that is used as a load balancer, an API gateway, a reverse proxy, and for other purposes. Core application servers such as Ruby on Rails are often supported by NGINX, which handles routing the user requests between the different application server instances. This model of routing and load balancing between different application instances has matured over the last ten years due to an increase in the number of servers, and an increase in the variety of services. A pattern called "service mesh" has grown in popularity and is used to embed routing infrastructure closer to individual services by giving them a sidecar proxy. The application sidecars are connected to each other, and requests between any two services are routed through a proxy. These different proxies are managed by a central control plane which manages policies of the different proxies.

Alan Murphy works at NGINX, and he joins the show to give a brief history of NGINX and how the product has evolved from a reverse proxy and edge routing tool to a service mesh. Alan has worked in the world of load balancing and routing for more than a decade, having been at F5 Networks for many years before F5 acquired NGINX. We also discussed the business motivations behind the merger of those two companies. Full disclosure: NGINX is a sponsor of Software Engineering Daily.
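The load balancing model described above can be sketched in a few lines of Python. This is a conceptual sketch of round-robin request distribution, not NGINX itself; the backend names and request strings are invented for illustration.

```python
from itertools import cycle

class RoundRobinProxy:
    """Minimal sketch of the routing a load balancer performs:
    distribute incoming requests across a pool of application servers."""

    def __init__(self, backends):
        # cycle() walks the backend list forever, one backend per request
        self._pool = cycle(backends)

    def route(self, request):
        backend = next(self._pool)
        return backend, request

proxy = RoundRobinProxy(["app1:3000", "app2:3000", "app3:3000"])
targets = [proxy.route(f"GET /orders/{i}")[0] for i in range(6)]
# Each backend receives every third request in turn.
```

Real proxies layer health checks, weighting, and connection pooling on top of this basic rotation, but the core dispatch loop is the same idea.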
Facebook applications use maps for showing users where to go. These maps can display businesses, roads, and event locations. Understanding the geographical world is also important for performing search queries that take into account a user's location. For all of these different purposes, Facebook needs up-to-date, reliable mapping data.

OpenStreetMap is an open system for accessing mapping data. Anyone can use OpenStreetMap to add maps to their application. The data in OpenStreetMap is crowdsourced by users who submit updates to the OpenStreetMap database. Since anyone can submit data to OpenStreetMap, there is a potential for bad data to appear in the system.

Facebook uses OpenStreetMap for its mapping data, including for important applications where bad data would impact a map user in a meaningfully negative way. In order to avoid this, Facebook builds infrastructure tools to improve the quality of its maps. Saurav Mapatra and Jacob Wasserman work at Facebook on its mapping infrastructure, and join the show to talk about the tooling Facebook has built around OpenStreetMap data.
Zoom video chat has become an indispensable part of our lives. In a crowded market of video conferencing apps, Zoom managed to build a product that performs better than the competition, scaling with high quality to hundreds of meeting participants, and millions of concurrent users.

Zoom's rapid growth in user adoption came from its focus on user experience and video call quality. This focus on product quality came at some cost to security quality. As our entire digital world has moved onto Zoom, the engineering community has been scrutinizing Zoom more closely, and has discovered several places where the security practices of Zoom are lacking.

Patrick Wardle is an engineer with a strong understanding of Apple products. He recently wrote about several vulnerabilities he discovered in Zoom, and joins the show to talk about the security of large client-side Mac applications as well as the specific vulnerabilities of Zoom.
Web development has historically had more work being done on the server than on the client. The observability tooling has reflected this emphasis on the backend. Monitoring tools for log management and backend metrics have existed for decades, helping developers debug their server infrastructure.

Today, web frontends have more work to do. Components in frameworks such as React and Angular might respond quickly without waiting for a network request, with their mutations being processed entirely in the browser. This results in better user experiences, but more work is being done on the client side, away from the backend observability tools.

Matt Arbesfeld is a co-founder of LogRocket, a tool that records and plays back browser sessions and allows engineers to look at those sessions to understand what kinds of issues are occurring in the user's browser. Matt joins the show to talk about the field of frontend monitoring, and the engineering behind his company LogRocket.
NGINX is a web server that can be used to manage the APIs across an organization. Managing these APIs involves deciding on the routing and load balancing across the servers which host them. If the traffic of a website suddenly spikes, the website needs to spin up new replica servers and update the API gateway to route traffic to those new replicas.

Some servers should not be accessible to outside traffic, and policy management is used to configure the security policies of different APIs. And as a company grows, the number of APIs also grows, increasing the complexity of managing routing logic and policies.

Kevin Jones is a product manager with NGINX. He joins the show to discuss how API management has changed with the growth of cloud and mobile, and how NGINX has evolved over that period of time. Full disclosure: NGINX is a sponsor of Software Engineering Daily.
Serverless computing is a way of designing applications in which developers do not directly provision servers or deploy application code to them. Serverless applications are composed of stateless functions-as-a-service and stateful data storage systems such as Redis or DynamoDB. Serverless applications allow the entire architecture to scale up and down, because each component is naturally scalable. And this pattern can be used to create a wide variety of applications. The functions-as-a-service can handle the compute logic, and the data storage systems can handle the storage. But these applications do not give the developer as much flexibility as an ideal serverless system might. The developer would need to use cloud-specific state management systems.

Vikram Sreekanti is the creator of Cloudburst, a system for stateful functions-as-a-service. Cloudburst is architected as a set of VMs that can execute functions-as-a-service that are scheduled onto them. Each VM can utilize a local cache, as well as an autoscaling key-value store called Anna which is accessible to the Cloudburst runtime components. Vikram joins the show to talk about serverless computing and his efforts to build stateful serverless functionality.
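The split between stateless functions and a shared state store can be sketched as follows. This is a plain-Python illustration of the pattern, not the Cloudburst or Anna APIs; all class and function names here are invented.

```python
class KeyValueStore:
    """Stand-in for a shared, autoscaling store (an invented API)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value

class FunctionRuntime:
    """Sketch of stateful functions-as-a-service: the handlers
    themselves hold no state, and read/write through the store."""

    def __init__(self, store):
        self.store = store
        self.functions = {}

    def register(self, name, fn):
        self.functions[name] = fn

    def invoke(self, name, *args):
        # Every invocation receives the store, so any replica of the
        # function can serve any request.
        return self.functions[name](self.store, *args)

def increment(store, key):
    store.put(key, store.get(key, 0) + 1)
    return store.get(key)

store = KeyValueStore()
runtime = FunctionRuntime(store)
runtime.register("increment", increment)
runtime.invoke("increment", "hits")
runtime.invoke("increment", "hits")
```

Because the handler is stateless, the runtime is free to schedule it onto any VM; the store (plus a local cache in Cloudburst's case) is what makes the application stateful.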
Chatbots became widely popular around 2016 with the growth of chat platforms like Slack and voice interfaces such as Amazon Alexa. As chatbots came into use, so did the infrastructure that enabled chatbots. NLP APIs and complete chatbot frameworks came out to make it easier for people to build chatbots.

The first suite of chatbot frameworks were largely built around rule-based state machine systems. These systems work well for a narrow set of use cases, but fall over when it comes to more complex conversations. Rasa was started in 2015, amidst the chatbot fever. Since then, Rasa has developed a system that allows a chatbot developer to train their bot through a system called interactive learning. With interactive learning, I can deploy my bot, spend some time talking to it, and give that bot labeled feedback on its interactions with me. Rasa has open source tools for natural language understanding, dialogue management, and other components needed by a chatbot developer.

Tom Bocklisch works at Rasa, and he joins the show to give some background on the field of chatbots and how Rasa has evolved over time.
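The rule-based state machine approach mentioned above can be sketched in a few lines. The states, intents, and replies here are invented for illustration; the point is that any (state, intent) pair without an explicit rule makes the bot fall over, which is the limitation that machine-learned dialogue management addresses.

```python
# Transition table: (current state, user intent) -> (next state, reply)
TRANSITIONS = {
    ("start", "greet"): ("asked_need", "Hi! What can I help you with?"),
    ("asked_need", "order_pizza"): ("asked_size", "What size would you like?"),
    ("asked_size", "size_large"): ("done", "A large pizza is on its way."),
}

class RuleBasedBot:
    def __init__(self):
        self.state = "start"

    def handle(self, intent):
        key = (self.state, intent)
        if key not in TRANSITIONS:
            # No rule covers this input: the conversation dead-ends.
            return "Sorry, I didn't understand that."
        self.state, reply = TRANSITIONS[key]
        return reply

bot = RuleBasedBot()
bot.handle("greet")        # follows the scripted happy path
bot.handle("order_pizza")
```

Interactive learning, by contrast, replaces the hand-written table with a trained policy that is corrected through labeled feedback on real conversations.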
Python is the most widely used language for data science, and there are several libraries that are commonly used by Python data scientists, including NumPy, pandas, and scikit-learn. These libraries improve the user experience of a Python data scientist by giving them access to high-level APIs.

Data science is often performed over huge datasets, and the data structures that are instantiated with those datasets need to be spread across multiple machines. To manage large distributed datasets, a library such as scikit-learn can use a system called Dask. Dask allows the instantiation of data structures such as a Dask dataframe or a Dask array.

Matthew Rocklin is the creator of Dask. He joins the show to talk about distributed computing with Dask, its use cases, and the Python ecosystem. He also provides a detailed comparison between Dask and Spark, which is also used for distributed data science.
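The core idea behind a Dask dataframe or array is to split one large collection into partitions, apply the same operation to each partition (potentially on different machines), and combine the partial results. That idea can be sketched with plain Python; this is a concept sketch, not the Dask API.

```python
def partition(rows, n):
    """Split rows into n roughly equal chunks. Dask would place these
    partitions on different workers; here they stay in one process."""
    size = (len(rows) + n - 1) // n
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def partitioned_sum(rows, n=4):
    """Map a sum over each partition, then reduce the partial sums,
    the same map/reduce shape Dask uses for aggregations."""
    partials = [sum(chunk) for chunk in partition(rows, n)]
    return sum(partials)

total = partitioned_sum(list(range(100)))
# total == sum(range(100)) == 4950
```

In real Dask code the equivalent would be building a dask.dataframe or dask.array and calling an aggregation followed by `.compute()`, which executes a task graph of exactly these per-partition operations.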
A relational database often holds critical operational data for a company, including user names and financial information. Since this data is so important, a relational database must be architected to avoid data loss.

Relational databases need to be distributed systems in order to provide the fault tolerance necessary for production use cases. If a database node goes down, the database must be able to recover smoothly without data loss, and this requires having all of the data in the database replicated beyond a single node.

If you write to a distributed transactional database, that write must propagate to each of the other nodes in the database. If you read from a distributed database, that read must return the same data that any other database reader would see. These constraints can be satisfied differently depending on the design of the database system. As a result, there is a vast market of distributed databases from cloud providers and software vendors.

CockroachDB is an open source, globally consistent relational database. CockroachDB is heavily informed by Google Spanner, the relational database that Google uses for much of its transactional workloads. Peter Mattis is a co-founder of Cockroach Labs, the company behind CockroachDB, and he joins the show to discuss the architecture of CockroachDB, the process of building a business around a database, and his memories working on distributed systems at Google. Full disclosure: CockroachDB is a sponsor of Software Engineering Daily.
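One classic way to satisfy the read and write constraints above is quorum replication: a write must reach a majority of replicas, and a read consults a majority, so every read quorum overlaps every write quorum and sees the latest write. The sketch below illustrates that overlap in plain Python; it is a toy model, not CockroachDB's actual Raft-based replication.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

class QuorumStore:
    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]
        self.majority = n // 2 + 1
        self.version = 0

    def write(self, key, value):
        self.version += 1
        # A real system retries and tolerates failures; here we simply
        # require acknowledgement from a majority of replicas.
        for replica in self.replicas[: self.majority]:
            replica.data[key] = (self.version, value)

    def read(self, key):
        # Read from a (different) majority and keep the newest version
        # seen; the guaranteed overlap with the write quorum means at
        # least one replica has the latest value.
        seen = [r.data.get(key, (0, None)) for r in self.replicas[-self.majority:]]
        return max(seen)[1]

store = QuorumStore(n=3)
store.write("balance", 100)
store.write("balance", 250)
```

With three replicas, writes land on replicas 0 and 1 while reads consult replicas 1 and 2; replica 1 is the overlap that keeps reads consistent.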
A data workflow scheduler is a tool used for connecting multiple systems together in order to build pipelines for processing data. A data pipeline might include a Hadoop task for ETL, a Spark task for stream processing, and a TensorFlow task to train a machine learning model. The workflow scheduler manages the tasks in that data pipeline and the logical flow between them. Airflow is a popular data workflow scheduler that was originally created at Airbnb. Since then, the project has been adopted by numerous companies that need workflow orchestration for their data pipelines. Jeremiah Lowin was a core committer to Airflow for several years before he identified several features of Airflow that he wanted to change.

Prefect is a dataflow scheduler that was born out of Jeremiah's experience working with Airflow. Prefect's features include data sharing between tasks, task parameterization, and a different API than Airflow. Jeremiah joins the show to discuss Prefect, and how his experience with Airflow led to his current work in dataflow scheduling.
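The scheduler's core job, running tasks in dependency order and (in Prefect's model) passing data between them, can be sketched with the standard library's topological sorter. The tasks and dependency graph here are invented for illustration; this is not the Prefect or Airflow API.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def run_flow(tasks, dependencies):
    """Run tasks in dependency order, handing each task the results of
    its upstream tasks: the data sharing that motivated Prefect.
    `tasks` maps name -> callable; `dependencies` maps name -> upstreams."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = [results[dep] for dep in dependencies.get(name, [])]
        results[name] = tasks[name](*upstream)
    return results

tasks = {
    "extract": lambda: [1, 2, 3],
    "transform": lambda data: [x * 10 for x in data],
    "load": lambda data: f"loaded {len(data)} rows",
}
deps = {"transform": ["extract"], "load": ["transform"]}
results = run_flow(tasks, deps)
# results["load"] == "loaded 3 rows"
```

Airflow's classic XCom mechanism makes this kind of task-to-task data passing awkward; making results first-class flow state was one of the design changes Prefect made.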
A content management system (CMS) defines how the content on a website is arranged and presented. The most widely used CMS is WordPress, the open source tool that is written in PHP. A large percentage of the web consists of WordPress sites, and WordPress has a huge ecosystem of plugins and templates.

Despite the success of WordPress, the JAMStack represents the future of web development. JAM stands for JavaScript, APIs, and Markup. In contrast to the monolithic WordPress deployments, a JAMStack site consists of loosely coupled components. And there are numerous options for a CMS in this environment.

TinaCMS is one such option. TinaCMS is an acronym for "Tina Is Not A CMS", and it is a toolkit for content management. Scott Gallant, Jordan Patterson, and Nolan Phillips work on TinaCMS, and they join the show to explore the topic of content management on the JAMStack.
A data warehouse is a system for performing fast queries on large amounts of data. A data lake is a system for storing high volumes of data in a format that is slow to access. A typical workflow for a data engineer is to pull data sets from this slow data lake storage into the data warehouse for faster querying.

Apache Spark is a system for fast processing of data across distributed datasets. Spark is not thought of as a data warehouse technology, but it can be used to fulfill some of the responsibilities of one. Delta is an open source system for a storage layer on top of a data lake. Delta integrates closely with Spark, creating a system that Databricks refers to as a "data lakehouse."

Michael Armbrust is an engineer with Databricks. He joins the show to talk about his experience building the company, and his perspective on data engineering, as well as his work on Delta, the storage system built for the Spark ecosystem.
We are all living in social isolation due to the quarantine from COVID-19. Isolation is changing our habits and our moods, ravaging the economy, and changing how we work. One positive change is that more people have been reconnecting with their friends and family over frequent calls and video chats.

Isolation is not a normal way for humans to live. We are social animals, and we need social interaction. We've changed how we use Internet products. There has been an evolution of trends in online shopping, social networking, and video communication software.

Courtland Allen is the founder of Indie Hackers and Anurag Goel is the founder of Render, a new cloud provider. Both Courtland and Anurag are friends of mine, and join this episode to talk about how their lives are changing as a result of social isolation.
For many applications, a transactional MySQL database is the source of truth. To make a MySQL database scale, some developers deploy their database using Vitess, a sharding system for MySQL that is commonly deployed on Kubernetes. Jiten Vaidya and Anthony Yeh work at PlanetScale, a company that focuses on building and supporting MySQL databases sharded with Vitess. Their experience comes from working at YouTube, which has a massive, rapidly growing database for storing the information about videos on the site. Sharding is not the only database problem that YouTube faced. Availability was another issue.

At YouTube, the database operators want YouTube's MySQL cluster to be resilient to the failure of an entire data center. Similarly, a developer deploying an important MySQL database to the cloud wants their database to be resilient to the failure of an entire cloud provider. Jiten and Anthony join the show to talk about their work building multicloud support for MySQL, and their process of deploying a consistent MySQL database in Azure, GCP, and AWS.
Redis is an in-memory object storage system that is commonly used as a cache for web applications. This core primitive of in-memory object storage has created a larger ecosystem encompassing a broad set of tools. Redis is also used for creating objects such as queues, streams, and probabilistic data structures.

Machine learning systems also need access to fast, in-memory object storage. RedisAI is a newer module for supporting machine learning tasks. For serverless computing, RedisGears allows for the execution of functions close to your Redis instance. RedisEdge allows for edge computing with Redis.

Alvin Richards returns to the show to discuss the expansion of Redis to becoming a broad suite of in-memory tools, as well as the resiliency properties of Redis and usage patterns for the tool. RedisLabs is a sponsor of Software Engineering Daily, and RedisConf is a virtual conference around Redis that runs May 12-13. If you are interested in Redis, you can check out RedisConf for free by going to RedisConf.com.
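The two primitives mentioned above, a key-value cache and a list-based queue, can be illustrated with a plain-Python stand-in. The command names loosely mirror Redis's SET/GET/LPUSH/RPOP, but this is a toy for illustration, not a Redis client.

```python
from collections import defaultdict, deque

class MiniRedis:
    """Plain-Python stand-in for two Redis primitives: string keys
    (cache) and lists used as queues. Not the real Redis API."""

    def __init__(self):
        self.strings = {}
        self.lists = defaultdict(deque)

    def set(self, key, value):          # cache write
        self.strings[key] = value

    def get(self, key):                 # cache read
        return self.strings.get(key)

    def lpush(self, key, value):        # producer appends to the left
        self.lists[key].appendleft(value)

    def rpop(self, key):                # consumer pops from the right (FIFO)
        queue = self.lists[key]
        return queue.pop() if queue else None

r = MiniRedis()
r.set("session:42", "alice")       # cache usage
r.lpush("jobs", "resize-image")    # queue usage: a producer enqueues...
job = r.rpop("jobs")               # ...and a worker dequeues
```

The real Redis adds persistence, expiry, replication, and the module ecosystem (RedisAI, RedisGears, RedisEdge) on top of primitives shaped like these.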
"Data stream" is a term that can be used in multiple ways. A stream can refer to data in motion or data at rest. When a stream is data in motion, an endpoint is receiving new pieces of data on a continual basis. Each new data point is sent over the wire and captured by the other end. Another way a stream can be represented is as a sequence of events that have been written to a storage medium. This is a stream at rest.

Pravega is a system for storing large streams of data. Pravega can be used as an alternative to systems like Apache Kafka or Apache Pulsar. Flavio Junqueira is an engineer at Dell EMC who works on Pravega. He joins the show to talk about the history of stream processing and his work on Pravega.
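The "stream at rest" idea, a durable, append-only sequence of events that consumers read from an offset, can be sketched in a few lines. This is a conceptual model of a log, not the Pravega API.

```python
class Log:
    """A stream at rest: an append-only sequence of events.
    Consumers track an offset, so the stream can be replayed."""

    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1   # offset of the new event

    def read_from(self, offset):
        return self.events[offset:]

log = Log()
for e in ["created", "updated", "deleted"]:
    log.append(e)

# A late consumer replays the whole stream from offset 0; a caught-up
# consumer only reads events after its last committed offset.
history = log.read_from(0)
tail = log.read_from(2)
```

A stream in motion is the same sequence viewed as it arrives; systems like Pravega, Kafka, and Pulsar bridge the two by persisting the in-motion stream as an at-rest log.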
Dropbox is a consumer storage product with petabytes of data. Dropbox was originally started on the cloud, backed by S3. Once there was a high enough volume of data, Dropbox created its own data centers, designing hardware for the express purpose of storing user files. Over the last 13 years, Dropbox's infrastructure has developed hardware, software, networking, data center infrastructure, and operational procedures that make the cloud storage product best in class.

Andrew Fong has been an engineer at Dropbox for 8 years. He joins the show to talk about how the Dropbox engineering organization has changed over that period of time, and what he is doing at the company today.
Social distancing has been imposed across the United States. We are running an experiment unlike anything before it in history, and it is likely to have a lasting impact on human behavior. By looking at location data of how people are moving around today, we can examine the real-world impacts of social distancing.

SafeGraph is a company that provides geospatial location data to be used by developers and researchers. Much of their data is aggregated from cell phone GPS pings which identify where anonymized users are in the world. This data set provides the basis for SafeGraph's social distancing metrics, which measure how frequently people are coming into contact with one another.

Ryan Fox Squire works at SafeGraph, and he returns to the show to discuss social distancing metrics and the research that has come out of studying these metrics.
Infrastructure-as-code tools are used to define the architecture of software systems. Common infrastructure-as-code tools include Terraform and AWS CloudFormation. When infrastructure is defined as code, we can use static analysis tools to analyze that code for configuration mistakes, just as we could analyze a programming language with traditional static analysis tools.

When a developer writes a program, that developer might use static analysis to parse a program for common mistakes: memory leaks, potential null pointers, and security holes. The concept of static analysis can be extended to infrastructure as code, allowing for the discovery of higher level problems such as insecure policies across cloud resources.

Guy Eisenkot is an engineer with Bridgecrew, a company that makes static analysis tools for security and compliance. Guy joins the show to talk about cloud security and how static analysis can be used to improve the quality of infrastructure deployments.
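A static check over infrastructure-as-code amounts to walking the parsed configuration and matching it against policy rules. The sketch below shows one hypothetical rule over a dict that loosely resembles parsed Terraform resources; the resource layout and the rule itself are invented for illustration, and this is not Bridgecrew's implementation.

```python
def check_open_ssh(resources):
    """Hypothetical policy rule: flag any security group whose ingress
    opens port 22 (SSH) to the entire internet."""
    findings = []
    for name, cfg in resources.items():
        for rule in cfg.get("ingress", []):
            if rule.get("port") == 22 and "0.0.0.0/0" in rule.get("cidr_blocks", []):
                findings.append(f"{name}: SSH open to the internet")
    return findings

# Parsed configuration (shape invented to resemble Terraform resources)
resources = {
    "aws_security_group.web": {
        "ingress": [{"port": 443, "cidr_blocks": ["0.0.0.0/0"]}],
    },
    "aws_security_group.bastion": {
        "ingress": [{"port": 22, "cidr_blocks": ["0.0.0.0/0"]}],
    },
}
findings = check_open_ssh(resources)
# Only the bastion group is flagged; HTTPS open to the world is fine.
```

Because the check runs on the configuration rather than the live cloud account, it can gate a pull request before the insecure policy is ever deployed.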
A large software company such as Dropbox is at a constant risk of security breaches. These security breaches can take the form of social engineering attacks, network breaches, and other malicious adversarial behavior. This behavior can be surfaced by analyzing collections of log data.

Log-based threat response is not a new technique. But how should those logs be analyzed? Grapl is a system for modeling log data as a graph, and analyzing that graph for threats based on how nodes in the graph have interacted. By building a graph from log data, Grapl can classify interaction patterns that correspond to threats.

Colin O'Brien is the creator of Grapl, and he joins the show to discuss security, as well as threat detection and response.
A credit score is a rating that allows someone to qualify for a line of credit, which could be a loan such as a mortgage, or a credit card. We are assigned a credit score based on a credit history, which could be related to work history, rental payments, or loan repayments. One problem with the credit scoring system is that it is not internationalized. If I am coming from Brazil, I have a rental history of someone from Brazil. That information does not get naturally ported over to the United States. There needs to be a system for translating a foreign credit history to a US credit history.

Nova Credit is a company that makes a credit passport–a system for allowing users in one geographic location to use the credit history that they have built up to have credit in another location, namely the United States. Brian Regan and Misha Esipov work at Nova Credit, and they join the show to talk about how the company works, and the problem it solves.
Amazon's virtual server instances have come a long way since the early days of EC2. There are now a wide variety of available configuration options for spinning up an EC2 instance, which can be chosen based on the workload that will run on the virtual machine. There are also Fargate containers and AWS Lambda functions, creating even more options for someone who wants to deploy virtualized infrastructure.

The high demand for virtual machines has led to Amazon moving down the stack, designing custom hardware such as the Nitro security chip, and low level software such as the Firecracker virtual machine monitor. AWS has also built Outposts, which allow for on-prem usage of AWS infrastructure.

Anthony Liguori is an engineer at AWS who has worked on a range of virtualization infrastructure: software platforms, hypervisors, and hardware. Anthony joins the show to talk about virtualization at all levels of the stack.
For the last five months, we have been working on a new version of Software Daily, the platform we built to host and present our content. We are creating a platform that integrates the podcast with a set of other features that make it easier to learn from the audio interviews. Software Daily includes the following features:

The world of software is large, and growing bigger every day. Software Daily is a place to explore this world of software companies and projects. If the podcast is a useful resource for you to learn about software, then Software Daily might also provide you with value. This post (and episode) is a brief description of the features that we have built into Software Daily.

If you want to listen to Software Engineering Daily without ads, you can become a paid subscriber, paying $10/month or $100/year by going to softwaredaily.com/subscribe. We now have an RSS feed that paid customers can add to a podcast player like Overcast (on iOS) or Podcast Addict (on Android). You can also listen to the premium episodes using our apps for iOS or Android. Whether you are a listener who is fine with listening to ads, or you are a listener who pays to hear episodes without ads, we are happy to have you tuning in.

Apple Podcasts limits the number of episodes in an RSS feed to 300. The feed with the last 300 episodes is available by searching for Software Daily. In total, we have more than 1200 episodes in our back catalog.

Listeners often want to find all our episodes on React, or Kubernetes, or serverless, or self-driving cars. We have been covering these topics for years, and much of the old content has retained its value. Software Daily allows you to easily find all the episodes relating to a subject that you are interested in. You can also find our most popular episodes, ranked by how people interact with them. Additionally, episode transcripts have interactive features with highlighting, commenting, and discussions.
We want to create a Medium-like experience for the episodes. Software Daily is a place where listeners can write about the topics they are listening to. When you are listening to lots of episodes about a topic such as GraphQL, you may find it useful to write about that topic as a form of active learning. The topic pages also have a Q&A section. Post questions about a topic, or post an answer. Engage in the community dialogue surrounding a topic you are passionate or curious about. If there is a topic you want to write about, check out softwaredaily.com/write.

We will be turning the best written content into short podcast episodes published on the weekends, where we will read your contribution and mention your name. If you write something awesome, we want to turn it into audio for larger distribution.

Every topic on Software Daily has a Q&A section. We have covered lots of niche software companies and open source projects, and on Software Daily we want to collect more information about the world of software with Q&A. If you want to write about a specific company or topic that you heard about on Software Daily, Q&A is also an option. Our goal with Q&A is to provide a companion experience to listening to the podcast. It is not always easy to retain what you hear in a podcast episode. Answering some questions after you listen to an episode can help with that retention.

Are you looking to hire someone specific in the world of software? Post a job on the Software Daily jobs board. We will be announcing some of these jobs on the podcast, especially the more interesting postings, and ones that align with content we are producing.

We appreciate you tuning into Software Daily. We would welcome your feedback, and hope you take the time to check out SoftwareDaily.com.
There are many bad recipe web sites. Every time I navigate to a recipe website, it feels like my browser is filling up with spyware. The page loads slowly, everything seems broken, and I can feel the 25 different JavaScript adtech tags interrupting each other. Whether I am searching for banana bread or a spaghetti sauce recipe, recipe sites usually make me lose my appetite.

Anycart is a recipe platform that allows users to buy all of the ingredients for a recipe and have those ingredients delivered. It's a vertically integrated content site and delivery system. It is also beautifully designed and extremely performant. I learned about it from Zack Bloom, who works at Cloudflare, as he mentioned it as a case study in performance.

Rafael Sanches is a founder of Anycart, and he joins the show to talk about building a recipe delivery service, and the innovations in performance that were necessary to build it.
Matterport is a company that builds 3-D imaging for the inside of buildings, construction sites, and other locations that require a "digital twin." Generating digital images of the insides of buildings has a broad spectrum of applications, and there are considerable engineering challenges in building such a system.

Matterport's hardware stack involves a camera built in-house by the company. The camera can take 360 degree scans of a room, stitch the imagery together, and make the digital twin available on the cloud.

Japjit Tulsi works at Matterport, and he joins the show to discuss 3-D imaging, and his role as CTO of the company.
Customer data infrastructure is a type of tool for saving analytics and information about your customers. The company that is best known in this category is Segment, a very popular API company. This customer data is used for making all kinds of decisions around product roadmap, pricing, and design.

RudderStack is a company built around open source customer data infrastructure. RudderStack can be self-hosted, allowing users to deploy it to their own servers and manage their data however they please. Soumyadeb Mitra is the creator of RudderStack, and he joins the show to talk about the space of customer data infrastructure, and his own company.
Geospatial analytics tools are used to render visualizations for a vast array of applications. Data sources such as satellites and cellular data can gather location data, and that data can be superimposed over a map. A map-based visualization can allow the end user to make decisions based on what they see.

ArcGIS is one of the most widely used geospatial analytics platforms. It is created by ESRI, the Environmental Systems Research Institute, which was started in 1969. Today, ESRI products have 40% of the global market share of geospatial analytics software.

Max Payson is a solutions engineer at ESRI, and he joins the show to talk about applications of ArcGIS, and the landscape of GIS more broadly.
Over the last 5 years, web development has matured considerably. React has become a standard for frontend component development. GraphQL has seen massive growth in adoption as a data fetching middleware layer. The hosting platforms have expanded beyond AWS and Heroku, to newer environments like Netlify and Vercel.

These changes are collectively known as the JAMStack, and they raise the question: how should an app be built today? Can a framework offer guidance for how the different layers of a JAMStack app should fit together?

RedwoodJS is a framework for building JAMStack applications. Tom Preston-Werner is one of the creators of RedwoodJS, as well as the founder of GitHub and Chatterbug, a language learning app. He joins the show to talk about the future of JAMStack development, and his goals for RedwoodJS.
Devices on the edge are becoming more useful with improvements in the machine learning ecosystem. TensorFlow Lite allows machine learning models to run on microcontrollers and other devices with only kilobytes of memory. Microcontrollers are tiny, very low-cost computational devices. They are cheap, and they are everywhere. The low-energy embedded systems community and the machine learning community have come together in a collaborative effort called tinyML. tinyML encompasses improvements in microcontrollers, lighter-weight frameworks, better deployment mechanisms, and greater power efficiency. Zach Shelby is the CEO of Edge Impulse, a company that makes a platform called Edge Impulse Studio. Edge Impulse Studio provides a UI for data collection, training, and device management. As someone building a platform for edge machine learning usability, Zach was a great person to talk to about the state of edge machine learning and his work building a company in the space.
Brex is a credit card company that provides credit to startups, mostly companies that have raised money. Brex processes millions of transactions, and uses the data from those transactions to assess creditworthiness, prevent fraud, and surface insights for the users of its cards. Brex is full of interesting engineering problems. The high volume of transactions requires data infrastructure to support everything coming through the platform. As a credit card company, Brex needs to integrate with credit card networks and banking systems. There are also internal systems for applications such as dispute resolution. Cos Nicolaescu is the CTO at Brex. He joins the show to discuss engineering at Brex, the dynamics of a credit card company, and his strategies around management. It was an instructive look inside a rapidly growing fintech company.
Every software company is a distributed system, and distributed systems fail in unexpected ways. This ever-present tendency for systems to fail has led to the rise of failure testing, otherwise known as chaos engineering. Chaos engineering involves the deliberate failure of subsystems within an overall system to ensure that the system itself can be resilient to these kinds of unexpected failures. Peter Alvaro is a distributed systems researcher who has published papers on a range of subjects, including debugging, failure testing, databases, and programming languages. He works with both academia and industry. Peter joins the show to discuss his research topics and goals.
Kubernetes has become a highly usable platform for deploying and managing distributed systems. The user experience for Kubernetes is great, but is still not as simple as a full-on serverless implementation–at least, that has been a long-held assumption. Why would you manage your own infrastructure, even if it is Kubernetes? Why not use autoscaling Lambda functions and other infrastructure-as-a-service products? Matt Ward is a listener of the show and an engineer at Mux, a company that makes video streaming APIs. He sent me an email that said Mux has been having success with self-managed Kubernetes infrastructure, which they deliberately opted for over a serverless deployment. I wanted to know more about what shaped this decision to opt for self-managed infrastructure, and the costs and benefits that Mux has accrued as a result. Matt joins the show to talk through his work at Mux, and the architectural impact of opting for Kubernetes instead of fully managed serverless infrastructure.
Server infrastructure traditionally consists of monolithic servers containing all of the necessary hardware to run a computer. These different hardware components are located next to each other, and do not need to communicate over a network boundary to connect the CPU and memory. LegoOS is a model for disaggregated, network-attached hardware. LegoOS disseminates the traditional operating system functionalities into loosely-coupled hardware and software components. By disaggregating data center infrastructure, the overall resource usage and failure rate of server infrastructure can be improved. Yiying Zhang is an assistant professor of computer science at UCSD. Her research focuses on operating systems, distributed systems, and datacenter networking. She joins the show to discuss her work and its implications for data centers and infrastructure.
Many data sources produce new data points at a very high rate. With so much data, the issue of data quality emerges. Low quality data can degrade the accuracy of machine learning models that are built around those data sources. Ideally, we would have completely clean data sources, but that's not very realistic. One alternative is a data cleaning system, which can allow us to clean up the data after it has already been generated. HoloClean is a statistical inference engine that can impute, clean, and enrich data. HoloClean is centered around “The Probabilistic Unclean Database Model”, which allows for two systems–an “intension” and a “realizer”–to work together to fill in missing fields and fix erroneous fields in data. HoloClean was created by Theo Rekatsinas, and he joins the show to talk about the problem of fast, unclean data, and his work with HoloClean. We also talk about other problems in machine learning and the engineering workflows around data.
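To make the idea of imputation concrete, here is a toy sketch in Python. It fills a missing field with the most frequently observed value in that column; this is only an illustration of the general problem, not HoloClean's actual probabilistic inference, and the data is made up.

```python
from collections import Counter

def impute(rows, column):
    """Fill missing values in `column` with the most frequent observed value.

    A toy stand-in for statistical imputation: a real system like HoloClean
    learns a probabilistic model over attributes instead of taking the mode.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [dict(r, **{column: r[column] if r[column] is not None else mode})
            for r in rows]

rows = [
    {"city": "Chicago", "state": "IL"},
    {"city": "Chicago", "state": None},   # missing field to be imputed
    {"city": "Chicago", "state": "IL"},
]
clean = impute(rows, "state")
```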
Machine learning workflows have had a problem for a long time: taking a model from the prototyping step and putting it into production is not an easy task. A data scientist who is developing a model is often working with different tools, or a smaller data set, or different hardware than the environment which that model will be deployed to. This problem existed at Uber just as it does at many other companies. Models were difficult to release, iterations were complicated, and collaboration between engineers could never reach a point that resembled a harmonious “DevOps”-like workflow. To address these problems, Uber developed an internal system called Michelangelo. Some of the engineers working on Michelangelo within Uber realized that there was a business opportunity in taking the Michelangelo work and turning it into a product company. Thus, Tecton was born. Tecton is a machine learning platform focused on solving the same problems that existed within Uber. Kevin Stumpf is the CTO at Tecton, and he joins the show to talk about the machine learning problems of Uber, and his current work at Tecton.
A developer querying a database from a backend server typically goes through an ORM or writes raw database queries. Prisma is an alternative to both of these data access patterns, allowing for easier database access through auto-generated, type-safe query building tailored to an existing database schema. By integrating with Prisma, the developer gets a database client with query autocompletion, and an API server with less boilerplate code. Prisma also has a system called Prisma Migrate, which simplifies database and schema migrations. Johannes Schickling is the CEO of Prisma, and he joins the show to talk about the developments at Prisma that have occurred since we last spoke, and where the company is headed.
Uber needs to visualize data on a range of different surfaces. A smartphone user sees cars moving around on a map as they wait for their ride to arrive. Data scientists and operations researchers within Uber study the renderings of traffic moving throughout a city. Data visualization is core to Uber, and the company has developed a stack of technologies around visualization in order to build appealing, highly functional applications. deck.gl is a library for high-performance visualizations of large data sets. luma.gl is a set of components targeting high-performance rendering. These and other tools make up vis.gl, the data visualization technology that powers Uber. Uber's visualization team included Ib Green, who left Uber to co-found Unfolded.ai, a company that builds geospatial analytics products. He joins the show to discuss his work on visualization products and libraries at Uber, as well as the process of taking that work to found Unfolded.ai. Full disclosure: I am an investor in Unfolded.ai.
Kubernetes continues to mature as a platform for infrastructure management. At this point, many companies have well-developed workflows and deployment patterns for working with applications built on Kubernetes. The complexity of some of these deployments may be daunting, and when a new employee joins a company, that employee needs to be onboarded quickly onto the company's custom dev environment. Environment management is not the only issue with Kubernetes development. When a service gets updated, that update needs to be live and usable as fast as possible. When Kubernetes-related errors occur, those problems need to be easily accessible in a UI for triage. Dan Bentley is the CEO of Windmill Engineering, a company that makes a set of Kubernetes tools called Tilt. Dan joins the show to talk about the workflow for deploying Kubernetes infrastructure and the role of Tilt, the product he has been working on.
The life cycle of data management includes data cleaning, extraction, integration, analysis and exploration, and machine learning models. It would be great if all of this data management could be handled with automation, but unfortunately that is not an option. For most applications, data management requires a human in the loop. A human in the loop might be responsible for working in a spreadsheet, or labeling data as a Mechanical Turk worker, or creating an algorithm for data labeling in Snorkel. Data scientists and data analysts are humans in the loop, studying large data sets. Aditya Parameswaran is an assistant professor at UC Berkeley. He studies human-in-the-loop data analytics, and he joins the show to talk about the work and the projects that he is focused on, including DataSpread, an alternative to Excel, and OrpheusDB, a relational database versioning system.
Apache Airflow was released in 2015, introducing the first popular open source solution to data pipeline orchestration. Since that time, Airflow has been widely adopted for dependency-based data workflows. A developer might orchestrate a pipeline with hundreds of tasks, with dependencies between jobs in Spark, Hadoop, and Snowflake. Since Airflow's creation, it has powered the data infrastructure at companies like Airbnb, Netflix, and Lyft. It has also been at the center of Astronomer, a startup that helps enterprises build infrastructure around Airflow. Airflow is used to construct DAGs–directed acyclic graphs for managing data workflows. Maxime Beauchemin is the creator of Airflow. Vikram Koka and Ash Berlin-Taylor work at Astronomer. They join the show to talk about the state of Airflow–the purpose of the project, its use cases, and open source ecosystem.
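The core scheduling idea behind a DAG orchestrator is that a task runs only after all of its upstream dependencies finish. A minimal sketch of that ordering logic in plain Python (this is not Airflow's API, and the task names are hypothetical):

```python
def topological_order(dag):
    """Return a valid execution order for a DAG given as {task: [upstream_tasks]}."""
    order, seen = [], set()

    def visit(task):
        if task in seen:
            return
        seen.add(task)
        for upstream in dag[task]:
            visit(upstream)      # every dependency is ordered before the task itself
        order.append(task)

    for task in dag:
        visit(task)
    return order

# Hypothetical pipeline: extract from two sources, join in Spark, load to Snowflake.
dag = {
    "extract_a": [],
    "extract_b": [],
    "spark_join": ["extract_a", "extract_b"],
    "load_snowflake": ["spark_join"],
}
order = topological_order(dag)
```

Airflow layers a great deal on top of this (scheduling intervals, retries, operators), but the dependency ordering is the heart of it.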
Grafana is an open source visualization and monitoring tool that is used for creating dashboards and charting time series data. Grafana is used by thousands of companies to monitor their infrastructure. It is a popular component in monitoring stacks, and is often used together with Prometheus, Elasticsearch, MySQL, and other data sources. The engineering complexities around building Grafana involve the large number of integrations, the highly configurable ReactJS frontend, and the ability to query and display large data sets. Grafana also must be deployable to cloud and on-prem environments. Torkel Ödegaard is a co-founder of Grafana Labs, and joins the show to talk about his work on the open source project and the company he is building around it.
Cruise is an autonomous car company with a development cycle that is highly dependent on testing its cars–both in the wild and in simulation. The testing cycle typically requires cars to drive around gathering data, and that data to subsequently be integrated into a simulated system called Matrix. With COVID-19, the ability to run tests in the wild has been severely curtailed. Cruise cannot put as many cars on the road, and thus has had to shift much of its testing to rely more heavily on the simulations. Therefore, the simulated environments must be made very accurate, including the autonomous agents such as pedestrians and cars. Tom Boyd is VP of Simulation at Cruise. He joins the show to talk about the testing workflow at Cruise, how the company builds simulation-based infrastructure, and his work managing simulation at the company.
Originally published January 31, 2019. Artificial intelligence is reshaping every aspect of our lives, from transportation to agriculture to dating. Someday, we may even create a superintelligence–a computer system that is demonstrably smarter than humans. But there is widespread disagreement on how soon we could build a superintelligence. There is not even a broad consensus on how we can define the term “intelligence”. Information technology is improving so rapidly we are losing the ability to forecast the near future. Even the most well-informed politicians and business people are constantly surprised by technological changes, and the downstream impact on society. Today, the most accurate guidance on the pace of technology comes from the scientists and the engineers who are building the tools of our future. Martin Ford is a computer engineer and the author of Architects of Intelligence, a new book of interviews with the top researchers in artificial intelligence. His interviewees include Jeff Dean, Andrew Ng, Demis Hassabis, Ian Goodfellow, and Ray Kurzweil. Architects of Intelligence is a privileged look at how AI is developing. Martin Ford surveys these different AI experts with similar questions. How will China's adoption of AI differ from that of the US? What is the difference between the human brain and that of a computer? What are the low-hanging fruit applications of AI that we have yet to build? Martin joins the show to talk about his new book. In our conversation, Martin synthesizes ideas from these different researchers, and describes the key areas of disagreement from across the field.
Originally published June 13, 2019. We are taking a few weeks off. We'll be back soon with new episodes. Machine learning allows software to improve as that software consumes more data. Machine learning is a tool that every software engineer wants to be able to use. Because machine learning is so broadly applicable, software companies want to make the tools more accessible to the developers across the organization. There are many steps that an engineer must go through to use machine learning, and each additional step inhibits the chances that the engineer will actually get their model into production. An engineer who wants to build machine learning into their application needs access to data sets. They need to join those data sets, and load them into a machine (or multiple machines) where their model can be trained. Once the model is trained, it needs to be tested on additional data to ensure quality. If the initial model quality is insufficient, the engineer might need to tweak the training parameters. Once a model is accurate enough, the engineer needs to deploy that model. After deployment, the model might need to be updated with new data later on. If the model is processing sensitive or financially relevant data, a provenance process might be necessary to allow for an audit trail of decisions that have been made by the model. Rob Story and Kelley Rivoire are engineers working on machine learning infrastructure at Stripe. After recognizing the difficulties that engineers faced in creating and deploying machine learning models, Stripe engineers built out Railyard, an API for machine learning workloads within the company. Rob and Kelley join the show to discuss data engineering and machine learning at Stripe, and their work on Railyard.
Originally published November 21, 2019. We are taking a few weeks off. We'll be back soon with new episodes. HTTP is a protocol that allows browsers and web applications to communicate across the Internet. Everyone knows that HTTP is doing some important work, because “HTTP” is at the beginning of most URLs that you enter into your browser. You might be familiar with the request/response model, and HTTP request methods such as GET, PUT, and POST. But unless you have had a reason to learn more about the details of HTTP, you probably don't know much more than that. Julia Evans is a software engineer and writer who creates Wizard Zines, a series of easy-to-read online magazines that explain technical software topics. Julia's zines include “Linux Debugging Tools”, “Help! I Have A Manager!”, and recently “HTTP: Learn your browser's language”. Her zines are a creative, innovative format for describing the world of software engineering while also exploring her own artistic pursuits in writing, design, and illustration. Julia was previously on the show to discuss Ruby profiling, and she returns to the show to discuss HTTP, as well as her creative process and goals with Wizard Zines.
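One of the friendliest facts about HTTP/1.1 is that a request is just lines of text on the wire. A small sketch that builds and then re-parses a minimal GET request (the host and path here are made up for illustration):

```python
def build_request(method, path, host):
    """Serialize a minimal HTTP/1.1 request: request line, a Host header,
    and the blank line that ends the header section."""
    return f"{method} {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"

def parse_request_line(raw):
    """Recover method, path, and protocol version from the first line."""
    method, path, version = raw.split("\r\n")[0].split(" ")
    return method, path, version

raw = build_request("GET", "/zines", "example.com")
parsed = parse_request_line(raw)
```

A real client or server handles much more (chunked bodies, header folding, connection reuse), but this text format is the request/response model the blurb describes.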
Originally published October 24, 2019. We are taking a few weeks off. We'll be back soon with new episodes. Redis is an in-memory database that persists to disk. Redis is commonly used as an object cache for web applications. Applications are composed of caches and databases. A cache typically stores the data in memory, and a database typically stores the data on disk. Memory has significantly faster access times, but is more expensive and is volatile, meaning that if the computer that is holding that piece of data in memory goes offline, the data will be lost. When a user makes a request to load their personal information, the server will try to load that data from a cache. If the cache does not contain the user's information, the server will go to the database to find that information. Alvin Richards is chief product officer with Redis Labs, and he joins the show to discuss how Redis works. We explore different design patterns for making Redis highly available, or using it as a volatile cache, and we talk through the read and write path for Redis data. Full disclosure: Redis Labs is a sponsor of Software Engineering Daily.
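The read path described above is the classic cache-aside pattern. A minimal sketch, with plain dictionaries standing in for Redis and a disk-backed database (the user data is made up):

```python
def get_user(user_id, cache, database, stats):
    """Read path: try the in-memory cache first, fall back to the database,
    then populate the cache so the next read is fast (cache-aside)."""
    if user_id in cache:
        stats["hits"] += 1
        return cache[user_id]
    stats["misses"] += 1
    value = database[user_id]   # slower, durable store
    cache[user_id] = value      # warm the cache for subsequent reads
    return value

database = {"u1": {"name": "Ada"}}
cache, stats = {}, {"hits": 0, "misses": 0}
get_user("u1", cache, database, stats)   # first read misses, loads from the database
get_user("u1", cache, database, stats)   # second read is served from the cache
```

Because the cache is volatile, losing it only costs latency, not data: the database remains the source of truth.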
Originally published April 14, 2017. We are taking a few weeks off. We'll be back soon with new episodes. Facebook's open source projects include React, GraphQL, and Cassandra. These projects are key pieces of infrastructure used by thousands of developers–including engineers at Facebook itself. These projects are able to gain traction because Facebook takes time to decouple the projects from their internal infrastructure and clean up the code before releasing them into the wild. Facebook has high standards for what they are willing to release. Tom Occhino manages the React team at Facebook and works closely with engineers to determine what projects make sense to open source. In this episode, Preethi Kasireddy interviews Tom about how Facebook thinks about open source–what went right with React, why it makes sense for Facebook to continue to release new open source projects, and how full-time employees at Facebook interact with that open source codebase.
Originally published December 20, 2019. We are taking a few weeks off. We'll be back soon with new episodes. freeCodeCamp was started five years ago with the goal of providing free coding education to anyone on the Internet. freeCodeCamp has become the best place to begin learning how to write software. There are many other places that a software engineer should visit on their educational journey, but freeCodeCamp is the best place to start, because it is free, and there are no advertisements. For most people learning to code, the price of that education is important, because they are learning to code to build a new career. It's also important that a new programmer learns from an unbiased source of information, because an ad-supported environment will educate the new programmer towards products that they might not need. freeCodeCamp has not been easy to build. Building freeCodeCamp has required expertise in software engineering, business, media, and community development. The donation-based business model of freeCodeCamp doesn't collect very much money. Why would somebody build a non-profit when they could spend their time building a highly profitable software company? Quincy Larson is the founder of freeCodeCamp, and he joins the show for a special episode about his backstory and the journey to building the best place on the Internet for a new programmer to begin.
Originally published May 2, 2017. We are taking a few weeks off. We'll be back soon with new episodes. A new programmer learns to build applications using data structures like a queue, a cache, or a database. Modern cloud applications are built using more sophisticated tools like Redis, Kafka, or Amazon S3. These tools do multiple things well, and often have overlapping functionality. Application architecture becomes less straightforward. The applications we are building today are data-intensive rather than compute-intensive. Netflix needs to know how to store and cache large video files, and stream them to users quickly. Twitter needs to update user news feeds with a fanout of the president's latest tweet. These operations are simple with small amounts of data, but become complicated with a high volume of users. Martin Kleppmann is the author of Designing Data-Intensive Applications, an O'Reilly book about how to use modern data tools to solve modern data problems. His book includes high-level discussions about architectural strategy, and lower-level discussions, such as how leader election algorithms can create problems for a data-intensive application.
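The news-feed fanout mentioned above is a nice example of a data-intensive tradeoff: do the expensive work at write time so reads stay cheap. A toy sketch of fanout-on-write (not Twitter's implementation; the names are made up):

```python
def post_tweet(author, text, followers, timelines):
    """Fan the new tweet out to every follower's home timeline at write time,
    so reading a home timeline is a cheap list lookup."""
    for follower in followers[author]:
        timelines.setdefault(follower, []).insert(0, (author, text))

followers = {"president": ["alice", "bob"]}
timelines = {}
post_tweet("president", "Hello", followers, timelines)
```

The catch, which the book explores, is that one write from an account with millions of followers becomes millions of timeline inserts, which is why real systems mix fanout-on-write with fanout-on-read for the most-followed accounts.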
Originally published July 25, 2019. We are taking a few weeks off. We'll be back soon with new episodes. Envoy is an open source edge and service proxy that was originally developed at Lyft. Envoy is often deployed as a sidecar application that runs alongside a service and helps that service by providing features such as routing, rate limiting, telemetry, and security policy. Envoy has gained significant traction in the open source community, and has formed the backbone of popular service mesh projects such as Istio. Envoy has been mostly used as a backend technology, but the potential applications of Envoy include frontend client applications as well. The goal of Envoy is to make the network easier to work with–and the network includes client applications such as mobile apps running on a phone. Envoy Mobile is a network proxy for mobile applications. Envoy Mobile brings many of the benefits of Envoy to the mobile client ecosystem. It provides mobile developers with a library that can simplify or abstract away many of the modern advances that have been made in networking in recent years, such as HTTP/2, gRPC, and QUIC. Matt Klein is the creator of Envoy, and he joins the show to discuss Envoy Mobile. Matt describes how the networking challenges of mobile applications are similar to those of backend systems and cloud infrastructure. We discuss the advances in networking technology that Envoy Mobile helps bring to the mobile ecosystem, and also touch on the scalability challenges that Matt is seeing at Lyft.
Originally published October 8, 2019. We are taking a few weeks off. We'll be back soon with new episodes. Video surveillance impacts human lives every day. On most days, we do not feel the impact of video surveillance. But the effects of video surveillance have tremendous potential. It can be used to solve crimes and find missing children. It can be used to intimidate journalists and empower dictators. Like any piece of technology, video surveillance can be used for good or evil. Video recognition lets us make better use of video feeds. A stream of raw video doesn't provide much utility if we can't easily model its contents. Without video recognition, we must have a human sitting in front of the video to manually understand what is going on in that video. Veronica Yurchuk and Kosh Shysh are the founders of Traces.ai, a company building video recognition technology focused on safety, anonymity, and positive usage. They join the show to discuss the field of video analysis, and their vision for how video will shape our lives in the future.
Originally published July 6, 2017. We are taking a few weeks off. We'll be back soon with new episodes. React Native allows developers to reuse components from one user interface on multiple platforms. React Native was introduced by Facebook to reduce the pain of teams who were rewriting their user interfaces for web, iOS, and Android. Nader Dabit hosts React Native Radio, a podcast about React Native. Nader also trains companies to use React Native through his company React Native Training. In this episode, we explore what a developer can and cannot do with React Native, when a developer needs to use native APIs, and some speculation on the future of React Native.
At a customer service center, thousands of hours of audio are generated. This audio provides a wealth of information to transcribe and analyze. With the additional data of the most successful customer service representatives, machine learning models can be trained to identify which speech patterns are associated with a successful worker. By identifying these speaking patterns, a customer service center can continuously improve, with the different representatives learning the different patterns. The same is true for other speech-based tasks, such as sales calls. Cresta is a company that builds systems to ingest high volumes of speech data in order to discover features that correlate with high performance human workers. Zayd Enam is a co-founder of Cresta, and joins the show to talk about the domain of speech data and what he and his team are building at Cresta.
A software company manages and interacts with hundreds of APIs. These APIs require testing, performance analysis, authorization management, and release management. In a word, APIs require collaboration. Postman is a system for API collaboration. It allows users to test APIs with collections of requests, monitor the API responses, and visualize the query results. Users of Postman can collaborate with their team through Team Workspaces, sharing collections, environments, history, and more. Abhinav Asthana is the founder of Postman and he joins the show to talk about API management and collaboration. Abhinav started Postman as a hobby project, and it has grown into a large and successful business, far beyond the original product of API testing.
As a user browses a webpage, that browser session generates events that need to be recorded, validated, enriched, and stored. This data is sometimes called customer data infrastructure, or CDI. This data requires a full stack of different tools: a system on the frontend to collect the data, middleware to transport the data, and backend systems for storing and loading that data into data warehouses and other analytical systems. Snowplow Analytics is a data collection platform for storing events. In Snowplow, modules called Trackers send data to Collectors. The data can then be validated and enriched, and then put into the user's data warehouse via ETL. Alex Dean is the CEO of Snowplow, and he joins the show to talk through the business model, management, and engineering of Snowplow Analytics, as well as the overall data engineering landscape.
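The validate-then-enrich steps in such a pipeline can be sketched in a few lines of Python. The field names and the geo lookup table here are invented for illustration; they are not Snowplow's actual schema or enrichments:

```python
REQUIRED = {"event_id", "event_type", "timestamp"}

def validate(event):
    """An event passes validation only if every required field is present."""
    return REQUIRED <= event.keys()

def enrich(event, geo_lookup):
    """Enrichment adds derived fields; here, a country code from an IP table."""
    return {**event, "country": geo_lookup.get(event.get("ip"), "unknown")}

events = [
    {"event_id": "1", "event_type": "page_view", "timestamp": 1, "ip": "1.2.3.4"},
    {"event_type": "page_view"},   # missing required fields: rejected
]
geo = {"1.2.3.4": "GB"}
good = [enrich(e, geo) for e in events if validate(e)]
```

Running validation before enrichment keeps malformed events out of the warehouse; real pipelines typically route rejects to a dead-letter queue rather than dropping them.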
DynamoDB is a managed NoSQL database service from AWS. It is widely used as a transactional database to fulfill key-value and wide-column data models. In a previous show with Rick Houlihan, we explored how to build a data model and optimize the query patterns for a NoSQL database. Today's show is about DynamoDB specifically: partitioning, indexing, query semantics, normalization, table design, and other subjects. We talk through how to be cost conscious, and how to integrate with event-based AWS Lambda triggers. Alex DeBrie is the author of The DynamoDB Book, a book whose title speaks for itself. Alex has comprehensive experience with DynamoDB, and he joins the show to share that experience through a detailed discussion of use cases and strategies related to DynamoDB.
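A central idea in DynamoDB table design is the composite primary key: a partition key that locates a partition, and a sort key that orders items within it, so that one Query can fetch a whole prefix range. An in-memory sketch of that pattern (this is not the AWS SDK, and the `USER#`/`ORDER#` key formats are conventions made up for the example):

```python
def put_item(table, pk, sk, attrs):
    """Store an item under a composite (partition key, sort key) pair."""
    table.setdefault(pk, {})[sk] = attrs

def query(table, pk, sk_prefix=""):
    """Return all items in one partition whose sort key begins with a prefix,
    mimicking a Query with a begins_with key condition."""
    return [v for k, v in sorted(table.get(pk, {}).items())
            if k.startswith(sk_prefix)]

table = {}
put_item(table, "USER#1", "PROFILE", {"name": "Ada"})
put_item(table, "USER#1", "ORDER#2020-01", {"total": 10})
put_item(table, "USER#1", "ORDER#2020-02", {"total": 20})
orders = query(table, "USER#1", "ORDER#")
```

Putting a user's profile and orders in the same partition is what lets a single request serve a "user page" access pattern without joins.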
Deepgram is an end-to-end deep learning platform for speech recognition. Unlike the general purpose APIs from Google or Amazon, Deepgram models are custom-trained for each customer. Whether the customer is a call center, a podcasting company, or a sales department, Deepgram can work with them to build something specific to their use case. Sound data is incredibly rich. Consider all the features in a voice recording: volume, intonation, inflection. And once the speech is transcribed, there are many more features that can be discovered from the text transcription. Scott Stephenson is the CEO of Deepgram, and he joins the show to talk through end-to-end deep learning for speech, as well as the dynamics of the business and the deployment strategy for working with customers.
The modern release workflow involves multiple stakeholders: engineers, management, designers, and product managers. It is a collaborative process that is often held together with brittle workflows. A developer deploys a new build to an ad hoc staging environment and pastes a link to that environment in Slack. Other stakeholders click on that link, then send messages to each other in Slack, or make comments on the pull request in GitHub. This workflow is far from ideal. Collaborating around pull requests can be made easier with a dedicated set of tools for sharing and discussing those pull requests. This is the goal of FeaturePeek, a system for spinning up dedicated pull request environments, creating screenshots and comments, and reimagining the lifecycle of the release workflow. Eric Silverman is a co-founder of FeaturePeek and he joins the show to discuss release management, the interactions between different stakeholders, and the development of his company. In the previous show about Postman, we explored how API management has become a ripe space for collaboration; the same is proving true of pull requests.
AWS has over 150 different services. Databases, log management, edge computing, and lots of others. Instead of being overwhelmed by all of these products, an engineering team can simplify their workflow by focusing on a small subset of AWS services–the defaults. Daniel Vassallo is the author of The Good Parts of AWS. An excerpt from the book: “The cost of acquiring new information is high and the consequence of deviating from a default choice is low, so sticking with the default will likely be the optimal choice. A default choice is any option that gives you very high confidence that it will work.” Having confidence in your workflow–even if it is a simple workflow–has advantages. S3, EC2, Elastic Load Balancers: for simple web applications, this is really all you need to build your business. Daniel Vassallo worked at AWS for more than 8 years before leaving to become an entrepreneur and author. He joins the show to talk about what the good parts of AWS are, and his strategy for building applications with that subset of services.
Developing machine learning models is not easy. From the perspective of the machine learning researcher, there is the iterative process of tuning hyperparameters and selecting relevant features. From the perspective of the operations engineer, there is a handoff from development to production, and the management of GPU clusters to parallelize model training. In the last five years, machine learning has become easier to use thanks to point solutions: TensorFlow, cloud provider tools, Spark, Jupyter Notebooks. But every company works differently, and there are few hard and fast rules for the workflows around machine learning operations. Determined AI is a platform that provides a means for collaborating around data prep, model development and training, and model deployment. Neil Conway is a co-founder of Determined, and he joins the show to discuss the challenges around machine learning operations, and what he has built with Determined.
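The simplest form of the hyperparameter tuning loop mentioned above is an exhaustive grid search: train and score a model for every combination of parameter values and keep the best. A toy sketch, with a stand-in scoring function in place of a real training run (the parameter names and values are made up):

```python
from itertools import product

def grid_search(train_and_score, grid):
    """Score every hyperparameter combination in the grid; keep the best."""
    best_params, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = train_and_score(params)   # in practice: train a model, evaluate it
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Stand-in for a real training run: the score peaks at lr=0.1, depth=3.
def fake_score(p):
    return -abs(p["lr"] - 0.1) - abs(p["depth"] - 3)

params, score = grid_search(fake_score, {"lr": [0.01, 0.1, 1.0], "depth": [1, 3]})
```

Platforms like Determined exist largely because each `train_and_score` call can take hours on a GPU, so the loop needs to be parallelized, scheduled, and tracked rather than run naively.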
M3 is a scalable metrics database originally built to host Uber's rapidly growing volume of Prometheus metrics. When Rob Skillington was at Uber, he helped design, implement, and deploy M3. Since leaving Uber, he has co-founded a company around a hosted version of M3 called Chronosphere. If you have access to a scalable metrics database, you might as well start accumulating as much data as possible, right? Not exactly. If your company generates enough data, you probably want to turn down the dials on how frequently you save a metric. Downsampling will reduce the amount of money that you pay for these hosted metrics. In today's show, Rob discusses the engineering and deployment of M3, how that work led him to found Chronosphere, and the product offering of the company.
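The downsampling idea can be sketched in a few lines. This is a hypothetical illustration, not M3's actual implementation: raw samples are aggregated into coarser time buckets, keeping one averaged value per bucket, so fewer points need to be stored long-term.

```python
from collections import defaultdict

def downsample(points, resolution):
    """Aggregate (timestamp, value) samples into buckets of `resolution`
    seconds, keeping the mean value per bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - (ts % resolution)].append(value)
    return sorted((ts, sum(vs) / len(vs)) for ts, vs in buckets.items())

# Raw samples arriving every 10-30 seconds, downsampled to one point per minute.
raw = [(0, 1.0), (10, 2.0), (30, 3.0), (60, 4.0), (70, 6.0)]
print(downsample(raw, 60))  # [(0, 2.0), (60, 5.0)]
```

Real systems typically keep several rollups at once (for example mean, max, and count per bucket) so that queries over older data remain meaningful.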
WordPress has been a dominant force in the world of online publishing for many years because of how battle-tested it is. WordPress is the definitive leader in CMS technology. But there have always been alternatives. Drupal, Ghost, and other open source CMSes. More recently, there has been an emergence of the headless CMS, such as Contentful, which decouples the CMS backend from the frontend presentation layer. Strapi is a popular open source headless CMS. Pierre Burgy is the founder of Strapi, and he joins the show to talk about the CMS category, the role that Strapi fills, and the technology behind Strapi.
Netflix runs all of its infrastructure on Amazon Web Services. This includes business logic, data infrastructure, and machine learning. By tightly coupling itself to AWS, Netflix has been able to move faster and have strong defaults about engineering decisions. And today, AWS has such an expanse of services that it can be used as a platform to build custom tools. Metaflow is an open source machine learning platform built on top of AWS that allows engineers at Netflix to build directed acyclic graphs for training models. These DAGs get deployed to AWS as Step Functions, a serverless orchestration platform. Savin Goyal is a machine learning engineer with Netflix, and he joins the show to talk about the machine learning challenges within Netflix, and his experience working on Metaflow. We also talk about DAG systems such as AWS Step Functions and Airflow.
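The core mechanic of a DAG workflow system can be sketched in plain Python. This is a toy illustration, not Metaflow's API: each named step runs only after its upstream dependencies, via a depth-first traversal (cycle detection omitted for brevity).

```python
def run_dag(steps, deps):
    """Execute steps in an order that respects the dependency edges.
    `steps` maps a name to a callable; `deps` maps a name to the names
    of its upstream steps."""
    done, order = set(), []
    def visit(name):
        if name in done:
            return
        for upstream in deps.get(name, []):
            visit(upstream)  # run prerequisites first
        steps[name]()
        done.add(name)
        order.append(name)
    for name in steps:
        visit(name)
    return order

# A toy training pipeline: fetch -> featurize -> train
log = []
steps = {
    "train": lambda: log.append("train"),
    "fetch": lambda: log.append("fetch"),
    "featurize": lambda: log.append("featurize"),
}
deps = {"featurize": ["fetch"], "train": ["featurize"]}
run_dag(steps, deps)
print(log)  # ['fetch', 'featurize', 'train']
```

An orchestrator like Step Functions or Airflow adds retries, state persistence, and distributed execution on top of this basic ordering guarantee.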
A service mesh provides routing, load balancing, policy management, and other features to a set of services that need to communicate with each other. The mesh can simplify operations across these different services by providing an interface to configure them. There are lots of different vendors who offer service mesh technology: AWS has AppMesh, Google has Istio (which is open source), Buoyant has Linkerd (which is also open source), and HashiCorp has Consul Connect. Unfortunately, these service meshes do not all play well together. And at a large enough company, different teams will be setting up different service meshes. So it would be useful for services in those different meshes to be able to communicate with each other. Luke Kysow is an engineer at HashiCorp where he works on Consul Connect, and he joins the show to discuss service mesh usage, adoption, and possible strategies for maintaining multiple service meshes within a single organization.
GitHub has been a social network for developers for many years. Most social networks are centered around mobile applications, but GitHub sits squarely in a developer's browser-based desktop workflow. As a result, the design of a mobile app for GitHub is less straightforward. GitHub did acquire a popular mobile client called GitHawk, which was developed by Ryan Nystrom. Since joining GitHub, Ryan has worked on a new mobile app for GitHub, along with a team of engineers including Brian Lovin. Ryan and Brian both join the show to discuss GitHub mobile, and how they designed, architected, and built the app. There is no company quite like GitHub–a social network combined with a version control system that provides a critical utility. All this made for an interesting episode about a one-of-a-kind mobile product.
Software companies can be funded in a variety of ways: venture capital, self-funding, and debt, among others. In order to receive financing, a company is evaluated on its ability to generate future cash flows. After all, a valuation is a number that summarizes the present value of future cash flows. Determining that valuation number is a complicated, subjective process. If the valuation can be determined more intelligently and objectively, then smarter financing decisions can be made. This is the reasoning behind the company Capital, which aims to build a better modeling system for evaluating companies. Blair Silverberg and Chris Olivares are founders of Capital, and they join the show to explore the modeling process for valuations, and their strategy for doing this with their software models.
ADP has been around for more than 70 years, fulfilling payroll and other human resources services. Payroll processing is a complex business, involving the movement of money in accordance with regulatory and legal strictures. From an engineering point of view, ADP has decades of software behind it, and a bright future as a platform company used by thousands of companies. Balancing the maintenance of old code while charting a course with new projects is not a simple task. Tim Halbur is the CTO of ADP, and he joins the show to talk through how engineering works at ADP, and how the organization builds for the future of the company while maintaining the code of the past.
Managing microservices becomes a challenge as the number of services within the organization grows. With that many services come more interdependencies–downstream and upstream services that may be impacted by an update to your service. One solution to this problem: a dashboard and newsfeed system that lets you see into the health and changes across your services. With this kind of system, you can avoid accidentally shipping code that will impact other service owners. It can also help with testing, giving you an end-to-end picture for how a test can impact other services. Anish Dhar and Ganesh Datta are co-founders of Cortex, a system for managing your services. Anish and Ganesh join the show to talk about their work building Cortex, and the value that it provides to the companies that use it. In a previous show we covered a company called Effx, which does something similar.
Users do not use web applications in the way that you might expect. And it is not easy to get the data necessary to see the full picture. But a newer browser API makes this more possible by capturing DOM mutations. These captured mutations can be stored, then retrieved and replayed later. That allows for comprehensive frontend monitoring, which has been built into a product called FullStory. Michael Morrissey is the CTO of FullStory, and he joins the show to talk about how session capture works, and the architecture of FullStory–how sessions get saved, stored, and retrieved. In a previous show we talked about LogRocket, a product which does something similar.
A large codebase cannot be searched with naive indexing algorithms. In order to search through a codebase the size of Uber's, it is necessary to build a much more sophisticated indexing system than simple plain-text search. SourceGraph is a system for universal code search. It allows developers to more easily onboard to a new codebase, make large refactors, and perform other tasks. SourceGraph can integrate with source control systems, IDEs, and other tools to fit comfortably into an engineer's workflow. Beyang Liu is a co-founder of SourceGraph and he joins the show to talk about how codebases can become large and unwieldy, and the tooling that SourceGraph offers to make these codebases easier to work with.
Pandas is a Python data analysis library, and an essential tool in data science. Pandas allows users to load large quantities of data into a data structure called a dataframe, over which the user can call mathematical operations. When the data fits entirely into memory this works well, but sometimes there is too much data for a single box. The Modin project scales Pandas workflows to multiple machines by utilizing Dask or Ray, which are distributed computing primitives for Python programs. Modin builds an execution plan that distributes operations on large dataframes across machines, which makes data science considerably easier for these large data sets. Devin Petersohn started the Modin project, and he joins the show to talk about data science with Python, and his work in the Berkeley RISELab.
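The underlying pattern can be illustrated with a toy sketch (this is not Modin's implementation, which builds on Ray or Dask): split a column into partitions, operate on each partition in parallel, and combine the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

def partitioned_sum(column, n_partitions=4):
    """Split a column into chunks, sum each chunk in parallel, then
    combine the partial results -- the map/reduce pattern that
    distributed dataframe systems apply to much richer operations."""
    size = max(1, len(column) // n_partitions)
    chunks = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)

print(partitioned_sum(list(range(1, 101))))  # 5050
```

In actual Modin usage the partitioning is invisible: the intent is that swapping `import pandas as pd` for `import modin.pandas as pd` leaves the rest of the workflow unchanged.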
Ray is a general purpose distributed computing framework. At a low level, Ray provides fault-tolerant primitives that support applications running across multiple processors. At a higher level, Ray supports scalable reinforcement learning, including the common problem of hyperparameter tuning. In a previous episode, we explored the primitives of Ray as well as Anyscale, the business built around Ray and reinforcement learning. In today's episode, Richard Liaw explores some of the libraries and applications that sit on top of Ray. RLlib gives APIs for reinforcement learning such as policy serving and multi-agent environments. Tune gives developers an easy way to do scalable hyperparameter tuning, which is necessary for exploring different types of deep learning configurations. In a future show, we will explore Tune in more detail.
Acquisitions are part of the technology industry. A successful corporation will often have an “exit”, either going public or becoming acquired. And with each of these corporations, there is a set of stories that narrate the company from beginning to end. Acquired is a podcast that tells the stories of companies such as YouTube, Instagram, and PayPal. During each episode, the life of a company is explored from its beginning to its end. Media companies, chip companies, and software companies all take the center stage on various episodes. David Rosenthal and Ben Gilbert are the hosts of Acquired, and they join today's show to talk about the podcast they started, a few business stories, and the podcast industry itself.
Across a company, there is a wide range of resources that employees need access to. Documents, S3 buckets, git repositories, and many others. As access to resources changes across the organization, a history of the changes to permissions can be useful for compliance and monitoring. Indent is a system for simplifying access management across infrastructure. Indent allows users within an organization to request access to resources, and keeps logs of the changes to who can access those resources. Fouad Matin and Dan Gillespie are the founders of Indent, and they join the show to talk through the application of access control management, and the architecture of Indent itself, which has numerous interesting engineering decisions within it.
Drug trials can lead to new therapeutics and preventative medications being discovered and placed on the market. Unfortunately, these drug trials typically require animal testing. This means animals are killed or harmed as a result of needing to verify that a drug will not kill humans. Animal testing is unavoidable, but the extent to which testing needs to occur can be reduced by using machine learning models that simulate the effects of a drug on the human body. If the simulated effect is negative enough, the animal test doesn't need to be run, and no animals need to be harmed. Bryan Vicknair and Jason Walsh work at VeriSIM Life, a company which makes software simulations of animals. These simulations can be used to model drug testing, and change the workflow for drug trials. They join the show to talk through the mechanics of drug testing, and how VeriSIM Life fits into that workflow.
Dev.to has become one of the most popular places for developers to write about engineering, programming languages, and everyday life. For those who have not seen it, DEV is like a cross between Twitter and Medium, but targeted at developers. The content on DEV ranges from serious to humorous to technically useful. DEV contains a set of features which appeal to a developer community, such as the ability to embed code snippets in a post, but for the most part the entire app is generalizable to other types of communities. Hence, the motivation for “Forem”. Forem is an open source project to make it possible to spin up instances of communities that are like DEV, but for other communities such as mixed martial arts, or doctors. Ben Halpern is the creator of DEV and Forem, and he joins the show to talk about the DEV Community and his long-term goals for what the DEV team is building.
Logs are the source of truth. If a company is sufficiently instrumented, the logging data that streams off of the internal infrastructure can be refined to tell a comprehensive story for what is changing across that infrastructure in real time. This includes logins, permission changes, and other events that could signal a potential security compromise. Datadog is a company that was built around log management, metrics storage, and distributed tracing. More recently, they have also built tools for monitoring the security of an organization. Detecting security threats can be achieved by alerting on known security risks, or pieces of information that could be indicative of a vulnerability. Marc Tremsal works at Datadog, and joins the show to talk through security monitoring. Full disclosure: Datadog is a sponsor of Software Engineering Daily.
The US Army Cyber School is a training program which trains cyber soldiers and leaders to be adept in cyber military strategy and tactics. In order to teach these skills, the cyber school uses a system they call “courseware as code”, a workflow that allows updates to the curriculum in a reversion-friendly fashion similar to infrastructure-as-code. Ben Allison teaches at the US Army Cyber School and has put work into developing the training program and ongoing lesson plans. Ben joins the show to talk about how the US Army manages curriculum through courseware as code, and the work he has done to improve this workflow over time. Ben is also speaking at GitLab Commit 2020, GitLab's upcoming conference. You can register for GitLab Commit yourself by going to softwareengineeringdaily.com/gitlabcommit.
Business intelligence tooling allows analysts to see large quantities of data presented to them in a flexible interface including charts, graphs, and other visualizations. BI tools have been around for decades, and as the world moves increasingly toward open source software, business intelligence tools are following that trend. Metabase is an open source business intelligence system that has been widely adopted by enterprises. It includes all the common tools that are expected from a business intelligence system: large-scale data ingestion, visualization software, and a flexible user interface. Sameer Al-Sakran is the CEO of Metabase and he joins the show to talk about Metabase's design, engineering, and usage.
Image annotation is necessary for building supervised learning models for computer vision. An image annotation platform streamlines the annotation of these images. Well-known annotation platforms include Scale AI, Amazon Mechanical Turk, and CrowdFlower. There are also large consulting-like companies that will annotate images in bulk for you. If you have an application that requires lots of annotation, such as self-driving cars, then you might be compelled to outsource this annotation to such a company. SuperAnnotate is an image annotation platform that can be used by these image annotation outsourcing firms. This episode explores SuperAnnotate, and the growing niche of image annotation. Vahan and Tigran Petrosyan are the founders of SuperAnnotate, and they join the show for today's interview.
Chatbots are useful for developing well-defined applications such as first-contact customer support, sales, and troubleshooting. But the potential for chatbots is so much greater. Over the last five years, numerous platforms have arisen to allow for better, more streamlined chatbot creation. Dialogue software enables the creation of sophisticated chatbots. ParlAI is a dialogue platform built inside of Facebook that allows for the development of dialogue models. These chatbots can “remember” information from session to session, and continually learn from user input. Stephen Roller is an engineer who helped build ParlAI, and he joins the show to discuss the history of chatbot applications and what the Facebook team is trying to accomplish with the development of ParlAI.
Every software company works off of several different development environments–at the very least there is staging, testing, and production. Every push to staging can be spun up as an application to be explored, tinkered with, and tested. These ad hoc spin-ups are known as release apps. A release app is an environment for engineers to play with, and potentially throw away or promote to production. Release apps have been made easier due to technologies such as infrastructure-as-code, continuous integration, and Kubernetes. Tommy McClung is the co-founder of Release App, a company that makes it easy to spin up release environments for your software. Tommy joins the show to discuss release workflows, and his work building Release App.
Code is version controlled through git, the version control system originally built to manage the Linux codebase. For over a decade, software has been developed using git for version control. More recently, data engineering has become an unavoidable facet of software development. It is reasonable to ask–why are we not version controlling our data? Dmitry Petrov is the founder of Iterative.ai, a company for collaborating on and version controlling data sets. Dmitry joins the show to talk about how data version control works, and Iterative.ai, the company he is building around dataset management and collaboration.
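The core mechanic of data version control can be sketched in a few lines. This is a hypothetical illustration, not Iterative.ai's actual file format: hash the dataset, store the bulky file in a content-addressed cache, and commit only the small hash pointer to git.

```python
import hashlib
import os

def snapshot(data_path, cache_dir):
    """Hash a data file, copy it into a content-addressed cache, and
    return a small pointer string suitable for committing to git."""
    with open(data_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    os.makedirs(cache_dir, exist_ok=True)
    cached = os.path.join(cache_dir, digest)
    if not os.path.exists(cached):
        with open(data_path, "rb") as src, open(cached, "wb") as dst:
            dst.write(src.read())
    return digest

# The pointer (a hash) is tiny and lives in version control; the bulky
# data lives in the cache (or remote object storage) keyed by that hash,
# so checking out an old commit can restore the matching dataset.
```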
As software permeates our lives, there are a growing number of situations where the legal system must be designed to account for that software. Whether the issues are open source licensing, cryptocurrencies, or worker classifications, software overlaps heavily with the law. Just as software is crafted by engineers, the legal structure around software is crafted by lawyers. There are large law firms that have built their business by knowing how to navigate these software and business questions. Mark Radcliffe is a lawyer who has been working with software companies for decades. He joins the show to talk about the intersection of software and the law, which we discuss from multiple points of view.
CrowdFlower was a company started in 2007 by Lukas Biewald, an entrepreneur and computer scientist. CrowdFlower solved some of the data labeling problems that were not being solved by Amazon Mechanical Turk. A decade after starting CrowdFlower, the company was sold for several hundred million dollars. Today, data labeling has only grown in volume and scope. But Lukas has moved on to a different part of the machine learning stack: tooling for hyperparameter search and machine learning monitoring. Lukas Biewald joins the show to talk about the problems he was solving with CrowdFlower, the solutions that he developed as part of that company, and the efforts with his current focus: Weights and Biases, a machine learning tooling company.
Anduril is a technology defense company with a focus on drones, computer vision, and other problems related to national security. It is a full-stack company that builds its own hardware and software, which leads to a great many interesting questions about cloud services, engineering workflows, and management. Gokul Subramanian is an engineer at Anduril, and he joins the show to share his knowledge of how Anduril operates and what the company has built.
Hyperparameters define the strategy for exploring a space in which a machine learning model is being developed. Whereas the parameters of a machine learning model are learned from the data coming into a system, the hyperparameters define how those data points are fed into the training process for building a model to be used by an end consumer. A different set of hyperparameters will yield a different model. Thus, it is important to try different hyperparameter configurations to see which models end up performing better for a given application. Hyperparameter tuning is an art and a science. Richard Liaw is an engineer and researcher, and the creator of Tune, a library for scalable hyperparameter tuning. Richard joins the show to talk through hyperparameters and the software that he has built for tuning them.
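As a minimal illustration of the tuning loop (this is not Tune's API), a grid search evaluates every configuration in a small search space and keeps the best scorer. The `objective` function here is a hypothetical stand-in for actually training and validating a model.

```python
from itertools import product

def grid_search(train_and_score, space):
    """Evaluate every combination of hyperparameter values and return
    the best-scoring configuration along with its score."""
    names = list(space)
    best_config, best_score = None, float("-inf")
    for values in product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        score = train_and_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Toy stand-in for real training: pretend smaller learning rates and
# deeper networks score better.
space = {"lr": [0.1, 0.01, 0.001], "layers": [2, 4, 8]}
objective = lambda c: (1 / c["lr"]) + c["layers"]
print(grid_search(objective, space))  # ({'lr': 0.001, 'layers': 8}, 1008.0)
```

Libraries like Tune exist because this loop quickly becomes expensive: they parallelize trials across a cluster and stop unpromising configurations early.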
WebAssembly allows for the execution of languages other than JavaScript in a browser-based environment. But WebAssembly is still not widely used outside of a few particular niches such as Dropbox and Figma. Nicolo Davis works on an application called Boardgame Lab, and he joins the show to explain why WebAssembly can be useful even for a simple application. Nicolo also shares his reflections on TypeScript, Rust, and the future of web development. He talks through the client/server interaction, performance, error handling, and the process of an actual migration.
APIs within a company change all the time. Every service owner has an API to manage, and those APIs have upstream and downstream connections. APIs need to be tested for integration points as well as for their “contract”, the agreement between an API owner and the consumers of that API. Aidan Cuniffe is the founder of Optic, a product built for API change management. He joins the show to explain why there is an opportunity for such a product, and the market dynamics of the space of API testing and change management.
After working at VMware for 10 years, Jerry Chen developed an expertise in technology companies. Today, he works at Greylock, where he looks at deals in the infrastructure and developer tooling space. Jerry is an expert in go-to-market strategy and makes investments in technologies that have a good chance at becoming large and profitable businesses. In today's episode, Jerry and I talk through the dynamics of modern infrastructure investing, including examples of deals such as Chronosphere and Rockset, both of which have been featured in previous episodes of the podcast. Jerry gives his perspective on deal terms, board dynamics, and everything else that goes into a smart investment.
Robotic process automation involves the scripting and automation of highly repeatable tasks. RPA tools such as UIPath paved the way for a newer wave of automation, including the Robot Framework, an open source system for RPA. Antti Karjalainen is the CEO of Robocorp, a company that provides an RPA tool suite for developers. Antti joins the show to talk through the definition of RPA, common RPA tasks, and what he is building with Robocorp.
Biometric authentication uses signals from a human's unique biology to verify identity. Forms of biometric authentication include fingerprints, eye patterns, and the way a person walks, otherwise known as gait. UnifyID is a company that builds systems for biometric authentication. John Whaley is the CEO of UnifyID, and he joins the show to talk through techniques for biometrics, and the implementation details that UnifyID has built to turn these into a reality.
The Internet Archive collects historical records of the Internet. The Wayback Machine is one tool from the Internet Archive which you may be familiar with. One project you may be unfamiliar with is book scanning. The Internet Archive scans high volumes of books in order to digitize them. In today's episode, Davide Semenzin joins the show to talk through the history of the Internet Archive and the engineering behind book digitization. We talk through OCR, storage, architecture, and scalability.
The most popular email client is Gmail, the web-based email client from Google. Gmail is dominant, but that dominance has come at a price, namely speed. Gmail caters to the lowest common denominator, serving a large ecosystem of use cases and plugins. This makes for a slow overall performance. Superhuman is an email client built for power users. Rahul Vohra is the founder of Superhuman, and joins the show to talk about the design and engineering of an email client that is made to be fast.
Factories require quality assurance work. That QA work can be accomplished by a camera-equipped robot combined with computer vision. This allows for sophisticated inspection techniques that do not require as much manual effort on the part of a human. Arye Barnehama is a founder of Elementary Robotics, a company that makes these kinds of robots. Arye joins the show to talk through the engineering of Elementary Robotics, and his vision for the future of the factory floor.
Investing in enterprise software has become a competitive business. Lots of venture capital firms compete for the good deals at every stage. This level of competition has driven more capital into the early stages. Ed Sim is a partner with Boldstart, an early stage enterprise investment firm. He joins the show to talk about modern enterprise investment strategy and his varied experiences working at funds.
The Java ecosystem is maturing. The GraalVM high-performance runtime provides a virtual machine for running applications in a variety of languages. TornadoVM extends the Graal compiler with a new backend for OpenCL. TornadoVM allows the offloading of JVM applications onto heterogeneous hardware. Juan Fumero works on TornadoVM. He joins the show to talk about the use case for TornadoVM, the design, and the engineering that underlies the system. We also talk about the overall Java ecosystem.
Robinhood is a platform for buying and selling stocks and cryptocurrencies. Robinhood is a complex, fast-moving financial platform, and these qualities demand high quality engineering in distributed systems, observability, and data infrastructure. Jaren Glover is an engineer at Robinhood, and he joins the show to talk about the problem space within Robinhood, as well as the specific DevOps and software engineering challenges.
Twitter is a social media platform with billions of objects: people, tweets, words, events, and other entities. The high volume of information that gets created on Twitter every day leads to a complex engineering problem for the developers building the Twitter search index. Nico Tonozzi is an engineer at Twitter. He joins the show to talk through the problem space of search at Twitter, as well as some recent challenges that he had to tackle in the continuously changing Twitter product.
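At the heart of any text search index is an inverted index: a map from each term to the documents that contain it. This toy sketch (not Twitter's implementation, which must handle real-time updates, ranking, and sharding at enormous scale) shows the basic idea.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term."""
    postings = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {1: "rust is fast", 2: "go is fast", 3: "rust compiles"}
index = build_index(docs)
print(sorted(search(index, "rust fast")))  # [1]
```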
Salesforce is a platform with a large number of developers, ISVs, and companies built on top of it. There is a thriving ecosystem of applications built and managed around Salesforce, leading to an important set of relationships and integration points between Salesforce and the other entities involved with the company. Kevin Poorman works at Salesforce as a developer evangelist, helping to strengthen the relationships in the Salesforce ecosystem. Kevin joins the show to talk about Salesforce and the applications that connect to it.
Developer tooling and infrastructure is a fruitful area for investing. A wide variety of technologies can have large investment outcomes because there are lots of engineers, and businesses are willing to pay for products that give those engineers a higher degree of leverage. Lee Edwards is a partner with Root Ventures. His focus is on hard problems within software, and he joins the show to talk about the thesis of his firm, as well as his personal beliefs on what makes a good investment.
Deno is a runtime for JavaScript applications. Deno is written in Rust, which changes its security properties. Parts of Deno are also written in TypeScript, which has caused problems in the compilation and organization of Deno. Elio Rivero is an engineer who has studied Deno and TypeScript, and he joins the show to talk about the newer JavaScript runtime and the issues caused by TypeScript.
Pachyderm is a system for data version control. Code has been version controlled for many years, but not data. In previous episodes with Joe Doliner, we explored the evolution of Pachyderm. In today's show, we talk about the state of the company in 2020, as well as Pachyderm Hub, an end-to-end machine learning and data lineage product.
A private network connects servers, computers, and cloud instances. These networked objects are often separated by firewalls and subnets that create latency and complication. David Crawshaw is the CTO of Tailscale, a company that works to make private networks easier to build and simpler to configure and maintain. David joins the show to talk about private networks and the implementation of Tailscale.
Ray is a general purpose distributed computing framework. Ray is used for reinforcement learning and other compute intensive tasks. It was developed at the Berkeley RISELab, a research and development lab with an emphasis on practical applications. Ion Stoica is a professor at Berkeley, and he joins the show to talk about the present and future of the Ray framework.
Machine learning models are only as good as the datasets they're trained on. Aquarium is a system that helps machine learning teams make better models by improving their dataset quality. Model improvement is often made by curating high quality datasets, and Aquarium helps make that a reality. Peter Gao works on Aquarium, and he joins the show to talk through modern machine learning and the role of Aquarium.
Databases are the source of truth for every company. Editing the data in the database normally requires writing a query in SQL or a domain-specific query language–languages that are only accessible to engineers and highly technical people. BaseDash is a tool for interfacing with a database without requiring the usage of a query language. It allows the user to interface with the database as easily as a spreadsheet. Max Musing is a founder of BaseDash, and he joins the show to talk about how it works and why he built it.
Training a computer vision model is not easy. Bottlenecks in the development process make it even harder. Ad hoc code, inconsistent data sets, and other workflow issues hamper the ability to streamline models. Roboflow is a company built to simplify and streamline these model training workflows. Brad Dwyer is a founder of Roboflow and joins the show to talk about model development and his company.
Development environments are brittle and hard to manage. They lack the kind of fungibility afforded by infrastructure-as-code. Gitpod is a company that allows developers to describe development environments as code, making them easier to work with and enabling a more streamlined GitOps workflow. Johannes Landgraf and Sven Efftinge are creators of Gitpod and they join the show to discuss the product and the motivation for building it.
Firebase is well-known as a platform that makes it quick and easy to build real-time applications. Firebase was acquired by Google, and has been turned into a large platform that runs on top of Google Cloud. Firebase is closed-source, which leads to a different ecosystem than open source platforms. Supabase is a new open source alternative to Firebase, built on Postgres and Elixir. Paul Copplestone is the founder of Supabase and he joins the show to talk through what he is building.
Containers and virtual machines are two ways of running virtualized infrastructure. Containers use fewer resources than VMs, and typically use the runc open source container runtime. Sysbox is a containerization runtime that offers an alternative to runc, and allows for the deployment of Docker or Kubernetes within a container. Cesar Talledo is the founder of Nestybox, a company built around the Sysbox runtime. He joins the show to talk about container runtimes and his new company.
Machine learning models require training data, and training data needs to be labeled. Raw images and text can be labeled using a training data platform like Labelbox. Labelbox is a system of labeling tools that enables a human workforce to create data that is ready to be consumed by machine learning training algorithms. The Labelbox team joins the show today to discuss training data and how to label it.
Predicting the spread of COVID-19 is not easy. The best methods we have available require us to extrapolate trends from a large volume of data, and this requires the construction of large-scale models. Because of the expertise needed for developing these models, Silicon Valley engineers were brought in to help develop a maintainable model. Two of these engineers are Josh Wills and Sam Shah, and they join the show to talk about the engineering behind the COVID model, and their work to build it.
Cloud resources can get out of control if proper management constraints are not put in place. Cloud Custodian helps users keep their cloud resources well managed. It provides a YAML DSL for defining rules that enforce a well-managed cloud infrastructure, improving security and optimizing cost. Kapil Thangavelu works on Cloud Custodian and he joins the show to talk about modern cloud management and what he is building with Cloud Custodian.
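To give a flavor of the YAML DSL, a Cloud Custodian policy pairs a resource type with filters and actions. This is a minimal sketch; the policy name and the "Owner" tag are hypothetical examples:

```yaml
# Stop any EC2 instance that is missing an Owner tag.
policies:
  - name: stop-untagged-instances
    resource: ec2
    filters:
      - "tag:Owner": absent
    actions:
      - stop
```

A rule like this is declarative: operators state the desired constraint, and Custodian evaluates it against the live cloud account rather than requiring custom scripting.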
Oct 22, 2020
For all the advances in software development over the years, one area that has seen minimal improvement is the terminal. Typing commands into a black text interface seems antiquated compared to the dynamic, flashy interfaces available in web browsers and modern desktop applications. Fig is a visual terminal assistant with the goal of changing that. Fig sits next to the developer's normal terminal and enhances the terminal experience. The founders of Fig, Brendan Falk and Matt Schrage, join the show today to discuss how Fig works and why it is useful to have an enhanced terminal.
Federated learning is machine learning without a centralized data source. Federated Learning enables mobile phones or edge servers to collaboratively learn a shared prediction model while keeping all the training data on device. Mike Lee Williams is an expert in federated learning, and he joins the show to give an overview of the subject and share his thoughts on its applications.
Effective data science requires clean data. As data moves through the data pipeline, errors may be introduced. Errors can also arise from code changes, database migrations, and other forms of data movement. How can you ensure data quality within a fast-moving, dynamic data system? Datafold is a company built around data quality management. It allows users to compare tables and databases, as well as automate data QA. Gleb Mezhanskiy is a founder of Datafold and joins the show to talk about the data quality space and what he is building with Datafold.
Oct 27, 2020
Pair programming allows developers to partner on solving problems and learn from each other more effectively. Pair programming has become harder to do as remote work has become more prevalent. GitDuck is a tool to enable more effective pair programming. Dragos Fotescu and Thiago Monteiro are the founders of GitDuck, and they join the show to explain what they have built and their motivation behind it.
The Salesforce Ecosystem has thousands of developers, designers, product people, and entrepreneurs engaging with each other. Salesforce exposes APIs and SDKs that allow people to build infrastructure on top of the Salesforce platform. In a previous episode, we explored how the ecosystem works as a whole. In today's show, Chuck Liddell joins the show to talk about how developers themselves engage with Salesforce. Chuck is CEO of Valence, a Salesforce AppExchange ISV that adds native integration middleware to Salesforce.
Staff engineer is a job title that suggests deep expertise and considerable experience. More and more companies are adopting a “staff engineer track” along which an engineer can work to become a staff engineer. What is the role of a staff engineer? Is it a management role or an individual contributor role? What are the expectations and obligations of a staff engineer? Will Larson is an experienced engineer who has worked at Stripe and other prominent tech companies. He joins the show to talk about the role of staff engineer, and the material he has written about it.
Fivetran is a company that builds data integration infrastructure. If your company is performing ELT or ETL jobs to move data from one place to another, Fivetran can help with that movement from source to destination. Once the data is moved into a data warehouse, a tool called DBT (data build tool) can be used to transform the data more effectively. We have done shows previously about Fivetran and DBT. In today's episode, George Fraser of Fivetran returns to discuss the cross section of these two technologies, and what his company is doing around that integration point.
A customer data platform such as Segment allows developers to build analytics and workflows around customer data such as purchases, clicks, and other interactions. These customer data platforms (CDP) are often tightly coupled to an underlying data warehouse technology. Hightouch is a platform that provides an unbundled CDP–a platform that sits on top of your own data warehouse. The Hightouch team joins the show to talk about what they are building and the CDP ecosystem as a whole.
Data labeling and model training require tools that enable humans to work in the loop more effectively. The “human in the loop” is necessary to train models via human-labeled data. Humanloop is a platform for streamlining the tasks of the human in the loop. Raza Habib is a founder of Humanloop, and he joins the show to talk about NLP workflows and his work on Humanloop.
Newer machine learning tooling is often focused on streamlining workflows and the developer experience. One such tool is BentoML, a framework that allows data scientists and developers to ship models more effectively. Chaoyu Yang is the creator of BentoML and he joins the show to talk about why he created Bento and the engineering behind the project.
Sendbird is a company that makes chat, voice, and video APIs for developers. The biggest company in this category is arguably Twilio, but Sendbird works at a higher level of abstraction, with an emphasis on developer experience and visual components. John Kim is the CEO of Sendbird and he joins the show to discuss the engineering and competitive positioning of his company.
Data science requires data sets to be cataloged and indexed. The data sets are versioned and might live in CSV files in S3, a database, or another data storage system. Splitgraph allows the user to query this data catalog as if it were a Postgres database, routing queries to any data set across your catalog. Miles Richardson is a founder of Splitgraph and joins the show to talk about data cataloging and what he is building with Splitgraph.
Nov 9, 2020
Static analysis allows for the discovery of issues in a codebase without executing the code. There have been many generations of static analysis tools. A newer static analysis tool is DeepSource, which automates code reviews, identifies bug risks, and generates pull requests to fix them. Jai Pradeesh and Sanket Saurav are founders of DeepSource, and they join the show to talk through the creation of static analysis tooling, and their work on DeepSource.
DevOps practices are shared via community, and community manifests at conferences. Unfortunately, conferences are not possible right now due to COVID-19, so the world has turned to virtual conferences. All Day DevOps is a 24-hour conference sharing learnings and software strategies around DevOps, starting November 12th. Derek Weeks and Mark Miller are organizers of the conference and they join the show to talk about modern DevOps.
Netlify is a cloud provider for JAMStack applications. To make those applications more performant, Netlify has built out capacity for edge computing–specifically “edge handlers”. Edge handlers can be used for a variety of use cases that need lower latency or other edge computing functionality. Matt Biilmann Christensen is the CEO of Netlify and joins the show to talk through the engineering behind edge handlers.
Microservices route requests between each other. As the underlying infrastructure changes, this routing becomes more complex and dynamic. The interaction patterns across this infrastructure require operators to create rules around traffic management. Tobias Kunze Briseno is the founder of Glasnostic, a system for ensuring resilience of microservice applications. Tobias joins the show to talk about microservice routing and traffic management, and what he has built with Glasnostic.
Internal tools are often built with Ruby on Rails or NodeJS. Developers create full-fledged applications to serve simple needs such as database lookups, dashboarding, and product refunds. This internal tooling creates a drain on engineering resources. Retool is a low-code platform for creating internal tools. These internal tools can be built by bizops, marketing, or roles other than engineers. David Hsu is the founder of Retool and joins the show to talk through what he has built.
Banking and money management are at the core of many modern applications. Payment operations teams work to enable the transfer of funds between different bank accounts, and to track the movement of those funds effectively. Modern Treasury is a company that builds payment operations APIs. Sam Aarons works at Modern Treasury and joins the show to talk through the engineering at Modern Treasury.
Data leaks can cause privacy violations and other cloud security vulnerabilities. Visibility and control of cloud resources can help secure data and ensure compliance and governance. Open Raven is a system for discovering and classifying sensitive data in a public cloud, and assuring compliance and governance. Dave Cole is a founder of Open Raven, and he joins the show to talk through data protection and what he has built with Open Raven.
The JavaScript ecosystem has millions of packages. How do you sort through them to find the best-in-breed packages for your projects? OpenBase is a system for searching and discovering JavaScript packages. OpenBase includes reviews, insights, and statistics around these JavaScript packages. Lior Grossman is a founder of OpenBase, and joins the show to talk about the JavaScript ecosystem and what he is building.
Infrastructure at Spotify runs at high speeds. Developers work autonomously, building and deploying services all the time. Backstage is an open source platform built at Spotify that allows developers to build portals for making sense of their infrastructure. Backstage developer portals are powered by a central service catalog, with centralized services and streamlined development. Stefan Alund joins the show to explain how Backstage works and his role in developing it.
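The central service catalog is populated from descriptor files that live alongside each service's code. A minimal sketch of one such entry follows; the component name, description, and owner are hypothetical examples:

```yaml
# catalog-info.yaml — registers a service in the Backstage catalog
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payments-service
  description: Handles payment processing
spec:
  type: service
  lifecycle: production
  owner: payments-team
```

Because each team owns its own descriptor file, the catalog stays current as services are created and retired, without a central team curating it by hand.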
GitHub manages a large API surface for both internal and external developers. This API surface has been migrated from purely RESTful requests to GraphQL, a newer query language that lets clients fetch exactly the data they need in fewer round trips. Marc-Andre Giroux works at GitHub and is the author of Production Ready GraphQL. He joins the show to talk about GraphQL across the industry, and specifically at GitHub.
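To make the round-trip point concrete, here is a minimal sketch of building a single GraphQL request against GitHub's public GraphQL endpoint. One query fetches a repository and its recent issues together, where the equivalent REST flow would need one request per resource. The repository coordinates and the token value are placeholders, and the request is only constructed, not sent:

```python
import json
import urllib.request

# A single GraphQL query asking for a repository and its last three issues.
QUERY = """
query {
  repository(owner: "octocat", name: "hello-world") {
    name
    issues(last: 3) { nodes { title } }
  }
}
"""

def build_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to GitHub's GraphQL endpoint."""
    payload = json.dumps({"query": QUERY}).encode()
    return urllib.request.Request(
        "https://api.github.com/graphql",
        data=payload,
        headers={
            "Authorization": f"bearer {token}",  # placeholder token
            "Content-Type": "application/json",
        },
    )

req = build_request("YOUR_TOKEN")
```

The client decides the shape of the response by writing the query, which is what removes the extra round trips that chained REST calls would require.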
Originally published January 10, 2020. Slack is a messaging platform for organizations. Since its creation in 2013, Slack has quickly become a core piece of technology used by a wide variety of technology companies, groups, and small teams. The messages that are sent on Slack are generated at a very high volume, and are extremely sensitive. These messages must be stored on Slack's servers in a way that does not risk a message from one company accidentally being accessible to another company. The messages must be highly available, and they also must be indexed for search. When Slack was scaling, the company started to encounter limitations in its data infrastructure that the company was unsure how to solve. During this time, Josh Wills was the director of data engineering at Slack, and he joins the show to retell the history of his time at Slack, and why the problem of searching messages was so hard. Josh also provides a great deal of industry context around how engineers from Facebook and Google differ from one another. When Slack was starting to become popular, the company quickly began to attract engineers from both of those companies. Facebook and Google have distinct solutions for how they have tackled the problems of data engineering.
Nov 24, 2020
Originally published May 14, 2018. The Kubernetes ecosystem consists of enterprises, vendors, open source projects, and individual engineers. The Cloud Native Computing Foundation was created to balance the interests of all the different groups within the cloud native community. CNCF has similarities to the Linux Foundation and the Apache Foundation. CNCF helps to guide open source projects in the Kubernetes ecosystem–including Prometheus, Fluentd, and Envoy. With the help of the CNCF, these projects can find common ground where possible. KubeCon is a conference organized by the Cloud Native Computing Foundation. I attended the most recent KubeCon in Copenhagen. KubeCon was a remarkably well-run conference–and the attendees were excited and optimistic. As much traction as Kubernetes has, it is still very early days and it was fun to talk to people and forecast what the future might bring. At KubeCon, I sat down with Chris Aniszczyk and Dan Kohn, who are the COO and director of the CNCF. I was curious about how to scale an organization like the CNCF. In some ways, it is like scaling a government. Kubernetes is growing faster than Linux grew, and the applications of Kubernetes are as numerous as those of Linux. Different constituencies want different things out of Kubernetes–and as those constituencies rapidly grow in number, how do you maintain diplomacy among competing interests? It's not an easy task, and that diplomacy has been established by keeping in mind lessons from previous open source projects.
Originally published July 27, 2018. React Native allows developers to reuse frontend code between mobile platforms. A user interface component written in React Native can be used in both iOS and Android codebases. Since React Native allows for code reuse, this can save time for developers, in contrast to a model where completely separate teams have to create frontend logic for iOS and Android. React Native was created at Facebook. Facebook itself uses React Native for mobile development, and contributes heavily to the open source React Native repository. In 2016, Airbnb started using React Native in a significant portion of their mobile codebase. Over the next two years, Airbnb saw the advantages and the disadvantages of adopting the cross-platform, JavaScript-based system. After those two years, the engineering management at Airbnb came to the conclusion to stop using React Native. Gabriel Peal is an engineer at Airbnb who was part of the decision to move off of React Native. Gabriel wrote a blog post giving the backstory for React Native at Airbnb, and he joins the show to give more detail on the decision.
Nov 26, 2020
Originally published October 1, 2019. The development of self-driving cars is one of the biggest technological changes that is under way. Across the world, thousands of engineers are working on developing self-driving cars. Although it still seems far away, self-driving cars are starting to feel like an inevitability. This is especially true if you spend much time in downtown San Francisco, where you will see a self-driving car being tested every day. Much of the time, that self-driving car will be operated by Cruise. Cruise is a company that is building a self-driving car service. The company has hundreds of engineers working across the stack, from computer vision algorithms to automotive hardware. Cruise's engineering requires engineers who can work with cloud tools as well as low-latency devices. It also requires product developers and managers to lead these different teams. The field of self-driving is very new. There is not much literature available on how to build a self-driving car. There is even less literature on how to manage a team of engineers that are building, testing, and deploying software and hardware for real cars that are driving around the streets of San Francisco. Mo Elshenawy is VP of engineering at Cruise, and he joins the show to talk about the engineering that is required to develop fully self-driving car technology, as well as how to structure teams to align the roles of product design, software engineering, testing, machine learning, and hardware. Full disclosure: Cruise is a sponsor of Software Engineering Daily.
Originally published November 7, 2018. An instruction set defines a low level programming language for moving information throughout a computer. In the early 1970's, the prevalent instruction set language used a large vocabulary of different instructions. One justification for a large instruction set was that it would give a programmer more freedom to express the logic of their programs. Many of these instructions were rarely used. Think of your favorite programming language (or your favorite human language). What percentage of words in the vocabulary do you need to communicate effectively? We sometimes call these language features “syntactic sugar”. They add expressivity to a language, but may not improve functionality or efficiency. These extra language features can have a cost. Dave Patterson and John Hennessy created the RISC architecture: Reduced Instruction Set Computer architecture. RISC proposed reducing the size of the instruction set so that the important instructions could be optimized for. Programs would become more efficient, easier to analyze, and easier to debug. Dave Patterson's first paper on RISC was rejected. He continued to research the architecture and advocate for it. Eventually RISC became widely accepted, and Dave won a Turing Award together with John Hennessy. Dave joins the show to talk about his work on RISC and his continued work in computer science research to the present. He is involved in the Berkeley RISELab and works at Google on the Tensor Processing Unit. Machine learning is an ocean of new scientific breakthroughs and applications that will change our lives. It was inspiring to hear Dave talk about the changing nature of computing, from cloud computing to security to hardware design.
For several years, we have had the ability to create artificially generated text articles. More recently, audio and video synthesis have been feasible for artificial intelligence. Rosebud is a company that creates animated virtual characters that can speak. Users can generate real or fictional presenters easily with Rosebud. Dzmitry Pletnikau is an engineer with Rosebud and joins the show to talk about the technology and engineering behind the company.
Business intelligence is crucial for both internal and external applications at any company. There is a wide array of proprietary BI tools. Today, there is an increasing number of options for open source business intelligence, one of which is CubeJS. CubeJS is an open source analytical API platform for building BI. Artyom and Pavel from CubeJS join the show to talk about what they have built and their vision for the platform.
Border Gateway Protocol (BGP) is a protocol designed for routing and reachability between autonomous systems on the internet. BGPmon is a tool for assessing the routing health of your network, allowing a network administrator to understand network stability and the risk to data in transit. Andree Toonk is the founder of BGPmon and joins the show to talk about BGP, how to monitor routing data, and his work at Cisco.
Dec 3, 2020
Data science is a collaborative field. Collaboration requires sharing the artifacts that data scientists are working on, such as Jupyter Notebooks and SQL tables. Hex is a platform for improving sharing across data science workflows. Caitlin Colgrove and Barry McCardel are founders of Hex and they join the show to discuss what they have built.
Osquery is a tool for providing visibility into operating system endpoints. It is a flexible tool, originally developed at Facebook, that exposes operating system state as SQL-queryable tables. Ganesh Pai is the founder of Uptycs, a company that uses Osquery to find threats and malicious activity occurring across nodes. Ganesh joins the show to talk about Osquery usage and his work on Uptycs.
Dec 8, 2020
Originally published September 4, 2018. TIBCO was started in the 90's with a popular message bus product that was widely used by finance companies, logistics providers, and other systems with high throughput. As TIBCO grew in popularity, the company expanded into other areas through products it developed in-house as well as through acquisitions. One acquisition was Jaspersoft, a business intelligence data platform. When TIBCO acquired Jaspersoft in 2014, the architecture was a monolithic Java application. Around this time, customer use cases were shifting from centralized reporting to real-time, embedded visualizations. The use case of the Jaspersoft software was becoming less centralized and less monolithic, and the software architecture needed to change in order to reflect that. Jan Schiffman is a VP of engineering at TIBCO and Sherman Wood is a director at TIBCO. They join the show to discuss the process of migrating a large Java monolith to a composable set of services. Breaking up a monolith is not an easy process–nor is it something that every company should do just because they have a monolith. In some cases, a monolith is just fine. Jan and Sherman explain the business case for refactoring the Jaspersoft monolith, and their approach to the refactoring. We also talk through the modern use cases of embedded analytics and the interaction between business analysts and data engineers. At a higher level, we discuss the lessons they have learned from managing a large, complex refactoring. Full disclosure: TIBCO is a sponsor of Software Engineering Daily.
Originally published March 31, 2017. Brendan Eich created the first version of JavaScript in 10 days. Since then JavaScript has evolved, and Brendan has watched the growth of the web give rise to new and unexpected use cases. Today Brendan Eich is still pushing the web forward across the technology stack with his involvement in the WebAssembly specification and the Brave browser. For all of its progress, JavaScript struggles to run resource-intensive programs like complex video games. With JavaScript falling short on its charge to be the “assembly language for the web”, the four major browser vendors started collaborating on the WebAssembly project to give programming languages a faster, lower level compile target when deploying to the web. Brendan is the CEO of Brave, which aims to provide a faster and safer browsing experience by blocking ads and trackers by default in a new browser. The Brave browser is also helping publishers monetize in interesting new ways while giving a share of ad revenue to its users. Caleb Meredith is the host of this show. He previously guest hosted a popular episode on Inferno, a fast, React-like JavaScript framework. As we bring on more guest hosts, please send us feedback. We want to know what every host is doing well, and what we can improve on.
Originally published April 3, 2017. A hedge fund is a collection of investors that make bets on the future. The “hedge” refers to the fact that the investors often try to diversify their strategies so that the directions of their bets are less correlated, and they can be successful in a variety of future scenarios. Engineering-focused hedge funds have used what might be called “machine learning” for a long time to predict what will happen in the future. Numerai is a hedge fund that crowdsources its investment strategies by allowing anyone to train models against Numerai's data. A model that succeeds in a simulated environment will be adopted by Numerai and used within its real money portfolio. The engineers who create the models are rewarded in proportion to how well the models perform. Xander Dunn is a software engineer at Numerai, and in this episode he explains what a hedge fund is, why the traditional strategies are not optimal, and how Numerai creates the right incentive structure to crowdsource market intelligence. This interview was fun and thought provoking–Numerai is one of those companies that makes me very excited about the future.
Originally published August 28, 2019. Kent Beck is a legendary figure in the world of software engineering. Kent was an early advocate of Test-Driven Development (TDD), and popularized the idea of writing unit tests before writing the code that would satisfy those unit tests. A unit test isolates and tests a small piece of functionality within a large piece of software. Practitioners of Test-Driven Development write tens or hundreds of tests in order to cover a large variety of cases that could potentially occur within their software. When Kent Beck joined Facebook in 2011, he was 50 years old and thought he had seen everything in the software industry. During Facebook Boot Camp, the six-week onboarding process in which every new hire learns about the software practices of the company, Kent started to realize that Facebook was very different from any other company he had seen. After graduating Facebook Boot Camp, Kent began to explore Facebook's codebase and culture. He found himself rethinking many of the tenets of software engineering that he had previously thought were immutable. Kent joins the show to discuss his time at Facebook, and how the company's approach to building and scaling products thoroughly reshaped his beliefs about software engineering.
Originally published May 16, 2019. React is a set of open source tools for building user interfaces. React was open sourced by Facebook, and includes libraries for creating interfaces on the web (ReactJS) and on mobile devices (React Native). React was released during a time when there was not a dominant frontend JavaScript library. Backbone, Angular, and other JavaScript frameworks were all popular, but there was not any consolidation across the frontend web development community. Before React came out, frontend developers were fractured into different communities for the different JavaScript frameworks. After Facebook open sourced React, web developers began to gravitate towards the framework for its one-way data flow and its unconventional style of putting JavaScript and HTML together in a format called JSX. As React has grown in popularity, the React ecosystem has developed network effects. In many cases, the easiest way to build a web application frontend is to compose together open source React components. After seeing the initial traction, Facebook invested heavily into React, creating entire teams within the company whose goal was to improve React. Dan Abramov works on the React team at Facebook and joins the show to talk about how the React project is managed and his vision for the project.
Dec 14, 2020
Originally published December 20, 2018. Ten years ago, there was a distinction between “backend” and “frontend” developers. A backend developer would be managing the business logic and database transactions using Ruby on Rails or Java. A frontend developer would be responsible for implementing designs and arranging buttons using raw HTML and JavaScript. Today, developers can build entire applications in JavaScript. Developers who spent their early career developing frontend JavaScript skills are finding themselves with a surprising amount of power. With NodeJS providing a backend framework and React, Vue, or Angular on the frontend, a single JavaScript developer can write all the code for a whole application—hence the rise of the “full stack developer”. At the same time, the cloud infrastructure is becoming easier to use. Backend-as-a-service simplifies the frustrations of deploying your application and standing up a database. GraphQL improves the relationship between the frontend and the backend. And futuristic technologies like WebAssembly and web virtual reality are promising to make a JavaScript engineer's life even more interesting. Adam Conrad is an engineer and a writer for Software Engineering Daily. In recent articles, he has documented the changing nature of the frontend, including JavaScript engines, virtual reality, and how mature corporations are using React and GraphQL. He joins the show to share his perspective on what is changing in the frontend—and how full stack JavaScript engineers can position themselves for future success in a quickly changing market.
Dec 15, 2020
Originally published January 25, 2019. When TensorFlow came out of Google, the machine learning community converged around it. TensorFlow is a framework for building machine learning models, but the lifecycle of a machine learning model has a scope that is bigger than just creating a model. Machine learning developers also need to have a testing and deployment process for continuous delivery of models. The continuous delivery process for machine learning models is like the continuous delivery process for microservices, but can be more complicated. A developer testing a model on their local machine is working with a smaller data set than what they will have access to when it is deployed. A machine learning engineer needs to be conscious of versioning and auditability. Kubeflow is a machine learning toolkit for Kubernetes based on Google's internal machine learning pipelines. Google open sourced Kubernetes and TensorFlow, and the projects have users at AWS and Microsoft. David Aronchick is the head of open source machine learning strategy at Microsoft, and he joins the show to talk about the problems that Kubeflow solves for developers, and the evolving strategies for cloud providers. David was previously on the show when he worked at Google, and in this episode he provides some useful discussion about how open source software presents a great opportunity for the cloud providers to collaborate with each other in a positive sum relationship.
Originally published September 17, 2019. Ever since Apache Kafka was open sourced from LinkedIn, it has been used to solve a wide variety of problems in distributed systems and data engineering. Kafka is a distributed messaging queue that is used by developers to publish messages and subscribe to topics with a certain message type. Kafka allows information to flow throughout a company such that multiple systems can consume the messages from a single sender. In previous shows, we have covered design patterns within Kafka, Kafka streams, event sourcing with Kafka, and many other subjects relating to the technology. Kafka is broadly useful, and new strategies for using Kafka continue to emerge as the open source project develops new functionality and becomes a platform for data applications. In today's episode, Tim Berglund returns to Software Engineering Daily for a discussion of how applications are built today using Kafka–including systems that are undergoing a refactoring, data engineering applications, and systems with a large number of communicating services. If you are interested in learning more about how companies are using Kafka, the Kafka Summit in San Francisco is September 30th – October 1st. Companies like LinkedIn, Uber, and Netflix will be talking about how they use Kafka. Full disclosure: Confluent (the company where Tim works) is a sponsor of Software Engineering Daily.
Originally published December 9, 2019. Machine learning algorithms have existed for decades. But in the last ten years, several advancements in software and hardware have caused dramatic growth in the viability of applications based on machine learning. Smartphones generate large quantities of data about how humans move through the world. Software-as-a-service companies generate data about how these humans interact with businesses. Cheap cloud infrastructure allows for the storage of these high volumes of data. Machine learning frameworks such as Apache Spark, TensorFlow, and PyTorch allow developers to easily train statistical models. These models are deployed back to the smartphones and the software-as-a-service companies, which improves the ability for humans to move through the world and gain utility from their business transactions. And as the humans interact more with their computers, it generates more data, which is used to create better models, and higher consumer utility. The combination of smartphones, cloud computing, machine learning algorithms, and distributed computing frameworks is often referred to as “artificial intelligence.” Chris Benson is the host of the podcast Practical AI, and he joins the show to talk about the modern applications of artificial intelligence, and the stories he is covering on Practical AI. On his podcast, Chris talks about everything within the umbrella of AI, from high level stories to low level implementation details.
Originally published October 18, 2019. Apache Kafka was created at LinkedIn. Kafka was open sourced in 2011, when the company was eight years old. By that time, LinkedIn had developed a social network with millions of users. LinkedIn's engineering team was building a range of externally facing products and internal tools, and many of these tools required a high-throughput system for publishing data and subscribing to topics. Kafka was born out of this need. Over time, Kafka's importance within LinkedIn has only grown. Kafka plays a central role for services, log management, data engineering, and compliance. LinkedIn might be the biggest user of Apache Kafka in the entire software industry. Kafka has many use cases, and it is likely that they are almost all on display within LinkedIn. Nacho Solis is a senior software engineering manager at LinkedIn, where he helps teams build infrastructure for Kafka, as well as Kafka itself. Nacho joins the show to discuss the history of Kafka at LinkedIn, and the challenges of managing such a large deployment of Kafka. We also talk about streaming, data infrastructure, and more general problems in the world of engineering management. Full disclosure: LinkedIn is a sponsor of Software Engineering Daily.
Originally published July 7, 2017. Airbnb is a company that is driven by design. New user interfaces are dreamed up by designers and implemented for web, iOS, and Android. This implementation process takes a lot of resources, but it used to take even more before the company started using React Native. React Native allows Airbnb to reuse components effectively. React Native works by presenting a consistent model for the user interface regardless of the underlying platform, and emitting a log of changes to that user interface. The underlying platform translates those changes into platform-specific code. Leland Richardson is an engineer at Airbnb. In today's episode, he explains how Airbnb uses React Native, how React Native works, and the future of the platform.
Originally published June 21, 2019. Niantic is the company behind Pokemon Go, an augmented reality game where users walk around in the real world and catch Pokemon which appear on their screen. The idea for augmented reality has existed for a long time. But the technology to bring augmented reality to the mass market has appeared only recently. Improved mobile technology makes it possible for a smartphone to display rendered 3-D images over a video stream without running out of battery. Ingress was the first game to come out of Niantic, followed by Pokemon Go, but there are other games on the way. Niantic is also working on the Niantic Real World platform, a “planet-scale” AR platform that will allow independent developers to build multiplayer augmented reality experiences that are as dynamic and entertaining as Pokemon Go. Paul Franceus is an engineer at Niantic, and he joins the show to describe his experience building and launching Pokemon Go, as well as abstracting the technology from Pokemon Go and opening up the Niantic Real World platform to developers.
Originally published March 6, 2020. ReactJS developers have lots of options for building their applications, and those options are not easy to work through. State management, concurrency, networking, and testing all have elements of complexity and a wide range of available tools. Take a look at any specific area of JavaScript application development, and you can find highly varied opinions. Kent Dodds is a JavaScript teacher who focuses on React, JavaScript, and testing. In today's episode, Kent provides best practices for building JavaScript applications, specifically React. He provides a great deal of advice on testing, which is unsurprising considering he owns TestingJavaScript.com. Kent is an excellent speaker who has taught thousands of people about JavaScript, so it was a pleasure to have him on the show.
Originally published April 7, 2017. Engineers in Silicon Valley see a world of constant progress. Our work is creative and intellectually challenging. We are building the future and getting compensated quite well for it. But what if we are actually achieving far less than what is possible? What if, after so many years of high margins, gourmet lunch, and self-flattery, we have lowered our standards for innovation? And if Silicon Valley has been lulled into complacency, what does that say about the rest of the United States? American exceptionalism has faltered and complacency has risen in its wake. Today's guest Tyler Cowen is an economist and author. His new book The Complacent Class is the final book in a trilogy that describes a decline in American output and a decline in the American mindset. Complacent America has lost its ability to assess risk. Children are prevented from playing tag for risk of injury. College students protest against speakers who might present challenging ideas. The number of Americans under 30 who own a business has fallen by 65% since the 1980s; millennials are too busy going to business school to start businesses. In his books, Tyler weaves together history, philosophy, and contemporary culture. He presents hard data about many different fields, and theorizes about how the trends in those fields relate to each other. He also has a podcast, Conversations with Tyler, and in this episode I tried to mirror his interview style. If you like this episode, you should check out his show; he has interviewed people like Ezra Klein, Peter Thiel, and Kareem Abdul-Jabbar.
Happy holidays. I want to thank everyone who continues to support the show through your listenership. Since we are on holiday from regularly scheduled content, I am taking this opportunity to share with you something I have been working on personally for the last two years: a musical album. Some of you don't know I
Play Episode Listen Later Dec 28, 2020
Originally published April 17, 2019. Drishti is a company focused on improving manufacturing workflows using computer vision. A manufacturing environment consists of assembly lines. A line is composed of sequential stations along that manufacturing line. At each station on the assembly line, a worker performs an operation on the item that is being manufactured. This type of workflow is used for the manufacturing of cars, laptops, stereo equipment, and many other technology products. With Drishti, the manufacturing process is augmented by adding a camera at each station. Camera footage is used to train a machine learning model for each station on the assembly line. That machine learning model is used to ensure the accuracy and performance of each task that is being conducted on the assembly line. Krish Chaudhury is the CTO at Drishti. From 2005 to 2015 he led image processing and computer vision projects at Google before joining Flipkart, where he worked on image science and deep learning for another four years. Krish had spent more than twenty years working on image and vision related problems when he co-founded Drishti. In today's episode, we discuss the science and application of computer vision, as well as the future of manufacturing technology and the business strategy of Drishti.
Originally published May 29, 2020. Kubernetes has become a highly usable platform for deploying and managing distributed systems. The user experience for Kubernetes is great, but is still not as simple as a full-on serverless implementation; at least, that has been a long-held assumption. Why would you manage your own infrastructure, even if it is Kubernetes? Why not use autoscaling Lambda functions and other infrastructure-as-a-service products? Matt Ward is a listener of the show and an engineer at Mux, a company that makes video streaming APIs. He sent me an email that said Mux has been having success with self-managed Kubernetes infrastructure, which they deliberately opted for over a serverless deployment. I wanted to know more about what shaped this decision to opt for self-managed infrastructure, and the costs and benefits that Mux has accrued as a result. Matt joins the show to talk through his work at Mux, and the architectural impact of opting for Kubernetes instead of fully managed serverless infrastructure.
Originally published September 13, 2019. Amazon Web Services first came out in 2006. It took several years before the software industry realized that cloud computing was a transformative piece of technology. Initially, the common perspective around cloud computing was that it was a useful tool for startups, but would not be a smart option for large, established businesses. Cloud computing was considered neither economical nor secure. Today, that has changed. Every company that writes software is figuring out how to utilize the cloud. Software companies with on-prem servers are migrating old applications to the cloud, and most companies that have started in the last decade do not even have physical servers. Applications that are started on the cloud are referred to as “cloud-native.” The architecture of cloud-native applications is a newer topic of discussion, and some software patterns that became established in the pre-cloud era make less sense today. Cornelia Davis is VP of technology at Pivotal and the author of Cloud Native Patterns, a book about developing applications in the distributed, virtual world of the cloud. Cornelia was previously on the show to discuss Cloud Foundry. In today's episode, our conversation centers on her book, and her perspective on the emerging patterns of cloud native software.
Originally published July 19, 2019. In 2011, Facebook had begun to focus its efforts on mobile development. Mobile phones did not have access to reliable, high bandwidth connections, and the Facebook engineering team needed to find a solution to improve the request latency between mobile clients and the backend Facebook infrastructure. One source of latency was recursive data fetching. If a mobile application client made a request to the backend for newsfeed, the backend API would return the newsfeed, but some components of that feed would require additional requests to the backend. In practice, this might result in a newsfeed loading partially on a phone, but having a delayed loading time for the comments of a newsfeed item. GraphQL is a solution that came out of this problem of recursive data fetching. A GraphQL server provides middleware to aggregate all of the necessary information to serve a complete request. GraphQL connects to backend data sources and federates the frontend request across these different data sources. GraphQL was open sourced in 2015, and has found many use cases in addition to simplifying backend data fetching for mobile clients. Today, GraphQL is used by PayPal, Shopify, Twitter, and hundreds of other companies. Lee Byron is the co-creator of GraphQL and he joins the show to tell the story of GraphQL, and how it fit into Facebook's shift to mobile.
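The recursive-fetching problem can be illustrated with a small Python sketch: the server resolves the nested fields a client asks for, so the client issues one round trip instead of several. This is not GraphQL's actual execution engine or schema language; the data and helper names here are invented for illustration:

```python
# Toy backend "data sources": a post store and a comment store.
POSTS = {1: {"id": 1, "title": "Hello", "comment_ids": [10, 11]}}
COMMENTS = {10: {"id": 10, "text": "First!"}, 11: {"id": 11, "text": "Nice post"}}

def resolve_post(post_id, selection):
    """Resolve one request for a post plus any nested fields the client
    asked for, so the client receives a complete response in a single
    round trip instead of issuing follow-up requests for comments."""
    post = POSTS[post_id]
    result = {field: post[field] for field in selection if field in post}
    if "comments" in selection:
        # The server, not the mobile client, performs the follow-up fetches.
        result["comments"] = [COMMENTS[cid] for cid in post["comment_ids"]]
    return result

feed_item = resolve_post(1, ["id", "title", "comments"])
```

In the pre-GraphQL flow, the client would receive only `comment_ids` and have to make one more request per comment; here the aggregation happens server-side.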
Originally published July 30, 2019. “Internet of Things” is a term used to describe the increasing connectivity and intelligence of physical objects within our lives. IoT has manifested within enterprises under the term “Industrial IoT,” as wireless connectivity and machine learning have started to improve devices such as centrifuges, conveyor belts, and factory robotics. In the consumer space, IoT has moved slower than many people expected, and it remains to be seen when we will have widespread computation within consumer devices such as microwaves, washing machines, and light switches. IoT computers have different constraints than general purpose computers. Security, reliability, battery life, power consumption, and cost structures are very different in IoT devices than in your laptop or smartphone. One technology that could solve some of the problems within IoT is WebAssembly, a newer binary instruction format for executable programs. Jonathan Beri is a software engineer and the organizer of the San Francisco WebAssembly Meetup. He has significant experience in the IoT industry, and joins the show to discuss the state of WebAssembly, the surrounding technologies, and their impact on IoT.
Serverless has grown in popularity over the last five years, and the space of applications that can be built entirely with serverless has increased dramatically. This is due to two factors: the growing array of serverless tools (such as edge-located key value stores) and the rising number of companies with serverless offerings. One of those companies is Fastly, which originally gained adoption for its CDN solution. Tyler McMullen is the CTO of Fastly and he joins the show to talk through how Fastly looks at edge computing today. This is Tyler's third appearance on the show.
When Tim Wagner worked at Amazon, he invented AWS Lambda. After working on the early serverless infrastructure, he joined Coinbase and worked as VP of Engineering. Since leaving Coinbase, he has started a new company called Vendia. Vendia combines his learnings from the serverless space with the innovations around blockchains to work on the problem of data sharing. Tim and David Wells join the show to discuss what they are working on with Vendia.
Data lakes and data warehouses store high volumes of multidimensional data. Data sources for these pieces of infrastructure can become unreliable for a variety of reasons. When data sources break, it can cause downstream problems. One company working to solve the problem of data reliability is Monte Carlo Data. Barr Moses and Lior Gavish are founders of Monte Carlo and join the show to talk about data reliability and the overall landscape of data infrastructure.
TensorFlow Lite is an open source deep learning framework for on-device inference. TensorFlow Lite was designed to improve the viability of machine learning applications on phones, sensors, and other IoT devices. Pete Warden works on TensorFlow Lite at Google and joins the show to talk about the world of machine learning applications and the frameworks and devices necessary to build them.
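One technique that makes on-device inference viable is quantization: storing weights and activations as 8-bit integers instead of 32-bit floats. The sketch below shows the general affine quantization scheme in plain Python; it is a simplification, and TensorFlow Lite's converter calibrates the scale and zero point from real data rather than taking them as fixed constants as done here:

```python
def quantize(values, scale, zero_point):
    """Affine quantization: map float values to int8 using
    q = round(v / scale) + zero_point, clamped to [-128, 127]."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]

def dequantize(quants, scale, zero_point):
    """Approximate recovery of the floats: v ~= (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quants]

weights = [0.0, 0.5, -1.0, 2.0]
scale, zero_point = 1 / 64, 0

q = quantize(weights, scale, zero_point)       # int8 values: [0, 32, -64, 127]
recovered = dequantize(q, scale, zero_point)   # close to the original floats
```

Note that 2.0 falls outside the representable range for this scale and is clamped to 127, which is exactly the kind of precision trade-off quantized deployment accepts in exchange for a 4x smaller model and integer-only arithmetic.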
Cost management is growing in importance for companies that want to manage their significant cloud bill. Kubernetes plays an increasing role in modern infrastructure, so managing the cost of Kubernetes clusters becomes important as well. Kubecost is a company focused on giving visibility into Kubernetes resources and reducing spend. Webb Brown is a founder of Kubecost and joins the show to talk about Kubernetes cost optimization and what he is building with Kubecost.
Large technology companies are a new type of industry. Their power and reach resist comparison to the dominant industries of previous generations, such as big oil. Alex Kantrowitz is a journalist who has covered big technology for much of his career, and he currently runs Big Technology, a newsletter and podcast about the biggest technology companies in the world. He's also the author of Always Day One: How The Tech Titans Plan To Stay On Top Forever. Alex joins the show to talk about his work and share his thoughts on big tech.
Network discovery allows enterprises to identify what devices are on their network. These devices can include smartphones, servers, desktop computers, and tablets. Being able to index the devices on a network is crucial to figuring out the security profile of that network. HD Moore is a founder of Rumble Networks, a company focused on network discovery and asset inventory. He joins the show to talk about how network discovery works and his experience building Rumble.
Agriculture infrastructure allows crops such as corn, soy, and wheat to move from large scale farms to consumers all around the world. The relevant players in agricultural infrastructure include growers, shippers, and planners. These individuals need new technology to interact more efficiently. Growers need to be able to connect more smoothly with buyers. Farmers need better management of their carbon credits. Microbial technology can allow plants to be better shielded from tough conditions. Agricultural health, transport, commerce, and logistics are all problems that Indigo Agriculture is focused on solving. David Potere is head of geoinnovation at Indigo, and joins the show to talk about the problems the company is solving and the engineering practices at Indigo.
GraphQL has changed the common design patterns for the interface between backend and frontend. This is usually achieved by the presence of a GraphQL server, which interprets and federates a query from the frontend to the backend server infrastructure. Dgraph is a distributed graph database with native GraphQL support. Manish Jain is a founder of Dgraph, and joins the show to talk about its purpose and his vision for the future of the technology.
Rust and Golang are two of the newest lower level languages for doing systems programming. They are often used for applications such as file systems, operating systems, and latency-sensitive applications. How do they compare in terms of safety, speed, and programming ergonomics? Linhai Song is an assistant professor and researcher at Penn State University, and joins the show to talk about his work researching Go and Rust.
Companies can have a negative impact on the environment by outputting excess carbon. Many companies want to reduce their net carbon impact to zero, which can be done by investing in forests. Pachama is a marketplace for forest investments. Pachama uses satellites, imaging, machine learning, and other techniques to determine how much carbon is being absorbed by different forests. Diego Saez-Gil is a founder of Pachama, and joins the show to talk through how Pachama works and the long-term goals of the company.
Kafka has achieved widespread popularity as a distributed queue and event streaming platform, with enterprise adoption and a billion dollar company (Confluent) built around it. But could there be value in building a new platform from scratch? Redpanda is a streaming platform built to be compatible with Kafka, but which requires neither the JVM nor ZooKeeper, two dependencies that made Kafka harder to work with than perhaps necessary. Alexander Gallego is a core committer to Redpanda and joins the show to talk about why he started the project and its value proposition.
For startups that are still seeking product/market fit, pre-seed investments are critical to funding initial investments in the product and in the infrastructure needed to scale. Afore Capital is a pre-seed fund that invests in innovative companies across a wide variety of verticals. Afore focuses on startups with unique product insights and novel distribution approaches. Gaurav Jain is a co-Founder and Managing Partner at Afore Capital. Before that, he worked with the Android team at Google. Gaurav joins the show today to talk about the risks and rewards of pre-seed investing, and about how founders and investors can find opportunity in the current venture environment.
Reinforcement learning is a paradigm in machine learning that uses incentives, or “reinforcement,” to drive learning. The learner is conceptualized as an intelligent agent working within a system of rewards and penalties in order to solve a novel problem. The agent is designed to maximize rewards while pursuing a solution by trial-and-error. Programming a system to respond to the complex and unpredictable “real world” is one of the principal challenges in robotics engineering. One field which is finding new applications for reinforcement learning is the study of MEMS devices: robots or other electronic devices built at the micrometer scale. The use of reinforcement learning in microscopic devices poses a challenging engineering problem, due to constraints on power consumption and computational capacity. Nathan Lambert is a PhD student at Berkeley who works with the Berkeley Autonomous Microsystems Lab. He has also worked at Facebook AI Research and Tesla. He joins the show today to talk about the application of reinforcement learning to robotics and how deep learning is changing the MEMS device landscape.
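The reward-driven, trial-and-error loop described above can be made concrete with a tiny tabular Q-learning example: an agent in a four-state corridor learns that moving right eventually leads to reward. All names and hyperparameters here are invented for illustration; the MEMS setting discussed in the episode adds tight power and compute budgets on top of this basic loop:

```python
import random

def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a 4-state corridor: the agent starts at
    state 0 and receives a reward of 1.0 only upon reaching state 3.
    Each update nudges Q(state, action) toward the observed reward
    plus the discounted value of the best next action."""
    rng = random.Random(seed)
    n_states, moves = 4, [-1, +1]          # action 0: step left, action 1: step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != 3:
            if rng.random() < epsilon:     # explore: try a random action
                action = rng.randrange(2)
            else:                          # exploit: follow current estimates
                action = 0 if q[state][0] >= q[state][1] else 1
            nxt = min(max(state + moves[action], 0), n_states - 1)
            reward = 1.0 if nxt == 3 else 0.0
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            state = nxt
    return q

q = train_q_table()
```

After training, the learned values favor “right” in every non-terminal state, with q[2][1] converging toward the immediate reward of 1.0.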
Microservices are built to scale. But as a microservices-based system grows, so does the operational overhead to manage it. Even the most senior engineers can't be familiar with every detail of dozens, perhaps hundreds, of services. While smaller teams may track information about their microservices via spreadsheets, wikis, or other more traditional documentation, these methods often prove unsuitable for the unique demands of a sprawling microservices system. A microservices catalog is a solution to this problem. A microservices catalog seeks to centralize information about the services in your software architecture, including the purpose of a service, its owner, and instructions for using it. A microservices catalog can also provide a centralized source of knowledge about a system, which can help on-call engineers diagnose issues and also provide resources for onboarding new team members. Larger companies sometimes devote significant internal resources toward developing in-house microservices catalogs, while smaller organizations may not have the resources at their disposal to do so. OpsLevel's founders recognized that many teams were re-inventing the wheel building internal microservices catalogs, and set out to design a toolset that could meet the needs of users of all sizes. OpsLevel's team has drawn from extensive experience working with industry leaders in DevOps to create a comprehensive toolset for managing microservices infrastructure. OpsLevel provides a “single pane of glass for operations,” integrating with a variety of tools such as Slack, git, CI/CD, incident management, and deployment systems. John Laban and Kenneth Rose are the co-founders of OpsLevel. Before John and Kenneth founded OpsLevel they worked together at PagerDuty, where John was the first engineer on the team. Kenneth, OpsLevel's CTO, was also previously a senior developer at Shopify.
John and Kenneth join the show today to talk about how OpsLevel can help developers manage their microservices better, and even transform how their team does DevOps.
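At its simplest, the kind of record a service catalog centralizes can be sketched in a few lines of Python. The service names and fields below are invented for the example, and a product like OpsLevel layers much more on top (integrations, checks, ownership sync); the sketch only shows why a structured catalog beats a spreadsheet for lookups:

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    """One catalog entry: the record a team might otherwise
    scatter across spreadsheets, wikis, and tribal knowledge."""
    name: str
    owner: str
    purpose: str
    runbook: str = ""
    dependencies: list = field(default_factory=list)

catalog = {
    s.name: s
    for s in [
        Service("payments", "team-billing", "Charge customers",
                runbook="wiki/payments-runbook", dependencies=["ledger"]),
        Service("ledger", "team-billing", "Record transactions"),
    ]
}

def who_owns(service_name):
    # The first question an on-call engineer asks during an incident.
    return catalog[service_name].owner
```

Because entries are structured, questions like "who owns this service?" or "what does it depend on?" become constant-time lookups instead of archaeology.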
Security is more important than ever, especially in regulated fields such as healthcare and financial services. Developers working in highly regulated industries often spend considerable time building tooling to help improve compliance and pass security audits. While the core of many security workflows is similar, each industry and each organization may have its own idiosyncratic needs or particular regulatory requirements to meet. Sym is a platform for building security workflows that seeks to build on those core similarities while empowering developers with the tools they need to meet their application's unique security and compliance needs. Sym believes in putting engineers in control of security, in the same way that DevOps put engineers in control of infrastructure. Yasyf Mohamedali is the CEO and co-founder of SymOps. Before SymOps, he was the CTO of Karuna Health. He joins the show today to talk about security and innovation in regulated industries and how Sym can help developers close the intent-to-implementation gap in application security.
Embedded software engineering is the practice of building software that controls embedded systems: that is, machines or devices other than standard computers. Embedded systems appear in a variety of applications, from small microcontrollers, to consumer electronics, to large-scale machines such as cars, airplanes, and machine tools. iRobot is a consumer robotics company that applies embedded engineering to build robots that perform common household tasks. Its flagship product is the Roomba, perhaps one of the most well-known autonomous consumer robots on the market today. iRobot's engineers work at the intersection of software and hardware, and work in a variety of domains from electrical engineering to AI. Chris Svec is a Software Engineering Manager at iRobot. He started his career designing x86 chips and later moved up the hardware/software stack into embedded software. He joins the show today to talk about iRobot, the design process for embedded systems, and the future of embedded systems programming.
In a distributed application, observability is key to handling incidents and building better, more stable software. Legacy monitoring methods were built to respond to predictable failure modes, and to aggregate high-level data like access speed, connectivity, and downtime. Observability, on the other hand, is a measure of how well you can infer the internal state of a system from its outputs in order to trace the cause of an issue. At its core, building a system with observability means using instrumentation to provide insights on how and why internal components within a system are performing a certain way. Developers and SREs can build on that data to proactively debug potential failure modes, set service-level objectives, and speed up incident response. New Relic has been an industry leader in the observability space for the better part of a decade. This year, they announced New Relic One, an evolution of their flagship platform that streamlines and simplifies all the functions available to help organizations achieve observability. New Relic One enhances the Full-Stack Observability Platform through AIOps with their Applied Intelligence, which draws insights from observability data to help detect anomalies before they become incidents. Lew Cirne is the founder and CEO of New Relic. He joins the show today to talk about how New Relic One helps developers move beyond monitoring and embrace observability, and how he sees the future of software observability platforms.
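Service-level objectives translate directly into an "error budget" with simple arithmetic. The sketch below is generic SRE math, not part of the New Relic platform; the function name and numbers are invented for illustration:

```python
def error_budget(slo_target, total_requests, failed_requests):
    """A service-level objective (e.g. 99.9% success) implies an error
    budget: the number of failures tolerable over a window before the
    SLO is breached, and how much of that budget remains."""
    allowed_failures = total_requests * (1 - slo_target)
    return {
        "allowed_failures": allowed_failures,
        "remaining": allowed_failures - failed_requests,
        "breached": failed_requests > allowed_failures,
    }

# A 99.9% SLO over one million requests allows roughly 1,000 failures;
# 400 observed failures leave about 600 failures of budget unspent.
status = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
```

Teams often use the remaining budget as a release gate: plenty of budget left means you can ship risky changes, while a nearly exhausted budget argues for slowing down and investing in reliability.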
Cilium is open-source software built to provide improved networking and security controls for Linux systems operating in containerized environments along with technologies like Kubernetes. In a containerized environment, traditional Layer 3 and Layer 4 networking and security controls based on IP addresses and ports, like firewalls, can be difficult to operate at scale because of the volatility of the system. Cilium is built on eBPF, an in-kernel virtual machine that attaches programs directly to code paths in the kernel. In effect, eBPF makes the Linux kernel “programmable” without changing kernel source code or loading modules. Cilium takes advantage of this functionality to insert networking and security functions at the kernel level rather than in traditional Layer 3 or Layer 4 controls. This allows Cilium to combine metadata from Layer 3 and Layer 4 with application-layer metadata such as HTTP method and header values in order to establish rules and provide visibility based on service, pod, or container identity. Isovalent, co-founded by the creator of Cilium, maintains the Cilium open source project and also offers Cilium Enterprise, a suite of tools helping organizations adopt Cilium and overcome the hurdles of building a secure, stable cloud-native application. Dan Wendlandt and Thomas Graf are the co-founders of Isovalent. Thomas, the firm's CTO, was the original creator of the Cilium open-source project and spent 15 years working on the Linux kernel prior to founding Isovalent. Dan, Isovalent's CEO, has also worked at VMware and Nicira. They join the show today to talk about why Cilium and Cilium Enterprise are a great choice for organizations looking to build cloud-native applications.
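The identity-aware, application-layer policy model can be illustrated in a few lines of Python: rules match on workload identity labels plus HTTP method and path, rather than on IP addresses and ports, which churn constantly in a containerized cluster. This is only a conceptual sketch of the policy model with invented field names; Cilium actually enforces such rules inside the kernel via eBPF programs, not in userspace code like this:

```python
def allowed(request, policy):
    """Evaluate one request against an identity-aware L7 rule:
    the decision keys off who the caller is (identity labels) and
    what it is doing (HTTP method and path), not its IP address."""
    return (
        policy["from_identity"] in request["source_labels"]
        and request["method"] in policy["methods"]
        and request["path"].startswith(policy["path_prefix"])
    )

# Rule: workloads labeled app=frontend may only GET under /api/.
policy = {"from_identity": "app=frontend", "methods": {"GET"}, "path_prefix": "/api/"}

ok = allowed(
    {"source_labels": {"app=frontend", "env=prod"}, "method": "GET", "path": "/api/orders"},
    policy,
)
denied = allowed(
    {"source_labels": {"app=frontend"}, "method": "DELETE", "path": "/api/orders"},
    policy,
)
```

Because the rule is keyed to identity labels, it keeps working as pods are rescheduled and their IPs change, which is exactly the volatility problem IP/port firewalls struggle with.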
Video calling over the internet has experienced explosive growth in the last decade. In 2010, surveys estimated that around 1 in 5 Americans had tried online video calling for any reason. By May of 2020, that number had nearly tripled. A significant factor in the growth of video calling has been an open-source project called WebRTC, or “Web Real-Time Communication.” WebRTC makes it possible to capture and stream audio or video data between browsers without the use of plugins or third-party software. Daily is a developer platform that builds on WebRTC to provide realtime video APIs for developers. Developers can easily add video call widgets to their code which come with a set of default configurations for functions such as bandwidth management and cross-browser support. Daily also offers a set of frontend libraries and REST APIs for developers who want to build a customized experience. Kwindla Hultman Kramer is a co-Founder at Daily, and he's joined today by Wesley Faulkner, who handles developer relations. They join the show today to talk about the growth in demand for video calling services, building a developer-friendly video calling API, and what's next for video calling applications.
Open source software is software distributed along with its source code, using a permissive license that allows anyone to view, use, or modify it. The term “open source” also refers more broadly to a philosophy of technology development which prioritizes transparency and community development of a project. Typically, development is managed by a governing body, whether a company, foundation, or just a group of passionate users, and work is done in public repositories on platforms like GitHub. Nearly every corner of the software engineering world has been impacted in some way by open source. Well-known open source projects include Linux, Kubernetes, and WordPress. Kevin Xu is the author of Interconnected, a bilingual newsletter on tech, business, and U.S.-China relations. He is an investor in open source startups at OSS Capital, and formerly served in the Obama White House. He joins the show today to talk about the benefits of open source in the public and private sectors, and how open source will be critical to the development of high-tech industry in our country as we pivot to facing some of the 21st century's most pressing challenges.
A data-driven organization collects a wide variety of data to help in strategic decision-making. The cost of storing large amounts and varieties of data has dropped dramatically in the last two decades, but too much unstructured data may not improve decision-making, and can even lead to “analysis paralysis.” Organizations react by extracting the most important, actionable data and placing it into a data warehouse, which has a predesigned structure meant to streamline the data in preparation for analysis. The key challenge with this approach is identifying what should be streamlined, and how to structure the data warehouse to focus on the most important, actionable items. This is especially important for organizations seeking to scale, as the structure necessary to generate the most relevant insights may change as the organization grows. Narrator is building data intelligence that uses a simple, proprietary Universal Data Model to help organizations streamline their data warehousing. Narrator is built on the belief that data tells the story of a system, and its platform empowers organizations to use those stories to make better decisions. Ahmed Elsamadisi is the founder and CEO of Narrator. Before founding Narrator, he spent several years working in data analysis and algorithm design for WeWork, Raytheon, and Cornell's Autonomous Systems Laboratory. He joins the show today to talk about how Narrator generates the most actionable insights from a data warehouse, why a Universal Data Model is so important when scaling, and what makes Narrator's approach to data analysis different.
The incredible advances in machine learning research in recent years often take time to propagate out into usage in the field. One reason for this is that such “state-of-the-art” results for machine learning performance rely on the use of handwritten, idiosyncratic optimizations for specific hardware models or operating contexts. When developers are building ML-powered systems to deploy in the cloud and at the edge, their goal is to ensure the model delivers the best possible functionality and end-user experience, and importantly, their hardware and software stack may require different optimizations to achieve that goal. OctoML provides a SaaS product called the Octomizer to help developers and AIOps teams deploy ML models most efficiently on any hardware, in any context. The Octomizer deploys its own ML models to analyze your model topology, and optimize, benchmark, and package the model for deployment. The Octomizer generates insights about model performance over different hardware stacks and helps you choose the deployment format that works best for your organization. Luis Ceze is the Co-Founder and CEO of OctoML. Luis is a founder of the Apache TVM project, which is the basis for OctoML's technology. He is also a professor of Computer Science at the University of Washington. Jason Knight is co-founder and CPO at OctoML. Luis and Jason join the show today to talk about how OctoML is automating deep learning engineering, why it's so important to consider hardware when building deep learning systems, and how the field of deep learning is evolving.
Blockchain technology has a wide variety of potential applications. Fields such as finance, supply chain management, and even voting have seen innovations driven by the development of distributed applications built on blockchains, called DApps. However, developing a DApp on a blockchain often requires low-level knowledge about cryptographic protocols or particular networks. Since no one blockchain platform has emerged as dominant, and the field itself is rapidly evolving, there is a high opportunity cost for developers if they choose to invest significant time learning one blockchain paradigm or another. Reach provides a platform for developing DApps, complete with a high-level language based on JavaScript. Reach allows developers to write one set of code that specifies the DApp and all its components, and that can be deployed onto any blockchain implementation under the hood. Reach's goal is to let developers focus on writing business logic for their DApps rather than worrying about low-level implementation details, and to smooth the steep learning curve for developers new to the world of blockchain. Chris Swenor and Jay McCarthy are the founders of Reach. Chris was formerly the co-founder and CEO of Alacris Protocol, an operating system for blockchain applications, and he is currently a technologist in residence and mentor at Harvard. Jay has been a computer science professor for over a decade, and worked on the development of the Racket programming language. Chris and Jay join the show today to talk about the challenges of developing on blockchain, how Reach helps make blockchain developers more productive, and how the blockchain ecosystem might evolve in the future.
Serverless computing refers to an architectural pattern where server-side code is run on-demand by cloud providers, who also handle server resource allocation and operations. Of course, there is a server involved on the provider's side, but administrative functions to manage that server such as capacity planning, configuration, or management of containers are handled behind the scenes, allowing the application developers to focus on business logic. This makes for highly elastic and scalable systems and can reduce development, testing, and iteration time due to reduced overhead. Function as a Service (FaaS) describes a model of serverless computing where services are decomposed into modular functions and deployed to a serverless platform. These functions are executed only when called and are typically stateless. Despite the benefits of elasticity and modularity that FaaS offers, it has drawbacks as well. Taking disaggregation of functionality to an extreme means that behavior that formerly required a method call now may require a network call to another function, increasing latency and making larger-scale operations inefficient. Cloudburst is a stateful FaaS platform built to combine the power of low-latency mutable state and communication in Python with the elasticity and scalability allowed by serverless architecture. Johann Schleier-Smith is an entrepreneur and engineer who is currently a board member of Sama. Johann was formerly the founder and CTO of if(we), a social network and incubator. He is the co-author of the paper “Cloudburst: Stateful Functions as a Service,” and joins the show today to talk about how Cloudburst addresses the drawbacks of current FaaS models, and what's next for serverless computing.
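To make the statelessness of FaaS functions concrete, here is a minimal Python sketch of a handler in that style. It is not tied to any particular provider's API; the event shape and function name are invented for the example. Everything the function needs arrives in the event, and nothing is kept in process memory between calls, which is what lets a platform run any number of copies in parallel.

```python
def handler(event):
    """A stateless FaaS-style handler: double every value in the payload.

    All input arrives in `event`; all output is returned. No state
    survives between invocations, so any instance can serve any call.
    """
    payload = event.get("payload", [])
    return {"status": "ok", "result": [2 * x for x in payload]}

# Each invocation is independent of every other.
print(handler({"payload": [1, 2, 3]}))  # {'status': 'ok', 'result': [2, 4, 6]}
```

Cloudburst's contribution, by contrast, is letting functions like this share low-latency mutable state without giving up that elasticity.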
Over the past few years, the conventional wisdom around the value proposition of Big Data has begun to shift. While the prevailing attitude towards Big Data may once have been “bigger is better,” many organizations today recognize that broad-scale data collection comes with its own set of risks. Data privacy is becoming a hotly debated topic both in the technology industry and in regulatory agencies and governments. Bigger and less private datasets are more attractive targets for hackers, meaning that an organization must invest heavily in security as well to avoid a breach. Every organization faces a tradeoff between the value of the insights produced from large datasets versus increased storage costs and increasing privacy risks. Tonic is building a “synthetic data” platform to address these tradeoffs and help organizations mitigate data risk. Tonic takes in raw data, perhaps from a data lake, and transforms it into more manageable, de-identified data sets for ease of use and user privacy. Tonic can create statistically identical, structured datasets that allow software engineers and business analysts to extract the same useful insights that drive an organization's progress, without the risk of working with identifiable, private user data. Ian Coe, Andrew Colombi, and Adam Kamor are co-founders of Tonic. Along with their fourth co-founder, Karl Hanson, Ian, Andrew, and Adam all worked together at Palantir Technologies where the idea for Tonic was born. They join the show today to talk about the value of synthetic data, the risks and rewards of big data, and how compliance, privacy, and security are driving innovation in the data management sector.
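As a toy illustration of de-identification, the Python sketch below pseudonymizes PII fields with salted hashes. This is deliberately much simpler than the statistically identical synthetic datasets described above; the field names, salt, and data are all invented for the example.

```python
import hashlib

def deidentify(rows, pii_fields, salt="demo-salt"):
    """Replace PII fields with short salted-hash pseudonyms.

    Non-PII columns pass through untouched, so analysts can still
    join and aggregate without seeing raw identifiers.
    """
    out = []
    for row in rows:
        clean = dict(row)
        for field in pii_fields:
            if field in clean:
                digest = hashlib.sha256((salt + str(clean[field])).encode()).hexdigest()
                clean[field] = digest[:12]  # stable pseudonym per input value
        out.append(clean)
    return out

users = [{"email": "ada@example.com", "plan": "pro"}]
safe = deidentify(users, ["email"])
print(safe[0]["plan"])                          # non-PII preserved: pro
print(safe[0]["email"] != "ada@example.com")    # True: email pseudonymized
```

A synthetic-data platform goes further, generating entirely new records that preserve the statistical shape of the originals rather than merely masking fields.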
In the past several years, Kubernetes has become the de-facto standard for orchestrating containerized, stateless applications. Tools such as StatefulSets and Persistent Volumes have helped developers build stateful applications on Kubernetes, but this can quickly become difficult to manage as an application scales. Tasks such as machine learning, distributed AI, and big data analytics often require a distributed application to maintain some sort of state across services. KubeDirector is an open-source controller that helps streamline the deployment and management of complex stateful scale-out application clusters on Kubernetes. KubeDirector provides an application-agnostic deployment pattern and enables developers to run non-cloud native stateful applications on Kubernetes without modifying the code. KubeDirector is part of a larger project called BlueK8s, which aims to bring enterprise-level capabilities for distributed stateful applications to Kubernetes. Kartik Mathur is an engineer at HPE Developer, an open-source initiative within Hewlett-Packard Enterprise. HPE is an enterprise contributor to the KubeDirector open-source community. Kartik previously worked as a senior software engineer at BlueData, which created the KubeDirector project before its acquisition by HPE. Kartik joins the show today to talk about why state is important for Big Data or machine learning applications, how KubeDirector can help manage the complexity of stateful applications, and what's next for the BlueK8s project as a whole.
Prediction Markets provide an exchange for trading based on the outcome of events. Most prediction markets are centralized: they operate like a casino, where betting takes place under the supervision of one central governing organization. This makes the market less efficient than it otherwise might be: the central organization is a business, and it makes money by extracting value from the trades the customers make. Augur is a prediction market built on the Ethereum blockchain. A trading network built on a blockchain can have a decentralized, permissionless transaction record without a centralized, governing body. Augur's network is built to be transparent, low-cost, and free from interference. Joey Krug joins us today from Pantera Capital, a venture capital fund focused on blockchain technology. Joey is also a co-founder of the Forecast Foundation, which contributes to the development of the Augur open-source project. We discuss what it takes to build a trustworthy decentralized market, how Augur is solving challenges such as the oracle problem, and why blockchain may be the key to democratizing finance.
A “co-location” center is a data center that leases out networking and compute infrastructure to retail clients. Co-location centers host clients with a wide variety of infrastructure strategies, from small retail customers, to medium-size teams running hybrid cloud models, to large corporate clients who prefer not to incur the capital cost of building their own data center. While Equinix is already a market leader in co-location centers, they have expanded to provide a wide variety of services for their clients, including managed IaaS, disaster recovery, and integrations with cloud providers such as AWS and Google Cloud. Shaedon Blackman is a partner developer analyst at Equinix. As a partner developer analyst, Shaedon works to build Equinix's network of corporate partners, while also advocating for diverse and inclusive human capital within the organization. Before he joined Equinix, he was a Core Fellow at Pursuit, a software engineering fellowship funded by Google, and the Chief Operating Officer of a non-profit youth program. He joins the show today to talk about a variety of important topics facing the tech industry today, including diversity, inclusion, and education, and also how Equinix is building partnerships and sponsoring open-source projects to achieve its goals.
Studies show that people in “maker” professions such as developers and writers are most productive when they can carve out dedicated time for focused work, without the frequent context-switching that comes with an irregular meeting schedule. Meetings and other non-development work are necessary parts of the job, but a team will be much more productive with deep work time in mind. Okay is an engineering metrics dashboard platform designed with the goal of maximizing time for deep work. Okay helps break down time slots into categories such as Maker Time, Meeting Load, and Friction Time based on data collected and feedback from the team. Okay organizes both quantitative and qualitative data into a single dashboard for team planning, and supports plug-and-play integrations with productivity tools such as Google Calendar, PagerDuty, and CircleCI. Tomas Barreto is the CTO and co-founder of Okay. Before founding Okay, he worked with Sequoia Capital and Y Combinator, and was VP of engineering at Box. He joins the show today to talk about why Maker Time is so important for engineers, and how Okay helps teams make data-driven decisions to maximize productivity.
Yelp.com is a crowdsourced review platform focused on restaurants and local businesses. Originally created as an email-based recommendation service, Yelp re-launched in its modern form in 2005. At the time, its focus on user-created reviews and social interactions was fairly novel, and made it stand out from competitors such as Angie's List and CitySearch. Since then, Yelp has become a worldwide brand, and as of 2021 it has over 171 million reviews on its site. The mid-to-late 2000s represented a time of explosive growth and profound change in the web application space. Industry leaders like Yelp had to adapt their technology stacks for the unprecedented scaling they were experiencing. At the same time, the rise of smartphones led Yelp and many others into the mobile application space. Michael Stoppelman was an engineer at Yelp during this turbulent time. He left Yelp in 2015, and now works as an angel investor. He joins the show today to talk about the engineering challenges Yelp's team faced during this time, the profound changes that the industry as a whole went through, and how the history of Yelp can help us contextualize the startup landscape today.
Cloud platforms are often categorized as providing either Infrastructure-as-a-Service or Platform-as-a-Service. On one side of the spectrum are IaaS giants such as AWS, which provide a broad range of services for building infrastructure. On the other are PaaS providers such as Heroku and Netlify, which abstract away the lower-level choices and focus on developer experience. Digital Ocean has carved out a sizable niche in the cloud hosting space by targeting the middle ground: a streamlined cloud platform built for developers, which still offers the ability to choose, customize, and manage infrastructure. The release of Digital Ocean's App Platform takes this goal a step further. The App Platform allows users to build and deploy an app or static site directly from GitHub onto a DigitalOcean-managed Kubernetes cluster. Teams can access the power, scale, and flexibility of Kubernetes without having to worry about the complexity of managing a cluster themselves. The App Platform gives developers the choice of how much of their infrastructure they want to control, and how much they want to be provided by the platform. Cody Baker and Apurva Joshi work at Digital Ocean. They join the show today to talk about why Digital Ocean stands out in a competitive cloud hosting space, what the value proposition is for developers interested in the App Platform, and how the PaaS industry is evolving.
Modern SaaS products are increasingly delivered via the cloud, rather than as downloadable, executable programs. However, many potential users of those SaaS products may need that software deployed on-prem, in a private network. Organizations have a variety of reasons for preferring on-prem software, such as security, integration with private tools, and compliance with regulations. The cost of setting up a bespoke on-prem version of a SaaS offering was often prohibitive for both the vendor and potential users. Replicated leverages the portability of containers to help SaaS vendors ship an on-prem or multi-prem version of their software. Replicated gives SaaS vendors a suite of ready-made components to help install and manage an instance of their software on-prem. Replicated has seen explosive growth in its six-year lifespan: today, 50 of the Fortune 100 companies manage apps with Replicated. Grant Miller is the Founder and CEO of Replicated. Replicated recently closed a Series B fundraising round to help scale their work in the cloud-native space, including the launch of their Kubernetes-Off-the-Shelf platform. He joins the show today to talk about the next generation of tools on the Replicated platform, how Kubernetes is changing enterprise IT, and why the on-prem software market isn't going away anytime soon.
Static analysis is a type of debugging that identifies defects without running the code. Static analysis tools can be especially useful for enforcing security policies by analyzing code for security vulnerabilities early in the development process, allowing teams to rapidly address potential issues and conform to best practices. R2C has developed a fast, open-source static analysis tool called Semgrep. Semgrep provides syntax-aware code scanning and a database of thousands of community-defined rules to compare your code against. Semgrep also makes it easy for security engineers and developers to define custom rules to enforce their organization's policies. R2C's platform has been adopted by industry leaders such as Dropbox and Snowflake, and recently received the “Disruptive Innovator” distinction at Forbes' 2020 Cybersecurity Awards. Isaac Evans is the Founder and CEO of R2C. Before founding R2C he was an Entrepreneur in Residence at Redpoint Ventures and a computer scientist at the US Department of Defense. Isaac joins the show today to talk about how R2C is helping teams improve their cloud security, why static analysis is a natural fit for CI/CD workflows, and what to expect from R2C and the Semgrep project in the future.
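To make syntax-aware scanning concrete, here is a toy security rule implemented with Python's standard ast module. This is far simpler than Semgrep's rule engine, and the rule itself (flagging direct eval() calls) is invented for the example, but it shows the core idea: defects can be found by inspecting the parsed syntax tree, without ever executing the code.

```python
import ast

def find_eval_calls(source):
    """Return the line numbers of direct eval() calls in `source`.

    The source is parsed, never executed: a (very small) static analysis.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            findings.append(node.lineno)
    return findings

code = "x = 1\ny = eval('2 + 2')\n"
print(find_eval_calls(code))  # [2]
```

Because checks like this run on source alone, they slot naturally into a CI pipeline, which is part of why static analysis pairs so well with CI/CD workflows.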
Build automation tools automate the process of building code, including steps such as compiling, packaging binary code, and running automated tests. Because of this, build automation tools are considered a key part of a continuous delivery pipeline. Build automation tools read build scripts to define how they should perform a build. Common build scripts include Makefile, Dockerfile, and bash. Earthly is a build automation tool that allows you to execute all your builds in containers. Earthly uses Earthfiles, which draw from the best features of Makefile and Dockerfile and provide a common layer between language-specific tooling and the CI build spec. Earthly builds are repeatable, isolated, and self-contained, and will run the same way across different environments such as a CI system or a developer's laptop. Vlad Ionescu is the Founder and CEO of Earthly Technologies. He was formerly the founder and chief architect at ShiftLeft.io. Vlad joins the show today to talk about why reproducible builds are important, how Earthly simplifies build scripts, and what the long-term vision for Earthly looks like.
A data warehouse is a centralized repository that an enterprise may use to store selected data from production systems. Data is transformed into a structured form that makes it easily accessible for business intelligence or other operational users. SQL-compliant databases are frequently used for data warehouses due to the popularity of SQL as a tool in business data analytics. PostgreSQL is a free and open-source relational database management system. Postgres-based databases are widespread and are used by a variety of organizations, from Reddit to the International Space Station, and Postgres databases are a common offering from cloud providers such as AWS, Alibaba Cloud, and Heroku. Josh Drake and Thomas Richter are experts on Postgres data warehousing. They join the show today to talk about the staying power of Postgres, why Postgres is a good choice for data warehousing, and how cloud technology is changing relational database management systems.
AWS offers over 200 services as part of its IaaS platform, and that number continues to grow. Organizing all of these services, and tracking the costs they incur, can be a significant challenge, often requiring teams of AWS-certified sysadmins working together to get a handle on an enterprise-scale system. Vantage provides an alternative, streamlined AWS console that makes it easier to manage AWS services and track associated costs. Users link their AWS account to Vantage, and it automatically profiles all their services and aggregates the information into a dashboard. Users can customize how their Vantage console appears, and can break down service usage by region. Ben Schaechter is the co-founder of Vantage. Before founding Vantage he was a Senior Product Manager at AWS and DigitalOcean. Ben joins the show today to talk about how Vantage helps streamline the AWS experience, and why teams of all sizes can benefit from a better user experience on the AWS platform.
WordPress is a free and open-source content management system, or CMS, written in PHP. Since its release in 2003, WordPress has become ubiquitous on the web. It is estimated that roughly 60 million websites use WordPress as a CMS. However, despite its popularity, WordPress has limitations in its design. WordPress sites are dynamic, and the front and back end are tightly coupled. A dynamic, full-stack application can be useful when handling complex functionality, but it also slows down the site and opens up security vulnerabilities. Strattic is a static site generator and hosting platform that specializes in converting WordPress sites into a static architecture. Static pages are isolated from the backing application, improving security against common WordPress vulnerabilities. Modern web users have high expectations for speed and security, and Strattic helps WordPress sites achieve this without sacrificing the benefits of the WordPress platform. Zeev Suraski is the CTO of Strattic. Zeev is one of the architects and principal authors of the PHP language, which is the foundation of WordPress. Zeev joins the show today to talk about the place of PHP in modern web development, and how Strattic helps WordPress developers build modern, fast, and secure sites.
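The core idea of static generation can be sketched in a few lines of Python (the template and page content here are invented, and real generators like Strattic do far more): render the page once at build time, so that serving it requires no application code in the request path.

```python
from string import Template

# Build-time template: a toy stand-in for a dynamic page template.
page = Template("<html><body><h1>$title</h1><p>$body</p></body></html>")

def build_static_page(title, body):
    """Render a page to plain HTML once, at build time.

    The output is a static string any web server or CDN can serve
    as-is, with no backend executing per request.
    """
    return page.substitute(title=title, body=body)

html = build_static_page("Hello", "No application code runs when this is served.")
print("<h1>Hello</h1>" in html)  # True
```

Because the served artifact is inert HTML, attacks that target the running application (a major class of WordPress vulnerabilities) have nothing to exploit at request time.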
Email has become such a routine feature of knowledge work that we often take it, and the email clients we use for it, for granted. While advancements such as intelligent spam filtering have improved the experience, many email clients retain the same basic structure and offer a largely similar experience. Superhuman is building a modern email client meant to reimagine email from the ground up. Superhuman is built to be fast, and seamlessly integrates insights from social networks such as LinkedIn. It offers features such as undo send, AI triage, and mail status tracking. Superhuman even works offline, using Service Workers to serve cached assets when a network connection is not available. Emuye Reynolds is the Head of Engineering at Superhuman. She was formerly a Senior Software Developer at Apple and led the development of the UI for Apple TV. She joins the show today to talk about what's lacking from traditional email clients, what engineering challenges her team faces when building across multiple platforms, and how Superhuman's new features represent an evolutionary step forward for email client technology.
Vectors are the foundational mathematical building blocks of Machine Learning. Machine Learning models must transform input data into vectors to perform their operations, creating what is known as a vector embedding. Since data is not stored in vector form, an ML application must perform significant work to transform data in different formats into a form that ML models can understand. This can be computationally intensive and hard to scale, especially for the high-dimensional vectors used in complex models. Pinecone is a managed database built specifically for working with vector data. Pinecone is serverless and API-driven, which means engineers and data scientists can focus on building their ML application or performing analysis without worrying about the underlying data infrastructure. Edo Liberty is the founder and CEO of Pinecone. Prior to Pinecone, he led the creation of Amazon SageMaker at AWS. He joins the show today to talk about the fundamental importance of vectors in machine learning, how Pinecone built a vector-centric database, and why data infrastructure improvements are key to unlocking the next generation of AI applications.
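As a small illustration of why vector embeddings are useful, here is a Python sketch of cosine similarity, one common way to compare two embeddings; this is the kind of operation a vector database performs at scale over millions of vectors. The three-dimensional "embeddings" below are invented toys (real models produce hundreds or thousands of dimensions).

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors by angle: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings: semantically close items should map to nearby vectors.
cat, kitten, car = [1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [0.0, 0.1, 1.0]
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```

A vector database's job is to answer "which stored vectors are most similar to this one?" quickly, which naive pairwise comparison like the above cannot do at scale.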
Google Cloud, AWS, and Azure are the dominant cloud providers on the market today. But the market is still highly competitive, and there is significant overlap in the services offered by all three large providers. Since all three offer a broad range of services, developers looking to choose a platform for their application must focus on providers' domains of relative excellence and how those align with their needs. One domain where Google Cloud Platform excels is with its database offerings. Google has data management baked into its organizational DNA, and has been the source of several innovative technologies in the data space such as Spanner, BigTable, and BigQuery. Andi Gutmans is a general manager and VP of engineering for databases at Google. He joins the show today to talk about how Google came to excel at databases and data management, how machine learning and Big Data users in particular can benefit from Google Cloud's offerings, and how new features such as Database Migration Service are helping Google stay ahead of the curve in a competitive cloud landscape.
Software-Defined Networking describes a category of technologies that separate the networking control plane from the forwarding plane. This enables more automated provisioning and policy-based management of network resources. Implementing software-defined networking is often the task of Site Reliability Engineers, or SREs. Site reliability engineers work at the intersection of development and operations by bringing software development practices to system administration. Equinix manages co-location data centers and provides networking, security, and cloud-related services to their clients. Equinix is leveraging its status as a market leader in on-prem networking capabilities to expand into cloud and IaaS offerings such as Equinix Metal, which has been referred to as “bare-metal-as-a-service,” and offers integrations with 3rd party cloud technologies with a goal of creating a seamless alternative to modern public clouds for organizations seeking the benefits of co-location. Tim Banks is a Principal Solutions Architect at Equinix and he joins the show to talk about what Equinix offers and how it differs from other cloud providers.
The shift to microservices architectures and distributed systems has been a challenge for systems using conventional security practices, such as filtering IP addresses using network policies. In addition, the increasing intersection of development and operations exemplified by the DevOps methodology has expanded the scope of responsibilities in implementing secure systems. SPIFFE is a set of open-source specifications for issuing identity to services in heterogeneous, distributed environments such as a cloud-native microservices architecture. Systems implementing SPIFFE bypass the need for application-level authentication and network-level ACL configuration. SPIRE, or the SPIFFE Runtime Environment, is a system that implements the SPIFFE standards to manage platform and workload attestation, providing an API for controlling policies, and coordinating certificate issuance and rotation. Derek Edwards is the head of engineering at Anthem.ai, and Ryan Turner is a software engineer at Uber. They join the show today to talk about the challenges of managing security in a distributed system, how adopting SPIFFE represented a paradigm shift in their authentication workflow, and how the SPIFFE and SPIRE projects are evolving to meet the needs of the next generation of cloud-native applications.
As the volume and scope of data collected by an organization grow, tasks such as data discovery and data management grow in complexity. Simply put, the more data there is, the harder it is for users such as data analysts to find what they're looking for. A metadata hub helps manage Big Data by providing metadata search and discovery tools, and a centralized hub which presents a holistic view of the data ecosystem. DataHub is LinkedIn's open-source metadata search and discovery tool. It is LinkedIn's second-generation metadata hub, after WhereHows. Pardhu Gunnam and Mars Lan join us today from Metaphor, a company they co-founded to build out the DataHub ecosystem. Pardhu and Mars, along with the other co-founders of Metaphor, were part of the team at LinkedIn that built the DataHub project. They join the show today to talk about how DataHub democratizes data access for an organization, why the new DataHub architecture was critical to LinkedIn's growth, and what we can expect to see from the DataHub project moving forward.
Observability is a key feature of a well-architected application. Because building an observability system for a cloud application can be challenging, especially at scale, many organizations elect to use third-party observability platforms rather than build internal tools. But these third-party provider contracts often charge by volume of data collected, which can be unpredictable and difficult
The complexity of building web applications seems to have grown exponentially in the last several years. This added complexity may bring power, but it can also make applications brittle, costly, and difficult to maintain. Suborbital is an open-source project with a goal of making web application development simple. Its flagship project is Atmo, a platform
ELT, or “Extract, Load, and Transform,” is the process that modern data pipelines use to replicate data from a source and load it into a target system such as a cloud data warehouse. ELT is a more flexible evolution of the traditional “Extract, Transform, Load” (ETL) workflow used in pre-cloud systems. The power of ELT relies
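The ELT sequence can be sketched with Python's built-in sqlite3 standing in for a cloud warehouse (the table names and order data are invented for the example). The key point is the ordering: raw data is landed unchanged first, and the transform runs afterwards, inside the target database.

```python
import sqlite3

# Extract: raw rows as they might arrive from a source system.
raw_orders = [("2021-03-01", "alice", 120),
              ("2021-03-01", "bob", 80),
              ("2021-03-02", "alice", 50)]

db = sqlite3.connect(":memory:")  # toy stand-in for a data warehouse
# Load: land the raw data as-is, with no transformation on the way in.
db.execute("CREATE TABLE raw_orders (day TEXT, customer TEXT, amount INT)")
db.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)
# Transform: derive a modeled table inside the warehouse, after loading.
db.execute("""CREATE TABLE daily_revenue AS
              SELECT day, SUM(amount) AS revenue
              FROM raw_orders GROUP BY day ORDER BY day""")
print(db.execute("SELECT * FROM daily_revenue").fetchall())
# [('2021-03-01', 200), ('2021-03-02', 50)]
```

Keeping the raw table around is what makes ELT flexible: new transforms can be rebuilt from the untouched source data at any time.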
Many startups today begin their life as an open-source project. Open source projects allow early adopters of a technology to experiment, to contribute code and feedback, and to shape the evolution of the project in its early stages. When a “community maintainer” company emerges to provide service offerings based on that project, its early customer
Right now, more than 10 million people use notebooks like Jupyter in their workflow. Notebooks are open-source tools for creating and sharing documents with live code, equations, visualizations, and explanatory text. Notebooks like Jupyter have exploded in popularity over the past 5 years to become the standard tool for data science teams. They became especially important
Product teams sometimes double as data teams. They struggle through import errors, scrub long and complicated data sheets for consistency, and map spreadsheet fields on step 3 in a long instruction document. Data structuring and synchronization are very real problems that product teams regularly overcome. Flatfile uses AI-assisted data onboarding to eliminate repetitive work
Creation Labs is helping bring Europe one step closer to fully autonomous long-haul trucking. They have developed an AI Driver Assistance System (AIDAS) that retrofits to any commercial vehicle, starting with VW Crafters and MAN TGE trucks. Their system uses camera hardware mounted to the vehicle to capture video data that is processed with
Digital communities have grown exponentially in importance ever since most of the world went remote. Basically every popular online forum, message board, chat app, and other online social aggregator was created before this new normal. Many of these platforms lack sufficient organization or are simply outdated for a fully remote environment. If society continues to
Cryptocurrencies like Bitcoin and Dogecoin are electronic currencies with a complete transaction history stored on a blockchain. A cryptocurrency blockchain is a linear record of all the transactions between users for a given currency. This record is public and distributed across thousands of computers, which makes falsifying a transaction nearly impossible because the hacker would
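The tamper-evidence that makes falsifying a transaction so hard can be sketched in a few lines of Python with the standard hashlib module. This is a toy model (the transactions are invented, and real blockchains add proof-of-work, signatures, and Merkle trees), but it shows the linking idea: each block's hash commits to the previous block's hash.

```python
import hashlib

def block_hash(prev_hash, transactions):
    """Hash a block's transactions together with the previous block's hash."""
    data = prev_hash + "|" + ";".join(transactions)
    return hashlib.sha256(data.encode()).hexdigest()

# Build a tiny two-block chain.
genesis = block_hash("0" * 64, ["alice pays bob 5"])
second = block_hash(genesis, ["bob pays carol 2"])

# Tampering with the first block's transactions changes its hash, so the
# second block (which committed to `genesis`) no longer links to it.
tampered = block_hash("0" * 64, ["alice pays bob 500"])
print(tampered != genesis)                                   # True: edit is visible
print(second == block_hash(genesis, ["bob pays carol 2"]))   # True: honest chain verifies
```

Since thousands of computers hold copies of the honest chain, an attacker would have to rewrite every subsequent block on a majority of them, which is what makes falsification practically impossible.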
A major change in the software industry is the expectation of automation. The infrastructure for deploying code, hosting it, and monitoring it is now being viewed as a fully automatable substrate. Equinix Metal has taken the bare metal servers that you would see in data centers and fitted them with supreme automation and repeatability. This
The typical procedure many companies follow to reach production-level code is to design the program, code and test it in different environments, and put it in a pipeline to deploy to production. Developers can make it pretty far into building their core features before inevitably breaking to include enterprise features and security standards like Single Sign
Using artificial intelligence and machine learning in a product or database is traditionally difficult because it involves a lot of manual setup, specialized training, and a clear understanding of the various ML models and algorithms. You need to develop the right ML model for your data, train the model, evaluate it, optimize it, analyze it
A smart contract contains the “terms” of a blockchain transaction between a buyer and a seller as well as the capabilities to execute those terms. In order for smart contracts to include outside data from the world, such as stock market data, weather, sports data, etc…, the contract needs a third party service called an
In decentralized finance (DeFi) a liquidity pool is a collection of cryptocurrency funds created from the deposits of many users and usually multiple different currencies. There are 2 main types of pools: custodial and non-custodial. Custodial pools are controlled by a third-party manager that holds the private keys and the funds. They
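Non-custodial pools commonly price trades with a constant-product rule (the design popularized by Uniswap). This toy sketch assumes that design, with made-up pool balances; it is not any particular protocol's implementation:

```python
def constant_product_swap(pool_x, pool_y, dx, fee=0.003):
    """Swap dx of token X into the pool and receive dy of token Y,
    keeping x * y constant (the fee stays in the pool)."""
    dx_after_fee = dx * (1 - fee)
    new_y = (pool_x * pool_y) / (pool_x + dx_after_fee)
    dy = pool_y - new_y
    return dy, pool_x + dx, new_y

# A 100-token trade against a 1000/1000 pool gets back less than 100:
dy, x, y = constant_product_swap(1000.0, 1000.0, 100.0, fee=0.0)
assert 90 < dy < 91                 # the price slips as the trade grows
assert abs(x * y - 1_000_000) < 1e-6  # the product x * y is preserved
```

The slippage is what makes deep pools attractive: the larger the reserves, the smaller the price impact of any single trade.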
A ‘token' can represent almost anything in Ethereum, according to Ethereum.org: lottery tickets, points in an online platform, fiat currency, and much more. These tokens follow a standard called ERC-20 so that every token has the same type and value as any other token and behaves much like ETH itself. The platform Opyn lets users buy and
Decentralized applications, termed “dApps,” are applications that feel like normal apps but are actually deployed (mostly) on the Ethereum blockchain. This means dApps can't be taken down, can't be censored or blocked, typically use Ethereum accounts as identity, and would only experience downtime if Ethereum itself went down. There are a lot of things you
IT infrastructure comprises the components required to operate IT environments: networks, virtual machines or containers, operating systems, hardware, data storage, and so on. As companies build out different deployment environments with infrastructure configurations, they must maintain the different environments, replicate them, and update them. The management of infrastructure, often automated to some extent, is referred
A data warehouse is a data management system that often contains large amounts of historical data and is used for business intelligence activities like analytics. It centralizes customer data from multiple sources to be an organization's single source of truth. Getting the data from your data warehouse into the different applications used by your organization
A decentralized exchange, usually referred to as a DEX, is a platform for exchanging cryptocurrencies. Depending on trading volume for different coins, some DEXs are more liquid than others. On the one hand you can freely swap unlisted tokens and maintain full control over your private keys and wallet information. On the other hand, without
Volatility is the degree of fluctuation of something's price. Highly volatile assets may see rapid and large price changes, while less volatile assets will maintain a steady price. This concept is important in decentralized finance because cryptocurrencies tend to be volatile assets. The company Synthetix provides assets called Synths that provide exposure to an asset
Non-fungible tokens are proofs of authenticity that are stored on a blockchain. Unlike fungible tokens, such as cryptocurrencies which are interchangeable, non-fungible tokens aren't inherently equivalent to any other token. Because they are unique, they can be used to represent any unique asset. Their presence on a blockchain enables an NFT owner to trade the
A liquid market enables individuals or groups to quickly buy and sell assets. Decentralized platforms can struggle to execute trades when their platform does not have much liquidity for a specific token. Newer tokens or tokens with limited supply are most often the least liquid because there might be an imbalance of buyers and sellers.
Encryption algorithms provide the means to secure and transfer sensitive information by taking input and transforming it into an unreadable output. Usually one or more special keys are needed to unscramble the information back to the original input. These algorithms power the security of everything from our cell phone lock screens to Fortune 500
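The scramble-and-unscramble round trip can be sketched with a toy XOR cipher. This is purely illustrative (the key and message are made up, and XOR with a short repeating key is not secure), but it shows the key-dependent transform and its inverse:

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with a repeating key. Applying the same key twice
    restores the original, since x ^ k ^ k == x. Toy only, not secure."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

msg = b"lock screen PIN: 4821"
scrambled = xor_cipher(msg, b"secret")
assert scrambled != msg                         # unreadable without the key
assert xor_cipher(scrambled, b"secret") == msg  # the key unscrambles it
```

Real algorithms like AES follow the same contract (key in, ciphertext out, key in, plaintext back) with vastly stronger mathematics behind the transform.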
Large portions of software development budgets are dedicated to testing code. A new component may take weeks to thoroughly test, and even then mistakes happen. If you consider software defects as security issues, then the concern goes well beyond an application temporarily crashing. Although even minor bugs can cost companies a lot of time to
Geospatial technology impacts every person who uses a smartphone, drives a car, or flies in airplanes. It refers to all of the technology used to acquire and interpret geographic information. In more advanced settings, geospatial technology is used for constructing dynamic maps, 3D visualizations, and scientific and governmental simulations. The company Makepath specializes in geospatial
Modern applications are increasingly built as large, distributed systems. A distributed system is a program whose components are located on different machines and communicate with one another to create a single cohesive app. Components may exist as multiple instances across “nodes,” the computers hosting them, which form clusters of nodes that span across geographic
Cloud computing provides tools, storage, servers, and software products through the internet. Securing these resources is a constant process for companies deploying new code to their cloud environments. It's easy to overlook security flaws because company applications are very complex and many people work together to develop them. Wyze Labs, for example, had millions of
In software engineering, telemetry is the data that is collected about your applications. Unlike logging, which is used in the development of apps to pinpoint errors and code flows, telemetry encompasses all operational data: logs, metrics, events, traces, usage, and other analytical data. Companies usually visualize this information to troubleshoot problems and understand
Natural Language Processing (NLP) is a branch of artificial intelligence concerned with giving computers the ability to understand text and spoken words. “Understanding” includes intent, sentiment, and what's important in the message. NLP powers things like voice-operated software, digital assistants, customer service chat bots, and many other academic, consumer and enterprise tools. The company Botpress
Microservice architecture has become very common over the past few years because of the availability of containers and container orchestrators like Kubernetes. While containers are overall positive for scaling apps and making them more available, they've also introduced hurdles like persisting data and state across container restarts or pod failures. Development teams put significant work
The traveling salesman problem is a classic challenge of finding the shortest and most efficient route for a person to take given a list of destinations. This is one of many real-world optimization problems that companies encounter. How should they schedule product distribution, or promote product bundles, or define sales territories? The answers to these
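The brute-force version of that search can be sketched directly: try every ordering and keep the cheapest round trip. It is exact but factorial-time, which is why real scheduling and routing products need smarter optimizers. The coordinates below are made up:

```python
from itertools import permutations

def dist(a, b):
    """Straight-line distance between two (x, y) points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def shortest_route(start, stops):
    """Try every ordering of the stops and keep the cheapest round trip.
    Exact but O(n!), so only practical for a handful of destinations."""
    best, best_len = None, float("inf")
    for order in permutations(stops):
        route = [start, *order, start]
        length = sum(dist(route[i], route[i + 1]) for i in range(len(route) - 1))
        if length < best_len:
            best, best_len = list(order), length
    return best, best_len

order, length = shortest_route((0, 0), [(0, 1), (1, 1), (1, 0)])
assert abs(length - 4.0) < 1e-9  # perimeter of the unit square
```

With 10 destinations this loop already evaluates 3,628,800 orderings; production solvers use heuristics and linear programming instead.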
An application programming interface, API for short, is the connector between 2 applications. For example, a user interface that needs user data will call an endpoint, like a special URL, with request parameters and receive the data back if the request is valid. Modern applications rely on APIs to send data back and forth to
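That request flow starts with composing the endpoint URL the client will call. A minimal sketch with the standard library, where the host, endpoint, and parameters are all hypothetical:

```python
from urllib.parse import urlencode

def build_request_url(base, endpoint, params):
    """Compose the URL a client would call for a given endpoint,
    with request parameters encoded into the query string."""
    return f"{base}/{endpoint}?{urlencode(params)}"

url = build_request_url("https://api.example.com", "users",
                        {"id": 42, "fields": "name,email"})
assert url == "https://api.example.com/users?id=42&fields=name%2Cemail"
# an HTTP client would now GET this URL and, if the request is valid,
# receive the user data back (typically as JSON)
```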
Apache Hudi is an open-source data management framework used to simplify incremental data processing and data pipeline development. This framework more efficiently manages business requirements like data lifecycle and improves data quality. Some common use cases for Hudi are record-level inserts, updates, and deletes; simplified file management and near real-time data access; and simplified CDC
Apache Spark is a popular open-source analytics engine for large-scale data processing. Applications can be written in Java, Scala, Python, R, and SQL, and have flexible options for where to run, like Kubernetes or the cloud. The company Data Mechanics is a cloud-native Spark platform for data engineers. It runs continuously optimized Apache
Columnar databases store and retrieve columns of data rather than rows of data. Each block of data in a columnar database stores up to 3 times as many records as row-based storage. This means you can read data with a third of the processing power needed for row-based storage, among other advantages. The company Altinity is
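The layout difference can be sketched in plain Python with made-up records. The columnar aggregate scans one contiguous array, while the row-based version has to touch every whole record:

```python
# Row layout: one record per entry; reading one column touches every record.
rows = [
    {"id": 1, "region": "EU", "revenue": 120},
    {"id": 2, "region": "US", "revenue": 340},
    {"id": 3, "region": "EU", "revenue": 95},
]

# Columnar layout: each column stored contiguously; an aggregate
# scans only the column it actually needs.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "revenue": [120, 340, 95],
}

row_total = sum(r["revenue"] for r in rows)  # reads whole records
col_total = sum(columns["revenue"])          # reads one array
assert row_total == col_total == 555
```

Columnar storage also compresses better, since values of the same type and similar range sit next to each other on disk.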
The company Skynet Labs provides an open protocol for hosting data and web applications on the decentralized web. Skynet allows for decentralized, censorship-resistant, highly redundant storage and applications that are available around the globe. Developers don't pay for their application's storage, can launch apps with access to a user's data right away, are free from
Application Programming Interfaces (APIs) are interfaces that enable multiple software applications to send and retrieve data from one another. They are commonly used for retrieving, saving, editing, or deleting data from databases, transmitting data between apps, and embedding third-party services into apps. The company BaseTen helps companies build and deploy machine learning APIs and applications.
Apache Superset is an open-source, fast, lightweight and modern data exploration and visualization platform. It can connect to any SQL-based data source through SQLAlchemy at petabyte scale. Its architecture is highly scalable and it ships with a wide array of visualizations. The company Preset provides a powerful, easy-to-use data exploration and visualization
Running applications in containerized environments involves regularly organizing, adding, and replacing containers. This complex job may involve managing clusters of containers in different geographic locations with different configuration requirements. Platforms like Kubernetes are great for managing this complexity but come with a steep learning curve before you can get anything off the ground efficiently. The company Portainer provides a
Cloud data warehouses are databases hosted in cloud environments. They provide typical benefits of the cloud like flexible data access, scalability, and performance. The company Firebolt provides a cloud data warehouse built for modern data environments. It decouples storage and compute to operate on top of existing data lakes like S3. It computes orders of
Arun Kumar is an Assistant Professor in the Department of Computer Science and Engineering and the Halicioglu Data Science Institute at the University of California, San Diego. His primary research interests are in data management and systems for machine learning/artificial intelligence-based data analytics. Systems and ideas based on his research have been released as part
Data exploration uses visual exploration to understand what is in a dataset and the characteristics of the data. Data scientists explore data to understand things like customer behavior and resource utilization. Some common programming languages used for data exploration are Python, R, and Matlab. Doris Jung-Lin Lee is currently a Graduate Research Assistant at the
Flutter is a UI toolkit developed by Google that helps developers build natively compiled applications for mobile, web, desktop, and embedded devices from a single code base. Development is fast because the screen “hot reloads” as you develop, the architecture is layered for fast and expressive designs, and its widgets incorporate all critical platform differences
AWS Outposts is a fully managed service that offers the same AWS infrastructure, AWS services, APIs, and tools to virtually any datacenter, co-location space, or on-premises facility for a truly consistent hybrid experience. AWS Outposts is ideal for workloads that require low latency access to on-premises systems, local data processing, data residency, and migration of
Platforms like Ethereum have billions of dollars of market cap and large developer communities. However, it is still a challenge to build widely adopted DApps on it because of current limitations. Blockchain Proof of Work transactions are typically slow, and Proof of Stake transactions trade off decentralization to achieve high throughput. Transaction fees get expensive,
Next week Corey Quinn will be guest hosting on Software Engineering Daily, presenting a Tour of the Cloud. Corey Quinn is the Chief Cloud Economist at The Duckbill Group, where he helps companies fix their AWS bill by making it smaller and less horrifying. If you’re looking to lower your AWS bill or negotiate a
Mark Saroufim is the author of an article entitled “Machine Learning: The Great Stagnation”. Mark is a PyTorch Partner Engineer with Facebook AI. He has spent his entire career developing machine learning and artificial intelligence products. Before joining Facebook to do PyTorch engineering with external partners, Mark was a Machine Learning Engineer at Graphcore. Before
Corey Quinn is guest hosting on Software Engineering Daily this week, presenting a Tour of the Cloud. Corey Quinn is the Chief Cloud Economist at The Duckbill Group, where he helps companies fix their AWS bill by making it smaller and less horrifying. If you’re looking to lower your AWS bill or negotiate a new
Coinbase is a very popular and well trusted cryptocurrency platform for buying and selling digital assets like Bitcoin, Ethereum, and many more. With Coinbase you can manage your portfolio of cryptocurrencies in 1 place like you would for other investments. There are added features like scheduling recurring purchases of assets, time-delayed withdrawals from digital vaults,
Amundsen was started at Lyft and is the leading open-source data catalog with the fastest-growing community and the most integrations. Amundsen enables you to search your entire organization by text search, see automated and curated metadata, share context with coworkers, and learn from others by seeing most common queries on a table or frequently
Delivering SaaS products involves a lot more than just building the product. SaaS management involves customer relationship management, licensing, renewals, maintaining software visibility, and the general management of the technology portfolio. The company Blissfully helps businesses manage their SaaS products from within a complete IT platform with organization, automation, and security built in. The Blissfully
The company StreamSets is enabling DataOps practices in today's enterprises. StreamSets is a data engineering platform designed to help engineers design, deploy, and operate smart data pipelines. StreamSets Data Collector is a codeless solution for designing pipelines, triggering CDC operations, and monitoring data in flight. StreamSets Transformer uses Apache Spark to generate insights about your
Proof of Work cryptocurrency mining, as used on the Ethereum and Bitcoin blockchains, requires huge amounts of energy to validate transactions and generate new tokens. The alternative, Proof of Stake, needs large deposits of assets to be staked up front in order to work. While both consensus protocols have their own drawbacks, they are the
Complete information games are games where every player has information about the game sequence, strategies, and payoffs throughout gameplay. Playing chess, for example, relies on knowing the location of every piece everywhere on the board. In an incomplete information game like Minecraft, you continually gain new information during gameplay. Until very recently, incomplete information was
In this episode we discuss software investing, business, and the future with David Rosenthal, co-host of the Acquired podcast. This interview was also recorded as our very first video podcast. Check out the video on the Software Daily YouTube channel. Sponsorship inquiries: sponsor@softwareengineeringdaily.com
The quantity and quality of a company's data can mean the difference between a major success or major failure. Companies like Google have used big data from its earliest days to steer their product suite in the direction consumers need. Other companies, like Apple, didn't always use big data analytics to drive product design, but
Uber is one of many examples we've discussed on this show that has changed the world with big data analysis. With over 8 million users, 1 billion Uber trips and people driving for Uber in over 400 cities and 66 countries, Uber has redefined an entire industry in a very short time frame. It's difficult
The cloud has delivered amazing benefits like on-demand infrastructure that's easy to use, pay-as-you-go subscription plans, and effortless scaling of applications. This flexibility minimizes the growing pains for businesses and explains why today's startups and established companies are both building apps on the cloud. However, the costs of using the cloud stack up once companies
For some data problems, you may be more concerned with the state of data at a particular point. A ticket is booked, or it's not. How many poetry submissions were made to the contest? This is relational data. For other problems, you're concerned with the change in data over time. Solar energy consumption, for example,
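The two kinds of questions can be sketched side by side with made-up data: a point-in-time count for the relational case, and per-interval deltas for the time-series case:

```python
from datetime import date

# Relational question: the state at a point in time. Booked or not? How many?
submissions = [{"id": 1, "status": "booked"}, {"id": 2, "status": "booked"}]
booked_count = sum(1 for s in submissions if s["status"] == "booked")

# Time-series question: how the value changes over time.
readings = [(date(2021, 7, 1), 10.0),
            (date(2021, 7, 2), 13.5),
            (date(2021, 7, 3), 12.0)]
deltas = [(t2, v2 - v1)
          for (_, v1), (t2, v2) in zip(readings, readings[1:])]

assert booked_count == 2
assert deltas == [(date(2021, 7, 2), 3.5), (date(2021, 7, 3), -1.5)]
```

Relational databases optimize for the first shape of question; time-series databases optimize for the second, where the timestamp is the primary axis.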
In this episode we discuss plug and play auth, password management, and crypto with Sean Li, co-founder and CEO of Magic. This interview was also recorded as a video podcast. Check out the video on the Software Daily YouTube channel. Sponsorship inquiries: sponsor@softwareengineeringdaily.com
ELT is a process for copying data from a source system into a target system. It stands for “Extract, Load, Transform” and starts with extracting a copy of data from the source location. It's loaded into the target system like a data warehouse, and then it's ready to be transformed into a usable format for
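The three steps can be sketched end to end with SQLite standing in for the warehouse. The table, view, and records here are all made up for illustration:

```python
import sqlite3

# Extract: copy raw records out of a (hypothetical) source system.
raw = [("2021-06-29", "order", 120), ("2021-06-29", "refund", -30)]

# Load: land them untransformed in the target system.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_events (day TEXT, kind TEXT, amount INT)")
db.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", raw)

# Transform: build a usable view inside the warehouse itself.
db.execute("""
    CREATE VIEW daily_net AS
    SELECT day, SUM(amount) AS net FROM raw_events GROUP BY day
""")
result = db.execute("SELECT * FROM daily_net").fetchall()
assert result == [("2021-06-29", 90)]
```

Loading before transforming is what distinguishes ELT from the older ETL pattern: the raw data stays available in the warehouse, so new transformations can be added later without re-extracting.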
Continuous integration is a coding practice where engineers deliver incremental and frequent code changes to create higher quality software and collaborate more. Teams attempting to continuously integrate new code need a consistent and automated pipeline for reviewing, testing, and deploying the changes. Otherwise change requests pile up in the queue and nothing gets integrated efficiently.
There are over 4 billion people using email. Many people using email for business communicate quick questions to colleagues, send repetitive, template-based information to potential customers and freshly hired employees, and repeat a lot of the same phrases. We actually repeat phrases in a lot of written formats. How often do you copy and paste
When I worked as an engineer at Amazon, I would arrive at the office every day before 6:30am. Amazon's enormous campus is in downtown Seattle. The company expands across the city like a massive, growing organism composed of towering buildings and ecospheres. The constant wet mist of Seattle always seems lighter within Amazon campus, perhaps
SOC 2 is a security audit to prove that SaaS companies have secured their company and customer data. It's often considered the minimum audit necessary to sell software. HIPAA is a federal law regulating how sensitive medical information about patients must be handled. ISO 27001 is the global benchmark for demonstrating your information security management
In this episode we discuss coding bootcamps, fear, fitness, and more with Ruben Harris, CEO of Career Karma. This interview was also recorded as a video podcast. Check out the video on the Software Daily YouTube channel. Sponsorship inquiries: sponsor@softwareengineeringdaily.com
The company Dynatrace provides intelligent observability, continuous automation, and causation-based AI to help Cloud Ops, DevOps, and SRE teams transform faster, innovate more, and deliver better business outcomes. They offer application performance monitoring, infrastructure monitoring, cloud automation, application security, and much more. While being an industry leader in simplifying complex cloud applications, Dynatrace is continuously
We talk to a lot of exciting startups from all over the world about their tech products. Recently we've heard from people innovating in the blockchain space, cloud infrastructure, databases, and automation tools. However, in today's episode, we're going to talk about how these tech startups get investments to build their great products in the
Here is the full audiobook for “Move Fast: How Facebook Builds Software”. Continue listening to the end to hear my most recent album “Simulation”.
In this episode we discuss the new Move Fast book, as well as many aspects of the current state of software engineering. Daliana Liu interviews Jeff Meyerson, host of Software Daily and author of Move Fast. This interview was also recorded as a video podcast. Check out the video on the Software Daily YouTube channel.
Data science is an interdisciplinary field that combines strong technical skills with industry knowledge to perform a large range of jobs. Data scientists solve business questions with hands-on work cleaning and analyzing data, building machine learning models and applying algorithms, and generating dynamic visuals and tools to understand the world from the data it generates.
DevOps has shortened the development life cycle for countless applications and is embraced by companies around the world. But managing and monitoring multiple environments is still a major pain point, particularly when companies need to mix cloud and legacy systems. Knowing when services go down and quickly pinpointing the cause is essential for continuous development.
Big data analytics is the process of collecting data, processing and cleaning it, then analyzing it with techniques like data mining, predictive analytics, and deep learning. This process requires a suite of tools to operate efficiently. Data analytics can save companies money, drive product development, and give insight into the market and customers. The company
Governments, consumers, and companies across the world are becoming more aware of and attentive to the risks and causes of climate change. From recycling to using solar power, people are looking for ways to reduce their carbon footprint. Sectors like finance, government, and consulting are looking for ways to understand climate data to make
In 2003, Google developed a robust cluster management system called Borg. This enabled them to manage clusters with tens of thousands of machines, moving them away from virtual machines and firmly into container management. Then, in 2014, they open-sourced Kubernetes, or K8s, a system heavily influenced by Borg. Now, in 2021, CockroachDB is a distributed
If you've ever googled a CS or programming question, you likely found an answer (or many) on Stack Overflow. Founded in 2008 and named after a common computing error, Stack Overflow empowers the world to develop technology through collective knowledge. More than 100 million people visit Stack Overflow every month making it one of the
Apache Pulsar is a cloud-native, distributed messaging and streaming platform originally created at Yahoo! and now a top-level Apache Software Foundation project (pulsar.apache.org). Pulsar is used by many large companies like Yahoo!, Verizon media, Tencent, and Splunk. The company DataStax, an open, multi-cloud stack for modern data apps, has added to their product stack Astra
In the previous episode, Pulsar Revisited, we discussed how the company DataStax has added to their product stack Astra Streaming, their cloud-native messaging and event streaming service that's built on top of Apache Pulsar. We discussed Apache Pulsar and the added features DataStax offers like injecting machine learning into your data streams and viewing real-time
We've been running Software Daily for 6 years. When we started it back in 2015, the goal was to create a software engineering podcast that was 60% as good as Software Engineering Radio, but aired with 5x the frequency. 5 shows per week, 50 weeks per year, 5 ads per show, means 1,250 advertising spots.
Prophecy is a complete Low-Code Data Engineering Platform for the Enterprise. Prophecy enables all your teams on Apache Spark with a unique low-code designer. While you visually build your Dataflows – Prophecy generates high-quality Spark code on Git. Then, you can schedule Spark workflows with Prophecy's low-code Airflow. Not only that, Prophecy provides end-to-end visibility
In today's containerized world, it's common to encounter similar issues with known solutions across multiple pods. For most people there are 2 solutions: go pod-by-pod finding and fixing the problem, or do that while also spending months trying to automate the process. Either way costs significant time and manual labor. The company Shoreline orchestrates real-time debugging
In this episode we discuss venture capital and more with Preface Ventures Founder and General Partner Farooq Abbasi. This interview was also recorded as a video podcast. Check out the video on the Software Daily YouTube channel. Sponsorship inquiries: sponsor@softwareengineeringdaily.com
Enterprise data warehouses store all company data in a single place to be accessed, queried, and analyzed. They're essential for business operations because they manage data from multiple sources, provide context, and include built-in analytics tools. While keeping a single source of truth is important, easily moving data from the warehouse to other applications
Blockchain protocols like Bitcoin and Ethereum have changed the cyber world dramatically in the last decade. They've created communities of like-minded developers, generated new financial markets, and popularized “decentralization” in computer networks. However, they require large resources to operate which makes scaling difficult and transactions expensive. Hedera is a decentralized public network that takes the
Kubernetes is an open-source container orchestration system. It makes managing container clusters possible as well as deploying code changes to these containers. Microservice architecture is widely used today in large part because of Kubernetes. However, using it can require a large time commitment due to its learning curve. The company Okteto empowers developers to innovate
The term “boilerplate code” refers to code sections that are repeated across many projects with little to no variation. Every developer is familiar with boilerplate code; whether it's pom.xml files in Java or the setup of React.js applications, tweaking boilerplate for every project is inevitable. Actually, the company Wasp believes writing boilerplate code doesn't
Direct-to-Consumer companies sell their products without going through a traditional middleman like an outlet store or wholesaler. By posting content on platforms like TikTok, Facebook, and YouTube, companies can reach millions of users. TikTok, for example, has an estimated 1 billion monthly active users (wallaroomedia). The company Kindred Studios helps direct-to-consumer companies create consistent, beautiful content
Latency is the time it takes to get from point A to point B. In programming, this might be the time from a user selecting their photos library to the pictures reaching their computer screen from the database. Fly.io is a simple platform for running full-stack apps and databases close to your users. Some available
According to Fugue's new State of Cloud Security 2020 report, cloud misconfiguration remains the top cause of data breaches in the cloud, and millions of database servers are currently exposed across cloud providers. Some of the leading reasons are a lack of adequate oversight and too many APIs and interfaces to govern (securityaffairs.co). Argos Security
Serverless computing is a cloud computing solution that lets developers deploy applications to containers without managing the servers themselves. Servers and resources are provisioned automatically, you pay only for what you use, and you experience little to no errors or downtime (ionos). Google Cloud Run is a managed compute platform that enables you to run containers that
Auren Hoffman is the CEO of SafeGraph. In this episode we discuss data as a service and more. This interview was also recorded as a video podcast. Check out the video on the Software Daily YouTube channel. Sponsorship inquiries: sponsor@softwareengineeringdaily.com
This episode was published on the GeekWire podcast. Subscribe to GeekWire for more great content. Facebook CEO Mark Zuckerberg told employees recently that the company's long-term goal is to “bring the metaverse to life” — helping to create an interconnected world of physical, virtual and augmented reality spaces that will reshape the way we work, interact with
Whether sending messages, shopping in an app, or watching videos, modern consumers expect information and responsiveness to be near-instant in their apps and devices. From a developer's perspective, this means clean code and a fast database. Apache Druid is a database built to power real-time analytic workloads for event-driven data, like user-facing applications, streaming, and
Pay-as-you-go pricing has become a strong selling point for modern SaaS companies as well as cloud-based companies. Public cloud providers, for example, typically charge you for only what you use. But implementing this option is challenging because it requires advanced platform analytics. The company Octane is a drop-in metered billing system that gives you
Domain names are the address of your website that people type into the browser URL bar. Once purchased, a domain name is stored on your behalf by custodians like Google domains. Blockchain domains, on the other hand, are similar to regular domain names except they are stored and controlled in your cryptocurrency wallet. The company
Time series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time. This could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data (influxdata.com). The platform InfluxData is designed for building and operating time series applications.
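Downsampling and aggregation, mentioned above, are the bread and butter of time series work. This pure-Python sketch rolls 15-second samples up into 1-minute averages; it is only an illustration of the concept, not InfluxDB's query engine.

```python
from datetime import datetime, timedelta
from statistics import mean

# Illustrative raw measurements: (timestamp, cpu_percent) sampled every 15s
start = datetime(2021, 9, 1, 12, 0, 0)
points = [(start + timedelta(seconds=15 * i), 40 + i) for i in range(8)]

def downsample(points, window_seconds):
    """Group raw points into fixed time windows, keeping the mean value."""
    buckets = {}
    for ts, value in points:
        epoch = ts.timestamp()
        bucket_start = epoch - (epoch % window_seconds)  # align to window
        buckets.setdefault(bucket_start, []).append(value)
    return {datetime.fromtimestamp(b): mean(vs)
            for b, vs in sorted(buckets.items())}

# Eight 15-second samples become two 1-minute averages
summary = downsample(points, 60)
```

Real time series databases apply the same idea continuously and at scale, often discarding the raw points once the downsampled rollups are written.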
“In October 1958, Physicist William Higinbotham created what is thought to be the first video game. It was a very simple tennis game, similar to the classic 1970s video game Pong, and it was quite a hit at a Brookhaven National Laboratory open house” (aps.org). 63 years have passed, and video games are ubiquitous. The
Application security is usually done with a set of tools and services known as SIEM – Security Information and Event Management. SIEM tools usually try to provide visibility into an organization's security systems, as well as event log management and security event notifications. The company Panther takes traditional SIEM security a step further. Panther processes
Google uses automated programs called spiders, or crawlers, to index and rank web pages. Then, when a user searches for something, it uses a special algorithm to determine the order of results to display (howstuffworks). This process, of course, applies to web pages on the internet. There are 2 major projects, worked on by the
In the late 1970s a printer at MIT kept jamming, resulting in regular pileups of print jobs in the printer's queue. To solve this problem, some computer scientists wrote a software program that alerted every user in the backed-up queue “The printer is jammed, please fix it.” When a man named Richard Stallman was
ETL stands for “extract, transform, load” and refers to the process of integrating data from many different sources into one location, usually a data warehouse. This process has become especially important for companies as they use many different services to collect and manage data. The company Grouparoo provides an open source framework that helps you
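The extract-transform-load flow described above can be sketched in a few lines. This example uses an in-memory SQLite database as a stand-in for a warehouse; the source data and table shape are invented for illustration and have nothing to do with Grouparoo's actual framework.

```python
import sqlite3

# "Extract": rows pulled from two hypothetical source systems
crm_rows = [("alice@example.com", "Alice"), ("bob@example.com", "Bob")]
billing_rows = [("alice@example.com", 120.0), ("bob@example.com", 80.0)]

def etl(crm_rows, billing_rows):
    """Transform (join sources on email) and load into a warehouse table."""
    db = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    db.execute("CREATE TABLE customers (email TEXT, name TEXT, total_spend REAL)")
    billing = dict(billing_rows)
    merged = [(email, name, billing.get(email, 0.0)) for email, name in crm_rows]
    db.executemany("INSERT INTO customers VALUES (?, ?, ?)", merged)
    db.commit()
    return db

db = etl(crm_rows, billing_rows)
rows = db.execute("SELECT name, total_spend FROM customers ORDER BY name").fetchall()
# rows -> [('Alice', 120.0), ('Bob', 80.0)]
```

Real ETL adds incremental loads, schema evolution, and retries, but the extract/transform/load phases remain recognizable.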
Shinji Kim is Founder and CEO of Select Star. In this episode we discuss data discovery and more. This interview was also recorded as a video podcast. Check out the video on the Software Daily YouTube channel. Sponsorship inquiries: sponsor@softwareengineeringdaily.com
Kubernetes is an open source container orchestration service released by Google in 2014. It has quickly grown into a platform with a huge community of enthusiasts and professionals. Besides becoming the de facto standard for container orchestration, it has fostered an ecosystem of related tools and services with increasing power and sophistication (opensource.com). Argo, a
Whether organizing projects, working from home, or conducting business, you rely on many apps and cloud services to do the job. Necessary as these apps are, switching between them and keeping them interconnected and updated is a challenge. They easily become disorganized, which makes using them less efficient. The company ClickUp solves
This episode is hosted by Kyle Polich of the Data Skeptic podcast. We’re glad to welcome Kyle to the Software Daily team. Becoming a contributor to an existing software project can be a daunting task for an engineer. A common convention is to add a README file to your repository to serve as a trailhead
The Israeli Tech Radar is an opinionated map of the latest technologies and trends in the Israeli tech industry. Now in its fifth edition, the Tech Radar was built in collaboration with Monday, Wix, Riskified, Netapp, Tabula, and other tech companies. Lior Kanfi is the CEO of Tikal. In this episode, we interview Lior about
Instabase is a technology platform for building automation solutions. Users deploy it onto their own infrastructure and can leverage the tools offered by the platform to build complex workflows for handling tasks like income verification and claims processing. In this episode we interview Anant Bhardwaj, founder of Instabase. He describes Instabase as an operating system.
By most accounts, the first databases came online in the 1960s. This class of software has continued to evolve alongside the technology it runs on and the applications it supports. In the early days, databases were typically closed source commercial products. Today, databases run in the cloud on distributed systems. Increasingly, the leading tools
A software engineer will make many mistakes on their career journey. In time, engineers learn to make smaller mistakes, recognize them faster, and build with appropriate guardrails. Delivering software in a timely and efficient fashion often demands that developers carefully optimize tradeoffs to solve the problems at hand. Software Mistakes and
An application network is a way to connect applications, data and devices through APIs that expose some or all of their assets and data on the network. That network allows other consumers from other parts of the business to come in and discover and use those assets (mulesoft.com). The company Tetrate provides the tools necessary
The Easy Thing about Easy Things
Modern companies leverage dozens or even hundreds of software solutions to solve specific needs of the business. Organizations need to collect all these disparate data sources into a data warehouse in order to add value. The raw data typically needs transformation before it can be analyzed. In many cases, companies develop homegrown solutions, thus reinventing
As developers hone their craft, becoming more productive often means learning utilities and tools at the command line. The right combination of parsing commands chained together through pipes can enable engineers to quickly and efficiently automate many ad hoc data processing tasks. In this episode I speak with Adam Gordon Bell about some of his
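A classic pipeline like `tr ' ' '\n' | sort | uniq -c` is just composed stages, and the same structure can be mimicked in Python with generators, each stage consuming the previous one's output. The log lines below are invented for illustration.

```python
from collections import Counter

def tokenize(lines):
    """Stage 1: split lines into words (like `tr ' ' '\\n'`)."""
    for line in lines:
        yield from line.split()

def count_unique(words):
    """Stages 2 and 3: sort and count duplicates (like `sort | uniq -c`)."""
    return sorted(Counter(words).items())

log = ["GET /home", "GET /about", "GET /home"]
counts = count_unique(tokenize(log))
# counts -> [('/about', 1), ('/home', 2), ('GET', 3)]
```

The appeal of shell pipes is that each stage stays tiny and reusable; the generator version keeps that property while being easy to embed in a larger program.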
Web applications often have some sort of login system, and once a user creates an account, they have access to features anonymous users can't see. In time, application designers will often add an admin level of access for special users. This is often a slow trickle of technical debt. Proper execution of a programmatic authorization
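Programmatic authorization is often expressed as a decorator or middleware that gates access by role. Here is a minimal sketch; the `User` shape, role names, and function names are all invented for illustration.

```python
from functools import wraps

# Hypothetical user record; a real app would load this from a session
class User:
    def __init__(self, name, roles):
        self.name = name
        self.roles = set(roles)

def require_role(role):
    """Decorator sketch: reject callers lacking the given role."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(user, *args, **kwargs):
            if role not in user.roles:
                raise PermissionError(f"{user.name} lacks role {role!r}")
            return fn(user, *args, **kwargs)
        return wrapper
    return decorator

@require_role("admin")
def delete_account(user, account_id):
    return f"deleted {account_id}"

admin = User("dana", ["admin"])
viewer = User("sam", ["viewer"])
delete_account(admin, 42)      # -> 'deleted 42'
# delete_account(viewer, 42)   # raises PermissionError
```

The technical debt the episode mentions typically arrives when such checks are sprinkled ad hoc through handlers instead of centralized like this.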
Facebook's goal is to connect the world and allow every human to foster better relationships. Today is a rare treat. Mark Zuckerberg is the CEO of Facebook. After I spent nearly 3 years writing a book about how his company works, he has finally agreed to do an interview. Special thanks to Jennifer Li at
Money laundering is not a new crime. However, the growth of digital communications has greatly expanded the opportunity for money launderers to find innovative new ways to hide their true intent. Some estimates suggest the amount of money laundered each year could be as high as 2-5% of the world's GDP. Unit21 is a customizable no-code platform for risk and compliance
Interest in autonomous vehicles dates back to the 1920s. It wasn't until the 1980s that the first truly autonomous vehicle prototypes began to appear. The first DARPA Grand Challenge took place in 2004, offering competitors $1 million to complete a 150-mile course through the Mojave desert. The prize was not claimed. Since then, rapid
A developer's core deliverables are individual commits and the pull requests they aggregate into. While the number of lines of code written alone may not be very informative, in total, the code and metadata about the code found in tracking systems present a rich dataset with great promise for analysis and productivity optimization insights. LinearB
The dream of machines with artificial general intelligence is entirely plausible in the future, yet well beyond the reach of today's cutting edge technology. However, a virtual agent need not win in Alan Turing's Imitation Game to be useful. Modern technology can deliver on some of the promises of narrow intelligence for accomplishing specific tasks.
Financial technology or fintech has always been a hot topic. This is increasingly true in recent years as disruptive companies enter the market to give better alternatives and solutions to consumers. Current is focused on creating better financial outcomes. In addition to providing banking services, their app has many tools and reminders to help users
Tedious, repetitive tasks are better handled by machines. Unless these tasks truly require human intelligence, repetitive tasks are often good candidates for automation. Implementing process automation can be challenging and technical. Increasingly, engineers are seeking out tools and platforms to facilitate faster, more reliable automation. In this episode I talk to Yaseer Sheriff, Co-Founder and
The way we write, compile, and run software has continued to evolve since computer programming began. The cloud, serverless, no-code, and CI/CD are all contemporary ideas introduced to help software engineers spend more time on their application and less time on the chores of running it. Darklang is a new way of building serverless backends.
Applications write data to persistent storage like a database. The most popular database query language is SQL which has many similar dialects. SQL is expressive and powerful for describing what data you want. What you do with that data requires a solution in the form of a data pipeline. Ideally, these analytical workflows can follow
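The declarative nature of SQL described above can be shown with a tiny example: the query states what result is wanted, and the engine decides how to compute it. A pipeline step might then materialize that result into a downstream table, as sketched here with Python's built-in SQLite driver and invented data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("acme", 10.0), ("acme", 15.0), ("globex", 7.5)])

# Declarative SQL describes *what* we want, not how to fetch it.
# A pipeline step might materialize this result for downstream consumers.
db.execute("""
    CREATE TABLE order_totals AS
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
""")
totals = db.execute(
    "SELECT customer, total FROM order_totals ORDER BY customer").fetchall()
# totals -> [('acme', 25.0), ('globex', 7.5)]
```

An analytical workflow chains many such transformations together, with scheduling and dependency tracking handled by the pipeline tooling around the SQL.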
Companies that gather data about their users have an ethical obligation and legal responsibility to protect the personally identifiable information in their dataset. Ideally, developers working on a software application wouldn't need access to production data. Yet without high-quality example data, many technology groups stumble on avoidable problems. Organizations need a solution to protect privacy
A monorepo is a version control strategy in which all your code is contained in one potentially large but complete repository. The monorepo is in stark contrast to an alternative approach in which software teams independently manage microservices or deliver software as libraries to be imported in other projects.
Phishing attacks, malware, and ransomware are just some of the major threats everyone connected to the internet faces. For companies, the stakes are especially high. Setting up a secure infrastructure is difficult. Your adversary only needs to find one flaw to get in. Vancord is a private cybersecurity company, based in Connecticut, that was founded
Software Daily is a place to create software. SoftwareDaily.com is a social network that allows people from all over the world to come together and create software. Working at Amazon taught me that we can build anything. Writing “Move Fast” taught me that social networking will allow us to expand our
The first industrial deployments of machine learning and artificial intelligence solutions were bespoke by definition and often had brittle operating characteristics. Almost no one builds custom databases, web servers, or email clients. Yet technology groups today often consider developing homegrown ML and data solutions in order to solve their unique use cases. Today's modern data
By most accounts, demand for software engineers exceeds supply. Not just anyone can develop this skill set to the level required to deliver enterprise-grade production code. For those that can, companies are incentivized to take extra measures to ensure software engineers are as productive as possible. The pace of business is often throttled by the
As our guest today points out, most enterprise software applications are essentially forms for collecting data. The `<form>` tag and related components started appearing in HTML fairly early and those same concepts are still in use with modern web browsers. However, the technology for capturing state, validating input, and providing other common services for the
Infrastructure as Code is an approach to machine provisioning and setup in which a programmer describes the underlying services they need for their projects. However, this infrastructure code doesn't compile a binary artifact like traditional source code. The successful completion of running the code signals that the servers and other components described in the configuration
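The point above, that IaC code declares desired state rather than producing a binary, can be illustrated with a toy reconciliation loop: compare desired state against actual state and emit the actions needed to converge. All resource names and the plan format are invented; this is the general pattern, not any specific IaC tool.

```python
# Desired state, declared like an IaC config (names are illustrative)
desired = {"web-1": {"size": "small"}, "db-1": {"size": "large"}}

def reconcile(desired, actual):
    """Return the actions needed to converge actual state to desired."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

actual = {"web-1": {"size": "small"}, "cache-1": {"size": "small"}}
plan = reconcile(desired, actual)
# plan -> [('create', 'db-1', {'size': 'large'}), ('delete', 'cache-1', None)]
```

Successful completion of a run like this, with an empty remaining plan, is the signal that the described infrastructure actually exists.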
The expression firing on all cylinders dates back to the early 1900s and refers to a function of the internal combustion engine. This expression poetically applies to successful businesses as well. Each department must operate at peak performance and the couplings between departments need optimization as well. In this episode, I interview business coach Jon
The last 15 years have seen the emergence of cloud-based developer APIs and services as dominant components of the developer toolchain. As a result, there has never been more power at developers' fingertips. But making that power usable and accessible is a challenge that is shared between the providers and the consumers of these services.
Whether you love them or hate them, share them or ignore them, you encounter memes all over the internet. Those that are popular can often take off and spawn a long history of remixes, variants, derivatives, and inspired works. In this episode, we interview Johan Unger, the founder of meme.com. They're creating a platform for
Venture capital investment has continued to flow into technology startups. No one builds technology from scratch. There are cloud services, software libraries, 3rd party services, and software platforms that modern entrepreneurs must adopt to build their products efficiently and quickly. These layers of infrastructure are a key area for many investors. In this episode, I
The gig economy involves independent contractors engaging in flexible jobs. Today gig workers often get work from centralized platforms that facilitate the process of connecting workers with employers in exchange for a fee. Some workers find the relationship between worker and platform to be adversarial in nature since the platform can establish and enforce rules
Imagine a world where you own some sort of building, whether that's a grocery store, a restaurant, or a factory, and you want to know how many people are in each section of the store, or how long the average person waited to be seated, or how long it took the average factory
Virtual meetings were growing in popularity before the need accelerated as a result of the pandemic. Gather is a place where you can create a space for your community today. Users who join find themselves in a shared virtual space that offers the ability to interact with other users as well as interact with the
The notebook paradigm of coding is relatively new in comparison to REPLs and IDEs. Notebooks run in your browser and give you discrete cells for running segments of code. All the code in a single cell runs at once, but cells run independently. Cells can be re-run, which is a blessing and a curse. The
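The blessing-and-curse of re-running cells comes from shared mutable state that outlives any one cell. This small simulation models two notebook cells as functions mutating a shared namespace; running a cell twice silently double-counts, which is exactly the hazard described above.

```python
# Simulate notebook cells as functions that mutate shared kernel state
state = {}

def cell_1():
    state["total"] = 0          # initialization cell

def cell_2():
    state["total"] += 10        # accumulation cell

# Running top-to-bottom behaves as expected:
cell_1()
cell_2()
assert state["total"] == 10

# Re-running only cell_2 silently double-counts, because the stale
# state from the earlier run persists in the kernel:
cell_2()
assert state["total"] == 20  # a fresh top-to-bottom run would give 10
```

Restarting the kernel and running all cells in order is the usual remedy, and a good habit before sharing a notebook.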
One of the most painful parts of getting started on a new development team is getting one's environment set up. Whether it's undocumented steps, overly complex setups, or simply the challenges of understanding how the pieces fit together, getting a dev environment up often feels like a chore to be suffered through in order to
Modern business applications are complex. It’s not enough to have raw logs or some basic telemetry. Today’s enterprise organizations require an application performance monitoring solution or APM. Today’s applications are complex distributed systems whose performance depends on a wide variety of factors. Every single line of code can affect production and teams need insights into
According to builtwith.com, more than 10 million websites are powered by the React framework. Of the top 10k sites by traffic, 44.7% are built with React. This JavaScript framework is capable of powering a wide array of modern applications and remains fairly beloved by developers that use it. In this episode, I interview Kent
Welcome to Software Engineering Daily; I'm your guest host, Joey Baruch. I'm the CTO at Alvarez and Marsal Data Intelligence Gateway (A&M DIG), prior to which I co-founded and was CTO of HuMoov, a vertical SaaS. I've been a software engineer at PayPal, IBM Research Labs, and Qualcomm via the acquisition of Wilocity. Joining me
The React Framework has seen continuous growth of adoption since its launch. There are many reasons for that, but one reason is how relatively painless it is to use `create-react-app` or copy some boilerplate code and have a functioning, hot reloading, live demo up and running in minutes. There is, however, a long way to
The manner in which users interact with technology has rapidly switched to mobile consumption. The devices almost all of us carry with us at all times open endless opportunities for developers to create location-based experiences. Foursquare became a household name when it introduced social check-ins. Today they're a location data platform. Ankit Patel is the
It wasn't that long ago that companies scheduled downtime in order to release an updated version of the software running their website. That's rare today. Most developers want continuous testing, integration, and deployment. While that comes with many benefits, it also places greater demands on quality engineers who can no longer gate all updates into
Angular is a free and open-source web application framework. It's maintained by the Angular team at Google. It's used by millions of web applications and has a strong ecosystem of core contributors and library builders. In this episode, I interview Minko Gechev, Developer Relations Lead at Google. We explore several aspects of open-source software development,
It does not matter if it runs on your machine. Your code must run in the production environment and it must do so performantly. For that, you need tooling to better understand your application’s behavior under different circumstances. In the earliest days of software development, all we had were logs, which are still around and
Machine learning models must first be trained. That training results in a model which must be serialized or packaged up in some way as a deployment artifact. A popular deployment path is using TensorFlow.js to take advantage of the portability of JavaScript, allowing your model to be run on a web server or client. Gant
The internet is a layer cake of technologies and protocols. At a fundamental level, the internet runs on the TCP/IP protocol. It's a packet-based system. When your browser requests a file from a web server, that server chops up the file into tiny pieces known as packets and puts them on the network labeled
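The chop-and-label step described above can be sketched in a few lines: a payload is split into numbered chunks, and the receiver reorders by sequence number before reassembling. This is a heavily simplified illustration of the idea behind TCP segmentation, not a real protocol implementation.

```python
def packetize(data: bytes, mtu: int):
    """Chop a payload into numbered chunks, like TCP segments (simplified)."""
    return [(seq, data[i:i + mtu])
            for seq, i in enumerate(range(0, len(data), mtu))]

def reassemble(packets):
    """Receivers reorder by sequence number and concatenate."""
    return b"".join(chunk for _, chunk in sorted(packets))

payload = b"hello, world"
packets = packetize(payload, mtu=5)
# packets -> [(0, b'hello'), (1, b', wor'), (2, b'ld')]

# Packets may arrive in any order; the sequence numbers restore it.
assert reassemble(reversed(packets)) == payload
```

Real TCP adds acknowledgments, retransmission, and congestion control on top of this basic numbering scheme.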
The banking industry uses technology that some modern software engineers may regard as out of date or old-fashioned. Entrepreneurs wanting to create products in the banking space historically faced a steep curve to build software that could integrate with established banking systems. Christopher Dean seeks to change that. He founded Treasury Prime, a company that
To many people's surprise, tech sales is not much of an art. It's actually a regimented science where reps have clear step-by-step processes to bring in new business. Each stage takes the customer closer to the end of the deal and consists of learning more about the customer's needs. A CRM is a database reps
Modern businesses run on the cloud and increasingly so they run on multi-cloud infrastructure. As any growing company can tell you, cloud costs can easily run far out of control. Today's enterprises are trying to deliver new products and services at a fast pace. That needs to be done in a cost-effective, ideally cloud-agnostic way.
Neural networks, in particular, deep neural networks have revolutionized machine learning. Researchers and companies have pushed on the efficiency of every aspect of the machine learning lifecycle. The impact of the trained models is particularly significant for computer vision and in turn for autonomous driving and security systems. In this episode, I interview Forrest Iandola,
With a few impressive exceptions, software is rarely written by one person. It takes a team and as that team outgrows a single shared office, coordination and communication become emergent problems. There are lots of lessons to be learned from companies that have already found approaches that scale. In this episode, I interview Tramale Turner,
Thanks to the amazing books, blogs, videos, quickstarts, frameworks, and other software-related resources, getting started as a software engineer is easier than ever. Although you can get started in a day, it can take years to become a master of the craft and most practitioners describe it as a profession of lifelong learning. Titus Winters
Consumers are increasingly becoming aware of how detrimental it can be when companies mismanage data. This demand has fueled regulations, defined standards, and applied pressure to companies. Modern enterprises need to consider corporate risk management and regulatory compliance. In this interview, I speak with Terry O'Daniel, Director of Engineering (Risk & Compliance) at Instacart. Sponsorship
Application observability is a fairly mature area. Engineering teams have a wide selection of tools they can choose to adopt and a significant amount of thought leadership and philosophy already exists giving guidance for managing your application. That application is going to persist data. As you scale up, your system is invariably going to experience
When creating a website, there's no shortage of choices for how to do it. Builders must make strategic decisions about the language or framework they want to adopt. An important first consideration for many is selecting a web application framework like React or Vue. Motivated by a low page response time and good user experience,
Internships can be an incredibly valuable resource to new professionals and are often the first professional work experience for many participants. It’s often the case that internship programs are suboptimal. Employers don't always provide a clear path to success for the intern. Interns in turn don't always have a resource to reach out for help
Once a machine learning model is trained and validated, it often feels like a major milestone has been achieved. In reality, it's more like the first lap in a relay race. Deploying ML to production bears many similarities to a typical software release process, but brings several novel challenges like failing to generalize as expected
Climate modeling is increasingly important as supply chains, emergency management, and dozens of other efforts need to make predictions about future conditions and how they will impact business. Analyzing climate data requires geospatial systems, and those systems need a full-stack geospatial technology solution. Gopal Erinjippurath serves as CTO and Head of Product at Sust Global,
Many software projects run the risk of evolving over time to a complex state that is inhospitable for new contributors to join. This is a dangerous place for a company to be. Either software needs to remain more accessible, or faster paths must be created to help them get on board. Today's interview is with
Microservice architecture has become a ubiquitous design choice. Application developers typically have neither the training nor the interest in implementing low-level security features into their software. For this and many other reasons, the notion of a service mesh has been introduced to provide a framework for service-to-service communication. Today's guest is Zack Butcher. While working
Writing your application's code is only half the battle. Getting it to run on your machine is a milestone, but it's far from your code running in a production environment. There are an increasing set of options application designers have for helping to manage deployment, environments, and CI/CD. Encore is a backend engine for the
As cloud providers enable greater levels of specificity and control, they empower compliance-driven enterprise companies. This level of parameterization is downright inhospitable to a new software engineer and can be a cognitive barrier to entry for a senior professional with a great idea but limited time. Developers want to focus on their code, algorithms, front
The lifeblood of most companies is their sales departments. When you're selling something other than a commodity, it's typically necessary to carefully groom the onboarding experience for inbound future customers. Historically, companies approached this in a one-size-fits-all manner, giving all customers a common experience. In today's data-driven age, a better experience can be provided that
Relational databases have been a fixture of software applications for decades. They are highly tuned for performance and typically offer explicit guarantees like transactional consistency. More recently, there's been a figurative Cambrian explosion of other-than-relational databases. Simple key value stores or counters were an early win in this space. Managing a graph data structure is
Everyone is becoming increasingly aware of supply chains for physical goods. Software has its own supply chain. A supply of open source solutions exists as does a demand for these solutions by industry. Both have surely grown, but it would be nice to have a way of measuring by how much. The State of Software
Robotic process automation or RPA refers to software robots constructed to automate some business process. Perhaps the most ubiquitous example is adding filters to your email inbox. I've worked with a lot of salespeople that configure complex email follow-up campaigns when inbound emails come in, but even that's a fairly basic example compared to what's
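The inbox-filter example above is rule-driven automation at its simplest: match a condition, perform an action. Here is a minimal sketch of that pattern in Python; the rule format, labels, and message shape are all invented for illustration and are not any RPA product's API.

```python
# Illustrative rule-based automation, like inbox filters (names invented)
rules = [
    {"if_subject_contains": "invoice", "then_label": "finance"},
    {"if_subject_contains": "demo request", "then_label": "sales-followup"},
]

def apply_rules(message, rules):
    """Return the labels an automation bot would attach to a message."""
    subject = message["subject"].lower()
    return [r["then_label"] for r in rules
            if r["if_subject_contains"] in subject]

msg = {"subject": "Demo request: Acme Corp"}
labels = apply_rules(msg, rules)
# labels -> ['sales-followup']
```

More sophisticated RPA chains many such condition-action rules across applications, often driving the same UIs a human would, but the underlying shape is the same.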
InfluxDB is an open-source time-series database. It's maintained by InfluxData, who offers a suite of products that help organizations gain insights from time-series data. In this episode, I interview Zoe Steinkamp, Software Engineering and Developer Advocate at InfluxData. We explore some of the common use cases for time-series databases such as IoT and some recent
As the internet has grown, increasingly, we are consumers of services provided by corporations rather than owners and operators of our own systems. To many, this trend towards centralization is antithetical to the spirit of a free and open internet. Urbit is a new operating system and peer-to-peer network. There are several layers of novel
If you haven't encountered a data quality problem, then you haven't yet worked on a large enough project. Invariably, a gap exists between the state of raw data and what an analyst or machine learning engineer needs to solve their problem. Many organizations needing to automate data preparation workflows look to Trifacta as a solution.
If you haven't encountered a data quality problem, then you haven't yet worked on a large enough project. Invariably, a gap exists between the state of raw data and what an analyst or machine learning engineer needs to solve their problem. Many organizations needing to automate data preparation workflows look to Trifacta as a solution.
As the internet has grown, increasingly, we are consumers of services provided by corporations rather than owners and operators of our own systems. To many, this trend towards centralization is antithetical to the spirit of a free and open internet. Urbit is a new operating system and peer-to-peer network. There are several layers of novel
InfluxDB is an open-source time-series database. It's maintained by InfuxData who offers a suite of products that help organizations gain insights from time-series data. In this episode, I interview Zoe Steinkamp, Software Engineering and Developer Advocate at InfluxData. We explore some of the common use cases for time-series databases such as IoT and some recent
Robotic process automation or RPA refers to software robots constructed to automate some business process. Perhaps the most ubiquitous example is adding filters to your email inbox. I've worked with a lot of salespeople that configure complex email follow-up campaigns when inbound emails come in, but even that's a fairly basic example compared to what's
Everyone is becoming increasingly aware of supply chains for physical goods. Software has its own supply chain. A supply of open source solutions exists as does a demand for these solutions by industry. Both have surely grown, but it would be nice to have a way of measuring by how much. The State of Software
Relational databases have been a fixture of software applications for decades. They are highly tuned for performance and typically offer explicit guarantees like transactional consistency. More recently, there's been a figurative cambrian explosion of other-than-relational databases. Simple key value stores or counters were an early win in this space. Managing a graph data structure is
The lifeblood of most companies is their sales departments. When you're selling something other than a commodity, it's typically necessary to carefully groom the onboarding experience for inbound future customers. Historically, companies approached this in a one-size-fits-all manner, giving all customers a common experience. In today's data-driven age, a better experience can be provided that
As cloud providers enable greater levels of specificity and control, they empower compliance-driven enterprise companies. This level of parameterization is downright inhospitable to a new software engineer and can be a cognitive barrier to entry for a senior professional with a great idea but limited time. Developers want to focus on their code, algorithms, front
Writing your application's code is only half the battle. Getting it to run on your machine is a milestone, but it's far from your code running in a production environment. There are an increasing set of options application designers have for helping to manage deployment, environments, and CI/CD. Encore is a backend engine for the
Microservice architecture has become a ubiquitous design choice. Application developers typically have neither the training nor the interest in implementing low-level security features into their software. For this and many other reasons, the notion of a service mesh has been introduced to provide a framework for service-to-service communication. Today's guest is Zack Butcher. While working
Many software projects run the risk of evolving over time to a complex state that is inhospitable for new contributors to join. This is a dangerous place for a company to be. Either software needs to remain more accessible, or faster paths must be created to help them get on board. Today's interview is with
Climate modeling is increasingly important as supply chains, emergency management, and dozens of other efforts need to make predictions about future conditions and how they will impact business. Analyzing climate data requires geospatial systems, and those systems need a full-stack geospatial technology solution. Gopal Erinjippurath serves as CTO and Head of Product at Sust Global,
Once a machine learning model is trained and validated, it often feels like a major milestone has been achieved. In reality, it's more like the first lap in a relay race. Deploying ML to production bears many similarities to a typical software release process, but brings several novel challenges like failing to generalize as expected
Nov 30, 2021
Internships can be an incredibly valuable resource to new professionals and are often the first professional work experience for many participants. Unfortunately, it’s often the case that internship programs are suboptimal. Employers don't always provide a clear path to success for the intern. Interns in turn don't always have a resource to reach out to for help
When creating a website, there's no shortage of choices for how to do it. Builders must make strategic decisions about the language or framework they want to adopt. An important first consideration for many is selecting a web application framework like React or Vue. Motivated by a low page response time and good user experience,
Application observability is a fairly mature area. Engineering teams have a wide selection of tools they can choose to adopt, and a significant amount of thought leadership and philosophy already exists to guide how you manage your application. That application is going to persist data. As you scale up, your system is invariably going to experience
Consumers are increasingly aware of how detrimental it can be when companies mismanage data. This awareness has fueled regulations, defined standards, and applied pressure to companies. Modern enterprises need to consider corporate risk management and regulatory compliance. In this interview, I speak with Terry O'Daniel, Director of Engineering (Risk & Compliance) at Instacart. Sponsorship
Thanks to the amazing books, blogs, videos, quickstarts, frameworks, and other software-related resources, getting started as a software engineer is easier than ever. Although you can get started in a day, it can take years to become a master of the craft, and most practitioners describe it as a profession of lifelong learning. Titus Winters
With a few impressive exceptions, software is rarely written by one person. It takes a team, and as that team outgrows a single shared office, coordination and communication become emergent problems. There are lots of lessons to be learned from companies that have already found approaches that scale. In this episode, I interview Tramale Turner,
Nov 17, 2021
Neural networks, in particular deep neural networks, have revolutionized machine learning. Researchers and companies have pushed on the efficiency of every aspect of the machine learning lifecycle. The impact of the trained models is particularly significant for computer vision and in turn for autonomous driving and security systems. In this episode, I interview Forrest Iandola,
Modern businesses run on the cloud, and increasingly they run on multi-cloud infrastructure. As any growing company can tell you, cloud costs can easily run far out of control. Today's enterprises are trying to deliver new products and services at a fast pace. That needs to be done in a cost-effective, ideally cloud-agnostic way.
To many people's surprise, tech sales is not much of an art. It's actually a regimented science, where reps have clear step-by-step processes to bring in new business. Each stage takes the customer closer to closing the deal and consists of learning more about the customer's needs. A CRM is a database reps
The banking industry uses technology that some modern software engineers may regard as out of date or old-fashioned. Entrepreneurs wanting to create products in the banking space historically faced a steep curve to build software that could integrate with established banking systems. Christopher Dean seeks to change that. He founded Treasury Prime, a company that
The internet is a layer cake of technologies and protocols. At a fundamental level, the internet runs on the TCP/IP protocol. It's a packet-based system. When your browser requests a file from a web server, that server chops the file into tiny pieces known as packets and puts them on the network labeled
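The chop-into-labeled-packets idea described above can be sketched in a few lines of JavaScript. This is a toy illustration of chunking and reassembly, not the real TCP protocol; the function names and packet shape are invented for the example.

```javascript
// Toy sketch (not real TCP): chop a payload into fixed-size "packets",
// each labeled with a sequence number, so the receiver can reassemble
// them in order even if the network delivers them shuffled.
function packetize(payload, size) {
  const packets = [];
  for (let i = 0, seq = 0; i < payload.length; i += size, seq++) {
    packets.push({ seq, data: payload.slice(i, i + size) });
  }
  return packets;
}

function reassemble(packets) {
  return packets
    .slice()
    .sort((a, b) => a.seq - b.seq) // restore original order by label
    .map((p) => p.data)
    .join('');
}

const packets = packetize('hello, world', 5);
// Simulate out-of-order delivery:
const shuffled = [packets[2], packets[0], packets[1]];
console.log(reassemble(shuffled)); // "hello, world"
```

Real TCP adds acknowledgments, retransmission, and flow control on top of this basic sequence-number idea.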
Machine learning models must first be trained. That training results in a model which must be serialized or packaged up in some way as a deployment artifact. A popular deployment path is using Tensorflow.js to take advantage of the portability of JavaScript, allowing your model to be run on a web server or client. Gant
It does not matter if it runs on your machine. Your code must run in the production environment and it must do so performantly. For that, you need tooling to better understand your application’s behavior under different circumstances. In the earliest days of software development, all we had were logs, which are still around and
Angular is a free and open-source web application framework. It's maintained by the Angular team at Google. It's used by millions of web applications and has a strong ecosystem of core contributors and library builders. In this episode, I interview Minko Gechev, Developer Relations Lead at Google. We explore several aspects of open-source software development,
It wasn't that long ago that companies scheduled downtime in order to release an updated version of the software running their website. That's rare today. Most developers want continuous testing, integration, and deployment. While that comes with many benefits, it also places greater demands on quality engineers who can no longer gate all updates into
The manner in which users interact with technology has rapidly switched to mobile consumption. The devices almost all of us carry with us at all times open endless opportunities for developers to create location-based experiences. Foursquare became a household name when they introduced social check-ins. Today they're a location data platform. Ankit Patel is the
The React Framework has seen continuous growth of adoption since its launch. There are many reasons for that, but one reason is how relatively painless it is to use `create-react-app` or copy some boilerplate code and have a functioning, hot reloading, live demo up and running in minutes. There is, however, a long way to
Nov 1, 2021
Welcome to Software Engineering Daily; I'm your guest host, Joey Baruch. I'm the CTO at Alvarez and Marsal Data Intelligence Gateway (A&M DIG), prior to which I co-founded and was CTO of HuMoov, a vertical SaaS. I've been a software engineer at PayPal, IBM Research Labs, and Qualcomm via the acquisition of Wilocity. Joining me
According to builtwith.com, more than 10 million websites are powered by the React framework. Of the top 10k sites by traffic, 44.7% are built with React. This JavaScript framework is capable of powering a wide array of modern applications and remains fairly beloved by the developers who use it. In this episode, I interview Kent
Modern business applications are complex. It’s not enough to have raw logs or some basic telemetry. Today’s enterprise organizations require an application performance monitoring (APM) solution. Today’s applications are complex distributed systems whose performance depends on a wide variety of factors. Every single line of code can affect production and teams need insights into
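The core of what an APM agent does, recording the latency of each call so slow code paths surface, can be sketched as a simple wrapper. This is a hedged illustration only: real APM products instrument code automatically and ship metrics to a backend, and all names below are invented for the example.

```javascript
// Minimal sketch of APM-style instrumentation: wrap a function so
// every call records its name and elapsed time into a metrics store.
const metrics = [];

function instrument(name, fn) {
  return (...args) => {
    const start = process.hrtime.bigint(); // high-resolution timer
    try {
      return fn(...args);
    } finally {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      metrics.push({ name, elapsedMs }); // a real agent would export this
    }
  };
}

const slowSum = instrument('slowSum', (n) => {
  let total = 0;
  for (let i = 0; i < n; i++) total += i;
  return total;
});

slowSum(1_000_000);
console.log(metrics[0].name); // "slowSum"
```

A real agent would also capture errors, trace IDs, and distributed context, but the wrap-and-time pattern is the same.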
Oct 27, 2021
One of the most painful parts of getting started on a new development team is getting one's environment set up. Whether it's undocumented steps, overly complex setups, or simply the challenges of understanding how the pieces fit together, getting a dev environment up often feels like a chore to be suffered through in order to
The notebook paradigm of coding is relatively new in comparison to REPLs and IDEs. Notebooks run in your browser and give you discrete cells for running segments of code. All the code in a single cell runs at once, but cells run independently. Cells can be re-run, which is a blessing and a curse. The
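The blessing-and-curse of independently re-runnable cells can be simulated in any language by modeling cells as functions sharing one global scope. The sketch below is an illustration of the stale-state hazard, not a real notebook kernel.

```javascript
// "Cells" are functions that read and write one shared scope,
// just like notebook cells share a kernel's global state.
const scope = {};

const cell1 = () => { scope.base = 100; };
const cell2 = () => { scope.total = scope.base + 5; };

cell1();
cell2();
console.log(scope.total); // 105 — cells run top to bottom as expected

// Later, the user edits and re-runs cell 1 but forgets to re-run cell 2:
scope.base = 200;         // edited "cell 1"
console.log(scope.total); // still 105 — stale state, the curse of re-runnable cells
```

Restarting the kernel and running all cells top to bottom is the usual cure for this kind of hidden staleness.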
Virtual meetings were growing in popularity before the need accelerated as a result of the pandemic. Gather is a place where you can create a space for your community today. Users who join find themselves in a shared virtual space that offers the ability to interact with other users as well as interact with the
Imagine you own some sort of building, whether that's a grocery store, a restaurant, or a factory, and you want to know how many people are in each section of the store, how long the average person waited to be seated, or how long it took the average factory
The gig economy involves independent contractors engaging in flexible jobs. Today gig workers often get work from centralized platforms that facilitate the process of connecting workers with employers in exchange for a fee. Some workers find the relationship between worker and platform to be adversarial in nature since the platform can establish and enforce rules
Venture capital investment has continued to flow into technology startups. No one builds technology from scratch. There are cloud services, software libraries, 3rd party services, and software platforms that modern entrepreneurs must adopt to build their products efficiently and quickly. These layers of infrastructure are a key area for many investors. In this episode, I
Whether you love them or hate them, share them or ignore them, you encounter memes all over the internet. Those that are popular can often take off and spawn a long history of remixes, variants, derivatives, and inspired works. In this episode, we interview Johan Unger, the founder of meme.com. They're creating a platform for
The last 15 years have seen the emergence of cloud-based developer APIs and services as dominant components of the developer toolchain. As a result, there has never been more power at developers' fingertips. But making that power usable and accessible is a challenge that is shared between the providers and the consumers of these services.
The expression "firing on all cylinders" dates back to the early 1900s and refers to the function of an internal combustion engine. This expression poetically applies to successful businesses as well. Each department must operate at peak performance, and the couplings between departments need optimization as well. In this episode, I interview business coach Jon
Infrastructure as Code is an approach to machine provisioning and setup in which a programmer describes the underlying services they need for their projects. However, this infrastructure code doesn't compile to a binary artifact the way traditional source code does. Instead, successfully running the code signals that the servers and other components described in the configuration
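The describe-then-provision idea can be sketched as a tiny reconcile step: compare the resources a program declares against what already exists and plan what to create. The resource names and object shape below are illustrative, not any real provider's API.

```javascript
// Declarative description of desired infrastructure (illustrative only).
const desired = [
  { name: 'web-server', type: 'vm' },
  { name: 'app-db', type: 'database' },
];

// Diff desired state against existing state; a real engine (Terraform,
// Pulumi, etc.) would call cloud APIs here — we just compute the plan.
function reconcile(desired, existing) {
  const have = new Set(existing.map((r) => r.name));
  return desired.filter((r) => !have.has(r.name));
}

const plan = reconcile(desired, [{ name: 'web-server', type: 'vm' }]);
console.log(plan.map((r) => r.name)); // [ 'app-db' ]
```

The key property is idempotence: running the reconcile step again after creation should produce an empty plan.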
As our guest today points out, most enterprise software applications are essentially forms for collecting data. The `<form>` tag and related components started appearing in HTML fairly early and those same concepts are still in use with modern web browsers. However, the technology for capturing state, validating input, and providing other common services for the
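The "capture state, validate input" services that form libraries layer on top of the basic HTML form element can be sketched as a small validator that maps field names to error messages. The field names and rules below are made up for the example.

```javascript
// Run each field's validation rule; collect error messages keyed by
// field name, the shape most form libraries expose to the UI.
function validate(fields, rules) {
  const errors = {};
  for (const [name, check] of Object.entries(rules)) {
    const message = check(fields[name]);
    if (message) errors[name] = message;
  }
  return errors;
}

// Illustrative rules: each returns null when valid, a message otherwise.
const rules = {
  email: (v) => (/^\S+@\S+\.\S+$/.test(v || '') ? null : 'invalid email'),
  age: (v) => (Number.isInteger(v) && v >= 0 ? null : 'age must be a non-negative integer'),
};

console.log(validate({ email: 'a@b.com', age: 30 }, rules)); // {}
console.log(validate({ email: 'nope', age: -1 }, rules));
// { email: 'invalid email', age: 'age must be a non-negative integer' }
```

An empty errors object signals the form is safe to submit; a UI layer would render the messages next to each field.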
By most accounts, demand for software engineers exceeds supply. Not just anyone can develop this skill set to the level required to deliver enterprise-grade production code. For those that can, companies are incentivized to take extra measures to ensure software engineers are as productive as possible. The pace of business is often throttled by the
Oct 5, 2021
The first industrial deployments of machine learning and artificial intelligence solutions were bespoke by definition and often had brittle operating characteristics. Almost no one builds custom databases, web servers, or email clients. Yet technology groups today often consider developing homegrown ML and data solutions in order to solve their unique use cases. Today's modern data
Abstract
Software Daily is a place to create software.
Introduction
SoftwareDaily.com is a social network that allows people from all over the world to come together and create software.
Inspiration from Amazon
Working at Amazon taught me that we can build anything.
Inspiration from Facebook
Writing “Move Fast” taught me that social networking will allow us to expand our
Phishing attacks, malware, and ransomware are just some of the major threats everyone connected to the internet faces. For companies, the stakes are especially high. Setting up a secure infrastructure is difficult. Your adversary only needs to find one flaw to get in. Vancord is a private cybersecurity company, based in Connecticut, that was founded
A monorepo is a version control strategy in which all of your code is contained in one potentially large but complete repository. The monorepo stands in stark contrast to an alternative approach in which software teams independently manage microservices or deliver software as libraries to be imported into other projects.
Companies that gather data about their users have an ethical obligation and legal responsibility to protect the personally identifiable information in their dataset. Ideally, developers working on a software application wouldn't need access to production data. Yet without high-quality example data, many technology groups stumble on avoidable problems. Organizations need a solution to protect privacy
Applications write data to persistent storage like a database. The most popular database query language is SQL, which has many similar dialects. SQL is expressive and powerful for describing what data you want. What you do with that data requires a solution in the form of a data pipeline. Ideally, these analytical workflows can follow
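The relationship between declarative SQL and an imperative pipeline step can be illustrated side by side: the SQL string declares *what* data is wanted, while the JavaScript below computes the same filter-and-aggregate over in-memory rows. Table and column names are invented for the example.

```javascript
// Declarative: describe the result set you want.
const sql = `
  SELECT country, SUM(amount) AS total
  FROM orders
  WHERE status = 'paid'
  GROUP BY country;
`;

// Imperative pipeline step computing the same answer over sample rows.
const orders = [
  { country: 'US', status: 'paid', amount: 10 },
  { country: 'US', status: 'refunded', amount: 5 },
  { country: 'DE', status: 'paid', amount: 7 },
];

const totals = orders
  .filter((o) => o.status === 'paid')   // WHERE status = 'paid'
  .reduce((acc, o) => {                 // GROUP BY country, SUM(amount)
    acc[o.country] = (acc[o.country] || 0) + o.amount;
    return acc;
  }, {});

console.log(totals); // { US: 10, DE: 7 }
```

A data pipeline chains many such steps; the appeal of SQL-centric tooling is keeping each step in the declarative form.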