A distributed, scalable, and highly available real-time search platform with a RESTful API.
The ClickHouse open source project has gained interest in the observability community, thanks to its outstanding performance benchmarks. Now ClickHouse is doubling down on observability with the release of ClickStack, a new open source observability stack that bundles in ClickHouse, OpenTelemetry and HyperDX frontend. I invited Mike Shi, the co-founder of HyperDX and co-creator of ClickStack, to tell us all about this new project. Mike is Head of Observability at ClickHouse, and brings prior observability experience with Elasticsearch and more.

You can read the recap post: https://medium.com/p/73f129a179a3/

Show Notes:
00:00 episode and guest intro
04:38 taking the open source path as an entrepreneur
10:51 the HyperDX observability user experience
16:08 challenges in implementing observability directly on ClickHouse
20:03 intro to ClickStack and incorporating OpenTelemetry
32:35 balancing simplicity and flexibility
36:15 SQL vs. Lucene query languages
39:06 performance, cardinality and the new JSON type
52:14 use cases in production by OpenAI, Anthropic, Tesla and more
55:38 episode outro

Resources:
HyperDX https://github.com/hyperdxio/hyperdx
ClickStack https://clickhouse.com/docs/use-cases/observability/clickstack
Shopify's Journey to Planet-Scale Observability: https://medium.com/p/9c0b299a04dd
ClickHouse: Breaking the Speed Limit for Observability and Analytics https://medium.com/p/2004160b2f5e
New JSON data type for ClickHouse: https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse

Socials:
BlueSky: https://bsky.app/profile/openobservability.bsky.social
Twitter: https://twitter.com/OpenObserv
LinkedIn: https://www.linkedin.com/company/openobservability/
YouTube: https://www.youtube.com/@openobservabilitytalks

Dotan Horovits
============
Twitter: @horovits
LinkedIn: www.linkedin.com/in/horovits
Mastodon: @horovits@fosstodon
BlueSky: @horovits.bsky.social

Mike Shi
=======
Twitter: https://x.com/MikeShi42
LinkedIn: https://www.linkedin.com/in/mikeshi42
BlueSky: https://bsky.app/profile/mikeshi42.bsky.social

OpenObservability Talks episodes are released monthly, on the last Thursday of each month and are available for listening on your favorite podcast app and on YouTube.
Gagan Singh of Elastic discusses how agentic AI systems reduce analyst burnout by automatically triaging security alerts, resulting in measurable ROI for organizations.

Topics Include:
AI breaks security silos between teams, data, and tools in SOCs
Attackers gain system access; SOC teams have only 40 minutes to detect/contain
Alert overload causes analyst burnout; thousands of low-value alerts overwhelm teams daily
AI inevitable for SOCs to process data, separate false positives from real threats
Agentic systems understand environment, reason through problems, take action without hand-holding
Attack discovery capability reduces hundreds of alerts to 3-4 prioritized threat discoveries
AI provides ROI metrics: processed alerts, filtered noise, hours saved for organizations
RAG (Retrieval Augmented Generation) prevents hallucination by adding enterprise context to LLMs
AWS integration uses SageMaker, Bedrock, Anthropic models with Elasticsearch vector database capabilities
End-to-end LLM observability tracks costs, tokens, invocations, errors, and performance bottlenecks
Junior analysts detect nation-state attacks; teams shift from reactive to proactive security
Future requires balancing costs, data richness, sovereignty, model choice, human-machine collaboration

Participants:
Gagan Singh – Vice President Product Marketing, Elastic

Additional Links:
Elastic – LinkedIn - Website – AWS Marketplace

See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon.com/isv/
SANS Internet Stormcenter Daily Network/Cyber Security and Information Security Stormcast
Increased Elasticsearch Recognizance Scans Our honeypots noted an increase in reconnaissance scans for Elasticsearch. In particular, the endpoint /_cluster/settings is hit hard. https://isc.sans.edu/diary/Increased%20Elasticsearch%20Recognizance%20Scans/32212 Microsoft Patch Tuesday Issues Microsoft noted some issues deploying the most recent patches with WSUS. There are also issues with certain SSDs if larger files are transferred. https://learn.microsoft.com/en-us/windows/release-health/status-windows-11-24h2#3635msgdesc https://www.tomshardware.com/pc-components/ssds/latest-windows-11-security-patch-might-be-breaking-ssds-under-heavy-workloads-users-report-disappearing-drives-following-file-transfers-including-some-that-cannot-be-recovered-after-a-reboot SAP Vulnerabilities Exploited CVE-2025-31324, CVE-2025-42999 Details explaining how to take advantage of two SAP vulnerabilities were made public https://onapsis.com/blog/new-exploit-for-cve-2025-31324/
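For readers who want to check their own servers for the scans described in the first item, here is a minimal sketch that counts requests to `/_cluster/settings` per source address. Everything beyond that endpoint path is an assumption for illustration: the log format (a common-log-style access log), the sample lines, the addresses, and the function name are all made up, and real honeypot or proxy logs will look different.

```python
import re
from collections import Counter

# Hypothetical sample lines in a common-log-style format (not real data).
SAMPLE_LOG = """\
203.0.113.7 - - [01/Jan/2025:10:01:02 +0000] "GET /_cluster/settings HTTP/1.1" 200 512
203.0.113.7 - - [01/Jan/2025:10:01:05 +0000] "GET /_cat/indices HTTP/1.1" 200 128
198.51.100.23 - - [01/Jan/2025:10:02:11 +0000] "GET /_cluster/settings?pretty HTTP/1.1" 401 0
192.0.2.44 - - [01/Jan/2025:10:03:40 +0000] "GET /index.html HTTP/1.1" 200 1024
"""

# Captures the client address and the requested path from each line.
REQUEST_RE = re.compile(r'^(?P<ip>\S+) .*?"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def count_cluster_settings_probes(log_text: str) -> Counter:
    """Count requests to /_cluster/settings per client address."""
    hits = Counter()
    for line in log_text.splitlines():
        match = REQUEST_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        path = match.group("path").split("?", 1)[0]  # ignore the query string
        if path == "/_cluster/settings":
            hits[match.group("ip")] += 1
    return hits


if __name__ == "__main__":
    for ip, count in count_cluster_settings_probes(SAMPLE_LOG).most_common():
        print(ip, count)
```

A per-address count like this is only a starting point; whether a given hit is a scan or legitimate monitoring depends on the environment.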
Alex & Chris get into a fairly recent technological change at CodePen where we ditched our Elasticsearch implementation for just using our own Postgres database for search. Sometimes choices like this are more about team expertise, dev environment practicalities, and complexity tradeoffs. We found this change to be much better for us, which matters! For the most part search is better and faster. Postgres is not nearly as fancy and capable as Elasticsearch, but we weren't taking advantage of what Elasticsearch had to offer anyway. For the power users out there: it's true that we've lost the ability to do in-code search for now. But it's temporary and will be coming back in time.
In this episode of the Don't Panic, It's Just Data podcast, Kevin Petrie, VP of Research at BARC and the podcast host, is joined by Dainius Jocas, Search Engineer at Vinted, and Radu Gheorghe, Software Engineer at Vespa.ai. They discuss how Vinted, an online marketplace for secondhand products, modernised its data architecture to address new AI search use cases and the challenges faced with Elasticsearch. From the switch to Vespa to the advantages of supporting multiple languages and complex queries, the podcast offers insights on the trade-offs organisations must think about when updating their search systems, especially regarding AI and machine learning applications.

Vinted Elasticsearch Challenges

Vinted's search architecture was built on Elasticsearch before they switched to Vespa. Elasticsearch was a functional system, but it presented a few major challenges. With over 20 supported languages, the company's "index per language" approach created significant sharding problems, leading to infrastructure imbalances and constant adjustments.

"The index for the French language, the biggest language that we support, was more than three times bigger than the second biggest language, which created imbalances in the Elasticsearch data nodes' load," Jocas explained.

In addition to these technical obstacles, organisational issues arose as teams responsible for different parts of the search process found themselves "pointing fingers at each other at an increasing rate." The need for a more integrated, effective solution became clear.

The Solution: A New Platform for a New Era

The search for a better solution led Vinted to Vespa. The initial adoption came from "one success story": a machine learning engineer working on recommendations discovered that Vespa was ten times faster than Elasticsearch for their use case. This initial benchmark, run on a single decommissioned server, was a "true testament to how efficient Vespa is when it comes to serving requests," Jocas told Petrie.

Vespa helped Vinted solve their language problem by allowing a language to be set per document, which eliminates the need for separate indexes and the associated sharding headaches. As Jocas put it, "We got out of the sharding problem once and for all."

Takeaways

Vinted faced challenges with its initial Elasticsearch architecture.
The need for better integration between matching and ranking was identified.
Vespa outperformed Elasticsearch in handling image search and recommendations.
Transitioning to Vespa involved significant learning and support from developers.
Vespa allows for language-specific document handling, simplifying architecture.
Organisations must evaluate the complexity and volume of their data before transitioning.
Vespa is optimised for query performance, while Elasticsearch excels in data writing.
The learning curve for Vespa can be steep, but support is available.
It's important to focus on optimising new systems rather than emulating old ones.
Partial updates in Vespa are more efficient than in Elasticsearch.

Chapters

00:00 Introduction to Vinted and...
ParadeDB built a Postgres extension that facilitates full-text search and analytics on Postgres without the need to transfer data.
In this special in-person episode, Sanne Grinovero shares the story of Java's evolution from his unique perspective as a long-time open-source contributor. He shares his 16-year career journey at Red Hat, highlighting his amazing work on key projects like Hibernate, Infinispan, and especially the creation of Quarkus. His career trajectory, from a student who initially disliked Java's complexity to a leading figure in its modernization, shows the transformative power of open source.

A key part of the conversation focuses on how technical challenges spark innovation. Sanne explains how the task of making the popular Hibernate framework compatible with GraalVM's limitations led directly to the birth of Quarkus. This journey tells the bigger story of how Java adapted for cloud-native development, ensuring it continues to be a top choice for developers seeking high performance and a great developer experience.

Timestamps:
(00:00:00) Trailer & Intro
(00:02:16) Career Turning Points
(00:04:52) Winning an Innovation Award
(00:06:35) Java Heroes
(00:08:04) Working as a Consultant
(00:09:56) Taking a Massive Pay Cut to Work on Open Source
(00:10:59) Contributing to Big Open Source as a Youngster
(00:12:53) State of Hibernate Project
(00:15:15) Spring Boot
(00:16:54) Making Hibernate Work on GraalVM
(00:21:05) GraalVM Limitations for Running Hibernate
(00:26:09) Java for Cloud Native Application
(00:28:04) Quarkus vs Spring Boot
(00:33:21) JRebel & Quarkus
(00:34:35) Java vs New Programming Languages
(00:39:22) The ORM Dilemma
(00:42:38) Some Hibernate Design Pattern Tips
(00:46:40) Getting Paid Working on Open Source
(00:48:41) Hibernate License Change
(00:51:05) Intellectual Property & Meaningful Contributions
(00:52:52) AI Usage & Copyright in Open Source
(00:55:21) Biggest Challenge Working in a Big Open Source
(00:56:08) Politics in Open Source
(00:58:32) Security Risks in Open Source
(01:02:25) Donating Hibernate to Commonhaus Foundation
(01:04:49) The Future of Red Hat
(01:06:39) 3 Tech Lead Wisdom
_____
Sanne Grinovero's Bio
Sanne Grinovero has been a member of the Hibernate team for 10 years; today he leads this project in his role of Sr. Principal Software Engineer at Red Hat, while also working on Quarkus as a founding R&D engineer.

Deeply interested in solving performance and concurrency challenges around data access, scalability, and exploring integration with new storage technologies, distributed systems and search engines.

Working on Hibernate features led him to contribute to related open source technologies; most notably to Apache Lucene and Elasticsearch, Infinispan and JGroups, ANTLR, WildFly, various JDBC drivers, the OpenJDK and more recently getting interested in GraalVM.

After being challenged to reduce memory consumption and improve bootstrap times of Hibernate, Sanne worked as part of a small R&D team at Red Hat on some ideas which have evolved into what is known today as Quarkus.

Follow Sanne:
LinkedIn – linkedin.com/in/sannegrinovero
Twitter – twitter.com/SanneGrinovero
GitHub – github.com/sanne

Like this episode?
Show notes & transcript: techleadjournal.dev/episodes/220.
Follow @techleadjournal on LinkedIn, Twitter, and Instagram.
Buy me a coffee or become a patron.
Who is actually allowed to do what? And should all of us really be allowed to do everything?

Every tech project starts with a simple question: who is allowed to do what? But at the latest when the startup grows, customers demand compliance, or the first intern touches the production database, Role Based Access Control (RBAC) suddenly becomes a question of survival, and anyone who underestimates the topic quickly ends up in permissions hell.

In this episode we take apart the well-known concept of role-based access control. We clarify which concrete problem RBAC actually solves, why so much technical depth and organizational drama hides behind the harmless-looking checkboxes, and why not all RBAC is the same.

Along the way we give you practical insights: how do Grafana, Sentry, Elasticsearch, OpenSearch, or tracing tools like Jaeger implement this permission concept? Where are the pitfalls in complex, multi-tenant systems?

Whether you finally want to understand why RBAC, ABAC (Attribute-Based), ReBAC (Relationship-Based) and policy engines are more than just buzzwords, or want to know how to get policies, edge cases and constraints under control, that is what this deep dive is about.

Also included: open source highlights such as Casbin, SpiceDB, OpenFGA and OPA, plus real project and startup tips for a pragmatic start and later scaling.

Bonus: a fairy tale with Kevin and Max, in which the intern sometimes still wins against the admin.
What if you could use ElasticSearch serverless? While at Build, Carl and Richard chatted with Ken Exner about the new announcements around Elastic providing serverless storage and search! Ken talks about paying for only the data you move and store with serverless, rather than needing to operate any infrastructure for Elastic. The conversation digs into the potential of Elastic in Azure AI Foundry to provide ultra-fast access to current company data for your LLM implementations. Elastic did vector databases before LLMs made them essential for RAG - and you can take advantage of it!
This episode was sponsored by Elastic! Elastic is the company behind Elasticsearch; they help teams find, analyze, and act on their data in real time through their Search, Observability, and Security solutions. Thanks Elastic! This episode was recorded at Elastic's offices in San Francisco during a meetup.

Find info about the show, past episodes including transcripts, our swag store, Patreon link, and more at https://cupogo.dev/.
Join hosts Daniel Garcia and Grant Copley as they dive into the latest news and updates in the BoxLang and CFML world. Don't miss out on insights, discussions, and what's coming next for modern software development!
In a new season of the Oracle University Podcast, Lois Houston and Nikita Abraham dive into the world of Oracle GoldenGate 23ai, a cutting-edge software solution for data management. They are joined by Nick Wagner, a seasoned expert in database replication, who provides a comprehensive overview of this powerful tool. Nick highlights GoldenGate's ability to ensure continuous operations by efficiently moving data between databases and platforms with minimal overhead. He emphasizes its role in enabling real-time analytics, enhancing data security, and reducing costs by offloading data to low-cost hardware. The discussion also covers GoldenGate's role in facilitating data sharing, improving operational efficiency, and reducing downtime during outages. Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. --------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Nikita: Welcome to the Oracle University Podcast! I'm Nikita Abraham, Team Lead: Editorial Services with Oracle University, and with me is Lois Houston: Director of Innovation Programs. Lois: Hi everyone! Welcome to a new season of the podcast. This time, we're focusing on the fundamentals of Oracle GoldenGate. Oracle GoldenGate helps organizations manage and synchronize their data across diverse systems and databases in real time. 
And with the new Oracle GoldenGate 23ai release, we'll uncover the latest innovations and features that empower businesses to make the most of their data. Nikita: Taking us through this is Nick Wagner, Senior Director of Product Management for Oracle GoldenGate. He's been doing database replication for about 25 years and has been focused on GoldenGate on and off for about 20 of those years. 01:18 Lois: In today's episode, we'll ask Nick to give us a general overview of the product, along with some use cases and benefits. Hi Nick! To start with, why do customers need GoldenGate? Nick: Well, it delivers continuous operations, being able to continuously move data from one database to another database or data platform in an efficient, high-speed manner, and it does this with very low overhead. Almost all the GoldenGate environments use transaction logs to pull the data out of the system, so we're not creating any additional triggers, and there is very little overhead on that source system. GoldenGate can also enable real-time analytics: being able to pull data from all these different databases and move it into your analytics system in real time can improve the value that those analytics systems provide. Being able to do real-time statistics and analysis of that data within those high-performance custom environments is really important. 02:13 Nikita: Does it offer any benefits in terms of cost? Nick: GoldenGate can also lower IT costs. A lot of times people run these massive OLTP databases, and they are running reporting in those same systems. With GoldenGate, you can offload some of the data or all the data to low-cost commodity hardware where you can then run the reports on that other system. So, this way, you can get back that performance on the OLTP system, while at the same time optimizing your reporting environment for those long running reports. You can improve efficiencies and reduce risks. 
Being able to reduce the amount of downtime during planned and unplanned outages can really make a big benefit to the overall operational efficiencies of your company. 02:54 Nikita: What about when it comes to data sharing and data security? Nick: You can also reduce barriers to data sharing. Being able to pull subsets of data, or just specific pieces of data out of a production database and move it to the team or to the group that needs that information in real time is very important. And it also protects the security of your data by only moving in the information that they need and not the entire database. It also provides extensibility and flexibility, being able to support multiple different replication topologies and architectures. 03:24 Lois: Can you tell us about some of the use cases of GoldenGate? Where does GoldenGate truly shine? Nick: Some of the more traditional use cases of GoldenGate include use within the multicloud fabric. Within a multicloud fabric, this essentially means that GoldenGate can replicate data between on-premise environments, within cloud environments, or hybrid, cloud to on-premise, on-premise to cloud, or even within multiple clouds. So, you can move data from AWS to Azure to OCI. You can also move between the systems themselves, so you don't have to use the same database in all the different clouds. For example, if you wanted to move data from AWS Postgres into Oracle running in OCI, you can do that using Oracle GoldenGate. We also support maximum availability architectures. And so, there's a lot of different use cases here, but primarily geared around reducing your recovery point objective and recovery time objective. 04:20 Lois: Ah, reducing RPO and RTO. That must have a significant advantage for the customer, right? 
Nick: So, reducing your RPO and RTO allows you to take advantage of some of the benefits of GoldenGate, being able to do active-active replication, being able to set up GoldenGate for high availability, real-time failover, and it can augment your active Data Guard and Data Guard configuration. So, a lot of times GoldenGate is used within Oracle's maximum availability architecture platinum tier level of replication, which means that at that point you've got lots of different capabilities within the Oracle Database itself. But to help eke out that last little bit of high availability, you want to set up an active-active environment with GoldenGate to really get true zero RPO and RTO. GoldenGate can also be used for data offloading and data hubs. Being able to pull data from one or more source systems and move it into a data hub, or into a data warehouse for your operational reporting. This could also be your analytics environment too. 05:22 Nikita: Does GoldenGate support online migrations? Nick: In fact, a lot of companies actually get started in GoldenGate by doing a migration from one platform to another. Now, these don't even have to be something as complex as going from one database like a DB2 on-premise into an Oracle on OCI, it could even be simple migrations. A lot of times doing something like a major application or a major database version upgrade is going to take downtime on that production system. You can use GoldenGate to eliminate that downtime. So this could be going from Oracle 19c to Oracle 23ai, or going from application version 1.0 to application version 2.0, because GoldenGate can do the transformation between the different application schemas. You can use GoldenGate to migrate your database from on premise into the cloud with no downtime as well. 
We also support real-time analytic feeds, being able to go from multiple databases, not only those on premise, but being able to pull information from different SaaS applications inside of OCI and move it to your different analytic systems. And then, of course, we also have the ability to stream events and analytics within GoldenGate itself. 06:34 Lois: Let's move on to the various topologies supported by GoldenGate. I know GoldenGate supports many different platforms and can be used with just about any database. Nick: This first layer of topologies is what we usually consider relational database topologies. And so this would be moving data from Oracle to Oracle, Postgres to Oracle, Sybase to SQL Server, a lot of different types of databases. So the first architecture would be unidirectional. This is replicating from one source to one target. You can do this for reporting. If I wanted to offload some reports into another server, I can go ahead and do that using GoldenGate. I can replicate the entire database or just a subset of tables. I can also set up GoldenGate for bidirectional, and this is what I want to set up GoldenGate for something like high availability. So in the event that one of the servers crashes, I can almost immediately reconnect my users to the other system. And that almost immediately depends on the amount of latency that GoldenGate has at that time. So a typical latency is anywhere from 3 to 6 seconds. So after that primary system fails, I can reconnect my users to the other system in 3 to 6 seconds. And I can do that because as GoldenGate's applying data into that target database, that target system is already open for read and write activity. GoldenGate is just another user connecting in issuing DML operations, and so it makes that failover time very low. 07:59 Nikita: Ok…If you can get it down to 3 to 6 seconds, can you bring it down to zero? Like zero failover time? Nick: That's the next topology, which is active-active. 
And in this scenario, all servers are read/write all at the same time and all available for user activity. And you can do multiple topologies with this as well. You can do a mesh architecture, which is where every server talks to every other server. This works really well for 2, 3, 4, maybe even 5 environments, but when you get beyond that, having every server communicate with every other server can get a little complex. And so at that point we start looking at doing what we call a hub and spoke architecture, where we have lots of different spokes. At the end of each spoke is a read/write database, and then those communicate with a hub. So any change that happens on one spoke gets sent into the hub, and then from the hub it gets sent out to all the other spokes. And through that architecture, it allows you to really scale up your environments. We have customers that are doing up to 150 spokes within that hub architecture. Within active-active replication as well, we can do conflict detection and resolution, which means that if two users modify the same row on two different systems, GoldenGate can actually determine that there was an issue with that and determine what user wins or which row change wins, which is extremely important when doing active-active replication. And this means that if one of those systems fails, there is no downtime when you switch your users to another active system because it's already available for activity and ready to go. 09:35 Lois: Wow, that's fantastic. Ok, tell us more about the topologies. Nick: GoldenGate can do other things like broadcast, sending data from one system to multiple systems, or many to one as far as consolidation. We can also do cascading replication, so when data moves from one environment that GoldenGate is replicating into another environment that GoldenGate is replicating. By default, we ignore all of our own transactions. 
But there's actually a toggle switch that you can flip that says, hey, GoldenGate, even though you wrote that data into that database, still push it on to the next system. And then of course, we can also do distribution of data, and this is more like moving data from a relational database into something like a Kafka topic or a JMS queue or into some messaging service. 10:24 Raise your game with the Oracle Cloud Applications skills challenge. Get free training on Oracle Fusion Cloud Applications, Oracle Modern Best Practice, and Oracle Cloud Success Navigator. Pass the free Oracle Fusion Cloud Foundations Associate exam to earn a Foundations Associate certification. Plus, there's a chance to win awards and prizes throughout the challenge! What are you waiting for? Join the challenge today by visiting oracle.com/education. 10:58 Nikita: Welcome back! Nick, does GoldenGate also have nonrelational capabilities? Nick: We have a number of nonrelational replication events in topologies as well. This includes things like data lake ingestion and streaming ingestion, being able to move data and data objects from these different relational database platforms into data lakes and into these streaming systems where you can run analytics on them and run reports. We can also do cloud ingestion, being able to move data from these databases into different cloud environments. And this is not only just moving it into relational databases with those clouds, but also their data lakes and data fabrics. 11:38 Lois: You mentioned a messaging service earlier. Can you tell us more about that? Nick: Messaging replication is also possible. So we can actually capture from things like messaging systems like Kafka Connect and JMS, replicate that into a relational database, or simply stream it into another environment. 
We also support NoSQL replication, being able to capture from MongoDB and replicate it onto another MongoDB for high availability or disaster recovery, or simply into any other system. 12:06 Nikita: I see. And is there any integration with a customer's SaaS applications? Nick: GoldenGate also supports a number of different OCI SaaS applications. And so a lot of these different applications like Oracle Financials Fusion, Oracle Transportation Management, they all have GoldenGate built under the covers and can be enabled with a flag that you can actually have that data sent out to your other GoldenGate environment. So you can actually subscribe to changes that are happening in these other systems with very little overhead. And then of course, we have event processing and analytics, and this is the final topology or flexibility within GoldenGate itself. And this is being able to push data through data pipelines, doing data transformations. GoldenGate is not an ETL tool, but it can do row-level transformation and row-level filtering. 12:55 Lois: Are there integrations offered by Oracle GoldenGate in automation and artificial intelligence? Nick: We can do time series analysis and geofencing using the GoldenGate Stream Analytics product. It allows you to actually do real time analysis and time series analysis on data as it flows through the GoldenGate trails. And then that same product, the GoldenGate Stream Analytics, can then take the data and move it to predictive analytics, where you can run MML on it, or ONNX or other Spark-type technologies and do real-time analysis and AI on that information as it's flowing through. 13:29 Nikita: So, GoldenGate is extremely flexible. And given Oracle's focus on integrating AI into its product portfolio, what about GoldenGate? Does it offer any AI-related features, especially since the product name has “23ai” in it? 
Nick: With the advent of Oracle GoldenGate 23ai, it's one of the two products at this point that has the AI moniker at Oracle. Oracle Database 23ai also has it, and that means that we actually do stuff with AI. So the Oracle GoldenGate product can actually capture vectors from databases like MySQL HeatWave, Postgres using pgvector, which includes things like AlloyDB, Amazon RDS Postgres, Aurora Postgres. We can also replicate data into Elasticsearch and OpenSearch, or if the data is using vectors within OCI or the Oracle Database itself. So GoldenGate can be used for a number of things here. The first one is being able to migrate vectors into the Oracle Database. So if you're using something like Postgres, MySQL, and you want to migrate the vector information into the Oracle Database, you can. Now one thing to keep in mind here is a vector is oftentimes like a GPS coordinate. So if I need to know the GPS coordinates of Austin, Texas, I can put in a latitude and longitude and it will give me the GPS coordinates of a building within that city. But if I also need to know the altitude of that same building, well, that's going to be a different algorithm. And GoldenGate and replicating vectors is the same way. When you create a vector, it's essentially just creating a bunch of numbers under the screen, kind of like those same GPS coordinates. The dimension and the algorithm that you use to generate that vector can be different across different databases, but the actual meaning of that data will change. And so GoldenGate can replicate the vector data as long as the algorithm and the dimensions are the same. If the algorithm and the dimensions are not the same between the source and the target, then you'll actually want GoldenGate to replicate the base data that created that vector. And then once GoldenGate replicates the base data, it'll actually call the vector embedding technology to re-embed that data and produce that numerical formatting for you. 
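The rule Nick describes (replicate the vectors directly only when the embedding algorithm and the dimensions match, otherwise replicate the base data and re-embed on the target) can be sketched as a simple decision function. This is an illustration of the reasoning only, not GoldenGate's API or configuration: the `VectorColumn` type, the `replication_strategy` function, the returned labels, and the model names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorColumn:
    """Describes how a vector column was produced (hypothetical type)."""
    embedding_model: str  # the "algorithm" that generated the vectors
    dimensions: int       # number of values in each vector


def replication_strategy(source: VectorColumn, target: VectorColumn) -> str:
    """Decide whether vectors can be copied as-is or must be re-embedded."""
    same_model = source.embedding_model == target.embedding_model
    same_dims = source.dimensions == target.dimensions
    if same_model and same_dims:
        # The numbers mean the same thing on both sides, so copy them directly.
        return "replicate_vectors"
    # Otherwise ship the base data and let the target re-create the vectors.
    return "replicate_base_data_and_reembed"


if __name__ == "__main__":
    src = VectorColumn(embedding_model="model-a", dimensions=768)
    print(replication_strategy(src, VectorColumn("model-a", 768)))
    print(replication_strategy(src, VectorColumn("model-b", 1024)))
```

Checking both fields matters: two models can share a dimension count and still produce vectors that are not comparable with each other.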
15:42 Lois: So, there are some nuances there… Nick: GoldenGate can also replicate and consolidate vector changes or even do the embedding API calls itself. This is really nice because it means that we can take changes from multiple systems and consolidate them into a single one. We can also do the reverse of that too. A lot of customers are still trying to find out which algorithms work best for them. How many dimensions? What's the optimal use? Well, you can now run those in different servers without impacting your actual AI system. Once you've identified which algorithm and dimension is going to be best for your data, you can then have GoldenGate replicate that into your production system and we'll start using that instead. So it's a nice way to switch algorithms without taking extensive downtime. 16:29 Nikita: What about in multicloud environments? Nick: GoldenGate can also do multicloud and N-way active-active Oracle replication between vectors. So if there's vectors in Oracle databases, in multiple clouds, or multiple on-premise databases, GoldenGate can synchronize them all up. And of course we can also stream changes from vector information, including text as well into different search engines. And that's where the integration with Elasticsearch and OpenSearch comes in. And then we can use things like NVIDIA and Cohere to actually do the AI on that data. 17:01 Lois: Using GoldenGate with AI in the database unlocks so many possibilities. Thanks for that detailed introduction to Oracle GoldenGate 23ai and its capabilities, Nick. Nikita: We've run out of time for today, but Nick will be back next week to talk about how GoldenGate has evolved over time and its latest features. And if you liked what you heard today, head over to mylearn.oracle.com and take a look at the Oracle GoldenGate 23ai Fundamentals course to learn more. Until next time, this is Nikita Abraham… Lois: And Lois Houston, signing off! 
17:33 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
Search functionality is a fundamental part of the modern digital world. Elastic, the company behind Elasticsearch, provides it with a platform that does more than just browse through data. Elasticsearch lets users run complex queries on large datasets at high speed, something traditional databases struggle with. That makes it suitable not only for web applications and e-commerce platforms, but also for security systems and operational monitoring. Companies such as Uber, Netflix and Wikipedia accordingly use Elastic products. The company's origins lie in Amsterdam, where the foundation was laid for what eventually became a globally operating player. The idea for Elasticsearch arose from a personal need: founder Shay Banon wanted to build a tool that would make recipes easy to search for his wife. That grew into an open source project, which in turn grew into an infrastructure product used in thousands of companies. Yet the open source approach also brings challenges. In 2021 Elastic changed its open source model, precisely because large cloud providers were using its technology without contributing to its development. For Elastic, that step was necessary to be able to keep developing sustainably. The balance between openness and commercial viability therefore remains an ongoing trade-off. In a world dominated by hyperscalers, it is fair to ask whether purely open source models remain viable. Elastic has since also joined the wave of artificial intelligence. Adding vector search and AI integrations to the existing search technology enables new applications, such as semantic search or real-time analysis of unstructured datasets. In doing so, Elastic meets the growing need for AI-native infrastructure without losing sight of its core product.
The organization itself is set up without a traditional headquarters. The company is worth more than 7 billion dollars, yet has a highly distributed team. That appears to work well, certainly in a post-pandemic world where location matters less for collaboration. The question remains how Europe compares with the global tech giants. From the Netherlands, Elastic has grown into an example of international scalability, but according to Elastic the European ecosystem sometimes lacks the clout to really push through. Guest: Jeroen Berckenkamp. Video: YouTube. Hosts: Ben van der Burg & Daniël Mol. Editing: Daniël Mol. See omnystudio.com/listener for privacy information.
OpenSearch has evolved significantly since its 2021 launch, recently reaching a major milestone with its move to the Linux Foundation. This shift from company-led to foundation-based governance has accelerated community contributions and enterprise adoption, as discussed by NetApp's Amanda Katona in a New Stack Makers episode recorded at KubeCon + CloudNativeCon Europe. NetApp, an early adopter of OpenSearch following Elasticsearch's licensing change, now offers managed services on the platform and contributes actively to its development. Katona emphasized how neutral governance under the Linux Foundation has lowered barriers to enterprise contribution, noting a 56% increase in downloads since the transition and growing interest from developers. OpenSearch 3.0, featuring a Lucene 10 upgrade, promises faster search capabilities, which is especially relevant as data volumes surge. NetApp's ongoing investments include work on machine learning plugins and developer training resources. Katona sees the Linux Foundation's involvement as key to OpenSearch's long-term success, offering vendor-neutral governance and reassuring users seeking openness, performance, and scalability in data search and analytics. Learn more from The New Stack about OpenSearch: Report: OpenSearch Bests ElasticSearch at Vector Modeling; AWS Transfers OpenSearch to the Linux Foundation; OpenSearch: How the Project Went From Fork to Foundation.
Brandon Liu is an open source developer and creator of the Protomaps basemap project. We talk about how static maps help developers build sites that last, the PMTiles file format, the role of OpenStreetMap, and his experience funding and running an open source project full time. Protomaps Protomaps PMTiles (File format used by Protomaps) Self-hosted slippy maps, for novices (like me) Why Deploy Protomaps on a CDN User examples Flickr Pinball Map Toilet Map Related projects OpenStreetMap (Dataset protomaps is based on) Mapzen (Former company that released details on what to display based on zoom levels) Mapbox GL JS (Mapbox developed source available map rendering library) MapLibre GL JS (Open source fork of Mapbox GL JS) Other links HTTP range requests (MDN) Hilbert curve Transcript You can help correct transcripts on GitHub. Intro [00:00:00] Jeremy: I'm talking to Brandon Liu. He's the creator of Protomaps, which is a way to easily create and host your own maps. Let's get into it. [00:00:09] Brandon: Hey, so thanks for having me on the podcast. So I'm Brandon. I work on an open source project called Protomaps. What it really is, is if you're a front end developer and you ever wanted to put maps on a website or on a mobile app, then Protomaps is sort of an open source solution for doing that that I hope is something that's way easier to use than, um, a lot of other open source projects. Why not just use Google Maps? [00:00:36] Jeremy: A lot of people are gonna be familiar with Google Maps. Why should they worry about whether something's open source? Why shouldn't they just go and use the Google maps API? [00:00:47] Brandon: So Google Maps is like an awesome thing it's an awesome product. Probably one of the best tech products ever right? And just to have a map that tells you what restaurants are open and something that I use like all the time especially like when you're traveling it has all that data. 
And the most amazing part is that it's free for consumers but it's not necessarily free for developers. Like if you wanted to embed that map onto your website or app, that usually has an API cost which still has a free tier and is affordable. But one motivation, one basic reason to use open source is if you have some project that doesn't really fit into that pricing model. You know, where you'd have to pay the cost of Google Maps but you have a side project or a nonprofit, that's one reason. But there's lots of other reasons related to flexibility or customization where you might want to use open source instead. Protomaps examples [00:01:49] Jeremy: Can you give some examples where people have used Protomaps and where that made sense for them? [00:01:56] Brandon: I follow a lot of the use cases and I also don't know about a lot of them because I don't have an API where I can track a hundred percent of the users. Some of them use the hosted version, but I would say most of them probably use it on their own infrastructure. One of the cool projects I've been seeing is called Toilet Map. And what Toilet Map is, is if you're in the UK and you want to find a public restroom, then it maps out, sort of crowdsourced, all of the public restrooms. And that's important for a lot of people; if they have health issues, they need to find that information. And there's just a lot of different projects in the same vein. There's another one called Pinball Map which is sort of a hobby project to find all the pinball machines in the world. And they wanted to have a customized map that fit in with their theme of pinball. So these sorts of really cool indie projects are the ones I'm most excited about. Basemaps vs Overlays [00:02:57] Jeremy: And if we talk about, like the pinball map as an example, there's this concept of a basemap and then there's the things that you lay on top of it. What is a basemap, and are the pinball locations part of it or is that something separate?
[00:03:12] Brandon: It's usually something separate. The example I usually use is if you go to a real estate site, like Zillow, you'll open up the map of Seattle and it has a bunch of pins showing all the houses, and then it has some information beneath it. That information beneath it is like labels telling, this neighborhood is Capitol Hill, or there is a park here. But all that information is common to a lot of use cases and it's not specific to real estate. So I think usually that's the distinction people use in the industry between like a base map versus your overlay. The overlay is like the data for your product or your company while the base map is something you could get from Google or from Protomaps or from Apple or from Mapbox that kind of thing. PMTiles for hosting the basemap and overlays [00:03:58] Jeremy: And so Protomaps in particular is responsible for the base map, and that information includes things like the streets and the locations of landmarks and things like that. Where is all that information coming from? [00:04:12] Brandon: So the base map information comes from a project called OpenStreetMap. And I would also, point out that for Protomaps as sort of an ecosystem. You can also put your overlay data into a format called PMTiles, which is sort of the core of what Protomaps is. So it can really do both. It can transform your data into the PMTiles format which you can host and you can also host the base map. So you kind of have both of those sides of the product in one solution. [00:04:43] Jeremy: And so when you say you have both are you saying that the PMTiles file can have, the base map in one file and then you would have the data you're laying on top in another file? Or what are you describing there? [00:04:57] Brandon: That's usually how I recommend to do it. Oftentimes there'll be sort of like, a really big basemap 'cause it has all of that data about like where the rivers are. 
Or while, if you want to put your map of toilets or park benches or pickleball courts on top, that's another file. But those are all just like assets you can move around like JSON or CSV files. Statically Hosted [00:05:19] Jeremy: And I think one of the things you mentioned was that your goal was to make Protomaps or the, the use of these PMTiles files easy to use. What does that look like for, for a developer? I wanna host a map. What do I actually need to, to put on my servers? [00:05:38] Brandon: So my usual pitch is that basically if you know how to use S3 or cloud storage, that you know how to deploy a map. And that, I think is the main sort of differentiation from most open source projects. Like a lot of them, they call themselves like, like some sort of self-hosted solution. But I've actually avoided using the term self-hosted because I think in most cases that implies a lot of complexity. Like you have to log into a Linux server or you have to use Kubernetes or some sort of Docker thing. What I really want to emphasize is the idea that, for Protomaps, it's self-hosted in the same way like CSS is self-hosted. So you don't really need a service from Amazon to host the JSON files or CSV files. It's really just a static file. [00:06:32] Jeremy: When you say static file that means you could use any static web host to host your HTML file, your JavaScript that actually renders the map. And then you have your PMTiles files, and you're not running a process or anything, you're just putting your files on a static file host. [00:06:50] Brandon: Right. So I think if you're a developer, you can also argue like a static file server is a server. It's you know, it's the cloud, it's just someone else's computer. It's really just nginx under the hood. But I think static storage is sort of special. If you look at things like static site generators, like Jekyll or Hugo, they're really popular because they're a commodity or like the storage is a commodity. 
And you can take your blog, make it a Jekyll blog, hosted on S3. One day, Amazon's like, we're charging three times as much so you can move it to a different cloud provider. And that's all vendor neutral. So I think that's really the special thing about static storage as a primitive on the web. Why running servers is a problem for resilience [00:07:36] Jeremy: Was there a prior experience you had? Like you've worked with maps for a very long time. Were there particular difficulties you had where you said I just gotta have something that can be statically hosted? [00:07:50] Brandon: That's sort of exactly why I got into this. I've been working sort of in and around the map space for over a decade, and Protomaps is really like me trying to solve the same problem I've had over and over again in the past, just like once and forever right? Because like once this problem is solved, like I don't need to deal with it again in the future. So I've worked at a couple of different companies before, mostly as a contractor, for like a humanitarian nonprofit for a design company doing things like, web applications to visualize climate change. Or for even like museums, like digital signage for museums. And oftentimes they had some sort of data visualization component, but always sort of the challenge of how to like, store and also distribute like that data was something that there wasn't really great open source solutions. So just for map data, that's really what motivated that design for Protomaps. [00:08:55] Jeremy: And in those, those projects in the past, were those things where you had to run your own server, run your own database, things like that? [00:09:04] Brandon: Yeah. And oftentimes we did, we would spin up an EC2 instance, for maybe one client and then we would have to host this server serving map data forever. 
Maybe the client goes away, or I guess it's good for business if you can sign some sort of like long-term support for that client saying, Hey, you know, like we're done with a project, but you can pay us to maintain the EC2 server for the next 10 years. And that's attractive. but it's also sort of a pain, because usually what happens is if people are given the choice, like a developer between like either I can manage the server on EC2 or on Rackspace or Hetzner or whatever, or I can go pay a SaaS to do it. In most cases, businesses will choose to pay the SaaS. So that's really like what creates a sort of lock-in is this preference for like, so I have this choice between like running the server or paying the SaaS. Like businesses will almost always go and pay the SaaS. [00:10:05] Jeremy: Yeah. And in this case, you either find some kind of free hosting or low-cost hosting just to host your files and you upload the files and then you're good from there. You don't need to maintain anything. [00:10:18] Brandon: Exactly, and that's really the ideal use case. so I have some users these, climate science consulting agencies, and then they might have like a one-off project where they have to generate the data once, but instead of having to maintain this server for the lifetime of that project, they just have a file on S3 and like, who cares? If that costs a couple dollars a month to run, that's fine, but it's not like S3 is gonna be deprecated, like it's gonna be on an insecure version of Ubuntu or something. So that's really the ideal, set of constraints for using Protomaps. [00:10:58] Jeremy: Yeah. Something this also makes me think about is, is like the resilience of sites like remaining online, because I, interviewed, Kyle Drake, he runs Neocities, which is like a modern version of GeoCities. 
And if I remember correctly, he was mentioning how a lot of old websites from that time, if they were running a server backend, like they were running PHP or something like that, if you were to try to go to those sites, now they're like pretty much all dead because there needed to be someone dedicated to running a Linux server, making sure things were patched and so on and so forth. But for static sites, like the ones that used to be hosted on GeoCities, you can go to the internet archive or other websites and they were just files, right? You can bring 'em right back up, and if anybody just puts 'em on a web server, then you're good. They're still alive. Case study of news room preferring static hosting [00:11:53] Brandon: Yeah, exactly. One place that's kind of surprising but makes sense where this comes up, is for newspapers actually. Some of the users using Protomaps are the Washington Post. And the reason they use it, is not necessarily because they don't want to pay for a SaaS like Google, but because if they make an interactive story, they have to guarantee that it still works in a couple of years. And that's like a policy decision from like the editorial board, which is like, so you can't write an article if people can't view it in five years. But if your like interactive data story is reliant on a third party, API and that third party API becomes deprecated, or it changes the pricing or it, you know, it gets acquired, then your journalism story is not gonna work anymore. So I have seen really good uptake among local news rooms and even big ones to use things like Protomaps just because it makes sense for the requirements. Working on Protomaps as an open source project for five years [00:12:49] Jeremy: How long have you been working on Protomaps and the parts that it's made up of such as PMTiles? [00:12:58] Brandon: I've been working on it for about five years, maybe a little more than that. It's sort of my pandemic era project. 
But the PMTiles part, which is really the heart of it only came in about halfway. Why not make a SaaS? [00:13:13] Brandon: So honestly, like when I first started it, I thought it was gonna be another SaaS and then I looked at it and looked at what the environment was around it. And I'm like, uh, so I don't really think I wanna do that. [00:13:24] Jeremy: When, when you say you looked at the environment around it what do you mean? Why did you decide not to make it a SaaS? [00:13:31] Brandon: Because there already is a lot of SaaS out there. And I think the opportunity of making something that is unique in terms of those use cases, like I mentioned like newsrooms, was clear. Like it was clear that there was some other solution, that could be built that would fit these needs better while if it was a SaaS, there are plenty of those out there. And I don't necessarily think that they're well differentiated. A lot of them all use OpenStreetMap data. And it seems like they mainly compete on price. It's like who can build the best three column pricing model. And then once you do that, you need to build like billing and metrics and authentication and like those problems don't really interest me. So I think, although I acknowledge sort of the indie hacker ethos now is to build a SaaS product with a monthly subscription, that's something I very much chose not to do, even though it is for sure like the best way to build a business. [00:14:29] Jeremy: Yeah, I mean, I think a lot of people can appreciate that perspective because it's, it's almost like we have SaaS overload, right? Where you have so many little bills for your project where you're like, another $5 a month, another $10 a month, or if you're a business, right? Those, you add a bunch of zeros and at some point it's just how many of these are we gonna stack on here? [00:14:53] Brandon: Yeah. And honestly. So I really think like as programmers, we're not really like great at choosing how to spend money like a $10 SaaS. 
That's like nothing. You know? So I can go to Starbucks and I can buy a pumpkin spice latte, and that's like $10 basically now, right? And it's like I'm able to make that consumer choice in like an instant just to spend money on that. But then if you're like, oh, like spend $10 on a SaaS that somebody put a lot of work into, then you're like, oh, that's too expensive. I could just do it myself. So I'm someone that also subscribes to a lot of SaaS products. and I think for a lot of things it's a great fit. Many open source SaaS projects are not easy to self host [00:15:37] Brandon: But there's always this tension between an open source project that you might be able to run yourself and a SaaS. And I think a lot of projects are at different parts of the spectrum. But for Protomaps, it's very much like I'm trying to move maps to being it is something that is so easy to run yourself that anyone can do it. [00:16:00] Jeremy: Yeah, and I think you can really see it with, there's a few SaaS projects that are successful and they're open source, but then you go to look at the self-hosting instructions and it's either really difficult to find and you find it, and then the instructions maybe don't work, or it's really complicated. So I think doing the opposite with Protomaps. As a user, I'm sure we're all appreciative, but I wonder in terms of trying to make money, if that's difficult. [00:16:30] Brandon: No, for sure. It is not like a good way to make money because I think like the ideal situation for an open source project that is open that wants to make money is the product itself is fundamentally complicated to where people are scared to run it themselves. Like a good example I can think of is like Supabase. Supabase is sort of like a platform as a service based on Postgres. And if you wanted to run it yourself, well you need to run Postgres and you need to handle backups and authentication and logging, and that stuff all needs to work and be production ready. 
So I think a lot of people, like they don't trust themselves to run database backups correctly. 'cause if you get it wrong once, then you're kind of screwed. So I think that fundamental aspect of the product, like a database is something that is very, very ripe for being a SaaS while still being open source because it's fundamentally hard to run. Another one I can think of is like Tailscale, which is, like a VPN that works end to end. That's something where, you know, it has this networking complexity where a lot of developers don't wanna deal with that. So they'd happily pay for Tailscale as a service. There are a lot of products or open source projects that eventually end up just changing to becoming like a hosted service. Businesses going from open source to closed or restricted licenses [00:17:58] Brandon: But then in that situation why would they keep it open source, right? Like, if it's easy to run yourself well, doesn't that sort of cannibalize their business model? And I think that's really the tension overall in these open source companies. So you saw it happen to things like Elasticsearch, to things like Terraform, where they eventually change the license to one that makes it difficult for other companies to compete with them. [00:18:23] Jeremy: Yeah, I mean there's been a number of cases like that. I mean, specifically within the mapping community, one I can think of was Mapbox. They have Mapbox GL, which was a JavaScript client to visualize maps, and they moved from, I forget which license they picked, but they moved to a much more restrictive license. I wonder what your thoughts are on something that releases as open source, but then becomes something maybe a little more muddy. [00:18:55] Brandon: Yeah, I think it totally makes sense because if you look at their business and their funding, it seems like for Mapbox, I haven't used it in a while, but my understanding is like a lot of their business now is car companies and doing in-dash navigation.
And that is probably way better of a business than trying to serve like people making maps of toilets. And I think sort of the beauty of it is that, so Mapbox, the story is they had a JavaScript renderer called Mapbox GL JS. And they changed that to a source available license a couple years ago. And there's a fork of it that I'm sort of involved in called MapLibre GL. But I think the cool part is Mapbox paid employees for years, probably millions of dollars in total to work on this thing and just gave it away for free. Right? So everyone can benefit from that work they did. It's not like that code went away, like once they changed the license. Well, the old version has been forked. It's going its own way now. It's quite different than the new version of Mapbox, but I think it's extremely generous that they're able to pay people for years, you know, like a competitive salary and just give that away. [00:20:10] Jeremy: Yeah, so we should maybe look at it as, it was a gift while it was open source, and they've given it to the community and they're continuing on their own path, but at least the community running MapLibre, they can run with it, right? It's not like it just disappeared. [00:20:29] Brandon: Yeah, exactly. And that is something that I use for Protomaps quite extensively. Like it's the primary way of showing maps on the web and I've been trying to work on some enhancements to it to have better internationalization, because if you are in, like, South Asia, it doesn't currently show languages correctly. So I think it is being taken in a new direction. And I think like sort of the combination of Protomaps and MapLibre, it addresses a lot of use cases, like I mentioned earlier with like these like hobby projects, indie projects that are almost certainly not interesting to someone like Mapbox or Google as a business. But I'm happy to support them as a small business myself.
Financially supporting open source work (GitHub sponsors, closed source, contracts) [00:21:12] Jeremy: In my previous interview with Tom, one of the main things he mentioned was that creating a mapping business is incredibly difficult, and he said he probably wouldn't do it again. So in your case, you're building Protomaps, which you've admitted is easy to self-host. So there's not a whole lot of incentive for people to pay you. How is that working out for you? How are you supporting yourself? [00:21:40] Brandon: There's a couple of strategies that I've tried and oftentimes failed at. Just to go down the list, I do have GitHub Sponsors. I do have a hosted version of Protomaps you can use if you don't want to bother copying a big file around. But the way I do the billing for that is through GitHub Sponsors. If you wanted to use this thing I provide, then just be a sponsor. And that definitely pays for itself, like the cost of running it. And that's great. GitHub Sponsors is so easy to set up. It just removes you having to deal with Stripe or something. 'cause a lot of people, their credit card information is already in GitHub. GitHub Sponsors I think is awesome if you want to like cover costs for a project. But I think very few people are able to make that work at, like, a salaried job level. It's sort of like Twitch streaming, you know, there's a handful of people that are full-time streamers and then you look down the list on Twitch and it's like a lot of people that have like 10 viewers. But some of the other things I've tried, I actually started out publishing the base map as a closed source thing, where I would sell sort of like a data package instead of being a SaaS. I'd be like, here's a one-time download of the premium data and you can buy it. And quite a few people bought it. I just priced it at like $500 for this thing. And I thought that was an interesting experiment.
The main reason it's interesting is because the people that it attracts to you in terms of like, they're curious about your products, are all people willing to pay money. While if you start out with everything being open source, then the people that are gonna try it are only the people that want to get something for free. So what I discovered is actually like once you transition that thing from closed source to open source, a lot of the people that used to pay you money will still keep paying you money because like, it wasn't necessarily that that closed source thing was why they wanted to pay. They just valued the thought you've put into it, your expertise, for example. So I think that is one thing that I tried at the beginning, which was just start out closed source and proprietary, then make it open source. That's interesting to people. Like if you release something as open source, if you go the other way, like people are really mad if you start out with something open source and then later on you're like, oh, it's some other license. Then people are like that's so rotten. But I think doing it the other way, I think is quite valuable in terms of being able to find an audience. [00:24:29] Jeremy: And when you said it went from closed source and paid to open source, do you still sell those map exports? [00:24:39] Brandon: I don't right now. It's something that I might do in the future, you know, like have small customizations of the data that are available, uh, for a fee. But still, like the core OpenStreetMap-based map that's like a hundred gigs, you can just download. And that'll always just be like a free download just because that's already out there. All the source code to build it is open source. So even if I said, oh, you have to pay for it, then someone else can just do it, right? So there's no real reason like to make that like some sort of like paywall thing.
But I think like overall if the project is gonna survive in the long term it's important that I'd ideally like to be able to like grow like a team like have a small group of people that can dedicate the time to growing the project in the long term. But I'm still like trying to figure that out right now. [00:25:34] Jeremy: And when you mentioned that when you went from closed to open and people were still paying you, you don't sell a product anymore. What were they paying for? [00:25:45] Brandon: So I have some contracts with companies basically, like if they need a feature or they need a customization in this way then I am very open to those. And I sort of set it up to make it clear from the beginning that this is not just a free thing on GitHub, this is something that you could pay for if you need help with it, if you need support, if you wanted it. I'm also a little cagey about the word support because I think like it sounds a little bit too wishy-washy. Pretty much like if you need access to the developers of an open source project, I think that's something that businesses are willing to pay for. And I think like making that clear to potential users is a challenge. But I think that is one way that you might be able to make like a living out of open source. [00:26:35] Jeremy: And I think you said you'd been working on it for about five years. Has that mostly been full time? [00:26:42] Brandon: It's been on and off. it's sort of my pandemic era project. But I've spent a lot of time, most of my time working on the open source project at this point. So I have done some things that were more just like I'm doing a customization or like a private deployment for some client. But that's been a minority of the time. Yeah. [00:27:03] Jeremy: It's still impressive to have an open source project that is easy to self-host and yet is still able to support you working on it full time. 
I think a lot of people might make the assumption that there's nothing to sell if something is easy to use. But this sort of sounds like a counterpoint to that. [00:27:25] Brandon: I think I'd like it to be. So when you come back to the point of like, it being easy to self-host. Well, so again, like I think about it as like a primitive of the web. Like for example, if you wanted to start a business today as like hosted CSS files, you know, like where you upload your CSS and then you get developers to pay you a monthly subscription for how many times they fetched a CSS file. Well, I think most developers would be like, that's stupid because it's just an open specification, you just upload a static file. And really my goal is to make Protomaps the same way where it's obvious that there's not really some sort of lock-in or some sort of secret sauce in the server that does this thing. How PMTiles works and building a primitive of the web [00:28:16] Brandon: If you look at video for example, like a lot of the tech for how Protomaps and PMTiles works is based on parts of the HTTP spec that were made for video. And 20 years ago, if you wanted to host a video on the web, you had to have like a RealPlayer license or Flash. So you had to go license some server software from RealMedia or from Macromedia so you could stream video to a browser plugin. But now in HTML you can just embed a video file. And no one's like, oh well I need to go pay for my video serving license. I mean, there is such a thing, like YouTube doesn't really use that for DRM reasons, but people just have the assumption that video is like a primitive on the web. So if we're able to make maps sort of that same way like a primitive on the web then there isn't really some obvious business or licensing model behind how that works. Just because it's a thing and it helps a lot of people do their jobs and people are happy using it. So why bother?
[00:29:26] Jeremy: You mentioned that it's a tech that was used for streaming video. What tech specifically is it? [00:29:34] Brandon: So it is byte range serving. So when you open a video file on the web, so let's say it's like a 100 megabyte video. You don't have to download the entire video before it starts playing. It streams parts out of the file based on like what frames... I mean, it's based on the frames in the video. So it can start streaming immediately because it's organized in a way to where the first few frames are at the beginning. And what PMTiles really is, is it's just like a video but in space instead of time. So it's organized in a way where these zoomed out views are at the beginning and the most zoomed in views are at the end. So when you're like panning or zooming in the map all you're really doing is fetching byte ranges out of that file the same way as a video. But it's organized in this tiled way on a space filling curve. It's a little bit complicated how it works internally and I think it's kind of cool but that's sort of like an implementation detail. [00:30:35] Jeremy: And to the person deploying it, it just looks like a single file. [00:30:40] Brandon: Exactly in the same way like an mp3 audio file is or like a JSON file is. [00:30:47] Jeremy: So with a video, I can sort of see how as someone seeks through the video, they start at the beginning and then they go to the middle if they wanna see the middle. For a map, as somebody scrolls around the map, are you seeking all over the file or is the way it's structured have a little less chaos? [00:31:09] Brandon: It's structured. And that's kind of the main technical challenge behind building PMTiles is you have to be sort of clever so you're not spraying the reads everywhere. So it uses something called a Hilbert curve, which is a mathematical concept of a space filling curve. Where it's one continuous curve that essentially lets you break 2D space into 1D space.
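To make the "break 2D space into 1D space" idea concrete, here is a minimal sketch of the textbook Hilbert curve index calculation. It is not taken from the PMTiles source; the function name `xy_to_hilbert` and the grid size parameter `n` are illustrative assumptions, and a real tile format would also need to account for zoom levels.

```python
def xy_to_hilbert(n: int, x: int, y: int) -> int:
    """Map cell (x, y) in an n-by-n grid to its position along a Hilbert curve.

    n must be a power of two. This is the standard iterative algorithm,
    shown only to illustrate the concept (an assumption, not PMTiles code).
    """
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant of the current square does the cell fall in?
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes a block of s*s positions along the curve.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the coordinates so the sub-square is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


if __name__ == "__main__":
    # Print the curve position of every cell in a small 4x4 grid.
    for y in range(4):
        print([xy_to_hilbert(4, x, y) for x in range(4)])
```

On a 2x2 grid this visits the cells in the order (0,0), (0,1), (1,1), (1,0). The useful property is that consecutive positions along the curve are always neighboring cells, so tiles stored in this order tend to keep nearby places close together in the file.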
So if you've seen some maps of IP space, it uses this crazy looking curve that hits all the points in one continuous line. And that's the same concept behind PMTiles is if you're looking at one part of the world, you're sort of guaranteed that all of those parts you're looking at are quite close to each other and the data you have to transfer is quite minimal, compared to if you just had it at random. [00:32:02] Jeremy: How big do the files get? If I have a PMTiles of the entire world, what kind of size am I looking at? [00:32:10] Brandon: Right now, the default one I distribute is 128 gigabytes, so it's quite sizable, although you can slice parts out of it remotely. So if you just wanted California or just wanted LA or just wanted only a couple of zoom levels, like from zero to 10 instead of zero to 15, there is a command line tool that's also called PMTiles that lets you do that. Issues with CDNs and range queries [00:32:35] Jeremy: And when you're working with files of this size, I mean, let's say I am working with a CDN in front of my application. I'm not typically accustomed to hosting something that's that large and something where you're seeking all over the file. Is that ever an issue or is that something that's just taken care of by the browser and taken care of by the hosts? [00:32:58] Brandon: That is an issue actually, so a lot of CDNs don't deal with it correctly. And my recommendation is there is a kind of proxy server or like a serverless proxy thing that I wrote that runs on like Cloudflare Workers or on Docker that lets you proxy those range requests into a normal URL and then that is like a hundred percent CDN compatible. So I would say like a lot of the big commercial installations of this thing, they use that because it makes more practical sense. It's also faster. But the idea is that this solution sort of scales up and scales down.
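The range requests discussed above can be sketched as follows. This is a generic illustration of the standard HTTP `Range` header using Python's standard library, not the Protomaps client or proxy; the helper names `range_header` and `fetch_range` are hypothetical, and any URL passed in would be your own.

```python
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Build the value of an HTTP Range header for `length` bytes at `offset`.

    HTTP byte ranges are inclusive on both ends, so the last byte requested
    is offset + length - 1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def fetch_range(url: str, offset: int, length: int) -> bytes:
    """Fetch part of a remote file (hypothetical helper, needs network access)."""
    request = urllib.request.Request(
        url, headers={"Range": range_header(offset, length)}
    )
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honored the range;
        # a plain 200 would mean it is sending the whole file instead.
        if response.status != 206:
            raise RuntimeError(f"range not honored, got status {response.status}")
        return response.read()
```

A server or CDN that ignores the header and answers with the full file is the kind of mismatch that makes a proxy in front of the file attractive for large deployments.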
If you wanted to host just your city in like a 10 megabyte file, well you can just put that into GitHub pages and you don't have to worry about it. If you want to have a global map for your website that serves a ton of traffic then you probably want a little bit more sophisticated of a solution. It still does not require you to run a Linux server, but it might require (you) to use like Lambda or Lambda in conjunction with like a CDN. [00:34:09] Jeremy: Yeah. And that sort of ties into what you were saying at the beginning where if you can host on something like CloudFlare Workers or Lambda, there's less time you have to spend keeping these things running. [00:34:26] Brandon: Yeah, exactly. and I think also the Lambda or CloudFlare workers solution is not perfect. It's not as perfect as S3 or as just static files, but in my experience, it still is better at building something that lasts on the time span of years than being like I have a server that is on this Ubuntu version and in four years there's all these like security patches that are not being applied. So it's still sort of serverless, although not totally vendor neutral like S3. Customizing the map [00:35:03] Jeremy: We've mostly been talking about how you host the map itself, but for someone who's not familiar with these kind of tools, how would they be customizing the map? [00:35:15] Brandon: For customizing the map there is front end style customization and there's also data customization. So for the front end if you wanted to change the water from the shade of blue to another shade of blue there is a TypeScript API where you can customize it almost like a text editor color scheme. So if you're able to name a bunch of colors, well you can customize the map in that way you can change the fonts. And that's all done using MapLibre GL using a TypeScript API on top of that for customizing the data. So all the pipeline to generate this data from OpenStreetMap is open source. 
There is a Java program using a library called Planetiler which is awesome, which is this super fast multi-core way of building map tiles. And right now there aren't really great hooks to customize what data goes into that. But that's something that I do wanna work on. And finally, because the data comes from OpenStreetMap if you notice data that's missing or you wanted to correct data in OSM then you can go into osm.org. You can get involved in contributing the data to OSM and the Protomaps build is daily. So if you make a change, then within 24 hours you should see the new base map have that change. And of course for OSM your improvements would go into every OSM based project that is ingesting that data. So it's not a Protomaps-specific thing. It's like this big shared data source, almost like Wikipedia. OpenStreetMap is a dataset and not a map [00:37:01] Jeremy: I think you were involved with OpenStreetMap to some extent. Can you speak a little bit to that for people who aren't familiar, what OpenStreetMap is? [00:37:11] Brandon: Right. So I've been using OSM as sort of like a tools developer for over a decade now. And one of the number one questions I get from developers about what is Protomaps is why wouldn't I just use OpenStreetMap? What's the distinction between Protomaps and OpenStreetMap? And it's sort of like this funny thing because even though OSM has map in the name it's not really a map in that you can't... In that it's mostly a data set and not a map. It does have a map that you can see that you can pan around to when you go to the website but the way that thing they show you on the website is built is not really that easily reproducible. It involves a lot of C++ software you have to run. But OpenStreetMap itself, the heart of it is almost like a big XML file that has all the data in the map and global. And it has tagged features for example. So you can go in and edit that. It has a web front end to change the data.
It does not directly translate into making a map actually. Protomaps decides what shows at each zoom level [00:38:24] Brandon: So a lot of the pipeline, that Java program I mentioned for building this basemap for protomaps is doing things like you have to choose what data you show when you zoom out. You can't show all the data. For example when you're zoomed out and you're looking at all of a state like Colorado you don't see all the Chipotle when you're zoomed all the way out. That'd be weird, right? So you have to make some sort of decision in logic that says this data only shows up at this zoom level. And that's really what is the challenge in optimizing the size of that for the Protomaps map project. [00:39:03] Jeremy: Oh, so those decisions of what to show at different Zoom levels those are decisions made by you when you're creating the PMTiles file with Protomaps. [00:39:14] Brandon: Exactly. It's part of the base maps build pipeline. and those are honestly very subjective decisions. Who really decides when you're zoomed out should this hospital show up or should this museum show up nowadays in Google, I think it shows you ads. Like if someone pays for their car repair shop to show up when you're zoomed out like that that gets surfaced. But because there is no advertising auction in Protomaps that doesn't happen obviously. So we have to sort of make some reasonable choice. A lot of that right now in Protomaps actually comes from another open source project called Mapzen. So Mapzen was a company that went outta business a couple years ago. They did a lot of this work in designing which data shows up at which Zoom level and open sourced it. And then when they shut down, they transferred that code into the Linux Foundation. So it's this totally open source project, that like, again, sort of like Mapbox gl has this awesome legacy in that this company funded it for years for smart people to work on it and now it's just like a free thing you can use. 
So the logic in Protomaps is really based on Mapzen. [00:40:33] Jeremy: And so the visualization of all this... I think I understand what you mean when people say oh, why not use OpenStreetMap because it's not really clear it's hard to tell is this the tool that's visualizing the data? Is it the data itself? So in the case of using Protomaps, it sounds like Protomaps itself has all of the data from OpenStreetMap and then it has made all the decisions for you in terms of what to show at different Zoom levels and what things to have on the map at all. And then finally, you have to have a separate UI layer and in this case, it sounds like the one that you recommend is the MapLibre library. [00:41:18] Brandon: Yeah, that's exactly right. For Protomaps, it has a portion or a subset of OSM data. It doesn't have all of it just because there's too much, like there's data in there, people have mapped out different bushes and I don't include that in Protomaps. If you wanted to go in and edit like the Java code to add that you can. But really what Protomaps is positioned at is sort of a solution for developers that want to use OSM data to make a map on their app or their website. Because OpenStreetMap itself is mostly a data set, it does not really go all the way to having an end-to-end solution. Financials and the idea of a project being complete [00:41:59] Jeremy: So I think it's great that somebody who wants to make a map, they have these tools available, whether it's from what was originally built by Mapbox, what's built by OpenStreetMap now, the work you're doing with Protomaps. But I wonder one of the things that I talked about with Tom was he was saying he was trying to build this mapping business and based on the financials of what was coming in he was stressed, right? He was struggling a bit. And I wonder for you, you've been working on this open source project for five years.
Do you have similar stressors or do you feel like I could keep going how things are now and I feel comfortable? [00:42:46] Brandon: So I wouldn't say I'm a hundred percent in one bucket or the other. I'm still seeing it play out. One thing, that I really respect in a lot of open source projects, which I'm not saying I'm gonna do for Protomaps is the idea that a project is like finished. I think that is amazing. If a software project can just be done it's sort of like a painting or a novel once you write, finish the last page, have it seen by the editor. I send it off to the press is you're done with a book. And I think one of the pains of software is so few of us can actually do that. And I don't know obviously people will say oh the map is never finished. That's more true of OSM, but I think like for Protomaps. One thing I'm thinking about is how to limit the scope to something that's quite narrow to where we could be feature complete on the core things in the near term timeframe. That means that it does not address a lot of things that people want. Like search, like if you go to Google Maps and you search for a restaurant, you will get some hits. that's like a geocoding issue. And I've already decided that's totally outta scope for Protomaps. So, in terms of trying to think about the future of this, I'm mostly looking for ways to cut scope if possible. There are some things like better tooling around being able to work with PMTiles that are on the roadmap. but for me, I am still enjoying working on the project. It's definitely growing. So I can see on NPM downloads I can see the growth curve of people using it and that's really cool. So I like hearing about when people are using it for cool projects. So it seems to still be going okay for now. [00:44:44] Jeremy: Yeah, that's an interesting perspective about how you were talking about projects being done. Because I think when people look at GitHub projects and they go like, oh, the last commit was X months ago. 
They go oh well this is dead right? But maybe that's the wrong framing. Maybe you can get a project to a point where it's like, oh, it's because it doesn't need to be updated. [00:45:07] Brandon: Exactly, yeah. Like I used to do a lot of c++ programming and the best part is when you see some LAPACK matrix math library from like 1995 that still works perfectly in c++ and you're like, this is awesome. This is the one I have to use. But if you're like trying to use some like React component library and it hasn't been updated in like a year, you're like, oh, that's a problem. So again, I think there's some middle ground between those that I'm trying to find. I do like for Protomaps, it's quite dependency light in terms of the number of hard dependencies I have in software. but I do still feel like there is a lot of work to be done in terms of project scope that needs to have stuff added. You mostly only hear about problems instead of people's wins [00:45:54] Jeremy: Having run it for this long. Do you have any thoughts on running an open source project in general? On dealing with issues or managing what to work on things like that? [00:46:07] Brandon: Yeah. So I have a lot. I think one thing people point out a lot is that especially because I don't have a direct relationship with a lot of the people using it a lot of times I don't even know that they're using it. Someone sent me a message saying hey, have you seen flickr.com, like the photo site? And I'm like, no. And I went to flickr.com/map and it has Protomaps for it. And I'm like, I had no idea. But that's cool, if they're able to use Protomaps for this giant photo sharing site that's awesome. But that also means I don't really hear about when people use it successfully because you just don't know, I guess they, NPM installed it and it works perfectly and you never hear about it. You only hear about people's negative experiences. 
You only hear about people that come and open GitHub issues saying this is totally broken, and why doesn't this thing exist? And I'm like, well, it's because there's an infinite amount of things that I want to do, but I have a finite amount of time and I just haven't gone into that yet. And that's honestly a lot of the things and people are like when is this thing gonna be done? So that's, that's honestly part of why I don't have a public roadmap because I want to avoid that sort of bickering about it. I would say that's one of my biggest frustrations with running an open source project is how it's self-selected to only hear the negative experiences with it. Be careful what PRs you accept [00:47:32] Brandon: 'cause you don't hear about those times where it works. I'd say another thing is it's changed my perspective on contributing to open source because I think when I was younger or before I had become a maintainer I would open a pull request on a project unprompted that has a hundred lines and I'd be like, Hey, just merge this thing. But I didn't realize when I was younger well if I just merge it and I disappear, then the maintainer is stuck with what I did forever. You know if I add some feature then that person that maintains the project has to do that indefinitely. And I think that's very asymmetrical and it's changed my perspective a lot on accepting open source contributions. I wanna have it be open to anyone to contribute. But there is some amount of back and forth where it's almost like the default answer for should I accept a PR is no by default because you're the one maintaining it. And do you understand the shape of that solution completely to where you're going to support it for years because the person that's contributing it is not bound to those same obligations that you are. 
And I think that's also one of the things where I have a lot of trepidation around open source is I used to think of it as a lot more bazaar-like in terms of anyone can just throw their thing in. But then that creates a lot of problems for the people who are expected out of social obligation to continue this thing indefinitely. [00:49:23] Jeremy: Yeah, I can totally see why that causes burnout with a lot of open source maintainers, because you probably to some extent maybe even feel some guilt right? You're like, well, somebody took the time to make this. But then like you said you have to spend a lot of time trying to figure out is this something I wanna maintain long term? And one wrong move and it's like, well, it's in here now. [00:49:53] Brandon: Exactly. To me, I think that is a very common failure mode for open source projects is they're too liberal in the things they accept. And that's a lot of why I was talking about how that choice of what features show up on the map was inherited from the MapZen projects. If I didn't have that then somebody could come in and say hey, you know, I want to show power lines on the map. And they open a PR for power lines and now everybody who's using Protomaps when they're like zoomed out they see power lines are like I didn't want that. So I think that's part of why a lot of open source projects eventually evolve into a plugin system is because there is this demand as the project grows for more and more features. But there is a limit in the maintainers. It's like the demand for features is exponential while the maintainer amount of time and effort is linear. Plugin systems might reduce need for PRs [00:50:56] Brandon: So maybe the solution to smash that exponential down to quadratic maybe is to add a plugin system. But I think that is one of the biggest tensions that only became obvious to me after working on this for a couple of years. [00:51:14] Jeremy: Is that something you're considering doing now? 
[00:51:18] Brandon: Is the plugin system? Yeah. I think for the data customization, I eventually wanted to have some sort of programmatic API to where you could declare a config file that says I want ski routes. It totally makes sense. The power lines example is maybe a little bit obscure but for example like a skiing app and you want to be able to show ski slopes when you're zoomed out well you're not gonna be able to get that from Mapbox or from Google because they have a one size fits all map that's not specialized to skiing or to golfing or to outdoors. But if you like, in theory, you could do this with Protomaps if you changed the Java code to show data at different zoom levels. And that is to me what makes the most sense for a plugin system and also makes the most product sense because it enables a lot of things you cannot do with the one size fits all map. [00:52:20] Jeremy: It might also increase the complexity of the implementation though, right? [00:52:25] Brandon: Yeah, exactly. So that's like. That's really where a lot of the terrifying thoughts come in, which is like once you create this like config file surface area, well what does that look like? Is that JSON? Is that TOML, is that some weird like everything eventually evolves into some scripting language right? Where you have logic inside of your templates and I honestly do not really know what that looks like right now. That feels like something in the medium term roadmap. [00:52:58] Jeremy: Yeah and then in terms of bug reports or issues, now it's not just your code it's this exponential combination of whatever people put into these config files. [00:53:09] Brandon: Exactly. Yeah. so again, like I really respect the projects that have done this well or that have done plugins well. I'm trying to think of some, I think obsidian has plugins, for example. And that seems to be one of the few solutions to try and satisfy the infinite desire for features with the limited amount of maintainer time. 
Time split between code vs triage vs talking to users [00:53:36] Jeremy: How would you say your time is split between working on the code versus issue and PR triage? [00:53:43] Brandon: Oh, it varies really. I think working on the code is like a minority of it. I think something that I actually enjoy is talking to people, talking to users, getting feedback on it. I go to quite a few conferences to talk to developers or people that are interested and figure out how to refine the message, how to make it clearer to people, like what this is for. And I would say maybe a plurality of my time is spent dealing with non-technical things that are neither code or GitHub issues. One thing I've been trying to do recently is talk to people that are not really in the mapping space. For example, people that work for newspapers like a lot of them are front end developers and if you ask them to run a Linux server they're like I have no idea. But that really is like one of the best target audiences for Protomaps. So I'd say a lot of the reality of running an open source project is a lot like a business is it has all the same challenges as a business in terms of you have to figure out what is the thing you're offering. You have to deal with people using it. You have to deal with feedback, you have to deal with managing emails and stuff. I don't think the payoff is anywhere near running a business or a startup that's backed by VC money is but it's definitely not the case that if you just want to code, you should start an open source project because I think a lot of the work for an opensource project has nothing to do with just writing the code. It is in my opinion as someone having done a VC backed business before, it is a lot more similar to running, a tech company than just putting some code on GitHub. 
Running a startup vs open source project [00:55:43] Jeremy: Well, since you've done both at a high level what did you like about running the company versus maintaining the open source project? [00:55:52] Brandon: So I have done some venture capital accelerator programs before and I think there is an element of hype and energy that you get from that that is self perpetuating. Your co-founder is gungho on like, yeah, we're gonna do this thing. And your investors are like, you guys are geniuses. You guys are gonna make a killing doing this thing. And the way it's framed is sort of obvious to everyone that it's like there's a much more traditional set of motivations behind that, that people understand while it's definitely not the case for running an open source project. Sometimes you just wake up and you're like what the hell is this thing for, it is this thing you spend a lot of time on. You don't even know who's using it. The people that use it and make a bunch of money off of it they know nothing about it. And you know, it's just like cool. And then you only hear from people that are complaining about it. And I think like that's honestly discouraging compared to the more clear energy and clearer motivation and vision behind how most people think about a company. But what I like about the open source project is just the lack of those constraints you know? Where you have a mandate that you need to have this many customers that are paying by this amount of time. There's that sort of pressure on delivering a business result instead of just making something that you're proud of that's simple to use and has like an elegant design. I think that's really a difference in motivation as well. Having control [00:57:50] Jeremy: Do you feel like you have more control? Like you mentioned how you've decided I'm not gonna make a public roadmap. I'm the sole developer. I get to decide what goes in. What doesn't. 
Do you feel like you have more control in your current position than you did running the startup? [00:58:10] Brandon: Definitely for sure. Like that agency is what I value the most. It is possible to go too far. Like, so I'm very wary of the BDFL title, which I think is how a lot of open source projects succeed. But I think there is some element of for a project to succeed there has to be somebody that makes those decisions. Sometimes those decisions will be wrong and then hopefully they can be rectified. But I think going back to what I was talking about with scope, I think the overall vision and the scope of the project is something that I am very opinionated about in that it should do these things. It shouldn't do these things. It should be easy to use for this audience. Is it gonna be appealing to this other audience? I don't know. And I think that is really one of the most important parts of that leadership role, is having the power to decide we're doing this, we're not doing this. I would hope other developers would be able to get on board if they're able to make good use of the project, if they use it for their company, if they use it for their business, if they just think the project is cool. So there are other contributors at this point and I want to get more involved. But I think being able to make those decisions to what I believe is going to be the best project is something that is very special about open source, that isn't necessarily true about running like a SaaS business. [00:59:50] Jeremy: I think that's a good spot to end it on, so if people want to learn more about Protomaps or they wanna see what you're up to, where should they head? [01:00:00] Brandon: So you can go to Protomaps.com, GitHub, or you can find me or Protomaps on bluesky or Mastodon. [01:00:09] Jeremy: All right, Brandon, thank you so much for chatting today. [01:00:12] Brandon: Great. Thank you very much.
In this episode of Code with Jason, host Jason Swett interviews Prarthana Shiva, a senior software engineer at NexHealth, who shares how her team is handling massive database scaling challenges. Prarthana explains their PostgreSQL database's growth to 24 terabytes (with projections to triple within a year) and details their innovative solutions including read replicas, Elasticsearch implementation, Redis caching, external write-ahead logs, and optimized vacuuming processes. The conversation also touches on Jason's own database challenges with his CI platform and concludes with Prarthana's upcoming presentation at Sin City Ruby 2025, where she'll discuss their transition from schema-based to row-based multi-tenancy for better scalability. Links: Prarthana Shiva on LinkedIn; Sin City Ruby.
How to speed up GenAI? Find out how on this episode of Six Five On the Road at AWS re:Invent with host Keith Townsend and Elastic's Ken Exner, CPO, for a conversation on how Elastic is at the forefront of accelerating generative AI (GenAI) innovation. Fast track this ⤵️ Insights into the adoption of generative AI applications among Elastic's customer base and how Elastic facilitates the acceleration of Gen AI initiatives. Future directions for Elastic's product portfolio with the integration of AI and machine learning. Developer feedback on Elasticsearch's usage in GenAI projects and its prominence as the top vector database. The launch of Elastic Cloud Serverless and Elastic's commitment to balancing usability with flexibility for both developers and end-users. A reflection on Elastic's product developments in the past year and anticipations for innovations in 2025.
At All Things Open in October, Anandhi Bumstead, AWS's director of software engineering, highlighted OpenSearch's journey and the advantages of the Linux Foundation's stewardship. OpenSearch, an open source data ingestion and analytics engine, was transferred by Amazon Web Services (AWS) to the Linux Foundation in September 2024, seeking neutral governance and broader community collaboration. Originally forked from Elasticsearch after a licensing change in 2021, OpenSearch has evolved into a versatile platform likened to a “Swiss Army knife” for its broad use cases, including observability, log and security analytics, alert detection, and semantic and hybrid search, particularly in generative AI applications. Despite criticism over slower indexing speeds compared to Elasticsearch, significant performance improvements have been made. The latest release, OpenSearch 2.17, delivers 6.5x faster query performance and a 25% indexing improvement due to segment replication. Future efforts aim to enhance indexing, search, storage, and vector capabilities while optimizing costs and efficiency. Contributions are welcomed via opensearch.org. Learn more from The New Stack about deploying applications on OpenSearch: AWS Transfers OpenSearch to the Linux Foundation; From Flashpoint to Foundation: OpenSearch's Path Clears; Semantic Search with Amazon OpenSearch Serverless and Titan.
Philippe Noël is Co-Founder & CEO of ParadeDB, the modern Elasticsearch alternative built on Postgres. They're purpose-built for heavy, real-time workloads and their open source project, also called paradedb, has over 6K stars on GitHub. ParadeDB has raised $2M from investors including General Catalyst & YC. In this episode, we dig into the benefits of connecting search directly to the database (i.e., no ETL), the types of users and use cases that really benefit from ParadeDB (e-commerce, FinTech, etc.), the decision to focus on Postgres, making adoption super easy, Philippe's learnings as a second-time founder & more!
Nicholas Knize discusses optimizing geospatial indexing and hybrid search using advanced data structures within the Lucene framework at FOSS4G NA 2024. He emphasizes reducing cloud infrastructure waste and improving geospatial data processing efficiency.
Shay Banon, the creator of Elasticsearch, joins us to discuss pulling off a reverse rug pull. Yes, Elasticsearch is open source, again! We discuss the complexities surrounding open source licensing and what made Elastic change their license, the implications of trademark law, the personal and business impact of moving away from open source, and ultimately what made them hit rewind and return to open source.
JVM summit, virtual threads, application stacks, licenses, determinism and LLMs, quantization, two tools of the episode, and much more. Recorded on September 13, 2024. Episode download: LesCastCodeurs-Episode-315.mp3 News Languages Netflix uses Java heavily and ran into a problem with virtual threads in Java 21. Netflix's engineers analyze the problem in this article: https://netflixtechblog.com/java-21-virtual-threads-dude-wheres-my-lock-3052540e231d Virtual threads can improve performance but come with challenges. A locking problem was identified: virtual threads block one another. This leads to degraded performance and instability. Netflix is working to resolve these issues and take full advantage of virtual threads. A syntax to indicate that a type is nullable or null-restricted may be coming to Java https://bugs.openjdk.org/browse/JDK-8303099 Foo! would forbid null Foo? would indicate that null is accepted Foo?[]! would be a non-null array of nullable values There are also syntax ideas for initializing null-restricted arrays JEP: https://openjdk.org/jeps/8303099 The videos from the JVM Language Summit 2024 are online https://www.youtube.com/watch?v=OOPSU4LnKg0&list=PLX8CzqL3ArzUEYnTa6KYORRbP3nhsK0L1 Project Leyden Update Project Babylon - Code Reflection Valhalla - Where Are We? An Opinionated Overview on Static Analysis for Java Rethinking Java String Concatenation Code Reflection in Action - Translating Java to SPIR-V Java in 2024 Type Specialization of Java Generics - What If Casts Have Teeth? (with our very own Rémi Forax!) 
also Tip or Tail for the whole ecosystem a few links on Babylon: code reflection for expressing foreign languages (SQL) in Java: https://openjdk.org/projects/babylon/ and its example emulating LINQ https://openjdk.org/projects/babylon/articles/linq Libraries Micronaut releases version 4.6 https://micronaut.io/2024/08/26/micronaut-framework-4-6-0-released/ essentially a big update of tons of modules with the latest dependency versions MicroProfile 7 makes some incompatible changes and evolutions https://microprofile.io/2024/08/22/microprofile-7-0-release/#general it removes Metrics and replaces it with Telemetry (metrics, logs and tracing) Metrics remains a spec, but standalone MicroProfile 7 depends on the Jakarta Core Profile and no longer packages it MicroProfile OpenAPI 4 and Telemetry 2 bring incompatible changes Quarkus 3.14 with Let's Encrypt and reflection-free Jackson serializers https://quarkus.io/blog/quarkus-3-14-1-released/ Hibernate ORM 6.6 reflection-free Jackson serializers simple installation of Let's Encrypt certificates (notably with the command line, which helps, and is especially handy with ngrok to tunnel to your localhost) backpedaling on @QuarkusTestResource vs @WithTestResource following reports of OOMEs and slowness with the better-isolated tests Structured logging in Spring Boot 3.4 https://spring.io/blog/2024/08/23/structured-logging-in-spring-boot-3-4 Structured logs (often in JSON) let you send them easily to backends such as Elastic, AWS CloudWatch… You can tie them to reporting and alerting. Spring Boot 3.4 supports structured logging out of the box. It supports the Elastic Common Schema (ECS) and Logstash formats, but it can also be extended with your own formats. You can also enable structured logging to a file. 
This can be used, for example, to print human-readable logs to the console and write structured logs to a file for machine ingestion. Infrastructure CockroachDB, which used a Business Source License approach (source available, then Apache-licensed 3 years later), is now moving to a proprietary license with source available https://www.cockroachlabs.com/blog/enterprise-license-announcement/ The PolyForm Project offers standardized licenses for different free vs. paid needs https://polyformproject.org/ Cloud Azure Functions: how cold starts are optimized https://www.infoq.com/articles/azure-functions-cold-starts/?utm_campaign=infoq_content&utm_source=twitter&utm_medium=feed&utm_term=Cloud functions naturally have high latency not all long latencies matter to the business cold starts can be measured with the cloud provider's tools, so use them compute latency percentiles in the experiment: 381 ms cold and 10 ms afterwards tracing for end-to-end latency the strategies keep-alive pings: wake the function at regular intervals so it stays “warm” in the function code: initialize connections and assembly loading during initialization configure batching in host.json, disable file system logging, etc. deploy functions as zips reduce the size of the code and files (which are copied to the cold server) on .NET, enable ReadyToRun, which helps the JIT compiler Azure instances with more CPU and memory are more expensive but lower the cold start dedicated Azure instances for your functions (not shared with other tenants) the article then shows concrete examples Web Vue.js 3.5 released https://blog.vuejs.org/posts/vue-3-5 Vue.js 3.5: key new features Performance and memory optimizations: Significant reduction in memory consumption (–56%). Improved performance for large reactive arrays. 
Fixes for stale computed values and memory leaks. New features: Reactive Props Destructure: simpler declaration of props with default values. Lazy Hydration: control over the hydration of async components. useId(): generation of stable unique IDs for SSR applications. data-allow-mismatch: suppression of hydration mismatch warnings. Custom element improvements: support for app configurations, APIs to access the host and shadow root, mounting without Shadow DOM, and a nonce for tags. useTemplateRef(): obtaining template refs through the useTemplateRef() API. Deferred Teleport: teleporting content to elements rendered after the component is mounted. onWatcherCleanup(): registering cleanup callbacks in watchers. Data and Artificial Intelligence We often hear about quantized Large Language Models, meaning that, for example, 8-bit integers are used instead of 32-bit floats to reduce GPU memory requirements while keeping precision close to the original. This article explains the quantization process very visually and intuitively: https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization Guillaume continues to share his adventures with the LangChain4j framework. 
How to do text classification: https://glaforge.dev/posts/2024/07/11/text-classification-with-gemini-and-langchain4j/ using LangChain4j's TextClassification class, which uses an approach based on vector embeddings to compare similar texts; using few-shot prompting, in several variants, in this other article: https://glaforge.dev/posts/2024/07/30/sentiment-analysis-with-few-shots-prompting/ and also how to do multimodal work with LangChain4j (with the Gemini model) to analyze text, images, but also video, audio content, or PDF files: https://glaforge.dev/posts/2024/07/25/analyzing-videos-audios-and-pdfs-with-gemini-in-langchain4j/ To vary the predictability or creativity of LLMs, certain hyperparameters can be adjusted, such as temperature, top-k and top-p. But do you really know how these parameters work? Two very clear and intuitive articles explain them: https://medium.com/google-cloud/is-a-zero-temperature-deterministic-c4a7faef4d20 https://medium.com/google-cloud/beyond-temperature-tuning-llm-output-with-top-k-and-top-p-24c2de5c3b16 temperature squashes the probability distribution of the next token, but some variables remain: floating-point approximation, different stacks making these choices differently, what to do when two tokens have equal probability but there are other ways to configure the LLM's behavior: top-k (which avoids infrequent tokens), top-p to keep the tokens whose probabilities add up to p% temperature first, then top-k, then top-p the articles explain what to use when OSI proposes a definition of open source AI https://www.technologyreview.com/2024/08/22/1097224/we-finally-have-a-definition-for-open-source-ai/ big debates in recent months usable for any purpose without needing permission researchers can inspect the 
components and study how the system works the system can be modified for any purpose, including changing its behavior, and shared with others with or without modification, whatever the use It defines levels of transparency (training data, source code, weights) A long retrospective on PostgreSQL at crazy volumes and its locking problems https://ardentperf.com/2024/03/03/postgres-indexes-partitioning-and-lwlocklockmanager-scalability/ an article to reassure you that you will probably never hit the problem a story told as a post-mortem advice for avoiding these cliffs Tooling A first look at Gradle's upcoming declarative notation https://blog.gradle.org/declarative-gradle-first-eap an article explaining what Gradle's new declarative syntax looks like (in addition to Groovy and Kotlin) A few videos show the support in Android Studio, for now, as well as in an experimental tool, pending support in all IDEs The idea is to avoid scripting and have nothing but a description of your build This should improve Gradle support in IDEs and enable fast completion, etc. is it just me, or is this Maven? Firefox support in Puppeteer https://hacks.mozilla.org/2024/08/puppeteer-support-for-firefox/ Puppeteer, the browser automation library, now officially supports Firefox as of version 23. This lets developers write automation scripts and run end-to-end tests on Chrome and Firefox interchangeably. Firefox's integration in Puppeteer relies on WebDriver BiDi, a cross-browser protocol being standardized at the W3C. WebDriver BiDi makes it easier to support multiple browsers and paves the way for simpler, more efficient automation. 
Puppeteer's main features, such as log capture, device emulation, network interception and script preloading, are now available for Firefox. Mozilla sees WebDriver BiDi as an important step toward a better cross-browser testing experience. Experimental support for CDP (Chrome DevTools Protocol) in Firefox will be removed at the end of 2024 in favor of WebDriver BiDi. Although Firefox is officially supported, some APIs remain unsupported and will be addressed in future work. Guillaume created a @Retry annotation for JUnit 5, to retry a test that is “flaky” https://glaforge.dev/posts/2024/09/01/a-retryable-junit-5-extension/ Guillaume had not found a built-in extension in JUnit 5 to replace JUnit 4's Retry rules But on social media, an interesting discussion followed, with links to extensions that implement this approach Such as JUnit Pioneer, which offers plenty of useful extensions https://junit-pioneer.org/docs/retrying-test/ Or the rerunner extension https://github.com/artsok/rerunner-jupiter Arnaud also suggested configuring Maven Surefire to automatically rerun failed tests https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html the philosophical question is: are intermittently failing tests tolerable? Architecture A former GraphQL fan is done with the technology and is thinking about alternatives https://bessey.dev/blog/2024/05/24/why-im-over-graphql/ Problems with GraphQL: Security: Authorization attacks Difficulty of rate limiting Parsing of malicious queries Performance: N+1 problem (data fetching and authorization) Memory impact when parsing invalid queries Increased complexity: Coupling between business logic and the transport layer Difficulty of maintenance and 
testing Solutions considered: Adopting REST APIs that conform to OpenAPI 3.0+ Better documentation and type safety Tools to generate typed client/server code Two approaches to implementing OpenAPI: “Implementation first” (generating the specification from the code) “Specification first” (generating the code from the specification) an interesting account, from someone who does not use GraphQL day to day. These were problems that were supposed to be fixed as the ecosystem and tooling matured, but for this person it showed its limits. Grace Hopper's 1980 presentation on the future of computers. https://youtu.be/AW7ZHpKuqZg?si=w_o5_DtqllVTYZwt it is striking how modern what she describes is Problems we still have today positive leadership She describes the advantage of systems made up of several computers recently declassified Leader election with conditional writes on S3/GCS/Azure buckets https://www.morling.dev/blog/leader-election-with-s3-conditional-writes/ Leader election is the process of choosing one node among several to perform a task. Traditionally, leader election is done with a distributed locking service such as ZooKeeper. Amazon S3 recently added support for conditional writes, which enables leader election without a separate service. The leader election algorithm works by having nodes race to create a lock file in S3. The lock file includes an epoch number, which is incremented each time a new leader is elected. Nodes can determine whether they are the leader by listing the lock files and checking the epoch number. 
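The lock-file race described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the linked article: `InMemoryBucket`, `put_if_absent`, `lock_key`, `claim_epoch` and `is_leader` are hypothetical names, the in-memory class merely stands in for a bucket that supports conditional writes, and lock expiry is left out entirely.

```python
from dataclasses import dataclass, field

LOCK_PREFIX = "lock-"


class PreconditionFailed(Exception):
    """Raised when a conditional write finds that the key already exists."""


@dataclass
class InMemoryBucket:
    """Hypothetical stand-in for a bucket offering put-if-absent writes."""
    objects: dict = field(default_factory=dict)

    def put_if_absent(self, key: str, body: str) -> None:
        # Conditional write: succeeds only if nobody created the key first.
        if key in self.objects:
            raise PreconditionFailed(key)
        self.objects[key] = body

    def list_keys(self, prefix: str) -> list:
        return sorted(k for k in self.objects if k.startswith(prefix))


def lock_key(epoch: int) -> str:
    # Zero-padded so that lexical order matches numeric order.
    return f"{LOCK_PREFIX}{epoch:010d}"


def current_epoch(bucket: InMemoryBucket) -> int:
    """Highest epoch that has a lock file, or 0 if there is none yet."""
    keys = bucket.list_keys(LOCK_PREFIX)
    return int(keys[-1][len(LOCK_PREFIX):]) if keys else 0


def claim_epoch(bucket: InMemoryBucket, node_id: str, epoch: int) -> bool:
    """Race to create the lock file for `epoch`; only one writer can win."""
    try:
        bucket.put_if_absent(lock_key(epoch), node_id)
        return True
    except PreconditionFailed:
        return False


def is_leader(bucket: InMemoryBucket, node_id: str) -> bool:
    """A node leads if the lock file with the highest epoch names it."""
    epoch = current_epoch(bucket)
    return epoch > 0 and bucket.objects[lock_key(epoch)] == node_id


# Two nodes observe the same state (no lock yet) and race for epoch 1.
bucket = InMemoryBucket()
next_epoch = current_epoch(bucket) + 1
a_won = claim_epoch(bucket, "node-a", next_epoch)
b_won = claim_epoch(bucket, "node-b", next_epoch)
```

Because both nodes target the same key, the conditional write is what serializes the race: the second writer fails instead of overwriting the first. A later election simply claims the next epoch number.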
note that several leaders can end up elected (clocks that have drifted), so that has to be handled too Methodologies Guillaume Laforge interviewed by Sfeir, where he talks about the importance of curiosity, sharing, and code quality, sprinkled with a few photos of the Cast Codeurs! https://www.sfeir.dev/success-story/guillaume-laforge-maestro-de-java-et-esthete-du-code-propre/ Security How CrowdStrike brought Windows and many companies to their knees https://next.ink/144464/crowdstrike-donne-des-details-techniques-sur-son-fiasco/ the incident came from a configuration update to Falcon, CrowdStrike's EDR https://www.crowdstrike.com/blog/falcon-update-for-windows-hosts-technical-details/ what is an EDR? An Endpoint Detection and Response system aims to monitor your machine (network access, logs, …) to detect unusual usage. This spy has to interact with the low layers of the system (network, sockets, system logs) and therefore hooks into the operating system kernel. It reports information live to a platform, which can then adapt its responses live although the incident lasted less than 1h30 on CrowdStrike's side, more than 8 million machines ended up out of service, stuck on the Blue Screen of Death, according to Microsoft https://blogs.microsoft.com/blog/2024/07/20/helping-our-customers-through-the-crowdstrike-outage/ this is not the first time; it had already happened a few months earlier on Linux. Since it was a kernel incompatibility, it had less impact, because IT departments handle these problems better on Linux https://stackdiary.com/crowdstrike-took-down-debian-and-rocky-linux-a-few-months-ago-and-no-one-noticed/ CIS benchmarks, a pillar for the security of our cloud environments, and beyond! 
(Katia HIMEUR TALHI) https://blog.cockpitio.com/security/cis-benchmarks/ The CIS is a non-profit organization that develops standards to improve cybersecurity. The CIS benchmarks are a set of recommendations and best practices for securing IT systems. They can be used to strengthen security, comply with regulations and standardize practices. Law, society and organization Microsoft signs an agreement with OVHcloud so that it drops its antitrust complaint https://www.politico.eu/article/microsoft-signs-antitrust-truce-with-ovhcloud/ the complaint was in Europe the agreement lets customers more easily deploy Microsoft solutions with the cloud provider of their choice the complaint had been filed in the summer of 2021 the situation made running MS solutions more expensive and uncompetitive vs MS Elasticsearch and Kibana are Open Source again, adding the AGPL license to their other existing licenses https://www.elastic.co/fr/blog/elasticsearch-is-open-source-again the market of three years ago and today's have changed AWS is a good partner the confusion between Elasticsearch and AWS's product has cleared up hence the return to open source via the AGPL (Affero GPL) Elastic never stopped believing in open source, according to Shay Banon, its founder The move to the AGPL is an additional option, not a replacement for one of the other existing licenses and right afterwards, Elastic announced disappointing results that sent the stock down 25% https://siliconangle.com/2024/08/29/elastic-shares-plunge-25-lower-revenue-projections-amid-slower-customer-commitments/ https://unrollnow.com/status/1832187019235397785 and https://www.elastic.co/pricing/faq/licensing for a summary of licenses at Elastic Tools of the episode MailMate, a Markdown email client that handles large volumes of email https://medium.com/@nicfab/mailmate-a-powerful-client-email-for-macos-markdown-integrated-email-composition-e218fe2accf3 Emmanuel uses it 
for his secondary mailboxes a bit slow to start (sync) and the rest is fast virtual mailboxes (query-based) SpamSieve macOS only, I believe Trippy, a network analyzer https://github.com/fujiapple852/trippy It combines traceroute and ping in one CLI Conferences The list of conferences from the Developers Conferences Agenda/List by Aurélie Vache and contributors: 17 September 2024: We Love Speed - Nantes (France) 17–18 September 2024: Agile en Seine 2024 - Issy-les-Moulineaux (France) 19–20 September 2024: API Platform Conference - Lille (France) & Online 20–21 September 2024: Toulouse Game Dev - Toulouse (France) 25–26 September 2024: PyData Paris - Paris (France) 26 September 2024: Agile Tour Sophia-Antipolis 2024 - Biot (France) 2–4 October 2024: Devoxx Morocco - Marrakech (Morocco) 3 October 2024: VMUG Montpellier - Montpellier (France) 7–11 October 2024: Devoxx Belgium - Antwerp (Belgium) 8 October 2024: Red Hat Summit: Connect 2024 - Paris (France) 10 October 2024: Cloud Nord - Lille (France) 10–11 October 2024: Volcamp - Clermont-Ferrand (France) 10–11 October 2024: Forum PHP - Marne-la-Vallée (France) 11–12 October 2024: SecSea2k24 - La Ciotat (France) 15–16 October 2024: Malt Tech Days 2024 - Paris (France) 16 October 2024: DotPy - Paris (France) 16–17 October 2024: NoCode Summit 2024 - Paris (France) 17–18 October 2024: DevFest Nantes - Nantes (France) 17–18 October 2024: DotAI - Paris (France) 30–31 October 2024: Agile Tour Nantais 2024 - Nantes (France) 30–31 October 2024: Agile Tour Bordeaux 2024 - Bordeaux (France) 31 October 2024–3 November 2024: PyCon.FR - Strasbourg (France) 6 November 2024: Master Dev De France - Paris (France) 7 November 2024: DevFest Toulouse - Toulouse (France) 8 November 2024: BDX I/O - Bordeaux (France) 13–14 November 2024: Agile Tour Rennes 2024 - Rennes (France) 16–17 November 2024: Capitole Du Libre - Toulouse (France) 20–22 November 2024: Agile Grenoble 2024 - Grenoble 
(France) 21 November 2024: DevFest Strasbourg - Strasbourg (France) 21 November 2024: Codeurs en Seine - Rouen (France) 27–28 November 2024: Cloud Expo Europe - Paris (France) 28 November 2024: Who Run The Tech ? - Rennes (France) 2–3 December 2024: Tech Rocks Summit - Paris (France) 3 December 2024: Generation AI - Paris (France) 3–5 December 2024: APIdays Paris - Paris (France) 4–5 December 2024: DevOpsRex - Paris (France) 4–5 December 2024: Open Source Experience - Paris (France) 5 December 2024: GraphQL Day Europe - Paris (France) 6 December 2024: DevFest Dijon - Dijon (France) 22–25 January 2025: SnowCamp 2025 - Grenoble (France) 30 January 2025: DevOps D-Day #9 - Marseille (France) 6–7 February 2025: Touraine Tech - Tours (France) 3 April 2025: DotJS - Paris (France) 16–18 April 2025: Devoxx France - Paris (France) Contact us To react to this episode, come and discuss on the Google group https://groups.google.com/group/lescastcodeurs Contact us via Twitter https://twitter.com/lescastcodeurs Submit a crowdcast or a crowdquestion Support Les Cast Codeurs on Patreon https://www.patreon.com/LesCastCodeurs All episodes and all the info at https://lescastcodeurs.com/
Jerod & Adam share our Zulip first impressions, react to Elasticsearch going open source (again), discuss Christian Hollinger's blog post on why he still self-hosts & answer a listener question: how do we produce podcasts?
Welcome to episode 274 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan and Matthew are your hosts this week as we explore the world of SnapShots, Maia, Open Source, and VMware – just to name a few of the topics. And stay tuned for an installment of our continuing Cloud Journey Series to explore ways to decrease tech debt, all this week on The Cloud Pod. Titles we almost went with this week: The Cloud Pod in Parallel Cluster The Cloud Pod cringes at managing 1000 AWS accounts The Cloud Pod welcomes Imagen 3 with less Wokeness The Cloud Pod wants to be instantly snapshotted The Cloud Pod hates tech debt A big thanks to this week's sponsor: We're sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You've come to the right place! Send us an email or hit us up on our Slack channel for more info. General News 00:32 Elasticsearch is Open Source, Again Shay Banon is pleased to call Elasticsearch and Kibana “open source” again. He says everyone at Elastic is ecstatic to be open source again; it’s part of his and “Elastic’s DNA.” They’re doing this by adding AGPL as another license option next to ELv2 and SSPL in the coming weeks. They never stopped believing or behaving like an OSS company after they changed the license, but being able to use the term open source, and using AGPL – an OSI-approved license – removes any questions or FUD people might have. Shay says the change 3 years ago was because they had issues with AWS and the market confusion their offering was causing. So, after trying all the other options, changing the license – all while knowing it would result in a fork with a different name – was the path they took. While it was painful, they said it worked. 3 years later, Amazon is fully invested in their OpenSearch fork, the market confusion has mostly gone, and their partnership with AWS is stronger than ever. They are even being named partner of the year with AWS. 
They want to “make life of our users as simple as possible,” so if you’re ok with the ELv2 or the SSPL, then you can keep using that license. They aren't removing anything, just giving you another option with AGPL. He calls out trolls and people who will pick at this announcement, so they are attempting to address the trolls in advance. “Changing the license was a mistake, and Elastic now backtracks from it”. We removed a lot of market confusion when we changed our license 3 years ago. And because of our actions, a lot has changed. It's an entirely different landscape now. We aren't living in the past. We want to build a better future for our users. It's because we took action then, that we are in a position to take action now. “AGPL i
This week we dig back into home automation, we talk a bit about choosing cameras for a large camera system, and of course we answer your questions! -- During The Show -- 00:52 Intro Home automation Weekend of learning 03:48 Monitoring Remote Location (Cameras) - Rob Powerline adapters might work Ubiquiti Nano Beam Synology Surveillance Station (https://www.synology.com/en-global/surveillance) Frigate Do not put the NVR on the internet Privacy File server upload Home Assistant events 17:18 Camera Systems for Tribal Lands - William NDAA compliant cameras and NVRs ReoLink NVR banned ReoLink Cameras depends - bad idea NDAA compliant brands 360 Vision Technology (360 VTL) Avigilon Axis Communications BCD International Commend FLIR Geutebrück iryx JCI/Tyco Security Mobotix Pelco Rhombus Systems Seek Thermal Solink Vaion/Ava WatchGuard Main 3 NVRs in use exacqVision Avigilon Milestone NDAA conversation Noah's favorites Axis FLIR 25:09 Charlie Finds e-ink android - Charlie Boox Palma (https://shop.boox.com/products/palma) Why a camera? Nice for reading Lineage or Graphene will NOT work 27:57 ESP Devices for Light Switches - Avri Shellys are ESP32 devices Devices can talk to each other 30:00 Beaming podcasts to Volumio and Roku - Tiny Pulse Audio Write in! 
31:40 News Wire 4MLinux 46 - opensourcefeed.org (https://www.opensourcefeed.org/4mlinux-46-release/) Debian Bookworm 12.7 - debian.org (https://www.debian.org/News/2024/20240831) Porteus 1.6 - porteus.org (https://forum.porteus.org/viewtopic.php?t=11426) Rhino Linux 2nd Release - itsfoss.com (https://news.itsfoss.com/rhino-linux-2024-2-release/) GNU Screen 5 - theregister.com (https://www.theregister.com/2024/09/03/gnu_screen_5/) Wireshark 4.4 - wireshark.org (https://www.wireshark.org/docs/relnotes/wireshark-4.4.0) Bugzilla releases - bugzilla.org (https://www.bugzilla.org/blog/2024/09/03/release-of-bugzilla-5.2-5.0.4.1-and-4.4.14/) Armbian 24.8 - armbian.com (https://www.armbian.com/newsflash/armbian-24-8-yelt/) Elasticsearch and Kibana licensing - businesswire.com (https://www.businesswire.com/news/home/20240829537786/en/Elastic-Announces-Open-Source-License-for-Elasticsearch-and-Kibana-Source-Code) Xe2 Linux Support - wccftech.com (https://wccftech.com/intel-push-out-xe2-graphics-enablement-linux-6-12-kernel/) Cicada3301 - thehackernews.com (https://thehackernews.com/2024/09/new-rust-based-ransomware-cicada3301.html) New Phi-3.5 AI Models - infoq.com (https://www.infoq.com/news/2024/08/microsoft-phi-3-5/) Open-Source, EU AI Act Compliant LLMs - techzine.eu (https://www.techzine.eu/blogs/privacy-compliance/123863/aleph-alphas-open-source-llms-fully-comply-with-the-ai-act/) View on Why AI Models Should be Open and Free for All - businessinsider.com (https://www.businessinsider.com/anima-anandkumar-ai-climate-change-open-source-caltech-nvidia-2024-8) 33:53 Hoptodesk Comparison to TeamViewer Hoptodesk (https://www.hoptodesk.com/) Free & Open Source Cross platform E2E Encryption Can self host the server Wayland is not officially supported 38:05 EmuDeck ArsTechnica (https://arstechnica.com/gaming/2024/08/emudeck-machines-pack-popular-emulation-suite-in-linux-powered-plug-and-play-pc/) Seeking funding Already been doing this on the Steam Deck For retro games Drawing 
unwanted attention Powered by Bazzite 41:05 Home Automation Zwave Great for nerds/tinkering Not for professional installs RadioRA 2 Licensed dedicated frequency Central planning Never had a failure Designed to be integrated Orbit Panels and Shelly Pro Line Game changer 100% reliable People don't want a wall of dimmers Seeed Studio mmWave Sensor (https://wiki.seeedstudio.com/mmwave_human_detection_kit/) I don't like WiFi for automation Steve's experience -- The Extra Credit Section -- For links to the articles and material referenced in this week's episode check out this week's page from our podcast dashboard! This Episode's Podcast Dashboard (http://podcast.asknoahshow.com/406) Phone Systems for Ask Noah provided by Voxtelesys (http://www.voxtelesys.com/asknoah) Join us in our dedicated chatroom #GeekLab:linuxdelta.com on Matrix (https://element.linuxdelta.com/#/room/#geeklab:linuxdelta.com) -- Stay In Touch -- Find all the resources for this show on the Ask Noah Dashboard Ask Noah Dashboard (http://www.asknoahshow.com) Need more help than a radio show can offer? Altispeed provides commercial IT services and they're excited to offer you a great deal for listening to the Ask Noah Show. Call today and ask about the discount for listeners of the Ask Noah Show! Altispeed Technologies (http://www.altispeed.com/) Contact Noah live [at] asknoahshow.com -- Twitter -- Noah - Kernellinux (https://twitter.com/kernellinux) Ask Noah Show (https://twitter.com/asknoahshow) Altispeed Technologies (https://twitter.com/altispeed)
The Cursor AI code editor raises $60 million, RedMonk's Rachel Stephens tries to determine if rug pulls are worth it, Caleb Porzio details how he made $1 million on GitHub Sponsors, Elastic founder Shay Banon announces that Elasticsearch is open source (again) & Tomas Stropus writes about the art of finishing.
Welcome to another episode of Category Visionaries — the show that explores GTM stories from tech's most innovative B2B founders. In today's episode, we're speaking with Robert Cowart, CEO & Co-Founder of ElastiFlow, a network performance and security analytics platform that's raised $8 Million in funding. Here are the most interesting points from our conversation: Network Dependency: Robert emphasizes the critical role of network infrastructure in today's world, impacting commerce, healthcare, entertainment, and social interactions. Genesis of ElastiFlow: The company started as an experiment to see how new data platforms like Elasticsearch could improve network observability, leading to a successful GitHub project. Community's Role: The initial success and growth of ElastiFlow were significantly boosted by a loyal community built around the GitHub project, highlighting the importance of community-led growth. Market Entry and Growth: ElastiFlow quickly transitioned from community support to paying customers, even before launching their beta product, showcasing the power of having a dedicated user base. Building a Marketing Strategy: Initially relying on inbound marketing, ElastiFlow has now invested in outbound sales and marketing, including paid ads and content creation, to increase brand awareness and drive growth. Future Vision: The company aims to continue enhancing network observability, adding more context to network traffic records, and ensuring comprehensive support for hybrid IT environments. // Sponsors: Front Lines — We help B2B tech companies launch, manage, and grow podcasts that drive demand, awareness, and thought leadership. www.FrontLines.io The Global Talent Co. — We help tech startups find, vet, hire, pay, and retain amazing marketing talent that costs 50-70% less than the US & Europe. www.GlobalTalent.co
Welcome to episode 265 of the Cloud Pod Podcast – where the forecast is always cloudy! It's a full house this week – Matthew, Jonathan, Ryan and Justin are all here to bring you the latest in cloud news – including FOCUS features in AWS Billing, Magic Quadrants, and AWS Metis. Plus, we have an Android vs. Apple showdown in the Aftershow, so be sure to stay tuned for that! Titles we almost went with this week: Tech reports show Gartner leads in the BS quadrant; Oracle adds cloud and legal expenses to their FinOps hub; AWS Metis: Great chatbot, or Greek tragedy waiting to happen?; The cloud pod rocks Cargo Pants; A sonnet is written for FOCUSing on spend. A big thanks to this week's sponsor: We're sponsorless! Want to reach a dedicated audience of cloud engineers? Send us an email, or hit us up on our Slack Channel and let's chat! General News 01:40 FinOps X Recently Justin attended FinOps X in beautiful and sunny San Diego – and if you weren't there, you really should plan on attending next year. This year's topics included: FOCUS 1.0; State of Vendors; Conference size – they will most likely outgrow this particular conference center, seeing as how they're either selling out or pretty close to it. Coolest thing about the conference – on stage all the biggies – TOGETHER. It's great to see them all together talking about how they're making FinOps better, and introducing new things for FinOps and not just saving them for their own conferences. Next Year – Is Oracle going to be on stage next year? 08:22 Justin – “The shift left of FinOps was a big topic. You know, how do we get visibility? How do we show people what things are going to cost? How do we make sure that, you know, people are aware of what they’re doing? And so I think, you know, it’s just a recognition that is important and just as important as security is your cost. And in some ways security is part of your cost story.
Because if you bankrupt your company, that’s a pretty bad security situation.” 10:17 Introducing Managed OpenSearch: Gain Control of Your Cloud with Powerful Log Analysis Listen. We don't really *care* about OpenSearch – but the reality is it's taking over the world. Nobody is doing Elasticsearch anymore. DigitalOcean is launching a Managed OpenSearch offering, a comprehensive solution designed for in-depth log analysis, simplifying troubleshooting, and optimizing application performance. With DigitalOcean you can pinpoint and analyze log data with ease, customize log retention, enhance security, scale with your business, and receive forwarded logs from multiple sources including DigitalOcean Droplets, managed databases, etc. Interested in pricing? You can find that here. Or, if you'd like to take a product tour, you can do that
Redis is no longer open source. Just a few months ago, in March 2024, the project was relicensed, leaving its vast community confused. But the community did not give up, and started work to fork Redis to keep it open. In this episode, we delve into the Valkey project, a prominent fork of Redis, established under the Linux Foundation, which brought together important figures from the Redis community, as well as leading industry giants including AWS, Google Cloud, Oracle and others. Valkey has rapidly gained momentum and just reached General Availability (GA). Join us as we explore the motivations behind Valkey's creation, hear first-hand stories on its foundation and journey to GA, and learn of its Redis compatibility, roadmap and implications for the open-source community. Valkey's first Contributor Summit is taking place June 5-6 in Seattle and we will bring you announcements and updates hot off the summit. Our guest is Kyle Davis, the Senior Developer Advocate on the Valkey project, and a past contributor for Redis. Kyle currently works at AWS, a founding member of Valkey, and has a long history with open source and with forks. He was a founding contributor to the OpenSearch project, which started as a fork of Elasticsearch and Kibana after the latter's relicensing off OSS. Most recently Kyle worked to build a community around Bottlerocket OSS project. The episode was live-streamed on 10 June 2024 and the video is available at youtube.com/live/HQ7TAdQpxu4 OpenObservability Talks episodes are released monthly, on the last Thursday of each month and are available for listening on your favorite podcast app and on YouTube. We live-stream the episodes on Twitch and YouTube Live - tune in to see us live, and chime in with your comments and questions on the live chat. 
https://www.youtube.com/@openobservabilitytalks https://www.twitch.tv/openobservability Show Notes: 01:12 - Episode intro, Kyle Davis' Redis background 05:43 - Redis relicensing off open source 10:10 - Valkey vs. other Redis open source forks 16:50 - drop-in replacement of Redis 19:35 - Redis user experience during the relicensing 28:50 - From fork to GA in less than a month 34:00 - Valkey roadmap and Contributor Summit updates 40:00 - Valkey's Technical Steering Committee and leadership 44:14 - what Valkey latest GA is about Resources: Valkey announced: https://www.linkedin.com/posts/horovits_redis-opensource-activity-7179186700470861824-Gghq Valkey first GA and new member companies: https://www.linkedin.com/posts/horovits_redis-valkey-valkey-activity-7186263342041198593-fsY3 Announcements from Valkey's first Contributor Summit: https://www.linkedin.com/posts/horovits_valkey-welcomes-new-partners-amid-growing-activity-7209084153718362112-OfdI/ For Kubernetes 10th anniversary - special episode with Kelsey Hightower: https://logz.io/blog/kubernetes-and-beyond-2023-reflection/?utm_source=devrel&utm_medium=devrel Socials: Twitter: https://twitter.com/OpenObserv YouTube: https://www.youtube.com/@openobservabilitytalks Dotan Horovits ============ Twitter: @horovits LinkedIn: in/horovits Mastodon: @horovits@fosstodon Kyle Davis ======== LinkedIn: linkedin.com/in/kyle-davis-linux/ Mastodon: @linux_mclinuxface@fosstodon.org
In the twelfth episode of the 1-2-3 Techno podcast, we talked with Dmytro Chaplynskyi about the culture of knowledge transfer in startups and large companies, finding motivation, delegating properly, and going beyond standard solutions.
PostgreSQL is an incredible general-purpose database, but it can't do everything. Every design decision is a tradeoff, and inevitably some of those tradeoffs get fundamentally baked into the way it's built. Take storage for instance - Postgres tables are row-oriented; great for row-by-row access, but when it comes to analytics, it can't compete with a dedicated OLAP database that uses column-oriented storage. Or can it? Joining me this week is Philippe Noël of ParadeDB, who's going to take us on a tour of Postgres' extension mechanism, from creating custom functions and indexes to Rust code that changes the way Postgres stores data on disk. In his journey to bring Elasticsearch's strengths to Postgres, he's gone all the way down to raw datafiles and back through the optimiser to teach a venerable old dog some new data-access tricks. – ParadeDB: https://paradedb.com ParadeDB on Twitter: https://twitter.com/paradedb ParadeDB on GitHub: https://github.com/paradedb/paradedb pgrx (Postgres with Rust): https://github.com/pgcentralfoundation/pgrx Tantivy (Rust FTS library): https://github.com/quickwit-oss/tantivy PgMQ (Queues in Postgres): https://tembo.io/blog/introducing-pgmq Apache DataFusion: https://datafusion.apache.org/ Lucene: https://lucene.apache.org/ Kris on Mastodon: http://mastodon.social/@krisajenkins Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/ Kris on Twitter: https://twitter.com/krisajenkins
In the eighth episode of the 1-2-3 Techno podcast, we were joined by Vsevolod Solovyov, CTO and co-founder of Prophy Science. He talked about the "reliability" of Elasticsearch, his work on a project for the Armed Forces of Ukraine, and how a small company collaborates with the bureaucratic European Commission.
In Elixir Wizards Office Hours Episode 2, "Discovery Discoveries," SmartLogic's Project Manager Alicia Brindisi and VP of Delivery Bri LaVorgna join Elixir Wizards Sundi Myint and Owen Bickford on an exploratory journey through the discovery phase of the software development lifecycle. This episode highlights how collaboration and communication transform the client-project team dynamic into a customized expedition. The goal of discovery is to reveal clear business goals, understand the end user, pinpoint key project objectives, and meticulously document the path forward in a Product Requirements Document (PRD). The discussion emphasizes the importance of fostering transparency, trust, and open communication. Through a mutual exchange of ideas, we are able to create the most tailored, efficient solutions that meet the client's current goals and their vision for the future. Key topics discussed in this episode: Mastering the art of tailored, collaborative discovery Navigating business landscapes and user experiences with empathy Sculpting project objectives and architectural blueprints Continuously capturing discoveries and refining documentation Striking the perfect balance between flexibility and structured processes Steering clear of scope creep while managing expectations Tapping into collective wisdom for ongoing discovery Building and sustaining a foundation of trust and transparency Links mentioned in this episode: https://smartlogic.io/ Follow SmartLogic on social media: https://twitter.com/smartlogic Contact Bri: bri@smartlogic.io What is a PRD? https://en.wikipedia.org/wiki/Productrequirementsdocument Special Guests: Alicia Brindisi and Bri LaVorgna.
Episode #34 of "Can I get that software in blue?", a podcast by and for people engaged in technology sales. If you are in the technology presales, solution architecture, sales, support or professional services career paths then this show is for you! If you want to get into building AI products, first go to school and learn about antenna design! At least, that's how our guest for episode 34, Shane Connelly, did it. Shane is a deep expert in the search and indexing space, having started his career at Autonomy working on early search indexing algorithms and setting up solutions for customers before and after the HP acquisition, later leading Product for the Elasticsearch side of the Elastic product suite. Now he's Head of Product at Vectara, building out the next generation of semantic search and retrieval augmented generation platforms. In this episode we touch on benchmarks for gauging the relative performance of different search algorithms and how they apply to LLMs for doing things like preventing hallucinations in generative AI, what kinds of questions Elasticsearch customers were asking that led Shane to believe that vector-based algorithms were the future for doing next generation semantic search, and why he believes Vectara is building the top tier solution to solve these problems. Our website: https://softwareinblue.com Twitter: https://twitter.com/softwareinblue LinkedIn: https://www.linkedin.com/showcase/softwareinblue Make sure to subscribe or follow us to get notified about our upcoming episodes: Youtube: https://www.youtube.com/channel/UC8qfPUKO_rPmtvuB4nV87rg Apple Podcasts: https://podcasts.apple.com/us/podcast/can-i-get-that-software-in-blue/id1561899125 Spotify: https://open.spotify.com/show/25r9ckggqIv6rGU8ca0WP2 Links mentioned in the episode: History of Lucene: https://www.elastic.co/celebrating-lucene Attention is all you need: https://arxiv.org/abs/1706.03762
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode, we continue our conversation with Ujwala Tulshigiri, Engineering Manager at Uber, focusing on the technical intricacies of migrating workloads and technology consolidation. Ujwala provides an in-depth look into Uber's strategic approach to infrastructure decisions, the challenges of technology migration, and how they contribute to and leverage the open-source community. She discusses the complexities of replacing systems like Elasticsearch with alternatives like Pinot, addressing the nuances of data management, search capabilities, and the importance of maintaining low-latency operations.
In this episode of the Laravel Podcast, we talk about the recent announcement of hiring a new head of engineering at Laravel and the impact it will have on the future of Laravel. We also dive into the upcoming conferences and events, including Laracon EU, Laracon US, and Laracon India. Additionally, we talk about Typesense, a potential alternative to Meilisearch and Algolia for self-hosted search functionality. Taylor Otwell's Twitter - https://twitter.com/taylorotwell Matt Stauffer's Twitter - https://twitter.com/stauffermatt Laravel Twitter - https://twitter.com/laravelphp Laravel Website - https://laravel.com/ Tighten.co - https://tighten.com/ VP/Head of Engineering at Laravel - https://frequent-pick-a8d.notion.site/VP-Head-of-Engineering-at-Laravel-149b566a670841f7a74b3e904e261693 Laracon EU - https://laracon.eu/ Laracon US - https://laracon.us/ Laravel Herd - https://herd.laravel.com/ Laravel 11 - https://laravel.com/docs/master/releases Laravel Live Denmark - https://laravellive.dk/ Laravel Live UK - https://laravellive.uk/ Laracon India - https://laracon.in/ Caleb Porzio Twitter - https://twitter.com/calebporzio Livewire: https://laravel-livewire.com/ ThePrimeagen Twitter - https://twitter.com/ThePrimeagen The Factory - https://www.thefactoryindeepellum.com/ Eric Barnes Twitter - https://twitter.com/ericlbarnes Joe Dixon Twitter - https://twitter.com/_joedixon James Brooks - https://twitter.com/jbrooksuk Freek Van der Herten Twitter - https://twitter.com/freekmurze?lang=en Peter Suhm Twitter - https://twitter.com/petersuhm Michele Hansen Twitter - https://twitter.com/mjwhansen Laracon AU Twitter - https://twitter.com/LaraconAU Laravel Scout - https://laravel.com/docs/10.x/scout Typesense - https://typesense.org/ Algolia - https://algolia.com/ Meilisearch - https://www.meilisearch.com/ Elasticsearch - https://www.elastic.co/elasticsearch Laravel Sail - https://laravel.com/docs/10.x/sail Laravel Vapor - https://vapor.laravel.com/ Early Vapor Tweet - https://x.com/taylorotwell/status/1748782542663131442?s=20 Tailwind CSS - https://tailwindcss.com/ ----- Editing and transcription sponsored by Tighten.
SANS Internet Stormcenter Daily Network/Cyber Security and Information Security Stormcast
DShield Sensor Log Collection with Elasticsearch https://isc.sans.edu/forums/diary/DShield%20Sensor%20Log%20Collection%20with%20Elasticsearch/30616/ Anydesk Breach https://anydesk.com/en/public-statement Leaky Vessels https://snyk.io/blog/leaky-vessels-docker-runc-container-breakout-vulnerabilities/
Jake and Michael discuss all the latest Laravel releases, tutorials, and happenings in the community. Show links: Compare Algolia vs ElasticSearch vs Meilisearch vs Typesense; Aspen - The ultimate free API testing tool for macOS with AI integration. (02:02) - Laravel 10.41 - Conditional Job Chains, a Number::spell() Threshold, Configurable model:prune Path, and More (06:46) - Laravel 10.42 - Global Defaults for the HTTP Client, a Max Validation Rule for Passwords, and more (09:43) - Laravel Scout Adds Typesense, A Lightning-fast Open-source Search (12:59) - Laravel 11 Introduces the Dumpable Trait (14:46) - Eager Load Limit is Coming to Laravel 11 (18:12) - Dive into the Streamlined Directory Structure in Laravel 11 (23:14) - Meet Aspen: Speedier & Smarter API Testing, Outshining Postman and Insomnia (26:53) - Laravel Live UK (28:25) - Write Tabular Assertions with Pest and PHPUnit (30:39) - Create Beautiful Charts in Filament With the Apex Charts Plugin (31:50) - Generate Tailwind Utility Stylesheets on Demand with Curlwind (34:16) - Download Over 1,500 Google Fonts in Your Laravel Project (35:17) - Create Dynamic Discounts with Custom Conditions on Laravel With the Discountify Package (37:46) - Handling Bulk Imports in Filament
Evelyn Osman, Principal Platform Engineer at AutoScout24, joins Corey on Screaming in the Cloud to discuss the dire need for developers to agree on a standardized tool set in order to scale their projects and innovate quickly. Corey and Evelyn pick apart the new products being launched in cloud computing and discover a large disconnect between what the industry needs and what is actually being created. Evelyn shares her thoughts on why viewing platforms as products themselves forces developers to get into the minds of their users and produces a better end result. About Evelyn: Evelyn is a recovering improviser currently role playing as a Lead Platform Engineer at AutoScout24 in Munich, Germany. While she says she specializes in AWS architecture and integration after spending 11 years with it, in truth she spends her days convincing engineers that a product mindset will make them hate their product managers less. Links Referenced: LinkedIn: https://www.linkedin.com/in/evelyn-osman/ Transcript: Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Evelyn Osman, engineering manager at AutoScout24. Evelyn, thank you for joining me. Evelyn: Thank you very much, Corey. It's actually really fun to be on here. Corey: I have to say one of the big reasons that I was enthused to talk to you is that you have been using AWS, to be direct, longer than I have, and that puts you in a somewhat rarefied position where AWS's customer base has absolutely exploded over the past 15 years that it's been around, but at the beginning, it was a very different type of thing.
Nowadays, it seems like we've lost some of that magic from the beginning. Where do you land on that whole topic?Evelyn: That's actually a really good point because I always like to say, you know, when I come into a room, you know, I really started doing introductions like, “Oh, you know, hey,” I'm like, you know, “I'm this director, I've done this XYZ,” and I always say, like, “I'm Evelyn, engineering manager, or architect, or however,” and then I say, you know, “I've been working with AWS, you know, 11, 12 years,” or now I can't quite remember.Corey: Time becomes a flat circle. The pandemic didn't help.Evelyn: [laugh] Yeah, I just, like, a look at that the year, and I'm like, “Jesus. It's been that long.” Yeah. And usually, like you know, you get some odd looks like, “Oh, my God, you must be a sage.” And for me, I'm… you see how different services kind of, like, have just been reinventions of another one, or they just take a managed service and make another managed service around it. So, I feel that there's a lot of where it's just, you know, wrapping up a pretty bow, and calling it something different, it feels like.Corey: That's what I've been low-key asking people for a while now over the past year, namely, “What is the most foundational, interesting thing that AWS has done lately, that winds up solving for this problem of whatever it is you do as a company? What is it that has foundationally made things better that AWS has put out in the last service? What was it?” And the answers I get are all depressingly far in the past, I have to say. What's yours?Evelyn: Honestly, I think the biggest game-changer I remember experiencing was at an analyst summit in Stockholm when they announced Lambda.Corey: That was announced before I even got into this space, as an example of how far back things were. And you're right. That was transformative. That was awesome.Evelyn: Yeah, precisely. 
Because before, you know, we were always, like, trying to figure, okay, how do we, like, launch an instance, run some short code, and then clean it up. AWS is going to charge for an hour, so we need to figure out, you know, how to pack everything into one instance, run for one hour. And then they announced Lambda, and suddenly, like, holy shit, this is actually a game changer. We can actually write small functions that do specific things.And, you know, you go from, like, microservices, like, to like, tiny, serverless functions. So, that was huge. And then DynamoDB along with that, really kind of like, transformed the entire space for us in many ways. So, back when I was at TIBCO, there was a few innovations around that, even, like, one startup inside TIBCO that quite literally, their entire product was just Lambda functions. And one of their problems was, they wanted to sell in the Marketplace, and they couldn't figure out how to sell Lambda on the marketplace.Corey: It's kind of wild when we see just how far it's come, but also how much they've announced that doesn't change that much, to be direct. For me, one of the big changes that I remember that really made things better for customers—thought it took a couple of years—was EFS. And even that's a little bit embarrassing because all that is, “All right, we finally found a way to stuff a NetApp into us-east-1,” so now NFS, just like you used to use it in the 90s and the naughts, can be done responsibly in the cloud. And that, on some level, wasn't a feature launch so much as it was a concession to the ways that companies had built things and weren't likely to change.Evelyn: Honestly, I found the EFS launch to be a bit embarrassing because, like, you know, when you look closer at it, you realize, like, the performance isn't actually that great.Corey: Oh, it was horrible when it launched. It would just slam to a halt because you got the IOPS scaled with how much data you stored on it. 
The documentation explicitly said to use dd to start loading a bunch of data onto it to increase the performance. It's like, “Look, just sandbag the thing so it does what you'd want.” And all that stuff got fixed, but at the time it looked like it was clown shoes.Evelyn: Yeah, and that reminds me of, like, EBS's, like, gp2 when we're, like you know, we're talking, like, okay, provision IOPS with gp2. We just kept saying, like, just give yourself really big volume for performance. And it feel like they just kind of kept that with EFS. And it took years for them to really iterate off of that. Yeah, so, like, EFS was a huge thing, and I see us, we're still using it now today, and like, we're trying to integrate, especially for, like, data center migrations, but yeah, you always see that a lot of these were first more for, like, you know, data centers to the cloud, you know. So, first I had, like, EC2 classic. That's where I started. And I always like to tell a story that in my team, we're talking about using AWS, I was the only person fiercely against it because we did basically large data processing—sorry, I forget the right words—data analytics. There we go [laugh].Corey: I remember that, too. When it first came out, it was, “This sounds dangerous and scary, and it's going to be a flash in the pan because who would ever trust their core compute infrastructure to some random third-party company, especially a bookstore?” And yeah, I think I got that one very wrong.Evelyn: Yeah, exactly. I was just like, no way. You know, I see all these articles talking about, like, terrible disk performance, and here I am, where it's like, it's my bread and butter. I'm specialized in it, you know? 
I write code in my sleep and such.[Yeah, the interesting thing is, I was like, first, it was like, I can 00:06:03] launch services, you know, to kind of replicate when you get in a data center to make it feature comparable, and then it was taking all this complex services and wrapping it up in a pretty bow for—as a managed service. Like, EKS, I think, was the biggest one, if we're looking at managed services. Technically Elasticsearch, but I feel like that was the redheaded stepchild for quite some time.Corey: Yeah, there was—Elasticsearch was a weird one, and still is. It's not a pleasant service to run in any meaningful sense. Like, what people actually want as the next enhancement that would excite everyone is, I want a serverless version of this thing where I can just point it at a bunch of data, I hit an API that I don't have to manage, and get Elasticsearch results back from. They finally launched a serverless offering that's anything but. You have to still provision compute units for it, so apparently, the word serverless just means managed service over at AWS-land now. And it just, it ties into the increasing sense of disappointment I've had with almost all of their recent launches versus what I felt they could have been.Evelyn: Yeah, the interesting thing about Elasticsearch is, a couple of years ago, they came out with OpenSearch, a competing Elasticsearch after [unintelligible 00:07:08] kind of gave us the finger and change the licensing. 
I mean, OpenSearch actually become a really great offering if you run it yourself, but if you use their managed service, it can kind—you lose all the benefits, in a way.Corey: I'm curious, as well, to get your take on what I've been seeing that I think could only be described as an internal shift, where it's almost as if there's been a decree passed down that every service has to run its own P&L or whatnot, and as a result, everything that gets put out seems to be monetized in weird ways, even when I'd argue it shouldn't be. The classic example I like to use for this is AWS Config, where it charges you per evaluation, and that happens whenever a cloud resource changes. What that means is that by using the cloud dynamically—the way that they supposedly want us to do—we wind up paying a fee for that as a result. And it's not like anyone is using that service in isolation; it is definitionally being used as people are using other cloud resources, so why does it cost money? And the answer is because literally everything they put out costs money.Evelyn: Yep, pretty simple. Oftentimes, there's, like, R&D that goes into it, but the charges seem a bit… odd. Like from an S3 lens, was, I mean, that's, like, you know, if you're talking about services, that was actually a really nice one, very nice holistic overview, you know, like, I could drill into a data lake and, like, look into things. But if you actually want to get anything useful, you have to pay for it.Corey: Yeah. Everything seems to, for one reason or another, be stuck in this place where, “Well, if you want to use it, it's going to cost.” And what that means is that it gets harder and harder to do anything that even remotely resembles being able to wind up figuring out where's the spend going, or what's it going to cost me as time goes on? Because it's not just what are the resources I'm spinning up going to cost, what are the second, third, and fourth-order effects of that? 
And the honest answer is, well, nobody knows. You're going to have to basically run an experiment and find out.Evelyn: Yeah. No, true. So, what I… at AutoScout, we actually ended up doing is—because we're trying to figure out how to tackle these costs—is they—we built an in-house cost allocation solution so we could track all of that. Now, AWS has actually improved Cost Explorer quite a bit, and even, I think, Billing Conductor was one that came out [unintelligible 00:09:21], kind of like, do a custom tiered and account pricing model where you can kind of do the same thing. But even that also, there is a cost with it.I think that was trying to compete with other, you know, vendors doing similar solutions. But it still isn't something where we see that either there's, like, arbitrarily low pricing there, or the costs itself doesn't really quite make sense. Like, AWS [unintelligible 00:09:45], as you mentioned, it's a terrific service. You know, we try to use it for compliance enforcement and other things, catching bad behavior, but then as soon as people see the price tag, we just run away from it. So, a lot of the security services themselves, actually, the costs, kind of like, goes—skyrockets tremendously when you start trying to use it across a large organization. And oftentimes, the organization isn't actually that large.Corey: Yeah, it gets to this point where, especially in small environments, you have to spend more energy and money chasing down what the cost is than you're actually spending on the thing. There were blog posts early on that, “Oh, here's how you analyze your bill with Redshift,” and that was a minimum 750 bucks a month. It's, well, I'm guessing that that's not really for my $50 a month account.Evelyn: Yeah. No, precisely. I remember seeing that, like, entire ETL process is just, you know, analyze your invoice. 
Cost [unintelligible 00:10:33], you know, is fantastic, but at the end of the day, like, what you're actually looking at [laugh], is infinitesimally small compared to all the data in that report. Like, I think oftentimes, it's simply, you know, like, I just want to look at my resources and allocate them in a multidimensional way. Which actually isn't really that multidimensional, when you think about it [laugh].Corey: Increasingly, Cost Explorer has gotten better. It's not a new service, but every iteration seems to improve it to a point now where I'm talking to folks, and they're having a hard time justifying most of the tools in the cost optimization space, just because, okay, they want a percentage of my spend on AWS to basically be a slightly better version of a thing that's already improving and works for free. That doesn't necessarily make sense. And I feel like that's what you get trapped into when you start going down the VC path in the cost optimization space. You've got to wind up having a revenue model and an offering that scales through software… and I thought, originally, I was going to be doing something like that. At this point, I'm unconvinced that anything like that is really tenable.Evelyn: Yeah. When you're a small organization you're trying to optimize, you might not have the expertise and the knowledge to do so, so when one of these small consultancies comes along, saying, “Hey, we're going to charge you a really small percentage of your invoice,” like, okay, great. That's, like, you know, like, a few $100 a month to make sure I'm fully optimized, and I'm saving, you know, far more than that. But as soon as your invoice turns into, you know, it's like $100,000, or $300,000 or more, that percentage becomes rather significant. And I've had vendors come to me and, like, talk to me and is like, “Hey, we can, you know, for a small percentage, you know, we're going to do this machine learning, you know, AI optimization for you. 
You know, you don't have to do anything. We guarantee buybacks of your RIs.” And as soon as you look at the price tag with it, we just have to walk away. Or oftentimes we look at it, and there are truly very simple ways to do it on your own, if you just kind of put some thought into it. Corey: While we were talking a bit before this show, you taught me something new about GameLift, which I think is a different problem that AWS has been dealing with lately. I've never paid much attention to it because it is the—as I assume from what it says on the tin, oh, it's a service for just running a whole bunch of games at scale, and I'm not generally doing that. My favorite computer game remains to be Twitter at this point, but that's okay. What is GameLift, though, because you wound up shining a different light on it, which makes me annoyed that Amazon Marketing has not pointed this out. Evelyn: Yeah, so I'll preface this by saying, like, I'm not an expert on GameLift. I haven't even spun it up myself because there's quite a bit of price. I learned this fall while chatting with an SA who works in the gaming space, and it kind of like, I went, like, “Back up a second.” If you think about, like, I'm, you know, like, World of Warcraft, all you have are thousands of game clients all over the world, playing the same game, you know, on the same server, in the same instance, and you need to make sure, you know, that when I'm running, and you're running, that we know that we're going to reach the same point the same time, or if there's one object in that room, that only one of us can get it. So, all these servers are doing is tracking state across thousands of clients. And GameLift, when you think about your dedicated game service, it really is just multi-region distributed state management. Like, at the basic, that's really what it is. Now, there's, you know, quite a bit more happening within GameLift, but that's what I was going to explain is, like, it's just state management.
And there are far more use cases for it than just for video games. Corey: That's maddening to me because having a global session state store, for lack of a better term, is something that so many customers have built themselves repeatedly. They can build it on top of primitives like DynamoDB global tables, or alternately, you have a dedicated region where that thing has to live and everything far away takes forever to round-trip. If they've solved some of those things, why on earth would they bury it under a gaming-branded service? Like, offer that primitive to the rest of us because that's useful. Evelyn: No, absolutely. And honestly, I wouldn't be surprised if you peeled back the curtain with GameLift, you'll find a lot of—like, several other, you know, AWS services that it's just built on top of. I kind of mentioned earlier is, like, what I see now with innovation, it's like we just see other services packaged together and released as a new product. Corey: Yeah, IoT had the same problem going on for years where there was a lot of really good stuff buried in there, like IoT Events. People were talking about using that for things like browser extensions and whatnot, but you need to be explicitly told that that's a thing that exists and is handy, but otherwise you'd never know it was there because, “Well, I'm not building anything that's IoT-related. Why would I bother?” It feels like that was one direction that they tended to go in. And now they take existing services that are, mmm, kind of milquetoast, if I'm being honest, and then saying, “Oh, like, we have Comprehend that does, effectively, detection of themes, keywords, and whatnot, from text. We're going to wind up re-releasing that as Comprehend Medical.” Same type of thing, but now focused on a particular vertical. Seems to me that instead of being a specific service for that vertical, just improve the baseline service and offer HIPAA compliance if it didn't exist already, and you're mostly there.
But what do I know? I'm not a product manager trying to get promoted. Evelyn: Yeah, that's true. Well, I was going to mention that maybe it's the HIPAA compliance, but actually, a lot of their services already have HIPAA compliance. And I've stared far too long at that compliance section on AWS's site to know this, but you know, a lot of them actually are HIPAA-compliant, they're PCI-compliant, and ISO-compliant, and you know, and everything. So, I'm actually pretty intrigued to know why they [wouldn't 00:16:04] take that advantage. Corey: I just checked. Amazon Comprehend is itself HIPAA-compliant and is qualified and certified to hold Personal Health Information—PHI—Private Health Information, whatever the acronym stands for. Now, what's the difference, then, between that and Medical? In fact, the HIPAA section says for Comprehend Medical, “For guidance, see the previous section on Amazon Comprehend.” So, there's no difference from a regulatory point of view. Evelyn: That's fascinating. I am intrigued because I do know that, like, within AWS, you know, they have different segments, you know? There's, like, Digital Native Business, there's Enterprise, there's Startup. So, I am curious how things look over the engineering side. I'm going to talk to somebody about this now [laugh]. Corey: Yeah, it's the—like, I almost wonder, on some level, it feels like, “Well, we wound up building this thing in the hopes that someone would use it for something. And well, if we just use different words, it checks a box in some analyst's chart somewhere.” I don't know. I mean, I hate to sound that negative about it, but it's… increasingly when I talk to customers who are active in these spaces around the industry vertical targeted stuff aimed at their industry, they're like, “Yeah, we took a look at it. It was adorable, but we're not using it that way.
We're going to use either the baseline version or we're going to work with someone who actively gets our industry.” And I've heard that repeated about three or four different releases that they've put out across the board of what they've been doing. It feels like it is a misunderstanding between what the world needs and what they're able to or willing to build for us. Evelyn: Not sure. I wouldn't be surprised, if we go far enough, it could probably be that it's just a product manager saying, like, “We have to advertise directly to the industry.” And if you look at it, you know, in the backend, you know, it's an engineer, you know, kicking off a build and just changing the name from Comprehend to Comprehend Medical. Corey: And, on some level, too, they're moving a lot more slowly than they used to. There was a time where they were, in many cases, if not the first mover, the first one to do it well. Take CodeWhisperer, their AI-powered coding assistant. That would have been a transformative thing if GitHub Copilot hadn't beaten them to every punch, come out with new features, and frankly, in head-to-head experiments that I've run, came out way better as a product than what CodeWhisperer is. And I'd like to say that this is great, but it's too little, too late. And when I talk to engineers, they're very excited about what Copilot can do, and the only people I see who are even talking about CodeWhisperer work at AWS. Evelyn: No, that's true. And so, I think what's happening—and this is my opinion—is that first you had AWS, like, launching really innovative new services, you know, that kind of like, it's like, “Ah, it's a whole new way of running your workloads in the cloud.” Instead of, you know, basically hiring a whole team, I just click a button, you have your instance, you use it, sell software, blah, blah, blah, blah.
And then they went towards serverless, and then IoT, and then it started targeting large data lakes, and then eventually that kind of ran backwards towards security, after the umpteenth S3 data leak. Corey: Oh, yeah. And especially now, like, so they had a hit in some corners with SageMaker, so now there are 40 services all starting with the word SageMaker. That's always pleasant. Evelyn: Yeah, precisely. And what I kind of notice is… now they're actually having to run it even further back because they caught all the corporations that could pivot to the cloud, they caught all the startups who started in the cloud, and now they're going for the larger behemoths who have massive data centers, and they don't want to innovate. They just want to reduce this massive sysadmin team. And I always like to use the example of Bare Metal. When that came out in 2019, everybody—we all kind of scratched our heads. I'm like, really [laugh]? Corey: Yeah, I could see where it makes some sense just for very specific workloads that involve things like specific capabilities of processors that don't work under emulation in some weird way, but it's also such a weird niche that I'm sure it's there for someone. My default assumption, just given the breadth of AWS's customer base, is that whenever I see something that they just announced, well, okay, it's clearly not for me; that doesn't mean it's not meeting the needs of someone who looks nothing like me. But increasingly as I start exploring the industry and these services have time to percolate in the popular imagination and I still don't see anything interesting coming out with it, it really makes you start to wonder. Evelyn: Yeah. But then, like, I think, like, roughly a year or something, right after Bare Metal came out, they announced Outposts. So, then it was like, another way to just stay within your data center and be in the cloud. Corey: Yeah.
There's a bunch of different ways they have that, okay, here's ways you can run AWS services on-prem, but still pay us by the hour for the privilege of running things that you have living in your facility. And that doesn't seem like it's quite fair. Evelyn: That's exactly it. So, I feel like now it's sort of in diminishing returns and sort of doing more cloud-native work compared to, you know, these huge opportunities, which is everybody who still has a data center for various reasons, or they're cloud-native, and they grow so big, that they actually start running their own data centers. Corey: I want to call out as well before we wind up being accused of being oblivious, that we're recording this before re:Invent. So, it's entirely possible—I hope this happens—that they announce something or several some things that make this look ridiculous, and we're embarrassed to have had this conversation. And yeah, they're totally getting it now, and they have completely surprised us with stuff that's going to be transformative for almost every customer. I've been expecting and hoping for that for the last three or four re:Invents now, and I haven't gotten it. Evelyn: Yeah, that's right. And I think there are even new service launches that actually are missing fairly obvious things in a way. Like, mine is the Managed Workflow for Amazon—it's Managed Airflow, sorry. So, we were using Data Pipeline for, you know, big ETL processing, so it was an in-house tool we kind of built at AutoScout, we do platform engineering. And it was deprecated, so we looked at a new—what to replace it with. And so, we looked at Airflow, and we decided this is the way to go, we want to use managed because we don't want to maintain our own infrastructure. And the problem we ran into is that it doesn't have support for shared VPCs. And we actually talked to our account team, and they were confused. Because they said, like, “Well, every new service should support it natively.” But it just didn't have it.
And that's, kind of, what, I kind of found is, like, there's—it feels—sometimes it's—there's a—it's getting rushed out the door, and it'll actually have a new managed service or new service launched out, but they're also sort of cutting some corners just to actually make sure it's packaged up and ready to go. Corey: When I'm looking at this, and seeing how this stuff gets packaged, and how it's built out, I start to understand a pattern that I've been relatively down on across the board. I'm curious to get your take because you work at a fairly sizable company as an engineering manager, running teams of people who do this sort of thing. Where do you land on the idea of companies building internal platforms to wrap around the offerings that the cloud service providers that they use make available to them? Evelyn: So, my opinion is that you need to build out some form of standardized tool set in order to actually be able to innovate quickly. Now, this sounds counterintuitive because everyone is like, “Oh, you know, if I want to innovate, I should be able to do this experiment, and try out everything, and use what works, and just release it.” And that greatness [unintelligible 00:23:14] mentality, you know, it's like five talented engineers working to build something. But when you have, instead of five engineers, you have five teams of five engineers each, and every single team does something totally different. You know, one uses Scala, another on TypeScript, another one, you know, .NET, and then there could have been a [last 00:23:30] one, you know, comes in, you know, saying they're still using Ruby. And then next thing you know, you know, you have, like, incredibly diverse platforms for services. And if you want to do any sort of like hiring or cross-training, it becomes incredibly difficult.
And actually, as the organization grows, you want to hire talent, and so you're going to have to hire, you know, a developer for this team, you're going to have to hire, you know, Ruby developer for this one, a Scala guy here, a Node.js guy over there. And so, this is where we say, “Okay, let's agree. We're going to be a Scala shop. Great. All right, are we running serverless? Are we running containerized?” And you agree on those things. So, that's already, like, the formation of it. And oftentimes, you start with DevOps. You'll say, like, “I'm a DevOps team,” you know, or doing a DevOps culture, if you do it properly, but you always hit this scaling issue where you start growing, and then how do you maintain that common tool set? And that's where we start looking at, you know, having a platform… approach, but I'm going to say it's Platform-as-a-Product. That's the key. Corey: Yeah, that's a good way of framing it because originally, the entire world needed that. That's what RightScale was when EC2 first came out. It was a reimagining of the EC2 console that was actually usable. And in time, AWS improved that to the point where RightScale didn't really have a place anymore in a way that it had previously, and that became a business challenge for them. But you have, what is it now, 200, 300 services that AWS has put out, and okay, great. Most companies are really only actively working with a handful of those. How do you make those available in a reasonable way to your teams, in ways that aren't distracting, dangerous, et cetera? I don't know the answer on that one. Evelyn: Yeah. No, that's true. So, full disclosure. At AutoScout, we do platform engineering. So, I'm part of, like, the platform engineering group, and we built a platform for our product teams. It's kind of like, you need to decide to [follow 00:25:24] those answers, you know? Like, are we going to be fully containerized? Okay, then, great, we're going to use Fargate.
All right, how do we do it so that developers don't actually—don't need to think that they're running Fargate workloads? And that's, like, you know, where it's really important to have those standardized abstractions that developers actually enjoy using. And I'd even say that, before you start saying, “Ah, we're going to do platform,” you say, “We should probably think about developer experience.” Because you can do a developer experience without a platform. You can do that, you know, in a DevOps approach, you know? It's basically build tools that make it easy for developers to write code. That's the first step for anything. It's just, like, you have people writing the code; make sure that they can do the things easily, and then look at how to operate it. Corey: That sure would be nice. There's a lack of focus on usability, especially when it comes to a number of developer tools that we see out there in the wild, in that, they're clearly built by people who understand the problem space super well, but they're designing these things to be used by people who just want to make the website work. They don't have the insight, the knowledge, the approach, any of it, nor should they necessarily be expected to. Evelyn: No, that's true. And what I see is, a lot of the times, it's a couple really talented engineers who are just getting shit done, and they get shit done however they can. So, it's basically like, if they're just trying to run the website, they're just going to write the code to get things out there and call it a day. And then somebody else comes along, has a heart attack when they see what's been done, and they're kind of stuck with it because there are no guardrails or paved path or however you want to call it. Corey: I really hope—truly—that this is going to be something that we look back and laugh when this episode airs, that, “Oh, yeah, we just got it so wrong.
Look at all the amazing stuff that came out of re:Invent.” Are you going to be there this year? Evelyn: I am going to be there this year. Corey: My condolences. I keep hoping people get to escape. Evelyn: This is actually my first one in, I think, five years. So, I mean, the last time I was there was when everybody's going crazy over pins. And I still have a bag of them [laugh]. Corey: Yeah, that did seem like a hot-second collectable moment, didn't it? Evelyn: Yeah. And then at the—I think, what, the very last day, as everybody's heading to re:Play, you could just go into the registration area, and they just had, like, bags of them lying around to take. So, all the competing, you know, to get the requirements for a pin was kind of moot [laugh]. Corey: Don't you hate it at some point where it's like, you feel like I'm going to finally get this crowning achievement, it's like or just show up at the buffet at the end and grab one of everything, and wow, that would have saved me a lot of pain and trouble. Evelyn: Yeah. Corey: Ugh, scavenger hunts are hard, as I'm about to learn to my own detriment. Evelyn: Yeah. No, true. Yeah. But I am really hoping that re:Invent proves me wrong. Embarrassingly wrong, and then all my colleagues can proceed to mock me for this ridiculous podcast that I made with you. But I am a fierce skeptic. Optimistic nihilist, but still a nihilist, so we'll see how re:Invent turns out. Corey: So, I am curious, given your experience at more large companies than I tend to be embedded with for any period of time, how have you found that these large organizations tend to pick up new technologies? What does the adoption process look like? And honestly, if you feel like throwing some shade, how do they tend to get it wrong? Evelyn: In most cases, I've seen it go… terrible. Like, it just blows up in their face.
And I say that because a lot of the time, an organization will say, “Hey, we're going to adopt this new way of organizing teams or developing products,” and they look at all the practices. They say, “Okay, great. Product management is going to bring it in, they're going to structure things, how we do the planning, here's some great charts and diagrams,” but they don't really look at the culture aspect. And that's always where I've seen things fall apart. I've been in a room where, you know, our VP was really excited about Team Topologies and say, “Hey, we're going to adopt it.” And then an engineering manager proceeded to say, “Okay, you're responsible for this team, you're responsible for that team, you're responsible for this team talking to, like, a team of, like, five engineers,” which doesn't really work at all. Or, like, I think the best example is DevOps, you know, where you say, “Ah, we're going to adopt DevOps, we're going to have a DevOps team, or have a DevOps engineer.” Corey: Step one: we're going to rebadge everyone with existing job titles to have the new fancy job titles that reflect it. It turns out that's not necessarily sufficient in and of itself. Evelyn: Not really. The Spotify model. People say, like, “Oh, we're going to do the Spotify model. We're going to do squads, tribes, you know, and everything. It's going to be awesome, it's going to be great, you know, and nice, cross-functional.” The reason I say it fails on us every single time is because somebody wants to be in control of the process, and if the process is meant to encourage collaboration and innovation, that person actually becomes a chokehold for it. And it could be somebody that says, like, “Ah, I need to be involved in every single team, and listen to know what's happening, just so I'm aware of it.” What ends up happening is that everybody defers to them. So, there is no collaboration, there is no innovation.
DevOps, you say, like, “Hey, we're going to have a team to do everything, so your developers don't need to worry about it.” What ends up happening is you're still an ops team, you still have your silos. And that's always a challenge is you actually have to say, “Okay, what are the cultural values around this process?” You know, what is SRE? What is DevOps, you know? Is it seen as processes, is it a series of principles, platform, maybe, you know? We have to say, like—that's why I say, Platform-as-a-Product because you need to have that product mindset, that culture of product thinking, to really build a platform that works because it's all about the user journey. It's not about building a common set of tools. It's the user journey of how a person interacts with their code to get it into a production environment. And so, you need to understand how that person sits down at their desk, starts the laptop up, logs in, opens the IDE, what they're actually trying to get done. And once you understand that, then you know your requirements, and you build something to fill those things so that they are happy to use it, as opposed to saying, “This is our platform, and you're going to use it.” And they're probably going to say, “No.” And the next thing you know, they're just doing their own thing on the side. Corey: Yeah, the rise of Shadow IT has never gone away. It's just, on some level, it's the natural expression, I think it's an immune reaction that companies tend to have when process gets in the way. Great, we have an outcome that we need to drive towards; we don't have a choice. Cloud empowered a lot of that and also has given tools to help rein it in, and as with everything, the arms race continues. Evelyn: Yeah. And so, what I'm going to continue now, kind of like, toot the platform horn. So, Gregor Hohpe, he's a [solutions architect 00:31:56]—I always f- up his name. I'm so sorry, Gregor.
He has a great book, and even a talk, called The Magic of Platforms, that if somebody is actually curious about understanding why platforms are nice, they should really watch that talk. If you see him at re:Invent, or a summit or somewhere giving a talk, go listen to that, and just pick his brain. Because that's—for me, I really kind of strongly agree with his approach because that's really how, like, you know, as he says, like, boost innovation is, you know, where you're actually building a platform that really works. Corey: Yeah, it's a hard problem, but it's also one of those things where you're trying to focus on—at least ideally—an outcome or a better situation than you currently find yourselves in. It's hard to turn down things that might very well get you there sooner, faster, but it's like trying to effectively cargo-cult the leadership principles from your last employer into your new one. It just doesn't work. I mean, you see more startups from Amazonians who try that, and it just goes horribly because without the cultural understanding and the supporting structures, it doesn't work. Evelyn: Exactly. So, I've worked with, like, organizations, like, 4000-plus people, I've worked for, like, small startups, consulted, and this is why I say, almost every single transformation, it fails the first time because somebody needs to be in control and track things and basically be really, really certain that people are doing it right. And as soon as it blows up in their face, that's when they realize they should actually take a step back. And so, even for building out a platform, you know, doing Platform-as-a-Product, I always reiterate that you have to really be willing to just invest upfront, and not get very much back. Because you have to figure out the whole user journey, and what you're actually building, before you actually build it. Corey: I really want to thank you for taking the time to speak with me today.
If people want to learn more, where's the best place for them to find you? Evelyn: So, I used to be on Twitter, but I've actually got off there after it kind of turned a bit toxic and crazy. Corey: Feels like that was years ago, but that's beside the point. Evelyn: Yeah, precisely. So, I would even just say because this feels like a corporate show, but find me on LinkedIn of all places because I will be sharing whatever I find on there, you know? So, just look me up by my name, Evelyn Osman, and give me a follow, and I'll probably be screaming into the cloud like you are. Corey: And we will, of course, put links to that in the show notes. Thank you so much for taking the time to speak with me. I appreciate it. Evelyn: Thank you, Corey. Corey: Evelyn Osman, engineering manager at AutoScout24. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, and I will read it once I finish building an internal platform to normalize all of those platforms together into one. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.
It's the year of generative AI, and every technology category has changed as a result of access to new foundation models. In this podcast, I speak with Elastic's CPO Ken Exner about how enterprise search, analytics, and other categories, such as security and observability, are evolving. About Ken Exner: Chief Product Officer, Elastic. "Helping customers gain actionable insights from data is increasingly important in a world of ever-increasing volumes of data. At Elastic, I have the privilege of leading our cross-functional product teams. Nothing is more exciting than seeing engineering, product, and design teams working in rhythm to deliver great experiences for our customers. I am passionate about building customer-oriented solutions that balance flexibility and ease of use, and I don't believe customers should have to compromise for either." Exner joined Elastic after three decades in various technology companies leading product and engineering teams. Most recently, he spent 16 years at Amazon Web Services (AWS), where he built and managed dozens of products used by millions of customers worldwide. He holds a bachelor of science degree from the Haas School of Business at the University of California, Berkeley. He and his family live on the outskirts of Seattle, where they spend time with their pets, which include dogs, cats, chickens, goats, and alpacas. Follow Ken at https://www.linkedin.com/in/ken-exner-b914542/ Follow Maribel at https://www.linkedin.com/in/maribellopez/ About Elastic: Elastic is a leading platform for search-powered solutions. Elastic understands it's the answers, not just the data. The Elasticsearch platform enables anyone to find the answers they need in real-time using all their data, at scale.
Elastic delivers complete, cloud-based, AI-powered solutions for enterprise security, observability and search built on the Elasticsearch platform, the development platform used by thousands of companies, such as well-known brands Uber, Slack, Microsoft, and more than 50% of the Fortune 500. Elastic is a platform for search-powered solutions that helps everyone — organizations, their employees, and their customers — find what they need faster, while keeping applications running smoothly, and protecting against cyber threats. The company offers three main product categories that include Elastic Enterprise Search, Observability, and Security solutions. Some of its customers include well known brands such as Uber, Slack, Microsoft, and thousands of others who rely on us to accelerate results that matter. Follow Elastic at https://www.elastic.co/
Victoria is joined by guest co-host Joe Ferris, CTO at thoughtbot, and Seif Lotfy, the CTO and Co-Founder of Axiom. Seif discusses the journey, challenges, and strategies behind his data analytics and observability platform. Seif, who has a background in robotics and was a 2008 Sony AIBO robotic soccer world champion, shares that Axiom pivoted from being a Datadog competitor to focusing on logs and event data. The company even built its own logs database to provide a cost-effective solution for large-scale analytics. Seif is driven by his passion for his team and the invaluable feedback from the community, emphasizing that sales validate the effectiveness of a product. The conversation also delves into Axiom's shift in focus towards developers to address their need for better and more affordable observability tools. On the business front, Seif reveals the company's challenges in scaling across multiple domains without compromising its core offerings. He discusses the importance of internal values like moving with urgency and high velocity to guide the company's future. Furthermore, he touches on the challenges and strategies of open-sourcing projects and advises avoiding platforms like Reddit and Hacker News to maintain focus. Axiom (https://axiom.co/) Follow Axiom on LinkedIn (https://www.linkedin.com/company/axiomhq/), X (https://twitter.com/AxiomFM), GitHub (https://github.com/axiomhq), or Discord (https://discord.com/invite/axiom-co). Follow Seif Lotfy on LinkedIn (https://www.linkedin.com/in/seiflotfy/) or X (https://twitter.com/seiflotfy). Visit his website at seif.codes (https://seif.codes/). Follow thoughtbot on X (https://twitter.com/thoughtbot) or LinkedIn (https://www.linkedin.com/company/150727/). Become a Sponsor (https://thoughtbot.com/sponsorship) of Giant Robots! Transcript: VICTORIA: This is the Giant Robots Smashing Into Other Giant Robots Podcast, where we explore the design, development, and business of great products. 
I'm your host, Victoria Guido, and with me today is Seif Lotfy, CTO and Co-Founder of Axiom, the best home for your event data. Seif, thank you for joining me. SEIF: Hey, everybody. Thanks for having me. This is awesome. I love the name of the podcast, given that I used to compete in robotics. VICTORIA: What? All right, we're going to have to talk about that. And I also want to introduce a guest co-host today. Since we're talking about cloud, and observability, and data, I invited Joe Ferris, thoughtbot CTO and Director of Development of our platform engineering team, Mission Control. Welcome, Joe. How are you? JOE: Good, thanks. Good to be back again. VICTORIA: Okay. I am excited to talk to you all about observability. But I need to go back to Seif's comment on competing with robots. Can you tell me a little bit more about what robots you've built in the past? SEIF: I didn't build robots; I used to program them. Remember the Sony AIBOs, where Sony made these dog robots? And we would make them compete. There was an international competition where we made them play soccer, and they had to be completely autonomous. They only communicate via Bluetooth or via wireless protocols. And you only have the camera as your sensor as well as...a chest sensor throws the ball near you, and then yeah, you make them play football against each other, four versus four with a goalkeeper and everything. Just look it up: RoboCup AIBO. Look it up on YouTube. And I...2008 world champion with the German team. VICTORIA: That sounds incredible. What kind of crowds are you drawing out for a robot soccer match? Is that a lot of people involved with that? SEIF: You would be surprised how big the RoboCup competition is. It's ridiculous. VICTORIA: I want to go. I'm ready. I want to, like, I'll look it up and find out when the next one is. SEIF: No more Sony robots but other robots. Now, there's two-legged robots. 
So, they make them play as two-legged robots, much slower than four-legged robots, but works. VICTORIA: Wait. So, the robots you were playing soccer with had four legs they were running around on? SEIF: Yeah, they were dogs [laughter]. VICTORIA: That's awesome. SEIF: We all get the same robot. It's just a competition on software, right? On a software level. And some other competitions within the RoboCup actually use...you build your own robot and stuff like that. But this one was...it's called the Standard League, where we all have a robot, and we have to program it. JOE: And the standard robot was a dog. SEIF: Yeah, I think back then...we're talking...it's been a long time. I think it started in 2001 or something. I think the competition started in 2001 or 2002. And I compete from 2006 to 2008. Robots back then were just, you know, simple. VICTORIA: Robots today are way too complicated [laughs]. SEIF: Even AI is more complicated. VICTORIA: That's right. Yeah, everything has gotten a lot more complicated [laughs]. I'm so curious how you went from being a world-champion robot dog soccer player [laughs] programmer [laughs] to where you are today with Axiom. Can you tell me a little bit more about your journey? SEIF: The journey is interesting because it came from open source. I used to do open source on the side a lot–part of the GNOME Project. That's where I met Neil and the rest of my team, Mikkel Kamstrup, the whole crowd, basically. We worked on GNOME. We worked on Ubuntu. Like, most of them were working professionally on it. I was working for another company, but we worked on the same project. We ended up at Xamarin, which was bought by Microsoft. And then we ended up doing Axiom. But we've been around each other professionally since 2009, most of us. It's like a little family. But how we ended up exactly in observability, I think it's just trying to fix pain points in my life. VICTORIA: Yeah, I was reading through the docs on Axiom. 
And there's an interesting point you make about organizations having to choose between how much data they have and how much they want to spend on it. So, maybe you can tell me a little bit more about that pain point and what you really found in the early stages that you wanted to solve. SEIF: So, the early stages of what we wanted to solve we were mainly dealing with...so, the early, early stage, we were actually trying to be a Datadog competitor, where we were going to be self-hosted. Eventually, we focused on logs because we found out that's what was a big problem for most people, just event data, not just metrics but generally event data, so logs, traces, et cetera. We built out our own logs database completely from scratch. And one of the things we stumbled upon was that, basically, you have three things when it comes to logging: low cost, low latency, and large scale. That's what everybody wants. But you can't get all three of them; you can only get two of them. And we opted...like, we chose large scale and low cost. And when it comes to latency, we say it should be just fast enough, right? And that's what we focused on, and this is how we started building it. And with that, this is how we managed to stand out by just having way lower cost than anybody else in the industry and dealing with large scale. VICTORIA: That's really interesting. And how did you approach making the ingestion pipeline for massive amounts of data more efficient? SEIF: Just make it as coordination-free as possible, right? And get rid of Kafka because Kafka just, you know, drains your...it's where you throw in money. Like, maintaining Kafka...it's like back then Elasticsearch, right? Elasticsearch was the biggest part of your infrastructure that would cost money. Now, it's also Kafka. So, we found a way to have our own internal way of queueing things without having to rely on Kafka. As I said, we wrote everything from scratch to make it work. 
Like, every now and then, I think that we can spin this out of the company and make it a new product. But now, eyes on the prize, right? JOE: It's interesting to hear that somebody who spent so much time in the open-source community ended up rolling their own solution to so many problems. Do you feel like you had some lessons learned from open source that led you to reject solutions like Kafka, or how did that journey go? SEIF: I don't think I'm rejecting Kafka. The problem is how Kafka is built, right? Kafka is still...you have to set up all these servers. They have to communicate, et cetera, et cetera. They didn't build it in a way where it's stateless, and that's what we're trying to go to. We're trying to make things as stateless as possible. So, Kafka was never built for the cloud-native era. And you can't really rely on SQS or something like that because it won't deal with this high throughput. So, that's why I said, like, we will sacrifice some latency, but at least the cost is low. So, if messages show up after half a second or a second, I'm good. It doesn't have to be real-time for me. So, I had to write a couple of these things. But also, it doesn't mean that we reject open source. Like, we actually do like open source. We open-source a couple of libraries. We contribute back to open source, right? We needed a solution back then for that problem, and we couldn't find any. And maybe one day, open source will have one, right? JOE: Yeah. I was going to ask if you considered open-sourcing any of your high-latency, high-throughput solutions. SEIF: Not high latency. You make it sound bad. JOE: [laughs] SEIF: You make it sound bad. It's, like, fast enough, right? I'm not going to compete on milliseconds because, also, then I'd be competing with ClickHouse. I don't want to compete with ClickHouse. ClickHouse is low latency and large scale, right? But then the cost is, you know, off the charts a bit sometimes. I'm going the other route. Like, you know, it's fast enough. 
Like, how, you know, if it's under two, three seconds, everybody's happy, right? If the results come within two, three seconds, everybody is happy. If you're going to build a real-time trading system on top of it, I'll strongly advise against that. But if you're building, you know, you're looking at dashboards, you're more in the observability field, yeah, we're good. VICTORIA: Yeah, I'm curious what you found, like, which customer personas that market really resonated with. Like, is there a particular, like, industry type where you're noticing they really want to lower their cost, and they're okay with this just fast enough latency? SEIF: Honestly, with the current recession, everybody is okay with giving up some of the speed to reduce the money because I think it's not linear reduction. It's more exponential reduction at this point, right? You give up a second, and you're saving 30%. You give up two seconds, all of a sudden, you're saving 80%. So, I'd say in the beginning, everybody thought they need everything to be very, very fast. And now they're realizing, you know, with limitations you have around your budget and spending, you're like, okay, I'm okay with the speed. And, again, we're not slow. I'm just saying people realize they don't need everything under a second. They're okay with waiting for two seconds. VICTORIA: That totally resonates with me. And I'm curious if you can add maybe a non-technical or a real-life example of, like, how this impacts the operations of a company or organization, like, if you can give us, like, a business-y example of how this impacts how people work. SEIF: I don't know how, like, how do people work on that? Nothing changed, really. They're still doing the, like...really nothing because...and that aspect is you run a query, and, again, as I said, you're not getting the result in a second. You're just waiting two seconds or three seconds, and it's there. So, nothing really changed. I think people can wait three seconds. 
And we're still, like, when I say this, we're still faster than most others. We're just not as fast as people who are trying to compete on a millisecond level. VICTORIA: Yeah, that's okay. Maybe I'll take it back even, like, a step further, right? Like, our audience is really sometimes just founders who almost have no formal technical training or background. So, when we talk about observability, sometimes people who work in DevOps and operations all understand it and kind of know why it's important [laughs] and what we're talking about. So, maybe you could, like, go back to -- SEIF: Oh, if you're asking about new types of people who've been using it -- VICTORIA: Yeah. Like, if you're going to explain to, like, a non-technical founder, like, why your product is important, or, like, how people in their organization might use it, what would you say? SEIF: Oh, okay, if you put it like that. It's more of if you have data, timestamp data, and you want to run analytics on top of it, so that could be transactions, that could be web vitals, rather than count every time somebody visits, you have a timestamp. So, you can count, like, how many visitors visited the website and what, you know, all these kinds of things. That's where you want to use something like Axiom. That's outside the DevOps space, of course. And in the DevOps space, there's so many other things you use Axiom for, but that's outside the DevOps space. And we actually...we implemented a zero-config integration with Vercel that kind of went viral. And we were, for a while, the number one enterprise Vercel integration because so many people were using it. So, Vercel users are usually not necessarily writing the most complex backends, but a lot of things are happening on the front-end side of things. And we would be giving them dashboards, automated dashboards about, you know, latencies, and how long a request took, and how long the response took, and the content type, and the status codes, et cetera, et cetera. 
And there's a huge user base around that. VICTORIA: I like that. And it's something, for me, you know, as a managing director of our platform engineering team, I want to talk more to founders about. It's great that you put this product and this app out into the world. But how do you know that people are actually using it? How do you know that people, like, maybe, are they all quitting after the first day and not coming back to your app? Or maybe, like, the page isn't loading or, like, it's not working as they expected it to. And, like, if you don't have anything observing what users are doing in your app, then it's going to be hard to show that you're getting any traction and know where you need to go in and make corrections and adjust. SEIF: We have two ways of doing this. Right now, internally, we use our own tools to see, like, who is sending us data. We have a deployment that's monitoring production deployment. And we're just, you know, seeing how people are using it, how much data they're sending every day, who stopped sending data, who spiked in sending data sets, et cetera. But we're using Mixpanel, and Dominic, our Head of Product, implemented a couple of key metrics to that for that specifically. So, we know, like, what's the average time until somebody starts going from building its own queries with the builder to writing APL, or how long it takes them from, you know, running two queries to five queries. And, you know, we just start measuring these things now. And it's been going...we've been growing healthy around that. So, we tend to measure user interaction, but also, we tend to measure how much data is being sent. Because let's keep in mind, usually, people go in and check for things if there's a problem. So, if there's no problem, the user won't interact with us much unless there's a notification that kicks off. We also just check, like, how much data is being sent to us the whole time. VICTORIA: That makes sense. 
Like, you can't just rely on, like, well, if it was broken, they would write a [chuckles], like, a question or something. So, how do you get those metrics and that data around their interactions? So, that's really interesting. So, I wonder if we can go back and talk about, you know, we already mentioned a little bit about, like, the early days of Axiom and how you got started. Was there anything that you found in the early discovery process that was surprising and made you pivot strategy? SEIF: A couple of things. Basically, people don't really care about the tech as much as they care [inaudible 12:51] and the packaging, so that's something that we had to learn. And number two, continuous feedback. Continuous feedback changed the way we worked completely, right? And, you know, after that, we had a Slack channel, then we opened a Discord channel. And, like, this continuous feedback coming in just helps with iterating, helps us with prioritizing, et cetera. And that changed the way we actually developed product. VICTORIA: You use Slack and Discord? SEIF: No. No Slack anymore. We had a community Slack. We had a community [inaudible 13:19] Slack. Now, there's no community Slack. We only have a community Discord. And the community Slack is...sorry, internally, we use Slack, but there's a community Discord for the community. JOE: But how do you keep that staffed? Is it, like, everybody is in the Discord during working hours? Is it somebody's job to watch out for community questions? SEIF: I think everybody gets involved now just...and you can see it. If you go on our Discord, you will just see it. Just everyone just gets involved. I think just people are passionate about what they're doing. At least most people are involved on Discord, right? Because there's, like, Discord the help sections, and people are just asking questions and other people answering. 
And now, we reached a point where people in the community start answering the questions for other people in the community. So, that's how we see it's starting to become a healthy community, et cetera. But that is one of my favorite things: when I see somebody from the community answering somebody else, that's a highlight for me. Actually, we hired somebody from that community because they were so active. JOE: Yeah, I think one of the biggest signs that a product is healthy is when there's a healthy ecosystem building up around it. SEIF: Yeah, and Discord reminds me of the old days of open sources like IRC, just with memes now. But because all of us come from the old IRC days, being on Discord and chatting around, et cetera, et cetera, just gives us this momentum back, gave us this momentum back, whereas Slack always felt a bit too businessy to me. JOE: Slack is like IRC with emoji. Discord is IRC with memes. SEIF: I would say Slack reminds me somehow of MSN Messenger, right? JOE: I feel like there's a huge slam on MSN Messenger here. SEIF: [laughs] What do you guys use internally, Slack or? I think you're using Slack, right? Or Teams. Don't tell me you're using Teams. JOE: No, we're using Slack. SEIF: Okay, good, because I shit talk. Like, there is this, I'll sh*t talk here–when I start talking about Teams, so...I remember that one thing Google did once, and that failed miserably. JOE: Google still has, like, seven active chat products. SEIF: Like, I think every department or every, like, group of engineers just uses one of them internally. I'm not sure. Never got to that point. But hey, who am I to judge? VICTORIA: I just feel like I end up using all of them, and then I'm just rotating between different tabs all day long. You maybe talked me into using Discord. I feel like I've been resisting it, but you got me with the memes. SEIF: Yeah, it's definitely worth it. It's more entertaining. More noise, but more entertaining. 
You feel it's alive, whereas Slack is...also because there's no, like, history is forever. So, you always go back, and you're like, oh my God, what the hell is this? VICTORIA: Yeah, I have, like, all of them. I'll do anything. SEIF: They should be using Axiom in the background. Just send data to Axiom; we can keep your chat history. VICTORIA: Yeah, maybe. I'm so curious because, you know, you mentioned something about how you realized that it didn't matter really how cool the tech was if the product packaging wasn't also appealing to people. Because you seem really excited about what you've built. So, I'm curious, so just tell us a little bit more about how you went about trying to, like, promote this thing you built. Or was, like, the continuous feedback really early on, or how did that all kind of come together? SEIF: The continuous feedback helped us with performance, but actually getting people to sign up and pay money it started early on. But with Vercel, it kind of skyrocketed, right? And that's mostly because we went with the whole zero-config approach where it's just literally two clicks. And all of a sudden, Vercel is sending your data to Axiom, and that's it. We will create [inaudible 16:33]. And we worked very closely with Vercel to do this, to make this happen, which was awesome. Like, yeah, hats off to them. They were fantastic. And just two clicks, three clicks away, and all of a sudden, we created Axiom organization for you, the data set for you. And then we're sending it...and the data from Vercel is being forwarded to it. I think that packaging was so simple that it made people try it out quickly. And then, the experience of actually using Axiom was sticky, so they continued using it. And then the price was so low because we give 500 gigs for free, right? You send us 500 gigs a month of logs for free, and we don't care. And you can start off here with one terabyte for 25 bucks. So, people just start signing up. 
Now, before that, it was five terabytes a month for $99, and then we changed the plan. But yeah, it was cheap enough, so people just start sending us more and more and more data eventually. They weren't thinking...we changed the way people start thinking of “what am I going to send to Axiom” or “what am I going to send to my logs provider or log storage?” To how much more can I send? And I think that's what we wanted to reach. We wanted people to think, how much more can I send? JOE: You mentioned latency and cost. I'm curious about...the other big challenge we've seen with observability platforms, including logs, is cardinality of labels. Was there anything you had to sacrifice upfront in terms of cardinality to manage either cost or volume? SEIF: No, not really. Because the way we designed it was that we should be able to deal with high cardinality from scratch, right? I mean, there's open-source ways of doing, like, if you look at how, like, a column store, if you look at a column store and every dimension is its own column, it's just that becomes, like, you can limit on the amount of columns you're creating, but you should never limit on the amount of different values in a column could be. So, if you're having something like stat tags, right? Let's say hosting, like, hostname should be a column, but then the different hostnames you have, we never limit that. So, the cardinality on a value is something that is unlimited for us, and we don't really see it in cost. It doesn't really hit us on cost. It reflects a bit on compression if you get into technical details of that because, you know, high cardinality means a lot of different data. So, compression is harder, but it's not repetitive. 
But then if you look at, you know, oh, I want to send a lot of different types of fields, not values but fields, so you have hostname, and latency, and whatnot, et cetera, et cetera, yeah, that's where limitation starts because then they have...it's like you're going to a wide range of...and a wider dimension. But even that, we, yeah, we can deal with thousands at this point. And we realize, like, most people will not need more than three or four thousand. It's like a Postgres table. You don't need more than 3,000 to 4,000 columns; else, you know, you're doing a lot. JOE: I think it's actually pretty compelling in terms of cost, though. Like, that's one of the things we've had to be most careful about in terms of containing cost for metrics and logs is, a lot of providers will...they'll either charge you based on the number of unique metric combinations or the performance suffers greatly. Like, we've used a lot of Prometheus-based solutions. And so, when we're working with developers, even though they don't need more than, you know, a few dozen metric combinations most of the time, it's hard for people to think of what they need upfront. It's much easier after you deploy it to be able to query your data and slice it retroactively based on what you're seeing. SEIF: That's the detail. When you say we're using Prometheus, a lot of the metrics tools out there are using, just like Prometheus, are using the Gorilla data structure. And the Gorilla data structure was never designed to deal with high cardinality labels. So, basically, to put it in a simple way, every combination of tags you send for metrics is its own file on disk. That's, like, the very simple way of explaining this. And then, when you're trying to search through everything, right? And you have a lot of these combinations. I actually have to get all these files from this conversion back together, you know, and then they're chunked, et cetera. So, it's a problem. 
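The contrast drawn above, between the number of distinct tag combinations and the number of columns, can be illustrated with a small sketch. It is a toy model under stated assumptions, not how Axiom, Prometheus, or any Gorilla-based store is actually implemented: the event fields, the host names, and the helpers `series_count` and `column_count` are all hypothetical, and "one series per tag combination" follows the simplified explanation given in the conversation.

```python
from itertools import product


def series_count(events, label_keys):
    """Count distinct label combinations. In the simplified model from the
    conversation, each combination is a separate series the store must track."""
    return len({tuple(event[key] for key in label_keys) for event in events})


def column_count(events):
    """Count distinct field names, i.e. the columns a column-oriented
    event store would need, regardless of how many values each one holds."""
    return len({key for event in events for key in event})


# Hypothetical data: 3 hosts x 2 status codes, 100 events per pair,
# each event carrying its own unique timestamp.
events = []
timestamp = 0
for host, status in product(["web-1", "web-2", "web-3"], [200, 500]):
    for _ in range(100):
        events.append({"hostname": host, "status": status, "timestamp": timestamp})
        timestamp += 1

bounded = series_count(events, ["hostname", "status"])
unbounded = series_count(events, ["hostname", "status", "timestamp"])
columns = column_count(events)

print(f"series with hostname+status labels: {bounded}")
print(f"series once timestamp is a label:   {unbounded}")
print(f"columns in a columnar layout:       {columns}")
```

With bounded labels the toy data produces a handful of series, while adding a unique timestamp as a label yields one series per event. The column count stays the same in both cases, which is the property the column-store description relies on.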
Generally, how metrics are doing it...most metrics products are using it, even VictoriaMetrics, et cetera. What they're doing is they're using either the Prometheus TSDB data structure, which is based on Gorilla. Influx was doing the same thing. They pivoted to using more and more like the ones we use, and Honeycomb uses, right? So, we might not be as fast on metrics side as these highly optimized. But then when it comes to high [inaudible 20:49], once we start dealing with high cardinality, we will be faster than those solutions. And that's on a very technical level. JOE: That's pretty cool. I realize we're getting pretty technical here. Maybe it's worth defining cardinality for the audience. SEIF: Defining cardinality to the...I mean, we just did that, right? JOE: What do you think, Victoria? Do you know what cardinality is now? [laughs] VICTORIA: All right. Now I'm like, do I know? I was like, I think I know what it means. Cardinality is, like, let's say you have a piece of data like an event or a transaction. SEIF: It's like the distinct count on a property that gives you the cardinality of a property. VICTORIA: Right. It's like how many pieces of information you have about that one event, basically, yeah. JOE: But with some traditional metrics stores, it's easy to make mistakes. For example, you could have unbounded cardinality by including response time as one of the labels -- SEIF: Tags. JOE: And then it's just going to -- SEIF: Oh, no, no. Let me give you a better one. I put in timestamp at some point in my life. JOE: Yeah, I feel like everybody has done that one. [laughter] SEIF: I've put a system timestamp at some point in my life. There was the actual timestamp, and there was a system timestamp that I would put because I wanted to know when the...because I couldn't control the timestamp, and the only timestamp I had was a system timestamp. 
I would always add the actual timestamp of when that event actually happened into a metric, and yeah, that did not scale. MID-ROLL AD: Are you an entrepreneur or start-up founder looking to gain confidence in the way forward for your idea? At thoughtbot, we know you're tight on time and investment, which is why we've created targeted 1-hour remote workshops to help you develop a concrete plan for your product's next steps. Over four interactive sessions, we work with you on research, product design sprint, critical path, and presentation prep so that you and your team are better equipped with the skills and knowledge for success. Find out how we can help you move the needle at tbot.io/entrepreneurs. VICTORIA: Yeah. I wonder if you could maybe share, like, a story about when it's gone wrong, and you've suddenly charged a lot of money [laughs] just to get information about what's happening in the system. Any, like, personal experiences with observability that kind of informed what you did with Axiom? SEIF: Oof, I have a very bad one, like, a very, very bad one. I used to work for a company. We had to deploy Elasticsearch on Windows Servers, and it was US-East-1. So, just a combination of Elasticsearch back in 2013, 2014 together with Azure and Windows Server was not a good idea. So, you see where this is going, right? JOE: I see where it's going. SEIF: Eventually, we had, like, we get all these problems because we used Elasticsearch and Kibana as our, you know, observability platform to measure everything around the product we were building. And funny enough, it cost us more than actually maintaining the infrastructure of the product. But not just that, it also kept me up longer because most of the downtimes I would get were not because of the product going down. It's because my Elasticsearch cluster started going down, and there's reasons for that. 
Because back then, Microsoft Azure thought that it's okay for any VM to lose connection with the rest of the VMs for 30 seconds per day. And then, all of a sudden, you have Elasticsearch with a split-brain problem. And there was a phase where I started getting alerted so much that back then, my partner threatened to leave me. So I bought a...what I think was a shock bracelet or a shock collar via Bluetooth, and I connected it to phone for any notification. And I bought that off Alibaba, by the way. And I would charge it at night, put it on my wrist, and go to sleep. And then, when alert happens, it will fully discharge the battery on me every time. JOE: Okay, I have to admit, I did not see where that was going. SEIF: Yeah, did that for a while; definitely did not save my relationship either. But eventually, that was the point where, you know, we started looking into other observability tools like Datadog, et cetera, et cetera, et cetera. And that's where the actual journey began, where we moved away from Elasticsearch and Kibana to look for something, okay, that we don't have to maintain ourselves and we can use, et cetera. So, it's not about the costs as much; it was just pain. VICTORIA: Yeah, pain is a real pain point, actual physical [chuckles] and emotional pain point [laughter]. What, like, motivates you to keep going with Axiom and to keep, like, the wind in your sails to keep working on it? SEIF: There's a couple of things. I love working with my team. So, honestly, I just wake up, and I compliment my team. I just love working with them. They're a lot of fun to work with. And they challenge me, and I challenge them back. And I upset them a lot. And they can't upset me, but I upset them. But I love working with them, and I love working with that team. And the other thing is getting, like, having this constant feedback from customers just makes you want to do more and, you know, close sales, et cetera. 
It's interesting, like, how I'm a very technical person, and I'm more interested in sales because sales means your product works, the product, the technical parts, et cetera. Because if technically it's not working, you can't build a product on top of it. And if you're not selling it, then what's the point? You only sell when the product is good, more or less, unless you're Oracle. VICTORIA: I had someone ask me about Oracle recently, actually. They're like, "Are you considering going back to it?" And I'm maybe a little allergic to it from having a federal consulting background [laughs]. But maybe they'll come back around. I don't know. We'll see. SEIF: Did you sell your soul back then? VICTORIA: You know, I feel like I just grew up in a place where that's what everyone did was all. SEIF: It was Oracle, IBM, or HP back in the day. VICTORIA: Yeah. Well, basically, when you're working on applications that were built in, like, the '80s, Oracle was, like, this hot, new database technology [laughs] that they just got five years ago. So, that's just, yeah, interesting. SEIF: Although, from a database perspective, they did a lot of the innovations. A lot of first innovations could have come from Oracle. From a technical perspective, they're ridiculous. I'm not sure from a product perspective how good they are. But I know their sales team is so big, so huge. They don't care about the product anymore. They can still sell. VICTORIA: I think, you know, everything in tech is cyclical. So, you know, if they have the right strategy and they're making some interesting changes over there, there's always a chance [laughs]. Certain use cases, I mean, I think that's the interesting point about working in technology is that you know, every company is a tech company. And so, there's just a lot of different types of people, personas, and use cases for different types of products. So, I wonder, you know, you kind of mentioned earlier that, like, everyone is interested in Axiom. 
But, you know, I don't know, are you narrowing the market? Or, like, how are you trying to kind of focus your messaging and your sales for Axiom? SEIF: I'm trying to focus on developers. So, we're really trying to focus on developers because the experience around observability is crap. It's stupid expensive. Sorry for being straightforward, right? And that's what we're trying to change. And we're targeting developers mainly. We want developers to like us. And we'll find all these different types of developers who are using it, and that's the interesting thing. And because of them, we start adding more and more features, like, you know, we added tracing, and now that enables, like, billions of events pushed through for, you know, again, for almost no money, again, $25 a month for a terabyte of data. And we're doing this with metrics next. And that's just to address the developers who have been giving us feedback and the market demand. I will sum it up, again, like, the experience is crap, and it's stupid expensive. I think that's the [inaudible 28:07] of observability is just that's how I would sum it up. VICTORIA: If you could go back in time and talk to yourself when you were still a developer, now that you're CTO, what advice would you give yourself? JOE: Besides avoiding shock collars. VICTORIA: [laughs] Yes. SEIF: Get people's feedback quickly so you know you're on the right track. I think that's very, very, very, very important. Don't just work in the dark, or don't go too long into stealth mode because, eventually, people catch up. Also, ship when you're 80% ready because 100% is too late. I think it's the same thing here. JOE: Ship often and early. SEIF: Yeah, even if it's not fully ready, it's still feedback. VICTORIA: Ship often and early and talk to people [laughs]. 
Just, do you feel like, as a developer, did you have the skills you needed to be able to get the most out of those feedback and out of those conversations you were having with people around your product? SEIF: I still don't think I'm good enough. You're just constantly learning, right? I just accepted I'm part of a team, and I have my contributions. But as an individual, I still don't think I know enough. I think there's more I need to learn at this point. VICTORIA: I wonder, what questions do you have for me or Joe? SEIF: How did you start your podcast, and why the name? VICTORIA: Oh, man, I hope I can answer. So, the podcast was started...I think it's, like, we're actually about to be at our 500th Episode. So, I've only been a host for the last year. Maybe Joe even knows more than I do. But what I recall is that one person at thoughtbot thought it would be a great idea to start a podcast, and then they did it. And it seems like the whole company is obsessed with robots. I'm not really sure where that came from. There used to be a tiny robot in the office, is what I remember. And people started using that as, like, the mascot. And then, yeah, that's it, that's the whole thing. SEIF: Was the robot doing anything useful or just being cute? JOE: It was just cute, and it's hard to make a robot cute. SEIF: Was it a real robot, or was it like a -- JOE: No, there was, at one point, a toy robot. The name...I actually forget the origin–origin of the name, but the name Giant Robots comes from our blog. So, we named the podcast the same as the blog: Giant Robots Smashing Into Other Giant Robots. SEIF: Yes, it's called transformers. VICTORIA: Yeah, I like it. It's, I mean, now I feel like -- SEIF: [laughs] VICTORIA: We got to get more, like, robot dogs involved [laughs] in the podcast. SEIF: Like, I wanted to add one thing when we talked about, you know, what gets me going. And I want to mention that I have a six-month-old son now. 
He definitely adds a lot of motivation for me to wake up in the morning and work. But he also makes me wake up regardless if I want to or not.
VICTORIA: Yeah, you said you had invented an alarm clock that never turns off. Never snoozes [laughs].
SEIF: Yes, absolutely.
VICTORIA: I have the same thing, but it's my dog. But he does snooze, actually. He'll just, like, get tired and go back to sleep [laughs].
SEIF: Oh, I have a question. Do dogs have a Tamagotchi phase? Because, like, my son, the first three months was like a Tamagotchi. It was easy to read him.
VICTORIA: Oh yeah, uh-huh.
SEIF: Noisy but easy.
VICTORIA: Yes, yes.
SEIF: Now, it's just like, yeah, I don't know, like, the last month he has opinions at six months. I think it's because I raised him in Europe. I should take him back to the Middle East [laughs]. No opinions.
VICTORIA: No, dogs totally have, like, a communication style, you know, I pretty much know what he, I mean, I can read his mind, obviously [laughs].
SEIF: Sure, but that's when they grow a bit. But what when they were very...when the dog was very young?
VICTORIA: Yeah, they, I mean, they also learn, like, your stuff, too. So, they, like, learn how to get you to do stuff or, like, I know she'll feed me if I'm sitting here [laughs].
SEIF: And how much is one dog year, seven years?
VICTORIA: Seven years.
SEIF: Seven years?
VICTORIA: Yeah, seven years?
SEIF: Yeah. So, basically, in one year, like, three months, he's already...in one month, he's, you know, seven months old. He's like, yeah.
VICTORIA: Yeah. In a year, they're, like, teenagers. And then, in two years, they're, like, full adults.
SEIF: Yeah. So, the first month is basically going through the first six months of a human being. So yeah, you pass...the first two days or three days are the Tamagotchi phase that I'm talking about.
VICTORIA: [chuckles] I read this book, and it was, like, to understand dogs, it's like, they're just like humans that are trying to, like, maximize the number of positive experiences that they have. So, like, if you think about that framing around all your interactions about, like, maybe you're trying to get your son to do something, you can be like, okay, how do I, like, I don't know, train him that good things happen when he does the things I want him to do? [laughs] That's kind of maybe manipulative but effective. So, you're not learning baby sign language? You're just, like, going off facial expressions?
SEIF: I started. I know how Mama looks like. I know how Dada looks like. I know how more looks like, slowly. And he already does this thing that I know that when he's uncomfortable, he starts opening and closing his hands. And when he's completely uncomfortable and basically that he needs to go sleep, he starts pulling his own hair.
VICTORIA: [laughs] I do the same thing [laughs].
SEIF: You pull your own hair when you go to sleep? I don't have that. I don't have hair.
VICTORIA: I think I do start, like, touching my head though, yeah [inaudible 33:04].
SEIF: Azure took the last bit of hair I had! Went away with Azure, Elasticsearch, and the shock collar.
VICTORIA: [laughs]
SEIF: I have none of them left. Absolutely nothing. I should sue Elasticsearch for this shit.
VICTORIA: [laughs] Let me know how that goes. Maybe there's more people who could join your lawsuit, you know, with a class action.
SEIF: [laughs] Yeah. Well, one thing I wanted to also just highlight is, right now, one of the things that also makes the company move forward is we realized that in a single domain, we proved ourselves very valuable to specific companies, right? So, that was a big, big thing, milestone for us. And now we're trying to move into a handful of domains and see which one of those work out the best for us. Does that make sense?
VICTORIA: Yeah.
And I'm curious: what are the biggest challenges or hurdles that you associate with that?
SEIF: At this point, you don't want just feedback. You want constructive criticism. Like, you want to work with people who will criticize the applic...and you iterate with them based on this criticism, right? They're just not happy about you and trying to create design partners. So, for us, it was very important to have these small design partners who can work with us to actually prove ourselves as valuable in a single domain. Right now, we need to find a way to scale this across several domains. And how do you do that without sacrificing? Like, how do you open into other domains without sacrificing the original domain you came from? So, there's a lot of things [inaudible 34:28]. And we are in the middle of this. Honestly, I Forrest Gumped my way through half of this, right? Like, I didn't know what I was doing. I had ideas. I think it's more of luck at this point. And I had luck. No, we did work. We did work a lot. We did sleepless nights and everything. But I think, in the last three years, we became more mature and started thinking more about product. And as I said, like, our CEO, Neil, and Dominic, our head of product, are putting everything behind being a product-led organization, not just a tech-led organization.
VICTORIA: That's super interesting. I love to hear that that's the way you're thinking about it.
JOE: I was just curious what other domains you're looking at pushing into if you can say.
SEIF: So, we are going to start moving into ETL a bit more. We're trying to see how we can fit in specific ML scenarios. I can't say more about the other, though.
JOE: Do you think you'll take the same approaches in terms of value proposition, like, low cost, good enough latency?
SEIF: Yes, that's definitely one thing. But there's also...so, this is the values we're bringing to the customer. But also, now, our internal values are different.
Now it's more of move with urgency and high velocity, as we said before, right? Think big, work small. The values in terms of values we're going to take to the customers it's the same ones. And maybe we'll add some more, but it's still going to be low-cost and large-scale. And, internally, we're just becoming more, excuse my French, agile. I hate that word so much. Should be good with Scrum.
VICTORIA: It's painful, but everyone knows what you're talking about [laughs], you know, like --
SEIF: See, I have opinions here about Scrum. I think Scrum should be only used in terms of iceScrum [inaudible 36:04], or something like that.
VICTORIA: Oh no [laughter]. Well, it's a Rugby term, right? Like, that's where it should probably stay.
SEIF: I did not know it's a rugby term.
VICTORIA: Yeah, so it should stay there, but --
SEIF: Yes [laughs].
VICTORIA: Yeah, I think it's interesting. Yeah, I like the being flexible. I like the just, like, continuous feedback and how you all have set up to, like, talk with your customers. Because you mentioned earlier that, like, you might open source some of your projects. And I'm just curious, like, what goes into that decision for you when you're going to do that? Like, what makes you think this project would be good for open source or when you think, actually, we need to, like, keep it?
SEIF: So, we open source libraries, right? We actually do that already. And some other big organizations use our libraries; even our competitors use our libraries, that we do. The whole product itself or at least a big part of the product, like database, I'm not sure we're going to open source that, at least not anytime soon. And if we open source, it's going to be at a point where the value-add it brings is nothing compared to how well our product is, right? So, if we can replace whatever's at the back with...the storage engine we have in the back with something else and the product doesn't get affected, that's when we open source it.
VICTORIA: That's interesting. That makes sense to me. But yeah, thank you for clarifying that. I just wanted to make sure to circle back. Since you have this big history in open source, yeah, I'm curious if you see...
SEIF: Burning me out?
VICTORIA: Burning you out, yeah [laughter]. Oh, that's a good question. Yeah, like, because, you know, we're about to be in October here. Do you have any advice or strategies as a maintainer for not getting burned out during the next couple of weeks besides, like, hide in a cave and without internet access [laughs]?
SEIF: Stay away from Reddit and Hacker News. That's my goal for October now because I'm always afraid of getting too attached to an idea, or too motivated, or excited by an idea that I drift away from what I am actually supposed to be doing.
VICTORIA: Last question is, is there anything else you would like to promote?
SEIF: Yeah, check out our website; I think it's at axiom.co. Check it out. Sign up. And comment on Discord and talk to me. I don't bite, sometimes grumpy, but that's just because of lack of sleep in the morning. But, you know, around midday, I'm good. And if you're ever in Berlin and you want to hang out, I'm more than willing to hang out.
VICTORIA: Whoo, that's awesome. Yeah, Berlin is great. I was there a couple of years ago but no plans to go back anytime soon, but maybe I'll keep that in mind.
You can subscribe to the show and find notes along with a complete transcript for this episode at giantrobots.fm. If you have questions or comments, email us at hosts@giantrobots.fm. And you could find me on Twitter @victori_ousg. And this podcast is brought to you by thoughtbot and produced and edited by Mandy Moore. Thanks for listening. See you next time.
Did you know thoughtbot has a referral program? If you introduce us to someone looking for a design or development partner, we will compensate you if they decide to work with us. More info on our website at tbot.io/referral.
Or you can email us at referrals@thoughtbot.com with any questions. Special Guests: Joe Ferris and Seif Lotfy.
Elasticsearch is the most established solution today for searching and analyzing large amounts of logs. However, it can be costly and complex to manage. Quickwit searches large amounts of append-only cloud data, such as logs or ledgers, in a fraction of the time and at significantly lower cost than Elasticsearch. In this episode, we interview Paul Masurel, one... The post Cloud-native Search with Paul Masurel appeared first on Software Engineering Daily.