Ready to take your tech career to the cloud and build those awe-inspiring systems you see? Then you're in the right place. This episode of AWS Bites is your blueprint for becoming a successful cloud architect. We're not just going to talk about it; we'll show you what worked for us, sharing the critical skills you need and a practical path to build your expertise. Whether you're a beginner or looking to take the next step, join us as we equip you with the knowledge and tools to make your mark as a cloud architect!

In this episode, we mentioned the following resources:
- Google Cloud Architecture Definition: https://cloud.google.com/learn/what-is-cloud-architecture
- Market data about the Cloud Professional Services market: https://www.gminsights.com/industry-analysis/cloud-professional-services-market
- EP 91 - Our Journeys into Software and AWS: https://awsbites.com/91-our-journeys-into-software-and-aws/
- AWS Well-Architected Framework: https://aws.amazon.com/architecture/well-architected/
- EP 68 - Are you well architected?: https://awsbites.com/68-are-you-well-architected/
- Cloud Design Patterns: https://learn.microsoft.com/en-us/azure/architecture/patterns/
- The Art of Scalability (book): https://www.amazon.com/Art-Scalability-Architecture-Organizations-Enterprise/dp/0134032802
- Enterprise Integration Patterns (book): https://www.amazon.com/Enterprise-Integration-Patterns-Designing-Deploying/dp/0321200683/
- Designing Data-Intensive Applications (book): https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321
- AWS Networking Essentials (free guide): https://aws.amazon.com/getting-started/aws-networking-essentials/
- Docker Curriculum (free): https://docker-curriculum.com/
- How Linux Works (book): https://www.amazon.com/How-Linux-Works-Brian-Ward/dp/1718500408/
- Exercism coding challenges: https://exercism.org/
- The Tangled Web (book): https://www.amazon.co.uk/Tangled-Web-Securing-Modern-Applications/dp/1593273886
- Low Level YouTube Channel: https://www.youtube.com/lowlevellearning
- AWS - Best Practices for Security, Identity, & Compliance: https://aws.amazon.com/architecture/security-identity-compliance/
- Supercommunicators (book): https://www.amazon.com/Supercommunicators-Unlock-Secret-Language-Connection/dp/0593862066/
- An Elegant Puzzle (book): https://www.amazon.com/Elegant-Puzzle-Systems-Engineering-Management/dp/1732265186/
- Staff Engineer (book): https://www.amazon.com/Staff-Engineer-Leadership-beyond-management/dp/1736417916/
- EP 58 - What can kitties teach us about AWS: https://awsbites.com/58-what-can-kitties-teach-us-about-aws/
- AWS User Groups: https://aws.amazon.com/developer/community/usergroups/

Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on X/Twitter:
- https://twitter.com/eoins
- https://twitter.com/loige
This interview was recorded for the GOTO Book Club. http://gotopia.tech/bookclub
Read the full transcription of the interview here: https://gotopia.tech/episodes/331
Maciej «MJ» Jedrzejewski - Tech Agnostic Architect & Author of "Master Software Architecture"
Artur Skowroński - Head of Java & Kotlin Engineering at VirtusLab & Editor of "JVM Weekly"
RESOURCES
MJ:
https://github.com/meaboutsoftware
https://www.linkedin.com/in/jedrzejewski-maciej
https://www.fractionalarchitect.io
Artur:
https://x.com/ArturSkowronski
https://www.linkedin.com/in/arturskowronski
https://www.jvm-weekly.com
DESCRIPTION
This conversation explores the evolution and complexities of software architecture, from early programming experiences to advanced design principles. It highlights practical gaps and the value of self-publishing, consulting, and addressing architectural pitfalls. Trends like microservices, serverless computing and AI are examined critically, emphasizing their limitations and supportive roles. Recommendations for further reading include Gregor Hohpe's "The Software Architect Elevator", Martin Kleppmann's "Designing Data-Intensive Applications", "Software Architecture: The Hard Parts" and Nick Tune's "Architecture Modernization", offering deep insights into effective software practices.
RECOMMENDED BOOKS
Maciej «MJ» Jedrzejewski • Master Software Architecture • https://leanpub.com/master-software-architecture
Gregor Hohpe • Platform Strategy • https://amzn.to/4cxfYdb
Gregor Hohpe • The Software Architect Elevator • https://amzn.to/3F6d2ax
Martin Kleppmann • Designing Data-Intensive Applications • https://amzn.to/3mk2Roj
Ford, Richards, Sadalage & Dehghani • Software Architecture: The Hard Parts • https://amzn.to/3QeMgjR
Nick Tune & Jean-Georges Perrin • Architecture Modernization • https://amzn.to/4b5ASiN
Sam Newman • Monolith to Microservices • https://amzn.to/2Nml96E
Vaughn Vernon & Tomasz Jaskula • Strategic Monoliths & Microservices • https://amzn.to/3AcUscj
CHANNEL MEMBERSHIP BONUS
Join this channel to get early access to videos & other perks: https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join
Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech
SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!
Welcome back everybody! This week we have another exciting podcast episode, this time with Claudio Jolowicz, well known for his "Hypermodern Python" article series, which he recently turned into a book. We dive into what Hypermodern Python means, uv's current quick rise, Nox, 80/20 tooling for (beginner) Python devs, Rust, mindset, and last but not least, some good book tips. We hope you enjoy this interview with Claudio and make sure to check out his great materials.

Chapters:
00:00 Intro episode
01:42 Intro Claudio
02:44 Music background
04:21 Hypermodern Python
08:20 uv rapid development
10:20 Making uv the default tool
12:56 Nox for local CI
15:20 Hypermodern Python, the book
19:57 80/20 tooling for beginners
22:37 PDM ad segment
23:03 uv making it easier
25:02 Cloudflare and Rust
27:33 Developer mindset and persistence
30:23 Book tips
32:25 Design patterns and TDD
33:40 Monkeypatching and dependency injection
37:12 Final CTA and wrap up

Links:
- Hypermodern Python article series
- Hypermodern Python book
- NOX vs TOX – WHAT are they for & HOW do you CHOOSE?
Summary Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer the process becomes more challenging. Sriram Panyam has been involved in several projects that required migration of large volumes of data in high traffic environments. In this episode he shares some of the valuable lessons that he learned about how to make those projects successful. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. This episode is supported by Code Comments, an original podcast from Red Hat. As someone who listens to the Data Engineering Podcast, you know that the road from tool selection to production readiness is anything but smooth or straight. In Code Comments, host Jamie Parker, Red Hatter and experienced engineer, shares the journey of technologists from across the industry and their hard-won lessons in implementing new technologies. I listened to the recent episode "Transforming Your Database" and appreciated the valuable advice on how to approach the selection and integration of new databases in applications and the impact on team dynamics. There are 3 seasons of great episodes and new ones landing everywhere you listen to podcasts. 
Search for "Code Comments" in your podcast player or go to dataengineeringpodcast.com/codecomments (https://www.dataengineeringpodcast.com/codecomments) today to subscribe. My thanks to the team at Code Comments for their support. Your host is Tobias Macey and today I'm interviewing Sriram Panyam about his experiences conducting large scale data migrations and the useful strategies that he learned in the process Interview Introduction How did you get involved in the area of data management? Can you start by sharing some of your experiences with data migration projects? As you have gone through successive migration projects, how has that influenced the ways that you think about architecting data systems? How would you categorize the different types and motivations of migrations? How does the motivation for a migration influence the ways that you plan for and execute that work? Can you talk us through one or two specific projects that you have taken part in? Part 1: The Triggers Section 1: Technical Limitations triggering Data Migration Scaling bottlenecks: Performance issues with databases, storage, or network infrastructure Legacy compatibility: Difficulties integrating with modern tools and cloud platforms System upgrades: The need to migrate data during major software changes (e.g., SQL Server version upgrade) Section 2: Types of Migrations for Infrastructure Focus Storage migration: Moving data between systems (HDD to SSD, SAN to NAS, etc.) Data center migration: Physical relocation or consolidation of data centers Virtualization migration: Moving from physical servers to virtual machines (or vice versa) Section 3: Technical Decisions Driving Data Migrations End-of-life support: Forced migration when older software or hardware is sunsetted Security and compliance: Adopting new platforms with better security postures Cost Optimization: Potential savings of cloud vs.
on-premise data centers Part 2: Challenges (and Anxieties) Section 1: Technical Challenges Data transformation challenges: Schema changes, complex data mappings Network bandwidth and latency: Transferring large datasets efficiently Performance testing and load balancing: Ensuring new systems can handle the workload Live data consistency: Maintaining data integrity while updates occur in the source system Minimizing Lag: Techniques to reduce delays in replicating changes to the new system Change data capture: Identifying and tracking changes to the source system during migration Section 2: Operational Challenges Minimizing downtime: Strategies for service continuity during migration Change management and rollback plans: Dealing with unexpected issues Technical skills and resources: In-house expertise/data teams/external help Section 3: Security & Compliance Challenges Data encryption and protection: Methods for both in-transit and at-rest data Meeting audit requirements: Documenting data lineage & the chain of custody Managing access controls: Adjusting identity and role-based access to the new systems Part 3: Patterns Section 1: Infrastructure Migration Strategies Lift and shift: Migrating as-is vs. modernization and re-architecting during the move Phased vs. big bang approaches: Tradeoffs in risk vs. disruption Tools and automation: Using specialized software to streamline the process Dual writes: Managing updates to both old and new systems for a time Change data capture (CDC) methods: Log-based vs. 
trigger-based approaches for tracking changes Data validation & reconciliation: Ensuring consistency between source and target Section 2: Maintaining Performance and Reliability Disaster recovery planning: Failover mechanisms for the new environment Monitoring and alerting: Proactively identifying and addressing issues Capacity planning and forecasting growth to scale the new infrastructure Section 3: Data Consistency and Replication Replication tools - strategies and specialized tooling Data synchronization techniques, e.g. pros and cons of different methods (incremental vs. full) Testing/Verification Strategies for validating data correctness in a live environment Implications of large scale systems/environments Comparison of interesting strategies: DBLog, Debezium, Databus, GoldenGate, etc. What are the most interesting, innovative, or unexpected approaches to data migrations that you have seen or participated in? What are the most interesting, unexpected, or challenging lessons that you have learned while working on data migrations? When is a migration the wrong choice? What are the characteristics or features of data technologies and the overall ecosystem that can reduce the burden of data migration in the future? Contact Info LinkedIn (https://www.linkedin.com/in/srirampanyam/) Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story. Links DagKnows (https://dagknows.com) Google Cloud Dataflow (https://cloud.google.com/dataflow) Seinfeld Risk Management (https://www.youtube.com/watch) ACL == Access Control List (https://en.wikipedia.org/wiki/Access-control_list) LinkedIn Databus - Change Data Capture (https://github.com/linkedin/databus) Espresso Storage (https://engineering.linkedin.com/data-replication/open-sourcing-databus-linkedins-low-latency-change-data-capture-system) HDFS (https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html) Kafka (https://kafka.apache.org/) Postgres Replication Slots (https://www.postgresql.org/docs/current/logical-replication.html) Queueing Theory (https://en.wikipedia.org/wiki/Queueing_theory) Apache Beam (https://beam.apache.org/) Debezium (https://debezium.io/) Airbyte (https://airbyte.com/) Fivetran (https://fivetran.com) Designing Data-Intensive Applications (https://amzn.to/4aAztR1) by Martin Kleppmann (https://martin.kleppmann.com/) (affiliate link) Vector Databases (https://en.wikipedia.org/wiki/Vector_database) Pinecone (https://www.pinecone.io/) Weaviate (https://weaviate.io/) LAMP Stack (https://en.wikipedia.org/wiki/LAMP_(software_bundle)) Netflix DBLog (https://arxiv.org/abs/2010.12597) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
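The dual-writes pattern from the interview outline above can be sketched in a few lines. This is a toy, in-memory illustration (the `DualWriter` class and its stores are hypothetical, not Sriram's actual implementation; production systems would pair this with CDC or a transactional outbox):

```python
class DualWriter:
    """Write to both the old and new store during a migration window.

    The old store stays the source of truth; a failed write to the new
    store is queued for later reconciliation instead of failing the request.
    """

    def __init__(self, old_store, new_store):
        self.old_store = old_store
        self.new_store = new_store
        self.reconcile_queue = []  # keys whose new-store write failed

    def put(self, key, value):
        self.old_store[key] = value  # source of truth: must succeed
        try:
            self.new_store[key] = value
        except Exception:
            self.reconcile_queue.append(key)

    def reconcile(self):
        """Re-copy missed keys, then validate source/target consistency."""
        for key in self.reconcile_queue:
            self.new_store[key] = self.old_store[key]
        self.reconcile_queue.clear()
        return all(self.new_store.get(k) == v
                   for k, v in self.old_store.items())
```

The validation step at the end mirrors the "data validation & reconciliation" item in the outline: the migration is only done when source and target agree.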
Source from Audible
Summary Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Leverage Datafold's fast cross-database data diffing and Monitoring to test your replication pipelines automatically and continuously. Validate consistency between source and target at any scale, and receive alerts about any discrepancies. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold (https://www.dataengineeringpodcast.com/datafold). Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster (https://www.dataengineeringpodcast.com/dagster) today to get started. Your first 30 days are free! Data lakes are notoriously complex. 
For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm interviewing Oren Eini about the work of designing and building a NoSQL database engine Interview Introduction How did you get involved in the area of data management? Can you describe what constitutes a NoSQL database? How have the requirements and applications of NoSQL engines changed since they first became popular ~15 years ago? What are the factors that convince teams to use a NoSQL vs. SQL database? NoSQL is a generalized term that encompasses a number of different data models. How does the underlying representation (e.g. document, K/V, graph) change that calculus? How have the evolution in data formats (e.g. N-dimensional vectors, point clouds, etc.) changed the landscape for NoSQL engines? When designing and building a database, what are the initial set of questions that need to be answered? How many "core capabilities" can you reasonably design around before they conflict with each other? How have you approached the evolution of RavenDB as you add new capabilities and mature the project? 
What are some of the early decisions that had to be unwound to enable new capabilities? If you were to start from scratch today, what database would you build? What are the most interesting, innovative, or unexpected ways that you have seen RavenDB/NoSQL databases used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on RavenDB? When is a NoSQL database/RavenDB the wrong choice? What do you have planned for the future of RavenDB? Contact Info Blog (https://ayende.com/blog/) LinkedIn (https://www.linkedin.com/in/ravendb/?originalSubdomain=il) Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
Links RavenDB (https://ravendb.net/) RSS (https://en.wikipedia.org/wiki/RSS) Object Relational Mapper (ORM) (https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping) Relational Database (https://en.wikipedia.org/wiki/Relational_database) NoSQL (https://en.wikipedia.org/wiki/NoSQL) CouchDB (https://couchdb.apache.org/) Navigational Database (https://en.wikipedia.org/wiki/Navigational_database) MongoDB (https://www.mongodb.com/) Redis (https://redis.io/) Neo4J (https://neo4j.com/) Cassandra (https://cassandra.apache.org/_/index.html) Column-Family (https://en.wikipedia.org/wiki/Column_family) SQLite (https://www.sqlite.org/) LevelDB (https://github.com/google/leveldb) Firebird DB (https://firebirdsql.org/) fsync (https://man7.org/linux/man-pages/man2/fsync.2.html) Esent DB? (https://learn.microsoft.com/en-us/windows/win32/extensible-storage-engine/extensible-storage-engine-managed-reference) KNN == K-Nearest Neighbors (https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) RocksDB (https://rocksdb.org/) C# Language (https://en.wikipedia.org/wiki/C_Sharp_(programming_language)) ASP.NET (https://en.wikipedia.org/wiki/ASP.NET) QUIC (https://en.wikipedia.org/wiki/QUIC) Dynamo Paper (https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) Database Internals (https://amzn.to/49A5wjF) book (affiliate link) Designing Data-Intensive Applications (https://amzn.to/3JgCZFh) book (affiliate link) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
This interview was recorded at GOTO Amsterdam for GOTO Unscripted. http://gotopia.tech
Read the full transcription of this interview here
Martin Kleppmann - Researcher at the Technical University of Munich & Author of "Designing Data-Intensive Applications"
Jesse Anderson - Managing Director of Big Data Institute, Host of The Data Dream Team Podcast
RESOURCES
Jesse Anderson: https://youtu.be/cWSCI1LpoGY
Martin Kleppmann: https://youtu.be/esMjP-7jlRE
Prag. Dave Thomas: https://youtu.be/ug8XX2MpzEw
https://automerge.org
Martin:
https://martin.kleppmann.com
https://twitter.com/martinkl
https://nondeterministic.computer/@martin
https://linkedin.com/in/martinkleppmann
Jesse:
https://twitter.com/jessetanderson
https://www.jesse-anderson.com
https://sodapodcast.libsyn.com/site
https://linkedin.com/in/jessetanderson
https://www.jesse-anderson.com/category/blog
DESCRIPTION
Jesse Anderson, director at Big Data Institute, and Martin Kleppmann, author of "Designing Data-Intensive Applications", explore the evolving data landscape together. They start with the origins of Martin's book, emphasizing the crucial art of asking the right questions. Martin unveils industry shifts since 2017, spotlighting the transformative rise of cloud services. The conversation then takes a twist as Martin delves into academia, sharing insights on local-first collaboration software and the fascinating world of Automerge. Aspiring software engineers are treated to some advice on how to navigate the delicate balance between simplicity and adaptability. The interview concludes with a glimpse into diverse career paths in the dynamic realm of data engineering, making it a must-watch for professionals at every stage of their journey.
RECOMMENDED BOOKS
Martin Kleppmann • Designing Data-Intensive Applications
Martin Kleppmann • Secret Colors: A Gentle Introduction to Cryptography
Jesse Anderson • Data Teams
Jesse Anderson • Data Engineering Teams
Jesse Anderson • The Ultimate Guide to Switching Careers to Big Data
Viktor Gamov, Dylan Scott & Dave Klein • Kafka in Action
Fabian Hueske & Vasiliki Kalavri • Stream Processing with Apache Flink
We read Ch. 12 of "Designing Data-Intensive Applications", commonly known as the "DDIA" book, and shared our impressions. Amazon.co.jp (English edition) Amazon.co.jp (Japanese edition) Designing Data-Intensive Applications Lambda Architecture "Apache Hudi: a trump card for the post-Lambda architecture?" Apache Hudi Redux.js Domain Dependence (Nassim Nicholas Taleb) Hebbian theory (= "fire together, wire together") Kengo Kuma "監視資本主義: デジタル社会がもたらす光と影" ("The Social Dilemma", Netflix) Dopamine, Smartphones & You (Harvard University) Google Mistakenly Tags Black People as 'Gorillas' (WSJ)
We read Ch. 11 of "Designing Data-Intensive Applications", commonly known as the "DDIA" book, and shared our impressions. Amazon.co.jp (English edition) Amazon.co.jp (Japanese edition) Designing Data-Intensive Applications YouTube - Greg Young, CQRS and Event Sourcing What is Change Data Capture? - Confluent Beamery Hacking Talent - Kafka "On reading for IT engineers" - J-Tech Japan, Inc. Tomohisa Takaoka LinkedIn Twitter "Implementing event sourcing in a functional style and teaching it" "Design and concepts of a home-grown event sourcing framework in C# and Cosmos DB" Ken's related work: log platform for IoT devices (Kinesis Stream), real-time ad delivery logs (Kinesis Stream), Platform Engineer: application logs (Apache Kafka)
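For listeners new to the event sourcing idea in the Greg Young material linked above, here is a toy sketch of rebuilding state by replaying an append-only event log (the `Account` class is a hypothetical illustration, not code from the book or the talk):

```python
class Account:
    """Event-sourced account: state is derived by replaying events."""

    def __init__(self):
        self.events = []  # append-only log of (event_type, amount)

    def deposit(self, amount):
        self.events.append(("deposited", amount))

    def withdraw(self, amount):
        self.events.append(("withdrew", amount))

    def balance(self):
        # Current state is a fold over the full event history.
        total = 0
        for kind, amount in self.events:
            total += amount if kind == "deposited" else -amount
        return total
```

Because the log is the source of truth, the same events can feed other read models (the CQRS side) or a change-data-capture pipeline downstream.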
We read Ch. 9 of "Designing Data-Intensive Applications", commonly known as the "DDIA" book, and shared our impressions. Amazon.co.jp (English edition) Amazon.co.jp (Japanese edition) Designing Data-Intensive Applications DDIA reading notes (Chapter 10) project-tsurugi/tsurugidb: Tsurugi - next generation RDB for the new era "Comments and proposals on naming the components" · Issue #4 · project-tsurugi/tsurugidb The Sushi Principle - datasapiens
We read Ch. 9 of "Designing Data-Intensive Applications", commonly known as the "DDIA" book, and shared our impressions. Amazon.co.jp (English edition) Amazon.co.jp (Japanese edition) Designing Data-Intensive Applications Consistency levels in Azure Cosmos DB "Explaining what makes Amazon Aurora so advanced, since nobody else does" - Qiita, @kumagi A Critique of the CAP Theorem "An introduction to Neo4j Causal Clustering" by Ken Wagatsuma Triplink (Kenta's personal side project)
We read Ch. 8 of "Designing Data-Intensive Applications", commonly known as the "DDIA" book, and shared our impressions. Amazon.co.jp (English edition) Amazon.co.jp (Japanese edition) Designing Data-Intensive Applications Understanding network failures in data centers: measurement, analysis, and implications Trash Day: Coordinating Garbage Collection in Distributed Systems Predictable low latency "Spanner: TrueTime and external consistency" | Google Cloud The Byzantine Generals Problem A technical overview of Azure Cosmos DB | Azure Blog
We read Ch. 7 of "Designing Data-Intensive Applications", commonly known as the "DDIA" book, and shared our impressions. Amazon.co.jp (English edition) Amazon.co.jp (Japanese edition) Designing Data-Intensive Applications Multiversion concurrency control (MVCC) Optimistic concurrency control (OCC) What are some common challenges and pitfalls of MVCC and OCC in your database projects? Google Cloud Spanner: About transactions Neo4j Page Cache Introduced MySQL 8.0: MVCC of Large Objects in InnoDB PostgreSQL Concurrency with MVCC Cockroach Labs: Transaction Retry Error Example
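The optimistic concurrency control (OCC) idea covered in this chapter can be shown with a toy version-check store (a hypothetical single-process sketch, not from the book; real databases implement this with transactions, snapshots, and retry-on-abort, as in the Cockroach Labs link above):

```python
class VersionedStore:
    """Optimistic concurrency: every key carries a version number.

    A write succeeds only if the caller presents the version it read;
    otherwise another writer got there first and the caller must retry.
    """

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        value, version = self._data.get(key, (None, 0))
        return value, version

    def write(self, key, value, expected_version):
        _, current = self._data.get(key, (None, 0))
        if current != expected_version:
            return False  # conflict: re-read and retry
        self._data[key] = (value, current + 1)
        return True
```

Two concurrent writers that both read version 0 will race: the first write bumps the version to 1, and the second, still holding version 0, is rejected instead of silently overwriting.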
We read Ch. 6 of "Designing Data-Intensive Applications", commonly known as the "DDIA" book, and shared our impressions. Amazon.co.jp (English edition) Amazon.co.jp (Japanese edition) Designing Data-Intensive Applications Apache Cassandra 3.0 Compound Primary Key Dynamic Secondary Hashing Citus v10.0 Sharding with Amazon Relational Database Service Hierarchical partition keys in Azure Cosmos DB Scaling out Postgres with the Citus open source shard rebalancer Tuning query performance with Azure Cosmos DB Request Units in Azure Cosmos DB
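The compound partition key material above comes down to hash-based routing: hash the partition key, take it modulo the shard count. A minimal sketch (the `shard_for` helper is hypothetical and ignores rebalancing schemes like consistent hashing that real systems use):

```python
import hashlib

def shard_for(partition_key: str, num_shards: int) -> int:
    """Route a row to a shard by hashing its partition key.

    A compound key (e.g. tenant_id + region) is concatenated into one
    string before hashing, so all rows sharing that key co-locate on
    the same shard and can be queried together.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

The fixed `% num_shards` step is also why naive hash sharding makes resharding painful, which is what the Citus shard rebalancer linked above is designed to address.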
We read Ch. 5 of "Designing Data-Intensive Applications", commonly known as the "DDIA" book, and shared our impressions. Amazon.co.jp (English edition) Amazon.co.jp (Japanese edition) Designing Data-Intensive Applications Amazon RDS Read Replicas "Understanding data-intensive application design in 30 minutes" - Data Engineering Study #18 Shopify Engineering "Read Consistency with Database Replicas"
We read Ch. 3 & 4 of "Designing Data-Intensive Applications", commonly known as the "DDIA" book, and shared our impressions. Amazon.co.jp (English edition) Amazon.co.jp (Japanese edition) Designing Data-Intensive Applications Database index Log-structured merge-tree (LSM-tree) B-tree protocolbuffers/protobuf msgpack.org Apache Avro Apache Thrift Online transaction processing (OLTP) Online analytical processing (OLAP)
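The log-structured merge-tree from this chapter can be caricatured in a few lines: writes land in an in-memory memtable, full memtables are flushed as immutable sorted runs, and reads check the memtable first, then runs from newest to oldest. A toy sketch (the `TinyLSM` class is hypothetical and omits the write-ahead log, compaction, and Bloom filters a real engine needs):

```python
import bisect

class TinyLSM:
    """Minimal log-structured merge idea: memtable + sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # immutable sorted (key, value) lists, newest last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: freeze the memtable as a sorted, immutable run.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

The binary search inside each run is the same reason SSTables are kept sorted on disk; compaction (not shown) merges old runs so lookups do not have to scan ever more of them.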
We read Ch. 1 & 2 of "Designing Data-Intensive Applications", commonly known as the "DDIA" book, and shared our impressions. Amazon.co.jp (English edition) Amazon.co.jp (Japanese edition) Designing Data-Intensive Applications "The real-time log aggregation platform behind ad delivery for Cookpad StoreTV" (2018) "Managing Kafka with Strimzi: Part 1" (2021) "Managing Kafka with Strimzi: Part 2 — Implementation" (2021) Relational database Document-oriented database Graph database
Drop the fear, not the tables. Chelsea Dole and Floor Drees join Claire Giordano and Pino de Candia on Path To Citus Con, the podcast for developers who love Postgres, to explore the app developer perspective. If you're an app developer, you're probably already using Postgres. Now what? What do you need to know? Are databases your best friend or your worst enemy? They talk about the steps to becoming more Postgres-savvy. Should you go depth-first or breadth-first in order to learn more about the underlying database? What are Postgres extensions and how do you go about adopting them? Find out more about the strength of what Floor calls "boring technology." Finally, both guests tell stories of their non-traditional entries into Postgres that led to their deep work with databases today.

Links mentioned in this episode:
- Fintech startup where Chelsea works, Brex: https://www.brex.com/
- Open source data platform where Floor works, Aiven: https://aiven.io/
- "Mission-Critical PostgreSQL Databases on Kubernetes" by Karen Jex at KubeCon Europe 2023: https://www.youtube.com/watch?v=_NBQ9JmOMko
- The Imposters Handbook by Rob Conery: https://bigmachine.io/products/the-imposters-handbook/
- Designing Data-Intensive Applications by Martin Kleppmann: https://www.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/
- Devopsdays Amsterdam: https://devopsdays.org/events/2023-amsterdam/welcome/
- Building Community in Open Source with Floor Drees on the Last Week in AWS podcast: https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/building-community-in-open-source-with-floor-drees/
- pg_stat_statements: https://www.postgresql.org/docs/current/pgstatstatements.html
- PostGIS: https://postgis.net/
- "Postgres tips for optimizing Django & Python performance, from my PyCon workshop" by Louise Grandjonc: https://www.citusdata.com/blog/2020/05/20/postgres-tips-for-django-and-python/
- Video of Louise's PyCon talk, Optimize Django & Python performance with Postgres superpowers: https://youtu.be/dyBLGjCQJHs
- Grafana: https://grafana.com/
- pganalyze: https://pganalyze.com/
- auto_explain: https://www.postgresql.org/docs/current/auto-explain.html
- EXPLAIN ANALYZE in PostgreSQL: https://www.postgresql.org/docs/current/sql-explain.html
- psql: https://www.postgresguide.com/utilities/psql/
- Path To Citus Con Episode 05: My favorite ways to learn more about PostgreSQL with Grant Fritchey and Ryan Booz: https://pathtocituscon.transistor.fm/episodes/my-favorite-ways-to-learn-more-about-postgresql-with-grant-fritchey-and-ryan-booz
- Coffee Meets Bagel (dating app): https://coffeemeetsbagel.com/
We talked at length about "Designing Data-Intensive Applications", commonly known as the "DDIA" book. Amazon.co.jp (English edition) Amazon.co.jp (Japanese edition) Designing Data-Intensive Applications Martin Kleppmann Lambda Architecture Apache Samza Apache Avro Apache Kafka Apache Hadoop "Moving faster with data streams: The rise of Samza at LinkedIn" "Samza 1.0: Stream Processing at Massive Scale" Jay Kreps
Go 1.20.4 & 1.19.9 coming tomorrow
Conf42: Golang talks available online
Text marshaling & unmarshaling added to regexp package for 1.21
Jonathan's video about the proposal, acceptance, and change process
Blog post: WebSockets: Scale at Fractional Footprint in Go
Reddit question: Which books should I read as an experienced Go developer?
Shay's recommendation: Designing Data-Intensive Applications by Martin Kleppmann & Benjamin Lange
Jonathan's recommendation: Go Fundamentals by Mark Bates & Cory Lanou (Jonathan's review)
Blog post: FireScroll - A highly available multi-region KV database with massive read scalability
2023-04-25 Weekly News - Episode 193
Watch the video version on YouTube at https://youtube.com/live/JGnYhZM7kk0?feature=share
Hosts: Gavin Pickin - Senior Developer at Ortus Solutions, Grant Copley - Senior Developer at Ortus Solutions
Thanks to our Sponsor - Ortus Solutions, the makers of ColdBox, CommandBox, ForgeBox, TestBox and all your favorite box-es out there. A few ways to say thanks back to Ortus Solutions:
Like and subscribe to our videos on YouTube.
Help ORTUS reach for the stars - star and fork our repos. Star all of your GitHub Box dependencies from CommandBox with https://www.forgebox.io/view/commandbox-github
Subscribe to our podcast on your podcast apps and leave us a review.
Sign up for a free or paid account on CFCasts, which is releasing new content every week.
BOXLife store: https://www.ortussolutions.com/about-us/shop
Buy Ortus's books: 102 ColdBox HMVC Quick Tips and Tricks on Gumroad (http://gum.co/coldbox-tips); Learn Modern ColdFusion (CFML) in 100+ Minutes - free online at https://modern-cfml.ortusbooks.com/ or buy an ebook or paper copy at https://www.ortussolutions.com/learn/books/coldfusion-in-100-minutes
OR - join us for the 10th Into the Box - in person ONLY!!!
Patreon Support
We have 40 patrons: https://www.patreon.com/ortussolutions
News and Announcements
CF Summit West Announced
Las Vegas, October 2-4. Get your early bird passes now: session passes @ $99, professional passes @ $199, only till May 31st, 2023! Can you spot ME - Gavin - apparently I'm in 3 of the photos!
Call for Speakers is OPEN: https://cfsummit.adobeevents.com/
Into the Box - Hackathon added to Happy Box
During the first day of Into the Box Conference 2023, on May 18th, we're hosting a Happy Box Party where attendees can connect and network with one another. We're excited to announce that this year, we're introducing a new activity: a Hackathon team-up event!
This hackathon is an excellent opportunity for tech enthusiasts to come together, collaborate, share their skills and knowledge, and work on innovative projects that tackle real-world problems. We hope you'll join us for this exciting time! We are currently finalizing the topics for the hackathon and would love to hear your feedback. We want to make sure the topics we select are relevant to the interests and expertise of our attendees, so please take a moment to let us know which topics you would be most interested in working on.
https://www.ortussolutions.com/blog/what-would-you-like-to-hack-on-at-into-the-box-2023
Charlie's Blog Anniversaries
An interesting pair of anniversaries for my blog: 600 posts, over 17 years this month.
After I posted my last entry, I happened to notice that it was exactly my 600th post here at carehart.org/blog. How about that? And in that time I've had 3,645 comments from folks. I do write mostly for you all, so thanks! I also noticed that it marks my 17th year of blogging here, almost to the day, with my first entry posted this same week back then, Apr 15 2006. That's "pretty darn interesting", as Ray Camden might say. FWIW, I'd also blogged elsewhere--yes, on CF--prior to starting this one. And the first of those posts were in early 1998--so technically it's my 25th anniversary of blogging about CF. :-)
https://www.carehart.org/blog/2023/4/18/celebrating_600_posts_and_17_years
ICYMI - State of the CF Union 2023 Released
Help us find out the state of the CF Union - what versions of CFML engine do people use, what frameworks, tools, etc.
https://teratech.com/state-of-the-cf-union-2023-survey
https://www.youtube.com/watch?v=_dubo741aTc
New Releases and Updates
Webinars / Meetups and Workshops
Adobe ColdFusion Workshop: DevOps, CI/CD, and Pipelines
Wednesday, May 10, 2023. Time: 1:00 - 4:30pm ET; 10:00am - 1:30pm PT. Host: Carahsoft. Cost: no fee. Max CPE credits available: 4.2 credit hours (1 CPE credit is based on 50 minutes). Field of study: Information Technology.
This workshop is ideal for software engineers who are eager to build pipelines to automate their coding projects. Adobe ColdFusion developers are also encouraged to attend. This course will be beneficial for any professional developer who is looking to simplify their application architecture with Adobe ColdFusion and DevOps.
https://carahevents.carahsoft.com/Event/Details/358809-cpe
Adobe - Road to Fortuna Series: ColdFusion 2023 in Docker on Google Cloud Platform
May 23, 2023, 10 AM - 11 AM PT
During this GCP-centric webinar, Mark Takata will explore how to run a containerized ColdFusion 2023 server on Google Cloud Platform's Kubernetes-powered containerization system. He will demonstrate how the powerful new Google Cloud Platform features added to ColdFusion 2023 can help optimize application development, provisioning and delivery.
This will be the first time ColdFusion 2023 is shown running in containers publicly, and the session is designed to showcase the ease of working in this popular method of software delivery.
Speaker: Mark Takata - ColdFusion Technical Evangelist, Adobe
https://docker-gcp-coldfusion.meetus.adobeevents.com/
ICYMI - Exploring APIs: Building Applications with ColdFusion, REST, & GraphQL
Mark Takata. April 18, 2023, 10 AM - 11 AM PT
In this session, Mark Takata will demonstrate the power of ColdFusion's data access capabilities by building three different applications: a Google Translate clone, a low-code contacts manager, and an ETL workflow that integrates NoSQL with a relational database. Mark will use a combination of built-in ColdFusion tooling and freely available third-party integrations to build these applications, giving attendees valuable insights into ColdFusion's API and data access development capabilities. All code samples will be available on GitHub following the talk to help attendees kick-start their own versions of the apps.
https://exploring-apis-coldfusion.meetus.adobeevents.com/
Recording: https://www.youtube.com/watch?v=SNvg0CvLuiU
ICYMI - Mid Michigan ColdFusion Users Group - Ins and outs of CFSetup with Randy Brown of Michigan State
Tuesday, April 18th at 7pm EDT
Randy Brown from Michigan State showed us the ins and outs of CFSetup, a tool to assist administrators and DevOps with settings migration.
Recording: https://www.youtube.com/watch?v=hmF7mF_N9xw
Charlie presented on CFSetup in Sept 2022: https://www.carehart.org/presentations/#cfsetup_tool
ICYMI - Sac Interactive Meetup - Crash Course in Web Components with Nolan Erck
Wednesday, April 19th, 2023, 6:30PM PT
Web Components provide a modular way to build a consistent design system and user experience across your entire application. Instead of copy/pasting the same chunks of code into various places, you can have a JavaScript/HTML expert focus on getting the UX correct, without needing to worry about what's happening in the back-end at all. Web Components offer reusable functionality on the front-end, with native JavaScript. They are a great middle ground between standard request/response-based traditional web applications and apps that aren't quite ready to move to a full-on JavaScript SPA framework. Web Components themselves are 100% native JavaScript - no new libraries required!
Let's learn how Web Components work, then look at integrating them into a back-end like CFML or PHP -- I promise it's easier than you think!
https://www.meetup.com/sacinteractive/events/292762638/
Recording: TBD
CFCasts Content Updates
https://www.cfcasts.com
New features: search is now powered by ElasticSearch with drastically increased search relevance.
Recent releases:
Mastering CommandBox 5 - 3 new videos (JVM Args as Array, Request Dumper, X-Forwarded-For Support) - https://cfcasts.com/series/mastering-commandbox-5
2023 ForgeBox Module of the Week Series - 1 new video - https://cfcasts.com/series/2023-forgebox-modules-of-the-week
2023 VS Code Hint, Tip and Trick of the Week Series - 1 new video - https://cfcasts.com/series/2023-vs-code-hint-tip-and-trick-of-the-week
Just added 2019 Into the Box videos - watch sessions from previous ITB years:
Into the Box 2022 - https://cfcasts.com/series/itb-2022
Into the Box 2021 - https://cfcasts.com/series/into-the-box-2021
Into the Box 2020 - https://cfcasts.com/series/itb-2020
Into the Box 2019 - https://cfcasts.com/series/into-the-box-2019
Coming soon: Brad with more CommandBox videos, more ForgeBox and VS Code podcast snippet videos, ColdBox Elixir from Eric, and Getting Started with Inertia.js from Eric.
Conferences and Training
VS Code Day - April 26, 2023, 10AM - 4PM PT - TOMORROW!!
VS Code Day is our annual event where you'll learn how to elevate your development workflow using the latest and greatest features of VS Code. You'll hear from members of the VS Code team and other VS Code experts on topics like AI-powered programming with GitHub Copilot, coding anywhere with remote development, bringing data science to the cloud, and more.
Whether you're just starting out or you're an experienced developer, join us on April 26, 2023 for a day focused on the editor that lets you code anything, cross-platform and free!
https://learn.microsoft.com/en-us/events/learn-events/vs-code-day-2023/
J on the Beach
Bringing DevOps, devs and data scientists together around Big Data
May 10-12, 2023, Malaga, Spain
https://www.jonthebeach.com/
Ortus profile: https://www.jonthebeach.com/jobs/54/Ortus%20Solutions
VueJS Live
May 12 & 15, 2023, online + London, UK
Code / Create / Communicate. 35 speakers, 10 workshops, 10,000+ joining online globally, 300 luckies meeting in London.
https://vuejslive.com/
Into the Box 2023 - 10th Edition
May 17-19, 2023. The conference will be held in The Woodlands (Houston), Texas. This year we will continue the tradition of training, offering a pre-conference hands-on training day on May 17th and our live mariachi band party! However, we are back to our spring schedule and beautiful weather in The Woodlands! Also, 2023 marks our 10-year anniversary, so we might have two live bands and much more!!! IN PERSON ONLY.
Website launched: https://intothebox.org
https://itb2023.eventbrite.com/
1 month away - can't wait! Watch videos from the last 4 years on CFCasts:
Into the Box 2022 - https://cfcasts.com/series/itb-2022
Into the Box 2021 - https://cfcasts.com/series/into-the-box-2021
Into the Box 2020 - https://cfcasts.com/series/itb-2020
Into the Box 2019 - https://cfcasts.com/series/into-the-box-2019
VueConf.us
New Orleans, LA, May 24-26, 2023. Jazz. Code. Vue.
Workshop day: May 24. Main conference: May 25-26.
https://vueconf.us/
CFCamp - Pre-Conference - Ortus has 4 Trainings
June 21st, 2023, held at the CFCamp venue at the Marriott Hotel Munich Airport in Freising.
TestBox: Getting started with BDD-TDD Oh My!
Coldbox 7 - from zero to hero
Legacy Code Conversion To The Modern World
CommandBox Server Deployment for the Modern Age
https://www.cfcamp.org/pre-conference.html
CFCamp
June 22-23rd, 2023, Marriott Hotel Munich Airport, Freising
Check out all the great sessions: https://www.cfcamp.org/sessions.html
Check out all the great speakers: https://www.cfcamp.org/cfcamp-conference-2023/speakers.html
Register now: https://www.cfcamp.org/
Adobe CF Summit West
Las Vegas, October 2-4. Get your early bird passes now: session passes @ $99, professional passes @ $199, only till May 31st, 2023! Can you spot ME - Gavin - apparently I'm in 3 of the photos!
Call for Speakers is OPEN: https://cfsummit.adobeevents.com/
More conferences
Need more conferences? This site has a huge list of conferences for almost any language/community: https://confs.tech/
Blogs, Tweets, and Videos of the Week
4/18/23 - Blog - Charlie Arehart - New updates released for Java 8, 11, 17, and 20 as of Apr 18 2023
It's that time again: there are new JVM updates released today (Apr 18, 2023) for the current long-term support (LTS) releases of Oracle Java, 8, 11, and 17, as well as the current interim update 20. TLDR: the new updates are 1.8.0_371 (aka 8u371), 11.0.19, 17.0.7, and 20.0.1, respectively. For more on each of them, including what changed and the security fixes they each contain (including their CVE scores regarding urgency of concerns), see the Oracle resources I list below. Oracle calls them "critical patch updates" (yep, CPU), but they are in fact scheduled quarterly updates, so that "critical" nomenclature may sometimes be a bit overstated.
And as is generally the case with these Java updates, most of them have the same changes and fixes across the versions, though not always.
https://www.carehart.org/blog/2023/4/18/java_updates_Apr_2023
4/18/23 - Blog - Charlie Arehart - An interesting pair of anniversaries for my blog: 600 posts, over 17 years this month
https://www.carehart.org/blog/2023/4/18/celebrating_600_posts_and_17_years
4/18/23 - Blog - Ortus Solutions - Get ready to code; Into The Box's 10th edition is just 1 month away!
Our biggest Into the Box conference is around the corner. Big shoutout to everyone already signed up, and of course to all the amazing leading companies supporting the event; we're excited to learn alongside you at our 10th anniversary! If you haven't registered yet, what are you waiting for? Get 15% off your conference-only and all-access tickets using the code lastmonthitb, or click on the link below!
https://www.ortussolutions.com/blog/just-one-month-for-into-the-box-2023
4/19/23 - Podcast - Working Code Podcast - Episode 123: Negative 10x Developers
In episode 58, we weighed in on whether or not 10x engineers actually exist. On today's episode, we go hard in the other direction, talking about the much less mythical -10x engineer: those engineers who seem to actively work in opposition to the greater good, holding unnecessary meetings and flooding the team with a massive amount of documentation. This discussion was directly inspired by the post "How to be a -10x engineer".
https://www.bennadel.com/blog/4451-working-code-podcast-episode-123-negative-10x-developers.htm
4/24/23 - Blog - Ortus Solutions - Into the Box - Hackathon added to Happy Box
https://www.ortussolutions.com/blog/what-would-you-like-to-hack-on-at-into-the-box-2023
4/24/23 - Blog - Ben Nadel - John Gall's Law On Building Complex Systems
Years ago, in the book Designing Data-Intensive Applications, I came across "Gall's Law". It's a hot take on building complex systems by systems theorist John Gall. The law resonates deeply with me, and I wanted to pull it out into its own post so that I could share it more easily: A complex system that works is invariably found to have evolved from a simple system that worked.
The inverse proposition also appears to be true: a complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a simple system. -- John Gall
https://www.bennadel.com/blog/4452-john-galls-law-on-building-complex-systems.htm
4/24/23 - Blog - Charlie Arehart - New look for CArehart.org, especially better for the blog on mobile
OK, so in my last post (celebrating my 600th post and 17th year blogging here) I happened to admit that the site still looked like it was from 2006--that's actually the year of the blogcfc 5.005 version that I based it on...and I never really changed the "look and feel" much. Until this weekend...and it may not be noticeable to all, but I suspect some things will stand out to regular readers, starting with...
https://www.carehart.org/blog/2023/4/24/new_look_for_site_and_especially_mobile
4/25/23 - Blog - Charlie Arehart - Delighted to be speaking at CFCamp 2023
I'm delighted to announce that I've been selected to speak at CFCamp 2023, in Munich this June. This will be my 7th year in a row presenting at this wonderful event, and my 8th year total. (While I said "thrilled" about my previous two conference announcements, saying "delighted" here is not a downgrade.
Just a desire not to sound repetitive!) My talk will be... Title: Hidden gems in ColdFusion 2023
https://www.carehart.org/blog/2023/4/25/speaking_at_CFCamp_2023
CFML Jobs
Several positions available on https://www.getcfmljobs.com/ - listing over 60 ColdFusion positions from 38 companies across 29 locations in 5 countries. 0 new jobs listed this week.
Other Job Links
There is a jobs channel in the CFML Slack team, and in the Box team Slack now too.
ForgeBox Module of the Week
cbEventCachingOverride - ColdBox Event Caching Override
This module allows you to set an override for event caching to allow dynamic cache values, like in this module, midnight.
Example:
// Cache this event, defaulting to 1440 mins (1 day), but have it actually time out at midnight
function index( event, rc, prc ) cache="true" cacheTimeout="1440" cacheOverrideTimeout="midnight" {
    // your code here
}
https://www.forgebox.io/view/cbEventCachingOverride
VS Code Hint, Tips and Tricks of the Week
Tailwind Fold, by Stivo
With Tailwind Fold, you can say goodbye to messy and hard-to-read HTML code. This extension helps improve the readability of your code by automatically "folding" long class attributes.
https://marketplace.visualstudio.com/items?itemName=stivo.tailwind-fold
Thank you to all of our Patreon Supporters
These individuals are personally supporting our open source initiatives to ensure that great tooling like CommandBox, ForgeBox, ColdBox, ContentBox, TestBox and all the other boxes keeps getting the continuous development it needs, and to fund the cloud infrastructure our community relies on, like ForgeBox for our package management with CommandBox. You can support us on Patreon here: https://www.patreon.com/ortussolutions
Don't forget, we have annual memberships: pay for the year and save 10% - great for businesses. Bronze packages and up now get ForgeBox Pro and CFCasts subscriptions as a perk of their Patreon subscription.
All Patreon supporters have a profile badge on the community website, their own private forum access on the community website, and their own private channel access on the BoxTeam Slack: https://community.ortussolutions.com/
Top Patrons
John Wilson - Synaptrix; Tomorrows Guides; Jordan Clark; Gary Knight; Mario Rodrigues; Giancarlo Gomez; David Belanger; Dan Card; Jeffry McGee - Sunstar Media; Dean Maunder; Nolan Erck; Abdul Raheen. And many more patrons!
You can see an up-to-date list of all sponsors on Ortus Solutions' website: https://ortussolutions.com/about-us/sponsors
Thanks everyone!!! ★ Support this podcast on Patreon ★
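The cbEventCachingOverride module above caches an event until midnight instead of for a fixed TTL. The idea behind such an override is just "compute the time remaining until the next midnight and use that as the timeout"; here is a hedged sketch in Python (the function name is illustrative, not part of the ColdBox API):

```python
from datetime import datetime, timedelta

# Sketch of the "cache until midnight" idea: instead of a fixed TTL,
# compute how many minutes remain until the next midnight.
def minutes_until_midnight(now=None):
    now = now or datetime.now()
    midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    # Never return 0: a zero timeout would disable caching entirely.
    return max(1, int((midnight - now).total_seconds() // 60))

# At 22:30 there are 90 minutes left in the day.
ttl = minutes_until_midnight(datetime(2023, 4, 25, 22, 30))
print(ttl)  # 90
```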
What are lost updates, and what can we do about them? Maybe we don't do anything and accept the write skew? Also, Allen has sharp ears, Outlaw's gort blah spotterfiles, and Joe is just thinking about breakfast. The full show notes for this episode are available at https://www.codingblocks.net/episode206. News Preventing Lost Updates Detecting Lost Updates […]
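The lost update the episode discusses can be reproduced in a few lines: two clients read the same counter, each computes an increment in application code, and one write clobbers the other. A toy sketch using SQLite (table and column names invented for the demo; the book's examples use Postgres-style databases, but the failure mode is the same):

```python
import sqlite3

# Two "clients" perform a read-modify-write cycle on the same counter.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
db.execute("INSERT INTO counters VALUES ('likes', 0)")

def read_value():
    return db.execute("SELECT value FROM counters WHERE name='likes'").fetchone()[0]

# Both clients read the current value before either writes it back.
a = read_value()  # client A sees 0
b = read_value()  # client B also sees 0
db.execute("UPDATE counters SET value=? WHERE name='likes'", (a + 1,))
db.execute("UPDATE counters SET value=? WHERE name='likes'", (b + 1,))
lost = read_value()  # 1, not 2 -- client A's increment was lost

# One classic fix: make the read-modify-write a single atomic UPDATE.
db.execute("UPDATE counters SET value = value + 1 WHERE name='likes'")
db.execute("UPDATE counters SET value = value + 1 WHERE name='likes'")
fixed = read_value()  # both increments applied on top of 1 -> 3
print(lost, fixed)  # 1 3
```

Other fixes covered in the book include explicit locking (SELECT ... FOR UPDATE) and compare-and-set, which is where the detection of lost updates comes in.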
Ever wonder how database backups work if new data is coming in while the backup is running? Hang with us while we talk about that, while Allen doesn't stand a chance, Outlaw is in love, and Joe forgets his radio voice. The full show notes for this episode are available at https://www.codingblocks.net/episode204. News Thanks for […]
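The episode's question, what a backup contains when writes keep arriving, can be demonstrated with SQLite's online backup API, which copies a consistent snapshot of the source database. A small sketch (SQLite stands in for the databases the chapter discusses):

```python
import sqlite3

# Populate a source database with 100 committed rows.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
src.executemany("INSERT INTO events (payload) VALUES (?)", [("event",)] * 100)
src.commit()

# Take an online backup: a consistent snapshot of the committed state.
dest = sqlite3.connect(":memory:")
src.backup(dest)

# Writes that land after the backup do not bleed into the finished copy.
src.executemany("INSERT INTO events (payload) VALUES (?)", [("late",)] * 50)
src.commit()

in_source = src.execute("SELECT COUNT(*) FROM events").fetchone()[0]
in_backup = dest.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(in_source, in_backup)  # 150 100
```

MVCC databases like Postgres do the analogous thing at scale: pg_dump reads from a single snapshot, so the backup is consistent even while new transactions commit.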
It's time we learn about multi-object transactions as we continue our journey into Designing Data-Intensive Applications, while Allen didn't specifically have that thought, Joe took a marketing class, and Michael promised he wouldn't cry.
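The core property of the multi-object transactions discussed here is atomicity across several writes: a transfer must update two rows together or not at all. A minimal sketch using SQLite's transaction support (account names and amounts are invented for the demo):

```python
import sqlite3

# Two accounts; a CHECK constraint forbids negative balances.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts ("
           "name TEXT PRIMARY KEY, "
           "balance INTEGER NOT NULL CHECK (balance >= 0))")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
db.commit()

def transfer(amount):
    try:
        # The connection as a context manager wraps both UPDATEs in one
        # transaction: commit on success, rollback on any error.
        with db:
            db.execute("UPDATE accounts SET balance = balance - ? "
                       "WHERE name='alice'", (amount,))
            db.execute("UPDATE accounts SET balance = balance + ? "
                       "WHERE name='bob'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # constraint fired; every write in the transaction rolled back

ok = transfer(60)         # succeeds: alice 40, bob 60
overdraft = transfer(80)  # fails atomically: balances unchanged
balances = dict(db.execute("SELECT name, balance FROM accounts"))
print(ok, overdraft, balances)
```

Without the transaction, the failed overdraft could leave one row updated and the other not, exactly the half-finished state multi-object transactions exist to prevent.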
We decided to knock the dust off our copies of Designing Data-Intensive Applications to learn about transactions while Michael is full of solutions, Allen isn't deterred by Cheater McCheaterton, and Joe realizes wurds iz hard.
Could you explain Apache Kafka® in ways that a small child could understand? When Mitch Seymour, author of Mastering Kafka Streams and ksqlDB, wanted a way to communicate the basics of Kafka and event-based stream processing, he decided to author a children's book on the subject, but it turned into something with a far broader appeal.
Mitch conceived the idea while writing a traditional manuscript for engineers and technicians interested in building stream processing applications. He wished he could explain what he was writing about to his 2-year-old daughter, and contemplated the best way to introduce the concepts in a way anyone could grasp. Four months later, he had completed the illustrated book: Gently Down the Stream: A Gentle Introduction to Apache Kafka. It tells the story of a family of forest-dwelling otters who discover that they can use a giant river to communicate with each other. When more otter families move into the forest, they must learn to adapt their system to handle the increase in activity. This accessible metaphor for how streaming applications work is accompanied by Mitch's warm, painterly illustrations.
For his second book, Seymour collaborated with the researcher and software developer Martin Kleppmann, author of Designing Data-Intensive Applications. Kleppmann admired the illustrated book and proposed that the next book tackle a gentle introduction to cryptography. Specifically, it would introduce the concepts behind symmetric-key encryption, key exchange protocols, and the Diffie-Hellman algorithm, a method for exchanging secret information over a public channel. Secret Colors tells the story of a pair of bunnies preparing to attend a school dance, who eagerly exchange notes on potential dates. They realize they need a way of keeping their messages secret, so they develop a technique that allows them to communicate without any chance of other bunnies intercepting their messages.
Mitch's latest illustrated book is A Walk to the Cloud: A Gentle Introduction to Fully Managed Environments. In the episode, Seymour discusses his process of creating the books from concept to completion, the decision to create his own publishing company to distribute them, and whether a fourth book is on the way. He also discusses the experience of illustrating the books side by side with his wife, shares his insights on how editing is similar to coding, and explains why a concise set of commands is equally desirable in SQL queries and children's literature.
EPISODE LINKS
Minimizing Software Speciation with ksqlDB and Kafka Streams
Gently Down the Stream: A Gentle Introduction to Apache Kafka
Secret Colors
A Walk to the Cloud: A Gentle Introduction to Fully Managed Environments
Apache Kafka On the Go: Kafka Concepts for Beginners
Apache Kafka 101 course
Watch the video
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
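The Diffie-Hellman exchange that Secret Colors introduces can be run with toy numbers in a few lines. A sketch (real deployments use primes thousands of bits long; the tiny values here are only for readability):

```python
# Toy Diffie-Hellman key exchange.
p, g = 23, 5           # public prime modulus and generator, agreed in the open

alice_secret = 6       # private values, never transmitted
bob_secret = 15

# Each side sends only g^secret mod p over the public channel.
alice_public = pow(g, alice_secret, p)
bob_public = pow(g, bob_secret, p)

# Each side combines the other's public value with its own secret...
alice_key = pow(bob_public, alice_secret, p)
bob_key = pow(alice_public, bob_secret, p)

# ...and both arrive at the same shared key without ever sending it.
print(alice_key == bob_key)  # True
```

An eavesdropper sees p, g, and the two public values, but recovering the shared key from them is the discrete logarithm problem, which is what makes the "secret colors" metaphor work.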
Guest Geoff Harcourt, CTO of CommonLit, joins Joël to talk about a thing that comes up a lot with clients: the performance of their test suite. It's often a concern because, until it becomes a problem, people tend not to treat the test suite very well, and then they ask for help making it faster. Geoff shares how he handles a scenario like this at CommonLit. This episode is brought to you by Airbrake (https://airbrake.io/?utm_campaign=Q3_2022%3A%20Bike%20Shed%20Podcast%20Ad&utm_source=Bike%20Shed&utm_medium=website): frictionless error monitoring and performance insight for your app stack. Geoff Harcourt (https://twitter.com/geoffharcourt) Common Lit (https://www.commonlit.org/) Cuprite driver (https://cuprite.rubycdp.com/) Chrome DevTools Protocol (CDP) (https://chromedevtools.github.io/devtools-protocol/) Factory Doctor (https://test-prof.evilmartians.io/#/profilers/factory_doctor) Joël's RailsConf talk (https://www.youtube.com/watch?v=LOlG4kqfwcg) Formal Methods (https://www.hillelwayne.com/post/formally-modeling-migrations/) Rails multi-database support (https://guides.rubyonrails.org/active_record_multiple_databases.html) Knapsack pro (https://knapsackpro.com/) Prior episode with Eebs (https://www.bikeshed.fm/353) Shopify article on skipping specs (https://shopify.engineering/spark-joy-by-running-fewer-tests) Transcript: JOËL: Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Joël Quenneville. And today, I'm joined by Geoff Harcourt, who is the CTO of CommonLit. GEOFF: Hi, Joël. JOËL: And together, we're here to share a little bit of what we've learned along the way. Geoff, can you briefly tell us what is CommonLit? What do you do? GEOFF: CommonLit is a 501(c)(3) non-profit that delivers a literacy curriculum in English and Spanish to millions of students around the world. Most of our tools are free.
So we take a lot of pride in delivering great tools to teachers and students who need them the most. JOËL: And what does your role as CTO look like there? GEOFF: So we have a small engineering team. There are nine of us, and we run a Rails monolith. I'd say a fair amount of the time; I'm hands down in the code. But I also do the things that an engineering head has to do, so working with vendors, and figuring out infrastructure, and hiring, and things like that. JOËL: So that's quite a variety of things that you have to do. What is new in your world? What's something that you've encountered recently that's been fun or interesting? GEOFF: It's the start of the school year in America, so traffic has gone from a very tiny amount over the summer to almost the highest load that we'll encounter all year. So we're at a new hosting provider this fall. So we're watching our infrastructure and keeping an eye on it. The analogy that we've been using to describe this is like when you set up a bunch of plumbing, it looks like it all works, but until you really pump water through it, you don't see if there are any leaks. So things are in good shape right now, but it's a very exciting time of year for us. JOËL: Have you ever done some actual plumbing yourself? GEOFF: I am very, very bad at home repair. But I have fixed a toilet or two. I've installed a water filter but nothing else. What about you? JOËL: I've done a little bit of it when I was younger with my dad. Like, I actually welded copper pipes and that kind of thing. GEOFF: Oh, that's amazing. That's cool. Nice. JOËL: So I've definitely felt that thing where you turn the water source back on, and it's like, huh, let's see, is this joint going to leak, or are we good? GEOFF: Yeah, they don't have CI for plumbing, right? JOËL: [laughs] You know, test it in production, right? GEOFF: Yeah. [laughs] So we're really watching right now traffic starting to rise as students and teachers are coming back. 
And we're also figuring out all kinds of things that we want to do to do better monitoring of our application, so some of this is watching metrics to see if things happen. But some of this is also doing some simulated user activity after we do deploys. So we're using some automated browsers with Cypress to log into our application and do some user flows, and then report back on the results. JOËL: So is this kind of like a feature test in CI, except that you're running it in production? GEOFF: Yeah. Smoke test is the word that we've settled on for it, but we run it against our production server every time we deploy. And it's a small suite. It's nowhere as big as our big Capybara suite that we run in CI, but we're trying to get feedback in less than six minutes. That's sort of the goal. In addition to running tests, we also take screenshots with a tool called Percy, and that's a visual regression testing tool. So we get to see the screenshots, and if they differ by more than one pixel, we get a ping that lets us know that maybe our CSS has moved around or something like that. JOËL: Has that caught some visual bugs for you? GEOFF: Definitely. The state of CSS at CommonLit was very messy when I arrived, and it's gotten better, but it still definitely needs some love. There are some false positives, but it's been really, really nice to be able to see visual changes on our production pages and then be able to approve them or know that there's something we have to go back and fix. JOËL: I'm curious, for this smoke test suite, how long does it take to run? GEOFF: We run it in parallel. It runs on Buildkite, which is the same tool that we use to orchestrate our CI, and the longest test takes about five minutes. It signs in as a teacher, creates an account. It creates a class; it invites the student to that class. It then logs out, logs in as that student creates the student account, signs in as the student, joins the class. 
It then assigns a lesson to the student then the student goes and takes the lesson. And then, when the student submits the lesson, then the test is over. And that confirms all of the most critical flows that we would want someone to drop what they were doing if it's broken, you know, account creation, class creation, lesson creation, and students taking a lesson. JOËL: So you're compressing the first few weeks of school into five minutes. GEOFF: Yes. And I pity the school that has thousands of fake teachers, all named Aaron McCarronson at the school. JOËL: [laughs] GEOFF: But we go through and delete that data every once in a while. But we have a marketer who just started at CommonLit maybe a few weeks ago, and she thought that someone was spamming our signup form because she said, "I see hundreds of teachers named Aaron McCarronson in our user list." JOËL: You had to admit that you were the spammer? GEOFF: Yes, I did. [laughs] We now have some controls to filter those people out of reports. But it's always funny when you look at the list, and you see all these fake people there. JOËL: Do you have any rate limiting on your site? GEOFF: Yeah, we do quite a bit of it, actually. Some of it we do through Cloudflare. We have tools that limit a certain flow, like people trying credential stuffing on our user sign-in forms. But we also do some further stuff to prevent people from hitting key endpoints. We use Rack::Attack, which is a really nice framework. Have you had to do that in client work with clients setting that stuff up? JOËL: I've used Rack::Attack before. GEOFF: Yeah, it's got a reasonably nice interface that you can work with. And I always worry about accidentally setting those things up to be too sensitive, and then you get lots of stuff back. One issue that we sometimes find is that lots of kids at the same school are sharing an IP address. So that's not the thing that we want to use for rate limiting.
We want to use some other criteria for rate limiting. JOËL: Right, right. Do you ever find that you rate limit your smoke tests? Or have you had to bypass the rate limiting in the smoke tests? GEOFF: Our smoke tests bypass our rate limiting and our bot detection. So they've got some fingerprints they use to bypass that. JOËL: That must have been an interesting day at the office. GEOFF: Yes. [laughter] With all of these things, I think it's a big challenge to figure out, and it's similar when you're making tests for development, how to make tests that are high signal. So if a test is failing really frequently, even if it's testing something that's worthwhile, if people start ignoring it, then it stops having value as a piece of signal. So we've invested a ton of time in making our test suite as reliable as possible, but you sometimes do have these things that just require a change. I've become a really big fan of...there's a Ruby driver for Capybara called Cuprite, and it doesn't control Chrome with ChromeDriver or with Selenium. It controls it with the Chrome DevTools protocol, so it's like a direct connection into the browser. And we find that it's very, very fast and very, very reliable. So we saw that our Capybara specs got significantly more reliable when we started using this as our driver. JOËL: Is this because it's not actually moving the mouse around and clicking but instead issuing commands in the background? GEOFF: Yeah. My understanding of this is a little bit hazy. But I think that Selenium and ChromeDriver are communicating over a network pipe, and sometimes that network pipe is a little bit lossy. And so it results in asynchronous commands where maybe you don't get the feedback back after something happens. And CDP is what Chrome's team and, I think, what Puppeteer uses to control things directly. So it's great. And you can even do things with it. Like, you can simulate a different time zone for a user almost natively.
You can speed up or slow down the traveling of time and the direction of time in the browser and all kinds of things like that. You can flip it into mobile mode so that the device reports that it's a touch browser, even though it's not. We have a set of mobile specs where we flip it with CDP into mobile mode, and that's been really good too. Do you find when you're doing client work that you have a demand to build mobile-specific specs for system tests? JOËL: Generally not, no. GEOFF: You've managed to escape it. JOËL: For something that's specific to mobile, maybe one or two tests that have a weird interaction that we know is different on mobile. But in general, we're not doing the whole suite under mobile and the whole suite under desktop. GEOFF: When you hand off a project...it's been a while since you and I have worked together. JOËL: For those who don't know, Geoff used to be with us at thoughtbot. We were colleagues. GEOFF: Yeah, for a while. I remember my very first thoughtbot Summer Summit; you gave a really cool lightning talk about Eleanor of Aquitaine. JOËL: [laughs] GEOFF: That was great. So when you're handing a project off to a client as your engagement is ending, do you find that there's a transition period where you're educating them about the norms of the test suite before you leave it in their hands? JOËL: It depends a lot on the client. With many clients, we're working alongside an existing dev team. And so it's not so much one big handoff at the end as it is just building that in the day-to-day, making sure that we are integrating with the team from the outset of the engagement. So one thing that does come up a lot with clients is the performance of their test suite. That's often a concern because, until it becomes a problem, people tend to not treat the test suite very well. And by the time that you're bringing on an external consultant to help, generally, that's one of the areas of the code that's been a little bit neglected.
And so people ask for help on making their test suite faster. Is that something that you've had to deal with at CommonLit as well? GEOFF: Yeah, that's a great question. We have struggled a lot with the speed that our test suite...the time it takes for our test suite to run. We've done a few things to improve it. The first is that we have quite a bit of caching that we do in our CI suite around dependencies. So gems get cached separately from NPM packages and browser assets. So all three of those things are independently cached. And then, we run our suites in parallel. Our Jest specs get split up into eight containers. Our Ruby non-system tests...I'd like to say unit tests, but we all know that some of those are actually integration tests. JOËL: [laughs] GEOFF: But those tests run in 15 containers, and they start the moment gems are built. So they don't wait for NPM packages. They don't wait for assets. They immediately start going. And then our system specs, as soon as the assets are built, kick off and start running. And we actually run that in 40 parallel containers so we can get everything finished. So our CI suite can finish...if there are no dependency bumps and no asset bumps, our spec suite can finish in just under five minutes. But if you add up all of that time, cumulatively, it's something like 75 minutes of total execution time. Have you tried FactoryDoctor before for speeding up test suites? JOËL: This is the gem from Evil Martians? GEOFF: Yeah, it's part of TestProf, which is their really, really unbelievable toolkit for improving specs, and they have a whole bunch of things. But one of them will tell you how many invocations each FactoryBot factory got. So you can see a user factory was fired 13,000 times in the test suite. It can even do some tagging where it can go in and add metadata to your specs to show which ones might be candidates for optimization.
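The TestProf profilers described here are switched on with environment variables; these invocations follow TestProf's documentation, with the spec paths being illustrative.

```shell
# FactoryProf: report how many times each FactoryBot factory was invoked
FPROF=1 bundle exec rspec

# FactoryDoctor: flag examples that create records they never actually
# query, i.e. candidates for build_stubbed or build instead of create
FDOC=1 bundle exec rspec spec/models
```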
JOËL: I gave a talk at RailsConf this year titled Your Tests Are Making Too Many Database Calls. GEOFF: Nice. JOËL: And one of the things I talked about was creating a lot more data via factories than you think that you are. And I should give a shout-out to FactoryProf for finding those. GEOFF: Yeah, it's kind of a silent killer with the test suite, and you really don't think that you're doing a whole lot with it, and then you see how many associations. How do you fight that tension between creating enough data that things are realistic versus the streamlining of not creating extraneous things or having maybe mystery guests via associations and things like that? JOËL: I try to have my base factories be as minimal as possible. So if there's a line in there that I can remove, and the factory or the model still saves, then it should be removed. Some associations, you can't do that if there's a foreign key constraint, and so then I'll leave it in. But I am a very hardcore minimalist, at least with the base factory. GEOFF: I think that makes a lot of sense. We use foreign keys all over the place because we're always worried about somehow inserting student data that we can't recover with a bug. So we'd rather blow up than think we recorded it. And as a result, sometimes setting up specs for things like a student answering a multiple choice question on a quiz ends up being this sort of if you give a mouse a cookie thing where it's you need the answer options. You need the question. You need the quiz. You need the activity. You need the roster, the students to be in the roster. There has to be a teacher for the roster. It just balloons out because everything has a foreign key. JOËL: The database requires it, but the test doesn't really care. It's just like, give me a student and make it valid. GEOFF: Yes, yeah. And I find that that challenge is really hard. 
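The minimalist base factory Joël describes, with the foreign-key chain available as an opt-in, might look like this in FactoryBot's DSL. The model and association names here are hypothetical, loosely echoing the quiz example; this is a sketch of the pattern, not CommonLit's actual factories.

```ruby
# FactoryBot sketch: the base factory creates only what NOT NULL
# foreign keys force it to; the "if you give a mouse a cookie"
# graph lives in an opt-in trait. All names are illustrative.
FactoryBot.define do
  factory :student do
    name { "Test Student" }
    roster # kept only because students.roster_id has a NOT NULL foreign key

    trait :with_answered_question do
      after(:create) do |student|
        question = create(:question) # builds quiz/activity up the chain
        create(:answer, student: student, question: question)
      end
    end
  end
end

# Most specs pay only the minimum cost:
#   create(:student)
# Specs that genuinely need the whole graph opt in:
#   create(:student, :with_answered_question)
```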
And sometimes, you don't see how hard it is to enforce things like database integrity until you have a lot of concurrency going on in your application. It was a very rude surprise to me to find out that browser requests if you have multiple servers going on might not necessarily be served in the order that they were made. JOËL: [laughs] So you're talking about a scenario where you're running multiple instances of your app. You make two requests from, say, two browser tabs, and somehow they get served from two different instances? GEOFF: Or not even two browser tabs. Imagine you have a situation where you're auto-saving. JOËL: Oooh, background requests. GEOFF: Yeah. So one of the coolest features we have at CommonLit is that students can annotate and highlight a text. And then, the teachers can see the annotations and highlights they've made, and it's actually part of their assignment often to highlight key evidence in a passage. And those things all fire in the background asynchronously so that it doesn't block the student from doing more stuff. But it also means that potentially if they make two changes to a highlight really quickly that they might arrive out of order. So we've had to do some things to make sure that we're receiving in the right order and that we're not blowing away data that was supposed to be there. Just think about in a Heroku environment, for example, which is where we used to be, you'd have four dynos running. If dyno one takes too long to serve the thing for dyno two, request one may finish after request two. That was a very, very rude surprise to learn that the world was not as clean and neat as I thought. JOËL: I've had to do something similar where I'm making a bunch of background requests to a server. And even with a single dyno, it is possible for your request to come back out of order just because of how TCP works. 
So if it's waiting for a packet and you have two of these requests that went out not too long before each other, there's no guarantee that all the packets for request one come back before all the packets from request two. GEOFF: Yeah, what are the strategies for on the client side for dealing with that kind of out-of-order response? JOËL: Find some way to effectively version the requests that you make. Timestamp is an easy one. Whenever a request comes in, you take the response from the latest timestamp, and that wins out. GEOFF: Yeah, we've started doing some unique IDs. And part of the unique ID is the browser's timestamp. We figure that no one would try to hack themselves and intentionally screw up their own data by submitting out of order. JOËL: Right, right. GEOFF: It's funny how you have to pick something to trust. [laughs] JOËL: I'd imagine, in this case, if somebody did mess around with it, they would really only just be screwing up their own UI. It's not like that's going to then potentially crash the server because of something, and then you've got a potential vector for a denial of service. GEOFF: Yeah, yeah, that's always what we're worried about, and we have to figure out how to trust these sorts of requests as what's a valid thing and what is, as you're saying, is just the user hurting themselves as opposed to hurting someone else's stuff? MID-ROLL AD: Debugging errors can be a developer's worst nightmare...but it doesn't have to be. Airbrake is an award-winning error monitoring, performance, and deployment tracking tool created by developers for developers that can actually help cut your debugging time in half. So why do developers love Airbrake? It has all of the information that web developers need to monitor their application - including error management, performance insights, and deploy tracking! 
Airbrake's debugging tool catches all of your project errors, intelligently groups them, and points you to the issue in the code so you can quickly fix the bug before customers are impacted. In addition to stellar error monitoring, Airbrake's lightweight APM helps developers to track the performance and availability of their application through metrics like HTTP requests, response times, error occurrences, and user satisfaction. Finally, Airbrake Deploy Tracking helps developers track trends, fix bad deploys, and improve code quality. Since 2008, Airbrake has been a staple in the Ruby community and has grown to cover all major programming languages. Airbrake seamlessly integrates with your favorite apps to include modern features like single sign-on and SDK-based installation. From testing to production, Airbrake notifiers have your back. Your time is valuable, so why waste it combing through logs, waiting for user reports, or retrofitting other tools to monitor your application? You literally have nothing to lose. Head on over to airbrake.io/try/bikeshed to create your FREE developer account today! GEOFF: You were talking about test suites. What are some things that you have found are consistently problems in real-world apps, but they're really, really hard to test in a test suite? JOËL: Difficult to test or difficult to optimize for performance? GEOFF: Maybe difficult to test. JOËL: Third-party integrations. Anything that's over the network that's going to be difficult. Complex interactions that involve some heavy frontend but then also need a lot of backend processing potentially with asynchronous workers or something like that, there are a lot of techniques that we can use to make all those play together, but that means there's a lot of complexity in that test. GEOFF: Yeah, definitely. I've taken a deep interest in what I'm sure there's a better technical term for this, but what I call network hostile environments or bandwidth hostile environments. 
And we see this a lot with kids. Especially during the pandemic, kids would often be trying to do their assignments from home. And maybe there are five kids in the house, and they're all trying to do their homework at the same time. And they're all sharing a home internet connection. Maybe they're in the basement because they're trying to get some peace and quiet so they can do their assignment or something like that. And maybe they're not strongly connected. And the challenge of dealing with intermittent connectivity is such an interesting problem, very frustrating but very interesting to deal with. JOËL: Have you explored at all the concept of Formal Methods to model or verify situations like that? GEOFF: No, but I'm intrigued. Tell me more. JOËL: I've not tried it myself. But I've read some articles on the topic. Hillel Wayne is a good person to follow for this. GEOFF: Oh yeah. JOËL: But it's really fascinating when you see, okay, here are some invariants and things. And then here are some things that you set up some basic properties for a system. And then some of these modeling languages will then poke holes and say, hey, it's possible for this 10-step sequence of events to happen that will then crash your server. Because you didn't think that it's possible for five people to be making concurrent requests, and then one of them fails and retries, whatever the steps are. So it's really good at modeling situations that, as developers, we don't always have great intuition for, things like parallelism. GEOFF: Yeah, that sounds so interesting. I'm going to add that to my list of reading for the fall. Once the school year calms down, I feel like I can dig into some technical topics again. I've got this book sitting right next to my desk, Designing Data-Intensive Applications. I saw it referenced somewhere on Twitter, and I did the thing where I got really excited about the book, bought it, and then didn't have time to read it.
So it's just sitting there unopened next to my desk, taunting me. JOËL: What's the 30-second spiel for what is a data-intensive app, and why should we design for it differently? GEOFF: You know, that's a great question. I'd probably find out if I'd dug further into the book. JOËL: [laughs] GEOFF: I have found at CommonLit that we...I had a couple of clients at thoughtbot that dealt with data at the scale that we deal with here. And I'm sure there are bigger teams doing, quote, "bigger data" than we're doing. But it really does seem like one of our key challenges is making sure that we just move data around fast enough that nothing becomes a bottleneck. We made a really key optimization in our application last year where we changed the way that we autosave students' answers as they go. And it resulted in a massive increase in throughput for us because we went from trying to store updated versions of the students' final answers to just storing essentially a draft and often storing that draft in local storage in the browser and then updating it on the server when we could. And then, as a result of this, we're making key updates to the table where we store a student's answers much less frequently. And that has a huge impact because, in addition to being one of the biggest tables at CommonLit...it's got almost a billion recorded answers that we've gotten from students over the years. But because we're not writing to it as often, it also means that reads that are made from the table, like when the teacher is getting a report for how the students are doing in a class or when a principal is looking at how a school is doing, now, those queries are seeing less contention from ongoing writes. And so we've seen a nice improvement. JOËL: One strategy I've seen for that sort of problem, especially when you have a very write-heavy table but that also has a different set of users that needs to read from it, is to set up a read replica. 
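In Rails, a read replica like the one described here can be declared with the framework's native multiple-database support. This is a minimal sketch of a primary-plus-replica setup with assumed names, not CommonLit's actual configuration.

```ruby
# config/database.yml declares both databases for the environment:
#
#   production:
#     primary:
#       <<: *default
#     primary_replica:
#       <<: *default
#       replica: true
#
# The abstract base class then maps roles to databases:
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :primary, reading: :primary_replica }
end

# Report queries can be pinned to the replica explicitly, keeping
# them out of contention with the write-heavy answers table:
ActiveRecord::Base.connected_to(role: :reading) do
  # class/school report queries run against primary_replica here
end
```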
So you have your main that is being written to, and then the read replica is used for reports and people who need to look at the data without being in contention with the table being written. GEOFF: Yeah, Rails multi-DB support now that it's native to the framework is excellent. It's so nice to be able to just drop that in and fire it up and have it work. We used to use a solution that Instacart had built. It was great for our needs, but it wasn't native to the framework. So every single time we upgraded Rails, we had to cross our fingers and hope that it didn't, you know, whatever private APIs of ActiveRecord it was using hadn't broken. So now that that stuff, which I think was open sourced from GitHub's multi-database implementation, so now that that's all native in Rails, it's really, really nice to be able to use that. JOËL: So these kinds of database tricks can help make the application much more performant. You'd mentioned earlier that when you were trying to make your test performant that you had introduced parallelism, and I feel like that's maybe a bit of an intimidating thing for a lot of people. How would you go about converting a test suite that's just vanilla RSpec, single-threaded, and then moving it in a direction of being more parallel? GEOFF: There's a really, really nice tool called Knapsack, which has a free version. But the pro version, I feel like if you're spending any money at all on CI, it's immediately worth the cost. I think it's something like $75 a month for each suite that you run on it. And Knapsack does this dynamic allocation of tests across containers. And it interfaces with several of the popular CI providers so that it looks at environment variables and can tell how many containers you're splitting across. It'll do some things, like if some of your containers start early and some of them start late, it will distribute the work so that they all end at the same time, which is really nice. 
We've preferred CI providers that charge by the minute. So rather than just paying for a service that we might not be using, we've used services like Semaphore, and right now, we're on Buildkite, which charge by the minute, which means that you can decide to do as much parallelism as you want. You're just paying for the compute time as you run things. JOËL: So that would mean that two minutes of sequential build time costs just the same as splitting it up in parallel and doing two simultaneous minutes of build time. GEOFF: Yeah, that is almost true. There's a little bit of setup time when a container spins up. And that's one of the key things that we optimize. I guess if we ran 200 containers if we were like Shopify or something like that, we could technically make our CI suite finish faster, but it might cost us three times as much. Because if it takes a container 30 seconds to spin up and to get ready, that's 30 seconds of dead time when you're not testing, but you're paying for the compute. So that's one of the key optimizations that we make is figuring out how many containers do we need to finish fast when we're not just blowing time on starting and finishing? JOËL: Right, because there is a startup cost for each container. GEOFF: Yeah, and during the work day when our engineers are working along, we spin up 200 EC2 machines or 150 EC2 machines, and they're there in the fleet, and they're ready to go to run CI jobs for us. But if you don't have enough machines, then you have jobs that sit around waiting to start, that sort of thing. So there's definitely a tension between figuring out how much parallelism you're going to do. But I feel like to start; you could always break your test suite into four pieces or two pieces and just see if you get some benefit to running a smaller number of tests in parallel. JOËL: So, manually splitting up the test suite. 
GEOFF: No, no, using something like Knapsack Pro where you're feeding it the suite, and then it's dividing up the tests for you. I think manually splitting up the suite is probably not a good practice overall because I'm guessing you'll probably spend more engineering time on fiddling with which tests go where such that it wouldn't be cost-effective. JOËL: So I've spent a lot of time recently working to improve a parallel test suite. And one of the big problems that you have is trying to make sure that all of your parallel workers are being used efficiently, so you have to split the work evenly. So if you said you have 70 minutes' worth of work, if you give 50 minutes to one worker and 20 minutes to the other, that means that your total test suite is still 50 minutes, and that's not good. So ideally, you split it as evenly as possible. So I think there are three evolutionary steps on the path here. So you start off, and you're going to manually split things out. So you're going to say our biggest chunk of tests by time are the feature specs. We'll make them almost like a separate suite. Then we'll make the models and controllers and views their own thing, and that's roughly half and half, and run those. And maybe you're off by a little bit, but it's still better than putting them all in one. It becomes difficult, though, to balance all of these because then one might get significantly longer than the other; then you have to manually rebalance it. It works okay if you're only splitting it among two workers. But if you're having to split it among 4, 8, 16, and more, it's not manageable to do this, at least not by hand. If you want to get fancy, you can try to automate that process and record a timing file of how long every file takes.
And then when you kick off the build process, look at that timing file and say, okay, we have 70 minutes, and then we'll just split the file so that we have roughly 70 divided by number of workers' files or minutes of work in each process. And that's what gems like parallel_tests do. And Knapsack's Classic mode works like this as well. That's decently good. But the problem is you're working off of past information. And so if the test has changed or just if it's highly variable, you might not get a balanced set of workers. And as you mentioned, there's a startup cost, and so not all of your workers boot up at the same time. And so you might still have a very uneven amount of work done by each worker by statically determining the work to be done via a timing file. So the third evolution here is a dynamic or a self-balancing approach where you just put all of the tests or the files in a queue and then just have every worker pull one or two tests when it's ready to work. So that way, if something takes a lot longer than expected, well, it's just not pulling more from the queue. And everybody else still pulls, and they end up all balancing each other out. And then ideally, every worker finishes work at exactly the same time. And that's how you know you got the most value you could out of your parallel processes. GEOFF: Yeah, there's something about watching all the jobs finish in almost exactly, you know, within 10 seconds of each other. It just feels very, very satisfying. I think in addition to getting this dynamic splitting where you're getting either per file or per example split across to get things finishing at the same time, we've really valued getting fast feedback. So I mentioned before that our Jest specs start the moment NPM packages get built. So as soon as there's JavaScripts that can be executed in test, those kick-off. As soon as our gems are ready, the RSpec non-system tests go off, and they start running specs immediately. 
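The self-balancing approach Joël walks through can be sketched in a few lines of Ruby: files go on a shared queue, and each worker pulls the next one whenever it is free, so a slow file cannot leave other workers idle. Worker counts and file names here are illustrative.

```ruby
# Sketch of dynamic (self-balancing) test distribution. Every
# worker thread pulls from one shared queue until it hits a stop
# sentinel, so work naturally evens out across workers.
def run_in_parallel(spec_files, workers:)
  queue = Queue.new
  spec_files.each { |file| queue << file }
  workers.times { queue << :done } # one stop sentinel per worker

  assignments = Array.new(workers) { [] }
  threads = Array.new(workers) do |i|
    Thread.new do
      while (file = queue.pop) != :done
        assignments[i] << file # a real runner would execute the file's specs here
      end
    end
  end
  threads.each(&:join)
  assignments
end
```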
So we get that really, really fast feedback. Unfortunately, the browser tests take the longest because they have to wait for the most setup. They have the most dependencies. And then they also run the slowest because they run in the browser and everything. But I think when things are really well-oiled, you watch all of those containers end at roughly the same time, and it feels very satisfying. JOËL: So, a few weeks ago, on an episode of The Bike Shed, I talked with Eebs Kobeissi about dependency graphs and how I'm super excited about it. And I think I see a dependency graph in what you're describing here in that some things only depend on the gem file, and so they can start working. But other things also depend on the NPM packages. And so your build pipeline is not one linear process or one linear process that forks into other linear processes; it's actually a dependency graph. GEOFF: That is very true. And the CI tool we used to use called Semaphore actually does a nice job of drawing the dependency graph between all of your steps. Buildkite does not have that, but we do have a bunch of steps that have to wait for other steps to finish. And we do it in our wiki. On our repo, we do have a diagram of how all of this works. We found that one of the things that was most wasteful for us in CI was rebuilding gems, reinstalling NPM packages (We use Yarn but same thing.), and then rebuilding browser assets. So at the very start of every CI run, we build hashes of a bunch of files in the repository. And then, we use those hashes to name Docker images that contain the outputs of those files so that we are able to skip huge parts of our CI suite if things have already happened. So I'll give an example if Ruby gems have not changed, which we would know by the Gemfile.lock not having changed, then we know that we can reuse a previously built gems image that has the gems that just gets melted in, same thing with yarn.lock. 
If yarn.lock hasn't changed, then we don't have to build NPM packages. We know that that already exists somewhere in our Docker registry. In addition to skipping steps by not redoing work, we also have started to experiment...actually, in response to a comment that Chris Toomey made in a prior Bike Shed episode, we've started to experiment with skipping irrelevant steps. So I'll give an example of this if no Ruby files have changed in our repository, we don't run our RSpec unit tests. We just know that those are valid. There's nothing that needs to be rerun. Similarly, if no JavaScript has changed, we don't run our Jest tests because we assume that everything is good. We don't lint our views with erb-lint if our view files haven't changed. We don't lint our factories if the model or the database hasn't changed. So we've got all these things to skip key types of processing. I always try to err on the side of not having a false pass. So I'm sure we could shave this even tighter and do even less work and sometimes finish the build even faster. But I don't want to ever have a thing where the build passes and we get false confidence. JOËL: Right. Right. So you're using a heuristic that eliminates the really obvious tests that don't need to be run but the ones that maybe are a little bit more borderline, you keep them in. Shaving two seconds is not worth missing a failure. GEOFF: Yeah. And I've read things about big enterprises doing very sophisticated versions of this where they're guessing at which CI specs might be most relevant and things like that. We're nowhere near that level of sophistication right now. But I do think that once you get your test suite parallelized and you're not doing wasted work in the form of rebuilding dependencies or rebuilding assets that don't need to be rebuilt, there is some maybe not low, maybe medium hanging fruit that you can use to get some extra oomph out of your test suite. 
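The lockfile-hashing trick described here, deriving a cache key from the files a step depends on and skipping the step when an artifact for that key already exists, can be sketched like this. The prefix, file names, and registry shape are assumptions for illustration.

```ruby
require "digest"

# Build a cache key (an image tag, say) from the content of the
# files a CI step depends on, e.g. Gemfile.lock for the gems image.
def cache_key(prefix, *paths)
  digest = Digest::SHA256.hexdigest(paths.map { |path| File.read(path) }.join)
  "#{prefix}-#{digest[0, 12]}"
end

# Run the step only when the registry has no artifact for this key;
# an unchanged lockfile reuses the previously built image.
def run_step?(key, registry)
  !registry.include?(key)
end
```

With this scheme, any change to Gemfile.lock produces a new key, so the gems image rebuilds; an unchanged lockfile hashes to the same key and the step is skipped.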
JOËL: I really like that you brought up this idea of infrastructure and skipping. I think in my own way of thinking about improving test suites, there are three broad categories of approaches you can take. One variable you get to work with is the total amount of single-threaded time; you mentioned 70 minutes. You can make that 70 minutes shorter by avoiding database writes where you don't need them, all the common tricks that we would do to actually change the tests themselves. Then, as another variable, we get to work with parallelism; we talked about that. And then finally, there's all that other stuff that's not actually executing RSpec like you said, loading the gems, installing NPM packages, Docker images. For all of those, if there are situations where we can skip work, like running migrations or setting up a database, that also improves the total time. GEOFF: Yeah, there are so many little things that you can pick at to...like, one of the slowest things for us is Elasticsearch. And so we really try to limit the number of specs that use Elasticsearch if we can. You actually have to opt in to using Elasticsearch on a spec, or else we silently mock and disable all of the things that happen there. When you're looking at that first variable that you were talking about, just sort of the overall time, beyond using FactoryDoctor and FactoryProf, is there anything else that you've used to just identify the most egregious offenders in a test suite and then figure out if they're worth it? JOËL: One thing you can do is hook into Active Support notifications to try to find database writes. And so you can find, oh, here's where all of the...this test is making way too many database writes for some reason, or it's making a lot, maybe I should take a look at it; it's a hotspot. GEOFF: Oh, that's really nice. There's one that I've always found is like a big offender, which is people doing negative expectations in system specs.
JOËL: Oh, for their Capybara wait time. GEOFF: Yeah. So there's a really cool gem, and the name of it is eluding me right now. But there's a gem that raises a special exception if Capybara waits the full time for something to happen. So it lets you know that those things exist. And so we've done a lot of like hunting for...Knapsack will report the slowest examples in your test suite. So we've done some stuff to look for the slowest files and then look to see if there are examples of these negative expectations that are waiting 10 seconds or waiting 8 seconds before they fail. JOËL: Right. Some files are slow, but they're slow for a reason. Like, a feature spec is going to be much slower than a model test. But the model tests might be very wasteful and because you have so many of them, if you're doing the same pattern in a bunch of them or if it's a factory that's reused across a lot of them, then a small fix there can have some pretty big ripple effects. GEOFF: Yeah, I think that's true. Have you ever done any evaluation of test suite to see what files or examples you could throw away? JOËL: Not holistically. I think it's more on an ad hoc basis. You find a place, and you're like, oh, these tests we probably don't need them. We can throw them out. I have found dead tests, tests that are not executed but still committed to the repo. GEOFF: [laughs] JOËL: It's just like, hey, I'm going to get a lot of red in my diff today. GEOFF: That always feels good to have that diff-y check-in, and it's 250 lines or 1,000 lines of red and 1 line of green. JOËL: So that's been a pretty good overview of a lot of different areas related to performance and infrastructure around tests. Thank you so much, Geoff, for joining us today on The Bike Shed to talk about your experience at CommonLit doing this. Do you have any final words for our listeners? GEOFF: Yeah. 
CommonLit is hiring a senior full-stack engineer, so if you'd like to work on Rails and TypeScript in a place with a great test suite and a great team. I've been here for five years, and it's a really, really excellent place to work. And also, it's been really a pleasure to catch up with you again, Joël. JOËL: And, Geoff, where can people find you online? GEOFF: I'm Geoff with a G, G-E-O-F-F Harcourt, @geoffharcourt. And that's my name on Twitter, and it's my name on GitHub, so you can find me there. JOËL: And we'll make sure to include a link to your Twitter profile in the show notes. The show notes for this episode can be found at bikeshed.fm. This show is produced and edited by Mandy Moore. If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes. It really helps other folks find the show. If you have any feedback, you can reach us at @_bikeshed or reach me at @joelquen on Twitter or at hosts@bikeshed.fm via email. Thank you so much for listening to The Bike Shed, and we'll see you next week. Byeeeeeee!!!!!! ANNOUNCER: This podcast was brought to you by thoughtbot. thoughtbot is your expert design and development partner. Let's make your product and team a success.
Is it worth writing an IT textbook? There is at least one book on every piece of software and every IT topic. But what is it actually like to write such a book? What does a publisher do, and do you even still need one these days? Does it make you rich, or does it stay a pittance? What tech stack is behind a book? And how would you actually get started? We clear all of this up in this episode with Wolfgang, who published a book about MySQL with Rheinwerk-Verlag (Galileo Press). Bonus: why Wolfgang is more of a fan of talking, and why he really took so long to finish his doctorate. Feedback (voice messages welcome) Email: stehtisch@engineeringkiosk.dev Twitter: https://twitter.com/EngKiosk WhatsApp +49 15678 136776 We are also happy to cover your audio feedback in one of the next episodes; just send an audio file via email or a WhatsApp voice message to +49 15678 136776. Links: Job interviews with Wolfgang Gassler, interview by Nils Langner: https://www.youtube.com/watch?v=-c3ZAp7MvTI MySQL: Das umfassende Handbuch: https://www.rheinwerk-verlag.de/mysql-das-umfassende-handbuch/ LaTeX: https://www.latex-project.org/ c't Magazin: https://www.heise.de/ct/ Rheinwerk Verlag (Galileo Press): https://www.rheinwerk-verlag.de/ Michael Kofler: https://kofler.info/ Book "Designing Data-Intensive Applications": https://dataintensive.net/ Book "MAKE" by Pieter Levels: https://readmake.com/ Use the Index, Luke (Markus Winand): https://use-the-index-luke.com/ Simon Sinek: Start with why: https://www.youtube.com/watch?v=u4ZoJKF_VuA Book "Observability Engineering": https://www.oreilly.com/library/view/observability-engineering/9781492076438/ Honeycomb: https://www.honeycomb.io/ LeanPub: https://leanpub.com/ Amazon Print on Demand: https://www.amazon.de/Buecher-Print-On-Demand/b?ie=UTF8&node=5445727031 Episode #35 Knowledge Sharing oder die Person, die nie "gehen" sollte...: https://engineeringkiosk.dev/podcast/episode/35-knowledge-sharing-oder-die-person-die-nie-gehen-sollte/ Chapter markers: (00:00:00) Intro (00:00:40) Writing skills (00:02:42) Topic: writing IT textbooks, MySQL: Das umfassende Handbuch (00:03:39) How well do you know MySQL? (00:05:19) Overview of Wolfgang's MySQL book (00:09:34) How long did it take you to write the book? (00:11:24) Which tech stack was used to write the book? (00:13:43) Which tasks did the publisher take on? (00:18:54) How did you establish the connection to the publisher? (00:21:17) What do you earn with an IT textbook? (00:28:21) Why did you write a book? What do you get out of writing one? (00:30:27) What came out of the book for you professionally? (00:32:22) Why did you decline to write a new edition? (00:33:56) Impostor syndrome: how much do you know about MySQL after your research? (00:35:52) How many Amazon reviews did you buy? (00:38:39) Do you read IT textbooks? (00:43:24) Would you write a book again? (00:46:32) Would you recommend writing a book? (00:51:00) What would you recommend to people who want to write a book? (00:53:25) Would you again write a book about a specific software or software version? (00:55:50) Have you had any contact with a ghostwriter? (00:58:46) Is writing a book worth it in the medium and long term? (01:01:22) Self-publishing and print on demand (01:03:56) Audio rants, Reddit and outro Hosts Wolfgang Gassler (https://twitter.com/schafele) Andy Grunwald (https://twitter.com/andygrunwald) Feedback (voice messages welcome) Email: stehtisch@engineeringkiosk.dev Twitter: https://twitter.com/EngKiosk WhatsApp +49 15678 136776
Unpacking what it's like to get hired at Bookmate: CTO Vasily Kolesnikov talks about how the service works, how the development teams interact, and the hiring process. ⚠ This audio was recorded on February 16, 2022. Since then, many aspects of Bookmate's business have changed in response to world events. For up-to-date information about open positions, email work@bookmate.com. We're always in touch!
In this episode we discuss our study notes on the partitioning chapter of the book Designing Data-Intensive Applications.
We wrap up the discussion on partitioning from our collective favorite book, Designing Data-Intensive Applications, while Allen is properly substituted, Michael can't stop thinking about Kafka, and Joe doesn't live in the real sunshine state.
We crack open our favorite book again, Designing Data-Intensive Applications by Martin Kleppmann, while Joe sounds different, Michael comes to a sad realization, and Allen also engages "no take backs".
In this episode, Senior Research Associate from Cambridge University and author of "Designing Data Intensive Applications" Martin Kleppmann discusses how they are finding ways to make software collaboration and ownership properly coexist through their "local-first software" project, which will allow users to work offline without compromising important security aspects and user data control.
We wrap up our replication discussion of Designing Data-Intensive Applications, this time discussing leaderless replication strategies and issues, while Allen missed his calling, Joe doesn't read the gray boxes, and Michael lives in a future where we use apps.
We continue our discussion of Designing Data-Intensive Applications, this time focusing on multi-leader replication, while Joe is seriously tired, and Allen is on to Michael's shenanigans.
We dive back into Designing Data-Intensive Applications to learn more about replication while Michael thinks cluster is a three syllable word, Allen doesn't understand how we roll, and Joe isn't even paying attention.
Which interface should I choose for my site's users? Does my application's new recommendation system have an impact? For whom? An A/B test compares two variants to identify which is more effective for the goal at hand. Today I'm joined by Cyril De Catheu, Data Engineer @ AB Tasty, to discuss the inner workings of an experimentation platform.
We talk about system architecture, clean and hexagonal architecture, and much more. How complex is it to build large systems while maintaining a good architecture? Participants: Paulo Silveira, the host who uses Mac and Android; Rodrigo Caneppele, developer and instructor at Alura and Caelum; Otávio Lemos, developer and instructor; Vinicius Dias, developer at Achievers and instructor at Alura; Roberta Arcoverde, the co-host who thinks hexagonal architecture is a funny name; Maurício "Balboa" Linhares, the co-host who only came to complain. Links: Otávio's channel; Mário's channel; Vinicius Dias's channel; the book "The Architecture of Open Source Applications"; the book "Designing Data-Intensive Applications"; subscribe to the Imersão, Aprendizado e Tecnologia newsletter. Production and content: Alura Cursos de Tecnologia - https://www.alura.com.br === Caelum Escola de Tecnologia - https://www.caelum.com.br/ Editing and sound: Radiofobia Podcast e Multimídia
So who is Mathieu Corbin? I interview a newcomer among the hosts of #RadioDevOps.
Software Engineer Andrey Goncharov joins the React Round Up crew to discuss how his company Hazelcast has approached visualizing hundreds of data points on potentially hundreds of computers in a way that makes sense to users. Dust off your math skills - it gets a little technical along the way as they discuss graphs, charts, performance optimizations and bottlenecks, and even handling accessibility of these data-intensive graphs. If you ever have to debug system failures and anomalies, this will be a worthwhile episode to check out. Panel Paige Niedringhaus TJ VanToll Guest Andrey Goncharov Sponsors Next Level Mastermind Raygun | Click here to get started on your free 14-day trial Links Hazelcast | The Leading In-Memory Computing Platform Chart.js | Open Source HTML5 Charts for your Website Picks Andrey- Designing Data-Intensive Applications by Martin Kleppmann Paige- Netflix Series | The Queen’s Gambit TJ- City of Stairs: A Novel (The Divine Cities) by Robert Jackson Bennett
Episode 3: A casual chat about writing and personal growth. Hosts: Yang Wen, Ou Changkun. Guests: Cao Chunhui, Rao Quancheng. Episode summary: This is episode 3 of Go Night Chat. Our guests are old friends from Go 夜读 (Go Night Reading), Cao Chunhui and Rao Quancheng, who join us to talk about their growth stories with Golang. The episode covers the story behind Cao's book "Go 语言高级编程" (Advanced Go Programming), the open source projects he has worked on, the current state of Golang in company interviews and in the domestic industry, and how to learn and grow. Timeline: 00:15 Opening 00:42 Rao and Cao introduce themselves 03:10 Behind the scenes of writing "Go 语言高级编程" 11:21 On blogging and writing 22:48 How the cch123/elasticsql open source project came about 27:17 The path to becoming a Go Contributor 40:40 On interviews 62:03 The current state of Golang in China 65:50 The most important way for engineers to learn 67:43 How to search effectively 81:15 The games we've played 88:00 The podcasts we usually listen to 89:02 Recommendations
Shreya Shankar is a Machine Learning Engineer at Viaduct AI. She's a master's student at Stanford and has previously worked at Facebook and Google Brain. She writes some truly excellent articles about machine learning on her personal blog: https://www.shreya-shankar.com/ Want to level-up your skills in machine learning and software engineering? Join the ML Engineered Newsletter: http://bit.ly/mle-newsletter Comments? Questions? Submit them here: http://bit.ly/mle-survey Follow Charlie on Twitter: https://twitter.com/CharlieYouAI Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/ Subscribe to ML Engineered: https://mlengineered.com/listen Timestamps: 01:30 Follow Charlie on Twitter (http://twitter.com/charlieyouai) 02:40 How Shreya got started in CS 06:00 Choosing to concentrate in systems in undergrad (https://www.shreya-shankar.com/systems/) 12:25 Research at Google Brain on fooling humans with adversarial examples (http://papers.nips.cc/paper/7647-adversarial-examples-that-fool-both-computer-vision-and-time-limited-humans.pdf) 18:00 Deciding to go into industry instead of pursuing a PhD (https://www.shreya-shankar.com/new-grad-advice/) 19:35 Why is putting ML into production so hard? (https://www.shreya-shankar.com/making-ml-work/) 25:00 Best of the research graveyard 29:05 Checklist for building an ML model for production 34:10 Ensuring reproducibility 39:25 Back to the checklist 44:25 PM for ML engineering 48:50 Monitoring ML deployments 53:50 Fighting ML bias 58:45 Feature engineering best practices 01:02:30 Remote collaboration on data science projects 01:07:45 AI Saviorism (https://www.shreya-shankar.com/ai-saviorism/) 01:17:40 Rapid Fire Questions Links: Why you should major in systems: https://www.shreya-shankar.com/systems/ Adversarial Examples that Fool Both Computer Vision and Time-Limited Humans: http://papers.nips.cc/paper/7647-adversarial-examples-that-fool-both-computer-vision-and-time-limited-humans.pdf Choosing between a PhD and industry for new computer science graduates: https://www.shreya-shankar.com/new-grad-advice/ Reflecting on a year of making machine learning actually useful: https://www.shreya-shankar.com/making-ml-work/ Get rid of AI Saviorism: https://www.shreya-shankar.com/ai-saviorism/ Designing Data Intensive Applications: https://dataintensive.net/
A couple of weeks ago I decided to try MongoDB in a side project and shared my first impressions of it. We touched on Aik Sargsyan's (Yula) [talk][1] "A large project on a single NoSQL". We discussed the general topic of choosing a database for an early-stage startup. We covered how MongoDB stores data, the [BSON specification][2], joins, and migrations. We mentioned the excellent book [Designing Data-Intensive Applications][3].

## Overlay fs

A good recent [article][4] explaining how Docker images are built, using overlay file systems as an example. A bit about how our projects are organized: GitLab, registry, CI, k8s. [1]: https://www.youtube.com/watch?v=ZLOFOxsDJIY [2]: http://bsonspec.org/spec.html [3]: https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321 [4]: https://dev.to/napicella/how-are-docker-images-built-a-look-into-the-linux-overlay-file-systems-and-the-oci-specification-175n
JavaScript Remote Conf 2020 May 14th to 15th - register now! In this episode of Elixir Mix, the panelists talk with Alvise Susmel about building Elixir systems that leverage Python image detection on video frames. We cover Ports vs NIFs, using platforms for their strengths, cool embedded hardware, displaying real time results in Phoenix or Scenic, and much more! Panelists Sophie DeBenedetto Mark Ericksen Guest Alvise Susmel Sponsors CacheFly ____________________________________________________________ "The MaxCoders Guide to Finding Your Dream Developer Job" by Charles Max Wood is now available on Amazon. Get Your Copy Today! ____________________________________________________________ Links Real-time Object Detection with Phoenix and Python cvlib GitHub opencv/opencv Hexdocs Elixir/Port Erlang 4 Ports Yolo Erlang 8 NIFs Jetson Nano Developer Kit GitHub boydm/scenic Poeticcoding Picks Sophie DeBenedetto: Black Hat Go Love Is Blind Mark Ericksen: Hollywood.computer Alvise Susmel: Outside Elixir Designing Data-Intensive Applications Dark
We dig into the details of how databases use B-trees as we continue our discussion of Designing Data-Intensive Applications while Michael's description of median is awful, live streaming isn't for Allen, and Joe really wants to bring us back from the break.
It's time to learn about SSTables and LSM-Trees as Joe feels pretty zacked, Michael clarifies what he was looking forward to, and Allen has opinions about Dr Who.
In this episode, Allen is back, Joe knows his maff, and Michael brings the jokes, all that and more as we discuss the internals of how databases store and retrieve the data we save as we continue our deep dive into Designing Data-Intensive Applications.
We dive into declarative vs imperative query languages as we continue to dive into Designing Data-Intensive Applications while Allen is gallivanting around London, Michael had a bullish opinion, and Joe might not know about The Witcher.
While we continue to dig into Designing Data-Intensive Applications, we take a step back to discuss data models and relationships as Michael covers all of his bases, Allen has a survey answer just for him, and Joe really didn't get his tip from Reddit.
In this episode of The Podlets Podcast, we welcome Michael Gasch from VMware to join our discussion on the necessity (or not) of formal education in working in the realm of distributed systems. There is a common belief that studying computer science is a must if you want to enter this field, but today we talk about the various ways in which individuals can teach themselves everything they need to know. What we establish, however, is that you need a good dose of curiosity and craziness to find your feet in this world, and we discuss the many different pathways you can take to fully equip yourself. Long gone are the days when you needed a degree from a prestigious school: we give you our hit-list of top resources that will go a long way in helping you succeed in this industry. Whether you are someone who prefers learning by reading, attending Meetups or listening to podcasts, this episode will provide you with lots of new perspectives on learning about distributed systems. Follow us: https://twitter.com/thepodlets Website: https://thepodlets.io Feeback: info@thepodlets.io https://github.com/vmware-tanzu/thepodlets/issues Hosts: Carlisia Campos Duffie Cooley Michael Gasch Key Points From This Episode: • Introducing our new host, Michael Gasch, and a brief overview of his role at VMware. • Duffie and Carlisia’s educational backgrounds and the value of hands-on work experience. • How they first got introduced to distributed systems and the confusion around what it involves. • Why distributed systems are about more than simply streamlining communication and making things work. • The importance and benefit of educating oneself on the fundamentals of this topic. • Our top recommended resources for learning about distributed systems and their concepts. • The practical downside of not having a formal education in software development. • The different ways in which people learn, index and approach problem-solving. 
• Ensuring that you balance reading with implementation and practical experience. • Why it’s important to expose yourself to discussions on the topic you want to learn about. • The value of getting different perspectives around ideas that you think you understand. • How systems thinking is applicable to things outside of computer science. • The various factors that influence how we build systems. Quotes: “When people are interacting with distributed systems today, or if I were to ask like 50 people what a distributed system is, I would probably get 50 different answers.” — @mauilion [0:14:43] “Try to expose yourself to the words, because our brains are amazing. Once you get exposure, it’s like your brain works in the background. All of a sudden, you go, ‘Oh, yeah! I know this word.’” — @carlisia [0:14:43] “If you’re just curious a little bit and maybe a little bit crazy, you can totally get down the rabbit hole in distributed systems and get totally excited about it. There’s no need for having formal education and the degree to enter this world.” — @embano1 [0:44:08] Learning resources suggested by the hosts: Book, Designing Data-Intensive Applications, M. Kleppmann Book, Distributed Systems, M. van Steen and A.S. Tanenbaum (free with registration) Book, Thesis on Raft, D. Ongaro. - Consensus - Bridging Theory and Practice (free PDF) Book, Enterprise Integration Patterns, B. Woolf, G. Hohpe Book, Designing Distributed Systems, B.
Burns (free with registration) Video, Distributed Systems Video, Architecting Distributed Cloud Applications Video, Distributed Algorithms Video, Operating System - IIT Lectures Video, Intro to Database Systems (Fall 2018) Video, Advanced Database Systems (Spring 2018) Paper, Time, Clocks, and the Ordering of Events in a Distributed System Post, Notes on Distributed Systems for Young Bloods Post, Distributed Systems for Fun and Profit Post, On Time Post, Distributed Systems @The Morning Paper Post, Distributed Systems @Brave New Geek Post, Aphyr’s Class materials for a distributed systems lecture series Post, The Log - What every software engineer should know about real-time data’s unifying abstraction Post, Github - awesome-distributed-systems Post, Your Coffee Shop Doesn’t Use Two-Phase Commit Podcast, Distributed Systems Engineering with Apache Kafka ft. Jason Gustafson Podcast, The Systems Bible - The Beginner’s Guide to Systems Large and Small - John Gall Podcast, Systems Programming - Designing and Developing Distributed Applications - Richard Anthony Podcast, Distributed Systems - Design Concepts - Sunil Kumar Links Mentioned in Today’s Episode: The Podlets on Twitter — https://twitter.com/thepodlets Michael Gasch on LinkedIn — https://de.linkedin.com/in/michael-gasch-10603298 Michael Gasch on Twitter — https://twitter.com/embano1 Carlisia Campos on LinkedIn — https://www.linkedin.com/in/carlisia Duffie Cooley on LinkedIn — https://www.linkedin.com/in/mauilion VMware — https://www.vmware.com/ Kubernetes — https://kubernetes.io/ Linux — https://www.linux.org Brian Grant on LinkedIn — https://www.linkedin.com/in/bgrant0607 Kafka — https://kafka.apache.org/ Lamport Article — https://lamport.azurewebsites.net/pubs/time-clocks.pdf Designing Data-Intensive Applications — https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable-ebook/dp/B06XPJML5D Designing Distributed Systems —
https://www.amazon.com/Designing-Distributed-Systems-Patterns-Paradigms/dp/1491983647 Papers We Love Meetup — https://www.meetup.com/papers-we-love/ The Systems Bible — https://www.amazon.com/Systems-Bible-Beginners-Guide-Large/dp/0961825170 Enterprise Integration Patterns — https://www.amazon.com/Enterprise-Integration-Patterns-Designing-Deploying/dp/0321200683 Transcript: EPISODE 12 [INTRODUCTION] [0:00:08.7] ANNOUNCER: Welcome to The Podlets Podcast, a weekly show that explores Cloud Native one buzzword at a time. Each week, experts in the field will discuss and contrast distributed systems concepts, practices, tradeoffs and lessons learned to help you on your cloud native journey. This space moves fast and we shouldn’t reinvent the wheel. If you’re an engineer, operator or technically minded decision maker, this podcast is for you. [EPISODE] [00:00:41] CC: Hi, everybody. Welcome back. This is Episode 12, and we are going to talk about distributed systems without a degree or even with a degree, because who knows how much we learn in university. I am Carlisia Campos, one of your hosts. Today, I also have Duffie Cooley. Say hi, Duffie. [00:01:02] DC: Hey, everybody. [00:01:03] CC: And a new host for you, and this is such a treat. Michael Gasch, please tell us a little bit of your background. [00:01:11] MG: Hey! Hey, everyone! Thanks, Carlisia. Yes. So I’m new to the show. I just want to keep it brief because I think over the show we’ll discuss our backgrounds a little bit further. So right now, I’m with VMware. So I’ve been with VMware almost for five years. Currently, I'm in the office of the CTO. I’m a platform architect in the office of the CTO and I mainly use Kubernetes on a daily basis from an engineering perspective. So we build a lot of prototypes based on customer input or ideas that we have, and we work with different engineering teams. 
Kubernetes has become kind of my bread and butter but lately more from a consumer perspective like developing with Kubernetes or against Kubernetes, instead of the former role of mostly being around implementing and architecting Kubernetes. [00:01:55] CC: Nice. Very impressive. Duffie? [00:01:58] MG: Thank you. [00:01:59] DC: Yeah. [00:02:00] CC: Let’s give the audience a little bit of your backgrounds. We’ve done this before but just to frame the episodes, so people will know how we came into distributed systems. [00:02:13] DC: Sure. In my experience, I spent – I don’t have a formal education history. I spent most of my time kind of like in high school. Then from there, basically worked into different systems administration, network administration, network architect, and up into virtualization and now containerization. So I’ve got a pretty hands-on kind of bootstrap experience around managing infrastructure, both at small-scale, inside of offices, and the way up to very large scale, working for some of the larger companies here in the Silicon Valley. [00:02:46] CC: All right. My turn I guess. So I do have a computer science degree but I don’t feel that I really went deep at all in distributed systems. My degree is also from a long time ago. So mainly, what I do know now is almost entirely from hands-on work experience. Even so, I think I'm very much lacking and I’m very interested in this episode, because we are going to go through some great resources that I am also going to check out later. So let’s get this party started. [00:03:22] DC: Awesome. So you want to just talk about kind of the general ideas behind distributed systems and like how you became introduced to them or like where you started in that journey? [00:03:32] CC: Yeah. Let’s do that. [00:03:35] DC: My first experience with the idea of distributed systems was in using them before I knew that they were distributed systems, right?
One of the very first distributed systems as I look back on it that I ever actually spent any real time with was DNS, which I consider to be something of a distributed system. If you think about it, they have name servers, they have a bunch of caching servers. They solve many of the same sorts of problems. In a previous episode, we talked about how networking, just the general idea of networking and handling large-scale architecting networks. It’s also in a way very – has a lot of analogues to distributed systems. For me, I think working with and helping solve the problems that are associated with them over time gave me a good foundational understanding for when we were doing distributed systems as a thing later on in my career. [00:04:25] CC: You said something that caught my interest, and it’s very interesting, because obviously for people who have been writing algorithms, writing papers about distributed systems, they’re going to go yawning right now, because I’m going to say the obvious. As you start your journey programming, you read job requirements. You read or you must – should know distributed systems. Then I go, “What is a distributed system? What do they really mean?” Because, yes, we understand apps talk to apps and then there is API, but there’s always for me at least a question at the back of my head. Is that all there is to it? It sounds like it should be a lot more involved and complex and complicated than just having an app talking to another app. In fact, it is because there are so many concepts and problems involved in distributed systems, right? From timing, clock, and sequence, and networking, and failures, how do you recover. There is a whole world in how do you log this properly, how do you monitor. There’s a whole world that revolves around this concept of systems residing in different places and [inaudible 00:05:34] each other.
I think there’s an analogue to this in containers, oddly enough. When people ask for a container from one of the orchestration systems, they think that that's just a thing you can ask for: you get a container, and inside of that is going to be your file system, and it’s going to do all those things. In a way, I feel like that same confusion is definitely related to distributed systems. If I were to ask 50 people who are interacting with distributed systems today what a distributed system is, I would probably get 50 different answers. I think that you got a pretty concise definition there, in that it is a set of systems that intercommunicate to perform some function. That's it at its baseline. I feel like that's a pretty reasonable definition of what distributed systems are, and then we can figure out from there what functions they are trying to achieve and what are some of the problems that we’re trying to solve with them. [00:06:29] CC: Yeah. That’s what it’s all about in my head: solving the problems. Because at the beginning, I was thinking, “Well, it must be just about communicating and making things work.” It’s the opposite of that. That’s a given. When a job posting says you need to understand distributed systems, what they are really saying is you need to know how to deal with failures, not just make it work. Making it work is sort of the easy part, but the whole world of where the failures can happen and how you handle them, that, to me, is where needing to know distributed systems comes in handy. Maybe 5% is knowing how to make things work, and 95% is knowing how to handle things when they don’t work, because failure is inevitable. [00:07:19] DC: Yeah, I agree. What do you think, Michael? How would you describe the context around distributed systems? What was the first one that you worked with? [00:07:27] MG: Exactly.
It’s kind of similar to your background, Duffie: no formal degree or education in computer science, and jumping right after high school into my first job, working with computers, computer administration. I must say that from the age of about seven, I was interested in computers and all that stuff, but more from a hardware perspective, less from a software development perspective. My take always was on disassembling the pieces and building my own computers rather than writing programs. In the early days, that was just me. So I almost completely missed the whole education in the principles and fundamentals of how you would write a program for a single computer, and then obviously also how to write programs that run across a network of computers. Over time, as I progressed in my career, especially in that first job, which was like seven years of different Linux systems and Linux administration, I dealt, like you, Duffie, with distributed systems without necessarily knowing that I was dealing with distributed systems. It was mostly storage systems, Linux file servers, but distributed file servers. Samba, if some of you recall that project. So I knew that things could fail. A server could fail, for example, or a share could not be writable, and so a client would be stuck, but I didn't necessarily relate that directly to the fundamentals of how distributed systems work or don’t work. Over time, and this is really why I appreciate the Kubernetes project and community, I got more questions, especially when this whole container movement came up. I got so many questions around how that thing works. How does scheduling work? Scheduling was close to my interest in hardware design and low-level details. But I was looking at Kubernetes like, “Okay. There is the scheduler.” In the beginning, the documentation was pretty scarce around the implementation and all the controllers and what was going on.
So I listened to a lot of podcasts and Brian Grant’s great talks and different shows that he gave in the Kubernetes space, and other people there as well. In the end, I had more questions than answers. So I had to dig deeper. Eventually, that led me to a path of wanting to understand more of the formal theory behind distributed systems by reading the papers, reading books, and taking some online classes, just to get a basic understanding of those issues. I got interested in resource scheduling in distributed systems and in consensus. Those were two areas that caught my eye, like, “What is it? How do machines agree in a distributed system if so many things can go wrong?” Maybe we can explore this later on, so I’m going to park it for a bit. But back to your question, and this was kind of a long-winded road to answering your question, Duffie. For me, a distributed system is this kind of coherent network of computer machines that, from the outside, to an end user or to another client, looks like one gigantic big machine that is [inaudible 00:10:31] to run as fast. It is also performing efficiently. It provides a lot of characteristics and properties that we want from our systems that a single machine usually can’t deliver, but it looks like one big single machine to a client. [00:10:46] DC: It is interesting. I don’t want to go too far afield, and this is probably not just a distributed systems talk, but obviously one of the questions that falls out for me when I hear that answer is: what is the difference between a microservice architecture and distributed systems? Because, to your point, the way that a lot of people learn to develop software, it’s like we’re going to develop a monolithic application just by nature. We’re going to solve a software problem using code.
Then later on, when we decide to actually scale this thing, or understand how to better operate it under a significant load, then we start thinking about, “Okay. Well, how do we have to architect this differently in such a way that it can support that load?” That’s where I feel like the two worlds cut across, right? Suddenly you’re not only talking about microservices. You’re also talking about distributed systems, because you’re going to start thinking about how to understand transactionality throughout that system, how to understand all of those consensus things that you're referring to. How do they affect it when I add the network in there? That’s cool. [00:11:55] MG: Just one comment on this, Duffie, which took me a very long time to realize. From my definition of what a distributed system is, this group of machines that perform work in a certain sense, or maybe even more abstracted, a bunch of computers networked together, what I missed most of the time, and this goes back to the DNS example that you gave in the beginning, was that the client or the clients are also part of this distributed system, because they might have caches, especially in DNS. So you always deal with this state that is distributed everywhere. Maybe you don't even know where it is distributed, and the client works with local stale data. So that is also part of a distributed system. And I want to give credit to the Kafka community and some of the engineers on Kafka, because there was a great talk I heard lately. It’s like, “Right. The client is also part of your distributed system, even though usually we think it's just the server, those many server machines, all those microservices.” At least I missed that for a long time. [00:12:58] DC: You should put a link to that talk in our [inaudible 00:13:00]. That would be awesome. It sounds great. So what do you think, Carlisia?
[00:13:08] CC: Well, one thing that I wanted to mention is that Michael was saying how he’s been self-teaching distributed systems, and I think if we want to be competent in the area, we have to do that. I’m saying this to myself even. It’s very refreshing when you read a book or you read a paper and you really understand the fundamentals of an aspect of distributed systems. A lot of things fall into place. I’m saying this because even prioritizing reading about and learning the fundamentals is really hard for me, because you have your life. You have things to do. You have the minutiae of things to get done. So many times, I struggle. But in the rare occasions where I go, “Okay. Let me just learn this stuff,” it makes such a difference. Then once you learn it, it stays with you forever. It’s so refreshing to read a paper and understand things at a different level, and that is what this episode is about. I don’t know if this is the time to jump into our recommendations. I don't know how deep, Michael, you’re going to go. You have a ton of things listed. Everything we mention on the show is going to be on our website, in the show notes, so nobody needs to be taking notes. Another thing I wanted to say is it would be lovely if people would get back to us once you’ve listened to this. Let us know if you want to add anything to this list. That would be awesome. We can even add it to the list later and give a shout-out to you. [00:14:53] MG: Right. I don’t want to cover this whole list. I just wanted to be as complete as possible about the stuff that I’ve read or watched. So I put it all in, and I picked some highlights, if you want. [00:15:05] CC: Yeah. Go for it. [00:15:06] MG: Yeah. Okay. Perfect.
Honestly, even though it’s not the first item on the list, the first thing that I read, coming from my history of how I approach things, was a search for how computers work, what some of the issues are, and how computers and machines agree. The classic paper that I read was the Lamport paper, “Time, Clocks, and the Ordering of Events in a Distributed System”. I want to be honest. The first time I read it, I didn’t really get the full essence of the paper. The mathematical proof in there didn't click for me immediately, and there were so many concepts from physics and about time thrown at me, where I was looking for answers and ended up with more questions than answers. But this is no fault of Leslie’s. It’s more that at the time I just wasn't prepared for how deep the rabbit hole goes. So I thought, if someone asked me, out of this huge list and all the other resources, “I only have time to read one book. Which one would it be? Which one would you recommend?” I would recommend Designing Data-Intensive Applications by Martin Kleppmann. I’ve been following his blog posts and some partial releases that he did before fully releasing that book, which took him more than four years. It’s almost the state-of-the-art Bible when it comes to all the concepts in distributed systems. Obviously consensus, network failures, and all that stuff, but then also leading into modern data streaming and data platform architectures inspired by, for example, LinkedIn and other communities. So that would be the book I would recommend to someone who only has time to read one book. [00:16:52] DC: That’s a neat approach. I like the idea of: if you had one thing, one way to help somebody ramp up on distributed systems, what would it be? For me, oddly enough, I don't think I would recommend a book.
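For readers who haven't met the Lamport paper yet, its core mechanism, a logical clock that orders causally related events, is small enough to sketch in a few lines. This is a minimal illustrative version, not the paper's full treatment:

```python
class LamportClock:
    """Minimal Lamport logical clock, after the 1978 'Time, Clocks...' paper.

    Each process keeps a counter. It ticks on local events, stamps
    outgoing messages, and on receive jumps past the sender's stamp,
    so causally related events always get increasing timestamps.
    """

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event happened: advance the clock.
        self.time += 1
        return self.time

    def send(self):
        # Sending a message counts as an event; the stamp travels with it.
        return self.tick()

    def receive(self, msg_time):
        # Merge: jump past whichever clock is ahead, then tick.
        self.time = max(self.time, msg_time) + 1
        return self.time

# Two processes: a does a local event, then sends a message to b.
a, b = LamportClock(), LamportClock()
a.tick()            # a's clock -> 1
stamp = a.send()    # a's clock -> 2; message carries stamp 2
b.receive(stamp)    # b's clock -> max(0, 2) + 1 = 3
```

The point of the construction is that `b`'s receive event is stamped later than `a`'s send, even though neither process has any shared wall clock.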
I feel like I would probably drive them toward a project like the Kind project and say, “This is a distributed system all by itself.” Start tearing it apart to pieces, seeing how they work, breaking them, and then exploring and just playing with the parts. You can do a lot of really interesting things. There is actually another book on your list, written by Brendan Burns, called Designing Distributed Systems, I think. In that book, I think he actually uses Kubernetes as a model for how to go about achieving these things, which I think is incredibly valuable, because it really gets into some of the more stable distributed systems patterns that are around. I feel like that's a great entry point. So if I had to pick one way to help somebody, or to push somebody in the direction of trying to learn distributed systems, I would say: identify those distributed systems that maybe you’re already aware of and really explore how they work, what the problems with them are, and how they went about solving those problems. Really dig into the idea of it. It’s something you can put your hands on and play with. Kubernetes is a great example of this, and this is actually why I referred to it. [00:18:19] CC: The way that works for me when I’m learning something like that is to really think about where the boundaries are, where the limitations are, where the tradeoffs are. If you can take a smaller system, maybe something like the Kind project, and identify what those things are. If you can’t, then ask around. Ask someone. Google it. I don’t know. Maybe it would be a good episode topic for us to do that, to map these things out, so we can understand better and help people understand things better. So maybe try to identify what the distributed systems around you are. But for people who don’t even know what they could be, it’s harder to identify them.
I don’t know what a good strategy for that would be, because you can read about distributed systems and then you can go and look at a project. How do you map the concepts you’re learning to what you’re seeing in the code base? For me, that’s the hardest thing. [00:19:26] MG: Exactly. Something related that I experienced was when I went into software development without having a formal education in algorithms and data structures. Sometimes, in your head, you have the problem statement and you're like, “Okay. I would do it like that.” But you don't know the word that describes, for example, a heap or a queue, because no one ever told you that this is a heap, that is a queue, or that is a stack. So, for me, reading the book was a bit easier. Even though I had done distributed systems administration, if you will, for many years, many years ago, I didn't realize that it was a distributed system, because I never had this definition, I never had those failure scenarios in mind, and I never had a word for consensus. So how would I search for something like how machines agree? I mean, if you put that into Google, you’ll likely get a lot of stuff back. But if you put in “consensus algorithm”, you’ll likely get a good hit on what the answer should be. [00:20:29] CC: It is really problematic when we don't know the names of things, because what you said is so right. We are probably doing a lot of distributed systems without even knowing that that’s what it is. Then we go into a job interview, and people ask, “Oh! Have you done distributed systems?” No. You have, but you just don’t know what to call things. But that’s one – [00:20:51] DC: Yeah, exactly. [00:20:52] CC: Yeah. Right? That’s one issue. Another issue, which is a bigger issue, though, at least for me. I don’t want to speak for anybody else, but for me definitely.
If I can’t name things and I face a problem and I solve it, every time I face that problem it’s a one-off thing, because I can’t map it to a higher concept. So every time I face that problem, it’s like, “Oh!” It’s not like, “Oh, yeah! This is this kind of problem. I have a pattern. I’m going to use that on this problem.” So that’s what I’m saying. Once you learn a concept, you need to be able to name it. Then you can map that concept to the problems you have. All of a sudden, you have like three things [inaudible 00:21:35] use to solve this problem, because as you work with computers and coding, you see the same thing over and over again. But when you don’t understand the fundamentals, things are just a bunch of different one-offs. It’s like when you have an argument with your spouse or girlfriend or boyfriend. Sometimes, you argue 10 times in a month and you think, “Oh! I had 10 arguments.” But if you stop and think about it, no. We had one argument 10 times. Having 10 problems is very different from having 1 problem 10 times, if that makes sense. [00:22:12] MG: It does. [00:22:11] DC: I think it does, right? [00:22:12] MG: I just want to agree. [00:22:16] DC: I think it does make sense. I think it’s interesting. You’ve highlighted an interesting pattern in the way that people learn. Some people are able to read about patterns or software patterns or algorithms or architectures and suddenly have that be an index in their heads. They can then later on correlate what they've read with the experience they’re having with the things they're working on. For others, it needs to be hands-on. They need to actually be able to explore that idea, understand and manipulate it, and be able to describe how it works or functions in reality. They need to have that “I need to touch it to understand it” kind of experience.
Those people also, as they go through those experiences, start building this index of patterns or algorithms in their head. They have this thing that they can correlate to, right? Like, “Oh! This is a timing problem,” or, “This is a consensus problem,” or what have you. [00:23:19] CC: Exactly. [00:23:19] DC: You may not know the word for that thing, but you're still going to develop a pattern in your mind, the ability to correlate this particular problem with some pattern that you’ve seen before. What's interesting is I feel like people have taken different approaches to building that index. For me, it’s been troubleshooting. Somebody gives me a hard problem, and I dig into it and figure out what the problem is, regardless of whether it has to do with distributed systems or cooking. It could be anything, but I always want to get right in there, figure out the problem, and start building a map in my mind of all of the players that are involved. For others, with an educational background, I think you sometimes come to this with a set of patterns already instilled that you understand, and you're just trying to apply those patterns to the experience you’re having instead. It’s like cart before the horse or horse before the cart. It’s very interesting when you think about it. [00:24:21] CC: Yes. [00:24:22] MG: The recommendation that I want to give to people who are like me, who like reading, is that I went overboard a bit in the beginning, because I was so fascinated by all the stuff, and I went down the rabbit hole deeper and deeper and deeper. Reading and reading and reading. At some point, I even came to weird YouTube channels that talk about, “Is time real, and where does time emerge from?” It became almost philosophical, the path I went down.
Now, the thing is, and this is why I like Duffie’s approach of breaking things, of trying to break things and understanding how they work and how they can fail, is that you immediately practice. You’re hands-on. So that would be my advice to people who, like me, are fascinated by reading and all the theory: your brain and your mind are not really capable of absorbing all the stuff and remembering it without practicing. Practicing can be breaking things or installing things or administrating things or even writing software. For me, it was a late realization that I should have started doing things earlier, rather than spending all that time reading. [00:25:32] CC: By doing, you mean hands-on? [00:25:35] MG: Yeah. [00:25:35] CC: Anything specific that you would have started with? [00:25:38] MG: Yes. So going back those 15 years, to my early days of Linux and Samba. At the time, I think it was written in C or C++. But the problem was I wasn’t able to read the code. So the only thing that I had back then was some mailing lists, and asking questions, and not even knowing which questions to ask for lack of the words to understand. Now, fast-forward to the Kubernetes era, which got me deeper into distributed systems: I still couldn't read the code at first because I didn't know [inaudible 00:26:10]. But I forced myself to read the code, which helped a little bit in understanding what was going on, because the documentation back then was lacking. These days, it’s easier, because you can just install [inaudible 00:26:20] way easier today. The hands-on piece, I mean. [00:26:23] CC: You said something interesting, Michael, and I have given this advice before, because I use this practice all the time. It's so important to have a vocabulary. Like you just said, you didn't know what to ask because you didn’t know the words. I practice this all the time.
To people who are in this position, with distributed systems or whatever it is, or something more specific that you are trying to learn: try to expose yourself to the words, because our brains are amazing. Once you get exposure, your brain works in the background, and all of a sudden you go, “Oh, yeah! I know this word.” So podcasts are great for me. If I don't know something, I will look for a podcast on the subject and start listening to it. As the words get repeated in context, I don’t have to go and get a degree or anything. Just by listening to the words being spoken in context, you absorb the meaning. So podcasting is great, or YouTube, or anything that you can listen to. And reading too, of course. The best thing is talking to people. But, again, sometimes it’s not trivial to put yourself in positions where people are discussing these things. [00:27:38] DC: There are actually a number of Meetups here in the Bay Area, and the whole Meetup thing is nationwide across the entire US, and around the world it seems like now. There are Meetups in a number of different subject areas. There’s one here in the Bay Area called Papers We Love, where they actually explore interesting technical papers, which are obviously a great place to learn the words for things, right? This is actually where those words are being defined. When you get into the consensus stuff, they really get into it. One example is Raft. There are many papers on Raft and many papers on other things that get into consensus. So whether you explore a Meetup on distributed systems, or on a particular application, or around a particular theme like Kubernetes, those things are great places to get more exposure to how people are thinking about these problems. [00:28:31] CC: That is such a great tip. [00:28:34] MG: Yeah.
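As a taste of what the consensus vocabulary above (Raft, quorums, partitions) is pointing at, here is a toy illustration of majority quorums, the overlap property that underlies Raft and Paxos. It is not an implementation of either algorithm, and the node names are invented:

```python
def majority_agree(nodes, reachable):
    """Toy quorum check: a proposal is accepted only if a strict majority
    of ALL nodes acknowledge it. Because any two majorities of the same
    set must overlap in at least one node, two conflicting decisions can
    never both win. This is the intuition under Raft and Paxos, not an
    implementation of either.
    """
    acks = sum(1 for n in nodes if n in reachable)
    return acks > len(nodes) // 2

nodes = ["n1", "n2", "n3", "n4", "n5"]

# With 3 of 5 nodes reachable, a proposal can still be accepted...
committed = majority_agree(nodes, reachable={"n1", "n2", "n3"})

# ...but the other side of that network partition only has 2 nodes,
# so it can never form a competing majority.
minority_side = majority_agree(nodes, reachable={"n4", "n5"})
```

Knowing the phrase "majority quorum" is exactly the kind of vocabulary hook Michael describes: it turns "how do machines agree?" into a searchable term.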
Podcasts are twice as good for non-native English speakers, too. The thing is that the word you’re looking for might be totally different from the English word. For example, consensus in German has a totally different connotation. So if I looked that up in German, I would likely find nothing, or nothing really related at all. So you have to go through translation to find the stuff. With what you said, Duffie, with PWL, Papers We Love, or podcasts, those words are usually in English: consensus, or sharding, or partitioning. Those are the words that you can at least look up to see what they mean. That’s what I did as well. [00:29:16] CC: Yes. I also want to give a plus one for Papers We Love. They are everywhere, and they also have an online version of the Papers We Love Meetup, and a lot of the local ones film their meetups. So you can go through the history and see if they talked about any paper that you are interested in. I’m sure multiple locations talk about the same paper, so you can get different takes too. It’s really, really cool. Sometimes, it’s completely obscure, like, “I didn’t get a word of what they were saying. Not one. What am I doing here?” But sometimes, they talk about things where you at least know what the thing is, and you get like 10% of it. For people who deal with papers day in and day out, it’s very much – I don’t know. [00:30:07] DC: It’s super easy when going through a paper like that to have imposter syndrome wash over you, right, because you’re like – [00:30:13] CC: Yes. Thank you. That’s what I wanted to say. [00:30:15] DC: I feel like I’ve been in this for 20 years. I probably know a few things, right? But I’m reading this consensus paper going, “Can I buy a vowel? What is happening?” [00:30:24] CC: Yeah. Can I buy a vowel? That’s awesome, Duffie.
[00:30:28] DC: But the other piece I want to call out, to your point, which I think is important, is that some people don't want to go out and be there in person. They don’t feel comfortable or safe exploring those things in person. So there are tons of resources, like you just pointed out, such as the online version of Papers We Love. You can also sign into Slack and just interact with people via text messaging, right? There are a lot of really great resources out there for people of all types, whatever the amount of time you have. [00:30:53] CC: For Papers We Love, it’s like going to language class. If you go and take a class in Italian, on your first day, even though it’s going to be super basic, you’re going to be like, “What?” You go back in your third week and you start, “Oh! I’m getting this.” Then at a month, three months, “Oh! I’m starting to be competent.” So you go once, and you’re going to feel lost and experience imposter syndrome. But you keep going, because there is a format. First, you start absorbing what the format is, and that helps you understand the content. Once your mind absorbs the format, you’re like, “Okay. Now I know how to navigate this. I know what’s coming next.” So you don’t have to focus on that. You start focusing on the content. Then little by little, you become more proficient in understanding. Very soon, you’re going to be willing to write a paper. I’m not there yet. [00:31:51] DC: That’s awesome. [00:31:52] CC: At least that’s how I think it goes. I don’t know. [00:31:54] MG: I agree. [00:31:55] DC: It’s also changed over time. It’s fascinating. If you read papers from 20 years ago and papers that are written more recently, it's interesting. The papers have changed their language when considering competition. When you're introducing a new idea with a paper, frequently you are introducing it into a market full of competition.
You're being very careful about the language, almost in a way that complicates the idea rather than making it clear, which is challenging. There are definitely some papers that I’ve read where I was like, “Why are you using so many words to describe this simple idea?” It makes no sense, but yeah. [00:32:37] CC: I don’t want to make this episode all about Papers We Love, but it was so good that you mentioned it, Duffie. It’s really good to be in a room, or watching something online, where you see people asking questions: “Why is X like this?” or, “Why is Y doing that?” Then you go, “Oh! I didn’t even think that X was important. I didn’t even know that Y was important.” So you start picking up what the important things are, and that’s what makes it click: you hook into the important concepts, because people who know more than you are pointing them out and asking questions. You start learning what the main things are that you should be paying attention to, which is different from reading the paper by yourself, where it’s just a ton of content that you need to sort through. [00:33:34] DC: Yeah. I frequently describe myself as a perspective junkie, because I feel like, for any of us to really learn more about a subject that we feel we understand, we need the perspective of others to expand our understanding of that thing. I feel like I know how to make a peanut butter and jelly sandwich. I’ve done it a million times. It’s a solid thing. But then I watch my kid do it and I’m like, “I hadn’t thought of that problem.” [inaudible 00:33:59], right? This is a great example of that. Communities like Papers We Love are a great opportunity to understand the perspective of others around these hard ideas. When we’re trying to understand complex things like distributed systems, this is where it’s at. This is actually how we go about achieving this.
There is a lot that you can do on your own, but there is always going to be more that you can do together, right? You can always do more. You can always understand an idea faster. You can understand the complexity of a system and how to break it down by exploring it with other people. That's, I feel like – [00:34:40] CC: That is so well said, so well said, and it’s the reason for this show to exist, right? We come on the show and we give our perspectives, and people get to learn from people with different backgrounds and hear their takes on distributed systems and cloud native. So this was a major plug for the show. Keep coming back. You’re going to learn a ton. Also, it was funny that this was the second time you made a cooking reference, Duffie, which brings me to something I want to make sure I say on this episode. I added a few things for reference, three books. But the one that I definitely would recommend starting with is The Systems Bible by John Gall. This book is so cool, because it helps you see everything through systems. Everything is a system. A conversation can be a system. An interaction between two people can be a system. I’m not saying the book says that. It’s just my interpretation: cooking is a system. There is a process. There is a sequence. It’s really, really cool, and it really helps to have things framed in this way and then go out and read the other books on systems. I think it helps a lot. This is definitely what I am starting with and what I would recommend people start with, The Systems Bible. Did you two know this book? [00:36:15] MG: I did not. [00:36:17] DC: I’m not aware of it either, but I really appreciate the idea. I do think that’s true. If you develop a skill for understanding systems as they are, frequently what you’re developing is the ability to recognize patterns, right? [00:36:32] CC: Exactly.
[00:36:32] DC: You can recognize those patterns in anything. [00:36:37] MG: Yeah. That's a good segue for something that just came to my mind. Recently, I gave a talk on event-driven architectures. For someone who's not a software developer or architect, it can be really hard to grasp all those concepts around asynchrony and eventual consistency and idempotency. There are so many words, like, “What is all this? It sounds weird, way too complex.” But I was reading a piece some years ago by Gregor Hohpe. He’s the guy behind Enterprise Integration Patterns, which is also a book that I have on my list here. He said, “Your barista doesn't use two-phase commit.” He was basically making this analogy: he was in a coffee shop, just looking at the process of how the barista makes the coffee, how you pay for it, and all the things that can go wrong while your coffee is brewed and served to you. So he was drawing this relation between the real world, human life and society, and computer systems. That's when it clicked for me: so many problems we solve every day, for example agreeing on a time when we should meet for dinner, or cooking, are consensus problems, and we solve them. We even solve them in the case of failure. I might not be able to call Duffie, because he is not available right now, and somehow we figure it out. I always thought that those problems only exist in computer science and distributed systems, but I realized that's actually just a subset of the real world as it is. Look at those problems through the lens of your daily life, and there are so many things that are related to computer systems. [00:38:13] CC: Michael, I missed it. Was it an article you read? [00:38:16] MG: Yes. I need to put that in there as well. Yeah. It’s a plug. [00:38:19] CC: Please put that in there. Absolutely. So, far from being any kind of expert in distributed systems, I have noticed something.
I have caught myself using systems thinking for even complicated conversations. Even in my personal life, I started approaching things in a systems-oriented way. Just a high-level example: when I am working with systems, I can approach them from the beginning or the end. It's like putting a puzzle together, right? Sometimes, it starts from the middle. Sometimes, it starts from the edges. When I'm having conversations where I need to be very strategic, like I have one shot – let's say maybe I'm in a school meeting and I have to reach a consensus or have a solution or a plan of action – I have to ask the right questions. My private self would historically do things linearly: "Let's go from the beginning and work through to the end." Now, I don't do that anymore. Not necessarily. Sometimes, I think, "Let me maybe ask the last question I would ask and see where it leads," and just approach things a different way. I don't know if this is making sense. [00:39:31] MG: It does. It does. [00:39:32] CC: But my thinking has changed. The way I see the possibilities is not a linear thing anymore. I see how you can truly switch things around. I use this in programming a lot and also in writing. When you're a beginner writer, you start at the top and you go down to the conclusion. Sometimes, I start in the middle and go up, right? So you can start anywhere. It's beautiful, and it gives you so many more options. Or maybe I'm just crazy. Don't listen to me. [00:40:03] DC: I don't think you're crazy. I was going to say, one of the funny things about Michael's point and your point both is that, in a way, they have kind of referred to Conway's law, the idea that people will build systems in the way that they communicate. So this actually totally brings it back to that same point, right? We by nature will build systems that we can understand, because that is the constraint in which we have to work, right? So it's very interesting.
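As an aside on the idempotency Michael mentions in the barista example: the core idea is that handling the same request twice, for instance after a client retries a lost response, must not apply the side effect twice. A minimal sketch, where the CoffeeCounter class and the request IDs are made up for illustration, not anything from the episode:

```python
# Minimal sketch of an idempotent handler: the same request ID can be
# retried any number of times without repeating the side effect.

class CoffeeCounter:
    def __init__(self):
        self.orders_served = 0
        self._seen = set()  # request IDs we have already handled

    def serve(self, request_id: str) -> int:
        # Only count a request the first time we see it.
        if request_id not in self._seen:
            self._seen.add(request_id)
            self.orders_served += 1
        return self.orders_served

counter = CoffeeCounter()
counter.serve("req-1")
counter.serve("req-1")  # a retry of the same order after a lost response
counter.serve("req-2")
print(counter.orders_served)  # prints 2, not 3
```

This is the same trick a barista uses informally: if you claim you already paid and they remember you, they don't charge you again.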
[00:40:29] CC: Yeah. But it's an interesting thing, because we are [inaudible 00:40:32] by the way we are forced to work. For example, I work with constraints, and what I'm saying is that that has been influencing my way of thinking. So, yes, I build systems in the way I think, but also the constraints that I'm dealing with, the tradeoffs I need to make, turn around and influence the way I think, the way I see the world, and the rest of the systems and all the rest of the world. Of course, as I change my thinking, possibly you can theorize that you go back and apply that – apply things that you learn outside of your work back to your work. It's a beautiful back-and-forth, I think. [00:41:17] MG: I had the same experience when I had to design my first API and think, "Okay. What would the consumer contract be, and what would a consumer expect me to deliver in response, and so on?" I was forcing myself to be explicit in communicating, not throwing everything back at the client and confusing them, but being very explicit and precise. In everyday communication too, when you talk to people, being explicit and precise really helps to avoid a lot of problems and trouble, be it in a partnership or amongst friends or at work. This is what I took from computer science back into my real world: perceiving things from a different perspective and being more precise and explicit in how I respond or communicate. [00:42:07] CC: My take on what you just said, Michael, is that we design systems thinking about how they are going to fail. We know this is going to fail. We're going to design for that. We're going to implement for that. In real life, for example, if I need to get an agreement from someone, I try to understand the person's thinking and just go – I just had this huge thing this week. This is in my mind. I'm not constantly thinking about this, I'm not crazy like that.
Just a little bit crazy. It's like, "How does this person think? What do they need to know? How far can I push?" Right? We need to make a decision quickly, so the approach is everything, and sometimes you only get one shot. I mean, correct me if I'm wrong. That's how I heard or interpreted what you just said. [00:42:52] MG: Yeah, absolutely. Spot on. Spot on. So I'm not crazy either. [00:42:55] CC: Basically, I think we ended up turning this episode into a bit of "here are great references," and also a huge endorsement for really going deep into distributed systems, because it's going to be good for your jobs. It's going to be good for your life. It's going to be good for your health. We are crazy. [00:43:17] DC: I'm definitely crazy. You guys might be. I'm not. All right. So we started this episode with the idea of coming to learn distributed systems perhaps without a degree or without a formal education in it. We talked about a range of different ideas on that subject, like the different approaches that each of us took and how each of us sees the problem. Is there any important point that either of you wants to throw back into the mix here or bring up in relation to that? [00:43:48] MG: Well, what I take from this episode, being my first episode and getting to know your backgrounds, Duffie and Carlisia, is that whoever is going to listen to this episode, whatever background you have, even if you're not in computer systems or the industry at all, I think the three of us have all proved that, if you're just a little bit curious and maybe a little bit crazy, you can totally go down the rabbit hole of distributed systems and get totally excited about it. There's no need to have a formal education or a degree to enter this world. It might help, but it's not the high bar that I perceived it to be 10 years ago, for example. [00:44:28] CC: Yeah. That's a good point.
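Michael's earlier point about being explicit and precise in a consumer contract can be sketched roughly like this, where OrderStatus, OrderResponse, and place_order are hypothetical names invented for illustration, not anything described in the episode:

```python
# A hedged sketch of an explicit consumer contract: the response type
# names exactly what the caller gets, instead of echoing back arbitrary
# internal state. All names here are illustrative assumptions.

from dataclasses import dataclass
from enum import Enum

class OrderStatus(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"

@dataclass(frozen=True)
class OrderResponse:
    order_id: str
    status: OrderStatus
    detail: str  # a precise, human-readable reason, never a raw dump

def place_order(item: str) -> OrderResponse:
    """The contract: the caller always gets an OrderResponse, every field
    is intentional, and failures are stated explicitly."""
    if not item:
        return OrderResponse(order_id="", status=OrderStatus.REJECTED,
                             detail="item must be non-empty")
    return OrderResponse(order_id="order-1", status=OrderStatus.ACCEPTED,
                         detail=f"{item} queued")

print(place_order("espresso").status.value)  # prints "accepted"
```

The design choice mirrors the conversational point: a small, named, frozen response type forces the producer to decide in advance exactly what the consumer needs, rather than leaking whatever happens to be lying around.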
My takeaway is that it always puzzled me how some people are so good and experienced, such experts in distributed systems. I always looked at myself and thought, "How am I lacking? What memo did I miss? What class did I miss? What project did I not work on to get the experience?" What I'm seeing is you just need to put yourself in that place. You need to do the work. But the good news is achieving competency in distributed systems is doable. [00:45:02] DC: My takeaway is, as we discussed before, I think that there is no one thing that comprises a distributed system. It is a number of things, right – basically a number of behaviors or patterns that we see that comprise what a distributed system is. So when I hear people say, "I'm not an expert in distributed systems," I think, "Well, perhaps you are and maybe you don't know it yet." Maybe there's some particular set of patterns with which you are incredibly familiar. Maybe you understand DNS better than the other 20 people in the room. That exposes you to a set of patterns that certainly gives you the capability of saying that you are an expert in that particular set of patterns. So, to both of your points, you can enter this space of learning about distributed systems from pretty much any direction. You can come to it from a CIS background. You can come to it with no computer experience whatsoever, and it will obviously take a bit more work. But this is really just about developing an understanding of how these things communicate and the patterns with which they accomplish that communication. I think that's the important part. [00:46:19] CC: All right, everybody. Thank you, Michael Gasch, for being with us now. I hope to – [00:46:25] MG: Thank you. [00:46:25] CC: To see you in more episodes [inaudible 00:46:27]. Thank you, Duffie. [00:46:30] DC: My pleasure. [00:46:31] CC: Again, I'm Carlisia Campos. With us were Duffie Cooley and Michael Gasch.
This was episode 12, and I hope to see you next time. Bye. [00:46:41] DC: Bye. [00:46:41] MG: Goodbye. [END OF EPISODE] [00:46:43] ANNOUNCER: Thank you for listening to The Podlets Cloud Native Podcast. Find us on Twitter at https://twitter.com/ThePodlets and on the http://thepodlets.io/ website, where you'll find transcripts and show notes. We'll be back next week. Stay tuned by subscribing. [END] See omnystudio.com/listener for privacy information.
We're comparing data models as we continue our deep dive into Designing Data-Intensive Applications, as Coach Joe is ready to teach some basketball, Michael can't pronounce 6NF, and Allen measured some geodesic distances just this morning.
We dig into what it takes to make a maintainable application as we continue to learn from Designing Data-Intensive Applications, as Allen is a big fan of baby Yoda, Michael's index isn't corrupt, and Joe has some latency issues.
We continue to study the teachings of Designing Data-Intensive Applications, while Michael's favorite book series might be the Twilight series, Joe blames his squeak toy chewing habit on his dogs, and Allen might be a Belieber.
We start our deep dive into Joe's favorite new book, Designing Data-Intensive Applications, as Joe can't be stopped while running downhill, Michael might have a new spin on #fartgate, and Allen doesn't quite have a dozen tips this episode.
Panel: Divya Sasidharan Charles Max Wood Joe Eames John Papa Chris Fritz Erik Hanchett Special Guest: Sarah Drasner In this episode, the panel talks with Jacob Schatz and Taylor Murphy, who are a part of the GitLab team. Jake is a staff developer, and Taylor is a manager at GitLab who started off as a data engineer. To find out more about the GitLab team, check them out here! Also, they are looking to hire, so inquire about the position through GitLab if interested! The panel talks about Vue, Flux, Node, Flask, Python, D3, and much...much more! Show Topics: 1:51 – Chuck: Introduce yourselves, please. 1:55 – Backgrounds of the guests. 2:45 – Chuck. 2:51 – GitLab (GL): We adopted Vue at GitLab two years ago. 3:34 – Chuck: What's your workflow like with Vue? 3:50 – GL: We are using an application that...using Python and Flask on the backend. Vue CLI throughout development. 4:35 – Panel asks a question. 4:40 – GitLab answers the question. 5:38 – Panel: Tell us about your secret project? 5:49 – GL: On the data team at GL, we are trying to solve these questions. How do you get from resume to hire? There is data there. So that's what Meltano helps with. Taylor has a Ph.D. in this area, so he knows what he's talking about. 7:30 – Taylor dives into this project via GitLab. 8:52 – GL: The super cool thing is that we are figuring out different ways to do things. It's really cool stuff that we are doing. 9:23 – Panel: I've worked on projects where the frontend people and the data people are doing 2 different things, and they don't know what each other group is doing. It's interesting to bring the two things together. I see that teams have a hard time working together when it's too separated. 10:31 – Panel: Can we get a definition of data scientist vs. data engineer? 10:44 – Panel: Definitions of DATA SCIENCE and DATA ENGINEER. 11:39 – GL: That is pretty close. Data science means different things to different people. 12:51 – Panel chimes in.
13:00 – Panel asks a question. 13:11 – GL: When I started working on Meltano... 14:26 – Panel: Looker is a visualization tool; I thought: I bet we can make that. I have been recreating something like Looker. We are trying to replace Looker. We are recreating a lot of the functionality of Looker. 15:10 – Panel: Will this be called...? 15:31 – Meltano Analyze is a part of Meltano. The cool thing about Looker is it has these files that show the whole visualization – drag and drop. With these files, we can do version control. It's built in – and if you drag, it's a part of a database. We took these files and we... 17:37 – Panel: Define Vue for that, please? 17:49 – GL dives into this topic. 18:40 – GL mentions Node. 18:52 – Chuck: What format does your data take? Do you have different reports that get sent? How does that work? 19:13 – GL: It tells a list of measures and dimensions. I set up our database to... 20:13 – Panel: Question. You chose Vue and it's working. In reality, you could have chosen any other tools. Why did you really choose Vue? 20:30 – GL: I know Vue really well. In the early 2000s I had my... If I have to repeat a process, I always use Vue, because it's the thing I am most comfortable with. This is how I program things very quickly. 21:10 – Panel: How has Vue met, exceeded, or not met those expectations? 21:20 – GL: It has exceeded my expectations. One of the things is that, as I am trying to staff a team, I am trying to write Vue so that when people see it, they don't think, "Why would he do that?" 22:53 – Flux-inspired architecture. 23:07 – GitLab continues the talk. 23:21 – Everything is Flux-inspired in the sense that it was an idea to start with, and then everybody made alterations and built things on top of that. 23:48 – Panel chimes in. 24:35 – Panel: Can you speak on the workflow and process you work in, Taylor, and the data science and frontend of it? 24:54 – GL: It's the same but different.
GitLab talks about Meltano some more, and also Taylor. GL: Taylor is trying to solve all these problems through Meltano. Maybe we can build our own tools? 26:05 – Panel: What's a Lever Extractor?! 26:14 – GL: Answers this question. 26:25 – Panel: So it's not a technical term...okay. 26:30 – GitLab continues the talk and discusses different tools. 27:18 – Panel: You are grabbing that data and Taylor is doing his magic? Or is it more integrated? 27:32 – GL answers this question. 29:06 – GitLab: In the beginning we are building the extractors for the other team, but later... The cool thing about Meltano is making it like WordPress. We have an extractor; different directories and other things will be discovered by Meltano and discovered by the GUI. If you write it correctly, it can hook on to it. 30:00 – Digital Ocean Advertisement 31:38 – Panel: Meltano is a mix between Python and JavaScript or Vue? 30:43 – GL: Yeah... 31:20 – Panel asks a question. How are you orchestrating the data? 31:32 – GL: Eventually it will happen with GitLab CI. We are thinking we can orchestrate it other ways. Right now it's manual. 32:33 – GL: I like finding some sort of language that doesn't have an extension...and writing... 32:54 – GL: I'm excited to use a tool that does things the right way, like loading and transforming data, but where the frontend can be a joy to use. A previous company that I worked with made me think: it would be a joy to work with something that connects to things that make sense and does things the "right way". I hope that's what we can do with Meltano. I'm not a frontend person, but I appreciate it. 34:03 – GL: This is what I'm going to do...we will have these conversations between Taylor, myself, and our teams. 34:53 – Panel: This is a tool that people need to DL; maybe you guys will host this somewhere as a service? 35:10 – GL: We are trying to get this running. Small steps. It's not out of the question, and it's not out of the question for this to be a service.
35:33 – GL: What do you want to do with the data warehouse? Your data is yours. 36:06 – Panel: Yeah, you don't want to be in charge of that. 36:17 – Panel: Have we asked where the name Meltano came from? 36:30 – GL: It sounds like a weird name. Here is the background of where the name "Meltano" came from. The first name was from a sperm whale; it's a unique name: Cachalot. 38:02 – GL: Conversation continues. 38:38 – Panel chimes in. 38:58 – GL: What is this program offering and doing...This was to help me with the name. 39:27 – GL: Acronym for Meltano: Model / Extract / Load / Transform / Analyze / Notebook / Orchestrate 39:47 – GL continues. They talk about notebooks. 40:19 – Sounds like a Daft Punk album! 40:28 – GL: I am trying to get more on the data science side. 40:57 – Panel: Question. Is Meltano super responsive and quick? 41:17 – GL: It depends on the size of the data, of course, but it is very responsive. 42:11 – GL: That job took 7-8 hours to extract everything for that specific project. 42:39 – GL: There are a lot of moving parts, so that could slow it down or speed it up. 43:01 – When you were building Meltano for your team, how do you make decisions on what exactly you are visualizing? 43:18 – GL: That is the tricky part...you are one team. We are trying to find a point where the data team is happy. One thing, for example: I put out a bar chart. A team member said that bar charts should always be vertical. So I am learning how they work and their wealth of information on visualization. 44:33 – Panel: Chris always does visualization. 44:48 – GL: Emily is on the team and knows a lot about that – the correct way to visualize data so it doesn't just look "cool." You want it to be useful. Chart JS is what I use. 45:32 – Panel: I have used Chart JS before, too. 46:00 – Chris: I really like... 46:37 – Panel continues this conversation. 47:01 – Panel: Keynote will be given by...at this conference.
47:11 – GL continues the conversation. From nothing to something in a short amount of time. When I showed people: 47:55 – Panel: Are you using Vue transitions? 48:09 – GL: Nope, not even slightly. My plan was to use Vue transitions, but that's icing on the cake. Just get it working. 48:29 – Panel: A link of how I use... 49:14 – GL: This is a very small amount of code to get where you are. It's not like you had to re-implement triangles or anything like that. 49:36 – Panel: It does take some time, but once you get it – you get it. 49:59 – Panel: When working with axes, it can get hairy. 50:52 – GL: D3 really does a lot of the math for you and fits right in once you know how it works. You can draw anything with HTML. Check Links. 52:19 – Panel: There are a million different ways to do visualizations. There is math behind... 53:08 – Panel: D3 also helps with de-clustering. 53:25 – Panel: Any recommendations for someone who wants to dive into D3? 53:37 – GL: Tutorials have gotten better over time. 53:57 – Panel continues the conversation. 54:19 – GL: D3 Version 4 and 5 was one big library. You have C3 – what's your opinion on C3? 55:00 – GL: I have no strong opinions. 55:03 – Chuck chimes in. 55:18 – Panel continues this conversation. She talks about how she had a hard time learning D3 and how everything clicked once she learned it. 55:55 – GL: The main reason why I didn't use D3 was because... 56:07 – GL: If you were a "real" developer you'd... 56:35 – Panel: Let's go to Picks! 56:40 – Advertisement – Code Badges Links: JavaScript Ruby on Rails Angular Digital Ocean Code Badge Notion Vue Meltano Looker Node Flux Taylor Python Chart JS React Chris Fritz – JS Fiddle D3 Chris Lema – Building an Online Course... Vuetify The First Vue.js Spring Vue CLI 3.0 Online Tutorials To Help You Get Ahead Hacker Noon – Finding Creativity in Software Engineer Indiegogo Create Awesome Vue.js Apps With...
Data Sketches Vue.js in Action Benjamin Hardy’s Website Data Intensive: Don’t Just Hack It Together Article: How to Pick a Career...By Tim Urban Taylor A. Murphy’s Twitter Email: tmurphy@gitlab.com GitLab – Meet our Team Jacob Schatz’s Twitter Sponsors: Kendo UI Digital Ocean Code Badge Cache Fly Picks: Joe Ben Hardy on Medium Set Goals Chris Vue CLI 3 Vue CLI 3 on Medium Vue Dev Tools Get a new computer John Vuetify Divya Data Sketch One climb Finding Creativity in Software Engineering Erik Create Awesome Vue.js Vue.js in action Charles Get a Coder Job Building an online course Jacob Alma CCS Read source code Allen Kay Taylor Designing Data-Intensive Applications Wait But Why
Panel: Dave Kimura Eric Berry Special Guest: Luke Francl In this episode of Ruby Rogues, the panel talks to Luke Francl about his article "Upgrading Rails applications incrementally". Luke works at GitHub on search and has been there since October 2017. Before working at GitHub, he worked at a search startup that was working with Rails and Elasticsearch. They talk about things that people take for granted with search, the impending takeover of GitHub by Microsoft, and what open source looks like today. They also touch on the process of getting hired at GitHub, his process for upgrading Rails applications, and more! In particular, we dive pretty deep on: Luke intro Working with Rails and Elasticsearch Why he decided to come to GitHub Surreal working at GitHub What are some of the things that people take for granted with search? What people expect from search Wordpress GitHub has been very focused on the Microsoft deal recently Code Sponsor GitHub/Microsoft owns open source Open source today Kubernetes The GitHub office What was the process like of getting hired at GitHub? Build a Query Parser blog post Using his search experience Rails incremental upgrades His process of upgrading Rails applications Upgrading a Rails application incrementally Short vs long upgrading process Rails is fairly compatible between versions And much, much more! Links: Rails Elasticsearch GitHub Wordpress Code Sponsor Kubernetes Build a Query Parser Upgrading a Rails application incrementally luke.francl.org @lof Luke's Blog Luke's GitHub Sponsors Sentry Digital Ocean Get a Coder Job Course Picks: Dave Screenflow LED Lightbulbs Eric Navicat Essentials Luke Designing Data-Intensive Applications by Martin Kleppmann
Martin Kleppmann is a software engineer, entrepreneur, author, and speaker. He co-founded Rapportive (acquired by LinkedIn in 2012) and Go Test It (acquired by Red Gate Software in 2009). An accomplished author and academic, Martin wrote the O’Reilly book Designing Data-Intensive Applications while working as a researcher at the University of Cambridge Computer Laboratory.
Check out Newbie Remote Conf! 02:44 - What it Takes to Learn JavaScript in 2016 04:03 - Resources: Then vs Now 09:42 - Are there prerequisites? Should you have experience? 20:34 - Choosing What to Learn The iPhreaks Show Episode #153: Using Mobile Devices to Manage Diabetes with Scott Hanselman 28:19 - Deciding What to Learn Next 31:19 - Keeping Up: Obligations As a Developer 34:22 - Deciding What to Learn Next (Cont’d) 42:01 - Recommendations You-Dont-Know-JS gulp.js webpack The Little Schemer Designing Data-Intensive Applications by Martin Kleppmann Picks accidentally nonblocking (Jamison) choo (Jamison) Web Rebels (Jamison) React Rally (Jamison) Grab The Gold (Aimee) node-for-beginners (Aimee) Procrastinate On Purpose by Rory Vaden (Chuck) Newbie Remote Conf (Chuck) Get A Coder Job (Chuck)
This episode is hosted by 叶玎玎, who invites 戚祥, the founder of 开发者头条 and 码农周刊, as a guest on Teahour. Most listeners are probably already familiar with 码农周刊 and 开发者头条; the editor reads a few of their articles every day when time allows. But most people don’t know much about the person behind these two products, so we were fortunate to invite 戚祥, who also works in tech media, to chat about the business of tech media. Links: 今日头条, 码农周刊整理, 猿已阅, Hardcore, building the fun!, Signal v. Noise, Zach Holman, The Hard Thing About Hard Things, Rework, Using logs to build a solid data infrastructure, Designing Data-Intensive Applications. Special Guest: 戚祥.
02:52 - David Herman Introduction Twitter Blog JavaScript Jabber Episode #54: JavaScript Parsing, ASTs, and Language Grammar w/ David Herman and Ariya Hidayat JavaScript Jabber Episode #44: Book Club! Effective JavaScript with David Herman Effective JavaScript by David Herman @effectivejs TC39 Mozilla 03:50 - The Rust Programming Language [GitHub] rust 06:31 - “Systems Programming Without Fear” 07:38 - High vs Low-level Programming Languages Garbage Collection and Deallocation Memory Safety Performance and Control Over Performance 11:44 - Stack vs Heap Memory Etymology of "Foo" RAII (Resource Acquisition Is Initialization) 16:52 - The Core of Rust Ownership Type System 24:23 - Segmentation Fault (Seg Faults) 27:51 - How much should programmers care about programming languages? Andrew Oppenlander: Rust FFI (Embedding Rust in projects for safe, concurrent, and fast code anywhere.) 32:43 - Concurrency and Multithreaded Programming 35:06 - Rust vs Go 37:58 - servo 40:27 - asm.js emscripten 42:19 - Cool Apps Built with Rust Skylight Wit.ai 45:04 - What hardware architectures does the Rust target? 45:46 - Learning Rust Rust for Rubyists by Steve Klabnik Picks Software Engineering Radio (Dave) How Will You Measure Your Life? by Clayton M. Christensen (Dave) The Presidents of the United States of America (Dave) Design Patterns in C (AJ) Microsoft Edge Dev Blog: Bringing Asm.js to Chakra and Microsoft Edge (AJ) The Web Platform Podcast: Episode 43: Modern JavaScript with ES6 & ES7 (AJ) Firefox Fame Phone (AJ) iTunes U CS106A (Programming Methodology) (Aimee) Valerian Root on Etsy (Aimee) The Dear Hunter - Live (Jamison) Designing Data-Intensive Applications by Martin Kleppmann (Jamison) Fogus: Perlis Languages (Jamison) Galactic Civilizations III (Joe) Visual Studio Code (Joe) Tessel 2 (Dave) Event Driven: How to Run Memorable Tech Conferences by Leah Silber (Dave) Plush Hello Kitty Doll (Dave)
Discuss this episode in the Muse community Follow @MuseAppHQ on Twitter Show notes 00:00:00 - Speaker 1: And I feel like this idea really changes the abstractions that operating systems should provide, because maybe OSs should not just be providing this model of files as a sequence of bytes, but this higher-level CRDT-like model, and how does that impact the entire way software is developed? 00:00:23 - Speaker 2: Hello and welcome to Metamuse. Muse is a tool for thought on iPad. This podcast isn’t about Muse the product, it’s about Muse the company and the small team behind it. I’m Adam Wiggins, here with my colleague Mark McGranaghan. Hey, Adam. And joined today by Martin Kleppmann from the University of Cambridge. Hello. And we talked before about Mark’s dabbling in playing the piano. I understand this is a hobby you’re starting to look into as well, Martin. 00:00:49 - Speaker 1: Oh yes, I’ve been playing the piano, trying to do it a bit more consistently for the last year and a half or so. My lockdown project. 00:00:57 - Speaker 2: And do you have a technique for not annoying your neighbors, or is this an electronic piano, or how do you do that? 00:01:03 - Speaker 1: It’s an electric piano, although I don’t think it’s too bad for the neighbors. Lately I’ve been trying to learn a Bach piece that I can play together with my wife, so she’ll play two hands and I’ll play the other two. 00:01:15 - Speaker 2: Nice. I suspect a lot of our listeners know you already, Martin. Within your small, narrow niche, you’re a pretty high-profile guy, but for those that don’t, it’d be great to hear a little bit about your background. What brought you on the journey to the topic we’re gonna talk about today? 00:01:31 - Speaker 1: Yeah, well, I’m a computer scientist, I guess. I started out as an entrepreneur and started two startups some years ago. I ended up at LinkedIn through the acquisition of the second startup.
And there I worked on large-scale stream processing with Apache Kafka, and I was part of that stream processing world for a while. And then I wanted to share what I had learned about building large-scale distributed data systems, and so I took some time out to write a book called Designing Data-Intensive Applications, which has turned out to be surprisingly popular. 00:02:10 - Speaker 2: Yeah, you wrote a nice kind of tell-all where you showed the numbers on it. It’s been financially successful for you, and it’s also one of the more popular O’Reilly books by copies sold in recent times. I liked that post and the candor there, but yeah, it makes you a pretty successful author, right? 00:02:28 - Speaker 1: Yeah, it’s sold over 100,000 copies, which is, wow, way more than what I was expecting for something that is a pretty technical, pretty niche book, really. But the goal of the book really is to help people figure out what sort of storage technologies and data processing technologies are appropriate for their particular use case. So it’s a lot about the trade-offs and the pros and cons of different types of systems. And there’s not a whole lot on that sort of thing out there; there’s a lot of vendor talk hyping the capabilities of their particular database or whatever it might be, but not so much on this comparison between different approaches. So that’s what my book tries to provide. Yeah, and then after writing that book, I sort of slipped into academia, half by accident, half by design. I found a job at the University of Cambridge where I could do research full time. And since then I’ve been working on what we have come to call local-first software, which we’re going to talk about today.
The nice thing is that now, in academia, compared to the startup world, I have the freedom to work on really long-term ideas, big ideas which might take 5 or 10 years until they turn into viable technologies that might be used in everyday software development. But if they do work, they’ll be really impactful and really important, and so I’m enjoying that freedom to work on really long-term things now as an academic. 00:03:53 - Speaker 2: And certainly it struck me when we got the chance to work together through these Ink & Switch projects, because you have both the commercial world, including being a startup founder, and obviously you’re very immersed in the academic kind of machinery now, and again just that long-term mindset and thinking about creating public goods and all that sort of thing. And I find that I actually really like working with people that have both of those. Another great example there would be another former podcast guest, Geoffrey Litt. He was also in the startup world; now he’s doing academic work at MIT. 00:04:26 - Speaker 1: Yes, and I’m actually doing a project with him right now. 00:04:29 - Speaker 2: Right, I forgot about that. There’s a current Ink & Switch project there. So I find that maybe if you live your whole life in one of those two worlds, commercial slash industry or academia, you get that a-fish-doesn’t-know-what-water-is kind of thing, but if you have experienced both models, then it’s easier to know the pros and cons and understand the shape of the venue you’re doing your work in. In the end, the point is to have some meaningful impact on humanity through your work, whatever small piece of the world you hope you’re making better. In our case, it’s computer things, but the venue you’re in is not the point; that’s just a vehicle for getting to where you want to go, and each of these styles of venue has different trade-offs, and being aware of those maybe makes it easier for your work to have an impact.
00:05:19 - Speaker 1: Yes, I think it is really helpful to have seen both sides, and I find it allows me to be a little bit more detached from the common mindset. In every domain, you know, there are certain things that everyone believes, but they’re kind of unspoken, maybe not really written down either. In academia, that’s the publishing culture and the competitiveness of publication venues and that sort of stuff, which seems ridiculous to outsiders, but if you’re in it, you kind of get accustomed to it. And likewise in startups, it’s the pressure to be constantly selling and marketing and promoting what you’re doing to the max. Crushing it, always crushing it, exactly. And to an outsider, it’s kind of a ridiculous show that people put on, frankly. But to an insider, you know, you just get used to it and that’s just your everyday life. I find that having seen both makes me a bit more detached from both of them, and, I don’t know, maybe I see a little bit more through the bullshit. 00:06:21 - Speaker 2: So as you hinted, our topic today is local-first software. This is an essay that I’ll link to in the show notes. It’s about 2 years old, and notably there are 4 authors on this paper, 3 of whom are here, almost a little reunion; the 4th author, Peter van Hardenberg, we hope to have on as a future guest. But I thought it would be really fun to not only summarize what that philosophy is, particularly because we’re actively pursuing it for the Muse syncing persistence model, but also to look at what we’ve learned since we published that essay and revisit it a little bit. What do we wish we’d put in? How has the movement, if that’s the right word for it, evolved? What have we learned in that time?
But I guess before getting into all that, maybe Martin, you can give us the elevator pitch, if I’m to reference the startup terminology: the brief summary of what local-first software is. 00:07:18 - Speaker 1: Yeah, local-first software is a reaction to cloud software. By cloud software, I mean things like Google Docs, where you have a browser window and you type into it and you can share it really easily. You can have several people contributing to a document really easily, you can send it around for comments, and so on. So it has made collaboration a ton easier, but it’s come at a great cost to our control and our ownership of the data, because whenever you’re using some cloud software, the data is stored on the cloud provider’s servers, Google’s servers, for example. And as users, we are given access to that data temporarily, until that day when Google suddenly decides to lock your account and you are locked out of all of the documents that you ever created with Google Docs, or until the software-as-a-service product you’re using suddenly goes bust and decides to shut down with 2 weeks’ notice, and maybe allows you to download a zip file full of JSON files as your data export. And I find that tragic, because as creative people we put a ton of effort, time, and our souls, really our personalities, into the things that we create. And so many of the things we create now are computer-based: whether you’re writing the script for a play or negotiating a contract or doing any sort of creative endeavor, it’s probably a file on a computer somewhere. And if that file is in some cloud software, then there’s always this risk that it might disappear and that you might lose access to it.
And so what we try to do with local-first software is to articulate a vision for the future where that does not happen, where we have the same convenience that we have with cloud software, that is, the same ability to do real-time collaboration. It’s not back to the old world of sending files back and forth by email. We still want the same real-time collaboration that we get with Google Docs, but at the same time we also want the files stored on our own computers. Because if the files are on our own computers, then nobody can take them away. They are there; we can back them up ourselves. We can optionally back them up to a cloud service if we want to. There’s nothing wrong with using a cloud service, as long as the software still continues working without the cloud service. Moreover, we want the software to continue working offline, so that if you’re working on a plane or on a train that’s going through a tunnel or whatever, the software should just continue to work. And we want better security and privacy, because we don’t want cloud services scanning through the content of all of our files. I think for creativity, it’s important to have that sense of privacy and ownership over your workspace. So those are some of the ideas that we try to encapsulate in this idea of local-first software: how can we have the best of both worlds, the convenience of cloud software but the data ownership of having the files locally on your own device? 00:10:15 - Speaker 2: Yeah, for me, the core of it is really agency. Much of the value of cloud, and I think there’s a version of this also for mobile apps and app stores and that sort of thing, which is not what we’re addressing in the paper, but maybe there’s a theme in computing that we’ve made computers vastly more accessible by, in many cases, taking agency from people. And that’s actually a good thing in many cases, right? You don’t need to defrag your hard drive anymore.
You lose your device, and your email and your photos and all those things are still in this cloud that’s managed by experienced sysadmins and product managers and so forth at companies like Google, and they can often do a better job of it in a lot of cases than an individual can. I mean, I think of managing my own email servers, SMTP servers, years back and needing to deal with data backup and spam filtering and all that kind of thing, and Gmail came along and I was just super happy to outsource the problem to them. Absolutely. They did a better job managing it. So I think that’s basically in many ways a good trend, a net good in the world, and I don’t think we necessarily want to go back to everyone doing more of those data management tasks. But for the area of creative tools, or I guess you’d call them power users, it’s like you said: if you’re writing a play, that’s just a very different kind of interaction with a computer than the average person doing some calendar and email and messaging. Maybe they want different trade-offs. It’s worth doing a little bit more management and taking a little more ownership to get that greater agency over my work product: the script of my play or my master’s thesis or whatever it is that I’m working on is something that really belongs to me, and I want to put in a little extra effort to have that ownership. 00:12:08 - Speaker 1: Right, exactly. And I feel like it’s not reasonable to expect everyone to be a sysadmin and to set up their own services. You get this self-hosted cloud software, but most of it is far too technical for the vast majority of users, and that’s not where we want to go with this. I think you still want exactly the same kind of convenience of cloud software: it just works out of the box and you don’t have to worry about the technicalities of how it’s set up.
But one part of local-first software is that because all of the interesting app-specific work happens client-side on your own device, the cloud services that you do use for syncing your data and backing up your data become generic. And so you could imagine Dropbox or Google Drive or AWS or some other big cloud provider just giving you a syncing service for local-first apps. And the way we’re thinking about this, you could have one generic service that could be used as the syncing infrastructure for many different pieces of software. So regardless of whether the software is a text editor or a spreadsheet or a CAD application for designing industrial products or music software or whatever it might be, all of those different apps could potentially use the same backup and syncing infrastructure in the cloud, and you could have multiple cloud providers that are compatible with each other, and you could just switch from one to the other. So at that point it just becomes a question of who you pay 6 cents a month to in order for them to store your data; it becomes a very generic and fungible service. And that, I think, actually makes the cloud more powerful, because it removes the lock-in where you have to use the one cloud service provided by the software author. Instead, you could switch from one cloud provider to another very easily, and you still retain the property that you’re using the cloud provider’s expertise in providing a highly available service, and you don’t have to do any admin yourself. It’s not like running your own SMTP server. So I feel like this is a really promising direction that local-first software enables. 00:14:24 - Speaker 3: Yeah, for sure, indeed. And you could even describe local-first software, I think, as generalizing and distributing the capabilities of the different nodes.
So in the classic cloud model, you have these thin clients that can dial into the server and render whatever the server tells them, and then you have the servers, which can store data and process it and return it to clients. When you have both of those at the same time, you know, it works great, but then if you’re a client, like you said, who’s in a tunnel, well, too bad, you can’t do anything. The local-first model is more that any node in that system can do anything: it can process the data, it can validate it, it can store it, it can communicate it, it can sync it, and then you can choose what kind of topologies you want. So it might be that you just want to work alone in your tunnel, or it might be that you want to subscribe to a cloud backup service that does the synchronization and storage part for you while you still maintain the ability to process and render data locally. This actually gets to how I first got into what we’re now calling local-first software. I was in a coffee shop with Peter van Hardenberg, who’s one of the other authors that Adam mentioned. We were talking about working together at the lab, when he was a principal there; he’s now the director, and he showed me the Pixel Pusher prototype. Pixel Pusher was this pixel art app where you color individual pixels to make a kind of retro graphic thing, and it was real-time collaborative, but the huge thing was that there was no server. You had this one code base and this one app, and you got real-time collaboration across devices. And that was the moment I realized, you know, I was a fish in the cloud infrastructure water and I didn’t realize it. I just assumed, oh, you need servers and AWS, you need a whole ops team, you’re gonna be running that for the rest of your life, the whole thing. Well, actually, no, you could just write the app and point it at the other laptop and there you go.
And we eventually realized all these other benefits that we would articulate as the desiderata in the local-first software article, but that was the thing that really kicked it off for me. 00:16:19 - Speaker 1: Yeah, and there’s that aspect that the apps become really self-contained and you just don’t have a server anymore, or if you have a server, it’s a really simple and generic thing. You don’t write a specific server just for your app anymore. That’s something that I’m not sure we really explored very well in the local-first essay as it was published, but I’ve been continuing to think about it since. You know, this has really profound implications for the economics of software development. Because right now, as you said, if you’re a startup and you want to provide some SaaS product, you need your own ops team that is available 24/7 with pager duty, so that when the database starts running slow or a node falls over and you need to reboot something or whatever, you know, there’s just all this crap that you have to deal with, which makes it really expensive to provide cloud software, because you need all of these people on call and you need all of these people to write these scalable cloud services, and it’s really complicated, as evidenced by my book, a lot of which is basically like, oh crap, how do I build a scalable cloud service. And with local-first software, potentially that problem simply goes away, because you’ve just got each local client, which just writes to storage on its own local hard disk. You know, there are no distributed systems problems to deal with, no network timeouts and so on. You just write some data locally, and then you have this syncing code: you just use an open source library like Automerge, which will do the data syncing between your device and maybe a cloud service and maybe other devices. And the server side is just non-existent.
And you’ve just removed the entire backend team from the cost of developing a product, and you don’t have the ops team problem anymore, because you’re using some generic service provided by some other cloud provider. And you know, that has the potential to make the development of collaborative software so much cheaper, which in turn will mean that we get more software developed by smaller teams, faster. It’ll improve the competitiveness of software development in general. It seems to have so many positive effects once you start thinking it through. 00:18:22 - Speaker 2: Yeah, absolutely. For me, maybe similar to both of you, my motivations were both as a user and as, let’s say, a software creator or provider. On the user side, we have these 7 different points we articulate, and in fact we even set it up this way: you can give yourself a little scorecard and see which of the boxes you tick. It’ll be fun to do that for the Muse syncing service when that’s up and running. But the offline capability is a huge one to me, and it’s not just the convenience. Every time I’m working on the train and my train goes through a tunnel, suddenly I can’t type into my documents anymore, for example. Or, I don’t know, I like to go to more remote places to work and have solitude, but then I can’t load up Figma or whatever else. That, for me as a user, is just this feeling that comes back to the loss of agency, but also it’s just practically annoying. We assume always-on internet connections, but I wonder how much that is because the software engineers are people sitting in offices, or maybe now at home in San Francisco, on fast internet connections with always-connected devices, versus the realities of life walking around in this well-connected but not perfectly so world we all live in. That’s on the user side.
00:19:42 - Speaker 1: Yeah, I feel like there’s a huge bias there towards like, oh, it’s fine, we can assume everyone always has an internet connection, because yes, we happen to be that small segment of the population that does have a reliable internet connection most of the time. There are so many situations in which you simply can’t assume that, and that might be anything from a farmer working on their fields using an app to manage what they’re doing to their crops, or something like that. You know, they won’t necessarily have reliable cellular data coverage even in industrialized countries, let alone in other parts of the world where you just can’t assume that sort of level of network infrastructure at all. 00:20:17 - Speaker 3: Yeah. Yeah, it’s funny you mention this because we often run into this on the summits that we have for Muse. So we were recently in rural France and we had pretty slow internet, especially on upload. I think it was a satellite connection, and we always had this experience where there are 4 of us sitting around a table and you’re looking at each other, but you can’t, you know, send files around cause it needs to go to, you know, whatever, Virginia and come all the way back. 00:20:43 - Speaker 1: It’s crazy if you think about it, it’s ridiculous. 00:20:46 - Speaker 2: Yeah, and I don’t think you even need to reach as far as a farmer or a team summit at a remote location. I had a kitchen table in the house I lived in right before this one that was like a perfect place to sit and work with my laptop, but the location of the refrigerator, which really couldn’t be any other place, just exactly blocked the path to my router, and the router couldn’t really be any other place. I guess I could run a wire or something, but I really wanted to sit right there and work.
But again, it’s this ridiculous thing where you can’t even put a character into a document, and I could pick up the laptop and walk a meter to the left, and now suddenly I can type again. And you can compare that to something like Git, which does have more of a local model and is probably one of the closest things to true local-first software: you can work, and yes, you need an internet connection to share that work with others, but you’re not stopped from that moment-to-moment typing of things into your computer. 00:21:38 - Speaker 3: Yeah, and furthermore, from the implementation perspective, even when you have a very fast internet connection, you’re still dealing with this problem. So if I’m using an app and I type in a letter on my keyboard, between the time when I do that and when the round trip happens with the AWS server, which might be 50 or 100 milliseconds, the app needs to do something useful. I can’t wait for that full round trip. It needs to immediately show me what’s happening. So you inevitably have this little distributed system where you have your local process and app trying to display something immediately, and you have the remote server. And the great elegance, I think, of the local-first approach is that that’s just another instance of the general problem of synchronizing across nodes, whereas often in other apps, that’s sort of an ad hoc kind of second special-case thing, like, oh, it’s only going to be like this for 100 milliseconds, so just kind of do a hacky solution and make it so that most of the time the right letter shows up. And that’s why you have this behavior where apps will have like an offline mode, but it like never works, because, I think we mentioned this on the podcast before, there are systems that you use all the time and systems that don’t work. That’s a maxim we can link to.
But again, with local first, you’re kind of exercising that core synchronization approach all the time, including when it’s just you and another server on a good connection. 00:22:50 - Speaker 1: Yeah, and from a sort of fundamentals-of-distributed-systems point of view, I find that very satisfying because I just see this as different amounts of network latency. Like, if you’re online you have network latency of 50 or 100 milliseconds. If you’re offline, you have network latency of 3 hours or however long it’s going to be until you next come back online again. To me those are exactly the same, you know, I don’t care if it’s a few orders of magnitude apart. Both are network latency, both need to be dealt with, and if we can use the same techniques for dealing with both standard online latency and being offline, that just simplifies the software dramatically. 00:23:25 - Speaker 2: Going back to sort of the infrastructure, fewer-moving-parts thing, and speaking to our personal motivations: for me, the experience of running Heroku was a big part of my motivation, or fed into my interest in this, because Heroku was an infrastructure business. I didn’t quite grasp what that meant when we went into it. I just wanted a better way to deploy apps, and in the end, I enjoy writing software, I enjoy creating products that solve problems for people, but infrastructure is a whole other game. And you know, it got to the point where once you’re, I don’t know if mission-critical is the right word, but just something people really care about working well, you’re in the critical path. So for example, our routing infrastructure: if it was down for 3 seconds, people would complain at the slightest hiccup, and as they should; that was part of the service that the company was providing, and so that’s fair enough. But then I go, OK, well, I’m building software. When I think of, for example, Muse, where I’m providing this productivity tool to help people think and that sort of thing.
I don’t want to be paged because someone went to move a card 5 centimeters to the right and our server was down or overloaded or something, so then they can’t move the card and so then they’re writing into support angrily. I’m pretty comfortable with: there’s some kind of cloud syncing problem, and OK, I can’t easily push my changes to someone else, and that is still a problem, but it feels like it’s on a slightly different timeline. You’re not blocking the very most basic, fundamental operation of the software. And so the idea that, exactly as you said, it changes the economics: for me personally, I want to spend more of my time writing software and building products and less of my time setting up, maintaining and running infrastructure. So I guess looking back on the two years that have elapsed, I would say that, it’s hard to know for sure, but of the Ink & Switch essays, there are a number of them that I think had a really good impact, but this one, just going by my anecdotal feeling of seeing people cite it in Twitter comments and things like that, feels like one of the bigger-impact pieces that we published. And I do really see quite a lot of people referencing that term, you know, we’ve sort of injected that term into the discussion, at least among a certain very niche, narrow world of things. So, yeah, I’d be curious to hear from both of you, first, whether there are things that, looking back, you wish we’d put in or you would add now, and then how that interacts with what you make of the local-first movement or other work that people are doing on that now. 00:25:59 - Speaker 1: I’m very happy that we gave the thing a name. That’s something we didn’t have initially when we started writing this; we were just writing this like manifesto for software that works better, basically.
And then at some point we thought it would be really good to have some way of referring to it, and you know, people talk about offline first or mobile first, and these were all kind of established terms that people would throw around. And we also wanted some term X where we could say, like, I’m building an X-type app. And so I’m very glad that we came up with this term local-first, because I’ve also seen people even outside of our direct community starting to use it, and just, you know, put it in an article casually without even necessarily explaining what it means, just assuming that people know what it is. And I think that’s a great form of impact if we can give people a term to articulate what it is they’re thinking about. 00:26:50 - Speaker 2: Yeah, language, a shared vocabulary to describe something, is a very powerful way to, one, just sort of advance our ability to communicate clearly with each other, but also, yeah, there are so many ideas. I mean, it’s a 20-something-page paper and there are so many ideas, but you wrap this up in this one term, and for someone who has downloaded some or most of these ideas, that one term can carry all the weight and then you can build on that. You can take all that as a given and then build forward from there. 00:27:19 - Speaker 1: Yeah. One thing I wish we had done more on is, I think, trying to get a bit more into the economic implications of it. I guess that would have made the essay another 5 pages longer, and at some point we just had to stop. But I feel like it’s quite an important aspect, like what we talked about earlier, of not having to worry about backends, or even just not having to worry generally about the distributed systems problem of: you make a request to a server, the request times out, you have no idea whether the server got the request or not. Like, do you retry it? If so, how do you make the retry? Is it idempotent so that it’s safe to retry, and so on.
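The retry problem described here is usually solved with idempotency keys: the client attaches a unique key to each logical request, and the server deduplicates, so retrying after a timeout is always safe. A minimal sketch, with all names invented for the example:

```python
import uuid

class Server:
    """Toy server that deduplicates requests by idempotency key."""

    def __init__(self):
        self.processed = {}  # idempotency_key -> cached result
        self.balance = 0

    def handle(self, key, amount):
        if key in self.processed:
            # Duplicate delivery: return the cached result, apply nothing.
            return self.processed[key]
        self.balance += amount          # apply the effect exactly once
        self.processed[key] = self.balance
        return self.balance

server = Server()
key = str(uuid.uuid4())                 # one key per logical operation
first = server.handle(key, 10)
retry = server.handle(key, 10)          # client timed out and retried
assert server.balance == 10             # effect was applied only once
assert first == retry
```

This is the same pattern payment APIs use for safe retries; with a general-purpose sync layer, as the speakers note, the application developer never has to build it by hand.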
Like, all of those problems just go away if you’re using a general-purpose syncing infrastructure that somebody else has written for you. And there are other implications as well that are less clear, like: what about the business model of software as a service? Because a lot of companies’ business model right now is basically, pay us, otherwise you’re going to get locked out of your data. So it’s using this idea of holding data hostage, almost, as the reason why you should pay for a product. And you know, it’s like that with Slack. Like, you put all of your messages in Slack; those messages were written by you and your colleagues. There’s nothing Slack really did to own those, they just facilitated the exchange of those messages between you and your colleagues. But then once you go over the, whatever it is, 10,000-message limit, then suddenly you have to pay Slack to see the messages that you wrote yourself. And generally that’s the business model with a lot of software as a service. And with local first, it’s not clear that that business model will still work. But of course, software developers still have to be paid for their time somehow. So how do we find a way of building sustainable software businesses for collaboration software, but without holding data hostage? I think that’s a really deep and interesting question. 00:29:07 - Speaker 3: Yeah, I think as an aside that what you might call the political economy of software is understudied and underconsidered, and I would put in here the economics of software business, but also the interaction with things like regulation and governments and the huge amount of path dependence that’s involved. I think that’s just a huge deal. I think we’re starting to realize it, but yeah, there’s a ton of stuff we could do and think about just for local first. Like, just one kind of branch that I hope we get to explore is, we mentioned how local first enables you to do totally different topologies.
So with cloud software, almost by definition, you have this hub-and-spoke model where everything goes through the central server and the central corporation. Well, with local first, you can very easily, for example, have a federated system where you dial into one of many synchronization nodes, and you could even have more like a mesh system where you request and incentivize packets to be forwarded through a mesh to their destination, sort of like TCP/IP networking, but for the application layer. And it may be, you know, it’s still kind of TBD, but it may be that a mesh or a distributed approach has totally different political implications from a centralized node, and that might become important. So I just think there’s a lot to think about and do here. 00:30:15 - Speaker 1: Yeah, I think so too. And I would take email as an analogy, maybe, which is a federated system just like what you described: you send your email to your local SMTP server and it forwards it to the recipient’s SMTP server, and the system works really well. Certainly it has its criticisms, like spam filtering being difficult to do in a really decentralized way. Maybe spam is not a problem that local-first software will have as much, because it’s intended more for collaboration between people who know each other rather than as a way of contacting people you don’t know yet. But certainly, taking some inspiration from that federation and seeing how it can be applied to other domains would, I think, be very interesting. 00:30:55 - Speaker 3: Yeah, and this brings us to the topic of industrialization and commercialization. I feel like there’s more promise than ever around local first and more people are excited about it, but I still feel like we’re just in the beginning phases of getting the ball rolling on industrial and commercial applications.
And if I’m being really honest, I feel like it might have been slower than I had initially hoped over the past few years, so I’m just curious if, Adam and Martin, you would reflect on that. 00:31:21 - Speaker 2: It’s always hard to say, right? The thing with any technology, and certainly in my career in computing this has always proven to be the case, is that something seems right around the corner and it stays so for 20 years. I don’t know, maybe VR is in that category. But then there’ll be a moment when suddenly it’ll just be everywhere, broadband internet or something like that. So as people who are both trying to advance the state of the art but also making business decisions, you know, should I start a company? Should I invest in a company? Should I keep working on the company I’m working on, based on what technologies exist or where you see things going? Yeah, you’re always trying to make accurate predictions. So yeah, I agree, on one hand it felt very close to me on the basis of the prototypes we’d built with the Automerge library, Martin’s reference implementation; I’ll link that in the notes here. Basically that’s a JavaScript implementation of something called CRDTs, which, just as a sidebar: it could be easy to think that CRDTs and local-first software are kind of one and the same, because they are often mentioned together and in fact our paper talks about them together. But CRDTs are a technology we find incredibly promising for helping to deliver local-first software, whereas local first is a set of principles; it doesn’t require any particular technological solution. Yeah, based on the strength of those prototypes, many of which worked really well, there’s the networking side of it and whether you can have that be fully decentralized versus needing more of a central coordination server. But once you get past that hump, it does work really, really well.
But I think that comes back to the point you both made there about the economic-model side of things, which is: we have a whole software industry that’s built around people paying for software when there’s a service connected to it. Right, so SaaS, in particular B2B SaaS, is just a fantastic business to be in, and as a result we’ve seen a huge explosion of software around that. But connected to that is, for example, the freemium model, exactly like what you mentioned with Slack; Docs is one of those, Notion is one of those. They do this kind of free-for-individuals, but then you pay when you’re a business, and then you need to come up with features, the kinds of features that seem to select for you being a business with more serious needs, and something like retaining your message history is one of those. I wrote a whole other, shorter essay about paying for software; I’ll link that in the notes. But I think the industry got itself into a weird corner, painted itself into a corner, because of things like Google giving you so much incredibly high-quality software, Gmail, Google Docs, Google Maps, etc., for quote-unquote free, when how you’re really paying for it is with your attention and your data, right? And that being monetizable through being able to essentially serve you ads. And I think that’s fine, and I’m very glad for Google’s existence and that they found that model, but it almost feels like it then taught people that good software should be free and that you shouldn’t pay for it. Maybe that’s a little bit connected to the fact that once you’ve made the big upfront R&D investment, it basically costs nothing to make additional copies of software: the software exists and you can duplicate it endlessly. But I think there’s a lot that’s flawed about all of that. And the end place that gets you to is: OK, if someone has my data and I’m paying them to maintain it and run the servers that it’s on, I can stomach that.
OK, now I’ll pay $5 a month, $10 a month for my Dropbox account or something like that. But other than that, we’ve become accustomed to: oh, if it’s an app on the App Store, and the App Store is a good example of this kind of consumer economics, we just expect it to be vastly lower cost or free. And good software costs money to make, and as we kind of talked about earlier, I would rather be building the software, not maintaining the infrastructure. But when you set it up so that the only way you can make money is to build software that has infrastructure, you’re actually incentivized to build that backend as soon as you can and get the user’s data in there, and not necessarily hold it hostage, but just take ownership of it, because that’s what people will pay for. They won’t pay for software where they own the data themselves. 00:35:34 - Speaker 1: Yes, one thing that a friend has suggested is that when talking about the business model of local-first software, we should just label it as SaaS, market it in exactly the same way as SaaS. Don’t even tell people that it’s local-first software; just use the fact that it’s a lot cheaper and easier to implement local-first software, and use that for your own benefit in order to build the software more cheaply. But don’t actually market the local-first aspect. And I thought that’s quite an interesting idea because, you know, it is a model that people are accustomed to, and to be honest, I think the amount of piracy that you would get from people ripping out the syncing infrastructure, pointing it at something else, and then continuing to use the app without paying for it is probably pretty limited. So you probably only need to put in a very modest hurdle there of saying, OK, this is the point at which you pay. Regardless of whether that point of payment is, you know, necessarily enforced in the infrastructure; it might just be an if statement in your client-side app, and maybe that’s fine.
00:36:38 - Speaker 2: Muse is basically an example of that. We have this membership model where, you know, subscription is your only option, and there are a lot of folks that complain about that or take issue with it, and I think there are many valid complaints you can make, but I think in many cases it is just a matter of what folks are accustomed to. And we want to be building and delivering great software that improves and changes over time and maps to the changing world around it, and that’s something where, as long as you’re getting value, you pay for it, and if you’re not getting value anymore, you don’t have to pay anymore. And a model like that basically works best for everyone, we think; again, not everyone agrees. But then again, you do get this pushback of: we are running a small service, but it’s not super critical to the application. Maybe that would be a good moment to speak briefly about the explorations we’re doing on the local-first sync side, Mark. 00:37:32 - Speaker 3: Yeah, so right now Muse is basically a local-only app. It’s a traditional desktop app where files are just saved to the local device, and that’s about it, and you can manually move bundles across devices, but otherwise it just runs locally. And the idea is to extend Muse, first with syncing across your devices and then eventually collaboration across users, using a local-first approach. Now, we don’t plan to do, at least initially, the kind of fully distributed mesh-networking peer-to-peer thing. It will be a sync service provided by Muse and kind of baked into the app, but it will have all those nice local-first properties of: it works offline, it’s very fast, all the different nodes are first-class, and so forth, while eventually supporting syncing and collaboration.
So, yeah, we’re going through this journey of: we had a lot of experience with basic prototypes in a lab, but there’s a big jump to a commercialized and industrialized product, not just in terms of charging and the business model and stuff, but in terms of the performance and all the weird things that you deal with in the real world, like versioning and schemas and the idiosyncrasies of networking, and all the things that go around the core functionality. Like, one thing we’re thinking a lot about is visibility into the sync status and how it’s different in a local-first world. Yeah, so I’m excited that we are now investing a lot in bringing local first into the real world with Muse. 00:38:57 - Speaker 1: Yeah, and I feel like, more generally, if we want the local-first ideas to be adopted, we need to make it easy, in a way that people can just take an open source library off the shelf, not have to think too much about it, plug it into their app, have a server that’s ready to go, either already hosted for them or one they can spin up themselves, and make that path super easy and straightforward. And that’s kind of where my research is focusing: trying to get the technologies to that point. So right now, we have some basic implementations of this stuff. So Automerge is a library that does this kind of data synchronization. It includes a JSON-like data model that you can use to store the state of your application. It has a sort of basic network protocol that can be used to sync up two nodes. But there’s so much more work to be done on making the performance really good; at the moment it’s definitely not very good. We’re making progress with that, but there’s still a long way to go: making the data sync protocol efficient over all sorts of different types of network links in different scenarios, making it work well if you have large numbers of files, for example, not just a single file, and so on.
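The "network protocol that can sync up two nodes" can be sketched in miniature. This is a toy version of the idea, not Automerge's actual protocol: each node keeps an append-only log of changes per replica, advertises a version vector ("I have n changes from each replica"), and a peer sends only what is missing. Applying changes is assumed commutative, as it would be with CRDT operations; all names are invented.

```python
class Node:
    """Toy sync node: per-replica change logs plus a version vector."""

    def __init__(self, name):
        self.name = name
        self.logs = {}           # replica name -> list of changes

    def local_change(self, change):
        self.logs.setdefault(self.name, []).append(change)

    def version_vector(self):
        # How many changes we hold from each replica we know about.
        return {replica: len(log) for replica, log in self.logs.items()}

    def changes_since(self, vector):
        # Everything the peer has not seen yet, per replica.
        return {replica: log[vector.get(replica, 0):]
                for replica, log in self.logs.items()}

    def apply(self, missing):
        for replica, changes in missing.items():
            self.logs.setdefault(replica, []).extend(changes)

def sync(a, b):
    # One exchange in each direction: compare vectors, send the deltas.
    b.apply(a.changes_since(b.version_vector()))
    a.apply(b.changes_since(a.version_vector()))

alice, bob = Node("alice"), Node("bob")
alice.local_change("add card")
bob.local_change("rename board")
sync(alice, bob)
assert alice.logs == bob.logs   # both now hold the full change history
```

The hard parts the speaker alludes to, compressing the deltas, handling thousands of documents, and surviving flaky links, all live underneath this simple compare-vectors-then-send-deltas shape.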
And so there’s a ton of work still to be done there on the technical side, I think, before this is really in a state where people can just pick up the open source library and run with it. Part of it is also just getting the APIs right and making sure it has support across all the platforms. Just having a JavaScript implementation is fine for initial prototypes, but obviously iOS apps are written in Swift and Android apps will be written in Kotlin or whatever people use, and so you need to have support across all of the commonly used platforms, and we’re gradually getting there, but it’s a ton of work. 00:40:43 - Speaker 2: And conceptually, seeing how Automerge is evolving and how people are trying to use it, sometimes very successfully, sometimes less so, I see this as a case of technology transfer, which is an area I’m incredibly interested in, because I think it’s kind of a big unsolved problem in HCI research, computer science, honestly maybe all research, but I’ll stick to my lane in terms of what I know. There is often this very excellent cutting-edge research that sits in the labs, so to speak, and never graduates; there often isn’t a good path for it to jump over that hump into what’s needed in the production world. And of course, in the research world you’re trying to do something new and different and push the boundaries of what was possible before, and on the production, commercial side you want to choose boring technologies and do things that are really reliable and known and stable, and between those two there’s often a divide that’s hard to bridge.
Sitting in your seat, as, again, someone who’s enmeshed in the academic world right now and creating this library that, you know, started as what you’d call a proof of concept, for lack of a better term, you then have customers, if that’s the right way to put it. As an academic you shouldn’t have customers, but you sort of do, because people want to use this library and in fact are, for their startups and things like that. How do you see that transition happening, or is there a good role model you’ve seen elsewhere, or do you just kind of figure it out as you go? 00:42:16 - Speaker 1: Well, I think where we’re trying to go with this is: it’s great for Automerge to have users. I don’t think of them as customers. I don’t care about getting any money from them. But I do care about getting bug reports from them, and experience reports of how they’re getting on with the APIs, and reports of performance problems and so on. And those things are all tremendously valuable because they actually feed back into the research process, and so I’m essentially using the open source users and the contributors as a source of research problems. So with my research hat on, this is great, because I have essentially right in front of me a goldmine of interesting research problems to work on. I just take the top issue that people are complaining about on GitHub and have a think about how we might solve it. And often there’s enough of a nugget of a research problem in there that when we solve the problem, we can write a paper about it. It can be an academic contribution as well as moving the open source ecosystem gradually towards a point where we’ve ironed out all of those major technical problems and hopefully made something that is more usable in production. So I actually feel those worlds are pretty compatible at the moment.
There are some things which are a bit harder to make compatible, like sort of the basic work of porting stuff to new languages and new platforms; that’s necessary for real-life software engineering, but there’s no interesting research to be done there, to be honest. But so far I’ve found that quite a lot of the problems that we have run into actually do have interesting research that needs to be done in order to solve them. And as such, I think they’re quite compatible at the moment. 00:43:56 - Speaker 2: And imagine the mental picture of: someone submits a bug report, and one year later you come back and say, here’s the fix, and also the paper we published about it. 00:44:08 - Speaker 1: I’ve literally had cases where somebody turns up on Slack and says, I found this problem here, what about it? And I said, oh yeah, I wrote a paper about it, and the paper has a potential algorithm for fixing it, but I haven’t implemented it yet, sorry. And they go, like, WTF? You put all of this thought into it, you’ve written the paper, and you haven’t implemented it? And I go, well, actually, sorry, for me that’s easier, because if I want to implement it, I have to put in all of the thought and convince myself that it’s correct, and then I also have to implement it, and then I also have to write all the tests for it, and then I have to make sure that it doesn’t break other features and it doesn’t break the APIs, and I need to come up with good APIs for it, and so on. So for me, actually, implementing it is a lot more work than just doing the research, in a sense. But actually, doing the research and implementing it can be a really useful part of making sure that we’ve understood it properly from a research point of view, so that what we write in the paper ends up being correct.
In this particular case, actually, it turned out that the algorithm I had written down in the paper was wrong, because I just hadn’t thought about it deeply enough, and a student in India emailed me to say, hey, there’s a bug in your algorithm. And I said, yeah, you’re right, there’s a bug in our algorithm, we’d better fix it. So maybe through implementing it I would have found the bug, maybe not, but I think this just shows that it is hard getting this stuff right. But the engagement with the open source community I’ve found a very valuable way of both working towards a good product and doing interesting research. 00:45:39 - Speaker 3: I think it’s also useful to think of this in terms of the research-and-development frame. So research is coming up with the core insights, the basic ideas, those universal truths, to unlock new potential in the world, and it’s my opinion that with local first, there’s a huge amount of development that is needed, and that’s a lot of what we’re doing with Muse. So an analogy I might use is a car and an internal combustion engine. If you came up with the idea of an internal combustion engine, that’s amazing. It’s pretty obvious that that should be world-changing. You can spin this shaft at 5000 RPM with 300 horsepower, you know, it’s amazing, but you’re really not there yet. Like, you need to invent suspension and transmission and cooling, and it’s kind of not obvious how much work that’s going to be until you go to actually build the car and run it at 100 miles an hour. So I think there’s a lot of work that is still yet to be done on that front. And eventually, that kind of work does boil down to, or emit, research ideas and bug reports and things like that, but it’s also kind of its own whole thing, and there’s a lot to do there. And to continue the analogy: I think once the research and the initial, most obvious development examples get far enough along, you should have some unanticipated applications of the original technology. So this should be someone saying, like, what if we made an airplane with an internal combustion engine, right? I don’t think we’ve quite seen that with local first, but I think we will once it’s more accessible, because right now, to use local first, you’ve got to be basically a world expert on local-first stuff to even have a shot. But once it’s packaged enough and people see enough examples in real life, they should be able to more easily come up with their own new wild stuff. 00:47:11 - Speaker 1: Yeah, we have seen some interesting examples of people using our software in unexpected ways. One that I quite like is the Washington Post, as in the newspaper everyone knows. They have an internal system for allowing several editors to update the layout of the home page: the placement of which article goes where, with which headline, with which image, in which font size, in which column. All of that is set manually, of course, by editors. And they adopted Automerge as a way of building the collaboration tool that helps them manage this homepage. Now, this is not really a case that needs local first particularly, because it’s not like one editor is going to spend a huge amount of time editing offline and then sharing their edits to the homepage. But what they did want is a process whereby multiple editors can each be responsible for a section of the homepage, and they can propose changes to their section and then hand those changes over to somebody else who’s going to review them, and maybe approve them or maybe decline them. And so what they need essentially is this process of version control, Git-style version control almost, but for the structure representing the homepage. And they want the ability for several people to update that independently. And that’s not because people are working offline, but because people are using, essentially, branches, to use the Git metaphor.
So different editors will be working on their own local branch until they've got it right, and then they'll hit a button where they say, OK, send this to another editor for approval. And that I found really interesting. It's sort of using the same basic technologies that we've developed with CRDTs, tracking the changes to these data structures, being able to automatically merge changes made by different users, but applying it in this interesting, unexpected context. And I hope that as these tools mature, we will expand the set of applications for which they can be sensibly used, and in that expansion, we will then also see more interesting unexpected applications where people start doing things that we haven't anticipated. 00:49:19 - Speaker 2: Maybe this reflects my commercial world bias, or maybe I'm just a simple man, but I like to see something working more than I like to read a proof that it works. And both are extremely important, right? So the engineering approach to seeing if something works is you write a script, you know, fuzz testing, right? You try a million different permutations, kind of the Monte Carlo simulation test of something, and if it seems to work in all the cases you can find, then it seems like it's working. And then there's, I think, the more proof style, in the sense of mathematical proof: here is an airtight, logical, deductive reasoning case or mathematical case that shows that it works in all scenarios. It's not a Monte Carlo estimate of the area under the curve; it's calculus to determine the area under the curve precisely, to infinite resolution. And I think they both have their place. Kind of to Mark's point, you need to both conceptually come up with the combustion engine and then build one, and then all the things that are gonna go with that. And I think we all have our contributions to make.
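The branch-and-merge workflow described above, and the fuzz-testing style of confidence, can both be sketched in a few lines. This is a hedged toy, assuming a last-writer-wins map rather than Automerge's real data model (Automerge keeps a much richer operation history); the actor names, timestamps, and section keys are invented for illustration.

```python
import random

# A toy last-writer-wins (LWW) map CRDT, for illustration only.
# Each key stores (timestamp, actor, value); merge keeps the entry
# with the highest (timestamp, actor) pair, so concurrent edits
# resolve deterministically on every replica.

class LWWMap:
    def __init__(self):
        self.entries = {}  # key -> (timestamp, actor, value)

    def set(self, key, value, timestamp, actor):
        candidate = (timestamp, actor, value)
        current = self.entries.get(key)
        if current is None or candidate[:2] > current[:2]:
            self.entries[key] = candidate

    def merge(self, other):
        # Merge is a per-key maximum, so it is commutative,
        # associative, and idempotent: the CRDT requirements.
        result = LWWMap()
        result.entries = dict(self.entries)
        for key, (ts, actor, value) in other.entries.items():
            result.set(key, value, ts, actor)
        return result

    def value(self):
        return {k: v for k, (_, _, v) in self.entries.items()}

# Branch-style workflow: two editors copy the shared layout, edit
# their own sections independently, then merge.
main = LWWMap()
main.set("politics", "Election story", 1, "alice")
main.set("sports", "Cup final", 1, "bob")

alice, bob = main.merge(LWWMap()), main.merge(LWWMap())  # local "branches"
alice.set("politics", "Election result", 2, "alice")
bob.set("sports", "Cup final recap", 2, "bob")

merged = alice.merge(bob)
# merged.value() -> {"politics": "Election result", "sports": "Cup final recap"}

# The "engineering approach": fuzz-test that merge order never matters.
rng = random.Random(0)
for _ in range(1000):
    a, b = LWWMap(), LWWMap()
    for doc, actor in ((a, "alice"), (b, "bob")):
        for _ in range(rng.randint(1, 5)):
            doc.set(rng.choice("xyz"), rng.randint(0, 9),
                    rng.randint(0, 9), actor)
    assert a.merge(b).value() == b.merge(a).value()
```

The fuzz loop is the Monte Carlo style of confidence: it doesn't prove commutativity, it just fails to find a counterexample across many random permutations; the proof-style argument would be that a per-key maximum is order-independent for all inputs.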
I think, much as I like the research world, at some point when there's an idea that truly excites me enough, and local first broadly and CRDTs are in this category, I wanna see it, I want to try it, I want to see how it feels. In fact, that was our first project together, Martin: we did this sort of Trello clone, essentially, that was local first software and could basically merge together if two people worked offline, and it had a little bit of a version history. I did a little demo video, I'll link that in the show notes. But for me it was really exciting to see that working, and I think maybe your reaction was a bit of a, well, of course, you know, we have 5 years of research, look at all these papers that prove that it would work. But I wanted to see it working and, moreover, feel what it would be like, because I had this hunch that it would feel really great for the end user to have that agency. But seeing it, slash, experiencing it, for me that drives it home and creates internal motivation far more than the thought experiment, if that's the right word, the conceptual-realm work, even though I know that there's no way we could have built that prototype without all the deep thinking and hard work that went into the science that led up to it. 00:51:42 - Speaker 1: Yeah, it's totally amazing to see something like that working for the first time, and it's very hard to anticipate how something is going to feel, as you said. Like, you can sort of rationalize about its pros and cons and things like that, but that's still not quite the same thing as the actual firsthand experience of really using the software.
00:52:01 - Speaker 2: All right, so local first, the paper and the concept: I think we are pretty happy with the impact that it made and how it's changed a lot of industry discussion, and furthermore, while the technology maybe is not as far along as we'd like, it has come a long way, and we're starting to see it make its way into more real-world applications, including Muse in the very near future. But I guess looking forward to the future, for either that kind of general movement or the technology, what do you both hope to see in the coming, say, next 2 years, or even further out? 00:52:38 - Speaker 3: Well, the basic thing that I'd like to see is the development of the core idea, and to see it successfully applied in commercial and industrial settings. Like I said, I think there's a lot of work to do there, and some people have started, but I'd like to see that really land. And then, assuming we're able to get the basic development landed, a particular direction I'm really excited about is non-centralized topologies. I just think that's going to become very important, and it's a unique potential of local first software. So things like federated syncing services, mesh topologies, end-to-end encryption, generalized sync services like we talked about. I'm really excited to see those get developed and explored. 00:53:18 - Speaker 1: Yeah, those are all exciting topics. For me, one thing that I don't really have a good answer to, but which seems very interesting, is: what does the relationship between apps and the operating system look like in the future? Because right now, we're still essentially using the same 1970s Unix abstraction of: we have a hierarchical file system, and a file is a sequence of bytes. That's it. A file has a name, and the content has no further structure other than being a sequence of bytes.
But if you want to allow several users to edit a file at the same time and then merge those changes together again, you need more knowledge about the structure of what's inside the file. You can't just do that with an opaque sequence of bytes. And I see CRDTs as essentially providing a sort of general-purpose, higher-level file format that apps can use to express and represent the data they want to have, just like JSON and XML are general-purpose data representations. CRDTs further refine this by not just capturing the current state, but also capturing all the changes that were made to the state, and thereby they much better encapsulate what the intent of the user was when they made a certain change. Capturing those intents of the user, through the operations they perform, then allows different users' changes to be merged in a sensible way. And I feel like this idea really changes the abstractions that operating systems should provide: maybe OSs should not just be providing this model of files as a sequence of bytes, but this higher-level, CRDT-like model. And how does that impact the entire way software is developed? I think there's a potential for just rethinking a lot of the stack, which has built up a huge amount of cruft over the past decades, and a potential to really simplify and make things more powerful at the same time. 00:55:17 - Speaker 2: Yeah, a local first file system to me is kind of the end state, and maybe that's not quite a file system in the sense of how we think about it today, but a persistence layer that has these concepts baked into it, and I think it also just reflects the changing user expectations. People want Google Docs and Notion and Figma.
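The contrast drawn above, opaque bytes versus captured intents, can be made concrete with a small sketch. This is an illustrative toy, not a real OS interface or Automerge's actual API; the log format (timestamp, actor, verb, item) is an assumption made for the example.

```python
# 1) Opaque-bytes model: each save overwrites the whole file, so the
#    last writer wins and Alice's concurrent edit is silently lost.
shared = b"milk\n"
alice_file = shared + b"eggs\n"   # Alice saves her copy
bob_file = shared + b"bread\n"    # Bob saves his copy concurrently
shared = bob_file                 # Bob saved last: "eggs" is gone

# 2) Operation model: each edit is recorded as an intent, e.g.
#    (timestamp, actor, "add", item), so merging the two logs can
#    preserve what both users meant to do.
def merge_logs(*logs):
    # Union of operations, ordered by (timestamp, actor) so every
    # replica replays them identically.
    return sorted(set(op for log in logs for op in log))

def materialize(ops):
    # Replay the operations to build the current list state.
    items = []
    for _ts, _actor, verb, item in ops:
        if verb == "add" and item not in items:
            items.append(item)
    return items

base = [(1, "alice", "add", "milk")]
alice_log = base + [(2, "alice", "add", "eggs")]
bob_log = base + [(2, "bob", "add", "bread")]

merged_list = materialize(merge_logs(alice_log, bob_log))
# merged_list -> ["milk", "eggs", "bread"]: both concurrent edits survive
```

With the byte-level model, the merge question can't even be asked: the file system just sees two incompatible blobs. With the operation log, the merge is a union of intents replayed in a deterministic order.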
And they expect that their email and calendar will seamlessly sync across all their devices, and then you have other collaborators in the mix. So your files go from being these pretty static things on the disk, you know, where you press Command-S or Ctrl-S and every once in a while it does a binary dump of your work that you can load later, and instead become a continuous stream of changes coming from a lot of different sources. They come from, I've got my phone and my computer, and Mark's got his tablet and his phone, and Martin, you've got your computer, and we're all contributing to a document, and those changes are all streaming together and need to be coalesced and made sense of. And I think that's the place where, for example, Dropbox, much as I love it, or iCloud, which I think in a lot of ways is a really good direction, are both essentially dead ends, because they just take the classic static binary file and put it on the network, which is good, but it only takes you so far, cause again, people want Google Docs, that's just the end of it. And that puts every single company that's going to build an application of this sort in the position of having to build the kind of infrastructure necessary to do that. And we've seen, where I think Figma is the most dramatic example, they just took Sketch and ported it to kind of a real-time, collaborative, web-first environment, and the word just there is carrying a lot of weight, because in fact it's this huge engineering project and they needed to raise a bunch of venture capital. But then once you had it, it was so incredibly valuable to have that collaboration, and then, of course, they built far beyond the initial let's-just-do-Sketch-on-the-web. But any company that wants to do something like that, and increasingly that's now table stakes from a user-expectation standpoint, they want to be able to do that, but you have to do that same thing. You've gotta drop, you know, tens of millions of dollars on big teams to do that.
It seems strange, when I think many, or at least most, productivity applications want something similar to that. So, what if that were built into the operating system the same way that a file system is? Or, you know, we proposed this idea of a Firebase-style thing for local first and CRDTs, which could be maybe more developer infrastructure. Maybe that's also what you guys were speaking about earlier, with the idea that AWS could run a generic sync service. I don't know exactly what the interface looks like, whether it's more of a developer thing or an end-user thing, but basically every single application needs this, and the fact that it is a huge endeavor that costs so much money and requires really top talent to do at all, let alone continue running over time and scale up, just seems like a mismatch with what the world needs and wants right now. 00:58:25 - Speaker 3: Yeah, and now I want to riff on this, Adam, because the strongest version of that vision is