Database class for storage and retrieval of modeled data
POPULARITY
Categories
Talk Python To Me - Python conversations for passionate developers
The folks over at Astral have made some big-time impacts in the Python space with uv and ruff. They are back with another amazing project named ty. You may have known it as Red-Knot. But it's coming up on release time for the first version and with the release it comes with a new official name: ty. We have Charlie Marsh and Carl Meyer on the show to tell us all about this new project. Episode sponsors Posit Auth0 Talk Python Courses Links from the show Talk Python's Rock Solid Python: Type Hints & Modern Tools (Pydantic, FastAPI, and More) Course: training.talkpython.fm Charlie Marsh on Twitter: @charliermarsh Charlie Marsh on Mastodon: @charliermarsh Carl Meyer: @carljm ty on Github: github.com/astral-sh/ty A Very Early Play with Astral's Red Knot Static Type Checker: app.daily.dev Will Red Knot be a drop-in replacement for mypy or pyright?: github.com Hacker News Announcement: news.ycombinator.com Early Explorations of Astral's Red Knot Type Checker: pydevtools.com Astral's Blog: astral.sh Rust Analyzer Salsa Docs: docs.rs Ruff Open Issues (label: red-knot): github.com Ruff Types: types.ruff.rs Ruff Docs (Astral): docs.astral.sh uv Repository: github.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
Python has many string formatting styles which have been added to the language over the years. Early Python used the % operator to injected formatted values into strings. And we have string.format() which offers several powerful styles. Both were verbose and indirect, so f-strings were added in Python 3.6. But these f-strings lacked security features (think little bobby tables) and they manifested as fully-formed strings to runtime code. Today we talk about the next evolution of Python string formatting for advanced use-cases (SQL, HTML, DSLs, etc): t-strings. We have Paul Everitt, David Peck, and Jim Baker on the show to introduce this upcoming new language feature. Episode sponsors Posit Auth0 Talk Python Courses Links from the show Guests: Paul on X: @paulweveritt Paul on Mastodon: @pauleveritt@fosstodon.org Dave Peck on Github: github.com Jim Baker: github.com PEP 750 – Template Strings: peps.python.org tdom - Placeholder for future library on PyPI using PEP 750 t-strings: github.com PEP 750: Tag Strings For Writing Domain-Specific Languages: discuss.python.org How To Teach This: peps.python.org PEP 501 – General purpose template literal strings: peps.python.org Python's new t-strings: davepeck.org PyFormat: Using % and .format() for great good!: pyformat.info flynt: A tool to automatically convert old string literal formatting to f-strings: github.com Examples of using t-strings as defined in PEP 750: github.com htm.py issue: github.com Exploits of a Mom: xkcd.com pyparsing: github.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
In this episode, Lois Houston and Nikita Abraham continue their deep dive into Oracle GoldenGate 23ai, focusing on its evolution and the extensive features it offers. They are joined once again by Nick Wagner, who provides valuable insights into the product's journey. Nick talks about the various iterations of Oracle GoldenGate, highlighting the significant advancements from version 12c to the latest 23ai release. The discussion then shifts to the extensive new features in 23ai, including AI-related capabilities, UI enhancements, and database function integration. Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. ----------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Lois: Hello and welcome to the Oracle University Podcast! I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead: Editorial Services. Nikita: Hi everyone! Last week, we introduced Oracle GoldenGate and its capabilities, and also spoke about GoldenGate 23ai. In today's episode, we'll talk about the various iterations of Oracle GoldenGate since its inception. And we'll also take a look at some new features and the Oracle GoldenGate product family. 00:57 Lois: And we have Nick Wagner back with us. Nick is a Senior Director of Product Management for GoldenGate at Oracle. Hi Nick! I think the last time we had an Oracle University course was when Oracle GoldenGate 12c was out. I'm sure there's been a lot of advancements since then. Can you walk us through those? Nick: GoldenGate 12.3 introduced the microservices architecture. GoldenGate 18c introduced support for Oracle Autonomous Data Warehouse and Autonomous Transaction Processing Databases. In GoldenGate 19c, we added the ability to do cross endian remote capture for Oracle, making it easier to set up the GoldenGate OCI service to capture from environments like Solaris, Spark, and HP-UX and replicate into the Cloud. Also, GoldenGate 19c introduced a simpler process for upgrades and installation of GoldenGate where we released something called a unified build. This means that when you install GoldenGate for a particular database, you don't need to worry about the database version when you install GoldenGate. Prior to this, you would have to install a version-specific and database-specific version of GoldenGate. So this really simplified that whole process. In GoldenGate 23ai, which is where we are now, this really is a huge release. 02:16 Nikita: Yeah, we covered some of the distributed AI features and high availability environments in our last episode. But can you give us an overview of everything that's in the 23ai release? I know there's a lot to get into but maybe you could highlight just the major ones? Nick: Within the AI and streaming environments, we've got interoperability for database vector types, heterogeneous capture and apply as well. Again, this is not just replication between Oracle-to-Oracle vector or Postgres to Postgres vector, it is heterogeneous just like the rest of GoldenGate. The entire UI has been redesigned and optimized for high speed. And so we have a lot of customers that have dozens and dozens of extracts and replicats and processes running and it was taking a long time for the UI to refresh those and to show what's going on within those systems. So the UI has been optimized to be able to handle those environments much better. We now have the ability to call database functions directly from call map. And so when you do transformation with GoldenGate, we have about 50 or 60 built-in transformation routines for string conversion, arithmetic operation, date manipulation. But we never had the ability to directly call a database function. 03:28 Lois: And now we do? Nick: So now you can actually call that database function, database stored procedure, database package, return a value and that can be used for transformation within GoldenGate. We have integration with identity providers, being able to use token-based authentication and integrate in with things like Azure Active Directory and your other single sign-on for the GoldenGate product itself. Within Oracle 23ai, there's a number of new features. One of those cool features is something called lock-free reservation columns. So this allows you to have a row, a single row within a table and you can identify a column within that row that's like an inventory column. And you can have multiple different users and multiple different transactions all updating that column within that same exact row at that same time. So you no longer have row-level locking for these reservation columns. And it allows you to do things like shopping carts very easily. If I have 500 widgets to sell, I'm going to let any number of transactions come in and subtract from that inventory column. And then once it gets below a certain point, then I'll start enforcing that row-level locking. 04:43 Lois: That's really cool… Nick: The one key thing that I wanted to mention here is that because of the way that the lock-free reservations work, you can have multiple transactions open on the same row. This is only supported for Oracle to Oracle. You need to have that same lock-free reservation data type and availability on that target system if GoldenGate is going to replicate into it. 05:05 Nikita: Are there any new features related to the diagnosability and observability of GoldenGate? Nick: We've improved the AWR reports in Oracle 23ai. There's now seven sections that are specific to Oracle GoldenGate to allow you to really go in and see exactly what the GoldenGate processes are doing and how they're behaving inside the database itself. And there's a Replication Performance Advisor package inside that database, and that's been integrated into the Web UI as well. So now you can actually get information out of the replication advisor package in Oracle directly from the UI without having to log into the database and try to run any database procedures to get it. We've also added the ability to support a per-PDB Extract. So in the past, when GoldenGate would run on a multitenant database, a multitenant database in Oracle, all the redo data from any pluggable database gets sent to that one redo stream. And so you would have to configure GoldenGate at the container or root level and it would be able to access anything at any PDB. Now, there's better security and better performance by doing what we call per-PDB Extract. And this means that for a single pluggable database, I can have an extract that runs at that database level that's going to capture information just from that pluggable database. 06:22 Lois And what about non-Oracle environments, Nick? Nick: We've also enhanced the non-Oracle environments as well. For example, in Postgres, we've added support for precise instantiation using Postgres snapshots. This eliminates the need to handle collisions when you're doing Postgres to Postgres replication and initial instantiation. On the GoldenGate for big data side, we've renamed that product more aptly to distributed applications in analytics, which is really what it does, and we've added a whole bunch of new features here too. The ability to move data into Databricks, doing Google Pub/Sub delivery. We now have support for XAG within the GoldenGate for distributed applications and analytics. What that means is that now you can follow all of our MAA best practices for GoldenGate for Oracle, but it also works for the DAA product as well, meaning that if it's running on one node of a cluster and that node fails, it'll restart itself on another node in the cluster. We've also added the ability to deliver data to Redis, Google BigQuery, stage and merge functionality for better performance into the BigQuery product. And then we've added a completely new feature, and this is something called streaming data and apps and we're calling it AsyncAPI and CloudEvent data streaming. It's a long name, but what that means is that we now have the ability to publish changes from a GoldenGate trail file out to end users. And so this allows through the Web UI or through the REST API, you can now come into GoldenGate and through the distributed applications and analytics product, actually set up a subscription to a GoldenGate trail file. And so this allows us to push data into messaging environments, or you can simply subscribe to changes and it doesn't have to be the whole trail file, it can just be a subset. You can specify exactly which tables and you can put filters on that. You can also set up your topologies as well. So, it's a really cool feature that we've added here. 08:26 Nikita: Ok, you've given us a lot of updates about what GoldenGate can support. But can we also get some specifics? Nick: So as far as what we have, on the Oracle Database side, there's a ton of different Oracle databases we support, including the Autonomous Databases and all the different flavors of them, your Oracle Database Appliance, your Base Database Service within OCI, your of course, Standard and Enterprise Edition, as well as all the different flavors of Exadata, are all supported with GoldenGate. This is all for capture and delivery. And this is all versions as well. GoldenGate supports Oracle 23ai and below. We also have a ton of non-Oracle databases in different Cloud stores. On an non-Oracle side, we support everything from application-specific databases like FairCom DB, all the way to more advanced applications like Snowflake, which there's a vast user base for that. We also support a lot of different cloud stores and these again, are non-Oracle, nonrelational systems, or they can be relational databases. We also support a lot of big data platforms and this is part of the distributed applications and analytics side of things where you have the ability to replicate to different Apache environments, different Cloudera environments. We also support a number of open-source systems, including things like Apache Cassandra, MySQL Community Edition, a lot of different Postgres open source databases along with MariaDB. And then we have a bunch of streaming event products, NoSQL data stores, and even Oracle applications that we support. So there's absolutely a ton of different environments that GoldenGate supports. There are additional Oracle databases that we support and this includes the Oracle Metadata Service, as well as Oracle MySQL, including MySQL HeatWave. Oracle also has Oracle NoSQL Spatial and Graph and times 10 products, which again are all supported by GoldenGate. 10:23 Lois: Wow, that's a lot of information! Nick: One of the things that we didn't really cover was the different SaaS applications, which we've got like Cerner, Fusion Cloud, Hospitality, Retail, MICROS, Oracle Transportation, JD Edwards, Siebel, and on and on and on. And again, because of the nature of GoldenGate, it's heterogeneous. Any source can talk to any target. And so it doesn't have to be, oh, I'm pulling from Oracle Fusion Cloud, that means I have to go to an Oracle Database on the target, not necessarily. 10:51 Lois: So, there's really a massive amount of flexibility built into the system. 11:00 Unlock the power of AI Vector Search with our new course and certification. Get more accurate search results, handle complex datasets easily, and supercharge your data-driven decisions. From now through May 15, 2025, we are waiving the certification exam fee (valued at $245). Visit mylearn.oracle.com to enroll. 11:26 Nikita: Welcome back! Now that we've gone through the base product, what other features or products are in the GoldenGate family itself, Nick? Nick: So we have quite a few. We've kind of touched already on GoldenGate for Oracle databases and non-Oracle databases. We also have something called GoldenGate for Mainframe, which right now is covered under the GoldenGate for non-Oracle, but there is a licensing difference there. So that's something to be aware of. We also have the OCI GoldenGate product. We are announcing and we have announced that OCI GoldenGate will also be made available as part of the Oracle Database@Azure and Oracle Database@ Google Cloud partnerships. And then you'll be able to use that vendor's cloud credits to actually pay for the OCI GoldenGate product. One of the cool things about this is it will have full feature parity with OCI GoldenGate running in OCI. So all the same features, all the same sources and targets, all the same topologies be able to migrate data in and out of those clouds at will, just like you do with OCI GoldenGate today running in OCI. We have Oracle GoldenGate Free. This is a completely free edition of GoldenGate to use. It is limited on the number of platforms that it supports as far as sources and targets and the size of the database. 12:45 Lois: But it's a great way for developers to really experience GoldenGate without worrying about a license, right? What's next, Nick? Nick: We have GoldenGate for Distributed Applications and Analytics, which was formerly called GoldenGate for big data, and that allows us to do all the streaming. That's also where the GoldenGate AsyncAPI integration is done. So in order to publish the GoldenGate trail files or allow people to subscribe to them, it would be covered under the Oracle GoldenGate Distributed Applications and Analytics license. We also have OCI GoldenGate Marketplace, which allows you to run essentially the on-premises version of GoldenGate but within OCI. So a little bit more flexibility there. It also has a hub architecture. So if you need that 99.99% availability, you can get it within the OCI Marketplace environment. We have GoldenGate for Oracle Enterprise Manager Cloud Control, which used to be called Oracle Enterprise Manager. And this allows you to use Enterprise Manager Cloud Control to get all the statistics and details about GoldenGate. So all the reporting information, all the analytics, all the statistics, how fast GoldenGate is replicating, what's the lag, what's the performance of each of the processes, how much data am I sending across a network. All that's available within the plug-in. We also have Oracle GoldenGate Veridata. This is a nice utility and tool that allows you to compare two databases, whether or not GoldenGate is running between them and actually tell you, hey, these two systems are out of sync. And if they are out of sync, it actually allows you to repair the data too. 14:25 Nikita: That's really valuable…. Nick: And it does this comparison without locking the source or the target tables. The other really cool thing about Veridata is it does this while there's data in flight. So let's say that the GoldenGate lag is 15 or 20 seconds and I want to compare this table that has 10 million rows in it. The Veridata product will go out, run its comparison once. Once that comparison is done the first time, it's then going to have a list of rows that are potentially out of sync. Well, some of those rows could have been moved over or could have been modified during that 10 to 15 second window. And so the next time you run Veridata, it's actually going to go through. It's going to check just those rows that were potentially out of sync to see if they're really out of sync or not. And if it comes back and says, hey, out of those potential rows, there's two out of sync, it'll actually produce a script that allows you to resynchronize those systems and repair them. So it's a very cool product. 15:19 Nikita: What about GoldenGate Stream Analytics? I know you mentioned it in the last episode, but in the context of this discussion, can you tell us a little more about it? Nick: This is the ability to essentially stream data from a GoldenGate trail file, and they do a real time analytics on it. And also things like geofencing or real-time series analysis of it. 15:40 Lois: Could you give us an example of this? Nick: If I'm working in tracking stock market information and stocks, it's not really that important on how much or how far down a stock goes. What's really important is how quickly did that stock rise or how quickly did that stock fall. And that's something that GoldenGate Stream Analytics product can do. Another thing that it's very valuable for is the geofencing. I can have an application on my phone and I can track where the user is based on that application and all that information goes into a database. I can then use the geofencing tool to say that, hey, if one of those users on that app gets within a certain distance of one of my brick-and-mortar stores, I can actually send them a push notification to say, hey, come on in and you can order your favorite drink just by clicking Yes, and we'll have it ready for you. And so there's a lot of things that you can do there to help upsell your customers and to get more revenue just through GoldenGate itself. And then we also have a GoldenGate Migration Utility, which allows customers to migrate from the classic architecture into the microservices architecture. 16:44 Nikita: Thanks Nick for that comprehensive overview. Lois: In our next episode, we'll have Nick back with us to talk about commonly used terminology and the GoldenGate architecture. And if you want to learn more about what we discussed today, visit mylearn.oracle.com and take a look at the Oracle GoldenGate 23ai Fundamentals course. Until next time, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 17:10 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
Talk Python To Me - Python conversations for passionate developers
What trends and technologies should you be paying attention to today? Are there hot new database servers you should check out? Or will that just be a flash in the pan? I love these forward looking episodes and this one is super fun. I've put together an amazing panel: Gina Häußge, Ines Montani, Richard Campbell, and Calvin Hendryx-Parker. We dive into the recent Stack Overflow Developer survey results as a sounding board for our thoughts on rising and falling trends in the Python and broader developer space. Episode sponsors NordLayer Auth0 Talk Python Courses Links from the show The Stack Overflow Survey Results: survey.stackoverflow.co/2024 Panelists Gina Häußge: chaos.social/@foosel Ines Montani: ines.io Richard Campbell: about.me/richard.campbell Calvin Hendryx-Parker: github.com/calvinhp Explosion: explosion.ai spaCy: spacy.io OctoPrint: octoprint.org .NET Rocks: dotnetrocks.com Six Feet Up: sixfeetup.com Stack Overflow: stackoverflow.com Python.org: python.org GitHub Copilot: github.com OpenAI ChatGPT: chat.openai.com Claude: anthropic.com LM Studio: lmstudio.ai Hetzner: hetzner.com Docker: docker.com Aider Chat: github.com Goose AI: goose.ai IndyPy: indypy.org OctoPrint Community Forum: community.octoprint.org spaCy GitHub: github.com Hugging Face: huggingface.co Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
In a new season of the Oracle University Podcast, Lois Houston and Nikita Abraham dive into the world of Oracle GoldenGate 23ai, a cutting-edge software solution for data management. They are joined by Nick Wagner, a seasoned expert in database replication, who provides a comprehensive overview of this powerful tool. Nick highlights GoldenGate's ability to ensure continuous operations by efficiently moving data between databases and platforms with minimal overhead. He emphasizes its role in enabling real-time analytics, enhancing data security, and reducing costs by offloading data to low-cost hardware. The discussion also covers GoldenGate's role in facilitating data sharing, improving operational efficiency, and reducing downtime during outages. Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. --------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Nikita: Welcome to the Oracle University Podcast! I'm Nikita Abraham, Team Lead: Editorial Services with Oracle University, and with me is Lois Houston: Director of Innovation Programs. Lois: Hi everyone! Welcome to a new season of the podcast. This time, we're focusing on the fundamentals of Oracle GoldenGate. Oracle GoldenGate helps organizations manage and synchronize their data across diverse systems and databases in real time. And with the new Oracle GoldenGate 23ai release, we'll uncover the latest innovations and features that empower businesses to make the most of their data. Nikita: Taking us through this is Nick Wagner, Senior Director of Product Management for Oracle GoldenGate. He's been doing database replication for about 25 years and has been focused on GoldenGate on and off for about 20 of those years. 01:18 Lois: In today's episode, we'll ask Nick to give us a general overview of the product, along with some use cases and benefits. Hi Nick! To start with, why do customers need GoldenGate? Nick: Well, it delivers continuous operations, being able to continuously move data from one database to another database or data platform in efficiently and a high-speed manner, and it does this with very low overhead. Almost all the GoldenGate environments use transaction logs to pull the data out of the system, so we're not creating any additional triggers or very little overhead on that source system. GoldenGate can also enable real-time analytics, being able to pull data from all these different databases and move them into your analytics system in real time can improve the value that those analytics systems provide. Being able to do real-time statistics and analysis of that data within those high-performance custom environments is really important. 02:13 Nikita: Does it offer any benefits in terms of cost? Nick: GoldenGate can also lower IT costs. A lot of times people run these massive OLTP databases, and they are running reporting in those same systems. With GoldenGate, you can offload some of the data or all the data to a low-cost commodity hardware where you can then run the reports on that other system. So, this way, you can get back that performance on the OLTP system, while at the same time optimizing your reporting environment for those long running reports. You can improve efficiencies and reduce risks. Being able to reduce the amount of downtime during planned and unplanned outages can really make a big benefit to the overall operational efficiencies of your company. 02:54 Nikita: What about when it comes to data sharing and data security? Nick: You can also reduce barriers to data sharing. Being able to pull subsets of data, or just specific pieces of data out of a production database and move it to the team or to the group that needs that information in real time is very important. And it also protects the security of your data by only moving in the information that they need and not the entire database. It also provides extensibility and flexibility, being able to support multiple different replication topologies and architectures. 03:24 Lois: Can you tell us about some of the use cases of GoldenGate? Where does GoldenGate truly shine? Nick: Some of the more traditional use cases of GoldenGate include use within the multicloud fabric. Within a multicloud fabric, this essentially means that GoldenGate can replicate data between on-premise environments, within cloud environments, or hybrid, cloud to on-premise, on-premise to cloud, or even within multiple clouds. So, you can move data from AWS to Azure to OCI. You can also move between the systems themselves, so you don't have to use the same database in all the different clouds. For example, if you wanted to move data from AWS Postgres into Oracle running in OCI, you can do that using Oracle GoldenGate. We also support maximum availability architectures. And so, there's a lot of different use cases here, but primarily geared around reducing your recovery point objective and recovery time objective. 04:20 Lois: Ah, reducing RPO and RTO. That must have a significant advantage for the customer, right? Nick: So, reducing your RPO and RTO allows you to take advantage of some of the benefits of GoldenGate, being able to do active-active replication, being able to set up GoldenGate for high availability, real-time failover, and it can augment your active Data Guard and Data Guard configuration. So, a lot of times GoldenGate is used within Oracle's maximum availability architecture platinum tier level of replication, which means that at that point you've got lots of different capabilities within the Oracle Database itself. But to help eke out that last little bit of high availability, you want to set up an active-active environment with GoldenGate to really get true zero RPO and RTO. GoldenGate can also be used for data offloading and data hubs. Being able to pull data from one or more source systems and move it into a data hub, or into a data warehouse for your operational reporting. This could also be your analytics environment too. 05:22 Nikita: Does GoldenGate support online migrations? Nick: In fact, a lot of companies actually get started in GoldenGate by doing a migration from one platform to another. Now, these don't even have to be something as complex as going from one database like a DB2 on-premise into an Oracle on OCI, it could even be simple migrations. A lot of times doing something like a major application or a major database version upgrade is going to take downtime on that production system. You can use GoldenGate to eliminate that downtime. So this could be going from Oracle 19c to Oracle 23ai, or going from application version 1.0 to application version 2.0, because GoldenGate can do the transformation between the different application schemas. You can use GoldenGate to migrate your database from on premise into the cloud with no downtime as well. We also support real-time analytic feeds, being able to go from multiple databases, not only those on premise, but being able to pull information from different SaaS applications inside of OCI and move it to your different analytic systems. And then, of course, we also have the ability to stream events and analytics within GoldenGate itself. 06:34 Lois: Let's move on to the various topologies supported by GoldenGate. I know GoldenGate supports many different platforms and can be used with just about any database. Nick: This first layer of topologies is what we usually consider relational database topologies. And so this would be moving data from Oracle to Oracle, Postgres to Oracle, Sybase to SQL Server, a lot of different types of databases. So the first architecture would be unidirectional. This is replicating from one source to one target. You can do this for reporting. If I wanted to offload some reports into another server, I can go ahead and do that using GoldenGate. I can replicate the entire database or just a subset of tables. I can also set up GoldenGate for bidirectional, and this is what I want to set up GoldenGate for something like high availability. So in the event that one of the servers crashes, I can almost immediately reconnect my users to the other system. And that almost immediately depends on the amount of latency that GoldenGate has at that time. So a typical latency is anywhere from 3 to 6 seconds. So after that primary system fails, I can reconnect my users to the other system in 3 to 6 seconds. And I can do that because as GoldenGate's applying data into that target database, that target system is already open for read and write activity. GoldenGate is just another user connecting in issuing DML operations, and so it makes that failover time very low. 07:59 Nikita: Ok…If you can get it down to 3 to 6 seconds, can you bring it down to zero? Like zero failover time? Nick: That's the next topology, which is active-active. And in this scenario, all servers are read/write all at the same time and all available for user activity. And you can do multiple topologies with this as well. You can do a mesh architecture, which is where every server talks to every other server. This works really well for 2, 3, 4, maybe even 5 environments, but when you get beyond that, having every server communicate with every other server can get a little complex. And so at that point we start looking at doing what we call a hub and spoke architecture, where we have lots of different spokes. At the end of each spoke is a read/write database, and then those communicate with a hub. So any change that happens on one spoke gets sent into the hub, and then from the hub it gets sent out to all the other spokes. And through that architecture, it allows you to really scale up your environments. We have customers that are doing up to 150 spokes within that hub architecture. Within active-active replication as well, we can do conflict detection and resolution, which means that if two users modify the same row on two different systems, GoldenGate can actually determine that there was an issue with that and determine what user wins or which row change wins, which is extremely important when doing active-active replication. And this means that if one of those systems fails, there is no downtime when you switch your users to another active system because it's already available for activity and ready to go. 09:35 Lois: Wow, that's fantastic. Ok, tell us more about the topologies. Nick: GoldenGate can do other things like broadcast, sending data from one system to multiple systems, or many to one as far as consolidation. We can also do cascading replication, so when data moves from one environment that GoldenGate is replicating into another environment that GoldenGate is replicating. By default, we ignore all of our own transactions. But there's actually a toggle switch that you can flip that says, hey, GoldenGate, even though you wrote that data into that database, still push it on to the next system. And then of course, we can also do distribution of data, and this is more like moving data from a relational database into something like a Kafka topic or a JMS queue or into some messaging service. 10:24 Raise your game with the Oracle Cloud Applications skills challenge. Get free training on Oracle Fusion Cloud Applications, Oracle Modern Best Practice, and Oracle Cloud Success Navigator. Pass the free Oracle Fusion Cloud Foundations Associate exam to earn a Foundations Associate certification. Plus, there's a chance to win awards and prizes throughout the challenge! What are you waiting for? Join the challenge today by visiting visit oracle.com/education. 10:58 Nikita: Welcome back! Nick, does GoldenGate also have nonrelational capabilities? Nick: We have a number of nonrelational replication events in topologies as well. This includes things like data lake ingestion and streaming ingestion, being able to move data and data objects from these different relational database platforms into data lakes and into these streaming systems where you can run analytics on them and run reports. We can also do cloud ingestion, being able to move data from these databases into different cloud environments. And this is not only just moving it into relational databases with those clouds, but also their data lakes and data fabrics. 11:38 Lois: You mentioned a messaging service earlier. Can you tell us more about that? Nick: Messaging replication is also possible. So we can actually capture from things like messaging systems like Kafka Connect and JMS, replicate that into a relational data, or simply stream it into another environment. We also support NoSQL replication, being able to capture from MongoDB and replicate it onto another MongoDB for high availability or disaster recovery, or simply into any other system. 12:06 Nikita: I see. And is there any integration with a customer's SaaS applications? Nick: GoldenGate also supports a number of different OCI SaaS applications. And so a lot of these different applications like Oracle Financials Fusion, Oracle Transportation Management, they all have GoldenGate built under the covers and can be enabled with a flag that you can actually have that data sent out to your other GoldenGate environment. So you can actually subscribe to changes that are happening in these other systems with very little overhead. And then of course, we have event processing and analytics, and this is the final topology or flexibility within GoldenGate itself. And this is being able to push data through data pipelines, doing data transformations. GoldenGate is not an ETL tool, but it can do row-level transformation and row-level filtering. 12:55 Lois: Are there integrations offered by Oracle GoldenGate in automation and artificial intelligence? Nick: We can do time series analysis and geofencing using the GoldenGate Stream Analytics product. It allows you to actually do real time analysis and time series analysis on data as it flows through the GoldenGate trails. And then that same product, the GoldenGate Stream Analytics, can then take the data and move it to predictive analytics, where you can run MML on it, or ONNX or other Spark-type technologies and do real-time analysis and AI on that information as it's flowing through. 13:29 Nikita: So, GoldenGate is extremely flexible. And given Oracle's focus on integrating AI into its product portfolio, what about GoldenGate? Does it offer any AI-related features, especially since the product name has “23ai” in it? Nick: With the advent of Oracle GoldenGate 23ai, it's one of the two products at this point that has the AI moniker at Oracle. Oracle Database 23ai also has it, and that means that we actually do stuff with AI. So the Oracle GoldenGate product can actually capture vectors from databases like MySQL HeatWave, Postgres using pgvector, which includes things like AlloyDB, Amazon RDS Postgres, Aurora Postgres. We can also replicate data into Elasticsearch and OpenSearch, or if the data is using vectors within OCI or the Oracle Database itself. So GoldenGate can be used for a number of things here. The first one is being able to migrate vectors into the Oracle Database. So if you're using something like Postgres, MySQL, and you want to migrate the vector information into the Oracle Database, you can. Now one thing to keep in mind here is a vector is oftentimes like a GPS coordinate. So if I need to know the GPS coordinates of Austin, Texas, I can put in a latitude and longitude and it will give me the GPS coordinates of a building within that city. But if I also need to know the altitude of that same building, well, that's going to be a different algorithm. And GoldenGate and replicating vectors is the same way. When you create a vector, it's essentially just creating a bunch of numbers under the screen, kind of like those same GPS coordinates. The dimension and the algorithm that you use to generate that vector can be different across different databases, but the actual meaning of that data will change. And so GoldenGate can replicate the vector data as long as the algorithm and the dimensions are the same. If the algorithm and the dimensions are not the same between the source and the target, then you'll actually want GoldenGate to replicate the base data that created that vector. And then once GoldenGate replicates the base data, it'll actually call the vector embedding technology to re-embed that data and produce that numerical formatting for you. 15:42 Lois: So, there are some nuances there… Nick: GoldenGate can also replicate and consolidate vector changes or even do the embedding API calls itself. This is really nice because it means that we can take changes from multiple systems and consolidate them into a single one. We can also do the reverse of that too. A lot of customers are still trying to find out which algorithms work best for them. How many dimensions? What's the optimal use? Well, you can now run those in different servers without impacting your actual AI system. Once you've identified which algorithm and dimension is going to be best for your data, you can then have GoldenGate replicate that into your production system and we'll start using that instead. So it's a nice way to switch algorithms without taking extensive downtime. 16:29 Nikita: What about in multicloud environments? Nick: GoldenGate can also do multicloud and N-way active-active Oracle replication between vectors. So if there's vectors in Oracle databases, in multiple clouds, or multiple on-premise databases, GoldenGate can synchronize them all up. And of course we can also stream changes from vector information, including text as well into different search engines. And that's where the integration with Elasticsearch and OpenSearch comes in. And then we can use things like NVIDIA and Cohere to actually do the AI on that data. 17:01 Lois: Using GoldenGate with AI in the database unlocks so many possibilities. Thanks for that detailed introduction to Oracle GoldenGate 23ai and its capabilities, Nick. Nikita: We've run out of time for today, but Nick will be back next week to talk about how GoldenGate has evolved over time and its latest features. And if you liked what you heard today, head over to mylearn.oracle.com and take a look at the Oracle GoldenGate 23ai Fundamentals course to learn more. Until next time, this is Nikita Abraham… Lois: And Lois Houston, signing off! 17:33 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
Talk Python To Me - Python conversations for passionate developers
Pandas is at a the core of virtually all data science done in Python, that is virtually all data science. Since it's beginning, Pandas has been based upon numpy. But changes are afoot to update those internals and you can now optionally use PyArrow. PyArrow comes with a ton of benefits including it's columnar format which makes answering analytical questions faster, support for a range of high performance file formats, inter-machine data streaming, faster file IO and more. Reuven Lerner is here to give us the low-down on the PyArrow revolution. Episode sponsors NordLayer Auth0 Talk Python Courses Links from the show Reuven: github.com/reuven Apache Arrow: github.com Parquet: parquet.apache.org Feather format: arrow.apache.org Python Workout Book: manning.com Pandas Workout Book: manning.com Pandas: pandas.pydata.org PyArrow CSV docs: arrow.apache.org Future string inference in Pandas: pandas.pydata.org Pandas NA/nullable dtypes: pandas.pydata.org Pandas `.iloc` indexing: pandas.pydata.org DuckDB: duckdb.org Pandas user guide: pandas.pydata.org Pandas GitHub issues: github.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
In dieser Folge nehmen wir euch mit auf eine Reise durch die Welt relationaler und nicht-relationaler Datenbanken – oder wie wir sagen: SQL trifft auf NoSQL, Struktur trifft auf Chaos, Ordnung trifft auf… Hühner?!Wir sprechen darüber, wann SQL deine beste Freundin ist – und wann sie dich einfach nur nervt. Warum NoSQL die coole, flexible Schwester ist, die sich an keine Regeln hält. Was ACID und BASE eigentlich bedeuten (Spoiler: hat nichts mit Chemie zu tun). Und welches Datenbankmodell sich am besten eignet, wenn deine App "EggTrack" heißt und Hühner zu Social-Media-Stars macht.Wie immer gibt's dazu eine sehr gute Frage, praktische Tipps und den einen oder anderen AHA-Moment. Viel Spaß beim Hören!
Talk Python To Me - Python conversations for passionate developers
Do you or your company need accounting software? Well, there are plenty of SaaS products out there that you can give your data to. but maybe you also really like Django and would rather have a foundation to build your own accounting system exactly as you need for your company or your product. On this episode, we're diving into Django Ledger, created by Miguel Sanda, which can do just that. Episode sponsors Auth0 Talk Python Courses Links from the show Miguel Sanda on Twitter: @elarroba Miguel on Mastodon: @elarroba@fosstodon.org Miguel on GitHub: github.com Django Ledger on Github: github.com Django Ledger Discord: discord.gg Get Started with Django MongoDB Backend: mongodb.com Wagtail CMS: wagtail.org Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
Have you ever spent an afternoon wrestling with a Jupyter notebook, hoping that you ran the cells in just the right order, only to realize your outputs were completely out of sync? Today's guest has a fresh take on solving that exact problem. Akshay Agrawal is here to introduce Marimo, a reactive Python notebook that ensures your code and outputs always stay in lockstep. And that's just the start! We'll also dig into Akshay's background at Google Brain and Stanford, what it's like to work on the cutting edge of AI, and how Marimo is uniting the best of data science exploration and real software engineering. Episode sponsors Worth Search Talk Python Courses Links from the show Akshay Agrawal: akshayagrawal.com YouTube: youtube.com Source: github.com Docs: marimo.io Marimo: marimo.io Discord: marimo.io WASM playground: marimo.new Experimental generate notebooks with AI: marimo.app Pluto.jl: plutojl.org Observable JS: observablehq.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
We're sitting down with Eric Matthes, the educator, author, and developer behind Django Simple Deploy. If you've ever struggled with taking that final step of getting your Django app onto a live server (without spending days wrestling with DevOps complexities), then give Django Simple Deploy a look. Eric shares how Django Simple Deploy automates away the boilerplate parts of deployment, so you can focus on building features instead of deciphering endless configs. We'll talk about this new project's journey to 1.0, the range of hosting platforms it supports, and why it's not just for beginners. Episode sponsors Worth Search Talk Python Courses Links from the show django-simple-deploy documentation: readthedocs.io django-simple-deploy repository: github.com Python Crash Course book: ehmatthes.github.io Code Red: codered.cloud Docker: docker.com Caddy: caddyserver.com Bunny.net CDN: bunny.net Platform.sh: platform.sh fly.io: fly.io Heroku: heroku.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
On this episode of Alexa's Input (AI), we're diving deep into the world of distributed databases with Patrick McFadin, Principal Technical Strategist at DataStax and a leading voice in the Apache Cassandra community. Patrick shares his journey into tech and how he became one of the foremost experts on Cassandra—an open-source, highly scalable NoSQL database that powers mission-critical applications across the globe.We explore Cassandra's unique architecture, its approach to the CAP theorem, real-world use cases, and how it continues to evolve in the era of AI and real-time analytics. Whether you're a developer, architect, or just database-curious, this episode offers a clear, insightful look at how Cassandra handles scale, availability, and open-source innovation.Links:LinkedIn: https://www.linkedin.com/in/patrick-mcfadin-53a8046/DataStax: https://www.datastax.com/our-people/patrick-mcfadinX: https://x.com/patrickmcfadinGithub: https://github.com/pmcfadinYou can support this podcast on the creators page. Make sure to subscribe and follow Alexa's Input Twitter account to get notified when a new podcast episode comes out.
Talk Python To Me - Python conversations for passionate developers
This episode is all about Beeware, the project that working towards true native apps built on Python, especially for iOS and Android. Russell's been at this for more than a decade, and the progress is now hitting critical mass. We'll talk about the Toga GUI toolkit, building and shipping your apps with Briefcase, the newly official support for iOS and Android in CPython, and so much more. I can't wait to explore how BeeWare opens up the entire mobile ecosystem for Python developers, let's jump right in. Episode sponsors Posit Python in Production Talk Python Courses Links from the show Anaconda open source team: anaconda.com PEP 730 – Adding iOS: peps.python.org PEP 738 – Adding Android: peps.python.org Toga: beeware.org Briefcase: beeware.org emscripten: emscripten.org Russell Keith-Magee - Keynote - PyCon 2019: youtube.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
In this episode, we welcome back Will McGugan, the creator of the wildly popular Rich library and founder of Textualize. We'll dive into Will's latest article on "Algorithms for High Performance Terminal Apps" and explore how he's quietly revolutionizing what's possible in the terminal, from smooth animations and dynamic widgets to full-on TUI (or should we say GUI?) frameworks. Whether you're looking to supercharge your command-line tools or just curious how Python can push the limits of text-based UIs, you'll love hearing how Will's taking a modern, web-inspired approach to old-school terminals. Episode sponsors Posit Python in Production Talk Python Courses Links from the show Algorithms for high performance terminal apps post: textual.textualize.io Textual Demo: github.com Textual: textualize.io Zero ver: 0ver.org memray: github.com Posting app: posting.sh Bulma CSS framewokr: bulma.io JP Term: davidbrochart.github.io Rich: github.com btop: github.com starship: starship.rs Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
Have you ever wondered why certain data points stand out so dramatically? They might hold the key to everything from fraud detection to groundbreaking discoveries. This week on Talk Python to Me, we dive into the world of outlier detection with Python with Brett Kennedy. You'll learn how outliers can signal errors, highlight novel insights, or even reveal hidden patterns lurking in the data you thought you understood. We'll explore fresh research developments, practical use cases, and how outlier detection compares to other core data science tasks like prediction and clustering. If you're ready to spot those game-changing anomalies in your own projects, stay tuned. Episode sponsors Posit Python in Production Talk Python Courses Links from the show Data-morph: github.com PyOD: github.com Prophet: github.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
Today we explore the wild world of Python deployment with my friend, Calvin Hendricks-Parker from Six Feet Up. We'll tackle some of the biggest challenges in taking a Python app from “it works on my machine” to production, covering inconsistent environments, conflicting dependencies, and sneaky security pitfalls. Along the way, Calvin shares how containerization with Docker and Kubernetes can both simplify and complicate deployments, especially for smaller teams. Finally, we'll introduce Scaf, a powerful project blueprint designed to give developers a rock-solid start on Python web projects of all sizes. Get notified when the Talk Python in Production book goes live and read the first third online right now. Episode sponsors Posit Python in Production Talk Python Courses Links from the show Calvin Hendryx-Parker: github.com Scaf on GitHub: github.com Scaf on GitHub (duplicate): github.com "Deploy the Dream" song: deploy-the-dream-talk-python.mp3 CloudDevEngineering YouTube Channel: youtube.com TechWorld with Nana YouTube Channel: youtube.com Tilt (Kubernetes Dev Tool): tilt.dev Talos (Minimal OS for Kubernetes): talos.dev Traefik Reverse Proxy: traefik.io Sealed Secrets on GitHub: github.com Argo CD Documentation: readthedocs.io MailHog on GitHub: github.com Next.js: nextjs.org Cloud Custodian: cloudcustodian.io Valky (Redis Replacement): valkey.io “The ‘Works on My Machine' Certification Program” (Coding Horror): blog.codinghorror.com NVIDIA's First Desktop AI PC (Ars Technica): arstechnica.com Kind (Kubernetes in Docker): kind.sigs.k8s.io Updated Effective PyCharm Course: training.talkpython.fm Talk Python in Production book: talkpython.fm/books/python-in-production Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Nikolay and Michael use a recent "best practices" article as a prompt — giving a few tips each on the topics mentioned, like schema design, performance, backups, and more. Here are some links to things they mentioned:7 Crucial PostgreSQL Best Practices (recent blog post) https://speakdatascience.com/postgresql-best-practices“Don't do this” episode https://postgres.fm/episodes/dont-do-thisArticle discussion on Hacker News https://news.ycombinator.com/item?id=42992913Mozilla's SQL Style Guide https://docs.telemetry.mozilla.org/concepts/sql_style“SQL vs NoSQL” episode with Franck Pachot https://postgres.fm/episodes/sql-vs-nosqlHA episode https://postgres.fm/episodes/high-availability ~~~What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!~~~Postgres FM is produced by:Michael Christofides, founder of pgMustardNikolay Samokhvalov, founder of Postgres.aiWith credit to:Jessie Draws for the elephant artwork
Давненько мы что-то с вами не ныряли в удивительный мирок СУБД. А там, между прочим, не только эти ваши замшелые истории про нормальные формы обитают, и не только модный (уже не особо) и молодежный (уже успели все постареть) NoSQL, а еще и всякие интересные движения происходят. И пример тому — YDB. Кто: Антон Коваленко, руководитель проектного офиса YDB О чем: Зачем нужна ещё одна СУБД – предпосылки появления YDB. Что такое DisitributedSQL? Кто сейчас использует YDB? Что там под капотом? YDB - это замена кому? (спойлер - не clickhouse) Побочные эффекты - как при написании СУБД сделали распределенною файловую систему. Сообщение sysadmins №54. YDB появились сначала на linkmeup.
Talk Python To Me - Python conversations for passionate developers
On this episode, I'm joined by Dr. Jeff Boeing, an assistant professor at the University of Southern California whose research spans urban planning, spatial analysis, and data science. We explore why OpenStreetMap is such a powerful source of global map data—and how Jeff's Python library, OSMnx, makes that data easier to download, model, and visualize. Along the way, we talk about what shapes city streets around the world, how urban design influences everything from daily commutes to disaster resilience, and why turning open data into accessible tools can open up completely new ways of understanding our cities. If you've ever wondered how to build or analyze your own digital maps in Python, or what it takes to manage a project that transforms raw geographic data into meaningful research, you won't want to miss this conversation. Episode sponsors Posit Podcast Later Talk Python Courses Links from the show City Street Orientations World: geoffboeing.com OSMnx Documentation: readthedocs.io OSMnx GitHub: github.com OpenStreetMap: openstreetmap.org Open Database License: opendatacommons.org ID Editor (Web Editor): wiki.openstreetmap.org Planet OSM: planet.openstreetmap.org Overpass API: wiki.openstreetmap.org GeoPandas: geopandas.org NetworkX: networkx.org Shapely: shapely.readthedocs.io Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
As Python developers, we're incredibly lucky to have over half a million packages that we can use to build our applications with over at PyPI. However, when it comes to choosing a UI framework, the options get narrowed down very quickly. Intersect those choices with the ones that work on mobile, and you have a very short list. Flutter is a UI framework for building desktop and mobile applications, and is in fact the one that we used to build the Talk Python courses app, you'd find at talkpython.fm/apps. That's why I'm so excited about Flet. Flet is a Python UI framework that is distributed and executed on the Flutter framework, making it possible to build mobile apps and desktop apps with Python. We have Feodor Fitsner back on the show after he launched his project a couple years ago to give us an update on how close they are to a full featured mobile app framework in Python. Episode sponsors Posit Podcast Later Talk Python Courses Links from the show Flet: flet.dev Flet on Github: github.com Packaging apps with Flet: flet.dev/docs/publish Flutter: flutter.dev React vs. Flutter: trends.stackoverflow.co Kivy: kivy.org Beeware: beeware.org Mobile forge from Beeware: github.com The list of built-in binary wheels: flet.dev/docs/publish/android#binary-python-packages Difference between dynamic and static Flet web apps: flet.dev/docs/publish/web Integrating Flutter packages: flet.dev/docs/extend/integrating-existing-flutter-packages serious_python: pub.dev/packages/serious_python Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Nikolay and Michael are joined by Franck Pachot to discuss SQL vs NoSQL — did Franck change teams by joining MongoDB, normalisation vs denormalisation, developer experience, NULLs, and more! Here are some links to things they mentioned:Franck Pachot https://postgres.fm/people/franck-pachotFranck's workshop at PGConf India https://pgconf.in/conferences/pgconfin2025/program/proposals/958 PostgreSQL Conference Germany https://2025.pgconf.de"Schema Later" Considered Harmful by Michael Stonebraker and Álvaro Hernández https://www.enterprisedb.com/blog/schema-later-considered-harmfulComparison of JOINS by Michael Stonebraker and Álvaro Hernández https://www.enterprisedb.com/blog/comparison-joins-mongodb-vs-postgresql Franck's post about why he joined MongoDB https://www.linkedin.com/pulse/2025-im-joining-mongodb-franck-pachot-e4shfEdgeDB https://www.edgedb.comNikolay's tweet about a recent issue with NULLs https://x.com/samokhvalov/status/1889078097124999272PartiQL https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-reference.htmlFerretDB https://www.ferretdb.comDocumentDB https://github.com/microsoft/documentdb~~~What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!~~~Postgres FM is produced by:Michael Christofides, founder of pgMustardNikolay Samokhvalov, founder of Postgres.aiWith special thanks to:Jessie Draws for the elephant artwork
Show NotesMike Bowers, Chief Architect at FairCom, has spent decades navigating the evolution of database technology. In this conversation, he and Robby explore the challenges of maintaining a 40+ year-old codebase, balancing legacy constraints with forward-thinking design, and the realities of technical debt.Mike shares how FairCom transitioned from ISAM-based databases to modern JSON-driven APIs, the trade-offs between strict schemas and flexible document stores, and how software architecture plays a critical role in long-term maintainability. He also explains why human-readable JSON simplifies debugging, how documentation-driven development improves API usability, and why many software teams struggle with refactoring at the right time.Topics covered[00:05:32] The role of software architecture in long-term maintainability[00:10:45] Why FairCom's legacy ISAM technology still matters today[00:14:20] Transitioning to a JSON-based API for modern developers[00:19:40] The challenges of maintaining 40+ years of C code[00:24:10] Technical debt: What it really means and how to manage it[00:28:50] The trade-offs between strict schemas and flexible NoSQL approaches[00:34:00] When to refactor vs. when to start over from scratch[00:38:15] The influence of product management thinking on software architecture[00:42:30] Advice for engineers considering a shift into architecture rolesResources mentionedFairComMike Bowers on LinkedInFairCom on Twitter/XBook Recommendation: The Influential Product Manager by MSc BuceroThanks to Our Sponsor!Need a smoother way to share your team's inbox? Jelly's got you covered!
Talk Python To Me - Python conversations for passionate developers
In this episode, I'm joined by JJ Allaire, founder and executive chairman at Posit, and Carlos Scheidegger, a software engineer at Posit, to explore Quarto, an open-source tool revolutionizing technical publishing. We discuss how Quarto empowers users to seamlessly transform Jupyter notebooks into polished reports, dashboards, e-books, websites, and more. JJ shares his journey from creating RStudio to developing Quarto as a versatile, multi-language tool, while Carlos delves into its roots in reproducibility and the challenges of academic publishing. Don't miss this deep dive into a tool that's shaping the future of data-driven storytelling! Episode sponsors Talk Python Courses DigitalOcean Links from the show JJ Allaire JJ on LinkedIn: linkedin.com JJ on GitHub: github.com Carlos Scheidegger Personal site: cscheid.net Mastodon: @scheidegger Fast AI: fast.ai nbdev: nbdev.fast.ai nbsanity - Share Notebooks as Polished Web Pages in Seconds: answer.ai Pandoc: pandoc.org Observable: github.com Quarto Pub: quartopub.com Deno: deno.com Real World Data Science site: realworlddatascience.net Typst: typst.app Github Actions for Quarto: github.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
En este episodio tendremos una comparativa de bases de datos SQL, NoSQL y NewSQL – ¿Cuál elegir y cuándo?
Arnaud et Emmanuel discutent des nouvelles de ce mois. On y parle intégrité de JVM, fetch size de JDBC, MCP, de prompt engineering, de DeepSeek bien sûr mais aussi de Maven 4 et des proxy de répository Maven. Et d'autres choses encore, bonne lecture. Enregistré le 7 février 2025 Téléchargement de l'épisode LesCastCodeurs-Episode-322.mp3 ou en vidéo sur YouTube. News Langages Les evolutions de la JVM pour augmenter l'intégrité https://inside.java/2025/01/03/evolving-default-integrity/ un article sur les raisons pour lesquelles les editeurs de frameworks et les utilisateurs s'arrachent les cheveux et vont continuer garantir l'integrite du code et des données en enlevant des APIs existantes historiquemnt agents dynamiques, setAccessible, Unsafe, JNI Article expliques les risques percus par les mainteneurs de la JVM Franchement c'est un peu leg sur les causes l'article, auto propagande JavaScript Temporal, enfin une API propre et moderne pour gérer les dates en JS https://developer.mozilla.org/en-US/blog/javascript-temporal-is-coming/ JavaScript Temporal est un nouvel objet conçu pour remplacer l'objet Date, qui présente des défauts. Il résout des problèmes tels que le manque de prise en charge des fuseaux horaires et la mutabilité. Temporal introduit des concepts tels que les instants, les heures civiles et les durées. Il fournit des classes pour gérer diverses représentations de date/heure, y compris celles qui tiennent compte du fuseau horaire et celles qui n'en tiennent pas compte. Temporal simplifie l'utilisation de différents calendriers (par exemple, chinois, hébreu). Il comprend des méthodes pour les comparaisons, les conversions et le formatage des dates et des heures. La prise en charge par les navigateurs est expérimentale, Firefox Nightly ayant l'implémentation la plus aboutie. Un polyfill est disponible pour essayer Temporal dans n'importe quel navigateur. Librairies Un article sur les fetch size du JDBC et les impacts sur vos applications https://in.relation.to/2025/01/24/jdbc-fetch-size/ qui connait la valeur fetch size par default de son driver? en fonction de vos use cases, ca peut etre devastateur exemple d'une appli qui retourne 12 lignes et un fetch size de oracle a 10, 2 a/r pour rien et si c'est 50 lignres retournées la base de donnée est le facteur limitant, pas Java donc monter sont fetch size est avantageux, on utilise la memoire de Java pour eviter la latence Quarkus annouce les MCP servers project pour collecter les servier MCP en Java https://quarkus.io/blog/introducing-mcp-servers/ MCP d'Anthropic introspecteur de bases JDBC lecteur de filke system Dessine en Java FX demarrables facilement avec jbang et testes avec claude desktop, goose et mcp-cli permet d'utliser le pouvoir des librarires Java de votre IA d'ailleurs Spring a la version 0.6 de leur support MCP https://spring.io/blog/2025/01/23/spring-ai-mcp-0 Infrastructure Apache Flink sur Kibernetes https://www.decodable.co/blog/get-running-with-apache-flink-on-kubernetes-2 un article tres complet ejn deux parties sur l'installation de Flink sur Kubernetes installation, setup mais aussi le checkpointing, la HA, l'observablité Data et Intelligence Artificielle 10 techniques de prompt engineering https://medium.com/google-cloud/10-prompt-engineering-techniques-every-beginner-should-know-bf6c195916c7 Si vous voulez aller plus loin, l'article référence un très bon livre blanc sur le prompt engineering https://www.kaggle.com/whitepaper-prompt-engineering Les techniques évoquées : Zero-Shot Prompting: On demande directement à l'IA de répondre à une question sans lui fournir d'exemple préalable. C'est comme si on posait une question à une personne sans lui donner de contexte. Few-Shot Prompting: On donne à l'IA un ou plusieurs exemples de la tâche qu'on souhaite qu'elle accomplisse. C'est comme montrer à quelqu'un comment faire quelque chose avant de lui demander de le faire. System Prompting: On définit le contexte général et le but de la tâche pour l'IA. C'est comme donner à l'IA des instructions générales sur ce qu'elle doit faire. Role Prompting: On attribue un rôle spécifique à l'IA (enseignant, journaliste, etc.). C'est comme demander à quelqu'un de jouer un rôle spécifique. Contextual Prompting: On fournit des informations supplémentaires ou un contexte pour la tâche. C'est comme donner à quelqu'un toutes les informations nécessaires pour répondre à une question. Step-Back Prompting: On pose d'abord une question générale, puis on utilise la réponse pour poser une question plus spécifique. C'est comme poser une question ouverte avant de poser une question plus fermée. Chain-of-Thought Prompting: On demande à l'IA de montrer étape par étape comment elle arrive à sa conclusion. C'est comme demander à quelqu'un d'expliquer son raisonnement. Self-Consistency Prompting: On pose plusieurs fois la même question à l'IA et on compare les réponses pour trouver la plus cohérente. C'est comme vérifier une réponse en la posant sous différentes formes. Tree-of-Thoughts Prompting: On permet à l'IA d'explorer plusieurs chemins de raisonnement en même temps. C'est comme considérer toutes les options possibles avant de prendre une décision. ReAct Prompting: On permet à l'IA d'interagir avec des outils externes pour résoudre des problèmes complexes. C'est comme donner à quelqu'un les outils nécessaires pour résoudre un problème. Les patterns GenAI the thoughtworks https://martinfowler.com/articles/gen-ai-patterns/ tres introductif et pre RAG le direct prompt qui est un appel direct au LLM: limitations de connaissance et de controle de l'experience eval: evaluer la sortie d'un LLM avec plusieurs techniques mais fondamentalement une fonction qui prend la demande, la reponse et donc un score numerique evaluation via un LLM (le meme ou un autre), ou evaluation humaine tourner les evaluations a partir de la chaine de build amis aussi en live vu que les LLMs puvent evoluer. Decrit les embedding notament d'image amis aussi de texte avec la notion de contexte DeepSeek et la fin de la domination de NVidia https://youtubetranscriptoptimizer.com/blog/05_the_short_case_for_nvda un article sur les raisons pour lesquelles NVIDIA va se faire cahllengert sur ses marges 90% de marge quand meme parce que les plus gros GPU et CUDA qui est proprio mais des approches ardware alternatives existent qui sont plus efficientes (TPU et gros waffle) Google, MS et d'autres construisent leurs GPU alternatifs CUDA devient de moins en moins le linga franca avec l'investissement sur des langages intermediares alternatifs par Apple, Google OpenAI etc L'article parle de DeepSkeek qui est venu mettre une baffe dans le monde des LLMs Ils ont construit un competiteur a gpt4o et o1 avec 5M de dollars et des capacites de raisonnements impressionnant la cles c'etait beaucoup de trick d'optimisation mais le plus gros est d'avoir des poids de neurores sur 8 bits vs 32 pour les autres. et donc de quatizer au fil de l'eau et au moment de l'entrainement beaucoup de reinforcemnt learning innovatifs aussi et des Mixture of Expert donc ~50x moins chers que OpenAI Donc plus besoin de GPU qui on des tonnes de vRAM ah et DeepSeek est open source un article de semianalytics change un peu le narratif le papier de DeepSkeek en dit long via ses omissions par ensemple les 6M c'est juste l'inference en GPU, pas les couts de recherches et divers trials et erreurs en comparaison Claude Sonnet a coute 10M en infererence DeepSeek a beaucoup de CPU pre ban et ceratins post bans evalués a 5 Milliards en investissement. leurs avancées et leur ouverture reste extremement interessante Une intro à Apache Iceberg http://blog.ippon.fr/2025/01/17/la-revolution-des-donnees-lavenement-des-lakehouses-avec-apache-iceberg/ issue des limites du data lake. non structuré et des Data Warehouses aux limites en diversite de données et de volume entrent les lakehouse Et particulierement Apache Iceberg issue de Netflix gestion de schema mais flexible notion de copy en write vs merge on read en fonction de besoins garantie atomicite, coherence, isoliation et durabilite notion de time travel et rollback partitions cachées (qui abstraient la partition et ses transfos) et evolution de partitions compatbile avec les moteurs de calcul comme spark, trino, flink etc explique la structure des metadonnées et des données Guillaume s'amuse à générer des histoires courtes de Science-Fiction en programmant des Agents IA avec LangChain4j et aussi avec des workflows https://glaforge.dev/posts/2025/01/27/an-ai-agent-to-generate-short-scifi-stories/ https://glaforge.dev/posts/2025/01/31/a-genai-agent-with-a-real-workflow/ Création d'un générateur automatisé de nouvelles de science-fiction à l'aide de Gemini et Imagen en Java, LangChain4j, sur Google Cloud. Le système génère chaque nuit des histoires, complétées par des illustrations créées par le modèle Imagen 3, et les publie sur un site Web. Une étape d'auto-réflexion utilise Gemini pour sélectionner la meilleure image pour chaque chapitre. L'agent utilise un workflow explicite, drivé par le code Java, où les étapes sont prédéfinies dans le code, plutôt que de s'appuyer sur une planification basée sur LLM. Le code est disponible sur GitHub et l'application est déployée sur Google Cloud. L'article oppose les agents de workflow explicites aux agents autonomes, en soulignant les compromis de chaque approche. Car parfois, les Agent IA autonomes qui gèrent leur propre planning hallucinent un peu trop et n'établissent pas un plan correctement, ou ne le suive pas comme il faut, voire hallucine des “function call”. Le projet utilise Cloud Build, le Cloud Run jobs, Cloud Scheduler, Firestore comme base de données, et Firebase pour le déploiement et l'automatisation du frontend. Dans le deuxième article, L'approche est différente, Guillaume utilise un outil de Workflow, plutôt que de diriger le planning avec du code Java. L'approche impérative utilise du code Java explicite pour orchestrer le workflow, offrant ainsi un contrôle et une parallélisation précis. L'approche déclarative utilise un fichier YAML pour définir le workflow, en spécifiant les étapes, les entrées, les sorties et l'ordre d'exécution. Le workflow comprend les étapes permettant de générer une histoire avec Gemini 2, de créer une invite d'image, de générer des images avec Imagen 3 et d'enregistrer le résultat dans Cloud Firestore (base de donnée NoSQL). Les principaux avantages de l'approche impérative sont un contrôle précis, une parallélisation explicite et des outils de programmation familiers. Les principaux avantages de l'approche déclarative sont des définitions de workflow peut-être plus faciles à comprendre (même si c'est un YAML, berk !) la visualisation, l'évolutivité et une maintenance simplifiée (on peut juste changer le YAML dans la console, comme au bon vieux temps du PHP en prod). Les inconvénients de l'approche impérative incluent le besoin de connaissances en programmation, les défis potentiels en matière de maintenance et la gestion des conteneurs. Les inconvénients de l'approche déclarative incluent une création YAML pénible, un contrôle de parallélisation limité, l'absence d'émulateur local et un débogage moins intuitif. Le choix entre les approches dépend des exigences du projet, la déclarative étant adaptée aux workflows plus simples. L'article conclut que la planification déclarative peut aider les agents IA à rester concentrés et prévisibles. Outillage Vulnérabilité des proxy Maven https://github.blog/security/vulnerability-research/attacks-on-maven-proxy-repositories/ Quelque soit le langage, la techno, il est hautement conseillé de mettre en place des gestionnaires de repositories en tant que proxy pour mieux contrôler les dépendances qui contribuent à la création de vos produits Michael Stepankin de l'équipe GitHub Security Lab a cherché a savoir si ces derniers ne sont pas aussi sources de vulnérabilité en étudiant quelques CVEs sur des produits comme JFrog Artifactory, Sonatype Nexus, et Reposilite Certaines failles viennent de la UI des produits qui permettent d'afficher les artifacts (ex: mettez un JS dans un fichier POM) et même de naviguer dedans (ex: voir le contenu d'un jar / zip et on exploite l'API pour lire, voir modifier des fichiers du serveur en dehors des archives) Les artifacts peuvent aussi être compromis en jouant sur les paramètres propriétaires des URLs ou en jouant sur le nomage avec les encodings. Bref, rien n'est simple ni niveau. Tout système rajoute de la compléxité et il est important de les tenir à mettre à jour. Il faut surveiller activement sa chaine de distribution via différents moyens et ne pas tout miser sur le repository manager. L'auteur a fait une présentation sur le sujet : https://www.youtube.com/watch?v=0Z_QXtk0Z54 Apache Maven 4… Bientôt, c'est promis …. qu'est ce qu'il y aura dedans ? https://gnodet.github.io/maven4-presentation/ Et aussi https://github.com/Bukama/MavenStuff/blob/main/Maven4/whatsnewinmaven4.md Apache Maven 4 Doucement mais surement …. c'est le principe d'un projet Maven 4.0.0-rc-2 est dispo (Dec 2024). Maven a plus de 20 ans et est largement utilisé dans l'écosystème Java. La compatibilité ascendante a toujours été une priorité, mais elle a limité la flexibilité. Maven 4 introduit des changements significatifs, notamment un nouveau schéma de construction et des améliorations du code. Changements du POM Séparation du Build-POM et du Consumer-POM : Build-POM : Contient des informations propres à la construction (ex. plugins, configurations). Consumer-POM : Contient uniquement les informations nécessaires aux consommateurs d'artefacts (ex. dépendances). Nouveau Modèle Version 4.1.0 : Utilisé uniquement pour le Build-POM, alors que le Consumer-POM reste en 4.0.0 pour la compatibilité. Introduit de nouveaux éléments et en marque certains comme obsolètes. Modules renommés en sous-projets : “Modules” devient “Sous-projets” pour éviter la confusion avec les Modules Java. L'élément remplace (qui reste pris en charge). Nouveau type de packaging : “bom” (Bill of Materials) : Différencie les POMs parents et les BOMs de gestion des dépendances. Prend en charge les exclusions et les imports basés sur les classifiers. Déclaration explicite du répertoire racine : permet de définir explicitement le répertoire racine du projet. Élimine toute ambiguïté sur la localisation des racines de projet. Nouvelles variables de répertoire : ${project.rootDirectory}, ${session.topDirectory} et ${session.rootDirectory} pour une meilleure gestion des chemins. Remplace les anciennes solutions non officielles et variables internes obsolètes. Prise en charge de syntaxes alternatives pour le POM Introduction de ModelParser SPI permettant des syntaxes alternatives pour le POM. Apache Maven Hocon Extension est un exemple précoce de cette fonctionnalité. Améliorations pour les sous-projets Versioning automatique des parents Il n'est plus nécessaire de définir la version des parents dans chaque sous-projet. Fonctionne avec le modèle de version 4.1.0 et s'étend aux dépendances internes au projet. Support complet des variables compatibles CI Le Flatten Maven Plugin n'est plus requis. Prend en charge les variables comme ${revision} pour le versioning. Peut être défini via maven.config ou la ligne de commande (mvn verify -Drevision=4.0.1). Améliorations et corrections du Reactor Correction de bug : Gestion améliorée de --also-make lors de la reprise des builds. Nouvelle option --resume (-r) pour redémarrer à partir du dernier sous-projet en échec. Les sous-projets déjà construits avec succès sont ignorés lors de la reprise. Constructions sensibles aux sous-dossiers : Possibilité d'exécuter des outils sur des sous-projets sélectionnés uniquement. Recommandation : Utiliser mvn verify plutôt que mvn clean install. Autres Améliorations Timestamps cohérents pour tous les sous-projets dans les archives packagées. Déploiement amélioré : Le déploiement ne se produit que si tous les sous-projets sont construits avec succès. Changements de workflow, cycle de vie et exécution Java 17 requis pour exécuter Maven Java 17 est le JDK minimum requis pour exécuter Maven 4. Les anciennes versions de Java peuvent toujours être ciblées pour la compilation via Maven Toolchains. Java 17 a été préféré à Java 21 en raison d'un support à long terme plus étendu. Mise à jour des plugins et maintenance des applications Suppression des fonctionnalités obsolètes (ex. Plexus Containers, expressions ${pom.}). Mise à jour du Super POM, modifiant les versions par défaut des plugins. Les builds peuvent se comporter différemment ; définissez des versions fixes des plugins pour éviter les changements inattendus. Maven 4 affiche un avertissement si des versions par défaut sont utilisées. Nouveau paramètre “Fail on Severity” Le build peut échouer si des messages de log atteignent un niveau de gravité spécifique (ex. WARN). Utilisable via --fail-on-severity WARN ou -fos WARN. Maven Shell (mvnsh) Chaque exécution de mvn nécessitait auparavant un redémarrage complet de Java/Maven. Maven 4 introduit Maven Shell (mvnsh), qui maintient un processus Maven résident unique ouvert pour plusieurs commandes. Améliore la performance et réduit les temps de build. Alternative : Utilisez Maven Daemon (mvnd), qui gère un pool de processus Maven résidents. Architecture Un article sur les feature flags avec Unleash https://feeds.feedblitz.com//911939960/0/baeldungImplement-Feature-Flags-in-Java-With-Unleash Pour A/B testing et des cycles de développements plus rapides pour « tester en prod » Montre comment tourner sous docker unleash Et ajouter la librairie a du code java pour tester un feature flag Sécurité Keycloak 26.1 https://www.keycloak.org/2025/01/keycloak-2610-released.html detection des noeuds via la proble base de donnée aulieu echange reseau virtual threads pour infinispan et jgroups opentelemetry tracing supporté et plein de fonctionalités de sécurité Loi, société et organisation Les grands morceaux du coût et revenus d'une conférence. Ici http://bdx.io|bdx.io https://bsky.app/profile/ameliebenoit33.bsky.social/post/3lgzslhedzk2a 44% le billet 52% les sponsors 38% loc du lieu 29% traiteur et café 12% standiste 5% frais speaker (donc pas tous) Ask Me Anything Julien de Provin: J'aime beaucoup le mode “continuous testing” de Quarkus, et je me demandais s'il existait une alternative en dehors de Quarkus, ou à défaut, des ressources sur son fonctionnement ? J'aimerais beaucoup avoir un outil agnostique utilisable sur les projets non-Quarkus sur lesquels j'intervient, quitte à y metttre un peu d'huile de coude (ou de phalange pour le coup). https://github.com/infinitest/infinitest/ Conférences La liste des conférences provenant de Developers Conferences Agenda/List par Aurélie Vache et contributeurs : 6-7 février 2025 : Touraine Tech - Tours (France) 21 février 2025 : LyonJS 100 - Lyon (France) 28 février 2025 : Paris TS La Conf - Paris (France) 6 mars 2025 : DevCon #24 : 100% IA - Paris (France) 13 mars 2025 : Oracle CloudWorld Tour Paris - Paris (France) 14 mars 2025 : Rust In Paris 2025 - Paris (France) 19-21 mars 2025 : React Paris - Paris (France) 20 mars 2025 : PGDay Paris - Paris (France) 20-21 mars 2025 : Agile Niort - Niort (France) 25 mars 2025 : ParisTestConf - Paris (France) 26-29 mars 2025 : JChateau Unconference 2025 - Cour-Cheverny (France) 27-28 mars 2025 : SymfonyLive Paris 2025 - Paris (France) 28 mars 2025 : DataDays - Lille (France) 28-29 mars 2025 : Agile Games France 2025 - Lille (France) 3 avril 2025 : DotJS - Paris (France) 3 avril 2025 : SoCraTes Rennes 2025 - Rennes (France) 4 avril 2025 : Flutter Connection 2025 - Paris (France) 4 avril 2025 : aMP Orléans 04-04-2025 - Orléans (France) 10-11 avril 2025 : Android Makers - Montrouge (France) 10-12 avril 2025 : Devoxx Greece - Athens (Greece) 16-18 avril 2025 : Devoxx France - Paris (France) 23-25 avril 2025 : MODERN ENDPOINT MANAGEMENT EMEA SUMMIT 2025 - Paris (France) 24 avril 2025 : IA Data Day 2025 - Strasbourg (France) 29-30 avril 2025 : MixIT - Lyon (France) 7-9 mai 2025 : Devoxx UK - London (UK) 15 mai 2025 : Cloud Toulouse - Toulouse (France) 16 mai 2025 : AFUP Day 2025 Lille - Lille (France) 16 mai 2025 : AFUP Day 2025 Lyon - Lyon (France) 16 mai 2025 : AFUP Day 2025 Poitiers - Poitiers (France) 24 mai 2025 : Polycloud - Montpellier (France) 24 mai 2025 : NG Baguette Conf 2025 - Nantes (France) 5-6 juin 2025 : AlpesCraft - Grenoble (France) 5-6 juin 2025 : Devquest 2025 - Niort (France) 10-11 juin 2025 : Modern Workplace Conference Paris 2025 - Paris (France) 11-13 juin 2025 : Devoxx Poland - Krakow (Poland) 12-13 juin 2025 : Agile Tour Toulouse - Toulouse (France) 12-13 juin 2025 : DevLille - Lille (France) 13 juin 2025 : Tech F'Est 2025 - Nancy (France) 17 juin 2025 : Mobilis In Mobile - Nantes (France) 24 juin 2025 : WAX 2025 - Aix-en-Provence (France) 25-26 juin 2025 : Agi'Lille 2025 - Lille (France) 25-27 juin 2025 : BreizhCamp 2025 - Rennes (France) 26-27 juin 2025 : Sunny Tech - Montpellier (France) 1-4 juillet 2025 : Open edX Conference - 2025 - Palaiseau (France) 7-9 juillet 2025 : Riviera DEV 2025 - Sophia Antipolis (France) 18-19 septembre 2025 : API Platform Conference - Lille (France) & Online 2-3 octobre 2025 : Volcamp - Clermont-Ferrand (France) 6-10 octobre 2025 : Devoxx Belgium - Antwerp (Belgium) 9-10 octobre 2025 : Forum PHP 2025 - Marne-la-Vallée (France) 16-17 octobre 2025 : DevFest Nantes - Nantes (France) 4-7 novembre 2025 : NewCrafts 2025 - Paris (France) 6 novembre 2025 : dotAI 2025 - Paris (France) 7 novembre 2025 : BDX I/O - Bordeaux (France) 12-14 novembre 2025 : Devoxx Morocco - Marrakech (Morocco) 28-31 janvier 2026 : SnowCamp 2026 - Grenoble (France) 23-25 avril 2026 : Devoxx Greece - Athens (Greece) 17 juin 2026 : Devoxx Poland - Krakow (Poland) Nous contacter Pour réagir à cet épisode, venez discuter sur le groupe Google https://groups.google.com/group/lescastcodeurs Contactez-nous via X/twitter https://twitter.com/lescastcodeurs ou Bluesky https://bsky.app/profile/lescastcodeurs.com Faire un crowdcast ou une crowdquestion Soutenez Les Cast Codeurs sur Patreon https://www.patreon.com/LesCastCodeurs Tous les épisodes et toutes les infos sur https://lescastcodeurs.com/
Talk Python To Me - Python conversations for passionate developers
Join me as I chat with Rich Iannone and Michael Chow from Posit where we explore the transformative power of data tables with the Great Tables library. We'll cover practical applications of Great Tables, showcasing how thoughtful design and advanced formatting can elevate your data presentations. And you'll learn about innovative features like nano plots and interactive elements and the importance of structure, format, and style in crafting tables that both inform and inspire. Whether you're a seasoned data scientist or just starting out, this episode is packed with valuable tips and inspiring examples to enhance your data storytelling. Episode sponsors Talk Python Courses DigitalOcean Links from the show Michael Chow: github.com/machow Richard Iannone: github.com/rich-iannone Episode Deep Dives Writeup: talkpython.fm/blog Great Tables: github.com Making Beautiful, Publication Quality Tables PyCon talk: youtube.com Andrew Weatherman's Visualization Gallery: aweatherman.com Bureau of the Census Manual of Tabular Presentation: census.gov Table Contest: posit.co Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Hoje o papo é sobre o QuintoAndar. Neste episódio, conversamos sobre as formas como o QuintoAndar reconheceu que já não era mais uma pequena startup, e ajustou seus processos e a sua arquitetura para lidar com os desafios das empresas da primeira divisão da tecnologia brasileira. Vem ver quem participou desse papo: Paulo Silveira, o host que acerta sem combinar Paulo Golgher, CTO do QuintoAndar Rafael Castro, VP de Engenharia do QuintoAndar
Estas son las tecnologías que necesitas para ser programador web BACKEND en 2025
Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World's Fair 2025 in June.Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!Full YouTube Episode with Slides/ChartsLike and subscribe and hit that bell to get notifs!Timestamps* 00:00 Welcome to the 100th Episode!* 00:19 Reflecting on the Journey* 00:47 AI Engineering: The Rise and Impact* 03:15 Latent Space Live and AI Conferences* 09:44 The Competitive AI Landscape* 21:45 Synthetic Data and Future Trends* 35:53 Creative Writing with AI* 36:12 Legal and Ethical Issues in AI* 38:18 The Data War: GPU Poor vs. GPU Rich* 39:12 The Rise of GPU Ultra Rich* 40:47 Emerging Trends in AI Models* 45:31 The Multi-Modality War* 01:05:31 The Future of AI Benchmarks* 01:13:17 Pionote and Frontier Models* 01:13:47 Niche Models and Base Models* 01:14:30 State Space Models and RWKB* 01:15:48 Inference Race and Price Wars* 01:22:16 Major AI Themes of the Year* 01:22:48 AI Rewind: January to March* 01:26:42 AI Rewind: April to June* 01:33:12 AI Rewind: July to September* 01:34:59 AI Rewind: October to December* 01:39:53 Year-End Reflections and PredictionsTranscript[00:00:00] Welcome to the 100th Episode![00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host Swyx for the 100th time today.[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.[00:00:19] Alessio: Yeah, I know.[00:00:19] Reflecting on the Journey[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round. When we first started that we didn't like, and we tried to change the question. The answer[00:00:32] swyx: was cursor and perplexity.[00:00:34] Alessio: Yeah, I love mid journey. It's like, do you really not like anything else?[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research driven content. You know, we had like 3DAO, we had, you know. Jeremy Howard, we had more folks like that.[00:00:47] AI Engineering: The Rise and Impact[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, Oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your rise of the AI engineer posts just kind of get people. Sombra to congregate, and then the AI engineer summit.[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for like the AI engineering industry as a whole, which is almost like, like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth or did you expect that would take longer for like the AI engineer thing to kind of like become, you know, everybody talks about it today.[00:01:32] swyx: So, the sign of that, that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. I knew that I was correct when I called it because I did like two months of work going into that. But I didn't know, You know, how quickly it could happen, and obviously there's a chance that I could be wrong.[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign. But there's enough people that have defined it, you know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big So I think it's like kind of arrived as a meaningful and useful definition.[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a lot of the quote unquote drama that happens behind the scenes at the World's Fair in June. Because I think there's a lot of doubt or questions about where ML engineering stops and AI engineering starts. That's a useful debate to be had.[00:02:29] swyx: In some sense, I actually anticipated that as well. So I intentionally did not. Put a firm definition there because most of the successful definitions are necessarily underspecified and it's actually useful to have different perspectives and you don't have to specify everything from the outset.[00:02:45] Alessio: Yeah, I was at um, AWS reInvent and the line to get into like the AI engineering talk, so to speak, which is, you know, applied AI and whatnot was like, there are like hundreds of people just in line to go in.[00:02:56] Alessio: I think that's kind of what enabled me. People, right? Which is what [00:03:00] you kind of talked about. It's like, Hey, look, you don't actually need a PhD, just, yeah, just use the model. And then maybe we'll talk about some of the blind spots that you get as an engineer with the earlier posts that we also had on on the sub stack.[00:03:11] Alessio: But yeah, it's been a heck of a heck of a two years.[00:03:14] swyx: Yeah.[00:03:15] Latent Space Live and AI Conferences[00:03:15] swyx: You know, I was, I was trying to view the conference as like, so NeurIPS is I think like 16, 17, 000 people. And the Latent Space Live event that we held there was 950 signups. I think. The AI world, the ML world is still very much research heavy. And that's as it should be because ML is very much in a research phase.[00:03:34] swyx: But as we move this entire field into production, I think that ratio inverts into becoming more engineering heavy. So at least I think engineering should be on the same level, even if it's never as prestigious, like it'll always be low status because at the end of the day, you're manipulating APIs or whatever.[00:03:51] swyx: But Yeah, wrapping GPTs, but there's going to be an increasing stack and an art to doing these, these things well. And I, you know, I [00:04:00] think that's what we're focusing on for the podcast, the conference and basically everything I do seems to make sense. And I think we'll, we'll talk about the trends here that apply.[00:04:09] swyx: It's, it's just very strange. So, like, there's a mix of, like, keeping on top of research while not being a researcher and then putting that research into production. So, like, people always ask me, like, why are you covering Neuralibs? Like, this is a ML research conference and I'm like, well, yeah, I mean, we're not going to, to like, understand everything Or reproduce every single paper, but the stuff that is being found here is going to make it through into production at some point, you hope.[00:04:32] swyx: And then actually like when I talk to the researchers, they actually get very excited because they're like, oh, you guys are actually caring about how this goes into production and that's what they really really want. The measure of success is previously just peer review, right? Getting 7s and 8s on their um, Academic review conferences and stuff like citations is one metric, but money is a better metric.[00:04:51] Alessio: Money is a better metric. Yeah, and there were about 2200 people on the live stream or something like that. Yeah, yeah. Hundred on the live stream. So [00:05:00] I try my best to moderate, but it was a lot spicier in person with Jonathan and, and Dylan. Yeah, that it was in the chat on YouTube.[00:05:06] swyx: I would say that I actually also created.[00:05:09] swyx: Layen Space Live in order to address flaws that are perceived in academic conferences. This is not NeurIPS specific, it's ICML, NeurIPS. Basically, it's very sort of oriented towards the PhD student, uh, market, job market, right? Like literally all, basically everyone's there to advertise their research and skills and get jobs.[00:05:28] swyx: And then obviously all the, the companies go there to hire them. And I think that's great for the individual researchers, but for people going there to get info is not great because you have to read between the lines, bring a ton of context in order to understand every single paper. So what is missing is effectively what I ended up doing, which is domain by domain, go through and recap the best of the year.[00:05:48] swyx: Survey the field. And there are, like NeurIPS had a, uh, I think ICML had a like a position paper track, NeurIPS added a benchmarks, uh, datasets track. These are ways in which to address that [00:06:00] issue. Uh, there's always workshops as well. Every, every conference has, you know, a last day of workshops and stuff that provide more of an overview.[00:06:06] swyx: But they're not specifically prompted to do so. And I think really, uh, Organizing a conference is just about getting good speakers and giving them the correct prompts. And then they will just go and do that thing and they do a very good job of it. So I think Sarah did a fantastic job with the startups prompt.[00:06:21] swyx: I can't list everybody, but we did best of 2024 in startups, vision, open models. Post transformers, synthetic data, small models, and agents. And then the last one was the, uh, and then we also did a quick one on reasoning with Nathan Lambert. And then the last one, obviously, was the debate that people were very hyped about.[00:06:39] swyx: It was very awkward. And I'm really, really thankful for John Franco, basically, who stepped up to challenge Dylan. Because Dylan was like, yeah, I'll do it. But He was pro scaling. And I think everyone who is like in AI is pro scaling, right? So you need somebody who's ready to publicly say, no, we've hit a wall.[00:06:57] swyx: So that means you're saying Sam Altman's wrong. [00:07:00] You're saying, um, you know, everyone else is wrong. It helps that this was the day before Ilya went on, went up on stage and then said pre training has hit a wall. And data has hit a wall. So actually Jonathan ended up winning, and then Ilya supported that statement, and then Noam Brown on the last day further supported that statement as well.[00:07:17] swyx: So it's kind of interesting that I think the consensus kind of going in was that we're not done scaling, like you should believe in a better lesson. And then, four straight days in a row, you had Sepp Hochreiter, who is the creator of the LSTM, along with everyone's favorite OG in AI, which is Juergen Schmidhuber.[00:07:34] swyx: He said that, um, we're pre trading inside a wall, or like, we've run into a different kind of wall. And then we have, you know John Frankel, Ilya, and then Noam Brown are all saying variations of the same thing, that we have hit some kind of wall in the status quo of what pre trained, scaling large pre trained models has looked like, and we need a new thing.[00:07:54] swyx: And obviously the new thing for people is some make, either people are calling it inference time compute or test time [00:08:00] compute. I think the collective terminology has been inference time, and I think that makes sense because test time, calling it test, meaning, has a very pre trained bias, meaning that the only reason for running inference at all is to test your model.[00:08:11] swyx: That is not true. Right. Yeah. So, so, I quite agree that. OpenAI seems to have adopted, or the community seems to have adopted this terminology of ITC instead of TTC. And that, that makes a lot of sense because like now we care about inference, even right down to compute optimality. Like I actually interviewed this author who recovered or reviewed the Chinchilla paper.[00:08:31] swyx: Chinchilla paper is compute optimal training, but what is not stated in there is it's pre trained compute optimal training. And once you start caring about inference, compute optimal training, you have a different scaling law. And in a way that we did not know last year.[00:08:45] Alessio: I wonder, because John is, he's also on the side of attention is all you need.[00:08:49] Alessio: Like he had the bet with Sasha. So I'm curious, like he doesn't believe in scaling, but he thinks the transformer, I wonder if he's still. So, so,[00:08:56] swyx: so he, obviously everything is nuanced and you know, I told him to play a character [00:09:00] for this debate, right? So he actually does. Yeah. He still, he still believes that we can scale more.[00:09:04] swyx: Uh, he just assumed the character to be very game for, for playing this debate. So even more kudos to him that he assumed a position that he didn't believe in and still won the debate.[00:09:16] Alessio: Get rekt, Dylan. Um, do you just want to quickly run through some of these things? Like, uh, Sarah's presentation, just the highlights.[00:09:24] swyx: Yeah, we can't go through everyone's slides, but I pulled out some things as a factor of, like, stuff that we were going to talk about. And we'll[00:09:30] Alessio: publish[00:09:31] swyx: the rest. Yeah, we'll publish on this feed the best of 2024 in those domains. And hopefully people can benefit from the work that our speakers have done.[00:09:39] swyx: But I think it's, uh, these are just good slides. And I've been, I've been looking for a sort of end of year recaps from, from people.[00:09:44] The Competitive AI Landscape[00:09:44] swyx: The field has progressed a lot. You know, I think the max ELO in 2023 on LMSys used to be 1200 for LMSys ELOs. And now everyone is at least at, uh, 1275 in their ELOs, and this is across Gemini, Chadjibuti, [00:10:00] Grok, O1.[00:10:01] swyx: ai, which with their E Large model, and Enthopic, of course. It's a very, very competitive race. There are multiple Frontier labs all racing, but there is a clear tier zero Frontier. And then there's like a tier one. It's like, I wish I had everything else. Tier zero is extremely competitive. It's effectively now three horse race between Gemini, uh, Anthropic and OpenAI.[00:10:21] swyx: I would say that people are still holding out a candle for XAI. XAI, I think, for some reason, because their API was very slow to roll out, is not included in these metrics. So it's actually quite hard to put on there. As someone who also does charts, XAI is continually snubbed because they don't work well with the benchmarking people.[00:10:42] swyx: Yeah, yeah, yeah. It's a little trivia for why XAI always gets ignored. The other thing is market share. So these are slides from Sarah. We have it up on the screen. It has gone from very heavily open AI. So we have some numbers and estimates. These are from RAMP. Estimates of open AI market share in [00:11:00] December 2023.[00:11:01] swyx: And this is basically, what is it, GPT being 95 percent of production traffic. And I think if you correlate that with stuff that we asked. Harrison Chase on the LangChain episode, it was true. And then CLAUD 3 launched mid middle of this year. I think CLAUD 3 launched in March, CLAUD 3. 5 Sonnet was in June ish.[00:11:23] swyx: And you can start seeing the market share shift towards opening, uh, towards that topic, uh, very, very aggressively. The more recent one is Gemini. So if I scroll down a little bit, this is an even more recent dataset. So RAM's dataset ends in September 2 2. 2024. Gemini has basically launched a price war at the low end, uh, with Gemini Flash, uh, being basically free for personal use.[00:11:44] swyx: Like, I think people don't understand the free tier. It's something like a billion tokens per day. Unless you're trying to abuse it, you cannot really exhaust your free tier on Gemini. They're really trying to get you to use it. They know they're in like third place, um, fourth place, depending how you, how you count.[00:11:58] swyx: And so they're going after [00:12:00] the Lower tier first, and then, you know, maybe the upper tier later, but yeah, Gemini Flash, according to OpenRouter, is now 50 percent of their OpenRouter requests. Obviously, these are the small requests. These are small, cheap requests that are mathematically going to be more.[00:12:15] swyx: The smart ones obviously are still going to OpenAI. But, you know, it's a very, very big shift in the market. Like basically 2023, 2022, To going into 2024 opening has gone from nine five market share to Yeah. Reasonably somewhere between 50 to 75 market share.[00:12:29] Alessio: Yeah. I'm really curious how ramped does the attribution to the model?[00:12:32] Alessio: If it's API, because I think it's all credit card spin. . Well, but it's all, the credit card doesn't say maybe. Maybe the, maybe when they do expenses, they upload the PDF, but yeah, the, the German I think makes sense. I think that was one of my main 2024 takeaways that like. The best small model companies are the large labs, which is not something I would have thought that the open source kind of like long tail would be like the small model.[00:12:53] swyx: Yeah, different sizes of small models we're talking about here, right? Like so small model here for Gemini is AB, [00:13:00] right? Uh, mini. We don't know what the small model size is, but yeah, it's probably in the double digits or maybe single digits, but probably double digits. The open source community has kind of focused on the one to three B size.[00:13:11] swyx: Mm-hmm . Yeah. Maybe[00:13:12] swyx: zero, maybe 0.5 B uh, that's moon dream and that is small for you then, then that's great. It makes sense that we, we have a range for small now, which is like, may, maybe one to five B. Yeah. I'll even put that at, at, at the high end. And so this includes Gemma from Gemini as well. But also includes the Apple Foundation models, which I think Apple Foundation is 3B.[00:13:32] Alessio: Yeah. No, that's great. I mean, I think in the start small just meant cheap. I think today small is actually a more nuanced discussion, you know, that people weren't really having before.[00:13:43] swyx: Yeah, we can keep going. This is a slide that I smiley disagree with Sarah. She's pointing to the scale SEAL leaderboard. I think the Researchers that I talked with at NeurIPS were kind of positive on this because basically you need private test [00:14:00] sets to prevent contamination.[00:14:02] swyx: And Scale is one of maybe three or four people this year that has really made an effort in doing a credible private test set leaderboard. Llama405B does well compared to Gemini and GPT 40. And I think that's good. I would say that. You know, it's good to have an open model that is that big, that does well on those metrics.[00:14:23] swyx: But anyone putting 405B in production will tell you, if you scroll down a little bit to the artificial analysis numbers, that it is very slow and very expensive to infer. Um, it doesn't even fit on like one node. of, uh, of H100s. Cerebras will be happy to tell you they can serve 4 or 5B on their super large chips.[00:14:42] swyx: But, um, you know, if you need to do anything custom to it, you're still kind of constrained. So, is 4 or 5B really that relevant? Like, I think most people are basically saying that they only use 4 or 5B as a teacher model to distill down to something. Even Meta is doing it. So with Lama 3. [00:15:00] 3 launched, they only launched the 70B because they use 4 or 5B to distill the 70B.[00:15:03] swyx: So I don't know if like open source is keeping up. I think they're the, the open source industrial complex is very invested in telling you that the, if the gap is narrowing, I kind of disagree. I think that the gap is widening with O1. I think there are very, very smart people trying to narrow that gap and they should.[00:15:22] swyx: I really wish them success, but you cannot use a chart that is nearing 100 in your saturation chart. And look, the distance between open source and closed source is narrowing. Of course it's going to narrow because you're near 100. This is stupid. But in metrics that matter, is open source narrowing?[00:15:38] swyx: Probably not for O1 for a while. And it's really up to the open source guys to figure out if they can match O1 or not.[00:15:46] Alessio: I think inference time compute is bad for open source just because, you know, Doc can donate the flops at training time, but he cannot donate the flops at inference time. So it's really hard to like actually keep up on that axis.[00:15:59] Alessio: Big, big business [00:16:00] model shift. So I don't know what that means for the GPU clouds. I don't know what that means for the hyperscalers, but obviously the big labs have a lot of advantage. Because, like, it's not a static artifact that you're putting the compute in. You're kind of doing that still, but then you're putting a lot of computed inference too.[00:16:17] swyx: Yeah, yeah, yeah. Um, I mean, Llama4 will be reasoning oriented. We talked with Thomas Shalom. Um, kudos for getting that episode together. That was really nice. Good, well timed. Actually, I connected with the AI meta guy, uh, at NeurIPS, and, um, yeah, we're going to coordinate something for Llama4. Yeah, yeah,[00:16:32] Alessio: and our friend, yeah.[00:16:33] Alessio: Clara Shi just joined to lead the business agent side. So I'm sure we'll have her on in the new year.[00:16:39] swyx: Yeah. So, um, my comment on, on the business model shift, this is super interesting. Apparently it is wide knowledge that OpenAI wanted more than 6. 6 billion dollars for their fundraise. They wanted to raise, you know, higher, and they did not.[00:16:51] swyx: And what that means is basically like, it's very convenient that we're not getting GPT 5, which would have been a larger pre train. We should have a lot of upfront money. And [00:17:00] instead we're, we're converting fixed costs into variable costs, right. And passing it on effectively to the customer. And it's so much easier to take margin there because you can directly attribute it to like, Oh, you're using this more.[00:17:12] swyx: Therefore you, you pay more of the cost and I'll just slap a margin in there. So like that lets you control your growth margin and like tie your. Your spend, or your sort of inference spend, accordingly. And it's just really interesting to, that this change in the sort of inference paradigm has arrived exactly at the same time that the funding environment for pre training is effectively drying up, kind of.[00:17:36] swyx: I feel like maybe the VCs are very in tune with research anyway, so like, they would have noticed this, but, um, it's just interesting.[00:17:43] Alessio: Yeah, and I was looking back at our yearly recap of last year. Yeah. And the big thing was like the mixed trial price fights, you know, and I think now it's almost like there's nowhere to go, like, you know, Gemini Flash is like basically giving it away for free.[00:17:55] Alessio: So I think this is a good way for the labs to generate more revenue and pass down [00:18:00] some of the compute to the customer. I think they're going to[00:18:02] swyx: keep going. I think that 2, will come.[00:18:05] Alessio: Yeah, I know. Totally. I mean, next year, the first thing I'm doing is signing up for Devin. Signing up for the pro chat GBT.[00:18:12] Alessio: Just to try. I just want to see what does it look like to spend a thousand dollars a month on AI?[00:18:17] swyx: Yes. Yes. I think if your, if your, your job is a, at least AI content creator or VC or, you know, someone who, whose job it is to stay on, stay on top of things, you should already be spending like a thousand dollars a month on, on stuff.[00:18:28] swyx: And then obviously easy to spend, hard to use. You have to actually use. The good thing is that actually Google lets you do a lot of stuff for free now. So like deep research. That they just launched. Uses a ton of inference and it's, it's free while it's in preview.[00:18:45] Alessio: Yeah. They need to put that in Lindy.[00:18:47] Alessio: I've been using Lindy lately. I've been a built a bunch of things once we had flow because I liked the new thing. It's pretty good. I even did a phone call assistant. Um, yeah, they just launched Lindy voice. Yeah, I think once [00:19:00] they get advanced voice mode like capability today, still like speech to text, you can kind of tell.[00:19:06] Alessio: Um, but it's good for like reservations and things like that. So I have a meeting prepper thing. And so[00:19:13] swyx: it's good. Okay. I feel like we've, we've covered a lot of stuff. Uh, I, yeah, I, you know, I think We will go over the individual, uh, talks in a separate episode. Uh, I don't want to take too much time with, uh, this stuff, but that suffice to say that there is a lot of progress in each field.[00:19:28] swyx: Uh, we covered vision. Basically this is all like the audience voting for what they wanted. And then I just invited the best people I could find in each audience, especially agents. Um, Graham, who I talked to at ICML in Vienna, he is currently still number one. It's very hard to stay on top of SweetBench.[00:19:45] swyx: OpenHand is currently still number one. switchbench full, which is the hardest one. He had very good thoughts on agents, which I, which I'll highlight for people. Everyone is saying 2025 is the year of agents, just like they said last year. And, uh, but he had [00:20:00] thoughts on like eight parts of what are the frontier problems to solve in agents.[00:20:03] swyx: And so I'll highlight that talk as well.[00:20:05] Alessio: Yeah. The number six, which is the Hacken agents learn more about the environment, has been a Super interesting to us as well, just to think through, because, yeah, how do you put an agent in an enterprise where most things in an enterprise have never been public, you know, a lot of the tooling, like the code bases and things like that.[00:20:23] Alessio: So, yeah, there's not indexing and reg. Well, yeah, but it's more like. You can't really rag things that are not documented. But people know them based on how they've been doing it. You know, so I think there's almost this like, you know, Oh, institutional knowledge. Yeah, the boring word is kind of like a business process extraction.[00:20:38] Alessio: Yeah yeah, I see. It's like, how do you actually understand how these things are done? I see. Um, and I think today the, the problem is that, Yeah, the agents are, that most people are building are good at following instruction, but are not as good as like extracting them from you. Um, so I think that will be a big unlock just to touch quickly on the Jeff Dean thing.[00:20:55] Alessio: I thought it was pretty, I mean, we'll link it in the, in the things, but. I think the main [00:21:00] focus was like, how do you use ML to optimize the systems instead of just focusing on ML to do something else? Yeah, I think speculative decoding, we had, you know, Eugene from RWKB on the podcast before, like he's doing a lot of that with Fetterless AI.[00:21:12] swyx: Everyone is. I would say it's the norm. I'm a little bit uncomfortable with how much it costs, because it does use more of the GPU per call. But because everyone is so keen on fast inference, then yeah, makes sense.[00:21:24] Alessio: Exactly. Um, yeah, but we'll link that. Obviously Jeff is great.[00:21:30] swyx: Jeff is, Jeff's talk was more, it wasn't focused on Gemini.[00:21:33] swyx: I think people got the wrong impression from my tweet. It's more about how Google approaches ML and uses ML to design systems and then systems feedback into ML. And I think this ties in with Lubna's talk.[00:21:45] Synthetic Data and Future Trends[00:21:45] swyx: on synthetic data where it's basically the story of bootstrapping of humans and AI in AI research or AI in production.[00:21:53] swyx: So her talk was on synthetic data, where like how much synthetic data has grown in 2024 in the pre training side, the post training side, [00:22:00] and the eval side. And I think Jeff then also extended it basically to chips, uh, to chip design. So he'd spend a lot of time talking about alpha chip. And most of us in the audience are like, we're not working on hardware, man.[00:22:11] swyx: Like you guys are great. TPU is great. Okay. We'll buy TPUs.[00:22:14] Alessio: And then there was the earlier talk. Yeah. But, and then we have, uh, I don't know if we're calling them essays. What are we calling these? But[00:22:23] swyx: for me, it's just like bonus for late in space supporters, because I feel like they haven't been getting anything.[00:22:29] swyx: And then I wanted a more high frequency way to write stuff. Like that one I wrote in an afternoon. I think basically we now have an answer to what Ilya saw. It's one year since. The blip. And we know what he saw in 2014. We know what he saw in 2024. We think we know what he sees in 2024. He gave some hints and then we have vague indications of what he saw in 2023.[00:22:54] swyx: So that was the Oh, and then 2016 as well, because of this lawsuit with Elon, OpenAI [00:23:00] is publishing emails from Sam's, like, his personal text messages to Siobhan, Zelis, or whatever. So, like, we have emails from Ilya saying, this is what we're seeing in OpenAI, and this is why we need to scale up GPUs. And I think it's very prescient in 2016 to write that.[00:23:16] swyx: And so, like, it is exactly, like, basically his insights. It's him and Greg, basically just kind of driving the scaling up of OpenAI, while they're still playing Dota. They're like, no, like, we see the path here.[00:23:30] Alessio: Yeah, and it's funny, yeah, they even mention, you know, we can only train on 1v1 Dota. We need to train on 5v5, and that takes too many GPUs.[00:23:37] Alessio: Yeah,[00:23:37] swyx: and at least for me, I can speak for myself, like, I didn't see the path from Dota to where we are today. I think even, maybe if you ask them, like, they wouldn't necessarily draw a straight line. Yeah,[00:23:47] Alessio: no, definitely. But I think like that was like the whole idea of almost like the RL and we talked about this with Nathan on his podcast.[00:23:55] Alessio: It's like with RL, you can get very good at specific things, but then you can't really like generalize as much. And I [00:24:00] think the language models are like the opposite, which is like, you're going to throw all this data at them and scale them up, but then you really need to drive them home on a specific task later on.[00:24:08] Alessio: And we'll talk about the open AI reinforcement, fine tuning, um, announcement too, and all of that. But yeah, I think like scale is all you need. That's kind of what Elia will be remembered for. And I think just maybe to clarify on like the pre training is over thing that people love to tweet. I think the point of the talk was like everybody, we're scaling these chips, we're scaling the compute, but like the second ingredient which is data is not scaling at the same rate.[00:24:35] Alessio: So it's not necessarily pre training is over. It's kind of like What got us here won't get us there. In his email, he predicted like 10x growth every two years or something like that. And I think maybe now it's like, you know, you can 10x the chips again, but[00:24:49] swyx: I think it's 10x per year. Was it? I don't know.[00:24:52] Alessio: Exactly. And Moore's law is like 2x. So it's like, you know, much faster than that. And yeah, I like the fossil fuel of AI [00:25:00] analogy. It's kind of like, you know, the little background tokens thing. So the OpenAI reinforcement fine tuning is basically like, instead of fine tuning on data, you fine tune on a reward model.[00:25:09] Alessio: So it's basically like, instead of being data driven, it's like task driven. And I think people have tasks to do, they don't really have a lot of data. So I'm curious to see how that changes, how many people fine tune, because I think this is what people run into. It's like, Oh, you can fine tune llama. And it's like, okay, where do I get the data?[00:25:27] Alessio: To fine tune it on, you know, so it's great that we're moving the thing. And then I really like he had this chart where like, you know, the brain mass and the body mass thing is basically like mammals that scaled linearly by brain and body size, and then humans kind of like broke off the slope. So it's almost like maybe the mammal slope is like the pre training slope.[00:25:46] Alessio: And then the post training slope is like the, the human one.[00:25:49] swyx: Yeah. I wonder what the. I mean, we'll know in 10 years, but I wonder what the y axis is for, for Ilya's SSI. We'll try to get them on.[00:25:57] Alessio: Ilya, if you're listening, you're [00:26:00] welcome here. Yeah, and then he had, you know, what comes next, like agent, synthetic data, inference, compute, I thought all of that was like that.[00:26:05] Alessio: I don't[00:26:05] swyx: think he was dropping any alpha there. Yeah, yeah, yeah.[00:26:07] Alessio: Yeah. Any other new reps? Highlights?[00:26:10] swyx: I think that there was comparatively a lot more work. Oh, by the way, I need to plug that, uh, my friend Yi made this, like, little nice paper. Yeah, that was really[00:26:20] swyx: nice.[00:26:20] swyx: Uh, of, uh, of, like, all the, he's, she called it must read papers of 2024.[00:26:26] swyx: So I laid out some of these at NeurIPS, and it was just gone. Like, everyone just picked it up. Because people are dying for, like, little guidance and visualizations And so, uh, I thought it was really super nice that we got there.[00:26:38] Alessio: Should we do a late in space book for each year? Uh, I thought about it. For each year we should.[00:26:42] Alessio: Coffee table book. Yeah. Yeah. Okay. Put it in the will. Hi, Will. By the way, we haven't introduced you. He's our new, you know, general organist, Jamie. You need to[00:26:52] swyx: pull up more things. One thing I saw that, uh, Okay, one fun one, and then one [00:27:00] more general one. So the fun one is this paper on agent collusion. This is a paper on steganography.[00:27:06] swyx: This is secret collusion among AI agents, multi agent deception via steganography. I tried to go to NeurIPS in order to find these kinds of papers because the real reason Like NeurIPS this year has a lottery system. A lot of people actually even go and don't buy tickets because they just go and attend the side events.[00:27:22] swyx: And then also the people who go and end up crowding around the most popular papers, which you already know and already read them before you showed up to NeurIPS. So the only reason you go there is to talk to the paper authors, but there's like something like 10, 000 other. All these papers out there that, you know, are just people's work that they, that they did on the air and they failed to get attention for one reason or another.[00:27:42] swyx: And this was one of them. Uh, it was like all the way at the back. And this is a deep mind paper that actually focuses on collusion between AI agents, uh, by hiding messages in the text that they generate. Uh, so that's what steganography is. So a very simple example would be the first letter of every word.[00:27:57] swyx: If you Pick that out, you know, and the code sends a [00:28:00] different message than that. But something I've always emphasized is to LLMs, we read left to right. LLMs can read up, down, sideways, you know, in random character order. And it's the same to them as it is to us. So if we were ever to get You know, self motivated, underlined LLMs that we're trying to collaborate to take over the planet.[00:28:19] swyx: This would be how they do it. They spread messages among us in the messages that we generate. And he developed a scaling law for that. So he marked, I'm showing it on screen right now, the emergence of this phenomenon. Basically, for example, for Cypher encoding, GPT 2, Lama 2, mixed trial, GPT 3. 5, zero capabilities, and sudden 4.[00:28:40] swyx: And this is the kind of Jason Wei type emergence properties that people kind of look for. I think what made this paper stand out as well, so he developed the benchmark for steganography collusion, and he also focused on shelling point collusion, which is very low coordination. For agreeing on a decoding encoding format, you kind of need to have some [00:29:00] agreement on that.[00:29:00] swyx: But, but shelling point means like very, very low or almost no coordination. So for example, if I, if I ask someone, if the only message I give you is meet me in New York and you're not aware. Or when you would probably meet me at Grand Central Station. That is the Grand Central Station is a shelling point.[00:29:16] swyx: And it's probably somewhere, somewhere during the day. That is the shelling point of New York is Grand Central. To that extent, shelling points for steganography are things like the, the, the common decoding methods that we talked about. It will be interesting at some point in the future when we are worried about alignment.[00:29:30] swyx: It is not interesting today, but it's interesting that DeepMind is already thinking about this.[00:29:36] Alessio: I think that's like one of the hardest things about NeurIPS. It's like the long tail. I[00:29:41] swyx: found a pricing guy. I'm going to feature him on the podcast. Basically, this guy from NVIDIA worked out the optimal pricing for language models.[00:29:51] swyx: It's basically an econometrics paper at NeurIPS, where everyone else is talking about GPUs. And the guy with the GPUs is[00:29:57] Alessio: talking[00:29:57] swyx: about economics instead. [00:30:00] That was the sort of fun one. So the focus I saw is that model papers at NeurIPS are kind of dead. No one really presents models anymore. It's just data sets.[00:30:12] swyx: This is all the grad students are working on. So like there was a data sets track and then I was looking around like, I was like, you don't need a data sets track because every paper is a data sets paper. And so data sets and benchmarks, they're kind of flip sides of the same thing. So Yeah. Cool. Yeah, if you're a grad student, you're a GPU boy, you kind of work on that.[00:30:30] swyx: And then the, the sort of big model that people walk around and pick the ones that they like, and then they use it in their models. And that's, that's kind of how it develops. I, I feel like, um, like, like you didn't last year, you had people like Hao Tian who worked on Lava, which is take Lama and add Vision.[00:30:47] swyx: And then obviously actually I hired him and he added Vision to Grok. Now he's the Vision Grok guy. This year, I don't think there was any of those.[00:30:55] Alessio: What were the most popular, like, orals? Last year it was like the [00:31:00] Mixed Monarch, I think, was like the most attended. Yeah, uh, I need to look it up. Yeah, I mean, if nothing comes to mind, that's also kind of like an answer in a way.[00:31:10] Alessio: But I think last year there was a lot of interest in, like, furthering models and, like, different architectures and all of that.[00:31:16] swyx: I will say that I felt the orals, oral picks this year were not very good. Either that or maybe it's just a So that's the highlight of how I have changed in terms of how I view papers.[00:31:29] swyx: So like, in my estimation, two of the best papers in this year for datasets or data comp and refined web or fine web. These are two actually industrially used papers, not highlighted for a while. I think DCLM got the spotlight, FineWeb didn't even get the spotlight. So like, it's just that the picks were different.[00:31:48] swyx: But one thing that does get a lot of play that a lot of people are debating is the role that's scheduled. This is the schedule free optimizer paper from Meta from Aaron DeFazio. And this [00:32:00] year in the ML community, there's been a lot of chat about shampoo, soap, all the bathroom amenities for optimizing your learning rates.[00:32:08] swyx: And, uh, most people at the big labs are. Who I asked about this, um, say that it's cute, but it's not something that matters. I don't know, but it's something that was discussed and very, very popular. 4Wars[00:32:19] Alessio: of AI recap maybe, just quickly. Um, where do you want to start? Data?[00:32:26] swyx: So to remind people, this is the 4Wars piece that we did as one of our earlier recaps of this year.[00:32:31] swyx: And the belligerents are on the left, journalists, writers, artists, anyone who owns IP basically, New York Times, Stack Overflow, Reddit, Getty, Sarah Silverman, George RR Martin. Yeah, and I think this year we can add Scarlett Johansson to that side of the fence. So anyone suing, open the eye, basically. I actually wanted to get a snapshot of all the lawsuits.[00:32:52] swyx: I'm sure some lawyer can do it. That's the data quality war. On the right hand side, we have the synthetic data people, and I think we talked about Lumna's talk, you know, [00:33:00] really showing how much synthetic data has come along this year. I think there was a bit of a fight between scale. ai and the synthetic data community, because scale.[00:33:09] swyx: ai published a paper saying that synthetic data doesn't work. Surprise, surprise, scale. ai is the leading vendor of non synthetic data. Only[00:33:17] Alessio: cage free annotated data is useful.[00:33:21] swyx: So I think there's some debate going on there, but I don't think it's much debate anymore that at least synthetic data, for the reasons that are blessed in Luna's talk, Makes sense.[00:33:32] swyx: I don't know if you have any perspectives there.[00:33:34] Alessio: I think, again, going back to the reinforcement fine tuning, I think that will change a little bit how people think about it. I think today people mostly use synthetic data, yeah, for distillation and kind of like fine tuning a smaller model from like a larger model.[00:33:46] Alessio: I'm not super aware of how the frontier labs use it outside of like the rephrase, the web thing that Apple also did. But yeah, I think it'll be. Useful. I think like whether or not that gets us the big [00:34:00] next step, I think that's maybe like TBD, you know, I think people love talking about data because it's like a GPU poor, you know, I think, uh, synthetic data is like something that people can do, you know, so they feel more opinionated about it compared to, yeah, the optimizers stuff, which is like,[00:34:17] swyx: they don't[00:34:17] Alessio: really work[00:34:18] swyx: on.[00:34:18] swyx: I think that there is an angle to the reasoning synthetic data. So this year, we covered in the paper club, the star series of papers. So that's star, Q star, V star. It basically helps you to synthesize reasoning steps, or at least distill reasoning steps from a verifier. And if you look at the OpenAI RFT, API that they released, or that they announced, basically they're asking you to submit graders, or they choose from a preset list of graders.[00:34:49] swyx: Basically It feels like a way to create valid synthetic data for them to fine tune their reasoning paths on. Um, so I think that is another angle where it starts to make sense. And [00:35:00] so like, it's very funny that basically all the data quality wars between Let's say the music industry or like the newspaper publishing industry or the textbooks industry on the big labs.[00:35:11] swyx: It's all of the pre training era. And then like the new era, like the reasoning era, like nobody has any problem with all the reasoning, especially because it's all like sort of math and science oriented with, with very reasonable graders. I think the more interesting next step is how does it generalize beyond STEM?[00:35:27] swyx: We've been using O1 for And I would say like for summarization and creative writing and instruction following, I think it's underrated. I started using O1 in our intro songs before we killed the intro songs, but it's very good at writing lyrics. You know, I can actually say like, I think one of the O1 pro demos.[00:35:46] swyx: All of these things that Noam was showing was that, you know, you can write an entire paragraph or three paragraphs without using the letter A, right?[00:35:53] Creative Writing with AI[00:35:53] swyx: So like, like literally just anything instead of token, like not even token level, character level manipulation and [00:36:00] counting and instruction following. It's, uh, it's very, very strong.[00:36:02] swyx: And so no surprises when I ask it to rhyme, uh, and to, to create song lyrics, it's going to do that very much better than in previous models. So I think it's underrated for creative writing.[00:36:11] Alessio: Yeah.[00:36:12] Legal and Ethical Issues in AI[00:36:12] Alessio: What do you think is the rationale that they're going to have in court when they don't show you the thinking traces of O1, but then they want us to, like, they're getting sued for using other publishers data, you know, but then on their end, they're like, well, you shouldn't be using my data to then train your model.[00:36:29] Alessio: So I'm curious to see how that kind of comes. Yeah, I mean, OPA has[00:36:32] swyx: many ways to publish, to punish people without bringing, taking them to court. Already banned ByteDance for distilling their, their info. And so anyone caught distilling the chain of thought will be just disallowed to continue on, on, on the API.[00:36:44] swyx: And it's fine. It's no big deal. Like, I don't even think that's an issue at all, just because the chain of thoughts are pretty well hidden. Like you have to work very, very hard to, to get it to leak. And then even when it leaks the chain of thought, you don't know if it's, if it's [00:37:00] The bigger concern is actually that there's not that much IP hiding behind it, that Cosign, which we talked about, we talked to him on Dev Day, can just fine tune 4.[00:37:13] swyx: 0 to beat 0. 1 Cloud SONET so far is beating O1 on coding tasks without, at least O1 preview, without being a reasoning model, same for Gemini Pro or Gemini 2. 0. So like, how much is reasoning important? How much of a moat is there in this, like, All of these are proprietary sort of training data that they've presumably accomplished.[00:37:34] swyx: Because even DeepSeek was able to do it. And they had, you know, two months notice to do this, to do R1. So, it's actually unclear how much moat there is. Obviously, you know, if you talk to the Strawberry team, they'll be like, yeah, I mean, we spent the last two years doing this. So, we don't know. And it's going to be Interesting because there'll be a lot of noise from people who say they have inference time compute and actually don't because they just have fancy chain of thought.[00:38:00][00:38:00] swyx: And then there's other people who actually do have very good chain of thought. And you will not see them on the same level as OpenAI because OpenAI has invested a lot in building up the mythology of their team. Um, which makes sense. Like the real answer is somewhere in between.[00:38:13] Alessio: Yeah, I think that's kind of like the main data war story developing.[00:38:18] The Data War: GPU Poor vs. GPU Rich[00:38:18] Alessio: GPU poor versus GPU rich. Yeah. Where do you think we are? I think there was, again, going back to like the small model thing, there was like a time in which the GPU poor were kind of like the rebel faction working on like these models that were like open and small and cheap. And I think today people don't really care as much about GPUs anymore.[00:38:37] Alessio: You also see it in the price of the GPUs. Like, you know, that market is kind of like plummeted because there's people don't want to be, they want to be GPU free. They don't even want to be poor. They just want to be, you know, completely without them. Yeah. How do you think about this war? You[00:38:52] swyx: can tell me about this, but like, I feel like the, the appetite for GPU rich startups, like the, you know, the, the funding plan is we will raise 60 million and [00:39:00] we'll give 50 of that to NVIDIA.[00:39:01] swyx: That is gone, right? Like, no one's, no one's pitching that. This was literally the plan, the exact plan of like, I can name like four or five startups, you know, this time last year. So yeah, GPU rich startups gone.[00:39:12] The Rise of GPU Ultra Rich[00:39:12] swyx: But I think like, The GPU ultra rich, the GPU ultra high net worth is still going. So, um, now we're, you know, we had Leopold's essay on the trillion dollar cluster.[00:39:23] swyx: We're not quite there yet. We have multiple labs, um, you know, XAI very famously, you know, Jensen Huang praising them for being. Best boy number one in spinning up 100, 000 GPU cluster in like 12 days or something. So likewise at Meta, likewise at OpenAI, likewise at the other labs as well. So like the GPU ultra rich are going to keep doing that because I think partially it's an article of faith now that you just need it.[00:39:46] swyx: Like you don't even know what it's going to, what you're going to use it for. You just, you just need it. And it makes sense that if, especially if we're going into. More researchy territory than we are. So let's say 2020 to 2023 was [00:40:00] let's scale big models territory because we had GPT 3 in 2020 and we were like, okay, we'll go from 1.[00:40:05] swyx: 75b to 1. 8b, 1. 8t. And that was GPT 3 to GPT 4. Okay, that's done. As far as everyone is concerned, Opus 3. 5 is not coming out, GPT 4. 5 is not coming out, and Gemini 2, we don't have Pro, whatever. We've hit that wall. Maybe I'll call it the 2 trillion perimeter wall. We're not going to 10 trillion. No one thinks it's a good idea, at least from training costs, from the amount of data, or at least the inference.[00:40:36] swyx: Would you pay 10x the price of GPT Probably not. Like, like you want something else that, that is at least more useful. So it makes sense that people are pivoting in terms of their inference paradigm.[00:40:47] Emerging Trends in AI Models[00:40:47] swyx: And so when it's more researchy, then you actually need more just general purpose compute to mess around with, uh, at the exact same time that production deployments of the old, the previous paradigm is still ramping up,[00:40:58] swyx: um,[00:40:58] swyx: uh, pretty aggressively.[00:40:59] swyx: So [00:41:00] it makes sense that the GPU rich are growing. We have now interviewed both together and fireworks and replicates. Uh, we haven't done any scale yet. But I think Amazon, maybe kind of a sleeper one, Amazon, in a sense of like they, at reInvent, I wasn't expecting them to do so well, but they are now a foundation model lab.[00:41:18] swyx: It's kind of interesting. Um, I think, uh, you know, David went over there and started just creating models.[00:41:25] Alessio: Yeah, I mean, that's the power of prepaid contracts. I think like a lot of AWS customers, you know, they do this big reserve instance contracts and now they got to use their money. That's why so many startups.[00:41:37] Alessio: Get bought through the AWS marketplace so they can kind of bundle them together and prefer pricing.[00:41:42] swyx: Okay, so maybe GPU super rich doing very well, GPU middle class dead, and then GPU[00:41:48] Alessio: poor. I mean, my thing is like, everybody should just be GPU rich. There shouldn't really be, even the GPU poorest, it's like, does it really make sense to be GPU poor?[00:41:57] Alessio: Like, if you're GPU poor, you should just use the [00:42:00] cloud. Yes, you know, and I think there might be a future once we kind of like figure out what the size and shape of these models is where like the tiny box and these things come to fruition where like you can be GPU poor at home. But I think today is like, why are you working so hard to like get these models to run on like very small clusters where it's like, It's so cheap to run them.[00:42:21] Alessio: Yeah, yeah,[00:42:22] swyx: yeah. I think mostly people think it's cool. People think it's a stepping stone to scaling up. So they aspire to be GPU rich one day and they're working on new methods. Like news research, like probably the most deep tech thing they've done this year is Distro or whatever the new name is.[00:42:38] swyx: There's a lot of interest in heterogeneous computing, distributed computing. I tend generally to de emphasize that historically, but it may be coming to a time where it is starting to be relevant. I don't know. You know, SF compute launched their compute marketplace this year, and like, who's really using that?[00:42:53] swyx: Like, it's a bunch of small clusters, disparate types of compute, and if you can make that [00:43:00] useful, then that will be very beneficial to the broader community, but maybe still not the source of frontier models. It's just going to be a second tier of compute that is unlocked for people, and that's fine. But yeah, I mean, I think this year, I would say a lot more on device, We are, I now have Apple intelligence on my phone.[00:43:19] swyx: Doesn't do anything apart from summarize my notifications. But still, not bad. Like, it's multi modal.[00:43:25] Alessio: Yeah, the notification summaries are so and so in my experience.[00:43:29] swyx: Yeah, but they add, they add juice to life. And then, um, Chrome Nano, uh, Gemini Nano is coming out in Chrome. Uh, they're still feature flagged, but you can, you can try it now if you, if you use the, uh, the alpha.[00:43:40] swyx: And so, like, I, I think, like, you know, We're getting the sort of GPU poor version of a lot of these things coming out, and I think it's like quite useful. Like Windows as well, rolling out RWKB in sort of every Windows department is super cool. And I think the last thing that I never put in this GPU poor war, that I think I should now, [00:44:00] is the number of startups that are GPU poor but still scaling very well, as sort of wrappers on top of either a foundation model lab, or GPU Cloud.[00:44:10] swyx: GPU Cloud, it would be Suno. Suno, Ramp has rated as one of the top ranked, fastest growing startups of the year. Um, I think the last public number is like zero to 20 million this year in ARR and Suno runs on Moto. So Suno itself is not GPU rich, but they're just doing the training on, on Moto, uh, who we've also talked to on, on the podcast.[00:44:31] swyx: The other one would be Bolt, straight cloud wrapper. And, and, um, Again, another, now they've announced 20 million ARR, which is another step up from our 8 million that we put on the title. So yeah, I mean, it's crazy that all these GPU pores are finding a way while the GPU riches are also finding a way. And then the only failures, I kind of call this the GPU smiling curve, where the edges do well, because you're either close to the machines, and you're like [00:45:00] number one on the machines, or you're like close to the customers, and you're number one on the customer side.[00:45:03] swyx: And the people who are in the middle. Inflection, um, character, didn't do that great. I think character did the best of all of them. Like, you have a note in here that we apparently said that character's price tag was[00:45:15] Alessio: 1B.[00:45:15] swyx: Did I say that?[00:45:16] Alessio: Yeah. You said Google should just buy them for 1B. I thought it was a crazy number.[00:45:20] Alessio: Then they paid 2. 7 billion. I mean, for like,[00:45:22] swyx: yeah.[00:45:22] Alessio: What do you pay for node? Like, I don't know what the game world was like. Maybe the starting price was 1B. I mean, whatever it was, it worked out for everybody involved.[00:45:31] The Multi-Modality War[00:45:31] Alessio: Multimodality war. And this one, we never had text to video in the first version, which now is the hottest.[00:45:37] swyx: Yeah, I would say it's a subset of image, but yes.[00:45:40] Alessio: Yeah, well, but I think at the time it wasn't really something people were doing, and now we had VO2 just came out yesterday. Uh, Sora was released last month, last week. I've not tried Sora, because the day that I tried, it wasn't, yeah. I[00:45:54] swyx: think it's generally available now, you can go to Sora.[00:45:56] swyx: com and try it. Yeah, they had[00:45:58] Alessio: the outage. Which I [00:46:00] think also played a part into it. Small things. Yeah. What's the other model that you posted today that was on Replicate? Video or OneLive?[00:46:08] swyx: Yeah. Very, very nondescript name, but it is from Minimax, which I think is a Chinese lab. The Chinese labs do surprisingly well at the video models.[00:46:20] swyx: I'm not sure it's actually Chinese. I don't know. Hold me up to that. Yep. China. It's good. Yeah, the Chinese love video. What can I say? They have a lot of training data for video. Or a more relaxed regulatory environment.[00:46:37] Alessio: Uh, well, sure, in some way. Yeah, I don't think there's much else there. I think like, you know, on the image side, I think it's still open.[00:46:45] Alessio: Yeah, I mean,[00:46:46] swyx: 11labs is now a unicorn. So basically, what is multi modality war? Multi modality war is, do you specialize in a single modality, right? Or do you have GodModel that does all the modalities? So this is [00:47:00] definitely still going, in a sense of 11 labs, you know, now Unicorn, PicoLabs doing well, they launched Pico 2.[00:47:06] swyx: 0 recently, HeyGen, I think has reached 100 million ARR, Assembly, I don't know, but they have billboards all over the place, so I assume they're doing very, very well. So these are all specialist models, specialist models and specialist startups. And then there's the big labs who are doing the sort of all in one play.[00:47:24] swyx: And then here I would highlight Gemini 2 for having native image output. Have you seen the demos? Um, yeah, it's, it's hard to keep up. Literally they launched this last week and a shout out to Paige Bailey, who came to the Latent Space event to demo on the day of launch. And she wasn't prepared. She was just like, I'm just going to show you.[00:47:43] swyx: So they have voice. They have, you know, obviously image input, and then they obviously can code gen and all that. But the new one that OpenAI and Meta both have but they haven't launched yet is image output. So you can literally, um, I think their demo video was that you put in an image of a [00:48:00] car, and you ask for minor modifications to that car.[00:48:02] swyx: They can generate you that modification exactly as you asked. So there's no need for the stable diffusion or comfy UI workflow of like mask here and then like infill there in paint there and all that, all that stuff. This is small model nonsense. Big model people are like, huh, we got you in as everything in the transformer.[00:48:21] swyx: This is the multimodality war, which is, do you, do you bet on the God model or do you string together a whole bunch of, uh, Small models like a, like a chump. Yeah,[00:48:29] Alessio: I don't know, man. Yeah, that would be interesting. I mean, obviously I use Midjourney for all of our thumbnails. Um, they've been doing a ton on the product, I would say.[00:48:38] Alessio: They launched a new Midjourney editor thing. They've been doing a ton. Because I think, yeah, the motto is kind of like, Maybe, you know, people say black forest, the black forest models are better than mid journey on a pixel by pixel basis. But I think when you put it, put it together, have you tried[00:48:53] swyx: the same problems on black forest?[00:48:55] Alessio: Yes. But the problem is just like, you know, on black forest, it generates one image. And then it's like, you got to [00:49:00] regenerate. You don't have all these like UI things. Like what I do, no, but it's like time issue, you know, it's like a mid[00:49:06] swyx: journey. Call the API four times.[00:49:08] Alessio: No, but then there's no like variate.[00:49:10] Alessio: Like the good thing about mid journey is like, you just go in there and you're cooking. There's a lot of stuff that just makes it really easy. And I think people underestimate that. Like, it's not really a skill issue, because I'm paying mid journey, so it's a Black Forest skill issue, because I'm not paying them, you know?[00:49:24] Alessio: Yeah,[00:49:25] swyx: so, okay, so, uh, this is a UX thing, right? Like, you, you, you understand that, at least, we think that Black Forest should be able to do all that stuff. I will also shout out, ReCraft has come out, uh, on top of the image arena that, uh, artificial analysis has done, has apparently, uh, Flux's place. Is this still true?[00:49:41] swyx: So, Artificial Analysis is now a company. I highlighted them I think in one of the early AI Newses of the year. And they have launched a whole bunch of arenas. So, they're trying to take on LM Arena, Anastasios and crew. And they have an image arena. Oh yeah, Recraft v3 is now beating Flux 1. 1. Which is very surprising [00:50:00] because Flux And Black Forest Labs are the old stable diffusion crew who left stability after, um, the management issues.[00:50:06] swyx: So Recurve has come from nowhere to be the top image model. Uh, very, very strange. I would also highlight that Grok has now launched Aurora, which is, it's very interesting dynamics between Grok and Black Forest Labs because Grok's images were originally launched, uh, in partnership with Black Forest Labs as a, as a thin wrapper.[00:50:24] swyx: And then Grok was like, no, we'll make our own. And so they've made their own. I don't know, there are no APIs or benchmarks about it. They just announced it. So yeah, that's the multi modality war. I would say that so far, the small model, the dedicated model people are winning, because they are just focused on their tasks.[00:50:42] swyx: But the big model, People are always catching up. And the moment I saw the Gemini 2 demo of image editing, where I can put in an image and just request it and it does, that's how AI should work. Not like a whole bunch of complicated steps. So it really is something. And I think one frontier that we haven't [00:51:00] seen this year, like obviously video has done very well, and it will continue to grow.[00:51:03] swyx: You know, we only have Sora Turbo today, but at some point we'll get full Sora. Oh, at least the Hollywood Labs will get Fulsora. We haven't seen video to audio, or video synced to audio. And so the researchers that I talked to are already starting to talk about that as the next frontier. But there's still maybe like five more years of video left to actually be Soda.[00:51:23] swyx: I would say that Gemini's approach Compared to OpenAI, Gemini seems, or DeepMind's approach to video seems a lot more fully fledged than OpenAI. Because if you look at the ICML recap that I published that so far nobody has listened to, um, that people have listened to it. It's just a different, definitely different audience.[00:51:43] swyx: It's only seven hours long. Why are people not listening? It's like everything in Uh, so, so DeepMind has, is working on Genie. They also launched Genie 2 and VideoPoet. So, like, they have maybe four years advantage on world modeling that OpenAI does not have. Because OpenAI basically only started [00:52:00] Diffusion Transformers last year, you know, when they hired, uh, Bill Peebles.[00:52:03] swyx: So, DeepMind has, has a bit of advantage here, I would say, in, in, in showing, like, the reason that VO2, while one, They cherry pick their videos. So obviously it looks better than Sora, but the reason I would believe that VO2, uh, when it's fully launched will do very well is because they have all this background work in video that they've done for years.[00:52:22] swyx: Like, like last year's NeurIPS, I already was interviewing some of their video people. I forget their model name, but for, for people who are dedicated fans, they can go to NeurIPS 2023 and see, see that paper.[00:52:32] Alessio: And then last but not least, the LLMOS. We renamed it to Ragops, formerly known as[00:52:39] swyx: Ragops War. I put the latest chart on the Braintrust episode.[00:52:43] swyx: I think I'm going to separate these essays from the episode notes. So the reason I used to do that, by the way, is because I wanted to show up on Hacker News. I wanted the podcast to show up on Hacker News. So I always put an essay inside of there because Hacker News people like to read and not listen.[00:52:58] Alessio: So episode essays,[00:52:59] swyx: I remember [00:53:00] purchasing them separately. You say Lanchain Llama Index is still growing.[00:53:03] Alessio: Yeah, so I looked at the PyPy stats, you know. I don't care about stars. On PyPy you see Do you want to share your screen? Yes. I prefer to look at actual downloads, not at stars on GitHub. So if you look at, you know, Lanchain still growing.[00:53:20] Alessio: These are the last six months. Llama Index still growing. What I've basically seen is like things that, One, obviously these things have A commercial product. So there's like people buying this and sticking with it versus kind of hopping in between things versus, you know, for example, crew AI, not really growing as much.[00:53:38] Alessio: The stars are growing. If you look on GitHub, like the stars are growing, but kind of like the usage is kind of like flat. In the last six months, have they done some[00:53:4
Talk Python To Me - Python conversations for passionate developers
Join me for an insightful conversation with Alex Monahan, who works on documentation, tutorials, and training at DuckDB Labs. We explore why DuckDB is gaining momentum among Python and data enthusiasts, from its in-process database design to its blazingly fast, columnar architecture. We also dive into indexing strategies, concurrency considerations, and the fascinating way MotherDuck (the cloud companion to DuckDB) handles large-scale data seamlessly. Don't miss this chance to learn how a single pip install could totally transform your Python data workflow! Episode sponsors Sentry Error Monitoring, Code TALKPYTHON Data Citizens Podcast Talk Python Courses Links from the show Alex on Mastodon: @__Alex__ DuckDB: duckdb.org MotherDuck: motherduck.com SQLite: sqlite.org Moka-Py: github.com PostgreSQL: www.postgresql.org MySQL: www.mysql.com Redis: redis.io Apache Parquet: parquet.apache.org Apache Arrow: arrow.apache.org Pandas: pandas.pydata.org Polars: pola.rs Pyodide: pyodide.org DB-API (PEP 249): peps.python.org/pep-0249 Flask: flask.palletsprojects.com Gunicorn: gunicorn.org MinIO: min.io Amazon S3: aws.amazon.com/s3 Azure Blob Storage: azure.microsoft.com/products/storage Google Cloud Storage: cloud.google.com/storage DigitalOcean: www.digitalocean.com Linode: www.linode.com Hetzner: www.hetzner.com BigQuery: cloud.google.com/bigquery DBT (Data Build Tool): docs.getdbt.com Mode: mode.com Hex: hex.tech Python: www.python.org Node.js: nodejs.org Rust: www.rust-lang.org Go: go.dev .NET: dotnet.microsoft.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
If you're a Django developer, I'm sure you've heard so many people raving about FastAPI and Pydantic. But you really love Django and don't want to switch. Then you might want to give Django Ninja a serious look. Django Ninja is highly inspired by FastAPI, but is also deeply integrated into Django itself. We have Vitaliy Kucheryaviy the creator of Django Ninja on this show to tell us all about it. Episode sponsors Sentry Error Monitoring, Code TALKPYTHON Bluehost Talk Python Courses Links from the show Vitaly: github.com/vitalik Vitaly on X: @vital1k Top 5 Episodes of 2024: talkpython.fm/blog/posts/top-talk-python-podcast-episodes-of-2024 Django Ninja: django-ninja.dev Motivation section we talked through: django-ninja.dev/motivation LLM for Django Ninja: llm.django-ninja.dev Nano Django: github.com/vitalik/nano-django Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
Peter Wang has been pushing Python forward since the early days of its data science roots. We're lucky to have him back on the show. We're going to talk about the Anaconda Toolbox for Excel as well as many other trends and topics that are hot in the Python space right now. I'm sure you'll enjoy listening to the two of us exchanging our takes on the topics and trends. Episode sponsors Sentry Error Monitoring, Code TALKPYTHON Bluehost Talk Python Courses Links from the show Peter on BSky: @wang.social Michael on BSky: @mkennedy.codes Michael's Curated BSky Starter List: bsky.app Python Blsky Starter Pack List: blueskydirectory.com Anaconda Toolbox for Microsoft Excel: anaconda.com JupyterLite: jupyter.org 8 of the Biggest Excel Mistakes of All Time: blog.hurree.co The Five Demons of Python Packaging PyBay talk: youtube.com PEP 759: peps.python.org TIOBE Index: tiobe.com pyscript: pyscript.net Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
LanceDB is a developer-friendly, open source database for AI. It's used by well-known companies such as Midjourney and Character.ai. We have Chang She, the CEO and cofounder of LanceDB on to give us a look at the concept of multi-modal data and how you can use LanceDB in your own Python apps. Episode sponsors Sentry Error Monitoring, Code TALKPYTHON Bluehost Talk Python Courses Links from the show Chang She: @changhiskhan Chang on Github: github.com LanceDB: lancedb.com LanceDB Source: github.com Embeddings API: github.com MinIO: min.io LanceDB Quickstart: github.com VectorDB-recipes: github.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
There has been a lot of changes in the low-level Python space these days. The biggest has to be how many projects have rewritten core performance-intensive sections in Rust. Or even the wholesale adoption of Rust for newer projects such as uv and ruff. On this episode, we dive into the tools and workflow needed to build these portions of Python apps in Rust with David Seddon and Samuel Colvin. Episode sponsors Posit Data Citizens Podcast Talk Python Courses Links from the show Samuel Colvin: github.com/samuelcolvin David Seddon: github.com/seddonym Pydantic: pydantic.dev PEP 0759: peps.python.org TypeShed: github.com Maturin: maturin.rs rloop: github.com Grimp: github.com Grimp Workflows: github.com White House recommends memory safe languages: whitehouse.gov Installing Rust: rust-lang.org jiter: github.com import-linter: github.com Logfire: pydantic.dev Crabs in Snakes, David Seddon, Pycon Italia: youtube.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
If you are a .NET developer or work in a place that has some of those folks, wouldn't it be great to fully leverage the entirety of PyPI with it's almost 600,000 packages inside your .NET code? But how would you do this? Previous efforts have let you write Python syntax but using the full libraries (especially the C-based ones) has been out of reach, until CSnakes. This project by Anthony Shaw and Aaron Powell unlocks some pretty serious integration between the two languages. We have them both here on the show today to tell us all about it. Episode sponsors Posit Bluehost Talk Python Courses Links from the show Anthony Shaw: github.com Aaron Powell: github.com Introducing CSnakes: tonybaloney.github.io CSnakes: tonybaloney.github.io Talk Python: We've moved to Hetzner: talkpython.fm/blog Talk Python: Talk Python rewritten in Quart (async Flask): talkpython.fm/blog Pyjion - A JIT for Python based upon CoreCLR: github.com Iron Python: ironpython.net Python.NET: pythonnet.github.io The buffer protocol: docs.python.org Avalonia UI: avaloniaui.net Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
DataStax is known for its expertise in scalable data solutions, particularly for Apache Cassandra, a leading NoSQL database. Recently, the company has focused on enhancing platform support for AI-driven applications, including vector search capabilities. Jonathan Ellis is the Co-founder of DataStax. He maintains a technical role at the company and has recently worked on developing The post DataStax and the Future of Real-Time Data Applications with Jonathan Ellis appeared first on Software Engineering Daily.
لو خطر في بالك قبل كده ليه عندنا كل قواعد البيانات دي, و ليه فيه منهم انواع مختلفة DBMS, NOSQL و غيرهم, طيب الناس اللي بتشتغل على الحاجات دي ايه التحديات اللي بيواجهوها, و ايه التخصص ده و ايه المتطلبات بتاعته. Ahmed Ayad is a SQL Engineer by trade, a database guy by education and training, and data dude by passion. I am currently an Engineering Director of the Managed Storage and Workload Management team in Google #BigQuery, building the best large scale enterprise data warehouse on the planet. My team owns the core parts of BigQuery involved in managing user data, metadata catalog, streaming and batch ingestion, replication, resource management and placement, physical sharding, and structured lake analytics. Over the years we have: - Grew data under management by several orders of magnitude. - Grew BigQuery's global footprint to more than 20+ regions and counting. - Enabled the hyper scaling of data analytics for a Who's Who list of Fortune 500 users, both Enterprise and Cloud-native. I am passionate about building cool technologies at scale, and the effective teams that create them. Things I did in previous professional lives: - I have shipped components in SQL Server product since SQL Server 2008. Worked on the Performance Data Collector, Policy Based Management, AlwaysOn, The Utility Control Point, SQL Azure stack from the backend to the middle-tier and Portal, SQL Server Agent, SQL Server Optimizer, and SQL Server Management Tools. - Did Database research in the areas of Data Mining, Query Optimization, and Data Streaming.
Talk Python To Me - Python conversations for passionate developers
What do developers need to know about AppSec and building secure software? We have Tonya Janca (AKA SheHacksPurple) on the show to tell us all about it. We talk about what developers should expect from threat modeling events as well as concrete tips for security your apps and services. Episode sponsors Posit Bluehost Talk Python Courses Links from the show Tanya on X: @shehackspurple She Hacks Purple website: shehackspurple.ca White House recommends memory safe languages: whitehouse.gov Python Developer Survey Results: jetbrains.com Bandit: github.com Semgrep Academy: academy.semgrep.dev Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
Have you heard about HTMX? We've discussed it a time or two on this show. We're back with another episode on HTMX, this time with a real-world success story and lessons learned. We have Sheena O'Connell on to tell us how she moved from a React-Django app to pure Django with HTMX. Episode sponsors Posit Bluehost Talk Python Courses Links from the show Sheena O'Connell: sheenaoc.com An HTMX success story essay: sheenaoc.com Sheena's HTMX Workshop: prelude.tech - discount code: talk_python Talk Python's HTMX Courses HTMX + Flask course: training.talkpython.fm HTMX + Django course: training.talkpython.fm Build An Audio AI App course: training.talkpython.fm HTMX: htmx.org Playwright: playwright.dev django-template-partials: github.com Michael's jinja_partials: github.com django-guardian: github.com Talk Python Courses HTMX Example: training.talkpython.fm/courses/all Alpine.js: alpinejs.dev David Guillot SaaS video: youtube.com awesome-htmx: github.com Guild of Educators: guildofeducators.org The big rewrite song: youtube.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
Let's say you want to create a web app and you know Python really well. Your first thought might be Flask or Django or even FastAPI? All good choices but there is a lot to get a full web app into production. The framework we'll talk about today, Reflex, allows you to just write Python code and it turns it into a full web app running FastAPI, NextJS, React and more plus it handles the deployment for you. It's a cool idea. Let's talk to Elvis Kahoro and Nikhil Rao from Reflex.dev. Episode sponsors Posit Bluehost Talk Python Courses Links from the show Elvis: github.com Nikhil: github.com Reflex Framework: reflex.dev Reflex source: github.com Reflex docs: reflex.dev Reflex Roadmap: github.com AG Grid: ag-grid.com Warp terminal: warp.dev A Stroll Down Startup Lane episode: talkpython.fm PuePy: Reactive frontend framework in Python episode: talkpython.fm Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
Do you struggle to make sure your code is always correct before you check it in? What about your team members' code? That one person who never wants to run the linter? Tired of dealing with tons of conflicts and spurious git changes? You need git pre-commit hooks. We're lucky to have Stefanie Molin on this episode who has done a bunch of writing and teaching of git hooks. Episode sponsors Sentry Error Monitoring, Code TALKPYTHON Bluehost Talk Python Courses Links from the show Stefanie Molin: stefaniemolin.com Talk Python Blog: talkpython.fm/blog How to Set Up Pre-Commit Hooks: stefaniemolin.com Common Pre-Commit Errors and How to Solve Them: stefaniemolin.com A Behind-the-Scenes Look at How Pre-Commit Works: stefaniemolin.com Pre-Commit Hook Creation Guide: stefaniemolin.com (Pre-)Commit to Better Code Workshop: stefaniemolin.com exif-stripper: stefaniemolin.com exif-stripper on GitHub: github.com docstring-validation-using-pre-commit-hook: numpydoc.readthedocs.io Data Morph: Moving Beyond the Datasaurus Dozen: stefaniemolin.com Data Morph on GitHub: github.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
In this episode of the MongoDB Podcast, Matt Asay sits down with Peder Ulander, Chief Marketing Officer at MongoDB, during MongoDB's .local London event. Peder shares insights on the latest developments in MongoDB, focusing on performance, scalability, and the platform's relevance to developers. They discuss how MongoDB has evolved beyond a niche NoSQL database to become a versatile platform capable of supporting complex, cloud-scale applications across various industries. Peder also highlights the simplicity of getting started with MongoDB, the role of AI in modernizing legacy apps, and how the data architecture landscape is shifting towards distributed systems.
Talk Python To Me - Python conversations for passionate developers
Hynek has been writing and speaking on some of the most significant topics in the Python space and I've enjoyed his takes. So I invited him on the show to share them with all of us. This episode really epitomizes one of the reasons I launched Talk Python 9 years ago. It's as if we run into each other at a bar during a conference and I ask Hynek, "So what are your thoughts on ..." and we dive down the rabbit hole for an hour. I hope you enjoy it. Episode sponsors WorkOS Bluehost Talk Python Courses Links from the show Hynek Schlawack on Mastodon: @hynek Why I Still Use Python Virtual Environments in Docker: hynek.me Production-ready Python Docker Containers with uv: hynek.me Attrs: github.com uv: astral.sh What's New In Python 4: python.org BusyBox: busybox.net Hynek's YouTube Channel: youtube.com MOPUp for macOS: github.com Homebrew Python Is Not For You: justinmayer.com argon2-cffi: Argon2 for Python: github.com pytest-freethreaded: github.com LM Studio: lmstudio.ai StackOverflow Trends Graph: trends.stackoverflow.co Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
If you work in data science, you definitely know about data frame libraries. Pandas is certainly the most popular, but there are others such as cuDF, Modin, Polars, Dask, and more. They are all similar but definitely not the same APIs and Polars is quite different. But here's the problem. If you want to write a library that is for users of more than one of these data frame frameworks, how do you do that? Or if you want to leave open the possibility of changing yours after the app is built, same problem. That's the problem that Narwhals solves. We have Marco Gorelli on the show to tell us all about it. Episode sponsors WorkOS Talk Python Courses Links from the show Marco Gorelli: @marcogorelli Marco on LinkedIn: linkedin.com Narwhals: github.io Narwhals on Github: github.com DuckDB: duckdb.org Ibis: ibis-project.org modin: readthedocs.io Pandas and Beyond with Wes McKinney: talkpython.fm Polars: A Lightning-fast DataFrame for Python: talkpython.fm Polars: pola.rs Pandas: pandas.pydata.org Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
You're about to launch your new app or API, or even just a big refactor of your current project. Will it stand up and deliver when you put it into production or when that big promotion goes live? Or will it wither and collapse? How would you know? Well you would test that of course. We have Anthony Shaw back on the podcast to dive into a wide range of tools and techniques for performance and loading testing of web apps. Episode sponsors Sentry Error Monitoring, Code TALKPYTHON WorkOS Talk Python Courses Links from the show Anthony on Twitter: @anthonypjshaw Anthony's PyCon Au Talk: youtube.com locust load testing tool: locust.io playwright: playwright.dev mimesis: github.com mimesis providers: mimesis.name vscode pets: marketplace.visualstudio.com vscode power-mode: marketplace.visualstudio.com opentelemetry: opentelemetry.io uptime-kuma: github.com Talk Python uptime / status: talkpython.fm/status when your serverless computing bill goes parabolic...: youtube.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
MongoDB Atlas is a managed NoSQL database that uses JSON-like documents with optional schemas. The platform recently released new vector search capabilities to facilitate building AI capabilities. Ben Flast is the Director of Product Management at MongoDB. He joins the show to talk about the company's developments with vector search. This episode is hosted by The post MongoDB Vector Search with Ben Flast appeared first on Software Engineering Daily.
Talk Python To Me - Python conversations for passionate developers
Do you have kids? Maybe nieces and nephews? Or maybe you work in a school environment? Maybe it's just friend's who know you're a programmer and ask about how they should go about introducing programming concepts with them. Anna-Lena Popkes is back on the show to share her research on when and how to teach kids programming. We spend the second half of the episode talking about concrete apps and toys you might consider for each age group. Plus, some of these things are fun for adults too. ;) Episode sponsors WorkOS Talk Python Courses Links from the show Anna-Lena: alpopkes.com Magical universe repo: github.com Machine learning basics repo: github.com PyData recording "when and how to start coding with kids": youtube.com Robots and devices Bee Bot: terrapinlogo.com Cubelets: modrobotics.com BBC Microbit: microbit.org RaspberryPi: raspberrypi.com Adafruit Qualia ESP32 for CircuitPython: adafruit.com Zumi: robolink.com Board games Think Fun Robot Turtles Board Game: amazon.com Visual programming: Scratch Jr.: scratchjr.org Scratch: scratch.org Blocky: google.com Microbit's Make Code: microbit.org Code Club: codeclubworld.org Textual programming Code Combat: codecombat.com Hedy: hedycode.com Anvil: anvil.works Coding classes / summer camps (US) Portland Community College Summer Teen Program: pcc.edu Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
Do you have text that you want to process automatically? Maybe you want to pull out key products or topics of conversation? Maybe you want to get the sentiment? The possibilities are many with this week's topic: NLP with spaCy and Python. Our guest, Vincent D. Warmerdam, has worked on spaCy and other tools at Explosion AI and he's here to give us his tips and tricks for working with text from Python. Episode sponsors Posit Talk Python Courses Links from the show Course: Getting Started with NLP and spaCy: talkpython.fm Vincent on X: @fishnets88 Vincent on Mastodon: @koaning Programmable Keyboards on CalmCode: youtube.com Sample Space Podcast: youtube.com spaCy: spacy.io Course: Build An Audio AI App: talkpython.fm Lemma example: github.com Code for spaCy course: github.com Python Bytes transcripts: github.com scikit-lego: github.com Projects that import "this": calmcode.io Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
It's hard to escape talk of AI these days, but not all AI is the same, and not all of it is safe for large organizations to use. Today we're diving into the evolving world of generative AI for the enterprise with Sumeet Agrawal, VP of Product Management at Informatica. We'll discuss strategies and considerations for building robust, enterprise-grade generative AI applications. Sumeet Kumar Agrawal is a VP of Product Management at Informatica. Based in the Bay Area, Sumeet has 15+ years of data engineering and product management experience, driving innovative products within the cloud technology sector. He leads the Cloud AI/GenAI, Analytics & Data warehouse & Data lake, and iPaaS product portfolio at Informatica. It consists of multiple product lines such as Cloud Data Engineering, Streaming, big data, NoSQL, Serverless, Cloud Mass Ingestion(including CDC), Serverless & AI/ML, GenAI, API, and App Integration initiatives. He has a lot of experience working with many cloud ecosystem vendors like AWS, GCP, Azure, Snowflake, Databricks, etc. Apart from this, Sumeet has been frequently recognized as a strong communicator who has successfully worked with people from broad socio-economic backgrounds, and diverse cultures, building strong and fruitful organizational teams. He sits on the advisory board of many startups. RESOURCES Informatica website: https://www.informatica.com Register for the Medallia CX Day webinar: Building Loyalty: How Top Brands Create Forever Customers with CX - https://bit.ly/3M7dkQM Connect with Greg on LinkedIn: https://www.linkedin.com/in/gregkihlstrom Don't miss a thing: get the latest episodes, sign up for our newsletter and more: https://www.theagilebrand.show Attend the Mid-Atlantic MarCom Summit, the region's largest marketing communications conference. Register with the code "Agile" and get 15% off. Register now for HumanX 2025. This AI-focused event which brings some of the most forward-thinking minds in technology together. Register now with the code "HX25p_tab" for $250 off the regular price. Check out The Agile Brand Guide website with articles, insights, and Martechipedia, the wiki for marketing technology: https://www.agilebrandguide.com The Agile Brand podcast is brought to you by TEKsystems. Learn more here: https://www.teksystems.com/versionnextnow The Agile Brand is produced by Missing Link—a Latina-owned strategy-driven, creatively fueled production co-op. From ideation to creation, they craft human connections through intelligent, engaging and informative content. https://www.missinglink.company Learn more about your ad choices. Visit megaphone.fm/adchoices
Talk Python To Me - Python conversations for passionate developers
A couple of weeks ago, Charlie Marsh and the folks at Astral made another big splash with a major release of uv called "uv: Unified Python packaging" which has many far reaching features. We had to have Charlie on the show to give us the inside look into this development. Let's get to it. Episode sponsors Posit Talk Python Courses Links from the show Charlie Marsh on Twitter: @charliermarsh Charlie Marsh on Mastodon: @charliermarsh uv: Unified Python packaging: astral.sh Python executable management: astral.sh Projects: astral.sh Tools: astral.sh Scripts: astral.sh Rye and uv: August is Harvest Season for Python Packaging: lucumr.pocoo.org Python Build Standalone releases: github.com Rules: astral.sh Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Talk Python To Me - Python conversations for passionate developers
Every year the core developers meet to discuss and propose the major changes and trends in Python itself. This invite-only conference of about 50 people happens inside PyCon in the US. Because it's private, we rarely get detailed looks inside this event. On this episode, we have Seth Michael Larson here to give us his account of the sessions and proposals. It's a unique look into the zeitgeist of CPython. Episode sponsors Posit Talk Python Courses Links from the show Seth on Mastodon: @sethmlarson@fosstodon.org Seth on Twitter: @sethmlarson Seth on Github: github.com The Python Language Summit 2024: blogspot.com PEP 2026: Calendar versioning for Python: github.com PSF authorized as a CVE Numbering Authority: python.org Recommends Memory-Safe Programming Languages: blogspot.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy