Topics covered in this episode:
platformdirs
poethepoet - "Poe the Poet is a batteries included task runner that works well with poetry or with uv."
Python Pandas Ditches NumPy for Speedier PyArrow
pointblank: Data validation made beautiful and powerful
Extras
Joke

Watch on YouTube

About the show
Sponsored by us! Support our work through:
Our courses at Talk Python Training
The Complete pytest Course
Patreon Supporters

Connect with the hosts
Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list; we'll never share it.

Michael #1: platformdirs
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".
Why the community moved on from appdirs to platformdirs. At appdirs: "Note: This project has been officially deprecated. You may want to check out pypi.org/project/platformdirs/ which is a more active fork of appdirs. Thanks to everyone who has used appdirs. Shout out to ActiveState for the time they gave their employees to work on this over the years."
Better than appdirs:
Works today, works tomorrow – new Python releases sometimes change low-level APIs (win32com, pathlib, Apple sandbox rules). platformdirs tracks those changes so your code keeps running.
First-class typing – no more types-appdirs stubs; editors autocomplete paths as Path objects.
Richer directory set – if you need a user's Downloads folder or a per-session runtime dir, there's a helper for it.
Cleaner internals – rewritten to use pathlib, caching, and extensive test coverage; all platforms are exercised in CI.
Community stewardship – the project lives in the PyPA orbit and gets security/compatibility patches quickly.

Brian #2: poethepoet
"Poe the Poet is a batteries included task runner that works well with poetry or with uv."
From Bob Belderbos.
Tasks are easy to define, and are defined in pyproject.toml.

Michael #3: Python Pandas Ditches NumPy for Speedier PyArrow
Pandas 3.0 will significantly boost performance by replacing NumPy with PyArrow as its default engine, enabling faster loading and reading of columnar data.
Recently talked with Reuven Lerner about this on Talk Python too.
In the next version, v3.0, PyArrow will be a required dependency, with pyarrow.string being the default type inferred for string data.
PyArrow is 10 times faster.
PyArrow offers columnar storage, which eliminates all that computational back and forth that comes with NumPy.
PyArrow paves the way for running Pandas, by default, in Copy-on-Write mode, which improves memory usage and performance.

Brian #4: pointblank: Data validation made beautiful and powerful
"With its … chainable API, you can … validate your data against comprehensive quality checks …"

Extras
Brian:
Ruff rules - Ruff users, what rules are you using and what are you ignoring?
Python 3.14.0b2 - did we already cover this?
Transferring your Mastodon account to another server, in case anyone was thinking about doing that.
I'm trying out Fathom Analytics for privacy-friendly analytics.
Michael:
Polars for Power Users: Transform Your Data Analysis Game course

Joke: Does your dog bite?
In this episode, Conor and Bryce chat with Jared Hoberock about the NVIDIA Thrust Parallel Algorithms Library.
Link to Episode 237 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Socials
ADSP: The Podcast: Twitter
Conor Hoekstra: Twitter | BlueSky | Mastodon
Bryce Adelstein Lelbach: Twitter
About the Guest
Jared Hoberock joined NVIDIA Research in October 2008. His interests include parallel programming models and physically-based rendering. Jared is the co-creator of Thrust, a high-performance parallel algorithms library. While at NVIDIA, Jared has contributed to the DirectX graphics driver; Gelato, a final-frame film renderer; and OptiX, a high-performance, programmable ray tracing engine. Jared received a Ph.D. in computer science from the University of Illinois at Urbana-Champaign. He is a two-time recipient of the NVIDIA Graduate Research Fellowship.
Show Notes
Date Generated: 2025-05-21
Date Released: 2025-06-06
Thrust
Thrust Docs
C++98 std::transform
thrust::reduce
MPI_reduce
NVIDIA MatX
CuPy
RAPIDS.ai
Thrust Summed Area Table Example
ADSP Episode 213: NumPy & Summed-Area Tables
Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
Auphonic (25 February 2025, Jochen)
Did you know that adding a simple Code Interpreter took o3 from 9.2% to 32% on FrontierMath? The Latent Space crew is hosting a hack night Feb 11th in San Francisco focused on CodeGen use cases, co-hosted with E2B and Edge AGI; watch E2B's new workshop and RSVP here!

We're happy to announce that today's guest Samuel Colvin will be teaching his very first Pydantic AI workshop at the newly announced AI Engineer NYC Workshops day on Feb 22! 25 tickets left.

If you're a Python developer, it's very likely that you've heard of Pydantic. Every month, it's downloaded >300,000,000 times, making it one of the top 25 PyPI packages. OpenAI uses it in its SDK for structured outputs, it's at the core of FastAPI, and if you've followed our AI Engineer Summit conference, Jason Liu of Instructor has given two great talks about it: "Pydantic is all you need" and "Pydantic is STILL all you need". Now, Samuel Colvin has raised $17M from Sequoia to turn Pydantic from an open source project into a full-stack AI engineering platform, with Logfire, their observability platform, and PydanticAI, their new agent framework.

Logfire: bringing OTEL to AI
OpenTelemetry recently merged Semantic Conventions for LLM workloads, which provide standard definitions for tracking performance, like gen_ai.server.time_per_output_token. In Sam's view, at least 80% of new apps being built today have some sort of LLM usage in them, and just like web observability platforms got replaced by cloud-first ones in the 2010s, Logfire wants to do the same for AI-first apps. If you're interested in the technical details, Logfire migrated from ClickHouse to DataFusion for their backend.
We spent some time on the importance of picking open source tools you understand and that you can actually contribute to upstream, rather than the more popular ones; listen in at ~43:19 for that part.

Agents are the killer app for graphs
Pydantic AI is their attempt at taking a lot of the learnings that LangChain and the other early LLM frameworks had, and putting Python best practices into it. At an API level, it's very similar to the other libraries: you can call LLMs, create agents, do function calling, do evals, etc.

They define an "Agent" as a container with a system prompt, tools, structured result, and an LLM. Under the hood, each Agent is now a graph of function calls that can orchestrate multi-step LLM interactions. You can start simple, then move toward fully dynamic graph-based control flow if needed.

"We were compelled enough by graphs once we got them right that our agent implementation [...] is now actually a graph under the hood."

Why Graphs?
* More natural for complex or multi-step AI workflows.
* Easy to visualize and debug with mermaid diagrams.
* Potential for distributed runs, or "waiting days" between steps in certain flows.

In parallel, you see folks like Emil Eifrem of Neo4j talk about GraphRAG as another place where graphs fit really well in the AI stack, so it might be time for more people to take them seriously.

Full Video Episode
Like and subscribe!

Chapters
* 00:00:00 Introductions
* 00:00:24 Origins of Pydantic
* 00:05:28 Pydantic's AI moment
* 00:08:05 Why build a new agents framework?
* 00:10:17 Overview of Pydantic AI
* 00:12:33 Becoming a believer in graphs
* 00:24:02 God Model vs Compound AI Systems
* 00:28:13 Why not build an LLM gateway?
* 00:31:39 Programmatic testing vs live evals
* 00:35:51 Using OpenTelemetry for AI traces
* 00:43:19 Why they don't use ClickHouse
* 00:48:34 Competing in the observability space
* 00:50:41 Licensing decisions for Pydantic and Logfire
* 00:51:48 Building Pydantic.run
* 00:55:24 Marimo and the future of Jupyter notebooks
* 00:57:44 London's AI scene

Show Notes
* Sam Colvin
* Pydantic
* Pydantic AI
* Logfire
* Pydantic.run
* Zod
* E2B
* Arize
* Langsmith
* Marimo
* Prefect
* GLA (Google Generative Language API)
* OpenTelemetry
* Jason Liu
* Sebastian Ramirez
* Bogomil Balkansky
* Hood Chatham
* Jeremy Howard
* Andrew Lamb

Transcript
Alessio [00:00:03]: Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.
Swyx [00:00:12]: Good morning. And today we're very excited to have Sam Colvin join us from Pydantic AI. Welcome. Sam, I heard that Pydantic is all we need. Is that true?
Samuel [00:00:24]: I would say you might need Pydantic AI and Logfire as well, but it gets you a long way, that's for sure.
Swyx [00:00:29]: Pydantic almost basically needs no introduction. It's almost 300 million downloads in December. And obviously, in the previous podcasts and discussions we've had with Jason Liu, he's been a big fan and promoter of Pydantic and AI.
Samuel [00:00:45]: Yeah, it's weird because obviously I didn't create Pydantic originally for uses in AI, it predates LLMs. But we've been lucky that it's been picked up by that community and used so widely.
Swyx [00:00:58]: Actually, maybe we'll hear it right from you: what is Pydantic, and maybe a little bit of the origin story?
Samuel [00:01:04]: The best name for it, which is not quite right, is a validation library. And we get some tension around that name because it doesn't just do validation, it will do coercion by default. We now have strict mode, so you can disable that coercion. But by default, if you say you want an integer field and you get in a string of 1, 2, 3, it will convert it to 123, and a bunch of other sensible conversions. And as you can imagine, the semantics around it, exactly when you convert and when you don't, are complicated, but because of that, it's more than just validation.
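The lax-versus-strict behaviour Samuel describes can be shown in a few lines (the model names here are illustrative, not from Pydantic's docs):

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Lenient(BaseModel):
    n: int

# By default Pydantic coerces: the string "123" becomes the int 123.
print(Lenient(n="123").n)  # -> 123

class Strict(BaseModel):
    model_config = ConfigDict(strict=True)
    n: int

# With strict mode enabled, the same input is rejected instead of coerced.
try:
    Strict(n="123")
except ValidationError as exc:
    print("strict mode rejected the string input:", exc.error_count(), "error")
```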
Back in 2017, when I first started it, the different thing it was doing was using type hints to define your schema. That was controversial at the time. It was genuinely disapproved of by some people. I think the success of Pydantic and libraries like FastAPI that build on top of it means that today that's no longer controversial in Python. And indeed, lots of other people have copied that route, but yeah, it's a data validation library. It uses type hints for the most part, and obviously does all the other stuff you want, like serialization, on top of that. But yeah, that's the core.
Alessio [00:02:06]: Do you have any fun stories on how JSON schemas ended up being kind of like the structured output standard for LLMs? And were you involved in any of these discussions? Because I know OpenAI was, you know, one of the early adopters. So did they reach out to you? Was there kind of like a structured-output conversation in open source that people were having, or was it just random?
Samuel [00:02:26]: No, very much not. I didn't originally implement JSON schema inside Pydantic, and then Sebastian, Sebastian Ramirez of FastAPI, came along, and the first I ever heard of him was over a weekend when I got like 50 emails from him as he was committing to Pydantic, adding JSON schema, long pre version one. So the reason it was added was for OpenAPI, which is obviously closely akin to JSON schema. And then, yeah, I don't know why it was JSON schema that got picked up and used by OpenAI. It was obviously very convenient for us, because it meant that not only can you do the validation, but because Pydantic will generate you the JSON schema, it can be one source of truth for structured outputs and tools.
Swyx [00:03:09]: Before we dive in further on the AI side of things, something I'm mildly curious about: obviously, there's Zod in JavaScript land.
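That "one source of truth" point is concrete: the same model drives both schema generation and validation. A toy sketch (the model itself is invented for illustration):

```python
import json

from pydantic import BaseModel, Field

class Weather(BaseModel):
    """Get the weather for a city."""
    city: str = Field(description="Name of the city")
    celsius: bool = True

# The JSON schema that structured-output and tool-calling APIs consume
# is generated from the very model that validates the response.
schema = Weather.model_json_schema()
print(json.dumps(schema, indent=2))

# And the same model validates a raw JSON response coming back.
parsed = Weather.model_validate_json('{"city": "London"}')
print(parsed.city, parsed.celsius)  # -> London True
```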
Every now and then there is a new sort of in-vogue validation library that takes over for quite a few years, and then maybe something else comes along. Is Pydantic, is it done, like, the core Pydantic?
Samuel [00:03:30]: I've just come off a call where we were redesigning some of the internal bits. There will be a v3 at some point, which will not break people's code half as much as v2, as in v2 was the massive rewrite into Rust, but also fixing all the stuff that was broken back from like version zero point something that we didn't fix in v1 because it was a side project. We have plans to basically store the data in Rust types after validation. Not completely. So we're still working to design the Pythonic version of it, in order for it to be able to convert into Python types. So then if you were doing validation and then serialization, you would never have to go via a Python type. We reckon that can give us another three to five times speed up. That's probably the biggest thing. Also, changing how easy it is to basically extend Pydantic and define how particular types, like for example NumPy arrays, are validated and serialized. But there's also stuff going on. For example, Jiter, the JSON library in Rust that does the JSON parsing, has a SIMD implementation at the moment only for AMD64, so we need to go and add SIMD for other instruction sets. So there's a bunch more we can do on performance. I don't think we're going to go and revolutionize Pydantic, but it's going to continue to get faster, and continue, hopefully, to allow people to do more advanced things. We might add a binary format like CBOR for serialization, for when you just want to put the data into a database and probably load it again from Pydantic.
So there are some things that will come along, but for the most part, it should just get faster and cleaner.
Alessio [00:05:04]: From a focus perspective, I guess, as a founder too, how did you think about the AI interest rising? And then how do you kind of prioritize, okay, this is worth going into more, and we'll talk about Pydantic AI and all of that. What was maybe your early experience with LLMs, and when did you figure out, okay, this is something we should take seriously and focus more resources on?
Samuel [00:05:28]: I'll answer that, but I'll answer what I think is a kind of parallel question first, which is that Pydantic's weird, because Pydantic existed, obviously, before I was starting a company. I was working on it in my spare time, and then at the beginning of 2022, I started working on the rewrite in Rust. And I worked on it full-time for a year and a half, and then once we started the company, people came and joined. And it was a weird project, because that would never get signed off inside a startup. Like, we're going to go off and three engineers are going to work full-on for a year in Python and Rust, writing like 30,000 lines of Rust, just to release a free, open-source Python library. The result of that has been excellent for us as a company, right? As in, it's made us remain entirely relevant. And Pydantic is not just used in the SDKs of all of the AI libraries, but, I can't say which one, but one of the big foundational model companies, when they upgraded from Pydantic v1 to v2, their number one internal metric of performance, time to first token, went down by 20%. So you think about all of the actual AI going on inside, and yet at least 20% of the CPU, or at least the latency, inside requests was actually Pydantic, which shows how widely it's used. So we've benefited from doing that work, although it would never have made financial sense in most companies.
In answer to your question about how we prioritize AI, the honest truth is we've spent a lot of the last year and a half building good general-purpose observability inside Logfire and making Pydantic good for general-purpose use cases. And the AI has kind of come to us. Not that we want to get away from it, but the appetite, both in Pydantic and in Logfire, to go and build with AI is enormous, because it kind of makes sense, right? If you're starting a new greenfield project in Python today, what's the chance that you're using GenAI? 80%, let's say, globally. Obviously it's like a hundred percent in California, but even worldwide, it's probably 80%. Yeah. And so everyone needs that stuff. And there's so much yet to be figured out, so much space to do things better in the ecosystem, in a way that, like, to go and implement a database that's better than Postgres is a Sisyphean task, whereas building tools that are better for GenAI than some of the stuff that's about now is not very difficult, putting the actual models themselves to one side.
Alessio [00:07:40]: And then at the same time, you released Pydantic AI recently, which is, you know, an agent framework. And early on, I would say, everybody, you know, Langchain and a lot of these frameworks, gave Pydantic kind of first-class support; they were trying to use you to be better. What was the decision behind "we should do our own framework"? Were there any design decisions that you disagreed with, any workloads that you think people didn't support well?
Samuel [00:08:05]: It wasn't so much design and workflow, although I think there were some things we've done differently. Yeah. I think looking in general at the ecosystem of agent frameworks, the engineering quality is far below that of the rest of the Python ecosystem.
There's a bunch of stuff that we have learned how to do over the last 20 years of building Python libraries and writing Python code that seems to be abandoned by people when they build agent frameworks. Now I can kind of respect that, particularly in the very first agent frameworks, like Langchain, where they were literally figuring out how to go and do this stuff. It's completely understandable that you would basically skip some stuff.
Samuel [00:08:42]: I'm shocked by the quality of some of the agent frameworks that have come out recently from well-respected names, which just seems to be opportunism, and I have little time for that. But the early ones, I think they were just figuring out how to do stuff, and just as lots of people have learned from Pydantic, we were able to learn a bit from them. I think the gap we saw, and the thing we were frustrated by, was the production readiness. And that means things like type checking, even if type checking makes it hard. Like, Pydantic AI, I will put my hand up now and say it has a lot of generics, and it's probably easier to use if you've written a bit of Rust and you really understand generics. We're not claiming that that makes it the easiest thing to use in all cases; we think it makes it good for production applications in big systems where type checking is a no-brainer in Python. But there are also a bunch of things we've learned from maintaining Pydantic over the years that we've gone and done. So every single example in Pydantic AI's documentation is run as part of the tests, and every single print output within an example is checked during tests. So it will always be up to date.
And then a bunch of things that, like I say, are standard best practice within the rest of the Python ecosystem, but are surprisingly not followed by some AI libraries: coverage, linting, type checking, et cetera, et cetera. I think these are no-brainers, but weirdly they're not followed by some of the other libraries.
Alessio [00:10:04]: And can you just give an overview of the framework itself? I think there are kind of the LLM-calling frameworks, there are the multi-agent frameworks, there are the workflow frameworks. What does Pydantic AI do?
Samuel [00:10:17]: I glaze over a bit when I hear all of the different sorts of frameworks. When I built Pydantic, when I built Logfire and when I built Pydantic AI, my methodology was not to go and research and review all of the other things. I kind of work out what I want and I go and build it, and then feedback comes and we adjust. So the fundamental building block of Pydantic AI is agents. The exact definition of agents and how you want to define them is obviously ambiguous, and our things are probably sort of agent-lite, not that we would want to go and rename them to agent-lite, but the point is you probably build them together to build something most people will call an agent. So an agent in our case has, you know, things like a prompt, like a system prompt, and some tools, and a structured return type if you want it. That covers the vast majority of cases. There are situations where you want to go further, the most complex workflows, where you want graphs. And I resisted graphs for quite a while. I was sort of of the opinion you didn't need them and you could use standard Python flow control to do all of that stuff. I had a few arguments with people, but I basically came around to: yeah, I can totally see why graphs are useful.
But then we have the problem that by default, they're not type safe, because if you have an add_edge method where you give the names of two different edges, there's no type checking, right? And not all the graph libraries are AI-specific; there's a graph library that does basic runtime type checking, ironically using Pydantic to try to make up for the fact that fundamentally those graphs are not type safe. Well, I like Pydantic, but that's not a real solution, having to go and run the code to see if it's safe. There's a reason that static type checking is so powerful. And so, from a lot of iteration, we eventually came up with a system of using normal dataclasses to define nodes, where you return the next node you want to call, and where we're able to go and introspect the return type of a node to basically build the graph. And so the graph is, yeah, inherently type safe. And once we got that right, I'm incredibly excited about graphs. I think there are masses of use cases for them, both in GenAI and other development. But also, software is all going to have to interact with GenAI, right? It's going to be like web. There's no longer a web department in a company; it's just that all the developers are building for web, building with databases. The same is going to be true for GenAI.
Alessio [00:12:33]: Yeah. I see on your docs, you call an agent a container that contains a system prompt, function tools, structured result, dependency type, model, and then model settings. Are the graphs in your mind different agents? Are they different prompts for the same agent? What are the structures in your mind?
Samuel [00:12:52]: So we were compelled enough by graphs once we got them right that we actually merged the PR this morning.
That means our agent implementation, without changing its API at all, is now actually a graph under the hood, as it is built using our graph library. So graphs are basically a lower-level tool that allows you to build these complex workflows. Our agents are technically one of the many graphs you could go and build. And we just happened to build that one for you, because it's a very common, commonplace one. But obviously there are cases where you need more complex workflows, where the current agent assumptions don't work. And that's where you can then go and use graphs to build more complex things.
Swyx [00:13:29]: You said you were cynical about graphs. What changed your mind specifically?
Samuel [00:13:33]: I guess people kept giving me examples of things that they wanted to use graphs for. And my "yeah, but you could do that in standard flow control in Python" became a less and less compelling argument to me, because I've maintained those systems that end up with spaghetti code. And I could see the appeal of this structured way of defining the workflow of my code. And it's really neat that just from your code, just from your type hints, you can get out a mermaid diagram that defines exactly what can go and happen.
Swyx [00:14:00]: Right. Yeah. You do have a very neat implementation of sort of inferring the graph from type hints, I guess, is what I would call it. Yeah. I think the question always is... I have gone back and forth. I used to work at Temporal, where we would actually spend a lot of time complaining about graph-based workflow solutions like AWS Step Functions. And we would actually say that we were better because you could use normal control flow that you already knew and worked with. Yours, I guess, is like a little bit of a nice compromise. It looks like normal Pythonic code, but you just have to keep in mind what the type hints actually mean.
And that's what we do with the quote-unquote magic that the graph construction does.
Samuel [00:14:42]: Yeah, exactly. And if you look at the internal logic of actually running a graph, it's incredibly simple. It's basically: call a node, get a node back, call that node, get a node back, call that node. If you get an end, you're done. We will add in soon support for, well, basically storage, so that you can store the state between each node that's run. And then the idea is you can then distribute the graph and run it across computers. And also, I mean, the other bit that's really valuable is across time. Because it's all very well if you look at lots of the graph examples that, like, Claude will give you. If it gives you an example, it gives you this lovely enormous mermaid chart of, like, the workflow for, for example, managing returns if you're an e-commerce company. But what you realize is some of those lines are literally one function calls another function, and some of those lines are "wait six days for the customer to print their piece of paper and put it in the post". And if you're writing your demo project or your proof of concept, that's fine, because you can just say, and now we call this function. But when you're building in real life, that doesn't work. And now how do we manage that concept, to basically be able to start somewhere else in our code? Well, this graph implementation makes it incredibly easy, because you just pass the node that is the start point for carrying on the graph, and it continues to run.
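The "call a node, get a node back" loop Samuel describes is tiny. Here is a toy illustration of the idea with plain dataclasses; this is not Pydantic AI's actual graph API, just the shape of it, with invented node names:

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class End:
    """Sentinel node: reaching it ends the run."""
    value: int

@dataclass
class Add:
    n: int
    def run(self) -> Double:  # the return type hint tells a static checker
        return Double(self.n + 1)  # (and, in principle, a graph builder)
                                   # which node comes next

@dataclass
class Double:
    n: int
    def run(self) -> End:
        return End(self.n * 2)

def run_graph(node) -> int:
    # The whole engine: call a node, get a node back, until End.
    while not isinstance(node, End):
        node = node.run()
    return node.value

print(run_graph(Add(20)))  # -> 42
```

A real implementation would introspect those return annotations to build and visualize the graph, and could serialize the current node to resume the run later, even on another machine.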
So it's things like that where I was like, yeah, I can just imagine how things I've done in the past would be fundamentally easier to understand if we had done them with graphs.
Swyx [00:16:07]: You say imagine, but can Pydantic AI actually resume, you know, six days later, like you said, or is this just a theoretical thing for someday?
Samuel [00:16:16]: I think it's basically Q&A. So there's an AI that's asking the user a question, and effectively you then call the CLI again to continue the conversation. And it basically instantiates the node and calls the graph with that node again. Now, we don't have the logic yet for effectively storing state in the database between individual nodes; that we're going to add soon. But the rest of it is basically there.
Swyx [00:16:37]: It does make me think that not only are you competing with Langchain now and obviously Instructor, you're now going into sort of the more orchestrated things like Airflow, Prefect, Dagster, those guys.
Samuel [00:16:52]: Yeah, I mean, we're good friends with the Prefect guys, and Temporal have the same investors as us. And I'm sure that my investor Bogomil would not be too happy if I was like, oh yeah, by the way, as well as trying to take on Datadog, we're also going off and trying to take on Temporal and everyone else doing that. Obviously, we're not doing all of the infrastructure of deploying that yet, at least. We're, you know, we're just building a Python library. And what's crazy about our graph implementation is, sure, there's a bit of magic in introspecting the return type, you know, extracting things from unions, stuff like that. But the actual calls, as I say, are literally call a function and get back a thing and call that. It's incredibly simple and therefore easy to maintain. The question is, how useful is it? Well, I don't know yet. I think we have to go and find out.
We've had a slew of people joining our Slack over the last few days and saying, tell me how good Pydantic AI is. How good is Pydantic AI versus Langchain? And I refuse to answer. That's your job to go and find out, not mine. We built a thing. I'm compelled by it, but I'm obviously biased. The ecosystem will work out what the useful tools are.
Swyx [00:17:52]: Bogomil was my board member when I was at Temporal. And I think, just generally, also having been a workflow engine investor and participant in this space, it's a big space. Everyone needs different functions. I think the one thing that I would say is, as a library, you don't have that much control over the infrastructure. I do like the idea that each new agent, or whatever unit of work you call that, should spin up in its own sort of isolated boundaries. Whereas in yours, I think, everything runs in the same process. But you ideally want it to sort of spin out its own little container of things.
Samuel [00:18:30]: I agree with you a hundred percent. And we will. It would work now, right? As in, in theory, as long as you can serialize the calls to the next node, all of the different containers basically just have to have the same code. I mean, I'm super excited about Cloudflare Workers running Python and being able to install dependencies. And if Cloudflare could only give me my invitation to the private beta of that, we would be exploring that right now, because I'm super excited about that as a compute layer for some of this stuff, where, exactly what you're saying, basically, you can run everything as an individual worker function and distribute it. And it's resilient to failure, et cetera, et cetera.
Swyx [00:19:08]: And it spins up like a thousand instances simultaneously. You know, you want it to be sort of truly serverless at once.
Actually, I know we have some Cloudflare friends who are listening, so hopefully they'll get you to the front of the line.
Samuel [00:19:19]: I was in Cloudflare's office last week shouting at them about other things that frustrate me. I have a love-hate relationship with Cloudflare. Their tech is awesome. But because I use it the whole time, I then get frustrated. So, yeah, I'm sure I will get there soon.
Swyx [00:19:32]: There's a side tangent on Cloudflare. Is Python fully supported? I actually wasn't fully aware of what the status of that thing is.
Samuel [00:19:39]: Yeah. So Pyodide, which is Python running inside the browser via WebAssembly, is supported now by Cloudflare. They're having some struggles working out how to manage, ironically, dependencies that have binaries, in particular Pydantic. Because with these workers, where you can have thousands of them on a given metal machine, you don't want to have a difference; you basically want to be able to have shared memory for all the different Pydantic installations, effectively. That's the thing they're working out. But Hood, who's my friend, who is the primary maintainer of Pyodide, works for Cloudflare. And that's basically what he's doing: working out how to get Python running on Cloudflare's network.
Swyx [00:20:19]: I mean, the nice thing is that your binary is really written in Rust, right? Yeah. Which also compiles to WebAssembly. Yeah. So maybe there's a way that you'd build... you have just a different build of Pydantic, and that ships with whatever your distro for Cloudflare Workers is.
Samuel [00:20:36]: Yes, that's exactly it. So Pyodide has builds for Pydantic Core and for things like NumPy and basically all of the popular binary libraries. And you're doing exactly that, right? You're using Rust to compile to WebAssembly and then you're calling that shared library from Python.
And it's unbelievably complicated, but it works. Okay.Swyx [00:20:57]: Staying on graphs a little bit more, and then I wanted to go to some of the other features that you have in Pydantic AI. I see in your docs, there are sort of four levels of agents. There's single agents, there's agent delegation, programmatic agent handoff. That seems to be what OpenAI Swarm would be like. And then the last one, graph-based control flow. Would you say that those are sort of the mental hierarchy of how these things go?Samuel [00:21:21]: Yeah, roughly. Okay.Swyx [00:21:22]: You had some expression around OpenAI Swarm. Well.Samuel [00:21:25]: And indeed, OpenAI have got in touch with me and basically, maybe I'm not supposed to say this, but basically said that Pydantic AI looks like what Swarm would become if it was production ready. So, yeah. I mean, like, yeah, which makes sense. Awesome. Yeah. I mean, in fact, it was specifically asking, how can we give people the same feeling that they were getting from Swarm, that led us to go and implement graphs? Because my, like, "just call the next agent with Python code" was not a satisfactory answer to people. So it was like, okay, we've got to go and have a better answer for that. That's what led us to get to graphs. Yeah.Swyx [00:21:56]: I mean, it's a minimal viable graph in some sense. What are the shapes of graphs that people should know? So the way that I would phrase this is, I think Anthropic did a very good public service, and also a kind of surprisingly influential blog post, I would say, when they wrote Building Effective Agents. We actually have the authors coming to speak at my conference in New York, which I think you're giving a workshop at. Yeah.Samuel [00:22:24]: I'm trying to work it out. But yes, I think so.Swyx [00:22:26]: Tell me if you're not.
Yeah, I mean, like, that was the first, I think, authoritative view of, like, what kinds of graphs exist in agents, and let's give each of them a name so that everyone is on the same page. So I'm just kind of curious if you have community names or top five patterns of graphs.Samuel [00:22:44]: I don't have top five patterns of graphs. I would love to see what people are building with them. But like, it's only been a couple of weeks. And part of the point is that, because they're relatively unopinionated about what you can go and do with them, you can go and do lots of things with them, but they don't have the structure to go and have, like, specific names, as much as perhaps some other systems do. I think what our agents are, which have a name and I can't remember what it is, but this basically system of, like, decide what tool to call, go back to the center, decide what tool to call, go back to the center, and then exit, is one form of graph. Which, as I say, like, our agents are effectively one implementation of a graph, which is why under the hood they are now using graphs. And it'll be interesting to see over the next few years whether we end up with these, like, predefined graph names or graph structures, or whether it's just like, yep, I built a graph, or whether graphs just turn out not to match people's mental image of what they want and die away. We'll see.Swyx [00:23:38]: I think there is always appeal. Every developer eventually gets graph religion and goes, oh, yeah, everything's a graph. And then they probably over-rotate and go too far into graphs. And then they have to learn a whole bunch of DSLs. And then they're like, actually, I didn't need that. I need this. And they scale back a little bit.Samuel [00:23:55]: I'm at the beginning of that process. I'm currently a graph maximalist, although I haven't actually put any into production yet.
But yeah.Swyx [00:24:02]: This has a lot of philosophical connections with other work coming out of UC Berkeley on compounding AI systems. I don't know if you know of or care about that. This is the Gartner world of things, where they need some kind of industry terminology to sell it to enterprises. I don't know if you know about any of that.Samuel [00:24:24]: I haven't. I probably should. I should probably do it because I should probably get better at selling to enterprises. But no, no, I don't. Not right now.Swyx [00:24:29]: The argument is really that instead of putting everything in one model, you have more control and maybe more observability if you break everything out into composing little models and chaining them together. And obviously, then you need an orchestration framework to do that. Yeah.Samuel [00:24:47]: And it makes complete sense. And one of the things we've seen with agents is they work well when they work well. But when they don't, even if you have the observability through Logfire so that you can see what was going on, if you don't have a nice hook point to say, hang on, this has all gone wrong, you have a relatively blunt instrument of basically erroring when you exceed some kind of limit. But what you need to be able to do is effectively iterate through these runs so that you can have your own control flow, where you're like, OK, we've gone too far. And that's where one of the neat things about our graph implementation is: you can basically call next in a loop rather than just running the full graph. And therefore, you have this opportunity to break out of it. But yeah, basically, it's the same point, which is, if you have too big a unit of work, to some extent, whether or not it involves gen AI, but obviously it's particularly problematic in gen AI, you only find out afterwards, when you've spent quite a lot of time and/or money, that it's gone off and done the wrong thing.Swyx [00:25:39]: One last thing on this.
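The "call next in a loop" pattern described here can be sketched generically. This is an illustrative runner under stated assumptions, not pydantic-graph's actual API: `Step`, `run_with_budget`, and the dict-based state are all made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    """A node: its run function mutates state and names the next node (None = done)."""
    name: str
    run: Callable[[dict], Optional[str]]

def run_with_budget(steps: dict[str, Step], start: str, state: dict, max_steps: int) -> dict:
    # Drive the graph one node at a time, so the caller owns the control
    # flow and can bail out when a budget is exceeded, instead of only
    # discovering the overrun after the full run has errored.
    current: Optional[str] = start
    for _ in range(max_steps):
        if current is None:
            return state  # graph reached its end node normally
        current = steps[current].run(state)
    state["aborted"] = True  # budget exhausted: break out gracefully
    return state
```

The point is that stepping manually gives you a hook between every node, which is exactly where "hang on, this has all gone wrong" checks belong.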
We're not going to resolve this here, but I'll drop this and then we can move on to the next thing. This is the common way that we developers talk about this. And then the machine learning researchers look at us and laugh and say, that's cute. And then they just train a bigger model and they wipe us out in the next training run. So I think there's a certain amount of, we are fighting the bitter lesson here. We're fighting AGI. And, you know, when AGI arrives, this will all go away. Obviously, on Latent Space, we don't really discuss that, because I think AGI is kind of this hand-wavy concept that isn't super relevant. But I think we have to respect that. For example, you could do chain of thought with graphs, and you could manually orchestrate a nice little graph that does, like, reflect, think about whether you need more inference-time compute, you know, that's the hot term now, and then think again and, you know, scale that up. Or you could train Strawberry and DeepSeek R1. Right.Samuel [00:26:32]: I saw someone saying recently, oh, they were really optimistic about agents because models are getting faster exponentially. And it took a certain amount of self-control not to point out that it wasn't exponential. But my main point was, if models are getting better as quickly as you say they are, then we don't need agents and we don't really need any of these abstraction layers. We can just give our model, you know, access to the Internet, cross our fingers and hope for the best. Agents, agent frameworks, graphs, all of this stuff is basically making up for the fact that right now the models are not that clever. In the same way that if you're running a customer service business and you have loads of people sitting answering telephones, the less well trained they are, the less that you trust them, the more that you need to give them a script to go through.
Whereas, you know, if you're running a bank and you have lots of customer service people who you don't trust that much, then you tell them exactly what to say. If you're doing high-net-worth banking, you just employ people who you think are going to be charming to other rich people and set them off to go and have coffee with people. Right. And the same is true of models. The more intelligent they are, the less we need to, like, structure what they go and do and constrain the routes they take.Swyx [00:27:42]: Yeah. Yeah. Agree with that. So I'm happy to move on. So the other parts of Pydantic AI that are worth commenting on, and this is like my last rant, I promise. So obviously, every framework needs to do its sort of model adapter layer, which is, oh, you can easily swap from OpenAI to Claude to Groq. You also have, which I didn't know about, Google GLA, which I didn't really know about until I saw this in your docs, which is the Generative Language API. I assume that's AI Studio? Yes.Samuel [00:28:13]: Google don't have good names for it. So Vertex is very clear. That seems to be the API that, like, some of the things use, although it returns 503 about 20% of the time. So... Vertex? No. Vertex, fine. But the... Oh, oh. GLA. Yeah. Yeah.Swyx [00:28:28]: I agree with that.Samuel [00:28:29]: So we have, again, another example of where I think we go the extra mile in terms of engineering: we run, on every commit, at least every commit to main, tests against the live models. Not lots of tests, but like a handful of them. Oh, okay. And we had a point last week where, yeah, GLA is a little bit better now, but GLA was failing every single run; one of its tests would fail. And I think we might even have commented that one out at the moment. So all of the models fail more often than you might expect, but that one seems to be particularly likely to fail.
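An endpoint that 503s one request in five is exactly where a retry wrapper pays off. A minimal sketch, where the `TransientError` type, the backoff constants, and the injectable `sleep` are assumptions for the example, not what Pydantic AI does internally:

```python
import time

class TransientError(Exception):
    """Stand-in for whatever a real client raises on an HTTP 503."""

def call_with_retries(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    # Retry a flaky model-API call with exponential backoff;
    # re-raise once the attempt budget is exhausted.
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Injecting `sleep` keeps the wrapper unit-testable without real delays, the same "swap the slow dependency for a fake" idea discussed later for model mocking.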
But Vertex is the same API, but much more reliable.Swyx [00:29:01]: My rant here is that, you know, versions of this appear in Langchain, and every single framework has to have its own little version of that. I would put to you, and then, you know, this can be agree-to-disagree, this is not needed in Pydantic AI. I would much rather you adopt a layer like LiteLLM, or, what's the other one in JavaScript, Portkey. And that's their job. They focus on that one thing and they normalize APIs for you. All new models are automatically added and you don't have to duplicate this inside of your framework. So for example, if I wanted to use DeepSeek, I'm out of luck because Pydantic AI doesn't have DeepSeek yet.Samuel [00:29:38]: Yeah, it does.Swyx [00:29:39]: Oh, it does. Okay. I'm sorry. But you know what I mean? Should this live in your code or should it live in a layer that's kind of your API gateway, that's a defined piece of infrastructure that people have?Samuel [00:29:49]: And I think if a company who are well known, who are respected by everyone, had come along and done this at the right time, maybe a year and a half ago, and said, we're going to be the universal AI layer, that would have been a credible thing to do. I've heard varying reports of LiteLLM, is the truth. And it didn't seem to have exactly the type safety that we needed. Also, as I understand it, and again, I haven't looked into it in great detail, part of their business model is proxying the request through their own system to do the generalization. That would be an enormous put-off to an awful lot of people. Honestly, the truth is, I don't think it is that much work unifying the models. I get where you're coming from. I kind of see your point. I think the truth is that everyone is centralizing around OpenAI's API. So DeepSeek support that. Groq, with a Q, support that. Ollama also does it.
I mean, if there is that library right now, it's more or less the OpenAI SDK. And it's very high quality. It's well type-checked. It uses Pydantic. So I'm biased. But I mean, I think it's pretty well respected anyway.Swyx [00:30:57]: There's different ways to do this. Because also, it's not just about normalizing the APIs. You have to do secret management and all that stuff.Samuel [00:31:05]: Yeah. And there's also Vertex and Bedrock, which, to one extent or another, effectively host multiple models, but they don't unify the API. But they do unify the auth, as I understand it. Although we're halfway through doing Bedrock, so I don't know about it that well. But they're kind of weird hybrids, because they support multiple models. But like I say, the auth is centralized.Swyx [00:31:28]: Yeah, I'm surprised they don't unify the API. That seems like something that I would do. You know, we can discuss all this all day. There's a lot of APIs. I agree.Samuel [00:31:36]: It would be nice if there was a universal one that we didn't have to go and build.Alessio [00:31:39]: And I guess the other side of, you know, routing models and picking models is evals. How do you actually figure out which one you should be using? I know you have one. First of all, you have very good support for mocking in unit tests, which is something that a lot of other frameworks don't do. So, you know, my favorite Ruby library is VCR, because it just lets me store the HTTP requests and replay them. That part I'll kind of skip. I think you also have this TestModel, which is just pure Python: you try and figure out what the model might respond without actually calling the model. And then you have the FunctionModel, where people can kind of customize outputs. Any other fun stories maybe from there? Or is it just what you see is what you get, so to speak?Samuel [00:32:18]: On those two, I think what you see is what you get.
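The idea behind test and function models, swapping the real client for a deterministic stand-in, can be sketched like this. `StubModel` and `summarize` are illustrative names, not Pydantic AI's `TestModel`/`FunctionModel` API:

```python
class StubModel:
    """Deterministic stand-in for a real LLM client in unit tests."""
    def __init__(self, responses):
        self.responses = list(responses)
        self.prompts = []  # record what the caller sent, for assertions

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.responses.pop(0)

def summarize(model, text: str) -> str:
    # Application code depends only on the `complete` interface,
    # so tests can pass a StubModel instead of a real client.
    return model.complete(f"Summarize: {text}")
```

Because the stub records every prompt, a test can assert on both what the agent asked and what it did with the canned answer, with no network calls at all.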
On the evals, I think, watch this space. I think it's something that, again, I was somewhat cynical about for some time. I still have my cynicism about some of it. Well, it's unfortunate that so many different things are called evals. It would be nice if we could agree what they are and what they're not. But look, I think it's a really important space. I think it's something that we're going to be working on soon, both in Pydantic AI and in LogFire, to try and support better, because it's an unsolved problem.Alessio [00:32:45]: Yeah, you do say in your docs that anyone who claims to know for sure exactly how your evals should be defined can safely be ignored.Samuel [00:32:52]: We'll delete that sentence when we tell people how to do their evals.Alessio [00:32:56]: Exactly. I was like, we need a snapshot of this today. And so let's talk about evals. So there's kind of like the vibe. Yeah. So you have evals, which is what you do when you're building, right? Because you cannot really test it that many times to get statistical significance. And then there's the production eval. So you also have LogFire, which is kind of like your observability product, which I tried before. It's very nice. What are some of the learnings you've had from building an observability tool for LLMs? And yeah, as people think about evals, even, what are the right things to measure? What are the right number of samples that you need to actually start making decisions?Samuel [00:33:33]: I'm not the best person to answer that, is the truth. So I'm not going to come in here and tell you that I think I know the answer on the exact number. I mean, we can do some back-of-the-envelope statistics calculations to work out that, like, having 30 probably gets you most of the statistical value of having 200 for, you know, by definition, 15% of the work. But the exact, like, how many examples do you need?
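That back-of-the-envelope claim can be checked with the standard error of an observed pass rate: precision scales with the square root of the sample size, so each extra example buys less than the last.

```python
import math

def pass_rate_stderr(p: float, n: int) -> float:
    # Standard error of a pass rate p measured over n eval examples.
    return math.sqrt(p * (1 - p) / n)

# Worst case p = 0.5: 30 examples give roughly a +/-9 point standard error,
# 200 examples roughly +/-3.5 points, for more than 6x the work.
se_30 = pass_rate_stderr(0.5, 30)
se_200 = pass_rate_stderr(0.5, 200)
```

Whether +/-9 points is "most of the value" depends on the decision you're making, which is Samuel's real point: there is no universal answer on sample size.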
For example, that's a much harder question to answer, because it's, you know, deep within how models operate. In terms of LogFire, one of the reasons we built LogFire the way we have, where we allow you to write SQL directly against your data and we're trying to build the, like, powerful fundamentals of observability, is precisely because we know we don't know the answers. And so allowing people to go and innovate on how they're going to consume that stuff and how they're going to process it is, we think, valuable. Because even if we come along and offer you an evals framework on top of LogFire, it won't be right in all regards. And we want people to be able to go and innovate, and being able to write their own SQL connected to the API and effectively query the data like it's a database allows people to innovate on that stuff. And that's what allows us to do it as well. I mean, we do a bunch of, like, testing what's possible by basically writing SQL directly against LogFire, as any user could. I think the other really interesting bit that's going on in observability is OpenTelemetry centralizing around semantic attributes for GenAI. So it's a relatively new project. A lot of it's still being added at the moment. But basically the idea is that they unify how both SDKs and/or agent frameworks send observability data to any OpenTelemetry endpoint. And so, again, having that unification allows us to go and, like, basically compare different libraries, compare different models much better. That stuff's in a very early stage of development. One of the things we're going to be working on pretty soon is, basically, I suspect Pydantic AI will be the first agent framework that implements those semantic attributes properly. Because, again, we control it, and we can say this is important for observability, whereas most of the other agent frameworks are not maintained by people who are trying to do observability.
With the exception of Langchain, where they have the observability platform, but they chose not to go down the OpenTelemetry route. So they're, like, plowing their own furrow. And, you know, they're even further away from standardization.Alessio [00:35:51]: Can you maybe just give a quick overview of how OTel ties into the AI workflows? There's kind of like the question of what is, you know, a trace, and is a span, like, an LLM call? Is it the agent? Is it kind of like the broader thing you're tracking? How should people think about it?Samuel [00:36:06]: Yeah, so they have a PR, that I think may have now been merged, from someone at IBM talking about remote agents and trying to support this concept of remote agents within GenAI. I'm not particularly compelled by that, because I don't think that that's actually by any means the common use case. But like, I suppose it's fine for it to be there. The majority of the stuff in OTel is basically defining how you would instrument a given call to an LLM. So basically the actual LLM call: what data you would send to your telemetry provider, how you would structure that. Apart from this slightly odd stuff on remote agents, most of the, like, agent-level consideration is not yet implemented, is not yet decided, effectively. And so there's a bit of ambiguity. Obviously, what's good about OTel is you can in the end send whatever attributes you like. But yeah, there's quite a lot of churn in that space and exactly how we store the data. I think that one of the most interesting things, though, is that if you think about observability traditionally, sure, everyone would say our observability data is very important, we must keep it safe. But actually, companies work very hard to basically not have anything that sensitive in their observability data. So if you're a doctor in a hospital and you search for a drug for an STI, the SQL might be sent to the observability provider, but none of the parameters would.
It wouldn't have the patient number or their name or the drug. With GenAI, that distinction doesn't exist, because it's all just mixed up in the text. If you have that same patient asking an LLM what drug they should take or how to stop smoking, you can't extract the PII and not send it to the observability platform. So the sensitivity of the data that's going to end up in observability platforms is going to be, like, basically a different order of magnitude to what you would normally send to Datadog. Of course, you can make a mistake and send someone's password or their card number to Datadog. But that would be seen as a, like, mistake. Whereas in GenAI, a lot of sensitive data is just going to be sent. And I think that's why companies like LangSmith are trying hard to offer observability on-prem, because there's a bunch of companies who are happy for Datadog to be cloud hosted, but want self-hosting for this observability stuff with GenAI.Alessio [00:38:09]: And are you doing any of that today? Because I know in each of the spans you have, like, the number of tokens, you have the context, you're just storing everything. And then you're going to offer kind of like self-hosting for the platform, basically. Yeah. Yeah.Samuel [00:38:23]: So we have scrubbing roughly equivalent to what the other observability platforms have. So if we see password as the key, we won't send the value. But like I said, that doesn't really work in GenAI. So we're accepting we're going to have to store a lot of data, and then we'll offer self-hosting for those people who can afford it and who need it.Alessio [00:38:42]: And then this is, I think, the first time that most of the workload's performance is depending on a third party. You know, like if you're looking at Datadog data, usually it's your app that is driving the latency and, like, the memory usage and all of that.
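Key-based scrubbing of the kind described, redacting by key name rather than by content, is easy to sketch, and the sketch also shows exactly why it fails for GenAI: free-form message text passes straight through. The key list and payload structure are illustrative, not Logfire's actual scrubber:

```python
SENSITIVE_KEYS = {"password", "api_key", "card_number"}

def scrub(value, key=None):
    # Recursively redact values whose *key* looks sensitive; values whose
    # sensitivity lives inside free text are untouched, by construction.
    if isinstance(value, dict):
        return {k: scrub(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if key and key.lower() in SENSITIVE_KEYS:
        return "[REDACTED]"
    return value
```

Run it on a span payload containing both a `password` field and an LLM message list: the password is redacted, but the patient's question survives verbatim, which is the order-of-magnitude problem Samuel is pointing at.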
Here you're going to have spans that maybe take a long time to perform, because the GLA API is not working or because OpenAI is kind of like overwhelmed. Do you do anything there, since, like, the provider is almost like the same across customers? You know, like, are you trying to surface these things for people and say, hey, this was like a very slow span, but actually all customers using OpenAI right now are seeing the same thing, so maybe don't worry about it?Samuel [00:39:20]: Not yet. We do a few things that people don't generally do in OTel. So we send information at the beginning of a span, as well as when it finishes. By default, OTel only sends you data when the span finishes. So if you think about a request which might take like 20 seconds, even if some of the intermediate spans finished earlier, you can't basically place them on the page until you get the top-level span. And so if you're using standard OTel, you can't show anything until those requests are finished. When those requests are taking a few hundred milliseconds, it doesn't really matter. But when you're doing Gen AI calls, or when you're, like, running a batch job that might take 30 minutes, that latency of not being able to see the span is, like, crippling to understanding your application. And so we do a bunch of slightly complex stuff to basically send data about a span as it starts, which is closely related. Yeah.Alessio [00:40:09]: Any thoughts on all the other people trying to build on top of OpenTelemetry in different languages, too? There's, like, the OpenLLMetry project, which doesn't really roll off the tongue. But how do you see the future of these kinds of tools? Is everybody going to have to build? Why does everybody want to build?
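The "send data when the span starts" idea can be sketched as a context manager that emits two records instead of one. `exporter` here is just any callable, an assumption standing in for a real OTel exporter pipeline:

```python
import time

class EagerSpan:
    """Emit a record when the span *starts*, not only when it finishes,
    so long-running GenAI calls show up in the UI immediately."""
    def __init__(self, name, exporter):
        self.name, self.exporter = name, exporter

    def __enter__(self):
        self.start = time.monotonic()
        self.exporter({"name": self.name, "phase": "start"})  # visible right away
        return self

    def __exit__(self, *exc):
        self.exporter({
            "name": self.name,
            "phase": "end",
            "duration_s": time.monotonic() - self.start,
        })
        return False
```

The backend then has to reconcile start and end records for the same span, which is the "slightly complex stuff" Samuel alludes to; standard OTel avoids that complexity by only exporting completed spans.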
They want to build their own open source observability thing to then sell?Samuel [00:40:29]: I mean, we are not going off and trying to instrument the likes of the OpenAI SDK with the new semantic attributes, because at some point that's going to happen and it's going to live inside OTel, and we might help with it. But we're a tiny team. We don't have time to go and do all of that work. So OpenLLMetry: interesting project. But I suspect eventually most of that instrumentation of the big SDKs will live, like I say, inside the main OpenTelemetry repo. I suppose what happens to the agent frameworks, what data you basically need at the framework level to get the context, is kind of unclear. I don't think we know the answer yet. But I mean, I was on, I guess this is kind of semi-public, because I was on the OpenTelemetry call last week talking about GenAI. And there was someone from Arize talking about the challenges they have trying to get OpenTelemetry data out of Langchain, where it's not, like, natively implemented. And obviously they're having quite a tough time. And I was realizing, I hadn't really realized this before, but how lucky we are to primarily be talking about our own agent framework, where we have the control, rather than trying to go and instrument other people's.Swyx [00:41:36]: Sorry, I actually didn't know about this semantic conventions thing. It looks like, yeah, it's merged into main OTel. What should people know about this? I had never heard of it before.Samuel [00:41:45]: Yeah, I think it looks like a great start. I think there's some unknowns around how you send the messages that go back and forth, which is kind of the most important part. It's the most important thing of all. And that has moved out of attributes and into OTel events. OTel events, in turn, are moving from being on a span to being their own top-level API where you send data. So there's a bunch of churn still going on.
I'm impressed by how fast the OTel community is moving on this project. I guess they, like everyone else, get that this is important, and it's something that people are crying out to get instrumentation of. So I'm kind of pleasantly surprised at how fast they're moving, but it makes sense.Swyx [00:42:25]: I'm just kind of browsing through the specification. I can already see that this basically bakes in whatever the previous paradigm was. So now they have gen_ai.usage.prompt_tokens and gen_ai.usage.completion_tokens. And obviously now we have reasoning tokens as well. And then only one form of sampling, which is top-p. You're basically baking in, or sort of reifying, things that you think are important today, but it's not a super foolproof way of doing this for the future. Yeah.Samuel [00:42:54]: I mean, that's what's neat about OTel: you can always go and send another attribute, and that's fine. It's just there are a bunch that are agreed on. But I would say, you know, to come back to your previous point about whether or not we should be relying on one centralized abstraction layer: this stuff is moving so fast that if you start relying on someone else's standard, you risk basically falling behind, because you're relying on someone else to keep things up to date.Swyx [00:43:14]: Or you fall behind because you've got other things going on.Samuel [00:43:17]: Yeah, yeah. That's fair. That's fair.Swyx [00:43:19]: Any other observations just about building LogFire, actually? Let's just talk about this. So you announced LogFire. I was kind of only familiar with LogFire because of your Series A announcement. I actually thought you were making a separate company. I remember some amount of confusion with you when that came out. So to be clear, it's Pydantic LogFire, and the company is one company that has kind of two products, an open source thing and an observability thing, correct? Yeah. I was just kind of curious, like, any learnings building LogFire?
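In practice, a span for an LLM call just carries a bag of these attributes. A sketch using the names discussed above, with the caveat from the conversation itself: the conventions are churning, so treat the exact keys as a snapshot, and the helper and model name here are illustrative:

```python
def genai_span_attributes(model, prompt_tokens, completion_tokens, top_p=None):
    # Build the attribute dict a GenAI span would carry; optional request
    # parameters are only included when the caller actually set them.
    attrs = {
        "gen_ai.request.model": model,
        "gen_ai.usage.prompt_tokens": prompt_tokens,
        "gen_ai.usage.completion_tokens": completion_tokens,
    }
    if top_p is not None:
        attrs["gen_ai.request.top_p"] = top_p
    return attrs
```

And because OTel attributes are open-ended, anything the spec hasn't named yet (reasoning tokens, other sampling parameters) can still be sent as extra keys, which is the escape hatch Samuel points to.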
So the classic question is, do you use ClickHouse? Is this like the standard persistence layer? Any learnings doing that?Samuel [00:43:54]: We don't use ClickHouse. We started building our database with ClickHouse, moved off ClickHouse onto Timescale, which is a Postgres extension to do analytical databases. Wow. And then moved off Timescale onto DataFusion. And we're basically now building, it's DataFusion, but it's kind of our own database. Bogomil is not entirely happy that we went through three databases before we chose one. I'll say that. But like, we've got to the right one in the end. I think we could have realized sooner that Timescale wasn't right, and the same with ClickHouse. They both taught us a lot, and we're in a great place now. But like, yeah, it's been a real journey on the database in particular.Swyx [00:44:28]: Okay. So, you know, as a database nerd, I have to, like, double-click on this, right? So ClickHouse is supposed to be the ideal backend for anything like this. And then moving from ClickHouse to Timescale is another counterintuitive move that I didn't expect, because, you know, Timescale is like an extension on top of Postgres. Not super meant for, like, high-volume logging. But like, yeah, tell us those decisions.Samuel [00:44:50]: So at the time, ClickHouse did not have good support for JSON. I was speaking to someone yesterday and said ClickHouse doesn't have good support for JSON, and got roundly stepped on, because apparently it does now. So they've obviously gone and built their proper JSON support. But back when we were trying to use it, I guess a year ago, or a bit more than a year ago, everything had to be a map, and maps are a pain for trying to do, like, lookups on JSON-type data. And obviously all these attributes, everything you're talking about there in terms of the GenAI stuff: you can choose to make them top-level columns if you want, but the simplest thing is just to put them all into a big JSON pile. And that was a problem with ClickHouse.
Also, ClickHouse had some really ugly edge cases. Like, by default, or at least until I complained about it a lot, ClickHouse thought that two nanoseconds was longer than one second, because they compared intervals just by the number, not the unit. And I complained about that a lot. And then they caused it to raise an error and just say you have to have the same unit. Then I complained a bit more. And I think, as I understand it, now they have some conversion between units. But stuff like that, when a lot of what you're doing is comparing the duration of spans, was really painful. Also things like: you can't subtract two datetimes to get an interval; you have to use the date_sub function. But the fundamental thing is, because we want our end users to write SQL, the, like, quality of the SQL, how easy it is to write, matters way more to us than if you're building, like, a platform on top, where your developers are going to write the SQL, and once it's written and it's working, you don't mind too much. So I think that's, like, one of the fundamental differences. The other problem that I have with both ClickHouse and, in fact, Timescale is that, like, the ultimate architecture, the, like, Snowflake architecture of binary data in object store queried with some kind of cache from nearby: they both have it, but it's closed source, and you only get it if you go and use their hosted versions. And so even if we had got through all the problems with Timescale or ClickHouse, we would end up, like, you know, they would want to be taking their 80% margin, and then we would be wanting to take ours, which would basically leave us less space for margin. Whereas DataFusion is properly open source; all of that same tooling is open source. And for us, as a team of people with a lot of Rust expertise, DataFusion, which is implemented in Rust, we can literally dive into it and go and change it.
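The interval bug described here, comparing the number while ignoring the unit, disappears as soon as intervals are normalized to a common base before comparison. A tiny sketch of the fix:

```python
# Nanoseconds per unit, so every interval can be normalized before comparing.
UNIT_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}

def to_ns(value: float, unit: str) -> int:
    return int(value * UNIT_NS[unit])

# Comparing raw numbers would say 2 ns > 1 s (2 > 1);
# normalizing to nanoseconds gives the right answer.
two_ns_longer = to_ns(2, "ns") > to_ns(1, "s")
```

This is the same design question the database faced: either refuse mixed-unit comparisons with an error, as ClickHouse first did, or convert, as it apparently does now.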
So, for example, I found that there were some slowdowns in DataFusion's string comparison kernel for doing, like, string contains. And it's just Rust code. And I could go and rewrite the string comparison kernel to be faster. Or, for example, DataFusion, when we started using it, didn't have JSON support. Obviously, as I've said, it's something we can do, and it's something we needed. I was able to go and implement that in a weekend using our JSON parser that we built for Pydantic Core. So it's the fact that, like, DataFusion is for us the perfect mixture of a toolbox to build a database with, not a database. And we can go and implement stuff on top of it in a way that, like, if you were trying to do that in Postgres or in ClickHouse... I mean, ClickHouse would be easier, because it's C++, relatively modern C++. But as a team of people who are not C++ experts, that's much scarier than DataFusion for us.Swyx [00:47:47]: Yeah, that's a beautiful rant.Alessio [00:47:49]: That's funny. Most people don't think they have agency on these projects. They're kind of like, oh, I should use this or I should use that. They're not really like, what should I pick so that I contribute the most back to it? You know, so, but I think you obviously have an open-source-first mindset. So that makes a lot of sense.Samuel [00:48:05]: I think if we were a better startup, and faster moving, and just, like, headlong determined to get in front of customers as fast as possible, we should have just started with ClickHouse. I hope that long term we're in a better place for having worked with DataFusion. We're quite engaged now with the DataFusion community. Andrew Lamb, who maintains DataFusion, is an advisor to us. We're in a really good place now. But yeah, it's definitely slowed us down relative to just, like, building on ClickHouse and moving as fast as we can.Swyx [00:48:34]: OK, we're about to zoom out and do Pydantic.run and all the other stuff.
But, you know, my last question on LogFire is: at some point you run out of community goodwill, the "oh, I use Pydantic, I love Pydantic, I'm going to use LogFire" effect. OK, then you start entering the territory of the Datadogs, the Sentrys and the Honeycombs. So where are you really going to spike here? What's the differentiator?

Samuel [00:48:59]: I wasn't writing code in 2001, but I'm assuming that there were people talking about web observability, and then web observability stopped being a thing, not because the web stopped being a thing, but because all observability had to do web. If you were talking to people in 2010 or 2012, they would have talked about cloud observability. Now that's not a term, because all observability is cloud-first. The same is going to happen to gen AI. And so whether you're trying to compete with Datadog or with Arize and LangSmith, you've got to do general-purpose observability with first-class support for AI. And as far as I know, we're the only people really trying to do that. I mean, I think Datadog is starting in that direction, and to be honest, I think Datadog is a much scarier company to compete with than the AI-specific observability platforms. Because in my opinion, and I've also heard this from lots of customers, AI-specific observability, where you don't see everything else going on in your app, is not actually that useful. Our hope is that we can build the first general-purpose observability platform with first-class support for AI, and that we have this open source heritage of putting developer experience first that other companies haven't had. For all that I'm a fan of Datadog and what they've done: if you search "Datadog logging Python" and you just try, as a non-observability expert, to get something up and running with Datadog and Python, it's not trivial, right? That's something Sentry have done amazingly well.
But there's enormous space in most of observability to do DX better.

Alessio [00:50:27]: Since you mentioned Sentry, I'm curious how you thought about licensing and all of that. Obviously you're MIT licensed; you don't have any rolling license like Sentry has, where only the one-year-old version of it is open source. Was that a hard decision?

Samuel [00:50:41]: So to be clear, LogFire is closed source. Pydantic and Pydantic AI are MIT licensed and properly open source, and then LogFire, for now, is completely closed source. And in fact, the struggles that Sentry have had with licensing, and the weird pushback the community gives when they take something that's closed source and make it source available, just meant that we avoided that whole subject matter. I think the other way to look at it is that, in terms of either headcount or revenue or dollars in the bank, the amount of open source we do as a company puts us up there with the most prolific open source companies, like I say, per head. And so we didn't feel like we were morally obligated to make LogFire open source. We have Pydantic; Pydantic is a foundational library in Python. That and now Pydantic AI are our contribution to open source. And then LogFire is openly for-profit, right? As in, we're not claiming otherwise. We're not trying to walk the line of "it's open source, but really we make it hard to deploy, so you probably want to pay us." We're trying to be straight that it's a product to pay for. We could change that at some point in the future, but it's not an immediate plan.

Alessio [00:51:48]: All right. So the first one I saw, this new thing, I don't know if it's a product you're building, pydantic.run, which is a Python browser sandbox. What was the inspiration behind that? We talk a lot about code interpreters for LLMs. I'm an investor in a company called E2B, which is a code sandbox as a service for remote execution. Yeah.
What's the pydantic.run story?

Samuel [00:52:09]: So pydantic.run is, again, completely open source. I have no interest in making it into a product. We just needed a sandbox to be able to demo LogFire in particular, but also Pydantic AI. So it doesn't have it yet, but I'm going to add basically a proxy to OpenAI and the other models so that you can run Pydantic AI in the browser, see how it works, tweak the prompt, et cetera, et cetera. And we'll have some kind of limit per day on what you can spend on it, or what the spend is. The other thing we wanted to b
Guest Alison Hill Panelist Richard Littauer Show Notes We're kicking off the new year of Sustain with host Richard Littauer discussing sustaining open source software with guest Alison Hill, VP of Product at Anaconda and a cognitive scientist with a PhD in psychology. Alison shares her journey from academia to industry, emphasizing the importance of statistics and data science in her career. She explains her role at Anaconda, focusing on developing secure and compatible distribution of Python packages and managing the community repository, Anaconda.org. The conversation covers the significance of product management in open source projects, particularly those with corporate backing, and how these roles can help in balancing user needs and business goals. In addition, Alison shares her thoughts on the challenges and strategies for maintaining open source projects without corporate support and touches on the 'palmerpenguins' project. [00:01:13] Alison discusses her transition from academic research in cognitive science to industry and data science, emphasizing her passion for statistics and education. [00:02:41] Alison explains her work at Anaconda, focusing on product management and the Anaconda distribution, aiming to ease the use of Python and R packages in industry and academia. She also elaborates on other projects she oversees, including Anaconda.org and its role in supporting open source projects and enterprise needs. [00:05:17] We hear how Anaconda sustains itself financially through enterprise offerings and the balance of supporting open source while maintaining a business model. [00:07:14] Alison shares her previous experience as the first PM of data science communication at Posit (formerly RStudio) and her role in enhancing data science education and product development.
[00:12:49] Richard and Alison explore the challenges of sustaining open source projects without corporate backing and strategies for maintaining personal and project health in the open source community. Alison discusses common mistakes companies make by confusing project management with product management in open source projects. [00:17:18] Richard asks about the skills needed for developers to adopt a product-oriented approach. Alison suggests that successful product-oriented developers often have high empathy for end-users and experience with the pain points at scale, which helps them anticipate and innovate solutions effectively. [00:20:49] Richard expresses concerns about the sustainability of smaller, community-led open source projects that lack corporate backing and the structured support that comes with it. Alison acknowledges her limited experience with non-corporate open source projects but highlights the difficulty in maintaining such projects without institutional support, and she shares her personal challenges with keeping up with open source project demands. [00:27:41] Alison stresses the importance of clear goals and understanding the implications of joining larger ecosystems, reflects on the need for clarity about the desired outcomes when joining larger ecosystems, and shares examples of successful and unsuccessful engagements in such settings. [00:29:52] She discusses alternative sustainability models, including paid support and subscriptions. [00:33:00] Alison brings up the example of Apache Arrow and the challenges it faced with corporate sponsorship. [00:34:23] We wrap up with Richard acknowledging that not all open source projects require significant funding or formal business models, and Alison explains the ‘palmerpenguins' project she did at the beginning of COVID. [00:37:07] Find out where you can follow Alison on the web. 
Quotes [00:22:18] “What is the minimum level of support you need to not feel like you're drowning?” Spotlight [00:38:14] Richard's spotlight is Bernard Cornwell. [00:38:39] Alison's spotlight is the book, Impossible Creatures. Links SustainOSS (https://sustainoss.org/) podcast@sustainoss.org (mailto:podcast@sustainoss.org) richard@sustainoss.org (mailto:richard@sustainoss.org) SustainOSS Discourse (https://discourse.sustainoss.org/) SustainOSS Mastodon (https://mastodon.social/tags/sustainoss) Open Collective-SustainOSS (Contribute) (https://opencollective.com/sustainoss) Richard Littauer Socials (https://www.burntfen.com/2023-05-30/socials) Alison Hill, PhD Website (https://www.apreshill.com/) Alison Presmanes Hill, PhD LinkedIn (https://www.linkedin.com/in/apreshill/) Alison Presmanes Hill GitHub (https://github.com/apreshill) Anaconda (https://www.anaconda.com/) Anaconda.org (https://anaconda.org/) The Third Bit-Dr. Greg Wilson (https://third-bit.com/about/) Sustain Podcast-Episode 64: Travis Oliphant and Russel Pekrul on NumPy, Anaconda, and giving back with FairOSS (https://podcast.sustainoss.org/guests/oliphant) Intercom on Product Management (https://www.intercom.com/resources/books/intercom-product-management) Sustain Podcast-Episode 135: Tracy Hinds on Node.js's CommComm and PMs in Open Source (https://podcast.sustainoss.org/135) Hadley Wickham (https://en.wikipedia.org/wiki/Hadley_Wickham) palmerpenguins-GitHub (https://allisonhorst.github.io/palmerpenguins/articles/intro.html) Bernard Cornwell (https://en.wikipedia.org/wiki/Bernard_Cornwell) Impossible Creatures by Katherine Rundell (https://www.penguinrandomhouse.com/books/743371/impossible-creatures-by-katherine-rundell-illustrated-by-ashley-mackenzie/) Credits Produced by Richard Littauer (https://www.burntfen.com/) Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/) Show notes by DeAnn Bahr Peachtree Sound (https://www.peachtreesound.com/) Special Guest: Alison Hill.
In this episode, Bryce and Conor chat about NumPy, summed-area tables and more.
Link to Episode 213 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter
ADSP: The Podcast
Conor Hoekstra
Bryce Adelstein Lelbach
Show Notes
Date Generated: 2024-12-10
Date Released: 2024-12-20
NumPy
Scalar Functions
Monadic Pervasive Functions (Uiua)
Numba
Python frompyfunc
Summed-area tables
Summed-area tables in BQN (Tweet)
BQN ˘ (Cells)
Leading Axis Theory
Softmax
llm.c
Convolutional Neural Networks in APL
Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
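Summed-area tables, one of the episode's topics, come down to two cumulative sums in NumPy; a minimal sketch (the array contents here are illustrative):

```python
import numpy as np

# A summed-area table (integral image): sat[i, j] holds the sum of all
# elements a[:i+1, :j+1]. Two cumulative sums, one per axis, build it.
a = np.arange(1, 10).reshape(3, 3)
sat = a.cumsum(axis=0).cumsum(axis=1)

# Any rectangular region sum then comes from four table lookups:
# sum of a[1:3, 1:3] = sat[2,2] - sat[0,2] - sat[2,0] + sat[0,0]
region = sat[2, 2] - sat[0, 2] - sat[2, 0] + sat[0, 0]
```

Once the table is built, every region query is O(1), which is the point of the technique.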
In this episode, I have a wonderful conversation with Ralf Gommers, a director of Quansight Labs, and he's a key contributor to NumPy. He shares his journey of how he started working on and contributing to several open source projects. But more importantly, the journey that he took inside NumPy. NumPy is a project that most scientific computing projects rely on heavily. Ralf shares the thought behind its governance model, the importance of having community funding models, and how to maintain long term open-source projects.Support the showSubscribe to mailing list here.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Glitch Token Catalog - (Almost) a Full Clear, published by Lao Mein on September 22, 2024 on LessWrong. This is a collection of every unidentified GPT2 glitch token listed in the third glitch token archaeology post. I was able to find the source of every single one, except for "?????-" and "?????-?????-"[1]. Please tell me if I missed one, or you've discovered one and don't understand where it came from. This isn't meant to be a well-written analysis, just a quick repository of my glitch-hunting observations. I plan on writing up and categorizing all of these in greater detail in future posts. The first of which is here. I used OpenWebText, a recreation of GPT2's training data, for all experiments in this post. I tokenized every .gz file in the archive and made a boolean NumPy array marking each token that was present at least once. This allowed me to quickly identify infrequent tokens in the dataset and pull up the textual context with regular expressions. If there was an issue with overlap, I used a tokenizer-based extraction instead. All data/code available upon request. The leftmost column is token id, the middle is the token string, and the right column is # of files the token was present in (out of 20610). GPT2 has 50257 total tokens.
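The presence-array bookkeeping described above can be sketched as follows; the tokenized file contents here are stand-in data, not OpenWebText:

```python
import numpy as np

VOCAB_SIZE = 50257  # GPT-2's vocabulary size

# Stand-in for the token-id lists produced by tokenizing each .gz file.
tokenized_files = [
    [15496, 995, 11],   # "Hello world," style content
    [15496, 50256],     # ends with <|endoftext|>
]

# files_containing[t] counts how many files contain token t at least once.
files_containing = np.zeros(VOCAB_SIZE, dtype=np.int64)
for token_ids in tokenized_files:
    present = np.zeros(VOCAB_SIZE, dtype=bool)
    present[token_ids] = True   # boolean presence array for this file
    files_containing += present

# Tokens that never occur anywhere are glitch-token candidates.
never_seen = np.flatnonzero(files_containing == 0)
```

A real run would stream each archive through the GPT-2 tokenizer instead of using hard-coded id lists.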
GPT2 tokens with the lowest frequency in OpenWebText 30898 'embedreportprint' 0 33434 ' 士' 0 43453 ' SolidGoldMagikarp' 0 1849 'xa0' 0 47654 ' xa0xa0' 0 50009 ' strutConnector' 0 36173 ' RandomRedditor' 0 214 'x1a' 0 42424 'DragonMagazine' 0 180 ' ' 0 187 ' ' 0 186 ' ' 0 30213 ' externalToEVAOnly' 0 30212 ' externalToEVA' 0 30211 ' guiIcon' 0 185 ' ' 0 30210 ' guiActiveUnfocused' 0 30209 ' unfocusedRange' 0 184 ' ' 0 30202 ' guiName' 0 183 ' ' 0 30905 'rawdownload' 0 39906 'EStream' 0 33454 '龍喚士' 0 42586 ' srfN' 0 25992 ' 裏覚醒' 0 43065 ' srfAttach' 0 11504 ' xa0 xa0' 0 39172 'xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0' 0 40240 'oreAndOnline' 0 40241 'InstoreAndOnline' 0 33477 'xa0xa0xa0' 0 36174 ' RandomRedditorWithNo' 0 37574 'StreamerBot' 0 46600 ' Adinida' 0 182 ' ' 0 29372 ' guiActiveUn' 0 43177 'EStreamFrame' 0 22686 ' xa0 xa0 xa0 xa0' 0 23282 ' davidjl' 0 47571 ' DevOnline' 0 39752 'quickShip' 0 44320 'nxa0' 0 8828 'xa0xa0xa0xa0' 0 39820 '龍 ' 0 39821 '龍契士' 0 28666 'PsyNetMessage' 0 35207 ' attRot' 0 181 ' ' 0 18472 ' guiActive' 0 179 ' ' 0 17811 'xa0xa0xa0xa0xa0xa0xa0xa0' 0 20174 ' 裏 ' 0 212 'x18' 0 211 'x17' 0 210 'x16' 0 209 'x15' 0 208 'x14' 0 31666 '?????-?????-' 0 207 'x13' 0 206 'x12' 0 213 'x19' 0 205 'x11' 0 203 'x0f' 0 202 'x0e' 0 31957 'cffffcc' 0 200 'x0c' 0 199 'x0b' 0 197 't' 0 196 'x08' 0 195 'x07' 0 194 'x06' 0 193 'x05' 0 204 'x10' 0 45545 ' サーティワン' 0 201 'r' 0 216 'x1c' 0 37842 ' partName' 0 45706 ' xa0 xa0 xa0 xa0 xa0 xa0 xa0 xa0' 0 124 ' ' 0 125 ' ' 0 178 ' ' 0 41380 'natureconservancy' 0 41383 'assetsadobe' 0 177 ' ' 0 215 'x1b' 0 41551 'Downloadha' 0 4603 'xa0xa0' 0 42202 'GoldMagikarp' 0 42089 ' TheNitrome' 0 217 'x1d' 0 218 'x1e' 0 42090 ' TheNitromeFan' 0 192 'x04' 0 191 'x03' 0 219 'x1f' 0 189 'x01' 0 45544 ' サーティ' 0 5624 ' xa0' 0 190 'x02' 0 40242 'BuyableInstoreAndOnline' 1 36935 ' dstg' 1 36940 ' istg' 1 45003 ' SetTextColor' 1 30897 'reportprint' 1 39757 'channelAvailability' 1 39756 'inventoryQuantity' 1 39755 
'isSpecialOrderable' 1 39811 'soDeliveryDate' 1 39753 'quickShipAvailable' 1 39714 'isSpecial' 1 47198 'ItemTracker' 1 17900 ' Dragonbound' 1 45392 'dayName' 1 37579 'TPPStreamerBot' 1 31573 'ActionCode' 2 25193 'NetMessage' 2 39749 'DeliveryDate' 2 30208 ' externalTo' 2 43569 'ÍÍ' 2 34027 ' actionGroup' 2 34504 ' 裏 ' 2 39446 ' SetFontSize' 2 30899 'cloneembedreportprint' 2 32047 ' "$:/' 3 39803 'soType' 3 39177 'ItemThumbnailImage' 3 49781 'EngineDebug' 3 25658 '?????-' 3 33813 '=~=~' 3 48396 'ÛÛ' 3 34206 ...
Topics covered in this episode: Why I Still Use Python Virtual Environments in Docker Python Developer Survey Results Anaconda Code add-in for Microsoft Excel Disabling Scheduled Dependency Updates Extras Joke Watch on YouTube About the show Sponsored by us! Support our work through Our courses at Talk Python Training Hello, pytest! Course Patreon Supporters Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it. Michael #1: Why I Still Use Python Virtual Environments in Docker by Hynek Schlawack I was going to cover Production-ready Docker Containers with uv but decided to take this diversion instead. Spend a lot of time thinking about the secondary effects of what you do. venvs are well known and well documented. Let's use them. Brian #2: Python Developer Survey Results “… official Python Developers Survey, conducted as a collaborative effort between the Python Software Foundation and JetBrains.” Python w/ Rust rising, but still only 7% ““The drop in HTML/CSS/JS might show that data science is increasing its share of Python.” - Paul Everitt 37% contribute to open source. Awesome. Favorite Resources: Podcasts Lots of familiar faces there. Awesome. Perhaps I shouldn't have decided to move “Python Test” back to Test & Code Usage “Data analysis” down, but I think that's because “data engineering” is added. 
Data, Web dev, ML, devops, academic, Testing is down 23% Python Versions Still some 2 out there Most folks on 3.10-3.12 Install from: mostly python.org Frameworks web: Flask, Django, Requests, FastAPI … testing: pytest, unittest, mock, doctest, tox, hypothesis, nose (2% might be the Python 2 people) Data science 77% use pandas, 72% NumPy OS: Windows still at 55% Packaging: venv up to 55% I imagine uv will be on the list next year requirements.txt 63%, pyproject.toml 32% virtual env in containers? 47% say no Michael #3: Anaconda Code add-in for Microsoft Excel Run their Python-powered projects in Excel locally with the Anaconda Code add-in Powered by PyScript, an Anaconda-supported open source project that runs Python locally without install and setup Features Cells Run Independently Range to Multiple Types The init.py file is normally static and cannot be edited; with Anaconda Code, users have the ability to access and edit imports and definitions, allowing you to write top-level functions and classes and reuse them wherever you need. A Customizable Environment Brian #4: Disabling Scheduled Dependency Updates David Lord Interesting discussion of as-they-happen versus batched updates to dependencies dependencies come in requirements files GH Actions in CI workflows pre-commit hooks David was seeing 60 PRs per month when set up on monthly updates (3 ecosystems * 20 projects) new tool for updating GH actions: gha-update, allows for local updating of GH dependencies New process Run pip-compile, gha-update, and pre-commit locally. Update a project's dependencies when actively working on the project, not just whenever a dependency updates. Note that this works fine for dev dependencies, less so for security updates from runtime dependencies. But for libraries, runtime dependencies are usually not pinned. Extras Brian: Test & Code coming back this week Michael: Code in a Castle event Python Bytes badge spotting Guido's post removed for moderation Joke: C will watch in silence
Podcast sponsor — https://learn.python.ru/advanced Hosts: Grigory Petrov and Mikhail Korneev Episode news: Should Python use calendar versioning? The NumPy 2.0 release How Python's vulnerability-handling infrastructure works Modern practices for Python development Episode links: The Learn Python course — https://learn.python.ru/advanced Misha's Telegram channel — https://t.me/tricky_python The Moscow Python Telegram channel — https://t.me/moscow_python All episodes — https://podcast.python.ru Moscow Python meetups — https://moscowpython.ru
DjangoCon Europe 2024 July 6, 2024, Jochen. Ronny is back from DjangoCon Europe 2024 in Vigo
Topics covered in this episode: Joining Strings in Python: A "Huh" Moment 10 hard-to-swallow truths they won't tell you about software engineer job My thoughts on Python in Excel Extra, extra, extra Extras Joke Watch on YouTube About the show Sponsored by ScoutAPM: pythonbytes.fm/scout Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it. Brian #1: Joining Strings in Python: A "Huh" Moment Veronica Berglyd Olsen Standard solution to "read lines from a file, do some filtering, create a multiline string": f = open("input_file.txt") filtered_text = "\n".join(x for x in f if not x.startswith("#")) This uses a generator, file reading, and passes the generator to join. Another approach is to add brackets, turning the generator expression into a list comprehension: f = open("input_file.txt") filtered_text = "\n".join([x for x in f if not x.startswith("#")]) At first glance, this seems to just be extra typing, but it's actually faster by 16% on CPython, because .join() makes two passes over its input and therefore converts a generator to a list internally first.
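The speed claim is easy to check yourself; a minimal sketch with `timeit` and in-memory lines standing in for the file:

```python
from timeit import timeit

# Sample "file" contents: comment lines to filter out plus real lines.
lines = ["# comment" if i % 3 == 0 else f"line {i}" for i in range(1000)]

def with_generator():
    return "\n".join(x for x in lines if not x.startswith("#"))

def with_list():
    return "\n".join([x for x in lines if not x.startswith("#")])

# Both build the same string; the list version hands .join() a sequence
# it can size up front instead of materializing the generator itself.
assert with_generator() == with_list()
t_gen = timeit(with_generator, number=200)
t_lst = timeit(with_list, number=200)
```

The exact percentage will vary by machine and input size, but the list-comprehension variant is consistently the faster of the two on CPython.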
From Trey Hunner: "I do know that it's not possible to do 2 passes over a generator (since it'd be exhausted after the first pass) so from my understanding, the generator version requires an extra step of storing all the items in a list first." Michael #2: 10 hard-to-swallow truths they won't tell you about software engineer job College will not prepare you for the job You will rarely get greenfield projects Nobody gives a BLANK about your clean code You will sometimes work with incompetent people Get used to being in meetings for hours They will ask you for estimates a lot of times Bugs will be your arch-enemy for life Uncertainty will be your toxic friend It will be almost impossible to disconnect from your job You will profit more from good soft skills than from good technical skills Brian #3: My thoughts on Python in Excel Felix Zumstein Interesting take on one person's experience with trying Python in Excel. "We wanted an alternative to VBA, but got an alternative to the Excel formula language" "Python runs in the cloud on Azure Container Instances and not inside Excel." "DataFrames are great, but so are NumPy arrays and lists." … lots of other interesting takeaways. Michael #4: Extra, extra, extra Code in a castle - Michael's Python Zero to Hero course in Tuscany Polyfill.io JavaScript supply chain attack impacts over 100K sites Now required reading: Reasons to avoid Javascript CDNs Mac users served info-stealer malware through Google ads HTMX for the win! ssh to run remote commands > ssh user@server "command_to_run --arg1 --arg2" Extras Brian: A fun reaction to AI - I will not be showing the link on our live stream, due to colorful language. Michael: Coding in a Castle Developer Education Event Polyfill.io JavaScript supply chain attack impacts over 100K sites See Reasons to avoid Javascript CDNs Joke: HTML Hacker
Topics covered in this episode: PSF Elections coming up Cloud engineer gets 2 years for wiping ex-employer's code repos Python: Import by string with pkgutil.resolve_name() DuckDB goes 1.0 Extras Joke Watch on YouTube About the show Sponsored by ScoutAPM: pythonbytes.fm/scout Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it. Brian #1: PSF Elections coming up These are elections for the PSF Board and for 3 bylaw changes. To vote in the PSF election, you need to be a Supporting, Managing, Contributing, or Fellow member of the PSF, … And affirm your voting status by June 25. See Affirm your PSF Membership Voting Status for more details. Timeline Board Nominations open: Tuesday, June 11th, 2:00 pm UTC Board Nominations close: Tuesday, June 25th, 2:00 pm UTC Voter application cut-off date: Tuesday, June 25th, 2:00 pm UTC; the same date applies to voter affirmation. Announce candidates: Thursday, June 27th Voting start date: Tuesday, July 2nd, 2:00 pm UTC Voting end date: Tuesday, July 16th, 2:00 pm UTC See also Thinking about running for the Python Software Foundation Board of Directors? Let's talk! There's still one upcoming office hours session on June 18th, 12 PM UTC And For your consideration: Proposed bylaws changes to improve our membership experience 3 proposed bylaws changes Michael #2: Cloud engineer gets 2 years for wiping ex-employer's code repos Miklos Daniel Brody, a cloud engineer, was sentenced to two years in prison and ordered to pay $529,000 in restitution for wiping the code repositories of his former employer in retaliation for being fired.
The court documents state that Brody's employment was terminated after he violated company policies by connecting a USB drive. Brian #3: Python: Import by string with pkgutil.resolve_name() Adam Johnson You can use pkgutil.resolve_name("<module>:<object>") to import classes, functions or modules using strings. You can also use importlib.import_module("<module>") Both of these techniques give you an imported object without binding anything into the local namespace. Michael #4: DuckDB goes 1.0 via Alex Monahan The cloud hosted product @MotherDuck also opened up General Availability Codenamed "Snow Duck" The core theme of the 1.0.0 release is stability. Extras Brian: Sending us topics. Please send before Tuesday. But any time is welcome. NumPy 2.0 htmx 2.0.0 Michael: Get 6 months of PyCharm Pro for free. Just take a course (even a free one) at Talk Python Training. Then visit your account page > details tab and have fun. Coming soon at Talk Python: Shiny for Python Joke: .gitignore thoughts won't let me sleep
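Both import-by-string techniques can be tried directly against the standard library:

```python
import importlib
import pkgutil

# resolve_name takes a "module:object" string and returns the object
# itself, without binding any name in the local namespace.
sqrt = pkgutil.resolve_name("math:sqrt")

# import_module returns the module; attributes are fetched separately.
math_mod = importlib.import_module("math")
```

`resolve_name` is available from Python 3.9 onward; for plain modules (no `:object` part) the two calls are interchangeable.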
This is a recap of the top 10 posts on Hacker News on June 16th, 2024. This podcast was generated by wondercraft.ai
(00:40) MicroMac, a Macintosh for under £5. Original post: https://news.ycombinator.com/item?id=40699684&utm_source=wondercraft_ai
(02:11) Do not try to be the smartest in the room; try to be the kindest. Original post: https://news.ycombinator.com/item?id=40695997&utm_source=wondercraft_ai
(03:40) The Raspberry Pi 5 Is No Match for a Tini-Mini-Micro PC. Original post: https://news.ycombinator.com/item?id=40697831&utm_source=wondercraft_ai
(05:22) NLRB judge declares non-compete clause is an unfair labor practice. Original post: https://news.ycombinator.com/item?id=40696992&utm_source=wondercraft_ai
(07:16) Building SimCity: How to put the world in a machine. Original post: https://news.ycombinator.com/item?id=40698442&utm_source=wondercraft_ai
(09:05) Experts vs. Imitators. Original post: https://news.ycombinator.com/item?id=40699079&utm_source=wondercraft_ai
(10:45) Simple sabotage for software (2023). Original post: https://news.ycombinator.com/item?id=40695839&utm_source=wondercraft_ai
(12:13) NumPy 2.0. Original post: https://news.ycombinator.com/item?id=40699470&utm_source=wondercraft_ai
(13:52) European Alternatives. Original post: https://news.ycombinator.com/item?id=40695754&utm_source=wondercraft_ai
(15:40) Bouba/kiki effect. Original post: https://news.ycombinator.com/item?id=40699583&utm_source=wondercraft_ai
This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai
Topics covered in this episode: NumPy 2.0 release date is June 16 Uvicorn adds multiprocess workers pixi JupyterLab 4.2 and Notebook 7.2 are available Extras Joke Watch on YouTube About the show Sponsored by Mailtrap: pythonbytes.fm/mailtrap Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it. Brian #1: NumPy 2.0 release date is June 16 “This release has been over a year in the making, and is the first major release since 2006. Importantly, in addition to many new features and performance improvement, it contains breaking changes to the ABI as well as the Python and C APIs. It is likely that downstream packages and end user code needs to be adapted - if you can, please verify whether your code works with NumPy 2.0.0rc2.” NumPy 2.0.0 Release Notes NumPy 2.0 migration guide including “try just running ruff check path/to/code/ --select NPY201” “Many of the changes covered in the 2.0 release notes and in this migration guide can be automatically adapted in downstream code with a dedicated Ruff rule, namely rule NPY201.” Michael #2: Uvicorn adds multiprocess workers via John Hagen The goal was to no longer need to suggest that people use Gunicorn on top of uvicorn. Uvicorn can now in a sense "do it all” Steps to use it and background on how it works. 
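For the NumPy 2.0 migration point above, the removed aliases that ruff's NPY201 rule flags have replacements that work on both 1.x and 2.x; a small sketch:

```python
import numpy as np

# NumPy 2.0 removed long-deprecated aliases; these spellings work on
# both 1.x and 2.x (ruff's NPY201 rule suggests the same rewrites):
x = np.float64(1.5)   # instead of the removed np.float_
y = np.nan            # instead of the removed np.NaN
z = np.inf            # instead of the removed np.Inf
```

Running `ruff check --select NPY201` over a codebase points out each remaining old spelling.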
Brian #3: pixi Suggested by Vic Kelson “pixi is a cross-platform, multi-language package manager and workflow tool built on the foundation of the conda ecosystem.” Tutorial: Doing Python development with Pixi Some quotes from Vic: “Pixi is a project manager, written in Rust, that allows you to build Python projects without having Python previously installed. It's installable with Homebrew (brew install pixi on Linux and MacOS). There's support in VSCode and PyCharm via plugins. By default, pixi fetches packages from conda-forge, so you get the scientific stack in a pretty reliable and performant build. If a package isn't on conda-forge, it'll look on PyPI, or I believe you can force it to look on PyPI if you like.” “So far, it works GREAT for me. What really impressed me is that I got a Jupyter environment with CuPy utilizing my aging Nvidia GPU on the FIRST TRY.” Michael #4: JupyterLab 4.2 and Notebook 7.2 are available JupyterLab 4.2.0 has been released! This new minor release of JupyterLab includes 3 new features, 20 enhancements, 33 bug fixes and 29 maintenance tasks. Jupyter Notebook 7.2.0 has also been released Highlights include Easier Workspaces Management with GUI Recently opened/closed files Full notebook windowing mode by default (renders only the cells visible in the window, leading to improved performance) Improved Shortcuts Editor Dark High Contrast Theme Extras Brian: Help test Python 3.13! Help us test free-threaded Python without the GIL both from Hugo van Kemenade Python Test 221: How to get pytest to import your code under test is out Michael: Bend follow up from Bernát Gábor “Bend looks roughly like Python but is nowhere there actually. For example it has no for loops, instead you're meant to use bend keyword (hence the language name) to expand calculations and another keyword to join branches. 
So basically think of something that resembles Python at a high level, but without being compatible with it and without any of the standard library or packages the Python language provides. That being said, it does an impressive job at parallelization, but essentially it's a brand new language with new syntax and paradigms that you will have to learn; it just shares, at first look, the most similarities with Python." Joke: Do-while
Topics covered in this episode: NumFOCUS concerns leaping pytest debugger llm Extra, Extra, Extra, PyPI has completed its first security audit Extras Joke Watch on YouTube About the show Sponsored by us! Support our work through: Our courses at Talk Python Training The Complete pytest Course Patreon Supporters Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it. Brian #1: NumFOCUS concerns Suggested by Pamphile Roy Write up of the current challenges faced by NumFOCUS, by Paul Ivanov (one of the OG of Scientific Python: Jupyter, Matplotlib, etc.) Struggling to meet the needs of sponsored and affiliated projects. In February, NumFOCUS announced it is moving in a new direction. NumFOCUS initiated an effort to run an election for open board seats and proposed changing its governance structure. Some projects are considering and actively pursuing alternative venues for fiscal sponsorship. Quite a bit more detail and discussion in the article. NumFOCUS covers a lot of projects NumPy, Matplotlib, pandas, Jupyter, SciPy, Astropy, Bokeh, Dask, Conda, and so many more. Michael #2: leaping pytest debugger llm You can ask Leaping questions like: Why am I not hitting function x? Why was variable y set to this value? What was the value of variable x at this point? What changes can I make to this code to make this test pass? 
Brian #3: Extra, Extra, Extra, 2024 Developer Summit Also suggested by Pamphile, related to Scientific Python The Second Scientific Python Developer Summit, June 3-5, Seattle, WA Lots of great work came out of the First Summit in 2023 pytest-regex - Use regexes to specify tests to run Came out of the '23 summit I'm not sure if I'm super happy about this or a little afraid that I probably could use this. Still, cool that it's here. Cool short example of using __init__ and __call__ to hand-roll a decorator. ruff got faster Michael #4: PyPI has completed its first security audit Trail of Bits spent a total of 10 engineer-weeks of effort identifying issues, presenting those findings to the PyPI team, and assisting us as we remediated the findings. Scope: The audit was focused on "Warehouse", the open-source codebase that powers pypi.org As a result of the audit, Trail of Bits detailed 29 different advisories discovered across both codebases. When evaluating the severity level of each advisory, 14 were categorized as "informational", 6 as "low", 8 as "medium" and zero as "high". Extras Brian: pytest course community to try out Podia Communities. Anyone have a podia community running strong now? If so, let me know through Mastodon: @brianokken@fosstodon.org Want to join the community when it's up and running? Same. Or join our friends of the show list, and read our newsletter. I'll be sure to drop a note in there when it's ready. Michael: VS Code AMA @ Talk Python [video] Gunicorn CVE Talk submissions are now open for both remote and in-person talks at PyConZA 2024. The conference will be held on 3 and 4 October 2024 in Cape Town, South Africa. Details are on za.pycon.org. FlaskCon 2024 will be happening Friday, May 17 inside PyCon US 2024. The call for proposals is now live! Joke: Debugging with your eyes
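The "__init__ and __call__ to hand-roll a decorator" trick mentioned in the summit notes can be sketched as below; `tag` and its `label` argument are illustrative names of my own, not code from pytest-regex:

```python
class tag:
    """A hand-rolled parameterized decorator built from a class.

    __init__ receives the decorator's arguments; __call__ receives the
    function being decorated and returns the wrapped version.
    """

    def __init__(self, label):
        self.label = label  # decorator argument, stored for later use

    def __call__(self, func):
        def wrapper(*args, **kwargs):
            # Runs on every call of the decorated function
            print(f"[{self.label}] calling {func.__name__}")
            return func(*args, **kwargs)
        return wrapper


@tag("demo")
def add(a, b):
    return a + b


print(add(2, 3))  # prints "[demo] calling add" then 5
```

The class form keeps decorator state (here `self.label`) explicit, which is what makes it handy compared to nesting three closures.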
Explore the origins of NumPy and SciPy with their creator, Dr. Travis Oliphant. Discover the journey from personal need to global impact, the challenges overcome, and the future of these essential Python libraries in scientific computing and data science. This episode is brought to you by the DataConnect Conference (https://www.dataconnectconf.com/dccwest/conference), by Data Universe, the out-of-this-world data conference (https://datauniverse2024.com), and by CloudWolf (https://www.cloudwolf.com/sds), the Cloud Skills platform. Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • Travis's journey to creating NumPy and SciPy [08:05] • How Anaconda got started [42:24] • How Numba, a high-performance Python compiler, was brought to market [54:48] • Python's influence on the thought processes of scientists and engineers [1:04:21] • The commercial projects that support Travis's vast open-source efforts and communities [1:10:22] • How to get involved in Travis's commercial projects and communities [1:22:34] • The future of scientific computing and Python libraries [1:29:50] Additional materials: www.superdatascience.com/765
Logan Kilpatrick leads developer relations at OpenAI, supporting developers building with the OpenAI API and ChatGPT. He is also on the board of directors at NumFOCUS, the nonprofit organization that supports open source projects like Jupyter, Pandas, NumPy, and more. Before OpenAI, Logan was a machine-learning engineer at Apple and advised NASA on open source policy. In our conversation, we discuss: • OpenAI's fast-paced and innovative work environment • The value of high agency and high urgency in your employees • Tips for writing better ChatGPT prompts • How the GPT Store is doing • OpenAI's planning process and decision-making criteria • Where OpenAI is heading in the next few years • Insight into OpenAI's B2B offerings • Why Logan “measures in hundreds” — Brought to you by: • Hex: helping teams ask and answer data questions by working together • Whimsical: the iterative product workspace • Arcade Software: create effortlessly beautiful demos in minutes — Find the transcript for this episode and all past episodes at: https://www.lennyspodcast.com/episodes/. Today's transcript will be live by 8 a.m. PT. — Where to find Logan Kilpatrick: • X: https://twitter.com/OfficialLoganK • LinkedIn: https://www.linkedin.com/in/logankilpatrick/ • Website: https://logank.ai/ — Where to find Lenny: • Newsletter: https://www.lennysnewsletter.com • X: https://twitter.com/lennysan • LinkedIn: https://www.linkedin.com/in/lennyrachitsky/ — In this episode, we cover: (00:00) Logan's background (03:49) The impact of recent events on OpenAI's team and culture (08:20) Exciting developments in AI interfaces (09:52) Using OpenAI tools to make companies more efficient (13:04) Examples of using AI effectively (18:35) Prompt engineering (22:12) How to write better prompts (26:05) The launch of GPTs and the OpenAI Store (32:10) The importance of high agency and urgency (34:35) OpenAI's ability to move fast and ship high-quality products (35:56) OpenAI's planning process and decision-making criteria (40:22) The importance of real-time communication (42:33) OpenAI's team and growth (44:47) Future developments at OpenAI (47:42) GPT-5 and building toward the future (50:38) OpenAI's enterprise offering and the value of sharing custom applications (52:30) New updates and features from OpenAI (55:09) How to leverage OpenAI's technology in products (58:26) Encouragement for building with AI (59:30) Lightning round — Referenced: • OpenAI: https://openai.com/ • Sam Altman on X: https://twitter.com/sama • Greg Brockman on X: https://twitter.com/gdb • tldraw: https://www.tldraw.com/ • Harvey: https://www.harvey.ai/ • Boost Your Productivity with Generative AI: https://hbr.org/2023/06/boost-your-productivity-with-generative-ai • Research: quantifying GitHub Copilot's impact on developer productivity and happiness: https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/ • Lesson learnt from the DPD AI Chatbot swearing blunder: https://www.linkedin.com/pulse/lesson-learnt-from-dpd-ai-chatbot-swearing-blunder-kitty-sz57e/ • Dennis Yang on LinkedIn: https://www.linkedin.com/in/dennisyang/ •
Tim Ferriss's blog: https://tim.blog/ • Tyler Cowen on X: https://twitter.com/tylercowen • Tom Cruise on X: https://twitter.com/TomCruise • Canva: https://www.canva.com/ • Zapier: https://zapier.com/ • Siqi Chen on X: https://twitter.com/blader • Runway: https://runway.com/ • Universal Primer: https://chat.openai.com/g/g-GbLbctpPz-universal-primer • “I didn't expect ChatGPT to get so good” | Unconfuse Me with Bill Gates: https://www.youtube.com/watch?v=8-Ymdc6EdKw • Microsoft Azure: https://azure.microsoft.com/ • Lennybot: https://www.lennybot.com/ • Visual Electric: https://visualelectric.com/ • DALL-E: https://openai.com/research/dall-e • The One World Schoolhouse: https://www.amazon.com/One-World-Schoolhouse-Education-Reimagined/dp/1455508373/ref=sr_1_1 • Why We Sleep: Unlocking the Power of Sleep and Dreams: https://www.amazon.com/Why-We-Sleep-Unlocking-Dreams/dp/1501144324 • Gran Turismo: https://www.netflix.com/title/81672085 • Gran Turismo video game: https://www.playstation.com/en-us/gran-turismo/ • Manta sleep mask: https://mantasleep.com/products/manta-sleep-mask • WAOAW sleep mask: https://www.amazon.com/WAOAW-Sleep-Sleeping-Blocking-Blindfold/dp/B09712FSLY — Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com. — Lenny may be an investor in the companies discussed. Get full access to Lenny's Newsletter at www.lennysnewsletter.com/subscribe
How does a debugger work? What can you learn about Python by building one from scratch? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.
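For a taste of what "building a debugger from scratch" involves: CPython exposes the hook that pdb itself is built on, `sys.settrace`. A minimal line tracer, my own sketch rather than code from the episode:

```python
import sys


def trace(frame, event, arg):
    # The interpreter calls this for 'call', 'line', 'return', etc. events.
    # pdb, coverage tools, and profiler-style debuggers sit on this hook.
    if event == "line":
        print(f"{frame.f_code.co_name}:{frame.f_lineno} locals={frame.f_locals}")
    return trace  # returning the tracer keeps line-tracing this frame


def demo():
    x = 1
    x += 1
    return x


sys.settrace(trace)   # install the global trace function
result = demo()       # every line of demo() is reported as it runs
sys.settrace(None)    # always unhook when done
print(result)  # 2
```

A real debugger turns this into breakpoints by comparing `frame.f_lineno` against a breakpoint table and dropping into an interactive prompt instead of printing.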
Allen Wyma talks with Ritchie Vink about his work on Polars, a DataFrame library written in Rust. Contributing to Rustacean Station: Rustacean Station is a community project; get in touch with us if you'd like to suggest an idea for an episode or offer your services as a host or audio editor! Twitter: @rustaceanfm Discord: Rustacean Station Github: @rustacean-station Email: hello@rustacean-station.org Timestamps [@00:00] - Meet Ritchie Vink - Creator of Polars [@02:00] - What is a DataFrame? [@05:19] - Arrow [@07:26] - NumPy [@11:31] - Polars vs Pandas [@17:32] - Using Polars in app development [@25:24] - Python and Rust docs [@31:49] - Polars 1.0 release [@35:21] - What keeps Ritchie working on Polars [@37:27] - Growing Polars without bloat [@39:57] - Closing discussions Credits Intro Theme: Aerocity Audio Editing: Plangora Hosting Infrastructure: Jon Gjengset Show Notes: Plangora Hosts: Allen Wyma
In this episode, Katia, Arnaud and Emmanuel discuss the news of late 2023: gatherers in Java streams, exceptions, JavaScript on the JVM, vector search, cloud costs, Gemini, Llama and other fantastic beasts, plus plenty of nice tools to close out the year. Recorded December 15, 2023. Episode download: LesCastCodeurs-Episode-304.mp3 News Help Les Cast Codeurs by filling in a short survey to guide us next year https://lescastcodeurs.com/sondage Languages With JEP 461, the notion of "gatherers" for streams arrives in preview in Java 22 https://groovy.apache.org/blog/groovy-gatherers In this article, Paul King of the Groovy team shows and contrasts what Groovy has been able to do for years, such as sliding windows, and explains the gatherer approach with its intermediate operations. Gatherers are custom intermediate operations that take a state and the next element and decide what to do; they can even change the stream of subsequent elements (publishing new ones) via the integrate function, and some can combine intermediate results (to enable parallelization). Examples: fixed-size windows, sliding windows. Joe Duffy, CEO of Pulumi, who previously worked at Microsoft on project Midori (a rethought future OS), writes about the design of exceptions, errors and return codes https://joeduffyblog.com/2016/02/07/the-error-model/ He compares error codes with checked and unchecked exceptions, separates bugs from expected errors (bugs should stop the process), and recounts the history of unchecked exceptions and their problems, and of checked exceptions and why Java developers hate them (according to him). A long but interesting read with useful takeaways, though I didn't make it all the way through :smile: After Nashorn's removal from the JDK, you can turn to the Javet project
https://www.caoccao.com/Javet/index.html Javet lets you embed JavaScript with the V8 engine, and even Node.js outright. It's a great capability, since you get the two best engines; on the other hand, support outside x86 is more limited (ARM on Windows, for instance, is a no). Libraries Part of the Spring team gets laid off after the Broadcom acquisition closes https://x.com/odrotbohm/status/1729231722498425092?s=20 Little real information beyond this tweet, but the Broadcom acquisition doesn't seem to be happening in a world of care bears. Marc Wrobel announces the release of JBanking 4.2.0 https://www.marcwrobel.fr/sortie-de-jbanking-4-2-0 Java 21 support, the ability to randomly generate BICs, and improved IBAN generation. jbanking is a library for manipulating typical banking structures such as IBANs, BICs, currencies, SEPA, etc. Hibernate Search 7 is out https://in.relation.to/2023/12/05/hibernate-search-7-0-0-Final/ Supports Elasticsearch 8.10-11 and OpenSearch 2.10-11, rebased on Lucene 9.8, with support for Amazon OpenSearch Serverless (experimental). Beware: only a subset of features is available on Serverless, which is an API-first search cluster billed per request. Also linked: version 7.1 alpha1. Hibernate ORM 6.4 is out https://in.relation.to/2023/11/23/orm-640-final/ Support for SoftDelete (a column marking deletion), support for vector operations (initially on PostgreSQL); vector functions are particularly used by AI/ML. Application-specific JFR events. Integration of Citrus and Quarkus for integration-testing all sorts of protocols and message formats https://quarkus.io/blog/testing-quarkus-with-citrus/ Lets you test the expected inputs and outputs of messaging systems (HTTP, Kafka, mail servers, etc.); great for testing event-driven applications. Unrelated, but Quarkus 3.7 will target Java 17 (~8% of people were using Java 11 in builds with notifications enabled). 
Hibernate Search 7.1 (dev 7.1.0.Alpha1), with the latest Lucene (9.8); Infinispan adds support for vector search. https://hibernate.org/search/releases/7.1/ https://infinispan.org/blog/2023/12/13/infinispan-vector-search Hibernate Search now allows vector search. The latest version is integrated into Infinispan 15 (dev), which is due to be released. Vector search and vector storage make it possible to turn Infinispan into an embedding store (LangChain). Cloud How to choose your cloud region https://blog.scottlogic.com/2023/11/23/conscientious-cloud-pick-your-cloud-region-deliberately.html Not so simple: cost, the legal safety of your data, the carbon footprint of the chosen region (France is great, Poland less so), latency versus where your customers are, and the services supported. Web Toward a standardization of webhooks? https://www.standardwebhooks.com/ People from Zapier, Twilio, Ngrok, Kong, Supabase and others are joining forces to try to standardize the approach to webhooks. The spec is open source (Apache) on GitHub https://github.com/standard-webhooks/standard-webhooks/blob/main/spec/standard-webhooks.md The goals are security, reliability, interoperability, simplicity and (backward/forward) compatibility. Without the spec, every webhook behaves differently, so clients have to adapt to each one's semantics and errors: the (meta-)structure of the payload, its size, securing it via a signature (e.g. HMAC), error reporting (via HTTP errors), and so on. Data and Artificial Intelligence Google announces Gemini, its new large language model https://blog.google/technology/ai/google-gemini-ai/#sundar-note A multimodal model that takes text as input, but also images, sound and video. According to the benchmarks, it is broadly as good as GPT-4. Several model sizes are available: Nano, to be embedded in phones; Pro, to be used in most cases; and Ultra, for the most advanced reasoning needs. Android will also add AICore libraries for using Gemini Nano on Pixel phones https://android-developers.googleblog.com/2023/12/a-new-foundation-for-ai-on-android.html Gemini Pro will be available in Bard (in English and in 170 countries, but Europe will have to wait a little) Gemini Ultra should also come to Bard, in an extended version https://blog.google/products/bard/google-bard-try-gemini-ai/ Gemini will be progressively integrated into many Google products. DeepMind on Gemini https://deepmind.google/technologies/gemini/#introduction A 60-page report on Gemini https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf Gemini also enabled a new version of the AlphaCode model, which excels in coding competitions https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf A YouTube playlist of short videos with interviews and demonstrations of Gemini's capabilities https://www.youtube.com/playlist?list=PL590L5WQmH8cSyqzo1PwQVUrZYgLcGZcG Unfortunately, some of the announcements were somewhat misleading, which brought (undeserved) discredit on Gemini; for example, the "aspirational" video was presented as real footage, but it wasn't. 
Also, Ultra is not available yet, and the ChatGPT comparison on the page (at least initially) compared apples and oranges, even though the research paper was correct. Following the Gemini release, Guillaume wrote about calling Gemini from Java https://glaforge.dev/posts/2023/12/13/get-started-with-gemini-in-java/ Gemini is multimodal, so you can pass text as well as images, or even video. There is a Java SDK for interacting with the Gemini API. Facebook, Purple Llama https://ai.meta.com/blog/purple-llama-open-trust-safety-generative-ai/ Open source https://ai.meta.com/llama/ In the spirit of open GenAI models, Facebook provides tools for building responsible (but not guilty :wink: ) AI, notably benchmarks for evaluating safety and a safety classifier, for example to avoid generating malicious code (or at least make it harder). Purple Llama will be an umbrella project. Incidentally, Meta, IBM, Red Hat and many others have announced the AI Alliance for open, collaborative AI between academia and industry. 
Notably absent are Google, OpenAI (not so open) and Microsoft. Just an announcement for now, but we'll see what these AI Alliance players actually deliver. There is also a user guide on responsible AI usage (not read). Apple is also getting into machine-learning libraries https://ml-explore.github.io/mlx/build/html/index.html MLX is a Python library strongly inspired by NumPy, PyTorch, Jax and ArrayFire. Above all, it is developed specifically for Macs, to take maximum advantage of Apple Silicon processors. In one of the GitHub repos, you can also find examples that run the Llama and Mistral models (and others) natively on macOS https://github.com/ml-explore/mlx-examples It leverages not only Apple Silicon but also the unified CPU/GPU memory, one of the key reasons for the Macs' speed. Running Java in a Jupyter notebook https://www.javaadvent.com/2023/12/jupyter-notebooks-and-java.html Max Andersen explores using Java in Jupyter notebooks instead of the classic Python. There are Java kernels for various needs, but they have to be installed into the Jupyter distribution you use, and that's where jbang (installable via pip) comes to the rescue: it automatically installs those kernels in a few lines. Tooling Sfeir lists developer-oriented games https://www.sfeir.dev/tendances/notre-selection-de-jeux-de-programmation/ Perfect for Christmas, for those who want to keep challenging their brain after work: logic games, puzzle games where code is the medium, machine-learning games, assembly-programming games. Advent calendars are popular with developers! 
Especially Advent of Code https://adventofcode.com/ But there is also the Java Advent https://www.javaadvent.com/ Or a calendar for learning the basics of SVG https://svg-tutorial.com/ The HTML "hell" calendar https://www.htmhell.dev/adventcalendar/ which covers accessibility, web components, meta tags, all the things you can do perfectly well in HTML/CSS without needing JavaScript. For TypeScript developers, there is an Advent calendar for you too! https://typehero.dev/aot-2023 A great thread by Clara Dealberto on the theme of "dataviz" (data visualization) https://twitter.com/claradealberto/status/1729447130228457514 Many freely accessible tools are mentioned for all sorts of visualizations (e.g. treemaps, dendrograms, Sankey diagrams) and also for cartography. A few sites give advice on choosing the right type of visualization for your problem and data, notably the Financial Times one, which fits on a one-page PDF. In short, it's cool, but a long read. A short list of nice tools: jc, to convert the output of Unix commands to JSON https://github.com/kellyjonbrazil/jc - AltTab for macOS, to get the same window-switching behavior as on Windows https://alt-tab-macos.netlify.app/ - gron, to make JSON grep-able by turning every value into a line resembling a JSONPath https://github.com/tomnomnom/gron - Marker, in Python, to turn PDFs into nice Markdown https://github.com/VikParuchuri/marker - n8n, an open-source workflow tool https://n8n.io/ gron essentially prints lines of the form "jsonpath = value", and you can "ungron" afterwards to get back to JSON. Marker uses machine learning, but it hallucinates less than Nougat (how reassuring). Docker acquires Testcontainers https://techcrunch.com/2023/12/11/docker-acquires-atomicjar-a-testing-startup-that-raised-25m-in-january/ Announcement by AtomicJar https://www.atomicjar.com/2023/12/atomicjar-is-now-part-of-docker/ Announcement by Docker https://www.docker.com/blog/docker-whale-comes-atomicjar-maker-of-testcontainers/ Architecture How to implement song recognition, like Shazam https://www.cameronmacleod.com/blog/how-does-shazam-work First move into the frequency domain with Fourier transforms to obtain spectrograms; then build a kind of fingerprint gathering notable frequency peaks at various points in the song; then associate those peaks to find a matching sequence of frequency peaks over time. The author shared his implementation on GitHub https://github.com/notexactlyawe/abracadabra/blob/e0eb59a944d7c9999ff8a4bc53f5cfdeb07b39aa/abracadabra/recognise.py#L80 There was also a very good talk on this topic by Moustapha Agack at DevFest Toulouse https://www.youtube.com/watch?v=2i4nstFJRXU The associated peaks are hashes that can be compared: the more matching hashes, the more similar the songs. Methodologies A ThoughtWorks memo about AI-assisted coding https://martinfowler.com/articles/exploring-gen-ai.html#memo-08 With a whole list of questions to ask yourself when using a tool like Copilot. Keep in mind that, unfortunately, an AI is not right 100% of the time (in fact, barely more than half the time), so adjust your expectations accordingly; it isn't magic. The conclusion is interesting too, suggesting that roughly in 40 to 60% of situations, you can get to 40 to 80% of the solution. Is that the level at which you can really save time and trust the AI? Also, don't waste too much time trying to convince the AI to do what you want it to do. If you can't, it's probably because the AI can't manage it at all! 
So beyond 10 minutes, go read the docs, search Google, etc. In particular, having the AI generate the tests right after the code increases the risks, especially if you aren't able to review the code carefully. If it introduces a pattern choice (say, flexbox in CSS), or if security is involved, verify (belt and braces). Is it last week's framework? Then the information won't be in the LLM (without RAG). What capabilities are needed to deploy an AI/ML project https://blog.scottlogic.com/2023/11/22/capabilities-to-deploy-ai-in-your-organisation.html This is MLOps; there are a few end-to-end models (Google, IBM), but given the diversity of organizations, these complete versions are hard to embrace. MLOps is a profession, data science is a profession, so integrate those skills. Know how to manage your data catalog. Build a process to test your models, continuously. There is also the notion of a research culture and how to manage it (like a financial portfolio: accept stopping experiments, etc.); the research culture is rare in engineering, whose job is to build things that work. Note this is a pre-LLM world. Do you know the 10 dark patterns of UX? Designed to entice you to click here or there, to keep you on the site, and more https://dodonut.com/blog/10-dark-patterns-in-ux-design/ Among the dark patterns covered: Confirmshaming, Fake Urgency and the Fear of Missing Out, Nagging, Sneaking, Disguised Ads, Intentional Misdirection, The Roach Motel Pattern, Preselection, Friend Spam, Negative Option Billing or Forced Continuity. The article concludes with some leads on how to avoid these dark patterns: look at competitors' good patterns, test your UX interactions, and apply a lot of common sense! Dark patterns are not accidents; they rely on psychology and are put in place deliberately. How do you choose beautiful colors for data visualization? 
https://blog.datawrapper.de/beautifulcolors/ Rather than thinking in RGB, it's better to work in Hue/Saturation/Brightness. Plenty of examples show how to improve certain color choices. Avoid colors that are too pure, or too bright and saturated. Keep good contrast. Think of color-blind readers too! I've personally always struggled with saturation vs. brightness. Making the colors distinguishable in black and white first (by adjusting each color's brightness before restoring the hue) helps color-blind readers. Avoid colors from the four corners of the wheel; prefer complementary (nearby) colors instead. Red, orange and (unsaturated) yellow, plus variations of blue, work well. Saturated colors are aggressive and stress people. Why should you become an engineering manager? https://charity.wtf/2023/12/15/why-should-you-or-anyone-become-an-engineering-manager/ The article discusses how the perception of engineering management has evolved: it is no longer the default career choice for ambitious engineers. It highlights the challenges engineering managers face, including growing expectations around empathy, support and technical skills, as well as the impact of the COVID-19 pandemic on the appeal of management roles. The importance of good engineering managers is underlined, as they are seen as force multipliers for teams, contributing significantly to productivity, quality and overall success in complex organizational environments. The article gives reasons why someone might consider becoming an engineering manager, including gaining a better understanding of how companies work, contributing to mentorship, and influencing positive changes in team dynamics and industry practices. 
One perspective suggests that becoming an engineering manager can lead to personal growth and improved life skills, such as self-regulation, self-awareness, understanding others, setting boundaries, sensitivity to power dynamics, and mastering difficult conversations. The article encourages viewing management as an opportunity to develop and carry these skills through life. Security LogoFAIL, a boot-level flaw affecting many machines https://arstechnica.com/security/2023/12/just-about-every-windows-and-linux-device-vulnerable-to-new-logofail-firmware-attack/ In short, replacing the images displayed at boot allows arbitrary code execution at the very beginning of the securing of UEFI (the most widely used boot process), so it's game over, because it runs before the OS. It is not remotely exploitable: you already need to be on the machine with fairly high privileges, but it can be the final link in an attack chain. And, as usual, an image parser is the cause of these vulnerabilities. Conferences AI to the rescue of tech conferences: adding fake female tech speaker profiles to the program to pass online diversity checks. 
https://twitter.com/GergelyOrosz/status/1728177708608450705 https://www.theregister.com/2023/11/28/devternity_conference_fake_speakers/ https://www.developpez.com/actu/351260/La-conference-DevTernity-sur-la-technologie-s-e[…]s-avoir-cree-de-fausses-oratrices-generees-automatiquement/ I had read the tweet from the conference's creator explaining that these were test accounts and that, caught up in the rush, they had forgotten to remove them; but the test accounts apparently had "active" profiles on social networks, so it was carefully orchestrated. In the end, many speakers and sponsors are pulling out. The list of conferences, from the Developers Conferences Agenda/List by Aurélie Vache and contributors: January 31-February 3, 2024: SnowCamp - Grenoble (France) February 1, 2024: AgiLeMans - Le Mans (France) February 6, 2024: DevFest Paris - Paris (France) February 8-9, 2024: Touraine Tech - Tours (France) February 15-16, 2024: Scala.IO - Nantes (France) March 6-7, 2024: FlowCon 2024 - Paris (France) March 14-15, 2024: pgDayParis - Paris (France) March 19, 2024: AppDeveloperCon - Paris (France) March 19, 2024: ArgoCon - Paris (France) March 19, 2024: BackstageCon - Paris (France) March 19, 2024: Cilium + eBPF Day - Paris (France) March 19, 2024: Cloud Native AI Day Europe - Paris (France) March 19, 2024: Cloud Native Wasm Day Europe - Paris (France) March 19, 2024: Data on Kubernetes Day - Paris (France) March 19, 2024: Istio Day Europe - Paris (France) March 19, 2024: Kubeflow Summit Europe - Paris (France) March 19, 2024: Kubernetes on Edge Day Europe - Paris (France) March 19, 2024: Multi-Tenancy Con - Paris (France) March 19, 2024: Observability Day Europe - Paris (France) March 19, 2024: OpenTofu Day Europe - Paris (France) March 19, 2024: Platform Engineering Day - Paris (France) March 19, 2024: ThanosCon Europe - Paris (France) March 19-21, 2024: IT & Cybersecurity Meetings - Paris (France) March 19-22, 2024: KubeCon + CloudNativeCon Europe 2024 - Paris (France) March 26-28, 2024: Forum INCYBER Europe - Lille (France) March 28-29, 2024: SymfonyLive Paris 2024 - Paris (France) April 4-6, 2024: Toulouse Hacking Convention - Toulouse (France) April 17-19, 2024: Devoxx France - Paris (France) April 18-20, 2024: Devoxx Greece - Athens (Greece) April 25-26, 2024: MiXiT - Lyon (France) April 25-26, 2024: Android Makers - Paris (France) May 8-10, 2024: Devoxx UK - London (UK) May 16-17, 2024: Newcrafts Paris - Paris (France) May 24, 2024: AFUP Day Nancy - Nancy (France) May 24, 2024: AFUP Day Poitiers - Poitiers (France) May 24, 2024: AFUP Day Lille - Lille (France) May 24, 2024: AFUP Day Lyon - Lyon (France) June 2, 2024: PolyCloud - Montpellier (France) June 6-7, 2024: DevFest Lille - Lille (France) June 6-7, 2024: Alpes Craft - Grenoble (France) June 27-28, 2024: Agi Lille - Lille (France) July 4-5, 2024: Sunny Tech - Montpellier (France) September 19-20, 2024: API Platform Conference - Lille (France) & Online October 7-11, 2024: Devoxx Belgium - Antwerp (Belgium) October 10-11, 2024: Volcamp - Clermont-Ferrand (France) October 10-11, 2024: Forum PHP - Marne-la-Vallée (France) October 17-18, 2024: DevFest Nantes - Nantes (France) Contact us To react to this episode, come discuss it in the Google group https://groups.google.com/group/lescastcodeurs Reach us on Twitter https://twitter.com/lescastcodeurs Record a crowdcast or ask a crowdquestion Support Les Cast Codeurs on Patreon https://www.patreon.com/LesCastCodeurs All the episodes and all the info at https://lescastcodeurs.com/
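The song-recognition pipeline discussed in the Architecture item above (Fourier transforms → spectrogram → frequency-peak fingerprints, as in Shazam) can be sketched with NumPy. This toy version is my own illustration, not the abracadabra implementation:

```python
import numpy as np


def spectrogram(signal, frame_size=256, hop=128):
    """Magnitude spectrogram: windowed FFT frames slid over the signal."""
    window = np.hanning(frame_size)
    frames = [signal[i:i + frame_size] * window
              for i in range(0, len(signal) - frame_size, hop)]
    # rfft keeps only the non-negative frequencies of a real signal
    return np.abs(np.fft.rfft(frames, axis=1))


def peaks(spec, top=5):
    """Per-frame peak picking: keep the strongest frequency bins.

    A real fingerprinter pairs peaks across time and hashes the pairs;
    two recordings of the same song then share many identical hashes.
    """
    return [tuple(np.argsort(frame)[-top:]) for frame in spec]


# A 440 Hz tone sampled at 8 kHz: every frame should peak near bin
# 440 / (8000 / 256) ≈ 14.
t = np.arange(8000) / 8000.0
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(peaks(spec)[0])
```

Matching then reduces to counting how many fingerprint hashes a query clip shares with each song in the database, which is why the technique is robust to noise.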
News teases an intriguing update from Chris McCord hinting at a groundbreaking feature in Phoenix and Elixir's capabilities. José Valim proposes local accumulators in Elixir, stirring discussions on the future of coding elegance. Supabase launches the innovative "libcluster_postgres" library, promising to enhance Elixir node discovery with Postgres. And for those seeking to crunch numbers differently, a must-read blog post lays out a roadmap for translating NumPy code to Nx. Plus, Elixir enthusiasts are buzzing about this year's Advent of Code challenges—find out how the community tackles these puzzles with bespoke tooling and shared Livebooks strategies, and more! Show Notes online - http://podcast.thinkingelixir.com/180 (http://podcast.thinkingelixir.com/180) Elixir Community News - https://twitter.com/chris_mccord/status/1731668893213544900 (https://twitter.com/chris_mccord/status/1731668893213544900?utm_source=thinkingelixir&utm_medium=shownotes) – Teaser by Chris McCord hinting at a new development in Phoenix and LiveView as a potent alternative to something. - https://elixirforum.com/t/local-accumulators-for-cleaner-comprehensions/60130 (https://elixirforum.com/t/local-accumulators-for-cleaner-comprehensions/60130?utm_source=thinkingelixir&utm_medium=shownotes) – José Valim's proposal on ElixirForum for adding local accumulators for cleaner list comprehensions in Elixir. - https://elixirforum.com/t/introducing-for-let-and-for-reduce/44773 (https://elixirforum.com/t/introducing-for-let-and-for-reduce/44773?utm_source=thinkingelixir&utm_medium=shownotes) – A discussion from two years ago on ElixirForum about a different variation of the local accumulators proposal for Elixir. - https://twitter.com/kiwicopple/status/1730242820441588147 (https://twitter.com/kiwicopple/status/1730242820441588147?utm_source=thinkingelixir&utm_medium=shownotes) – Announcement of a newly released Elixir library called "libcluster_postgres" by Paul Copplestone from Supabase. 
- https://github.com/supabase/libcluster_postgres (https://github.com/supabase/libcluster_postgres?utm_source=thinkingelixir&utm_medium=shownotes) – GitHub repository for the "libcluster_postgres" library, used by Supabase for Elixir node discovery using a Postgres strategy. - https://www.thestackcanary.com/numpy-to-nx/ (https://www.thestackcanary.com/numpy-to-nx/?utm_source=thinkingelixir&utm_medium=shownotes) – A blog post that guides you through translating NumPy code to Nx with side-by-side examples. - https://adventofcode.com/ (https://adventofcode.com/?utm_source=thinkingelixir&utm_medium=shownotes) – Link to the official Advent of Code website; Advent of Code is a popular coding challenge held during the Christmas season. - https://github.com/mhanberg/advent-of-code-elixir-starter (https://github.com/mhanberg/advent-of-code-elixir-starter?utm_source=thinkingelixir&utm_medium=shownotes) – Mitch Hanberg's Advent of Code Starter Kit repository, which provides a template project for solving the Advent of Code challenges in Elixir. - https://notes.club (https://notes.club?utm_source=thinkingelixir&utm_medium=shownotes) – A platform that hosts a frontend for Livebooks on GitHub, organized by author, likes, and tags, useful for exploring how people are solving Advent of Code problems in Elixir. - https://github.com/ljgago/kino_aoc (https://github.com/ljgago/kino_aoc?utm_source=thinkingelixir&utm_medium=shownotes) – A GitHub repository for a Livebook Smart Cell which aids in solving Advent of Code directly from Livebook. - https://github.com/nettinho/smaoc (https://github.com/nettinho/smaoc?utm_source=thinkingelixir&utm_medium=shownotes) – Another Livebook Smart Cell repository on GitHub for Advent of Code that facilitates problem interaction within Livebook. Do you have some Elixir news to share?
Tell us at @ThinkingElixir (https://twitter.com/ThinkingElixir) or email at show@thinkingelixir.com (mailto:show@thinkingelixir.com) Find us online - Message the show - @ThinkingElixir (https://twitter.com/ThinkingElixir) - Message the show on Fediverse - @ThinkingElixir@genserver.social (https://genserver.social/ThinkingElixir) - Email the show - show@thinkingelixir.com (mailto:show@thinkingelixir.com) - Mark Ericksen - @brainlid (https://twitter.com/brainlid) - Mark Ericksen on Fediverse - @brainlid@genserver.social (https://genserver.social/brainlid) - David Bernheisel - @bernheisel (https://twitter.com/bernheisel) - David Bernheisel on Fediverse - @dbern@genserver.social (https://genserver.social/dbern) - Cade Ward - @cadebward (https://twitter.com/cadebward) - Cade Ward on Fediverse - @cadebward@genserver.social (https://genserver.social/cadebward)
Guest Sophia Vargas Panelists Richard Littauer | Leslie Hawthorn Show Notes In this episode, Richard and Leslie welcome guest Sophia Vargas, a Researcher and Program Manager at Google. They explore Sophia's journey from data center research to the open source ecosystem, and the tactical support she provides to projects. She highlights the challenge of understanding contributors' motivations, particularly in the context of financial incentives. The episode explores Google's role in open source, delves into the complexities of funding and motivation, and uncovers the often unseen "glue work" that binds open source communities together. Tune in to gain insights into the dynamic world of open source sustainability and the quest for a balanced ecosystem. Download this episode now! [00:02:07] Sophia explains her transition from data center infrastructure research to open source ecosystem research. She discusses her role in understanding how Google interacts with open source and supports projects. [00:05:26] Sophia emphasizes the importance of understanding motivation in open source contributions, noting that financial incentives aren't the primary driver. She discusses Google's role in open source and its investments in various programs and engagements. Her research also delves into understanding why people contribute to open source and what keeps them engaged. [00:09:42] We hear how Sophia's work overlaps between the CHAOSS community and her research at Google, particularly in metrics and understanding project dynamics. [00:12:16] Richard raises a question on how open source projects can receive funding without becoming overly dependent on it. Sophia explains that she's actively researching this topic to understand the dynamics of funding and motivation in open source. She mentions her previous research has focused on contributors' motivations, and now she's investigating how money impacts those motivations.
[00:16:48] Sophia emphasizes that the core focus of her research is on understanding individual contributors and how money might affect their involvement. She points out the challenges of determining the impact of different funding levels on contributors' behavior. [00:18:25] She mentions the potential impact of formal agreements and expectations tied to funding, and the discussion touches on how projects can shift from being hobbies to more professional roles because of funding. [00:20:31] Richard asks about existing research in fields beyond open source that might shed light on this issue, and Sophia mentions research on volunteer energy and discusses the gaps in understanding the complex relationship between individuals, their motivations, and funding. [00:22:49] Richard raises the question of whether the motivations of young individuals, particularly from the global south, are aligned with contributing to open source, and Sophia shares her thoughts, explaining why it's difficult to measure. [00:26:51] Leslie discusses the challenges of quantifying and acknowledging engagement that doesn't manifest as code commits or traditional contributions. Sophia adds to this, highlighting the "glue work" that often goes unnoticed, including tasks related to communication, event management, and coordination. She talks about ideas such as adapting processes to better track non-coding activities and using existing communication channels to reveal hidden contributions. [00:33:13] Richard wonders how one can effectively limit and define the scope of open source given its extensive nature. Sophia cites a research effort by the Complex Systems Center that aimed to count open source activity outside of GitHub to highlight the ecosystem's size, and she emphasizes the importance of seeking exposure to diverse open source spaces, projects, conferences, and ideals to avoid bias and gain a comprehensive understanding. [00:36:32] Find out where you can follow Sophia on the web.
Spotlight [00:37:24] Leslie's spotlight is her first boss, Joseph Nguyen. [00:37:53] Richard's spotlight is the Green Mountain Club and the Appalachian Mountain Club. [00:38:42] Sophia's spotlight is Inessa Pawson, a maintainer at NumPy. Links SustainOSS (https://sustainoss.org/) SustainOSS Twitter (https://twitter.com/SustainOSS?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) SustainOSS Discourse (https://discourse.sustainoss.org/) podcast@sustainoss.org (mailto:podcast@sustainoss.org) SustainOSS Mastodon (https://mastodon.social/tags/sustainoss) Open Collective-SustainOSS (Contribute) (https://opencollective.com/sustainoss) Richard Littauer Twitter (https://twitter.com/richlitt?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) Leslie Hawthorn Twitter (https://twitter.com/lhawthorn?lang=en) Sophia Vargas LinkedIn (https://www.linkedin.com/in/sophia-vargas-54608220) Google Open Source (https://opensource.google/) CHAOSS (https://chaoss.community/about-chaoss/) What motivates open source software contributors? (article) (https://opensource.com/article/21/4/motivates-open-source-contributors) The Shifting Sands of Motivation: Revisiting What Drives Contributors in Open Source (article) (https://arxiv.org/abs/2101.10291) Do I Belong? Modeling Sense of Virtual Community Among Linux Kernel Contributors (article) (https://arxiv.org/pdf/2301.06437.pdf) The penumbra of open source: projects outside of centralized platforms are longer maintained, more academic and more collaborative (article) (https://arxiv.org/pdf/2106.15611.pdf) Sustainability Forecasting for Apache Incubator Projects (article) (https://arxiv.org/pdf/2105.14252.pdf) ACROSS (Attributing Contributor Roles in Open Source Software) (https://whodoesthe.dev/) Why contributions count? 
Analysis of attribution in open source (article) (https://arxiv.org/abs/2103.11007) Green Mountain Club (https://www.greenmountainclub.org/) Appalachian Mountain Club (https://www.outdoors.org/) Inessa Pawson GitHub (https://github.com/InessaPawson) Credits Produced by Richard Littauer (https://www.burntfen.com/) Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/) Show notes by DeAnn Bahr Peachtree Sound (https://www.peachtreesound.com/) Special Guest: Sophia Vargas.
Want to help define the AI Engineer stack? Have opinions on the top tools, communities and builders? We're collaborating with friends at Amplify to launch the first State of AI Engineering survey! Please fill it out (and tell your friends)! If AI is so important, why is its software so bad? This was the motivating question for Chris Lattner as he reconnected with his product counterpart on TensorFlow, Tim Davis, and started working on a modular solution to the problem of sprawling, monolithic, fragmented platforms in AI development. They announced a $30m seed in 2022 and, following their successful double launch of Modular/Mojo
Python in Excel / Facebook's (almost) universal translator / Life sentence for crypto fraud / The Airbus 380 reconquers the skies / Space junk chaos. Sponsor: On August 23, the new original Star Wars series Ahsoka arrives on Disney Plus (trailer). We see the rebel Ahsoka Tano back on screen in an epic adventure full of action, intrigue, and overflowing emotion, which you can only enjoy on Disney Plus.
Topics covered in this episode: Differentiating between writing down dependencies to use packages and for packages themselves PythonMonkey Quirks of Python package versioning beartype Extras Joke Watch on YouTube About the show Sponsored by us! Support our work through: Our courses at Talk Python Training Python People Podcast Patreon Supporters Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Brian #1: Differentiating between writing down dependencies to use packages and for packages themselves Brett Cannon Why can't we just use pyproject.toml and stop using requirements.txt? Nope. At least not yet. They're currently for different things. pyproject.toml There's project.dependencies and project.optional-dependencies.tests that kinda would work for listing dependencies for an app. But you can't say pip install -r pyproject.toml. It doesn't work. And that's weird. project is intended for packaged projects. requirements.txt for applications and other non-packaged projects It has specific versions and works great with pip What then? Either we stick with requirements.txt Or we invent some other file, maybe requirements.toml? Or maybe (Brian's comment), add something like [application] and application.dependencies and application.optional-dependencies.tests to pyproject.toml Michael #2: PythonMonkey PythonMonkey is a Mozilla SpiderMonkey JavaScript engine embedded into the Python VM, using the Python engine to provide the JS host environment. This product is in an early stage, approximately 80% to MVP as of July 2023. It is under active development by Distributive. External contributions and feedback are welcome and encouraged. It will enable JavaScript libraries to be used seamlessly in Python code and vice versa — without any significant performance penalties.
Call Python packages like NumPy from within a JavaScript library, or use NPM packages like [crypto-js](https://www.npmjs.com/package/crypto-js) directly from Python. Executing WebAssembly modules in Python becomes trivial using the WebAssembly API and engine from SpiderMonkey. More details in Will Pringle's article. Brian #3: Quirks of Python package versioning Seth Larson Yes, we have SemVer, 1.2.3, and CalVer, 2023.6.1, and suffixes for pre-release, 1.2.3pre1. But it gets way more fun than that, if you get creative. Here are a few: v is an optional prefix, like v1.0 You can include an "Epoch" and separate it from the version with a !, like 20!1.2.3 Local versions with alphanumerics, periods, dashes, underscores, like 1.0.0+ubuntu-1. PyPI rejects those. That's probably good. Long versions. There's no max length for a version number. How about 1.2.3.4000000000000000001? Pre, post, dev aren't mutually exclusive: 1.0.0-pre0-post0-dev0 More craziness in article - Michael #4: beartype Beartype is an open-source PEP-compliant near-real-time pure-Python runtime type-checker emphasizing efficiency, usability, and thrilling puns. Annotate @beartype-decorated classes and callables with type hints. Call those callables with valid parameters: Transparent Call those callables with invalid parameters: Boom Traceback: raise exception_cls( beartype.roar.BeartypeCallHintParamViolation: @beartyped quote_wiggum() parameter lines=[b'Oh, my God! A horrible plane crash!', b'Hey, everybody! Get a load of thi...'] violates type hint list[str], as list item 0 value b'Oh, my God! A horrible plane crash!' not str. Extras Brian: Python Testing with Pytest Course Bundle: Limited Pre-Release Beta Use code PYTHONBYTES now through Aug 31 for 20% discount (discount extended through the end of the month) What's a pre-release beta? There's a video. Check out the link. Error-tolerant pytest discovery in VSCode Finally! But you gotta turn it on.
Also, I gotta talk to them about the proper non-capitalization of pytest. We're at RC1 for Python 3.12.0 Hard to believe it's that time of year again Michael: PyPI hires a Safety & Security Engineer, welcome Mike Fiedler PackagingCon October 26-28 Cloud Builders: Python Conf (born in Ukraine): September 6, 2023 | online Joke: Learning JavaScript
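The version quirks Seth describes are easy to poke at in code. Below is a rough sketch that handles only the optional v prefix, the epoch, and the dotted release segment; it is nowhere near the full PEP 440 grammar, which real code should get from the third-party packaging library's Version class:

```python
import re

# Minimal sketch of a few PEP 440-ish pieces: an optional "v" prefix,
# an optional epoch terminated by "!", and a dotted release segment.
# Not the full grammar (no pre/post/dev or local-version parts).
VERSION_RE = re.compile(
    r"^v?"                         # optional "v" prefix, e.g. v1.0
    r"(?:(?P<epoch>\d+)!)?"        # optional epoch, e.g. 20!1.2.3
    r"(?P<release>\d+(?:\.\d+)*)"  # dotted release numbers
)

def parse(version: str) -> tuple[int, tuple[int, ...]]:
    m = VERSION_RE.match(version)
    if not m:
        raise ValueError(f"unparsable version: {version!r}")
    epoch = int(m.group("epoch") or 0)
    release = tuple(int(part) for part in m.group("release").split("."))
    return epoch, release

print(parse("v1.0"))      # (0, (1, 0))
print(parse("20!1.2.3"))  # (20, (1, 2, 3))
# No max length: Python ints are unbounded, so this parses fine too.
print(parse("1.2.3.4000000000000000001")[1][-1])
```

Since there is no maximum version length, the last call happily returns the 19-digit final segment as a plain integer.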
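The beartype traceback quoted above comes from its real checker, which handles generics like list[str] with heavy optimization. As a toy sketch of the general idea of decorator-based runtime checking of annotations, and emphatically not beartype's actual implementation:

```python
import inspect
from functools import wraps

def toy_typecheck(func):
    """Toy runtime checker: validate annotated parameters on every call.

    Only handles plain classes (str, int, ...), unlike beartype, which
    also understands generics such as list[str].
    """
    sig = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            hint = func.__annotations__.get(name)
            if isinstance(hint, type) and not isinstance(value, hint):
                raise TypeError(
                    f"{func.__name__}() parameter {name}={value!r} "
                    f"violates type hint {hint.__name__}"
                )
        return func(*args, **kwargs)
    return wrapper

@toy_typecheck
def quote_wiggum(line: str) -> str:
    return line.upper()

print(quote_wiggum("A horrible plane crash!"))  # valid parameter: transparent
try:
    quote_wiggum(b"bytes, not str")             # invalid parameter: boom
except TypeError as exc:
    print(exc)
```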
This episode is dedicated to data analytics; the hosts invited Andrey Tatarinov of Epoch8 to discuss: how Andrey ended up on the episode; Epoch8; how web developers differ from those who "crunch data"; how Pandas works under the hood; columnar databases; the first problem in data processing; why strings in Python are hard; is everything so good with Pandas that NumPy is living out its last days?; "when using Pandas you don't need to fear speed degradation from copying"; optimizations; why Polars is fast; Polars vs. Pandas; whether to pick Polars instead of Pandas; optimizing large projects; "with Pandas 2.0 the advantage has become smaller"; the trend toward speedups and Rust; a joke and an answer to the statement "Polars is an incomplete replacement for Pandas"; an answer to the question of why inplace wasn't removed in Pandas 2.0; an answer to the question of how to optimize exporting reports to xls. Hosts: Mikhail Korneev and Grigory Petrov. Episode links: Misha's Telegram channel — https://t.me/tricky_python Moscow Python Telegram channel — https://t.me/moscow_python All episodes — https://podcast.python.ru MoscowPython meetups — https://moscowpython.ru Learn Python course — https://learn.python.ru/
We are now launching our dedicated new YouTube and Twitter! Any help in amplifying our podcast would be greatly appreciated, and of course, tell your friends! Notable follow-on discussions collected on Twitter, Reddit, Reddit, Reddit, HN, and HN. Please don't obsess too much over the GPT-4 discussion as it is mostly rumor; we spent much more time on tinybox/tinygrad, on which George is the foremost authority! We are excited to share the world's first interview with George Hotz on the tiny corp! If you don't know George, he was the first person to unlock the iPhone and jailbreak the PS3, went on to start Comma.ai, and briefly "interned" at the Elon Musk-run Twitter. Tinycorp is the company behind the deep learning framework tinygrad, as well as the recently announced tinybox, a new $15,000 "luxury AI computer" aimed at local model training and inference, aka your "personal compute cluster": * 738 FP16 TFLOPS * 144 GB GPU RAM * 5.76 TB/s RAM bandwidth * 30 GB/s model load bandwidth (big llama loads in around 4 seconds) * AMD EPYC CPU * 1600W (one 120V outlet) * Runs 65B FP16 LLaMA out of the box (using tinygrad, subject to software development risks) (In the episode, we also talked about the future of the tinybox as the intelligence center of every home that will help run models, at-home robots, and more.) Make sure to check the timestamps
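The "big llama loads in around 4 seconds" claim follows directly from the other spec lines. A back-of-the-envelope check, assuming 2 bytes per parameter for FP16 weights:

```python
# Sanity-check the tinybox claim that a 65B FP16 LLaMA loads in ~4 seconds,
# assuming 2 bytes per parameter for FP16 weights.
params = 65e9              # 65B-parameter LLaMA
bytes_per_param = 2        # FP16
load_bandwidth = 30e9      # 30 GB/s model load bandwidth, from the spec list

model_bytes = params * bytes_per_param       # 130 GB of weights
load_seconds = model_bytes / load_bandwidth
print(round(load_seconds, 1))  # 4.3
```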
Host Chris Adams is joined by the GSF's Asim Hussain on this episode of The Week in Green Software. They discuss some interesting news about Amazon, AWS and their scope 3 GHG protocol emission data. We also find out how Python has got its Mojo back and we have a very exciting tool from Catchpoint WebpageTest for measuring site's carbon footprint. Finally, some great green software events that you can be part of!
Guests Russell Keith-Magee | Uriel Ofir Panelist Richard Littauer Show Notes Hello and welcome to Sustain! The podcast where we talk about sustaining open source for the long haul. This is a special podcast and one of several in this series for GitHub's Maintainer Month. We're interviewing maintainers to ask them about their experience of open source and of living as maintainers. Our first guest is Dr. Russell Keith-Magee, who's the Founder of the BeeWare Project and Software Engineer at Anaconda working on BeeWare in the OSS team. Russell talks about starting the project, the challenges of transitioning from an author to a maintainer, and the role of Anaconda in the Python ecosystem. Then we'll have a conversation with our next guest, Uriel Ofir, who's the Founder and Manager of Ma'akaf, an open source community in Israel. Uriel tells us all about Ma'akaf, the importance of members being serious and proactive in contributing to the community, and how they encourage participants to contribute and improve their skills through an "open source party." Hit download to hear much more! Russell: [00:01:40] Russell explains his role at Anaconda and being the Founder of the BeeWare Project. [00:03:43] The role of Anaconda in the Python ecosystem and the company's open source offerings is discussed. [00:04:15] Russell discusses the process of starting the BeeWare project. [00:08:03] We hear about the funding problem in open source and how development is something that needs to be looked at. [00:10:15] He tells us the challenges of transitioning from an author to maintainer. [00:11:51] What's hard for Russell as a maintainer? He mentions struggling when you don't see progress and the difficulties of finding maintainers with the necessary skillset. [00:14:35] There's been a lot of effort trying to document the onboarding process and making it smoother for new contributors.
[00:15:21] Russell's excited about the prospect of iOS and Android becoming officially supported platforms in CPython and the progress they've made after nine years of work. [00:16:28] Find out where you can follow Russell and read about his work on the web. Uriel: [00:18:23] Our next guest, Uriel Ofir, joins us and he tells us about Ma'akaf. [00:22:28] He explains the community currently has around 350 members and four active projects. [00:26:09] He started the community even after he got a job and manages it by delegating tasks and empowering members to take responsibility for projects, and he emphasizes the importance of members being serious and proactive in contributing to the community. [00:29:48] Uriel shares how they encourage participants who may be hesitant to contribute to their open source community by hosting an "open source party" where everyone is welcome to introduce themselves and ask for help or advice. [00:31:43] Uriel tells us where people can join this group and follow him on the web. Quotes Russell: [00:02:08] "The Python ecosystem is only successful because of the open source component of it." [00:06:51] "I made some challenge coins that we would give out to anyone who made a contribution." [00:08:22] "Open source has a funding problem. There are big problems that don't get solved unless you have someone working on them full-time." [00:11:22] "Having to create the project was an inconvenience before I can get to the point of having others work on it with me." [00:11:29] "Open source really is a community.
The whole thing is built around people working together for a greater good.” [00:15:05] “The more you can remove every possible obstacle to someone getting that first contribution, the more likely they are to contribute, not necessarily more likely to hang around long term.” Uriel: [00:28:33] “The most important thing about open source is to be serious.” [00:32:07] “I really want people to contact me around the world because this idea is scalable, and I would be happy to help you with that.” Links SustainOSS (https://sustainoss.org/) SustainOSS Twitter (https://twitter.com/SustainOSS?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) SustainOSS Discourse (https://discourse.sustainoss.org/) podcast@sustainoss.org (mailto:podcast@sustainoss.org) SustainOSS Mastodon (https://mastodon.social/tags/sustainoss) Richard Littauer Twitter (https://twitter.com/richlitt?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) Richard Littauer Mastodon (https://mastodon.social/@richlitt) Russell Keith-Magee Mastodon (https://cloudisland.nz/@freakboy3742) BeeWare (https://beeware.org/) BeeWare GitHub (https://github.com/beeware) Sustain Podcast-Episode 64: Travis Oliphant and Russell Pekrul on NumPy, Anaconda, and giving back with FairOSS (https://podcast.sustainoss.org/64) Uriel Ofir Twitter (https://twitter.com/OfirUriel?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) Uriel Ofir GitHub (https://github.com/UrielOfir) Uriel Ofir LinkedIn (https://il.linkedin.com/in/uriel-ofir) International OS Party (https://maintainermonth.github.com/schedule/osparty_israel) Credits Produced by Richard Littauer (https://www.burntfen.com/) Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/) Show notes by DeAnn Bahr Peachtree Sound (https://www.peachtreesound.com/) Special Guests: Russell Keith-Magee and Uriel Ofir.
Are you still using loops and lists to process your data in Python? Have you heard of a Python library with optimized data structures and built-in operations that can speed up your data science code? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, returns to share secrets for harnessing linear algebra and NumPy for your projects.
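The loops-and-lists versus NumPy contrast the episode teases looks roughly like this; a generic illustration (mean-centering a small dataset), not an example taken from the episode:

```python
import numpy as np

# Loops-and-lists version: adjust each value one element at a time.
values = [2.0, 4.0, 6.0, 8.0]
mean = sum(values) / len(values)
slow = [v - mean for v in values]

# NumPy version: one vectorized expression over the whole array,
# executed in optimized C loops instead of the Python interpreter.
arr = np.array(values)
fast = arr - arr.mean()

print(slow)           # [-3.0, -1.0, 1.0, 3.0]
print(fast.tolist())  # [-3.0, -1.0, 1.0, 3.0]
```

On arrays of millions of elements, the vectorized form is typically orders of magnitude faster than the list comprehension.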
We're so glad to launch our first podcast episode with Logan Kilpatrick! This also happens to be his first public interview since joining OpenAI as their first Developer Advocate. Thanks Logan! Recorded in-person at the beautiful StudioPod studios in San Francisco. Full transcript is below the fold. Timestamps * 00:29: Logan's path to OpenAI * 07:06: On ChatGPT and GPT3 API * 16:16: On Prompt Engineering * 20:30: Usecases and LLM-Native Products * 25:38: Risks and benefits of building on OpenAI * 35:22: OpenAI Codex * 42:40: Apple's Neural Engine * 44:21: Lightning Round Show notes * Sam Altman's interview with Connie Loizos * OpenAI Cookbook * OpenAI's new Embedding Model * Cohere on Word and Sentence Embeddings * (referenced) What is AGI-hard? Lightning Rounds * Favorite AI Product: https://www.synthesia.io/ * Favorite AI Community: MLOps * One year prediction: Personalized AI, https://civitai.com/ * Takeaway: AI Revolution is here! Transcript [00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, swyx, writer and editor of L Space Diaries. Hey.[00:00:20] swyx: Hey. Our guest today is Logan Kilpatrick. What I'm gonna try to do is I'm gonna try to introduce you based on what people know about you, and then you can fill in the blanks.[00:00:28] Introducing Logan[00:00:28] swyx: So you are the first developer advocate at OpenAI, which is a humongous achievement. Congrats. You're also the lead developer community advocate of the Julia language. I'm interested in a little bit of that, and apparently, as I did a bit of research on you, you got into Julia through NASA, where you interned and worked on stuff that's gonna land on the moon, apparently.[00:00:50] And you also worked on computer vision at Apple, and had a stint at PathAI as you fell down the machine learning rabbit hole.
What should people know about you that's kind of not on your LinkedIn, that like sort of ties together your interests and story?[00:01:02] Logan Kilpatrick: It's a good question. I think so one of the things that is on my LinkedIn that wasn't mentioned, that's super near and dear to my heart and that sort of wraps a lot of my open source machine learning developer advocacy experience together, is supporting NumFOCUS.[00:01:17] And NumFOCUS is the nonprofit that helps enable a bunch of the open source scientific projects like Julia, Jupyter, Pandas, NumPy; all of those open source projects are facilitated legally and fiscally through NumFOCUS. So it's a very critical, important part of the ecosystem and something that I spend a bunch of my now more limited free time helping support.[00:01:37] So yeah, something that's, it's on my LinkedIn, but it's something that's important to me. Well,[00:01:42] swyx: it's not as well known of a name, so maybe people kind of skip over it cuz they were like, I don't know what[00:01:45] Logan Kilpatrick: to do with this. Yeah. It's super interesting to see that too. Just one point of context for that is we tried at one point to get a Wikipedia page for NumFOCUS, and it's providing, again, the infrastructure for, it's like a hundred plus open source scientific projects, and they're like, it's not notable enough.[00:01:59] I'm like, well, you know, there's something like 30 plus million developers around the world who use all these open source tools. It's like the foundation of all open source science that happens.
Every breakthrough in science, they discovered the black hole, the first picture of the black hole, all that stuff using NumFOCUS tools. The Mars Rovers, NumFOCUS tools. And it's interesting to see like the disconnect between the nonprofit that supports those projects and the actual success of the projects themselves.[00:02:26] swyx: Well, we'll get a bunch of people focused on NumFOCUS and we'll get it on Wikipedia. That is our goal. That is the goal, that is our shot. Is this something that you do often? You seem to always do a lot of community stuff. When you get into something, I don't know where you find time for this. You're also a conference chair for DjangoCon, which was last year as well. Do you fall down the rabbit hole of a language and then look for community opportunities? Is that how you get into it?[00:02:51] Logan Kilpatrick: Yeah, so the context for the Django stuff was I'd actually been teaching, and still am, through Harvard's Division of Continuing Education as a teaching fellow for a Django class, and had spent like two and a half years actually teaching students every semester how to program in Django, and realized that it was kind of the one ecosystem or technical tool that I was using regularly that I wasn't actually contributing back to.[00:03:13] So, I think sometime in 2021, I applied to be on the board of directors of the Django Events Foundation North America, who helps run DjangoCon, and was fortunate enough to join and serve as the chair of DjangoCon US, and then just actually rolled off the board because of all the craziness and having a lot less free time now.[00:03:32] And actually at PathAI, the core product was also using Django, so it also had a lot of connections to work, so it was a little bit easier to justify that time versus now at OpenAI.
We're not doing any Django stuff unfortunately, so.[00:03:44] swyx: Julia, I mean, should we talk about this? Like, are you defecting from Julia?[00:03:48] What's going on?[00:03:50] Logan Kilpatrick: It's actually felt a little bit strange recently because, for the longest time, and happy to talk about this in the context of Apple as well, the Julia ecosystem was my outlet to do a lot of the developer advocacy, developer relations community work that I wanted to do, because again, at Apple I was just like training machine learning models.[00:04:07] Before that, doing software engineering at Apple, and even at PathAI, we didn't really have a developer product, so I was doing like advocacy work, but it wasn't like developer relations in the traditional sense. So now that I'm so deeply doing developer relations work at OpenAI, it's really difficult to[00:04:26] continue to have the energy after I just spent nine hours doing developer relations stuff to like go and after work do a bunch more developer relations stuff. So I'll be interested to see for myself like how I'm able to continue to do that work. And the challenge is that it's such critical, important work.[00:04:43] Like I think the Julia ecosystem is so important. I think the language is super important. It's gonna continue to grow in popularity, and it's helping scientists and engineers solve problems they wouldn't otherwise be able to. So yeah, the burden is on me to continue to do that work, even though I don't have a lot of time now.[00:04:58] And I[00:04:58] Alessio Fanelli: think when it comes to communities, the machine learning technical community, I think in the last six to nine months has exploded. You know, you're the first developer advocate at OpenAI, so I don't think anybody has a frame of reference on what that means. What is that? So, what do you, how did, how the[00:05:13] swyx: job, yeah.[00:05:13] How do you define the job?
Yeah, let's talk about that. Your role.[00:05:16] Logan Kilpatrick: Yeah, it's a good question, and I think there's a lot of those questions that actually still exist at OpenAI today. Like I think a lot of traditional developer advocacy, at least like what you see on Twitter, which I think is what a lot of people's perception of developer advocacy and developer relations is, is like just putting out external content, going to events, speaking at conferences.[00:05:35] And I think OpenAI is very unique in the sense that, at least at the present moment, we have so much inbound interest that there is no desire for us to like do that type of developer advocacy work. So it's like more from a developer experience point of view actually. Like how can we enable developers to be successful?[00:05:53] And that at the present moment is like building a strong foundation of documentation and things like that. And we had a bunch of amazing folks internally who were doing some of this work, but it really wasn't their full-time job. Like they were focused on other things and just helping out here and there.[00:06:05] And for me, my full-time job right now is how can we improve the documentation so that people can build the next generation of products and services on top of our API. And it's, yeah, there's so much work that has to happen, but it's been a ton of fun so far. I find[00:06:20] swyx: being in developer relations myself, like, it's kind of like a fill-in-the-blanks type of thing.[00:06:24] Like you go to where you're needed the most. OpenAI has no problem getting attention. It is more that people are not familiar with the APIs and the best practices around programming for large language models, which is a thing that did not exist three years ago, two years ago, maybe one year ago.[00:06:40] I don't know. When you launched your API, I think you launched Dall-E as an API, or I, I don't[00:06:45] Logan Kilpatrick: know.
I dunno the history. I think DALL·E was second. I think GPT-3 launched, and then the API, like two years ago or something like that. And then DALL·E was, I think, a little more than a year ago.[00:06:58] And then now all the ChatGPT stuff has blown it all outta the water.[00:07:04] swyx: Which you have a waitlist for. Should we get into that?[00:07:06] Logan Kilpatrick: Yeah.[00:07:07] ChatGPT[00:07:07] Alessio Fanelli: Yeah, we would love to hear more about that. We were looking at some of the numbers: you went zero to like a million users in five days, and I think there are dozens of unofficial ChatGPT API wrappers on GitHub, and clearly people want the product.[00:07:21] Like, how do you think about that, and how developers can interact with it?[00:07:24] Logan Kilpatrick: It's absolutely, I think, one of the most exciting things that I can possibly imagine, to think about how much excitement there was around ChatGPT, and now getting to, hopefully at some point soon, put that in the hands of developers and see what they're able to unlock.[00:07:38] Like, I think ChatGPT has been a tremendous success, hands down, without a question, but I'm actually more excited to see what developers do with the API and being able to build those chat-first experiences. And it's really fascinating to see: five years ago or ten years ago, there was, you know, all this chatbot sort of, mm-hmm,[00:07:57] explosion. And then that all basically went away, and the hype went to other places. And I think now we're going to be closer to that sort of chat layer and all these different AI chat products and services. And it'll be super interesting to see if that sticks or not. I'm not sure.
Like, I think people have a lot of excitement for ChatGPT right now, but it's not clear to me that that's the UI or the UX; even though people really like it in the moment, whether that will stand the test of time, I just don't know.[00:08:23] And I think we'll have to do a podcast in five years, right, and check in and see whether or not people are still really enjoying that sort of conversational experience. I think it does make sense, though, cause that's how we all interact, and it's kind of weird that you wouldn't do that with AI products.[00:08:37] Alessio Fanelli: And I think the conversational interface has made a lot of people force the AI to hallucinate, you know, kind of come up with things that are not true, and really find all the edge cases. I think we're in the optimism camp, you know, like we see the potential. I think a lot of people like to be negative.[00:08:56] In your role, how do you think about evangelizing that, and kind of the patience that it sometimes takes for these models to mature?[00:09:03] Logan Kilpatrick: Yeah, I think what I've done is just continue to scream from the mountains that ChatGPT, in its current form, is definitely a research preview. The model that underlies ChatGPT, GPT-3.5, is not a research preview.[00:09:15] I think there's things that folks can do to definitely reduce the amount of hallucinations, and that's something that, over time, I have full confidence will be solved. Yeah, there's a bunch of interesting engineering challenges you have to solve in order to really fix that problem.[00:09:33] And I think, again, people are very fixated on the fact that in, you know, a few percentage points of the conversations, things don't sound really good. Mm-hmm.
I'm really more excited to see, when the APIs are in the hands of developers, what are the interesting solutions that people come up with. I think there's a lot that can be explored, and obviously OpenAI can't explore all of them, because we have this one product that's using the API.[00:09:56] And once you get 10,000, a hundred thousand developers building on top of that, we'll see what are the different ways that people handle this. And I imagine there's a lot of low-hanging-fruit solutions that'll significantly improve the amount of hallucinations that are showing up.[00:10:11] swyx: Talk about building on top of your APIs.[00:10:13] ChatGPT's API is not out yet, but let's assume it is. Let's say I'm building, and I have a choice between the GPT-3.5 and ChatGPT APIs. As far as I understand, they are kind of comparable. What should people know about deciding between either of them? Like, it's not clear to me what the difference is.[00:10:33] Logan Kilpatrick: It's a great question.[00:10:35] I don't know if we've made any public statements about what the difference will be. I think the point is that the interface for the ChatGPT API will be conversational-first, and that's not the case now. If you look at text-davinci-003, you just put in any sort of prompt.[00:10:52] It's not really built from the ground up to keep the context of a conversation and things like that. And so it's really: put in some sort of prompt, get a response. It's not always designed to be in that sort of conversational manner, so it's not tuned in that way. I think that's the biggest difference.[00:11:05] I think, again, the point that Sam made in the StrictlyVC talk, mm-hmm, which was incredible, and I think that talk got me excited. Which part? The whole thing.
And I think, I haven't been at OpenAI that long, so I obviously knew who Sam was and had seen a bunch of stuff, but obviously before a lot of the present craziness with Elon Musk, I used to think Elon Musk seemed like a really great guy, and he was solving all these really important problems, before all the stuff that happened.[00:11:33] That's a hot topic. Yeah. The stuff that's happened now, yeah, now it's much more questionable, and I regret having a Tesla. But I think Sam is actually similar in the sense that he's solving and thinking about a lot of the same problems that Elon still is today. But my take is that he seems like a much more aligned version of Elon.[00:11:52] Like, I really think he cares deeply about people, and I think he cares about solving the problems that people have and wants to enable people. And you can see this in the way that he's talked about how we deploy models at OpenAI. And I think you almost see Tesla on the completely opposite end of the spectrum, where they're like, whoa, we[00:12:11] put these 5,000-pound machines out there. Yeah. And maybe they'll run somebody over, maybe they won't. But it's all in the interest of advancement and innovation. I think that's really on the opposite end of the spectrum of what OpenAI is doing, I think, under Sam's leadership. So it's interesting to see that.[00:12:30] Alessio Fanelli: And I think Sam said that people could have built ChatGPT with what you offered, like, six, nine months ago.[00:12:35] swyx: I don't understand. Can we talk about this? You know what we're talking about, right? I do know what you're talking about. text-davinci-003 was not in the API six months before ChatGPT. What was he talking about?
Yeah.[00:12:45] Logan Kilpatrick: I think it's a little bit of a stretch, but I do think the underlying principle is that it comes back to prompt engineering. The way that you could have engineered the prompts that you put into 003 or 002, you would be able to basically get that sort of conversational interface, and you can do that now. And, you know, I've seen tutorials.[00:13:05] We have tutorials out. I mean, we have tutorials in the cookbook right now on GitHub where you can do this same sort of thing. It's all about how you ask for responses and the way you format data and things like that. The models are currently only limited by what people are willing to ask them to do.[00:13:24] Like, I really do think that you can do a lot of these things, and you don't need the ChatGPT API to build that conversational layer.[00:13:33] swyx: That is actually where I feel a little bit dumb, because I feel like I'm not smart enough to think of new things to ask the models. I have to see an example and go, oh, you can do that.[00:13:43] All right, I'm gonna do that for now. You know, and that's why I think the cookbook is so important, cuz it's kind of like a compendium of things we know about the model that you can ask it to do.[00:13:52] Logan Kilpatrick: I totally agree, and huge shout-out to the two folks who I work super closely with now on the cookbook, Ted and Boris, who have done a lot of that work in putting that out there. And, yeah, you see it as the number one trending repo on GitHub, and in my first couple of weeks at OpenAI it was super unknown; really, we were only sort of directing our customers to that repo.[00:14:13] Not because we were trying to hide it or anything, but just because
It was just the way that we were doing things. And then all of a sudden it got picked up on GitHub trending and a bunch of tweets went viral showing the repo, so now I think people are actually being able to leverage the tools that are in there.[00:14:26] And Ted's written a bunch of amazing tutorials, Boris as well. So I think it's awesome that more people are seeing those. And from my perspective, it's: how can we take those, make them more accessible, give them more visibility, and put them into the documentation? That connection doesn't exist right now, and I'm hopeful we'll be able to bridge those two things.[00:14:44] swyx: The cookbook is kind of a different set of documentation than API docs, and I think there's, you know, sort of existing literature about how you document these things and guide developers the right way. What I really like about the cookbook is that it actually cites academic research, so it's a nice way to not read the paper, but just read the conclusions of the paper.[00:15:03] Logan Kilpatrick: And I think that's a shout-out to Ted and Boris, cuz I think they're really smart in that way, and they've done a great job of finding the balance and understanding who's actually using these different tools.[00:15:13] So, yeah.[00:15:15] swyx: You give other people credit, but you should take credit for yourself. So I read, last week you launched some kind of documentation about rate limiting. Yeah. And one of my favorite things about reading that doc was seeing examples of, you know, you're telling people to do exponential backoff and retry, but you gave code examples with three popular libraries.[00:15:32] You didn't have to do that. You could have just told people, figure it out. Right. But I assume that was you. It wasn't.[00:15:38] Logan Kilpatrick: So I think that's, I mean, I'm helping sort of.
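The backoff-and-retry advice in that rate-limiting doc can be sketched without any third-party library. This is a minimal illustration, not the doc's official snippet: `RateLimitError` here is a placeholder for whatever exception a real API client raises, and the delay constants are arbitrary.

```python
import random
import time


class RateLimitError(Exception):
    """Placeholder for the rate-limit error a real API client would raise."""


def with_exponential_backoff(func, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call func(), retrying on RateLimitError with jittered exponential backoff."""
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Double the delay each attempt, cap it, and add jitter so many
            # clients retrying at once don't stampede in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.5))
```

In practice, libraries like `tenacity` or `backoff` package this same pattern up as decorators, which is roughly what the doc's library examples show.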
I think there's a lot of great stuff that people have done at OpenAI, but we have the challenge of: how can we make that accessible, get it into the documentation, and still have that high bar for what goes into the docs?[00:15:51] So my role recently has been helping support the team, building that documentation-first culture, and supporting the other folks who actually wrote that information. The information was actually already in the help center, but it wasn't in the docs and wasn't really focused on developers in that sense.[00:16:10] So yeah, I can't take the credit for the rate-limit stuff either.[00:16:13] swyx: No, this is all part of the team effort.[00:16:16] On Prompt Engineering[00:16:16] Alessio Fanelli: I was reading on Twitter, I think somebody was saying, in the future it'll be kind of like the Harry Potter world: people will have the spell book, they'll pull it out, and they'll do all the stuff in ChatGPT. When you talk with customers, are they excited about doing prompt engineering and kind of getting a starting point, or do they wish there was a better interface?[00:16:34] Logan Kilpatrick: Yeah, that's a good question. I think prompt engineering is so much more of an art than a science right now. Like, I think there are really systematic things that you can do, and different approaches and designs that you can take, but really it's a lot of: you kind of just have to try it and figure it out.
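Logan's earlier point, that a plain completion model like text-davinci-003 can be coaxed into a chat interface purely through prompt formatting, can be sketched as follows. The role labels and system line are illustrative assumptions, not an official format; the idea is just that the whole conversation is replayed as one text prompt and the model is asked to continue after the final label.

```python
def build_chat_prompt(history, user_message, system="You are a helpful assistant."):
    """Flatten a running conversation into a single completion-style prompt.

    history is a list of (role, text) pairs. A completion model would be asked
    to continue the text after the trailing "Assistant:" label, with generation
    stopped at the next "User:" line.
    """
    lines = [system, ""]
    for role, text in history:
        lines.append(f"{role}: {text}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")
    return "\n".join(lines)
```

Each turn, you append the model's reply to `history` and rebuild the prompt, which is how a conversational layer gets faked on top of a stateless completion endpoint.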
And I actually think this remains one of the challenges with large language models in general, not just at OpenAI but for everyone doing this: it's really difficult to understand what the capabilities of the model are and how to get it to do the things you want it to do.[00:17:05] And I think that's probably where a lot of folks need to do academic research, and companies need to invest in understanding the capabilities and limitations of these models, because it's really difficult to articulate the capabilities of a model without those types of things. So I'm hopeful, and we're hopefully shipping some new, updated prompt engineering stuff.[00:17:24] Cause I think the stuff we have on the website is old, and I think the cookbook actually has a little more up-to-date stuff. So hopefully we'll ship some new prompt engineering material in the short term and dispel some of the myths and rumors, but it's gonna continue to be a little bit of a pseudoscience, I would imagine.[00:17:41] And I also think the whole meme of prompt engineering being a job in the future is slightly overblown. You see this now, actually, with the tools that are showing up, and I forgot what the, I just saw one on Twitter.[00:17:57] swyx: The next guest that we are having on this podcast, LangChain.[00:17:59] Yeah. Yeah.[00:18:00] Logan Kilpatrick: LangChain, and Harrison coming on, yeah. There's a bunch of repos too that categorize and collect all the best prompts that you can put into ChatGPT, for example. And I saw the advertisement for someone to be a prompt engineer, and it was like $350,000 a year.[00:18:17] Mm-hmm. Yeah, that was Anthropic. Yeah, so it's just unclear to me how sustainable stuff like that is.
Cuz once you figure out the interesting prompts, like, right now it's kind of the Wild West, but in a year you'll be able to sort of categorize all those, and then people will be able to find all the good ones that are relevant for what they want to do.[00:18:35] And I think this goes back to having the examples being super important, and I'm with you as well: every time I use DALL·E, while it's rendering the image it gives you a suggestion of how you should ask for the art to be generated, like do it in a cyberpunk format, do it in a pixel art format,[00:18:53] et cetera, et cetera. And I really need that. I would never have come up with asking for those things had it not prompted me to ask that way, and now I always ask for pixel art stuff or cyberpunk stuff, and it looks so cool.[00:19:06] swyx: That's what I think is the innovation of ChatGPT as a format.[00:19:09] It reduces the need for getting everything into your prompt on the first try. Mm-hmm. It takes it from zero-shot to few-shot, if prompts can be counted as shots.[00:19:21] Logan Kilpatrick: Yeah, I think that's a great perspective, and again, this goes back to the UX/UI piece really being the differentiating layer from some of the other stuff that was already out there.[00:19:31] Because you could kind of do this before with 003 or something like that, if you just made the right interface and built some sort of prompt-retry interface, but I don't think people were really doing that. And I actually think that you really need that right now. And this is, again, going back to the difference between how you can use generative models versus large-scale[00:19:53] computer vision systems for self-driving cars: the answer doesn't actually need to be right all the time.
That's the beauty of large language models: they can be wrong 50% of the time, and it doesn't really cost you anything to regenerate a new response, and there's no critical safety issue with that.[00:20:09] I keep seeing these tweets about how you need that 99.99% reliability, the three nines or whatever it is. Mm-hmm. But you really don't need that, because the cost of regenerating the prompt is, again, almost nothing.[00:20:23] Alessio Fanelli: I think you tweeted a couple weeks ago that the average person doesn't yet fully grasp how GPT is gonna impact human life in the next four, five years.[00:20:30] Usecases and LLM-Native Products[00:20:30] Alessio Fanelli: I think you had an example in education. Yeah. Maybe touch on some of these examples of non-tech-related use cases that are enabled by GPT.[00:20:39] Logan Kilpatrick: I'm so excited, and there's a bunch of other random threads that come to my mind now. I saw a thread, and our VP of Product, Peter, was involved in that thread as well, talking about how the use of systems like ChatGPT will unlock almost low-to-zero-cost access to mental health services.[00:20:59] You know, you can imagine the same use case for education, like really personalized tutors. And it's so crazy to think about, but the technology is not actually that far away; it's truly an engineering problem at this point, of somebody using one of these APIs to build something like that, and then hopefully the models get a little bit better and make it better as well.[00:21:20] But I have no doubt in my mind that three years from now, that technology will exist for every single student in the world to have that personalized education experience, have a chat-based experience where they'll be able to
ask questions, and then the curriculum will just evolve and be constructed for them in a way that, and I think this is the cool part, keeps them engaged. It doesn't have to be the same delivery of curriculum that you've always seen, and this now supplements[00:21:49] the traditional education experience, in the sense that, you know, you don't need teachers to do all of this work. They can really do the thing that they're amazing at and not spend time grading assignments and all that type of stuff. I really do think all of that could be part of the system.[00:22:04] And same thing, I don't know if you all saw the DoNotPay lawyer situation. I just saw that Twitter thread, I think yesterday, around how they were going to use ChatGPT in the courtroom, and basically, I think it was the California Bar, or the bar institute, that said they were gonna send this guy to prison if he put AirPods in and started reading what ChatGPT was saying to him.[00:22:26] Yeah.[00:22:26] swyx: To give people the context, I think Josh Browder, the CEO of DoNotPay, was like, we will pay you money to put this AirPod into your ear and only say what we tell you to say from the large language model. And of course the judge was gonna throw that out. I mean, I don't see how you could allow that in your court.[00:22:42] Logan Kilpatrick: Yeah, but I really do think the reality is that, again, it's the same situation, where the legal space, even more so than education and mental health services, is not an accessible space. Especially with how overly legalized the United States is, it's impossible to get representation from a lawyer, especially if you're low-income.[00:23:04] So I'm optimistic those types of services will exist in the future.
And you'll be able to actually have a quality defense representative, or just some sort of legal counsel to answer these questions: what should I do in this situation? And like, I have some legal training and I still have those same questions.[00:23:22] I don't know what I would do in that situation. I would have to go and get a lawyer and figure that out, and it's tough. So I'm excited about that as well.[00:23:29] Alessio Fanelli: And when you think about all these vertical use cases, do you see the existing products implementing language models in what they have?[00:23:35] Or do you think we're just gonna see LLM-native products come to market and build brand[00:23:40] new experiences? Logan Kilpatrick: I think there'll be a lot of people who build the LLM-first experience, and I think that, at least in the short term, those are the folks who will have the advantage. I do think that the medium-to-long term is, again, thinking about what your moat is, because, again, everyone has access to, you know, ChatGPT and to the different models that we have available.[00:24:05] So how can you build a differentiated business? And I think a lot of it actually will come down to, and this is just true in the machine learning world in general, having unique access to data. So if you're some company that has really, really great data about the legal space or about the education space, you can use that and be better than your competition by fine-tuning these models or building your own specific LLMs.[00:24:28] So it'll be interesting to see how that plays out, but I do think that, from a product-experience standpoint, it's gonna be better in the short term for people who build the generative-AI-first experience versus people who are sort of bolting it onto their, mm-hmm,
existing product. Which is why, again, in the Google situation, they can't just put the prompt right below the search bar.[00:24:50] It would just be a weird experience, and they have to sort of defend the experience that they have. So it'll be interesting to see what happens. Yeah.[00:24:58] swyx: Perplexity is kind of doing that. So you're saying Perplexity will go after Google?[00:25:04] Logan Kilpatrick: I think that Perplexity has a chance in the short term to actually get more people to try the product, because it's something different. Whether they can, I haven't actually used it, so I can't comment on that experience, but I think the long term is: how can they continue to differentiate?[00:25:21] And that's really the focus. If you're somebody building on these models, your first thought should be: how do I build a differentiated business? And if you can't come up with ten reasons that you can build a differentiated business, you're probably not gonna succeed in building something that stands the test of time.[00:25:37] Yeah.[00:25:37] Risks and benefits of building on OpenAI[00:25:37] swyx: I think what's scary about that, as a potential founder or something myself, is that I would be building on top of OpenAI. I would be sending all my stuff to you for fine-tuning and embedding and what have you. By the way, fine-tuning, embedding, is there a third one? Those are the main two that I know of.[00:25:55] Okay. And yeah, that's the risk. I would be an OpenAI API reseller.[00:26:00] Logan Kilpatrick: Yeah. And again, this comes back down to having a clear sense of how what you're building is different.
Like, the people who are just OpenAI API resellers, you're not gonna have a successful business doing that, because everybody has access to the same models. Yeah.[00:26:15] Jasper's pretty great. Yeah, Jasper's pretty great, because I think they've been smart about how they've positioned the product. And I was actually a Jasper customer before I joined OpenAI and was using it to do a bunch of stuff, because the interface was simple, and because they had all the sort of customized options: if you want a response for this sort of thing, they'd pre-done that prompt engineering work for us.[00:26:39] I mean, you could really just put in exactly what you wanted, and then it would make that Amazon product description or whatever it is. So I think the interface is the differentiator for Jasper. And whether that stands the test of time, hopefully; cuz I know they've raised a bunch of money and have a bunch of employees, so I'm optimistic for them.[00:26:58] I think there's enough room as well for a lot of these companies to succeed. The space is gonna get so big so quickly that Jasper will be able to have a super successful business, and I think they are. I just saw some tweets from the CEO the other day, and I think they're doing well.[00:27:13] Alessio Fanelli: So I'm the founder of an LLM-native product. I log into OpenAI, and there's six million things that I can do. I'm on the playground, there's a lot of different models. How should people think about exploring the surface area? You know, where should they start, or should they go deeper into certain areas?[00:27:30] Logan Kilpatrick: I think six months ago it would've been a much different conversation, because people hadn't experienced ChatGPT before.[00:27:38] Now that people have experienced ChatGPT, I think there's a lot more
technical things that you should start looking into and thinking about, like the differentiators that you can bring. I still think that the playground we have today is incredible, cause it does something similar to what Jasper does: we have these very focused examples, like, you know, put in a topic and we'll generate you a summary, but in the context of explaining something to a second grader.[00:28:03] So I think all of those things give a sense, but we only have like 30 on the website or something like that. So really do a lot of exploration around: what is out there, what are the different prompts that you can use, what are the different things that you can build on? And I'm super bullish on embeddings. Like, embed everything, and that's how you can build cool stuff.[00:28:20] Boris, who I talked about before, who did a bunch of the cookbook stuff, tweeted the other day that his back-of-the-napkin math was that for 50 million bucks you can embed the whole internet. I'm like, some company's gonna spend the 50 million and embed the whole internet, and we're gonna find out what that product looks like.[00:28:40] But there's so many cool things that you could do if you did have the whole internet embedded. Yeah, and I wouldn't be surprised if Google did that, cuz 50 million is a drop in the bucket and they already have the whole internet, so why not embed it?[00:28:52] swyx: Can I ask a follow-up question on that?[00:28:54] Cuz I am just learning about embeddings myself. What makes OpenAI's embeddings different from other embeddings? It's okay if you don't have the numbers at hand, but I'm just like, why should I use OpenAI embeddings versus others? I don't understand.[00:29:06] Logan Kilpatrick: Yeah, that's a really good question.[00:29:08] So I'm still ramping up on my understanding of embeddings as well.
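The "embed everything" idea boils down to vector comparisons: embed your documents once, embed each query, and rank by similarity. A minimal, library-free sketch of that ranking step, using toy 3-dimensional vectors in place of real embedding output (actual embeddings are just much longer lists of floats):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

With real embeddings, each document and the query would first be sent through an embeddings endpoint; the ranking step stays the same.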
So the two things that come to my mind: one, going back to the 50-million-to-embed-the-whole-internet example, it's actually just super cheap. I don't know the comparisons to other prices, but at least from what I've seen people talking about on Twitter, the embeddings we have in the API are just significantly cheaper than a lot of other companies' embeddings.[00:29:30] Also the accuracy on some of the academic benchmarks used for embeddings: I was just looking back through the blog post from when we announced the new text embedding model, which is what powers the embeddings API, and on those metrics our API is just better.[00:29:50] So those are the two things. I'll go read it up. Yeah, it's a good blog post to read, I think the most recent one that came out. But also the original one, from when we first announced the embeddings API, has a little bit more context if you're trying to wrap your head around embeddings and how they work.[00:30:06] That one has the context; the new one just has the fancy new stuff and the metrics and all that kind of stuff.[00:30:11] swyx: I would shout out Hugging Face for having really good content around what these foundational concepts are. Because, you know, in Python you have like word2vec, my first embedding as someone getting into NLP.[00:30:24] But then developing the concept of sentence embeddings, as opposed to words, I think is super important. But yeah, it's an interesting form of lock-in as a business, because yes, I'm gonna embed all my source data, but then every inference needs an embedding as well.
And I think that is a risk to some people, because I've seen some builders who try to build on OpenAI call that out as a cost, as, like, you know, it starts to add a cost to every single query that you[00:30:48] make.[00:30:49] Logan Kilpatrick: Yeah, it'll be interesting to see how it all plays out, but my hope is that that cost isn't the barrier for people to build, because the cost of doing the incremental prompts and having them embedded is cents, less than cents.[00:31:06] swyx: But by cost I mean money and also latency,[00:31:08] yeah, which is, you're calling a different API. Yeah. Anyway, we don't have to get into that.[00:31:13] Alessio Fanelli: No, but I think embeddings are a good example. You had, I think, 17 versions of your first-generation API, and then you released the second generation: it's much cheaper, much better. And I think the word on the street is that when GPT-4 comes out, everything that came out before it is, like, trash.[00:31:29] It's got[00:31:30] Logan Kilpatrick: 100 trillion billion parameters, exactly, you don't understand. I think Sam has already confirmed that those are not true; the graphics are not real. Whatever you're seeing on Twitter about GPT-4, I think the direct quote was, you're begging to be disappointed by continuing to put that hype out.[00:31:47] So[00:31:48] Alessio Fanelli: if you're a developer building on these, what's kind of the upgrade path? You know, I've been building on model X, now this new model comes out. What should I do to be ready to move on?[00:31:58] Logan Kilpatrick: Yeah, I think with all of these types of models, folks have to think about the trade-offs, and there will also be[00:32:05] breaking changes, like any other sort of software improvement. Things like the prompts that you were previously expecting might not be the prompts that you're seeing now.
And you actually see this in the case of the embeddings example that you just gave: when we released text-embedding-ada-002 and it replaced the previous[00:32:26] 16 first-generation models, people went through this exact experience of, okay, I need to test out this new thing, see how it works in my environment. And I think the really fascinating thing is that the tools for doing this type of comparison don't exist yet today. If you're some company that's building on LLMs, you sort of just have to figure it out yourself: is this better in my use case?[00:32:49] Is this not better in my use case? It's really difficult to tell, because the possibilities using generative models are endless. So I think folks really need to focus on, and again this goes back to how to build a differentiated business, understanding what is the way that people are using your product, and how can you sort of automate and codify that in a way that makes it clear, when these different models come out, whether from OpenAI or other companies,[00:33:15] what is the actual difference between them and which is better for my use case. Because the academic benchmarks will be saturated, and people won't be able to use them as a point of comparison in the future. So it'll be important to think about, for your specific use case, how does it differentiate?[00:33:30] swyx: I was thinking about the value of frameworks, like LangChain and Dust and what have you out there.[00:33:36] I feel like there is some value to building those frameworks on top of OpenAI's APIs. It's kind of building what's missing, essentially what you guys don't have. But it's kind of important in the software engineering sense, like you have this
unpredictable, highly volatile thing, and you kind of need to build a stable foundation on top of it to make it more predictable, to build real software on top of it.[00:33:59] That's a super interesting kind of engineering problem.[00:34:03] Logan Kilpatrick: Yeah, it is interesting. There's also the added layer that the large language models are inherently not deterministic. We just shipped a small documentation update today which calls this out. And you think about APIs as a traditional developer experience:[00:34:20] I send some request; if the request is the same, I should get the same thing back every time, unless the data's updating from a time perspective. But that's not the case with the large language models, even with temperature zero. Mm-hmm. Even with temperature zero. Yep.[00:34:34] And that's the counterintuitive part, and I think someone was trying to explain to me that it has to do with, like, NVIDIA... yeah, floating points, yes, GPU stuff... and apparently the GPUs are just inherently non-deterministic. So, yes, there's nothing we can do.[00:34:48] swyx: PyTorch relies on this as well. If you want to fix this, you're gonna have to tear it all down.[00:34:53] Logan Kilpatrick: Maybe NVIDIA will fix it, I don't know. But I think it's a very unintuitive thing, and I don't think that developers really get that until it happens to them. And then you're sort of scratching your head and you're like, why is this happening?[00:35:05] And then you have to look it up and then you see all the NVIDIA stuff. Or hopefully our documentation makes it more clear now. I also think that's kinda the cool part as well. I don't know, it's like, you're not gonna get the same stuff even if you try to.[00:35:17] swyx: It's a little spark of originality in there.[00:35:19] Yeah, yeah.
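The floating-point point is easy to demonstrate without a GPU: floating-point addition is not associative, so any kernel that reorders a parallel sum (as GPU schedulers do) can change the result. A toy illustration in plain Python, not OpenAI's or NVIDIA's actual code:

```python
# Floating-point addition is not associative: summing the same numbers
# in a different order can give a different answer. GPU kernels sum in
# whatever order the hardware schedules them, which is one source of
# the non-determinism discussed above.
xs = [1e16, 1.0, -1e16, 1.0]

left_to_right = ((xs[0] + xs[1]) + xs[2]) + xs[3]  # 1e16 + 1.0 rounds away the 1.0
reordered = (xs[0] + xs[2]) + (xs[1] + xs[3])      # cancel the big terms first

print(left_to_right)  # 1.0
print(reordered)      # 2.0
```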
The random seed.[00:35:22] OpenAI Codex[00:35:22] swyx: Should we ask about[00:35:23] Logan Kilpatrick: Codex?[00:35:23] Alessio Fanelli: Yeah. I mean, I love Codex. I use it every day. One thing, sometimes the code is kinda like the ChatGPT hallucinations. One time I asked it to write a Twitter function that would pull the bio of this thing, and it wrote the whole thing, and then the endpoint didn't exist once I went to the Twitter docs. And I think there was one research study that said a lot of people using Copilot sometimes just autocomplete code that is wrong and then commit it, and it's a big[00:35:51] Logan Kilpatrick: thing.[00:35:51] swyx: Insecure code as well? Yeah, yeah. I saw that study.[00:35:54] Logan Kilpatrick: How do[00:35:54] Alessio Fanelli: you see the use case evolving? You obviously have a very strong partnership with Microsoft. Do you think Codex and VS Code will just keep improving there? Do you think there's a whole better layer on top of it? Like the project that won the Scale AI hackathon, which was basically telling the LLM, you're now the back end of a product,[00:36:16] and they didn't even have to write the code. It just understood. Yeah. How do you see the engineer... I think, Sean, you said Copilot is everybody gets their own junior engineer to write some of the code, and then you fix it. For me, a lot of it is the junior engineer gets a senior engineer to actually help them write better code.[00:36:32] How do you see that tension working between the model and the engineer?[00:36:36] Logan Kilpatrick: It'll be really interesting to see if there's other interfaces to this.
And I've actually seen a lot of people asking, it'd be really great if I had ChatGPT in VS Code, because in some sense it's just a better interface in a lot of ways than the autocomplete version, cuz you can re-prompt. And I know Copilot actually has that, where you can click and it'll pop up like ten suggested[00:36:59] different options, instead of brushes. Yeah, Copilot Labs, yeah. Instead of the one that it's providing. And I really like that interface. But again, this goes back to... I do inherently think it'll get better. I think it'll be able to do a lot more of the stuff as the models get bigger, as they get longer context. There's a lot of really cool things that will end up coming out, and I don't think it's actually very far away from being much, much better.[00:37:24] It'll go from the junior engineer to the principal engineer probably pretty quickly. I don't think the gap is really that large between where things are right now. Getting it from doing 60% of the stuff really well to doing 90% of the stuff really well, that's within reach in the next couple of years.[00:37:45] So I'll be really excited to see. And hopefully, again, this goes back to engineers and developers: people who aren't thinking about how to integrate these tools, whether it's ChatGPT or Copilot or something else, into their workflows to be more efficient, those are the people who I think will end up getting disrupted by these tools.[00:38:02] So figuring out how to make yourself more valuable than you are today using these tools will be super important for people. Yeah.[00:38:09] Alessio Fanelli: I actually used ChatGPT to debug a React hook the other day. And then I posted it in our Discord and I was like, hey guys, look at this thing.
It really helped me solve this.[00:38:18] And they were like, that's like the ugliest code I've ever seen. Why are you doing that? And I was like, I don't know, I'm just trying to get[00:38:24] Logan Kilpatrick: this thing to work, and I don't know React. So I'm like, that's the perfect... exactly, that's the perfect solution. I did this the other day where I was looking at React code, and I have very briefly seen React and run it like one time, and I was like, explain how this is working,[00:38:38] and change it in this way that I want. And it was able to do that flawlessly, and then I just popped it in. It worked exactly like I wanted.[00:38:45] swyx: I'll give a little bit more context, cause I was the guy giving you feedback on your code, and I think this is illustrative of how large language models can sort of be more confident than they should be. You asked it a question which is very specific, how to improve your code or fix your code,[00:39:00] whereas a real engineer would've looked at your code and gone, why are you doing this at all? Right? So there's a sort of sycophantic property of large language models: it accepts the basis of your question, whereas a real human might question your question. Mm-hmm. And it was just not able to do that. I mean, I don't see how it could do that.[00:39:17] Logan Kilpatrick: Yeah. It's interesting. I saw another example of this the other day as well with some ChatGPT prompt, and I agree. It'll be interesting to see. And again, not to go back to Sam's talk again, but he talked about this, and I think this makes a ton of sense: you should be able to have, and this isn't something that exists right now, but you should be able to have the model[00:39:39] tuned in the way that you wanna interact with it.
Like if you want a model that questions what you're asking it to do, you should be able to have that. And I actually don't think that that's as far away as some of the other stuff. It's a very possible engineering problem to tune the models in that way and ask clarifying questions, which is something it doesn't do right now.[00:39:59] It'll either give you the response or it won't give you the response, but it'll never say, hey, what do you mean by this? Which is super interesting, cuz we spend as humans like 50% of our conversational time being like, what do you mean by that? Can you explain more? Can you say it in a different way?[00:40:14] And it's fascinating that the model doesn't do that right now. It's interesting.[00:40:20] swyx: I have written a piece on what AGI Hard might be, which is the term that is being thrown around as a boundary for what requires a real AGI to do, and where you might sort of asymptotically approach it.[00:40:33] What people talk about is essentially a theory of mind: developing a conception of who I'm talking to and persisting that across sessions, which essentially ChatGPT, or any interface that you build on top of GPT-3 right now, would not be able to do. Right? You are persisting that history, but you're not building up a conception of what you know and where[00:40:54] I should fill in the blanks for you, or where I should question you. And I think that's the hard thing to understand: what will it take to get there? Because that to me, going back to your education thing, is the biggest barrier: the language model doesn't have a memory or understanding of what I know,[00:41:11] and it's too much to tell it what I don't know. Mm-hmm.
There's more that I don't know than I do know.[00:41:16] Logan Kilpatrick: I think the cool part will be when you're able to, like, imagine you could upload all of the stuff that you've ever done, all the texts, the work that you've ever done before, and[00:41:27] the model can start to understand: hey, what are the conceptual gaps that this person has, based on what you've said, based on what you've done? I think that would be really interesting. Like, I have good notes on my phone and I can still go back to see all of the calculus classes that I took, and I could put in all my calculus notebooks and all the assignments and stuff that I did in undergrad and grad school, and[00:41:50] basically be told: hey, here are the gaps in your understanding of calculus, go and do this right now. And I think in the education space, that's exactly what will end up happening. You'll be able to put in all the work that you've done; it can understand it, and then come up with custom-made questions and prompts, and be like, hey, explain this concept to me, and if[00:42:09] you can't do that, then it can sort of put that into your curriculum. I think Khan Academy, as an example, already does some of this personalized learning. You take assessments at the beginning of every Khan Academy module, and it'll basically only have you watch the videos and do the assignments for the things that you didn't test well on.[00:42:27] So it's sort of close to already being there in some sense, but it doesn't have the language model interface on top of it.[00:42:34] swyx: Before we get into our lightning round, which is quick-response questions, were there any other topics you wanted to cover? We didn't touch on Whisper.[00:42:40] We didn't touch on Apple.
Anything you wanted to[00:42:42] Logan Kilpatrick: talk about?[00:42:43] Apple's Neural Engine[00:42:43] Logan Kilpatrick: Yeah, I think the question around the Apple stuff and the Neural Engine will be really interesting to see how it all plays out. I don't know if you want to give the context around the Neural Engine Apple question.[00:42:54] swyx: Well, the only thing I know is because I've seen Apple keynotes.[00:42:57] Everyone has, you know. I have an M1 MacBook; they have some kind of neural chip, but I don't see it in my day-to-day life, so when is this gonna affect me, essentially? And you worked at Apple, so I was just gonna throw the question over to you: what should we[00:43:11] Logan Kilpatrick: expect out of this? Yeah.[00:43:12] The problem that I've seen so far with the Neural Engine on the Macs, and it's also in the phones as well, is that the actual API to talk to the Neural Engine isn't something that's commonly exposed. I'm pretty sure it's either not exposed at all, or Apple basically decides in the software layer[00:43:34] when it should kick in and when it should be used, which I think doesn't really help developers, and that's why no one is using it. I saw a bunch of rumors, and of course I don't have any good insight on this, that were talking about the main use cases for the Neural Engine stuff.[00:43:50] It's basically just in phantom mode now. I'm sure it's doing some processing, but the main use cases will be a lot of the AR/VR stuff that ends up coming out, when it gets much heavier processing on graphics stuff and doing all that computation. That's where it'll be.
It'll be super important.[00:44:06] And they've basically been able to trial this for the last six years, have it part of everything, and make sure that they can do it cheaply, in a cost-effective way. So it'll be cool to see when that comes out. I hope it comes out. That'll be awesome.[00:44:17] swyx: Classic Apple, right? They're not gonna be first, but when they do it, they'll make a lot of noise about it.[00:44:21] Yeah. It'll be[00:44:22] Logan Kilpatrick: awesome. Sure.[00:44:22] Lightning Round[00:44:22] Logan Kilpatrick: So, are we doing the lightning round?[00:44:24] Alessio Fanelli: Let's do it. All right. Favorite AI product, not[00:44:28] Logan Kilpatrick: OpenAI-built. I think Synthesia, synthesia.io. You can basically put in a text prompt, and they have a human avatar that will speak it, and you can basically make content, like educational videos.[00:44:44] And I think that's so cool, because as people who are making content, it's super hard to record video. It just takes a long time. You have to edit all the stuff, make sure you sound right, and then when you edit yourself talking it's super weird, cuz your mouth is there and things.[00:44:57] So having that, and just being able to ChatGPT a script and put it in. I also saw another demo of somebody generating slides automatically using some OpenAI stuff. I think that type of stuff. ChatBCG,[00:45:10] swyx: a fantastic name, best name of all time.[00:45:14] Logan Kilpatrick: I think that'll be cool. So I'm super excited.[00:45:16] swyx: But okay, just a follow-up question on that, because we're both in that sort of DevRel business: would you put AI Logan on your videos?[00:45:23] Logan Kilpatrick: A hundred percent.[00:45:23] swyx: Explain that.[00:45:23] Logan Kilpatrick: A hundred percent I would, because again, if it reduces the time for me, like,
I am already busy doing a bunch of other stuff,[00:45:31] and if I could take... I think the real use case, in the sense of creators wanting to be on every platform, is: if I could take the blog posts that I wrote and then have AI break them up into a bunch of things, have AI Logan[00:45:48] make a TikTok, make a YouTube video. I cannot wait for that. That's gonna be so nice. And I think there's probably companies who are already thinking about doing that.[00:45:53] swyx: I'm just worried, cuz people have this uncanny valley reaction of, oh, you didn't tell me what I just watched was an AI-generated thing, I hate you now. There's a little bit of ethics there, and I'm...[00:46:04] Logan Kilpatrick: Add the disclaimer at the top. Yeah. I also think people will build brands where their whole thing is AI content. I really do think there are AI influencers out there.[00:46:12] swyx: There are entire Instagram accounts, million-plus-follower accounts, of people who don't exist.[00:46:16] Logan Kilpatrick: I've seen that with the woman who's a Twitch streamer who's using, I don't know, that technology from movies where you're wearing a mask and it changes your facial appearance and all that stuff.[00:46:27] So I think there's people who find their niche, plus it'll become more common. So, cool.[00:46:32] swyx: My question would be: favorite AI people and communities that you wanna shout out?[00:46:37] Logan Kilpatrick: I think there's a bunch of people in the MLOps community. That seemed to have been the most exciting space: there was a lot of innovation, a lot of cool things happening in the MLOps space, and then all the generative AI stuff happened, and then all the MLOps people got overlooked too.[00:46:51] They're like, what's going on here?
So hopefully... I still think that MLOps and things like that are gonna be super important for getting machine learning to where it needs to be for us to get to AGI and all that stuff.[00:47:05] Alessio Fanelli: So a year from now, what will people be the most[00:47:06] Logan Kilpatrick: surprised by? I think AI is gonna get very, very personalized very quickly, and I don't think people have that feeling yet with ChatGPT. But I think that's gonna happen, and they'll be surprised by the amount of surface areas in which AI is present.[00:47:23] Right now it's really exciting cuz ChatGPT is the one place that you can sort of get that cool experience. But the people at Facebook aren't dumb, the people at Google aren't dumb. They're gonna have those experiences in a lot of different places, and I think that'll be super fascinating to see.[00:47:40] swyx: This is for the builders out there. What's an AI thing you would pay for if someone built it with their personal[00:47:45] Logan Kilpatrick: work? I think more stuff around transfer learning, like making transfer learning easier. I think that's truly the way to build really cool things: transfer learning, fine-tuning. And I don't think that there's enough of it.[00:48:04] Jeremy Howard, who created fast.ai, talks a lot about this. It's something that really resonates with me, and, for context, at Apple all the machine learning stuff that we did was transfer learning, because it was so powerful. And I think people have this perception that they need to[00:48:18] build things from scratch, and that's not the case. And I think, especially as large language models become more accessible, people need to build layers and products on top of this to make transfer learning more accessible to more people.
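The transfer-learning recipe described here (reuse a pretrained model, train only a small new piece on your own data) can be sketched in a few lines of PyTorch. The tiny backbone below is a stand-in for illustration; in a real workflow you would load pretrained weights, e.g. from torchvision or Hugging Face:

```python
import torch.nn as nn

# Stand-in "pretrained" backbone; in practice, load a real pretrained model.
backbone = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
)

# Freeze the backbone: its weights keep what was learned on the original task.
for param in backbone.parameters():
    param.requires_grad = False

# New task-specific head: the only part that trains on your (small) dataset.
head = nn.Linear(32, 5)
model = nn.Sequential(backbone, head)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the head's weight and bias are left to train
```

Passing only the head's parameters to the optimizer then makes training cheap enough to run on a small personal dataset, which is the personalized-AI angle above.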
So hopefully somebody builds something like that, and we can all train our own models.[00:48:33] I think that's how you get those personalized AI experiences: you put in your stuff, make transfer learning easy, everyone wins.[00:48:40] swyx: Just to vector in a little bit on this: in the Stable Diffusion community there's a lot of practice of, I'll fine-tune a custom version of Stable Diffusion and share it.[00:48:48] And there's also this concept of, well, first it was Textual Inversion and then DreamBooth, where you essentially train a concept that you can sort of add on. Is that what you're thinking about when you talk about transfer learning, or is that something[00:48:59] Logan Kilpatrick: completely different? I feel like I'm not as in tune with the generative image model community as I probably should be.[00:49:07] I think that makes a lot of sense. I think there'll be whole ecosystems and marketplaces that are sort of built around exactly what you just said, where you can fine-tune some of these models in very specific ways and you can use other people's fine-tunes. That'll be interesting to see.[00:49:21] But, Civitai is...[00:49:23] swyx: What's it called? Civitai, C-I-V-I-T-A-I. Yeah. It's where people share their Stable Diffusion checkpoints and concepts, and yeah, it's[00:49:30] Logan Kilpatrick: pretty nice. Do you buy them, or is it just free? Like, open source? It's... yeah. Cool. Even better.[00:49:34] swyx: I think people might want to sell them. There's a prompt marketplace,[00:49:38] PromptBase, yeah. Yeah. People hate it. They're like, this should be free. It's just text. Come on.[00:49:45] Alessio Fanelli: Hey, it's knowledge. All right. Last question.
If there's one thing you want everyone to take away about AI, what would it be?[00:49:51] Logan Kilpatrick: I think the AI revolution, you know, has been this story that people have been talking about for the longest time, and I don't think that it had happened.[00:50:01] It was really like, oh, AI's gonna take your job, AI's gonna take your job, et cetera, et cetera. And I think people have sort of laughed that off for a really long time, which was fair, because it wasn't happening. And I think now things are going to accelerate very, very quickly, and if you don't have your eyes wide open about what's happening, there's a good chance that you might get left behind.[00:50:21] So I'm really thinking deeply these days about how that is going to impact a lot of people. And I'm hopeful that the more widespread this technology becomes, the more mainstream this technology becomes, the more people will benefit from it, and hopefully not be affected in that negative way.[00:50:35] So use these tools, put them into your workflow, and hopefully that will accelerate you.[00:50:41] swyx: Well, we're super happy that you're at OpenAI getting this message out there, and I'm sure we'll see a l
Welcome to The Nonlinear Library, where we use text-to-speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!, published by StefanHex on January 24, 2023 on LessWrong. Produced as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort. What if I told you that in just one weekend you can get up to speed doing practical Mechanistic Interpretability research on Transformers? Surprised? Then this is your tutorial! I'll give you a view into how I research Transformer circuits in practice, show you the tools you need, and explain my thought process along the way. I focus on the practical side to get started with interventions; for more background see point 2 below. Prerequisites: Understanding the Transformer architecture: Know what the residual stream is, how attention layers and MLPs work, and how logits & predictions work. For future sections, familiarity with multi-head attention is useful. Here's a link to Neel's glossary, which provides excellent explanations for most terms I might use! If you're not familiar with Transformers you can check out Step 2 (6) on Neel's guide or any of the other explanations online; I recommend Jay Alammar's The Illustrated Transformer and/or Milan Straka's lecture series. Some overview of Mechanistic Interpretability is helpful: See e.g. any of Neel's talks, or look at the results in the IOI paper / walkthrough. Basic Python: Familiarity with arrays (as in NumPy or PyTorch, for indices) is useful; but explicitly no PyTorch knowledge required! No hardware required; a free Google Colab account works fine for this. Here's a notebook with all the code from this tutorial! PS: Here's a little web page where you can run some of these methods online! No trivial inconveniences! Step 0: Setup Open a notebook (e.g. Colab) and install Neel Nanda's TransformerLens (formerly known as EasyTransformer).
Step 1: Getting a model to play with That's it, now you've got a GPT2 model to play with! TransformerLens supports most relevant open source transformers. Here's how to run the language model Let's have a look at the internal activations: TransformerLens can give you a dictionary with almost all internal activations you ever care about (referred to as “cache”): Here you will find things like the attention pattern blocks.0.attn.hook_pattern, the residual stream before and after each layer blocks.1.hook_resid_pre, and more! You can also access all the weights & parameters of the model in model.named_parameters(). Here you will find weight matrices & biases of every MLP and Attention layer, as well as the embedding & unembedding matrices. I won't focus on these in this guide but they're great to look at! (Exercise: What can the unembedding biases unembed.b_U tell you about common tokens?) Step 2: Let's start analyzing a behavior! Let's go and find some induction heads! I'll make up an example: Her name was Alex Hart. When Alex, with likely completion Hart. TransformerLens has a little tool to plot a tokenized prompt, model predictions, and associated logits: I find it is useful to spend a few minutes thinking about which information is needed to solve the task: The model needs to Realize the last token, Alex, is a repetition of a previous occurrence The model needs to copy the last name from after the previous Alex occurrence to the last token as prediction Method 1: Residual stream patching The number 1 thing I try when I want to reverse engineer a new behavior is to find where in the network the information is “traveling”. In transformers, the model keeps track of all information in the residual stream. Attention heads & MLPs read from the residual stream, perform some computation or information moving, and write their outputs back into the residual stream. I think of this stream as having a couple of “lanes” corresponding to each token position. 
Over the course of the model...
Travis Oliphant is an impactful programmer and data scientist. He is the CEO of OpenTeams & Quansight, the founder of Anaconda, and the creator of NumPy, SciPy, and Numba. On this episode of Mock IT, he joins Marie and guest host Wilson to chat about trustworthy and ethical artificial intelligence (#AI) and machine learning (#ML). Follow Along: + Website: https://bit.ly/3VNM5xL + LinkedIn: https://bit.ly/3DY5oN7 + Instagram: https://bit.ly/3Tmi4mx + Open Jobs: https://bit.ly/3GAscny + Watch Episode: https://youtu.be/jL6AdYeDTaA Helpful Links: + Find Travis: linkedin.com/in/teoliphant/ + OpenTeams: https://bit.ly/3vVuWqw + Quansight: https://quansight.com/ + PyData #Python Event Recap: https://youtu.be/kMCnyLMhuJU + ChatGPT: https://openai.com/blog/chatgpt/ + Google Colab: https://colab.research.google.com/ +ChatGPT in Schools Article: https://apnews.com/article/what-is-chat-gpt-ac4967a4fb41fda31c4d27f015e32660
Watch on YouTube About the show Sponsored by Microsoft for Startups Founders Hub. Connect with us Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/stream/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Brian #1: PEP 703 - Making the GIL Optional in CPython Author: Sam Gross Sponsor: Łukasz Langa Draft status, but on Standards Track, targeting Python 3.12 Suggested by: Will Shanks “The GIL is a major obstacle to concurrency.” Especially for scientific computing. PEP 703 proposes adding a --without-gil build configuration to CPython to let it run code without the global interpreter lock, with the necessary changes needed to make the interpreter thread-safe. The PEP includes several issues with the GIL and scikit-learn, PyTorch, NumPy, Pillow, and other numerically intensive libraries. Python's GIL makes it difficult to use modern multi-core CPUs efficiently for many scientific and numeric computing applications. There's also a section on how the GIL makes many types of parallelism difficult to express. Changes are primarily in internals, and not much is exposed to public Python and C APIs: Reference counting Memory management Container thread-safety Locking and atomic APIs Includes information on all of these challenges. Distribution C-API extension authors will need access to a --without-gil Python to modify their projects and supply --without-gil versions. Sam is proposing “To mitigate this, the author will work with Anaconda to distribute a --without-gil version of Python together with compatible packages from conda channels.
This centralizes the challenges of building extensions, and the author believes this will enable more people to use Python without the GIL sooner than they would otherwise be able to.” Michael #2: FerretDB Via Jon Bultmeyer A truly Open Source MongoDB alternative MongoDB abandoned its Open-Source roots, changing the license to SSPL making it unusable for many Open Source and Commercial Projects. The core of our solution is a stateless proxy, which converts MongoDB protocol queries to SQL, and uses PostgreSQL as a database engine. FerretDB will be compatible with MongoDB drivers and will strive to serve as a drop-in replacement for MongoDB 6.0+. First release back in Nov 2022 I still love you MongoDB ;) Brian #3: Four tips for structuring your research group's Python packages David Aaron Nicholson Not PyPI packages, but, you know, directories with __init__.py in them. Corrections for mistakes I see frequently Give your packages and modules terse, single-word names whenever possible. Import modules internally, instead of importing everything from modules. Make use of sub-packages. Prefer modules with very specific names containing single functions over modules with very general names like utils, helpers, or support that contain many functions. Michael #4: Quibbler Quibbler is a toolset for building highly interactive, yet reproducible, transparent and efficient data analysis pipelines. One import statement and matplotlib becomes interactive. Check out the video on the repo page. Extras Brian: And now for something completely different: turtles talk Michael: More RSS recommendations FreshRSS a self-hosted RSS and Atom feed aggregator. Feedly (for AI) Flym for Android Readwise is very interesting RSS for courses at Talk Python New article: Dev on the Road Joke: Testing the program Joke: Every Cloud Architecture
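Circling back to the PEP 703 item above: the GIL problem it targets can be illustrated in a few lines. CPU-bound threads in today's CPython give correct results, but they take turns holding the interpreter lock instead of running in parallel, which is why the numeric libraries listed in the PEP care. A minimal sketch:

```python
import threading

def count_primes(lo, hi, out, idx):
    """Naive CPU-bound work; under the GIL, threads running this don't overlap."""
    out[idx] = sum(
        n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
        for n in range(lo, hi)
    )

results = [0, 0]
threads = [
    threading.Thread(target=count_primes, args=(2, 5_000, results, 0)),
    threading.Thread(target=count_primes, args=(5_000, 10_000, results, 1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results))  # 1229 primes below 10,000: correct answer, no parallel speedup
```

Today the standard workaround is multiprocessing (separate interpreters, separate GILs); a --without-gil build would let the two threads above actually use two cores.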
Beyond chatbots, virtual assistants, and the evolution of GPS systems, artificial intelligence (AI), machine learning (ML), and the tools behind making it all work are shaping the present and future world. It may feel like these technologies are in the infant stage, and perhaps they are, but #AI and #ML solutions are here to stay. Senior Data Scientist Wilson Rodden sat down with podcast hosts Liz and Marie to chat about how humans are kept in the loop of AI/ML, the common misconceptions behind the technology, and whether it's really as dangerous as some people think. Important Links: Meet Travis Oliphant (CEO of OpenTeams & Quansight, the founder of Anaconda, and the creator of NumPy, SciPy, and Numba) for free on Jan. 5: https://bit.ly/3vlQ8Wt Wilson Rodden LinkedIn: www.linkedin.com/in/wilson-rodden Follow Along for More: LinkedIn: https://bit.ly/3vlQ8Wt Instagram: https://bit.ly/3hRiN2F Website: https://bit.ly/3hxHFfe Timecodes: 1:28 Discussing the fourth anniversary of the IDEA Act 2:40 Five tips for managing stakeholder feedback as a UX'er 7:00 Interview with Data Scientist Wilson Rodden starts 8:50 What is AI/ML? 10:14 What is Big Data? 10:59 What is Predictive Modeling? 11:47 What is Computer Vision? 13:13 Common misconceptions about AI/ML 15:27 How to build trust in the government for AI innovations? 17:38 What is a biased algorithm? 24:34 What does human-in-the-loop mean? 35:22 Discussion on AI in the media 37:36 Is AI dangerous? 41:56 A data scientist's role on your team
Array Cast - December 9, 2022 Show Notes. Thanks to Bob Therriault, Adám Brudzewsky, Stephen Taylor and Nick Psaris for gathering these links: [01] 00:01:50 APLNAATOT podcast #4 https://www.youtube.com/watch?v=SxSd2Hma_Ro&list=PLYKQVqyrAEj8Q7BdOgakZCAGf6ReO1cue Naming the APLNAATOT podcast twitter https://twitter.com/a_brudz/status/1600523637253185541[02] 00:03:44 Advent of Code (AOC) Links BQN Solutions https://mlochbaum.github.io/BQN/community/aoc.html APL Wiki Advent of Code: https://apl.wiki/aoc K Wiki Advent of Code: https://k.miraheze.org/wiki/Advent_of_Code[03] 00:04:40 q AOC list http://github.com/qbists/studyq/[04] 00:06:20 Parsing the input for AOC in APL video https://www.youtube.com/watch?v=AHoiROI15BA Jay Foad's solution to day 6 https://github.com/jayfoad/aoc2022apl[05] 00:07:45 Nick Psaris episodes of the ArrayCast https://www.arraycast.com/episodes/episode-02-challenges-facing-the-array-languages https://www.arraycast.com/episodes/episode-03-what-is-an-array https://www.arraycast.com/episodes/episode-04-responding-to-listeners-email Q tips https://www.goodreads.com/book/show/25221469-q-tips Vector review of Q tips https://vector.org.uk/book-review-q-tips-fast-scalable-and-maintainable-kdb-2/ Fun Q https://www.amazon.com/dp/1734467509 Vector review of Fun Q https://vector.org.uk/book-review-fun-q-a-functional-introduction-to-machine-learning-in-q/[06] 00:09:33 Atari 800 https://en.wikipedia.org/wiki/Atari_8-bit_family[07] 00:10:09 Morgan Stanley https://www.morganstanley.com/ Perl Computer Language https://www.perl.org/ Hash Map https://www.geeksforgeeks.org/differences-between-hashmap-and-hashtable-in-java/amp/ VBA Computer Language https://en.wikipedia.org/wiki/Visual_Basic_for_Applications Java Computer Language https://en.wikipedia.org/wiki/Java_(programming_language) C++ Computer Language https://en.wikipedia.org/wiki/C%2B%2B STL https://en.wikipedia.org/wiki/C%2B%2B#Standard_library Bit representation: https://code.kx.com/q/ref/vs/#encode[08]
00:14:23 kdb https://en.wikipedia.org/wiki/Kdb%2B[09] 00:15:08 Abridged introduction to kdb+ https://legaldocumentation.kx.com/q/d/kdb+.htm Arthur Whitney https://en.wikipedia.org/wiki/Arthur_Whitney_(computer_scientist) Abridged introduction to q https://legaldocumentation.kx.com/q/d/q.htm kx archive https://code.kx.com/q/learn/archive/#archive Joins - https://code.kx.com/q/basics/joins/[10] 00:23:39 vs operator in q https://code.kx.com/q/ref/vs/[11] 00:24:50 sv operator in q https://code.kx.com/q/ref/sv/[12] 00:26:08 k6 Computer Language oK https://johnearnest.github.io/ok/index.html[13] 00:27:10 kx systems https://kx.com/[14] 00:31:48 Shakti https://shakti.com/ Arthur Whitney Articles ACMqueue B. Cantrill https://queue.acm.org/detail.cfm?id=1531242 kx interview https://web.archive.org/web/20150725231802/https://kx.com/arthur-interview.php[15] 00:32:20 Roger Hui https://aplwiki.com/wiki/Roger_Hui Roger Hui Memorial Episode of ArrayCast https://www.arraycast.com/episodes/episode13-roger-hui Ken Iverson https://aplwiki.com/wiki/Ken_Iverson[16] 00:34:45 Max and Min overloaded in k https://kparc.com/k.txt[17] 00:35:40 Where operator overloads Q https://code.kx.com/q/ref/where/#vector-of-non-negative-integers APL https://apl.wiki/Where BQN https://mlochbaum.github.io/BQN/doc/replicate.html#indices J https://code.jsoftware.com/wiki/Vocabulary/icapdot[18] 00:39:23 Day 6 AOC 2022 https://adventofcode.com/2022/day/6 Coding by successive approximation Romilly Cocking https://www.arraycast.com/episodes/episode34-romilly-cocking[19] 00:41:29 Emacs https://www.gnu.org/software/emacs/[20] 00:43:28 Iversonian or Array Language episode https://www.arraycast.com/episodes/episode39-iverson-or-array-language[21] 00:45:10 APL Computer Language https://en.wikipedia.org/wiki/APL_(programming_language)[22] 00:48:00 Q lists of lists https://code.kx.com/q4m3/3_Lists/#37-nesting[23] 00:50:25 SQL https://en.wikipedia.org/wiki/SQL[24] 00:54:08 JD: https://code.jsoftware.com/wiki/Jd/Overview 
DDB: https://dfns.dyalog.com/ddb_index.htm SQAPL: https://www.dyalog.com/uploads/documents/Dyalog_SQAPL_Server_Data_Sheet.pdf[25] 00:56:42 Joins https://code.kx.com/q/basics/joins/[26] 00:59:20 Q Dictionary https://code.kx.com/q/basics/dictsandtables/[27] 00:59:46 Combinators https://en.wikipedia.org/wiki/Combinatory_logic[28] 01:00:14 Multidimensional arrays in SQL https://en.wikipedia.org/wiki/SQL#Standardization_history[29] 01:02:07 Database Administrator https://en.wikipedia.org/wiki/Database_administrator Database Analyst - Quant https://en.wikipedia.org/wiki/Quantitative_analysis_(finance)[30] 01:04:21 kx User version of q https://kx.com/kdb-personal-edition-download/[31] 01:04:32 Python Computer Language https://www.python.org/ Pyq https://kx.com/blog/using-the-nag-library-for-python-with-kdb-and-pyq/ PyKX https://kx.com/videos/an-introduction-to-pykx/[32] 01:07:12 John Earnest https://beyondloom.com/about/index.html John Earnest ArrayCast Episode https://www.arraycast.com/episodes/episode41-john-earnest[33] 01:07:45 Numpy https://numpy.org/ R Computer Language https://en.wikipedia.org/wiki/R_(programming_language) Pandas https://pandas.pydata.org/[34] 01:10:55 CMU Grad Course https://www.cmu.edu/mscf/academics/curriculum/46982-market-microstructure-and-algorithmic-trading.html[35] 01:14:42 Matlab https://en.wikipedia.org/wiki/MATLAB[36] 01:16:05 k nearest neighbours algorithm https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm[37] 01:18:00 Q for Mortals https://code.kx.com/q4m3[38] 01:19:30 How to speed up Pandas Code by 100x? https://medium.com/geekculture/simple-tricks-to-speed-up-pandas-by-100x-3b7e705783a8[39] 01:22:30 contact AT ArrayCast DOT COM[40] 01:24:21 Old Master q https://en.m.wikipedia.org/wiki/Old_Master_Q
Emmanuel Turlay spent more than a decade in engineering roles at tech-first companies like Instacart and Cruise before realizing machine learning engineers need a better solution. Emmanuel started Sematic earlier this year and was part of the YC summer 2022 batch. He recently raised a $3M seed round from investors including Race Capital and Soma Capital. Thanks to friend of the podcast and former guest Hina Dixit from Samsung NEXT for the intro to Emmanuel.
I've been involved with the AutoML space for five years and, for full disclosure, I'm on the board of Auger which is in a related space. I've seen the space evolve and know how much room there is for innovation. This one's a great education about what's broken and what's ahead from a true machine learning pioneer.
Listen and learn...
How to turn every software engineer into a machine learning engineer
How AutoML platforms are automating tasks performed in traditional ML tools
How Emmanuel translated learning from Cruise, the self-driving car company, into an open source platform available to all data engineering teams
How to move from building an ML model locally to deploying it to the cloud and creating a data pipeline... in hours
What you should know about self-driving cars... from one of the experts who developed the brains that power them
Why 80% of AI and ML projects fail
References in this episode:
Unscrupulous users manipulate LLMs to spew hate
Hina Dixit from Samsung NEXT on AI and the Future of Work
Apache Beam
Eliot Shmukler, Anomalo CEO, on AI and the Future of Work
Benjy Weinberger is the co-founder of Toolchain, a build tool platform. He is one of the creators of the original Pants, an in-house Twitter build system focused on Scala, and was the VP of Infrastructure at Foursquare. Toolchain now focuses on Pants 2, a revamped build system.
Apple Podcasts | Spotify | Google Podcasts
In this episode, we go back to the basics and discuss the technical details of scalable build systems, like Pants, Bazel and Buck. A common challenge with these build systems is that it is extremely hard to migrate to them and have them interoperate with open source tools that are built differently. Benjy's team redesigned Pants with an initial hyper-focus on Python to fix these shortcomings, in an attempt to create a third generation of build tools - one that easily interoperates with differently built packages while staying fast and scalable.
Machine-generated Transcript
[0:00] Hey, welcome to another episode of the Software at Scale podcast. Joining me here today is Benjy Weinberger, previously a software engineer at Google and Twitter, VP of Infrastructure at Foursquare, and now the founder and CEO of Toolchain. Thank you for joining us. Thanks for having me. It's great to be here. Yes. Right from the beginning, I saw that you worked at Google in 2002, which is forever ago, like 20 years ago at this point. What was that experience like? What kind of change did you see as you worked there for a few years? [0:37] As you can imagine, it was absolutely fascinating. And I should mention that while I was at Google from 2002, that was not my first job. I have been a software engineer for over 25 years. And so there were five years before that where I worked at a couple of companies. I was living in Israel at the time, so my first job out of college was at Check Point, which was a big successful network security company. And then I worked for a small startup. And then I moved to California and started working at Google.
And so I had the experience that I think many people had in those days, and many people still do: the work you're doing is fascinating, but the tools you're given to do it with as a software engineer are not great. I'd had five years of experience of sort of struggling with builds being slow, builds being flaky, with everything requiring a lot of effort. There was almost a hazing-ritual quality to it: like, what makes you a great software engineer is struggling through the mud and through the quicksand with this, like, awful substandard tooling. And: we are not users, we are not people for whom products are meant, right? We make products for other people. Then I got to Google. [2:03] And Google, when I joined, was actually struggling with a very massive, very slow makefile that took forever to parse, let alone run. But the difference, which I had not seen anywhere else, was that Google paid a lot of attention to this problem, and Google devoted a lot of resources to solving it. And Google was the first place I'd worked, and I think in many ways still the gold standard, of: developers are first-class participants in the business and deserve the best products and the best tools, and if there's nothing out there for them to use, we will build it in house and we will put a lot of energy into that. And so for me, specifically as an engineer, [2:53] a big part of watching that growth in the sort of early to late 2000s was.
The growth of engineering process and best practices and the tools to enforce it. The thing I personally am passionate about is building CI, but I'm also talking about code review tools and all the tooling around source code management and revision control, and just everything to do with engineering process. It really was an object lesson, and so very, very fascinating, and it really inspired a big chunk of the rest of my career. I've heard all sorts of things, like Python scripts that had to generate makefiles, and finally they moved the Python to the first version of Blaze. So it's like, it's a fascinating history. [3:48] Maybe can you tell us one example of something that was, like, paradigm-changing that you saw, something that created, like, an order of magnitude difference in your experience there, and maybe your first aha moment on how good developer tools can be? [4:09] Sure. I think I had been used to using make basically up till that point. And Google again was, as you mentioned, using make and really squeezing everything it was possible to squeeze out of that lemon and then some. [4:25] But in the very early versions of what became Blaze, that big internal build system which inspired Bazel, which is the open source variant of that today, one thing that really struck me was the integration with the revision control system, which was, and I think still is, Perforce. I imagine many listeners are very familiar with Git. Perforce is very different. I can only partly remember all of the intricacies of it, because it's been so long since I've used it. But one interesting aspect of it was you could do partial checkouts. It really was designed for giant code bases. There was this concept of partial checkouts where you could check out just the bits of the code that you needed. But of course, then the question is, how do you know what those bits are? But of course the build system knows, because the build system knows about dependencies.
And so there was this integration, this back and forth between the, um, [5:32] Perforce client and the build system, that was very creative and very effective. And it allowed you to only have locally on your machine the code that you actually needed to work on the piece of the codebase you're working on, basically the files you cared about and all of their transitive dependencies. And that to me was a very creative solution to a problem that involved some lateral thinking about how seemingly completely unrelated parts of the toolchain could interact. And that's kind of what made me realize, oh, there's a lot of creative thought at work here, and I love it. [6:17] Yeah, no, I think that makes sense. Like, I interned there way back in 2016. And I was just fascinated by, I remember by mistake I ran, like, a grep across the code base and it just took forever. And that's when I realized, you know, none of this stuff is local. First of all, like, half the source code is not even checked out to my machine, and my poor grep command is trying to check that out. But also how seamlessly it would work most of the time behind the scenes. Did you have any experience, or did you start working on developer tools then? Or is that just what inspired you towards thinking about developer tools? I did not work on the developer tools at Google. I worked on ads and search and sort of Google products, but I was a big user of the developer tools, with one exception, which was that I made some contributions to the [7:21] protocol buffer compiler, which I think many people may be familiar with, and that is a very deep part of the toolchain that is very integrated into everything there. And so that gave me some experience with what it's like to hack on a tool that everyone, every engineer, is using, and that is a very deep part of their workflow. But it wasn't until after Google, when I went to Twitter, [7:56] that I noticed that in my time at Google the rest of the industry had not
caught up, and suddenly I was sort of thrust ten years into the past and was back to using very slow, very clunky, flaky tools that were not designed for the tasks we were trying to use them for. And so that made me realize, wait a minute, I spent eight years using these great tools. They don't exist outside of these giant companies. I mean, I sort of assumed that maybe, you know, Microsoft and Amazon and some other giants probably have similar internal tools, but there was nothing out there for everyone else. And so that's when I started hacking on that problem more directly, at Twitter, together with John, who is now my co-founder at Toolchain, who was actually ahead of me and ahead of the game at Twitter and had already begun working on some solutions, and I joined him in that. Could you maybe describe some of the problems you ran into? Like, were the builds just taking forever, or was there something else? [9:09] So there were... [9:13] A big part of the problem was that Twitter at the time, the codebase I was interested in and that John was interested in, was using Scala. Scala is a fascinating, very rich language. [9:30] Its compiler is very slow. And we were in a situation where, you know, you'd make some small change to a file and then builds would take just 10 minutes, 20 minutes, 40 minutes. The iteration time on your desktop was incredibly slow. And then CI times, where there was CI in place, were also incredibly slow because of this huge amount of repetitive or near-repetitive work. And this is because the build tools, etc.,
were pretty naive about understanding what work actually needs to be done given a set of changes. There's been a ton of work specifically on SBT since then. [10:22] It has incremental compilation and things like that, but nonetheless, that still doesn't really scale well to large corporate codebases that are what people often refer to as monorepos. If you don't want to fragment your codebase, with all of the immense problems that that brings, you end up needing tooling that can handle that situation. Some of the biggest challenges are: how do I do less than recompile the entire codebase every time? How can tooling help me be smart about what is the correct minimal amount of work to do [11:05] to make compiling and testing as fast as it can be? [11:12] And I should mention that I dabbled in this problem at Twitter with John. It was when I went to Foursquare that I really got into it, because Foursquare similarly had this big Scala codebase with a very similar problem of incredibly slow builds. [11:29] The interim solution there was to just upgrade everybody's laptops with more RAM and try and brute-force the problem. It was very obvious to everyone there (Foursquare still has lots of very, very smart engineers) that this was not a permanent solution, and we were casting around for... [11:54] you know, what could be smart about Scala builds. And I remembered this thing that I had hacked on at Twitter, and I reached out to Twitter and asked them to open source it so we could use it and collaborate on it; it wasn't obviously some secret sauce. And that is how the very first version of the Pants open source build system came to be. It was very much designed around Scala; it did eventually support other languages. And we hacked on it a lot at Foursquare to get it to... [12:32] to get the codebase into a state where we could build it sensibly.
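The "correct minimal amount of work" idea can be sketched as a toy invalidation pass over a dependency graph. The target names are made up, and this is only an illustration of the principle, not how Pants or any real build tool models builds:

```python
# target -> the targets it depends on (hypothetical example graph)
deps = {
    "app": {"lib", "util"},
    "lib": {"util"},
    "util": set(),
    "docs": set(),
}

def dirty(changed: str) -> set[str]:
    """Return the changed target plus everything that transitively
    depends on it - the only targets that need rebuilding."""
    out = {changed}
    while True:
        # Any target whose dependencies touch the dirty set is dirty too.
        more = {t for t, ds in deps.items() if ds & out} - out
        if not more:
            return out
        out |= more

print(sorted(dirty("util")))  # ['app', 'lib', 'util']
print(sorted(dirty("docs")))  # ['docs']
```

A change to `docs` invalidates nothing else, so a dependency-aware tool would skip recompiling `app` entirely; a naive tool rebuilds everything either way.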
So the one big challenge is build speed, build performance. The other big one is managing dependencies, keeping your codebase sane as it scales. Everything to do with: how can I audit internal dependencies? How do I make sure I don't create tangles? It is very, very easy to accidentally create all sorts of dependency tangles and cycles and create a code base whose dependency structure is unintelligible, really hard to work with, and actually impacts performance negatively, right? If you have a big tangle of dependencies, you're more likely to invalidate a large chunk of your code base with a small change. And so tooling that allows you to reason about the dependencies in your code base and [13:24] make it more tractable was the other big problem that we were trying to solve. Mm-hmm. No, I think that makes sense. I'm guessing you already have a good understanding of other build systems like Bazel and Buck. Maybe could you walk us through what the differences are for Pants V1? What are the major design differences? And even maybe before that, like, how was Pants designed? Is it something similar to, like, creating a dependency graph, where you need to explicitly include your dependencies? Is there something else that's going on? [14:07] Maybe just a primer. Yeah. Absolutely. So I should mention, I was careful to mention, you mentioned Pants V1. The version of Pants that we use today and base our entire technology stack around is what we very unimaginatively call Pants V2, which we launched two years ago almost to the day. That is radically different from Pants V1, from Buck, from Bazel. It is quite a departure in ways that we can talk about later. One thing that I would say Pants V1 and Buck and Bazel have in common is that they were designed around the use cases of a single organization. Bazel is an [14:56] open source variant of, or inspired by, Blaze; its design was very much inspired by
"here's how Google does engineering," and Buck similarly for Facebook, and Pants V1, frankly, very similar for [15:11] Twitter; and, because Foursquare also contributed a lot to it, we sort of nudged it in that direction quite a bit. But it's still very much: if you did engineering in this one company's specific image, then this might be a good tool for you. But you had to be very much in that lane. But what these systems all look like, and the way they are different from much earlier systems, is: [15:46] they're designed to work in large scalable code bases that have many moving parts and share a lot of code and that build a lot of different deployables: different, say, binaries or Docker images or AWS lambdas or cloud functions or whatever it is you're deploying, Python distributions, JAR files, whatever it is you're building. Typically you have many of them in this code base. Could be lots of microservices, could be just lots of different things that you're deploying. And they live in the same repo because you want that unity. You want to be able to share code easily. You don't want to introduce dependency hell problems in your own code. It's bad enough that we have dependency hell problems with third-party code.
So a naive build system, you'd say, run all the tests in the repo or in this part of the repo.So a naive system would literally just do that, and first they would compile all the code.[17:23] But a scalable build system like these would say, well, you've asked me to run these tests, but some of them have already been cached and these others, okay, haven't.So I need to look at these ones I actually need to run. So let me see what needs to be done before I can run them.Oh, so these source files need to be compiled, but some of those already in cache and then these other ones I need to compile. But I can apply concurrency because there are multiple cores on this machine.So I can know through dependency analysis which compile jobs can run concurrently and which cannot. And then when it actually comes time to run the tests, again, I can apply that sort of concurrency logic.[18:03] And so these systems, what they have in common is that they use dependency information to make your building testing packaging more tractable in a large code base.They allow you to not have to do the thing that unfortunately many organizations find themselves doing, which is fragmenting the code base into lots of different bits andsaying, well, every little team or sub team works in its own code base and they consume each other's code through, um, so it was third party dependencies in which case you are introducing a dependency versioning hell problem.Yeah. And I think that's also what I've seen that makes the migration to a tool like this hard. Cause if you have an existing code base that doesn't lay out dependencies explicitly.[18:56] That migration becomes challenging. If you already have an import cycle, for example.[19:01] Bazel is not going to work with you. You need to clean that up or you need to create one large target where the benefits of using a tool like Bazel just goes away. And I think that's a key,bit, which is so fascinating because it's the same thing over several years. 
And I'm hoping that,it sounds like newer tools like Go, at least, they force you to not have circular dependencies and they force you to keep your code base clean so that it's easy to migrate to like a scalable build system.[19:33] Yes exactly so it's funny that is the exact observation that let us to pans to see to so they said pans to be one like base like buck was very much inspired by and developed for the needs of a single company and other companies were using it a little bit.But it also suffered from any of the problems you just mentioned with pans to for the first time by this time i left for square and i started to chain with the exact mission of every company every team of any size should have this kind of tooling should have this ability this revolutionary ability to make the code base is fast and tractable at any scale.And that made me realize.We have to design for that we have to design for not for. What a single company's code base looks like but we have to design.To support thousands of code bases of all sorts of different challenges and sizes and shapes and languages and frameworks so.We actually had to sit down and figure out what does it mean to make a tool.Like this assistant like this adoptable over and over again thousands of times you mentioned.[20:48] Correctly, that it is very hard to adopt one of those earlier tools because you have to first make your codebase conform to whatever it is that tool expects, and then you have to write huge amounts of manual metadata to describe all of the dependencies in your,the structure and dependencies of your codebase in these so-called build files.If anyone ever sees this written down, it's usually build with all capital letters, like it's yelling at you and that those files typically are huge and contain a huge amount of information your.[21:27] I'm describing your code base to the tool with pans be to eat very different approaches first of all we said this needs to handle code bases as they are so if 
you have circular dependencies, it should handle them, and handle them gracefully and automatically. And if you have multiple conflicting external dependencies in different parts of your code base (this is pretty common, right: you need this version of, whatever, Hadoop or NumPy or whatever it is in this part of the code base, and you have a different, conflicting version in this other part of the code base), it should be able to handle that. If you have all sorts of dependency tangles and criss-crossing and all sorts of things that are unpleasant, and better not to have, but you have them, the tool should handle that. It should help you remove them if you want to, but it should not let those get in the way of adopting it. It needs to handle real-world code bases. The second thing is it should not require you to write all this crazy amount of metadata. And so with Pants V2, we leaned in very hard on dependency inference, which means you don't write these crazy build files. You write, like, very tiny ones that just sort of say, you know, here is some code in this language for the build tool to pay attention to. [22:44] But you don't have to add dependencies to them and edit them every time you change dependencies. Instead, the system infers dependencies by static analysis. So it looks at your code, and it does this at runtime. So, you know, almost all your dependencies, 99% of the time, the dependencies are obvious from import statements. [23:05] And there are occasional exceptions, and you can obviously customize this, because sometimes there are runtime dependencies that have to be inferred from, like, a string, so from a JSON file or whatever it is. So there are various ways to customize this, and of course you can always override it manually if you have to. Generally speaking, ninety-seven percent of the boilerplate that used to go into build files in those old systems, including Pants V1,
(you know, not claiming we did not make the same choice) goes away with Pants V2, for exactly the reason that you mentioned: these tools, because they were designed to be adopted once by a captive audience that has no choice in the matter, were designed for how that adopting code base already is. These tools are very hard to adopt. They are massive, sometimes multi-year projects outside of that organization. And we wanted to build something that you could adopt in days to weeks, and would be very easy to customize to your code base, and would not require these massive wholesale changes or huge amounts of metadata. And I think we've achieved that. Yeah, I've always wondered, like, why couldn't constructing the build file be a part of the build? In many ways, I know it's expensive to do that every time, so, just like [24:28] parts of the build that are expensive, you cache it and then you redo it when things change. And it sounds like you've done exactly that with Pants V2. [24:37] We have done exactly that. The results are cached on a per-file basis. So the very first time you run something, dependency inference can take some time. And we are looking at ways to speed that up. I mean, like no software system is ever done, right? It's extremely rare to declare something finished. So we are obviously always looking at ways to speed things up. But yeah, we have done exactly what you mentioned. We don't, I should mention, we don't generate the dependencies into build files; we don't edit build files and then have you check them in. We do that a little bit.
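Dependency inference from import statements, as described here, can be sketched with Python's stdlib `ast` module. This is a toy to show the idea of static analysis, not Pants's actual implementation:

```python
import ast

def infer_deps(source: str) -> set[str]:
    """Infer top-level module dependencies from a file's import statements."""
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                deps.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import within the same package
            if node.module and node.level == 0:
                deps.add(node.module.split(".")[0])
    return deps

src = "import numpy as np\nfrom pandas.io import parsers\nimport os.path\n"
print(sorted(infer_deps(src)))  # ['numpy', 'os', 'pandas']
```

The "you can always override it manually" escape hatch exists because this kind of analysis can't see dynamic imports (e.g. a module name stored in a string or a JSON config), which is exactly the exception Benjy mentions.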
So I mentioned you do still, with Pants V2, need these little tiny build files that just say, here is some code. They typically can literally be one line sometimes, almost like a marker file, just to say, here is some code for you to pay attention to. We're even working on getting rid of those. We do have a little script that generates those one time, just to help you onboard. But... [25:41] The dependencies really are just generated at runtime, on demand, as needed, and used at runtime. So we don't have this problem of trying to automatically add to or edit an otherwise human-authored file that is then checked in; this generating and checking in of files is problematic in many ways, especially when those files also have to take human-written edits. So we just do away with all of that, and the dependency inference is at runtime, on demand, as needed, sort of lazily done, and the information is cached. So it's cached in memory; Pants V2 has this daemon that runs and caches a huge amount of state in memory. And the results of running dependency inference are also cached on disk, so they survive a daemon restart, etc. I think that makes sense to me. My next question is going to be around why would I want to use Pants V2 for a smaller code base, right? Like, usually with a smaller codebase, I'm not running into a ton of problems around the build. [26:55] I guess, do you notice these inflection points that people run into? It's like, okay, my current build setup is not enough. What's the smallest codebase that you've seen that you think could benefit? Or is it, like, any codebase in the world, and I should start with a better build system rather than just Python setup.py or whatever? I think the dividing line is: will this code base ever be used for more than one thing? [27:24] So if you have a, let's take the Python example: if literally all this code base will ever do is build this one distribution, and a top-level setup.py is all I need.
and that is sometimes the case with open source projects: the code base is going to remain relatively small, say only ever a few thousand lines, and the tests, even if I run them from scratch every single time, take under five minutes. Then you're probably fine. But I think two things I would look at are: am I going to be building multiple things in this code base in the future, or am I certainly doing so now? And that is much more common with corporate code bases. You have to ask yourself: okay, my team is growing, more and more people are cooperating on this code base. I want to be able to deploy multiple microservices. I want to be able to deploy multiple cloud functions. I want to be able to deploy multiple distributions or third-party artifacts. I want to be able to run [28:41] multiple data science jobs, whatever it is that you're building. If you ever think you might have more than one, now's the time to think about: okay, how do I structure the code base, and what tooling allows me to do this effectively? And then the other thing to look at is build times.
If you're using compiled languages, then obviously compilation; in all cases, testing. If you start to see that tests are taking five minutes, 10 minutes, 15 minutes, 20 minutes, surely you want some technology that allows you to speed that up through caching, through concurrency, through fine-grained invalidation, namely: don't even attempt work that isn't necessary for the result that was asked for. Then it's probably time to start thinking about tools like this, because the earlier you adopt one, the easier it is to adopt. Don't wait until you've got a tangle of multiple setup.py files in the repo and it's unclear how you manage them and keep their dependencies synchronized so there aren't version conflicts across these different projects. Specifically with Python, this is an interesting problem. I would say that with other languages, because of the compilation step in JVM languages or Go, you [30:10] encounter the need for a build system, a build system of some kind, much much earlier, and then you ask yourself what kind. With Python, you can get by for a while just running, whatever, flake8 and pytest and mypy directly, with everything all together in a single virtualenv. But the Python tooling, as mighty as it is, mostly is not designed for larger code bases that deploy multiple things and have multiple different sets of [30:52] internal and external dependencies. The tooling generally implicitly assumes sort of one top-level setup.py, one top-level pyproject.toml: how are you configuring things? And so especially if you're using Python, say, for Django or Flask apps or for data science, and your code base is growing and you've hired a bunch of data scientists and there's more and more code going in there, with Python you need to start thinking about what tooling allows you to scale this code base. No, I think I mostly resonate with that.
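The tiny, near-marker BUILD files described a moment ago look roughly like this in Pants v2's Python backend. This is a sketch with hypothetical paths; target names like `python_sources` and `pex_binary` exist in recent Pants releases, but check the docs for your version:

```python
# src/py/myapp/BUILD -- hypothetical path. Often this one line is the entire
# file: it marks the directory as containing Python sources, and Pants infers
# the actual dependencies from import statements, on demand, at runtime.
python_sources()

# A deployable can be declared alongside; its transitive deps are inferred too.
pex_binary(name="main", entry_point="main.py")
```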
The first question that comes to my mind is: let's talk specifically about the deployment problem. If you're deploying to multiple AWS Lambdas or cloud functions or whatever, the first thought that comes to my mind is that I can use separate Docker images; that lets me easily produce a container image that I can ship independently. Would you say that's not enough? I totally get that for the build time problem a Docker image is not going to solve anything. But how about the deployment step? [32:02] So again, with deployments, I think there are two ways a tool like this can really speed things up. One is: only build the things that actually need to be redeployed. Because the tool understands dependencies and can do change analysis, it can figure that out. So one of the things that Pants v2 does is integrate with Git, so it natively understands how to work with Git diffs. You can say something like: show me all the, whatever, Lambdas, let's say, that are affected by changes between these two branches. [32:46] And it understands; it can say: well, these files changed, and I understand the transitive dependencies of those files, so I can see what actually needs to be redeployed. And in many cases, many things will not need to be redeployed because they haven't changed. The other thing is that there are a lot of performance improvements and process improvements around building those images.
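The change analysis just described, "show me the Lambdas affected by changes between these two branches," boils down to a reverse transitive closure over the dependency graph. A toy stdlib-only sketch (illustrative file names, not Pants internals):

```python
# Toy sketch: given a file-level dependency graph and the set of files changed
# between two git refs, find everything affected by the change by walking the
# reverse edges transitively.
from collections import defaultdict

def affected(deps: dict[str, set[str]], changed: set[str]) -> set[str]:
    """deps maps a file to the files it imports; returns every file whose
    transitive dependencies include a changed file (plus the changes)."""
    rdeps = defaultdict(set)  # reverse edges: dependency -> dependents
    for src, targets in deps.items():
        for t in targets:
            rdeps[t].add(src)
    seen, stack = set(changed), list(changed)
    while stack:
        node = stack.pop()
        for dependent in rdeps[node]:
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen

deps = {
    "lambda_a.py": {"util.py"},
    "lambda_b.py": {"other.py"},
    "util.py": set(),
    "other.py": set(),
}
# Only lambda_a depends on the changed file, so only it needs redeploying.
print(sorted(affected(deps, {"util.py"})))  # ['lambda_a.py', 'util.py']
```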
So, for example, for Python specifically we have an executable format called PEX, which stands for Python EXecutable: a single file that embeds all of the Python code that is needed for your deployable and all of its external requirements, transitive external requirements, all bundled up into this single, sort of self-executing file. This allows you to do things like: if you have to deploy 50 of these, you can basically have a single Docker image. [33:52] Then on top of that you add one layer for each of the fifty, and the only difference in each layer is the presence of that PEX file. Whereas without all this, typically what you would do is have fifty Docker images, in each one of which you have to build a virtualenv, which means running [34:15] pip as part of building the image, and that gets slow and repetitive, and you have to do it 50 times. We have a lot of ways to speed that up. Even if you are deploying 50 different Docker images, we have ways of speeding that up quite dramatically, because, again, of things like dependency analysis, the PEX format, and the ability to build incrementally. Yeah, I think I remember that at Dropbox we came up with our own par format to basically bundle up a Python binary; I think par stood for Python ARchive, I'm not entirely sure. But it did something remarkably similar to solve exactly this problem. It just takes so long, especially if you have a large Python code base. I think that makes sense to me. The other thing that one might ask is: with Python, you don't really have too long of a build time, is what you would guess, because there's nothing to compile. Maybe mypy takes some time to do some static analysis, and of course your tests can take forever and you don't want to rerun them. But there isn't that much of a build time that you have to think about.
Would you say that you agree with this, or are there some issues that end up happening on real-world code bases? [35:37] Well, that's a good question. The word "build" means different things to different people, and we've recently taken to using the term CI more, because I think it's clearer to people what that means. But when I say build, or CI, I mean it in the extended sense: everything you do to go from human-written source code to a verified, tested, deployable artifact. And so it's true that for Python there's no compilation step, although arguably running mypy is really important, and now that I'm really in the habit of using mypy, I will probably never not use it on Python code ever again. But there are [36:28] sort of build-ish steps for Python, such as type checking, such as running code generators like Thrift or Protobuf. And obviously a big, big one is resolving third-party dependencies, such as running pip or Poetry or whatever it is you're using. Those are all build steps. But with Python, really the big, big, big thing is testing and packaging, and primarily testing. With Python you have to be even more rigorous about unit testing than with other languages, because you don't have a compiler catching whole classes of bugs; and again, mypy and type checking really do help with that. So build, to me, build in the large sense, includes running tests, includes packaging, includes all the quality control that you run, typically in CI or on your desktop, in order to say: well, I've made some edits, and here's the proof that these edits are good and I can merge or deploy them. [37:35] I think that makes sense to me. And I certainly saw it, with the limited amount of type checking you can do with Python, and mypy is definitely improving on this.
You just need to unit test a lot to get the same amount of confidence in your code, and unit tests are not cheap. The biggest question that comes to my mind is: is Pants v2 focused on Python? Because I have a TypeScript code base at my workplace, and I would love to replace the TypeScript compiler with something slightly smarter that could tell me: you know what, you don't need to run every unit test on every change. [38:16] Great question. So when we launched Pants v2, which was two years ago, we focused on Python; that was the initial language we launched with, because you had to start somewhere. And in the ten years between the very Scala-centric work we were doing on Pants v1 and the launch of Pants v2, something really major happened in the industry, which was that Python skyrocketed in popularity. Python went from being mostly the little scripting language around the edges of your quote-unquote real code (I can use Python like fancy Bash) to people building massive, multi-billion-dollar businesses entirely on Python code bases. And there were a few things that drove this. The biggest one, I would say, was probably that Python became the language of choice for data science, and we have strong support for those use cases. Another was that Django and Flask became very popular for writing web apps, so more and more people were using those. There were more intricate DevOps use cases, and Python is very popular for DevOps, for various good reasons. So [39:28] Python became super popular. That was the first thing we supported in Pants v2, but we've since added support for Go, Java, Scala, Kotlin, and Shell. What we definitely don't have yet is JavaScript and TypeScript.
We are looking at that very closely right now, because that is the obvious next thing we want to add. Actually, if any listeners have strong opinions about what that should look like, we would love to hear from them, or from you, on our Slack channels or in our GitHub discussions, where we are having some lively discussions about exactly this. Because the JavaScript [40:09] and TypeScript ecosystem is already very rich with tools, and we want to provide only value-add, right? We don't want to say: oh, here's another paradigm you have to adopt; you've just finished replacing, whatever, npm with Yarn, and now you have to do this other thing. We don't want to be another flavor of the month. We only want to do work that uses and leverages the existing ecosystem but adds value. This is what we do with Python, and this is one of the reasons why our Python support is very, very strong, much stronger than any other comparable tool out there: [40:49] a lot of leaning in on the existing Python tool ecosystem, but orchestrating those tools in a way that brings rigor and speed to your builds. Now, I have used the word "we" a lot, and I just want to clarify who "we" is here. There is Toolchain, the company, where we're working on SaaS and commercial solutions around Pants, which we can talk about in a bit. But there is also a very robust open source community around Pants that is not tightly held by Toolchain, the company, in the way that some other companies' open source projects are. We have a lot of contributors and maintainers on Pants v2 who are not working at Toolchain but are using Pants in their own companies and their own organizations. And so we have a very wide range of use cases and opinions that are brought to bear.
And this is very important because, as I mentioned earlier, we are not trying to design a system for one use case, for one company or team. We are working on a system that we want [42:05] adopted over and over and over again at a wide variety of companies. So it's very important for us to have contributions and input from a wide variety of teams and companies and people, and it's very fortunate that we now do. I mean, on that note, the thing that comes to my mind is that another benefit of a scalable build system like Pants or Bazel or Buck is that you don't have to learn various different commands when you are spelunking through the code base, whether it's a Go code base or a Java code base or a TypeScript code base. You just run pants build X, Y, Z, and it can construct the appropriate artifacts for you. At least that was my experience with Bazel. Is that something that you are interested in? Does Pants v2 act as this meta layer above various other build tools, or is it much more specific and knowledgeable about the languages itself? [43:09] I think your intuition is correct. The idea is that we want you to be able to do something like pants test, give it a path to a directory, and have it understand what that means: oh, this directory contains Python code, therefore I should run pytest in this way; and oh, it also contains some JavaScript code, so I should run the JavaScript tests in this way. It basically provides a conceptual layer above all the individual tools that gives you uniformity across frameworks, across languages. One way to think about this is: [43:52] the tools are all very imperative. You have to run each of them with a whole set of flags and inputs, and you have to know how to use each one separately. So it's like having just the blades of a Swiss Army knife with no actual Swiss Army knife.
A tool like Pants will say: okay, we will encapsulate all of that complexity behind a much simpler command-line interface. So you can run, like I said, pants test or pants lint or pants fmt, and it understands. Oh, you asked me to format your code; I see that you have Black and isort configured as formatters, so I will run them. And I happen to know that, because formatting can change the source files, I have to run those sequentially. But lint doesn't change the source files, so I know that I can run multiple linters concurrently. That sort of logic. Different tools have different ways of being configured, or of telling you what they want to do, but we [44:58] are able to encapsulate all of that away from you. So you get this uniform, simple command-line interface that abstracts away a lot of the specifics of these tools and lets you run simple commands. And the reason this is important is that this extra layer of indirection is partly what allows Pants to apply things like caching [45:25] and invalidation and concurrency. Because what you're saying is, hey, the way to think about it is not "I am telling Pants to run the tests"; it is "I am telling Pants that I want the results of the tests", which is a subtle difference. Pants then has the ability to say: well, I don't actually need to run pytest on all these tests, because I already have cached results for some of them, so I will return those from cache. So that layer of indirection not only simplifies the UI, it provides the point where you can apply things like caching and concurrency. Yeah, I think every programmer wants to work with declarative tools. I think SQL is one of those things where you don't have to know how the database works. If SQL were somewhat easier, that dream would be fulfilled.
But I think we're all getting there. I guess the next question I have is: what benefit do I get from the Toolchain SaaS product versus plain Pants v2? When I think about build systems, I think about local development and I think about CI. [46:29] Why would I want to use the SaaS product? That's a great question. So Pants does a huge amount of heavy lifting, but in the end it is restricted to the resources of the machine on which it's running. When I talk about cache, I'm talking about the local cache on that machine. When I talk about concurrency, I'm talking about using the cores on your machine. So maybe your CI machine has four cores and your laptop has eight cores; that's the amount of concurrency you get, which is not nothing at all, which is great. [47:04] You know, as I mentioned, I worked at Google for many years, and then at other companies where distributed systems were the thing; I come from a distributed systems background. And if the problem is a piece of work taking a long time because of single-machine resource constraints, the obvious answer is to distribute the work, to use a distributed system. And that's what Toolchain offers, essentially. [47:30] You configure Pants to point to the Toolchain system, which is currently SaaS, and we will have some news soon about some on-prem solutions. And now the cache that I mentioned is not just "did this test run with these exact inputs before on my machine, by me, while I was iterating," but "has anyone in my organization, or any CI run, run this test before with these exact inputs?" So imagine a very common situation: you come in in the morning and you pull all the changes that have happened since you last pulled. Those changes presumably passed CI, right? And CI populated the cache. So now when I run tests, I can get cache hits from the CI machines. Pretty much, yeah.
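The org-wide cache hit described here depends on keys that are machine-independent. A minimal sketch, an assumed simplification (a real system fingerprints far more: tool versions, env vars, resolved dependencies), of deriving a key from only the inputs, the command, and the target platform:

```python
# Sketch: a shared remote-cache key derived purely from inputs, so a laptop
# can hit results that CI populated with the same content and platform.
import hashlib

def cache_key(files: dict[str, bytes], command: str, platform: str) -> str:
    h = hashlib.sha256()
    h.update(command.encode())
    h.update(platform.encode())
    for path in sorted(files):  # deterministic ordering of inputs
        h.update(path.encode())
        h.update(files[path])
    return h.hexdigest()

ci = cache_key({"test_app.py": b"def test(): pass"}, "pytest", "linux-x86_64")
laptop = cache_key({"test_app.py": b"def test(): pass"}, "pytest", "linux-x86_64")
mac = cache_key({"test_app.py": b"def test(): pass"}, "pytest", "macos-arm64")
print(ci == laptop, ci == mac)  # True False: platform differences break sharing
```

The last line is also why the platform mismatch discussed later in the episode hampers laptop-from-CI cache hits.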
And then with concurrency, again: let's say that, post-cache, there are still 200 tests that need to be run. I could run them eight at a time on my machine, or the CI machine could run them, say, four at a time on four cores, or I could run 50 or 100 at a time on a cluster of machines. That's where, again, as your code base gets bigger and bigger, some massive, massive speedups come in. I should mention that the remote execution I just described is something we're about to launch; it is not available today. The remote caching is. The other aspects are things like observability. When you run builds on your laptop or in CI, they're ephemeral. The output gets lost in the scrollback; it's just a wall of text that disappears. [49:39] With Toolchain, all of that information is captured and stored in structured form, so you have the ability to see past builds and build behavior over time, to search builds and drill down into individual builds, and to ask: how often does this test fail, when did this get slow, all that kind of information. So you get this more enterprise-level observability into a very core piece of developer productivity, which is iteration time. The time it takes to run tests, build deployables, and pass all the quality-control checks so that you can merge and deploy code relates directly to time to release. It relates directly to some of the core metrics of developer productivity: how long is it going to take to get this thing out the door? So having the ability both to speed that up dramatically by distributing the work, and to have observability into what work is going on, that is what Toolchain provides on top of the already, if I may say, pretty robust open source offering. [51:01] So yeah, that's kind of it. [51:07] Pants on its own gives you a lot of advantages, but it runs standalone.
Plugging it into a larger distributed system really unleashes the full power of Pants as a client to that system. [51:21] No, I think what I'm seeing is this interesting convergence. There are several companies trying to do this for Bazel, like BuildBuddy and EngFlow. So it really sounds like the build system of the future. Like, 10 years from now, [51:36] no one will really be developing on their local machines anymore. There's GitHub Codespaces on one side, where you're doing all your development remotely. [51:46] I've always found it somewhat odd that development that happens locally, and whatever scripts you need to run to provision your CI machine to run the same set of tests, are so different sometimes that you can never tell why something's passing locally and failing in CI, or vice versa. There really should just be this one execution layer that can say: you know what, I'm going to build at a certain commit, or run at a certain commit, and that's shared between the local user and the CI user. And your CI script is something as simple as pants build //..., and it builds the whole code base for you. So yeah, I certainly feel like the industry is moving in that direction. I'm curious whether you think the same. Do you have an even stronger vision of how folks will be developing 10 years from now? What do you think it's going to look like? Oh, no, I think you're absolutely right. I think, if anything, you're underselling it. I think this is how all development should be, and will be, in the future, for multiple reasons. One is performance. [52:51] Two is the problem of different platforms. A big thorny problem today is: I'm developing on my MacBook, so when I run tests locally, when I run anything locally, it's running on my MacBook. But that's not our deploy platform, right? Typically your deploy platform is some flavor of Linux.
So... [53:17] With the distributed-system approach, you can run the work in containers that exactly match your production environments. You don't even have to care about "will my tests pass on macOS?", or "do I need CI that runs on macOS just to make sure developers can pass tests on macOS", where that is somehow correlated with success in the production environment. You can cut away a whole suite of those problems. Today, frankly, the thing I mentioned earlier, getting cache hits on your desktop from CI populating the cache, is hampered by differences in platform. It is hampered by other differences in local setup that we are working to mitigate. But imagine a world in which build logic is not actually running on your MacBook, or if it is, it's running in a container that exactly matches the container you're targeting. It cuts away a whole suite of problems around platform differences and allows you to focus on just the platform you're actually going to deploy to. [54:35] And... [54:42] Just the speed and the performance of being able to work and deploy, and the visibility it gives you into the productivity and the operational work of your development team: I really think this absolutely is the future. There is something very strange about how, in the last 15 years or so, so many business functions have had the distributed-systems treatment applied to them. The upshot is that there are these massive, valuable companies providing systems that support sales, systems that support marketing, systems that support HR, systems that support operations, systems that support product management, systems that support every business function, and there need to be more of these that support engineering as a business function. [55:48] So I absolutely think the idea that I need a really powerful laptop, so that running my tests can take thirty minutes instead of forty minutes, when in reality it should take three minutes?
That's not the future, right? The future, as it has been for so many other systems, is to move to the web: the laptop that I can take anywhere, particularly in these work-from-home times, these work-from-anywhere times, is just a portal into the system that is doing the actual work. [56:27] Yeah. And there are all these improvements across the stack, right? When I see companies like Vercel, they're saying: if you use Next.js, we provide the best developer platform for it, and we want to provide caching. Then there are the lower-level build systems, of course, like Pants and Bazel and all that. And at each layer we're kind of trying to abstract the problem out. So to me it still feels like there is a lot of innovation to be done. And I'm also going to be really curious to see whether there are going to be a few winners of this space, or whether it's going to be pretty broken up, with everyone using different tools. It's going to be fascinating either way. Yeah, that's really hard to know. I think one thing you mentioned that is really important is that your CI should be as simple as just pants build colon colon, or whatever; our syntax would be sort of pants test or pants lint or whatever. I think that's really important. So [57:30] today, one of the big problems with CI, which is still a growing market, right, as more and more teams realize the value and importance of aggressive, automated quality control.
But configuring CI is really, really complicated. Every CI provider has its own configuration language, and you have to reason about caching, and you have to manually construct cache keys, to the extent that caching is even possible or useful. There's just a lot of figuring out how to configure and set up CI, and even then it's just doing the naive thing. [58:18] So there are a couple of interesting companies, Dagger and Earthly, and interesting technologies around simplifying that. They are providing, I would say, a better and more uniform config language that allows you to, for example, run build steps in containers. And that's not nothing at all. [58:43] But you are still manually creating a lot of configuration to run these very coarse-grained, large-scale, long-running build steps. I think the future is something like: my entire CI config, post cloning the repo, is basically pants build colon colon, because the system does the configuration for you. [59:09] It figures out what that means in a very fast, very fine-grained way, and does not require you to manually decide on workflows and steps and jobs and how they all fit together; nor, if I want to speed this thing up, to manually partition the work somehow and write extra config to implement that partitioning. That is the future, I think. Rather than having the CI layer, say the CI provider's proprietary config, or Dagger, and then underneath it the build tool, which would be Bazel or Pants v2 or whatever it is you're using (it could still be Make for many companies today, or Maven or Gradle), I really think the future is the integration of those two layers. In the same way that, as I mentioned much earlier in our conversation, one thing that stood out to me at Google was that they had the insight to integrate the version control layer and the build tool to provide really effective functionality, I think the build tool, being the thing that knows
about your dependencies, [1:00:29] can take over many of the jobs of the CI configuration layer in a really smart, really fast way. The future is one where essentially more and more of "how do I set up and configure and run CI" is delegated to the thing that knows about your dependencies, knows about caching, knows about concurrency, and is able to make smarter decisions than you can in a YAML config file. [1:01:02] Yeah, I'm excited for the time when I, as a platform engineer, get to spend less than 5% of my time thinking about CI and CD, and can focus on other things, like improving our data models, rather than mucking with YAML and Terraform configs. Well, yeah. Today we're still a little bit in that state, because we are engineers, and because the tools that we use are themselves made out of software, there's a strong impulse to tinker, and a strong impulse to say: well, I want to solve this problem myself, or I want to hack on it, or I should be able to hack on it. And you should be able to hack on it, for sure. But we do deserve more tooling that requires less hacking, and more things and paradigms that have been tested and have survived a lot of tire-kicking. [1:02:00] Will we always need to hack on them a little bit? Yes, absolutely, because of the nature of what we do. I think there are a lot of interesting things still to happen in this space. Yeah, I think we should end on that happy note, as we go back to our day jobs mucking with YAML. Well, thanks so much for being a guest. I think this was a great conversation, and I hope to have you on the show again sometime. I'd love that. Thanks for having me. It was fascinating. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
#circuitpythonparsec Speed up LED fades with numpy! To learn about CircuitPython: https://circuitpython.org more info on LED fades w numpy on Tod Kurt's blog here: https://todbot.com/blog/2022/10/21/speed-up-circuitpython-led-animations-10x/ Visit the Adafruit shop online - http://www.adafruit.com ----------------------------------------- LIVE CHAT IS HERE! http://adafru.it/discord Adafruit on Instagram: https://www.instagram.com/adafruit Subscribe to Adafruit on YouTube: http://adafru.it/subscribe New tutorials on the Adafruit Learning System: http://learn.adafruit.com/ -----------------------------------------
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing Squigglepy, a Python package for Squiggle, published by Peter Wildeford on October 19, 2022 on The Effective Altruism Forum. Squiggle is a "simple programming language for intuitive probabilistic estimation". It serves as its own standalone programming language with its own syntax, but it is implemented in JavaScript. I like the features of Squiggle and intend to use it frequently, but I also frequently want to use similar functionality in Python, especially alongside other Python statistical programming packages like NumPy, pandas, and Matplotlib. The squigglepy package implements many Squiggle-like functionalities in Python. The package also has useful utility functions for Bayesian networks (using rejection sampling), pooling forecasts (via weighted geometric mean of odds, among other methods), Laplace's rule of succession (including the time-invariant version), and Kelly betting. The package and documentation are available on GitHub. The package can be installed from PyPI using pip install squigglepy. This package is unofficial and supported by myself and Rethink Priorities. It is not affiliated with or associated with the Quantified Uncertainty Research Institute, which maintains the Squiggle language (in JavaScript). This package is also new and not yet in a stable production version, so you may encounter bugs and other errors. Please report those so they can be fixed. It's also possible that future versions of the package may introduce breaking changes. This package is available under an MIT license. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
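To illustrate the style of estimation Squiggle and squigglepy enable, here is a stdlib-only sketch. The names `norm` and `sample` are illustrative stand-ins, not squigglepy's actual API; see the package's GitHub README for the real interface.

```python
# Stdlib-only sketch: express an estimate as a 90% credible interval and draw
# samples from the implied normal distribution.
import random
import statistics

def norm(low: float, high: float):
    """Sampler for a normal whose central 90% interval is (low, high)."""
    mean = (low + high) / 2
    sd = (high - low) / (2 * 1.645)  # ~1.645 sd each side of the mean covers 90%
    return lambda: random.gauss(mean, sd)

def sample(dist, n: int = 10_000) -> list[float]:
    return [dist() for _ in range(n)]

random.seed(0)  # deterministic for the demo
xs = sample(norm(1, 3))  # "I think this quantity is between 1 and 3"
print(round(statistics.mean(xs), 2))  # close to 2.0
```

About 90% of the samples land inside (1, 3), matching the stated interval.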
Guest Melissa Mendonça Panelists Richard Littauer | Amanda Casari Show Notes Hello and welcome to Sustain! The podcast where we talk about sustaining open source for the long haul. Today, we are so excited to have a wonderful guest, Melissa Mendonça, joining us. Melissa is a Senior Developer Experience Engineer at Quansight, where she focuses more on developer experience and contributor experience. Today, we'll hear all about Quansight and the focus for Melissa's role as a Developer Experience Engineer. Melissa tells us about a grant they are working on with CZI that focuses on NumPy, SciPy, Matplotlib, and pandas, she shares several ideas on what can be done to make people feel seen and heard, and we hear her thoughts on what the future of community management and community development looks like for people entering the role of these projects. Go ahead and download this episode now to hear more! [00:01:25] Melissa tells us her background and her role at Quansight. [00:03:41] When Melissa made the decision to switch from one role to another, Amanda asks if that was her plan or if she learned that the skills that she needed to get things done changed over time. [00:06:10] We find out what the focus is for Melissa's role as a Developer Experience Engineer and what she does on a day-to-day basis. [00:08:43] As Melissa was talking about her projects that they work on at Quansight, Amanda wonders if that's the majority of her portfolio, or if she works across different kinds of projects. We learn about the current grant they are working on with CZI that focuses on NumPy, SciPy, Matplotlib, and pandas. [00:13:18] We learn about the funding model and how sustainable it is. [00:16:20] Melissa shares some great ideas on how we can put more effort into making people feel seen and heard. [00:19:26] Melissa details some things she learned with the open source projects and things she recommends for others with large established projects. 
[00:22:44] Amanda talks about a 2020 paper published in Nature called “Array programming with NumPy,” and Melissa gives us her perspective on what happened with the community in 2020, if things have changed, and what needs to be addressed. [00:27:09] Find out how CZI got involved with Melissa's work, what their goals are, and how she's changing in order to adapt towards those goals. [00:31:32] Melissa shares her thoughts on what the future of community management and community development looks like for people who are entering the role for those projects. [00:36:40] We hear more about Python Brasil 2022 that's coming up. [00:38:05] Find out where you can follow Melissa online and learn more about her work. Quotes [00:02:49] “Since Quansight is a company very focused on sustaining and helping maintain open source projects, we are trying to help new contributors, people who want to do the move from contributor to maintainer, understanding what that means, and how we can help them get there, and how we can help improve leadership in our open source projects.” [00:11:53] “This is one of the barriers that we want to break, is that making sure that people understand that these are important, they are core projects in the scientific Python ecosystem, but at the same time they are projects just like any other.” [00:12:17] “I think experience of working with projects that are so old and big has taught me a lot about the dynamics of how people work and how new people try to join these projects and how we can improve on that.” [00:16:41] “We need to make sure that people who do contribution outside of code are credited and that they are valued inside open source projects.” [00:18:20] “I think we should think about diversifying these paths for contribution, but for that we need to go beyond GitHub.
We need to go beyond the current metrics that we have for open source, we need to go beyond the current credit system and reputation system that we have for open source contributions.” [00:30:38] “Community managers are not second-class citizens.” Spotlight [00:39:21] Amanda's spotlight is a 2014 paper from MSR called “The Promises and Perils of Mining GitHub.” [00:40:48] Richard's spotlight is the book Don't Sleep, There Are Snakes, by Daniel Everett. [00:41:52] Melissa's spotlights are Ralf Gommers and the Scientific Python initiative. Links SustainOSS (https://sustainoss.org/) SustainOSS Twitter (https://twitter.com/SustainOSS?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) SustainOSS Discourse (https://discourse.sustainoss.org/) podcast@sustainoss.org (mailto:podcast@sustainoss.org) Richard Littauer Twitter (https://twitter.com/richlitt?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) Amanda Casari Twitter (https://twitter.com/amcasari?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) Melissa Mendonça Twitter (https://twitter.com/melissawm) Melissa Mendonça LinkedIn (https://br.linkedin.com/in/axequalsb) Melissa Mendonça GitHub (https://melissawm.github.io/) Quansight (https://quansight.com/) Quansight Labs (https://labs.quansight.org/) Quansight Lab Projects (https://labs.quansight.org/projects) Quansight Labs Team (https://labs.quansight.org/team) Sustain Podcast-Episode 57: Mikeal Rogers on Building Communities, the Early Days of Node.js, and How to Stay a Coder for Life (https://podcast.sustainoss.org/guests/mikeal) Sustain Podcast-Episode 85: Geoffrey Huntley and Sustaining OSS with Gitpod (https://podcast.sustainoss.org/85) Advancing an inclusive culture in the scientific Python ecosystem (CZI grant for NumPy, SciPy, Matplotlib, and pandas) (https://figshare.com/articles/online_resource/Advancing_an_inclusive_culture_in_the_scientific_Python_ecosystem/16548063) Sustain Podcast-Episode 79: Leah Silen on how NumFOCUS helps make scientific
code more sustainable (https://podcast.sustainoss.org/79) NumPy (https://numpy.org/) SciPy (https://scipy.org/) Matplotlib (https://matplotlib.org/) pandas (https://pandas.pydata.org/) Sustain Podcast-Episode 64: Travis Oliphant and Russell Pekrul on NumPy, Anaconda, and giving back with FairOSS (https://podcast.sustainoss.org/guests/oliphant) Tania Allard Twitter (https://twitter.com/ixek?lang=en) Array programming with NumPy (Nature) (https://www.nature.com/articles/s41586-020-2649-2) Python Brasil 2022 (https://2022.pythonbrasil.org.br/) “The Promises and Perils of Mining GitHub,” by Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, Daniela Damian (https://kblincoe.github.io/publications/2014_MSR_Promises_Perils.pdf) “The Promises and Perils of Mining GitHub,” by Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, Daniela Damian (ACM Digital Library) (https://dl.acm.org/doi/10.1145/2597073.2597074) Daniel Everett (Wikipedia) (https://en.wikipedia.org/wiki/Daniel_Everett#Don't_Sleep,_There_Are_Snakes:_Life_and_Language_in_the_Amazonian_Jungle) Excerpt: ‘Don't Sleep, There Are Snakes' (NPR) (https://www.npr.org/2009/12/23/121515579/excerpt-dont-sleep-there-are-snakes?t=1661871384424) Ralf Gommers (GitHub) (https://github.com/rgommers) Scientific Python (https://scientific-python.org/) Credits Produced by Richard Littauer (https://www.burntfen.com/) Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/) Show notes by DeAnn Bahr Peachtree Sound (https://www.peachtreesound.com/) Special Guest: Melissa Mendonça.
Array Cast - September 30, 2022 Show Notes
Many thanks to Bob Therriault for gathering these links:
[01] 00:03:08 ADSP podcast on K https://adspthepodcast.com/2022/09/23/Episode-96.html
[02] 00:03:30 Paradigm Conference 2022 https://esolangconf.com/
[03] 00:04:25 Troels Henriksen https://sigkill.dk/
[04] 00:05:05 Futhark https://futhark-lang.org/
[05] 00:06:12 Linux https://www.linux.org/
[06] 00:08:00 Textualize https://www.textualize.io/
[07] 00:08:27 Standard ML https://en.wikipedia.org/wiki/Standard_ML Common Lisp https://en.wikipedia.org/wiki/Common_Lisp Haskell https://www.haskell.org/
[08] 00:09:50 Cosmin Oancea http://hjemmesider.diku.dk/~zgh600/
[09] 00:10:53 OCaml https://ocaml.org/
[10] 00:12:20 NumPy https://numpy.org/ PyTorch https://github.com/pytorch/pytorch
[11] 00:13:07 Single Assignment C https://www.sac-home.org/index
[12] 00:13:20 Co-dfns https://github.com/Co-dfns/Co-dfns DEX https://github.com/google-research/dex-lang Accelerate for Haskell https://www.acceleratehs.org Copperhead https://github.com/bryancatanzaro/copperhead Tensorflow https://github.com/tensorflow/tensorflow JAX https://github.com/google/jax
[13] 00:18:39 PhD Position https://employment.ku.dk/phd/?show=157471
[14] 00:20:17 Experiential Learning https://en.wikipedia.org/wiki/Experiential_learning
[15] 00:21:21 DIKU https://di.ku.dk/english/ Hiperfit http://hiperfit.dk/ Simcorp https://en.wikipedia.org/wiki/SimCorp Dyalog https://www.dyalog.com/
[16] 00:23:00 TAIL http://hiperfit.dk/pdf/array14_final.pdf apltail https://github.com/melsman/apltail Martin Elsman https://elsman.com/
[17] 00:29:17 Parametric Polymorphism https://en.wikipedia.org/wiki/Parametric_polymorphism
[18] 00:32:06 Jay Foad https://aplwiki.com/wiki/Jay_Foad https://docs.dyalog.com/latest/Compiler%20User%20Guide.pdf
[19] 00:33:00 Tacit Programming https://aplwiki.com/wiki/Tacit_programming
[20] 00:36:30 Mandelbrot set https://en.wikipedia.org/wiki/Mandelbrot_set
[21] 00:41:07 Typed Array Languages https://mlochbaum.github.io/BQN/implementation/compile/intro.html#typed-array-languages
[22] 00:42:05 Leading Axis Array Theory https://aplwiki.com/wiki/Leading_axis_theory
[23] 00:43:56 Ken Iverson https://en.wikipedia.org/wiki/Kenneth_E._Iverson
[24] 00:49:25 Conor's Array Comparison https://github.com/codereport/array-language-comparisons
[25] 00:49:50 APEX https://gitlab.com/bernecky/apex Bob Bernecky https://www.snakeisland.com/
[26] 00:51:05 Second Order Array Combinators https://futhark-book.readthedocs.io/en/latest/language.html
[27] 00:52:30 Associativity https://en.wikipedia.org/wiki/Associative_property Commutativity https://en.wikipedia.org/wiki/Commutativity
[28] 00:56:12 Toxoplasma Gondii https://en.wikipedia.org/wiki/Toxoplasma_gondii
[29] 00:59:20 Guy Blelloch https://en.wikipedia.org/wiki/Guy_Blelloch Nesl http://www.cs.cmu.edu/~scandal/nesl.html
[30] 01:00:38 Remora https://www.ccs.neu.edu/home/jrslepak/typed-j.pdf Justin Slepak https://jrslepak.github.io/
[31] 01:01:12 Conor's Venn diagram https://github.com/codereport/array-language-comparisons
[32] 01:02:40 K https://aplwiki.com/wiki/K Kona https://aplwiki.com/wiki/Kona
[33] 01:03:20 April https://aplwiki.com/wiki/April Andrew Sengul Episode on Array Cast https://www.arraycast.com/episodes/episode23-andrew-sengul
[34] 01:04:40 Py'n'APL https://github.com/Dyalog/pynapl APL.jl https://aplwiki.com/wiki/APL.jl May https://github.com/justin2004/may Julia https://julialang.org/
[35] 01:08:05 Bjarne Stroustrup C++ https://www.stroustrup.com/
[36] 01:09:16 Artem Shinkarov https://ashinkarov.github.io/ Sven-Bodo Scholz https://www.macs.hw.ac.uk/~sbs/homepage/main/Welcome.html https://www.microsoft.com/en-us/research/video/sac-off-the-shelf-support-for-data-parallelism-on-multicores/
[37] 01:10:19 Contact AT ArrayCast DOT com
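The associativity and commutativity links above matter because a parallel reduction (as in Futhark's second-order array combinators) splits the input into chunks, reduces each independently, and then combines the partial results; that matches a sequential reduce only when the operator is associative. A minimal Python illustration (chunked_reduce is a hypothetical helper written for this sketch, not from any library mentioned):

```python
from functools import reduce

def chunked_reduce(op, xs, chunk=3):
    """Reduce in independent chunks, then combine the partials.

    This is the shape of a parallel reduction: it agrees with a
    sequential reduce only when `op` is associative.
    """
    partials = [reduce(op, xs[i:i + chunk]) for i in range(0, len(xs), chunk)]
    return reduce(op, partials)

xs = list(range(1, 10))
add = lambda a, b: a + b   # associative
sub = lambda a, b: a - b   # not associative

# Associative operator: chunked and sequential reductions agree.
assert chunked_reduce(add, xs) == reduce(add, xs)
# Non-associative operator: the two strategies disagree.
assert chunked_reduce(sub, xs) != reduce(sub, xs)
```

Commutativity is a separate, additional property that lets an implementation also reorder the operands, not just regroup them.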
Array Cast - September 16, 2022 Show Notes
Many thanks to Bob Therriault for gathering these links:
[01] 00:02:20 Iverson College Workshop https://community.kx.com/t5/New-kdb-q-users-question-forum/Vector-programming-in-q-online-workshop-Sun-25-Sep/td-p/13067
[02] 00:03:18 Jot dot times https://apl.news/
[03] 00:03:40 APLQuest https://www.youtube.com/playlist?list=PLYKQVqyrAEj9zSwnh4K28nCApruWA1j_m
[04] 00:04:22 APL Barcelona Meetup https://aplwiki.com/wiki/APL_∊_BCN#Historical_APL_paper_reading_group Meetup https://www.meetup.com/apl-bcn/events/287974824/
[06] 00:05:15 Paul Mansour's Blog https://www.toolofthought.com/posts
[07] 00:05:34 APLwiki Blogs https://aplwiki.com/wiki/Blogs
[08] 00:07:50 Conor's talk from Toronto Meetup https://youtu.be/8ynsN4nJxzU
[09] 00:09:26 IEEE Spectrum Language List https://spectrum.ieee.org/top-programming-languages-2022
[10] 00:15:05 TIOBE Language list https://www.tiobe.com/tiobe-index/
[11] 00:15:55 Stack Overflow Language list https://survey.stackoverflow.co/2022/
[12] 00:16:45 The Array Cast Episode #1 https://www.arraycast.com/episodes/episode-00-why-i-like-array-languages
[13] 00:17:08 Conor's Array Language Comparison https://github.com/codereport/array-language-comparisons
[14] 00:18:00 k programming language https://k.miraheze.org/wiki/Learning_Resources
[15] 00:18:40 Nial https://www.nial-array-language.org/
[16] 00:20:40 Iverson Notation https://apl.wiki/Iverson_Notation
[17] 00:21:45 APL2 https://aplwiki.com/wiki/APL2 Jim Brown https://aplwiki.com/wiki/Jim_Brown
[18] 00:25:11 Transpose https://aplwiki.com/wiki/Transpose
[19] 00:26:00 Reshape https://aplwiki.com/wiki/Reshape
[20] 00:29:00 X matrix tweet https://twitter.com/code_report/status/1564757140513996801 J complex copying https://code.jsoftware.com/wiki/Vocabulary/number#dyadic
[21] 00:32:06 Roger Hui https://aplwiki.com/wiki/Roger_Hui
[22] 00:38:15 Marshall's Outer Product Lambdaconf https://www.youtube.com/watch?v=WlUHw4hC4OY
[23] 00:39:08 Ken Iverson https://aplwiki.com/wiki/Ken_Iverson
[24] 00:41:38 Julia https://julialang.org/ NumPy https://numpy.org/ Matlab https://www.mathworks.com/products/matlab.html LAPACK https://netlib.org/lapack/ Single Assignment C https://www.sac-home.org/index
[25] 00:42:20 APL360 https://aplwiki.com/wiki/APL%5C360
[26] 00:45:57 q programming language https://code.kx.com/q/
[27] 00:47:07 rank in q https://community.kx.com/t5/kdb-and-q/Iterating-lists-and-dictionaries/m-p/13065#M141
[28] 00:50:40 Arthur Whitney https://aplwiki.com/wiki/Arthur_Whitney
[29] 00:58:45 PHP https://www.php.net/
[30] 00:59:27 JavaScript https://www.javascript.com/
[31] 01:01:40 Lisp https://en.wikipedia.org/wiki/Lisp_(programming_language)
[32] 01:02:25 BQN https://mlochbaum.github.io/BQN/
[33] 01:04:38 Venn Diagram https://en.wikipedia.org/wiki/Venn_diagram
[34] 01:07:50 Futhark https://futhark-lang.org/
[35] 01:08:36 Combinators https://writings.stephenwolfram.com/2020/12/combinators-and-the-story-of-computation/
[36] 01:10:24 Cosy http://www.cosy.com/language/ Haskell https://www.haskell.org/
[37] 01:14:14 F# https://fsharp.org/
[38] 01:22:10 Leading Axis theory https://aplwiki.com/wiki/Leading_axis_theory
[39] 01:23:40 Rank Polymorphism https://prl.ccs.neu.edu/blog/2017/05/04/rank-polymorphism/#:~:text=Each%20of%20these%20use%20cases,arguments%20and%20one%20scalar%20argument.
[40] 01:24:40 Diagrams on YouTube https://www.youtube.com/watch?v=qEywreN02ng Tweet of Diagram https://twitter.com/code_report/status/1569741807285567492/photo/1 Updated Tweet of Diagram https://twitter.com/code_report/status/1570069385548537857/photo/1
[41] 01:26:06 contact@ArrayCast.com
[42] 01:26:56 Array programming influences https://aplwiki.com/wiki/Family_tree_of_array_languages
[43] 01:27:20 BQN list of functional languages https://mlochbaum.github.io/BQN/doc/functional.html
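Several primitives discussed in this episode (transpose at [18], reshape at [19], and rank polymorphism at [39]) have direct NumPy counterparts. A small sketch for readers coming from the Python side; the sample values are made up:

```python
import numpy as np

# Reshape: build a 2x3 matrix from a flat range, as APL's ⍴ would.
m = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

# Transpose: swap the axes, turning the 2x3 into a 3x2.
t = m.T
print(t.shape)

# Broadcasting is NumPy's version of rank polymorphism / scalar extension:
# a length-3 vector combines element-wise with each row of the 2x3 matrix.
print(m + np.array([10, 20, 30]))
```

The broadcasting rule (trailing axes must match or be 1) plays roughly the role that leading-axis agreement plays in modern APL dialects, though the two conventions extend along opposite ends of the shape.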
Array Cast - August 19, 2022 Show Notes
Many thanks to Marshall Lochbaum, Rodrigo Girão Serrão, Bob Therriault, Conor Hoekstra, Adám Brudzewsky and Romilly Cocking for gathering these links:
[01] 00:00:03 BYTE magazine https://en.wikipedia.org/wiki/Byte_(magazine)
[02] 00:01:02 Org Mode https://orgmode.org/
[03] 00:02:58 Toronto Meet-up https://www.meetup.com/en-AU/programming-languages-toronto-meetup/events/287695788/ New York Meet-up https://www.meetup.com/programming-languages-toronto-meetup/events/287729348/
[04] 00:04:19 Morten Kromberg episode https://www.arraycast.com/episodes/episode21-morten-kromberg
[05] 00:05:01 Romilly's video 'An Excellent Return' https://dyalog.tv/Dyalog08/?v=thr-7QfQWJw
[06] 00:06:12 Ferranti Pegasus computer https://en.wikipedia.org/wiki/Ferranti_Pegasus
[07] 00:09:09 System 360 in APL http://keiapl.org/archive/APL360_UsersMan_Aug1968.pdf
[08] 00:16:50 Mind Map https://en.wikipedia.org/wiki/Mind_map
[09] 00:17:00 Dyalog https://www.dyalog.com/
[10] 00:18:20 Digitalk https://winworldpc.com/product/digital-smalltalk/5x
[11] 00:18:30 Smalltalk https://en.wikipedia.org/wiki/Smalltalk
[12] 00:21:17 Raspberry Pi https://www.raspberrypi.org/
[13] 00:22:10 Robotics on Dyalog website https://www.dyalog.com/blog/2014/08/dancing-with-the-bots/
[14] 00:22:45 Neural Network https://en.wikipedia.org/wiki/Neural_network David Marr https://en.wikipedia.org/wiki/David_Marr_(neuroscientist)
[15] 00:23:21 Jetson Nano https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/
[16] 00:23:38 Spiking neural networks https://en.wikipedia.org/wiki/Spiking_neural_network
[17] 00:24:02 JAX https://jax.readthedocs.io/en/latest/notebooks/quickstart.html
[18] 00:27:00 NumPy https://numpy.org/
[19] 00:28:21 Nested arrays https://aplwiki.com/wiki/Nested_array
[20] 00:29:07 flip NumPy https://numpy.org/doc/stable/reference/generated/numpy.flip.html flipud https://numpy.org/doc/stable/reference/generated/numpy.flipud.html#numpy.flipud
[21] 00:31:07 Pegasus Autocode http://blog.rareschool.com/2014/09/pegasus-autocode-revisited.html
[22] 00:32:05 Atlas computer 1966 https://en.wikipedia.org/wiki/Atlas_(computer)
[23] 00:34:30 Raspberry Pi pico https://www.raspberrypi.com/products/raspberry-pi-pico/
[24] 00:36:33 Booker and Morris https://dl.acm.org/doi/pdf/10.1145/364520.364521
[25] 00:38:12 Romilly's Blog Markdown http://blog.rareschool.com/2022/05/apl-and-python-go-head-to-head.html
[26] 00:41:30 Languages that are built from concatenation https://en.wikipedia.org/wiki/Agglutination
[27] 00:44:30 Alan Kay https://en.wikipedia.org/wiki/Alan_Kay
[28] 00:47:12 Clojure https://en.wikipedia.org/wiki/Clojure Forth https://en.wikipedia.org/wiki/Forth_(programming_language) Haskell https://www.haskell.org/
[29] 00:50:00 Cosy http://www.cosy.com/language/
[30] 00:51:38 Py'n'APL https://dyalog.tv/Dyalog21/?v=gOUFXBUMv_A
[31] 01:00:12 Logic Analyzer https://en.wikipedia.org/wiki/Logic_analyzer
[32] 01:02:15 Back propagation in neural networks https://en.wikipedia.org/wiki/Backpropagation
[33] 01:07:38 Stefan Kruger 'Learn APL' https://xpqz.github.io/learnapl/intro.html
[34] 01:08:10 Rodrigo Girão Serrão videos https://www.youtube.com/channel/UCd_24S_cYacw6zrvws43AWg
[35] 01:08:27 João Araújo episode https://www.arraycast.com/episodes/episode33-joao-araujo
[36] 01:08:59 Rodrigo Girão Serrão Neural networks https://www.youtube.com/playlist?list=PLgTqamKi1MS3p-O0QAgjv5vt4NY5OgpiM
[37] 01:10:55 Functional Geekery podcast https://www.functionalgeekery.com/
[38] 01:11:36 Conor's Security talk https://www.youtube.com/watch?v=ajGX7odA87k
[39] 01:12:38 SICP https://en.wikipedia.org/wiki/Structure_and_Interpretation_of_Computer_Programs
[40] 01:12:55 Alan McKean Rebecca Wirfs-Brock "Object Design" https://books.google.ca/books?id=vUF72vN5MY8C&printsec=copyright&redir_esc=y#v=onepage&q&f=false
[41] 01:13:35 Growing Object Oriented Guided by Tests http://www.growing-object-oriented-software.com/
[42] 01:15:01 Design Patterns vs Anti-patterns in APL https://www.youtube.com/watch?v=v7Mt0GYHU9A
[43] 01:18:25 Pop2 https://hopl.info/showlanguage.prx?exp=298&language=POP-2 Pop2 on PDP-11 https://www.cs.bham.ac.uk/research/projects/poplog/retrieved/adrian-howard-pop11.html
[44] 01:18:52 Donald Michie https://en.wikipedia.org/wiki/Donald_Michie
[45] 01:21:30 Menace robot http://chalkdustmagazine.com/features/menace-machine-educable-noughts-crosses-engine/
[46] 01:22:05 Menace in APL https://romilly.github.io/o-x-o/an-introduction.html
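The numpy.flip and numpy.flipud functions linked at [20] are NumPy's counterparts of APL's reverse primitives. A quick sketch with a made-up 3x3 matrix:

```python
import numpy as np

m = np.arange(9).reshape(3, 3)   # [[0 1 2], [3 4 5], [6 7 8]]

ud = np.flipud(m)    # reverse the rows (first axis), like APL's ⊖
lr = np.fliplr(m)    # reverse the columns (last axis), like APL's ⌽
both = np.flip(m)    # reverse along every axis at once

print(ud)
print(lr)
print(both)
```

np.flip also accepts an axis argument, so np.flip(m, axis=0) is the same as np.flipud(m).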
Array Cast - August 5, 2022 Show Notes
Many thanks to Bob Therriault, João Araújo and Rodrigo Girão Serrão for gathering these links:
[01] 00:01:40 J wiki features https://www.youtube.com/watch?v=dWqixYyb52Q
[02] 00:02:31 J promo video https://www.youtube.com/watch?v=vxibe2QOA0s
[03] 00:03:00 British APL Association https://britishaplassociation.org/ Vector https://vector.org.uk/
[04] 00:03:27 Conor's Twin Algorithms presentation https://www.youtube.com/watch?v=NiferfBvN3s
[05] 00:08:13 NumPy https://numpy.org/ JAX https://jax.readthedocs.io/en/latest/notebooks/quickstart.html Julia https://julialang.org/
[06] 00:08:49 João's array Google search engine https://cse.google.com/cse?cx=e5ff9c06c246f4ca5
[07] 00:09:00 João's Iverson mirror site https://joaogui1.github.io/keiapl/ Original link http://keiapl.org/
[08] 00:11:55 João's website https://joaogui1.netlify.app/
[09] 00:13:10 BQN https://mlochbaum.github.io/BQN/ Dyalog APL https://www.dyalog.com/ J https://www.jsoftware.com/indexno.html
[10] 00:13:50 Vannevar Bush https://en.wikipedia.org/wiki/Vannevar_Bush Alan Kay https://en.wikipedia.org/wiki/Alan_Kay
[11] 00:14:13 Tools for Thought Rocks https://lu.ma/toolsforthoughtrocks https://www.youtube.com/c/ToolsforThoughtRocks?app=desktop
[12] 00:14:40 Obsidian discord https://discord.com/invite/veuWUTm
[13] 00:15:10 Roam https://roamresearch.com/ Obsidian https://obsidian.md/ Logseq https://logseq.com/
[14] 00:17:01 Anki https://apps.ankiweb.net/ Muse https://museapp.com/
[15] 00:18:25 Notion https://www.notion.so/ Remnote https://www.remnote.com/
[16] 00:19:42 Marshall's BQN Markdown https://github.com/mlochbaum/BQN/blob/master/md.bqn
[17] 00:22:06 Perlis https://en.wikipedia.org/wiki/Alan_Perlis
[18] 00:22:49 Array Thinking https://www.arraycast.com/episodes/episode-00-why-i-like-array-languages
[19] 00:24:50 Folds https://en.wikipedia.org/wiki/Fold_(higher-order_function)
[20] 00:25:51 Rank concept https://aplwiki.com/wiki/Function_rank
[22] 00:26:57 Short Term Memory https://www.simplypsychology.org/short-term-memory.html
[23] 00:27:42 APL glyphs https://aplwiki.com/wiki/Typing_glyphs#By_method
[24] 00:28:59 Stefan Kruger 'Learn APL' https://xpqz.github.io/learnapl/intro.html Rodrigo Girão Serrão 'Mastering Dyalog APL' https://mastering.dyalog.com/README.html
[25] 00:29:35 Quarto https://quarto.org/
[26] 00:32:33 Conor's original solution {≢∪⍵~0} Y
[27] 00:32:40 Without APL ~ Without J -.
[28] 00:32:50 BQN Without ¬∘∊/⊣
[29] 00:33:55 Set Intersection APL X{⍺⌿⍨(≢⍵)≥⍵⍳⍺}Y Set Intersection J x (e. # [) y Set Union APL X{⍺⍪⍵⌿⍨(≢⍺)
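The APL one-liners quoted at [26] through [29] translate fairly directly to NumPy's set routines. A sketch of the same computations (the sample arrays are made up for illustration):

```python
import numpy as np

y = np.array([3, 0, 1, 3, 0, 2])

# Conor's {≢∪⍵~0}: count the distinct non-zero values of y.
count = len(np.unique(y[y != 0]))
print(count)   # 1, 2 and 3 are the distinct non-zero values

# "Without" (APL's ~, J's -.): elements of x not present in y.
x = np.array([1, 2, 3, 4, 5])
print(np.setdiff1d(x, y))

# Set intersection, matching the APL/J one-liners in the notes.
print(np.intersect1d(x, y))
```

Like APL's ∪, np.unique also deduplicates; unlike APL, it returns its result sorted.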
Array Cast - July 8, 2022 Show Notes
[01] 00:01:15 Dyalog Problem Solving Contest https://contest.dyalog.com/?goto=welcome
[02] 00:01:35 Dyalog Early Bird Discount https://www.dyalog.com/user-meetings/dyalog22.htm
[03] 00:02:40 Jeremy Howard Wikipedia https://en.wikipedia.org/wiki/Jeremy_Howard_(entrepreneur) Fastmail https://www.fastmail.com/ Optimal Decisions Group https://www.finextra.com/newsarticle/18047/choicepoint-acquires-insurance-analytics-firm-optimal-decisions
[04] 00:04:30 APL Study Group https://forums.fast.ai/t/apl-array-programming/97188
[05] 00:05:50 McKinsey and Company https://en.wikipedia.org/wiki/McKinsey_%26_Company
[06] 00:10:20 AT Kearney https://en.wikipedia.org/wiki/AT_Kearney
[07] 00:12:33 MKL (Intel) https://en.wikipedia.org/wiki/Math_Kernel_Library
[08] 00:13:00 BLAS http://www.netlib.org/blas/
[09] 00:13:11 Perl BQN https://mlochbaum.github.io/BQN/running.html
[10] 00:14:06 Raku https://en.wikipedia.org/wiki/Raku_%28programming_language%29
[11] 00:15:45 kaggle https://www.kaggle.com/ kaggle competition https://www.kaggle.com/competitions/unimelb/leaderboard
[12] 00:16:52 R https://en.wikipedia.org/wiki/R_(programming_language)
[13] 00:18:50 Neural Networks https://en.wikipedia.org/wiki/Artificial_neural_network
[14] 00:19:50 Enlitic https://www.enlitic.com/
[15] 00:20:01 Fast.ai https://www.fast.ai/about/
[16] 00:21:02 NumPy https://numpy.org/
[17] 00:21:26 Leading Axis Theory https://aplwiki.com/wiki/Leading_axis_theory
[18] 00:21:31 Rank Conjunction https://code.jsoftware.com/wiki/Vocabulary/quote
[19] 00:21:40 Einstein notation https://en.wikipedia.org/wiki/Einstein_notation
[20] 00:22:30 GPU https://en.wikipedia.org/wiki/Graphics_processing_unit
[21] 00:22:55 CUDA https://en.wikipedia.org/wiki/CUDA
[22] 00:23:30 Map https://en.wikipedia.org/wiki/Map_(higher-order_function)
[23] 00:24:05 Data Science https://en.wikipedia.org/wiki/Data_science
[24] 00:25:15 First Neural Network https://en.wikipedia.org/wiki/Frank_Rosenblatt
[25] 00:28:51 NumPy Another Iverson Ghost https://dev.to/bakerjd99/numpy-another-iverson-ghost-9mc
[26] 00:30:11 Pivot Tables https://en.wikipedia.org/wiki/Pivot_table
[27] 00:30:36 SQL https://en.wikipedia.org/wiki/SQL
[28] 00:31:25 Larry Wall: "The three chief virtues of a programmer are: Laziness, Impatience and Hubris." From the glossary of the first Programming Perl book.
[29] 00:32:00 Python https://www.python.org/
[30] 00:36:25 Regular Expressions https://en.wikipedia.org/wiki/Regular_expression
[31] 00:36:50 PyTorch https://pytorch.org/
[32] 00:37:39 Notation as a Tool of Thought https://www.jsoftware.com/papers/tot.htm
[33] 00:37:55 Aaron Hsu Co-dfns https://scholarworks.iu.edu/dspace/handle/2022/24749
[34] 00:38:40 J https://www.jsoftware.com/#/
[35] 00:39:06 Eric Iverson on Array Cast https://www.arraycast.com/episodes/episode10-eric-iverson
[36] 00:40:18 Triangulation Jeremy Howard https://www.youtube.com/watch?v=hxB-rEQvBeM
[37] 00:41:48 Google Brain https://en.wikipedia.org/wiki/Google_Brain
[38] 00:42:30 RAPIDS https://rapids.ai/
[39] 00:43:40 Julia https://julialang.org/
[40] 00:43:50 llvm https://llvm.org/
[41] 00:44:07 JAX https://jax.readthedocs.io/en/latest/notebooks/quickstart.html
[42] 00:44:21 XLA https://www.tensorflow.org/xla
[43] 00:44:32 MLIR https://www.tensorflow.org/mlir
[44] 00:44:42 Chris Lattner https://en.wikipedia.org/wiki/Chris_Lattner
[45] 00:44:53 Tensorflow https://www.tensorflow.org/
[46] 00:49:33 torchscript https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html
[47] 00:50:09 Scheme https://en.wikipedia.org/wiki/Scheme_(programming_language)
[48] 00:50:28 Swift https://en.wikipedia.org/wiki/Swift_(programming_language)
[49] 00:51:10 DragonBox Algebra https://dragonbox.com/products/algebra-12
[50] 00:52:47 APL Glyphs https://aplwiki.com/wiki/Glyph
[51] 00:53:24 Dyalog APL https://www.dyalog.com/
[52] 00:54:24 Jupyter https://jupyter.org/
[53] 00:55:44 Jeremy's tweet of Meta Math https://twitter.com/jeremyphoward/status/1543738953391800320
[54] 00:56:37 Power function https://aplwiki.com/wiki/Power_(function)
[55] 01:03:06 Reshape ⍴ https://aplwiki.com/wiki/Reshape
[56] 01:03:40 Stallman 'Rho, rho, rho' https://stallman.org/doggerel.html#APL
[57] 01:04:20 APLcart https://aplcart.info/ BQNcrate https://mlochbaum.github.io/bqncrate/
[58] 01:06:12 J for C programmers https://www.jsoftware.com/help/jforc/contents.htm
[59] 01:07:54 Transpose episode https://www.arraycast.com/episodes/episode29-transpose
[60] 01:10:00 APLcart video https://www.youtube.com/watch?v=r3owA7tfKE8
[61] 01:12:28 Functional Programming https://en.wikipedia.org/wiki/Functional_programming
[62] 01:13:00 List Comprehensions https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
[63] 01:13:30 BQN to J https://mlochbaum.github.io/BQN/doc/fromJ.html BQN to Dyalog APL https://mlochbaum.github.io/BQN/doc/fromDyalog.html
[64] 01:18:15 Einops https://cgarciae.github.io/einops/1-einops-basics/
[65] 01:19:30 April Fools APL https://ci.tc39.es/preview/tc39/ecma262/sha/efb411f2f2a6f0e242849a8cc8d7e21bbcdff543/#sec-apl-expression-rules
[66] 01:20:35 Flask library https://flask.palletsprojects.com/en/2.1.x/
[67] 01:21:22 JuliaCon 2022 https://juliacon.org/2022/
[68] 01:28:05 Myelination https://en.wikipedia.org/wiki/Myelin
[69] 01:29:15 Sanyam Bhutani interview https://www.youtube.com/watch?v=g_6nQBsE4pU&t=2150s
[70] 01:31:27 Jo Boaler Growth Mindset https://www.youcubed.org/resource/growth-mindset/
[71] 01:33:45 Discovery Learning https://en.wikipedia.org/wiki/Discovery_learning
[72] 01:37:05 Iverson Bracket https://en.wikipedia.org/wiki/Iverson_bracket
[73] 01:39:14 Radek Osmulski Meta Learning https://rosmulski.gumroad.com/l/learn_machine_learning
[74] 01:40:12 Top Down Learning https://medium.com/@jacksonbull1987/top-down-learning-4743f16d63d3
[75] 01:41:20 Anki https://apps.ankiweb.net/
[76] 01:43:50 Lex Fridman Interview https://www.youtube.com/watch?v=J6XcP4JOHmk Deep Talks #54 https://www.youtube.com/watch?v=n7YVlPszaWc0
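Einstein notation ([19]) and Einops ([64]) come up in this episode; NumPy exposes the same idea through np.einsum, where repeated index letters are summed over. A small sketch with made-up arrays:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)

# 'ij,jk->ik': the repeated index j is summed over, giving a matrix product.
c = np.einsum('ij,jk->ik', a, b)
print(np.array_equal(c, a @ b))   # same result as the @ operator

# 'ii->': sum the diagonal of a square matrix, i.e. the trace.
m = np.arange(9).reshape(3, 3)
tr = np.einsum('ii->', m)
print(tr)   # 0 + 4 + 8 = 12
```

The payoff is that one notation covers products, traces, transposes and reductions, which is exactly the Einstein-notation idea the episode discusses.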
Array Cast - May 27, 2022 Show Notes[01] 00:02:00 APL logo https://aplwiki.com/wiki/APL_logo[02] 00:02:24 APL Merch On Bonfire: https://bonfire.com/store/apl-stuff/ On Redbubble: https://redbubble.com/shop/ap/111813275 Aaron Hsu's site: https://www.bonfire.com/store/arcfide/[03] 00:03:00 BQN FFI XXXXXXXXXX[04] 00:03:58 kx download https://kx.com/developers/download-licenses/[05] 00:04:20 kx download request mailto:trial@kx.com[06] 00:05:18 Rank https://aplwiki.com/wiki/Rank Leading Axis Theory https://aplwiki.com/wiki/Leading_axis_theory[07] 00:06:24 Adoption in APL https://aplwiki.com/wiki/Leading_axis_theory#Adoption_in_APL[08] 00:08:27 APL https://aplwiki.com/wiki/Main_Page BQN https://mlochbaum.github.io/BQN/ J https://www.jsoftware.com/#/README[09] 00:10:19 + Rank 0 https://aplwiki.com/wiki/Add Rank operator: https://apl.wiki/Rank_(operator) Multidimensional arrays: https://apl.wiki/Array_model Ravel order: https://apl.wiki/Ravel_order[10] 00:10:53 ≡ Match Rank Infinity https://aplwiki.com/wiki/Match APL: https://tryapl.org/?clear&q=a%E2%86%903%203%E2%8D%B4%E2%8D%B39%20%E2%8B%84%20b%E2%86%903%203%E2%8D%B4%E2%8D%B36%20%E2%8B%84%20a(%E2%89%A1%E2%8D%A41)b&run BQN: https://mlochbaum.github.io/BQN/try.html#code=YeKGkDPigL8z4qWKMSvihpU5IOKLhCBi4oaQM+KAvzPipYoxK+KGlTYg4ouEIGEg4omh4o6JMSBi J: https://jsoftware.github.io/j-playground/bin/html2/#code=a%3D%3A%20i.%203%203%0Ab%3D%3A%203%203%20%24%20i.%206%0Aa%20-%3A%221%20b%0A Function rank: https://apl.wiki/Function_rank Raze (;): https://apl.wiki/Raze[11] 00:12:23 Rank of verb in J https://www.jsoftware.com/help/jforc/loopless_code_i_verbs_have_r.htm#_Toc191734331 Rank of noun in J https://www.jsoftware.com/help/jforc/declarations.htm#_Toc191734319[12] 00:14:50 ". 
Do verb https://code.jsoftware.com/wiki/Vocabulary/quotedot[13] 00:16:05 @ Atop https://code.jsoftware.com/wiki/Vocabulary/at[14] 00:16:10 @: At https://code.jsoftware.com/wiki/Vocabulary/atco[15] 00:18:53 Combinators https://mlochbaum.github.io/BQN/tutorial/combinator.html Close composition: https://apl.wiki/Close_composition[16] 00:19:32 The J diagram of composition patterns: https://code.jsoftware.com/wiki/File:Funcomp.png[17] 00:20:10 Scalar APL https://aplwiki.com/wiki/Scalar J https://code.jsoftware.com/wiki/Vocabulary/Nouns#Atom BQN https://mlochbaum.github.io/BQN/doc/based.html#starting-from-atoms Scalar function: https://apl.wiki/Scalar_function Everything else: https://apl.wiki/Mixed_function Atom: https://apl.wiki/Simple_scalar Leading axis agreement: https://apl.wiki/Leading_axis_agreement Scalar agreement: https://apl.wiki/Scalar_extension[18] 00:25:50 Numpy https://numpy.org/ Julia https://julialang.org/[19] 00:26:58 APL: https://tryapl.org/?clear&q=1%202%203(%2B%E2%8D%A41)2%203%E2%8D%B41%202%203&run BQN: https://mlochbaum.github.io/BQN/try.html#code=MeKAvzLigL8zK+KOiTEgMuKAvzPipYox4oC/MuKAvzM= J: (3 3 $ 1 2 3) +"1 (1 2 3) https://jsoftware.github.io/j-playground/bin/html2/#code=%283%203%20%24%201%202%203%29%20%2B%221%20%281%202%203%29%0A[20] 00:27:40 Reduce https://aplwiki.com/wiki/Reduce Scan https://aplwiki.com/wiki/scan Reverse https://aplwiki.com/wiki/reverse Rotate https://aplwiki.com/wiki/rotate[21] 00:33:00 ⍪ Catenate First https://aplwiki.com/wiki/Catenate[22] 00:37:50 Major cells https://aplwiki.com/wiki/Major_cell Cell: https://apl.wiki/Cell "Items": https://apl.wiki/Major_cell Arthur Whitney: https://apl.wiki/Arthur_Whitney[23] 00:38:48 Operators and Enclosed Arrays https://www.jsoftware.com/papers/opea.htm[24] 00:40:17 Original Iverson Notation http://www.jsoftware.com/papers/DFSP.htm Iverson Notation: https://apl.wiki/Iverson_notation Roger Hui: https://apl.com/Roger_Hui[25] 00:41:32 [] Bracket Axis 
https://aplwiki.com/wiki/Function_axis[26] 00:43:32 Scan https://aplwiki.com/wiki/Scan Scan with axis: https://tryapl.org/?clear&q=%E2%8E%95IO%E2%86%900%20%E2%8B%84%20m%E2%86%903%203%E2%8D%B41%202%203%20%E2%8B%84%20(%2B%5Cm)%20(%2B%5C%5B0%5Dm)%20(%2B%E2%8D%80m)&run J: https://jsoftware.github.io/j-playground/bin/html2/#code=m%3D%3A%203%203%20%24%201%202%203%0Am%3B%28%2B%2F%5C%20m%29%0A%0A[27] 00:48:32 A Programming Language http://www.jsoftware.com/papers/APL.htm[28] 00:51:40 BQN to APL Translation https://mlochbaum.github.io/BQN/doc/fromDyalog.html[29] 00:52:13 ˘ Cells https://mlochbaum.github.io/BQN/help/cells.html Breve symbol: https://en.wikipedia.org/wiki/Breve SHARP APL: https://apl.wiki/SHARP_APL APL2: https://apl.wiki/APL2[30] 00:53:34 Items J https://code.jsoftware.com/wiki/Vocabulary/Nouns#Item[31] 00:54:59 ˝ Insert BQN https://mlochbaum.github.io/BQN/help/insert.html[32] 00:56:09 Catenation https://aplwiki.com/wiki/Catenate Concatenation reduction of a matrix: APL: https://tryapl.org/?clear&q=%2C%2F3%203%E2%8D%B4%E2%8D%B39&run BQN: https://mlochbaum.github.io/BQN/try.html#code=beKGkDPigL8z4qWKMSvihpU5CuKIvsudbQ== J: https://jsoftware.github.io/j-playground/bin/html2/#code=m%3D%3A%203%203%241%2Bi.9%0A%2C%2Fm BQN with Each: https://mlochbaum.github.io/BQN/try.html#code=beKGkDPigL8z4qWKMSvihpU5CuKIvsKoy51t J with each: https://jsoftware.github.io/j-playground/bin/html2/#code=%0Am%3D%3A3%203%241%2Bi.9%0A%2C%2Feach%20m%0A[33] 00:57:50 k language https://aplwiki.com/wiki/K APL360: https://apl.wiki/APL360[34] 01:00:34 / Insert J https://code.jsoftware.com/wiki/Vocabulary/slash ˝ Insert BQN https://mlochbaum.github.io/BQN/help/insert.html J NuVoc: https://code.jsoftware.com/wiki/NuVoc ⌿ Reduce First APL https://aplwiki.com/wiki/Reduce[35] 01:03:40 J for C Programmers Declarations https://www.jsoftware.com/help/jforc/declarations.htm#_Toc191734319 J for C Programmers Verbs have rank 
https://www.jsoftware.com/help/jforc/loopless_code_i_verbs_have_r.htm#_Toc191734331

Summing rows:
- APL: https://tryapl.org/?clear&q=%2B%2F3%203%E2%8D%B4%E2%8D%B39&run
- BQN: https://mlochbaum.github.io/BQN/try.html#code=beKGkDPigL8z4qWKMSvihpU5CivCtMuYbQ==
- J: https://jsoftware.github.io/j-playground/bin/html2/#code=%0Am%3D%3A3%203%24%3E%3Ai.9%0A%2B%2F%221%20m%0A

Summing rows of coordinate pairs:
- APL: https://tryapl.org/?clear&q=m%E2%86%903%203%E2%8D%B4%E2%8D%B39%20%E2%8B%84%20m%20(%E2%8A%96m)%20(%E2%8C%BDm)&run
- BQN with insert: https://mlochbaum.github.io/BQN/try.html#code=K8udy5gxK+KGlTPigL8z
- BQN with fold: https://mlochbaum.github.io/BQN/try.html#code=K8K0y5gxK+KGlTPigL8z
- J: https://jsoftware.github.io/j-playground/bin/html2/#code=%0Am%3D%3A%7B%3B~%3E%3Ai.3%0A%2Beach%2F%20m%0A

[36] 01:07:00 Based Arrays https://aplwiki.com/wiki/Array_model#Based_array_theory
[37] 01:08:47 http://tryapl.com https://tryapl.org/
[38] 01:09:01 iota ⍳ https://aplwiki.com/wiki/Index_Generator
[39] 01:11:20 aplcart https://aplcart.info/
[40] 01:13:30 Einstein Summation Notation https://en.wikipedia.org/wiki/Einstein_notation einsum: https://numpy.org/doc/stable/reference/generated/numpy.einsum.html
[41] 01:15:00 Dex programming language https://github.com/google-research/dex-lang#readme
[42] 01:18:00 ⍉ Transpose https://aplwiki.com/wiki/Transpose
Mirrors:
- APL: https://tryapl.org/?clear&q=M%E2%86%903%203%E2%8D%B4%E2%8D%B39%20%E2%8B%84%20m%20(%E2%8A%96m)%20(%E2%8C%BDm)&run
- BQN: https://mlochbaum.github.io/BQN/try.html#code=beKGkDPigL8z4qWKMSvihpU5IOKLhCDin6htLOKMvW0s4oy9y5ht4p+p
- J: https://jsoftware.github.io/j-playground/bin/html2/#code=%0Am%3D%3A%3E%3Ai.%203%203%0A%28%5D%20%3B%20%7C.%20%3B%20%7C.%221%29%20m%0A
[43] 01:20:00 BQN ⊸ Before https://mlochbaum.github.io/BQN/help/before_bind.html ⟜ After https://mlochbaum.github.io/BQN/help/after_bind.html
- Conor's number of permutations tweet: https://twitter.com/code_report/status/1528844402046685187
- Combinators: https://apl.wiki/Function_composition
[44] 01:31:30 Joel Kaplan on the ArrayCast https://www.arraycast.com/episodes/episode27-joel-kaplan
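For comparison, the row-sum and transpose examples linked above map directly onto NumPy. A quick sketch, assuming the same 3×3 array of 1–9 used in the playground links (the APL/J comparisons in the comments are informal, not exact equivalences):

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)  # the 3x3 array of 1..9 from the links above

row_sums = m.sum(axis=1)            # roughly APL's +/ or J's +/"1
print(row_sums)                     # [ 6 15 24]

# Einstein summation notation, as discussed at 01:13:30:
print(np.einsum("ij->i", m))        # same row sums: [ 6 15 24]

print(m.T)                          # transpose, like APL's ⍉
```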
Watch the live stream: Watch on YouTube

About the show

Sponsored by us! Support our work through:
- Our courses at Talk Python Training
- Test & Code Podcast
- Patreon Supporters

Brian #1: distinctipy
- “distinctipy is a lightweight python package providing functions to generate colours that are visually distinct from one another.”
- Small, focused tool, but really cool.
- Say you need to plot a dynamic number of lines. Why not let distinctipy pick colors for you that will be distinct?
- It can also display the color swatches.
- Some example palettes here: https://github.com/alan-turing-institute/distinctipy/tree/main/examples

```python
from distinctipy import distinctipy

# number of colours to generate
N = 36

# generate N visually distinct colours
colors = distinctipy.get_colors(N)

# display the colours
distinctipy.color_swatch(colors)
```

Michael #2: Soda SQL
- Soda SQL is a free, open-source command-line tool.
- It uses user-defined input to prepare SQL queries that run tests on datasets in a data source to find invalid, missing, or unexpected data.
- Looks good for data pipelines and other CI/CD work!

Daniel #3: Python in Nature
- There's a review article from Sept 2020 on array programming with NumPy in the research journal Nature.
- For reference, in grad school we had a fancy paper on quantum entanglement that got rejected from Nature Communications, a sub-journal of Nature. Nature is hard to get into.
- The list of authors includes Travis Oliphant, who started NumPy.
- Covers NumPy as the foundation, building up to specialized libraries like QuTiP for quantum computing.
- If you search “Python” on their site, many papers come up. Interesting to see their take on publishing software work.
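The idea behind distinctipy (Brian #1 above) can be sketched with nothing but the stdlib: greedily pick each new colour to maximize its minimum distance from the colours chosen so far. This is an assumption about the general approach, not distinctipy's actual algorithm, and the helper name is invented:

```python
import random

def pick_distinct_colors(n, candidates=256, seed=0):
    """Greedy max-min sketch: each new colour maximizes its minimum
    squared RGB distance to the colours already chosen.
    (Illustrative only; not distinctipy's real implementation.)"""
    rng = random.Random(seed)
    pool = [(rng.random(), rng.random(), rng.random()) for _ in range(candidates)]
    chosen = [(1.0, 1.0, 1.0)]  # treat a white background as already taken
    for _ in range(n):
        best = max(pool, key=lambda c: min(
            sum((a - b) ** 2 for a, b in zip(c, p)) for p in chosen))
        pool.remove(best)
        chosen.append(best)
    return chosen[1:]  # drop the background placeholder

colors = pick_distinct_colors(6)
print(len(colors))  # 6 RGB tuples, components in [0.0, 1.0]
```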
Brian #4: Supercharging GitHub Actions with Job Summaries
- From a tweet by Simon Willison and an article: GH Actions job summaries
- Also, Ned Batchelder is using it for coverage reports
- “You can now output and group custom Markdown content on the Actions run summary page.”
- “Custom Markdown content can be used for a variety of creative purposes, such as:
  - Aggregating and displaying test results
  - Generating reports
  - Custom output independent of logs”
- Coverage.py example:

```yaml
- name: "Create summary"
  run: |
    echo '### Total coverage: ${{ env.total }}%' >> $GITHUB_STEP_SUMMARY
    echo '[${{ env.url }}](${{ env.url }})' >> $GITHUB_STEP_SUMMARY
```

Michael #5: The Language Summit write-up is out
- via Itamar, written up by Alex Waygood
- Python without the GIL: A talk by Sam Gross
- Reaching a per-interpreter GIL: A talk by Eric Snow
- The “Faster CPython” project: 3.12 and beyond: A talk by Mark Shannon
- WebAssembly: Python in the browser and beyond: A talk by Christian Heimes
- F-strings in the grammar: A talk by Pablo Galindo Salgado
- Cinder Async Optimisations: A talk by Itamar Ostricher
- The issue and PR backlog: A talk by Irit Katriel
- The path forward for immortal objects: A talk by Eddie Elizondo and Eric Snow
- Lightning talks, featuring short presentations by Carl Meyer, Thomas Wouters, Kevin Modzelewski, Samuel Colvin and Larry Hastings

Daniel #6: AllSpice is Git for EEs
- Software engineers have Git/SVN/Mercurial/etc.
- None of the other engineering disciplines (mechanical, electrical, optical, etc.) have it nearly as good.
- Altium has their Vault and “365,” but there's nothing with a Git-like UX.
- Supports version history, diffs, all the things you expect. Even self-hosting and a Gov Cloud version.
- “Bring your workflow to the 21st century, finally.”

Extras

Brian:
- Will McGugan talks about Rich, Textual, and Textualize on Test & Code 188
- Also 3 other episodes since last week. (I have a backlog I'm working through.)
Michael:
- Power On: Xbox Documentary | Full Movie
- The 4 Reasons To Branch with Git - Illustrated Examples with Python
- A Python spotting - via Jason Pecor
- 2022 StackOverflow Developer Survey is live, via Brian
- TextSniper macOS App
- Pandas Tutor on WebAssembly

Daniel:
- I know Adafruit's a household name, but shout-out to SparkFun, Seeed Studio, OpenMV, and other companies in the field.

Joke: A little awkward
Watch the live stream: Watch on YouTube

About the show

Sponsored by Datadog: pythonbytes.fm/datadog
Special guest: Brian Skinn (Twitter | GitHub)

Michael #1: OpenBB wants to be an open source challenger to Bloomberg Terminal
- OpenBB Terminal provides a modern Python-based integrated environment for investment research that allows the average retail trader to leverage state-of-the-art data science and machine learning technologies.
- As a modern Python-based environment, OpenBB Terminal opens access to numerous Python data libraries:
  - Data Science (Pandas, NumPy, SciPy, Jupyter)
  - Machine Learning (PyTorch, TensorFlow, Sklearn, Flair)
  - Data Acquisition (Beautiful Soup, and numerous third-party APIs)
- They have a Discord community too
- BTW, they seem to be a successful open source project: OpenBB Raises $8.5M in Seed Round Funding Following Open Source Project Gamestonk Terminal's Success
- Great graphics / gallery here.
- Way more affordable than the $1,900/mo/user for the Bloomberg Terminal

Brian #2: Python f-strings
- https://fstring.help (Florian Bruhin): quick overview of cool features of f-strings, made with Jupyter
- Python f-strings Are More Powerful Than You Might Think (Martin Heinz): a more verbose discussion of f-strings
- Both are great for upping your string-formatting game.

Brian S. #3: pyproject.toml and PEP 621 Support in setuptools
- PEP 621: “Storing project metadata in pyproject.toml”
  - Authors: Brett Cannon, Dustin Ingram, Paul Ganssle, Pradyun Gedam, Sébastien Eustace, Thomas Kluyver, Tzu-ping Chung (Jun-Oct 2020)
  - Covers build-tool-independent fields (name, version, description, readme, authors, etc.)
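For reference, a minimal pyproject.toml using the PEP 621 fields listed above might look like this (the package name, author, and dependency are invented for illustration, and the setuptools lower bound reflects the release that added this support, to the best of my knowledge):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "example-package"            # invented, for illustration
version = "0.1.0"
description = "A small example package"
readme = "README.md"
authors = [
    {name = "Jane Doe", email = "jane@example.com"},
]
dependencies = ["requests"]
```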
- Various tools had already implemented pyproject.toml support, but not setuptools
  - Including: Flit, Hatch, PDM, Trampolim, and Whey (h/t: Scikit-HEP)
  - Not Poetry yet, though it's under discussion
- setuptools support had been discussed pretty extensively, and had been included on the PSF's list of fundable packaging improvements
- The initial experimental implementation, spearheaded by Anderson Bravalheri, was recently completed
- Seeking testing and bug reports from the community (Discuss thread)
- I tried it on one of my projects; it mostly worked, but revealed a bug that Anderson fixed super quickly (proper handling of a dynamic long_description defined in setup.py)
- Related tools (all early-stage/experimental AFAIK):
  - ini2toml (Anderson Bravalheri): converts setup.cfg (which is in INI format) to pyproject.toml
    - Mostly worked well for me, though I had to manually fix a couple of things, most of which were due to limitations of the INI format (INI has no list syntax!)
  - validate-pyproject (Anderson Bravalheri): automated pyproject.toml checks
  - pyproject-fmt (Bernát Gábor): autoformatter for pyproject.toml
- Don't forget to build with build, instead of via a python setup.py invocation!

```shell
$ pip install build
$ python -m build
```

- You'll also want to constrain your setuptools version in the build-system.requires key of pyproject.toml (you are using PEP 517/518, right??)

Michael #4: JSON Web Tokens @ jwt.io
- JSON Web Tokens are an open, industry-standard RFC 7519 method for representing claims securely between two parties.
- jwt.io is basically a visualizer and debugger for JWTs:
  - Enter an encoded token
  - Select a signing algorithm
  - See the payload data
  - Verify the signature
- List of libraries, grouped by language

Brian #5: Autocorrect and other Git Tricks - Waylon Walker
- Use `git config --global help.autocorrect 10` to have git automatically run the command you meant after 1 second.
  - The value is in tenths of a second: `10` is 1 second, `50` is 5 seconds, etc.
- Automatically set the upstream branch if it's not there: `git config --global push.default current`
  - You may NOT want to do this if you are not careful with your branches.
  - From https://stackoverflow.com/a/22933955
- `git commit -a`
  - Automatically “add” all changed and deleted files, but not untracked files.
  - From https://git-scm.com/docs/git-commit#Documentation/git-commit.txt--a
- Now most of my interaction with the git CLI, especially for quick changes, looks like:

```shell
$ git checkout main
$ git pull
$ git checkout -b okken_something
$ git commit -a -m 'quick message'
$ git push
```

- With autocorrect working, even this works:

```shell
$ git chkout main
$ git pll
$ git comit -a -m 'quick message'
$ git psh
```

Brian S. #6: jupyter-tempvars
- Jupyter notebooks are great, and the global namespace of the Python kernel backend makes it super easy to flow analysis from one cell to another
- BUT, that global namespace also makes it super easy to footgun, when variables leak into or out of a cell when you don't want them to
- jupyter-tempvars notebook extension
  - Built on top of the tempvars library, which defines a TempVars context manager for handling temporary variables
  - When you create a TempVars context manager, you provide it patterns for variable names to treat as temporary
  - In its simplest form, TempVars (1) clears matching variables from the namespace on entering the context, and then (2) clears them again upon exiting the context, restoring their prior values, if any
  - TempVars works great, but it's cumbersome and distracting to manually include it in every notebook cell where it's needed
  - With jupyter-tempvars, you instead apply tags with a specific format to notebook cells, and the extension automatically wraps each cell's code in a TempVars context before execution
- JavaScript adapted from existing extensions:
  - Patching CodeCell.execute, from the jupyter_contrib_nbextensions ‘Execution Dependencies' extension, to enclose the cell code with the context manager
  - Listening for the ‘kernel ready' event, from
[jupyter-black](https://github.com/drillan/jupyter-black/blob/d197945508a9d2879f2e2cc99cafe0cedf034cf2/kernel_exec_on_cell.js#L347-L350), to import the [TempVars](https://github.com/bskinn/jupyter-tempvars/blob/491babaca4f48c8d453ce4598ac12aa6c5323181/src/jupyter_tempvars/extension/jupyter_tempvars.js#L42-L46) context manager upon kernel (re)start
- See the README (with animated GIFs!) for installation and usage instructions
- It's on PyPI:

```shell
$ pip install jupyter-tempvars
```

- And, I made a shortcut install script for it:

```shell
$ jupyter-tempvars install && jupyter-tempvars enable
```

- Please try it out, find/report bugs, and suggest features!
- Future work:
  - Publish to conda-forge (definitely)
  - Adapt to JupyterLab, VS Code, etc. (pending interest)

Extras

Brian:
- Ok. Python issues are now on GitHub. Seriously. See for yourself.
- Lorem Ipsum is more interesting than I realized.
- O RLY Cover Generator Example:

Michael:
- New course: Secure APIs with FastAPI and the Microsoft Identity Platform
- Pyenv Virtualenv for Windows (Sorta'ish)
- Hipster Ipsum

Brian S.:
- PSF staff is expanding
  - PSF hiring an Infrastructure Engineer
    - Link now 404s; perhaps they've made their hire?
  - Last year's hire of the Packaging Project Manager (Shamika Mohanan)
  - Steering Council supports PSF hiring a second developer-in-residence
  - PSF has chosen its new Executive Director: Deb Nicholson!
- PyOhio 2022 Call for Proposals is open
- Teaser tweet for performance improvements to pydantic

Jokes:
- https://twitter.com/CaNerdIan/status/1512628780212396036
- https://www.reddit.com/r/ProgrammerHumor/comments/tuh06y/i_guess_we_all_have_been_there/
- https://twitter.com/PR0GRAMMERHUM0R/status/1507613349625966599