Podcasts about sandboxes

  • 128 podcasts
  • 168 episodes
  • 44m average episode duration
  • 1 new episode weekly
  • Latest episode: Mar 5, 2026




Latest podcast episodes about sandboxes

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

The reception to our recent post on Code Reviews has been strong. Catch up! Amid a maelstrom of discussion on whether or not AI is killing SaaS, one of the top publicly listed SaaS companies in the world has just reported record revenues, clearing well over $1.1B in ARR for the first time with a 28% margin. As we comment on the pod, Aaron Levie is the rare public-company CEO equally at home in both worlds of Silicon Valley and Wall Street/Main Street: by day helping 70% of the Fortune 500 with their Enterprise Advanced Suite, and by night often found in the basements of early startups, tweeting viral insights about the future of agents. Now that Cursor, Cloudflare, Perplexity, Anthropic, and more have made Filesystems and Sandboxes and various forms of "Just Give the Agent a Box" cool (not just cool; it is now one of the single hottest areas in AI infrastructure, growing 100% MoM), we find it a delightfully appropriate time to do the episode with the OG CEO who has been giving humans and computers Boxes since he was a college dropout pitching VCs at a Michael Arrington house party. Enjoy our special pod, with fan-favorite returning guest/guest cohost Jeff Huber! Note: We didn't directly discuss the AI vs SaaS debate; Aaron has done many, many, many other podcasts on that, and you should read his definitive essay on it.
Most commentators do not understand SaaS businesses because they have never scaled one themselves and deeply reflected on what the true value proposition of SaaS is. We also discuss Your Company is a Filesystem. We also shout out CTO Ben Kus and the AI team, who talked about the technical architecture and will return for AIE WF 2026.

Full Video Episode

Timestamps

* 00:00 Adapting Work for Agents
* 01:29 Why Every Agent Needs a Box
* 04:38 Agent Governance and Identity
* 11:28 Why Coding Agents Took Off First
* 21:42 Context Engineering and Search Limits
* 31:29 Inside Agent Evals
* 33:23 Industries and Datasets
* 35:22 Building the Agent Team
* 38:50 Read Write Agent Workflows
* 41:54 Docs Graphs and Founder Mode
* 55:38 Token FOMO Culture
* 56:31 Production Function Secrets
* 01:01:08 Film Roots to Box
* 01:03:38 AI Future of Movies
* 01:06:47 Media DevRel and Engineering

Transcript

Adapting Work for Agents

Aaron Levie: Like, you don't write code, you talk to an agent and it goes and does it for you, and you maybe at best review it. That's probably largely not even what you're doing. What's happening is we are changing our work to make the agents effective. In that model, the agent didn't really adapt to how we work. We basically adapted to how the agent works. All of the economy has to go through that exact same evolution. Right now, it's a huge asset and an advantage for the teams that do it early and that are kind of wired into doing this, 'cause you'll see compounding returns. But that's just gonna take a while for most companies to actually go and get this deployed.

swyx: Welcome to the Latent Space pod. We're back in the Chroma studio with, uh, Chroma CEO Jeff Huber. Welcome, returning guest, now guest host.

Aaron Levie: It's a pleasure. Wow. How'd you get upgraded to, uh, to that?

swyx: Because he's like the perfect guy to be guest host for you.

Aaron Levie: That makes sense actually. We love context.
We, we both really love context. We really do. We really do.

swyx: Uh, and we're here with, uh, Aaron Levie. Welcome.

Aaron Levie: Thank you. Good to, uh, good to be [00:01:00] here.

swyx: Uh, yeah. So we've all met offline and chatted a little bit, but it's always nice to get these things in person and in conversation. Yeah. You just started off with so much energy. You're, you're super excited about agents.

Aaron Levie: I love agents.

swyx: Yeah. OpenClaw just got bought by OpenAI. No, not bought, but you know what I mean?

Aaron Levie: Some, some, you know, acquihire. Executive...

swyx: ...hire.

Aaron Levie: Executive hire. Okay. Executive hire.

swyx: Hey, that's my term. Okay. Um, what are you pounding the table on, on agents? You have so many insightful tweets.

Why Every Agent Needs a Box

Aaron Levie: Well, the thing that we get super excited by, that I think should be relatively obvious, is we've built a platform to help enterprises manage their corporate files, and the permissions of who has access to those files, and the sharing and collaboration of those files. All of those files contain really, really important information for the enterprise. It might have your contracts, it might have your research materials, it might have marketing information, it might have your memos. All that data obviously has, you know, predominantly been used by humans. [00:02:00] But there's been one really interesting problem, which is that humans only really work with their files during an active engagement with them, and then they kind of go away and you don't really see them for a long time. And all of a sudden, uh, with the power of AI and AI agents, all of that data becomes extremely relevant as this ongoing source of answers to new questions, of data that will transform into something else that produces value in your organization.
It contains the answer for the new employee that's onboarding, that needs to ramp up on a project. Um, it contains the answer to the right thing to sell a customer when you're having a conversation with them. It contains the roadmap information that's gonna produce the next feature. So all that data that previously we've been just sort of storing and occasionally forgetting about, 'cause we're only working on the new active stuff, all of that information becomes valuable to the enterprise. And it's gonna become extremely valuable to end users, because now they can have agents go find what they're looking for and produce new [00:03:00] value and new data on that information. And it's gonna become incredibly valuable to agents, because agents can roam around and do a bunch of work, and they're gonna need access to that data as well. And, um, you know, sometimes that will be an agent that is working on behalf of you, effectively as you, accessing all of the same information that you have access to and operating as you in the system. And then sometimes there's gonna be agents that are effectively autonomous and kind of run on their own, and you're gonna collaborate and work with them kind of like you would another person. OpenClaw being the most recent, and maybe the first real version of what that could look like that updated everybody's views of this landscape, which is: okay, I have an agent. It's on its own system, it's on its own computer, it has access to its own tools. I probably don't give it access to my entire life. I probably communicate with it like I would an assistant or a colleague, and then it sort of has this sandbox environment.
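[Editor's note] The "agent gets its own computer" pattern described here can be sketched very minimally: an isolated working directory holding only an explicit allow-list of shared files, and a subprocess with a scrubbed environment. This is our own illustrative sketch, not Box's product or any real sandbox runtime (production systems use containers, VMs, or similar isolation); all names are ours.

```python
import os
import shutil
import subprocess
import tempfile

def run_in_sandbox(task_cmd, shared_files):
    """Give the 'agent' its own box: a throwaway directory holding only
    the files explicitly shared with it, and a subprocess whose
    environment does not inherit the caller's secrets."""
    box = tempfile.mkdtemp(prefix="agent-box-")
    for path in shared_files:                  # explicit allow-list, not "all my files"
        shutil.copy(path, box)
    result = subprocess.run(
        task_cmd,
        cwd=box,                               # the agent sees only its box
        env={"PATH": os.environ["PATH"]},      # no inherited API keys or tokens
        capture_output=True,
        text=True,
        timeout=30,
    )
    shutil.rmtree(box)                         # throw the box away afterwards
    return result.stdout

# Example: the sandboxed process can list only what it was given.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("shared memo")
    shared = f.name
print(run_in_sandbox(["ls"], [shared]))
```

The point of the sketch is the shape, not the mechanism: the agent's view of the world is constructed by the platform, file by file, rather than inherited from the user.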
So all of that has massive implications for a platform that manages that [00:04:00] enterprise data. We think it's gonna just transform how we work with all of the enterprise content that we work with, and we just have to make sure we're building the right platform to support that.

swyx: The sort of shorthand I put it as is: as people build agents, everybody's just realizing that every agent needs a box. Yes. And it's nice to be called Box and just give everyone a box.

Aaron Levie: Hey, if we can make that go viral, I think that terminology, that's the...

swyx: ...tagline. Every agent...

Aaron Levie: ...needs a box. Every agent needs a box. If we can make that the headline of this, I'm fine with this. And that's the billboard I want. Yeah, exactly. Every agent needs a box. Um, I like it. Can we ship this?

swyx: Okay, let's do it. Yeah.

Aaron Levie: Uh, my work here is done and I got the value I needed outta this podcast. Drinks.

swyx: Yeah.

Agent Governance and Identity

Aaron Levie: But, um, you know, so the thing that we kind of think about is, um, whether you think the number is 10x or a hundred x or whatever the number is, we're gonna have some order of magnitude more agents than people. That's inevitable. It has to happen. So then the question is, what is the infrastructure that's needed to make all those agents effective in the enterprise? Make sure that they are well governed. Make sure they're only doing [00:05:00] safe things on your information. Make sure that they're not getting exposed to data that they shouldn't have access to. There's gonna be just incredibly, spectacularly crazy security incidents that will happen with agents, because you'll prompt-inject an agent and sort of find your way through the CRM system and pull out data that you shouldn't have access to.

Jeff Huber: Oh, we have... God.

Aaron Levie: Right?
I mean, that's just gonna happen all over the place, right? So then the thing is, how do you make sure you have the right security, the permissions, the access controls, the data governance? Um, we actually don't yet exactly know in many cases how we're gonna regulate some of these agents, right? If you think about an agent in financial services, does it have the exact same regulatory requirements that a human did? Or is the risk fully on the human that was interacting with or created the agent? All open questions. But no matter what, there's gonna need to be a layer that manages the data they have access to, the workflows that they're involved in, pulling up data from multiple systems. This is the new infrastructure opportunity in the era of agents.

swyx: You have a piece on agent identities, [00:06:00] which I think was today, um, which a lot of the security people are talking about, right? I always think of this as: you need the human "you," and then you need the agent "you." And, uh, well, I don't know if it's that simple, but is Box going to have an opinion on that? Or are you just gonna be like, well, we're just the source layer; let Okta or Auth0 handle that?

Aaron Levie: I think we're gonna have an opinion, and we will work with generally wherever the contours of the market end up. Um, and the reason that we're gonna have an opinion more than on other topics, probably, is because one of the biggest use cases for why your agent might need an identity is for file system access. So thus we have to kind of think about this pretty deeply. And I think, uh, unless you're in our world thinking about this particular problem all day long, it might be, you know, like, why is this such a big deal?
And the reason why it's a really big deal is because sometimes people say, well, just give the agent an account on the system and treat it like every other type of user on the system. The [00:07:00] problem is that I, as Aaron, don't really have any responsibility over anybody else's Box account in our organization. I can't see the Box account of any other employee that I work with. I am not liable for anything that they do. And they have, you know, strict privacy requirements on everything that they work on. Agents don't have those properties. The person who creates the agent probably is gonna, for the foreseeable future, take on a lot of the liability for what that agent does. That agent doesn't deserve any privacy, because it can't fully be autonomously operated and it doesn't have any legal responsibility. So thus you can't just be like, oh, well, I'll just create a bunch of accounts and then I'll kind of work with that agent and I'll talk to it occasionally. You need oversight of that. And so then the question is, how do you have a world where sometimes you have oversight of the agent, but what if that agent goes and works with other people? That person over there is collaborating with the agent on something; you shouldn't have [00:08:00] access to what they're doing. So we have all of these new boundaries that we're gonna have to figure out. You know, so far it's been really, really easy; we've been in easy mode. We've hit the easy button with AI, which is: the agent just is you. When you're in Claude Code and you're in Cursor and you're in Codex, the agent is you. You're auth'ing into your services. It can do everything you can do. That's the easy mode. The hard mode is agents kind of running on their own.
People check in with them occasionally; they're doing things autonomously. How do you give them access to resources in the enterprise without dramatically increasing the security risk, and the risk that you might expose the wrong thing to somebody? These are all the new problems that we have to get solved. I like the identity layer and identity vendors as being a solution to that, but we'll need some opinions as well, because so many of the use cases are these collaborative file system use cases, which is: how do I give an agent a subset of my data? Give it its own workspace as well, 'cause it's gonna need to store off its own information that would be relevant for it. And how do I have the right oversight into that? [00:09:00]

Jeff Huber: One thing which, um, I think is kind of interesting to think about is, you know, how humans work, right? Like, I may not just give you access to the whole file. I might sit next to you and scroll to this one part of the file and just show you that one part.

swyx: Partial file access.

Jeff Huber: I'm just saying, I think, like... RAG does seem to be dead, right? If you wanna say something is dead, probably RAG is dead. And, uh, the auth story to me seems incredibly unsolved and unaddressed by the existing state of AI vendors.

Aaron Levie: Yeah, I think, um... I mean, you're taking it obviously really to the limit that we probably need to solve for. Yeah. And we built an access control system that was kind of its own little world for a long time. And, um, the idea was this: it's a many-to-many collaboration system where I can give you any part of the file system. And it's a waterfall model. So if I give you access higher up in the system, you get everything below.
And that kind of created immense flexibility, because I can point you to any layer in the tree, but then you're gonna get access to everything below it. And that [00:10:00] mostly is working in this world. But you do have to manage this issue, which is: how do I create an agent that has access to some of my stuff and somebody else's stuff as well? Mm-hmm. And which parts do I get to look at as the creator of the agent? These are just brand-new problems. Yeah. Crazy. When there was a human there, that was really easy to do. Like, if the three of us were all sharing, there'd be a Venn diagram where we'd have an overlapping set of things we've shared, but then we'd have our own things that we shared with each other. In an agent world, somebody needs to take responsibility for what that agent has access to and what they're working on. These are probably some of the most, you know, boring problems for 98% of people on the internet, but they will be the problems that are the difference between whether you can actually have autonomous agents in an enterprise context...

swyx: Yeah.

Aaron Levie: ...that are not leaking your data constantly.

swyx: No, I mean, you know, I run a very, very small company for my conference, and we already have data sensitivity issues. Yes. And some of my team members cannot see what the others can. I can't imagine what it's like to run a Fortune 500 where you have to [00:11:00] worry about this. I'm just kinda curious: you talk to a lot, like 70, 80% of the Fortune 500 are your customers.

Aaron Levie: Yep. 67%. Just so we're being very precise.

swyx: So, yeah, I'm...

Aaron Levie: Okay. Okay.

swyx: ...I'm rounding up. Yes. Round up.
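[Editor's note] The "waterfall" model Levie describes (a grant at any node of the file tree implies access to everything beneath it) can be sketched in a few lines. The class and method names here are our own illustration, not Box's actual API.

```python
from pathlib import PurePosixPath

class WaterfallACL:
    """Toy hierarchical ACL: granting a path to a principal (a user or
    an agent) grants the whole subtree below that path."""

    def __init__(self):
        self.grants = {}  # principal -> set of granted root paths

    def grant(self, principal, path):
        self.grants.setdefault(principal, set()).add(PurePosixPath(path))

    def can_access(self, principal, path):
        target = PurePosixPath(path)
        # Access flows down: a grant matches the target itself or any ancestor.
        return any(root == target or root in target.parents
                   for root in self.grants.get(principal, set()))

acl = WaterfallACL()
acl.grant("agent-7", "/eng/roadmap")                          # grant high in the tree...
print(acl.can_access("agent-7", "/eng/roadmap/q3/specs.md"))  # ...everything below flows down: True
print(acl.can_access("agent-7", "/finance/deals.xlsx"))       # outside the subtree: False
```

The hard part Levie raises lives outside this sketch: when "agent-7" is granted subtrees from two different humans' accounts, somebody has to own and audit the union of those grants.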
I'm projecting to, for...

Aaron Levie: ...the government.

swyx: I'm projecting to the end of the year.

Aaron Levie: Okay.

swyx: There you go.

Aaron Levie: You do make it sound like, well, we've gotta be on this. Like we're taking way too long to get to 80%.

swyx: Well, no. I mean, so, like, how are they approaching it? Right? Because you don't have a final answer yet.

Why Coding Agents Took Off First

Aaron Levie: Well, okay, so this is actually the stark reality that, unfortunately, is kind of like pouring water on the party a little bit.

swyx: Yes.

Aaron Levie: We all in Silicon Valley have the absolute best conditions possible for AI ever. And I think we all saw the Dwarkesh, you know, kind of Dario podcast and this idea of AI coding. Why has that taken off, and why are we not yet fully seeing it everywhere else? Well, look, if you just enumerated the list of properties that AI coding has and then compared it to other [00:12:00] knowledge work... let's just go through a few of them. Generally speaking, you bring on a new engineer, they have access to a large swath of the code base. A new engineer comes on, and they can just go and find the stuff that they need to work with. It's a fully text-in, text-out medium. It's just gonna be text at the end of the day. So it's really great from the standpoint of what the agent can work with. Obviously the models are super trained on that dataset. The labs themselves have a really strong, self-reinforcing positive flywheel of why they need to do agent coding deeply.
So then you get better tooling, better services. The actual developers of the AI are daily users of the thing that they're working on. Versus, you know, there are probably only like seven Claude Cowork legal plugin users at Anthropic on any given day, but there are a couple thousand Claude Code users every single day. So just think about which one they're getting more feedback on, all day long. So you just go through this list. Everybody who's a [00:13:00] developer is by definition technical, so they can go install the latest thing. We're all generally online, or at least, you know, the weird ones are, and we're all talking to each other, sharing best practices. That's already eight differences versus the rest of the economy. Every other part of the economy has like six to seven headwinds relative to that list. You go into a company, you're a banker in financial services: you have access to a tiny little subset of the total data that's gonna be relevant to do your job. And you have to go and talk to a bunch of people to get the right data to do your job, because Sally didn't add you to that deal room folder. And the information is actually in a completely different organization that you now have to go and run into. And you have this endless list of access controls and security. As you talked about, you have a medium which is not just text, right? You have a Zoom call that you're getting all of the requirements from the customer on. You have a lot of in-person conversations, and you're doing in-person sales. Like, how do you ever [00:14:00] digitize all of that information?
Um, you know, I think a lot of people got upset with this idea that the code base has all the context. I don't know if you followed some of that conversation that went viral. It's not that simple; the code base doesn't have all the knowledge. But you're a lot better off than you are with other areas of knowledge work. Like, we have documentation practices, you write specifications. Those things don't exist for like 80% of the work that happens in the enterprise. That's the divide that we have: AI coding has fully, you know, reached escape velocity in how powerful this stuff is, and then we're gonna have to find a way to bring that same energy and momentum to all these other areas of knowledge work. Where the tools aren't there, the data's not set up to be there, the access controls don't make it that easy. The context engineering is an incredibly hard problem, because again, you have access control challenges, you have different data formats, you have end users that are gonna need to be trained through this, as opposed to adopting [00:15:00] these tools in their free time. That's where the Fortune 500 is. And so we, I think, have to be prepared as an industry that we are gonna be on a multi-year march to be able to bring agents to the enterprise for these workflows. And I think probably the thing that we've learned most in coding that the rest of the world is not yet ready for (I mean, they'll have to be ready for it, because it's just gonna inevitably happen)... I think in coding, what's interesting is if you think about the practice of coding today versus two years ago: it's probably the most changed workflow maybe ever, for the amount of time in which it changed, right? Yeah.
Has any workflow in the entire economy changed that quickly, in terms of the amount of change? At least in any knowledge worker workflow, there's very rarely been an event where one piece of technology and work practice has so fundamentally changed what you do. Like, you don't write code; you talk to an agent and it goes and [00:16:00] does it for you, and you maybe at best review it. And even that's probably largely not even what you're doing. What's happening is we are changing our work to make the agents effective. In that model, the agent didn't really adapt to how we work. We basically adapted to how the agent works. Mm-hmm. All of the economy has to go through that exact same evolution. The rest of the economy is gonna have to update its workflows to make agents effective, and to give agents the context that they need, and to actually figure out what kind of prompting works, and to figure out how you ensure that the agent has the right access to information to be able to execute on its work. You know, this is not the panacea that people were hoping for, of the agent dropping in and just automating your life. You have to basically re-engineer your workflow to get the most out of agents, and, uh, that's just gonna take, you know, multiple years across the economy. Right now it's a huge asset and an advantage for the teams that do it early and that are kind of wired into doing this, [00:17:00] 'cause you'll see compounding returns. But that's just gonna take a while for most companies to actually go and get this deployed.

swyx: I love pushing back. I think that is what a lot of technology consultants love to hear, this sort of thing, right? Yeah, yeah, yeah. Be first to embrace the AI. Yes. To get to the promised land, you must pay me so much money, a hundred percent, to adopt the prescribed way of, uh, conforming to the agents. Yes.
And I worry that you will be eclipsed by someone else who says, no, come as you are.

Aaron Levie: Yeah.

swyx: And we'll meet you where you are.

Aaron Levie: And what was the thing that went viral a week ago? OpenAI, probably, uh, is hiring FDEs. Yeah. To go into the enterprise. Yeah. Yeah. And then Anthropic is embedded at Goldman Sachs. Yeah. So if the labs are having to do this, if the labs have decided that they need to hire FDEs and professional services, then I think that's a pretty clear indication that there's no easy mode of workflow transformation. Yeah. Yeah. So, to your point, I think actually this is a market opportunity for, you know, new professional services and consulting [00:18:00] firms that are like agent builders, and they kind of go into organizations and they figure out how to re-engineer your workflows to make them more agent-ready, and get your data into the right format, and, you know, reconstruct your business process. So you're not doing most of the work; you're telling agents how to do the work and then you're reviewing it. But I haven't seen the thing that can just drop in and let you not go through those changes.

swyx: I don't know how that kind of sales pitch goes over. Yeah. You know, you're saying things like, well, in my nice, beautiful walled garden, here's this beautiful Box account that has everything.

Aaron Levie: Yes.

swyx: And I'm like, well, most real life is extremely messy. Sure. And, like, poorly named, and there's duplicates, and outdated s**t.

Aaron Levie: A hundred percent. No, a hundred percent. So this is... I mean, we agree that getting to the beautiful garden is gonna be tough.

swyx: Yeah.

Aaron Levie: There's also the other end of the spectrum, where it's just a technical impossibility to solve.
The agent truly cannot get enough context to make the right decision in the incredibly messy land. Like, there's [00:19:00] no AGI that will solve that. So we're gonna have to land somewhere in between, which is: we all collectively get better at documentation practices, and at having authoritative, relatively up-to-date information and putting it in the right place. Agents will certainly cause us to be much better organized around how we work with our information, simply because the severity of the agent pulling the wrong data will be too high, and the productivity gain you'll miss out on by not doing this will be too high as well; your competition will just do it, and they'll just have higher velocity. And we see this a lot firsthand. We built a series of agents internally that can have access to your full Box account, and you give one a task and it can go find whatever information you're looking for and work with it. And, you know, thank God for the model progress, because if you gave that task to an agent nine months ago, you're just gonna get lots of bogus answers. It's gonna say, hey, here are [00:20:00] five, you know, documents that all kind of smell like the right thing. But you're putting me on the clock, 'cause my system prompt says, like, you know, be pretty smart, but also try and respond to the user. And it's gonna respond. And it's like, ah, it got the wrong document. And then you do that once or twice as a knowledge worker, and you're just never...

swyx: ...never again.

Aaron Levie: Never again. You're just, like, done with the system.

swyx: Yeah. It doesn't work.

Aaron Levie: It doesn't work.
And so, you know, Opus 4.6 and Gemini 3.1 Pro and, you know, whatever the latest GPT-5.3 will be: those things are getting better and better, and they're using better judgment. And all of these updates to the agentic tool and search systems... we're seeing very real progress, where the agent can almost smell when something's a little bit fishy. We have this process where we have it fan out, do a bunch of searches, pull up a bunch of data, and then it has to do its own ranking of, you know, what are the right documents that it should be working with. And again, the intelligence level of a model six months ago would be just throwing a dart: I'm gonna grab these seven files, and I pray, I hope that that's the right answer. And something like an Opus, first 4.5 and now 4.6, is like: no, that one doesn't seem right relative to this question, because I'm seeing some signal that's contradicting the document, where it would normally sit in the tree, who should have access. It's doing all of that kind of work for you. But it still doesn't work if you just have a total wasteland of data. It's just not possible, partly 'cause a human wouldn't even be able to do it. So basically, if a really, really smart human could not do that task in five or ten minutes, for a search-retrieval type task, your agent's not gonna be able to do it any better. You see this all day long.

Context Engineering and Search Limits

swyx: So this touches on a thing that I'm just passionate about, which is context engineering. I'm just gonna let you ramble or riff on context engineering.
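[Editor's note] The fan-out-then-rank pass Levie describes just above (issue several query variants, pool the hits, then have the agent rank the candidates before using any of them) can be sketched as follows. This is our own toy sketch: the scorer is plain token overlap standing in for a model-based judgment, and the corpus and names are invented.

```python
def fan_out_search(question, corpus, query_variants, top_k=3):
    """Fan out several query variants over a corpus, pool every hit,
    then re-rank the pooled candidates against the original question."""

    def score(query, doc):
        # Crude lexical overlap; a real system would use a model or reranker.
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / max(len(q), 1)

    pooled = {}
    for q in query_variants:                      # fan out
        for doc_id, text in corpus.items():
            if score(q, text) > 0:
                pooled[doc_id] = text
    # The agent's own ranking pass over everything the searches dredged up.
    ranked = sorted(pooled, key=lambda d: score(question, pooled[d]), reverse=True)
    return ranked[:top_k]

corpus = {
    "ny":   "New York office address 100 Broadway",
    "sf":   "San Francisco office address 900 Market",
    "memo": "quarterly planning memo for marketing",
}
print(fan_out_search("office address", corpus,
                     ["office location", "address of office", "HQ"]))
# → ['ny', 'sf']  (the marketing memo never makes it past the ranking pass)
```

The two-stage shape is the point: recall comes from the fan-out, and precision comes from the explicit ranking step before anything reaches the context window.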
If there's anything... he did really good work on Context Rot, which has really taken over as the term that people use, and the reference.

Aaron Levie: A hundred percent. All we think about is the context rot problem. [00:22:00]

Jeff Huber: Yeah, there are certainly a lot of ranking considerations. Agentic search, I think, is incredibly promising. Um, yeah, I was trying to generate a question, though. I think I have a question right now, swyx.

Aaron Levie: Yeah, no, but I think there was this moment, um, you know, like two years ago, before we knew where the gotchas were gonna be in AI, and someone was like, well, infinite context windows will just solve all of these problems, 'cause you'll just give the context window all the data. And it's just like, okay, I mean, maybe in 2035 this is a viable solution. First of all, it would just simply cost too much. We just can't give the model the 5,000 documents that might be relevant and have it read them all. And I've seen enough to start believing in crazy stuff. So I'm willing to just say, sure, in ten years from now...

swyx: Never say never.

Aaron Levie: In ten years from now, we'll have infinite context windows at a thousandth of the price of today. Let's just believe that that's possible. But right, we're in reality today. So today we have a context engineering [00:23:00] problem, which is: I've got, you know, 200,000 tokens that I can work with, or... I don't even know what the latest graph is before, like, massive degradation. 16? Okay, say I have 60,000 tokens that I get to work with where I'm gonna get accurate information. That's not a lot of tokens for a corpus of 10 million documents that a knowledge worker might have across all of the teams and all the projects and all the people they work with. I have 10 million documents.
I have, I have 10 million documents.Which, you know, maybe is times five pages per document or something like that. I'm at 50 million pages of information and I have 60,000 tokens. Like, holy s**t. Yeah. This is like, how do I bridge the 50 million pages of information with, you know, the couple hundred that I get to work with in that, in that token window.Yeah. This is like, this is like such an interesting problem and that's why actually so much work is actually like, just like search systems and the databases and that layer has to just get so locked in, but models getting better and importantly [00:24:00] knowing when they've done a search, they found the wrong thing, they go back, they check their work, they, they find a way to balance sort of appeasing the user versus double checking.We have this one, we have this one test case where we ask the agent to go find. 10 pieces of information.swyx: Is this the complex work eval?Aaron Levie: Uh, this is actually not in the eval. This is, this is sort of just like we have a bunch of different, we have a bunch of internal benchmark kind of scenarios. Every time we, we update our agent, we have one, which is, I ask it to find all of our office addresses, and I give it the list of 10 offices that we have.And there's not one document that has this, maybe there should be, that would be a great example of the kind of thing that like maybe over time companies start to, you know, have these sort of like, what are the canonical, you know, kind of key areas of knowledge that we need to have. We don't seem to have this one document that says, here are all of our offices.We have a bunch of documents that have like, here's the New York office and whatever. So you task this agent and you, you get, you say, I need the addresses for these 10 offices. Okay. And by the way, if you do this on any, you know, [00:25:00] public chat model, the same outcome is gonna happen. 
But for a different kind of query, you say, I need these 10 addresses: how many times should the agent go and do its search before it decides whether or not there's just no answer to this question? Often, and especially with the, let's say, lower-tier models, it'll come back and it'll give you six of the 10 addresses, and it'll just say, I couldn't find the other...

swyx: Four. It doesn't know what it doesn't know.

Aaron Levie: It doesn't know what it doesn't know. Yeah. So when should the model stop? Should it do that task for literally an hour and just keep cranking through? Maybe I actually made up an office location, and it doesn't know that I made it up, and I didn't even know that I made it up. Should it read every single file in your entire Box account until it has exhausted every single piece of information?

swyx: Expensive.

Aaron Levie: These are the new problems that we have. So, you know, something like, let's say, a new Opus model is sort of like: okay, I'm gonna try these types of queries. I didn't get exactly what I wanted. I'm gonna try again. At [00:26:00] some point I'm gonna stop searching, 'cause I've determined that no amount of searching is gonna solve this problem; I'm just not able to do it. And that judgment is a really new thing that the model needs to have: when should it give up on a task, 'cause you just can't find the thing? That's the real world of knowledge work problems. And this is the stuff that the coding agents don't have to deal with.
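The stopping judgment Aaron describes (search, check what is still missing, decide whether another attempt is worth it) can be sketched as a bounded loop. Every name here is hypothetical, and the `search` callable stands in for whatever retrieval backend the agent actually uses:

```python
# Hypothetical bounded agentic search: query until everything is found or the
# per-item attempt budget runs out, then report what is missing instead of guessing.
from typing import Callable, Optional

def find_items(wanted: list[str],
               search: Callable[[str], Optional[str]],
               max_attempts_per_item: int = 3) -> tuple[dict, list]:
    found: dict[str, str] = {}
    missing: list[str] = []
    for item in wanted:
        answer = None
        for _ in range(max_attempts_per_item):
            answer = search(item)   # one retrieval call
            if answer is not None:
                break               # found it, stop spending budget
        if answer is None:
            missing.append(item)    # give up and say so rather than hallucinate
        else:
            found[item] = answer
    return found, missing

# Toy run: a "corpus" that only knows two of the three offices asked for.
corpus = {"New York": "123 Example Ave", "London": "456 Sample St"}
found, missing = find_items(["New York", "London", "Atlantis"], corpus.get)
print(found, missing)  # the unknown office is reported missing, not invented
```

The hard part in a real agent is choosing the attempt budget and varying the query between attempts; this sketch only captures the give-up-and-report behavior rather than pretending to find all ten.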
Because you're not usually asking a coding agent about existing information; you're always creating net new information coming right out of the model, for the most part. Obviously it has to know about your code base and your specs and your documentation. But when you deploy an agent on all of your data, now you have all of these new problems that you're dealing with.

Jeff Huber: Our follow-up research to Context Rot is actually on agentic search. Ah. Um, and we've sort of stress-tested frontier models and their ability to search, and they're not actually that good at searching. Right. Uh, so you're sort of highlighting this explore-exploit...

swyx: You're just gonna say everything doesn't work.

Aaron Levie: Well...

Jeff Huber: Somebody has to.

Aaron Levie: Um, can I just throw out one more thing? That is different from coding, and the rest of [00:27:00] knowledge work, that I failed to mention. So one other key point is that, you know, at the end of the day, whether you believe we're in a slop apocalypse or whatever: if you've built a working solution, that is ultimately what the customer is paying for, whether I have a lot of slop, a little slop, or whatever. I'm sure there's lots of code bases we could go into in enterprise software companies where it's just crazy slop that humans did over a 20-year period, but the end customer just gets this little interface. They can type into it, it does its thing. Knowledge work, uh, doesn't have that property.
If I have an AI model go generate a contract, and I generate a contract 20 times, and all 20 times it's just 3% different, that kind of slop introduces all new kinds of risk for my organization that the code version of that slop didn't introduce. So how do you constrain these models to just the part that you want [00:28:00] them to work on, and just do the thing that you want them to do? And, you know, in engineering, you can't be disbarred as an engineer. But you could be disbarred as a lawyer. You can do the wrong medical thing in healthcare. There's no equivalent to that in engineering.

swyx: Do you want there to be? Because I've considered software...

Jeff Huber: What's that? Civil engineering there is, right?

Aaron Levie: Not software. Civil engineering, sure. Oh yeah, for sure. But in any of our companies, you'll be forgiven if you took down the site. We will do a rollback, and you'll be in a meeting, but you have not been disbarred as an engineer. We don't revoke your, you know, your computer science degree.

Jeff Huber: Blameless postmortem.

Aaron Levie: Yeah, exactly. Exactly. So, uh, now maybe we collectively as an industry need to figure out what you are liable for, not legally, but in a management sense, with these agents. All sorts of interesting problems that have to come out. But in knowledge work, that's the real hostile environment that we're operating in. Hmm.

swyx: I do think, like, a lot of last year's, 2025's, story was the rise of coding agents, and I think [00:29:00] 2026's story is definitely knowledge work agents.

Aaron Levie: Yes. A hundred percent.

swyx: Right. And I think OpenClaw and Cowork are just the beginning. The next one's just gonna be absolute craziness.

Aaron Levie: It is.
And it's gonna be, I mean, again, this is gonna be this wave where we try and bring over as many of the practices from coding, because that will clearly be the forefront: tell an agent to go do something, it has access to a set of resources, and you need to be responsible for reviewing it at the end of the process. That to me is the kind of template that I just think goes across knowledge work. And, uh, Cowork is a great example, OpenClaw's a great example, and you can kind of see what Codex could become over time. These are some really interesting platforms that are emerging.

swyx: Okay. Um, we touched on evals a little bit. You had the report that you were gonna bring up, and then I was gonna go into Box's evals, but go ahead: talk about your agentic search thing.

Jeff Huber: Yeah. Mostly, I think, a few of the insights. Number one: frontier models are not good at search. Humans have this [00:30:00] natural explore-exploit trade-off where we kind of understand when to stop doing something. Also, humans are pretty good at forgetting, actually, at pruning their own context, whereas agents are not. An agent, in its context history: even if it knew something was bad, and you could see in the trace the reasoning, "hey, that probably wasn't a good idea," if it's still in the trace, still in the context, it'll still do it again. Uh-huh. And so I think pruning is also gonna be, it's already becoming a thing, right? But letting agents self-prune the context window...

swyx: ...will be a big deal. Yeah. So don't leave the mistake in there. Cut out the mistake, but tell it that it made a mistake in the past so it doesn't repeat it.

Jeff Huber: Yeah. But, like, cut it out so it doesn't get distracted by it again.
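The pruning move Jeff describes (cutting the failed attempt out of the context while leaving a short note so the agent does not retry it) might look like the following sketch. The message shape and field names are hypothetical, not any particular framework's API:

```python
# Hypothetical context pruning: drop the full transcript of a failed attempt
# (so it cannot act as a few-shot example) but keep a compact "do not retry" note.
def prune_failures(messages: list[dict]) -> list[dict]:
    pruned = []
    for msg in messages:
        if msg.get("failed"):
            # Replace the whole failed trace with a one-line lesson.
            pruned.append({
                "role": "system",
                "content": f"Note: approach '{msg['summary']}' was tried and failed; do not retry it.",
            })
        else:
            pruned.append(msg)
    return pruned

history = [
    {"role": "user", "content": "Find all 10 office addresses."},
    {"role": "assistant", "content": "<long failed tool trace>",
     "failed": True, "summary": "grep the HR wiki"},
    {"role": "assistant", "content": "Found 6 of 10 so far."},
]
for msg in prune_failures(history):
    print(msg["role"], "-", msg["content"])
```

The point of keeping the one-line note rather than deleting the attempt outright is exactly the distinction drawn above: the model should remember *that* the approach failed without the verbatim trace sitting in context as an accidental few-shot example.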
'Cause really, it will repeat its mistake just because it's in...

swyx: ...the context.

Jeff Huber: It's in the context.

Aaron Levie: It's in the context so much, that's a few-shot example. It's like, "oh, this is a great thing to go try," even if it didn't work.

Jeff Huber: Yeah, exactly. So there's a bunch of stuff there.

Aaron Levie: It's just Groundhog Day inside these models. Yeah. I'm gonna go keep doing the same wrong thing.

Jeff Huber: In a hand-wavy sense, I feel like, to torture an analogy, you're trying to fit a manifold in latent space, which is kind of doing program synthesis, which is kinda how we think about what we're doing, right? Like, you know, certain facts might be sort of over-weighting certain sectors of latent space...

swyx: So we have a bell; our editor has a bell every time you say that.

Jeff Huber: So you have to, like, remove those...

swyx: You shoulda had a gong, like TBPN or something.

Jeff Huber: You remove those links to kinda give it the freedom to do what it needs to do. So, but yeah, we'll release more soon.

Aaron Levie: That's awesome.

Jeff Huber: That'll be cool.

swyx: We're a cerebral podcast; people listen to us and sort of think really deep. So yeah, we try to keep it subtle. Okay. We try to keep it...

Aaron Levie: Okay, fine.

Inside Agent Evals

swyx: Um, you guys do have evals, you talked about your office thing, but you've also been promoting APEX and the complex work evals. Uh, yeah, wherever you wanna take this.

Aaron Levie: APEX is obviously Mercor's, uh, kind of, um, agent eval. We supported that by sort of opening up some data for them around how we see these data workspaces in the, you know, kind of regular economy.
So how do lawyers have a workspace? How do investment bankers have a workspace? What kind of data goes into those? And so we [00:32:00] partnered with them on their APEX eval.

Our own, um, eval is actually relatively straightforward. We have a set of documents in a range of industries. We give the agent... previously we did this as a one-shot test of just purely the model, and then we realized, based on where everything's going, it's just gotta be more agentic. So now it's a bit more of a test of both our harness and the model. And we have a rubric of a set of things that it has to get right, and we score it. Um, and you're just seeing these incredible jumps in almost every single model within its own family: you know, Opus 4, um, you know, Sonnet 4.6 versus Sonnet 4.5.

swyx: Yeah. We have this up on screen.

Aaron Levie: Okay, cool. So you're seeing it somewhere. I forget; it was like a 15-point jump, I think, on the overall.

swyx: Yes.

Aaron Levie: And it's just, you know, these incredible leaps that are starting to happen.

swyx: And Opus doesn't know any, like, any... it's completely held out from Opus?

Aaron Levie: This is not in any... there's no public data, which has, you know, benefits. This is just a private eval that we [00:33:00] do, and then we just happen to show it to the world. Hmm. So you can't train against it. And I think it's just as representative of, obviously, its reasoning capabilities, what it's doing with, you know, kind of test-time compute, thinking levels, all the context rot issues. So many interesting, you know, kind of capabilities that are now improving.

swyx: One sector that you have, that's interesting.

Industries and Datasets

swyx: Uh, people are roughly familiar with healthcare and legal, but you have public sector in there.

Aaron Levie: Yeah.

swyx: Uh, what's that?
Like, what is that?

Aaron Levie: Yeah, and we actually test against, I dunno, maybe 10 industries. We end up usually just cutting a few that we think have interesting gains. Our public sector one has a lot of, like, government-type documents.

swyx: What is that? What are government-type documents?

Aaron Levie: Government filings.

swyx: Like a tax return?

Aaron Levie: Probably not tax returns. It would be more of: what would the government be using, uh, as data? So, okay, think about research, those types of data sets. And then we have financial services, for things like data rooms and what would be in an investment prospectus.

swyx: Uh-huh. That one you can dogfood.

Aaron Levie: Yeah, exactly. Exactly. Yes. [00:34:00] So we run the models, um, now in, you know, more of an agent mode, but still with kinda limited capacity, and just try and see, on a like-for-like basis, what are the improvements. And again, we just continue to be blown away by how good these models are getting.

swyx: Yeah. I mean, I think every serious AI company needs something like that, where, like, this is the work we do, here's our company eval. Yeah. And if you don't have it, well, you're not a serious AI company.

Aaron Levie: There's two dimensions, right? So there's, like, how are the models improving, and so which models should you either recommend a customer use, which one should you adopt? But then every single day we're making changes to our agents, and you need to know...

swyx: If you regressed.

Aaron Levie: If you regressed. Yeah. You know, I've been fully convinced that the whole agent observability and eval space is gonna be a massive space. Um, super excited for what Braintrust is doing, excited for, you know, LangSmith, all the things. And I think, I mean, this is like literally every enterprise right now: it's like the AI companies are the customers of these tools.
Every enterprise will have this. Yeah, you'll just [00:35:00] have to have an eval of all of your work. Like, you'll have an eval of your RFP generation, you'll have an eval of your sales material creation, you'll have an eval of your, uh, invoice processing. And as you buy or use new agentic systems, you are gonna need to know what the quality of your pipeline is.

swyx: Yeah.

Aaron Levie: Um, so huge, huge market with agent evals.

swyx: Yeah.

Building the Agent Team

swyx: And, you know, I'm gonna shout out your team a bit. Uh, your CTO, Ben, did a great talk with us last year.

Aaron Levie: Awesome.

swyx: And he's gonna come back again...

Aaron Levie: Oh, cool.

swyx: ...for World's Fair. Yep. Just talk about your team, brag a little bit. I think people take these eval numbers and pretty charts for granted, but, I mean, there's lots of really smart people at work behind all this.

Aaron Levie: Biggest shout-out: we have a couple folks, uh, Dya, uh, Sidarth, that kind of run this. They're like a tag-team duo on our evals. Ben, our CTO, heavily involved; Yasha, head of AI; you know, a bunch of folks. And evals is one part of the story, and then the full, you know, kind of AI and agent team [00:36:00] is core to this whole effort. So there's probably, I don't know, maybe a few dozen people that are like the epicenter. And then you just have layers and layers of kind of concentric circles: okay, then there's a search team that supports them, and an infrastructure team that supports them. And it's starting to ripple through the entire company.
But there's that kind of core agent team, um, that's a pretty close-knit group.

swyx: The search team is separate from the infra team?

Aaron Levie: I mean, we have every layer of the stack we have to kind of do, except for just pure public cloud. Um, but, um, you know, I don't even know what our public numbers are, but you can just think about it as: a lot of data is stored in Box. And so you have every layer of the stack: how do you manage the data, the file system, the metadata system, the search system, just all of those components. And then they all are having to understand that now you've got this new customer, which is the agent. They've been building for two types of customers in the past: they've been building for users, and they've been building for, like, applications. [00:37:00] And now you've got this new agent user, and it comes in with a different set of properties sometimes. Like, hey, maybe sometimes we should do an embedding-based, you know, kind of search versus your typical semantic search. You just have to build the capabilities to support all of this. And we're testing stuff, throwing things away; something doesn't work and it's not relevant. It's just, you know, total chaos. But all of those teams are supporting the agent team that is kind of coming up with its requirements of what we need.

swyx: Yeah. No, uh, we just came from a fireside chat where you talked about how you're doing this. It's kind of like an internal startup within the broader company; the broader company's like 3,000 people. But, you know, there's a core team of, like, well, here's the innovation center.

Aaron Levie: Yeah.

swyx: And, like, every company kind of is run this way.

Aaron Levie: Yeah. I wanna be sensitive: I don't call it the innovation center.
Yeah. Only because I think everybody has to do innovation. Um, there's a part of the company that is sort of do-or-die for the agent wave.

swyx: Yeah.

Aaron Levie: And it only happens to be more of my focus simply because it's existential that [00:38:00] we get it right.

swyx: Yeah.

Aaron Levie: All of the supporting systems are necessary. All of the surrounding adjacent capabilities are necessary. Like, the only reason we get to be a platform where you'd run an agent is because we have a security feature or a compliance feature or a governance feature that some team is working on. But that's not gonna be the make-or-break of whether we get agents right; that already exists, and we need to keep innovating there. I don't know what the right, exact, precise number is, but it's not a thousand people and it's not 10 people. There's a number of people that are, like, the kind of startup within the company that are the make-or-break on everything related to AI agents, you know, leveraging our platform and letting you work with your data. And that's where I spend a lot of my time, and Ben and Yosh and Diego and Teri, you know, these are just people kind of across the team that are working on it.

swyx: Yeah. Amazing.

Read Write Agent Workflows

Jeff Huber: How do you think about... I mean, you talked a lot about kinda read workflows over your Box data.

Aaron Levie: Yep.

Jeff Huber: Right. You know, gen search questions, queries, et cetera. But what about write, or, like, authoring workflows?

Aaron Levie: Yes. I've [00:39:00] already probably revealed too much, actually, now that I think about it. So, um, I've talked about whatever...

Jeff Huber: Whatever you can.

Aaron Levie: Okay. It's just us. It's just us. Yeah. Okay.
Of course, of course. So I guess I'll make it a little bit conceptual, uh, because again, I've already said things that are not even GA. But we've kinda danced around it publicly, so, yeah, okay. Just, like, hopefully nobody watches this episode.

swyx: It's tidbits for the highly engaged to go figure out, like, what exactly your sort of line of thinking is. Sure. They can connect the dots.

Aaron Levie: Yeah. So I would say that, as a place where you have your enterprise content, there's a use case where I want to have an agent read that data and answer questions for me. And then there's a use case where I want the agent to create something, and use the file system to create something, or store off data that it's working on, or have various files that it's writing to about the work it's doing. So we do see it as a total read-write. The harder problem has so far been the read, because again, you have that kind of 10-[00:40:00] million-to-one ratio problem, whereas writes... a lot of that's just gonna come from the model, and we'll just put it in the file system and kinda use it. So it's a bit of a technically easier problem. But the only part that's not necessarily technically hard, it's just not yet perfected in the state of the ecosystem, is, you know, building a beautiful PowerPoint presentation. It's still a hard problem for these models. Like, these formats just weren't built for this.

swyx: They're working on it.

Aaron Levie: They're working on it. Everybody's working on it.

swyx: Every launch is like, well, we do PowerPoint now.

Aaron Levie: Yeah, it's getting a lot better each time.
But then you'll do this thing where you'll ask it to update one slide, and all of a sudden the fonts will be just a little bit different, you know, on two of the slides, or it moved some shape over to the left a little bit. And again, these are the kind of things that, in code, obviously you could really care about if you really care about how beautiful the code is, but the end user doesn't notice all those problems. In file creation, the end user instantly sees it. You're [00:41:00] like, ah, paragraph three, you literally just changed the font on me. Like, it's a totally different font, midway through the document. Mm-hmm. Those are the kind of things that you run into a lot of on the content creation side. So, mm-hmm, we are gonna have native agents that do all of those things; they'll be powered by the leading kind of models and labs. But the thing that I think is probably gonna be a much bigger idea over time is any agent on any system, again, using Box as a file system for its work. And in that kind of scenario, we don't necessarily care what it's putting in the file system. It could put its memory files, it could put its, you know, specification documents, it could put, you know, whatever its markdown files are, or it could generate PDFs. It's just a workspace that is sort of sandboxed off for its work. People can collaborate in it, it can share with other people. And so we were thinking a lot about what's the right, you know, kind of way to deliver that at scale.

Docs Graphs and Founder Mode

swyx: I wanted to come into sort of the AI transformation, or AI operations, things. [00:42:00] Um, one of the tweets that you wanted to talk about... this is just me going through your tweets, by the way.

Aaron Levie: Oh, okay.
swyx: I mean, like, this is... you read...

Aaron Levie: One by one.

swyx: You're the easiest guest to prep for, because you already have, like, this is what I'm interested in. I'm like, okay, well...

Aaron Levie: Are we gonna get to, like, February, January or something? Where are we in the timelines? How far back are we going?

swyx: Can you describe Box as a set of skills? Right? That's, like, one of the extremes of: well, if you just turn everything into a markdown file, then your agent can run your company. Like, you just have to find the right sequence of words to...

Aaron Levie: Yes.

swyx: ...to do it.

Aaron Levie: Sorry, is that the question?

swyx: So I think the question is, like, what if we documented everything?

Aaron Levie: Yes.

swyx: The way that you exactly said. Let's get all the Fortune 500s prepared for agents; everything's golden and nicely filed away and everything. What's missing? What's left, right?

Aaron Levie: Yeah.

swyx: You've run your company for a decade.

Aaron Levie: Yeah. I think the challenge is that that information changes a week later, because something happened in the market for that [00:43:00] customer, or for us as a company, that now has to go get updated. And so these systems are living and breathing, and they have to experience reality and updates to reality, which right now is probably gonna be humans, you know, kinda giving them the updates. And, you know, there is this piece about context graphs that kinda went very viral. Yeah. And I thought it was super provocative. I agreed with many parts of it. I disagree with a few parts around...
You know, it's not gonna be as easy as: if we just had the agent traces, then we can finally do that work. Because there's just so much more other stuff that's happening that we haven't been able to capture and digitize. And I think they actually represented that in the piece, to be clear. But there's just a lot of work; you can't have only skills files, you know, for your company, because there's gonna be a lot of other stuff that happens and changes over time. Yeah. Most companies are practically apprenticeships.

swyx: Most companies are practically apprenticeships.

Jeff Huber: Like, every new employee who joins the team, [00:44:00] you spend one to three months, like, ramping them up. All that tacit knowledge is not written down.

Aaron Levie: Yes.

Jeff Huber: But it would have to be if you wanted to give it to an agent. Right? And so that seems to me to be...

Aaron Levie: One is, I think you're gonna see, again, a premium on companies that can document this. Mm-hmm. There'll be a huge premium on that. Because, you know, can you shorten that three-month ramp cycle to a two-week ramp cycle? That's an instant productivity gain. Can you dramatically reduce rework in the organization because you've documented where all the stuff is and where the answers are? Can you make your average employee as good as your 90th-percentile employee, because you've captured the knowledge that's sort of in the heads of those top employees and made that available? So you can see some very clear productivity benefits. Mm-hmm.
If you had a company culture of making sure, you know, your information was captured, digitized, put in a format that was agent-ready, and then made available to agents to work with... and then you just, again, have this reality that at a 10,000-person [00:45:00] company, mapping that to the, you know, access structure of the company is just a hard problem. It's like: not every piece of information that's digitized can be shared with everybody. And so now you have to organize that in a way that actually works. There was a pretty good piece, this piece called Your Company Is a File System. Did you see that one?

swyx: Nope.

Aaron Levie: Uh, yes, you saw it. Yeah. And I'd actually be curious about your thoughts on it. Um, like, an interesting kind of... we agree with it, because that's how we see the world.

swyx: Okay. We have it up on screen.

Aaron Levie: Oh, okay. Yeah. But it's all about, basically, like, you know, we already organize in this kind of, you know, permission-structure way. And these are the kind of natural ways that agents can now work with data. So it's kind of this interesting metaphor, but I do think companies will have to start to think about how they digitize more of that data. What was your take?

Jeff Huber: Yeah, I mean, the company's probably, like, an ACID-compliant file system.

Aaron Levie: Uh...

Jeff Huber: Yeah. Which I'm guessing Box is, right? So, yeah. Yes.

swyx: Yeah. [00:46:00]

Jeff Huber: Which you have a great piece on, but...

swyx: Uh, yeah. Well, my direction is a little bit... I wanna rewind a little bit to the graph word you said there. That's a magic trigger word for us. I always ask: what's your take on knowledge graphs? Yeah. Uh, 'cause especially with every database person, I just wanna see what they think.
There's been knowledge graph hype cycles, and you've seen it all. So...

Aaron Levie: Hmm. I actually am not the expert in knowledge graphs, so you might need to...

swyx: You don't need to be an expert. Yeah. I think it's just, like, well, how seriously do people take it? Like, is there a lot of potential in the whole idea?

Aaron Levie: Uh, well, can I understand first: is this a loaded question, in the sense of, are you super pro, super con, super anti, medium?

swyx: I see pros and cons. Okay. Uh, but I think your opinion should be independent of mine.

Aaron Levie: Yeah. No, no, totally. Yeah. I just want to see what I'm stepping into.

swyx: No, I know. It's a huge trigger word for a lot of people out, yeah, in our audience. And they're trying to figure out: why is that? Why is this such a hot item for them? Because a lot of people get graph religion. And they're like, everything's a graph. Of course you have to represent it as a graph. Well, [00:47:00] how do you solve your knowledge changing over time? Well, it's a graph.

Aaron Levie: Yeah.

swyx: And I think there's that line of work, and then there's a lot of people who are like, well, you don't need it. And both are right.

Aaron Levie: Yeah. And the people who say you don't need it, what are they arguing for?

swyx: Markdown files. Oh, sure, sure. Simplicity.

Aaron Levie: Yeah.

swyx: Versus... it's structure versus less structure. Right. That's all it is.

Aaron Levie: I think the tricky thing is, again, when this gets met with real humans: they're just going to their computer. They're just working with some people on Slack or Teams. They're just sharing some data through a collaborative file system in Google Docs or Box or whatever.
I certainly like the vision of most knowledge graph, you know, kind of futuristic ways of thinking about it. It's just, like, you know, it's 2026; we haven't seen it play out yet. I mean, I remember... actually, I don't even know how old you guys are, but to show my age: I remember 17 years ago, everybody thought enterprises would just run on [00:48:00] wikis. Yeah. And, uh, Confluence... and not even... I mean, Confluence actually took off for engineering, for sure. Like, unquestionably. But this was like: everything would be in the wiki. And I think, based on our general style of what we were building, we were just like, I don't know, people just, like, want a workspace. They're gonna collaborate with other people.

swyx: Exactly. Yeah. So you were anti-knowledge graph.

Aaron Levie: Not anti, not anti. So...

swyx: Not non...

Aaron Levie: I'm not anti. 'Cause I think your search system... I just think these are two systems that probably... but, like, I'm not in any religious war. I don't want to be in anybody's YouTube comments on this. There's not a fight for me.

swyx: We love YouTube comments. We get into the comments.

Aaron Levie: Okay. Uh, but it's mostly just a virtue of what we built. Yeah. And we just continued down that path. Yeah.

swyx: Yeah.

Aaron Levie: And, um, that was what we pursued. But this is not a, you know, kind of... this is not a, uh...

swyx: It's not existential for you. Great.

Aaron Levie: We're happy to plug into somebody else's graph. We're happy to feed data into it. We're happy for [00:49:00] agents to talk to multiple systems. Not our fight.

swyx: Yeah.

Aaron Levie: But I need your answer. Yeah.
Graphs are nerd snipes. A very effective nerd snipe.

swyx: See, this is one opinion, and then I've...

Jeff Huber: And I think that the actual graph structure is emergent in the mind of the agent, ah, in the same way it is in the mind of the human. And that's a more powerful graph, 'cause it actually evolved over time.

swyx: So don't tell me how to graph; I'll figure it out myself. Exactly. Okay. All right. And...

Jeff Huber: What's yours?

swyx: I like the, the wiki approach. Uh, my, I'm actually

The OMFIF Podcast
Unpacking sandboxes

The OMFIF Podcast

Play Episode Listen Later Feb 20, 2026 38:38


As the new generation of financial infrastructure takes shape, the need for sophisticated tools to inform policy-makers is growing. To encourage innovation and preserve security, it is vital that policy-makers have the capacity to interact with and closely supervise new systems. Sandboxes are a key tool for this process. Carmelle Cadet, chief executive officer and founder of EmTech, joins OMFIF to discuss the nuances of how sandboxes work and how they should be constructed.

GREY Journal Daily News Podcast
Can Regulatory Sandboxes Propel AI Innovation Without Stifling Creativity?

GREY Journal Daily News Podcast

Play Episode Listen Later Feb 16, 2026 2:36


Regulatory sandboxes allow startups to experiment with AI technologies in a controlled setting, fostering innovation while maintaining oversight. Joseph Joshy from IFSCA highlights their importance at the India AI Impact Summit 2026, noting they help balance innovation with risk management. Industry leaders discuss the need for India-specific AI protocols and emphasize the role of digital infrastructure in AI advancement. Learn more about this story at: https://greyjournal.net/news/ Hosted on Acast. See acast.com/privacy for more information.

TRM Talks
EP. 104 | Pioneering Payments: Turning Sandboxes into Infrastructure with UOB's Kah Kit

TRM Talks

Play Episode Listen Later Feb 11, 2026 31:57


Kah Kit Yip has spent his career turning bold ideas about the future of money into working infrastructure. After nearly two decades at Malaysia's central bank and leading breakthrough initiatives at the Bank for International Settlements, he helped pioneer new models for instant cross-border payments, programmable compliance, and faster, safer FX settlement. Now, as Head of Blockchain and Digital Assets at UOB, he's bringing those concepts out of policy sandboxes and into production across Southeast Asia. In this conversation, Ari and Kah Kit explore how interoperable payment rails, wholesale CBDCs, and tokenized assets can move value across borders in real time, while meeting the highest standards for compliance and risk management. They discuss why legal clarity and shared standards are essential for adoption, how institutions can match the speed of digital payments with equally fast safeguards, and what it takes for regulators and industry to build global financial infrastructure together. Along the way, Kah Kit reflects on marathon running, crime novels, and the grit required to solve complex, long-horizon challenges in finance and technology.

The John Batchelor Show
S8 Ep235: REGULATING ARTIFICIAL INTELLIGENCE Colleague Kevin Frazier. Kevin Frazier continues, warning against a "waterfall of regulation" by states and advocating for "regulatory sandboxes" to allow experimentation. NUMBER 8

The John Batchelor Show

Play Episode Listen Later Dec 24, 2025 8:50


REGULATING ARTIFICIAL INTELLIGENCE Colleague Kevin Frazier. Kevin Frazier continues, warning against a "waterfall of regulation" by states and advocating for "regulatory sandboxes" to allow experimentation. NUMBER 8 NOVEMBER 1955

Slow Burn
Decoder Ring | Mailbag: Yo-Yos, Sandboxes, and Encores

Slow Burn

Play Episode Listen Later Dec 17, 2025 62:11


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters “Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews. You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Hosted on Acast. See acast.com/privacy for more information.

spotify acast pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Slow Burn
Decoder Ring | Mailbag: Yo-Yos, Sandboxes, and Encores

Slow Burn

Play Episode Listen Later Dec 17, 2025 66:41


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews.  You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Learn more about your ad choices. Visit megaphone.fm/adchoices

spotify pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Decoder Ring
Mailbag: Yo-Yos, Sandboxes, and Encores

Decoder Ring

Play Episode Listen Later Dec 17, 2025 66:41


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews.  You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Learn more about your ad choices. Visit megaphone.fm/adchoices

spotify pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Decoder Ring
Mailbag: Yo-Yos, Sandboxes, and Encores

Decoder Ring

Play Episode Listen Later Dec 17, 2025 62:11


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters “Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews. You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Hosted on Acast. See acast.com/privacy for more information.

spotify acast pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Slate Culture
Decoder Ring | Mailbag: Yo-Yos, Sandboxes, and Encores

Slate Culture

Play Episode Listen Later Dec 17, 2025 66:41


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews.  You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Learn more about your ad choices. Visit megaphone.fm/adchoices

spotify pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Slate Culture
Mailbag: Yo-Yos, Sandboxes, and Encores

Slate Culture

Play Episode Listen Later Dec 17, 2025 62:11


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters “Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews. You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Hosted on Acast. See acast.com/privacy for more information.

spotify acast pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Slate Daily Feed
Mailbag: Yo-Yos, Sandboxes, and Encores

Slate Daily Feed

Play Episode Listen Later Dec 17, 2025 62:11


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters “Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews. You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Hosted on Acast. See acast.com/privacy for more information.

spotify acast pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Slate Daily Feed
Decoder Ring | Mailbag: Yo-Yos, Sandboxes, and Encores

Slate Daily Feed

Play Episode Listen Later Dec 17, 2025 66:41


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews.  You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Learn more about your ad choices. Visit megaphone.fm/adchoices

spotify pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
AI in Action Ireland
E222 'Building Trust and Innovation in AI' with ADAPT Centre's Declan McKibben

AI in Action Ireland

Play Episode Listen Later Nov 20, 2025 30:49


Today's guest is Declan McKibben, Executive Director at the ADAPT Centre. Organisations worldwide are grappling with the realities and opportunities posed by AI-enabled tools and technologies. In this podcast, Declan offers invaluable insights into how AI is reshaping industries, the challenges associated with its adoption and the collaborative efforts needed to harness its full potential.

Topics include:
0:00 An overview of ADAPT as Ireland's human-centered AI research centre
2:44 Their work with Smart D8 and Digital Twin, which uses AI to model and improve cities
6:45 Creating the Generative AI lab to guide Dublin City Council's safe adoption
9:19 Running workshops to build AI literacy and inclusive decision-making
11:40 How the AI lab prioritizes use cases with responsible, agile testing
14:14 Award-winning AI projects showcased for public sector transformation
15:37 How trust and measurement enable responsible, scalable AI adoption
18:23 Sandboxes enable practical AI exploration amid investment uncertainty
22:35 How their AI Accountability Lab ensures responsible, fair, socially beneficial AI
28:00 ADAPT Centre as a sponsor for the 2025 AI Awards

DEEPTECH DEEPTALK
Ökosysteme

DEEPTECH DEEPTALK

Play Episode Listen Later Nov 5, 2025 32:34


After a short autumn break, Alois and Oliver talk about how real deep-tech ecosystems emerge at the city level, beyond the buzzwords. They pin down what deep tech actually means, where the line to commodity technologies runs, and why the term is currently used so inflationarily. Starting from Germany's new High-Tech Agenda (with focus areas such as AI, quantum computing and biotechnology), they discuss how national programmes shape local clusters and why places with a strong research and infrastructure base, such as DESY and the Science City in Hamburg, serve as anchors. The episode shows how convergence between technologies such as AI, quantum and neuromorphic computing, robotics and XR is picking up speed, and what role "universal interfaces" such as large language models play. At the same time, the two make clear that basic research needs protected domains, while living labs and sandboxes build the bridge into application. Concrete examples show why short distances, stable framework conditions, reliable test sites and clearly profiled city strengths make the difference between theory and transfer. A central motif of the episode is the "translator": people and formats that bring research, industry and, often as an anchor customer, the state together in a targeted way. Instead of an inflation of events, what is needed are curated spaces in which topics just short of market readiness are accelerated in a focused way. This is exactly where the Deep Tech Campus Circle comes in: a deliberately exclusive place where minds, capital and concrete use cases meet to turn seedlings into scalable innovations. The episode closes with a clear thesis: deep tech emerges when deep expertise, patience and a cleverly orchestrated ecosystem work together.
Anyone who wants to understand how cities get from strategy paper to visible value creation will find a compact, practical map in this episode, from the seed in basic research to the harvest in the market.

Radar Contact
Season 5 - 024 - Antoine Martin, French DGAC - Drone Sandboxes

Radar Contact

Play Episode Listen Later Oct 7, 2025 23:28


Drone sandboxes are a way to bridge the gap between regulation and reality, allowing for testing in safe ways. On this episode, Vincent Lambercy talks with Antoine Martin. Antoine serves as "Advanced ATM Officer" with the French Directorate General for Civil Aviation (DGAC) and was previously with the French ANSP, DSNA. We discuss the status of current drone regulations in Europe, how the French CAA facilitates testing for drone operations, Beyond Visual Line of Sight (BVLOS) flights, and more. Why are sandboxes useful, and even needed, and how will they evolve in the future? How big is the gap between regulation and standards, and what is missing for routine BVLOS flights? We discuss this and more, and Antoine's question is to Florian Guillermet.

Open Source Startup Podcast
E176: Why All AI Agents Will Need Cloud Sandboxes

Open Source Startup Podcast

Play Episode Listen Later Jun 23, 2025 37:19


Vasek Mlejnsky is Co-Founder & CEO of E2B, the open-source runtime for executing AI-generated code in secure cloud sandboxes. Essentially, they give AI agents cloud computers. Their open source repos, particularly e2b which has 9K GitHub stars, have been widely adopted to help securely run AI-generated code. E2B has raised $12M from investors including Decibel and Sunflower. In this episode, we dig into:
- Why agents need a sandbox
- Building a new category of infra tooling, much like LaunchDarkly
- Some of their viral content moments, including Greg Brockman sharing their videos
- Figuring out the right commercial offering
- Why they don't agree with pricing per token
- Why moving from Prague to the Bay Area felt essential for them as founders
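The core idea, handing an agent an isolated computer that executes untrusted, model-generated code, can be sketched in miniature with just the Python standard library. This is a hypothetical toy, not E2B's actual API: a real cloud sandbox isolates at the VM level, while this sketch only gives each snippet a throwaway subprocess, a scratch directory, and a timeout.

```python
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Run model-generated Python in a throwaway subprocess and return stdout.

    Toy stand-in for a cloud sandbox: -I runs the interpreter in isolated
    mode, the temp directory gives the code a disposable workspace, and the
    timeout kills runaway snippets. Real sandboxes also isolate the
    filesystem, network, and memory, which this does not.
    """
    with tempfile.TemporaryDirectory() as workdir:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
            cwd=workdir,  # scratch dir, deleted when the block exits
        )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout

# "AI-generated" code the agent wants to execute:
print(run_untrusted("print(sum(range(10)))"), end="")
```

The design point the episode circles around is exactly the gap this toy leaves open: once agents generate arbitrary code, subprocess-level separation stops being enough and VM-backed sandboxes become their own infrastructure category.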

Irish Tech News Audio Articles
ISPCC announces global project to prevent online child sexual exploitation and abuse

Irish Tech News Audio Articles

Play Episode Listen Later Jun 10, 2025 4:26


The project, spearheaded by Greek non-profit child welfare organisation The Smile of the Child, will be co-created by children and young people to ensure their voices are heard. The ISPCC is honoured to announce its participation in a worldwide project designed to transform how we prevent and respond to online child sexual exploitation and abuse. Safe Online, a global fund dedicated to eradicating online child sexual exploitation and abuse, is funding the project called "Sandboxing and Standardising Child Online Redress". The COR Sandbox project will establish a first-of-its-kind mechanism to advance child online safety through collaboration across sectors, borders and generations. The project is led by The Smile of the Child, Greece's premier child welfare organisation, and ISPCC is a partner alongside The Young and Resilient Research Centre at Western Sydney University, Child Helpline International and the Centre for Digital Policy at University College Dublin. Sandboxes bring together industry, regulators and customers in a safe space to test innovative products and services without incurring regulatory sanctions; they are mainly used in the finance sector to test new services. The EU is increasingly encouraging the use of sandboxes in the field of high technology and artificial intelligence. Through the participation of youth, platforms, regulators and online safety experts, this first regulatory sandbox for child digital wellbeing will provide for consistent, systemic care and redress for children from online harm, based on their rights under the United Nations Convention on the Rights of the Child (UNCRC). Getting reporting and redress right means that we can keep track of harms and identify systemic risk. Co-designing the reporting and redress process with young people as equitable participants can help us understand what they expect from the reporting process and what remedies are fair for them, putting Article 12 of the UNCRC into action. 
The project also benefits from the guidance of renowned digital safety experts, including Project Lead and Scientific Coordinator Ioanna Noula, PhD, an international expert on tech policy and children's rights; pioneering online safety and youth rights advocate Anne Collier; youth rights and participation expert Amanda Third, PhD, of the Young and Resilient Research Centre; international innovation management consultant Nicky Hickman; IT innovation and startup founder Jez Goldstone; and leading child online wellbeing scholar Tijana Milosevic, PhD. ISPCC Head of Policy and Public Affairs Fiona Jennings said: "This project is a wonderful example of what we can achieve when we collaborate and listen to children and young people. Having robust online reporting mechanisms in place is a key policy objective for ISPCC and this project will go a long way towards making the online world safer for children and young people to participate in." Project lead Ioanna Noula said: "ISPCC's contribution to a project, which seeks to build coherence around the issue of online redress, will be a catalyst for real and substantial change in the area of online reporting. Helplines play a key role in flagging illegal and/or harmful content. As the experts in listening and responding to children, ISPCC can provide insight from an Irish context to help spearhead the implementation of the Digital Services Act and the wellbeing of children online." See more stories here. More about Irish Tech News Irish Tech News are Ireland's No. 1 Online Tech Publication and often Ireland's No.1 Tech Podcast too. You can find hundreds of fantastic previous episodes and subscribe using whatever platform you like via our Anchor.fm page here: https://anchor.fm/irish-tech-news If you'd like to be featured in an upcoming Podcast email us at Simon@IrishTechNews.ie now to discuss. Irish Tech News have a range of services available to help promote your business. 
Why not drop us a line at Info@IrishTechNews.ie now to ...

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Vasek Mlejnsky from E2B joins us today to talk about sandboxes for AI agents. In the last 2 years, E2B has grown from a handful of developers building on it to being used by ~50% of the Fortune 500 and generating millions of sandboxes each week for their customers. As the “death of chat completions” approaches, LLM workflows and agents are relying more and more on tool usage and multi-modality.

The most common use cases for their sandboxes:
- Run data analysis and charting (like Perplexity)
- Execute arbitrary code generated by the model (like Manus does)
- Running evals on code generation (see LMArena Web)
- Doing reinforcement learning for code capabilities (like HuggingFace)

Full Video Episode

Timestamps
00:00:00 Introductions
00:00:37 Origin of DevBook -> E2B
00:02:35 Early Experiments with GPT-3.5 and Building AI Agents
00:05:19 Building an Agent Cloud
00:07:27 Challenges of Building with Early LLMs
00:10:35 E2B Use Cases
00:13:52 E2B Growth vs Models Capabilities
00:15:03 The LLM Operating System (LLMOS) Landscape
00:20:12 Breakdown of JavaScript vs Python Usage on E2B
00:21:50 AI VMs vs Traditional Cloud
00:26:28 Technical Specifications of E2B Sandboxes
00:29:43 Usage-based billing infrastructure
00:34:08 Pricing AI on Value Delivered vs Token Usage
00:36:24 Forking, Checkpoints, and Parallel Execution in Sandboxes
00:39:18 Future Plans for Toolkit and Higher-Level Agent Frameworks
00:42:35 Limitations of Chat-Based Interfaces and the Future of Agents
00:44:00 MCPs and Remote Agent Capabilities
00:49:22 LLMs.txt, scrapers, and bad AI bots
00:53:00 Manus and Computer Use on E2B
00:55:03 E2B for RL with Hugging Face
00:56:58 E2B for Agent Evaluation on LMArena
00:58:12 Long-Term Vision: E2B as Full Lifecycle Infrastructure for LLMs
01:00:45 Future Plans for Hosting and Deployment of LLM-Generated Apps
01:01:15 Why E2B Moved to San Francisco
01:05:49 Open Roles and Hiring Plans at E2B

Get full access to Latent.Space at www.latent.space/subscribe
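One of the listed use cases, running evals on code generation, boils down to executing a candidate solution together with its test assertions in a fresh environment and recording pass or fail. A minimal local sketch (the helper name is made up for illustration; production setups run each case in an isolated cloud VM rather than a bare subprocess):

```python
import subprocess
import sys

def eval_candidate(candidate: str, tests: str, timeout: float = 5.0) -> bool:
    """Return True if `candidate` passes `tests` in a fresh interpreter.

    Sketch of a code-generation eval harness: each run gets its own
    subprocess so one candidate cannot leak state into the next. Real
    isolation (network, filesystem, memory) needs a VM-backed sandbox.
    """
    program = candidate + "\n" + tests
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", program],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # runaway candidates count as failures
    return proc.returncode == 0

candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(eval_candidate(candidate, tests))
```

The same harness shape covers the reinforcement-learning use case as well: the boolean (or a graded score from the test run) becomes the reward signal for the code-generating model.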

Cribl: The Stream Life
Introducing Lakehouse!

Cribl: The Stream Life

Play Episode Listen Later Feb 26, 2025 24:24


In this episode of The Stream Life Podcast, Joel Vincent joins the show to talk about Cribl's latest innovation: a Lakehouse that's purpose-built for telemetry data.

Resources
Read the blog

If you want to automatically get every episode of the Stream Life podcast, you can subscribe on your favorite podcast app. Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl's suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs. We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Bill Handel on Demand
Google Distances Themselves from DEI | California EV Sales

Bill Handel on Demand

Play Episode Listen Later Feb 6, 2025 23:29 Transcription Available


KFI White House correspondent Jon Decker speaks on the latest out of the Oval Office. Google joins the list of companies distancing themselves from DEI. California EV sales flat in 2024. Can Newsom's mandate be met? Researchers tested sandboxes and street dust for lead after Eaton Fire… Here's what they found.

Corporate Escapees
590 - Visual Maps and Sandboxes: Making Tech Solutions Tangible for Clients

Corporate Escapees

Play Episode Listen Later Feb 6, 2025 6:08


Have you ever left a sales meeting feeling uncertain whether your potential client will buy? If you've experienced clients nodding along but still feeling overwhelmed, you're not alone. In this solo episode, I share two powerful actions you can take to simplify the buying process for your clients. First, we'll explore the importance of using a visual industry map that highlights their journey and pain points. Then, I'll discuss how building an industry-specific sandbox allows clients to interact with your solution tangibly. These strategies have proven effective for tech consultants and partners across various industries, and I'm excited to help you implement them in your own sales process.
Resources and Links
Previous episode: 589 - AI 1st business with John Munsell
Check out more episodes of The Paul Higgins Show
Subscribe to our YouTube channel: @PaulHigginsMentoring
The Tech Consultant's Roadmap
Join our newsletter
Join the Tech Collective
Suggested resources
Find out more about Paul and how he can help you

Taking 20 Podcast
Ep 246 - Railroads vs Sandboxes 2

Taking 20 Podcast

Play Episode Listen Later Jan 26, 2025 20:07


There are railroads and there are sandboxes and never the twain shall meet, right?  Not so fast, my friend.  Today Jeremy discusses the two game types and multiple ways to blend them together.   #pf2e #Pathfinder #gmtips #dmtips #dnd #rpg #railroad #sandbox Resources: Game Masters of Exandria Roundtable:  https://www.youtube.com/watch?v=LmZSWKPXhZ4 Sandbox vs Railroad:  Which is Better?  https://www.thearcanelibrary.com/blogs/news/sandbox-vs-railroad-which-is-better?srsltid=AfmBOopkIlstCFvZ_Z2HSQu1-oxxkmLXKKzgH5jLgjzjQuX0m5_INiLA

Some Derps Talk About Games
Sandboxes Vs Railroads (TTRPG)

Some Derps Talk About Games

Play Episode Listen Later Jan 21, 2025 113:25


Imagine you're in a facebook group chat with a bunch of people and someone asks what constitutes a sandbox in a TTRPG. How much does improv vs. preparation factor into it? Can the players go off and do absolutely anything they want? What happens when you run a complete dungeon crawl? Let's talk about it! ---------------------------------------------------------------- Want to watch these episodes live? Check us out at https://www.youtube.com/@somederpsplaygames or twitch.tv/somederpsplaygames Check out the podcast on Soundcloud: https://soundcloud.com/somederpstalkaboutgames Want to tell us something? Email us at podcast@somederpsplaygames.com Like our Facebook page too! www.facebook.com/SomeDerpsPlayGames/ We have a Patreon! https://www.patreon.com/somederpsplaygames Rate us on iTunes! https://itunes.apple.com/us/podcast/some-derps-talk-about-games/id1048899720 Follow us on Twitter! SDPG: twitter.com/somederps Buddy: twitter.com/thatbuddysola Mango: twitter.com/theonetruemango Intro and Outro courtesy of twitter.com/VinceRolin

Cribl: The Stream Life
Microsoft Azure + Cribl - Better together

Cribl: The Stream Life

Play Episode Listen Later Nov 19, 2024 16:00


In this episode of The Stream Life Podcast, Kam Amir and Aldo Dossola join the show to discuss Microsoft Ignite, Cribl's solutions for Microsoft Azure customers, and much more. Resources Microsoft Azure + Cribl: Better together

Cribl: The Stream Life
A Two Way Door

Cribl: The Stream Life

Play Episode Listen Later Sep 26, 2024 24:50


In this episode of The Stream Life Podcast, Nick Heudecker joins the show to discuss Cribl's recent Series E round, how customers find value in our products, why they need a Data Engine for IT and Security, and how our products integrate seamlessly—without ever locking data in. Resources Cribl Closes $319M Series E Round at a $3.5B Valuation to Revolutionize Enterprise Data Management How to Avoid Vendor Lock-In Cribl Lake

Epic Adventure
On Modules, Sandboxes, and Western Marches

Epic Adventure

Play Episode Listen Later Sep 18, 2024 46:30


I remember opening up “Keep on the Borderlands,” the first module I ever ran. It came in the D&D Basic Set and I thought it was very cool. The players started out in a keep, a perfect base of operations. It had a tavern, a blacksmith, a provisioner, and a chapel. Everything a growing adventuring party needs. From there the players would travel to the Caves of Chaos and explore underground lairs of beasts and monsters. You know, the dungeons of Dungeons and Dragons.
To be fair, I don't really remember much about those first games I ran. I remember the feeling of running them and how much fun I had, but I don't remember the details.
As a game master I quickly moved away from the modules and started doing my own thing. I would read modules, but then always felt like I had a better idea. So, I tended to do my own thing.
Fast forward several years and I was playing more GURPS, Steve Jackson's Generic Universal Roleplaying System. That system was well known for its source books that help you world build, but had few written modules. That was perfect for me.
After decades of making up my own adventures I decided that I wanted to run some of those talked-about modules and, being a huge fan of Traveller, decided our group's next game would be Pirates of Drinax (listen to our series The Anatomy of a Campaign for details on that one).
I hated it.
Ok, I didn't hate the idea or setting, but as a game master I hated being tied to the books, trying to make incidents and events happen that I was unfamiliar with, no matter how much reading I did. And of course, with every player question I found myself diving into the books and spending far too much time trying to find the answers.
I should have learned my lesson. But I didn't. After Pirates I decided to take on an even bigger challenge with Masks of Nyarlathotep. And 30 episodes in, I am having a miserable time (please don't tell my players that).
Personally, I like to make it up as I go. But that's just my opinion. 
Some people love pre-written modules. Others prefer sandbox worlds and settings, and still others enjoy the Western Marches style…oh, is that a new one for you? Don't worry, we will cover that. In this episode Mike, Christina and I are going to talk about the types of campaigns. What's good, what's bad, and why?

Computer Corner (5DRadio.com)
Computer Corner August 17, 2024

Computer Corner (5DRadio.com)

Play Episode Listen Later Aug 17, 2024


Hosts: Gene Mitchell
Air date: August 17, 2024
Topic: VPNs & Sandboxes
Special Guest(s):

Critical Thinking - Bug Bounty Podcast
Episode 84: 0xLupin & Takeaways from Google's Las Vegas BugSwat

Critical Thinking - Bug Bounty Podcast

Play Episode Listen Later Aug 15, 2024 27:15


Episode 84: In this episode of Critical Thinking - Bug Bounty Podcast, Justin is joined by Roni Carta (@0xLupin) to discuss their MVH win at the recent Google LHE, and share some technical observations they had with the target and the event.
Follow us on twitter at: @ctbbpodcast
We're new to this podcasting thing, so feel free to send us any feedback here: info@criticalthinkingpodcast.io
Shoutout to YTCracker for the awesome intro music!
------ Links ------
Find the Hackernotes: https://blog.criticalthinkingpodcast.io/
Follow your hosts Rhynorater & Teknogeek on twitter:
------ Ways to Support CTBBPodcast ------
Hop on the CTBB Discord at https://ctbb.show/discord!
We also do Discord subs at $25, $10, and $5 - premium subscribers get access to private masterclasses, exploits, tools, scripts, un-redacted bug reports, etc.
Today's Guest: https://x.com/0xLupin
Today's Sponsor - ThreatLocker
Timestamps:
(00:00:00) Introduction
(00:02:12) MVH Debrief
(00:09:05) Sandboxes and Comfort Zones
(00:13:24) SDKs and Legal Compliance
(00:19:29) Age of Target and Platform-Exclusive Hunters

Cribl: The Stream Life
Navigating the Data Current

Cribl: The Stream Life

Play Episode Listen Later Jul 31, 2024 31:43


In this episode of The Stream Life Podcast, Nick Heudecker joins the show to talk about Cribl's new research report, Navigating the Data Current 2024: Exploring Cribl.Cloud Analytics and Customer Insights. Resources Download the report Read Nick's blog

Cribl: The Stream Life
Cribl University Update

Cribl: The Stream Life

Play Episode Listen Later Jun 28, 2024 21:01


In this episode of The Stream Life Podcast, Bradley chats with Cribl's Dariann Kobe about the second birthday of Cribl University and the new Cribl Certified Admin course. Resources Learn more Cribl University

Cribl: The Stream Life
CriblCon 2024 Recap

Cribl: The Stream Life

Play Episode Listen Later Jun 21, 2024 25:42


In this episode of The Stream Life Podcast, Bradley chats with Cribl's Mike Dupuis about everything announced at CriblCon 2024! Resources Cribl Copilot: Your Trusted AI Wingman for Deploying, Configuring & Troubleshooting Cribl Accelerates Data Management Productivity with AI-Powered Copilot CriblCon 2024 Recap Blog Session Recap Watch the sessions on YouTube

Cribl: The Stream Life
Exploring Cribl Copilot!

Cribl: The Stream Life

Play Episode Listen Later Jun 11, 2024 17:02


In this episode of The Stream Life Podcast, Bradley Chambers chats with Nikhil Mungel about Cribl Copilot. Cribl Copilot turbocharges efficiency and bridges the skills gap, ushering in the next generation of AI-augmented workforce empowerment for IT and Security. Resources Cribl Copilot: Your Trusted AI Wingman for Deploying, Configuring & Troubleshooting Cribl Accelerates Data Management Productivity with AI-Powered Copilot

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt)

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Jun 10, 2024 269:19


Our second wave of speakers for AI Engineer World's Fair was announced! The conference sold out of Platinum/Gold/Silver sponsors and Early Bird tickets! See our Microsoft episode for more info and buy now with code LATENTSPACE. This episode is straightforwardly a part 2 to our ICLR 2024 Part 1 episode, so without further ado, we'll just get right on with it!
Timestamps
[00:03:43] Section A: Code Edits and Sandboxes, OpenDevin, and Academia vs Industry — ft. Graham Neubig and Aman Sanger
* [00:07:44] WebArena
* [00:18:45] Sotopia
* [00:24:00] Performance Improving Code Edits
* [00:29:39] OpenDevin
* [00:47:40] Industry and Academia
[01:05:29] Section B: Benchmarks
* [01:05:52] SWEBench
* [01:17:05] SWEBench/SWEAgent Interview
* [01:27:40] Dataset Contamination Detection
* [01:39:20] GAIA Benchmark
* [01:49:18] Moritz Hardt - Science of Benchmarks
[02:36:32] Section C: Reasoning and Post-Training
* [02:37:41] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
* [02:51:00] Let's Verify Step By Step
* [02:57:04] Noam Brown
* [03:07:43] Lilian Weng - Towards Safe AGI
* [03:36:56] A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
* [03:48:43] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
[04:00:51] Bonus: Notable Related Papers on LLM Capabilities
Section A: Code Edits and Sandboxes, OpenDevin, and Academia vs Industry — ft. 
Graham Neubig and Aman Sanger
* Guests
* Graham Neubig
* Aman Sanger - Previous guest and NeurIPS friend of the pod!
* WebArena
* Sotopia (spotlight paper, website)
* Learning Performance-Improving Code Edits
* OpenDevin
* Junyang Opendevin
* Morph Labs, Jesse Han
* SWE-Bench
* SWE-Agent
* Aman tweet on swebench
* LiteLLM
* Livecodebench
* the role of code in reasoning
* Language Models of Code are Few-Shot Commonsense Learners
* Industry vs academia
* the matryoshka embeddings incident
* other directions
* Unlimiformer
Section A timestamps
* [00:00:00] Introduction to Guests and the Impromptu Nature of the Podcast
* [00:00:45] Graham's Experience in Japan and Transition into Teaching NLP
* [00:01:25] Discussion on What Constitutes a Good Experience for Students in NLP Courses
* [00:02:22] The Relevance and Teaching of Older NLP Techniques Like Ngram Language Models
* [00:03:38] Speculative Decoding and the Comeback of Ngram Models
* [00:04:16] Introduction to WebArena and Sotopia Projects
* [00:05:19] Deep Dive into the WebArena Project and Benchmarking
* [00:08:17] Performance Improvements in WebArena Using GPT-4
* [00:09:39] Human Performance on WebArena Tasks and Challenges in Evaluation
* [00:11:04] Follow-up Work from WebArena and Focus on Web Browsing as a Benchmark
* [00:12:11] Direct Interaction vs. 
Using APIs in Web-Based Tasks
* [00:13:29] Challenges in Base Models for WebArena and the Potential of Visual Models
* [00:15:33] Introduction to Sotopia and Exploring Social Interactions with Language Models
* [00:16:29] Different Types of Social Situations Modeled in Sotopia
* [00:17:34] Evaluation of Language Models in Social Simulations
* [00:20:41] Introduction to Performance-Improving Code Edits Project
* [00:26:28] Discussion on DevIn and the Future of Coding Agents
* [00:32:01] Planning in Coding Agents and the Development of OpenDevin
* [00:38:34] The Changing Role of Academia in the Context of Large Language Models
* [00:44:44] The Changing Nature of Industry and Academia Collaboration
* [00:54:07] Update on NLP Course Syllabus and Teaching about Large Language Models
* [01:00:40] Call to Action: Contributions to OpenDevin and Open Source AI Projects
* [01:01:56] Hiring at Cursor for Roles in Code Generation and Assistive Coding
* [01:02:12] Promotion of the AI Engineer Conference
Section B: Benchmarks
* Carlos Jimenez & John Yang (Princeton) et al: SWE-bench: Can Language Models Resolve Real-world Github Issues? (ICLR Oral, Paper, website)
* “We introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. Given a codebase along with a description of an issue to be resolved, a language model is tasked with editing the codebase to address the issue. Resolving issues in SWE-bench frequently requires understanding and coordinating changes across multiple functions, classes, and even files simultaneously, calling for models to interact with execution environments, process extremely long contexts and perform complex reasoning that goes far beyond traditional code generation tasks. Our evaluations show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues. 
The best-performing model, Claude 2, is able to solve a mere 1.96% of the issues. Advances on SWE-bench represent steps towards LMs that are more practical, intelligent, and autonomous.”
* Yonatan Oren et al (Stanford): Proving Test Set Contamination in Black-Box Language Models (ICLR Oral, paper, aman tweet on swebench contamination)
* “We show that it is possible to provide provable guarantees of test set contamination in language models without access to pretraining data or model weights. Our approach leverages the fact that when there is no data contamination, all orderings of an exchangeable benchmark should be equally likely. In contrast, the tendency for language models to memorize example order means that a contaminated language model will find certain canonical orderings to be much more likely than others. Our test flags potential contamination whenever the likelihood of a canonically ordered benchmark dataset is significantly higher than the likelihood after shuffling the examples.
* We demonstrate that our procedure is sensitive enough to reliably prove test set contamination in challenging situations, including models as small as 1.4 billion parameters, on small test sets of only 1000 examples, and datasets that appear only a few times in the pretraining corpus.”
* Outstanding Paper mention: “A simple yet elegant method to test whether a supervised-learning dataset has been included in LLM training.”
* Thomas Scialom (Meta AI-FAIR w/ Yann LeCun): GAIA: A Benchmark for General AI Assistants (paper)
* “We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency.
* GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins. 
* GAIA's philosophy departs from the current trend in AI benchmarks suggesting to target tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system's capability to exhibit similar robustness as the average human does on such questions. Using GAIA's methodology, we devise 466 questions and their answer.”
* Moritz Hardt (Max Planck Institute): The emerging science of benchmarks (ICLR stream)
* “Benchmarks are the keystone that hold the machine learning community together. Growing as a research paradigm since the 1980s, there's much we've done with them, but little we know about them. In this talk, I will trace the rudiments of an emerging science of benchmarks through selected empirical and theoretical observations. Specifically, we'll discuss the role of annotator errors, external validity of model rankings, and the promise of multi-task benchmarks. The results in each case challenge conventional wisdom and underscore the benefits of developing a science of benchmarks.”
Section C: Reasoning and Post-Training
* Akari Asai (UW) et al: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (ICLR oral, website)
* (Bad RAG implementations) indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation.
* We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection.
* Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its generations using special tokens, called reflection tokens. 
Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements.
* Self-RAG (7B and 13B parameters) outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning, and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models.
* Hunter Lightman (OpenAI): Let's Verify Step By Step (paper)
* “Even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step.
* We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision. 
* To support related research, we also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.”
* Noam Brown - workshop on Generative Models for Decision Making
* Solving Quantitative Reasoning Problems with Language Models (Minerva paper)
* Describes some charts taken directly from the Let's Verify Step By Step paper listed/screenshotted above.
* Lilian Weng (OpenAI) - Towards Safe AGI (ICLR talk)
* OpenAI Model Spec
* OpenAI Instruction Hierarchy: The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Section D: Agent Systems
* Izzeddin Gur (Google DeepMind): A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis (ICLR oral, paper)
* [Agent] performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML.
* We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions.
* WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those.
* We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization.
* We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks; achieving 18.7% higher success rate than the prior method on MiniWoB web automation benchmark, and SoTA performance on Mind2Web, an offline task planning evaluation.
* Sirui Hong (DeepWisdom): MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (ICLR Oral, Paper)
* We introduce 
MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. MetaGPT encodes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, thus allowing agents with human-like domain expertise to verify intermediate results and reduce errors. MetaGPT utilizes an assembly line paradigm to assign diverse roles to various agents, efficiently breaking down complex tasks into subtasks involving many agents working together.
Bonus: Notable Related Papers on LLM Capabilities
This includes a bunch of papers we wanted to feature above but could not.
* Lukas Berglund (Vanderbilt) et al: The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A” (ICLR poster, paper, Github)
* We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form ''A is B'', it will not automatically generalize to the reverse direction ''B is A''. This is the Reversal Curse.
* The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as ''Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]'' and the reverse ''Who is Mary Lee Pfeiffer's son?''. GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter.
* Omar Khattab (Stanford): DSPy: Compiling Declarative Language Model Calls into State-of-the-Art Pipelines (ICLR Spotlight Poster, GitHub)
* presented by Krista Opsahl-Ong
* “Existing LM pipelines are typically implemented using hard-coded “prompt templates”, i.e. lengthy strings discovered via trial and error. 
Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, or imperative computational graphs where LMs are invoked through declarative modules.
* DSPy modules are parameterized, meaning they can learn how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques.
* We design a compiler that will optimize any DSPy pipeline to maximize a given metric, by creating and collecting demonstrations.
* We conduct two case studies, showing that succinct DSPy programs can express and optimize pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops.
* Within minutes of compiling, DSPy can automatically produce pipelines that outperform out-of-the-box few-shot prompting as well as expert-created demonstrations for GPT-3.5 and Llama2-13b-chat. On top of that, DSPy programs compiled for relatively small LMs like 770M parameter T5 and Llama2-13b-chat are competitive with many approaches that rely on large and proprietary LMs like GPT-3.5 and on expert-written prompt chains.”
* MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning
* Scaling Laws for Associative Memories
* DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
* Efficient Streaming Language Models with Attention Sinks
Get full access to Latent Space at www.latent.space/subscribe
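Of the papers in these show notes, the Oren et al. test set contamination check (Section B) is compact enough to sketch in code: compare the model's log-likelihood of the benchmark in its canonical order against random shufflings. The `log_likelihood` argument here is a hypothetical stand-in; the real procedure scores orderings with the language model under audit.

```python
import random

def contamination_p_value(log_likelihood, examples, n_shuffles=1000, seed=0):
    """Permutation test for test-set contamination: if a model memorized the
    benchmark's canonical example order, that ordering scores higher than
    almost every random shuffle, yielding a small p-value."""
    rng = random.Random(seed)
    canonical = log_likelihood(examples)
    at_least_as_likely = 0
    for _ in range(n_shuffles):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        if log_likelihood(shuffled) >= canonical:
            at_least_as_likely += 1
    # Add-one smoothing keeps the p-value strictly positive.
    return (at_least_as_likely + 1) / (n_shuffles + 1)
```

A scorer that ignores order yields p = 1.0 here (ties count via >=), while an order-memorizing scorer drives the p-value toward 1/(n_shuffles + 1), flagging contamination.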

Cribl: The Stream Life
Keep Scalin'

Cribl: The Stream Life

Play Episode Listen Later Jun 10, 2024 29:27


In this episode of The Stream Life Podcast, I chat with Nick Romito about the journey to building support for 50k Cribl Edge nodes in customer deployments. Resources The Journey to 100x-ing Control Plane Scale for Cribl Edge

Cribl: The Stream Life
Customer Choice and Technical Alliances

Cribl: The Stream Life

Play Episode Listen Later May 23, 2024 19:16


In this episode of The Stream Life Podcast, Vlad Melnik joins the show to discuss all the news about Cribl's new Technical Alliance Partner program and why customer choice for data will be the decade's theme in IT and Security. Resources Vlad's Blog Cribl's Press Release

Cribl: The Stream Life
The State of OCSF

Cribl: The Stream Life

Play Episode Listen Later May 23, 2024 20:07


In this episode of The Stream Life Podcast, Nick Heudecker joins the show to talk about his recent LinkedIn article about OCSF (Open Cybersecurity Schema Framework). Resources What is OCSF? Nick's recent LinkedIn article

Cribl: The Stream Life
Join Cribl at RSA Conference 2024!

Cribl: The Stream Life

Play Episode Listen Later Apr 29, 2024 12:50


In this episode of The Stream Life Podcast, I chat with a host of goats: Mary Mikkleson, Jackie McGuire, and Holly Anderson, about all the excitement around RSA Conference! Resources Book a demo with Cribl at RSA Conference Empower Her - Women's Happy Hour at RSA Cribl + Exabeam + Corelight Happy Hour at RSA

Cribl: The Stream Life
Storm Drains and Data Lakes

Cribl: The Stream Life

Play Episode Listen Later Apr 22, 2024 27:21


In this episode of The Stream Life Podcast, Bradley Chambers and Nick Heudecker discuss the state of today's data lakes, what customers need, and Cribl's newest product: Cribl Lake! Resources Learn more about Cribl Lake Introducing Cribl Lake blog The Data Lake Dilemma: Why Businesses Need a New Approach

Cribl: The Stream Life
Introducing Cribl Lake!

Cribl: The Stream Life

Play Episode Listen Later Apr 17, 2024 24:23


In this episode of The Stream Life Podcast, Felicia Dorng and Rick Salsa join the show to discuss Cribl's newest product: Cribl Lake! Resources Learn more about Cribl Lake Introducing Cribl Lake blog The Data Lake Dilemma: Why Businesses Need a New Approach

Cribl: The Stream Life
Engineering at Cribl

Cribl: The Stream Life

Play Episode Listen Later Apr 2, 2024 21:59


In this episode of The Stream Life Podcast, Cribl's first nonfounder employee, Nick Romito, joins the show to talk about engineers at Cribl, how the team has scaled over the years, and much more. It's a fun show, as always! Resources Careers at Cribl Engineer Careers at Cribl

Cribl: The Stream Life
Cribl Announces Partner Award Winners!

Cribl: The Stream Life

Play Episode Listen Later Mar 7, 2024 19:10


In this episode of The Stream Life Podcast, Zac Kilpatrick and Bradley Chambers chat about Cribl's Partner Awards! During our annual company kick off, we were thrilled to announce the Cribl Partner of the Year Award Winners, who are recognized for contributions, loyalty, and mutual commitment to delivering high value to customers within our partner ecosystem. Resources Read the blog to hear all the winners

Cribl: The Stream Life
CriblCon 2024

Cribl: The Stream Life

Play Episode Listen Later Mar 6, 2024 14:52


In this episode of The Stream Life Podcast, Mike Dupuis and I chat about CriblCon 2024, what's on the agenda, and why all IT and security engineers should attend. Resources Register for CriblCon 2024!

Story Paths
Playful Blueprints: Designing Personal and Professional Sandboxes

Story Paths

Play Episode Listen Later Jan 30, 2024 74:08


Learn more about Tamara Strijack's offerings here. Sign up for weekly playshops here: https://storypaths.substack.com/p/acc68ea7-3583-4165-b74a-096dff09b74bThis is the latest conversation in our Play Matters series.I've spoken with a board game designer about the theory of game design, and by extension, play-space design.I've spoken with a man who goes into prisons, who brings creative exercises into those difficult, stifled places, helping people unlock their hearts.I've spoken with a woman who brings play into ceremony and alternative education.I've spoken with an indigenous woman who brings play into her work of decolonization, cultural renewal, and intercultural bridge building.And now, I'm happy to bring you a conversation with my good friend Tamara Strijack, who works in education and child and adolescent support.She is a counselor and educator working on Vancouver Island, near where I'm living now, and she specializes in childhood and adolescent development. In the last 25 years, she's worked as a mentor, counselor, youth leader, program director, and group facilitator. And she's now mainly a parent consultant. She also offers workshops and teaches university courses for teachers and counselors in training.She's a mother of two and the daughter of Gordon Neufeld, and she works in the Neufeld Institute.In this conversation, we get into why play is so vital for human well-being. How it is such a mistake to consider it something that children do just to pass their time. It's an integral part of what we need to be healthy and grow at every stage in our lives, not just childhood. And it can be a space to practice what we might then do within the greater field of our life.Play is vital for these reasons and more.We also speak about how to craft zones of play for children and adults, although for adults, we might not call them play zones, we might call them something more official sounding. 
These can be physical spaces, but perhaps more importantly, they are emotional spaces. As you listen to this, I invite you to consider where in your life you have play spaces and where you might want to create or enhance play spaces. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit storypaths.substack.com/subscribe

Tipping Point New Mexico
559 What are "Regulatory Sandboxes" and Will New Mexico Pass Legislation to Allow Them?

Tipping Point New Mexico

Play Episode Listen Later Nov 22, 2023 40:46


On this week's episode Paul discusses an innovative approach to regulation that comes to us from Utah and has been embraced by red and blue states alike. The idea is the creation of "regulatory sandboxes," which allow live, time-bound testing of innovations under a regulator's oversight. Paul discusses the issue with Rees Empey, Director of State Government Affairs at the Utah-based Libertas Institute, and Brian Knight, Director of the Program on Financial Regulation at the Mercatus Center in the DC area. The "Sandbox" issue has bipartisan support in the New Mexico Legislature, having been introduced by Democrats in the 2023 session.

Kopec Explains Software
#125 What is a Sandbox?

Kopec Explains Software

Play Episode Listen Later Oct 2, 2023 12:12


In software, a sandbox is an isolated environment that limits the resources that a particular application can access. Sandboxes are used to protect the security and privacy of the user. All web apps, and much of the consumer software running on modern operating systems like iOS, Android, macOS, and Windows, run in a sandbox. We also use our general definition of sandbox to discuss their use in software development. A sandboxed, development version of a software product doesn't affect the end users of the production version. Likewise, a sandboxed API doesn't allow a developer to accidentally complete a real-world transaction. Note that we combine the sometimes more specific use of the term sandbox in computer security and sandbox environment in software development to form our own more general definition in this episode. Show Notes Episode 30: Cybersecurity with Duane Dunston Follow us on X @KopecExplains. Theme “Place on Fire” Copyright 2019 Creo, CC BY 4.0 Find out more at http://kopec.live Read transcript
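The episode's definition — an isolated environment that caps what an application can access — can be illustrated with a minimal, Unix-only Python sketch. The `run_sandboxed` helper below is hypothetical (not from the episode) and shows just one facet of sandboxing: using kernel resource limits to confine a child process.

```python
import resource
import subprocess
import sys

def run_sandboxed(code: str, cpu_seconds: int = 2) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child process with a hard CPU-time cap.

    Resource limits are only one facet of a real sandbox; platforms like
    iOS, Android, or the browser also restrict filesystem, network, and
    inter-process access.
    """
    def apply_limits():
        # Runs in the child after fork(), before exec(): the kernel will
        # terminate the process once it exceeds cpu_seconds of CPU time.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
    )

# A well-behaved snippet runs to completion inside the limits...
print(run_sandboxed("print('hello from the sandbox')").stdout.strip())

# ...while a runaway loop is killed when it exhausts its CPU allowance,
# leaving a nonzero return code.
print(run_sandboxed("while True: pass", cpu_seconds=1).returncode != 0)
```

Production sandboxes layer many more restrictions on top of this (seccomp filters, namespaces, filesystem jails), but the principle is the same: the host decides up front what the guest code may consume or touch.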

The CRM Zen Show
CRM ZEN SHOW Episode 269

The CRM Zen Show

Play Episode Listen Later Sep 25, 2023 43:03


This week we talk about: iOS 17, iPadOS 17, and macOS Sonoma PALOOZA Canvas Now Supports Sandboxes Zoho Analytics September'23 Updates Our Implementation, Read, Code Share, and Tip of the Week Read the show notes: https://zenatta.com/episode-269/  

Storyteller Conclave
Storytelling 202: Sandboxes

Storyteller Conclave

Play Episode Listen Later Aug 1, 2023 110:50


The term “sandbox” is often used to describe a world filled with opportunities and bereft of direction. Many storytellers want to offer this to their players but fall into the traps the style presents. This week Sara & Rob examine the obstacles and opportunities presented by Sandbox worlds. Listen to us live, every Wednesday night at 7pm EST, at http://www.mixlr.com/storyteller-conclave. Listen to us on your favorite device! Or Amazon Audible Find us on Twitter (@st_conclave) – Instagram (st_conclave) Support the show by joining our Patreon : https://www.patreon.com/StorytellerConclave Please join us on Discord, to submit questions for the show, chat with Rob, Sara, and other Storytellers, and read over the detailed show-notes for more information and links to stuff we may not have been able to detail during the show! Discord : https://t.co/7H8p1lGYqG Or find our older episodes at Https://Storytellerconclave.com  

The Salesforce Admins Podcast
Sandboxes and Scratch Orgs with Rohit Mehta

The Salesforce Admins Podcast

Play Episode Listen Later Dec 22, 2022 24:12


Today on the Salesforce Admins Podcast, we talk to Rohit Mehta, Senior Product Manager for Sandboxes and Scratch Orgs at Salesforce. Join us as we chat about how to think about sandboxes and scratch orgs and some tips for how to use them better. You should subscribe for the full episode, but here are a […] The post Sandboxes and Scratch Orgs with Rohit Mehta appeared first on Salesforce Admins.