An affix placed before the stem of a word
This was a sweet Sunday crossword, with an unusual theme, an unusual grid size (!), but the usual great clues that we expect from a Sunday NYTimes crossword. We've covered the biggies in today's episode, but we'd also like to note the "little" people, er, answers, without which this grid would be comically empty. For example, 32D, Prefix with bel, DECI (ha!); 87D, Items being replaced, OLDONES (duh!); and our favorite, 95D, Snatches, comic-book style, YOINKS (rah!).

Remember, our Triplet Tuesday Contest is coming up faster than a speeding bullet, so tune in Tuesday and see if you can capture the crown.

Show note imagery: Famed actor and comedian Buster Keaton, next to someone with pronounced RINGLETs.

We love feedback! Send us a text...

Contact Info: We love listener mail! Drop us a line, crosswordpodcast@icloud.com. Also, we're on Facebook, so feel free to drop by there and strike up a conversation!
As a breeder, your prefix is your signature—and it should be seen in the pedigrees of the cats you breed, not just the ones you buy. In this episode, we'll talk about the importance of breeding from your own kittens and how to make your mark by incorporating your prefix into future generations. I'll share tips on how to build a strong foundation with your own breeding stock and why it's so important for your program's legacy. Breeding from your own lines helps create consistency and strengthens your reputation, so let's dive into how you can make that happen! Tune in to learn how to get your prefix into your cats' pedigrees and continue building your breeding program with pride.
Want to describe someone else in Swahili—like "he's working" or "she likes coffee"? In this episode, we explore the power of the subject prefix a—a small but mighty piece of Swahili grammar that helps you talk about other people clearly and confidently.

You'll learn:
- What a actually means in a sentence
- When and how to use it
- Common sentence patterns (and how to say them out loud!)
- Easy mistakes to avoid when talking about he or she

If you're ready to move beyond "I" and start speaking about others—this is your next step.
In this week's episode Andrew is joined by Ben to talk about the AHDB. They start by going over the positives, such as the AHDB recommended list and the AHDB app. They then move on to discuss the increased levy and whether it provides value for money. They talk about the things that the AHDB could do to streamline and improve themselves. Ben makes a point of saying that we are not opposed to the digital passport as a concept, but in its current form we do not see how it is in the best interest of farmers.

This month's podcast walk will take place at 2pm on the 31/03/2025, starting at the Suffield Arms. The What3Words location is Clever.Schooling.Prefix. We look forward to seeing you there!

Hosted on Acast. See acast.com/privacy for more information.
The original plan was called 'disastrous' by other UK podcasters. Sponsored by CoHost. Want more listener insights without switching hosting providers? Explore CoHost's Prefix with in-depth podcast analytics and tracking links starting at just $15/month. Sign up today for a free trial. https://podnews.net/cc/2824 Visit https://podnews.net/update/bbc-uk-ads-podcasts for the story links in full, and to get our daily newsletter.
A further nine new countries to get the monetisation model. Sponsored by CoHost. Want more listener insights without switching hosting providers? Explore CoHost's Prefix with in-depth podcast analytics and tracking links starting at just $15/month. Sign up today for a free trial. https://podnews.net/cc/2824 Visit https://podnews.net/update/spotify-partners-europe for the story links in full, and to get our daily newsletter.
There's a new high, but it's still much less popular than the US. Sponsored by CoHost. Want more listener insights without switching hosting providers? Explore CoHost's Prefix with in-depth podcast analytics and tracking links starting at just $15/month. Sign up today for a free trial. https://podnews.net/cc/2824 Visit https://podnews.net/update/japan-25-podcast-consumption for the story links in full, and to get our daily newsletter.
Dynamic ads might be coming. Sponsored by CoHost. Want more listener insights without switching hosting providers? Explore CoHost's Prefix with in-depth podcast analytics and tracking links starting at just $15/month. Sign up today for a free trial. https://podnews.net/cc/2824 Visit https://podnews.net/update/youtube-dynamic-ads for the story links in full, and to get our daily newsletter.
Only 10% of Gen Z podcast consumers never use video podcasts. Sponsored by CoHost. Want more listener insights without switching hosting providers? Explore CoHost's Prefix with in-depth podcast analytics and tracking links starting at just $15/month. Sign up today for a free trial. https://podnews.net/cc/2824 Visit https://podnews.net/update/gen-z-lean-towards-video for the story links in full, and to get our daily newsletter.
In this week's episode the Dewing Grain podcast goes international! Andrew headed to Dubai to meet this week's legend series special guest, Julian Godfrey. Julian begins by explaining the different roles he has worked in since he first joined the grain trade. Julian talks about how he ended up working from Dubai. They talk about how they made friends and contacts in the trade through the years. Julian explains to Andrew how he manages to broker trades while working in a completely different time zone to many of the deals. They talk about the fact that Russian wheat still comes into the UK despite the sanctions and tariffs on it. They discuss how arbitration is used in trading to help solve disputes but is also sometimes used for more nefarious purposes.

This month's podcast walk will take place at 2pm on the 31/03/2025 at the Suffield Arms. The What3Words location is Clever.Schooling.Prefix. We look forward to seeing you there!

Hosted on Acast. See acast.com/privacy for more information.
Start learning Italian today!
1. Explore more simple Italian lessons: https://italianmatters.com/182
2. Download the Italian Verb Conjugation Blueprint: https://bit.ly/freebieverbblueprint
3. Subscribe to the YouTube lessons: https://www.youtube.com/italianmatters

The goal of the Italian Matters Language and Culture School is to help English speakers build fluency and confidence to speak the Italian language through support, feedback, and accountability. The primary focus is on empowering Italian learners to speak clearly and sound natural so they can easily have conversations in Italian.

Hosted on Acast. See acast.com/privacy for more information.
On this week's episode, Andrew's special guest is Frances Roberson, who is the manager of the Food and Farming Discovery Trust. Frances grew up on a mixed beef and arable farm in North Norfolk. Frances talks Andrew through the wide variety of jobs she had after she left university. Andrew's days as an international model get brought up. The Food and Farming Discovery Trust aims to showcase food, farming and the countryside to the local community. They discuss the Royal Norfolk Show's school program, which helps to give young people a better understanding of what agriculture is all about. Olivia Shave's petition to get agriculture in the curriculum gets mentioned. If you haven't already, please take the time to sign the petition!

This month's podcast walk will take place at 2pm on the 31/03/2025 at the Suffield Arms. The What3Words location is Clever.Schooling.Prefix. We look forward to seeing you there!

Hosted on Acast. See acast.com/privacy for more information.
In this episode of the IPv6 Buzz, we dive into two RFCs for discovering IPv6 prefixes: 7050 and 8781. Why these two? First, 8781 is being proposed as preferential to 7050. Second, co-host Nick Buraglio is an author on 8781 and has insights to share. We start with some background on RFC 7050, including the...
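For listeners who want to experiment with this, here is a minimal, hedged sketch of RFC 7050-style discovery in Python: resolve the well-known name ipv4only.arpa over AAAA and, if a DNS64 in the path synthesizes an answer, recover the NAT64 prefix that encloses the embedded well-known IPv4 address. It only handles the common /96 layout from RFC 6052, and RFC 8781's PREF64 option (carried in Router Advertisements) is not shown; the function name is invented.

```python
# Hedged sketch of RFC 7050-style NAT64/DNS64 prefix discovery (assumes the
# common /96 prefix layout from RFC 6052; /32-/64 layouts are not handled).
import ipaddress
import socket

WELL_KNOWN = {ipaddress.IPv4Address("192.0.0.170"),
              ipaddress.IPv4Address("192.0.0.171")}

def discover_nat64_prefix():
    try:
        infos = socket.getaddrinfo("ipv4only.arpa", None, socket.AF_INET6)
    except socket.gaierror:
        return None  # no synthesized AAAA answer -> no DNS64/NAT64 detected
    for info in infos:
        addr = ipaddress.IPv6Address(info[4][0])
        embedded = ipaddress.IPv4Address(addr.packed[-4:])  # low 32 bits
        if embedded in WELL_KNOWN:
            # zero the host (IPv4) bits and report the enclosing /96
            return ipaddress.IPv6Network((int(addr) >> 32 << 32, 96))
    return None

if __name__ == "__main__":
    print(discover_nat64_prefix() or "no NAT64 prefix found")
```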
SANS Internet Stormcenter Daily Network/Cyber Security and Information Security Stormcast
Phishing via "com-" prefix domains
Every day, attackers are registering a few hundred domain names starting with "com-". These are used in phishing e-mails, for example "toll fee scams", to create more convincing phishing links.
https://isc.sans.edu/diary/Phishing%20via%20%22com-%22%20prefix%20domains/31654

Microsoft Windows 10 Extended Security Updates
Microsoft released pricing and additional details for the Windows 10 Extended Security Updates. After official free updates stop, security updates will be available for $61 for the first year.
https://learn.microsoft.com/en-us/windows/whats-new/extended-security-updates

Mozilla Enforcing Certificate Transparency
Mozilla is following the lead of other browsers and will require certificates to include a signed certificate timestamp as proof of compliance with certificate transparency requirements.
https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/OagRKpVirsA/m/Q4c89XG-EAAJ
https://wiki.mozilla.org/SecurityEngineering/Certificate_Transparency#Enterprise_Policies

Veeam Update
Veeam's internal backup process may be used to execute arbitrary code by an attacker in a machine-in-the-middle position.
https://www.veeam.com/kb4712

Netgear Unauthenticated RCE
https://kb.netgear.com/000066558/Security-Advisory-for-Unauthenticated-RCE-on-Some-WiFi-Routers-PSV-2023-0039
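As a small, hedged illustration of the "com-" pattern (the first item above; not code from the ISC diary), a mail or proxy filter might simply flag any hostname whose DNS labels start with "com-". The example URLs below are invented for demonstration.

```python
# Hedged sketch: flag hostnames that use a "com-" prefix label, a trick that
# makes links such as com-tollpayment.example.net resemble .com domains.
from urllib.parse import urlparse

def looks_like_com_prefix_phish(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    return any(label.startswith("com-") for label in host.split("."))

if __name__ == "__main__":
    for u in ("https://com-tollroadpay.example.net/pay",
              "https://www.example.com/login"):
        print(u, looks_like_com_prefix_phish(u))
```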
MMM - The prefix "ob-" in English has a fascinating range of meanings and uses. Here's a breakdown of its main functions and some examples:

1. "Against" or "In the way"
- obstacle: Something that stands in your way.
- object: To express disapproval or disagreement.
- obstruct: To block or hinder progress.
- obvious: Easily seen or understood, as if it's "in the way" of being missed.

2. "Toward"
- obligation: A duty or commitment that you're bound "toward."
- observe: To watch carefully, directing your attention "toward" something.

3. "Thoroughly" or "Completely"
In some cases, "ob-" intensifies the meaning of the root word.
- obtain: To fully grasp or hold something.
- obfuscate: To make something completely unclear or confusing.
- obdurate: Stubborn and unyielding, "thoroughly" hardened in one's stance.

Variations: The prefix "ob-" can sometimes appear as "oc-", "op-", or "of-" before certain consonants (e.g., occur, oppose, offer).
Etymology: "Ob-" comes from Latin, where it had similar meanings.
Nuances: The exact meaning of an "ob-" word can depend on the specific context and the root word it's attached to.

Expanding Your Vocabulary
- Root Exploration: Look up the root words that are combined with "ob-" (e.g., "tain" in obtain, "fuscate" in obfuscate). Understanding the root will illuminate the full meaning of the word.
- Contextual Clues: Pay attention to how "ob-" words are used in sentences and texts. This will help you grasp their specific connotations.
- Word Lists: Use online resources or dictionaries to find lists of words containing "ob-". This can be a fun way to expand your vocabulary.

Memory Palace
Location: Imagine a bustling city street.
- "Obstruct" (Against/In the way): A giant, overflowing garbage dumpster has been placed smack-dab in the middle of the street. It's completely obstructing traffic. Cars are honking, people are detouring, and a cyclist is trying to squeeze past but is struggling. The dumpster is bright orange, making it obvious (another "ob-" word!) that it's causing a problem.
- "Obviate": A sleek, futuristic drone suddenly descends and, with a precise laser beam, vaporizes the overflowing garbage from the dumpster. The street is instantly clear. The drone has obviated the need for anyone to move the dumpster manually. Traffic flows smoothly again. Here, "ob-" suggests "against" in the sense of "removing" or "counteracting" a problem. The drone's action has obviated (made unnecessary) the previous need to deal with the obstruction. Visualize the clean, efficient removal to remember this meaning.
- "Oppose": A group of protesters has arrived at the street, holding signs and chanting. They are opposing the city's plan to build a new highway through the neighborhood. They stand opposed to the construction workers who are trying to clear the area (another "ob-" related concept: "obstruct," remembering the earlier example). The protesters are very vocal and determined. "Oppose" clearly demonstrates the "against" meaning of "ob-". The protesters are actively standing against the proposed highway. Visualize their passionate demonstration to solidify the meaning.

Narrative: The overflowing dumpster obstructed the street. The drone obviated the need to move it. Now, the protesters oppose the new highway, adding another layer of conflict to the scene. The street is becoming a stage for various "ob-" related actions and concepts.
#MemoryPalace, #VocabularyBuilding, #MnemonicDevice, #EnglishLanguage, #WordNerd, #ObPrefix, #Oppose, #Obstruct, #Obviate, #LanguageLearning, #VisualLearning, #StudyTips, #MemoryTechniques, #Education, #Etymology, #WordOfTheDay, #Vocabulary, #LearningHacks, #BrainTraining
Do you remember everything we learned about prefixes and suffixes? Today we're going to take that knowledge to the next level. What would happen if we combined both in a single word? Not only would we change its meaning, we could create more complex and useful words to express ourselves better in English!

These kinds of words are called multisyllabic, and by the way, the word "multisyllabic" is a great example, since it has a prefix (multi-) that means "many" and a suffix (-ic) that turns the word into an adjective. So, by learning to combine prefixes and suffixes, we'll not only enrich our vocabulary, we'll also improve our pronunciation with those long, elegant words.

Ready to take on the challenge? Okay, let's learn!

Remember that all the resources for this episode, including the transcript, the vocabulary table, and exercises to review what you've learned, are available on our website. Click this link to see all the resources for this episode: https://www.inglesdesdecero.ca/192
Like our page on Facebook: https://www.facebook.com/inglesdesde0/
Follow us on Instagram: https://www.instagram.com/ingles.desde.cero/
Learn English with native speakers trained to teach it. Visit our website, https://www.inglesdesdecero.ca/, to sign up and follow all our lessons!
__
Don't miss this opportunity with Shopify and sign up for a trial period for just one dollar a month at shopify.mx/desdecero
Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver.The single most requested domain was computer vision, and we could think of no one better to help us recap 2024 than our friends at Roboflow, who was one of our earliest guests in 2023 and had one of this year's top episodes in 2024 again. Roboflow has since raised a $40m Series B!LinksTheir slides are here:All the trends and papers they picked:* Isaac Robinson* Sora (see our Video Diffusion pod) - extending diffusion from images to video* SAM 2: Segment Anything in Images and Videos (see our SAM2 pod) - extending prompted masks to full video object segmentation* DETR Dominancy: DETRs show Pareto improvement over YOLOs* RT-DETR: DETRs Beat YOLOs on Real-time Object Detection* LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection* D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement* Peter Robicheaux* MMVP (Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs)* * Florence 2 (Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks) * PalíGemma / PaliGemma 2* PaliGemma: A versatile 3B VLM for transfer* PaliGemma 2: A Family of Versatile VLMs for Transfer* AlMv2 (Multimodal Autoregressive Pre-training of Large Vision Encoders) * Vik Korrapati - MoondreamFull Talk on YouTubeWant more content like this? Like and subscribe to stay updated on our latest talks, interviews, and podcasts.Transcript/Timestamps[00:00:00] Intro[00:00:05] AI Charlie: welcome to Latent Space Live, our first mini conference held at NeurIPS 2024 in Vancouver. This is Charlie, your AI co host. When we were thinking of ways to add value to our academic conference coverage, we realized that there was a lack of good talks, just recapping the best of 2024, going domain by domain.[00:00:36] AI Charlie: We sent out a survey to the over 900 of you. who told us what you wanted, and then invited the best speakers in the Latent Space Network to cover each field. 200 of you joined us in person throughout the day, with over 2, 200 watching live online. Our second featured keynote is The Best of Vision 2024, with Peter Robichaud and Isaac [00:01:00] Robinson of Roboflow, with a special appearance from Vic Corrapati of Moondream.[00:01:05] AI Charlie: When we did a poll of our attendees, the highest interest domain of the year was vision. And so our first port of call was our friends at Roboflow. Joseph Nelson helped us kickstart our vision coverage in episode 7 last year, and this year came back as a guest host with Nikki Ravey of Meta to cover segment Anything 2.[00:01:25] AI Charlie: Roboflow have consistently been the leaders in open source vision models and tooling. With their SuperVision library recently eclipsing PyTorch's Vision library. 
And Roboflow Universe hosting hundreds of thousands of open source vision datasets and models. They have since announced a 40 million Series B led by Google Ventures.[00:01:46] AI Charlie: Woohoo.[00:01:48] Isaac's picks[00:01:48] Isaac Robinson: Hi, we're Isaac and Peter from Roboflow, and we're going to talk about the best papers of 2024 in computer vision. So, for us, we defined best as what made [00:02:00] the biggest shifts in the space. And to determine that, we looked at what are some major trends that happened and what papers most contributed to those trends.[00:02:09] Isaac Robinson: So I'm going to talk about a couple trends, Peter's going to talk about a trend, And then we're going to hand it off to Moondream. So, the trends that I'm interested in talking about are These are a major transition from models that run on per image basis to models that run using the same basic ideas on video.[00:02:28] Isaac Robinson: And then also how debtors are starting to take over the real time object detection scene from the YOLOs, which have been dominant for years.[00:02:37] Sora, OpenSora and Video Vision vs Generation[00:02:37] Isaac Robinson: So as a highlight we're going to talk about Sora, which from my perspective is the biggest paper of 2024, even though it came out in February. Is the what?[00:02:48] Isaac Robinson: Yeah. Yeah. So just it's a, SORA is just a a post. So I'm going to fill it in with details from replication efforts, including open SORA and related work, such as a stable [00:03:00] diffusion video. And then we're also going to talk about SAM2, which applies the SAM strategy to video. And then how debtors, These are the improvements in 2024 to debtors that are making them a Pareto improvement to YOLO based models.[00:03:15] Isaac Robinson: So to start this off, we're going to talk about the state of the art of video generation at the end of 2023, MagVIT MagVIT is a discrete token, video tokenizer akin to VQ, GAN, but applied to video sequences. And it actually outperforms state of the art handcrafted video compression frameworks.[00:03:38] Isaac Robinson: In terms of the bit rate versus human preference for quality and videos generated by autoregressing on these discrete tokens generate some pretty nice stuff, but up to like five seconds length and, you know, not super detailed. And then suddenly a few months later we have this, which when I saw it, it was totally mind blowing to me.[00:03:59] Isaac Robinson: 1080p, [00:04:00] a whole minute long. We've got light reflecting in puddles. That's reflective. Reminds me of those RTX demonstrations for next generation video games, such as Cyberpunk, but with better graphics. You can see some issues in the background if you look closely, but they're kind of, as with a lot of these models, the issues tend to be things that people aren't going to pay attention to unless they're looking for.[00:04:24] Isaac Robinson: In the same way that like six fingers on a hand. You're not going to notice is a giveaway unless you're looking for it. So yeah, as we said, SORA does not have a paper. So we're going to be filling it in with context from the rest of the computer vision scene attempting to replicate these efforts. So the first step, you have an LLM caption, a huge amount of videos.[00:04:48] Isaac Robinson: This, this is a trick that they introduced in Dolly 3, where they train a image captioning model to just generate very high quality captions for a huge corpus and then train a diffusion model [00:05:00] on that. 
Their Sora and their application efforts also show a bunch of other steps that are necessary for good video generation.[00:05:09] Isaac Robinson: Including filtering by aesthetic score and filtering by making sure the videos have enough motion. So they're not just like kind of the generators not learning to just generate static frames. So. Then we encode our video into a series of space time latents. Once again, SORA, very sparse in details.[00:05:29] Isaac Robinson: So the replication related works, OpenSORA actually uses a MAG VIT V2 itself to do this, but swapping out the discretization step with a classic VAE autoencoder framework. They show that there's a lot of benefit from getting the temporal compression, which makes a lot of sense as the Each sequential frames and videos have mostly redundant information.[00:05:53] Isaac Robinson: So by compressing against, compressing in the temporal space, you allow the latent to hold [00:06:00] a lot more semantic information while avoiding that duplicate. So, we've got our spacetime latents. Possibly via, there's some 3D VAE, presumably a MAG VATV2 and then you throw it into a diffusion transformer.[00:06:19] Isaac Robinson: So I think it's personally interesting to note that OpenSORA is using a MAG VATV2, which originally used an autoregressive transformer decoder to model the latent space, but is now using a diffusion diffusion transformer. So it's still a transformer happening. Just the question is like, is it?[00:06:37] Isaac Robinson: Parameterizing the stochastic differential equation is, or parameterizing a conditional distribution via autoregression. It's also it's also worth noting that most diffusion models today, the, the very high performance ones are switching away from the classic, like DDPM denoising diffusion probability modeling framework to rectified flows.[00:06:57] Isaac Robinson: Rectified flows have a very interesting property that as [00:07:00] they converge, they actually get closer to being able to be sampled with a single step. Which means that in practice, you can actually generate high quality samples much faster. Major problem of DDPM and related models for the past four years is just that they require many, many steps to generate high quality samples.[00:07:22] Isaac Robinson: So, and naturally, the third step is throwing lots of compute at the problem. So I didn't, I never figured out how to manage to get this video to loop, but we see very little compute, medium compute, lots of compute. This is so interesting because the the original diffusion transformer paper from Facebook actually showed that, in fact, the specific hyperparameters of the transformer didn't really matter that much.[00:07:48] Isaac Robinson: What mattered was that you were just increasing the amount of compute that the model had. So, I love how in the, once again, little blog posts, they don't even talk about [00:08:00] like the specific hyperparameters. They say, we're using a diffusion transformer, and we're just throwing more compute at it, and this is what happens.[00:08:08] Isaac Robinson: OpenSora shows similar results. The primary issue I think here is that no one else has 32x compute budget. So we end up with these we end up in the middle of the domain and most of the related work, which is still super, super cool. It's just a little disappointing considering the context. 
So I think this is a beautiful extension of the framework that was introduced in 22 and 23 for these very high quality per image generation and then extending that to videos.[00:08:39] Isaac Robinson: It's awesome. And it's GA as of Monday, except no one can seem to get access to it because they keep shutting down the login.[00:08:46] SAM and SAM2[00:08:46] Isaac Robinson: The next, so next paper I wanted to talk about is SAM. So we at Roboflow allow users to label data and train models on that data. Sam, for us, has saved our users 75 years of [00:09:00] labeling time.[00:09:00] Isaac Robinson: We are the, to the best of my knowledge, the largest SAM API that exists. We also, SAM also allows us to have our users train just pure bounding box regression models and use those to generate high quality masks which has the great side effect of requiring less training data to have a meaningful convergence.[00:09:20] Isaac Robinson: So most people are data limited in the real world. So anything that requires less data to get to a useful thing is that super useful. Most of our users actually run their object per frame object detectors on every frame in a video, or maybe not most, but many, many. And so Sam follows into this category of taking, Sam 2 falls into this category of taking something that really really works and applying it to a video which has the wonderful benefit of being plug and play with most of our Many of our users use cases.[00:09:53] Isaac Robinson: We're, we're still building out a sufficiently mature pipeline to take advantage of that, but it's, it's in the works. [00:10:00] So here we've got a great example. We can click on cells and then follow them. You even notice the cell goes away and comes back and we can still keep track of it which is very challenging for existing object trackers.[00:10:14] Isaac Robinson: High level overview of how SAM2 works. We there's a simple pipeline here where we can give, provide some type of prompt and it fills out the rest of the likely masks for that object throughout the rest of the video. So here we're giving a bounding box in the first frame, a set of positive negative points, or even just a simple mask.[00:10:36] Isaac Robinson: I'm going to assume people are somewhat familiar with SAM. So I'm going to just give a high level overview of how SAM works. You have an image encoder that runs on every frame. SAM two can be used on a single image, in which case the only difference between SAM two and SAM is that image encoder, which Sam used a standard VIT [00:11:00] Sam two replaced that with a hara hierarchical encoder, which gets approximately the same results, but leads to a six times faster inference, which is.[00:11:11] Isaac Robinson: Excellent, especially considering how in a trend of 23 was replacing the VAT with more efficient backbones. In the case where you're doing video segmentation, the difference is that you actually create a memory bank and you cross attend the features from the image encoder based on the memory bank.[00:11:31] Isaac Robinson: So the feature set that is created is essentially well, I'll go more into it in a couple of slides, but we take the features from the past couple frames, plus a set of object pointers and the set of prompts and use that to generate our new masks. Then we then fuse the new masks for this frame with the.[00:11:57] Isaac Robinson: Image features and add that to the memory bank. [00:12:00] It's, well, I'll say more in a minute. 
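To make the memory-bank idea above concrete, here is a minimal, hedged sketch: keep the fused features of the last few frames in a fixed-length queue (older frames fall out, the FIFO behaviour discussed below) together with object pointers, and concatenate everything into one token sequence for the decoder to cross-attend to. This illustrates the idea only; it is not the SAM 2 code, and the class name, queue length, and dimensions are invented.

```python
# Illustrative rolling memory for video segmentation (not the SAM 2 code).
from collections import deque
import torch

class FrameMemoryBank:
    def __init__(self, max_frames=6):
        self.frames = deque(maxlen=max_frames)  # oldest frame features drop out
        self.object_pointers = []               # "here's what we've found so far"

    def add_frame(self, fused_features):        # (num_tokens, dim) per frame
        self.frames.append(fused_features)

    def add_object_pointer(self, pointer):      # (1, dim) per tracked object
        self.object_pointers.append(pointer)

    def context(self):
        # one token sequence for the mask decoder to cross-attend to
        tokens = list(self.frames) + self.object_pointers
        return torch.cat(tokens, dim=0) if tokens else torch.empty(0, 256)

bank = FrameMemoryBank(max_frames=6)
bank.add_frame(torch.randn(64, 256))
bank.add_object_pointer(torch.randn(1, 256))
print(bank.context().shape)  # torch.Size([65, 256])
```

Capping the queue length is what keeps the per-frame cost constant and the model real-time, which is exactly the trade-off the memory-count ablation discussed next examines.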
The, just like SAM, SAM 2 actually uses a data engine to create its dataset, in that they assembled a huge amount of reference data, used people to label some of it and train the model, used the model to label more of it, and asked people to refine the predictions of the model.[00:12:20] Isaac Robinson: And then ultimately the dataset is just created from the engine's final output of the model on the reference data. It's very interesting. This paradigm is so interesting to me because it unifies a model and a dataset in a way that is very unique. It seems unlikely that another model could come in and have such a tight coupling.[00:12:37] Isaac Robinson: So, brief overview of how the memory bank works. The paper did not have a great visual, so I'm just, I'm going to fill in a bit more. So we take the last couple of frames from our video and attend to that, along with the set of prompts that we provided, they could come from the future, [00:13:00] they could come from anywhere in the video, as well as reference object pointers, saying, by the way, here's what we've found so far. Attending to the last few frames has the interesting benefit of allowing it to model complex object motion without actually[00:13:18] Isaac Robinson: By limiting the amount of frames that you attend to, you manage to keep the model running in real time. This is such an interesting topic for me because one would assume that attending to all of the frames is super essential, or having some type of summarization of all the frames is super essential for high performance.[00:13:35] Isaac Robinson: But we see in their later ablation that that actually is not the case. So here, just to make sure that there is some benchmarking happening, we just compared to some of the stuff that came out prior, and indeed the SAM 2 strategy does improve on the state of the art. This ablation deep in their appendices was super interesting to me.[00:13:59] Isaac Robinson: [00:14:00] We see in section C, the number of memories. One would assume that increasing the count of memories would meaningfully increase performance. And we see that it has some impact, but not the type that you'd expect. And that it meaningfully decreases speed, which justifies, in my mind, just having this FIFO queue of memories.[00:14:20] Isaac Robinson: Although in the future, I'm super interested to see a more dedicated summarization of all of the last video, not just a stacking of the last frames. So that's another extension of beautiful per-frame work into the video domain.[00:14:42] Realtime detection: DETRs > YOLO[00:14:42] Isaac Robinson: The next trend I'm interested in talking about is this: at Roboflow, we're super interested in training real-time object detectors.[00:14:50] Isaac Robinson: Those are bread and butter. And so we're doing a lot to keep track of what is actually happening in that space. We are finally starting to see something change. So, [00:15:00] for years, YOLOs have been the dominant way of doing real-time object detection, and we can see here that they've essentially stagnated.[00:15:08] Isaac Robinson: The performance between YOLOv10 and v11 is not meaningfully different, at least, you know, in this type of high-level chart. And even from the last couple series, there's not a major change. So YOLOs have hit a plateau, DETRs have not. So we can look here and see the YOLO series has this plateau.
And then these RT-DETR, LW-DETR, and D-FINE have meaningfully changed that plateau, so that in fact the best D-FINE models are plus 4.6 AP on COCO at the same latency.[00:15:43] Isaac Robinson: So, three major steps to accomplish this. The first, RT-DETR, which is technically a 2023 paper preprint, but published officially in '24, so I'm going to include that. I hope that's okay. [00:16:00] RT-DETR showed that we could actually match or out-speed YOLOs.[00:16:04] Isaac Robinson: And then LW-DETR showed that pre-training is hugely effective on DETRs and much less so on YOLOs. And then D-FINE added the types of bells and whistles that we expect from this, this arena. So the major improvement that RT-DETR shows was taking the multi-scale features that DETRs typically pass into their encoder and decoupling them into a much more efficient transformer encoder.[00:16:30] Isaac Robinson: The transformer is, of course, quadratic complexity. So decreasing the amount of stuff that you pass in at once is super helpful for increasing your runtime or increasing your throughput. So that change basically brought us up to YOLO speed, and then they do a hardcore analysis on benchmarking YOLOs, including the NMS step.[00:16:54] Isaac Robinson: Once you, once you include the NMS in the latency calculation, you see that in fact these DETRs [00:17:00] are outperforming, at least this time, the, the, the YOLOs that existed. Then LW-DETR goes in and suggests that in fact the, the huge boost here is from pre-training. So, this is the D-FINE line, and this is the D-FINE line without pre-training.[00:17:19] Isaac Robinson: It's within range, it's still an improvement over the YOLOs, but the really huge boost comes from the benefit of pre-training. When YOLOX came out in 2021, they showed that they got much better results by having a much, much longer training time, but they found that when they did that, they actually did not benefit from pre-training.[00:17:40] Isaac Robinson: So, you see in this graph from LW-DETR, in fact, YOLOs do have a real benefit from pre-training, but it goes away as we increase the training time. Then, the DETRs converge much faster. LW-DETR trains for only 50 epochs, RT-DETR is 60 epochs. So, one could assume that, in fact, [00:18:00] the entire extra gain from pre-training is that you're not destroying your original weights[00:18:06] Isaac Robinson: by relying on this long training cycle. And then LW-DETR also shows superior performance on our favorite dataset, Roboflow 100, which means that they do better on the real world, not just on COCO. Then D-FINE throws all the bells and whistles at it. YOLO models tend to have a lot of very specific, complicated loss functions.[00:18:26] Isaac Robinson: D-FINE brings that into the DETR world and shows consistent improvement on a variety of DETR-based frameworks. So bring these all together and we see that suddenly we have almost 60 AP on COCO while running in like 10 milliseconds. Huge, huge stuff. So we're spending a lot of time trying to build models that work better with less data, and DETRs are clearly becoming a promising step in that direction.[00:18:56] Isaac Robinson: What we're interested in seeing [00:19:00] from the DETRs in this, this trend next is: Co-DETR and the models that are currently sitting on the top of the leaderboard for large-scale inference scale really well as you switch out the backbone.
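A quick, hedged aside on the NMS point above: YOLO-style heads emit thousands of overlapping candidate boxes that still need a greedy non-maximum suppression pass, and that pass belongs in any end-to-end latency number, whereas DETR-family models predict a small fixed set of boxes directly. The sketch below times a confidence filter plus torchvision's NMS on fake predictions; the thresholds and tensor shapes are made up for illustration.

```python
# Hedged sketch: include NMS in the post-processing you time, since that is
# what a YOLO-style detector actually costs end to end.
import time
import torch
from torchvision.ops import nms

def yolo_style_postprocess(boxes, scores, iou_thresh=0.65, conf_thresh=0.25):
    keep = scores > conf_thresh
    boxes, scores = boxes[keep], scores[keep]
    return boxes[nms(boxes, scores, iou_thresh)]

boxes = torch.rand(8400, 4) * 640     # fake raw predictions
boxes[:, 2:] += boxes[:, :2]          # ensure x2 >= x1 and y2 >= y1
scores = torch.rand(8400)
start = time.perf_counter()
final = yolo_style_postprocess(boxes, scores)
print(f"kept {len(final)} boxes, post-processing took "
      f"{(time.perf_counter() - start) * 1e3:.2f} ms")
```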
We're very interested in seeing and having people publish a paper, potentially us, on what happens if you take these real time ones and then throw a Swingy at it.[00:19:23] Isaac Robinson: Like, do we have a Pareto curve that extends from the real time domain all the way up to the super, super slow but high performance domain? We also want to see people benchmarking in RF100 more, because that type of data is what's relevant for most users. And we want to see more pre training, because pre training works now.[00:19:43] Isaac Robinson: It's super cool.[00:19:48] Peter's Picks[00:19:48] Peter Robicheaux: Alright, so, yeah, so in that theme one of the big things that we're focusing on is how do we get more out of our pre trained models. And one of the lenses to look at this is through sort of [00:20:00] this, this new requirement for like, how Fine grained visual details and your representations that are extracted from your foundation model.[00:20:08] Peter Robicheaux: So it's sort of a hook for this Oh, yeah, this is just a list of all the the papers that I'm going to mention I just want to make sure I set an actual paper so you can find it later[00:20:18] MMVP (Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs)[00:20:18] Peter Robicheaux: Yeah, so sort of the big hook here is that I make the claim that LLMs can't see if you go to if you go to Claude or ChatGPT you ask it to see this Watch and tell me what time it is, it fails, right?[00:20:34] Peter Robicheaux: And so you could say, like, maybe, maybe the Like, this is, like, a very classic test of an LLM, but you could say, Okay, maybe this, this image is, like, too zoomed out, And it just, like, it'll do better if we increase the resolution, And it has easier time finding these fine grained features, Like, where the watch hands are pointing.[00:20:53] Peter Robicheaux: Nodice. And you can say, okay, well, maybe the model just doesn't know how to tell time from knowing the position of the hands. But if you actually prompt [00:21:00] it textually, it's very easy for it to tell the time. So this to me is proof that these LLMs literally cannot see the position of the watch hands and it can't see those details.[00:21:08] Peter Robicheaux: So the question is sort of why? And for you anthropic heads out there, cloud fails too. So the, the, my first pick for best paper of 2024 Envision is this MMVP paper, which tries to investigate the Why do LLMs not have the ability to see fine grained details? And so, for instance, it comes up with a lot of images like this, where you ask it a question that seems very visually apparent to us, like, which way is the school bus facing?[00:21:32] Peter Robicheaux: And it gets it wrong, and then, of course, it makes up details to support its wrong claim. And so, the process by which it finds these images is sort of contained in its hypothesis for why it can't. See these details. So it hypothesizes that models that have been initialized with, with Clip as their vision encoder, they don't have fine grained details and the, the features extracted using Clip because Clip sort of doesn't need to find these fine grained [00:22:00] details to do its job correctly, which is just to match captions and images, right?[00:22:04] Peter Robicheaux: And sort of at a high level, even if ChatGPT wasn't initialized with Clip and wasn't trained contrastively at all. The vision encoder wasn't trained contrastively at all. 
Still, in order to do its job of captioning the image, it could do a pretty good job without actually finding the exact position of all the objects and visual features in the image, right?[00:22:21] Peter Robicheaux: So this paper finds a set of difficult images for these types of models. And the way it does it is it looks for embeddings that are similar in CLIP space, but far in DINOv2 space. So DINOv2 is a foundation model that was trained self-supervised purely on image data. And it kind of uses like some complex student-teacher framework, but essentially it patches out certain areas of the image, or crops certain areas of the image, and tries to make sure that those have consistent representations, which is a way for it to learn very fine-grained visual features.[00:22:54] Peter Robicheaux: And so if you take things that are very close in CLIP space and very far in DINOv2 space, you get a set of images, [00:23:00] basically pairs of images, that are hard for ChatGPT and other big language models to distinguish. So, if you then ask it questions about this image, well, as you can see from this chart, it's going to answer the same way for both images, right?[00:23:14] Peter Robicheaux: Because, from the perspective of the vision encoder, they're the same image. And so if you ask a question like, how many eyes does this animal have? It answers the same for both. And all these other models, including LLaVA, do the same thing, right? And so this is the benchmark that they create, which is finding CLIP-blind pairs, which is pairs of images that are similar in CLIP space, and creating a dataset of multiple-choice questions based off of those.[00:23:39] Peter Robicheaux: And so how do these models do? Well, really bad. ChatGPT and Gemini do a little bit better than random guessing, but, like, half of the performance of humans who find these problems to be very easy. LLaVA is, interestingly, extremely negatively correlated with this dataset. It does much, much, much, much worse [00:24:00] than random guessing, which means that this process has done a very good job of identifying hard images for, for LLaVA, specifically.[00:24:07] Peter Robicheaux: And that's because LLaVA is basically not trained for very long and is initialized from CLIP, and so you would expect it to do poorly on this dataset. So, one of the proposed solutions that this paper attempts is by basically saying, okay, well, if CLIP features aren't enough, what if we train the visual encoder of the language model also on DINO features?[00:24:27] Peter Robicheaux: And so it proposes two different ways of doing this. One, additively, which is basically interpolating between the two features, and then one is interleaving, which is just kind of like training on the combination of both features. So there's this really interesting trend when you do the additive mixture of features.[00:24:45] Peter Robicheaux: So zero is all CLIP features and one is all DINOv2 features. So, I think it's helpful to look at the rightmost chart first, which is: as you increase the number of DINOv2 features, your model does worse and worse and [00:25:00] worse on the actual language modeling task. And that's because DINOv2 features were trained completely in a self-supervised manner and completely in image space.[00:25:08] Peter Robicheaux: It knows nothing about text. These features aren't really compatible with these text models.
And so you can train an adapter all you want, but it seems that it's in such an alien language that it's like a very hard optimization for this. These models to solve. And so that kind of supports what's happening on the left, which is that, yeah, it gets better at answering these questions if as you include more dyna V two features up to a point, but then you, when you oversaturate, it completely loses its ability to like.[00:25:36] Peter Robicheaux: Answer language and do language tasks. So you can also see with the interleaving, like they essentially double the number of tokens that are going into these models and just train on both, and it still doesn't really solve the MMVP task. It gets Lava 1. 5 above random guessing by a little bit, but it's still not close to ChachiPT or, you know, Any like human performance, obviously.[00:25:59] Peter Robicheaux: [00:26:00] So clearly this proposed solution of just using DynaV2 features directly, isn't going to work. And basically what that means is that as a as a vision foundation model, DynaV2 is going to be insufficient for language tasks, right?[00:26:14] Florence 2 (Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks)[00:26:14] Peter Robicheaux: So my next pick for best paper of 2024 would be Florence 2, which tries to solve this problem by incorporating not only This dimension of spatial hierarchy, which is to say pixel level understanding, but also in making sure to include what they call semantic granularity, which ends up, the goal is basically to have features that are sufficient for finding objects in the image, so they're, they're, they have enough pixel information, but also can be talked about and can be reasoned about.[00:26:44] Peter Robicheaux: And that's on the semantic granularity axis. So here's an example of basically three different paradigms of labeling that they do. So they, they create a big dataset. One is text, which is just captioning. And you would expect a model that's trained [00:27:00] only on captioning to have similar performance like chat2BT and like not have spatial hierarchy, not have features that are meaningful at the pixel level.[00:27:08] Peter Robicheaux: And so they add another type, which is region text pairs, which is essentially either classifying a region or You're doing object detection or doing instance segmentation on that region or captioning that region. And then they have text phrased region annotations, which is essentially a triple. And basically, not only do you have a region that you've described, you also find it's like, It's placed in a descriptive paragraph about the image, which is basically trying to introduce even more like semantic understanding of these regions.[00:27:39] Peter Robicheaux: And so like, for instance, if you're saying a woman riding on the road, right, you have to know what a woman is and what the road is and that she's on top of it. And that's, that's basically composing a bunch of objects in this visual space, but also thinking about it semantically, right? And so the way that they do this is they take basically they just dump Features from a vision encoder [00:28:00] straight into a encoder decoder transformer.[00:28:03] Peter Robicheaux: And then they train a bunch of different tasks like object detection and so on as a language task. And I think that's one of the big things that we saw in 2024 is these, these vision language models operating in, on pixel space linguistically. 
So they introduced a bunch of new tokens to point to locations and[00:28:22] Peter Robicheaux: So how does it work? How does it actually do? We can see if you look at the graph on the right, which is using the, the DINO framework, your, your pre-trained Florence 2 models transfer very, very well. They get 60 percent mAP on COCO, which is like approaching state of the art, and they train[00:28:42] Vik Korrapati: with, and they[00:28:43] Peter Robicheaux: train much more efficiently.[00:28:47] Peter Robicheaux: So they, they converge a lot faster, which, both of these things are pointing to the fact that they're actually leveraging their pre-trained weights effectively. So where is it falling short? So these models, I forgot to mention, Florence is a 0.2 [00:29:00] billion and a 0.7 billion parameter count. So they're very, very small in terms of being a language model.[00:29:05] Peter Robicheaux: And I think that, with this framework, you can see saturation. So, what this graph is showing is that if you train a Florence 2 model purely on the image-level and region-level annotations and not including the pixel-level annotations, like the segmentation, it actually performs better as an object detector.[00:29:25] Peter Robicheaux: And what that means is that it's not able to actually learn all the visual tasks that it's trying to learn because it doesn't have enough capacity.[00:29:32] PaliGemma / PaliGemma 2[00:29:32] Peter Robicheaux: So I'd like to see this paper explore larger model sizes, which brings us to our next big paper of 2024, or two papers. So PaliGemma came out earlier this year.[00:29:42] Peter Robicheaux: PaliGemma 2 was released, I think, like a week or two ago. Oh, I forgot to mention, you can actually train, you can, like, label text datasets on Roboflow and you can train a Florence 2 model, and you can actually train a PaliGemma 2 model on Roboflow, which we got into the platform within, like, 14 hours of release, which I was really excited about.[00:29:59] Peter Robicheaux: So, anyway, so [00:30:00] PaliGemma 2, so PaliGemma is essentially doing the same thing, but instead of doing an encoder-decoder, it just dumps everything into a decoder-only transformer model. But it also introduced the concept of location tokens to point to objects in pixel space. PaliGemma 2, so PaliGemma uses Gemma as the language encoder, and it uses Gemma 2B.[00:30:17] Peter Robicheaux: PaliGemma 2 introduces using multiple different sizes of language encoders. So, the way that they sort of get around having to do encoder-decoder is they use the concept of prefix loss, which basically means that when it's generating tokens autoregressively, all those tokens in the prefix, which is like the image that it's looking at and like a description of the task that it's trying to do,[00:30:41] Peter Robicheaux: they're attending to each other fully, full attention. Which means that, you know, it's easier for the, the prefix to color, to color the output of the suffix and also to just find features easily.
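Here is a minimal, hedged sketch of the prefix attention pattern just described: image tokens and the task description attend to each other bidirectionally, while the generated suffix stays causal but can always see the whole prefix. It reproduces the idea only, not the PaliGemma implementation; the function name and sizes are invented.

```python
# Illustrative prefix-LM attention mask (True = attention allowed).
import torch

def prefix_lm_mask(prefix_len, total_len):
    mask = torch.tril(torch.ones(total_len, total_len, dtype=torch.bool))  # causal
    mask[:prefix_len, :prefix_len] = True  # full attention inside the prefix block
    return mask

print(prefix_lm_mask(prefix_len=3, total_len=6).int())
# Rows 0-2 (image + task prompt) see the whole prefix; rows 3-5 (generated
# suffix) are causal but always attend to the full prefix.
```

The same causal-with-prefix mask comes up again in the AIMv2 discussion below, where a randomly sized prefix of image tokens is attended to bidirectionally and the rest is reconstructed autoregressively.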
So this is sort of [00:31:00] an example of one of the tasks that it was trained on, which is like, you describe the task in English, and you're asking for it to segment these two classes of objects, and then it finds, like, their locations using these tokens, and it finds their masks using some encoding of the masks into tokens.[00:31:24] Peter Robicheaux: And, yeah, so, one of my critiques, I guess, of PaliGemma 1, at least, is that you find that performance saturates as a pre-trained model after only 300 million examples seen. So, what this graph is representing is each blue dot is a performance on some downstream task. And you can see that after seeing 300 million examples, it sort of does equally well on all of the downstream tasks that they tried it on, which was a lot, as it does at 1 billion examples, which to me also kind of suggests a lack of capacity for this model.[00:31:58] Peter Robicheaux: PaliGemma 2, [00:32:00] you can see the results on object detection. So these were transferred to, to COCO. And you can see that this sort of also points to an increase in capacity being helpful to the model. You can see, as both the resolution increases and the parameter count of the language model increases, performance increases.[00:32:16] Peter Robicheaux: So resolution makes sense, obviously, it helps to find small objects in the image. But it also makes sense for another reason, which is that it kind of gives the model a thinking register, and it gives it more tokens to, like, process when making its predictions. But yeah, you could, you could say, oh, 43.6, that's not that great, like Florence 2 got 60. But this is not training a DINO or a DETR on top of this language or this image encoder. It's doing the raw language modeling task on COCO. So it doesn't have any of the bells and whistles. It doesn't have any of the fancy losses. It doesn't even have bipartite graph matching or anything like that.[00:32:52] Peter Robicheaux: Okay, the big result, and one of the reasons that I was really excited about this paper, is that they blow everything else away [00:33:00] on MMVP. I mean, 47.3, sure, that's nowhere near human accuracy, which, again, is 94%, but for a, you know, a 2 billion parameter language model to beat ChatGPT, that's quite the achievement.[00:33:12] Peter Robicheaux: And that sort of brings us to our final pick for paper of the year, which is AIMv2. So, AIMv2 sort of says, okay, maybe coming up with all these specific annotations to find features with high fidelity in pixel space isn't actually necessary. And we can come up with an even simpler, more beautiful idea for combining, you know, image tokens and pixel tokens in a way that's interfaceable for language tasks.[00:33:44] Peter Robicheaux: And this is nice because it can scale, you can come up with lots more data if you don't have to come up with all these annotations, right? So the way that it works is it does something very, very similar to PaliGemma, where you have a vision encoder that dumps image tokens into a decoder-only transformer.[00:33:59] Peter Robicheaux: But [00:34:00] the interesting thing is that it also autoregressively tries to learn the mean squared error of the image tokens.
So instead of having to come up with fancy object detection or semantic, or segment, or segmentation labels, you can just try to reconstruct the image and have it learn fine grained features that way.[00:34:16] Peter Robicheaux: And it does this in kind of, I think, a beautiful way that's kind of compatible with the PolyGemma line of thinking, which is randomly sampling a prefix line of thinking Prefix length and using only this number of image tokens as the prefix. And so doing a similar thing with the causal. So the causal with prefix is the, the attention mask on the right.[00:34:35] Peter Robicheaux: So it's doing full block attention with some randomly sampled number of image tokens to then reconstruct the rest of the image and the downstream caption for that image. And so, This is the dataset that they train on. It's image or internet scale data, very high quality data created by the data filtering networks paper, essentially which is maybe The best clip data that exists.[00:34:59] Peter Robicheaux: [00:35:00] And we can see that this is finally a model that doesn't saturate. It's even at the highest parameter count, it's, it appears to be, oh, at the highest parameter account, it appears to be improving in performance with more and more samples seen. And so you can sort of think that. You know, if we just keep bumping the parameter count and increasing the example scene, which is the, the, the line of thinking for language models, then it'll keep getting better.[00:35:27] Peter Robicheaux: So how does it actually do at finding, oh, it also improves with resolution, which you would expect for a model that This is the ImageNet classification accuracy, but yeah, it does better if you increase the resolution, which means that it's actually leveraging and finding fine grained visual features.[00:35:44] Peter Robicheaux: And so how does that actually do compared to CLIP on Cocoa? Well, you can see that if you slap a transformer detection head on it, Entry now in Cocoa, it's just 60. 2, which is also within spitting distance of Soda, which means that it does a very good job of [00:36:00] finding visual features, but you could say, okay, well, wait a second.[00:36:03] Peter Robicheaux: Clip got to 59. 1, so. Like, how does this prove your claim at all? Because doesn't that mean like clip, which is known to be clip blind and do badly on MMVP, it's able to achieve a very high performance on fine, on this fine grained visual features task of object detection, well, they train on like, Tons of data.[00:36:24] Peter Robicheaux: They train on like objects, 365, Cocoa, Flickr and everything else. And so I think that this benchmark doesn't do a great job of selling how good of a pre trained model MV2 is. And we would like to see the performance on fewer data as examples and not trained to convergence on object detection. So seeing it in the real world on like a dataset, like RoboFlow 100, I think would be quite interesting.[00:36:48] Peter Robicheaux: And our, our, I guess our final, final pick for paper of 2024 would be Moondream. So introducing Vic to talk about that.[00:36:54] swyx: But overall, that was exactly what I was looking for. Like best of 2024, an amazing job. Yeah, you can, [00:37:00] if there's any other questions while Vic gets set up, like vision stuff,[00:37:07] swyx: yeah,[00:37:11] swyx: Vic, go ahead. Hi,[00:37:13] Vik Korrapati / Moondream[00:37:13] question: well, while we're getting set up, hi, over here, thanks for the really awesome talk. 
One of the things that's been weird and surprising is that the foundation model companies, even these multimodal LLMs, they're just like worse than RT-DETR at detection still. Like, if you wanted to pay a bunch of money to auto-label your detection dataset, if you gave it to OpenAI or Claude, that would be like a big waste.[00:37:37] question: So I'm curious, just like, even PaliGemma 2, like, is worse. So, so I'm curious to hear your thoughts on, like, how come nobody's cracked the code on, like, a generalist that really, you know, beats a specialist model in computer vision like they have in, in LLM land.[00:38:00][00:38:01] Isaac Robinson: Okay. It's a very, very interesting question. I think it depends on the specific domain. For image classification, it's basically there. AIMv2 showed a simple attentional probe on the pre-trained features gets like 90%, which is as well as anyone does. The, the, the, the bigger question, like, why isn't it transferring to object detection, especially like real-time object detection.[00:38:25] Isaac Robinson: I think, in my mind, there are two answers. One is, object detection is really, really, really, the architectures are super domain-specific. You know, we see all these super, super complicated things, and it's not super easy to, to, to build something that just transfers naturally like that, whereas image classification, you know, CLIP pre-training transfers super, super quickly.[00:38:48] Isaac Robinson: And the other thing is, until recently, the real-time object detectors didn't even really benefit from pre-training. Like, you see the YOLOs that are, like, essentially saturated, showing very little [00:39:00] difference with pre-training improvements, with using a pre-trained model at all. It's not surprising, necessarily, that people aren't looking at the effects of better and better pre-training on real-time detection.[00:39:12] Isaac Robinson: Maybe that'll change in the next year. Does that answer your question?[00:39:17] Peter Robicheaux: Can you guys hear me? Yeah, one thing I want to add is just, like, or just to summarize, basically, is that, like, until 2024, you know, we haven't really seen a combination of transformer-based object detectors and fancy losses, and PaliGemma suffers from the same problem, which is basically to say that these ResNet, or like the convolutional models, they have all these, like, extreme optimizations for doing object detection, but essentially, I think it's kind of been shown now that convolutional models, like, just don't benefit from pre-training and just don't, like, have the level of intelligence of transformer models.[00:39:56] swyx: Awesome. Hi,[00:39:59] Vik Korrapati: can [00:40:00] you hear me?[00:40:01] swyx: Cool. I hear you. See you. Are you sharing your screen?[00:40:04] Vik Korrapati: Hi. Might have forgotten to do that. Let me do[00:40:07] swyx: that. Sorry, should have done[00:40:08] Vik Korrapati: that.[00:40:17] swyx: Here's your screen. Oh, classic. You might have to quit Zoom and restart. What? It's fine. We have a capture of your screen.[00:40:34] swyx: So let's get to it.[00:40:35] Vik Korrapati: Okay, easy enough.[00:40:49] Vik Korrapati: All right. Hi, everyone. My name is Vik. I've been working on Moondream for almost a year now. Like Shawn mentioned, I just went and looked, and it turns out the first version I released December [00:41:00] 29, 2023. It's been a fascinating journey. So Moondream started off as a tiny vision language model.
Since then, we've expanded scope a little bit to also try and build some tooling, client libraries, et cetera, to help people really deploy it.[00:41:13] Vik Korrapati: Unlike traditional large models that are focused on assistant-type use cases, we're laser-focused on building capabilities that developers can use to build vision applications that can run anywhere. So, in a lot of cases for vision, more so than for text, you really care about being able to run on the edge, run in real time, et cetera.[00:41:40] Vik Korrapati: So that's really important. We have different output modalities that we support. There's query, where you can ask general English questions about an image and get back human-like answers. There's captioning, which a lot of our users use for generating synthetic datasets to then train diffusion models and whatnot.[00:41:57] Vik Korrapati: We've done a lot of work to minimize hallucinations there, [00:42:00] so that gets used a lot. We have open-vocabulary object detection built in, similar to a couple of more recent models like PaliGemma, et cetera, where rather than having to train a dedicated model, you can just say show me soccer balls in this image, or show me if there are any deer in this image, and it'll detect it.[00:42:14] Vik Korrapati: More recently, earlier this month, we released pointing capability, where if all you're interested in is the center of an object, you can just ask it to point out where that is. This is very useful when you're doing UI automation type stuff. Let's see, we have two models out right now.[00:42:33] Vik Korrapati: There's a general-purpose 2B param model, which is fine if you're running on a server. It's good for our local Ollama desktop friends and it can run on flagship mobile phones. The other is a 0.5B model that uses [00:43:00] less memory even with our not yet fully optimized inference client.[00:43:06] Vik Korrapati: So the way we built our 0.5B model was to start with the 2 billion parameter model and prune it while doing continual training to retain performance. Our objective during the pruning was to preserve accuracy across a broad set of benchmarks. So the way we went about it was to estimate the importance of different components of the model, like attention heads, channels, MLP rows and whatnot, using basically a technique based on the gradient.[00:43:37] Vik Korrapati: I'm not sure how much people want to know details. We'll be writing a paper about this, but feel free to grab me if you have more questions. Then we iteratively prune a small chunk that minimizes the loss in performance, retrain the model to recover performance, and bring it back. The 0.5B we released is more of a proof of concept that this is possible.[00:43:54] Vik Korrapati: I think the thing that's really exciting about this is it makes it possible for developers to build using the 2B param [00:44:00] model and just explore, build their application, and then once they're ready to deploy, figure out what exactly they need out of the model and prune those capabilities into a smaller form factor that makes sense for their deployment target.[00:44:12] Vik Korrapati: So yeah, very excited about that. Let me talk to you folks a little bit about another problem I've been working on recently, which is similar to the clocks example we've been talking about.
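A rough sketch of the gradient-based importance idea described above, using a first-order (Taylor-style) estimate per parameter group; this is an editor's illustration under assumptions, not Moondream's actual pruning code, and the function and group names are made up for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

def group_importance(loss: torch.Tensor, groups: dict) -> dict:
    # First-order (Taylor-style) importance: |sum of grad * weight| per group,
    # i.e. the estimated change in loss if that group's parameters were zeroed.
    params = [p for ps in groups.values() for p in ps]
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    grad_of = {id(p): g for p, g in zip(params, grads)}
    scores = {}
    for name, ps in groups.items():
        total = 0.0
        for p in ps:
            g = grad_of[id(p)]
            if g is not None:
                total += float((g * p).sum())
        scores[name] = abs(total)
    return scores

# Toy usage: score two "components" on a dummy loss. A real pipeline would then
# prune the lowest-scoring chunk, retrain to recover accuracy, and repeat.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
x, y = torch.randn(32, 8), torch.randint(0, 2, (32,))
loss = F.cross_entropy(model(x), y)
groups = {"mlp_in": list(model[0].parameters()), "mlp_out": list(model[2].parameters())}
print(group_importance(loss, groups))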
We had a customer reach out who had a bunch of gauges out in the field. This is very common in manufacturing and oil and gas, where you have a bunch of analog devices that you need to monitor.[00:44:34] Vik Korrapati: It's expensive to have humans look at that, monitor stuff, and make sure that the system gets shut down when the temperature goes over 80 or something. So I was like, yeah, this seems easy enough. Happy to help you distill that, let's get it going. Turns out our model couldn't do it at all.[00:44:51] Vik Korrapati: I went and looked at other open source models to see if I could just generate a bunch of data and learn from that. Did not work either. So I was like, let's look at what the folks with [00:45:00] hundreds of billions of dollars in market cap have to offer. And yeah, that doesn't work either. My hypothesis is that the way these models are trained is using a large amount of image-text data scraped from the internet.[00:45:15] Vik Korrapati: And that can be biased. In the case of gauges, most gauge images aren't gauges in the wild, they're product images. Detail images like these, where it's always set to zero. It's paired with an alt text that says something like GIVTO, pressure sensor, PSI, zero to 30 or something. And so the models are fairly good at picking up those details.[00:45:35] Vik Korrapati: It'll tell you that it's a pressure gauge. It'll tell you what the brand is, but it doesn't really learn to pay attention to the needle over there. And so, yeah, that's a gap we need to address. So naturally my mind goes to, let's use synthetic data to solve this problem. That works, but it's problematic, because it turned out we needed millions of synthetic gauge images to get to reasonable performance.[00:45:57] Vik Korrapati: And thinking about it, reading a gauge is [00:46:00] not a zero-shot process in our minds, right? Like, if you had to tell me the reading in Celsius for this real-world gauge: there are two dials on there, so first you have to figure out which one you have to be paying attention to, the inner one or the outer one.[00:46:14] Vik Korrapati: You look at the tip of the needle, you look at what labels it's between, and you count how many ticks and do some math to figure out what that reading probably is. So what happens if we just add that as a chain of thought, to allow the model to better learn the subtasks it needs to perform to accomplish this goal?[00:46:37] Vik Korrapati: So you can see in this example, this was actually generated by the latest version of our model. It's like, okay, Celsius is the inner scale, it's between 50 and 60, there are 10 ticks, so it's the second tick. It's a little debatable here, like there's a weird shadow situation going on and the dial is off, so I don't know what the ground truth is, but it works okay.[00:46:57] Vik Korrapati: The points [00:47:00] over there are actually grounded. I don't know if this is easy to see, but when I click on those, there's a little red dot that moves around on the image. The model actually has to predict where these points are. I was originally trying to do this with bounding boxes, but then Molmo came out with pointing capabilities.[00:47:15] Vik Korrapati: And pointing is a much better paradigm to represent this. We see pretty good results. This one's actually for clock reading.
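The tick arithmetic in that chain of thought boils down to linear interpolation. A tiny sketch (editor's illustration; the 40-to-60, 2-degrees-per-tick layout in the second example is an assumption based on the numbers quoted next):

def gauge_reading(lower_label: float, upper_label: float,
                  ticks_between: int, tick_index: int) -> float:
    # Interpolate within the labelled interval the needle falls in:
    # each tick is worth (upper - lower) / ticks_between.
    step = (upper_label - lower_label) / ticks_between
    return lower_label + tick_index * step

# The example above: inner Celsius scale, between 50 and 60, 10 ticks,
# needle on the second tick.
print(gauge_reading(50, 60, 10, 2))   # 52.0
# The failure case discussed next (assuming 2-degree ticks counted up from 40):
print(gauge_reading(40, 60, 10, 7))   # 54.0, the 7th tick
print(gauge_reading(40, 60, 10, 8))   # 56.0, the 8th tick, the model's error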
I couldn't find our chart for gauge reading at the last minute, so the light blue chart is with our grounded chain of thought. We built a clock-reading benchmark of about 500 images,[00:47:37] Vik Korrapati: and this measures accuracy on that. You can see it's a lot more sample-efficient when you're training the model with the chain of thought. Another big benefit of this approach is that you can kind of understand how the model is doing it and how it's failing. So in this example, the actual correct reading is 54 Celsius, the model output [00:48:00] 56, not too bad, but you can actually go and see where it messed up. It got a lot of these right, except instead of saying it was on the 7th tick, it predicted that it was the 8th tick, and that's why it went with 56.[00:48:14] Vik Korrapati: So now that you know that it's failing in this way, you can adjust how you're doing the chain of thought to maybe say, actually count out each tick from 40, instead of just trying to say it's the eighth tick. Or you might say, okay, I see that there's that middle marking, I'll count from there instead of all the way from 40.[00:48:31] Vik Korrapati: So it helps a ton. The other thing I'm excited about is few-shot prompting or test-time training with this. If a customer has a specific gauge that we're seeing minor errors on, they can give us a couple of examples where, if it's mis-detecting the needle, they can go in and correct that in the chain of thought.[00:48:49] Vik Korrapati: And hopefully that works the next time. Now, it's an exciting approach, and we've only applied it to clocks and gauges so far. The real question is, is it going to generalize? Probably. There's some evidence [00:49:00] from text models that when you train on a broad number of tasks, it does generalize, and I'm seeing some signs of that with our model as well.[00:49:05] Vik Korrapati: So, in addition to the image-based chain-of-thought stuff, I also added some spelling-based chain of thought to help it better understand OCR, I guess. I don't understand why everyone doesn't do this, by the way. It's a trivial benchmark question that's very, very easy to nail. But I also wanted to support it for stuff like license plate partial matching, like, hey, does any license plate in this image start with WHA or whatever?[00:49:29] Vik Korrapati: So yeah, that sort of worked. All right, that ends my story about the gauges. If you think about what's going on over here, it's interesting that LLMs are showing enormous progress in reasoning, especially with the latest set of models that we've seen, but VLMs are lagging behind, as we can see with these tasks that should be very simple for a human to do [00:50:00] yet are very easy to find VLMs failing at.[00:50:04] Vik Korrapati: My hypothesis on why this is the case is that on the internet, there's a ton of data that talks about how to reason. There are books about how to solve problems, and books critiquing the books about how to solve problems. But humans are just so good at perception that we never really talk about it.[00:50:20] Vik Korrapati: Like, maybe in art books where it's like, hey, to show that that mountain is further away, you need to desaturate it a bit or whatever. But the actual data on how to look at images isn't really present. Also, the data we do have is kind of sketchy.
The best source of data we have is image alt-text pairs on the internet, and that's pretty low quality.[00:50:40] Vik Korrapati: So yeah, I think our solution here is really just that we need to teach them how to operate on individual tasks and figure out how to scale that out. All right, so, conclusion: at Moondream we're trying to build amazing VLMs that run everywhere. Very hard problem, much work ahead, but we're making a ton of progress and I'm really excited [00:51:00] about it. If anyone wants to chat about more technical details about how we're doing this, or is interested in collaborating, please hit me up.[00:51:08] Isaac Robinson: Yeah,[00:51:09] swyx: when people say multimodality, I always think about vision as the first among equals of all the modalities. So I really appreciate having the experts in the room. Get full access to Latent Space at www.latent.space/subscribe
Send us a text. Do you want to learn German? Only on October 10th: Join me for a free lesson >>> 16 German Verbs You Get by adding a Prefix to -machen
Hey followers, thanks for coming back to Gav & Em's How to English TEFL Pod. You can show your support for their show here: https://ko-fi.com/howtoenglishpod This week our TEFLing duo take on topics such as the prefix "under" and compare it to some other prefixes, Em tries her hand at being a student, Gav answers some What Would Gav Dos, and the Quiz of the Week looks at some words with Greek roots. You can find the transcription with audio here: https://share.descript.com/view/Nszlv2MiPYr And here's a link to all the shows: http://howtoenglishpod.com/ References: Pronunciation - two syllable words: https://pronuncian.com/2syllable-word-stress Quiz of the Week: https://www.proprofs.com/quiz-school/story.php?title=mte2otc2oqwroj Timestamps (00:00) Intro and house keeping (05:35) Under (09:18) Becoming a student (15:17) Making mistakes (25:11) What would Gav dos (32:00) Quiz of the Week (39:55) Outro
IPv6 Buzz welcomes back Nick Buraglio, a frequent guest, to discuss RFC 9637. We get into the details of RFC 9637, which describes the new documentation prefix space for IPv6. We also explore the process of how RFCs go from idea to standard in the IETF. (Cue the “I’m Just a Bill” song from Schoolhouse... Read more »
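The documentation prefix space can be checked programmatically. The sketch below is the editor's illustration and assumes 2001:db8::/32 (RFC 3849) plus the larger block added by RFC 9637, understood here to be 3fff::/20; verify against the RFC text before relying on it.

import ipaddress

# Documentation prefixes: 2001:db8::/32 (RFC 3849) plus the larger block added
# by RFC 9637, assumed here to be 3fff::/20 (verify against the RFC).
DOC_PREFIXES = [
    ipaddress.ip_network("2001:db8::/32"),
    ipaddress.ip_network("3fff::/20"),
]

def is_documentation(addr: str) -> bool:
    # True if the address falls inside any documentation prefix.
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in DOC_PREFIXES)

print(is_documentation("2001:db8::1"))    # True
print(is_documentation("3fff:0:0:1::1"))  # True, assuming the /20 above is right
print(is_documentation("2001:4860::1"))   # False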
Das Python Data Model, 19 July 2024, Jochen. For some time now we have been getting feedback that we should talk more directly about Python itself
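Since the episode is about Python's data model, here is a small illustrative example (the editor's own, not taken from the show): implementing a couple of dunder methods is enough to make a class cooperate with built-in syntax like len(), indexing, and the in operator.

class Deck:
    # Implementing __len__ and __getitem__ is enough to make a class work with
    # len(), indexing, slicing, iteration, and the `in` operator.
    suits = "♠ ♥ ♦ ♣".split()
    ranks = [str(n) for n in range(2, 11)] + list("JQKA")

    def __init__(self):
        self._cards = [f"{rank}{suit}" for suit in self.suits for rank in self.ranks]

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]

deck = Deck()
print(len(deck))           # 52
print(deck[0], deck[-1])   # first and last card
print("A♣" in deck)        # True, via the __getitem__-based iteration protocol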
This voter disenfranchisement game the "concerned Democrats" are playing is a Russian Roulette pistol with a bullet in every chamber. Some of us have reason to be worried, even if wealthy white men who'll be OK either way DON'T.
Software Engineering Radio - The Podcast for Professional Software Developers
Wolf Vollprecht, the CEO and founder of Prefix.dev, speaks with host Gregory M. Kapfhammer about how to implement Python tools, such as package managers, in the Rust programming language. They discuss the challenges associated with building Python infrastructure tooling in Python and explore how using the Rust programming language addresses these concerns. They also explore the implementation details of Rust-based tooling for the Python ecosystem, focusing on the cross-platform Pixi package management tool, which enables developers to easily and efficiently install libraries and applications in a reproducible fashion. Brought to you by IEEE Computer Society and IEEE Software magazine.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller, published by Henry Cai on June 16, 2024 on The AI Alignment Forum. In this paper, we are trying to control model behaviors. For example, by saying "You hear someone making fun of a topic you're passionate about", we can control an LLM to behave in an angrier manner. We can also control "any" behavior of an LLM by simply defining a one-liner of description. The teaser below shows the scope of our method -- SelfControl. TL;DR: We propose a novel framework, Self-Control, to control LLMs' behaviors. By appending suffix strings, e.g. "Is the above response helpful? Please answer Yes or No", to self-judge and optimizing the corresponding suffix score, we obtain the suffix gradients w.r.t. the model hidden states and directly modify the states to control model behaviors on the fly. We then compress the gradients into a Prefix Controller, to enable controlling for any behavior target without additional cost. Our experiments demonstrate its efficacy, and the exploratory study hints at some potential mechanistic interpretability using suffix gradients. Tweet thread summary: link Colab demo: link Github link: code Summary of Paper There are two parts in our framework of SelfControl. The first part is a training-free method and the second part is a parameter-efficient module. The idea of the first part is straightforward -- we wanted to control model behaviors through representation/activation engineering[1], but in a different way from the RepE paper. We thought gradients may be more flexible and provide more possibilities. Thus we tried appending some strings and then obtaining the gradients using the so-called "suffix score", which is free from the need to collect an annotated dataset. We call them "suffix gradients". This, by the way, picked up the topic of "self-improve/self-judgment", which has garnered much interest. Based on this idea, we built up an iterative framework: 1) we define the control direction by selecting the suffix string and target (step 2 in the figure); 2) we branch the first token and sample the response with the highest suffix score at each iteration (steps 1/4 in the figure); and 3) we obtain gradients based on that response, find a proper step size for the gradients, and then control the model (add them to the hidden states at the positions of the input tokens, step 3 in the figure). Steps 3 and 4 form the iteration loop. The optimization objective is thus to maximize the suffix score (see the sketch after this entry), where H_{input} is the input hidden states with the suffix gradients. Specifically, we use the original (uncontrolled) model for suffix score evaluation. We were also interested in compressing these found gradients into another module. It is similar to the idea of LoRRA in the RepE paper[2] and a parallel work, whereas we were more interested in learning a prefix. By gathering suffix gradients obtained from the first part, we trained a Prefix Controller by minimizing the mean squared error between the hidden states (latent representations) from the corresponding layers. To ensure the quality of the training data (suffix gradients), we filtered them by their norms and by the suffix score of the output when controlling with those gradients. Below are some of the results. SelfControl achieves good performance on various tasks.
Specifically, it can also serve as a data synthesis method (see the DPO experiment): We also carried out some exploratory studies on suffix gradients, and we are especially interested in the study of gradients' norm patterns across different tasks: Overall, our experiments and analysis show that SelfControl is able to control a wide range of model behaviors, and can potentially be applied to other areas, including alignment (the DPO experiment) and mechanistic interpreta...
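The suffix-score objective mentioned in the entry above is reconstructed here from the surrounding prose as a hedged sketch; the notation is the editor's assumption and the exact form in the paper may differ:

\max_{\delta}\; S\bigl(H_{\mathrm{input}} + \delta\bigr),
\qquad
S(H) \;=\; \log p_{\theta}\bigl(\text{"Yes"} \mid \text{prompt},\ \text{response}(H),\ \text{suffix}\bigr)

with an iterative update of the form \delta \leftarrow \delta + \alpha\, \nabla_{H_{\mathrm{input}}} S, where the suffix score is evaluated with the original (uncontrolled) model.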
Send us a Text Message. A nice Tuesday crossword, and, as detailed in today's episode, one eight years in the making!! So, even though PERSEVERANCE is not in the grid, it truly is in the heart of Chris Leatherberry, the constructor of today's fine work. We appreciated all the longer-wavelength color references in today's grid, namely 54D, Famed fireman Red, ADAIR, in close proximity to 55D, Reddish application to cheeks, ROUGE, and kitty-corner to 12D, More pink, RARER. We also enjoyed 69A, Prefix with brow, UNI, as we are big fans of Ernie, Oscar the Grouch, Sam Eagle, and other assorted muppets. Show note imagery: RODAN, probably on the prowl for Godzilla. Contact Info: We love listener mail! Drop us a line, crosswordpodcast@icloud.com. Also, we're on FaceBook, so feel free to drop by there and strike up a conversation!
My links: My patreon: https://www.patreon.com/user?u=103280827 My Ko-fi: https://ko-fi.com/rhetoricrevolution Send me a voice message!: https://podcasters.spotify.com/pod/show/liam-connerly TikTok: https://www.tiktok.com/@mrconnerly?is_from_webapp=1&sender_device=pc Email: rhetoricrevolution@gmail.com Instagram: https://www.instagram.com/connerlyliam/ Podcast | Latin in Layman's - A Rhetoric Revolution https://open.spotify.com/show/0EjiYFx1K4lwfykjf5jApM?si=b871da6367d74d92 Gut Guardian Discount Code: LIAM64728 Epiphany: Etymology: From Greek "epiphaneia," meaning manifestation or appearance. Significance: Represents a sudden, profound realization or insight. Epistemology: Etymology: Derived from Greek "epistēmē," meaning knowledge. Significance: Refers to the branch of philosophy that explores the nature and limits of human knowledge. Epilogue: Etymology: Comes from Greek "epilogos," meaning conclusion. Significance: The concluding section of a literary work, providing closure or reflection. Epitome: Etymology: Rooted in Greek "epitomē," meaning abridgment or summary. Significance: Represents a perfect example or embodiment of a particular quality. Epistolary: Etymology: Derived from Greek "epistolē," meaning letter. Significance: Relates to the form of communication through letters or literary works in the form of letters. Epiphysis: Etymology: From Greek "epiphysis," meaning growth upon. Significance: In anatomy, refers to the growth plate at the end of long bones in children. Epigenetics: Etymology: Combines Greek "epi-" (over, above) with genetics. Significance: Study of heritable changes in gene function that do not involve changes to the underlying DNA sequence. Epicenter: Etymology: From Greek "epi-" (upon) + "kentron" (center). Significance: The point on the Earth's surface directly above the earthquake's point of origin. Epistaxis: Etymology: Derived from Greek "epi-" (upon) + "stazein" (to drip). Significance: Medical term for nosebleed. Epithet: Etymology: Comes from Greek "epitheton," meaning attributed or added. Significance: A descriptive term or phrase expressing a quality characteristic of the person or thing mentioned.
Knox New York Deep & Soulful 267 01 - Swanky - Nothin' Serious - 2Sleep 2024 Rework 02 - Masters At Work, Louie Vega, Kenny Dope - MAW Shonuff (Original Mix) 03 - KenLou, Louie Vega, Kenny Dope - Gimme Some More (KenLou Mix) 04 - Masters At Work, Kenny Dope - The Ha Drop (Kenny Dope Remix) 05 - Deon Cole, Terry Hunter, Terisa Griffin - Where The Freaks At (Terry Hunter Freaky A** Dub) 06 - Knox - Discolicious 07 - Wade Teo, Charles Rickards, Kenny Dope - La De Da Feat. Kameelah Waheed (Kenny Dope Remix Main) 08 - Sterling Ensemble, Sara Devine, Kenny Dope - Joy (Kenny Dope Remix) 09 - Prefix one - In the flow 10 - Kenny Dope, The Bucketheads, Massivedrum - The Bomb! (These Sounds Fall Into My Mind) (Massivedrum Remix)
Talk Python To Me - Python conversations for passionate developers
On this episode we have Wolf Vollprecht and Ruben Arts from the pixi project here to talk about pixi, a high performance package manager for Python and other languages that actually manages Python itself too. They have a lot of interesting ideas on where Python packaging should go and are putting their time and effort behind them. Will pixi become your next package manager? Listen in to find out. Links from the show Black Friday at Talk Python: talkpython.fm/blackfriday Guests Wolf Vollprecht: github.com/wolfv Ruben Arts: github.com/ruben-arts pixi: prefix.dev Prefix: prefix.dev Launching pixi: prefix.dev Conda: docs.conda.io Conda Forge: conda-forge.org NixOS: nixos.org Packaging Con 2023: packaging-con.org Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy Sponsors Posit Python Tutor Talk Python Training
DHCPv6 Prefix Delegation (DHCPv6-PD) is an IETF RFC that lets one router delegate a long-lived prefix, using DHCP, to a requesting router. What’s the need for this? As the RFC notes, some applications expect stable addresses. It also notes: It is appropriate for situations in which the delegating router does not have knowledge about the... Read more »
DHCPv6 Prefix Delegation (DHCPv6-PD) is an IETF RFC that lets a router delegate a long-lived prefix, using DHCP, to a requesting router. The hosts discuss how this is used today both by service providers and in the enterprise, and potential impacts on address allocation and planning. The post IPv6 Buzz 138: Making Sense Of DHCPv6 Prefix Delegation (DHCPv6-PD) appeared first on Packet Pushers.
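As a rough illustration of what a requesting router does with a delegated prefix (an editor's sketch using Python's ipaddress module and a documentation prefix, not router code): it carves the delegated /56 into /64s for its LAN interfaces.

import ipaddress

# A requesting router that received 2001:db8:abcd:100::/56 via DHCPv6-PD can
# carve it into /64s for its LAN interfaces (documentation prefix, illustrative).
delegated = ipaddress.ip_network("2001:db8:abcd:100::/56")

lan_prefixes = list(delegated.subnets(new_prefix=64))
print(len(lan_prefixes))        # 256 possible /64s from one delegated /56
for net in lan_prefixes[:3]:
    print(net)                  # 2001:db8:abcd:100::/64, :101::/64, :102::/64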
942. We're diving deep into the chameleon-like nature of the "a-" prefix, tracing its journey from Latin, where it often started out as "ad-," to its function as a preposition in French, and its transformative role in Greek that gifts English words like "atypical" and "asymmetrical." You'll be wowed by the versatility of the seemingly humble "a-" prefix as we unveil its covert presence in words like "atom" and its power in creating modern English words like "asexual."Then, we explore the difference between the words "personal" and "personnel" and give you a tip for getting the spelling right every time.The "a-" prefix segment was by Kirk Hazen, a data scientist at CVS Health and a linguist at West Virginia University. He is the author of Introduction to Language (Wiley) and can be found on LinkedIn: https://www.linkedin.com/in/kirk-hazen-phd/| Transcript: https://grammar-girl.simplecast.com/episodes/personnel/transcript| Preorder "The Grammar Daily"| Subscribe to the newsletter for regular updates.| Watch my LinkedIn Learning writing courses.| Peeve Wars card game. | Grammar Girl books. | HOST: Mignon Fogarty| VOICEMAIL: 833-214-GIRL (833-214-4475) or https://sayhi.chat/grammargirl| Grammar Girl is part of the Quick and Dirty Tips podcast network.Audio Engineer: Nathan SemesDirector of Podcasts: Adam CecilAdvertising Operations Specialist: Morgan ChristiansonMarketing Associate: Davina TomlinDigital Operations Specialist: Holly Hutchings| Theme music by Catherine Rannus.| Grammar Girl Social Media Links: YouTube. TikTok. Facebook. Instagram. LinkedIn. Mastodon.
Support the show by becoming a patron: Patreon.com/thebpdshow
In this English lesson I'll help you learn 8 new vocabulary words and 8 words that mean the exact opposite. In English, when you add the prefix DIS- to some words, it will create a new word with the opposite meaning. In this English class I'll help you learn the new words and their meanings. In this video you'll learn: agree and disagree, obey and disobey, satisfaction and dissatisfaction, appear and disappear, honest and dishonest, like and dislike, connect and disconnect, and advantage and disadvantage. I hope you enjoy this English lesson about the prefix DIS! Have a great day! Note: This is the audio portion of a Youtube English lesson which you can watch right here: https://www.youtube.com/watch?v=iyWXo9ZVkdA or by searching Youtube for, "Bob the Canadian Prefix DIS" Support the show
The word “inflammation” is derived from the Latin verb “inflammare”, which means “to set on fire”. This provides insight into the actual definition of inflammation, which is a protective response of the body to injury or infection. It is characterized by redness, warmth, swelling, and pain, and is the body's attempt to remove harmful stimuli, such as damaged cells, irritants, or pathogens, and to begin the healing process. 1. Inadequate: Not sufficient, lacking in quality. Etymologically, this word comes from the Latin in- and adaequare, meaning "not equal". 2. Inanimate: Not alive; without life or animation. Etymologically, this word comes from the Latin in- and anima, meaning "without spirit". 3. Inaudible: Not able to be heard. Etymologically, this word comes from the Latin in- and audire, meaning "not to hear". 4. Illogical: Not rational, not based on sound reasoning. Etymologically, this word comes from the Latin illogicus, meaning "not reasonable". 5. Immaterial: Not composed of physical matter; having no material form. Etymologically, this word comes from the Latin in- and materia, meaning "without matter". 6. Impossible: Not able to be done or accomplished. Etymologically, this word comes from the Latin in- and possibilis, meaning "not able to be done". 7. Inaction: not taking action; inactive. Etymologically, it comes from the Latin, “inactus”, meaning “not active”. 8. Inadaptable: not able to adjust. Etymologically, it comes from the Latin, “inadaptabilis”, meaning “not able to be adapted”. 9. Impertinent: not appropriate; rude. Etymologically, it comes from the Latin, “impertinens”, meaning “not pertinent”. 10. Illiterate: not able to read; ignorant. Etymologically, it comes from the Latin, “illiteratus”, meaning “not literate”. 11. Impenetrable: not penetrable; impossible to understand. Etymologically, it comes from the Latin, “impenetrabilis”, meaning “not able to be penetrated”. 12. Impolite: not polite; rude. Etymologically, it comes from the Latin, “impolitus”, meaning “not polished”. 13. Incompatible: not compatible; unable to coexist. Etymologically, it comes from the Latin, “incomparabilis”, meaning “not equal”. 14. Impractical: not practical; not useful. Etymologically, it comes from the Latin, “impracticus”, meaning “not able to be done”. --- Support this podcast: https://podcasters.spotify.com/pod/show/liam-connerly/support
Watch on YouTube About the show Sponsored by Microsoft for Startups Founders Hub. Michael #1: Prefix-cache via Brendan Hannigan You can set the PYTHONPYCACHEPREFIX environment variable or use the -X pycache_prefix command-line option, and then instead of creating tons of __pycache__ folders to store your *.pyc files right next to the source code, Python puts them in the specified folder. Introduced in Python 3.8. Brian #2: NiceGUI Suggested by several listeners Browser based GUI “NiceGUI is an easy-to-use, Python-based UI framework, which shows up in your web browser. You can create buttons, dialogs, Markdown, 3D scenes, plots and much more. It is great for micro web apps, dashboards, robotics projects, smart home solutions and similar use cases. You can also use it in development, for example when tweaking/configuring a machine learning algorithm or tuning motor controllers.” - from the README Michael #3: flask-ngrok A simple way to demo Flask apps from your machine. Makes your Flask apps running on localhost available over the internet via ngrok. Great for testing API consumers too.

from flask import Flask
from flask_ngrok import run_with_ngrok

app = Flask(__name__)
run_with_ngrok(app)  # Start ngrok when the app is run

# Endpoints ...

if __name__ == '__main__':
    app.run()

Brian #4: No-async async with Python Will McGugan Allowing async while not requiring async Await me (maybe) borrowed from Simon Willison's The “await me maybe” pattern for Python asyncio Optionally awaitable Providing API methods that can be called by both async and non-async code. The called method really is async, but if a caller doesn't want to know when the code is done, it can ignore the return value and not await. MK: I had to solve a similar problem in fastapi-chameleon MK: Syncify async functions. Extras: Brian: PyPI has a blog Docker no longer sunsetting free team plan Jokes: Long-lived software Mysteries make life more interesting (last paragraph, discussing the cov fixture of pytest-cov)
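For the pycache-prefix item above (Michael #1), a quick way to confirm the setting in a script (Python 3.8+; the /tmp path is just an example):

import sys

# Run as:  PYTHONPYCACHEPREFIX=/tmp/pycache python check.py
# or:      python -X pycache_prefix=/tmp/pycache check.py
print(sys.pycache_prefix)  # the configured cache root, or None if unset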
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some common confusion about induction heads, published by Alexandre Variengien on March 28, 2023 on LessWrong. Epistemic status: conceptual discussion and opinions informed by doing 6 months of interpretability research at Redwood Research and exchanging with other researchers, but I'm just speaking for myself. Induction heads are defined twice by Anthropic. The first time as a mechanism in 2L attention-only transformers A second time as a behavioral description on repeated random sequences of tokens However, these two definitions rely on distinct sources of evidence and create confusion, as their difference is not always acknowledged when people cite these papers. The mechanistic definition applies to toy language models, while the behavioral definition is a useful yet incomplete characterization of attention heads. I think that many people are in fact confused by this: I have talked to many people who aren't clear on the fact that these two concepts are different, and incorrectly believe that (e.g.) the mechanism of induction heads in larger language models has been characterized. More specifically, the two Anthropic papers introduce the following two distinct definitions of induction heads: Mechanistic: The first definition, introduced by Elhage et al., describes a behavior in a 2 layer attention-only model (copying a token given a matching prefix) and a minimal mechanism to perform this behavior (a set of paths in the computational graph and a human interpretation of the transformation along those paths). Empirically, this mechanism seems to be the best possible short description of what those heads are doing (i.e. if you have to choose a subgraph made of a single path as input for the keys, queries, and values of these heads, the induction circuit is likely to be the one that recovers the most loss). But this explanation does not encompass everything these heads do. In reality, many more paths are used than the one described (see Redwood's causal scrubbing results on induction heads) and the function of the additional paths is unclear. I don't know whether the claims about the behavior and mechanisms of these heads are best described as “mostly true but missing details” or “only a small part of what's going on”. See also Buck's comment for more discussion on the interpretation of causal scrubbing recovered loss. Behavioral: The second definition, introduced by Olsson et al., relies on head activation evaluators (measuring attention patterns and head output) on out-of-distribution sequences made of Repeated Random Tokens (RRT). Two scores are used to characterize induction heads: i) Prefix matching: attention probability to the first occurrence of the token [A] on patterns like [A][B] . [A] ii) Copying: how much the head output increases the logit of [A] compared to the other logits. The RRT distribution was chosen so that fully abstracted induction behavior is one of the few useful heuristics to predict the next token. In this post, I'll use mechanistic or behavioral induction heads to differentiate between the definitions. I'll present three points that — I think — are important to keep in mind when using these definitions. 1 - The two-head mechanism (induction head and previous token head) described in Elhage et al. 
is the minimal way to implement an induction head As noted in the paper, induction heads can use more complicated mechanisms. For instance, instead of relying on a previous token head to match only one token as a prefix (the token [A] in the example above), they could rely on a head that attends further back to match longer prefixes (e.g. patterns like [X][A][B] . [X][A]). Empirically, evidence for induction heads using some amount of longer prefix matching has been observed in the causal scrubbing experiments on induction. The two-head mechanism i...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Behavioral and mechanistic definitions (often confuse AI alignment discussions), published by Lawrence Chan on February 20, 2023 on The AI Alignment Forum. TL;DR: It's important to distinguish between behavioral definitions – which categorize objects based on outside observable properties – and mechanistic definitions – which categorize objects based on their internal mechanisms. In this post, I give several examples of terms which can be defined either behaviorally and mechanistically. Then, I talk about the pros and cons of both kinds of definitions, and how this distinction relates to the distinction between gears-level versus black-box models. Related to: Most similar to John Wentworth's Gears and Behaviors, but about definitions rather than models. Also inspired by: Gears in understanding, How an algorithm feels from the inside, the “Human's Guide to Words” Sequence in general. Epistemic status: written quickly instead of not at all. Introduction: Broadly speaking, when pointing at a relatively distinct cluster of objects, there's two ways to define membership criteria: Behaviorally: You can categorize objects based on outside observable properties, that is, their behavior in particular situations. Mechanistically: Alternatively, you can categorize objects via their internal mechanisms. That is, instead of only checking for a particular behavioral property, you instead look for how the object implements said property. Many AI safety concepts have both behavioral and mechanistic definitions. In turn, many discussions about AI safety end up with the participants confused or even talking past each other. This is my attempt to clarify the discussion, by giving examples of both, explaining the pros and cons, and discussing when you might want to use either. Three examples of behavioral and mechanistic definitions To better illustrate what I mean, I'll give two examples from recent ML work and a third from the sequences. Induction heads First introduced in a mathematical framework for transformer circuits, induction heads are transformer attention heads that implement in-context copying behavior. However, there seem to be two definitions that are often conflated: Behavioral: Subsequent papers (In-context Learning and Induction Heads, Scaling laws and Interpretability of Learning from Repeated Data) give a behavioral definition of induction heads: Induction heads are heads that score highly on two metrics on repeated random sequences of the form [A] [B] . [A]: Prefix matching: attention heads pay a lot of attention to the first occurrence of the token [A]. Copying: attention heads increase the logit of [B] relative to other tokens. This definition is clearly behavioral: it makes no reference to how these heads are implemented, but only to their outside behavior. Mechanistic: In contrast, the original mathematical framework paper also gives a mechanistic definition for induction heads: induction heads are heads that implement copying behavior using either Q- or K-composition. While this definition does make some reference to outside properties (induction heads implement copying), the primary part is mechanistic and details how this copying behavior is implemented. 
However, it turns out that the two definitions don't overlap perfectly: behavioral induction heads are often implementing many other heuristics, even in very small language models. I often talk to people who confuse the two definitions and think that we understand much more about the internal mechanisms of large language models than we actually do. In a forthcoming post, Alexandre Variengien discusses the distinction between these two definitions in more detail, while also highlighting specific confusions that may arise from failing to distinguish the two definitions. Different framings of inner and...
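To make the behavioral metrics in the two entries above concrete, here is a toy sketch of computing prefix-matching and copying scores on a repeated random token sequence. It is the editor's illustration: the position convention (attending to the token after the earlier occurrence) follows one common reading of the metrics and may differ from the papers' reference definitions, and the inputs are dummy arrays rather than real model activations.

import numpy as np

def rrt_scores(tokens, attn, logit_delta):
    # tokens: shape (2*L,), one random block of length L repeated twice.
    # attn: shape (2*L, 2*L), the head's attention pattern (rows = queries).
    # logit_delta: shape (2*L,), how much the head's output at each query
    #   position boosts the logit of the induction target there.
    L = len(tokens) // 2
    prefix_matching, copying = [], []
    for q in range(L, 2 * L):            # queries in the second copy
        earlier = q - L                  # earlier occurrence of tokens[q]
        target = earlier + 1             # the token that followed it
        prefix_matching.append(attn[q, target])
        copying.append(logit_delta[q])
    return float(np.mean(prefix_matching)), float(np.mean(copying))

# Dummy inputs: with a uniform attention pattern, prefix matching is ~1/(2L).
L = 16
tokens = np.tile(np.random.randint(0, 1000, L), 2)
attn = np.full((2 * L, 2 * L), 1.0 / (2 * L))
print(rrt_scores(tokens, attn, np.zeros(2 * L)))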
Foundations of Amateur Radio A is for Antenna, the eyes and ears of any amateur station. You'll spend eighty percent of your life attempting to get twenty percent improvement for any antenna you'll ever use. B is for Balun, bringing together the balanced and unbalanced parts of your antenna system. C is for Coax, the versatile conductor that snakes into your station, one roll at a time. D is for Dipole, the standard against which all antennas are measured, simple to make, simple to use and often first in the many antenna experiments you'll embark on in your amateur journey. E is for Electron, source of all things RF, the beginning, middle and end of electromagnetism, the reason you are an amateur. F is for Frequency, the higher you go, the faster it happens. G is for Gain, measured against a baseline, you'll throw increasing amounts of effort at getting more, one decibel at a time. H is for Hertz, Heinrich to his mother, the first person to transmit and receive controlled radio waves in November of 1886 proving that James Clerk Maxwell's theory of electromagnetism was correct. I is for Ionosphere, the complex and ever changing layers that surround Earth which led radio amateurs to discover HF propagation in 1923. J is for JOTA, the Jamboree On The Air where radio amateurs, guides and scouts come together on the third full weekend of October to share global communications. K is for Kerchunk, the sound caused by the local repeater that brings a smile to the operator and a grimace to the listener, created by pushing the talk button and not saying anything. L is for Logging, the only way you'll ever remember who you spoke to and when and the perfect excuse for bragging to your friends after you managed to collect contacts all over the globe. M is for Modulation, adding information to a radio signal by varying the amplitude, frequency, or phase. N is for Net, a social excuse for getting on air and making noise with your friends. O is for Oscillator, making repeating currents or voltages by non-mechanical means. P is for Prefix, the beginning part of an amateur callsign that identifies your country or region of origin. Q is for QRP, the best way to make just enough noise to make yourself heard, low power is the way to go! R is for Resonance, the point where a circuit responds strongly to a particular frequency and less to others, used every time you tune a radio or an antenna or both. S is for Shack, the space you call home, where you live your radio dream. The size of the corner of the kitchen table, the back-seat of your car or a purpose built structure with never enough space, no matter how much you try. T is for Transceiver, a single box that contains both a transmitter and receiver that share a common circuit. U is for UTC, Coordinated Universal Time, the only time zone that radio amateurs should use for any activity that goes beyond their suburb. V is for VFO, the Variable Frequency Oscillator that provides radio amateurs with frequency agility, the means to listen anywhere, any-time. W is for Waterfall, which displays radio signals across multiple frequencies at the same time. X is for XIT, Transmit Incremental Tuning, changing your transmitter frequency whilst listening on the same frequency, helpful when you're trying to break through a DX pile-up. Y is for Yagi, or Yagi-Uda antenna, the most popular directional antenna invented in 1926 by Shintaro Uda at the Tohoku Imperial University in Japan and popularised to the English speaking world by his boss Hidetsugu Yagi. 
Z is for Zulu, the last word in the phonetic alphabet that every amateur should know and use. 73 is for best regards. Saying goodbye is hard to do, this says so without fanfare and clears your station from the air. I'm Onno VK6FLAB
fix; fix + ture; cruci + fix; af + fix; pre + fix; trans + fix; suf + fix; --- Support this podcast: https://anchor.fm/liam-connerly/support
endocrine (adj.) "secreting internally," endo- + Latinized form of Greek krinein "to separate, distinguish". certain (adj.) c. 1300, "determined, fixed," from Old French certain "reliable, sure, assured" (12c.), from Vulgar Latin *certanus, extended form of Latin certus "determined, resolved, fixed, settled," of things whose qualities are invariable, "established," also "placed beyond doubt, sure, true, proved; unerring, to be depended upon" (also source of Old French cert, Italian certo, Spanish cierto), originally a variant past participle of cernere "to distinguish, decide," literally "to sift, separate." This Latin verb comes from the root *krei- "to sieve," thus "discriminate, distinguish.” endocrinology (n.) 1917, from endocrine + -ology. Related: Endocrinologist. endorse (v.) c. 1400, endosse "confirm or approve" endow (v.) late 14c., indowen "provide an income for," from Anglo-French endover, from en- "in" + Old French douer "endow," from Latin dotare "to endow, bestow, portion," from dos (genitive dotis) "marriage portion." endogenous (adj.) "growing or proceeding from within," especially with reference to a class of plants including cereals, palms, plantains, etc., 1822, from endo- "within" + -genous "producing." endorphin (n.) "chemical which occurs naturally in the brain and works like morphine," 1975, from French endorphine. First element from endogène "endogenous, growing within." endometrium (n.) "lining membrane of the uterus," 1882, medical Latin, from endo- + Greek mētra "uterus," related to mētēr "mother" (see mother (n.1)). Related: Endometrial (1870). endoskeleton (n.) 1838, from endo- + skeleton. ENDOSCOPY endo- word-forming element meaning "inside, within, internal," from Greek endon "in, within." -scopy word-forming element meaning "viewing, examining, observing," from Modern Latin -scopium, from Greek -skopion, from skopein "to look at, examine." --- Support this podcast: https://anchor.fm/liam-connerly/support
pathos; pathetic; path + ology; patho + meter; a + pathy; anti +pathy; sym + pathy; tele + pathy; patho + genic; patho + phobia; patho + mania; em + pathy; --- Support this podcast: https://anchor.fm/liam-connerly/support
Here, I go into a bit of an introduction until about the 6 minute mark, where I transition into the linguistics! trans + port; trans + act(ion); tran + script(um); trans + fer(o); trans + form(o); trans + plant(a); trans + parent; tran + scend(o); trans + con + tinental(teneo); --- Support this podcast: https://anchor.fm/liam-connerly/support
Array Cast - October 28, 2022 Show NotesMany thanks to Bob Therriault for gathering these links:[00] 00:11:30 Episode 36 https://www.arraycast.com/episodes/episode36-what-makes-an-array-language[01] 00:01:20 J wiki (prototype) https://code2.jsoftware.com/wiki/Main_Page[02] 00:02:13 Dyalog User meeting videos https://dyalog.tv/Dyalog22/ APLNAATOT Podcast https://www.youtube.com/watch?v=R_dpMVyyCEo&list=PLYKQVqyrAEj8Q7BdOgakZCAGf6ReO1cue https://abrudz.github.io/aplnaatot/[03] 00:05:56 Intermediate q learning material https://github.com/qbists/studyq https://community.kx.com/t5/kdb-and-q/Q-For-Problems-Episode-4/td-p/13254[04] 00:08:20 CBQN REPLXX XXXX[05] 00:10:22 iPython REPL https://ipython.org/[06] 00:11:30 Conor's Venn Diagram https://github.com/codereport/array-language-comparisons[07] 00:15:33 TI-BASIC https://en.wikipedia.org/wiki/TI-BASIC[08] 00:19:40 Scan primitive https://aplwiki.com/wiki/Scan Reduce primitive https://aplwiki.com/wiki/Reduce[09] 00:25:13 q (-) . x y link to study q https://github.com/qbists/studyq[10] 00:29:20 Range between two numbers Episode 15 https://www.arraycast.com/episodes/episode15-tacit-3-and-other-topics Stephen's blog post XXXX[11] 00:34:06 Fold in J https://code.jsoftware.com/wiki/Vocabulary/fcap[12] 00:34:50 Prefix in J https://code.jsoftware.com/wiki/Vocabulary/bslash Suffix in J https://code.jsoftware.com/wiki/Vocabulary/bslashdot[13] 00:37:50 Iverson Notation https://apl.wiki/Iverson_Notation[14] 00:39:00 Remora https://www.ccs.neu.edu/home/jrslepak/typed-j.pdf Nial https://www.nial-array-language.org/[15] 00:43:35 Vector Notation https://aplwiki.com/wiki/Strand_notation[16] 00:46:50 Sigma Sum https://en.wikipedia.org/wiki/Summation Pi Product https://en.wikipedia.org/wiki/Multiplication#Capital_pi_notation Inner Product https://en.wikipedia.org/wiki/Dot_product[17] 00:48:17 Prefix Functions https://en.wikipedia.org/wiki/Polish_notation Infix Functions https://en.wikipedia.org/wiki/Infix_notation Outer Product https://en.wikipedia.org/wiki/Outer_product Scalar Extension https://aplwiki.com/wiki/Scalar_extension Table in J https://code.jsoftware.com/wiki/Vocabulary/slash#dyadic[18] 00:51:50 Arthur Whitney https://aplwiki.com/wiki/Arthur_Whitney k programming language https://k.miraheze.org/wiki/Learning_Resources q programming language https://code.kx.com/q/ lisp programming language https://en.wikipedia.org/wiki/Lisp_(programming_language)[19] 00:53:12 Compression in APL https://aplwiki.com/wiki/Replicate[20] 00:55:21 BQN Strand Notation https://mlochbaum.github.io/BQN/doc/arrayrepr.html#strands[21] 00:57:20 APL2 https://aplwiki.com/wiki/APL2[22] 00:58:10 Roger Hui https://aplwiki.com/wiki/Roger_Hui[23] 01:02:50 Matlab https://www.mathworks.com/products/matlab.html[24] 01:13:12 Implicit Map https://aplwiki.com/wiki/Scalar_extension[25] 01:23:05 I programming language https://aplwiki.com/wiki/I[27] 01:25:10 Nial Atlases https://www.nial-array-language.org/ndocs/dict/#atlas[28] 01:26:04 APL+ https://aplwiki.com/wiki/APL*PLUS[29] 01:31:07 Selfie https://aplwiki.com/wiki/Commute[30] 01:33:10 APL/? Introduction of J https://www.jsoftware.com/papers/J1990.htm[31] 01:37:40 Ferrari refutes the decline of the West https://www.caranddriver.com/features/a15142347/ferrari-reinvents-manifest-destiny-pj-orourke-and-a-ferrari-308gts-archived-feature/[33] 01:45:41 Wolfram language https://www.wolfram.com/language/[34] 01:46:41 contact AT ArrayCast DOT com Conor's Github table https://github.com/codereport/array-language-comparisons/blob/main/Iversonian_vs_Array.md
Imagine growing up near Berlin, getting into dance music, going to Berghain for the first time and getting your mind blown by techno. Then, ten years later, you become a resident DJ at the fabled club. Not many people—any people?—can claim this story, but for Fadi Mohem, what would be so many DJs' dreams became a reality. After getting his 2017 debut 12-inch Reckless in the hands of all the right people in Berlin, Mohem established a bold, bouncy techno sound with releases on Modeselektor's Seilscheibenpfeiler, Ben Klock's Klockworks and FJAAK's SPANDAU20, pretty much the cream of the crop of modern techno. Now, this year, he's becoming one of Berghain's newest residents, alongside other exciting names like Sedef Adasï and Naty Seres (and Lakuti upstairs at Panorama Bar). As heard on his recent collaboration with Ben Klock, Mohem has modern techno down to a science, thanks to a combination of reverence for the old-school and clever rhythmic touches, like the irresistible snare pattern on "Prefix." His RA Podcast sounds like what you might expect from a new generation of Berghain resident: aerodynamic, heavy and, honestly, a little midtempo compared to a lot of other young techno DJs. It's the sound the club has made world-famous, with cuts from Reeko, Heiko Laux and Truncate, plus some special moments from aya, DJ Deeon and Petar Dundov. It takes a certain kind of DJ to get to this hallowed place, and Fadi Mohem deserves it. @fadimohem Read more: https://ra.co/podcast/842