A Steam Deck prototype shows up on eBay, Microsoft remade Quake II with AI, emulating ray tracing on older AMD GPUs, open-source PhysX, and another blogger takes a crack at Linux gaming.
This week on the podcast we go over our reviews of the ZOTAC GAMING GeForce RTX 5070 Solid OC Graphics Card and HyperX Pulsefire Haste 2 Pro Gaming Mouse. We also talk about the launch of the AMD Radeon RX 9070 XT and how it compared to the RTX 5070, more AMD GPUs coming, a very interesting foldable gaming handheld, Digg coming back & more!
Apple is sued over carbon-neutral claims, and they try to explain why the 16e is such a terrible phone. NVIDIA might be purposely holding back stock to inflate prices. AMD announced some GPUs that might be actual competition for a change. Microsoft is shutting down Skype. OpenAI's new model seems like a dud, but it's MUCH more expensive. EA released the source code for one of my all-time favorite classic PC games. MWC kicks off today, but the news has been pretty quiet. And we HAVE to talk about all this Xiaomi stuff! Let's get our tech week started right!

Show Notes and Links: https://somegadgetguy.com/b/4JL
Video Replay on YouTube: https://youtube.com/live/EeyeQPOnM84
Support Talking Tech with SomeGadgetGuy by contributing to the tip jar: https://tips.pinecast.com/jar/talking-tech-with-somegadgetgu
Find out more at https://talking-tech-with-somegadgetgu.pinecast.co

This podcast is powered by Pinecast. Try Pinecast for free, forever, no credit card required. If you decide to upgrade, use coupon code r-c117ce for 40% off for 4 months, and support Talking Tech with SomeGadgetGuy.
In today's episode you'll hear my opinion on why I'm sticking with AMD GPUs this time around.
DeepSeek-V3 is a 671B-parameter Mixture-of-Experts language model. The accompanying documentation highlights the model's architecture, including its innovative load-balancing and multi-token prediction strategies, and its efficient training process using FP8 precision. Benchmark results demonstrate DeepSeek-V3's strong performance compared to other open-source and some closed-source models, particularly on math and code tasks. The document also provides instructions for running DeepSeek-V3 locally using various frameworks and hardware, including NVIDIA and AMD GPUs and Huawei Ascend NPUs. Finally, licensing and contact information are included.
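As a rough illustration of the "run it locally" path the summary mentions, here is a minimal sketch of querying a self-hosted DeepSeek-V3 endpoint, assuming the model has already been launched behind an OpenAI-compatible server (for example via a serving framework such as vLLM or SGLang). The port, model id, and sampling settings below are illustrative assumptions, not instructions taken from the document itself.

```python
# Minimal sketch: talk to a locally served DeepSeek-V3 instance through an
# OpenAI-compatible endpoint. Assumptions: a serving engine (e.g. vLLM/SGLang)
# is already running on localhost:8000 and registered the model under the
# Hugging Face id "deepseek-ai/DeepSeek-V3"; adjust both to match your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="not-needed-for-local",       # local servers usually accept any key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",      # assumed model id on the server
    messages=[
        {"role": "user", "content": "Write a Python function that checks whether a number is prime."}
    ],
    temperature=0.3,
    max_tokens=512,
)

print(response.choices[0].message.content)
```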
Steam Families is great if you're in the same country, open-source Black & White brings cow-slapping nostalgia, AMD nukes high-end GPUs, Deadlock disables stats for tracker sites, and making a wireless controller is the opposite of easy.
Timestamps:
0:00 i just forgot for a sec
0:08 Users ditch Win11 for Win10
1:38 Radeon GPU sales nosedive
2:50 Apple is poor now, buy iPads
5:05 QUICK BITS INTRO
5:13 Intel Core Ultra 9 285K and other bad names
5:58 Nintendo DMCA's 8500 Yuzu forks
6:48 How hackers share routers
7:44 Rabbit R1 app on iOS
8:23 Boston Dynamics makes Spot fuzzy

News Sources: https://lmg.gg/dgPOc

Learn more about your ad choices. Visit megaphone.fm/adchoices
Speaker CFPs and Sponsor Guides are now available for AIE World's Fair — join us on June 25-27 for the biggest AI Engineer conference of 2024!

Soumith Chintala needs no introduction in the ML world — his insights are incredibly accessible across Twitter, LinkedIn, podcasts, and conference talks (in this pod we'll assume you've caught up on the History of PyTorch pod from last year and cover different topics). He's well known as the creator of PyTorch, but he's more broadly the Engineering Lead on AI Infra, PyTorch, and Generative AI at Meta. Soumith was one of the earliest supporters of Latent Space (and more recently AI News), and we were overjoyed to catch up with him on his latest SF visit for a braindump of the latest AI topics, reactions to some of our past guests, and why Open Source AI is personally so important to him.

Life in the GPU-Rich Lane

Back in January, Zuck went on Instagram to announce their GPU wealth: by the end of 2024, Meta will have 350k H100s. By adding all their GPU clusters, you'd get to 600k H100-equivalents of compute. At FP16 precision, that's ~1,200,000 PFLOPS. If we used George Hotz's (previous guest!) "Person of Compute" measure, Meta now has 60k humans of compute in their clusters. Occasionally we get glimpses into the GPU-rich life; on a recent ThursdAI chat, swyx prompted PaLM tech lead Yi Tay to write down what he missed most from Google, and he commented that UL2 20B was trained by accidentally leaving the training job running for a month, because hardware failures are so rare at Google.

Meta AI's Epic LLM Run

Before Llama broke the internet, Meta released an open source LLM in May 2022, OPT-175B, which was notable for how "open" it was - right down to the logbook! It was trained on 992 80GB NVIDIA A100 GPUs, and Soumith agrees that, with hindsight, it was likely under-trained for its parameter size.

In Feb 2023 (pre Latent Space pod), Llama was released, with a 7B version trained on 1T tokens alongside 65B and 33B versions trained on 1.4T tokens. The Llama authors included Guillaume Lample and Timothée Lacroix, who went on to start Mistral.

July 2023 was Llama 2 time (which we covered!): three model sizes, 7B, 13B, and 70B, all trained on 2T tokens. The three models accounted for a grand total of 3,311,616 GPU hours of pre-training work. CodeLlama followed shortly after, a fine-tune of Llama 2 specifically focused on code generation use cases. The family had models in the 7B, 13B, 34B, and 70B sizes, all trained with 500B extra tokens of code and code-related data, except for the 70B, which was trained on 1T.

All of this came on top of other open-sourced models like Segment Anything (one of our early hits!), Detectron, Detectron 2, DensePose, and Seamless. In one year, Meta transformed from a company people made fun of for its "metaverse" investments into one of the key players in the AI landscape, and its stock has almost tripled since (about $830B in market value created in the past year).

Why Open Source AI

The obvious question is why Meta would spend hundreds of millions on its AI efforts and then release them for free. Zuck has addressed this in public statements. But for Soumith, the motivation is even more personal:

“I'm irrationally interested in open source. I think open source has that fundamental way to distribute opportunity in a way that is very powerful. Like, I grew up in India… And knowledge was very centralized, but I saw that evolution of knowledge slowly getting decentralized. And that ended up helping me learn quicker and faster for like zero dollars.
And I think that was a strong reason why I ended up where I am. So like that, like the open source side of things, I always push regardless of like what I get paid for, like I think I would do that as a passion project on the side…

…I think at a fundamental level, the most beneficial value of open source is that you make the distribution to be very wide. It's just available with no friction and people can do transformative things in a way that's very accessible. Maybe it's open source, but it has a commercial license and I'm a student in India. I don't care about the license. I just don't even understand the license. But like the fact that I can use it and do something with it is very transformative to me…

…Like, okay, I again always go back to like I'm a student in India with no money. What is my accessibility to any of these closed source models? At some scale I have to pay money. That makes it a non-starter and stuff. And there's also the control issue: I strongly believe if you want human aligned AI, you want all humans to give feedback. And you want all humans to have access to that technology in the first place. And I actually have seen, living in New York, whenever I come to Silicon Valley, I see a different cultural bubble.”

We like the way Soumith put it last year: Closed AI “rate-limits against people's imaginations and needs”!

What It Takes For Open Source AI to Win

However, Soumith doesn't think Open Source will simply win by popular demand. There is a tremendous coordination problem with the decentralized nature of open source AI development right now: nobody is collecting the valuable human feedback in the way that OpenAI or Midjourney are doing.

“Open source in general always has a coordination problem. If there's a vertically integrated provider with more resources, they will just be better coordinated than open source. And so now open source has to figure out how to have coordinated benefits. And the reason you want coordinated benefits is because these models are getting better based on human feedback. And if you see with open source models, like if you go to the /r/localllama subreddit, like there's so many variations of models that are being produced from, say, Nous Research. I mean, like there's like so many variations built by so many people. And one common theme is they're all using these fine-tuning or human preferences datasets that are very limited and they're not sufficiently diverse. And you look at the other side, say front-ends like Oobabooga or like Hugging Chat or Ollama, they don't really have feedback buttons. All the people using all these front-ends, they probably want to give feedback, but there's no way for them to give feedback… So we're just losing all of this feedback. Maybe open source models are being used as much as GPT is at this point in like all kinds of, in a very fragmented way, like in aggregate all the open source models together are probably being used as much as GPT is, maybe close to that. But the amount of feedback that is driving back into the open source ecosystem is like negligible, maybe less than 1% of like the usage.
So I think like some, like the blueprint here I think is you'd want someone to create a sinkhole for the feedback… I think if we do that, if that actually happens, I think that probably has a real chance of the open source models having a runaway effect against OpenAI, I think like there's a clear chance we can take at truly winning open source.”

If you're working on solving open source coordination, please get in touch!

Show Notes
* Soumith Chintala Twitter
* History of PyTorch episode on Gradient Podcast
* The Llama Ecosystem
* Apple's MLX
* Neural ODEs (Ordinary Differential Equations)
* AlphaGo
* LMSys arena
* Dan Pink's "Drive"
* Robotics projects:
  * Dobb-E
  * OK Robot
* Yann LeCun
* Yangqing Jia of Lepton AI
* Ed Catmull
* George Hotz on Latent Space
* Chris Lattner on Latent Space
* Guillaume Lample
* Yannic Kilcher of OpenAssistant
* LMSys
* Alex Atallah of OpenRouter
* Carlo Sferrazza's 3D tactile research
* Alex Wiltschko of Osmo
* Tangent by Alex Wiltschko
* Lerrel Pinto - Robotics

Timestamps
* [00:00:00] Introductions
* [00:00:51] Extrinsic vs Intrinsic Success
* [00:02:40] Importance of Open Source and Its Impact
* [00:03:46] PyTorch vs TinyGrad
* [00:08:33] Why PyTorch is the Switzerland of frameworks
* [00:10:27] Modular's Mojo + PyTorch?
* [00:13:32] PyTorch vs Apple's MLX
* [00:16:27] FAIR / PyTorch Alumni
* [00:18:50] How can AI inference providers differentiate?
* [00:21:41] How to build good benchmarks and learnings from AnyScale's
* [00:25:28] Most interesting unexplored ideas
* [00:28:18] What people get wrong about synthetic data
* [00:35:57] Meta AI's evolution
* [00:38:42] How do you allocate 600,000 GPUs?
* [00:42:05] Even the GPU Rich are GPU Poor
* [00:47:31] Meta's MTIA silicon
* [00:50:09] Why we need open source
* [00:59:00] Open source's coordination problem for feedback gathering
* [01:08:59] Beyond text generation
* [01:15:37] Osmo and the Future of Smell Recognition Technology

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.

Swyx [00:00:15]: Hey, and today we have in the studio Soumith Chintala, welcome.

Soumith [00:00:17]: Thanks for having me.

Swyx [00:00:18]: On one of your rare visits from New York where you live. You got your start in computer vision at NYU with Yann LeCun. That was a very fortuitous start. I was actually listening to your interview on the Gradient podcast. So if people want to know more about the history of Soumith, history of PyTorch, they can go to that podcast. We won't spend that much time there, but I just was marveling at your luck, or I don't know if it's your luck or your drive to find AI early and then find the right quality mentor because I guess Yann really sort of introduced you to that world.

Soumith [00:00:51]: Yeah, I think you're talking about extrinsic success, right? A lot of people just have drive to do things that they think is fun, and a lot of those things might or might not be extrinsically perceived as good and successful. I think I just happened to like something that is now one of the coolest things in the world or whatever. But if I happen, the first thing I tried to become was a 3D VFX artist, and I was really interested in doing that, but I turned out to be very bad at it. So I ended up not doing that further. But even if I was good at that, whatever, and I ended up going down that path, I probably would have been equally happy.
It's just like maybe like the perception of, oh, is this person successful or not might be different. I think like after a baseline, like your happiness is probably more correlated with your intrinsic stuff.Swyx [00:01:44]: Yes. I think Dan Pink has this book on drive that I often refer to about the power of intrinsic motivation versus extrinsic and how long extrinsic lasts. It's not very long at all. But anyway, now you are an investor in Runway, so in a way you're working on VFX. Yes.Soumith [00:02:01]: I mean, in a very convoluted way.Swyx [00:02:03]: It reminds me of Ed Catmull. I don't know if you guys know, but he actually tried to become an animator in his early years and failed or didn't get accepted by Disney and then went and created Pixar and then got bought by Disney and created Toy Story. So you joined Facebook in 2014 and eventually became a creator and maintainer of PyTorch. And there's this long story there you can refer to on the gradient. I think maybe people don't know that you also involved in more sort of hardware and cluster decision affair. And we can dive into more details there because we're all about hardware this month. Yeah. And then finally, I don't know what else, like what else should people know about you on a personal side or professional side?Soumith [00:02:40]: I think open source is definitely a big passion of mine and probably forms a little bit of my identity at this point. I'm irrationally interested in open source. I think open source has that fundamental way to distribute opportunity in a way that is very powerful. Like, I grew up in India. I didn't have internet for a while. In college, actually, I didn't have internet except for GPRS or whatever. And knowledge was very centralized, but I saw that evolution of knowledge slowly getting decentralized. And that ended up helping me learn quicker and faster for zero dollars. And I think that was a strong reason why I ended up where I am. So the open source side of things, I always push regardless of what I get paid for, like I think I would do that as a passion project on the side.Swyx [00:03:35]: Yeah, that's wonderful. Well, we'll talk about the challenges as well that open source has, open models versus closed models. Maybe you want to touch a little bit on PyTorch before we move on to the sort of Meta AI in general.PyTorch vs Tinygrad tradeoffsAlessio [00:03:46]: Yeah, we kind of touched on PyTorch in a lot of episodes. So we had George Hotz from TinyGrad. He called PyTorch a CISC and TinyGrad a RISC. I would love to get your thoughts on PyTorch design direction as far as, I know you talk a lot about kind of having a happy path to start with and then making complexity hidden away but then available to the end user. One of the things that George mentioned is I think you have like 250 primitive operators in PyTorch, I think TinyGrad is four. So how do you think about some of the learnings that maybe he's going to run into that you already had in the past seven, eight years almost of running PyTorch?Soumith [00:04:24]: Yeah, I think there's different models here, but I think it's two different models that people generally start with. Either they go like, I have a grand vision and I'm going to build a giant system that achieves this grand vision and maybe one is super feature complete or whatever. Or other people say they will get incrementally ambitious, right? 
And they say, oh, we'll start with something simple and then we'll slowly layer out complexity in a way that optimally applies Huffman coding or whatever. Like where the density of users are and what they're using, I would want to keep it in the easy, happy path and where the more niche advanced use cases, I'll still want people to try them, but they need to take additional frictional steps. George, I think just like we started with PyTorch, George started with the incrementally ambitious thing. I remember TinyGrad used to be, like we would be limited to a thousand lines of code and I think now it's at 5,000. So I think there is no real magic to which why PyTorch has the kind of complexity. I think it's probably partly necessitated and partly because we built with the technology available under us at that time, PyTorch is like 190,000 lines of code or something at this point. I think if you had to rewrite it, we would probably think about ways to rewrite it in a vastly simplified way for sure. But a lot of that complexity comes from the fact that in a very simple, explainable way, you have memory hierarchies. You have CPU has three levels of caches and then you have DRAM and SSD and then you have network. Similarly, GPU has several levels of memory and then you have different levels of network hierarchies, NVLink plus InfiniBand or Rocky or something like that, right? And the way the flops are available on your hardware, they are available in a certain way and your computation is in a certain way and you have to retrofit your computation onto both the memory hierarchy and like the flops available. When you're doing this, it is actually a fairly hard mathematical problem to do this setup, like you find the optimal thing. And finding the optimal thing is, what is optimal depends on the input variables themselves. So like, okay, what is the shape of your input tensors and what is the operation you're trying to do and various things like that. Finding that optimal configuration and writing it down in code is not the same for every input configuration you have. Like for example, just as the shape of the tensors change, let's say you have three input tensors into a Sparstar product or something like that. The shape of each of these input tensors will vastly change how you do this optimally placing this operation onto the hardware in a way that will get you maximal throughput. So a lot of our complexity comes from writing out hundreds of configurations for each single PyTorch operator and templatizing these things and symbolically generating the final CUDA code or CPU code. There's no way to avoid it because mathematically we haven't found symbolic ways to do this that also keep compile time near zero. You can write a very simple framework, but then you also should be willing to eat the long compile time. So if searching for that optimal performance at runtime, but that's the trade off. There's no, like, I don't think unless we have great breakthroughs George's vision is achievable, he should be thinking about a narrower problem such as I'm only going to make this for work for self-driving car connets or I'm only going to make this work for LLM transformers of the llama style. Like if you start narrowing the problem down, you can make a vastly simpler framework. 
But if you don't, if you need the generality to power all of the AI research that is happening and keep zero compile time and in all these other factors, I think it's not easy to avoid the complexity.Pytorch vs MojoAlessio [00:08:33]: That's interesting. And we kind of touched on this with Chris Lattner when he was on the podcast. If you think about frameworks, they have the model target. They have the hardware target. They have different things to think about. He mentioned when he was at Google, TensorFlow trying to be optimized to make TPUs go brr, you know, and go as fast. I think George is trying to make especially AMD stack be better than ROCm. How come PyTorch has been such as Switzerland versus just making Meta hardware go brr?Soumith [00:09:00]: First, Meta is not in the business of selling hardware. Meta is not in the business of cloud compute. The way Meta thinks about funding PyTorch is we're funding it because it's net good for Meta to fund PyTorch because PyTorch has become a standard and a big open source project. And generally it gives us a timeline edge. It gives us leverage and all that within our own work. So why is PyTorch more of a Switzerland rather than being opinionated? I think the way we think about it is not in terms of Switzerland or not. We actually the way we articulate it to all hardware vendors and software vendors and all who come to us being we want to build a backend in core for PyTorch and ship it by default is we just only look at our user side of things. Like if users are using a particular piece of hardware, then we want to support it. We very much don't want to king make the hardware side of things. So as the MacBooks have GPUs and as that stuff started getting increasingly interesting, we pushed Apple to push some engineers and work on the NPS support and we spend significant time from Meta funded engineers on that as well because a lot of people are using the Apple GPUs and there's demand. So we kind of mostly look at it from the demand side. We never look at it from like oh which hardware should we start taking opinions on.Swyx [00:10:27]: Is there a future in which, because Mojo or Modular Mojo is kind of a superset of Python, is there a future in which PyTorch might use Mojo features optionally?Soumith [00:10:36]: I think it depends on how well integrated it is into the Python ecosystem. So if Mojo is like a pip install and it's readily available and users feel like they can use Mojo so smoothly within their workflows in a way that just is low friction, we would definitely look into that. Like in the same way PyTorch now depends on Triton, OpenAI Triton, and we never had a conversation that was like huh, that's like a dependency. Should we just build a Triton of our own or should we use Triton? It almost doesn't, like those conversations don't really come up for us. The conversations are more well does Triton have 10,000 dependencies and is it hard to install? We almost don't look at these things from a strategic leverage point of view. We look at these things from a user experience point of view, like is it easy to install? Is it smoothly integrated and does it give enough benefits for us to start depending on it? If so, yeah, we should consider it. That's how we think about it.Swyx [00:11:37]: You're inclusive by default as long as it meets the minimum bar of, yeah, but like maybe I phrased it wrongly. 
Maybe it's more like what problems would you look to solve that you have right now?Soumith [00:11:48]: I think it depends on what problems Mojo will be useful at.Swyx [00:11:52]: Mainly a performance pitch, some amount of cross compiling pitch.Soumith [00:11:56]: Yeah, I think the performance pitch for Mojo was like, we're going to be performant even if you have a lot of custom stuff, you're going to write arbitrary custom things and we will be performant. And that value proposition is not clear to us from the PyTorch side to consider it for PyTorch. So PyTorch, it's actually not 250 operators, it's like a thousand operators. PyTorch exposes about a thousand operators and people kind of write their ideas in the thousand operators of PyTorch. Mojo is like, well, maybe it's okay to completely sidestep those thousand operators of PyTorch and just write it in a more natural form. Just write raw Python, write for loops or whatever, right? So from the consideration of how do we intersect PyTorch with Mojo, I can see one use case where you have custom stuff for some parts of your program, but mostly it's PyTorch. And so we can probably figure out how to make it easier for say Torch.compile to smoothly also consume Mojo subgraphs and like, you know, the interoperability being actually usable, that I think is valuable. But Mojo as a fundamental front end would be replacing PyTorch, not augmenting PyTorch. So in that sense, I don't see a synergy in more deeply integrating Mojo.Pytorch vs MLXSwyx [00:13:21]: So call out to Mojo whenever they have written something in Mojo and there's some performance related thing going on. And then since you mentioned Apple, what should people think of PyTorch versus MLX?Soumith [00:13:32]: I mean, MLX is early and I know the folks well, Ani used to work at FAIR and I used to chat with him all the time. He used to be based out of New York as well. The way I think about MLX is that MLX is specialized for Apple right now. It has a happy path because it's defined its product in a narrow way. At some point MLX either says we will only be supporting Apple and we will just focus on enabling, you know, there's a framework if you use your MacBook, but once you like go server side or whatever, that's not my problem and I don't care. For MLS, it enters like the server side set of things as well. Like one of these two things will happen, right? If the first thing will happen, like MLX's overall addressable market will be small, but it probably do well within that addressable market. If it enters the second phase, they're going to run into all the same complexities that we have to deal with. They will not have any magic wand and they will have more complex work to do. They probably wouldn't be able to move as fast.Swyx [00:14:44]: Like having to deal with distributed compute?Soumith [00:14:48]: Distributed, NVIDIA and AMD GPUs, like just like having a generalization of the concept of a backend, how they treat compilation with plus overheads. Right now they're deeply assumed like the whole NPS graph thing. So they need to think about all these additional things if they end up expanding onto the server side and they'll probably build something like PyTorch as well, right? Like eventually that's where it will land. And I think there they will kind of fail on the lack of differentiation. Like it wouldn't be obvious to people why they would want to use it.Swyx [00:15:24]: I mean, there are some cloud companies offering M1 and M2 chips on servers. 
I feel like it might be interesting for Apple to pursue that market, but it's not their core strength.Soumith [00:15:33]: Yeah. If Apple can figure out their interconnect story, maybe, like then it can become a thing.Swyx [00:15:40]: Honestly, that's more interesting than the cars. Yes.Soumith [00:15:43]: I think the moat that NVIDIA has right now, I feel is that they have the interconnect that no one else has, like AMD GPUs are pretty good. I'm sure there's various silicon that is not bad at all, but the interconnect, like NVLink is uniquely awesome. I'm sure the other hardware providers are working on it, but-Swyx [00:16:04]: I feel like when you say it's uniquely awesome, you have some appreciation of it that the rest of us don't. I mean, the rest of us just like, you know, we hear marketing lines, but what do you mean when you say NVIDIA is very good at networking? Obviously they made the acquisition maybe like 15 years ago.Soumith [00:16:15]: Just the bandwidth it offers and the latency it offers. I mean, TPUs also have a good interconnect, but you can't buy them. So you have to go to Google to use it.

PyTorch Mafia

Alessio [00:16:27]: Who are some of the other FAIR PyTorch alumni that are building cool companies? I know you have Fireworks AI, Lightning AI, Lepton, and Yangqing, you knew since college when he was building Caffe?Soumith [00:16:40]: Yeah, so Yangqing and I used to be framework rivals, PyTorch, I mean, we were all a very small close-knit community back then. Caffe, Torch, Theano, Chainer, Keras, various frameworks. I mean, it used to be more like 20 frameworks. I can't remember all the names. CCV by Liu Liu, who is also based out of SF. And I would actually like, you know, one of the ways it was interesting is you went into the framework guts and saw if someone wrote their own convolution kernel or they were just copying someone else's. There were four or five convolution kernels that were unique and interesting. There was one from this guy out of Russia, I forgot the name, but I remembered who was awesome enough to have written their own kernel. And at some point there, I built out these benchmarks called convnet-benchmarks. They're just benchmarking all the convolution kernels that are available at that time. It hilariously became big enough that at that time AI was getting important, but not important enough that industrial strength players came in to do these kinds of benchmarking and standardization. Like we have MLPerf today. So a lot of the startups were using convnet-benchmarks in their pitch decks as like, oh, you know, on convnet-benchmarks, this is how we fare, so you should fund us. I remember Nervana actually was at the top of the pack because Scott Gray wrote amazingly fast convolution kernels at that time. Very interesting, but separate times. But to answer your question, Alessio, I think mainly Lepton, Fireworks are the two most obvious ones, but I'm sure the fingerprints are a lot wider. They're just people who worked within the PyTorch, Caffe2 cohort of things and now end up at various other places.Swyx [00:18:50]: I think as a, both as an investor and a people looking to build on top of their services, it's an uncomfortable slash like, I don't know what I don't know pitch. Because I've met Yangqing and I've met Lin Qiao. Yeah, I've met these folks and they're like, you know, we are deep in the PyTorch ecosystem and we serve billions of inferences a day or whatever at Facebook and now we can do it for you. And I'm like, okay, that's great.
Like, what should I be wary of or cautious of when these things happen? Because I'm like, obviously this experience is extremely powerful and valuable. I just don't know what I don't know. Like, what should people know about like these sort of new inference as a service companies?Soumith [00:19:32]: I think at that point you would be investing in them for their expertise of one kind. So if they've been at a large company, but they've been doing amazing work, you would be thinking about it as what these people bring to the table is that they're really good at like GPU programming or understanding the complexity of serving models once it hits a certain scale. You know, various expertise like from the infra and AI and GPUs point of view. What you would obviously want to figure out is whether their understanding of the external markets is clear, whether they know and understand how to think about running a business, understanding how to be disciplined about making money or, you know, various things like that.Swyx [00:20:23]: Maybe I'll put it like, actually I will de-emphasize the investing bit and just more as a potential customer. Oh, okay. Like, it's more okay, you know, you have PyTorch gods, of course. Like, what else should I know?Soumith [00:20:37]: I mean, I would not care about who's building something. If I'm trying to be a customer, I would care about whether...Swyx [00:20:44]: Benchmarks.Soumith [00:20:44]: Yeah, I use it and it's usability and reliability and speed, right?Swyx [00:20:51]: Quality as well.Soumith [00:20:51]: Yeah, if someone from some random unknown place came to me and say, user stuff is great. Like, and I have the bandwidth, I probably will give it a shot. And if it turns out to be great, like I'll just use it.Benchmark dramaSwyx [00:21:07]: Okay, great. And then maybe one more thing about benchmarks, since we already brought it up and you brought up Confident Benchmarks. There was some recent drama around AnyScale. AnyScale released their own benchmarks and obviously they look great on their own benchmarks, but maybe didn't give the other... I feel there are two lines of criticism. One, which is they didn't test some apples for apples on the kind of endpoints that the other providers, that they are competitors with, on their benchmarks and that is due diligence baseline. And then the second would be more just optimizing for the right thing. You had some commentary on it. I'll just kind of let you riff.Soumith [00:21:41]: Yeah, I mean, in summary, basically my criticism of that was AnyScale built these benchmarks for end users to just understand what they should pick, right? And that's a very good thing to do. I think what they didn't do a good job of is give that end user a full understanding of what they should pick. Like they just gave them a very narrow slice of understanding. I think they just gave them latency numbers and that's not sufficient, right? You need to understand your total cost of ownership at some reasonable scale. Not oh, one API call is one cent, but a thousand API calls are 10 cents. Like people can misprice to cheat on those benchmarks. So you want to understand, okay, like how much is it going to cost me if I actually subscribe to you and do like a million API calls a month or something? And then you want to understand the latency and reliability, not just from one call you made, but an aggregate of calls you've made over several various times of the day and times of the week. 
And the nature of the workloads, is it just some generic single paragraph that you're sending that is cashable? Or is it like testing of real world workload? I think that kind of rigor, like in presenting that benchmark wasn't there. It was a much more narrow sliver of what should have been a good benchmark. That was my main criticism. And I'm pretty sure if before they released it, they showed it to their other stakeholders who would be caring about this benchmark because they are present in it, they would have easily just pointed out these gaps. And I think they didn't do that and they just released it. So I think those were the two main criticisms. I think they were fair and Robert took it well.Swyx [00:23:40]: And he took it very well. And we'll have him on at some point and we'll discuss it. But I think it's important for, I think the market being maturing enough that people start caring and competing on these kinds of things means that we need to establish what best practice is because otherwise everyone's going to play dirty.Soumith [00:23:55]: Yeah, absolutely. My view of the LLM inference market in general is that it's the laundromat model. Like the margins are going to drive down towards the bare minimum. It's going to be all kinds of arbitrage between how much you can get the hardware for and then how much you sell the API and how much latency your customers are willing to let go. You need to figure out how to squeeze your margins. Like what is your unique thing here? Like I think Together and Fireworks and all these people are trying to build some faster CUDA kernels and faster, you know, hardware kernels in general. But those modes only last for a month or two. These ideas quickly propagate.Swyx [00:24:38]: Even if they're not published?Soumith [00:24:39]: Even if they're not published, the idea space is small. So even if they're not published, the discovery rate is going to be pretty high. It's not like we're talking about a combinatorial thing that is really large. You're talking about Llama style LLM models. And we're going to beat those to death on a few different hardware SKUs, right? Like it's not even we have a huge diversity of hardware you're going to aim to run it on. Now when you have such a narrow problem and you have a lot of people working on it, the rate at which these ideas are going to get figured out is going to be pretty rapid.Swyx [00:25:15]: Is it a standard bag of tricks? Like the standard one that I know of is, you know, fusing operators and-Soumith [00:25:22]: Yeah, it's the standard bag of tricks on figuring out how to improve your memory bandwidth and all that, yeah.Alessio [00:25:28]: Any ideas instead of things that are not being beaten to death that people should be paying more attention to?Novel PyTorch ApplicationsSwyx [00:25:34]: One thing I was like, you know, you have a thousand operators, right? Like what's the most interesting usage of PyTorch that you're seeing maybe outside of this little bubble?Soumith [00:25:41]: So PyTorch, it's very interesting and scary at the same time, but basically it's used in a lot of exotic ways, like from the ML angle, what kind of models are being built? And you get all the way from state-based models and all of these things to stuff nth order differentiable models, like neural ODEs and stuff like that. I think there's one set of interestingness factor from the ML side of things. And then there's the other set of interesting factor from the applications point of view. 
It's used in Mars Rover simulations, to drug discovery, to Tesla cars. And there's a huge diversity of applications in which it is used. So in terms of the most interesting application side of things, I think I'm scared at how many interesting things that are also very critical and really important it is used in. I think the scariest was when I went to visit CERN at some point and they said they were using PyTorch and they were using GANs at the same time for particle physics research. And I was scared more about the fact that they were using GANs than they were using PyTorch, because at that time I was a researcher focusing on GANs. But the diversity is probably the most interesting. How many different things it is being used in. I think that's the most interesting to me from the applications perspective. From the models perspective, I think I've seen a lot of them. Like the really interesting ones to me are where we're starting to combine search and symbolic stuff with differentiable models, like the whole AlphaGo style models is one example. And then I think we're attempting to do it for LLMs as well, with various reward models and search. I mean, I don't think PyTorch is being used in this, but the whole alpha geometry thing was interesting because again, it's an example of combining the symbolic models with the gradient based ones. But there are stuff like alpha geometry that PyTorch is used at, especially when you intersect biology and chemistry with ML. In those areas, you want stronger guarantees on the output. So yeah, maybe from the ML side, those things to me are very interesting right now.Swyx [00:28:03]: Yeah. People are very excited about the alpha geometry thing. And it's kind of like, for me, it's theoretical. It's great. You can solve some Olympia questions. I'm not sure how to make that bridge over into the real world applications, but I'm sure people smarter than me will figure it out.Synthetic Data vs Symbolic ModelsSoumith [00:28:18]: Let me give you an example of it. You know how the whole thing about synthetic data will be the next rage in LLMs is a thing?Swyx [00:28:27]: Already is a rage.Soumith [00:28:28]: Which I think is fairly misplaced in how people perceive it. People think synthetic data is some kind of magic wand that you wave and it's going to be amazing. Synthetic data is useful in neural networks right now because we as humans have figured out a bunch of symbolic models of the world or made up certain symbolic models because of human innate biases. So we've figured out how to ground particle physics in a 30 parameter model. And it's just very hard to compute as in it takes a lot of flops to compute, but it only has 30 parameters or so. I mean, I'm not a physics expert, but it's a very low rank model. We built mathematics as a field that basically is very low rank. Language, a deep understanding of language, like the whole syntactic parse trees and just understanding how language can be broken down and into a formal symbolism is something that we figured out. So we basically as humans have accumulated all this knowledge on these subjects, either synthetic, we created those subjects in our heads, or we grounded some real world phenomenon into a set of symbols. But we haven't figured out how to teach neural networks symbolic world models directly. The only way we have to teach them is generating a bunch of inputs and outputs and gradient dissenting over them. 
So in areas where we have the symbolic models and we need to teach all the knowledge we have that is better encoded in the symbolic models, what we're doing is we're generating a bunch of synthetic data, a bunch of input output pairs, and then giving that to the neural network and asking it to learn the same thing that we already have a better low rank model of in gradient descent in a much more over-parameterized way. Outside of this, like where we don't have good symbolic models, like synthetic data obviously doesn't make any sense. So synthetic data is not a magic wand where it'll work in all cases in every case or whatever. It's just where we as humans already have good symbolic models off. We need to impart that knowledge to neural networks and we figured out the synthetic data is a vehicle to impart this knowledge to. So, but people, because maybe they don't know enough about synthetic data as a notion, but they hear, you know, the next wave of data revolution is synthetic data. They think it's some kind of magic where we just create a bunch of random data somehow. They don't think about how, and then they think that's just a revolution. And I think that's maybe a gap in understanding most people have in this hype cycle.Swyx [00:31:23]: Yeah, well, it's a relatively new concept, so. Oh, there's two more that I'll put in front of you and then you can see what you respond. One is, you know, I have this joke that it's, you know, it's only synthetic data if it's from the Mistral region of France, otherwise it's just a sparkling distillation, which is what news research is doing. Like they're distilling GPT-4 by creating synthetic data from GPT-4, creating mock textbooks inspired by Phi 2 and then fine tuning open source models like Llama. And so I don't know, I mean, I think that's, should we call that synthetic data? Should we call it something else? I don't know.Soumith [00:31:57]: Yeah, I mean, the outputs of LLMs, are they synthetic data? They probably are, but I think it depends on the goal you have. If your goal is you're creating synthetic data with the goal of trying to distill GPT-4's superiority into another model, I guess you can call it synthetic data, but it also feels like disingenuous because your goal is I need to copy the behavior of GPT-4 and-Swyx [00:32:25]: It's also not just behavior, but data set. So I've often thought of this as data set washing. Like you need one model at the top of the chain, you know, unnamed French company that has that, you know, makes a model that has all the data in it that we don't know where it's from, but it's open source, hey, and then we distill from that and it's great. To be fair, they also use larger models as judges for preference ranking, right? So that is, I think, a very, very accepted use of synthetic.Soumith [00:32:53]: Correct. I think it's a very interesting time where we don't really have good social models of what is acceptable depending on how many bits of information you use from someone else, right? It's like, okay, you use one bit. Is that okay? Yeah, let's accept it to be okay. Okay, what about if you use 20 bits? Is that okay? I don't know. What if you use 200 bits? I don't think we as society have ever been in this conundrum where we have to be like, where is the boundary of copyright or where is the boundary of socially accepted understanding of copying someone else? We haven't been tested this mathematically before,Swyx [00:33:38]: in my opinion. Whether it's transformative use. Yes. 
So yeah, I think this New York Times opening eye case is gonna go to the Supreme Court and we'll have to decide it because I think we never had to deal with it before. And then finally, for synthetic data, the thing that I'm personally exploring is solving this great stark paradigm difference between rag and fine tuning, where you can kind of create synthetic data off of your retrieved documents and then fine tune on that. That's kind of synthetic. All you need is variation or diversity of samples for you to fine tune on. And then you can fine tune new knowledge into your model. I don't know if you've seen that as a direction for synthetic data.Soumith [00:34:13]: I think you're basically trying to, what you're doing is you're saying, well, language, I know how to parametrize language to an extent. And I need to teach my model variations of this input data so that it's resilient or invariant to language uses of that data.Swyx [00:34:32]: Yeah, it doesn't overfit on the wrong source documents.Soumith [00:34:33]: So I think that's 100% synthetic. You understand, the key is you create variations of your documents and you know how to do that because you have a symbolic model or like some implicit symbolic model of language.Swyx [00:34:48]: Okay.Alessio [00:34:49]: Do you think the issue with symbolic models is just the architecture of the language models that we're building? I think maybe the thing that people grasp is the inability of transformers to deal with numbers because of the tokenizer. Is it a fundamental issue there too? And do you see alternative architectures that will be better with symbolic understanding?Soumith [00:35:09]: I am not sure if it's a fundamental issue or not. I think we just don't understand transformers enough. I don't even mean transformers as an architecture. I mean the use of transformers today, like combining the tokenizer and transformers and the dynamics of training, when you show math heavy questions versus not. I don't have a good calibration of whether I know the answer or not. I, you know, there's common criticisms that are, you know, transformers will just fail at X. But then when you scale them up to sufficient scale, they actually don't fail at that X. I think there's this entire subfield where they're trying to figure out these answers called like the science of deep learning or something. So we'll get to know more. I don't know the answer.Meta AI and Llama 2/3Swyx [00:35:57]: Got it. Let's touch a little bit on just Meta AI and you know, stuff that's going on there. Maybe, I don't know how deeply you're personally involved in it, but you're our first guest with Meta AI, which is really fantastic. And Llama 1 was, you know, you are such a believer in open source. Llama 1 was more or less the real breakthrough in open source AI. The most interesting thing for us covering on this, in this podcast was the death of Chinchilla, as people say. Any interesting insights there around the scaling models for open source models or smaller models or whatever that design decision was when you guys were doing it?Soumith [00:36:31]: So Llama 1 was Guillaume Lample and team. There was OPT before, which I think I'm also very proud of because we bridged the gap in understanding of how complex it is to train these models to the world. Like until then, no one really in gory detail published.Swyx [00:36:50]: The logs.Soumith [00:36:51]: Yeah. Like, why is it complex? And everyone says, oh, it's complex. But no one really talked about why it's complex. 
I think OPT was cool.Swyx [00:37:02]: I met Susan and she's very, very outspoken. Yeah.Soumith [00:37:05]: We probably, I think, didn't train it for long enough, right? That's kind of obvious in retrospect.Swyx [00:37:12]: For a 175B. Yeah. You trained it according to Chinchilla at the time or?Soumith [00:37:17]: I can't remember the details, but I think it's a commonly held belief at this point that if we trained OPT longer, it would actually end up being better. Llama 1, I think, was Guillaume Lample and team Guillaume is fantastic and went on to build Mistral. I wasn't too involved in that side of things. So I don't know what you're asking me, which is how did they think about scaling loss and all of that? Llama 2, I was more closely involved in. I helped them a reasonable amount with their infrastructure needs and stuff. And Llama 2, I think, was more like, let's get to the evolution. At that point, we kind of understood what we were missing from the industry's understanding of LLMs. And we needed more data and we needed more to train the models for longer. And we made, I think, a few tweaks to the architecture and we scaled up more. And that was Llama 2. I think Llama 2, you can think of it as after Guillaume left, the team kind of rebuilt their muscle around Llama 2. And Hugo, I think, who's the first author is fantastic. And I think he did play a reasonable big role in Llama 1 as well.Soumith [00:38:35]: And he overlaps between Llama 1 and 2. So in Llama 3, obviously, hopefully, it'll be awesome.Alessio [00:38:42]: Just one question on Llama 2, and then we'll try and fish Llama 3 spoilers out of you. In the Llama 2 paper, the loss curves of the 34 and 70B parameter, they still seem kind of steep. Like they could go lower. How, from an infrastructure level, how do you allocate resources? Could they have just gone longer or were you just, hey, this is all the GPUs that we can burn and let's just move on to Llama 3 and then make that one better?Soumith [00:39:07]: Instead of answering specifically about that Llama 2 situation or whatever, I'll tell you how we think about things. Generally, we're, I mean, Mark really is some numbers, right?Swyx [00:39:20]: So let's cite those things again. All I remember is like 600K GPUs.Soumith [00:39:24]: That is by the end of this year and 600K H100 equivalents. With 250K H100s, including all of our other GPU or accelerator stuff, it would be 600-and-something-K aggregate capacity.Swyx [00:39:38]: That's a lot of GPUs.Soumith [00:39:39]: We'll talk about that separately. But the way we think about it is we have a train of models, right? Llama 1, 2, 3, 4. And we have a bunch of GPUs. I don't think we're short of GPUs. Like-Swyx [00:39:54]: Yeah, no, I wouldn't say so. Yeah, so it's all a matter of time.Soumith [00:39:56]: I think time is the biggest bottleneck. It's like, when do you stop training the previous one and when do you start training the next one? And how do you make those decisions? The data, do you have net new data, better clean data for the next one in a way that it's not worth really focusing on the previous one? It's just a standard iterative product. You're like, when is the iPhone 1? When do you start working on iPhone 2? Where is the iPhone? And so on, right? So mostly the considerations are time and generation, rather than GPUs, in my opinion.Alessio [00:40:31]: So one of the things with the scaling loss, like Chinchilla is optimal to balance training and inference costs. 
I think at Meta's scale, you would rather pay a lot more maybe at training and then save on inference. How do you think about that from infrastructure perspective? I think in your tweet, you say you can try and guess on like how we're using these GPUs. Can you just give people a bit of understanding? It's like, because I've already seen a lot of VCs say, Llama 3 has been trained on 600,000 GPUs and that's obviously not true, I'm sure. How do you allocate between the research, FAIR and the Llama training, the inference on Instagram suggestions that get me to scroll, like AI-generated stickers on WhatsApp and all of that?Soumith [00:41:11]: Yeah, we haven't talked about any of this publicly, but as a broad stroke, it's like how we would allocate resources of any other kinds at any company. You run a VC portfolio, how do you allocate your investments between different companies or whatever? You kind of make various trade-offs and you kind of decide, should I invest in this project or this other project, or how much should I invest in this project? It's very much a zero sum of trade-offs. And it also comes into play, how are your clusters configured, like overall, what you can fit of what size and what cluster and so on. So broadly, there's no magic sauce here. I mean, I think the details would add more spice, but also wouldn't add more understanding. It's just gonna be like, oh, okay, I mean, this looks like they just think about this as I would normally do.Alessio [00:42:05]: So even the GPU rich run through the same struggles of having to decide where to allocate things.Soumith [00:42:11]: Yeah, I mean, at some point I forgot who said it, but you kind of fit your models to the amount of compute you have. If you don't have enough compute, you figure out how to make do with smaller models. But no one as of today, I think would feel like they have enough compute. I don't think I've heard any company within the AI space be like, oh yeah, like we feel like we have sufficient compute and we couldn't have done better. So that conversation, I don't think I've heard from any of my friends at other companies.EleutherSwyx [00:42:47]: Stella from Eleuther sometimes says that because she has a lot of donated compute. She's trying to put it to interesting uses, but for some reason she's decided to stop making large models.Soumith [00:42:57]: I mean, that's a cool, high conviction opinion that might pay out.Swyx [00:43:01]: Why?Soumith [00:43:02]: I mean, she's taking a path that most people don't care to take about in this climate and she probably will have very differentiated ideas. I mean, think about the correlation of ideas in AI right now. It's so bad, right? So everyone's fighting for the same pie. In some weird sense, that's partly why I don't really directly work on LLMs. I used to do image models and stuff and I actually stopped doing GANs because GANs were getting so hot that I didn't have any calibration of whether my work would be useful or not because, oh yeah, someone else did the same thing you did. It's like, there's so much to do, I don't understand why I need to fight for the same pie. So I think Stella's decision is very smart.Making BetsAlessio [00:43:53]: And how do you reconcile that with how we started the discussion about intrinsic versus extrinsic kind of like accomplishment or success? How should people think about that especially when they're doing a PhD or early in their career? 
I think in Europe, I walked through a lot of the posters and whatnot, there seems to be mode collapse in a way in the research, a lot of people working on the same things. Is it worth for a PhD to not take a bet on something that is maybe not as interesting just because of funding and visibility and whatnot? Or yeah, what suggestions would you give?Soumith [00:44:28]: I think there's a baseline level of compatibility you need to have with the field. Basically, you need to figure out if you will get paid enough to eat, right? Like whatever reasonable normal lifestyle you want to have as a baseline. So you at least have to pick a problem within the neighborhood of fundable. Like you wouldn't wanna be doing something so obscure that people are like, I don't know, like you can work on it.Swyx [00:44:59]: Would a limit on fundability, I'm just observing something like three months of compute, right? That's the top line, that's the like max that you can spend on any one project.Soumith [00:45:09]: But like, I think that's very ill specified, like how much compute, right? I think that the notion of fundability is broader. It's more like, hey, are these family of models within the acceptable set of, you're not crazy or something, right? Even something like neural or DS, which is a very boundary pushing thing or states-based models or whatever. Like all of these things I think are still in fundable territory. When you're talking about, I'm gonna do one of the neuromorphic models and then apply image classification to them or something, then it becomes a bit questionable. Again, it depends on your motivation. Maybe if you're a neuroscientist, it actually is feasible. But if you're an AI engineer, like the audience of these podcasts, then it's more questionable. The way I think about it is, you need to figure out how you can be in the baseline level of fundability just so that you can just live. And then after that, really focus on intrinsic motivation and depends on your strengths, like how you can play to your strengths and your interests at the same time. Like I try to look at a bunch of ideas that are interesting to me, but also try to play to my strengths. I'm not gonna go work on theoretical ML. I'm interested in it, but when I want to work on something like that, I try to partner with someone who is actually a good theoretical ML person and see if I actually have any value to provide. And if they think I do, then I come in. So I think you'd want to find that intersection of ideas you like, and that also play to your strengths. And I'd go from there. Everything else, like actually finding extrinsic success and all of that, I think is the way I think about it is like somewhat immaterial. When you're talking about building ecosystems and stuff, slightly different considerations come into play, but that's a different conversation.Swyx [00:47:06]: We're gonna pivot a little bit to just talking about open source AI. But one more thing I wanted to establish for Meta is this 600K number, just kind of rounding out the discussion, that's for all Meta. So including your own inference needs, right? It's not just about training.Soumith [00:47:19]: It's gonna be the number in our data centers for all of Meta, yeah.Swyx [00:47:23]: Yeah, so there's a decent amount of workload serving Facebook and Instagram and whatever. And then is there interest in like your own hardware?MTIASoumith [00:47:31]: We already talked about our own hardware. It's called MTIA. 
Our own silicon, I think we've even showed the standard photograph of you holding the chip that doesn't work. Like as in the chip that you basically just get like-Swyx [00:47:51]: As a test, right?Soumith [00:47:52]: Yeah, a test chip or whatever. So we are working on our silicon and we'll probably talk more about it when the time is right, but-Swyx [00:48:00]: Like what gaps do you have that the market doesn't offer?Soumith [00:48:04]: Okay, I mean, this is easy to answer. So basically, remember how I told you about there's this memory hierarchy and like sweet spots and all of that? Fundamentally, when you build hardware, you make it general enough that a wide set of customers and a wide set of workloads can use it effectively while trying to get the maximum level of performance they can. The more specialized you make the chip, the more hardware efficient it's going to be, the more power efficient it's gonna be, and the easier it's going to be to write the software, like the kernels, to just map that one or two workloads to that hardware and so on. So it's pretty well understood across the industry that if you have a sufficiently large volume, enough workload, you can specialize it and get some efficiency gains, like power gains and so on. So the way you can think about every large company building silicon, and I think a bunch of the other large companies are building their own silicon as well, is that each large company has a sufficient enough set of verticalized workloads that can be specialized, that have a pattern to them, that a more generic accelerator like an NVIDIA or an AMD GPU does not exploit. So there is some level of power efficiency that you're leaving on the table by not exploiting that. And you have sufficient scale and you have sufficient forecasted stability that those workloads will exist in the same form, that it's worth spending the time to build out a chip to exploit that sweet spot. Like obviously something like this is only useful if you hit a certain scale and your forecasted prediction of those kinds of workloads being in the same kind of specializable, exploitable way is true. So yeah, that's why we're building our own chips.Swyx [00:50:08]: Awesome.Open Source AIAlessio [00:50:09]: Yeah, I know we've been talking a lot on a lot of different topics and going back to open source, you had a very good tweet. You said that a single company's closed source effort rate limits against people's imaginations and needs. How do you think about all the impact that some of the Meta AI work in open source has been doing and maybe directions of the whole open source AI space?Soumith [00:50:32]: Yeah, in general, I think first, I think it's worth talking about this in terms of open and not just open source, because like with the whole notion of model weights, no one even knows what source means for these things. But just for the discussion, when I say open source, you can assume it's just I'm talking about open. And then there's the whole notion of licensing and all that, commercial, non-commercial, commercial with clauses and all that. I think at a fundamental level, the biggest value of open source is that you make the distribution very wide. It's just available with no friction and people can do transformative things in a way that's very accessible. Maybe it's open source, but it has a commercial license and I'm a student in India. I don't care about the license. I just don't even understand the license.
But like the fact that I can use it and do something with it is very transformative to me. Like I got this thing in a very accessible way. And then it's various degrees, right? And then if it's open source, but it's actually a commercial license, then a lot of companies are gonna benefit from gaining value that they didn't previously have, that they maybe had to pay a closed source company for. So open source is just a very interesting tool that you can use in various ways. So there's, again, two kinds of open source. One is some large company doing a lot of work and then open sourcing it. And that kind of effort is not really feasible by say a band of volunteers doing it the same way. So there's both a capital and operational expenditure that the large company just decided to ignore and give it away to the world for some benefits of some kind. They're not as tangible as direct revenue. So in that part, Meta has been doing incredibly good things. They fund a huge amount of the PyTorch development. They've open sourced Llama and that family of models and several other fairly transformative projects. FAISS is one, Segment Anything, Detectron, Detectron 2. DensePose. I mean, it's-Swyx [00:52:52]: Seamless. Yeah, Seamless.Soumith [00:52:53]: Like it's just the list is so long that we're not gonna cover it. So I think Meta comes into that category where we spend a lot of CapEx and OpEx and we have a high talent density of great AI people and we open our stuff. And the thesis for that, I remember when FAIR was started, the common thing was like, wait, why would Meta wanna start an open AI lab? Like what exactly is the benefit from a commercial perspective? And the thesis then was very simple. It was: AI is currently rate limiting Meta's ability to do things. Our ability to build various product integrations, moderation, various other factors. Like AI was the limiting factor and we just wanted AI to advance more and we didn't care if the IP of the AI was uniquely in our possession or not. However the field advances, that accelerates Meta's ability to build a better product. So we just built an open AI lab and we said, if this helps accelerate the progress of AI, that's strictly great for us. But very easy, rational, right? Still the same to a large extent with the Llama stuff. And it's the same values, but the argument, it's a bit more nuanced. And then there's a second kind of open source, which is, oh, we built this project, nights and weekends and we're very smart people and we open sourced it and then we built a community around it. This is the Linux kernel and various software projects like that. So I think about open source, like both of these things being beneficial and both of these things being different. They're different and beneficial in their own ways. The second one is really useful when there's an active arbitrage to be done. If someone's not really looking at a particular space because it's not commercially viable or whatever, like a band of volunteers can just coordinate online and do something and then make that happen. And that's great.Open Source LLMsI wanna cover a little bit about open source LLMs maybe. So open source LLMs have been very interesting because I think we were trending towards an increase in open source in AI from 2010 all the way to 2017 or something. Like where more and more pressure within the community was to open source their stuff so that their methods and stuff get adopted.
And then the LLM revolution kind of had the opposite effect. OpenAI stopped open sourcing their stuff and DeepMind kind of didn't, like all the other cloud and all these other providers, they didn't open source their stuff. And it was not good in the sense that, first, science done in isolation probably will just form its own bubble where people believe their own b******t or whatever. So there's that problem. And then there was the other problem which was the accessibility part. Like, okay, I again always go back to I'm a student in India with no money. What is my accessibility to any of these closed models? At some scale I have to pay money. That makes it a non-starter and stuff. And there's also the control thing. I strongly believe if you want human aligned stuff, you want all humans to give feedback. And you want all humans to have access to that technology in the first place. And I actually have seen, living in New York, whenever I come to Silicon Valley, I see a different cultural bubble. Like all the friends I hang out with talk about some random thing like Dyson Spheres or whatever, that's a thing. And most of the world doesn't know or care about any of this stuff. It's definitely a bubble and bubbles can form very easily. And when you make a lot of decisions because you're in a bubble, they're probably not globally optimal decisions. So I think open source, the distribution of open source, powers a certain kind of non-falsifiability that I think is very important. I think on the open source models, like it's going great in the fact that LoRA I think came out of the necessity of open source models needing to be fine-tunable in some way. Yeah, and I think DPO also came out of the academic open source side of things. So do any of the closed source labs, did any of them already have LoRA or DPO internally? Maybe, but that does not advance humanity in any way. It advances some company's probability of doing the winner-takes-all that I talked about earlier in the podcast.Open Source and TrustI don't know, it just feels fundamentally good. Like when people try to, you know, people are like, well, what are the ways in which it is not okay? I find most of these arguments, and this might be a little controversial, but I find a lot of arguments based on whether closed source models are safer or open source models are safer very much related to what kind of culture they grew up in, what kind of society they grew up in. If they grew up in a society that they trusted, then I think they take the closed source argument. And if they grew up in a society that they couldn't trust, where the norm was that you didn't trust your government, obviously it's corrupt or whatever, then I think the open source argument is what they take. I think there's a deep connection to like people's innate biases from their childhood and their trust in society and governmental aspects that push them towards one opinion or the other. And I'm definitely in the camp of open source is definitely going to actually have better outcomes for society. Closed source to me just means centralization of power, which, you know, is really hard to trust. So I think it's going well.
Intel adds AI to Audacity! NVIDIA GSP lands in Linux Kernel 6.7, updates for 22 year old AMD GPUs, and two weeks with a Raspberry Pi 5 desktop.
► Thanks to Vanta for sponsoring today's video! Check out Vanta's 3 Minute Demo Video here: https://vanta.com/UFD Click our link vanta.com/ufd in the description below to watch Vanta's 3 Minute Demo Video. Big thanks to Vanta for sponsoring today's video. ► Check out today's hottest tech deals here: https://www.ufd.deals/ https://howl.me/cliOmM9u17L https://howl.me/cliOrpnhc4k 0:00 - Intro 00:12 - Apple Watch Ban Paused: https://tinyurl.com/yuxfueg8 https://tinyurl.com/yp6ay6cl 01:18 - Sponsor 02:19 - Google's Own Green Bubble Situation: https://tinyurl.com/yvrhkfmd 03:18 - Prime Video Is Paid Video: https://tinyurl.com/ymx4pb2h 04:26 - Goodbye WiiU & 3DS: https://tinyurl.com/ynh86h46 05:25 - UFD Deals: https://www.ufd.deals/ https://howl.me/cliOmM9u17L https://howl.me/cliOrpnhc4k 06:11 - NYT Suing OpenAI & Microsoft: https://tinyurl.com/ylbo2mfz 07:51 - Seasonic Beta Testing Cables: https://tinyurl.com/yv43qj8m https://tinyurl.com/ykdtjtle https://tinyurl.com/yp8tvv8d 08:53 - More Good AMD GPUs: https://tinyurl.com/yln6xaps https://tinyurl.com/yto3mrbt https://tinyurl.com/yq87vnb7 10:10 - Comment Response ► Follow me on Twitch - http://www.twitch.tv/ufdisciple ► Join Our Discord: https://discord.gg/GduJmEM ► Support Us on Floatplane: https://www.floatplane.com/channel/ufdtech ► Support Us on Patreon: https://www.patreon.com/UFDTech ► Twitter - http://www.twitter.com/UFDTech ► Facebook - http://www.facebook.com/ufdtech ► Instagram - http://www.instagram.com/ufd_tech ► Reddit - https://www.reddit.com/r/UFDTech/ Presenter: Brett Sticklemonster Videographer: Brett Sticklemonster Editor: Rikus Strauss Thumbnail Designer: Reece Hill
Valve announces OLED Steam Decks! 3DMark Steel Nomad is headed to Linux, TuxClocker gets support for AMD GPUs, a mysterious Half-Life 1 update, and the worst distribution for Linux gaming.
Timestamps: 0:00 no visible mold. it's fine 0:08 Microsoft completes Activision Blizzard deal 1:57 Best Buy ending physical media sales 3:28 RISC-V: don't block China 4:44 The Ridge Wallet 5:30 OnePlus Open Foldable next week 6:15 Sonos loses lawsuit, Google adds back features 7:10 Nvidia's Jensen Huang "most popular CEO" 7:53 U.S. v. Google emails 8:29 CS2 players with AMD GPUs get banned News Sources: https://lmg.gg/h98QD
Our 139th episode with a summary and discussion of last week's big AI news, now back with the usual hosts! Read out our text newsletter and comment on the podcast at https://lastweekin.ai/ Email us your questions and feedback at contact@lastweekin.ai Check out our sponsor, the SuperDataScience podcast. You can listen to SDS across all major podcasting platforms (e.g., Spotify, Apple Podcasts, Google Podcasts) plus there's a video version on YouTube. Timestamps + Links: (00:00) Intro / Banter (01:30) SuperDataScience podcast ad (03:07) Response to listener comments / corrections Tools & Apps(05:59) ChatGPT goes multimodal: now supports voice, image uploads (10:44) ChatGPT can now search the web in real time (13:30) Meet the A.I. Jane Austen: Meta Weaves A.I. Throughout Its Apps (18:26) Meta's new AI chatbot trained on public Facebook and Instagram posts (19:32) Windows 11's next big update is now available with Copilot, AI-powered Paint, and more (20:21) Adobe launches Photoshop's web version with Firefly-powered AI tools (21:32) YouTube Unveils Suite of AI-Powered Tools For Video Creators (23:20) Google is opening up its generative AI search experience to teenagers Applications & Business(25:00) AI startup Lamini bets future on AMD's Instinct GPUs (29:28) Amazon to Invest Up to $4 Billion in AI Startup Anthropic (36:24) China to Challenge ASML with a better technology than EUV (40:00) Intel says newest laptop chips, software will handle generative AI (43:03) Getty made an AI generator that only trained on its licensed images (46:02) AI Startup Writer Pens $100M Round (46:40) AI chip startup Kneron secures $49M investment (48:45) AI startup AlphaSense valued at $2.5 billion after latest funding round (51:01) A Silicon Valley Supergroup Is Coming Together to Create an A.I. Device (52:42) Microsoft Cloud hiring to "implement global small modular reactor and microreactor" strategy to power data centers (55:24) Google adds a switch for publishers to opt out of becoming AI training data Projects & Open Source(56:11) Mistral AI makes its first large language model free for everyone (01:01:12) Adept AI Labs Open-Sources Persimmon-8B: A Powerful Fully Permissively-Licensed Language Model with
Timestamps: 0:00 TikTok fined, opens store 1:39 No flagship AMD GPUs next gen, claims leaker 3:09 Lian Li files suit against Phanteks 4:13 UPDF 5:05 QUICK BITS INTRO 5:13 Google's iFixit partnership, can't fix Pixel watch 5:41 Chromebooks now get 10 years of updates 6:21 The HP Spectre Fold 7:04 Nintendo Switch 2 leaks 7:45 US Army wants more HoloLens News Sources: https://lmg.gg/Qs7KF
► Big thanks to Jawa for sponsoring today's video! Go to https://bit.ly/JawaUFDGPUSep23 to make your upcoming GPU upgrade cheaper by selling your old one directly to Jawa! ► Check out today's hottest tech deals here: https://www.ufd.deals/ https://howl.me/ckwUNiDxbqO https://howl.me/ckwUKYJFGFH https://howl.me/ckwUI03K1vu https://howl.me/cjXcf2qsa5F 0:00 - Intro 00:27 - Toyota Doesn't Have Enough Disk Space: https://bit.ly/4822jdb 01:31 - Sponsor 02:47 - New Android: https://bit.ly/3sKeYBv 04:08 - Google Leaks Pixel 8 Pro: https://engt.co/487qzL8 04:57 - UFD Deals: https://www.ufd.deals/ https://howl.me/ckwUNiDxbqO https://howl.me/ckwUKYJFGFH https://howl.me/ckwUI03K1vu https://howl.me/cjXcf2qsa5F 06:46 - Tesla Accessories: https://bit.ly/486OjyE https://bit.ly/3P7Iqsr 08:42 - AMD's New Tech: https://bit.ly/3P8yVcx https://bit.ly/3R5UnS8 09:40 - New AMD GPUs: https://howl.me/ckwWZ2u6vPk https://howl.me/ckwW3IuqxS0 https://bit.ly/3r1K1YO https://bit.ly/3L8uRHV 14:09 - Last Video Comment Response: https://youtu.be/nJn-Yv-Ktxs?si=ewwMK-Qd9E54Vx2b ► Follow me on Twitch - http://www.twitch.tv/ufdisciple ► Join Our Discord: https://discord.gg/GduJmEM ► Support Us on Floatplane: https://www.floatplane.com/channel/ufdtech ► Support Us on Patreon: https://www.patreon.com/UFDTech ► Twitter - http://www.twitter.com/ufdisciple ► Facebook - http://www.facebook.com/ufdtech ► Instagram - http://www.instagram.com/ufd_tech ► Reddit - https://www.reddit.com/r/UFDTech/ Presenter: Brett Sticklemonster Videographer: Brett Sticklemonster Editor: Catlin Stevenson Thumbnail Designer: Reece Hill
► Thanks to soundcore for sponsoring today's video! Turn Down the Crowd with soundcore Space One Noise Cancelling Headphones
Mike had to bow out due to illness (he actually can do that), and right when the topic is this one: Linus Tech Tips (main channel on YouTube with 15.4 million subscribers) is under enormous criticism for a lack of care in benchmarks with sometimes outlandish results, for more than just unprofessional handling of test samples provided on loan (a prototype was auctioned off), and for overworked employees. On top of that, a former social media manager is raising serious allegations: a toxic work environment, sexism, harassment, psychological pressure. Somewhat more entertaining: there are new leaks about a Windows handheld from Lenovo called "Legion Go". Leaked product renders show detachable controller side pieces like on a Nintendo Switch. Meep and Mo ask themselves: can this really be real? And as a bonus: a proper "Dom-Schott moment" in the podcast. Have "fun" with episode 166! Speakers: Meep, Mohammed Ali Dad Editing: Michael Kister Cover art: Mohammed Ali Dad Visit us on Discord https://discord.gg/SneNarVCBM on Twitter https://twitter.com/technikquatsch on Bluesky https://bsky.app/profile/technikquatsch.bsky.social on YouTube https://www.youtube.com/@technikquatsch 00:00:00 Wedding and Gamescom preparations 00:14:43 Europa Park: even more broken https://www.merkur.de/deutschland/baden-wuerttemberg/rust-freiburg-geruest-zusammengebrochen-verletzte-freizeitpark-unfall-europa-park-zr-92460713.html 00:25:49 Linus Tech Tips under pressure: lack of care and missing quality assurance, unauthorized sale of a loaned prototype, excessive workload, and a toxic work environment https://www.theverge.com/2023/8/16/23834190/linus-tech-tips-gamersnexus-madison-reeves-controversy Gamers Nexus: "The Problem with Linus Tech Tips: Accuracy, Ethics, & Responsibility" https://www.youtube.com/watch?v=FGW3TPytTjc https://www.reddit.com/r/LinusTechTips/comments/15r7odj/linus_response/ Gamers Nexus: "HW News - Linus Tech Tips' Terrible Response, ESMC, & Starfield x AMD GPUs" https://www.youtube.com/watch?v=X3byz3txpso Linus Tech Tips: "What do we do now?" https://www.youtube.com/watch?v=0cTpTMl8kFY (monetization has since been disabled) 00:45:49 The "Madison" case: former social media manager reports a toxic work environment and sexism https://threadreaderapp.com/thread/1691693740254228741.html 01:11:47 Lenovo handheld "Legion Go": new images with detachable "JoyCons" https://windowsreport.com/lenovo-legion-go-first-look/ https://www.theverge.com/2023/8/16/23835191/lenovo-legion-go-leaked-images-steam-deck-competitor https://twitter.com/gamingonlinux/status/1692215873061204348 01:26:07 Sign-off
Mentioned videos: The Problem with Linus Tech Tips: Accuracy, Ethics, & Responsibility - https://youtu.be/FGW3TPytTjc HW News - Linus Tech Tips' Terrible Response, ESMC, & Starfield x AMD GPUs https://www.youtube.com/watch?v=X3byz3txpso You Guys are Mad. I Get it. - WAN Show February 3, 2023 - https://youtu.be/_aSFKdBm944?t=155 Is It Even Possible For Reaction Content To Be Ethical? - https://youtu.be/i9u_J6BMGJI The True Harm of Reaction Content - Featuring MoistCr1tikal - https://youtu.be/cqWNmXYC76A ✔Twitch Channel ~ https://www.twitch.tv/DarkViperAU ✔Main Channel ~ https://www.youtube.com/@DarkViperAU ✔Rambles Podcast ~ https://www.youtube.com/@ViperRambles ✔Livestreams Channel ~ https://www.youtube.com/@DarkViperAULivestreams ✔Clips Channel ~ https://www.youtube.com/@DarkViperAUClips ✔Merch ~ https://darkviperau-merch.myspreadshop.com/ ✔My Discord ~ https://discord.gg/darkviperau ✔DarkviperAU Subreddit ~ https://www.reddit.com/r/DarkViperAU ✔YouTube Membership ~ https://www.youtube.com/darkviperau/join ✔Patreon ~ https://www.patreon.com/DarkViperAU ✔Twitter ~ https://www.twitter.com/DarkViperAU ✔Instagram ~ https://www.instagram.com/DarkViperAU/ ✔TikTok ~ https://www.tiktok.com/@darkviperau ✔GTA 5 Speedrun Discord ~ https://discord.com/invite/zQt8wZg ✔Business Email ~ darkviperau@mgmtexe.com Timestamps: 00:00 - What happened? 02:20 - LTT uploading videos i 04:40 - My experience with growing a Youtube channel. 15:50 - Linus Tech Tips responded with an apology video. 16:30 - Allegations made by Madison on Twitter. 20:50 - People are mad about the LTT video apology. 25:55 - Did Linus talk about Madison when she quit the job at LMG? 26:20 - Ex-employee Colin gives his take on the claims made by Madison. 27:13 - Madison response to the leaked meeting after she quit.
Invites are going out for AI Engineer Summit! In the meantime, we have just announced our first Actually Open AI event with Brev.dev and Langchain, Aug 26 in our SF HQ (we'll record talks for those remote). See you soon (and join the Discord)!Special thanks to @nearcyan for helping us arrange this with the Eleuther team.This post was on the HN frontpage for 15 hours.As startups and even VCs hoard GPUs to attract talent, the one thing more valuable than GPUs is knowing how to use them (aka, make GPUs go brrrr).There is an incredible amount of tacit knowledge in the NLP community around training, and until Eleuther.ai came along you pretty much had to work at Google or Meta to gain that knowledge. This makes it hard for non-insiders to even do simple estimations around costing out projects - it is well known how to trade $ for GPU hours, but trading “$ for size of model” or “$ for quality of model” is less known and more valuable and full of opaque “it depends”. This is why rules of thumb for training are incredibly useful, because they cut through the noise and give you the simple 20% of knowledge that determines 80% of the outcome, derived from hard-earned experience.Today's guest, Quentin Anthony from EleutherAI, is one of the top researchers in high-performance deep learning. He's one of the co-authors of Transformers Math 101, which was one of the clearest articulations of training rules of thumb. We can think of no better way to dive into training math than to have Quentin run us through a masterclass on model weights, optimizer states, gradients, activations, and how they all impact memory requirements.The core equation you will need to know is the following: `C = τT = 6PD`, where C is the compute required to train a model, P is the number of parameters, and D is the size of the training dataset in tokens. C is also equal to τ, the throughput of your machine measured in FLOPs (Actual FLOPs/GPU * # of GPUs), multiplied by T, the amount of time spent training the model.Taking Chinchilla scaling at face value, you can simplify this equation to be `C = 120(P^2)`.These laws are only true when 1000 GPUs for 1 hour cost the same as 1 GPU for 1000 hours, so it's not always that easy to make these assumptions, especially when it comes to communication overhead. There's a lot more math to dive into here between training and inference, which you can listen to in the episode or read in the articles. The other interesting concept we covered is distributed training and strategies such as ZeRO and 3D parallelism. As these models have scaled, it's become impossible to fit everything on a single GPU for training and inference. We leave these advanced concepts to the end, but there's a lot of innovation happening around sharding of params, gradients, and optimizer states that you must know is happening in modern LLM training. If you have questions, you can join the Eleuther AI Discord or follow Quentin on Twitter.
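To make that equation concrete, here is a minimal back-of-the-envelope sketch of the 6PD rule and the resulting wall-clock estimate. The model size, token count, GPU count, and per-GPU throughput below are illustrative assumptions for the example, not figures from the episode.

```python
# Rough training-compute and wall-clock estimate using C = tau * T = 6 * P * D.
# All concrete numbers below are illustrative assumptions.

def training_compute_flops(params: float, tokens: float) -> float:
    """Total training compute in FLOPs, using the 6PD approximation."""
    return 6 * params * tokens

def training_days(compute_flops: float, n_gpus: int, achieved_tflops_per_gpu: float) -> float:
    """Wall-clock days given cluster size and sustained per-GPU throughput."""
    cluster_flops_per_sec = n_gpus * achieved_tflops_per_gpu * 1e12
    return compute_flops / cluster_flops_per_sec / 86_400

P = 7e9                      # assumed 7B-parameter model
D = 20 * P                   # roughly Chinchilla-optimal tokens, so C == 120 * P**2
C = training_compute_flops(P, D)
days = training_days(C, n_gpus=64, achieved_tflops_per_gpu=140)  # assumed A100-class throughput
print(f"{C:.2e} FLOPs, roughly {days:.1f} days on 64 GPUs")
```

This is the same identity the episode frames as a workflow: benchmark throughput, solve for time, and from time get cost.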
Show Notes* Transformers Math 101 Article* Eleuther.ai* GPT-NeoX 20B* BLOOM* Turing NLG* Mosaic* Oak Ridge & Frontier Supercomputer* Summit Supercomputer * Lawrence Livermore Lab* RWKV* Flash Attention * Stas BekmanTimestamps* [00:00:00] Quentin's background and work at Eleuther.ai* [00:03:14] Motivation behind writing the Transformers Math 101 article* [00:05:58] Key equation for calculating compute requirements (tau x T = 6 x P x D)* [00:10:00] Difference between theoretical and actual FLOPs* [00:12:42] Applying the equation to estimate compute for GPT-3 training* [00:14:08] Expecting 115+ teraflops/sec per A100 GPU as a baseline* [00:15:10] Tradeoffs between Nvidia and AMD GPUs for training* [00:18:50] Model precision (FP32, FP16, BF16 etc.) and impact on memory* [00:22:00] Benefits of model quantization even with unlimited memory* [00:23:44] KV cache memory overhead during inference* [00:26:08] How optimizer memory usage is calculated* [00:32:03] Components of total training memory (model, optimizer, gradients, activations)* [00:33:47] Activation recomputation to reduce memory overhead* [00:38:25] Sharded optimizers like ZeRO to distribute across GPUs* [00:40:23] Communication operations like scatter and gather in ZeRO* [00:41:33] Advanced 3D parallelism techniques (data, tensor, pipeline)* [00:43:55] Combining 3D parallelism and sharded optimizers* [00:45:43] Challenges with heterogeneous clusters for distribution* [00:47:58] Lightning RoundTranscriptionAlessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, writer and editor of Latent Space. [00:00:20]Swyx: Hey, today we have a very special guest, Quentin Anthony from Eleuther.ai. The context for this episode is that we've been looking to cover Transformers math for a long time. And then one day in April, there's this blog post that comes out that literally is called Transformers Math 101 from Eleuther. And this is one of the most authoritative posts that I've ever seen. And I think basically on this podcast, we're trying to give people an intuition around what are the rules of thumb that are important in thinking about AI and reasoning by AI. And I don't think there's anyone more credible than the people at Eleuther or the people training actual large language models, especially on limited resources. So welcome, Quentin. [00:00:59]Quentin: Thank you. A little bit about myself is that I'm a PhD student at Ohio State University, starting my fifth year now, almost done. I started with Eleuther during the GPT-NeoX20B model. So they were getting started training that, they were having some problems scaling it. As we'll talk about, I'm sure today a lot, is that communication costs and synchronization and how do you scale up a model to hundreds of GPUs and make sure that things progress quickly is really difficult. That was really similar to my PhD work. So I jumped in and helped them on the 20B, getting that running smoothly. And then ever since then, just as new systems challenges arise, and as they move to high performance computing systems and distributed systems, I just sort of kept finding myself falling into projects and helping out there. So I've been at Eleuther for a little bit now, head engineer there now, and then finishing up my PhD and then, well, who knows where I'll go next. [00:01:48]Alessio: Awesome. What was the inspiration behind writing the article? Was it taking some of those learnings? 
Obviously Eleuther is one of the most open research places out there. Is it just part of the DNA there or any fun stories there? [00:02:00]Quentin: For the motivation for writing, you very frequently see in like the DL training space, like these Twitter posts by like, for example, like Stas Bekman at Hugging Face, you'll see like a Twitter post that's like, oh, we just found this magic number and everything is like 20% faster. He's super excited, but doesn't really understand what's going on. And the same thing for us, we very frequently find that a lot of people understand the theory or maybe the fundamentals of why like AI training or inference works, but no one knows like the nitty gritty details of like, how do you get inference to actually run correctly on your machine split across two GPUs or something like that. So we sort of had all of these notes that we had accumulated and we're sort of sharing among engineers within Eleuther and we thought, well, this would really help a lot of other people. It's not really maybe appropriate for like a paper, but for something like a blog post or technical report, this would actually maybe squeeze a lot of performance out of people's hardware they're already running on. So I guess there are a lot of projects in Eleuther that we're sort of trying to share notes with people in a way that typical institutions don't. They sort of live within that institution and then you go to a different institution and they do something very similar, but without the lessons of the previous. And it's because everyone's trying to do their own special sauce with their own stack. Whereas Eleuther, we don't really have that constraint and we can just share everything to everybody. [00:03:14]Swyx: Yeah, this is a level of openness that basically very few people actually embrace. One, it's an extra effort to write things down, of course, but two, it is secret sauce and so that not many people do it. And therefore, oftentimes the only way to learn this stuff is to actually work in one of the large model labs. And so you guys are doing a lot. The only other instance where I can think of where people actually open sourced their process was Facebook's OPT. What else is similar, like sort of trade knowledge, but not formal research knowledge? [00:03:45]Quentin: I would say Bloom. So the Hugging Face Bloom project in big science and all of that, that was very open. I'd say it's the same caliber, if not more detailed than OPT. Other than that, I think there was like a doc from Microsoft on like their Turing NLG. Their paper is pretty relaxed in that it did talk about some of those challenges. Other than like OPT and Bloom and us, I can't think of any. It's a new thing. [00:04:10]Swyx: It matters that you are going for the sort of good enough rules of thumb, because I think a lot of people try to go for precision and being overly precise actually is not helpful. Right. Yes. [00:04:20]Quentin: You'll see some like statements in the blog posts that are just like, we think this is about 1.2 in our experience. And, you know, we don't go any further into detail and it would take maybe an extra month for us to chase down every single little piece of memory. But instead, like getting good enough is still helpful to people. [00:04:36]Alessio: Let's jump into it. The first part of the article, and we'll put this in the show notes so people will be following along with the post. So we don't need to read every single equation and every footnote for it. [00:04:46]Swyx: Okay. 
[00:04:46]Alessio: But the core equation here is that not the cost of compute, but the compute required to train a transformer model is roughly equal to tau times T, where tau is the hardware setup throughput that you have. So number of GPUs times the actual flops per GPU. And then T is the time spent. I think people can visualize that pretty easily. It's basically like how many GPUs do you have and how much do you let them run for? And the thing that comes to it that people have read before in the Chinchilla paper in a way, and the OpenAI scaling law, is that you can then equal this to 6PD, where P is the number of parameters in the model and D is the size of the dataset in tokens. So talk a little bit about how people should think about the two. I think a lot of times the focus is on the tokens-to-parameter ratio in the training dataset and people don't think as much about the actual flops per GPU, which you're going to mention later in the blog post too, in terms of how much you can get out. So how should people think about this when they're building a model and when should they go to this equation as they're starting to think about training their own transformer-based [00:05:58]Swyx: model? [00:05:58]Quentin: You touched a little bit on the fact that people usually start with the dataset. So you have some dataset that you want to train a model on. And then from there, from the 6PD, you should see, okay, I should have about six tokens per parameter. So that determines my model size thereabouts for Chinchilla Optimal. Since then we've seen that you need more, something like 20 or more than that, to get a good quality model. But the next question that should be on your mind in terms of a systems perspective is how long is it going to take for this model to train and what kind of budget should I expect? So let's say I want some cloud instance for some amount of time and each of them will have some price attached to it. So that's where the throughput comes in. So now that you have this model, this number of parameters, you should map that to a transformer architecture and you should benchmark what throughput you get on your software stack for that type of model. So now you have your flops per second on a single GPU. And then given whatever parallelism scheme, which I'm sure we'll get into, like data parallelism or tensor parallelism or whatever else, how is that flops number going to scale to whatever number of GPUs? And then from there, you're going to get a time. And if you have a time, you have a cost. Those are like the business answers that you'll be able to get using this formula. That's why we sort of split it into the T and the throughput terms so that you can solve for one of them, which is usually: have throughput, need time, and from time you get cost. In a nutshell, that's the answer. [00:07:19]Alessio: One thing that I noticed, you mentioned some of these laws are only true when a thousand GPUs for one hour cost the same as one GPU for a thousand hours, given that we have a shortage of the biggest GPUs out there. Any thoughts there on how people should prioritize this? [00:07:36]Quentin: Yeah, so I would say you should find what the minimum number of GPUs is to just fit your model first. The memory bottleneck is your biggest problem if you have a sizable model. If it's a small model, nobody cares. But most models that people care about will need to be split across multiple GPUs.
So find the minimum number of GPUs to just fit your one instance of your model and then calculate how long that's going to take. If it's a reasonable amount of time, then you're done. If it takes too long, then you need to start worrying about having multiple instances of that model. I always feel like you should go with the minimum number of GPUs because the more number of GPUs that you have, the more likely it is for things to break. So I would say just find out what time is reasonable for you and then fit the number of GPUs to that and no more. Because people get greedy and they say, if I have twice the GPUs, I can get this done in half the time. And then you end up taking three times the time because everything is breaking every day. And that's when I am up at midnight trying to fix your model that's broken. [00:08:34]Swyx: We had a previous guest which has invested a lot in their framework for training these things. Would there not be an equivalent open source framework you guys would have made that would help with scaling up GPUs linearly like that? Or is this an oversimplification? [00:08:50]Quentin: Okay, yeah. So maybe I should step back. Both Mosaic and us have our own sort of software stack recipe that scales well, theoretically. But I'll get to that in a minute. Mosaic is all based off optimizer sharding. So it's based off ZeRO. So you basically perfectly split your model optimizer and your parameters and your gradients across all of the different GPUs. So your aggregate memory is number of parameters divided by number of GPUs. Same thing for optimizer and so on. Whereas we at Eleuther use a Megatron deep speed based library. And for that, it's a bit more complex. So the efficiency can be a little higher, but it's more prone to failure at the same [00:09:30]Swyx: time. [00:09:30]Quentin: So you kind of have to tune it. In both cases, getting back to like the practical case, you should be able to get linear speed up by adding more GPUs. The problem is that there are hardware failures. You tend to have problems with like maybe loss will overflow if you have too many GPUs or maybe one GPU will hang. You might have software issues. You might have synchronization issues. And that's why I'm saying practically that you should take the minimum number of GPUs that you have because those are the easier cases to debug. That make sense? [00:10:00]Swyx: Yeah. [00:10:00]Quentin: Any more detail on any specific point? [00:10:02]Swyx: Not particularly, just because we haven't actually had to debug those things. But I imagine basically there's a lot of return towards encoding these knowledge into software and not repeating it again. So it makes a ton of sense. I think Alessio had more questions before we move too far into high level, more questions on just the equation itself. I think we want to spend time on essentially, this is the central equation of figuring out compute requirements. Yeah. [00:10:25]Alessio: Another thing in it is that the computer is like the forward pass and like the backwards pass and forward is 2PD, backward is 4PD. Why it's to the ratio between the two? Can you explain that? Why is it two and four? [00:10:39]Quentin: Yeah. [00:10:40]Alessio: Why is it twice the amount? [00:10:42]Quentin: Oh, okay. Intuitively for forward pass, you're just moving, you're propagating forward the inputs through the layer. And then in the backward pass, you're doing something a little more complex than that. You're doing back propagation. 
And I don't think I can explain it intuitively enough to go into more detail on the exact [00:10:58]Swyx: numbers. Yeah. [00:10:58]Quentin: That's okay. [00:10:59]Swyx: I feel like you want to get out a whiteboard and start drawing like, you know. [00:11:02]Quentin: That's what I would normally do. [00:11:03]Swyx: Tangents and gradients. It's actually surprisingly low cost to do the back propagation. Honestly, that's one of the fundamental things I love about the math of deep learning so far as I've explored it, which is, it's surprisingly efficient as compared to other, I guess, numerical methods you might be exposed to in, you know, college calculus. Yeah. [00:11:22]Alessio: And I think the other thing is that things sound simple, you know, when people go on Twitter and say, oh, 20 is like the optimal ratio. And it's like, then it's like, well, why is that the number? And the answer is usually much, much harder, like what we're seeing right now. So I think it's a good reminder that the numbers are simple, like all the best and most popular math equations are so elegant. Obviously the proof behind them is not that easy. That's always a good reminder. [00:11:52]Swyx: I want to put this equation to the test a little bit. We can do this from either GPT-3's perspective or GPT-NeoX, whatever you're more comfortable with. You have this distinction of actual flops versus theoretical flops. And a lot of times when people report the flops it took to train a model, like we just saw one in Llama 2, the estimate is the amount of flops and that's what we go with. So GPT-3 took 3.14 times 10 to the power 23 flops. That is the theoretical flops. I want to get to a point where I can sort of work out if a number passes the smell test. And I wonder how to do that because I should be able to plug in this equation, right? I know that GPT-3 was trained on 300 billion tokens. I know the parameter size is 175. Is it, is it just like 6 times 175 times 300? Like I haven't done the math, but what are the nuances here that you might want to call out? [00:12:42]Quentin: Theoretical flops is usually given from, you have a given set of hardware and this is what you expect your hardware to get. The problem is that in practice, full utilization, that's the key word, right? Because in practice, there are a lot of cases where like you're spending time waiting on data movement from like the GPU to CPU. Or for example, you might be waiting to synchronize across the different GPUs. So there's a lot of idle time basically that you're going to be spending during training. [00:13:05]Swyx: Smell tests. [00:13:06]Quentin: I don't know if I have a smell test myself, to be honest, like maybe I'll look at like what sort of flops you would expect on like an A100. There's sort of just an expected flops for a given GPU that everyone sort of knows what you should expect. So like for an A100, that number is somewhere between 100 and 180 teraflops; that's what you would expect to see on an A100. For a V100, like an older GPU, it's something more like 30 to 40. So people sort of know, given the kernels that we're running for deep learning, what sort of flops you expect. And then you sort of compare that to the theory, to the theoretical flops that people are reporting, and see if that matches your expectations. [00:13:47]Swyx: Yeah.
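As a rough illustration of that smell test, here is a small sketch that plugs the GPT-3 figures mentioned above (175B parameters, 300B tokens) into 6PD and converts the result to GPU-hours; the sustained per-GPU throughput is an assumed number in the 100-180 teraflop range discussed, not a measured one.

```python
# Smell-test sketch: does a reported training FLOP count roughly match 6 * P * D?
P = 175e9                 # GPT-3 parameters (from the discussion above)
D = 300e9                 # GPT-3 training tokens (from the discussion above)
reported = 3.14e23        # reported total training FLOPs

estimate = 6 * P * D      # = 3.15e23, so the reported number passes the smell test
print(f"6PD estimate: {estimate:.2e} FLOPs vs reported {reported:.2e}")

# Converting compute into GPU-time at an assumed sustained throughput:
achieved_tflops = 140                              # assumed A100-class sustained TFLOPS
gpu_hours = estimate / (achieved_tflops * 1e12) / 3600
print(f"roughly {gpu_hours:,.0f} A100-hours at {achieved_tflops} TFLOPS sustained")
```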
[00:13:47]Alessio: And in the article you mentioned for the A100, like if you're seeing below 115 teraflops a second, there's something wrong with your model or hardware. How did you get to 115? Is it just, you know, production observability and like you've seen over months and months and months that like that's the baseline, or how do you come up with numbers like that? Yeah. [00:14:08]Quentin: For a number like that, we basically, we compared a lot of different frameworks. So like I mentioned before, Mosaic has their own framework and we have our own framework. They all have their own flop counters too, right? And we saw across a bunch of different hardware configurations that if you tune things correctly, you should be getting above 115 in pretty much all cases. So like there are some cases where things are tuned poorly or your system is a little weird, but we've never been able to get a new system and not been able to get above [00:14:35]Swyx: 115. [00:14:35]Quentin: If something is below 115, you have something really wrong in your software. But that's really all it is, is just comparing across software stacks and hardware systems. [00:14:44]Alessio: What about different GPUs? We had George Hotz on the podcast and he talked about AMD cards and how in theory their flops should be much better than some Nvidia cards, but the reality is like the CUDA runtime makes up for it. How should people think about improving that? You know, like do you see, okay, the A100 is like 115 teraflops. I'd rather just stick with this than try and figure out all the kinks of like a better AMD card, or any thoughts there? [00:15:10]Swyx: Right. [00:15:10]Quentin: Well, that's sort of touching on developer time, right? And which ends up being more expensive because at the end of the day, the AMD and ROCm software stack has a long way to go. I would say most things run there, not particularly efficiently, but you're going to have weird bugs that no one has encountered before. One of the big pluses of going with the Nvidia and PyTorch stack is that there are thousands of GitHub issues with everyone facing the same problem as you, and resolving them quickly and in an open source way is probably the biggest benefit of going with the Nvidia software stack right now. AMD has about the same hardware; software, not so much. And they haven't quite got the momentum in the open source realm, for example, to get close. Like something, for example, like Flash Attention, it's spread to more Nvidia GPU types than it has to AMD at all. And waiting on those latest and greatest features to reach AMD is something that's prohibitive to a lot of people, but it's getting there. I'm running a lot of experiments on AMD right now because it's sort of reached the government lab supercomputers now. And so a lot of experiments are going there and it will catch up, I'd say within a few [00:16:14]Swyx: years. [00:16:14]Quentin: Awesome. [00:16:15]Swyx: Maybe just talk about what's available from the government labs, and I heard the original, the origin of Eleuther started with a grant for TPUs. Is that right? [00:16:24]Quentin: Yes, that was a little before me, but there was a lot of just grabbing a Google Cloud TPU pod or something like that for a lot of the original TPU work on Mesh TensorFlow, which is like now like an ancient distributed deep learning library. [00:16:36]Quentin: Eleuther got a grant, an INCITE grant with Oak Ridge last year, and we got quite a bit of Summit compute.
So Summit is a V100 based supercomputer. It's got some weirdness to it. So there's six V100 GPUs per node. And we did a lot of experiments there. It's a challenging system to scale to because your interconnect across nodes is kind of slow in comparison to within a node, which I think we'll get to later. But now Oak Ridge has moved to AMD. So the next grant that we're trying to work towards is on Frontier, which has four AMD GPUs per node and again has a slower interconnect across nodes. So we get all of those new challenges again to try and overlap things. But that's just like, you have Oak Ridge, you have Lawrence Livermore. There's a lot of government supercomputers that open researchers can apply for compute on too. It's sort of a new thing. I think we're one of the first; like us and LAION, for example, is another organization that's getting compute from government providers and such. They're all moving to AMD as well. And we look forward to exploring that with them. [00:17:42]Swyx: Yeah. [00:17:43]Alessio: The computing is definitely, it used to be easy to find the GPU. Now, not as much. So you got to find them anywhere. [00:17:49]Swyx: Yes. [00:17:49]Alessio: Let's talk about memory requirements a little bit. So you touched on this a little bit before, and just before this, we had Tri Dao on the podcast from FlashAttention and memory speed was one of our main focuses, but this time we're being bound by actually memory size, like the VRAM itself, when it comes to model weights and parameters and optimizer states and all that fun stuff. Let's go through this and Sean, we can, we can take turns. There's a lot to cover here, but maybe we can start from model weights. So one topic we covered a lot in the past is precision and quantization. That's obviously one of the main drivers of memory. You mentioned in the article, most transformers are mixed precision, like FP16 plus FP32 or BF16 FP32, and they can be cast down. And you mentioned up to like INT8 without a lot of performance hit. So let's start there and maybe run people through some of the maths, like the bytes per parameter ratio at different precisions. [00:18:50]Swyx: Sure. [00:18:51]Quentin: So when I started deep learning, it was all FP32. You have 32 bits, four bytes per parameter. Things were pretty simple. You didn't have to do any loss scaling at all. But the problem was that you didn't get a whole lot of flops once NVIDIA moved to V100s and introduced Tensor cores. So Tensor cores do all of their computation at FP16 precision. So you're kind of throwing all of those away if you're doing things in FP32. So once the hardware moved to V100, the software moved to like mixed precision and APEX and AMP and such. And one counterintuitive part of mixed precision is that you actually require more memory when you're training because you need an FP16 copy of the weights and an FP32 copy of the weights. The FP16 copy is where you're doing like your actual computation on the Tensor cores. So maybe it's not uncommon to get double the throughput that you would see before in FP32. And then at each step you update that FP32 copy with the FP16 update. So both need to be stored in memory. The problem with that is that FP16 is very precise but doesn't have a whole lot of range, [00:19:55]Swyx: dynamic range. [00:19:55]Quentin: So you have a really big mantissa, if you're thinking in terms of like floating point representations, not a whole lot of exponent.
So BF16 puts more of the bits from the mantissa back to the exponent. So you have a much higher range and a lower precision. And that gets rid of all of this instability problem and loss scaling and such that anyone familiar with debugging knows how unstable it can be, especially for large scale training. And BF16 does away with a lot of that, but it's only supported on A100s. So you see the back and forth between hardware and software. So every time NVIDIA introduces some new Tensor cores or BF16 support or something like that, the software adapts to support it and then training adapts. And then now you mentioned like INT8 and such. Now we're seeing that you have some model that's been trained in FP16, FP32, whatever else. And then now you want to, with minimal loss in accuracy, quantize that model into a smaller representation like INT8 and now like INT4 and things like that and see what you can get away with. And then since deep learning is such like a stochastic problem, a lot of those last bits of precision don't really matter, is what we're finding. And I expect that to continue. [00:21:06]Alessio: And so just to put some numbers to it, when you have FP32, you need four bytes per parameter at inference time to load it in memory. If you have an 8-bit model quantized down, you need one byte per parameter. So for example, on an H100, which has 80 gigabytes of memory, you could fit a 70 billion parameter model in INT8; you cannot fit it in FP32 because you would need like 280 gigabytes of memory. So how much does that play into it? Like you mentioned it was all FP32 when you first started. Is it just like a development complexity thing, like going down to FP16 and then INT8? Or if they could get a GPU with like a terabyte of VRAM, will people just load this memory as like FP32 weights or would they still want to quantize them to make them more efficient? Right. [00:22:00]Quentin: I would say even if you had infinite VRAM, you would still want a quantized model, just a bigger model that's quantized is what I would say. And that's because, like I was mentioning there at the end, deep learning is very stochastic and you could have all the precision in the world, but ultimately it's meaningless when you still depend so much like on what the input is. And you depend so much on little variations, and maybe a few more samples of training data would matter more. A lot of that precision in a nutshell doesn't really matter in deep learning. All that matters is the big picture. What is that neuron actually saying? And not the tiny details of what it might be thinking. Oh, I also wanted to mention that even if you have an A100, the actual model size that you could load is quite a bit smaller than what you mentioned. That's because of the KV cache. So the KV cache only matters during inference. Think intuitively: if you're writing a paragraph, you want to remember every single previous word that you've written before you write the next word. So like what is autoregressive language modeling? It's filling in the next word, the next token. So if I say like the dog went to the, and I need to write the next word, I would say park or something. Before I write the next word, my memory is wiped and I have to read the whole thing again. That is life without a KV cache. And a KV cache says, remember everything that I've generated before, as well as all the context before what I've generated.
But the memory overhead for a KV cache commonly is either comparable to or larger than the model in some cases, if you have a really long context. And I think the exact equation is something like, oh, it's like two times the number of layers, times the number of heads, times the dimension of each head. And then there's two of those. You have one for K, one for V. But that was just a quick aside. Yeah. [00:23:44]Alessio: I know this is Transformers math, but do you think one of the interesting things about RNNs too, it's like moving away from this, like KV cache that scales with the sequence length and having like a fixed sequence pass. I know those are some of the things that people are working on. [00:24:00]Swyx: Yeah. [00:24:00]Quentin: So there's a paper that I was involved with called RWKV that I would recommend people read. It is answering this exact question. So how do you get Transformers quality without this quadratic attention overhead that Transformers requires? So it is interesting. I don't know if I can really dive too deep into the technical details there. I'd recommend people read the paper. But yeah. [00:24:23]Swyx: Yeah. [00:24:23]Alessio: It's interesting to see if attention is all you need, or maybe attention is all we need, but we need better ways to make it infer in a good way. [00:24:33]Swyx: We've actually done an unreleased episode with one of the RWKV core members and they call it soft attention or light attention. I forget what they call it, but yeah, just ways to approximate it such that it's linear and not quadratic. That's great. Yeah. [00:24:47]Quentin: I didn't know that you were involved. [00:24:48]Swyx: That's great. How did you get involved? Is it just because like everyone just hangs out in Discord and talks about the future of Transformers? Oh yeah. [00:24:55]Quentin: I mean, the RWKV people specifically are in Eleuther all the time. Like they're in very close collaboration with us. And my contribution was, we have all of these experiments done by all of these people on RNNs and how they relate to Transformers, and how do we turn that into a paper and disseminate that digestibly so that people don't have to read through like a Discord log from a year ago to understand what's going on. [00:25:16]Swyx: Oh my God. [00:25:16]Quentin: Just read this paper. So that took some work, but I wasn't a core contributor. So that's why I don't want to go into like the technical details. But yeah, that's how I got involved. [00:25:24]Swyx: We'll try to get that RWKV episode out. It seems like there's increasing mentions of it and they are doing pretty important work as far as scaling these models is concerned. Okay. So we discussed inference type quantization and memory requirements. And then you also had a section on training with a lot of stuff I think mentioned. I think we probably want to spend most of our time on optimizer states and the Adam optimizer. Yeah. What are your takes on it and what should people keep in mind when they deal with these optimizers? Okay. [00:25:57]Quentin: I would say the Adam optimizer is good at what it does. It's sort of a broad question. So let me think. You have the copy of the weights and then you have your momentum and your variance that [00:26:08]Swyx: you store. [00:26:08]Quentin: And like, okay, maybe an intuitive explanation for momentum is that like, let's say you have a canyon and you're trying to get to the bottom. And if you're just doing basic SGD, then every step is going to be an equal size.
Whereas if you're using something like Adam with the momentum term, then your steps should be progressively larger because you can see, oh, the general trend is we're heading downwards very quickly. But stepping back from that, since you have all of these extra terms in Adam, you require a lot more memory to store it. Like three times as much memory as SGD. And if you have all of this memory being spent on your optimizer states, then how do you distribute it across GPUs? Because you'll find that what ends up being your bottleneck, more than just raw compute, raw flops on a given GPU, is your parallelism. And that falls back onto how much model you can fit on a single GPU before you need to split it up across a bunch of GPUs. And then you end up spending time, more time with them talking to each other than actually making progress. So that's why all of this time in the blog post is spent on how do you distribute your model? What do all those different distributed strategies look like? Which ones are more efficient? And given that a lot of your memory is being spent on optimizers, how do you distribute that optimizer specifically? Because a lot of people, when they talk about parallelism, they talk about model parallelism, the parameters themselves. In actuality, when you're training, a good portion of your memory is actually spent on optimizer states. So what specific part of that would you like to go into? Would you like to go into like ZeRO or sharded optimizers? [00:27:36]Swyx: I think the sharded optimizer stuff is really interesting, but I think we're kind of leaving that towards the end, right? Because that's the maybe more advanced distributed section. Here, I think we're just going for rough intuition for people who maybe are familiar with the ideas of these optimizers, but haven't actually had to implement them yet. They read your code, but they don't really understand the intuition behind the code. I see. [00:28:00]Alessio: And Quentin, when you say in the blog post, it says, Adam is magic. How much of it is like actual magic, even to like people like you that are pretty close to the metal, so to speak? Do some of these things just come as gospel? It's like, I know this works, like I'm not touching it. I'm just leveraging it. How much of it are you actually thinking about improving on in your day-to-day work? I see. [00:28:22]Quentin: So I'm a systems guy. I'm an engineer. And a lot of these things come to me as magic. Adam comes to me as magic. I see it from the gods. I say, this is how a deep learning model is trained. And this is how the next step is calculated. And then I say, okay, how do I make that fast? I would say I do look at ways to improve upon it using things like second order optimizers. So there's a lot of research there because they're hard to distribute. But the core contribution for me always comes down to someone else has done like some deep learning optimization and I need to make it run fast. So I can't really speak to the motivation of why Adam came about other than like simple, intuitive things like I mentioned with like the momentum. But what matters to me is that Adam takes more memory than SGD, specifically three times. And all of that memory needs to go somewhere and it needs to be split efficiently. [00:29:14]Swyx: Yeah. [00:29:14]Alessio: So when you add them all up, you got 12 bytes per parameter with vanilla Adam. [00:29:20]Swyx: Yeah. [00:29:20]Alessio: And then you still have the model parameters in memory too.
So as you mentioned, you need to keep a copy of both for like an FP32/FP16 mixed-precision setup, a copy at both precision levels. So it's six bytes per parameter. Right. [00:29:36]Quentin: Taking a step back again, is that like, okay, most people think of your model getting big. So you need to split with model parallelism purely, something like tensor parallelism. But we can see that the model only takes like two bytes per parameter if we're doing FP16. Whereas the optimizer itself requires four bytes per parameter for the model states, four bytes for momentum, four bytes for variance. So what matters more is how do you split your optimizer efficiently and how do you store it efficiently? And something like bitsandbytes, where you've got like eight-bit Adam, where those optimizer states are only one byte per parameter instead of four or something like that, is going to give you a much better return on your model training and on your memory overhead required than if you were to, for example, quantize your pure like FP16 model weights down to int8 or something. So for training specifically, your optimizer memory matters a lot. The most in most cases. [00:30:31]Swyx: Well, yeah. [00:30:31]Alessio: And before we dive into ZeRO, just to wrap up the items that you're going to shard later. So you have the parameters, you have the optimizer states, and then you have the gradients. Just maybe touch a little bit on that. And then we can talk about how to efficiently load them in GPUs. [00:30:48]Quentin: So the parameters are the FP32 copies of the parameters. We include them in the optimizer discussion. Some people don't, but just for clarity, it's 12 bytes per param for the optimizer states and four of them are for that FP32 copy of the weights. Four of them are for the momentum. I already went into why it's important to store momentum, but that's also per parameter. You need to store where that parameter is going and where it's been going in the past. You also need to know, okay, we know where it's going, but there's going to be bumps on this canyon that we're going down. So we need to store its variance. How often are those bumps? Should we be focusing more on the momentum? Or is this parameter just kind of jumping around everywhere? Those are all important answers that we need the optimizer to store, and it's per parameter. So that's where all three of those terms come from. And we also include some competing optimizers, bitsandbytes and SGD for example, to show that depending on your optimizer, you may store all or none of these, and in different representations. [00:31:50]Alessio: I'm looking at the total training memory. You essentially have model memory, optimizer memory, gradient memory, and activation memory. I think that's one of the last discussed things. So maybe just give people a little bit of a view. [00:32:03]Swyx: Yeah, this is completely new to me. [00:32:05]Alessio: Activation recomputation, checkpointing, and all of that. [00:32:08]Swyx: Right. [00:32:09]Quentin: So, okay. So to summarize before activation checkpointing, which will be complicated, you have your model params, like I mentioned before, they used to be FP32. Now they're probably BF16, maybe FP16 if it's an older GPU. Then you have your optimizer. That's where a lot of the memory is going. And it's your high precision, usually FP32, copy of the weights. So that's four bytes per param.
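Another editorial aside: the per-parameter byte counts in this exchange can be added up in a few lines. The helper below is a sketch under those assumptions (2-byte weights and gradients, 12-byte Adam states including the FP32 master copy); the function name and the 7B example are made up for illustration, and real runs add framework buffers and activations on top.

```python
# Static training memory for mixed-precision Adam, excluding activations,
# using the byte counts discussed in the conversation above.

def static_training_gib(n_params, low_precision_bytes=2):
    weights   = n_params * low_precision_bytes   # BF16/FP16 working copy of the model
    grads     = n_params * low_precision_bytes   # gradients, same precision as the weights
    optimizer = n_params * (4 + 4 + 4)           # FP32 master copy + momentum + variance
    return (weights + grads + optimizer) / 2**30

# A 7B-parameter model already needs on the order of 100 GiB before any
# activations, which is why these states get sharded across GPUs later on.
print(static_training_gib(7e9))  # ~104 GiB
```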
And then you have, optionally, a couple more terms like we just discussed, like momentum or variance or whatever else, depending on what your optimizer is. Then you have your gradients. So your gradients are the gradient updates that we get after running the forward and backward pass on the model. And they're going to be in whatever your low precision copy of the weights is. So like two bytes per param, if you're using FP16 or BF16. And all of those are sort of set in stone. And that overhead is not going to go away for the duration of training. Your gradients might get cleared after you back propagate them, but your optimizer states and your model states aren't going away. That memory overhead will be there. Activation recomputation and activation memory is dynamic. So some people will come and have this problem where the model loads fine for training. But then when you actually run your first iteration, or you run some future iteration or something like that, you run out of memory, seemingly at random. And it's because of these activations that you're computing on the fly. Good summary, or do you want to get into activation recomputation now, or do you want me to touch on anything else? [00:33:35]Alessio: Yeah, I was going to say, when is the recomputation happening? How does it decide between recomputing versus storing? And talk a bit more about that, maybe. [00:33:47]Quentin: Yeah, okay. So there's a lot of different ways to do this, but I would say there are a few main ones. First is a very simple scheme. You recompute everything. Every single activation that you calculate is just going to be either used or thrown away until the end. So in that case, you care very much about memory. You care very little about compute. Maybe this would be a case where you have to distribute across a lot of different GPUs, for example. And your communication speed is really low. Then that might be a good case for you to just recompute everything. It happens rarely, but it happens. Next up would be something like selective recomputation. So in selective recomputation, which Megatron has a good paper on, and I believe the figure that we have in our blog post is from, in that case, you sort of do a weighted decision for each activation. So for really big activation tensors, you decide, is this going to be more expensive to save in terms of memory or to recompute in terms of compute? So that's sort of the smart scheme that Megatron implements. And there's a lot of different heuristics they use. It's probably not worth reading off this super long equation on a pod, but you should go and read that paper if you're interested in selective recomputation. And then a really stupid scheme that most people go with, including NeoX, would be something like, instead of doing all of these heuristics, you just say, if my tensor is bigger than X, I throw it away. And you set X to some static number, and that's it. And that is good enough for a lot of cases. [00:35:18]Swyx: Why is it good enough? [00:35:20]Quentin: You don't want to store more than, you know, an X-sized tensor. And some fall above that, some fall below it. And you're not trying to squeeze. You care more about getting something close enough to what the actual heuristic should be without actually computing the heuristic because you don't want to spend the time writing that heuristic code. [00:35:37]Swyx: Cool. I think that does take us on a grand tour of the memory math. Is there any sort of high-level takeaway before we go into the distributed stuff?
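One more editorial aside: the static-threshold scheme Quentin describes ("if my tensor is bigger than X, I throw it away") can be sketched with PyTorch's generic checkpointing utility. The threshold value and helper name are invented for illustration; this is not how NeoX or Megatron literally implement it.

```python
# A minimal sketch of size-threshold activation recomputation: keep small
# activations, but recompute anything whose input is larger than a fixed budget.

import torch
from torch.utils.checkpoint import checkpoint

CHECKPOINT_THRESHOLD_BYTES = 64 * 2**20  # 64 MiB, a hand-picked static cutoff

def run_block(block, x):
    """Run block(x), recomputing its activations on backward if x is large."""
    if x.numel() * x.element_size() > CHECKPOINT_THRESHOLD_BYTES:
        # Too big to store: drop intermediate activations and recompute them
        # during the backward pass instead.
        return checkpoint(block, x, use_reentrant=False)
    return block(x)
```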
Zero and all that. Perhaps more detail than most people have ever encountered. And so I'll repeat the equation that Alessio mentioned again, which is total training memory now has all these components that you've mapped out for the first time as far as we're concerned. Model memory, optimizer memory, activation memory, gradient memory. We covered quite a few algorithms as to the choices you can make there. Anything else that you want to mention about just memory math? I don't think so. [00:36:11]Quentin: I think that about covers it. I will say that it's a very different scheme for training and inference. It's common for people to say, oh, BF16 is the best. Done. Whereas a more correct take is that during training, precision matters a bit more. So BF16 will be around longer for training than it will for inference, in which case your model is sort of already baked. And it definitely doesn't need some of those last bits of precision so you can get away much easier with going to int8 for inference rather than training. So everything that you learn for training has to be relearned for inference and vice versa. [00:36:44]Swyx: There's a third category. You're talking about training versus inference. This third category is emerging with regards to fine-tuning and perhaps parameter-efficient methods of fine-tuning. The naive way to implement fine-tuning is just to do more training. But I don't know if you've developed any intuitions over fine-tuning that's worth inserting here. Any intuitions? If you were to write fine-tuning math, what would go in there? That might be an interesting diff to training math. [00:37:10]Quentin: I think there's a lot of questions that are unanswered for fine-tuning. For example, we know scaling laws for training. And some people have done scaling laws for fine-tuning. But how does a model that's already been trained on one domain transfer to another in terms of fine-tuning size? How many tokens per parameter should you have for your fine-tuning dataset? Maybe I'm ignorant, but I feel like a lot of those sort of practical questions on how a model can transfer and how a model can learn or grok some new ability that wasn't in its original training dataset is something that I would definitely put inside a fine-tuning blog post. [00:37:45]Swyx: Something related to perplexity and, I guess, diversity of the tokens that you get. [00:37:49]Quentin: Yeah, sort of dataset transfer is something that I would be curious in. Learning rate transfer is another one. So your model has some decayed learning rate over the course of training. How does that change for fine-tuning? Things like that. [00:38:00]Swyx: All right, cool. Thanks for indulging that stuff. Sure. Yeah. [00:38:03]Alessio: I think after all of this, you can quickly do the math and see that training needs to be distributed to actually work because we just don't have hardware that can easily run this. So let's talk a bit about that. So zero is one of the first things that you mentioned here, which is focused on sharded optimizers. Maybe run people through that and how to think about it. [00:38:25]Swyx: Sure. [00:38:25]Quentin: So zero is centered around two communication operations. And the first is scatter. And people should be looking at the zero figure that I think we have. [00:38:35]Swyx: Yeah. [00:38:36]Quentin: So there's a figure in the paper with parameters, gradients, and optimizer states that people should be looking at when I'm talking about this. Every GPU is going to get its own equal portion of the slice. 
And if we're doing... There are different stages of ZeRO, but let's just start off with assuming that it's an equal slice of the optimizer states, gradients, and parameters. That would be ZeRO stage three in that case. And we do that with a scatter. And the scatter takes, say, one over N GPUs, plus this offset, and that slice goes to that GPU. Now all of the GPUs have an equal slice that's in its rank order. And then during each training step, that GPU is going to wait for all of the other slices to communicate so that we now have a whole pie on that GPU, that single GPU. Once we have that whole pie, we do the forward pass on it. And then we distribute that forward pass to all of the others using a gather. So it's a scatter, reduce-scatter specifically, and then a gather back to all the others. And you do that each step. So the point of it is that you're sharding these states across GPUs. And with the different stages, you'll see in that figure that the optimizer state is taking the most proportion, which is because of what I mentioned before. We're including the FP32 copy and we're doing Adam. So we need those four bytes per param for momentum and for variance. And then ZeRO stage one, which is the most common one, is just optimizer. ZeRO stage two is optimizer plus gradients. And ZeRO stage three is optimizer, gradients, and model parameters. But it all comes back to this splitting up and then gathering together back and forth over and over. So you get a lot of communication overhead from ZeRO. But the plus part of that is that you can overlap a lot of that movement with computation. [00:40:23]Alessio: How do you get the optimal number of GPUs to do this on? Is there a way to shard too much as well and put too much overhead? [00:40:31]Quentin: It depends more on what your interconnect is. Taking a step back, there is synchronization that's required, a lot of it, across all of these GPUs. And those tend to be cumulative. So if you go to too many GPUs on an interconnect that's too slow, then you're going to end up spending more time synchronizing. And that magic number where you spend more time synchronizing is going to be different depending on what your fabric is and what your GPU memory is specifically. Just how small of a slice is each GPU getting? I can't give a universal number, but for example, for Summit, that number comes out to be about 20 billion parameters. Now you have 20 billion parameters, and then your magic number of GPUs for that is going to be something like 100 to 200 scale. Beyond that, you're just going to end up spending more time communicating. And the point where your actual flops dip below some number you've predetermined is going to be whatever your sweet spot ends up being. [00:41:24]Alessio: And then, so this one was like hard for me to go through, so I'm excited to have you run through it, which is 3D parallelism. [00:41:33]Swyx: It's fancy, it's cutting edge. [00:41:35]Alessio: Yeah, let's talk a bit more about that and some of the work. [00:41:38]Quentin: Okay, 3D parallelism. So what is each dimension? First is the really basic one. That's data parallelism. And data parallelism is you have a copy of the model. Let's say for simplicity, one copy fits on one GPU perfectly. Data parallelism is that now you have two GPUs, so you have one copy on GPU one, one copy on GPU two. Both of them do the forward and backward pass and then synchronize and average the gradients. And then that's a step. Data parallelism for 3D parallelism is actually ZeRO.
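Editorial aside: the effect of the three ZeRO stages Quentin lists can be approximated with the same per-parameter byte counts used earlier. The function and the 7B-parameters-on-64-GPUs example are illustrative assumptions; real implementations add communication buffers, padding, and activation memory on top.

```python
# Approximate per-GPU memory for the different ZeRO stages: stage 1 shards the
# optimizer states, stage 2 also shards gradients, stage 3 also shards parameters.

def zero_per_gpu_gib(n_params, n_gpus, stage):
    opt, grad, wt = 12.0, 2.0, 2.0   # bytes per parameter: Adam states, grads, weights
    if stage >= 1: opt  /= n_gpus
    if stage >= 2: grad /= n_gpus
    if stage >= 3: wt   /= n_gpus
    return n_params * (opt + grad + wt) / 2**30

for s in range(4):
    print(f"ZeRO stage {s}: {zero_per_gpu_gib(7e9, n_gpus=64, stage=s):.1f} GiB per GPU")
# stage 0 ~104 GiB, stage 1 ~27 GiB, stage 2 ~14 GiB, stage 3 ~1.6 GiB
```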
So it's, you're sharding the optimizer states across all of your different GPUs. Next up is tensor parallelism. Tensor parallelism is you split your model. Like say, if you have two GPUs, you split your model down the middle and each GPU is going to do its forward or backward operation on its tensor specifically. And then only when necessary, it'll synchronize that tensor operation with the other GPU. It's a bit more complex than something like pipeline parallelism, which is the third dimension. In pipeline parallelism, let's say you have four layers in your model. And you have four GPUs. You put one layer on each GPU and then GPU one does the forward pass and then sends the output of its activations to GPU two. It does the forward pass, sends activations to three, and you're just moving down a line. That is a naive scheme in that all of the other GPUs are doing nothing while a single GPU is doing its forward or backward pass. So the reason it's called pipeline parallelism is because you're splitting your mini batch into micro batches. So GPU one will do the forward pass on micro batch one and then send to GPU two. And then while GPU two is running on that first micro batch, GPU one is working on the next micro batch. And so you're sort of pipelining the movement and computation of each micro batch. The problem with that is that you need a really big batch size in order to split it up into both mini batches and micro batches. So combining all three of those together, you get a 3D mesh of where each parameter and optimizer state and so on maps to each GPU. And that's 3D parallelism. So before we start diving into details, did that make sense, and what should I jump into more on? [00:43:55]Alessio: I think the main question is, do you need all of the GPUs to be the same to do this? Or can you have mismatching GPUs as well? [00:44:03]Quentin: Okay, two things matter. If there's a difference in VRAM for the two different kinds of GPUs, then you're going to be bottlenecked by whichever GPU has the lower amount of VRAM because it's going to run out of memory. And then whatever's left on the larger GPUs is just going to sit empty. As far as I'm aware, there's no like single-GPU-aware memory overhead scheme that would account for that. The second problem is that let's say all of your GPUs have the same amount of VRAM, but half of them are really slow. And the problem with that is that those synchronizations that I mentioned earlier are going to kill you. So you're going to move as quickly as your slowest GPU in that case. So in both cases, you end up regressing to your slowest or smallest GPU. So you might as well have the same GPUs for all of them. Otherwise, you're wasting the nicer ones. And that also goes for your CPUs and your interconnect. So going back to the 20 billion parameter model that Eleuther was training, that was on a cluster that was sort of Frankensteined together during COVID when there was all of that shortage of network switches and such like that. So every node had a different network switch. And so you ended up moving at the speed of the slowest switch, and getting everything tuned properly so that it's not worse than the slowest switch was challenging and is like a real world problem that sometimes comes up. [00:45:28]Alessio: Is this work widely accepted? Like I hadn't learned about this before studying for this episode. Is this something that people are still trying and researching? Or is everybody just aware of this and running this in production?
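Editorial aside: the "3D mesh" can be pictured as a mapping from a flat GPU rank to data-, pipeline-, and tensor-parallel coordinates. The axis ordering and the 16-GPU example below are arbitrary assumptions for illustration; real launchers such as Megatron or DeepSpeed pick the ordering so that the chattiest groups (tensor parallel) land on the fastest interconnect.

```python
# Toy rank layout for 3D parallelism: 4-way data x 2-way pipeline x 2-way tensor
# over 16 GPUs. Tensor-parallel peers vary fastest so they can share a node.

def rank_to_coords(rank, tp_size, pp_size):
    """Map a flat GPU rank to (data, pipeline, tensor) coordinates."""
    tp = rank % tp_size                   # tensor-parallel shard within a layer
    pp = (rank // tp_size) % pp_size      # pipeline stage (which layers it holds)
    dp = rank // (tp_size * pp_size)      # data-parallel replica / ZeRO group
    return dp, pp, tp

world_size, tp_size, pp_size = 16, 2, 2
for r in range(world_size):
    dp, pp, tp = rank_to_coords(r, tp_size, pp_size)
    print(f"rank {r:2d} -> data {dp}, pipeline stage {pp}, tensor shard {tp}")
```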
[00:45:43]Quentin: What is this specifically? [00:45:44]Alessio: Like the sharded optimizers plus the 3D parallelism, bringing the two things together and having this kind of mesh strategy. [00:45:51]Quentin: I would say that a lot of major GPT-based models use this scheme. A lot of them now are sort of going with just a pure ZeRO scheme. So just pure sharding. You just shard everything. And then since that's so easy, everyone gets an equal slice. There's no such thing as a pipeline stage. There's no such thing as what tensor should go on which GPU. Instead, we shard everything equally and treat everything equally. It's a much easier problem to debug, to checkpoint, to run training on than it is with this 3D parallel scheme. I'd say 3D parallel gives you the most control and also the most ways to go wrong. And depending on whether you have more engineers or whether you have more GPUs, that should decide which of these you go with. [00:46:35]Swyx: It's also not too hard, right? You've basically outlined the five or six different numbers that you need to keep in your head. And it doesn't feel impossible that if you need to achieve that level of control, you've given everybody the main levers to do it with. And that's wonderful. Definitely. [00:46:51]Quentin: The problem that comes up is like, say, like, okay, GPT-4 came out. Now we have VLMs. [00:46:57]Swyx: Whoa, what are VLMs? Oh, okay. Virtual LLMs, like the Mixture of Experts thing? No, like visual. [00:47:03]Quentin: So now you have like multimodal models and such. How do you distribute that? Do you distribute it in a pipeline stage? And do you just shard it? Do you split the tensor and make a tensor parallel? It's sort of hard to change your model and add new features and such when you have this 3D parallel scheme. That's what I mean when I say hard. I mean, it's hard to sort of adapt and modify it to new features. [00:47:26]Alessio: I know we're at the hour mark, and I think we put our listeners through a very intense class today. So this was great, Quentin. And we're going to definitely link the article so that people can read it and follow along. Any other research that you're working on in this space that you want to shout out? I know one of our usual wrap-up questions is, what's the most interesting unsolved question in AI? So curious to hear if you think it's still on the training and inference math optimization side, or are there more areas that people should pay attention to? [00:47:58]Quentin: I think in my area of research, there are two things that I think people should really care about. And the first is multimodal parallelism and RLHF. We're seeing more and more reinforcement learning coming into the training loop. And so how do you split that, so some GPUs are working on inference and some GPUs are working on training? And like I mentioned before, you have to relearn everything and they have very unique challenges. How do you split up a KV cache during training, for example? Those are challenges that are not well studied, I don't think. And then multimodal, you have like maybe a vision transformer and a text transformer. How do you split those up? Do you split them up equally? Do you put them on separate GPUs or do you just shard everything? And just maybe one GPU will have some vision, some text parameters. And then the second case I would say is that communication is very often a bottleneck. So we talk about 3D parallelism, but a lot of those, for example tensor parallelism, you can't go across nodes with.
You'll just get killed in communication. So what I'm getting to is how should you compress your communication before it happens? So on-the-fly compression: you have some buffer that needs to be communicated. You compress it with a GPU kernel, then you send it across the network and then you decompress it, something like that. Making people spend less money on communication fabrics and more on GPUs as intended is sort of a thing that people need to explore. I think those are my two. [00:49:26]Alessio: Sean, do you want to go over the other half of the lightning round before we wrap it up? [00:49:30]Swyx: That's a good brain dump. Cool. Yeah, I have so many more questions on the multimodal stuff, but that should be for another time. Acceleration, what has already happened in AI that you thought would take much longer? [00:49:42]Quentin: I would say flash attention. Guys, just talk to Tri. And flash attention is just sort of a really great set of kernels that I thought would take a while to get to us. [00:49:51]Alessio: Well, Quentin, thank you very much, man. This was super informative and I think hopefully helps demystify the blog post a little bit. I think people open it and it's like a lot of math on it. And I think you walking them through it was super helpful. So thank you so much for coming on. [00:50:07]Swyx: Of course. [00:50:08]Quentin: And I'm happy to answer any questions that people have offline if they have them. I do read my email. [00:50:13]Swyx: Email and Discord. Of course, yeah. [00:50:15]Quentin: Discord I'm even faster on. [00:50:16]Alessio: Thank you, everyone. [00:50:18]Swyx: Thanks, Quentin. [00:50:19] Get full access to Latent Space at www.latent.space/subscribe
0:00 Sony takes Call of Duty deal 1:24 Who cares about the RTX 4060 Ti 16GB? 2:19 Secretlab 3:53 QUICK BITS 4:01 Corsair buys Drop, or part of it? 4:34 China funds GPU startup using CUDA 5:06 Elon Musk unveils xAI 5:45 EU mandates user-replaceable batteries 6:20 OLED screens repaired by laser beams News Sources: https://lmg.gg/MV5LP
We are now launching our dedicated new YouTube and Twitter! Any help in amplifying our podcast would be greatly appreciated, and of course, tell your friends! Notable follow-on discussions collected on Twitter, Reddit, Reddit, Reddit, HN, and HN. Please don't obsess too much over the GPT4 discussion as it is mostly rumor; we spent much more time on tinybox/tinygrad on which George is the foremost authority! We are excited to share the world's first interview with George Hotz on the tiny corp! If you don't know George, he was the first person to unlock the iPhone, jailbreak the PS3, went on to start Comma.ai, and briefly "interned" at the Elon Musk-run Twitter. Tinycorp is the company behind the deep learning framework tinygrad, as well as the recently announced tinybox, a new $15,000 "luxury AI computer" aimed at local model training and inference, aka your "personal compute cluster":
* 738 FP16 TFLOPS
* 144 GB GPU RAM
* 5.76 TB/s RAM bandwidth
* 30 GB/s model load bandwidth (big llama loads in around 4 seconds)
* AMD EPYC CPU
* 1600W (one 120V outlet)
* Runs 65B FP16 LLaMA out of the box (using tinygrad, subject to software development risks)
(In the episode, we also talked about the future of the tinybox as the intelligence center of every home that will help run models, at-home robots, and more. Make sure to check the timestamps
0:00 heeey haha 0:14 RX 7900 XTX cards overheating 1:23 Kickstarter suspends AI art project 2:46 EU battery legislation 4:08 Vessi Footwear 4:44 QUICK BITS 4:50 Ampere RTX Titan renders 5:22 Artstation removing AI protest images 5:54 Meta to pay $725 million 6:20 France fines Microsoft 60M Euros 6:50 ElonJet returns News Sources: https://lmg.gg/5W54G
This week we talk about The Witcher 3's big upgrade, more details about Armored Core 6, the reception of the new AMD GPUs, a Starfield Q&A, Ubisoft's Star Wars playtesting, new Twitch features, listener questions, community poll results, and content updates. Be sure to check out the links below for ways to support this podcast, get awesome deals, and check out my other content!
Today, Codeplay Software announces expanding oneAPI for Nvidia and AMD GPUs, making multiarchitecture, multivendor programming much easier. This includes new plugins: oneAPI for Nvidia GPUs and oneAPI for AMD GPUs (beta). Software developers are increasingly looking to move away from proprietary interfaces and use open standards that are defined by the industry. This can allow them to target a broad set of platforms and architectures. oneAPI aims to bring this to reality by supporting the widest set of processors and accelerators. As a cross-industry initiative, oneAPI offers an open, standards-based, multiarchitecture programming model and specification. Having official plugins for Nvidia and AMD platforms is a major milestone for the oneAPI open community. The oneAPI Nvidia and AMD GPU plugins seamlessly work with the Intel oneAPI DPC++/C++ Compiler 2023 and many popular libraries, bringing open, standards-based programming to a range of processors from multiple hardware vendors. The oneAPI for Nvidia GPUs and oneAPI for AMD GPUs packages are available to download for free on the Codeplay website. Learn more: Intel oneAPI toolkits intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html Codeplay developer.codeplay.com/home/ oneAPI for NVIDIA GPUs developer.codeplay.com/products/oneapi/nvidia/home/ oneAPI for AMD GPUs (beta) developer.codeplay.com/products/oneapi/amd/home/ James' blog on the Intel oneAPI 2023 Toolkit release intel.com/content/www/us/en/developer/articles/news/oneapi-2023.html Codeplay blog on the oneAPI for NVIDIA and AMD plugins codeplay.com/portal/blogs/2022/12/16/bringing-nvidia-and-amd-support-to-oneapi.html Guests: Gordon Brown – Codeplay oneAPI Product Manager James Reinders – Intel Evangelist
The new AMD GPUs are out and... the benchmarks are a mixed bag. Sam Bankman-Fried has been arrested in the Bahamas. --- Send in a voice message: https://anchor.fm/earlybirbbriefing/message Support this podcast: https://anchor.fm/earlybirbbriefing/support
Greetings listeners and welcome to Tuesday Tech Byte where today I talked about Nvidia, Intel, and AMD GPUs. Enjoy the rest of your day and have a great Technically Unbothered Day! . Be part of the Facebook group https://www.facebook.com/TechnicallyUnbotheredThePodcast . . Share. Like. Subscribe. Review. Download. https://anchor.fm/joshua-nathan . . Find us on The CrossPods https://www.thecrosspods.com/games-hobb . . Catch me streaming almost every day! twitch.tv/22enigma7 https://www.youtube.com/channel/UCp5E3B51Qt0IxNt8DKZexhA?view_as=subscriber.
0:00 it's spatial 0:09 AMD GPUs getting a boost 1:35 Xbox Game Pass on Samsung TVs 2:40 Summer Game Fest 2022 4:03 Brilliant 4:47 QUICK BITS 4:52 'PACman' unpatchable M1 flaw 5:28 Windows 11 pushed to unsupported PCs 5:59 Insteon saved by its users 6:26 Symbiote Linux malware 6:51 Vielight light therapy for GAMERS News Sources: https://lmg.gg/GiYN8
GBA emulator: https://www.nintendolife.com/news/2022/04/rumour-here-are-the-supposed-gba-games-tested-for-switch-online-so-far In-game ads in free-to-play: https://www.gamesindustry.biz/articles/2022-04-18-microsoft-considering-in-game-ads-for-free-to-play-titles-report Immortal metaverse: https://nypost.com/2022/04/15/you-could-become-immortal-in-the-metaverse-with-new-live-forever-mode/ [Crosspost] Nvidia and AMD GPUs are returning to shelves and prices are finally falling https://www.theverge.com/2022/4/19/23031309/nvidia-amd-gpu-price-in-stock-retail-ebay Limited use: https://techcrunch.com/2022/04/15/china-further-limits-access-to-unauthorized-foreign-games/ --- This episode is sponsored by · Anchor: The easiest way to make a podcast. https://anchor.fm/app --- Send in a voice message: https://anchor.fm/wagwan-pod/message
The somewhat unlikely scenario of DF Direct Weekly making its way to a 49th instalment has somehow come to pass! Tom, Alex, Will and Audi are all consigned to Rich's basement this week, meaning that it's just John and Rich to tackle the big news of the week - Cyberpunk 2077's 1.5 update with next-gen features landing, Dying Light 2 receiving VRR unlocked frame-rate support on Xbox Series X and - yes - there's reaction to the reaction of Horizon Forbidden West's release. 00:00:00 Introductions 00:00:55 Cyberpunk Patch 1.5 is out 00:17:07 Dying Light Xbox VRR Mode 00:20:07 DF Bonus Supporter Q: Hey guys, did you have any chance to look at the Dying light 2 VRR mode for Xbox? Seems like it now runs at 80-100 fps on average in performance mode, as far as I can tell through LGs built in VRR info screen. Do you think DRS with a locked 60 would have been better instead? 00:22:28 Nintendo announces the death of 3DS and WiiU storefronts 00:27:52 Chrono Cross Remaster filters can be disabled 00:30:14 Final Vendetta - Bitmap Bureau's new game 00:31:52 Elden Ring Recommended Specs are bizarre 00:37:11 DF Content Discussion: Horizon Forbidden West 00:41:31 DF Bonus Supporter Q: SkillUp's review for Horizon Forbidden West showed significantly more noticeable pop-in than Digital Foundry's. Any idea why that might be? 00:45:16 DF Bonus Supporter Q: Do you think there is enough headroom for a 40 fps mode? 00:46:20 DF Bonus Supporter Q: Do you think AMD FSR would be the better choice for the performance mode of horizon forbidden west? 00:49:12 DF Bonus Supporter Q: What do you think about the fact that all Decima games we've seen so far only lightly utilize TAA? 00:50:30 DF Bonus Supporter Q: What graphical elements of Horizon Forbidden West could benefit from an eventual PC port? 00:56:49 DF Supporter Q1/2: Would it be possible to make a kind of generation comparison video? 01:01:00 DF Supporter Q3: Is there a setting on LG C1 that is suitable for 30 fps games? 01:03:21 DF Supporter Q4: What are some great hdr implementations in games to really convince somebody that hdr is as amazing as some people say it is? 01:06:13 DF Supporter Q5: Who would you recommend to buy AMD GPUs assuming the cards are available at MSRP? Do you think AMD can catch up with Nvidia in RT/ML and does the coming XeSS help AMD? 01:12:42 DF Supporter Q6: Do you think we're going to start seeing larger and larger gaps between 30fps quality modes and 60fps performance modes? And do you expect some games to drop their performance modes all together? 01:15:08 DF Supporter Q7: What do you think is responsible for the lack of four- and even two-Player split-screen multiplayer on modern consoles? 01:19:27 DF Supporter Q8: How do you decide which platform to buy a game on?
Wundertüten episode #3: diZee's 24h race. Jim's streaming PC. A heart-rate monitor for the stream. diZee's Raspberry Pi project. Windows doesn't like AMD GPUs. Graphics cards and monitors. Apple vs. everything else. Is tech a waste of raw materials? diZee and Jim talk about their trips to the USA. Eastern Europe talk.
#Bitcoin #RobertKiyosaki #crypto
I'd like to welcome everyone to my new PODCAST: Dave's Daily Crypto Take. In this channel I will be providing you with news on a daily basis about cryptocurrency, bitcoin, blockchain, FIAT. My main purpose is to share UNBIASED news and updates. Ultimately I learn and hopefully you learn while I go on this journey.
ARTICLES used in today's video:
https://cointelegraph.com/news/here-s-why-pro-traders-expect-further-downside-from-ethereum-price
Here's why pro traders expect further downside from Ethereum price
Derivatives data shows that Ether (ETH) traders are feeling less bullish when compared to Bitcoin (BTC). Even though the altcoin captured a nearly 200% gain in the first half of 2021 versus Bitcoin's modest 22% price increase, traders seem to be more affected by Ether's recent underperformance. Institutional flow also backs the decreased optimism seen in Ether derivatives, as ETH investment vehicles suffered record outflows this past week while Bitcoin flows began to stabilize. According to data from CoinShares, Ether funds experienced a record outflow of $50 million this past week.
https://www.pcgamer.com/cryptocurrency-miners-in-china-are-beginning-to-sell-off-gpus-for-cheap/
Cryptocurrency miners in China are beginning to sell off GPUs for cheap
Could the cryptocurrency boom be starting to ease off in China? Sales of graphics cards from mining groups seems to suggest it may well be. Not to speak too soon, but Taiwanese PC builders spotted heaps of GPUs up for grabs in China, cards likely once used to mine cryptocurrencies but seemingly no longer worth the hassle. According to a screenshot from a mining group posted to PTT, a popular Taiwanese bulletin board, miners in China are selling off a large number of Nvidia and AMD GPUs (via HKEPC). These include the latest 30-series or 6000-series, but also other older GPU generations, such as AMD's cheap and cheerful Polaris generation
https://nypost.com/2021/06/30/cryptocurrency-investors-getting-younger-making-millions/?utm_source=flipboard&utm_medium=syndicated&utm_campaign=partnerfeed
Young investors are abandoning stocks for crypto — and making millions
Is cryptocurrency financial kryptonite? The younger generations don't think so. Gen Z and millennial investors are flocking to buy shares of cryptocoin, such as Bitcoin or Ethereum, with the hopes of winning big. Cryptocurrency — which is a type of digital file that can be used as money, based on blockchain technology — is not controlled by any centralized entity or government, and is nearly impossible to trace.
https://ambcrypto.com/why-well-never-see-these-bitcoin-price-levels-again/
Why 'we'll never see these' Bitcoin price levels again
Bitcoin cannot be predicted is the one thing everyone knows. But can Mathematics actually turn that notion around? Popular analyst Crypto Total Ventures used the concept of Fibonacci also known as the golden spiral in studying the past movement of the king coin. This helped us figure out where, according to these tools, BTC could go and how soon could that be.
https://bitcoinist.com/u-s-citizens-will-soon-be-able-to-buy-bitcoin-across-650-banks/
U.S. Citizens Will Soon Be Able To Buy Bitcoin Across 650 Banks
A new deal with NCR and NYDIG has opened up a way for banks across the U.S. to offer Bitcoin buying options to their customers.
Forbes reported that the deal would enable community banks, including North Carolina-based First Citizens Bank, and credit unions, including Bay Federal Credit Union in California, to offer their customers crypto trading. The trading would be carried out on mobile applications that would be built by the payments provider. The deal will make it so that banks can offer bitcoin trading options without having to worry about regulations surrounding bitcoin ownership. In short, the banks would be able to let their customers trade bitcoin without having to buy any coins themselves.
https://www.entrepreneur.com/article/376213
Robert Kiyosaki warns that the largest market crash in history is approaching
This past Tuesday, investor and author Robert Kiyosaki wrote on his Twitter account that it will soon be the biggest crash the market has seen in world history and that people should start preparing. The good news, he added, is that the best time to get rich is when there is an economic crash. "The biggest bubble in world history is getting bigger. This is the biggest collapse in world history. I will buy more gold and silver. Bitcoin is expected to drop to $24,000. The best time to get rich is collapse." read his tweet in English.
https://alternative.me/crypto/fear-and-greed-index/
https://coinmarketcap.com/
Please subscribe, like, and share so that more and more people can view this content.
DISCLAIMER: I will never give any financial advice. And my channel is not considered official Financial Advice. Please do your research before purchasing any cryptocurrency.
Thank you very much, Dave
Support this podcast at — https://redcircle.com/daves-daily-crypto-take/donations
This week Wes and Charlie talk some more Forza Horizon 4, League of Legends, and Company of Heroes 2. Stonefly, Operation: Tango, Astalon: Tears of Earth, Watch Dogs Legion: Legion of the Dead, and No Man's Sky Prisms update all came out this week. Nvidia's latest driver supports DLSS for Red Dead Redemption 2 and a few other titles. The Crowfall MMO is planned for release in July. Riot Games has also announced Valorant for mobile platforms. New laptops with both AMD GPUs and CPUs are hitting the streets and they look pretty good. This will be the last month for our podcast as Charlie gets set to head off to college. As always, thanks for listening and have a great week!
The automotive industry has delays. The tech industry has delays. Graphics cards are unfindable. Did you know these are all related? That's right, the whole world is scrambling for silicon semiconductors. Your phone has them, your car has them, your alarm clock, fancy doorbell and your new fridge probably too...and they're in short supply. We try to explain why. The Snyder cut has dropped but is it worth the hype? If you're interested in F1 and interested in dating advice, we may have found the right shows for you! News: Last of Us series to have some of the dialogue taken from the game, sometimes deviate greatly, season 1 will cover the first game - The Last of Us HBO Show: Season 1 Adapts the First Game, But Will 'Deviate Greatly' in Some Episodes - IGN Lego announced a Nasa collaboration and a new Space Shuttle and it looks AMAZING Lego announces its biggest and most detailed Space Shuttle set yet - The Verge Global tech silicon shortage: https://thetechcluster.com/global-silicon-shortages-could-threaten-iphone-screen-production/ https://www.msn.com/en-us/money/other/why-is-there-a-silicon-chip-shortage-3-factors-are-to-blame/ar-BB1dYdjW The street prices of Nvidia and AMD GPUs are utterly out of control - The Verge Main: Zack Snyder's Justice League - imdb Drive to Survive - Netflix Hello Ladies - Stephen Merchant - imdb/HBO Contact us: Twitter - Dez and Marco (@dezandmarco) Instagram - Dez and Marco (@dezandmarco) Email - dezandmarco@gmail.com Special Effects: www.zapsplat.com Music: www.bensound.com
4Nilay Patel and Dieter Bohn talk with Verge reporter Allison Johnson about her review of the One Plus 9 and how it compares with their 9 Pro and other Android flagship phones. Politics reporter Makena Kelly joins the show the discuss Congress’ first hearing of 2021 with the chief executives of Facebook, Google, and Twitter. Further reading: AstraZeneca COVID-19 vaccine 79 percent effective in US study US officials publicly question AstraZeneca’s COVID-19 vaccine data Biden’s COVID-19 vaccine website builds on a swine flu tool OnePlus 9 and 9 Pro announced with Hasselblad-branded cameras OnePlus 9 review: cheaper than the Pro and almost as good The lower-cost OnePlus 9R is official, and it sounds surprisingly strong OnePlus 9 Pro review: the elegant Android alternative Here’s how the OnePlus 9 and 9 Pro compare to Samsung and Apple’s flagships The $159 OnePlus Watch is OnePlus’ first smartwatch Angry MacBook owners get class action status for butterfly keyboard suit Intel invests $20 billion into new factories, will produce chips for other companies Intel Unleashed, Gelsinger on Intel, IDM 2.0 - Stratechery The startup trying to augment audio reality in public spaces Yes or no: Are these tech hearings doing anything? Mark Zuckerberg proposes limited 230 reforms ahead of congressional hearing Jack Dorsey is just trolling Congress with Twitter polls now Lina Khan is just the first step toward tougher US tech regulation Congress tries to get the FTC in fighting shape Microsoft rebrands Xbox Live to Xbox network The street prices of Nvidia and AMD GPUs are utterly out of control Learn more about your ad choices. Visit megaphone.fm/adchoices
This is episode 199 of the Mobile Tech Podcast with guests Dave Altavilla and Marco Chiappetta of HotHardware -- brought to you by Audible. In today's show, we share our first impressions of Samsung's Galaxy S21-series and Galaxy Buds Pro. We then dive into CES 2021 news from Samsung, Qualcomm, TCL, LG, OnePlus, Honor, HTC, BlackBerry, and Huawei (phew!), and briefly recap the rest of the show... Enjoy :)Episode Links- Donate: https://tnkgrl.com/tnkgrl/- Support the podcast with Audible: http://AudibleTrial.com/MobileTech- Dave Altavilla: https://twitter.com/DaveAltavilla- Marco Chiappetta: https://twitter.com/MarcoChiappetta- Samsung Galaxy S21-series: https://hothardware.com/news/samsung-galaxy-s21-launch-ces-2021-snapdragon-888- Samsung Galaxy Buds Pro: https://hothardware.com/news/samsungs-galaxy-buds-pro-premium-wireless-earbuds- Samsung Exynos 2100: https://hothardware.com/news/samsung-officially-unveils-exynos-2100-flagship-mobile-chip-to-battle-mighty-snapdragon-888- Samsung Exynos SoCs getting AMD GPUs: https://hothardware.com/news/samsung-2022-flagship-phones-socs-radeon-rdna-2-gpu- Qualcomm's new ultrasonic fingerprint sensor: https://hothardware.com/news/qualcomm-3d-sonic-sensor-gen-2- My Huawei P40 Pro+ review: https://geardiary.com/2021/01/15/huawei-p40-pro-review/- TLC 20-series phones: https://www.theverge.com/2021/1/11/22219101/tcl-20-5g-20-se-android-phone-price-specs- TCL Rollable devices: https://hothardware.com/news/tcl-shows-off-17-inch-scrollable-tablet-and-675-inch-rollable-phone-concepts- LG Rollable phone: https://hothardware.com/news/lg-rollable-smartphone-concept-ces-2021- Oppo Find X3 Pro leaks: https://www.gsmarena.com/oppo_find_x3_pro_leaks_in_officiallooking_renders_showcasing_unique_camera_hump-news-47175.php- OnePlus band: https://www.androidpolice.com/2021/01/11/oneplus-band-unveiled-with-spo2-tracking-and-two-week-battery-life/- Honor MagicBook Pro and Band 6: https://pocketnow.com/honor-announces-magicbook-pro-honor-band-6- HTC Desire 21 Pro 5G: https://www.engadget.com/htc-desire-21-pro-5g-205009669.html- BlackBerry sells smartphone IP to Huawei: https://crackberry.com/blackberry-sells-90-smartphone-patents-huawei- HotHardware's best of CES 2021: https://hothardware.com/news/hot-hardware-picks-best-ces-2021- ASUS' new ZenBook Duo models: https://hothardware.com/news/asus-laptop-and-aio-reveals-ces2021
Welcome to Hardware Addicts, a proud member of the Destination Linux Network. Hardware Addicts is the podcast that focuses on the physical components that powers our technology world. In this episode, we’re going to discuss why there is no hardware in stock anywhere and deliver you a brain filler on one of the alleged culprits GDDR Memory. Then we head to Camera Corner where Wendy discusses creating HDR images… So Sit back, Relax, and Plug In because Hardware Addicts Starts Now! Quick Links: Ryan = https://dasgeekcommunity.com Michael = https://tuxdigital.com Wendy = https://destinationlinux.network Want to Support the Show? Destination Linux Network Store = https://destinationlinux.network/store Want to follow the show and hosts on social media? You can find all of our social accounts at https://hardwareaddicts.org/hosts
This episode is superrr late (like two weeks) and that's my bad, had a lot going on and completely forgot but the show must go on! Let's talk about that AMD GPU announcement, Nvidia finally feeling the heat with the competition that AMD brought to the table. Only time will tell, but this competition is great for the market and great for consumers. We also had to jump into a serious topic...can anyone, and we mean anyone who is willing to work hard... can they become a pro at something through sheer effort alone? Well, we got some thoughts on that.Our Links:linktr.ee/gamingti
AMD GPUs change the game, get one if you can. There's a long performance history between AMD and Intel CPUs, Android keeps phoning home and we review a nice mid-ATX case from be quiet!
Apple M1 CPU is a beast actually, AMD GPUs are competitive again (with caveats), FAA clears 737 Max, Game and Watch thrifty 50, PS5 discussion, The Void is going under, Shower Speaker, James' Gameses
This week's episode is jam packed with sweet and crunchy gaming reviews and news! We chat in-game Halloween celebrations, Cyberpunk 2077 delayed again, new AMD GPUs and more. Finally, we get a glimpse into the next-gen of gaming with Leah as she details her experience going hands-on with the Xbox Series X and reviews Watch Dogs Legion. Crew Check-in Ali's ESO in-game Halloween Costume (00:03:47) Animal Crossing Treat or Treating (00:05:25) Simon manages to order a RTX 3080 (00:19:27) Xbox Series X unboxing (00:27:39) What is Everyone Playing? (00:40:34) Watch Dogs Legion Review (00:47:38) This Week's News (00:58:16) Halo Infinite director steps down (00:58:16) Cyberpunk 2077 delayed again (01:00:16) New AMD RX 6000 GPU series (01:05:50) 1UP is a video game sit-com that includes GamerGate (01:07:30) Outro and Wrap-up (01:18:03) DISCLAIMER: Watch Dogs Legion review code and Xbox Series X review unit provided by Ubisoft Canada and Microsoft Canada respectively. The opinions expressed in the article above have not been affected by, dictated or edited in any way by the provider. For more information please see Girls on Games' Code of Journalistic Ethics. --- Thanks for listening! The GoGCast comes out weekly so make sure to subscribe and you won't miss an episode. For more about us, Girls on Games, check out girlsongames.ca. Find our Merch at http://www.designbyhumans.com/shop/GirlsOnGames/ Buy us a Ko-Fi at https://ko-fi.com/girlsongames
Garrett, Adam, and Luke run through the Nintendo Direct Mini from earlier this week and give their impressions on it. The guys also discuss Cyberpunk 2077 getting delayed again as well as the new AMD GPUs. We now have our own merch! Check it out here: https://teespring.com/stores/gamesarefun Join our Discord server to take part in community discussion and answer the question of the week! https://discord.gg/sVxnEf5 Make sure to follow our Fantasy Critic League! https://www.fantasycritic.games/league/14d9fc30-4a79-4ef6-9c84-398051f9aa9f/2020 Social Links: https://www.facebook.com/gamesarefunpodcast https://twitter.com/gamesarefunpod Follow each of us on Twitter! Luke: https://twitter.com/lukealanarm Adam: https://twitter.com/adampalooza85 Garrett: https://twitter.com/lppanther Tune in every Tuesday to catch new episodes of Games Are Fun! contact@gamesarefunpodcast.com --- Support this podcast: https://anchor.fm/gamesarefun/support
Lots of speculations as usual, but we literally dissect the last 1 year of OnePlus and come to (what we think) is a pretty good theory of why Carl Pei might have left. Oh and Pakistan is unbanning* TikTok and Samsung is gonna put AMD GPUs in their smartphones in the near future. *T&C Check out our clips channel: http://bit.ly/UnengineeredclipsYT Drop your topic suggestions and ask us any questions related to this episode in the comments :) We have a WEBSITE!: https://www.unengineered.net/ Follow us! Instagram: https://www.instagram.com/theunengineeredpodcast/ Advait: https://www.instagram.com/lifeofadvait/ Atique: https://www.instagram.com/anteeqboiii/ Edit by: https://www.instagram.com/fotodada/ #getunengineered #unengineeredpod
Topics Tech - db8151dd – the mystery data breach of 22 million records - Apple Calls FBI Comments on Lack of Help Unlocking Florida Shooter's iPhone an 'Excuse to Weaken Encryption' - Capture an iPhone passcode – an extra feature of GrayKey box - Attacks via Bluetooth possible on iPhone 8 and older - Siri grading whistleblower says Apple should face consequences - Students are failing AP tests because the College Board can't handle iPhone photos - Apple and Google Launch COVID-19 Exposure Notification API, Over 20 Countries Have Requested and Received Access - TikTok rating plummets to 2.0 stars as an Indian YouTuber's fanbase orchestrates one-star campaign - Microsoft president admits the company was on 'the wrong side of history' with open source - Nvidia and AMD GPUs deliver 2.5x speed boost in new Premiere Pro patch - Twitter is trying out new settings for who's allowed to reply to your tweets Gaming - Ubisoft leveraging Stadia for WFH game development - People are extremely unhappy with Doom Eternal's kernel mode anti-cheat - Doom Eternal will ditch Denuvo Anti-Cheat in its next update - Sega, SNK and many others are putting together their own E3-esque conference next month - Amazon's free-to-play PvP shooter Crucible launches today on Steam = G2A admits it sold stolen game keys Software/Hardware Updates - watchOS 6.2.5 enables ECG and Irregular heart rhythm notifications for Apple Watch in Saudi Arabia - Apple releases iOS 13.5, here are the new features Deals - Civilization 6 is free on the Epic Games Store
With the ever-increasing need for efficiency and better price performance, a balanced data center infrastructure is a crucial component of today's cloud solutions. Organizations seeking to maximize the value of their cloud and on-premises infrastructures now have a choice when selecting their compute and GPU platforms. In this session, a representative from AMD discusses how AMD EPYC processors and AMD GPUs can help you optimize your workload performance and security while providing significant cost savings. This presentation is brought to you by AMD, an APN Partner.
In this week's episode of Weekly Download we talk about Borderlands 3, AMD GPUs possibly getting ray tracing support, and of course all of the other tech and PC gaming news for the week! Subscribe to my YouTube channel here: https://youtube.com/c/ZachsTechTurf Connect with me on Twitter and Instagram: @ZachsTechTurf Shadow Streaming Service: https://shop.shadow.tech/usen Discount Code: ZTT
This week: it’s all our favorite gadgets of 2018! Well, not all, but lots of them. We didn’t have time to get through them all but we tried real hard. Plus: Apple is finally gearing up to overhaul the AMD GPUs in their Mac lineup; and the legit way to purchase a base iMac Pro for just $3800. This episode supported by Opsgenie empowers Dev & Ops teams to plan for service disruptions and stay in control during incidents. It also gives teams the power to respond quickly and efficiently to unplanned issues. Visit Opsgenie.com to sign up to get a FREE company account and add up to 5 team members. CultCloth will keep your iPhone X, Apple Watch, Mac and iPad sparkling clean, and for a limited time use code CULTCAST at checkout to score a free CleanCloth with any order at CultCloth.co. Thanks to Kevin McLeod for the music you hear on today’s episode. On the show this week: @erfon / @lkahney / @lewiswallace Mojave update hints at new AMD Vega GPUs coming to Mac https://www.cultofmac.com/595740/mojave-new-amd-vega-gpu-mac/ Apple’s most recent macOS Mojave update suggests we’ll soon see new AMD Vega GPUs coming to the Mac. Apple just updated the GPUs in the Macbook Pro with Vega 16 and Vega 20 options, and now they look to be planning similar upgrades for other Macs in its lineup. Sonos Beam Sonos One Simply Safe security system See Sense Icon 2 bike lights Nintendo Switch Cal Digit TS3 Plus Thunderbolt 3 dock Apple Pencil 2 Apple Watch 4 "Refurbished" iMac Pro on eBay
We are baaaaaack! After 2 weeks off the we are back this week with the podcast talking about our two reviews this week, Intel killing off the "Extreme Edition" branding, No new AMD GPUs until February 2019, Battlefield V, and much more! The post ThinkComputers Podcast #141 (https://thinkcomputers.org/thinkcomputers-podcast-141/) appeared first on ThinkComputers.org (https://thinkcomputers.org) .
We'll cover OpenBSD's defensive approach to OS security, help you with Understanding Syscall Conventions for Different Platforms, Mishandling SMTP Sender Verification, how the cd command works, and the LUA boot loader coming to FreeBSD. This episode was brought to you by
Headlines
Pledge: OpenBSD's defensive approach to OS Security (https://medium.com/@_neerajpal/pledge-openbsds-defensive-approach-for-os-security-86629ef779ce)
The meaning of Pledge is the same as in the real world, that is, "a solemn promise or undertaking". So, in OpenBSD: calling pledge in a program means to promise that the program will only use certain resources. How does it make a program more secure? It limits the operation of a program. Example: You wrote a program named 'abc' that only needed stdio to just print something to stdout. You added pledge to use only stdio and nothing else. Then, a malicious user found out that there is a vulnerability in your program which one can exploit and get into a shell (or root shell). Exploiting your program to open a shell (or root shell) will result in the kernel killing the process with SIGABRT (which cannot be caught/ignored) and will generate a log (which you can find with dmesg). This happens because before executing the other code of your program, the code first pledges not to use anything other than the stdio promise/operations. But opening a shell or root shell will call several other system calls which are distributed across lots of other promises like "stdio", "proc", "exec" etc. They are all forbidden because the program has already promised not to use any promises other than stdio. Pledge is not a system call filter. So, it is not used to restrict system calls. For example:
pledge("read", NULL) -> wrong syntax for pledge()
pledge("stdio inet", NULL) -> correct syntax for pledge()
Pledge works on stdio, dns, inet, etc. promises but not directly on system calls like read, write, etc. And the unique functionality of pledge() is that it works on a behavioral approach, not a 1:1 approach with the system calls. On 11 December 2017, Theo de Raadt said:
List: openbsd-tech
Subject: pledge execpromises
From: Theo de Raadt
Date: 2017-12-11 21:20:51
Message-ID: 6735.1513027251@cvs.openbsd.org
This will probably be committed in the next day or so. The 2nd argument of pledge() becomes execpromises, which is what will get activated after execve. There is also a small new feature called "error", which causes violating system calls to return -1 with ENOSYS rather than killing the process. This must be used with EXTREME CAUTION because libraries and programs are full of unchecked system calls. If you carry on past one of these failures, your program is in uncharted territory and risks of exploitation become high. "error" is being introduced for a different reason: The pre-exec process's expectation of what the post-exec process will do might mismatch, so "error" allows things like starting an editor which has no network access or maybe other restrictions in the future…
Every Journey Starts with a FAIL...or Understanding Syscall Conventions for Different Platforms (http://k3research.outerhaven.de/posts/every-journey-starts-with-a-fail.html)
Introduction
Not long ago I started looking into FreeBSD kernel exploitation. There are only a few resources but probably the best starting point is argp's Phrack article from 2009[0]. And while he does only provide one technique, I wanted to understand it and port it to a modern FreeBSD release before describing new, own researched techniques.
Well, at least this was my plan. In reality I ended up researching how different operating systems, or rather the same operating system but for different architectures, implement syscalls. Hence, new exploitation methods have to wait for another post. In this one I want to describe my personal FAIL while porting argp's exploit example to a FreeBSD 11.1-RELEASE running on a 64bit processor. Maybe this will give other people interested in kernel stuff some insights they didn't know before. If you already know how syscalls work on 32bit and 64bit *BSD because you are an experienced exploit or kernel developer, you will probably want to search for something else to read. Moreover, some of the debugging stuff can look laborious because I wanted to show the steps I took while attacking my problem instead of showing a simple walkthrough to the solution.
The Problem
argp described in his article vulnerable code consisting of a loadable kernel module which exposes a syscall to the userland. Because it was written around the time when FreeBSD 8-RELEASE came out, and because he has written himself that the code needs minor adjustments to work with that version (it was written for FreeBSD 7), I thought I would first port it to FreeBSD 11.1-RELEASE. Moreover, it was written for an Intel 32bit processor architecture, as we can see from his shellcode examples. Hence, I wanted to go the harder way right away and modify it to work on a 64bit processor.
Why the Original Code Worked While It Was Wrong
As written above, the syscall convention for the 32bit architecture is different from the one for the 64bit architecture. Indeed, a syscall on a 32bit FreeBSD system passes the arguments via the stack while the syscall offset is stored in the EAX register. The transfer into the kernel address space is done in 'cpu_fetch_syscall_args' in 'sys/i386/i386/trap.c'.
```
int
cpu_fetch_syscall_args(struct thread *td, struct syscall_args *sa)
{
    ...
    frame = td->td_frame;
    params = (caddr_t)frame->tf_esp + sizeof(int);
    sa->code = frame->tf_eax;
    ...
    if (params != NULL && sa->narg != 0)
        error = copyin(params, (caddr_t)sa->args,
            (u_int)(sa->narg * sizeof(int)));
    else
    ...
}
```
That is, 'params' points to ESP+4 bytes offset. Later, the arguments are copied into the kernel space which is referenced by 'sa->args'. 'args' is an array of eight 'register_t', which is defined as 'int32_t' on the 32bit platform in comparison to the 64bit platform. And as 'struct args' only consisted of integers they got copied into the syscall arguments which are given to the trigger function inside the kernel module. We could verify this by changing 'int op' to 'long long op' in the kernel module and in trigger.c. We get the following output:
root@freebsd64:trigger/ # ./trigger
0x28414000 256 3 1
0x28414000 256 4294967295 2
root@freebsd64:trigger/ #
To bring this to an end: argp's version only worked for his special choice of arguments and only on 32bit. On 32bit FreeBSD platforms the arguments are transferred into kernel space as 4 byte integers, hence it will only work for integers anyway.
iXsystems New Disks! (https://www.ixsystems.com/blog/gdpr-countdown/)
A Life Lesson in Mishandling SMTP Sender Verification (https://bsdly.blogspot.co.uk/2018/02/a-life-lesson-in-mishandling-smtp.html)
It all started with one of those rare spam mails that got through. This one was hawking address lists, much like the ones I occasionally receive to addresses that I can not turn into spamtraps.
The spam message was addressed to, of all things, root@skapet.bsdly.net. (The message with full headers has been preserved here for reference). Yes, that's right, they sent their spam to root@. And a quick peek at the headers revealed that, like most of those attempts at hawking address lists for spamming that actually make it to a mailbox here, this one had been sent by an outlook.com customer.

The problem with spam delivered via outlook.com is that you can't usefully blacklist the sending server, since the largish chunk of the world that uses some sort of Microsoft-hosted email solution (Office365 and its ilk) has its usually legitimate mail delivered via the very same infrastructure. And since outlook.com is one of the mail providers that doesn't play well with greylisting (it spreads its retries across no less than 81 subnets; the output of 'echo outlook.com | doas smtpctl spf walk' is preserved here), it's fairly common practice to just whitelist all those networks and avoid the hassle of lost or delayed mail to and from Microsoft customers.

I was going to just ignore this message too, but we've seen an increasing number of spammy outfits taking advantage of outlook.com's seeming right of way to innocent third parties' mailboxes. So I decided to try both to do my best at demoralizing this particular sender and to alert outlook.com to their problem. I wrote a message (preserved here) with a Cc: to abuse@outlook.com where the meat is,

```
Ms Farell,

The address root@skapet.bsdly.net has never been subscribed to any mailing list, for obvious reasons. Whoever sold you an address list with that address on it are criminals and you should at least demand your money back.

Whoever handles abuse@outlook.com will appreciate the attachment, which is a copy of the message as it arrived here with all headers intact.

Yours sincerely,
Peter N. M. Hansteen
```

What happened next is quite amazing. If my analysis is correct, it may not be possible for senders who are not themselves outlook.com customers to actually reach the outlook.com abuse team.

Any student or practitioner of SMTP mail delivery should know that SPF checks should only happen on ingress, that is, at the point where the mail traffic enters your infrastructure and the sender IP address is still the original one. Leave the check until later, when the message may have been forwarded, and you no longer have sufficient data to perform the check.

Whenever I encounter incredibly stupid and functionally destructive configuration errors like this I tend to believe they're down to simple incompetence and not malice. But this one has me wondering. If you essentially require incoming mail to include the contents of spf.outlook.com (currently no less than 81 subnets) as valid senders for the domain, you are essentially saying that only outlook.com customers are allowed to communicate with you. If that restriction is the result of a deliberate choice rather than a simple configuration error, the problem moves out of the technical sphere and could conceivably become a legal matter, depending on what outlook.com have specified in the contracts they sell to their customers. But let us assume that this is indeed a matter of simple bad luck or incompetence and that the solution is indeed technical. I would have liked to report this to whoever does technical things at that domain via email, but unfortunately there are indications that being their customer is a precondition for using that channel of communication to them. I hope they fix that, and soon.
And then move on to terminating their spamming customers' contracts. The main lesson to be learned from this is that when you shop around for email service, please do yourself a favor and make an effort to ensure that your prospective providers actually understand how the modern-ish SMTP add-ons SPF, DKIM, and DMARC work. Otherwise you may end up receiving more of the mail you don't want than of the mail you do want, and your own mail may end up not being delivered as intended.

News Roundup

Running Salt Proxy Minions on OpenBSD (https://mirceaulinic.net/2018-02-14-openbsd-salt-proxy/)

As I have attempted several times in the past, I am (finally) very close to switching to OpenBSD, a more stable and reliable operating system that I like. Before making the actual change on both my personal and work computers, I started testing some of the tools I currently use to understand what to expect. In general I didn't encounter issues, and when I did, I found the answers in the documentation (which is really great) or on various forums. I didn't, however, find any questions regarding Proxy Minions on OpenBSD, which is why I thought it might be helpful to share my experience.

Installation and Startup

With that said, I started playing with Salt, and it was simple and straightforward. First step: install Salt with pkg_add salt. This will bring in several ports for Python futures, ZeroMQ, and Tornado, which are needed by Salt. After configuring the pillar_roots in the /etc/salt/master config file for the Master, I started the master process using rcctl:

Starting up the Proxy Minions

The Salt package for OpenBSD comes with the rc file for salt-proxy as well, /etc/rc.d/salt_proxy. While you typically run a single regular Minion on a given machine, it is very likely that there are multiple Proxy processes. Additionally, the default Salt rc file has the following configuration for the salt-proxy daemon:

Starting many Proxy Minions

I have managed to start up one Proxy Minion, but what about many? Executing the three commands above for each and every device is tedious and cannot scale very well. I therefore came up with the following approach: have a separate rc file per Proxy, each with the daemon instruction explicitly specifying its Minion ID, then start the service (using the regular Minion that controls the machine where the Proxy processes are running). The test Proxy Minion is then up (after accepting the key, i.e., salt-key -a test). Extending the same to a (very) large number of Proxy Minions, you can easily manage the rc files and start the services using a Salt State executed on the regular Minion: use the file.managed State function to generate the contents of the rc file for each Proxy, with its own Minion ID, and use the service.running State function to start the service. These two steps suffice to start an arbitrary number of Proxy Minions, and the command executed is always the same regardless of how many processes you aim to manage.

Conclusions

I am still a novice when it comes to OpenBSD and I have plenty to learn, but it looks like the transition will be much smoother than I expected. I am already looking forward to the handover, and, most importantly, I will no longer be using systemd. :-)

LUA boot loader coming very soon (https://lists.freebsd.org/pipermail/freebsd-current/2018-February/068464.html)

As you may know, the Lua (http://www.lua.org) boot loader has been in the works for some time. It started out life as a GSoC project in 2014 by Pedro Souza, mentored by Wojciech A. Koszek.
Rui Paulo created an svn project branch to try to integrate it. I rebased that effort into a github branch, which Pedro Arthur fixed up. Over the past year, I've been cleaning up the boot loader for other reasons, and found the time was ripe to start integrating this into the tree. However, those integration efforts have taken a while, as my day-job work on the boot loader took priority. In the meantime, Ed Maste and the FreeBSD Foundation funded Zakary Nafziger to enhance the original GSoC Lua scripts to bring them closer to parity with the evolution of the FORTH menu system since the GSoC project started. I'm pleased to announce that all these threads of development have converged and I'll be pushing the FreeBSD Lua Loader later today.

This loader uses Lua as its scripting language instead of FORTH. While co-existence is planned, the timeline for it is looking to be a few weeks, and I didn't want to delay pushing this into the tree for that. To try the loader, you'll need to build with WITHOUT_FORTH=yes and WITH_LOADER_LUA=yes. Fortunately, you needn't do a full buildworld to do this: you can build in src/stand and install the result (be sure to set the options for both the build and the install). This will replace your current /boot/loader that is scripted with FORTH with one that's scripted with Lua. It will install the Lua scripts in /boot/lua. The boot is scripted with /boot/lua/loader.lua instead of /boot/loader.rc. You are strongly advised to create a backup copy of /boot/loader before testing (e.g. cp /boot/loader /boot/loader_forth), since you'll need to boot that from boot2 if something goes wrong. I've tested it extensively, though, with userboot.so and its test program, so all the initial kinks of finding the Lua scripts, etc., have been worked out.

While it's possible to build all the /boot/loader variants with Lua, I've only tested a BIOS-booting /boot/loader, both with and without menus enabled. I've not tested any of the other variants, and the instructions for testing some of them may be rather tedious (especially UEFI, if you want a simple path to back out). Since there's not been full convergence testing, you'll almost certainly find bumps in this system. Also, all the build-system APIs are likely not yet final.

I put MFC after a month on the commit. Due to the heroic (dare I say almost crazy) work of Kyle Evans on merging all the revs from -current to 11, I'm planning an MFC to 11 after the co-existence issues are hammered out. In 11, FORTH will be the default, and Lua will be built by default, but users will have to do something to use it. In 12, both FORTH and Lua will be built and installed, with Lua as the default (barring unforeseen complications). Once the co-existence stuff goes in, I imagine we'll make the switch to Lua by default shortly after that. In 13, FORTH will be removed unless there's a really, really compelling case made to keep it.

So please give it a spin and give me any feedback, documentation updates and/or bug fixes. I'm especially interested in reviews from people who have embedded Lua in other projects or experts in Lua who can improve the robustness of the menu code.

Bitcoin Full Node on FreeBSD (https://bsdmag.org/5374-2/)

What is a Bitcoin?

Bitcoin is a valuable and popular open-source cryptocurrency that was invented by Satoshi Nakamoto in 2009.
Bitcoins have value because they possess the same characteristics as money (durability, portability, fungibility, scarcity, divisibility, and recognizability), but based on the properties of mathematics rather than on physical properties (like gold and silver) or trust in central authorities (like fiat currencies). In short, Bitcoin is backed by mathematics. Bitcoin is the first decentralized peer-to-peer cryptocurrency that is controlled by its users. Transactions take place directly between users and are later verified by network nodes with digital signatures, then placed in a public distributed ledger called a blockchain. Bitcoin is unique in that only 21 million bitcoins will ever be created. The unit of the bitcoin system is the bitcoin or mBTC.

What is a Bitcoin Wallet?

A wallet is nothing more than a pair of public and private keys that are created by a client to store the digital credentials for your bitcoin. There are several types of wallets: desktop, token, online, and mobile wallets. A token wallet is the safest way to work with the bitcoin network, but you can also use your mobile device or PC as a bitcoin wallet.

What is a Blockchain?

A blockchain is a ledger that records bitcoin transactions. The blockchain is a distributed database that achieves independent verification of the chain of ownership. Each network node stores its own copy of the blockchain. Transactions are broadcast on the bitcoin network, and about 2400 transactions make up a block. These blocks are the building blocks of the blockchain.

What is Mining?

Mining is the process of dedicating computing power to process transactions, secure the network, and keep everyone in the system synchronized. It has been designed to be fully decentralized. Miners need mining software and specialized hardware. Mining software listens for transactions broadcast through the peer-to-peer network and performs the appropriate tasks to process and confirm them. Bitcoin miners perform this work because they can earn transaction fees paid by users for faster transaction processing. New transactions have to be confirmed and then included in a block along with a mathematical proof of work. Such proofs are very hard to generate because there is no way to create them other than by trying billions of calculations per second. Hence, miners are required to perform these calculations before their blocks are accepted by the network and before they are rewarded. As more people start to mine, the difficulty of finding valid blocks is automatically increased by the network to ensure that the average time to find a block remains about 10 minutes. As a result, mining is a very competitive business where no individual miner can control what is included in the blockchain. The proof of work is also designed to depend on the previous block in order to force a chronological order in the blockchain. This makes it exponentially difficult to reverse previous transactions, because it would require recalculating the proofs of work of all subsequent blocks. When two blocks are found at the same time, miners work on the first block they receive and switch to the longest chain of blocks as soon as the next block is found. This allows mining to secure and maintain a global consensus based on processing power.

What is Pooled Mining?

You have a better chance of finding a block if you mine together with others. In a pool, all participating miners get paid every time a participating server solves a block.
The payment depends on the amount of work an individual miner contributed to help find that block.

What is a Full Node?

A full node is a client that fully validates transactions and blocks. Full nodes also help the network by accepting transactions and blocks from other full nodes, validating them, and relaying them to further full nodes. Many people and organizations volunteer to run full nodes using spare computing and bandwidth resources.

What is bitcoind?

bitcoind is a Bitcoin client under the MIT license, available in 32-bit and 64-bit versions for Windows, GNU/Linux-based OSes, Mac OS X, OpenBSD, and FreeBSD.

Conclusion

Cryptocurrencies are a replacement for the banking we know today, and bitcoin is the game changer. Mining bitcoin with typical hardware is not a good idea; it needs specialized devices like ASICs, but you can create a full node and help the bitcoin network.

Useful Links
https://en.wikipedia.org/wiki/Cryptocurrency
https://bitcoin.org/en/faq

***

Latest DRM Graphics work

The DRM graphics stack from Linux is ported to FreeBSD on an ongoing basis to provide support for accelerated graphics on Intel and AMD GPUs. The LinuxKPI bits that the drm-next-kmod driver port depends on have been merged into stable/11 and will be included as part of the upcoming FreeBSD 11.2 (https://svnweb.freebsd.org/ports?view=revision&revision=462202). Additionally, the version of the drivers has been updated from Linux 4.9 to Linux 4.11, with a number of additional devices being supported (https://lists.freebsd.org/pipermail/freebsd-current/2018-February/068690.html)

***

How does cd work? (https://blog.safia.rocks/post/171311670379/how-does-cd-work)

In my last blog post, I dove into some of the code behind the sudo command. I thought this was pretty fun. sudo is one of those commands that I use quite often but hadn't truly had the chance to look into. I started thinking about other commands that I use on a daily basis but have little understanding of the internals of. The first command that came to mind is cd. cd stands for change directory. Simply put, it allows you to set your current working directory to a different directory.

I read through some of the code that was defined in this file. Some of it was in functions, and other bits were in templates, but after a while, I figured out that most of the code was a wrapper around a function called chdir. A lot of the functions defined in the cd.def file linked above actually just invoke chdir and handle errors and parameter cleaning. So, all in all, here is what happens when you run cd on the command line:

1. The cd builtin is invoked as part of the Bash shell.
2. The Bash shell invokes the chdir function.
3. The chdir function is part of Unix and invokes the chdir system call.
4. The Unix kernel executes the chdir call and does its own low-level thing.

I could dive a little more into how #4 works, but let's be honest, I've already read too much code at this point, and my eyes are starting to hurt.
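As a small illustration of steps 3 and 4 above, here is a minimal sketch that calls chdir() directly, the same library function the cd builtin ends up wrapping; the target directory is an arbitrary example.

```
/*
 * Minimal sketch: change the working directory the way cd ultimately
 * does, by calling chdir(), then print where we ended up.
 */
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	/* chdir() asks the kernel to change this process's working directory. */
	if (chdir("/tmp") == -1) {
		perror("chdir");
		return 1;
	}

	char cwd[1024];
	if (getcwd(cwd, sizeof(cwd)) != NULL)
		printf("now in %s\n", cwd);
	return 0;
}
```

Note that the change only affects this process, which is why cd has to be a shell builtin rather than an external program.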
Beastie Bits Stockholm BSD User Group: March 22 (https://www.meetup.com/BSD-Users-Stockholm/events/247552279/) Open Source Hardware Camp 2018 (30/06 & 01/07) Call for Participation (http://mailman.uk.freebsd.org/pipermail/ukfreebsd/2018-February/014182.html) Initial release schedule announcement for FreeBSD 11.2 (https://www.freebsd.org/releases/11.2R/schedule.html) Serious Shell Programming (Devin Teske) (https://www.gitbook.com/book/freebsdfrau/serious-shell-programming/details) SSH Mastery 2/e out (https://blather.michaelwlucas.com/archives/3115) TCP Fast Open client side lands in FreeBSD (https://svnweb.freebsd.org/base?view=revision&revision=330001) Help the Tor BSD Project increase the OS diversity of Tor nodes, for your own safety, and everyone else's (https://torbsd.org/open-letter.html) 5 Differences Between TrueOS & Linux (https://www.kompulsa.com/2018/02/23/5-differences-trueos-linux/) *** Feedback/Questions Ambrose - Bunch of questions (http://dpaste.com/0KRRG18#wrap) Eddy - ZFSoL with single SSD (http://dpaste.com/0MTXYJN#wrap)
Follow up: Jeep Wranglers iMac Pro RAM upgrades iMac Pro input devices Mac Pro will be "upgradeable" iMac Pro orders #askatp Entry-level open-backed headphones (via JD Lewin) Schiit Fulla 2 Grado SR60e: $80, but uncomfortable Beyerdynamic DT 880 Pro: $225, for treble lovers (250-ohm recommended) Sennheiser HD 650: $300, for treble haters Car magazines for kids (via Jeremy Nachman) Motor Trend on YouTube Car and Driver John's observational tweet Cars from a Woman's Perspective Do we really need ECC RAM? (via japhisu) Google study IBM study NVIDIA vs. AMD GPUs in Macs CUDA More about the iMac Pro Table of Skylake Xeon workstation processors T2 Startup Security Utility Recovery procedure Post-show Analog(ue): Marco reflects on his interesting phone call John's autobiography Sponsored by: Squarespace: Make your next move. Use code ATP for 10% off your first order. Backblaze: Online backup for $5/month. Native. Unlimited. Unthrottled. Uncomplicated. AfterShokz: Headphones powered by bone-conduction technology.
CES is just beginning. Join us for announcements from Lenovo, NVIDIA Press Conference, new AMD GPUs and more!