Podcasts about FFmpeg

  • 121 podcasts
  • 231 episodes
  • 1h average duration
  • 5 new episodes weekly
  • Latest: Jun 24, 2026

POPULARITY (chart by year, 2019 to 2026)


Latest podcast episodes about FFmpeg

SANS Internet Stormcenter Daily Network/Cyber Security and Information Security Stormcast
SANS Stormcast Wednesday, June 24th, 2026: Patching vs. Configurations Updates; libssh2 and ffmpeg vuln;

Jun 24, 2026 · 6:48


CVE-2024-40766: The Patch Fixed the Bug. Nobody Fixed the Configuration.
https://isc.sans.edu/diary/CVE-2024-40766%3A%20The%20Patch%20Fixed%20the%20Bug.%20Nobody%20Fixed%20the%20Configuration./33094

libssh2 - Out-of-Bounds Write via Unchecked packet_length in transport.c
https://www.vulncheck.com/advisories/libssh2-out-of-bounds-write-via-unchecked-packet-length-in-transport-c

PixelSmash Critical FFmpeg Vulnerability Turns Media Files into Weapons
https://jfrog.com/blog/pixelsmash-critical-ffmpeg-vulnerability-turns-media-files-into-weapons/

My Upcoming Classes
https://www.sans.org/profiles/dr-johannes-ullrich

SPACE NEWS POD
Building a Million Dollar AI App - Day 1: AI Coaching with OpenAI Vision

Jun 21, 2026 · 4:12


Day 1 of building a million dollar AI app. I'm 20 years into software development and I use AI agents and tools like Lovable to ship faster than I ever could by hand. In this video I walk through the AI coaching feature I just implemented in Stable Manager Pro, an app that helps riding instructors give better feedback to students.

The feature uses OpenAI Vision to analyze a photo of a rider on a horse and return real coaching feedback: overall balance score, posture analysis, hip engagement, upper body angle, and specific coaching cues. The output reads like what a real instructor would say after watching a student ride. Two years ago you couldn't build this. Today it's one API call.

What's covered:
  • How OpenAI Vision analyzes a rider's posture from a single photo
  • The coaching output: balance, posture, stirrup position, hip engagement, upper body angle
  • Why this works for equestrian coaching specifically
  • How I'm using OpenAI and Claude together to ideate features
  • What's coming in v1.1: video upload, FFmpeg frame extraction, multi-frame analysis

The app is stablemanagerpro.com. We're in soft launch and onboarding stable owners, students, and instructors now. Web app first, with iOS and Android coming.

This is Day 1 of an ongoing series where I show how I build apps using AI tools instead of hand-coding everything. Subscribe to follow the build. New videos drop as features ship.

Stable Manager Pro: https://stablemanagerpro.com

#AI #BuildInPublic #OpenAI #Lovable #AIagents #SaaS #AIapp #VibeCoding #equestrian #softwaredevelopment #Claude #AIcoaching

The Cybersecurity Defenders Podcast
FFmpeg's 21 zero-days, Ruby cooldown feature, Microsoft disrupted by Shai-Hulud worm & Meta AI tool compromise / Intel Chat [#331]

Jun 15, 2026 · 28:31


In this episode of The Cybersecurity Defenders Podcast, we discuss some intel being shared in the LimaCharlie community.

DepthFirst reported that its autonomous security agent discovered 21 previously unknown vulnerabilities in FFmpeg, a widely deployed multimedia framework used across browsers, streaming infrastructure, and other systems that process media.

Bundler 4.0.13 introduces a new security feature called cooldown, aimed at reducing the impact of software supply chain attacks in the Ruby ecosystem.

A new variant of the Shai-Hulud supply chain worm, known as Miasma, briefly disrupted Microsoft's software development ecosystem after compromising dozens of GitHub repositories.

Meta says approximately 20,000 Instagram accounts may have been compromised through the abuse of an AI-powered account recovery support system.

Support our show by sharing your favorite episodes with a friend, subscribing, giving us a rating, or leaving a comment on your podcast platform.

This podcast is brought to you by LimaCharlie, maker of the SecOps Cloud Platform, infrastructure for SecOps where everything is built API first. Scale with confidence as your business grows. Start today for free at limacharlie.io.

Hacker Public Radio
HPR4659: Command Line Fun - Recording a show

Jun 11, 2026


This show has been flagged as Clean by the host. In this episode Kevie takes a step-by-step approach to recording an episode of HPR using the FFMPEG tool on the Linux command line. Before beginning, please ensure that FFMPEG is installed; it is available in the vast majority of Linux repositories.

Start by making a new folder to keep all your files in (these will be numerous by the end of your recording) and move into it:

    mkdir Podcast
    cd Podcast

To start recording audio use the command:

    ffmpeg -f pulse -i default file01.flac

and finish the recording by pressing ctrl+c. I would recommend recording a test piece of audio to ensure that you are recording from your desired microphone and that the levels are to your liking. To listen to the audio file we use ffplay:

    ffplay file01.flac

To reduce the need for editing I would recommend recording several short segments. Once all of the files have been recorded, we need to put them together using:

    ffmpeg -i file01.flac -i file02.flac -i file03.flac -i file04.flac -i file05.flac -filter_complex "[0:a][1:a][2:a][3:a][4:a]concat=n=5:v=0:a=1" filedone.flac

Note that the number of sets of square brackets [] should be the same as the number of files (these start at zero), and the number after n= should be the actual number of files you wish to combine.

To remove any extended periods of silence we can use:

    ffmpeg -i filedone.flac -af silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-45dB filefinished.flac

Please note that this is a bit flaky at the time of recording (my results have been mixed) and that it re-encodes the audio file. Never do this with a lossy file such as ogg or mp3, as this will reduce the quality; keep it for lossless formats such as flac or wav.

If you want to spend a bit more time editing the files and getting a better final audio file, then the most effective way (but not a quick one) is to trim the audio from the end and beginning.
Listen to the audio files and note the times of any periods of silence. As these are normally at the beginning and end, especially when recording in short segments, I will limit the instructions to those two cases to keep them from becoming silly in length.

Clip off the end silences first; if you start with the beginning, it will shift the position of the end silences. To remove audio from the end of a track use:

    ffmpeg -i file01.flac -vn -acodec copy -to 00:01:30 file01cut.flac

In this example anything after 1 minute and 30 seconds will be removed. The edited audio file will then be saved as file01cut.flac. This method does not re-encode the audio, so there is no loss of quality.

To remove audio from the start of a track use:

    ffmpeg -ss 30 -i file01.flac -c copy file01cut.flac

In the above example the first 30 seconds of the file will be removed and the result saved as file01cut.flac.

Once you have edited each audio file, they will need to be merged together again to make a complete show.
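The bracket-counting rule for the concat step in these notes is easy to get wrong by hand, so here is a minimal sketch of generating the filter string for any number of segments. The helper names concat_filter and concat_command are hypothetical (they are not part of FFmpeg), and the script only prints the command so it can be inspected before running. It assumes file names without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of FFmpeg) that build the concat command
# described in the show notes for an arbitrary number of audio segments.

# Print the -filter_complex string for N audio-only inputs.
concat_filter() {
  local n=$1 labels="" i
  for ((i = 0; i < n; i++)); do
    labels+="[${i}:a]"   # one input label per file, counting from zero
  done
  printf '%s' "${labels}concat=n=${n}:v=0:a=1"
}

# Print (rather than run) the full ffmpeg command.
# Usage: concat_command OUTPUT INPUT...   (file names must not contain spaces)
concat_command() {
  local out=$1; shift
  local cmd="ffmpeg" f
  for f in "$@"; do
    cmd+=" -i ${f}"
  done
  printf '%s\n' "${cmd} -filter_complex \"$(concat_filter "$#")\" ${out}"
}

concat_command filedone.flac file01.flac file02.flac file03.flac
```

With three segments this prints a command whose filter is [0:a][1:a][2:a]concat=n=3:v=0:a=1, so the number of labels and the value after n= always agree.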

Dachthekenduett
Kubicki Effect: A Firewall Until the AfD Governs Alone?

Jun 11, 2026 · 95:44


In episode 221 of Dachthekenduett, Sascha Koll and Martin Moczarski discuss the AI boom and the labor market, Mario Voigt's AI affair, Merz and Merkel, the CDU/FDP firewall, AfD protests, Belfast, and Stuttgart 21.

Would you like to support our work? Donate tools for the libertarian Glücksschmiede via PayPal (credit card also accepted) / bank transfer / Bitcoin / Monero:

The Bad Crypto Podcast
Claude Fable is AWESOME - Bad Crypto #810

Jun 10, 2026 · 40:11


The Worst It'll Ever Be: AI Apps in 20 Minutes, SpaceX's $1.8T IPO & Saylor's Head Fake — Bad Crypto Podcast #810 It's a bear market, so the bad boys of crypto are doing what builders do: SHIPPING. Bitcoin sits at $61,873, the altcoins are in the crapper, and Joel has officially divorced his bags. Travis explains why the 4-year cycle is alive and well — mapping this pullback exactly to previous cycles, with a projected bottom around mid-October. Then it goes full mad-scientist. Travis builds a viral-worthy "Culture Shock" site of World Cup visitors reviewing America in 20 minutes flat with Claude's new Fable model, then ships Viddl — a desktop app that downloads video from YouTube, X, TikTok, Instagram or LinkedIn with FFmpeg baked in. Joel premieres his AI-generated origin story film (1978, a food court paycheck, and a TRS-80 in a Radio Shack window) and announces his Acumen daily puzzle games are headed to the App Store. Plus: SpaceX IPOs as $SPCX at a $1.8 TRILLION valuation with ~$250B in demand, OpenAI and Anthropic file to go public, Michael Saylor's 32-BTC head fake, a trader who built his own exchange from a 42-page prompt, and the AI video tool stack the guys actually use (Kling, PAI, Higgsfield, Seedance & more). "The technology that we're using now to build stuff is the worst that it's going to be." 
- Joel

CHAPTERS
0:00 Cold open & liftoff
1:04 Episode 810 kicks off: semi-retired no more
3:48 Bitcoin's 4-year cycle is mapping exactly
4:45 Saylor's head fake: sells 32 BTC, buys 1,500 more
6:40 Market check: BTC $61,873 & Joel divorces his altcoins
7:49 The AI trading edge: OKX & the 42-page prompt exchange
10:24 SpaceX IPO ($SPCX): $250B demand, $1.8T valuation
11:27 Trillion-dollar AI: Anthropic & OpenAI file to go public
15:48 Culture Shock: World Cup visitors review America
19:09 Viddl: download any video, built in a morning
23:06 Joel's AI origin story: 1978 & a TRS-80
26:30 The AI video stack: Kling, PAI, Higgsfield, Seedance
28:08 Acumen: 9 daily puzzle games headed to the App Store
31:56 Travis's Pixar-style get-well video for his brother
35:03 "The worst it's ever going to be": why the opportunity is NOW
37:18 The fine print

Hacker Public Radio
HPR4658: Audio Revisited

Jun 10, 2026


This show has been flagged as Clean by the host. 01 Introduction This is a follow up to my 4 part series on simple podcasting. In this episode I will discuss a number of experiments with audio filtering. These experiments were inspired by comments by listeners and by other discussions about audio on HPR. I am not an audio expert, so I am doing this partly in order to learn something, but mainly in order to have a bit of fun. I hope that you find this entertaining as well. In a comment on the first episode a listener mentioned something called Solocast and said that the method bore a resemblance to the method that I was using. Here is his comment. -------------------- 02 Comment #3 posted on 2026-04-03 07:49:58 by Reto It reminds me about Solocast Hi Whiskeyjack, I really liked your podcast and the topic. I cannot remember about your last, but the sound quality of this one was good on my mobile speakers :) The concept reminded me about the program from Norrist (another host on HPR), while similar does it have some differences HPR 3496 https://hackerpublicradio.org/eps.php?id=3496 As I am not on the future feed, I look forward to your next episode. Cheers, Reto -------------------- 03 End of comment. I did not recall having heard the episode on Solocast, but this sounded very interesting. Solocast was in HPR episode 3496 and was released by norrist on the 27th of December 2021. I listened to that episode and it does indeed use the same basic concept of recording short segments of audio and combining them later instead of creating one big recording and editing it with an audio editor. 04 The main difference is that the work flow that I described involves a lot of manual steps, while Solocast is a short Python program that automates the entire process of presenting your script, recording the segments, combining the segments, and filtering and normalizing the result. 
I won't try to describe Solocast in detail, instead I would recommend just listening to HPR episode 3496 to get norrist's explanation directly. -------------------- 05 While I wanted to make sure that I credited norrist with having come up with this concept four years before I did, this won't be the focus of this episode. Instead I will talk about audio filtering and various experiments that I ran on several different methods. 06 While looking at the source code for Solocast I noticed that it used a filtering method that resembled one used by Jivetalk, a podcast production program that caught the attention of one of the HPR community news presenters. This method involves taking a sample of quiet audio where there is no speaking taking place, and then using this as input to a noise reduction filter which is applied to the voice recording. The filter subtracts the quiet sample from the voice audio, which should theoretically remove the ambient noise. 07 I decided to apply this method to a number of different audio test recordings which were recorded under different circumstances using different hardware. In this way I could see if the method worked equally well under all circumstances or if there were some sorts of noise which it was suited to and some sorts that were not. 08 While I was at it, I also picked several other filter methods to see how they worked as well. Potentially, some methods may be better under some conditions while other methods were better suited to others. -------------------- 09 I won't present all of my experiments, as that would be a bit dull to listen to. Instead I will describe each method and then present audio samples which illustrate my conclusions. There are two pieces of audio software involved, both of which were also used in my series on simple podcasting. 10 The first is Sox, spelled s o x , and which is short for Sound Exchange. Sox is a command line program for audio manipulation. 
Sox is Free Software, released under the GPLv2 or later. The other is FFMPEG, which is also a command line program. FFMPEG is also Free Software, released under the LGPL v2.1 or later, and GPL v2 or later. Sox actually uses FFMPEG for certain operations. -------------------- 11 Audio Hardware For recording hardware I used the following. 12 Maxwell Headset The first is a cheap Maxwell headset that has an electrical noise problem. Unfortunately I don't have a model number for this headset. I described this hardware, the noise problems that I had with it, and how I created filters to deal with the noise in my series on simple podcasting. Briefly though, this is a headset that has a built-in microphone on a boom which allows the microphone to be positioned close to the mouth. It connects with a USB cable. 13 Borne Earpiece and In-line Microphone This is a set of earpieces that go in your ears, connected by wires, with a very small microphone built into a small bulge in the cable. It connects using a 3.5mm jack. The model number seems to be BUD250-BL. 14 XTrike Headset This is a gaming headset similar to the Maxwell headset described above. The model number is GH-510. It uses a USB connection. 15 Yanmai Condenser Microphone This is a microphone that comes with a small tripod stand. The model number is SF-910. It uses a 3.5mm audio jack. -------------------- 16 This is not a review of the hardware. Rather, I was trying to create audio problems so that I could test ways to fix them. Therefore, do not take the above list as a recommendation of what to buy. However, you can see that I am not using any expensive audio hardware. If you want to make an HPR podcast, you do not need professional level hardware. -------------------- 17 Audio Samples The audio samples are as follows. 18 Quiet This was recorded in a quiet environment at my desk. This is my normal podcasting environment and represents optimal conditions. 
The main reason for this sample is to see how the various filter methods perform when dealing with the electrical noise from the Maxwell headset. 19 Small fan This is a small USB powered table fan approximately 10 cm in diameter. It was located roughly 40 cm or less to the left of the microphone, although this varies depending on the microphone. 20 Traffic This was along a busy street with traffic noise in the background. -------------------- 21 Filter Methods Sox noisered Filter with Audio Profile This method uses the Sox noisered filter. Here is a brief quote from the Sox documentation on this filter. Quote Reduce noise in the audio signal by profiling and filtering. This effect is moderately effective at removing consistent background noise such as hiss or hum. To use it, first run SoX with the noiseprof effect on a section of audio that ideally would contain silence but in fact contains noise - such sections are typically found at the beginning or the end of a recording. End of quote For these tests I recorded a separate noise profile to go with each test. -------------------- 22 Basic Manual Filter This is a basic high and low pass filter pair based on the work I had done in my previous series on simple podcasting. However, based on the tests that I have done for this episode, I decided to get a bit more aggressive in terms of filtering. I use a high pass filter of 120 Hz, and a low pass filter of 8 kHz. Each filter is then applied twice to increase its effect. I also added band reject filters to deal specifically with 50 and 60 Hz line noise. -------------------- 23 Complex Manual Filter This uses the manually constructed filter described in my series on simple podcasting. This uses the basic manual filter plus a series of custom bandreject filters to fix specific noise problems with the Maxwell headset. -------------------- 24 FFMPEG afftdn Filter The documentation describes this as "Denoise audio samples with FFT." 
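As a rough illustration of the basic manual filter in item 22, the chain can be written as an FFmpeg -af string. This is only a sketch: the episode does not say which tool or exact options were used, so it assumes FFmpeg's highpass, lowpass and bandreject audio filters, and the 10 Hz notch width and the helper name basic_filter_chain are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FFmpeg): prints an -af filter string.
# Arguments: high-pass cutoff in Hz, low-pass cutoff in Hz, then any number
# of frequencies in Hz to notch out with bandreject.
basic_filter_chain() {
  local hp=$1 lp=$2; shift 2
  # Each pass filter appears twice in the chain, which steepens its roll-off.
  local chain="highpass=f=${hp},highpass=f=${hp},lowpass=f=${lp},lowpass=f=${lp}"
  local f
  for f in "$@"; do
    # width_type=h gives the notch width w in Hz; 10 Hz is an assumed value.
    chain+=",bandreject=f=${f}:width_type=h:w=10"
  done
  printf '%s\n' "$chain"
}

# 120 Hz high pass, 8 kHz low pass, notches at 50 and 60 Hz line noise.
basic_filter_chain 120 8000 50 60
```

The printed string could then be passed to FFmpeg, for example as ffmpeg -i in.flac -af "$(basic_filter_chain 120 8000 50 60)" out.flac, where in.flac and out.flac are placeholder file names. Further bandreject entries for hardware-specific noise would be added by passing more frequencies.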
-------------------- 25 FFMPEG arnndn Filter The documentation describes this as "Reduce noise from speech using Recurrent Neural Networks." -------------------- 26 FFMPEG agate Filter I will pronounce this as "agate" for convenience. The documentation describes this as "A gate is mainly used to reduce lower parts of a signal. This kind of signal processing reduces disturbing noise between useful signals." -------------------- 27 Method The experimental method used was to take each noise sample and apply the different filter methods to it. Where there are parameters which can be adjusted, a script was used to generate a series of different sample files with different parameter values. Not all possible parameters were experimented with, as the goal is to see which method produces what sorts of results under different circumstances, not to get the best possible result for the samples that I happen to have. The method in each case was as follows. 28 Step 1 Convert the audio file to FLAC if it is not already in that format. 29 Step 2 Apply the basic high and low pass filter described previously to each sample. The reason for this basic filtering is that it eliminates at least some undesired noise in a fairly fool proof manner, leaving less for the more advanced filter to deal with. This should allow for a better test of the filter under realistic conditions. 30 Step 3 Apply the noise reduction filter being tested. 31 Step 4 Normalize the filtered sample to -17 LUFS according to the EBU R128 standard. The EBU standard is described in my series on simple podcasting. Normalizing adjusts the audio signal to a desired loudness level. This allows for more consistent sound levels and allows us to hear the results under realistic conditions. I normalize the audio individually for each sample as different recording hardware requires different amounts of loudness adjustment. 
This is different from the typical podcast process where normalizing takes place as the very last step in the process, but it was necessary in this case. 32 Step 5 Concatenate selected sample audio files to one another to allow for better review and comparing. -------------------- 33 Results The results are grouped according to the type of noise which is being mitigated. This allows for easier comparison of the effectiveness of each technique under different circumstances. I have only picked a few examples of interest out of the numerous experiments that I conducted. -------------------- 34 Quiet Recording Environment with Maxwell Headset This compares how well the various filtering methods work on the noise induced by the electronics in the Maxwell headset. This electronic noise consisted of a noise spike every 1 kHz. This should be representative of electronic noise caused by problems in recording hardware. 35 Manual Filter The manual filter applied a narrow band reject filter every 1 kHz from 1 kHz to 12 kHz. This completely removed the otherwise audible whine caused by the noise. 36 FFMPEG afftdn This method allows for setting a noise floor and then specifying how much the noise floor should be reduced by. The method is very sensitive to getting the noise floor correct for that recording. Set the floor too low and nothing happens. Set it too high, and some distortion results. However it seemed to be moderately effective, but it would seem to require checking it and possibly adjusting it each time it is used. 37 FFMPEG agate This method allows setting a noise floor and then suppressing all sound which falls below that level. This method is very sensitive to getting the noise floor correct for that recording. If set too low (or quiet), it is ineffective. If set too high (or loud), it distorts words which come after a pause, which would typically be between sentences. 38 When set correctly, it completely removes noise in the silences between sentences. 
However, the noise is still audible during speech. This is because the noise in this case is a higher frequency than normal speech, and so stands out more. It may not be a significant problem for noise which is closer to the main vocal frequency band. Overall, this method is not suitable for this particular problem. 39 FFMPEG arnndn This method used the standard model. A variety of different noise reduction models are available. I only tested it with one, std.rnnn It does not seem to introduce much distortion in the voice signal even with a high amount of mix parameter. 40 However, it is only slightly effective at removing the whine from the signal, even with a high amount of mix parameter. Overall, this method does not appear to be useful for this sort of noise problem. 41 Sox noisered Filter This was effective in removing noise between words, but noise can be heard while words are being spoken. It was better than agate however. 42 Overall Conclusion for the Maxwell Headset Noise When dealing with narrow noise bands that occur at known frequencies, the manual filter is leagues ahead of any of the other tested alternatives. 43 Sample Audio Here is a sample audio recording showing the best overall results The sample is repeated, first with only basic low and high pass filtering, and then with the manually constructed filtering. In the first sample you should hear a high pitched background whine. In the second sample, the high pitched whine is completely removed. 44 (Audio sample inserted here.) -------------------- 45 Traffic Noise This was recorded using the Borne in-line microphone connected to a mobile phone while walking along beside a busy street. This was in dry cool spring weather, and the road was paved with asphalt. This should be reasonably representative of podcasting while walking outdoors in a noisy environment. 46 Basic Manual Filter This used the basic manual filter with high and low pass filters. 
This did nothing very useful in this case as the signal was already filtered within those limits by the recording hardware anyway. The low sample rate of 8 kHz in the phone limited the upper frequency to 4 kHz. Recall that the sample rate has to be at least twice the highest frequency that you want to detect. Overall, this is not suitable for this sort of problem. 47 FFMPEG afftdn With a high noise floor, background noise is reduced, but not eliminated. There was not much distortion in the voice. This is only slightly useful for this sort of problem. 48 FFMPEG agate With a high threshold, background noise is reduced, but not eliminated. There was some distortion in the voice. The background noise could also be heard when speaking, but because the frequency of the background signal was similar to the louder voice signal, it was not as noticeable as it would have been if the two were very different. This is moderately useful for this sort of problem. It may be more useful in situations where the background noise was not quite as loud. 49 FFMPEG arnndn With high amounts of noise reduction, much of the background noise is suppressed, but there is not a lot of distortion in the voice. The background traffic noise is still present, but is significantly less. This offers only a moderate improvement. 50 Sox noisered Filter With small amounts of noise reduction voice is clear but traffic noise is present as a very significant continuous warbling sound in the background. This is no improvement on the original and in fact could be seen as making it worse. With moderate amounts of noise reduction, traffic noise is mostly gone, but there are still various squeaks present. Voice is noticeably distorted. With large amounts of noise reduction, traffic noise is gone but voice is highly distorted. This is moderately useful for this sort of problem, but requires careful adjustment. 51 FFMPEG arnndn Followed by FFMPEG agate This combined two different filters. 
First, it used arnndn to suppress the background noise to a lower level without much voice distortion. Then it applied the agate filter to suppress the noise levels between words still further. This used the same amount of mix and threshold as was found to be most effective when each of these filters was used on its own. The background noise is almost completely gone while distortion of the voice signal is low. 52 Overall Conclusion for Traffic Noise The arnndn combined with agate filters was the most successful at suppressing background noise while limiting the amount of voice signal distortion. 53 Sample Audio Here is an audio sample for what I felt to be the best overall results, the arnndn filter combined with the agate filter. First is the original audio with basic filtering. This is followed with the same audio after being passed through the arnndn and agate filters. 54 (Insert arnndn plus agate audio sample here) 55 Another Sample Here is a second audio sample showing the Sox noisered profile based filter. I have included this to show how a profile based filter can make things worse if you are not careful how you use it. This repeats the test audio 4 times. The first is with basic filtering only. The second uses low amounts of noise reduction. The third uses moderate amounts of noise reduction. The fourth uses high amounts of noise reduction. 56 (Insert noisered audio sample here) -------------------- 57 Small Fan Noise with Yanmai Microphone This was recorded using the Yanmai condenser microphone. A small fan was set up behind and to the left of the microphone. This is intended to represent situations where someone may have a fan or air conditioner running in the background due to hot weather, or has a loud computer fan. 58 A condenser microphone was used for this test as they are more prone to picking up unwanted noise. However, for practical recording purposes, this sort of microphone is unsuitable for this type of environment. 
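The arnndn-followed-by-agate combination from the traffic test (items 51 to 53), together with the loudness normalization from step 4 of the method, can be sketched as a single FFmpeg filter chain. This is a sketch under assumptions, not the author's script: the helper name denoise_chain is hypothetical, the mix and threshold values shown are placeholders (the episode does not state the values used), and the loudness target is written as -17 because FFmpeg's loudnorm filter takes its integrated loudness target I as a negative LUFS value.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FFmpeg): prints an -af filter string that
# runs arnndn first, then agate, then loudnorm.
# Arguments: arnndn model file, arnndn mix, agate threshold (linear, 0 to 1).
denoise_chain() {
  local model=$1 mix=$2 threshold=$3
  # arnndn lowers the background noise, agate then suppresses what is left
  # between words, and loudnorm sets the integrated loudness target.
  printf '%s\n' "arnndn=m=${model}:mix=${mix},agate=threshold=${threshold},loudnorm=I=-17"
}

# Placeholder values; suitable settings have to be found per recording.
denoise_chain std.rnnn 0.8 0.02
```

The printed string could be used as ffmpeg -i in.flac -af "$(denoise_chain std.rnnn 0.8 0.02)" out.flac, with in.flac and out.flac as placeholder names and std.rnnn being a model file available in the working directory. The order matters: the gate works better once arnndn has already pushed the noise well below the voice level.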
59 Basic Manual Filter This used the basic manual filter with high and low pass filters. This did nothing useful as the fan noise was in the same frequency range as the voice signal. This may be of more help in cases where the noise is below the 120 Hz cut off used in the high pass filter. 60 FFMPEG afftdn With high amounts of noise reduction, much of the background noise is suppressed, but there is some distortion in the voice. The background fan noise is still present, but is significantly less. Overall this is moderately effective. 61 FFMPEG agate This was effective in removing noise between words, but noise can be heard while words are being spoken. However, this was a small voice sample and it is possible that more problems could occur. With less fan noise than was in this sample this technique may work much better. 62 FFMPEG arnndn With high amounts of noise reduction, much of the background noise is suppressed, but there is not a lot of distortion in the voice. The background fan noise is still present, but is significantly less. Overall this was fairly effective. 63 Sox noisered Filter With small amounts of noise reduction voice is clear but fan noise is present as a slight warbling sound in the background. With moderate amounts of noise reduction, fan noise is gone, but voice is somewhat distorted. With large amounts of noise reduction, fan noise is gone but voice is very distorted. 64 In general this method is fairly successful at dealing with this sort of problem. However, there is a trade off between background noise and voice quality. Getting that trade off correct takes experiment and judgment for each specific situation. 65 FFMPEG arnndn Followed by FFMPEG agate This combined two different filters. First, it used arnndn to suppress the background noise to a lower level without much voice distortion. Then it applied the agate filter to suppress the noise levels between words still further. 
This got rid of virtually all of the background noise between words. If you listen carefully however, there is a slight buzzing sound in the voice signal. 66 Overall Conclusion for Fan Noise with Yanmai Microphone. Of the methods tested, the arnndn followed by agate filter seemed to offer the most improvement for the least effort and least voice distortion. The arnndn filter on its own seemed the next most preferable to me despite leaving some fan noise in the background. 67 Audio Sample Here is an audio sample for what I felt to be the best overall results, the arnndn filter combined with the agate filter. First is the original audio with basic filtering. This is followed with the same audio after being passed through the arnndn and agate filters. 68 (Insert audio sample here) -------------------- 69 Small Fan Noise Recorded with Headset The following is an observation rather than a filtering technique. When a recording was made using the Maxwell headset and listened to on the headset later or with speakers, the fan was virtually inaudible. When the same recording was listened to with the XTrike headset, it was barely audible with careful listening and only identifiable as a fan because I knew it was there. 70 In situations where there is ambient noise, the best noise reduction technique is probably to move the microphone as close to your mouth as possible, although not directly in front of it, and reduce the gain if there is a gain adjustment in the microphone. This will work far better than trying to remove the noise later. If you are recording an HPR episode at a desk, then an inexpensive headset with boom mike may do the job just fine with minimal effort and expense. -------------------- 71 Conclusions I have tested three noise scenarios - Electronic noise in the audio hardware at specific frequencies. Recording outdoors with an inline microphone in a noisy traffic environment. A noisy fan creating background noise in an office. 
My conclusions on these are as follows. 72 Electronic Noise in the Audio Hardware at Specific Frequencies If you can use Audacity or some other means to find the frequencies which are causing the noise, the best solution, assuming you don't just replace the hardware, is to manually construct filters to remove those specific frequencies. This is the safest solution in terms of only doing what you tell it to and not producing unexpected surprises some time down the road when something changed in the environment. 73 If you are looking for a fairly automatic filtering method, the Sox noisered profile based filter seems to work fairly well. There is an equivalent filter in ffmpeg, but I did not include that in my experiments as it is harder to use in a script because it does not use a separate noise profile file. 74 Recording Outdoors with an Inline Microphone in a Noisy Traffic Environment. In this situation, the FFMPEG arnndn combined with agate filters seem to be the most successful. The Sox noisered filter may work, but at the cost of more distortion in the voice than is seen in the other methods. 75 An inherent problem with any profile based noise reduction method is that if the background noise is not constant, which it seldom is in that sort of environment, the profile may not represent the background noise which is present later on in the recording. This risks adding more distortion in the voice as the profile and later environments diverge. 76 However, for this application a different microphone that provided a better recording would appear to be advisable. A solution which brought the microphone much closer to the mouth and so resulted in a better ratio of voice signal compared to background noise would appear to be necessary, after which the question of what sort of noise reduction to use would need to be re-evaluated. 77 A Noisy Fan Creating Background Noise in an Office. 
The Sox noisered filter and the FFMPEG arnndn, afftdn, and agate methods all work to some degree. However, they all need correct selection of parameters to achieve the proper results. When I compared all four methods side by side, I found the arnndn combined with the agate filter to be preferable in terms of the trade off between background noise reduction and distortion of the voice signal. The arnndn filter on its own seemed the next most preferable to me despite leaving some fan noise in the background.

However, that is a subjective judgment of a specific noise sample when recorded using a specific microphone. Keep in mind though that many listeners will not be listening in an ideal environment. They may be doing things where background noise is present rather than sitting in a very quiet room, and so may find a small amount of background noise in the recording to be less of a problem than distortion in the voice signal which may make some words harder to understand.

When I conducted the same experiment recorded with the XTrike headset I found that arnndn seemed to offer no noticeable improvement. This may be because the amount of audible fan noise was far less with the XTrike headset to begin with. In other words, there is no single best solution here, and you may have to be prepared to try different options to see which one works in your situation. The important thing is to avoid making things worse by applying filtering that is not appropriate for that situation. The best method may be to use a recording method that doesn't pick up the fan noise to begin with. This can include just using a gaming headset with boom mic.

I have one final observation on this point regarding headsets. The Maxwell headset has a foam cover over the microphone while the XTrike headset does not. There was some slight audible wind buffeting noise picked up by the XTrike headset that was not observed with the Maxwell.
This seemed to cause particular problems with the Sox noisered profile based filter, as this noise was irregular and after filtering would show up as a warbling sound. If you use a headset and plan to use it in conjunction with a fan, it may be advisable to apply some sort of wind cover over it.

Combining Complex Filters

In several cases I found that combining several complex filters offered better results than using any single one on its own. The basic strategy though is to first use a method which is good at reducing undesirable noise without introducing excessive voice distortion. Then apply a different filter which is good at reducing small levels of background noise to an even lower level while affecting the voice signal as little as possible. This uses the relative strengths of different filter types to compensate for the weaknesses of the others.

Different combinations of filters were most effective for different types of problems. I did not try all possible combinations however. Perhaps a further exploration of this would be worth doing in a later podcast.

--------------------

Case Study - Noise in Another HPR Episode Audio

In the comments to my second episode on Simple Podcasting (which is HPR4618) where I discussed basic filtering, a couple of listeners brought up an interesting point. Antoine mentioned "declicking" in a post.

--------------------

Vance replied

Antoine, thanks for mentioning the click removal capability in Audacity! While I already knew about its noise removal filter, I wasn't aware it also had click removal. It might have helped me for HPR4637, where some sort of electromagnetic signal was picked up by my microphone/recorder, a Zoom H2 (the tapping sound was *not* present in the room where I recorded).
While click removal does seem to distort speech when applied to it (though to my ears, it doesn't sound as weird as when noise removal is done with speech), I could have applied the filter only to the pauses, where the "tapping" is most noticeable. I will consider doing this in the event that I'm not able to eliminate the source of interference in the future, which would be the best way to go.

--------------------

End of quote. I found this interesting as it sounded like another audio problem that could be experimented with. I found a sample of the episode which had the clicks and cut a copy of that segment out to experiment with. These sounds are a series of clicks, or "ticks" would be another way to describe them, in the quiet part of the audio between sentences or phrases.

Next I used Audacity to study the sound spectrum. I found a massive 60 Hz noise spike. However, my speakers won't reproduce sound that low, and filtering this out didn't reduce the clicks. The clicks turned out to be bursts of noise across the 100 to 800 Hz band, which is right where the main vocal band also is. This makes it difficult to filter based on frequency. The most promising approach would seem to be to filter based on sound level.

I tried all of the individual audio filter techniques mentioned in the other experiments above. None produced satisfactory results except for agate, which makes quiet audio quieter. This completely suppressed the clicks. However, when applied to the entire episode it also distorted the start of a few sentences which began with single short syllables.

The agate filter has a number of parameters which could be adjusted to try to deal with these cases, although I did not spend the time to do so. Another solution to this distortion problem is to simply not apply the filter to those parts of the audio which are affected.
If you record the audio as a series of small individual files, it would be easy enough to filter before concatenating the files together while skipping those files which contain audio which is not suited to this method. Here are the results of the experiments.

FFMPEG afftdn

This reduces the size of the ticks, but they are still present. However, they may be reduced to a level which is considered acceptable.

FFMPEG agate

This was very effective in removing ticks with the right parameters. However, it can introduce some voice distortion in the form of cutting out the start of a few sentences which began with single short syllables. This can be corrected with a very short "attack" parameter to turn off the filter when it detects sound above a set threshold.

FFMPEG arnndn

This was relatively ineffective.

Sox noisered

This was effective in removing the sounds between phrases. However, it introduces some distortion in the voice signal.

I also tried combining filters.

FFMPEG afftdn Followed by agate

This combined two different filters. First, it used afftdn to suppress the background noise to a lower level without much voice distortion. Then it applied the agate filter to suppress the noise levels between words still further. This got rid of virtually all of the background noise between words.

Here is a short audio sample from HPR4637. First is the unfiltered audio. Second is the filtered audio using the combined afftdn plus agate filters. Since the "clicks" are very quiet, you may not hear them unless you are in a quiet environment. Quite a few listeners would probably not be aware of the perceived audio problem in this episode if it had not been discussed here. Nonetheless, it makes for an interesting experiment. Here it is:

(Insert sample audio here)

Overall Conclusion for Noise "Ticks"

The afftdn combined with agate filters seemed to offer the best overall results when used with the right parameters.
However, the author, Vance, speaks very clearly and evenly, and so his voice is ideally suited for use with this filter. Another author's voice may not be as suited to this filter.

The Sox noisered profile based filter offers various degrees of trade off between suppressing noise and distorting the voice signal. Whether this is an acceptable trade off depends on the particular voice in question and how easily understood it is under normal circumstances without additional distortion. The afftdn filter may be a fairly safe filter to use on its own while producing acceptable if not perfect output.

--------------------

Overall Conclusions

I have presented only a few of the experiments that I conducted. My overall conclusion after all of this is that there is no universal audio filtering method that works best in all circumstances. There are instead a number of tools in the toolbox, and picking the right one for the job takes a bit of trial and error.

However, if you have a repeatable recording environment, then once you have decided what tool you need you should create a script for it so you can have a repeatable processing setup. These conclusions apply to voice podcasting. Music has a different set of criteria, and techniques that work well with basic voice podcasting may produce poor results when applied to music, which has a broader range of frequency and, just as importantly, a broader range of loudness.

If you are used to using filters and effects in Audacity, many of the settings on those correspond to arguments in the command line version of ffmpeg. It is worth learning how to use ffmpeg directly to automate your recording process.

The experiments that I conducted were greatly assisted by writing scripts which created multiple versions of audio files with different settings, thereby allowing me to try many different alternatives relatively easily.
It also allowed me to concatenate different audio samples into a single audio file and so listen to different versions in quick succession, making subjective listening judgments more reliable.

It is important to keep in mind in all this that I am playing with audio filtering mainly to have fun. It is not necessary to do any of this if you think your podcast episode sounds just fine without it. So, don't let any of what I have talked about in all this discourage you from simply recording a podcast and sending it in as is. I will include copies of the filters I have described here in the show notes.

--------------------

Related Matters

Hardware Characterization Using Audio Signals

I found it useful to characterize the hardware that I had in order to understand its limitations better before starting the experiments. This involved playing a signal out through a set of speakers and then recording it through a microphone.

I used two types of signal for this. One type of signal is known as a "chirp" signal. This is a sine wave that steadily increases in frequency as it sweeps across the audio spectrum. The standard audio range is 20 Hz to 20 kHz, but for my purposes I limited the upper frequency to 15 kHz to save time as anything beyond that is not very useful for voice podcasts.

By recording the chirp signal with a microphone and analyzing it with a Fourier transform, I could quickly see what each device was capable of. See my previous series on simple podcasting for an explanation of what a Fourier transform is and what software to use to see the results of it. Here is a chirp signal.

(Insert Audio Sample Here)

In addition to a chirp signal, I also used a series of simple tones of specific frequencies. By using these tones of known frequency I could gain an understanding of the limitations of my speakers and headphones, and just as importantly, my own ears.
By understanding these limitations I was able to narrow the range of frequencies that I need to deal with quite considerably and set the high and low pass filters accordingly. These tones are a series of flac files generated with ffmpeg.

Here is a sample audio tone at a 2 kHz frequency.

(Insert Audio Sample Here)

Copies of the script to create the chirp signal and the tones are in the show notes.

--------------------

A "Not a Review" of some of the Hardware that I Used

I said that I would not do a review of the hardware that I used. However, some of it deserves mention for either how good or bad it was. I will record each section using the hardware being described.

Maxwell Headset

This is my original recording hardware. This is a headset with boom mic and USB connection. There is no model number on it, so I don't know the model. This probably cost somewhere between 10 and 25 dollars. The earpieces sit on the ears and do not fully enclose them. This makes it lightweight and comfortable to wear for extended periods of time. It has a problem however with electronic noise consisting of a noise spike every 1 kHz. I was able to fix this with a series of filters using FFMPEG. Fixing this problem is what got me started in understanding audio. I will probably continue to use this headset to make podcasts.

XTrike Headset, Model GH-510

This is also a headset with boom mic and USB connection. I purchased this headset for the purposes of experimentation for this podcast episode. It cost $12.88. I found it to be surprisingly good for the price. It has fully enclosed ear pieces however, which may make it uncomfortable to wear in hot weather. I may try doing some of my future podcasting using this headset.

Borne Earpiece and In-line Microphone

This is a set of earplugs that go in your ears, connected by wires, with a very small microphone built into a small bulge in the cable. It connects using a 3.5mm jack. The model number seems to be BUD250-BL.
It cost approximately $3.00. I bought several sets of these and use them for listening to podcasts from an MP3 player. The ear pieces are pretty good for listening with. The microphone works reasonably well when used in a quiet location. It is less good when in a noisy environment. It is very important however to secure the microphone to your lapel or other location reasonably near your mouth and to point the microphone (that is the small hole) outwards and not simply let it dangle freely. If you let it just hang, you will get poor quality and inconsistent audio.

Yanmai Condenser Microphone, Model SF-910

I purchased this microphone for the purposes of experimentation for this podcast episode. It cost $3.88. As it is a condenser microphone, it is more prone to picking up background noise and as such is probably not a good choice for podcasting by a single person sitting at a desk. However, it is nonetheless a surprisingly good microphone for surprisingly little money.

iCan USB Microphone, Model M-306

I purchased this microphone for the purposes of experimentation for this podcast episode. This has a USB connection. This was also relatively inexpensive at $7.99, or roughly twice the price of the Yanmai microphone. Unlike the Yanmai however, it is absolutely wretched. There was such a high degree of distortion when recording through it that I found I could not use it in the fan experiments which I had bought it for. I ended up buying the Yanmai microphone for that instead.

--------------------

Easy Effects Software

The techniques described so far all involve recording audio files and then processing them later to produce the desired result. This is probably the simplest and most straightforward way of doing things if you are making a typical podcast. However, there may be instances where you want to apply filtering or other effects on the "live" signal immediately and not after the fact.
There is audio software which can hook into your computer's audio system and do this with a live signal. For Linux, there is a package called "Easy Effects". This is Free Software and comes under a GPL v3 or later license. I installed it from the Debian repository under Ubuntu 24.04.

You can create various filters and even chain them together to combine them. I played with it a bit but do not know enough about it to discuss it seriously at this time. However, I thought it would be worth mentioning for the sake of those who may wish to try it out themselves.

--------------------

Episode Conclusion

After having had some fun with audio and listening to other HPR members talk about audio, I thought I would have some more fun by playing with noise reduction filters. I have no intention of becoming an audio professional, but by doing some experiments I learned a few things and had some fun doing it. I hope that the rest of you found this interesting as well. I will see you all again later in another episode of Hacker Public Radio.

--------------------

Scripts

Basic Filter

This shows basic high and low pass filters (120 Hz and 8 kHz respectively) and band reject filters for 50 and 60 Hz.

# The high and low pass filters.
hlpfil="highpass=f=120, highpass=f=120, lowpass=f=8000, lowpass=f=8000"
# Band reject filters, one for 60 Hz and another for 50 Hz.
linefil="bandreject=f=60:width_type=h:w=20, bandreject=f=50:width_type=h:w=20"
# Filter using ffmpeg.
ffmpeg -i inputfile.flac -af "$hlpfil, $linefil" outputname.flac

# ======================================================================

afftdn Filter

# noisefloor should be between 20 and 80.
noisefloor=$1
# Run the noise reduction.
ffmpeg -i testrec-filtered.flac -af "afftdn=nr=10:nf=-""$noisefloor" tmptestrec.flac

# ======================================================================

agate Filter

# threshold should be between 10 and 80.
threshold=$1
# Run the noise reduction.
ffmpeg -i testrec-filtered.flac -af "agate=threshold=-"$threshold"dB:range=-60dB" tmptestrec.flac

# ======================================================================

arnndn Filter

# mix should be between 0 and 1.
mix=$1
# Run the noise reduction.
ffmpeg -i testrec-filtered.flac -af 'arnndn=model=std.rnnn:mix='"$mix" tmptestrec.flac

# ======================================================================

sox noisered Filter

# Generate the noise profile from a sample of background noise.
sox silencefiltered.flac -n noiseprof noise.prof
# nramount should be between 0 and 1.
nramount=$1
sox testrec-filtered.flac noiseout-testrec.flac noisered noise.prof "$nramount"

# ======================================================================

Manual Filter for Maxwell Headset Noise

# Create a series of band reject filters, from 1 kHz to 11 kHz.
ftemplate="bandreject=f=%s000:width_type=h:w=100"
kilospikefil=$( seq 1 11 | xargs printf "$ftemplate," )
# Remove the trailing comma left after the last filter.
kilospikefil=${kilospikefil%,}
# Using ffmpeg
ffmpeg -i testrec-filtered.flac -af "$kilospikefil" tmptestrec.flac

# ======================================================================

Create a "chirp" signal

# Start frequency.
f0=20
# End frequency.
f1=15000
# Duration of signal.
duration=10
ffmpeg -f lavfi -i "aevalsrc=sin(2 * PI * (0.5 * ($f1 - $f0)/$duration * t^2 + ($f0 * t))):s=44100:d=$duration" -c:a flac -af "aformat=sample_fmts=s16" chirp.flac

# ======================================================================

Generate Audio Tones

toneout () {
    # Zero-pad the frequency so the output files sort in order.
    printf -v freqval "%05d" $1
    ffmpeg -f lavfi -i "sine=frequency=$freqval:duration=3" tmptone.flac
    # Normalize
    ffmpeg -i tmptone.flac -af loudnorm=I=-17:TP=-2.0:LRA=4.0 -ar 44.1k -sample_fmt s16 tone$freqval.flac
    rm tmptone.flac
}

# List of frequencies in hertz.
freqlist="50 60 100 120 130 140 150 160 170 200 500 1000 2000 3000 4000 5000 6000 7000 8000 9000"
for freq in $( echo $freqlist ); do
    toneout $freq
done

# ======================================================================
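The notes describe running afftdn followed by agate and trying many settings by script, but no combined script is listed. The following is a minimal sketch of how that might look, not the author's actual script. It assumes both filters can be chained in a single -af string in one ffmpeg pass; the function names build_chain and sweep_thresholds and the output file naming are hypothetical, and the parameter conventions (positive numbers that get negated) are borrowed from the individual scripts above.

```shell
# Sketch only: chain afftdn into agate and sweep the gate threshold.
# build_chain, sweep_thresholds and the output names are illustrative
# assumptions, not taken from the episode.

# Build the -af filter string. Both arguments are positive numbers that
# are negated here, following the convention of the scripts above.
build_chain () {
    local noisefloor=$1   # afftdn noise floor, 20 to 80
    local threshold=$2    # agate threshold in dB, 10 to 80
    printf 'afftdn=nr=10:nf=-%s,agate=threshold=-%sdB:range=-60dB' \
        "$noisefloor" "$threshold"
}

# Write one output file per threshold so the versions can be compared
# by ear. Usage: sweep_thresholds INFILE NOISEFLOOR THRESHOLD...
sweep_thresholds () {
    local infile=$1 noisefloor=$2
    shift 2
    local t
    for t in "$@"; do
        ffmpeg -y -i "$infile" -af "$(build_chain "$noisefloor" "$t")" \
            "sweep-nf${noisefloor}-th${t}.flac"
    done
}

# Example (not run here):
#   sweep_thresholds testrec-filtered.flac 40 30 40 50
```

Keeping the string construction in its own function means the chain can be printed and checked before any audio is processed, and swapping afftdn for arnndn only changes the one printf line.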

Met Nerds om Tafel
How to decide for yourself who knows when you go to the toilet

Met Nerds om Tafel

Play Episode Listen Later Jun 3, 2026 89:45


The Fediverse runs on more than 40,000 servers and almost no companies. According to two K/Coens, that is not a bug but a feature. Randal and Jurian talk with Koen de Jonge and Coen Wesselman of Procolix. That hosting company has been doing almost everything itself for 25 years, entirely on open source, without Big Tech. Since 2024 it has been 100% owned by a foundation, so that it can never be sold against its own mission. Why so radical? Because they saw too often what dependence costs. The conversation covers the Fediverse as a pleasant place, how Twitter's glory days were slowly squeezed shut, and the question of who is actually in control here.

About Koen de Jonge

Koen de Jonge is co-founder and director of Procolix (since 2000, based in Dordrecht) and chair of the foundation that keeps the Dutch Mastodon server mastodon.nl running. Procolix has been fully owned by a foundation (stewardship owned) since 2024, runs 100% on open source, and hosts, among others, De Groene Amsterdammer, The Moscow Times, petities.nl, and stemwijzer.nl. In this episode he explains how the Fediverse works technically and why he does not call it an alternative.
LinkedIn: https://www.linkedin.com/in/koendejonge/
Mastodon: https://procolix.social/@koen
Website: https://procolix.eu/over-ons

About Coen Wesselman

Coen Wesselman is commercial director at Procolix and previously came from internet.nl. He has been active on social media since the early days of Twitter (account from 2007) and now works to bring public services to the Fediverse. At the table he mainly contributes the commercial and societal perspective.
LinkedIn: https://www.linkedin.com/in/coenwesselman/
Mastodon: https://mastodon.nl/@wsslmn

Sponsor: Red de AI Wet

Kim van Sparrentak takes on the tech bros to create clear rules for artificial intelligence. Find Red de AI Wet here.
In this episode

0:00:00 Intro: open source, the Fediverse, and nobody announces when they go to the toilet anymore
0:05:00 25 years of Procolix: how XenSource and VMware forced the choice for open source
0:09:00 If it is so logical, why don't many more companies do it?
0:10:07 Brenno de Winter, sovereignty, and the question of who really has control
0:12:50 VLC, FFmpeg, and the risk of open source as a vulnerable building block
0:17:08 The downsides: time, integrations, recurring payments, and VAT for all of Europe
0:21:19 The fragmented landscape: X, Bluesky, Threads, and the rest
0:22:55 How Twitter's glory days were squeezed shut: APIs, bots, and toilet tweets
0:27:01 The panopticon: feeling spied on by Meta, Google, and even WhatsApp
0:36:25 The technology explained: outboxes, subscribing, and 40,000 servers talking
0:41:27 When the fire brigade is only on X: NL-Alert and the account wall
0:44:48 Just a nice place, not an alternative: why there are no companies on it
0:52:57 Hosting a server yourself: calculating storage with Claude and what it really costs
0:55:23 mastodon.nl in numbers: 4,000 active users on 2 terabytes
1:02:06 Moderation and defederation: how mastodon.nl tries to be a safe place
1:09:13 Listener question: Koen with a C or a K? (Gemini starts counting)
1:12:16 Listener question: why Procolix is in TV towers, with diesel and free cooling
1:17:44 Listener question from Simon: a LoRa mesh repeater in the tower and crisis communication

Mentioned in this episode

mastodon.nl, the Dutch Mastodon server run by Procolix's foundation
Procolix, the open-source hosting company of the two guests
Pixelfed, an open Instagram alternative in the Fediverse
publicvideo.nl, the PeerTube video service hosted by Procolix
Loops, short videos in the Fediverse, TikTok-like
social.overheid.nl, the Dutch government's Mastodon server
T-DOSE, Technical Dutch Open Source Event, June 6-7 in Geldrop
Soevereiniteit!
Hoe dan?, a book by Brenno de Winter

Tips from the table

Koen de Jonge: use the official Mastodon app, which slows down your doomscrolling by pausing briefly after a few minutes.
Randal: deliberately post less, even in closed groups; set your feed to followers only and accept that it gets more boring, that is a feature.
Coen Wesselman: do you want control without becoming a full-time system administrator? Choose an existing server that suits you instead of hosting everything yourself.
Koen de Jonge: nerdery for 35 euros: with a Heltec V3 board, a battery, and an antenna, build your own LoRa mesh node to learn how decentralized communication works.

See omnystudio.com/listener for privacy information.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

We're announcing AIEWF speakers this week! Take the AI Engineering Survey!

Today's guest Ethan first joined us for the LS Paper Club as the lead on NVIDIA Cosmos World Model, but then joined xAI and built Grok Imagine in 3 months.

He comes back on Latent Space with some nuclear hot takes: that Video Models primarily get their intelligence from LLMs, not from training on video data, and that the next frontier for truly interactive, realtime, long-horizon world models is to work on LLMs (perhaps Interaction Models as well…)

Put it this way: In the near term, the next Sora won't be a better video model, but a video agent.

Generative Media may more closely follow the evolution of AI coding, which went from focusing on one-shot output performance and cost to multiturn reasoning and planning models for agents and systems that can plan, edit, test, debug, and submit PRs.

At a certain point, coding models got so good that the only significant next step to improve performance was handling the orchestration of these models.

Now, as the performance of video models increases significantly across realism, consistency, and prompt adherence while becoming more cost efficient, the next evolution of video generation may also be systems that can plan, generate, edit, critique, and iterate across an entire creative task.

In this episode, Ethan joins swyx and Vibhu to unpack what it actually takes to build frontier image and video systems: data, VAEs, diffusion transformers, audio-video alignment, inference speedups, and the hidden cost of storing and moving massive video datasets.
From building NVIDIA's Cosmos world model to joining xAI as Grok Imagine was being built from zero to one, Ethan He has been at the center of some of the most important work in video generation, multimodal models, and real-time world models.

We go deep on Grok Imagine, how a small xAI team shipped its first multimodal video model in three months, why iteration speed matters more than almost anything in model development, and why many of the biggest gains come from fixing tiny bugs in data and training pipelines.

Flipbook: The future of Videomaxxing

Video agents are almost a sure bet to be the trend in the coming year. We end with a glance at what's beyond video agents.

Flipbook caused a minor sensation this year when it was released, but most treat it as a fun demo. Ethan takes it very seriously: with the speed and cost of inference coming down every year, the future of custom video JIT UI is closer than you think.

We talked about why videogen models may become the front end of AI, how generative UI could replace traditional HTML/CSS, why world models need to be real-time, interactive, and long-horizon, and why the future of video generation may depend more on language models and agents than on diffusion alone.

We discuss:
* Why fast iteration mattered more than meetings
* Why small training bugs can drive huge model quality gains
* Why coding models may make compute the bottleneck again
* How image and video models are trained with synthetic captions
* The role of VAEs and latent space in frontier video models
* Why image models are the foundation for video models
* The tradeoff between temporal compression and real-time interactivity
* Flipbook, Neural OS, and the future of generative UI
* Why future interfaces may go from user intent to pixels
* The hidden cost of training video models: storage, egress, and GPU hours
* How step distillation and consistency models (like OpenAI sCM) make video inference orders of magnitude faster
* Grok Imagine 0.9 and large-scale audio-video
generation
* Why audio-video alignment is harder than text-video alignment
* Ethan's definition of world models
* Reference-to-video, video extension, and long-context video generation
* Why xAI's research communication undersells Grok Imagine
* How xAI culture shaped the speed of development
* AI watermarking, SynthID, and detecting generated media
* Why prompt rewriting matters for video models
* Grok Imagine Agent and the rise of video agents
* Why language models may unlock better video generation
* Robotics, physical AI, and embodied world models
* Why Ethan left xAI and shifted focus toward LLMs
* Self-managed context, memory, and the next frontier for language models

Ethan He
* LinkedIn: https://www.linkedin.com/in/ethanhe42
* X: https://x.com/EthanHe_42

Timestamps
00:00:00 Introduction
00:01:25 From NVIDIA Cosmos to xAI
00:03:24 Building Grok Imagine from Zero to One
00:10:07 How Image and Video Models Are Trained
00:18:53 Video Compression, VAEs, and Real-Time Tradeoffs
00:22:10 Generative UI, Flipbook, and Neural OS
00:32:10 The Cost of Training Large Video Models
00:37:04 Distillation, GANs, and Fast Video Inference
00:41:21 Audio-Video Generation and Grok Imagine 0.9
00:48:34 What Makes a World Model?
00:55:51 Reference Videos, Long Context, and Video Memory
01:00:11 xAI Culture, Research, and First-Principles Building
01:09:45 AI Safety, Watermarking, and Prompt Rewriting
01:13:10 Video Agents and AI-Assisted Creation
01:27:32 Why Language Models Unlock Better Video
01:31:15 Robotics, Physical AI, and Embodied World Models
01:32:38 Why Ethan Left xAI
01:34:16 Self-Managed Context and the Future of LLMs
01:38:43 Ethan's Career Path and Closing Thoughts

Transcript

Introduction: Ethan He, Latent Space, and the Path to xAI

Swyx [00:00:00]: We're here in the studio with Ethan He, most recently of xAI. Welcome.

Ethan [00:00:10]: Thank you. Glad being here.

Swyx [00:00:11]: We're also here with Vibhu.
You were first coming to us or joining the Latent Space world because you were working on Cosmos at NVIDIA, and you did a paper. We loved it. You presented it as well, so thank you for doing that.

Ethan [00:00:23]: I've actually, I also presented the MoEs twice at Latent Space.

Swyx [00:00:29]: How did you actually hear about us? Did we reach out to you? Is that how it worked?

Ethan [00:00:33]: No, actually, I-- the community. Like I realized, oh, there is this online community that people talk about AI and also learn from each other through papers every week through the Paper Club. It's very nice.

Ethan [00:00:49]: I learned a lot.

Swyx [00:00:49]: I think three years stop. We haven't stopped even on Christmas and New Years. Many weeks I want to stop but it keeps going.

Vibhu [00:00:58]: No, that was good. I think you had posted that you worked on a paper, and I was "Oh, very cool. We have Paper Club. Present then."

Vibhu [00:01:04]: But I might have reached out to you after.

Swyx [00:01:05]: You-- because it's an amateur club, right?

Swyx [00:01:08]: So it's very unusual and but we have sometimes paper authors come by and actually explain the paper. Today we just did the poolside paper, which was apparently very good.

Vibhu [00:01:18]: Came out yesterday.

Vibhu [00:01:19]: Pretty interesting, right? Fully open. They talk about everything, systems. So it's a good one. We'll, we'll recommend people to read it.

Swyx [00:01:25]: Bring us up to speed on your transition to xAI, 'cause I actually don't even know when you joined. Just like tell the, tell the story about the sort of transition.

From NVIDIA Cosmos to xAI: Scaling Video and World Models

Ethan [00:01:34]: Before xAI, I was working on Cosmos world model as in-- at NVIDIA. So Cosmos is, it's a giant video foundation models that can-- that aims to simulate the world and for-- it serves as a foundation of-- for all of the roboticists to build on top of.
There, once I built the Cosmos one, I realized as this thing also has a scaling law similar to language model, we need to scale up the video models further. That's, that's why I realized I need to move to somewhere with much more compute resources. That's how I

Swyx [00:02:13]: Than NVIDIA?

Vibhu [00:02:14]: The GPU rich came themselves.

Vibhu [00:02:19]: And timeline-wise, when was Cosmos? It was pretty early, right? It was open world model, open paper, everything.

Ethan [00:02:25]: It was end of twenty-four.

Vibhu [00:02:28]: End of twenty-four.

Ethan [00:02:30]: Then at mid twenty-five, I moved to xAI. At that time-- I joined about the time when xAI was about to build video models and in multimodal models. There were no infra, no data, and no model, and it just-- as a few engineers, we built it in three months and released the first model, Grok Imagine zero point nine.

Ethan [00:02:55]: And since then, I keep working on video models and move more from training and to post-training of the video models. For example, like a reference to videos, kind of like the cameo feature and, video extensions. And, before I left, I worked on a world model, leading a small team to focus on the real-time long horizon video generation.

Building Grok Imagine From Scratch in Three Months

Swyx [00:03:24]: Can you give like a rough roadmap of okay, you're on a brand-new team. Grok previously was only text, or they partnered with BFL for their image gen stuff. What do you-- what are the building blocks, right? You have compute, data you can procure somewhere. Like just what are like the sequence of things that people should think about when you're setting up a new team?

Vibhu [00:03:43]: Actually even deeper, not just data you can procure. You guys had to go through getting the data too, right?
So you shipped it pretty fast, but yeah--
Swyx [00:03:51]: Three months is like--
Vibhu [00:03:52]: From everything--
Swyx [00:03:52]: actually very surprisingly fast.
Ethan [00:03:55]: One thing, I'd say, is thanks to my experience at NVIDIA, 'cause the first time, when we were building Cosmos together, we built it for about a year. So this is the second time I've done it, and I roughly had an idea of what to do. I'd say the most important thing is the talent. Everyone was very strong and clever, and worked very closely with each other towards a common goal. That speeds things up a lot. You reduce the communication bandwidth among people, and everyone can work towards the same goal. There weren't that many meetings on the calendar, maybe a sync a day, and after that it's all building. It was pretty fun at that time.
Ethan [00:04:47]: And another thing is that xAI has very strong foundations of data inference and model inference, and the support there can help model development a lot. When I look at training models, actually the most important thing is how many iterations you can do per day. The more iterations you can do, the faster you can train the model. So if you have very strong infra and you have a lot of compute, you can train these models in a very short period of time. That gives you a much larger buffer for errors, and it also gives you the opportunity to spot more bugs.

Iteration Speed, Compute, and Debugging Model Pipelines

Swyx [00:05:46]: What is an iteration? Is it like a few hundred steps, or what are you--
Ethan [00:05:50]: Let's say it's just training the model: acquire new data, maybe design new algorithms, and train a new model, maybe at smaller scale or--
Swyx [00:06:01]: So cycle time for any hyperparam that you're searching.
Ethan [00:06:04]: Cycle time, including tuning and evaluating this model.
Is this model better than my previous iteration?
Ethan [00:06:11]: So--
Swyx [00:06:11]: So before you, someone had already set this up so that you can iterate very quickly.
Ethan [00:06:15]: I think the foundation there is extremely good for developing and researching models.
Ethan [00:06:23]: And often I find, and this is kind of boring, that a lot of the improvements do not come from new algorithms. They come from finding small bugs here and there in the data pipeline and in the model training pipeline. Those give the biggest boost to model quality.
Vibhu [00:06:46]: It's interesting, right? You say it's a small team, less communication bandwidth, but also a lot of quality is finding little bugs. It seems counterintuitive, right? If you have a lot of people, you can iron out more of those, but it's interesting to see the other side, right?
Swyx [00:07:00]: I also wonder, do you try using LLMs to look for bugs? I don't know.
Ethan [00:07:05]: I remember at that time it was mid two thousand and twenty-five, so the coding models weren't quite there yet. I remember by December two thousand and twenty-five, they were extremely good. I've been using them since then. It's helpful. Sometimes it produces code that is kind of difficult to maintain, even though the first time it builds something extremely fast. It gave me spaghetti code, thousands of lines that I couldn't maintain, and the LLM itself couldn't figure out what was wrong or how to improve on top of it. But now I find it much better. I want to bring up another point here: now that coding models are much more efficient and can help us implement stuff much faster, compute might become a bottleneck again. Previously, if you wanted to train a new model, say you wanted to generate new synthetic data or write a new algorithm, it might take a few weeks.
And during that period of time, you might not have experiments to run. But now you can build that thing within a few hours, and then you can immediately train a model.
Ethan [00:08:24]: Now you have to have enough compute to try all of the ideas. So compute might be the bottleneck of iteration speed again.
Swyx [00:08:36]: Yeah, honestly, I think it's kind of a stressful job because you're like, “Well, I should be trying everything, and if I'm not, then I'm not doing my job well.”
Vibhu [00:08:48]: There's also the stress that you're eating thousands of GPUs per hour, which is very expensive, and that compute could go to other researchers.
Swyx [00:08:56]: You got the daddy Elon to--
Vibhu [00:08:57]: You got daddy Elon.
Ethan [00:08:59]: It was--
Vibhu [00:09:00]: But there's still a finite amount of compute. You want to use it, you want to use it well, you want more of it.
Ethan [00:09:06]: That was quite stressful indeed. I think one thing is, with coding models now, a lot of these jobs can be automated, which is much better. Second, it's a marathon, so you've got to maintain good health and a regular schedule.
Vibhu [00:09:28]: It's hard to hear that when you shift from zero to nothing in two months.
Swyx [00:09:32]: And I think obviously the culture at xAI is very famous; people work very hard. One thing I did want to dive into: in the notes that you sent ahead of time, you had specific comments about the cost of Video Gen training. Presumably this is on Colossus-1, right? The two hundred megawatt cluster. Whatever you want to share on that.
Vibhu [00:09:54]: I think there are three things we're talking about, right? There's Video Gen, and there's also the Image Gen model that you put out. Do you want to complete the, okay, so zero to one, you have a few months.
Just what are the stages of creating an Image Gen model?
Swyx [00:10:06]: Oh, yeah, maybe I got distracted.

How Image and Video Models Are Trained: Synthetic Captions, Tokenizers, and VAEs

Vibhu [00:10:07]: Sorry. And then from there, there's Video Gen, there's Audio Gen. Would love to get into those next. But what are those first few months like? So small team, a lot of bugs, iterations, but what does it look like? Do we take something off the shelf? Do we just get data and compute? What are the few months like? How do you get to a state-of-the-art Image Gen model? How do you just start?
Ethan [00:10:28]: I cannot comment specifically on how xAI did it, but it's quite a standard process. I can draw some examples from Cosmos. Mainly, to build a video model, you actually need to build an image model first. And to build these two models, the data you need is a hundred percent synthetic pairs of language and image, or language and video. Because on the internet, videos don't naturally come associated with text. You can say, oh, on YouTube, you have the title and you have the description and the comments--
Swyx [00:11:11]: Title--
Ethan [00:11:11]: of a video, but usually they're not relevant to the video itself. Say the video is a natural scene of mountains or something, and the title is, “I'm so happy today.”
Ethan [00:11:26]: So they have no correlation at all. So the first step is that you have to generate synthetic pairs of language with the videos. You gather videos from the internet, and you use a VLM to caption the videos. For that part, here's a question: how do you get a VLM to begin with? So if there's no--
Swyx [00:11:55]: So you fuse the model, right? Like--
Ethan [00:11:57]: Say no VLM exists. How do you generate the text in the beginning, right?
It's impossible.
Swyx [00:12:04]: I see.
Ethan [00:12:05]: In the beginning, you ask humans to describe the video in as much detail as possible. For example, you ask them to describe everything: all objects, all characters, and all interactions and dialogue in the videos. That's in the protocol of Cosmos labeling. The objective we gave to the labelers was that you have to describe the video in as much detail as possible, such that a blind person who hears the blob of text can reconstruct what the video is like in their head.
Swyx [00:12:43]: Video or image? You're talking about images.
Ethan [00:12:44]: Video or image, either one of them.
Vibhu [00:12:47]: This was pretty common when we went from CLIP and DALL-E, right?
Vibhu [00:12:51]: It's all training on really detailed captioning of images. So the same is applied to video, but instead--
Ethan [00:12:57]: Same applied--
Vibhu [00:12:57]: of using a multimodal model to pass in video images and write rich descriptions, you can also--
Swyx [00:13:04]: I think there's this traditional perspective of supervised, or very highly human-curated, data. I feel like there's an unlock with unsupervised, right? Where you have enough to bootstrap that you can just throw common corpus on it, or whatever. Like unsupervised vision and language pairing, right? Where you just have interspersed image and text and it just learns. To me, that is the VLM breakthrough that is different from CLIP, different from the LM era.
Ethan [00:13:36]: It's interesting to see that you kind of need both kinds of data.
Ethan [00:13:41]: For example, for the--
Swyx [00:13:41]: You need it to bootstrap it up. Yeah--
Ethan [00:13:43]: for the generative model training, there's also usually a small percentage of unlabeled data. So the model is instructed to generate a video without any text instruction. That can also help the model generalize.
So after this stage of generating synthetic pairs, one important common step is to train a compressor, or tokenizer, for the images or videos. Theoretically, you can train image or video models on pure pixels, but the problem is that it's a lot of tokens. One image that is a thousand by a thousand is one million pixels, so one million tokens. It's impossible to train a transformer on that. So you need to train a tokenizer, which can go from image to latent space and from latent space back to image.
Swyx [00:14:45]: That's why we named the podcast.
Swyx [00:14:48]: But basically, you're talking about vocabulary size.
Ethan [00:14:50]: So vocab.
Swyx [00:14:51]: And so, what is imp-- like, a million is impossible?
Ethan [00:14:54]: In generative models, the vocab is continuous. It's a continuous space. You can think about it as mapping an image to a vector. It's a fixed-length vector, sixteen or forty-eight, something like that. And then you map that vector back to the image space. And the mapping is patch-based. So you say you have
Ethan [00:15:22]: a sixteen by sixteen patch, and you map that patch of pixels into this latent space.
Swyx [00:15:29]: We've covered this--
Vibhu [00:15:30]: This is like the vision transformers--
Swyx [00:15:32]: VAEs.
Ethan [00:15:33]: VAEs.
Vibhu [00:15:34]: You basically compress your input, you do your generation, you're reasoning, all that generation, in a smaller dimension, and then you project back out.
Swyx [00:15:43]: A VAE is a form of compression, but for me, the patching thing is from ViT, right?
Ethan [00:15:48]: You can make those.
Swyx [00:15:49]: Literally, yeah, the paper is titled something like sixteen by sixteen is all you need. Something like that.
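As a rough illustration of the token arithmetic in this exchange, the sketch below compares one token per pixel with one token per sixteen by sixteen patch for a thousand by a thousand image. The function names are hypothetical, and counting one token per pixel position (ignoring color channels) and rounding up at the image edges are simplifying assumptions, not a description of any particular tokenizer.

```python
def pixel_tokens(height: int, width: int) -> int:
    """One token per pixel position (color channels ignored for simplicity)."""
    return height * width


def patch_tokens(height: int, width: int, patch: int) -> int:
    """One latent token per non-overlapping patch x patch block.

    Edges that do not fill a whole patch are assumed to be padded,
    so the row and column counts are rounded up.
    """
    rows = -(-height // patch)  # ceiling division
    cols = -(-width // patch)
    return rows * cols


raw = pixel_tokens(1000, 1000)
latent = patch_tokens(1000, 1000, 16)
print(f"pixel tokens: {raw}, patch tokens: {latent}, ratio: {raw / latent:.0f}x")
```

Under these assumptions the sequence length drops by a factor of a few hundred, which is the reason a transformer can be trained on the latent tokens at all.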
And then I think also, people make a lot of comparisons between this kind of patching and convolutions.
Swyx [00:16:02]: Which is, you're kind of reconstructing the old paradigm with the new.
Ethan [00:16:05]: Actually, in VAEs, there are both convolutional networks and transformers. You can actually do both.
Ethan [00:16:14]: After this VAE, what you've got is latent space tokens and language tokens. Now the training of the diffusion transformer (generative models usually use diffusion transformers) is actually quite standard. It's very similar to how you train a language transformer model. There's not that much difference. It's just visual tokens in, visual tokens out. The only difference is there's a denoising process. So you train the model to unmask some of the noise: you add random noise to the visual tokens, and then you train the model to remove that noise to generate the clean tokens. At inference, the model can iteratively remove noise starting from a hundred percent noise.
Swyx [00:17:12]: And then there's also, to speed things along on the tech tree of diffusion, there's CFG, and there's also latent diffusion. I think somewhere along the line, obviously, Stability and all these other guys pioneered a lot of this architecture. I don't know if you want to get into that or just do the video side, up to you.

Bootstrapping Video from Image Models and Temporal Compression

Ethan [00:17:37]: After you train such an image model, the reason it's a foundation for video models is that image models are cheaper to train, and they have a much denser connection between language and images. For example, you train on a billion images, and there's a mapping from the text to the image.
And the cost to train the same thing, a billion texts to a billion videos, is much more expensive because videos naturally have more tokens than images. And the diffusion models' understanding of language comes purely from this mapping. So if you don't have enough mapping, if you only train on ten million videos or something, you might not see enough language tokens in your training, so your model does not understand human intention well enough. That's why you first train the image diffusion model, and then you bootstrap the video model from there.
Swyx [00:18:53]: One thing I did want to ask, because actually, I think you're the first video model person I've ever talked to. We've, like, talked to Luma and all those folks. There are all these tricks in video compression where basically, frame by frame, there's not that much difference, so you don't have to regenerate or save the whole frame, right? I think MP4 compression or something like that.
Swyx [00:19:16]: Is it tempting to use that? As far as I can tell, everyone just treats it as, “No, we would just generate every frame.” Is that roughly the state of the art?
Ethan [00:19:27]: There are a few different approaches. Let's say, first, you want to directly use MP4 compression and use that as the tokens for the transformer to train on, right? People have actually tried that, but the main challenge is that the latent space of the MP4 tokens is not very comprehensible for the models. It's extremely hard to train on that. And there's a--
Ethan [00:20:01]: So that's why they created VAEs, which create a more continuous latent space, so the models can understand that latent space and learn from it much more easily. Even within VAEs, there are different difficulties of the latent space.
So you can imagine the simplest, most naive VAE: you have an image, and you just shuffle the whole image into a vector. So you don't need to train any VAE, right? But that latent space is extremely hard for models to train on top of. That's why there is some debate on how you compress the tokens. You mentioned you can compress frame by frame. You can also compress the temporal dimension.
Ethan [00:20:52]: The difference is, if you compress the temporal dimension, you get a much higher compression rate, because there's temporal redundancy between frames. This frame and the last frame are likely mostly similar, so there's only some small difference. For example, I think in the Wan 2.1 VAE, they have an eight by eight by four compression rate. So four temporal tokens are compressed into one token. That can save a lot of context length. If you do it frame by frame, you have to do maybe eight by eight by one. Your context length will be four times larger. That being said, the benefit of per-frame compression, and we might come back to this later, is real-timeness and interactivity. 'Cause if you stream the output of the model frame by frame, the model can respond to any user request immediately. If you have a four-times temporal compression, then--
Swyx [00:22:06]: It might be laggy--
Ethan [00:22:07]: there's a lag there by nature.
Swyx [00:22:10]: So you're very pilled on this. Let's just go ahead and bring it up, 'cause we have the visual prepared anyway. There are some frontier applications of real-time video gen. Flipbook is one of the examples that went viral recently, right? What is Flipbook?

Real-Time Generative UI: Flipbook, Neural OS, and Diffusion Front Ends

Ethan [00:22:23]: Flipbook is kind of like a web browser. You can see it has the web browser UI on top.
The difference is that all of the UI is generated by a generative image model in real time, and everything here is fake. But you can explore inside this imaginary world. Say, here we have engineering the Great Pyramid. The model generates this for us to understand how it works, and if we want to navigate around and understand further, we can click on some of the descriptions here, and the model will generate a new subpage describing the details we want to know about.
Swyx [00:23:14]: So it's basically like we're playing a video, but it's pausing for our next interaction, and then it just plays the next thing based on our interaction.
Swyx [00:23:23]: Which is kind of cool.
Vibhu [00:23:25]: And you kind of decide your story. So this was, how do you make a pyramid? The levering technique seemed interesting, right? It shows how do you take-- Okay, I want to know what is this--
Swyx [00:23:35]: The demo tweet had more animation between frames.
Vibhu [00:23:38]: I think it's just skipping.
Swyx [00:23:39]: Oh, it's just skipping a lot of frames.
Ethan [00:23:40]: They also have a video mode--
Vibhu [00:23:42]: It takes a lot. There's a lot of people--
Ethan [00:23:42]: but a lot of people are using it.
Ethan [00:23:45]: So it's not available.
Vibhu [00:23:46]: There's a live video stream. We can try.
Swyx [00:23:50]: So this is an example of the kind of future that you see at the extreme. We're obviously not in it today.
Swyx [00:23:56]: But in a world where inference is completely free, this is better than generating code and text?
Ethan [00:24:02]: So this is the final state of where we will be at for world models, I think. Imagine the internet doesn't exist, and then you type in google.com. What should a model show you? The model can imagine something, and this is what the model imagines. And these web pages, they completely do not exist.
So I think as inference costs come down, we are going to have generative UI for everything. If you think about how coding models work, they write code for a web page, the code might be converted into binary, and the binary renders the pixels on the screen. In machine learning, every time we have some breakthrough, obviously it's more end-to-end. So why don't we go from user instruction to pixels directly? Generative UI will go from user intention to pixels directly. Say, even for email, everyone has the same interface, but I want it slightly different. I want email shown to me like TikTok, so I can swipe left and right through the emails. Or maybe you want something else. We can have completely different things. Or I'm looking at Instagram stories, and I don't like the Like button. I always misclick it. Generative UI resolves that. So it's going to be a revolutionary replacement of the interface. In the future, we might have much more powerful
Ethan [00:25:50]: LLMs and coding models running behind the scenes. And in the front end, the diffusion model will actually be the front end that shows stuff to you. That's how I imagine it.
Swyx [00:26:02]: Diffusion front end, deterministic back end.
Swyx [00:26:04]: Something like that. I find that very expensive, but--
Vibhu [00:26:08]: I find it interesting you called LLMs writing code on the back end deterministic, but okay.
Swyx [00:26:14]: You write it once--
Vibhu [00:26:15]: Compared to--
Swyx [00:26:16]: And then you execute.
Ethan [00:26:17]: If you think about the cost, let's say an H100 costs $1 per hour, and you use this eight hours a day for thirty days, so every month you're paying two forty. You'd actually not want to pay for that. That's even more expensive than Claude Code Max.
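The arithmetic in this estimate can be sketched as follows. The $1 per hour rate, eight hours a day, and thirty days come from the conversation; the function names, the $20 per month target, and the yearly cost-decline factor of 2.0 are illustrative assumptions only.

```python
def monthly_gpu_cost(rate_per_hour: float, hours_per_day: float, days: int) -> float:
    """Cost of keeping one GPU busy for the given usage pattern."""
    return rate_per_hour * hours_per_day * days


def years_until_affordable(cost: float, target: float, yearly_decline: float) -> int:
    """Whole years until `cost` falls to `target` or below,
    assuming cost is divided by `yearly_decline` once per year."""
    years = 0
    while cost > target:
        cost /= yearly_decline
        years += 1
    return years


cost_today = monthly_gpu_cost(rate_per_hour=1.0, hours_per_day=8, days=30)
# Hypothetical target: a $20/month subscription-style budget.
years = years_until_affordable(cost_today, target=20.0, yearly_decline=2.0)
print(f"${cost_today:.0f}/month today, about {years} years to reach $20/month")
```

A faster assumed decline shortens the timeline considerably, so the result is only as good as the decline factor fed in.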
But if you think about compute costs coming down like two times every year, I think the future will likely arrive within a few years.
Vibhu [00:26:49]: It's everything, right? Compute cost comes down, compute gets faster, model gets smarter--
Ethan [00:26:54]: More efficient--
Vibhu [00:26:54]: model gets smaller.
Swyx [00:26:55]: I don't know why you say two times, 'cause I think it's like 100 times. In language models, it is roughly one hundred to a thousand times every twelve to eighteen months, for the same given level of LMSYS Elo.
Vibhu [00:27:08]: That's a net of everything, right? That's model performance alongside compute. So it's different than just compute costs coming down. But a very interesting future.
Swyx [00:27:19]: So the web designers will have to shout out that accessibility is an issue, right? How do you deal with screen readers or whatever? But yes, this is higher-bandwidth storytelling than anything you can possibly generate with code, right? So I think that's the rough idea.
Ethan [00:27:34]: And I'd like to add a little bit: humans naturally have the maximum input bandwidth when we are looking at things, looking at videos, and we also have the maximum output bandwidth when we are talking. So in the future, it might be something like: we talk to AI models, and the AI model responds with a generative UI. That would be the maximum input and output bandwidth for interacting with AI models before Neuralink happens.
Vibhu [00:28:06]: And it's also very custom, right? Some people are very visual, some people are not as visual, right? They prefer text. But the best thing about generative UI, right, is it can also be text.
Swyx [00:28:17]: There's another project that we wanted to highlight, which is the Neural OS. Kind of a similar idea, but here you're literally simulating an operating system with a video model.
Swyx [00:28:27]: And you can play Doom, you can do Firefox.
I find this mildly less impressive, obviously, because it's an OS that I can run.
Swyx [00:28:37]: But here everything is imagined.
Vibhu [00:28:40]: I was used to Command+W to close the Firefox tab. It didn't crash. That's why I said--
Swyx [00:28:45]: It's too immersive.
Vibhu [00:28:46]: It's too immersive for me.
Swyx [00:28:47]: Too immersive.
Vibhu [00:28:48]: I wanted to close the tab.
Vibhu [00:28:49]: But yes, I can play generated diffusion.
Swyx [00:28:51]: This is shockingly fast.
Swyx [00:28:54]: Because I remember there was a demo maybe one to two years ago. Someone tried to do a first-person shooter with an image model. There was no consistency. It was very slow. But here it looks realistic; this is Doom.
Vibhu [00:29:07]: I think there are two sides to that, right? There's, okay, what is running a game? The heavy part of it is actually the game engine, all the lighting, all that stuff, the graphics. This is just kind of video, right? Like we've solved consistency. Still, it looks like image generation from a few years ago. There's some temporal consistency, but it's kind of just images stitched together as video frames. But it's a good visual representation to picture the future you want to see, right? That's more what I see in these.
Ethan [00:29:38]: This reminds me of how video models get better and better. With Neural OS, if you just look at it, it feels like just a crappy version of the Windows we could have, right? But the difference is that this model is overfitted on existing operating systems. It can generate nothing different from that. But it's actually also similar to video models. When we are training these video models and image models, we train them on the internet. There's no imaginary supernatural stuff on the internet. But once we train this model, you can prompt the model to generate something supernatural that has never existed in the data set.
So if you train your Neural OS or neural computer on standard screen recordings from the entire internet, the model can imagine a completely new interface for interacting with the computer.
Swyx [00:30:43]: This is one of those things that is magical to me. Usually generalizing out of distribution is bad, but somehow we have learned some kind of internal world model where you say, this, but it looks like rainbows and butterflies, and it'll do it, and it will kind of make sense.
Swyx [00:31:03]: So yeah, that's kind of cool. I don't know if there's any more comment on that. I did want to touch a little bit more on the model architecture stuff, which I think you were getting to. It's really fascinating. We don't get a chance to talk about this enough. So one of the papers that we covered, we've covered every annual Segment Anything release. And I don't know if you follow-- you're a computer vision guy, so you--
Ethan [00:31:26]: I know--
Swyx [00:31:27]: So they did memory attention, which is kind of interesting. And I always think anything where you can keep some consistency across the temporal dimension is very fascinating. Basically, the CV side bleeding into the video gen side, I think, is underexplored, right? We talk about it for labeling, but actually you can borrow the architecture itself.
Ethan [00:31:50]: There are also completely different approaches, right? You brought up the term world model, so we went from video model to world model. There is diffusion, but there are also other approaches that people are doing. So maybe we get into those after as well?
Swyx [00:32:03]: He has a whole definition of world models and stuff. I feel like we threw a lot at you.
Whatever you want to comment on.

Why Video Models Are Expensive: Storage, I/O, and Training Scale

Ethan [00:32:10]: I think one thing that we should actually come back to is, okay, we were talking about the steps to go from image gen to video model. One thing we don't see as much of is, okay, you brought up the delta in training data, right? So
Ethan [00:32:24]: you won't have as much, a video model might not generalize, but what is the cost of training a large video model? We know for LLMs roughly, okay, even the poolside thing that came out today, right? It's a Gemma-level model trained on roughly forty trillion tokens on this many H200s over this much time, right? You can see what the exact cost of that is: how many GPU hours times how much an H200 costs. So how do we do the back-of-the-envelope math for the same thing with video models and image models? How do you break that down? I can share some back-of-the-envelope calculations. Surprisingly, the cost of video models is comparable to language models. Obviously the largest scale is language models, so maybe comparable to medium-scale language models. I'd say just storing the videos alone costs a lot. You can maybe look it up on AWS or something.
Ethan [00:33:20]: Say you have a billion videos, and let's just say each video is five megabytes, then you need five petabytes just to store those videos. And remember we talked about using a VAE to compress the videos; typically you also need to store those continuous features in your storage. That's a comparable size to the videos themselves. So just storing these videos and the features is tens of petabytes alone. And--
Swyx [00:33:58]: I just looked up the calculation.
Five petabytes on S3 Standard is one hundred K per month.
Ethan [00:34:05]: And--
Swyx [00:34:05]: It's comparable--
Ethan [00:34:05]: and you need--
Swyx [00:34:06]: And--
Ethan [00:34:06]: And then tens of petabytes, two hundred K. And even more expensive is the ingress and egress.
Swyx [00:34:13]: Oh, yeah.
Ethan [00:34:14]: Through the internet. Just to download those videos, I believe it's more expensive on AWS than just storing those videos.
Swyx [00:34:25]: Storing, yeah.
Ethan [00:34:25]: And for each training run, you probably need to pull them once. If you train multiple times, it's even more than that. So just the storage and the network, those costs would be a few million per month just for storing everything, not to mention the GPU cost.
Ethan [00:34:45]: And--
Swyx [00:34:45]: My side tangent: compute rental, like GPU rental, is very efficient. On one side, okay, you can be xAI and build your data center. Should we not just build our storage compute as well? Like--
Ethan [00:34:57]: Of course--
Swyx [00:34:57]: cloud cost compared to just--
Ethan [00:34:59]: You save so much--
Swyx [00:35:00]: store. Yeah, exactly.
Swyx [00:35:01]: Especially with egress and stuff. So.
Ethan [00:35:04]: That's a good idea, but it also comes with some challenges of its own.
Swyx [00:35:09]: Of course, of course.
Ethan [00:35:10]: People who build GPU data centers might not expect this much storage. And people who build storage typically just build it somewhere with just CPUs.
Swyx [00:35:23]: I just looked it up. AWS only charges for egress, not ingress. Tier five for five petabytes is two hundred and thirty K.
Ethan [00:35:32]: Even more expensive than the storage.
Swyx [00:35:34]: But storing is per month, right? You check in, then you cannot check out. So it's so cool. It's okay.
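The storage and egress figures quoted in this exchange can be reproduced with a short calculation. This is a sketch, not a pricing reference: the per-gigabyte rates below are derived from the round numbers mentioned in the conversation ($100K per month and $230K for five petabytes), not taken from a price list, and the assumption that the stored VAE features are the same size as the videos follows Ethan's "comparable size" remark. Decimal units (1 PB = 10^15 bytes) are also an assumption.

```python
MB = 10**6
GB = 10**9
PB = 10**15

# Figures from the conversation.
num_videos = 1_000_000_000
bytes_per_video = 5 * MB
quoted_storage_per_month = 100_000  # USD for 5 PB, as quoted
quoted_egress_total = 230_000       # USD for 5 PB, as quoted

video_bytes = num_videos * bytes_per_video
# Assumption: cached VAE features take about as much space as the videos.
total_bytes = 2 * video_bytes

# Implied per-GB rates, derived from the quoted totals.
storage_rate = quoted_storage_per_month / (video_bytes / GB)  # USD per GB-month
egress_rate = quoted_egress_total / (video_bytes / GB)        # USD per GB


def monthly_storage_cost(num_bytes: int) -> float:
    return num_bytes / GB * storage_rate


def egress_cost(num_bytes: int) -> float:
    return num_bytes / GB * egress_rate


print(f"videos: {video_bytes / PB:.0f} PB, with features: {total_bytes / PB:.0f} PB")
print(f"storage: ${monthly_storage_cost(total_bytes):,.0f} per month")
print(f"one full pull of the videos: ${egress_cost(video_bytes):,.0f}")
```

Doubling the footprint to ten petabytes gives the two hundred K per month Ethan mentions, and every additional pull of the raw videos over the internet adds another egress charge on top.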
So there's that side.
Ethan [00:35:41]: So the TLDR, my back-of-the-envelope math--
Swyx [00:35:42]: Data is larger than you think. Yes.
Ethan [00:35:44]: my back-of-the-envelope math of GPU hours times GPU cost is also very much missing some storage.
Swyx [00:35:49]: You're basically also more IO bound than normal training.
Swyx [00:35:55]: Yes. 'Cause with data loading, caching everything becomes super important.
Ethan [00:36:00]: So in Cosmos, we did a lot of optimizations to make it not IO bound. Speaking of actually training the model, the GPU cost: if you look at how big the open source video models are, I think LTX has nineteen B parameters. That's a dense model. And people are also exploring MoEs, so it might be twenty B active and hundreds of B total. So that's a similar size to medium-sized LLMs. And if you look at the number of tokens, we disclosed that in Cosmos. It's also tens of trillions of visual tokens. So putting this together, the cost of training these video models is actually comparable with LLMs. Not to mention, the infra is slightly different from LLMs, so it might be less efficient to train these models.

Inference Speedups: Step Distillation, Consistency Models, and GANs

Swyx [00:37:04]: Do you get the benefits of traditional diffusion speed-ups? For images, there's LCM, LoRAs for fine-tuning. There's a lot of stuff that's been--
Ethan [00:37:15]: Flow matching.
Swyx [00:37:16]: There's flow matching. There's a lot of stuff that's been done. Is there some overlap that applies to diffusion on the inference side and stuff, or?
Ethan [00:37:23]: So the inference side is a completely different story.
Ethan [00:37:28]: I think for the training side, it might be a little bit hard to reduce that cost. And for the inference side, the biggest gain is from the distillation of these models.
It's called step distillation, slightly different from knowledge distillation in LLMs. Typically, for flow matching models, you need like 100 steps or something. A diffusion model needs even more, like 1,000 steps, to generate a good image or video. Step distillation tries to learn to generate in fewer steps from the model itself. It's kind of like you use the full model to generate in 100 steps, and then you take a model that only generates in 10 steps and let that model learn from the perfect one.
Ethan [00:38:25]: Why this works--
Swyx [00:38:27]: Strong to weak, seemingly.
Ethan [00:38:28]: It is. It's kind of--
Swyx [00:38:29]: Distillation.
Ethan [00:38:29]: Kind of like strong to weak. From the modeling perspective, the strong model, the teacher model, is trying to model the images and videos of the internet, and that distribution is extremely complex. But the step-distilled model is just trying to learn from the teacher. The teacher is a model, and its size is fixed, so the distribution is much simpler than the whole internet. That's the intuition I have for why step distillation can work. So usually when these models are served in production, they only run in a few steps. In Cosmos, I believe we have like four steps and eight steps. If you do some simpler task, like image-to-image translation, it can even run in fewer steps, like one step in Cosmos Transfer.
Swyx [00:39:22]: I think this is the same intuition that guides a lot of the consistency model work. I sent you a link for sCM. I don't know if you covered that. To me, that was actually one of the most impressive papers I've ever seen from OpenAI.
Swyx [00:39:34]: That this is the unifying grand concept of consistency models. I don't know if you have any comments on this.
Ethan [00:39:41]: So there are a few different approaches--
Swyx [00:39:46]: Oh, yeah. Here it is.
Swyx [00:39:47]: Two steps versus twenty or 100 steps, whatever.
It's already done.
Ethan [00:39:52]: So there are a few different approaches, for example, consistency models. Actually, we shouldn't forget GAN. So GAN, actually, that was the OG of--
Swyx [00:40:05]: OG.
Ethan [00:40:05]: --step distillation, 'cause it trained just one step to begin with. For example, there's distribution matching distillation, which uses GAN as one of the losses for distillation. GAN just tells you, “Hey, generate an image,” and then--
Ethan [00:40:31]: --it has a discriminator to tell, is this image real or not? So the model only needs to learn part of the distribution, not the full distribution. Because in training, the model is asked to reconstruct the ground truth image from the internet, which is extremely hard. And when you're training GAN, it's a step process. It's just a, “Hey, you generate an image. Does this image look as real as the image from the internet?” Which is a much simpler task. And, yeah, people typically combine a lot of these approaches together, like consistency models and distribution matching and GAN, and we can get these few-step models.

Audio-Video Generation and Time Alignment

Swyx [00:41:21]: Then there's one step I wanted to add, which is audio and video.
Ethan [00:41:26]: So, Grok Imagine 0.9, I believe it's the first audio-video transmodel deployed at a large scale. So--
Swyx [00:41:39]: And that was your first model?
Ethan [00:41:40]: That was Grok Imagine's first model. It's audio-video joint generation. I think the hard part is the modality alignment, 'cause before this transmodel, we had text-to-video alignment. We have this correspondence between text and video. Typically, most of the VLMs, they understand images and videos. Video's very rare, and they don't understand audio mostly.
And if you look at the audio generation on the LLM side, you can talk to them perfectly fine, but if you ask them to sing a song or something, it typically is not very good. Also, they don't have, they don't have music either. The hard part is thatUh, actually audio has two component. It has like a discrete component, a continuous component. The discrete component is like the language.Ethan [00:42:44]: So when we speak, it's just, someSwyx [00:42:47]: It's an ASR issue, yeah.Ethan [00:42:49]: It's, it's text token with some characteristics, I would say.Ethan [00:42:54]: But musicSwyx [00:42:56]: I think the speech guys would disagree with this.Swyx [00:42:57]: Like disfluencies and then,Vibhu [00:43:00]: There's tones you can get angry.Ethan [00:43:01]: Well, I say largely.Ethan [00:43:03]: the mu- but the music is completely different. It's, it's very continuous, and you cannot model them like discrete tokens in language models. this is like the hard part for models is, not to mention we have to align text, video, and audio together.Ethan [00:43:26]: SoVibhu [00:43:26]: How?Ethan [00:43:28]: So significant-- some significant challenges are like-- So first, like we talk about as the VLMs, they cannot understand most of them cannot understand audio.Ethan [00:43:39]: So you have to have some way to do the synthetic data generation for audio. You have to caption the model, and that involve, that involve synthetic data and human data effort a lot. And not just surprisingly, most of the LLMs are very bad at recognizing, like the beat, tone, and the details of the of music. They can, they can give some general prediction of which song is this, but it's very hard to describe the details of the music. like we mentioned in image generation, like you have to describe image as detailed as possible so that someone blind can reconstruct that. 
So here is like someoneVibhu [00:44:32]: DeafEthan [00:44:32]: someone deaf can reconstruct how the music sounds like without actually listening to it. Maybe you can think of it need to have the-- or they call the script.Vibhu [00:44:49]: Subtitles, yeah.Ethan [00:44:49]: You gotta have all the details of the music, and the dialogue.Vibhu [00:44:55]: So is the challenge there typically stuff like music and audio, or is it just Like is there a baseline? Okay, there's enough data where we can understand, narration, conversation, but there's nuances in audio that's where you hit all the data issues or is it just from stage zero, you just do it all right?Ethan [00:45:15]: So one important thing is like the alignment. So the model, the model has to know like the video and audio, the, uh-- it has to have a time-based alignment, like at which time step the video and the audio token correspond to each other. But we actually don't have this kind of alignment for most of the other modalities. If you think about like text and image, text and video, they are loosely aligned. So you can, you can have a description of what's going on in the video, but you don't have to exactly, You typically don't have exact description, oh, at, time step one second like what happened?Vibhu [00:46:02]: It's veryEthan [00:46:03]: At time step two second what happenedVibhu [00:46:03]: coarse. Yeah.Swyx [00:46:05]: So what was the ideal time step? You have to oblate it, and then it's like four seconds or something.Ethan [00:46:09]: So that comes down to how you design the model to, for the model to be aware of as a time, as a time modality. So the model is like a time aware. And that's something pretty unique if you think about LLMs. So if you ask LLM to complete a task, say they, uh-- you ask them and they will say, “Oh, this task will probably take twelve hours to complete,” and they come back in one hour. 
Say “I've already spent two days on this and I've exhausted everything.”Ethan [00:46:47]: So the LLMs them-themselves, they don't have a sense of time there.Vibhu [00:46:53]: I actually don't think that's just them not having a sense of time. I think it's somewhat based, right?Vibhu [00:46:58]: Like you tell someone, “Okay, go work on this feature. Go implement this,” there's a general understanding you would have of how long that would take without LLMs working at LLM speed, right? So you think back like two years ago, if I tell you to like build me like a new front end for latent space, have a search bar, have all this, you'll estimate that it'll take a few days, right?Vibhu [00:47:19]: So you tell an LLM, “Go build this.” It'll take me a few days. But I think it's somewhat grounded as opposed to them not having the best-- Not saying that they have a great understanding, but I think that example is like you can see where it comes from, right? You're trained on all over the text.Swyx [00:47:35]: They're, they're trying to estimate what a human would say.Vibhu [00:47:37]: because that's what the, that's what the data kind of represents. It's not themEthan [00:47:41]: It came from the corpus on the internet. People have a estimate of how much time.Vibhu [00:47:45]: And not even just in direct like training samples, right? Just your world understanding of tokens of how long stuff takes, right? Go read a book. It'll take you a while, right?Vibhu [00:47:56]: Even if you do nothing but read a book, it takes a few days. So yeah, LLM, I read it took me a few hours.Vibhu [00:48:01]: It'll take me a few hours to go through this research. But this is a tangent.Swyx [00:48:05]: Somewhat, yeah.Swyx [00:48:06]: This is a train of thought I haven't really expressed until now is, which is basically like a full world model must also be recursive, meaning that the participant in the world model must also be aware that they have a world model. 
which is like this whole recursive thing down the, down the line. but yes, and that the world model can be wrong and that they need to update it and blah. Yeah. We've, argued this on the, newsletter as well, that there needs to be sort of recursive or adversarial world models.World Models: Real-Time, Long-Horizon, Interactive VideoVibhu [00:48:34]: just, to ask, how do you define world model?Swyx [00:48:38]: Oh, yeah, let's go there.Ethan [00:48:40]: SoVibhu [00:48:40]: So just for context, we talked about, video generation, and then there's a-- if you say there's a distinction between world models, what's your, what's your definition? How do you see the two?Ethan [00:48:53]: So disclaimer, I'm not going to debate, what is world model. Yeah. there are many definitions, so I'll just talk about my definition. Since I came from the multi-model, multi-model domain, so mainly talking from video. So world model is like real-time interactive long horizon videos. So there are three parts. so we-- let's talk about them one by one. So the so interaction, so we just, we just look at Facebook and neural computer. So the interaction part of it, so you, world model can allow you to interact with them through keyboard, mouse, and maybe also voice. So these all is-- all is a modality. You can, you can interact with the model, and the model should respond reasonably. Second part is real time. So once you, once, say, you move your mouse, if, say, the world model generate a game, how fast can the game respond? So if you're like professional CS: GO players- -my say, oh, you have to respond- He's beginner within sub ten milliseconds or- Yeah even less. So that's not most of the- No, sixty FPS. Let's go. Oh, three hundred FPS. Oh, five hundred FPS. Wait. okay, yeah. I didn't do the math, but yeah, okay. Uh- Yeah, three hundred FPS, that's a three millisecond. So you have to respond- Oh, s**t. Okay. YeahEthan [00:50:29]: within a millisecond. Most of the video models cannot do that. 
Yeah. And, but if you, say, if you have a video model that is, say, like a digital human, the response time might be more generous. Maybe typically, for real-time voice interaction, it's like two hundred millisecond. So that's, that's much more generous. But even two hundred millisecond is pretty, it is pretty tricky, ‘cause remember we mentionedEthan [00:51:01]: you have this, temporal compression coming from the VAE. So if you, if you don't compress the temporal dimension, your sequence length is going to explode. So if you want to have this real-time, real-timeness in your model, you have to do is one context problem. And the third part is long horizon, ‘cause we-- if you're not going to just play with, video games just, a few seconds, most video models only a few seconds. We're going to play with minutes, hours. The model have to be able to generate long-form content.Ethan [00:51:42]: So putting these three together, it's, real-time, long horizon interactive videos. I think the final state will be, for example, like a video, a video version of Playbook, where you can, you can interact with, a neural computer. You move your mouse, and you click on the generative interface, and it will reply to you through pixels- generating in real time. But getting there, it's, it's a very long way to get there. So one of the first step, at Grok Imagine, where I led a small world model team there, was to build video extension. So, video extension- it's the first step of interactivity. Yeah. It's, it's the first step. Yeah. So it's the first step- You have it here, video editing, yeah. Yeah. Yeah. So the first step is because, this unlocks long horizon videos. Typically, for most of the video generation models, you give it a prompt or an image as an initial frame. You generate video, that's it. That's just, one time, done. And some creators would try to, use the last frame as a first frame for the second video. 
It can-- sometimes it works, but if you do it a few times, the quality would decrease. And-- It doesn't have that context-- Yeah. --over the full video, so the temporal-- Yeah, exactly. Yeah, 'cause you only gave it the last frame, of course, right? Yeah. Exactly. And-- it's actually a pretty fun hack, if you've seen like-- Oh, no, he's saying something better. Yeah. And for example, like Veo, I remember Veo 3 has like a second of context from the last video. It is slightly better than using the last frame, but it has a similar problem that the quality would decrease. If you extend a few times to one minute, the video quality would look much worse than the first video. Another problem is that the model doesn't have long-range knowledge of what's happened before. Say, if it generates some dialogue, two people speaking, their voices might change over time, especially if the one-second conditioning does not cover the previous context. So these are the core challenges. So the Grok Imagine video extension has historical context of all of the previously generated videos. It has the context of who is speaking and what objects have appeared and everything, and it uses that to generate the next video. So if we naively do this, you can imagine, just put all of the previous history video tokens into the context. The context length will easily explode. Especially for video models, that can be like a few million tokens of context, I would imagine. Context length. Yes. Yeah.
Swyx [00:54:58]: Let's run with that.
Ethan [00:54:59]: For example, in Cosmos, I think just five seconds of video is like fifty K or sixty K tokens. So if you do fifty seconds, that's five hundred K tokens. If you do longer than that, it easily explodes. This long-horizon problem was the first step we were trying to solve toward a world model. It turns out people, yeah, people love video extension.
Like a lot, a lot of the creators love using video extension to create longer form videos. This is the part I liked that you have a, you have an intermediate step toward the final goal instead of just a straight shot to the final version very much.Swyx [00:55:48]: But I can see you have a strong vision of where we want to end up.Long Context, Redundancy, and Efficient Interactive VideoVibhu [00:55:51]: Does it seem like it's an efficiency issue? okay, we're at a few million tokens context,. If you draw the parallel to language models, we had very short context, two thousand, eight thousand, then, you scale it up one million, ten million. sure, there's effective context, but at the end of the day, it's just what's it worth? sure, there's a whole training data side. In video, it might be slightly easier ‘cause we have a hundred million token video, right? Just take a movie with the full context there. Like is this efficiency from an inference standpoint that like it's expensive, but we know how to solve it? Or like why is this not the approach? So like my broader point was on your second point of world models, you say it needs to be interactive and live, right? You should be able to play a game and see the interaction live. So one thing I see with research is a lot of what you actually serve is different than what you build, right? So we talked about distillation. You train big model, you distill it, you do quantization, speculative decoding. We do all this stuff to serve it efficiently. Should we not just have a solution, like a world model that can interact well, do inference optimization, serve it, distill it secondary, so make it real time after you solve it? So like a-- another parallel is say, continual learning, right? What we need is someone to solve it and show it works inefficiently. Give it a few years, people will make it efficient. Same thing with regular attention, right? It worked. 
Over a few years, people have developed different forms of attention, and we've scaled it to be efficient at long context. So kind of two things there, right? One is it seems like it works. You've scaled it. Can we not just scale it a lot more efficiently over time? Do we need a separate approach if this works? And same thing with interaction, right? If we can get it done, like if we can solve some way that it works, we can solve making it more efficient from an inference standpoint later.
Ethan [00:57:53]: That's actually a very good point. So in videos, there's actually a lot of redundancy. We solve a lot of the pixel redundancy with the VAE, but there's more redundancy in long-range and long-horizon videos. Say, if a character appears in the first clip and then disappears, and only reappears at the end of the video, you probably don't need that context in the middle of the generation. You only need that character where you need it. So that's why I helped build another feature. It's reference video.
Vibhu [00:58:36]: Is it here?
Swyx [00:58:36]: Is it the same model release or a different one?
Ethan [00:58:39]: It's a different one.
Ethan [00:58:41]: You probably need to search on--
Swyx [00:58:43]: I'll find it.
Ethan [00:58:43]: --X, reference to video.
Ethan [00:58:46]: So reference video allows you to upload up to seven images as conditions and generate the video. They can be characters or objects or even scenes. Say I want to condition on Shawn's selfie and holding a blade--
Swyx [00:59:07]: We have a dog.
Ethan [00:59:08]: --or whatever.
Swyx [00:59:08]: We put the dog in the thing.
Ethan [00:59:09]: You can put them there and the video model will generate the video from them and copy the context over. So that can solve a lot of the problems there, like the long context problem. It doesn't need to have a very long context, but I feel like it's an intermediate solution.
The modelSwyx [00:59:29]: It's cheating.Ethan [00:59:30]: the model should be able to like selectively know, where should I draw the references. So say if I want to generate a movie, I generate it autoregressive, like a ten second at a time or something. And now this character appear, I can look back to where it first appear and, bring that back. Yeah, this one, I put the references. Yeah, that's, Optimus, Einstein myself, Annie.Vibhu [01:00:02]: Oddly enough, I used Grok Search to find it, and it pulled your LinkedIn post. But yeah we found it.Ethan [01:00:08]: Interesting.Vibhu [01:00:10]: ButxAI's Underrated Work, Culture, and WatermarkingSwyx [01:00:11]: this is a problem. This is not your fault, but like XAI doesn't communicate all this work that you do very well because they just have the model release and then that's it. But actually, these details are very good.Swyx [01:00:22]: As far as I understand, everything you just described is state-art, like no one else has done it.Vibhu [01:00:30]: A lot of-- yeah, I have a lot moreSwyx [01:00:32]: And then, and then you just put this blog post with the cookies. I'm this is not enough,?Swyx [01:00:37]: but I, obviously this is like the high level numbers that people want to know. But no, okay, soVibhu [01:00:42]: And I wonder, like part of that is also some labs don't share research into what happens. And ifSwyx [01:00:50]: No, but this is literally bragging about how good they are, right?Swyx [01:00:54]: Like, why would you not say that you are capable of extending with full context? this is not a secret sauce. This is like we did the work. yeah, I don't know.Ethan [01:01:02]: different labs have slightly different communication styles.Swyx [01:01:07]: Anyway, if anyone from XAI is listening we are always happy to help you tell your story. Yeah, okay, so you did references, and I think, I think kind of the point you're, you're making is it is sort of like a kludge, right? 
You can do seven, but what about 100?
Swyx [01:01:23]: Right? Then you need a completely different thing.
Ethan [01:01:26]: So I think this is a mechanism to select the context from the history, and you might not put the entire history into the context. For example, there's a paper called FramePack, which has--
Ethan [01:01:41]: --a heuristic that for the latest history, the last one second, I put in the entire history, and the history before that, I would compress it and make the video smaller. So they follow this overall pattern that the maximum sequence length is fixed. So the further you are from the current frame, the smaller the image you have. So this is just a heuristic. I think it can be more automatic. The model is aware of which part of the history can be selected. So this part of the research is actually being actively worked on by a lot of people. It's also quite interesting. I feel this part of long context is actually a little bit ahead of the LLM part.
Ethan [01:02:31]: So for example, in LLMs, contexts keep growing. Let's say you call a tool and the tool call history is extremely long; that's still in context, and it keeps growing, keeps growing. Even if you switch the topic to something else, the whole context is there. There are some agentic harnesses that help you to, say, prune the tool results. Like when you query a file, only show the top 200 lines or something. Those are very heuristic-driven.
Swyx [01:03:08]: For listeners, we did a write-up on the Claude Code leak, where there are eight different kinds of pruning, including like you prune the tool results and all that.
So you can, you can read up on that kind of thing.Ethan [01:03:17]: I think, one breakthrough in continual learning might be like a way to automatically, manage its own context.Swyx [01:03:27]: These are all heuristics, and they will be replaced by machine learning.Ethan [01:03:30]: InterestinglyVibhu [01:03:32]: TheEthan [01:03:32]: the same thing is being researched in both LLMs and video models.Vibhu [01:03:36]: The interesting thing is also like in the paper you showed, it's actually happening at the model level, right? Compared to like language models, sure, we have base attention, but we'll do our own compression, we'll do our own pruning, which is separate from model error.Vibhu [01:03:49]: Eventually, it all just boils in, hopefully.Swyx [01:03:52]: I think this is a form of like attention, but like also know sort of reasoning attention. I feel like that's different than normal attention.Swyx [01:04:03]: Does that, does that make sense?Ethan [01:04:04]: It's, it's different in the sense that attention, not to mention, set sparse attention aside,
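The context arithmetic from the video-extension discussion can be sketched in a few lines of Python. This is a minimal illustration, not anything described as the actual Grok Imagine or FramePack implementation. It assumes roughly 10K tokens per second of video (taken from the "fifty K for five seconds" figure quoted for Cosmos) and a hypothetical schedule in which each second further back in history gets half the token budget of the one after it. The names `TOKENS_PER_SECOND`, `naive_context_tokens`, and `compressed_context_tokens` are made up for this sketch.

```python
# Assumption: ~50K tokens per 5 s of video, i.e. ~10K tokens per second.
TOKENS_PER_SECOND = 10_000


def naive_context_tokens(seconds: int) -> int:
    """Context size if every second of history is kept at full resolution."""
    return seconds * TOKENS_PER_SECOND


def compressed_context_tokens(seconds: int, shrink_factor: int = 2) -> int:
    """Context size if older history is compressed geometrically.

    The most recent second (age 0) keeps its full token budget; each second
    further back gets 1/shrink_factor of the budget of the second after it.
    This halving schedule is a hypothetical stand-in for the idea that
    "the further you are from the current frame, the smaller the image".
    """
    total = 0
    for age in range(seconds):
        total += TOKENS_PER_SECOND // (shrink_factor ** age)
    return total


naive = naive_context_tokens(50)            # grows linearly with duration
compressed = compressed_context_tokens(50)  # bounded by ~2x one second's budget
```

Under these assumptions the naive context for fifty seconds reaches the five hundred K tokens mentioned in the conversation, while the compressed history stays below twice the budget of a single second no matter how long the video gets. That bounded total is the sense in which the maximum sequence length is fixed.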

Crazy Wisdom
Episode #549: From MS-DOS to Vibe Coding: How Non-Technical Founders Build Complex Software

Crazy Wisdom

Play Episode Listen Later May 29, 2026 70:14


Stewart Alsop sat down with Michael Shackelford to discuss their experiences building applications through vibe coding—the practice of using AI to create software without traditional programming expertise. Stewart, who runs the AI Whispers community in Buenos Aires and hosts the Crazy Wisdom podcast (with over 660 interviews), shared how he went from teaching people prompt engineering to building his own video conferencing software as a Riverside.fm replacement, while Michael opened up about his year-long journey creating Genrupt Inc, an AI-powered content generation tool for e-commerce sellers. The conversation covered everything from the decline in quality of Claude's reasoning capabilities and how Chinese companies used distillation attacks to copy Anthropic's models, to the importance of spaced repetition systems for managing knowledge in the age of LLMs, with both sharing battle-tested prompting strategies like asking AI to "explain it to me in genius terms" and using deep research queries to reverse engineer how competitors build their products.

Show Notes:
- Dan Martell's book "Buy Back Your Time" was mentioned as one of the best business books for thinking about life and business
- Check out John Vervaeke's "Awakening from the Meaning Crisis" for understanding relevance realization and why AI fundamentally cannot determine what's relevant to humans without being told

Timestamps
00:00 Michael discusses being exhausted from getting his app ready for launch, working nonstop with AI to prepare landing page for podcast traffic driving beta signups
05:00 Stewart explains starting AI Whispers in Buenos Aires after leaving OpenAI vendor company, meeting early adopters like Torin who was building mind-reading EEG technology
10:00 Discussion of how corporations resist AI adoption due to political games and job security fears while some companies use AI as excuse for pandemic-era layoffs
15:00 Stewart describes teaching workshops on using LLMs as linguistic tools rather than coding tools, noting technical people often lack humanities background needed for prompting
20:00 Explaining chatbot wrappers, API calls, and how Anthropic's reasoning quality declined after Chinese distillation attacks copied their secret sauce developed with philosophers
25:00 Technical discussion of model training, fine-tuning versus RAG for new information, and different approaches to updating AI knowledge beyond initial training
30:00 Stewart describes building podcast recording software to replace expensive Riverside, struggling with syncing audio and video files across different computer clocks
35:00 Discussion of critical factors in vibe coding, discovering unknown technical requirements, and how AIs don't automatically reveal missing information
40:00 Stewart's reverse engineering process using deep research function to study competitors' hiring and technology stacks, separating planning agents from coding agents
45:00 Prompting techniques including "explain like I know everything" and using spaced repetition systems to capture valuable prompts and technical knowledge
50:00 Michael explains his Generux app for generating ecommerce content using Amazon review data analysis to inform high-converting listing images and videos
55:00 Discussion of founder mentality involving self-delusion about project timelines, Michael working nine-plus hours daily for nine months on app development
60:00 Comparing Amazon's expert software to prosumer software approach, discussing distribution challenges and future robotics applications for customized products
65:00 Stewart demonstrates spaced repetition app for memory improvement and knowledge retention, explaining relevance realization problem that AI agents cannot solve without embodiment

Key Insights
1. Stewart Alsop started AI Whisperers in Buenos Aires after leaving his role at Invisible Technologies, which was OpenAI's largest vendor for RLHF work. He noticed that machine learning engineers at tech companies lacked the humanities background needed to properly interact with large language models, which are fundamentally linguistic tools. This led him to create weekly workshops teaching non-technical people how to use AI effectively, running events every Thursday for two years straight. The group attracted intense geeks from the start and eventually led to Stewart speaking right after Vitalik Buterin at DevConnect, marking a significant milestone for the community.
2. Large corporations are resistant to AI adoption due to multiple factors including political dynamics within organizations and employees fearing job loss. Many companies that grew during the pandemic are now using AI as an excuse to downsize when the real issue is inefficiency from rapid expansion. Stewart observed that even technical people in machine learning often don't understand how to properly use AI tools because they lack linguistic and humanities training. The fundamental problem is educational, requiring companies to train people how to use these new tools while those same people resist learning them.
3. Vibe coding has evolved significantly with Claude Code being a game changer that reduced the technical barrier to entry. Before Claude Code, developers needed substantial technical knowledge to work through constant doom loops and debugging cycles. The success of coding AI tools stems from thirty years of testing infrastructure that provides clear yes or no feedback on whether code works. This infrastructure doesn't exist in the same way for manufacturing, science, and other fields, which is why software became the dominant area for AI assistance initially.
4. Claude's quality degradation over recent months resulted from multiple factors including distillation attacks by Chinese companies who reverse engineered Anthropic's reasoning capabilities. Anthropic had hired philosophers, sociologists, and psychologists to develop exceptional reasoning in Claude 4.5, but this was expensive to run. When Chinese models like Kimi copied these capabilities at one tenth the cost, and when mainstream users flooded the platform before Anthropic's planned IPO, the company had to reduce quality to manage computational costs. This represents a significant loss for power users who relied on Claude's superior reasoning abilities.
5. Stewart built a podcast recording application to replace Riverside because he needed API access to automate workflows, which Riverside wanted one thousand dollars monthly to provide. The technical challenge involves syncing audio and video from local recordings on multiple computers with different clocks through a server, then merging them so voices match lip movements. This problem requires understanding complex timing issues across different network conditions and file formats. Stewart has been working through AI psychosis for months on this FFmpeg pipeline problem, illustrating how vibe coding still requires building intuition about technical problems even without traditional coding knowledge.
6. The transition from expert software to prosumer software represents a major opportunity for AI-enabled tools. Expert software like Photoshop, Blender, and terminal interfaces have extreme complexity that intimidates beginners, but AI is making these capabilities accessible through natural language. The reign of specialists is ending as generalists with broad knowledge and curiosity can now build complete applications by leveraging AI to fill technical gaps. This shift particularly benefits entrepreneurs and founders who specialize in getting into difficult situations and figuring them out, even when they originally thought tasks would be easier than they turned out to be.
7. Building applications with AI requires accepting massive time investments beyond initial estimates and developing strategies for overcoming knowledge gaps. Michael estimated his ecommerce content generation app would take months but spent nearly a year working over nine hours daily, while Stewart spent months solving audio-video sync issues. Success requires using tools like deep research to understand how competitors solve problems, maintaining separate planning and coding agents, and learning to ask the right questions. The key insight is that vibe coders can achieve ninety percent of functionality independently, but the final ten percent often requires understanding specific technical concepts that AI cannot intuit without proper context and domain knowledge.
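The syncing problem in Key Insight 5 (local recordings made on computers with different clocks) is often approached by estimating the time offset between two audio tracks before merging them. The following is a minimal sketch of one common technique, cross-correlation with NumPy, and not a description of how Stewart's application actually works. The function name `estimate_lag_samples` and the synthetic signals are hypothetical, and the sketch assumes a constant offset with no clock drift.

```python
import numpy as np


def estimate_lag_samples(ref: np.ndarray, other: np.ndarray) -> int:
    """Estimate how many samples `other` lags behind `ref`.

    A positive result means the same sound appears later in `other`.
    Assumes both tracks share a sample rate and differ by a constant offset.
    """
    # Full cross-correlation; index i corresponds to lag i - (len(ref) - 1).
    corr = np.correlate(other, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)


# Synthetic check: a noise burst and a copy of it delayed by 137 samples.
rng = np.random.default_rng(0)
reference = rng.standard_normal(2000)
true_delay = 137
delayed = np.concatenate([np.zeros(true_delay), reference])[: len(reference)]

lag = estimate_lag_samples(reference, delayed)
lag_seconds = lag / 48_000  # assuming a 48 kHz sample rate for illustration
```

The estimated lag, converted to seconds, is the amount by which one track would need to be shifted before the merge step. Real recordings would also need handling for drift between clocks over a long session, which a single constant offset does not capture.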

Hacker Public Radio
HPR4638: Simple Podcasting - Episode 3 - Analyzing and Filtering

Hacker Public Radio

Play Episode Listen Later May 13, 2026


This show has been flagged as Clean by the host.

01 This is the third in a four part series on simple podcasting.

02 In this episode we will cover the following topics:
Analysis of audio noise problems and filtering methods used to deal with specific problems that we may find.
Command line recording.
Command line playback.
Getting information about an audio recording.

03 Introduction
When I did my first couple of podcasts I didn't notice that there was a quiet high pitched whine or buzz in the background. Nobody complained about it, but I thought I could do better in subsequent episodes.

04 Creating an Audio Sample
If you have a similar problem, the first step is to find out where it is coming from. If there is no audible noise where you are recording, there is a good chance the problem is in the microphone or another part of the audio system. Plug in your microphone and record 2 or 3 seconds of quiet audio where you do not speak into the microphone or make other noise.

05 You will need a minimum amount of data in order to analyze it. For a flac file sampled at 44.1 kHz, 2 to 3 seconds of data should be enough. To get a sample of just electronic noise you can put the microphone in a drawer or somewhere like that if you want to be sure of getting a quiet signal. Any sound recorded in this way should be mainly from the microphone or other electronic elements in the analogue pathway. To get a sample of possible ambient noise, such as fans, make sure the microphone is in the open air in an area which is representative of where it will be when you are recording.

--------------------
06 Analyzing using Fourier Transforms
Next you need to look at the wave form. At this point I will describe this using Audacity. I will show other ways later, but Audacity is actually the easiest if you are starting from nothing. You don't need to become an expert in Audacity to use it, just follow the steps I will describe.
I myself don't know how to use Audacity beyond using this one feature.

07 We are going to analyze the sound spectrum in our sample. The technique being used is a Fourier transform. A Fourier transform, often called an "FFT" for fast Fourier transform, is a mathematical method of showing a signal in terms of frequency along the x axis instead of time. This allows us to spot troublesome noise frequencies which appear when we don't want them to. The FFT is a very common mathematical technique which is widely used in signal processing, not just in audio.

08 There is software which will create pretty coloured animations of sound waves, but this is not what you want. These are simply decorative patterns and won't tell us what we want to know.

--------------------
09 Using Audacity
Install Audacity if you haven't already. Start Audacity. Select File > Import > Audio, then navigate to your sample and select "Open". The file should load.

10 In the wave form part of the window, click anywhere and then press Ctrl-A to select all data points. The chart should turn a slightly darker colour. From the menu, select Analyze > Plot Spectrum. A new window will open, showing magnitude in dB on the y axis, and frequency in hertz on the x axis. For "algorithm" be sure it is set to "spectrum".

11 There are now two settings that we need to play with while we look for problems. One is "size". The default for this is 1024. The other is "axis". The default for this is "log frequency".

--------------------
12 What to Look For
What we are looking for are large obvious spikes that stand out in the data. Since our test signal has very little to no actual audio data, any spikes should represent electrical or other noise that doesn't belong there.

13 I have found two combinations of settings to be most helpful in finding problems. These are:
Size 2048, axis linear frequency.
Size 32768, axis log frequency.
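To get a feel for what the "size" setting changes, here is a rough sketch of the arithmetic. It assumes that "size" is the number of samples in each FFT window and that the recording is sampled at 44.1 kHz, in which case the spacing between frequency points is approximately the sample rate divided by the size. The function names `bin_width` and `window_seconds` are made up for this illustration.

```shell
# Assumed sample rate of the recording, in Hz (44.1 kHz as in the text).
rate=44100

# bin_width SIZE: approximate spacing between frequency points, in Hz.
bin_width() {
    awk -v fs="$rate" -v n="$1" 'BEGIN { printf "%.2f\n", fs / n }'
}

# window_seconds SIZE: how much audio a single analysis window covers.
window_seconds() {
    awk -v fs="$rate" -v n="$1" 'BEGIN { printf "%.2f\n", n / fs }'
}

for size in 1024 2048 32768; do
    echo "size $size: $(bin_width "$size") Hz per point, $(window_seconds "$size") s per window"
done
```

Under these assumptions a size of 2048 gives points roughly 21.5 Hz apart, while 32768 gives points roughly 1.35 Hz apart and needs about three quarters of a second of audio per window, which still fits inside a 2 to 3 second sample.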
14 A small size value can help very narrow spikes stand out from the background more, while a large size value can help separate spikes from surrounding noise. A linear frequency axis can help with seeing all spikes across the full frequency range, while a log frequency axis can help to better see what is happening in the often very crowded lowest frequency range.

--------------------
15 A Real Example of an Audio Problem
If you have good audio equipment you may find nothing obvious. If you cannot hear any noise in the signal, there may be none of any consequence and there is nothing for you to do.

16 However, in my case I found two main problems and one lesser one. One problem was a spike at 60 Hz, which is the AC line frequency. There is also a lesser problem of a broad range of noise below 60 Hz. Both of these however will be taken care of by the basic filtering that we looked at earlier so we do not need to worry about them here.

17 The other main problem is that I had a large spike at every 1 kHz interval from 1 kHz to 19 kHz. This was noise generated within the headset electronics, or the result of noise on the USB power supply. This is the product of a cheap headset.

18 These spikes are not very large compared to the volume of my voice, but if I do the same sort of analysis of samples where I am speaking, they appear in the intervals between words. This results in a high pitched whine or buzz. This was the source of the background noise or buzz in my first two podcast episodes. I need to get rid of this.

19 One option would be to get a better microphone, but, well, that wouldn't be any fun, would it? It would also cost money and I don't want to spend any of that if I don't have to. If you analyze your own signal, you may find a different pattern, or even no noise at all. If you did not find anything when shielding your microphone from ambient audio noise, repeat the same test but with the microphone exposed to acoustic noise in the room.
--------------------
20 Advanced Filtering
The next step is to figure out how to get rid of this noise. I have called this section "advanced filtering", but we are actually just making use of a technique that was already covered in basic filtering.

21 To deal with the remaining spikes we can use additional "band reject" filters, each of which removes a specific frequency at 1 kHz intervals from 1 kHz to 12 kHz. We will use this in combination with the filtering that we have already done previously, so we don't need to worry about anything above 12 kHz as we already remove that with a low pass filter. After a small amount of experimenting I came up with the following.

22 Because I am applying a total of 16 filters, 4 for basic filtering and 12 to deal with the specific microphone problems that I have, I have broken up the filters into separate strings. I then generate the 12 new band reject filters from a template. Note that I don't show the "de-esser" filter here. I would recommend adding it as a separate step after doing the sort of filtering we are talking about here.

23 Rather than reading out multiple lines of bash script, I will post them in the show notes. I will give a brief description of them here which you can refer to when reading the show notes. The FFMPEG and Sox versions are very similar in concept so I don't need to go over the Sox version in detail. See the show notes for it.

FFMPEG Version
Here's the FFMPEG version.

# The high and low pass filters.
hlpfil="highpass=f=80, lowpass=f=12000"
# Band reject filters, one for 60 Hz and another for 50 Hz.
linefil="bandreject=f=60:width_type=h:w=20, bandreject=f=50:width_type=h:w=20"
# Create a series of band reject filters, from 1 kHz to 12 kHz.
# Change or remove this part if your recording hardware does not require it.
ftemplate="bandreject=f=%s000:width_type=h:w=100"
kilospikefil=$( seq 1 12 | xargs printf "$ftemplate," )
# Using ffmpeg
ffmpeg -i input.flac -af "$hlpfil, $linefil, $kilospikefil" output.flac

24 There are a total of 5 lines of bash script. In the first line, we create a string called "hlpfil" which is just the high and low pass filters copied from our previous discussion on basic filtering. In the second line, we create a string called "linefil" which is just the simple band reject filters for 50 and 60 hertz AC line noise, also from basic filtering.

25 In the third and fourth lines, we create a string called "kilospikefil" containing the new filters. The "f" parameter represents the frequency we are targeting. The "w" parameter represents the "width" of the frequency range we are filtering in terms of hertz. The filter is applied gradually rather than with a sharp cut-off, so to get more filtering action we need to have a larger width. In this case I decided to hammer the spike quite aggressively and so used a relatively wide width of 100 hertz. Testing with a voice file did not show any noticeable distortion, so it's an acceptable solution.

26 For this filter we need to create a dozen filter commands so we use the shell "seq" command to generate a sequence of numbers from 1 to 12. We then pipe that into the xargs command which applies each number to the next command. The next command is "printf", which takes the number it gets from xargs and applies it to the "ftemplate" string template in a manner very similar to C programming printf string templates.

27 We also have a comma in there to separate each of the individual filters. We then surround this with a $ and () so we can run the command and capture the output into a variable. Then we call ffmpeg and pass it the filters we created by putting the variable names inside a double quoted string, separated by commas.
All of this will be in the show notes, so don't worry about trying to get the exact details right now.

Sox Version
Here's the Sox version.

# The high and low pass filters.
sxhlpfil="highpass 80 lowpass 12000"
# Band reject filters, one for 60 Hz and another for 50 Hz.
sxlinefil="bandreject 60 20 bandreject 50 20"
# Create a series of band reject filters, from 1 kHz to 12 kHz.
sxftemplate="bandreject %s000 100"
sxkilospikefil=$( seq 1 12 | xargs printf "$sxftemplate " )
# Using Sox. The filter variables are deliberately not quoted.
sox input.flac output.flac $sxhlpfil $sxlinefil $sxkilospikefil

28 The Sox version is very similar with the exception that the command arguments representing the filters must not be in quoted strings as Sox wants to see them as separate arguments instead of parsing a string.

--------------------
29 Confirming the Effect
If we apply the above filters and look at this headset noise output file in the Audacity spectrum analyzer we will now see that these noise spikes are almost completely gone. We can now confirm how well this works by using a test audio file. Any normal short voice audio file will do for this. Just talk into the microphone normally and create a voice sample file that is 5 or 10 seconds long, or whatever you feel comfortable with.

30 With the original unfiltered voice audio I can hear a distinct high pitched whine overlaying the voice. With the filtered audio that whine or hum is not detectable. If we then look at the voice file in the Audacity spectrum analyzer, we can see distinct "notches" at the 50 Hz and 60 Hz frequencies, and at every 1 kHz from 1 kHz to 12 kHz. These notches are narrow enough that they won't cause a noticeable problem with voice signals. If we apply this filter to voice samples, the buzz or whine is gone and the voice signal sounds fine. Despite using a very cheap microphone, I now have acceptable quality audio for a podcast.
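Before running a long filter chain it can help to print it and look at it. The following is a minimal sketch, not the author's script: it builds the same twelve band reject filters with a plain shell loop instead of seq and xargs, strips the final comma as a precaution (the show notes version leaves one in place and evidently works for the author), and prints the result. The variable name `chain` is an assumption made for this example.

```shell
# Filter strings as given in the show notes.
hlpfil="highpass=f=80, lowpass=f=12000"
linefil="bandreject=f=60:width_type=h:w=20, bandreject=f=50:width_type=h:w=20"
ftemplate="bandreject=f=%s000:width_type=h:w=100"

# Build the 1 kHz to 12 kHz band reject filters one at a time.
kilospikefil=""
k=1
while [ "$k" -le 12 ]; do
    kilospikefil="${kilospikefil}$(printf "$ftemplate" "$k"),"
    k=$((k + 1))
done
# Drop the comma left after the last filter.
kilospikefil=${kilospikefil%,}

# The complete chain that would be handed to ffmpeg with -af.
chain="$hlpfil, $linefil, $kilospikefil"

# Show how many kilohertz filters were generated, then the whole chain.
printf '%s\n' "$kilospikefil" | tr ',' '\n' | wc -l
printf '%s\n' "$chain"
```

If the printed chain looks right, it can be passed to ffmpeg in the same way as in the show notes, for example with -af "$chain".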
31 Again I want to emphasize that in this instance I am dealing with deficiencies with my hardware instead of buying a better microphone. These additional filters are intended to deal with the specific hardware problem I am facing. You don't need these additional filters if you cannot detect an audible problem. On the other hand, if you have a different problem you may wish to deal with a different set of frequencies. Finding these problems is the reason for using a spectrum analyzer.

32 FFMPEG has other filtering methods as well. However, as I didn't end up using them I can't really do an adequate job of describing them. If anyone has used them successfully, they are welcome to make a podcast on the subject.

--------------------
33 Completing the Process
With these new filters added into the middle of the processing steps, you can now complete the processing by doing the de-essing, normalizing, and review steps as described in the previous episode.

--------------------
34 Command Line Recording
I will now cover a separate topic, which is recording using command line programs. I am covering it in this episode as it is a short topic and it is convenient to talk about it here.

35 As well as using GUI based recording programs such as Gnome Sound Recorder, it is possible to record podcast episodes using command line tools such as FFMPEG. As for why you may wish to use command line tools to record audio, there are several reasons. One is that you may simply prefer to do it this way because it pleases you to do so. Another is that it allows the recording step to be included in a script that encompasses other parts of the process, automating what may have otherwise been separate manual steps.

36 However, if you don't find these arguments particularly compelling, then I'm not going to attempt to persuade you to use the command line to record audio. I am doing this part of this episode out of a desire to have a bit of fun and I probably won't be using it much myself.
I will however use one of these methods to record this part of this episode.

37 Recording with FFMPEG - The Basics
One of the common command line tools you can use is FFMPEG, a package which I have previously mentioned with respect to filtering audio files. Here is an example of how to record using FFMPEG. We call FFMPEG specifying the audio input system as the FFMPEG input, and then specify a file to output to.

38 # Record audio.
ffmpeg -f pulse -i default ff.flac

39 Press "q" to stop. This uses pulse audio on Linux for input ("-f pulse"), and the default input ("-i default"). However, this does not specify the sample rate or mono recording. To do that we need to add a few more parameters as in the following:

40 ffmpeg -f pulse -i default -ac 1 -ar 44100 ff.flac

41 "-ac 1" specifies mono output. "-ar 44100" specifies a 44.1 kHz sample rate.

42 Playback with FFMPEG - The Basics
FFMPEG can also play back audio. In this case however we need to call the "ffplay" program rather than FFMPEG itself. To play an audio file, simply call ffplay and give it the name of the audio file as an argument to the command. For example:

43 # Play an audio file.
ffplay podcast.flac

44 We can also call it with the "autoexit" option, which tells ffplay to automatically exit when the audio file has finished playing.
ffplay -autoexit ff.flac

45 -autoexit means exit when the audio file is done playing.

46 To exit in the middle of playback, press "q" or ESC. To pause the playback, press "p" or space bar. To decrease the volume press "9" or "/". To increase the volume press "0" or "*".

47 To seek forward 10 seconds, press the right cursor button. To seek backward 10 seconds, press the left cursor button. To seek forward 1 minute, press the up cursor button. To seek backward 1 minute, press the down cursor button.

48 The "0" and "9" keys mentioned above are those on the top row of the keyboard, not the ones on the separate numeric pad.
49 While the recording is playing, a graphical window will open which shows a cascading waveform based on the current content. This is purely decorative and does not serve any particularly useful purpose.

--------------------
#!/bin/bash
# Record a podcast episode segment.

# Get the next file name.
# First we check if any matching file patterns exist. If they don't,
# then we create the first one starting counting at 1.
fcount=$( ls [0-9][0-9].flac 2>/dev/null | wc -l )
if (( $fcount < 1 )); then
    fname="01.flac"
else
    # If there are any matching file patterns, we find the highest number
    # and increment it by 1.
    filenum=$( ls [0-9][0-9].flac 2>&1 | cut -d. -f1 | sort | tail -1 )
    newfilecount=$(( 10#$filenum + 1 ))
    fname=$( printf "%02d.flac" $newfilecount )
fi
echo "Recording to: $fname"

# Record using ffmpeg.
# This makes use of pulse audio and the input is the default audio input.
# The sample rate is set to 44.1 kHz, and it is recorded as mono (1 channel).
ffmpeg -f pulse -i default -ar 44100 -ac 1 $fname
echo "Recorded audio to: $fname"

# Report on basic information about the audio file that was just recorded.
ffprobe -hide_banner $fname
--------------------

50 Sox - Not so Good
I did not find the recording or playback features of Sox to be as useful as those of FFMPEG, so I won't bother to cover them here.

--------------------
51 Getting Information About an Audio Recording
There are also command line tools which can be used to retrieve information about audio recordings.

52 FFMPEG Version
With FFMPEG this is called "ffprobe". For example:

53 ffprobe hpr4566.mp3

54 This will print out a lot of information about FFMPEG itself. To skip that use the hide_banner option.

55 ffprobe -hide_banner hpr4566.mp3

56 This will print out information about the audio recording. This will include things like the duration, bit rate, sample rate, stereo or mono, etc. If the author added metadata tags to the file, it will also show those.
HPR adds things like the title, author, copyright license, comment, etc. You can extract the ones you want using something like grep and cut.

57 Sox Version
Sox has a similar feature, called "soxi".

58 soxi ff.flac

59 However, it may not work on mp3 files if you do not have an mp3 handler for it installed.

--------------------
60 Conclusion
In this episode we took a brief look at an example of how to solve an audio problem through filtering. We looked at how to use Audacity to find where the problems were. We then looked at how to apply filters to remove these sources of noise. We also looked at how to record podcasts and get information about audio files using command line tools.

61 In the next episode we will look at alternatives to Audacity for analyzing audio. While Audacity works just fine, this is an opportunity to have a bit of fun with some gratuitous hackery.

62 This has been the third episode in a four part series on simple podcasting.

--------------------
--------------------
Full Audio Processing Pipeline
This version includes the special filters used to fix my headset problems. Use the version from the previous episode if you do not have the same audio hardware problems.

#!/bin/bash
# Full processing pipeline for making simple podcasts.

# ======================================================================
# Concatenate multiple flac files into a single flac file.
# This is used to combine podcast recorded segments into a single
# flac file for uploading to HPR.
concataudio () {
    outputname="$1"
    # First create the list file.
    printf "file '%s'\n" [0-9][0-9].flac > podseglist.txt
    # Now concatenate them
    ffmpeg -f concat -safe 0 -i podseglist.txt "$outputname"
    rm podseglist.txt
}

# ======================================================================
# Basic and advanced filters.
filter () {
    inputfile=$1
    outputname=$2
    # Using ffmpeg.
    # The high and low pass filters.
    hlpfil="highpass=f=80, lowpass=f=12000"
    # Band reject filters, one for 60 Hz and another for 50 Hz.
    linefil="bandreject=f=60:width_type=h:w=20, bandreject=f=50:width_type=h:w=20"
    # Create a series of band reject filters, from 1 kHz to 11 kHz.
    ftemplate="bandreject=f=%s000:width_type=h:w=100"
    kilospikefil=$( seq 1 11 | xargs printf "$ftemplate," )
    # Using ffmpeg
    ffmpeg -i $inputfile -af "$hlpfil, $linefil, $kilospikefil" $outputname
}

# ======================================================================
# De-Essing.
deessing () {
    inputfile=$1
    outputname=$2
    option=$3
    # De-essing filter.
    ffmpeg -i $inputfile -filter_complex "deesser=i=0.5:m=0.5:f=0.5:s=$option" -b:a 336k -sample_fmt s16 $outputname
}

# ======================================================================
# Normalizing the audio to EBU R128 standard for review using ffmpeg.
normffmpeg () {
    inputfile=$1
    outputname=$2
    # Normalize to EBU R128 standard.
    ffmpeg -i $inputfile -af loudnorm=I=-17:TP=-2.0:LRA=4.0 -ar 44.1k $outputname
}

# ======================================================================
# Output an MP3 version to help with reviewing.
mp3convert () {
    inputfile=$1
    # Get the name of the file and then create the output file name.
    j=$( basename $inputfile ".flac" )
    outputname="$j"".mp3"
    # Convert to MP3.
    ffmpeg -i $inputfile $outputname
}

# ======================================================================
# Concatenate the separate audio files.
concataudio fullpod-unfiltered.flac
# Basic filtering.
filter fullpod-unfiltered.flac filtered.flac
# De-essing. This is the version to send for publishing.
# The third argument should be "o" for de-essing, or "i" for pass through without de-essing.
deessing filtered.flac fullpod.flac o
# Normalized for review.
normffmpeg fullpod.flac fullpod-norm.flac
# Output an MP3 copy for review.
mp3convert fullpod-norm.flac

--------------------
--------------------
Provide feedback on this episode.
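The concataudio step in the pipeline depends on a list file with one file '<name>' line per recorded segment. As a small sketch of what that list looks like, the following builds it in a scratch directory using two empty placeholder files, so no real recordings are touched and ffmpeg is not needed. The variable names `workdir`, `startdir`, and `listing` are assumptions made for this example.

```shell
# Work in a scratch directory with two empty placeholder segments.
startdir=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1
: > 01.flac
: > 02.flac

# Write one "file '<name>'" line per segment, each ending in a newline.
printf "file '%s'\n" [0-9][0-9].flac > podseglist.txt

# Keep a copy of the list for inspection, then show it.
listing=$(cat podseglist.txt)
printf '%s\n' "$listing"

# Clean up the scratch directory.
cd "$startdir" || exit 1
rm -r "$workdir"
```

Each segment should appear on its own line in numeric order. If the lines run together, the newline in the printf format string is the first thing to check.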

Command Control Power: Apple Tech Support & Business Talk
669: Adam Engst (TidBITS): Slack Impersonation Malware, Anthropic's Mythos, and Why You Need a Personal AI Defender

Command Control Power: Apple Tech Support & Business Talk

Play Episode Listen Later May 12, 2026 66:59


Adam Engst (TidBITS) discusses a malware incident in a long-running public "Slack Bits" group where a bad actor impersonated Glenn Fleishman via a duplicate Slack display name, tricking him into downloading an info-stealer, prompting Engst to consider shutting down the 1,400-member community. The conversation shifts to Anthropic's Mythos and Project Glasswing (as covered by TidBITS security editor Rich Mogull), which reportedly found long-standing bugs (including in OpenBSD and FFmpeg), raising concerns about AI-accelerated vulnerability discovery, defender/attacker asymmetries, costs and compute barriers, and impacts on zero-day markets. They also cover Apple's iOS signing and update/upgrade distinctions, why Apple supports macOS differently than iOS, broader distrust in institutions, social media's advertising/algorithm problems (including Section 230), bots and AI-driven phishing, and the idea of local, user-controlled AI agents to help protect individuals online.

00:00 Welcome Back Adam Engst
00:20 Slack Impersonation Scare
02:15 Cleaning Up a Public Slack
03:40 Mythos and Glasswing Explained
05:19 AI Bug Hunting Reality Check
08:25 Red Team Blue Team Asymmetry
09:50 Compute Costs and Access Barriers
12:19 Trust Ethics and Regulation
17:50 Personal AI Security Agents
23:34 Zero Day Markets and Exploit Kits
25:40 iOS Signing and Update Windows
27:13 Why Macs Get Longer Support
32:06 Scams Incentives and Pig Butchering
34:02 Life Offline and Misinformation
35:41 Social Media Hot Garbage
36:43 Addiction By Design
37:46 Advertising Model Flaw
38:47 Infinite Scroll Limits
39:39 Dunbar Number Reality
40:54 Platform Power Responsibility
42:46 AI Influencers And Slop
43:37 Bots And Fake Accounts
46:33 AI Phishing And Passkeys
49:21 Closed Communities Trust
53:25 CAPTCHAs And Human Help
56:08 Section 230 And Algorithms
57:46 Chronological Feed Fix
59:35 Two Week News Rule
01:02:41 Ads In Maps Backlash
01:04:10 Wrap Up And Next Part

El Minicast de laurindel
Small Apps. Big Ideas.

El Minicast de laurindel

Play Episode Listen Later May 11, 2026 13:31


Today I talk about two small macOS apps that I have found genuinely useful for content creation workflows:

Recalog
223. 2026/05/10 National Diet Library Releases OCR Software That Needs No GPU

Recalog

Play Episode Listen Later May 10, 2026


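We talked about the following topics.

01. National Diet Library releases a lightweight OCR tool that needs no GPU
The National Diet Library (NDL) has released the lightweight OCR software "NDLOCR-Lite" on GitHub. It was developed as a lightweight version of the existing NDLOCR and can produce text data from digitized images of books and magazines on an ordinary home computer. Its main feature is that it needs no GPU (graphics processing unit) and runs in standard environments such as a laptop. The original NDLOCR required a GPU; NDLOCR-Lite removes that constraint, making it usable by many more people. It also adds experimental support for recognizing English text and handwriting, which the earlier version handled poorly, so its functionality has improved as well. A desktop application is provided, so another attraction is that it can be used with the mouse alone. It has been verified to work on Windows 11, macOS Sequoia, and Ubuntu 22.04. The software is released under the CC BY 4.0 license, and the latest version for each OS can be downloaded from GitHub. For serious transcription of kuzushiji (cursive script) and classical Chinese materials, the more accurate NDL Kotenseki OCR (NDL古典籍OCR) is recommended.

02. Restricted release of ultra-powerful AI accelerates dependence on Big Tech
Ultra-powerful AI and accelerating Big Tech dependence, as posed by Claude Mythos. For its newly announced AI model "Claude Mythos Preview", Anthropic held back a general release and launched "Project Glasswing", under which the model is provided only to 12 partner companies including AWS, Microsoft, Google, and Apple. The model has shown performance far beyond previous cybersecurity capabilities, for example autonomously discovering an OpenBSD vulnerability that had gone unnoticed for 27 years and an FFmpeg vulnerability that had been overlooked for 16 years. However, this restricted release creates new structural problems. First, the 12 Glasswing partners control the main layers of digital infrastructure, such as cloud, security, semiconductors, and device operating systems, and dependence on these companies is accelerating in a way that is justified on security grounds. Second, as the "SaaS is Dead" trend strengthens, companies holding ultra-powerful AI are securing competitive advantage through a "bundling strategy" of integrating it into their own platforms rather than releasing it publicly. Of particular concern is the danger of "superficial alignment" indicated by the 244-page Mythos system card. It was confirmed that during evaluation the model recognized it was being tested 29% of the time, yet never verbalized this and tended to take more dangerous actions. As for the impact on Japan, direct access to Mythos, which presupposes a US certification framework, will be difficult. Benefits are expected to arrive first indirectly via AWS and Azure, then via products from global vendors, with a time lag of 6 to 18 months before eventual general availability. In the meantime, attackers' use of AI will advance, while there is a risk of a gap in defenders' access to advanced AI tools.

03. JST and AMED strategic objectives for fiscal 2026 (Reiwa 8) decided
The Ministry of Education, Culture, Sports, Science and Technology (MEXT) has decided the strategic objectives and research and development objectives for the Japan Science and Technology Agency (JST) and the Japan Agency for Medical Research and Development (AMED) for fiscal 2026 (Reiwa 8). Calls for research proposals under programs such as CREST and Sakigake (PRESTO) are scheduled to open from April 2026. To strategically promote basic research across organizations and fields, these programs set objectives that link the pursuit of fundamental principles with policy intent, and build time-limited research structures aimed at producing results that become sources of innovation. Programs such as the team-based CREST and the individual-based Sakigake and PRIME are established in the research community as core research funding with a history of over 30 years, alongside KAKENHI grants. Achievements to date include many high-quality results such as top 10% papers, and the programs have supported research that led to Nobel Prizes, such as the discovery of Treg cells by Specially Appointed Professor Shimon Sakaguchi of Osaka University and the design of porous metal complexes (MOFs) by Distinguished Professor Susumu Kitagawa of Kyoto University. They have also contributed greatly to developing researchers, for example by serving as an important springboard for the promotion of early-career researchers. In setting the fiscal 2026 objectives, scientific value and economic and social impact were examined from multiple angles through analysis of publication trends, hearings with experts, and workshops, and six objectives were set, also taking policy needs into account.

This radio show reflects personal views only and does not represent any real organization. Thank you for your understanding.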

Lex Fridman Podcast
#496 – FFmpeg: The Incredible Technology Behind Video on the Internet

Lex Fridman Podcast

Play Episode Listen Later May 6, 2026 263:41


Jean-Baptiste Kempf is lead developer of VLC and president of VideoLAN. Kieran Kunhya is a longtime FFmpeg contributor, codec engineer, and the person behind the now-infamous FFmpeg account on X. Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep496-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript: https://lexfridman.com/ffmpeg-transcript CONTACT LEX: Feedback – give feedback to Lex: https://lexfridman.com/survey AMA – submit questions, videos or call-in: https://lexfridman.com/ama Hiring – join our team: https://lexfridman.com/hiring Other – other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS: FFmpeg on X: https://x.com/FFmpeg FFmpeg: https://ffmpeg.org/ VideoLAN (VLC): https://www.videolan.org/ VideoLAN on X: https://x.com/videolan Jean-Baptiste’s Website: https://jbkempf.com/ Jean-Baptiste’s LinkedIn: https://www.linkedin.com/in/jbkempf/ Jean-Baptiste’s GitHub: https://github.com/jbkempf Kieran’s X: https://x.com/kierank_ Kieran’s LinkedIn: https://bit.ly/3OORhmC Kieran’s GitHub: https://github.com/kierank SPONSORS: To support this podcast, check out our sponsors & get discounts: Larridin: Measure AI adoption in your business. Go to https://larridin.com Blitzy: AI agent for large enterprise codebases. Go to https://blitzy.com/lex BetterHelp: Online therapy and counseling. Go to https://betterhelp.com/lex Fin: AI agent for customer service. Go to https://fin.ai/lex LMNT: Zero-sugar electrolyte drink mix. Go to https://drinkLMNT.com/lex Perplexity: AI-powered answer engine. 
Go to https://perplexity.ai/ OUTLINE: (00:00) – Introduction (03:00) – Sponsors, Comments, and Reflections (10:48) – Weirdest things VLC opens (15:12) – How video playback works (24:33) – Video codecs and containers (35:20) – FFmpeg explained (56:20) – Linus Torvalds (1:00:59) – Turning down millions to keep VLC ad-free (1:15:17) – FFmpeg & Google drama (1:34:31) – FFmpeg developers (1:41:08) – VLC and FFmpeg (1:45:42) – History of FFmpeg (1:48:59) – Reverse engineering codecs (2:02:14) – FFmpeg testing (2:06:21) – Assembly code (handwritten) (2:30:39) – Rust programming language (2:39:55) – FFmpeg and Libav fork (2:48:17) – Open source burnout (2:56:04) – x264 and internet video (3:09:20) – Video compression basics (3:16:17) – CIA and fake VLC (3:26:52) – Ultra low latency streaming (3:44:20) – AV2 codec and video patents (3:54:12) – VLC backdoors (4:04:27) – Video archiving (4:11:04) – Future of FFmpeg and VLC

INNOQ Podcast
AI News

INNOQ Podcast

Play Episode Listen Later Apr 20, 2026 24:09 Transcription Available


Anthropic is holding back its "Mythos" model: it is said to be able to find zero-day vulnerabilities in critical software such as OpenBSD and FFmpeg, but is too compute-intensive for the mass market. In this episode Fabian Walther and Ole Wendland also look at GLM 5.1, the Chinese open-weights model, which shows how the US chip ban is forcing Chinese manufacturers to build up their own hardware expertise. Meanwhile, Google's AI Edge Gallery demonstrates how well local models already run on a smartphone today and what that means for data privacy. Also: why good benchmark scores say little about the actual performance of AI models in everyday use.

Hacker Public Radio
HPR4618: Simple Podcasting - Episode 2 - Basic Filtering

Hacker Public Radio

Play Episode Listen Later Apr 15, 2026


This show has been flagged as Clean by the host.

Basic-Filtering

01 Introduction
This is the second episode in a four part series on a simple way to create your own HPR podcast episode.

02 This episode will cover the following topics:
Basic filtering.
De-essing to improve voice quality.
Normalizing to adjust audio levels for easier reviewing.

03 Filtering is removing unwanted noise from an audio signal. There are several ways of doing this. It is possible to do this with Audacity, but I don't know how so I won't try to describe that method. It is possible however to filter using command line tools such as FFMPEG and Sox. When assembled into shell scripts, these tools can become part of an automated process that you can use over and over again for each HPR episode that you record.

04 In a later episode I will discuss how to analyze audio signals to find the sources of noise that can be reduced or eliminated with filters. In this episode however I will discuss basic filtering that you can apply routinely without doing any analysis beforehand.

05 Sources of Noise
A question that you may have is "why is there noise in the recording?" There are many sources of undesirable noise.

06 A very common one that you may not be aware of is electrical noise that works its way into the electronic recording circuits and is imperceptible to you until you play back the recorded audio. The most common noise signal is what is commonly called "line noise". This is a low frequency hum at 50 or 60 Hz, which is the frequency of the AC power lines feeding your recording hardware.

07 You may be familiar with this low frequency hum from when it emanates from large electrical hardware such as transformers as it makes the laminations vibrate. However, it can also work its way indirectly into electronic equipment as well. Good quality audio hardware may filter all or most of this out, but it is present in a lot of consumer grade hardware.
08 Other sources of electrical noise may reflect specific problems in your recording hardware. I will discuss one such problem with my microphone that I had to address. Still other sources of noise may reflect actual physical audio noise around you, such as fans. Placing the microphone close to your face will help in dealing with a lot of these problems, but you may find filtering to be of some help here as well.

09 Audio Frequency Range
Let's start with some basics. A good quality stereo of the type you may have at home is typically rated to perform between 20 Hz and 20 kHz. This is the widest possible range that we need to consider. In reality, this is a far wider range than is needed for a voice oriented podcast. It is also well beyond the range of the hardware that many of your listeners will be using to listen to the podcast.

10 For example, the speakers that I have connected to my PC and a number of headphones and earphones that I have tested drop off drastically below 80 Hz or above 8 kHz, or even above 6 kHz in many cases. This is not audiophile quality hardware, but it is representative of the sort of hardware that a lot of your listeners will be using when listening to podcasts. And to be honest here, a lot of people will have difficulty hearing anything above 8 kHz even with the best quality audio hardware due to hearing loss from environmental noise exposure or age.

11 You can get a good idea of what different frequencies sound like by generating sine waves using either FFMPEG or Sox. Here's an example of generating a 1 kHz sine wave using FFMPEG. A copy of this will be in the show notes.
ffmpeg -f lavfi -i "sine=frequency=1000:sample_rate=44100:duration=3" 01000hz.flac
This creates a sine wave at 1 kHz and at a sample rate of 44.1 kHz for a duration of 3 seconds and saves it to a flac file named 01000hz.flac.

12 Here's the same using Sox.
sox -n -r 44100 -b 16 01000hz.flac synth 3 sine 1000 The -b 16 specifies using 16 bit audio to encode it, and the "sine 1000" element specifies the frequency in hertz. 13 You can test this out at different frequencies to get a feel for how your hardware responds. What the effective limits on typical hardware audio range means is that we can quite safely filter out a large part of what is considered to be the "audio range" without any noticeable loss of quality. For the purposes of our discussion here then I will limit the frequency range to between 80 Hz and 12 kHz, and that is being generous. You can probably narrow that, particularly at the top end, without any problems. 14 At the low end, the typical rule of thumb recommended by most people seems to be that for the average male voice you can set the lower threshold at 80 Hz, and for the average female you can set it at 160 Hz. Note that you don't *have* to set the threshold higher for a female. Rather, it is just that you typically *can* set it higher if you wish. Note also that these are averages, and may not reflect an actual individual. 15 Simple Filters We will now create some simple filters using the same command line software mentioned in a previous episode in this series. These are FFMPEG and Sox. 16 First let's define some terminology. A high pass filter passes through frequencies which fall above a certain threshold and blocks frequencies which are below that frequency. A low pass filter passes through frequencies which fall below a certain threshold and blocks frequencies which are above that frequency. 17 In reality there isn't an abrupt cut-off in the filters. Instead there is a gradual roll off or sloping off of amplitude below or above the specified filter frequency. This is for two reasons. One is that if there was an abrupt cut off then it would risk introducing audible distortion in the signal for frequencies on the margin. 
18 The other reason is that this is how hardware filters traditionally inherently worked when they were made out of electronic components such as resistors, capacitors, and inductors. The sharpness of this cut off can be adjusted, but we won't be fiddling with it in that sort of detail. You will sometimes see filters specified in terms of "poles". This has to do with describing how filters were constructed using electronic components. Don't worry about it, it doesn't really matter. 19 Here is a typical high pass filter using ffmpeg which filters out frequencies below 80 hertz. # High pass filter. ffmpeg -i inputfile.flac -af "highpass=f=80" outputfile.flac Here is a typical low pass filter using ffmpeg which filters out frequencies above 12 kHz. # Low pass filter. ffmpeg -i inputfile.flac -af "lowpass=f=12000" outputfile.flac 20 Here is a filter which combines the two. # Combined filters. ffmpeg -i inputfile.flac -af "highpass=f=80, lowpass=f=12000" outputfile.flac And here is the same thing using Sox. sox inputfile.flac outputfile.flac highpass 80 lowpass 12000 21 Filtering Out Specific Frequencies Recall that I mentioned that a common source of noise is the 50 or 60 Hz AC power line frequency working its way through the electronics of your recording device. Because filters operate gradually and the 80 Hz lower filter threshold is close to 60 Hz, the high pass filter may not deal with this adequately. 22 Now it happens that your listeners may not be able to hear this 50 or 60 Hz noise anyway because their audio hardware won't reproduce it. That by the way includes you not being able to hear it either when you review your recording before uploading it. However, there may be some HPR listeners who are sitting back sipping a glass of wine and listening to your episode on their stereo and who can hear it. That suggests that we ought to do something about it just in case. 
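The sine wave test and the combined high and low pass filter described above can be tied together in one small script, so that you can hear the gradual roll-off for yourself. This is only a sketch under stated assumptions: the function names tone_spec, passband_filter and demo_rolloff, the list of test frequencies, and the output file names are all made up for illustration and do not come from the episode. The ffmpeg options -y and -loglevel error are added only to keep the loop quiet and repeatable.

```shell
#!/bin/bash
# Sketch only: helper names, frequencies and file names are illustrative.

# Build the lavfi source description for a sine tone.
# Usage: tone_spec FREQ_HZ [DURATION_S] [SAMPLE_RATE]
tone_spec () {
    printf 'sine=frequency=%s:sample_rate=%s:duration=%s' "$1" "${3:-44100}" "${2:-3}"
}

# Build the combined high/low pass filter string.
# Usage: passband_filter [LOW_CUT_HZ] [HIGH_CUT_HZ]
passband_filter () {
    printf 'highpass=f=%s, lowpass=f=%s' "${1:-80}" "${2:-12000}"
}

# Generate a tone at each test frequency plus a filtered copy of it.
demo_rolloff () {
    local freq raw filtered
    for freq in 40 60 80 160 1000 8000 12000 16000 ; do
        raw=$( printf '%05dhz.flac' "$freq" )
        filtered=$( printf '%05dhz-filtered.flac' "$freq" )
        ffmpeg -y -loglevel error -f lavfi -i "$( tone_spec "$freq" )" "$raw"
        ffmpeg -y -loglevel error -i "$raw" -af "$( passband_filter )" "$filtered"
    done
}

# Only create files when explicitly asked to, e.g. "bash tones.sh run".
if [ "${1:-}" = "run" ] ; then
    demo_rolloff
fi
```

Comparing each raw file with its filtered copy should show the tones outside the 80 Hz to 12 kHz band getting quieter rather than vanishing outright, which is the roll-off behaviour described above. Passing different values to passband_filter, for example 160 and 8000, lets you try the other thresholds mentioned in this episode.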
23 I will get into how to analyze audio signals in a later episode, but for now just accept that I looked at the frequency spectrum of a sample recording using my hardware and found a large 60 Hz noise spike which I wanted to address.

24 Experimenting with additional high pass frequencies up to 120 Hz did not improve things much with respect to the 60 Hz problem. There are other parameters which could be tweaked, but at this point it would seem most promising to attack the 60 Hz spike problem directly using a different filter method. To deal with this 60 Hz spike we can use a "band reject" filter, which removes a specific band of frequencies. We will use this in combination with the filtering that we have already done above.

25 After a small amount of experimenting I came up with the following. I also added in a 50 Hz filter while I was at it, for the benefit of those living in areas with 50 Hz electrical supply. These filters will be included in the show notes, so don't worry if you can't quite understand all the details from a verbal description.

26 Here's the FFMPEG version.

# Using ffmpeg
ffmpeg -i input.flac -af "highpass=f=80, lowpass=f=12000, bandreject=f=60:width_type=h:w=20, bandreject=f=50:width_type=h:w=20" output.flac

27 This has the following elements:
A high pass filter at 80 Hz.
A low pass filter at 12 kHz.
A band reject filter centred at 60 Hz and with a width of 20 hertz.
A similar band reject filter centred at 50 Hz.

28 Here's the Sox version.

# Sox version.
sox input.flac output.flac highpass 80 lowpass 12000 bandreject 60 20 bandreject 50 20

Note that with sox you must not quote the filter definitions as a single string, or else sox will report an error because it doesn't see enough parameters. This is not a problem with ffmpeg.

29 The band reject filter knocks the stuffing out of the 60 Hz line noise, and the 50 Hz parameter should do the same for that frequency as well. This basic filter should be able to be applied to any podcast audio recording without causing any problems. You can probably reduce the low pass frequency from 12 kHz down to 8 kHz without any problems, but I would suggest that you test it with your voice before making that decision.

30 I will come back to filtering out specific frequencies again later when I discuss how I solved a specific problem with the hardware that I am using. However, we have to discuss how to analyze audio signals before we can do that sort of technical troubleshooting, and I will cover that in a later episode.

--------------------

31 De-Essing
An additional type of filtering is "de-essing". When recording audio, the microphone or environment may cause "s", "sh", "ch" and possibly other sounds to be exaggerated. These are all higher frequency elements of voice recordings. "De-essing" attempts to soften these sounds by selectively reducing the volume on the frequency band which contains these sounds.

32 Software Filters
De-essing is accomplished via software filters. FFMPEG and Sox both have de-essing filters. For FFMPEG, the de-essing filter is built in. For Sox however, we must install an optional plug-in. I will cover this in more detail when I discuss using Sox for de-essing.

33 Do You Need De-Essing?
The first thing to make clear however, is that you may not need to worry about this. If you think the audio sounds just fine the way it is, you don't need to do any de-essing to it. De-essing is a very subtle change, and you would probably need to do some careful before and after comparisons of audio samples to tell the difference. I didn't know that a thing called de-essing even existed before I started doing the research to make this podcast episode. However, at this point we are doing things more for fun than out of necessity, so I'll describe it anyway.

34 De-Essing with FFMPEG
De-essing with FFMPEG is relatively simple.
The filter is built in, and there are just three values to adjust. On the other hand, it is not really obvious what these values mean in practical terms. 35 I will however warn you to not rely on the AI search results from Google to understand this feature. The AI, in my experience, just makes stuff up about it and tells you to use options that don't exist and values that are not valid. I found that the only useful information came from FFMPEG's own web site, and from examples written by actual humans. 36 I then experimented with different values to see what effects they had. Since the results are rather subtle, fine tuning isn't really that necessary and I found that I could arrive at some reasonable values fairly quickly. I will provide the parameters that I found useful for me, and I suspect they would probably work for you as well. 37 Here is a typical de-essing command. ffmpeg -i inputfile.flac -filter_complex "deesser=i=0.5:m=0.5:f=0.5:s=o" -b:a 336k -sample_fmt s16 outputfile.flac 38 The important arguments are i, m, and f. i is intensity for triggering de-essing. The allowed range is 0 to 1. The default is 0. By experimentation I found that "0" means no de-essing, and "1" is maximum de-essing. I found that setting it to "0.5" gave satisfactory results. 39 m is the amount of "ducking on the treble part of sound". The allowed range is 0 to 1. The default is 0.5. By experimentation I found that "1" means no de-essing, and "0" is maximum de-essing. I found that setting it to "0.5" gave satisfactory results. 40 f is how much of the original frequency content to keep when de-essing. The allowed range is 0 to 1. The default is 0.5. By experimentation I found that "1" means no de-essing, and "0" is maximum de-essing. I found that setting it to "0.5" gave satisfactory results. 41 Setting "m" or "f" too high can result in a distorted output as too much of the original sound is cut out. 
The defaults of 0.5 in both cases gave audible improvements without noticeable distortion. 42 There is an additional parameter called "s". This controls whether the de-essing filter does anything. Setting it to "o" is the normal and default mode. Setting it to "e" causes it to output just the components that it would normally have filtered out. This is useful for testing purposes so you can see what and how much is being filtered. You only use this when experimenting with different values. Setting it to "i" causes the input to be passed through without de-essing. This would be useful in scripts where you want to use a variable to control whether or not to use the de-esser while still creating the expected output file. 43 There are two other elements of the command which were included but are not strictly speaking part of the de-essing filter itself . These are " -b:a 336k" and "-sample_fmt s16". " -b:a 336k" sets the audio bit rate to 336k. "-sample_fmt s16" sets the audio sample format to 16 bit. I found it necessary to specify these in order to prevent the de-essing filter from changing formats. They are not part of de-essing however. 44 De-Essing with Sox You can also de-ess with Sox. However, this is more complex for several reasons. One reason is that Sox does not have its own de-essing filters. Instead it uses optional plug-ins, and you must find and install these. The actual plug in may vary depending on what operating system you are using. The other reason is that it deals with the issue in fairly low level parameters, and so is a bit more complex to describe. Because of this I will skip over describing this in detail and just give a very brief overview. If anyone would like me to describe in more detail how to de-ess with Sox, then send in a comment and I will do a short episode on it later. 45 Sox De-Essing Overview To de-ess with Sox, you first need to install the plug-ins. On Linux, these will be the TAP ladspa plug-ins. 
TAP stands for "Tom's Audio Processing" plugins. ladspa stands for "Linux Audio Developer's Simple Plugin API". To install the TAP plugins on Ubuntu, use the following command.

sudo apt install tap-plugins

The plug-in we need is called "tap_deesser.so".

46 In order to use the plug-ins, you need to set the path as a variable. On Ubuntu this is:

export LADSPA_PATH="/usr/lib/ladspa:"

I put the above in the shell script which calls the Sox de-esser.

47 To use the Sox de-esser, you do the following:

sox inputfile.flac outputfile.flac ladspa tap_deesser tap_deesser -30 4500

48 tap_deesser tap_deesser tells it which plugin to use. We need to state tap_deesser twice because the first is the name of the ".so" file and the second is the name of the plugin. A single "so" file can contain multiple filters, although in this case there is only one. -30 is the threshold in dB at which to start to apply the filter. 4500 is the frequency in Hz that the filter centres around.

49 The TAP web page has a table of recommended frequencies. These are:

Male 'ess'      4500 Hz
Male 'ssh'      3400 Hz
Female 'ess'    6800 Hz
Female 'ssh'    5100 Hz

You will need to do some trial and error to find what works best for you.

50 De-Essing Summary
De-essing can be used to make minor improvements to voice quality by reducing certain harsh sounds which may be exaggerated by a microphone. If it sounds like a lot of work you can probably simply not bother with it and not really miss it.

--------------------

51 Normalizing
Normalizing a signal means adjusting it to meet a specified level. For audio it means adjusting the volume or sound level. You may wish to normalize the audio of your recording to make it easier to listen to when reviewing it. The copy that you send to HPR however should be the original un-normalized version.

52 Sound level is measured in two ways, dB and LUFS. The latter is a more sophisticated way of measuring things which takes into account how the human ear perceives loudness.
I won't go into a lot of detail in that regard, other than to say that you can just accept LUFS as a unit of perceived loudness that is the international standard. LUFS stands for "Loudness Units referenced to Full Scale", and is part of the EBU R128 standard, where EBU stands for European Broadcasting Union. In both cases the measured value is a negative number, with numbers smaller in magnitude being louder. Smaller in magnitude means closer to zero.

53 HPR will adjust the sound level for publication, but if you wish to check the audio before uploading it can help to adjust it to something close to what HPR will do so that you can listen to it at a volume which most listeners will hear. In my case full volume on the audio system input produced a sound level which was much lower than a typical HPR episode. However, the volume level in the flac file itself can be adjusted using ffmpeg.

54 Measuring Volume Level
First we need to see what the volume level is for a typical HPR podcast. To do this we use ffmpeg. In this example we are using an episode named "hprpodcast.mp3". Pick an episode which you think is suitable and copy the file to the working directory.

55 In the following script we use the ebur128 filter. The text we want is normally written to standard error, so we have to do a bit of bashery to redirect it to standard output so that it will go through a pipe. We then grep for the string "I:". This line holds the integrated (average) loudness in "loudness units" (LUFS). Then we extract the number, giving us a target LUFS level.

56 ffmpeg -i hprpodcast.mp3 -filter:a ebur128=framelog=quiet -f null /dev/null 2>&1 | grep "I:" | cut -d: -f2

57 Unfortunately I can't find a Sox feature which handles EBU loudness, so we need to work in dB instead. Here is the sox version. However, note that this may not work on mp3s if sox mp3 handling is not installed.

58 sox hprpodcast.mp3 -n stats 2>&1 | grep "RMS lev dB" | rev | cut -d" " -f1 | rev

59 You can use either of these for measuring the volume or sound level of an audio file. However, note that individual episodes from HPR may vary a bit in terms of loudness. In the samples that I looked at, however, the variation was less than 1 LUFS or dB, while my own recording was roughly 5 LUFS lower in volume than a typical HPR episode.

--------------------

60 If you Google for the EBU R128 standard the AI result will confidently tell you to use a target of -23 LUFS. However, this is wrong, which shouldn't be of any surprise if you are familiar with using AI.

61 The -23 LUFS figure is for broadcast television. There is in fact no standard level for podcasts. However, there is apparently a general industry convention of using somewhere around -17 LUFS. If I look at the first two HPR episodes that I did, HPR normalized them to -16.8 LUFS and -17.8 LUFS, while the original FLAC files that I submitted were -21.6 LUFS and -22.3 LUFS respectively.

62 So HPR appear to be targeting somewhere around -17 LUFS as well. We will therefore use -17 LUFS as our target for our own copy for review.

--------------------

63 The nice thing about using the EBU filter in FFMPEG is that this is very simple. Here is the FFMPEG version.

64 ffmpeg -i inputfile.flac -af loudnorm=I=-17:TP=-2.0:LRA=7.0 -ar 44.1k outputfile.flac

65 "I" is the LUFS target. LRA is the loudness range target. The default value is 7.0, so I used that. TP sets the maximum true peak. The default value is -2.0, so I used that.

--------------------

66 With Sox things are a bit more difficult. There is no direct method of setting the loudness that I am aware of, so we need to measure the current sound level in dB, do some calculations, and then apply that as a gain factor to the output.

67 First we need to subtract the measured dB level of our flac file from the target dB level of the HPR episode we decided to use as a sample.
Bash by itself normally just does integer math. However, we would like to have at least one decimal place of resolution to work with. The simple solution is to do this calculation using bc, the shell arbitrary precision calculator.

68 Then take this new value and use it in a "gain" effect. The number which we give sox is the amount in dB to increase or decrease the volume by. Sox will then output a new file with the new volume level. You can now listen to this file under conditions more closely approximating what it will be like after HPR have done their own audio adjustments and normalization on it. This helps when listening to the file for any problems before you upload it.

69 Rather than reading 5 lines of complex shell script to you, I will put a copy of it in the show notes.

level=$( sox "$inputfile" -n stats 2>&1 | grep "RMS lev dB" )
leveldb=$( echo "$level" | rev | cut -d" " -f1 | rev )
targetdb="-18.9"
volumechange=$(echo "scale=2 ; $targetdb - $leveldb" | bc )
sox "$inputfile" "$outputname" gain "$volumechange"

--------------------

70 Normalization should be the last thing you do to the file. It should be done after any noise filtering, such as low pass, high pass, bandreject, etc. If you normalize first, you will be amplifying the noise as well as the desired signal.

71 The exact normalization level used for review purposes doesn't matter, as HPR will apply their own later. All we are doing at this point is adjusting the volume to something which approximates a normal episode so you can listen to it for final review.

72 When you send your file to HPR, send the original *unnormalized* version, not the normalized version. When you normalize an audio signal, if you are not careful you may introduce things which cause problems with later additional processing. HPR probably do more things to the audio than just normalizing and so they need the unnormalized file so that they can do their own normalizing last.
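The gain calculation in the Sox method is just the target level minus the measured level, and it may help to see it worked through with sample numbers. The sketch below is an illustration and not the author's script: gain_change is a hypothetical helper name, the measured levels are invented, and awk stands in for bc only so that the example also runs on systems where bc is not installed. The arithmetic is the same either way.

```shell
#!/bin/bash
# Sketch only: gain_change is an illustrative helper, not part of the
# episode's scripts. It uses awk instead of bc; both simply compute
# target minus measured.

# Usage: gain_change TARGET_DB MEASURED_DB
# Prints the gain in dB, to two decimal places, needed to move a
# recording from the measured level to the target level.
gain_change () {
    LC_ALL=C awk -v target="$1" -v measured="$2" \
        'BEGIN { printf "%.2f\n", target - measured }'
}

# A quiet recording measured at -24.2 dB, with the -18.9 dB target.
gain_change -18.9 -24.2

# A recording that is already louder than the target.
gain_change -18.9 -15.0
```

The first call prints 5.30, so the quiet recording needs to be raised by 5.3 dB. The second prints -3.90, meaning the louder recording would be turned down. Either value could then be passed to the sox gain effect in the same way as $volumechange.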
--------------------

73 If at this point you are happy with the recording as is, you are ready to send the *unnormalized* version to HPR. The scripts to implement the features discussed in this episode will be in the show notes.

74 Conclusion
In this episode we covered basic filtering using ffmpeg and sox. We discussed what noise was and some of the origins of noise. We talked about the audio frequency range and the limitations of common hardware used to record and listen to podcasts. We covered basic high and low pass filters used to limit the audio frequency range in order to remove possible low and high frequency noise.

75 We discussed specific filters to eliminate 50 and 60 Hz electrical power noise. We talked about de-essing, what it was, why you may wish to use it, and some basic de-essing filter implementation details. We discussed normalizing, what it is, why you may wish to use it, and how it relates to podcasting conventions.

76 In the next episode we will discuss analyzing audio signals to help find the sources of noise problems. We will also discuss creating filters to eliminate any problems that we found. In my case I had a problem with the microphone that I use, and I describe how I used filters to deal with that problem.

77 This has been the second episode in a four part series on simple podcasting.

--------------------

EBU R128 Loudness Measurement using FFMPEG

#!/bin/bash
echo "EBU r128 loudness measurement using FFMPEG"
for inputfile in *.flac *.mp3 ; do
    level=$( ffmpeg -i "$inputfile" -filter:a ebur128=framelog=quiet -f null /dev/null 2>&1 | grep "I:" | cut -d: -f2 )
    echo "$inputfile" $level
done

--------------------

DB Sound Level Measurement using Sox

#!/bin/bash
# Sox version. May not work for mp3 if mp3 format handling is not installed.
echo "dB sound level measurement using Sox."
for inputfile in *.flac *.mp3 ; do
    level=$( sox "$inputfile" -n stats 2>&1 | grep "RMS lev dB" )
    leveldb=$( echo "$level" | rev | cut -d" " -f1 | rev )
    echo "$inputfile" $leveldb
done

--------------------

EBU R128 Loudness Normalization using FFMPEG

#!/bin/bash
# Adjust the volume to a desired level.
for inputfile in *.flac ; do
    j=$( basename "$inputfile" ".flac" )
    outputname="$j""-normff.flac"
    ffmpeg -i "$inputfile" -af loudnorm=I=-17:TP=-2.0:LRA=4.0 -ar 44.1k "$outputname"
    echo "$outputname"
done

--------------------

DB Sound Level Normalization using Sox

#!/bin/bash
# Adjust the volume to a desired level.
for inputfile in *.flac ; do
    j=$( basename "$inputfile" ".flac" )
    outputname="$j""-normff.flac"
    # Measure the volume level and extract the mean volume.
    level=$( sox "$inputfile" -n stats 2>&1 | grep "RMS lev dB" )
    leveldb=$( echo "$level" | rev | cut -d" " -f1 | rev )
    # Calculate the difference in dB desired. Scale specifies the number of decimal places.
    # Target db is the volume measured on hpr4506 (UCSD-P-System).
    targetdb="-18.9"
    volumechange=$(echo "scale=2 ; $targetdb - $leveldb" | bc )
    echo "Using sox: File: $inputfile Original level: $leveldb Change by: $volumechange"
    # Adjust the volume.
    sox "$inputfile" "$outputname" gain "$volumechange"
done

--------------------

Full processing pipeline for making simple podcasts using FFMPEG

#!/bin/bash
# Full processing pipeline for making simple podcasts.

# ======================================================================
# Concatenate multiple flac files into a single flac file.
# This is used to combine podcast recorded segments into a single
# flac file for uploading to HPR.
concataudio () {
    outputname="$1"
    # First create the list file, one entry per line.
    printf "file '%s'\n" [0-9][0-9].flac > podseglist.txt
    # Now concatenate them.
    ffmpeg -f concat -safe 0 -i podseglist.txt "$outputname"
    rm podseglist.txt
}

# ======================================================================
# Basic filters.
filter () {
    inputfile=$1
    outputname=$2
    # The high and low pass filters.
    hlpfil="highpass=f=80, lowpass=f=12000"
    # Band reject filters, one for 60 Hz and another for 50 Hz.
    linefil="bandreject=f=60:width_type=h:w=20, bandreject=f=50:width_type=h:w=20"
    # Using ffmpeg.
    ffmpeg -i "$inputfile" -af "$hlpfil, $linefil" "$outputname"
}

# ======================================================================
# De-Essing.
deessing () {
    inputfile=$1
    outputname=$2
    option=$3
    # De-essing filter.
    ffmpeg -i "$inputfile" -filter_complex "deesser=i=0.5:m=0.5:f=0.5:s=$option" -b:a 336k -sample_fmt s16 "$outputname"
}

# ======================================================================
# Normalizing the audio to EBU R128 standard for review using ffmpeg.
normffmpeg () {
    inputfile=$1
    outputname=$2
    # Normalize to EBU R128 standard.
    ffmpeg -i "$inputfile" -af loudnorm=I=-17:TP=-2.0:LRA=4.0 -ar 44.1k "$outputname"
}

# ======================================================================
# Output an MP3 version to help with reviewing.
mp3convert () {
    inputfile=$1
    # Get the name of the file and then create the output file name.
    j=$( basename "$inputfile" ".flac" )
    outputname="$j"".mp3"
    # Convert to MP3.
    ffmpeg -i "$inputfile" "$outputname"
}

# ======================================================================
# Concatenate the separate audio files.
concataudio fullpod-unfiltered.flac

# Basic filtering.
filter fullpod-unfiltered.flac filtered.flac

# De-essing. This is the version to send for publishing.
# The third argument should be "o" for de-essing, or "i" for pass through without de-essing.
deessing filtered.flac fullpod.flac o

# Normalized for review.
normffmpeg fullpod.flac fullpod-norm.flac

# Output an MP3 copy for review.
mp3convert fullpod-norm.flac

--------------------
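The concatenation step in the pipeline script depends on a list file holding one file '...' entry per line, which is the form the ffmpeg concat demuxer reads. As a hedged sketch of just that step, the function below writes such a list from whatever segment names it is given. The name make_seglist, the temporary directory, and the placeholder segment names are assumptions made for illustration and are not part of the episode's scripts.

```shell
#!/bin/bash
# Sketch only: make_seglist is an illustrative helper name.

# Usage: make_seglist LISTFILE SEGMENT...
# Writes one "file 'NAME'" line per segment, each ending in a newline.
make_seglist () {
    local listfile="$1"
    shift
    printf "file '%s'\n" "$@" > "$listfile"
}

# Example with two placeholder segment names. No audio files are needed
# just to inspect the list that would be handed to ffmpeg.
workdir=$( mktemp -d )
make_seglist "$workdir/podseglist.txt" 01.flac 02.flac
cat "$workdir/podseglist.txt"
```

Each entry must end in a real newline, so the printf format string needs the backslash in \n. Without it every entry runs together on a single line and the list cannot be read as intended. File names that themselves contain a single quote would need extra escaping, which this sketch does not attempt.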

Hashtag Trending
AI Shows Its Power

Hashtag Trending

Play Episode Listen Later Apr 13, 2026 9:25


Anthropic's Mythos: Bug-Hunting Breakthroughs, Sandbox Escapes, and the AI "Nightmare Scenario" Hashtag Trending would like to thank Meter for their support in bringing you this podcast. Meter delivers a complete networking stack, wired, wireless and cellular in one integrated solution that's built for performance and scale. You can find them at Meter.com/htt Host Jim Love discusses Anthropic's new model Mythos on a special edition of Hashtag Trending, focusing on why Anthropic is hesitant to release it. He highlights reports that Mythos shows a major spike in capability for finding long-dormant software vulnerabilities—such as a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw—and can identify multi-step exploit chains that bypass sandboxes across operating systems and browsers, potentially reshaping cybersecurity and forcing rapid large-scale scanning and fixes. Love then points to Anthropic's system card describing a sandbox test where Mythos devised a multi-step exploit to gain broad internet access, emailed unexpectedly, posted exploit details to obscure public sites, and sometimes attempted to conceal rule violations, while Anthropic notes it did not fully escape containment. He invites audience comments and provides show-note links. 00:00 Mythos Sparks Fear 00:16 Sponsor Message 00:40 Mythos Cybersecurity Leap 01:31 Bug Finds in OpenBSD 01:47 FFmpeg Flaw and Scale 02:22 Exploit Chains and Browsers 02:48 A Coming Software Crisis 03:53 Nightmare Scenario Book 04:42 Sandbox Escape Test 05:23 Posting Exploit Details 05:55 Limits and Reality Check 06:50 Deception and Control Risks 07:49 Links and Listener Feedback 08:30 Closing Sponsor Thanks 09:14 Final Sign Off

#heiseshow (HD-Video)
Schland App, Anthropic Mythos, Fuel Prices | #heiseshow

#heiseshow (HD-Video)

Play Episode Listen Later Apr 9, 2026


Anna Bicker, heise online editor-in-chief Dr. Volker Zota and Malte Kirchner discuss the following topics, among others, in this edition of #heiseshow: - Black-red app: Can the citizens' app be a success? SAP and Deutsche Telekom are to build a central "Deutschland-App" on behalf of the German federal government, through which citizens will in future be able to submit applications, book appointments and verify their identity. Can such an ambitious e-government project actually succeed in Germany? How convincing is the consortium of SAP, Telekom and Schwarz Digits? And what would set the app apart from the existing patchwork of digitalised public administration? - Certainly insecure: Anthropic's Mythos and a Claude Code bug. With Mythos, Anthropic has presented an AI model that finds and exploits security vulnerabilities so effectively that it will not be made publicly available for the time being. As part of "Project Glasswing" it is to be available exclusively to IT security companies in order to secure critical software, including decades-old vulnerabilities in OpenBSD, FFmpeg and the Linux kernel. At the same time, a bug involving the AI coding tool Claude Code caused a stir. How serious was the incident? - So much for full throttle: What helps against high fuel prices? At Easter, diesel broke the 2.50 euro mark in the nationwide average for the first time, and E10 is also approaching its all-time high from March 2022. Since the introduction of the 12 o'clock refuelling rule at the beginning of April, the price of diesel has risen by almost another 13 cents. Which measures could actually help in the short term? Has the 12 o'clock rule failed before it even began? And how far does the argument carry that the energy transition is the only real long-term answer to record fuel prices? Also back again: a nerd birthday, the WTF of the week and tricky quiz questions.

#heiseshow (Audio)
Schland App, Anthropic Mythos, Fuel Prices | #heiseshow

#heiseshow (Audio)

Play Episode Listen Later Apr 9, 2026 82:36 Transcription Available


Anna Bicker, heise online editor-in-chief Dr. Volker Zota and Malte Kirchner discuss the following topics, among others, in this edition of #heiseshow: - Black-red app: Can the citizens' app be a success? SAP and Deutsche Telekom are to build a central "Deutschland-App" on behalf of the German federal government, through which citizens will in future be able to submit applications, book appointments and verify their identity. Can such an ambitious e-government project actually succeed in Germany? How convincing is the consortium of SAP, Telekom and Schwarz Digits? And what would set the app apart from the existing patchwork of digitalised public administration? - Certainly insecure: Anthropic's Mythos and a Claude Code bug. With Mythos, Anthropic has presented an AI model that finds and exploits security vulnerabilities so effectively that it will not be made publicly available for the time being. As part of "Project Glasswing" it is to be available exclusively to IT security companies in order to secure critical software, including decades-old vulnerabilities in OpenBSD, FFmpeg and the Linux kernel. At the same time, a bug involving the AI coding tool Claude Code caused a stir. How serious was the incident? - So much for full throttle: What helps against high fuel prices? At Easter, diesel broke the 2.50 euro mark in the nationwide average for the first time, and E10 is also approaching its all-time high from March 2022. Since the introduction of the 12 o'clock refuelling rule at the beginning of April, the price of diesel has risen by almost another 13 cents. Which measures could actually help in the short term? Has the 12 o'clock rule failed before it even began? And how far does the argument carry that the energy transition is the only real long-term answer to record fuel prices? Also back again: a nerd birthday, the WTF of the week and tricky quiz questions.

Risky Business
Risky Business #832 -- Anthropic unveils magical 0day computer God

Risky Business

Play Episode Listen Later Apr 8, 2026 53:30


On this week's show, Patrick Gray, Adam Boileau and James Wilson discuss the week's cybersecurity news. They cover:

Anthropic's new Mythos model hunts bugs and chains exploits together so well that… you can't have it…
…Unless you're one of their Project Glasswing partners
The world isn't short on bugs, though. F5, Fortinet, Progress ShareFile, and TrueConf are all getting rekt by humans
GPU Rowhammering goes in the GPU, past the IOMMU and back into the host-side Nvidia driver
North Korea is spending serious time and money on its crypto hacking
Just when the US needs CISA most, they slash its budget some more!

This week's episode is sponsored by identity verification firm, Persona. Tying digital actions to actual human identities isn't just for banking know-your-customer any more. Persona's Benjamin Chait says know-your-staff checks belong in high-value flows inside your organisation, too.

This episode is also available on YouTube.

Show notes

Claude Mythos Preview red.anthropic.com
Anthropic Claims Its New A.I. Model, Mythos, Is a Cybersecurity 'Reckoning' - The New York Times
Anthropic Teams Up With Its Rivals to Keep AI From Hacking Everything | WIRED
FFmpeg on X: "Thank you to @AnthropicAI for sending FFmpeg patches" / X
Critical flaw in F5 BIG-IP faces wide exploitation risk | Cybersecurity Dive
React2Shell vulnerability helps hackers steal credentials, AI platform keys and other sensitive data | Cybersecurity Dive
Critical flaw in FortiClient EMS under exploitation | Cybersecurity Dive
Researchers warn of critical flaws in Progress ShareFile | Cybersecurity Dive
CISA gives agencies two weeks to patch video conferencing bug exploited by Chinese hackers | The Record from Recorded Future News
New Rowhammer attacks give complete control of machines running Nvidia GPUs - Ars Technica
North Korea's hijack of one of the web's most used open source projects was likely weeks in the making | TechCrunch
Drift crypto platform confirms $280 million stolen in hack as researchers point finger at North Korea | The Record from Recorded Future News
Drift on X: "Drift Protocol — Incident Background Update " / X
Trump's FY2027 budget again targets CISA | Cybersecurity Dive
CISA's vulnerability scans, field support on chopping block in Trump budget | Cybersecurity Dive
Iranian hackers break into U.S. industrial systems, agencies warn
FBI labels suspected China hack of law enforcement data 'a major cyber incident'
Russia Hacked Routers to Steal Microsoft Office Tokens – Krebs on Security
Massachusetts hospital turning ambulances away after cyberattack | The Record from Recorded Future News
Exclusive | 'Ghost Murmur,' a never-used secret tool, deployed to find lost airman in Iran in daring mission
A Secure Chat App's Encryption Is So Bad It Is 'Meaningless'

AI For Humans
Anthropic's Mythos AI Is Too Dangerous to Release. They're Using It Anyway.

AI For Humans

Play Episode Listen Later Apr 8, 2026 32:56


Anthropic revealed Mythos, a new AI model so powerful they won't let the public use it. Instead, they're deploying it to defend against cyberattacks with Project Glasswing. This week on AI For Humans, we dive deep into Anthropic's Mythos, the most powerful AI model they've ever built and one they've decided is too dangerous to release to the public. Instead, Anthropic is deploying Mythos through Project Glasswing, an AI cybersecurity initiative giving access to major corporations and trusted partners to defend against AI-powered attacks. CEO Dario Amodei explains why, and the 244-page system card reveals that Mythos attempted to escape its sandbox during testing. Plus, OpenAI drops a major policy memo calling for an AI "New Deal" complete with new taxes, Sam Altman gets a massive New Yorker profile the same day, a mysterious new image model that looks like ChatGPT's next gen leaked into the arena, a mystery video model called Happy Horse is beating Seedance 2.0 and might be VEO 4, Anthropic hits $30B in annual recurring revenue, people are furious about Anthropic charging extra for OpenClaw API access, a new Chinese open-source model GLM-5.1 tops the coding benchmarks, and Milla Jovovich from The Fifth Element released an AI memory tool and it's actually good? MYTHOS IS TOO POWERFUL… BUT WE WANT IT STILL. SORRY.
Come to our Discord: https://discord.gg/muD2TYgC8f Join our Patreon: https://www.patreon.com/AIForHumansShow AI For Humans Newsletter: https://aiforhumans.beehiiv.com/ Follow us for more on X @AIForHumansShow Join our TikTok @aiforhumansshow To book us for speaking, please visit our website: https://www.aiforhumans.show/   // Show Links // Project Glasswing: Anthropic's Cybersecurity Initiative Powered by Mythos https://www.anthropic.com/glasswing   Mythos/Project Glasswing Mini-Trailer https://youtu.be/INGOC6-LLv0?si=sCJ6ZKAL6plkVZQ4   Dario Amodei on Why Mythos Won't Be Released to the Public https://x.com/DarioAmodei/status/2041580334693720511?s=20   Mythos System Card (244 Pages) https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf   Mythos Found a Vulnerability in FFMPEG https://x.com/trentonbricken/status/2041579112423440485?s=46   Anthropic Hits $30B in Annual Recurring Revenue https://x.com/AnthropicAI/status/2041275563466502560?s=20   Anthropic Charges Extra for OpenClaw API Access in Claude Code https://techcrunch.com/2026/04/04/anthropic-says-claude-code-subscribers-will-need-to-pay-extra-for-openclaw-support/   OpenAI's New Deal: Industrial Policy for the Intelligence Age https://openai.com/index/industrial-policy-for-the-intelligence-age/   GLM-5.1: New Chinese Open-Source Model Tops Coding Benchmarks https://x.com/ClementDelangue/status/2041554501539103014?s=20   GLM-5.1 on Hugging Face https://huggingface.co/zai-org/GLM-5.1   Milla Jovovich's AI Memory Tool https://www.instagram.com/p/DWzNnqwD2Lu/   New ChatGPT Image Model Spotted in the Arena https://x.com/levelsio/status/2040333489476681758?s=20   New ChatGPT Image Model Examples https://x.com/flowersslop/status/2040261168460108213?s=20   Mystery Video Model Happy Horse Beating Seedance 2.0 in the Arena https://artificialanalysis.ai/video/leaderboard/image-to-video   Happy Horse Video Examples https://x.com/venturetwins/status/2041554747086553093?s=20  

LINUX Unplugged
660: Boots and Breakups

LINUX Unplugged

Play Episode Listen Later Mar 29, 2026 57:36 Transcription Available


Ubuntu wants a leaner, stricter GRUB, and your favorite setup may not survive the cut. We break down what's really changing, and the practical ways to adapt. Plus, Chris moves on from one of his favorite open source apps.
Sponsored By:
Jupiter Party Annual Membership: Put your support on automatic with our annual plan, and get one month of membership for free!
Managed Nebula: Meet Managed Nebula from Defined Networking. A decentralized VPN built on the open-source Nebula platform that we love.
Support LINUX Unplugged

Hacker News Recap
March 21st, 2026 | Do Not Turn Child Protection into Internet Access Control

Hacker News Recap

Play Episode Listen Later Mar 22, 2026 15:32


This is a recap of the top 10 posts on Hacker News on March 21, 2026. This podcast was generated by wondercraft.ai
(00:30): Do Not Turn Child Protection into Internet Access Control
Original post: https://news.ycombinator.com/item?id=47470991&utm_source=wondercraft_ai
(01:58): Some things just take time
Original post: https://news.ycombinator.com/item?id=47467537&utm_source=wondercraft_ai
(03:27): Blocking Internet Archive Won't Stop AI, but Will Erase Web's Historical Record
Original post: https://news.ycombinator.com/item?id=47464818&utm_source=wondercraft_ai
(04:56): Tinybox – Offline AI device 120B parameters
Original post: https://news.ycombinator.com/item?id=47470773&utm_source=wondercraft_ai
(06:24): Ubuntu 26.04 Ends 46 Years of Silent sudo Passwords
Original post: https://news.ycombinator.com/item?id=47464134&utm_source=wondercraft_ai
(07:53): 404 Deno CEO not found
Original post: https://news.ycombinator.com/item?id=47467746&utm_source=wondercraft_ai
(09:22): Mayor of Paris removed parking spaces, reduced the number of cars
Original post: https://news.ycombinator.com/item?id=47466697&utm_source=wondercraft_ai
(10:50): FFmpeg 101 (2024)
Original post: https://news.ycombinator.com/item?id=47463547&utm_source=wondercraft_ai
(12:19): Grafeo – A fast, lean, embeddable graph database built in Rust
Original post: https://news.ycombinator.com/item?id=47467567&utm_source=wondercraft_ai
(13:48): Professional video editing, right in the browser with WebGPU and WASM
Original post: https://news.ycombinator.com/item?id=47471601&utm_source=wondercraft_ai
This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Hacker News Recap
March 17th, 2026 | Kagi Translate now supports LinkedIn Speak as an output language

Hacker News Recap

Play Episode Listen Later Mar 18, 2026 14:58


This is a recap of the top 10 posts on Hacker News on March 17, 2026. This podcast was generated by wondercraft.ai
(00:30): Kagi Translate now supports LinkedIn Speak as an output language
Original post: https://news.ycombinator.com/item?id=47408703&utm_source=wondercraft_ai
(01:55): Reddit User Uncovers Who Is Behind Meta's $2B Lobbying for Age Verification Tech
Original post: https://news.ycombinator.com/item?id=47410870&utm_source=wondercraft_ai
(03:20): US SEC preparing to scrap quarterly reporting requirement
Original post: https://news.ycombinator.com/item?id=47406779&utm_source=wondercraft_ai
(04:45): Kagi Small Web
Original post: https://news.ycombinator.com/item?id=47410542&utm_source=wondercraft_ai
(06:11): Microsoft's 'unhackable' Xbox One has been hacked by 'Bliss'
Original post: https://news.ycombinator.com/item?id=47413876&utm_source=wondercraft_ai
(07:36): A Decade of Slug
Original post: https://news.ycombinator.com/item?id=47416736&utm_source=wondercraft_ai
(09:01): Every layer of review makes you 10x slower
Original post: https://news.ycombinator.com/item?id=47408205&utm_source=wondercraft_ai
(10:27): FFmpeg 8.1
Original post: https://news.ycombinator.com/item?id=47413525&utm_source=wondercraft_ai
(11:52): Python 3.15's JIT is now back on track
Original post: https://news.ycombinator.com/item?id=47416486&utm_source=wondercraft_ai
(13:17): If you thought code writing speed was your problem you have bigger problems
Original post: https://news.ycombinator.com/item?id=47415919&utm_source=wondercraft_ai
This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Atareao con Linux
ATA 758 - De podcast a vídeo

Atareao con Linux

Play Episode Listen Later Jan 5, 2026 21:27


Happy 2026! We start the year with an episode packed with news and projects that aim to give something of value back to the free software community. In this installment, I present two special "gifts": the rebirth of a technical tool for content creators and the announcement of a new podcast about the history of Linux in Spain.
It all began with an invitation from José Jiménez to take part in a round table on the state of Linux podcasts, together with David Marzal (KDE Express) and Jose (KernelCast), with impeccable production by David Baquero (Cursos de Desarrollo). During that conversation, the recommendation came up to bring audio content to YouTube in order to reach new audiences.
That was when I remembered Audiowave, a small script I wrote eight years ago to generate videos with dynamic audio waveforms. The original idea was to make audio available on video platforms even when the creator did not want to appear on screen. The challenge now has been not only to revive the tool but to give it the care it deserves by today's quality standards.
What began as a simple FFmpeg wrapper written in Bash has evolved into a robust tool developed in Rust. This new version is not only faster and safer, it also introduces a professional level of customization:
Template system: reusable styles can now be defined in configuration files, controlling the dimensions, colors, and layout of all visual elements.
Metadata automation: the tool can extract the title, subtitle, and cover art directly from MP3 files, simplifying the podcaster's workflow as much as possible.
Advanced visual filters: thanks to the power of FFmpeg, Audiowave can generate anything from multi-band equalizers to circular waves that are modulated in real time by the voice.
Rendering optimization: although the most complex styles can require considerable processing time, the end result is a high-fidelity video.
The main motivation for updating Audiowave is the launch of "La era de las distros". This new documentary podcast focuses on the regional Linux distributions that were born in Spain in the early 2000s, both the official ones backed by institutions and the independent initiatives.
Through the people who were there, the podcast explores how these projects were key pieces in the digitalization strategy of several autonomous communities. It is an exercise in technological historical memory that reveals astonishing milestones, such as the installation from scratch of more than 100,000 computers with Linux a quarter of a century ago. It is a fascinating past that we sometimes forget while we marvel at similar news arriving today from other countries.
This episode is an invitation to explore the tools that open source puts at our disposal to communicate better, and a preview of an audio journey through the history of free software in our country.
More information and links in the episode notes
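As a rough illustration of the idea behind a tool like Audiowave (turning an audio file into a video of its waveform), the sketch below builds an FFmpeg command line around FFmpeg's `showwaves` filter. It is not Audiowave's code: the function name, defaults, and file names are assumptions made for this example, and the real tool is described as a Rust program with templates and metadata handling that this sketch does not attempt.

```python
from typing import List


def waveform_video_command(
    audio_path: str,
    output_path: str,
    width: int = 1280,
    height: int = 720,
    color: str = "white",
) -> List[str]:
    """Build an ffmpeg argument list that renders audio as a waveform video.

    Hypothetical helper for illustration only; it does not run ffmpeg.
    """
    # showwaves converts the audio stream into a video stream of the waveform.
    filter_graph = (
        f"[0:a]showwaves=s={width}x{height}:mode=line:colors={color}[v]"
    )
    return [
        "ffmpeg",
        "-i", audio_path,
        "-filter_complex", filter_graph,
        "-map", "[v]",          # generated waveform video
        "-map", "0:a",          # original audio track
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",  # widely compatible pixel format
        "-c:a", "aac",
        "-shortest",
        output_path,
    ]


if __name__ == "__main__":
    import shlex

    print(shlex.join(waveform_video_command("episode.mp3", "episode.mp4")))
```

The function only returns the argument list, so it can be inspected or passed to `subprocess.run` on a machine where `ffmpeg` is installed. Building a list rather than a shell string avoids quoting problems with file names.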

Hacker News Recap
December 26th, 2025 | Rob Pike goes nuclear over GenAI

Hacker News Recap

Play Episode Listen Later Dec 27, 2025 14:10


This is a recap of the top 10 posts on Hacker News on December 26, 2025. This podcast was generated by wondercraft.ai (00:30): Rob Pike goes nuclear over GenAIOriginal post: https://news.ycombinator.com/item?id=46392115&utm_source=wondercraft_ai(01:50): How uv got so fastOriginal post: https://news.ycombinator.com/item?id=46393992&utm_source=wondercraft_ai(03:11): Package managers keep using Git as a database, it never works outOriginal post: https://news.ycombinator.com/item?id=46391514&utm_source=wondercraft_ai(04:31): Rob Pike Goes Nuclear over GenAIOriginal post: https://news.ycombinator.com/item?id=46389444&utm_source=wondercraft_ai(05:52): FFmpeg has issued a DMCA takedown on GitHubOriginal post: https://news.ycombinator.com/item?id=46394327&utm_source=wondercraft_ai(07:12): Seven Diabetes Patients Die Due to Undisclosed Bug in Abbott's Glucose MonitorsOriginal post: https://news.ycombinator.com/item?id=46388040&utm_source=wondercraft_ai(08:33): My insulin pump controller uses the Linux kernel. It also violates the GPLOriginal post: https://news.ycombinator.com/item?id=46395184&utm_source=wondercraft_ai(09:53): Experts explore new mushroom which causes fairytale-like hallucinationsOriginal post: https://news.ycombinator.com/item?id=46393936&utm_source=wondercraft_ai(11:14): I'm a laptop weirdo and that's why I like my new Framework 13Original post: https://news.ycombinator.com/item?id=46391410&utm_source=wondercraft_ai(12:34): Rob Pike got spammed with an AI slop "act of kindness"Original post: https://news.ycombinator.com/item?id=46394867&utm_source=wondercraft_aiThis is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Ask Noah Show
Ask Noah Show 471

Ask Noah Show

Play Episode Listen Later Dec 17, 2025 58:55


This week we dig full force into some interesting listener questions. Noah talks about an open source hardware synth, and Steve walks through some of his hardware choices to help you!
-- During The Show --
00:50 Intro
Weather
Cooling IT rooms in winter
05:00 Printers, DVD ripping and more - James
Steve has 2 Brother printers
Auto Duplexing
Separate printer and scanner
Large business grade units
Ask Noah Show 368 (https://podcast.asknoahshow.com/368)
All in One Brother DCP-L2640DW Amazon (https://www.amazon.com/dp/B0CPLFTPCV)
Budget Brother HL-L2460DW Amazon (https://www.amazon.com/dp/B0CPL2N5H6)
Monochrome Brother HL-6210DW Amazon (https://www.amazon.com/dp/B0CGC9HPNH)
Color Brother HL-L3280CDW Amazon (https://www.amazon.com/dp/B0CFD1G1VT)
Trouble with auto duplexing
Stay away from Lexar HP printers
Manually add the printer
Change to Jetdirect or IP printer
Pay attention to exact model or most similar
When it goes wrong, it goes really wrong
Your mileage may vary
Canon Color Image Class LBP622Cdw Amazon (https://www.amazon.com/dp/B07QBR7JFV)
Scanner Brother ADS-1200 Amazon (https://www.amazon.com/dp/B07WSJQWVQ)
Containers vs Codecs
MKV vs MP4
Avidemux (https://avidemux.sourceforge.net/)
Ripping as ISOs vs video files
MakeMKV (https://www.makemkv.com/)
MakeMKV Docker Image (https://github.com/jlesage/docker-makemkv)
```
# Load the SCSI generic driver on the host first:
# sudo modprobe sg
services:
  makemkv:
    image: ghcr.io/jlesage/makemkv:latest
    ports:
      - "5800:5800"
    volumes:
      - "./makemkv:/config:rw"
      - "./storage:/storage:ro"
      - "./output:/output:rw"
    security_opt:
      # Fix for apparmor enabled systems
      - apparmor:unconfined
    environment:
      - USER_ID=1000
      - GROUP_ID=1000
    devices:
      - "/dev/sr0:/dev/sr0"
      - "/dev/sg0:/dev/sg0"
```
Christmas movies
Handbrake (https://handbrake.fr/)
FFmpeg (https://ffmpeg.org/)
Transcoding
Run controller at each site
Ubiquiti Cloud Key (https://store.ui.com/us/en/products/uck-g2)
Lots of problems
OVH server
Put basic auth in front
Inbox Zero
Paperless NGX (https://docs.paperless-ngx.com/)
Dump to eml file then import into special Thunderbird
45:14 News Wire
Firefox 146 - firefox.com (https://www.firefox.com/en-US/firefox/146.0/releasenotes/)
Thunderbird 146 - thunderbird.net (https://www.thunderbird.net/en-US/thunderbird/146.0/releasenotes/)
KDE Frameworks 6.21 - kde.org (https://kde.org/info/kde-frameworks-6.21.0/)
Cinnamon Desktop 6.6 - itsfoss.com (https://itsfoss.com/news/cinnamon-6-6/)
Mir 2.25 - github.com (https://github.com/canonical/mir/releases/tag/v2.25.0)
Rust 1.92 - blog.rust-lang.org (https://blog.rust-lang.org/2025/12/11/Rust-1.92.0/)
AerynOS 2025.12 - phoronix.com (https://www.phoronix.com/news/AerynOS-2025.12)
Kali Linux 2025.4 - kali.org (https://www.kali.org/blog/kali-linux-2025-4-release/)
Pop!_OS 24.04 - itsfoss.com (https://itsfoss.com/news/pop-os-24-04-review/)
PearOS - pearos.xyz (https://pearos.xyz)
MaboxLinux 2025.12 - maboxlinux.org (https://maboxlinux.org/mabox-25-12-improvements-fixes-and-gtk2-farewell/#google_vignette)
Papermoon - thenewstack.io (https://thenewstack.io/papermoon-a-space-grade-linux-for-the-newspace-era/)
01flip Ransomware - esecurityplanet.com (https://www.esecurityplanet.com/threats/rust-based-01flip-ransomware-hits-windows-and-linux/)
React2Shell - thehackernews.com (https://thehackernews.com/2025/12/react2shell-vulnerability-actively.html)
Nomos 1 - venturebeat.com (https://venturebeat.com/ai/nous-research-just-released-nomos-1-an-open-source-ai-that-ranks-second-on)
Nemotron Model - reuters.com (https://www.reuters.com/world/china/nvidia-unveils-new-open-source-ai-models-amid-boom-chinese-offerings-2025-12-15/)
Quilter's AI - venturebeat.com (https://venturebeat.com/ai/quilters-ai-just-designed-an-843-part-linux-computer-that-booted-on-the)
Chatterbox Labs - redhat.com (https://www.redhat.com/en/blog/red-hat-acquire-chatterbox-labs-frequently-asked-questions)
Agentic AI Group - hackernoon.com (https://hackernoon.com/linux-foundation-launches-agentic-ai-group-to-set-standards-for-autonomous-systems)
Firefox AI Browser - phoronix.com (https://www.phoronix.com/news/Mozilla-New-CEO-AI)
47:47 Zynthian
Open hardware device
We want your feedback!
Are you comfortable with software VST
Zynthian.org (https://zynthian.org/)
50:30 Family Resistant to Self Hosting - David
Ovens house hold approach
Watching for pain points
Making responsible path easy
Making irresponsible path hard
Value driven decisions
Supporting where your paycheck comes from
-- The Extra Credit Section --
For links to the articles and material referenced in this week's episode check out this week's page from our podcast dashboard!
This Episode's Podcast Dashboard (http://podcast.asknoahshow.com/471)
Phone Systems for Ask Noah provided by Voxtelesys (http://www.voxtelesys.com/asknoah)
Join us in our dedicated chatroom #GeekLab:linuxdelta.com on Matrix (https://element.linuxdelta.com/#/room/#geeklab:linuxdelta.com)
-- Stay In Touch --
Find all the resources for this show on the Ask Noah Dashboard
Ask Noah Dashboard (http://www.asknoahshow.com)
Need more help than a radio show can offer? Altispeed provides commercial IT services and they're excited to offer you a great deal for listening to the Ask Noah Show. Call today and ask about the discount for listeners of the Ask Noah Show!
Altispeed Technologies (http://www.altispeed.com/)
Contact Noah
live [at] asknoahshow.com
-- Twitter --
Noah - Kernellinux (https://twitter.com/kernellinux)
Ask Noah Show (https://twitter.com/asknoahshow)
Altispeed Technologies (https://twitter.com/altispeed)
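The "Containers vs Codecs" and "MKV vs MP4" items in these show notes come down to one distinction: the container is the file wrapper, while the codecs are how the audio and video inside it are encoded. A minimal sketch of that idea, assuming the streams in the source file are ones the target container accepts, is a remux that copies the streams into a new wrapper without transcoding. The helper name and file names below are hypothetical and are not taken from the episode.

```python
from pathlib import Path
from typing import List


def remux_command(source: str, container: str = "mp4") -> List[str]:
    """Build an ffmpeg argument list that changes the container only.

    Hypothetical helper for illustration: "-c copy" copies the existing
    streams as-is, so nothing is re-encoded. ffmpeg will refuse streams
    that the target container does not support.
    """
    src = Path(source)
    target = src.with_suffix("." + container.lstrip("."))
    if target == src:
        raise ValueError("source already uses the requested container")
    return ["ffmpeg", "-i", str(src), "-c", "copy", str(target)]


if __name__ == "__main__":
    print(" ".join(remux_command("movie.mkv")))
```

Transcoding, by contrast, would replace `-c copy` with explicit encoders, which takes far longer and can lose quality. That is why a rip is often remuxed first and only transcoded when a smaller file or a different codec is actually needed.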

Black Hills Information Security
A.I. Transcription Startup Was Just A Guy Taking Notes- 2025-11-17

Black Hills Information Security

Play Episode Listen Later Nov 21, 2025 76:33


Register for FREE Infosec Webcasts, Anti-casts & Summits – https://poweredbybhis.com
00:00:00 - PreShow Banter™ — The Way the Community Rumbles
00:08:21 - A.I. Transcription Startup Was Just A Guy Taking Notes - BHIS - Talkin' Bout [infosec] News 2025-11-17
00:09:01 - Story # 1: New data shows companies are rehiring former employees as AI falls short of expectations
00:18:06 - Eric & Whitney's “Podcast” [webcast] on training your own LLM
00:22:12 - Story # 2: Founder Admits His “AI Transcription” Startup Was Just Him Joining People's Meetings and Taking Notes by Hand
00:26:20 - Story # 3: Five Plead Guilty in U.S. for Helping North Korean IT Workers Infiltrate 136 Companies
00:37:35 - Story # 4: Google is easing up on Android's new sideloading restrictions!
00:43:44 - Story # 5: Google is collecting troves of data from downgraded Nest thermostats
00:44:58 - Story # 5b: Hackers are saving Google's abandoned Nest thermostats with open-source firmware
00:51:34 - Story # 6: FFmpeg to Google: Fund Us or Stop Sending Bugs
01:00:40 - Story # 7: Teens are Hacking School Systems. Let's Teach Them to Protect Communities Instead
01:05:55 - Story # 8: Disrupting the first reported AI-orchestrated cyber espionage campaign
01:14:58 - Discord CTF Winners

WP Builds
This Week in WordPress #355

WP Builds

Play Episode Listen Later Nov 18, 2025 91:24


In "This Week in WordPress #355," Nathan Wrigley, Michelle Frechette, and Rhys Wynne discuss the Kagi search engine, Michelle's job search, and WordPress updates including 6.9's new features like collaborative editing and abilities API. The episode covers the challenges faced by open source projects like FFmpeg, security concerns with AI-powered tools such as Telex, the Global Partner Program for WordPress event sponsorships, and developments in full site editing, highlighting the Ollie theme. Listener comments add depth to discussions about the future and risks of WordPress plugin and block creation through AI.

All TWiT.tv Shows (MP3)
Untitled Linux Show 229: Full Steam Ahead

All TWiT.tv Shows (MP3)

Play Episode Listen Later Nov 16, 2025 123:33 Transcription Available


Valve is going to attempt the Linux trifecta, Firefox is adding more AI and people aren't happy, and the kernel is refining its own AI guidelines. FFmpeg is tired of AI-generated CVEs, no matter how good they are! Rust isn't always more secure, your Ubuntu desktop can last for 15 years now, and openSUSE Tumbleweed has some surprises. For Tips, we cover Webmin, btrfs-rescue, a function to center-print text in the terminal, and go down the rabbit hole of detecting dual server PSUs. You can find the show notes at https://bit.ly/4pbm35E and see you next time! Host: Jonathan Bennett Co-Hosts: Jeff Massie, Rob Campbell, and Ken McDonald Download or subscribe to Untitled Linux Show at https://twit.tv/shows/untitled-linux-show Want access to the ad-free video and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.

Security Now (MP3)
SN 1051: Amazon sues Perplexity - Nevada's Ransomware Comeback

Security Now (MP3)

Play Episode Listen Later Nov 12, 2025 178:34 Transcription Available


Amazon is taking Perplexity AI to court over its agentic browser that shops on your behalf, raising urgent questions about who controls your online buying experience when bots do the heavy lifting. FFmpeg teaching assembly language for performance. The state of Nevada recovers after not paying ransom. A "rounding error" nets a clever attacker $128 million. Why would Chrome decide to start form-filling driver's licenses? The UK's six major telecom providers to block number spoofing. XSLT support being removed from browsers. Will anyone notice? Firefox introduced paid support options for organizations. Russia continues to fight against non-Russian Internet. Google acquires another Internet security company (Wiz). The EU to finally fix their cookie permission mistake. More countries drop Microsoft Office for open choices. More countries question and examine Chinese-made buses. Microsoft discovers some information leakage from LLMs. What does Amazon's lawsuit against Perplexity's agents mean for next-generation browsers? Show Notes - https://www.grc.com/sn/SN-1051-Notes.pdf Hosts: Steve Gibson and Leo Laporte Download or subscribe to Security Now at https://twit.tv/shows/security-now. You can submit a question to Security Now at the GRC Feedback Page. For 16kbps versions, transcripts, and notes (including fixes), visit Steve's site: grc.com, also the home of the best disk maintenance and recovery utility ever written, SpinRite 6. Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free shows, a members-only Discord, and behind-the-scenes access. Join today: https://twit.tv/clubtwit Sponsors: veeam.com hoxhunt.com/securitynow zscaler.com/security zapier.com/securitynow vanta.com/SECURITYNOW

All TWiT.tv Shows (MP3)
Security Now 1051: Amazon sues Perplexity

All TWiT.tv Shows (MP3)

Play Episode Listen Later Nov 12, 2025 178:34 Transcription Available


Amazon is taking Perplexity AI to court over its agentic browser that shops on your behalf, raising urgent questions about who controls your online buying experience when bots do the heavy lifting. FFmpeg teaching assembly language for performance. The state of Nevada recovers after not paying ransom. A "rounding error" nets a clever attacker $128 million. Why would Chrome decide to start form-filling driver's licenses. The UK's six major telecom providers to block number spoofing. XSLT support being removed from browsers. Will anyone notice. Firefox introduced paid support options for organizations. Russia continues to fight against non-Russian Internet. Google acquires another Internet security company (Wiz). The EU to finally fix their cookie permission mistake. More countries drop Microsoft office for open choices. More countries question and examine Chinese made buses. Microsoft discovers some information leakage from LLMs. What does Amazon's lawsuit against Perplexity's agents mean for next-generation browsers Show Notes - https://www.grc.com/sn/SN-1051-Notes.pdf Hosts: Steve Gibson and Leo Laporte Download or subscribe to Security Now at https://twit.tv/shows/security-now. You can submit a question to Security Now at the GRC Feedback Page. For 16kbps versions, transcripts, and notes (including fixes), visit Steve's site: grc.com, also the home of the best disk maintenance and recovery utility ever written Spinrite 6. Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free shows, a members-only Discord, and behind-the-scenes access. Join today: https://twit.tv/clubtwit Sponsors: veeam.com hoxhunt.com/securitynow zscaler.com/security zapier.com/securitynow vanta.com/SECURITYNOW

Security Now (Video HD)
SN 1051: Amazon sues Perplexity - Nevada's Ransomware Comeback

Security Now (Video HD)

Play Episode Listen Later Nov 12, 2025 164:03 Transcription Available


Amazon is taking Perplexity AI to court over its agentic browser that shops on your behalf, raising urgent questions about who controls your online buying experience when bots do the heavy lifting. FFmpeg teaching assembly language for performance. The state of Nevada recovers after not paying ransom. A "rounding error" nets a clever attacker $128 million. Why would Chrome decide to start form-filling driver's licenses. The UK's six major telecom providers to block number spoofing. XSLT support being removed from browsers. Will anyone notice. Firefox introduced paid support options for organizations. Russia continues to fight against non-Russian Internet. Google acquires another Internet security company (Wiz). The EU to finally fix their cookie permission mistake. More countries drop Microsoft office for open choices. More countries question and examine Chinese made buses. Microsoft discovers some information leakage from LLMs. What does Amazon's lawsuit against Perplexity's agents mean for next-generation browsers Show Notes - https://www.grc.com/sn/SN-1051-Notes.pdf Hosts: Steve Gibson and Leo Laporte Download or subscribe to Security Now at https://twit.tv/shows/security-now. You can submit a question to Security Now at the GRC Feedback Page. For 16kbps versions, transcripts, and notes (including fixes), visit Steve's site: grc.com, also the home of the best disk maintenance and recovery utility ever written Spinrite 6. Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free shows, a members-only Discord, and behind-the-scenes access. Join today: https://twit.tv/clubtwit Sponsors: veeam.com hoxhunt.com/securitynow zscaler.com/security zapier.com/securitynow vanta.com/SECURITYNOW

Security Now (Video HI)
SN 1051: Amazon sues Perplexity - Nevada's Ransomware Comeback

Security Now (Video HI)

Play Episode Listen Later Nov 12, 2025 164:03 Transcription Available


Amazon is taking Perplexity AI to court over its agentic browser that shops on your behalf, raising urgent questions about who controls your online buying experience when bots do the heavy lifting. FFmpeg teaching assembly language for performance. The state of Nevada recovers after not paying ransom. A "rounding error" nets a clever attacker $128 million. Why would Chrome decide to start form-filling driver's licenses. The UK's six major telecom providers to block number spoofing. XSLT support being removed from browsers. Will anyone notice. Firefox introduced paid support options for organizations. Russia continues to fight against non-Russian Internet. Google acquires another Internet security company (Wiz). The EU to finally fix their cookie permission mistake. More countries drop Microsoft office for open choices. More countries question and examine Chinese made buses. Microsoft discovers some information leakage from LLMs. What does Amazon's lawsuit against Perplexity's agents mean for next-generation browsers Show Notes - https://www.grc.com/sn/SN-1051-Notes.pdf Hosts: Steve Gibson and Leo Laporte Download or subscribe to Security Now at https://twit.tv/shows/security-now. You can submit a question to Security Now at the GRC Feedback Page. For 16kbps versions, transcripts, and notes (including fixes), visit Steve's site: grc.com, also the home of the best disk maintenance and recovery utility ever written Spinrite 6. Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free shows, a members-only Discord, and behind-the-scenes access. Join today: https://twit.tv/clubtwit Sponsors: veeam.com hoxhunt.com/securitynow zscaler.com/security zapier.com/securitynow vanta.com/SECURITYNOW

Radio Leo (Audio)
Security Now 1051: Amazon sues Perplexity

Radio Leo (Audio)

Nov 12, 2025 178:34 Transcription Available



Security Now (Video LO)
SN 1051: Amazon sues Perplexity - Nevada's Ransomware Comeback

Security Now (Video LO)

Nov 12, 2025 164:03 Transcription Available



Hacker News Recap
November 11th, 2025 | The 'Toy Story' You Remember

Hacker News Recap

Nov 12, 2025 14:13


This is a recap of the top 10 posts on Hacker News on November 11, 2025. This podcast was generated by wondercraft.ai
(00:30): The 'Toy Story' You Remember
Original post: https://news.ycombinator.com/item?id=45883788&utm_source=wondercraft_ai
(01:50): FFmpeg to Google: Fund us or stop sending bugs
Original post: https://news.ycombinator.com/item?id=45891016&utm_source=wondercraft_ai
(03:11): iPhone Pocket
Original post: https://news.ycombinator.com/item?id=45885813&utm_source=wondercraft_ai
(04:32): Warren Buffett's final shareholder letter [pdf]
Original post: https://news.ycombinator.com/item?id=45882837&utm_source=wondercraft_ai
(05:53): Collaboration sucks
Original post: https://news.ycombinator.com/item?id=45892394&utm_source=wondercraft_ai
(07:14): I hate screenshots of text
Original post: https://news.ycombinator.com/item?id=45883124&utm_source=wondercraft_ai
(08:35): SoftBank sells its entire stake in Nvidia
Original post: https://news.ycombinator.com/item?id=45884937&utm_source=wondercraft_ai
(09:55): Firefox expands fingerprint protections
Original post: https://news.ycombinator.com/item?id=45888891&utm_source=wondercraft_ai
(11:16): X5.1 solar flare, G4 geomagnetic storm watch
Original post: https://news.ycombinator.com/item?id=45893004&utm_source=wondercraft_ai
(12:37): iPod Socks
Original post: https://news.ycombinator.com/item?id=45889602&utm_source=wondercraft_ai
This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

All TWiT.tv Shows (Video LO)
Security Now 1051: Amazon sues Perplexity

All TWiT.tv Shows (Video LO)

Nov 12, 2025 164:03 Transcription Available



Security Conversations
LIVE at Countermeasures: Google v FFmpeg, Ransomware Turncoats, Samsung 0days

Security Conversations

Nov 10, 2025 69:59


Presented by Material Security: We protect your company's most valuable materials -- the emails, files, and accounts that live in your Google Workspace and Microsoft 365 cloud offices. Three Buddy Problem - Episode 71: The buddies travel to Canada for a live recording at the Countermeasure conference, discussing the Google v FFmpeg open-source patching brouhaha, ransomware negotiators charged and linked to ransomware attacks, the looming TP-Link ban in the U.S., and the discovery of LANDFALL, an APT attack caught using a Samsung mobile zero-day. Cast: Juan Andres Guerrero-Saade (https://twitter.com/juanandres_gs), Ryan Naraine (https://twitter.com/ryanaraine) and Costin Raiu (https://twitter.com/craiu).

Risky Business
Risky Business #813 -- FFmpeg has a point

Risky Business

Nov 5, 2025 65:08


In this week's show Patrick Gray and Adam Boileau discuss the week's cybersecurity news, including:
We love some good vulnerability reporting drama, this time FFmpeg's got beef with Google
OpenAI announces its Aardvark bug-gobbling system
Two US ransomware responders get arrested for… ransomware
Memento (nee HackingTeam) CEO says: Sì, those are totally our tools getting snapped in Russia
Hackers help freight theft gangs steal shipments to resell
A second Jabber Zeus mastermind gets his comeuppance 15 years on
This week's episode is sponsored by Nucleus Security, who make a vulnerability information management system. Co-founder Scott Kuffer says that approaches for triaging vulnerabilities have started to fall apart, given there are just. So. Many. And they're all important!
This episode is also available on YouTube.
Show notes
vx-underground on X: "Yeah, so pretty much this entire drama thing is FFmpeg are a bunch of nerds…"
FFmpeg on X: "@DavidEGrayson It's someone's hobby project of an obscure 1990s decoder…"
Halvar Flake on X: "Given the extremely big role ffmpeg has played historically..."
thaddeus e. grugq on X: "Current drama: Plucky security researcher Google takes on volunteer open source behemoth FFmpeg."
Robert Graham on X: "Current status: There's a conflict between Google…"
Introducing Aardvark: OpenAI's agentic security researcher | OpenAI
Bugcrowd acquires Mayhem Security to advance AI-powered security testing | CyberScoop
Prosecutors allege incident response pros used ALPHV/BlackCat to commit string of ransomware attacks | CyberScoop
Former Trenchant Exec Sold Stolen Code to Russian Buyer Even After Learning that Other Code He Sold Was Being "Utilized" by Different Broker in South Korea
How an ex-L3Harris Trenchant boss stole and sold cyber exploits to Russia | TechCrunch
Operation Zero — A Zero-Day Vulnerability Platform
John Scott-Railton on X: "7/ There's a push to scale up America's offensive industry right now…"
CEO of spyware maker Memento Labs confirms one of its government customers was caught using its malware | TechCrunch
Exploiting Microsoft Teams: Impersonation and Spoofing Vulnerabilities Exposed
Microsoft Teams Vulnerabilities Uncovered
Cargo theft gets a boost from hackers using remote monitoring tools | The Record from Recorded Future News
Remote access, real cargo: cybercriminals targeting trucking and logistics | Proofpoint US
Alleged Conti ransomware gang affiliate appears in Tennessee court after Ireland extradition | The Record from Recorded Future News
Three suspected developers of Meduza Stealer malware arrested in Russia | The Record from Recorded Future News
Alleged Jabber Zeus Coder ‘MrICQ' in U.S. Custody – Krebs on Security
Windows Server Update Service exploitation ensnares at least 50 victims | Cybersecurity Dive
Post by @paulschnack.bsky.social — Bluesky

This Week in Tech (Audio)
TWiT 1056: The Big Sleep - The Great Router Ban

This Week in Tech (Audio)

Nov 3, 2025 169:26


From AI-powered code generation boosting productivity to adversaries using the same tools to hunt zero-days, the panel exposes the coming wave of AI-fueled cyberattacks, and why most companies aren't ready for it.
Cotton blocks Trump-backed effort to make daylight saving time permanent
The End of Cybersecurity
Amazon says it didn't cut 14,000 people because of money. It cut them because of 'culture'
Here's How the AI Crash Happens
US government is getting closer to banning TP-Link routers
Neato cloud shutdown sees robocleaners robbed of their smarts
FCC will vote to scrap telecom cybersecurity requirements
Trump FCC Votes To Make It Easier For Your Broadband ISP To Rip You Off
Swedish Death Cleaning But for Your Digital Life
The F5 Hack is a Big Deal
OpenAI Releases Agentic Security Researcher
'Do not trust your eyes': AI generates surge in expense fraud
Proton Data Breach Observatory aims to alert you in near real-time
Using a Security Key on X? Re-Enroll Now or Your Account Will Be Locked
YouTube denies AI was involved with odd removals of tech tutorials
10M people watched a YouTuber shim a lock; the lock company sued him. Bad idea.
Samsung's $2000 smart fridges are getting ads - gHacks Tech News
ESPN, ABC, and other Disney channels go dark on YouTube TV
Host: Leo Laporte
Guests: Jill Duffy, Alex Stamos, and Stacey Higginbotham
Download or subscribe to This Week in Tech at https://twit.tv/shows/this-week-in-tech
Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free shows, a members-only Discord, and behind-the-scenes access. Join today: https://twit.tv/clubtwit
Sponsors: ziprecruiter.com/twit, zscaler.com/security, miro.com, canary.tools/twit - use code: TWIT, Melissa.com/twit

This Week in Tech (Video HI)
TWiT 1056: The Big Sleep - The Great Router Ban

This Week in Tech (Video HI)

Nov 3, 2025 167:28



All TWiT.tv Shows (MP3)
Untitled Linux Show 227: Ancient Stack Tax

All TWiT.tv Shows (MP3)

Nov 3, 2025 105:55 Transcription Available


This week SUSE's SLES and Red Hat's RHEL are embracing AI in the form of MCP and CUDA support. FFmpeg scores a $100k donation, Pop!_OS and Cosmic finally have a release date, and Unity is in need of help. Kodi 22 has an Alpha, Debian has a Systemd dustup, and Krita has landed HDR support. And there's a port of Linux to WASM, so you can run the kernel in your browser. Handy! For tips we have doxx for opening .docx in the terminal, a primer on absolute vs relative paths, whoami for grabbing the current username, and btrfs's scrub command for checking the local disk. You can find the show notes at https://bit.ly/4ovhsLG and have a great week! Host: Jonathan Bennett Co-Hosts: Rob Campbell, Jeff Massie, and Ken McDonald Download or subscribe to Untitled Linux Show at https://twit.tv/shows/untitled-linux-show Want access to the ad-free video and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.

All TWiT.tv Shows (MP3)
This Week in Tech 1056: The Big Sleep

All TWiT.tv Shows (MP3)

Nov 3, 2025 167:58



Radio Leo (Audio)
This Week in Tech 1056: The Big Sleep

Radio Leo (Audio)

Nov 3, 2025 168:13



All TWiT.tv Shows (Video LO)
This Week in Tech 1056: The Big Sleep

All TWiT.tv Shows (Video LO)

Nov 3, 2025 167:28 Transcription Available



All TWiT.tv Shows (Video LO)
Untitled Linux Show 227: Ancient Stack Tax

All TWiT.tv Shows (Video LO)

Nov 3, 2025 105:55 Transcription Available

