We congratulate our portfolio company, eToro, on its public debut on Nasdaq, where it raised $310 million and saw its shares open trading up 29%. We also evaluate OpenAI's latest models, GPT-4.1 and GPT-4.1 mini. We then comment on Quantinuum and Al Rabban Capital's launch of a $1B quantum venture in Qatar. Last, we analyze whether this may signal renewed interest in tech IPOs and look at the next likely candidates in our Chart of the Week. Remember to Stay Current! To learn more, visit us on the web at https://www.morgancreekcap.com/morgan-creek-digital/. To speak to a team member or sign up for additional content, please email mcdigital@morgancreekcap.com.
Legal Disclaimer: This podcast is for informational purposes only and should not be construed as investment advice or a solicitation for the sale of any security, advisory, or other service. Investments related to the themes and ideas discussed may be owned by funds managed by the host and podcast guests. Any conflicts mentioned by the host are subject to change. Listeners should consult their personal financial advisors before making any investment decisions.
#applemaps #apple #iphone #Michelin #MichelinGuide
001: Apple Maps teams up with the Michelin Guide! Dining and entertainment, upgraded!
002: Samsung Galaxy AI rumored to get a new trick: AI turns photos into short videos!
003: OpenAI GPT-4.1 fully rolled out in ChatGPT
004: Waymo halts 1,200 robotaxis
005: "Hey Copilot!" arrives on Windows 11
006: A Chinese-backed company wants to splash out US$300 million on $TRUMP Coin
007: Pick the wrong AI video generator and you risk malware!
Following Google Cloud Next, we have a batch of AI updates from Google for you:
Gemini 2.5 Pro is now available with Deep Research; there's also Gemini 2.5 Pro and app prototyping in Firebase AI Studio
Gemini 2.5 Flash with hybrid reasoning
Google AI Studio redesigned, plus the Live API
Veo 2 video model rollout & Veo text-to-video available in the API
Gemma 3 models optimized with QAT
AudioX – Anything to Audio Generation
OpenAI releases GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano
OpenAI o3 and o4-mini with tool use
OpenAI plans to acquire Windsurf
DolphinGemma – an LLM for dolphin communication
NVIDIA is building a supercomputer in Texas
Docker supports Gemma 3 models locally: Run Gemma 3 Locally with Docker
Shopify memo on AI use: internal memo from Tobi Lütke
Write to us! Send us your topic requests and your feedback: podcast@programmier.bar
Follow us! Stay up to date on future episodes and virtual meetups, and join the community discussions. Bluesky, Instagram, LinkedIn
In this episode:
Thanks for the Boosty specials – how your support helps me create content
A short trip to Crimea – testing a Yandex charger on the train
A pleasant surprise from mail.ru services – some unexpected upsides of popular tools
The MAX messenger – a feature overview and my impressions
Gemini Pro 2.5 for code – compared with Claude 3.7 and the new GPT-4.1
Minecraft mod launchers – popular options and what they can do
My experience taking notes in Obsidian, and scary thoughts about returning to Notion
Links:
Specials on Boosty: https://boosty.to/halofourteen
Write to me: https://t.me/mayatnikov
Traditional media, established news sources, newspapers and television: the world of communication is at a historic turning point. Technologies once dismissed as fantasy are now creating content indistinguishable from human work. Where, until recently, systems were mainly conduits or offered simple automation, the real revolution is happening now that machines don't just distribute but create. What does this mean for authenticity, for creative professions, and for our shared frame of reference? And the fundamental question: does AI-generated content enrich our culture, or does it impoverish our human experience?
And then, fresh off the OpenAI press: the new o3 reasoning model. It uses "simulated reasoning" to solve complex problems step by step, though the public version differs from the record-breaking test version. ChatGPT introduces a long-term memory that remembers all your conversations, and OpenAI launches GPT-4.1 with better performance at lower cost. What a week.
Schedule change: the next webinar is Tuesday, May 13 from 12:00 to 13:00, instead of May 15.
If you'd like a talk on AI from Wietse or Alexander, that can be arranged. Email us at lezing@aireport.email
Want to keep up with the latest AI news and receive tips & tools twice a week to get the most out of AI (and join the webinar)? Subscribe to our newsletter at aireport.email
Want to start using AI in your company today? Go to deptagency.com/aireport
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.aireport.email/subscribe
According to 21st Century Business Herald, on April 10 local time the European Commission announced that it had reached an important consensus with China: the two sides will open negotiations on replacing the current tariffs on Chinese electric vehicles with a "minimum import price" mechanism. This breakthrough marks a substantive easing of a trade dispute that has lasted nearly half a year and introduces a new variable into the global EV industry landscape.
According to Cailianshe, Xu Peng, Ant Group vice president and former head of foundation models, has left the company. Xu has long worked in AI research; he spent 11 years at Google, where he led core R&D for Google Translate and took part in algorithm development for Google's display advertising system.
According to CCTV.com, World Gold Council chief market strategist Rui Qiang (芮强) said that gold prices have kept climbing this year, setting more than 20 record highs and rising beyond earlier expectations. He believes the uncertainty triggered by US tariff policy continues to support gold prices through short-term safe-haven demand.
According to Gelonghui, US tariff policy has recently raised global concerns. A tools merchant in Yiwu, Zhejiang said orders from the US are currently on hold but overall sales are unaffected, and its factory prices will not change because of the tariffs.
According to IT Home, on April 11 Tesla's China website showed that the Model S/X no longer offers a separate "order new" option; the model pages now show only a "view inventory" button, which leads to a list of available vehicles.
According to Agência Brasil, Brazilian President Lula signed the Commercial Reciprocity Law on the 11th local time, authorizing the federal government to take reciprocal countermeasures against countries and economic blocs that impose unilateral trade barriers on Brazilian exports. The law will be officially published in the Federal Official Gazette on April 14 and take effect.
According to Sina Finance, OpenAI announced that GPT-4 will be removed from ChatGPT on April 30 but will remain available in the API. OpenAI said that after retirement GPT-4 will be fully replaced by the new natively multimodal model GPT-4o.
OpenAI might have just killed Photoshop (as we know it). ☠️ Microsoft released reasoning agents that are unmatched.
In this episode, I explore the concept of the "AI flywheel" - the accelerating momentum of artificial intelligence development that's either carrying us forward or leaving us behind. As we approach the critical juncture of Artificial General Intelligence, we need to understand what makes us uniquely human, something I discussed with recent guest Dom Heinrich.
Key Points
The Accelerating Pace: Recent releases from Anthropic (Claude 3.7 Sonnet), OpenAI (GPT-4.5), Google (Gemini), and xAI (Grok) demonstrate how each breakthrough contributes to faster development.
Choosing Your Tools: Mark uses different AI models for specific purposes - Claude for creative work, Gemini for Google's ecosystem, and Grok for Twitter-based insights.
Human Uniqueness: As AI handles analytical tasks with ease, our distinct value may lie in emotional intelligence, ethical reasoning, and wisdom from lived experience.
Finding Balance: Technology should be used thoughtfully. AI might help us become more human by handling routine tasks, freeing us to focus on relationships, creativity, and empathy.
Notable Quote
"As controllers of technology, we must balance technological connection with disconnection, have the discipline to lose ourselves in our unconscious minds, and have the focus to listen to our souls."
Join the Conversation
I invite listeners to share how they're navigating the AI flywheel and what human qualities they believe will become more valuable as AI advances.
Timestamps
00:00 Introduction
00:08 The AI Flywheel Concept
00:38 Recent AI Developments
01:49 Choosing the Right AI Tool
02:49 Human Qualities in the Age of AI
03:43 Balancing Technology and Humanity
04:33 Engaging with AI Technology
05:12 Conclusion and Call to Action
Links
LinkedIn
Hosted on Acast. See acast.com/privacy for more information.
Today we'll look at how tourism in Berlin is evolving, and why IHG has admitted it cannot meet its sustainability targets. We'll talk about Albufeira, in Portugal, forced to introduce a new code of conduct for unruly tourists, and about Qatar, where tourism is seeing a genuine AI boom. Finally, we'll stop in Scotland, which plans to fight overtourism with AI, and run through all the week's travel-tech news. I'm Mirko Lalli, and this is Data Appeal Byte-sized Trends, a podcast about the future of tourism, dedicated to all the innovations transforming the way we travel.
This week's food for thought:
360° Berlin: A Comprehensive Exploration of the City's Tourism Landscape
IHG's Total Greenhouse Gas Emissions Increase Despite Extensive Efforts to Decarbonize
Popular Portuguese resort town tells tourists to put their clothes on or face a fine
How Saudi Tourism Authority and Qatar Airways are innovating with AI agents
Visit Qatar, Microsoft join forces to advance new AI-powered smart tourism solutions
AI could hold the key to beating tourist overcrowding
THIS WEEK'S NEWS IN AI AND TRAVEL TECH
Microsoft and OpenAI's GPT-5
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
AI Weekly: Grok's Risks, GPT-4.5, and Industry Developments (Feb 2025). The AI sector is rapidly evolving, as evidenced by a flurry of developments. OpenAI launched GPT-4.5 with enhanced capabilities, whilst Meta and Amazon invest heavily in AI infrastructure and assistants. Ethical quandaries continue to surface, highlighted by concerns over AI safety, data privacy, and artistic integrity. Companies like DeepSeek are demonstrating impressive profitability, and Tencent is releasing AI models with increased speed. Meanwhile, innovative applications range from Google's laser-based internet delivery to AI-powered traffic cameras, underscoring AI's expanding influence across diverse domains. These advances require ongoing scrutiny and proactive measures to ensure responsible and equitable deployment.
We talked about the following topics.
01. Astronaut Soichi Noguchi tries out an ISS simulator
Astronaut Soichi Noguchi shared his impressions of playing "ISS Simulator," a game developed by SpaceData Inc. with NASA's cooperation. The ISS, jointly operated by 16 countries, was launched in 1998 and remains in operation; the game reproduces the station's environment using real data such as temperature and airflow. The video covered the difficulty of piloting Int-Ball, the free-flying spherical robot; airflow in microgravity; the design of the robotic-arm control panel; cable-routing problems; and the station's exercise equipment and toilet. Noguchi praised the simulator's technical fidelity, but suggested that as a game it could afford expression even more "space-like" than reality, and that collaboration with creators could make it more appealing. He also said he is drawn to the hyper-real renderings of open-world games.
02. OpenAI announces the new language model GPT-4.5
OpenAI released GPT-4.5, positioned as its largest and most knowledgeable language model. By scaling up unsupervised learning it acquires a broader "world model," improving pattern recognition, association, and insight generation; its emotional intelligence (EQ) is also higher, enabling warmer, more natural conversation. GPT-4.5 rolled out first to ChatGPT Pro users and API developers, with other plans to follow in stages. The ChatGPT web version added web search, file and image uploads, and the Canvas feature. On safety, it was trained with a combination of supervised fine-tuning and reinforcement learning, and its hallucination rate is reduced. OpenAI positions GPT-4.5 as its last non-reasoning model, aims for an experience where users no longer need to think about which model they are using, and plans to unify the o-series reasoning models with the GPT series.
03. Anthropic announces the high-performance AI Claude 3.7 Sonnet
Anthropic's Claude 3.7 Sonnet marks a new stage in AI models. Its defining trait is being a "hybrid reasoning model" that delivers both fast responses and deep thinking in a single system. Users can switch between a standard mode for quick answers and an extended-thinking mode that reasons step by step through complex problems; extended thinking can also make the model's reasoning process visible. Performance in coding and front-end development improved notably, with top-tier results on software-development benchmarks. Announced alongside it, Claude Code is a command-line tool for developers that allows code search and editing, testing, and GitHub integration directly from the terminal. The model also handles long inputs of 128K tokens, enabling understanding and generation of longer, more complex text, and its accuracy at identifying harmful requests improved by 45%. Claude 3.7 Sonnet is expected to see use across a wide range of fields as an innovative model that significantly advances the practicality and flexibility of AI.
04. Mercury Coder, an AI language model up to 10x faster
AI company Inception released Mercury Coder, a large language model capable of generating text up to 10 times faster than conventional AI models. Mercury Coder is a diffusion-style language model that takes a new approach, recovering words from noise to generate code. Its main characteristics:
- Speed: up to 10,000 tokens per second on existing NVIDIA hardware
- Performance: on par with small frontier models such as Gemini 2.0 Flash-Lite and GPT-4o mini
- Parallel generation: processes everything at once rather than generating tokens left to right
- Multimodality: future combination with video and image generation is anticipated
- Coding: handles complex code-generation tasks
Mercury Coder is currently free to test, with a limit of 10 requests per hour. This new architecture could spur innovation especially in areas that require fast inference.
05. US private lunar lander Blue Ghost soft-lands on the Moon
Blue Ghost, the lunar lander of US private company Firefly Aerospace, successfully soft-landed on the Moon at around 17:35 on March 2, 2025 (JST), the second successful soft landing by a private company. Launched on January 15, 2025 on a SpaceX Falcon 9, it touched down near Mons Latreille in Mare Crisium, as part of NASA's Commercial Lunar Payload Services (CLPS) program. Its ten payloads include a heat-flow probe that can measure down to 10 feet below the lunar surface and a receiver demonstrating whether Global Navigation Satellite System (GNSS) signals can be used in the lunar environment. The landing site had just reached lunar sunrise, with sunset expected on March 16, so the mission is planned to run for about two weeks. Firefly Aerospace described the touchdown as a "fully successful lunar landing," apparently a nod to the earlier private lander Odysseus, which came to rest tipped over.
06. YAOKI, a palm-sized lunar rover
YAOKI is an ultra-small, ultra-light, high-strength lunar rover at the forefront of lunar development. Its main features and goals:
Features:
- Ultra-small: 15 x 15 x 10 cm, fits in the palm of a hand
- Ultra-light: just 498 g
- High strength: withstands 100 G impacts, enabling throw-in exploration of caves
- Reliable mobility: designed to keep driving even after tipping over
Goals:
- Realizing private-sector lunar exploration: participating as a Japanese company in NASA's CLPS lunar transport missions
- Contributing to lunar development in step with the Artemis program, targeting the mobility-systems field from around 2025
- Supporting lunar base construction starting around 2028, with a future in which many YAOKIs work on the Moon
YAOKI is designed to be delivered to the Moon at low cost, with the aim of realizing private lunar exploration and steadily advancing lunar development. The vision is for large numbers of YAOKIs to be active on the Moon.
This show reflects personal views only and does not represent any real organization. Thank you for your understanding.
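The parallel-generation idea described for Mercury Coder can be illustrated with a toy sketch: start from a fully masked sequence ("noise") and, over a few denoising steps, commit the most confident positions all at once, instead of emitting tokens strictly left to right. Everything below is an invented illustration under stated assumptions — the vocabulary, the `toy_model` stand-in, and the confidence scheme are hypothetical, not Inception's actual algorithm.

```python
import random

# Target sequence the toy "denoiser" knows how to reconstruct.
TARGET = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]

def toy_model(seq):
    """Stand-in for a learned denoiser: propose a token plus a confidence
    score for every masked (None) position. Here it simply peeks at TARGET."""
    return {i: (TARGET[i], random.random())
            for i, tok in enumerate(seq) if tok is None}

def diffusion_decode(length, steps=4):
    seq = [None] * length  # start fully masked ("noise")
    for _ in range(steps):
        proposals = toy_model(seq)
        if not proposals:
            break
        # Commit the top half of proposals by confidence -- in parallel,
        # rather than emitting one token at a time left to right.
        ranked = sorted(proposals.items(), key=lambda kv: -kv[1][1])
        for i, (tok, _conf) in ranked[: max(1, len(ranked) // 2)]:
            seq[i] = tok
    # Fill any positions still masked after the fixed number of steps.
    for i, (tok, _conf) in toy_model(seq).items():
        seq[i] = tok
    return seq

print(" ".join(diffusion_decode(len(TARGET))))
# -> def add ( a , b ) : return a + b
```

The point of the sketch is the shape of the loop: a real diffusion language model replaces `toy_model` with a learned network, but the decoding schedule — a small, fixed number of parallel refinement passes instead of one pass per token — is what makes the claimed throughput possible.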
Is pre-training dead? In this bonus episode of Mixture of Experts, guest host Bryan Casey is joined by Kate Soule and Chris Hay. On Thursday, Sam Altman dropped GPT-4.5 just after we wrapped our weekly recording. We got a few of our veteran experts on the podcast to analyze OpenAI's largest and "best" chat model yet. What's the hype? Tune in to this bonus episode to find out!
00:01 – Intro
00:25 – GPT-4.5
The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
Russia is developing a super-fast rocket for Mars travel (ICT Global, 2025-02-22 06:03:11)
Russia is testing a new plasma-based rocket engine that, using a system of charged hydrogen particles, would drastically cut travel time from Earth to Mars.
Protoclone V1: an android robot guaranteed to give you the chills! (ITBusiness, 2025-02-22 07:07:25)
According to a startup operating in the United States and Poland, the android is an anatomically accurate, faceless robot with more than 200 degrees of freedom, 1,000 Myofiber muscle fibers, and 500 sensors.
iPhone 16e pre-orders open at eMAG, which also surveyed Hungarian iPhone preferences (Digital Hungary, 2025-02-22 07:27:05)
Apple has announced the iPhone 16e, the maker's most affordable mid-range model within the iPhone 16 family; pre-orders at eMAG start on February 21, 2025. Members of the eMAG Genius subscription loyalty program can have their new phones delivered to a parcel locker for free.
Grok 3: a challenger to top-tier AI systems (Mínuszos, 2025-02-22 09:33:04)
Elon Musk's xAI has unveiled its latest artificial intelligence model, Grok 3, which it claims outperforms OpenAI's GPT-4o and DeepSeek V3 in math, science, and coding. An early version of Grok 3 also took first place in the Chatbot Arena tests, a platform where…
What happens if lightning strikes an airliner in flight? (Player, 2025-02-22 05:51:02)
If you think a lightning strike could be fatal for an airplane, it's time to think again. Not a big bang, really.
A drink that can help with weight loss, and it's not coffee (First Class, 2025-02-22 08:43:53)
Researchers have found that regularly drinking a beverage barely known to us is quite likely to aid weight loss, and it can reduce the feeling of hunger.
A new virus similar to SARS-CoV-2 has been found (24.hu, 2025-02-22 10:53:55)
Although the pathogen resembles the virus behind Covid-19, experts say there is no cause for concern.
European research-management training is being renewed with the involvement of a Hungarian university (Helló Sajtó!, 2025-02-22 06:30:01)
An international project launched in February is developing a unified European research-management training framework; as a key professional partner, Corvinus University of Budapest will not only take part in testing the curriculum but also play an important role in developing European-level quality guarantees for the training.
Formula 1 professionals also teach on Széchenyi István University's unique motorsport-engineering programme (Öko-drive, 2025-02-22 11:36:28)
The university in Győr has launched an English-language motorsport-engineering master's programme that is unique in Central Europe; eleven students have just begun its first year. The instructors include several professionals with Formula 1 experience, as well as engineers from teams holding world championship titles in other branches of motorsport…
Google has started disabling one of the most popular ad blockers in Chrome (PC Fórum, 2025-02-22 07:00:00)
This week Google moved the phase-out of Manifest V2, and of the extensions built on it, into a new stage in its Chrome browser. As part of this, the company has begun disabling uBlock Origin, one of the oldest and most popular ad blockers, in its software for everyday web users as well.
50-forint refunds: bottle deposit fees may rise because of an EU rule (vg.hu, 2025-02-22 06:15:00)
Given the astronomical sums involved, using the REpont return points is slowly becoming unavoidable.
AI takes aluminium casting to a new level: lighter parts that are just as strong (newtechnology.hu, 2025-02-22 05:33:58)
Sarginsons Industries of Coventry, England has presented the first designs describing the production of aluminium parts optimized with artificial intelligence. The company redesigned an already weight-reduced automotive subframe from 2022…
Could the secret of eternal youth lie not on Earth but in space? (Nicole, 2025-02-22 11:40:00)
Scientists have long been occupied with longevity and ways to slow ageing, but what if the key lies among the stars? NASA's Twin Study revealed a previously unknown dimension: cells age differently in space, and some processes can even be reversed. The results could shape not only future space travel…
Find our other episodes at podcast.hirstart.hu.
Send Everyday AI and Jordan a text message
Sam Altman laid out plans for GPT-5. Elon Musk says Grok 3 is dropping today. We finally have plans about Claude's new 'hybrid' model. AI news doesn't slow down. Neither should your company's AI adoption. Don't waste time. Join us on Mondays as we bring you the AI News that Matters.
Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions on AI
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn
Topics Covered in This Episode:
1. OpenAI GPT model updates
2. Grok 3 launch
3. Apple and Amazon voice assistants
4. YouTube Veo 2 integration
5. Apple and Meta's robotics
Timestamps:
00:00 Weekly AI News Guide
05:25 Elon Musk's OpenAI Strategy Unpacked
07:51 xAI's Ambitious AI with Grok
12:03 Apple & Amazon Voice Assistant Delays
16:07 Perplexity AI Launches Free Deep Research
19:25 Challenges of AI Copy-Pasting
20:14 Upcoming AI Research Revolution
23:38 YouTube AI Backgrounds Access Guide
29:09 OpenAI's Upcoming Model Releases
31:11 GPT-5 Tiered Access
36:22 OpenAI Aims to Rival Google
38:46 Claude Sonnet 3.5 Update Stagnation
41:47 Big Tech Embraces Robotics Expansion
45:12 Adobe Faces AI Competition
47:22 AI Developments: GPT, Claude, and More
Keywords: AI models, Grok 3, OpenAI, GPT-5, Anthropic, hybrid model, everyday AI, Elon Musk, xAI, Sam Altman, Grok 2, ChatGPT, Siri, Alexa, Perplexity AI, deep research, large language models, YouTube, Veo 2, Dream Screen, Google, Claude, Meta, humanoid robots, Adobe, Firefly, NVIDIA, reasoning models, GPT-4o, LLM arena, DeepSeek
Ready for ROI on GenAI? Go to youreverydayai.com/partner
Global AI business opportunities are poised to take off. How can you invest more effectively in 2025? 野村投信's ETF "Taiwan–US R&D Alliance" lets you capture it all at once!
00935: exclusive R&D-spending stock selection, targeting Taiwan's innovative tech stocks
00971: a strict selection of US R&D leaders across all industries, focused on the explosive potential of major US stocks
00935 and 00971: your top choices for investing in the world. https://fstry.pse.is/789wfs
-- The above is a Firstory Podcast advertisement --
xAI, the artificial-intelligence startup of US tech billionaire Elon Musk, released the latest version of its AI chatbot, Grok 3, on the 18th. Musk said Grok's goal is to understand the universe, and he hopes Grok 3 can claim a place in a fiercely competitive AI market that includes ChatGPT and China's DeepSeek. As for Musk's arch-rival, OpenAI CEO Sam Altman announced earlier the same day that GPT-4.5 is on the way, in a clear move to one-up Musk.
Leave a comment and tell me what you think of this episode: https://open.firstory.me/user/cku2d315gwbbo0947nezjmg86/comments
Watch 《寰宇全視界》 on YouTube
Telegram channel: https://t.me/forgeeks
We wrap up the week on the ForGeeks podcast. I'll talk about the shutdown of the HoloLens project, why Yandex wants gadgets of its own, chat themes in WhatsApp, and an Einstein ring in space. Listen to the new episode, read along, and subscribe to ForGeeks on Telegram.
OpenAI fights off Elon Musk while Sam drops info about GPT-5 (it's coming and it's free), the leaders of the world discuss AI safety & Apple's busy making robot lamps… Plus, OpenAI says their next AI model will be one of the world's best programmers, Zonos is a brand new, open source audio cloning tool, a deep dive into ByteDance's new Goku+ model which makes realistic AI influencers and we play with Pika's amazing new Pikadditions platform. AND WE TRY AND FAIL TO MAKE AI EPIC FAILS. Just another fun episode.
Join the discord: https://discord.gg/muD2TYgC8f
Join our Patreon: https://www.patreon.com/AIForHumansShow
AI For Humans Newsletter: https://aiforhumans.beehiiv.com/
Follow us for more on X @AIForHumansShow
Join our TikTok @aiforhumansshow
To book us for speaking, please visit our website: https://www.aiforhumans.show/
// Show Links //
Elon / Sam Drama Ends Up in Offer To Buy OpenAI
https://www.reuters.com/legal/elon-musk-openai-head-court-spar-over-nonprofit-conversion-2025-02-04/
BREAKING: Sam Altman on OpenAI's Roadmap (GPT-4.5 & GPT-5)
https://x.com/sama/status/1889755723078443244
OpenAI is not for sale says Head of OpenAI Board
https://x.com/tsarnick/status/1889412799660786126
OAI Will Reject the Offer
https://www.theinformation.com/articles/openai-ceo-says-board-will-reject-musks-97-billion-offer?rc=c3oojq&shared=95a4828df75fbb89
Sam Altman's Three Observations Blog Post
https://x.com/sama/status/1888695926484611375
Sam Altman says next gen OAI Programmer Agent will likely be #1 in the world
https://x.com/tsarnick/status/1888111042301211084
New Competitive Coding Paper From OAI
https://x.com/arankomatsuzaki/status/1889522974467957033
Vibe Coding Tweet From Karpathy
https://x.com/karpathy/status/1886192184808149383
JD Vance Speaks At AI Summit in France
https://x.com/BasedBeffJezos/status/1889341527349948432
https://www.nytimes.com/2025/02/11/world/europe/vance-speech-paris-ai-summit.html
Mistral's Le Chat is Le Fast
https://techcrunch.com/2025/02/06/mistral-releases-its-ai-assistant-on-ios-and-android/
Try Le Chat Here
https://chat.mistral.ai/chat
Plug Baby Plug from Macron
https://x.com/AIForHumansShow/status/1889096014818091291
The Pope Weighs In
https://www.vaticannews.va/en/pope/news/2025-02/pope-francis-to-artificial-intelligence-action-summit-in-paris.html
Zonos, OpenSource AI Audio Voice Clone Tool
https://x.com/ZyphraAI/status/1888996367923888341
ByteDance's Goku New AI Video Model
https://x.com/_akhaliq/status/1888811509565808924
https://saiyan-world.github.io/goku/
Batman Fan Filmmakers Return With New Star Wars Fan Film
https://x.com/Kavanthekid/status/1889371011667144724
r/CursedAI "Who Gives The Best Foot Massage?"
https://www.reddit.com/r/CursedAI/comments/1ilmjt0/who_gives_the_best_foot_massage/
APPLE ROBOTICS x EMOTION
https://x.com/TheHumanoidHub/status/1887754044980273610
ELEGNT AI Lamp
https://x.com/hiltonsimon/status/1887716752278093914
AI PHYSICS COMPARISON
https://x.com/venturetwins/status/1889064241900007885
AI Epic Fails
https://x.com/LiveBetween2B/status/1889218092712198562
Pika Additions
https://pika.art/
Gavin's Pikadditions Examples
https://x.com/AIForHumansShow/status/1888336091356201104
https://x.com/AIForHumansShow/status/1888439022046851313
2627 ANR ワンボタンの声 download link ■ Aired 1/23: an Apple Vision Pro hackathon on March 21, 2025 (non-engineers very welcome); "fullmoon," an open-source local LLM client for iOS, macOS, and visionOS, has been released; OpenAI releases "ChatGPT for macOS," which supports GPT-4o-powered Canvas creation and tasks; "ChatGPT" can now schedule recurring actions and reminders, starting with the paid version; and more news…
Jason Howell and Jeff Jarvis discuss the winners and losers in AI for 2024, the persistence of AI hallucinations, the most useful AI tools for creators, and their expectations for AI development in 2025.
We are back with more exciting IDWeek 2024 content. In this episode, Breakpoints hostesses Drs. Erin McCreary, Julie Ann Justo, Jeannette Bouchard, and Megan Klatt highlight more of our favorite sessions and posters at IDWeek; this episode is a must-listen if you are an IDWeek nerd like us!
References:
Perret et al. Application of OpenAI GPT-4 for the retrospective detection of catheter-associated urinary tract infections in a fictitious and curated patient data set. 10.1017/ice.2023.189
Wiemken et al. Assisting the infection preventionist: Use of artificial intelligence for health care–associated infection surveillance. 10.1016/j.ajic.2024.02.007
Leekha et al. Evaluation of hospital-onset bacteraemia and fungaemia in the USA as a potential healthcare quality measure: a cross-sectional study. 10.1136/bmjqs-2023-016831
Diekema et al. Are Contact Precautions "Essential" for the Prevention of Healthcare-associated Methicillin-Resistant Staphylococcus aureus? 10.1093/cid/ciad571
Martin et al. Contact precautions for MRSA and VRE: where are we now? A survey of the Society for Healthcare Epidemiology of America Research Network. 10.1017/ash.2024.350
Browne et al. Investigating the effect of enhanced cleaning and disinfection of shared medical equipment on health-care-associated infections in Australia (CLEEN): a stepped-wedge, cluster randomised, controlled trial. 10.1016/S1473-3099(24)00399-2
Protect trial: Decolonization in Nursing Homes to Prevent Infection and Hospitalization. 10.1056/NEJMoa2215254
Aldardeer et al. Early Versus Late Antipseudomonal β-Lactam Antibiotic Dose Adjustment in Critically Ill Sepsis Patients With Acute Kidney Injury: A Prospective Observational Cohort Study. 10.1093/ofid/ofae059
Schmiemann et al. Effects of a multimodal intervention in primary care to reduce second line antibiotic prescriptions for urinary tract infections in women: parallel, cluster randomised, controlled trial. 10.1136/bmj-2023-076305
Vernacchio et al. Improving Short Course Treatment of Pediatric Infections: A Randomized Quality Improvement Trial. 10.1542/peds.2023-063691
Advani et al. Bacteremia From a Presumed Urinary Source in Hospitalized Adults With Asymptomatic Bacteriuria. 10.1001/jamanetworkopen.2024.2283
Saif et al. Clinical decision support for gastrointestinal panel testing. 10.1017/ash.2024.15
Bekker et al. Twice-Yearly Lenacapavir or Daily F/TAF for HIV Prevention in Cisgender Women. 10.1056/NEJMoa2407001
Montini et al. Short Oral Antibiotic Therapy for Pediatric Febrile Urinary Tract Infections: A Randomized Trial. 10.1542/peds.2023-062598
Nielsen et al. Oral versus intravenous empirical antibiotics in children and adolescents with uncomplicated bone and joint infections: a nationwide, randomised, controlled, non-inferiority trial in Denmark. 10.1016/S2352-4642(24)00133-0
Kaasch et al. Efficacy and safety of an early oral switch in low-risk Staphylococcus aureus bloodstream infection (SABATO): an international, open-label, parallel-group, randomised, controlled, non-inferiority trial. 10.1016/S1473-3099(23)00756-9
AMIKINHAL: Inhaled Amikacin to Prevent Ventilator-Associated Pneumonia. 10.1056/NEJMoa2310307
PROPHY-VAP: Ceftriaxone to prevent early ventilator-associated pneumonia in patients with acute brain injury: a multicentre, randomised, double-blind, placebo-controlled, assessor-masked superiority trial. 10.1016/S2213-2600(23)00471-X
AVENIR: Azithromycin to Reduce Mortality — An Adaptive Cluster-Randomized Trial. 10.1056/NEJMoa2312093
Thomas et al. Comparison of Two High-Dose Versus Two Standard-Dose Influenza Vaccines in Adult Allogeneic Hematopoietic Cell Transplant Recipients. 10.1093/cid/ciad458
Schuster et al. The Durability of Antibody Responses of Two Doses of High-Dose Relative to Two Doses of Standard-Dose Inactivated Influenza Vaccine in Pediatric Hematopoietic Cell Transplant Recipients: A Multi-Center Randomized Controlled Trial. 10.1093/cid/ciad534
Mahadeo et al. Tabelecleucel for allogeneic haematopoietic stem-cell or solid organ transplant recipients with Epstein-Barr virus-positive post-transplant lymphoproliferative disease after failure of rituximab or rituximab and chemotherapy (ALLELE): a phase 3, multicentre, open-label trial. 10.1016/S1470-2045(23)00649-6
Khoury et al. Third-party virus-specific T cells for the treatment of double-stranded DNA viral reactivation and posttransplant lymphoproliferative disease after solid organ transplant. 10.1016/j.ajt.2024.04.009
Spec et al. MSG-15: Super-Bioavailability Itraconazole Versus Conventional Itraconazole in the Treatment of Endemic Mycoses—A Multicenter, Open-Label, Randomized Comparative Trial. 10.1093/ofid/ofae010
Guest: Winn Schwartau
On LinkedIn | https://www.linkedin.com/in/winnschwartau/
Hosts: Hutch
On ITSPmagazine
Send Everyday AI and Jordan a text message

No joke... this has been the busiest week in GenAI news. Ever. Amazon -- releases frontier models. Meta -- brings us a new Llama. OpenAI -- new models and features. Google -- shipping AI literally everywhere. What happened? Why is all of this happening now? We'll dive in and make you the smartest person in AI at your company.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions on AI
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
1. Amazon AI Developments
2. Eleven Labs Voice Agents
3. Microsoft AI Developments
4. Google AI Model Updates
5. xAI Updates
6. OpenAI's Latest Releases and Plans
7. Meta's Llama 3.3 Model
8. OpenAI-Microsoft Relationship

Timestamps:
03:00 OpenAI o1 Pro: Elevated AI, exclusive, costly.
08:14 Copilot Vision provides insights, prioritizes privacy, feedback-driven.
11:58 Gemini surpasses OpenAI GPT-4 in leaderboard.
13:58 Google DeepMind outperforms ENS weather, AI advancements
19:47 Amazon's model surpasses OpenAI's context window.
22:27 Amazon quickly reaches top-tier model status.
24:33 Eleven Labs platform: multilingual AI for customer interaction.
29:12 Musk criticizes OpenAI, joins government, impacts technology.
33:24 OpenAI discusses removing AGI access clause with Microsoft.
34:51 OpenAI's redefined AGI criticized by Elon Musk.
40:47 A sneaky release of a semi-open model.
44:51 Advanced voice mode updates, some features rumored.
46:47 OpenAI announces Operator preview, waitlist expected.

Keywords: Jordan Wilson, everydayai.com, Amazon Frontier model, ChatGPT, AWS, Nova Canvas, Nova Reel, Anthropic, Eleven Labs, David Sacks, Microsoft Gemini Live, Microsoft Copilot Vision, Google Gemini 1206, Google GenCast, Google Veo, Google Genie 2, Sundar Pichai, xAI, Elon Musk, OpenAI o1 pro model, ChatGPT Pro, AGI definition, Meta Llama 3.3 model, OpenAI-Microsoft relationship, OpenAI public benefit corporation, OpenAI restructuring, AI regulations, Department of Government Efficiency, Large language models, AI development

Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/
Send Everyday AI and Jordan a text message

GPT-5 is delayed. So is Copilot Recall. Annnnd Google's Project Astra? While Big Tech is facing AI delays, Anthropic is shipping, the Chinese military is reportedly using Meta's open Llama model, and LinkedIn has a new AI agent to do its recruiting. Here's this week's AI news that matters.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions on AI
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
1. ChatGPT Search
2. Google AI gets real-time search
3. Anthropic Claude gets updates
4. OpenAI GPT-5 delayed
5. Google's Project Astra delayed
6. Microsoft Recall feature delayed
7. China hacks Meta's Llama model

Timestamps:
00:00 Improved ChatGPT: deeper engagement, Google-like search.
04:42 Google unveils real-time search for Gemini AI.
07:09 OpenAI and Google's strategies differ for developers.
12:30 Google demonstrates delayed Project Astra with computer vision.
15:52 Recall testing postponed; opt-in feature due December.
17:34 Microsoft may rebrand AI as Windows Intelligence.
23:26 LinkedIn aims to automate 80% of recruiting tasks.
26:45 China uses open models for military purposes.
28:45 Sam Altman confirms no imminent GPT-5 release.
32:05 OpenAI prioritizes reasoning models; GPT-5 delayed.
35:08 AI updates: LinkedIn, Meta, Reddit AMA, and more.

Keywords: ChatGPT, New Search Tool, Google, Gemini AI, AI Business Competition, Anthropic, PDF processing system, AI Industry Trends, Jordan Wilson, Google Project Astra, Microsoft Copilot Plus Recall Feature, Windows Intelligence, Microsoft Rebranding, advanced voice mode feature, DALL-E, Sora, GPT-5, Search GPT, Claude Mac app, LinkedIn AI hiring assistant, China, Meta's Llama model, military AI development, Sam Altman, Reddit AMA, Australia, rebranding, Copilot, potential bias, export restrictions.

Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/
Betteridge's law says no: with seemingly infinite flavors of RAG, and >2 million tokens of context + prompt caching from Anthropic/DeepMind/DeepSeek, it's reasonable to believe that "in-context learning is all you need."

But then there's Cosine Genie, the first to make a huge bet using OpenAI's new GPT-4o fine-tuning for code at the largest scale it has ever been used externally, resulting in what is now the #1 coding agent in the world according to SWE-Bench Full, Lite, and Verified.

SWE-Bench has been the most successful agent benchmark of the year, receiving honors at ICLR (our interview here) and recently being verified by OpenAI. Cognition (Devin) was valued at $2b after reaching 14% on it. So it is very, very big news when a new agent appears to beat all other solutions, by a lot.

While this number is self-reported, it seems to be corroborated by OpenAI, who also award it clear highest marks on SWE-Bench Verified.

The secret is GPT-4o fine-tuning on billions of tokens of synthetic data.

* Finetuning: As OpenAI says: "Genie is powered by a fine-tuned GPT-4o model trained on examples of real software engineers at work, enabling the model to learn to respond in a specific way. The model was also trained to be able to output in specific formats, such as patches that could be committed easily to codebases." Due to the scale of Cosine's finetuning, OpenAI worked closely with them to figure out the size of the LoRA: "They have to decide how big your LoRA adapter is going to be… because if you had a really sparse, large adapter, you're not going to get any signal in that at all. So they have to dynamically size these things."

* Synthetic data: we need to finetune on the process of making code work instead of only training on working code. "…we synthetically generated runtime errors.
Where we would intentionally mess with the AST to make stuff not work, or index out of bounds, or refer to a variable that doesn't exist, or errors that the foundational models just make sometimes that you can't really avoid; you can't expect it to be perfect."

Genie also has a 4-stage workflow with the standard LLM OS tooling stack that lets it solve problems iteratively.

Full Video Pod

Like and subscribe etc!

Show Notes
* Alistair Pullen - Twitter, Linkedin
* Cosine Genie launch, technical report
* OpenAI GPT-4o finetuning GA
* Llama 3 backtranslation
* Cursor episode and Aman + SWEBench at ICLR episode

Timestamps
* [00:00:00] Suno Intro
* [00:05:01] Alistair and Cosine intro
* [00:16:34] GPT4o finetuning
* [00:20:18] Genie Data Mix
* [00:23:09] Customizing for Customers
* [00:25:37] Genie Workflow
* [00:27:41] Code Retrieval
* [00:35:20] Planning
* [00:42:29] Language Mix
* [00:43:46] Running Code
* [00:46:19] Finetuning with OpenAI
* [00:49:32] Synthetic Code Data
* [00:51:54] SynData in Llama 3
* [00:52:33] SWE-Bench Submission Process
* [00:58:20] Future Plans
* [00:59:36] Ecosystem Trends
* [01:00:55] Founder Lessons
* [01:01:58] CTA: Hiring & Customers

Descript Transcript

[00:01:52] AI Charlie: Welcome back. This is Charlie, your AI cohost. As AI engineers, we have a special focus on coding agents, fine-tuning, and synthetic data. And this week, it all comes together with the launch of Cosine's Genie, which reached 50 percent on SWE-Bench Lite, 30 percent on the full SWE-Bench, and 44 percent on OpenAI's new SWE-Bench Verified.

[00:02:17] All state-of-the-art results by the widest ever margin recorded compared to former leaders Amazon Q, AutoCodeRover, and Factory Code Droid. As a reminder, Cognition's Devin went viral with a 14 percent score just five months ago. Cosine did this by working closely with OpenAI to fine-tune GPT-4o, now generally available to you and me, on billions of tokens of code, much of which was synthetically generated.

[00:02:47] Alistair Pullen: Hi, I'm Ali.
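The AST-mutation idea described above can be sketched in a few lines. This is a hypothetical illustration, not Cosine's actual pipeline: it uses Python's stdlib `ast` module to corrupt a working snippet so the resulting (broken code, runtime error) pair could serve as a synthetic training example.

```python
import ast

def break_variable_reference(source: str) -> str:
    """Rename one name *use* (an ast.Load context) so the code raises
    NameError at runtime -- a synthetic 'broken' sample."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Only corrupt loads (uses); leave defining stores intact.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            node.id = node.id + "_undefined"
            break
    return ast.unparse(tree)  # requires Python 3.9+

working = "total = 1 + 2\nprint(total)"
broken = break_variable_reference(working)

# Run the mutated code and capture the runtime error it produces;
# the (broken, error) pair becomes a training example.
try:
    exec(compile(broken, "<synthetic>", "exec"), {})
    error = None
except NameError as e:
    error = str(e)
```

A real pipeline would presumably apply many mutation types (off-by-one indices, wrong argument counts, type confusions) across whole repositories, but the principle is the same: derive "what going wrong looks like" mechanically from code that works.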
Co-founder and CEO of Cosine, a human reasoning lab. And I'd like to show you Genie, our state-of-the-art, fully autonomous software engineering colleague. Genie has the highest score on SWE-Bench in the world. And the way we achieved this was by taking a completely different approach. We believe that if you want a model to behave like a software engineer, it has to be shown how a human software engineer works.[00:03:15] We've designed new techniques to derive human reasoning from real examples of software engineers doing their jobs. Our data represents perfect information lineage, incremental knowledge discovery, and step-by-step decision making. Representing everything a human engineer does logically. By actually training Genie on this unique dataset, rather than simply prompting base models, which is what everyone else is doing, we've seen that we're no longer simply generating random code until some works.[00:03:46] It's tackling problems like[00:03:48] AI Charlie: a human. Alistair Pullen is CEO and co-founder of Cosine, and we managed to snag him on a brief trip stateside for a special conversation on building the world's current number one coding agent. Watch out and take care.[00:04:07] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.[00:04:16] swyx: Hey, and today we're back in the studio. In person, after about three to four months in visa jail and travels and all other fun stuff that we talked about in the previous episode.[00:04:27] But today we have a special guest, Ali Pullen from Cosine. Welcome. Hi, thanks for having me. We're very lucky to have you because you're on a two-day trip to San Francisco. Yeah, I wouldn't recommend it. I would not[00:04:38] Alistair Pullen: recommend it.
Don't fly from London to San Francisco for two days.[00:04:40] swyx: And you launched Genie on a plane.[00:04:42] On plane Wi-Fi, um, claiming state of the art in SWE-Bench, which we're all going to talk about. I'm excited to dive into your whole journey, because it has been a journey. I've been lucky to be a small angel in part of that journey. And it's exciting to see that you're launching to such acclaim and, you know, such results.[00:05:01] Alistair and Cosine intro[00:05:01] swyx: Um, so I'll go over your brief background, and then you can sort of fill in the blanks on what else people should know about you. You did your bachelor's in computer science at Exeter.[00:05:10] Speaker 6: Yep.[00:05:10] swyx: And then you worked at a startup that got acquired into GoPuff and round about 2022, you started working on a stealth startup that became a YC startup.[00:05:19] What's that? Yeah. So[00:05:21] Alistair Pullen: basically when I left university, I, I met my now co-founder, Sam. At the time we were both mobile devs. He was an Android developer; I was an iOS developer. And whilst at university, we built this sort of small consultancy, sort of, we'd um, be approached to build projects for people and we would just take them up and start with, they were student projects.[00:05:41] They weren't, they weren't anything crazy or anything big. We started with those and over time we started doing larger and larger projects, more interesting things. And then actually, when we left university, we just kept doing that. We didn't really get jobs, traditional jobs. It was also like in the middle of COVID, middle of lockdown.[00:05:57] So we were like, this is a pretty good gig. We'll just keep like writing code in our bedrooms. And yeah, that's it. We did that for a while. And then a friend of ours that we went to Exeter with started a YC startup during COVID. And it was one of these fast grocery delivery companies.
At the time I was living in the deepest, darkest countryside in England, where fast grocery companies are still not a thing.[00:06:20] So he, he sort of pitched me this idea and was like, listen, like I need an iOS dev, do you fancy coming along? And I thought, absolutely. It was a chance to get out of my parents' house, chance to move to London, you know, do interesting things. And at the time, truthfully, I had no idea what YC was. I had no idea.[00:06:34] I wasn't in the startup space. I knew I liked coding and building apps and stuff, but I'd never, never really done anything in that area. So I said, yes, absolutely. I moved to London just sort of as COVID was ending and yeah, worked at what was Fancy for about a year and a half. Then we brought Sam along as well.[00:06:52] So we, Sam and I, were the two engineers at Fancy for basically its entire life, and we built literally everything. So like the, the front, the client mobile apps, the, the backends, the internal like stock management system, the driver routing algorithms, all those things. Literally like everything. It was my first.[00:07:12] You know, both of us were super inexperienced. We didn't have, like, proper engineering experience. There were definitely decisions we'd do differently now. We'd definitely buy a lot of stuff off the shelf, stuff like that. But it was the initial dip of the toe into, like, the world of startups, and we were both, like, hooked immediately.[00:07:26] We were like, this is so cool. This sounds so much better than all our friends who were, like, consultants and doing, like, normal jobs, right? We did that, and it ran its course, and after, I want to say, 18 months or so, GoPuff came and acquired us.
And there was obviously a transitionary period, an integration period, like with all acquisitions, and we did that, and as soon as we'd vested what we wanted to vest, and as soon as we thought, okay, this chapter is sort of done, uh, in about 2022, we left and we knew that we wanted to go alone and try something, like we'd had this taste.[00:07:54] Now we knew we'd seen how a, like a YC startup was managed like up close and we knew that we wanted to do something similar ourselves. We had no idea what it was at the time. We just knew we wanted to do something. So we, we tried, um, some small projects in various different areas, but then GPT-3 came along.[00:08:12] He'd seen it on Reddit, and that's his source of all knowledge. Yeah, Sam loves Reddit. I'd actually heard of GPT-2. And obviously had like loosely followed what OpenAI had done with, what was the game they trained a model to play? Dota. Was it Dota? Yeah. So I'd followed that and, I knew loosely what GPT-2 was, I knew what BERT was, so I was like, okay, this GPT-3 thing sounds interesting.[00:08:35] And he just mentioned it to me on a walk. And I then went home and, like, googled GPT-3. There was the playground. And the model was DaVinci 2 at the time. And it was just the old school playground, completions, nothing crazy, no chat, no nothing. I miss completions though. Yeah. Oh, completions. Honestly, I had this conversation in OpenAI's office yesterday.[00:08:54] I was like, I just went. I know. But yeah, so we, we, um, I started playing around with the, the playground and the first thing I ever wrote into it was like, hello world, and it gave me some sort of like, fairly generic response back. I was like, okay, that looks pretty cool. The next thing was.
I looked through the docs, um, also they had a lot of example prompts because I had no idea.[00:09:14] I didn't know if the, if you could put anything in, I didn't know if you had to structure it in a certain way or whatever, and I, and I saw that it could start writing like tables and JSON and stuff like that. So I was like, okay, can you write me something in JSON? And it did. And I was like, oh, wow, this is, this is pretty cool.[00:09:28] Um, can it, can it just write arbitrary JSON for me? And, um, immediately as soon as I realized that, my mind was racing and I like got Sam in and we just started messing around in the playground, like fairly innocently to start with. And then, of course, both being mobile devs and also seeing, at that point, we learned about what the Codex model was.[00:09:48] It was like, this thing's trained to write code, sounds awesome. And Copilot was starting, I think; I can't actually remember if Copilot had come out yet, it might have done. It's round about the same time as Codex. Round about the same time, yeah. And we were like, okay, as mobile devs, let's see what we can do.[00:10:02] So the initial thing was like, okay, let's see if we can get this AI to build us a mobile app from scratch. We eventually built the world's most flimsy system, which was back in the day with like 4,000-token context windows, like chaining prompts, trying to keep as much context from one to the other, all these different things, where basically, essentially, you'd put an app idea in a box, and then we'd do, like, very high level stuff, figuring out what the stack should be, figuring out what the frontend should be written in, backend should be written in, all these different things, and then we'd go through, like, for each thing, more and more levels of detail, until the point that you actually got Codex to write the code for each thing.[00:10:41] And we didn't do any templating or anything.
We were like, no, we're going to write all the code from scratch every time, which is basically why it barely worked. But there were like occasions where you could put in something and it would build something that did actually run. The backend would run, the database would work.[00:10:54] And we were like, oh my God, this is insane. This is so cool. And that's what we showed to our co-founder Yang. I met my co-founder Yang through, through Fancy, because his wife was their first employee. And, um, we showed him and he was like, you've discovered fire. What is this? This is insane. He has a lot more startup experience.[00:11:12] Historically, he's had a few exits in the past and has been through all different industries. He's like our dad. He's a bit older. He hates me saying that. He's your COO now? He's our COO. Yeah. And, uh, we showed him and he was like, this is absolutely amazing. Let's just do something. Cause he, he, at the time, um, was just about to have a child, so he didn't have anything going on either.[00:11:29] So we, we applied to YC, got an interview. The interview was, as most YC interviews are, short, curt, and pretty brutal. They told us they hated the idea. They didn't think it would work. And that's when we started brainstorming. It was almost like the interview was like an office hours kind of thing. And we were like, okay, given what you know about the space now and how to build things with these LLMs, like what can you bring out of what you've learned in building that thing into something that might be a bit more useful to people on the daily, and also YC obviously likes B2B startups a little bit more, at least at the time they did, back then.[00:12:01] So we were like, okay, maybe we could build something that helps you with existing codebases, like can sort of automate development stuff with existing codebases, not knowing at all what that would look like, or how you would build it, or any of these things.
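The chained-prompt scaffold Alistair describes earlier, decomposing an app idea step by step while squeezing each step's output into a 4,000-token window, might look roughly like this sketch. The `llm` function is a stub standing in for whatever completion API you use; all names here are illustrative, not Cosine's actual code:

```python
def llm(prompt: str) -> str:
    """Stand-in for a real completion API call (e.g. an old-style
    Codex/davinci completions endpoint). Stubbed for illustration."""
    return f"<completion for: {prompt[:40]}...>"

def truncate(text: str, max_chars: int = 2000) -> str:
    """Crude stand-in for staying under a small context window."""
    return text[:max_chars]

def build_app(idea: str) -> dict:
    # Step 1: high-level decisions (stack, frontend, backend).
    plan = llm(f"Choose a tech stack for this app idea: {idea}")
    # Step 2: expand into components, carrying forward only a
    # truncated summary of the previous step, not the whole history.
    components = llm(f"Plan: {truncate(plan)}\nList the components to build.")
    # Step 3: generate code per component, again passing only the
    # compressed context so each call fits the window.
    code = llm(f"Components: {truncate(components)}\nWrite the code.")
    return {"plan": plan, "components": components, "code": code}

result = build_app("grocery delivery app")
```

The fragility he mentions follows directly from this design: each hop loses information to truncation, so errors compound down the chain, which is exactly what long-context models later made unnecessary.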
And they were like, yeah, that sounds interesting.[00:12:15] You should probably go ahead and do that. You're in, you've got two weeks to build us an MVP. And we were like, okay, okay. We did our best. The MVP was absolutely horrendous. It was a CLI tool. It sucked. And, um, at the time we were like, we, we don't even know how to build what we want to build. And we didn't really know what we wanted to build, to be honest.[00:12:33] Like, we knew we wanted to try to help automate dev work, but back then we just didn't know enough about how LLM apps were built, the intricacies and all those things. And also, like, the LLMs themselves, like 4,000 tokens, you're not going very far, they're extremely expensive. So we ended up building a, uh, a codebase retrieval tool, originally.[00:12:51] Our thought process originally was, we want to build something that can do our jobs for us. That is like the gold star, we know that. We've seen like there are glimpses of it happening with our initial demo that we did. But we don't see the path of how to do that at the moment. Like the tech just wasn't there.[00:13:05] So we were like, well, there are going to be some things that you need to build this when the tech does catch up. So retrieval being one of the most important things, like the model is going to have to, like, pull code out of a code base somehow. So we were like, well, let's just build the tooling around it.[00:13:17] And eventually when the tech comes, then we'll be able to just like plug it into our, our tooling and then it should work basically. And to be fair, that's basically what we've done. And that's basically what's happened, which is very fortunate. But in the meantime, whilst we were waiting for everything to sort of become available, we built this code base retrieval tool.[00:13:34] That was the first thing we ever launched when we were in YC like that, and it didn't work.
It was really frustrating for us because it was just me and Sam, like, working like all hours trying to get this thing to work. It was quite a big task in and of itself, trying to get like a good semantic search engine working that could run locally on your machine.[00:13:51] We were trying to avoid sending code to the cloud as much as possible. And then for very large codebases, you're like, you know, millions of lines of code. You're trying to do some sort of like local HNSW thing that runs inside your VS Code instance that like eats all your RAM as you've seen in the past.[00:14:05] All those different things. Yep. Yeah.[00:14:07] swyx: My first call with you, I had trouble.[00:14:07] Alistair Pullen: You were like, yeah, it sucks, man. I know, I know. I know it sucks. I'm sorry. I'm sorry. But building all that stuff was essentially the first six to eight months of what at the time was Buildt. Which, by the way, Buildt. Yeah, it was a terrible, terrible name.[00:14:25] It was the worst,[00:14:27] swyx: like, part of trying to think about whether I would invest is whether or not people could pronounce it.[00:14:32] Alistair Pullen: No, when we, so when we went on our first ever YC, like, retreat, no one got the name right. They were like, Built, Build, well, um, and then we actually changed the name to Cosine, like, although some people would spell it as Cosign, as if you're cosigning for an apartment or something like that, so, like, can't win.[00:14:49] Yeah. That was what Buildt was back then. But the ambition, and I did a talk on this back in the end of 2022, the ambition to like build something that essentially automated our jobs was still very much like core to what we were doing. But for a very long time, it was just never apparent to us, like, how would you go about doing these things? Even when, like, you had 3.5 16k, it suddenly felt huge, because you've gone from 4k to 16k, but even then 16k is like, a lot of Python files are longer than 16k.
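A toy version of the codebase-retrieval idea (embed code chunks, rank by cosine similarity) can be sketched in pure Python. A real system like the one described here would use a learned embedding model and an approximate-nearest-neighbour index such as HNSW rather than the bag-of-words vectors and brute-force scan below; everything here is illustrative:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use a neural model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CodeIndex:
    """Brute-force stand-in for a local ANN (e.g. HNSW) index."""
    def __init__(self):
        self.chunks = []  # (path, text, vector) triples

    def add(self, path: str, text: str):
        self.chunks.append((path, text, embed(text)))

    def search(self, query: str, k: int = 3):
        qv = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(qv, c[2]),
                        reverse=True)
        return [(path, text) for path, text, _ in ranked[:k]]

index = CodeIndex()
index.add("auth.py", "def login(user, password): verify the user password")
index.add("db.py", "def connect(): open database connection")
hits = index.search("how do we check a user password", k=1)
```

The RAM problem Alistair mentions is the flip side of this design: keeping every chunk's vector resident in memory inside a VS Code extension scales linearly with codebase size, which is why million-line repositories hurt.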
So you can't, you know, before you even start doing a completion, even then we were like, eh, yeah, it looks like we're still waiting. And then, like, towards the end of last year, you then start, you see 32k.[00:15:28] 32k was really smart. It was really expensive, but also, like, you could fit a decent amount of stuff in it. 32k felt enormous. And then, finally, 128k came along, and we were like, right, this is, like, this is what we can actually deal with. Because, fundamentally, to build a product like this, you need to get as much information in front of the model as possible, and make sure that everything it ever writes in output can be[00:15:49] traced back to something in the context window, so it's not hallucinating it. As soon as that model existed, I was like, okay, I know that this is now going to be feasible in some way. We'd done early sort of dev work on Genie using 3.5 16k. And that was a very, very like crude way of proving that this loop that we were after and the way we were generating the data actually had signal and worked and could do something.[00:16:16] But the model itself was not useful because you couldn't ever fit enough information into it for it to be able to do the task competently, and also the base intelligence of the model. I mean, 3.5, anyone who's used 3.5 knows the base intelligence of the model is lacking, especially when you're asking it to do software engineering, which is quite involved.[00:16:34] GPT4o finetuning[00:16:34] Alistair Pullen: So, we saw the 128k context model and um, at that point we'd been in touch with OpenAI about our ambitions and like how we wanted to build it. We essentially, I just took a punt, I was like, I'm just going to ask to see, can we like train this thing?
Because at the time 4 Turbo had just come out and back then there was still a decent amount of lag time between like OpenAI releasing a model and then allowing you to fine-tune it in some way.[00:16:59] They've gotten much better about that recently; like 4o fine-tuning came out, I think, a day after, and 4o mini fine-tuning came out like a day after the model did. And I know that's something they're definitely like, optimising for super heavily inside, which is great to see.[00:17:11] swyx: Which is a little bit, you know, for a year or so, YC companies had like a direct Slack channel to OpenAI.[00:17:17] We still do. Yeah. Yeah. So, it's a little bit of a diminishing of the YC advantage there. Yeah. If they're releasing this fine-tuning[00:17:23] Alistair Pullen: ability like a day after. Yeah, no, no, absolutely. But like, you can't build a startup otherwise. The advantage is obviously nice and it makes you feel fuzzy inside. But like, at the end of the day, it's not that that's going to make you win.[00:17:34] But yeah, no, so like we'd spoken to Shamul there, DevRel guy, I'm sure you know him. I think he's head of solutions or something. In their applied team, yeah, we'd been talking to him from the very beginning when we got into YC, and he's been absolutely fantastic throughout. I basically had pitched him this idea back when we were doing it on 3.5 16k,[00:17:53] and I was like, this is my, this is my crazy thesis. I want to see if this can work. And as soon as like that 128k model came out, I started like laying the groundwork. I was like, I know this definitely isn't possible because he released it like yesterday, but know that I want it. And in the interim, like, GPT-4, like, 8k fine-tuning came out.[00:18:11] We tried that, it's obviously even fewer tokens, but the intelligence helped. And I was like, if we can marry the intelligence and the context window length, then we're going to have something special.
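The LoRA-sizing point quoted in the show notes (a large, sparse adapter spreads the training signal too thin) comes down to simple arithmetic: for a weight matrix of shape d_out x d_in, a rank-r adapter adds r x (d_in + d_out) trainable parameters, so the rank has to be matched to the amount of training data. A quick illustrative calculation (the dimensions are made up, not OpenAI's actual configuration):

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by a LoRA adapter W + B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank)."""
    return rank * d_in + d_out * rank

# Hypothetical 4096 x 4096 transformer projection:
full = 4096 * 4096                          # ~16.8M params if fully trained
small = lora_params(4096, 4096, rank=8)     # 65,536 adapter params
large = lora_params(4096, 4096, rank=256)   # 2,097,152 adapter params

# Higher rank means more capacity, but every extra adapter weight
# needs training signal to be useful -- hence dynamic sizing of the
# adapter against the size of the fine-tuning dataset.
```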
And eventually, we were able to get on the Experimental Access Program, and we got access to 4 Turbo fine-tuning.[00:18:25] As soon as we did that, because in the entire run up to that we built the data pipeline, we already had all that set up, so we were like, right, we have the data, now we have the model, let's put it through and iterate, essentially, and that's, that's where, like, Genie as we know it today, really was born. I won't pretend like the first version of Genie that we trained was good.[00:18:45] It was a disaster. That's where you realize all the implicit biases in your data set. And you realize that, oh, actually this decision you made that was fairly arbitrary was the wrong one. You have to do it a different way. Other subtle things like, you know, how you write Git diffs using LLMs and how you can best optimize that to make sure they actually apply and work and loads of different little edge cases.[00:19:03] But as soon as we had access to the underlying tool, we were like, we can actually do this. And I breathed a sigh of relief because, like, it wasn't a done deal, but I knew that we could build something useful. I mean, I knew that we could build something that would be measurably good on whatever eval at the time that you wanted to use.[00:19:23] Like at the time, back then, we weren't actually that familiar with SWE-Bench. But once Devin came out and they announced the SWE-Bench score, that's when my life took a turn. Challenge accepted. Yeah, challenge accepted. And that's where like, yes, that's where my friendships have gone. My sleep has gone. My weight.[00:19:40] Everything went into SWE-Bench, and yeah, it was actually a very useful tool in building Genie beforehand. It was like, yes, vibe check this thing and see if it's useful. And then all of a sudden you have an actual measure to, to see, like, could it do software engineering?
Not, not the best measure, obviously, but like it's a, it's the best that we've got now.[00:19:57] We, we just iterated and built and eventually we got it to the point where it is now. And a little bit beyond, since we actually, like, we actually got that score a couple of weeks ago, and yeah, it's been a hell of a journey from the beginning all the way now. That was a very rambling answer to your question about how we got here, but that's essentially the potted answer of how we got here.[00:20:16] Got the full[00:20:16] swyx: origin story[00:20:17] Alessio: out. Yeah, no, totally.[00:20:18] Genie Data Mix[00:20:18] Alessio: You mentioned bias in the data and some of these things. In your announcement video, you called Genie the world's first AI software engineering colleague. And you kind of highlighted how the data needed to train it needs to show how a human engineer works. I think maybe you're contrasting that to just putting code in it.[00:20:37] There's kind of like a lot more than code that goes into software engineering. How do you think about the data mixture, you know, and like, uh, there's this kind of known truth that code makes models better when you put it in the pre-training data, but since we put so much in the pre-training data, what else do you add when you train Genie?[00:20:54] Alistair Pullen: Yeah, I think, well, I think that sort of boils down fundamentally to the difference between a model writing code and a model doing software engineering, because the software engineering sort of discipline goes wider, because if you look at something like a PR, that is obviously an artifact of some thought and some work that has happened and has eventually been squashed into, you know, some diffs, right?[00:21:17] What the, very crudely, what the pre-trained models are reading is they're reading those final diffs and they're emulating that and they're being able to output it, right? But of course, it's a super lossy thing, a PR.
You have no idea why or how, for the most part, unless there are some comments, which, you know, anyone who's worked in a company realizes PR reviews can be a bit dodgy at times, but you see that you lose so much information at the end, and that's perfectly fine, because PRs aren't designed to be something that perfectly preserves everything that happened, but what we realized was, if you want something that's a software engineer, and very crudely, we started with like something that can do PRs for you, essentially, you need to be able to figure out why those things happened.[00:21:58] Otherwise, you essentially just have a code-writing model, you have something that's good at HumanEval, but, but not very good at SWE-Bench. Essentially that realization was, was part of the kernel of the idea of, of the approach that we took to design the agent that, that is Genie. The way that we decided we want to try to extract what happened in the past, like as forensically as possible, has been and is currently like one of the, the main things that we focus all our time on, because doing that, getting as much signal out as possible, doing that as well as possible, is the biggest
And the mix that we ended up with was, as you've probably seen in the technical report and so on, all of those different languages and different combinations of different task types. All of that has run through that pipeline, and we've extracted all that information out.[00:23:09] Customizing for Customers[00:23:09] Alessio: How does that differ when you work with customers that have private workflows? Like, is there usually a big delta between what you get in open source and maybe public data versus... Yeah,[00:23:19] Alistair Pullen: yeah, yeah. When you scrape enough of it, most of open source is updating READMEs and docs. It's hilarious, we had to filter out so much of that stuff, because when we first did the 16k model, the amount of README updating that went in, we did no data cleaning, we just sort of threw it in and saw what happened.[00:23:38] And it was really good at updating READMEs, really good at writing some comments, really good at complaining in PR reviews, and, again, we didn't clean the data, so you'd give it some feedback and it would just reply, and it would be quite insubordinate when it was getting back to you, like, no, I don't think you're right, and it would just argue with you. So the process of doing all that was super interesting, because we realized from the beginning, okay, there's a huge amount of work that needs to go into cleaning this, getting it aligned with what we want the model to do, to be able to get the model to be useful in some way.[00:24:12] Alessio: I'm curious, how do you think about the customer willingness to share all of this historical data? I've done a lot of developer tools investing in my career, and getting access to the code base is always one of the hard things. Are people getting more cautious about sharing this information?
In the past, it was maybe like, you know, you're using a static analysis tool, like whatever else you need to plug into the code base, fine.[00:24:35] Now you're building a model based on it. What's the discussion going into these companies? Are most people comfortable with letting you see how they work and sharing everything?[00:24:44] Alistair Pullen: It depends on the sector, mostly. We've actually seen, I'd say, people becoming more amenable to the idea over time, actually, rather than more skeptical, because I think they can see the upside.[00:24:55] If this thing does what they say it does, it's going to be more help to us than it is a risk to our infosec. Um, and of course, companies building in this space, we're all going to end up, you know, complying with the same rules, and there are going to be new rules that come out to make sure that when we're looking at your code, everything is safe, and so on.[00:25:12] So from what we've seen so far, we've spoken to some very large companies that you've definitely heard of, and all of them obviously have stipulations, and many of them want it to be sandboxed to start with, and all the very obvious things that I, you know, would say as well, but they're all super keen to have a go and see, because despite all those things, if we can genuinely make them go faster, allow them to build more in a given time period and stuff,[00:25:35] it's super worth it to them.[00:25:37] Genie Workflow[00:25:37] swyx: Okay, I'm going to dive in a little bit on the process that you have created. You showed the demo on your video, and by the time that we release this, you should be taking people off the waitlist and launching people, so people can see this themselves.
There's four main parts of the workflow, which is finding files, planning action, writing code, and running tests.[00:25:58] And controversially, you have set yourself apart from the Devins of the world by saying that things like having access to a browser is not that important for you. Is that an accurate reading of[00:26:09] Alistair Pullen: what you wrote? I don't remember saying that, but at least with what we've seen, the browser is helpful, but it's not as helpful as, like, RAGging the correct files, if that makes sense.[00:26:20] Like, it is still helpful, but obviously there are more fundamental things you have to get right before you get to, oh yeah, you can read some docs, or you can read a Stack Overflow article, and stuff like that.[00:26:30] swyx: Yeah, the phrase I was indexing on was, "The other software tools are wrappers around foundational models with a few additional tools, such as a web browser or code interpreter."[00:26:38] Alistair Pullen: Oh, I see. No, I'm deriding the approach, not the tools. Yeah, exactly. So like, I would[00:26:44] swyx: say in my standard model of what a code agent should look like, Devin has been very influential, obviously. Yeah. Yeah. Because you could just add the docs of something.[00:26:54] Mm-hmm. And like, you know, now when I'm installing a new library, I can just add docs. Yeah, yeah. Cursor also does this. Right. And then obviously having a code interpreter does help. I guess you have that in the form[00:27:03] Alistair Pullen: of running tests. I mean, Genie has both of those tools available to it as well.[00:27:08] So, yeah, we have a tool where you can put in URLs and it will just read the URLs. And it can also use the Perplexity API under the hood to be able to actually ask questions if it wants to. Okay. So, no, we use both of those tools as well.
Like, those tools are super important and super key.[00:27:24] I think obviously the most important tools to these agents are being able to retrieve code from a code base, being able to read Stack Overflow articles and what have you, and just essentially being able to Google like we do, which is definitely super useful.[00:27:38] swyx: Yeah, I thought maybe we could just kind of dive into each of those actions.[00:27:41] Code Retrieval[00:27:41] swyx: Code retrieval, one of the core indexers that you've worked on and built. What makes it hard, what approach you thought would work, didn't work,[00:27:52] Alistair Pullen: anything like that. It's funny, I had a similar conversation to this when I was chatting to the guys from OpenAI yesterday. The thing is that searching for code, specifically semantically, at least to start with... I mean, keyword search and stuff like that is a solved problem.[00:28:06] It's been around for ages, but the phrase we always used back in the day was searching for what code does rather than what code is. Searching for functionality is really hard. Really hard. The way that we approached that problem was that obviously a very basic and easy approach is: right,[00:28:26] let's just embed the code base. We'll chunk it up in some arbitrary way, maybe using an AST, maybe using number of lines, maybe using whatever, with some overlapping, just chunk it up and embed it. And once you've done that, I will write a query saying, like, find me some authentication code or something, embed it, and then do the cosine similarity and get the top K, right?[00:28:43] That doesn't work. And I wish it did work, don't get me wrong. It doesn't work well at all, because fundamentally, if you think about it semantically, how code looks is very different to how English looks, and there's not a huge amount of signal that's carried between the two.
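The naive chunk-embed-rank pipeline Alistair describes (and dismisses) can be sketched like this. A bag-of-words counter stands in for a real embedding model, which makes the failure mode he's pointing at very literal: an English query like "find me some authentication code" shares almost no signal with the code it's looking for.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": token counts. A real pipeline would call an
    # embedding model here, but the English-vs-code signal gap remains.
    return Counter(text.lower().replace("(", " ").replace(")", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(source: str, lines_per_chunk: int = 4) -> list[str]:
    # Arbitrary fixed-size line chunking (could also split on AST nodes).
    lines = source.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Embed the English query directly and rank chunks by cosine similarity.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

With this toy embedding, the query "find me some authentication code" has zero overlap with `def check_password(user, pw): ...`, which is the "not a huge amount of signal between the two" problem in miniature.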
So the first approach we took, and that did well enough for a long time, was: okay, let's train a model to be able to take in English code queries and then produce a hypothetical code snippet that might look like the answer, embed that, and then do the cosine similarity.[00:29:18] And that process, although very simple, gets you so much more performance out of the retrieval accuracy. And that was kind of the start of our engine, as we called it, which is essentially the aggregation of all these different heuristics, like semantic, keyword, LSP, and so on. And then we essentially had a model that would, given an input, choose which ones it thought were most appropriate, given the type of request you had.[00:29:45] So the whole code search thing was a really hard problem. And actually what we ended up doing with Genie is we let the model, through self-play, figure out how to retrieve code. So we don't use our engine for Genie. So instead of a request coming in and then, say, GPT-4 with some JSON output being like, well, I think here we should use a keyword search with these inputs, and then we should use semantic,[00:30:09] and then we should pick these results, it's actually like: a question comes in, and Genie has self-played in its training data to be able to be like, okay, this is how I'm going to approach finding this information. Much more akin to how a developer would do it. Because if I was like, Shawn, go into this new code base you've never seen before,[00:30:26] and find me the code that does this.
You're gonna probably do some keywords, you're gonna look over the file system, you're gonna try to figure out from the directories and the file names where it might be, you're gonna jump in one, and then once you're in there, you're probably gonna be doing the, you know, go-to-definition stuff to jump from file to file and try to use the graph to get closer and closer.[00:30:46] And that is exactly what Genie does. Starts on the file system, looks at the file system, picks some candidate files, is this what I'm looking for, yes or no, and if there's something that's interesting, like an import or something, it can command-click on that thing, go to definition, go to references, and so on.[00:31:00] And it can traverse the codebase that way.[00:31:02] swyx: Are you using the VS Code, uh, LSP, or? No,[00:31:05] Alistair Pullen: we're not doing this in VS Code, we're just using the language servers running. But we really wanted to try to mimic the way we do it as best as possible.
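The earlier hypothetical-snippet trick (akin to the HyDE technique: draft a plausible code answer from the English query, then embed the *draft*) can be sketched as follows. The `draft_snippet` function is a hardcoded stand-in for Cosine's trained query-to-code model; everything here is a toy illustration, not their actual engine.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: token counts (a real system uses a learned embedder).
    return Counter(text.lower().replace("(", " ").replace(")", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def draft_snippet(query: str) -> str:
    # Stand-in for the trained query->code model: given an English query,
    # hallucinate a plausible code snippet. Hardcoded here for illustration.
    drafts = {
        "find me some authentication code":
            "def authenticate(user, password): return check_hash(password, user.pw_hash)",
    }
    return drafts.get(query, query)

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    # Key move: embed the hypothetical *code*, not the English query,
    # so code-to-code similarity does the ranking.
    q = embed(draft_snippet(query))
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Code-to-code overlap (`def`, `return`, `check_hash`, ...) now ranks the authentication chunk first, where the raw English query matched nothing.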
And we did that during the self-play process when we were generating the dataset, so.[00:31:18] Although we did all that work originally, and although Genie still has access to these tools, so it can do keyword searches, and it can do, you know, basic semantic searches, and it can use the graph, it uses them through this process and figures out, okay, I've learned from data how to find stuff in codebases. And I think in our technical report, I can't remember the exact number, but I think it was around 65 or 66 percent retrieval accuracy overall, measured as: we know what lines need to be found for the task to actually be completed, and we found about 66 percent of all those lines. That's one of the biggest areas of free performance that we can get a hold of, because when we were building Genie, truthfully, a lot more focus went on: assuming you found the right information, you've been able to reproduce the issue, assuming that's true, how do you then go about solving it?[00:32:08] And the bulk of the work we did was on the solving. But when you go higher up the funnel, obviously, the funnel looks like: have you found everything you need for the task? Are you able to reproduce the problem that's seen in the issue? Are you then able to solve it? And the funnel gets narrower as you go down.[00:32:22] And at the top of the funnel, of course, is RAG. So I'm actually quite happy with that score. I think it's still pretty impressive considering the size of some of the codebases we're using for this. But if that number becomes 80, think how many more tasks we get right. That's one of the key areas we're going to focus on when we continue working on Genie.[00:32:37] It'd be interesting to break out a benchmark just for that.[00:32:41] swyx: Yeah, I mean, it's super easy.
Because I don't know what state of the art is.[00:32:43] Alistair Pullen: Yeah, I mean, it's super easy because, like, for a given PR, you know what lines were edited. Oh, okay. Yeah, you know what lines were[00:32:50] swyx: you can[00:32:51] Alistair Pullen: source it from SWE-Bench, actually.[00:32:52] Yeah, you can do it super easily. And that's how we got that figure out at the other end. For us, being able to see it against our historic models was super useful, so we could see if we were, you know, actually helping ourselves or not. And initially, one of the biggest performance gains that we saw when we did work on the RAG a bit was giving it the ability to use the LSP to go to definition, and really trying to get it to emulate how we do that, because I'm sure when you go into an editor where the LSP is not working or whatever, you suddenly feel really disarmed and naked.[00:33:20] You're like, oh my god, I didn't realize how much I actually used this to get about rather than just find stuff. So we really tried to get it to do that, and that gave us a big jump in performance. So we went from like 54 percent up to like the 60s, just by focusing on that.[00:33:34] swyx: One weird trick. Yes.[00:33:37] I'll briefly comment here. So this is the standard approach I would say most, uh, code tooling startups are pursuing. The one company that's not doing this is magic.dev. So would you do things differently if you had a 10 million[00:33:51] Alistair Pullen: token context window? If I had a 10 million context window and hundreds of millions of dollars, I wouldn't have gone and built... it's an LTM, it's not a transformer, right, that they're using?[00:34:03] If I'm not mistaken, I believe it's not a transformer. Yeah, Eric's going to come on at some point. Listen, they obviously know a lot more about their product than I do.
I don't know a great deal about how magic works. I don't think he knows anything yet. I'm not going to speculate. Would I do it the same way as them?[00:34:17] I like the way we've done it, because fundamentally we focus on the active software engineering and what that looks like, and showing models how to do that. Fundamentally, the underlying model that we use is kind of null to us, like, so long as it's the best one, I don't mind. And the context windows, we've already seen, like, you can get transformers to have million, one-and-a-half-million-token context windows.[00:34:43] And that works perfectly well, so as soon as you can fine-tune Gemini 1.5, then you best be sure that Genie will run on Gemini 1.5, and we'll probably get very good performance out of that. I like our approach because we can be super agile and be like, oh, well, Anthropic have just released whatever, and it might have half a million tokens and it might be really smart.[00:35:01] And I can just immediately take my JSONL file and dump it in there, and suddenly Genie works on there and it can do all the new things. Does[00:35:07] swyx: Anthropic have the same fine tuning support as OpenAI? I[00:35:11] Alistair Pullen: actually haven't heard anyone do it, because they're working on it. They're partnered with AWS, and it's going to be in Bedrock.[00:35:16] Okay. As far as I know, I think that's true. Um, cool. Yeah.[00:35:20] Planning[00:35:20] swyx: We have to keep moving on to the other segments. Sure. Uh, planning, the second piece of your four-step grandmaster plan, that is the frontier right now. You know, a lot of people are talking about Strawberry, Q*, whatever that is.[00:35:32] Monte Carlo Tree Search. Is current state of the art planning good enough? What prompts have worked? I don't even know what questions to ask.
Like, what is the state of planning?[00:35:41] Alistair Pullen: I think it's fairly obvious that with the foundational models, you can ask them to think step by step and ask them to plan and stuff, but that isn't enough, because if you look at how those models score on these benchmarks, they're not even close to state of the art.[00:35:52] Which ones are[00:35:52] swyx: you referencing? Benchmarks? So, like,[00:35:53] Alistair Pullen: just, uh, like, SWE-Bench and so on, right? And even the things that get really good scores on HumanEval, or agents as well, because they have these loops, right? Yeah. Obviously these things can reason, quote unquote, but the reasoning is constrained by the model's intelligence, I'd say, very crudely.[00:36:10] And what we essentially wanted to do was, we still thought that, obviously, reasoning is super important, we need it to get the performance we have. But we wanted the reasoning to emulate how we think about problems when we're solving them, as opposed to how a model thinks about a problem when it's solving it.[00:36:23] And that's obviously part of the derivation pipeline that we have when we design our data. But the reasoning that the models do right now, and who knows what Q*, whatever it ends up being called, looks like... on a small tangent to that, what I'm really excited about is, when models like that come out, obviously the signal in my data, when I regenerate it, goes up.[00:36:44] And then I can train that model, which is already better at reasoning, with improved reasoning data, and I can just keep bootstrapping and keep leapfrogging every single time.
And that is super exciting to me, because I welcome new models so much; immediately it just floats me up without having to do much work, which is always nice.[00:37:02] But on the state of reasoning generally, I don't see it going away anytime soon. I mean, an autoregressive model doesn't think, per se. And in the absence of having any thought... maybe an energy-based model or something like that, maybe that's what Q* is, who knows, some sort of high-level, abstract space where thought happens before tokens get produced.[00:37:22] In the absence of that for the moment, I think it's all we have, and it's going to have to be the way it works. For what happens in the future, we'll have to see, but I think certainly it's never going to hinder performance to do it. And certainly, the reasoning that we see Genie do, when you compare it to, like, if you ask GPT-4 to break down a step-by-step approach for the same problem, at least just on a vibe check alone, looks far better.[00:37:46] swyx: Two elements that I like, that I didn't see in your initial video, we'll see when, you know, this, um, Genie launches, is a planner chat, which is, I can modify the plan while it's executing, and then the other thing is playbooks, which is also from Devin, where, here's how I like to do a thing, and I'll use Markdown to specify how I do it.[00:38:06] I'm just curious if, if like, you know,[00:38:07] Alistair Pullen: those things help. Yeah, no, absolutely. We're a hundred percent. We want everything to be editable. Not least because it's really frustrating when it's not. Like, if you're ever in a situation where there's one thing you just wish you could change, and it would be right if that one thing was right, and you can't change it.[00:38:21] So we're going to make everything editable, including the code it writes.
Like, if it makes a small error in a patch, you can just change it yourself and let it continue, and it will be fine. Yeah. So those things are super important. We'll be doing those two.[00:38:31] Alessio: I'm curious, once you get to writing code, is most of the job done?[00:38:35] I feel like the models are so good at writing code when they're in small chunks that are very well instructed. What's kind of the drop-off in the funnel, once you've got the right files and you've got the right plan? That's a great question,[00:38:47] Alistair Pullen: because by the time this is out, there'll be another blog post, which contains all the information, all the learnings that I delivered to OpenAI's fine tuning team when we finally got the score.[00:38:59] Oh, that's good. Um, go for it. It's already up. And, um, yeah. I don't have it on my phone, but basically I broke down the log probs. I basically got the average log prob for a token at every token position in the context window. So imagine an x-axis from 0 to 128k, and then the average log prob for each index in there.[00:39:19] As we discussed, the way Genie works normally is, you know, at the beginning you do your RAG, and then you do your planning, and then you do your coding, and that sort of cycle continues. The certainty of code writing is so much higher than every other aspect of Genie's loop. So whatever's going on under the hood, the model is really comfortable with writing code.[00:39:35] There is no doubt, and it's right there in the token probabilities. One slightly different thing, I think, to how most of these models work is, at least for the most part, if you ask GPT-4 in ChatGPT to edit some code for you, it's going to rewrite the entire snippet for you with the changes in place.
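That contrast, rewriting a whole snippet versus emitting only a patch, can be made concrete with Python's `difflib`. This is a toy illustration of the token economics only; the file contents are made up, and it says nothing about the actual patch format any of these models emit.

```python
import difflib

# A 40-function file where the model needs to change exactly one line.
old = [f"def handler_{i}(event): return {i}\n" for i in range(40)]
new = list(old)
new[20] = "def handler_20(event): return 'patched'\n"

# A full rewrite re-emits all 40 lines; a unified diff is just one small
# hunk (headers + a few context lines + the -/+ pair).
patch = list(difflib.unified_diff(old, new,
                                  fromfile="a/handlers.py",
                                  tofile="b/handlers.py"))
print(f"full rewrite: {len(new)} lines, patch: {len(patch)} lines")
```

The larger the file and the smaller the edit, the bigger the savings, which is the token-efficiency argument for training on diffs.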
We train Genie to write diffs and, you know, essentially patches, right?[00:39:55] Because it's more token efficient, and that is also fundamental: we don't write patches as humans, but the result of what we do is a patch, right? When Genie writes code, I don't know how much it's leaning on the pre-training code-writing corpus, because obviously it's just read code files there.[00:40:14] It's probably read a lot of patches, but I would wager it's read more code files than it has patches. So it's probably leaning on a different part of its brain, is my speculation. I have no proof for this. So I think the discipline of writing code is slightly different, but certainly it's in its most comfortable state when it's writing code.[00:40:29] So once you get to that point, so long as you're not too deep into the context window... another thing that I'll bring up in that blog post is, the performance of Genie over the length of the context window degrades fairly linearly. So I actually broke it down by probability of solving a SWE-Bench issue, given the number of tokens of the context window.[00:40:49] At 60k, it's basically 0.5. So if you go over 60k in context length, you are more likely to fail than you are to succeed, just based on the amount of tokens you have in the context window. And when I presented that to the fine tuning team at OpenAI, that was super interesting to them as well. And that is more of a foundational model attribute than it is an us attribute.[00:41:10] However the attention mechanism works in GPT-4, however they deal with the context window at that point, is influencing how Genie is able to perform, even though obviously all our training data is perfect, right?
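The two breakdowns Alistair describes — average log prob per context position, and solve rate bucketed by context length — amount to simple aggregations over evaluation logs. A minimal sketch, with synthetic numbers and assumed bucket sizes standing in for the real analysis:

```python
from collections import defaultdict

def avg_logprob_by_position(samples, bucket=16_000):
    # samples: (token_position, logprob) pairs collected from eval runs.
    # Returns mean logprob per position bucket across the context window.
    sums, counts = defaultdict(float), defaultdict(int)
    for pos, lp in samples:
        b = pos // bucket * bucket
        sums[b] += lp
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

def solve_rate_by_context(runs, bucket=20_000):
    # runs: (context_tokens, solved) pairs, one per SWE-Bench attempt.
    # Returns the fraction solved per context-length bucket; a bucket
    # dipping below 0.5 means failure is more likely than success there.
    sums, counts = defaultdict(int), defaultdict(int)
    for tokens, solved in runs:
        b = tokens // bucket * bucket
        sums[b] += solved
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

Plotting either dictionary against its bucket keys gives the "x-axis from 0 to 128k" picture described above.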
So even if stuff is being solved in the 110,000-token sort of area,[00:41:28] the training data still shows it being solved there, but in practice the model is finding it much harder to solve stuff down that end of the context window.[00:41:35] Alessio: Does that scale with the context? So for a 200k context size, is 100k tokens the 0.5 point? I don't know. Yeah, but I,[00:41:43] Alistair Pullen: I, um, hope not. I hope you can't just take the context length and halve it and then say, oh, this is the usable context length.[00:41:50] But what's been interesting is, actually really digging into the data, looking at the log probs, looking at how it performs over the entire window, has influenced the short-term improvements we've made to Genie since we got that score. So we actually made some small optimizations to try to make sure, as best we can without overdoing it, that we can artificially make sure stuff sits within that sort of range, because we know that's our sort of battle zone.[00:42:17] And if we go outside of that, we're starting to push the limits, we're more likely to fail. So just doing that sort of analysis has been super useful without actually messing with anything more structural in getting more performance out of it.[00:42:29] Language Mix[00:42:29] Alessio: What about different languages? So, in your technical report, the data makes sense.[00:42:34] 21 percent JavaScript, 21 percent Python, 14 percent TypeScript, 14 percent TSX, which is JavaScript, JavaScript.[00:42:42] Alistair Pullen: Yeah,[00:42:42] swyx: yeah, yeah. Yes,[00:42:43] Alistair Pullen: yeah, yeah. It's like 49 percent JavaScript. That's true, although TypeScript is so much superior, but anyway.[00:42:46] Alessio: Do you see, how good is it at just, like, generalizing?
You know, if you're writing Rust or C or whatever else, it's quite different.[00:42:55] Alistair Pullen: It's pretty good at generalizing. Obviously, though, I think there are 15 languages in that technical report that we've covered. The ones that we picked in the highest mix were, selfishly, the ones that we internally use the most, and also, I'd argue, some of the most popular ones.[00:43:11] When we have more resources as a company, and more time, and, you know, once all the craziness that has just happened dies down a bit, we are going to work on that mix. I'd love to see everything ideally be represented at a similar level as it is. If you took GitHub as a data set, if you took how the languages are broken down in terms of popularity, that would be my ideal data mix to start.[00:43:34] It's just that it's not cheap. So trying to have an equal amount of Ruby and Rust and all these different things is just, at our current state, not really what we're looking for.[00:43:46] Running Code[00:43:46] Alessio: There's a lot of good Ruby in my GitHub profile. You can have it all. Well, okay, we'll just train on that. For running tests, it sounds easy, but it isn't, especially when you're working in enterprise codebases that are kind of very hard to spin up.[00:43:58] Yes. How do you set that up? How do you make a model actually understand how to run a codebase, which is different than writing code for a codebase?[00:44:07] Alistair Pullen: The model itself is not in charge of setting up the codebase and running it.
So Genie sits on top of GitHub, and if you have CI running on GitHub, like GitHub Actions and stuff like that, then Genie essentially makes a call out to that, runs your CI, sees the outputs, and then moves on.[00:44:23] Making the model itself set up a repo wasn't scoped in what we wanted Genie to be able to do, because for the most part, at least, most enterprises have some sort of CI pipeline running, and even a lot of hobbyist software development has some sort of basic CI running as well.[00:44:40] And that was the lowest-hanging-fruit approach that we took. So when Genie ships, the way it will run its own code is it will basically run your CI, and it will, um, I'm not in charge of writing this, the rest of the team is, but I think it's the Checks API on GitHub that allows you to grab that information and throw it in the context window.[00:44:56] Alessio: What's the handoff like with the person? So, Genie, you give it a task, and then how long are you supposed to supervise it for? Or are you just waiting for the checks to eventually run, and then you see how it goes? What does it feel like?[00:45:11] Alistair Pullen: There are a couple of modes that it can run in, essentially.[00:45:14] It can run in fully headless autonomous mode, so say you assign it a ticket in Linear or something, then it won't ask you for anything. It will just go ahead and try. Or if you're in the GUI on the website and you're using it, then you can give it a task and it might choose to ask you a clarifying question.[00:45:30] So if you ask it something super broad, it might just come back to you and say, what does that actually mean? Or can you point me in the right direction for this?
Because our decision internally was, it's going to piss people off way more if it just goes off and makes a completely[00:45:45] ruined attempt at it, because from day one it got the wrong idea. So it can ask you a lot of questions. And once it's going, much like a regular PR, you can leave review comments, issue comments, all these different things. And because, you know, it's been trained to be a software engineering colleague, it responds in actually a better way than a real colleague, because it's less snarky and less high and mighty.[00:46:08] And also there's the amount of filtering you have to do, because when you train a model to be a software engineer, essentially, it can just do anything. It's like, yeah, it looks good to me, bro.[00:46:17] swyx: Let's[00:46:17] Alistair Pullen: ship it.[00:46:19] Finetuning with OpenAI[00:46:19] swyx: I just wanted to dive in a little bit more on your experience with the fine tuning team. John Allard was publicly very supportive in his commentary and, you know, was part of it.[00:46:27] Like, what's it like working with them? I also picked up that you initially started to fine-tune what was publicly available, the 16k to 32k range. You got access to do more than that. Yeah. You've also trained on billions of tokens instead of the usual millions range. Just, like, take us through that fine tuning journey and any advice that you might have.[00:46:47] Alistair Pullen: It's been so cool, and this will be public by the time this goes out: OpenAI themselves have said we are pushing the boundaries of what is possible with fine tuning. We are right on the edge, and we are genuinely working with them in figuring out how stuff works, what works, what doesn't work, because no one else is doing what we're doing.[00:47:06] They have found what we've been working on super interesting, which is why they've allowed us to do so much interesting stuff.
Working with John, I mean, I had a really good conversation with John yesterday. We had a little brainstorm after the video we shot. And one of the things you mentioned, the billions of tokens: one of the things we've noticed, and it's actually a very interesting problem for them as well, is figuring out[00:47:28] how big your PEFT adapter, your LoRA adapter, is going to be. That is actually a really interesting problem, because if you make it too big, and because they support data sets that are so small, you can put like 20 examples through it or something like that, if you had a really sparse, large adapter, you're not going to get any signal in that at all.[00:47:44] So they have to dynamically size these things, and there is an upper bound. And actually we use models that are larger than what's publicly available. It's not publicly available yet, but when this goes out, it will be. But we have larger LoRA adapters available to us, just because of the amount of data that we're pumping through it.[00:48:01] And at that point, you start seeing really interesting other things, like you have to change your learning rate schedule and do all these different things that you don't have to do when you're on the smaller end of things. So working with that team is such a privilege, because obviously they're at the top of their field in, you know, the fine tuning space.[00:48:18] So as we learn stuff, they're learning stuff. And one of the things that I think really catalyzed this relationship is when we first started working on Genie, I delivered them a presentation, which will eventually become the blog post that you'll love to read soon. The information I gave them there, I think, is what showed them, like, oh wow, okay, these guys are really pushing the boundaries of what we can do here.[00:48:38] And truthfully, we view our data set right now as very small.
It's like the minimum that we're able to afford, literally afford, right now to be able to produce a product like this. And it's only going to get bigger. So yesterday, while I was in their offices, we were planning, we were like, okay, this is where we're going in the next six to 12 months.[00:48:57] We're putting our foot on the gas here, because this clearly works. I've demonstrated this is the best approach so far, and I want to see where it can go. I want to see what the scaling laws are like for the data. And at the moment, it's hard to figure that out, because you don't know when you're running into saturating a PEFT adapter, as opposed to, is this actually the model's limit?[00:49:15] Like, where is that? So finding all that stuff out is the work we're actively doing with them. And yeah, it's going to get more and more collaborative over the next few weeks as we explore larger adapters, pre-training extension, different things like that.[00:49:27] swyx: Awesome. I also wanted to talk briefly about the synthetic data process.[00:49:32] Synthetic Code Data[00:49:32] swyx: One of your core insights was that the vast majority of the time, the code that is published by a human is in a working state. And actually you need to fine tune on non-working code. So just, yeah, take us through that inspiration. How many rounds did you do? Yeah, I mean,[00:49:47] Alistair Pullen: it might be generous to say that the vast majority of code is in a working state.[00:49:51] I don't know if I believe that. I was like, that's very nice of you to say that my code works. Certainly, it's not true for me. No, but you're right. It's an interesting problem.
And what we saw was, when we didn't do that, the model basically has to one-shot the answer.[00:50:07] Because after that, it's like, well, I've never seen iteration before, how am I supposed to figure out how this works? So what you're alluding to there is the self-improvement loop that we started working on. And that was in sort of two parts: we synthetically generated runtime errors, where we would intentionally mess with the AST to make stuff not work, or index out of bounds, or refer to a variable that doesn't exist, or errors that the foundational models just make sometimes that you can't really avoid, you can't expect it to be perfect.[00:50:39] So we threw some of those in with a probability of happening. And on the self-improvement side, I spoke about this in the blog post, essentially the idea is that you generate your data in sort of batches. The first batch is perfect, like one example: here's the problem, here's the answer, go, train the model on it.[00:50:57] And then for the second batch, you take the model that you trained before, which can look like one commit into the future, and you let it have the first attempt at solving the problem. And hopefully it gets it wrong, and if it gets it wrong, then you have, okay, now the codebase is in this incorrect state, but I know what the correct state is, so I can do some diffing, essentially, to figure out how to get from the state it's in now to the state that I want it in, and then you can train the model to produce that diff next, and so on, and so on, so the model can learn, and also reason as to why it needs to make these changes, to be able to solve problems iteratively and learn from its mistakes and stuff like that.[00:51:35] Alessio: And you picked the size of the data set just based on how much money you could spend generating it.
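The synthetic runtime-error idea described above (mess with the AST so the code refers to something that doesn't exist, with some probability of happening) can be sketched roughly as below. The function name and the default probability are illustrative assumptions, not Genie's actual pipeline; a real version would mix several corruption types (index out of bounds, AST mangling, and so on).

```python
import ast
import random
from typing import Optional, Tuple

def inject_runtime_error(source: str, p: float = 0.3,
                         rng: Optional[random.Random] = None) -> Tuple[str, bool]:
    """With probability p, corrupt one variable *read* so the snippet
    raises NameError when executed; otherwise return it untouched.

    Returns (source_out, was_corrupted).
    """
    rng = rng or random.Random()
    if rng.random() >= p:
        return source, False
    tree = ast.parse(source)
    # Only names being read (ast.Load), not assignment targets.
    reads = [node for node in ast.walk(tree)
             if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)]
    if not reads:
        return source, False
    # Point one read at a name that was never defined.
    rng.choice(reads).id += "_undefined"
    return ast.unparse(tree), True

src = "x = 1\ny = x + 1\nprint(y)"
broken, changed = inject_runtime_error(src, p=1.0, rng=random.Random(0))
```

Pairing each `broken` snippet with the original `src` then yields exactly the kind of (incorrect state, correct state) training example the diff-based self-improvement loop needs.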
Maybe you think you could just make more and get better results. How, what[00:51:42] Alistair Pullen: multiple of my monthly burn do I spend doing this? Yeah. Basically it was very much related to capital, and, um, with any luck that will be alleviated[00:51:53] swyx: very soon.[00:51:54] Alistair Pullen: Yeah.[00:51:54] SynData in Llama 3[00:51:54] swyx: Yeah. I like drawing references to other things that are happening in the wild, 'cause we only get to release this podcast once a week. Mm-hmm. The Llama 3 paper also had some really interesting thoughts on synthetic data for code. I don't know if you have reviewed that. I'll highlight the back-translation section,[00:52:11] because one of your dataset focuses is updating documentation. I think that translation between natural language, English versus code, and
Today's guest is Jordan Wilson of the Accelerant Agency and Everyday AI, who is not only a content production machine (daily podcasts, daily newsletter) but who makes AI accessible to everyone, tech enthusiasts and beginners alike. I asked Jordan to run through some AI applications in real estate that you can immediately use – and he does not disappoint. Here's what you'll learn to do: Data Analysis with Claude: Use Claude.ai to process and analyze vast amounts of real estate data quickly, creating interactive dashboards (wow) that can be customized through simple natural language commands. AI for Custom Applications: Exactly how to use AI to automate time-consuming tasks, enhancing productivity and efficiency. Castmagic for Automated Transcriptions: We use Castmagic here at GowerCrowd, and Jordan loves the platform, to streamline video production, analysis, and repurposing into dozens of smaller content pieces you can use to expand your visibility and be recognized as an industry leader. Jordan is passionate about making AI accessible to everyone, and in today's episode, you'll be guided through some easy wins you can implement right away. Plus, he'll share insights into how even the rocket scientists are leveraging AI, giving you a sense of just how scalable its benefits are. ***** The only Podcast you need on real estate and AI. Learn how other real estate pros are using AI to get ahead of their competition. Get early notice of hot new game-changing AI real estate apps. Walk away with something you can actually use in every episode. PLUS, subscribe to my free newsletter and get: • practical guides, • how-to's, and • news updates All exclusively for real estate investors that make learning AI fun and easy and insanely productive, for free. EasyWin.AI
The Mint Condition: NFT and Digital Collectibles Entertainment
Send us a Text Message. In this engaging episode of Mid Mic Crisis, Bunchu and Chamber delve into the fascinating world of AI and its potential to revolutionize podcasting. They introduce their latest creation, Mid Mic Daily Bite, a mini version of their show featuring AI-generated voices of the hosts, made possible by Eleven Labs. Bunchu provides a detailed walkthrough of the workflow and steps required to create realistic banter and fresh material, giving listeners an insightful look into the behind-the-scenes process. The discussion then shifts to a thorough breakdown of Project Strawberry, the latest endeavor from OpenAI. Project Strawberry is rumored to be the development of GPT-5, a groundbreaking advancement in AI technology. The hosts discuss the project's potential to surpass current AI capabilities, its implications for various industries, and the ethical considerations that come with such powerful technology. They also delve into the roles of key figures like @iruletheworldmo and Sam Altman, analyzing their contributions and influence within OpenAI and the broader AI community. To wrap up the show, Bunchu and Chamber engage in their popular "What's Too Much?" segment, providing a lighthearted yet thought-provoking end to the episode. This segment is always a fan favorite, offering humorous insights into various topics and leaving listeners with plenty to ponder. Join Bunchu and Chamber as they explore the cutting edge of AI in podcasting, dissect the latest trends in technology, and share their unique perspectives on the ever-evolving digital landscape. Follow Us: Fire Brain AI: https://www.skool.com/firebrain-ai-6434/about YouTube: https://www.youtube.com/@dGENnetwork Instagram: https://www.instagram.com/midmiccrisis/?hl=en TikTok: https://www.tiktok.com/@mid.mic.crisis?lang=en Twitter: https://twitter.com/MidMicCrisis Powered by @dGenNetwork Website: https://dgen.network/ Support the Show.
Federal agencies with highly sensitive workloads now have the opportunity to use OpenAI GPT-4o. Microsoft announced that it received FedRAMP High accreditation to offer the OpenAI generative AI platform through its Azure Government cloud. The FedRAMP High designation denotes that the OpenAI services have met a higher security threshold to work with sensitive civilian datasets, including those in the fields of health care, law enforcement, finance and emergency response, among others. The General Services Administration has a healthy robotic process automation program, but in some cases, those bots are putting data and systems at risk, the agency's inspector general found in a recent audit. In a new report, GSA's Office of the Inspector General stated that the agency's RPA program did not comply with IT security requirements to “ensure bots are operating securely and properly.” The watchdog found a slew of security issues with the bots, ranging from the agency not establishing a process for removing access to decommissioned bots to a lack of monitoring and reporting of bot-related activity. The Daily Scoop Podcast is available every Monday-Friday afternoon. If you want to hear more of the latest from Washington, subscribe to The Daily Scoop Podcast on Apple Podcasts, SoundCloud, Spotify and YouTube.
Send Everyday AI and Jordan a text message. A new large language model from the industry leader. Huge updates in AI lawsuits. International turmoil around AI regulation. That's just the beginning. This week was a chaotic one in AI news. What's it all mean for your biz? We got you. Newsletter: Sign up for our free daily newsletter. More on this Episode: Episode Page. Join the discussion: Ask Jordan questions on AI. Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup. Website: YourEverydayAI.com Email The Show: info@youreverydayai.com Connect with Jordan on LinkedIn. Topics Covered in This Episode: 1. Use of Copyrighted Content to Train AI 2. Current State of AI Education 3. Release of OpenAI's GPT-4o mini 4. Launch of AI-Driven Education Platform, Eureka Labs 5. Withholding of Meta's future AI models and features by the EU. Timestamps: 03:15 Tech giants accused of illegally using YouTube subtitles. 07:00 Language model akin to a search engine. 10:35 OpenAI requests stories, affecting journalism and copyright. 11:46 Journalist pivots to AI, predicts legal implications. 17:41 AI course for building functioning web app. 18:58 Karpathy is a leader in AI development. 22:27 Off-camera conversations reveal more significant insights. 27:29 EU announces strict EU AI Act; Meta's Llama. 30:06 OpenAI unveils new GPT-4o mini language model. 31:56 OpenAI API facing issues, costly for developers. 38:07 Use GPT-4o for products, services, AI. 41:06 GPT-4o mini leads in machine learning, AWS offers fine-tuning. 42:44 OpenAI's development lacked, developers looked elsewhere. Keywords: AI assistants, human teachers, LLM 101n, digital cohorts, physical cohorts, Meta's celebrity chatbots, storyteller AI large language model, Python, C, CUDA, funding, AI technology education, resources focus on sales, Jordan Wilson, Everyday AI, Thanks a Million Giveaway, tech giants' illegal use of YouTube subtitles, Anthropic, NVIDIA, Salesforce, copyright violation, training large language models, decline in traffic for
Stack Overflow, Marques Brownlee, MrBeast, Meta withholding AI models from EU, Apple's withheld AI features, OpenAI GPT-4o mini, cost-effective AI solutions, competitive pricing of AI models. Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/
Hipsters: Fora de Controle is Alura's podcast covering applied Artificial Intelligence and this whole new world we are just beginning to crawl into, which you can explore with us! In this episode we discuss the main news of the week, including the launch of the new lighter and cheaper version of GPT-4o, the launch of the Claude app for Android, and Mistral's new Mathstral and Codestral models. We then talk with Leandro Vieira about how Oracle is handling the adoption and development of AI solutions, both for internal use and for its customers. Here's who joined the conversation: Marcus Mendes, Fora de Controle host; Fabrício Carraro, Fora de Controle host, Program Manager at Alura, AI author, and host of the Dev Sem Fronteiras podcast; Leandro Vieira, Vice President of AI & HighTech Companies, LatAm at Oracle
Cohere is one of the frontier large-language-model companies, focusing on enterprise use cases to differentiate itself from OpenAI GPT, Google Gemini, Meta Llama, Anthropic Claude and Mistral. Cohere co-founder Nick Frosst joins Bloomberg Intelligence senior technology analyst Mandeep Singh to discuss what makes Cohere unique in its approach to model training and deployment. They discuss retrieval-augmented generation, tokenization and more.
In this episode, hosts Neil Tyra and Sam Mollaei dive into the transformative potential of artificial intelligence (AI) in the legal industry, presenting both challenges and opportunities for lawyers and law firms. They discuss insights from major AI conferences and announcements, emphasizing the exponential impact of AI advancements and the importance of embracing these technologies to solve problems and seize opportunities. Join them as they explore how lawyers can leverage cutting-edge AI tools to enhance their practices and stay ahead in a rapidly evolving landscape. Key Takeaways from Neil and Sam. From the Singularity AI Conference: Understanding Moore's Law's exponential growth in computing power and Wright's Law's cost reduction with production can help lawyers predict technological advancements and plan for AI integration in legal practices. Ray Kurzweil's Law of Accelerating Returns predicts exponential technological progress, urging lawyers to anticipate rapid AI advancements for improved legal research, contract analysis, and case predictions. In the AI era, data is invaluable for predictive analytics and legal trend identification, enabling lawyers to gain a competitive edge by leveraging high-quality data for more accurate and efficient client services. OpenAI - GPT-4o: OpenAI's GPT-4o, announced on May 13, 2024, offers GPT-4-level intelligence with faster performance and enhanced capabilities across text, voice, and vision, improving user interactions and real-time voice conversations. GPT-4o is twice as fast, 50% cheaper, and provides five times higher rate limits compared to GPT-4 Turbo, making it a superior and more economical AI model. GPT-4o is being rolled out to most paid users, with a free version starting this week, featuring new image upload capabilities, improved web browsing, and a code interpreter, while voice input still uses the old Whisper, and Mac and Phone apps are pending release. Google's AI Release: Announced at Google I/O 2024, Gemini 1.5
Pro is an advanced multimodal AI model with a context window of up to one million tokens, capable of processing text, images, and videos, with plans to expand to two million tokens for developers soon. Google introduced Ask Photos, allowing users to ask complex questions about their photo library for detailed information, and Veo, a video generation tool that creates high-quality videos from text prompts, both set to enhance user experiences across Google services. Google's generative AI enhancements in search provide more comprehensive and contextually relevant results, including video-based searches and improved AI overviews, as part of 44 new AI tools and updates aimed at deeper AI integration into its products. "One of the other things I understood about this new version (of ChatGPT) is that it very much strengthened the visual analysis." — Neil Tyra "The importance of data is that essentially, data is the asset, and if you have a unique data set... that is what makes AI valuable." — Sam Mollaei Join Lawyer Club FREE, where lawyers and law firm owners come to reclaim control of their firms and their lives! Plus, get the full list of Best AI Tools For Lawyers inside! Get in touch with Sam: MLA Website LawyerClub Purchase Sam's book
Send Everyday AI and Jordan a text message. This week could mark a turning point in the world of AI. Here's why: ↪ Stay tuned for Apple's groundbreaking GenAI revelations at WWDC 2024. ↪ Discover why Elon Musk redirected NVIDIA chips from Tesla to xAI. ↪ Find out how NVIDIA just made a historic leap past Apple. ↪ And don't miss the details on the new AI video generator that could dethrone Sora. Newsletter: Sign up for our free daily newsletter. More on this Episode: Episode Page. Join the discussion: Ask Jordan questions on AI. Related Episodes: Ep 286: Apple's AI – Too little, too late? Ep 211: OpenAI's Sora – The larger impact that no one's talking about. Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup. Website: YourEverydayAI.com Email The Show: info@youreverydayai.com Connect with Jordan on LinkedIn. Topics Covered in This Episode: 1. Apple's upcoming AI developments 2. NVIDIA market cap 3. New AI text-to-video tool 4. Elon Musk reallocating NVIDIA AI chips 5. Cisco's $1 billion investment into AI 6. U.S.
Antitrust Big Tech Investigation. Timestamps: 02:00 Elon Musk reallocates NVIDIA chips for supercomputer. 06:03 NVIDIA's market cap surged, stock split explained. 07:30 NVIDIA's strong financials and plans for AI chip. 12:04 OpenAI restricted access, improving, making it safer. 14:42 Unverified video likely from new AI competitor. 18:01 Detailed observation of changing seasons and AI. 21:15 Concern regarding societal impacts of advanced AI. 25:15 DOJ to lead NVIDIA inquiry, FTC scrutinizes Microsoft. 28:29 NVIDIA CEO ahead in recognizing generative AI. 33:42 Apple fluctuates between utilizing own or OpenAI model. 37:03 Newer iPhones may have advanced AI features. 38:03 Siri can now interact with multiple apps. 41:25 Apple's announcement at WWDC is big. Keywords: Apple AI features, OpenAI GPT-4 technology, WWDC, generative AI, older iPhone models, partnership, Elon Musk, NVIDIA chips, Tesla, market cap, Kuaishou AI developments, federal regulators, Microsoft OpenAI, AI monopolistic practices, Cisco AI investment, NVIDIA partnership, daily AI newsletter, Sora, Gigafactory of Compute, NVIDIA stock split, Rubin AI chip platform, advanced AI models misuse, disinformation in AI, Jordan Wilson, Sora and Kling AI models, Anthropic Claude, Jensen Huang, Anthropic, Kling, Sora. Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/
On May 14, OpenAI unveiled its latest multimodal model, GPT-4o, at a product launch event, stunning the world with real-time voice, video, and text interaction. Just one day later, Google held its 2024 I/O developer conference, releasing a dazzling array of AI products with almost too many names to keep track of. The most eye-catching was Project Astra, positioned directly against GPT-4o. This episode features interviews with three industry insiders, including two AI entrepreneurs: 出门问问 (Mobvoi) founder and CEO 李志飞 (Li Zhifei) and jobright.ai co-founder 郑玉典 (Zheng Yudian). They share their impressions and reflections after the two launch events. 李志飞 analyzes, from an industry and technical perspective, the problems that still need to be solved before AI assistants become products people can actually use every day, as well as the challenges each of OpenAI and Google currently faces. Guests in this episode: 丁教 Diane, co-founder of 声动活泼 and host of 科技早知道; 硅谷徐老师, AI executive, serial entrepreneur, and Stanford guest lecturer, Xiaohongshu and WeChat Channels: 硅谷徐老师 | Official account: 硅谷云 | YouTube: Byte into Future; 李志飞, founder and CEO of 出门问问, PhD in Computer Science from Johns Hopkins University, NLP and AI expert, former scientist at Google headquarters; 郑玉典, PhD in AI/databases, ex-Twitter/NewsBreak ads recommendation lead, co-founder of jobright.ai. Main topics: [05:19] Guests' on-the-ground observations from the OpenAI launch event [11:51] Multimodal virtual assistants will spark a new revolution in human-computer interaction [16:36] When will the demo become reality: has GPT-4o raised expectations too high? [24:26] GPT-4o vs. Astra: who won the first round of the multimodal model contest? [31:33] The "small and beautiful" OpenAI versus the "big and comprehensive" Google [34:23] "A big ship is hard to turn"? Pichai should resign [39:36] How far is smart hardware from becoming a mainstream device? [47:41] Will fickle Apple choose OpenAI or Google? It could be a make-or-break decision [53:36] OpenAI's two big challenges: product form and business model [01:01:34] The endgame of open versus closed source: how much time does OpenAI have left? Related reading: GPT-4o gives human-computer interaction a chance to turn over a new leaf (https://mp.weixin.qq.com/s/LDk6RLFDhoDbAnxri5amKg) Production: producers 丁教 and Xinlu; post-production Jack and 迪卡; operations George; design 饭团. Business inquiries: 声动活泼 business cooperation (https://sourl.cn/6vdmQT). Support us and join a new year of podcast innovation: in 2021 we launched the 声动胡同 membership program, a pure support project that helps 声动活泼 keep exploring and innovating in podcast content. Looking back at 2023, thanks to this support, every 声动活泼 show kept breaking new ground, not only making Apple China's year-end popular shows list but also ranking on platforms such as CPA and Ximalaya. The new 2024 paid show 不止金钱 (https://www.xiaoyuzhoufm.com/podcast/65a625966d045a7f5e0b5640) is now live, and a new season of 跳进兔子洞 is coming soon. Stay tuned!
胡同 https://files.fireside.fm/file/fireside-uploads/images/4/4931937e-0184-4c61-a658-6b03c254754d/Z0YbNKpo.png Join us: 声动活泼 is hiring full-time roles in show production, show marketing, and commercial project management; for details, see the link (https://sourl.cn/j8tk2g). If your resume is ready, send it to hr@shengfm.cn with the subject line: your name + the role name. About 声动活泼: "Colliding with the world through sound," 声动活泼 is dedicated to providing a steady stream of food for thought. Our other podcasts: 声动早咖啡 (https://www.xiaoyuzhoufm.com/podcast/60de7c003dd577b40d5a40f3), 声东击西 (https://etw.fm/episodes), 吃喝玩乐了不起 (https://www.xiaoyuzhoufm.com/podcast/644b94c494d78eb3f7ae8640), 反潮流俱乐部 (https://www.xiaoyuzhoufm.com/podcast/5e284c37418a84a0462634a4), 泡腾 VC (https://www.xiaoyuzhoufm.com/podcast/5f445cdb9504bbdb77f092e9), 商业WHY酱 (https://www.xiaoyuzhoufm.com/podcast/61315abc73105e8f15080b8a), 跳进兔子洞 (https://therabbithole.fireside.fm/). You can interact with us on Jike (https://okjk.co/Qd43ia), Weibo, and other social media by searching for 声动活泼. We'd love to get your emails at ting@sheng.fm. 声小音 https://files.fireside.fm/file/fireside-uploads/images/4/4931937e-0184-4c61-a658-6b03c254754d/gK0pledC.png Scan the code to add 声小音 and stay in touch with us outside the show. Special Guests: 李志飞 and 郑玉典.
Timestamps: 0:00 YOU'RE GONNA GET A WORKOUT 0:10 Snapdragon X leaks, Nvidia's Arm APU 1:37 OpenAI GPT-4o is basically the movie Her 4:14 ASUS warranty scandal 6:30 QUICK BITS INTRO 6:37 RDNA 5 details 7:23 Google Messages RCS editing 7:54 Apple Vision Pro 2 price rumor 8:29 Windows 11 Settings app ads 9:17 AI bots dating other bots News Sources: https://lmg.gg/ZEq5T Learn more about your ad choices. Visit megaphone.fm/adchoices
(0:00) Welcoming Sam Altman to the show! (2:28) What's next for OpenAI: GPT-5, open-source, reasoning, what an AI-powered iPhone competitor could look like, and more (21:56) How advanced agents will change the way we interface with apps (33:01) Fair use, creator rights, why OpenAI has stayed away from the music industry (42:02) AI regulation, UBI in a post-AI world (52:23) Sam breaks down how he was fired and re-hired, why he has no equity, dealmaking on behalf of OpenAI, and how he organizes the company (1:05:33) Post-interview recap (1:10:38) All-In Summit announcements, college protests (1:19:06) Signs of innovation dying at Apple: iPad ad, Buffett sells 100M+ shares, what's next? (1:29:41) Google unveils AlphaFold 3.0 Follow Sam: https://twitter.com/sama Follow the besties: https://twitter.com/chamath https://twitter.com/Jason https://twitter.com/DavidSacks https://twitter.com/friedberg Follow on X: https://twitter.com/theallinpod Follow on Instagram: https://www.instagram.com/theallinpod Follow on TikTok: https://www.tiktok.com/@all_in_tok Follow on LinkedIn: https://www.linkedin.com/company/allinpod Intro Music Credit: https://rb.gy/tppkzl https://twitter.com/yung_spielburg Intro Video Credit: https://twitter.com/TheZachEffect Referenced in the show: https://twitter.com/EconomyApp/status/1622029832099082241 https://sacra.com/c/openai https://twitter.com/tim_cook/status/1787864325258162239 https://openai.com/index/introducing-the-model-spec https://twitter.com/SabriSun_Miller/status/1788298123434938738 https://www.archives.gov/founding-docs/bill-of-rights-transcript https://twitter.com/ClayTravis/status/1788312545754825091 https://www.inc.com/bill-murphy-jr/warren-buffett-just-sold-more-than-100-million-shares-of-apple-reason-why-is-eye-opening.html https://www.youtube.com/watch?v=snbTCWL6rxo https://www.digitimes.com/news/a20240506PD216/apple-ev-startup-genai.html https://www.theonion.com/fuck-everything-were-doing-five-blades-1819584036 
https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model
Sam Altman is the CEO of OpenAI, the company behind GPT-4, ChatGPT, Sora, and many other state-of-the-art AI technologies. Please support this podcast by checking out our sponsors: - Cloaked: https://cloaked.com/lex and use code LexPod to get 25% off - Shopify: https://shopify.com/lex to get $1 per month trial - BetterHelp: https://betterhelp.com/lex to get 10% off - ExpressVPN: https://expressvpn.com/lexpod to get 3 months free Transcript: https://lexfridman.com/sam-altman-2-transcript EPISODE LINKS: Sam's X: https://x.com/sama Sam's Blog: https://blog.samaltman.com/ OpenAI's X: https://x.com/OpenAI OpenAI's Website: https://openai.com ChatGPT Website: https://chat.openai.com/ Sora Website: https://openai.com/sora GPT-4 Website: https://openai.com/research/gpt-4 PODCAST INFO: Podcast website: https://lexfridman.com/podcast Apple Podcasts: https://apple.co/2lwqZIr Spotify: https://spoti.fi/2nEwCF8 RSS: https://lexfridman.com/feed/podcast/ YouTube Full Episodes: https://youtube.com/lexfridman YouTube Clips: https://youtube.com/lexclips SUPPORT & CONNECT: - Check out the sponsors above, it's the best way to support this podcast - Support on Patreon: https://www.patreon.com/lexfridman - Twitter: https://twitter.com/lexfridman - Instagram: https://www.instagram.com/lexfridman - LinkedIn: https://www.linkedin.com/in/lexfridman - Facebook: https://www.facebook.com/lexfridman - Medium: https://medium.com/@lexfridman OUTLINE: Here's the timestamps for the episode. On some podcast players you should be able to click the timestamp to jump to that time. (00:00) - Introduction (07:51) - OpenAI board saga (25:17) - Ilya Sutskever (31:26) - Elon Musk lawsuit (41:18) - Sora (51:09) - GPT-4 (1:02:18) - Memory & privacy (1:09:22) - Q* (1:12:58) - GPT-5 (1:16:13) - $7 trillion of compute (1:24:22) - Google and Gemini (1:35:26) - Leap to GPT-5 (1:39:10) - AGI (1:57:44) - Aliens