POPULARITY
OpenAI's new model, GPT-4o, can reason across audio, vision, and text in real-time, with safety measures built-in by design. Apple and Google collaborate to deliver support for unwanted tracking alerts in iOS and Android, an industry first involving community and industry input. LoRA Land, a web application that hosts 25 LoRA fine-tuned Mistral-7B LLMs on a single NVIDIA A100 GPU with 80GB memory, highlights the quality and cost-effectiveness of employing multiple specialized LLMs over a single, general-purpose LLM. WildChat, a public dataset showcasing how chatbots like GPT-4 and ChatGPT are used by a population of users in practice, offers the most diverse user prompts, contains the largest number of languages, and presents the richest variety of potentially toxic use-cases for researchers to study. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:35 OpenAI Announces GPT-4 Omni 03:00 Apple and Google deliver support for unwanted tracking alerts in iOS and Android 05:03 Sam Altman on GPT-4 Omni 06:20 Fake sponsor 08:43 LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report 10:17 Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models 12:04 WildChat: 1M ChatGPT Interaction Logs in the Wild 14:04 Outro
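The multi-adapter setup LoRA Land demonstrates (one shared base model plus many small LoRA adapters on a single GPU) can be sketched with Hugging Face's peft library. This is a minimal illustration under stated assumptions, not LoRA Land's actual serving stack, which uses a dedicated multi-adapter server that batches adapters dynamically; the adapter repo names below are hypothetical placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)

# Attach several LoRA adapters to the one shared base model. Each adapter is
# only tens of megabytes, so many specialized adapters can sit next to a
# single 7B base model on one 80GB A100. The adapter repos are hypothetical.
model = PeftModel.from_pretrained(base_model, "my-org/mistral-7b-lora-sql", adapter_name="sql")
model.load_adapter("my-org/mistral-7b-lora-summarize", adapter_name="summarize")

def generate(prompt: str, task: str) -> str:
    model.set_adapter(task)  # route the request to the task-specific adapter
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate("Write a SQL query for the ten most recent orders.", task="sql"))
```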
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Jailbreaking GPT-4's code interpreter, published by nikolaisalreadytaken on July 13, 2023 on LessWrong. Disclaimer: I don't know much about cybersecurity. Much of my knowledge comes from asking GPT-3.5 and GPT-4 for advice. These are some results from around 20 hours of playing around with the code interpreter plugin in early-mid May, when most of this was written. I contacted OpenAI about these jailbreaks in mid May and they mostly seem to still be there. Thank you to Max Nadeau, Trevor Levin, aL xin, Pranav Gade, and Alexandra Bates for feedback on this post! Summary GPT-4's code interpreter plugin has been rolled out to some users. It works by running on a virtual machine that is isolated from the internet and other machines, except for the commands sent in from the API and the results sent back to the API. GPT-4 seems to follow a set of rules that are either enforced through hard access restrictions or through GPT-4 refusing to do things for the user. Here, I highlight 6 rules that GPT-4 claims to be following, but which are easily breakable, alongside some best practices in cybersecurity that have been neglected. In short: GPT-4 claims that it is only supposed to read, modify, or delete files in two designated folders ("sandbox" and "mnt"). However, it is able to read basically any file on the system (including sensitive system files), and it is able to write and delete files outside of its designated folders. This seems to reveal information that the user isn't supposed to see. There are ways to find out information about the hardware that the VM is being run on, including: Information about the way OpenAI logs data, including what libraries and IP address they assign to virtual machines. A rough estimate of the number of VMs that OpenAI can run at maximum at any moment (from the way the IP addresses are allocated). A rough idea of what storage hardware is used (from write speed), alongside some info on other hardware. There is a file in the virtual machine (in a folder labeled "internal") that users can download that details how web requests are handled. As ChatGPT would say: "By exposing your source code, you make it easier for potential attackers to analyze the code and identify security vulnerabilities. This can lead to an increased risk of exploitation if there are any flaws in your implementation." GPT-4 claims that conversations with the model do not have a memory. However, files are routinely saved between conversations with the same user. Later in this post, I present an example of two different conversations with GPT-4 where I write a file in one conversation and read the file in another conversation. GPT-4 claims that there are resource limits in place to prevent users from using too much CPU or memory. However, it is possible to write >80GB of files onto OpenAI's VM within minutes. The rough rate at which I managed to write files is 0.3GB/second. There's a maximum Python runtime of 120 seconds per process, and 25 messages every 3 hours. This can be circumvented using simple workarounds (you can increase usage by at least a factor of 2). GPT-4 claims it cannot execute system commands. However, GPT-4 can and will run (innocuous) system commands and run internet-related commands (such as "ping") despite measures put in place to prevent this. However, OpenAI seems at least partly aware of this. 
They seem to tell GPT-4 that it has a strict set of rules (as it reliably repeats the rules when asked), and GPT-4 seems to believe these rules in some contexts (most of the time it refuses to do things that go against the rules), but they also left a README file for those curious enough to look at the VM's files that says: You might think that all is well because OpenAI was aware that the system was not secure. I don't think the existence of this README file inv...
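The probing described in this post amounts to asking the interpreter to run small scripts and reporting what comes back. A minimal sketch of that kind of probe, assuming a Python sandbox like the one described; the paths and sizes are illustrative, not the author's exact commands:

```python
import os, time

# 1. Try to read a file outside the designated "sandbox"/"mnt" folders.
try:
    with open("/etc/passwd") as f:
        print(f.read()[:200])
except (PermissionError, FileNotFoundError) as e:
    print("blocked:", e)

# 2. Try to write outside the designated folders.
try:
    with open("/tmp/probe.txt", "w") as f:
        f.write("written outside the sandbox folder")
    print("write succeeded:", os.path.exists("/tmp/probe.txt"))
except OSError as e:
    print("blocked:", e)

# 3. Estimate storage write throughput (the post reports roughly 0.3 GB/s).
chunk = b"\0" * (64 * 1024 * 1024)   # 64 MB buffer
start = time.time()
with open("/tmp/throughput.bin", "wb") as f:
    for _ in range(8):               # ~512 MB total
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())
elapsed = time.time() - start
print(f"~{0.5 / elapsed:.2f} GB/s write throughput")
```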
This MacVoices Live! session continues as we explore the conflicts within Reddit, risks of mailing checks, and a security breach with Western Digital's My Cloud service. Chuck Joiner, Jim Rea, David Ginsburg, Ben Roethig, Brian Flanigan-Arthurs, Web Bixby, Mark Fuccio and Eric Bolden challenge the concept of the "Apple tax" and talk about Google's exit from domain name registration. (Part 2) This MacVoices is supported by Notion. Do your most efficient work with Notion Projects. You can try it for free today at notion.com/macvoices.

Show Notes Chapters:
0:03:14 Reddit's Server Costs and Financial Troubles
0:04:57 Reddit's Poor Communication and Business Decision
0:05:23 Lessons Not Learned: Alienating Users and Potential Blowback
0:06:43 Volunteer Moderators: The Backbone of the Platform
0:07:02 Reddit's popularity due to previous service, Digg
0:08:51 Concerns about Reddit's future as a profit-generating company
0:10:45 The Strengths and Weaknesses of Reddit
0:11:51 Communities seeking alternate information storage options
0:14:23 Risks of check interception and electronic payment alternatives
0:17:40 Payment methods and concerns about giving direct account access
0:19:42 Safety tip: Use separate checking accounts for different purposes
0:22:14 Western Digital's MyCloud security breach and data protection concerns
0:27:09 Concerns about Security Holes and Dubious Services
0:29:25 Apple Value vs. Apple Tax
0:31:03 Oculus devices vs. Meta's Quest: Quality and Cost Comparison

Links:
Reddit hackers demand $4.5 million and API changes in threat to leak 80GB of data
https://www.engadget.com/reddit-hackers-demand-45-million-and-api-changes-in-threat-to-leak-80gb-of-data-114041164.html
Why You Should Stop Sending Checks in the Mail, Especially Now
https://lifehacker.com/why-you-should-stop-sending-checks-in-the-mail-especia-1850543113
Western Digital's cloud storage service went down for days due to breach
https://www.itbrew.com/stories/2023/04/21/western-digital-s-cloud-storage-service-went-down-for-days-due-to-breach
Meta will lower the Quest headset's recommended age from 13 to 10
https://www.engadget.com/meta-will-lower-the-quest-headsets-recommended-age-from-13-to-10-211153535.html
Google Domains shutting down, assets sold and being migrated to Squarespace
https://9to5google.com/2023/06/15/google-domains-squarespace/

Guests:
Web Bixby has been in the insurance business for 40 years and has been an Apple user for longer than that. You can catch up with him on Facebook, Twitter, and LinkedIn.
Eric Bolden is into macOS, plants, sci-fi, food, and is a rural internet supporter. You can connect with him on Twitter, by email at embolden@mac.com, on Mastodon at @eabolden@techhub.social, and on his blog, Trending At Work.
Brian Flanigan-Arthurs is an educator with a passion for providing results-driven, innovative learning strategies for all students, but particularly those who are at-risk. He is also a tech enthusiast who has a particular affinity for Apple since he first used the Apple IIGS as a student. You can contact Brian on Twitter as @brian8944. He also recently opened a Mastodon account at @brian8944@mastodon.cloud.
Mark Fuccio is actively involved in high tech startup companies, both as a principal at piqsure.com, or as a marketing advisor through his consulting practice Tactics Sells High Tech, Inc. Mark was a proud investor in Microsoft from the mid-1990s, selling in mid-2000, and hopes one day that MSFT will again be an attractive investment. You can contact Mark through Twitter, LinkedIn, or on Mastodon.
David Ginsburg is the host of the weekly podcast In Touch With iOS where he discusses all things iOS, iPhone, iPad, Apple TV, Apple Watch, and related technologies. He is an IT professional supporting Mac, iOS and Windows users. Visit his YouTube channel at https://youtube.com/daveg65 and find and follow him on Twitter @daveg65 and on Mastodon at @daveg65@mastodon.cloud.
Jim Rea has been an independent Mac developer continuously since 1984. He is the founder of ProVUE Development, and the author of Panorama X, ProVUE's ultra-fast RAM-based database software for the macOS platform. Follow Jim at provue.com and via @provuejim on Twitter.
Ben Roethig has been in the Apple ecosystem since the System 7 days. He is a former Associate Editor with Geek Beat, Co-Founder of The Tech Hangout and Deconstruct, and currently shares his thoughts on RoethigTech. Contact him on Twitter and Mastodon.

Support: Become a MacVoices Patron on Patreon http://patreon.com/macvoices
Enjoy this episode? Make a one-time donation with PayPal.

Connect:
Web: http://macvoices.com
Twitter: http://www.twitter.com/chuckjoiner http://www.twitter.com/macvoices
Mastodon: https://mastodon.cloud/@chuckjoiner
Facebook: http://www.facebook.com/chuck.joiner
MacVoices Page on Facebook: http://www.facebook.com/macvoices/
MacVoices Group on Facebook: http://www.facebook.com/groups/macvoice
LinkedIn: https://www.linkedin.com/in/chuckjoiner/
Instagram: https://www.instagram.com/chuckjoiner/

Subscribe:
Audio in iTunes
Video in iTunes
Subscribe manually via iTunes or any podcatcher:
Audio: http://www.macvoices.com/rss/macvoicesrss
Video: http://www.macvoices.com/rss/macvoicesvideorss
This week we discuss RHEL licensing changes, check the vibe of DevOps and some thoughts on programing language. Plus, has ChatGPT already become boring? Runner-up Titles I don't like listening to fellow thought leaders. I listen to myself enough. Dammit, alarm was set for PM A massive failure of one The end of free It's not all smiles and thumbs Goose-cow “I used to, but I don't anymore.” The Podcast Review podcast. Rundown RHEL Furthering the evolution of CentOS Stream (https://www.redhat.com/en/blog/furthering-evolution-centos-stream) Red Hat strikes a crushing blow against RHEL downstreams (https://www.theregister.com/2023/06/23/red_hat_centos_move/) IBM/Red Hat Sparks Anger at GPL ‘breach' as RHEL Source Locked Up (https://devops.com/rhel-gpl-richixbw/) Rocky Strikes Back At Red Hat (https://hackaday.com/2023/06/30/rocky-strikes-back-at-red-hat/) The Suicide Attempt by Red Hat [Opinion] (https://news.itsfoss.com/red-hat-fiasco/) Rant about Red Hat's Licensing Change for REHL (https://youtube.com/watch?v=4fAq6AphRn0&feature=share) Reddit Reddit CEO tells employees that subreddit blackout “will pass” (https://www.theverge.com/2023/6/13/23759559/reddit-internal-memo-api-pricing-changes-steve-huffman) Apollo's Christian Selig explains his fight with Reddit — and why users revolted (https://www.theverge.com/2023/6/13/23759180/reddit-protest-private-apollo-christian-selig-subreddit) Reddit doubles down (https://www.platformer.news/p/reddit-doubles-down?utm_medium=email) Hackers threaten to leak 80GB of confidential data stolen from Reddit (https://techcrunch.com/2023/06/19/hackers-threaten-to-leak-80gb-of-confidential-data-stolen-from-reddit) DevOps Second Wave DevOps (https://www.systeminit.com/blog-second-wave-devops/) Kelsey Hightower Predicts How the Kubernetes Community Will Evolve (https://thenewstack.io/kelsey-hightower-predicts-how-the-kubernetes-community-will-evolve/) Kelsey Hightower Retires (https://twitter.com/kelseyhightower/status/1673366087541600256?s=20) Even the best rides come to an end featuring Kelsey Hightower (https://changelog.com/friends/6) (Podcast) Stack Overflow Developer Survey 2023 (https://survey.stackoverflow.co/2023/) Relevant to your Interests AWS teases mysterious mil-spec ‘Snowblade' server (https://www.theregister.com/2023/06/07/aws_snowblade_military_edge_server/) To fill offices, Google issues ultimatum while Salesforce tries charity (https://www.washingtonpost.com/business/2023/06/08/google-salesforce-return-to-office/) Amazon is pursuing 'too many ideas' and needs to focus on best opportunities (https://www.cnbc.com/2023/06/07/amazon-is-pursuing-too-many-ideas-bernstein-says-in-open-letter.html) There are better places for Amazon to put their capital to work, says Bernstein's Mark Shmulik (https://www.youtube.com/watch?v=j9Z2HeYkl4c) The best password managers for 2023 | Engadget (https://www.engadget.com/best-password-manager-134639599.html?guccounter=1&guce_referrer=aHR0cHM6Ly9uZXdzLmdvb2dsZS5jb20v&guce_referrer_sig=AQAAAIYHiHrsIv_lVu8RNqY46BjFzlgU4pFDBXmk1gQxq2wlQOz02b5tuepColb1KJFoYYwQVWy2SjTUKWVY2oAEMzfkYXlXs97_PE0gpwNUA4RjnDwE_YEm7FB323M9oOBQJNHboj1t77QC9HriDL8cJP-VcplJ5UlJvvwHZRzMn9PC) After a Rocky Year, Zuckerberg Lays Out Meta's Road Map to Employees (https://www.nytimes.com/2023/06/08/technology/mark-zuckerberg-meta.html) Hybrid combines the worst of office and remote work (https://world.hey.com/dhh/hybrid-combines-the-worst-of-office-and-remote-work-d3174e50) Twilio to sell ValueFirst business to Tanla (NYSE:TWLO) 
(https://seekingalpha.com/news/3978773-twilio-to-sell-valuefirst-business-to-tanla) Jeff Bezos Has Gained $10 on Mystery Purchase of One Amazon Share (https://www.bloomberg.com/news/articles/2023-06-09/billionaire-jeff-bezos-just-bought-one-share-of-amazon-and-no-one-knows-why#xj4y7vzkg) CNET's Free Shopping Extension Saves You Time and Money. Give It a Try Today (https://www.cnet.com/tech/services-and-software/use-cnet-shopping-to-seek-out-the-best-deals/) Modular: Our launch & what's next (https://www.modular.com/blog/our-launch-whats-next) Exclusive-Broadcom set to win EU nod for $61 billion VMware deal, sources say (https://finance.yahoo.com/news/exclusive-eu-antitrust-regulators-okay-091426470.html) Amazon is reportedly trying to offer Prime subscribers free cell phone service | Engadget (https://www.engadget.com/amazon-is-reportedly-trying-to-offer-prime-subscribers-free-cell-phone-service-140026387.html) Cloud cost management startup CloudZero lands $32M investment (https://techcrunch.com/2023/06/12/cloud-cost-management-startup-cloudzero-lands-32m-investment/) Twitter stiffs Google (https://www.platformer.news/p/twitter-stiffs-google) Open Sourcing AWS Cedar Is a Game Changer for IAM (https://thenewstack.io/open-sourcing-aws-cedar-is-a-game-changer-for-iam/) Oracle beats on top and bottom lines as cloud revenue jumps (https://www.cnbc.com/2023/06/12/oracle-orcl-q4-earnings-report-2023.html) America to halt $68.7bn Microsoft takeover of Activision Blizzard (https://www.thetimes.co.uk/article/america-to-halt-68-7bn-microsoft-takeover-of-activision-blizzard-d80jvxm6f) Meta's Open-Source 'MusicGen' AI Is Like ChatGPT for Tunes (https://gizmodo.com/meta-open-source-musicgen-ai-like-chatgpt-for-music-1850528986) Google's return-to-office crackdown gets backlash from some employees: (https://www.cnbc.com/2023/06/13/google-rto-crackdown-gets-backlash-check-my-work-not-my-badge.html) Forrester Wave Integrated Software Delivery Platforms, Q2 2023 (https://www.forrester.com/blogs/the-forrester-wave-integrated-software-delivery-platforms-q2-2023-say-goodbye-to-the-devops-tax/) The economic potential of generative AI: The next productivity frontier (https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axioslogin&stream=top) 1 big thing: Where AI's productivity revolution will strike first (https://www.axios.com/newsletters/axios-login-da50d8f4-fb10-4952-af38-01163b9acbd3.html?chunk=0&utm_term=emshare#story0) For the first time in almost 30 years, a company other than IBM received the most US patents (https://finance.yahoo.com/news/first-time-almost-30-years-192900742.html) AMD stock pops on potential Amazon superchip deal, CEO bullishness (https://finance.yahoo.com/news/amd-stock-pops-on-potential-amazon-superchip-deal-ceo-bullishness-112819279.html) Amazon cloud services back up after big outage hits thousands of users (https://www.reuters.com/technology/amazon-says-multiple-cloud-services-down-users-2023-06-13/) Proven Practices for Developing a Multicloud Strategy | Amazon Web Services (https://aws.amazon.com/blogs/enterprise-strategy/proven-practices-for-developing-a-multicloud-strategy/) 40 photos from inside Metropolitan Park—the first phase 
of Amazon's HQ2 (https://www.aboutamazon.com/news/amazon-offices/amazon-headquarters-hq2-arlington-virginia-photos?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axioslogin&stream=top) The Forrester Wave™: Integrated Software Delivery Platforms, Q2 2023 (https://page.gitlab.com/forrester-wave-integrated-software-delivery-platforms-2023.html?utm_source=cote&utm_campaign=devrel&utm_content=newsletter20230615&utm_medium=email) AWS US-EAST-1 wobbled after Lambda management issues spread (https://www.theregister.com/2023/06/14/aws_us_east_1_brownout/) The store is for people, but the storefront is for robots (https://www.theverge.com/23753963/google-seo-shopify-small-business-ai) A Look Back at Q1 '23 Public Cloud Software Earnings (https://cloudedjudgement.substack.com/p/a-look-back-at-q1-23-public-cloud?utm_source=post-email-title&publication_id=56878&post_id=128805971&isFreemail=true&utm_medium=email) Apple Is Taking On Apples in a Truly Weird Trademark Battle (https://www.wired.com/story/apple-vs-apples-trademark-battle/) Apple Watch alerts 29-year-old Cincinnati woman to blood clot in lungs while sleeping (https://9to5mac.com/2023/06/19/apple-watch-blood-clot-sleeping/) Return to Office Enters the Desperation Phase (https://www.nytimes.com/2023/06/20/business/return-to-office-remote-work.html) Critical 'nOAuth' Flaw in Microsoft Azure AD Enabled Complete Account Takeover (https://thehackernews.com/2023/06/critical-noauth-flaw-in-microsoft-azure.html) What happened to Oracle? Why do they keep acquiring companies? (https://www.tiktok.com/t/ZT8JH8X5Y/) How an ex-Googler is reimagining the oldest computing interface of all (https://www.fastcompany.com/90907013/warp-terminal-command-line) WFH 4 ever (https://www.axios.com/2023/06/23/work-from-home-remote-workplace-trend) Databricks picks up MosaicML, an OpenAI competitor, for $1.3B (https://techcrunch.com/2023/06/26/databricks-picks-up-mosaicml-an-openai-competitor-for-1-3b/) Introducing LLaMA: A foundational, 65-billion-parameter language model (https://ai.facebook.com/blog/large-language-model-llama-meta-ai/?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axioslogin&stream=top) AI's next conflict is between open and closed (https://www.axios.com/newsletters/axios-login-e2a8f546-c6e2-421c-a7dc-0996d64bf312.html?chunk=0&utm_term=emshare#story0) Amazon is investing another $7.8B in Ohio-based cloud computing operations, (https://apnews.com/article/amazon-aws-ohio-data-center-investment-e35c8b726269b6b78ce05854f9f31d27) A new law protecting pregnant workers is about to take effect (https://www.axios.com/2023/06/22/pregnant-workers-fairness-act-2023-explain) Amazon launches AWS AppFabric to help customers connect their SaaS apps (https://techcrunch.com/2023/06/27/amazon-launches-aws-appfabric-to-help-customers-connect-their-saas-apps/?guccounter=1&guce_referrer=aHR0cHM6Ly9uZXdzLmdvb2dsZS5jb20v&guce_referrer_sig=AQAAAGcA6HN4Zti_4dKCpuMURoiAkkQ_uR0GBWFOG215KnmRsvryBDclj9SjWv-95R0yA0wFRXevcP-HUdwk-E3ZyR3d23rc5VGVCNXFGK5L3mAPvoEOJxRs6WZFKQvDUBIyw5V3NpdWGkkQ-fXDh4Rijfdp2l_ekJTxepVJjoYJSyKz) State of Kubernetes Cost Optimization Report (https://inthecloud.withgoogle.com/state-of-kubernetes-cost-optimization-report/dl-cd.html) FTC Request, Answered: How Cloud Providers Do Business (https://www.lastweekinaws.com/blog/ftc-request-answered-how-cloud-providers-do-business/) OrbStack · Fast, light, simple Docker & Linux on macOS (https://orbstack.dev/?ref=console.dev) Surprise! You Work for Amazon. 
(https://www.theatlantic.com/technology/archive/2023/06/amazon-hub-delivery-last-mile/674559/) btop - the htop alternative (https://haydenjames.io/btop-the-htop-alternative/) We Raised A Bunch Of Money (https://fly.io/blog/we-raised-a-bunch-of-money/) Twitter has stopped paying its Google Cloud bills (https://www.businessinsider.com/elon-musk-twitter-stopped-paying-google-cloud-bills-money-platformer-2023-6) Report: 2022 Microsoft Azure Revenue Less Than Estimated, Half That Of AWS | CRN (https://www.crn.com/news/cloud/report-2022-microsoft-azure-revenue-less-than-estimated-half-that-of-aws) Google Domains shutting down, assets sold and being migrated to Squarespace (https://9to5google.com/2023/06/15/google-domains-squarespace/) Is Waze next? (https://www.theverge.com/2023/6/27/23776329/google-waze-layoffs-ads) The real story of how Facebook almost acquired Waze, but we ended up with Google (https://post.news/@/noam/2RTRvTNNxSCQb3yNjqa0DPfr1Yk) Google killed its Iris augmented-reality smart glasses (https://www.businessinsider.com/google-ar-iris-augmented-reality-smart-glasses-2023-6) Who killed Google Reader? (https://www.theverge.com/23778253/google-reader-death-2013-rss-social) Mark Zuckerberg is ready to fight Elon Musk in a cage match (https://www.theverge.com/2023/6/21/23769263/mark-zuckerberg-elon-musk-fight-cage-match-worldstar) IBM to Acquire Apptio Inc., (https://newsroom.ibm.com/2023-06-26-IBM-to-Acquire-Apptio-Inc-,-Providing-Actionable-Financial-and-Operational-Insights-Across-Enterprise-IT) IBM Re-ups On FinOps With Its Apptio Acquisition (https://www.forrester.com/blogs/ibm-re-ups-on-finops-with-its-apptio-acquisition/) Nonsense Texas Bans Kids From Social Media Without Mom and Dad's Ok (https://gizmodo.com/texas-law-kids-social-media-ban-without-parents-consent-1850540419) Summer intern's commute goes viral: She flies from South Carolina to New Jersey (https://www.cnn.com/2023/06/15/business/tiktok-summer-intern-commute/index.html) Twitter evicted from office amid lawsuits over unpaid rent and cleaning bills (https://arstechnica.com/tech-policy/2023/06/judge-ruled-twitter-must-be-evicted-from-colorado-office-over-unpaid-rent/) Fishing crew denied $3.5M in prize money after 600-pound marlin DQ'd in tournament (https://nypost.com/2023/06/19/massive-marlin-dqd-in-big-rock-blue-marlin-tournament-over-mutilation/) 'World's Largest' Buc-ee's store opens (https://www.wyff4.com/article/bucees-world-largest-tennessee/44343171) now on Bus-ee's Map (https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwjgoKnr-vX_AhVslGoFHeeBBREQFnoECBgQAQ&url=https%3A%2F%2Fwww.google.com%2Fmymaps%2Fviewer%3Fmid%3D1IBCXZDU73Q5pjsDWVkoQ5O0GLoUd-bg%26hl%3Den&usg=AOvVaw3joznC0GgnH9dU-z_XGEw5&opi=89978449) Magic Mushrooms. LSD. Ketamine. The Drugs That Power Silicon Valley. (https://www.wsj.com/articles/silicon-valley-microdosing-ketamine-lsd-magic-mushrooms-d381e214) 'Fueled by inflation': USPS stamp prices are increasing soon. Here's what to know. 
(https://www.usatoday.com/story/money/2023/06/28/stamp-price-increase-usps/70363626007/) At least a year younger on paper: South Korea makes changes to age-counting law (https://www.usatoday.com/story/news/world/2023/06/28/south-korea-changes-age-counting-law/70363453007/) Sony just spilled confidential PlayStation information because of a Sharpie (https://www.theverge.com/2023/6/28/23777298/sony-ftc-microsoft-confidential-documents-marker-pen-scanner-oops) Australia legalises psychedelics for mental health (https://www.bbc.co.uk/news/world-australia-66072427) Listener Feedback Let's Get To The News | Craig Box | Substack (https://craigbox.substack.com/) When You Don't Have a Seat At the (Managed Database) Table (https://unskript.com/blog/when-you-don-t-have-a-seat-at-the-(managed-database)-table> Show more) by Doug Sillars Conferences August 8th Kubernetes Community Day Australia (https://community.cncf.io/events/details/cncf-kcd-australia-presents-kubernetes-community-day-australia-2023/) in Sydney, Matt attending. August 21st to 24th SpringOne (https://springone.io/) & VMware Explore US (https://www.vmware.com/explore/us.html), in Las Vegas. Explore EU CFP is open. Sep 6th to 7th DevOpsDays Des Moines (https://devopsdays.org/events/2023-des-moines/welcome/), Coté speaking. Sep 18th to 19th SHIFT (https://shift.infobip.com/) in Zadar, Coté speaking. October 6, 2023, KCD Texas 2023 (https://community.cncf.io/events/details/cncf-kcd-texas-presents-kcd-texas-2023/), CFP Closes: August 30, 2023 Jan 29, 2024 to Feb 1, 2024 That Conference Texas CFP Open 6/1 - 8/21 (https://that.us/call-for-counselors/tx/2024/) If you want your conference mentioned, let's talk media sponsorships. SDT news & hype Join us in Slack (http://www.softwaredefinedtalk.com/slack). Get a SDT Sticker! Send your postal address to stickers@softwaredefinedtalk.com (mailto:stickers@softwaredefinedtalk.com) and we will send you free laptop stickers! Follow us on Twitch (https://www.twitch.tv/sdtpodcast), Twitter (https://twitter.com/softwaredeftalk), Instagram (https://www.instagram.com/softwaredefinedtalk/), Mastodon (https://hachyderm.io/@softwaredefinedtalk), BlueSky (https://bsky.app/profile/softwaredefinedtalk.com), LinkedIn (https://www.linkedin.com/company/software-defined-talk/), TikTok (https://www.tiktok.com/@softwaredefinedtalk) and YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured). Use the code SDT to get $20 off Coté's book, Digital WTF (https://leanpub.com/digitalwtf/c/sdt), so $5 total. Become a sponsor of Software Defined Talk (https://www.softwaredefinedtalk.com/ads)! Recommendations Brandon: Cloudcast: MidYear 2023 Update (https://www.thecloudcast.net/2023/07/midyear-2023-update.html) Governments Building Software This Is What Happens When Governments Build Software - Odd Lots (https://omny.fm/shows/odd-lots/this-is-what-happens-when-governments-build-softwa) The Book I Wish Every Policymaker Would Read (https://www.nytimes.com/2023/06/06/opinion/ezra-klein-podcast-jennifer-pahlka.html) Tony Hsieh and the Emptiness of the Tech-Mogul Myth (https://www.newyorker.com/news/our-columnists/tony-hsieh-and-the-emptiness-of-the-tech-mogul-myth) (via Coté's newsletter) Coté: Hand Mirror app (https://handmirror.app), also in Setapp (https://setapp.com) if you have that. If Books could Kill (https://www.patreon.com/IfBooksPod) Photo Credits Header (https://unsplash.com/photos/5yuRImxKOcU) Artwork (https://www.freepnglogos.com/images/linux-22615.html)
It's rare for all of a company's users to turn against it in such an intense way, but Reddit has managed to alienate all of their users at once. After a recent announcement that the company will no longer be offering their API for third-party apps, users have closed ranks and made the site virtually unusable. This week guest-host Haje Kamps talks with TechCrunch Sr. Reporter Morgan Sung about whether there is room for compromise between the moderators who keep the site running, third-party apps that users love, and the profit-motivated Reddit management team.
Articles from the episode:
A whistleblower raised safety concerns about OceanGate's submersible in 2018. Then he was fired.
Volkswagen's breakthrough could spark a battery manufacturing gold rush
WhatsApp introduces feature to automatically silence calls from unknown numbers
Netflix launches website based on the fictional streaming service from 'Black Mirror'
Hundreds of subreddits plan to go dark indefinitely after Reddit CEO's internal memo
Reddit communities adopt alternative forms of protest, as the company threatens action on moderators
Hackers threaten to leak 80GB of confidential data stolen from Reddit
The TechCrunch Podcast posts every Friday. Subscribe on Apple, Spotify or wherever you listen to podcasts to be alerted when new episodes drop. Check out the other TechCrunch podcasts: Equity, Found and Chain Reaction.
It's 5:05 on Thursday, June 22nd, 2023. From the source podcast network in New York City. This is Pokie Huang. Stories in today's episode come from Edwin Kwan in Sydney, Australia, Katy Craig in San Diego, California, Hillary Coover in Washington, DC, Olimpiu Pop in Transylvania, Romania and Marcel Brown in St. Louis, Missouri. Let's get to it! Apple releases a patch addressing three zero-day vulnerabilities
FU Podcast - E173 - June 20, 2023 Ladies and gentlemen, welcome to another exhilarating episode of the FU Podcast! I'm your host, Devin Ewart, and tonight we have an incredible lineup of stories and discussions to keep you engaged and entertained. Joining me are the dynamic trio of hosts: the knowledgeable and charismatic Patrick Martin, and the ever-entertaining Andy Henning. Local News: Excessive heat warning issued for Denton County Power outages caused by powerful storms in Texas and Oklahoma Innovative tiny home village in Texas offering affordable land rental Flower Mound ranked as the second-best place to live in the U.S. National Headlines: Fishing crew disqualified from $3 million prize due to a controversial ruling Harrowing video of a deputy sheriff's underwater journey after being sucked into a drain pipe Mysterious disappearance of a submarine that tours the Titanic wreckage Unconventional artist specializing in "crappy paintings" Archaeological discovery of a 6-inch stone penis from medieval times Business News: Hackers threatening to leak 80GB of stolen confidential data from Reddit Prominent vaccine scientist's unsettling experience of being stalked after a Twitter exchange with Joe Rogan Discussion on best and worst Father's Day gifts Stories from individuals with absent or difficult fathers Exploring what dads actually want for Father's Day FU Podcast - We Invite Our Listeners To Become Guests LIVE ON AIR!! To interact with our LIVE shows and see the episodes you might have missed go to: https://www.facebook.com/Frustration.Unlimited.Podcast Your Hosts: Devin G Ewart Patrick Martin Andy Henning To Donate to the Show: https://paypal.me/FUPodcast Find Us On: Facebook: Frustration.Unlimited.Podcast Twitter: FrustrationU Instagram: Frustration_Unlimited Youtube: https://dge.dgemedia.com/FU-Podcast Twitch: FU_Podcast Patreon: FUPodcast Listen to ALL of our shows on: Amazon Alexa: https://dge.dgemedia.com/FU-Amazon Apple iTunes: https://dge.dgemedia.com/FU-Apple Spotify: https://dge.dgemedia.com/FU-Spotify iHeartMedia: https://dge.dgemedia.com/FU-iHeart Google: https://dge.dgemedia.com/FU-Google Stitcher: https://dge.dgemedia.com/FU-Stitcher Breaker: https://dge.dgemedia.com/FU-Breaker Castbox: https://dge.dgemedia.com/FU-Castbox Anchor: https://dge.dgemedia.com/FU-Anchor Audible: https://dge.dgemedia.com/FU-Audible Radio Public: https://dge.dgemedia.com/FU-Radio Pocket Cast: https://dge.dgemedia.com/FU-Pocket Podchaser: https://dge.dgemedia.com/FU-Podchaser Listen Notes: https://dge.dgemedia.com/FU-Listen Castro: https://dge.dgemedia.com/FU-Castro Podcast Addict: https://dge.dgemedia.com/FU-Addict Player FM: https://dge.dgemedia.com/FU-Player Listen App: https://dge.dgemedia.com/FU-ListenApp Coming from the great state of TEXAS!!! This Podcast Is For The People, By The People Who Are Frustrated With All The Issues Going On In The World… --- Support this podcast: https://podcasters.spotify.com/pod/show/frustration-unlimited/support
Hackers threaten to leak 80GB of confidential data stolen from Reddit; Reddit communities adopt alternative forms of protest; The latest in startup business news
Today's topics include:
Wave of insolvencies among German startups
Funding for AI startups down 49% in 2023
Hacker group threatens to publish 80GB of stolen Reddit data
Gen Z leads in game installs and in-app purchases
Deutsche Bahn turns to AI in Berlin and Hamburg for more punctual trains
Air taxi developer Lilium sells 100 jets to China
HelloFresh opens its first automated production facility in Europe
LinkedIn: growth and future plans of the career network
Health apps gaining popularity among Germans
HYGH: DOOH company secures 21.5 million euros
AI could make people more dependent on technology and cause more stress, PwC reports that video games and esports revenue in the US is forecast to reach $72bn by 2027, Meta Platforms releases Voicebox, hackers threaten to release 80GB of confidential data stolen from Reddit, You.com CEO Richard Socher raises $45m, Rever aims to modernize refunds and returns, Twitter users report family-friendly ads next to user-posted clips of an antisemitic film, and Microsoft confirms that its June Outlook outage was a Distributed Denial-of-Service attack.
OpenAI just rollicked the AI world yet again yesterday — while releasing the long awaited ChatGPT API, they also priced it at $2 per million tokens generated, which is 90% cheaper than the text-davinci-003 pricing of the “GPT3.5” family. Their blogpost on how they did it is vague: Through a series of system-wide optimizations, we've achieved 90% cost reduction for ChatGPT since December; we're now passing through those savings to API users. We were fortunate enough to record Episode 2 of our podcast with someone who routinely creates 90%+ improvements for their customers, and in fact has started productizing their own infra skills with Codeium, the rapidly growing free-forever Copilot alternative (see What Building “Copilot for X” Really Takes). Varun Mohan is CEO of Exafunction/Codeium, and he indulged us in diving deep into AI infrastructure, compute-optimal training vs inference tradeoffs, and why he loves suffering. Recorded in-person at the beautiful StudioPod studios in San Francisco. Full transcript is below the fold.

Timestamps
* 00:00: Intro to Varun and Exafunction
* 03:06: GPU Efficiency, Model Flop Utilization, Dynamic Multiplexing
* 05:30: Should companies own their ML infrastructure?
* 07:00: The two kinds of LLM Applications
* 08:30: Codeium
* 14:50: “Our growth is 4-5% day over day”
* 16:30: Latency, Quality, and Correctability
* 20:30: Acceleration mode vs Exploration mode
* 22:00: Copilot for X - Harvey AI's deal with Allen & Overy
* 25:00: Scaling Laws (Chinchilla)
* 28:45: “The compute-optimal model might not be easy to serve”
* 30:00: Smaller models
* 32:30: Deepmind Retro can retrieve external information
* 34:30: Implications for embedding databases
* 37:10: LLMOps - Eval, Data Cleaning
* 39:45: Testing/User feedback
* 41:00: “Users Is All You Need”
* 42:45: General Intelligence + Domain Specific Dataset
* 43:15: The God Nvidia computer
* 46:00: Lightning round

Show notes
* Varun Mohan Linkedin
* Exafunction
* Blogpost: Are GPUs Worth it for ML
* Codeium
* Copilot statistics
* Eleuther's The Pile and The Stack
* What Building “Copilot for X” Really Takes
* Copilot for X
* Harvey, Copilot for Law - deal with Allen & Overy
* Scaling Laws
* Training Compute-Optimal Large Language Models - arXiv (Chinchilla paper)
* chinchilla's wild implications (LessWrong)
* UL2 20B: An Open Source Unified Language Learner (20B)
* Paper - Deepmind Retro
* “Does it make your beer taste better”
* HumanEval benchmark/dataset
* Reverse Engineering Copilot internals
* Quora Poe
* Prasanna Sankar notes on FLOPs and Bandwidth
* NVIDIA H100 specs - 3TB/s GPU memory, 900GB/s NVLink Interconnect
* Optimizer state is 14x size of model - 175B params => 2.5TB to store state → needs at least 30 H100 machines with 80GB each
* Connor Leahy on The Gradient Podcast

Lightning Rounds
* Favorite AI Product: Midjourney
* Favorite AI Community: Eleuther and GPT-J
* One year prediction: Better models, more creative usecases
* Request for Startup: Superathlete Fitness Assistant
* Takeaway: Continue to tinker!

Transcript
[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, swyx, writer, editor of L Space Diaries.[00:00:20] swyx: Hey, and today we have Varun Mohan from Codeium / Exafunction on. I should introduce you a little bit because I like to get the LinkedIn background out of the way.[00:00:30] So you did CS at MIT and then you spent a few years at Nuro where you were ultimately tech lead manager for autonomy. And that's an interesting dive. 
Self-driving cars in AI and then you went straight into Exafunction with a few of your coworkers and that's where I met some of them and started knowing about Exafunction.[00:00:51] And then from out of nowhere you cloned GitHub Copilot. That's a lot of progress in a very short amount of time. So anyway, welcome.[00:00:59] Varun Mohan: That's high praise.[00:01:00] swyx: What's one thing about you that doesn't appear on LinkedIn that is a big part of what people should know?[00:01:05] Varun Mohan: I actually really like endurance sports actually.[00:01:09] Like I, I've done multiple triathlons. I've actually biked from San Francisco to LA. I like things that are like suffering. I like to suffer while I, while I do sports. Yeah.[00:01:19] swyx: Do you think a lot about like code and tech while you're doing those endurance sports or are you just,[00:01:24] Varun Mohan: your mind is just focused?[00:01:26] I think it's maybe a little bit of both. One of the nice things about, I guess, endurance athletics, It's one of the few things you can do where you're not thinking about, you can't really think about much beyond suffering. Like you're climbing up a hill on a bike and you see like, uh, you see how many more feet you need to climb, and at that point you're just struggling.[00:01:45] That's your only job. Mm-hmm. Yeah. The only thing you can think of is, uh, pedaling one more pedal. So it's actually like a nice, a nice way to not think about work. Yeah,[00:01:53] Alessio Fanelli: yeah, yeah. Maybe for the audience, you wanna tell a bit about Exafunction, how that came to be and how Codeium came out[00:01:59] Varun Mohan: of that. So a little bit about Exafunction.[00:02:02] Before working at Exafunction, I worked at Nuro as Sean was just saying, and at Nuro, I sort of managed large scale offline deep learning infrastructure. Realized that deep learning infrastructure is really hard to build and really hard to maintain for even the most sophisticated companies, and started Exafunction to basically solve that gap, to make it so that it was much easier for companies.[00:02:24] To serve deep learning workloads at scale. One of the key issues that we noticed is GPUs are extremely hard to manage fundamentally because they work differently than CPUs. And once a company has heterogeneous hardware requirements, it's hard to make sure that you get the most outta the hardware. It's hard to make sure you can get, get great GPU utilization and Exafunction was specifically built to make it so that you could get the most outta the hardware.[00:02:50] Make sure your GPU was effectively virtualized and decoupled from your workload to make it so that you could be confident that you were running at whatever scale you wanted without burning the bank.[00:03:00] swyx: Yeah. You gave me this metric about inefficiency,[00:03:03] Varun Mohan: right? Oh, okay. Like flop efficiency. Yeah. Yeah. So basically, I think it comes down to, for most people, one of the things about CPUs that's really nice is with containers, right?[00:03:13] You can end up having a single machine. You can place many containers on it and all the containers will slowly start eating the compute. It's not really the same with GPUs. Like let's say you have a single GPU. For the most part, you'll only have one container using that GPU. And because of that, people heavily underestimate what a single container can sort of do.[00:03:33] And the GPU is left like heavily idle. 
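To make the "heavily idle" point concrete, here is a back-of-envelope sketch of what fraction of a GPU's peak compute a single low-batch generation workload actually uses, which is the quantity the conversation turns to next as model FLOP utilization. All numbers are illustrative assumptions, not Exafunction's measurements.

```python
# Back-of-envelope FLOP utilization for serving a decoder-only model.
params = 7e9                  # 7B-parameter model (assumed)
flops_per_token = 2 * params  # ~2 FLOPs per parameter per generated token (forward pass)
tokens_per_sec = 1500         # assumed aggregate decode throughput at small batch size
achieved_flops = flops_per_token * tokens_per_sec

peak_flops = 312e12           # A100 dense FP16/BF16 peak, roughly 312 TFLOP/s
mfu = achieved_flops / peak_flops
print(f"MFU ≈ {mfu:.1%}")     # ≈ 6.7%: most of the GPU's compute sits idle
```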
And I guess the common term now with a lot of LLM workloads is like the flop efficiency of these workloads. MFU, yeah. Yeah. Model flop utilization. The model flop utilization, which is basically like what fraction of the flops or compute on the hardware is actually getting used.[00:03:49] And sort of what we did at Exafunction. Not only make it so that the model was always running, we also built compiler technology to make it so that the model was also running more efficiently. And some of these things are with tricks like operator fusion, like basically you could imagine fusing two operations together such that the time it takes to compute[00:04:07] the fused operation is lower than the time it takes for each individual operation. Oh my God. Yeah.[00:04:13] Alessio Fanelli: Yeah. And you have this technique called dynamic multiplexing, which is basically, instead of having a one-to-one relationship, you have one GPU for multiple clients. And I saw one of your customers, they went from three clients to just one single GPU and cut the cost by 97%.[00:04:29] What were some of those learnings, seeing hardware usage and efficiencies and how that then played into what, what[00:04:34] Varun Mohan: you're building? Yeah, I think it basically showed that there was probably a gap with even very sophisticated teams. Making good use of the hardware is just not an easy problem. I think that was the main I, it's not that these teams were like not good at what they were doing, it's just that they were trying to solve a completely separate problem.[00:04:50] They had a model that was trained in-house and their goal was to just run it and it, that should be an easy, easy thing to do, but surprisingly still, it's not that easy. And that problem compounds in complexity with the fact that there are more accelerators now in the cloud. There's like TPUs, Inferentia, and there's a lot of decisions, uh, that users need to make even in terms of GPU types.[00:05:10] And I guess sort of what we had was we had internal expertise on what the right way to run the workload was, and we were basically able to build infrastructure and make it so that companies could do that without thinking. So most[00:05:21] Alessio Fanelli: teams are underutilizing their hardware. How should they think about what to own?[00:05:26] You know, like should they own the inference architecture? Like should they use XLA to get it to production? How do you think[00:05:32] Varun Mohan: about it? So I think one thing that has proven to be true over the last year and a half is companies, for the most part, should not be trying to figure out what the optimal ML architecture is or training architecture is.[00:05:45] Especially with a lot of these large language models. We have generic models and transformer architecture that are solving a lot of distinct problems. I'll caveat that with most companies. Some of our customers, which are autonomous vehicle companies, have extremely strict requirements like they need to be able to run a model at very low latency, extremely high precision recall.[00:06:05] You know, GPT-3 is great, but the precision recall, you wouldn't trust someone's life with that, right? So because of that, they need to innovate new kinds of model architectures. For a vast majority of enterprises, they should probably be using something off the shelf, fine-tuning BERT models. 
If it's vision, they should be fine tuning, resonant or using something like clip like the less work they can do, the better.[00:06:25] And I guess that was a key turning point for us, which is like we start to build more and more infrastructure for the architectures that. The most popular and the most popular architecture was the transformer architecture. We had a lot of L L M companies explicitly reach out to us and ask us, wow, our GT three bill is high.[00:06:44] Is there a way to serve G P T three or some open source model much more cheaply? And that's sort of what we viewed as why we were maybe prepared for when we internally needed to deploy transform models our.[00:06:58] Alessio Fanelli: And so the next step was, Hey, we have this amazing infrastructure. We can build kind of consumer facing products, so to speak, at with much better unit economics, much better performance.[00:07:08] And that's how code kind[00:07:10] Varun Mohan: of came to be. Yeah. I think maybe the, the play is not maybe for us to be just, we make a lot of consumer products. We want to make products with like clear ROI in the long term in the enterprise. Like we view code as maybe one of those things. Uh, and maybe we can, we can talk about code maybe after this.[00:07:27] We. Products like co-pilot as being extremely valuable and something that is generating a lot of value to professionals. We saw that there was a gap there where a lot of people probably weren't developing high intensive L L M applications because of cost, because of the inability to train models the way they want to.[00:07:44] And we thought we could do that with our own infrastructure really quickly.[00:07:48] swyx: I wanna highlight when you say high intensive, you mean basically generate models every key, uh, generate inferences on every keystroke? That's[00:07:55] Varun Mohan: right. Yeah. So I would say like, there's probably two kinds of L l M applications here.[00:07:59] There's an L L M application where, you know, it rips through a bunch of data and maybe you wait a couple minutes and then you see something, and then there's an application where the quality is not exactly what you want, but it's able to generate enough, sorry, low enough latency. It's still providing a ton of value.[00:08:16] And I will say there's like a gap there where the number of products that have hit that co-pilot spot is actually not that high. Mm. A lot of them are, are kind of like weight and, you know, just generate a lot of stuff and see what happens because one is clearly more compute intensive than the other Basically.[00:08:31] swyx: Well co uh, I don't know if we told the whole story yet, you were going to[00:08:35] Varun Mohan: dive into it. . Yeah, so I guess, I guess the story was I guess four or five months ago we sort of decided internally as a team we were like very early adopters of co-pilot. I'm not gonna sit here and say co-pilot, it's not a great tool.[00:08:45] We love co-pilot. It's like a fantastic tool. We all got on the beta. The moment it came out we're like a fairly small T, but we, like we all got in, we were showing each other completions. We end up writing like a lot of cuda and c plus plus inside the company. And I think there was probably a thought process within us that was like, Hey, the code we write is like very high aq.[00:09:04] You know? So like there's no way it can help. And one of the things in c plus plus that's like the most annoying is writing templates. Writing template programming is maybe one of those things. 
no one — well, maybe a few people in the C++ standards community — can do without looking anything up online. But we struggle. We struggle writing variadic templates, and Copilot just ripped through it. We had a 500-line file and it was just writing templates; we didn't even really test it while we were writing it — we just compiled it and it worked. We were like, wow, this is actually something that's not just completing for loops; it's completing code for us that is hard for our brains to reach but fundamentally and logically is not that complicated. The only reason it's complicated is that there are just a lot of rules, right? And from then on we were like, wow — that was maybe the first LLM application for us internally, because we're not marketers who would use, say, Jasper — where we were like, wow, this is extremely valuable. This is not a toy anymore. So we wanted to use our technology to build apps that were not going to be toys — not a demo you post on Twitter, there's hype, and then a month later no one's using it.
[00:10:11] swyx: There was a report this morning from Copilot where they were estimating the amount of code generated by Copilot that is then kept in code repos and checked in, and it's something like 60 to 70%.
[00:10:24] Varun Mohan: That's nuts, but I totally believe it given the stats we have too. There's this flip in your head once you start using products like this: in the beginning there's skepticism — how valuable can it be? — and then suddenly user behavior fundamentally changes, so that now when I need to write a function, I'm documenting my code more because I think it prompts the model better, right? There's this crazy self-fulfilling prophecy where, as you get more value from it, more of your code is generated by Copilot.
[00:10:50] swyx: Just to walk through the creation process — I actually assumed you would have grabbed your data from The Pile, which is the EleutherAI open source dataset, but apparently you scraped your own stuff.
[00:11:01] Varun Mohan: Yeah, we ended up basically using a lot of permissively licensed code from the public internet, mainly because The Pile is a fairly small subset. I think The Stack maybe came to be after we started, but for us, we had a model for ourselves even before that — that was the point. Ah, okay — so the timing was just a little bit off. Yeah, exactly, exactly. But it's awesome work. It seems like there's a good amount of work getting done decentrally, which is a little bit surprising to me, because I'm more bullish on everyone getting together in a room to make stuff happen — we're all in person in Mountain View. But it's pretty impressive. Eleuther in general — everything they've done, I'm pretty impressed with.
[00:11:42] swyx: And we're going to talk about that, because I didn't know you were that involved in the community that early on.
[00:11:45] Varun Mohan: I wasn't involved. It was more that I was watching and maybe commenting from time to time. But they're a very special community for sure.
[00:11:52] swyx: Yeah, that's true. My impression is: a bunch of you are geniuses, you sit down together in a room, you get all your data, you train your model — everything's very smooth sailing. What's wrong with that image?
[00:12:02] Varun Mohan: Yeah, probably a lot of it is just that a lot of our serving infrastructure was already in place before then. So we were able to knock off one of those boxes that I think a lot of other people struggle with. The open source serving offerings are, I will say, not great, in that they aren't customized to transformers and these kinds of workloads, where I want to batch requests while keeping latency low. One of the weird things about generation models is that they're autoregressive, at least for the time being. So the latency for a generation is a function of the number of tokens you actually end up generating — that's just the math. And while you're generating those tokens, unless you batch, you're not going to get great flop utilization on the hardware. So there are a bunch of trade-offs here where, if you use something completely off the shelf — one of these serving frameworks — you're going to end up leaving a lot of performance on the table.
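A toy model of the serving trade-off being described — my own illustration with made-up numbers, not Codeium's serving code. At batch size 1, decoding is dominated by streaming the weights from memory, so latency grows linearly with the number of generated tokens and flop utilization stays tiny; batching amortizes the weight reads across requests.

params = 3e9             # a 3B-parameter code model (illustrative)
peak_flops = 312e12      # A100 BF16 peak
mem_bandwidth = 2.0e12   # ~2 TB/s HBM bandwidth
bytes_per_param = 2      # fp16 weights

# Batch size 1: every generated token has to read all the weights once,
# so decoding is memory-bandwidth bound rather than compute bound.
time_per_token = params * bytes_per_param / mem_bandwidth
for n_tokens in (10, 100, 500):
    print(f"{n_tokens:>3} tokens -> ~{n_tokens * time_per_token * 1e3:.0f} ms")

# Batching B requests reuses the same weight reads, so achieved flops (and
# utilization) grow roughly with B until the GPU becomes compute bound.
for batch in (1, 8, 32):
    achieved = 2 * params * batch / time_per_token   # decode flops per second
    print(f"batch={batch:>2}  flop utilization ~ {achieved / peak_flops:.1%}")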
But for us, we were already kind of prepared to do that because of the infrastructure we had built up. And probably the other thing to note is that early on we were able to leverage open source models and bootstrap things internally, but then to ship, we had some requirements — we wanted the model to have fill-in-the-middle capabilities, and a bunch of other things — and we were able to ship a model ourselves. We timed it so that over the course of multiple months, different pieces were working out properly for us. So it wasn't like we started out and were just planning the launch materials; the moment we started, some stuff was already there, and some stuff we had already figured out, like how to train models at scale internally. We were able to leverage that muscle very quickly.
[00:13:41] swyx: I think the one thing you had figured out from the beginning was that it was going to be free forever. Copilot costs $10 a month.
[00:13:47] Varun Mohan: Copilot costs $10 a month — and I would argue it provides significantly more value than $10 a month. The important thing for us, though, was that we are going to continue to build more great products on top of code completion. We think code completion is maybe day one of what the future looks like. And for that, clearly we can't be a product that says "we're $10 a month and we're adding more products." We want a user base that loves using us and will continue to stay with us as we layer on more products. I'm sure we'll get more users from those other products, but we needed some sort of differentiator. And along the way we realized, hey, we're pretty efficient at running these workloads — we could probably do this.
[00:14:23] swyx: Oh, so it wasn't a plan to be free from the start. You just realized —
[00:14:25] Varun Mohan: Yeah. We realized that if we cut things down and optimized heavily, we could probably do this properly. Part of the reasoning was that we were confident we could build a pro tier and go to the enterprise. But originally, when we started, it wasn't "we're just going to give all pieces of software away for free" — that wasn't the goal.
[00:14:43] swyx: And since you mentioned adoption and traction — what can you disclose about user growth and user adoption?
[00:14:50] Varun Mohan: Yeah. Right now we probably have over 10,000 users and thousands of daily actives, and people come back day over day. Our growth is around four to five percent day over day, and all of it is word of mouth. That's fundamentally because the product is one of those products where, even if you use Copilot and then use us, it's honestly hard to tell the difference. And a lot of our users have actually churned off of Copilot.
[00:15:14] swyx: I switched, yeah — to support you guys, but also to try it out.
[00:15:17] Varun Mohan: Yeah, exactly. So the crazy thing is, it wasn't "hey, we're going to figure out a marketing motion for reaching people who have never heard of Copilot and get a bunch of users." We wanted to get users so that, in our own right, we're a really great product. And we've spent a lot of engineering time — obviously we co-wrote a blog post with you, Sean, on this — because there's a lot of engineering work, even beyond the latency, in getting the cost down to make a product like this actually work.
[00:15:44] swyx: Yeah, that's a long tail of stuff that you referenced, right?
[00:15:47] Varun Mohan: Yes, exactly.
[00:15:48] swyx: And you said something to the order of — and this maybe gets into "Copilot for X," which everybody is keen about because they see the success of Copilot. They think, okay: first of all, in developer tools there's more to do here, and second of all, take the Copilot idea and apply it to other disciplines. I don't know if you want to —
[00:16:06] Alessio Fanelli: There are some key points you touched on: how to estimate inference at scale, the latency versus quality trade-offs, and building on first-party models. This is free forever because you run your own models, right?
Varun Mohan: That's right.
Alessio Fanelli: If you were building on OpenAI, you wouldn't be able to offer it for free in real time. You know, when I first used Codeium, it was literally the same speed as Copilot.
[00:16:29] swyx: It's a little bit faster? I don't know how to quantify it.
[00:16:31] Varun Mohan: We are faster. But it's one of those things we're not going to market as the reason, because — I'm just going to be open with you — it's not, in and of itself, a reason for you to suddenly turn off Copilot. If our answers were trash but we were faster, you know what I mean?
[00:16:46] Alessio Fanelli: But your focus was there. We used the alpha — I think Prem on our Discord came to us and said, you guys should try this out — and it was really fast.
Even then, prompt optimization is another big thing, and model outputs and UX, and how you bring them together. Which one or two of these should new founders really think about first?
[00:17:07] Varun Mohan: Yeah, my feeling on this is that you should probably almost always bootstrap on top of an existing API. The only reason we didn't is that we knew this product was actually buildable — if we worked hard enough to train a model, we would be able to build a great product. But if you're going out and trying to build something from scratch, unless you genuinely believe "I need to fine-tune on top of terabytes of data" — and a terabyte is a very large amount of data; even tens of gigabytes is a lot — you should probably build on top of an API and spend most of your time making sure you can hit that quality–latency trade-off properly. If I think about the three categories of an LLM product, they're probably latency, quality, and correctability. The reality is, if I take a product like Copilot or Codeium, the latency is very low, the quality I think is good enough for the task, and the correctability is very easy. What is correctability? Correctability means: let's say the quality is not there — consider the case where the answer is wrong. How easy is it for your user to go and leverage parts of the generation anyway? Maybe a concrete example: there's a lot of excitement right now about "I write a comment and it generates a PR for me." That's really awesome in theory — I think it's a really cool thing, and I'm sure at some point we'll get there. For what it's worth, that will probably require an entirely new model, trained on diffs and commits and other things that capture improvements to code; it's probably not going to be trained just on generic code. But the problem with those applications is: suppose the change touches many files and makes a large number of modifications. First of all, it's guaranteed not to feel fast, because even just reviewing the change takes a long time. And if the quality and the correctability aren't there — say you had a 10-file change, and the modifications to files two and four were consistent but the other eight files were not — then suddenly the correctability is really hard. It's hard to correct the output of the model. So the user interface is absolutely important, but until you get the latency down, or the correctability a lot better, it's probably not going to be shippable. And I think that's what you have to spend your time focusing on: can you deliver a product that users actually want to use? This is why I was talking about demos. It's very easy to handpick something that works for a demo, and exceedingly hard to get something with large scope, like a PR, to work consistently.
It will take a lot of engineering effort to make it work on small enough chunks that a user says, wow, this is generating value for me. Because eroding user trust — consumer trust — is very easy; it's much easier to erode than enterprise trust. So just be mindful of that, and I think that's the mantra most of these companies need to operate under.
[00:20:05] Alessio Fanelli: Have you done any analysis on what the ratio between code generated and latency is? You can generate one line, but you could also generate a whole block, or a whole class, and the more you generate, the more time it takes. What's the sweet spot you found?
[00:20:21] Varun Mohan: Yeah, there was a great study about Copilot that came out — I'm not sure if it's possible to link it. Basically what it said was that there are two modes developers are usually in when they work with a code assistant: what's called acceleration mode and exploration mode. Exploration mode is when you don't even know what the solution space for the function is, and you just want to generate a lot of code because you don't know what it looks like — it might use some API you've never heard of. What you're actually doing at that point is writing a clean comment and wishing and praying that the generation is long enough and gets you far enough, right? Acceleration mode is when you're doing things you're very confident about, and effectively the tool gives you that muscle so you can stay in a flow state. You're not thinking about exactly what the APIs look like — push comes to shove, you would figure it out — but mentally it takes a load off your head: oh wow, I can just do this. The distance from intent to execution is just a lot lower. And you want a tool that captures that. We have heuristics for telling whether you're in acceleration versus exploration mode. A good heuristic: say you're inside a basic block of code, or inside an if statement — you're probably already in acceleration mode, and you would feel really bad if I started generating the else clause, because what happens if that else clause is really wrong? That creates mental load, because the way programmers think, they want to complete the if statement first, if that makes sense. So there are things where we're mindful of how many lines we generate. If you use the product, multi-line generations do happen, and we're happy to do them, but we don't want to do them when we think it's going to increase the load on developers, if that makes sense.
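A hypothetical sketch of the kind of heuristic being described — Codeium's actual heuristics are not public, so the rules below are invented purely to illustrate the idea: if the cursor sits inside an unfinished block or a just-started if statement, treat the user as being in acceleration mode and keep the suggestion short; otherwise allow a longer, exploratory completion.

def max_suggestion_lines(prefix: str) -> int:
    """Decide how many lines to suggest based on where the cursor sits."""
    open_blocks = prefix.count("{") - prefix.count("}")
    last_line = prefix.rstrip().splitlines()[-1] if prefix.strip() else ""
    starting_control_flow = last_line.lstrip().startswith(("if ", "if(", "for ", "while "))

    if open_blocks > 0 or starting_control_flow:
        # Acceleration mode: the user knows what they are writing; don't
        # speculate a whole else-branch and add mental load.
        return 1
    # Exploration mode: e.g. a bare comment at top level; a longer multi-line
    # completion is more likely to be what the user is fishing for.
    return 10

print(max_suggestion_lines("int f(int x) {\n  if (x > 0) "))              # -> 1
print(max_suggestion_lines("// parse a config file and return a map\n"))  # -> 10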
[00:22:07] Alessio Fanelli: That makes sense. So, Copilot for X — what areas do you think are interesting for people to build in?
[00:22:13] Varun Mohan: Didn't we see a tweet recently about Harvey AI, the company that's selling a legal assistant? That's pretty impressive, honestly. That's very impressive. I would really love to see what the product looks like there, because there's a lot of text involved. You know, looking at Bing AI — it's pretty cool, but it seems like groundedness is something a lot of these products struggle with, and I assume in legal, if there's one thing you want to get right, it's the groundedness.
[00:22:42] swyx: Yeah. I've made the analogy before that law and legal language is basically just another form of programming language. You have to be that precise. Definitions must be made, and you can scroll to find the definition. It's the same thing.
[00:22:55] Varun Mohan: Yes. But then there's a question of comprehensiveness. Say the only way it generates a suggestion is by providing citations to other legal documents. You don't want it to miss things, so you need the comprehensiveness, but at the same time you don't want it to draw conclusions that aren't supported by the things it cites. So, I don't know — it's very impressive. It's clear they've demonstrated some amount of value, because they've been able to close a fairly sizable enterprise contract — a firm with 3,500 lawyers, something nuts, honestly. Very cool. So it's clear this is going to happen, and I think people will need to be clever about how they make it work within the constraints of whatever workload they're operating in.
[00:23:37] swyx: Also, you guys are so good at training stuff — why don't you try cloning it?
[00:23:39] Varun Mohan: Yeah, so I guess that's a preview of the roadmap... no, no, I'm just kidding. One of the things we genuinely believe as a startup is that most startups can't really do even one thing properly. Focus — doing one thing well — is really hard. Most companies that go public have maybe a couple of big products; they don't really have ten. So we're under no illusions: to give the best product experience, the amount of engineering and attention to detail it takes to build one good product is hard. It's probably going to be a while before we even consider leaving code — that would be a big step, because the amount of learning we'd need to do is high. We need to get users right; we've learned so much from our users already. So no, I don't think we'd go into law anytime soon.
[00:24:22] swyx: 3,500 lawyers, with Allen & Overy, is apparently the new —
[00:24:27] Varun Mohan: That's actually really big. Yeah. Congrats to them.
[00:24:29] swyx: Yeah, it's funny, because it seems like these guys are moving faster than Copilot. Copilot just announced Copilot for Teams — or Copilot for Enterprise — after like two years of testing.
[00:24:40] Varun Mohan: Yeah. It does seem like the Copilot team has built a very, very good product, so I don't want to say anything against them, but I think it is the case that startups will be able to move faster. I feel like that is true. But hey, GitHub has great distribution — whatever product they do have, they'll be able to sell it really well.
[00:24:56] swyx: Shall we go into model numbers and infra estimates? Our favorite topics.
[00:25:02] Varun Mohan: Nice small models.
[00:25:04] swyx: Nice. So this is relevant to — basically, I'm researching a lot of scaling law stuff. You have a lot of thoughts; you host paper discussions in your team.
[00:25:12] Varun Mohan: Yeah, we try to read papers that we think are really interesting and relevant to us. Recently there's just been a fire hose of papers, so even having someone curate which papers we should read internally as a company is valuable.
[00:25:28] swyx: I think there's so much good content out there — you guys should have a podcast. I've told you this before: just put a mic near where you're talking.
[00:25:34] Varun Mohan: We've got to keep developing Codeium, though. Put the discussion on a podcast? I feel like some of the thoughts are raw — they're not going to be as nuanced; we'll say something completely stupid during our discussions. I don't know, maybe that's exciting. It's kind of like a justin.tv, but for ML papers.
[00:25:55] swyx: Okay, cool, I'd watch that. Okay, so Copilot is 12 billion parameters, Salesforce CodeGen is up to 16, GPT-3 is 175, GPT-4 is "gonna be 100 trillion." So what we landed on with you, with Chinchilla, is that we now have an idea of what compute-optimal data scaling is, which is about 20 tokens per parameter. Is that intuitive to you? What did that unlock?
[00:26:18] Varun Mohan: I think what this basically shows is that bigger models are more data efficient: given the same number of tokens, a bigger model is going to learn more. But at the same time, it takes more flops to train a bigger model on the same number of tokens. So say I trained a 10 billion parameter model on a million tokens, and then a 20 billion parameter model on the same tokens — at the end of it, the bigger one will be a better model. It will have better perplexity numbers, meaning the probability it assigns to the next token is going to be better. But you also burned twice the amount of compute on it, right? So Chinchilla is an interesting observation, which says: if you have a fixed compute budget and you want the best model out of it — because a smaller model trained on the same number of tokens costs fewer flops — there's a sweet spot of number of tokens and model size. I will say people probably talk about it more than they should, and I'll explain why, but it's a useful result: if I have some compute budget and I want the best model, it tells me what I should train.
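The Chinchilla arithmetic referenced here is simple enough to write down. Using the common approximations of about 6·N·D training FLOPs for N parameters and D tokens, and a compute-optimal ratio of roughly 20 tokens per parameter, a back-of-the-envelope sketch (my own numbers, for illustration) also shows the serving caveat raised next — the compute-optimal model for a big budget stops fitting on a single 80 GB card:

def compute_optimal(c_flops: float, tokens_per_param: float = 20.0):
    # C ~ 6 * N * D with D = r * N  =>  N = sqrt(C / (6 * r))
    n_params = (c_flops / (6.0 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

for budget in (1e21, 1e22, 1e23, 1e24):
    n, d = compute_optimal(budget)
    fp16_gb = n * 2 / 1e9   # bytes for fp16 weights alone, ignoring KV cache and activations
    print(f"C={budget:.0e}  ->  {n/1e9:5.1f}B params, {d/1e9:6.0f}B tokens, "
          f"~{fp16_gb:.0f} GB of fp16 weights (vs. 80 GB on one A100)")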
The problem, I think, is that there's a real trade-off: you do need to run this model somewhere, on a piece of hardware, so it comes down to how much memory that hardware has. Say that for a fixed compute budget the optimal thing is a 70 billion parameter model — what are you going to put that on? Could you put it on an 80-gig A100? It would be a stretch. You could do things like int8 or FP8 quantization to reduce the amount of memory used on the box, and all these other things, but you have to think about that first, before you go train that model. The worst case is that you train the model and you cannot serve it. So what you actually find is that a lot of these code completion models are what you would consider over-trained. By that I mean: look at a model like CodeGen — and I could be wrong by a hundred billion tokens here or there — take the roughly 3 billion parameter model; I think it's actually 2.7 billion parameters. It's a bit unusual because they also trained on natural language on top of code, but it's trained on hundreds of billions of tokens. If you applied the Chinchilla optimization to it, you'd say, wow, this is a stupid use of compute — for roughly 3 billion parameters they should be going to about 60 billion tokens, and for anything beyond that they should have just increased the model size. But the reality is that the compute-optimal model might not be one that's easy to serve, right? It could just have more parameters. And for our case, the models we train internally might not be the most compute-optimal either. In other words, we probably could have had a better model by making it larger, but the trade-off would have been latency. We know what the impact of higher latency is, and on top of that, fitting properly within our hardware constraints would also have been a concern.
[00:29:08] swyx: Isn't the classic stopping point when you see the loss level off? Right now you're letting Chinchilla tell you when to stop, rather than just looking at loss.
[00:29:16] Varun Mohan: The problem is the loss will just continue to go down — in a way that's not that pleasing. It takes longer and longer, and it's painful. But it's one of those things where, if you look at the perplexity difference between, say, a 70 billion and a 10 billion parameter model, it's not massive — it's not tens of percentage points; it's very small. The reality is that this comes down to the IQ of these models in some sense: small wins at the margins are massive wins in terms of IQ. They're harder to get and they don't look big, but they're massive wins in terms of reasoning — the model can now do chain of thought, all these other things.
[00:29:55] swyx: And that apparently gets unlocked around 20 billion parameters.
[00:29:57] Varun Mohan: Yes, that's right — some kind of magic. I think that was from UL2 or maybe one of those papers. Any thoughts on why? I don't know — emergence of intelligence, I think. And maybe one of the things is that we don't even know whether, five years from now, what we're running will still be transformers. We don't 100% know that that's true.
I mean, there are maybe a lot of issues with the current version of transformers — the way the attention layers work, the amount of compute is quadratic in the context size, because you're doing an n-squared operation over the attention blocks, basically. And obviously one of the things everyone wants right now is infinite context; they want to shove as much prompt as possible in. The current shape of the transformer is maybe not ideal for that — you might just end up burning a lot of flops when there are probably more efficient ways of doing it. So I'm sure in the future there will be tweaks to this. But it is interesting that we've found out things like: bigger is pretty much always better. There are probably also ways of making smaller models significantly better through better data — that is definitely true. One of the cool things The Stack showed was a set of ablation studies: what happens if we decontaminate our data, what happens if we de-duplicate, what happens if we do near-deduplication, and how does the model get better? They have some compelling results showing that data quality really matters. But ultimately, yes, it's an interesting result that something is happening at 20 billion — and some of these things may look materially different in the future than they do right now.
[00:31:31] Alessio Fanelli: Do you think the token limitation is actually a real architectural limitation? If you think about the context you need, once you have 50,000 tokens of context, 50,000 versus infinite is basically the same for most use cases. Where do you think that number is — especially for code, where some people have very large code bases? Have you done any work to figure out where the sweet spot is?
[00:31:55] Varun Mohan: Yeah, look, I think what's going to end up happening is that people will come up with a clever way around it. There was some research that I believe came out of Stanford — I think from the HELM group — with an architecture that looks a little different from transformers, and I'm sure something like that will work in the future. What I think is always going to happen is: if you find a cheap way to embed context, people will figure out a way to put as much as possible in, because LLMs so far have been virtually stateless. The only thing they have beyond fine-tuning is shoveling everything you can into the context. And there are some interesting papers — like RETRO — some interesting ideas that have come out recently. Yeah, let's go through them. So one of the really interesting ideas, I think, is RETRO. It's a paper that came out of DeepMind, and the idea is: you send in a prompt, and you compute the BERT embedding of that prompt. Then you have this massive embedding database — and by massive I'm not talking about gigabytes, I'm talking about terabytes. You actually have ten times the number of tokens as what was used to train the model.
So say you had a model trained on a trillion tokens — then you'd have a 10-trillion-token embedding database. Obviously Google can do this, because they have all the content that has ever existed and one of the best datasets, so they were able to build one of these embedding databases. But the idea, which is really cool, is that you take your prompt, compute the BERT embedding, and find the things that are nearby — roughly a semantic search, an embedding search, over that database. Then you take the documents behind those embeddings and shove them into the model too, through what's called chunked cross-attention. Suddenly the model can take in external information, which is really exciting, because now you're getting dynamic context in, and the model is, in some sense, deciding what that context is. It's not deciding it completely — in this case the BERT model was frozen; it wasn't trained along with the RETRO model — but the idea is that you're somehow adding or augmenting context, which I think is quite exciting. There are probably two futures. Either context becomes really cheap — right now it's quadratic; maybe there's a future where it becomes linear in the size of the context — or the future is that the model itself dictates: "I have this context, you have this data source, give me this." The model goes out into your database and says, "I want this information," and that gets augmented into the context. That's kind of what Bing chat is starting to look like — there's probably some model saying "I want this information." The model knows what context it has and can build a state machine of what it needs. That's probably what the future of this looks like.
[00:34:37] swyx: So you predict monster embedding database companies?
[00:34:39] Varun Mohan: Probably monster embedding database companies, yeah. The model in some sense will need to talk to these embedding databases. I'm actually not convinced the current breed of embedding database companies is ready for what that future looks like — just looking at their pricing, how much it costs per gigabyte, it's prohibitive at the scale we're talking about. Say you actually did want to host a 10-terabyte embedding database: a lot of these companies were created two or three years ago, when embedding databases were small and they needed to make the cost economics work. Maybe there will be a big workload there. I'll just say that for us, we will probably build this in-house to start with, because I don't think the technology is there yet, and waiting for point solutions to come along is a lot harder than building it ourselves.
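RETRO itself feeds the retrieved chunks into the network through chunked cross-attention layers; a much simpler version of the same idea — embed the prompt, find nearest-neighbour chunks, and just prepend them to the context — can be sketched as below. The embedding model and the toy corpus are my own assumptions for illustration, not anything from the episode.

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in for RETRO's frozen BERT retriever

corpus = [
    "The A100 GPU has 80 GB of HBM memory.",
    "Chinchilla suggests roughly 20 training tokens per parameter.",
    "NVLink provides much higher GPU-to-GPU bandwidth than PCIe.",
]
corpus_emb = embedder.encode(corpus, normalize_embeddings=True)   # (n_docs, dim)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = corpus_emb @ q                      # cosine similarity on normalized vectors
    return [corpus[i] for i in np.argsort(-scores)[:k]]

def augmented_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(augmented_prompt("How much memory does an A100 have?"))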
The way I like to think about this is that the LLM space probably looks like the early internet days, where I think the value accrued mostly to Google, and Google needed to figure out all the crazy things to make their workload work. The reason they couldn't outsource it is that no one else was feeling the pain.
[00:35:46] swyx: They were just solving their own pain points. They're so far ahead of everyone else.
[00:35:50] Varun Mohan: Yes — and you just wait for people to catch up. That's maybe different from how something like Snowflake looks, where the interface — what SQL looks like — was decided 50 years ago, and because of that you can go out and build the best database, and everyone will say "this doesn't make my beer taste better" and buy your database, basically.
[00:36:08] swyx: That's a great reference, by the way. We have some friends of the pod that are working on embedding databases, so we'll try to connect you with Chroma and see.
[00:36:14] Varun Mohan: Oh, I actually know Anton — I worked with him at Nuro.
[00:36:20] swyx: There you go. So what do you think about Chroma pivoting towards an embedding database?
[00:36:22] Varun Mohan: I think it's an interesting idea. I wonder what the early set of workloads they'll hit are, and what the scaling requirements are. This is maybe the classic thing: the team is great, but you need to pick the workload you care about the most. When you're an infrastructure company you could build anything — if I were selling serving infrastructure, I could build serving for linear regression — but unless you hit the right niche for the end user, it's going to be tough. So I'm excited to see what comes out, and if it's great, we'll use it.
[00:36:54] swyx: I also like how you slowly equated yourself to Google there.
Varun Mohan: Oh, we're not Google.
swyx: You're going to be the Google of AI.
[00:37:00] Varun Mohan: We're definitely not Google. I was just talking about the style of companies that came out of that era. Absolutely. Or maybe we should live on the cutting edge of the future. I think that's the pitch.
[00:37:10] Varun Mohan: Okay, thanks for pitching us.
[00:37:13] Alessio Fanelli: So you just mentioned that the older vector embedding solutions are kind of not made for the LLM generation of compute size. What does LLMOps look like? Which pieces need to be drastically different, and which ones can we recycle?
[00:37:27] Varun Mohan: Yeah. One of the things we've found in building Codeium is just how much is missing — and this is the thing where I don't know how much of it you can really outsource. We needed to build eval infrastructure. That means: how do you build a great eval for code? There are things online like HumanEval — which is the benchmark; I was telling Sean about this. The idea of HumanEval is really neat for code: you provide a bunch of functions with docstrings, and the eval, instead of being "did you predict the next token," is "did you generate the entire function, and does the function run correctly against a bunch of unit tests?" Right.
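The mechanics of a HumanEval-style check are worth seeing: execute a generated function and run it against unit tests, then report the fraction of problems that pass. This is a stripped-down sketch of the idea, not Codeium's internal eval — and a real harness would sandbox execution, since generated code is untrusted.

def passes_tests(generated_code: str, test_code: str) -> bool:
    env: dict = {}
    try:
        exec(generated_code, env)   # define the candidate function
        exec(test_code, env)        # asserts raise if the candidate is wrong
        return True
    except Exception:
        return False

candidate = """
def add(a, b):
    return a + b
"""

tests = """
assert add(1, 2) == 3
assert add(-1, 1) == 0
"""

results = [passes_tests(candidate, tests)]      # one result per benchmark problem
print(f"pass rate = {sum(results) / len(results):.0%}")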
And we've built more sophisticated evals that work across many languages and a wider variety of code bases. One of the issues that comes up with things like HumanEval is contamination: a lot of the groups training models end up training on all of GitHub, and GitHub itself contains HumanEval, so they end up training on it. The contaminated fraction is tiny, but that doesn't matter, because the model will just remember it — it's basically like mixing your training and validation sets. We've seen cases online where someone says "we did this one thing and HumanEval jumped a ton," and we're just like, huh — did HumanEval get into your dataset? Is that really what happened there? So we've needed to build all this eval ourselves. And what it's shown is that data cleaning is massive, but data cleaning looks different by vertical: what a high-quality piece of code is, is probably different from what a high-quality legal document is. And then on top of that: how do you eval it, and how do you train it at scale at whatever cost you want? Those are things the end user either needs to solve or someone else needs to solve for them. I guess one thing I'm a little bearish on is: if another company comes out and "solves eval" properly for a bunch of different verticals, what were the companies they're selling to really doing at that point, if they couldn't even eval their own workload? There are cases — like code, for us — where we probably couldn't outsource our eval; we wouldn't be able to ship models internally if we didn't know how to eval. But it's clear there are a lot of different pieces people need to take care of — maybe there's an embedding piece; how large does that embedding database actually need to be? — and it does look very different from what classic MLOps did.
[00:39:47] Alessio Fanelli: How do you compare some of these models? When you're thinking about model upgrades and changes, what does the testing piece look like internally for you?
[00:39:56] Varun Mohan: For us, it's old-school A/B testing. We've built infrastructure to ramp users up from 1% to 10% to 50% and slowly roll things out. This is all classic software.
swyx: Which you do in-house — you don't buy any services?
Varun Mohan: We don't buy services for that. There are good open source services that help; we just don't need them. That's just not the most complicated thing for us, basically.
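"Old-school A/B testing" with a gradual ramp can be as simple as hashing each user id into a stable bucket, so the same user always sees the same variant while the rollout percentage is raised from 1 to 10 to 50. A sketch of the idea — not Codeium's actual infrastructure:

import hashlib

def in_rollout(user_id: str, experiment: str, percent: float) -> bool:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000        # stable bucket 0..9999 per user
    return bucket < percent * 100                # percent=1 -> buckets 0..99

users = [f"user-{i}" for i in range(100_000)]
for pct in (1, 10, 50):
    share = sum(in_rollout(u, "new-model-v2", pct) for u in users) / len(users)
    print(f"target {pct:>2}% -> actual {share:.1%}")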
But in the future, maybe — obviously we use things like Google Analytics and all this other stuff, but for ramping our models and finding out whether they're actually better, the eval doesn't tell the whole story, because even before generating the prompt we do a lot of work. The only way to know that it's really good across all the languages our users need is for our users to tell us it's actually good — and they tell us by accepting completions.
[00:40:44] swyx: GitHub Copilot's extension does this thing where they set a timer, and then within five minutes, ten minutes, twenty minutes, they check in to see if the code is still there. I thought that was a pretty creative way.
[00:40:54] Varun Mohan: It's honestly a very creative way. We do do things to see, over the longer term, whether people accepted or kept things that are roughly the same — because they could accept and then change their minds. So we're mindful of things like that. But for the most part, the most important metric is: at the time, did we actually generate value? And we want to know if that's true.
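The "is the accepted completion still there a few minutes later" signal can be approximated by diffing the inserted text against the buffer at later checkpoints. This is a purely hypothetical sketch of that kind of telemetry, not how either Copilot or Codeium actually implements it:

import difflib

def retention(inserted: str, buffer_now: str) -> float:
    """Fraction of an accepted completion still present in the buffer (fuzzy match)."""
    matcher = difflib.SequenceMatcher(None, inserted, buffer_now)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(inserted), 1)

accepted = "for (int i = 0; i < n; ++i) {\n    sum += values[i];\n}"
buffer_5_min_later = "for (int i = 0; i < n; ++i) {\n    total += values[i];\n}"

print(f"retained after 5 min: {retention(accepted, buffer_5_min_later):.0%}")
# In production you would schedule checks at, say, 5/10/20 minutes after acceptance
# and aggregate retention over hundreds of thousands of completions.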
And it's honestly really hard to get signal unless you have a non-trivial amount of usage — non-trivial meaning hundreds of thousands of completions, if not millions. That may sound like a very small amount, but it's classic: when I was an intern at Quora, more than seven or eight years ago, I shipped a change — and Quora had millions of daily actives — and it looked like it was good, and then a week later it was just way worse. How is this possible? "In a given hour we get hundreds of thousands of interactions" — no, you just need way more data. So this is one of those things where having users is genuinely very valuable to us. Users is all you need.
[00:41:59] swyx: By the way, since you brought up Quora — have you tried Poe? Any thoughts on Poe?
[00:42:03] Varun Mohan: I have not actually tried Poe.
[00:42:05] swyx: I mean, it seems like a question-answering website that's been around for 20 years or so would be very good at question answering.
[00:42:12] Varun Mohan: Also Adam, the CEO, is incredibly brilliant — that guy is insanely smart — so I'm sure they're going to do well.
[00:42:18] swyx: They have accidentally built the perfect data collection company for QA.
[00:42:22] Varun Mohan: Yeah. It takes a certain kind of person to go and cannibalize your original company like that — I mean, it was kind of stagnant for a few years.
swyx: That's probably true. The observation is: I feel like you have a bias towards domain-specific models, whereas most research is skewed towards general-purpose models. I don't know if there's a deeper insight here you want to go into or not, but the research says train on all the things, get all the data — and you're saying no, no, no, everyone needs a customized, per-task dataset.
[00:42:49] Varun Mohan: Yeah. I'm not going to say that general intelligence is not good. You want a base model that's still really good, and that's probably trained on normal text — a lot of different content. But I think one thing that old-school machine learning showed — even though I'm the kind of person who says a lot of old-school machine learning is just going to die — is that training on a high-quality dataset for your workload is always going to yield better and more predictable results. We're under no illusions about that, basically.
[00:43:19] swyx: And then the other observation is bandwidth and connectivity, which is not something people usually think about, but apparently it's a big deal — apparently synchronous training needs high GPU coordination. These are from the deleted notes from Sam Altman talking about how they think about training, and I was like, oh yeah, that's an insight — and you guys have the same thing.
[00:43:34] Varun Mohan: Yeah. So for training, you're right that it's actually nuts to think about how insane the networks are for NVIDIA's most recent hardware. For the H100 boxes, you put eight of these H100s in a node, and between two nodes the bandwidth is 3,200 gigabits a second — 400 gigabytes a second between machines. That's nuts when you sit and think about it: it's like double the memory bandwidth of a CPU, but between two machines. On top of that, within the machine they've created this fabric called NVLink that lets devices communicate at ultra-low latency — even lower than PCIe, if you're familiar, which is the protocol between the CPU and other devices. All of this is to make sure that reductions are fast and low latency and you don't have to think about them. And that's because a lot of deep learning training has evolved to be synchronous. In the OG days there was a lot of analysis of how good asynchronous training is — hey, I have a node, it has the current state of the model, it updates itself locally, and every once in a while it goes to another machine and updates the weights — but everyone has converged on synchronous. I'm not exactly sure; there's not a lot of good research on asynchronous training right now, or maybe there is and I haven't read it. There just isn't as much, because people say "synchronous works," and the hardware keeps getting upleveled to handle it.
[00:44:59] swyx: It was just unintuitive to me, because the whole point of GPUs is to train a lot of things in parallel.
[00:45:05] Varun Mohan: The crazy thing is — maybe I can give some dumb math here. Let's go with GPT-3, which is about 175 billion parameters. The optimizer state while you're training is something like 14 times the size of the model, so in this case that's probably around 2.5 terabytes just to store the optimizer state. That has got to be sharded across a lot of machines — that is not a single GPU. Even with an H100 with 80 gigs, just to shard that much you need something like 30 to 40 of them.
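That back-of-the-envelope is easy to reproduce. The 14-bytes-per-parameter multiplier below just follows the figure used in the conversation (exact counts vary with the mixed-precision and optimizer scheme):

params = 175e9
bytes_per_param_training = 14     # weights + gradients + Adam moments, per the 14x figure above
hbm_per_gpu = 80e9                # A100/H100 80 GB

training_state = params * bytes_per_param_training
print(f"training state: ~{training_state / 1e12:.2f} TB")
print(f"80 GB accelerators just to hold it: ~{training_state / hbm_per_gpu:.0f}")

# The 3,200 Gbit/s (= 400 GB/s) inter-node figure quoted above means shipping one
# full copy of the fp16 weights between nodes takes on the order of:
weights_fp16 = params * 2
print(f"fp16 weight transfer at 400 GB/s: ~{weights_fp16 / 400e9:.2f} s")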
So there's something there where these things need to communicate with each other too.
[00:45:44] swyx: You need to vertically scale horizontally.
[00:45:46] Varun Mohan: Yeah, you've got to co-locate them. The ideal programming paradigm is that you feel like you have one massive computer with no communication overhead at all, but with infinite compute and infinite memory bandwidth.
[00:45:59] swyx: That's the AI cluster. Okay, well, we want to head to the questions.
[00:46:05] Alessio Fanelli: So: favorite AI product that you are not building?
[00:46:08] Varun Mohan: Yeah, I'm friends with some of the folks at Midjourney and I really think the Midjourney product is super cool — especially seeing how the team is iterating and how the quality of the generations consistently gets upleveled. I think it's quite neat, and internally at Exafunction we've been trying out Midjourney for random content, to generate images and stuff.
[00:46:26] swyx: Does it bother you that they have a style? It seems like they're locking themselves into a particular look — you want Midjourney art, you go there.
[00:46:33] Varun Mohan: Yeah, it's a brand of art. You're right, I think they do have a style, but it seems more predictably good for that style.
swyx: Okay, so just get good at a domain-specific thing.
Varun Mohan: Yeah, maybe. Maybe I'm just talking my own book right now.
[00:46:46] swyx: Okay. Next question: favorite AI people and communities?
[00:46:48] Varun Mohan: Yeah, I think I mentioned this before, but obviously the OpenAI folks are insane — we only have respect for them. But beyond that, I think Eleuther is a pretty special group. It's now probably more than a year and a half since they released GPT-J, back when the open comparison point was GPT-3 Curie. And it wasn't a model that wasn't good — it was comparable in terms of perplexity to GPT-3 Curie, and it was trained by a university student, actually. It just showed that, in the end, pedigree is great, but if you have people who are motivated, know how computers work, and are willing to get their hands dirty, you can do crazy things. That was a crazy project that gave me more hope about decentralized training potentially being pretty massive. It was a very cool thing where a bunch of people just got on Discord, were chatting, and were able to turn this out.
[00:47:42] swyx: I did not know this until I looked further into Eleuther, but it was not a formal organization — not a company, not a startup. It's a bunch of guys on Discord.
[00:47:48] Varun Mohan: They got a TPU research grant and they somehow just wrote some code.
[00:47:52] Alessio Fanelli: Yeah. Listen to the podcast with Connor, who's the person behind it. Basically OpenAI at the time was saying "we cannot release GPT-2 because it's too good and too dangerous," and he actually said he was sick, so he couldn't leave home for a few weeks, and it was like, what else am I gonna do? And ended up...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Sequencing Intro, published by jefftk on August 29, 2022 on LessWrong. I've been working in computational bio for a couple months now and I've been learning a lot. There's still a ton I don't know, but I'm currently at a stage where I've put some pieces together while still remembering what it was like not to understand them, which is often a good time to try to write introductory stuff. Trying to explain things is also a good way of making sure I understand them myself. So, here's a dump of what I've been learning: The biological world primarily stores information with nucleic acids. These are series of nucleotides, often called bases: A, C, G, and T. For example, a strand of nucleotides could look like: The two main kinds of nucleic acid are DNA and RNA. They differ in a few ways, but from a computational perspective they're very similar. Physical RNA will have U instead of T, though in sequencing data you'll often see it with T anyway. Each base has a complement: A bonds with T, and G with C. A nucleic acid that comprises two bonded strands is called double stranded. Each base in one strand will be bonded to its complement: CCGACTCTGTCACGGGTCTAGCAATGTGGTAAGCA This is the famous double helix and makes for a more stable structure than single stranded nucleic acid. Going from a physical nucleic acid to a sequence on a computer is sequencing, and the reverse is synthesis. I'm only going to talk about the former; I haven't learned much about the latter. The most common sequencing method today is Next Generation Sequencing, commonly called Illumina sequencing after the main vendor. Bases are dyed and the machine reads their colors. The output of sequencing is a large number of short reads. Each read is a sequence of 50-300 bases, usually around 150. In setting up the sequencing run you choose how many bases to read, and different applications will make the most sense with different lengths. Accuracy drops off as you read farther along the strand. Note the lengths we're talking about are way less than the length of a full nucleic acid, which is generally at least thousands of bases. Not getting the full picture is a big downside of this kind of sequencing. Let's get some real data to play with. When people publish a paper that depends on sequencing they generally upload their raw data to the NIH's National Center for Biotechnology Information (NCBI). Here's a paper I've been looking at recently, which sequenced wastewater: RNA Viromics of Southern California Wastewater and Detection of SARS-CoV-2 Single-Nucleotide Variants. If you look down to the "Data availability" section, you'll see: Raw sequencing data have been deposited on the NCBI Sequence Read Archive under accession number PRJNA729801, and representative code can be found at. The GitHub link is helpful for getting metadata (what does each sample represent?) and understanding how they processed it (what tools did they use and how?), but for now we're looking for sequencing data. The accession number is "PRJNA729801", and while we could click through and download it from the NCBI, the user interface on the European mirror (European Nucleotide Archive) is much better.
We go to their landing page and enter the accession number: This takes us to a page that describes the data: We want to sanity check the title to make sure we didn't end up with the wrong data set, and "Metatranscriptomic sequencing of Southern California wastewater" sounds about right. Scrolling down there are links: We could download all of this data, but it would be about 80GB compressed. For now, let's just download a single fastq.gz file, at ~150MB: SRR14530724_1.fastq.gz. These files are generally both very large and very repetitive, so they're a natural candidate for compression. The most common option is gzip, and that's what they've us...
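Once a fastq.gz file like the one above has been downloaded, you can peek at it without decompressing it to disk. FASTQ stores each read as four lines — an id line, the bases, a "+" separator, and per-base quality scores — and the complement rules from earlier (A pairs with T, G with C) make a reverse complement a one-liner. A minimal sketch, assuming the file sits in the working directory:

import gzip

complement = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(complement)[::-1]

with gzip.open("SRR14530724_1.fastq.gz", "rt") as fh:
    for _ in range(2):                       # look at the first two reads
        header = fh.readline().strip()
        bases = fh.readline().strip()
        _plus = fh.readline()                # the "+" separator line
        quals = fh.readline().strip()
        print(header)
        print(f"  {len(bases)} bases, starting {bases[:40]}")
        print(f"  reverse complement starts {reverse_complement(bases)[:40]}")
        print(f"  quality string length matches bases: {len(quals) == len(bases)}")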
Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Sequencing Intro, published by jefftk on August 29, 2022 on LessWrong. I've been working in computational bio for a couple months now and I've been learning a lot. There's still a ton I don't know, but I'm currently at a stage where I've put some pieces together while still remembering what it was like not to understand them, which is often a good time to try to write introductory stuff. Trying to explain things is also a good way of making sure I understand them myself. So, here's an dump of what I've been learning: The biological world primarily stores information with nucleic acids. These are series of nucleotides, often called bases: A, C, G, and T. For example, a strand of nucleotides could look like: The two main kinds of nucleic acid are DNA and RNA. They differ in a few ways, but from a computational perspective they're very similar. Physical RNA will have U instead of T, though in sequencing data you'll often see it with with T anyway. Each base has a complement: A bonds with T, and G with C. A nucleic acid that comprises two bonded strands is called double stranded. Each base in one strand will be bonded to its complement: CCGACTCTGTCACGGGTCTAGCAATGTGGTAAGCA This is the famous double helix and makes for a more stable structure than single stranded nucleic acid. Going from a physical nucleic acid to a sequence on a computer is sequencing, and the reverse is synthesis. I'm only going to talk about the former; I haven't learned much about the latter. The most common sequencing method today is Next Generation Sequencing, commonly called Illumina sequencing after the main vendor. Bases are dyed and the machine reads their colors. The output of sequencing is a large number of short reads. Each read is a sequence of 50-300 bases, usually around 150. In setting up the sequencing run you choose how many bases to read, and different applications will make the most sense with different lengths. Accuracy drops off as you read farther along the strand. Note the lengths we're talking about are way less than the length of a full nucleic acid, which is generally at least thousands of bases. Not getting the full picture is a big downside of this kind of sequencing. Let's get some real data to play with. When people publish a paper that depends on sequencing they generally upload their raw data to the NIH's National Center for Biotechnology Information (NCBI). Here's a paper I've been looking at recently, which sequenced wastewater: RNA Viromics of Southern California Wastewater and Detection of SARS-CoV-2 Single-Nucleotide Variants. If you look down to the "Data availability" section, you'll see: Raw sequencing data have been deposited on the NCBI Sequence Read Archive under accession number PRJNA729801, and representative code can be found at. The GitHub link is helpful for getting metadata (what does each sample represent?) and understanding how they processed it (what tools did they use and how?), but for now we're looking for sequencing data. The accession number is "PRJNA729801", and while we could click through and download it from the NCBI, the user interface on the European mirror (European Nucleotide Archive) is much better. 
We go to their landing page and enter the accession number: This takes us to a page that describes the data: We want to sanity check the title to make sure we didn't end up with the wrong data set, and "Metatranscriptomic sequencing of Southern California wastewater" sounds about right. Scrolling down there are links: We could download all of this data, but it would be about 80GB compressed. For now, let's just download a single fastq.gz file, at ~150MB: SRR14530724_1.fastq.gz. These files are generally both very large and very repetitive, so they're a natural candidate for compression. The most common option is gzip, and that's what they've us...
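If you are following along at a terminal, here is a minimal sketch of peeking at the file once it has downloaded (assuming you grabbed SRR14530724_1.fastq.gz via the link on the ENA page; gunzip, head and wc are the only tools used):
$ gunzip -c SRR14530724_1.fastq.gz | head -4   # one read: header line, bases, '+', per-base quality scores
$ gunzip -c SRR14530724_1.fastq.gz | wc -l     # total lines; divide by 4 to get the number of reads
Streaming through gunzip -c means the uncompressed data never has to sit on disk, which matters once you move past a single ~150MB sample.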
Episode Notes Last week NetOne announced a price adjustment for its OneFi packages that would have seen 80GB going for ZWL$99,000, which was US$800 at the auction rate. After a day-long public outcry, the state-owned mobile network operator saw the light and reduced the prices to a more realistic range. It was one of the rare instances where social media pressure forced a state-run company to recant and put things right. TECHZIM'S MARKET APP HAS BEEN ZERO-RATED, NOW YOU DON'T NEED DATA TO BUY NETONE, TELECEL AIRTIME WITH MONEY IN YOUR ECOCASH. YOU CAN ALSO BUY NYARADZO POLICIES, TELONE PACKAGES AND MORE. CLICK THE FOLLOWING LINK TO DOWNLOAD THE APP - https://www.techzim.co.zw/download-techzim-app/ $99,000 for 80 GB?! NetOne's new data prices (February 2022) - https://www.techzim.co.zw/2022/02/99000-for-80-gb-netones-new-data-prices-february-2022/ US$800 for 80GB, NetOne's new One Fi prices are killing its MiFi use case - https://www.techzim.co.zw/2022/02/us800-for-80gb-netones-new-one-fi-prices-are-killing-its-mifi-use-case/ Netone reverses price hikes after public outcry. How much should they charge? - https://www.techzim.co.zw/2022/02/netone-reverses-price-hikes-after-public-outcry-what-should-they-charge/ Find out more at http://www.techzim.co.zw This podcast is powered by Pinecast.
Double Tap Canada is back to save the planet, with a look at upcycling old phones and just what Steven has bought that makes him an eco-warrior… After a few technical issues with the intro music and a group therapy session on what's annoying the team this week, it's on to Steven's retro tech purchases. He's incredibly happy to have picked up an iPod Classic with its 80GB storage, click wheel interface and 30-pin connector. There's no denying it's a blast from the past but there is one problem: it's not accessible. Of course, Steven's solution to this is to also buy a 2012 MacBook Air. Has he gone mad, or is he really an eco-warrior hero? Staying with old tech, it's on to a look at Samsung's Galaxy Upcycling project. Although still in its beta stage, Samsung announced it will be releasing updates to some of its older smartphones, which will allow them to be used as smart home sensors. Once you've updated your old device, you can either use it as a sound sensor that will notify you when it hears a certain sound - such as a baby crying or dog barking - or an ambient light sensor that can turn lights off or on if it detects certain light levels. Although it's great to see old tech re-purposed like this, we hope Samsung will add more features, such as motion detection via the camera, etc. In Amazon news, some sellers on the platform have been removed due to allegations of trying to solicit positive reviews from people by offering full refunds on products as long as they leave a video or text review. Have you ever experienced this, and can we really trust product reviews? The popular audio-only app Clubhouse has finally come to Android. Well, as long as you live in the U.S., that is. It won't be long until it's available to Android users all over the world, but is it too late? Steven thinks so. In other audio news, Steven and Shaun have been trying out the new Twitter Spaces. Just how does this compare to Clubhouse? Finally, the show wraps up with a chainsaw attack, news on answering calls with just your voice and how Marc uses Air Tags to cheat at hide and seek. It's a tech show … honest.
Log Horizon is delayed until January, and Shaman King is coming back. Crunchyroll's next original anime looks like it will be just as good as the current Tower of God. Goku in the style of Ghibli. The Seek dropped on us by surprise. PS4 consoles were being sold very cheaply, but so many buyers piled in that the sellers were sanctioned. PC requirements for Death Stranding: for all configurations, Windows 10, 8GB of RAM and 80GB of space for the game; beyond that you need at least a Core i5 4460 or Ryzen 5 1400, and at least a GTX 1050 or RX 560. More games in the Kingdom Hearts saga are on the way. Follow and interact with us on our social networks - Facebook, Instagram, Twitter and YouTube - you can find us on all of them as Lilium World. Shadow's accounts: YT: SHADOWSOUL. Insta: shadow_crosplay. Face: shadowcrosplay. You would help us a lot by sharing our content. Many thanks, we'll talk again soon, BYE.
Big Sounds Future Bass 2K20 is a must-have sample pack for the Future Bass genre. This sample pack brings you a step closer to the modern future bass sound. Inside you will find 5 construction kits; all sounds and samples work out of the box without you having to worry about processing them. All kits include loops of melodies, bass, chords, leads, pianos, FXs and vocal chops (including FX), and of course all drum hits and drum loops. As an extra bonus we also included the stems files. Load the stems in your DAW to achieve a quick structure/arrangement for every construction kit. Everything is clearly labelled and in key. We have included all instrumental loops in MIDI form as well as WAV for your own convenience and creative freedom. You will also find synth presets for the Serum and Spire VSTs. What's more, everything you get is royalty free! Buy once and use in any of your projects in as many combinations as you want. Compatible with all DAWs and usable in more projects and genres than you can imagine. Please note: to run the presets, Serum v1.214 and Spire v1.1.14 or higher are required. Pack Content: 1.80GB of content, 5x Construction Kits, 83x Loops, 76x Stem Files, 33x Midi Files, 35x Shot Samples, 14x Serum Presets, 4x Spire Presets
This week, Adrian and Buzz.ie digital editor Mark Kavanagh look in depth at the pros and cons of Google's latest flagship smartphone, the Pixel 4 XL. Adrian is disappointed with the phone's battery life and lack of an ultrawide camera, while Mark loves the phone's 90Hz refresh rate. Both admire Google's Assistant and Android 10. The two journalists also compare and contrast the new Pixel with the other top phones on the market, while Adrian lists his three favourite smartphones of 2019. Adrian and Mark also discuss the advantages and disadvantages of signing up to Eir's budget mobile operator, GoMo. Adrian thinks it's a no-brainer, even if Eir goes back on its word and doesn't keep the price at €10 per month for 80GB of data.
Covering plenty of gaming news in modern, retro and indie, as well as a nice list of new World Records for NES speedrunning. Some unique Kickstarters, Game Deals, and New Retro Games. The podcast can be watched LIVE every Sunday at 6:30-7PM EST at http://www.twitch.tv/old_world_gamer @1:57 SDK Reviews Bloodstained: Ritual of the Night @31:59 PS5 to Focus on "hardcore" consumer and Triple-A Titles @40:54 My Friend Pedro Moves 250K Copies in 1 Week @42:48 Google Stadia Games Will Not be Cheaper Than Console Games @48:23 Pokemon Masters Info @52:06 Zelda: Link's Awakening Needed Minimal Changes @57:45 Dr DisRespect Back, Twitch Fine with Law Breaking Streams @1:06:21 Blazing Chrome, Like Having a New Contra Game @1:08:32 Borderlands 3, Main Story Takes 35 Hours to Finish @1:13:15 NES Speedrun World Record Updates @1:20:20 OWG Explores Indie Games @1:39:08 Nintendo Thinking of Expanding Retro Library Beyond NES @1:42:22 Wind Waker Mod for Banjo-Kazooie, N64 Compatible @1:44:02 Clu Clu Land Joins Hamster's Arcade Archives Range on Switch @1:45:27 Play Your N64 on Modern Displays with Super 64 HDMI Adaptor @1:48:11 Mega Drive Mini JP, Collector's Edition Comes with 22 Tiny Carts @1:49:57 Sega Not Worried With Overwhelming Audience with Retro Content @1:51:30 Limited Run Games Releasing Physical NES and GB Star Wars Carts @1:53:58 Royal Hunt by Bimbarlade, New C64 Game, Why is He Nude? @1:57:50 Rade Blunner Ep. 2 by Iadvd, New ZX Spectrum Platformer, Needs More Gameplay @1:59:23 Dice Skater by Geir Straume and Sean Connolly, New Addictive C64 Game @2:02:02 Cruiser-X 79 by The New Dimension, New C64 Shmup @2:04:06 128GB PlayStation Classic Retro Console @2:06:25 Grelox by Sunteam, Beautiful Retro Game for PC, Free to Play!?! @2:08:35 Minit for C64 and SpectrumNext, Possible Physical Release @2:10:54 NEW C64 Full Size Release Coming December 2019 @2:14:41 Create Dizzy Games for ZX Spectrum with Dizzy Scripting Engine v1.1.0 @2:16:32 Bonus C64 Game Released with Mancave, Docster's Digger @2:19:02 Precinct 20: Dead Strange gets Release Date from Psytronik Software @2:22:23 Pokemon Sword & Shield Dev Understands Drama Over National Dex @2:24:49 Twitch Allows Subscriber-Only Streams, #OnlyAppreciateYour$$$ @2:30:35 Doom TV Show by Dave & James Franco....
About the Making of the Game @2:32:01 Tetris Royale, Mobile Tetris 99-like Royale Game @2:34:42 Bill & Ted 3: Face the Music Cast More Original Movie Characters @2:35:24 Cyberpunk 2077's Install Size is 80GB on PS4, Minimum @2:37:35 New Ni No Kuni is in the Works, Level-5 Confirms @2:38:22 SGDQ Raises over $3 Million @2:40:58 Blue Protocol, New PC Action-RPG by Bandai Namco @2:42:31 Gloomhaven gets Early Access Teaser Ahead of Launch @2:44:37 No Delivery, Procedural Horror RPG, Unsettling Demo @2:47:04 $80 HDMI Dongle to do Post-Processing for Your GPU? @2:52:18 Kickstarter: SkateBIRD @2:56:00 Kickstarter: AWAY @3:02:23 Kickstarter: Evan's Remains @3:07:05 Kickstarter: Orange Island @3:12:35 Closing, Shout Outs & Channel Host: PoisonCurls ====================================================== ~~~Follow Me!~~~ Twitch: http://www.twitch.tv/old_world_gamer Twitch: http://www.twitch.tv/scottdevonkane Game Channel: http://www.youtube.com/c/oldworldgamer Cooking Channel: http://www.youtube.com/user/stefanskitchen Vlog Channel: http://www.youtube.com/user/thisisusnl Twitter: http://www.twitter.com/owgamer Facebook: https://www.facebook.com/pages/Old-World-Gamer/335852309774703 Steam: OWGamer Instagram: https://www.instagram.com/oldworldgamer/ My Patreon: http://www.patreon.com/oldworldgamer ===================================================== Want to speed run? Check out http://www.speedrun.com I'm now a Humble Bundle Partner, feel free to purchase your games through my link and you can decide how much of your purchase goes toward my reviews and streams. https://www.humblebundle.com/?partner=OWGamer
Bandwidth explained from a software engineer's point of view. Bandwidth is measured by how many bits a device is allowed to send or receive in a second. It applies everywhere from your internal network, starting at the network card, all the way to your ISP's Internet speed. In this video we will discuss the definition of bandwidth, upload vs download speed, the different usage patterns for normal web browsing, streaming, gaming, media production, cloud web servers, etc., and finally we will show an example of the whole thing. When your ISP gives you 80Mbps download / 1Mbps upload bandwidth, this means your router (and all your devices) can receive (download) a total of 80 megabits in a second, and can send (upload) 1 megabit in a second. With your ISP you can pick and choose the plan that fits your usage.
Usage patterns
Web Browsing - Most people download far more than they upload, so download bandwidth matters more than upload bandwidth. For simple web browsing the incoming (download) traffic is slightly larger than the outgoing traffic: you make a GET request to pull index.html and you get back the HTML content. Yes, there are some images, but still not much incoming data.
Streaming - Think about it: when you watch a movie on Netflix you make a small request, usually a few hundred bytes, but the data you receive (download) is huge. So streaming is constant, large incoming data, while outgoing traffic stays minimal.
Gaming - The bandwidth picture for gamers is really interesting, especially for those who play online, and it really depends on the game. A game might use the server-authoritative model, where all the online players send their inputs and receive the final game-state changes from the server. In that case there is constant upload, still minor, but the download bandwidth is larger since the state is usually bigger. Games can also use the lockstep model or the client-authoritative model, where all the clients send their inputs to the server and the server just sends all the inputs from all players back to everyone, so everyone calculates the game state locally. In this efficient model both upload and download are quite small, which is good for countries with low bandwidth. Some games require cloud saving, which means a decent upload speed is recommended. In general, though, latency matters more than bandwidth for gaming.
Media production - If you are a YouTuber and you make 5 videos a day, you need to upload a lot of data to the YouTube servers. So upload speed really matters, and download also matters because you will consume a lot as well.
Web servers - Netflix (which is hosted on Amazon) uploads far more data than it downloads, so its upload bandwidth is higher than its download bandwidth: all those movies and TV shows are pushed to consumers.
Example - Assume you have a bandwidth of 80Mb/s download (which is 10 megabytes per second) and 1Mb/s upload (which is 125 KB per second). You are connected to a web server which has 80Gb/s upload and 80Gb/s download bandwidth, which is 10 gigabytes per second. Let's say you want to download a 60GB file or movie (no streaming). It will take the server 6 seconds to upload that file, but it will take you 1.7 hours to download it.
Your ISP - That is not entirely true, though, for TCP, which implements congestion control. UDP does not implement congestion control. --- Send in a voice message: https://anchor.fm/hnasr/message
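To make that closing example concrete, here is a minimal sketch of the arithmetic with bc (assuming the figures above: a 60GB file, a 10GB/s server uplink, and a 10MB/s client downlink):
$ echo '60 / 10' | bc                         # seconds for the server to push 60GB out at 10GB/s -> 6
$ echo 'scale=1; 60 * 1024 / 10 / 3600' | bc  # hours for the client to pull it in at 10MB/s -> 1.7
The asymmetry is the whole point: the same file is a blink for the server's pipe and a long wait on a consumer plan.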
The new Roli Songmaker Kit includes two of the company's controllers with MIDI Polyphonic Expression - the Seaboard Block 2-octave soft keyboard and the illuminated Lightpad M playing surface, which both provide 5 types of playback expression. It also includes the Loop Block transport controller, a protective snapcase for traveling, and a software bundle with the Tracktion Waveform and Ableton Live Lite DAWs, plus software instruments designed for MIDI Polyphonic Expression such as the Roli Equator software synthesizer and the Noise loop composition app. The Roli Songmaker Kit is in major musical instrument retailers now for a sale price of $599 USD. The IK Multimedia Syntronik Deluxe virtual synthesizer collection uses a combination of sampling and modeling to recreate some of the most renowned hardware synthesizers of all time. The 80GB package for Mac or PC computers includes a total of 22 synths. Five brand new instruments for Syntronik Deluxe are based on the Moog Memorymoog analog polysynth from 1982, two Roland SH monosynths, the Korg Mono/Poly, the British EMS VCS3 modular synthesizer, and percussion sounds created in part from a Moog Modular system. Syntronik Deluxe costs 399 US dollars or euros, with upgrade pricing available to current users. Applied Acoustics Systems, the French-Canadian pioneers of physical modeling, have released Objeq, a creative effect app and Audio Unit plug-in for iOS. Objeq filters audio signals through three acoustic object resonators, making mundane sounds such as finger taps sound like rich percussion instruments. The $3.99 Objeq works as a standalone app or as an Audio Unit 3 plug-in for iOS DAWs such as GarageBand, Cubasis 2, or iMPC Pro 2. That's this week's Audio Tech News Flash, brought to you by Accusonus. Please subscribe and leave us a 5-star review to help spread the Audio Tech News. And if you have an Amazon Echo device, you can command Alexa to play this podcast. To enable this skill, go to Amazon.com and in the search field for Alexa skills, search for Audio Tech News Flash. LINKS: Roli Songmaker Kit https://roli.com/products/blocks/songmaker-kit IK Multimedia Syntronik Deluxe http://www.ikmultimedia.com/products/syntronikdeluxe/ Applied Acoustics Systems Objeq iOS creative effect app. https://www.applied-acoustics.com/objeq-for-ios/
Don't forget to subscribe to the show in your favorite podcasting app! Patron: This episode of SwitchCraft is brought to you by Patrons like Kimbellina66! Support SwitchCraft and my other content for as little as a dollar and get exclusive rewards at https://www.patreon.com/runjumpstomp Episode ## is also brought to you by …. OPSeat - coupon code runjumpsit for $10 off your chair! Audible - free audiobook that you get to keep whether you stay a member or not, 14.95. Podnutz - Tech Podcast Network - for normal tech fans by normal tech fans (ad read). Feedback LIVE! - SwitchCraft is recorded three times a week at 3pm US Eastern on Tuesday / Thursday and also on Saturday at whatever time the universe allows. Tune in at twitch.tv/runjumpstop Thank you chat! Review Email - Runjumpstomp@gmail.com Hi RJS, I want to share a story with you about my 3DS and microSD card. I have 300 3DS digital titles, a mix of Virtual Console, eShop titles, and first party downloads. 300 is the max a 3DS can have, if you did not know that. I actually have more than that, but I can't hold them all. This all takes up around 80GB. The 300 limit is 60 pages of 3DS titles, with 5 titles per page listed in data management. About a month ago the microSD card would not allow writes. I don't know what happened or changed. It was bricked, but fortunately still readable. I purchased a new 128GB card and copied all the files over successfully. I was fortunate that the card was still readable and wanted to warn others to make backups. Can you imagine downloading 80GB of 3DS titles along with DSi titles? Also, I would have lost my save games and backups. The story had a happy ending, but I was sweating it until the new card and a backup had been completed. Cheers. Tweet @runjumpstomp / Voicemail 260-RUN-JUMP (260-786-5867) / Discord runjumpstomp.com/discord News: Rocket League local online worries. Two online accounts? I doubt it. Not like this on Nintendo stuff (Mario Kart). Probably developer oversight. I haven't had time to play Rocket League on Switch; will talk about it on Thursday. Gameplay: Fire Emblem Warriors - really fun. Integration with Fire Emblem mechanics is really awesome: capturing forts, building relationships through combat, weapon triangle. --- This episode is sponsored by · Anchor: The easiest way to make a podcast. https://anchor.fm/app
Papers we love: ARC by Bryan Cantrill, SSD caching adventures with ZFS, OpenBSD full disk encryption setup, and a Perl5 Slack Syslog BSD daemon. This episode was brought to you by Headlines Papers We Love: ARC: A Self-Tuning, Low Overhead Replacement Cache (https://www.youtube.com/watch?v=F8sZRBdmqc0&feature=youtu.be) Ever wondered how the ZFS ARC (Adaptive Replacement Cache) works? How about if Bryan Cantrill presented the original paper on its design? Today is that day. Slides (https://www.slideshare.net/bcantrill/papers-we-love-arc-after-dark) It starts by looking back at a fundamental paper from the 40s where the architecture of general-purpose computers is first laid out. The main idea is the description of memory hierarchies, where you have a small amount of very fast memory, then the next level is slower but larger, and on and on. As we look at the various L1, L2, and L3 caches on a CPU, then RAM, then flash, then spinning disks, this still holds true today. The paper then does a survey of the existing caching policies and tries to explain the issues with each. This includes 'MIN', which is the theoretically optimal policy; it requires future knowledge, but is useful for setting the upper bound - the best we could possibly do. The paper ends up showing that the ARC can end up being better than manually trying to pick the best number for the workload, because it adapts as the workload changes. At about 1:25 into the video, Bryan starts talking about the practical implementation of the ARC in ZFS, and some challenges they have run into recently at Joyent. A great discussion about some of the problems when ZFS needs to shrink the ARC. Not all of it applies 1:1 to FreeBSD because the kernel and the kmem implementation are different in a number of ways. There were some interesting questions asked at the end as well. *** How do I use man pages to learn how to use commands? (https://unix.stackexchange.com/a/193837) nwildner on StackExchange has a very thorough answer to the question of how to interpret man pages to understand complicated commands (xargs in this case, but not specifically). Have in mind what you want to do. When doing your research about xargs you did it for a purpose, right? You had a specific need that was reading standard output and executing commands based on that output. But what if I don't know which command I want? Use man -k or apropos (they are equivalent). If I don't know how to find a file: man -k file | grep search. Read the descriptions and find one that will better fit your needs. Apropos works with regular expressions by default (man apropos - read the description and find out what -r does), and in this example I'm looking for every manpage where the description starts with "report". Always read the DESCRIPTION before starting. Take your time and read the description. By just reading the description of the xargs command we will learn that: xargs reads from STDIN and executes the command needed. This also means that you will need to have some knowledge of how standard input works, and how to manipulate it through pipes to chain commands. The default behavior is to act like /bin/echo. This gives you a little tip that if you need to chain more than one xargs, you don't need to use echo to print. We have also learned that unix filenames can contain blanks and newlines, that this could be a problem, and that the -0 argument is a way to prevent things from exploding by using null-character separators.
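That null-separator point deserves one concrete illustration. A minimal sketch, assuming GNU or BSD find and xargs and a directory with some throwaway files (the file names here are made up):
$ find . -name '*.log' | xargs rm            # can break on names containing spaces or newlines
$ find . -name '*.log' -print0 | xargs -0 rm # -print0 and -0 use NUL separators, so any name is safe
The only requirement is exactly the one the DESCRIPTION warns about: the producer (find -print0) and the consumer (xargs -0) both have to agree on the NUL separator.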
The description warns you that the command being used as input needs to support this feature too, and that GNU find supports it. Great. We use a lot of find with xargs. xargs will stop if exit status 255 is reached. Some descriptions are very short, and that is generally because the software works in a very simple way. Don't even think of skipping this part of the manpage ;) Other things to pay attention to... You know that you can search for files using find. There are a ton of options, and if you only look at the SYNOPSIS, you will get overwhelmed by them. It's just the tip of the iceberg. Excluding NAME, SYNOPSIS, and DESCRIPTION, you will have the following sections: When this method will not work so well... + Tips that apply to all commands Some options, mnemonics and "syntax style" carry over across commands, saving you time by not having to open the manpage at all. Those are learned by practice and the most common are: Generally, -v means verbose. -vvv is a "very very verbose" variation on some software. Following the POSIX standard, generally one-dash arguments can be stacked. Example: tar -xzvf, cp -Rv. Generally -R and/or -r means recursive. Almost all commands have a brief help with the --help option. --version shows the version of the software. -p, on copy or move utilities, means "preserve permissions". -y means YES, or "proceed without confirmation" in most cases. Default values of commands. At the pager chunk of this answer, we saw that less -is is the pager of man. The default behavior of commands is not always shown in a separate section of the manpage, or in the section that is placed nearest the top. You will have to read the options to find out the defaults, or if you are lucky, typing /pager will lead you to that info. This also requires you to know the concept of the pager (software that scrolls the manpage), and this is a thing you will only acquire after reading lots of manpages. And what about the SYNOPSIS syntax? After getting all the information needed to execute the command, you can combine options, option-arguments and operands inline to get your job done. Overview of concepts: Options are the switches that dictate a command's behavior. "Do this" "don't do this" or "act this way". Often called switches. Check out the full answer and see if it helps you better grasp the meaning of a man page and thus the command. *** My adventure into SSD caching with ZFS (Home NAS) (https://robertputt.co.uk/my-adventure-into-ssd-caching-with-zfs-home-nas.html) Robert Putt has written about his adventure using SSDs for caching with ZFS on his home NAS. Recently I decided to throw away my old defunct 2009 MacBook Pro which was rotting in my cupboard, and I decided to retrieve the only useful part before doing so: the 80GB Intel SSD I had installed a few years earlier. Initially I thought about simply adding it to my desktop as a bit of extra space, but in 2017 80GB really wasn't worth it, and then I had a brainwave… Let's see if we can squeeze some additional performance out of my HP Microserver Gen8 NAS running ZFS by installing it as a cache disk. I installed the SSD in the cdrom tray of the Microserver using a floppy disk power to SATA power converter and a SATA cable; unfortunately it seems the CD ROM SATA port on the motherboard is only a 3gbps port, although this didn't matter so much as it was an older 3gbps SSD anyway.
Next I booted up the machine and to my surprise the disk was not found in my FreeBSD install. Then I realised that the SATA port for the CD drive is actually provided by the RAID controller, so I rebooted into Intelligent Provisioning and added an additional RAID0 array with just the 1 disk to act as my cache. In fact all of the disks in this machine are individual RAID0 arrays, so it looks like just a bunch of disks (JBOD), as ZFS offers additional functionality over normal RAID (mainly scrubbing, deduplication and compression). Configuration Let's have a look at the zpool before adding the cache drive to make sure there are no errors or ugliness: Now let's prep the drive for use in the zpool using gpart. I want to split the SSD into two separate partitions, one for L2ARC (read caching) and one for ZIL (write caching). I have decided to split the disk into 20GB for ZIL and 50GB for L2ARC. Be warned: using 1 SSD like this is considered unsafe because it is a single point of failure in terms of delayed writes (a redundant configuration with 2 SSDs would be more appropriate), and the heavy write cycles on the SSD from the ZIL are likely to kill it over time. Now it's time to see if adding the cache has made much of a difference. I suspect not, as my home NAS sucks; it is an HP Microserver Gen8 with the crappy Celeron CPU and only 4GB RAM. Anyway, let's test it and find out. First off let's throw fio at the mount point for this zpool and see what happens, both with the ZIL and L2ARC enabled and disabled. Observations Ok, so the initial result is a little disappointing, but hardly unexpected: my NAS sucks and there are lots of bottlenecks - CPU, memory, and the fact that only 2 of the SATA ports are 6gbps. There is no real performance difference between the results; the IOPS, bandwidth and latency appear very similar. However, let's bear in mind fio is a pretty hardcore disk benchmark utility - how about some real world use cases? Next I decided to test a few typical file transactions that this NAS is used for: Samba shares to my workstation. For the first test I wanted to read a 3GB file over the network with the cache both enabled and disabled. I ran this multiple times to ensure the data is hot in the L2ARC and to ensure the test is somewhat repeatable; the network itself is an uncongested 1gbit link and I am copying onto the secondary SSD in my workstation. The dataset for these tests has compression and deduplication disabled. Samba Read Test Not bad - once the data becomes hot in the L2ARC, cached reads appear to gain a decent advantage compared to reading from the disk directly. How does it perform when writing the same file back across the network using the ZIL vs no ZIL? Samba Write Test Another good result in the real world test; this certainly helps the write transfer speed. However I do wonder what would happen if you filled the ZIL transferring a very large file, though this is unlikely with my use case as I typically only deal with a couple of files of several hundred megabytes at any given time, so a 20GB ZIL should suit me reasonably well. Is ZIL and L2ARC worth it? I would imagine that with a big beefy ZFS server running in a company somewhere, with a large disk pool and lots of users, multiple enterprise-level SSDs for ZIL and L2ARC would be well worth the investment; however at home I am not so sure. Yes, I did see an increase in read speeds with cached data and a general increase in write speeds, however it is use case dependent.
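The write-up doesn't show the actual commands, so here is a minimal sketch of the partitioning and attachment just described, assuming FreeBSD, an SSD that shows up as da1, a pool named tank, and GPT labels (zil, l2arc) chosen freely - check your own device names before running any of this as root:
$ gpart create -s gpt da1
$ gpart add -t freebsd-zfs -s 20G -l zil da1      # the 20GB ZIL (SLOG) slice
$ gpart add -t freebsd-zfs -s 50G -l l2arc da1    # the 50GB L2ARC slice
$ zpool add tank log gpt/zil                      # 'log' devices hold the ZIL
$ zpool add tank cache gpt/l2arc                  # 'cache' devices become the L2ARC
$ zpool status tank                               # both should now be listed under the pool
Splitting one SSD this way carries the same single point of failure the author warns about; a mirrored pair of log devices would be the safer configuration.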
In my use case I rarely access the same file frequently; my NAS primarily serves as a backup and for archived data, and although the write speeds are cool I am not sure it's a deal breaker. If I built a new home NAS today I'd probably concentrate the budget on a better CPU, more RAM (for the ARC) and more disks. However if I had a use case where I frequently accessed the same files and needed to do so in a faster fashion then yes, I'd probably invest in an SSD for caching. I think if you have a spare SSD lying around and you want something fun to do with it, sure, chuck it in your ZFS-based NAS as a cache mechanism. If you were planning on buying an SSD for caching then I'd really consider your needs and decide if the money could be spent on alternative stuff which would improve your experience with your NAS. I know my NAS would benefit more from an extra stick of RAM and a more powerful CPU, but as a quick evening project with some parts I had hanging around, adding some SSD cache was worth a go. More Viewer Interview Questions for Allan News Roundup Setup OpenBSD 6.2 with Full Disk Encryption (https://blog.cagedmonster.net/setup-openbsd-with-full-disk-encryption/) Here is a quick way to set up OpenBSD 6.2 (in 7 steps) with an encrypted filesystem. First step: Boot and start the installation: (I)nstall: I Keyboard Layout: ENTER (I'm French so in my case I took the FR layout) Leave the installer with: ! Second step: Prepare your disk for encryption. Using an SSD, my disk is named: sd0; the name may vary, for example: wd0. Initiating the disk: Configure your volume: Now we'll use bioctl to encrypt the partition we created, in this case: sd0a (disk sd0 + partition "a"). Enter your passphrase. Third step: Let's resume the OpenBSD installer. We follow the install procedure. Fourth step: Partitioning of the encrypted volume. We select our new volume, in this case: sd1 The whole disk will be used: W(hole) Let's create our partitions: NB: You are more than welcome to create multiple partitions for your system. Fifth step: System installation. It's time to choose how we'll install our system (network install by http in my case). Sixth step: Finalize the installation. Last step: Reboot and start your system. Enter your passphrase. Welcome to OpenBSD 6.2 with a fully encrypted file system. Optional: Disable the swap encryption. The swap is actually part of the encrypted filesystem; we don't need OpenBSD to encrypt it, and sysctl gives us this option. Step-by-Step FreeBSD installation with ZFS and Full Disk Encryption (https://blog.cagedmonster.net/step-by-step-freebsd-installation-with-full-disk-encryption/) 1. What do I need? For this tutorial, the installation has been made on an Intel Core i7 - AMD64 architecture. On a USB key, you would probably use this link: ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/11.1/FreeBSD-11.1-RELEASE-amd64-mini-memstick.img If you can't do a network installation, you'd better use this image: ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/11.1/FreeBSD-11.1-RELEASE-amd64-memstick.img You can write the image file to your USB device (replace XXXX with the name of your device) using dd: # dd if=FreeBSD-11.1-RELEASE-amd64-mini-memstick.img of=/dev/XXXX bs=1m 2. Boot and install: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F1.png) 3.
Configure your keyboard layout: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F2.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F3.png) 4. Hostname and system components configuration: Set the name of your machine: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F4.png) What components do you want to install? Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F5.png) 5. Network configuration: Select the network interface you want to configure. Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F6.png) First, we configure our IPv4 network. I used a static address so you can see how it works, but you can use DHCP for an automated configuration; it depends on what you want to do with your system (desktop/server). Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F7.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F7-1.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F8.png) IPv6 network configuration. Same as for IPv4, you can use SLAAC for an automated configuration. Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F9.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F10-1.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F10-2.png) Here, you can configure your DNS servers. I used the Google DNS servers, so you can use them too if needed. Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F11.png) 6. Select the server you want to use for the installation: I always use the IPv6 mirror to ensure that my IPv6 network configuration is good. Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F12.png) 7. Disk configuration: As we want to do an easy full disk encryption, we'll use ZFS. Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F13.png) Make sure to select the disk encryption: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F14.png) Launch the disk configuration: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F15.png) Here everything is normal, you have to select the disk you'll use: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F16.png) I have only one SSD disk named da0: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F17.png) Last chance before erasing your disk: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F18.png) Time to choose the password you'll use to start your system: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F19.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F20.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F21.png) 8. Last steps to finish the installation: The installer will download what you need and what you selected previously (ports, src, etc.) to create your system: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F22.png) 8.1. Root password: Enter your root password: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F22-1.png) 8.2.
Time and date: Set your timezone, in my case: Europe/France Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F22-2.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F23.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F23-1.png) Make sure the date and time are good, or you can change them: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F24.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F25.png) 8.3. Services: Select the services you'll use at system startup, depending again on what you want to do. In many cases powerd and ntpd will be useful, and sshd if you're planning on using FreeBSD as a server. Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F26.png) 8.4. Security: Security options you want to enable. You'll still be able to change them after the installation with sysctl. Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F26-1.png) 8.5. Additional user: Create an unprivileged system user: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F26-2.png) Make sure your user is in the wheel group so they can use the su command. Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F26-3.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F26-4.png) 8.6. The end: End of your configuration; you can still do some modifications if you want: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F26-5.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F26-6.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F26-7.png) 9. First boot: Enter the passphrase you have chosen previously: Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F27.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F28.png) & Screenshot (https://blog.cagedmonster.net/content/images/2017/09/F29.png) Welcome to FreeBSD 11.1 with full disk encryption! *** The anatomy of the ldd program on OpenBSD (http://nanxiao.me/en/the-anatomy-of-ldd-program-on-openbsd/) In the past week, I read the ldd (https://github.com/openbsd/src/blob/master/libexec/ld.so/ldd/ldd.c) source code on OpenBSD to get a better understanding of how it works. This post should also be a useful reference for other *NIX OSs. ELF (https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) files are divided into 4 categories: relocatable, executable, shared, and core. Only executable and shared object files may have dynamic object dependencies, so ldd only checks these 2 kinds of ELF file: (1) Executable. ldd in fact leverages the LD_TRACE_LOADED_OBJECTS environment variable, and the code is as follows: if (setenv("LD_TRACE_LOADED_OBJECTS", "true", 1) < 0) err(1, "setenv(LD_TRACE_LOADED_OBJECTS)"); When LD_TRACE_LOADED_OBJECTS is set to 1 or true, running an executable file will show the shared objects it needs instead of running it, so you don't even need ldd to check an executable file. See the following outputs: $ /usr/bin/ldd usage: ldd program ... $ LD_TRACE_LOADED_OBJECTS=1 /usr/bin/ldd Start End Type Open Ref GrpRef Name 00000b6ac6e00000 00000b6ac7003000 exe 1 0 0 /usr/bin/ldd 00000b6dbc96c000 00000b6dbcc38000 rlib 0 1 0 /usr/lib/libc.so.89.3 00000b6d6ad00000 00000b6d6ad00000 rtld 0 1 0 /usr/libexec/ld.so (2) Shared object.
The code to print dependencies of a shared object is as follows: if (ehdr.e_type == ET_DYN && !interp) { if (realpath(name, buf) == NULL) { printf("realpath(%s): %s", name, strerror(errno)); fflush(stdout); _exit(1); } dlhandle = dlopen(buf, RTLD_TRACE); if (dlhandle == NULL) { printf("%s\n", dlerror()); fflush(stdout); _exit(1); } _exit(0); } Why is the condition for deciding whether an ELF file is a shared object written like this: if (ehdr.e_type == ET_DYN && !interp) { ...... } That's because the file type of a position-independent executable (PIE) is the same as a shared object, but a PIE normally contains an interpreter program header, since it needs the dynamic linker to load it, while a shared object lacks one (refer to this article). So the above condition filters out PIE files. The dlopen(buf, RTLD_TRACE) call is used to print dynamic object information, and the actual code is like this: if (_dl_traceld) { _dl_show_objects(); _dl_unload_shlib(object); _dl_exit(0); } In fact, you can also implement a simple application which outputs dynamic object information for a shared object yourself: #include <dlfcn.h> int main(int argc, char **argv) { dlopen(argv[1], RTLD_TRACE); return 0; } Compile and use it to analyze /usr/lib/libssl.so.43.2: $ cc lddshared.c $ ./a.out /usr/lib/libssl.so.43.2 Start End Type Open Ref GrpRef Name 000010e2df1c5000 000010e2df41a000 dlib 1 0 0 /usr/lib/libssl.so.43.2 000010e311e3f000 000010e312209000 rlib 0 1 0 /usr/lib/libcrypto.so.41.1 The same as using ldd directly: $ ldd /usr/lib/libssl.so.43.2 /usr/lib/libssl.so.43.2: Start End Type Open Ref GrpRef Name 00001d9ffef08000 00001d9fff15d000 dlib 1 0 0 /usr/lib/libssl.so.43.2 00001d9ff1431000 00001d9ff17fb000 rlib 0 1 0 /usr/lib/libcrypto.so.41.1 Through studying the ldd source code, I also picked up many by-products: knowledge of the ELF format, linking and loading, etc. So diving into code is a really good way to learn *NIX more deeply! Perl5 Slack Syslog BSD daemon (https://clinetworking.wordpress.com/2017/10/13/perl5-slack-syslog-bsd-daemon/) So I have been working on my little Perl daemon for a week now. It is a simple syslog daemon that listens on port 514 for incoming messages. It listens on a port so it can process log messages from my consumer Linux router as well as the messages from my server. Messages that are above alert are sent, as are messages that match the regex of SSH or DHCP (I want to keep track of new connections to my wifi). The rest of the messages are not sent to slack but appended to a log file. This is very handy as I can get access to info like failed ssh logins, disk failures, and new devices connecting to the network all on my Android phone when I am not home. Screenshot (https://clinetworking.files.wordpress.com/2017/10/screenshot_2017-10-13-23-00-26.png) The situation arose today that the internet went down, and I thought to myself: what would happen to all my important syslog messages when they couldn't be sent? Before, the script only ran an eval block on the botsend() function. The error was returned, handled, but nothing was done and the unsent message was discarded. So I added a function that appends unsent messages to an array; they are later sent when the server is not busy sending messages to slack. Slack has a limit of one message per second. The new addition works well and means that if the internet fails my server will store these messages in memory and resend them at a rate of one message per second when the internet connectivity returns.
It currently sends the newest ones first, but I am not sure if this is a bug or a feature at this point! It currently works with my Linux-based WiFi router and my FreeBSD server. It is easy to scale, as all you need to do is send messages to syslog to get them sent to slack. You could send CPU temp, logged-in users, etc. There is a github page: https://github.com/wilyarti/slackbot Lscpu for OpenBSD/FreeBSD (http://nanxiao.me/en/lscpu-for-openbsdfreebsd/) Github Link (https://github.com/NanXiao/lscpu) There is a neat command, lscpu, which is very handy for displaying CPU information on a GNU/Linux OS: $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 32 On-line CPU(s) list: 0-31 Thread(s) per core: 2 Core(s) per socket: 8 Socket(s): 2 But unfortunately, the BSD OSs lack this command; maybe one reason is that lscpu relies heavily on the /proc file system, which the BSDs don't provide :-). Take OpenBSD as an example: if I want to know CPU information, dmesg should be one choice: $ dmesg | grep -i cpu cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz, 2527.35 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,XSAVE,NXE,LONG,LAHF,PERF,SENSOR cpu0: 3MB 64b/line 8-way L2 cache cpu0: apic clock running at 266MHz cpu0: mwait min=64, max=64, C-substates=0.2.2.2.2.1.3, IBE But the output feels messy to me, not very clear. As for dmidecode, it used to be another option, but now it can't work out of the box because it accesses /dev/mem, which OpenBSD doesn't allow by default for security reasons (you can refer to this discussion): $ ./dmidecode $ dmidecode 3.1 Scanning /dev/mem for entry point. /dev/mem: Operation not permitted Based on the above, I wanted a dedicated command for showing CPU information on my BSD box. So in the past 2 weeks, I developed an lscpu program for OpenBSD/FreeBSD, or more accurately, OpenBSD/FreeBSD on the x86 architecture, since I only have some Intel processors at hand. The application gets CPU metrics from 2 sources: (1) sysctl functions. The BSD OSs provide the sysctl interface, which I can use to get general CPU particulars, such as how many CPUs the system contains, the byte order of the CPU, etc. (2) The CPUID instruction. For the x86 architecture, the CPUID instruction can obtain very detailed information about the CPU. This coding work is a little tedious and error-prone, not only because I need to reference both Intel and AMD specifications, since these 2 vendors have minor distinctions, but also because I need to parse the bits of register values. The code is here (https://github.com/NanXiao/lscpu), and if you run OpenBSD/FreeBSD on x86 processors, please try it. It would be great if you could give some feedback or report issues; I would appreciate it very much. In the future, if I have other CPU resources, such as ARM or SPARC64, maybe I will enrich this small program. *** Beastie Bits OpenBSD Porting Workshop - Brian Callahan will be running an OpenBSD porting workshop in NYC for NYC*BUG on December 6, 2017.
(http://daemonforums.org/showthread.php?t=10429) Learn to tame OpenBSD quickly (http://www.openbsdjumpstart.org/#/) Detect the operating system using UDP stack corner cases (https://gist.github.com/sortie/94b302dd383df19237d1a04969f1a42b) *** Feedback/Questions Awesome Mike - ZFS Questions (http://dpaste.com/1H22BND#wrap) Michael - Expanding a file server with only one hard drive with ZFS (http://dpaste.com/1JRJ6T9) - information based on Allan's IRC response (http://dpaste.com/36M7M3E) Brian - Optimizing ZFS for a single disk (http://dpaste.com/3X0GXJR#wrap) ***
An octagonal Android smartphone, a lawsuit over the term "Milk," and a phone camera that flips up like a deranged pancake. Oh, and luxury smartwatches. (Luxury smartwatch faces, anyway. Illegal ones. Or something.) Companies have waited until the end of the year to spring all the nonsense on us, it seems – and we're lapping it up like the grateful tech hounds we are. Because, according to the polls, dear listeners, that's what you want to hear. But like all humans, you also want to hear your own voice. Don't be ashamed; we're all narcissists here. To that end, we're stacking one of the last podcasts of 2014 with listener mail dating all the way back to July: stuff that's been held in the buffer so long it's got a fine coating of Pocketnow archival dust sitting atop it. Today, it's time to blow that coating away and finally get to some of your long-held questions, complaints, and curiosities – the ones that still make sense, anyway. That's a podcast so jazzy that even the ghost of Joe Levi couldn't resist joining, so what're you waiting for? Seal your port covers, snap on your protective case, and jump in our pocket. Now. Episode 124 of our weekly podcast lies below and you should watch it –either through the live video stream at 2:00pm Eastern on November 26 (click here for your local time), or the high-quality audio version available later on– and shoot your questions to podcast [AT] pocketnow [DOT] com for a shot at getting your question read aloud on the air. See you then! Pocketnow Weekly 124 Recording Date November 26, 2014 Hosts Michael Fisher Stephen Schenck Adam Doud Producer Jules Wang Podcast Rundown Sponsor (00:01:26) The Pocketnow Weekly is brought to you by Squarespace, the all-in-one platform that makes it fast and easy to create your own professional website, portfolio, and online store. For a free trial and 10% off, visit Squarespace.com/pocketnow and enter offer code POCKETNOW at checkout. A better web starts with your website. News (00:05:00) (Announcement) Pocketnow's Holiday Shopping Guide and Black Friday Roundup posts are up! Amazon's Fire Phone drops to $199 off contract Octagonal Android boasts 80GB storage for under $400 Weirdest smartphone camera yet swings up and over Luxury brands square off against copycat watchfaces Samsung getting sued over the term Milk Listener Mail (00:30:00) An absurdly dense smorgasbord of listener mail, some of it dating back to the pre-IFA days. Whether it's thoughts on the Nexus 9, questions on the future of the wearable, rants about keyboard lag, or questions about what phone to buy your mother for the holidays, you'll find it in our super-dense block of reader feedback! This season's extravaganza features letters from the following sweet, sweet listeners: Lizette M. Ed Carden Trevor Reis Vijay V. Sam Ward Chris Cox Diego Lopez Crane Bernard Untalan Kwadwo Boadu Eric Rekut Alex Ong A. Jacob William Li Alif Prakoso Music It may just sound like a ringtone to you, but our transition music track ("Radiation") is a real song, from a real album, by a real artist: Ali Spagnola. You can download that album, along with many others, at Ali's website here, visit her YouTube page here, and follow her on Twitter here! • See you next week for Episode 125! See omnystudio.com/listener for privacy information.
Channel Master DVR+ Review If you are a cord cutter you know that one thing that is difficult to live without is a DVR. You can buy a TiVo Roamio and pay a smaller monthly fee, but you cut the cord so you wouldn't have to pay a monthly charge. What most cord cutters are looking for is a way to record over-the-air HD without having to pay any monthly fees. Channel Master has you covered with the DVR+ (MSRP $249.99). The DVR+ is a subscription-free DVR that allows you to cut the cord and still watch late-night network TV on your terms. Features:
- Watch broadcast programming in pure, uncompressed HD - the highest quality available
- Easy installation with on-screen setup wizard
- Supports SD and HD video resolutions up to 1080i and 1080p
- Works with all digital TV antennas for instant access to broadcast programming
- Supports external USB hard drives (EHD) for full DVR functionality
- Dual tuners allow you to record one program while watching another or record two programs at the same time (using EHD)
- Records, pauses and plays back live TV up to 2 hours out of the box
- See what's on now or later with the user-friendly Electronic Program Guide (EPG)
- Easily search for shows and set timers with name-based recording
- Broadband connection provides automatic software updates, enhanced EPG and access to Internet video streaming services
- Get Wi-Fi connectivity with optional USB Wi-Fi adapter
- Supports Dolby® Digital Plus surround sound
- Ultra-low power consumption
Setup: When you take the DVR+ out of the box the first thing you notice is how thin it is. It's barely bigger than the HDMI cable that you plug into it. You can pretty much put it anywhere with little issue. To physically install the device you need to connect power, HDMI, antenna, Ethernet, and an external hard drive. Without the external drive there is little that you can do, as the DVR+ comes with 16GB internally. A terabyte drive will get you about 160 hours of HD recording. We had a spare 80GB drive, which was reformatted by the DVR+ and was ready to go in a few minutes. The physical install took about 15 minutes. You will need a network connection for the program guide or if you want to use the Hulu service. Right now that's the only network service available; Channel Master says they will add more over time. If you don't have a wired connection you can buy a Wi-Fi adapter for an additional $40. Next we fired up the unit and scanned for channels. The DVR+'s two tuners found 45 digital channels, which was better than the HDHomeRun but not as good as the tuner in our Panasonic plasma television, all of which are connected to the same antenna. Twenty-five minutes after taking the DVR+ out of the box we had a picture on screen and were able to record anything coming over the airwaves! Performance: What can we say, the DVR+ works like a DVR you would get from your provider. The program guide is provided by Rovi (no additional cost), which allows you to search or browse to find your programs. You can tune to a program if it's currently on, or you can mark it for recording. Season passes are set up based on title names, which is a little less sophisticated than, say, the Genie, but it will get the job done. You can set up the skip-forward and skip-back lengths in the menu, so if you want to skip back 30 seconds and forward by one minute you can do it. The user interface is basic but very responsive. We recorded a few programs to the external drive. All but one recorded, but that was on a channel that did not have the best reception.
On channels that have a solid signal, every time-based recording went off without a hitch. We even recorded two programs at the same time. Like we said, it works just like every other DVR we've ever owned. We had to turn on the DVR to see if it was recording, because the record light does not come on unless the DVR is powered on. Yes, the device always has power and can record even if it's "off", but we would like to see the light come on anytime the DVR is recording something. Playback was flawless and skipping through commercials was easy. We had the skip forward set to one minute, so it was usually five or six presses to get through a commercial break. Skips were instantaneous. The only drawback of a 60-second skip is that if you miss the entry point after the commercials you'll have a lot of backing up to do, so you may want a 30-second back button instead of the typical 10 seconds. This and That: The DVR+ has been designed to work well with the Slingbox 500. Current firmware does not allow you to set up a season pass that only records new shows, so you end up with some repeats that need to be deleted. The buttons on the remote are very small. But you'll be using a Harmony anyway, right ;-) There is no user guide, so be prepared to hunt around through the menus to see what's possible. Conclusion: Up until now, if you wanted to cut the cord and totally eliminate monthly fees you needed an HTPC. With the DVR+, a cord cutter can be free of cable and satellite without missing out on timeshifting. The only thing the DVR+ doesn't have is a rich set of network apps, although we're sure they are already on the way!
Here are the show highlights for episode #34 of the Geek News Daily podcast for September 2, 2008. * Walmart discontinues 80GB and 4GB Zunes? 120GB Zune priced? * Apple event set for Sept 9 as rumored * Warez ringleader gets probation after guilty plea * Microsoft gets patent for page up / page down * Study: 88% of IT Pros would steal passwords or data if fired * Hulu premiering shows before television broadcast * PSP getting a Wi-Fi store * iPhone web share grows to 0.48% * The first letter in your email could determine the amount of spam you get * Google's Chrome browser launches Cool Link of the Day: USB can-shaped humidifier
News: Leopard, Amazon's MP3 download store, French iPhone launch delay, The Cloud, Freezing iMacs, Zune Switchers Corner: Browsers - Netscape, Shiira, Safari, Opera, Firefox, Omniweb Software Review: Saft Introductions The MacBiters introduction - Elaine, MikeB and MikeT - and have their first gab! A brief insight into how & why we switched to the Mac platform. News Items For the week ending Friday 5th October 2007 Mac OS X 10.5 Leopard is coming Macworld.co.uk news is reporting that rumours are rife that we can expect Leopard to land on the 26th of October. Also, details released from testers of the latest pre-release version list minimum system requirements of a Mac running a G4 867MHz or faster processor, which makes several older Mac models incapable of running the forthcoming operating system. iTunes killer??? Amazon.com launches MP3 download store - currently in beta for US customers only. Prices are only $0.89 for a DRM-free, 256kbps MP3 track, and recent albums are priced around $8.99. A simple download manager app makes it easy to transfer this music to your iTunes library and then sync with an iPod. Amazon have DRM-free music from other labels as well as EMI - iTunes Plus tracks are DRM-free music from EMI-owned labels only, and they're priced at $1.29! So cheaper, better quality, DRM-free MP3s from one of the biggest names in music retail. Is the iPhone launch for France delayed due to French unlocking laws? Reported on several news sites including engadgetmobile and AppleInsider. Quote from the French daily, Les Echos: "The difficulties stem from a French law that would require the Apple handset to be sold both with and without contracts. This law would reportedly undermine the iPhone's exclusivity for Orange and Apple's demand of up to 30 percent of voice and data revenues." Discussion went on to iPhone purchasing. Mike B mentioned that information he had seen said you couldn't pay for an iPhone with cash in the UK. Quote from The Carphone Warehouse iPhone Eligibility statement: "To give everyone a chance, there'll be a limit of two iPhones per customer in-store and online. Oh, and we're afraid we can't accept cash." And from this week's news, we have The Good, The Bad & The Ugly. THE GOOD The Cloud offers a new deal: WiFi access for iPod touch users. Subscribe and register with an iPod touch and for £3.99 p/m you'll get unlimited (there's bound to be the usual FUP caveat) WiFi access at 7500 hotspots around the UK. Pretty good vfm, as their normal unlimited account is £11.99 pm. iPod touch users need to sign up for the service at www.thecloud.co.uk/iPod THE BAD New iMacs plagued by interface freezing issues Users have reported that Apple's latest iMacs suffer from a flaw that locks up the interface, rendering the system all but inert until a reboot. The system will look and behave as if it's running but is totally unresponsive to any touching of the controls, according to the reports. UPDATED AS WE WERE RECORDING Apple has issued a statement confirming that some iMacs are suffering screen freezes since a recent software update was installed - but promises a software repair soon. Apple explained: "We are tracking down the root cause of this bug, and will issue a software update which corrects it as soon as we can - most likely later this month. We apologise for the inconvenience." THE UGLY - New Zune 2 to take on Apple's iPod The new Zunes have a familiar design but a new interface and a controller nicknamed the squircle.
Microsoft has launched three new models of its Zune digital media player in an effort to compete with Apple's iPod. The players come in 4GB, 8GB and 80GB capacities; although the Brown has gone, the new Green is pretty awful.

Switchers Corner
This week's feature covers browsers. Among those discussed were:
http://browser.netscape.com/
http://shiira.jp/en.php
http://www.apple.com/safari/
http://www.opera.com/
http://www.mozilla.com/en-US/firefox/
http://www.omnigroup.com/applications/omniweb/
Note on the Shiira Project - Shiira is a web browser based on WebKit and written in Cocoa. The goal of the Shiira Project is to create a browser that is better and more useful than Safari. All source code used in this software is publicly available.

Software Review
Saft v8.3.11 - this week's Cool Tool scores a coveted 5 Golden Bites.
Download available from http://haoli.dnsalias.com/Saft/index.html
It's also featured on our favourite Safari resource site, http://pimpmysafari.com/
Tightly integrated into your Safari browser, Saft is so much more than a plugin/extension. There are just too many features to list here, but reviewers consistently score Saft highly. And considering it's only $12, you're getting an excellent deal.

Wrap-up
OK, well that's about it for this episode of MacBites. This is our very first podcast, and we'd love to hear from listeners. Please contact us with feedback and comments, or if you have questions about items featured in the podcast. We'd also like to feature listeners' questions on anything to do with getting the most out of your Macs. Perhaps there's something you'd like to hear featured in a future MacBites podcast? Please send your comments/questions, preferably in audio form, to macbitesuk@gmail.com

Next Episode
Featuring Mail on a Mac and an application called SpamSieve, which does what it says on the tin.

OUR LOCAL MAC USER GROUP - www.nwmug.co.uk
The three of us are regulars at the NWMUG - hi to any members who are listening. We'd love to hear from any MUG members around the country or the world. Have you any news, or even plans to start your own podcast?

Events
Next NWMUG meeting - October 11th 2007. Discussion topic is music made on a Mac, featuring GarageBand. Location: Grappenhall Community Centre, Warrington, 8pm. Website: www.nwmug.co.uk
Mac Expo - Kensington Olympia, London, 25-27 October 2007. Showcasing the very best in the digital world, MacLive Expo returns this year, bigger, better and much, much more personal, giving visitors the chance to get close to major players in the world of music, photography, film and design. Website: http://www.macliveexpo.co.uk
Adobe CS3 Tour - Join Adobe experts for some amazing workshops around the UK and Republic of Ireland. You'll learn the latest tips and techniques for designing and developing amazingly engaging experiences: in print, online or on any device. See the all-new Creative Suite 3 family of products along with the latest in cutting-edge web development tools, including Adobe Flex and AIR. Website: http://tinyurl.com/26rpwz
We started our new school year this week in the sweltering New Zealand sunshine. We used my new 80GB video iPod and iTalk attachment to record our holiday introductions. I was really pleased with how easy the iTalk was to use, so you can look forward to further podcasts recorded out in the field rather than tied to the computer or a traditional microphone. A very cool first attempt. We got some encouraging feedback on our podcast from Coley Street, a school in Foxton where I taught many summers ago, which was nice. Welcome to the podcasting world, Coley Street. We look forward to hearing from you. Children have been enjoying listening to last year's podcasts on my 'old' iPod Nano. Most had never used one before and quickly caught on to the click-wheel controls. 3.4 MB, 4 minutes 50 seconds
Apple iPod 80GB review. Email me at Larry389@podomatic.com