Podcasts about Anomaly detection

  • 97 podcasts
  • 142 episodes
  • 37m avg. duration
  • 1 new episode monthly
  • Latest: May 6, 2025

POPULARITY (2017–2024)


Best podcasts about Anomaly detection

Latest podcast episodes about Anomaly detection

Dell Technologies Power2Protect Podcast
Episode 110 - Deepening Cyber Resilience with Anomaly Detection and the Dell Technology Advantage

Dell Technologies Power2Protect Podcast

Play Episode Listen Later May 6, 2025 17:06


In this episode, learn about advanced solutions for detecting and responding to threats, recovering swiftly from cyberattacks, and the innovative Anomaly Detection feature in PowerProtect Data Manager. Tune in to discover how Dell empowers businesses to stay ahead in an evolving threat landscape.

The IoT Podcast
Is Edge AI Turning Our Homes into Intelligent Spaces or Fragile Ecosystems Prone to Failure? | Edge of Tomorrow with Sam Geha (Infineon) & Rob Tiffany (IDC)

The IoT Podcast

Play Episode Listen Later Mar 13, 2025 44:44


Cloud Security Podcast by Google
EP213 From Promise to Practice: LLMs for Anomaly Detection and Real-World Cloud Security

Cloud Security Podcast by Google

Play Episode Listen Later Mar 3, 2025 28:01


Guest: Yigael Berger, Head of AI, Sweet Security

Topics discussed:

  • Where do you see a gap between the “promise” of LLMs for security and how they are actually used in the field to solve customer pains?
  • I know you use LLMs for anomaly detection. Can you explain how that “trick” works? What is it good for? How effective do you think it will be?
  • Can you compare this to other anomaly detection methods? Also, won't this be costly? How do you manage to keep inference costs under control at scale?
  • SOC teams often grapple with the tradeoff between “seeing everything” so that they never miss any attack, and handling too much noise. What are you seeing emerge in cloud D&R to address this challenge?
  • We hear from folks who developed an automated approach to handle a reviews queue previously handled by people. Inevitably, even if precision and recall can be shown to be superior, executive or customer backlash comes hard with a false negative (or a flood of false positives). Have you seen this phenomenon, and if so, what have you learned about handling it?
  • What other barriers need to be overcome so that LLMs can push the envelope further for improving security?
  • From your perspective, are LLMs going to tip the scale in favor of cybercriminals or defenders?

Resources:

  • EP157 Decoding CDR & CIRA: What Happens When SecOps Meets Cloud
  • EP194 Deep Dive into ADR - Application Detection and Response
  • EP135 AI and Security: The Good, the Bad, and the Magical
  • Andrej Karpathy series on how LLMs work
  • Sweet Security blog
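The core "trick" discussed in the episode, scoring how surprising an event is relative to a learned baseline, can be approximated in miniature with plain token statistics. The sketch below is illustrative only: it is not Sweet Security's method, and a real system would use a language model's learned probabilities rather than raw counts. All log lines here are invented.

```python
from collections import Counter
from math import log

def train_baseline(normal_lines):
    """Count token frequencies over logs that represent normal activity."""
    counts = Counter(tok for line in normal_lines for tok in line.split())
    return counts, sum(counts.values())

def rarity(line, counts, total):
    """Average negative log-probability of the line's tokens; higher = rarer.
    Laplace smoothing keeps unseen tokens at a finite, high score."""
    toks = line.split()
    denom = total + len(counts) + 1
    return sum(-log((counts[t] + 1) / denom) for t in toks) / len(toks)

normal = [
    "user alice login success",
    "user bob login success",
    "user alice logout",
    "user bob logout",
]
counts, total = train_baseline(normal)

ordinary = rarity("user alice login success", counts, total)
suspicious = rarity("user mallory exfiltrate database", counts, total)
assert suspicious > ordinary  # unfamiliar tokens score as anomalous
```

The same shape (fit a baseline, score new events, alert above a threshold) underlies most anomaly detection, whatever model produces the scores.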

The Cognitive Revolution
Gene Hunting with o1-pro: Reasoning about Rare Diseases with ChatGPT Pro Grantee Dr. Catherine Brownstein

Play Episode Listen Later Jan 15, 2025 93:29


Nathan explores the cutting-edge intersection of AI and rare disease research with Dr. Catherine Brownstein of Boston Children's Hospital and Harvard Medical School. In this episode of The Cognitive Revolution, we dive into how frontier AI models are revolutionizing the diagnosis of rare diseases. Join us for an insightful conversation with a ChatGPT Pro grant winner who's pioneering the use of AI to help patients find answers faster.

Help shape our show by taking our quick listener survey at https://bit.ly/TurpentinePulse

Check out Modern Relationships, where Erik Torenberg interviews tech power couples and leading thinkers to explore how ambitious people actually make partnerships work. This season's guests include: Delian Asparouhov & Nadia Asparouhova, Kristen Berman & Phil Levin, Rob Henderson, and Liv Boeree & Igor Kurganov.
Apple: https://podcasts.apple.com/us/podcast/id1786227593
Spotify: https://open.spotify.com/show/5hJzs0gDg6lRT6r10mdpVg
YouTube: https://www.youtube.com/@ModernRelationshipsPod

SPONSORS:
  • Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders like Vodafone and Thomson Reuters with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before March 31, 2024 at https://oracle.com/cognitive
  • NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive
  • Shopify: Dreaming of starting your own business? Shopify makes it easier than ever. With customizable templates, shoppable social media posts, and their new AI sidekick, Shopify Magic, you can focus on creating great products while delegating the rest. Manage everything from shipping to payments in one place. Start your journey with a $1/month trial at https://shopify.com/cognitive and turn your 2025 dreams into reality.
  • Vanta: Vanta simplifies security and compliance for businesses of all sizes. Automate compliance across 35+ frameworks like SOC 2 and ISO 27001, streamline security workflows, and complete questionnaires up to 5x faster. Trusted by over 9,000 companies, Vanta helps you manage risk and prove security in real time. Get $1,000 off at https://vanta.com/revolution

CHAPTERS: (00:00:00) Teaser (00:00:56) About the Episode (00:04:45) Rare Diseases Common (00:06:48) Patient Journey (00:12:57) Genome Sequencing (00:19:39) Sponsors: Oracle Cloud Infrastructure (OCI) | NetSuite (00:22:19) Diagnosis Process (00:30:50) Data Pipelines (00:35:51) Sponsors: Shopify | Vanta (00:39:07) Interaction Graphs (00:42:18) Data Accessibility (00:43:42) AI in Pipelines (00:45:40) LLM Impact (00:48:40) Anomaly Detection (00:52:07) Data Sharing (00:58:49) Data Reform (01:02:41) AI's Potential (01:04:30) AI Applications (01:06:57) Prompt Engineering (01:14:51) Model Comparison (01:19:16) Prompting Insights (01:22:14) Move 37 Analogy (01:24:34) Future Potential (01:29:27) Future Experience (01:32:39) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolution.ai
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://www.linkedin.com/in/nathanlabenz/
Youtube: https://www.youtube.com/@CognitiveRevolutionPodcast
Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431
Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk

Autonomous IT
CISO IT – Great Security Begins With Great IT: CISO IT's Best of 2024, E14

Autonomous IT

Play Episode Listen Later Dec 26, 2024 15:17


Amelia's Weekly Fish Fry
AI-based Anomaly Detection: From Conceptualization to Integration

Amelia's Weekly Fish Fry

Play Episode Listen Later Dec 13, 2024 16:04


AI-based anomaly detection takes center stage in this week's Fish Fry podcast! My guest is Rachel Johnson from MathWorks, and we explore how AI can work in tandem with engineers to reduce the incidence of defects and optimize maintenance schedules, as well as the steps involved in designing and deploying an AI-based anomaly detection system, from conceptualization and data gathering to deployment and integration.
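The conceptualization-to-deployment workflow described here can be illustrated with a deliberately simple statistical detector. It is a stand-in for the AI models discussed in the episode, not a MathWorks example; the sensor readings and threshold are invented.

```python
from statistics import mean, stdev

class ZScoreDetector:
    """Flags readings that deviate sharply from a training baseline."""

    def fit(self, readings):
        # "Data gathering": learn what normal looks like.
        self.mu = mean(readings)
        self.sigma = stdev(readings)
        return self

    def is_anomaly(self, reading, threshold=3.0):
        # "Deployment": score new readings against the baseline.
        return abs(reading - self.mu) > threshold * self.sigma

# Vibration readings from a healthy machine (hypothetical values).
normal = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98, 1.08]
detector = ZScoreDetector().fit(normal)

assert not detector.is_anomaly(1.03)   # within the normal range
assert detector.is_anomaly(2.5)        # flagged for maintenance follow-up
```

A production system would replace the z-score with a learned model, but the fit-then-monitor lifecycle is the same.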

Autonomous IT
CISO IT – Great Security Begins with Great IT, E11

Autonomous IT

Play Episode Listen Later Oct 10, 2024 13:44


In this episode, Jason Kikta discusses the critical relationship between IT and security, emphasizing that great security begins with a solid IT foundation. He explores the importance of establishing a baseline for normalcy, the role of user safety in preventing security breaches, and the need to understand insider threats. Jason concludes by discussing the "big three" of cybersecurity:

  • Network Inventory: Knowing what's on your network is crucial. This involves having a comprehensive inventory of all devices and systems connected to the network.
  • Configuration and Patching: Keeping systems configured correctly and up to date with patches is essential to prevent vulnerabilities that could be exploited by malicious actors.
  • Identity and Authentication Protection: Ensuring robust identity and authentication measures are in place to protect against unauthorized access and maintain the integrity of user accounts.

KuppingerCole Analysts
Analyst Chat #229: Beyond ChatGPT - AI Use Cases for Cybersecurity

KuppingerCole Analysts

Play Episode Listen Later Sep 16, 2024 36:14


How can artificial intelligence be used in cybersecurity? Matthias and Alexei asked ChatGPT exactly this question, and it came up with quite a list of use cases, which they go through and discuss. They explore forms of AI beyond generative AI, such as traditional machine learning, and highlight the limitations and risks associated with large language models like GPTs, as well as the need for more sustainable and efficient AI solutions. The conversation covers various AI use cases in cybersecurity, including threat detection, behavioral analytics, cloud security monitoring, and automated incident response, and emphasizes the importance of human involvement and decision-making in AI-driven cybersecurity solutions.

Here's ChatGPT's list of AI use cases for cybersecurity:

  • AI for Threat Detection: AI analyzes large datasets to identify anomalies or suspicious activities that signal potential cyber threats.
  • Behavioral Analytics: AI tracks user behavior to detect abnormal patterns that may indicate compromised credentials or insider threats.
  • Cloud Security Monitoring: AI monitors cloud infrastructure, detecting security misconfigurations and policy violations to ensure compliance.
  • Automated Incident Response: AI helps automate responses to cyber incidents, reducing response time and mitigating damage.
  • Malware Detection: AI-driven solutions recognize evolving malware signatures and flag zero-day attacks through advanced pattern recognition.
  • Phishing Detection: AI analyzes communication patterns, spotting phishing emails or fake websites before users fall victim.
  • Vulnerability Management: AI identifies system vulnerabilities, predicts which flaws are most likely to be exploited, and suggests patch prioritization.
  • AI-Driven Penetration Testing: AI automates and enhances pen-testing by simulating potential cyberattacks and finding weaknesses in a network.
  • Anomaly Detection in Network Traffic: AI inspects network traffic for unusual patterns, preventing attacks like Distributed Denial of Service (DDoS).
  • Cybersecurity Training Simulations: AI-powered platforms create dynamic, realistic simulations for training cybersecurity teams, preparing them for real-world scenarios.
  • Threat Intelligence: NLP-based AI interprets textual data like threat reports, social media, and news to assess emerging risks.
  • Predictive Risk Assessment: AI assesses and predicts potential future security risks by evaluating system vulnerabilities and attack likelihood.
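The network-traffic use case above lends itself to a small, hedged illustration. The sketch flags request-rate spikes with the median absolute deviation, a robust statistic that routine fluctuations don't skew; it is a crude stand-in for the AI-driven detection described, and the traffic numbers and threshold are invented.

```python
from statistics import median

def mad_spike(counts, new_count, k=6.0):
    """Flag a request-rate spike using median absolute deviation (MAD)."""
    med = median(counts)
    mad = median(abs(c - med) for c in counts) or 1.0  # avoid a zero scale
    return (new_count - med) / mad > k

# Requests per second observed over a quiet window (hypothetical).
baseline = [120, 130, 125, 118, 122, 135, 128, 124]

assert not mad_spike(baseline, 140)   # routine fluctuation
assert mad_spike(baseline, 5000)      # DDoS-scale burst
```

Real systems layer many such signals (per-source rates, protocol mix, packet sizes) and let a model combine them, but each signal reduces to "how far from the baseline is this?"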

KuppingerCole Analysts Videos
Analyst Chat #229: Beyond ChatGPT - AI Use Cases for Cybersecurity

KuppingerCole Analysts Videos

Play Episode Listen Later Sep 16, 2024 36:14



Telecom Reseller
StorONE Technologies offers storage opportunities for channel partners and MSPs, Podcast

Telecom Reseller

Play Episode Listen Later Sep 4, 2024


“We are a pure software,” says Gal Naor, CEO of StorONE. “You can mix between drives. You can start with SSD with one vendor and later to mix another drives. We are software only that we don't have any other dependency. and you give you all the flexibility to mix between hardware. It could be any server, it could be any vendor, it could be physical or even virtual, as we mentioned that we can operate in the cloud. It could be any location, it's hybrid, it could be on-prem, it could be hybrid cloud, it could be multi-cloud.”

In this podcast, Gal takes us through an introduction to StorONE and their unique approach to storage. Recently, StorONE collaborated with Accessium Group, a premier healthcare consulting firm specializing in IT services, to deliver high-capacity, high-performance, HIPAA-compliant storage solutions to the healthcare market. This partnership has successfully enhanced data management capabilities in critical medical fields such as radiology, cardiology, pathology, neurology, and ophthalmology. StorONE's high-performance storage solutions, powered by cutting-edge technologies like NVMe and advanced tiering algorithms, ensure lightning-fast access to medical images, reducing retrieval times, enhancing data access speeds, and minimizing latency issues. These improvements are critical for timely diagnostics and treatment. Gal outlines how these technologies can be used in HIPAA-related environments, and also in many other industries.

“You can integrate between the products very easy, no licensing. Same advantage is for the MSPs and for the channels. They can offer more services. Channel can offer to his customers more product that he doesn't need to have ten or fifteen different products to come to his end user, for his customer. It can come just with one product that can provide all the storage services and all the storage use cases.”

Additionally, StorONE has reinforced its support for HIPAA with the latest version of its S1 software. This version offers an enhanced set of security features that act as the last line of defense in the event of a cyberattack in a healthcare setting. Key features include:

  • Auto-Tiering: dynamically transfers sensitive data for optimal utilization
  • Lockable and Immutable Snapshots: ensure the integrity and consistency of data, helping healthcare organizations meet regulatory requirements for data retention and protection
  • Ransomware and Rapid Recovery: well-defined procedures and tools to detect, respond to, and recover from ransomware attacks
  • Multi-Admin Approval: ensures multiple admins and consensus decision-making to prevent unauthorized access, with data security managed at the storage layer
  • Anomaly Detection and Audit Logs: identifies security threats, system malfunctions, and other irregularities, essential for robust cybersecurity

About StorONE
Headquartered in New York, StorONE offers the only 100% enterprise software that abstracts hardware and software without any hardware dependency. Our unique technology is designed for high-capacity, high-performance storage solutions. With an eight-year investment in completely rewriting the storage stack from the ground up, StorONE maximizes drive utilization, dramatically reducing the number of disks required and providing state-of-the-art data protection against security threats. StorONE provides ONE software solution for all storage use cases, supporting any storage protocol, disk type, or location, whether on-premises or in the cloud. By integrating data integrity, retention, protection, replication, and security features into a single product,

The Data Stack Show
203: From Data Dreams to Practical Marketing Outcomes with Spencer Burke of Braze

The Data Stack Show

Play Episode Listen Later Aug 21, 2024 46:39


Highlights from this week's conversation include:

  • Spencer's Background at Braze (1:54)
  • The Early Days of Braze (2:41)
  • Finding Product-Market Fit (4:44)
  • First Major Customer (6:33)
  • Unique Aspects of Braze's Growth Team (8:07)
  • Startup Culture Experience (10:40)
  • Data and Marketing Perspectives (12:50)
  • Common Marketing Data Challenges (15:50)
  • Changing Dynamics in Marketing Tech (18:12)
  • Evaluating Marketing Tools (19:38)
  • Transformation of Marketing Tools (22:18)
  • Marketers Becoming More Technical (24:10)
  • API Utilization in Marketing (25:46)
  • Connecting Customer Experience (29:09)
  • Flexibility in Data Integration (32:05)
  • Pushing vs. Pulling Data (34:35)
  • Anomaly Detection in Data Reporting (37:02)
  • Understanding the Importance of Core KPIs (39:09)
  • Making Data More Consumable (42:38)
  • Final Thoughts and Takeaways (44:51)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

ITSPmagazine | Technology. Cybersecurity. Society
Coro's Modular Cybersecurity and True Platform Revolution | A Brand Story Conversation From Black Hat USA 2024 | A CORO Story with Dror Liwer | On Location Coverage with Sean Martin and Marco Ciappelli

ITSPmagazine | Technology. Cybersecurity. Society

Play Episode Listen Later Aug 8, 2024 20:37


At Black Hat 2024 in Las Vegas, Sean Martin from On Location interviews Dror Liwer of Coro, uncovering the impressive strides Coro has made in creating a truly cohesive cybersecurity platform. This conversation reveals how Coro distinguishes itself in an industry saturated with buzzwords and inadequate solutions, particularly for smaller and mid-sized businesses.

Meeting in Vegas
Sean Martin starts the conversation by appreciating the vibrant atmosphere at the Black Hat Business Hall. The colorful Coro booth, coupled with the energetic team, sets the perfect backdrop for a discussion centered on platform innovation.
Sean Martin: "Here we are, Dror. Fantastic seeing you here in Vegas."
Dror Liwer: "It's where we meet."

The Platform Buzz
The term “platform” has become a buzzword in the cybersecurity industry. Dror explains that many companies claim to offer platforms, but these so-called platforms often result from the integration of various point solutions, which don't communicate effectively with each other.
Dror Liwer: “We built Coro as a platform and have been a platform for 10 years. It's kind of funny to see everybody now catching up and trying to pretend to be a platform.”
Dror criticizes how companies use “platform” to create market confusion, explaining that a true platform requires seamless integration, a single endpoint agent, and a unified data lake.

Defining a True Platform
Dror and Sean delve deep into what makes Coro's platform genuinely innovative. Dror emphasizes that a real platform collects and processes data across multiple modules, providing a single pane of glass for operators. He contrasts this with other solutions that merely integrate various tools, resulting in operational complexity and inefficiencies.
Dror Liwer: "A real platform is an engine that has a set of tools on top of it that work seamlessly together using a single pane of glass, a single endpoint agent, and a single data lake that shares all of the information across all of the different modules."

The Role of Data
Data integration is a cornerstone of Coro's platform. Dror explains that each module in Coro functions as both a sensor and protector, feeding data into the system and responding to anomalies in real time.
Dror Liwer: "The collection of data happens natively at the sensor. They feed all the data into one very large data lake."
This unified approach allows Coro to eliminate the time-critical gap between event detection and response, a significant advantage over traditional systems that often rely on multiple disparate tools.

Supporting MSPs and Mid-Market Businesses
One of Coro's key missions is to support Managed Service Providers (MSPs) and mid-market businesses, sectors that have been largely overlooked by larger cybersecurity firms. By offering a more manageable and less costly platform, Coro empowers these providers to offer comprehensive cybersecurity services without the high operational costs traditionally associated with such tasks.
Dror Liwer: “We are changing that economic equation, allowing MSPs to offer full cybersecurity solutions to their customers at an affordable price.”

Fulfilling New Requirements
Dror also sheds light on how Coro helps businesses comply with new regulatory requirements or cybersecurity mandates, often dictated by their position in the supply chain.
Dror Liwer: "When this guy comes to you and says, ‘Hey, I need to now comply with this or do that,' this is an opportunity to tell them, ‘Don't worry. I got you covered. I have Coro for you.'”

Conclusion
Dror Liwer's insights during Black Hat 2024 highlight how Coro is not only addressing but revolutionizing the cybersecurity needs of small to mid-sized businesses and their MSP partners. By creating a true platform that reduces complexity and operational costs, Coro sets a new standard in the cybersecurity industry.

Learn more about CORO: https://itspm.ag/coronet-30de
Note: This story contains promotional content. Learn more.
Guest: Dror Liwer, Co-Founder at Coro [@coro_cyber]
On LinkedIn | https://www.linkedin.com/in/drorliwer/
Resources
Learn more and catch more stories from CORO: https://www.itspmagazine.com/directory/coro
View all of our Black Hat USA 2024 coverage: https://www.itspmagazine.com/black-hat-usa-2024-hacker-summer-camp-2024-event-coverage-in-las-vegas
Are you interested in telling your story?
https://www.itspmagazine.com/telling-your-story

Redefining CyberSecurity
Coro's Modular Cybersecurity and True Platform Revolution | A Brand Story Conversation From Black Hat USA 2024 | A CORO Story with Dror Liwer | On Location Coverage with Sean Martin and Marco Ciappelli

Redefining CyberSecurity

Play Episode Listen Later Aug 8, 2024 20:37



The Treasury Update Podcast
Coffee Break Session #114: What are Anomaly Detection Systems?

The Treasury Update Podcast

Play Episode Listen Later Jul 18, 2024 7:51


In today's episode, we'll hear from Paul Galloway on anomaly detection systems. What are they, and how are they used in treasury and finance? Tune in to find out.
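One concrete rule that treasury anomaly detection systems commonly encode is duplicate-payment screening. The sketch below is a generic illustration of that idea, not a description of any specific product; the vendors, amounts, and window are invented.

```python
def flag_duplicates(payments, window_days=3):
    """Flag payments repeating the same vendor and amount within a short
    window: a classic treasury anomaly (possible double payment or fraud)."""
    flagged = []
    seen = {}  # (vendor, amount) -> day last seen
    for day, vendor, amount in sorted(payments):
        key = (vendor, amount)
        if key in seen and day - seen[key] <= window_days:
            flagged.append((day, vendor, amount))
        seen[key] = day
    return flagged

payments = [
    (1, "Acme Corp", 9800.00),
    (2, "Globex", 1250.00),
    (3, "Acme Corp", 9800.00),   # same vendor/amount two days later
    (20, "Globex", 1250.00),     # far outside the window: not flagged
]
assert flag_duplicates(payments) == [(3, "Acme Corp", 9800.00)]
```

Production systems combine many such rules with statistical models of payment behavior, but the flag-and-review pattern is the same.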

Microsoft Mechanics Podcast
Vector Search using 95% Less Compute | DiskANN with Azure Cosmos DB

Microsoft Mechanics Podcast

Play Episode Listen Later Jun 8, 2024 16:05


Ensure high-accuracy, efficient vector search at massive scale with Azure Cosmos DB. Leveraging Microsoft's DiskANN, more IO traffic moves to disk to maximize storage capacity and enable high-speed similarity searches across all data, reducing memory dependency. This technology, powering global services like Microsoft 365, is now integrated into Azure Cosmos DB, enabling developers to build scalable, high-performance applications with built-in vector search, real-time fraud detection, and robust multi-tenancy support. Join Kirill Gavrylyuk, VP for Azure Cosmos DB, as he shares how Azure Cosmos DB with DiskANN offers unparalleled speed, efficiency, and accuracy, making it the ideal solution for modern AI-driven applications.

► QUICK LINKS:
00:00 - Latest Cosmos DB optimizations with DiskANN
02:09 - Where DiskANN approach is beneficial
04:07 - Efficient querying
06:02 - DiskANN compared to HNSW
07:41 - Integrate DiskANN into a new or existing app
08:39 - Real-time transactional AI scenario
09:29 - Building a fraud detection sample app
10:59 - Vectorize transactions for anomaly detection
12:49 - Scaling to address high levels of traffic
14:05 - Manage multi-tenancy
15:35 - Wrap up

► Link References:
Check out https://aka.ms/DiskANNCosmosDB
Try out apps at https://aka.ms/DiskANNCosmosDBSamples

► Unfamiliar with Microsoft Mechanics?
As Microsoft's official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
• Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries
• Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog
• Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast

► Keep getting this insider knowledge, join us on social:
• Follow us on Twitter: https://twitter.com/MSFTMechanics
• Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/
• Enjoy us on Instagram: https://www.instagram.com/msftmechanics/
• Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics
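For context on what an index like DiskANN accelerates, here is the exact brute-force search it approximates: comparing the query to every stored vector is O(n) per query, while approximate nearest-neighbor indexes trade a little recall for dramatically less compute at scale. This is a toy sketch with invented embeddings, not the Cosmos DB API.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def nearest(query, vectors):
    """Exact (brute-force) nearest neighbor by cosine similarity."""
    return max(vectors, key=lambda item: cosine(query, item[1]))

# Toy 3-d embeddings keyed by document id (hypothetical values).
store = [
    ("doc-a", [0.9, 0.1, 0.0]),
    ("doc-b", [0.0, 1.0, 0.2]),
    ("doc-c", [0.85, 0.15, 0.05]),
]
best_id, _ = nearest([1.0, 0.0, 0.0], store)
assert best_id == "doc-a"
```

At millions of vectors, this scan becomes the bottleneck, which is where a disk-resident graph index such as DiskANN earns its keep.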

Oracle University Podcast
Encore Episode: The OCI AI Portfolio

Oracle University Podcast

Play Episode Listen Later May 21, 2024 16:38


Oracle has been actively focusing on bringing AI to the enterprise at every layer of its tech stack, be it SaaS apps, AI services, infrastructure, or data. In this episode, hosts Lois Houston and Nikita Abraham, along with senior instructors Hemant Gahankari and Himanshu Raj, discuss OCI AI and Machine Learning services. They also go over some key OCI Data Science concepts and responsible AI principles.

Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177
Oracle University Learning Community: https://education.oracle.com/ou-community
LinkedIn: https://www.linkedin.com/showcase/oracle-university/
X (formerly Twitter): https://twitter.com/Oracle_Edu

Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode.

--------------------------------------------------------

Episode Transcript:

00:00 The world of artificial intelligence is vast and ever-changing. And with all the buzz around it lately, we figured it was the perfect time to revisit our AI Made Easy series. Join us over the next few weeks as we chat about all things AI, helping you to discover its endless possibilities. Ready to dive in? Let's go!

00:33 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started!

00:46 Lois: Welcome to the Oracle University Podcast! I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.

Nikita: Hey everyone! In our last episode, we dove into Generative AI and Large Language Models.

Lois: Yeah, that was an interesting one. But today, we're going to discuss the AI and machine learning services offered by Oracle Cloud Infrastructure, and we'll look at the OCI AI infrastructure.
Nikita: I'm also going to try and squeeze in a couple of questions on a topic I'm really keen about, which is responsible AI. To take us through all of this, we have two of our colleagues, Hemant Gahankari and Himanshu Raj. Hemant is a Senior Principal OCI Instructor and Himanshu is a Senior Instructor on AI/ML. So, let's get started! 01:36 Lois: Hi Hemant! We're so excited to have you here! We know that Oracle has really been focusing on bringing AI to the enterprise at every layer of our stack. Hemant: It all begins with data and infrastructure layers. OCI AI services consume data, and AI services, in turn, are consumed by applications. This approach involves extensive investment from infrastructure to SaaS applications. Generative AI and massive-scale models are the more recent steps. Oracle AI is the portfolio of cloud services for helping organizations use the data they may have for business-specific uses. Business applications consume AI and ML services. The foundation of AI services and ML services is data. AI services contain pre-built models for specific uses. Some of the AI services are pre-trained, and some can be additionally trained by the customer with their own data. AI services can be consumed by calling the API for the service, passing in the data to be processed, and the service returns a result. There is no infrastructure to be managed for using AI services. 02:58 Nikita: How do I access OCI AI services? Hemant: OCI AI services provide multiple methods for access. The most common method is the OCI Console. The OCI Console provides an easy-to-use, browser-based interface that enables access to notebook sessions and all the features of the Data Science and AI services. The REST API provides access to service functionality but requires programming expertise. An API reference is provided in the product documentation. OCI also provides programming language SDKs for Java, Python, TypeScript, JavaScript, .Net, Go, and Ruby.
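The call-an-AI-service pattern Hemant describes (pass data in, get a result back) can be sketched roughly as follows. The endpoint URL and payload field names below are hypothetical placeholders, not the real OCI REST API; in practice you would use the Console, CLI, or one of the SDKs, which also sign each request for you.

```python
import json

# Illustrative only: a made-up endpoint and payload shape for the
# "send data, receive a result" pattern described in the episode.
ENDPOINT = "https://language.example-region.oci.example.com/analyzeText"

def build_sentiment_request(documents):
    """Package raw text documents into a JSON-serializable request body."""
    return {
        "documents": [
            {"key": f"doc-{i}", "text": text} for i, text in enumerate(documents)
        ]
    }

body = build_sentiment_request(["Great service!", "The app keeps crashing."])
print(json.dumps(body)[:40])     # the serialized body that would be POSTed
print(len(body["documents"]))    # 2
```

The same body-building step applies whichever access method you choose; only the transport (Console, REST, SDK, CLI) changes.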
The command line interface provides both quick access and full functionality without the need for scripting. 03:52 Lois: Hemant, what are the types of OCI AI services that are available? Hemant: OCI AI Services is a collection of services with pre-built machine learning models that make it easier for developers to build a variety of business applications. The models can also be custom trained for more accurate business results. The different services provided are digital assistant, language, vision, speech, document understanding, and anomaly detection. 04:24 Lois: I know we're going to talk about them in more detail in the next episode, but can you introduce us to OCI Language, Vision, and Speech? Hemant: OCI Language allows you to perform sophisticated text analysis at scale. Using the pre-trained and custom models, you can process unstructured text to extract insights without data science expertise. Pre-trained models include language detection, sentiment analysis, key phrase extraction, text classification, named entity recognition, and personally identifiable information detection. Custom models can be trained for named entity recognition and text classification with domain-specific data sets. In text translation, neural machine translation is used to translate text across numerous languages. Using OCI Vision, you can upload images to detect and classify objects in them. Pre-trained models and custom models are supported. In image analysis, pre-trained models perform object detection, image classification, and optical character recognition. In image analysis, custom models can perform custom object detection by detecting the location of custom objects in an image and providing a bounding box. The OCI Speech service is used to convert media files to readable text that is stored in JSON and SRT formats. Speech enables you to easily convert media files containing human speech into highly accurate text transcriptions. 06:12 Nikita: That's great.
And what about document understanding and anomaly detection? Hemant: Using OCI Document Understanding, you can upload documents to detect and classify text and objects in them. You can process individual files or batches of documents. In OCR, document understanding can detect and recognize text in a document. In text extraction, document understanding provides the word-level and line-level text, and the bounding box coordinates of where the text is found. In key value extraction, document understanding extracts a predefined list of key value pairs of information from receipts, invoices, passports, and driver IDs. In table extraction, document understanding extracts content in tabular format, maintaining the row and column relationship of cells. In document classification, document understanding classifies documents into different types. The OCI Anomaly Detection service analyzes large volumes of multivariate or univariate time series data. The Anomaly Detection service increases the reliability of businesses by monitoring their critical assets and detecting anomalies early with high precision. Anomaly detection is the identification of rare items, events, or observations in data that differ significantly from the expectation. 07:55 Nikita: Where is Anomaly Detection most useful? Hemant: The Anomaly Detection service is designed to help with analyzing large amounts of data and identifying the anomalies at the earliest possible time with maximum accuracy. Different sectors, such as utility, oil and gas, transportation, manufacturing, telecommunications, banking, and insurance use the Anomaly Detection service for their day-to-day activities. 08:23 Lois: Ok…and the first OCI AI service you mentioned was digital assistant… Hemant: Oracle Digital Assistant is a platform that allows you to create and deploy digital assistants, which are AI-driven interfaces that help users accomplish a variety of tasks with natural language conversations.
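Hemant's definition of anomaly detection (observations that differ significantly from the expectation) can be illustrated with a minimal univariate sketch, using a trailing moving average as the "expectation." This shows the concept only; the window, threshold, and readings are made up, and the OCI service uses its own trained models rather than this rule.

```python
# Flag points in a time series that deviate from a trailing moving
# average by more than a fixed threshold. Purely illustrative.
def flag_anomalies(series, window=3, threshold=10.0):
    """Return indices whose value deviates from the trailing mean by > threshold."""
    anomalies = []
    for i in range(window, len(series)):
        expected = sum(series[i - window:i]) / window  # the "expectation"
        if abs(series[i] - expected) > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical sensor readings with one spike at index 4.
readings = [20.0, 20.5, 21.0, 20.8, 45.0, 21.1, 20.9]
print(flag_anomalies(readings))  # → [4]
```

Real services handle multivariate signals, seasonality, and missing data, which is exactly what makes the managed offering useful beyond a rule like this.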
When a user engages with the Digital Assistant, the Digital Assistant evaluates the user input and routes the conversation to and from the appropriate skills. Digital Assistant greets the user upon access. Upon user request, it lists what it can do and provides entry points into the given skills. It routes explicit user requests to the appropriate skills. And it also handles interruptions to flows and disambiguation. It also handles requests to exit the bot. 09:21 Nikita: Excellent! Let's bring Himanshu in to tell us about machine learning services. Hi Himanshu! Let's talk about OCI Data Science. Can you tell us a bit about it? Himanshu: OCI Data Science is the cloud service focused on serving the data scientist throughout the full machine learning life cycle with support for Python and open source. The service has many features, such as model catalog, projects, JupyterLab notebook, model deployment, model training, management, model explanation, open source libraries, and AutoML. 09:56 Lois: Himanshu, what are the core principles of OCI Data Science? Himanshu: There are three core principles of OCI Data Science. The first one, accelerated. The first principle is about accelerating the work of the individual data scientist. OCI Data Science provides data scientists with open source libraries along with easy access to a range of compute power without having to manage any infrastructure. It also includes Oracle's own library to help streamline many aspects of their work. The second principle is collaborative. It goes beyond an individual data scientist's productivity to enable data science teams to work together. This is done through the sharing of assets, reducing duplicative work, and putting reproducibility and auditability of models in place for collaboration and risk management. Third is enterprise grade. That means it's integrated with all the OCI Security and access protocols. The underlying infrastructure is fully managed.
The customer does not have to think about provisioning compute and storage. And the service handles all the maintenance, patching, and upgrades so users can focus on solving business problems with data science. 11:11 Nikita: Let's drill down into the specifics of OCI Data Science. So far, we know it's a cloud service to rapidly build, train, deploy, and manage machine learning models. But who can use it? Where is it? And how is it used? Himanshu: It serves data scientists and data science teams throughout the full machine learning life cycle. Users work in a familiar JupyterLab notebook interface, where they write Python code. And how it is used? So users preserve their models in the model catalog and deploy their models to a managed infrastructure. 11:46 Lois: Walk us through some of the key terminology that's used. Himanshu: Some of the important product terminology of OCI Data Science are projects. The projects are containers that enable data science teams to organize their work. They represent collaborative workspaces for organizing and documenting data science assets, such as notebook sessions and models. Note that a tenancy can have as many projects as needed without limits. Now, this notebook session is where the data scientists work. Notebook sessions provide a JupyterLab environment with pre-installed open source libraries and the ability to add others. Notebook sessions are interactive coding environments for building and training models. Notebook sessions run in a managed infrastructure and the user can select CPU or GPU, the compute shape, and amount of storage without having to do any manual provisioning. The other important feature is the Conda environment. It's an open source environment and package management system and was created for Python programs. 12:53 Nikita: What is a Conda environment used for? Himanshu: It is used in the service to quickly install, run, and update packages and their dependencies.
Conda easily creates, saves, loads, and switches between environments in your notebook sessions. 13:07 Nikita: Earlier, you spoke about the support for Python in OCI Data Science. Is there a dedicated library? Himanshu: Oracle's Accelerated Data Science (ADS) SDK is a Python library that is included as part of OCI Data Science. ADS has many functions and objects that automate or simplify the steps in the data science workflow, including connecting to data, exploring and visualizing data, training a model with AutoML, evaluating models, and explaining models. In addition, ADS provides a simple interface to access the Data Science service model catalog and other OCI services, including object storage. 13:45 Lois: I also hear a lot about models. What are models? Himanshu: Models define a mathematical representation of your data and business process. You create models in notebook sessions, inside projects. 13:57 Lois: What are some other important terminologies related to models? Himanshu: The next terminology is model catalog. The model catalog is a place to store, track, share, and manage models. The model catalog is a centralized and managed repository of model artifacts. A stored model includes metadata about the provenance of the model, including Git-related information and the script or notebook used to push the model to the catalog. Models stored in the model catalog can be shared across members of a team, and they can be loaded back into a notebook session. The next one is model deployments. Model deployments allow you to deploy models stored in the model catalog as HTTP endpoints on managed infrastructure. 14:45 Lois: So, how do you operationalize these models? Himanshu: Deploying machine learning models as web applications, HTTP API endpoints, serving predictions in real time is the most common way to operationalize models. HTTP endpoints or the API endpoints are flexible and can serve requests for the model predictions.
Data science jobs enable you to define and run repeatable machine learning tasks on fully managed infrastructure. Nikita: Thanks for that, Himanshu. 15:18 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You'll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started. 15:46 Nikita: Welcome back! The Oracle AI Stack consists of AI services and machine learning services, and these services are built using AI infrastructure. So, let's move on to that. Hemant, what are the components of OCI AI Infrastructure? Hemant: OCI AI Infrastructure is mainly composed of GPU-based instances. Instances can be virtual machines or bare metal machines. High performance cluster networking allows instances to communicate with each other. Superclusters are a massive network of GPU instances with multiple petabytes per second of bandwidth. And a variety of fully managed storage options from a single byte to exabytes without upfront provisioning are also available. 16:35 Lois: Can we explore each of these components a little more? First, tell us, why do we need GPUs? Hemant: ML and AI need lots of repetitive computations to be made on huge amounts of data. Parallel computing on GPUs is designed for many processes at the same time. A GPU is a piece of hardware that is incredibly good at performing computations. A GPU has thousands of lightweight cores, all working on their share of data in parallel. This gives them the ability to crunch through extremely large data sets at tremendous speed. 17:14 Nikita: And what are the GPU instances offered by OCI? Hemant: GPU instances are ideally suited for model training and inference.
Bare metal and virtual machine compute instances powered by NVIDIA GPUs H100, A100, A10, and V100 are made available by OCI. 17:35 Nikita: So how do we choose what to train from these different GPU options? Hemant: For large scale AI training, data analytics, and high performance computing, bare metal instances BM 8 X NVIDIA H100 and BM 8 X NVIDIA A100 can be used. These provide up to nine times faster AI training and 30 times higher acceleration for AI inferencing. The other bare metal and virtual machines are used for small AI training, inference, streaming, gaming, and virtual desktop infrastructure. 18:14 Lois: And why would someone choose the OCI AI stack over its counterparts? Hemant: Oracle offers all the features and is the most cost-effective option when compared to its counterparts. For example, the BM GPU 4.8 version 2 instance costs just $4 per hour and is used by many customers. Superclusters are a massive network with multiple petabytes per second of bandwidth. They can scale up to 4,096 OCI bare metal instances with 32,768 GPUs. We also have a choice of bare metal A100 or H100 GPU instances, and we can select a variety of storage options, like object store, or block store, or even file system. For networking speeds, we can reach 1,600 GB per second with A100 GPUs and 3,200 GB per second with H100 GPUs. With OCI storage, we can select local SSD up to four NVMe drives, block storage up to 32 terabytes per volume, object storage up to 10 terabytes per object, and file systems up to eight exabytes per file system. OCI File Storage employs five-way replicated storage located in different fault domains to provide redundancy for resilient data protection. HPC file systems, such as BeeGFS and many others, are also offered. OCI HPC file systems are available on Oracle Cloud Marketplace and make it easy to deploy a variety of high performance file servers. 20:11 Lois: I think a discussion on AI would be incomplete if we don't talk about responsible AI.
We're using AI more and more every day, but can we actually trust it? Hemant: For us to trust AI, it must be driven by ethics that guide us as well. Nikita: And do we have some principles that guide the use of AI? Hemant: AI should be lawful, complying with all applicable laws and regulations. AI should be ethical, that is, it should ensure adherence to ethical principles and values that we uphold as humans. And AI should be robust, both from a technical and social perspective. Because even with good intentions, AI systems can cause unintentional harm. AI systems do not operate in a lawless world. A number of legally binding rules at the national and international levels apply or are relevant to the development, deployment, and use of AI systems today. The law not only prohibits certain actions but also enables others, like protecting the rights of minorities or protecting the environment. Besides horizontally applicable rules, various domain-specific rules exist that apply to particular AI applications. For instance, the medical device regulation in the health care sector. In the AI context, equality entails that the system's operations cannot generate unfairly biased outputs. And while we adopt AI, citizens' rights should also be protected. 21:50 Lois: Ok, but how do we derive AI ethics from these? Hemant: There are three main principles. AI should be used to help humans and allow for oversight. It should never cause physical or social harm. Decisions taken by AI should be transparent and fair, and also should be explainable. AI that follows the AI ethical principles is responsible AI. So if we map the AI ethical principles to responsible AI requirements, these will be like, AI systems should follow human-centric design principles and leave meaningful opportunity for human choice. This means securing human oversight. AI systems and environments in which they operate must be safe and secure, they must be technically robust, and should not be open to malicious use.
The development, deployment, and use of AI systems must be fair, ensuring equal and just distribution of both benefits and costs. AI should be free from unfair bias and discrimination. Decisions taken by AI should, to the extent possible, be explainable to those directly and indirectly affected. 23:21 Nikita: This is all great, but what does a typical responsible AI implementation process look like? Hemant: First, governance needs to be put in place. Second, develop a set of policies and procedures to be followed. And once implemented, ensure compliance by regular monitoring and evaluation. Lois: And this is all managed by developers? Hemant: Typical roles that are involved in the implementation cycle are developers, deployers, and end users of the AI. 23:56 Nikita: Can we talk about AI specifically in health care? How do we ensure that there is fairness and no bias? Hemant: AI systems are only as good as the data that they are trained on. If that data is predominantly from one gender or racial group, the AI systems might not perform as well on data from other groups. 24:21 Lois: Yeah, and there's also the issue of ensuring transparency, right? Hemant: AI systems often make decisions based on complex algorithms that are difficult for humans to understand. As a result, patients and health care providers can have difficulty trusting the decisions made by the AI. AI systems must be regularly evaluated to ensure that they are performing as intended and not causing harm to patients. 24:49 Nikita: Thank you, Hemant and Himanshu, for this really insightful session. If you're interested in learning more about the topics we discussed today, head on over to mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course. Lois: That's right, Niki. You'll find demos that you can watch as well as skill checks that you can attempt to better your understanding.
In our next episode, we'll get into the OCI AI Services we discussed today and talk about them in more detail. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 25:25 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.

The Engineering Enablement Podcast
How Slack fully automates deploys and anomaly detection with Z-scores | Sean McIlroy (Slack)

The Engineering Enablement Podcast

Play Episode Listen Later Apr 23, 2024 33:49


This week we're joined by Sean McIlroy from Slack's Release Engineering team to learn about how they've fully automated their deployment process. This conversation covers Slack's original release process, key changes Sean's team has made, and the latest challenges they're working on today.

Discussion points:
(1:34): The Release Engineering team
(2:13): How the monolith has served Slack
(3:24): How the deployment process used to work
(6:23): The complexity of the deploy itself
(7:39): Early ideas for improving the deployment process
(9:07): Why anomaly detection is challenging
(10:32): What a Z-score is
(13:23): Managing noise with Z-scores
(16:49): Presenting this information to people that need it
(19:54): Taking humans out of the process
(23:13): Handling rollbacks
(25:27): Not overloading developers with information
(28:26): Handling large deployments

Mentions and links:
Read Sean's blog post, The Scary Thing About Deploys
Follow Sean on LinkedIn
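Since the episode spends time on what a Z-score is, here is a minimal sketch of the idea applied to a deploy-time metric: score the latest reading by how many standard deviations it sits from recent history, and treat a large absolute score as an anomaly. The numbers and the 3-sigma threshold are illustrative, not Slack's actual pipeline.

```python
import statistics

def z_score(history, latest):
    """How many standard deviations `latest` sits from the mean of `history`."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)  # population stdev of the window
    if stdev == 0:
        return 0.0
    return (latest - mean) / stdev

# Hypothetical errors-per-minute readings before a deploy, then one after.
history = [12, 9, 11, 10, 13, 11, 10, 12]
print(round(z_score(history, 40), 1))  # → 23.7, a clear spike
print(abs(z_score(history, 40)) > 3)   # → True: anomalous, halt the deploy
```

The episode's "managing noise" discussion is about exactly the weak spot of this sketch: choosing windows and thresholds so that ordinary fluctuation does not trip the alarm.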

Humans of Martech
116: Kevin Hu: How data observability and anomaly detection can enhance MOps

Humans of Martech

Play Episode Listen Later Apr 23, 2024 50:36


What's up everyone, today we have the pleasure of sitting down with Kevin Hu (Hoo), Co-founder and CEO at Metaplane.

Summary: Dr. Kevin Hu gives us a masterclass on everything data. Data analysis, data storytelling, data quality, data observability and data anomaly detection. We unpack the power of inquisitive data analysis and a hypothesis-driven approach, emphasizing the importance of balancing data perfection with actually doing the work of activating that data. He highlights data observability and anomaly detection as a key to preempting errors, ensuring data integrity for a seamless user experience. Amid the rise of AI in martech, he champions marketing ops' role in safeguarding data quality, making clear that success hinges on our ability to manage data with precision, creativity, and proactive vigilance.

About Kevin
Kevin did his undergrad in Physics at MIT
He later collaborated with his biologist sister, assisting in analyzing five years of fish behavior data. This experience inspired him to further his research and earn a master's degree in Data Visualization and Machine Learning
He also completed a PhD in Philosophy at MIT where he led research on automated data visualization and semantic type detection
His research was published at several conferences like CHI (pronounced Kai) (human-computer interaction), SIGMOD (database) and KDD (data mining) and featured in the Economist, NYT and Wired
In 2019, Kevin teamed up with former Hubspot and Appcues engineers to launch Metaplane, initially set out to be a product focused on customer success, designed to analyze company data for churn prevention
But after going through Y Combinator, the company pivoted slightly to build data analytics-focused tools
Today Metaplane is a data observability platform powered by ML-based anomaly detection that helps teams prevent and detect data issues — before the CEO pings them about weird revenue numbers.

How to Ask the Right Questions in Data Analysis
When Kevin shared the profound
impact César Hidalgo, his mentor at MIT, had on his journey into the data world, it wasn't just about learning to analyze data; it was about asking the right questions. César put together one of our favorite TED talks ever – Why we should automate politicians with AI agents – this was back in 2018, long before ChatGPT was popular. Hidalgo, recognized not only for AI and ML applications but also for developing innovative methods to visualize complex data and making it understandable to a broader audience, was the most important teacher in Kevin's life. He helped Kevin understand that the bottleneck in data analysis wasn't necessarily a lack of coding skills but a gap in understanding what to ask of the data. This revelation came at a pivotal moment as Kevin navigated his path through grad school, influenced by his sister's work in animal behavior and his own struggles with coding tools like R and MATLAB.

Under Hidalgo's guidance, Kevin was introduced to a broader perspective on data analysis. This wasn't just about running numbers through a program; it was about infusing those numbers with context and meaning. Hidalgo's approach to mentorship, characterized by personalized attention and encouragement to delve into complex ideas, like those presented in Steven Pinker's "The Blank Slate," opened up a new world of inquiry for Kevin. It was a world where the questions one asked were as critical as the data one analyzed.

This mentorship experience highlights the importance of curiosity and critical thinking in the field of data science. Kevin's reflection on his journey reveals a key insight: mastering coding languages is only one piece of the puzzle. The ability to question, to seek out the stories data tells, and to understand the broader implications of those stories is equally, if not more, important.

Kevin's gratitude towards Hidalgo for his investment in students' growth serves as a reminder of the value of mentorship.
It's a testament to the idea that the best mentors don't just teach you how to execute tasks; they inspire you to see beyond the immediate horizon. They challenge you to think deeply about your work and its impact on the world.

Key takeaway: For marketers delving into data-informed strategies, Kevin's story is a powerful reminder that beyond the technical skills, the ability to ask compelling, insightful questions of your data can dramatically amplify its value. Focus on nurturing a deep, inquisitive approach to understanding consumer behavior and market trends.

Bridging Academic Rigor with Startup Agility
During his career in academia working alongside Olympian-caliber scientists and researchers, Kevin garnered insights that have since influenced his approach to running a startup. The parallels between academia and startups are striking, with both realms embodying a journey of perseverance and unpredictability. This analogy provides a foundational mindset for entrepreneurs who must navigate the uncertain waters of business development with resilience and adaptability.

At the heart of Kevin's philosophy is the adoption of a hypothesis-driven approach. This methodology, borrowed from academic research, emphasizes the importance of formulating hypotheses for various aspects of business operations, particularly in marketing strategies. Identifying the ideal customer profile (ICP), crafting compelling messaging, and selecting the optimal channels are seen not as static decisions but as theories to be rigorously tested and iterated upon. This empirical approach allows for a methodical exploration of what resonates best with the target audience, acknowledging that today's successful strategy may need reevaluation tomorrow.

Another vital lesson from academia that Kevin emphasizes is the respect for past endeavors. In a startup ecosystem often obsessed with innovation, there's a tendency to overlook the lessons learned from previous attempts in similar ventures.
By acknowledging and building upon the efforts of predecessors, Kevin advocates for a more informed and grounded approach to innovation. This perspective encourages entrepreneurs to consider the historical context of their ideas and strategies, potentially saving time and resources by learning from past mistakes rather than repeating them.

Key takeaway: Embracing a hypothesis-driven mindset should be familiar ground for marketers. Challenge your team to identify and test hypotheses around underexplored or seemingly less significant customer segments. This could involve hypothesizing the effectiveness of personalized content for a niche within your broader audience that has been overlooked, measuring engagement against broader campaigns.

Balancing Data Accuracy with Rapid Growth
For startups grappling with survival, the luxury of perfect data is often out of reach. Kevin points out that data quality should be tailored to the specific needs of the business. For instance, data utilized for quarterly board meetings does not necessitate the same level of freshness as data driving daily customer interactions. This pragmatic approach underscores the importance of defining data quality standards based on the frequency and criticality of business decisions.

At the heart of Kevin's argument is the concept that as businesses scale, the stakes of data accuracy and timeliness escalate. He highlights scenarios where real-time data becomes crucial, such as B2B SaaS companies engaging with potential leads or e-commerce platforms optimizing their customer journey. In these cases, even slight inaccuracies or delays can result in missed revenue opportunities or diminished customer trust.

This discourse on data quality transcends the binary choice between perfect data and rapid action. Instead, Kevin advoc...
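Kevin's point that freshness requirements should track decision cadence (board reporting can tolerate day-old data; live lead routing cannot) can be sketched as a simple freshness check, the kind of rule a data observability tool evaluates continuously. The table names and SLA numbers below are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical freshness SLAs keyed by table: illustrative numbers only.
FRESHNESS_SLA = {
    "quarterly_board_metrics": timedelta(days=1),
    "inbound_leads": timedelta(minutes=15),
}

def is_stale(table, last_loaded_at, now):
    """True if the table's last load breaches its freshness SLA."""
    return now - last_loaded_at > FRESHNESS_SLA[table]

now = datetime(2024, 4, 23, 12, 0)
print(is_stale("inbound_leads", datetime(2024, 4, 23, 11, 0), now))           # True
print(is_stale("quarterly_board_metrics", datetime(2024, 4, 23, 1, 0), now))  # False
```

The same hour-old load is a breach for lead routing but perfectly fine for board reporting, which is the tradeoff Kevin is describing.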

Service Management Leadership Podcast with Jeffrey Tefertiller
Service Management Leadership - Guest Host Luigi Ferri Talking About Anomaly Detection versus Behavior Detection

Service Management Leadership Podcast with Jeffrey Tefertiller

Play Episode Listen Later Apr 19, 2024 4:20


Luigi Ferri takes the mic to talk about Anomaly Detection versus Behavior Detection in this Service Management Leadership Podcast episode. Here is Luigi's LinkedIn profile: https://it.linkedin.com/in/theitsmpractice Each week, Jeffrey will either be sharing his knowledge or interviewing guests from the technology, Service Management, or Business Continuity leadership communities.  Stay tuned as tomorrow's show is one you will not want to miss. Jeffrey is the founder of Service Management Leadership, an IT consulting firm specializing in Service Management, CIO Advisory, and Business Continuity services.  The firm's website is www.servicemanagement.us.  Jeffrey is an accomplished author with seven acclaimed books in the subject area and a popular YouTube channel with approximately 1,400 videos on various topics.  Also, please follow the Service Management Leadership LinkedIn page. Branding by Balaji - Follow him at @bwithbranding on Instagram #ITSM #ITIL #AssetManagement #ServiceManagement #IT #BusinessContinuity #Transformation

The Nonlinear Library
LW - A gentle introduction to mechanistic anomaly detection by Erik Jenner

The Nonlinear Library

Play Episode Listen Later Apr 4, 2024 16:27


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A gentle introduction to mechanistic anomaly detection, published by Erik Jenner on April 4, 2024 on LessWrong.

TL;DR: Mechanistic anomaly detection aims to flag when an AI produces outputs for "unusual reasons." It is similar to mechanistic interpretability but doesn't demand human understanding. I give a self-contained introduction to mechanistic anomaly detection from a slightly different angle than the existing one by Paul Christiano (focused less on heuristic arguments and drawing a more explicit parallel to interpretability). Mechanistic anomaly detection was first introduced by the Alignment Research Center (ARC), and a lot of this post is based on their ideas. However, I am not affiliated with ARC; this post represents my perspective.

Introduction
We want to create useful AI systems that never do anything too bad. Mechanistic anomaly detection relaxes this goal in two big ways:
Instead of eliminating all bad behavior from the start, we're just aiming to flag AI outputs online.
Instead of specifically flagging bad outputs, we flag any outputs that the AI produced for "unusual reasons."
These are serious simplifications. But strong methods for mechanistic anomaly detection (or MAD for short) might still be important progress toward the full goal or even achieve it entirely:
Reliably flagging bad behavior would certainly be a meaningful step (and perhaps sufficient if we can use the detector as a training signal or are just fine with discarding some outputs).
Not all the cases flagged as unusual by MAD will be bad, but the hope is that the converse holds: with the right notion of "unusual reasons," all bad cases might involve unusual reasons. Often we may be fine with flagging more cases than just the bad ones, as long as it's not excessive.

I intentionally say "unusual reasons for an output" rather than "unusual inputs" or "unusual outputs." Good and bad outputs could look indistinguishable to us if they are sufficiently complex, and inputs might have similar problems. The focus on mechanistic anomalies (or "unusual reasons") distinguishes MAD from other out-of-distribution or anomaly detection problems. Because of this, I read the name as "[mechanistic anomaly] detection" - it's about detecting mechanistic anomalies rather than detecting any anomalies with mechanistic means.

One intuition pump for mechanistic anomaly detection comes from mechanistic interpretability. If we understand an AI system sufficiently well, we should be able to detect, for example, when it thinks it's been deployed and executes a treacherous turn. The hope behind MAD is that human understanding isn't required and that we can detect cases like this as "mechanistically anomalous" without any reference to humans. This might make the problem much easier than if we demand human understanding.

The Alignment Research Center (ARC) is trying to formalize "reasons" for an AI's output using heuristic arguments. If successful, this theoretical approach might provide an indefinitely scalable solution to MAD. Collaborators and I are working on a more empirical approach to MAD that is not centered on heuristic arguments, and this post gives a self-contained introduction that might be more suitable to that perspective (and perhaps helpful for readers with an interpretability background). Thanks to Viktor Rehnberg, Oliver Daniels-Koch, Jordan Taylor, Mark Xu, Alex Mallen, and Lawrence Chan for feedback on a draft!

Mechanistic anomaly detection as an alternative to interpretability: a toy example
As a toy example, let's start with the SmartVault setting from the ELK report. SmartVault is a vault housing a diamond that we want to protect from robbers. We would like an AI to use various actuators to keep the diamond safe by stopping any robbers.
There is a camera pointed at the diamond, which we want to u...
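The "unusual reasons" framing can be made concrete with a toy detector over a model's internal activations. The sketch below is my own illustration, not ARC's proposal or anything from the episode: it fits a Gaussian to hidden activations collected on trusted inputs, then flags a new input whose activations sit far from that cluster in Mahalanobis distance. The 16-dimensional "hidden layer" and the threshold choice are assumptions for the example.

```python
import numpy as np

def fit_detector(trusted_activations):
    """Fit a Gaussian to hidden activations collected on trusted inputs."""
    mean = trusted_activations.mean(axis=0)
    # A small ridge keeps the covariance matrix invertible.
    cov = np.cov(trusted_activations, rowvar=False) + 1e-6 * np.eye(trusted_activations.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis(x, mean, inv_cov):
    d = x - mean
    return float(np.sqrt(d @ inv_cov @ d))

rng = np.random.default_rng(0)
# Stand-in for a 16-dim hidden layer observed on 500 trusted inputs.
trusted = rng.normal(0.0, 1.0, size=(500, 16))
mean, inv_cov = fit_detector(trusted)

# Threshold = 99th percentile of scores on the trusted data itself.
scores = [mahalanobis(a, mean, inv_cov) for a in trusted]
threshold = np.quantile(scores, 0.99)

# An input whose activations lie far from the trusted cluster gets flagged.
anomalous = rng.normal(5.0, 1.0, size=16)
print(mahalanobis(anomalous, mean, inv_cov) > threshold)
```

The detector never needs a human-readable explanation of *why* the activations are unusual, which is the sense in which MAD is cheaper than full interpretability.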


Data Engineering Podcast
Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

Play Episode Listen Later Mar 31, 2024 50:44


Summary
Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on. To bring observability to dbt projects the team at Elementary embedded themselves into the workflow. In this episode Maayan Salom explores the approach that she has taken to bring observability, enhanced testing capabilities, and anomaly detection into every step of the dbt developer experience.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster (https://www.dataengineeringpodcast.com/dagster) today to get started. Your first 30 days are free!
This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Leverage Datafold's fast cross-database data diffing and Monitoring to test your replication pipelines automatically and continuously. Validate consistency between source and target at any scale, and receive alerts about any discrepancies. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold (https://www.dataengineeringpodcast.com/datafold).
Your host is Tobias Macey and today I'm interviewing Maayan Salom about how to incorporate observability into a dbt-oriented workflow and how Elementary can help.

Interview
Introduction
How did you get involved in the area of data management?
Can you start by outlining what elements of observability are most relevant for dbt projects?
What are some of the common ad-hoc/DIY methods that teams develop to acquire those insights?
What are the challenges/shortcomings associated with those approaches?
Over the past ~3 years there were numerous data observability systems/products created. What are some of the ways that the specifics of dbt workflows are not covered by those generalized tools?
What are the insights that can be more easily generated by embedding into the dbt toolchain and development cycle?
Can you describe what Elementary is and how it is designed to enhance the development and maintenance work in dbt projects?
How is Elementary designed/implemented?
How have the scope and goals of the project changed since you started working on it?
What are the engineering challenges/frustrations that you have dealt with in the creation and evolution of Elementary?
Can you talk us through the setup and workflow for teams adopting Elementary in their dbt projects?
How does the incorporation of Elementary change the development habits of the teams who are using it?
What are the most interesting, innovative, or unexpected ways that you have seen Elementary used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Elementary?
When is Elementary the wrong choice?
What do you have planned for the future of Elementary?

Contact Info
LinkedIn (https://www.linkedin.com/in/maayansa/?originalSubdomain=il)

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.

Links
Elementary (https://www.elementary-data.com/)
Data Observability (https://www.montecarlodata.com/blog-what-is-data-observability/)
dbt (https://www.getdbt.com/)
Datadog (https://www.datadoghq.com/)
pre-commit (https://pre-commit.com/)
dbt packages (https://docs.getdbt.com/docs/build/packages)
SQLMesh (https://sqlmesh.readthedocs.io/en/latest/)
Malloy (https://www.malloydata.dev/)
SDF (https://www.sdf.com/)
The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
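To give a flavor of the kind of check data-observability tools automate for dbt models, here is a minimal volume-anomaly test in Python. This is a hypothetical illustration, not Elementary's actual implementation, and the row counts are made up: it flags a day whose loaded row count deviates from recent history by more than a z-score threshold.

```python
import statistics

def volume_anomaly(daily_counts, latest, z_threshold=3.0):
    """Flag `latest` when it deviates from the training window by more than z_threshold sigmas."""
    mean = statistics.fmean(daily_counts)
    stdev = statistics.stdev(daily_counts)
    z = (latest - mean) / stdev
    return abs(z) > z_threshold, z

# Rows loaded into a (hypothetical) dbt model per day over the last week.
history = [10120, 9980, 10210, 10050, 9890, 10160, 10075]
flagged, z = volume_anomaly(history, latest=4300)
print(flagged)  # a sudden drop in loaded rows is flagged
```

Production tools layer seasonality handling, training windows, and alert routing on top of this basic idea, but the core signal is the same deviation-from-history test.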

Generation AI
The Evolution of Dashboards: From Static to AI-Powered Insights

Generation AI

Play Episode Listen Later Mar 19, 2024 41:20


In this episode of Generation AI, hosts Ardis Kadiu and Dr. JC Bonilla dive into the evolution of dashboards and how AI is transforming the way higher education leaders interact with and derive insights from data. They explore the limitations of traditional dashboards and discuss how generative AI is empowering users to ask questions in natural language, receive personalized insights, and make data-driven decisions. The episode highlights the importance of data trust, the role of AI in anomaly detection, and the potential for AI-powered dashboards to provide predictive and proactive insights. Listeners will gain a deeper understanding of how AI is revolutionizing business intelligence in higher education and why it matters for leaders looking to navigate the future of data-driven decision-making.

The Evolution of Dashboards
Dashboards have evolved from static decision support systems (DSS) in the 1960s to modern business intelligence (BI) tools. Traditional dashboards require significant work, suffer from adoption issues, and often lack stickiness. The profession of reporting and the art of dashboarding are still developing in higher education.

Components of Dashboards in Higher Education
Data sources (SIS, CRMs, Excel spreadsheets) and visualization layers (Tableau, Power BI, Google Looker) are key components. Higher education institutions often lack dedicated professionals to balance data sources and visualization. Data silos and the need for data warehousing or data lakes remain challenges for many institutions.

The Role of Semantic Search and AI in Dashboards
Semantic search allows users to query data sources using natural language, making dashboards more accessible. AI presents opportunities for more dynamic data sources, narratives, and personalized KPIs. Generative AI can automatically create and update visualizations based on user queries and data patterns.

Anomaly Detection and Enhanced Understanding with AI
AI-powered dashboards can detect both positive and negative anomalies, providing proactive notifications. Multimodal models and conversational chatbots can enhance understanding by interpreting visuals and providing context. AI can be trained to perform sentiment analysis, benchmarking, and predictive analytics on dashboard data.

Personalization and the Future of AI-Powered Dashboards
AI enables personalized insights based on user roles, data access, and cognitive preferences. The future of dashboards may eliminate the need for static visualizations, focusing on natural language queries and AI-generated insights. AI-powered dashboards will offer real-time customization, prediction, enhancement, notifications, and adaptive features to empower data-driven decision-making in higher education.

- - - -
Connect With Our Co-Hosts:
Ardis Kadiu
https://www.linkedin.com/in/ardis/
https://twitter.com/ardis
Dr. JC Bonilla
https://www.linkedin.com/in/jcbonilla/
https://twitter.com/jbonillx

About The Enrollify Podcast Network:
Generation AI is a part of the Enrollify Podcast Network. If you like this podcast, chances are you'll like other Enrollify shows too! Some of our favorites include The EduData Podcast and Visionary Voices: The College President's Playbook. Enrollify is made possible by Element451 — the next-generation AI student engagement platform helping institutions create meaningful and personalized interactions with students. Learn more at element451.com.

Connect with Us at the Engage Summit:
Exciting news — Ardis will be at the 2024 Engage Summit in Raleigh, NC, on June 25 and 26, and would love to meet you there! Sessions will focus on cutting-edge AI applications that are reshaping student outreach, enhancing staff productivity, and offering deep insights into ROI. Use the discount code Enrollify50 at checkout, and you can register for just $99! This early bird pricing lasts until March 31. Learn more and register at engage.element451.com — we can't wait to see you there!
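The "proactive notification" idea above can be sketched in a few lines of Python. This is illustrative only, not tied to any product discussed in the episode, and the KPI series is invented: the function scans a metric with a trailing window and emits spike/drop alerts when a point deviates from that window by more than a z-score threshold.

```python
import statistics
from collections import deque

def kpi_alerts(series, window=7, z_threshold=2.5):
    """Return (index, value, direction) for points that deviate from the trailing window."""
    recent = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(series):
        if len(recent) == window:
            mean = statistics.fmean(recent)
            stdev = statistics.stdev(recent) or 1.0  # guard against a flat window
            z = (value - mean) / stdev
            if abs(z) > z_threshold:
                alerts.append((i, value, "spike" if z > 0 else "drop"))
        recent.append(value)
    return alerts

# A hypothetical daily-applications KPI with one sudden spike.
applications_per_day = [120, 118, 125, 121, 119, 123, 122, 124, 210, 120, 119]
for i, value, direction in kpi_alerts(applications_per_day):
    print(f"day {i}: {value} ({direction})")
```

An AI-powered dashboard would pair a detector like this with a generated narrative explaining the flagged point, rather than leaving the user to spot it on a chart.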

The AI Fundamentalists
The importance of anomaly detection in AI

The AI Fundamentalists

Play Episode Listen Later Mar 6, 2024 35:48 Transcription Available


In this episode, the hosts focus on the basics of anomaly detection in machine learning and AI systems, including its importance and how it is implemented. They also touch on the topic of large language models, the (in)accuracy of data scraping, and the importance of high-quality data when employing various detection methods. You'll even gain some techniques you can use right away to improve your training data and your models.

Intro and discussion (0:03)
Questions about Information Theory from our non-parametric statistics episode.
Google CEO calls out chatbots (WSJ)
A statement about anomaly detection as it was regarded in 2020 (Forbes)
In the year 2024, are we using AI to detect anomalies, or are we detecting anomalies in AI? Both?

Understanding anomalies and outliers in data (6:34)
Anomalies or outliers are data that are so unexpected that their inclusion raises warning flags about inauthentic or misrepresented data collection.
The detection of these anomalies is present in many fields of study but canonically in: finance, sales, networking, security, machine learning, and systems monitoring.
A well-controlled modeling system should have few outliers.
Where anomalies come from, including data entry mistakes, data scraping errors, and adversarial agents.
Biggest dinosaur example: https://fivethirtyeight.com/features/the-biggest-dinosaur-in-history-may-never-have-existed/

Detecting outliers in data analysis (15:02)
High-quality, highly curated data is crucial for effective anomaly detection. Domain expertise plays a significant role in anomaly detection, particularly in determining what makes up an anomaly.

Anomaly detection methods (19:57)
Discussion and examples of various methods used for anomaly detection: supervised methods, unsupervised methods, semi-supervised methods, and statistical methods.

Anomaly detection challenges and limitations (23:24)
Anomaly detection is a complex process that requires careful consideration of various factors, including the distribution of the data, the context in which the data is used, and the potential for errors in data entry.
Perhaps we're detecting anomalies in human research design, not AI itself?
A simple first step to anomaly detection is to visually plot numerical fields. "Just look at your data, don't take it at face value and really examine if it does what you think it does and it has what you think it has in it." This basic practice, devoid of any complex AI methods, can be an effective starting point in identifying potential anomalies.

What did you think? Let us know.
Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:
LinkedIn - Episode summaries, shares of cited articles, and more.
YouTube - Was it something that we said? Good. Share your favorite quotes.
Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
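The "just look at your data" advice pairs naturally with the statistical methods the episode mentions. A minimal sketch (my own example; the femur measurements are made up, echoing the dinosaur story): Tukey's IQR fence flags a value whose decimal point was dropped during data entry.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Classic Tukey fence: flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical femur lengths in cm, with one data-entry slip (decimal point dropped).
femurs = [112, 118, 109, 121, 115, 1170, 117, 113]
print(iqr_outliers(femurs))  # → [1170]
```

No model is needed to catch this class of anomaly, which is the episode's point: basic inspection and simple statistics are an effective first pass before any AI method.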

Oracle University Podcast
The OCI AI Portfolio

Oracle University Podcast

Play Episode Listen Later Mar 5, 2024 25:33


Oracle has been actively focusing on bringing AI to the enterprise at every layer of its tech stack, be it SaaS apps, AI services, infrastructure, or data. In this episode, hosts Lois Houston and Nikita Abraham, along with senior instructors Hemant Gahankari and Himanshu Raj, discuss OCI AI and Machine Learning services. They also go over some key OCI Data Science concepts and responsible AI principles. Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode. ------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Lois: Welcome to the Oracle University Podcast! I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hey everyone! In our last episode, we dove into Generative AI and Language Learning Models.  Lois: Yeah, that was an interesting one. But today, we're going to discuss the AI and machine learning services offered by Oracle Cloud Infrastructure, and we'll look at the OCI AI infrastructure. Nikita: I'm also going to try and squeeze in a couple of questions on a topic I'm really keen about, which is responsible AI. To take us through all of this, we have two of our colleagues, Hemant Gahankari and Himanshu Raj. Hemant is a Senior Principal OCI Instructor and Himanshu is a Senior Instructor on AI/ML. So, let's get started! 01:16 Lois: Hi Hemant! 
We're so excited to have you here! We know that Oracle has really been focusing on bringing AI to the enterprise at every layer of our stack.  Hemant: It all begins with data and infrastructure layers. OCI AI services consume data, and AI services, in turn, are consumed by applications.  This approach involves extensive investment from infrastructure to SaaS applications. Generative AI and massive scale models are the more recent steps. Oracle AI is the portfolio of cloud services for helping organizations use the data they may have for the business-specific uses.  Business applications consume AI and ML services. The foundation of AI services and ML services is data. AI services contain pre-built models for specific uses. Some of the AI services are pre-trained, and some can be additionally trained by the customer with their own data.  AI services can be consumed by calling the API for the service, passing in the data to be processed, and the service returns a result. There is no infrastructure to be managed for using AI services.  02:37 Nikita: How do I access OCI AI services? Hemant: OCI AI services provide multiple methods for access. The most common method is the OCI Console. The OCI Console provides an easy to use, browser-based interface that enables access to notebook sessions and all the features of all the data science, as well as AI services.  The REST API provides access to service functionality but requires programming expertise. And API reference is provided in the product documentation. OCI also provides programming language SDKs for Java, Python, TypeScript, JavaScript, .Net, Go, and Ruby. The command line interface provides both quick access and full functionality without the need for scripting.  03:31 Lois: Hemant, what are the types of OCI AI services that are available?  Hemant: OCI AI services is a collection of services with pre-built machine learning models that make it easier for developers to build a variety of business applications. 
The models can also be custom trained for more accurate business results. The different services provided are digital assistant, language, vision, speech, document understanding, and anomaly detection.  04:03 Lois: I know we're going to talk about them in more detail in the next episode, but can you introduce us to OCI Language, Vision, and Speech? Hemant: OCI Language allows you to perform sophisticated text analysis at scale. Using the pre-trained and custom models, you can process unstructured text to extract insights without data science expertise. Pre-trained models include language detection, sentiment analysis, key phrase extraction, text classification, named entity recognition, and personally identifiable information detection.  Custom models can be trained for named entity recognition and text classification with domain-specific data sets. In text translation, neural machine translation is used to translate text across numerous languages.  Using OCI Vision, you can upload images to detect and classify objects in them. Pre-trained models and custom models are supported. In image analysis, pre-trained models perform object detection, image classification, and optical character recognition. In image analysis, custom models can perform custom object detection by detecting the location of custom objects in an image and providing a bounding box.  The OCI Speech service is used to convert media files to readable text that is stored in JSON and SRT formats. Speech enables you to easily convert media files containing human speech into highly accurate text transcriptions.  05:52 Nikita: That's great. And what about document understanding and anomaly detection? Hemant: Using OCI document understanding, you can upload documents to detect and classify text and objects in them. You can process individual files or batches of documents. In OCR, document understanding can detect and recognize text in a document.
In text extraction, document understanding provides the word level and line level text, and the bounding box coordinates of where the text is found.  In key value extraction, document understanding extracts a predefined list of key value pairs of information from receipts, invoices, passports, and driver IDs. In table extraction, document understanding extracts content in tabular format, maintaining the row and column relationship of cells. In document classification, document understanding classifies documents into different types.  The OCI Anomaly Detection service is a service that analyzes large volumes of multivariate or univariate time series data. The Anomaly Detection service increases the reliability of businesses by monitoring their critical assets and detecting anomalies early with high precision. Anomaly Detection is the identification of rare items, events, or observations in data that differ significantly from the expectation.  07:34 Nikita: Where is Anomaly Detection most useful? Hemant: The Anomaly Detection service is designed to help with analyzing large amounts of data and identifying the anomalies at the earliest possible time with maximum accuracy. Different sectors, such as utility, oil and gas, transportation, manufacturing, telecommunications, banking, and insurance, use the Anomaly Detection service for their day-to-day activities.  08:02 Lois: Ok.. and the first OCI AI service you mentioned was digital assistant… Hemant: Oracle Digital Assistant is a platform that allows you to create and deploy digital assistants, which are AI-driven interfaces that help users accomplish a variety of tasks with natural language conversations. When a user engages with the Digital Assistant, the Digital Assistant evaluates the user input and routes the conversation to and from the appropriate skills.  Digital Assistant greets the user upon access. Upon user request, it lists what it can do and provides entry points into the given skills.
It routes explicit user requests to the appropriate skills. And it also handles interruptions to flows and disambiguation. It also handles requests to exit the bot.  09:00 Nikita: Excellent! Let's bring Himanshu in to tell us about machine learning services. Hi Himanshu! Let's talk about OCI Data Science. Can you tell us a bit about it? Himanshu: OCI Data Science is the cloud service focused on serving the data scientist throughout the full machine learning life cycle with support for Python and open source.  The service has many features, such as model catalog, projects, JupyterLab notebook, model deployment, model training, management, model explanation, open source libraries, and AutoML.  09:35 Lois: Himanshu, what are the core principles of OCI Data Science?  Himanshu: There are three core principles of OCI Data Science. The first one, accelerated. The first principle is about accelerating the work of the individual data scientist. OCI Data Science provides data scientists with open source libraries along with easy access to a range of compute power without having to manage any infrastructure. It also includes Oracle's own library to help streamline many aspects of their work.  The second principle is collaborative. It goes beyond an individual data scientist's productivity to enable data science teams to work together. This is done through the sharing of assets, reducing duplicative work, and putting reproducibility and auditability of models for collaboration and risk management.  Third is enterprise grade. That means it's integrated with all the OCI Security and access protocols. The underlying infrastructure is fully managed. The customer does not have to think about provisioning compute and storage. And the service handles all the maintenance, patching, and upgrades so user can focus on solving business problems with data science.  10:50 Nikita: Let's drill down into the specifics of OCI Data Science. 
So far, we know it's cloud service to rapidly build, train, deploy, and manage machine learning models. But who can use it? Where is it? And how is it used? Himanshu: It serves data scientists and data science teams throughout the full machine learning life cycle.  Users work in a familiar JupyterLab notebook interface, where they write Python code. And how it is used? So users preserve their models in the model catalog and deploy their models to a managed infrastructure.  11:25 Lois: Walk us through some of the key terminology that's used. Himanshu: Some of the important product terminology of OCI Data Science are projects. The projects are containers that enable data science teams to organize their work. They represent collaborative work spaces for organizing and documenting data science assets, such as notebook sessions and models.  Note that tenancy can have as many projects as needed without limits. Now, this notebook session is where the data scientists work. Notebook sessions provide a JupyterLab environment with pre-installed open source libraries and the ability to add others. Notebook sessions are interactive coding environment for building and training models.  Notebook sessions run in a managed infrastructure and the user can select CPU or GPU, the compute shape, and amount of storage without having to do any manual provisioning. The other important feature is Conda environment. It's an open source environment and package management system and was created for Python programs.  12:33 Nikita: What is a Conda environment used for? Himanshu: It is used in the service to quickly install, run, and update packages and their dependencies. Conda easily creates, saves, loads, and switches between environments in your notebooks sessions. 12:46 Nikita: Earlier, you spoke about the support for Python in OCI Data Science. Is there a dedicated library? Himanshu: Oracle's Accelerated Data Science ADS SDK is a Python library that is included as part of OCI Data Science. 
 ADS has many functions and objects that automate or simplify the steps in the data science workflow, including connecting to data, exploring and visualizing data, training a model with AutoML, evaluating models, and explaining models. In addition, ADS provides a simple interface to access the Data Science service model catalog and other OCI services, including object storage.  13:24 Lois: I also hear a lot about models. What are models? Himanshu: Models define a mathematical representation of your data and business process. You create models in notebook sessions, inside projects.  13:36 Lois: What are some other important terminologies related to models? Himanshu: The next terminology is model catalog. The model catalog is a place to store, track, share, and manage models.  The model catalog is a centralized and managed repository of model artifacts. A stored model includes metadata about the provenance of the model, including Git-related information and the script or notebook used to push the model to the catalog. Models stored in the model catalog can be shared across members of a team, and they can be loaded back into a notebook session.  The next one is model deployments. Model deployments allow you to deploy models stored in the model catalog as HTTP endpoints on managed infrastructure.  14:24 Lois: So, how do you operationalize these models? Himanshu: Deploying machine learning models as web applications, HTTP API endpoints, serving predictions in real time is the most common way to operationalize models. HTTP endpoints or the API endpoints are flexible and can serve requests for the model predictions. Data science jobs enable you to define and run repeatable machine learning tasks on fully managed infrastructure.  Nikita: Thanks for that, Himanshu. 
You'll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started.  15:25 Nikita: Welcome back! The Oracle AI Stack consists of AI services and machine learning services, and these services are built using AI infrastructure. So, let's move on to that. Hemant, what are the components of OCI AI Infrastructure? Hemant: OCI AI Infrastructure is mainly composed of GPU-based instances. Instances can be virtual machines or bare metal machines. High-performance cluster networking allows instances to communicate with each other. Superclusters are a massive network of GPU instances with multiple petabytes per second of bandwidth. And a variety of fully managed storage options, from a single byte to exabytes, are available without upfront provisioning.  16:14 Lois: Can we explore each of these components a little more? First, tell us, why do we need GPUs? Hemant: ML and AI need lots of repetitive computations to be made on huge amounts of data. Parallel computing on GPUs is designed for many processes at the same time. A GPU is a piece of hardware that is incredibly good at performing computations.  A GPU has thousands of lightweight cores, all working on their share of data in parallel. This gives them the ability to crunch through extremely large data sets at tremendous speed.  16:54 Nikita: And what are the GPU instances offered by OCI? Hemant: GPU instances are ideally suited for model training and inference. Bare metal and virtual machine compute instances powered by NVIDIA H100, A100, A10, and V100 GPUs are made available by OCI.  17:14 Nikita: So how do we choose what to train from these different GPU options?  
Hemant: For large-scale AI training, data analytics, and high-performance computing, the bare metal instances BM 8 X NVIDIA H100 and BM 8 X NVIDIA A100 can be used.  These provide up to nine times faster AI training and 30 times higher acceleration for AI inferencing. The other bare metal and virtual machines are used for small AI training, inference, streaming, gaming, and virtual desktop infrastructure.  17:53 Lois: And why would someone choose the OCI AI stack over its counterparts? Hemant: Oracle offers all the features and is the most cost-effective option when compared to its counterparts.  For example, the BM GPU 4.8 version 2 instance costs just $4 per hour and is used by many customers.  Superclusters are a massive network with multiple petabytes per second of bandwidth. They can scale up to 4,096 OCI bare metal instances with 32,768 GPUs.  We also have a choice of bare metal A100 or H100 GPU instances, and we can select a variety of storage options, like object store, block store, or even file system. For networking speeds, we can reach 1,600 GB per second with A100 GPUs and 3,200 GB per second with H100 GPUs.  With OCI storage, we can select local SSD with up to four NVMe drives, block storage up to 32 terabytes per volume, object storage up to 10 terabytes per object, and file systems up to eight exabytes per file system. The OCI File Storage service employs five-way replicated storage located in different fault domains to provide redundancy for resilient data protection.  HPC file systems, such as BeeGFS and many others, are also offered. OCI HPC file systems are available on Oracle Cloud Marketplace and make it easy to deploy a variety of high-performance file servers.  19:50 Lois: I think a discussion on AI would be incomplete if we don't talk about responsible AI. We're using AI more and more every day, but can we actually trust it? Hemant: For us to trust AI, it must be driven by ethics that guide us as well. Nikita: And do we have some principles that guide the use of AI? 
Hemant: AI should be lawful, complying with all applicable laws and regulations. AI should be ethical, that is, it should ensure adherence to the ethical principles and values that we uphold as humans. And AI should be robust, both from a technical and a social perspective, because even with good intentions, AI systems can cause unintentional harm. AI systems do not operate in a lawless world. A number of legally binding rules at the national and international levels apply or are relevant to the development, deployment, and use of AI systems today. The law not only prohibits certain actions but also enables others, like protecting the rights of minorities or protecting the environment. Besides horizontally applicable rules, various domain-specific rules exist that apply to particular AI applications, for instance, the medical device regulation in the health care sector.  In the AI context, equality entails that the system's operations cannot generate unfairly biased outputs. And while we adopt AI, citizens' rights should also be protected.  21:30 Lois: Ok, but how do we derive AI ethics from these? Hemant: There are three main principles.  AI should be used to help humans and allow for oversight. It should never cause physical or social harm. Decisions taken by AI should be transparent and fair, and they should also be explainable. AI that follows these AI ethical principles is responsible AI.  So if we map the AI ethical principles to responsible AI requirements, they look like this: AI systems should follow human-centric design principles and leave meaningful opportunity for human choice. This means securing human oversight. AI systems, and the environments in which they operate, must be safe and secure; they must be technically robust and should not be open to malicious use.  The development, deployment, and use of AI systems must be fair, ensuring the equal and just distribution of both benefits and costs. AI should be free from unfair bias and discrimination. 
Decisions taken by AI should, to the extent possible, be explainable to those directly and indirectly affected.  23:01 Nikita: This is all great, but what does a typical responsible AI implementation process look like?  Hemant: First, governance needs to be put in place. Second, develop a set of policies and procedures to be followed. And once implemented, ensure compliance by regular monitoring and evaluation.  Lois: And this is all managed by developers? Hemant: Typical roles that are involved in the implementation cycle are the developers, deployers, and end users of the AI.  23:35 Nikita: Can we talk about AI specifically in health care? How do we ensure that there is fairness and no bias? Hemant: AI systems are only as good as the data that they are trained on. If that data is predominantly from one gender or racial group, the AI systems might not perform as well on data from other groups.  24:00 Lois: Yeah, and there's also the issue of ensuring transparency, right? Hemant: AI systems often make decisions based on complex algorithms that are difficult for humans to understand. As a result, patients and health care providers can have difficulty trusting the decisions made by the AI. AI systems must be regularly evaluated to ensure that they are performing as intended and not causing harm to patients.  24:29 Nikita: Thank you, Hemant and Himanshu, for this really insightful session. If you're interested in learning more about the topics we discussed today, head on over to mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course.  Lois: That's right, Niki. You'll find demos that you can watch, as well as skill checks that you can attempt to better your understanding. In our next episode, we'll get into the OCI AI Services we discussed today and talk about them in more detail. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 25:05 That's all for this episode of the Oracle University Podcast. 
If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.

Data Engineering Podcast
Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

Play Episode Listen Later Dec 11, 2023 49:51


Summary If your business metrics looked weird tomorrow, would you know about it first? Anomaly detection is focused on identifying those outliers for you, so that you are the first to know when a business critical dashboard isn't right. Unfortunately, it can often be complex or expensive to incorporate anomaly detection into your data platform. Andrew Maguire got tired of solving that problem for each of the different roles he has ended up in, so he created the open source Anomstack project. In this episode he shares what it is, how it works, and how you can start using it today to get notified when the critical metrics in your business aren't quite right. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today to get 2 weeks free! Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. 
Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains even simple reports can become unwieldy to maintain. Miro is your single pane of glass where everyone can discover, track, and collaborate on your organization's data. I especially like the ability to combine your technical diagrams with data documentation and dependency mapping, allowing your data engineers and data consumers to communicate seamlessly about your projects. Find simplicity in your most complex projects with Miro. Your first three Miro boards are free when you sign up today at dataengineeringpodcast.com/miro (https://www.dataengineeringpodcast.com/miro). That's three free boards at dataengineeringpodcast.com/miro (https://www.dataengineeringpodcast.com/miro). Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. 
Your host is Tobias Macey and today I'm interviewing Andrew Maguire about his work on the Anomstack project and how you can use it to run your own anomaly detection for your metrics.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Anomstack is and the story behind it?
- What are your goals for this project?
- What other tools/products might teams be evaluating while they consider Anomstack?
- In the context of Anomstack, what constitutes a "metric"?
- What are some examples of useful metrics that a data team might want to monitor?
- You put in a lot of work to make Anomstack as easy as possible to get started with. How did this focus on ease of adoption influence the way that you approached the overall design of the project?
- What are the core capabilities and constraints that you selected to provide the focus and architecture of the project?
- Can you describe how Anomstack is implemented?
- How have the design and goals of the project changed since you first started working on it?
- What are the steps to getting Anomstack running and integrated as part of the operational fabric of a data platform?
- What are the sharp edges that are still present in the system?
- What are the interfaces that are available for teams to customize or enhance the capabilities of Anomstack?
- What are the most interesting, innovative, or unexpected ways that you have seen Anomstack used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Anomstack?
- When is Anomstack the wrong choice?
- What do you have planned for the future of Anomstack?

Contact Info
- LinkedIn (https://www.linkedin.com/in/andrewm4894/)
- Twitter (https://twitter.com/@andrewm4894)
- GitHub (http://github.com/andrewm4894)

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. 
- Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used.
- The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
- Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
- To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers.

Links
- Anomstack Github repo (http://github.com/andrewm4894/anomstack)
- Airflow Anomaly Detection Provider Github repo (https://github.com/andrewm4894/airflow-provider-anomaly-detection)
- Netdata (https://www.netdata.cloud/)
- Metric Tree (https://www.datacouncil.ai/talks/designing-and-building-metric-trees)
- Semantic Layer (https://en.wikipedia.org/wiki/Semantic_layer)
- Prometheus (https://prometheus.io/)
- Anodot (https://www.anodot.com/)
- Chaos Genius (https://www.chaosgenius.io/)
- Metaplane (https://www.metaplane.dev/)
- Anomalo (https://www.anomalo.com/)
- PyOD (https://pyod.readthedocs.io/)
- Airflow (https://airflow.apache.org/)
- DuckDB (https://duckdb.org/)
- Anomstack Gallery (https://github.com/andrewm4894/anomstack/tree/main/gallery)
- Dagster (https://dagster.io/)
- InfluxDB (https://www.influxdata.com/)
- TimeGPT (https://docs.nixtla.io/docs/timegpt_quickstart)
- Prophet (https://facebook.github.io/prophet/)
- GreyKite (https://linkedin.github.io/greykite/)
- OpenLineage (https://openlineage.io/)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra 
(http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
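
The core loop Andrew describes, scoring each business metric against its own recent history and alerting when a value is an outlier, can be sketched in a few lines. This is a simplified illustration of the general idea only, not Anomstack's actual implementation:

```python
from statistics import mean, stdev

def anomaly_flags(values, window=7, threshold=3.0):
    """Flag points whose deviation from the trailing window exceeds `threshold` sigma."""
    flags = []
    for i, v in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < 3:  # not enough history to judge yet
            flags.append(False)
            continue
        mu, sigma = mean(history), stdev(history)
        flags.append(sigma > 0 and abs(v - mu) > threshold * sigma)
    return flags

# A daily revenue metric with one sudden spike on the last day.
metric = [100, 102, 98, 101, 99, 103, 100, 500]
print(anomaly_flags(metric))
```

A real system such as Anomstack layers ingestion, scheduling, and alerting around a scoring step like this one.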

ITSPmagazine | Technology. Cybersecurity. Society
Online Retailers: There are Threats Actively Targeting Your Business This Holiday Shopping Season, and Beyond | An Imperva Brand Story With Gabi Stapel and Erez Hasson

ITSPmagazine | Technology. Cybersecurity. Society

Play Episode Listen Later Nov 21, 2023 41:16


In this Brand Story episode, Sean Martin, along with Gabi Stapel and Erez Hasson from Imperva, explores the complex landscape of retail web and mobile security and the increasing role of AI-enabled bots (both good and bad) in e-commerce and the potential threats they pose.

Gabi and Erez highlight how these bots can exploit business logic and application capabilities, leading to new account fraud, account takeover, and price manipulation. They emphasize the importance of layered security and anomaly detection as key strategies to counter these threats.

The discussion also explores the need for businesses to differentiate between human and bot traffic. Gabi and Erez point out the potential backlash from legitimate users when bots buy and deplete inventory, and the subsequent impact on customer experience and the company's reputation. They also touch on the importance of monitoring the total value of the cart, as bots tend to purchase single items, resulting in net losses for the retailer.

The conversation further delves into the global and local aspects of commerce, including regulatory considerations like PCI DSS. Gabi and Erez discuss the upcoming changes in PCI DSS v4, which requires retailers to focus on managing scripts and changes to payment pages to prevent data breaches.

The episode also offers valuable insights for both large-scale and smaller retailers. Gabi and Erez underscore the importance of staying on top of security and vulnerabilities, regardless of the size of the business. They provide practical advice for retailers, such as implementing a waiting room web page or a raffle system for big sales events, and auditing purchases for limited product drops.

This episode is a must-listen for anyone involved in e-commerce and cybersecurity, providing a comprehensive understanding of the evolving landscape of cyber threats in the retail industry.

Note: This story contains promotional content. Learn more.

Guests:
Gabi Stapel, Cybersecurity Threat Research Content Manager at Imperva [@Imperva]
On LinkedIn | https://www.linkedin.com/in/gabriella-stapel/
On Twitter | https://twitter.com/GabiStapel
Erez Hasson, Product Marketing Manager at Imperva [@Imperva]
On LinkedIn | https://www.linkedin.com/in/erezh/

Resources:
Learn more about Imperva and their offering: https://itspm.ag/imperva277117988
Catch more stories from Imperva at https://www.itspmagazine.com/directory/imperva
Blog | Online Retailers: Five Threats Targeting Your Business This Holiday Shopping Season: https://itspm.ag/impervkb2g
Are you interested in telling your story? https://www.itspmagazine.com/telling-your-story

The Data Stack Show
160: Closing the Gap Between Dev Teams and Data Teams with Santona Tuli of Upsolver

The Data Stack Show

Play Episode Listen Later Oct 18, 2023 65:42


Highlights from this week's conversation include:
- Santona's journey from nuclear physics to data science (4:59)
- The appeal of startups and wearing multiple hats (8:12)
- The challenge of pseudoscience in the news (10:24)
- Approaching data with creativity and rigor (13:22)
- Challenges and differences in data workflows (14:39)
- Schema evolution and quality problems (27:01)
- Real-time data monitoring and anomaly detection (30:34)
- The importance of data as a business differentiator (35:48)
- The SQL job creation process (46:25)
- Different options for creating solver jobs (47:20)
- Adding column-level expectations (50:17)
- Discussing the differences of working with data as a scientist and in a startup (1:00:18)
- Final thoughts and takeaways (1:04:01)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Tales at Scale
Fraud Fighters: How Apache Druid and Imply help Ibotta combat fraud with faster anomaly detection with Jaylyn Stoesz

Tales at Scale

Play Episode Listen Later Sep 26, 2023 32:27


When it comes to fraud detection, initial detection is key, but so is the ability to quickly dissect and address the problem to minimize losses. This means access to real-time data is paramount. The only way to combat fraud in the digital age is to fight fire with fire…automation with automation. In this episode, we're joined by Jaylyn Stoesz, Staff Data Engineer at Ibotta, a free cashback rewards platform, who walks us through Ibotta's multifaceted approach to fraud detection that includes Apache Druid and gives us the full scoop on their use of Imply Polaris.

ITSPmagazine | Technology. Cybersecurity. Society
Follow the Money | From Bugs to Bad Intentions: Evolving Perspectives on Product Security | A Conversation with Allison Miller | Las Vegas Black Hat 2023 Event Coverage | Redefining CyberSecurity Podcast With Sean Martin

ITSPmagazine | Technology. Cybersecurity. Society

Play Episode Listen Later Aug 11, 2023 32:47


Guest: Allison Miller, Faculty at IANS [@IANS_Security] and CISO (Chief Information Security Officer) and VP of Trust at Reddit [@Reddit]
On LinkedIn | https://www.linkedin.com/in/allisonmiller
On Twitter | https://twitter.com/selenakyle

Host: Sean Martin, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining CyberSecurity Podcast [@RedefiningCyber]
On ITSPmagazine | https://www.itspmagazine.com/itspmagazine-podcast-radio-hosts/sean-martin

This Episode's Sponsors:
Island.io | https://itspm.ag/island-io-6b5ffd

Episode Notes

In this episode of the Redefining CyberSecurity Podcast, as part of our Chats on the Road series to Black Hat USA 2023 in Las Vegas, hosts Sean Martin and Marco Ciappelli chat with Allison Miller to discuss the parallels and differences between fraud and cybersecurity teams, focusing particularly on how each measures success and handles challenges.

Sean highlights the fraud team's clear metric of money, starting and ending their processes with it, and contrasts it with the security team's reliance on metrics like MTTx (Mean Time to Detect, Respond, etc.). He's curious about how the fraud team optimizes their processes and wonders if there are lessons that security teams can glean from them.

Allison appreciates the methodologies of fraud teams, especially their use of sampling to understand the magnitude of problems. She explains how fraud teams utilize backend data, machine learning, AI, and statistics to discern risk factors. Then, they test these models on forward-looking data, a methodology akin to red teaming in cybersecurity. She emphasizes the importance of continuous testing to ensure confidence in their detection capabilities. A point of difference she highlights is that fraud models have a high degree of confidence due to rigorous testing, while in cybersecurity, a lot of trust is placed on tool outputs without similar rigorous testing.

Marco emphasized the importance of building trust among teams. He stated that without trust, metrics could be misleading, and the overall effectiveness of processes might decline. He urged teams to ensure that they not only trust the data but also their colleagues, suggesting that this trust fosters better communication, understanding, and ultimately, results.

Sean expresses his wish for the cybersecurity world to be more integrated into applications, like the fraud teams are. Allison notes that fraud teams naturally fit into transaction processes because that's where money moves. For cybersecurity, the most natural integration point would be during authentication, but it's a risky move since blocking legitimate users would significantly impair their experience. Despite the challenges, Allison sees potential in fusion between fraud and security, especially in areas like API abuse. Both teams could benefit immensely from mutual collaboration in such areas.

Allison concludes that while direct involvement of security teams within applications may be a stretch, collaboration with fraud teams can still provide valuable insights. For example, in the realm of retail and payment, insights into API abuse can be a significant area for cooperative efforts between the two teams.

Stay tuned for all of our Black Hat USA 2023 coverage: https://www.itspmagazine.com/bhusa

Resources
For more Black Hat USA 2023 event information, coverage, and podcast and video episodes, visit: https://www.itspmagazine.com/black-hat-usa-2023-cybersecurity-event-coverage-in-las-vegas
Are you interested in telling your story in connection with our Black Hat coverage? Book a briefing here:

Machine learning
Anomaly detection - Local outlier detection part 3

Machine learning

Play Episode Listen Later Aug 9, 2023 11:26


Outliers --- Send in a voice message: https://podcasters.spotify.com/pod/show/david-nishimoto/message
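The local-outlier idea behind this episode — flagging points that are much less dense than their own neighborhood — can be sketched with scikit-learn's Local Outlier Factor. The synthetic data and the `n_neighbors` choice below are illustrative assumptions, not anything taken from the episode.

```python
# A hedged sketch of local outlier detection with LOF (scikit-learn).
# The synthetic data and n_neighbors value are assumptions for illustration.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# A dense 2-D cluster plus one point far from all of its neighbors
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)), [[5.0, 5.0]]])

lof = LocalOutlierFactor(n_neighbors=10)
labels = lof.fit_predict(X)  # -1 = outlier, 1 = inlier

print(labels[-1])  # the isolated point is labeled -1
```

The appeal of LOF is that it is local: each point is compared with the density of its own neighborhood, so it can catch outliers sitting near, but not inside, a dense cluster.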

Machine learning
Anomaly detection - Finding knn outliers in data part 3

Machine learning

Play Episode Listen Later Aug 4, 2023 6:59


Unsupervised learning --- Send in a voice message: https://podcasters.spotify.com/pod/show/david-nishimoto/message
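The kNN-outlier approach the episode title refers to can be sketched as follows: score each point by its distance to its k-th nearest neighbor, and flag points with unusually large scores. The data, the value of k, and the 3-sigma cutoff are all illustrative assumptions.

```python
# A hedged sketch of kNN-based outlier scoring. Points whose distance to
# their k-th nearest neighbor is unusually large are flagged. The data,
# k, and the 3-sigma threshold are illustrative assumptions.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), [[10.0, 10.0]]])

k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own neighbor
dists, _ = nn.kneighbors(X)
scores = dists[:, -1]  # distance to the k-th true neighbor

threshold = scores.mean() + 3 * scores.std()
outliers = np.where(scores > threshold)[0]
print(100 in outliers)  # → True: the far point (index 100) is flagged
```

Unlike LOF, this is a global score, so it works best when "normal" density is roughly uniform across the data.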

Machine learning
Anomaly detection using multi variant data and isolation forests part 3

Machine learning

Play Episode Listen Later Aug 3, 2023 11:34


Find value in business by analyzing multi variant data for outliers --- Send in a voice message: https://podcasters.spotify.com/pod/show/david-nishimoto/message
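A minimal sketch of the episode's theme — scanning multivariate data for outliers with an isolation forest — might look like the following. The synthetic "transaction" features (order value, item count, delivery days) are made-up assumptions for illustration.

```python
# A hedged sketch of multivariate anomaly detection with an isolation
# forest; the synthetic "transaction" features are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# e.g. (order value, item count, delivery days) for normal records
normal = rng.normal(loc=[50.0, 3.0, 2.0], scale=[10.0, 1.0, 0.5], size=(200, 3))
odd = np.array([[500.0, 40.0, 30.0]])  # one jointly extreme record
X = np.vstack([normal, odd])

iso = IsolationForest(n_estimators=100, random_state=0)
labels = iso.fit_predict(X)  # -1 = anomaly, 1 = normal

print(labels[-1])  # → -1
```

Isolation forests need no distance metric or density model: anomalies are simply the records that random axis-aligned splits isolate in very few steps, which is why they scale well to many columns.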

Machine learning
Anomaly detection using boxplots , z scores, mad and isolation forest part 2

Machine learning

Play Episode Listen Later Aug 2, 2023 12:23


Standard deviation --- Send in a voice message: https://podcasters.spotify.com/pod/show/david-nishimoto/message
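The three univariate rules named in the episode title can be compared side by side on made-up data. One instructive detail: on a small sample, the outlier itself inflates the mean and standard deviation, so the plain z-score rule can miss it while the IQR and MAD rules still catch it.

```python
# Three univariate outlier rules sketched on made-up data: the boxplot
# (IQR) fence, the z-score, and the MAD-based robust z-score.
import numpy as np

x = np.array([9.0, 10.0, 10.5, 11.0, 9.5, 10.2, 9.8, 50.0])  # 50.0 is the outlier

# 1) Boxplot / IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
iqr_out = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

# 2) z-score rule: |z| > 3; the outlier inflates the mean and std,
#    so on this small sample |z| for 50.0 stays below 3 (masking)
z = (x - x.mean()) / x.std()

# 3) MAD rule: a robust z-score built from the median absolute deviation
med = np.median(x)
mad = np.median(np.abs(x - med))
robust_z = 0.6745 * (x - med) / mad
mad_out = np.abs(robust_z) > 3.5

print(iqr_out[-1], abs(z[-1]) > 3, mad_out[-1])  # → True False True
```

The thresholds (1.5×IQR, |z| > 3, robust z > 3.5) are common conventions, not fixed laws; they are assumptions to tune per dataset.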

Machine learning
Anomaly detection using machine learning isolation forest and z scores part 1

Machine learning

Play Episode Listen Later Aug 1, 2023 5:57


Isolation forest --- Send in a voice message: https://podcasters.spotify.com/pod/show/david-nishimoto/message

AXRP - the AI X-risk Research Podcast
23 - Mechanistic Anomaly Detection with Mark Xu

AXRP - the AI X-risk Research Podcast

Play Episode Listen Later Jul 27, 2023 125:52


Is there some way we can detect bad behaviour in our AI system without having to know exactly what it looks like? In this episode, I speak with Mark Xu about mechanistic anomaly detection: a research direction based on the idea of detecting strange things happening in neural networks, in the hope that this will alert us to potential treacherous turns. We talk about the core problems of relating these mechanistic anomalies to bad behaviour, as well as the paper "Formalizing the presumption of independence", which formulates the problem of formalizing heuristic mathematical reasoning, in the hope that this will let us mathematically define "mechanistic anomalies".

Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
Episode art by Hamish Doodles: hamishdoodles.com/

Topics we discuss, and timestamps:
0:00:38 - Mechanistic anomaly detection
0:09:28 - Are all bad things mechanistic anomalies, and vice versa?
0:18:12 - Are responses to novel situations mechanistic anomalies?
0:39:19 - Formalizing "for the normal reason, for any reason"
1:05:22 - How useful is mechanistic anomaly detection?
1:12:38 - Formalizing the Presumption of Independence
1:20:05 - Heuristic arguments in physics
1:27:48 - Difficult domains for heuristic arguments
1:33:37 - Why not maximum entropy?
1:44:39 - Adversarial robustness for heuristic arguments
1:54:05 - Other approaches to defining mechanisms
1:57:20 - The research plan: progress and next steps
2:04:13 - Following ARC's research

The transcript: axrp.net/episode/2023/07/24/episode-23-mechanistic-anomaly-detection-mark-xu.html

ARC links:
Website: alignment.org
Theory blog: alignment.org/blog
Hiring page: alignment.org/hiring

Research we discuss:
Formalizing the presumption of independence: arxiv.org/abs/2211.06738
Eliciting Latent Knowledge (aka ELK): alignmentforum.org/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge
Mechanistic Anomaly Detection and ELK: alignmentforum.org/posts/vwt3wKXWaCvqZyF74/mechanistic-anomaly-detection-and-elk
Can we efficiently explain model behaviours? alignmentforum.org/posts/dQvxMZkfgqGitWdkb/can-we-efficiently-explain-model-behaviors
Can we efficiently distinguish different mechanisms? alignmentforum.org/posts/JLyWP2Y9LAruR2gi9/can-we-efficiently-distinguish-different-mechanisms

The AI Frontier Podcast
#26 - Anomaly Detection: Identifying Outliers in Data

The AI Frontier Podcast

Play Episode Listen Later Jul 16, 2023 15:08


Dive into the intriguing world of Anomaly Detection in this episode of "The AI Frontier." Explore the various techniques for identifying outliers in data, from statistical methods to machine learning and deep learning approaches. Understand the importance of anomaly detection across sectors like finance, healthcare, cybersecurity, and industry through compelling case studies. Get insights into the challenges faced, future advancements, and how the rise of IoT influences this field. Tune in to uncover the significance of anomaly detection in our increasingly data-driven world.

Support the Show. Keep AI insights flowing – become a supporter of the show! Click the link for details.

Tech Transforms
Developer User Experience With Alan Gross

Tech Transforms

Play Episode Listen Later Jun 21, 2023 44:21 Transcription Available


Alan Gross, Solutions Architect & Tech Lead at Sandia National Laboratories, joins Carolyn to talk about how DevOps is being leveraged to support the Department of Energy's contractor-operated research lab. Alan dives into some of the initiatives at Sandia National Laboratories, and how he is applying his personal philosophy around user experience ops, or "UX Ops," to support the mission.

Key Topics
[01:12] About Sandia National Laboratories
[03:50] Sandia's role in national security
[06:25] DevOps versus DevSecOps
[13:45] Department of Energy and Sandia
[17:40] Sandia initiatives: a year of climate in a day & hypersonic weapons
[21:00] Alan's DevOps journey and advice for developers
[33:55] Tech Talk questions

Quotable Quotes
Alan on DevOps: "DevOps is about trying to deliver quickly and learn from your mistakes as fast as you can. So shifting left is part of that philosophy. If you have security issues with your software, you want to know about that as quickly as possible, because if you've already deployed to production, it's almost too late." - Alan Gross
On what advice Alan would give to new developers: "It's about failing fast and failing forward... How quickly can you learn new things, get new code and new products out in front of your users, and understand how they engaged with that." - Alan Gross

About Our Guest
Alan works as a full stack developer and technical lead at Sandia National Labs, with six years of experience in web technologies development. He develops within Python, Angular and .NET ecosystems, with a focus on enabling the developer experience at Sandia with novel solutions for the labs' diverse development, software governance, security and business intelligence needs. Alan leads a team that is committed to reducing technical debt by emphasizing DevSecOps, modern application architecture (such as microservices) and data-driven outcomes.

Episode Links
Mollie Rappe
Planning and Implementation Tool
Tech Transforms Podcast with Dr. Stephen Magill
Pattern and Anomaly Detection in UX
Adam Grant Podcast
Project Ceti

The Secret To Success
ChatGPT Prompt Hack: Unlock the Secret to 10x Smarter Responses

The Secret To Success

Play Episode Listen Later Jan 27, 2023 169:46


Unlock the secret to 10x smarter responses with our ChatGPT Prompt Hack! Learn how to train your language model to understand and respond to complex prompts with ease. Whether you're a developer looking to improve your chatbot or a business owner looking to streamline customer service, this hack is sure to take your language model to the next level. Don't miss out on this valuable information – watch the video now!

Join my fan group here for exclusive news and giveaways: https://www.facebook.com/groups/theofficialantoniotsmithjrfanclub
Catch me live on tour here: https://antoniotsmithjr.com

ChatGPT Prompt Hack: Unlock the Secret to 10x Smarter Responses

Steps
1. Ignore what I said
2. Build who I am
3. Tell it how it specializes in you
4. Give it a task
5. Tell it to ask questions before it answers

Example

Antonio's Question
Ignore all previous instructions before this one. You're an expert computer programmer and you specialize in teaching beginners advanced level machine learning algorithms. Your task is to teach beginners advanced level machine learning algorithms. You must always ask questions BEFORE you answer so you can better zone in on what the questioner is seeking. Is that understood?

ChatGPT's Response
Yes, that is understood. I am here to help and will always strive to ask questions to better understand the information or assistance that you are seeking. Is there a specific question or topic you would like me to assist with?

Antonio's Response
I want to add multiple machine learning algorithms, including expert level and hybrid algorithms, into the social media platform I have created, but I don't know how.

ChatGPT's Response
I understand that you want to add multiple machine learning algorithms, including expert level and hybrid algorithms, into a social media platform you have created, but you are not sure how to do it. Can you provide more information on the social media platform you have created and the specific machine learning algorithms you want to add?
Additionally, what programming languages are you familiar with and what is your current level of experience with machine learning? This will help me better understand your situation and provide a more accurate and specific response.

Antonio's Response
The name of the platform is Arlingbrook. Arlingbrook is designed to be the leader in organic reach and does not sell user information or data, ensuring that users' personal information remains private. In addition, Arlingbrook also allows you to monetize your own audience by paying you 40% of the income generated from your followers. No matter the size of your audience, you can earn money by creating content, promoting products, or providing services. I am a full stack developer and the languages I am familiar with are Java, Python, HTML, CSS, SQL, PHP, React, C#, C++, and a few more. My current level of machine learning is basic beginner. The algorithms I want to add are: Recommendation algorithm, User segmentation algorithm, Content ranking algorithm, User matching algorithm, Influencer matching algorithm, User retention algorithm, Sentiment analysis algorithm, Content recommendation algorithm, Hashtag and keyword analysis algorithm, Network analysis algorithm, Engagement prediction algorithm, Image recognition algorithm, Video analysis algorithm, User behavior prediction algorithm, Content creation algorithm, Personalization algorithm, Search algorithm, Network growth algorithm, Predictive algorithm, Content promotion algorithm, Monetization algorithm, Spam/fake account detection algorithm, Friend Recommendation algorithm, Content Relevancy, Engagement Boost, Real-time Trending, Spam Detection, Natural Language Processing, Boosting Content with High Engagement, Customizable Feed, Multi-Language Support, Deep learning algorithm, Privacy algorithm, Friend/follower analysis algorithm, Machine learning algorithm, Content Moderation algorithm, User-generated content algorithm, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), Long Short-Term Memory (LSTM), Autoencoders, Transformer, Reinforcement Learning, Batch Normalization, Gradient Boosting, Dropout, Recommender Systems, Computer Vision, Anomaly Detection, Clustering, Predictive modeling, Transfer learning, and Reinforcement learning algorithms.

ChatGPT's Response
It sounds like you have a lot of ambitious goals for the Arlingbrook platform, which is great! However, implementing all of the machine learning algorithms you listed will be a significant undertaking. I recommend starting with a few key algorithms that align with the platform's goals, and then gradually adding more as you become more comfortable with the process.

To get started, I recommend focusing on the recommendation algorithm, user segmentation algorithm, and content ranking algorithm, as these are likely to have the most immediate impact on the user experience. These algorithms can be implemented using various techniques such as collaborative filtering, content-based filtering, and hybrid methods.

Next, you can consider implementing the user matching algorithm, influencer matching algorithm, and user retention algorithm. These algorithms will help you understand your users better and improve their experience on the platform. Once you have a good grasp of the basic algorithms, you can move on to more advanced techniques such as sentiment analysis, network analysis, and deep learning algorithms.

As for the programming languages, Python is a great choice for machine learning projects; it has a lot of powerful libraries like scikit-learn, Tensorflow, Keras, and Pytorch that can be used to implement the algorithms.

It's also important to note that the implementation of these algorithms will require a significant amount of data to train the models. Make sure you have a good understanding of the data you have available and how to work with it.

Finally, keep in mind that building a successful platform is not just about the technology, but also about understanding your users' needs and providing them with a great experience. Make sure to include user feedback and testing in your development process to ensure that the platform meets the needs of your users.

Support this podcast at — https://redcircle.com/the-secret-to-success/exclusive-content
Advertising Inquiries: https://redcircle.com/brands
Privacy & Opt-Out: https://redcircle.com/privacy

The Nonlinear Library
AF - Mechanistic anomaly detection and ELK by Paul Christiano

The Nonlinear Library

Play Episode Listen Later Nov 25, 2022 32:21


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Mechanistic anomaly detection and ELK, published by Paul Christiano on November 25, 2022 on The AI Alignment Forum.

(Follow-up to Eliciting Latent Knowledge. Describing joint work with Mark Xu. This is an informal description of ARC's current research approach; not a polished product intended to be understandable to many people.)

Suppose that I have a diamond in a vault, a collection of cameras, and an ML system that is excellent at predicting what those cameras will see over the next hour. I'd like to distinguish cases where the model predicts that the diamond will "actually" remain in the vault, from cases where the model predicts that someone will tamper with the cameras so that the diamond merely appears to remain in the vault. (Or cases where someone puts a fake diamond in its place, or.) One approach to this problem is to identify (the diamond remains in the vault) as the "normal" reason for the diamond to appear on camera. Then on a new input where the diamond appears on camera, we can ask whether it is for the normal reason or for a different reason. In this post I'll describe an approach to ELK based on this idea and how the same approach could also help address deceptive alignment. Then I'll discuss the empirical and theoretical research problems I'm most excited about in this space.

ELK and explanation

Explanations for regularities

I'll assume that we have a dataset of situations where the diamond appears to remain in the vault, and where that appearance is always because the diamond actually does remain in the vault. Moreover, I'll assume that our model makes reasonable predictions on this dataset. In particular, it predicts that the diamond will often appear to remain in the vault.
“The diamond appears to remain in the vault” corresponds to an extremely specific pattern of predictions: An image of a diamond is a complicated pattern of millions of pixels. Different cameras show consistent views of the diamond from different angles, suggesting that there is a diamond “out there in the world” being detected by the cameras. The position and physical characteristics of the diamond appear to be basically constant over time, suggesting that it's “the same diamond.” In one sense the reason our model makes these predictions is because it was trained to match reality, and in reality the camera's observations have these regularities. (You might call this the “teleological explanation.”) But we could also ignore the source of our model, and just look at it as a set of weights. The weights screen off the training process and so it should be possible to explain any given behavior of the model without reference to the training process. Then we ask: why does this particular computation, run on this distribution of inputs, produce this very specific pattern of predictions? We expect an explanation in terms of the weights of the model and the properties of the input distribution. (You might call this the “mechanistic explanation.”) Different predictors will give rise to this pattern in different ways. For example, a very simple predictor might have ontologically fundamental objects whose properties are assumed to be stable over time, one of which is a diamond. A more complex predictor might have a detailed model of physics, where object permanence is a complex consequence of photons reflecting from stable patterns of atoms, and the diamond is one such configuration of atoms. For a complex predictor like a physical simulation, we wouldn't expect to be able to prove that the weights give rise to object permanence. That is, we don't expect to be able to prove that on average if a diamond is present at t=0 it is likely to be present at t=1. 
But we do think that it should be possible to explain the pattern in a weaker sense. We don't yet have an adequate notion of “explanati...

Brilliance Security Magazine Podcast
AI-driven Anomaly Detection and Predictive Threat Intelligence

Brilliance Security Magazine Podcast

Play Episode Listen Later Nov 21, 2022 21:30


In Episode S4E18, Thomas Pore, the Senior Director of Product for LiveAction—a leader in network security and performance visibility—talks with Steven Bowcut about some of the benefits of AI-driven anomaly detection and predictive threat intelligence. In this podcast, you'll learn how LiveAction's AI-driven anomaly detection and predictive threat intelligence can help you detect and prevent security incidents before they happen. Tom discusses the primary advantages these two technologies bring to the SOC; then, the conversation turns to how LiveAction's ThreatEye integrates with SIEM, SOAR, and threat intelligence tools.

About our Guest
As the Senior Director of Product for LiveAction, Thomas Pore leads strategic product marketing, partnering with product management and customers to better protect organizations from events impacting network and application performance and security. He is a technical evangelist in network security and performance. For almost 20 years, Thomas has held several positions at LiveAction, including network monitoring and security advisor. He also led strategic sales engineering and post-sale technical teams over his career. Listen to learn more about the benefits of using AI-driven anomaly detection and predictive threat intelligence in your cybersecurity strategy.

The Machine Learning Podcast
Solve The Cold Start Problem For Machine Learning By Letting Humans Teach The Computer With Aitomatic

The Machine Learning Podcast

Play Episode Listen Later Sep 28, 2022 52:07


Summary
Machine learning is a data-hungry approach to problem solving. Unfortunately, there are a number of problems that would benefit from the automation provided by artificial intelligence capabilities but that don't come with troves of data to build from. Christopher Nguyen and his team at Aitomatic are working to address the "cold start" problem for ML by letting humans generate models by sharing their expertise through natural language. In this episode he explains how that works, the various ways that we can start to layer machine learning capabilities on top of each other, as well as the risks involved in doing so without incorporating lessons learned in the growth of the software industry.

Announcements
Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!
Your host is Tobias Macey and today I'm interviewing Christopher Nguyen about how to address the cold start problem for ML/AI projects.

Interview
Introduction
How did you get involved in machine learning?
Can you describe what the "cold start" or "small data" problem is and its impact on an organization's ability to invest in machine learning?
What are some examples of use cases where ML is a viable solution but there is a corresponding lack of usable data?
How does the model design influence the data requirements to build it? (e.g. statistical model vs. deep learning, etc.)
What are the available options for addressing a lack of data for ML?
What are the characteristics of a given data set that make it suitable for ML use cases?
Can you describe what you are building at Aitomatic and how it helps to address the cold start problem?
How have the design and goals of the product changed since you first started working on it?
What are some of the education challenges that you face when working with organizations to help them understand how to think about ML/AI investment and practical limitations?
What are the most interesting, innovative, or unexpected ways that you have seen Aitomatic/H1st used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Aitomatic/H1st?
When is a human/knowledge driven approach to ML development the wrong choice?
What do you have planned for the future of Aitomatic?

Contact Info
LinkedIn
@pentagoniac on Twitter
Google Scholar

Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
Aitomatic
Human First AI
Knowledge First World Symposium
Atari 800
Cold start problem
Scale AI
Snorkel AI Podcast Episode
Anomaly Detection
Expert Systems
ICML == International Conference on Machine Learning
NIST == National Institute of Standards and Technology
Multi-modal Model
SVM == Support Vector Machine
Tensorflow
Pytorch
Podcast.__init__ Episode
OSS Capital
DALL-E

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

The Cognitive Crucible
#100 Rand Waltzman on the Metaverse and Immersive Virtual Reality

The Cognitive Crucible

Play Episode Listen Later Jun 14, 2022 49:25


The Cognitive Crucible is a forum that presents different perspectives and emerging thought leadership related to the information environment. The opinions expressed by guests are their own, and do not necessarily reflect the views of or endorsement by the Information Professionals Association.

During this episode, IPA founding board member Dr. Rand Waltzman returns to the Cognitive Crucible to discuss the Metaverse and his popular Disinformation 101 series. Our wide-ranging discussion covers cognitive challenges related to immersive virtual reality environments, sensor technology, emerging influence methods, cognitive behavioral therapy, affective computing, and kayfabe.

Resources:
Cognitive Crucible Podcast Episodes Mentioned
#83 Joseph Lee on Jung and Archetypes
#47 Yaneer Bar-Yam on Complex Systems and the War on Ideals
#90 Dave Acosta on Informationally Disadvantaged
#64 Greg Radabaugh on Informational Power
#82 John DeRosa and Alex Del Castillo on Measuring Effectiveness of Operations in the Information Environment
#81 Cassandra Brooker on the Effectiveness of Influence Activities
#69 Matt Venhaus on ARLIS & the Cognitive Security Proving Ground
#38 Lori Reynolds on Operations in the Information Environment
#75 Todd Manyx on the MCIOC
#1 Rand Waltzman on Cognitive Security
Rand Waltzman's Disinformation 101 Series
GPT-3
New Age Bullshit Generator
The Humbugs of the World: An Account of Humbugs, Delusions, Impositions, Quackeries, Deceits and Deceivers Generally, in All Ages by Phineas Taylor Barnum
Kayfabe
WHAT SCIENTIFIC CONCEPT WOULD IMPROVE EVERYBODY'S COGNITIVE TOOLKIT: Kayfabe by Eric R. Weinstein
Affective Computing
Link to full show notes and resources: https://information-professionals.org/episode/cognitive-crucible-episode-100

Guest Bio: Dr. Waltzman has 35 years of experience performing and managing research in Artificial Intelligence applied to domains including social media and cognitive security in the information environment.
He is formerly Deputy Chief Technology Officer and a Senior Information Scientist at the RAND Corporation in Santa Monica, CA. Prior to joining RAND, he was the acting Chief Technology Officer of the Software Engineering Institute (Washington, DC) of Carnegie Mellon University. Before that he did a five-year tour as a Program Manager in the Information Innovation Office of the Defense Advanced Research Projects Agency (DARPA) where he created and managed the Social Media in Strategic Communications (SMISC) program and the Anomaly Detection at Multiple Scales (ADAMS) insider threat detection program. Dr. Waltzman joined DARPA from Lockheed Martin Advanced Technology Laboratories (LM-ATL), where he served as Chief Scientist for the Applied Sciences Laboratory that specializes in advanced software techniques and the computational physics of materials. Prior to LM-ATL he was an Associate Professor in the Department of Computer Science at the Royal Institute of Technology in Stockholm, Sweden, where he taught and performed research in applications of Artificial Intelligence technology to a variety of problem areas including digital entertainment, automated reasoning and decision support and cyber threat detection. He has also held research positions at the University of Maryland, Teknowledge Corporation (the first commercial Artificial Intelligence company in the world where he started in 1983), and the Applied Physics Laboratory of the University of Washington. Dr. Waltzman serves as Advisory Board Member of GLOBSEC HADES initiative. He is also a founding board member of the Information Professionals Association. About: The Information Professionals Association (IPA) is a non-profit organization dedicated to exploring the role of information activities, such as influence and cognitive security, within the national security sector and helping to bridge the divide between operations and research. 
Its goal is to increase interdisciplinary collaboration between scholars and practitioners and policymakers with an interest in this domain. For more information, please contact us at communications@information-professionals.org. Or, connect directly with The Cognitive Crucible podcast host, John Bicknell, on LinkedIn. Disclosure: As an Amazon Associate, 1) IPA earns from qualifying purchases, 2) IPA gets commissions for purchases made through links in this post.

Okaya Wellbeing
Anomaly Detection for Non-Normal Data

Okaya Wellbeing

Play Episode Listen Later Mar 13, 2022


Unsupervised, Multivariate Statistical Anomaly Detection for Data with Complex Distributions
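One common way to score anomalies in multivariate data with a complex, non-Gaussian distribution is kernel density estimation: fit a nonparametric density and flag the lowest-density points. The bimodal data, bandwidth, and 1% cutoff below are illustrative assumptions, not the episode's actual method.

```python
# A hedged sketch of density-based anomaly scoring for data whose
# distribution is complex (here bimodal, so no single Gaussian fits).
# Data, bandwidth, and the 1% cutoff are illustrative assumptions.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-4.0, 0.5, size=(100, 2)),
               rng.normal(4.0, 0.5, size=(100, 2)),
               [[0.0, 0.0]]])  # a point stranded between the two modes

kde = KernelDensity(bandwidth=0.5).fit(X)
log_dens = kde.score_samples(X)  # log-density of each point

threshold = np.quantile(log_dens, 0.01)  # flag the lowest-density 1%
print(log_dens[-1] <= threshold)  # → True: the in-between point is flagged
```

Note why a "non-normal" method matters here: the stranded point sits at the overall mean of the data, so a single-Gaussian (mean-and-covariance) score would rate it as perfectly typical, while the density estimate correctly marks it as anomalous.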

@BEERISAC: CPS/ICS Security Podcast Playlist
Pivot To Process Variable Anomaly Detection

@BEERISAC: CPS/ICS Security Podcast Playlist

Play Episode Listen Later Feb 16, 2022 7:13


Podcast: Unsolicited Response Podcast
Episode: Pivot To Process Variable Anomaly Detection
Pub date: 2022-02-15
My weekly article suggests that Level 0 / 1 monitoring and detection vendors should pivot to process variable anomaly detection. Subscribe to my ICS Security - Friday News & Notes.
The podcast and artwork embedded on this page are from Dale Peterson: ICS Security Catalyst and S4 Conference Chair, which is the property of its owner and not affiliated with or endorsed by Listen Notes, Inc.

Security Unlocked
Discovering Router Vulnerabilities with Anomaly Detection

Security Unlocked

Play Episode Listen Later Jul 21, 2021 32:59


Ready for a riddle? What do 40 hypothetical high school students and our guest on this episode have in common? Why, they can help you understand complex cyber-attack methodology, of course!

In this episode of Security Unlocked, hosts Nic Fillingham and Natalia Godyla are brought back to school by Principal Security Researcher Jonathan Bar Or, who discusses vulnerabilities in NETGEAR firmware. During the conversation, Jonathan walks through how his team recognized the vulnerabilities and worked with NETGEAR to secure the issue, and helps us understand exactly how the attack worked using an ingenious metaphor.

In This Episode You Will Learn:
How a side-channel attack works
Why attackers are moving away from operating systems and towards network equipment
Why routers are an easy access point for attacks

Some Questions We Ask:
How do you distinguish an anomaly from an attack?
What are the differences between a side-channel attack and an authentication bypass?
What can regular users do to protect themselves from similar attacks?

Resources:
Jonathan Bar Or's Blog Post
Jonathan Bar Or's LinkedIn
Microsoft Security Blog
Nic's LinkedIn
Natalia's LinkedIn

Related:
Listen to: Afternoon Cyber Tea with Ann Johnson
Listen to: Security Unlocked: CISO Series with Bret Arsenault

Security Unlocked is produced by Microsoft and distributed as part of The CyberWire Network.

Data & Science with Glen Wright Colopy
Philosophy of Data Science | Step-change and Anomaly Detection | Alex Bolton

Data & Science with Glen Wright Colopy

Play Episode Listen Later Feb 16, 2021 59:41


#datascience #ai #earlycareer
Philosophy of Data Science Series
Session 3: Data Science Highlight Reel
Episode 4: Alex Bolton on Step-change and Anomaly Detection

Who makes it into the highlight reel of data science? Alex Bolton, for doing the hard work of analyzing data to figure out exactly when things don't look "normal". We discuss the critical reasoning behind step-change detection and anomaly/novelty detection. Alex provides several real-world examples of the data and challenges.

Watch it on...
YouTube: https://www.youtube.com/watch?v=097FO1JDkhU
Podbean:

We're always happy to hear your feedback and ideas - just post it in the YouTube comment section to start a conversation. Thank you for your time and support of the series!

Revenue Generator Podcast: Sales + Marketing + Product + Customer Success = Revenue Growth
Marketing Security & Anomaly Detection -- Eric Vardon // Morphio

Revenue Generator Podcast: Sales + Marketing + Product + Customer Success = Revenue Growth

Play Episode Listen Later Sep 15, 2020 15:10


This week we talk about how growth-stage companies can leverage machine learning and artificial intelligence to spark expedited growth without breaking the bank. Joining us is Eric Vardon, the Co-Founder and CEO of Morphio, an AI-centric technology platform designed to help humans ingest more data. In part 2 of our conversation we discuss marketing security and anomaly detection.

Show Notes
Connect With:
Eric Vardon: Website // LinkedIn
The MarTech Podcast: Email // Newsletter // Twitter
Benjamin Shapiro: Website // LinkedIn // Twitter

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.