Podcasts about automl

  • 128 PODCASTS
  • 220 EPISODES
  • 43m AVG DURATION
  • 1 EPISODE EVERY OTHER WEEK
  • LATEST: Apr 18, 2025

POPULARITY (2017–2024)


Best podcasts about automl

Latest podcast episodes about automl

The Tech Trek
Building AI Agents with Purpose

Apr 18, 2025 · 28:18


In this episode, Amir sits down with Nirman Dave, co-founder and CEO of Zams, an enterprise AI platform built to help businesses design and deploy AI agents with ease. They dive into Nirman's founding story—launching during the pandemic, navigating the evolution of the AI ecosystem, and the unique challenges of maintaining customer focus amid shifting trends and rising competition. Nirman also shares lessons from pitching investors, building trust with customers, and the art of product prioritization.

Bittensor Guru
S2E8 - Subnet 56 Gradients.io w/ wanderingweights

Mar 28, 2025 · 68:05


The team at Rayon Labs have done it again with Gradients, led by wanderingweights, who joins the pod to discuss how he and his team have democratized AI model building with their "couple of clicks" no-code training subnet. This is one of the most groundbreaking projects on Bittensor: in only a few months on the network, it can already out-train the establishment. Think AutoML on Bittensor and you're on the right track, but still selling this group way short. Enjoy! Video and links below.
https://x.com/KeithSingery/status/1905573818942263756
https://gradients.io
https://github.com/rayonlabs/G.O.D
https://rayonlabs.ai
https://x.com/rayon_labs
https://bittensor.guru

Secrets of Data Analytics Leaders
The AI/ML Tool Evaluation Template: A Guide to Smarter Selection - Audio Blog

Mar 11, 2025 · 13:53


This article breaks down the evolving landscape of AI/ML platforms, from AutoML to full-stack AI workbenches, and provides a structured tool evaluation framework to cut through vendor ambiguity. Published at: https://www.eckerson.com/articles/the-ai-ml-tool-evaluation-template-a-guide-to-smarter-selection

KI in der Industrie
AutoML: Did we promise ourselves too much?

Jan 29, 2025 · 25:23 · Transcription Available


We repeatedly see individual companies rolling out AutoML as a product, but the technology has not yet fulfilled the great promise of empowering domain experts. Why?

The Foresight Institute Podcast
Abhishek Singh | Decentralizing Machine Learning

Jan 24, 2025 · 54:13


Abhishek Singh is a Ph.D. student at MIT Media Lab. His research interests include collective intelligence, self-organization, and decentralized machine learning. The central question guiding his research is: how can we (algorithmically) engineer adaptive networks to build anti-fragile systems? He has co-authored multiple papers and built systems in machine learning, data privacy, and distributed computing. Before joining MIT, Abhishek worked with Cisco for 2 years, where he did research in AutoML and Machine Learning for systems.

An Abstract
The remarkable scaling of AI models has unlocked unprecedented capabilities in text and image generation, raising the question: why hasn't healthcare seen similar breakthroughs? While healthcare AI holds immense promise, progress has been stymied by fragmented data trapped in institutional silos. Traditional centralized approaches fall short in this domain, where privacy concerns and regulatory requirements prevent data consolidation. This talk introduces a framework for decentralized machine learning and discusses algorithms for enabling self-organization among participants with diverse resources and capabilities.

About Foresight Institute
Foresight Institute is a research organization and non-profit that supports the beneficial development of high-impact technologies. Since our founding in 1987 on a vision of guiding powerful technologies, we have continued to evolve into a many-armed organization that focuses on several fields of science and technology that are too ambitious for legacy institutions to support.

Allison Duettmann
The President and CEO of Foresight Institute, Allison Duettmann directs the Intelligent Cooperation, Molecular Machines, Biotech & Health Extension, Neurotech, and Space Programs, alongside Fellowships, Prizes, and Tech Trees. She has also been pivotal in co-initiating the Longevity Prize, pioneering initiatives like Existentialhope.com, and contributing to notable works like "Superintelligence: Coordination & Strategy" and "Gaming the Future".

Get Involved with Foresight:
Apply to our virtual technical seminars
Join our in-person events and workshops
Donate: Support Our Work – If you enjoy what we do, please consider this, as we are entirely funded by your donations!
Follow Us: Twitter | Facebook | LinkedIn

Note: Explore every word spoken on this podcast through Fathom.fm, an innovative podcast search engine. Hosted on Acast. See acast.com/privacy for more information.
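The silo problem described in the abstract is commonly attacked with federated learning, where each site trains locally and only model parameters are shared and averaged. Below is a generic federated-averaging sketch in plain numpy to illustrate that idea; it is not the specific framework presented in the talk, and the "hospital" silos are hypothetical.

```python
# A generic federated-averaging (FedAvg-style) loop, sketched in numpy.
# Illustrates training on siloed data without pooling it; not the talk's framework.
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(weights, X, y, lr=0.1, epochs=5):
    """A few epochs of logistic-regression SGD on one silo's private data."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)      # gradient step
    return w

# Three hypothetical hospitals, each with its own private dataset.
silos = [(rng.normal(size=(50, 4)), rng.integers(0, 2, size=50)) for _ in range(3)]

global_w = np.zeros(4)
for round_ in range(10):
    # Each silo trains locally from the current global weights...
    local_ws = [local_sgd(global_w, X, y) for X, y in silos]
    # ...and only the weights travel; the server averages them.
    global_w = np.mean(local_ws, axis=0)

print("global weights after 10 rounds:", global_w)
```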

Engineering Kiosk
#179 MLOps: Bringing Machine Learning into Production with Michelle Golchert and Sebastian Warnholz

Jan 21, 2025 · 76:51


Machine Learning Operations (MLOps) with Data Science Deep Dive.

Machine learning, or rather the results of predictions (so-called prediction models), has become indispensable in modern IT and even in our daily lives. Such models are probably used more often than you realize. Programming, building, and training these models is one thing; deploying and operating them is another. The latter is called Machine Learning Operations, or "MLOps" for short, and that is the topic of this episode.

We cover what MLOps actually is and how it differs from classic DevOps, how you get your own machine learning model into production and which stages it has to pass through, the difference between model training and model serving, what a model registry is for, how machine learning models in production are actually monitored and debugged, what model drift and drift detection are, whether the feedback cycle can be kept short with methods such as continuous delivery, and also which skills are important for an MLOps engineer.

To answer all these questions, Michelle Golchert and Sebastian Warnholz from the Data Science Deep Dive podcast join us.

You can find our current advertising partners at https://engineeringkiosk.dev/partners
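One of the topics above, drift detection, can be illustrated with a simple two-sample test: compare the distribution of a production feature against the training data and alert when they diverge. Here is a minimal sketch using a Kolmogorov-Smirnov test; it is a generic approach on synthetic data, not the guests' specific tooling.

```python
# Minimal data-drift check: compare a production feature distribution against
# the training distribution with a two-sample Kolmogorov-Smirnov test.
# Generic illustration only; real MLOps stacks wrap this in monitoring jobs.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # data the model was trained on
live_feature = rng.normal(loc=0.4, scale=1.0, size=1_000)   # recent production traffic (shifted)

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"Drift suspected: KS statistic={stat:.3f}, p={p_value:.2e}")
else:
    print("No significant drift detected")
```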

一桶金之財經新思維
AutoML Capital's 林朗行: Using AI to Mitigate Market Risk

Dec 9, 2024 · 19:19


NeurologyLive Mind Moments
128: Machine Learning Algorithms to Predict Seizure Control in Epilepsy Surgery

Nov 15, 2024 · 21:12


Welcome to the NeurologyLive® Mind Moments® podcast. Tune in to hear leaders in neurology sound off on topics that impact your clinical practice. In this episode, Lara Jehi, MD, MHCDS, an epilepsy specialist and Cleveland Clinic's Chief Research and Information Officer, sat down to discuss a recently published study that explored using machine learning algorithms to predict seizure control after epilepsy surgery. In the interview, Jehi explained the unique aspects of the study design, emphasizing the importance of a large, well-characterized patient cohort with consistent follow-up and the choice of scalp EEG—a commonly used, non-invasive test in epilepsy care—as the data source. In addition, Jehi touched on the use of AutoML to streamline the process, enabling efficient identification of the top-performing algorithms and enhancing the model's predictive accuracy. Furthermore, she spoke on the team needed to properly implement machine learning techniques for neurosurgery, while providing recommendations for other institutions interested in pursuing these types of approaches.

Looking for more epilepsy discussion? Check out the NeurologyLive® epilepsy clinical focus page.

Episode Breakdown:
1:00 – Background on various machine learning approaches for epilepsy research
3:20 – Study details, findings, and notable takeaways
8:20 – Neurology News Minute
10:20 – Novelty in using scalp EEG and its global application
15:30 – Team personnel needed for proper implementation of machine learning techniques in epilepsy surgery

The stories featured in this week's Neurology News Minute, which will give you quick updates on the following developments in neurology, are further detailed here:
FDA Accepts Resubmitted NDA for Ataluren in Nonsense Duchenne Muscular Dystrophy
FDA Places Clinical Hold on Epilepsy Agent RAP-219 for Diabetic Peripheral Neuropathic Pain
First-Ever CRISPR/Cas13-RNA Editing Therapy to be Tested in Phase 1 Study of Age-Related Macular Degeneration

Thanks for listening to the NeurologyLive® Mind Moments® podcast. To support the show, be sure to rate, review, and subscribe wherever you listen to podcasts. For more neurology news and expert-driven content, visit neurologylive.com.

The ERP Advisor
The ERP Minute Episode 155 - September 17th, 2024

Sep 18, 2024 · 4:06 · Transcription Available


NetSuite announced a series of new product updates and AI innovations across the suite to help organizations increase efficiency and accelerate growth. Oracle was busy making major data and AI-related announcements, alongside its new partnership with AWS. Qlik announced new enhancements to its AutoML capabilities. Certinia announced the general availability of Certinia Customer Success (CS) Cloud. ECI announced the completion of its acquisition of Khameleon Software, a cloud-based ERP software company supporting the unique needs of project-based dealers.

Connect with us!
https://www.erpadvisorsgroup.com
866-499-8550
LinkedIn: https://www.linkedin.com/company/erp-advisors-group
Twitter: https://twitter.com/erpadvisorsgrp
Facebook: https://www.facebook.com/erpadvisors
Instagram: https://www.instagram.com/erpadvisorsgroup
Pinterest: https://www.pinterest.com/erpadvisorsgroup
Medium: https://medium.com/@erpadvisorsgroup

Recsperts - Recommender Systems Experts
#22: Pinterest Homefeed and Ads Ranking with Prabhat Agarwal and Aayush Mudgal

Jun 6, 2024 · 84:07


In episode 22 of Recsperts, we welcome Prabhat Agarwal, Senior ML Engineer, and Aayush Mudgal, Staff ML Engineer, both from Pinterest, to the show. Prabhat works on recommendations and search systems at Pinterest, leading representation learning efforts. Aayush is responsible for ads ranking and privacy-aware conversion modeling. We discuss user and content modeling, short- vs. long-term objectives, evaluation, and multi-task learning, and touch on counterfactual evaluation as well.

In our interview, Prabhat guides us through the journey of continuous improvements to Pinterest's Homefeed personalization, starting with gradient boosting and moving through two-tower models to DCN and transformers. We discuss how to capture users' short- and long-term preferences through multiple embeddings and the role of candidate generators for content diversification. Prabhat shares some details about position debiasing and the challenges of facilitating exploration.

With Aayush we get the chance to dive into the specifics of ads ranking at Pinterest, and he helps us better understand how multifaceted ads can be. We learn more about the pain of having too many models and Pinterest's efforts to consolidate the model landscape to improve infrastructural costs, maintainability, and efficiency. Aayush also shares some insights about exploration and the corresponding randomization in the context of ads, and how user behavior differs between different kinds of ads.

Both guests highlight the role of counterfactual evaluation and its impact on faster experimentation. Towards the end of the episode, we also touch a bit on learnings from last year's RecSys challenge.

Enjoy this enriching episode of RECSPERTS - Recommender Systems Experts. Don't forget to follow the podcast and please leave a review.

(00:00) - Introduction
(03:51) - Guest Introductions
(09:57) - Pinterest Introduction
(21:57) - Homefeed Personalization
(47:27) - Ads Ranking
(01:14:58) - RecSys Challenge 2023
(01:20:26) - Closing Remarks

Links from the Episode:
Prabhat Agarwal on LinkedIn
Aayush Mudgal on LinkedIn
RecSys Challenge 2023
Pinterest Engineering Blog
Pinterest Labs
Prabhat's Talk at GTC 2022: Evolution of web-scale engagement modeling at Pinterest
Blogpost: How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads
Blogpost: Pinterest Home Feed Unified Lightweight Scoring: A Two-tower Approach
Blogpost: Experiment without the wait: Speeding up the iteration cycle with Offline Replay Experimentation
Blogpost: MLEnv: Standardizing ML at Pinterest Under One ML Engine to Accelerate Innovation

Papers:
Eksombatchai et al. (2018): Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time
Ying et al. (2018): Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Pal et al. (2020): PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest
Pancha et al. (2022): PinnerFormer: Sequence Modeling for User Representation at Pinterest
Zhao et al. (2019): Recommending what video to watch next: a multitask ranking system

General Links:
Follow me on LinkedIn
Follow me on X
Send me your comments, questions and suggestions to marcel.kurovski@gmail.com
Recsperts Website
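The two-tower models mentioned above score a (user, item) pair as the dot product of independently computed user and item embeddings, which lets item embeddings be precomputed and served via approximate nearest-neighbor retrieval. Here is a minimal PyTorch sketch of the idea; it is an illustration of the general architecture, not Pinterest's implementation, and all sizes are made up.

```python
# Minimal two-tower retrieval model: user and item towers produce embeddings,
# and the match score is their dot product. Illustration only.
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, n_ids: int, dim: int = 32):
        super().__init__()
        self.emb = nn.Embedding(n_ids, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.mlp(self.emb(ids))

class TwoTower(nn.Module):
    def __init__(self, n_users: int, n_items: int, dim: int = 32):
        super().__init__()
        self.user_tower = Tower(n_users, dim)
        self.item_tower = Tower(n_items, dim)

    def forward(self, user_ids: torch.Tensor, item_ids: torch.Tensor) -> torch.Tensor:
        u = self.user_tower(user_ids)   # (batch, dim)
        v = self.item_tower(item_ids)   # (batch, dim)
        return (u * v).sum(dim=-1)      # dot-product relevance score

model = TwoTower(n_users=1000, n_items=5000)
scores = model(torch.tensor([1, 2, 3]), torch.tensor([10, 20, 30]))
print(scores.shape)  # torch.Size([3]); item embeddings can be precomputed for ANN serving
```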

KI in der Industrie
Industrial AI: Time Series, ick hör dir trapsen

Jun 5, 2024 · 42:11 · Transcription Available


In this episode we talk about time series, and in the main part Prof. Dr. Marco Huber from IPA and Marc Zöller from GFT explain how their time series AutoML tool works. In the news part we offer an overview of approaches ranging from statistical (ARIMA) through ML (random forest, gradient boosting, ...) and neural networks (LSTM) to AutoML and, lately, transformers. Thanks for listening. We welcome suggestions for topics, criticism, and a few stars on Apple, Spotify and Co.

We thank our partner SIEMENS: https://www.siemens.de/de/

xLSTM GitHub (more: https://lnkd.in/eG3HWJrs)
auto-sktime (more: https://github.com/Ennosigaeon/auto-sktime)
Our guest Marco Huber: https://www.linkedin.com/in/marco-huber-78a1a151/
Our guest Marc Zöller: https://www.linkedin.com/in/marc-zoeller/

#machinelearning #ai #aimodel #industrialautomation #manufacturing #automation #genai #datascience #mlops #llm #IndustrialAI #artificialintelligence #sklearn
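At the statistical end of that spectrum, a classical ARIMA model is the usual baseline a time-series AutoML tool would try first. Here is a minimal sketch with statsmodels on a synthetic series; it is a generic baseline, not the tool discussed in the episode.

```python
# Minimal ARIMA baseline forecast with statsmodels; generic starting point,
# not the time series AutoML tool discussed in the episode.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Synthetic series: drift plus noise, standing in for e.g. a sensor signal.
y = np.cumsum(rng.normal(loc=0.05, scale=1.0, size=200))

model = ARIMA(y, order=(1, 1, 1))     # (p, d, q): AR order, differencing, MA order
fitted = model.fit()
forecast = fitted.forecast(steps=10)  # next 10 points
print(forecast)
```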

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
ICLR 2024 — Best Papers & Talks (ImageGen, Vision, Transformers, State Space Models) ft. Christian Szegedy, Ilya Sutskever, Durk Kingma

May 27, 2024 · 218:03


Speakers for AI Engineer World's Fair have been announced! See our Microsoft episode for more info and buy now with code LATENTSPACE — we've been studying the best ML research conferences so we can make the best AI industry conf! Note that this year there are 4 main tracks per day and dozens of workshops/expo sessions; the free livestream will air much less than half of the content this time. Apply for free/discounted Diversity Program and Scholarship tickets here. We hope to make this the definitive technical conference for ALL AI engineers.

ICLR 2024 took place from May 6-11 in Vienna, Austria. Just like we did for our extremely popular NeurIPS 2023 coverage, we decided to pay the $900 ticket (thanks to all of you paying supporters!) and brave the 18 hour flight and 5 day grind to go on behalf of all of you. We now present the results of that work! This ICLR was the biggest one by far, with a marked change in the excitement trajectory for the conference.

Of the 2260 accepted papers (31% acceptance rate), of the subset of those relevant to our shortlist of AI Engineering Topics, we found many, many LLM reasoning and agent related papers, which we will cover in the next episode. We will spend this episode with 14 papers covering other relevant ICLR topics, as below.

As we did last year, we'll start with the Best Paper Awards. Unlike last year, we now group our paper selections by subjective topic area, and mix in both Outstanding Paper talks as well as editorially selected poster sessions. Where we were able to do a poster session interview, please scroll to the relevant show notes for images of their poster for discussion. To cap things off, Chris Ré's spot from last year now goes to Sasha Rush for the obligatory last word on the development and applications of State Space Models.

We had a blast at ICLR 2024 and you can bet that we'll be back in 2025.

Papers Read on AI
LightAutoML: AutoML Solution for a Large Financial Services Ecosystem

May 27, 2024 · 54:50


We present an AutoML system called LightAutoML, developed for a large European financial services company and its ecosystem, satisfying the set of idiosyncratic requirements that this ecosystem has for AutoML solutions. Our framework was piloted and deployed in numerous applications and performed at the level of experienced data scientists while building high-quality ML models significantly faster than these data scientists. We also compare the performance of our system with various general-purpose open source AutoML solutions and show that it performs better for most of the ecosystem and OpenML problems. We also present the lessons that we learned while developing the AutoML system and moving it into production.

2021: Anton Vakhrushev, A. Ryzhkov, M. Savchenko, Dmitry Simakov, Rinchin Damdinov, Alexander Tuzhilin
https://arxiv.org/pdf/2109.01528
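As a rough illustration of the kind of search an AutoML system automates, namely trying several model families and hyperparameter settings and keeping the best cross-validated one, here is a small scikit-learn sketch. It is generic and deliberately tiny; it is not LightAutoML's actual API.

```python
# A toy "AutoML" loop: try several model families with small hyperparameter
# grids and keep the best by cross-validated score. Generic illustration of
# what systems like LightAutoML automate; not LightAutoML's API.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

candidates = [
    (LogisticRegression(max_iter=5000), {"C": [0.1, 1.0, 10.0]}),
    (RandomForestClassifier(random_state=0), {"n_estimators": [100, 300]}),
    (GradientBoostingClassifier(random_state=0), {"learning_rate": [0.05, 0.1]}),
]

best_score, best_model = -1.0, None
for estimator, grid in candidates:
    search = GridSearchCV(estimator, grid, cv=5, scoring="roc_auc")
    search.fit(X, y)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

print(f"best model: {best_model}\ncv ROC AUC: {best_score:.3f}")
```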

Oracle University Podcast
Encore Episode: The OCI AI Portfolio

May 21, 2024 · 16:38


Oracle has been actively focusing on bringing AI to the enterprise at every layer of its tech stack, be it SaaS apps, AI services, infrastructure, or data.   In this episode, hosts Lois Houston and Nikita Abraham, along with senior instructors Hemant Gahankari and Himanshu Raj, discuss OCI AI and Machine Learning services. They also go over some key OCI Data Science concepts and responsible AI principles.   Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 The world of artificial intelligence is vast and everchanging. And with all the buzz around it lately, we figured it was the perfect time to revisit our AI Made Easy series. Join us over the next few weeks as we chat about all things AI, helping you to discover its endless possibilities. Ready to dive in? Let's go! 00:33 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:46 Lois: Welcome to the Oracle University Podcast! I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hey everyone! In our last episode, we dove into Generative AI and Language Learning Models.  Lois: Yeah, that was an interesting one. But today, we're going to discuss the AI and machine learning services offered by Oracle Cloud Infrastructure, and we'll look at the OCI AI infrastructure. Nikita: I'm also going to try and squeeze in a couple of questions on a topic I'm really keen about, which is responsible AI. To take us through all of this, we have two of our colleagues, Hemant Gahankari and Himanshu Raj. Hemant is a Senior Principal OCI Instructor and Himanshu is a Senior Instructor on AI/ML. So, let's get started! 01:36 Lois: Hi Hemant! We're so excited to have you here! We know that Oracle has really been focusing on bringing AI to the enterprise at every layer of our stack.  Hemant: It all begins with data and infrastructure layers. OCI AI services consume data, and AI services, in turn, are consumed by applications.  This approach involves extensive investment from infrastructure to SaaS applications. Generative AI and massive scale models are the more recent steps. Oracle AI is the portfolio of cloud services for helping organizations use the data they may have for the business-specific uses.  Business applications consume AI and ML services. The foundation of AI services and ML services is data. AI services contain pre-built models for specific uses. Some of the AI services are pre-trained, and some can be additionally trained by the customer with their own data.  AI services can be consumed by calling the API for the service, passing in the data to be processed, and the service returns a result. There is no infrastructure to be managed for using AI services.  02:58 Nikita: How do I access OCI AI services? Hemant: OCI AI services provide multiple methods for access. The most common method is the OCI Console. 
The OCI Console provides an easy to use, browser-based interface that enables access to notebook sessions and all the features of all the data science, as well as AI services.  The REST API provides access to service functionality but requires programming expertise. And API reference is provided in the product documentation. OCI also provides programming language SDKs for Java, Python, TypeScript, JavaScript, .Net, Go, and Ruby. The command line interface provides both quick access and full functionality without the need for scripting.  03:52 Lois: Hemant, what are the types of OCI AI services that are available?  Hemant: OCI AI services is a collection of services with pre-built machine learning models that make it easier for developers to build a variety of business applications. The models can also be custom trained for more accurate business results. The different services provided are digital assistant, language, vision, speech, document understanding, anomaly detection.  04:24 Lois: I know we're going to talk about them in more detail in the next episode, but can you introduce us to OCI Language, Vision, and Speech? Hemant: OCI Language allows you to perform sophisticated text analysis at scale. Using the pre-trained and custom models, you can process unstructured text to extract insights without data science expertise. Pre-trained models include language detection, sentiment analysis, key phrase extraction, text classification, named entity recognition, and personal identifiable information detection.  Custom models can be trained for named entity recognition and text classification with domain-specific data sets. In text translation, natural machine translation is used to translate text across numerous languages.  Using OCI Vision, you can upload images to detect and classify objects in them. Pre-trained models and custom models are supported. In image analysis, pre-trained models perform object detection, image classification, and optical character recognition. In image analysis, custom models can perform custom object detection by detecting the location of custom objects in an image and providing a bounding box.  The OCI Speech service is used to convert media files to readable texts that's stored in JSON and SRT format. Speech enables you to easily convert media files containing human speech into highly exact text transcriptions.  06:12 Nikita: That's great. And what about document understanding and anomaly detection? Hemant: Using OCI document understanding, you can upload documents to detect and classify text and objects in them. You can process individual files or batches of documents. In OCR, document understanding can detect and recognize text in a document. In text extraction, document understanding provides the word level and line level text, and the bounding box, coordinates of where the text is found.  In key value extraction, document understanding extracts a predefined list of key value pairs of information from receipts, invoices, passports, and driver IDs. In table extraction, document understanding extracts content in tabular format, maintaining the row and column relationship of cells. In document classification, the document understanding classifies documents into different types.  The OCI Anomaly Detection service is a service that analyzes large volume of multivariate or univariate time series data. The Anomaly Detection service increases the reliability of businesses by monitoring their critical assets and detecting anomalies early with high precision. 
Anomaly Detection is the identification of rare items, events, or observations in data that differ significantly from the expectation.  07:55 Nikita: Where is Anomaly Detection most useful? Hemant: The Anomaly Detection service is designed to help with analyzing large amounts of data and identifying the anomalies at the earliest possible time with maximum accuracy. Different sectors, such as utility, oil and gas, transportation, manufacturing, telecommunications, banking, and insurance use Anomaly Detection service for their day-to-day activities.  08:23 Lois: Ok…and the first OCI AI service you mentioned was digital assistant… Hemant: Oracle Digital Assistant is a platform that allows you to create and deploy digital assistants, which are AI driven interfaces that help users accomplish a variety of tasks with natural language conversations. When a user engages with the Digital Assistant, the Digital Assistant evaluates the user input and routes the conversation to and from the appropriate skills.  Digital Assistant greets the user upon access. Upon user requests, list what it can do and provide entry points into the given skills. It routes explicit user requests to the appropriate skills. And it also handles interruptions to flows and disambiguation. It also handles requests to exit the bot.  09:21 Nikita: Excellent! Let's bring Himanshu in to tell us about machine learning services. Hi Himanshu! Let's talk about OCI Data Science. Can you tell us a bit about it? Himanshu: OCI Data Science is the cloud service focused on serving the data scientist throughout the full machine learning life cycle with support for Python and open source.  The service has many features, such as model catalog, projects, JupyterLab notebook, model deployment, model training, management, model explanation, open source libraries, and AutoML.  09:56 Lois: Himanshu, what are the core principles of OCI Data Science?  Himanshu: There are three core principles of OCI Data Science. The first one, accelerated. The first principle is about accelerating the work of the individual data scientist. OCI Data Science provides data scientists with open source libraries along with easy access to a range of compute power without having to manage any infrastructure. It also includes Oracle's own library to help streamline many aspects of their work.  The second principle is collaborative. It goes beyond an individual data scientist's productivity to enable data science teams to work together. This is done through the sharing of assets, reducing duplicative work, and putting reproducibility and auditability of models for collaboration and risk management.  Third is enterprise grade. That means it's integrated with all the OCI Security and access protocols. The underlying infrastructure is fully managed. The customer does not have to think about provisioning compute and storage. And the service handles all the maintenance, patching, and upgrades so user can focus on solving business problems with data science.  11:11 Nikita: Let's drill down into the specifics of OCI Data Science. So far, we know it's cloud service to rapidly build, train, deploy, and manage machine learning models. But who can use it? Where is it? And how is it used? Himanshu: It serves data scientists and data science teams throughout the full machine learning life cycle.  Users work in a familiar JupyterLab notebook interface, where they write Python code. And how it is used? 
So users preserve their models in the model catalog and deploy their models to a managed infrastructure.  11:46 Lois: Walk us through some of the key terminology that's used. Himanshu: Some of the important product terminology of OCI Data Science are projects. The projects are containers that enable data science teams to organize their work. They represent collaborative work spaces for organizing and documenting data science assets, such as notebook sessions and models.  Note that tenancy can have as many projects as needed without limits. Now, this notebook session is where the data scientists work. Notebook sessions provide a JupyterLab environment with pre-installed open source libraries and the ability to add others. Notebook sessions are interactive coding environment for building and training models.  Notebook sessions run in a managed infrastructure and the user can select CPU or GPU, the compute shape, and amount of storage without having to do any manual provisioning. The other important feature is Conda environment. It's an open source environment and package management system and was created for Python programs.  12:53 Nikita: What is a Conda environment used for? Himanshu: It is used in the service to quickly install, run, and update packages and their dependencies. Conda easily creates, saves, loads, and switches between environments in your notebooks sessions. 13:07 Nikita: Earlier, you spoke about the support for Python in OCI Data Science. Is there a dedicated library? Himanshu: Oracle's Accelerated Data Science ADS SDK is a Python library that is included as part of OCI Data Science.  ADS has many functions and objects that automate or simplify the steps in the data science workflow, including connecting to data, exploring, and visualizing data. Training a model with AutoML, evaluating models, and explaining models. In addition, ADS provides a simple interface to access the data science service mode model catalog and other OCI services, including object storage.  13:45 Lois: I also hear a lot about models. What are models? Himanshu: Models define a mathematical representation of your data and business process. You create models in notebooks, sessions, inside projects.  13:57 Lois: What are some other important terminologies related to models? Himanshu: The next terminology is model catalog. The model catalog is a place to store, track, share, and manage models.  The model catalog is a centralized and managed repository of model artifacts. A stored model includes metadata about the provenance of the model, including Git-related information and the script. Our notebook used to push the model to the catalog. Models stored in the model catalog can be shared across members of a team, and they can be loaded back into a notebook session.  The next one is model deployments. Model deployments allow you to deploy models stored in the model catalog as HTTP endpoints on managed infrastructure.  14:45 Lois: So, how do you operationalize these models? Himanshu: Deploying machine learning models as web applications, HTTP API endpoints, serving predictions in real time is the most common way to operationalize models. HTTP endpoints or the API endpoints are flexible and can serve requests for the model predictions. Data science jobs enable you to define and run a repeatable machine learning tasks on fully managed infrastructure.  Nikita: Thanks for that, Himanshu.  15:18 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? 
You'll find training on everything from cloud computing, database, and security, artificial intelligence, and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started.  15:46 Nikita: Welcome back! The Oracle AI Stack consists of AI services and machine learning services, and these services are built using AI infrastructure. So, let's move on to that. Hemant, what are the components of OCI AI Infrastructure? Hemant: OCI AI Infrastructure is mainly composed of GPU-based instances. Instances can be virtual machines or bare metal machines. High performance cluster networking that allows instances to communicate to each other. Super clusters are a massive network of GPU instances with multiple petabytes per second of bandwidth. And a variety of fully managed storage options from a single byte to exabytes without upfront provisioning are also available.  16:35 Lois: Can we explore each of these components a little more? First, tell us, why do we need GPUs? Hemant: ML and AI needs lots of repetitive computations to be made on huge amounts of data. Parallel computing on GPUs is designed for many processes at the same time. A GPU is a piece of hardware that is incredibly good in performing computations.  GPU has thousands of lightweight cores, all working on their share of data in parallel. This gives them the ability to crunch through extremely large data set at tremendous speed.  17:14 Nikita: And what are the GPU instances offered by OCI? Hemant: GPU instances are ideally suited for model training and inference. Bare metal and virtual machine compute instances powered by NVIDIA GPUs H100, A100, A10, and V100 are made available by OCI.  17:35 Nikita: So how do we choose what to train from these different GPU options?  Hemant: For large scale AI training, data analytics, and high performance computing, bare metal instances BM 8 X NVIDIA H100 and BM 8 X NVIDIA A100 can be used.  These provide up to nine times faster AI training and 30 times higher acceleration for AI inferencing. The other bare metal and virtual machines are used for small AI training, inference, streaming, gaming, and virtual desktop infrastructure.  18:14 Lois: And why would someone choose the OCI AI stack over its counterparts? Hemant: Oracle offers all the features and is the most cost effective option when compared to its counterparts.  For example, BM GPU 4.8 version 2 instance costs just $4 per hour and is used by many customers.  Superclusters are a massive network with multiple petabytes per second of bandwidth. It can scale up to 4,096 OCI bare metal instances with 32,768 GPUs.  We also have a choice of bare metal A100 or H100 GPU instances, and we can select a variety of storage options, like object store, or block store, or even file system. For networking speeds, we can reach 1,600 GB per second with A100 GPUs and 3,200 GB per second with H100 GPUs.  With OCI storage, we can select local SSD up to four NVMe drives, block storage up to 32 terabytes per volume, object storage up to 10 terabytes per object, file systems up to eight exabyte per file system. OCI File system employs five replicated storage located in different fault domains to provide redundancy for resilient data protection.  HPC file systems, such as BeeGFS and many others are also offered. 
OCI HPC file systems are available on Oracle Cloud Marketplace and make it easy to deploy a variety of high performance file servers.  20:11 Lois: I think a discussion on AI would be incomplete if we don't talk about responsible AI. We're using AI more and more every day, but can we actually trust it? Hemant: For us to trust AI, it must be driven by ethics that guide us as well. Nikita: And do we have some principles that guide the use of AI? Hemant: AI should be lawful, complying with all applicable laws and regulations. AI should be ethical, that is it should ensure adherence to ethical principles and values that we uphold as humans. And AI should be robust, both from a technical and social perspective. Because even with the good intentions, AI systems can cause unintentional harm. AI systems do not operate in a lawless world. A number of legally binding rules at national and international level apply or are relevant to the development, deployment, and use of AI systems today. The law not only prohibits certain actions but also enables others, like protecting rights of minorities or protecting environment. Besides horizontally applicable rules, various domain-specific rules exist that apply to particular AI applications. For instance, the medical device regulation in the health care sector.  In AI context, equality entails that the systems' operations cannot generate unfairly biased outputs. And while we adopt AI, citizens right should also be protected.  21:50 Lois: Ok, but how do we derive AI ethics from these? Hemant: There are three main principles.  AI should be used to help humans and allow for oversight. It should never cause physical or social harm. Decisions taken by AI should be transparent and fair, and also should be explainable. AI that follows the AI ethical principles is responsible AI.  So if we map the AI ethical principles to responsible AI requirements, these will be like, AI systems should follow human-centric design principles and leave meaningful opportunity for human choice. This means securing human oversight. AI systems and environments in which they operate must be safe and secure, they must be technically robust, and should not be open to malicious use.  The development, and deployment, and use of AI systems must be fair, ensuring equal and just distribution of both benefits and costs. AI should be free from unfair bias and discrimination. Decisions taken by AI to the extent possible should be explainable to those directly and indirectly affected.  23:21 Nikita: This is all great, but what does a typical responsible AI implementation process look like?  Hemant: First, a governance needs to be put in place. Second, develop a set of policies and procedures to be followed. And once implemented, ensure compliance by regular monitoring and evaluation.  Lois: And this is all managed by developers? Hemant: Typical roles that are involved in the implementation cycles are developers, deployers, and end users of the AI.  23:56 Nikita: Can we talk about AI specifically in health care? How do we ensure that there is fairness and no bias? Hemant: AI systems are only as good as the data that they are trained on. If that data is predominantly from one gender or racial group, the AI systems might not perform as well on data from other groups.  24:21 Lois: Yeah, and there's also the issue of ensuring transparency, right? Hemant: AI systems often make decisions based on complex algorithms that are difficult for humans to understand. 
As a result, patients and health care providers can have difficulty trusting the decisions made by the AI. AI systems must be regularly evaluated to ensure that they are performing as intended and not causing harm to patients.  24:49 Nikita: Thank you, Hemant and Himanshu, for this really insightful session. If you're interested in learning more about the topics we discussed today, head on over to mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course.  Lois: That's right, Niki. You'll find demos that you watch as well as skill checks that you can attempt to better your understanding. In our next episode, we'll get into the OCI AI Services we discussed today and talk about them in more detail. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 25:25 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
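The transcript above describes deploying catalog models as HTTP endpoints on managed infrastructure and serving real-time predictions over those endpoints. A minimal sketch of invoking such an endpoint with a signed request via the OCI Python SDK follows; the endpoint URL and payload shape are placeholder assumptions for illustration, not values from the episode.

```python
# Minimal sketch: call a deployed OCI Data Science model endpoint with a signed
# HTTP request. The endpoint URL and payload below are placeholders (assumptions);
# substitute the values for your own deployment.
import oci
import requests

config = oci.config.from_file()  # reads ~/.oci/config (tenancy, user, key, region)
signer = oci.signer.Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
)

# Hypothetical model-deployment predict URL; copy the real one from the console.
endpoint = "https://modeldeployment.<region>.oci.customer-oci.com/<deployment-ocid>/predict"
payload = {"data": [[5.1, 3.5, 1.4, 0.2]]}  # shape depends on how the model was saved

response = requests.post(endpoint, json=payload, auth=signer)
print(response.status_code, response.json())
```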

KI in der Industrie
Beckhoff launches an AutoML tool

Apr 21, 2024 · 26:15


Fabian Bause was actually very sceptical at our AI in the Forest event, which was all about AutoML. Then he called us a few weeks ago and we were surprised. Beckhoff offers its customers and also non-Beckhoff users an AutoML tool. The automation company is thus breaking new ground. Has he changed his mind? And if so, why? Fabian explains it to us.

MLOps.community
[Exclusive] Zilliz Roundtable // Why Purpose-built Vector Databases Matter for Your Use Case

Mar 15, 2024 · 59:00


Frank Liu is the Director of Operations & ML Architect at Zilliz, where he serves as a maintainer for the Towhee open-source project. Jiang Chen is the Head of AI Platform and Ecosystem at Zilliz. Yujian Tang is a developer advocate at Zilliz. He has a background as a software engineer working on AutoML at Amazon.

MLOps Coffee Sessions special episode with Zilliz, Why Purpose-built Vector Databases Matter for Your Use Case, fueled by our Premium Brand Partner, Zilliz. An engineering deep-dive into the world of purpose-built databases optimized for vector data. In this live session, we explore why non-purpose-built databases fall short in handling vector data effectively and discuss real-world use cases demonstrating the transformative potential of purpose-built solutions. Whether you're a developer, data scientist, or database enthusiast, this virtual roundtable offers valuable insights into harnessing the full potential of vector data for your projects.

// Bio
Frank Liu
Frank Liu is Head of AI & ML at Zilliz, with over eight years of industry experience in machine learning and hardware engineering. Before joining Zilliz, Frank co-founded Orion Innovations, an IoT startup based in Shanghai, and worked as an ML Software Engineer at Yahoo in San Francisco. He presents at major industry events like the Open Source Summit and writes tech content for leading publications such as Towards Data Science and DZone. His passion for ML extends beyond the workplace; in his free time, he trains ML models and experiments with unique architectures. Frank holds MS and BS degrees in Electrical Engineering from Stanford University.

Jiang Chen
Jiang Chen is the Head of AI Platform and Ecosystem at Zilliz. With years of experience in data infrastructures and information retrieval, Jiang previously served as a tech lead and product manager for Search Indexing at Google. Jiang holds a Master's degree in Computer Science from the University of Michigan, Ann Arbor.

Yujian Tang
Yujian Tang is a Developer Advocate at Zilliz. He has a background as a software engineer working on AutoML at Amazon. Yujian studied Computer Science, Statistics, and Neuroscience with research papers published to conferences including IEEE Big Data. He enjoys drinking bubble tea, spending time with family, and being near water.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Website: https://zilliz.com/
Neural Priming for Sample-Efficient Adaptation: https://arxiv.org/abs/2306.10191
LIMA: Less Is More for Alignment: https://arxiv.org/abs/2305.11206
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT: https://arxiv.org/abs/2004.12832
Milvus Vector Database by Zilliz: https://zilliz.com/what-is-milvus

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Timestamps:
[00:00] Demetrios' musical intro
[04:36] Vector Databases vs. LLMs
[07:51] Relevance Over Speed
[12:55] Pipelines
[16:19] Vector Databases Integration Benefits
[26:42] Database Diversity Market
[27:38] Milvus vs. Pinecone
[30:22] Vector DB for Training & Deployment
[34:32] Future proof of AI applications
[45:16] Data Size and Quality
[48:53] ColBERT Model
[54:25] Vector Data Consistency Best Practices
[57:24] Wrap up
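For context on what a purpose-built vector database optimizes, the core operation is nearest-neighbor search over embeddings. A brute-force numpy baseline of that operation (roughly what a system like Milvus accelerates with approximate nearest-neighbor indexes and scales beyond a single machine) looks like this; the corpus and dimensions are synthetic.

```python
# Brute-force cosine-similarity search over embeddings in numpy. This is the
# operation a purpose-built vector database (e.g. Milvus) accelerates with
# approximate nearest-neighbor indexes; shown here only as a baseline sketch.
import numpy as np

rng = np.random.default_rng(7)
corpus = rng.normal(size=(10_000, 384))                   # 10k document embeddings
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)   # L2-normalize once up front

def top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar corpus vectors to the query."""
    q = query / np.linalg.norm(query)
    scores = corpus @ q                                   # cosine similarity after normalization
    return np.argsort(-scores)[:k]

query_vec = rng.normal(size=384)
print(top_k(query_vec))  # indices of the 5 nearest documents
```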

Oracle University Podcast
The OCI AI Portfolio

Mar 5, 2024 · 25:33


Oracle has been actively focusing on bringing AI to the enterprise at every layer of its tech stack, be it SaaS apps, AI services, infrastructure, or data. In this episode, hosts Lois Houston and Nikita Abraham, along with senior instructors Hemant Gahankari and Himanshu Raj, discuss OCI AI and Machine Learning services. They also go over some key OCI Data Science concepts and responsible AI principles. Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode. ------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Lois: Welcome to the Oracle University Podcast! I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hey everyone! In our last episode, we dove into Generative AI and Language Learning Models.  Lois: Yeah, that was an interesting one. But today, we're going to discuss the AI and machine learning services offered by Oracle Cloud Infrastructure, and we'll look at the OCI AI infrastructure. Nikita: I'm also going to try and squeeze in a couple of questions on a topic I'm really keen about, which is responsible AI. To take us through all of this, we have two of our colleagues, Hemant Gahankari and Himanshu Raj. Hemant is a Senior Principal OCI Instructor and Himanshu is a Senior Instructor on AI/ML. So, let's get started! 01:16 Lois: Hi Hemant! We're so excited to have you here! We know that Oracle has really been focusing on bringing AI to the enterprise at every layer of our stack.  Hemant: It all begins with data and infrastructure layers. OCI AI services consume data, and AI services, in turn, are consumed by applications.  This approach involves extensive investment from infrastructure to SaaS applications. Generative AI and massive scale models are the more recent steps. Oracle AI is the portfolio of cloud services for helping organizations use the data they may have for the business-specific uses.  Business applications consume AI and ML services. The foundation of AI services and ML services is data. AI services contain pre-built models for specific uses. Some of the AI services are pre-trained, and some can be additionally trained by the customer with their own data.  AI services can be consumed by calling the API for the service, passing in the data to be processed, and the service returns a result. There is no infrastructure to be managed for using AI services.  02:37 Nikita: How do I access OCI AI services? Hemant: OCI AI services provide multiple methods for access. The most common method is the OCI Console. The OCI Console provides an easy to use, browser-based interface that enables access to notebook sessions and all the features of all the data science, as well as AI services.  The REST API provides access to service functionality but requires programming expertise. And API reference is provided in the product documentation. 
OCI also provides programming language SDKs for Java, Python, TypeScript, JavaScript, .Net, Go, and Ruby. The command line interface provides both quick access and full functionality without the need for scripting.  03:31 Lois: Hemant, what are the types of OCI AI services that are available?  Hemant: OCI AI services is a collection of services with pre-built machine learning models that make it easier for developers to build a variety of business applications. The models can also be custom trained for more accurate business results. The different services provided are digital assistant, language, vision, speech, document understanding, anomaly detection.  04:03 Lois: I know we're going to talk about them in more detail in the next episode, but can you introduce us to OCI Language, Vision, and Speech? Hemant: OCI Language allows you to perform sophisticated text analysis at scale. Using the pre-trained and custom models, you can process unstructured text to extract insights without data science expertise. Pre-trained models include language detection, sentiment analysis, key phrase extraction, text classification, named entity recognition, and personal identifiable information detection.  Custom models can be trained for named entity recognition and text classification with domain-specific data sets. In text translation, natural machine translation is used to translate text across numerous languages.  Using OCI Vision, you can upload images to detect and classify objects in them. Pre-trained models and custom models are supported. In image analysis, pre-trained models perform object detection, image classification, and optical character recognition. In image analysis, custom models can perform custom object detection by detecting the location of custom objects in an image and providing a bounding box.  The OCI Speech service is used to convert media files to readable texts that's stored in JSON and SRT format. Speech enables you to easily convert media files containing human speech into highly exact text transcriptions.  05:52 Nikita: That's great. And what about document understanding and anomaly detection? Hemant: Using OCI document understanding, you can upload documents to detect and classify text and objects in them. You can process individual files or batches of documents. In OCR, document understanding can detect and recognize text in a document. In text extraction, document understanding provides the word level and line level text, and the bounding box, coordinates of where the text is found.  In key value extraction, document understanding extracts a predefined list of key value pairs of information from receipts, invoices, passports, and driver IDs. In table extraction, document understanding extracts content in tabular format, maintaining the row and column relationship of cells. In document classification, the document understanding classifies documents into different types.  The OCI Anomaly Detection service is a service that analyzes large volume of multivariate or univariate time series data. The Anomaly Detection service increases the reliability of businesses by monitoring their critical assets and detecting anomalies early with high precision. Anomaly Detection is the identification of rare items, events, or observations in data that differ significantly from the expectation.  07:34 Nikita: Where is Anomaly Detection most useful? 
Hemant: The Anomaly Detection service is designed to help with analyzing large amounts of data and identifying the anomalies at the earliest possible time with maximum accuracy. Different sectors, such as utility, oil and gas, transportation, manufacturing, telecommunications, banking, and insurance use Anomaly Detection service for their day-to-day activities.  08:02 Lois: Ok.. and the first OCI AI service you mentioned was digital assistant… Hemant: Oracle Digital Assistant is a platform that allows you to create and deploy digital assistants, which are AI driven interfaces that help users accomplish a variety of tasks with natural language conversations. When a user engages with the Digital Assistant, the Digital Assistant evaluates the user input and routes the conversation to and from the appropriate skills.  Digital Assistant greets the user upon access. Upon user requests, list what it can do and provide entry points into the given skills. It routes explicit user requests to the appropriate skills. And it also handles interruptions to flows and disambiguation. It also handles requests to exit the bot.  09:00 Nikita: Excellent! Let's bring Himanshu in to tell us about machine learning services. Hi Himanshu! Let's talk about OCI Data Science. Can you tell us a bit about it? Himanshu: OCI Data Science is the cloud service focused on serving the data scientist throughout the full machine learning life cycle with support for Python and open source.  The service has many features, such as model catalog, projects, JupyterLab notebook, model deployment, model training, management, model explanation, open source libraries, and AutoML.  09:35 Lois: Himanshu, what are the core principles of OCI Data Science?  Himanshu: There are three core principles of OCI Data Science. The first one, accelerated. The first principle is about accelerating the work of the individual data scientist. OCI Data Science provides data scientists with open source libraries along with easy access to a range of compute power without having to manage any infrastructure. It also includes Oracle's own library to help streamline many aspects of their work.  The second principle is collaborative. It goes beyond an individual data scientist's productivity to enable data science teams to work together. This is done through the sharing of assets, reducing duplicative work, and putting reproducibility and auditability of models for collaboration and risk management.  Third is enterprise grade. That means it's integrated with all the OCI Security and access protocols. The underlying infrastructure is fully managed. The customer does not have to think about provisioning compute and storage. And the service handles all the maintenance, patching, and upgrades so user can focus on solving business problems with data science.  10:50 Nikita: Let's drill down into the specifics of OCI Data Science. So far, we know it's cloud service to rapidly build, train, deploy, and manage machine learning models. But who can use it? Where is it? And how is it used? Himanshu: It serves data scientists and data science teams throughout the full machine learning life cycle.  Users work in a familiar JupyterLab notebook interface, where they write Python code. And how it is used? So users preserve their models in the model catalog and deploy their models to a managed infrastructure.  11:25 Lois: Walk us through some of the key terminology that's used. Himanshu: Some of the important product terminology of OCI Data Science are projects. 
The projects are containers that enable data science teams to organize their work. They represent collaborative work spaces for organizing and documenting data science assets, such as notebook sessions and models.  Note that tenancy can have as many projects as needed without limits. Now, this notebook session is where the data scientists work. Notebook sessions provide a JupyterLab environment with pre-installed open source libraries and the ability to add others. Notebook sessions are interactive coding environment for building and training models.  Notebook sessions run in a managed infrastructure and the user can select CPU or GPU, the compute shape, and amount of storage without having to do any manual provisioning. The other important feature is Conda environment. It's an open source environment and package management system and was created for Python programs.  12:33 Nikita: What is a Conda environment used for? Himanshu: It is used in the service to quickly install, run, and update packages and their dependencies. Conda easily creates, saves, loads, and switches between environments in your notebooks sessions. 12:46 Nikita: Earlier, you spoke about the support for Python in OCI Data Science. Is there a dedicated library? Himanshu: Oracle's Accelerated Data Science ADS SDK is a Python library that is included as part of OCI Data Science.  ADS has many functions and objects that automate or simplify the steps in the data science workflow, including connecting to data, exploring, and visualizing data. Training a model with AutoML, evaluating models, and explaining models. In addition, ADS provides a simple interface to access the data science service mode model catalog and other OCI services, including object storage.  13:24 Lois: I also hear a lot about models. What are models? Himanshu: Models define a mathematical representation of your data and business process. You create models in notebooks, sessions, inside projects.  13:36 Lois: What are some other important terminologies related to models? Himanshu: The next terminology is model catalog. The model catalog is a place to store, track, share, and manage models.  The model catalog is a centralized and managed repository of model artifacts. A stored model includes metadata about the provenance of the model, including Git-related information and the script. Our notebook used to push the model to the catalog. Models stored in the model catalog can be shared across members of a team, and they can be loaded back into a notebook session.  The next one is model deployments. Model deployments allow you to deploy models stored in the model catalog as HTTP endpoints on managed infrastructure.  14:24 Lois: So, how do you operationalize these models? Himanshu: Deploying machine learning models as web applications, HTTP API endpoints, serving predictions in real time is the most common way to operationalize models. HTTP endpoints or the API endpoints are flexible and can serve requests for the model predictions. Data science jobs enable you to define and run a repeatable machine learning tasks on fully managed infrastructure.  Nikita: Thanks for that, Himanshu.  14:57 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You'll find training on everything from cloud computing, database, and security, artificial intelligence, and machine learning, all free to subscribers. So, what are you waiting for? 
Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started.  15:25 Nikita: Welcome back! The Oracle AI Stack consists of AI services and machine learning services, and these services are built using AI infrastructure. So, let's move on to that. Hemant, what are the components of OCI AI Infrastructure? Hemant: OCI AI Infrastructure is mainly composed of GPU-based instances. Instances can be virtual machines or bare metal machines. High performance cluster networking that allows instances to communicate to each other. Super clusters are a massive network of GPU instances with multiple petabytes per second of bandwidth. And a variety of fully managed storage options from a single byte to exabytes without upfront provisioning are also available.  16:14 Lois: Can we explore each of these components a little more? First, tell us, why do we need GPUs? Hemant: ML and AI needs lots of repetitive computations to be made on huge amounts of data. Parallel computing on GPUs is designed for many processes at the same time. A GPU is a piece of hardware that is incredibly good in performing computations.  GPU has thousands of lightweight cores, all working on their share of data in parallel. This gives them the ability to crunch through extremely large data set at tremendous speed.  16:54 Nikita: And what are the GPU instances offered by OCI? Hemant: GPU instances are ideally suited for model training and inference. Bare metal and virtual machine compute instances powered by NVIDIA GPUs H100, A100, A10, and V100 are made available by OCI.  17:14 Nikita: So how do we choose what to train from these different GPU options?  Hemant: For large scale AI training, data analytics, and high performance computing, bare metal instances BM 8 X NVIDIA H100 and BM 8 X NVIDIA A100 can be used.  These provide up to nine times faster AI training and 30 times higher acceleration for AI inferencing. The other bare metal and virtual machines are used for small AI training, inference, streaming, gaming, and virtual desktop infrastructure.  17:53 Lois: And why would someone choose the OCI AI stack over its counterparts? Hemant: Oracle offers all the features and is the most cost effective option when compared to its counterparts.  For example, BM GPU 4.8 version 2 instance costs just $4 per hour and is used by many customers.  Superclusters are a massive network with multiple petabytes per second of bandwidth. It can scale up to 4,096 OCI bare metal instances with 32,768 GPUs.  We also have a choice of bare metal A100 or H100 GPU instances, and we can select a variety of storage options, like object store, or block store, or even file system. For networking speeds, we can reach 1,600 GB per second with A100 GPUs and 3,200 GB per second with H100 GPUs.  With OCI storage, we can select local SSD up to four NVMe drives, block storage up to 32 terabytes per volume, object storage up to 10 terabytes per object, file systems up to eight exabyte per file system. OCI File system employs five replicated storage located in different fault domains to provide redundancy for resilient data protection.  HPC file systems, such as BeeGFS and many others are also offered. OCI HPC file systems are available on Oracle Cloud Marketplace and make it easy to deploy a variety of high performance file servers.  19:50 Lois: I think a discussion on AI would be incomplete if we don't talk about responsible AI. 
We're using AI more and more every day, but can we actually trust it? Hemant: For us to trust AI, it must be driven by ethics that guide us as well. Nikita: And do we have some principles that guide the use of AI? Hemant: AI should be lawful, complying with all applicable laws and regulations. AI should be ethical, that is it should ensure adherence to ethical principles and values that we uphold as humans. And AI should be robust, both from a technical and social perspective. Because even with the good intentions, AI systems can cause unintentional harm. AI systems do not operate in a lawless world. A number of legally binding rules at national and international level apply or are relevant to the development, deployment, and use of AI systems today. The law not only prohibits certain actions but also enables others, like protecting rights of minorities or protecting environment. Besides horizontally applicable rules, various domain-specific rules exist that apply to particular AI applications. For instance, the medical device regulation in the health care sector.  In AI context, equality entails that the systems' operations cannot generate unfairly biased outputs. And while we adopt AI, citizens right should also be protected.  21:30 Lois: Ok, but how do we derive AI ethics from these? Hemant: There are three main principles.  AI should be used to help humans and allow for oversight. It should never cause physical or social harm. Decisions taken by AI should be transparent and fair, and also should be explainable. AI that follows the AI ethical principles is responsible AI.  So if we map the AI ethical principles to responsible AI requirements, these will be like, AI systems should follow human-centric design principles and leave meaningful opportunity for human choice. This means securing human oversight. AI systems and environments in which they operate must be safe and secure, they must be technically robust, and should not be open to malicious use.  The development, and deployment, and use of AI systems must be fair, ensuring equal and just distribution of both benefits and costs. AI should be free from unfair bias and discrimination. Decisions taken by AI to the extent possible should be explainable to those directly and indirectly affected.  23:01 Nikita: This is all great, but what does a typical responsible AI implementation process look like?  Hemant: First, a governance needs to be put in place. Second, develop a set of policies and procedures to be followed. And once implemented, ensure compliance by regular monitoring and evaluation.  Lois: And this is all managed by developers? Hemant: Typical roles that are involved in the implementation cycles are developers, deployers, and end users of the AI.  23:35 Nikita: Can we talk about AI specifically in health care? How do we ensure that there is fairness and no bias? Hemant: AI systems are only as good as the data that they are trained on. If that data is predominantly from one gender or racial group, the AI systems might not perform as well on data from other groups.  24:00 Lois: Yeah, and there's also the issue of ensuring transparency, right? Hemant: AI systems often make decisions based on complex algorithms that are difficult for humans to understand. As a result, patients and health care providers can have difficulty trusting the decisions made by the AI. AI systems must be regularly evaluated to ensure that they are performing as intended and not causing harm to patients.  
24:29 Nikita: Thank you, Hemant and Himanshu, for this really insightful session. If you're interested in learning more about the topics we discussed today, head on over to mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course.  Lois: That's right, Niki. You'll find demos that you watch as well as skill checks that you can attempt to better your understanding. In our next episode, we'll get into the OCI AI Services we discussed today and talk about them in more detail. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 25:05 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
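As an editorial aside to the episode above: to make the OCI Data Science workflow Himanshu describes a little more concrete, here is a minimal sketch of the kind of Python a data scientist might run inside a notebook session. The dataset, column names, and file names are hypothetical, and the model catalog and model deployment steps are only summarized in comments because the exact ADS SDK calls vary by version.

# Minimal sketch of training a model in an OCI Data Science notebook session.
# Everything here is illustrative; the CSV, features, and target are hypothetical.
import pandas as pd
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("customer_churn.csv")          # data staged in the notebook session
X = df.drop(columns=["churned"])
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Serialize the trained model as a local artifact. In OCI Data Science you would then
# register the artifact in the model catalog (the ADS SDK provides helpers for preparing
# and saving artifacts) and create a model deployment to expose it as an HTTP endpoint,
# which is the operationalization path described in the episode.
joblib.dump(model, "model.joblib")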

Oracle University Podcast
Everything You Need to Know About the MySQL HeatWave Implementation Associate Certification

Oracle University Podcast

Play Episode Listen Later Feb 13, 2024 14:33


What is MySQL HeatWave? How do I get certified in it? Where do I start? Listen to Lois Houston and Nikita Abraham, along with MySQL Developer Scott Stroz, answer all these questions and more on this week's episode of the Oracle University Podcast. MySQL Document Store: https://oracleuniversitypodcast.libsyn.com/mysql-document-store Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this  series of informative podcasts, we'll bring you foundational training on the most popular  Oracle technologies. Let's get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I'm Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! For the last two weeks, we've been having really exciting discussions on everything AI. We covered the basics of artificial intelligence and machine learning, and we're taking a short break from that today to talk about the new MySQL HeatWave Implementation Associate Certification with MySQL Developer Advocate Scott Stroz. 00:59 Nikita: You may remember Scott from an episode last year where he came on to discuss MySQL Document Store. We'll post the link to that episode in the show notes so you can listen to it if you haven't already. Lois: Hi Scott! Thanks for joining us again. Before diving into the certification, tell us, what is MySQL HeatWave?  01:19 Scott: Hi Lois, Hi Niki. I'm so glad to be back. So, MySQL HeatWave Database Service is a fully managed database that is capable of running transactional and analytic queries in a single database instance. This can be done across data warehouses and data lakes. We get all the benefits of analytic queries without the latency and potential security issues of performing standard extract, transform, and load, or ETL, operations. Some other MySQL HeatWave database service features are automated system updates and database backups, high availability, in-database machine learning with AutoML, MySQL Autopilot for managing instance provisioning, and enhanced data security.  HeatWave is the only cloud database service running MySQL that is built, managed, and supported by the MySQL Engineering team. 02:14 Lois: And where can I find MySQL HeatWave? Scott: MySQL HeatWave is only available in the cloud. MySQL HeatWave instances can be provisioned in Oracle Cloud Infrastructure or OCI, Amazon Web Services (AWS), and Microsoft Azure. Now, some features though are only available in Oracle Cloud, such as access to MySQL Document Store. 02:36 Nikita: Scott, you said MySQL HeatWave runs transactional and analytic queries in a single instance. Can you elaborate on that? Scott: Sure, Niki. So, MySQL HeatWave allows developers, database administrators, and data analysts to run transactional queries (OLTP) and analytic queries (OLAP).  OLTP, or online transaction processing, allows for real-time execution of database transactions. A transaction is any kind of insertion, deletion, update, or query of data. Most DBAs and developers work with this kind of processing in their day-to-day activities. 
  OLAP, or online analytical processing, is one way to handle multi-dimensional analytical queries typically used for reporting or data analytics. OLTP system data must typically be exported, aggregated, and imported into an OLAP system. This procedure is called ETL as I mentioned – extract, transform, and load. With large datasets, ETL processes can take a long time to complete, so analytic data could be “old” by the time it is available in an OLAP system. There is also an increased security risk in moving the data to an external source. 03:56 Scott: MySQL HeatWave eliminates the need for time-consuming ETL processes. We can actually get real-time analytics from our data since HeatWave allows for OLTP and OLAP in a single instance. I should note, this also includes analytic from JSON data that may be stored in the database. Another advantage is that applications can use MySQL HeatWave without changing any of the application code. Developers only need to point their applications at the MySQL HeatWave databases. MySQL HeatWave is fully compatible with on-premise MySQL instances, which can allow for a seamless transition to the cloud. And one other thing. When MySQL HeatWave has OLAP features enabled, MySQL can determine what type of query is being executed and route it to either the normal database system or the in-memory database. 04:52 Lois: That's so cool! And what about the other features you mentioned, Scott? Automated updates and backups, high availability… Scott: Right, Lois. But before that, I want to tell you about the in-memory query accelerator. MySQL HeatWave offers a massively parallel, in-memory hybrid columnar query processing engine. It provides high performance by utilizing algorithms for distributed query processing. And this query processing in MySQL HeatWave is optimized for cloud environments.  MySQL HeatWave can be configured to automatically apply system updates, so you will always have the latest and greatest version of MySQL. Then, we have automated backups. By this, I mean MySQL HeatWave can be configured to provide automated backups with point-in-time recovery to ensure data can be restored to a particular date and time. MySQL HeatWave also allows us to define a retention plan for our database backups, that means how long we keep the backups before they are deleted. High availability with MySQL HeatWave allows for more consistent uptime. When using high availability, MySQL HeatWave instances can be provisioned across multiple availability domains, providing automatic failover for when the primary node becomes unavailable. All availability domains within a region are physically separated from each other to mitigate the possibility of a single point of failure. 06:14 Scott: We also have MySQL Lakehouse. Lakehouse allows for the querying of data stored in object storage in various formats. This can be CSV, Parquet, Avro, or an export format from other database systems. And basically, we point Lakehouse at data stored in Oracle Cloud, and once it's ingested, the data can be queried just like any other data in a database. Lakehouse supports querying data up to half a petabyte in size using the HeatWave engine. And this allows users to take advantage of HeatWave for non-MySQL workloads. MySQL AutoPilot is a part of MySQL HeatWave and can be used to predict the number of HeatWave nodes a system will need and automatically provision them as part of a cluster. AutoPilot has features that can handle automatic thread pooling and database shape predicting. 
A “shape” is one of the many different CPU, memory, and ethernet traffic configurations available for MySQL HeatWave. MySQL HeatWave includes some advanced security features such as asymmetric encryption and automated data masking at query execution. As you can see, there are a lot of features covered under the HeatWave umbrella! 07:31 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You'll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started.  08:02 Nikita: Welcome back! Now coming to the certification, who can actually take this exam, Scott? Scott: The MySQL HeatWave Implementation Associate Certification Exam is designed specifically for administrators and data scientists who want to provision, configure, and manage MySQL HeatWave for transactions, analytics, machine learning, and Lakehouse. 08:22 Nikita: Can someone who's just graduated, say an engineering graduate interested in data analytics, take this certification? Are there any prerequisites? What are the career prospects for them? Scott: There are no mandatory prerequisites, but anyone who wants to take the exam should have experience with MySQL HeatWave and other aspects of OCI, such as virtual cloud networks and identity and security processes. Also, the learning path on MyLearn will be extremely helpful when preparing for the exam, but you are not required to complete the learning path before registering for the exam. The exam focuses more on getting MySQL HeatWave running (and keeping it running) than accessing the data. That doesn't mean it is not helpful for someone interested in data analytics. I think it can be helpful for data analysts to understand how the system providing the data functions, even if it is at just a high level. It is also possible that data analysts might be responsible for setting up their own systems and importing and managing their own data. 09:23 Lois: And how do I get started if I want to get certified on MySQL HeatWave? Scott: So, you'll first need to go to mylearn.oracle.com and look for the “Become a MySQL HeatWave Implementation Associate” learning path. The learning path consists of over 10 hours of training across 8 different courses.  These courses include “Getting Started with MySQL HeatWave Database Service,” which offers an introduction to some Oracle Cloud functionality such as security and networking, as well as showing one way to connect to a MySQL HeatWave instance. Another course demonstrates how to configure MySQL instances and copy that configuration to other instances. Other courses cover how to migrate data into MySQL HeatWave, set up and manage high availability, and configure HeatWave for OLAP. You'll find labs where you can perform hands-on activities, student and activity guides, and skill checks to test yourself along the way. And there's also the option to Ask the Instructor if you have any questions you need answers to. You can also access the Oracle University Learning Community and discuss topics with others on the same journey. The learning path includes a practice exam to check your readiness to pass the certification exam. 
10:33 Lois: Yeah, and remember, access to the entire learning path is free so there's nothing stopping you from getting started right away. Now Scott, what does the certification test you on? Scott: The MySQL HeatWave Implementation exam, which is an associate-level exam, covers various topics. It will validate your ability to identify key features and benefits of MySQL HeatWave and describe the MySQL HeatWave architecture; identify Virtual Cloud Network (VCN) requirements and the different methods of connecting to a MySQL HeatWave instance; manage the automatic backup process and restore database systems from these backups; configure and manage read replicas and inbound replication channels; import data into MySQL HeatWave; configure and manage high availability and clustering of MySQL HeatWave instances. I know this seems like a lot of different topics. That is why we recommend anyone interested in the exam follow the learning path. It will help make sure you have the exposure to all the topics that are covered by the exam. 11:35 Lois: Tell us more about the certification process itself. Scott: While the courses we already talked about are valuable when preparing for the exam, nothing is better than hands-on experience. We recommend that candidates have hands-on experience with MySQL HeatWave with real-world implementations. The format of the exam is Multiple Choice. It is 90 minutes long and consists of 65 questions. When you've taken the recommended training and feel ready to take the certification exam, you need to purchase the exam and register for it. You go through the section on things to do before the exam and the exam policies, and then all that's left to do is schedule the date and time of the exam according to when is convenient for you. 12:16 Nikita: And once you've finished the exam? Scott: When you're done your score will be displayed on the screen when you finish the exam. You will also receive an email indicating whether you passed or failed. You can view your exam results and full score report in Oracle CertView, Oracle's certification portal. From CertView, you can download and print your eCertificate and even share your newly earned badge on places like Facebook, Twitter, and LinkedIn. 12:38 Lois: And for how long does the certification remain valid, Scott? Scott: There is no expiration date for the exam, so the certification will remain valid for as long as the material that is covered remains relevant.  12:49 Nikita: What's the next step for me after I get this certification? What other training can I take? Scott: So, because this exam is an associate level exam, it is kind of a stepping stone along a person's MySQL training. I do not know if there are plans for a professional level exam for HeatWave, but Oracle University has several other training programs that are MySQL-specific. There are learning paths to help prepare for the MySQL Database Administrator and MySQL Database Developer exams. As with the HeatWave learning paths, the learning paths for these exams include video tutorials, hands-on activities, skill checks, and practice exams. 13:27 Lois: I think you've told us everything we need to know about this certification, Scott. Are there any parting words you might have? Scott: We know that the whole process of training and getting certified may seem daunting, but we've really tried to simplify things for you with the “Become a MySQL HeatWave Implementation Associate” learning path. 
It not only prepares you for the exam but also gives you experience with features of MySQL HeatWave that will surely be valuable in your career. 13:51 Lois: Thanks so much, Scott, for joining us today. Nikita: Yeah, we've had a great time with you. Scott: Thanks for having me. Lois: Next week, we'll get back to our focus on AI with a discussion on deep learning. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off. 14:07 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University  Podcast.
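One point worth making concrete from Scott's explanation above: applications talk to MySQL HeatWave exactly as they would to any MySQL database, and when a HeatWave cluster is attached, eligible analytic queries are offloaded to the in-memory engine automatically. The sketch below uses the standard mysql-connector-python driver; the host, credentials, and orders table are hypothetical placeholders.

# Minimal sketch: the same client code works with or without HeatWave acceleration enabled.
import mysql.connector

conn = mysql.connector.connect(
    host="myheatwave.example.com",   # hypothetical endpoint
    user="analytics_user",
    password="********",
    database="sales",
)
cur = conn.cursor()

# An analytic-style aggregation; with HeatWave enabled, the optimizer can route this
# query to the in-memory columnar engine transparently, with no application changes.
cur.execute("""
    SELECT customer_region, SUM(order_total) AS revenue
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY customer_region
    ORDER BY revenue DESC
""")
for region, revenue in cur.fetchall():
    print(region, revenue)

cur.close()
conn.close()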

SuperDataScience
753: Blend Any Programming Languages in Your ML Workflows, with Dr. Greg Michaelson

SuperDataScience

Play Episode Listen Later Jan 30, 2024 86:20


Explore the future of collaborative ML workflows in this engaging episode with Dr. Greg Michaelson, Co-Founder of Zerve. Dr. Michaelson introduces the groundbreaking Zerve IDE and Pypelines project, addressing the critical gap in AutoML for commercial use and pinpointing why many A.I. projects don't meet their objectives. Gain insights into steering AI initiatives towards success and enhancing project communication, all in this insightful session. This episode is brought to you by Oracle NetSuite business software (https://netsuite.com/superdata), and by Prophets of AI (https://prophetsofai.com), the leading agency for AI experts. Interested in sponsoring a SuperDataScience Podcast episode? Visit https://passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • Why Zerve IDE is so sorely needed [04:50] • Pypelines: AutoML open-source in python [30:00] • Why most commercial A.I. projects fail and how to ensure they succeed [47:45] • How AutoML will impact the role of the data scientist [53:21] • Greg's background as a pastor and working at DataRobot [1:03:40] • How to develop impressive communication and storytelling skills [1:16:16] Additional materials: www.superdatascience.com/753

MLOps.community
RAG Has Been Oversimplified // Yujian Tang // #206

MLOps.community

Play Episode Listen Later Jan 23, 2024 48:55


Yujian is working as a Developer Advocate at Zilliz, where they develop and write tutorials for proof of concepts for large language model applications. They also give talks on vector databases, LLM Apps, semantic search, and tangential spaces. MLOps podcast #206 with Yujian Tang, Developer Advocate at Zilliz, RAG Has Been Oversimplified, brought to us by our Premium Brand Partner, Zilliz // Abstract In the world of development, Retrieval Augmented Generation (RAG) has often been oversimplified. Despite the industry's push, the practical application of RAG reveals complexities beyond its apparent simplicity. This talk delves into the nuanced challenges and considerations developers encounter when working with RAG, providing a candid exploration of the intricacies often overlooked in the broader narrative. // Bio Yujian Tang is a Developer Advocate at Zilliz. He has a background as a software engineer working on AutoML at Amazon. Yujian studied Computer Science, Statistics, and Neuroscience with research papers published to conferences including IEEE Big Data. He enjoys drinking bubble tea, spending time with family, and being near water. // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Website: zilliz.com --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Yujian on LinkedIn: linkedin.com/in/yujiantang Timestamps: [00:00] Yujian's preferred coffee [00:17] Takeaways [02:42] Please like, share, and subscribe to our MLOps channels! [02:55] The hero of the LLM space [05:42] Embeddings into Vector databases [09:15] What is large and what is small LLM consensus [10:10] QA Bot behind the scenes [13:59] Fun fact getting more context [17:05] RAGs eliminate the ability of LLMs to hallucinate [18:50] Critical part of the rag stack [19:57] Building citations [20:48] Difference between context and relevance [26:11] Missing prompt tooling [27:46] Similarity search [29:54] RAG Optimization [33:03] Interacting with LLMs and tradeoffs [35:22] RAGs not suited for [39:33] Fashion App [42:43] Multimodel Rags vs LLM RAGs [44:18] Multimodel use cases [46:50] Video citations [47:31] Wrap up
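At its core, the retrieval step this conversation keeps returning to is nearest-neighbor search over embeddings. The toy sketch below is deliberately generic rather than Milvus- or Zilliz-specific: embed() is a stand-in for a real embedding model, and a vector database would perform the same top-k similarity search at scale with approximate-nearest-neighbor indexes.

# Toy illustration of RAG retrieval: embed documents, embed the query, return the
# most similar chunks. embed() is a placeholder, not a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))   # deterministic fake vector
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

docs = [
    "Vector databases store embeddings and support similarity search.",
    "Retrieval augmented generation grounds LLM answers in retrieved context.",
    "Bubble tea comes in many flavors.",
]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2):
    q = embed(query)
    scores = doc_vecs @ q                      # cosine similarity; vectors are unit length
    top = np.argsort(scores)[::-1][:k]
    return [(docs[i], float(scores[i])) for i in top]

print(retrieve("How does retrieval reduce hallucination?"))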

Oracle University Podcast
Autonomous Database Tools

Oracle University Podcast

Play Episode Listen Later Jan 16, 2024 36:04


In this episode, hosts Lois Houston and Nikita Abraham speak with Oracle Database experts about the various tools you can use with Autonomous Database, including Oracle Application Express (APEX), Oracle Machine Learning, and more.   Oracle MyLearn: https://mylearn.oracle.com/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Tamal Chatterjee, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------   Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast. I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! We spent the last two episodes exploring Oracle Autonomous Database's deployment options: Serverless and Dedicated. Today, it's tool time! Lois: That's right, Niki. We'll be chatting with some of our Database experts on the tools that you can use with the Autonomous Database. We're going to hear from Patrick Wheeler, Kay Malcolm, Sangeetha Kuppuswamy, and Thea Lazarova. Nikita: First up, we have Patrick, to take us through two important tools. Patrick, let's start with Oracle Application Express. What is it and how does it help developers? 01:15 Patrick: Oracle Application Express, also known as APEX-- or perhaps APEX, we're flexible like that-- is a low-code development platform that enables you to build scalable, secure, enterprise apps with world-class features that can be deployed anywhere. Using APEX, developers can quickly develop and deploy compelling apps that solve real problems and provide immediate value. You don't need to be an expert in a vast array of technologies to deliver sophisticated solutions. Focus on solving the problem, and let APEX take care of the rest. 01:52 Lois: I love that it's so easy to use. OK, so how does Oracle APEX integrate with Oracle Database? What are the benefits of using APEX on Autonomous Database? Patrick: Oracle APEX is a fully supported, no-cost feature of Oracle Database. If you have Oracle Database, you already have Oracle APEX. You can access APEX from database actions. Oracle APEX on Autonomous Database provides a preconfigured, fully managed, and secure environment to both develop and deploy world-class applications. Oracle takes care of configuration, tuning, backups, patching, encryption, scaling, and more, leaving you free to focus on solving your business problems. APEX enables your organization to be more agile and develop solutions faster for less cost and with greater consistency. You can adapt to changing requirements with ease, and you can empower professional developers, citizen developers, and everyone else. 02:56 Nikita: So you really don't need to have a lot of specializations or be an expert to use APEX. That's so cool! Now, what are the steps involved in creating an application using APEX?  Patrick: You will be prompted to log in as the administrator at first. Then, you may create workspaces for your respective users and log in with those associated credentials. 
Application Express provides you with an easy-to-use, browser-based environment to load data, manage database objects, develop REST interfaces, and build applications which look and run great on both desktop and mobile devices. You can use APEX to develop a wide variety of solutions, import spreadsheets, and develop a single source of truth in minutes. Create compelling data visualizations against your existing data, deploy productivity apps to elegantly solve a business need, or build your next mission-critical data management application. There are no limits on the number of developers or end users for your applications. 04:01 Lois: Patrick, how does APEX use SQL? What role does SQL play in the development of APEX applications?  Patrick: APEX embraces SQL. Anything you can express with SQL can be easily employed in an APEX application. Application Express also enables low-code development, providing developers with powerful data management and data visualization components that deliver modern, responsive end user experiences out-of-the-box. Instead of writing code by hand, you're able to use intelligent wizards to guide you through the rapid creation of applications and components. Creating a new application from APEX App Builder is as easy as one, two, three. One, in App Builder, select a project name and appearance. Two, add pages and features to the app. Three, finalize settings, and click Create. 05:00 Nikita: OK. So, the other tool I want to ask you about is Oracle Machine Learning. What can you tell us about it, Patrick? Patrick: Oracle Machine Learning, or OML, is available with Autonomous Database. A new capability that we've introduced with Oracle Machine Learning is called Automatic Machine Learning, or AutoML. Its goal is to increase data scientist productivity while reducing overall compute time. In addition, AutoML enables non-experts to leverage machine learning by not requiring deep understanding of the algorithms and their settings. 05:37 Lois: And what are the key functions of AutoML? Patrick: AutoML consists of three main functions: Algorithm Selection, Feature Selection, and Model Tuning. With Automatic Algorithm Selection, the goal is to identify the in-database algorithms that are likely to achieve the highest model quality. Using metalearning, AutoML leverages machine learning itself to help find the best algorithm faster than with exhaustive search. With Automatic Feature Selection, the goal is to denoise data by eliminating features that don't add value to the model. By identifying the most predicted features and eliminating noise, model accuracy can often be significantly improved with a side benefit of faster model building and scoring. Automatic Model Tuning tunes algorithm hyperparameters, those parameters that determine the behavior of the algorithm, on the provided data. Auto Model Tuning can significantly improve model accuracy while avoiding manual or exhaustive search techniques, which can be costly both in terms of time and compute resources. 06:44 Lois: How does Oracle Machine Learning leverage the capabilities of Autonomous Database? Patrick: With Oracle Machine Learning, the full power of the database is accessible with the tremendous performance of parallel processing available, whether the machine learning algorithm is accessed via native database SQL or with OML4Py through Python or R.  07:07 Nikita: Patrick, talk to us about the Data Insights feature. How does it help analysts uncover hidden patterns and anomalies? 
Patrick: A feature I wanted to call the electromagnet, but they didn't let me. An analyst's job can often feel like looking for a needle in a haystack. So throw the switch and all that metallic stuff is going to slam up onto that electromagnet. Sure, there are going to be rusty old nails and screws and nuts and bolts, but there are going to be a few needles as well. It's far easier to pick the needles out of these few bits of metal than go rummaging around in a pile of hay, especially if you have allergies. That's more or less how our Insights tool works. Load your data, kick off a query, and grab a cup of coffee. Autonomous Database does all the hard work, scouring through this data looking for hidden patterns, anomalies, and outliers. Essentially, we run some analytic queries that predict expected values. And where the actual values differ significantly from expectation, the tool presents them here. Some of these might be uninteresting or obvious, but some are worthy of further investigation. You get this dashboard of various exceptional data patterns. Drill down on a specific gauge in this dashboard and significant deviations between actual and expected values are highlighted. 08:28 Lois: What a useful feature! Thank you, Patrick. Now, let's discuss some terms and concepts that are applicable to the Autonomous JSON Database with Kay. Hi Kay, what's the main focus of the Autonomous JSON Database? How does it support developers in building NoSQL-style applications? Kay: Autonomous Database supports the JavaScript Object Notation, also known as JSON, natively in the database. It supports applications that use the SODA API to store and retrieve JSON data or SQL queries to store and retrieve data stored in JSON-formatted data.  Oracle AJD is Oracle ATP, Autonomous Transaction Processing, but it's designed for developing NoSQL-style applications that use JSON documents. You can promote an AJD service to ATP. 09:22 Nikita: What makes the development of NoSQL-style, document-centric applications flexible on AJD?  Kay: Development of these NoSQL-style, document-centric applications is particularly flexible because the applications use schemaless data. This lets you quickly react to changing application requirements. There's no need to normalize the data into relational tables and no impediment to changing the data structure or organization at any time, in any way. A JSON document has its own internal structure, but no relation is imposed on separate JSON documents. Nikita: What does AJD do for developers? How does it actually help them? Kay: So Autonomous JSON Database, or AJD, is designed for you, the developer, to allow you to use simple document APIs and develop applications without having to know anything about SQL. That's a win. But at the same time, it does give you the ability to create highly complex SQL-based queries for reporting and analysis purposes. It has built-in binary JSON storage type, which is extremely efficient for searching and for updating. It also provides advanced indexing capabilities on the actual JSON data. It's built on Autonomous Database, so that gives you all of the self-driving capabilities we've been talking about, but you don't need a DBA to look after your database for you. You can do it all yourself. 11:00 Lois: For listeners who may not be familiar with JSON, can you tell us briefly what it is?  Kay: So I mentioned this earlier, but it's worth mentioning again. JSON stands for JavaScript Object Notation. 
It was originally developed as a human-readable way of providing information to interchange between different programs. So a JSON document is a set of fields. Each of these fields has a value, and those values can be of various data types. We can have simple strings, we can have integers, we can even have real numbers. We can have Booleans that are true or false. We can have date strings, and we can even have the special value null. Additionally, values can be objects, and objects are effectively whole JSON documents embedded inside a document. And of course, there's no limit on the nesting. You can nest as far as you like. Finally, we can have arrays, and an array can hold a list of scalar data types or a list of objects. 12:13 Nikita: Kay, how does the concept of schema apply to JSON databases? Kay: Now, JSON documents are stored in something that we call collections. Each document may have its own schema, its own layout of the JSON. So does this mean that JSON document databases are schemaless? Hmmm. Well, yes. But there's nothing to fear because you can always use a check constraint to enforce a schema constraint that you wish to introduce to your JSON data. Lois: Kay, what about indexing capabilities on JSON collections? Kay: You can create indexes on a JSON collection, and those indexes can be of various types, including our flexible search index, which indexes the entire content of the documents within the JSON collection, without having to know anything in advance about the schema of those documents.  Lois: Thanks Kay! 13:18 AI is being used in nearly every industry—healthcare, manufacturing, retail, customer service, transportation, agriculture, you name it! And, it's only going to get more prevalent and transformational in the future. So it's no wonder that AI skills are the most sought after by employers.  We're happy to announce a new OCI AI Foundations certification and course that is available—for FREE! Want to learn about AI? Then this is the best place to start! So, get going! Head over to mylearn.oracle.com to find out more.  13:54 Nikita: Welcome back! Sangeetha, I want to bring you in to talk about Oracle Text. Now I know that Oracle Database is not only a relational store but also a document store. And you can load text and JSON assets along with your relational assets in a single database.  When I think about Oracle and databases, SQL development is what immediately comes to mind. So, can you talk a bit about the power of SQL as well as its challenges, especially in schema changes? Sangeetha: Traditionally, Oracle has been all about SQL development. And with SQL development, it's an incredibly powerful language. But it does take some advanced knowledge to make the best of it. So SQL requires you to define your schema up front. And making changes to that schema could be a little tricky and sometimes a highly bureaucratic task. In contrast, JSON allows you to develop your schema as you go--the schemaless, perhaps schema-later model. By imposing less rigid requirements on the developer, it allows for a more fluid and Agile development style. 15:09 Lois: How does Oracle Text use SQL to index, search, and analyze text and documents that are stored in the Oracle Database? Sangeetha: Oracle Text can perform linguistic analyses on documents as well as search text using a variety of strategies, including keyword searching, context queries, Boolean operations, pattern matching, mixed thematic queries, HTML/XML section searching, and so on. 
It can also render search results in various formats, including unformatted text, HTML with term highlighting, and original document format. Oracle Text supports multiple languages and uses advanced relevance-ranking technology to improve search quality. Oracle Text also offers advanced features like classification, clustering, and support for information visualization metaphors. Oracle Text is now enabled automatically in Autonomous Database. It provides full-text search capabilities over text, XML, and JSON content. It can also extend current applications to make better use of textual fields, or build new applications specifically targeted at document searching. So you get all of the power of Oracle Database, a familiar development environment, and rock-solid Autonomous Database infrastructure for your text apps. We can deal with text in many different places and many different types of text. So it is not just in the database. We can deal with data that's outside of the database as well. 17:03 Nikita: How does it handle text in various places and formats, both inside and outside the database? Sangeetha: So in the database, we could be looking at a VARCHAR2 column, a LOB column, or binary LOB columns if we are talking about binary documents such as PDF or Word. Outside of the database, we might have a document on the file system or out on the web with URLs pointing to the document. If they are on the file system, then we would have a file name stored in the database table. And if they are on the web, then we would have a URL or a partial URL stored in the database. And we can then fetch the data from those locations and index it. We recognize many different document formats and extract the text from them automatically. So the basic forms we can deal with are plain text, HTML, JSON, XML, and then formatted documents like Word docs, PDF documents, PowerPoint documents, and many other document types. All of those are automatically handled by the system and processed into a format ready for indexing. And we are not restricted to English here, either. There are various stages in the index pipeline. A document starts at one end and is taken through the different stages until it finally reaches the index. 18:44 Lois: You mentioned the indexing pipeline. Can you take us through it? Sangeetha: So it starts with a data store. That's responsible for actually fetching the document. So once we fetch the document from the data store, we pass it on to the filter. And now the filter is responsible for processing binary documents into indexable text. So if you have, let's say, a PDF document, that will go through the filter, which will strip out things like images and return a stream of HTML text ready for indexing. Then we pass it on to the sectioner, which is responsible for identifying things like paragraphs and sentences. The output from the sectioner is fed into the lexer. The lexer is responsible for dividing the text into indexable words. The output of the lexer is fed into the index engine, which is responsible for laying out the index on disk. Storage, wordlist, and stoplist are some additional inputs there. So storage tells it exactly how to lay out the index on disk. The wordlist holds special preferences, such as word segmentation. And the stoplist is a list of words that we don't want to index. So each of these stages and inputs can be customized. 
Oracle has something known as the extensibility framework, which originally was designed to allow people to extend the capabilities of these products by adding new domain indexes. And this is what we've used to implement Oracle Text. So when the kernel sees the phrase INDEXTYPE IS ctxsys.context, it knows to handle all of the hard work of creating the index. 20:48 Nikita: Other than text indexing, Oracle Text offers additional operations, right? Can you share some examples of these operations? Sangeetha: So beyond the text index, there are other operations that we can do with Oracle Text, some of which are search related. Some examples of that are highlighting, markup, and snippets. Highlighting and markup are very similar. They are ways of fetching the results back from a search, marked up with highlighting within the document text. Snippet is very similar, but it's only bringing back the relevant chunks from the document that we are searching for. So rather than getting the whole document back, you just get a few lines showing the match in context. Then there's theme extraction. So Oracle Text is capable of figuring out what a text is all about. We have a very large knowledge base of the English language, which will allow you to understand the concepts and the themes in the document. Then there's entity extraction, which is the ability to find people, places, dates, times, zip codes, et cetera in the text. So this can be customized with your own user dictionary and your own user rules. 22:14 Lois: Moving on to advanced functionalities, how does Oracle Text utilize machine learning algorithms for document classification? And what are the key types of classifications? Sangeetha: Text analytics uses machine learning algorithms for document classification. We can process a large set of documents in a very efficient manner using Oracle's own machine learning algorithms. So you can look at that as basically three different headings. First of all, there's classification. And that comes in two different types-- supervised and unsupervised. Supervised classification means that you provide the training set, a set of documents that have already defined the particular characteristics that you're looking for. And then there's unsupervised classification, which allows the system itself to figure out which documents are similar to each other. It does that by looking at features within the documents. Each of those features is represented as a dimension in a massively high-dimensional feature space, and documents are clustered together according to nearness in that feature space. Again, with named entity recognition, we've already talked about that a little bit. And then finally, there is sentiment analysis, the ability to identify whether the document is positive or negative with respect to a given aspect. 23:56 Nikita: Now, for those who are already Oracle database users, how easy is it to enable text searching within applications using Oracle Text? Sangeetha: If you're already an Oracle database user, enabling text searching within your applications is quite straightforward. Oracle Text uses the same SQL language as the database. And it integrates seamlessly with your existing SQL. Oracle Text can be used from any programming language which has a SQL interface, meaning just about all of them.  24:32 Lois: OK from Oracle Text, I'd like to move on to Oracle Spatial Studio. 
Can you tell us more about this tool? Sangeetha: Spatial Studio is a no-code, self-service application that makes it easy to access the sorts of spatial features that we've been looking at, in particular, getting data prepared for use with spatial, visualizing results in maps and tables, and also doing the analysis and sharing results. Spatial Studio is included at no extra cost with Autonomous Database. The Studio web application itself has no additional cost, and it runs on the server. 25:13 Nikita: Let's talk a little more about the cost. How does the deployment of Spatial Studio work, in terms of the server it runs on?  Sangeetha: So, the server that it runs on, if it's running in the Cloud, that computing node would have some cost associated with it. It can also run on a free tier with a very small shape, just for evaluation and testing.  Spatial Studio is also available on the Oracle Cloud Marketplace. And there are a couple of self-paced workshops that you can access for installing and using Spatial Studio. 25:47 Lois: And how do developers access and work with Oracle Autonomous Database using Spatial Studio? Sangeetha: Oracle Spatial Studio allows you to access data in Oracle Database, including Oracle Autonomous Database. You can create connections to Oracle Autonomous Databases, and then you work with the data that's in the database. You can also use Spatial Studio to load data to Oracle Database, including Oracle Autonomous Database. So, you can load spreadsheets and files in common spatial formats. And once you've loaded your data or accessed data that already exists in your Autonomous Database, if that data does not already include native geometries (Oracle's native geometry type), then you can prepare the data if it has addresses or latitude and longitude coordinates as part of the data. 26:43 Nikita: What about visualizing and analyzing spatial data using Spatial Studio? Sangeetha: Once you have the data prepared, you can easily drag and drop and start to visualize your data, style it, and look at it in different ways. And then, most importantly, you can start to ask spatial questions and do all kinds of spatial analysis, like we've talked about earlier. Spatial Studio provides a GUI that allows you to perform those same kinds of spatial analysis, and the results can be dropped on the map and visualized so that you can actually see the results of the spatial questions that you're asking. When you've done some work, you can save your work in a project that you can return to later, and you can also publish and share the work you've done. 27:34 Lois: Thank you, Sangeetha. For the final part of our conversation today, we'll talk with Thea. Thea, thanks so much for joining us. Let's get the basics out of the way. How can data be loaded directly into Autonomous Database? Thea: Data can be loaded directly to ADB through applications such as SQL Developer, which can read data files, such as txt and xls, and load them directly into tables in ADB. 27:59 Nikita: I see. And is there a better method to load data into ADB? Thea: A more efficient and preferred method for loading data into ADB is to stage the data in cloud object store, preferably Oracle's, but Amazon S3 and Azure Blob Storage are also supported. Any file type can be staged in object store. Once the data is in object store, Autonomous Database can access it directly. Tools can be used to facilitate the data movement between object store and the database. 
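As a rough, editorial illustration of the object-store path Thea describes (using the DBMS_CLOUD package that comes up a little later in the episode), the sketch below drives the PL/SQL calls from Python through the python-oracledb driver. The credential name, user, auth token, object URI, and target table are all hypothetical, and the format options depend on your actual file.

# Hedged sketch: register an object store credential, then load a staged CSV with DBMS_CLOUD.
import oracledb

conn = oracledb.connect(user="admin", password="********",
                        dsn="myadb_high")      # TNS alias from the ADB wallet (hypothetical)
cur = conn.cursor()

# One-time step: store an OCI auth token as a database credential.
cur.execute("""
    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL(
        credential_name => 'OBJ_STORE_CRED',
        username        => 'oci_user@example.com',
        password        => 'auth-token-value');
    END;""")

# Copy the staged file from object storage into an existing table.
cur.execute("""
    BEGIN
      DBMS_CLOUD.COPY_DATA(
        table_name      => 'SALES_STAGE',
        credential_name => 'OBJ_STORE_CRED',
        file_uri_list   => 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/mytenancy/b/staging/o/sales.csv',
        format          => json_object('type' value 'csv', 'skipheaders' value '1'));
    END;""")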
28:27 Lois: Are there specific steps or considerations when migrating a physical database to Autonomous? Thea: A physical database can't simply be migrated to Autonomous because the database must be converted to a pluggable database, upgraded to 19c, and encrypted. Additionally, any changes to Oracle-shipped stored procedures or views must be found and reverted. All uses of container database admin privileges must be removed. And all legacy features that are not supported must be removed, such as legacy LOBs. Data Pump (expdp/impdp) must be used for migrating databases of versions 10.1 and above to Autonomous Database, as it addresses the issues just mentioned. For online migrations, GoldenGate must be used to keep the old and new databases in sync. 29:15 Nikita: When you're choosing the method for migration and loading, what are the factors to keep in mind? Thea: It's important to segregate the methods by functionality and limitations of use against Autonomous Database. The considerations are as follows. Number one, how large is the database to be imported? Number two, what is the input file format? Number three, does the method support non-Oracle database sources? And number four, does the method support using Oracle and/or third-party object store? 29:45 Lois: Now, let's move on to the tools that are available. What does the DBMS_CLOUD functionality do? Thea: The Oracle Autonomous Database has built-in functionality called DBMS_CLOUD, specifically designed so the database can move data back and forth with external sources through a secure and transparent process. DBMS_CLOUD allows data movement from the Oracle object store: data from any application or data source exported to text (.csv or JSON), or output from third-party data integration tools. DBMS_CLOUD can also access data stored on Object Storage from the other clouds, AWS S3 and Azure Blob Storage. DBMS_CLOUD does not impose any volume limit, so it's the preferred method to use. SQL*Loader can be used for loading data located on local client file systems into Autonomous Database. There are limits around OS and client machines when using SQL*Loader. 30:49 Nikita: So then, when should I use Data Pump and SQL Developer for migration? Thea: Data Pump is the best way to migrate a full or partial database into ADB, including databases from previous versions. Because Data Pump will perform the upgrade as part of the export/import process, this is the simplest way to get to ADB from any existing Oracle Database implementation. SQL Developer provides a GUI front end for using Data Pump that can automate the whole export and import process from an existing database to ADB. SQL Developer also includes an import wizard that can be used to import data from several file types into ADB. A very common use of this wizard is for importing Excel files into ADW. Once a credential is created, it can be used to access a file as an external table or to ingest data from the file into a database table. DBMS_CLOUD makes it much easier to use external tables, and the ORGANIZATION EXTERNAL syntax needed in other versions of Oracle Database is not required. 31:54 Lois: Thea, what about Oracle Object Store? How does it integrate with Autonomous Database, and what advantages does it offer for staging data? Thea: Oracle Object Store is directly integrated into Autonomous Database and is the best option for staging data that will be consumed by ADB. Any file type can be stored in object store, including SQL*Loader files, Excel, JSON, Parquet, and, of course, Data Pump DMP files. 
Flat files stored on object store can also be used as Oracle Database external tables, so they can be queried directly from the database as part of a normal DML operation. Object store is separate from the storage allocated to the Autonomous Database for database objects, such as tables and indexes. That storage is part of the Exadata system Autonomous Database runs on, and it is automatically allocated and managed. Users do not have direct access to that storage. 32:50 Nikita: I know that one of the main considerations when loading and updating ADB is the network latency between the data source and the ADB. Can you tell us more about this? Thea: Many ways to measure this latency exist. One is the website cloudharmony.com, which provides many real-time metrics for connectivity between the client and Oracle Cloud Services. It's important to run these tests when determining which Oracle Cloud service location will provide the best connectivity. The Oracle Cloud Dashboard has an integrated tool that will provide real-time and historic latency information between your existing location and any specified Oracle Data Center. When migrating data to Autonomous Database, table statistics are gathered automatically during direct-path load operations. If direct-path load operations are not used, such as with SQL Developer loads, the user can gather statistics manually as needed. 33:44 Lois: And finally, what can you tell us about the Data Migration Service? Thea: Database Migration Service is a fully managed service for migrating databases to ADB. It provides logical online and offline migration with minimal downtime and validates the environment before migration. We have a requirement that the source database is on Linux. And it would be interesting to see if we are going to have other use cases that need other, non-Linux operating systems. This requirement is because we are using SSH to directly execute commands on the source database. For this, we are certified on Linux only. Targets in the first release are Autonomous Databases, ATP or ADW, both serverless and dedicated. For the agent environment, we also require a Linux operating system. In general, we're targeting a number of different use cases-- migrating from on-premises, third-party clouds, Oracle legacy clouds, such as Oracle Classic, or even migrating within OCI Cloud, and doing that with or without a direct connection. If you don't have a direct connection, for example behind a firewall, we support offline migration. If you have a direct connection, we support both offline and online migration. For more information on all the migration approaches available for your particular situation, check out the Oracle Cloud Migration Advisor. 35:06 Nikita: I think we can wind up our episode with that. Thanks to all our experts for giving us their insights.  Lois: To learn more about the topics we've discussed today, visit mylearn.oracle.com and search for the Oracle Autonomous Database Administration Workshop. Remember, all of the training is free, so dive right in! Join us next week for another episode of the Oracle University Podcast. Until then, Lois Houston… Nikita: And Nikita Abraham, signing off! 35:35 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
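To round off the Oracle Text portion of this episode with something runnable, here is a hedged sketch of a CONTEXT index and a CONTAINS query. The table, column, and connection details are hypothetical; the SQL is issued through the python-oracledb driver here, but as Sangeetha notes, any language with a SQL interface works.

# Hedged sketch of the Oracle Text basics discussed above: a CONTEXT index plus CONTAINS.
import oracledb

conn = oracledb.connect(user="admin", password="********", dsn="myadb_high")  # hypothetical
cur = conn.cursor()

# A simple documents table; the text could equally live in a LOB or point to files/URLs.
cur.execute("CREATE TABLE docs (id NUMBER PRIMARY KEY, body VARCHAR2(4000))")
cur.execute("INSERT INTO docs VALUES (1, 'Oracle Text adds full-text search to the converged database')")
conn.commit()

# The CONTEXT index type hands indexing over to Oracle Text (the pipeline described above).
cur.execute("CREATE INDEX docs_text_idx ON docs (body) INDEXTYPE IS CTXSYS.CONTEXT")

# CONTAINS takes a score label (1 here) that SCORE(1) exposes for relevance ranking.
cur.execute("""
    SELECT SCORE(1), id, body
    FROM docs
    WHERE CONTAINS(body, 'converged AND database', 1) > 0
    ORDER BY SCORE(1) DESC
""")
for score, doc_id, body in cur:
    print(score, doc_id, body)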

Oracle University Podcast
Best of 2023: Getting Started with Oracle Database

Oracle University Podcast

Play Episode Listen Later Dec 26, 2023 19:21


In today's digital economy, data is a form of capital. Given the mission-critical role that it has, having a robust data management strategy is now more crucial than ever.   Join Lois Houston and Nikita Abraham, along with Kay Malcolm, as they talk about the various Oracle Database offerings and discuss how to actually use them to efficiently manage data across a diverse but unified data tier.   Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community X (formerly Twitter): https://twitter.com/Oracle_Edu LinkedIn: https://www.linkedin.com/showcase/oracle-university/   Special thanks to Arijit Ghosh, David Wright, Ranbir Singh, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started. 00:26 Lois: Welcome to the Oracle University Podcast. I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi there. If you've been following along with us these past few weeks, you'll know we've been revisiting our most popular episodes of the year.  Lois: Right, and today's episode is the last one of the Best of 2023 series. It's a throwback to our conversation on Oracle's Data Management strategy and offerings with Kay Malcolm, Senior Director of Database Product Management at Oracle. Nikita: We'd often heard Kay say that Oracle's data management strategy is simply complete and completely simple. And so we began by asking her what she meant by that. 01:09 Kay: It's a fun play on words, right? App development paradigms are in a rapid state of transformation. Modern app development is simplifying and accelerating how you deploy applications. Also simplifying how data models and data analytics are used. Oracle data management embraces modern app development and transformations that go beyond technology changes. It presents a simply complete solution that is completely simple. Immediately you can see benefits of the easiest and most productive platform for developing and running modern app and analytics. 01:54 Kay: Oracle Database is a converged database that provides best of breed support for all different data models and workloads that you need. When you have converged support for application development, you eliminate data fragmentation. You can perform unique queries and transactions that span any data and create value across all data types and build into your applications.  02:24 Nikita: When you say all data types, this can include both structured and unstructured data, right? Kay: This also includes structured and unstructured data. The Oracle converged database has the best of breed for JSON, graph, and text while including other data types, relations, blockchain, spatial, and others. Now that we have the ability to access any data type, we have various workloads and converged data management that supports all modern transactional and analytical workloads. We have the unique ability to run any combination of workloads on any combination of data. Simply complete for analytics means the ability to include all of the transactions, including key value, IoT, or Internet of Things, along with operational data warehouse and lake and machine learning. 
03:27 Kay: Oracle's decentralized database architecture makes decentralized apps simple to deploy and operate. This architecture makes it simple to use decentralized app development techniques like coding events, data events, API driven development, low code, and geo distribution. Autonomous Database or ADB now supports the Mongo database API adding more tools for architectural support. Autonomous Database or ADB has a set of automated tools to manage, provision, tune, and patch. It provides solutions for difficult database engineering with auto indexing and partitioning and is elastic. You can automatically scale up or down based on the workload. Autonomous Database is also very productive. It allows for focus on the data for solving business problems. ADB has self-service tools for analytics, data access, and it simplifies these difficult data engineering architectures. 04:43 Lois: OK…so can you tell us about running modern apps and analytics? Kay: Running applications means thinking about all the operational concerns and solving how to support mission-critical applications. Traditionally, this is where Oracle excels with high availability, security, operational solutions that have been proven over the years. Now, having developer tools and the ability to scale and reduce risk simplifies the development process without having to use complex sharding and data protection. Mission-critical capabilities that are needed for the applications are already provided in the functionality of the Oracle Data Management architecture. Disaster recovery, replication, backups, and security are all part of the Oracle Autonomous Database. 05:42 Kay: Even complex business-critical applications are supported by the operational security and availability of Oracle ADB. Transparently, it provides automated solutions for minimizing risk, dealing with complexity, and availability for all applications. Oracle's big picture data management strategy is simply complete and completely simple with the converged database, data management tools, and the best platform. It is focused on providing a platform that allows for modern app development across all data types, workloads, and development styles. It is completely scalable, available, and secure, leveraging the database technologies developed over several years. And it's available consistently across the environment. It is the simplest to use because of the available tools and running completely mission critical applications. 06:50 Nikita: Ah, so that's how we come to… Kay: Simply complete and completely simple. Easy to remember and easy to incorporate into your existing architectures.  Lois: OK. So Kay, can you talk a little bit more about Autonomous Database? 07:04 Kay: Let's compare Autonomous Database to how you ran the database on premise. How you ran the database on the cloud using our earlier Cloud Services, Database Cloud Services, and Oracle Exadata Cloud Service. The key thing to understand is Autonomous Database, or ADB, is a fully managed service. We fully manage the infrastructure. We fully manage the database for you. In on premise, you manage everything-- the infrastructure, the database, everything. We also have a service in between that that we call a co-managed service. Here we manage the infrastructure, and you manage the database. That service is important for customers who are not yet up to 19c. Or they might be running a packaged application like E-Business Suite. But for the rest of you, ADB is really the place you want to go. 
08:09 Nikita: And why is that? Kay: Because it's fully managed and, because it's fully managed, is a much, much lower cost way to go. So when you talk to your boss about why he wants to move to ADB, they often care about the bottom line. They want to know like, am I going to lower my costs? And with ADB, because we take care of a lot of the tedious chores that DBAs normally have to do and because we take care of best practices, configurations, we can do things at a really low cost.  08:49 Lois: Kay, what does it take for a customer to move to Oracle's Autonomous Database?  Kay: We've got a tool that helps you look at your current database on prem. This tool will analyze what features you're using and let you know, hey, you know you're doing something that's not supported for ADB, for example. Like if you're running some release before 19c, we don't support it. If you're doing stuff like putting database tables in the system or sys schema, we don't support it. You know, there are a few things that very few customers do that we don't support. And this tool will flag those for you. And then the next step, it's pretty simple. You just use our Data Pump import/export tool to move your data out of your database on prem into the object store on the Cloud. And then you simply import-- you know how to use Data Pump to import-- the data off the file and the object store into the database. Then you're done. Pretty simple process. 09:57 Nikita: Do we assist our customers with data migration from on-prem to Cloud? Kay: More recently have come out with a new service on our Cloud called the Database Migration Service. With Autonomous Database Migration Service, you can just point us at your source database on prem or even on some other cloud. Whatever it is, we will take care of everything from there and move that, go through all the steps and move your database to ADB on the Cloud. Even better, we now are working with our Applications customers to make it really easy for them to move their packaged applications to Autonomous Database. The Oracle development teams that built JD Edwards, PeopleSoft, Siebel have now all certified that those packaged applications can run with Autonomous Database no problem. Our EBS team is working on it. And that'll be coming soon, sometime next year. 11:02 Lois: So, if I am an Apps customer, is there a special service for me? Kay: We have a fully managed service available on our Cloud that lets you take your entire application stack on the middle tier and the database tier, move it to our Cloud. Move the database part to Autonomous Database. And they will also manage your middle tier for you. 11:32 Want to get the inside scoop on Oracle University? Head on over to the all-new Oracle University Learning Community. Attend exclusive events. Read up on the latest news. Get first-hand access to new products and stay up-to-date with upcoming certification opportunities. If you are already an Oracle MyLearn user, go to MyLearn to join the community. You will need to log in first. If you have not yet accessed Oracle MyLearn, visit mylearn.oracle.com and create an account to get started. Join the community today! 12:11 Nikita: Welcome back! Kay, can you talk a bit about APEX?  Kay: We have this great tool called APEX or Application Express. We have a version of Autonomous Database just for any APEX application.  Well, APEX is a low-code tool. 
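Kay's "export to object store, then import" flow has a concrete middle step: getting the dump file into OCI Object Storage. The sketch below uses the OCI Python SDK; the bucket, file, and profile names are assumptions for illustration, and the final step would reference this object from a Data Pump import into ADB using an object-store credential.

# Sketch: stage an on-prem Data Pump export file in OCI Object Storage
# so Autonomous Database can import it. Names below are placeholders.
import oci

config = oci.config.from_file()                  # reads ~/.oci/config
object_storage = oci.object_storage.ObjectStorageClient(config)
namespace = object_storage.get_namespace().data

with open("expdat01.dmp", "rb") as dump_file:    # file produced by expdp
    object_storage.put_object(
        namespace_name=namespace,
        bucket_name="adb-staging",               # assumed bucket
        object_name="expdat01.dmp",
        put_object_body=dump_file,
    )

print("Staged dump file; import it into ADB with Data Pump and a DBMS_CLOUD credential.")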
It is our low-code tool that lets you rapidly build data-driven applications where the data is in the Oracle Database, really easy and really rapidly. We estimate at least 10 times faster than doing traditional coding to build your applications. What we're seeing is much, much higher productivity than that. Sometimes 40, even 50 times faster coding. 13:01 Kay: Out of the box, it comes with really nice tools for building things-- your classical forms and reporting kinds of workloads. It gives you things like faceted search and capabilities to do things like see on an e-commerce website where you get to choose things like dimensions, like I want a product where the cost is in this range. And, you know, it might have some other attributes. And it can very quickly filter that data for you and return the best results. And it's a really nice tool for iterating. Now, if your user interface doesn't look quite right, it's very easy to tweak colors and backgrounds and themes. Another reason it's so productive is that the whole middle tier part of your application is fully automated for you. You don't have to do anything about connection management or state management. You don't have to worry about mapping data types from some other 3GL programming language to data types. All of that is done for you. The combination of ADB and APEX really rocks. 14:17 Lois: Do we have Extract, Transform, and Load capabilities in our ADB? Kay: We have ETL transformation tools. Again, they let you specify transformations in a drag-and-drop fashion on the screen. We have all sorts of other tools and, in the service, the full power of the converged analytic technologies, things like graph analytics, spatial analytics, machine learning. All of this is built into this new platform. Now, a big, new capability around machine learning is something that we call AutoML. That lets any data scientists give us a data set, tell us what the key feature is that they want to analyze, and what the predictions are. And we will come up with a machine learning model for them out of the box. Really that easy. Plus, we have the low-code tool APEX that I mentioned earlier. 15:17 Kay: So this environment is really powerful for doing more than traditional data warehouses. We can build data lakes. We are integrated with the object stores on Oracle Cloud and also on other clouds. And we can do massively parallel querying of data in the core database itself and the data lake. 15:38 Nikita: Beyond the database tech, there's the business side, right? How easy do we make a customer's path to ADB from a business standpoint, a decision-making standpoint? Kay: So if you're an existing Oracle customer, you have an existing Oracle Database license you're using on prem, we have something called BYOL, Bring Your Own License, to OCI. We have the Cloud Lift Service. This huge cloud engineering team across all regions of the world will help you move your existing on-prem database to ADB for free. 16:16 Kay: And then, finally, we announced fairly recently something called the Support Rewards Program. This is something our customers are really excited about. It lets them translate their spending on OCI to a reduction in their support bill. So if you're a customer using OCI, you get a $0.25 to $0.33 reward for every dollar you spend on Oracle's Cloud. You can then take that money from your rewards and apply it to your bill for customer support, for your technology support even, like the database. 
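To make the Support Rewards figures Kay quotes concrete, here is a small worked example; the spend amount is invented purely for illustration.

# Worked example of the $0.25 to $0.33 per dollar rate quoted in the episode.
annual_oci_spend = 400_000            # hypothetical yearly OCI spend, in dollars
low_rate, high_rate = 0.25, 0.33      # reward earned per dollar of OCI spend

low_reward = annual_oci_spend * low_rate
high_reward = annual_oci_spend * high_rate
print(f"Support Rewards earned: ${low_reward:,.0f} to ${high_reward:,.0f}")
# Roughly $100,000 to $132,000 that can be applied against the technology support bill.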
And this is exactly what customers want as they move their investment to the cloud. They want to lower the costs of paying for their on-prem support. Now, we've talked about money. This lowers costs greatly. So ADB has lots of value. But the big thing I think to think about is really that it lowers costs. It lowers that cost via automation, higher productivity, less downtime, all sorts of areas.   17:22 Lois: You make a very convincing case for ADB, Kay. Kay: ADB is a great place to go. Take those existing Oracle Databases you have. Move and modernize them to a modern cloud infrastructure that's going to give you all the benefits of cloud, including agility and lower cost. So on our Cloud, we have something called the Always Free Autonomous Database Service. This service lets you get your hands on ADB. Try it out for yourself. You don't have to believe what we claim about how great this technology is. And we have other technologies like Live Labs that you can find on developer.oracle.com/livelabs that lets you do all kinds of exercises on this Always Free ADB infrastructure. Really get your hands dirty. And see for yourself how productive it can be.  18:16 Nikita: Thanks, Kay, for telling us about ADB and our database offerings. To learn more about this, head over mylearn.oracle.com, create a profile if you don't already have one, and get started on our free Oracle Cloud Data Management Foundations Workshop. Lois: We hope you've enjoyed revisiting some of our most popular episodes these past few weeks. We're kicking off the new year with a new season of the Oracle University Podcast. And this time around, it'll be on Oracle Autonomous Database so make sure you don't miss it. Until next week, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 18:52 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.

MLOps.community
The Role of Infrastructure in ML Leveraging Open Source // Niels Bantilan // #197

MLOps.community

Play Episode Listen Later Dec 22, 2023 65:24


MLOps podcast #197 with Niels Bantilan, Chief Machine Learning Engineer at Union, The Role of Infrastructure in ML Leveraging Open Source brought to us by Union. // Abstract When we start out building and deploying models in a new organization, life is simple: all I need to do is grab some data, iterate on a model that fits the data well and performs reasonably well on some held-out test set. Then, if you're fortunate enough to get to the point where you want to deploy it, it's fairly straightforward to wrap it in an app framework and host it on a cloud server. However, once you get past this stage, you're likely to find yourself needing: More scalable data processing framework Experiment tracking for models Heavier duty CPU/GPU hardware Versioning tools to link models, data, code, and resource requirements Monitoring tools for tracking data and model quality There's a rich ecosystem of open-source tools that solves each of these problems and more: but how do you unify all of them together into a single view? This is where orchestration tools like Flyte can help. Flyte not only allows you to compose data and ML pipelines, but it also serves as “infrastructure as code” so that you can leverage the open-source ecosystem and unify purpose-built tools for different parts of the ML lifecycle on a single platform. ML systems are not just models: they are the models, data, and infrastructure combined. // Bio Niels is the Chief Machine Learning Engineer at Union.ai, and core maintainer of Flyte, an open-source workflow orchestration tool, author of UnionML, an MLOps framework for machine learning microservices, and creator of Pandera, a statistical typing and data testing tool for scientific data containers. His mission is to help data science and machine learning practitioners be more productive. He has a Masters in Public Health with a specialization in sociomedical science and public health informatics, and prior to that a background in developmental biology and immunology. His research interests include reinforcement learning, AutoML, creative machine learning, and fairness, accountability, and transparency in automated systems. // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Website: https://github.com/cosmicBboy, https://union.ai/Flyte: https://flyte.org/ MLOps vs ML Orchestration // Ketan Umare // MLOps Podcast #183 - https://youtu.be/k2QRNJXyzFg ⁠ --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Varun on Twitter: https://www.linkedin.com/in/varunkmohan/ Timestamps: [00:00] Niels' preferred coffee [00:17] Takeaways [03:45] Shout out to our Premium Brand Partner, Union! 
[04:30] Pandera [08:12] Creating a company [14:22] Injecting ML for Data [17:30] ML for Infrastructure Optimization [22:17] AI Implementation Challenges [24:25] Generative DevOps movement [28:27] Pushing Limits: Code Responsibility [29:46] Orchestration in OpenAI's Dev Day [34:27] MLOps Stack: Layers & Challenges [42:45] Mature Companies Embrace Kubernetes [45:29] Horizon Challenges [47:24] Flexible Integration for Resources [49:10] MLOps Reproducibility Challenges [53:14] MLOps Maturity Spectrum [57:48] First-Class Citizens in Design [1:00:16] Delegating for Efficient Collaboration [1:04:55] Wrap up
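Since the episode presents Flyte as "infrastructure as code" for composing ML pipelines, here is a minimal, hedged sketch of what a Flyte workflow with a per-task resource request can look like; the task bodies, names, and resource figures are illustrative assumptions, not anything taken from the episode.

# Minimal Flyte sketch: two tasks composed into one workflow, with the
# heavier training step requesting more resources. Illustrative only.
from typing import List

from flytekit import Resources, task, workflow


@task
def prepare_data(rows: int) -> List[float]:
    # Stand-in for a real preprocessing step.
    return [float(i) for i in range(rows)]


@task(requests=Resources(cpu="2", mem="4Gi"))
def train_model(data: List[float]) -> float:
    # Stand-in for training; returns a fake "score".
    return sum(data) / max(len(data), 1)


@workflow
def training_pipeline(rows: int = 1000) -> float:
    data = prepare_data(rows=rows)
    return train_model(data=data)


if __name__ == "__main__":
    # Flyte workflows are locally executable, which helps when iterating.
    print(training_pipeline(rows=10))

The same definition can be registered to a Flyte cluster unchanged, which is the "single view over the open-source stack" argument the episode makes.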

the artisan podcast
S3 | E3 | the artisan podcast | eros marcello | demystifying AI

the artisan podcast

Play Episode Listen Later Oct 22, 2023 25:23


www.theotheeros.com LinkedIn | Instagram | X   Eros Marcello is a software engineer/developer and architect specializing in human-interfacing artificial intelligence, with a special focus on conversational AI systems, voice assistants, chatbots, and ambient computing.   Eros has been doing this since 2015, and even though today, for the rest of us laymen in the industry, we're hearing about AI everywhere, for Eros this has been something he's been passionately working in for quite a few years.    Super excited to have him here to talk to us about artificial intelligence and help demystify some of the terminology that you all may be hearing out there.    I'm so excited to welcome Eros Marcello to this conversation to learn a little bit more about AI. He is so fully well versed in it and has been working in AI since 2015, when it was just not even a glimmer in my eyes, so I'm so glad to have somebody here who's an expert in that space.   Eros, glad to have you here. I would love to just jump into the conversation with you. For many of us, this buzz that we're hearing everywhere sounds new, as if it's just suddenly come to fruition. But that is clearly not the case, as it's been around for a long time, and you've been involved in it for a long time.     As a creative, as an artist, as an architect, as an engineer, can you take us through your genesis: how did you get involved and how did you get started? Let's just start at the beginning.   Eros: The beginning could be charted back to working in large-format facilities in, surprise surprise, the music industry, which, you know, was the initial interest and was on the decline. You'd have these kinds of alternate audio projects, sound design projects, that would come into the last remaining, especially on the East and West coasts, the Northeast and SoCal areas, of the last era of large-format, analog-based facilities with large recording consoles and hardware and tape machines.  I got to experience that, which was a great primer for AI for many reasons; we'll get more into that later. So what happened was that you'd have voiceover coming in for telephony systems, and they would record these sterile, high-fidelity captures of voice that would become the UI sound banks, or be used for speech synthesis engines for call centers. That was the exposure to what was to come with voice tech; folks in that space, the call center world, really started shifting my gears into what AI and machine learning were and how I might fit into them. Fast forward, I got into digital signal processing and analog emulation, so making high-caliber tools for Pro Tools, Logic, Cubase, Mac and PC for sound production and music production, specifically analog circuitry emulation and magnetic tape emulation "in the box," as it's called. That gave me my design and engineering acumen. Come 2015/2016, Samsung came along and said, you've done voice-over, you know NLP, machine learning, and AI, because I studied it and acquired the theoretical knowledge and had an understanding of the fundamentals.  I didn't know where I fit yet, and then they're like, so you know the AI side, plus you're into voice, plus you have a design background with the software that you worked on.  I worked on the first touchscreen recording console, called the Raven MTX, for a company called Slate Digital.
So I accidentally created the trifecta that was required to create what they wanted to do, which was Bixby, Samsung's iteration of Siri for the Galaxy S8, and they wanted me to design the persona… and that, as they say, is history. Samsung Research America became my playground. They moved me up from LA to the Bay Area and that was it.  It hasn't really stopped since; it's been a meteoric ascension upward. They didn't even know what to call it back then, they called it a UX writing position, but UX writers don't generate large textual datasets and annotate data and then batch and live test neural networks. Because that's what I was doing, so I was essentially doing computational linguistics on the fly. And on top of it, in my free time I ingratiated myself with a gentleman by the name of Gus, who was head of deep learning research there, and because I just happened to know all of these areas that fascinated me in the machine learning space, and because I was a native English speaker, I found a niche where they allowed me to not only join the meetings, but help them prepare formalized research and presentations, which only expanded my knowledge base.  I mean, we were looking into really cutting-edge stuff at the time: AutoML, hyperparameter tuning and ParamILS, and things in the realm of generative adversarial neural networks, which turned me on to the work of Ian Goodfellow, who was, until I got there, an Apple employee and has now gone back to Google DeepMind. He's the father of Generative Adversarial Neural Networks, he's called the GANfather, and that's really it, the rest is history. I got into Forbes when I was at Samsung and my Hyperloop team got picked to compete at SpaceX, so it was a lot that happened in a space of maybe 90 days.  Katty: You were at the right place at the right time, but you were certainly there at a time where opportunities that exist today didn't exist then, and you were able to forge that.  I also can see that there are jobs that will be coming up in AI that don't exist today. It's just such an exciting time to be in this space and really forge forward and craft a path based on passion, and yours clearly was there.  So you've used a lot of words that are regular nomenclature for you, but I think for some of the audience they may not be. Can you take us through… adversarial? I don't even know what you said… adversarial… Eros: Yes, Generative Adversarial Neural Networks. A neural network is the foundational machine learning technique, where you provide curated samples of data, be it images or text, to a machine learning algorithm, the neural network, which is trained, as it's called, on these samples so that when it's deployed in the real world it can do things like image recognition, facial recognition, natural language processing, and understanding. It does it by showing it examples; it's called supervised learning, so it's explicitly hand-labeled data, you know, this picture is of a dog versus this is a picture of a cat, and then when you deploy that system in production or in a real-world environment it does its best to assign confidence scores or domain accuracy to, you know, whether it's a cat or a dog.  You take generative adversarial neural networks, and that is the precipice of what we see today: the core of Midjourney and Stable Diffusion and image-to-image generation, where we're seeing prompt-to-image tools. Suffice it to say, generative adversarial networks are what is creating a lot of these images, or still-image-to-3D tools: you have one sample of data and then you have this sort of discriminator, and there's a weighting process that occurs, and that's how a new image is produced, because the pixel density is diffused; it's dispersed by, you know, brightness and contrast across the image, and that can actually generate new images.
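Eros's generator-versus-discriminator description maps onto a very small training loop. Here is a hedged PyTorch sketch in which random vectors stand in for real images; it exists only to show where the two networks' weights get updated, not to reproduce any production image model.

# Tiny GAN sketch in PyTorch: a generator learns to fool a discriminator.
# Random vectors stand in for real images; this only illustrates the loop.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

for step in range(200):
    real = torch.randn(32, data_dim)           # stand-in for a batch of real samples
    fake = G(torch.randn(32, latent_dim))      # generator output from random noise

    # Discriminator update: label real samples 1, generated samples 0.
    opt_d.zero_grad()
    d_loss = loss_fn(D(real), torch.ones(32, 1)) + \
             loss_fn(D(fake.detach()), torch.zeros(32, 1))
    d_loss.backward()
    opt_d.step()

    # Generator update: try to make the discriminator output 1 for fakes.
    opt_g.zero_grad()
    g_loss = loss_fn(D(fake), torch.ones(32, 1))
    g_loss.backward()
    opt_g.step()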
Katty: So for example, if an artist is just dabbling with Dall-E, let's say, and they put in the prompt they need to create something, that's really where it's coming from: it's all the data that has already been fed into the system. Eros: Right. Like Transformers, which again are the type of neural network that's used in ChatGPT or Claude; they're really advanced recurrent neural networks. Recurrent neural networks were used a lot for, you know, NLP and language understanding systems and language generation and text generation systems. Prior, they had a very hard ceiling and floor, and Transformers are the next step. But yeah, more or less prompt to image. Again, tons of training that parses the semantics and assigns that to certain images, and then to create that image there are sequence-to-sequence processes going on. Everyone's using something different, there's different techniques and approaches, but more or less you have Transformers. Your key buzzwords are Transformers, Large Language Models, Generative AI, and generative neural networks. It's in that microcosm of topics that we're seeing a lot of this explode, and yes, they have existed for a while. Katty: Where should somebody start? Let's say you have a traditional digital designer who doesn't really come from an engineering or math background, like you didn't, and they can see that this is impacting or creating opportunities within their space-- where should they start? Eros: First and foremost, leveling up what they can do. Again, that fundamental understanding, that initial due diligence, I think sets the tone and stage for success or failure, in any regard, but especially with this. Because you're dealing with double exponential growth and democratization, to the tune where it's not even the SotA, state-of-the-art, models, the large language models, that are the most astounding. If you see in the news, OpenAI is looking at certain economic realities of maintaining them. What is really eclipsing everything, and what's unique to this boom over, like, the .com bubble or even the initial AI bubble, is the amount of open-source effort being apportioned, and that is, you know, the genie out of the bottle for sure when it comes to something like this, where you can now automate automation to certain degrees. So we're going to be seeing very aggressive advancement, and that's why people are actually overwhelmed by everything. I mean, there's a new thing that comes out not even by the day but seemingly by the minute. I'm exploring, for Blackdream.ai, AI hallucinations, which, for the uninitiated, is the industry term they decided to go with for erroneous or left-field output from these large language models.  I'm exploring different approaches to actually leverage that as an ideation feature, so the sky is the limit when it comes to what you can do with these things and the different ways people are going to use it.
Just because it's existed, it's not like it's necessarily old news, as much as it's fermented into this highly productized, commoditized thing now, which is innovation in and of itself.   So where they would start is really leveling up, and identifying what these things can do. And not trying to compete with them on their own battlefield. So, low-hanging fruit: you have to leverage these tools to handle that and quadruple down on your high-caliber skill set, on what makes you unique, on your specific brand, even though that word makes me cringe a little bit sometimes, but on your strengths, on what a machine can't do and what's not conducive to make a machine do, and it does boil down to common sense.  Especially if you're a subject matter expert in your domain; a digital designer will know, OK, well, Dall-E obviously struggles here and there, you know, it can make a logo, but can it make, you know, this 3D scene to the exact specifications that I can? I mean, there's still a lot of headroom that is so hyper-specific it would never be economically or financially conducive to get that specific with these kinds of tools that handle generalized tasks. What we're vying for is artificial general intelligence, so we're going to kind of see a reversal where it's that narrow skill set that is going to be, I think, ultimately important.  Where you start is what are you already good at, and make sure you level up your skills by tenfold. People who are just getting by, who dabble or who are just so-so, they're going to be displaced. I would say they start by embracing the challenge, not looking at it as a threat, but as an opportunity, and again hyper-focusing on what they can do that's technical, that's complex, quadrupling down on that, hyper-focusing on it, highlighting and marketing on that point, and then automating a lot of that lower-tier work that comes with it, with these tools where and when appropriate. Katty: I would imagine, just from a thinking standpoint and a strategy standpoint and the creative process that one needs to go through, that's going to be even more important than before, because in order to be able to give the prompts to AI, you really have to strategize where you want to take it, what you want to do with it, otherwise it's information in and you're going to get garbage out.   Eros: Right, absolutely. And it depends on the tool, it depends on the approach of the company and manufacturer, the creators of the tool. You know, Midjourney, their story is really interesting. The gentleman who founded that originally founded Leap Motion, which was, in the 2010s, that gesture-based platform that had minor success.  He ended up founding Midjourney and denying Apple two acquisition attempts, and they're using Discord as a means for deployment and many other things simultaneously and to great effect. So it's the Wild West right now, but it's an exciting time to be involved, because it's kind of like when Auto-Tune got re-popularized. For example, it all kind of comes back to that music audio background, because Auto-Tune was originally a hardware box. That's what Cher used on her song, and then in the 2010s folks like T-Pain and Lil Wayne and everybody came along, it became a plug-in, a software plug-in, and all of a sudden it was on everything, and now it's had its day, it had 15 minutes again, and then it kind of dialed back to where it's used for vocal correction. Katty: Another thing to demystify…
Deep fake—what is that? Yes deep fake, can be voice cloning, which is neural speech synthesis and then you have deep fakes that are visual, so you have you know face swapping, as it's called.   You have very convincing deep fakes speeches, and you have voice clones that that more or less if you're not paying attention can sound and they're getting better again by the day. Katty What are the IP implications of that even with the content that's created on some of these other sources? Eros The IP implications in Japan passed that the data used that's you know regenerated, it kind of goes back I mean it's not if you alter something enough, a patent or intellectual property laws don't cover it because it's altered, and to prove it becomes an arbitrary task for it has an arbitrary result that's subjective. Katty You are the founder and chief product architect of BlackDream.ai. Tell us a little bit more about that what the core focus? Eros: So initially again it was conceived to research computer vision systems, adversarial machine intelligence. There's adversarial prompt injection, where you can make a prompt to go haywire if you kind of understand the idiosyncrasies of the specific model dealing with, or if you in construction of the model, found a way to cause perturbations in the data set, like basically dilute or compromise the data that it's being trained on with malice. To really kind of study those effects, how to create playbooks against them, how to make you know you know zero trust fault tolerant playbooks, and methodologies to that was the ultimate idea.  There's a couple moving parts to it, it's part consultancy to establish market fit so on the point now where again, Sandhill Road has been calling, but I've bootstrapped and consulted as a means of revenue first to establish market fit. So I've worked for companies and with companies, consulted for defense initiatives, for SAIC and partnering with some others. I have some other strategic partnerships that are currently in play. We have two offices, a main office at NASA/Ames, our headquarters is that is a live work situation, at NASA Ames / Moffett field in Mountain View CA so we are in the heart of Silicon Valley and then a satellite office at NASA Kennedy Space Center ,at the in the astronauts memorial building, the longevity of that which you know it's just a nice to have at this point because we are Silicon Valley-based for many reasons, but it's good to be present on both coasts. So there's an offensive cyber security element that's being explored, but predominantly what we're working on and it's myself as the sole proprietor with some third party resources, more or less friends from my SpaceX /Hyperloop team and some folks that I've brokered relationships with along the way at companies I've contracted with or consulted for. I've made sure to kind of be vigilant for anyone who's, without an agenda, just to make sure that I maintain relationships with high performers and radically awesome and talented people which I think is I've been successful in doing.  So I have a small crew of nonpareil, second to none talent, in the realm of deep learning, GPU acceleration, offensive cyber security, and even social robotics, human interfacing AI as I like to call it. 
So that's what Blackdream.ai is focusing on: adversarial machine intelligence research and development for the federal government and defense and militaristic sorts of applications. Katty: This image of an iceberg comes to mind, that we only see the tip of it above the water, you know, with the fun everybody's having with the Dall-Es and the ChatGPTs, but just the implication of it, what is happening with the depth of it… fascinating! Thank you for being with us and just allowing us to kind of just maybe dip our toe a little bit under the water and to just see a little bit of what's going on there. I don't know if I'm clearer about it, or if a lot more research now needs to be done on my part to even learn further about it. But I really want to thank you for coming here. I know you're very active in the space and you speak constantly about AI, and you're coming up soon on "Voice and AI". And where can people find you if they wanted to reach out and talk to you some more about this or have some interest in learning more about Blackdream.ai? Eros: The website's about to be launched: Blackdream.ai. On LinkedIn I think I'm the only Eros Marcello around, and www.theotheeros.com, the website, is sort of a portfolio.  Don't judge me, I'm not a web designer, but I did my best. It came out OK. And then you have LinkedIn and Instagram, it's Eros Marcello; on Twitter/X it's ErosX Marcello. I try to make sure that I'm always up to something cool, so I'm not an influencer by any stretch or a thought-leader, but I certainly am always getting into some interesting stuff, be it offices at NASA Kennedy Space Center, or stranded in Puerto Rico… you never know. It's all a little bit of reality television sprinkled into the tech. Katty: Before I let you go, what's the last message you want to leave the audience with? Eros: Basically, you know, I grew up playing in hardcore punk bands. Pharma and defense, AI for government, and Apple AI engineer: none of that was necessarily in the cards for me, I didn't assume. So my whole premise is, I know I may be speaking about some higher-level things or dealing more in the technicalities, but the whole premise is that you have to recognize, as a creative, that this is a technical space, and the technical is ultimately going to inform the design. And I didn't come out of the womb or hail from, you know, parents who are AI engineers. This isn't like a talent, this is an obsession.
Absolutely pleasure I appreciate you having me on hopefully we do this again soon.    

DS30 Podcast
Navigating Ethical AutoML, Agile Data Science, and AI Job Frontiers with Favio Vazquez

DS30 Podcast

Play Episode Listen Later Oct 6, 2023 57:17


“When you stop seeing numbers, and you start seeing people in your data sheets, everything changes.” - Favio Vazquez   In this episode, our host, Chris Richardson, interviews Favio Vazquez, Senior Data Scientist at H20.ai.   This conversation dives deep into all things autoML and AI, from what companies are doing now to what data professionals can do to prepare for the future. Favio provides his strategies for screening for bias and competing in a job market where AI is changing how we work.   They discuss: What to expect in the transition from data science in academia to data science in business What it means to practice data science in an Agile environment Why AI and autoML won't replace data professionals any time soon How to approach ethics in machine learning New roles for data professionals Business-Driven Data Analysis As discussed in this podcast, transitioning to a business environment requires a different lens to interpret and analyze data.   A business-driven approach to data analysis enhances decision-making and aligns data projects with organizational objectives.   Our course, Business-Driven Data Analysis, will train you to learn what a stakeholder truly wants, refine the project based on available data, produce results and provide strategic insights.   Learn More

Future of UX
#38 The User Experience of AI Products on Future of UX

Future of UX

Play Episode Listen Later Sep 28, 2023 13:47


Welcome to another episode of "Future of UX," where we explore the intersection of design and technology. Today, we dive deep into the user experience of AI products. From the current state of usability in AI to the challenges and opportunities that lie ahead, this episode is a comprehensive guide for designers and tech enthusiasts alike.

TAG Data Talk
Successfully Leveraging AutoML to Solve Complex Problems

TAG Data Talk

Play Episode Listen Later Sep 7, 2023 21:18


In this episode of TAG Data Talk, Dr. Beverly Wright discusses with Chad Harness: describe automl to the business person how do these types of tools work what are the right ways to leverage automl expectations for the future of automl tools

GOTO - Today, Tomorrow and the Future
Empowering Consumers: Evolution of Software in the Future • Derek Collison & Linda Stougaard Nielsen

GOTO - Today, Tomorrow and the Future

Play Episode Listen Later Jul 21, 2023 25:47 Transcription Available


This interview was recorded for GOTO Unscripted at GOTO Copenhagen.gotopia.techRead the full transcription of this interview hereLinda Stougaard Nielsen - Data Scientist, Software Developer & Agile PractitionerDerek Collison - Founder of NATSRESOURCESLindagithub.com/stougaardlinkedin.com/in/linda-stougaard-nielsenDerek@derekcollisongithub.com/derekcollisonlinkedin.com/in/derekcollisonDESCRIPTIONIn this captivating GOTO Unscripted conversation, Linda Stougaard Nielsen, an accomplished data scientist and agile practitioner at AVA Women, joins forces with Derek Collison, the visionary Founder & CEO at Synadia Communications. They embark on a thrilling exploration of cutting-edge topics, including the path to personal data sovereignty beyond GDPR, the quest for innovative AI systems that break traditional boundaries, unlocking the utility paradigm in cloud computing's exciting future, the shift from developers to empowered consumers shaping the future of software, and their fascinating adventures in programming diversity.Prepare to be inspired by their insightful and thought-provoking dialogue!RECOMMENDED BOOKS Jeff Hawkins & Sandra Blakeslee • On IntelligenceStefan Helzle • Low-Code Application Development with AppianForsgren, Humble & Kim • AccelerateDavid Farley • Modern Software EngineeringZhamak Dehghani • Data MeshPiethein Strengholt • Data Management at ScaleMartin Kleppmann • Designing Data-Intensive ApplicationsTwitterLinkedInFacebookLooking for a unique learning experience?Attend the next GOTO conference near you! Get your ticket: gotopia.techSUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted almost daily

AI Today Podcast: Artificial Intelligence Insights, Experts, and Opinion
AI Today Podcast: AI Glossary Series – Automated Machine Learning (AutoML)

AI Today Podcast: Artificial Intelligence Insights, Experts, and Opinion

Play Episode Listen Later Jul 19, 2023 9:11


In this episode of the AI Today podcast hosts Kathleen Walch and Ron Schmelzer define the term Automated Machine Learning (AutoML), explain how this term relate to AI and why it's important to know about them. Show Notes: FREE Intro to CPMAI mini course CPMAI Training and Certification AI Glossary AI Glossary Series – DevOps, Machine Learning Operations (ML Ops) AI Glossary Series – Model Tuning and Hyperparameter Glossary Series: (Artificial) Neural Networks, Node (Neuron), Layer Glossary Series: Bias, Weight, Activation Function, Convergence, ReLU Glossary Series: Perceptron Glossary Series: Hidden Layer, Deep Learning Glossary Series: Loss Function, Cost Function & Gradient Descent Glossary Series: Backpropagation, Learning Rate, Optimizer Glossary Series: Feed-Forward Neural Network AI Glossary Series – Machine Learning, Algorithm, Model AI Glossary Series – Model Tuning and Hyperparameter Continue reading AI Today Podcast: AI Glossary Series – Automated Machine Learning (AutoML) at Cognilytica.
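As a rough illustration of what the term covers, the sketch below automates a model-plus-hyperparameter search with scikit-learn. Real AutoML products also automate feature engineering, broader algorithm selection, and deployment, so treat this as a simplified stand-in rather than any particular vendor's API.

# Simplified stand-in for AutoML: automated model + hyperparameter search.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate models and the hyperparameter grids to search for each.
candidates = [
    (LogisticRegression(max_iter=5000), {"C": [0.1, 1.0, 10.0]}),
    (RandomForestClassifier(random_state=0), {"n_estimators": [100, 300],
                                              "max_depth": [None, 8]}),
]

best_score, best_model = 0.0, None
for estimator, grid in candidates:
    search = GridSearchCV(estimator, grid, cv=5)   # cross-validated search per model
    search.fit(X_train, y_train)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

print(f"Selected {type(best_model).__name__}, "
      f"CV score {best_score:.3f}, test score {best_model.score(X_test, y_test):.3f}")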

The Data Scientist Show
Uber's ML Systems (Uber Eats, Customer Support), Declarative Machine Learning - Piero Molino - The Data Scientist Show #064

The Data Scientist Show

Play Episode Listen Later Jul 4, 2023 110:05


Piero Molino was one of the founding members of Uber AI Labs. He worked on several deployed ML systems, including an NLP model for Customer Support, and the Uber Eats Recommender System. He is the author of Ludwig , an open source declarative deep learning framework. In 2021 he co-founded Predibase, the low-code declarative machine learning platform built on top of Ludwig. Piero's LinkedIn: https://www.linkedin.com/in/pieromolino Predibase free access: bit.ly/3PCeqqw Daliana's Twitter: https://twitter.com/DalianaLiu Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu (00:00:00) Introduction (00:01:54) Journey to machine learning (00:03:51) Recommending system at Uber Eats (00:04:13) Projects at Uber AI  (00:09:34) Uber's customer obsession ticket system (00:16:01) How to evaluate online-offline business and model performance metrics (00:17:16) Customer Satisfaction (00:28:38) When do you know whether a project is good enough (00:41:50) Declarative machine learning and Ludwig (00:45:32) Ludwig vs AutoML (00:54:44) Working with Professor Chris Re (00:58:32) Why he started Predibase (01:07:56) LLM and GenAI (01:10:17) Challenges for LLMs (01:22:36) Advice for data scientists (01:34:29) Career advice to his younger self
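The declarative style Piero describes is easiest to see in a Ludwig config: you declare input and output features and the framework assembles, trains, and evaluates the model. The sketch below is hedged; the column names, CSV files, and epoch count are invented for illustration, and exact config keys can vary between Ludwig versions.

# Hedged sketch of Ludwig's declarative style: describe inputs and outputs,
# let the framework build and train the model. Column names are invented.
from ludwig.api import LudwigModel

config = {
    "input_features": [
        {"name": "review_text", "type": "text"},
        {"name": "price", "type": "number"},
    ],
    "output_features": [
        {"name": "recommended", "type": "category"},
    ],
    "trainer": {"epochs": 5},
}

model = LudwigModel(config)
train_stats, _, _ = model.train(dataset="reviews.csv")      # assumed local CSV
predictions, _ = model.predict(dataset="new_reviews.csv")   # assumed scoring set
print(predictions.head())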

Oracle University Podcast
Oracle Machine Learning

Oracle University Podcast

Play Episode Listen Later Jul 4, 2023 16:26


There is so much data available today. But it only makes a difference when you transform that data into actionable intelligence.   In this episode, hosts Lois Houston and Nikita Abraham, along with Nick Commisso, discuss how you can harness the capabilities of Oracle Machine Learning to solve key business problems and accelerate the deployment of machine learning–based solutions.   Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ Twitter: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Ranbir Singh, and the OU Studio Team for helping us create this episode.   -------------------------------------------------------   Episode Transcript:   00;00;00;00 - 00;00;39;06 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started. Hello and welcome to the Oracle University Podcast. I'm Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Product Innovation and Go to Market Programs.   00;00;39;08 - 00;01;07;15 Hi there. For the last two weeks, we've been off the Oracle Database train, but today we're back on it, focusing on Oracle Machine Learning with our Cloud Engineer Nick Commisso. Hi Nick. Before we get into Oracle Machine Learning, I think we should start with the basics. What is machine learning? Machine learning is focused on enabling data science teams to add ML-based intelligence to both their applications and their dashboards.   00;01;07;17 - 00;01;37;07 With machine learning, we can automate the entire data analysis process workflow, from collaborating in order to obtain data from many sources to creating and analyzing the data, and showing the results and reports. We can perform predictions and easily visualize the data to provide a deeper and faster and more comprehensive insight to enable effective business decisions. I think we can safely say that machine learning is the future of analytics in every industry, right?   00;01;37;09 - 00;02;11;26 So where does Oracle Machine Learning come in? Oracle Machine Learning provides a reliable, AI-driven environment that truly encapsulates the power of machine learning. Enhanced performance and scalability is achieved in part by eliminating data movement for database data and providing algorithms that have been redesigned specifically for scalability and performance. Next is simpler solution architecture and management, where we want to avoid requiring separately maintained analytic engines or tools for data and model governance.   00;02;11;26 - 00;02;38;10 In-database machine learning also offers flexible architectures for deployment tests, in-production spanning the cloud, on-premises, and hybrid environments. And because of its SQL and REST interfaces, it's easy to integrate with the broader Oracle stack. Now the third is that OML empowers a broader range of users with machine learning. It's readily available in the database from multiple interfaces, including third-party package support.   00;02;38;13 - 00;03;06;19 So do you have to be an expert to use machine learning? To make even non-experts productive with machine learning, OML supports AutoML from a Python API, and a no code user interface. 
And there's also other built-in automation features like automatic data preparation, integrated text mining, and partition models. And these make machine learning even more accessible to a broader range of users.   00;03;06;22 - 00;03;33;04 What can you tell us about the pricing structure? Machine learning capabilities are included in the core product at no additional cost with Autonomous Database, and the OML components of ADB are pre-provisioned and ready to use. And an on-premises database is included with the database license. So overall, the takeaway is that OML helps reduce costs and complexity while increasing productivity and access.   00;03;33;06 - 00;04;01;06 What are the areas or fields in which OML is useful? Modern businesses and modern problems require solution best delivered by Oracle Machine Learning. Medical science has been leveraging machine learning successfully to perform quick and accurate diagnosis or creating curative solutions using vast quantities of data. Physical robots use a combination of machine learning solutions to sense their environment and respond appropriately.   00;04;01;08 - 00;04;37;07 Computational biology makes use of machine learning to analyze biological data, such as genetic sequences or organic samples, and make predictions. Analysis with financial or security data can identify clients with high risk profiles or cybersecurity surveillance to pinpoint warning signs of fraud. The recent growth in the popularity of machine learning has been aided by the fact that we now have improved machine learning algorithms, which are supported by the advent and frequent innovation in technology related to data capture, networking and computing power.   00;04;37;11 - 00;05;02;24 So you basically don't need to write complex software for every change in the data. And the machine learning model evolves as the historical data evolves. We have more advanced sensors and I/O devices which support machine learning models with accurate and real-time data. Customers of various services are now looking for more customization options, which can be efficiently supported with machine learning solutions.   00;05;02;26 - 00;05;24;16 The historical challenges of manually trawling through data to extract actionable knowledge is no longer a problem now because machine learning algorithms supported by powerful modern computers are designed for that particular purpose.   00;05;24;19 - 00;05;52;18 Are you attending Oracle CloudWorld 2023? Learn from experts, network with peers, and find out about the latest innovations when Oracle CloudWorld returns to Las Vegas from September 18 through 21. CloudWorld is the best place to learn about Oracle solutions from the people who build and use them. In addition to your attendance at CloudWorld, your ticket gives you access to Oracle MyLearn and all of the cloud learning subscription content, as well as three free certification exam credits.   00;05;52;23 - 00;06;09;20 This is valid from the week you register through 60 days after the conference. So what are you waiting for? Register today. Learn more about Oracle CloudWorld at www.oracle.com/cloudworld.   00;06;09;22 - 00;06;39;21 Welcome back! Nick, I was hoping you could share some use cases where machine learning can really be leveraged. Banks and other businesses in the financial industry use machine learning technology for two key purposes: to identify important insights and data and to prevent fraud. 
The insights can identify investment opportunities to help investors know when to trade, and machine learning can also identify clients with high risk profiles or use cyber surveillance to pinpoint warning signs of fraud.   00;06;39;23 - 00;07;10;23 Machine learning is a fast growing trend in the healthcare industry. The technology can help medical experts analyze data to identify trends or red flags that may lead to improved diagnostics and treatment. Finding new energy sources, analyzing minerals in the ground, predicting refinery sensor failure, streamlining oil distribution to make it more efficient and cost effective. The number of machine learning use cases for this industry is fast and still expanding.   00;07;10;25 - 00;07;46;27 Analyzing data to identify patterns and trends is key to the transportation industry, which relies on making routes more efficient and predicting potential problems to increase profitability. The data analysis and modeling aspects of machine learning are important tools to delivery companies, public transportation, and other transportation organizations. Shopping websites also use machine learning, right? Websites recommending items you might like based on previous purchases are used with machine learning to analyze your buying history and promote other items you might be interested in.   00;07;46;29 - 00;08;14;17 The ability to capture that data and analyze it and use it to personalize a shopping experience or implement a marketing campaign is the future of retail. Government agencies, such as public safety and utilities, have a particular need for machine learning because they have multiple sources of data that can be mined for insights. Analyzing sensor data, for example, identifies ways to increase efficiency and save money.   00;08;14;20 - 00;08;42;12 Machine learning can also help detect fraud and minimize theft. Retail industries can use machine learning to recognize customer spending patterns for targeted marketing or optimize supply chain logistics by recognizing outliers or anomalies in the data. All that a data science needs to do is identify the problem domains, such as transportation, find the data, and let Oracle Machine Learning take care of the rest.   00;08;42;14 - 00;09;08;18 GPS navigation services make use of historical data to predict travel time based on the current traffic levels. Video surveillance systems uses facial recognition systems to identify situations which require attention from emergency services. Social media uses machine learning to study the patterns of user interactions to suggest connections, item of interests, targeted ads, and so on. And we can use it to find spam, I'm sure.   00;09;08;20 - 00;09;33;27 Machine learning helps email services recognize spam or malicious emails by recognizing the common patterns among offending examples. And the well-known and almost essential Internet searches use machine learning to refine results based on the search patterns of the individual users. Nick, now that you've given us a really good idea about all of the places machine learning can be used, let's talk about the features of Oracle Machine Learning.   00;09;34;00 - 00;10;09;12 Oracle Machine Learning provides access to a wide array of features in addition to the collaborative notebooks, which include templates, user administration tools, and schedulers. In-database algorithms allow you to implement machine learning solutions on your data residing in Oracle databases without having to move your data anywhere else. 
OML provides support for SQL, PL/SQL, R, Python languages, and Markdown, which you should be familiar with if you've worked with databases before, making implementing machine learning solutions a lot easier.   00;10;09;15 - 00;10;38;04 OML also provides support for the deployment of enterprise machine learning methodologies within the Autonomous Data Warehouse. What are the different parts of Oracle Machine Learning? The components that make up Oracle Machine Learning are the machine learning user administration application, which is a web-based user interface for managing your Oracle Machine Learning users, as well as mapping your machine learning users to the Autonomous Data Warehouse database users.   00;10;38;07 - 00;11;03;22 You can also access the machine learning user interface as an administrator. The OML application is a web-based application for your data scientists to help create workspaces and projects, as well as notebooks. Earlier in our conversation, you spoke about these powerful machine learning algorithms. Can you tell us more about that, please? The OML tagline is move the algorithms, not the data.   00;11;03;25 - 00;11;34;26 To realize this, we've placed powerful machine learning algorithms in the database kernel software operating below the user security layer. Other tools simply can't do that. OML eliminates data movement for database data and simplifies the solution architecture as there's no need to manage and test workflows involving third-party engines. OML extends the database to enable users to augment applications and dashboards with machine learning–based intelligence quickly and easily.   00;11;34;28 - 00;12;05;14 It delivers over 30 in-database algorithms accessible through multiple language interfaces, and it's important to note that the broader Oracle ecosystem for data analytics and machine learning also includes tools like Oracle Analytics Server and Analytics Cloud, OCI Data Science, AI services, and others. And OML is included with Oracle Autonomous Database instances and Oracle Database licenses. So you already have free access to it to start using it.   00;12;05;18 - 00;12;33;25 And what are the benefits of using OML, Nick? Whether it's minimizing or eliminating data movement, support for multiple personas and multiple languages, or both code and no-code interfaces, these and other benefits resonate with customers needing powerful and integrated machine learning to meet their scalability and performance needs, while simplifying their solution and deployment architecture. What are the various OML components?   00;12;33;29 - 00;13;19;07 Build ML models and score data with no data movement with the OML4SQL API. Leverage the database as a high-performance compute engine from Python with in-database ML with the OML4Py API. Leverage the database as a high-performance compute engine from R with in-database ML with the OML4R API. OML Notebooks is a collaborative notebook user interface supporting SQL, PL/SQL, Python, R, and Markdown. OML AutoML UI is a no-code automated modeling interface. And OML Services is a RESTful model management and deployment service.   00;13;19;09 - 00;13;44;19 With Oracle Data Miner, there's a SQL Developer extension with a drag-and-drop interface for creating ML methodologies. Let's talk about the life cycle of a machine learning project. The life cycle of a machine learning project is divided into six phases. The first phase of the machine learning process is to define business objectives. 
The initial phase of the project focuses on understanding the project objectives and requirements.   00;13;44;22 - 00;14;10;28 In this phase, you're going to specify the objectives, determine the machine learning goals, define success criteria, and produce a project plan. The data understanding phase involves data collection and exploration, which includes loading the data and analyzing the data for your business problem. In this phase, you will access and collect the data, explore data, and understand data quality. Alright, then.   00;14;10;28 - 00;14;40;22 So what's next? The preparation phase involves finalizing the data and covers all of the tasks involved in getting the data into a format that you can use to build the model. In this phase, you will clean, join, and select the data, transform data, and engineer new features. In the modeling phase, you'll select and apply various modeling techniques and tune the algorithm parameters, called hyperparameters, to your desired values.   00;14;40;24 - 00;15;07;20 In this phase, you're going to explore different algorithms and build, evaluate, and tune models. At the evaluation phase, it's time to evaluate how well the model satisfies the originally stated business goal. In this phase, you'll review the business objectives, assess results against success criteria, and determine the next steps. Deployment is the use of machine learning within a targeted environment.   00;15;07;22 - 00;15;42;04 In the deployment phase, one can derive data-driven, actionable insights. In this phase, you will plan enterprise deployment, integrate models with applications for business needs, monitor, refresh, retire, and archive models, and you'll report on model effectiveness. Thank you so much, Nick, for sharing your expertise with us. This was great. To learn more about Oracle Machine Learning, please visit mylearn.oracle.com and take a look at our Using Oracle Machine Learning with Autonomous Database course.   00;15;42;06 - 00;16;07;19 Once you're done with it, you can take the associated specialist certification exam with confidence. That brings us to the end of this episode. Next week, we'll talk about MySQL and why it's everywhere. Until then, this is Nikita Abraham and Lois Houston signing off. That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes.   00;16;07;22 - 00;18;40;10 We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
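To make the modeling and evaluation phases described above a little more concrete, here is a minimal Python sketch of that part of the lifecycle. It intentionally uses scikit-learn on a toy dataset rather than Oracle's own OML4Py or OML4SQL interfaces (whose in-database calls differ), so treat it as a generic illustration of the build, tune, and evaluate steps, not as Oracle-specific code.

```python
# Generic sketch of the modeling and evaluation phases (not OML-specific).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Data understanding / preparation: a toy dataset and a holdout split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Modeling: explore an algorithm and tune its hyperparameters.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [100, 300], "max_depth": [None, 8]},
    cv=5,
)
search.fit(X_train, y_train)

# Evaluation: assess results against the success criteria before deployment.
print("Best hyperparameters:", search.best_params_)
print("Holdout accuracy:", accuracy_score(y_test, search.predict(X_test)))
```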

En.Digital Podcast
La Tertul-IA #4: IA en Adquisición, AutoML e impacto de IA Generativa en la productividad

En.Digital Podcast

Play Episode Listen Later Jun 30, 2023 73:01


A roundtable discussion with Frankie Carrero and Fares Kameli on trends and current events in the world of Artificial Intelligence applied to business. We talk about:

MLOps.community
Democratizing AI // Yujian Tang // MLOps Podcast #163

MLOps.community

Play Episode Listen Later Jun 27, 2023 54:17


MLOps Coffee Sessions #163 with Yujian Tang, Democratizing AI co-hosted by Abi Aryan. // Abstract The popularity of ChatGPT has brought large language model (LLM) apps and their supporting technologies to the forefront. One of the supporting technologies is vector databases. Yujian shares how vector databases like Milvus are used in production and how they solve one of the biggest problems in LLM app building - data issues. They also discuss how Zilliz is democratizing vector databases through education, expanding access to technologies, and technical evangelism. // Bio Yujian Tang is a Developer Advocate at Zilliz. He has a background as a software engineer working on AutoML at Amazon. Yujian studied Computer Science, Statistics, and Neuroscience with research papers published to conferences including IEEE Big Data. He enjoys drinking bubble tea, spending time with family, and being near water. // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Yujian on LinkedIn: https://www.linkedin.com/in/yujiantang Timestamps: [00:00] Yujian's preferred coffee [02:40] Takeaways [05:14] Please share this episode with your friends! [06:39] Vector databases trajectory [09:00] 2 start-up companies created by Yujian [09:39] Uninitiated Vector Databases [12:20] Vector Databases trade-off [14:16] Difficulties in training LLMs [23:30] Enterprise use cases [27:38] Process/rules not to use LLMs unless necessary [32:14] Setting up returns [33:13] When not to use Vector Databases [35:30] Elastic search [36:07] Generative AI apps common pitfalls [39:35] Knowing your data [41:50] Milvus [48:28] Actual Enterprise use cases [49:32] Horror stories [50:31] Data mesh [51:06] GPTCash [52:10] Shout out to the Seattle Community! [53:44] Wrap up

GOTO - Today, Tomorrow and the Future
How AutoML & Low Code Empowers Data Scientists • Linda Stougaard Nielsen & Moez Ali

GOTO - Today, Tomorrow and the Future

Play Episode Listen Later May 12, 2023 15:29 Transcription Available


This interview was recorded for GOTO Unscripted at GOTO Copenhagen. gotopia.tech. Read the full transcription of this interview here. Linda Stougaard Nielsen - Director of Data Science & Data Engineering at AVA Women. Moez Ali - Creator of PyCaret. DESCRIPTION: Over the past decade, AutoML has revolutionized the world of data science, propelling it several layers forward in terms of abstraction. This powerful technology has paved the way for a new era of democratization, empowering experts from all fields to harness the power of data through the concept of the citizen data scientist. Moez Ali, Creator of PyCaret, and Linda Stougaard Nielsen, director of data science at Ava Women, discuss two sides of this discipline and its future. RECOMMENDED BOOKS: Stefan Helzle • Low-Code Application Development with Appian; Forsgren, Humble & Kim • Accelerate; David Farley • Modern Software Engineering. Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech. SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted almost daily

Secrets of Data Analytics Leaders
AutoML And Declarative Machine Learning: Comparing Use Cases - Audio Blog

Secrets of Data Analytics Leaders

Play Episode Listen Later Apr 6, 2023 10:24


AutoML and the emerging approach of declarative ML help simplify the process of creating and refining ML models. Published at: https://www.eckerson.com/articles/automl-and-declarative-machine-learning-comparing-use-cases

Data Science Perspectives
Season 6, Episode 3 - Frances Boykin, Director, Advanced Analytics, AT&T

Data Science Perspectives

Play Episode Listen Later Mar 2, 2023 29:11


A discussion with Frances Boykin, Director, Advanced Analytics, AT&T. She has held multiple roles in different parts of AT&T over nearly 2 decades, including in IT, supply chain, and marketing. She talks about how she learned what she wanted to do for a living at age 13 (!!??), how growing up as an “Army brat” influenced her, the importance of networking and being prepared to grab opportunities that arise, how she planned her time to complete degrees while working, and why autoML tools are both here to stay and a good thing. #analytics #datascience #leadership #machinelearning #ai #artificialintelligence #ATT #Army

R Weekly Highlights
Issue 2023-W09 Highlights

R Weekly Highlights

Play Episode Listen Later Mar 1, 2023 36:10


How to easily create interactive versions of your favorite ggplots with ggiraph, bringing AutoML to R with forester, and a head-to-head comparison of R and Excel for common data wrangling and summaries. Episode Links This week's curator: Colin Fay - [@ColinFay]](https://twitter.com/ColinFay) (Twitter) Creating interactive visualizations with {ggiraph} (with or without Shiny) (https://albert-rapp.de/posts/ggplot2-tips/17_ggiraph/17_ggiraph.html) forester: what makes the package special? (https://medium.com/responsibleml/forester-what-makes-the-package-special-9ece9b8a64d) Why should I use R: The Excel R Data Wrangling comparison: Part 1 (https://www.jumpingrivers.com/blog/why-r-part-1/) Entire issue available at rweekly.org/2023-W09 (https://rweekly.org/2023-W09.html) Supplement Resources ggiraph online book https://www.ardata.fr/ggiraph-book {openxlsx2} read, write, and modify xlsx files https://janmarvin.github.io/openxlsx2 Supporting the show Use the contact page at https://rweekly.fireside.fm/contact to send us your feedback Get a New Podcast App and send us a boost! https://podcastindex.org/apps A new way to think about value: https://value4value.info Get in touch with us on social media Eric Nantz: @theRcast (https://twitter.com/theRcast) (Twitter) and @rpodcast@podcastindex.social (https://podcastindex.social/@rpodcast) (Mastodon) Mike Thomas: @mike_ketchbrook (https://twitter.com/mike_ketchbrook) (Twitter) and @mike_thomas@fosstodon.org (https://fosstodon.org/@mike_thomas) (Mastodon)

Lenny's Podcast: Product | Growth | Career
AI and product management | Marily Nika (Meta, Google)

Lenny's Podcast: Product | Growth | Career

Play Episode Listen Later Feb 5, 2023 48:02


Brought to you by Amplitude—Build better products: https://amplitude.com/ | Eppo—Run reliable, impactful experiments: https://www.geteppo.com/ | Pando—Always-on employee progression: https://www.pando.com/lenny—Marily is a computer scientist and an AI Product Leader currently working for Meta's reality labs, and previously at Google for 8 years. In 2014 she completed a PhD in Machine Learning. She is also an Executive Fellow at Harvard Business School and she has taught numerous courses, actively teaching AI Product Management on Maven and at Harvard. Marily joins us in today's episode to shed light on the role of AI in product management. She shares her insights on how AI is empowering her work, and why she believes that every Product Manager will be an AI Product Manager in the future. We also discuss why PM's should learn a bit of coding, where they can learn it, and best practices for working with data scientists. Marily shares some insight into building her AI Product Management course and also why she full-heartedly believes you should also create your own course.Find the full transcript here: https://www.lennyspodcast.com/ai-and-product-management-marily-nika-meta-google/#transcriptWhere to find Marily Nika:• Instagram: http://www.instagram.com/marilynika• LinkedIn: https://www.linkedin.com/in/marilynika/• YouTube: https://www.youtube.com/c/MarilyNikaPM• Website: https://bio.link/marilynikaWhere to find Lenny:• Newsletter: https://www.lennysnewsletter.com• Twitter: https://twitter.com/lennysan• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/Referenced:• The Download newsletter: https://www.technologyreview.com/topic/download-newsletter/• TLDR newsletter: https://tldr.tech/• ChatGPT: https://chat.openai.com/auth/login• MidJourney: https://midjourney.com/home/• Whisper: https://whisper.ai/• Machine Learning Specialization course: https://www.coursera.org/specializations/machine-learning-introduction• Career Foundry: https://careerfoundry.com/• Coding Dojo: https://www.codingdojo.com/• Building AI Products—For Current & Aspiring Product Managers course on Maven: https://maven.com/marily-nika/technical-product-management• arXiv: https://arxiv.org/• Marginal Revolution blog: https://marginalrevolution.com/• Automl: https://cloud.google.com/automl• Inspired: How to Create Tech Products Customers Love: https://www.amazon.com/INSPIRED-Create-Tech-Products-Customers/dp/1119387507• You Look Like a Thing and I Love You: How Artificial Intelligence Works and Why It's Making the World a Weirder Place: https://www.amazon.com/You-Look-Like-Thing-Love/dp/0316525227• The Adventures of Women in Tech Workbook: A Life-Tested Guide to Building Your Career: https://www.amazon.com/Adventures-Women-Tech-Workbook/dp/1646871022• Boz to the Future podcast: https://podcasts.apple.com/us/podcast/boz-to-the-future/id1574002430• The White Lotus on HBO: https://www.hbo.com/the-white-lotus• Lensa: https://apps.apple.com/us/app/lensa-ai-photo-video-editor/id1436732536In this episode, we cover:(00:00) Marily's background(03:20) How Marily stays informed about the latest developments in AI(04:46) What is overhyped and underhyped in AI right now(05:59) How Marily uses ChatGPT for work(08:25) Why product managers will be AI product managers in the future(11:16) How to get started using AI(14:12) When not to use AI(15:47) How much data do you need for AI to work properly?(17:01) When should companies develop their own AI tools?(18:35) What an AI model is and how it is trained(21:25) How Google demonstrated the 
ability of AI to translate a conversation in real time(23:02) Why AI will not replace PMs(23:48) A case for learning to code(26:21) Where to learn to code(27:40) How to become a strong AI PM(29:25) Challenges that AI PMs face(31:16) Getting leadership on board with investing in AI(33:10) How PMs will work with data scientists and AI(35:29) Marily's AI course(39:12) AutoML and how a renewable-energy company used it to improve its turbine maintenance procedure(40:31) How Marily built her course and the modifications she has made(42:53) Why you should create your own course(44:08) Lightning roundProduction and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com. Get full access to Lenny's Newsletter at www.lennysnewsletter.com/subscribe

The Data Scientist Show
The power of error analysis, tree models for search relevancy, what ChatGPT means for data scientists - Sergey Feldman - The Data Scientist Show #059

The Data Scientist Show

Play Episode Listen Later Jan 24, 2023 79:43


Sergey Feldman is the head of AI at Alongside, providing mental health support for students. He is also a Lead Applied Research Scientist at Allen Institute for AI, where he built an ML model that improved search relevancy for scientific literature. Sergey has a PhD in Electrical and Electronics Engineering from the University of Washington. Today we'll talk about machine learning for search, his consulting project for the Gates Foundation, AI for mental health, and career lessons. Make sure you listen till the end. If you like the show, subscribe, leave a comment, and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science. Daliana's Twitter: https://twitter.com/DalianaLiuDaliana's Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/ Sergey's LinkedIn: https://www.linkedin.com/in/sergey-feldman-6b45074b/ Data Cowboys: http://www.data-cowboys.com/ Sergey Feldman: You Should Probably Be Doing Nested Cross-Validation | PyData Miami 2019: https://www.youtube.com/watch?v=DuDtXtKNpZs December 4th, 2018 - Breakfast with WACh with Dr. Sergey Feldman, PhD: https://www.youtube.com/watch?v=vA_czRcCpvQ (00:00:00) Introduction (00:01:24) Machine learning skeptic (00:03:02) Tree-based models for search relevance (00:14:34) How to do error analysis (00:19:20) Nested cross-validation (00:21:34) Model evaluation (00:30:43) Error analysis common mistakes (00:33:37) How to avoid overfitting (00:35:56) Consulting project with Gates Foundation (00:41:16) Tree-based models vs linear models (00:45:19) Working with non-tech stakeholders (00:50:20) Chatbot for teen's mental health (00:54:32) Can ChatGPT provide therapy? (00:58:12) How he got into machine learning (01:02:12) How to not have a boss (01:03:46) Feelings vs Facts (01:09:02) Future of machine learning (01:11:30) How to prepare for the future (01:13:39) AutoML (01:17:12) His passion for large language models
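One topic from the list above, nested cross-validation, is easy to show in a short sketch. The snippet below is not from the episode; it is a minimal scikit-learn illustration of the pattern Sergey advocates, where an inner loop tunes hyperparameters and an outer loop produces an unbiased estimate of how well the tuned model generalizes.

```python
# Minimal nested cross-validation sketch (illustrative, not from the episode).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # tunes hyperparameters
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)  # estimates performance

search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=inner_cv, scoring="roc_auc")
scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
print(f"Nested CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```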

Financial Investing Radio
FIR 158: Using AI In Your Product Delivery To Leap Ahead !!

Financial Investing Radio

Play Episode Listen Later Dec 15, 2022 31:53


In this episode, I talk with the CEO and founder of an organization that has been applying AI to help them develop products. Will AI help you develop your products faster? Come and see. Grant Hey, everybody, welcome to another episode of ClickAI Radio. So today I have this opportunity to speak with one of those brains out there in the market that's being disruptive, right? They're making changes in the industry in terms of not only the problems they're solving, but the way in which they're solving those problems using AI. Very fascinating. Anyway, everyone, please welcome Paul Ortchanian here to the show. Paul Hi, nice of you. Happy to be here on the show.  Grant Absolutely. It's very good to have you here today. When I was first introduced to you and started to review your material, what it is that your organization has put together, I was fascinated with the approach, because I have a product development background in the software world. AI was a latecomer to that, right, meaning over generations, so when I saw the approach that you're taking, I was interested to dig more into that. But before we do that big reveal, could you maybe step back and talk about the beginning of your journey? What got you on this route and this map, both in terms of product development and technology and AI itself? Paul Yeah, absolutely. So I started out as an engineer, headed down to San Francisco in the early 2000s. And I was more of a thinker than an actual engineer, just the type of guy who would figure things out by himself. But if you were to ask me to really do the things engineers do, you know, the creativity was there, but not the solutioning. So being in San Francisco was a humbling experience; I guess in Silicon Valley you get to see some really, really good engineers. So I had to make a shift in my career. And since I had a passion for user experience and the business aspect, product management was a great fit, a function I didn't really understand at first. I got to learn and respect it, and did that for about 10 years.  In the mid-2010s, I basically moved back to Montreal for family reasons and the cost of living, of course, in San Francisco. And I started a company called Bain Public, which in French stands for public bath. What I realized in Canada was that people here in accelerators, incubators, and startups just didn't understand what product management was. They didn't really understand what product managers do and how they do it. And I saw a lot of organizations being led by the marketing teams or the sales team, being very service-oriented and not really product-led.  So Bain Public basically stands for public bath, which means every quarter you want to apply some hygiene to your roadmap: you have a galaxy of ideas, so why not go out there and, you know, take the good ones, remove the old ones, and get rid of the dirt. We started with that premise, and we said, well, what does a product manager do on a quarterly basis? Because a lot of the material you'll read out there really talks about, you know, what product managers should do in terms of personas and understanding the customer's data and this and that, but nobody really tells you in which order you should do it, right? That was my initial struggle as a product manager: do you try to do it all in the same day and then realize that there's not enough time? 
So the question is, in a one-quarter, 12-week cycle, maybe my first three weeks should be about understanding the market shifts, the industry, the product competitors, and the users; then maybe the next three weeks working with leadership on making sure that there are no pivots in the organization or any major strategic changes; and then going into analyzing this parking lot of ideas, figuring out which ones are short term, and making business cases in order to present them for the company to make a decision on what to do next on the roadmap.  So there is a process, and we just call that process SOAP, which goes in line with our public bath theme. The idea was, let's give product managers SOAP to basically wash their roadmap on a quarterly basis. And that's what Bain Public does. We've worked with over 40 organizations so far on really implementing this product-led process within their organizations. We work with their leaders on identifying a product manager within the organization and making sure that marketing, support, sales, the CFO, and the CEO really understand how to engage with them, what to expect from them, and how a product manager can add value to the organization, so it just doesn't become, you know, this race to pump out as many features as you can, right. Grant Oh, boy, yeah. Which is a constant problem. The other thing that I've noticed, and I'm sure that your SOAP methodology addresses this, is the problem of shifting an organization in terms of their funding model, right? They'll come from sort of these project-centric or service-centric funding styles, and then you've got to help them through that shift to a different funding model around products. Do you guys address that as well? Paul Yeah, we address that a lot. One of the things we always tell them is, if you are a professional services firm, and you know, I have no issue basically calling them that, I ask them: do you quantify staff utilization in percentages, like 70% of our engineers are being billed? Right? Do we basically look at the sales team and how many new deals they have in terms of pipeline? Are we looking at on-time delivery across the deals the sales team closed? And what is our technical staff attrition over time? Those usually tend to be identifiers of you being a service firm. And we often ask them, well, let's make the shift: let's identify one little initiative that you have that you want to productize, because all these service firms, really all they want is recurring revenue, and services are tough, right?  You constantly have to bring in new clients. So the path to recurring revenue is, you know, being able to say, okay, I'm going to take two engineers, one salesperson, one marketing person, one support person, and a product manager. Those guys collectively will cost me a million dollars a year, and I'm going to expect them to basically bring me $3 million in recurring revenue. That means they're no longer going to be evaluated on staff utilization, they're no longer going to be evaluated on the number of deals they're bringing in. They're really going to be evaluated on: how are they releasing features? Are they creating value with those features? Are we increasing the number of paid customers?
And are we basically, you know, staying abreast of competitors and market and industry changes?  So that's a complete paradigm shift, and that transition takes a while. But the first seed is really being able to say: can you create an entity within your organization where the CFO accepts that those engineers are dedicated and no longer being, you know, reviewed in terms of their utilization rate or how much they're billing to customers? Once they do that shift, the recipe is pretty easy to do. Grant Yeah, so it becomes easy. So the thing I've seen and experienced with product and product development is the relationship of innovation to product development. I see some groups will take innovation and move it out as some separate activity or function in the organization, whereas others will have that innate within the product team itself. What have you found effective? And does SOAP address that? Paul Yeah, I mean, we always ask them the question of how you are going to defend yourself against the competition, what the VCs like to call their moat, right? And that defensibility could be innovation, it could also be your global footprint, or, you know, it could be how you operationalize your supply chain to make things really, really cheap, right? Every company can have a different strategy. And we really ask them from the get-go. We call this playing the strategy: we'll give them like eight potential ways a company can, you know, find strategies to differentiate themselves. And the first one is first to market.  And the question is, it's not about you being first to market today, but do you want to outpace your closest rivals on a regular basis? If so, you know, you need an R&D team, an innovation team, who is basically going to be pumping out commercializable features or R&D work. And then we always give them two examples. The example of Dolby: Dolby being completely analog in the 70s, but really banking on their R&D team to bring them to the digital age, and from the digital age to set-top boxes, to Hollywood, and now into Netflix compression, right?  So they basically put their R&D team as the leader to basically keep them a step ahead of their competition. But on the other hand, we also, you know, talk about Tesla, where Tesla is basically doing the same thing, but they're not doing it for intellectual property like Dolby; they're not suing anybody, they're actually open sourcing it. But there's a reason behind it, where that open sourcing allows them to basically create, you know, what we call the Betamax-VHS situation, which is making sure that there's compatibility across car manufacturers for Tesla parts and overproduction of Tesla-compatible parts, just to strengthen their supply chain, right? So we ask them, do you want to be that company? If you don't want to be that company, then there are other ways for you to basically create defensibility: it could be regulatory compliance if your industry requires it, you can go global, you can go cross-industry, you can basically create customer lock-in, just how SAP and Salesforce love to basically integrate workflows with, like, boots-on-the-ground, certified professional services teams, right?  Or you can basically review your process and make sure, just like Amazon, that you're creating robots to do human work in order to do it cheaper than anybody else. So there are ways of doing it.
And I would say that if you're in the AI space especially, you know, it's important to ask: are you really trying to innovate through AI? Because you can get a lot of researchers doing a lot of things, but that's not really going to help you create commercializable ideas. So from the get-go, the leadership team needs to, you know, at least make a bet on expansion, innovation, or creating efficiencies, and just, you know, decide and let the product management team know in which direction they're planning on going for the next six years. Grant I love your last comment there, Paul, about getting the leadership team involved. It seems that many times organizations face this challenge of making the change sticky, right, making it last, making it resonate, where people truly change their operating model. They're going to start operating in a different way, their roles and responsibilities change, the order in which things get done changes, when they start moving into this AI space. And you know, becoming product-driven just by itself, even without AI, has its own set of challenges. So here's the question I have for you. As you move companies through this transformation, that's part of your business, right? You are transforming the way companies operate and bringing about better outcomes. How do you make those changes sticky? Because this is a cultural change. What is it you guys have found effective? Paul Well, it goes back to our name, public bath, and SOAP, right? Because the idea is, you take a bath on a regular basis; hygiene is something you do regularly, right? So we ask these organizations: if we give you a process where you know exactly what the product management team is going to do with you, the leadership team, in order to prioritize your next upcoming features, then can you do it in a cyclical way? Every quarter, you need the product manager to do the exact same process of revisiting the competitors, the industry, the market, as well as the problems that you have with your customers, bringing it back to the organization, asking if the strategy is still about expansion, innovation, or efficiencies, identifying new ideas, clearing up the parking lot of bad ideas, etc., and eventually making the business case for the new features in order for them to make a commitment. If we do this in a cyclical way, then the product role becomes the role of what I like to call the CRO, the chief repeating officer, because all the product manager is doing is repeating that strategy and questioning the CEO: are we still on? Are we pivoting? And if we pivot, what does that mean? And if you're doing it on a three-month basis, what that allows your company to do is make sure that the marketing, sales, and support teams are going along with what the engineering team is going to be delivering. This is what I usually see in most product organizations: a decision has been made that the engineers are going to be building a particular feature, and the sales and marketing teams just wait for the engineers to be code complete. Once code complete is done, they're like, okay, now we're gonna promote it. But my point is that it's too late, right? So I always like to talk about Apple, how Apple would basically go out in front of millions of people and just say, here's the new iPhone 13, and we came up with a new version of Safari, and we're updating our iOS, and we're doing 40 other changes.
And the next thing you know, you walk into an Apple Store and everything has changed. The marketing has changed; the folks doing the conferences and the lectures and the training are all talking about the new Safari, the new iPhone. And you ask yourself, how did Apple manage to organize the marketing, support, and sales teams in such a way that the day the announcement is made, everything has changed? That means it's not just the engineering team's responsibility to get to code complete.  It is a collective responsibility where marketing, support, and sales are also preparing for the upcoming releases. And the only way you can get that type of alignment is if every three months these parties, technology, product, CEO, CFO, sales, marketing, and support, can get together and make a clear decision on what they're going to do, be honest enough about what they're not going to do, and then work collectively on making sure those things are being delivered and prepared: the size of the promotion we're going to do, how we're going to do outreach, how the sales collateral is going to change, how the support team is going to support these upcoming features. So everybody has work to do in that three-month timeframe. And if we can get to that cyclical element, I think most companies can create momentum. And once that momentum is generating small increments of value for the customers, then you start building what I like to call reputational capital with the clients, with the customers, with the prospects. Eventually anything you release they love, and everything you release adds value. Eventually everybody loves everything you're doing, and as an organization you become that, you know, big unicorn that people want to be. Grant Yeah, so the net of that is, I believe, what you said: you operationalize it. Now it gets integrated into everyone's role and responsibility. It's this enterprise-level, cross-functional alignment that happens on a cadence. And the cadence, in your case, you mentioned quarterly; quarterly sounds like that's been a real gem for you. I've seen some organizations do that in shorter timeframes and some much longer. It sounds like, yeah, at least quarterly. Is that a good nugget that you've found there?  Paul Yeah, quarterly works because, you know, markets are set up in a quarterly way, they operate that way; you want results on a quarterly basis in terms of sales, in terms of engagement, etc. But what's important is that, you know, a lot of engineering teams like to work agile or Kanban. And in a quarter, in a 12-week timeframe, let's say your sprints are three weeks, you could fit four 3-week sprints, or you could fit six 2-week sprints. But I feel that if you were to shorten it, then the marketing, sales, and support teams might not have enough time to prepare themselves for code complete; the engineers might be able to deliver, but then the product manager gets overwhelmed, because doing industry research, competitor research, etc. every, say, month and a half or two months just becomes overwhelming for them. Things don't change enough in two months for them to be able to say, oh look, this competitor just came up with that, and now we need to react. So I think three months is enough time for the world to change, for, you know, a country to go to war, for COVID to come over and just destroy everything.
So pivot decisions are usually pretty good to make on a quarterly basis.  Grant Yeah, that's good. I think COVID followed that rule, right? Hey, I have a question for you around AI. So how are you leveraging AI in the midst of all this? Can you talk about that? Paul Yeah, absolutely. So what we noticed is that a lot of organizations who have products, so SaaS products, or any type of product, IoT products, etc., they're generating data. I mean, it comes hand in hand with software development. So all that data is going into these databases and nobody knows what to do with it. Eventually, you know, they want to start creating business intelligence, and from business intelligence, AI initiatives just come about. It's very normal to say, you know what, with all this data, if we were to train a machine learning model, we would be able to recommend the best flight price or the best time for somebody to buy a flight, because we have enough data to do it. So we're not working with AI-first organizations who say, here, our entire product is going to be built around AI; we're just trying to work with organizations that have enough data to warrant one, two, three, or four AI initiatives and an ongoing investment in those. So the best example I like to talk about is Google Gmail's suggested replies, right, which adds value to the user, needs AI in the back end, and a lot of data.  But ultimately, it's not that Gmail is an AI product; it simply has AI features in it. And when organizations start identifying AI or machine learning, predictive elements, in their product, then we go from engineering being a deterministic function, which is, if we deliver this feature, then customers will be able to do that, to a probabilistic function of, let's experiment and see what the data can give us. And if this algorithm ends up really nailing it, we will achieve this result. But if it doesn't, then do we release it? Do we not release it? And then it gets a little bit hairy, because product managers just lose themselves in it. Oftentimes they'll release a feature and the sales team will just ask them to pull it out right away because it has not met the expectations of a customer or two. And ultimately, what we ask product managers to do is work with leadership on really identifying a few key elements that are very, very important to baseline before you begin an AI project. And those are pretty simple. It's really: are you trying to have the machine learning model make a prediction? Are you trying for it to make a prediction plus pass judgment? Or are you trying to have it make a prediction, pass judgment, and take action, right? Decision automation, which is what, you know, self-driving cars do: they will see a biker, they will make a prediction that it's a biker, they'll make a judgment that it's indeed a biker, and they'll take action to avoid the biker, right?  But when you're creating ML projects, you can easily say, you know, we're just going to keep it to prediction, right? Like, this machine is going to predict something, and then a human will pass judgment and the human will take action. There's nothing wrong in doing that. So just set the expectations from the get-go in terms of: are we basically going to predict, judge, or take action? That's number one. And then the next question is, whatever we decide, if it's just prediction, is that guess worth making?
And who makes that guess today? If it's a human, how accurate is that human? Let's quantify it, so this way we can compare it against what this machine is going to do. What is the value the company gets out of that guess being the right guess? And what's the cost of getting it wrong? Oftentimes we forget that humans get it wrong too, and when humans get it wrong, there are huge consequences that organizations will overlook; but as soon as machine learning does the same thing, we're ready to just cancel hundreds of thousands of dollars of investment.  Grant Yeah, that's right. Yeah, we toss it out. So the use case, I'm assuming, would be that you would leverage AI to, say, enhance a product manager's ability to either predict outcomes of some product development activities, or releases, or things like that. Would that be the kind of use case where you'd look to apply it? Paul Well, not the product manager's abilities; I would say the product manager would look at the software. Let's take the software of a website that tries to predict if people qualify for a mortgage loan, for example, right? You have enough data at that point to be able to automate the underwriting process that humans do of validating whether or not somebody's eligible for a loan. We could take all that data and just make a prediction of that person's fit for a particular loan. Now, if we were to say, well, it's just going to be the prediction, but we're not going to give this person the loan automatically, we're still going to ask a human being to pass judgment that the prediction was the correct one, and then take action to give or not give them a loan.  So let's say that's the machine learning model we're going to add to our feature. Now, the question is, this underwriting department, in the past 10 years, how often did they really screw up and, you know, issue loans to people that couldn't pay their loan, right? And we realize it's 40%. We're like, wow, 40%. Could this machine learning be as accurate as that, plus one, right? And then we end up realizing that, yeah, whatever we delivered is 33% accurate, and not 40%-plus-one accurate. Now, is it still worth putting out there? We put $100,000 into it. And then, you know, it's up to the product manager to basically be able to put this thing in place and say, but look, underwriting is a nine-to-five job currently in our business, and it costs us this much money.  On the other hand, this machine learning is 33% accurate, but it's actually doing it 24/7, 365 days a year, and it's only going to improve from 33 to 40. And if it goes above 40, then the savings for our organization are this much money. So it is really the product manager's job to be able to talk not only about the business KPIs, but also about the AI and machine learning KPIs we need to achieve and what the impact would be if we get it right. And I think the biggest issue we have as product managers in the AI space is, if we were to go and do all of this, everything that we need to create AI, like the data ops: selecting the data, sourcing it, synthesizing it, cleaning it, etc.; the model ops, which, you know, comes down to multiple algorithms, training those algorithms, evaluating and tuning them; and then the operationalization. If you do all these steps and you get to 20% accuracy, and your target is at 70% accuracy, right, what do you do with it?  Because you had to do all this work anyway; it cost you tons of money and time.
And so how do we get the leadership team to say this AI initiative has enough value for us that we're willing to live with the consequences of it getting it wrong, or we're willing to actually have it supported by a human for the next six months to a year until it basically trains itself and gets better? So it's, how do you get this openness from a leadership team? Because what I've often found delivering AI projects is that every time you deliver an AI project and it's misunderstood in terms of its output, everybody thinks it has to be 100% accurate, and the second it goes wrong, the political drama that you have to go through in order to keep it alive is just overwhelming, right? So it's much better to set those expectations up front and tool the product managers with the right arguments to make sure that the expectations are set correctly. Grant Have you ever worked with or heard of the company called digital.ai? Are you familiar with them? digital.ai? Maybe not. Anyway, they have been working in a similar space as you, but not so much at the product management level. What they're doing, though, is looking to apply AI to the whole delivery function. So you can see, the product manager is above this, making sort of these KPIs and other estimating activities and the planning, but then there are all these functions under there that, of course, do the delivery of the product. So they're working on the tooling spectrum; I think they acquired, what was it, five different companies in the last nine months, and they're integrating these and building this AI seam or layer across that delivery data, with the purpose and intent to do not only backwards-looking analysis around AI, but predictive, which is: what's the probability I might run into some problem with this particular release of this product that we're about to send out? They might be an interesting group for you to get connected with. Paul Yeah, you know, it's funny, because there's a local company here in Montreal that does the same thing. It's really that data scientists are really expensive, they're really hard to find, and there's a shortage of them. So, you know, a lot of organizations are trying to find, like, a self-serve AI solution where you can build your AI using their AI. But ultimately, what they're doing is taking your data and delivering one, two, three, or ten versions of the machine learning model, and it's up to you to basically judge which one is going to work the best for you; but they actually operationalize it, put it out there for you, and really automate the whole thing. So this way you're not dependent on humans. I love that, I really love that; I think your organization should have one of those. But that still means that there's a dependency on the product manager to know the data, like, end to end, to be able to clean it, be able to tag it, and then feed it to these machines, right? And I think that part is also misunderstood. Do we have enough data? Is there bias in the data? All that needs to be understood and figured out. Because, you know, you could say, hey, we put it through this big machine and we ended up with 20% accuracy on the best ML that it output, but that's still not good enough, because we're aiming for 87. And what does that mean? What do we need to do to basically get it to 87?
We're gonna have to review the data, bring in some third-party data, you know, and that costs a lot as well. So, yeah. Grant Do you think AutoML solutions play a role here? Like Aible, I don't know if you're familiar with that platform; you know, the goal is to try to reduce the amount of dependency that's needed on the data scientists themselves, right. But it still doesn't remove all of the data cleansing part, though it does help take care of certainly some of the low-level data science requirements. Do you think that's a viable solution in this area?  Paul I think it is. I mean, you know, we went from rule-based AI, where data scientists had to do good old-fashioned AI, which was feature engineering, right, putting in the rules themselves, to machine learning AI, where, you know, we had to train on the data we needed and were so dependent on these data scientists. And now we're getting to v3, where we have these tools. And you know, there's a data dependency, but they also don't have such a high dependency on data scientists, you know, figuring out algorithms and so on; we could just basically have these prepackaged algorithms that could output any type of solution. What I tend to like, and I've seen this a lot in a lot of companies: there are some companies that are very, very industry-specific, right? So they're providing AI for e-commerce to be able to provide better search with predictive elements based on the person's browsing history. I'm not so sure about the ones that are providing every ML imaginable, so you could use it for supply chain, or you could use it for something else. I know it's dependent on data, but again, these algorithms, you can't have all the algorithms for all scenarios.  Even if it's supply chain, some person has perishables and is ordering bananas, and the other person is ordering, I don't know, water coolers, and those don't have the same rules, right. So I think that maybe in the coming years we'll have a lot of companies that are really going cross-industry, just like, some in e-commerce, other ones in med tech, other ones elsewhere, etcetera; the tools are the same, I mean, more or less the same. The customers are gonna get used to basically having these UIs where you input the data and then these MLs come out, and then you choose which one, and they give you a probability, you can retrain them, and all that stuff. And I think it's just going to get to a point where we're going to have these product managers who are now responsible for kind of training the machine learning model themselves; you know, whether it's going to be the product manager or some other function, I think it definitely fits inside the product manager's role.
Grant Well, I do; I think it's because they still need to have what we would call the domain knowledge in this domain of building products. At least in this phase of the life of AI, where we are today and for the foreseeable future, I think the product manager needs to be involved with that, for sure. Paul It comes down to intuition, right; somebody has to build that intuition about what a model is relying on when making a judgment. And I think that, you know, with product managers, the closest role, really, maybe in bigger organizations, is the person who's managing analytics and data, but in a smaller startup organization, I can definitely see the product manager putting that...  Grant Yeah, absolutely. Paul, I really appreciate you taking the time here today; this has been a fascinating conversation. Any last comments you want to share? Paul We have tons of articles that talk about this; we're very open source as an organization. So if you want to learn more, we have about 70 articles on our website. Just go to BainPublic.com and click on "Articles" and you can just, you know, self-serve and basically improve as a product manager in the AI space. Grant Excellent, fascinating. I love the conversation, your insight, and the vision of where you guys are taking this; I think you're gonna continue to disrupt everyone. Thanks for joining another episode of ClickAI Radio, and until next time, check out BainPublic.com. Thank you for joining Grant on ClickAI Radio. Don't forget to subscribe and leave feedback. And remember to download your free ebook: visit ClickAIRadio.com now.  

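The loan-underwriting discussion above boils down to an expected-cost comparison between an imperfect human process and an imperfect model. The sketch below uses made-up placeholder numbers (not figures from Paul's client work) just to show the kind of back-of-the-envelope math a product manager can put in front of leadership when setting expectations for an AI initiative.

```python
# Hypothetical expected-cost comparison; every number here is a placeholder.
applications_per_year = 10_000
cost_per_wrong_decision = 2_000   # average loss when an approval decision is wrong

human_error_rate = 0.10           # humans get it wrong sometimes, too
human_team_cost = 400_000         # salaries for the nine-to-five underwriting team

model_error_rate = 0.12           # slightly worse than humans today...
model_running_cost = 100_000      # ...but it runs 24/7 and should improve

def expected_annual_cost(error_rate: float, fixed_cost: float) -> float:
    """Expected yearly cost = error losses + cost of running the process."""
    return applications_per_year * error_rate * cost_per_wrong_decision + fixed_cost

print("Human process:", expected_annual_cost(human_error_rate, human_team_cost))
print("Model process:", expected_annual_cost(model_error_rate, model_running_cost))
```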
AI and the Future of Work
Emmanuel Turlay, Founder and CEO of Sematic and machine learning pioneer, discusses what's required to turn every software engineer into an ML engineer

AI and the Future of Work

Play Episode Listen Later Dec 4, 2022 45:10


Emmanuel Turlay spent more than a decade in engineering roles at tech-first companies like Instacart and Cruise before realizing machine learning engineers need a better solution. Emmanuel started Sematic earlier this year and was part of the YC summer 2022 batch. He recently raised a $3M seed round from investors including Race Capital and Soma Capital. Thanks to friend of the podcast and former guest Hina Dixit from Samsung NEXT for the intro to Emmanuel.I've been involved with the AutoML space for five years and, for full disclosure, I'm on the board of Auger which is in a related space. I've seen the space evolve and know how much room there is for innovation. This one's a great education about what's broken and what's ahead from a true machine learning pioneer.Listen and learn...How to turn every software engineer into a machine learning engineerHow AutoML platforms are automating tasks performed in traditional ML toolsHow Emmanuel translated learning from Cruise, the self-driving car company, into an open source platform available to all data engineering teamsHow to move from building an ML model locally to deploying it to the cloud and creating a data pipeline... in hoursWhat you should know about self-driving cars... from one of the experts who developed the brains that power themWhy 80% of AI and ML projects failReferences in this episode:Unscrupulous users manipulate LLMs to spew hateHina Dixit from Samsung NEXT on AI and the Future of WorkApache BeamEliot Shmukler, Anomalo CEO, on AI and the Future of Work

Gradient Dissent - A Machine Learning Podcast by W&B
D. Sculley — Technical Debt, Trade-offs, and Kaggle

Gradient Dissent - A Machine Learning Podcast by W&B

Play Episode Listen Later Dec 1, 2022 60:26


D. Sculley is CEO of Kaggle, the beloved and well-known data science and machine learning community.D. discusses his influential 2015 paper "Machine Learning: The High Interest Credit Card of Technical Debt" and what the current challenges of deploying models in the real world are now, in 2022. Then, D. and Lukas chat about why Kaggle is like a rain forest, and about Kaggle's historic, current, and potential future roles in the broader machine learning community.Show notes (transcript and links): http://wandb.me/gd-d-sculley---⏳ Timestamps: 0:00 Intro1:02 Machine learning and technical debt11:18 MLOps, increased stakes, and realistic expectations19:12 Evaluating models methodically25:32 Kaggle's role in the ML world33:34 Kaggle competitions, datasets, and notebooks38:49 Why Kaggle is like a rain forest44:25 Possible future directions for Kaggle46:50 Healthy competitions and self-growth48:44 Kaggle's relevance in a compute-heavy future53:49 AutoML vs. human judgment56:06 After a model goes into production1:00:00 Outro---Connect with D. and Kaggle:

SuperDataScience
627: AutoML: Automated Machine Learning

SuperDataScience

Play Episode Listen Later Nov 15, 2022 90:57


Jon Krohn speaks with Erin LeDell, H2O.ai's Chief Machine Learning Scientist. They investigate how AutoML supercharges the data science process, the importance of admissible machine learning for an equitable data-driven future, and what Erin's group Women in Machine Learning & Data Science is doing to increase inclusivity and representation in the field. This episode is brought to you by Datalore (https://datalore.online/SDS), the collaborative data science platform. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • The H2O AutoML platform Erin developed [07:43] • How genetic algorithms work [19:17] • Why you should consider using AutoML? [28:15] • The “No Free Lunch Theorem” [33:45] • What Admissible Machine Learning is [37:59] • What motivated Erin to found R-Ladies Global and Women in Machine Learning and Data Science [47:00] • How to address bias in datasets [57:03] Additional materials: www.superdatascience.com/627
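For listeners who want a concrete feel for the H2O AutoML workflow discussed in this episode, here is a minimal Python sketch. The dataset URL and column names are placeholders to be swapped for your own data; the class and method names follow H2O's documented Python API, but check them against the version you have installed.

```python
# Minimal H2O AutoML run; the data path and "target" column are placeholders.
import h2o
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file("https://example.com/train.csv")  # placeholder dataset
y = "target"                                              # placeholder target column
x = [c for c in train.columns if c != y]
train[y] = train[y].asfactor()                            # treat as classification

aml = H2OAutoML(max_models=10, max_runtime_secs=300, seed=1)
aml.train(x=x, y=y, training_frame=train)

print(aml.leaderboard.head())             # candidate models ranked by default metric
predictions = aml.leader.predict(train)   # score with the best model found
```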

MLOps.community
ML Platforms, Where to Start? // Olalekan Elesin // Coffee Sessions #118

MLOps.community

Play Episode Listen Later Aug 26, 2022 53:05


MLOps Coffee Sessions #118 with Olalekan Elesin, Director of Data Platform & Data Architect at HRS Product Solutions GmbH, co-hosted by Vishnu Rachkonda. // Abstract You don't have infinite resources? Call out your main metrics! Focus on the most impactful things that you could do for your data scientists. Olalekan joined us to talk about his experience previously building a machine learning platform at Scaleout24. From our standpoint, this is the best demonstration and explanation of the role of technical product management in ML that we have on the podcast so far! // Bio Olalekan Elesin is a technologist with a successful track record of delivering data-driven technology solutions that leverages analytics, machine learning, and artificial intelligence. He combines experience working across 2 continents and 5 different market segments ranging from telecommunications, e-commerce, online marketplaces, and current business travel. Olalekan built the AI Platform 1.0 at Scout24 and currently leads multiple data teams at HRS Group. He is an AWS Machine Learning Community Hero in his spare time. // MLOps Jobs board https://mlops.pallet.xyz/jobs MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links What Customers Want: Using Outcome-Driven Innovation to Create Breakthrough Products and Services book by Anthony Ulwick: https://www.amazon.com/What-Customers-Want-Outcome-Driven-Breakthrough/dp/0071408673 Empowered: Ordinary People, Extraordinary Products by Marty Cagan: https://www.amazon.com/EMPOWERED-Ordinary-Extraordinary-Products-Silicon/dp/111969129X How to Avoid a Climate Disaster: The Solutions We Have and the Breakthroughs We Need by Bill Gates: https://www.amazon.com/How-Avoid-Climate-Disaster-Breakthroughs/dp/059321577X --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/ Connect with Olalekan on LinkedIn: https://www.linkedin.com/in/elesinolalekan/ Timestamps: [00:00] Introduction to Olalekan Elesin [00:42] Takeaways [02:52] Situation at Scaleout24 [07:53] Data landscape engineer and architect [11:27] Depiction of events [13:53] Platform approach investment [15:59] Exceptional need or opportunity to the most intense need [17:41] Long-tail pieces [22:01] Metrics [24:15] Nitty-gritty product works [26:00] Educating people metrics [30:02] Upskilling fundamentals of the product discipline [34:05] Investing in AWS [37:53] Best-of-breed tools [44:34] Continuous development for AutoML [47:26] Rapid fire questions [52:19] Wrap up

Gradient Dissent - A Machine Learning Podcast by W&B
Jordan Fisher — Skipping the Line with Autonomous Checkout

Gradient Dissent - A Machine Learning Podcast by W&B

Play Episode Listen Later Aug 4, 2022 57:58


Jordan Fisher is the CEO and co-founder of Standard AI, an autonomous checkout company that's pushing the boundaries of computer vision. In this episode, Jordan discusses “the Wild West” of the MLOps stack and tells Lukas why Rust beats Python. He also explains why AutoML shouldn't be overlooked and uses a bag of chips to help explain the Manifold Hypothesis. Show notes (transcript and links): http://wandb.me/gd-jordan-fisher --- ⏳ Timestamps: 00:00 Intro 00:40 The origins of Standard AI 08:30 Getting Standard into stores 18:00 Supervised learning, the advent of synthetic data, and the manifold hypothesis 24:23 What's important in an MLOps stack 27:32 The merits of AutoML 30:00 Deep learning frameworks 33:02 Python versus Rust 39:32 Raw camera data versus video 42:47 The future of autonomous checkout 48:02 Sharing the StandardSim data set 52:30 Picking the right tools 54:30 Overcoming dynamic data set challenges 57:35 Outro --- Connect with Jordan and Standard AI

The Cloud Pod
170: The Cloud Pod Is Also Intentionally Paranoid

The Cloud Pod

Play Episode Listen Later Jun 30, 2022 53:24


On The Cloud Pod this week, the team discusses Jonathan's penance for his failures. Plus: Microsoft makes moves on non-competes, NDAs, salary disclosures, and a civil rights audit; AWS modernizes mainframe applications for cloud deployment; and AWS CEO Adam Selipsky chooses to be intentionally paranoid. A big thanks to this week's sponsor, Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. This week's highlights

AI in Action Podcast
E353 Oscar Beijbom, Co-Founder at Nyckel

AI in Action Podcast

Play Episode Listen Later Jun 17, 2022 14:10


Today's guest is Oscar Beijbom, Co-Founder at Nyckel in San Francisco. Founded in 2021, Nyckel is democratizing access to machine learning for companies of all sizes. Instead of needing teams of ML specialists and years of expertise, Nyckel allows developers to add state-of-the-art machine learning to their applications in minutes. Nyckel's API-first design enables fast, secure, fully automated integration of ML into your application. You can also scale to millions of invocations on day one, with instant model deployment to elastic inference infrastructure offering 99.99% uptime and 300ms latency. Their highly parallelized AutoML trains and evaluates top deep learning methods on your data in seconds. In the episode, Oscar will talk about: The motivation for setting up Nyckel, Use cases and the benefits they bring to customers, An insight into their data & engineering team, What's in store for the near future at Nyckel, & Why Nyckel is a great place to work
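To make the "API-first" idea tangible, the sketch below shows what invoking a hosted ML function over HTTP can look like from Python; the endpoint URL, request fields, and API key are invented placeholders for illustration and are not Nyckel's documented API.

# Hypothetical sketch of API-first ML integration. The endpoint, fields, and
# key are invented placeholders; this is NOT Nyckel's documented API.
import requests

API_URL = "https://api.example.com/v1/functions/my-classifier/invoke"  # placeholder
API_KEY = "YOUR_API_KEY"                                               # placeholder

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"data": "Loved the product, shipping was fast!"},  # example text input
    timeout=10,
)
response.raise_for_status()
print(response.json())  # e.g. a predicted label with a confidence score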

SuperDataScience
SDS 573: Automating ML Model Deployment

SuperDataScience

Play Episode Listen Later May 10, 2022 66:34


In this episode, Dr. Doris Xin, co-founder and CEO of Linea, joins Jon Krohn to discuss how automating ML model deployment delivers a groundbreaking change in data science productivity, and shares what it's like being the CEO of an exciting, early-stage tech start-up. In this episode you will learn: • How Linea reduces ML model deployment down to a couple of lines of Python code [5:14] • Linea use cases [11:30] • How DAGs can 10x production workflow efficiency [22:12] • ML model graphlets and reducing wasted computation [24:14] • The future Doris envisions for AutoML [35:23] • Doris's day-to-day life as the CEO of an early-stage start-up [42:43] • What Doris looks for in the engineers and data scientists that she hires [52:21] • The future of Data Science and how best to prepare for it [53:58] Additional materials: www.superdatascience.com/573
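For readers curious what "a couple of lines of Python" can look like in practice, below is a hedged sketch using LineaPy, Linea's open-source library; the calls reflect my reading of its public documentation rather than code from the episode, and it assumes an instrumented session such as a Jupyter notebook with the lineapy extension loaded.

# A hedged LineaPy sketch (assumed usage from public docs, not from the episode).
# Intended to run inside an instrumented session, e.g. a notebook after
# `%load_ext lineapy`.
import lineapy
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Ordinary development code: train a small model
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Save the trained model as a named artifact; LineaPy records the code slice
# that produced it so it can later be re-used or refactored into a pipeline.
lineapy.save(model, "iris_classifier")

# Retrieve the artifact's value later, in another session or job
reloaded_model = lineapy.get("iris_classifier").get_value()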

The Data Canteen
Lak Lakshmanan: Data Science...Broader Than Ever! | The Data Canteen #16

The Data Canteen

Play Episode Listen Later May 1, 2022 57:22


Lak Lakshmanan is an operating executive at Silver Lake focused on improving the value of portfolio companies through data and AI-driven innovation. Prior to Silver Lake, Lak was the Director for Data Analytics and AI Solutions on Google Cloud and a Research Scientist at NOAA. He co-founded Google's Advanced Solutions Lab and is the author of several O'Reilly books and Coursera courses. He was elected a Fellow of the American Meteorological Society (the highest honor offered by the AMS) for his data science work. In this episode, Host Ted Hallum and Lak dive into Google Cloud's evolved view of what data science encompasses (IT'S BROADER THAN EVER!), the biggest developments that Lak sees on the horizon for our field, how to deal with the blistering pace of change in the datasphere, and the challenges faced by all non-tech organizations that are striving for an edge with AI.     FEATURED GUEST: Name: Lak Lakshmanan LinkedIn: https://www.linkedin.com/in/valliappalakshmanan/ Twitter: https://twitter.com/lak_luster Medium: https://lakshmanok.medium.com/     SUPPORT THE DATA CANTEEN (LIKE PBS, WE'RE LISTENER SUPPORTED!): Donate: https://vetsindatascience.com/support-join     EPISODE LINKS: Lak on Medium: https://lakshmanok.medium.com/ Lak on Google Scholar: https://scholar.google.com/citations?user=qphajtkAAAAJ&hl=en Lak's Website: www.vlakshman.com Lak's Books: https://aisoftwarellc.weebly.com/books.html Lak's Technical Articles: https://aisoftwarellc.weebly.com/articles.html Lak's Recorded Talks: https://aisoftwarellc.weebly.com/talks.html Lak's Courses: https://aisoftwarellc.weebly.com/courses.html Lak's Journal Articles: https://aisoftwarellc.weebly.com/research.html Lak's Resume/Vitae: https://aisoftwarellc.weebly.com/resumevitae.html     PODCAST INFO: Host: Ted Hallum Website: https://vetsindatascience.com/thedatacanteen Apple Podcasts: https://podcasts.apple.com/us/podcast/the-data-canteen/id1551751086 YouTube: https://www.youtube.com/channel/UCaNx9aLFRy1h9P22hd8ZPyw Stitcher: https://www.stitcher.com/show/the-data-canteen     CONTACT THE DATA CANTEEN: Voicemail: https://www.speakpipe.com/datacanteen     VETERANS IN DATA SCIENCE AND MACHINE LEARNING: Website: https://vetsindatascience.com/ Join the Community: https://vetsindatascience.com/support-join Mentorship Program: https://vetsindatascience.com/mentorship     OUTLINE: 00:00:00​ - Introduction 00:02:15 - How Lak got into data science 00:10:06 - How to deal with the blistering pace of change in the datasphere 00:16:45 - Google Cloud's evolved view of what data science encompasses 00:25:48 - Is Google Cloud's new definition of data science inspired by MLOps? 00:28:05 - Biggest developments that Lak sees on the horizon for our field 00:33:59 - How capable do you think AutoML is? What role will it play in the future? 00:39:59 - Where Lak suggests veterans should focus when transitioning into data  00:47:22 - Challenges faced by all non-tech organizations that are striving for an edge with AI 00:51:25 - Lak's secret to keeping his learning and growth focus on track 00:54:38 - Lak's favorite way to learn new things 00:56:12 - How Lak prefers to be contacted 00:56:34 - Farewells

Lights On Data Show
Getting Started with AutoML

Lights On Data Show

Play Episode Listen Later Jan 14, 2022 41:18


Do you want to get started with AutoML? Join us and Nathan George, Author & Data Scientist at Tink, to learn: what AutoML is and what it is used for; which environments AutoML works with; the projects and tasks AutoML is recommended for; and best practices and advice. Plus, advice for those wanting to get into data science. Don't forget to check out Nathan George's book, "Practical Data Science with Python": https://packt.link/ngeorge
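As one possible starting point in the spirit of this episode, here is a minimal AutoML sketch using the open-source TPOT library; the library choice, dataset, and parameters are illustrative assumptions and may differ from the tools Nathan covers in the episode or the book.

# Getting-started AutoML sketch with TPOT (library, dataset, and parameters are
# illustrative choices, not necessarily what the episode or book covers).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# TPOT searches over preprocessing + model pipelines using genetic programming
tpot = TPOTClassifier(generations=5, population_size=20, random_state=42, verbosity=2)
tpot.fit(X_train, y_train)

print(tpot.score(X_test, y_test))   # held-out accuracy of the best pipeline found
tpot.export("best_pipeline.py")     # plain scikit-learn code for that pipeline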