Podcasts about linear regression

  • 56 PODCASTS
  • 85 EPISODES
  • 28m AVG DURATION
  • INFREQUENT EPISODES
  • Aug 22, 2024 LATEST

POPULARITY (trend chart, 2017–2024)


Best podcasts about linear regression

Latest podcast episodes about linear regression

Machine Learning Street Talk
Adversarial Examples and Data Modelling - Andrew Ilyas (MIT)

Aug 22, 2024 · 88:00


Andrew Ilyas is a PhD student at MIT who is about to start as a professor at CMU. We discuss: data modeling and understanding how datasets influence model predictions; adversarial examples in machine learning and why they occur; robustness in machine learning models; black-box attacks on machine learning systems; biases in data collection and dataset creation, particularly in ImageNet; and self-selection bias in data and methods to address it.

MLST is sponsored by Brave: The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api

Andrew's site: https://andrewilyas.com/ https://x.com/andrew_ilyas

TOC:
00:00:00 - Introduction and Andrew's background
00:03:52 - Overview of the machine learning pipeline
00:06:31 - Data modeling paper discussion
00:26:28 - TRAK: Evolution of data modeling work
00:43:58 - Discussion on abstraction, reasoning, and neural networks
00:53:16 - "Adversarial Examples Are Not Bugs, They Are Features" paper
01:03:24 - Types of features learned by neural networks
01:10:51 - Black box attacks paper
01:15:39 - Work on data collection and bias
01:25:48 - Future research plans and closing thoughts

References:
Adversarial Examples Are Not Bugs, They Are Features https://arxiv.org/pdf/1905.02175 (blog post: https://gradientscience.org/adv/)
TRAK: Attributing Model Behavior at Scale https://arxiv.org/pdf/2303.14186
Datamodels: Predicting Predictions from Training Data https://arxiv.org/pdf/2202.00622
ImageNet-trained CNNs are biased towards texture https://arxiv.org/pdf/1811.12231
ZOO: Zeroth Order Optimization Based Black-box Attacks https://arxiv.org/pdf/1708.03999
A Spline Theory of Deep Networks https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
Scaling Monosemanticity https://transformer-circuits.pub/2024/scaling-monosemanticity/
Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies https://proceedings.mlr.press/v235/bartoldson24a.html
Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors https://arxiv.org/abs/1807.07978
Estimation of Standard Auction Models https://arxiv.org/abs/2205.02060
From ImageNet to Image Classification: Contextualizing Progress on Benchmarks https://arxiv.org/abs/2005.11295
What Makes A Good Fisherman? Linear Regression under Self-Selection Bias https://arxiv.org/abs/2205.03246
Towards Tracing Factual Knowledge in Language Models Back to the Training Data [Akyürek] https://arxiv.org/pdf/2205.11482
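The "adversarial examples" thread is easier to picture with a toy construction. Below is a minimal numpy sketch (not code from the episode) of the fast gradient sign method, the classic way such examples are generated; the logistic model, weights, and data are all synthetic stand-ins.

```python
import numpy as np

# FGSM sketch: a small perturbation aligned with the sign of the input
# gradient flips a linear classifier's prediction.
rng = np.random.default_rng(0)
w = rng.normal(size=784)            # hypothetical trained weights
x = rng.normal(size=784)            # one flattened input "image"
y = 1.0                             # true label in {0, 1}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# For logistic loss, the gradient w.r.t. the *input* is (p - y) * w.
p = sigmoid(w @ x)
grad_x = (p - y) * w

eps = 0.05                          # perturbation budget per pixel
x_adv = x + eps * np.sign(grad_x)   # maximizes loss under an L-inf constraint

print("clean prediction:      ", round(float(sigmoid(w @ x)), 4))
print("adversarial prediction:", round(float(sigmoid(w @ x_adv)), 4))
```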

Data Engineering Podcast
When And How To Conduct An AI Program

Mar 3, 2024 · 46:25


Summary
Artificial intelligence technologies promise to revolutionize business and produce new sources of value. In order to make those promises a reality there is a substantial amount of strategy and investment required. Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster (https://www.dataengineeringpodcast.com/dagster) today to get started. Your first 30 days are free!
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
Join us at the top event for the global data community, Data Council Austin. From March 26-28th 2024, we'll play host to hundreds of attendees, 100 top speakers and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data and sharing their insights and learnings through deeply technical talks. As a listener to the Data Engineering Podcast you can get a special discount off regular priced and late bird tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit dataengineeringpodcast.com/data-council (https://www.dataengineeringpodcast.com/data-council) and use code dataengpod20 to register today!
Your host is Tobias Macey and today I'm interviewing Colleen Tartow about the questions to answer before and during the development of an AI program.

Interview
Introduction
How did you get involved in the area of data management?
When you say "AI Program", what are the organizational, technical, and strategic elements that it encompasses?
How does the idea of an "AI Program" differ from an "AI Product"?
What are some of the signals to watch for that indicate an objective for which AI is not a reasonable solution?
Who needs to be involved in the process of defining and developing that program?
What are the skills and systems that need to be in place to effectively execute on an AI program?
"AI" has grown to be an even more overloaded term than it already was. What are some of the useful clarifying/scoping questions to address when deciding the path to deployment for different definitions of "AI"?
Organizations can easily fall into the trap of green-lighting an AI project before they have done the work of ensuring they have the necessary data and the ability to process it. What are the steps to take to build confidence in the availability of the data?
Even if you are sure that you can get the data, what are the implementation pitfalls that teams should be wary of while building out the data flows for powering the AI system?
What are the key considerations for powering AI applications that are substantially different from analytical applications?
The ecosystem for ML/AI is a rapidly moving target. What are the foundational/fundamental principles that you need to design around to allow for future flexibility?
What are the most interesting, innovative, or unexpected ways that you have seen AI programs implemented?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on powering AI systems?
When is AI the wrong choice?
What do you have planned for the future of your work at VAST Data?

Contact Info
LinkedIn (https://www.linkedin.com/in/colleen-tartow-phd/)

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.

Links
VAST Data (https://vastdata.com/)
Colleen's Previous Appearance (https://www.dataengineeringpodcast.com/starburst-lakehouse-modern-data-architecture-episode-304)
Linear Regression (https://en.wikipedia.org/wiki/Linear_regression)
CoreWeave (https://www.coreweave.com/)
Lambda Labs (https://lambdalabs.com/)
MAD Landscape (https://mattturck.com/mad2023/)
Podcast Episode (https://www.dataengineeringpodcast.com/mad-landscape-2023-data-infrastructure-episode-369)
ML Episode (https://www.themachinelearningpodcast.com/mad-landscape-2023-ml-ai-episode-21)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

The Machine Learning Podcast
Strategies For Building A Product Using LLMs At DataChat

Mar 3, 2024 · 48:40


Summary
Large Language Models (LLMs) have rapidly captured the attention of the world with their impressive capabilities. Unfortunately, they are often unpredictable and unreliable. This makes building a product based on their capabilities a unique challenge. Jignesh Patel is building DataChat to bring the capabilities of LLMs to organizational analytics, allowing anyone to have conversations with their business data. In this episode he shares the methods that he is using to build a product on top of this constantly shifting set of technologies.

Announcements
Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Your host is Tobias Macey and today I'm interviewing Jignesh Patel about working with LLMs; understanding how they work and how to build your own.

Interview
Introduction
How did you get involved in machine learning?
Can you start by sharing some of the ways that you are working with LLMs currently?
What are the business challenges involved in building a product on top of an LLM model that you don't own or control?
In the current age of business, your data is often your strategic advantage. How do you avoid losing control of, or leaking that data while interfacing with a hosted LLM API?
What are the technical difficulties related to using an LLM as a core element of a product when they are largely a black box?
What are some strategies for gaining visibility into the inner workings or decision making rules for these models?
What are the factors, whether technical or organizational, that might motivate you to build your own LLM for a business or product?
Can you unpack what it means to "build your own" when it comes to an LLM?
In your work at DataChat, how has the progression of sophistication in LLM technology impacted your own product strategy?
What are the most interesting, innovative, or unexpected ways that you have seen LLMs/DataChat used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working with LLMs?
When is an LLM the wrong choice?
What do you have planned for the future of DataChat?

Contact Info
Website (https://jigneshpatel.org/)
LinkedIn (https://www.linkedin.com/in/jigneshmpatel/)

Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast (https://www.dataengineeringpodcast.com) covers the latest on modern data management. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used.
Visit the site (https://www.themachinelearningpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com (mailto:hosts@themachinelearningpodcast.com) with your story.
To help other people find the show please leave a review on iTunes (https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243) and tell your friends and co-workers.

Links
DataChat (https://datachat.ai/)
CMU == Carnegie Mellon University (https://www.cmu.edu/)
SVM == Support Vector Machine (https://en.wikipedia.org/wiki/Support_vector_machine)
Generative AI (https://en.wikipedia.org/wiki/Generative_artificial_intelligence)
Genomics (https://en.wikipedia.org/wiki/Genomics)
Proteomics (https://en.wikipedia.org/wiki/Proteomics)
Parquet (https://parquet.apache.org/)
OpenAI Codex (https://openai.com/blog/openai-codex)
LLaMA (https://en.wikipedia.org/wiki/LLaMA)
Mistral (https://mistral.ai/)
Google Vertex (https://cloud.google.com/vertex-ai)
Langchain (https://www.langchain.com/)
Retrieval Augmented Generation (https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/)
Prompt Engineering (https://en.wikipedia.org/wiki/Prompt_engineering)
Ensemble Learning (https://en.wikipedia.org/wiki/Ensemble_learning)
XGBoost (https://xgboost.readthedocs.io/en/stable/)
Catboost (https://catboost.ai/)
Linear Regression (https://en.wikipedia.org/wiki/Linear_regression)
COGS == Cost Of Goods Sold (https://www.investopedia.com/terms/c/cogs.asp)
Bruce Schneier - AI And Trust (https://www.schneier.com/blog/archives/2023/12/ai-and-trust.html)

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano (https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/)

The Sweet Spot
#6 - Turning Data Into Wins w/ R.J. Michaels

Feb 14, 2024 · 35:16


Jake and Mark sit down with R.J. Michaels (@rj_michaels), former Director of Analytics and Assistant Coach for the University of Rochester Baseball Team. R.J. transitioned from a player to a coach after his freshman year, and helped lead the team for several years. We focus on the data that R.J. collected and how he communicated his insights to coaches and players to create more wins.

(0:00) Intro
(2:27) R.J.'s Baseball Career
(4:31) Navigating the End of One's Playing Career
(6:38) R.J.'s Role as Director of Analytics
(9:17) Building Trust Through Success
(18:13) The Simplicity and Utility of Linear Regression
(23:05) Data Visualization (No Disrespect to Gerrit Cole)
(24:14) From Data-Driven Insights to Creating Change
(25:59) Communication and Impact

Software Engineering Radio - The Podcast for Professional Software Developers
SE Radio 594: Sean Moriarity on Deep Learning with Elixir and Axon

Dec 14, 2023 · 57:43


Sean Moriarity, creator of the Axon deep learning framework, co-creator of the Nx library, and author of Machine Learning in Elixir and Genetic Algorithms in Elixir, published by the Pragmatic Bookshelf, speaks with SE Radio host Gavin Henry about what deep learning (neural networks) means today. Using a practical example with deep learning for fraud detection, they explore what Axon is and why it was created. Moriarity describes why the Beam is ideal for machine learning, and why he dislikes the term “neural network.” They discuss the need for deep learning, its history, how it offers a good fit for many of today's complex problems, where it shines and when not to use it. Moriarity goes into depth on a range of topics, including how to get datasets in shape, supervised and unsupervised learning, feed-forward neural networks, Nx.serving, decision trees, gradient descent, linear regression, logistic regression, support vector machines, and random forests. The episode considers what a model looks like, what training is, labeling, classification, regression tasks, hardware resources needed, EXGBoost, Jax, PyIgnite, and Explorer. Finally, they look at what's involved in the ongoing lifecycle or operational side of Axon once a workflow is put into production, so you can safely back it all up and feed in new data. Brought to you by IEEE Computer Society and IEEE Software magazine. This episode sponsored by Miro.

The Nonlinear Library
LW - If influence functions are not approximating leave-one-out, how are they supposed to help? by Fabien Roger

Sep 22, 2023 · 5:55


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: If influence functions are not approximating leave-one-out, how are they supposed to help?, published by Fabien Roger on September 22, 2023 on LessWrong.

Thanks to Roger Grosse for helping me understand his intuitions and hopes for influence functions. This post combines highlights from some influence function papers, some of Roger Grosse's intuitions (though he doesn't agree with everything I'm writing here), and some takes of mine.

Influence functions are informally about some notion of influence of a training data point on the model's weights. But in practice, for neural networks, "influence functions" do not approximate well "what would happen if a training data point was removed". Then, what are influence functions about, and what can they be used for?

From leave-one-out to influence functions

Ideas from Bae 2022 (If influence functions are the answer, what is the question?). The leave-one-out function is the answer to "what would happen, in a network trained to its global minimum, if one point was omitted":

$$\mathrm{LOO}(\hat{x},\hat{y}) = \arg\min_{\theta} \frac{1}{N} \sum_{(x,y) \in D \setminus \{(\hat{x},\hat{y})\}} L(f_{\theta}(x), y)$$

Under some assumptions, such as a strongly convex loss landscape, influence functions are a cheap-to-compute approximation to the leave-one-out function, thanks to the Implicit Function Theorem, which tells us that under those assumptions

$$\mathrm{LOO}(\hat{x},\hat{y}) \approx \mathrm{IF}(\hat{x},\hat{y}) \stackrel{\mathrm{def}}{=} \theta^{*} + \frac{1}{N}\left(\nabla^{2}_{\theta} J(\theta^{*})\right)^{-1} \nabla_{\theta} L(f(\theta^{*},\hat{x}), \hat{y})$$

But these assumptions don't hold for neural networks, and Basu 2020 shows that influence functions are a terrible approximation of leave-one-out in the context of neural networks, as shown in a figure from Bae 2022 (the left panel, linear regression, is where the approximation holds; the right panel, a multilayer perceptron, is where it doesn't). Moreover, even the leave-one-out function is about parameters at convergence, which is not the regime most deep learning training runs operate in. Therefore, influence functions are even less about answering the question "what would happen if this point had more/less weight in the (incomplete) training run?". So every time you see someone introducing influence functions as an approximation of the effect of up/down-weighting training data points (as in this LW post about interpretability), remember that this does not apply when they are applied to neural networks.

What are influence functions doing

Bae 2022 shows that influence functions (not leave-one-out!) can be well approximated by the minimization of another training objective called PBRF, which is the sum of 3 terms:

$\mathbb{E}_{x \sim D}[L(f_{\theta}(x), f_{\theta_t}(x))]$: the loss function with the soft labels as computed by the studied function with weights after training $\theta_t$. The new $\theta$ should not change the output of the function much.

$-\frac{1}{N} L(f_{\theta}(\hat{x}), \hat{y})$: the opposite of the loss function on the target point. The new $\theta$ should give a high loss on the considered data point.

$\lambda \|\theta - \theta_t\|^2$: a penalization of weights very different from the final training weights. (Roger told me the specific value of $\lambda$ didn't have a huge influence on the result.)

This does not answer the often advertised question about leave-one-out, but it does answer something which looks related, and which happens to be much cheaper to compute than the leave-one-out function (which can only be computed by retraining the network and doesn't have cheaper approximations).
Influence functions are currently among the few options to say anything about the intuitive "influence" of individual data points in large neural networks, which justifies why they are used. (Alternatives have roughly the same kind of challenges as influence functions.) Note: this explanation of what influence functions are doing is not the only way to describe their behavior, and other works may shed new light on what they are doing. What are influence functions useful for Current empirical evidence To this date, there has been almost no work externally ...
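To make the contrast concrete, here is a small numpy sketch (mine, not the post's) of leave-one-out retraining versus the influence-function approximation in the strongly convex regime the post describes, ridge-regularized linear regression, where the approximation does hold:

```python
import numpy as np

# Compare exact leave-one-out retraining with the influence-function
# approximation for ridge-regularized linear regression (strongly convex,
# so the approximation should be accurate). All data is synthetic.
rng = np.random.default_rng(0)
N, d, lam = 200, 5, 1e-2
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)

def fit(X, y):
    # Minimizer of (1/2)||X theta - y||^2 + (lam/2)||theta||^2
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

theta = fit(X, y)

# Exact leave-one-out: retrain without point i.
i = 0
theta_loo = fit(np.delete(X, i, axis=0), np.delete(y, i))

# Influence function: one Newton step using the full-data Hessian and the
# gradient of point i's loss at the trained parameters.
H = X.T @ X + lam * np.eye(d)
g = X[i] * (X[i] @ theta - y[i])
theta_if = theta + np.linalg.solve(H, g)

print("||LOO - IF||   =", np.linalg.norm(theta_loo - theta_if))
print("||LOO - full|| =", np.linalg.norm(theta_loo - theta))
```

The first distance is tiny relative to the second, which is exactly the behavior the post says breaks down once the loss landscape is no longer convex.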

Destination Devy Podcast
Predicting 2023 and 2022 QBs with Regression Analysis

Jun 6, 2023 · 17:07


Jordan Backes (@JordanBackes33) takes a look at trying to predict how the 2023 Rookie QB class will perform their first year in the NFL through multiple regression models - Decision Trees and Linear Regression. JB also takes a look at the year 2 model for the 2022 QBs through the same methods. Which method is superior, how do these signal callers look going forward, and who are some buys and sells? All Gas Newsletter - https://allgas.beehiiv.com/ Patreon - https://Patreon.com/AllGas Learn more about your ad choices. Visit megaphone.fm/adchoices
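For readers curious what a "Decision Trees vs. Linear Regression" comparison looks like in practice, here is a toy sketch with invented rookie-QB features; this is not JB's actual data or model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical rookie-QB features and a first-year production target.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))   # e.g. yards/attempt, TD rate, rush yards, draft slot
y = X @ np.array([3.0, 2.0, 1.5, -1.0]) + rng.normal(scale=2.0, size=60)

# Cross-validation gives an honest comparison of the two model families.
for model in (LinearRegression(), DecisionTreeRegressor(max_depth=3, random_state=0)):
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{type(model).__name__}: mean CV R^2 = {scores.mean():.3f}")
```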

Destination Devy Podcast
Predicting 2023 and 2022 WRs with Regression Analysis

May 16, 2023 · 22:45


Jordan Backes (@JordanBackes33) takes a look at trying to predict how the 2023 rookie WR class will perform their first year in the NFL through multiple regression models - Decision Trees and Linear Regression. JB also takes a look at the year 2 model for the 2022 WR class through the same methods. Which method is superior and how do these WRs project going forward? All Gas Newsletter - https://allgas.beehiiv.com/ Patreon - https://Patreon.com/AllGas Learn more about your ad choices. Visit megaphone.fm/adchoices

AI Today Podcast: Artificial Intelligence Insights, Experts, and Opinion
AI Today Podcast: AI Glossary Series – Regression and Linear Regression

Mar 29, 2023 · 7:46


Regression is a statistical and mathematical technique to find the relationship between two or more variables. In this episode of the AI Today podcast hosts Kathleen Walch and Ron Schmelzer define the terms Regression and Linear Regression and explain how they relate to AI and why it's important to know about them. Show Notes: FREE Intro to CPMAI mini course CPMAI Training and Certification AI Glossary AI Glossary Series – Machine Learning, Algorithm, Model Glossary Series: Machine Learning Approaches: Supervised Learning, Unsupervised Learning, Reinforcement Learning Glossary Series: Classification & Classifier, Binary Classifier, Multiclass Classifier, Decision Boundary Glossary Series: Clustering, Cluster Analysis, K-Means, Gaussian Mixture Model Continue reading AI Today Podcast: AI Glossary Series – Regression and Linear Regression at AI & Data Today.

The Machine Learning Podcast
Real-Time Machine Learning Has Entered The Realm Of The Possible

Mar 9, 2023 · 34:29


Summary
Machine learning models have predominantly been built and updated in a batch modality. While this is operationally simpler, it doesn't always provide the best experience or capabilities for end users of the model. Tecton has been investing in the infrastructure and workflows that enable building and updating ML models with real-time data to allow you to react to real-world events as they happen. In this episode CTO Kevin Stumpf explores the benefits of real-time machine learning and the systems that are necessary to support the development and maintenance of those models.

Announcements
Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Your host is Tobias Macey and today I'm interviewing Kevin Stumpf about the challenges and promise of real-time ML applications.

Interview
Introduction
How did you get involved in machine learning?
Can you describe what real-time ML is and some examples of where it might be applied?
What are the operational and organizational requirements for being able to adopt real-time approaches for ML projects?
What are some of the ways that real-time requirements influence the scale/scope/architecture of an ML model?
What are some of the failure modes for real-time vs analytical or operational ML?
Given the low latency between source/input data being generated or received and a prediction being generated, how does that influence susceptibility to e.g. data drift?
Data quality and accuracy also become more critical. What are some of the validation strategies that teams need to consider as they move to real-time?
What are the most interesting, innovative, or unexpected ways that you have seen real-time ML applied?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on real-time ML systems?
When is real-time the wrong choice for ML?
What do you have planned for the future of real-time support for ML in Tecton?

Contact Info
LinkedIn (https://www.linkedin.com/in/kevinstumpf/)
@kevinmstumpf (https://twitter.com/kevinmstumpf?lang=en) on Twitter

Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast (https://www.dataengineeringpodcast.com) covers the latest on modern data management. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used.
Visit the site (https://www.themachinelearningpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com (mailto:hosts@themachinelearningpodcast.com) with your story.
To help other people find the show please leave a review on iTunes (https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243) and tell your friends and co-workers.

Links
Tecton (https://www.tecton.ai/)
Podcast Episode (https://www.themachinelearningpodcast.com/tecton-machine-learning-feature-platform-episode-6/)
Data Engineering Podcast Episode (https://www.dataengineeringpodcast.com/tecton-mlops-feature-store-episode-166/)
Uber Michelangelo (https://www.uber.com/blog/michelangelo-machine-learning-platform/)
Reinforcement Learning (https://en.wikipedia.org/wiki/Reinforcement_learning)
Online Learning (https://en.wikipedia.org/wiki/Online_machine_learning)
Random Forest (https://en.wikipedia.org/wiki/Random_forest)
ChatGPT (https://openai.com/blog/chatgpt)
XGBoost (https://xgboost.ai/)
Linear Regression (https://en.wikipedia.org/wiki/Linear_regression)
Train-Serve Skew (https://ploomber.io/blog/train-serve-skew/)
Flink (https://flink.apache.org/)
Data Engineering Podcast Episode (https://www.dataengineeringpodcast.com/apache-flink-with-fabian-hueske-episode-57/)

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano (https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/)

The AI Frontier Podcast
#6 - The Rise of AI: A Journey Through the History of Deep Learning

Feb 26, 2023 · 12:40


In this episode of The AI Frontier, join us as we embark on a journey through the history of deep learning and artificial intelligence. From the earliest days of linear regression to the latest advancements in generative adversarial networks, we will explore the key moments and milestones that have shaped the development of this groundbreaking field. Learn about the pioneers and trailblazers who pushed the boundaries of what was possible, and discover how deep learning has revolutionized the way we think about and interact with technology. Get ready to delve deep into the history of AI!

AI Today Podcast: Artificial Intelligence Insights, Experts, and Opinion
AI Today Podcast: AI Glossary – Machine Learning Approaches: Supervised Learning, Unsupervised Learning, Reinforcement Learning

Feb 8, 2023 · 9:37


In this episode of the AI Today podcast hosts Kathleen Walch and Ron Schmelzer define terms related to Machine Learning Approaches including Supervised Learning, Unsupervised Learning, Reinforcement Learning and explain how they relate to AI and why it's important to know about them. Show Notes: FREE Intro to CPMAI mini course CPMAI Training and Certification AI Glossary Glossary Series: Artificial Intelligence AI Glossary Series – Machine Learning, Algorithm, Model Glossary Series: Probabilistic & Deterministic Glossary Series: Classification & Classifier, Binary Classifier, Multiclass Classifier, Decision Boundary Glossary Series: Regression, Linear Regression Glossary Series: Clustering, Cluster Analysis, K-Means, Gaussian Mixture Model Glossary Series: Goal-Driven Systems & Roboadvisor Understanding the Goal-Driven Systems Pattern of AI Continue reading AI Today Podcast: AI Glossary – Machine Learning Approaches: Supervised Learning, Unsupervised Learning, Reinforcement Learning at AI & Data Today.

AI Today Podcast: Artificial Intelligence Insights, Experts, and Opinion
AI Today Podcast: AI Glossary Series – Prediction, Inference, and Generalization

Jan 27, 2023 · 10:45


In this episode of the AI Today podcast hosts Kathleen Walch and Ron Schmelzer define and discuss at a high level the terms Prediction, Inference, and Generalization, why it's important to understand these terms, and how they fit into the overall picture of AI. Show Notes: FREE Intro to CPMAI mini course CPMAI Training and Certification AI Glossary Glossary Series: Machine Learning, Algorithm, Model Glossary Series: Classification & Classifier, Binary Classifier, Multiclass Classifier, Decision Boundary Glossary Series: Regression, Linear Regression Continue reading AI Today Podcast: AI Glossary Series – Prediction, Inference, and Generalization at Cognilytica.

Astro arXiv | all categories
Radiative Transfer as a Bayesian Linear Regression problem

Nov 24, 2022 · 0:49


Radiative Transfer as a Bayesian Linear Regression problem by Frederik De Ceuster et al. on Thursday 24 November. Electromagnetic radiation plays a crucial role in various physical and chemical processes. Hence, almost all astrophysical simulations require some form of radiative transfer model. Despite many innovations in radiative transfer algorithms and their implementation, realistic radiative transfer models remain very computationally expensive, such that one often has to resort to approximate descriptions. The complexity of these models makes it difficult to assess the validity of any approximation and to quantify uncertainties on the model results. This impedes scientific rigour, in particular, when comparing models to observations, or when using their results as input for other models. We present a probabilistic numerical approach to address these issues by treating radiative transfer as a Bayesian linear regression problem. This allows us to model uncertainties on the input and output of the model with the variances of the associated probability distributions. Furthermore, this approach naturally allows us to create reduced-order radiative transfer models with a quantifiable accuracy. These are approximate solutions to exact radiative transfer models, in contrast to the exact solutions to approximate models that are often used. As a first demonstration, we derive a probabilistic version of the method of characteristics, a commonly-used technique to solve radiative transfer problems. arXiv: http://arxiv.org/abs/2211.12547v1
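The paper's central move, getting uncertainties alongside estimates from the regression itself, can be seen in the standard conjugate-Gaussian setup. A minimal sketch on toy data (this is not the authors' radiative transfer code):

```python
import numpy as np

# Bayesian linear regression with a Gaussian prior and Gaussian noise:
# the posterior over weights yields a predictive variance for free.
rng = np.random.default_rng(0)
N, d = 50, 3
alpha, sigma2 = 1.0, 0.1          # prior precision, noise variance

X = rng.normal(size=(N, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + np.sqrt(sigma2) * rng.normal(size=N)

# Posterior over weights is N(m, S) with
#   S = (alpha*I + X^T X / sigma2)^(-1),   m = S X^T y / sigma2
S = np.linalg.inv(alpha * np.eye(d) + X.T @ X / sigma2)
m = S @ X.T @ y / sigma2

# Predictive mean and variance at a new input: noise + parameter uncertainty.
x_star = rng.normal(size=d)
pred_mean = x_star @ m
pred_var = sigma2 + x_star @ S @ x_star

print("posterior mean weights:", m.round(3))
print(f"prediction: {pred_mean:.3f} +/- {np.sqrt(pred_var):.3f}")
```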

PaperPlayer biorxiv neuroscience
Mixed Models Quantify Annual Volume Change; Linear Regression Determines Thalamic Volume as the Best Subcortical Structure Volume Predictor in Alzheimer's Disease and Aging

Oct 31, 2022


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.10.29.514239v1?rss=1 Authors: Leger, C. S., Herbert, M., Stevens, W. D., DeSouza, J. F. Abstract: Background: Thalamus-hippocampus-putamen and thalamus-cerebellar interconnections are dense. The extent this connectivity is paralleled by each structure's volume impact on another is unquantified in Alzheimer's disease (AD). Mixed model quantification of annual volume change in AD is scarce and absent inclusive of the cerebellum, hippocampus, putamen and lateral ventricles and thalamus. Among these structures, autopsy evidence of early-stage AD seems largely but not entirely restricted to the hippocampus and thalamus. Objective: Variation in annual volume related to time and baseline age was assessed for the hippocampus, putamen, cerebellum, lateral ventricles and thalamus. Which subcortical structure's volume had the largest explanatory effect of volume variation in other subcortical structures was also determined. Method: The intraclass correlation coefficient was used to assess test-retest reliability of structure automated segmentation. Linear regression (N = 45) determined which structure's volume most impacted volume of other structures. Finally, mixed models (N = 36; 108 data points) quantified annual structure volume change from baseline to 24-months. Results: High test-retest reliability was indicated by a mean ICC score of .989 (SD = .012). Thalamic volume consistently had the greatest explanatory effect of hippocampal, putamen, cerebellar and lateral ventricular volume. The group variable proxy for AD significantly contributed to the best-fitting hippocampal linear regression model, hippocampal and thalamic longitudinal mixed models, and approached significance in the longitudinal lateral ventricular mixed model. Mixed models determined time (1 year) had a negative effect on hippocampal, cerebellar and thalamic volume, no effect on putamen volume, and a positive effect on lateral ventricular volume. Baseline age had a negative effect on hippocampal and thalamic volume, no effect on cerebellar or putamen volume and a positive effect on lateral ventricular volume. Interpretation: Linear regression determined thalamic volume as a virtual centralized index of hippocampal, cerebellar, putamen, and lateral ventricular volume. Relative to linear regression, longitudinal mixed models had greater sensitivity to detect contribution of early AD, or potential AD pathology (MCI), via the group variable not just to volume reduction in the hippocampus but also in the thalamus. Copy rights belong to original authors. Visit the link for more info Podcast created by Paper Player, LLC
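To illustrate the two model families the abstract compares, here is a small statsmodels sketch on fabricated longitudinal volume data; every variable name and number below is invented, not taken from the paper:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Fabricated volumes for two analyses: (1) a linear regression with thalamic
# volume as predictor, (2) a mixed model quantifying annual change with a
# per-subject random intercept.
rng = np.random.default_rng(0)
n_subj, n_visits = 30, 3
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_visits),
    "years": np.tile(np.arange(n_visits), n_subj),        # 0, 1, 2 years in
    "baseline_age": np.repeat(rng.uniform(60, 85, n_subj), n_visits),
})
subj_effect = np.repeat(rng.normal(0, 0.3, n_subj), n_visits)
df["thalamus_vol"] = (7.0 - 0.10 * df["years"]
                      - 0.02 * (df["baseline_age"] - 70)
                      + subj_effect + rng.normal(0, 0.05, len(df)))
df["hippocampus_vol"] = (3.5 + 0.4 * (df["thalamus_vol"] - 7.0)
                         + rng.normal(0, 0.05, len(df)))

# (1) Linear regression: thalamic volume predicting hippocampal volume.
ols = smf.ols("hippocampus_vol ~ thalamus_vol", data=df).fit()
print(ols.params)

# (2) Mixed model: fixed effects of time and baseline age, random intercept
# grouped by subject.
mixed = smf.mixedlm("thalamus_vol ~ years + baseline_age", df,
                    groups=df["subject"]).fit()
print(mixed.summary())
```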

I AM GPH
EP115 Student Ambassadors: The NYU GPH Journey

Aug 29, 2022 · 35:31


In this episode, we chat with NYU GPH students from a variety of specialties. We learn about each individual's GPH journey and what it means to be a GPH student after one year into the program. This is an episode that goes deep into the mindset and motivations behind a true NYU GPH student. All our guests share their individual paths before coming to GPH alongside how NYU GPH has impacted their journey so far. They give us a glimpse into their future goals and dreams in the world of Public Health. Guests: Corrinne Tendo is a 2nd-year full-time online student in Global Health concentration. Relocating from the USA to Canada, the online MPH program has allowed Corrinne to pursue her degree at a world-class university with experienced faculty in different research fields. During Corrinne's first year at NYU, she has completed most of the required courses in addition to electives such as Writing Grants and Funding and Public Health Innovation and Entrepreneurship. Her interests are advocating, developing, and implementing policies that will help strengthen health care systems in vulnerable populations and Lower- and Middle-Income Countries. Although online, Corrinne has been able to participate in different programs at GPH. She is one of the co-directors of the Applied Global Public Health Initiative Lab (AGPHI) for the upcoming school year. She also is the Team Lead for a project focused on developing surveillance and control recommendations for a Neglected Tropical Disease known as Mycetoma in Ethiopia. In addition, as a member of the Climate-Related Displacement Team, Corrinne is working on developing a study protocol to analyze the climate-related challenges of food security and migration experienced by vulnerable populations in southern Madagascar. Mehak Paul is a General Physician (M.B.B.S.) from India and has just completed her first year of MPH with a concentration in Public Health Policy & Management at the School of Global Public Health, NYU. She was elected and recently re-elected as the Graduate Events Officer for GPH's Student Governing Council through which she organized many fun events like the Halloween Haunted House GPH Tour, Women's Day Book Exchange, Mental Health Wellness Workshop and the GPH's Public Health Week. During her first year of MPH, Mehak took Epidemiology, Biostatistics, Global Environmental Health, Public Health Policy and Value-based healthcare. Her objective is to move the healthcare system towards a more value-based and patient-centered delivery system. As part of her internship, she is working at NYC Health+ Hospitals/Elmhurst with CEO Pierre Pham and helping the Department of Patient Experience launch their 'Care Partner Program' to improve patient experience as well as the hospital's global standing. Rebecca Yu is a Master's student in the Biostatistics Department concentrating in Data Science at NYU School of Global Public Health. She has taken classes in Epidemiology, Biostatistics, Machine Learning, Linear Regression, and Statistical Programming in R. She has worked on projects that utilize deep learning techniques that identify melanoma lesion images from non-cancerous lesions, multivariate regression models to predict breast cancer remission odds by race, and linear regression models of lung cancer prevalence by the countries with the highest and lowest fine particulate matter. Rebecca graduated from Columbia University with a Bachelor's in Biology where she studied regenerative medicine under Dr. 
Chang-Hun Lee and had developed a passion for the opposing mechanism of controlled cell proliferation in cancer research. After graduating, she was a researcher in a pancreatic cancer clinical trial under Dr. Vinod Balachandran at Memorial Sloan-Kettering Cancer Center. When the pandemic hit, Rebecca was inspired to enter public health after being moved by the disparities emphasized by COVID-19 treatment. Currently, Rebecca is a researcher in the NIH “All of Us” project under Dr. Farzana Kapadia in understanding cancer disparities over time. In this initiative, Rebecca quantitatively isolates racial disparities in access to cancer screening which can, in turn, be used to provide information to develop interventions that promote timely cancer screening, testing and treatment. She is also the Graduate President of the Student Governing Council where she fosters a sense of community, promotes leadership, and professional development opportunities in public health. To learn more about the NYU School of Global Public Health, and how our innovative programs are training the next generation of public health leaders, visit publichealth.nyu.edu.

HD a Third Culture Kid
ep72: My June and Half of July in Review (Roe v Wade, machine learning, in-person meetings, apartment hunting in New York)

Jul 19, 2022 · 47:18


Roe v Wade was overturned and Twitter blew up with mutually exclusive pro-choice and pro-life reactions. My reaction didn't neatly fall into either camp. I'm also taking a Python machine learning course on Udemy and have completed the Linear Regression, Logistic Regression, and K Nearest Neighbors modules. I'm amazed at how accessible it is for us to gain this ability to become modern day prophets. My life has also included more in-person meetings and apartment hunting in New York. ------------------------------------------------ Don't forget to subscribe to the podcast and follow me on Instagram for more random thoughts, conversations, and experiences from a third culture kid that's finding his way around the world. A third culture kid is someone who grew up in a country that's different from the one his parents were raised in. I grew up in Brazil, was raised in a Korean household, and was educated in an American school - all at once.

MyPersonalFeed
02 - Linear Regression & Gradient Descent

Jul 13, 2022 · 78:16



ResearchPod
A parametric framework for multidimensional linear regression

May 20, 2022 · 10:10 · Transcription available


Dr Stanley Luck of Vector Analytics LLC has developed a novel parametric framework for multidimensional linear regression, following collaborative research and developments involving the identification of beneficial agronomic variation in maize.

Read more in Research Outreach.

Read the original article here: https://doi.org/10.1371/journal.pone.0262148

People Analytics Deconstructed
What is Multiple Linear Regression?

May 13, 2022 · 31:03


Earlier in this season, we discussed a commonly used technique called simple linear regression. In this technique, we used one variable to predict an outcome. But, let's face it – life is a little bit more complex than just having one predictor, and many times organizations have lots of data that can be used to predict an outcome. In another technically focused episode, co-hosts Ron Landis and Jennifer Miller deconstruct multiple linear regression. They focus on using multiple predictors to predict a single criterion variable.

In this episode, we had conversations around the following questions:
What is multiple linear regression?
What are some applications of multiple linear regression?
What are some of the ways in which models can be built using multiple linear regression?
What is mediation and moderation?

2 Key Takeaways on Multiple Linear Regression
Multiple linear regression uses multiple variables to predict an outcome (i.e., criterion) variable. The ultimate goal is to explain the variation in the criterion variable. One aspect to consider in this analysis is the relation between variables; that is, to what degree do the predictor variables correlate and how does that relation predict the outcome variable. Depending on the relation between predictors, either partial or full redundancy might be present.
Ron and Jennifer discussed three questions that can be asked using multiple linear regression. First, you can assess the effects of particular predictors while controlling for others. Second, you can compare different sets of variables to find the most efficient model. Third, you can test for moderation and mediation.

Related Links
Millan Chicago
What is Linear Regression?
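The first of those three questions, assessing one predictor while controlling for another, is easy to demonstrate. A minimal sketch with invented people-analytics data (not from the episode):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented data: engagement and tenure are correlated, and both feed
# performance, so the simple and multiple models disagree about engagement.
rng = np.random.default_rng(42)
n = 300
tenure = rng.uniform(0, 10, n)
engagement = 0.3 * tenure + rng.normal(0, 1, n)
performance = 2.0 + 0.5 * engagement + 0.2 * tenure + rng.normal(0, 1, n)
df = pd.DataFrame({"performance": performance,
                   "engagement": engagement,
                   "tenure": tenure})

simple = smf.ols("performance ~ engagement", data=df).fit()
multiple = smf.ols("performance ~ engagement + tenure", data=df).fit()

# The engagement coefficient shrinks toward its true value (0.5) once
# tenure is controlled for -- partial redundancy between predictors.
print("simple:  ", round(simple.params["engagement"], 3),
      " R^2:", round(simple.rsquared, 3))
print("multiple:", round(multiple.params["engagement"], 3),
      " R^2:", round(multiple.rsquared, 3))
```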

Andrea Unger Academy - EN
210 Linear Regression: Indicator Explained + How To Exploit It in a Trading System

Jan 25, 2022 · 8:37


Tech Stories
EP-19 No Regrets of Linear Thinking in Non Linear ML World

Dec 12, 2021 · 6:59


In this episode I tried to explain the term linear regression and its types, with the help of a physical interpretation, using linear thinking in this non-linear world. Regression is a measure of the relation between the mean value of one variable (e.g. output) and corresponding values of other variables (e.g. time and cost). Linear regression is a supervised machine learning algorithm where the predicted output is continuous and has a constant slope. It's used to predict values within a continuous range (e.g. sales, price) rather than trying to classify them into categories (e.g. cat, dog). Listen to the episode on any podcast platform and share your feedback as comments here.

Follow me on Instagram: https://www.instagram.com/podcasteramit
Apple: https://podcasts.apple.com/us/podcast/id1544510362
Hubhopper: https://hubhopper.com/podcast/tech-stories/318515
Amazon: https://music.amazon.com/podcasts/2fdb5c45-2016-459e-ba6a-3cbae5a1fa4d
Spotify: https://open.spotify.com/show/2GhCrAjQuVMFYBq8GbLbwa

intuitions behind Data Science
Linear Regression

Nov 19, 2021


Guessing the recipe of data!

Python Bytes
#238 A cloud-based file system for Python and a new GUI!

Jun 15, 2021 · 47:07


Watch the live stream: Watch on YouTube

About the show
Sponsored by Sentry: Sign up at pythonbytes.fm/sentry And please, when signing up, click Got a promo code? Redeem and enter PYTHONBYTES
Special guest: Julia Signell

Brian #1: Practical SQL for Data Analysis
Haki Benita
Pandas is awesome, but … "In this article I demonstrate how to use SQL to perform fast and efficient data analysis." First part of the article.
SQL is faster than Pandas, but they are great together.
Then tons of examples showing exactly how to best use SQL queries and Pandas in data analysis:
Basics, including random data and sampling
Descriptive statistics
Subtotals, including rollup and grouping sets
Pivot tables, both conditional expressions and aggregate expressions
Running and cumulative aggregation
Linear regression
Interpolation
Super cheat sheet for useful SQL queries

Michael #2: Git Blame in your Python Tracebacks
via Ruslan Portnoy, by Ofer Koren
Helpful modules: traceback & linecache. traceback uses linecache, and we can change a line's text in linecache.
They create a git-blame bit of functionality to add to a line's source.
Turns out this flows to things like PDB. Ripe for a proper package we can add to requirements-dev.txt.

Julia #3: fsspec: a unified file system library
Martin Durant
Other libraries conform to the interface so that each part of the analysis pipeline is like an interchangeable building block (for example s3fs, gcsfs).
With the cloud providers competing to host data, fsspec makes it easy to swap out the read layer so that you can hop clouds.

Brian #4: The need for slimmer containers, or I'm even more confused now as to the usefulness of official base images on Docker Hub
Ivan Velichko @iximiuz
I read this article recently and it had me concerned. Then just yesterday read it again and there are some updates. I'm still concerned, but now also confused. So let's run it down.
docker scan can be run on official Python images. It uses Snyk Container. We talked about one form of Snyk on Episode 227.
Spoiler: all of the official Python containers have vulnerabilities except alpine. But. In an update, the author says that Alpine has a bunch of problems.
The update includes some discussion on Hacker News: vulnerability scanners tend to have lots of false positives; official base images are rarely updated; some people suggest adding an upgrade command in the beginning of every Dockerfile, but others object, saying that the practice leads to unrepeatable builds.
So, I'm left wondering if using official Python images is even worth it.
Michael: Python's official image on docker hub
Michael: PEP 656 -- Platform Tag for Linux Distributions Using Musl
Michael: We dive a lot into this in our latest Talk Python recording (not out yet, but live stream is available)
Some stats:
Ubuntu: Found 32 vulnerabilities, 31 with upgrade.
python:latest: Found 364 vulnerabilities, 353 with upgrade.
Ubuntu with source Python: 35 total, 28 low, 7 medium, several from intermediate tools such as wget, gcc, etc.
Removing many dev tools SHOULD lower the count, but doesn't (e.g. wget, gcc).
Switching from python:3.9 to python:3.9-slim-buster dropped the issues to 69.

Michael #5: PandasGUI: A GUI for analyzing Pandas DataFrames
Features:
View DataFrames and Series (with MultiIndex support)
Interactive plotting
Filtering
Statistics summary
Data editing and copy / paste
Import CSV files with drag & drop
Search toolbar
Best way to see what it's about is to watch the video.

Julia #6: xarray: pandas-like API for labeled N-dimensional data
We've been talking a lot about the pandas API and how it's a common target for dataframe libraries. Xarray is not a dataframe library, it's for labeled N-dimensional data.
People use it in geosciences, and in image processing where they don't have tabular data, but the axes mean something (lat, lon, time, band…).
You can select, aggregate, resample, using the real dimension labels.
It can be backed with dask arrays or numpy arrays (or other types of arrays).
It supports plotting with .plot

Extras
Michael:
Python 3.10.0b2 is available (even windows store)
Django security releases issued: 3.2.4, 3.1.12, and 2.2.24
Another method overloading library?
Recently moved to pip-compile requirements.in style after last week
I'm running PyCharm EAP
Brian:
Someone responded to me the other day on twitter with an emoji that I was not clear on the meaning of. So I looked it up on emojipedia.org. Super useful for occasionally out of touch people like myself.
pytestbook.com (redirects to pythontest.com/pytest-book/) has a facelift and a new home, to get ready for an announcement later this week. It's built on markdown, hugo, github, and Netlify, so changes can be done super quick with just a commit and push. I just needed a nice readable theme, and Pradyun's blog looked great, so I copied his choices. The blog will eventually also have writing, the legacy posts worth keeping from pythontesting.net, and probably transcripts from Test & Code.
Julia:
GH CLI entrypoints - they are so cool! Example - with pandas you can plot with different backends, not just matplotlib, and the logic for those backends is contained in the plotting libraries, not pandas.

Joke
From https://upjoke.com/programmer-jokes
I asked a programmer what her New Year's resolution will be. She answered: 1920x1080.
How does a programmer confuse a mathematician? x = x + 1
Why do Python programmers have low self esteem? They're constantly comparing their self to other.
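The SQL-plus-pandas pairing from the first item can be sketched in a few lines: let the database do the aggregation, then hand the small result to pandas. A toy in-memory example (mine, not from Haki Benita's article):

```python
import sqlite3
import pandas as pd

# Build a throwaway table; real use would point at an existing database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (day TEXT, region TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('2021-06-01', 'north', 120.0),
        ('2021-06-01', 'south',  80.0),
        ('2021-06-02', 'north', 150.0),
        ('2021-06-02', 'south',  95.0);
""")

# SQL performs the group-by; pandas receives only the aggregate result.
df = pd.read_sql("""
    SELECT day, region, SUM(amount) AS total
    FROM orders
    GROUP BY day, region
""", conn)

print(df.pivot(index="day", columns="region", values="total"))
```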

Chasing Capital
5. Streamlined Ventures' Nicky Kamra: Startup vs Big Finance after College, Is It Okay if It's Just Linear Regression, and the Importance of Radical Candor

Chasing Capital

Play Episode Listen Later May 3, 2021 33:25


In this fifth episode of Chasing Capital, I chat with Streamlined Ventures' Nicky Kamra on a wide variety of topics, like the importance of real ML insight vs good marketing, choosing a job that aligns with your natural abilities and enjoyments, and the benefits of joining tech right after college vs traditional Finance & co.

Around IT in 256 seconds
#30: Linear Regression

Around IT in 256 seconds

Play Episode Listen Later Jan 18, 2021 4:16


Linear regression is one of the simplest machine learning algorithms. But also quite useful. It takes a bunch of existing, known observations and tries to predict what new observations will look like. Think about forecasting or finding trends. It says “linear” because the algorithm essentially finds a straight line that most closely follows the observations. OK, let's take a concrete example. Imagine you are selling your apartment. What is the right price for it? Well, you compare it to similar apartments in your neighborhood. If someone sells the exact same flat across the street, your price should be very similar. If another flat is sold, but 10% larger, expect its price to be 10% higher as well. Yet another flat is half the size of yours. So expect its price to be just 50% of your estimated asking price. Sounds reasonable? Read more: https://256.nurkiewicz.com/30 Get the new episode straight to your mailbox: https://256.nurkiewicz.com/newsletter
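
To make the apartment example concrete, here is a minimal sketch (with made-up sizes and prices) of fitting that straight line with numpy:

    import numpy as np

    # Hypothetical comparable sales: floor area in m^2 and sale price.
    area = np.array([50.0, 55.0, 70.0, 90.0, 100.0])
    price = np.array([150_000, 165_000, 210_000, 270_000, 300_000])

    # polyfit with degree 1 finds the straight line price = a * area + b
    # that most closely follows the observations (least squares).
    a, b = np.polyfit(area, price, deg=1)

    # Predict the asking price for an 80 m^2 apartment from the fitted line.
    print(a * 80 + b)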

Applicable Science Podcast
Episode 7 | Linear Regression

Applicable Science Podcast

Play Episode Listen Later Dec 2, 2020 43:29


On this episode, the co-hosts talk about how they are breaking the mold and creating their own trendlines through the lens of Linear Regression.

PaperPlayer biorxiv bioinformatics
BOSO: a novel feature selection algorithm for linear regression with high-dimensional data

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Nov 20, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.18.388579v1?rss=1 Authors: Valcarcel, L. V., San Jose-Eneriz, E., Cendoya, X., Rubio, A., Agirre, X., Prosper, F., Planes, F. J. Abstract: Motivation: With the frenetic growth of high-dimensional datasets in different biomedical domains, there is an urgent need to develop predictive methods able to deal with this complexity. Feature selection is a relevant strategy in machine learning to address this challenge. Results: We introduce a novel feature selection algorithm for linear regression called BOSO (Bilevel Optimization Selector Operator). We conducted a benchmark of BOSO with key algorithms in the literature, finding a superior performance in high-dimensional datasets. Proof-of-concept of BOSO for predicting drug sensitivity in cancer is presented. A detailed analysis is carried out for methotrexate, a well-studied drug targeting cancer metabolism. Copy rights belong to original authors. Visit the link for more info

PaperPlayer biorxiv bioinformatics
A framework to efficiently smooth L1 penalties for linear regression

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Sep 19, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.17.301788v1?rss=1 Authors: Hahn, G., Lutz, S. M., Laha, N., Lange, C. Abstract: Penalized linear regression approaches that include an L1 term have become an important tool in day-to-day statistical data analysis. One prominent example is the least absolute shrinkage and selection operator (Lasso), though the class of L1 penalized regression operators also includes the fused and graphical Lasso, the elastic net, etc. Although the L1 penalty makes their objective function convex, it is not differentiable everywhere, motivating the development of proximal gradient algorithms such as Fista, the current gold standard in the literature. In this work, we take a different approach based on smoothing. The methodological contribution of our article is threefold: (1) We introduce a unified framework to compute closed-form smooth surrogates of a whole class of L1 penalized regression problems using Nesterov smoothing. The surrogates preserve the convexity of the original (unsmoothed) objective functions, are uniformly close to them, and have closed-form derivatives everywhere for efficient minimization via gradient descent; (2) We prove that the estimates obtained with the smooth surrogates can be made arbitrarily close to the ones of the original (unsmoothed) objective functions, and provide explicitly computable bounds on the accuracy of our estimates; (3) We propose an iterative algorithm to progressively smooth the L1 penalty which increases accuracy and is virtually free of tuning parameters. The proposed methodology is applicable to a large class of L1 penalized regression operators, including all the operators mentioned above. Using simulation studies, we compare our framework to current gold standards such as Fista, glmnet, gLasso, etc. Our simulation results suggest that our proposed smoothing framework provides estimates of equal or higher accuracy than the gold standards while keeping the aforementioned theoretical guarantees and having roughly the same asymptotic runtime scaling. Copy rights belong to original authors. Visit the link for more info
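
The paper's unified framework is more general than this, but as a rough illustration of the core idea (replace the non-differentiable L1 term with a smooth surrogate, then minimize by plain gradient descent), here is a sketch using the standard Huber-type surrogate of |x|, which is what Nesterov smoothing yields for the absolute value; all names and constants here are illustrative, not the authors' code:

    import numpy as np

    def huber_grad(beta, mu):
        # Gradient of the smooth surrogate of |x|: beta/mu near zero,
        # saturating at +/-1 (the sign) outside the interval [-mu, mu].
        return np.clip(beta / mu, -1.0, 1.0)

    def smoothed_lasso(X, y, lam=0.1, mu=1e-3, lr=1e-4, steps=20000):
        # Minimize 0.5 * ||X beta - y||^2 + lam * sum_j smooth(|beta_j|)
        # by gradient descent, which works because the smoothed objective
        # has closed-form derivatives everywhere.
        beta = np.zeros(X.shape[1])
        for _ in range(steps):
            grad = X.T @ (X @ beta - y) + lam * huber_grad(beta, mu)
            beta -= lr * grad
        return beta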

The Guiding Voice
ROADMAP to Machine Learning (ML) | Yaswanth Sai Palaghat (Tech Evangelist) | TGV Episode #27

The Guiding Voice

Play Episode Listen Later Aug 27, 2020 28:53


In this episode #27, the hosts Naveen Samala & Sudhakar Nagandla interact with another guest, Yaswanth. Yaswanth Sai Palaghat is currently working as a software engineer at a product-based company in Hyderabad. He is a machine learning evangelist with a major content-delivery focus on machine learning, and owns a Python library he developed during his engineering studies. He is an active GitHub user with more than 150 real-time projects available on GitHub. He considers himself a techie, and loves exploring technologies and teaching them in his community. He is an Udemy instructor with active courses. Yaswanth is a YouTuber with a well-established YouTube channel whose motto is delivering tech content, and he always tries to motivate engineering students with his videos. He is a content developer delivering tech content in all forms (text, graphics, videos), and a blogger who runs an active blog titled "TECHIE EMPIRE", which aims to help and motivate engineering graduates to choose the right path in technology. He is very active on social media, with more than 20k connections across his profiles and pages on LinkedIn, Instagram and YouTube. As for his education, Yaswanth graduated from Narayana Engineering College in Computer Science and Engineering in 2019, where he was a college topper. Apart from tech life, he is interested in film making, where he does story writing, editing and direction, and acts as well. He owns a YouTube channel named "THE FUN BITE" with more than 10 short films. Listen to Yaswanth's guidance on: How to explain machine learning to your grandparents? How can ML help in solving real-world problems? How to start learning ML? Resources for learning ML. Is it important to have a mathematical background for ML? ML vs deep learning - basic differences. How is ML related to AI & data science? Yaswanth's LinkedIn profile: https://www.linkedin.com/in/yaswanthpalaghat/ Resources shared by Yaswanth: 1. Linear Algebra & Statistics: Linear Algebra for ML: https://youtu.be/1VSZtNYMntM Statistics for ML: https://youtu.be/hjZJIVWHnPE 2. Python Programming: Python for Absolute Beginners: https://www.udemy.com/course/pythonforabsolutebeginners/?referralCode=E2DBB8598710151C2783 3. Data Analytics with Python Libraries (Numpy, Pandas, Matplotlib): Data Analytics A-Z with Python: https://www.udemy.com/course/dataanalyticsa-zwithpython/?referralCode=AB085EE2CA864CB69FD7 4. Data Cleaning: https://youtu.be/2HzsTwHL7H0 5. Project I: Data Analytics on Iris Flowers Dataset: https://github.com/yaswanthpalaghat/Pandas-implementation-and-workflow-on-iris-flowers-dataset 6. Computer Vision with OpenCV: Understanding OpenCV: https://opencv.org/ 7. Project II: Building a Face Detection and Recognition Model: https://www.udemy.com/course/building-a-face-detection-and-recognition-model-from-scratch/?referralCode=28F5323F045A89665F90 8. Machine Learning with Scikit-Learn: https://youtu.be/pqNCD_5r0IU 9. Hands-on with ML Algorithms: https://youtu.be/RnFGwxJwx-0 10. Solving ML Challenges: https://www.kaggle.com/learn/microchallenges 11. Project III: Linear Regression on Diabetes Dataset: https://github.com/yaswanthpalaghat/Linear-Regression-on-Diabetes-data-set-to-find-out-the-predicted-data-set 12. Natural Language Processing and Text Analytics: https://youtu.be/OQmDhwhj78Y 13. Project IV: Sentiment Analysis using NLP: https://github.com/yaswanthpalaghat/NLP-Sentiment-analysis-using-Machine-Learning-and-flask Enjoy the episode!
Do not forget to share your suggestions or feedback at theguidingvoice4u@gmail.com or by messaging at +91 9494 587 187  Subscribe to our YouTube Channel: https://www.youtube.com/c/TheGuidingVoice Also, follow The Guiding Voice on Social Media: LinkedIn: https://www.linkedin.com/company/theguidingvoice Facebook: http://facebook.com/theguidingvoice4u Twitter: http://twitter.com/guidingvoice Instagram: https://www.instagram.com/theguidingvoice4u/ Pinterest: https://in.pinterest.com/theguidingvoice4u/pins/  #growth #data #ML #Machinelearning #supervisedlearning #unsupervisedlearning #deeplearning  #Neuralnetwork #ANN #datascience #datascientist #machinelearning #AI #AIbasics #technologies #kdnuggets #career #jobs #careerguidance #mentorship #careerpath #progression #management #leadership #crisis #job #midcareer #youngprofessionals #careergraph #TGV #theguidingvoice #kaggle #analyticsvidya #udemy #insofe #coursera #opensource #transformation    

Driveline R&D Podcast
Biomechanics Conference | When to Use Linear Regression | R&D Content | Weighted Ball Article Review | Driveline R&D Ep 23

Driveline R&D Podcast

Play Episode Listen Later Jul 20, 2020 68:39


Episode 23 of the Driveline R&D (Research and Drinks) Podcast! With Anthony Brady, Alex Caravan, and Kyle Lindley. Timestamps: 9:15 ABBS Conference 22:25 R&D Content Creation 30:05 Cara Chats - When to Use Linear Regression 44:55 Weighted Ball Study Review The episode begins with a few updates and announcements, and then we quickly get into the American Baseball Biomechanics Society (ABBS) Conference that happened last week. Kyle discusses some new R&D content initiatives to help showcase R&D output better, Caravan talks about when it's appropriate to use linear regression, and Anthony reviews a new weighted ball research study that was recently published. Link to Cara chats segment on YouTube: https://youtu.be/2IFbU8URzNk Link to weighted ball article review on YouTube: https://youtu.be/zd0GmIdH_vA Link to the study we discussed: https://doi.org/10.1177%2F1941738120925728 Documentation for the regression function, loess(), that Caravan talks about when referring to locally weighted regression: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/loess Link to the hosts' twitters: Anthony Brady: https://twitter.com/BaseballFreak_9 Alex Caravan: https://twitter.com/Alex_Caravan Kyle Lindley: https://twitter.com/kylelindley_
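
For listeners who want to try locally weighted regression outside of R, a rough Python analogue of loess() is the lowess smoother in statsmodels (this sketch, with synthetic data, is ours, not the hosts'):

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 200)
    y = np.sin(x) + rng.normal(0, 0.3, x.size)

    # frac plays the role of loess()'s span: the fraction of the data
    # used for each local weighted fit.
    smoothed = lowess(y, x, frac=0.3)  # columns: sorted x, fitted values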

MSCI2800 Online Business Analytics Undergraduate
T09 Excel: Correlation and Linear Regression.

MSCI2800 Online Business Analytics Undergraduate

Play Episode Listen Later Jul 7, 2020 5:04


Dr. Jerz shows how to use his Excel model for correlation and linear regression.

MSCI2800 Online Business Analytics Undergraduate
T09 Lecture: Correlation and Linear Regression.

MSCI2800 Online Business Analytics Undergraduate

Play Episode Listen Later Jul 7, 2020 17:35


Dr. Jerz's lecture on correlation and linear regression analysis.

MBA8150 Business Analytics
WK08 Lecture: Ch13, Correlation and Linear Regression.

MBA8150 Business Analytics

Play Episode Listen Later Jun 30, 2020 17:35


Dr. Jerz's lecture on correlation and linear regression analysis.

MBA8150 Business Analytics
WK08 Excel: Ch13, Correlation and Linear Regression.

MBA8150 Business Analytics

Play Episode Listen Later Jun 30, 2020 5:04


Dr. Jerz shows how to use his Excel model for correlation and linear regression.

Significant Statistics
Simple Linear Regression

Significant Statistics

Play Episode Listen Later Apr 24, 2020 26:42


Audio-only version of the Simple Linear Regression concept video. For more info: https://blogs.lt.vt.edu/jmrussell/topic --- Support this podcast: https://anchor.fm/john-russell10/support

Machine Learning with Coffee
08 Linear Regression: The Return of the Queen

Machine Learning with Coffee

Play Episode Listen Later Apr 4, 2020 21:34


In this episode I will try to convince you that Linear Regression is one of the most powerful Machine Learning algorithms. We will talk about common misconceptions, especially the idea that Linear Regression is not able to model non-linear relationships. We also discuss how the myth of normality encourages many people to completely discard Linear Regression on non-normal data, when in reality normality of the data has nothing to do with this assumption. Finally, I provide advice on how to check, and most importantly how to fix, any violated assumption in Linear Regression.
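
One way to see the non-linearity misconception (our sketch, not from the episode): "linear" means linear in the coefficients, so expanding the inputs into polynomial features lets plain linear regression fit a curve:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, size=(200, 1))
    y = 0.5 * x[:, 0] ** 2 - x[:, 0] + rng.normal(0, 0.2, 200)

    # Degree-2 features turn the model into y = b0 + b1*x + b2*x^2,
    # still linear in b0, b1, b2 -- so it is still linear regression.
    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    model.fit(x, y)
    print(model.predict([[1.5]]))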

Python Podcast
Corona aus der Data-Science Perspektive

Python Podcast

Play Episode Listen Later Mar 29, 2020 96:21


This time we talk with Thomas Wiecki about quantitative finance, probabilistic programming, and the corona pandemic. Thomas, by the way, has his own podcast called PyData Deep Dive, which we warmly recommend. And from around minute 36 the audio crackling gets better, too :). Shownotes Our e-mail for questions, suggestions & comments: hallo@python-podcast.de Quantitative Finance: Quantopian, Backtesting, Quantopian on GitHub, zipline (backtesting library), Linear Regression, statsmodels, ARIMA. Probabilistic Programming (a small sketch follows below): pymc, Markov chain Monte Carlo, The Best Of Both Worlds: Hierarchical Linear Regression in PyMC3, Bayesian Statistics. COVID-19: Thomas' covid-19 repository, Some covid19 dashboards, Compartmental models in epidemiology, Student's t-distribution, Using epidemiological models is like counting spoons (tweet by @nntaleb), "Thousands of lines of undocumented c code" (tweet by @neil_ferguson), Johns Hopkins data from WHO PDFs, European Centre for Disease Prevention and Control, Our world in data (coronavirus). Hackathon: Coronavirus COVID19 Global Forecasting Kaggle competition, COVID-19 Open Research Dataset Challenge Kaggle competition, CCC CERT information collection, CERT Bulletin. Datasette: Datasette query for Italy on the covid-19 datasette, Glitch. Public tag on konektom
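
Since the shownotes point to PyMC3 and hierarchical linear regression, here is a minimal (non-hierarchical) Bayesian linear regression sketch in PyMC3 with simulated data; the priors and constants are illustrative, not from the episode:

    import numpy as np
    import pymc3 as pm

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 100)
    y = 2.0 * x + 0.5 + rng.normal(0, 0.1, 100)

    with pm.Model():
        alpha = pm.Normal("alpha", mu=0, sigma=10)   # intercept prior
        beta = pm.Normal("beta", mu=0, sigma=10)     # slope prior
        sigma = pm.HalfNormal("sigma", sigma=1)      # noise scale prior
        pm.Normal("obs", mu=alpha + beta * x, sigma=sigma, observed=y)
        trace = pm.sample(1000, tune=1000)           # MCMC (NUTS) sampling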

MSCI2800 Online Business Analytics Undergraduate
T09 Lecture: Correlation and Linear Regression.

MSCI2800 Online Business Analytics Undergraduate

Play Episode Listen Later Mar 10, 2020 17:35


Dr. Jerz's lecture on correlation and linear regression. Transcript (srt) file: Correlation_and_Regression_Analysis

MSCI2800 Online Business Analytics Undergraduate
T09 Excel: Correlation and Linear Regression.

MSCI2800 Online Business Analytics Undergraduate

Play Episode Listen Later Mar 10, 2020 5:04


Dr. Jerz shows how to use his Excel model for correlation and linear regression. Transcript (srt) file: Excel-Correlation_Regression

Statistics for the Social Sciences
21_Simple Linear Regression (Part 2 of 2)

Statistics for the Social Sciences

Play Episode Listen Later Jan 21, 2020 61:38


Simple Linear Regression (Part 2 of 2) --- Send in a voice message: https://anchor.fm/statistics/message

Statistics for the Social Sciences
20_Simple Linear Regression (Part 1 of 2)

Statistics for the Social Sciences

Play Episode Listen Later Jan 21, 2020 68:55


Simple Linear Regression (Part 1 of 2) --- Send in a voice message: https://anchor.fm/statistics/message

Machine learning
Random forest and linear regression and gradient boost in python

Machine learning

Play Episode Listen Later Aug 22, 2019 18:28


Thoughts on these three classifiers.
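
A minimal scikit-learn comparison of the three methods on synthetic data (our sketch, not from the episode):

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=300, n_features=10, noise=10.0,
                           random_state=0)

    # Cross-validated R^2 for each model on the same data.
    for model in (LinearRegression(),
                  RandomForestRegressor(random_state=0),
                  GradientBoostingRegressor(random_state=0)):
        scores = cross_val_score(model, X, y, cv=5, scoring="r2")
        print(type(model).__name__, round(scores.mean(), 3))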

Tech Podcast's - Data Science, AI, Machine Learning(BEPEC)
Purpose of R-Squared, Adjusted R- Squared & Predicted R -Squared

Tech Podcast's - Data Science, AI, Machine Learning(BEPEC)

Play Episode Listen Later Jan 5, 2019 9:19


What is the main purpose of checking R-squared in regression models?
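
For reference, the standard textbook definitions (not from the episode), with n observations and p predictors:

    R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

    \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}

    R^2_{\text{pred}} = 1 - \frac{\mathrm{PRESS}}{\sum_i (y_i - \bar{y})^2}

Adjusted R-squared penalizes extra predictors, and predicted R-squared replaces the residual sum of squares with PRESS, the sum of squared leave-one-out prediction errors, so it drops when the model overfits.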

Business Statistics - Undergraduate
Excel: Correlation and Linear Regression.

Business Statistics - Undergraduate

Play Episode Listen Later Oct 11, 2018 5:04


Dr. Jerz shows how to use his Excel model for correlation and linear regression.

Business Statistics - Undergraduate
Correlation and Linear Regression.

Business Statistics - Undergraduate

Play Episode Listen Later Oct 11, 2018 17:35


Dr. Jerz's lecture on correlation and linear regression.

RWpod - a podcast about the world of Ruby and Web technologies
Season 06, episode 16. Ruby 2.5 enables Thread.report_on_exception, wasm-pack, Action Policy, Package Phobia, Pico.js and more

RWpod - a podcast about the world of Ruby and Web technologies

Play Episode Listen Later Apr 22, 2018 41:18


Good afternoon, dear listeners. We present a new episode of the RWpod podcast. In this episode: Ruby: Ruby 2.5 enables Thread.report_on_exception by default and we get more info when threads die, Ruby's New Infinite Range Syntax: (0..) and Rails 5.2 added Date#prev_occurring and Date#next_occurring to return the specified next & previous occurring day of week; Linear Regression in Ruby, Working Effectively with Data Factories Using FactoryBot and Handling CORS issues in your Rails API; HOWTO execute Python code with Ruby, Action Policy - an authorization framework for Ruby and Rails applications and Xf - Transform Functions. JavaScript: Hello wasm-pack!, Designing very large (JavaScript) applications and What You Can Expect From Node.js 10 in webpack; Choosing a Responsive Email Framework: MJML vs. Foundation for Emails and A Look at CSS Viewport Units; Package Phobia - find the cost of adding a new dev dependency to your project, Pico.js: face detection in JavaScript, React Lazy Images - components and utilities for lazy image loading in React and Slugify - slugify a string. Conferences: RubyC, RUBY Meditation #22

[BST] Business Statistics II
[BST01A-ID] Correlation and Linear Regression

[BST] Business Statistics II

Play Episode Listen Later Apr 18, 2018 13:06


Credit title: Subject Matter Experts: 1. Adi Teguh Suprapto, S.E., M.M. 2. Mulyono, S.E., M.M. Documentary: Binus University. Uploaded by: Knowledge Management and Innovation, Binus University

Stock Market Mentor Chart of the Day
Here's how you can use Linear Regression to clarify a pattern. Check out this throwback on JD.com (JD) (December 20, 2017)

Stock Market Mentor Chart of the Day

Play Episode Listen Later Dec 20, 2017


Data Podcast
Alison Cossette (@alison_cossette): Discusses her journey to Data Science

Data Podcast

Play Episode Listen Later May 25, 2017 25:06


Officially, Alison Cossette is a data analyst for the University of Vermont Medical Center. Unofficially, she says proudly, "I'm the resident data nerd." Interviewers: Rajib Bahar, Shabnam Khan. Agenda: - On Twitter, your motto is "Numbers are the best story tellers"; why do you say that? - Terms such as Linear Regression and Logistic Regression may sound scary... are they? How did you implement them when you faced a prediction challenge like that? - What do you appreciate about the R & Python languages (if you know only one, then talk about that one only)? - The AI field keeps evolving... how do you define AI, Machine Learning, Deep Learning, Reinforcement Learning? - How do you build a network in the Data Science community, online or offline? - Any social media presence on Twitter, LinkedIn? Music: www.freesfx.co.uk

Machine Learning Guide
007 Logistic Regression

Machine Learning Guide

Play Episode Listen Later Feb 18, 2017 34:20


Your first classifier: Logistic Regression. That plus Linear Regression, and you're a 101 supervised learner! ocdevel.com/mlg/7 for notes and resources
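
A minimal scikit-learn sketch of a first logistic regression classifier (ours, not from the episode notes):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    # Fit a multiclass logistic regression and inspect class probabilities.
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict(X[:3]))
    print(clf.predict_proba(X[:3]).round(2))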

Machine Learning Guide
005 Linear Regression

Machine Learning Guide

Play Episode Listen Later Feb 15, 2017 33:40


Introduction to the first machine-learning algorithm, the 'hello world' of supervised learning: Linear Regression. ocdevel.com/mlg/5 for notes and resources

Learning Machines 101
LM101-050: How to Use Linear Machine Learning Software to Make Predictions (Linear Regression Software)[RERUN]

Learning Machines 101

Play Episode Listen Later May 3, 2016 30:32


In this episode we will explain how to download and use free machine learning software from the website: www.learningmachines101.com. This podcast is concerned with the very practical issues associated with downloading and installing machine learning software on your computer. If you follow these instructions, by the end of this episode you will have installed one of the simplest (yet most widely used) machine learning algorithms on your computer. You can then use the software to make virtually any kind of prediction you like. Also follow us on twitter at: lm101talk  

Using R for Statistical Analysis
Linear regression in R (STA80006 Using R for Statistical Analysis)

Using R for Statistical Analysis

Play Episode Listen Later Sep 16, 2015 5:08


This video demonstrates how to do simple linear regression in the R statistical software. Video originally created for STA80006 Using R for Statistical Analysis.

Learning Machines 101
LM101-033: How to Use Linear Machine Learning Software to Make Predictions (Linear Regression Software)[RERUN]

Learning Machines 101

Play Episode Listen Later Aug 10, 2015 31:28


In this episode we explain how to download and use free machine learning software from the website www.learningmachines101.com. The software can be used to make predictions using your own data sets. Although we will continue to focus on critical theoretical concepts in machine learning in future episodes, it is always useful to actually experience how these concepts work in practice. This is a rerun of Episode 13.

Introduction to Computer Science and Programming
Lecture 21: Validating simulation results, curve fitting, linear regression

Introduction to Computer Science and Programming

Play Episode Listen Later Apr 8, 2015 53:47


Learning Machines 101
LM101-013: How to Use Linear Machine Learning Software to Make Predictions (Linear Regression Software)

Learning Machines 101

Play Episode Listen Later Sep 22, 2014 30:35


Hello everyone! Welcome to the thirteenth podcast in the podcast series Learning Machines 101. In this series of podcasts my goal is to discuss important concepts of artificial intelligence and machine learning in hopefully an entertaining and educational manner. In this episode we will explain how to download and use free machine learning software from the website: www.learningmachines101.com. Although we will continue to focus on critical theoretical concepts in machine learning in future episodes, it is always useful to actually experience how these concepts work in practice. For these reasons, from time to time I will include special podcasts like this one which focus on very practical issues associated with downloading and installing machine learning software on your computer. If you follow these instructions, by the end of this episode you will have installed one of the simplest (yet most widely used) machine learning algorithms on your computer. You can then use the software to make virtually any kind of prediction you like. However, some of these predictions will be good predictions, while other predictions will be poor predictions. For this reason, following the discussion in Episode 12, which was concerned with the problem of evaluating generalization performance, we will also discuss how to evaluate what your learning machine has “memorized” and additionally evaluate the ability of your learning machine to “generalize” and make predictions about things that it has never seen before.

PSYC355 - Statistics for Psychology
Bivariate Linear Regression

PSYC355 - Statistics for Psychology

Play Episode Listen Later Dec 5, 2011 18:40


Calculator Tutorials
Calculating a Linear Regression

Calculator Tutorials

Play Episode Listen Later Jun 1, 2011 5:54


MIS Grade 9 Haley MYP & IGCSE Math
Unit 7 Day 4 - Linear Regression

MIS Grade 9 Haley MYP & IGCSE Math

Play Episode Listen Later May 18, 2009


Mr. Haley shows how to use the graphing calculator to find linear regressions on data sets.

Saint Mary's College MBA Podcasts
505 - Multiple Linear Regression Model

Saint Mary's College MBA Podcasts

Play Episode Listen Later Nov 19, 2007 6:57


Operations Management - Beam

Stats Data and Models
Chapter 8 (Paul Velleman)

Stats Data and Models

Play Episode Listen Later Dec 18, 2006


Linear Regression

Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03
The Effect of Single-Axis Sorting on the Estimation of a Linear Regression

Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03

Play Episode Listen Later Jan 1, 2006


Microaggregation is one of the most important statistical disclosure control techniques for continuous data. The basic principle of microaggregation is to group the observations in a data set and to replace them by their corresponding group means. In this paper, we consider single-axis sorting, a frequently applied microaggregation technique where the formation of groups depends on the magnitude of a sorting variable related to the variables in the data set. The paper deals with the impact of this technique on a linear model in continuous variables. We show that parameter estimates are asymptotically biased if the sorting variable depends on the response variable of the linear model. Using this result, we develop a consistent estimator that removes the aggregation bias. Moreover, we derive the asymptotic covariance matrix of the corrected least squares estimator.

Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03
Validating linear restrictions in linear regression models with general error structure

Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03

Play Episode Listen Later Jan 1, 2006


A new method for testing linear restrictions in linear regression models is suggested. It allows one to validate the linear restriction, up to a specified approximation error and with a specified error probability. The test relies on asymptotic normality of the test statistic, and therefore normality of the errors in the regression model is not required. In a simulation study, the performance of the suggested method for model selection purposes is examined and compared to standard model selection criteria and the t-test. As an illustration, we analyze the US college spending data from 1994.

Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03
Performance of Double k-class Estimators for Coefficients in Linear Regression Models with Non Spherical Disturbances under Asymmetric Losses

Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03

Play Episode Listen Later Jan 1, 2006


The risk of the family of feasible generalized double k-class estimators under LINEX loss function is derived in a linear regression model. The disturbances are assumed to be non-spherical and their variance covariance matrix is unknown.

Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03
Estimation of a Linear Regression under Microaggregation with the Response Variable as a Sorting Variable

Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03

Play Episode Listen Later Jan 1, 2005


Microaggregation is one of the most frequently applied statistical disclosure control techniques for continuous data. The basic principle of microaggregation is to group the observations in a data set and to replace them by their corresponding group means. However, while reducing the disclosure risk of data files, the technique also affects the results of statistical analyses. The paper deals with the impact of microaggregation on a linear model in continuous variables. We show that parameter estimates are biased if the dependent variable is used to form the groups. Using this result, we develop a consistent estimator that removes the aggregation bias. Moreover, we derive the asymptotic covariance matrix of the corrected least squares estimator.

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
Use of prior information in the form of interval constraints for the improved estimation of linear regression models with some missing responses

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03

Play Episode Listen Later Jan 1, 2001


We have considered the estimation of coefficients in a linear regression model when some responses on the study variable are missing and some prior information in the form of lower and upper bounds for the average values of missing responses is available. Employing the mixed regression framework, we have presented five estimators for the vector of regression coefficients. Their exact as well as asymptotic properties are discussed and superiority of one estimator over the other is examined.

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
A Revisit to the Application of Weighted Mixed Regression Estimation in Linear Regression Models with Missing Data

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03

Play Episode Listen Later Jan 1, 2001


This paper deals with the application of the weighted mixed regression estimation of the coefficients in a linear model when some values of some of the regressors are missing. Taking the weight factor as an arbitrary scalar, the performance of weighted mixed regression estimator in relation to the conventional least squares and mixed regression estimators is analyzed and the choice of scalar is discussed. Then taking the weight factor as a specific matrix, a family of estimators is proposed and its performance properties under the criteria of bias vector and mean squared error matrix are analyzed.

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
Using diagnostic measures to detect non-MCAR processes in linear regression models with missing covariates

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03

Play Episode Listen Later Jan 1, 2000


This paper presents methods to analyze and detect non-MCAR processes that lead to missing covariate values in linear regression models. First, the data situation and the problem are sketched. The next section provides an overview of the methods that deal with missing covariate values. The idea of using outlier methods to detect non-MCAR processes is described in section 3. Section 4 uses these ideas to introduce a graphical method to visualize the problem. Possible extensions conclude the presentation.

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
Estimation of Linear Regression Models with Missingness of Observations on Both the Explanatory and Study Variables-Part I: Theoretical Results

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03

Play Episode Listen Later Jan 1, 2000


This paper discusses the estimation of coefficients in a linear regression model when there are some missing observations on an explanatory variable and the study variable individually as well as simultaneously. The first order regression method of imputation is followed and the least squares procedure is applied. Efficiency properties of estimators are then investigated employing the large sample asymptotic theory.

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
The Classical Linear Regression Model with one Incomplete Binary Variable

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03

Play Episode Listen Later Jan 1, 1999


We present three different methods based on conditional mean imputation for when binary explanatory variables are incomplete. In addition to single imputation and multiple imputation, the so-called pi imputation is presented as a new procedure. Seven procedures are compared in a simulation experiment when missing data are confined to one independent binary variable: complete case analysis, zero order regression, categorical zero order regression, pi imputation, single imputation, multiple imputation, and modified first order regression. After a brief theoretical description of the simulation experiment, MSE-ratio, variance and bias are used to illustrate differences within and between the approaches.

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
Improving the Estimation of Coefficients in Linear Regression Models with Some Missing Observations on Some Explanatory Variables

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03

Play Episode Listen Later Jan 1, 1998


Srivastava, V. K.; Toutenburg, Helge. Paper PDF: https://epub.ub.uni-muenchen.de/1518/1/paper_129.pdf

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
Improved Predictions in Linear Regression Models with Stochastic Linear Constraints

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03

Play Episode Listen Later Jan 1, 1998


In this article, we have considered two families of predictors for the simultaneous prediction of actual and average values of study variable in a linear regression model when a set of stochastic linear constraints binding the regression coefficients is available. These families arise from the method of mixed regression estimation. Performance properties of these families are analyzed when the objective is to predict values outside the sample and within the sample.

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
Prediction of Response Values in Linear Regression Models from Replicated Experiments

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03

Play Episode Listen Later Jan 1, 1998


This paper considers the problem of prediction in a linear regression model when data sets are available from replicated experiments. Pooling the data sets for the estimation of regression parameters, we present three predictors - one arising from the least squares method and two stemming from the Stein-rule method. Efficiency properties of these predictors are discussed when they are used to predict actual and average values of the response variable within/outside the sample. Key words: least squares estimator, prediction, Stein-type estimator

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
Kernel smoothing of Aalen's linear regression model

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03

Play Episode Listen Later Jan 1, 1997


The linear regression model by Aalen for failure time analysis allows the inclusion of time-dependent covariates as well as the variation of covariate effects over time. For estimation Aalen considers cumulative hazard functions and derives estimates by applying counting process theory. Since often hazard functions themselves are of primary interest rather than cumulative hazard functions, in this paper we consider kernel estimation of the hazard functions, particularly in the presence of time-dependent covariates. Different kinds of bandwidths and kernel functions are discussed. A comparison of the considered methods is illustrated by data from the Stanford Heart Transplant Study.

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
Semiparametric EM-estimation of censored linear regression models for durations

Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03

Play Episode Listen Later Jan 1, 1996


This paper investigates the sensitivity of maximum quasi likelihood estimators of the covariate effects in duration models in the presence of misspecification due to neglected heterogeneity or misspecification of the hazard function. We consider linear models for r(T) where T is duration and r is a known, strictly increasing function. This class of models is also referred to as location-scale models. In the absence of censoring, Gould and Lawless (1988) have shown that maximum likelihood estimators of the regression parameters are consistent and asymptotically normally distributed under the assumption that the location-scale structure of the model is of the correct form. In the presence of censoring, however, model misspecification leads to inconsistent estimates of the regression coefficients for most of the censoring mechanisms that are widely used in practice. We propose a semiparametric EM-estimator, following ideas of Ritov (1990), and Buckley and James (1979). This estimator is robust against misspecification and is highly recommended if there is heavy censoring and if there may be specification errors. We present the results of simulation experiments illustrating the performance of the proposed estimator.