Summary: In this episode of DeepTech DeepTalk, Oliver and Alois discuss the latest developments in artificial intelligence, particularly advanced reasoning and the price elasticity of AI services. They examine the challenges and risks that come with using AI, as well as geopolitical aspects and the need for technological resilience. They also talk about future developments in science and technology, especially the role of artificial intelligence (AI) in research, highlighting that we can expect significant scientific breakthroughs in 2025, enabled by generative AI and novel approaches to data analysis. Finally, they consider the global perspective on AI development, in particular the role of Asia and the need for collaboration in Europe.
Takeaways:
- There were many exciting developments at the start of the year.
- Advanced reasoning is seen as a key technology.
- The pricing of AI services is an experiment.
- Using AI carries risks for intellectual property.
- Technological resilience is becoming ever more important.
- Demand for new software solutions is rising.
- AI can deliver significant efficiency gains.
- There are challenges in implementing AI.
- AI development requires responsible approaches.
- Geopolitical factors influence technology development.
- We will see significant scientific breakthroughs in 2025.
- Generative AI will play a central role in research.
- Human-machine interaction will continue to evolve.
- The decentralization of AI is an important topic.
- Asia, especially China, is a serious player in AI development.
- Collaboration in Europe could be decisive for progress.
- Hardware development has global dimensions.
- Efficiency in resource use is crucial.
- The role of quantum computers will grow.
- Science is being revolutionized by AI.
Sound Bites:
"A lot is happening at the turn of the year."
"The topic of reasoning is becoming ever more important."
"That's a new game."
"We need to talk about the US AI topics."
"We see many innovations in Asia."
"We need to bring AI to the edge."
Key Words: Deep Tech, AI, Advanced Reasoning, price elasticity, market strategies, risks, challenges, geopolitics, technological resilience, Knowledge Discovery, artificial intelligence, research, science, innovation, technology, global development, Europe, hardware, quantum computers, collaboration
Chapters:
00:00 New Year's review and exciting developments
02:52 Advances in the area of advanced reasoning
06:04 Price elasticity and market strategies
08:56 Risks and challenges in using AI
11:49 Geopolitical aspects and technological resilience
16:59 Future breakthroughs in science
22:44 The role of AI in research and development
27:20 Global perspectives on AI development
30:51 Collaboration and innovation in Europe
Listen to this interview of Roberto Verdecchia, Assistant Professor, University of Florence, Italy; and also, Luís Cruz, Assistant Professor, Delft University of Technology, Netherlands. We talk about their coauthored paper A systematic review of Green AI (WIREs Data Mining and Knowledge Discovery, 2023).
Luís Cruz: "Sometimes, especially in systematic studies, we are so worried about the process that we forget about the goals of why we're doing this. That means, we can end up reporting things just because they are part of the process — you know, we feel a need to say something about all that — but really, that way of reporting just produces a review that's a big bulk of highly systematic outputs, but not necessarily a review with relevant and useful findings."
Learn more about your ad choices. Visit megaphone.fm/adchoices
Support our show by becoming a premium member! https://newbooksnetwork.supportingcast.fm/new-books-network
In this episode of the Brand Called You, Prof Barend Mons, Professor Emeritus at Leiden University and founding director of the Leiden Institute for FAIR and Equitable Science (LIFES), shares his extensive journey in data science and bioinformatics. The conversation delves into the evolution of the FAIR principles (Findable, Accessible, Interoperable, and Reusable data), the challenges of knowledge discovery at the edge of chaos, and the future of scientific communication.
About Prof Barend Mons: Professor Mons is professor emeritus at Leiden University and founding director of the Leiden Institute for FAIR and Equitable Science (LIFES). In 2024, he was appointed a Fellow of the International Science Council.
--- Support this podcast: https://podcasters.spotify.com/pod/show/tbcy/support
Retrieval-Augmented Generation (RAG) leverages retrieval tools to access external databases, thereby enhancing the generation quality of large language models (LLMs) through optimized context. However, existing retrieval methods are inherently constrained: they can only perform relevance matching between explicitly stated queries and well-formed knowledge, and are unable to handle tasks involving ambiguous information needs or unstructured knowledge. Consequently, existing RAG systems are primarily effective for straightforward question-answering tasks. In this work, we propose MemoRAG, a novel retrieval-augmented generation paradigm empowered by long-term memory. MemoRAG adopts a dual-system architecture. On the one hand, it employs a light but long-range LLM to form a global memory of the database. Once a task is presented, it generates draft answers, cluing the retrieval tools to locate useful information within the database. On the other hand, it leverages an expensive but expressive LLM, which generates the ultimate answer based on the retrieved information. Building on this general framework, we further optimize MemoRAG's performance by enhancing its cluing mechanism and memorization capacity. In our experiments, MemoRAG achieves superior performance across a variety of evaluation tasks, including both complex ones where conventional RAG fails and straightforward ones where RAG is commonly applied. 2024: Hongjin Qian, Peitian Zhang, Zheng Liu, Kelong Mao, Zhicheng Dou https://arxiv.org/pdf/2409.05591v2
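The dual-system flow described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: `light_llm_draft`, `retrieve`, and `expressive_llm_answer` are hypothetical stand-ins, with keyword overlap standing in for the memory model and for dense retrieval.

```python
# Toy sketch of a MemoRAG-style dual-system pipeline (illustrative only).
# Stage 1: a "light" model consults a global memory to produce clues.
# Stage 2: the clues guide retrieval over the database.
# Stage 3: an "expressive" model produces the final answer from the evidence.

def light_llm_draft(question, memory):
    # Stand-in for the light, long-range LLM: emit clue terms that the
    # global memory recognizes from the question.
    return [w for w in question.lower().split() if w in memory]

def retrieve(clues, corpus, k=2):
    # Score passages by clue overlap; a real system would use dense retrieval.
    scored = sorted(corpus, key=lambda p: -sum(c in p.lower() for c in clues))
    return scored[:k]

def expressive_llm_answer(question, passages):
    # Stand-in for the expressive LLM: here we simply return the evidence.
    return " ".join(passages)

def memorag(question, corpus):
    # Toy "global memory": the vocabulary of the whole database.
    memory = set(" ".join(corpus).lower().split())
    clues = light_llm_draft(question, memory)        # stage 1: draft clues
    passages = retrieve(clues, corpus)               # stage 2: clue-guided retrieval
    return expressive_llm_answer(question, passages)  # stage 3: final generation
```

The point of the sketch is the division of labor: the cheap model never answers, it only produces hints that make retrieval tractable for vague queries; the expensive model only sees the retrieved context.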
In this episode, hosts JC Bonilla and Ardis Kadiu explore how AI is disrupting traditional search and revolutionizing knowledge discovery in higher education. They discuss Element451's groundbreaking AI search tool, which combines the power of large language models with university-specific content to provide students with a highly personalized, engaging search experience. The conversation highlights how this new approach could potentially replace static university websites, increase student engagement, and transform the attention economy in higher ed marketing.
AI's Impact on Search and Knowledge Discovery
- Discussion of how AI models like ChatGPT are changing the way we search for and discover information
- Introduction to tools like Perplexity AI that combine large language models with web search capabilities
Element451's AI Search Tool
- Overview of Element451's innovative AI search experience for university websites
- Explanation of how the tool uses AI to provide personalized, context-aware search results
- Discussion of the tool's ability to incorporate multimedia content like videos and images
Transforming University Websites and Student Engagement
- Analysis of how AI-powered search could potentially replace traditional, static university websites
- Exploration of how this new approach could dramatically increase student engagement and time spent on university sites
- Insights into how AI search creates a non-linear, "choose your own adventure" experience for students
The Attention Economy and the Future of Higher Ed Marketing
- Discussion of how relevance and personalization drive engagement in the attention economy
- Thoughts on how AI search could be a game-changer for higher ed marketing by capturing and holding student attention
- Predictions for how AI will continue to transform the student experience and university marketing in the coming years
Connect With Our Co-Hosts:
Ardis Kadiu
https://www.linkedin.com/in/ardis/
https://twitter.com/ardis
Dr. JC Bonilla
https://www.linkedin.com/in/jcbonilla/
https://twitter.com/jbonillx
About The Enrollify Podcast Network: Generation AI is a part of the Enrollify Podcast Network. If you like this podcast, chances are you'll like other Enrollify shows too! Some of our favorites include The EduData Podcast and Visionary Voices: The College President's Playbook. Enrollify is made possible by Element451 — the next-generation AI student engagement platform helping institutions create meaningful and personalized interactions with students. Learn more at element451.com.
Connect with Us at the Engage Summit: Exciting news — Ardis will be at the 2024 Engage Summit in Raleigh, NC, on June 25 and 26, and would love to meet you there! Sessions will focus on cutting-edge AI applications that are reshaping student outreach, enhancing staff productivity, and offering deep insights into ROI. Use the discount code Enrollify50 at checkout, and you can register for just $200! Learn more and register at engage.element451.com — we can't wait to see you there!
Step into the realm of digital deception with Nada and Nick in today's riveting episode. Delve deep into the intricate maze of disinformation and its effect on our perceptions. From the political landscape to the COVID-19 pandemic, no topic is off-limits as they unravel the role of social media in perpetuating falsehoods. Make sure to tune in, as this podcast is not to be missed!
References
Bulger, M., & Davison, P. (2018). The promises, challenges, and futures of media literacy. Journal of Media Literacy Education, 10(1), 1-21.
Pereira, P. S., Silveira, A. D. S., & Pereira, A. (2020). Disinformation and conspiracy theories in the age of COVID-19. Frontiers in Sociology, 5, 560681. https://doi.org/10.3389/fsoc.2020.560681
Shu, K., Bhattacharjee, A., Alatawi, F., Nazer, T. H., Ding, K., Karami, M., & Liu, H. (2020). Combating disinformation in a social media age. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(6). https://doi.org/10.1002/widm.1385
Spies, S. (2020, January 22). Producers of disinformation. MediaWell Research Review. https://mediawell.ssrc.org/research-reviews/producers-of-disinformation/
The topics in this edition of the science news: +++ Second malaria vaccine successfully tested +++ Extracting iron from toxic sludge +++ Dogs with long noses live longer +++
Further sources for this episode:
Update Erde
Safety and efficacy of malaria vaccine candidate R21/Matrix-M in African children / The Lancet, 01.02.2024
Green steel from red mud through climate-neutral hydrogen plasma reduction / Nature, 24.01.2024
Predicting consumer choice from raw eye-movement data using the RETINA deep learning architecture / Data Mining and Knowledge Discovery, 29.12.2023
Computational phylogenetics reveal histories of sign languages / Science, 01.02.2024
You can find all the sources here.
You can also follow us on these channels: TikTok and Instagram.
TL;DR: Contrast-consistent search (CCS) seemed exciting to us and we were keen to apply it. At this point, we think it is unlikely to be directly helpful for implementations of alignment strategies (>95%). Instead of finding knowledge, it seems to find the most prominent feature. We are less sure about the wider category of unsupervised consistency-based methods, but tend to think they won't be directly helpful either (70%). We've written a paper about some of our detailed experiences with it.
Paper authors: Sebastian Farquhar*, Vikrant Varma*, Zac Kenton*, Johannes Gasteiger, Vlad Mikulik, and Rohin Shah. *Equal contribution, order randomised.
Credences are based on a poll of Seb, Vikrant, Zac, Johannes, Rohin and show single values where we mostly agree and ranges where we disagreed.
What does CCS try to do? To us, CCS represents a family of possible algorithms aiming at solving an ELK-style problem that have the steps: [...]
The original text contained 5 footnotes which were omitted from this narration.
---
First published: December 18th, 2023
Source: https://www.lesswrong.com/posts/wtfvbsYjNHYYBmT3k/discussion-challenges-with-unsupervised-llm-knowledge-1
---
Narrated by TYPE III AUDIO.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Discussion: Challenges with Unsupervised LLM Knowledge Discovery, published by Seb Farquhar on December 18, 2023 on The AI Alignment Forum.
TL;DR: Contrast-consistent search (CCS) seemed exciting to us and we were keen to apply it. At this point, we think it is unlikely to be directly helpful for implementations of alignment strategies (>95%). Instead of finding knowledge, it seems to find the most prominent feature. We are less sure about the wider category of unsupervised consistency-based methods, but tend to think they won't be directly helpful either (70%). We've written a paper about some of our detailed experiences with it.
Paper authors: Sebastian Farquhar*, Vikrant Varma*, Zac Kenton*, Johannes Gasteiger, Vlad Mikulik, and Rohin Shah. *Equal contribution, order randomised.
Credences are based on a poll of Seb, Vikrant, Zac, Johannes, Rohin and show single values where we mostly agree and ranges where we disagreed.
What does CCS try to do?
To us, CCS represents a family of possible algorithms aiming at solving an ELK-style problem that have the steps:
Knowledge-like property: write down a property that points at an LLM feature which represents the model's knowledge (or a small number of features that includes the model-knowledge-feature).
Formalisation: make that property mathematically precise so you can search for features with that property in an unsupervised way.
Search: find it (e.g., by optimising a formalised loss).
In the case of CCS, the knowledge-like property is negation-consistency, the formalisation is a specific loss function, and the search is unsupervised learning with gradient descent on a linear + sigmoid function taking LLM activations as inputs. We were pretty excited about this. We especially liked that the approach is not supervised.
Conceptually, supervising ELK seems really hard: it is too easy to confuse what you know, what you think the model knows, and what it actually knows. Avoiding the need to write down what-the-model-knows labels seems like a great goal.
Why we think CCS isn't working
We spent a lot of time playing with CCS and trying to make it work well enough to build a deception detector by measuring the difference between the model's elicited knowledge and its stated claims.[1] Having done this, we are now not very optimistic about CCS or things like it. Partly, this is because the loss itself doesn't give much reason to think that it would be able to find a knowledge-like property, and empirically it seems to find whatever feature in the dataset happens to be most prominent, which is very prompt-sensitive. Maybe something building off it could work in the future, but we don't think anything about CCS provides evidence that it would be likely to. As a result, we have basically returned to our priors about the difficulty of ELK, which are something between "very very difficult" and "approximately impossible" for a full solution, while mostly agreeing that partial solutions are "hard but possible".
What does the CCS loss say?
The CCS approach is motivated like this: we don't know that much about the model's knowledge, but probably it follows basic consistency properties. For example, it probably has something like Bayesian credences, and when it believes A with some probability P(A), it ought to believe not-A with probability 1 - P(A).[2] So if we search in the LLM's feature space for features that satisfy this consistency property, the model's knowledge is going to be one of the things that satisfies it. Moreover, they hypothesise, there probably aren't that many things that satisfy this property, so we can easily check the handful that we get and find the one representing the model's knowledge.
When we dig into the CCS loss, it isn't clear that it really checks for what it's supposed to.
In particular, we prove that arbitrary features, not jus...
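For concreteness, the contrast-consistent search setup the post critiques can be sketched as follows. This is an illustrative reimplementation, not the authors' code: a linear + sigmoid probe over paired activations, trained on the negation-consistency loss plus a confidence term; numerical gradients are used only to keep the sketch dependency-free. The toy "activations" in the usage below plant a single prominent linear direction, which is exactly the kind of feature the post argues the loss tends to latch onto, whether or not it encodes knowledge.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ccs_loss(theta, acts_pos, acts_neg):
    """Negation-consistency loss plus a confidence term that rules out
    the degenerate constant-0.5 probe."""
    w, b = theta[:-1], theta[-1]
    p_pos = sigmoid(acts_pos @ w + b)   # probe on "X is true" activations
    p_neg = sigmoid(acts_neg @ w + b)   # probe on "X is false" activations
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = np.minimum(p_pos, p_neg) ** 2
    return float(np.mean(consistency + confidence))

def train_ccs(acts_pos, acts_neg, steps=300, lr=0.3, eps=1e-4, seed=0):
    """Unsupervised search: gradient descent on the loss over a linear probe.
    Central-difference gradients stand in for autodiff in this sketch."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.1, size=acts_pos.shape[1] + 1)
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            d = np.zeros_like(theta)
            d[i] = eps
            grad[i] = (ccs_loss(theta + d, acts_pos, acts_neg)
                       - ccs_loss(theta - d, acts_pos, acts_neg)) / (2 * eps)
        theta -= lr * grad
    return theta
```

Note that nothing in the loss mentions truth: any feature that flips sign between a statement and its negation drives both terms to zero, which is the core of the "arbitrary prominent feature" objection.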
"Learn to use AI. That's, that's my message. You wanna learn to use AI as a professional and as a citizen in your personal life. The more you know how to use it, the better you'll make of it, the better your life will be. AI gives power; like any technology, it gives power to those who understand it and use it" - Pedro Domingos
Recent developments in AI, specifically consumer-facing generative AIs, are helping people create a lot of cool content while also generating a ton of concern. A big bucket of that concern is AI alignment: what are the possible unintended consequences for humans? The internet transformed our relationship to information, but it took a few years; now, AI is doing it in real time.
My guest on this episode is Professor Pedro Domingos. Pedro is a leading AI researcher and the author of the worldwide bestseller "The Master Algorithm." He is a professor of computer science at the University of Washington in Seattle. He won the SIGKDD (Special Interest Group on Knowledge Discovery and Data Mining) Innovation Award and the International Joint Conference on AI (IJCAI) John McCarthy Award, two of the highest honors in data science and AI. Pedro helped start the fields of statistical relational AI, data stream mining, adversarial learning, machine learning for information integration, and influence maximization in social networks.
On this episode, we run the gamut to include:
- Where we are with generative AIs
- Pedro demystifies LLMs (Large Language Models)
- Progress and problems with generative AIs
- Hallucination in AI, and illusion in humans
- The homunculus fallacy
- Risks, regulations, known-unknowns
- Comments on existential threats
- The S curve in emerging technologies like AI
- AI's possible impact on employment and the economy
- Artificial General Intelligence, or AGI
- Goals and end games: is AGI the goal?
- Does he think LLM AIs like ChatGPT are conscious?
No matter your technical level, you'll enjoy this discussion with Pedro. He is passionate about the subject matter; no surprise, much of what he's predicted has come to pass in the field. And if you feel a tinge of AI anxiety, consider this a bit of exposure therapy. Listen and learn more about how these systems work and how they might impact your life.
For show notes and more, visit larryweeks.com
Amir Feizpour is the cofounder and CEO of Aggregate Intellect, a platform to accelerate knowledge discovery for research and development teams, including but not limited to AI. You can visit ai.science to learn more about it. Previously he worked in industry as a data scientist, a senior manager, and a product lead in NLP. He has a PhD in Physics from the University of Toronto and did his postdoc in quantum computing at the University of Oxford. You can join the Aggregate Intellect Slack community here. You can book a free 20-min coaching session with him here.
In this episode, we cover a range of topics including:
- His journey into the world of data
- What is knowledge discovery
- How to build online communities
- What he's building at Aggregate Intellect
- What does great data science culture look like
- What product has impressed him the most
- Current and future trends in AI
We started out as the show that invited scholars, makers, and professionals to brunch for informal conversations about their work—but last season, we needed to record remotely. This year we're excited to be able to bring back in-person interviews while still taking advantage of the flexibility afforded by our remote setup.
This episode is a little different from what we usually do, in that the focus isn't one person's work but rather a new tool designed to enhance knowledge access for everyone. It's called Marble, and it's a collaboration between Notre Dame's Hesburgh Libraries and Snite Museum of Art developed with a grant from the Andrew W. Mellon Foundation. Marble is an online portal that lets users all over the world view and learn about materials from the Snite Museum, Rare Books & Special Collections, and the University Archives in a way that is so cool it made us want to do a show literally about a website.
And to cover everything that makes Marble special, we tried something else different: not one but two interviews, with two people who have played distinct roles in its creation.
First you'll hear from Mikala Narlock, digital collections librarian at the Hesburgh Libraries, who analyzed how content would be uploaded to Marble. Mikala and host Ted Fox talked on a windy day outside the library about the user experience—the types of artifacts available in the platform, what shows up on your screen when you run a search, why this is different than what existed before, and importantly, how anyone can use it, regardless of whether they have an affiliation with Notre Dame.
After Mikala, it's Erika Hosselkus, a special collections curator and Latin American studies librarian at the Hesburgh Libraries who led the content team for the Marble project.
Erika and Ted met up in Rare Books and Special Collections at the library, where they talked about how the materials Marble gives people access to can inform teaching, research, and just our collective consciousness, not to mention how digital discovery can actually serve as an important gateway to the physical collections themselves.
LINKS
Marble website: marble.nd.edu
Episode Transcript
Welcome, my friends, to the seventh OFFICIAL coverage of the IgNobel prize (counting previous podcasts)! To pull off this feat, Trabuco welcomes house regulars Vanora and Petrus Davi, plus, making his podcast debut as the only guest actually qualified to talk about science, Rodolfo Souza! Join our Telegram group: https://t.me/trabucoshow
Award-winning research:
BIOLOGY PRIZE [SWEDEN]: Susanne Schötz, Robert Eklund, and Joost van de Weijer, for analyzing variations in purring, chirping, chattering, trilling, tweedling, murmuring, meowing, moaning, squeaking, hissing, yowling, howling, growling, and other modes of cat–human communication.
REFERENCE: “A Comparative Acoustic Analysis of Purring in Four Cats,” Susanne Schötz and Robert Eklund, Proceedings of Fonetik 2011, Speech, Music and Hearing, KTH, Stockholm, TMH-QPSR, 51.
REFERENCE: “A Phonetic Pilot Study of Vocalisations in Three Cats,” Susanne Schötz, Proceedings of Fonetik 2012, Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg, Sweden.
REFERENCE: “A Phonetic Pilot Study of Chirp, Chatter, Tweet and Tweedle in Three Domestic Cats,” Susanne Schötz, Proceedings of Fonetik 2013, Linköping University, Sweden, 2013, pp. 65-68.
REFERENCE: “A Study of Human Perception of Intonation in Domestic Cat Meows,” Susanne Schötz and Joost van de Weijer, Proceedings of the 7th International Conference on Speech Prosody, Dublin, Ireland, May 20-23, 2014.
REFERENCE: “Melody in Human–Cat Communication (Meowsic): Origins, Past, Present and Future,” Susanne Schötz, Robert Eklund, and Joost van de Weijer, 2016.
WHO TOOK PART IN THE CEREMONY: Susanne Schötz
ECOLOGY PRIZE [SPAIN, IRAN]: Leila Satari, Alba Guillén, Àngela Vidal-Verdú, and Manuel Porcar, for using genetic analysis to identify the different species of bacteria that reside in wads of discarded chewing gum stuck on pavements in various countries.
REFERENCE: “The Wasted Chewing Gum Bacteriome,” Leila Satari, Alba Guillén, Àngela Vidal-Verdú, and Manuel Porcar, Scientific Reports, vol. 10, no. 16846, 2020.
WHO TOOK PART IN THE CEREMONY: Leila Satari, Alba Guillén, Àngela Vidal-Verdú, Manuel Porcar
CHEMISTRY PRIZE [GERMANY, UK, NEW ZEALAND, GREECE, CYPRUS, AUSTRIA]: Jörg Wicker, Nicolas Krauter, Bettina Derstroff, Christof Stönner, Efstratios Bourtsoukidis, Achim Edtbauer, Jochen Wulf, Thomas Klüpfel, Stefan Kramer, and Jonathan Williams, for chemically analyzing the air inside movie theaters, to test whether the odors produced by an audience reliably indicate the levels of violence, sex, antisocial behavior, drug use, and bad language in the movie the audience is watching.
REFERENCE: “Proof of Concept Study: Testing Human Volatile Organic Compounds as Tools for Age Classification of Films,” Christof Stönner, Achim Edtbauer, Bettina Derstroff, Efstratios Bourtsoukidis, Thomas Klüpfel, Jörg Wicker, and Jonathan Williams, PLoS ONE, vol. 13, no. 10, 2018, p. e0203044.
REFERENCE: “Cinema Data Mining: The Smell of Fear,” Jörg Wicker, Nicolas Krauter, Bettina Derstroff, Christof Stönner, Efstratios Bourtsoukidis, Thomas Klüpfel, Jonathan Williams, and Stefan Kramer, Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1295-1304, 2015.
WHO TOOK PART IN THE CEREMONY: Jörg Wicker, Nicolas Krauter, Bettina Derstroff, Christof Stönner, Efstratios Bourtsoukidis, Achim Edtbauer, Jochen Wulf, Thomas Klüpfel, Stefan Kramer, Jonathan Williams
ECONOMICS PRIZE [FRANCE, SWITZERLAND, AUSTRALIA, AUSTRIA, CZECH REPUBLIC, UK]: Pavlo Blavatskyy, for discovering that the obesity of a country's politicians may be a good indicator of that country's corruption.
REFERENCE: “Obesity of Politicians and Corruption in Post‐Soviet Countries,” Pavlo Blavatskyy, Economics of Transition and Institutional Change, vol. 29, no. 2, 2021, pp. 343-356.
WHO TOOK PART IN THE CEREMONY: Pavlo Blavatskyy
MEDICINE PRIZE [GERMANY, TURKEY, UK]: Olcay Cem Bulut, Dare Oladokun, Burkard Lippert, an
Live stream video: https://www.youtube.com/watch?v=UjISR62hTVE
Video of the 2021 IgNobel award ceremony: https://vimeo.com/599769861
BIOLOGY PRIZE [SWEDEN]: Susanne Schötz, Robert Eklund, and Joost van de Weijer, for analyzing variations in purring, chirping, chattering, trilling, tweedling, murmuring, meowing, moaning, squeaking, hissing, yowling, howling, growling, and other modes of cat–human communication.
REFERENCE: “A Comparative Acoustic Analysis of Purring in Four Cats,” Susanne Schötz and Robert Eklund, Proceedings of Fonetik 2011, Speech, Music and Hearing, KTH, Stockholm, TMH-QPSR, 51.
REFERENCE: “A Phonetic Pilot Study of Vocalisations in Three Cats,” Susanne Schötz, Proceedings of Fonetik 2012, Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg, Sweden.
REFERENCE: “A Phonetic Pilot Study of Chirp, Chatter, Tweet and Tweedle in Three Domestic Cats,” Susanne Schötz, Proceedings of Fonetik 2013, Linköping University, Sweden, 2013, pp. 65-68.
REFERENCE: “A Study of Human Perception of Intonation in Domestic Cat Meows,” Susanne Schötz and Joost van de Weijer, Proceedings of the 7th International Conference on Speech Prosody, Dublin, Ireland, May 20-23, 2014.
REFERENCE: “Melody in Human–Cat Communication (Meowsic): Origins, Past, Present and Future,” Susanne Schötz, Robert Eklund, and Joost van de Weijer, 2016.
WHO TOOK PART IN THE CEREMONY: Susanne Schötz
https://www.youtube.com/watch?v=wkRcwGdaeSE
https://www.youtube.com/watch?v=bvS2SlJuLp8
ECOLOGY PRIZE [SPAIN, IRAN]: Leila Satari, Alba Guillén, Àngela Vidal-Verdú, and Manuel Porcar, for using genetic analysis to identify the different species of bacteria that reside in wads of discarded chewing gum stuck on pavements in various countries.
REFERENCE: “The Wasted Chewing Gum Bacteriome,” Leila Satari, Alba Guillén, Àngela Vidal-Verdú, and Manuel Porcar, Scientific Reports, vol. 10, no. 16846, 2020.
WHO TOOK PART IN THE CEREMONY: Leila Satari, Alba Guillén, Àngela Vidal-Verdú, Manuel Porcar
CHEMISTRY PRIZE [GERMANY, UK, NEW ZEALAND, GREECE, CYPRUS, AUSTRIA]: Jörg Wicker, Nicolas Krauter, Bettina Derstroff, Christof Stönner, Efstratios Bourtsoukidis, Achim Edtbauer, Jochen Wulf, Thomas Klüpfel, Stefan Kramer, and Jonathan Williams, for chemically analyzing the air inside movie theaters, to test whether the odors produced by an audience reliably indicate the levels of violence, sex, antisocial behavior, drug use, and bad language in the movie the audience is watching.
REFERENCE: “Proof of Concept Study: Testing Human Volatile Organic Compounds as Tools for Age Classification of Films,” Christof Stönner, Achim Edtbauer, Bettina Derstroff, Efstratios Bourtsoukidis, Thomas Klüpfel, Jörg Wicker, and Jonathan Williams, PLoS ONE, vol. 13, no. 10, 2018, p. e0203044.
REFERENCE: “Cinema Data Mining: The Smell of Fear,” Jörg Wicker, Nicolas Krauter, Bettina Derstroff, Christof Stönner, Efstratios Bourtsoukidis, Thomas Klüpfel, Jonathan Williams, and Stefan Kramer, Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1295-1304, 2015.
WHO TOOK PART IN THE CEREMONY: Jörg Wicker, Nicolas Krauter, Bettina Derstroff, Christof Stönner, Efstratios Bourtsoukidis, Achim Edtbauer, Jochen Wulf, Thomas Klüpfel, Stefan Kramer, Jonathan Williams
ECONOMICS PRIZE [FRANCE, SWITZERLAND, AUSTRALIA, AUSTRIA, CZECH REPUBLIC, UK]: Pavlo Blavatskyy, for discovering that the obesity of a country's politicians may be a good indicator of that country's corruption.
REFERENCE: “Obesity of Politicians and Corruption in Post‐Soviet Countries,” Pavlo Blavatskyy, Economics of Transition and Institutional Change, vol. 29, no. 2, 2021, pp. 343-356.
WHO TOOK PART IN THE CEREMONY: Pavlo Blavatskyy
An example program that calculates BMI from a facial photo, in case anyone wants to try it out: https://medium.
It's time for the now-traditional double episode on the Ig Nobel Prize, whose mission is to "honor achievements that first make people laugh, and then think," featuring the strangest scientific discoveries of the year. This is the first of two parts on the 2021 edition of the prize, covering the Biology, Ecology, Chemistry, Transportation Science, and Economics categories. Join the conversation between the curious layman, Ken Fujioka, and the PhD scientist, Altay de Souza.

> LISTEN (51min 42s)

*Naruhodo! is the podcast for everyone hungry to learn. Science, common sense, curiosities, challenges, and much more, with the curious layman, Ken Fujioka, and the PhD scientist, Altay de Souza. Editing: Reginaldo Cursino. http://naruhodo.b9.com.br

*PARTNERSHIP: ALURA
Alura offers more than 1,000 courses across many fields and is the largest online course platform in Brazil; a single subscription gives you access to all of them. Naruhodo listeners get a R$100 discount at: https://www.alura.com.br/promocao/naruhodo

===

Biology (Sweden): Susanne Schötz, Robert Eklund, and Joost van de Weijer, for analyzing variations in purring, chirping, chattering, trilling, tweedling, murmuring, meowing, moaning, squeaking, hissing, yowling, howling, growling, and other modes of cat–human communication.

*Ecology (Spain, Iran): Leila Satari, Alba Guillén, Àngela Vidal-Verdú, and Manuel Porcar, for using genetic analysis to identify the different species of bacteria that reside in wads of discarded chewing gum stuck on pavements in various countries.

*Chemistry (Germany, UK, New Zealand, Greece, Cyprus, Austria): Jörg Wicker, Nicolas Krauter, Bettina Derstroff, Christof Stönner, Efstratios Bourtsoukidis, Achim Edtbauer, Jochen Wulf, Thomas Klüpfel, Stefan Kramer, and Jonathan Williams, for chemically analyzing the air inside movie theaters, to test whether the odors produced by an audience reliably indicate the levels of violence, sex, antisocial behavior, drug use, and bad language in the movie the audience is watching.

*Transportation Science (Namibia, South Africa, Tanzania, Zimbabwe, Brazil, UK, USA): Robin Radcliffe, Mark Jago, Peter Morkel, Estelle Morkel, Pierre du Preez, Piet Beytell, Birgit Kotting, Bakker Manuel, Jan Hendrik du Preez, Michele Miller, Julia Felippe, Stephen Parry, and Robin Gleed, for determining by experiment whether it is safer to transport an airborne rhinoceros upside-down.

*Economics (France, Switzerland, Australia, Austria, Czech Republic, UK): Pavlo Blavatskyy, for discovering that the obesity of a country's politicians may be a good indicator of that country's corruption.

===

REFERENCES

Biology
“A Comparative Acoustic Analysis of Purring in Four Cats,” Susanne Schötz and Robert Eklund, Proceedings of Fonetik 2011, Speech, Music and Hearing, KTH, Stockholm, TMH-QPSR, 51. https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A539090&dswid=-2297
“A Phonetic Pilot Study of Vocalisations in Three Cats,” Susanne Schötz, Proceedings of Fonetik 2012, Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg, Sweden. https://portal.research.lu.se/portal/en/publications/a-phonetic-pilot-study-of-vocalisations-in-three-cats(d2621c3b-fdc1-485c-ade6-e5b2b6ad5dfb).html
“A Phonetic Pilot Study of Chirp, Chatter, Tweet and Tweedle in Three Domestic Cats,” Susanne Schötz, Proceedings of Fonetik 2013, Linköping University, Sweden, 2013, pp. 65-68. https://portal.research.lu.se/portal/en/publications/a-phonetic-pilot-study-of-chirp-chatter-tweet-and-tweedle-in-three-domestic-cats(60fb046d-0955-4885-adfa-73de254500e6).html
“A Study of Human Perception of Intonation in Domestic Cat Meows,” Susanne Schötz and Joost van de Weijer, Proceedings of the 7th International Conference on Speech Prosody, Dublin, Ireland, May 20-23, 2014. https://portal.research.lu.se/portal/en/publications/a-study-of-human-perception-of-intonation-in-domestic-cat-meows(a0ff22b4-4809-426f-806a-f5a7ca28100f).html
“Melody in Human–Cat Communication (Meowsic): Origins, Past, Present and Future,” Susanne Schötz, Robert Eklund, and Joost van de Weijer, 2016. https://portal.research.lu.se/portal/en/publications/melody-in-humancat-communication-meowsic(e32b4f31-5064-48d1-b38f-7e97390093fe)/infrastructure.html

*Ecology
“The Wasted Chewing Gum Bacteriome,” Leila Satari, Alba Guillén, Àngela Vidal-Verdú, and Manuel Porcar, Scientific Reports, vol. 10, no. 16846, 2020. https://doi.org/10.1038/s41598-020-73913-4

*Chemistry
“Proof of Concept Study: Testing Human Volatile Organic Compounds as Tools for Age Classification of Films,” Christof Stönner, Achim Edtbauer, Bettina Derstroff, Efstratios Bourtsoukidis, Thomas Klüpfel, Jörg Wicker, and Jonathan Williams, PLoS ONE, vol. 13, no. 10, 2018, p. e0203044. https://doi.org/10.1371/journal.pone.0203044
“Cinema Data Mining: The Smell of Fear,” Jörg Wicker, Nicolas Krauter, Bettina Derstroff, Christof Stönner, Efstratios Bourtsoukidis, Thomas Klüpfel, Jonathan Williams, and Stefan Kramer, Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1295-1304. https://doi.org/10.1145/2783258.2783404

*Transportation Science
“The Pulmonary and Metabolic Effects of Suspension by the Feet Compared with Lateral Recumbency in Immobilized Black Rhinoceroses (Diceros bicornis) Captured by Aerial Darting,” Robin W. Radcliffe, Mark Jago, Peter vdB Morkel, Estelle Morkel, Pierre du Preez, Piet Beytell, Birgit Kotting, Bakker Manuel, Jan Hendrik du Preez, Michele A. Miller, Julia Felippe, Stephen A. Parry, and R. D. Gleed, Journal of Wildlife Diseases, vol. 57, no. 2, 2021, pp. 357-367. https://doi.org/10.7589/2019-08-202

*Economics
“Obesity of Politicians and Corruption in Post‐Soviet Countries,” Pavlo Blavatskyy, Economics of Transition and Institutional Change, vol. 29, no. 2, 2021, pp. 343-356. https://doi.org/10.1111/ecot.12259

*Naruhodo #151 - Especial Prêmio Ig Nobel 2018 - Parte 1 de 2: https://www.b9.com.br/shows/naruhodo/naruhodo-151-especial-premio-ig-nobel-2018-parte-1-de-2/
Naruhodo #152 - Especial Prêmio Ig Nobel 2018 - Parte 2 de 2: https://www.b9.com.br/shows/naruhodo/naruhodo-152-especial-premio-ig-nobel-2018-parte-2-de-2/
Naruhodo #202 - Especial Prêmio Ig Nobel 2019 - Parte 1 de 2: https://www.b9.com.br/shows/naruhodo/naruhodo-202-especial-premio-ig-nobel-2019-parte-1-de-2/
Naruhodo #203 - Especial Prêmio Ig Nobel 2019 - Parte 2 de 2: https://www.b9.com.br/shows/naruhodo/naruhodo-203-especial-premio-ig-nobel-2019-parte-2-de-2/
Naruhodo #254 - Especial Prêmio Ig Nobel 2020 - Parte 1 de 2: https://www.b9.com.br/shows/naruhodo/naruhodo-254-especial-premio-ignobel-2020-parte-1-de-2/
Naruhodo #255 - Especial Prêmio Ig Nobel 2020 - Parte 2 de 2: https://www.b9.com.br/shows/naruhodo/naruhodo-255-especial-premio-ignobel-2020-parte-2-de-2/

*Podcasts by #Minas: DICIONÁRIO FEMINISTA #MulheresPodcasters https://open.spotify.com/show/6gS3V1exKZBt3f4QqxKkcj

===

SUPPORT NARUHODO! Did you know you can help keep Naruhodo on the air? Contributors get access to the closed Telegram group, exclusive content, and special perks. Sign up for monthly support via PicPay: https://picpay.me/naruhodopodcast
Dr. E. Michael Jones discusses the origins of horror and its relationship to society and history. Monsters from the ID, the Rise of Horror in Fiction and Film: https://www.fidelitypress.org/book-products/monsters-from-the-id Dr. E. Michael Jones is a prolific Catholic writer, lecturer, journalist, and Editor of Culture Wars Magazine who seeks to defend traditional Catholic teachings and values from those seeking to undermine them. Buy Dr. Jones's books: https://www.fidelitypress.org/ Subscribe to Culture Wars Magazine: Culturewars.com. Donate: https://culturewars.com/donate
The internet has been disrupting the education sector since its advent and continues to do so. From The Open University to Massive Online Open Courses (known affectionately as MOOCs) and now Open Educational Resources, the world wide web has been empowering teachers and students alike for many years. In the midst of a global pandemic, the teaching profession has had to adapt, shifting near-seamlessly to "Remote Teaching" and/or "Hybrid Teaching". Educators had to learn, and are still learning, lessons such as how to build a community in the virtual classroom and how to create new teaching material that meets students' learning needs. By and large, Open Educational Resources (OER) are another useful tool in the armoury of teachers and students alike. In this episode, I will introduce you to this relatively new type of teaching resource. We will start with the questions of what OER are, how useful they are, and why you should create your own. Last, but not least, we will get you started creating your first OER. For more information visit my blog: profmanagement.de Thank you for listening. If you liked this episode, please leave a review on the iTunes / Apple Podcasts website. If you've got any thoughts on this episode, or an idea for new podcast topics or a question you'd like us to discuss, send an audio file or voice note to hi@profmanagement.de. For any non-audio comments, please drop a tweet or DM to @profmanagement on Twitter or Instagram. References: Rebus Community Guide: https://press.rebus.community/authoropen/ Create Open Educational Resources: https://pitt.libguides.com/openeducation/create OER Authoring Tools: https://subjectguides.esc.edu/OER/oerauthoringtools
On this episode of Data Science Now we'll learn how to discover valuable insights from data through Data Science. We focus on: - Historic overview of data science methodologies - Comparison, advantages, and disadvantages of data science methodologies - Getting insights from data - Introduction to modeling Watch the video on Youtube: http://bit.ly/DSNYoutube Subscribe to the Data Science Now newsletter here: https://bit.ly/DSNNewsletter Follow Closter on social media: - Instagram: https://instagram.com/closterteam - Twitter: https://twitter.com/ClosterTeam - Facebook: https://www.facebook.com/ClosterTeam/ - LinkedIn: https://www.linkedin.com/company/closter #DataScienceNow #DataScience #Closter #MachineLearning #FavioDataJourney #Podcast
In this episode, Rosalie Bartlett, Sr. Open Source Community Manager, interviews Shaunak Mishra, Sr. Research Scientist, Verizon Media. Shaunak discusses two papers he presented at Knowledge Discovery and Data Mining (KDD) - “Understanding Consumer Journey using Attention-based Recurrent Neural Networks” and “Learning from Multi-User Activity Trails for B2B Ad Targeting”.
Technological transformation in healthcare and smart diagnostic applications, also known as "robot examination," are making self-diagnosis ever more popular. The many artificial intelligence applications that have been launched but never caught on, that are still in use, or that are about to reach the market look set to occupy the agenda even more. Before discussing some applications that could become popular in emergency medicine, we should first define artificial intelligence, the fields it interacts with, and the methodology it uses.

DEFINITIONS

If we define natural intelligence as human intelligence and technological systems as machines, artificial intelligence refers to systems that imitate human intelligence, that is, systems that think and act like humans. The "thinking and acting like a human" part of this definition rests on the belief that humans can reason correctly and logically in order to survive. Although no system yet replicates human intelligence exactly, AI is built on imitating the cognitive functions of the mind: learning and problem solving. In everyday life, to solve a problem we use what we perceive with our senses, what we have learned, and our experience (input), analyze the problem (thinking, i.e. processing), and react to the outside world (output). Although the potential of machines to carry out this process is popularly associated with Deep Blue, the first computer to beat Kasparov, it was explored by Alan Turing in the 1950s. Turing believed that machines' ability to imitate would become so good that, judging only from the answers to a series of questions, people would be unable to tell which respondent was a machine and which was a human. Another pair of terms often confused are robots and robotics. A robot is a machine that can automatically carry out a series of actions programmable via a computer (robotics being the corresponding field of study). Artificial intelligence may or may not be integrated into robots.

AI's ability to imitate humans rests less on explicitly programmed rules than on computations it builds itself, in a supervised or unsupervised fashion, from the data presented to it, that is, on machine learning methods. Supervised and unsupervised machine learning deserve a discussion of their own; in short, these are systems that interpret data through a mathematical model, using algorithms and statistical models, to draw inferences and make decisions toward a specific goal. The process of extracting knowledge from the large data sets to be computed and analyzed in the emergency department, using machine learning, statistics, and database systems, is called data mining. The interactions between these fields and their subfields (artificial neural networks, genetic algorithms, deep learning, decision trees, etc.), each of which could fill an article of its own, are shown in the figures below.

[Figure legends: Statistics, Data Mining, Artificial Intelligence, Machine Learning; Databases, KDD (Knowledge Discovery in Databases), Pattern Recognition, Neurocomputing.]

EMERGENCY MEDICINE APPLICATIONS

Opinions differ across different circles on whether AI will replace the physician. These views span a wide spectrum: from the moments in a busy emergency department when you wish someone could take your place for a lunch break, to the algorithms we still cannot formulate for decisions about our patients despite years of knowledge and experience, and to concerns about what kind of algorithm should be taught to such "mental machines." The chart below, from the national medical library, shows the sharply accelerating number of publications on artificial intelligence, data mining, and machine learning in the literature, especially over the last two years.

[Chart legend: AI: Artificial Intelligence; DM: Data Mining; ML: Machine Learning]

The decision support systems obtained using these methodologies,
Over the next few weeks, until 20 February 2020, Anna Hein, a student of science communication at KIT, will be conducting a study on the Modellansatz podcast as part of her master's thesis. She would like to interview you, the listeners of the podcast, to find out who listens to it and how and why it is used. The interviews will be anonymized and take about 15 minutes each. To take part in the study, contact Anna Hein at studie.modellansatz@web.de by 20 February 2020. We would be delighted if many interested listeners got in touch.

In January 2020, Gudrun's research group welcomed Andrea Walther as a guest. She is an expert in algorithmic differentiation (AD), and her group maintains the ADOL-C software package for algorithmic differentiation. Together with Andreas Griewank she published the standard book on AD in 2008. In secondary school and the early mathematics curriculum, everyone encounters applications where derivatives of functions are needed; in particular, it is very convenient to find minima and maxima of functions as zeros of the derivative. When modelling complex relationships with partial differential equations, this idea can be carried over to a more abstract setting: a so-called cost functional measures how well solutions of a PDE satisfy a given condition. Imagine, for example, an oven heated by coils at its top and bottom that transfer heat into the oven. For the roast, one wants a particular final temperature distribution, and the heat distribution can be computed with the heat equation.

The cost functional then measures energy efficiency alongside the desired temperature, and the deviation from the target temperature is minimized together with the energy required. Here too, derivatives are computed whose zeros help minimize these costs; this is known as optimal control. One way to express the abstract derivative is to solve a so-called adjoint partial differential equation problem. But it becomes very difficult to compute derivatives of very complex, nested functions quickly and without error, especially since they look different for every new problem. Moreover, the numerical evaluation of an algorithm often needs only values of the derivative at particular points. The efficient computation of derivative values is therefore an indispensable building block in numerous applications, ranging from methods for solving nonlinear equations to sophisticated simulations in optimization and optimal control. Ideally, the computer should handle this without error, or at least with very small errors. Newton's method, the standard method for solving nonlinear equations and systems of equations, also needs the derivative of the function. Algorithmic differentiation (AD) delivers exact derivative values for any function given in a higher programming language, with a time and space complexity bounded by the complexity of evaluating the function itself. The core idea of AD is the systematic application of the chain rule from calculus. To this end, the evaluation of the function is decomposed into a (typically long) sequence of simple operations, e.g. additions, multiplications, and calls to elementary functions such as the exponential function or powers.

The derivatives of these simple operations with respect to their arguments are easy to compute, and a systematic application of the chain rule then yields the derivatives of the whole sequence with respect to the input variables. Two modes are distinguished: the forward mode and the reverse mode. In the forward mode, one computes the product of the Jacobian with an arbitrary matrix (the so-called seed matrix) without ever forming the components of the Jacobian. The reverse mode consists of two phases: first, the original function is executed and certain data are stored; then one computes backwards, propagating directional derivatives and using the data stored in the first phase. With the reverse mode of AD, the gradient of a scalar-valued function can be computed at a runtime cost of less than four function evaluations, a bound that is completely independent of the number of input variables. This is phenomenally effective, but it comes with increased memory requirements; over the years, checkpointing strategies have been developed to find a middle ground. The methods are of interest for many very different applications. In DFG projects Andrea was and is involved in, they have been used, among other things, to model piezoceramics and the Maxwell wave equation. Gudrun and Andrea also discuss optimizing the shape of a turbine blade. Andrea began her career with an apprenticeship as a bank clerk in Bremerhaven. She then chose to study business mathematics in order to combine mathematics with her trained profession; among the few German universities offering such a program, she chose the University of Bayreuth. After completing her diploma, she had the opportunity to work in optimization at TU Dresden.

There she earned her doctorate, later became head of the independent junior research group "Analysis and Optimization of Computer Models" and junior professor for the same subject, and obtained her habilitation. From 2009 to 2019, she was professor of "Mathematics and its Applications" at Paderborn University. Since October 2019, she has been professor of "Mathematical Optimization" at Humboldt-Universität zu Berlin.

Literature and further information: A. Griewank and A. Walther: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition. SIAM (2008). A. Gebremedhin and A. Walther: An Introduction to Algorithmic Differentiation. In WIREs Data Mining and Knowledge Discovery. S. Fiege, A. Walther and A. Griewank: An algorithm for nonsmooth optimization by successive piecewise linearization. Mathematical Programming 177(1-2):343-370 (2019). A. Walther and A. Griewank: Characterizing and testing subdifferential regularity for piecewise smooth objective functions. SIAM Journal on Optimization 29(2):1473-1501 (2019). Podcasts: G. Thäter, A. Zarth: Automatic Differentiation, conversation in the Modellansatz Podcast, episode 167, Department of Mathematics, Karlsruhe Institute of Technology (KIT), 2018. G. Thäter, P. Allinger and N. Stockelkamp: Strukturoptimierung, conversation in the Modellansatz Podcast, episode 053, Department of Mathematics, Karlsruhe Institute of Technology (KIT), 2015.
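The forward mode described in this episode can be sketched in a few lines using dual numbers, where every intermediate value carries its derivative along and each elementary operation applies the product and chain rules locally. This is a minimal didactic sketch, not how ADOL-C is actually implemented:

```python
# Minimal forward-mode AD via dual numbers: each value carries (value, derivative).
import math

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u*v)' = u'*v + u*v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def exp(x):
    # Chain rule: (e^u)' = e^u * u'
    e = math.exp(x.val)
    return Dual(e, e * x.dot)

# Differentiate f(x) = x*exp(x) + 3x at x = 1 by seeding dx = 1.
x = Dual(1.0, 1.0)
f = x * exp(x) + 3 * x
print(f.val, f.dot)  # f'(1) = 2e + 3
```

Seeding `x.dot = 1` corresponds to choosing a column of the seed matrix mentioned above; the derivative of the whole composition then falls out operation by operation, without ever forming the Jacobian.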
In this podcast I discuss the (sometimes) wrong use of the term Data Mining. In the paper "From Data Mining to Knowledge Discovery in Databases," written in 1996 by Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth, it is defined as: "Data mining is a step in the KDD process that consists of applying data analysis and discovery algorithms that produce a particular enumeration of patterns (or models) over the data." KDD means Knowledge Discovery in Databases, and is composed of the following steps: Data -> (selection) -> Target Data -> (preprocessing) -> Preprocessed Data -> (transformation) -> Transformed Data -> (data mining) -> Patterns -> (interpretation/evaluation) -> Knowledge. Several authors say "data mining" when they are performing the entire cycle (from data to knowledge) and not only the data mining step, which can also be represented by the use of classification/clustering algorithms. The reference paper is available at: https://www.aaai.org/ojs/index.php/aimagazine/article/download/1230/1131 Follow my podcast: http://anchor.fm/tkorting Subscribe to my YouTube channel: http://youtube.com/tkorting The intro and the final sounds were recorded at my home, using an old clock that belonged to my grandmother. Thanks for listening
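To make the distinction concrete, the full KDD chain can be sketched as a pipeline of small functions. Only the step names come from the paper; every function body here is an invented placeholder for illustration:

```python
# Toy sketch of the KDD chain from Fayyad et al. (1996).
# Data mining is deliberately just ONE step in the middle.
def select(data):          # selection: pick the target subset
    return [row for row in data if row is not None]

def preprocess(target):    # preprocessing: drop records with missing values
    return [(x, y) for x, y in target if y is not None]

def transform(clean):      # transformation: derive a feature (here: the product)
    return [(x, y, x * y) for x, y in clean]

def mine(transformed):     # data mining: enumerate patterns (a trivial rule)
    return [row for row in transformed if row[2] > 0]

def evaluate(patterns):    # interpretation/evaluation: distill knowledge
    return {"positive_products": len(patterns)}

data = [(1, 2), (3, -1), None, (2, None), (-2, -5)]
knowledge = evaluate(mine(transform(preprocess(select(data)))))
print(knowledge)  # {'positive_products': 2}
```

Calling only `mine()` is "data mining" in the paper's sense; calling the whole chain is KDD, which is exactly the conflation the episode points out.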
Join the discussion on our Discord server. As ML plays a more and more relevant role in many domains of everyday life, it is no surprise to see more and more attacks on ML systems. In this episode we talk about the most popular attacks against machine learning systems and some mitigations designed by researchers Ambra Demontis and Marco Melis, from the University of Cagliari (Italy). The guests are also the authors of SecML, an open-source Python library for the security evaluation of Machine Learning (ML) algorithms. Both Ambra and Marco are members of the PRAlab research group, under the supervision of Prof. Fabio Roli.

SecML Contributors: Marco Melis (Ph.D. Student, Project Maintainer, https://www.linkedin.com/in/melismarco/), Ambra Demontis (Postdoc, https://pralab.diee.unica.it/it/AmbraDemontis), Maura Pintor (Ph.D. Student, https://it.linkedin.com/in/maura-pintor), Battista Biggio (Assistant Professor, https://pralab.diee.unica.it/it/BattistaBiggio)

References:
SecML: an open-source Python library for the security evaluation of Machine Learning (ML) algorithms. https://secml.gitlab.io/
A. Demontis et al., “Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks,” 28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 321–338. https://www.usenix.org/conference/usenixsecurity19/presentation/demontis
P. W. Koh and P. Liang, “Understanding Black-box Predictions via Influence Functions,” International Conference on Machine Learning (ICML), 2017. https://arxiv.org/abs/1703.04730
M. Melis, A. Demontis, B. Biggio, G. Brown, G. Fumera, and F. Roli, “Is Deep Learning Safe for Robot Vision? Adversarial Examples Against the iCub Humanoid,” 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017, pp. 751–759. https://arxiv.org/abs/1708.06939
B. Biggio and F. Roli, “Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning,” Pattern Recognition, vol. 84, pp. 317–331, 2018. https://arxiv.org/abs/1712.03141
B. Biggio et al., “Evasion Attacks Against Machine Learning at Test Time,” Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Part III, 2013, vol. 8190, pp. 387–402. https://arxiv.org/abs/1708.06131
B. Biggio, B. Nelson, and P. Laskov, “Poisoning Attacks Against Support Vector Machines,” 29th International Conference on Machine Learning, 2012, pp. 1807–1814. https://arxiv.org/abs/1206.6389
N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, “Adversarial Classification,” Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Seattle, 2004, pp. 99–108. https://dl.acm.org/citation.cfm?id=1014066
M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic Attribution for Deep Networks,” Proceedings of the 34th International Conference on Machine Learning, Volume 70, JMLR.org, 2017. https://arxiv.org/abs/1703.01365
M. T. Ribeiro, S. Singh, and C. Guestrin, “Model-Agnostic Interpretability of Machine Learning,” arXiv preprint arXiv:1606.05386, 2016. https://arxiv.org/abs/1606.05386
W. Guo et al., “LEMNA: Explaining Deep Learning Based Security Applications,” Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2018. https://dl.acm.org/citation.cfm?id=3243792
S. Bach et al., “On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation,” PLoS ONE, vol. 10, no. 7, 2015, e0130140. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140
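As a rough illustration of the test-time evasion attacks discussed in the episode (a toy example in the spirit of Biggio et al., not SecML's API; the weights and the perturbation budget are made up), a gradient-based evasion against a linear classifier can be sketched like this:

```python
# Toy evasion attack on a linear classifier: nudge a malicious sample
# against the gradient of the decision score until it is misclassified.
import math

w = [1.0, -2.0]   # assumed trained weights
b = 0.5           # assumed bias

def score(x):
    # > 0 means the sample is classified as "malicious"
    return sum(wi * xi for wi, xi in zip(w, x)) + b

x = [2.0, 0.5]            # a sample classified as malicious (score = 1.5)
norm = math.hypot(*w)     # ||w||
eps = 0.8                 # attacker's perturbation budget

# For a linear model the gradient of the score w.r.t. the input is w itself,
# so the optimal unit-step direction is -w / ||w||.
x_adv = [xi - eps * wi / norm for xi, wi in zip(x, w)]
print(score(x), score(x_adv))  # the perturbed sample crosses the boundary
```

The score drops by exactly `eps * ||w||`, which is why small, targeted perturbations suffice whenever the sample sits close to the decision boundary; gradient-based attacks on nonlinear models follow the same idea iteratively.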
At HPI and the GFZ in Potsdam, Emmanuel Müller works on developing transparent algorithms that extract valuable knowledge from raw data, knowledge that has so far remained hidden from the human observer.
What actually is Knowledge Discovery? Emmanuel Müller from HPI and GFZ explains it with gummy bears.
Ruud Van Stiphout discusses Work Package 5 of the EURECA project: its aims, what it has achieved, and whether it has met its goals now that the project is at an end. He also discusses how these developments will help European clinicians. The goal of EURECA (Enabling information re-Use by linking clinical REsearch and Care) is to enable seamless, secure, scalable and consistent linkage of healthcare information residing in electronic health record (EHR) systems with information in clinical research information systems, such as clinical trials.
Today's podcast is brought to you by: John A. Bertetto, a sworn member of the Chicago Police Department. His current areas of study and work include criminal street gangs, social network analysis, and asymmetric threat mitigation. He is the author of Counter-Gang Strategy: Adapted COIN in Policing Criminal Street Gangs, Countering Criminal Street Gangs: Lessons from the Counterinsurgent Battlespace, Designing Law Enforcement: Adaptive Strategies for the Complex Environment, and Toward a Police Ethos: Defining Our Values as a Call to Action. Officer Bertetto's most recent research article, "Reducing Gang Violence through Network Influence Based Targeting of Social Programs," has been accepted to the Industry & Government Track of the 2014 Knowledge Discovery and Data Mining (KDD) annual conference, a conference with a 20% acceptance rate. Officer Bertetto has worked street patrol, organized crime, and research and development assignments. His applied research projects have led to collaborative partnerships with students and faculty at USMA West Point, George Mason University, and the University of Maryland. He is one of the primary designers of, and the law enforcement SME behind, the GANG social network analysis software, which has been featured in Popular Science, Governing, and on MIT's technology blog, as well as profiled on ABC and BBC news. Officer Bertetto holds a Master of Science degree from Western Illinois University and a Master of Business Administration degree from St. Xavier University. John is one of those coppers who doesn't see policing as just a job with set hours and responsibilities; he goes the extra mile toward a better community and a smarter police agency by educating officers.
This is highly evident not only in his writing, which can be found at https://www.scribd.com/john_bertetto#, but also in a very touching story he shares with our listeners about the death of a young man and the lasting impact his mother made on John when he met her. That meeting gave way to the #DriveForDemario: a push to get a school transport vehicle to help ensure that kids aren't killed trying to get to or from extra-curricular activities at their schools, as happened to Demario. You can find John online: Twitter: http://twitter.com/chitowncopper Scribd: https://www.scribd.com/john_bertetto# Web: http://foreign-intrigue.com/
Faculty of Mathematics, Computer Science and Statistics - Digital Dissertations of the LMU - Part 02/02
Knowledge Discovery in Databases (KDD) is the process of extracting non-trivial patterns from large databases, with the goal that these patterns be previously unknown, potentially useful, statistically sound, and understandable. The process comprises several steps, such as selection, preprocessing, evaluation, and the analysis step known as data mining. One of the central tasks in data mining is outlier detection: identifying observations that are unusual and appear inconsistent with the majority of the data. Such rare observations can have various causes: measurement errors, unusually strong (but nevertheless genuine) deviations, or corrupted or even manipulated data. In recent years, numerous outlier detection methods have been proposed that often seem to differ only slightly, yet are experimentally presented in their publications as "clearly superior." One focus of this thesis is to bring the different methods together and modularize them within a common formalism. This simplifies the analysis of their differences, and at the same time increases the flexibility of the methods, since modules can be added or replaced to adapt a method to changed requirements and data types.

To demonstrate the advantages of the modularized structure, (i) numerous existing algorithms are formalized in the scheme, (ii) new modules are added that improve the robustness, efficiency, statistical validity, and usability of the scoring functions and can be combined with the existing methods, (iii) modules are modified to apply existing and new algorithms to other, often more complex, data types such as geographically annotated data, time series, and high-dimensional spaces, (iv) several methods are combined into one procedure to obtain better results, and (v) scalability to large data volumes is improved through approximate or exact indexing. The starting point of the thesis is the Local Outlier Factor (LOF) algorithm. It is first modified with small extensions to improve the robustness and usability of the scoring. These methods are then formalized within a common framework for local outlier detection, so that the corresponding advantages can also be exploited in other algorithms. By abstracting from a single vector space to general data types, spatial and temporal relationships can also be analyzed. The use of subspace- and correlation-based neighborhoods then makes it possible to detect new kinds of outliers in arbitrarily oriented projections. Improvements to the scoring functions allow the score to be interpreted with the statistical intuition of a probability, rather than merely producing an outlier ranking as before. Improved models also generate explanations of why an object was scored as an outlier.

Subsequently, improvements are introduced for various modules that, among other things, make it possible to apply the algorithms to considerably larger data sets, in approximately linear rather than quadratic time, by allowing approximate neighborhoods at a small loss of precision and effectiveness. It is further shown how several such algorithms with different intuitions can be used simultaneously and their results combined into one method that can thereby detect different kinds of outliers. Finally, new outlier algorithms tailored to the specific problem are constructed for real data sets. These new methods yield insightful results that could not be obtained with the existing methods, and since they were developed from the building blocks of the modular structure, a direct connection to the earlier approaches is maintained. Using the index structures, the algorithms can be executed efficiently even on large data sets.
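The Local Outlier Factor that the thesis takes as its starting point can be sketched compactly in plain Python. This is a didactic re-implementation following Breunig et al., not the thesis's modular framework, and it recomputes neighborhoods naively in O(n²):

```python
# Compact Local Outlier Factor (LOF) sketch: a point's density is compared
# with the densities of its k nearest neighbors; scores near 1 are inliers.
import math

def dist(p, q):
    return math.dist(p, q)

def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (excluding itself)."""
    order = sorted(range(len(points)), key=lambda j: dist(points[i], points[j]))
    return [j for j in order if j != i][:k]

def k_distance(points, i, k):
    return dist(points[i], points[knn(points, i, k)[-1]])

def lrd(points, i, k):
    """Local reachability density: inverse mean reachability distance."""
    nbrs = knn(points, i, k)
    reach = [max(k_distance(points, j, k), dist(points[i], points[j]))
             for j in nbrs]
    return len(nbrs) / sum(reach)

def lof(points, i, k):
    """LOF score: neighbors' average density relative to the point's own."""
    nbrs = knn(points, i, k)
    return sum(lrd(points, j, k) for j in nbrs) / (len(nbrs) * lrd(points, i, k))

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = [round(lof(pts, i, k=3), 2) for i in range(len(pts))]
print(scores)  # the isolated point (8, 8) scores far above 1; the cluster stays near 1
```

The reachability distance smooths the density estimate near dense regions; the thesis's modular view treats the neighborhood search, the density model, and the final score normalization as exchangeable building blocks around exactly this skeleton.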
The proliferation of social networks, where individuals share private information, has led in the last few years to a growth in the volume of sensitive data being stored in these networks. As users subscribe to more services and connect more with their friends, families, and colleagues, the desire to use this information from the networks has increased. Online social interaction has become very popular around the globe and most sociologists agree that this will not fade away. Social network sites gather confidential information from their users (for instance, the social network site PatientsLikeMe collects confidential health information) and, as a result, social network data has begun to be analyzed from a different, specific privacy perspective. Since the individual entities in social networks, besides the attribute values that characterize them, also have relationships with other entities, the risk of disclosure increases. In this talk we present a greedy algorithm for anonymizing a social network and a measure that quantifies the information loss in the anonymization process due to edge generalization. About the speaker: Traian Marius Truta is an associate professor of Computer Science at Northern Kentucky University. He received his Ph.D. in computer science from Wayne State University in 2004. His major areas of expertise are data privacy and anonymity, privacy in statistical databases, and data management. He has served on the program committee of various conferences such as the International Conference on Database and Expert Systems Applications (DEXA), the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), the ACM Symposium on Applied Computing (SAC), and the International Symposium on Data, Privacy, and E-Commerce (ISDPE). He received the Yahoo! Research Best Paper Award at the Workshop on Privacy, Security, and Trust in KDD (PinKDD 2008) for the paper "A Clustering Approach for Data and Structural Anonymity in Social Networks".
For more information, including the list of research publications, please see: http://www.nku.edu/~trutat1/research.html.
Vast resources are devoted to predicting human behavior in domains such as economics, popular culture, and national security, but the quality of such predictions is often poor. Thus, it is tempting to conclude that this inability to make good predictions is a consequence of some fundamental lack of predictability on the part of humans. However, recent work offers evidence that the failure of standard prediction methods does not indicate an absence of human predictability but instead reflects: 1. misunderstandings regarding which features of human dynamics actually possess predictive power; 2. the fact that, until recently, it has not been possible to measure these predictive features in real-world settings. This talk introduces some of the science behind these basic observations and demonstrates their utility in various case studies. We begin by considering social groups in which individuals are influenced by the behavior of others. Correctly identifying and understanding the social forces in these situations can increase the extent to which the outcome of a social process can be predicted in its very early stages. This finding is then leveraged to design prediction methods which outperform existing techniques for predicting social network dynamics. We also look at the analysis of the predictability of adversary behavior in the co-evolutionary "arms races" that exist between attackers and defenders in many domains. Our analysis reveals that conventional wisdom regarding these co-evolving systems is incomplete, and provides insights which enable the development of predictive methods for computer network security. About the speaker: David Zage is a senior member of Sandia National Laboratories in the Cyber Analysis R&D group. His main research interests are in the areas of security, networking, and distributed systems. David received his Ph.D. in computer science from Purdue University in 2010 and his B.S. in computer science from Purdue in 2004.
The Intel Science and Technology Center for Embedded Computing at Carnegie Mellon University in Pittsburgh, Pennsylvania, will conduct research in four basic areas: Collaborative Perception, which is essentially computer vision; Real-time Knowledge Discovery, which includes machine learning; Robotics; and Embedded Systems Architecture. Embedded systems, used in automobiles, homes and many products, are already having a […]
Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, useful and ultimately understandable patterns in data. The core step of the KDD process is the application of Data Mining (DM) algorithms to efficiently find interesting patterns in large databases. This thesis concerns itself with three inter-related themes: generalised interaction and rule mining; the incorporation of statistics into novel data mining approaches; and probabilistic frequent pattern mining in uncertain databases. An interaction describes an effect that variables have -- or appear to have -- on each other. Interaction mining is the process of mining structures on variables describing their interaction patterns -- usually represented as sets, graphs or rules. Interactions may be complex, represent both positive and negative relationships, and the presence of interactions can influence another interaction or variable in interesting ways. Finding interactions is useful in domains ranging from social network analysis, marketing, the sciences, and e-commerce to statistics and finance. Many data mining tasks may be considered as mining interactions, such as clustering; frequent itemset mining; association rule mining; classification rules; graph mining; flock mining; etc. Interaction mining problems can have very different semantics, pattern definitions, interestingness measures and data types. Solving a wide range of interaction mining problems at the abstract level, and doing so efficiently -- ideally more efficiently than with specialised approaches -- is a challenging problem. This thesis introduces and solves the Generalised Interaction Mining (GIM) and Generalised Rule Mining (GRM) problems. GIM and GRM use an efficient and intuitive computational model based purely on vector-valued functions. The semantics of the interactions, their interestingness measures and the type of data considered are flexible components of the vectorised frameworks.
By separating the semantics of a problem from the algorithm used to mine it, the frameworks allow both to vary independently of each other. This makes it easier to develop new methods by focusing purely on a problem's semantics and removing the burden of designing an efficient algorithm. By encoding interactions as vectors in the space (or a sub-space) of samples, they provide an intuitive geometric interpretation that inspires novel methods. By operating in time linear in the number of interesting interactions that need to be examined, the GIM and GRM algorithms are optimal. The use of GRM or GIM provides efficient solutions to a range of problems in this thesis, including graph mining, counting-based methods, itemset mining, clique mining, a clustering problem, complex pattern mining, negative pattern mining, solving an optimisation problem, spatial data mining, probabilistic itemset mining, probabilistic association rule mining, feature selection and generation, classification and multiplication rule mining. Data mining is a hypothesis-generating endeavour, examining large databases for patterns suggesting novel and useful knowledge to the user. Since the database is a sample, the patterns found should describe hypotheses about the underlying process generating the data. In searching for these patterns, a DM algorithm makes additional hypotheses when it prunes the search space. Natural questions to ask, then, are: "Does the algorithm find patterns that are statistically significant?" and "Did the algorithm make significant decisions during its search?". Such questions address the quality of patterns found through data mining and the confidence that a user can have in utilising them. Finally, statistics has a range of useful tools and measures that are applicable in data mining. In this context, this thesis incorporates statistical techniques -- in particular, non-parametric significance tests and correlation -- directly into novel data mining approaches.
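The vector-based encoding described above can be illustrated with the simplest case, frequent itemset mining: each item corresponds to a boolean vector over the transactions, and the support of an itemset is obtained by combining its items' vectors element-wise. This is only a toy sketch of the geometric view, not the GIM implementation itself, and the transaction data is made up:

```python
# Toy illustration of the vectorised view: items are characteristic
# vectors in the space of samples (transactions), and combining an
# interaction is an element-wise vector operation.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

def item_vector(item):
    # boolean vector: which transactions contain the item
    return [item in t for t in transactions]

def support(itemset):
    # element-wise AND of the item vectors, then count the 1s
    vecs = [item_vector(i) for i in itemset]
    return sum(all(bits) for bits in zip(*vecs))

# support({"a", "b"}) counts the transactions containing both a and b
```

In this view, extending an itemset corresponds to intersecting vectors, which is what makes a single abstract algorithm applicable to many interaction semantics.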
This idea is applied to statistically significant and relatively class-correlated rule-based classification of imbalanced data sets; significant frequent itemset mining; mining complex correlation structures between variables for feature selection; mining correlated multiplication rules for interaction mining and feature generation; and conjunctive correlation rules for classification. The application of GIM or GRM to these problems leads to efficient and intuitive solutions. Frequent itemset mining (FIM) is a fundamental problem in data mining. While it is usually assumed that the items occurring in a transaction are known for certain, in many applications the data is inherently noisy or probabilistic; examples include noise added in privacy-preserving data mining applications, aggregation or grouping of records leading to estimated purchase probabilities, and databases capturing naturally uncertain phenomena. The consideration of existential uncertainty of item(sets) makes traditional techniques inapplicable. Prior to the work in this thesis, itemsets were mined if their expected support was high. This returns only an estimate, ignores the probability distribution of support, provides no confidence in the results, and can lead to scenarios where itemsets are labeled frequent even though they are more likely to be infrequent. Clearly, this is undesirable. This thesis proposes and solves the Probabilistic Frequent Itemset Mining (PFIM) problem, where itemsets are considered interesting if the probability that they are frequent is high. The problem is solved under the possible worlds model and a proposed probabilistic framework for PFIM. Novel and efficient methods are developed for computing an itemset's exact support probability distribution and frequentness probability, using the Poisson binomial recurrence, generating functions, or a Normal approximation. Incremental methods are proposed to answer queries such as finding the top-k probabilistic frequent itemsets.
A number of specialised PFIM algorithms are developed, with each being more efficient than the last: ProApriori is the first solution to PFIM and is based on candidate generation and testing. ProFP-Growth is the first probabilistic FP-Growth type algorithm and uses a proposed probabilistic frequent pattern tree (Pro-FPTree) to avoid candidate generation. Finally, the application of GIM leads to GIM-PFIM; the fastest known algorithm for solving the PFIM problem. It achieves orders of magnitude improvements in space and time usage, and leads to an intuitive subspace and probability-vector based interpretation of PFIM.
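The core computation behind PFIM can be sketched in a few lines: given, for each transaction, the probability that it contains an itemset, the Poisson binomial recurrence yields the exact distribution of the itemset's support, and the frequentness probability is the tail of that distribution at the minimum support. The per-transaction probabilities below are made up for illustration:

```python
def support_distribution(probs):
    """Exact support distribution of an itemset via the Poisson binomial
    recurrence: probs[i] is the probability that transaction i contains
    the itemset; returns dist with dist[j] = P(support == j)."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1 - p)      # transaction does not contain the itemset
            new[j + 1] += q * p        # transaction contains the itemset
        dist = new
    return dist

def frequentness_probability(probs, minsup):
    # P(support >= minsup): the interestingness criterion of PFIM
    return sum(support_distribution(probs)[minsup:])

probs = [0.9, 0.5, 0.4, 0.8]           # per-transaction probabilities (made up)
p_freq = frequentness_probability(probs, minsup=2)
```

The recurrence runs in O(n · minsup) time, whereas naive enumeration of all possible worlds is exponential in the number of transactions.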
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. The core step of the KDD process is the application of a Data Mining algorithm in order to produce a particular enumeration of patterns and relationships in large databases. Clustering is one of the major data mining techniques and aims at grouping the data objects into meaningful classes (clusters) such that the similarity of objects within clusters is maximized, and the similarity of objects from different clusters is minimized. This can serve to group customers with similar interests, or to group genes with related functionalities. Currently, high-dimensional feature spaces in particular pose a challenge for clustering techniques. Due to modern facilities of data collection, real data sets usually contain many features. These features are often noisy or exhibit correlations among each other. However, since these effects are relevant to different degrees in different parts of the data set, irrelevant features cannot be discarded in advance. The selection of relevant features must therefore be integrated into the data mining technique. For about ten years, specialized clustering approaches have been developed to cope with problems in high-dimensional data better than classic clustering approaches. Often, however, the various problems, though of very different natures, are not distinguished from one another. A main objective of this thesis is therefore a systematic classification of the diverse approaches developed in recent years according to their task definition, their basic strategy, and their algorithmic approach. We discern as main categories the search for clusters (i) w.r.t. closeness of objects in axis-parallel subspaces, (ii) w.r.t. common behavior (patterns) of objects in axis-parallel subspaces, and (iii) w.r.t. closeness of objects in arbitrarily oriented subspaces (so-called correlation clusters).
For the third category, the remaining parts of the thesis describe novel approaches. A first approach is the adaptation of density-based clustering to the problem of correlation clustering. The starting point here is the first density-based approach in this field, the algorithm 4C. Subsequently, enhancements and variations of this approach are discussed, allowing for more robust, more efficient, or more effective behavior, or even finding hierarchies of correlation clusters and the corresponding subspaces. The density-based approach to correlation clustering, however, is fundamentally unable to solve some issues, since it requires an analysis of local neighborhoods, which is a problem in high-dimensional data. Therefore, a novel method is proposed that tackles the correlation clustering problem with a global approach. Furthermore, a method is proposed to derive models for correlation clusters, to allow for an interpretation of the clusters and to facilitate more thorough analysis in the corresponding domain science. Finally, possible applications of these models are proposed and discussed.
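The intuition behind correlation clusters can be made concrete with a small sketch: points of a correlation cluster concentrate near a lower-dimensional hyperplane, which shows up as (near-)vanishing eigenvalues of the cluster's covariance matrix. The 2-d toy data below is made up for illustration; this is only the diagnostic idea, not the 4C algorithm itself:

```python
def covariance_eigenvalues(points):
    """Eigenvalues of the 2x2 covariance matrix of a 2-d point set.
    For a correlation cluster (points near a line), the smaller
    eigenvalue is close to zero."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    # closed-form eigenvalues of the symmetric matrix [[cxx, cxy], [cxy, cyy]]
    mean = (cxx + cyy) / 2
    delta = ((cxx - cyy) ** 2 / 4 + cxy ** 2) ** 0.5
    return mean + delta, mean - delta

# points exactly on the line y = 2x form a 1-dimensional correlation cluster
line = [(x / 10, 2 * x / 10) for x in range(10)]
lam_max, lam_min = covariance_eigenvalues(line)
# lam_min is (numerically) zero: the variance is confined to one direction
```

In higher dimensions the same idea applies via PCA: the number of non-negligible eigenvalues estimates the dimensionality of the correlation cluster.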
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large data collections. The most important step within the process of KDD is data mining, which is concerned with the extraction of the valid patterns. KDD is necessary to analyze the steadily growing amount of data caused by the enhanced performance of modern computer systems. However, with the growing amount of data, the complexity of data objects increases as well. Modern methods of KDD should therefore examine more complex objects than simple feature vectors to solve real-world KDD applications adequately. Multi-instance and multi-represented objects are two important types of object representations for complex objects. Multi-instance objects consist of a set of object representations that all belong to the same feature space. Multi-represented objects are constructed as a tuple of feature representations where each feature representation belongs to a different feature space. The contribution of this thesis is the development of new KDD methods for the classification and clustering of complex objects. To this end, the thesis introduces solutions for real-world applications that are based on multi-instance and multi-represented object representations. On the basis of these solutions, it is shown that a more general object representation often provides better results for many relevant KDD applications. The first part of the thesis is concerned with two KDD problems for which employing multi-instance objects provides efficient and effective solutions. The first is data mining in CAD parts, e.g. the use of hierarchical clustering for the automatic construction of product hierarchies. The introduced solution decomposes a single part into a set of feature vectors and compares them by using a metric on multi-instance objects.
Furthermore, multi-step query processing using a novel filter step is employed, enabling the user to process similarity queries efficiently. On the basis of this similarity search system, it is possible to perform several distance-based data mining algorithms, like the hierarchical clustering algorithm OPTICS, to derive product hierarchies. The second important application is the classification of and search for complete websites in the World Wide Web (WWW). A website is a set of HTML documents that is published by the same person, group or organization and usually serves a common purpose. To perform data mining for websites, the thesis presents several methods to classify websites. After introducing naive methods modelling websites as webpages, two more sophisticated approaches to website classification are introduced. The first approach uses a preprocessing step that maps single HTML documents within each website to so-called page classes. The second approach directly compares websites as sets of word vectors and uses nearest neighbor classification. To search the WWW for new, relevant websites, a focused crawler is introduced that efficiently retrieves relevant websites. This crawler minimizes the number of HTML documents and increases the accuracy of website retrieval. The second part of the thesis is concerned with data mining in multi-represented objects. An important example application for this kind of complex object are proteins, which can be represented as a tuple of a protein sequence and a text annotation. To analyze such objects, a clustering method for multi-represented objects is introduced that is based on the density-based clustering algorithm DBSCAN. This method uses all representations that are provided to find a global clustering of the given data objects. However, in many applications there already exists a sophisticated class ontology for the given data objects, e.g. proteins.
To map new objects into an ontology, a new method for the hierarchical classification of multi-represented objects is described. The system employs the hierarchical structure of the ontology to efficiently classify new proteins, using support vector machines.
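How two multi-instance objects can be compared with a set metric may be illustrated with one common choice, the (averaged) sum of minimum distances between the two instance sets. This is a generic sketch with made-up instance vectors; the metric actually used for CAD parts in the thesis may differ in its details:

```python
import math

def sum_of_min_distances(A, B):
    """A symmetric distance between two multi-instance objects A and B,
    each given as a list of feature vectors: average each instance's
    distance to its nearest counterpart in the other set, in both
    directions."""
    def d_min(p, S):
        return min(math.dist(p, q) for q in S)
    return (sum(d_min(a, B) for a in A) / len(A)
            + sum(d_min(b, A) for b in B) / len(B)) / 2

# toy "parts", each decomposed into two 2-d instance vectors (made up)
part1 = [(0.0, 0.0), (1.0, 0.0)]
part2 = [(0.0, 0.1), (1.0, 0.0)]
part3 = [(5.0, 5.0), (6.0, 5.0)]
# part1 is far closer to part2 than to part3
```

Such set distances can be plugged directly into distance-based algorithms like OPTICS, which is exactly what makes the multi-instance representation attractive.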
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. The core step of the KDD process is the application of a Data Mining algorithm in order to produce a particular enumeration of patterns and relationships in large databases. Clustering is one of the major data mining tasks and aims at grouping the data objects into meaningful classes (clusters) such that the similarity of objects within clusters is maximized, and the similarity of objects from different clusters is minimized. Among many others, the density-based clustering notion underlying the algorithm DBSCAN and its hierarchical extension OPTICS has been proposed recently and is one of the most successful approaches to clustering. In this thesis, our aim is to advance the state of the art in clustering, especially density-based clustering, by identifying novel challenges for density-based clustering and proposing innovative and solid solutions for these challenges. We describe the development of the industrial prototype BOSS (Browsing OPTICS plots for Similarity Search), which is a first step towards developing a comprehensive, scalable and distributed computing solution designed to make the efficiency and analytical capabilities of OPTICS available to a broader audience. For the development of BOSS, several key enhancements of OPTICS are required, which are addressed in this thesis. We develop incremental algorithms of OPTICS to efficiently reconstruct the hierarchical clustering structure in frequently updated databases, in particular when a set of objects is inserted in or deleted from the database. We empirically show that these incremental algorithms yield significant speed-up factors over the original OPTICS algorithm.
Furthermore, we propose a novel algorithm for the automatic extraction of clusters from hierarchical clustering representations that outperforms comparative methods, and introduce two novel approaches for selecting meaningful representatives, using the density-based concepts of OPTICS and producing better results than the related medoid approach. Another major challenge for density-based clustering is to cope with high-dimensional data. Many of today's real-world data sets contain a large number of measurements (or features) for a single data object. Usually, global feature reduction techniques cannot be applied to these data sets. Thus, the task of feature selection must be combined with and incorporated into the clustering process. In this thesis, we present original extensions and enhancements of the density-based clustering notion to cope with high-dimensional data. In particular, we propose an algorithm called SUBCLU (density-based SUBspace CLUstering) that extends DBSCAN to the problem of subspace clustering. SUBCLU efficiently computes all clusters that would have been found if DBSCAN were applied to all possible subspaces of the feature space. An experimental evaluation on real-world data sets illustrates that SUBCLU is more effective than existing subspace clustering algorithms because it is able to find clusters of arbitrary size and shape, and produces deterministic results. A semi-hierarchical extension of SUBCLU called RIS (Ranking Interesting Subspaces) is proposed that does not compute the subspace clusters directly, but generates a list of subspaces ranked by their clustering characteristics. A hierarchical clustering algorithm can be applied to these interesting subspaces in order to compute a hierarchical (subspace) clustering. A comparative evaluation of RIS and SUBCLU shows that RIS in combination with OPTICS can achieve an information gain over SUBCLU.
In addition, we propose the algorithm 4C (Computing Correlation Connected Clusters) that extends the concepts of DBSCAN to compute density-based correlation clusters. 4C benefits from an innovative, well-defined and effective clustering model, outperforming related approaches in terms of clustering quality on real-world data sets.
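The density-based notion that DBSCAN, SUBCLU and 4C build on can be recalled with a minimal DBSCAN sketch: a core point has at least min_pts points (including itself) within radius eps, and clusters are grown by expanding density-reachable core points. The data and parameters below are made up, and the brute-force neighborhood computation makes this O(n²):

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, -1 = noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or len(neighbors[i]) < min_pts:
            continue                       # already assigned, or not a core point
        labels[i] = cluster                # start a new cluster at core point i
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] is None:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:   # j is a core point too
                    queue.extend(neighbors[j])
        cluster += 1
    return [-1 if l is None else l for l in labels]

data = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0),       # dense group 1
        (5.0, 5.0), (5.0, 5.5), (5.5, 5.0),       # dense group 2
        (10.0, 0.0)]                               # isolated point -> noise
labels = dbscan(data, eps=1.0, min_pts=3)
```

SUBCLU applies exactly this core-point condition within each subspace of the feature space, pruning subspaces via the monotonicity of density-connected sets.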
Knowledge Discovery in Databases (KDD) denotes a methodical approach in which patterns are identified in large data sets and exploratory hypotheses are tested. KDD comprises the selection, preparation and preprocessing of the data, as well as data mining (pattern recognition) and the interpretation of the results. The underlying data sets either arise automatically, e.g. through the data processing of a health insurance provider, or are collected in omnibus surveys. So far, KDD has mainly been applied in the economic and life sciences. This work examines whether KDD is also suitable for exploring psychological research questions. To this end, a clinical-psychological question was investigated using a freely available longitudinal medical study by the US health authorities with over 49,000 participants (Medical Expenditure Panel Survey). The data obtained through KDD were compared with findings from epidemiological and clinical studies. The procedure proves viable for correlational designs, provided that limitations in reliability and validity are accepted in exchange for its economic advantages.
Intrusion detection (ID) is an important component of infrastructure protection mechanisms. Intrusion detection systems (IDSs) need to be accurate, adaptive, extensible, and cost-effective. These requirements are very challenging because of the complexities of today's network environments and the lack of IDS development tools. Our research aims to systematically improve the development process of IDSs. In the first half of the talk, I will describe our data mining framework for constructing ID models. This framework mines activity patterns from system audit data and extracts predictive features from the patterns. It then applies machine learning algorithms to the audit records, which are processed according to the feature definitions, to generate intrusion detection rules. This framework is a "toolkit" (rather than a "replacement") for the IDS developers. I will discuss the design and implementation issues in utilizing expert domain knowledge in our framework. In the second half of the talk, I will give an overview of our current research efforts, which include: cost-sensitive analysis and modeling techniques for intrusion detection; information-theoretic approaches for anomaly detection; and correlation analysis techniques for understanding attack scenarios and early detection of intrusions. About the speaker: Wenke Lee is an Assistant Professor in the Computer Science Department at North Carolina State University. He received his Ph.D. in Computer Science from Columbia University and B.S. in Computer Science from Zhongshan University, China. His research interests include network security, data mining, and workflow management. He is a Principal Investigator (PI) for research projects in intrusion detection and network management, with funding from DARPA, North Carolina Network Initiatives, Aprisma Management Technologies, and HRL Laboratories.
He received a Best Paper Award (applied research category) at the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), and Honorable Mention (runner-up) for Best Paper Award (applied research category) at both KDD-98 and KDD-97. He is a member of ACM and IEEE.
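The pattern-to-feature idea in such data mining frameworks can be sketched generically: a temporal feature like "number of connections from the same source within the last w seconds" is computed from audit records and then thresholded by a learned rule. The records, hostnames and threshold below are made up for illustration and are not taken from Lee's actual toolkit:

```python
from collections import deque

def connection_rate_feature(records, window):
    """For each audit record (timestamp, src_host), compute how many
    connections the same source made within the preceding `window`
    seconds -- a typical traffic feature constructed from mined
    activity patterns. Records must be sorted by timestamp."""
    recent = {}                     # src -> deque of timestamps inside window
    features = []
    for ts, src in records:
        q = recent.setdefault(src, deque())
        q.append(ts)
        while q[0] < ts - window:   # drop timestamps that fell out of the window
            q.popleft()
        features.append((ts, src, len(q)))
    return features

# made-up audit trail: host "10.0.0.9" bursts, the others are quiet
records = [(0, "a"), (1, "10.0.0.9"), (2, "10.0.0.9"), (2, "b"),
           (3, "10.0.0.9"), (4, "10.0.0.9"), (9, "a")]
feats = connection_rate_feature(records, window=5)
# a learned detection rule could then flag feature values above a threshold
alerts = [(ts, src) for ts, src, cnt in feats if cnt >= 4]
```

In the framework described above, such feature definitions are derived from mined activity patterns, and the thresholds come from the machine learning step rather than being hand-tuned.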