Niloofar is a postdoctoral researcher at the University of Washington whose research focuses on building privacy-preserving AI systems and studying the societal implications of machine learning models. She received her PhD in Computer Science from UC San Diego in 2023 and has earned multiple awards and honors for her research contributions.

Timestamps of the conversation:
00:00:00 Highlights
00:01:35 Introduction
00:02:56 Entry point in AI
00:06:50 Differential privacy in AI systems
00:11:08 Privacy leaks in large language models
00:15:30 Dangers of training AI on public internet data
00:23:28 How auto-regressive training makes things worse
00:30:46 Impact of synthetic data for fine-tuning
00:37:38 Most critical stage in the AI pipeline to combat data leaks
00:44:20 Contextual integrity
00:47:10 Are LLMs creative?
00:55:24 Under- vs. over-promises of LLMs
01:01:40 The recent publish-or-perish culture in AI research
01:07:50 Role of academia in LLM research
01:11:35 Choosing academia vs. industry
01:17:34 Mental Health and overarching

More about Niloofar: https://homes.cs.washington.edu/~niloofar/
References to some of the papers discussed:
https://arxiv.org/pdf/2310.17884
https://arxiv.org/pdf/2410.17566
https://arxiv.org/abs/2202.05520

About the host: Jay is a PhD student at Arizona State University working on improving AI for medical diagnosis and prognosis.
LinkedIn: https://www.linkedin.com/in/shahjay22/
Twitter: https://twitter.com/jaygshah22
Homepage: http://jayshah.me/ for any queries.
Stay tuned for upcoming webinars!

***Disclaimer: The information in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any institution or its affiliates of such video content.***
We are pleased to invite you to a live, distinguished YouTube panel discussion on strategies for securing early career awards.
What if privacy could be as dynamic and socially aware as the communities it aims to protect? Sebastian Benthall, a senior research fellow at NYU's Information Law Institute, shows us how complex privacy really is. Drawing on Helen Nissenbaum's work on contextual integrity and on concepts from differential privacy, he explains that privacy is not just about protecting data but also about following social norms in different situations, from healthcare to education, and that these norms can reshape privacy regulation in significant ways.

Show notes:
Intro: Sebastian Benthall (0:03)
Research: Designing Fiduciary Artificial Intelligence (Benthall, Shekman); Integrating Differential Privacy and Contextual Integrity (Benthall, Cummings)
Exploring differential privacy and contextual integrity (1:05) - discussion of the origins of each subject, and how differential privacy and contextual integrity can be used to reinforce each other
Accepted context or legitimate context? (9:33) - does context develop from what society accepts over time? Approaches to determining situational context and legitimacy
Next steps in contextual integrity (13:35) - is privacy as we know it ending? Areas where integrated differential privacy and contextual integrity can help (Cummings)
Interpretations of differential privacy (14:30) - not a silver bullet; new questions posed by NIST about its application
Privacy determined by social norms (20:25) - game theory and its potential for understanding social norms
Agents and governance: what will ultimately decide privacy? (25:27) - voluntary disclosures and the biases they can introduce toward groups that are least concerned with privacy; avoiding self-fulfilling prophecies from data and context

What did you think? Let us know. Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics: LinkedIn - episode summaries, shares of cited articles, and more. YouTube - was it something that we said? Good. Share your favorite quotes. Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
Summary In this week's episode, Anna (https://x.com/AnnaRRose) and Guille (https://x.com/GuilleAngeris) chat with Ying Tong Lai (https://x.com/therealyingtong) from Geometry Research (https://geometry.dev/) and Bryan Gillespie (https://x.com/bryan_gillespie) from Inversed Tech (https://inversed.tech/) about their latest research and works to date. They dive into the pair's recent work ‘SoK: Programmable Privacy in Distributed Systems (https://eprint.iacr.org/2024/982)', exploring the classifications and frameworks being introduced. Here's some additional links for this episode: SoK: Programmable Privacy in Distributed Systems by Benarroch, Gillespie, Lai and Miller (https://eprint.iacr.org/2024/982) Private Programmability in Zcash - Research Results and Community Discussion (https://forum.zcashcommunity.com/t/48016) Zcash Halo2 GitHub (https://github.com/zcash/halo2) Zk0x02 - An intro to Zcash and zkSNARKs - Ariel Gabizon (Zcash) (https://www.youtube.com/watch?v=Kx4cIkCY2EA) Moving SNARKs from the generic to algebraic group model by Ariel Gabizon (https://medium.com/@arielgabizon/moving-snarks-from-the-generic-to-algebraic-group-model-56549d60b90d) Explaining SNARKs Part I: Homomorphic Hidings by Ariel Gabizon (https://electriccoin.co/blog/snark-explain/) Differential Privacy in Constant Function Market Makers by Chitra, Angeris and Evans (https://fc22.ifca.ai/preproceedings/30.pdf) A Note on Privacy in Constant Function Market Makers by Angeris, Evans and Chitra (https://angeris.github.io/papers/cfmm-privacy.pdf) On Privacy Notions in Anonymous Communication by Kuhn, Beck, Schiffner, Jorswieck, and Strufe (https://arxiv.org/pdf/1812.05638) ZK Hack Montreal has been announced for Aug 9 - 11! Apply to join the hackathon here (https://zk-hack-montreal.devfolio.co/). Episode Sponsors Aleo (http://aleo.org/) is a new Layer-1 blockchain that achieves the programmability of Ethereum, the privacy of Zcash, and the scalability of a rollup. As Aleo is gearing up for their mainnet launch in Q1, this is an invitation to be part of a transformational ZK journey. Dive deeper and discover more about Aleo at http://aleo.org/ (http://aleo.org/). If you like what we do: * Find all our links here! @ZeroKnowledge | Linktree (https://linktr.ee/zeroknowledge) * Subscribe to our podcast newsletter (https://zeroknowledge.substack.com) * Follow us on Twitter @zeroknowledgefm (https://twitter.com/zeroknowledgefm) * Join us on Telegram (https://zeroknowledge.fm/telegram) * Catch us on YouTube (www.youtube.com/channel/UCYWsYz5cKw4wZ9Mpe4kuM_g)
Today, I chat with Gianclaudio Malgieri, an expert in privacy, data protection, AI regulation, EU law, and human rights. Gianclaudio is an Associate Professor of Law at Leiden University, the Co-director of the Brussels Privacy Hub, Associate Editor of the Computer Law & Security Review, and co-author of the paper "The Unfair Side of Privacy Enhancing Technologies: Addressing the Trade-offs Between PETs and Fairness". In our conversation, we explore this paper and why privacy-enhancing technologies (PETs) are essential but not enough on their own to address digital policy challenges.

Gianclaudio explains why PETs alone are insufficient solutions for data protection and discusses the obstacles to achieving fairness in data processing, including bias, discrimination, social injustice, and market power imbalances. We discuss data alteration techniques such as anonymization, pseudonymization, synthetic data, and differential privacy in relation to GDPR compliance. Plus, Gianclaudio highlights the issues of representation for minorities in differential privacy and stresses the importance of involving these groups in identifying bias and assessing AI technologies. We also touch on the need for ongoing research on PETs to address these challenges and share our perspectives on the future of this research.

Topics Covered:
- What inspired Gianclaudio to research fairness and PETs
- How PETs are about power and control
- The legal / GDPR and computer science perspectives on 'fairness'
- How fairness relates to discrimination, social injustices, and market power imbalances
- How data obfuscation techniques relate to AI / ML
- How well the use of anonymization, pseudonymization, and synthetic data techniques addresses data protection challenges under the GDPR
- How the use of differential privacy techniques may lead to unfairness
- Whether the use of encrypted data processing tools and federated and distributed analytics achieves fairness
- Three main PET shortcomings and how to overcome them: 1) bias discovery; 2) harms to people belonging to protected groups and to individuals' autonomy; and 3) market imbalances
- Areas that warrant more research and investigation

Resources Mentioned:
Read: "The Unfair Side of Privacy Enhancing Technologies: Addressing the Trade-offs Between PETs and Fairness"

Guest Info: Connect with Gianclaudio on LinkedIn. Learn more about the Brussels Privacy Hub.

Send us a Text Message. Privado.ai: Privacy assurance at the speed of product development. Get instant visibility w/ privacy code scans. TRU Staffing Partners: Top privacy talent - when you need it, where you need it. Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you. Copyright © 2022 - 2024 Principled LLC. All rights reserved.
Explore the basics of differential privacy and its critical role in protecting individual anonymity. The hosts explain the latest guidelines and best practices for applying differential privacy to data used in models such as AI systems. Learn how this method helps ensure that personal data remains confidential, even when datasets are analyzed or breached.

Show Notes:
Intro and AI news (00:00)
- Google AI search tells users to glue pizza and eat rocks
- Gary Marcus on break? (Maybe, and X-only break)

What is differential privacy? (06:34)
- Differential privacy is a process for sensitive data anonymization that offers each individual in a dataset the same privacy they would experience if they were removed from the dataset entirely.
- NIST's recent paper SP 800-226 IPD: "Any privacy harms that result from a differentially private analysis could have happened if you had not contributed your data."
- There are two main types of differential privacy: global (NIST calls it central) and local.

Why should people care about differential privacy? (11:30)
- Interest has been increasing for organizations to intentionally and systematically prioritize the privacy and safety of user data
- Speed up deployments of AI systems for enterprise customers, since connections to raw data do not need to be established
- Increase data security for customers that use sensitive data in their modeling systems
- Minimize the risk of sensitive data exposure for your data privileges - i.e., don't be THAT organization
- Guidelines and resources for applied differential privacy: Guidelines for Evaluating Differential Privacy Guarantees (NIST De-Identification)

Practical examples of applied differential privacy (15:58)
- Continuous features - cite: Dwork, McSherry, Nissim, and Smith's seminal 2006 paper "Calibrating Noise to Sensitivity in Private Data Analysis", which introduces the concept of ε-differential privacy
- Categorical features - cite: Warner (1965) created a randomized response technique in his paper "Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias"
(A minimal code sketch of both techniques appears after these show notes.)

Summary and key takeaways (23:59)
- Differential privacy is going to be part of how many of us need to manage data privacy, for cases where data providers can't give us anonymized data for analysis or where anonymization isn't enough for our privacy needs
- Hopeful that cohort targeting takes over from individual targeting
- Remember: differential privacy does not prevent bias!

What did you think? Let us know. Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics: LinkedIn - episode summaries, shares of cited articles, and more. YouTube - was it something that we said? Good. Share your favorite quotes. Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
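Neither cited technique is hard to prototype. Below is a minimal, hypothetical Python sketch of (a) the Laplace mechanism from the Dwork et al. (2006) line of work, applied to a bounded continuous feature, and (b) Warner-style randomized response for a yes/no answer. The ages, bounds, and ε values are made-up illustrations, not recommendations from the episode.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mean(values, lower, upper, epsilon):
    """Global/central DP release of the mean of a bounded continuous feature.

    Values are clipped to [lower, upper], so swapping one person's value changes
    the mean of n records by at most (upper - lower) / n; the Laplace noise is
    calibrated to that sensitivity (the Dwork et al. 2006 recipe).
    """
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(clipped)
    return clipped.mean() + rng.laplace(scale=sensitivity / epsilon)

def randomized_response(true_answer: bool, epsilon: float) -> bool:
    """Local DP for a categorical (yes/no) answer, in the spirit of Warner (1965).

    Answer truthfully with probability e^eps / (e^eps + 1); otherwise flip.
    """
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1)
    return true_answer if rng.random() < p_truth else not true_answer

ages = [23, 35, 41, 29, 52, 38]  # hypothetical data
print(laplace_mean(ages, lower=18, upper=90, epsilon=1.0))
print(randomized_response(True, epsilon=1.0))
```

Note how the first function matches the episode's "global/central" model (a trusted curator adds noise once) while the second matches the "local" model (each respondent randomizes their own answer before it ever leaves their device).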
Data would be valuable for research, yet collecting data violates privacy. An unsolvable dilemma? No! There are tricks for using data while still protecting privacy: PETs - Privacy Enhancing Technologies.

The podcast at a glance:
(00:00:51) The dilemma: sharing or privacy
(00:09:47) PET 1 - Anonymization
(00:15:21) PET 2 - Differential Privacy
(00:23:20) PET 3 - Synthetic Data
(00:16:42) PET 4 - Trusted Execution Environment
(00:25:08) PET 5 - Zero-Knowledge Proof
(00:26:59) PET 6 - Homomorphic Encryption
(00:32:10) PET 7 - Multiparty Computation
(00:35:59) PET 8 - Distributed Analytics
(00:38:47) PET 9 - Federated Learning
(00:45:02) Obstacles
(00:49:48) Biomedicine with Catherine Jutzeler
(01:00:29) Start-ups with Jean-Pierre Hubaux

Links:
Peter on Federated Learning: https://www.srf.ch/audio/digital-podcast/chindsgi-kuenstliche-intelligenz-und-dorfromantik?id=11969009
Peter on Homomorphic Encryption: https://www.srf.ch/audio/digital-podcast/dreckige-waesche-sichere-daten?id=11972567
Zero-Knowledge Proof (video): https://www.youtube.com/watch?v=5qzNe1hk0oY
Zero-Knowledge Proof (article): https://www.spektrum.de/kolumne/zero-knowledge-proof-wie-man-etwas-geheimes-beweist/2140194
OECD report: https://www.oecd-ilibrary.org/docserver/bf121be4-en.pdf?expires=1714738067&id=id&accname=guest&checksum=23355B1680302D7AC1E70819326D7103
Royal Society report: https://royalsociety.org/news-resources/projects/privacy-enhancing-technologies/
SRF Geek Sofa on Discord: https://discord.gg/geeksofa
In recent years, differential privacy has emerged as a promising solution for enhancing privacy protections in data processing systems. However, beneath its seemingly robust framework lie certain assumptions that, if left unquestioned, could inadvertently undermine its efficacy in safeguarding individual privacy. Here to discuss their recent papers on differential privacy are Rachel Cummings, Associate Professor of Industrial Engineering and Operations Research at Columbia University and CDT Non-Resident Fellow, and Daniel Susser, Associate Professor in the Department of Information Science at Cornell University and CDT Non-Resident Fellow.
Can we take Data Clean Rooms to the next level in terms of baked-in privacy? Damien Desfontaines is a Scientist at Tumult Labs, a startup that helps organizations safely share or publish insights from sensitive data, using differential privacy. Before that, he led the anonymization consulting team at Google, and got his PhD in computer science at ETH Zürich. He maintains a blog that teaches you all about differential privacy. References: Damien Desfontaines on LinkedIn Nicola Newitt: the legal case for Data Clean Rooms (Masters of Privacy) Damien Desfontaines' blog on Differential Privacy Tumult Labs: Resources and publications on Differential Privacy
Fred Trotter on balancing privacy and connection, the role of AI in societal judgment, and practical privacy protection strategies, with a nod to Mighty Casey. Watch two five-minute podcast clips on YouTube, or view or download the printable newsletter with associated images.

Episode Proem
How does YouTube know so much about me? I'm searching on my browser for solutions to my too-slow-responding Bluetooth mouse. In moments, YouTube feeds me shorts about solving Mac problems. I'm following a teen mental health Twitter chat, and my TikTok feed shows threads about mental health apps. How do they know? I'm getting personal comments about my mental health. My mental health is mostly good. Who else will know? Do I care? I live my life out loud. I don't share what I wouldn't want on a billboard, which, for me, is almost everything. When is that unsafe? When would I be embarrassed? I'm no longer looking for work, so I don't care. Who can access my data? What should I share? What does privacy even mean? How does privacy impact the need for connection? Isn't privacy a continuum - different needs at different times from different people? So many questions. Today's guest, Fred Trotter, co-authored the seminal work Hacking Healthcare. Fred is a healthcare data journalist and an expert in clinical data analysis, healthcare informatics, differential privacy, and clinical cybersecurity.

Podcast intro
Welcome to Health Hats, the Podcast. I'm Danny van Leeuwen, a two-legged cisgender old white man of privilege who knows a little bit about a lot of healthcare and a lot about very little. We will listen and learn about what it takes to adjust to life's realities in the awesome circus of healthcare. Let's make some sense of all of this.

Privacy in Digital Communication
Health Hats: I picture movement along a continuum when I think about digital privacy. Complete privacy is connecting with no one. That's intolerable. No privacy is connecting with everyone about everything. That's unsafe and exhausting. Privacy and risk tolerance go hand in hand for me, alone and with my peeps and tribes. Risk tolerance isn't fixed; it changes with context. My thoughts get muddier when I associate privacy and connection. They are flip sides of the same coin. I need community connection.
But the more I connect (content and reach), the more complex privacy becomes. My approach to managing privacy involves harm reduction, a term used in substance use treatment. So, based on my ever-changing risk tolerance and my need for connection, how do I reduce the harm privacy issues can cause? Harm reduction, safety, data aggregation Fred Trotter: It's funny that you mentioned harm reduction. A college friend of mine,
Guest: Damien Desfontaines, Staff Scientist at Tumult Labs
On LinkedIn | https://www.linkedin.com/in/desfontaines/
On Twitter | https://twitter.com/TedOnPrivacy
On Mastodon | https://hachyderm.io/@tedted

Host: Sean Martin, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining CyberSecurity Podcast [@RedefiningCyber]
On ITSPmagazine | https://www.itspmagazine.com/itspmagazine-podcast-radio-hosts/sean-martin

This Episode's Sponsors
Imperva | https://itspm.ag/imperva277117988
Devo | https://itspm.ag/itspdvweb

Episode Notes
This episode of Redefining CyberSecurity features a deep discussion between host Sean Martin and guest Damien Desfontaines on the topic of differential privacy (DP) and its implications for cybersecurity. Damien, who currently works at the startup Tumult Labs, focuses primarily on DP and has rich prior experience from leading the anonymization team at Google. He shares key insights on how differential privacy, a tool for anonymizing sensitive data, can be effectively used by organizations to share or publish data safely, opening doors for new business opportunities.

They discuss how differential privacy is gradually becoming a standard practice for companies wanting to share more data without incurring additional privacy risk. Damien also sheds light on the forthcoming guidelines from NIST regarding DP, which will equip organizations with a concrete framework to evaluate DP claims. Despite the positive dimension, Damien also discusses the potential pitfalls of differential privacy implementations and the need for solid data protection strategies. The episode concludes with an interesting conversation about how technology and risk mitigation controls can pave the way for more business opportunities in a secure manner.

Key insights:
- Differential privacy (DP) offers a mathematically proven methodology to anonymize sensitive data. It enables organizations to safely share or publish data, opening new business opportunities while adhering to privacy norms and standards.
- The forthcoming guidelines from NIST will equip organizations with a concrete framework to evaluate DP claims, fine-tune their privacy governance, and promote data governance within their operations.
- Implementing DP is complex and necessitates solid data protection strategies. Even with a strong mathematical foundation, the practical implementation of DP requires careful monitoring of potential vulnerabilities, illustrating the need for a holistic approach to data privacy.

Watch this and other videos on ITSPmagazine's YouTube Channel. Redefining CyberSecurity Podcast with Sean Martin, CISSP playlist:
This interview was recorded for the GOTO Book Club: gotopia.tech/bookclub. Read the full transcription of the interview here.

Katharine Jarmul - Principal Data Scientist at Thoughtworks & Author of "Practical Data Privacy"
Alyona Galyeva - Principal MLOps & Data Engineer at Thoughtworks

RESOURCES
Katharine: twitter.com/kjam | linkedin.com/in/katharinejarmul | kjamistan.com | probablyprivate.com
Alyona: github.com/alyonagalyeva | linkedin.com/in/alyonagalyeva

DESCRIPTION
Integrating privacy-enhancing technologies into software applications is an imperative step for safeguarding user data and adhering to regulatory requirements in the realm of software development. However, prior to implementation, it is vital for development teams to grasp the potential pitfalls associated with incorporating privacy technology. They must also appreciate the significance of iterative processes and the necessity of collaborative efforts to ensure compliance. Furthermore, achieving the delicate equilibrium between privacy and utility is of paramount importance. Organizations must meticulously fine-tune privacy settings, tailoring them to suit specific use cases. Additionally, alongside this core evaluation criterion, considerations such as speed and computational efficiency may enter the equation, demanding expertise in privacy engineering for successful implementation at scale.

Katharine Jarmul, the author of "Practical Data Privacy," spoke to Alyona Galyeva from PyLadies Amsterdam, during which she unveiled a slew of open-source libraries and practical examples for implementing privacy technology. Katharine also explored how developers can proactively guarantee that their data science projects prioritize security by design and uphold privacy by default. The interview is based on the book "Practical Data Privacy".

RECOMMENDED BOOKS
Katharine Jarmul • Practical Data Privacy
Katharine Jarmul & Jacqueline Kazil • Data Wrangling with Python
Katharine Jarmul & Richard Lawson • Python Web Scraping
Yehonathan Sharvit • Data-Oriented Programming
Zhamak Dehghani • Data Mesh
Eberhard Wolff & Hanna Prinz • Service Mesh
Piethein Strengholt • Data Management at Scale
Martin Kleppmann • Designing Data-Intensive Applications

Twitter | Instagram | LinkedIn | Facebook

Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech
SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!
Summary Machine learning and generative AI systems have produced truly impressive capabilities. Unfortunately, many of these applications are not designed with the privacy of end-users in mind. TripleBlind is a platform focused on embedding privacy preserving techniques in the machine learning process to produce more user-friendly AI products. In this episode Gharib Gharibi explains how the current generation of applications can be susceptible to leaking user data and how to counteract those trends. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Your host is Tobias Macey and today I'm interviewing Gharib Gharibi about the challenges of bias and data privacy in generative AI models Interview Introduction How did you get involved in machine learning? Generative AI has been gaining a lot of attention and speculation about its impact. What are some of the risks that these capabilities pose? What are the main contributing factors to their existing shortcomings? What are some of the subtle ways that bias in the source data can manifest? In addition to inaccurate results, there is also a question of how user interactions might be re-purposed and potential impacts on data and personal privacy. What are the main sources of risk? With the massive attention that generative AI has created and the perspectives that are being shaped by it, how do you see that impacting the general perception of other implementations of AI/ML? How can ML practitioners improve and convey the trustworthiness of their models to end users? What are the risks for the industry if generative models fall out of favor with the public? How does your work at Tripleblind help to encourage a conscientious approach to AI? What are the most interesting, innovative, or unexpected ways that you have seen data privacy addressed in AI applications? What are the most interesting, unexpected, or challenging lessons that you have learned while working on privacy in AI? When is TripleBlind the wrong choice? What do you have planned for the future of TripleBlind? Contact Info LinkedIn (https://www.linkedin.com/in/ggharibi/) Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast (https://www.dataengineeringpodcast.com) covers the latest on modern data management. Podcast.__init__ () covers the Python language, its community, and the innovative ways it is being used. Visit the site (https://www.themachinelearningpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com (mailto:hosts@themachinelearningpodcast.com)) with your story. To help other people find the show please leave a review on iTunes (https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243) and tell your friends and co-workers. 
Links TripleBlind (https://tripleblind.ai/) ImageNet (https://scholar.google.com/citations?view_op=view_citation&hl=en&user=JicYPdAAAAAJ&citation_for_view=JicYPdAAAAAJ:VN7nJs4JPk0C) Geoffrey Hinton Paper BERT (https://en.wikipedia.org/wiki/BERT_(language_model)) language model Generative AI (https://en.wikipedia.org/wiki/Generative_artificial_intelligence) GPT == Generative Pre-trained Transformer (https://en.wikipedia.org/wiki/Generative_pre-trained_transformer) HIPAA Safe Harbor Rules (https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html) Federated Learning (https://en.wikipedia.org/wiki/Federated_learning) Differential Privacy (https://en.wikipedia.org/wiki/Differential_privacy) Homomorphic Encryption (https://en.wikipedia.org/wiki/Homomorphic_encryption) The intro and outro music is from Hitman's Lovesong feat. Paola Graziano (https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/)/CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/)
This week, host Anna Rose (https://twitter.com/annarrose) and co-host Tarun Chitra (https://twitter.com/tarunchitra) catch up with Chris Goes (https://twitter.com/cwgoes) from Heliax (https://heliax.dev/team), the team behind Anoma (https://anoma.net/) and Namada (https://namada.net/). They start with a quick retrospective about IBC, a project he formerly worked on, and how the protocol has evolved since it launched. They dive into the concept of 'intents', exploring their origin, evolution, and discuss the intent-based systems that exist today. As well, they chat about the architectures enabled by a generalized intent-based infrastructure, the potential impacts on user experience, and the inherent trade-offs, particularly when zero-knowledge or privacy aspects are added to the mix. Further reading for this episode: Papers/Docs Cosmos Whitepaper (https://v1.cosmos.network/resources/whitepaper) Anoma: Undefining Money Versatile commitments to value by Christopher Goes, Awa Sun Yin and Adrian Brink (https://anoma.net/vision-paper.pdf) Differential Privacy in Constant Function Market Makers by Tarun Chitra, Guillermo Angeris and Alex Evans (https://eprint.iacr.org/2021/1101.pdf) Wyvern Protocol Documents (https://wyvernprotocol.com/docs) Websites SUAVE and the Future Opportunities and Challenges of MEV: Part I (https://medium.com/intotheblock/suave-and-the-future-opportunities-and-challenges-of-mev-part-i-6d206fb681) CoW Swap (https://swap.cow.fi/#/1/swap/WETH) Zcash GitHub - Nullifiers (https://zcash.github.io/orchard/design/nullifiers.html#:~:text=The%20nullifier%20commits%20to%20the,exist%20in%20the%20commitment%20tree) Map of Zones Website (https://mapofzones.com/home?columnKey=ibcVolume&period=24h) Talks/YouTube Realizing Intents with a Resource Model - Christopher Goes at Research Day (https://www.youtube.com/watch?v=4Nh4EOpvKMY) The Edge of MEV Switching Costs and the Slow Game - Christopher Goes at Research Day (https://www.youtube.com/watch?v=PUBvZRhOTAo&pp=ygUdY2hyaXN0b3BoZXIgZ29lcyByZXNlYXJjaCBkYXk%3D) ZK8: Namada: asset-agnostic interchain privacy - Chris Goes - Anoma (https://www.youtube.com/watch?v=5K6YxmZPFkE) Christopher Goes - Anoma: an intent-centric (https://www.youtube.com/watch?v=1Krw6-UkM9U) Are Intents, SUAVE, Account Abstraction, & Cross-Chain Bridging all the same thing? - Uma Roy at Research Day (https://www.youtube.com/watch?v=G0nFyq9DDPw) Podcast Eps Episode 115: Cosmos, IBC and ZKPs with Chris Goes (https://zeroknowledge.fm/115-2/) Episode 184: Anoma's Adrian Brink on Validity Predicates, Ferveo DKG & More (https://zeroknowledge.fm/184-2/) Episode 253: A look into Namada and Anoma with Awa Sun Yin (https://zeroknowledge.fm/253-2/) zkSummit 10 is happening in London on September 20, 2023! Apply to attend now -> zkSummit 10 Application Form (https://9lcje6jbgv1.typeform.com/zkSummit10) Polygon Labs (https://polygon.technology/) is thrilled to announce Polygon 2.0: The Value Layer for the Internet (https://polygon.technology/roadmap). Polygon 2.0 and all of our ZK tech is open-source and community-driven. Reach out to the Polygon community on Discord (https://discord.gg/0xpolygon) to learn more, contribute, or join in and build the future of Web3 together with Polygon! Aleo (https://www.aleo.org/) is a new Layer-1 blockchain that achieves the programmability of Ethereum, the privacy of Zcash, and the scalability of a rollup. For questions, join their Discord at aleo.org/discord (http://aleo.org/discord). If you like what we do: * Find all our links here! 
@ZeroKnowledge | Linktree (https://linktr.ee/zeroknowledge) * Subscribe to our podcast newsletter (https://zeroknowledge.substack.com) * Follow us on Twitter @zeroknowledgefm (https://twitter.com/zeroknowledgefm) * Join us on Telegram (https://zeroknowledge.fm/telegram) * Catch us on YouTube (https://zeroknowledge.fm/)
Welcome to the newest episode of The Cloud Pod podcast! Justin, Ryan, Jonathan, Matthew, and Peter are your hosts this week as we discuss all things cloud and AI. Titles we almost went with this week:
- The Cloud Pod is better than Bob's Used Books
- The Cloud Pod sets up AWS notifications for all
- The Cloud Pod is non-differential about privacy in BigQuery
- The Cloud Pod finds Windows Bob
- The Cloud Pod starts preparing for its Azure Emergency today

A big thanks to this week's sponsor: Foghorn Consulting, which provides top-notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you have trouble hiring? Foghorn can be burning down your DevOps and cloud backlogs as soon as next week.
In this week's episode, I speak with Damien Desfontaines, also known by the pseudonym "Ted", who is a Staff Scientist at Tumult Labs, a startup leading the way on differential privacy. In his career, Damien has led an anonymization consulting team at Google and specializes in making it easy to safely anonymize data. He earned his PhD and wrote his thesis at ETH Zurich, and holds a Master's Degree in Mathematical Logic and Theoretical Computer Science.

Tumult Labs' platform makes differential privacy useful by making it easy to create innovative, privacy-enabled data products that can be safely shared and used widely. In this conversation, we focus our discussion on differential privacy techniques, including what's next in their evolution, common vulnerabilities, and how to implement differential privacy in your platform. When it comes to protecting personal data, Tumult Labs takes a three-stage approach: Assess, Design, and Deploy. Damien takes us on a deep dive into each, with use cases provided.

Topics Covered:
- Why there's such a gap between academia and the corporate world
- How differential privacy's strong privacy guarantees are a result of strong assumptions; and why the biggest blockers to DP deployments have been education & usability
- When to use "local" vs. "central" differential privacy techniques
- Advancements in technology that enable the private collection of data
- Tumult Labs' Assessment approach to deploying differential privacy, where a customer defines its 'data publication' problem or question
- How the Tumult Analytics platform can help you build differential privacy algorithms that satisfy 'fitness for use' requirements
- Why using gold-standard techniques like differential privacy to safely release, publish, or share data has value far beyond compliance
- How data scientists can make analysis & design more robust to better preserve privacy; and the tradeoff between utility on very specific tasks & the number of tasks that you can possibly answer
- Damien's work assisting the IRS & DOE in deploying differential privacy to safely publish and share data publicly via the College Scorecards project
- How to address security vulnerabilities (i.e., potential attacks) on differentially private datasets
- Where you can learn more about differential privacy
- How Damien sees this space evolving over the next several years

Resources Mentioned:
Join the Tumult Labs Slack
Learn about Tumult Labs

Guest Info:
Connect with Damien on LinkedIn
Learn more on Damien's website
Follow 'Ted' on Twitter

Privado.ai: Privacy assurance at the speed of product development. Get instant visibility w/ privacy code scans. Shifting Privacy Left Media: Where privacy engineers gather, share, & learn. Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you. Copyright © 2022 - 2024 Principled LLC. All rights reserved.
Summary Encryption and security are critical elements in data analytics and machine learning applications. We have well developed protocols and practices around data that is at rest and in motion, but security around data in use is still severely lacking. Recognizing this shortcoming and the capabilities that could be unlocked by a robust solution Rishabh Poddar helped to create Opaque Systems as an outgrowth of his PhD studies. In this episode he shares the work that he and his team have done to simplify integration of secure enclaves and trusted computing environments into analytical workflows and how you can start using it without re-engineering your existing systems. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode (https://www.dataengineeringpodcast.com/linode) today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show! Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it's often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold (https://www.dataengineeringpodcast.com/datafold) today to book a demo with Datafold. RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder (https://www.dataengineeringpodcast.com/rudder) Build Data Pipelines. Not DAGs. That's the spirit behind Upsolver SQLake, a new self-service data pipeline platform that lets you build batch and streaming pipelines without falling into the black hole of DAG-based orchestration. All you do is write a query in SQL to declare your transformation, and SQLake will turn it into a continuous pipeline that scales to petabytes and delivers up to the minute fresh data. SQLake supports a broad set of transformations, including high-cardinality joins, aggregations, upserts and window operations. 
Output data can be streamed into a data lake for query engines like Presto, Trino or Spark SQL, a data warehouse like Snowflake or Redshift., or any other destination you choose. Pricing for SQLake is simple. You pay $99 per terabyte ingested into your data lake using SQLake, and run unlimited transformation pipelines for free. That way data engineers and data users can process to their heart's content without worrying about their cloud bill. For data engineering podcast listeners, we're offering a 30 day trial with unlimited data, so go to dataengineeringpodcast.com/upsolver (https://www.dataengineeringpodcast.com/upsolver) today and see for yourself how to avoid DAG hell. Your host is Tobias Macey and today I'm interviewing Rishabh Poddar about his work at Opaque Systems to enable secure analysis and machine learning on encrypted data Interview Introduction How did you get involved in the area of data management? Can you describe what you are building at Opaque Systems and the story behind it? What are the core problems related to security/privacy in data analytics and ML that organizations are struggling with? What do you see as the balance of internal vs. cross-organization applications for the solutions you are creating? comparison with homomorphic encryption validation and ongoing testing of security/privacy guarantees performance impact of encryption overhead and how to mitigate it UX aspects of not being able to view the underlying data risks of information leakage from schema/meta information Can you describe how the Opaque Systems platform is implemented? How have the design and scope of the product changed since you started working on it? Can you describe a typical workflow for a team or teams building an analytical process or ML project with your platform? What are some of the constraints in terms of data format/volume/variety that are introduced by working with it in the Opaque platform? How are you approaching the balance of maintaining the MC2 project against the product needs of the Opaque platform? What are the most interesting, innovative, or unexpected ways that you have seen the Opaque platform used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Opaque Systems/MC2? When is Opaque the wrong choice? What do you have planned for the future of the Opaque platform? Contact Info LinkedIn (https://www.linkedin.com/in/rishabh-poddar/) Website (https://rishabhpoddar.com/) @Podcastinator (https://twitter.com/podcastinator) on Twitter Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ () covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com)) with your story. 
To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links Opaque Systems (https://opaque.co/) UC Berkeley RISE Lab (https://rise.cs.berkeley.edu/) TLS (https://en.wikipedia.org/wiki/Transport_Layer_Security) MC² (https://mc2-project.github.io/) Homomorphic Encryption (https://en.wikipedia.org/wiki/Homomorphic_encryption) Secure Multi-Party Computation (https://en.wikipedia.org/wiki/Secure_multi-party_computation) Secure Enclaves (https://opaque.co/blog/what-are-secure-enclaves/) Differential Privacy (https://en.wikipedia.org/wiki/Differential_privacy) Data Obfuscation (https://en.wikipedia.org/wiki/Data_masking) AES == Advanced Encryption Standard (https://en.wikipedia.org/wiki/Advanced_Encryption_Standard) Intel SGX (Software Guard Extensions) (https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/overview.html) Intel TDX (Trust Domain Extensions) (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html) TPC-H Benchmark (https://www.tpc.org/tpch/) Spark (https://spark.apache.org/) Trino (https://trino.io/) PyTorch (https://pytorch.org/) Tensorflow (https://www.tensorflow.org/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
We have a smorgasbord of articles today! Some consultants tell us that deception is hard (and David has a spicy take), we discuss differential privacy briefly, aged domains used in malware, and KILLER ROBOTS!

Article 1 - Crafty threat actor uses 'aged' domains to evade security platforms
Article 2 - How to Use Cyber Deception to Counter an Evolving and Advanced Threat Landscape
Article 3 - Computer Repair Technicians Are Stealing Your Data
Supporting articles: Samsung Releases Maintenance Mode, A New Feature To Hide Your Personal Information From Prying Eyes; Thinking about taking your computer to the repair shop? Be very afraid
Article 4 - San Francisco lawmakers approve lethal robots, but they can't carry guns
Supporting articles: Bomb-disposal robot violently disposes of Dallas cop-killer gunman; Goliath Demolition Tank
Article 5 - Census Bureau Chief Defends New Privacy Tool Against Critics
Supporting article: What is Differential Privacy and How does it Work?

If you found this interesting or useful, please follow us on Twitter @serengetisec and subscribe and review on your favorite podcast app!
Starting with the 2020 Decennial Census, the U.S. Census Bureau is implementing a new framework to protect the privacy of census respondents - essentially, making sure that a nefarious actor can't identify an individual from published census tables. Although the Bureau has employed various disclosure-avoidance strategies for decades, this new framework is very different. And although the framework is well-intentioned, the tradeoff between privacy protection and data accuracy means rural areas will have less accurate data. On this episode, Kelly Asche, Senior Research Associate, interviews David Van Riper from the Minnesota Population Center, who has been one of the leading researchers exploring this tradeoff between privacy and accuracy and what it might mean for rural data.
Data deidentification aims to provide data owners with edible cake: to allow them to freely use, share, store and publicly release sensitive record data without risking the privacy of any of the individuals in the data set. And, surprisingly, given some constraints, that's not impossible to do. However, the behavior of a deidentification algorithm depends on the distribution of the data itself. Privacy research often treats data as a black box---omitting formal data-dependent utility analysis, evaluating over simple homogeneous test data, and using simple aggregate performance metrics. As a result, there's less work formally exploring detailed algorithm interactions with realistic data contexts. This can result in tangible equity and bias harms when these technologies are deployed; this is true even of deidentification techniques such as cell-suppression which have been in widespread use for decades. At worst, diverse subpopulations can be unintentionally erased from the deidentified data. Successful engineering requires understanding both the properties of the machine and how it responds to its running environment. In this talk I'll provide a basic outline of distribution properties such as feature correlations, diverse subpopulations, deterministic edit constraints, and feature space qualities (cardinality, ordinality), that may impact algorithm behavior in real world contexts. I'll then use new (publicly available) tools from the National Institute of Standards and Technology to show unprecedentedly detailed performance analysis for a spectrum of recent and historic deidentification techniques on diverse community benchmark data. We'll combine the two and consider a few basic rules that help explain the behavior of different techniques in terms of data distribution properties. But we're very far from explaining everything—I'll describe some potential next steps on the path to well-engineered data privacy technology that I hope future research will explore. A path I hope some CERIAS members might join us on later this year. This talk will be accessible to anyone who's interested—no background in statistics, data, or recognition of any of the above jargon is required. About the speaker: Christine Task is a CERIAS alumna, who earned her PhD in Computer Science at Purdue University in 2015, and joined Knexus Research Corporation later that year. Since then she has led the first National Challenges in Differential Privacy for the National Institute of Standards and Technology, contributed to 2020 Census Differentially Private Disclosure Avoidance System, served as technical lead for non-DP Synthetic Data projects for the US Census Bureau's American Community Survey, American Housing Survey and American Business Survey, been co-lead on the United Nation's UNECE Synthetic Data Working Group, and led the development of the SDNist data deidentification benchmarking library. Back in 2012, as a doctoral student at Purdue, she gave a CERIAS seminar titled "Practical Beginner's Guide to Differential Privacy", whose success was very valuable to her career. Having begun a decade ago, she was thrilled to be invited back to present what amounts to an update on that work.
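As a concrete illustration of the data-dependent behavior the talk describes, here is a minimal, hypothetical Python sketch of primary cell suppression on a small cross-tabulation. The table, threshold, and group names are invented for illustration; real deployments also require complementary suppression and the kind of distribution-aware evaluation the speaker advocates.

```python
import pandas as pd

def suppress_small_cells(table: pd.DataFrame, threshold: int = 10) -> pd.DataFrame:
    """Primary cell suppression: blank out any cross-tab cell below the threshold.

    This is only the first step of the decades-old technique mentioned above;
    production systems add complementary suppression so hidden cells cannot be
    reconstructed from row and column totals.
    """
    return table.where(table >= threshold)  # suppressed cells become NaN

# Hypothetical counts for three subpopulations across two geography types.
counts = pd.DataFrame(
    {"urban": [1250, 320, 8], "rural": [410, 12, 3]},
    index=["group_a", "group_b", "group_c"],
)

print(suppress_small_cells(counts))
# group_c disappears entirely, and group_b survives only in urban areas --
# the unintentional erasure of small, diverse subpopulations discussed above.
```

Running this on data with a different distribution (say, uniformly large cells) suppresses nothing at all, which is exactly the point of the talk: the same algorithm behaves very differently depending on the distribution of the data it is given.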
Differential privacy provides a mathematical definition of what privacy means in the context of user data. In lay terms, an analysis is differentially private if the presence or absence of any particular piece of data doesn't meaningfully change the end result. Differential privacy protects an individual's information essentially as if her information were not used in the analysis at all. This is a promising area of research and one of the future privacy-enhancing technologies that many people in the privacy community are excited about. However, it's not just theoretical: differential privacy is already being used by large technology companies like Google and Apple, as well as in US Census result reporting. Dr. Yun Lu of the University of Victoria specializes in differential privacy, and she joins the show to explain differential privacy, why it's such a promising and compelling framework, and share some of her research on applying differential privacy to voting and election result reporting. (A toy sketch of noisy vote-count release appears after the resources below.)

Topics:
- What's your educational background and work history?
- What is differential privacy?
- What's the history of differential privacy? Where did this idea come from?
- How does differential privacy cast doubt on the results of the data?
- What problems does differential privacy solve that can't be solved by existing privacy technologies?
- When adding noise to a dataset, is the noise always random or does it need to be somehow correlated with the original dataset's distribution?
- How do you choose an epsilon?
- What are the common approaches to differential privacy?
- What are some of the practical applications of differential privacy so far?
- How is differential privacy used for training a machine learning model?
- What are some of the challenges with implementing differential privacy?
- What are the limitations of differential privacy?
- What area of privacy does your research focus on?
- Can you talk a bit about the work you did on voting data privacy?
- How have politicians exploited the data available on voters?
- How can we prevent privacy leakage when releasing election results?
- What are some of the big challenges in privacy research today that we need to try to solve?
- What future privacy technologies are you excited about?

Resources:
Dr. Yun Lu's research
The Definition of Differential Privacy - Cynthia Dwork
Differential Privacy and the People's Data
Protecting Privacy with MATH
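As a rough illustration of the noise-addition idea discussed above, here is a hypothetical Python sketch that releases per-candidate precinct vote counts with the Laplace mechanism. The precinct counts, the choice of ε, and the add/remove-one-voter neighboring assumption are all illustrative, not a description of any deployed election-reporting system or of Dr. Lu's methods.

```python
import numpy as np

rng = np.random.default_rng(7)

def noisy_vote_counts(counts: dict, epsilon: float) -> dict:
    """Release per-candidate counts with epsilon-differential privacy.

    Assuming add/remove-one-voter neighboring, a single ballot changes the
    histogram by at most 1 in L1 norm, so Laplace noise of scale 1/epsilon
    suffices. Rounding afterwards is post-processing and does not weaken
    the guarantee, though released counts can still differ from the truth.
    """
    noise = rng.laplace(scale=1.0 / epsilon, size=len(counts))
    return {name: int(round(c + n)) for (name, c), n in zip(counts.items(), noise)}

# Hypothetical precinct totals; smaller epsilon means more noise (more privacy).
precinct = {"Candidate A": 1042, "Candidate B": 987, "Write-in": 31}
for eps in (0.1, 1.0):
    print(eps, noisy_vote_counts(precinct, epsilon=eps))
```

The two print lines make the epsilon question from the topic list tangible: at ε = 0.1 the counts wander by tens of votes, while at ε = 1.0 they stay within a few votes of the truth.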
Differential privacy has become a widely used tool to protect privacy in data science applications. In this talk, I will present two use cases for differential privacy: a) the collection of key-value statistics and b) protection against membership inference attacks. Key-value statistics are commonly used to gather information about the use of software products. Yet the collector may be untrusted, and the data of each user should be protected. There exist a number of differentially private collection methods that perturb the data at the client's site; however, these are very inaccurate. In theory it would also be possible to collect these statistics using secure computation, but that is too inefficient to even test. We show that a new combination of differential privacy and secure computation achieves both high accuracy and high efficiency. In the second application, we investigate the theoretical protection of differential privacy against membership inference attacks on neural network models. There exist proofs of theoretical upper bounds that scale with the privacy parameter. We show theoretically and empirically that those bounds do not hold against existing membership inference attacks in a natural deployment. Specifically, when one uses existing data sets from different sources on the Internet (instead of the same data set as in lab experiments) and unmodified, even no-longer state-of-the-art, membership inference attacks, the bound does not hold. We provide a theoretical explanation using a model that removes an unrealistic assumption about the training data, namely that it is i.i.d. About the speaker: Florian Kerschbaum is a professor in the David R. Cheriton School of Computer Science at the University of Waterloo (joined in 2017), a member of the CrySP group, and NSERC/RBC chair in data security (since 2019). Before that, he worked as chief research expert at SAP in Karlsruhe (2005 – 2016) and as a software architect at Arxan Technologies in San Francisco (2002 – 2004). He holds a Ph.D. in computer science from the Karlsruhe Institute of Technology (2010) and a master's degree from Purdue University (2001). He served as the inaugural director of the Waterloo Cybersecurity and Privacy Institute (2018 – 2021). He is an ACM Distinguished Scientist (2019). He is interested in security and privacy across the entire data science lifecycle. He extends real-world systems with cryptographic security mechanisms to achieve (some) provable security guarantees. His work is used in several business applications.
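For readers unfamiliar with client-side perturbation, here is a minimal, hypothetical Python sketch of local differential privacy via randomized response for a single boolean statistic, together with the collector-side debiasing step. It illustrates the accuracy problem the abstract refers to (the estimate is noisy even with many clients); it is not the combined DP-plus-secure-computation protocol the speaker presents.

```python
import numpy as np

rng = np.random.default_rng(42)

def rr_report(bit: int, epsilon: float) -> int:
    """Client-side local DP: report the true bit with probability e^eps/(e^eps+1), else flip it."""
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1)
    return bit if rng.random() < p_truth else 1 - bit

def rr_estimate(reports, epsilon: float) -> float:
    """Untrusted collector debiases the noisy reports to estimate the true frequency."""
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1)
    observed = np.mean(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)

# 10,000 hypothetical clients, 30% of whom actually have the feature enabled.
true_bits = rng.binomial(1, 0.3, size=10_000)
reports = [rr_report(int(b), epsilon=1.0) for b in true_bits]
print(rr_estimate(reports, epsilon=1.0))  # close to 0.3, but with visible local-DP noise
```

Because every client adds its own noise, the estimation error shrinks only at roughly the 1/sqrt(n) rate, which is why purely local approaches to key-value statistics can be so inaccurate compared with central-model or hybrid designs.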
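The first use case above, collecting key-value statistics from untrusted collectors, builds on local differential privacy, where each client perturbs its own report before sending it. The talk's actual contribution combines differential privacy with secure computation; as a much simpler, hypothetical illustration of client-side perturbation alone, here is binary randomized response in Python with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

def randomize(bit, epsilon):
    """Binary randomized response: report the true bit with probability
    p = e^eps / (e^eps + 1), otherwise flip it. Each client's report
    satisfies eps-local differential privacy on its own."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if rng.random() < p else 1 - bit

def estimate_rate(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from noisy reports."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    observed = np.mean(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# Toy example: 100,000 clients, 30% of whom actually use a feature.
true_bits = (rng.random(100_000) < 0.3).astype(int)
eps = 1.0
reports = np.array([randomize(b, eps) for b in true_bits])
print("true rate 0.30, estimated:", round(estimate_rate(reports, eps), 3))
```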
Enterprises trying to keep up with technological advances and a rapidly changing regulatory landscape can learn from their public sector counterparts, who have been finding innovative ways to publish data while respecting the privacy of individuals. This talk will review recent uses of differential privacy in the public sector, based on actual case studies at the US Census Bureau and the Internal Revenue Service. We will emphasize the tools and processes that enable “negotiations” between the parties most concerned with privacy and those most concerned with accuracy, or “fitness for use”, of the released data. We will explain the benefits that drove the adoption of differential privacy and how they can be translated to commercial enterprises.
For today's episode we embark on part two of our discussion on the U.S. Census. Protecting the data privacy of survey respondents has always been a central consideration for the U.S. Census Bureau, and throughout its history, many methods have been developed and implemented. For the 2020 Census, the Bureau adopted a new form of privacy protection, differential privacy, which was received with mixed reactions. To further understand why the Census Bureau adopted this new form of privacy protection and to help explore the concerns raised about differential privacy, we invited two experts who represent both sides of the debate and who each contributed to the Harvard Data Science Review special issue on the 2020 U.S. Census. Our guests are: John Abowd, Associate Director for Research and Methodology and Chief Scientist at the U.S. Census Bureau, and author of The 2020 Census Disclosure Avoidance System TopDown Algorithm for HDSR. danah boyd, founder and president of Data & Society, Principal Researcher at Microsoft, Visiting Professor at New York University, and author of Differential Perspectives: Epistemic Disconnects Surrounding the U.S. Census Bureau's Use of Differential Privacy for HDSR.
While most Americans have heard of the U.S. Census and understand that it is designed to count every resident in the United States every 10 years, many may not realize that the Census's role goes far beyond the allocation of seats in Congress. For this episode, we invited the three co-editors of Harvard Data Science Review's special issue on the U.S. Census to help us explore what the Census is, what it's used for, and how the data it collects should remain both private and useful. Our guests are: Erica Groshen, former Commissioner of Labor Statistics and Head of the U.S. Bureau of Labor Statistics Ruobin Gong, Assistant Professor of Statistics at Rutgers University Salil Vadhan, Professor of Computer Science and Applied Mathematics at Harvard University
What's Russia up to in cyberspace, nowadays? Belgium accuses China of cyberespionage. LockBit ransomware spreading through compromised servers. Malek Ben Salem from Accenture explains the Privacy Enhancing Technologies of Federated Learning with Differential Privacy guarantees. Rick Howard speaks with Rob Gurzeev from Cycognito on Data Exploitation. And Micodus GPS tracker vulnerabilities should motivate the user to turn the thing off. For links to all of today's stories check out our CyberWire daily news briefing: https://thecyberwire.com/newsletters/daily-briefing/11/136 Selected reading. Continued cyber activity in Eastern Europe observed by TAG (Google) Declaration by the High Representative on behalf of the European Union on malicious cyber activities conducted by hackers and hacker groups in the context of Russia's aggression against Ukraine (European Council) China: Declaration by the Minister for Foreign Affairs on behalf of the Belgian Government urging Chinese authorities to take action against malicious cyber activities undertaken by Chinese actors (Federal Public Service Foreign Affairs) Déclaration du porte-parole de l'Ambassade de Chine en Belgique au sujet de la déclaration du gouvernement belge sur les cyberattaques (Embassy of the People's Republic of China in the Kingdom of Belgium) LockBit: Ransomware Puts Servers in the Crosshairs (Broadcom Software Blogs | Threat Intelligence) Critical Vulnerabilities Discovered in Popular Automotive GPS Tracking Device (MiCODUS MV720) (BitSight) CISA released Security Advisory on MiCODUS MV720 Global Positioning System (GPS) Tracker (CISA)
Feedback: decentmakeover13@gmail.com Instagram - https://www.instagram.com/decentmakeover Twitter - https://twitter.com/decentmakeover Episode Links - Gautam's Homepage - http://www.gautamkamath.com/ Twitter - https://twitter.com/thegautamkamath YouTube - https://www.youtube.com/c/gautamkamath Blog - https://kamathematics.wordpress.com/
Dr. Lance Eliot explains AI & Law and differential privacy. See his website www.ai-law.legal for further information.
If the show notes look strange (for example, if all the links are missing; there should be LOTS of links), they are also available on the web here: https://www.enlitenpoddomit.se Episode 345 was recorded on December 1, and since a human normally has 52 bones in their feet out of a total of 206 in the body ( https://www.ontariochiropodist.com/Public/foot-facts.html ), today's episode covers: INTRO: - Everyone has had a week... FEEDBACK AND BACKLOG: - Johan has figured out that it was "Differential Privacy" he was thinking of last week https://en.wikipedia.org/wiki/Differential_privacy GENERAL NEWS - Winamp is coming back, but… https://musictech.com/news/industry/winamp-is-coming-back-as-a-unique-space-for-creators-and-possibly-a-music-making-platform/ - New IoT rules in the UK (tip from DanielGR) https://www.bbc.com/news/technology-59400762 - Why not a Cyberwhistle https://appleinsider.com/articles/21/12/01/tesla-selling-50-cyberwhistle-musk-mocks-19-apple-polishing-cloth - We have talked about Logistikpodden before: https://logistikpodden.se/podcast/niklas-modig-fran-japan-till-varlden-med-flodeseffektivitet/ - Twitter gets a new CEO https://www.thurrott.com/cloud/social/259733/twitter-ceo-steps-down - Amazon lets us build digital twins https://aws.amazon.com/about-aws/whats-new/2021/11/aws-iot-twinmaker-build-digital-twins/ - The cryptocurrency JRR Token is being stopped https://computersweden.idg.se/2.2683/1.759298/upphovsratten-satter-stopp-for-kryptovalutan-jrr-token LISTENER QUESTION: - The world's best Carin has been in touch about stickers MICROSOFT - But what the heck… again: https://www.thurrott.com/cloud/web-browsers/microsoft-edge/259781/users-pushback-against-bloatware-in-microsoft-edge - They have also done one good thing https://www.bleepingcomputer.com/news/microsoft/microsoft-edge-adds-super-duper-secure-mode-to-stable-channel/ - New supercomputer: https://www-zdnet-com.cdn.ampproject.org/c/s/www.zdnet.com/google-amp/article/microsoft-now-has-one-of-the-worlds-fastest-supercomputers-and-no-it-doesnt-run-on-windows/ - Moving Office 365 to Sweden. Though maybe don't be in a hurry. https://docs.microsoft.com/en-us/microsoft-365/enterprise/request-your-data-move?view=o365-worldwide - BONUS LINK: 99% Invisible https://99percentinvisible.org/episode/alphabetical-order/ - Time to run Windows 11 everywhere?
https://www.thurrott.com/windows/windows-11/259730/windows-11-is-now-on-almost-9-percent-of-pcs APPLE - Apple sues NSO Group https://www.apple.com/newsroom/2021/11/apple-sues-nso-group-to-curb-the-abuse-of-state-sponsored-spyware/ - BONUS LINK: https://darknetdiaries.com/episode/100/ GOOGLE: - New features in Android https://www.thurrott.com/mobile/android/259810/google-announces-new-android-features-2 - Google pays taxes https://computersweden.idg.se/2.2683/1.759460/google-betalar-miljardbelopp-i-restskatt-till-irland - And is working on sign language https://blog.google/outreach-initiatives/accessibility/ml-making-sign-language-more-accessible/ - Not only the deaf are getting help, but also the visually impaired https://blog.google/outreach-initiatives/accessibility/more-accessible-web-images-arrive-10-new-languages/ GADGET LIST - Björn: Black Week: Pluralsight; gadget list: boxes: https://www.etsy.com/listing/893963572/mini-amazon-box-16-envelope-template - David: Black Week: music software; gadget list: decorative light string: https://www.kjell.com/se/produkter/hem-fritid/belysning-lampor/stamningsbelysning/dekorationsslinga-3000-led-med-ljuseffekter-p64646 - Johan: Black Week: headphones, running shoes, pancake iron, Oura ring; gadget list: https://www.xxl.se/gore-wear-m-windstopper-facewarmer-ansiktsvarmare-svart/p/1150300_1_style OUR OWN LINKS - En Liten Podd Om IT on the web, http://enlitenpoddomit.se/ - En Liten Podd Om IT on Facebook, https://www.facebook.com/EnLitenPoddOmIt/ - En Liten Podd Om IT on YouTube, https://www.youtube.com/enlitenpoddomit - Please give us a review - https://podcasts.apple.com/se/podcast/en-liten-podd-om-it/id946204577?mt=2#see-all/reviews - https://www.podchaser.com/podcasts/en-liten-podd-om-it-158069 LINKS TO WHERE YOU CAN LISTEN TO THE PODCAST: - Apple Podcasts (iTunes), https://itunes.apple.com/se/podcast/en-liten-podd-om-it/id946204577 - Overcast, https://overcast.fm/itunes946204577/en-liten-podd-om-it - Acast, https://www.acast.com/enlitenpoddomit - Spotify, https://open.spotify.com/show/2e8wX1O4FbD6M2ocJdXBW7?si=HFFErR8YRlKrELsUD--Ujg%20 - Stitcher, https://www.stitcher.com/podcast/the-nerd-herd/en-liten-podd-om-it - YouTube, https://www.youtube.com/enlitenpoddomit LINK TO THE DISCORD WITH THE LIVE STREAM + CHAT - http://discord.enlitenpoddomit.se (And don't forget to email bjorn@enlitenpoddomit.se if you want stickers; just include a postal address. :)
1. Instagram Rolls Out Limits Feature to Prevent Abuse - Instagram announces 3 new features to help protect people from abuse: the ability for people to limit comments and DM requests during spikes of increased attention; stronger warnings when people try to post potentially offensive comments; and the global rollout of its Hidden Words feature, which allows people to filter abusive DM requests.
2. Facebook Announces First-Ever #BuyBlack Summit - As part of its expanded effort to support Black business owners as they deal with the ongoing impacts of the pandemic, Facebook has announced a new #BuyBlack Summit, an all-day event that will provide advice and guidance for Black-owned businesses, to be held on August 24th. Sign up for the event and learn more https://buyblacksummit.splashthat.com/
3. Facebook Shares New, Privacy-Focused Approach to Advertising - The spin from Facebook is that they are helping to provide more insight within the data limitations in place after Apple's ATT. They are now developing a set of privacy-enhancing technologies (PETs) for ads, which will minimize the amount of data gathered and processed in order to help protect personal information, while still facilitating insight into campaign performance. They are: Secure Multi-Party Computation (MPC) - allows two or more organizations to work together while limiting the information that either party can learn. MPC is useful for enhancing privacy while calculating outcomes from more than one party, such as reporting the results of an ad campaign or training a machine-learning model where the data is held by two or more parties. Today, this type of reporting requires at least one party to learn which specific people made a purchase after seeing a specific ad. With MPC, say one party has information about who saw an ad and another party has information about who made a purchase. MPC and encryption make it possible for both parties to learn insights about how an ad is performing, without the need to entrust a single party with both data sets. Last year, they began testing a solution called Private Lift Measurement, which uses MPC to help advertisers understand performance. On-Device Learning - trains an algorithm from insights processed right on your device without sending individual data such as an item purchased or your email address to a remote server or cloud. For example, if lots of people who click on ads for exercise equipment also tend to buy protein shakes, on-device learning could help identify that pattern without sending individual data to a Facebook server or cloud. Then, Facebook can use this pattern to find an audience for protein shakes using ads. Similar to a feature like autocorrect or text prediction, on-device learning improves over time. As millions of devices each make small improvements and start to identify new patterns, these patterns can train an algorithm to get smarter, so you may see more ads that are relevant to you and fewer that aren't. On-device learning data can be further protected by combining it with differential privacy. Differential Privacy - a technique that can be used on its own or applied to other privacy-enhancing technologies to protect data from being re-identified. Differential privacy works by adding carefully calculated “noise” to a dataset. For example, if 118 people bought a product after clicking on an ad, a differentially private system would add or subtract a random amount from that number.
So instead of 118, someone using that system would see a number like 120 or 114. Adding that small random bit of incorrect information makes it harder to know who actually bought the product after clicking the ad, even if you have a lot of other data.
4. Video Calling on LinkedIn Is Now a Reality - After adding support for various third-party video providers over the last year in order to facilitate video meetings in the app, last week LinkedIn quietly rolled out a new, native video option within its messaging platform, which provides another way to connect with users without the need to download a separate video app. As explained by LinkedIn: "From an initial job search to a 1:1 conversation, we wanted to drive the productivity of our members end to end while keeping them safe. By adding video conferencing as a part of the messaging experience, members can connect virtually while maintaining the context of their existing conversation. Now, members can easily schedule free video meetings with their network without the need to download a client or sign up to any service."
5. Google Search Console May Contain Bot Traffic Data - Google's John Mueller confirmed that Google Search Console does not filter out all bot traffic. John said on Twitter, in response to someone who suspected they were seeing bot traffic in the Performance report, "sometimes it can be from bots - we don't necessarily filter all of that out in Search Console."
6. Google Ads attribution models now support YouTube and Display - Attribution is a common issue for search marketers and continues to be muddied as more of the web focuses on privacy. The ability to model your attribution journeys through YouTube and Display will help marketers determine which channels to invest in and which channels could use a different strategy. As of August 9, Google Ads has upgraded all non-last-click models, including data-driven attribution, to support YouTube and Display ads. In addition to clicks, the data-driven attribution model also measures engaged views from YouTube. Along with knowing which channels are contributing along the buyer journey toward a final conversion (whatever that looks like for your business), the new inclusions mean that “when used along with automated bidding strategies or updates to your manual bidding, data-driven attribution helps to drive additional conversions at the same CPA compared to last click.”
7. YouTube Updates Default Settings on Kids Content, Implements New Restrictions on Promotions - YouTube has announced some new measures to assist in protecting young users from questionable content and unwanted exposure on the platform, with new default privacy settings for uploads by young people, and new reminders and prompts to help avoid overuse. First off, on the new upload settings - in the coming weeks, YouTube says that it will upgrade the default privacy settings for uploads from users ages 13-17 to 'the most private option available'. So kids can still mitigate the defaults, but by using this as a starting point, YouTube's hoping to ensure that younger users gain more awareness of the risks involved in such, potentially limiting unwanted exposure in the app. YouTube's also looking to tackle overuse, with the addition of 'take a break' and bedtime reminders, also by default, for all users ages 13-17. So, again, savvy youngsters can just switch these settings off if they choose - and most of them are far more savvy and attuned to such than their parents.
But by implementing new defaults, YouTube's looking to increase awareness of its various options in this respect, with a view to improving safety. And finally, in what may be a big blow for kidfluencers, YouTube's also removing more commercial content from YouTube Kids.
8. Google Ads Editor Rolls Out New Features: Lead Form Extensions, Hotel Ads & More - Google Ads Editor is a tool that allows advertisers to make changes in bulk, making it easy to apply optimizations and edits across multiple keywords, ads, ad groups and/or campaigns seamlessly. With Google Ads Editor, changes are made offline before being pushed live. Making adjustments offline allows advertisers better control and visibility into changes before pushing them live. This week, Google Ads released a new version of Google Ads Editor with a slew of new features. Here's what you need to know! Lead Form Extensions - Previously only accessible through the UI, lead form extensions were released in 2019, allowing users to append a lead form to their ads so that prospects could complete the form without ever leaving the SERP. Since their initial release, users could only create or edit lead form extensions within the UI. Users can now download, edit, and create lead form extensions within Google Ads Editor. YouTube Audio Ads - Similarly, YouTube Audio Ads, which were released in 2020, were previously only available to set up directly within the UI. Now audio ads can be set up through Google Ads Editor. Hotel Ads - Users can now use Google Ads Editor to manage Hotel Ads, which are feed-based ads that help hotel advertisers promote prices and availability of their properties on any given day. Hotel Ads have been available since 2018, but only accessible to work on through the Google Ads UI.
9. Google Provides More Transparency Over Custom Bidding Process - Custom Bidding is Google's automated bidding strategy for Google Ads 360, which enables advertisers to assign a value to a conversion or purchase, which Google's system can then optimize for within its process. It can be a good way to maximize campaign performance, based on Google's ever-evolving machine learning processes - but up till now, Custom Bidding has required a degree of technical expertise to implement, due to coding elements. To mitigate this, and lower the barrier to entry, Google's now adding 'Floodlight activities', which are pre-created HTML code snippets that can be used to track conversions, or other information about transactions. Through this new process, you'll be able to choose pre-determined goals for your Custom Bid approach, then export the code for insertion on your site. So there is still a level of technical expertise involved - but you won't have to understand all the code parameters and build the relevant HTML yourself. In addition to this, Google's also adding 'pay per viewable' impressions for display and video campaigns, providing more customizable campaign elements, while it's also adding a new 'Bidding Insights' report, which will provide more transparency over its automated bidding processes.
10. New Requirements for Google Podcasts Recommendations - Beginning on September 21, Google will enforce new requirements for podcasts to show in recommendations on the Google Podcasts platform, the company told podcast owners via email on Thursday.
Podcasts that do not provide the required information can still appear in Google and Google Podcasts search results, and users can still subscribe to them; they just won't be eligible to be featured as a recommendation. The new requirements: starting on September 21, to be eligible to show as a recommendation, podcast RSS feeds must include: A valid, crawlable image: this image must be accessible to Google (not blocked from Google's crawler and not requiring a login). A show description: include a user-friendly show description that accurately describes the show. A valid owner email address: this email address is used to verify show ownership; you must have access to email sent to this address. A link to a homepage for the show: linking your podcast to a homepage will help the discovery and presentation of your podcast on Google surfaces. The podcast author's name: a name to show in Google Podcasts as the author of the podcast. This does not need to be the same as the owner.
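Item 3 above describes secure multi-party computation, where aggregate insights are computed without any single party holding all the raw data. As a rough toy illustration of the secret-sharing idea behind such systems (this is not Facebook's actual Private Lift protocol; the helper servers and values here are hypothetical), consider the following Python sketch:

```python
import random

Q = 2**61 - 1  # public modulus; shares are uniform values in [0, Q)

def share(value):
    """Split a value into two additive shares mod Q.
    Each share on its own looks like a uniformly random number."""
    r = random.randrange(Q)
    return r, (value - r) % Q

# Toy example: each user's device holds a small private count.
user_values = [random.randint(0, 3) for _ in range(10_000)]

# Each user sends one share to helper server A and the other to helper B.
shares_a, shares_b = zip(*(share(v) for v in user_values))

# Neither helper alone learns any user's value, yet adding the two
# partial sums reconstructs the aggregate statistic.
total = (sum(shares_a) + sum(shares_b)) % Q
print(total == sum(user_values))
```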
Show Notes(02:06) Fabiana talked about her Bachelor’s degree in Applied Mathematics from the University of Lisbon in the early 2010s.(04:18) Fabiana shared lessons learned from her first job out of college as a Siebel and BI Developer at Novabase.(05:13) Fabiana discussed unique challenges while working as an IoT Solutions Architect at Vodafone.(09:56) Fabiana mentioned projects she contributed to as a Data Scientist at startups such as ODYSAI and Habit Analytics.(12:44) Fabiana talked about the two Master’s degrees she got while working in the industry (Applied Econometrics from Lisbon School of Economics and Management and Business Intelligence from NOVA IMS Information Management School).(14:41) Fabiana distinguished the difference between data science and business intelligence.(18:01) Fabiana shared the founding story of YData, the first data-centric platform with synthetic data, where she is currently the Chief Data Officer.(21:32) Fabiana discussed different techniques to generate synthetic data, including oversampling, Bayesian Networks, and generative models.(24:01) Fabiana unpacked the key insights in her blog series on generating synthetic tabular data.(29:40) Fabiana summarized novel design and optimization techniques to cope with the challenges of training GAN models.(33:44) Fabiana brought up the benefits of using Differential Privacy as a complement to synthetic data generation.(38:07) Fabiana unpacked her post “The Cost of Poor Data Quality,” where she defined data quality as data measures based on factors such as accuracy, completeness, consistency, reliability, and above all, whether it is up to date.(42:11) Fabiana explained the important role that data quality plays in ensuring model explainability.(44:57) Fabiana reasoned about YData’s decision to pursue the open-source strategy.(47:47) Fabiana discussed her podcast called “When Machine Learning Meets Privacy” in collaboration with the MLOps Slack community.(49:14) Fabiana briefly shared the challenges encountered to get the first cohort of customers for YData.(50:12) Fabiana went over valuable lessons to attract the right people who are excited about YData’s mission.(51:52) Fabiana shared her take on the data community in Lisbon and her effort to inspire more women to join the tech industry.(53:47) Closing segment.Fabiana’s Contact InfoLinkedInMediumTwitterYData’s ResourcesWebsiteGithubLinkedInTwitterAngelListSynthetic Data CommunityMentioned ContentBlog PostsSynthetic Data: The Future Standard for Data Science Development (April 2020)Generating Synthetic Tabular Data with GANs — Part 1 (May 2020)Generating Synthetic Tabular Data with GANs — Part 2 (May 2020)What Is Differential Privacy? (May 2020)What Is Going On With My GAN? (July 2020)How To Generate Synthetic Tabular Data? Wasserstein Loss for GANs (Sep 2020)The Cost of Poor Data Quality (Sep 2020)How Can I Explain My ML Models To The Business? (Oct 2020)Synthetic Time-Series Data: A GAN Approach (Jan 2021)Podcast“When Machine Learning Meets Privacy”PeopleJean-Francois Rajotte (Resident Data Scientist at the University of British Columbia)Sumit Mukherjee (Associate Professor of Statistics at Columbia University)Andrew Trask (Leader at OpenMined, Research Scientist at DeepMind, Ph.D. Student at the University of Oxford)Théo Ryffel (Co-Founder of Arkhn, Ph.D. Student at ENS and INRIA, Leader at OpenMined)Recent Announcements/ArticlesPartnerships with UbiOps and AlgorithmiaThe rise of DataPrepOps (March 2021)From model-centric to data-centric (March 2021)
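One simple way to see how differential privacy can complement synthetic data generation, as discussed around the 33:44 mark, is to fit a noisy marginal and then sample from it. The sketch below is a deliberately minimal, hypothetical Python illustration with made-up data; it is not YData's GAN-based approach:

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_synthetic_column(values, categories, epsilon, n_synth):
    """Fit a Laplace-noised histogram of one categorical column and sample
    synthetic values from it. A histogram over disjoint categories has
    sensitivity 1 per person, so each bin gets Laplace(1/epsilon) noise."""
    counts = np.array([np.sum(values == c) for c in categories], dtype=float)
    noisy = counts + rng.laplace(0.0, 1.0 / epsilon, size=len(categories))
    noisy = np.clip(noisy, 0.0, None)      # no negative counts
    probs = noisy / noisy.sum()
    return rng.choice(categories, size=n_synth, p=probs)

real = rng.choice(["A", "B", "C"], size=5_000, p=[0.6, 0.3, 0.1])
synth = dp_synthetic_column(real, ["A", "B", "C"], epsilon=1.0, n_synth=5_000)
print({c: int(np.sum(synth == c)) for c in ["A", "B", "C"]})
```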
From the most popular emojis on iOS to how busy your favorite store is on Google Maps: how is this information obtained? And what guarantees can we have that users' anonymity is preserved? In this themed Post Mortem episode, Dr. Damien Desfontaines, Senior Software Engineer, Privacy at Google, talks to us about Differential Privacy. With hundreds of millions of daily active users, the tech giants have access to detailed usage data. Beyond improving the user experience, this data, once aggregated, can help address public health problems. After a brief history of anonymization techniques (02'00"), we define differential privacy and its properties (06'50"), then revisit a use case within Google (20'18"), and finally discuss existing implementations (27'58") and the challenges to adopting this technique (34'13"). On Apple Podcasts you should have access to the chapters with links and illustrations. The Randomized Response illustration should be useful! All illustrations are available in the blog post accompanying the episode on the Post Mortem Podcast's Medium https://medium.com/the-post-mortem-podcast Resources: Latanya Sweeney and the re-identification of the medical records of Massachusetts governor William Weld in 1997 (Wikipedia). Damien's blog on Differential Privacy, which is very visual, with many articles accessible to a general audience: https://desfontain.es/privacy/differential-privacy-awesomeness.html (~10 min read). An HTML version of his thesis, Lowering the cost of anonymisation, is also available on his site; the chapters light on math are marked with a flower ✿. For the Apple use case of Differential Privacy cited in the introduction, see the paper from Apple's Differential Privacy Team, "Learning with Privacy at Scale", available here: https://docs-assets.developer.apple.com/ml-research/papers/learning-with-privacy-at-scale.pdf An example of Differential Privacy in use at Google: the Community Mobility Reports, for a view of COVID's impact on people's mobility https://www.google.com/covid19/mobility/ Description of the anonymization process for the Google Community Mobility Reports: "Google COVID-19 Community Mobility Reports: Anonymization Process Description", https://arxiv.org/abs/2004.04145 The paper "Differentially Private SQL with Bounded User Contribution", https://arxiv.org/abs/1909.01917, published by Damien and his team to make Differential Privacy easier for analysts to use by extending SQL's capabilities. Google's open source differential privacy library is available on GitHub: https://github.com/google/differential-privacy Fun Facts: The Fundamental Law of Information Recovery, Cynthia Dwork: "“Overly accurate” estimates of “too many” statistics is blatantly non-private", from the book "The Algorithmic Foundations of Differential Privacy"
In this talk, we explore security and privacy related to meta-learning, a learning paradigm aiming to learn 'cross-task' knowledge instead of 'single-task' knowledge. From the privacy perspective, we conjecture that meta-learning plays an important role in future federated learning and look into federated meta-learning systems with a differential privacy design for task privacy protection. From the security perspective, we explore anomaly detection for machine learning models. In particular, we explore poisoning attacks on machine learning models, in which poisoned training samples are the anomaly. Inspired by the observation that poisoning samples degrade trained models through overfitting, we exploit meta-training to counteract overfitting, thus enhancing model robustness. About the speaker: Yimin Chen is a postdoctoral researcher in the Computer Science department at Virginia Tech. Currently his research mainly focuses on differential privacy, anomaly detection, adversarial examples, and private learning. Before that, he worked on the security and privacy of mobile computing systems during his PhD. He obtained a PhD degree from Arizona State University in 2018, an MPhil degree from the Chinese University of Hong Kong in 2013, and a BS degree from Peking University in 2010.
This week on The Encrypted Economy, my guest is Nigel Smart, a Co-Founder of Unbound Security and Professor at the University of Leuven in Belgium. Nigel's reputation as a cryptography expert is well known, and our conversation delivered one of the most in-depth episodes we've had on the podcast to date. Nigel is a leading researcher on Elliptic-Curve Cryptography, and his current work at Unbound focuses on Multi-Party Computation, but don't be fooled into thinking those two topics are the only areas we covered. Nigel left no cryptographic stone untouched as we dove into everything from the foundations of encryption in transit to the challenges of post-quantum cryptography. This is an episode that anyone involved in the areas of cryptography, privacy, or security absolutely cannot miss. If you liked my conversation with Nigel, be sure to subscribe to The Encrypted Economy for more great episodes covering the future of data protection. If there are other topics that you would like to see more of on the podcast, be sure to reach out and send us some feedback on any of our social media profiles. Topics Covered: Why Unbound Divorces Data; Elliptic Curve Cryptography; Post Quantum Security; Intersecting Nigel's Work With Digital Assets; The Obvious and Non-Obvious Use Cases for Unbound; Securing Data in Use as the Last Mile of Security; Decentralization, Differential Privacy, and More in Unbound's Solutions; The Unbound MPC Labs. Resource List: Nigel's LinkedIn; Nigel's Website; Unbound's Website; My Article on Homomorphic Encryption; CORE Key Management w/ Yehuda Lindell; Boston Wage Gap Study; Elliptic Curve Cryptography; RSA Cryptosystem; Diffie-Hellman Key Exchange; Lattice Based Cryptography; Isogeny Based Cryptography; NIST Post Quantum Project; ENISA Post Quantum Report; WannaCry Ransomware Report; Differential Privacy; Kerberos Authentication; Our Episode with Kurt Rohloff; Unbound MPC Labs.
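For listeners new to the key-exchange ideas referenced in the resource list above, here is a purely illustrative, textbook-sized Diffie-Hellman exchange in Python. The tiny prime and generator are toy values chosen only to show the algebra and are nothing like the parameters (or elliptic-curve groups) used in real systems:

```python
import secrets

# Purely illustrative Diffie-Hellman over a tiny textbook group (p=23, g=5).
# Real deployments use vetted, much larger groups or elliptic curves.
p, g = 23, 5

a = secrets.randbelow(p - 2) + 1   # Alice's private exponent
b = secrets.randbelow(p - 2) + 1   # Bob's private exponent

A = pow(g, a, p)                   # Alice sends A to Bob
B = pow(g, b, p)                   # Bob sends B to Alice

# Each side combines its own secret with the other's public value.
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
print(shared_alice == shared_bob)  # True: both derive the same shared secret
```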
Welcome to the Tech Journal. My name is Mark van Rijmenam, and I am the Digital Speaker. In this episode of the Tech Journal, The Digital Speaker starts by exploring Data Privacy. Here he looks into the EU's investigation into Instagram's handling, or rather mishandling, of data, Facebook being sued over the Cambridge Analytica data scandal for a second time, and Differential Privacy: what it is and how it could potentially increase privacy. After that, he moves on to Antitrust laws and the Tech Giants, having a closer look at how both Amazon and Google are being sued for breaching them. Then he ends by exploring Digital Censorship, and how the censorship debate is very much alive and well. If you're interested in any of these topics, stick around for Mark's breakdown. So, put your feet up, get comfortable and let us start speaking digital. In case you would like to view the video version of the podcast, please head to YouTube or Vimeo. For more information: • The Digital Speaker's website: https://TheDigitalSpeaker.com • Dr Mark van Rijmenam's website: https://vanrijmenam.nl • LinkedIn: https://linkedin.com/in/markvanrijmenam • Twitter: https://twitter.com/vanrijmenam • Vimeo: https://vimeo.com/digitalspeaker • Instagram: https://www.instagram.com/the_digital_speaker/ • Anchor.fm: https://anchor.fm/the-digital-speaker
Most of the avian-themed Privacy Sandbox proposals to date have been about ad targeting, but measurement will also be affected by the planned deprecation of third-party cookies in Chrome. Allyson Dietz, director of product marketing at Neustar, joins eMarketer principal analyst at Insider Intelligence Nicole Perrin to discuss the measurement firm's PeLICAn proposal to the World Wide Web Consortium (W3C) and explain what differential privacy means for ad measurement.
**Privacy-preserving ML with Differential Privacy** Differential privacy is without question one of the most innovative concepts to come around in recent decades, with a variety of different applications even when it comes to Machine Learning. Many organizations are already leveraging this technology to access and make sense of their most sensitive data, but what is it? How does it work? And how can we leverage it the most? To explain this and give us a brief intro to Differential Privacy, I've invited Christos Dimitrakakis, a university professor with numerous publications (more than 1000!!!) in the areas of Machine Learning, Reinforcement Learning, and Privacy. Useful links: Christos Dimitrakakis's list of publications; Differential privacy for Bayesian inference through posterior sampling (authors: Christos Dimitrakakis, Blaine Nelson, Zuhe Zhang, Aikaterini Mitrokotsa, Benjamin IP Rubinstein); Differential privacy use cases; Open-source differential privacy projects; Open-source project for Differential Privacy in SQL databases
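One widely used recipe for privacy-preserving machine learning is differentially private SGD: clip each example's gradient and add noise before updating the model. The sketch below is a minimal numpy illustration of that recipe on made-up data; it is not Christos's posterior-sampling approach, and turning the noise level into a concrete (epsilon, delta) guarantee would additionally require a privacy accountant:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: logistic regression on a synthetic binary problem.
n, d = 2000, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dp_sgd(X, y, epochs=5, lr=0.5, batch=200, clip=1.0, sigma=1.0):
    """Gradient descent with per-example gradient clipping plus Gaussian
    noise, the core recipe of differentially private SGD."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            b = idx[start:start + batch]
            errs = sigmoid(X[b] @ w) - y[b]                 # per-example residuals
            grads = errs[:, None] * X[b]                    # per-example gradients
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip)   # clip each to L2 <= clip
            noise = rng.normal(scale=sigma * clip, size=w.shape)
            w -= lr * (grads.sum(axis=0) + noise) / len(b)
    return w

w = dp_sgd(X, y)
print("train accuracy:", np.mean((sigmoid(X @ w) > 0.5) == y))
```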
In this episode of The Data Privacy Podcast, Tom is joined by Matt Kunkel - the Co-Founder and CEO of LogicGate. Tom and Matt will dive into risks in your organisation - how COVID-19 affects our risk landscape, managing risk within your organisation, and why taking risks isn’t necessarily a bad thing.
Privacy and data protection are not just a job for lawyers or professionals who specialize in privacy - not anymore. Technology plays an important role in ensuring personal data can remain private. Ensuring that personal data is secure but useful requires a level of skill found in data scientists. In this episode of Serious Privacy, Paul Breitbarth and K Royal searched for just such a skilled individual: Katharine Jarmul, the Head of Product at Cape Privacy, and a data scientist. Cape Privacy is a New York-based company assisting others with machine learning, data security and adding value to data. Katharine explains what data science actually is, how to keep data private, useful and valuable at the same time, and how to create synthetic data appropriately. A big question when it comes to powerful technology also revolves around ethics, and how invested individual technologists are in the ethics of privacy. Join us as we discuss these topics and more, such as GPT-3, “this person does not exist,” the work of Cynthia Dwork, and differential privacy vs the generative model. As often happens in an episode, certain topics in privacy are revisited, mainly because they are wicked problems with no identified solution. One such topic Katharine discussed is bias in machine learning and approaches to solving bias once identified. Throughout this episode, we reference quite a few resources, for which we will provide the links - as always. Resources: IAPP article on AI and synthetic data: https://iapp.org/news/a/accelerating-ai-with-synthetic-data/ Federated / Collaborative Learning Introduction: https://federated.withgoogle.com/ Encrypted Learning with TF-Encrypted (can also be used in a collaborative setting where we are sharing data): https://medium.com/dropoutlabs/encrypted-deep-learning-training-and-predictions-with-tf-encrypted-keras-557193284f44 Europe - Ethics guidelines for trustworthy AI https://ec.europa.eu/futurium/en/ai-alliance-consultation Social Media: Twitter: @privacypodcast, @EuroPaulB, @heartofprivacy, @trustarc, @kjam, @capeprivacy Instagram: @seriousprivacy
Simson Garfinkel, Senior Computer Scientist for Confidentiality and Data Access at the US Census Bureau, discusses his work modernizing the Census Bureau disclosure avoidance system from private to public disclosure avoidance techniques using differential privacy. Some of the discussion revolves around the topics in the paper Randomness Concerns When Deploying Differential Privacy. WORKS MENTIONED: “Calibrating Noise to Sensitivity in Private Data Analysis” by Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam Smith https://link.springer.com/chapter/10.1007/11681878_14 "Issues Encountered Deploying Differential Privacy" by Simson L Garfinkel, John M Abowd, and Sarah Powazek https://dl.acm.org/doi/10.1145/3267323.3268949 "Randomness Concerns When Deploying Differential Privacy" by Simson L. Garfinkel and Philip Leclerc https://arxiv.org/abs/2009.03777 Check out: https://simson.net/page/Differential_privacy Thank you to our sponsor, BetterHelp. Professional and confidential in-app counseling for everyone. Save 10% on your first month of services with www.betterhelp.com/dataskeptic
Science and knowledge advance through information gathered, organized, and analyzed. It is only through databases about people that social scientists, public health experts and academics can study matters important to us all. As never before, vast pools of personal data exist in data lakes controlled by Facebook, Google, Amazon, Acxiom, and other companies. Our personal data becomes information held by others. To what extent can we trust those who hold our personal information not to misuse it or share it in a way that we don’t want it shared? And what will lead us to trust our information to be shared for database purposes that could improve the lives of this and future generations, and not for undesirable and harmful purposes? Dr. Cody Buntain, Assistant Professor at the New Jersey Institute of Technology’s College of Computing and an affiliate of New York University’s Center for Social Media and Politics, discusses in this podcast how privacy and academic research intersect. Facebook, Google, and other holders of vast stores of personal information face daunting privacy challenges. They must guard against unintended consequences of sharing data. They generally will not share or sell database access to academic researchers. However, they will consider and approve collaborative agreements with researchers that result in providing academics access to information for study purposes. This access can be designed to limit the ability to identify individuals, through various techniques including encryption, anonymization, pseudonymization, and “noise” (efforts to block users from being able to identify individuals who contributed to a database). “Differential privacy” is an approach to the issues of assuring privacy protection and database access for legitimate purposes. It is described by Wikipedia as “a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.” The concept is based on the point that it is the group’s information that is being measured and analyzed, and any one individual’s particular circumstances are irrelevant to the study. By eliminating the need for access to each individual’s identity, the provider of data through differential privacy seeks to assure data contributors that their privacy is respected, while providing to the researcher a statistically valid sample of a population. Differentially private databases and algorithms are designed to resist attacks aimed at tracing data back to individuals. While not foolproof, these efforts aim to reassure those who contribute their personal information to such sources that their private information will only be used for legitimate study purposes and not to identify them personally and thus risk exposure of information the individuals prefer to keep private. “Data donation” is an alternative. This provides a way for individuals to provide their own data to researchers for analysis. Some success has been achieved by paying persons to provide their data or allowing an entity gathering data for research to collect what it obtains by agreement with a group of persons. Both solutions have their limits of protection, and each can result in selection bias. Someone active in an illicit or unsavory activity will be reluctant to share information with any third party. We leave “data traces” through our daily activity and use of digital technology. Information about us becomes 0’s and 1’s that are beyond erasure.
There can be false positives and negatives. Algorithms can create mismatches, for example, a mistaken report from Twitter and Reddit identifying someone as a Russian disinformation agent. If you have ideas for more interviews or stories, please email info@thedataprivacydetective.com.
US elections are far more than just a struggle for the most powerful job in the world; they also provide a glimpse into consumer attitudes and emerging technologies designed to influence opinion. It was during the 2012 US election, for instance, that social media, online data and e-commerce profiling were leveraged for the first time to create a hyper-targeted, digital political campaign, which ultimately swept the Democrats into power. My guest this week, Harper Reed, was intimately involved in that strategy, having served as CTO of the Obama 2012 campaign, where he was the first to bring the mentality and connective capabilities of the tech industry to the political stage.
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.04.283135v1?rss=1 Authors: Oksuz, A. C., Ayday, E., Gudukbay, U. Abstract: Genome data has been a subject of study for both biology and computer science since the start of the Human Genome Project in 1990. Since then, genome sequencing for medical and social purposes has become more and more available and affordable. Genome data can be shared on public websites or with service providers. However, this sharing compromises the privacy of donors even under partial sharing conditions. We mainly focus on the liability issues that ensue from unauthorized sharing of these genome data. One of the techniques to address the liability issues in data sharing is a watermarking mechanism. To detect malicious correspondents and service providers (SPs) whose aim is to share genome data without individuals' consent and without being detected, we propose a novel watermarking method on sequential genome data using the belief propagation algorithm. In our method, we have two criteria to satisfy: (i) embedding robust watermarks so that malicious adversaries cannot tamper with the watermark through modification and are identified with high probability, and (ii) achieving ε-local differential privacy in all data sharings with SPs. To preserve system robustness against single-SP and collusion attacks, we consider publicly available genomic information like Minor Allele Frequency, Linkage Disequilibrium, Phenotype Information and Familial Information. Our proposed scheme achieves a 100% detection rate against single-SP attacks with only 3% watermark length. For the worst-case scenario of collusion attacks (50% of SPs are malicious), 80% detection is achieved with 5% watermark length and 90% detection is achieved with 10% watermark length. In all cases, ε's impact on precision remained negligible and high privacy is ensured. Copyright belongs to the original authors. Visit the link for more info
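The paper's watermarking scheme itself relies on belief propagation and is beyond a short snippet, but the ε-local differential privacy requirement it mentions can be illustrated with k-ary randomized response applied to genotype values. The following Python sketch uses invented data and is only meant to show what an ε-LDP perturbation of a single value looks like, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(4)

def k_rr(value, k, epsilon):
    """Generalized (k-ary) randomized response: keep the true value with
    probability e^eps / (e^eps + k - 1), otherwise report one of the other
    k - 1 values uniformly at random. Each report satisfies eps-local DP."""
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    others = [v for v in range(k) if v != value]
    return others[rng.integers(len(others))]

# Toy example: SNP genotypes coded as 0/1/2 copies of the minor allele.
genotypes = rng.choice([0, 1, 2], size=20, p=[0.5, 0.4, 0.1])
noisy = [k_rr(int(g), k=3, epsilon=2.0) for g in genotypes]
print(list(genotypes))
print(noisy)
```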
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.03.235416v1?rss=1 Authors: Chen, J., Wang, W. H., Shi, X. Abstract: Machine learning is powerful for modeling massive genomic data, but genome privacy is a growing concern. Studies have shown that not only the raw data but also the trained model can potentially infringe genome privacy. An example is the membership inference attack (MIA), by which the adversary, who only queries a given target model without knowing its internal parameters, can determine whether a specific record was included in the training dataset of the target model. Differential privacy (DP) has been used to defend against MIA with a rigorous privacy guarantee. In this paper, we investigate the vulnerability of machine learning against MIA on genomic data, and evaluate the effectiveness of using DP as a defense mechanism. We consider two widely used machine learning models, namely Lasso and convolutional neural networks (CNN), as the target models. We study the trade-off between the defense power against MIA and the prediction accuracy of the target model under various privacy settings of DP. Our results show that the relationship between the privacy budget and target model accuracy can be modeled as a log-like curve; thus a smaller privacy budget provides a stronger privacy guarantee at the cost of losing more model accuracy. We also investigate the effect of model sparsity on model vulnerability against MIA. Our results demonstrate that, in addition to preventing overfitting, model sparsity can work together with DP to significantly mitigate the risk of MIA. Copyright belongs to the original authors. Visit the link for more info
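A membership inference attack like the one studied above can be sketched with a simple loss-threshold attacker: guess that a record was in the training set whenever the model's loss on it is low. The toy Python example below uses scikit-learn on synthetic data and is only an illustration of the attack idea, not the paper's setup; a barely overfit logistic regression leaks little, so the attack accuracy stays near 0.5, whereas more overfit models leak more:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy setup: a model trained on "members"; "non-members" come from the
# same distribution but were never seen during training.
X, y = make_classification(n_samples=4000, n_features=20, n_informative=5,
                           random_state=0)
X_mem, y_mem = X[:2000], y[:2000]         # training set (members)
X_non, y_non = X[2000:], y[2000:]         # held-out set (non-members)

model = LogisticRegression(max_iter=1000).fit(X_mem, y_mem)

def per_example_loss(model, X, y):
    """Cross-entropy loss of each example under the target model."""
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, 1.0))

loss_mem = per_example_loss(model, X_mem, y_mem)
loss_non = per_example_loss(model, X_non, y_non)

# Simple attack: guess "member" whenever the loss is below a threshold.
threshold = np.median(np.concatenate([loss_mem, loss_non]))
guesses = np.concatenate([loss_mem, loss_non]) < threshold
truth = np.concatenate([np.ones(len(loss_mem)), np.zeros(len(loss_non))])
print("attack accuracy:", np.mean(guesses == truth))  # 0.5 means no leakage
```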
Using data for machine learning and analytics can potentially expose private data. How can we leverage data while ensuring that private information remains private? In this session, we'll discuss how differential privacy can be used to preserve privacy. We'll demonstrate how you can use the newly released open source system, WhiteNoise, to put DP into your applications. Learn More: WhiteNoise Paper; Open Differential Privacy; Azure Machine Learning How To. The AI Show's Favorite links: Don't miss new episodes, subscribe to the AI Show; Create a Free account (Azure); Deep Learning vs. Machine Learning; Get Started with Machine Learning
Learn how we use differential privacy to protect users' data in Windows. Learn More: Azure Blog; Responsible ML; Azure ML. The AI Show's Favorite links: Don't miss new episodes, subscribe to the AI Show; Create a Free account (Azure)
Learn more about the differential privacy research that powers WhiteNoise from Salil Vadhan, leader of the Privacy Tools Project at Harvard University. Learn More: Azure Blog; Responsible ML; Azure ML. The AI Show's Favorite links: Don't miss new episodes, subscribe to the AI Show; Create a Free account (Azure)
The power of finely-grained, individual-level data comes with a drawback: it compromises the privacy of potentially anyone and everyone in the dataset. Even for de-identified datasets, there can be ways to re-identify the records or otherwise figure out sensitive personal information. That problem has motivated the study of differential privacy, a set of techniques and definitions for keeping personal information private when datasets are released or used for study. Differential privacy is getting a big boost this year, as it’s being implemented across the 2020 US Census as a way of protecting the privacy of census respondents while still opening up the dataset for research and policy use. When two important topics come together like this, we can’t help but sit up and pay attention.
Show Notes:(2:18) Leonard discussed his undergraduate experience at Carnegie Mellon - where he studied Biology and Computer Science.(5:10) Leonard decided to pursue a Ph.D. in Bioinformatics at the University of California - San Francisco.(6:27) Leonard described his Ph.D. research that focused on finding hidden patterns in genetically-linked diseases.(9:42) Leonard went deep into clustering algorithms (Markov Clustering and Louvain) and their applications such as protein and news article similarity.(13:21) Leonard shared his story of starting a data science consultancy with various client startups.(17:58) Leonard discussed the interesting consulting projects that he worked on: from detecting plagiarism to predicting bill insurance.(22:04) Leonard shared practical tips to learn technical concepts.(23:23) Leonard reflected on his experience working with a string of startups including Accretive Health, Quid, and Stride Health.(26:06) Leonard is the founding team member of Primer AI, a startup that applies state-of-the-art NLP techniques to build machines that read and write, back in early 2015.(30:31) Leonard discussed the technical challenges to develop algorithms that power Primer’s products to scale across languages other than English.(34:28) Leonard unpacked his technical post "Russian NLP” on Primer’s blog.(38:17) Leonard talked about the advances in the NLP research domain that he is most excited about in 2020 (XLNet >>> BERT).(41:10) Leonard discussed the challenges of scaling the data-driven culture across Primer AI as the company grows.(46:20) Leonard mentioned different use cases of Primer for clients in finance, government, and corporate.(51:41) Leonard talked about his decision to leave Primer and become a Data Science Health Innovation Fellow at the Berkeley Institute for Data Science.(54:30) Leonard went over applications of data science in healthcare that will be adopted widely in the next few years.(1:02:45) Leonard discussed his process of writing a book called “Data Science Bookcamp.”(1:07:21) Leonard revealed how he chose the case studies to be included in the book.(1:10:27) Closing segment.His Contact Info:LinkedInGoogle ScholarBerkeley Institute For Data ScienceHis Recommended Resources:Semi-Supervised LearningAssociation Rule LearningspaCy (Open-Source Library for Advanced NLP)fastText (NLP library from Facebook)XLNet: Generalized Autoregressive Pretraining for Language UnderstandingBERT: Pretraining of Deep Bidirectional Transformers for Language UnderstandingFederated Learning with Differential Privacy: Algorithms and Performance AnalysisDifferential Privacy- Enabled Federated Learning for Sensitive Health DataOasis Labs and Dr. Dawn SongFitbit and Apple WatchWalter Pitts who invented neural networksPaul Werbos who invented back-propagationFei-Fei Li who constructed the ImageNet dataset“The Signal and The Noise” by Nate SilverYou can read the completed chapters of "Data Science Bookcamp" using the codes below:Permanent discount code: poddcast195 free eBook codes: dcdsprf-B373, dcdsprf-CA3B, dcdsprf-299E, dcdsprf-6E5, and dcdsprf-9660 (activated and will last for 2 months)
The U.S. Census, the once-a-decade count of everyone in the country, starts this month. Coming right up is Census Day, April 1, by which time everyone should have received a notification to fill out the census. When you respond, you tell the Census Bureau where you live on April 1. To discuss the stakes in the census—everything from federal money to redistricting—we check in with Wendy Underhill, NCSL’s program director for Elections and Redistricting. Later in the show, we talk with Kathleen Styles, chief of decennial communications and stakeholder relations at the U.S. Census Bureau. Resources: Differential Privacy for Census Data Explained; 2020 Census Resources and Legislation; 2020 Census Talking Points (for Legislators and Others); Transcription of OAS Episode 85
Differential privacy is a geeky technique designed to protect large consumer data sets. This week on The Big Story, we talk about what it means for advertising’s future. Also, we examine how the local TV market, stricken with declining ratings and fragmented consumption patterns, is embracing automation.
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today we’re joined by Ryan Rogers, Senior Software Engineer at LinkedIn. We caught up with Ryan at NeurIPS, where he presented the paper “Practical Differentially Private Top-k Selection with Pay-what-you-get Composition” as a spotlight talk. In our conversation, we discuss how LinkedIn allows its data scientists to access aggregate user data for exploratory analytics while maintaining its users’ privacy with differential privacy, and the major components of the paper. We also talk through one of the big innovations in the paper, which is discovering the connection between a common algorithm for implementing differential privacy, the exponential mechanism, and Gumbel noise, which is commonly used in machine learning. The complete show notes for this episode can be found at twimlai.com/talk/346.
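The connection Ryan describes can be illustrated in a few lines. Below is a minimal Python sketch, not LinkedIn's implementation or the paper's exact pay-what-you-get algorithm: adding Gumbel noise to per-item utility scores and keeping the largest noisy scores reproduces exponential-mechanism-style selection. The function name, the example counts, and the parameter values are all illustrative assumptions.

```python
import numpy as np

def exponential_mechanism_via_gumbel(scores, sensitivity, epsilon, k=1, rng=None):
    """Pick the top-k items by adding Gumbel noise to each score.

    Adding Gumbel noise with scale 2 * sensitivity / epsilon and taking the
    argmax is equivalent to sampling an item with probability proportional to
    exp(epsilon * score / (2 * sensitivity)), i.e. the standard exponential
    mechanism; taking the k largest noisy scores gives a top-k variant.
    Toy sketch for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 * sensitivity / epsilon
    noisy = np.asarray(scores, dtype=float) + rng.gumbel(0.0, scale, size=len(scores))
    return np.argsort(noisy)[::-1][:k]

# Example: privately select the 3 most-viewed items from a vector of counts,
# where adding or removing one user changes any count by at most 1.
counts = [120, 97, 450, 88, 310, 301]
print(exponential_mechanism_via_gumbel(counts, sensitivity=1, epsilon=1.0, k=3))
```

The noise scale mirrors the usual exponential-mechanism calibration; with more noise (smaller epsilon) the reported top-k drifts further from the true ranking.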
In this episode, we have Dr. Mihai Maruseac give us a different perspective about Differential Privacy. His perspective comes from Computer Science instead of Statistics. Like all of our episodes, we jump around from privacy to other topics to Boston and back. Want to share your thoughts? Email us at antonio@datascienceimposters.com or jordy@datascienceimposters.com
Over the course of a generation, algorithms have gone from mathematical abstractions to powerful mediators of daily life. Algorithms have made our lives more efficient, but some experts contend that they are increasingly violating the basic rights of individual citizens. Professors Michael Kearns and Aaron Roth delved into the complexities of this topic with insight from their book The Ethical Algorithm. Joined by moderator Eric Horvitz of Microsoft Research Labs, Kearns and Roth contended that understanding and improving the science behind the algorithms that run our lives is rapidly becoming one of the most pressing issues of this century, yet traditional fixes such as laws, regulations, and watchdog groups have proven woefully inadequate. With reporting from the cutting edge of scientific research, Kearns and Roth offered a set of principled solutions based on the emerging and exciting science of socially aware algorithm design. They explored the future of responsible algorithm design, with strategies for combating data leaks and eliminating models with racial and gender biases. Kearns, Roth, and Horvitz invited us to a conversation about how we can better embed human principles into machine code—without halting the advance of data-driven scientific exploration. Michael Kearns is Professor and the National Center Chair in the Computer and Information Science department of the University of Pennsylvania. He is also the Founding Director of Penn’s Warren Center for Network and Data Sciences. Together with U.V. Vazirani, he is the author of An Introduction to Computational Learning Theory. Aaron Roth is the Class of 1940 Bicentennial Term Associate Professor in the Computer and Information Science department at the University of Pennsylvania, where he co-directs Penn’s program in Networked and Social Systems Engineering. Together with Cynthia Dwork, he is the author of The Algorithmic Foundations of Differential Privacy. Eric Horvitz is Technical Fellow and Director at Microsoft Research Labs. He pursues research on principles of machine intelligence and on leveraging the complementarities of human and machine reasoning. He chairs the Aether Committee, Microsoft’s advisory board on the responsible fielding of AI technologies. He co-founded the Partnership on AI, a multiparty stakeholder organization that brings together leading tech companies and civil society groups on best practices with uses of AI in the open world. Presented by Town Hall Seattle. Recorded live in The Forum on November 11, 2019.
In this episode, Jon Prial continues his conversation about differential privacy with Yevgeniy Vahlis, Georgian Partners' Director of Security First. Find out more about how differential privacy works as Yevgeniy explains it using simple, every day examples. He then goes on to describe why differential privacy isn't just for the likes of Google and Apple, but rather something that most companies should be taking a close look at.
Apple made headlines in 2016 when it started talking about differential privacy. But what exactly is it? And what opportunities can it create for your business to aggregate and share your customers' data to get better results without compromising their privacy? In this episode, Jon Prial talks with Yevgeniy Vahlis, Georgian Partners' Director of Security First, to get a primer on differential privacy and understand what it's all about.
Differential privacy is a technology that's quickly moving from academia into business. And it's not just the big companies that are using it. With the intersection of trust and AI a hot topic right now, differential privacy is well on its way to becoming an integral part of the conversation. In this episode, Jon Prial welcomes Chang Liu to the show. Chang is an Applied Research Scientist at Georgian Partners and an expert on differential privacy. In the episode, they talk about what differential privacy is, why it's important, and why every company needs to think about having it as part of its product and technical strategy. You'll hear about: - The limits of data anonymization - What differential privacy is and why it's so important - Ways to protect your data and be differentially private - What epsilon is and what its role is - Differential privacy's potential to solve the cold-start problem - Implications for trust To learn more about this episode, check out the show notes: http://bit.ly/2MTjjLR
One of the big challenges that many software companies face is how to overcome the cold-start problem. That's when a software company needs customer data to optimize its AI solutions or to even just get them to work. That can be tricky because depending on the predictive model their solution uses, it could take weeks or even months to collect enough data from any new customer they onboard. In this episode, Jon Prial talks with Mahmoud Arram, the Co-founder and CTO of Bluecore, the leading retail marketing platform specializing in email. Find out how the Georgian Impact team partnered with Bluecore and used differential privacy to help it solve its cold-start problem. You'll Hear About: - The cold-start problem that many SaaS companies face - How Bluecore used differential privacy to eliminate the problem - The benefits using differential privacy brought to Bluecore and its customers - Bluecore's partnership with the Georgian Impact team and the results they got working together Access the show notes here: http://bit.ly/2HUObpl
How private is that survey data? Differential Privacy allows us to quantify this and other questions about data and its privacy. On this episode, we are joined by Dr. Claire McKay Bowen. She is currently a postdoctoral researcher in the Statistical Sciences Group at the Los Alamos National Laboratory studying methods of data privacy, specifically ...
Data privacy. There's a lot of misinformation and overreaction when it comes to data privacy, but that's in large part because there's a real lack of data privacy. People are rightly concerned. In this episode Tom describes, very simply and generally, what differential privacy is and what you need to know about how it's used.

On DTNS we try to balance the idea that companies definitely need to improve data protection with the idea that sharing data at all isn't a bad thing; in fact, when done right it can be a very good thing. Not just companies but academic research and nonprofits benefit from research on datasets. However, just taking data, even when names are stripped off, can lead to trouble. As far back as 2000, researchers were showing that the right analysis of raw data sets could deduce who people were even when the data was anonymized. In 2000, Latanya Sweeney showed that 87% of people in the US could be identified from ZIP code, birthdate and sex. https://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/

One attempt to make data workable is called differential privacy. Apple mentioned the use of differential privacy in its 2016 WWDC keynote. https://www.theverge.com/2016/6/17/11957782/apple-differential-privacy-ios-10-wwdc-2016

What is differential privacy? An algorithm is differentially private if you can't tell who anybody is by looking at the output. Here's a simple example. Let's say you want to publish the aggregate sales data of businesses by category. Stores want to keep sales data private, so you agree that only the total sales for a category will be published. That way you can't tell how much came from which businesses. Which is great until you come to the category of shark repellent sales. There's only one shark repellent business in your region. If you publish that category you won't be saying the name of the business, but it will be easy to tell who it is.

So you have an algorithm that looks for categories where that's a problem, and maybe it deletes them or maybe it folds them into another category. This can get trickier if, say, there's a total sales number for the region and only one category was deleted. You just add up all the published categories, subtract that from the published total, and the difference is the missing business.

And remember, there's other data out there to use. Some attacks on data use data from elsewhere to deduce identities. Let's say you study how people walk through a park and you discover that of 100 people observed, 40 walk on the path and 60 cut through the grass. Seems private enough, right? There's no leakage of data in the published results. But an adversary discovers the names of the people who participated in the study. And they want to find out if Bob walks on the grass so they can embarrass him. They also found out that of the 99 people in the study who weren't Bob, 40 walked the path and 59 walked on the grass. BINGO! Bob is a grass walker. Now, it sounds unrealistic that the adversary got that much info without just getting all of it, but differential privacy would protect Bob's identity even if the adversary had all that info.

So what do we do? How do we do this differential privacy thing? In 2003, Kobbi Nissim and Irit Dinur demonstrated that, mathematically speaking, you can't publish arbitrary queries of a database without revealing some amount of private info. Thus the Fundamental Law of Information Recovery, which says that privacy cannot be protected without injecting noise.
In 2006 Cynthia Dwork, Frank McSherry, Kobbi Nissim and Adam D. Smith published an article formalizing the amount of noise that needed to be added and how to do it. That work used the term differential privacy. A little bit on what that...
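The mechanism that 2006 paper introduced is usually called the Laplace mechanism, and its core fits in a couple of lines. The Python below is a toy sketch under simple assumptions: a counting query whose sensitivity is 1 (one person added or removed changes the count by at most 1), with the function name and epsilon value chosen for illustration.

```python
import numpy as np

def laplace_count(true_count, epsilon, rng=None):
    """Release a count with Laplace noise calibrated to its sensitivity.

    A counting query has sensitivity 1, so adding Laplace noise with scale
    sensitivity / epsilon gives epsilon-differential privacy -- the noise
    calibration formalized by Dwork, McSherry, Nissim, and Smith.
    Illustrative sketch only.
    """
    rng = np.random.default_rng() if rng is None else rng
    sensitivity = 1.0
    return true_count + rng.laplace(0.0, sensitivity / epsilon)

# Instead of suppressing a small category (like the lone shark repellent
# business above), publish a noisy value so no individual contribution can
# be pinned down exactly.
print(laplace_count(true_count=42, epsilon=0.5))
```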
The field of privacy in machine learning is becoming increasingly important. With legislation like GDPR, it is becoming necessary for us, data scientists, to be mindful about privacy concerns related to the applications we develop. In this episode we interview Ran Gilad Bachrach, a researcher at Microsoft Research, who tells us about privacy in machine learning. We'll talk about differential privacy, about homomorphic encryption and how it enables training models on encrypted data, and about secure multi-party computation - a field whose goal is to help different parties train models together, even when they can't share their data with one another. This episode is sponsored by Sisense. They're hiring data scientists! Find out more at https://www.sisense.com/careers/
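To make the secure multi-party computation idea concrete, here is a minimal Python sketch of additive secret sharing, one of its basic building blocks; it is not the protocol discussed in the episode, and the hospital scenario, names, and prime modulus are illustrative assumptions.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(secret, n_parties, rng=random):
    """Split a secret integer into n additive shares that sum to it mod PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares into the original value (or sum of values)."""
    return sum(shares) % PRIME

# Three hospitals compute the total number of cases without revealing their
# individual counts: each hospital splits its count into shares, each party
# locally sums the shares it holds, and only the overall sum is reconstructed.
counts = [120, 340, 95]
all_shares = [share(c, 3) for c in counts]
partial_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(3)]
print(reconstruct(partial_sums))  # 555, with no party seeing another's raw count
```

Any single share (or partial sum) looks like a uniformly random number, which is what lets parties cooperate without exposing their inputs.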
Neural Implant podcast - the people behind Brain-Machine Interface revolutions
Andrew Trask is currently a Ph.D. student at the University of Oxford, working on anonymizing data, and the author of Grokking Deep Learning. He discusses deep and machine learning and their possible benefits to society. Trask also discusses privacy-securing techniques that would further benefit the field. Finally, Trask discusses his connection with OpenMined, an open-source community that uses machine and deep learning to overcome the barriers to adoption. Top Three Takeaways: Machine Learning is a set of algorithms that allows for a system to learn while deep learning is a subset of Machine Learning techniques that are inspired by the human brain. Research in Deep Learning and Machine Learning aims to bring down sample complexity; more data is always better but cannot always be managed. Different types of privacy-securing techniques are used to further propel the fields of Machine and Deep Learning. Show Notes: [0:00] Ladan introduces Andrew Trask who will discuss deep learning. He is currently pursuing his Ph.D. at the University of Oxford on anonymizing data. [1:10] Ladan mentions how Trask wrote his book Grokking Deep Learning. The book seeks to teach the fundamentals of deep learning; the term “grokking” comes from the idea of an innate understanding. [2:25] The book fills the void for an intuitive guide in the subject of deep learning. [3:20] Trask was not a Ph.D. student when he started writing the book; he found an implementation of a deep neural network and removed as much unnecessary information as possible. [5:55] Machine Learning is a set of algorithms that allows for a system to learn while deep learning is a subset of Machine Learning techniques that are inspired by the human brain. [6:30] A deep learning parametric algorithm would construct a hierarchical view of the world by recognizing lines and edges; the second part of the algorithm would take this information to form shapes, textures, and shadows. [8:20] Machine Learning includes Deep Learning and other learning techniques as subsets. [10:00] Trask disagrees with those who describe all forms of Machine Learning and Deep Learning as artificial intelligence; Machine and Deep Learning focus on finding patterns. [11:50] Sample complexity relates to how many data points an algorithm needs to learn a pattern. [12:40] Research in Deep Learning and Machine Learning aims to bring down sample complexity; more data is always better but cannot always be managed. [14:00] There is much more unlabeled data in any field than labeled data; large amounts of labeled data are preferable. [16:00] Hospitals are not willing to share useful information for the development of algorithms in safe ways. [16:30] Research in this field concerns sharing private and intelligent information in a secure way. [17:40] An example of useful Deep Learning would be to find trends associated with aging in the brain that could lead to the reversal of its effects. [19:00] Federated learning is a new tool that replaces approaching different data providers with sending statistical models into someone's organization that only reveals the results needed. [21:30] Differential Privacy is a set of formal proofs showing that a statistic leaving an organization contains no private data. [22:40] Patterns that are not unique to someone should not be considered private information. [24:00] For example, brain cells firing in response to certain information that is generalizable would not be considered personal information.
[25:15] Secure Multiparty Computation addresses how the statistical model used for Machine Learning can itself be put at risk. [27:50] AI models and data sets are just large collections of numbers. [28:30] To learn more, study a deep learning framework; fast.ai is a very helpful website for this. [29:00] OpenMined serves as an open-source community that utilizes privacy-securing technologies to lower the barriers to adoption.
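The federated-learning-plus-privacy idea mentioned at [19:00] and [21:30] can be sketched roughly as follows. This is an illustrative Python toy, not OpenMined's code: client updates are clipped to a maximum norm and Gaussian noise is added to their average, so no single client's contribution dominates the released model update. The function and parameter names are invented, and the privacy-budget accounting for the Gaussian noise is deliberately omitted.

```python
import numpy as np

def dp_federated_average(client_updates, clip_norm, noise_multiplier, rng=None):
    """Aggregate client model updates with clipping and Gaussian noise.

    Each client's update is clipped to an L2 norm of at most clip_norm, the
    clipped updates are averaged, and Gaussian noise scaled to the clipping
    bound is added to the average. Toy sketch only; a real system would also
    track the epsilon spent by the Gaussian mechanism.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = []
    for u in client_updates:
        u = np.asarray(u, dtype=float)
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(client_updates)
    return avg + rng.normal(0.0, sigma, size=avg.shape)

# Three clients send local model updates; only the noisy average leaves them.
updates = [np.array([0.2, -0.1]), np.array([0.5, 0.4]), np.array([-0.3, 0.1])]
print(dp_federated_average(updates, clip_norm=1.0, noise_multiplier=1.0))
```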
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today we’re joined by Andrew Trask, PhD student at the University of Oxford and Leader of the OpenMined Project. OpenMined is an open-source community focused on researching, developing, and promoting tools for secure, privacy-preserving, value-aligned artificial intelligence. Andrew and I caught up back at NeurIPS to dig into why OpenMined is important and explore some of the basic research and technologies supporting Private, Decentralized Data Science. We touch on ideas such as Differential Privacy, and Secure Multi-Party Computation, and how these ideas come into play in, for example, federated learning. Thanks to Pegasystems for sponsoring today's show! I'd like to invite you to join me at PegaWorld, the company’s annual digital transformation conference, which takes place this June in Las Vegas. To learn more about the conference or to register, visit pegaworld.com and use TWIML19 in the promo code field when you get there for $200 off. The complete show notes for this episode can be found at https://twimlai.com/talk/241.
This week: The New Horizons spacecraft took pictures of an object in the Kuiper belt; a study that brings up questions about how to define death; there’s a major upcoming scientific study that the US conducts every 10 years: the US census; and a look into the pricing and access to scientific journals.
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
In this episode, our final episode in the Differential Privacy series, I speak with Chang Liu, applied research scientist at Georgian Partners, a venture capital firm that invests in growth stage business software companies in the US and Canada. Chang joined me to discuss Georgian’s new offering, Epsilon, a software product that embodies the research, development, and lessons learned from helping their portfolio companies deliver differentially private machine learning solutions to their customers. In our conversation, Chang discusses some of the projects that led to the creation of Epsilon, including differentially private machine learning projects at Bluecore, WorkFusion and Integrate.ai. We explore some of the unique challenges of productizing differentially private ML, including business, people and technology issues. Finally, Chang provides some great pointers for those who’d like to further explore this field. The notes for this show can be found at twimlai.com/talk/135
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
In this episode of our Differential Privacy series, I'm joined by Nicolas Papernot, Google PhD Fellow in Security and graduate student in the department of computer science at Penn State University. Nicolas and I continue this week’s look into differential privacy with a discussion of his recent paper, Semi-supervised Knowledge Transfer for Deep Learning From Private Training Data. In our conversation, Nicolas describes the Private Aggregation of Teacher Ensembles model proposed in this paper, and how it ensures differential privacy in a scalable manner that can be applied to Deep Neural Networks. We also explore one of the interesting side effects of applying differential privacy to machine learning, namely that it inherently resists overfitting, leading to more generalized models. The notes for this show can be found at twimlai.com/talk/134.
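A rough sketch of the aggregation step Nicolas describes: each teacher, trained on a disjoint slice of the private data, votes for a label; Laplace noise is added to the vote counts; and the noisy winner is released as the label the student learns from. The Python below is a simplified illustration of that step only (the student trained on public data is left out), and the function and parameter names are assumptions, not the paper's notation.

```python
import numpy as np

def pate_noisy_vote(teacher_predictions, num_classes, gamma, rng=None):
    """Aggregate teacher votes with Laplace noise, in the spirit of PATE.

    Each teacher is trained on a disjoint partition of the private data and
    votes for a class; Laplace noise with scale 1/gamma is added to the vote
    counts before taking the argmax, so the released label reveals little
    about any single training example. Simplified sketch only.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.bincount(teacher_predictions, minlength=num_classes).astype(float)
    counts += rng.laplace(0.0, 1.0 / gamma, size=num_classes)
    return int(np.argmax(counts))

# 250 teachers vote on the label of one unlabeled public example.
votes = np.random.default_rng(0).integers(0, 10, size=250)
print(pate_noisy_vote(votes, num_classes=10, gamma=0.05))
```

When the teachers strongly agree, the noise rarely changes the winning label, which is also why the aggregated labels generalize well.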
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
In this episode of our Differential Privacy series, I'm joined by Zahi Karam, Director of Data Science at Bluecore, whose retail marketing platform specializes in personalized email marketing. I sat down with Zahi at the Georgian Partners portfolio conference last year, where he gave me my initial exposure to the field of differential privacy, ultimately leading to this series. Zahi shared his insights into how differential privacy can be deployed in the real world and some of the technical and cultural challenges to doing so. We discuss the Bluecore use case in depth, including why and for whom they build differentially private machine learning models. The notes for this show can be found at twimlai.com/talk/133
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
In the first episode of our Differential Privacy series, I'm joined by Aaron Roth, associate professor of computer science and information science at the University of Pennsylvania. Aaron is first and foremost a theoretician, and our conversation starts with him helping us understand the context and theory behind differential privacy, a research area he was fortunate to begin pursuing at its inception. We explore the application of differential privacy to machine learning systems, including the costs and challenges of doing so. Aaron discusses as well quite a few examples of differential privacy in action, including work being done at Google, Apple and the US Census Bureau, along with some of the major research directions currently being explored in the field. The notes for this show can be found at twimlai.com/talk/132.
Daniel Winograd-Cort, University of Pennsylvania, USA, gives the first presentation in the third panel, Applications, in the ICFP 2017 conference. Co-written by Andreas Haeberlen and Aaron Roth, University of Pennsylvania, USA. Differential privacy is a widely studied theory for analyzing sensitive data with a strong privacy guarantee--any change in an individual's data can have only a small statistical effect on the result--and a growing number of programming languages now support differentially private data analysis. A common shortcoming of these languages is poor support for adaptivity. In practice, a data analyst rarely wants to run just one function over a sensitive database, nor even a predetermined sequence of functions with fixed privacy parameters; rather, she wants to engage in an interaction where, at each step, both the choice of the next function and its privacy parameters are informed by the results of prior functions. Existing languages support this scenario using a simple composition theorem, which often gives rather loose bounds on the actual privacy cost of composite functions, substantially reducing how much computation can be performed within a given privacy budget. The theory of differential privacy includes other theorems with much better bounds, but these have not yet been incorporated into programming languages. We propose a novel framework for adaptive composition that is elegant, practical, and implementable. It consists of a reformulation based on typed functional programming of the privacy filters of Rogers et al. (2016), together with a concrete realization of this framework in the design and implementation of a new language, called Adaptive Fuzz. Adaptive Fuzz transplants the core static type system of Fuzz to the adaptive setting by wrapping the Fuzz typechecker and runtime system in an outer adaptive layer, allowing Fuzz programs to be conveniently constructed and type-checked on the fly. We describe an interpreter for Adaptive Fuzz and report results from two case studies demonstrating its effectiveness for implementing common statistical algorithms over real data sets.
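For contrast with the adaptive filters the talk builds on, here is what the naive accounting under the simple composition theorem looks like: the epsilons of the queries run so far simply add up, and a query is refused once it would exceed the budget. This Python sketch is purely illustrative (it is not part of Adaptive Fuzz, and the class and method names are invented), but it shows why loose composition limits how much adaptive analysis fits in a budget.

```python
class BasicCompositionFilter:
    """Track a global privacy budget under simple (additive) composition."""

    def __init__(self, epsilon_budget):
        self.budget = epsilon_budget
        self.spent = 0.0

    def run(self, query, epsilon):
        """Run a query with the given epsilon, or refuse if the budget is gone."""
        if self.spent + epsilon > self.budget:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return query(epsilon)  # the analyst chooses each query adaptively

# Usage: the analyst picks each query (and its epsilon) after seeing prior results.
f = BasicCompositionFilter(epsilon_budget=1.0)
result1 = f.run(lambda eps: f"noisy count with eps={eps}", 0.4)
result2 = f.run(lambda eps: f"noisy mean with eps={eps}", 0.4)
# A third 0.4-epsilon query would be refused; tighter adaptive composition
# theorems let more queries through for the same overall guarantee.
```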
This week I reflect on whether or not to install the public betas of any program or operating system, and give a very brief review of what I've liked most about #iOS10 in the first version of its public beta. I also talk about #privacy at #Apple and the #DifferentialPrivacy concept the company proposes to balance privacy with data collection. If you're interested in learning more, this article on Differential Privacy may be very interesting: http://blog.cryptographyengineering.com/2016/06/what-is-differential-privacy.html As always, I look forward to your comments: Twitter: @velvor Facebook: http://facebook.com/technovertpodcast mail: velvor@technovert.com.mx Sponsors: PDF Expert for Mac: https://pdfexpert.com Javier Cristobal's Ulysses course: https://gumroad.com/l/curso-ulysses/30nanowrimo2015
What is Machine Learning? How are companies & developers using it? We discuss that, the major approaches in the market & Apple’s use of Differential Privacy. Plus Mike’s new Linux desktop, some feedback & a lot more!
- Differential Privacy in iOS 10 to be Opt-In
- iTunes Stores Hit By End of Week Outage
- Talk of 5k Apple Display with Integrated GPU Resurfaces
- Apple Sued Over Voice over IP IP
- Wanna Fight Human Trafficking? There’s an App for That!
- Four-Dollar Smartphone Hits Indian Market This Week
- Apple Takes Pride; Gives Rainbow Watch Bands
Apple wants to study iPhone users' activities and use that data to improve performance. Google collects data on what people are doing online to try to improve their Chrome browser. Do you like the idea of this data being collected? Maybe not, if it's being collected on you--but you probably also realize that there is some benefit to be had from the improved iPhones and web browsers. Differential privacy is a set of policies that walks the line between individual privacy and better data, including even some old-school tricks that scientists use to get people to answer embarrassing questions honestly. Relevant links: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42852.pdf
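One of those old-school tricks is randomized response, which happens to satisfy differential privacy. The Python below is a small illustrative sketch (the 30% "yes" rate and function names are made up): each respondent flips a coin and sometimes answers at random, yet the analyst can still recover the population rate by inverting the known bias.

```python
import random

def randomized_response(truth, rng=random):
    """Answer a sensitive yes/no question with plausible deniability.

    Flip a coin: on heads answer truthfully; on tails flip again and answer
    yes or no at random. Any individual answer could be a coin flip, so no
    single respondent's true answer is revealed.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5

def estimate_true_rate(answers):
    """Invert the coin-flip bias: P(yes) = 0.25 + 0.5 * true_rate."""
    observed = sum(answers) / len(answers)
    return (observed - 0.25) / 0.5

population = [True] * 300 + [False] * 700        # true "yes" rate of 30%
answers = [randomized_response(t) for t in population]
print(round(estimate_true_rate(answers), 2))     # close to 0.30
```

Each person can plausibly claim their "yes" was just the coin, while the aggregate statistic stays useful, which is exactly the trade-off the episode describes.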
We returned to our usual format this week, kicking things off with a News Roundup, and including a Question of the Week and Weekly Pick. Our News Roundup covered three topics: Instagram's release of user numbers, including 500 million monthly active users and 300 million daily active users; Tencent's acquisition of a majority stake in Supercell for $8.6 billion; and BlackBerry's earnings. Our Question of the Week is: "How can differential privacy make my life better?" Aaron did a bunch of homework on this concept, which Apple introduced to many of us at WWDC last week but has actually been around for much longer. He tells us what differential privacy is, what some of the real-world applications are, and the benefits and limitations of this approach. We've included some links to some of Aaron's reading material in the show notes. Lastly, we discussed some other topics relating to WWDC which we didn't get to last time or which have emerged since last week's episodes, including the reviews of the macOS Sierra release which came out this week, and reports from the Wall Street Journal that Apple's next iPhone will largely stick to the iPhone 6 and 6s form factor while ditching the 3.5mm headphone jack. We wrapped up the episode with our Weekly Pick, this week a recommendation from Jan. As usual, you'll find links to related material on the website at podcast.beyonddevic.es.
Secure multiparty computation (MPC) and differential privacy are two notions of privacy that deal respectively with how and what functions can be privately computed. In this talk, I will first give an overview of MPC and differential privacy. Then, I will show how to build a two-party differentially private secure protocol in the presence of semi-honest and malicious adversaries. Computing a differentially private function using secure function evaluation prevents private information leakage both in the process, and from information present in the function output. However, the very secrecy provided by secure function evaluation poses new challenges if any of the parties are malicious. We then relax the utility requirement of computational differential privacy to reduce computational cost, still giving security with rational adversaries. Finally, we provide a modified two-party computational differential privacy definition and show correctness and security guarantees in the rational setting. About the speaker: Balamurugan Anandan is a PhD candidate in Computer Science at Purdue University and works with Prof. Chris Clifton. He received his bachelor's degree in computer science from Kongu Engineering College, India in 2005 and MS in computer science from Purdue University in 2013. His research interests are in the intersection of data mining and privacy, specifically focusing on developing privacy-preserving protocols.
Differential privacy is a very powerful approach to protecting individual privacy in data-mining; it's also an approach that hasn't seen much application outside academic circles. There's a reason for this: many people aren't quite certain how it works. Uncertainty poses a serious problem when considering the public release of sensitive data. Intuitively, differentially private data-mining applications protect individuals by injecting noise which "covers up" the impact any individual can have on the query results. In this talk, I will discuss the concrete details of how this is accomplished, exactly what it does and does not guarantee, common mistakes and misconceptions, and give a brief overview of useful differentially privatized data-mining techniques. This talk will be accessible to researchers from all domains; no previous background in statistics or probability theory is assumed. My goal in this presentation is to offer a short-cut to researchers who would like to apply differential privacy to their work and thus enable a broader adoption of this powerful tool. About the speaker: Christine Task is a PhD candidate in the Computer Science department of Purdue University, and a member of CERIAS. She has five years of experience teaching discrete math and computability theory at the undergraduate level. Her research interests are in differential privacy and its application to social network analysis, and her research advisor is CERIAS fellow Chris Clifton.