Counting Sand

Counting Sand is the podcast that tackles the hard problem of how to make meaning of all the data available today. Exploring themes at the heart of big data, high performance, and computer science, the show highlights the most cutting-edge applications. Whether discussing the best designs for a complex data system or the social implications of bringing a diverse skill set to data science, each new episode provides research-backed perspectives on today's hardest problems.

Angelo Kastroulis


    • Latest episode: Jul 26, 2023
    • New episodes: infrequent
    • Average duration: 28m
    • Episodes: 25

    Ivy Insights

    The Counting Sand podcast is an exceptional resource for anyone interested in software development and the intricate world of problem-solving. As a software developer myself, I found this podcast to be incredibly thought-provoking and a valuable addition to my knowledge base. Angelo, the host, has a remarkable talent for breaking down complex topics into easy-to-understand concepts while keeping the content interesting and engaging. I have already finished the entire series and plan to revisit it multiple times during my software journey.

    One of the best aspects of The Counting Sand podcast is Angelo's ability to take historical stories and draw relevant applications to modern computationally difficult problems. By providing historical context, he creates a fascinating narrative that captures your attention from start to finish. This approach not only makes the episodes more intriguing but also helps listeners make connections between past events and present-day challenges in software development.

    Another impressive aspect of this podcast is how Angelo simplifies challenging concepts without sacrificing depth or accuracy. He has a unique talent for explaining complex algorithms, problem-solving techniques, and other technical subjects in a way that anyone can understand. This accessibility allows listeners from various backgrounds, whether they are beginners or seasoned professionals, to grasp the information presented.

    Although The Counting Sand podcast is near flawless in its delivery and content, one minor downside could be the pace at which new episodes are released. As an avid listener hungry for more thought-provoking content, it can be slightly disappointing when there are long gaps between episodes. However, it's important to remember that quality takes time, and Angelo's meticulous approach ensures that each episode is well-researched and carefully crafted.

    In conclusion, The Counting Sand podcast is a must-listen for anyone involved in software development or those who simply enjoy exploring intellectually stimulating content. Angelo's ability to simplify complex topics while keeping them engaging is truly remarkable. Whether you're new to the field or an experienced developer looking for fresh perspectives on problem-solving, this podcast is sure to leave you inspired and eager for more. I highly recommend subscribing to Counting Sand for a thought-provoking journey through the fascinating world of software development.




    Latest episodes from Counting Sand

    AI Hot Sauce Taste Test Challenge

    Jul 26, 2023 · 30:22


    Key Topics
    • AI-optimized vs Commercially Available Hot Sauce: Angelo and Petter perform a blind taste test with three different hot sauces, one of which is AI-optimized, to see if they can determine which one is created by AI.
    • Background of the AI Hot Sauce Creators: A brief insight into the story of Shekeib and Shohaib, the two brothers who combined their passion for data science and business to create an AI-optimized hot sauce.
    • Understanding Bayesian Optimization: A comprehensive discussion of Bayesian optimization, a technique that uses previous knowledge to influence future decisions, perfect for creating unique hot sauce recipes.
    • Discussion of Other Optimization Techniques: Petter invites Angelo to delve into the different types of optimization algorithms and their pros and cons.
    • Understanding Gradient Descent: Angelo gives a brief introduction to the concept of gradient descent, a popular optimization algorithm, explaining it as akin to finding a valley when on a mountain (a minimal sketch follows below).

    Recommendations
    • Check out the previous episode interviewing the creators of the AI-optimized hot sauce to understand their process better.
    • For tech enthusiasts interested in AI and its applications, further exploration into optimization techniques like Bayesian optimization and gradient descent can be insightful.

    Episode Quotes
    • "Hot sauces are part of my favorite start of the day, so it'd be interesting to see what AI could come up with here." - Petter Graff
    • "Bayesian is an optimization technique that centers around using your previous knowledge to influence the future and that works really well." - Angelo Kastroulis
    • "Bayesian can kind of skip a bunch of steps because you've got a better second try." - Angelo Kastroulis
    • "The algorithm of gradient descent basically goes like this. If you're trying to find from where you are to where you should go, imagine that you're on a mountain trying to find the valley." - Angelo Kastroulis
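
    The gradient-descent picture Angelo paints (standing on a mountain and walking toward the valley) can be made concrete in a few lines. This is only an illustrative sketch, not code from the episode; the quadratic loss function, starting point, and learning rate are arbitrary choices for the example.

```python
# Minimal gradient descent sketch: repeatedly step "downhill" on a 1-D loss surface.
# The loss, starting point, and learning rate are invented purely for illustration.

def loss(x):
    return (x - 3.0) ** 2 + 1.0      # a valley whose bottom sits at x = 3

def grad(x):
    return 2.0 * (x - 3.0)           # slope of the loss at x

x = -10.0                            # start somewhere "up the mountain"
learning_rate = 0.1

for step in range(100):
    x -= learning_rate * grad(x)     # move a small step in the downhill direction

print(f"ended near x = {x:.4f}, loss = {loss(x):.4f}")   # approaches x = 3, loss = 1
```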

    AI Hot Sauce Brothers Part 2

    Jul 12, 2023 · 15:05


    Introduction
    • Angelo and Shohaib discuss the inclusion of new ingredients in hot sauce batches.
    • Shohaib explains the process of introducing new ingredients and the excitement surrounding it.

    Incorporating New Ingredients
    • Angelo asks about the approach to incorporating new ingredients: creating new models or expanding the feature space.
    • Shohaib suggests keeping the base model and increasing the search space for new ingredients.
    • Both options are considered, including transferring the optimal features to another model.

    Metaphorical Understanding
    • Angelo highlights the advantage of using hot sauce as a metaphor for complex concepts.
    • Shekeib acknowledges the clarity provided by the hot sauce analogy and the opportunity to learn more.

    Engaging with Mathematics
    • Angelo expresses his enthusiasm for discussing the mathematical side of AI.
    • Shekeib shares his brother's interest in math and how it goes beyond his own understanding.
    • Shohaib emphasizes the subset of AI concepts being discussed and the value of conceptualizing them through hot sauce.

    AI as an Expansive Field
    • Angelo mentions that AI encompasses various subfields, such as machine learning, Bayesian optimization, and active learning.
    • Neural networks, deep learning, and reinforcement learning are discussed as additional branches of AI.
    • Shohaib highlights the similarities between Bayesian optimization and reinforcement learning.

    Reinforcement Learning
    • Angelo mentions the significance of reinforcement learning in solving video games and its applicability to different domains.
    • Shohaib shares his experience with reinforcement learning in an AI class, specifically using it to make Pac-Man play autonomously (a minimal Q-learning sketch follows this outline).

    Specialization and Continuous Learning
    • Angelo praises Shohaib's expertise in Bayesian optimization while acknowledging the vastness of AI knowledge.
    • The discussion emphasizes the complexity of AI and the continuous learning required to stay up to date.

    Generative Pre-trained Transformers
    • Angelo brings up the popularity of generative pre-trained transformers like ChatGPT.
    • The ensemble nature of these models and their unique combination of techniques is highlighted.
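
    Shohaib's Pac-Man anecdote refers to reinforcement learning, where an agent improves by acting and receiving rewards. As a rough illustration of that idea (not anything from the episode or his class project), here is a tabular Q-learning sketch on a toy five-cell corridor; the states, reward, and hyperparameters are all invented for the example.

```python
import random

# Toy reinforcement learning: an agent in a 5-cell corridor learns, purely from
# a +1 reward at the right end, that stepping right is the better action.
N_STATES = 5
ACTIONS = (-1, +1)                          # step left or step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1       # learning rate, discount, exploration rate

def best_action(state):
    # Highest Q-value wins; ties are broken at random so early episodes still explore.
    return max(ACTIONS, key=lambda a: (Q[(state, a)], random.random()))

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        a = random.choice(ACTIONS) if random.random() < epsilon else best_action(s)
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        target = reward + gamma * max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])      # the Q-learning update
        s = s_next

print({s: best_action(s) for s in range(N_STATES - 1)})   # learned policy: step right (+1)
```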

    AI Hot Sauce Brothers - Part 1

    Jun 21, 2023 · 24:16


    Introduction
    • Shekeib and Shohaib join the podcast as guests to talk about their experience with creating hot sauces using AI optimization.
    • They created a special hot sauce, named "Counting Sauce," specifically for the podcast hosts.

    The Making of Counting Sauce
    • This is a unique hot sauce that includes pineapple and mango flavors.
    • The sauce was created as a token of appreciation for being featured on the podcast.

    Journey through Different Versions of the Sauce
    • The hosts have tried versions 19, 20, and 21, and they just received version 25.
    • There will be a blind taste test to determine whether they can tell the difference between the different iterations and compare them with other sauces to tell which is AI-created.

    Optimization Process
    • The process involves optimizing the amount of each ingredient.
    • They use a Gaussian process regression model and an acquisition function called Expected Improvement for the optimization (a minimal sketch of this loop follows below).

    Choice of Ingredients
    • The base hot sauce is built from five main ingredients, including vinegar, pepper, jalapeño, and lime.
    • After 25 iterations, the differences in taste become so minute that it becomes hard to tell them apart.

    Subjective Taste Testing
    • Shekeib talks about how his taste tolerance changes after tasting hot sauces all day.
    • They involved family and friends in the tasting process and asked for ratings on a scale of one to ten.

    The Learning Curve of the AI
    • Early on, the AI would try extreme variations, like too much or too little salt.
    • It learned quickly from feedback and adjusted accordingly.

    Strength of Bayesian Optimization
    • The AI can learn mathematically from feedback and apply the learnings, making the optimization process quicker and more efficient.
    • It was also able to tweak multiple ingredients simultaneously, unlike a human who might focus on one ingredient at a time.

    No Prior Experience in Hot Sauce Making
    • Both brothers had no prior experience or generational knowledge in hot sauce making.
    • The AI managed to create a decent hot sauce in just five iterations.

    Power of Bayesian Optimization with Human Expertise
    • The brothers emphasize the importance of having a human expert in the loop of Bayesian optimization.
    • The AI simulates the intuition and experience of a human expert, but having a real human guide the process further enhances the results.

    Application Beyond Hot Sauce
    • They discuss the potential of their Bayesian optimization process in other areas such as drug discovery.
    • The process can be guided by human experts in the respective fields for even better results.
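
    The optimization loop described above (a Gaussian process regression model plus an Expected Improvement acquisition function) can be sketched generically. This is not the brothers' code; the one-dimensional "tastiness" function, the candidate grid, and the kernel choice are all invented to illustrate the Bayesian optimization loop.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def tastiness(x):
    # Invented stand-in for "how good the batch tastes" as one ingredient amount varies.
    return -(x - 0.6) ** 2 + 0.05 * np.sin(20 * x)

X = np.array([[0.1], [0.5], [0.9]])                 # recipes tried so far
y = tastiness(X).ravel()                            # their taste ratings
candidates = np.linspace(0, 1, 200).reshape(-1, 1)  # recipes we could try next

for iteration in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.max()
    # Expected Improvement: how much better than the current best each candidate looks.
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (mu - best) / sigma
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
        ei[sigma == 0.0] = 0.0
    x_next = candidates[np.argmax(ei)]              # the next batch to mix and taste
    X = np.vstack([X, [x_next]])
    y = np.append(y, tastiness(x_next))

print(f"best recipe found at ingredient amount x = {X[np.argmax(y), 0]:.3f}")
```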

    Bonus: Season 2 Recap

    Aug 9, 2022 · 35:28


    In a time crunch? Check out the time stamps below:
    [00:45] - Moore's Law: where do we go from here?
    [03:00] - How do we improve data system efficiency?
    [10:30] - Purpose-built systems (FPGAs)
    [11:13] - Insights on FPGAs
    [13:32] - Event streaming
    [17:50] - Data storage
    [18:34] - Google's approach to data storage
    [19:00] - Downtime
    [21:06] - Servers' impact on the environment and solutions to optimize
    [23:00] - Improving data systems, machine learning, artificial intelligence
    [24:06] - How do we regulate AI?
    [26:10] - Benefits of simulations through machine learning
    [28:38] - The impact computer science has on astrophysics
    [31:09] - How do we defy Moore's Law? The future of quantum computing

    Our Team:
    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth

    The End of Moore's Law Part 2

    Jul 26, 2022 · 32:57


    The last time we had Manos on the program, we talked about Moore's Law coming to an end. It's important to note that we can't rely on sheer computing power doubling to meet our ever-increasing demand for data; we must find new and exciting ways to collect and compute large amounts of data. In this episode of Counting Sand, we dive deep into what a database actually does, what sits at the core of a data system, and, most importantly, how we can use new techniques and algorithmic trickery to lighten the CPU's load.

    In a time crunch? Check out the time stamps below:
    [00:53] - Guest intro
    [01:30] - Intro to data systems
    [03:00] - Hardware types
    [05:00] - Why it is important to choose the right format
    [10:15] - What column storage is, and its benefits (a toy sketch follows below)
    [16:30] - Injecting the CPU: the hierarchy of memory
    [20:00] - Why not just duplicate data?
    [22:55] - ACID properties

    Notable references:
    Relational Memory: Native In-Memory Accesses on Rows and Columns

    Our Team:
    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth
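
    The column-storage segment ([10:15]) is easy to demonstrate: when a query only needs one attribute, scanning a columnar layout touches a dense, contiguous array instead of walking every row. This toy NumPy comparison is only an illustration of that idea, not the relational-memory design from the paper referenced above.

```python
import time
import numpy as np

N = 1_000_000

# Row-oriented layout: every record is a tuple (id, price, quantity).
rows = [(i, float(i % 100), i % 7) for i in range(N)]

# Column-oriented layout: one contiguous array per attribute.
prices = np.array([float(i % 100) for i in range(N)])

t0 = time.perf_counter()
row_total = sum(r[1] for r in rows)    # must touch every whole row to read one field
t1 = time.perf_counter()
col_total = prices.sum()               # scans one dense, cache-friendly array
t2 = time.perf_counter()

print(f"row scan:    {t1 - t0:.4f}s  total={row_total:.0f}")
print(f"column scan: {t2 - t1:.4f}s  total={col_total:.0f}")
```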

    Dynamo: The Research Paper that Changed the World

    Jul 5, 2022 · 35:08


    The cycle between research and application is often too long and can take decades to complete. People often ask which piece of research or technology is the most important. Before we can answer that question, it's important to take a step back and share the story of why we believe the Dynamo paper is so essential to our modern world and how we encountered it.

    Citations:
    DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., ... & Vogels, W. (2007). Dynamo: Amazon's highly available key-value store. ACM SIGOPS Operating Systems Review, 41(6), 205-220.
    Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., & Lewin, D. (1997, May). Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing (pp. 654-663).
    Lamport, L. (2019). Time, clocks, and the ordering of events in a distributed system. In Concurrency: The Works of Leslie Lamport (pp. 179-196).
    Merkle, R. C. (1987). A digital signature based on conventional encryption. In Proceedings of the USENIX Security Symposium (pp. 369-378).

    Our Team:
    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth
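
    One of the building blocks the Dynamo paper leans on is consistent hashing (the Karger et al. citation above). A toy hash ring, purely illustrative rather than Amazon's implementation, shows the core idea: keys map to the first node clockwise from their hash, so adding or removing a node only remaps the keys nearest to it.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring with virtual nodes for smoother balancing."""
    def __init__(self, nodes, vnodes=100):
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]      # first node clockwise from the key's hash

ring = HashRing(["node-a", "node-b", "node-c"])
for k in ["user:42", "cart:7", "session:abc"]:
    print(k, "->", ring.node_for(k))
```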

    The Promise of AI: Opportunities and Obstacles

    Jun 21, 2022 · 38:11


    This show often discusses artificial intelligence and ideas to consider as technology progresses. We have discussed the deep tech of how it works and its implications for privacy. In this episode, we'll talk about the complex and controversial topic of AI policy and speak about some of the things we should be worried about regarding its future.

    In a time crunch? Check out the time stamps below:
    [01:15] - Guest intro
    [03:38] - Western technology leadership
    [04:50] - Regulating AI
    [11:00] - The promise of self-driving cars
    [13:05] - AI data audition
    [17:50] - Neural networks to train AI
    [19:00] - Reducing mathematical knowledge, the AI bottleneck
    [20:35] - What is in the way of the promise of AI?
    [24:20] - Eric Daimler's book
    [27:50] - The uses of trained AI models
    [29:30] - Health care industry data usage
    [33:25] - AI to speed up research
    [33:50] - What is rural AI?

    Guest Links:
    https://www.linkedin.com/in/ericdaimler/
    https://conexus.com/

    Our Team:
    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth

    Energy, Edge Computing, and Data Centers

    Jun 7, 2022 · 27:44


    What if there were a way to reduce the amount of energy consumed and produced by servers around the world? Would these new methods positively or negatively impact the environmental footprint of today's big data ecosystems?

    In a time crunch? Check out the time stamps below:
    [02:15] - Research paper
    [05:55] - Power consumption of data centers and methods to save energy
    [08:50] - Server cooling methods
    [12:00] - Energy production from data transportation
    [13:55] - The impact of location and climate on venting and cooling computers
    [15:38] - Edge devices and cloud computing
    [20:47] - Cost and energy optimization
    [21:45] - Machine learning + AI predictive maintenance
    [24:45] - Automobile processing unit, big data

    Our Team:
    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth

    Cutting-Edge Data Systems: Machine Learning

    May 24, 2022 · 27:58


    Over the last couple of years, the Harvard Data Systems Lab has been focused on cutting-edge research and applications of complex data systems, in areas such as artificial intelligence and machine learning pipelines. In this episode of Counting Sand, Angelo and Stratos dive deep into what they have learned and what's next in these fields.

    In a time crunch? Check out the time stamps below:
    [01:00] - What's new at the Harvard Data Systems Lab?
    [08:20] - What are examples of general data structure applications?
    [14:13] - How do we decrease the time spent from research to application?
    [20:23] - What are the benefits of machine learning?
    [22:15] - What are some helpful tips when writing a thesis?
    [25:00] - How important is the creative process when writing a research paper?

    Helpful links:
    Harvard Data Systems Lab: http://daslab.seas.harvard.edu/
    Harvard Data Systems Lab Twitter: https://twitter.com/HarvardDASlab

    Our Team:
    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth

    Kafka Event Streaming Part 2

    May 10, 2022 · 24:49


    Our Team:
    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth

    Kafka Event Streaming Part 1

    Apr 26, 2022 · 32:14


    Is Kafka a one-size-fits-all solution? Or does this event sourcing software have an inherent set of strengths? Join Angelo and Kafka guru Anna McDonald as they share use cases and swap stories about how Kafka has radically changed the field of computer science.

    In a time crunch? Check out the time stamps below:
    [00:54] - How did Kafka change the world?
    [04:40] - What is so great about big data technology?
    [07:00] - Outbox pattern 101 (a minimal sketch follows below)
    [10:45] - Clinical decision support use case
    [13:05] - Should I build it or buy it?
    [17:05] - Is Kafka a one-size-fits-all for businesses?
    [21:55] - Kafka tuning 101
    [25:53] - AI for Kafka tuning

    Helpful links:
    https://www.confluent.io/
    https://www.youtube.com/channel/UC37UjjtsxpZWS_0QGPKEHdA

    Our Team:
    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth
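
    The outbox pattern mentioned at [07:00] writes an event to an "outbox" table in the same database transaction as the business change, and a separate relay later publishes those rows to Kafka. Here is a minimal sketch of the database half using SQLite; the table names and payload are invented, and the Kafka publish is only stubbed out.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(item, qty):
    # The business row and its event commit (or roll back) together --
    # that single transaction is the whole point of the outbox pattern.
    with conn:
        cur = conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
        event = {"order_id": cur.lastrowid, "item": item, "qty": qty}
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("orders.created", json.dumps(event)))

def relay_outbox(publish):
    # A separate poller hands unpublished events to Kafka (stubbed here) and marks them.
    pending = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, topic, payload in pending:
        publish(topic, payload)
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()

place_order("widget", 3)
relay_outbox(lambda topic, payload: print(f"would publish to {topic}: {payload}"))
```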

    Simulating Biological Systems Part 2

    Apr 12, 2022 · 27:13


    This episode touches on computer simulations, machine learning, and GPUs. How do these aspects of computer science relate and differ? Andy and John dive deep into how they push the boundaries of what is possible and practical in modern medicine by simulating biological systems.

    Our Team:
    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth

    Simulating Biological Systems Part 1

    Mar 29, 2022 · 28:55


    The episode starts by asking the question: what if we could use computer science to shorten the amount of time it takes to discover new medications? Angelo then shares, "If we meditate on that for just a second, our minds might wander over into the world of machine learning and artificial intelligence, where we can imagine a world where these complicated neural networks or other types of AI are trying to discover a new kind of chemical compound. We might even think about the far future, things like quantum, because it has application in chemistry, because chemistry can be thought of as an optimization problem. Or we could do something like a simulation. What if we could simulate the chemical structures of the world, or we could even simulate the body. We could conceivably introduce new kinds of compounds to the body and see how it reacts."

    Our Team:
    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth

    Galaxy Evolution

    Mar 15, 2022 · 27:27


    The episode touches on the wondrous journey a galaxy undergoes as it evolves through its life cycle. Angelo starts off the episode by asking the question: what's an early-type galaxy? Paolo Bonfini explains that although you might think early-type galaxies would be galaxies early in their evolution, they're not; they're galaxies a little later, the ultimate evolution of two galaxies coming together. Based on the topics touched on in Paolo's paper, he then explains the role that supermassive black holes play in galaxy evolution. Paolo explains, "thanks to the recent development in gravitational-wave astronomy, which opened a completely new window of exploration because it's not based on electromagnetic waves, but on gravitational waves, which are a completely different thing. We are now able to explore black holes in more detail and we're able to study when supermassive black holes merged to create a bigger one."

    Relating to the idea of bringing new technology forward, Angelo asks whether any computer science techniques have assisted him in modeling this or putting it together. Paolo explains, "there are a lot of computations involved in this process. People have in mind the romantic view of the astronomer who just looks through the scope of the telescope and notes things down on a piece of paper, but modern astronomy is completely digitalized. And recently it has been even automated by a lot of procedures that track and scan the sky to create huge catalogs. Even the images themselves are captured on digital devices, basically the same technology as in a phone, but on a more refined scale. And the first process for which you will need a computer is to combine exposures. So you cannot expose a telescope on a specific direction in the sky for a very long time, for several reasons. The summary is that, in order to take an image of some patch in the sky, you will have to take multiple images and then combine them. Now the modern telescopes are extremely accurate. So when you combine them, you need to align the stars to a sub-pixel resolution. That means that you have to find the center of the star and have it positioned within a single individual pixel. And when you combine images, you have to align them with the precision of, let's say, a third of a pixel, which sounds impossible because you're like, how can you do that? But there are some techniques that allow you to do that. And of course you need a lot of computational power for that. It can take several minutes, even half an hour, let's say, to combine and produce the image that you see on famous websites, like the Hubble. I mean, this is just the first step. Then you need to actually extract, in my case for the study I was doing, in order to assess the lack of stars at the center of a given galaxy, you actually have to measure it. So what you have to do is trace the light profile, starting from the outskirts of the galaxy going gradually towards the center. In this way, you can draw a light curve if you want. It's not exact, it's more like a light profile. So you have some intensity at the edge of the galaxy, which would be low intensity because the light is very diffused, and toward the center it grows, grows, grows. And at some point you will see it doesn't grow as much.
That's where you meet the depleted core, but you also need to quantify this, because you want to actually extract the information about the amount of depleted mass: how much you would expect there to be versus how much you actually measure. So you have to fit the light profile. And this is done by, okay, in my case, I've been doing this with some kind of basic statistical technique, which is chi-squared fitting. So you have a model and you just fit the model to the observation, and once you have the model, you can project the outer part towards the center and compare it with what you actually measure. And from the difference between the two, you have the amount of stars that are missing. So you need to explore a lot of parameters and therefore you need to have this thing automated via computer technology. There is no chance you can get this information doing it by hand."

    Referencing the famous space observatory, Hubble, Paolo explains what it was like to work with such a brilliant piece of machinery. He shares, "it's really amazing because the Hubble telescope was launched in the nineties and, just to give you an idea, is roughly the size of a bus. There is a replica of it you can visit at, I think it's the Aerospace Museum in Washington, so if you're curious. The main mirror is 2.3 meters in diameter; just to give you an idea, the larger the diameter, the higher the resolution you can achieve. On Earth, there are bigger telescopes. The biggest telescope we have on Earth is currently 10 meters. It's on the Canary Islands. On Earth you have the atmosphere on top of you and this makes everything flicker a little bit because, you know, there is air moving, and these big masses of atmosphere move and this shifts the path of the light and this causes the images to be more confused. If you are instead outside the atmosphere, you don't have that problem and you really achieve the limiting resolution of your instrument. So the Hubble Space Telescope is particularly famous because of its resolution. It doesn't have a large collecting area, it's only two meters, let's say, so it doesn't collect a lot of light per second. So it doesn't have, let's say, the same contrast as ground-based telescopes, but it has extremely high resolution. So when you open an image, you're saying, okay, I want to look at this galaxy and I will work on this, which is at the center of the field of view because you pointed there. But at the edges of it, you see a lot of tiny objects, and if you zoom in you can see the structure. Maybe you see a lot of spiral galaxies around the merging objects in the background. And it's not at the center of your research; you're looking at the big galaxy at the center that you're studying. But, you know, it's like a small pleasure, a small candy that you have for the eye. You're looking at these things around and you are like, well, man, this is incredible. There are so many things in the universe and I'm here focusing on these big galaxies at the center, but whatever else is happening in the background, and this is really, I think, the most impressive thing."

    Angelo concludes the episode by discussing the ups and downs of crafting a research paper. Paolo touches on everything from the rollercoaster of emotions one undergoes due to the sheer volume of work that needs to be done, to the most rewarding aspect of writing such a paper.
    He explains, "you know that you are at the forefront of this research, and I think this is when the reward comes, when you're actually presenting and you see the people being curious and asking you directly at the conference, “What is this?” “How did you get there?” “It's very interesting. Let's work together.” “This is an idea to make it even better,” and so on."

    Our Guest - Thank you!:
    Paolo Bonfini - https://www.linkedin.com/in/paolo-bonfini-phd-085a6a179/

    Paolo's Paper:
    Connecting traces of galaxy evolution: the missing core mass-morphological fine structure relation

    Our Team:
    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth
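
    The chi-squared fitting Paolo describes (fit a model light profile to the measured one, then compare) can be sketched with a generic curve fit. The exponential profile and all the numbers below are synthetic and chosen only to illustrate the technique; this is not the model from his paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic "light profile": intensity versus radius, with measurement noise.
rng = np.random.default_rng(0)
radius = np.linspace(0.1, 10, 80)
intensity = 100.0 * np.exp(-radius / 2.5) + rng.normal(0, 1.0, radius.size)
sigma = np.full_like(radius, 1.0)            # per-point uncertainty

def profile(r, I0, scale):
    return I0 * np.exp(-r / scale)

# curve_fit minimizes chi-squared = sum(((data - model) / sigma) ** 2) over I0 and scale.
params, cov = curve_fit(profile, radius, intensity, p0=[50, 1], sigma=sigma)
model = profile(radius, *params)
chi2 = np.sum(((intensity - model) / sigma) ** 2)

print(f"fitted I0 = {params[0]:.1f}, scale = {params[1]:.2f}, chi-squared = {chi2:.1f}")
# Comparing a fit of the outer profile extrapolated inward against the observed center
# is how one would then quantify "missing" light in a depleted core.
```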

    The End of Moore's Law - What's Next?

    Mar 1, 2022 · 25:47


    Angelo begins this episode with the predictions of Moore's Law. In the early years, systems were restricted by the CPU's ability to keep up. As CPUs continued to advance, the bottlenecks shifted to data movement: moving information from disk to memory and from memory to cache became the big bottlenecks. Then, of course, disks got faster, and eventually you'd have so much RAM on a machine that it was just memory movement inside of RAM. Eventually, we believe, the bottleneck will return to the CPU.

    Quantum computing is on the rise, and that, we believe, is a game changer for Moore's Law, because we're no longer talking about conventional computer chips and transistors; we're talking about something completely different. Additionally, as machine learning and artificial intelligence systems advance, or as in Angelo's thesis on using AI to tune data systems, the gains in speed will be orders of magnitude beyond traditional systems.

    Talent and staffing will also change as we adapt to the future. Angelo admires Google's practice of hiring ability over experience, because the problems we face tomorrow are different from today's. The key thing is to be able to make progress independently, because there isn't much room for babysitting; it's too hard to predict where the next fire will be. Angelo explains further why he hires ability over experience every single time: someone who has ability, someone who's brilliant and has the hunger to learn new things, can be programmed like a stem cell. They can inject themselves into whatever problem they might have.

    Angelo then transitions into his own personal story and his quest for fulfillment and happiness. He introduces the story of a boy who was dying on the Nazi-occupied island of Chios, Greece. A doctor took pity on this boy and secretly nursed him back to health. We later learn that this boy is Angelo's father. Angelo shares, “My father grew up in a world much different than mine. His siblings related stories of famine and suffering, but he never ever spoke of those things. What he chose to relate were accounts of human triumph, perseverance, hope, aspiration. The sea was his salvation, carrying him from Chios as a sailor, eventually to the United States.”

    So, what is our true potential? Intellectual achievements can be ignored or forgotten. But to be a successful family person, a husband, a father, a human, Angelo needed to be something more, something enduring. Education builds the qualities of perseverance, hard work, and accomplishment. There is no doubt you'll accomplish many things, but think about what it is that you're really trying to do. Building technical solutions isn't just about doing interesting stuff; ultimately we're building these things for a reason. If you're building a healthcare application, for example, it's going to touch somebody's life. That's the point of the breakthrough: you want to increase throughput in decision support, something Angelo spends a lot of time on, and build a system that can compute faster over bigger sets of data. Why are we doing that? Just for the challenge of the data? No. We want to find out whether a clinical intervention is working so that we can feed that information forward to those making the guidelines. That's the real reason behind doing this.
    The Great Resignation has shown us that people care more about what it is they're doing and why they're doing it than about the work simply being interesting. We owe it to our families to use our gifts, talents, and opportunities to the best of our ability, and to use them on something that matters.

    Angelo is really excited about the interesting conversations ahead around things like the universe, data centers, energy, and how they work. There's a reason the hard problem exists; don't fixate on the fact that it's a problem, because there is joy in having a problem and solving it.

    We're trying something a little bit new this season, and we would love to hear which kinds of episodes you like most. Do you prefer interviews, or some of the educational discussion episodes? We're also going to start a YouTube channel for deep dives on topics like LSM trees or RocksDB, which are better served with diagrams than with voice alone; seeing the math or the way they operate for yourself on video is much more helpful. We'll have supplementary content and bonus material on the YouTube channel, along with some bonus podcast episodes. We look forward to your feedback: tell us what you like about the show, which topics you prefer, and what you wish we would dive a little deeper on, and we'll really try to do that.

    Citations
    Gordon Moore, Co-Founder of Intel
    Heisenberg, Uncertainty Principle
    Powell, James. (2008). The Quantum Limit to Moore's Law. Proceedings of the IEEE, 96, 1247-1248. 10.1109/JPROC.2008.925411.
    Merritt, Rick. (2013). Moore's Law Dead by 2022, Expert Says. EETimes.
    Atomic Hire (2019)

    Further Reading
    Moore's Law Ending
    Work and Culture at Google
    Google Strategy to Hire

    About the Host
    Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.

    Host: Angelo Kastroulis
    Executive Producer: Náture Kastroulis
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth

    Bonus: Season 1 Recap

    Jan 25, 2022 · 21:29


    Angelo begins this episode with reflections on history and what brought us to the AI winter. Why do we need a balance between research and practice? You don't want to rediscover what has already been discovered, or settle for something that could be better if you took the time to research a bit more.

    In Episode 4 we meet Angelo's friend Andy Lee, who talks about computer science predicting our biological age. Andy actually met Greg Fahy, who talked about longevity. The study focused on injecting the thymus gland with a growth hormone that produced regeneration effects, measured through the epigenetic clock known as DNA methylation.

    In Episode 6, Jim Shalaby talks with Angelo about how COVID-19 changed healthcare forever. Patients don't have to wait in waiting rooms, they don't have to find transportation to get there, and the patient has access to the clinicians.

    In Episode 8, we talked about the hard problems associated with explainability in artificial neural networks. Angelo's friend Nikos explained five classic problems, one of which is data privacy. Another big issue is developing a machine learning system to create adversarial attacks on an existing system.

    In Episode 7, Angelo's friend Manos shared how complicated it is for people to invoke their right to have their data removed from a system. Typically those systems have to schedule deletions to remove the data through tombstones and a process called compacting.

    What is on the horizon, and what should we be paying attention to? We are going to run up against barriers of technology. For instance, Moore's Law is coming to an end. What do we do about that? What is happening in the short term, and how do we get past this barrier to the next? And then how do we blow away all those barriers with moonshots like quantum computing?

    Finally, wrapping up our first season, Angelo wants to reflect on gratitude: gratitude for you, our listeners. Thank you so much for joining us on this journey. We really want to hear your thoughts. The show is evolving just as the world is, and we want to make sure that we're covering topics that you're interested in. We would love for you to follow, rate, and review the show on your favorite podcast platform so that others can find us too. Thank you so much for listening.

    Our Guests - Thank you!:
    Nikos Myrtakis on LinkedIn
    Manos Athanassoulis on LinkedIn and Boston University
    Jim Shalaby on Twitter and LinkedIn
    Andy Lee on Twitter and LinkedIn

    About the Host
    Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.

    Host: Angelo Kastroulis
    Executive Producer: Kerri Patterson
    Producer: Albert Perrotta
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth

    Machine Learning: Your Right to Explainability

    Jan 11, 2022 · 31:39


    How do we make the next generation of machine learning models explainable? How do you start finding new kinds of models that might be explainable? Where do you even start thinking about that process from a research perspective?

    Nikos begins with a discussion of how we make decisions in general. In the scientific world, we mostly reason through statistical or cause-and-effect scenarios: we can predict outcomes and train our models to produce the results we traditionally expect. He then discusses early pioneers in this work; for example, back in the 70s a rules engine was developed to help clinicians make diagnoses, but it turns out that humans are very complex and hard to codify. Dr. Charles Forgy wrote his thesis on the Rete algorithm, which is what modern-day rules-based engines stem from.

    After the AI winter period came the introduction of neural networks that would encode the rules. This became an issue for explainability: why was the rule created? Neural networks build a mathematically weighted data model evaluated against the outcome, and the inability to open up the network and determine why some data was weighted more heavily than other data has been the challenge in explaining the results we see. There is also a concern from the European Union's General Data Protection Regulation (GDPR), under which a person has the right to obtain meaningful information about the logic involved, commonly interpreted as the right to an explanation.

    We want to look at explainability from two points of view: local and global. The global objective is to extract a general summary that is representative of some specific data set, so we explain the whole model and not just local decisions. The local objective is to explain a single prediction, an individual observation in the data: you have a decision from a neural network, a classifier, or a regression algorithm, and the objective is to explain just that one observation.

    Five problems present themselves in explainability: instability, transparency, adversarial attacks, privacy, and the analyst perspective. For instability, heat maps are very sensitive to hyperparameters, meaning the way we tuned the network; how we adjusted the sensitivity then impacts the interpretation. Transparency becomes more difficult the more accurate machine learning is; machine learning models, particularly neural networks, are black boxes with very high dimensionality, and their explainability tends to be inversely proportional to their prediction accuracy. For adversarial attacks, imagine that interpretability might enable people or programs to manipulate the system: if someone knows that, for instance, having three credit cards increases the chance of getting a loan, they can game the system and increase their chance of getting the loan without really increasing the probability of repaying it. Privacy can impact your access to the original data, especially in complex systems where boundaries exist between companies; you might not have the ability to access the original data at all. Lastly, the analyst perspective: when a human gets involved to explain the system, important questions include where to start first and how to ensure the interpretation aligns with how the model actually behaved.
    In some systems the ML model has multiple uses, and the human is trying to understand from which perspective the given result should be read. These are some specific ways we have found that create the complexity and challenges of explainability with machine learning models. We continue to learn and adjust based on those learnings. This is a very interesting and important topic that we will continue to explore.

    Citations
    Dr. Charles Forgy (1979), On the Efficient Implementation of Production Systems, Carnegie Mellon University, ProQuest Dissertations Publishing, 1979, 7919143
    Nadia Burkart, Marco F. Huber (2020), A Survey on the Explainability of Supervised Machine Learning, arXiv:2011.07876 (cs)

    Further Reading
    https://openaccess.thecvf.com/content_CVPR_2019/papers/Pope_Explainability_Methods_for_Graph_Convolutional_Neural_Networks_CVPR_2019_paper.pdf
    https://towardsdatascience.com/explainable-deep-neural-networks-2f40b89d4d6f

    Nikos' Papers:
    https://www.mdpi.com/2079-9292/8/8/832/htm
    https://link.springer.com/article/10.1007/s11423-020-09858-2
    https://arxiv.org/pdf/2011.07876.pdf
    https://arxiv.org/pdf/2110.09467.pdf

    Host: Angelo Kastroulis
    Executive Producer: Kerri Patterson
    Producer: Leslie Jennings Rowley
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth
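
    The "global" kind of explanation described above (a summary of which inputs the model relies on overall) can be approximated with permutation importance: scramble one feature at a time and see how much the model's score drops. This generic scikit-learn sketch on synthetic data is only an illustration, not one of the methods from the papers listed above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: 5 features, only 2 of which actually carry signal.
X, y = make_classification(n_samples=2000, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn; features the model truly depends on hurt accuracy most.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature {i}: importance {importance:.3f}")
```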

    The Boundaries of Personal Data

    Dec 28, 2021 · 32:48


    Angelo and Manos' connection began in CS265, Harvard University's Big Data Systems course, the course that inspired Angelo's thesis. The two discuss Manos' papers and how the future of big data sits on the boundaries of Moore's Law. If you think about LSM trees (log-structured merge trees) and compacting data, what counts as acceptable deletion when users ask for their data to be removed? Is it good enough when the data is no longer tied to the identifying user? In the analysis of big data systems, considerations are always about performance: an extensive delete sequence will cause a significant disruption in the system. Most people would wait for current execution cycles to complete, perhaps during off-peak hours, and flag the data that is no longer valid. Maybe your data starts to become dirty; then what? How do you solve issues like privacy and requests for the "right to be forgotten" or the "right to erase"? Manos speaks about the papers he has written, which you can read via the links below. He addresses the delete question and its boundaries with privacy in mind. Performance is a crucial factor, and looking at the issue holistically is just as important as encryption when protecting privacy.

    Manos' Research Papers
    https://dl.acm.org/doi/10.1145/3318464.3389757
    https://disc-projects.bu.edu/lethe/
    https://blogs.bu.edu/mathan/2020/06/29/lets-talk-about-deletes/

    Further Reading
    CS265: Big Data Systems - Spring 2020
    Manos Athanassoulis homepage
    California Consumer Privacy Act - BCLP California Consumer Protection Act Information
    General Data Protection Regulation (GDPR) – Final text neatly arranged
    Fast 21 Chen Hao
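
    The tombstone-and-compaction mechanics behind those delete questions can be sketched with a toy LSM-style store: a delete is just another record (a tombstone) appended to the newest run, and the old bytes only physically disappear when runs are compacted. This is a purely illustrative toy, not Lethe or any real storage engine.

```python
TOMBSTONE = object()     # sentinel meaning "this key was deleted"
segments = []            # immutable runs, oldest first (a stand-in for sorted files on disk)

def flush(memtable):
    segments.append(dict(memtable))          # writes, including deletes, are append-only

def get(key):
    for seg in reversed(segments):           # newest run wins
        if key in seg:
            return None if seg[key] is TOMBSTONE else seg[key]
    return None

def compact():
    """Merge all runs; only now do tombstoned entries physically disappear."""
    merged = {}
    for seg in segments:                     # oldest to newest, so newer entries overwrite
        merged.update(seg)
    segments.clear()
    segments.append({k: v for k, v in merged.items() if v is not TOMBSTONE})

flush({"user:1": "alice", "user:2": "bob"})
flush({"user:1": TOMBSTONE})                 # a "right to be forgotten" request lands here
print(get("user:1"))                                   # None -- logically deleted...
print(any("user:1" in seg for seg in segments))        # True -- ...but still physically present
compact()
print(any("user:1" in seg for seg in segments))        # False -- gone only after compaction
```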

    How did COVID-19 change the way doctors make decisions?

    Dec 14, 2021 · 23:44


    Angelo begins this episode with a few questions about the changes caused by COVID-19, specifically around patient data gathering, such as blood pressure. With telemedicine, how reliable is the data, who is legally responsible for the accuracy of the data gathered, and how exactly do clinical decision support (CDS) tools adjust to this new change in the traditional clinician workflow?

    Angelo explores the topic of IoT devices and the data they bring into medical decisions. Again, how accurate is the data from these IoT devices, such as Fitbit scales, that a clinician might diagnose and treat from? Jim brings up some of the challenges that came with telemedicine, such as workflow within a clinic: if the clinician seeing a patient wants the dietitian to speak with the patient, it is harder to coordinate than when they are within a few feet of each other. Another challenge relates to the security policies and considerations patients need to agree to regarding their personal privacy; to get into a virtual visit with a clinician, a patient has to follow a security protocol that can be a barrier for some elderly and disabled patients. Lastly, there is the challenge of all the data a patient could be collecting on their IoT devices: how do you move that data into the EHR, or into some format a CDS tool could ingest?

    With the use of CDS, machine learning, and AI, the future is ripe for opportunity.

    Further Reading
    What is CDS - Health Gov IT
    ResearchGate Publication on IoT in Health Care
    Privacy-Preserving Single Decision Tree
    Jim Shalaby on Twitter and LinkedIn

    Rosenblatt's Perceptron: What Can Neural Networks Do For Us?

    Nov 30, 2021 · 31:59


    In any discussion of artificial intelligence and machine learning today, artificial neural networks are bound to come up. What are artificial neural networks, how have they developed, and what are they poised to do in the future? Host Angelo Kastroulis dives into the history, compares them to the biological systems they are meant to mimic, and talks about how hard problems like this one need to be handled carefully.

    Angelo begins with a discussion of how biological neural networks help make our brain a powerful computer of complexity. He then talks about how artificial neural networks recruit the same structures and connections to create artificial intelligence. To understand what we mean by artificial intelligence, Angelo explains how the Turing Test works and how Turing's work forms a foundation for modern AI. He then discusses other early pioneers in this work, namely Frank Rosenblatt, who worked on models that could learn, or "perceptrons." Angelo then relates the history of how this work was criticized by Marvin Minsky and Seymour Papert, and how mistakes in their own work set the potential advances of artificial neural networks back by about two decades.

    Using image recognition as a case study, Angelo ends the episode by talking about various approaches' benefits and drawbacks to illustrate what we can do with artificial neural networks today.

    Citations
    Hebb, D.O. (1949). The organization of behavior: A neuropsychological theory. New York: Wiley.
    Minsky, M. (1954). Theory of neural-analog reinforcement systems and its application to the brain-model problem. Doctoral dissertation. Princeton: Princeton University.
    Minsky, M. and Papert, S. (1969). Perceptrons: An introduction to computational geometry. Cambridge: MIT Press.
    Rosenblatt, F. (1957). "The perceptron: A perceiving and recognizing automaton." Buffalo: Cornell Aeronautical Laboratory, Inc. (Accessible at https://blogs.umass.edu/brain-wars/files/2016/03/rosenblatt-1957.pdf)
    Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Washington, D.C.: Spartan Books.
    Turing, A. (1950, October). "Computing machinery and intelligence," Mind, LIX: 236, pp. 433–460. https://doi.org/10.1093/mind/LIX.236.433

    Further Reading
    Warren McCulloch and the McCulloch-Pitts Neuron
    Church-Turing Thesis
    Turing Test
    XOR or Exclusive Or

    Host: Angelo Kastroulis
    Executive Producer: Kerri Patterson
    Producer: Leslie Jennings Rowley
    Communications Strategist: Albert Perrotta
    Audio Engineer: Ryan Thompson
    Music: All Things Grow by Oliver Worth
    © 2021, Carrera Group
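
    Rosenblatt's perceptron learning rule is simple enough to show in full. This sketch trains on the linearly separable OR function; as the episode's XOR reference hints, a single-layer perceptron could not learn XOR, which is exactly the limitation Minsky and Papert highlighted. The learning rate and epoch count are arbitrary choices for the example.

```python
import numpy as np

# Training data for logical OR (linearly separable, so a single perceptron can learn it).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

w = np.zeros(2)     # weights
b = 0.0             # bias
lr = 0.1            # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        prediction = 1 if np.dot(w, xi) + b > 0 else 0   # threshold ("all-or-nothing") unit
        error = target - prediction
        w += lr * error * xi      # Rosenblatt's rule: nudge weights toward the correct answer
        b += lr * error

print("weights:", w, "bias:", b)
print("predictions:", [1 if np.dot(w, xi) + b > 0 else 0 for xi in X])   # [0, 1, 1, 1]
```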

    Can Computer Science Help Us Live Longer?

    Nov 16, 2021 · 31:25


    Can computer science help us live longer? It is a complex question that requires an understanding of our anticipated lifespan and how the health of our body at this moment compares to the average person our age, what some would call calculating our biological age. What computer science stands to help with is understanding how to better pinpoint that biological age and what factors may play into holding off the aging process. Guest Andy Lee, of NeuroInitiative and Vincere Biosciences, speaks with host Angelo Kastroulis about this hard problem of predicting our biological age and potentially reversing it.

    Angelo begins with a discussion of some of the classical age models that have previously looked at the question of determining one's biological age. He talks about classic papers by Horvath and Fahy that have changed the way scientists think about aging. He introduces the idea of epigenetics, the study of changes in living things caused by modification of the way genes are expressed rather than the more classical mode of altering the genetic code itself. With his guest Andy Lee, founding CTO of NeuroInitiative and COO of Vincere Biosciences, Angelo dives deeper into DNA methylation signatures and the patterns we can look at to begin to determine someone's biological age. Andy describes how computer science and neural networks are modernizing these determinations and what that means for improving our longevity.

    The pair note the challenges posed by the sheer volume of genetic data and the difference advances in data science can make in our ability to push this area forward, including therapeutics for diseases such as Parkinson's. They talk about how computer science is allowing transformative information to be brought to us so that we can then intervene and act on it.

    About this Episode's Guest
    Andy Lee is Co-Founder, Director, and CTO of NeuroInitiative, where he is co-inventor on multiple granted and pending patents surrounding the SEED simulation platform, as well as COO at Vincere Biosciences, Inc., a Cambridge, MA, company developing disease-modifying therapies for Parkinson's disease. Previously, Andy was VP of Engineering at Black Knight through F500 acquisition, spin-out, and IPO. He has led teams of over 100 members and continues to actively code to create new data-driven solutions. You can find out more on Twitter (@Andy_D_Lee) and LinkedIn.

    Citations
    Fahy, GM, Brooke, RT, Watson, JP, et al. Reversal of epigenetic aging and immunosenescent trends in humans. Aging Cell. 2019; 18:e13028. https://doi.org/10.1111/acel.13028
    Horvath, S. (2013). DNA methylation age of human tissues and cell types. Genome Biology, 14(10), R115. https://doi.org/10.1186/gb-2013-14-10-r115
    Johnson, A.A., Shokhirev, M., and Shoshitaishvili, B. (August 2019). Revamping the Evolutionary Theories of Aging. Ageing Research Reviews, 55. doi: 10.1016/j.arr.2019.100947.
    The Matt Walker Podcast. https://sleepdiplomat.com/podcast

    Further Reading
    Handbook of Epigenetics, 2nd Edition
    NeuroInitiative
    Vincere Biosciences
    Andy Lee on Twitter and LinkedIn

    About the Host
    Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience.
    He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.

    Host: Angelo Kastroulis
    Executive Producer: Kerri Patterson
    Producer: Leslie Jennings Rowley
    Audio Engineer: Ryan Thompson
    Communications Strategist: Albert Perrotta
    Music: All Things Grow by Oliver Worth
    © 2021, Carrera Group
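
    Epigenetic clocks like Horvath's are, at their core, penalized linear regressions from many CpG methylation values to age. This toy sketch fits such a regression on synthetic data; the number of sites, the strength of the age signal, and the noise levels are all invented, and this is in no way the actual Horvath model.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_people, n_sites = 500, 1000                 # invented: 1000 CpG sites measured per person

ages = rng.uniform(20, 80, n_people)
X = rng.uniform(0, 1, (n_people, n_sites))    # synthetic methylation fractions in [0, 1]
informative = rng.choice(n_sites, 30, replace=False)
X[:, informative] += 0.005 * ages[:, None]    # a weak age signal on 30 of the sites
X = np.clip(X, 0, 1)

X_train, X_test, age_train, age_test = train_test_split(X, ages, random_state=0)

# Elastic-net regression picks out the informative sites and predicts a "methylation age".
clock = ElasticNetCV(cv=5, random_state=0).fit(X_train, age_train)
pred = clock.predict(X_test)

print(f"mean absolute error: {np.abs(pred - age_test).mean():.1f} years")
print(f"CpG sites with non-zero weight: {(clock.coef_ != 0).sum()}")
```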

    How Can Computer Science Improve Life?

    Nov 2, 2021 · 32:20


    On the most recent episode, host Angelo Kastroulis made a case for computer science as a potential force for good. In this continuation of the theme, he talks about why excellence is a value to strive toward and how it differs from perfection, how simplifying the question can lead to a more valuable answer, and how questions about personalized medicine point to the potential for quantum computing to make big improvements to life.

    Angelo begins with a revelation that excellence is a value he holds dear. He distinguishes “quality,” which he defines as some standard that you're measuring yourself or others against to compare to similar things, from “excellence,” which means being outstanding or extremely good when compared with peers. He cautions against striving for perfection, as the benefit rarely exceeds the cost of the pursuit. This spurs musings about the limits to continuous improvement and how to balance costs and quality.

    He illustrates this through the example of parsing JSON and REST servers and asks how, instead of micro-optimizing a solution, we might go about eliminating serialization completely. Ultimately, he posits that the single most important factor that delineates a mediocre developer from a phenomenal one is the ability of the latter to step away from the decades-old tendency to merely meet requirements and instead find ways to rethink the problem space, eliminating limitations at the onset rather than mitigating them later. Relatedly, he contends that attainable smaller goals are far more valuable than very lofty, unreachable ones, because we can actually achieve those small goals.

    Angelo recounts part of a conversation he had with Kerri Patterson, Chief Strategy Officer at Carrera, on the importance of solving really hard problems and how they can impact our lives. She spoke about how most healthcare data systems are set up based on insurance rules, not patient outcomes and needs. She suggests incorporating quantum computing into clinical decision support and, ultimately, personalized medicine. This is a taste of conversations that will appear in more detail later in the series.

    Angelo concludes this episode with a brief discussion of what machine learning is, how its two main categories, unsupervised and supervised learning, differ, and how these concepts will fuel much of the content in the upcoming episodes of Counting Sand.

    About the Host
    Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.

    Citations
    Bruce, V., and Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305-327. doi: 10.1111/j.2044-8295.1986.tb02199.x
    Foer, J. (2012, February). Feats of Memory Anyone Can Do. TED2012. Retrieved September 16, 2021, from https://www.ted.com/talks/joshua_foer_feats_of_memory_anyone_can_do
    Siegler, M.G. (2010, August 4). Eric Schmidt: Every 2 Days We Create As Much Information As We Did Up To 2003. Retrieved September 16, 2021, from https://techcrunch.com/2010/08/04/schmidt-data/
    SINTEF. (2013, May 22). Big Data, for better or worse: 90% of data generated over last two years. Retrieved September 16, 2021, from https://sciencedaily.com/releases/2013/05/130522085217.htm

    Further Reading
    Urs Hölzle on Infrastructure for the Long Term
    More on Face Perception
    Hippocampus
    Parietal Lobe
    Method of Loci

    Host: Angelo Kastroulis
    Executive Producer: Kerri Patterson
    Producer: Leslie Jennings Rowley
    Communications Strategist: Albert Perrotta
    Music: All Things Grow by Oliver Worth
    © 2021, Carrera Group
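
    The supervised/unsupervised distinction Angelo previews is easy to show side by side: the same toy data modeled once with labels (classification) and once without (clustering). This is a generic scikit-learn sketch, not tied to anything specific in the episode.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data: three blobs of points; y holds the "true" group labels.
X, y = make_blobs(n_samples=600, centers=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised learning: the model is given the labels and learns to predict them.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"supervised accuracy: {clf.score(X_test, y_test):.2f}")

# Unsupervised learning: no labels at all -- the model simply groups similar points.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", [int((clusters == k).sum()) for k in range(3)])
```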

    Can Computer Science Make Life Better?

    Play Episode Listen Later Oct 19, 2021 30:15


    Can computer science improve our lives? Most people who work in the field would like to think so. Host Angelo Kastroulis, CEO of Carrera Group, considers the ways computer science is poised to make our lives better, but also introduces caveats such as bias in machine learning. Along the way he sets up the themes of future episodes—including predictive analytics, simulation, and health decision support systems—that he will dive into in more technical detail.

    Acknowledging that technology is neither a panacea nor a tool without downsides, Angelo starts with a review of research on the psychological and sociological effects of social media. Beyond social media, he also questions the predictive ability of big data and introduces the idea of bias in machine learning. He does this by recalling a chance encounter with an old friend and fellow computer scientist, Andy Lee. Andy is the chief technology officer and founder of NeuroInitiative, a company that uses advanced simulation techniques to try to create new drug compounds, as well as chief operating officer of Vincere Biosciences, a company that takes those drug compounds all the way to human trials and, hopefully, to market. Andy talks about how, since bias is unavoidable, we should find a way to turn this weakness into a strength of the model.

    In considering this, Angelo defines key concepts such as a model's features, what accuracy means, and why it is important not to conflate correlation with causation (a short illustrative sketch follows these episode notes). He shares the axiom "All machine learning models are bad and some are less bad than others" and exhorts listeners to "Never lie with stats." He ends by suggesting a few actionable ways that computer science can make our lives better, returning to the themes he will explore in future episodes of Counting Sand.

    About the Host
    Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.

    Citations
    Bruce, V., and Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305–327. doi: 10.1111/j.2044-8295.1986.tb02199.x
    Cameron, S. (2018, November 12). Shark Attacks, Ice Creams, and the Randomised Trial. Retrieved September 16, 2021, from https://the-gist.org/2018/11/shark-attacks-ice-creams-and-the-randomised-trial/
    Data Never Sleeps Infographic. (n.d.). Domo. Retrieved September 16, 2021, from https://www.domo.com/learn/infographic/data-never-sleeps-8
    McLean Hospital. (2021, February 9). Here's How Social Media Affects Your Mental Health. Retrieved September 16, 2021, from https://www.mcleanhospital.org/essential/it-or-not-social-medias-affecting-your-mental-health
    World Happiness Report 2019, Chapter 2. (n.d.). Retrieved September 16, 2021, from https://worldhappiness.report/ed/2019/changing-world-happiness/

    Further Reading
    Is Social Media Bad For You: The Evidence and the Unknowns
    World Happiness Report 2021
    Person Perception 25 Years after Bruce and Young (1986)

    Host: Angelo Kastroulis
    Executive Producer: Kerri Patterson; Producer: Leslie Jennings Rowley; Audio Engineer: Mert Çetinkaya; Communications Strategist: Albert Perrotta
    Music: All Things Grow by Oliver Worth
    © 2021, Carrera Group
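    The shark-attacks-and-ice-cream article cited above is a classic way to show why correlation is not causation. The sketch below is a hypothetical illustration in Python (synthetic numbers, not data from the episode or the cited article): a hidden confounder, temperature, drives both quantities, so they correlate strongly even though neither causes the other.

```python
# Correlation is not causation: a toy simulation with a hidden confounder.
# Assumes numpy is installed; all numbers are synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(42)
days = 365

# Temperature follows a seasonal cycle with some noise.
temperature = 20 + 10 * np.sin(np.linspace(0, 2 * np.pi, days)) + rng.normal(0, 2, days)

# Both quantities depend on temperature (the confounder), not on each other.
ice_cream_sales = 50 + 5 * temperature + rng.normal(0, 10, days)
shark_attacks = 0.1 * temperature + rng.normal(0, 0.5, days)

corr = np.corrcoef(ice_cream_sales, shark_attacks)[0, 1]
print(f"correlation between ice cream sales and shark attacks: {corr:.2f}")
# The correlation is high, yet banning ice cream would not prevent shark attacks;
# the shared driver (warm weather) explains the association.
```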

    Inspired by Archimedes...Counting Sand

    Play Episode Listen Later Oct 5, 2021 37:45


    How much sand would it take to fill the universe? And what does this 2,000-year-old question have to do with a podcast on today's big data challenges? In this kick-off episode of the Counting Sand podcast, host Angelo Kastroulis, CEO of Carrera Group, explains how an early research paper by Archimedes of Syracuse has much in common with his own approach to today's big questions in data science, and how the paper provides not only a metaphor for melding research and practice in tackling today's big problems but also the inspiration for the perfect podcast name.

    To explain the origin of the podcast's name, Angelo starts with a little history on Archimedes, both a practical designer and a scientist interested in the theoretical underpinnings of mathematical principles. Angelo then turns to some of Archimedes' important research, but first explains what a research paper is, where research papers come from, and why anyone undertakes writing one. He then discusses Archimedes' paper, which attempts to work out how many grains of sand would be needed to fill the universe. Of course, to answer this, Archimedes needed to approximate the size of the universe, and to do that he had to develop a new number system. (A toy modern version of the calculation appears after these episode notes.)

    Angelo—who himself has both a Greek and an entrepreneurial heritage—draws parallels between Archimedes' approach to the sand problem and his own approach to understanding and addressing big problems today. He talks about his journey to balance the theoretical and the practical, just as Archimedes did: applying a rigorous methodology, dealing with disappointment, and exercising patience. Angelo shares his first operating axiom: "When the solution isn't readily apparent, be patient, keep researching; the solution will present itself."

    In his work as a data scientist and technologist best known for his high-performance computing and Health IT experience, Angelo uses this process time and again. In this episode he gives examples from his own research career and the applications he has developed. Ultimately he shares his second axiom: "If you find yourself doing too much theory, do more application and it will make your theory better. If you find yourself doing too much application, do more theory and it will make your application better."

    As Angelo says, Counting Sand will be a bit different from other podcasts. We will talk about some big problems, discuss the theory behind potential solutions, and see how they can be applied to tackle real problems. We are excited to bring listeners along for the ride.

    Citations
    Bourne, S. (2004, December 6). A Conversation with Bruce Lindsay. ACM Queue. Retrieved October 4, 2021, from https://queue.acm.org/detail.cfm?id=1036486
    Heath, T.G. (2020). The Sand-Reckoner of Archimedes (Vol. 1). Library of Alexandria.
    Kastroulis, A. (2019). Towards Learned Access Path Selection: Using Artificial Intelligence to Determine the Decision Boundary of Scan vs Index Probes in Data Systems (Doctoral dissertation, Harvard University).

    Further Reading
    On Archimedes' Sand Reckoner
    Angelo Kastroulis' Harvard master's thesis
    The Harvard Data Systems Lab
    "Publish or Perish"

    About the Host
    Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.

    Host: Angelo Kastroulis
    Executive Producer: Kerri Patterson; Producer: Leslie Jennings Rowley; Communications Strategist: Albert Perrotta
    Music: All Things Grow by Oliver Worth
    © 2021, Carrera Group
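    For a sense of the scale behind the episode's opening question, here is a toy back-of-the-envelope calculation in Python. It does not reproduce Archimedes' own assumptions or his number system; the grain size and the radius of the observable universe are rough modern values chosen purely for illustration.

```python
# Toy estimate: how many grains of sand would fill a sphere the size of the
# observable universe? Inputs are rough, illustrative modern values,
# not Archimedes' figures.
import math

GRAIN_DIAMETER_M = 0.5e-3    # assume a 0.5 mm grain of sand
UNIVERSE_RADIUS_M = 4.4e26   # roughly 46 billion light-years

def sphere_volume(radius: float) -> float:
    return 4.0 / 3.0 * math.pi * radius ** 3

grain_volume = sphere_volume(GRAIN_DIAMETER_M / 2)
universe_volume = sphere_volume(UNIVERSE_RADIUS_M)

# Ignores packing efficiency; this is purely an order-of-magnitude exercise.
grains = universe_volume / grain_volume
print(f"on the order of 10^{math.log10(grains):.0f} grains")
```

    Archimedes, working with a far smaller model of the cosmos, arrived at a bound on the order of 10^63 grains; the exercise above simply shows how quickly such an estimate reduces to a handful of assumptions and a little arithmetic.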

    Introducing Counting Sand

    Play Episode Listen Later Sep 30, 2021 1:28


    How many grains of sand are there in the universe? To answer that hard question, Archimedes had to invent an entire number system, and only then could he attempt the computation. Today's data challenges make that look easy.

    On the Counting Sand podcast, host Angelo Kastroulis tackles the hard problem of how to make meaning of all the data available today. Every few episodes, Angelo will introduce a new theme at the heart of big data and IT infrastructure and highlight the most cutting-edge applications. Whether discussing the best designs for a complex data system or the social implications of bringing a diverse skill set to data science, each new episode will provide research-backed perspectives on today's hardest problems.

    If you are pushing boundaries with methodologies that solve complex problems and want to stay up to date on the latest industry trends in artificial intelligence, systems thinking, and big data in general, these discussions will give you the insight you need now.

    Starting October 5, 2021. Follow us now on your favorite podcasting platform or find us at CountingSandShow.com.

    About the Host
    Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.

    Host: Angelo Kastroulis
    Executive Producer: Kerri Patterson; Producer: Leslie Jennings Rowley; Communications Strategist: Albert Perrotta
    Music: All Things Grow by Oliver Worth
    © 2021, Carrera Group
