Welcome to Modern Digital Applications, a podcast for corporate decision makers and executives looking to create or extend their digital business with the help of modern applications, processes, and software strategy. Your host is Lee Atchison, a recognized industry thought leader in cloud computing and a published author with over 30 years of experience. This podcast uses the following third-party services for analysis: Chartable - https://chartable.com/privacy Podtrac - https://analytics.podtrac.com/privacy-policy-gdrp
Modern Digital Applications is changing and coming back after its year-long hiatus. Join us for the launch of Modern Digital Business! Modern Digital Business will be coming later this summer. If you'd like to be informed when it's ready to launch, please go to https://mdb.fm/launch (mdb.fm/launch). We hope to see you there!
My guest today is Kevin Goslar. Kevin is the Senior Vice President for Technology Strategy at Originate, a digital agency that helps organizations with digital transformation best practices. He has a PhD in business informatics and is an avid software developer. He is currently the maintainer of Git Town, an open-source project that provides a high-level CLI for Git. Previously, Kevin worked as a software developer at Google, which is where he was exposed to monorepos. Kevin is a Git expert and process advocate, and he's here to discuss with me the pros and cons of monorepos vs. polyrepos. This is part 2 of my interview with Kevin.
My guest today is Kevin Goslar. Kevin is the Senior Vice President for Technology Strategy at Originate, a digital agency that helps organizations with digital transformation best practices. He has a PhD in business informatics and is an avid software developer. He is currently the maintainer of Git Town, an open-source project that provides a high-level CLI for Git. Previously, Kevin worked as a software developer at Google, which is where he was exposed to monorepos. Kevin is a Git expert and process advocate, and he's here to discuss with me the pros and cons of monorepos vs. polyrepos.
My guest today is Beth Long. Beth worked at New Relic, where she held roles in both engineering and marketing, including two years leading the Reliability Engineering team, which owned the tooling and process for incident response and analysis. She also led New Relic's collaboration with the SNAFU Catchers, a group of researchers investigating how tech companies learn from incidents. Beth recently left New Relic to join the startup Jeli.io, where she leads the engineering team working on the industry's first incident analysis platform.
Links
Beth Long, Engineering Manager at Jeli.io
LinkedIn: https://www.linkedin.com/in/beth-adele-long/
Twitter: https://twitter.com/BethAdeleLong
Featured in this episode:
Jeli.io (https://jeli.io)
Learning From Incidents with Jeli (https://leeatchison.com/atscale/2020/12/07/learning-from-incidents-with-jeli/)
S3 Outage Mentioned in this Episode (https://thenewstack.io/dont-write-off-aws-s3-outage-fat-finger-folly/)
The scheduling of a cloud migration is a complex undertaking that should be thought through and planned well in advance. But for a migration to be successful, it's important that you limit your risk as much as possible during the migration itself, so that unforeseen problems don't show up and cause your migration to go sideways, fail outright, or result in unexpected outages that negatively impact your business.
Moving your data is one of the trickiest parts of a cloud migration. During the migration, the location of your data can have a significant impact on the performance of your application. During the data transfer, keeping the data intact, in sync, and self-consistent requires either tight coordination or, worse, application downtime. Moving your data and the applications that use that data at the same time is necessary to keep your application performance acceptable. Deciding how and when to migrate your data relative to your services, though, is a complex question. Often companies will rely on the expertise of a migration architect, a role that can greatly contribute to the success of any cloud migration. Whether you have an on-staff cloud architect or not, there are three primary strategies for migrating application data to the cloud.
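One common technique for keeping old and new datastores in sync during a live migration (not named in this episode; sketched here purely as an illustration, with all class and store names hypothetical) is a dual-write wrapper: writes go to both stores while reads continue to come from the legacy store until cutover.

```python
# Illustrative dual-write sketch: during a live migration, writes go to both
# the legacy and cloud datastores so they stay in sync; reads still come from
# the legacy store until cutover. All names here are hypothetical.

class DualWriteStore:
    def __init__(self, legacy_store, cloud_store):
        self.legacy = legacy_store
        self.cloud = cloud_store

    def write(self, key, value):
        # Write the legacy store first; it remains the source of truth.
        self.legacy[key] = value
        try:
            self.cloud[key] = value
        except Exception:
            # A failed cloud write must not break the application; record the
            # key so a backfill job can reconcile it later.
            self._mark_for_backfill(key)

    def read(self, key):
        # Reads stay on the legacy store until the migration is validated.
        return self.legacy[key]

    def _mark_for_backfill(self, key):
        # In a real system this would enqueue the key for reconciliation.
        print(f"backfill needed: {key}")


legacy, cloud = {}, {}
store = DualWriteStore(legacy, cloud)
store.write("user:42", {"name": "Ada"})
```

The tradeoff this pattern makes is exactly the one described above: it avoids downtime, but at the cost of tighter coordination between the two stores during the migration window.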
My guest today is Thomas Curran. Thomas is a cloud executive with many years of experience, including VP of Technology and Innovation at Deutsche Telekom and Technology Advisor at Deutsche Börse. He is the co-founder of the Ory Software Foundation, which is the owner of a very popular open-source, Go-based identity management library named Kratos, along with other open-source identity management tools. Now, Thomas is co-founder of Ory Corp, an open-source identity infrastructure and services company. Thomas is with me today from his office in Munich, Germany, to talk about application identity management. As a means of full disclosure, I've worked with Thomas personally for many years, first meeting him back when he was at Deutsche Börse. I'm currently working directly with Thomas at Ory, architecting their new cloud infrastructure.
Links and More Information
• Thomas LinkedIn (https://www.linkedin.com/in/thomasaidancurran/)
• Ory (https://ory.sh)
Tech Tapas — History of the Term SaaS
When did software as a service start? Well, that depends on what you mean by the term. Depending on how you define SaaS, the answer is either the early 1960s or somewhere around 2005. Back in the early days of computing, all applications ran on a centralized computer, and users accessed the computers remotely, initially via punch cards and later via remote terminals. The centralized nature of those applications is, by a strict definition, software as a service. But the modern definition of SaaS is tied much more closely to cloud computing. SaaS nowadays refers to software running centrally, typically in a public or private cloud environment, that is shared among multiple users. A thin client of some sort, either a web browser or a thin mobile application, is used to front the centralized application. From a business model standpoint, users don't buy SaaS software; instead, they rent or lease access to it with monthly or annual fees.
Alternatively, the service could be free and supported by advertising or other monetization processes. This is the heart of the business model for social media, for example. So, SaaS is an old term that has been given new meaning in recent years. But it's the recent definition that has really changed the way people think about and build software today.
Tech Tapas — Amazon S3
Amazon S3 is a highly durable, highly available file and object storage mechanism in the cloud. This service is the go-to service for most companies that want to store huge quantities of data in the cloud, or that need long-term persistent object storage. S3 was designed with the goals of being highly available, highly durable, and highly scalable. The design goal for availability is 99.99%, with an object durability of 99.999999999% (that's eleven 9s). How available? The four 9s of availability translate to a total of about 52 minutes of downtime per year. How durable? The eleven 9s of durability mean that if every man, woman, and child in the world had an object in S3, Amazon would lose at most one of those objects approximately once every 15 years. These are amazing goals, and they are one of the reasons S3 has such a great reputation as a high-quality object storage system. S3 was one of the three initial AWS services and was a big part of AWS's early success.
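The availability and durability figures above are straightforward to verify with back-of-the-envelope arithmetic. This calculation is mine, not from the episode, and the world-population figure is an assumption:

```python
# Back-of-the-envelope check of the S3 design goals quoted above.
availability = 0.9999          # "four 9s" availability design goal
durability = 0.99999999999     # "eleven 9s" object durability design goal

# Downtime per year permitted by 99.99% availability.
minutes_per_year = 365 * 24 * 60
downtime_minutes = (1 - availability) * minutes_per_year
print(f"Allowed downtime: {downtime_minutes:.1f} minutes/year")

# Expected object loss if every person on Earth stored one object in S3.
world_population = 7_000_000_000   # assumed rough figure
losses_per_year = (1 - durability) * world_population
# Roughly one object lost every 14-15 years, matching the figure above.
print(f"Expected losses: {losses_per_year:.2f} objects/year "
      f"(about one every {1 / losses_per_year:.0f} years)")
```

Note that these are design goals, not a contractual guarantee; the arithmetic simply shows what the quoted percentages imply.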
My guest today is Thomas Curran. Thomas is a cloud executive with many years of experience, including VP of Technology and Innovation at Deutsche Telekom and Technology Advisor at Deutsche Börse. He is the co-founder of the Ory Software Foundation, which is the owner of a very popular open-source, Go-based identity management library named Kratos, along with other open-source identity management tools. Now, Thomas is co-founder of Ory Corp, an open-source identity infrastructure and services company. Thomas is with me today from his office in Munich, Germany, to talk about application identity management. As a means of full disclosure, I've worked with Thomas personally for many years, first meeting him back when he was at Deutsche Börse. I'm currently working directly with Thomas at Ory, architecting their new cloud infrastructure.
Links and More Information
• Thomas LinkedIn (https://www.linkedin.com/in/thomasaidancurran/)
• Ory (https://ory.sh)
The scheduling of a cloud migration is a complex undertaking that should be thought through and planned well in advance. Typically, a migration architect is involved and makes the difficult technical decisions about what to migrate when, in concert with the organization's management to take into account the business needs. But for a migration to be successful, it's important that you limit your risk as much as possible during the migration, so that unforeseen problems don't show up and cause your migration to go sideways, fail, or result in unexpected outages that negatively impact your business. When scheduling the migration, there are a number of things you should keep in mind to increase the likelihood of a successful migration and reduce the risk of the migration itself. Here are five key methods for reducing the risk of your cloud migration, and hence increasing your overall chance of success.
Links and More Information
The following are links mentioned in this episode, and links to related information:
• Modern Digital Applications Website (https://mdacast.com)
• Lee Atchison Articles and Presentations (https://leeatchison.com)
• Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)
• Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
• Course: Building a Cloud Roadmap, 2018-2019 (https://leeatchison.com/classes/building-a-cloud-roadmap/)
Key #1. Limit the complexity of migrating your data
The process of migrating your data from your on-premises datastores to the cloud is, itself, the hardest, most dangerous, and most time-consuming part of your migration. There are many ways to migrate your data. Some of the methods are quite complex, and some of them are very basic. Some of them result in no need for downtime; others require significant downtime to implement.
There is a tradeoff you need to make between the complexity of the migration process and the impact that complexity has on the migration, including the potential need for site downtime. While in some scenarios you must implement a complex data migration scheme to reduce or eliminate downtime and reduce risk along the way, in general I recommend choosing as simple a data migration scheme as possible given your system constraints and business constraints. The more complex your data migration strategy, the riskier your migration. By keeping the data migration process as simple as practical given your business constraints, you reduce the overall risk of failure in your migration. Be aware, though, that you may require a certain level of migration complexity in order to maintain data redundancy and data availability during the migration itself. So the ultimate simplest migration process may not be available to you. Still, it's important that you select the simplest migration process that achieves your business and technical migration goals.
Key #2. Reduce the duration of the in-progress migration as much as possible
Put another way, do as much preparation work as you can before you migrate, and then, once you start the migration, move as quickly as possible to completing it, postponing as much work as possible until after the migration is complete and validated. By doing as much preparation work before the migration as possible and pushing as much cleanup work to after the migration as possible, you reduce the time and complexity of the migration itself. Given that your application is most at risk of a migration-related failure during the migration process itself, reducing this in-migration time is critical to reducing your overall risk. For example, it may be acceptable to tolerate somewhat lower overall application performance during the migration in order to reach the end of your migration more quickly.
Then, after the migration is complete, you can do some performance refactoring to improve your overall performance situation. While postponing an important performance...
Amazon Web Services provides a cloud certification program to encourage and enable you to grow your AWS cloud technical skills, helping you advance your career and your business. Have you wondered what it takes to become AWS certified? In this episode, I conclude my interview with Kevin Downs, a trial-by-fire expert on the AWS certification program, as we discuss the AWS cloud certification program and how to best utilize it. And then, what was the first AWS service? This is AWS Certifications, on Modern Digital Applications.
Links and More Information
The following are links mentioned in this episode, and links to related information:
Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)
Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
AWS Certifications (https://aws.amazon.com/certification/)
A Cloud Guru (https://acloudguru.com)
Kevin Downs Twitter (https://twitter.com/kupsand)
Kevin Downs LinkedIn (https://www.linkedin.com/in/kevin-downs/)
This episode is the second and final part of my interview with Kevin Downs.
Amazon Web Services provides a cloud certification program to encourage and enable you to grow your AWS cloud technical skills, helping you advance your career and your business. Have you wondered what it takes to become AWS certified? In this episode, join me with Kevin Downs, a trial-by-fire expert on the AWS certification program, as we discuss the AWS cloud certification program and how to best utilize it. And then, what was EC2 like in the old days, back before it was actually useful? This is AWS Certifications, on Modern Digital Applications.
Links and More Information
The following are links mentioned in this episode, and links to related information:
Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)
Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
AWS Certifications (https://aws.amazon.com/certification/)
A Cloud Guru (https://acloudguru.com)
Kevin Downs Twitter (https://twitter.com/kupsand)
Kevin Downs LinkedIn (https://www.linkedin.com/in/kevin-downs/)
This episode is part 1 of 2 of my interview with Kevin Downs.
Likelihood and severity: two different measures for two different aspects of risk in a modern digital application. They are both measures of risk, but they measure different things. What is the difference between likelihood and severity? And why does it matter? In this episode, I'll discuss likelihood and severity, how they are different, and how they are both useful measures of risk in a modern digital application.
Links and More Information
The following are links mentioned in this episode, and links to related information:
• Modern Digital Applications Website (https://mdacast.com)
• Lee Atchison Articles and Presentations (https://leeatchison.com)
• Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)
• Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
• Learning Path - Risk Management (http://leeatchison.com/classes/learning-path-risk-management/)
• O'Reilly Learning Path Course (https://learning.oreilly.com/learning-paths/learning-path-microservices/9781492061106/)
Microservice architectures offer IT organizations many benefits and advantages over traditional monolithic applications. This is especially true in cloud environments, where resource optimization works hand-in-hand with microservice architectures. So it's no mystery that so many organizations are transitioning their application development strategies to a microservices mindset. But even in the realm of microservices, building and operating an application at scale can be daunting. Problems can range from something as fundamental as having too few resources and too little time to continue developing and operating your application, to underestimating the needs of your rapidly growing customer base.
At its best, failure to build for scale can be frustrating. At its worst, it can cause entire projects, even whole companies, to fail. Realistically, we know that it's impossible to remove all risk from an application. There is no magic eight ball, no crystal ball, that allows you to see into the future and understand how the decisions you make today impact your application tomorrow. Risk will always be a burden to you and your application. But we can learn to mitigate risk. We can learn to minimize and lessen the impact of risk before the problems associated with it negatively impact you and your applications. I've worked in many organizations, and have observed many more. Planning for problems is very hard and something most organizations fail to do properly. Technical debt is often a nebulous concept. Quantifying risk is the first step to understanding vulnerability. It also helps set priorities and goals. Is fixing one potential risk more important than another? How can you decide if the risks aren't understood and quantified? In this episode, we're going to talk about how to measure risk, so that you can build, maintain, and operate large, complex, modern applications at scale. There is a great quote by Donald Rumsfeld, twice former Secretary of Defense for the United States. It starts, "Reports that say that something hasn't happened are always interesting to me." He goes on to say: "because, as we know, there are known knowns; there're things we know we know. We also know there are known unknowns; that is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones." This is true in running a country, and a country's military, and it is true in
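The two measures this episode discusses combine naturally into a risk-matrix entry that can be captured in a few lines of code. The scoring scale and the example risks below are illustrative assumptions, not from the episode; the point is simply that likelihood and severity are scored separately and combined into a priority.

```python
# Minimal risk-matrix sketch: each risk gets a likelihood and a severity,
# scored 1 (low) to 5 (high); their product gives a simple priority score.
# The specific risks and scores below are made-up examples.

from dataclasses import dataclass

@dataclass
class Risk:
    description: str
    likelihood: int  # 1 (rare) .. 5 (almost certain)
    severity: int    # 1 (minor) .. 5 (catastrophic)

    @property
    def score(self) -> int:
        # A simple combined priority: likelihood times severity.
        return self.likelihood * self.severity

risks = [
    Risk("Search engine fails", likelihood=2, severity=4),
    Risk("Database becomes corrupted", likelihood=1, severity=5),
    Risk("Bad deployment breaks checkout", likelihood=3, severity=5),
]

# Highest-scoring risks are addressed (or mitigated) first.
for risk in sorted(risks, key=lambda r: r.score, reverse=True):
    print(f"{risk.score:2d}  {risk.description}")
```

Keeping the two dimensions separate is what makes the matrix useful: a catastrophic-but-rare risk and a frequent-but-minor risk can end up with similar scores, but they call for very different mitigations.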
Building a scalable application that has high availability is not easy. Problems can crop up in unexpected ways that can cause your application to stop working and stop serving your customers' needs. No one can anticipate where problems will come from, and no amount of testing will identify and correct all issues. Some issues end up being systemic problems that require the interaction of multiple systems before the problems occur. Some are more basic, but are simply missed or not anticipated.
Links and More Information
The following are links mentioned in this episode, and links to related information:
Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)
Application availability is critical to all modern digital applications. But how do you avoid availability problems? You can do so by avoiding the traps that cause poor availability. There are five main causes of poor availability that impact modern digital applications.
Poor Availability Cause Number 1
Often, the main driver of application failure is success. The more successful your company is, the more traffic your application will receive. The more traffic it receives, the more likely you are to run out of some vital resource that your application requires. Typically, resource exhaustion doesn't happen all at once. Running low on a critical resource can cause your application to begin to slow down, backlogging requests. Backlogged requests generate more traffic, and ultimately a domino effect drives your application to fail. But even if it doesn't fail completely, it can slow down enough that your customers leave. Shopping carts are abandoned, purchases are left uncompleted. Potential customers go elsewhere to find what they are looking for.
Increase the number of users using your system, or increase the amount of data these users are storing in your system, and your application may fall victim to resource exhaustion, resulting in a slower and unresponsive application.
Poor Availability Cause Number 2
When traffic increases, sometimes assumptions you've made in your code about how your application can scale are proven to be incorrect. You need to make adjustments and optimizations on the fly in order to resolve or work around those assumptions and keep your system performant. You need to change your assumptions about what is critical and what is not. The realization that you need to make these changes usually comes at an inopportune time, when your application is experiencing high traffic and the shortcomings start becoming exposed. This means you need a quick fix to keep things operating. Quick fixes can be dangerous. You don't have time to architect, design, prioritize, and schedule the work. You can't think it through to make sure a change is the right long-term change. You need to make changes now to keep your application afloat. These changes, implemented quickly and at the last minute with little or no forethought or planning, are a common cause of problems. Untested or minimally tested fixes, hastily conceived fixes, bad deployments caused by skipping important steps: all of these things can introduce defects into your production environment. The fact that you need to make changes to maintain availability will itself threaten your availability.
Poor Availability Cause Number 3
When an application becomes popular, your business needs usually demand that the application expand and add additional features and capabilities. Success drives larger and more complex needs. These increased needs make your application more complicated and require more developers to manage all of the moving parts. Whether these additional developers are working on new features,...
We often hear that being able to scale your application is important. But why is it important? Why do we need to be able to suddenly, and without notice, scale our application to handle double, triple, or even ten times the load it is currently experiencing? Why is scaling important? In this episode, I am going to talk about four basic reasons. Four reasons why scaling is important to the success of your business. And then, what is the dynamic cloud? This is Application Scaling, on Modern Digital Applications.
Links and More Information
The following are links mentioned in this episode, and links to related information:
Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)
Why you must scale
We often hear that being able to scale your application is important. But why is it important? Why do we need to be able to suddenly, and without notice, scale our application to handle double, triple, or even ten times the load it is currently experiencing? Why is scaling important? There are many reasons why our applications must scale. A growing business need is certainly one important reason. But there are other reasons why architecting your application so it can scale is important for your business. I am going to talk about four basic reasons. Four reasons why scaling is important to the success of your business.
Reason #1. Support your growing business
This is the first, and the most basic, reason why your application has to scale. As your business grows, your application's needs grow. But there is more to it than that. There are three aspects of a growing business that impact your application and require it to scale. First is the most obvious.
As you get more customers, your customers make more use of your applications and need more access to your website. This requires more capacity and more growth in the IT infrastructure for your sites. But that's not the only aspect. As your application itself grows and matures, typically you will add more and more features and capabilities to the application. Each new feature and each new capability means customers will make more use of your application. As each customer uses more of your application, the application itself has to scale. Simply by your business maturing over time, even if the size of your customer base doesn't grow, the computation needs of your application grow and your application must scale. And finally, as your business grows and matures, and your application grows and matures, your more complex application will require more engineers to work on it simultaneously, and they will work on more complex components. Your application might be rearchitected to be service based. It might add additional external dependencies and provisions. You will have to support more deployments and more updates. Your application and your application infrastructure will need to scale to support larger development teams and larger projects. This means you need more mature processes and procedures to scale the speed at which your larger team can improve your application.
Reason #2. Handle surprise situations
The second reason you need to be able to scale your application is to handle surprise situations and conditions. All businesses have their biggest days. These are the days when traffic is at its heaviest. These are days like Black Friday in retail, or the day of the Super Bowl for companies that advertise during that event, or open enrollment periods, or the start of travel season. But your business may have unexpected business bumps.
These are the traffic increases that occur not because of a known big event, but because of an unknown or unexpected event. When an event occurs that is favorable to your business, you...
Ken Gavranovic was the Executive Vice President and GM for product at New Relic. In early 2019, Ken and I were in Boston together for an event, and we recorded an interview discussion about risk management in modern digital applications. Both Ken and I have experience dealing with risk management issues in current and past assignments. I discuss risk management in my book, Architecting for Scale. Ken used a very similar risk management technique in his past corporate management gigs. In this interview, we compare notes and make recommendations on best practices for risk management that everyone can use.
Links and More Information
The following are links mentioned in this episode, and links to related information:
Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)
Risk Management with Ken Gavranovic Video (https://leeatchison.com/2019/02/06/managing-risk-in-modern-enterprise-applications/)
Ken Gavranovic Twitter (https://twitter.com/kgavranovic)
Ken Gavranovic LinkedIn (https://www.linkedin.com/in/gavranovic/)
Risk Management Interview
Ken: I know we both talk to a lot of customers. One of the questions is, where do I get started? What are some of the patterns we see in enterprises and our own experiences? We have an awesome opportunity to talk to a lot of companies doing digital transformation, but what is something that I can just go do tomorrow to get started?
Lee: One of the things I find very easy to wrap your mind around is risk management. How do you build a risk matrix to track the issues and the risks you have within your system?
I like to talk to companies about that because it gets people starting to think about what their system is doing, what problems they have, and how they deal with them. It gets them thinking beyond just the problem/resolution cycle, and more into a pro/con and risk assessment process. What is the benefit of fixing something versus the benefit of mitigating it versus the benefit of simply ignoring it? I like to talk about that because it gets conversations going within the company about the sorts of things that are important to them. Creating a risk matrix is an important first step for anyone who is thinking about trying to improve their availability, trying to improve their scalability, or trying to modernize their application in many different ways. It helps get a grip on the issues that already exist in your system and what you are currently doing to manage those risks.
Ken: I 100% agree. I remember in a previous role, I had a couple-hundred-million-dollar project, and I had some challenges. We created a risk matrix which helped us solve those challenges. So I thought it might be helpful for people watching this video. Let's double-click and see what this might look like. From my perspective, the key questions that need to be asked need to be asked in a bottom-up way, not top-down. Agreed?
Lee: Yes, definitely.
Ken: It's not the people at the top of the organization that give you the answers. It's the team level that gives you the answers you need. Let me give you my shot and tell me where I miss. First of all, the things that can go into the risk matrix are the things that can go bump in the night.
Lee: Most people already have an idea of the things that keep them up at night. Things they think about, worry about. The things they think about on a regular basis, and that is a good place to start.
Ken: That makes sense. So, bottom up, by team, just create a list. Just list all the things that we think are some sort...
Modern applications require high availability. Our customers expect it, our customers demand it. But building a modern scalable application that has high availability is not easy and does not happen automatically. Problems happen. And when problems happen, availability suffers. Sometimes availability problems come from the simplest of places, but sometimes they can be highly complex. In this episode, we will continue our discussion from last week with the remainder of the five strategies for keeping your modern application highly available. This is How to Improve Application Availability, on Modern Digital Applications. Links and More Information The following are links mentioned in this episode, and links to related information: Modern Digital Applications Website (https://mdacast.com (https://mdacast.com)) Lee Atchison Articles and Presentations (https://leeatchison.com (https://leeatchison.com)) Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com (https://architectingforscale.com)) Robinhood Announcement (https://blog.robinhood.com/news/2020/3/3/an-update-from-robinhoods-founders (https://blog.robinhood.com/news/2020/3/3/an-update-from-robinhoods-founders)) How to Improve Availability, Part 2 Building a scalable application that has high availability is not easy and does not come automatically. Problems can crop up in unexpected ways that can cause your application to stop working for some or all of your customers. No one can anticipate where problems will come from, and no amount of testing will find all issues. Many of these are systemic problems, not merely code problems. To find these availability problems, we need to step back and take a systemic look at our applications and how they work. What follows are five things you can and should focus on when building a system to make sure that, as its use scales upwards, availability remains high. In part 1 of this series, we discussed two of these focuses. 
The first was building with failure in mind. The second was to always think about scaling. In part 2 of this series, we conclude with the remaining three focuses. Number 3 - Mitigate risk Keeping a system highly available requires removing risk from the system. When a system fails, often the cause of the failure could have been identified as a risk before the failure actually occurred. Identifying risk is a key method of increasing availability. All systems have risk in them. There is risk that:
A server will crash
A database will become corrupted
A returned answer will be incorrect
A network connection will fail
A newly deployed piece of software will fail
Keeping a system available requires removing risk. But as systems become more and more complicated, this becomes less and less possible. Keeping a large system available is more about managing what your risk is, how much risk is acceptable, and what you can do to mitigate that risk. This is risk management, and it is at the heart of building highly available systems. Part of risk management is risk mitigation. Risk mitigation is knowing what to do when a problem occurs in order to reduce the impact of the problem as much as possible. Mitigation is about making sure your application works as best and as completely as possible, even when services and resources fail. Risk mitigation requires thinking about the things that can go wrong, and putting a plan together now, to be able to handle the situation when it does happen. For example, consider a typical online e-commerce store. Being able to search for products is critical to almost any online store. But what happens if search breaks? To prepare for this, you need to have “Failed Search Engine” listed as a risk in your application risk plan. And in that risk, you need to specify a mitigation plan to execute if that risk ever triggers. For example, we might know from history that 60 percent of people who search...
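The “Failed Search Engine” mitigation described above can be sketched in a few lines. The client class and fallback list here are hypothetical stand-ins for a real search service, included only to show the degrade-gracefully shape:

```python
class SearchUnavailable(Exception):
    """Raised when the search backend cannot serve a query."""

class SearchClient:
    """Hypothetical stand-in for a real search engine client."""
    def __init__(self, healthy: bool = True):
        self.healthy = healthy

    def query(self, q: str) -> list[str]:
        if not self.healthy:
            raise SearchUnavailable()
        return [f"result for {q}"]

# A precomputed fallback list, refreshed periodically by another job.
TOP_SELLERS = ["best-seller-1", "best-seller-2"]

def search_products(query: str, client: SearchClient) -> list[str]:
    """Mitigation from the risk plan: if search fails, show the
    best sellers instead of an error page."""
    try:
        return client.query(query)
    except SearchUnavailable:
        # Degraded but working: the store stays usable while search is down.
        return TOP_SELLERS
```

The mitigation does not restore search; it keeps the store selling while the risk plan's remediation steps run.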
Modern applications require high availability. Our customers expect it, our customers demand it. But building a modern scalable application that has high availability is not easy and does not happen automatically. Problems happen. And when problems happen, availability suffers. Sometimes availability problems come from the simplest of places, but sometimes they can be highly complex. In this episode, we will discuss five strategies for keeping your modern application highly available. This is How to Improve Application Availability, on Modern Digital Applications. Links and More Information The following are links mentioned in this episode, and links to related information: Modern Digital Applications Website (https://mdacast.com (https://mdacast.com)) Lee Atchison Articles and Presentations (https://leeatchison.com (https://leeatchison.com)) Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com (https://architectingforscale.com)) How to Improve Availability, Part 1 Building a scalable application that has high availability is not easy and does not come automatically. Problems can crop up in unexpected ways that can cause your application to stop working for some or all of your customers. These availability problems often arise from the areas you least expect, and some of the most serious availability problems can originate from extremely simple sources. Let’s take a simple example from a real-world application that I’ve worked on in the past. This problem really happened. The software was a SaaS application. Customers could log in to the application and receive a customized experience for their personal use. One of the ways that the customer could tell they were logged in was that an avatar of themselves appeared in the top right-hand corner. It wasn’t a big deal, but it was a handy indicator that you were receiving a personalized environment. 
We’ve all seen this sort of thing; it’s pretty common in online software applications nowadays. Anyway, by default, when we showed the page, we read the avatar from a third-party avatar service that told us what avatar to display for the current user. One day, that third-party system failed. Our application, which made the poor assumption that the avatar service would always be working, also failed. Simply because we were unable to display a picture of the user in the upper right-hand corner, our entire application crashed and nobody could use it. It was, of course, a major problem for us. It was made harder because the avatar service was out of our control. Our business was directly tied to a third-party service we had no control over, and we weren’t even aware of the dependency. A very minor feature crashed our entire business…Our business crashed because of an icon. Obviously, that was unacceptable. How could we have avoided this problem? There were a thousand solutions to the problem. By far the easiest would have been to notice and catch any failure of the third-party service in real time, and if it did fail, show some default generic avatar instead. There was no need to bring down our entire application over this simple problem. A simple check, some error recovery logic, some fallback options: that’s all it would have taken to avoid crashing our entire business. No one can anticipate where problems will come from, and no amount of testing will find all issues. Many of these are systemic problems, not merely code problems. To find these availability problems, we need to step back and take a systemic look at our applications and how they work. What follows are five things you can and should focus on when building a system to make sure that, as its use scales upwards, availability remains high. Number 1 - Build with Failure in Mind As Werner Vogels, CTO of Amazon, says: “Everything fails all the time.” You should plan on your applications and services failing. 
It will happen. Now, deal with it. Assuming your...
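The "simple check, some error recovery logic, some fallback options" fix for the avatar story might look like the sketch below. The service endpoint and asset path are hypothetical, invented for illustration; the point is only that the cosmetic feature can never take the page down:

```python
import json
import urllib.request
from urllib.error import URLError

DEFAULT_AVATAR = "/static/img/default-avatar.png"       # hypothetical asset path
AVATAR_SERVICE = "https://avatars.example.com/v1/"      # hypothetical endpoint

def avatar_url(user_id: str, timeout: float = 0.5) -> str:
    """Fetch the user's avatar URL from the third-party service, falling
    back to a generic image if the service is down, slow, or returns junk."""
    try:
        with urllib.request.urlopen(AVATAR_SERVICE + user_id,
                                    timeout=timeout) as resp:
            return json.load(resp)["url"]
    except (URLError, TimeoutError, KeyError, ValueError):
        # The avatar is cosmetic: never let its failure crash the page.
        return DEFAULT_AVATAR
```

The short timeout matters as much as the except clause: a hung dependency that holds the page for thirty seconds is almost as damaging as one that throws.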
In this episode, we know that using multiple availability zones helps increase your application’s availability and resiliency by distributing your application across multiple dispersed data centers. But did you know that availability zones don’t necessarily give you the separation you expect? In fact, it is entirely possible to have two instances of a service running in two distinct availability zones, but actually have them running in the same data center, in the same physical rack, and possibly even on the same physical server! How can this be? And even more importantly, how can we avoid it? The answer involves understanding how availability zones work and how they are structured. And then, one of the oddest cloud services created is also one of the first cloud services. Before AI and before machine learning, humans actually powered a part of the cloud. This is Life with Multiple AWS Accounts. Links and More Information The following are links mentioned in this episode, and links to related information: How to maintain availability when using multiple AWS accounts (https://www.infoworld.com/article/3444860/5-pain-points-of-modern-software-development-and-how-to-overcome-them.html (https://www.infoworld.com/article/3444860/5-pain-points-of-modern-software-development-and-how-to-overcome-them.html)) Modern Digital Applications Website (https://mdacast.com (https://mdacast.com)) Lee Atchison Articles and Presentations (https://leeatchison.com (https://leeatchison.com)) Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com (https://architectingforscale.com)) Distributing Your Application When building a modern, high-performance application at scale, it’s important to make sure the individual application instances are distributed across a variety of data centers in such a way that if any given data center goes offline, the application can continue to function relatively normally. 
This is an industry-wide best practice, and an important characteristic to architect into your applications in order to make them sufficiently resilient to data center problems. The same philosophy applies when you build your application in the cloud. Except, when you build a cloud-based application, you typically do not have visibility into which data center a particular server or cloud resource is located in. This is part of the abstraction that gives the cloud its value. Not having visibility into which data centers your application is operating in makes it difficult to build multi data center resiliency into your applications. To solve this problem, AWS created a cloud abstraction of the data center that allows you to build in this level of resiliency without being exposed to the details of data center location. The abstraction is the availability zone. AWS availability zones An AWS availability zone is an isolated set of cloud resources that lets you build a certain level of isolation into your applications. Resources within a single availability zone may be physically or virtually near each other, to the extent that they can be dependent on each other and share subcomponents with each other. For example, two EC2 servers that are in the same availability zone may be in the same data center, in the same rack, or even on the same physical server. However, cloud resources that are in different availability zones are guaranteed to be separated into distinct data centers. They cannot be in the same data center, they cannot be in the same rack, and they cannot be using the same physical servers. They are distinct and independent from each other. Hence the solution to the resiliency problem: build your application to live in multiple availability zones. 
If you construct your application so that its instances are distributed across multiple availability zones, you can isolate yourself from hardware failures such as server failures, rack failures, and even entire data center failures....
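The distribution idea in this segment can be sketched without any cloud SDK: spread the fleet round-robin across the zones so that losing any one zone removes only roughly its share of capacity. The zone names follow AWS's naming convention, but the helper below is illustrative, not an AWS API:

```python
from itertools import cycle

def distribute(instance_ids: list[str], zones: list[str]) -> dict[str, list[str]]:
    """Assign instances to availability zones round-robin, so no single
    zone (and therefore no single data center) holds more than its share."""
    placement: dict[str, list[str]] = {z: [] for z in zones}
    for instance, zone in zip(instance_ids, cycle(zones)):
        placement[zone].append(instance)
    return placement

zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
fleet = [f"i-{n:04d}" for n in range(7)]
placement = distribute(fleet, zones)
# Losing any one zone takes out at most 3 of the 7 instances,
# leaving the majority of capacity to serve traffic.
```

In practice a load balancer or auto scaling group does this placement for you, but the property it provides is exactly this one: no zone failure removes more than a bounded fraction of the fleet.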
This is a special edition of Modern Digital Applications. July 9th, 2018 marked the launch of a podcast episode. It was an episode of the “https://blog.newrelic.com/tag/modern-software-podcast/ (Modern Software Podcast)”, a podcast sponsored by https://newrelic.com/ (New Relic), and hosted by New Relic’s https://www.linkedin.com/in/fredricpaul/ (Fredric Paul) and https://www.linkedin.com/in/toriwieldt/ (Tori Wieldt). This particular episode was titled “https://leeatchison.com/2018/07/11/the-great-serverless-debate/ (The Great Serverless Debate)”. It was a debate between myself and a good friend of mine, https://www.linkedin.com/in/smithclay/ (Clay Smith). Clay and I were guests on the show. That episode was a huge success, and I still get asked questions about it today. It seemed to me that it was time for an update of that debate…a redux if you will. Since New Relic’s Modern Software Podcast isn’t active right now, I thought I would take on the challenge myself and host a redo of the great debate — based on what we know about serverless in 2020. So, on February 21, 2020, Clay and I got together for an update to our views on the world of serverless. This is The Great Serverless Debate, Redux. This is the second part of that interview. 
Links and More Information The following are links mentioned in this episode, and links to related information: Modern Digital Applications Website (https://mdacast.com/ (https://mdacast.com)) Lee Atchison Articles and Presentations (https://leeatchison.com/ (https://leeatchison.com)) Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com/ (https://architectingforscale.com)) The Great Serverless Debate (https://leeatchison.com/2018/07/11/the-great-serverless-debate/ (https://leeatchison.com/2018/07/11/the-great-serverless-debate/)) Clay Smith - Twitter (https://twitter.com/smithclay (https://twitter.com/smithclay)) Clay Smith - LinkedIn (https://www.linkedin.com/in/smithclay/ (https://www.linkedin.com/in/smithclay/)) Clay Smith’s Monitoring Monitoring Newsletter (https://monitoring2.substack.com/ (https://monitoring2.substack.com/)) Modern Software Podcast - Fredric Paul @TheFreditor (https://twitter.com/TheFreditor (https://twitter.com/TheFreditor)) Modern Software Podcast - Tori Wieldt @ToriWieldt (https://twitter.com/ToriWieldt (https://twitter.com/ToriWieldt)) Lee’s Guest — Clay Smith https://www.linkedin.com/in/smithclay/ (Clay Smith) is a good friend of mine. He was a senior software engineer at several early-stage startup companies and has been building serverless solutions for many years now, from mobile backends to real-time APIs with Amazon Web Services. Clay was a senior Developer Evangelist at New Relic, which is where Clay and I met. Clay’s newsletter is “Monitoring Monitoring”. You can subscribe to the newsletter at https://monitoring2.substack.com/ (https://monitoring2.substack.com/). Questions/Issues Discussed Is Lambda living up to the hype? Is there an end to the hype anytime soon? Has Fargate lived up to the hype? What is the role of containers vs FaaS? What is the role of Kubernetes? What types of problems are suited for FaaS and what kind of problems are not? 
How good were our guesses in 2018 for the state of serverless in 2020? What will be the state of serverless in 2022? We need a better term for FaaS than the generic term “serverless” Use of FaaS as a glue between external services Quotes Lee: “One of the problems with Lambda is…you are not making your service boundaries based on functionality, you are making service boundaries based on the limitation of the technology.” Clay: “It’s been a great two years for event driven architectures in general.” Clay: “There’s a new class of startups giving a Heroku type experience that look very interesting…” Lee: “Lambda is very good…for the hook and glue use case” [system integration] Clay: “API Glue” Clay: “We both definitely agree that the glue aspect is still the...
The second edition of my book, Architecting for Scale, is now available! Links and More Information Book Website (https://architectingforscale.com (https://architectingforscale.com)) Amazon.com (https://www.amazon.com/Architecting-Scale-Maintain-Availability-Manage/dp/1492057177/ (https://www.amazon.com/Architecting-Scale-Maintain-Availability-Manage/dp/1492057177/)) O’Reilly (http://shop.oreilly.com/product/0636920274308.do (http://shop.oreilly.com/product/0636920274308.do)) Special Report Hello everyone and welcome to Modern Digital Applications. This is just a very quick news break... The second edition of my book, Architecting for Scale, is now available! This edition is dramatically updated and improved. It includes new content on cloud computing, microservices, and serverless computing. All the content has been updated and it has been significantly reorganized. The book is now available for purchase on http://amazon.com (amazon.com), other technical bookstores, or directly from O’Reilly Media. It’s also included as part of your O’Reilly Safari subscription. Check it out! If you have any questions or comments, feel free to reach out to me. Links to the websites where you can find out more about the book or purchase it are contained in the shownotes for this episode. Thank you and enjoy! This podcast uses the following third-party services for analysis: Chartable - https://chartable.com/privacy Podtrac - https://analytics.podtrac.com/privacy-policy-gdrp
This is a special edition of Modern Digital Applications. July 9th, 2018 marked the launch of a podcast episode. It was an episode of the “https://blog.newrelic.com/tag/modern-software-podcast/ (Modern Software Podcast)”, a podcast sponsored by https://newrelic.com (New Relic), and hosted by New Relic’s https://www.linkedin.com/in/fredricpaul/ (Fredric Paul) and https://www.linkedin.com/in/toriwieldt/ (Tori Wieldt). This particular episode was titled “https://leeatchison.com/2018/07/11/the-great-serverless-debate/ (The Great Serverless Debate)”. It was a debate between myself and a good friend of mine, https://www.linkedin.com/in/smithclay/ (Clay Smith). Clay and I were guests on the show. That episode was a huge success, and I still get asked questions about it today. It seemed to me that it was time for an update of that debate…a redux if you will. Since New Relic’s Modern Software Podcast isn’t active right now, I thought I would take on the challenge myself and host a redo of the great debate — based on what we know about serverless in 2020. So, on February 21, 2020, Clay and I got together for an update to our views on the world of serverless. This is The Great Serverless Debate, Redux. 
Links and More Information The following are links mentioned in this episode, and links to related information: Modern Digital Applications Website (https://mdacast.com (https://mdacast.com)) Lee Atchison Articles and Presentations (https://leeatchison.com (https://leeatchison.com)) Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com (https://architectingforscale.com)) The Great Serverless Debate (https://leeatchison.com/2018/07/11/the-great-serverless-debate/ (https://leeatchison.com/2018/07/11/the-great-serverless-debate/)) Clay Smith - Twitter (https://twitter.com/smithclay (https://twitter.com/smithclay)) Clay Smith - LinkedIn (https://www.linkedin.com/in/smithclay/ (https://www.linkedin.com/in/smithclay/)) Clay Smith’s Monitoring Monitoring Newsletter (https://monitoring2.substack.com/ (https://monitoring2.substack.com/)) Modern Software Podcast - Fredric Paul @TheFreditor (https://twitter.com/TheFreditor (https://twitter.com/TheFreditor)) Modern Software Podcast - Tori Wieldt @ToriWieldt (https://twitter.com/ToriWieldt (https://twitter.com/ToriWieldt)) Lee’s Guest — Clay Smith https://www.linkedin.com/in/smithclay/ (Clay Smith) is a good friend of mine. He was a senior software engineer at several early-stage startup companies and has been building serverless solutions for many years now, from mobile backends to real-time APIs with Amazon Web Services. Clay was a senior Developer Evangelist at New Relic, which is where Clay and I met. Clay’s newsletter is “Monitoring Monitoring”. You can subscribe to the newsletter at https://monitoring2.substack.com/ (https://monitoring2.substack.com/). Questions/Issues Discussed Is Lambda living up to the hype? Is there an end to the hype anytime soon? Has Fargate lived up to the hype? What is the role of containers vs FaaS? What is the role of Kubernetes? What types of problems are suited for FaaS and what kind of problems are not? 
How good were our guesses in 2018 for the state of serverless in 2020? What will be the state of serverless in 2022? We need a better term for FaaS than the generic term “serverless” Use of FaaS as a glue between external services Quotes Lee: “One of the problems with Lambda is…you are not making your service boundaries based on functionality, you are making service boundaries based on the limitation of the technology.” Clay: “It’s been a great two years for event driven architectures in general.” Clay: “There’s a new class of startups giving a Heroku type experience that look very interesting…” Lee: “Lambda is very good…for the hook and glue use case” [system integration] Clay: “API Glue” Clay: “We both definitely agree that the glue aspect is still the killer use case…” Clay: “Greatest...
In this episode: modern web apps. Our customers depend on them, and our businesses depend on them. Without modern web apps, most businesses would not survive. This is the second in a two-part series on the principles for modernizing your enterprise web application. And then, what is STOSA and how does it help with modern digital applications? Links and More Information The following are links mentioned in this episode, and links to related information: Modern Digital Applications Website (https://mdacast.com (https://mdacast.com)) Lee Atchison Articles and Presentations (https://leeatchison.com (https://leeatchison.com)) Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com (https://architectingforscale.com)) STOSA — Single Team Oriented Service Architecture (https://stosa.org (https://stosa.org)) Main Story - Five Principles for Modernizing Your Enterprise Web Applications, Part 2 Modern web applications have a lot riding on them. Our customers depend on them, and our business depends on them. Without modern web applications, many businesses would not survive. Modern web applications must scale to meet our biggest needs without suffering outages or other availability issues. In order to meet and exceed this high bar, we must build, manage, and monitor our applications using modern principles, processes, and procedures. There are five guiding principles that a modern application requires in order to meet these expectations. In previous episodes, we discussed principle 1, using service-based architectures; principle 2, organizing teams around services; and principle 3, using DevOps processes and procedures. In this episode, we will finish the remaining two principles. Starting with… Principle #4. Use dynamic infrastructures. Customer traffic loads on our applications vary considerably. The upper bound or maximum amount of traffic your application needs to support can never be known accurately. 
In the past, as application traffic grew, we simply threw additional hardware at the problem. But as traffic variations increase and customer usage patterns become more complicated, this simple solution is no longer reasonable. Simply adding hardware may be fine to handle expected peaks, but what do you do when an unexpected spike in usage arrives? When I was at Amazon, we had a term for this. It was called the “Johnny Carson Effect”. Its origin comes from the day that Johnny Carson died. On that day, there was a huge uptick in demand for any videotape of old episodes of Johnny Carson’s The Tonight Show. Demand was huge, and given the suddenness of the event, it was all unexpected demand. Nowadays, the term “going viral” is both a positive and a negative for most businesses. Going viral means huge attention given to you and your business. But it also means huge and unexpected traffic volume. The last thing you want is for your application to fail right at the point when everyone is watching. How can you handle the surge from such an unexpected event? It is no longer possible to throw large quantities of hardware at an application. You never know how much hardware is enough to handle your maximum possible need. Additionally, when your application is not operating at a high volume, what do you do with the extra hardware? Typically, it’s sitting idle...which is a huge waste of resources...and even more importantly...a waste of money. Especially if your traffic needs are highly variable or highly spiky, simply adding hardware to handle your peaks is not an effective use of resources. Instead, you must add resources to your applications dynamically, as they are needed and when they are needed. These resources can be applied when your traffic needs are high, and they can be released when they are no longer needed. This is what dynamic infrastructures are all about. 
The only way to implement a true dynamic infrastructure for an application with a highly variable load is...
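One common way to implement "add resources when needed, release them when not" is a target-tracking rule: size the fleet so that average per-instance utilization moves back toward a target. The toy calculation below is illustrative only; it is not any cloud provider's API, and the thresholds are invented:

```python
def desired_capacity(current: int, avg_utilization: float,
                     target: float = 0.60,
                     min_n: int = 2, max_n: int = 100) -> int:
    """Target tracking in one line: scale the fleet so average
    per-instance utilization moves toward the target, clamped to
    a floor (for resiliency) and a ceiling (for cost control)."""
    if current == 0:
        return min_n
    wanted = round(current * (avg_utilization / target))
    return max(min_n, min(max_n, wanted))

# A viral spike pushes 10 instances to 90% CPU: scale out to 15.
# When traffic fades to 30% CPU, release the extras and fall to 5.
```

A real autoscaler adds damping (cooldowns, scale-in limits) so a single noisy measurement does not thrash the fleet, but the core decision is this proportion.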
In this episode: modern web apps. Our customers depend on them, and our businesses depend on them. Without modern web apps, most businesses would not survive. I’ll present the first in a two-part series on the principles for modernizing your enterprise web application. And then, what does any of this have to do with my son and his wage reporting application? Links and More Information The following are links mentioned in this episode, and links to related information: Modern Digital Applications Website (https://mdacast.com (https://mdacast.com)) Lee Atchison Articles and Presentations (https://leeatchison.com (https://leeatchison.com)) Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com (https://architectingforscale.com)) Main Story - Five Principles for Modernizing Your Enterprise Web Applications Modern web applications have a lot riding on them. Our customers depend on them, and our business depends on them. Without modern web applications, many businesses would not survive. Modern web applications must scale to meet our biggest needs without suffering outages or other availability issues. In order to meet and exceed this high bar, we must build, manage, and monitor our applications using modern principles, processes, and procedures. There are five guiding principles that a modern application requires in order to meet these expectations. Principle #1. Use service-based architectures. Modern applications are large and complex, too complex to be handled as a single entity by a single team. Instead, multiple teams are required to develop, test, support, and operate these applications. This complex task is impossible when the application is a single monolith. To handle a large complex application, split the application into multiple independent services. Then, you can assign different pieces of the application to different teams, allowing parallel development and operation. 
When modules are kept isolated, they can be built, tested, and operated in isolation. Application problems can be more easily correlated and isolated to individual services, making it easier to decide which team should be called on to work on an issue when a problem arises. Scaling is not just about the amount of traffic an application receives, but about the size and complexity of the application itself. Using service-based architectures makes scaling the application easier and allows larger and more complex applications to be dealt with reasonably. In future segments, we will talk about the remaining four guiding principles that are required for enterprise applications to modernize. In the next segment, principle #2 is about team organization. Our applications *are* getting more complex, and becoming more intertwined with the fundamental operation of our business. As such, the expectations of our customers are growing, and the demands for reliability, scalability, and functionality from management are keeping pace. Only by modernizing our applications can we make our applications meet the needs of our customers and our business. Principle #2. Organize teams around services. Architecting your application as a service-based application is only part of the answer to modernizing your enterprise systems. Once you have your application split up into services, it is essential that you structure your development teams around those services. It is critical that a single team owns each service. And when I say “own”, I mean complete ownership...front to back...top to bottom. This includes development, testing, operation, and support. All aspects of the development and operation of each service should be handled by one and only one team. There is a model for application management that stresses these ownership values. 
STOSA, which stands for Single Team Oriented Service Architecture, provides guiding principles on team-level ownership, establishing clear boundaries between services and promoting clear understanding and expectations
In this episode, the hardest part of your cloud migration is moving your data to the cloud. Moving your data to the cloud, without suffering planned or unplanned downtime, can be a challenge. I’m going to give you three strategies that will help you avoid downtime during the migration of your critical application data to the cloud. And in Tech Tapas, we are going to take a look at what it means to fly two mistakes high, and how that relates to application availability. Links and More Information The following are links mentioned in this episode, and links to related information: 3 Strategies to Avoid Downtime When Migrating Data to the Cloud (https://blog.newrelic.com/engineering/migrating-data-to-cloud-avoid-downtime-strategies/ (https://blog.newrelic.com/engineering/migrating-data-to-cloud-avoid-downtime-strategies/)) Modern Digital Applications Website (https://mdacast.com (https://mdacast.com)) Lee Atchison Articles and Presentations (https://leeatchison.com (https://leeatchison.com)) Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com (https://architectingforscale.com)) Main Story - 3 Strategies to Avoid Downtime when Migrating Data to the Cloud Moving your data is one of the trickiest parts of a cloud migration. During the migration, the location of your data can have a significant impact on the performance of your application. During the data transfer, keeping the data intact, in sync, and self-consistent requires either tight coordination or—worse—application downtime. Moving your data and the applications that use the data at the same time is necessary to keep your application performance acceptable. Deciding how and when to migrate your data relative to your services, though, is a complex question. Often companies will rely on the expertise of a migration architect, which is a role that can greatly contribute to the success of any cloud migration. 
Whether you have an on-staff cloud architect or not, there are three primary strategies for migrating application data to the cloud: Offline copy migration Master/read replica switch migration Master/master migration It doesn’t matter if you’re migrating a SQL database, a NoSQL database, or simply raw data files—each migration method requires a different amount of effort, has a different impact on your application’s availability, and presents a different risk profile for your business. Strategy 1: Copy Data While Application is Offline An offline copy migration is the most straightforward method. Bring down your on-premises application, copy the data from your on-premises database to the new cloud database, then bring your application back online in the cloud. An offline copy migration is simple, easy, and safe, but you’ll have to take your application offline to execute it. If your dataset is extremely large, your application may be offline for a significant period of time, which will undoubtedly impact your customers and business. For most applications, the amount of downtime required for an offline copy migration is generally unacceptable. But if your business can tolerate some downtime, and your dataset is small enough, you should consider this method. It’s the easiest, least expensive, and least risky method of migrating your data to the cloud. Strategy 2: Read Replica Switch The goal of a read replica switch migration is to reduce application downtime without significantly complicating the data migration itself. For this type of migration, you start with the master version of your database running in your on-premises data center. You then set up a read replica copy of your database in the cloud with one-way synchronization of data from your on-premises master to your read replica. At this point, you still make all data updates and changes to the on-premises master, and the master synchronizes those changes with the cloud-based read replica. 
The master-replica model is common in most database systems. You’ll...
Google Cloud Next is coming to the Moscone Center in San Francisco from April 6 until April 8. This is Google's big cloud event of the year.

Links and More Information
The following are links mentioned in this episode, and links to related information:
* Announcement — Google Cloud Next (https://cloud.google.com/blog/topics/google-cloud-next/join-us-at-google-cloud-next-2020)
* Registration — Google Cloud Next (https://cloud.withgoogle.com/next/sf). Use code GRPABLOG2020 for $500 USD off a full-price ticket.
* Modern Digital Applications Website (https://mdacast.com)
* Lee Atchison Articles and Presentations (https://leeatchison.com)
* Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)

Special Report
Hello everyone and welcome to Modern Digital Applications. This is just a very quick news break... Those of you using Google Cloud will want to hear this. Google Cloud Next is coming to the Moscone Center in San Francisco from April 6 until April 8. This is Google's big cloud event of the year. According to the information provided by Google: Google Cloud Next brings together a global cloud community of leaders, developers, and influencers to help you get inspired and solve your most pressing business challenges. If you would like to attend, check the show notes for a link to the registration page. Also in the show notes is a code good for $500 off a full-price ticket. You can expect Google to have a host of new product announcements around this time, so even if you don't attend, listen up in April for more news from Google Cloud. This podcast uses the following third-party services for analysis: Chartable - https://chartable.com/privacy Podtrac - https://analytics.podtrac.com/privacy-policy-gdrp
In this episode, we conclude our three-part series on Service Tiers, and how they can be used to prevent disasters in applications using service-based architectures. We also take a look at the very first AWS service. Any guess what that service is? Finally, what does redundancy look like in outer space? And just how effective were the space shuttle's application systems at keeping the shuttle safe?

Links and More Information
The following are links mentioned in this episode, and links to related information:
* Modern Digital Applications Website (https://mdacast.com)
* Lee Atchison Articles and Presentations (https://leeatchison.com)
* How Service Tiers Can Help to Avoid Microservices Disasters (https://thenewstack.io/how-service-tiers-can-help-to-avoid-microservices-disasters/)
* Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)

Main Story
How to Use Service Tiers
So, now that we've defined service tiers, how do you use them? Service tiers have two distinct uses: helping determine the required responsiveness to problems, and setting requirements for dependencies between individual services.

Responsiveness
Let's first talk about responsiveness. The service tier level of a service can be used to determine how quickly a problem with that service should be addressed. Of course, the higher the significance of a problem, the faster it should be addressed. But, in general, ***the lower the service tier number, the more important the problem likely is...and therefore the faster it should be addressed***. A low-to-medium severity problem in a Tier 1 service is likely more important and impactful than a high severity problem in a Tier 4 service.
Given this, you can use the service tier, in conjunction with the severity of the problem, to determine how fast a response your team should have to a problem. Should we be alerted immediately, 24 hours a day, 7 days a week, and fix the problem no matter what time of day or night? Is this a problem that can wait until the next morning? Is it a problem we can add to a queue and fix when we get to it in our overall list of priorities? Or should we simply add it to our backlog for future consideration?

Service tiers, in conjunction with problem severity, can give you the right procedural guidelines for how to handle a problem. You can use them to set SLAs on service responsiveness. You can even use them to set availability SLAs for your services. For example, you could create a policy that says all Tier 1 services need to have an availability of 99.95%. This might dictate that all high severity problems must be resolved within 2 hours of identification, meaning you must have an on-call support team available 24 hours a day, 7 days a week, and that support team must have enough knowledge and experience to fix any serious problem that arises. This would likely mean the owning development team would need to staff the support rotation for this service. Meanwhile, a Tier 3 service might have an availability SLA of only 99.8%, and a Tier 4 service might not have an availability SLA at all. This would mean that all but the most serious problems could probably wait until the next business day, so an on-call support role may not be needed, or may not need to be as formal or have as tight mean-time-to-repair goals.

Service tiers help set policy on responsiveness requirements for your services, which can then dictate many requirements for your other policies and procedures.

Interservice Dependencies
Now, let's talk about how service tiers can help with interservice dependencies.
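The relationship between an availability SLA and the downtime it leaves you is simple arithmetic. A quick Python sketch; the tier-to-SLA mapping mirrors the examples above and is not a standard:

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Downtime budget implied by an availability SLA over a given period."""
    return (1 - availability_pct / 100) * days * 24 * 60


# Hypothetical tier-to-availability policy, echoing the examples above.
TIER_SLA = {1: 99.95, 3: 99.8}  # Tier 4 might carry no availability SLA at all

print(round(allowed_downtime_minutes(TIER_SLA[1]), 1))  # Tier 1: 21.6 min/month
print(round(allowed_downtime_minutes(TIER_SLA[3]), 1))  # Tier 3: 86.4 min/month
```

Seen this way, a 99.95% Tier 1 SLA leaves only about 22 minutes of downtime a month, which is why a 2-hour resolution target and a 24x7 on-call rotation follow almost automatically from the tier label.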
Given that services at different service tier levels have different responsiveness requirements, this impacts your dependency map between services and...
Amazon recently announced a reduction in the cost of their Elastic Kubernetes Service. They lowered the cost of the Amazon EKS service by 50%. The new pricing is greatly appreciated, but is still a barrier to entry in using the EKS service for small clusters.

Links and More Information
The following are links mentioned in this episode, and links to related information:
* AWS EKS Pricing (https://aws.amazon.com/blogs/aws/eks-price-reduction/)
* AWS EKS (https://aws.amazon.com/eks/)
* AWS ECS (https://aws.amazon.com/ecs/)
* Modern Digital Applications Website (https://mdacast.com)
* Lee Atchison Articles and Presentations (https://leeatchison.com)
* Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)

Special Report
Amazon recently announced a reduction in the cost of their Elastic Kubernetes Service. As of January 21, 2020, the price of Amazon EKS is lowered by 50%. The previous price for using the EKS service was 20 cents per hour for each Kubernetes cluster managed, which works out to approximately $144 per month. The new price for running an EKS cluster is 10 cents per hour, or approximately $72 per month. Note that this is the price to operate the cluster itself, regardless of the number of workers it contains. You must still pay for the computation costs (either EC2 instance hours or Fargate compute resources) for the cluster to be useful. The cost reduction is for the "overhead" cost of maintaining the cluster, which covers Amazon's cost of managing it. This price has always been an issue with the EKS service. The non-Kubernetes Amazon container management service, ECS, has no such overhead cost; all you pay for is the cost of running the workers. Only EKS has this overhead.
So, the reduction of this overhead cost should be seen as a significant reduction in the barrier to entry for the EKS service. While I applaud AWS for taking this step, I still believe the overhead cost is too high, and it still makes EKS less attractive than ECS for small clusters. The cost is not an issue for large clusters, since you can amortize it over the size of the cluster. But if you run a large number of small clusters, the overhead can be significant. For example, if you are running 100 clusters, the overhead cost is around $87,000 per year. That's a lot less than the $175,000 it was before the price change, but it's still quite a bit of overhead.

That said, I applaud AWS for making this price change. It reflects a consistent strategy of theirs: pass on cost savings to customers as their own costs go down due to volume. It's a strategy very much appreciated by their customers. A link to the announcement is in the show notes, along with links to the EKS and ECS services.
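The dollar figures in this story are straightforward to reproduce. A quick sanity check in Python, using the per-cluster-hour prices quoted above:

```python
HOURS_PER_MONTH = 24 * 30   # ~720, the approximation behind the monthly figures
HOURS_PER_YEAR = 24 * 365   # 8,760

OLD_RATE = 0.20  # USD per cluster-hour before the price cut
NEW_RATE = 0.10  # USD per cluster-hour after the 50% reduction

print(round(OLD_RATE * HOURS_PER_MONTH))       # ~$144/month per cluster before
print(round(NEW_RATE * HOURS_PER_MONTH))       # ~$72/month per cluster after
print(round(NEW_RATE * HOURS_PER_YEAR * 100))  # ~$87,600/year for 100 clusters
print(round(OLD_RATE * HOURS_PER_YEAR * 100))  # ~$175,200/year previously
```

The rounded annual figures match the "$87,000 versus $175,000" comparison in the episode.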
Hello and welcome to Modern Digital Applications. In this episode, we continue our three-part series on Service Tiers, and how they can be used to prevent disasters in applications using service-based architectures. We also take another look at Amazon S3, and ask the question: how large is S3? The answer might surprise you. All of this, in this episode of Modern Digital Applications.

Links and More Information
The following are links mentioned in this episode, and links to related information:
* Modern Digital Applications Website (https://mdacast.com)
* Lee Atchison Articles and Presentations (https://leeatchison.com)
* How Service Tiers Can Help to Avoid Microservices Disasters (https://thenewstack.io/how-service-tiers-can-help-to-avoid-microservices-disasters/)
* Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)

News
Links to stories mentioned in the news section:
* DevOps World (https://www.cloudbees.com/devops-world)

Main Story
In the last episode, we introduced the concept of service tiers, and we talked specifically about Tier 1, the label for the most critical, highest priority services in your application. In this episode, we're going to define the remaining service tier levels.

In review, a Tier 1 service is mission critical to your application. If a Tier 1 service is down, your application is down. A Tier 1 service is one whose failure has a significant impact on your customers' and/or your business's bottom line.

After Tier 1, the next level is Tier 2. Tier 2 services can have an impact on your business, but are typically less critical than Tier 1 services.
A Tier 2 service is one where a failure can degrade the customer experience in a noticeable and meaningful way, but does not completely prevent your customers from interacting with your system. A great example of a Tier 2 service is a search service: a service that handles your customers' requests to search for products, product information, or other content in your application. If the search box on your website stopped working, it would have a major impact on customers, especially those depending on search to find something on your site, but it would not make the application completely unusable. Compare this to a Tier 1 service, such as the login service. If customers can't log in to the application, they can't do anything. So a Tier 1 failure keeps the customer from doing anything, while a Tier 2 failure means they can't perform some important functionality, but other parts of the application are still available to them.

Tier 2 services can also be services that impact your backend business processes in significant ways that may not be directly noticeable to your customers. For example, if a service that manages order processing in a fulfillment center fails, this would have a significant impact on your ability to fulfill orders in a timely manner. It would affect your business, and potentially your customers if a package arrives late as a result, but it doesn't bring down your website or application, and customers can still do things such as create new orders or check the status of existing orders. A customer may not even notice this failure, yet it could still impact them. That's Tier 2.

Tier 3
The next level is Tier 3. A Tier 3 service is one whose failure can have a minor, unnoticeable, or difficult-to-notice customer impact, or limited effects on your business and systems.
The key words here are: minor, difficult-to-notice, and limited. The best
Show Notes
Welcome to the inaugural episode of Modern Digital Applications! I'm very glad you are listening, and I hope you'll find this podcast informative and helpful. My goal is to keep the episodes short, so that they can be consumed during a single average morning commute. Please let me know how I am doing and what I can do to improve. But let's get started. In this episode, we begin a three-part series on Service Tiers, and how they can be used to prevent disasters in applications using service-based architectures. We also take a look at Amazon S3, and the history of SaaS.

Links and More Information
The following are links mentioned in this episode, and links to related information:
* Modern Digital Applications Website (https://mdacast.com)
* Lee Atchison Articles and Presentations (https://leeatchison.com)
* How Service Tiers Can Help to Avoid Microservices Disasters (https://thenewstack.io/how-service-tiers-can-help-to-avoid-microservices-disasters/)
* Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com)

Main Story
Bringing down an entire application is easy. All it takes is the failure of a single service, and the entire set of services that make up the application can come crashing down like a house of cards. Just one minor error in a non-critical service can be disastrous to the entire application. There are, of course, many ways to prevent dependent services from failing. However, adding extra resiliency to non-critical services also adds complexity and cost, and sometimes that extra cost is not needed. What if a service, let's call it Service A, is consuming another service, let's call it Service B?
If the called service, Service B, is not critical to the operation of the calling service, Service A, then why should Service A fail if Service B has a problem? Surely we should be able to build Service A so that it can survive the failure of Service B. And if Service B is not critical to the functioning of Service A, does Service B need the same level of resiliency as Service A? No, of course not.

As we build the dependency map between each of our services and the services they depend on, we will find that some of the dependencies are critical dependencies, and some are non-critical dependencies. How do we determine which are which? One important tool for making this determination is something called service tiers.

What Are Service Tiers?
A service tier is simply a label associated with a service that indicates how critical that service is to the operation of your business. Service tiers let you distinguish between services that are mission critical and those that are useful and helpful but not essential. Service tiers can be used to determine whether the interaction between dependent services is a critical dependency or a non-critical one. They are a great way to help you prioritize where and how you invest in making your services, and their dependencies, more resilient. This allows you to build higher-scale applications that are much more highly available for the same amount of effort. To see how this works, let's look at the various service tier labels and how to determine which label to apply to which service.

Assigning Service Tiers
All services in your system, no matter how big or small, should be assigned a service tier. In the model of service tiers that I use and recommend, there are four distinct tiers, four distinct levels if you will, that allow you to specify the criticality of a service. Let's talk about each of these four levels.
Tier 1
The highest...
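The critical-versus-non-critical dependency question from earlier can be reduced to comparing tier numbers. A hedged Python sketch; the rule of thumb is my illustration, not a formal definition from the episode:

```python
def is_critical_dependency(caller_tier: int, callee_tier: int) -> bool:
    """Lower tier number = more critical service.

    Treat a call to an equally or more critical service as a critical
    dependency; a call 'down' to a less critical service should be built
    as non-critical, guarded with timeouts, fallbacks, or cached defaults.
    """
    return callee_tier <= caller_tier


# A Tier 1 checkout service calling a Tier 1 login service: a legitimate
# critical dependency.
print(is_critical_dependency(caller_tier=1, callee_tier=1))  # True

# A Tier 1 checkout service calling a Tier 3 recommendations service:
# checkout must survive a recommendations outage, so guard this call.
print(is_critical_dependency(caller_tier=1, callee_tier=3))  # False
```

The payoff of labeling every service this way is that the "should Service A survive a failure of Service B?" question stops being a per-incident debate and becomes a mechanical check of the dependency map.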
Modern Digital Applications is a podcast focused on helping corporate decision makers, executives, and architects create or extend their digital business with the help of modern applications, processes, and software strategy. The podcast is hosted by Lee Atchison (https://leeatchison.com/?utm_source=shownotes&utm_medium=notes&utm_campaign=000), a recognized industry thought leader and pundit in cloud computing. Lee has over 30 years of industry experience and has been working on cloud computing since the early days of the cloud. He has committed his career to architecting and building high-scale, cloud-based, service-oriented SaaS applications, with specific expertise in building highly available systems. Lee has consulted with leading organizations on how to modernize their application architectures and transform their organizations at scale, including optimizing for cloud platforms, utilizing service-based architectures, implementing DevOps practices, and designing for high availability. This experience led him to write his book Architecting for Scale, published by O'Reilly Media (https://architectingforscale.com/?utm_source=shownotes&utm_medium=links&utm_campaign=000). Lee is widely quoted in publications such as InfoWorld, Diginomica, IT Brief, ProgrammableWeb, and CIO Review, and has been a featured and keynote speaker at events across the globe, from London to Sydney, Tokyo to Paris, and all over North America.

In each episode, Lee will give you insights into some aspect of building, modernizing, or scaling your digital applications, and the teams and culture that drive them. Episodes will contain feature stories, new and noteworthy industry information, and some Tech Tapas: little bits of technology knowledge and data. We'll also feature interviews with technology experts and other industry pundits. Each episode is designed to be short and concise, easily digestible on a single morning commute. This is ...
Modern Digital Applications.

Links and More Information
* Lee Atchison Articles and Presentations (https://leeatchison.com/?utm_source=shownotes&utm_medium=links&utm_campaign=000)
* Architecting for Scale — O'Reilly Media (https://architectingforscale.com/?utm_source=shownotes&utm_medium=links&utm_campaign=000)
* Modern Digital Applications Podcast (https://mdacast.com/?utm_source=shownotes&utm_medium=links&utm_campaign=000)