The mission of the FailSafe Initiative is to articulate recommendations and approaches for delivering resilient and scalable cloud architectures. This series, developed by Microsoft Services and the Azure Customer Advisory Team (CAT), delivers general guidance on building resilient and scalable cloud…
The limit of scale and availability is the limit of insight. This session, delivered by Mark Simms, explores options and implementation approaches for instrumenting and monitoring applications on Azure. The session also covers configuration and change management practices for delivering a manageable application.
In a world where dependent services and platforms are not 100% available, architects must accept that, given enough time and pressure, all things will fail. This session, delivered by Mark Simms, explores how to design for failure and how to understand the impact and design choices for distributed services. It covers failure at the node, service, and regional levels.
This session, delivered by Mark Simms, explores the elements of scalability: partition, optimize, and shift. The session also provides an understanding of the capacity limits of Azure platform services and of scalability approaches.
This session, delivered by Mark Simms, will provide an understanding of the various options for data management in Azure and help you map system capabilities against solution requirements. The session will cover the wide range of options for managing data, compositional approaches, and how to balance competing priorities.
This session, delivered by Mark Simms, will provide an understanding of Azure platform capabilities and scalability targets. It also explores key intra-service communication patterns, discussing choices and tradeoffs. The session will cover core Azure platform capabilities, scale units, and approaches to achieving the best utilization/density. Intra-service and inter-service communication patterns are also covered.
Cloud services run 24x7 and at scale. Failures will happen. Good software will be instrumented to help operations and developers pinpoint the location of the issues. Great software will automate diagnosis, resolution, and verification based on known patterns. This session, delivered by Marc Mercuri, covers topics including cloud considerations for ALM, health modeling, troubleshooting workflows for self-healing systems, and telemetry and insight.
This session, delivered by Ulrich Homann and Marc Mercuri, includes coverage of scalability and deployment considerations for scalable and resilient systems. Topics include scale units, fault domains and upgrade domains, resilient fault handling and circuit breakers, data decomposition, data partitioning, caching, CDN, deployment redundancy, backups, latency, and hybrid considerations.
This session, delivered by Ulrich Homann and Marc Mercuri, continues the previous session's coverage of core concepts. Topics include functional decomposition, business architecture, throttling, failure points, and failure modes.
Architecting for the cloud means understanding and embracing its fundamental aspects. Commodity hardware at scale requires a scale-out rather than a scale-up approach. Commodity hardware fails, and very few services provide 100% uptime SLAs. This session, delivered by Ulrich Homann and Marc Mercuri, covers the core concepts and considerations for developing scalable, resilient applications in this environment, on which the rest of the course is built. Topics in part one include SLA considerations, resource constraints, decomposing by workload, lifecycle modeling, and availability modeling.
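
As a rough illustration of why SLA considerations and availability modeling matter, the sketch below estimates the composite availability of a request path that depends on several services in series. It assumes failures are independent, so the individual availabilities multiply. The service names and SLA figures are hypothetical, chosen only to show the arithmetic, and are not taken from the sessions or from any published Azure SLA.

```python
import math

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def composite_availability(availabilities):
    """Availability of a chain in which every dependency must be up.

    Assumes independent failures, so the availabilities multiply.
    """
    return math.prod(availabilities)


def expected_downtime_minutes(availability, period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Expected unavailable minutes over the given period."""
    return (1.0 - availability) * period_minutes


# Hypothetical figures for illustration only.
dependencies = {"web tier": 0.9995, "database": 0.999, "storage": 0.999}

total = composite_availability(dependencies.values())
print(f"composite availability: {total:.4%}")
print(f"expected downtime per 30-day month: {expected_downtime_minutes(total):.0f} minutes")
```

Under these assumptions the composite figure comes out lower than that of any single dependency, which is one reason a design that chains several services cannot simply inherit the SLA of its strongest component.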