{"id":9844,"date":"2021-11-18T12:00:32","date_gmt":"2021-11-18T11:00:32","guid":{"rendered":"http:\/\/www.orbit.cz\/?post_type=encyklopedie-cloudu&#038;p=9844"},"modified":"2024-10-31T14:32:09","modified_gmt":"2024-10-31T13:32:09","slug":"fault-injection-neboli-rozbij-si-to-sam","status":"publish","type":"encyklopedie-cloudu","link":"http:\/\/4.184.192.234\/en\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/","title":{"rendered":"Fault injection or break it yourself!"},"content":{"rendered":"<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"139\" src=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsad-300x139.png\" alt=\"Fault injection - break it yourself! | ORBIT Cloud Encyclopedia\" class=\"wp-image-9907\" style=\"width:534px;height:auto\" srcset=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsad-300x139.png 300w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsad-1024x475.png 1024w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsad-768x357.png 768w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsad.png 1120w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/figure>\n<\/div>\n\n<style>.wp-block-kadence-column.kb-section-dir-horizontal > .kt-inside-inner-col > .kt-info-box9844_897600-bf .kt-blocks-info-box-link-wrap{max-width:unset;}.kt-info-box9844_897600-bf .kt-blocks-info-box-link-wrap{background:#ffffff;padding-top:var(--global-kb-spacing-xs, 1rem);padding-right:var(--global-kb-spacing-xs, 1rem);padding-bottom:var(--global-kb-spacing-xs, 1rem);padding-left:0px;}.kt-info-box9844_897600-bf.wp-block-kadence-infobox{max-width:100%;}.kt-info-box9844_897600-bf .kadence-info-box-image-inner-intrisic-container .kadence-info-box-image-intrisic{padding-bottom:100%;max-width:100%;}.kt-info-box9844_897600-bf .kadence-info-box-icon-container .kt-info-svg-icon, .kt-info-box9844_897600-bf 
.kt-info-svg-icon-flip, .kt-info-box9844_897600-bf .kt-blocks-info-box-number{font-size:50px;}.kt-info-box9844_897600-bf .kt-blocks-info-box-media{border-radius:200px;overflow:hidden;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;}.kt-info-box9844_897600-bf .kt-infobox-textcontent p.kt-blocks-info-box-title{font-size:var(--global-kb-font-size-md, 1.25rem);padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;margin-top:0px;margin-right:0px;margin-bottom:10px;margin-left:0px;}.kt-info-box9844_897600-bf .kt-blocks-info-box-learnmore{background:transparent;border-width:0px 0px 0px 0px;padding-top:4px;padding-right:8px;padding-bottom:4px;padding-left:8px;margin-top:10px;margin-right:0px;margin-bottom:10px;margin-left:0px;}<\/style>\n<div class=\"wp-block-kadence-infobox kt-info-box9844_897600-bf orbit-testimonial-second\"><span class=\"kt-blocks-info-box-link-wrap info-box-link kt-blocks-info-box-media-align-left kt-info-halign-left\"><div class=\"kt-blocks-info-box-media-container\"><div class=\"kt-blocks-info-box-media kt-info-media-animate-none\"><\/div><\/div><div class=\"kt-infobox-textcontent\"><p class=\"kt-blocks-info-box-title\">For a long time, we said, \"If it works, don't touch it!\" It's simply better not to touch running technology, lest our road to failure be paved with good intentions. But in recent years we have a new approach - let's \"break\" our own systems on an ongoing basis so that we can handle real incidents later. Isn't that nonsense? If not, how to do it? And how does cloud technology help us do that?<\/p><p class=\"kt-blocks-info-box-text\"><strong>Kamil Kov\u00e1\u0159<\/strong><\/p><\/div><\/span><\/div>\n\n\n\n<p>The idea of generating targeted outages and errors on infrastructure is not new or foreign. 
Every large company that provides IT services at a professional level performs regular disaster recovery (DR) tests (which we have developed a tool to automate:&nbsp;<a href=\"https:\/\/taskcontrol.cz\/\" target=\"_blank\" rel=\"nofollow noopener\">TaskControl - Automation of real-time activity coordination<\/a>).<\/p>\n\n\n\n<p>In doing so, the company tests a&nbsp;<strong>large-scale outage<\/strong>, i.e. it takes a&nbsp;<strong>macroscopic approach<\/strong>. It shuts down one large part of the infrastructure, typically the systems in one datacenter, and brings up all services in an alternate location.<\/p>\n\n\n\n<p>I'm sure you run DR tests in your company too and you can rely on the robustness of the environment, right?<\/p>\n\n\n\n<p>Targeted&nbsp;<strong>small-scale outages<\/strong>, that is to say the&nbsp;<strong>microscopic approach<\/strong>, are a relatively new topic with origins in the first decade of this century. In the same way that we build systems with resilience to the failure of a datacenter, we naturally build them with resilience to the failure of a smaller unit: a virtual server, a container or an individual service. 
We use technologies that restart services either with a short&nbsp;<strong>outage<\/strong>&nbsp;or&nbsp;<strong>outage-free<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Infrastructure availability<\/strong><\/h2>\n\n\n\n<p>We have historically addressed&nbsp;<strong>infrastructure<\/strong>&nbsp;robustness:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>at the level of&nbsp;<strong>datacenters<\/strong>&nbsp;(dual power sources, grid cooling, control, etc.),<\/li>\n\n\n\n<li>at the level of&nbsp;<strong>hardware<\/strong>&nbsp;(redundant power supply, internal cooling, disk RAID, etc.),<\/li>\n\n\n\n<li>at the level of&nbsp;<strong>virtualization<\/strong>&nbsp;(OS or container).<\/li>\n<\/ul>\n\n\n\n<p>When using cloud services, all of the IT components listed above are&nbsp;<strong>the responsibility of the cloud provider<\/strong>. Despite all the redundancy, we get a relatively&nbsp;<strong>low availability guarantee<\/strong>.<\/p>\n\n\n\n<p>What does Amazon have to say about this?&nbsp;<em>AWS will use&nbsp;<\/em><strong><em>commercially reasonable efforts<\/em><\/strong><em>&nbsp;to make Amazon EC2 and Amazon EBS ... 
of at least 99.99%, in each case during any monthly billing cycle (the \"Service Commitment\").<\/em><\/p>\n\n\n\n<p>Microsoft, in its&nbsp;<a href=\"https:\/\/azure.microsoft.com\/en-us\/support\/legal\/sla\/summary\/\" target=\"_blank\" rel=\"nofollow noopener\">Service Level Agreements Summary | Microsoft Azure<\/a>, guarantees&nbsp;<strong>network availability<\/strong>&nbsp;of 99.9% for individual virtual machines.<\/p>\n\n\n\n<p>For details on infrastructure services in the cloud, I recommend the article by Jakub Proch\u00e1zka,&nbsp;<a href=\"http:\/\/4.184.192.234\/en\/encyklopedie-cloudu\/stavime-zaklady-infrastructure-as-a-service-iaas\/\" target=\"_blank\" rel=\"noopener\">Building the Foundations: Infrastructure as a Service<\/a>.<\/p>\n\n\n\n<p><em>(<\/em><a href=\"http:\/\/4.184.192.234\/en\/encyklopedie-cloudu\/\" target=\"_blank\" rel=\"nofollow noopener\"><em>ORBIT is also preparing a December special&nbsp;<\/em><\/a><em>on the topic of high availability in the cloud, with more details. To make sure you don't miss it, just follow the company's LinkedIn profile.)<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Software availability<\/strong><\/h2>\n\n\n\n<p>The ideal way to achieve really high availability is&nbsp;<strong>not to rely on infrastructure availability<\/strong>&nbsp;alone (although&nbsp;<strong>for less critical applications<\/strong>&nbsp;that is a sufficient approach, provided the guaranteed 99.9% matches the target SLA). 
Even there, of course, I have to take care of backups and a method for restoring the service.<\/p>\n\n\n\n<p>For critical applications, we therefore go on to achieve high&nbsp;<strong>availability at the software level<\/strong>, primarily in middleware (in the sense of cluster technologies at the application platform or database level) or in the code of the software solution itself.<\/p>\n\n\n\n<p>The goal is for the software to enable:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>redirecting or splitting requests<\/strong>&nbsp;on the integration, user, or data interface, usually by means of load balancer or network services,<\/li>\n\n\n\n<li>similar&nbsp;<strong>load distribution<\/strong>&nbsp;of calculations and transformations at the application layer,<\/li>\n\n\n\n<li><strong>redirection<\/strong>&nbsp;within active\/passive or active\/active clusters at the database layer.<\/li>\n<\/ul>\n\n\n\n<p>For example, we may end up with a neat architecture like this:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/architecture\/reference-architectures\/n-tier\/multi-region-sql-server\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"212\" src=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadassd-300x212.png\" alt=\"Azure Architecture | Fault injection - break it yourself! 
| ORBIT Cloud Encyclopedia\" class=\"wp-image-9909\" style=\"width:496px;height:auto\" srcset=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadassd-300x212.png 300w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadassd-1024x723.png 1024w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadassd-768x542.png 768w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadassd.png 1427w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption class=\"wp-element-caption\">Azure Architecture (microsoft.com)<\/figcaption><\/figure>\n<\/div>\n\n\n<p>These capabilities are supported by native cloud services such as scale sets, application gateways and load balancers, highly available databases and more.<\/p>\n\n\n\n<p>An important condition is the&nbsp;<strong>software's ability to work in a load distribution configuration<\/strong>, i.e. to be deployed as multiple redundant components performing the same operation.<\/p>\n\n\n\n<p>Some layers may use a stateless module that handles an incoming request regardless of the session history (i.e., the previous communication with the counterparty). Such a component does not retain any data and can be removed at any time.<\/p>\n\n\n\n<p>At other times we have to accept some loss of session continuity, if the module is stateful and keeps a running transaction in temporary memory.<\/p>\n\n\n\n<p>It is generally recommended to keep sessions in a fast distributed store such as&nbsp;<em>Amazon ElastiCache for Redis<\/em>&nbsp;or&nbsp;<em>Azure Cache for Redis<\/em>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>User experience in a classic example<\/strong><\/h2>\n\n\n\n<p>Let's imagine a situation where I order goods in an e-shop and recalculate the price in the shopping cart after adding another item. 
If the&nbsp;<strong>container with the stateless component<\/strong>&nbsp;that calculates the total price suddenly crashes, I will only notice that the action took three seconds instead of one, because the calculation call had to be repeated against another container. I proceed with the purchase as normal.<\/p>\n\n\n\n<p>When selecting the delivery method and location, I may find that the user interface still shows an empty selection after I have made my choice. This time a&nbsp;<strong>stateful component<\/strong>&nbsp;has failed, one that was responsible for location selection, the map data and the list of pickup points. Instead of a delivery box in our village being selected, an error was returned to the API, which I as a user will not know about. I go back to selecting the pickup point, and a different instance of the component now walks me through the same steps with only a slight degradation of the user experience.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How not to test on customers<\/strong><\/h2>\n\n\n\n<p>Here we reach the essence of why we use fault injection within the field of chaos engineering. We always perform&nbsp;<strong>functional and acceptance tests<\/strong>&nbsp;so that we can be sure that the application executes the code correctly.<\/p>\n\n\n\n<p>But we are not used to&nbsp;<strong>testing failures of individual redundant components<\/strong>, i.e. checking whether the behaviour of the application is still in line with the expected UX (user experience). In practice, we run this test with a large degree of uncertainty in production, directly on the customers of the service.<\/p>\n\n\n\n<p>That attitude is costing us:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Reputational risk<\/strong>&nbsp;if redundancy is insufficient and the UX is very poor. This is a very common case, known among others from government projects.<\/li>\n\n\n\n<li><strong>Unnecessary costs<\/strong>&nbsp;because we don't know how to set the target robustness. 
There is a risk that we have sized the individual application layers unevenly and unnecessarily generously. In cloud services, robustness is usually set by a few lines of code, which can nevertheless raise the cost by several orders of magnitude. Incidentally, you can read more about cloud costs&nbsp;<a href=\"http:\/\/4.184.192.234\/en\/encyklopedie-cloudu\/zkrotte-naklady-v-cloudu-predplatne-uctovani-sluzby\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Fault injection and chaos engineering<\/strong><\/h2>\n\n\n\n<p>The field of fault injection itself is not limited to IT infrastructure; it also includes&nbsp;<strong>software development<\/strong>, where it is possible to deliberately break certain parts of the code (with products such as&nbsp;<em>ExhaustIF<\/em>&nbsp;and&nbsp;<em>Holodeck<\/em>). It is also used in other disciplines, including physical manufacturing (think of aerospace engineering and verifying the robustness of the systems of a satellite that will then fly in orbit for twenty years).<\/p>\n\n\n\n<p>What do we need in our industry? 
In the operation of IT systems (<em>operations<\/em>), it is important for us to be able to&nbsp;<strong>inject possible faults into the infrastructure<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>On schedule<\/strong>&nbsp;- all operations and support functions must be aware of planned partial outages,<\/li>\n\n\n\n<li><strong>In a controlled way<\/strong>&nbsp;- I must not indiscriminately break too much of the infrastructure,<\/li>\n\n\n\n<li><strong>With measurement<\/strong>&nbsp;- I must be able to assess the impact of faults on the quality of service.<\/li>\n<\/ul>\n\n\n\n<p><strong>Netflix<\/strong>&nbsp;is considered a pioneer in the field: it introduced chaos engineering back around 2011 and developed its own tool,&nbsp;<a href=\"https:\/\/netflix.github.io\/chaosmonkey\/\" target=\"_blank\" rel=\"nofollow noopener\">ChaosMonkey<\/a>, for targeted disruption of compute resources, network components and component states. The tool was soon published as open source.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"83\" src=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/adsadasda-300x83.png\" alt=\"ChaosMonkey | Fault injection - break it yourself! 
| ORBIT Cloud Encyclopedia\" class=\"wp-image-9911\" style=\"width:516px;height:auto\" srcset=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/adsadasda-300x83.png 300w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/adsadasda-1024x284.png 1024w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/adsadasda-768x213.png 768w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/adsadasda.png 1086w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/figure>\n<\/div>\n\n\n<p><em>ChaosMonkey<\/em>&nbsp;was soon joined by the&nbsp;<em>Simian Army<\/em>, which brought a more sophisticated approach to introducing faults and enabled better tuning of the necessary resilience of systems. Chaos engineering has gradually been adopted by all major service companies such as Google, Microsoft, Facebook\/Meta and others.<\/p>\n\n\n\n<p>But that doesn't mean it's a discipline only for large companies with CDNs (content delivery networks) and large-scale services. Fault injection should gradually become&nbsp;<strong>part of the processes in all IT service companies<\/strong>&nbsp;that provide highly available services. Anyone can easily try such testing on their existing environment using the tools described below.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Tool support<\/strong><\/h2>\n\n\n\n<p>In a short time, several interesting tools with varying levels of detail have emerged. Breaking things is traditionally a simple activity. Here it includes shutdowns, component reboots, artificial load generation and reconfigurations.<\/p>\n\n\n\n<p>The complex part is control and monitoring, where I need to see the behavior of applications and components that are intentionally affected by the fault.&nbsp;<em>ChaosMonkey<\/em>&nbsp;was followed by tools like&nbsp;<em>Gremlin<\/em>,&nbsp;<em>ChaosMesh<\/em>,&nbsp;<em>ChaosBlade<\/em>,&nbsp;<em>Litmus&nbsp;<\/em>and others. 
Fault injection is also supported by the well-known service mesh&nbsp;<em>Istio<\/em>.<\/p>\n\n\n\n<p>Let's describe here the widely used tool&nbsp;<em>Gremlin&nbsp;<\/em>and the new native tools from major cloud providers,&nbsp;<em>AWS FIS<\/em>&nbsp;and&nbsp;<em>Azure Chaos Studio<\/em>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Gremlin<\/h3>\n\n\n\n<p><a href=\"https:\/\/www.gremlin.com\/\" target=\"_blank\" rel=\"nofollow noopener\">Gremlin<\/a>&nbsp;(dating back to 2014) is a well-known online fault-injection tool with agent support and connections to cloud services.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"49\" src=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/asdsada-300x49.png\" alt=\"Gremlin | Fault injection - break it yourself! | ORBIT Cloud Encyclopedia\" class=\"wp-image-9913\" style=\"width:548px;height:auto\" srcset=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/asdsada-300x49.png 300w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/asdsada-1024x169.png 1024w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/asdsada-768x126.png 768w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/asdsada.png 1093w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/figure>\n<\/div>\n\n\n<p>Its interface is simple, as is its use. 
The agent must be installed on each guest that is to become a target.<\/p>\n\n\n<div class=\"wp-block-image wp-image-9915\">\n<figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"167\" src=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/asdasd-300x167.png\" alt=\"Gremlin Window | ORBIT Cloud Encyclopedia\" class=\"wp-image-9915\" style=\"width:498px;height:auto\" srcset=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/asdasd-300x167.png 300w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/asdasd-1024x571.png 1024w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/asdasd-768x428.png 768w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/asdasd.png 1384w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption class=\"wp-element-caption\">gremlin.com<\/figcaption><\/figure>\n<\/div>\n\n\n<p>Basically,&nbsp;<em>Gremlin<\/em>&nbsp;allows attacks on resources in virtual machines or containers (including overloading, restarting, or shutting them down), on databases, and on networks. It can overload CPU, memory and I\/O; it can also, for example, change the system time or kill specific processes. 
It can drop network traffic, add network latency or block access to DNS.<\/p>\n\n\n<div class=\"wp-block-image wp-image-9917\">\n<figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"160\" src=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsadsa-300x160.png\" alt=\"Gremlin CPU Utilization Chart | ORBIT Cloud Encyclopedia\" class=\"wp-image-9917\" style=\"width:520px;height:auto\" srcset=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsadsa-300x160.png 300w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsadsa-1024x547.png 1024w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsadsa-768x411.png 768w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsadsa.png 1388w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption class=\"wp-element-caption\">Gremlin CPU Utilization Chart (gremlin.com)<\/figcaption><\/figure>\n<\/div>\n\n\n<p>Scenarios allow you to combine attacks and monitor the status of the affected target components. Imagine, for example, periodically verifying that scale sets function properly by purposely overloading nodes in a set and monitoring whether new nodes start up fast enough.<\/p>\n\n\n\n<p><em>Gremlin<\/em>&nbsp;supports&nbsp;<em>Service Discovery<\/em>&nbsp;within the target environment. It can detect system instability and stop attacks in time, before they cause a service disruption.<\/p>\n\n\n\n<p>For testing within a single team,&nbsp;<strong><em>Gremlin<\/em>&nbsp;can be used at no charge<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">AWS Fault Injection Simulator (AWS FIS)<\/h3>\n\n\n\n<p>Amazon's portfolio also includes the new service (2020)&nbsp;<a href=\"https:\/\/aws.amazon.com\/fis\/\" target=\"_blank\" rel=\"nofollow noopener\">AWS FIS<\/a>, which tests EC2, ECS, EKS and RDS by shutting down or restarting machines and services. 
The fault injection logic is very similar, adapted to the AWS environment, including the ability to define tests as JSON documents. Thanks to the integration, no agent is needed for most actions, except when you need to put load on resources.<\/p>\n\n\n\n<p>In&nbsp;<em>AWS FIS<\/em>, you create&nbsp;<em>actions<\/em>&nbsp;on&nbsp;<em>targets<\/em>&nbsp;according to the available&nbsp;<em>conditions<\/em>. Chaining actions allows you to define advanced scenarios. You can define a condition under which the test ends (for example, if I significantly disrupt the functionality of the environment). The whole concept allows integration into a CD pipeline.<\/p>\n\n\n<div class=\"wp-block-image wp-image-9919\">\n<figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"102\" src=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/fdsfdsf-300x102.png\" alt=\"AWS FIS | Fault injection - break it yourself! | ORBIT Cloud Encyclopedia\" class=\"wp-image-9919\" style=\"width:518px;height:auto\" srcset=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/fdsfdsf-300x102.png 300w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/fdsfdsf-1024x348.png 1024w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/fdsfdsf-768x261.png 768w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/fdsfdsf.png 1181w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption class=\"wp-element-caption\">AWS FIS (amazon.com)<\/figcaption><\/figure>\n<\/div>\n\n\n<p>The account under which the tests run must have access to the resources it operates on, so it has fairly high privileges and needs to be treated accordingly.<\/p>\n\n\n<div class=\"wp-block-image wp-image-9921\">\n<figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"138\" src=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dadasda-300x138.png\" alt=\"AWS FIS | ORBIT Cloud Encyclopedia\" 
class=\"wp-image-9921\" style=\"width:518px;height:auto\" srcset=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dadasda-300x138.png 300w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dadasda-1024x470.png 1024w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dadasda-768x352.png 768w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dadasda.png 1356w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption class=\"wp-element-caption\">amazon.com<\/figcaption><\/figure>\n<\/div>\n\n\n<p>AWS FIS is&nbsp;<strong>billed per use, based on the number of actions performed<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Azure Chaos Studio<\/h3>\n\n\n\n<p>At the time of writing, the tool&nbsp;<a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/chaos-studio\/\" target=\"_blank\" rel=\"nofollow noopener\">Azure Chaos Studio<\/a>&nbsp;for Microsoft Azure is in Preview, and until April 2022 it is therefore&nbsp;<strong>provided for testing<\/strong>&nbsp;before it moves to a pay-per-use model.<\/p>\n\n\n\n<p>The concept of the tool is similar to the previous ones: on one side, in&nbsp;<em>Chaos Studio<\/em>, I create&nbsp;<em>experiments<\/em>&nbsp;such as stress tests, and on the other, in&nbsp;<em>Application Insights<\/em>, I monitor the behavior of the resource. The faults I can generate are again load on CPU, virtual or physical memory and I\/O, resource restarts, system time changes, changes and failures in container infrastructure, and more. The list is gradually growing to cover the most used cloud resources in Azure. The availability of APIs will enable integration with third-party products.<\/p>\n\n\n<div class=\"wp-block-image wp-image-9923\">\n<figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"160\" src=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/adsdsada-300x160.png\" alt=\"Azure Chaos Studio | Fault injection - break it yourself! 
| ORBIT Cloud Encyclopedia\" class=\"wp-image-9923\" style=\"width:528px;height:auto\" srcset=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/adsdsada-300x160.png 300w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/adsdsada-1024x546.png 1024w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/adsdsada-768x409.png 768w, http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/adsdsada.png 1372w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption class=\"wp-element-caption\">Azure Chaos Studio (azure.microsoft.com)<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\"><strong>Let's design systems better<\/strong><\/h2>\n\n\n\n<p>The discipline of chaos engineering and the controlled injection of faults into infrastructure is developing promisingly. With cloud services, we get our hands on extensive infrastructure and advanced tools. We can easily and&nbsp;<strong>safely model outages with the frequency at which they statistically occur<\/strong>&nbsp;(we can base this on provider figures for component availability).<\/p>\n\n\n\n<p>Based on how the infrastructure behaves under the pressure of controlled outages, we adjust the configuration of the infrastructure and software tools so that the resulting availability matches the SLA of the application provided. With fault injection tools, we can&nbsp;<strong>stop designing robustness by looking out of the window and start drawing on tangible experience and the language of numbers<\/strong>.<\/p>\n\n\n\n<p>When do you think chaos engineering will be adopted in your company? Let me know in the discussion below the article whether you would dare to adopt this approach today!<\/p>","protected":false},"excerpt":{"rendered":"<p>If it works, don't touch it? No. You have to proactively tinker with the IT architecture to deal with real incidents. 
Vivat fault injection!<\/p>","protected":false},"author":11,"featured_media":9907,"template":"","meta":{"_acf_changed":true,"_kad_blocks_custom_css":"","_kad_blocks_head_custom_js":"","_kad_blocks_body_custom_js":"","_kad_blocks_footer_custom_js":"","_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":""},"categories":[126,127],"class_list":["post-9844","encyklopedie-cloudu","type-encyklopedie-cloudu","status-publish","has-post-thumbnail","hentry","category-cloud-computing","category-cloud-governance"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Fault injection neboli rozbij si to s\u00e1m! | Encyklopedie cloudu ORBIT<\/title>\n<meta name=\"description\" content=\"Kdy\u017e to funguje, tak nesahat? Kdepak. Do IT architektury mus\u00edte preventivn\u011b \u0161\u0165ourat, abyste ust\u00e1li re\u00e1ln\u00e9 incidenty. Vivat fault injection!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/4.184.192.234\/en\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Fault injection neboli rozbij si to s\u00e1m! | Encyklopedie cloudu ORBIT\" \/>\n<meta property=\"og:description\" content=\"Kdy\u017e to funguje, tak nesahat? Kdepak. Do IT architektury mus\u00edte preventivn\u011b \u0161\u0165ourat, abyste pak ust\u00e1li re\u00e1ln\u00e9 incidenty. 
Vivat fault injection!\" \/>\n<meta property=\"og:url\" content=\"http:\/\/4.184.192.234\/en\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/\" \/>\n<meta property=\"og:site_name\" content=\"ORBIT | create IT your own way\" \/>\n<meta property=\"article:modified_time\" content=\"2024-10-31T13:32:09+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2022\/01\/EC17-scaled.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2048\" \/>\n\t<meta property=\"og:image:height\" content=\"1072\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Fault injection neboli rozbij si to s\u00e1m! | Encyklopedie cloudu ORBIT\" \/>\n<meta name=\"twitter:description\" content=\"Kdy\u017e to funguje, tak nesahat? Kdepak. Do IT architektury mus\u00edte preventivn\u011b \u0161\u0165ourat, abyste pak ust\u00e1li re\u00e1ln\u00e9 incidenty. Vivat fault injection!\" \/>\n<meta name=\"twitter:image\" content=\"http:\/\/4.184.192.234\/wp-content\/uploads\/2022\/01\/EC17-scaled.jpg\" \/>\n<meta name=\"twitter:label1\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"http:\\\/\\\/4.184.192.234\\\/encyklopedie-cloudu\\\/fault-injection-neboli-rozbij-si-to-sam\\\/\",\"url\":\"http:\\\/\\\/4.184.192.234\\\/encyklopedie-cloudu\\\/fault-injection-neboli-rozbij-si-to-sam\\\/\",\"name\":\"Fault injection neboli rozbij si to s\u00e1m! 
| Encyklopedie cloudu ORBIT\",\"isPartOf\":{\"@id\":\"http:\\\/\\\/4.184.192.234\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"http:\\\/\\\/4.184.192.234\\\/encyklopedie-cloudu\\\/fault-injection-neboli-rozbij-si-to-sam\\\/#primaryimage\"},\"image\":{\"@id\":\"http:\\\/\\\/4.184.192.234\\\/encyklopedie-cloudu\\\/fault-injection-neboli-rozbij-si-to-sam\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/4.184.192.234\\\/wp-content\\\/uploads\\\/2021\\\/12\\\/dsadsad.png\",\"datePublished\":\"2021-11-18T11:00:32+00:00\",\"dateModified\":\"2024-10-31T13:32:09+00:00\",\"description\":\"Kdy\u017e to funguje, tak nesahat? Kdepak. Do IT architektury mus\u00edte preventivn\u011b \u0161\u0165ourat, abyste ust\u00e1li re\u00e1ln\u00e9 incidenty. Vivat fault injection!\",\"breadcrumb\":{\"@id\":\"http:\\\/\\\/4.184.192.234\\\/encyklopedie-cloudu\\\/fault-injection-neboli-rozbij-si-to-sam\\\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\\\/\\\/4.184.192.234\\\/encyklopedie-cloudu\\\/fault-injection-neboli-rozbij-si-to-sam\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"http:\\\/\\\/4.184.192.234\\\/encyklopedie-cloudu\\\/fault-injection-neboli-rozbij-si-to-sam\\\/#primaryimage\",\"url\":\"http:\\\/\\\/4.184.192.234\\\/wp-content\\\/uploads\\\/2021\\\/12\\\/dsadsad.png\",\"contentUrl\":\"http:\\\/\\\/4.184.192.234\\\/wp-content\\\/uploads\\\/2021\\\/12\\\/dsadsad.png\",\"width\":1120,\"height\":520,\"caption\":\"Fault injection neboli rozbij si to s\u00e1m! 
| Encyklopedie cloudu ORBIT\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\\\/\\\/4.184.192.234\\\/encyklopedie-cloudu\\\/fault-injection-neboli-rozbij-si-to-sam\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\\\/\\\/4.184.192.234\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Fault injection neboli rozbij si to s\u00e1m!\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\\\/\\\/4.184.192.234\\\/#website\",\"url\":\"http:\\\/\\\/4.184.192.234\\\/\",\"name\":\"ORBIT | create IT your own way\",\"description\":\"ORBIT | create IT your own way\",\"publisher\":{\"@id\":\"http:\\\/\\\/4.184.192.234\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\\\/\\\/4.184.192.234\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Organization\",\"@id\":\"http:\\\/\\\/4.184.192.234\\\/#organization\",\"name\":\"ORBIT s.r.o.\",\"url\":\"http:\\\/\\\/4.184.192.234\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"http:\\\/\\\/4.184.192.234\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"http:\\\/\\\/4.184.192.234\\\/wp-content\\\/uploads\\\/2020\\\/11\\\/logoslogan-01.png\",\"contentUrl\":\"http:\\\/\\\/4.184.192.234\\\/wp-content\\\/uploads\\\/2020\\\/11\\\/logoslogan-01.png\",\"width\":1417,\"height\":829,\"caption\":\"ORBIT s.r.o.\"},\"image\":{\"@id\":\"http:\\\/\\\/4.184.192.234\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.linkedin.com\\\/company\\\/orbit\\\/\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Fault injection or break it yourself! | ORBIT Cloud Encyclopedia","description":"If it works, don't touch it? Not at all. You have to proactively tinker with your IT architecture so that you can handle real incidents. 
Vivat fault injection!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/4.184.192.234\/en\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/","og_locale":"en_GB","og_type":"article","og_title":"Fault injection or break it yourself! | ORBIT Cloud Encyclopedia","og_description":"If it works, don't touch it? Not at all. You have to proactively tinker with your IT architecture so that you can handle real incidents. Vivat fault injection!","og_url":"http:\/\/4.184.192.234\/en\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/","og_site_name":"ORBIT | create IT your own way","article_modified_time":"2024-10-31T13:32:09+00:00","og_image":[{"width":2048,"height":1072,"url":"http:\/\/4.184.192.234\/wp-content\/uploads\/2022\/01\/EC17-scaled.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_title":"Fault injection or break it yourself! | ORBIT Cloud Encyclopedia","twitter_description":"If it works, don't touch it? Not at all. You have to proactively tinker with your IT architecture so that you can handle real incidents. Vivat fault injection!","twitter_image":"http:\/\/4.184.192.234\/wp-content\/uploads\/2022\/01\/EC17-scaled.jpg","twitter_misc":{"Estimated reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"http:\/\/4.184.192.234\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/","url":"http:\/\/4.184.192.234\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/","name":"Fault injection or break it yourself! 
| ORBIT Cloud Encyclopedia","isPartOf":{"@id":"http:\/\/4.184.192.234\/#website"},"primaryImageOfPage":{"@id":"http:\/\/4.184.192.234\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/#primaryimage"},"image":{"@id":"http:\/\/4.184.192.234\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/#primaryimage"},"thumbnailUrl":"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsad.png","datePublished":"2021-11-18T11:00:32+00:00","dateModified":"2024-10-31T13:32:09+00:00","description":"If it works, don't touch it? Not at all. You have to proactively tinker with your IT architecture so that you can handle real incidents. Vivat fault injection!","breadcrumb":{"@id":"http:\/\/4.184.192.234\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["http:\/\/4.184.192.234\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"http:\/\/4.184.192.234\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/#primaryimage","url":"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsad.png","contentUrl":"http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsad.png","width":1120,"height":520,"caption":"Fault injection neboli rozbij si to s\u00e1m! 
| Encyklopedie cloudu ORBIT"},{"@type":"BreadcrumbList","@id":"http:\/\/4.184.192.234\/encyklopedie-cloudu\/fault-injection-neboli-rozbij-si-to-sam\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/4.184.192.234\/"},{"@type":"ListItem","position":2,"name":"Fault injection neboli rozbij si to s\u00e1m!"}]},{"@type":"WebSite","@id":"http:\/\/4.184.192.234\/#website","url":"http:\/\/4.184.192.234\/","name":"ORBIT | create IT your own way","description":"ORBIT | create IT your own way","publisher":{"@id":"http:\/\/4.184.192.234\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/4.184.192.234\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Organization","@id":"http:\/\/4.184.192.234\/#organization","name":"ORBIT s.r.o.","url":"http:\/\/4.184.192.234\/","logo":{"@type":"ImageObject","inLanguage":"en-GB","@id":"http:\/\/4.184.192.234\/#\/schema\/logo\/image\/","url":"http:\/\/4.184.192.234\/wp-content\/uploads\/2020\/11\/logoslogan-01.png","contentUrl":"http:\/\/4.184.192.234\/wp-content\/uploads\/2020\/11\/logoslogan-01.png","width":1417,"height":829,"caption":"ORBIT s.r.o."},"image":{"@id":"http:\/\/4.184.192.234\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.linkedin.com\/company\/orbit\/"]}]}},"taxonomy_info":{"category":[{"value":126,"label":"Cloud computing"},{"value":127,"label":"Cloud governance"}]},"featured_image_src_large":["http:\/\/4.184.192.234\/wp-content\/uploads\/2021\/12\/dsadsad-1024x475.png",1024,475,true],"author_info":{"display_name":"Kamil 
Kov\u00e1\u0159","author_link":"http:\/\/4.184.192.234\/en\/author\/bd3f10283261c790\/"},"comment_info":"","_links":{"self":[{"href":"http:\/\/4.184.192.234\/en\/wp-json\/wp\/v2\/encyklopedie-cloudu\/9844","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/4.184.192.234\/en\/wp-json\/wp\/v2\/encyklopedie-cloudu"}],"about":[{"href":"http:\/\/4.184.192.234\/en\/wp-json\/wp\/v2\/types\/encyklopedie-cloudu"}],"author":[{"embeddable":true,"href":"http:\/\/4.184.192.234\/en\/wp-json\/wp\/v2\/users\/11"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/4.184.192.234\/en\/wp-json\/wp\/v2\/media\/9907"}],"wp:attachment":[{"href":"http:\/\/4.184.192.234\/en\/wp-json\/wp\/v2\/media?parent=9844"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/4.184.192.234\/en\/wp-json\/wp\/v2\/categories?post=9844"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}