Amazon AWS DevOps Engineer Professional – Configuration Management and Infrastructure Part 4

  1. CloudFormation – Drift Detection

So now let’s talk about drift. Drift detection tells you whether the resources created by a CloudFormation stack have deviated from the configuration in its template, for example because of manual intervention. So let’s have a look. I have a template called Drift Security Group. It has one parameter, VPC ID, to reference the VPC, and it creates two security groups: an SSH security group and an HTTP security group. So let’s create them, and then we’ll modify them manually. I’ll create the stack, upload the template file, and choose number 16. Then I’ll click on Next and call the stack Demo Drift. The VPC ID is the one I have, so let’s click on Next.
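The demo template itself (file number 16) isn’t reproduced in this transcript. As a rough sketch only, an equivalent template could be generated from Python as JSON; the logical IDs (`SSHSecurityGroup`, `HTTPSecurityGroup`), the `VpcId` parameter name, and the open `0.0.0.0/0` ranges are assumptions, not the course’s actual file.

```python
import json


def security_group(description: str, port: int) -> dict:
    """One security group in the chosen VPC that opens a single TCP port to anyone."""
    return {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "GroupDescription": description,
            "VpcId": {"Ref": "VpcId"},
            "SecurityGroupIngress": [
                {"IpProtocol": "tcp", "FromPort": port, "ToPort": port, "CidrIp": "0.0.0.0/0"}
            ],
        },
    }


def drift_demo_template() -> dict:
    """Hypothetical equivalent of the lecture's Drift Security Group template."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # The single parameter mentioned in the lecture: the VPC to create the groups in.
            "VpcId": {"Type": "AWS::EC2::VPC::Id"},
        },
        "Resources": {
            "SSHSecurityGroup": security_group("Allow SSH", 22),
            "HTTPSecurityGroup": security_group("Allow HTTP", 80),
        },
    }


if __name__ == "__main__":
    # Save this output as a .json file and upload it in the console or via the CLI.
    print(json.dumps(drift_demo_template(), indent=2))
```

Saving the output as `drift-sg.json`, a stack could then be created with `aws cloudformation create-stack --stack-name demo-drift --template-body file://drift-sg.json --parameters ParameterKey=VpcId,ParameterValue=<your VPC ID>`; the file and stack names are hypothetical.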

I’ll click on Next again and create the stack. Here we go. This is going to create two security groups, and now they have been created. If I go to Services and then EC2, we should see them: under Security Groups, I have two groups whose names start with Demo Drift. So these two security groups have been created by the CloudFormation stack. Now, if I go into Stack Actions and click on “Detect drift,” it says that drift detection was initiated for this stack. Then, still in Stack Actions, I click on “View drift results.” The drift status is IN_SYNC. That means the two security groups exactly match the configuration defined in the CloudFormation template. That’s not a surprise, because we haven’t done anything other than create them using CloudFormation. But what if we go to the security groups and start messing around a bit? For example, instead of port 80, I’m going to use port 8080.

So I edit the rule to use port 8080, and I also add another rule for port 8080, maybe from anywhere. On the edited rule, I change the IP range to one starting with 192 and add a description, for example “changed rule.” Likewise, I could edit my SSH security group, or just remove its rule altogether, which is what I do. So I’ve made a lot of manual changes. Now if we go back to CloudFormation and click on “Detect stack drift” again, it runs a new drift detection and checks whether the current state of our security groups still matches our CloudFormation template. Let’s refresh this page. Now the drift status has changed to DRIFTED. If we go in here, we can see that both resources have been modified manually. So we know that something happened and some manual changes were made. Let’s go to our HTTP security group and click on “View drift details.”

Here we can see that the resource has been modified and that its SecurityGroupIngress property is NOT_EQUAL: the expected value is what the template defined, but the current value is very different. Two rules for port 8080 over TCP now show up, one on this CIDR and one from anywhere. This is critical to understand: the drift details show the expected configuration from CloudFormation side by side with the actual configuration of the resource. Likewise, we can detect drift for our SSH security group and view its drift details. As we can see here, this rule was removed: the expected value still contains it, but the actual value doesn’t. So drift detection is a really good way to see whether anyone made manual changes to the resources in your CloudFormation stacks, because such changes can cause problems when you later update or delete the stack. Note, however, that drift detection can’t take you back in time. It only reports changes; it doesn’t revert them. That’s what you need to know about this feature. And that’s it for this lecture. Go ahead and delete your stack when you’re done. I will see you in the next lecture.
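The same detect-then-inspect flow is also available through the API. The sketch below assumes a boto3 CloudFormation client is passed in (for example `boto3.client("cloudformation")`); it calls `detect_stack_drift`, polls `describe_stack_drift_detection_status`, and then lists MODIFIED or DELETED resources with `describe_stack_resource_drifts`. The function name and the shape of its return value are illustrative choices.

```python
import time


def detect_drift(cfn, stack_name, poll_seconds=5.0, sleep=time.sleep):
    """Run stack drift detection and summarize the resources that drifted.

    `cfn` is assumed to be a boto3 CloudFormation client; `sleep` is
    injectable so the polling loop can be tested without waiting.
    """
    detection_id = cfn.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]

    # Detection runs asynchronously, so poll until it leaves DETECTION_IN_PROGRESS.
    while True:
        status = cfn.describe_stack_drift_detection_status(StackDriftDetectionId=detection_id)
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        sleep(poll_seconds)

    # Ask only for resources that no longer match the template.
    # (A production script should also follow NextToken for large stacks.)
    drifts = cfn.describe_stack_resource_drifts(
        StackName=stack_name,
        StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
    )["StackResourceDrifts"]

    resources = []
    for drift in drifts:
        # DifferenceType is ADD, REMOVE, or NOT_EQUAL; DELETED resources have no differences.
        diffs = [(p["PropertyPath"], p["DifferenceType"]) for p in drift.get("PropertyDifferences", [])]
        resources.append((drift["LogicalResourceId"], drift["StackResourceDriftStatus"], diffs))

    return {
        "detection_status": status["DetectionStatus"],
        "stack_status": status.get("StackDriftStatus"),  # e.g. IN_SYNC or DRIFTED
        "resources": resources,
    }
```

After the manual edits in this demo, a call such as `detect_drift(boto3.client("cloudformation"), "demo-drift")` would be expected to report both security groups as MODIFIED, matching what the console shows.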

  1. CloudFormation – Status Codes Deep Dive

Okay, so here comes a rather boring lecture, but we have to go over the stack status codes. We’ve already seen them in CloudFormation: whenever we create a stack it goes into CREATE_IN_PROGRESS, and when it’s done it’s in CREATE_COMPLETE. I’ll read the status codes one by one and comment on them. We have a few to go over, and the one that’s really important to look at is UPDATE_ROLLBACK_FAILED. OK, let’s go over them.

CREATE_COMPLETE, shown with a green tick, means the stack was created successfully. CREATE_IN_PROGRESS appears while the stack is being created, and it’s usually blue. CREATE_FAILED means the stack was not created successfully; it stays in that state so we can look at it and understand what went wrong. Maybe we’ll iterate on the template, and then we’ll delete the stack to get rid of it. When we delete a stack, it goes into DELETE_COMPLETE once the deletion succeeds. DELETE_FAILED means the deletion was unsuccessful, maybe because some resources are still in use, so we need to view the stack events to see the associated error messages and then try deleting the stack again. DELETE_IN_PROGRESS is when, well, the stack is being deleted.

Then REVIEW_IN_PROGRESS is when there is an ongoing creation of one or more stacks with an expected stack ID but without any templates or resources. This is a rare status, and you don’t need to remember it. Next is ROLLBACK_COMPLETE: when we create a stack and the creation fails, CloudFormation rolls it back, and if that rollback succeeds, the stack ends up in ROLLBACK_COMPLETE. If the rollback fails, it goes into ROLLBACK_FAILED, and while it’s happening, it’s in ROLLBACK_IN_PROGRESS. Now the important ones. When we update a stack, there’s UPDATE_COMPLETE if the update succeeded, and UPDATE_COMPLETE_CLEANUP_IN_PROGRESS when some resources were replaced. For example, a new EC2 instance was created and the update is complete, but we still have to terminate and clean up the old one, so the stack goes into UPDATE_COMPLETE_CLEANUP_IN_PROGRESS.

UPDATE_IN_PROGRESS is when an update is happening. If that update fails, the stack rolls back: while the rollback is running it’s in UPDATE_ROLLBACK_IN_PROGRESS; if some resources still need to be cleaned up afterward, it goes into UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS; and once it has rolled back successfully, it ends up in UPDATE_ROLLBACK_COMPLETE. And the one status I said is the most important is UPDATE_ROLLBACK_FAILED. It means one or more stacks could not be returned to their previous working state after a failed stack update. So this is quite bad. When you get into this state, you can delete the stack, or you can continue the rollback to keep trying. You might need to fix the errors before your stack can return to a working state.

Or you can contact customer support to restore the stack to a usable state. So you need to know about UPDATE_ROLLBACK_FAILED, and there are two resources I want you to remember. First, there’s a post on the AWS DevOps blog about continuing to roll back an update for a CloudFormation stack that is in the UPDATE_ROLLBACK_FAILED state, and again, that blog is super important to read. I’m not going to read it for you, but I do suggest you read it on your own to understand a bit more about that status code. Second, there’s a troubleshooting section in the CloudFormation documentation that explains how a stack can end up in UPDATE_ROLLBACK_FAILED and what you can do about it. I’d suggest you read it too, but at a high level, you get into that state when: you don’t receive the required number of signals; resources were changed outside of CloudFormation (which you could have caught with drift detection); there are insufficient permissions; there’s an invalid security token; you hit a limitation error; or some resources didn’t stabilize. So have a look at these two resources in your own time. Remember that UPDATE_ROLLBACK_FAILED is quite a bad state, and you need to understand why it happens and what you can do about it. Overall, all of these stack status codes should be familiar to you going into the exam. All right, that’s it. I will see you in the next lecture.
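To make the status families concrete, here is a small Python sketch that groups statuses by suffix, attaches the next steps discussed in this lecture, and retries a stuck rollback. It assumes a boto3 CloudFormation client; `describe_stacks` and `continue_update_rollback` (with its optional `ResourcesToSkip` list of logical IDs) are real API calls, while the function names and the advice strings are our own summary, not official guidance.

```python
def classify(status: str) -> str:
    """Group a stack status by its suffix: in_progress, failed, or complete."""
    for suffix, group in (("_IN_PROGRESS", "in_progress"), ("_FAILED", "failed"), ("_COMPLETE", "complete")):
        if status.endswith(suffix):
            return group
    raise ValueError(f"unexpected stack status: {status}")


# Suggested next steps for the problem states covered in this lecture.
NEXT_STEP = {
    "CREATE_FAILED": "read the stack events, fix the template, then delete the stack",
    "ROLLBACK_COMPLETE": "creation was rolled back; read the stack events, then delete the stack",
    "ROLLBACK_FAILED": "read the stack events, then delete the stack",
    "DELETE_FAILED": "read the stack events, fix what blocks deletion, then delete again",
    "UPDATE_ROLLBACK_FAILED": "fix the underlying error, then continue the rollback (or delete the stack)",
}


def next_step(status: str) -> str:
    if classify(status) == "in_progress":
        return "wait"
    return NEXT_STEP.get(status, "none")


def continue_rollback(cfn, stack_name: str, resources_to_skip=()) -> bool:
    """Retry the rollback of a stack stuck in UPDATE_ROLLBACK_FAILED.

    `cfn` is assumed to be a boto3 CloudFormation client. Returns False
    and does nothing if the stack is in any other state.
    """
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    if stack["StackStatus"] != "UPDATE_ROLLBACK_FAILED":
        return False
    request = {"StackName": stack_name}
    if resources_to_skip:
        # CloudFormation marks these logical IDs UPDATE_COMPLETE and moves on, so the
        # real resources may end up inconsistent with the template.
        request["ResourcesToSkip"] = list(resources_to_skip)
    cfn.continue_update_rollback(**request)
    return True
```

The CLI equivalent of the last helper is `aws cloudformation continue-update-rollback --stack-name <stack> --resources-to-skip <LogicalId>`; skipping resources should be a last resort, after the underlying error has been understood.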

  1. CloudFormation – InsufficientCapabilitiesException

I’m going to go with the first option, either stacks region or EU. This request was fulfilled, and if we go to the response, we have a JSON message right here. The JSON contains an error message saying that the capability CAPABILITY_IAM is required, which is the error message shown on the left, and the error code is InsufficientCapabilitiesException. So I need you to look at this InsufficientCapabilitiesException right here. This is the error code to remember: it means you have not provided a capability that your template requires. It has nothing to do with your own permissions, and it has nothing to do with the template itself. It has to do with the fact that you must explicitly acknowledge that CloudFormation may create IAM resources for you. That’s when you get this InsufficientCapabilitiesException.

There are three resources you can read. The first is the documentation page on acknowledging IAM resources in AWS CloudFormation templates, which says that you need CAPABILITY_IAM or CAPABILITY_NAMED_IAM whenever you create IAM resources. This Stack Overflow question says the same thing: when you create IAM resources, you need CAPABILITY_IAM or CAPABILITY_NAMED_IAM; otherwise you get an InsufficientCapabilitiesException error. And if you go to the CreateStack API reference for CloudFormation, it says that you need to specify one of these two capabilities, CAPABILITY_IAM or CAPABILITY_NAMED_IAM, or you will get an InsufficientCapabilities error. So I’m driving this point home three or four times, but remember: whenever you get an InsufficientCapabilitiesException, you need to tick that little box in the console and acknowledge that the stack can create IAM resources. That’s it for this lecture. It was quite a short one, but hopefully you understood it, and I will see you in the next lecture.
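For templates deployed from code, a rough heuristic for which capability to pass is sketched below. It assumes that any `AWS::IAM::*` resource needs `CAPABILITY_IAM`, and that setting a custom name property needs `CAPABILITY_NAMED_IAM`; the name-property mapping is our assumption, so check the resource reference for each IAM type you use.

```python
# Properties that give an IAM resource a custom name (assumed mapping, not exhaustive).
CUSTOM_NAME_PROPERTIES = {
    "AWS::IAM::Role": "RoleName",
    "AWS::IAM::User": "UserName",
    "AWS::IAM::Group": "GroupName",
    "AWS::IAM::ManagedPolicy": "ManagedPolicyName",
    "AWS::IAM::InstanceProfile": "InstanceProfileName",
}


def required_capabilities(template: dict) -> list:
    """Guess the IAM capabilities a template needs, to avoid InsufficientCapabilitiesException."""
    caps = set()
    for resource in template.get("Resources", {}).values():
        resource_type = resource.get("Type", "")
        if not resource_type.startswith("AWS::IAM::"):
            continue
        # Any IAM resource requires an explicit acknowledgment.
        caps.add("CAPABILITY_IAM")
        name_property = CUSTOM_NAME_PROPERTIES.get(resource_type)
        if name_property and name_property in resource.get("Properties", {}):
            # Custom-named IAM resources need the stronger acknowledgment.
            caps.add("CAPABILITY_NAMED_IAM")
    return sorted(caps)
```

The result would be passed as `Capabilities=required_capabilities(template)` to boto3’s `create_stack`, or as `--capabilities CAPABILITY_IAM` on the CLI; either is the code equivalent of ticking the acknowledgment box.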

  1. CloudFormation – cfn-hup & cfn-metadata

So let’s go into this file called cfn-hup.yaml. This file is really important. We have already seen cfn-init and cfn-signal in the earlier CloudFormation lectures, so have a look back at those if you need to. Now we’re going to look at another helper script called cfn-hup. So let’s see what this template does. There is a KeyName parameter, which represents an EC2 key pair so we can SSH into our EC2 instance. We also have a parameter for the latest Linux AMI ID, which looks up, from the Parameter Store, the public Amazon Linux 2 AMI that AWS provides. And then we have a welcome message that our web server will display, which is Hello World. So we’ll install HTTPD on a web server and display the message Hello World. In terms of resources, we have a web server security group with two rules.

It opens ports 80 and 22 to anyone, so we’ll be able to SSH into our instance and also access it over HTTP. Excellent. Scrolling down, we get to the WebServerHost. This is the important bit. It’s an Amazon EC2 instance with a Metadata block, which we’ll go into in a second. It also has a CreationPolicy saying that it waits for a signal from cfn-signal, with a timeout of five minutes. This is something we’ve seen before. The signal itself is sent from the EC2 user data, not from the metadata. Anyway, for the properties, the ImageId uses the latest Linux AMI ID that we got from the parameters.

For the KeyName, we again reference the KeyName parameter, and the InstanceType is t2.micro. For the security groups, we reference the security group resource right above, which makes sense. For the user data, we Base64-encode the entire script, which uses the Fn::Sub function to substitute some pseudo parameters. Looking at this user data, the first thing we do is a yum update of aws-cfn-bootstrap to get the CloudFormation helper scripts, and then we run cfn-init, which uses the metadata defined in here to initialize the instance. I’ll show you this in a second. We pass cfn-init the stack ID, which is a pseudo parameter substituted by the Sub function; the resource to read the metadata from, which is WebServerHost; and the region, which is the AWS::Region pseudo parameter. If this step fails, we call an error_exit function. Then we start the cfn-hup daemon to listen for changes to the EC2 instance metadata. This is the focus of this lecture, and I will show you in a second what the cfn-hup daemon does. But just by running cfn-hup, we start the daemon, and the daemon will watch for updates to the metadata.

But don’t worry, I’ll go over this again. After the cfn-hup daemon has started, we use the cfn-signal script to signal to the stack that we have successfully created the web server host, and this is what the CreationPolicy is waiting for. Okay, now let’s get into the meat of our template, which is the AWS::CloudFormation::Init metadata. First we have a config with packages, which we install using the yum package manager: httpd and php. Then, in terms of groups, we create a group called apache, and a user called apache that belongs to the apache group. For the AWS CLI, we use a source that downloads it from this URL, which is the GitHub URL for the AWS CLI, so that, if you want, you have an up-to-date AWS CLI. Then we create a file called index.html with some content: an h1 tag, so this is HTML code, containing the welcome message. The WelcomeMessage parameter is referenced here through the Sub function, so the file will contain “Hello World,” followed by the stack name, which is again substituted using the Sub function.

The mode for the file is 000644, the owner is apache, and the group is apache. Then we have a file called /etc/cfn/cfn-hup.conf. This file stores the name of the stack and the AWS credentials that the cfn-hup daemon targets. Here we say that the stack is the one we’re running, and the region is the one we’re running this template from; again, the Sub function is used. Then there’s the interval, which is the number of minutes between checks for changes to the resource metadata. The default is 15 minutes, but we set it to two minutes. The smaller the interval, the more often cfn-hup checks for changes in the AWS::CloudFormation::Init block.

So remember this for the exam: by default the interval is 15 minutes, but here we set it to two minutes so the demo goes a little quicker. This file is owned by root, and cfn-hup runs as root as well. Next, there’s a hooks directory, and the file hooks.d/cfn-auto-reloader.conf says: every two minutes, check whether the CloudFormation metadata has changed, and if it has, run this hook. This configuration file says that after an update to the AWS::CloudFormation::Init metadata, cfn-hup runs an action, and that action is cfn-init. So cfn-init runs again, and because it runs after the metadata was updated, it reapplies the entire configuration using the updated AWS::CloudFormation::Init block. We’ll see what this means in practice in a second, but it means we have a way to update our EC2 instances directly from this block. Finally, for services, we enable httpd and ensure it is running at all times. I know this is a lot of information, but let’s look at what it does in practice, and I think that will make a lot more sense.
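Pulling the walkthrough together, here is a hypothetical reconstruction of the WebServerHost metadata and the two cfn-hup files, written in Python so it can be printed as JSON or text. The `/etc/cfn/` paths follow the usual layout in AWS samples; the AWS CLI source path and the index.html markup are assumptions. In the actual template, the stack and region values would come from `Fn::Sub` (for example `${AWS::StackId}` and `${AWS::Region}`) rather than from Python arguments.

```python
import json

# Hypothetical reconstruction of the WebServerHost AWS::CloudFormation::Init metadata.
WEB_SERVER_INIT = {
    "AWS::CloudFormation::Init": {
        "config": {
            "packages": {"yum": {"httpd": [], "php": []}},  # installed with yum
            "groups": {"apache": {}},
            "users": {"apache": {"groups": ["apache"]}},  # apache user in the apache group
            # Downloads and unpacks the AWS CLI source from GitHub (path is an assumption).
            "sources": {"/home/ec2-user/aws-cli": "https://github.com/aws/aws-cli/tarball/master"},
            "files": {
                "/var/www/html/index.html": {
                    # Fn::Sub fills in the WelcomeMessage parameter and the stack name.
                    "content": {"Fn::Sub": "<h1>${WelcomeMessage} from ${AWS::StackName}</h1>"},
                    "mode": "000644",
                    "owner": "apache",
                    "group": "apache",
                }
            },
            # Keep httpd enabled at boot and running.
            "services": {"sysvinit": {"httpd": {"enabled": "true", "ensureRunning": "true"}}},
        }
    }
}


def cfn_hup_conf(stack: str, region: str, interval_minutes: int) -> str:
    """Render /etc/cfn/cfn-hup.conf; cfn-hup checks every 15 minutes if interval is omitted."""
    return f"[main]\nstack={stack}\nregion={region}\ninterval={interval_minutes}\n"


def auto_reloader_hook(stack: str, region: str, resource: str = "WebServerHost") -> str:
    """Render /etc/cfn/hooks.d/cfn-auto-reloader.conf: rerun cfn-init after metadata updates."""
    return (
        "[cfn-auto-reloader-hook]\n"
        "triggers=post.update\n"
        f"path=Resources.{resource}.Metadata.AWS::CloudFormation::Init\n"
        f"action=/opt/aws/bin/cfn-init -v --stack {stack} --resource {resource} --region {region}\n"
        "runas=root\n"
    )


if __name__ == "__main__":
    print(json.dumps(WEB_SERVER_INIT, indent=2))
    print(cfn_hup_conf("cfn-hup-demo", "eu-west-1", 2))
    print(auto_reloader_hook("cfn-hup-demo", "eu-west-1"))
```

The `triggers=post.update` line is what ties the hook to metadata updates, and `path` limits it to changes under the Init block.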

So we’ll create a stack. By the way, you need to name the stack a specific way for the scripts to work properly, because some outputs rely on it: name it cfn-hup-demo. We’ll upload a file called cfn-hup.yaml and click on Next. The stack name is cfn-hup-demo. For the key name, I’m going to use AWSDevOps, which is my key pair for this course, and the welcome message is going to be “Hello World.” We’ll click on Next, Next, and Create. This creates an EC2 instance, and that instance should have an index.html file that says Hello World. So we’ll wait a little while until this is done. The creation is complete: we saw that the resource creation was initiated, and about a minute and 10 seconds later it received a SUCCESS signal from the cfn-signal script, so the stack went to CREATE_COMPLETE.

If we go to our outputs, we can see that there’s a website URL to visit. Going to that URL, it says “Hello World from cfn-hup-demo.” So this has worked: CloudFormation has created an index.html file that contains Hello World. Now let’s go to the EC2 console and find our instance, which is running here. I’m going to connect to it using EC2 Instance Connect. Excellent. I’ll elevate myself to superuser to make things easier. Here we go. Now I’m going to show you a handy script to get the metadata; actually, that’s the only script I’m going to show you. We can get the metadata from CloudFormation using the cfn-get-metadata script: we run cfn-get-metadata for the stack cfn-hup-demo, with the resource WebServerHost and the region eu-west-1. Obviously, change the region if you’re not in eu-west-1. I’ll press Enter, and this returns all the AWS::CloudFormation::Init metadata that we defined in our CloudFormation template.

And, as we can see, /var/www/html/index.html contains the Hello World content from cfn-hup-demo. That makes sense; that’s what cfn-init wrote. Now I’ll clear the screen and print this file, starting from the very top of the screen. As you can see, it says “Hello World from cfn-hup-demo.” Next, we’re going to update our stack and change the welcome message. I’ll update the stack using the current template, and here I’ll set the welcome message to “Updated welcome message.” That’s the only parameter I change. I click on Next, click on Next again, and wait a second for the change set to be previewed. As you can see, there’s not going to be a replacement for my EC2 instance: it doesn’t change and won’t be replaced. So I click on Update Stack, and the update is now in progress. Let’s be quick and see what happens during this update. Only the metadata is updated, so nothing much happens besides the metadata changing.
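From outside the instance, a comparable view of the metadata is available through the `DescribeStackResource` API, whose response carries the resource’s `Metadata` as a JSON string. A minimal sketch, assuming a boto3 CloudFormation client; `get_metadata` and `init_file_content` are hypothetical helper names, and the nested lookup assumes the single `config` key used in this template.

```python
import json


def get_metadata(cfn, stack_name: str, logical_id: str) -> dict:
    """Return a resource's Metadata attribute as a dict ({} if it has none)."""
    detail = cfn.describe_stack_resource(StackName=stack_name, LogicalResourceId=logical_id)
    # The API returns Metadata as a JSON string, not as a nested object.
    return json.loads(detail["StackResourceDetail"].get("Metadata") or "{}")


def init_file_content(metadata: dict, path: str):
    """Look up the content cfn-init would write to `path` (single "config" key assumed)."""
    return metadata["AWS::CloudFormation::Init"]["config"]["files"][path]["content"]
```

After the stack update, `init_file_content(get_metadata(cfn, "cfn-hup-demo", "WebServerHost"), "/var/www/html/index.html")` should show the new message before the file on the instance changes, which is exactly the gap cfn-hup closes.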

If I go back to the terminal on my EC2 instance and run cfn-get-metadata again, the content for /var/www/html/index.html now says “Updated welcome message from cfn-hup-demo.” But if we refresh the web page right now, it still says “Hello World from cfn-hup-demo.” What will happen is that within two minutes, the cfn-hup daemon will realize that this metadata has changed, because we updated the file content, which now says “Updated welcome message.” When it detects this, it reruns cfn-init, which updates /var/www/html/index.html, so when we refresh this page it will no longer say Hello World; it will say “Updated welcome message.” And here we go: it now says “Updated welcome message from cfn-hup-demo.” So cfn-hup ensures that, if the metadata in our CloudFormation template changes, our EC2 instance gets that metadata applied using cfn-init. That means we can change the state of our EC2 instance directly from CloudFormation without replacing it. That’s all I want to show you. You only need to remember how to use these scripts and what they do at a high level, but I thought a demo would help you understand exactly how cfn-hup and cfn-get-metadata work in the real world. All right, that’s it for the hands-on. Don’t forget to delete your stack when you’re done, and I will see you in the next lecture.
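To make the polling behavior concrete, the toy model below mimics what the demo shows: on each interval it fingerprints the watched metadata and runs the hook action only when the fingerprint changes. It is an illustration, not cfn-hup’s actual implementation; `CfnHupModel`, `fetch_metadata`, and `action` are hypothetical names.

```python
import hashlib
import json


class CfnHupModel:
    """Toy model of cfn-hup's change detection (not its real implementation)."""

    def __init__(self, fetch_metadata, action):
        self.fetch_metadata = fetch_metadata  # returns the watched metadata as a dict
        self.action = action                  # runs when the metadata changes, e.g. cfn-init
        self._last = self._fingerprint(fetch_metadata())

    @staticmethod
    def _fingerprint(metadata) -> str:
        # Serialize with sorted keys so equal dicts always produce the same hash.
        return hashlib.sha256(json.dumps(metadata, sort_keys=True).encode()).hexdigest()

    def tick(self) -> bool:
        """One polling interval: run the action only if the metadata changed."""
        current = self._fingerprint(self.fetch_metadata())
        if current == self._last:
            return False
        self._last = current
        self.action()
        return True
```

In the demo’s terms, `fetch_metadata` stands in for reading `Resources.WebServerHost.Metadata.AWS::CloudFormation::Init`, `action` for rerunning cfn-init, and `tick()` would be called every `interval` minutes (two here, 15 by default).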

  1. CloudFormation – Stack Policies

So we’ll get UPDATE_ROLLBACK_IN_PROGRESS here, and then we’ll be back to normal. At the very bottom, the stack policy was protecting the resource by stating that any update to the logical resource ID of the critical security group should be denied. And like everywhere else in AWS, an explicit deny takes precedence over an allow. So you might ask, “How can I update my security group at all, then?” Well, you do it by running the exact same update, with one difference.

So we’ll update the HTTP CIDR again, but this time, in the stack policy section, you enter a new stack policy. This policy is temporary: it applies only to this update and is reverted afterwards. If you delete the deny statement in here, click on Next, and then click on Update, you’ve made the conscious choice to override the stack policy for this update so that the critical security group can be updated. As you can see, our critical security group is being updated, and now the update is complete. Let’s go back to our stack info and scroll down: we can see that the stack policy has not changed. It’s still the same as before.
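A stack policy like the one in this demo, and the deny-over-allow rule from the start of this section, can be sketched in Python. The protected logical ID is passed in because the demo’s exact ID isn’t shown here, and `update_allowed` is a simplified evaluator written for illustration: it only handles `Effect`, `Action`, and `Resource` with wildcards, and treats “no matching Allow” as denied, since setting a stack policy protects all resources by default.

```python
import fnmatch
import json


def stack_policy(protected_logical_id: str) -> dict:
    """Allow all updates except those to one protected resource."""
    return {
        "Statement": [
            {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"},
            {"Effect": "Deny", "Action": "Update:*", "Principal": "*",
             "Resource": f"LogicalResourceId/{protected_logical_id}"},
        ]
    }


def _as_list(value):
    return value if isinstance(value, list) else [value]


def update_allowed(policy: dict, logical_id: str, action: str = "Update:Modify") -> bool:
    """Simplified evaluation: an explicit Deny wins; otherwise an Allow is required.

    Ignores NotAction, NotResource, and Condition, which real stack policies also support.
    """
    target = f"LogicalResourceId/{logical_id}"
    allowed = False
    for statement in policy.get("Statement", []):
        if not any(fnmatch.fnmatchcase(action, a) for a in _as_list(statement["Action"])):
            continue
        if not any(fnmatch.fnmatchcase(target, r) for r in _as_list(statement["Resource"])):
            continue
        if statement["Effect"] == "Deny":
            return False  # explicit deny takes precedence over any allow
        allowed = True
    return allowed  # no matching Allow means the update is implicitly denied


if __name__ == "__main__":
    print(json.dumps(stack_policy("CriticalSecurityGroup"), indent=2))
```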

So when we provide a stack policy during an update, it only overrides the stack’s policy for that particular update. And that’s the power of stack policies. You can do a lot with them, but the idea is to protect CloudFormation resources from being updated, accidentally or intentionally. We can have deny statements, and if we really need to update some protected resources, we can override the stack policy at update time to perform the required updates. I recommend you check out the documentation page on preventing updates to stack resources, which shows a lot of examples of the kinds of policies you can define. That’s for your own knowledge; from the exam perspective, you need to know at a high level what stack policies are, how they’re used, and how they work. All right, that’s it, and we’ll see you in the next lecture.
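The console step of pasting a temporary policy maps to the `StackPolicyDuringUpdateBody` parameter of `UpdateStack`. A minimal sketch, assuming a boto3 CloudFormation client; the allow-all override mirrors deleting the deny statement in the demo, and the function name is hypothetical.

```python
import json

# Mirrors the demo: the same policy with the deny statement removed.
ALLOW_ALL_UPDATES = {
    "Statement": [{"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"}]
}


def update_with_temporary_policy(cfn, stack_name, parameters, policy=None):
    """Update a stack with its previous template, overriding the stack policy for this update only.

    `cfn` is assumed to be a boto3 CloudFormation client; `parameters` uses the
    usual [{"ParameterKey": ..., "ParameterValue": ...}] shape.
    """
    return cfn.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,
        Parameters=parameters,
        # Used only while this update runs; the stack's saved policy is unchanged afterward.
        StackPolicyDuringUpdateBody=json.dumps(policy if policy is not None else ALLOW_ALL_UPDATES),
    )
```

For example, `update_with_temporary_policy(boto3.client("cloudformation"), "stack-policy-demo", [{"ParameterKey": "HttpCidr", "ParameterValue": "10.0.0.0/16"}])`, where the stack name and parameter key are placeholders. Changing the permanent policy is a separate call, `set_stack_policy`.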
