Amazon AWS Certified Database Specialty – Amazon RDS and Aurora Part 4

  1. Aurora failover priorities in action – Demo

In this demo. Let’s look at how failure tiers work when we have replicas of different sizes. Here I have an Aurora MySQL cluster with one Writer and four read replicas. And you can see that these replicas are placed in different az’s and they are also of different sizes. We have one reader which is of size two X large. We have another reader which is of size X large. The third reader is also of the size X large. And then we have a smaller reader which is of the size large and a Writer instance is of the size X large. So we have two replicas with the same size as the Writer instance and we have one replica smaller than the Writer instance and one replica larger than the Writer instance. And all these replicas as well as the Writer instance have their default failure priority.

So I have not assigned any specific failure priority to these instances. Let’s look at how failover happens in this case. Let’s look at the configuration for the Writer instance. Select the Writer and we’ll go to the configuration tab and you can see that the failure priority here is one. Similarly, for the reader replica, you see the same failure priority. So for all the replicas as well as for the reader, we have the same failure priority and that’s priority one.

All right? So let’s go back to the database screen and we’re going to test a failover or we’re going to simulate a failover to see which instance or which replica does this cluster fail over to. All right? So let’s go to Actions and choose to failure. Do you want to fail over? Yes. So now we have to wait for about 30 seconds for the failure to complete.

And I’m going to pause the recording here and come back in about 30 seconds. And now we can see that the Writer has changed and the original Writer has now become a reader. And Aura has a new Writer and that is the replica which is the largest one, the two X Large. The replica with size two X Large is now the new Writer. Now you know that if the failure priorities are same, then Aura will fail over to the largest instance.

All right, and we see some recommendations here. Let’s see what RDS has to recommend for us. And you can see that the recommendation says DB clusters with heterogeneous DB instance classes. So what this means is RDS always recommends that you have all the replicas of the same size as your primary instance for the best performance.

So unless your use case demands, you should always have all the replicas of the same size. All right, so what audience is saying is to use the X Large instance for all the database instances and now our Writer is too X Large. But I believe this recommendation was generated when our Writer was using the X Large size. So if you look at these recommendations in some time, RDS is going to generate a new recommendation asking us to use the two X large size for all the DB instances, because now our Writer instance or the primary instance is of two X large size.

All right? And if you click on this info, you can see additional information and it says, we recommend that all the DB instances in a DB cluster use the same DB instance class. And when database instances use different DB instance classes, it can cause workload imbalances among the DB instances in the cluster. For example, performance might suffer if a less powerful instance class is promoted to replace a more powerful DB instance class. And it kind of makes sense, right? So unless your use case demands, you should always use same size of instances for all the replicas as your primary database instance. So ideally, you can have your replica larger than the primary instance, but never smaller than for example, if you have analytical workload.

And if you use a particular replica to serve analytical workloads, you might want to increase the size of that instance or of that read replica and dedicate that read replica only for the analytical purposes. You may have that kind of use case, but otherwise, if you do not have any such use cases, then it’s always a good idea to use uniform sizes for all the instances in your database cluster. All right? And now our Writer instance is the largest one, and all the replicas are smaller than the Writer. So now let’s find out what happens if we fail over again. All right? So which one do you think it should fail over to? Let’s find out. So let’s fail over once again.

And remember, you don’t have to select the Writer node to fail over. You can just select any of the DB instance and choose Failure from the Actions menu. And the cluster is going to fail over as per the failure priorities are depending on the sizes of your database instances. So let’s go ahead and fail over and again. I’m going to come back in 30 seconds, all right? And there you go. Now, you can see that the new Writer has the size X large. So we had three X large instances to fail over two, and Aura has chosen one of the three largest instances. We can see that Aura has chosen the largest instance of the lot. So out of the four instances, three of the instances, three of the replicas are of size X large, and one replica is this size large, so it’s a smaller one. And we have three replicas of the same size. So in this case, Aura will fail over to an arbitrary replica, but the largest one of the lot.

All right? So generally the small instance will never be failed over to, right, unless it has higher priority. So what I’m going to do is I’m going to modify this small instance and change its failover priority from one to zero and then we’ll see that when we fail over, this instance will be failed over two. Let’s try that out quickly. So here we are on the Modify screen. So here we can see that the priority is tier one. So I’m going to change this to a tier zero, which is higher priority than tier one. All right? And scroll down and continue. I’m going to choose apply immediately because we want to see it in action immediately. All right? And downtime doesn’t matter for us as we are just testing it out. Modify DB instance and it’s going to take a while before this modification completes. So I’m going to pause the recording here and come back once this instance is available again.

All right? And now the small instance is available again. So let’s quickly cross check the priority of this instance by going into the configuration tab. And you can see that this instance now has a failover priority of Sirius and you can see that all other instances have priority one. So let’s go ahead and simulate a failover for one last time and you can do that from this screen as well. So from the Actions menu I’m going to choose to fail over.

Do you want to fail over? Yes. Let’s go back to the databases screen and this is our Writer for now and I’m going to pause here and come back in about 30 seconds. All right, and now you can see that the original Writer has now become a reader and our smallest replica is now a Writer because it has the highest failover priority. So now you how failure priorities work. And if failure priorities are same, then how Aurora decides which replica to failure to. So that’s about it. I hope this demo was useful for you and let’s continue to the next lecture.

  1. Fast failover in Aurora PostgreSQL

Now let’s look at fast failover in Aura postgres SQL. We have briefly seen this earlier as well. So whenever there is a failover in Aura Postgres SQL, your replica is going to face some kind of performance lag. And the performance lag is partly because of the cache data. So the cache data in your promoted replica is not going to be same as the cache data in the previous master, right? So this is where you can use CCM. We have already talked about CCM so you can set APG CCM enabled equal to one.

And of course there are other reasons for the performance lag as well. So we’re going to discuss what are the options we have to ensure that the failovers are fast and without any performance lag. So the first way to ensure this is using CCM or enabling CCM. And second thing you should ensure is that your application uses the cluster reader or custom endpoints and it’s not using the instance endpoints. And in case it is using these instance endpoints, you should always ensure that you clean up and reestablish connections post a failover.

You can also add automatic retry capability to your application, so your application can automatically retry if it’s unable to connect to the database. Another nifty thing that you could do is set your TCP Keep Alive values aggressively. By aggressively, I mean keep them as low as possible. So what this is going to do is your application will quickly close any connections if it is not able to connect to the database. And the recommended TCP keep alive settings as per AWS are TCP keep alive time equal to 1 second.

The Keep Alive interval also has 1 second and Keep Alive probes equal to five. So it’s going to try about five times every 1 second. So in about 5 seconds, your application will close the connection to the database if it is enabled to connect to it. So your application is going to close the connection and not keep retrying if it’s enabled to reestablish the connection. Another thing you could try is reduce the DNS cache timeouts or the DNS TTL values. So typically what happens is your reader endpoint cycles through available reader nodes. And if your client or if your application is caching the DNS info, then what’s going to happen is in case of a failure, the requests will keep going to the old instance instead of the newly failed over instance until the DNS cache times out.

So keeping the DNS TTL values to low will ensure that your application is hitting the correct endpoints, right? So keeping the TTL value low will ensure that your application reestablishes connection quickly and comes back online as soon as possible. In addition to this, you can use separate connection objects for long and short running queries. And another thing you could consider is using Aura Postgres SQL connection string with multiple hosts. Let’s look at what this connection string looks like when you use it with multiple hosts, you simply specify a list of hosts in your JDBC connection string. So it looks something like this. So you have your master cluster and you have your replica. So the first one is your writer endpoint, and the second one is the reader endpoint. Okay, so you can use both these endpoints in your connection string. So if one endpoint is not available, the second can be tried out.

And instead of putting a list of hosts in your connection string, you can also maintain a separate file. For example, you can create a text file containing the list of hosts. These two endpoints you can put in a file, and then your application can read this file and populate the host section of your JDBC connection string. And the JDBC connection driver is going to loop through all the nodes on this list to find a valid connection. So you have multiple connection endpoints to try before your application gives up. And within the connection string, you can see there are various parameters that are marked in red. Here, you can consider setting them aggressively so your app doesn’t wait too long on any host that is not responding. And in addition, you can also consider maintaining a file with instance endpoints. So all the instance endpoints, you can put them in a text file like this. But only thing here you should remember whenever you use instance endpoints is whenever you add or remove any nodes, you have to update this file. So your application must ensure that this file is always up to date. All right, that’s about it. Let’s continue.

  1. Cluster replication options for Aurora MySQL

In this lecture. Let’s talk about the cluster replication options for aura MySQL. So here we are talking about the cluster replication and not the replication within a single cluster. So we are talking about replication between two aura clusters. So this is how it looks. We have one aura cluster and second aura cluster and we have a couple of options of replicating data between two such aura clusters. All right. Replication between clusters means we can have more than 15 read replicas. So each cluster can have about 15 replicas. We can have more than 15 replicas if we replicate data between two clusters.

So we can have replication between two aura MySQL database clusters in different regions. This will be cross region replication. We can have replication between two aura MySQL database clusters in the same region. So in this case, we’ll have more than 15 read replicas in the same region. And we can also have replication between RDS MySQL database instance and an aura MySQL database cluster. Now, this is typically used for migration purposes. All right.

So let’s look at these three options one by one. First, the cross region replication between two aura MySQL database clusters. You have your source cluster in one region, and to use cross region replication, you have to enable binary logging on the source cluster. And you do that using the bin log format parameter. Now, we have talked about the bin log format parameter when we talked about reliability features of Aurora. And we know that we should generally disable binary logging to improve the crash recovery time of your aura clusters. But this one here is an exception. When you want to use external replication, like the cross region replication here, you have to enable binary logging. If you don’t need cross region replication, then you can always turn off binary logging by setting the bin log format parameter to off.

And once you have enabled binary logging on this source cluster, you can then create a cross region read replica in another region. And the second region is going to use bin log replication to get its data from the source region. And of course, for Dr purposes, you can always promote the replica to a standalone database cluster. All right. So in case you have a regional failure in the source region you can always promote the cross region read replica to be the new master. All right? So that’s how cross region replication works. Then let’s look at the second option of replication between two aurora MySQL clusters in the same region. All right, so you have a source cluster and a target cluster.

The first thing you need to do is enable binary logging on the source cluster. You do that using the bin log format parameter, and then you replicate using a snapshot of the replication master. So you have to create a snapshot and you can use the snapshot to load the data into the target or a cluster. And once you have done that, then you can start the bin log replication between the two clusters. So that’s how you would replicate between two aura MySQL clusters within the same region. Now, let’s look at the third option. That is replication between an audience database instance and an aura database cluster.

And to do this, you simply create an aura read replica of your RDS TB instance. You simply go to the RDS console for the source database and choose the option to create an aura read replica. And as I said earlier, this is typically used for migration rather than for ongoing replication. So to do this, you first stop any writes on the master instance and once the replication lag is zero, you can promote the aura replica to be a standalone database cluster or the aura database cluster. So in effect, you are migrating from MySQL on RDS or PostgreSQL on RDS to aura. And this is going to provide you with minimal downtime. So that’s about it. Let’s continue to the next lecture.

  1. Aurora replicas vs RDS MySQL replicas

In this lecture. Let’s compare the aura replicas with MySQL replicas. So by my sequel replicas, I mean the replicas of MySQL on RDS. So here is the comparison. So when we talk about Replicas, aura Replicas support up to 15 read Replicas, whereas RDS MySQL supports up to five replication in both the types of Replicas is asynchronous. But RDS Replicas can have replication lag in seconds, while Aura Replicas will have replication lag in milliseconds. Okay? And the performance impact on primary is low in case of Aura Replicas, whereas it is a little bit on a higher side. In case of RDS MySQL, aura Replicas can be within a region, whereas for RDS MySQL, you can have cross region Replicas old set, right? So you can have your Replicas sitting in another region.

Now, you might say that you can have cross region replication with Aura Replicas, but remember that when you create a cross region Replica, that Replica has its own storage layer. So that Replica has its own cluster and its own storage layer. Okay?

So it’s going to be a separate cluster when we talk about a cross region Replica in Aura. Now, Aura Replicas act as a failover target, and they do support automatic failure, so there is no data loss. But RDS, MySQL Replicas do not support automatic failure. They do support failover. But you have to manually promote the RDS Replicas to be a standalone database.

There is a potential for data loss of a few minutes. Now, as I said, aura Replicas support automatic failover, while RDS Replicas do not. Now, RDS Replicas support user defined replication delay. So you can configure the replication delay in case of RDS MySQL, but Aura Replicas do not support that feature. And again, with MySQL RDS, you can have different data or schema in the Replica versus the primary instance. But that’s not possible with Aura Replicas because Aura Replicas use the same data storage layer as the primary instance.

And remember, in RDS MySQL, Replicas can be made writable by setting the read only parameter to zero. So you can definitely have a different data and schema in RDS Replicas, but that’s not possible with Aurora Replicas. All right, so let’s continue.

  1. Comparison of RDS deployments

Now let’s compare different audios deployments. Okay, so we’re going to compare read Replicas Multiac deployments that is in a single region and multiregion deployments. So we’re going to talk about RDS as well as Aura with the purpose of these deployments read Replicas the main purpose is scalability. So you use read Replicas ask for read scaling. For multiaz, the purpose is high availability, so if one instance goes down, the second one can take over. Whereas multiregion deployments are typically used for Dr and for performance purpose. So if there is a regional failure, the multiregion deployments will allow you to recover quickly from those failures.

Replication method in case of read Replicas is asynchronous, it’s eventually consistent. Same case with multiregion deployments, the replication is asynchronous in case of multi easy deployments. In RDS we have synchronous replication to the Standby instance. And remember, the Standby is a passive instance, it’s not an active instance. And in case of Aurora, even if it is multiaz, we have asynchronous replication. In terms of accessibility, all the read Replicas can be used for read scaling. But in case of multiaz RDS deployments, the RDS standby instance is not accessible when the primary instance is active. The Standby instance will be accessible only when the primary goes down.

But in case of Aura this is not the case because in Aurora read Replicas act as failover targets. So replicas are always available. And in case of a failover, the primary can fail over to a Replica. So it’s an active active configuration. And in case of multiregion deployments, all regions can be used for reads. And talking about the automated backups, backups on read Replicas are not configured by default. In case of Multiac RDS deployments, automated backups are always taken from the Standby instance so as to improve the performance of the primary instance. And in Aurora, the backups are taken from the storage layer so there is no impact on the compute layer.

And in multiregion deployments, backups can be taken in each region. Now, talking about the instance placement, read Replicas can be placed within AZ, cross AZ or cross region. For multiaz, you at least need two AZ within the region. So that’s the definition of multi AZ anyways. And for multiregion, each region can be multi AZ. All right, so in each region you can have a multiaz set up. Talking about upgrades, where is the RDS? The upgrades on the read Replicas are independent of the source instance. In terms of multiaz deployments, upgrades happen on primary. So the way it happens is the Standby is first upgraded and then the primary fails over to the standby.

So that’s how the upgrades happen on a Multiac RDS set up. And in terms of multiregion RDS, the upgrades happen independently in each region. In case of Aura though, all instances are upgraded together. And finally, talking about disaster recovery or Dry in RDS, the Replicas have to be promoted manually to be a standalone database instance this is not an automated process, but failover to a standby is, of course, automatic. Now, in case of aurora read replicas, you can promote any replica to a primary instance and you can also fail over to an aurora replica automatic in case of an outage. And in terms of multiregion deployments, you can always promote a secondary region to be the new master. And remember that cross region replica promotion is a manual process, it’s not an automated process. All right, so that’s about it. Let’s continue to the next lecture.

  1. How to invoke Lambda functions from Aurora MySQL

In this lecture, let’s talk about how to invoke lambda functions from aura MySQL. So to do this, you have to give your aura MySQL access to lambda. And you can do that by setting a DB cluster parameter called AWS default lambda role. And you have to set this particular parameter with the appropriate IA am role arm. And once you do that, you can then use a stored procedure or a built in native function to invoke lambda functions.

The first option using the stored procedure is you simply call the stored procedure and pass the lambda function arm and the JSON payload to it. And typically you would wrap your calls in another stored procedure and call that stored procedure through triggers or through your application code. And this option is deprecated. So typically you would want to use the native functions supported in aura MySQL, and these are lambda sync and lambda async.

And to use these native functions, users must have invoke lambda privilege. And you can grant this privilege using the grant SQL statement something like this. So grant invoke lambda on startup star to a particular user or something like that. And calling these native functions is very easy. You simply call the lambda syncs or lambda async functions and pass the lambda, the lambda function arm and the JSON payload to these native functions. So a typical use case could be you want to send some notifications from your aura MySQL clusters. You can trigger a lambda function and the lambda function can then send certain notifications through SNS, so the users can receive emails or text messages from SNS. All right, let’s continue.

  1. How to load data from S3 into Aurora MySQL

Now let’s look at how to load data from S three into your aura MySQL cluster. So to do this, you can use SQL statements like load data from S three or load XML from S three. So you have your S three bucket, and you simply call these SQL statements to load data from S three into your aura MySQL. So an example could be something like this your data from S three followed by the path to your file sitting in S Three into your aura table. Name fields terminated by comma lines terminated by N. So this indicates that you’re using a CSV file or a comma separated file, and that followed by the columns into which you want to load your data. So that’s a simple example of loading data from S three into your aura MySQL table.

And to use these SQL statements, you must give your aura cluster access to S three. And you can do that by setting a DB cluster parameter. You can either do that using the parameter aura load from S three Role or AWS default S three role, and you can set these parameters with the value of appropriate Im role ARN that grants the cluster access to S three. And in addition, the user must also have load from S three privilege.

And you can use grant statements to provide this privilege to your users. And to grant this privilege, you can simply use the grant statement like grant load from S three on this table to this user. All right, that’s about it. Let’s continue to the next lecture.

  1. RDS / Aurora – Good things to know

All right, so in this lecture, we’re going to talk about some of the good things that you should know about RDS and Aura. First, something about stopping your RDS database. So you can stop your RDS instance only if it does not have a Replica. So you cannot stop an RDS instance if it has a Replica. And if you try to do that, you’re going going to run into error that looks something like this. Cannot stop or start a read Replica instance. And you cannot stop an RDS Replica either. You can only stop an RDS instance if it doesn’t have a Replica. And if you’re using Microsoft SQL Server instance on RDS, then you cannot stop it if it’s in a multiaz deployment. All right, now let’s talk about stopping an Aura database. So you can only stop the Aura cluster, but not the individual database instances. And if you’re using Aura Serverless, you cannot manually stop it.

We already know that Aura Serverless supports automatic pods, so it’s going to anyway stop if there is no activity on the database. So you cannot manually stop an Aura Serverless cluster. And in a similar fashion, if you’re using Aura Multimaster or Aura Global Database, you cannot stop them either. And if you have Aura Parallel Query Enabled, you cannot stop that cluster as well. And remember, if you have stopped your Aura Database cluster, then you have to start it first before you can delete it. And once you stop a cluster, you should start it again within seven days. And if you don’t do that, then RDS is going to automatically start that cluster.

All right, then maximum connections in RDS or in Aura is controlled via parameter groups. And each database engine has a particular formula to define the default value for max connections parameter. So, as you can see, Aura has a particular formula, aura Postgres has a particular formula, and RDS MySQL also has a particular formula. You can, of course, override this value using a custom parameter group, but ideally you would want to scale your instance to get higher max connections. All right, so with that, we come to the end of this section. I know this was a long section, but a very important one. So that’s about it. And let’s continue to the next section.

img