Amazon AWS Certified Database Specialty – Amazon DynamoDB and DAX Part 4

  1. DynamoDB Accelerator (DAX)

In this lecture, we are going to look at DAX, or DynamoDB Accelerator. So DynamoDB Accelerator, or DAX, is an in-memory caching service that sits on top of DynamoDB. And we already know that DynamoDB provides us with single-digit millisecond latency. If that's not enough, and your application requires even faster performance, you can use DAX on top of your DynamoDB tables. And that's going to give you ultra-low latencies, in the order of microseconds. A DAX cluster sits in between your DynamoDB table and the client applications.

In effect, it acts as a proxy. It can save you a lot of cost, because you reduce the read load on DynamoDB. So essentially, you save on the RCUs (read capacity units) that your DynamoDB table consumes. It also helps you prevent hot partitions. And when you use DAX, you require minimal code changes to your application. So, for example, if you have been using DynamoDB and later on, at some point in time, you decide that you want to use DAX, then you really don't need a lot of code change in your application. All you need is to point your application from the DynamoDB endpoint to the DAX endpoint, and DAX takes care of the rest. So roughly one line of code change is all that you need to start using DAX.
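To make that concrete, here is a minimal sketch of the switch, assuming the amazondax client library for Python; the cluster endpoint and table name are placeholders, not real values:

```python
import boto3
from amazondax import AmazonDaxClient

# Before: requests go straight to the DynamoDB endpoint
dynamodb = boto3.resource("dynamodb")

# After: the same DynamoDB API calls, now routed through the DAX cluster.
# The endpoint below is a placeholder -- use your own cluster's endpoint.
dax = AmazonDaxClient.resource(
    endpoint_url="daxs://my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com"
)

table = dax.Table("my-table")             # hypothetical table name
table.get_item(Key={"pk": "user#1"})      # served from DAX when cached
```

The rest of your data-plane code stays exactly the same, which is the whole point of the one-line change.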

Now, an important thing to remember here is that DAX only supports eventual consistency. If your application requests strongly consistent reads, then that request will be passed directly to DynamoDB and will not use DAX. And DAX is not made for write-heavy applications. It's only intended for read-heavy applications. It will reduce the read load on your database and improve your application's performance. The DAX clusters sit inside your VPC. And DAX supports Multi-AZ, so you can have multiple nodes in different Availability Zones. For production use, at least three nodes are recommended. And again, just like any other AWS service, DAX is secure. It uses encryption at rest with KMS, it already runs inside a VPC, it uses IAM for access control, and CloudTrail for auditing purposes. So in the next lecture, let's look at the DAX architecture. Alright?

  1. DAX architecture

Now let's look at the DAX architecture. DAX internally has two types of caches: the item cache and the query cache. These are two independent caches that are present in the DAX cluster. The item cache is the one that stores your individual item reads. So whenever you make a GetItem or BatchGetItem request, in other words, a point read using the primary key, the result of that request gets stored in the item cache. The default TTL is about 5 minutes.

And you specify this when you create your DAX cluster. When the cache becomes full, the older and less popular items will get removed. You can adjust the TTL as per your needs, but this is the amount of time any item lives in the DAX cache. Then the second cache is the query cache. As the name suggests, it stores the results of your Query and Scan operations. The default TTL here as well is about 5 minutes.
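Here is a small illustrative sketch of which cache each call lands in, using the same hypothetical endpoint and table names as before:

```python
from amazondax import AmazonDaxClient
from boto3.dynamodb.conditions import Key

# Placeholder endpoint and table name, as in the earlier sketch
dax = AmazonDaxClient.resource(
    endpoint_url="daxs://my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com"
)
table = dax.Table("my-table")

# GetItem / BatchGetItem results go into the ITEM cache
table.get_item(Key={"pk": "user#1"})

# Query / Scan results go into the QUERY cache, keyed by the query's
# parameters, independently of the item cache
table.query(KeyConditionExpression=Key("pk").eq("user#1"))
```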

And what's important to note here is that any updates that you make to the item cache or to the underlying DynamoDB table do not invalidate the query cache. The item cache and the query cache are independent of each other, so the TTL value of the query cache should be chosen accordingly. Think of it this way. Let's say you updated an item in DynamoDB, and that updates your item cache, but the query cache will not be updated due to that write operation. The query cache will still hold the old, stale copy of the data. And that's why you should really keep the query cache TTL low, so you do not end up reading stale data.
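If you want to lower the query cache TTL, you can do it through the cluster's parameter group. A hedged sketch with the boto3 DAX management client; the parameter group name is hypothetical, and the TTL parameters are expressed in milliseconds:

```python
import boto3

dax_mgmt = boto3.client("dax")
dax_mgmt.update_parameter_group(
    ParameterGroupName="my-dax-params",   # hypothetical group name
    ParameterNameValues=[
        # Keep the query cache short-lived (1 minute) to limit stale reads
        {"ParameterName": "query-ttl-millis", "ParameterValue": "60000"},
        # Leave the item cache at the 5-minute default
        {"ParameterName": "record-ttl-millis", "ParameterValue": "300000"},
    ],
)
```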

  1. DAX operations

In this lecture, we are going to look at how DAX operates. We have already discussed that DAX sits in between the client and your DynamoDB table. And DAX is only meant for item-level operations. Table-level operations still go through to DynamoDB, so they will not be processed by the DAX cache. By table-level operations, I mean creating or dropping a table and things like that. And when you make write requests, these write requests use something called the write-through approach. What that means is, whenever you write any data to your table, it will first be written to DynamoDB and then it will be updated in DAX. The write operation is considered successful if and only if both writes are successful. And there is another approach, called the write-around approach, where you bypass DAX and write directly to your DynamoDB table. This write-around approach is sometimes useful when you want to write a large amount of data in a short amount of time, and you really don't care if it's not updated in the DAX cache.
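As a rough sketch, here is what the two write paths look like in code, again assuming the hypothetical DAX endpoint and table from before:

```python
import boto3
from amazondax import AmazonDaxClient

# Placeholder endpoint, as in the earlier sketches
dax = AmazonDaxClient.resource(
    endpoint_url="daxs://my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com"
)
item = {"pk": "user#1", "name": "Alice"}

# Write-through: the PutItem goes through DAX, which writes to DynamoDB
# first and then updates its item cache; it succeeds only if both succeed.
dax.Table("my-table").put_item(Item=item)

# Write-around: bypass DAX and write straight to DynamoDB. Faster for bulk
# loads, but the DAX item cache goes stale until the TTL expires.
boto3.resource("dynamodb").Table("my-table").put_item(Item=item)
```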

So in this case, of course, the item cache of the DAX cluster is going to go out of sync. Now let's understand how DAX functions when you make read operations. When you make a read request and DAX has the data you requested, it's called a cache hit, and the data will simply be returned from the DAX cluster; there won't be any request going through to the DynamoDB table. On the other hand, if DAX doesn't have the data, which we call a cache miss, then it will be returned from DynamoDB and it will also be written to DAX on the primary node. So this is how a cache miss works. And an important thing to remember here is, if you make a strongly consistent read, that request will be served directly from DynamoDB, and a strongly consistent read request will not update the DAX cluster. This is important to note: the DAX cluster does not get updated when you make a strongly consistent read request.
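In code, the difference is just the ConsistentRead flag; a sketch with the same hypothetical table:

```python
from amazondax import AmazonDaxClient

# Placeholder endpoint and table name, as before
dax = AmazonDaxClient.resource(
    endpoint_url="daxs://my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com"
)
table = dax.Table("my-table")

# Eventually consistent (the default): a cache hit is served by DAX;
# a cache miss is fetched from DynamoDB and cached on the primary node.
table.get_item(Key={"pk": "user#1"})

# Strongly consistent: DAX passes this straight through to DynamoDB
# and does NOT cache the result.
table.get_item(Key={"pk": "user#1"}, ConsistentRead=True)
```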

All right. ElastiCache is another in-memory caching service, and later in this course we are going to dive deeper into ElastiCache. But here I'm just going to compare DAX versus ElastiCache when used with DynamoDB. So we have a client which is accessing DynamoDB through the DAX cache. Generally, with DynamoDB, DAX is the better choice over ElastiCache. We use DAX to cache individual items or the results of Query or Scan operations. But if you are processing aggregation results, for example, your application is compiling some aggregation results using the DynamoDB data, then that data cannot be stored in DAX. DAX can only store data that is returned from a DynamoDB table in response to DynamoDB API requests. So if you have any other data, like aggregation results for example, you can use a cache like ElastiCache to store them. All right, that's about it. Let's continue to the next lecture.

  1. Implementing DAX – Hands-on

Now let's understand how we implement DAX. To implement DAX, we simply create a DAX cluster. A DAX cluster is a group of one or more nodes, and you can have up to ten nodes per cluster. Each node is an instance of DAX. One of the nodes will be the master or primary node, and the remaining nodes will act as read replicas. DAX internally handles load balancing between these nodes, so you don't have to worry about how that works. And AWS recommends using at least three nodes if you're using DAX in production. Now, let's quickly go into the AWS console and see how easy it is to create a DAX cluster. All right, here I am in the DynamoDB dashboard in the AWS console, so let's go and create a DAX cluster. From the left menu, go to the DAX dashboard. You can use the video on the right to learn more about DynamoDB Accelerator, but I'm not going to do that right now; you can do it yourself. So this is the DAX dashboard, and you simply use the Create cluster button to create your DAX cluster.

So you simply give it a name, for example, my-dax-cluster. Then you choose a node type depending on your requirements. Since this is a demo, I'm going to choose the smallest node type. And you can choose the cluster size. You can use up to ten nodes, and the recommendation is to create at least three nodes, so I'm going to leave it at three. We can enable encryption if you like. And here you have to provide an IAM service role for DynamoDB to use with DAX. AWS will create this role for you if you simply choose Create new. All right, just give the role a name, so you can say my-dax-role, and this role will be created automatically. You can provide a DAX subnet group, or you can also create one if you like.

"my-dax-subnet-group", something like that. And you can provide a description, similarly, something like "DAX subnets". And you can choose your VPC, or you can create a new VPC and provide it here, and choose the subnets. You generally choose at least three subnets, and then you can also choose a security group here. I'm going to go with the default one. And for the cluster settings, I'm going to use the default settings. But if you want to change these settings, you can simply untick this checkbox and provide the additional settings. The important part here is the parameter group. You can choose a parameter group or create a new one, and the parameter group is the one that decides your TTLs. Okay, so I'm not going to change any of these. I'll leave it at the defaults and launch the cluster. Now, it's going to take a few minutes for the cluster to launch, and once it's available, we should be able to see it in the DAX dashboard.
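If you prefer to script these same steps, here is a hedged sketch using the boto3 DAX management client; all of the names, subnet IDs, and the role ARN are placeholders from this demo:

```python
import boto3

dax_mgmt = boto3.client("dax")

# Subnet group first (placeholder subnet IDs, ideally in three AZs)
dax_mgmt.create_subnet_group(
    SubnetGroupName="my-dax-subnet-group",
    SubnetIds=["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"],
)

dax_mgmt.create_cluster(
    ClusterName="my-dax-cluster",
    NodeType="dax.t3.small",                 # small node type for a demo
    ReplicationFactor=3,                     # three nodes, as recommended
    IamRoleArn="arn:aws:iam::123456789012:role/my-dax-role",  # placeholder
    SubnetGroupName="my-dax-subnet-group",
    SSESpecification={"Enabled": True},      # encryption at rest
)
```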

All right, now we can see that the cluster is creating. And while the cluster is getting created, let's go and take a look at the parameter groups. The parameter group that was associated with our cluster is this one. So I'm just going to open it, and you can see that we have a query TTL and an item TTL. These are the only two parameters that are available for your configuration. Whenever you want to change them, you can simply edit and provide your TTL in minutes, seconds, hours, or days. Generally you put them in minutes. So the default TTL is five minutes.

But depending on your application's requirements, you can increase or decrease the TTL. Remember that the query TTL and item TTL are different, and they don't affect each other, because the query and item caches are independent of each other. Right? So that was about the DAX parameter groups. Going back to clusters, we can see that it's still creating, so I'm going to pause the video here, and we'll come back once this DAX cluster is ready. All right, now we can see that our DAX cluster is available. You can go ahead and click on the cluster name, and it will show us all the details about the cluster. So here we have our cluster endpoint. This is the endpoint you should use when you make your DynamoDB requests. The way to use the DAX cluster is simply to redirect your DynamoDB API requests to the DAX cluster endpoint. Instead of sending your requests to the DynamoDB endpoint, you simply send them to the DAX cluster endpoint. And that's the only change you need to make in your application code to get the DAX cluster implemented in your application. If you go to the Nodes tab, you can see that the three nodes are available here.
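You can also look up the cluster endpoint programmatically rather than copying it from the console; a hedged sketch, using the cluster name from this demo:

```python
import boto3

dax_mgmt = boto3.client("dax")
resp = dax_mgmt.describe_clusters(ClusterNames=["my-dax-cluster"])

endpoint = resp["Clusters"][0]["ClusterDiscoveryEndpoint"]
print(endpoint["Address"], endpoint["Port"])  # feed this to AmazonDaxClient
```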

And you can also use these individual node endpoints, but ideally you should only use the cluster endpoint that is shown here. Right, on the Metrics tab, you can see different CloudWatch metrics. Alarms are for the CloudWatch alarms; you can create and manage alarms here. Tags, again, are similar to the DynamoDB tags, and these are used to segregate your resources, typically for billing purposes. So that's about tags. And now that we are done with the demo, I'm just going to delete this cluster so we are not billed for it. You can also delete any alarms; anyway, we don't have any alarms here. So click on the Delete button, and the DAX cluster will be deleted in a while. So that's about it. Thank you so much, and let's continue to the next lecture.
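The same cleanup can be done in code, as a one-line sketch with the management client from before:

```python
import boto3

# Delete the demo cluster so we are not billed for it
boto3.client("dax").delete_cluster(ClusterName="my-dax-cluster")
```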

  1. DynamoDB backup and restore

In this lecture, we are going to look at the backup and restore features in DynamoDB. It's very easy to create backups in DynamoDB, and the restore process is equally easy. So let's take a look. Backups are automatically encrypted, they are cataloged, and they are easily discoverable; we are going to see that in a while. Backups are highly scalable. What that means is you can create and retain as many backups as you want, for tables of any size. There are no restrictions here. Backup operations complete in seconds, and the backups are consistent across thousands of partitions within seconds. The backup process does not consume any of the provisioned capacity of your table, which is really a good thing. Creating a backup does not cost you money; of course, storing a backup will cost you money, but creating it will not consume any provisioned capacity. The backup process does not affect your table's performance or its availability. And even if you delete the table, backups will still be retained. You can, of course, delete backups if you like. Now, you back up your tables in the same region as the original table, but when you restore, you can restore to the same region or to a different region. So cross-region restoration is possible.

Now, this backups feature is integrated with the AWS Backup service, so you can use AWS Backup to create periodic backups of your DynamoDB tables. This is really handy and a very operationally efficient way to manage your backups. When the AWS Backup service was not available, the only option was to create a Lambda function and schedule it using CloudWatch triggers to take backups. But now you can simply use the AWS Backup service instead, and that's much more efficient than using a Lambda function. Now, the important thing to note here is, when you restore a DynamoDB table from a backup, the restore can only be done to a new table. That means you have to provide a new name for your table. If you want to retain the original table name, the workaround is to delete the table first before running your restore operation. But remember that the restore operation is not immediate; it can take a long time, depending on the size of your table data. And you can use IAM policies for access control on your backups.
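Here is a hedged sketch of an on-demand backup and restore with boto3; the table and backup names are placeholders, and note that the restore targets a new table name:

```python
import boto3

ddb = boto3.client("dynamodb")

# On-demand backup (placeholder names)
backup = ddb.create_backup(TableName="my-table", BackupName="my-table-backup")
backup_arn = backup["BackupDetails"]["BackupArn"]

# The restore must target a table name that does not exist yet
ddb.restore_table_from_backup(
    TargetTableName="my-table-restored",
    BackupArn=backup_arn,
)
```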

Now, when we talk about backups, DynamoDB provides two types: on-demand backup and restore, and continuous backups with PITR (point-in-time recovery). On-demand backup and restore means you simply go into the console and create a backup, or use the API or CLI operations to create a backup of your DynamoDB table, while continuous backups with PITR is something that you enable on your DynamoDB table. Once you enable continuous backups with PITR, you can restore your table data to any second in the past 35 days. The restored table gets the same provisioned capacity as the original table at the time of backup. So whatever the capacity of the table was at the time of backup, that capacity will be applied to the restored table. When you use PITR, the RPO is approximately five minutes. What this means is, when you enable continuous backups with PITR, the latest restorable time can be about five minutes in the past. So once you start using it, you can of course restore to any second in the past 35 days, but the maximum amount of data loss would be about five minutes. RPO is the recovery point objective; it designates the amount of data loss, measured in time. When you say that the RPO is about five minutes, it simply means that you can lose a maximum of five minutes of data. Now, the RTO can be longer. RTO is the recovery time objective. This is the amount of time it takes for DynamoDB to restore your data to a new table. PITR restores, and even on-demand backups, always go to a new table, and the restore operation takes longer depending on the size of your data. So the RTO, or recovery time objective, will be longer.
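A hedged sketch of enabling PITR and restoring to a point in time; the table names and timestamp are placeholders:

```python
import boto3
from datetime import datetime, timezone

ddb = boto3.client("dynamodb")

# Enable continuous backups with PITR on the table
ddb.update_continuous_backups(
    TableName="my-table",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Restore to any second within the last 35 days, always to a NEW table
ddb.restore_table_to_point_in_time(
    SourceTableName="my-table",
    TargetTableName="my-table-pitr-restore",
    RestoreDateTime=datetime(2023, 6, 1, 12, 0, 0, tzinfo=timezone.utc),
)
```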

Now, RTO corresponds to the amount of time the restore process takes. All right. Now, when you run a restore operation, what are the things that get restored? First, your table data is restored, and your global and local secondary indexes get restored. If you don't want the GSIs or LSIs to be restored, you can choose the option to restore without the secondary indexes. The encryption settings also get restored to the original configuration, and you can change that configuration when running the restore operation. Then, the provisioned capacity that was recorded at the time of the backup will be applied to your table, and you can change it after the restore process is done. The billing mode will also have the same value that was there at the time of backup.

So if you had provisioned capacity, that will be applied to your new table. Or if you used on-demand mode, then the new table will be created in on-demand mode. Of course, you can change this after the restore process is complete. Now, there are certain things that do not get restored, and you have to set them up manually. And you should really know what these configuration options are from the examination perspective; you might get a question that asks you something around this. So what doesn't get restored, or what is it that you have to set up manually? First, the auto scaling policies and the IAM policies; you have to apply them after your table gets created. Then, the CloudWatch metrics and alarms do not get restored; you have to create them. The DynamoDB Streams and TTL settings also do not get restored. We are going to talk about Streams and TTL in a while. The tags that you have applied to your table do not get restored either. So these are the things that you have to set up manually after your table gets restored, as shown in the sketch below. Now let's go into a demo and see how easy it is to create a backup, and to restore from the backup as well.
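As an illustration, here is a hedged sketch of re-applying two of those settings, Streams and TTL, on the restored table; the table name and the TTL attribute name are hypothetical:

```python
import boto3

ddb = boto3.client("dynamodb")

# Re-enable TTL on the restored table (hypothetical attribute name)
ddb.update_time_to_live(
    TableName="my-table-restored",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Re-enable DynamoDB Streams on the restored table
ddb.update_table(
    TableName="my-table-restored",
    StreamSpecification={"StreamEnabled": True, "StreamViewType": "NEW_AND_OLD_IMAGES"},
)
```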
