Amazon AWS Certified Database Specialty – Amazon DynamoDB and DAX

  1. DynamoDB overview

Let’s start with DynamoDB. This is the database that got me started on my AWS journey, and it’s one of the AWS databases that provides ultra-low latencies and high performance at any scale. So let’s begin. DynamoDB is a non-relational database, a key-value store. It’s a fully managed, serverless NoSQL database from AWS. It’s fast, flexible, cost-effective, fault-tolerant, and secure. By fast, I mean it provides ultra-low latencies.

Flexible refers to the NoSQL type of structures that it supports, that is, semi-structured data. It’s cost-effective because you pay per use, as you go. Fault tolerance means your data is replicated across multiple AZs, and your data is also encrypted at rest and in transit, making it a secure database. DynamoDB also provides something called DynamoDB Global Tables, which is a multi-region, multi-master version of DynamoDB. It also provides you with backup and restore functions with PITR. PITR stands for point-in-time recovery, which allows you to restore your database to any second in the past 35 days. And when we talk about the latencies,

DynamoDB provides single-digit millisecond latency at any scale. And if we use DAX along with DynamoDB, we can get as low as microsecond latency. So that’s ultra-low latency. For writing to and reading from DynamoDB, we use APIs. There is no SQL interface; we use APIs to communicate with DynamoDB. And recently, DynamoDB has also started providing transactional support.

So it does provide ACID compliance if we use transactional consistency, and we’ll talk about that in a bit. Since this is a non-relational database, there is no support for analytical queries or table joins. You cannot query multiple tables at once, and you also cannot run analytical operations or aggregation functions like you do with relational databases. One of the important aspects of DynamoDB is table design: access patterns must be known ahead of time for efficient design and performance. You’ll get a better idea of what I mean when we create a table. So why don’t we create a table and see how easy it is to work with DynamoDB? And we’ll also have a look at the DynamoDB console in AWS. Let’s do that in the very next lecture.

  2. Working with DynamoDB – Hands on

In this lecture, we are going to do our first hands-on, and we’re going to create a DynamoDB table. So here I am in the DynamoDB console, and creating a DynamoDB table is pretty straightforward. Simply click on this Create table button. It will take you to a wizard where you can create a new DynamoDB table. There are only two things that are important here: the table name and a primary key. But as I mentioned in the last lecture, the access patterns must be known before you create your DynamoDB table. So we have to take a look at the access patterns that our database table is going to use before we can decide which attributes, or which fields, to create our primary keys as well as our indexes on.

So let’s say we have a requirement to store player data for different games. For example, in some online games, we have different players playing different game sessions. And for each player, we want to store the games that they have played along with, let’s say, the time at which the game was played, the outcome of the game (whether it was won or lost), the score, the game duration, and so on. So our first requirement is to store the data, so we should be able to retrieve the gameplay data of a particular player, as well as the historical data of the player. So all the games that this player has played in the past, we should be able to retrieve. That’s our first requirement.

And the second access pattern that we have is, let’s say, a gaming leaderboard, so we should be able to get the top scores of any particular game and display it in real time. So these are the two access patterns: first, to store and retrieve the game data for a particular player, and second, to get the gaming leaderboard for a particular game. In relational databases we have rows and columns, while in DynamoDB we have items and attributes. So an item is a record, and attributes are like the fields or the columns. Okay? So here we can have attributes like, say, user ID to designate the ID of the player, and game ID to designate the game session that’s being played. We can have the game timestamp, and we can have result, score, duration of the game, and so on. But as you can see here, we have only two input boxes: table name and primary key. As you know, DynamoDB is a non-relational database and it supports variable data structures.

So all any item needs is a primary key, and the rest of the attributes are optional. So we only provide a table name and a primary key when we create a DynamoDB table. Let’s call the table gameplay. And for the partition key, we can use user ID. And we can also have a sort key, let’s say game ID. So here we have a partition key and a sort key. The partition key determines how the data is partitioned in the table, and game ID, the sort key, designates how the data is sorted within each partition. So for each user, we can sort the data by the games that the player has played. Right, so this is our primary index, or primary key, and this is going to serve our first access pattern: it will allow us to get the game scores of any particular user, and it will also allow us to get the historical list of games played by a particular user. So that’s our primary key.

Now I’m going to uncheck these default settings. It will open up all the different configurations that we can set here. The second access pattern we have is to create a gaming leaderboard, so we want to get the top scores of a particular game. We can create an index on the game ID, and of course, we have to sort it by the score of a particular game. Let’s create an index here. This will be a secondary index. In this case, the partition key can be game ID, so we can get information about a particular game, and we want it to be sorted by, let’s say, score. So this is going to be an index, and this will be a global secondary index. There are two types of indexes, the local secondary index and the global secondary index, and we’ll come to that in a bit. For now, just know that we’re creating this index to help us retrieve the top-scores data for a particular game.
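As a sketch of what the console is doing for us, here are the parameters this same table definition would translate to in DynamoDB’s CreateTable API. The table and attribute names mirror this demo; the index name itself is my own illustrative choice:

```python
# Parameters for DynamoDB's CreateTable API, mirroring the console demo:
# primary key = user_id (partition) + game_id (sort), plus a global
# secondary index for the leaderboard, keyed on game_id and sorted by score.
create_table_params = {
    "TableName": "gameplay",
    "AttributeDefinitions": [
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "S"},
        {"AttributeName": "score", "AttributeType": "N"},
    ],
    "KeySchema": [
        {"AttributeName": "user_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "game_id", "KeyType": "RANGE"},  # sort key
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "game_id-score-index",  # illustrative name
            "KeySchema": [
                {"AttributeName": "game_id", "KeyType": "HASH"},
                {"AttributeName": "score", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
        }
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
}
# With boto3 this would be:
#   boto3.client("dynamodb").create_table(**create_table_params)
```

Notice that only the key attributes appear in AttributeDefinitions; non-key attributes like result or duration never need to be declared up front.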

So we simply add an index, and you can see that the type is GSI, which is global secondary index. Right? And then we have a read/write capacity mode. There are two modes: provisioned capacity, which allows us to use the free tier as well, and then there is an on-demand mode, and we’ll come to that in a bit. We can just remove auto scaling; we don’t need that. And I’ll just set this to one capacity unit. Okay, so we have the minimal estimated cost. This shows us the estimated cost, but if you don’t have any other tables in your account, this will in all probability be within the free tier. All right, we can keep the rest of the settings as they are and simply hit create. So this will create our first DynamoDB table. We have a couple of tables that I use for other courses; you can ignore those. So now you can see that our table has been created. That’s how easy it is to create a DynamoDB table.

So you can see different characteristics of the table here: the table name, the primary partition key, and the primary sort key. We also have the capacity units, which decide the pricing, and other information. And here we have the ARN of the table, which is a unique identifier of the table. On the Items tab, we can see the items of the table. So from here, you can add, remove, or view items. Let’s create an item quickly. So here you can say, let’s say, user one, and we can say game one. And the score, let’s say 98. All right? So this is a quick way, and you can add more attributes to it. For example, you can add result, which is not a mandatory attribute, and I can say Win, right? And we save. And we can similarly create another item. So user two, game one, and the score could be 50. And let’s say I don’t want to add a result here. We can still do that, and we save. This particular item does not have the result attribute. This is what I mean by variable data structures. Let’s say I want to edit this item here, and you can see the tree mode. You can change the mode to text, and you can see that it’s simply a JSON format.
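The two items we just created can be sketched as plain records, which is the shape the higher-level boto3 Table resource accepts. Note how the second item simply leaves out the optional result attribute:

```python
# Items for the "gameplay" table as plain key/value records. Only the
# primary-key attributes (user_id + game_id) are mandatory; every other
# attribute can vary from item to item.
item_1 = {"user_id": "user1", "game_id": "game1", "score": 98, "result": "Win"}
item_2 = {"user_id": "user2", "game_id": "game1", "score": 50}  # no "result" -- still valid

# With boto3:
#   boto3.resource("dynamodb").Table("gameplay").put_item(Item=item_1)
```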

All right? So data is actually stored in JSON format. And here we have three attributes. And if you go to the first one and switch to edit mode, you can see that this JSON has four attributes. This is a simple JSON view. And if you click on DynamoDB JSON, it will show you the structure as it is stored internally in the DynamoDB table. So we have the attribute game ID, its data type, which is string, and its value, which is game one, and so on and so forth. So this is how information is stored in a DynamoDB table. All right, then here on the Metrics tab, you can see the CloudWatch metrics of the table. On the Alarms tab, you can create different alarms; you can see some alarms have been created by default. Here you can set the capacity of the table, and we’ll discuss that later in the course. Here you can add or delete indexes.
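For reference, here is roughly what the DynamoDB JSON view shows for our first item: each value is wrapped in a one-key object naming its type descriptor ("S" for string, "N" for number; numbers travel as strings):

```python
import json

# The first item from the demo in "DynamoDB JSON", the low-level wire
# format: every attribute value is tagged with its data type.
item_ddb_json = {
    "user_id": {"S": "user1"},
    "game_id": {"S": "game1"},
    "score": {"N": "98"},   # numbers are encoded as strings on the wire
    "result": {"S": "Win"},
}
print(json.dumps(item_ddb_json, indent=2))
```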

Then this is Global Tables. So if you want to create a multi-region, multi-master table, then you can enable it here. From here, you can create or restore your backups or do your point-in-time recovery. Again, we’ll come to it later in the course. Then we have Contributor Insights; this is a diagnostic tool integrated with CloudWatch. Then here we have triggers, which allow you to run different Lambda functions based on certain events, right? And here we have access control; you can use this for creating your fine-grained access control policies. Again, we’ll come to it later in the course. And tags are typical of all the different AWS services; you use them to segregate your resources, typically for billing purposes. So that’s a quick overview of the DynamoDB console. Let’s continue to the next lecture.

  3. DynamoDB basics

Now, let’s quickly go over some of the basics of DynamoDB. First, the DynamoDB terminology comparison with SQL. In SQL we have tables, and in DynamoDB as well, we have tables. The only difference is that in SQL we can have tables within a database, while DynamoDB tables are top-level entities. So in DynamoDB we don’t have a concept of a database. Then, in SQL we have rows and columns, and in DynamoDB we have items and attributes instead. Primary keys in SQL databases, or relational databases, can be on multiple columns and they are optional. But in DynamoDB primary keys are mandatory, and we can have one or at most two attributes in the primary key.

So we can either have just a partition key, or we can have a combination of a partition key and a sort key. And primary keys being mandatory is one of the things that helps DynamoDB deliver consistent high performance at any scale. Then, in relational databases we have indexes; in DynamoDB we call them local secondary indexes. Similarly, in relational databases we have views, and in DynamoDB we have global secondary indexes instead. Now, DynamoDB tables are the top-level entities in DynamoDB, so we don’t have a concept of databases, and there are no inter-table relationships. In DynamoDB, the tables are always independent of each other.

This allows us to control the performance at the table level, so you can control the performance of your tables independent of the other tables in your account. DynamoDB table items are stored as JSON, DynamoDB-specific JSON to be precise, and we have seen it in the hands-on demo. Then, primary keys are mandatory and the rest of the schema is flexible, so the only attributes that are mandatory are the primary key attributes, and the rest of the attributes are optional.

And you can add or remove them at runtime as and when you like. A primary key can be simple or composite. A simple primary key means the primary key is composed of only the partition key. The partition key is also called the hash key. A composite key has two attributes: a partition key and a sort key. The sort key is also known as a range key. And non-key attributes, including the attributes that are part of the secondary indexes, are always optional. So here is an example of a simple primary key: it only has a partition key, and the rest of the attributes are optional. And this is an example of a composite primary key: we have a primary key composed of a partition key and a sort key, while the rest of the attributes are optional.

Then, data types in DynamoDB. There are three categories of data types supported in DynamoDB. First are the scalar types. This means you have one key and one value; the key is the attribute name, while the value is the value of the attribute. So we have exactly one value. Examples are string, number, binary, Boolean, and null. And key or index attributes only support three scalar types: string, number, and binary.

So if you’re using any attributes as primary keys or as attributes of the secondary indexes, then those attributes can only be strings, numbers, or binary. The second category is the set types. Here we have one key and multiple values, so you can think of these as arrays. For example, you can have a string set, a number set, and a binary set. So you can have an array of strings, an array of numbers, or an array of binary values. And the third category is the document types. These are simply complex JSON structures with nested attributes. Examples are a list and a map.
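To make the three categories concrete, here is a sketch of a few attributes of each kind, written as the low-level type descriptors the DynamoDB API uses (the attribute names and values are illustrative):

```python
# Scalar types: one key, exactly one value.
scalar_attrs = {
    "name": {"S": "Alice"},       # string
    "age": {"N": "30"},           # number (encoded as a string)
    "active": {"BOOL": True},     # Boolean
    "nickname": {"NULL": True},   # null
}

# Set types: one key, multiple values of the same scalar type.
set_attrs = {
    "tags": {"SS": ["pro", "ranked"]},    # string set
    "high_scores": {"NS": ["98", "87"]},  # number set
}

# Document types: nested JSON structures.
document_attrs = {
    "history": {"L": [{"S": "game1"}, {"S": "game2"}]},               # list
    "profile": {"M": {"country": {"S": "IN"}, "level": {"N": "7"}}},  # map
}
```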

You can simply think of the document types as nested JSON documents. And finally, let’s quickly look at the AWS global infrastructure. Now, this is not specific to DynamoDB, but I’m using it to help you understand how DynamoDB ensures high availability. So we have the AWS Cloud, and the AWS Cloud has different AWS regions across the globe. So we have multiple regions, and each region has multiple Availability Zones. An Availability Zone is a geographically separate area within an AWS region. So a region can have multiple Availability Zones, and each Availability Zone can then have multiple data centers. So here we have multiple data centers in different Availability Zones.

And DynamoDB automatically replicates your data between multiple facilities within an AWS region. It basically puts your data into at least three different facilities within different Availability Zones. For example, when we create a new table, the table data is automatically replicated across at least three different facilities, and this is real-time replication. This allows DynamoDB to offer high availability, as the AZs act as independent failure domains. So here we have data replicated across at least three different data centers within multiple Availability Zones.

  4. DynamoDB consistency

In this lecture, we are going to look at the DynamoDB consistency model. We have read consistency and write consistency. For read consistency, we have three types: strong consistency, eventual consistency, and transactional consistency. And then we have write consistency, where we have standard consistency and transactional consistency. So what exactly are these consistencies? First, strong consistency. When you read from a DynamoDB table using strong consistency, you are bound to get the most recent, or most up-to-date, data. And when you make the read request, you must explicitly mention that you want to use strong consistency.

Then we have eventual consistency. This may or may not return the latest copy of the data, and we’ll see what I mean in just a bit. This is the default consistency for all read operations, and it is 50% cheaper than strong consistency. So DynamoDB charges you twice as much if you use strong consistency as compared to eventual consistency. And then we have transactional consistency, which you can use for reads as well as writes. And this is something that helps you get ACID compliance with your transactions.

Transactional consistency comes at twice the cost of your strongly consistent reads, or twice the cost of your standard writes. All right, so let’s understand how strong and eventual consistency work. So we have an application here, and we have different DynamoDB servers. As we have seen earlier, DynamoDB replicates your data across at least three different facilities.

So whenever you write something to a DynamoDB table, it is written to one of the facilities, and it is then replicated to the other two facilities. And whenever you request a read operation, the data will be returned from one of these facilities. If you’re using an eventually consistent read, then data will be returned from one of the facilities at random. It’s possible that you just wrote something to a facility, and if you read it from some other facility before that data was replicated, you might get a stale copy of that data, which may not be current.

So this is how eventual consistency works. Then you have strong consistency. If you ask DynamoDB to read some data with strong consistency, it will ensure that you always get the latest copy of the data. So if you write to a facility, DynamoDB will most likely return your copy of the data from that same facility to ensure a strongly consistent read. Now, as we have discussed, by default DynamoDB uses eventually consistent reads. But if you set the ConsistentRead parameter in your GetItem, Query, or Scan requests, or in other words in your read requests, then you can get a strongly consistent read. And now let’s look at transactional consistency, or DynamoDB transactions.
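As a small sketch, here is how that ConsistentRead flag would appear in a GetItem request (reusing the gameplay table from the hands-on demo):

```python
# GetItem request parameters in both flavors. ConsistentRead defaults to
# False (eventually consistent); setting it to True requests a strongly
# consistent read, at twice the read-capacity cost.
eventual_read = {
    "TableName": "gameplay",
    "Key": {"user_id": {"S": "user1"}, "game_id": {"S": "game1"}},
    # no ConsistentRead key -> defaults to an eventually consistent read
}
strong_read = dict(eventual_read, ConsistentRead=True)

# With boto3:
#   boto3.client("dynamodb").get_item(**strong_read)
```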

Now, this is where DynamoDB provides you with ACID compliance. If you want your DynamoDB table to be ACID-compliant, then you can request transactional consistency. Transactional consistency allows you to write to multiple tables at once, or not at all. So that’s what we mean by transactional consistency. For example, you have an account balance table here and a bank transactions table here. So whenever there is a transaction in the bank transactions table, the account balance should get updated in the account balance table. You can’t have a transaction being written to the bank transactions table without the account balance being updated in the account balance table; that might result in inconsistency of the data. So transactional consistency ensures that the data is written to both these tables, or to none of them. That is the meaning of transactional consistency.
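The bank example above can be sketched as a single TransactWriteItems request, where both writes succeed together or fail together. All table names, keys, and amounts here are illustrative:

```python
# A TransactWriteItems request pairing the transaction insert with the
# balance update: DynamoDB applies both, or neither.
transact_params = {
    "TransactItems": [
        {
            "Put": {
                "TableName": "bank_transactions",  # illustrative name
                "Item": {
                    "txn_id": {"S": "txn-1001"},
                    "account_id": {"S": "acct-1"},
                    "amount": {"N": "-250"},
                },
            }
        },
        {
            "Update": {
                "TableName": "account_balance",  # illustrative name
                "Key": {"account_id": {"S": "acct-1"}},
                "UpdateExpression": "SET balance = balance + :delta",
                "ExpressionAttributeValues": {":delta": {"N": "-250"}},
            }
        },
    ]
}
# With boto3:
#   boto3.client("dynamodb").transact_write_items(**transact_params)
```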

  5. DynamoDB pricing model

Now let’s look at the DynamoDB pricing model. We have seen that DynamoDB provides two pricing options: provisioned capacity and on-demand capacity. First, provisioned capacity. Here you pay for the capacity that you provision; that is, you provision the number of reads and writes per second that your application requires, and you pay accordingly. You can also use auto scaling to adjust the provisioned capacity on demand. Provisioned capacity uses something called read capacity units and write capacity units; these are the units in which your provisioned capacity is calculated. And if you consume beyond your provisioned capacity, that might result in throttling. Now, I say it might result in throttling; it doesn’t always, because DynamoDB also provides some safeguards in the form of burst capacity and adaptive capacity.

And we are going to look at that in one of the upcoming lectures. Along with provisioned capacity, you can also use what is called reserved capacity. That’s where you purchase your provisioned capacity in bulk for a period of one to three years, and then you get a huge discount on your provisioned capacity. So you’re charged a one-time fee, and then you pay an hourly fee per 100 RCUs and WCUs, that is, read capacity units and write capacity units. So if you can predict your table’s requirements for the next one year or the next three years, you can definitely reduce your DynamoDB bill by using reserved capacity on top of the provisioned capacity.
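To get a feel for the math, here is a rough sketch of the capacity-unit arithmetic based on DynamoDB’s published formulas: one RCU covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs half), and one WCU covers one write per second of an item up to 1 KB. The exact per-item rounding AWS applies can differ slightly from this simplification:

```python
import math

def required_rcus(reads_per_sec, item_kb, eventually_consistent=False):
    """Approximate RCUs needed: each read consumes ceil(item_kb / 4) units,
    halved (rounded up) for eventually consistent reads."""
    rcus = reads_per_sec * math.ceil(item_kb / 4)
    return math.ceil(rcus / 2) if eventually_consistent else rcus

def required_wcus(writes_per_sec, item_kb):
    """Approximate WCUs needed: each write consumes ceil(item_kb / 1) units."""
    return writes_per_sec * math.ceil(item_kb / 1)

# 10 strongly consistent reads/sec of 6 KB items -> 10 * ceil(6/4) = 20 RCUs
print(required_rcus(10, 6))        # 20
# same traffic, eventually consistent -> half as much
print(required_rcus(10, 6, True))  # 10
# 10 writes/sec of 1.5 KB items -> 10 * ceil(1.5/1) = 20 WCUs
print(required_wcus(10, 1.5))      # 20
```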

Then we have on-demand capacity. Here we don’t have the concept of provisioning capacity units. Instead, you pay per request, that is, for the number of read and write requests that your application makes. So there is no need to provision any capacity units, and this is especially good for uneven workloads. DynamoDB instantly accommodates your workloads when they ramp up or ramp down very fast. On-demand capacity uses something called read request units and write request units. These are similar to the read capacity units and write capacity units, that is, RCUs and WCUs, but they are called read request units and write request units, that is, RRUs and WRUs. All right? And you cannot use reserved capacity with the on-demand mode.

If you’re using the on-demand mode, then you cannot use the one-year or three-year term contract on your table. And we already talked about this: the request units are equivalent to capacity units for the purpose of throughput calculation. And apart from the consumption, you also pay for storage, backups, replication, streams, caching, and data transfer. All right, then here we have a graph that represents the provisioned capacity mode.

The red horizontal line is the capacity that you have provisioned on the table, whereas the curve that you see is your actual consumption. So you can see that there are two or three periods where you are going above the provisioned capacity. But if these bursts are short and narrow, then they often get tolerated, and how that happens, we are going to take a look at in a little while. But if the bursts are tall and wide, they may not get accommodated.

So this can result in throttling, and this is where auto scaling can help. So you can use auto scaling along with the provisioned capacity mode to accommodate these larger bursts of capacity. On the other hand, the on-demand capacity mode works a little differently. We don’t provision any capacity in the on-demand mode, so the curve here represents the number of requests. You just pay for the number of requests that your application makes on the DynamoDB table. So what do you pay for? In the case of the provisioned capacity mode, the shaded region, that is, what you have provisioned, is what you pay for. And in the on-demand mode, you pay for the number of requests that your application makes at runtime. So this is what you pay for.

All right? So this is how both of these modes work. With provisioned capacity, you know beforehand how much you’re going to pay, but that can also result in throttling. With the on-demand mode, you generally won’t see a lot of throttling, but you cannot know beforehand how much consumption your application is likely to make. So if you have a huge traffic spike in your application, then you’re of course going to pay a little more, but that prevents any throttling that could occur. There are certain situations where the on-demand mode also gets throttled, and we’re going to talk about that in a bit.
