98-364 MTA Microsoft Database Fundamentals – Querying and Manipulating Data Part 3

  1. Querying Databases – Ordering

Welcome back. In this video, we’re going to look at ordering data using Transact-SQL. Data within a table can be physically sorted and stored in one specific way, and that completely depends on the clustered index key columns. These are, by default, the primary key columns. The problem is, we don’t always want to retrieve data in that specific way.

Say we have a student’s table in which the student ID is the primary key column. This means that the data is physically sorted and stored using the student ID column. So, whenever we query the table, the data and how it is stored will be returned. This creates a problem if we want to see an overview of all the students sorted alphabetically by last name or first name. This is where the “order by” clause comes in. We use the “order by” clause to order the result based on one or multiple columns. Here we’re selecting only the employee ID, last name, and first name from the employees table. As you can see, the result is ordered by the employee ID column, and that’s because this column is the primary key as well as the clustering key. If we click on Indexes and expand it, you’ll see a list of all the indexes we have for this table.

For now, we’re going to look at the properties of the clustered index. Here, you can see that the employee ID is the clustered index key column. Another important thing is the sort order. The sort order defines the way the data will be stored. We have the ascending sort order for this numerical column. This means that the rows are stored from the smallest to the largest numerical value in the employee ID column. Don’t worry too much about indexes.

We have a dedicated section where we will be covering them. For now, just know that data is physically ordered and stored in one specific way, on the basis of the clustering key. In the next query, we’ve used the “order by” clause to order the result by the last name column. The order by clause is always at the end of a transact-sql statement. If we execute the query, we get the results alphabetically ordered by the last name. This is the ascending sort direction and is also the default. When using the “order by” clause, We could have also explicitly specified to use the ascending sort direction. If we execute the query again, we get the same result. We also have the descending sort order. The descending order is literally the opposite of the ascending order.

So instead of going from A to Z, we’re going from Z to A. So if we execute this query, we get a result where the data is ordered by the last name in descending order. The next query orders data by multiple columns, namely the last name and the first name column. Since we haven’t specified any sort direction, the data will be ordered by both columns in ascending order. If we look at the result, it’s clear to see that the data is ordered by the last name. For the first name, it might not be that obvious. When we order by multiple columns, there’s actually an order in which the sorting occurs, and that’s based on the position of the columns in the order by clause. Since we’ve added the last name first, the data will first be ordered by the last name and then by the first name.

So if we look at the result, especially for Rose, where we have similar last names, you’ll see that the first name is ordered alphabetically. For example, see Dodsworth and King. We have Anne, followed by Sarah. For Dodsworth. And we have Alvin, followed by Mitch, and then Robert. Because we have omitted the sort direction, the data is by default ordered alphabetically by the last name and first name. We can also explicitly specify the sorting direction.

If we execute it again, we get the same result. Another thing we can do when ordering by multiple columns is use different sorting directions, as can be seen in the following query. Executing the query gives us the following result: If we again look at Dodsworth and King, you’ll see that for Bosworth we now have Sarah followed by Anne, and for King we now have Robert followed by Mitch and then Alvin. Dates and numerical values can also be ordered in the next query. We are ordering the employees by the higher date column in an ascending manner.

This results in the oldest, higher dates being at the top of the result set, while the most recent dates are at the bottom of the result set. If we were to change the sorting direction, we would get the opposite. Another thing I’d like to see in COVID is that we don’t necessarily have to order by a column that is in the select list. We could, for example, remove the higher date from the select list and still get a result that is ordered by the higher date. We’ve now covered the basics of ordering data. Thank you for watching. In the next video, we’re going to look at how we can group data. see you then.

  1. Querying Databases – Aggregate Functions

Welcome back. In one of the previous videos where we’ve covered aliasing, we created an expression that was calculated for each row, like a computed column. In this video, we’re going to look at performing calculations on an entire data set or an attribute of an entire data set. These calculations are done using aggregate functions. We are now going to look at some of the most commonly used aggregate functions. First, we have the count function. This function is used to count the number of rows in a table. The Count function accepts a single parameter. There are generally two ways we can use the count function. In the first one, we can pass an asterisk as a parameter, and this will return a single integer value equal to the number of rows in the table.

So if we execute this query, we get a result of 830 rows. If we head over to the Object Explorer and select the top 1000 rows from the Orders table, let’s remove the top 1000 and execute it again. We’ll now see a total of 830 rows in the table.

So it basically just counted all the rows in the table. This is the first way the Count function can be used. Secondly, we can pass in a column, and this will return a single integer value equal to the number of applicable values in that column. The reason I said applicable is because aggregate functions cannot perform aggregation on nulls. So if there is a column with null values, those corresponding rows will be excluded. If we execute this query, we get 323 rows. That’s almost 40% of the total rows in this table. We can actually verify this if we head over to the other query window. I can show you how the count function behaves. If we modify the query to select all the rows from the Orders table where the ship region is not null, we should get a total of 323 rows.

That’s the same value returned by the Count function. Then we have the sum function. The sum function is used to calculate totals. This function, just like the Count function, also accepts a single parameter, which can be an expression. The difference here is that the expression has to result in a set of numerical values, and as a result, it returns a single numerical value. So if we execute this query, we get the summarized value for the units in stock. That’s 3119. Then we have the average function. This one is pretty similar to the sum function, but instead of returning the total value, it returns an average. If we execute it, we get the average unit price for all the products in the Products table. If we head over to the Object Explorer and select the top 1000 rows from the Products table, let’s remove the top 1000 and execute it again.

We now have all the data in this table. Let’s take a closer look at what the average function did. First, it went through all the rows where we have an applicable unit price. It then summarized all the unit prices. So we’ve used the sum function on the unit price column, and then we divide it by the number of applicable unit prices. Now, if we execute this query, we should get the same result as the query where we used the average function.

Then we have the minor minimum function. This one is used to return the minimum value of a column or an expression. It accepts a single parameter, which can be of any comparable type. So we could, for example, pass in a character string column, a numerical column, or a date and time column to get the minimum value. This function returns a single value of the same data type as that of the parameter. Now, if we execute this query, we should get the lowest unit price. We can also easily verify this if we head over to the Products query window.

We can order the result based on the unit price and the ascending sort direction. and this will display the lowest price at the top of the column. As stated earlier, We can also use it on dates and character string data. In cases where there is a date column, it returns the oldest date value, while in terms of a character string column, it returns the string that is alphabetically closest to A.

These aren’t the only types we can use. This aggregate function can be used with any type that can be compared or ordered by. Then we have the max or maximum function, which is pretty similar to the minimum function. But instead of returning the minimum value, it returns the maximum value of a column or an expression. So if we execute this query, we get the highest unit price. If we were using a date-time column, we would get the most recent date in that column. And if we were using a character string column, we would get the character string that is alphabetically closest to Z. These are some of the most commonly used aggregate functions you’ll come across when working with databases. That’s it for aggregate functions. Thank you for watching. In the next video, we’re going to combine these aggregate functions with the group by clause. see you then.

  1. Querying Databases Grouping

Welcome back. In the previous video, we covered some of the most common aggregate functions. We’ve also seen that these functions apply calculations to the entire data set. In this video, we’re going to apply calculations to groups of rows. This can be achieved by combining the aggregate functions with the group by clause. So how does it actually work? The group by clause categorizes data on the basis of one column or a number of columns into groups of rows. Basically, rows with the same values for the columns contained in the group by clause are taken together and form a single group. We’re going to go through a few tables as we look at some of the examples of grouping. First, we’re going to look at the Customers table, specifically the country column.

As you can see, we have a lot of customers who share the same country. If we group the data in this table by the country column, we get a result set containing 21 unique rows. The group by clause took all the rows that had the same value for the country column and placed them into a single group of rows. The number of rows in each group depends on the number of times the country value appears. In the next query, We’re looking at how many times each country appears. Since this table stores customers and each row represents an individual customer, you could also say that this query returns the number of customers in each country. Now let’s execute this to see the result.

So here we have the total number of customers in each country. If you can remember from the previous video, we said that when we pass an asterisk as a parameter to the count function, it counts the total number of rows in the table. This actually happens here as well, but because we grouped the data into groups of rows, the count function counted the number of rows in each group, and that’s again the amount of time each value appears. When grouping data, the columns we want to select have to be in the “group by” clause, and that’s because each group of rows that is formed has a group identifier. The group identifier is made up of the values within the columns contained in the group by clause. For example, the value “USA” is the group identifier for the rows where the country is USA. The count function is then applied to the rows with the country USA. We get a value of 13 because there are 13 rows with the country USA. So it’s important to make sure the columns you want to select are also in the group by clause. If they aren’t contained in the group clause, you won’t be able to select them. You can only select columns that are part of the group identifier. If I, for example, add the City column to this select list, the query will give me the following errors. Always pay attention to this when grouping data.

Another thing we’ve added here is the order by clause. We’ve used the alias here as a reference to the count function in the order by clause. We could have actually just used the count function here, but I wouldn’t recommend it as it defeats the purpose of aliases. Also important here is that the order by clause should always appear at the end of a select or transact SQL statement. If we had a where clause in this statement, we would need to put it between the from clause and the group by clause. Now let’s look at all the data again from the customer table. As you can see, each customer has a contact person, and this person has a title, which is stored in the contact title column. In the following query, we’re going to look at how many customers each country has, as well as the number of contact titles there are in each country. Here we’re grouping by two columns, just like the order by clause. The order in which the columns are written in the group by clause is very important. First, we’re dividing the data into groups of rows based on the country column. We have 21 different countries here, so we’re going to have 21 groups of rows.

We’re then dividing each of these 21 groups of rows by the contact’s title. Let’s execute both queries to see both result sets at the same time. If we scroll down to Mexico, you’ll see that we have five customers in Mexico. These are the five rows that reside within the Mexico group of rows. In the second result set, where we grouped results by the country and contact title column, you’ll see that we have three rows for Mexico. If we summaries the total number of customers, we get a total of five customers. These are the same five rows that the first result set counted. These five rows are then grouped into groups of rows based on the contact title. Since there are only three distinct contact titles for Mexico, we have only three groups of rows. We are now going to query the order details table here. We’ve basically combined a number of aggregations in which the data is grouped by the OrderId column. This means that these aggregations will all be performed for each order. If you can remember from one of the previous videos where we covered aliasing, we created a formula to calculate the net amount of each product, which was the unit price times the quantity times one minus the discount. We’ve also used that formula here. When we run both queries, we get two result sets. The first one is the aggregated data, while the second one contains all the data from the order details table with the net amount calculation. If we look at the first order, which is order ID 10,248, you’ll see that we have three products.

So the count function literally counted the number of product IDs we have for this particular order. Then we have the sum function, which basically summarizes the net amount of each individual product in this particular order. Then we have the average function, which calculates the average net amount for each product. This can be verified by dividing the total net amount by three.

Because we have three products. Then we have the max function, which returned the highest unit price of this particular order, which is 34.8. Then we have the main function, which returned the lowest unit price for this particular order. In this case, 9.8. So these aggregations are performed on every single order, and that’s because we’ve grouped by the Order ID column. We’ve now covered the basics of grouping data. Thank you for watching. In the next video, we’re going to look at joining multiple tables. see you then.

img