98-364 MTA Microsoft Database Fundamentals – Querying and Manipulating Data Part 4

  1. Querying Databases – Joins

Hi, welcome back. In this video, we’re going to combine data from multiple tables using joins. Earlier in this course, we covered the process of normalising a relational database. We saw that this process involves breaking down tables into smaller logical units or entities. As a result of this, the data no longer resides within one table. So in order to get all the data, we have to combine it by relying on the relationships that are formed during the normalisation process.

These relationships are formed by using the primary key of one table as a foreign key in another table. This is where joining comes in. With joins, we can query data from multiple tables using the relationships between the tables. For now, we’re going to focus on the Customers and Orders table. Both tables have one column in common, and that’s the customer ID column. In the Customers table, the Customer ID column is the primary key, while in the Orders table, the Customer ID column is the foreign key.

We can now write a query to select data from both tables based on these keys. There are a number of ways we can join these tables, and that completely depends on what information we want to see. Here is a document with the different types of joints you should be familiar with. First, we’re going to COVID the inner join. Here we have two tables: table A and table B. The inner join would return only the data that both tables have in common. Let’s say table A holds the primary key and table B holds the foreign key.

If we use an inner join on these tables, only the rows where the primary key of table A is equal to the foreign key of table B will be returned. So basically, only rows from both tables will be displayed where table A has a corresponding row in table B. Let’s see this in action. First, we’re going to look at the tables we’re going to join, and those are the Customers and Orders tables. When we run both queries, we get two result sets. In the Customers table, we have a total of 91 rows, and in the Orders table, we have a total of 830 rows.

In the following query, we’ve created an inner join between the customers and the Orders table, and that’s done based on the primary key foreign key relationship between these tables, which you can see in the on clause of the inner join. We’ve also made use of aliases, and that’s because both tables have a common column. If I were to remove the aliases, SQL Server wouldn’t be able to determine which table’s column I’m referring to. So always use aliases when working with joins.

Let’s execute this query and see how the inner join behaves. We’ve used the asterisk, which means that all columns from both tables will be returned. This is something you want to avoid when joining tables. Always be specific about the columns you want to return. As you can see, we have a total of 830 rows returned, which is the same number of rows that are in the Orders table. That’s because each row in the Orders table has an associated customer in the Customers table.

Therefore, all the rows from the Orders table intersect with the Customers table, which is why the inner join returned 830 rows. The problem now begins when we want to see all customers, regardless of whether they have placed any orders. If there are customers who have not placed any orders, they will not have corresponding rows in the Orders table. Therefore, they will be excluded when using an inner join. And that’s where the outer joints come in. If we go back to the document, you’ll see that we have three types of outer joins. We have the left outer join, the right outer join, and the full outer join. All of the outer joins behave in a rather similar way. With the inner join, we saw that we could only retrieve data that intersects with both tables, so getting all the customers, regardless of whether they had placed any orders, wouldn’t be possible with the inner join. However, it would be possible with an outer join.

The left outer join retrieves all the data from the table on the left side of the join along with the data that intersects with both tables. It’s basically the inner join combined with all the data from the table on the left side of the join. This could be effectively used to get the desired results. Let’s head back over to SQL Server Management Studio to see this in action. This is the syntax for the left outer join. If we execute this query, we get 832 rows.

Let’s break down what happened here. The inner join returned a result set consisting of 830 rows, and these are all the rows that intersect with both tables. With the left outer join, we have two extra rows included in the result set. The left outer join took all the data from the table on the left side of the join, which is the customers table, and also all the data that intersects with both tables. So if two extra rows are included in the result set, those two rows belong to the customer’s table, which basically means that there are two customers who haven’t placed any orders. Since they haven’t placed any orders, they won’t have an order ID associated with them, which means we can find them.

We could say, based on this condition, get all the data from this left out or join where the order ID is null. If we execute the query, we get the following two customers, who haven’t placed any orders. If we scroll to the right, you’ll notice that each column of the Orders table for these two customers is null. This is one of the signs that we’re using the left outer join: on the left side, we have all the data, while on the right side, we have nulls. when there is no associated data to combine.

The right outer join is literally the opposite of the left outer join. Because every order has a customer ID defined, using the right outer join would return the same 830 rows. The difference between the left outer join and the right outer join is the order in which the tables are specified. If I switched the positions of both tables as well as the aliases and ran this query, I’d get 832 rows back, the same as with the left out or join when the positions of the tables were not switched. We can now uncomment the where clause to select only the rows where the order ID is null.

If we execute the query again, we get the following two rows: The right outer join took all the data from the table on the right side of the join, which is the customer’s table, and also all the data that intersects with both tables. So if two extra rows are included in the result set, those two rows belong to the table on the right of the join, which is the customer table. Because this is a right outer join, the NULLs now appear on the left side while the data appears on the right side. So these two behave pretty similarly, and it completely depends on how we’ve defined the joint statement.

Next, we’re going to look at the full outer joint. This type of join is literally a combination of the left outer join and the right outer join. If I execute this query, I should get 832 rows. This would effectively combine all the data from both tables. Let’s say I had five orders that had no customers defined. The full outer join would display all the data that intersects with both tables, the two customers who haven’t placed any orders, and also the five orders that aren’t associated with any customers. Let’s head back over to the document and look at the cross-join. The cross join, also known as the Cartesian join or the Cartesian product, joins every single row from one table with all the rows from another table. If you look at this image, we have three rows in table A and three rows in table B. The number of rows that will be returned is equal to the number of combinations we can make, and that’s three multiplied by three.

That’s nine combinations, and thus nine throws that will be returned. Let’s see this in action. The syntax of the cross join is a little bit different than the previous types of joins we’ve covered. Because the cross-join joins every single row from one table with all the rows from another table, it doesn’t need an on clause. So, when using the cross join, the primary key and foreign key have no effect. So if we execute this query, we get a total of 75,530 rows. We know that there are 830 rows in the orders table and 91 rows in the customer’s table. If we do the math and basically multiply them to get the total possible combinations, we should get the same result, which is 75,530 rows. We’ve now covered the basics of joining tables. Thank you for watching. In the next video, we’re going to discuss COVID set operators. see you then.

  1. Querying Databases – SET Operators

Welcome back. In this video, we’re going to work with set operators. Just like join, set operators can also be used to combine data from multiple tables. But instead of relying on the relationship between tables, set operators rely on the structure of result sets. This means that set operators can only combine the result set of select statements that have a similar structure. Generally, we have three types of set operators. First, we have the union operator.

This operator combines the result set of two select statements and returns only the distinct rows between them. So if we have rows in both result sets that contain the same values, only one of them will be returned. We also have the all-union operator. This operator behaves pretty similarly to the union operator, but instead of returning only the distinct rows, it returns all the rows regardless of whether there are duplicates or not. Let’s head back over to SQL Server Management Studio to see this in action. For explaining the set operators, we’re going to focus on the customers’ and employees’ employees table.

These tables store information about people who have a direct connection with the company. For now, we would like to see an overview of all the countries where we have employees and customers. Let’s first look at these queries individually. So if we run both of them, we get two result sets back. If we look at both result sets, you’ll see that we have lots of duplicate values. We’re now going to use the distinct clause, which I’m just now introducing. The distinct clause is used to get only unique values from the columns in the select list. There’s a document attached to this course where you can read all about the uses of the distinct clause. So if I now write the distinct clause for both queries and execute them, I should get two result sets that have only unique values. The first result set, which contains the countries of the customers, has a total of 21 rows, while the countries of the employees’ result set contains two rows. In the next query, we’re combining these individual select statements using the union all operator.

This is the general syntax. So we’ve combined two select statements that can be executed on their own with the use of a set operator. One of the requirements for combining result sets using a set operator is that both select statements have to return the same type of data. This means the same number of columns and the same data type for each column. If I, for example, add another column to the select list of the first select statement and execute it, you’ll notice we get an error. This error states that all queries combined using the union intersect or accept operator must have an equal number of expressions in their target list. If I now try to combine data of incompatible types, say the country column from the customers table and the birthdate column from the employees table, I’ll get a conversion error because they’re not the same type of data and SQL Server couldn’t perform an implicit conversion. So always pay attention to this when working with set operators.

Now, if we execute this query, it returns one result set with a total of 23 rows. That’s 21 rows from the customer’s table plus two rows from the employee’s table. When we scroll down, you’ll see that the UK and USA appear twice. And that’s because the UK and USA appear in both individual result sets. Since we’ve used the Union All operator, it has combined all the data from both result sets, regardless of whether there are duplicates or not. If we now use the union operator, we get 21 rows. The union operator first combined all the data from both result sets and then, just like the distinct clause, eliminated all the duplicate rows from the result set.

So now we have only one occurrence of the UK and only one occurrence of the USA. Let’s head back to the document and look at the intersect operator. As the name suggests, it only returns data that is common to both result sets. So if we have two result sets represented by the blue and green circles, we only get the data that intersects with both circles, which is the orange figure over here. So basically, the only data that exists in both result sets If we look at the next query, we’ve used the intersect operator to combine the result set of both select statements.

When we run this, we only get two rows back. As we’ve seen previously, these are the only two rows that are common in both result sets, which is why they are the only two rows returned by the intersect operator. Then there’s the acceptance. Operator. This set operator heavily relies on the order of the select statements. When we use the Accept operator, the result set on the left of the operator represented by the blue circle will exclude all data that intersects with the result set on the right of the operator represented by the green circle.

So if we exclude the orange figure, which is the intersection of the blue circle, we get this blue crescent figure. As a result, in the next query, we’ve used the accept operator to combine the result set of both select statements. If we execute this, we get 19 rows back. Let’s break down what the Accept Operator did. From the first select statement, we would get a result of 21 rows, and from the second select statement, we get a result of two rows.

These two rows, UK and USA, both intersect with the 21 rows of the first select statement. Since we’re using the accept operator, it will then remove those two rows from the result set, which is why 19 rows are returned. As you can see, none of these 19 rows have UK or USA as values. If we were to change the position of the individual queries, Since they’re basically similar, we can just switch the table names and execute the query. We get zero rows because all the data from the result set of the first select statement intersects with the data from the result set of the second select statement. Therefore, all data from the result set of the first select statement will be excluded. So it’s really important to pay attention to this when using the accept operator. That is it for set operators. In the next video, we’re going to start with the data manipulation language. see you then.

  1. Manipulating Data – Inserting Data

Welcome back. In this video, we’re going to start with the data manipulation language. Specifically, we’re going to look at how we can insert data using Transact SQL DML commands. I have attached a script to this lecture, which you can simply copy, paste, and execute. This script will create a new database schema named Test with a table inside it named Employees. If you’ve executed the script, you should see the new Employees table in the Object Explorer. Now we have two tables for employees in the same database. To take a brief break. This is what I meant in one of the previous videos about schemas being used to organise database objects and why it’s always a best practise to use the schema name when referring to database objects.

So right now we have an empty table, which means we’re going to have to insert data into this table. To insert data into a table, we need to be familiar with its columns. Here is a list of all the columns from the employee table. What’s important to us now are the datatype and the nullability properties of the columns. The data type tells us what type of data we can insert into a column as well as how to insert the data, while the nullability property tells us whether we are required to insert data into a column or not. Here we have a query that inserts one row into the Employees table. First we have the insert command, which initiates the insert statement.

We then have the INTUCLAUSE, which is used to specify the object we want to insert data into. Then we have the “values” clause, which is used to specify the values we want to insert into the table. We have a total of six columns in this table, but we have only specified five values. The reason we did that is because the employee ID column, which is the primary key, is also an identity column. If we take a look at the properties of the employee ID column, you’ll see that the identity attribute is set to true. This means that the values for the employee ID column are generated automatically by the database.

This is done based on the values assigned to the identity seed and the identity increment attributes, which are both set to one by default for both attributes. So the value for the employee ID column will start at one and will be incremented by one for every new row inserted into the table. And that’s the reason why we don’t specify a value for this column when inserting data. The order in which we’ve specified the values and the values clause is based on the position of the columns in the table.

This means that the first value will be inserted into the last name column. The second value will be inserted into the first name column. The third value will be inserted into the higher date column. The fourth value will be inserted into the department column, and the fifth value will be inserted into the salary column. If we execute this insert statement, we get a message stating that one row is affected. If we select all the data from this table, you can see the new row that we’ve just inserted. The problem with this insert statement is that it lacks flexibility. If we look at all the columns of the employees table, we have one column that can be null, and that’s the higher date column.

Given its ability to be null, this column shouldn’t be required when inserting a new row. So if we remove the value for the higher date column from the insert statement and try to insert this row again, we get the following error: we got this error because inserting rows into a table using this syntax requires you to specify a value for all nonidentity columns, regardless of whether they are null columns. You also need to specify the values in the same order as those of the columns in the table. Luckily, this is not the only way to insert a new row into a table. The following insert statement uses a columnlist to explicitly specify in which column the corresponding values will be inserted. Given the column list, we’re inserting data into the last name, first name, department, and salary columns. As you can see, we have omitted the higher date column. The higher date column has a default constraint. If you can recall, the default constraint automatically inserts data into a column. If no data was supplied, in this case, the higher date column automatically takes the current date from the system. Another thing about this insert statement is that the order of the values doesn’t have to be the same as that of the columns in the table. We could, for example, change the position of the Department and Salary columns with their values as well.

So if we execute this statement, the values will now be inserted according to the column list we have specified. As you can see, one row is affected. If we look at all the data in the table, you’ll see our newly added employee, Nikki Garcia. The values for the department and the salary have been inserted into their respective columns. You can also see the value of the higher date column, which is equal to the date value of the system at the moment the row was inserted. Now, the following insert statement is a little bit interesting. In the previous insert statements, we were inserting a single row into the employee’s table. With this insert statement, we can insert multiple rows at the same time into the employee’s table. The syntax is pretty similar to the previous one. However, instead of specifying values for one row, we have specified values for multiple rows. One of the benefits of this multi-insert statement is that if an error occurs during the insert operation, none of the rows will be inserted into the table, leaving your database in a consistent state. So if we execute this, we get a message stating that three rows are affected. Let’s again look at all the data in the table. We can now see the last three rows we’ve just added at the bottom of the result set. That’s it for inserting data. Thank you for watching. In the next video, we’re going to look at how we can update data using Transact-SQL. see you then.

  1. Manipulating Data – Updating Data

Welcome back. In this video, we’re going to show how we can update data using Transact-SQL. So this is all the data we have in the employees table, which is the data we inserted in the previous video. We are now going to update the data for Nikki Garcia by assigning a new value to the higher date column. The update statement is pretty straightforward. We first have the update command, which initiates the update statement, followed by the object for which we want to update the data, in this case, the employees table. Then we have the set clause. The Set clause is used to assign or set a new value for one or more columns. So in this case, we’re only updating the higher date column by setting or assigning a new value to it. Then we have basically the most important part of the update statement, which is the where clause. This is where the clause filters the rows that will be affected by the update statement. We’ve applied a filter to the employeeID column, which is the primary key.

The reason we’ve used the primarykey column is because this column uniquely identifies each row in the table. We could have, for example, also applied a filter to both the last name and the first name column, but I find the primary key attributes to be more efficient and also safer. If we had a column in this table that had a unique key constraint, then that column could also be used to filter the data when performing a single row update. If we execute this query, we get a result message stating that one row is affected. If we now look at all the data in the employees table, we see the new value for the higher date column where the employee ID is two. In the next query, we’re updating the department and the salary information of the employee with an employee ID of five. And that’s Patricia Anderson. If we had more columns to update, we could simply write them all with their new values separated by commas. If we execute this statement, we get a message stating that one row is affected. If we again look at all the data in the employee’s table, you’ll see that Patricia Anderson is now part of the Finance Department and has an updated salary value of $6950. In the next query, we’re performing a multi-row update.

The way we can identify whether an update statement potentially targets multiple rows is in the where clause. Here we’re filtering on the department column, which is a non-unique column. This means that it’s likely that there will be duplicate values in this column. Therefore, this update statement will likely target multiple rows. So this update statement will give every employee in the IT department a 10% raise. With this equation, we’re basically saying take the current value of the employee’s salary who meets the condition, increase their salary by 10%, and then update their current salary with the result. If we execute this query, we should see that two rows are affected. If we now look at all the data in the employees table, you’ll be able to see the increased salary for the employees of the IT department. As stated earlier, the most important part of the update statement is the where clause. So always pay attention to the rows you want to target and make sure that the where clause effectively filters them accordingly. That’s it for updating the data. Thank you for watching. In the next video, we’re going to look at deleting data from a table using Transact-SQL. see you then.

  1. Manipulating Data – Deleting Data

Welcome back. In this video, we’re going to delete data from a table using transact-SQL commands. This is all the data we have in the employee table. We’re going to delete the row where the employee ID is equal to one. If we look at the delete statement, you’ll see it’s also pretty straightforward. We first have the delete command, which initiates the delete statement, followed by the from clause, which is used to specify from which object we want to delete data.

Then we have the where clause, which is the most important part of the delete statement because it filters the rows that will be affected by the statement. So we have five rows in the employee’s table, and with this statement, we’re deleting one employee from the employee’s table. If we execute this statement, we get a message stating that one row is affected. If we again look at all the data in the employees table, we see that the row where the employee ID is equal to one is removed. So with the where clause, we decide which rows will be removed.

Because we’re deleting a single row from the table, we’ve made use of the primary key column, which uniquely identifies each row. If we remove the where clause from the delete statement, we’re no longer filtering the data. Therefore, every single row in the table will be removed, and we can actually test this out. If we execute this delete statement, you’ll see four rows affected. That’s the same number of rows we had in the employees’ table. If we again select all the data from the employees table, we get an empty result set. That’s because a delete statement without the where clause removes all the rows from a table. So be careful when using the delete statement. It can end up being very dangerous. Since we’ve deleted all the rows from the employees table, I have an insert statement here that will insert new data into this table.

If we execute this query, three new rows will be inserted into the employees table, and we can actually go ahead and look at the contents of the table. Looking at the employee ID column, you’ll notice that the first row starts with a value of six. If you can recall, before we deleted all the data, the last value for the employee ID column was five. So even though we removed all the rows from the table, it still remembered the last value that was generated for the employee ID column and basically just incremented it by one, which is why we started with six. The thing about the identity value is that it is maintained, so we could, under normal circumstances, no longer use the employee ID of removed rows. under normal circumstances.

That’s because there are ways we can actually reset the identity and reuse the values. One of the ways we can do that is by using the Truncate command. This command, however, is not part of the data manipulation language. It belongs to the Data Definition Language. So here we have the general syntax of a truncate statement. Because we’re using a DDL command, we cannot specify the where clause for the Truncate table statement. When we remove rows using the delete statement, each row is deleted one at a time and logged in the transaction log. So if we have a data set with a large number of rows and we need to remove all the data, it’s going to take a while because each row is going to be individually deleted and logged. The Truncate command, however, does not log the deletions of the individual rows. It just de-allocates the data pages used by the table and therefore removes all the data instantly. As a result of this, only the paged allocation gets logged in the transaction log.

So it’s a lot faster than using the delete command when removing all the rows from a table. In addition to removing the rows, the Truncate command also resets the identity back to the specified value, which is the default one. So if we execute this query, instead of getting a message stating how many rows are affected, we get a message stating that the command was completed successfully. The reason why is because it’s a DDL command. It doesn’t manipulate data.

So if we now look at the contents of the employees table, you’ll see that we have an empty result set. If we reinsert the three rows and execute the select statement, you’ll see that the numbering for the employee ID starts at one. So it’s very important to be aware of the differences between the delete and the truncate table statements. We’ve now covered the basics of deleting data. Thank you for watching. In the next video, we’re going to start with the Data Definition Language, where we’ll be creating database objects using transactions. see you then.

img