98-364 MTA Microsoft Database Fundamentals – Creating Database Objects Part 3

  1. Stored Procedures

Is a group of one or more transact SQL statements that form a single logical unit and are stored as an object in the database? Stored procedures are useful for a number of reasons. First of all, they can massively increase performance. That’s because a stored procedure is cached on the server, and also because fewer calls are made to the database.

Imagine you have an application that performs a rather complex task wherein multiple transact SQL statements are sent to the database and multiple result sets are being returned from the database. This will cause a lot of calls to be made to the database, which will result in a lot of data transfers between the application and the database server. In such a situation, it can be deemed appropriate to group the transacted SQL statements into a stored procedure and have the stored procedure perform all the processing.

This will cause only one call to be made to the database, thus reducing network traffic. However, the decision to move all the application logic to the database should be a carefully thought-out one because moving the logic to the database will increase the workload on the server side, which is also something you should try to avoid. So if you can find a balance, stored procedures can effectively increase performance. Then we have maintainability. Having all the logic in one location makes the procedures significantly easier to manage, document, and maintain.

Next, we have security, which is one of the most crucial factors when designing databases. Stored procedures can only be executed if the user has been granted execute permission. Just like views, giving users only access to execute the stored procedure can limit direct access to tables, thus protecting the underlying data. Another important thing about stored procedures in terms of security is their ability to protect against SQL injections.

SQL injection is one of the most common web hacking techniques. The SQL injection vulnerability occurs when the user’s input is directly used in a SQL query. If we allow the application to send full select, insert, update, and delete statements to the database, such statements can be injected by malicious SQL commands provided by the users. But if we use stored procedures rather than full SQL statements to perform these tasks, we’re effectively removing this vulnerability. Stored procedures are parameterized, so instead of sending full queries to the database, we’re sending only parameters. If someone decides to perform SQL injection, these malicious SQL commands will be stored as values and a parameter.

They will not be seen as SQL commands but rather as data, thus making them powerless. These are some of the most important reasons why stored procedures are used. But, of course, there are also drawbacks to using stored procedures. We’re not going to COVID them in this video, but I do have a document attached to this lecture that explains some of the general drawbacks to using stored procedures. Another thing you might find difficult to decide is whether to use database functions or stored procedures, and that’s because they’re somewhat similar. They both accept parameters, and they both have the ability to return data. So let’s take a look at some of the capabilities that differentiate these two database objects. First of all, functions are not allowed to modify data within the database.

This means that you cannot have DML operations such as insert, update, or delete within a function. However, when we use multistatement table-valued functions, you can use DML operations, but you’re not modifying data in the database. You’re modifying data in a table that only exists during the execution of the function. Stored procedures, on the other hand, are allowed to perform data modifications. So with stored procedures, you can insert, update, and delete data within the database. Secondly, functions always have to return a single value. Whether that is a table or not It’s always a single value that gets returned. Stored procedures, on the other hand, have the ability to return multiple values or none. Database functions are invoked by using a select statement, so they can basically be queried. Stored procedures cannot be used in a select statement. They can only be executed by using the execute command. Then we have errors.

Functions do not support error handling. This means that we cannot tell the database what to do next if an error occurs during the execution of a function. Usually, when an error occurs, you might want to log it along with information about the circumstances under which the error was raised. With functions, we’re not able to do that. Stored procedures, on the other hand, do support error handling. Actually, stored procedures and triggers are the only two objects we’ve covered that support error handling. Sometimes a trigger is also referred to as an “event-driven stored procedure.”

And finally, we cannot execute stored procedures from within a function, but we can call functions within a stored procedure. These are some of the major differences between these two objects, as well as some of the factors that help determine whether it’s appropriate to use a stored procedure or whether a function would be more effective and efficient. Let’s head over to the SQL Server management studio to look at some examples of stored procedures. So in SQL Server, there are generally two types of stored procedures. We have system stored procedures, which are stored procedures created by SQL Server, and we have user-defined stored procedures, which are stored procedures created by the user.

We’re first going to look at the system-stored procedures. Let’s expand the programmability node and the stored procedures node as well. This is where the stored procedures we create will appear. If we expand the System Stored Procedures node, you’ll see a large list of all the system stored procedures that reside within this database. As you can see, all of the system-stored procedures are stored on the sys schema. This is a very special schema within the database that can only store system objects and resources.

Another thing about the names is the prefix that is used to name these objects. This is a naming convention. Whenever you come across stored procedures that start with the prefix SP or XP, it’s most likely a system stored procedure. So if you create a stored procedure, it’s best to not use any of these prefixes, as it might be confusing for someone else who’s looking at the database. Usually, we use the prefix USP or Procto to indicate a user-defined stored procedure. Let’s open a new query window to execute our first stored procedure. We’re going to execute a stored procedure named SP. Help.

If we expanded the Parameters node, you’ll see that this object accepts one parameter, which should be the name of an object within the database. If the name is valid and refers to an object that exists within the database, VSB Help Stored Procedure will return detailed object information. Depending on the type of object, multiple result sets will be returned. We also see that this stored procedure returns an integer value. This integer value represents the return code, which is something every stored procedure can return. It’s just a basic characteristic of stored procedures.

To execute a procedure, we simply use the execute command followed by the name of the procedure, which is SP Help. We then specify the name of an object we would like to analyze. We’re going to execute the SP Help Stored procedure on the track table. If we execute this query, we should see the following results.

As you can see, we have about seven result sets returned. Each contains specific data about the track table, but we’re not going to COVID the contents of these result sets in this course. So we used the Execute command to run the Stored procedure. To simplify things, we can use the short version of the execute command, which is simply exec. If we execute this query now, we should get the same result. To further simplify things, we can completely remove the execute command, and we’d still be able to execute this query. This is, however, a very bad practice.

To be able to execute a stored procedure without the Execute command, the stored procedure would need to be the first statement within a batch. If we duplicate this query, you’ll see we have an incorrect syntax error. That’s because this statement is the second statement within this batch. To fix this issue, we either put it in a separate batch or explicitly use the execute command, which is what I would recommend. Let’s take a look at the Helptext Stored Procedure, which you might find interesting.

This stored procedure returns the definition of database objects, including functions, views, and stored procedures. In the previous videos, where we were altering views and functions, we needed the definition of the objects so we could easily modify the logic behind them. The problem is that we don’t always have access to the scripts used to create the objects. Perhaps the employee who came before you created the objects and failed to document them. The SB HelptextSystem stored procedure comes into play here.

This is one of the few methods we can use to get the definition of database objects. If we look at the parameters, you’ll see that this procedure accepts two parameters. The first one is the object name, while the second one is the column name. We don’t necessarily have to pass two parameters when executing this stored procedure. If we pass only the name of an object that resides within the database, we get the entire definition of the object. If we pass values for both parameters, the stored procedure will expect an object name and the name of a computed column within the object. However, we will not COVID that in this video.

Let’s go ahead and look at the definition of the total songs per album view. First, we write the execute command. Then we specify the name of the procedure, followed by the name of the object whose definition we want to see. If we execute this query, we get the following result: this is the DDL statement used to create the view, which is displayed in multiple rows. We’ve now gone over two of the most commonly used system stored procedures.

Let’s go ahead and create our own stored procedure. I already have a file containing the DDL script to create the stored procedure. To open a script file, we go to the File menu and select Open, then select File. An Open File dialogue will then appear here. You can simply select the script file. This is the DDL statement we’re going to use to create the stored procedure. This stored procedure will accept a few parameters and then insert the value of these parameters into the artist table. So let’s break down what happens when we execute this procedure.

First, we start with the Create command, followed by the type of object we want to create. We then specify the name of the object. As you can see, we have named this object using one of the naming conventions for user-defined stored procedures. Then we have the parameter list, which is pretty similar to that of a function. We have five parameters, which represent values for all the nonidentity columns of the artist table. We then have the as clause followed by the Begin command. Here we have the “no count” statement. This statement prevents the message that shows the count of the number of rows affected by a transact-SQL statement or stored procedure from being returned as part of the result set. It’s a good idea to include this in your stored procedures, but it’s not required. Next, we have the Insert statement. Here we have used the parameters and the Values clause, so whatever values we pass to this stored procedure will be inserted as a new row into the artist table.

Then we wrote a select statement. This select statement queries the artist table for the row where the artist ID is equal to the identity variable. The identity variable is a global variable in SQL Server and stores the latest value generated for an identity column. So when this insert statement is executed, a new value will be generated for the artist ID column. This value will be stored in the global identity variable. This will then be used to retrieve the most recent row inserted into the artist table. If we execute this stored procedure, we get a message stating that the command was completed successfully.

We can now go ahead and execute this procedure, but let’s first head over to the Object Explorer to see our newly created stored procedure. If we refresh the stored procedures node, our newly created procedure will appear. To execute the procedure, we simply write the execute command followed by the name of the object. The new artist we’re going to insert into the artist table is Taylor Swift. So her first name is Taylor. Her family surname is Swift. For the stage name column, we’re going to use Taylor Swift. She was born on December 13, 1989, and, finally, her birth country is the United States. Before we execute this stored procedure, let’s first look at the contents of the artist table. As you can see, we currently have six rows within the artist table. So if we execute the stored procedure, we get a result set with one row returned.

This is the result of the select statement we’ve defined within the stored procedure. If we now look at the contents of the artist table, you’ll see we have seven rows. We are now going to alter the stored procedure. Instead of returning the row that has been added, I simply want to print a message stating that the new artist was added successfully. To alter a stored procedure, just like altering views and functions, we also need the definition of the object to be present. Since we have the definition right here, we can simply replace the create command with the alter command. If we didn’t have the definition of this stored procedure, we could have used the Help Text system’s stored procedure to get the definition. So first we’re going to remove the select statement because we will no longer need it. We’re going to return a message stating that the artist was inserted successfully, along with the stage name of the artist. To do this, we use the print command. We then use the stage name parameter followed by the plus operator, which in this case is used to combine string data. Depending on the data type of the operands, the plus operator will behave differently.

So if we, for example, have numerical data here, the plus operator would perform a mathematical addition operation. If I have character string data, the text data will be combined or concatenated. Next we write that it has been successfully inserted into the artist table. If we now execute the DDL statement, our stored procedure will be altered. Let’s give the USP create artist stored procedure some new parameters. The artist we’re going to insert into this table is James Arthur. His first name is James. His surname is Arthur. For his stage name, we’re going to specify James Arthur. He was born on March 2, 1989, and finally, his birth country, which is the United Kingdom, When we run this procedure, we no longer get a result set. Instead, we get a simple message stating that the artist has been successfully inserted into the artist table. That’s it for this video. I hope you now have a basic understanding of what stored procedures are, how they are created, and how we can alter them. In the next video, we’re going to start with indexes. Thank you for watching. see you then.

  1. Introduction to Indexes

A database index is a data structure that improves the speed of data retrieval operations on a table at the cost of additional rights and storage space to maintain the index. A database index is very similar in purpose to the index section at the back of a book. If you were to search for a specific topic covered in the book, you would use the index section to quickly find the page where the topic is covered. If the book didn’t have an index section, you’d need to go through each page till you found the desired topic, which can be very time-consuming, especially if we’re talking about books that have hundreds of pages.

This is the exact same way SQL Server uses database indexes. If we have a table without an index, SQL Server would need to scan each individual row on each data page till it finds the data we’re looking for. This process is called a “full table scan,” but if we have an index, SQL Server no longer has to perform a “full table scan.” It can simply utilise the indexes to quickly find the data we’re looking for. Therefore, by indexing an underlying relational table, you can significantly enhance the query performance of your database. In SQL Server, there are a number of different types of indexes we can use. First, we have the clustered indexes, which are created by default when we define a primary key constraint. Secondly, we have the non-clustered indexes.

The behavior of these indexes depends on whether we already have a clustered index defined or not. And finally, we have the others, which go beyond the scope of this course. Both the clustered and the non-clustered indexes are implemented using a balanced tree, or B tree, data structure. A B-tree is a structure that stores data in assorted order and enables fast access to the data it stores. To understand how indices work, we’re going to look at the following table: the Customers table. This table has a primary key constraint defined in the Customer ID column.

This means that we also have a clustered index on this table. When a clustered index is created on a table, the index itself becomes the table. The structure of the table then changes to the following structure: Here we have a simplified view of a clustered index implemented using a balance tree data structure. This structure consists of a single root node, intermediate-level nodes, and leaf-level nodes, or data pages.

The database engine will sort and store the data in the underlying table based on the index keys you define. The clustered index shouldn’t be seen as a separate structure but rather as the table itself. All the data in the table is physically sorted and stored in the leaf-level pages of the clustered index. Because you can sort a table only in one physical sort order, you can have only one clustered index defined on a table.

The root and intermediate-level nodes contain index pages holding index rows. Each index row contains the clustering key and a pointer to either an intermediate-level page or a data row in the leaf level of the index. Let’s break down what happens when we, for example, execute the following query when searching for a key value: we always start at the root node and traverse the treelist till we find the desired value. The database engine will look for the index row with the key value of 11. If there is no index row with that particular key value, it moves to the next index row. If an index row with the key value of eleven is found, it uses the associated pointer to go to the next index page, which is on the intermediate level. Here, the database engine will again look for the index row with a key value of eleven. If the index row is not found, it moves to the next one. Once an index row with a key value of eleven is found, it uses the associated pointer to go to the data page containing the data on the leaf level. The database engine will then look for the clustering key with a value of 11. Once the value is found, the query data will be returned.

So instead of going through all the data pages, we have constantly narrowed the search till we reach the data page containing the data we’re looking for. This is, in general, how clustered indexes are used. As I’ve stated earlier, when a clustered index is created on a table, the index itself becomes the table. This is the reason why tables with a clustered index are also referred to as “clustered tables.” No clustered indexes are quite similar to clustered indexes. The biggest difference between these types of indexes is that they create a separate structure on top of a clustered table or a heap. In databases, a “hoard” is a table on which there is no clustered index defined.

Because the non-clustered index is a separate structure, we can create more than one non-clustered index on a table. No clustered indexes are also implemented via a balanced tree data structure, so their structure would be similar to this. Instead of physically storing the data of the underlying table in the leaf-level pages, they store either the clustering key if they are created on top of a clustered table or the row identifier if they are created on top of a heap. Therefore, non-clustered indexes do not point to the data itself but rather to the underlying structure they are based upon. Let’s head over to SQL Server Management Studio to look at some examples of indexes. We’re going to focus on the artist table. If we expand this table in the index node as well, you’ll see all the index entries are created on this table. For now, we only see the clustered index.

As you already know, this index was created as a result of the primary key we have defined on this table. If we look at the properties of the index, we’ll see the basic information of the index, such as the index type, whether the index enforces uniqueness or not, the columns that make up the index key, and the basic properties of these columns. Let’s go ahead and create an unclustered index on this table. Creating an index on a table is relatively easy. It’s the reasoning behind it that requires skills as a starting point for choosing index key columns. We begin with the columns that will be used the most when filtering data. Let’s assume it’s the stage name column.

To create a new index on this table, we head over to the indexes node, right-click, and select New Index. Here, we have a number of options. Some are enabled, and some are disabled. Since we already have a clustered index on this table, we cannot create a second one. But we can create a non-clustered index on top of the clustered index. So if we select this option, we get the following dialog. The first thing we can do is rename the index by giving it a friendlier name. There are a few naming conventions for naming indexes. The one I like to use is the following:

This name indicates that this index is a non-clustered, non-unique index on the artist table and has one index key column, which is the Stage Name column. Next, we’re going to add the columns we want the index to be created on. We can do this by simply clicking the “Add” button and selecting the columns we want. We can also change the sort order by clicking on Ascending. Since I don’t have any purpose for this, I’m going to leave it in its default state. Before we create the index, we’re first going to look at the transact-sql script necessary to create it. To generate the Create script for this index, we can head over to the Script Dropdown Menu and select New Query Editor Window.

This will generate the DDL script in a new query window. We have a lot of information here. Let’s remove what’s not important. We’ve already covered the Use command, which is used for context switching. Since we’re already targeting the fundamentals database, this statement is no longer necessary. This is the basic syntax for creating an index. We start with the Create command, which initializes the DDL script. We then specify the type of index we want to create. Here we have a nonclustered index, but we could also create a unique nonclustered index by simply writing unique or perhaps a clustered index by simply specifying only the clustered keyword. If we do that, you’ll now see a red line appear under the name of the object.

And that’s because we already have a clustered index defined on this table. So let’s change it back to a non-clustered index. After specifying the type of index, we want to create it. We use the index keyword followed by the name of the index. We then use the on clause, after which we specify the table on which we want to create the index. And finally, we use parentheses to specify the columns that will form the non-clustered index key. We can also specify the sort direction for the index key columns. Before we run this query, there’s one more thing I’d like to mention. Initially, we had only six rows in the artist table.

If I were to create a non-clustered index on this table, you wouldn’t be able to notice the increased query performance. To fix this issue, I have created a script that will insert an extra 100 rows of dummy data into the artist table. This script will also be attached to this lecture, which you can simply copy and execute. Since we’ve already executed it, we’re not going to do that again. So let’s go ahead and close this window. Here we have another query. This query filters the artist table using the Stage Name column. We’re going to execute this query before we create the nonclustered index, and then we’re going to execute it again after we’ve created the nonclustered index.

To accurately get the execution time of the query, we’re going to enable the time statistics for this session. This can be achieved by writing sets of statistics. The time on this will, among other things, return the time it took to execute a query. If we execute this statement, the time statistics will be enabled. Let’s go ahead and execute the select statement. If we look at the Messages tab, you’ll be able to see that it took about 129 milliseconds to execute the query and return the qualifying data. Let’s now go ahead and create the non-clustered index. If we execute the query again and head over to the Messages tab, you’ll see that we now have an execution time of zero milliseconds, which is a great improvement over the previous execution time.

This is why indexes are used to improve query performance. But there is also a big downside to using indexes, and that is the cost of maintaining them. As a result, operations where we modify data might take longer to complete. This might be something you cannot afford, especially in transactional databases where hundreds of transactions are being made each second. So it’s imperative to strategically create indices to get a balanced performance improvement where both data modifications and data retrieval operations can occur at an optimal pace.

Modifying the index is done the same way we’ve modified the previous database objects we’ve covered. We simply replace the Create command with the Alter command, and our index can now be modified. When altering an index, there are a number of things we can do. For example, disable the index, rebuild the index, reorganize the index, and modify the index properties. We can, however, not modify the index key columns. This means that if we wanted to change the columns of the index, We would need to write a drop statement to delete the index and recreate it on the desired columns. That’s it. For this introduction to indexes, we’ve now covered all the database objects that you should be familiar with for the exam. In the next video, we’re going to start with the fundamentals of administering a SQL Server database. Thank you for watching. see you then.