98-364 MTA Microsoft Database Fundamentals – Core Database Concepts

  1. Introduction to Databases

A database is an organized collection of logically interrelated data. By “data,” we mean known facts that can be recorded. For example, data about the customers of a company, sales the company makes, or products a company sells Nowadays, databases are used almost everywhere. If we look at any type of application software, there’s usually a data access layer responsible for retrieving and modifying data in a so-called back-end database. These applications usually have a number of users or clients who need access to that database. This is where a database server comes in.

A database server is either a physical or virtual computer system dedicated to using a database application that provides other computers with services related to accessing, modifying, and retrieving data from a database. These services include, but are not limited to, managing databases with concurrent access, which is the ability for multiple users to interact with the database at the same time, high-level performance reporting tools for performing data analysis, and security such as authentication and authorization. These are just some of the services provided by a database application.

Depending on the type of database being hosted by the database server, the database application could be either a database management system for managing non-relational databases or a relational database management system for managing relational databases. Some of the most popular relational database management systems are Microsoft SQL Server, MySQL Server, Oracle Server, and PostgreSQL. In this course, we’re only going to work with Microsoft SQL Server. That’s it for this part. Thank you for watching. In the next video, we’re going to discuss some of the most common types of databases. see you then.

  1. Types of Databases.

A flat file database is a database that stores data in a plain text file. Basically, the whole database is made out of one single text file, where each line represents a single record. Think about text files such as the comma-separated values (CSV) files, where the columns are separated by commas and each line represents a single record. Microsoft Excel is also a very popular example of this type of database. Flat-file databases are simple. The data as well as the structure can be easily modified. However, managing data with a flat file database comes with a number of issues. First of all, it’s impossible to form and enforce relationships in flat file databases.

This is the major cause of data redundancy. As you can see in the image below, we have the first and last names of a few customers appearing a number of times. This is a lot of redundant data. Another thing that might also lead to data redundancy is the fact that we’re not able to enforce uniqueness within a flat file database. Second of all, flat file databases hardly support concurrency, therefore not allowing multiple users to interact with the database. At the same time, flat file databases are also prone to data inconsistency, which might be a result of redundant data or modification anomalies, which are unintended side effects that occur when modifying the data. This leaves the database in a rather inconsistent state.

If we, for example, wanted to change the last name of the customer with an ID of 101, we wouldn’t need to search for every occurrence of this customer so we could change the last name accordingly. If we forget to change one occurrence, we will have two different customers with an ID of 101. You can only imagine how inconsistent the data can become if the flat file database is in an inconsistent state or if data is accidentally lost. We don’t have any advanced recovery mechanisms to restore the flat file database back to a consistent state other than just an old copy of the file. Then we have security. Flat-file databases are generally managed by the file system of an operating system. Operating systems provide only a password mechanism for security. This is not sufficiently flexible to enforce security policies in which different users are granted permissions to access different subsets of the data. This means that if we have access to the database file, we have access to the entire data set. The most problematic thing is that you cannot query flat-file databases on an ad hoc basis. As you can see, maintaining data in an aflat file database can be really hard. But these types of databases are still being used.

For example, because it is much easier to do, many computer applications store configuration data in flat file databases. It’s also quite interesting to perform data analysis on these types of files. Next, we have the hierarchical database. This type is based on the hierarchical data model. It has everything to do with organizing, storing, and representing data in a treelike structure. as can be seen in the image on the slide. On the left, we have the tree-like representation of a hierarchical database, while on the right, we have the data represented in tables. If we look at the image on the left, we can see that we have one company as the root node, Shark Inc. Shark Inc. has two departments, namely a finance department and an IT department. The finance department has two employees, while the IT department has one employee. In total, Shark Inc. has three employees.

This can also be seen in the table representation of the hierarchical database. I have added arrows to show you how the relationships are formed. The hierarchical databases are managed by a database management system and therefore solve most of the problems experienced with flat file databases, which are usually managed by a file system. Some of the characteristics of hierarchical databases are that they’re very fast when it comes to data modification due to their treelike structure. They’re also very fast at retrieving data, especially data at the top of the hierarchy. To retrieve the data, we need to traverse each tree until we get the desired data by following the explicit pointers, and this is done from top to bottom.

As a result, the top nodes in the hierarchy should be accessed first. Although hierarchical databases solved most of the problems with flat file databases, there was still a certain level of data redundancy. If we take a look at the hierarchy model, you’ll see that we’re able to form relationships. If we take a top-down approach starting from the root node, you’ll see that each node can have one or more child nodes, while if we take a bottom-up approach, you’ll see that each child node can have only one parent node.

This type of relationship is referred to as a “one to many” relationship. Being able to form these types of relationships in the hierarchical data model was able to reduce data redundancy but not entirely remove it. The last type of database we’re going to look at is a relational database. Relational databases are based on the relational model and represent databases as a collection of relations. If you’re familiar with relations in mathematics, this should be relatively easy for you to comprehend because it’s built on the same principles. We have the following student’s relationship, which is represented as a table. We have a domain of attributes, which in database terms refers to the columns, and we have a collection of tuples, which represent the rows in the table. The number of attributes or columns represents the degree of the relation, while the number of tuples or rows represents the cardinality of the relation.

These are just a few of the fundamental terms associated with relations. In theory, the body of a relation is a set of tuples. A set in mathematics has a number of characteristics. One of those characteristics is that a set cannot contain duplicate tuples, and that is where the primary key in a relation comes in. Primary keys are used to enforce uniqueness within a relationship. This means that if we have a primary key constraint defined on one or more attributes, it’s impossible to have rows with the same data. In our example, the primary key constraint is the student ID column, which ensures that each tuple is uniquely identifiable by this column. Another very important thing about relational databases is the ability to form rather complex relationships between relations. Let’s look at another example. In this example, we have three relations: an employee relation, a project relation, and a project assignment relation. All three relations have a primary key, but in the project assignment relation, we see another type of key. This is the foreign key. If we look at the relationship between these relationships, you’ll notice that each employee can be assigned to one or more projects, and each project can have multiple employees.

So we have a few relationships defined here. These relationships are defined by using the primary key of the parent table as a foreign key in the child table. The reason we use foreign keys is to enforce referential integrity. For example, in the project assignment table, we would not be able to insert a row in which the employee ID is equal to five. Instead, we would get a violation error, and the row would not be inserted because an employee with the employeeID of 5 does not exist in the employee table. And that’s essentially what referential integrity is all about: making sure that the data being referenced does in fact exist. The relational databases are managed by a relational database management system and are queried using the structured query language, or SQL for short. We will cover SQL in another section. That’s it for this part. In the next video, we’re going to dive deeper into relational databases and cover some of the fundamental relational database concepts. Thank you for watching. see you then.

img