SY0-501 Section 2.8 Summarize risk management best practices.

Business continuity

concepts One of the oldest phrases still in use today is “the show must go on.” Nowhere is that more true than in the world of business, where downtime means the loss of significant revenue with each passing minute. Business continuity is primarily concerned with the processes, policies, and methods that an organization follows to minimize the impact of a system failure, network failure, or the failure of any key component needed for operation—that is, essentially whatever it takes to ensure that the business continues and that the show does indeed go on.

Business continuity planning (BCP) is the process of implementing policies, controls, and procedures to counteract the effects of losses, outages, or failures of critical business processes. BCP is primarily a management tool that ensures that critical business functions can be performed when normal business operations are disrupted. Critical business functions (CBFs) refer to those processes or systems that must be made operational immediately when an outage occurs. The business can’t function without them, and many are information-intensive and require access to both technology and data.

Business impact analysis

Two of the key components of BCP are business impact analysis (BIA) and risk assessment. BIA is concerned with evaluating the processes, and risk assessment is concerned with evaluating the risk or likelihood of a loss. Evaluating all of the processes in an organization or enterprise is necessary in order for BCP to be effective.

Identification of critical systems and components

To identify critical functions, a company must ask itself, “What functions are necessary to continue operations until full service can be restored?” This identification process will help you establish which systems must be returned to operation in order for the business to continue. In performing this identification, you may find that a small or overlooked application in a department may be critical for operations. Many organizations have overlooked seemingly insignificant process steps or systems that have prevented business continuity planning (BCP) from being effective. Every department should be evaluated to ensure that no critical processes are overlooked.

Risk assessment

Risk assessment is also known as risk analysis or risk calculation. Risk assessment deals with the threats, vulnerabilities, and impacts of a loss of information-processing capabilities or a loss of information itself. A vulnerability is a weakness that could be exploited by a threat. Each risk that can be identified should be outlined, described, and evaluated for the likelihood of it occurring. The key here is to think outside the box. Conventional threats and risks are often too limited when considering risk assessment.

The key components of a risk-assessment process are outlined here:

Risks to Which the Organization Is Exposed This component allows you to develop scenarios that can help you evaluate how to deal with these risks if they occur. An operating system, server, or application may have known risks in certain environments. You should create a plan for how your organization will best deal with these risks and the best way to respond.

Risks That Need Addressing The risk-assessment component also allows an organization to provide a reality check on which risks are real and which are unlikely. This process helps an organization focus on its resources as well as on the risks that are most likely to occur. For example, industrial espionage and theft are likely, but the risk of a hurricane damaging the server room in Indiana is very low. Therefore, more resources should be allocated to prevent espionage or theft as opposed to the latter possibility.

Coordination with BIA The risk-assessment component, in conjunction with the business impact analysis (BIA), provides an organization with an accurate picture of the situation facing it. It allows an organization to make intelligent decisions about how to respond to various scenarios.

Disaster recovery Disaster recovery is the ability to recover system operations after a disaster. A key aspect of disaster recovery planning is designing a comprehensive backup plan that includes backup storage, procedures, and maintenance. Many options are available to implement disaster recovery.

Succession planning Succession planning outlines those internal to the organization who have the ability to step into positions when they open. By identifying key roles that cannot be left unfilled and associating internal employees who can step into these roles, you can groom those employees to make sure that they are up to speed when it comes time for them to fill those positions.

Tabletop Exercises A tabletop exercise is a simulation of a disaster. It is a way to check to see if your plans are ready to go. There are five levels of testing:

Document Review A review of recovery, operations, resumption plans, and procedures.

Simulation A walkthrough of recovery, operations, resumption plans, and procedures in a scripted “case study” or “scenario.”

Parallel Test With this test, you start up all backup systems but leave the main systems functioning.

Cutover Test This test shuts down the main systems and has everything fail over to backup systems.

High Availability

High availability (HA) refers to the measures used to keep services and systems operational during an outage. In short, the goal is to provide all services to all users, where they need them and when they need them. With high availability, the goal is to have key services available 99.999 percent of the time (also known as five nines availability).

Redundancy

Redundancy refers to systems that either are duplicated or fail over to other systems in the event of a malfunction. Failover refers to the process of reconstructing a system or switching. over to other systems when a failure is detected. In the case of a server, the server switches to a redundant server when a fault is detected. This strategy allows service to continue uninterrupted until the primary server can be restored. In the case of a network, this means processing switches to another network path in the event of a network failure in the primary path.

Fault tolerance

Hardware

In addition to software-based encryption, hardware-based encryption can be applied. Within the advanced configuration settings on some BIOS configuration menus, for example, you can choose to enable or disable TPM. A Trusted Platform Module (TPM) can be used to assist with hash key generation. TPM is the name assigned to a chip that can store cryptographic keys, passwords, or certificates. TPM can be used to protect smartphones and devices other than PCs as well.

RAID

The other fundamental aspect of fault tolerance is RAID, or redundant array of independent disks (RAID). RAID allows your servers to have more than one hard drive so that if the main hard drive fails, the system keeps functioning.

Clustering and Load balancing

RAID does a fantastic job of protecting data on systems (which you then protect further with regular backups), but sometimes you need to grow beyond single systems. Anytime you connect multiple computers to work/act together as a single server, it is known as clustering. Clustered systems utilize parallel processing (improving performance and availability) and add redundancy (but also add costs).

High availability can also be obtained through load balancing. This allows you to split the workload across multiple computers. Those computers are often Servers answering HTTP requests (often called a server farm), which may or may not be in the same geo- graphic location. If you split locations, this is usually called a mirror site, and the mirrored copy can add geographic redundancy (allowing requests to be answered quicker) and help prevent downtime.

Disaster recovery concepts

Disaster recovery is the ability to recover system operations after a disaster. A key aspect of disaster recovery planning is designing a comprehensive backup plan that includes backup storage, procedures, and maintenance. Many options are available to implement disaster recovery.

Backup plans/policies

Backups are duplicate copies of key information, ideally stored in a location other than the one where the information is stored currently. Backups include both paper and computer records. Computer records are usually backed up using a backup program, backup systems, and backup procedures.

Cold site

A cold site is a facility that isn’t immediately ready to use. The organization using it must bring along its equipment and network. A cold site may provide network capability, but this isn’t usually the case; the site provides a place for operations to resume, but it doesn’t provide the infrastructure to support those operations. Cold sites work well when an extended outage is anticipated. The major challenge is that the customer must provide all of the capabilities and do all of the work to get back into operation. Cold sites are usually the least expensive to put into place, but they require the most advanced planning, testing, and resources to become operational—occasionally taking up to a month to make operational.

Hot Site

A hot site is a location that can provide operations within hours of a failure. This type of site would have servers, networks, and telecommunications equipment in place to reestablish service in a short time. Hot sites provide network connectivity, systems, and preconfigured software to meet the needs of an organization. Databases can be kept up-to-date using network connections. These types of facilities are expensive, and they’re primarily suitable for short-term situations. A hot site may also double as an offsite storage facility, providing immediate access to archives and backup media.

Warm site

A warm site provides some of the capabilities of a hot site, but it requires the customer to do more work to become operational. Warm sites provide computer systems and compatible media capabilities. If a warm site is used, administrators and other staff will need to install and configure systems to resume operations. For most organizations, a warm site could be a remote office, a leased facility, or another organization with which yours has a reciprocal agreement.

img