The team is responsible for recovering and reestablishing operations of critical business programs

Business Impact Analysis

Susan Snedaker, Chris Rima, in Business Continuity and Disaster Recovery Planning for IT Professionals (Second Edition), 2014

Information technology

Critical business functions for IT? It seems like almost all of them are critical most of the time, especially if you judge by the phone calls, hallway pleas, and e-mails begging for assistance when one of the applications, servers, or hardware goes down. However, ultimately, the hardware and software should support the critical business functions, so the IT functions, in large part, will be driven by all the other departments. HR might say, “We have to have our payroll application”; marketing might say, “Without our CRM system, we can’t sell any products”; manufacturing might say, “Without our automated inventory management system, we can’t even begin to make anything.” All those statements may be true, and many of these systems are interdependent. Going back to a concept mentioned earlier in this chapter, upstream and downstream systems need to connect in a logical and distinct manner, so there must be an order in which critical systems are addressed. Therefore, the IT department’s critical business functions are driven externally, to a large degree.
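One way to make the idea of an order among interdependent systems concrete is a dependency sort. The sketch below is illustrative only: the system names and the dependency map are hypothetical, and it assumes each system can list the upstream systems that must be running before it can be restored. It uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system maps to the set of upstream
# systems that must be restored before it.
DEPENDENCIES = {
    "network": set(),
    "database": {"network"},
    "payroll": {"database"},
    "crm": {"database"},
    "inventory": {"database", "network"},
}

def recovery_order(dependencies):
    """Return systems in an order where every upstream system comes first."""
    return list(TopologicalSorter(dependencies).static_order())

order = recovery_order(DEPENDENCIES)
print(order)
```

If the map contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is itself a useful finding during planning because it points to systems that cannot be restored one at a time.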

There are also business functions within the IT department that are critical to the company’s ability to recover and continue doing business after a disaster. For example, the IT department needs to create, manage, and store backups of all data that changes after a disaster. If a disaster happens on a Tuesday and you’re able to get some systems up and running by the following Monday, backups need to start on Monday, as soon as data begins being generated, saved, or changed. Therefore, backup processes can be viewed as critical business functions from the IT perspective. However, you may have a hybrid environment where some systems are still being restored while other systems are in production, generating current data. Managing this environment is extremely critical, which is why we discuss how to restore systems later in this book.

Managing security is another critical aspect. In the aftermath of a major event or disaster, there’s a tendency to “just get it done.” However, ensuring the confidentiality, integrity, and availability of critical business data must still be a top priority. As with all information security functions, you’ll need to balance security with operational needs. Still, these are areas to consider as you develop your BC/DR plan and topics to discuss during the BIA process.

URL: https://www.sciencedirect.com/science/article/pii/B9780124105263000052

Information Security Essentials for Information Technology Managers

Albert Caballero, in Computer and Information Security Handbook (Third Edition), 2017

Contingency Planning

Contingency planning is necessary in several ways for an organization to be sure it can withstand some sort of security breach or disaster. Among the important steps required to make sure an organization is protected and able to respond to a security breach or disaster are business impact analysis, disaster recovery planning, and business continuity planning. These contingency plans are interrelated in several ways and need to stay that way so that a response team can change from one to the other seamlessly if there is a need. Business impact analysis must be performed in every organization to determine exactly which business process is deemed mission-critical and which processes would not seriously hamper business operations should they be unavailable for some time. An important part of a business impact analysis is the recovery strategy that is usually defined at the end of the process. If a thorough business impact analysis is performed, there should be a clear picture of the priority of each organization's highest-impact, therefore riskiest, business processes and assets as well as a clear strategy to recover from an interruption in one of these areas.

Business continuity planning (BCP) ensures that critical business functions can continue during a disaster and is ultimately managed by the CEO of the organization. The BCP is usually activated and executed concurrently with disaster recovery planning (DRP) when needed and reestablishes critical functions at alternate sites (DRP focuses on reestablishment at the primary site). BCP relies on identification of critical business functions and the resources to support them using several continuity strategies, such as exclusive-use options like hot, warm, and cold sites or shared-use options like time-share, service bureaus, or mutual agreements. DRP is the preparation for and recovery from a disaster. Whether natural or manmade, it is an incident that has become a disaster because the organization is unable to contain or control its impact, or the level of damage or destruction from the incident is so severe that the organization is unable to recover quickly. The key role of DRP is defining how to reestablish operations at the site where the organization is usually located. Some key considerations in a properly designed DRP include:

Clear delegation of roles and responsibilities

Execution of alert roster and notification of key personnel

Clear establishment of priorities

Documentation of the disaster

Action steps to mitigate the impact

Alternative implementations for various systems components

Regular testing of the DRP

URL: https://www.sciencedirect.com/science/article/pii/B9780128038437000247

BC/DR Plan Maintenance

Susan Snedaker, Chris Rima, in Business Continuity and Disaster Recovery Planning for IT Professionals (Second Edition), 2014

Changes in operations

During your risk assessment, you determined the mission-critical business functions that needed to be addressed in your BC/DR plan. Clearly, operations are not static, and changes over time to operations may impact the BC/DR plan. Reorganization, expansion, new departments, new facilities, and new management structures can all impact operations in a variety of ways. In some cases, changes in operation happen slowly over time and these changes may go unnoticed as they relate to the BC/DR plan. The BC/DR plan audit (discussed later in this chapter) can be an effective method of reviewing operations against the BC/DR plan. If the business’s mission-critical operations have changed over time or if the processes used to accomplish these functions have changed, the BC/DR plan is at significant risk of failure and should be revised. For example, if your company has slowly moved from bricks-and-mortar retail to e-commerce, many key processes may have changed. If the mix has shifted slowly over time, you might not notice it until you test the plan or perform a BC/DR audit. If you have been expanding storage or adding new lines of applications, servers, or business, these slow-but-steady changes can sometimes be incorporated without thought to bigger-picture items like BC/DR readiness. Obviously, the key is to be sure your BC/DR plan addresses your mission-critical business functions, and if those shift over time, your plan needs to be updated. Changes to operational processes should be implemented as needed, but it would help if your operations staff understood that any changes to their key processes should be flagged so the BC/DR team can review the impact of those changes on the plan and revise as needed.

URL: https://www.sciencedirect.com/science/article/pii/B9780124105263000106

Incident Response – Putting Out Fires Without Getting Burned

Aaron W. Bayles, ... Johnny Long, in Infosec Career Hacking, 2005

Summary

DRPs are a critical component of INFOSEC. They allow you to map out your critical business functions and how to recover them in the event of failure. They should cover worst-case scenarios as well as minor disruptions. They should start with a BIA, which takes an inventory of your different systems, their dependencies, and who relies on them. Once the BIA is complete, you then rank the systems and rate them on their criticality. Depending on their criticality, you determine the order of resumption. The DRP is a management document, but should also provide enough detail for a technical reader to understand their specific tasks. Some organizations use alternate sites to provide coverage in the event of a failure. These sites are rated hot, warm, and cold, depending on the amount of time needed to resume operations. An IR is another part of recovery planning, and is usually performed separately as it can be a more frequent activity. Understanding an IR is more than just fixing machines that are compromised in some manner; it also involves tracking failures and understanding what caused the failure in the first place.
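The BIA steps described above (inventory, rating, then order of resumption) can be sketched as a small data structure and a sort. Everything here is an assumption made for illustration: the `SystemRecord` fields, the 1 to 5 criticality scale, and the sample inventory are hypothetical and not taken from the chapter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SystemRecord:
    name: str
    criticality: int          # hypothetical scale: 1 (low) to 5 (mission-critical)
    depends_on: tuple = ()    # systems this one needs in order to run
    relied_on_by: tuple = ()  # departments or systems that rely on it

def resumption_order(inventory):
    """Rank systems from most to least critical; ties break alphabetically."""
    ranked = sorted(inventory, key=lambda s: (-s.criticality, s.name))
    return [s.name for s in ranked]

# Illustrative inventory only.
INVENTORY = [
    SystemRecord("intranet wiki", 1, ("file server",), ("all staff",)),
    SystemRecord("payroll", 4, ("database",), ("HR",)),
    SystemRecord("database", 5, (), ("payroll", "order entry")),
    SystemRecord("order entry", 5, ("database",), ("sales",)),
]

print(resumption_order(INVENTORY))
```

This ranking considers criticality alone. A working plan would also have to check the `depends_on` field, since a highly rated system cannot resume before the systems it depends on.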

There are many open source network and security tools that are commonly used for specific tasks. You can also use these same tools to perform more specialized tasks in INFOSEC. Snort, although primarily used as an IDS, can also be used to assist in IR activities by gathering data on a signature trip. You can also use Snort for tracking and trending specific types of network traffic. Syslog is easily set up as a centralized log server that can satisfy regulatory requirements. AMANDA allows you to create a centralized backup server that can be automated, again satisfying some requirements for specific regulations. File integrity checkers such as AIDE, Samhain, and Tripwire are used as part of a defense-in-depth strategy for servers and workstations alike. An INFOSEC engineer hacking software after development and testing can use the same code auditing tools that are used by programmers to check for insecure function usage. RATS and Flawfinder can analyze C/C++ code in source, while BTBTester and SPIKE can be used on compiled binaries for input validation and insecure operation testing. Nessus is known for vulnerability assessment, but can also be used for checking and tracking IR roles. While forensic tools like Sleuth Kit and TCT were designed for forensic analysis after an incident, you can also use them for assisting in routine checks on failed machines and other data recovery operations. Network trending and analysis is important for understanding how normal traffic flows through your infrastructure. Cacti and MRTG can be used in conjunction with tcpdump and ngrep for figuring out where choke points are, as well as discovering anomalous traffic and their sources. Kismet can turn a basic capacity for wireless detection into a wireless assessment capability with the right methodology and deliverables. Finally, using tools for secure deletion, such as Secure Delete and DBAN, can keep sensitive data on drives from being released into the public.

When you start your new career, you can do some things to keep problems from happening. Keep your skills current, make sure your attire is appropriate, and give back to the INFOSEC community. You should be proactive with problems you see; try to do a little extra, look for the unpopular tasks and help out, always observe deadlines, and make sure your work is thorough and timely. Be prepared to work solo on tasks, get to know others in your field, and always keep track of your accomplishments and achieved goals.

When you do encounter problems, take the time to understand the root cause and address them in a logical manner. You will always have obnoxious co-workers. Learn their ways and see what you can do to head off their issues. Understand that 80 percent of the work is often done by 20 percent of the workers. Ethics are paramount in INFOSEC. Always be careful of situations where you think someone is operating unethically. It may be easier to keep your work and personal life separate so that if problems occur in the one, they do not automatically influence the other. Burnout is real and can be a major problem. Take small breaks from your work to keep from being overloaded. Mergers and acquisitions may happen, which are especially stressful. Make the decision whether to stay or look for a new job. If you stay, figure out your new goals with the new company and see if you can help create a better work environment.

URL: https://www.sciencedirect.com/science/article/pii/B9781597490115500179

Training, Testing, and Auditing

Susan Snedaker, Chris Rima, in Business Continuity and Disaster Recovery Planning for IT Professionals (Second Edition), 2014

Validation of task integration

Any walk-through or test of the plan should involve key personnel from mission-critical business functions as well as members of the BC/DR team. During the validation of task integration, these business subject matter experts will be best able to identify whether the tasks are listed in the right order, with the right dependencies, requirements, resources, and so on. The integration of tasks is often where plans fail in implementation due to the complexity of most businesses today. This is particularly true when looking at IT systems, which are at the heart of most recovery efforts. If tasks are not properly identified and sequenced, it can take hours, days, or weeks to uncover the source of the problem. The time and place to do this is in the plan testing phase, not during an emergency.

URL: https://www.sciencedirect.com/science/article/pii/B978012410526300009X

Information Security

In The Manager's Handbook for Business Security (Second Edition), 2014

Cyber Incident Response Planning

A cyber incident is a real, perceived, or threatened event that involves technology such as data, business applications, computers, networks, or electronic communications with the potential to have a major negative impact on the business. Cyber incidents may range in seriousness from no direct impact to customers to major disruption of business operations or significant impact to the company’s reputation.

The purpose of the cyber incident response plan is to define the process to respond to cyber incidents that significantly impact the company’s critical business functions. It documents the procedures for responding to situations that impact the company’s ability to provide services to customers or to meet legal or regulatory requirements. The plan is limited to the response to cyber incidents after detection or identification and does not address ongoing preventative actions or detection techniques. Objectives of the cyber incident response plan are to supplement the cyber incident defense program by developing a response strategy that

facilitates timely assessment of potential problems while ensuring a coordinated and comprehensive response to incidents that cross business units;

minimizes the impact of cyber incidents on the company’s ability to provide service while maintaining the company’s public image and credibility;

facilitates prosecution of offenders as appropriate;

defines an organization to implement the response plan and includes definitions of the roles of the leaders and members;

documents procedures to rapidly notify, deploy, and coordinate corporate resources to assess and respond to the incident; and

documents the process to define how a decision to activate the plan is made and by whom.

In order to maximize overall effectiveness for preventing and responding to cyber incidents, a comprehensive, ongoing program with emphasis on prevention and early detection forms the foundation on which the response plan is based. In the event of a major incident, business unit senior management must be prepared to

accept the consequences of the required responses in order to maximize the effectiveness of the response;

support a comprehensive and cohesive response effort if required;

take actions that are in the company’s best interest, even though they may not be the best for the specific business unit, and which must be identified; and

identify and understand roles of critical team members prior to an incident, for support from their management is essential.

Development of a cyber incident response strategy and related plans offers both the IT and corporate security teams an opportunity to identify shared and unique skills and responsibilities while building teamwork, which will serve beyond timely and effective incident management.

Never before has cyber security been more important than it is today. The world is not only being confronted with technology advancements on a daily basis, but is in the midst of an evolutionary event that has a global impact. The “bad guys” don’t have to be in the same room; they can be on a different continent and obtain the same results—from a safer haven.

To provide for a secure future, both corporations and computer users must be constantly cognizant of the threats that are continually present and stalking their systems. They need to use safeguards (antivirus, anti-spam, anti-spyware, etc.) on their computers and be leery of scams and frauds. They must recognize changes in operation or other unusual characteristics of their computers and take corrective actions quickly.

Today and in the future, information assurance depends on an aware, accountable computer user, and that responsibility has to be set in policy, communicated, and enforced.

The image in Figure 8.2 displays many of the diverse threats and risks that confront an information protection program. The key to proactive protection is the recognition that threat is truly dynamic. It is constantly changing and adapting to our efforts to safeguard vital information assets.

Figure 8.2. Information Protection Program Threats.

The many and diverse threats and risks that confront an information protection program.

URL: https://www.sciencedirect.com/science/article/pii/B9780128000625000087

Privacy and Security in Healthcare

Timothy Virtue, Justin Rainey, in HCISPP Study Guide, 2015

Business Continuity and Disaster Recovery

Business continuity is a relatively broad concept and often means different things to different organizations. However, in its simplest form, business continuity is about ensuring an organization’s critical business functions are available and continue to operate during an incident or emergency. Each organization will need to define what critical operations are, as well as what an incident or emergency is, and business continuity and disaster recovery planning accomplish this. Business continuity planning is the process that organizations follow so they can implement policies, procedures, processes, and technology used to ensure how an organization’s business objectives will be performed after a significant disruption. It is important to note that a variety (both natural and man-made) of events (e.g., flood, terrorism) can define a significant disruption and organizations must plan accordingly. Additionally, patients have high expectations that healthcare organizations have the ability to provide critical services and patient care regardless of a significant disruption. In fact, if the event is regional or national in nature (e.g., an entire state is impacted by a pandemic), there may be an even greater expectation or need that healthcare organizations can deliver patient care services. Disaster recovery planning is similar in nature to business continuity planning and they often work in conjunction to support the healthcare organization’s short- and long-term needs when recovering from a significant and disruptive unplanned event. A useful perspective on the relationship is that business continuity is about building systems and processes that are resilient to disruptions and disaster recovery is about systems and processes that enable the organization to quickly recover from a disruption. Disaster recovery tends to focus on the short-term response after a disaster occurs. 
Typically it focuses on restoring the organization’s critical systems (those essential to providing healthcare services) in a timely manner (e.g., the pharmacy system needs to be available within 2 h of a disaster). Although each healthcare organization will determine its specific needs, and several factors should go into disaster recovery planning, the following elements are essential to a comprehensive disaster recovery plan:

Critical application assessment – Determining which systems are essential to the healthcare organization delivering critical services.

Backup procedures – Determining the details (e.g., who, what, when) and steps for backing up systems and data.

Recovery procedures – Determining the details (e.g., who, what, when) and steps for recovering data and systems from backup after a disaster.

Implementation procedures – Determining how to best implement the selected procedures.

Test procedures – Developing test procedures so that testing can be performed on a regular basis to ensure the organization is capable of recovering after a disaster.

Before an appropriate backup strategy can be selected, it is important to understand the fundamentals of backups. Essentially, backups can be managed as onsite or remote. However, most healthcare organizations will determine that a hybrid approach offers the best solution since there are both advantages and disadvantages to each strategy.

Onsite backups – Allow organizations to store the data onsite, the advantage being they are readily available if needed. One disadvantage is that if the onsite location is unavailable or destroyed, so are the backup data. Another disadvantage is that the physical backup media (and information it contains) must be properly safeguarded.

Remote backups – Allow organizations to store the data offsite, the advantage being that if the location is unavailable or destroyed, the backup data still exist. The disadvantage is that it can be time consuming or difficult to obtain backup data in a timely manner when they are not located onsite.

Another important factor when discussing backups is the frequency and specific methodology. Each method has advantages and disadvantages and most healthcare organizations select a hybrid approach to best meet their specific needs. The most common types include the following:

Full backup – Includes all data; it is slower to back up but faster to restore, and requires the most storage space.

Incremental backup – Only includes new or modified data, is fast to back up, has a moderate restore time, and requires less storage space.

Differential backup – Includes all data changed since the last full backup, takes a moderate amount of time to back up, and requires moderate storage space.

Mirrored backup – Only includes new or modified data and has the fastest backup and restore times, but requires the most storage space.
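The practical difference between incremental and differential backups shows up at restore time. The sketch below is a simplified illustration, not a description of any particular backup product: the `Backup` type and `restore_chain` function are hypothetical, and it assumes a schedule uses either incrementals or differentials after a given full backup, not both. Under that assumption, an incremental schedule needs the last full backup plus every incremental taken since, while a differential schedule needs only the last full backup plus the most recent differential.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Backup:
    day: int   # hypothetical day index; higher means more recent
    kind: str  # "full", "incremental", or "differential"

def restore_chain(backups):
    """Return the backups needed, in order, to restore the latest state.

    Assumes at least one full backup exists and that incrementals and
    differentials are not mixed after the same full backup.
    """
    ordered = sorted(backups, key=lambda b: b.day)
    full_idx = max(i for i, b in enumerate(ordered) if b.kind == "full")
    full = ordered[full_idx]
    later = ordered[full_idx + 1:]
    differentials = [b for b in later if b.kind == "differential"]
    if differentials:
        # A differential holds everything changed since the last full backup.
        return [full, differentials[-1]]
    # Each incremental holds only the changes since the previous backup.
    return [full] + [b for b in later if b.kind == "incremental"]

incremental_week = [Backup(0, "full")] + [Backup(d, "incremental") for d in (1, 2, 3)]
differential_week = [Backup(0, "full")] + [Backup(d, "differential") for d in (1, 2, 3)]

print(len(restore_chain(incremental_week)))
print(len(restore_chain(differential_week)))
```

The longer chain in the incremental case is one reason its restore time is described as moderate even though each individual backup is fast.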

Healthcare organizations should develop a comprehensive program and supporting documentation to support their backup strategy. This is used to clarify expectations and execution details. It is best to answer the “five W’s plus one” (who, what, when, where, why, and how) questions when documenting the backup strategy. Since both improvements to backup technology and a substantial reduction in its cost have occurred in recent years, healthcare organizations have additional alternatives (rather than traditional tape or other storage media) to consider when developing their backup strategies. These include alternative sites, high availability architectures and technologies, and real-time journaling. Additionally, the capabilities associated with alternate sites also need to be considered in the healthcare organization’s strategy. The three most common options are a hot site, cold site, and warm site.

Hot sites – Duplicate the primary production site in terms of both infrastructure and data. The main advantage is that there is little to no time required to shift operations from the original production site to the alternative site. The primary disadvantage is the costs and maintenance associated with fully duplicating a production environment.

Cold sites – Have the primary advantage of being low cost, since the information system hardware and backup data are not present at the cold site. The disadvantage is the time and effort required to establish a production-level environment.

Warm sites – Are a hybrid alternative to hot and cold sites. The advantage is that they cost less than a hot site, but have more robust capabilities than a cold site.

URL: https://www.sciencedirect.com/science/article/pii/B9780128020432000045

Project Initiation

Susan Snedaker, Chris Rima, in Business Continuity and Disaster Recovery Planning for IT Professionals (Second Edition), 2014

Business requirements

Business requirements are the first step in developing BC/DR project requirements because you must first understand the critical areas of your business. What questions should you ask to ascertain which are the critical business functions? As you know, if you ask users what the most important systems are, they’ll give you a list a mile long. Rather than ask that type of question, many experts advise using scenario-based questions to help focus attention and elicit useful information. This may take a bit longer in the short-term but will save you time and headaches later. Keep in mind, too, that the first major deliverable for your BC/DR plan is likely to be the risk assessment, which may lead you back to modifying your business, functional, and/or technical requirements. Each iteration should move more quickly and inject less change into the project than prior iterations. So, if you create your business requirements and then do your risk assessment, you may find the priorities for the business requirements change or that a critical system was omitted during the first round. This is a normal part of project planning. However, if you find that each iteration is injecting more change and more uncertainty or more confusion to the process, you need to step back and assess what’s happening. It might be that the project is beginning to manage you (rather than you managing the project), or it could be some key assumptions were incorrect or that your organization is in the midst of a significant change. Be aware that a project plan that feels like shifting sand beneath your feet is in danger of getting out of control and failing. Let’s look at some questions you can use to elicit the type of business information you need to create a useable BC/DR plan, understanding that some of the answers to these questions may change slightly over time. You can tailor these questions to the specifics of your organization, but this should give you a good start.

What would happen if the server room caught on fire and the fire suppression system activated?

What would happen if our power to the data center was cut and unavailable for 4 hours? 4 days?

What would happen if our cooling system in the data center failed and replacement parts were not available for a month?

What would happen if there was a fire in the building and we had to evacuate the building immediately? What would happen if we were not able to reenter the building for 3 weeks or 3 months?

What would happen if a security breach was discovered and our customer database was compromised?

What would happen if we discovered that our Web server had been hacked?

What would happen if an earthquake (hurricane, tornado, flood) destroyed this building and many of our employees’ homes in this area?

What would happen if a major snow storm made it impossible for employees to get to work for a week or two?

What would happen if a chemical spill from a nearby plant or railroad forced us to evacuate this building for a week? A month? 6 months?

What would happen if electricity to this site were cut or unavailable for half a day, 1 day, 1 week, 1 month?

What would happen if our high-speed connection to the Internet were to go out for half a day, 1 day, 1 week, 1 month?

What would happen if a bomb went off in this building and we could not get back into it, ever?

What would we do if major transportation routes (air, rail, road, sea) were shut down or disrupted?

What would we do if key people were killed, injured, or missing?

As you can see, these questions elicit information because they create “what if” scenarios to which team planners have to respond. It gets people thinking in very concrete terms and you, as project manager, can help step them through this process. Again, since there may be aspects to the BC/DR planning process that do not fall under your direction or management, you may not be responsible for managing this process. However, whether you’re heading this up or simply participating as part of the team, you can bring your skills and expertise to the team process and help step people through this process. By envisioning “what would happen if,” you can help craft a realistic view of what the next steps would be. Immediately, people will begin thinking about what they would do without a server or without an application or without the resources at their desks, and this helps you begin to determine what the technological priorities and needs are for your BC/DR plan. Also keep in mind that fire has historically been the number one “disaster” to hit businesses, so start with the smaller, more localized potential problems and expand from there. It does no good to be fully prepared for a hurricane if you don’t have a plan for the common problems businesses face, like power and cooling failures.

URL: https://www.sciencedirect.com/science/article/pii/B9780124105263000039

Intranet Security

Bill Mansoor, in Computer and Information Security Handbook (Third Edition), 2013

10 Rehearse the Inevitable: Disaster Recovery

Possible disaster scenarios can range from the mundane to the biblical in proportion. In intranet or general IT terms, recovering successfully from a disaster can mean resuming critical IT support functions for mission-critical business functions. Whether such recovery is smooth and hassle-free depends on how prior disaster recovery (DR) planning occurred and how this plan was tested to address all relevant shortcomings adequately.

The first task when planning for DR is to assess the business impact of a certain type of disaster on the functioning of an intranet using business impact analysis (BIA). BIA involves certain metrics; again, off-the-shelf software tools are available to assist with this effort. The scenario could be a natural hurricane-induced power outage or a human-induced critical application crash. In any one of these scenarios, one needs to assess the type of impact in terms of time, productivity, and finance.

BIAs can take into consideration the breadth of impact. For example, if the power outage is caused by a hurricane or an earthquake, support from generator vendors or the electricity utility could be hard to get because of the large demands for their services. BIAs also need to take into account historical and local weather priorities. Although there could be possibilities of hurricanes occurring in California or earthquakes occurring along the Gulf Coast of Florida, for most practical purposes the chances of those disasters taking place in those locales are pretty remote. Historical data can be helpful for prioritizing contingencies.

Once the business impacts are assessed to categorize critical systems, a DR plan can be organized and tested. Criteria for recovery have two types of metrics: a recovery point objective (RPO) and a recovery time objective (RTO).

In the DR plan, the RPO refers to how far back or “back to what point in time” that backup data have to be recovered. This time frame generally dictates how often tape backups are taken, which again can depend on the criticality of the data. The most common scenario for medium-sized IT shops is daily incremental backups and a weekly full backup on tape. Tapes are sometimes changed automatically by tape backup appliances.

One important thing to remember is to rotate tapes (that is, put them on a life-cycle plan by marking them for expiry) to make sure that tapes have complete data integrity during a restore. Most tape manufacturers have marking schemes for this task. Although tapes are still relatively expensive, the extra amount spent on always having fresh tapes ensures that there are no nasty surprises at the time of a crucial data recovery.
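A tape life-cycle plan can be as simple as a record of when each tape entered service and how often it has been written. The sketch below flags tapes for retirement; the one-year age limit, the 50-use limit, and the tape labels are hypothetical placeholders, since real limits come from the tape manufacturer's guidance.

```python
from datetime import date


def tapes_to_retire(tapes, today, max_age_days=365, max_uses=50):
    """Return labels of tapes that exceed the assumed age or use limits.

    tapes: dict mapping a tape label to (first_used_date, number_of_uses).
    """
    return [
        label
        for label, (first_used, uses) in tapes.items()
        if (today - first_used).days > max_age_days or uses > max_uses
    ]


# Hypothetical inventory.
tapes = {
    "TAPE-001": (date(2023, 1, 1), 12),  # too old
    "TAPE-002": (date(2024, 3, 1), 60),  # used too many times
    "TAPE-003": (date(2024, 3, 1), 5),   # still within both limits
}

print(tapes_to_retire(tapes, today=date(2024, 6, 1)))
```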

RTO refers to how long it takes to restore backed-up or recovered data to its original state so that normal business processes can resume. The critical factor here is cost. It will cost much more to restore data within an hour using an online backup process, or to resume operations using a hotsite, than to perform a 5-hour restore from stored tape backups. If business process resumption is critical, cost becomes a less important factor.
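The trade-off between RTO and cost can be framed as picking the least expensive recovery method that still meets the required restore time. In the sketch below, the `RecoveryOption` type, the restore times other than the tape and online figures mentioned above, and all cost figures are assumptions for illustration only.

```python
from typing import NamedTuple, Optional


class RecoveryOption(NamedTuple):
    name: str
    restore_hours: float  # time needed to resume operations
    annual_cost: float    # hypothetical yearly cost of keeping the option ready


def cheapest_option(options, rto_hours) -> Optional[RecoveryOption]:
    """Return the lowest-cost option that restores within rto_hours, if any."""
    candidates = [o for o in options if o.restore_hours <= rto_hours]
    return min(candidates, key=lambda o: o.annual_cost, default=None)


# Illustrative figures only.
options = [
    RecoveryOption("stored tape restore", 5.0, 20_000),
    RecoveryOption("online backup", 1.0, 80_000),
    RecoveryOption("hotsite", 0.25, 250_000),
]

for rto in (8, 1, 0.1):
    choice = cheapest_option(options, rto)
    print(f"RTO {rto} h -> {choice.name if choice else 'no option is fast enough'}")
```

Tightening the RTO shrinks the set of acceptable options and pushes the choice toward the more expensive ones, which is the cost pressure described above.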

DR also has to take into account resumption of communication channels. If network and telephone links are not up, having a timely tape restore does little good to resume business functions. Extended campus network links often depend on leased lines from major vendors such as Verizon and AT&T, so having a trusted vendor relationship with agreed-on service level agreement (SLA) standards is a requirement.

Depending on budgets, one can configure DR to happen almost instantly, if so desired, but that is a far more costly option. Most shops with “normal” data flows are comfortable with business being resumed within about 3 to 4 hours, or even a full working day, after a major disaster. Balancing costs with business expectations is the primary factor in the DR game. Spending inordinately for a rare disaster that might never happen is a waste of resources. It is fiscally imprudent (not to mention futile) to try to prepare for every contingency possible.

Once the DR plan is more or less finalized, a DR committee can be set up under an experienced DR professional to orchestrate routine training in which users and managers simulate disasters on a frequent basis. In most shops this means management meeting every 2 months to simulate a DR “war room” (command center) situation, and employees completing mandatory interactive disaster recovery training every 6 months that includes a list of the DR personnel to contact.

Within the command center, roles are preassigned, and each member of the team carries out his or her role as though it were a real emergency or disaster. DR coordination is frequently modeled after the guidelines of the US Federal Emergency Management Agency (FEMA), an active entity that has training and certification tracks for DR management professionals.

Simulated “generator shutdowns” in most shops are scheduled on a biweekly or monthly basis to see how the systems actually function. The systems can include UPSs, emergency lighting, email and cell phone notification methods, and alarm annunciators and sirens. Because electronic equipment in a server room is sensitive to moisture damage, gas-based Halon fire-extinguishing systems are used. These Halon systems also have a provision to be tested (often twice a year) to determine their readiness. The vendor will be happy to be on retainer for these tests, which can be made part of the purchasing agreement as an SLA. If equipment is tested on a regular basis, shortcomings and major hardware maintenance issues with major DR systems can easily be identified, documented, and redressed.

In a severe disaster situation, priorities need to be exercised regarding what to salvage first. Clearly, trying to recover employee records, payroll records, and critical business mission data such as customer databases will take precedence. Anything irreplaceable or not easily replaceable needs priority attention.

We can divide the levels of redundancy and backup into a few progressive segments. The level of backup sophistication would, of course, depend on (1) the criticality and (2) the time-to-recovery criteria of the data involved.

At the basic level, we can opt not to back up any data or not even have procedures to recover data, which means that data recovery would be a failure. Understandably, this is not a common scenario.

More typical is contracting with an archival company that has a local warehouse within a 20-mile radius. Tapes are backed up onsite and stored offsite, with the archival company picking up the tapes from your facility on a daily basis. The time to recover depends on retrieving the tapes from archival storage, getting them onsite, and starting a restore. The advantage here is lower cost. However, the time needed to transport tapes and restore from them might not be acceptable, depending on the type of data and the recovery scenario.

Often a “coldsite” or “hotsite” is added to the intranet backup scenario. A coldsite is a smaller and scaled-down copy of the existing intranet data center that has only the most essential pared-down equipment supplied and tested for recovery but not in a perpetually ready state (powered down as in “cold,” with no live connection). These coldsites can house the basics, such as a Web server, domain name servers, and SQL databases, to get an informational site started up in very short order.

A hotsite is the same thing as a coldsite, except that in this case the servers are always running and the Internet and intranet connections are “live” and ready to be switched over much more quickly than on a coldsite. These are just two examples of how the business resumption and recovery times can be shortened.

Recovery can be made rapidly if the hotsite is linked to the regular data center using fast leased-line links (such as a DS3 connection). Backups synched in real time with an identical redundant array of inexpensive disks (RAID) at the hotsite over redundant high-speed data links afford the shortest recovery time.

In larger intranet shops based in defense-contractor companies, sometimes there are requirements for even faster data recovery with far more rigid standards for data integrity. To-the-second real-time data synchronization, in addition to hardware synchronization, ensures that duplicate sites thousands of miles away can be up and running within a matter of seconds, even faster than a hotsite. Such extreme redundancy typically is needed for critical national databases (for example, air traffic control or customs databases that are accessed 24/7).

At the highest level of recovery performance, most large database vendors offer “zero data loss” solutions, with a variety of cloned databases synchronized across the country that automatically fail over and recover almost instantaneously to preserve a consistent state, often without human intervention. Oracle's version is called Data Guard; most mainframe vendors offer a similar product, varying in their offerings of tiers and features.

The philosophy here is simple: The more dollars you spend, the more readiness you can buy. However, the expense has to be justified by the level of criticality for the availability of the data.


URL: https://www.sciencedirect.com/science/article/pii/B9780128038437000156