Ask The Experts

Recently Asked Questions

Continuous Data Protection (CDP):

Q: What types of organizations are utilizing CDP technology?

A: CDP is being deployed by organizations of all sizes and industry types today. There is no specific “profile” that limits the value CDP is bringing to IT groups across small, medium and large organizations in manufacturing, retail, financial services, healthcare, government, and service organizations globally.

For organizations with mission-critical data and valuable transactions, continuous data protection ensures that ALL data changes are captured and protected as they occur, thus eliminating the risk of data loss due to gaps in protection. With CDP there is no exposure to data loss between scheduled backups or snapshot apertures. One legal firm implemented a CDP solution after estimating the exposure cost of a 1-day backup protection gap at over $1 million.

Many organizations are choosing application-intelligent CDP to protect and improve availability for messaging systems, such as Microsoft Exchange. For these organizations, reliable, rapid, consistent recovery is a key driver for implementing a cohesive CDP solution. Many organizations that have implemented CDP for messaging today are also concerned with the need to meet regulatory standards for data protection, retention and Electronic Data Discovery (EDD) and are finding that CDP solutions are meeting their needs.

Operational simplicity and savings are proving to be drivers of CDP adoption for organizations that want to reduce the management burden of data protection. With many of today's CDP solutions, key IT and application support staff do not need to be involved in maintaining data protection or in recovery processes. Even print servers are being protected with CDP to save IT staff time!

Because CDP is capturing and maintaining all data in local or remote repositories, organizations are leveraging these repositories for data analysis and reporting. An example of this is in retail organizations where timely decision-making can directly impact margins and profitability. Real-time and historical data is accessed from the CDP repository to support decision-making without interrupting or slowing down retail operations on production application and database servers.

Bottom line, CDP is not a niche product for any particular organization type or size. It is proving to be an effective solution for improving availability, recovery reliability, RPO and RTO for all sorts of data, structured and unstructured, at both the server and desktop level. Check out the CDP Buyers Guide to find vendor contacts and to get more detailed information about available solutions, so you can evaluate how CDP can integrate into your IT environment and improve service levels.

- Agnes Lamont, Vice President of Marketing, TimeSpring

Q: How does CDP augment traditional, scheduled backup systems?

A: Often touted for its ability to provide instant restoration and finely granular recovery, CDP also solves the Backup Window Paradox by evaporating the backup window itself, enabling backup copies to be made anytime, day or night, without impacting online operations.

CDP systems can recreate copies of the same time point repeatedly because the process is non-destructive to the underlying data. These instantly created time-based versions of data can easily be backed up using traditional, scheduled backup systems to removable media, replicated over distance or farmed out for alternative uses, all without impacting the ongoing online applications.

It is important to note that CDP systems deliver what is known as an ‘atomic’ view of the data. All the data across all the disks is shown at exactly the same moment in time. It’s as if time stopped at that exact moment. This atomic view provides consistency and stability across databases, applications, federations and even entire datacenters. CDP can dynamically recreate entire application environments without application involvement. In fact, the alternate view staging can be done on a completely different SAN or even in a separate geographic location.

As we’ve established, CDP technology can, in parallel to online applications, present alternate views of data, from any point or event in time. An important use for these time-based views is as a source for creating copies using traditional, scheduled backup systems. A simple process creates a time-stamped copy, mounts it on the backup server, initiates a backup, dismounts and evaporates the copy – all at the touch of a button.
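The create-mount-backup-evaporate process above can be sketched in code. The following is a minimal, illustrative Python model, not any vendor's API; the repository class and its method names are hypothetical stand-ins for a CDP system's view-management calls:

```python
from datetime import datetime

class CDPRepository:
    """Toy stand-in for a CDP repository. Because views are non-destructive,
    a view of the same time point can be recreated repeatedly."""
    def __init__(self):
        self.views = {}

    def create_view(self, point_in_time: datetime) -> str:
        # A view is a read-only image of all data as of point_in_time.
        view_id = f"view-{point_in_time.isoformat()}"
        self.views[view_id] = point_in_time
        return view_id

    def delete_view(self, view_id: str) -> None:
        # "Evaporating" the view frees it; the underlying data is untouched.
        del self.views[view_id]

def backup_from_cdp(repo: CDPRepository, point_in_time: datetime, run_backup) -> bool:
    """Create a time-stamped view, back it up with the scheduled backup
    tool, then evaporate the view. Production applications are never touched."""
    view_id = repo.create_view(point_in_time)
    try:
        return run_backup(view_id)   # hand the view to the traditional backup system
    finally:
        repo.delete_view(view_id)    # non-destructive: the view can be recreated any time

repo = CDPRepository()
ok = backup_from_cdp(repo, datetime(2006, 5, 1, 3, 0), run_backup=lambda view_id: True)
```

If the backup step fails, simply recreating the same view and rerunning the backup is all that is required, which is exactly the restart property described below.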

There are a wide range of additional uses for time-stamped views. For the first time, an audit can be performed before a backup is performed – improving confidence in an effective recovery should the backup copy ever be needed.

Obviously, if the backup were to fail, it would be a simple matter to recreate the image and restart the backup.

What about restoration? There are two general principles of restoration: Recovery Point Objective (RPO) and Recovery Time Objective (RTO). RPO is the targeted recovery point, and essentially defines the maximum amount of data loss that can be tolerated in a recovery.

If a business determines that it can’t lose more than six hours of data, then a backup copy of the data must be created every six hours. Of course, because most logical data failures aren’t found when they occur, but later (minutes, hours, even days), the RPO typically is implied across a time frame like two or three days. So a six-hour RPO over two days means that a backup occurs every six hours, four per day, for a total of eight.
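The arithmetic above generalizes directly: the copy count is the copies per day times the number of days in the implied window. A quick sanity check in Python:

```python
def backup_copies(rpo_hours: int, window_days: int) -> int:
    """Number of backup copies needed to honor an RPO of rpo_hours
    across an implied recovery window of window_days."""
    per_day = 24 // rpo_hours      # backups taken per day
    return per_day * window_days   # total copies retained in the window

# Six-hour RPO implied across a two-day window: 4 per day, 8 in total.
total = backup_copies(rpo_hours=6, window_days=2)
```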

RTO defines how long the business can tolerate being offline or down from a failure. It typically includes the three phases of application recovery: analysis (typically 5% of the RTO); data restoration (typically 90% of the RTO); and recovery (typically 5% of the RTO). The caliber of restoration capabilities can be best judged in terms of RPO and RTO, with a note about how easily the environment can be audited so as to avoid restoration failures.
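Those typical percentages make the time budget for each phase easy to compute. A small illustration in Python (the 5/90/5 split is the rule of thumb cited above, not a fixed standard):

```python
def rto_budget(rto_minutes: float) -> dict:
    """Split an RTO into the three typical phases of application recovery,
    using the rule-of-thumb 5% / 90% / 5% proportions."""
    return {
        "analysis":    rto_minutes * 0.05,
        "restoration": rto_minutes * 0.90,
        "recovery":    rto_minutes * 0.05,
    }

# A four-hour (240-minute) RTO leaves only about 12 minutes for analysis,
# because data restoration dominates the budget.
budget = rto_budget(240)
```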

CDP as a technology offers infinite RPOs. That is what the word 'continuous' in CDP refers to: a continuous spectrum of recovery points. "Any point or event in time" means just that, so the RPO becomes infinitely granular. With CDP, nothing extra has to be done; the only parameter the user can scale is how far back they want the recovery spectrum to extend.

Recovery Time Objectives are a little different. Most CDP systems provide the user with a picture of the data from the restoration point; this picture or image can then be used as a source to copy into the production environment, much like a snapshot. There are also CDP solutions that implement Virtualized Recovery, which remaps or virtualizes the recovery data right in the production environment, allowing RTOs to be measured in seconds.

CDP: a new solution for an old problem. Backup has long plagued IT organizations, a burden compounded by unprecedented data growth in recent years and unrelenting competitive pressure to remain operational around the clock. Fortunately, CDP has arrived to augment organizations' traditional, scheduled backup systems and help them meet their business and operational requirements.

- Michel Fisher, Software Engineer, Revivio, Inc.

Data Protection Initiative (DPI):

Q: Coming Soon!

A: Coming Soon!

Information Lifecycle Management (ILM):

Q: I'm writing to you because I ran into a problem when reading your article titled "Managing Data and Storage Resources in Support of ILM" -- a snapshot of a work in progress. I believe only you can help me out.

The sentence in question reads: "The intended audience includes groups defining standards using web services related to business process management, enterprise content management, grid computing and records information management." I found at least three possible ways to understand it:

1) groups defining standards by using web services, where the web services are related to those areas;

2) groups defining standards by using web services, where the standards are related to those areas (i.e., groups using web services to define standards related to those areas);

3) groups defining standards for using web services related to those areas.

Obviously, only the correct answer can survive, and the other two must be struck out. My question is which is the correct one? I'm also wondering if web services can be used to define standards. A voice from an expert like you would be conclusive to my question and highly appreciated. Thanks for your time on this message.

- David Y.

A: Thank you for your question. I also received the same question via Arnold Jones and one of my co-authors, Jack Gelb, while I was on vacation.

Rather than choosing 1, 2, or 3 from your list, let me first provide you with the context. First, SNIA is defining standards for the management of storage in SMI-S. The intent of the SNIA ILM TWG is to extend SMI-S "up the stack" to the management of data as it relates to storage, storage services and data management applications such as backup and replication management.

Second, the authors of SMI-S recognize the industry migration toward web services for management applications and have identified a migration path for SMI-S to include web services over the next couple of releases.

- Edgar St. Pierre, Chair of the SNIA ILM Technical Workgroup, Chair of the DMF's ILM Initiative and Senior Technologist at EMC

Q: I’ve started a project for rationalization and optimization using ILM. The environment is mainly EMC but the customer wants to apply policies to optimize the storage. I want to know how to define classification criteria, such as which indicators should I consider, etc.?

At the moment, I'm considering 3 axes:
A - Availability (RTO and RPO)
B - Performance (usability, response time)
C - Retention requirements (due to legal or business reasons)

Should I consider more axes, variables to classify data and be able to associate this classification to storage tiers?

Another question: Is there any published document with storage alternatives, Storage SW and Hardware that could be useful to analyze alternatives?

- David from Barcelona

A: Great question! There's no single answer, but let me give you some things to consider.

First, every customer's needs are unique to some degree. They may have industry or business model-specific criteria on which they want to drive classification of data, or they may have personal preferences that will affect both the classification categories as well as the criteria. For example, some CIOs tend to be more risk averse than others and the CEO pays them to think that way.

The three you name are among the most common.

Availability / Data Recoverability is clearly business-driven, based on the potential losses to the business if the application is down. If the classification determines the type of data recovery solutions to be used (disk-based backup vs. tape, snapshots, CDP, etc.), then this is really just about data recoverability. If the classification also determines the class of storage configuration to be used, so as to be more or less robust with respect to hardware failures (hardware MTBF, RAID configuration, multi-pathing, remote synchronous replication, etc.), then these criteria truly determine availability. Note that these criteria can be combined or separate, depending on how you've organized your storage solutions.

Also important here: you may need to distinguish Disaster Recovery from Operational Recovery solutions. This can be determined in a DR plan based on which applications need to be restored in which order. If a valuable application cannot tolerate downtime in the primary data center, then a fast RTO is in order for Operational Recovery. But if it is dependent on many other applications being available, then having a fast Disaster Recovery RTO is unlikely.

Performance: Yes, certainly. I've also seen performance combined with hardware robustness (availability), since the two may change together between different storage configurations. For example, if your EMC environment has both Symmetrix and CLARiiON arrays, you could easily configure the individual storage arrays so that the Symmetrix provides better performance and availability than the CLARiiON -- at a price that should reflect that difference.

Retention Requirements: In today's world of compliance, this is very often used -- both for how long to retain the data and for retention criteria, such as immutability, that can be used to determine the storage configuration required for the data. These criteria may be driven by a records information manager, the legal department, or whichever part of the organization the role of compliance officer has settled in.

Other criteria to consider are below. NOTE: you don't need to classify data across all the different dimensions. The REQUIREMENTS of business, security, legal, RIM, and other stakeholders should determine the criteria needed to classify data across the different storage configurations you build to meet those requirements.

Cost: business is about the bottom line. Why not show the business what each storage solution will cost, and even allow them to specify the most they are willing to spend as a key classification criterion? Quantification may vary, but something like a fully loaded cost per GB per month works.

Search/index: if data requires frequent search to satisfy litigation discovery claims, then it would be best placed in a storage configuration that includes a search/index capability.

Security: this is the granddaddy of classification criteria. The data's accessibility requirements should guide how the data is stored.
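One way to operationalize these axes is a rules table that maps each data class's requirements onto a storage tier. The sketch below is purely illustrative: the tier names, thresholds, and rule ordering are made up, and in practice they must come from your stakeholders' documented requirements:

```python
def assign_tier(rto_hours: float, retention_years: float,
                needs_immutability: bool, needs_search: bool) -> str:
    """Map classification criteria onto a (hypothetical) storage tier.
    Rules are evaluated in priority order: compliance first, then RTO,
    then discovery-search needs."""
    if needs_immutability or retention_years >= 7:
        # Compliance-driven retention: WORM-capable archive tier.
        return "tier-3-compliance-archive"
    if rto_hours <= 1:
        # Tight operational-recovery RTO: replicated high-end array.
        return "tier-1-replicated"
    if needs_search:
        # Frequent discovery requests: tier with a search/index capability.
        return "tier-2-indexed"
    return "tier-2-standard"

# A mission-critical transactional dataset with a 30-minute RTO:
tier = assign_tier(rto_hours=0.5, retention_years=1,
                   needs_immutability=False, needs_search=False)
```

The design point is that the function encodes requirements, not technologies; swapping a tier's underlying hardware does not change the classification logic.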

I hope this helps. Please note, however, that the individual customer requirements are just that -- individual. These are guidelines that may be useful in helping you meet those customer requirements.

- Edgar St. Pierre, Chair of the SNIA ILM Technical Workgroup and Senior Staff Software Engineer, EMC

Q: How does SNIA define ILM? And how does Information Lifecycle Management add value to the enterprise?

A: SNIA's formal definition of ILM can be found in the SNIA Dictionary under "Information Lifecycle Management (ILM)":

"The policies, processes, practices, services and tools used to align the business value of information with the most appropriate and cost-effective infrastructure from the time information is created through its final disposition. Information is aligned with business requirements through management policies and service levels associated with applications, metadata and data."

ILM applies to all operational aspects of the enterprise, from business policies, processes and procedures to the mechanics of handling and managing the data related to the business. It enables new and emerging requirements, like compliance and governance, to be incorporated into your existing environment. It also assists with important day-to-day considerations, such as optimal resource utilization and cost-effective technology implementation, as well as long-term information and data handling, by applying business-driven policies and requirements to Information Technology planning and deployment.

Resource usage and utilization, information tracking and management, and the efficient deployment of new business applications and processes -- ILM helps corporations meet these goals by applying an end-to-end approach across the lifecycle of business information and aligning that information with the most effective resources.

For more information on SNIA's definition for ILM and how it relates to the enterprise, see ILM Definition and Scope - An ILM Framework as well as Information Convergence: Transforming the Information-Centric Enterprise which describes the transformation of today's enterprise into an Information-centric enterprise.

Q: Is ILM specific to certain vendors or products?

A: No, not at all! ILM is business-driven and leverages technology -- not the other way around. Although ILM's most recent emergence as an important approach and discipline is receiving significant attention, there are many technologies and products (from many vendors!) already deployed that assist with the implementation of effective information management. Today, these technologies are deployed as disparate islands but the promise of ILM is to bring these together along with new products and approaches to manage information based upon business-driven requirements, from creation to destruction.

For more information on the roadmap that is envisioned for ILM, including some of the products and standards that are involved in delivering heterogeneous solutions in the future, see the Information Lifecycle Management Roadmap whitepaper.

Q: How can ILM help with my business applications?

A: Corporations cannot afford to deploy "technology islands" for each business application; this has proven to be the most expensive, unmanageable and risk-prone method of application deployment. While some applications are expected to have unique user, infrastructure or business-centric requirements, leveraging the existing environment and the associated investment across business applications is rarely approached holistically. Further, adding new government compliance and regulatory requirements to existing business application environments has also proven to be a daunting and complicated endeavor. ILM provides methods and guidelines, and highlights important considerations, with respect to information classification techniques, deploying tiered infrastructures and creating new practices to align the correct environment with the business application's requirements and the importance of the associated business information.

Proper classification is the underlying principle behind this alignment: classification of information based on business requirements; classification of data in support of service level management; and classification of resources to create a consolidated set of manageable, scalable resources that meet the needs of the business. For more information on classification and the alignment of information requirements with the most effective resources, see the recent tutorials developed by the DMF for the Storage Networking World and Enterprise Information World industry events: Information Classification: The Cornerstone to Information Management, which provides an excellent introduction to the area of information classification, and ILM: Tiered Storage & The Need for Classification, which examines the specific link between data classification, service level management, and tiers of storage.

Long Term Archive & Compliance Storage Initiative (LTACSI):

Q: Coming Soon!

A: Coming Soon!

Professional Services Task Force:

Q: What is the mission of the Professional Services Task Force?

A: The mission of this new task force is to look at the areas of commonality shared by practitioners in Professional Services ILM engagements. The goal is to review common areas of terminology, process, and scope, and to identify what all vendors share in common.

- Gary Zasman and Edgar St. Pierre



100 Year Archive:

Q: Are CAS devices well suited for long-term storage? There appear to be both benefits and downsides -- can you elaborate on these?

- Jacob from North Carolina

A: CAS devices are, by definition, meant for long-term storage; however, "long-term" can be relative, so it's definitely worthwhile to dive into this a little further.

First off, let's define, for the purposes of this writing, the difference between a CAS storage solution and a plain (non-SAN) storage subsystem (i.e., just a bunch of disks or similar).

Here, CAS means a storage device that introduces "software controls" at one of two points in the storage process:

1. On the front end (the input/write path): instead of writing directly to a file system, the application calls a storage application programming interface (API). The API is responsible for controlling where and how the bytes are written to disk, plus many subsequent data management functions such as mirroring, access controls for write once read many (WORM), and timed permanent data destruction.

2. On the back end: the application writes to the disk subsystem using normal operating system conventions (no third-party API), and software controls come into play after the write to disk to further manage the bytes written.

So, the biggest difference is the "software" that has been introduced into the process. And although the software is relatively transparent to the end user (consumer) of the storage, it is software nonetheless, and this is where you should pay special attention if long-term storage is your concern.
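Front-end software control of this kind is often implemented as content addressing: the API hashes the incoming bytes, stores them under that digest, and refuses deletion (WORM) until a retention period expires. Here is a minimal Python sketch of the idea; the class and its retention handling are illustrative, not any vendor's API:

```python
import hashlib
import time

class ToyCAS:
    """Minimal content-addressed store with WORM-style retention."""
    def __init__(self):
        self._objects = {}   # digest -> (bytes, retain_until timestamp)

    def put(self, data: bytes, retention_seconds: float = 0.0) -> str:
        # The address IS the content hash, so identical content
        # coalesces to one stored object (and keeps its original retention).
        digest = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(digest, (data, time.time() + retention_seconds))
        return digest

    def get(self, digest: str) -> bytes:
        return self._objects[digest][0]

    def delete(self, digest: str) -> None:
        # WORM enforcement: no destruction before the retention clock expires.
        _, retain_until = self._objects[digest]
        if time.time() < retain_until:
            raise PermissionError("WORM: retention period has not expired")
        del self._objects[digest]

cas = ToyCAS()
addr = cas.put(b"invoice-2006-001", retention_seconds=3600)
```

Note that the application never chooses where the bytes land; it only ever holds the digest, which is exactly the dependence on the controlling software discussed below.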

The lifecycle of software is continual improvement through bug fixes, enhancements, etc. Each release of any software varies in backwards compatibility and in its requirements for patches vs. general releases. If a piece of software has major implications for your ability to get to the data stored on your storage devices, you want to ask your CAS vendor lots of questions about that software -- probably the most important being the EOL (End of Life) support policy.

This can have a major effect on the viability of your particular vendor for long-term storage or perhaps your actual strategy for long-term preservation of your data.

For example: How often is the controlling software updated, both for patches and for general releases? What is the support policy (for example, current version minus one)? How are patches coordinated with potential firmware updates?

What you want to ensure is supportability over the long haul with respect to software, and a level of comfort that you will not be forced onto the bleeding edge of new releases by, for example, an aggressive release cycle combined with a current-minus-one support policy. Review the EOL policies for your operating system software and ask yourself whether the software that is managing your data should have the same supportability requirements. If long-term record-keeping is your requirement, this has to be a consideration.

So to bottom-line it, the software controls are the major advantage of these devices; they have allowed the industry to move forward with respect to newer technologies. However, evaluate the software like any other piece of software. Don't be misguided by its seamlessness to the consumer.

All of that said about the software, newer CAS devices offer a whole host of advantages: data migration strategies to both disk and tape, mirroring for redundancy, and well thought-out hardware obsolescence strategies. Tape migration strategies also bring a necessary element to the long-term preservation equation.

So, as in most technologies there are pros and cons to the component pieces found in all CAS technologies. I would recommend that you do not rely on any one type of technology for long-term storage and preservation of electronic records – approach the problem using a combination of storage technologies, not for the purpose of hedging your bets but to exploit the benefits of different technologies to solve your business problem with CAS devices being just one type of technology in the overall mix.

- Peter Mojica

Q: Why was the 100 Year Archive Task Force formed?

A: The 100 Year Archive Task Force was formed for three main reasons:

1. To produce a "best practices for long-term digital archive" document focused on the long-term access, preservation, and usability of electronic information in a business context.

2. To establish requirements and guidelines that will stimulate new technologies to come to market to address the long-term retention and usability problems.

3. To develop a standard storage-archive format for long-term logical readability.

The records management industry and its associated long-term digital storage have changed more drastically over the last five to seven years than in the previous couple of decades. The new business requirements are driven by more than just corporate record-keeping. The new business mantra involves a new facet, "compliance," which in large part has been driven by corporate malfeasance (i.e., Enron, WorldCom, and others). This flurry of new regulations, such as SOX, has affected every industry with respect to how, what and where they manage long-term digital records. Even companies not governed by SOX are adopting new best practices in order to keep up with new requirements for corporate record-keeping. In addition, e-mail has become one of the largest new data artifacts to be managed, which has exacerbated the new requirements: the volume of electronic data, the amount of digital storage it demands, and the complexity involved in managing security and access are unprecedented. All of these changes have accelerated the need for the 100 Year Archive Task Force, to ensure that best practices moving forward are in accordance with the new business objectives.

Q: What types of problems will you encounter if you try to retain information long term?

A: The problems encountered center around the long-term usability of information stored digitally on computerized information systems. The crucial problem affecting the long-term usability of digital information is the failure to develop a migration strategy for moving information to new media and technologies as older ones are displaced. Digital information is technology dependent and therefore technology obsolescence is likely to be the most serious impediment to the long-term usability of digital records. Therefore, the development and implementation of a migration strategy and a standard storage-archive format to ensure that digital records created today can be both processed by computers and intelligible to humans in the 21st century is critical.

As stated above, there are many technical issues involving hardware, operating systems, applications and application versions, data portability for migrations, and the effect on chain of custody and authenticity. In addition, today's rich formats, embedded codes and programs, and the many different applications participating in the records management process create issues that involve more than just the data itself: the context of the data, and its value outside of the originating application, also matter. These issues are many and varied; however, at the heart of the matter is this: how will we retrieve an electronic record created in application "A" when application "A" no longer exists, and the servers and storage devices maintaining the data must continually be upgraded to support new operating systems and technology? The complexity of software and the changes and advances in storage technologies have all contributed to the new issues for long-term record-keeping over the 100-year lifecycle.

- Peter Mojica and Gary Zasman

Copyright 2006, SNIA Data Management Forum. All rights reserved.