Recently Asked Questions
Q: What types of organizations are utilizing CDP technology?
A: CDP is being deployed by organizations of all sizes and industry types
today. There is no specific “profile” that limits the value CDP is bringing
to IT groups across small, medium and large organizations in manufacturing,
retail, financial services, healthcare, government, and service industries.
For organizations with mission-critical data and valuable transactions,
continuous data protection ensures that ALL data changes are captured and
protected as they occur, thus eliminating the risk of data loss due to gaps
in protection. With CDP there is no exposure to data loss between scheduled
backups or snapshot apertures. One legal firm implemented a CDP solution
after estimating the substantial exposure cost of a 1-day backup protection gap.
Many organizations are choosing application-intelligent CDP to protect and
improve availability for messaging systems, such as Microsoft Exchange. For
these organizations, reliable, rapid, consistent recovery is a key driver
for implementing a cohesive CDP solution. Many organizations that have
implemented CDP for messaging today are also concerned with the need to meet
regulatory standards for data protection, retention and Electronic Data
Discovery (EDD) and are finding that CDP solutions are meeting their needs.
Operational simplicity and savings
are proving to be key drivers of CDP adoption for organizations that want to reduce
the management burden of data protection. Key IT and application support
staff do not need to be involved in maintaining data protection or in
recovery processes with many of today’s CDP solutions. Even print servers
are being protected with CDP to save IT staff time!
Because CDP is capturing and
maintaining all data in local or remote repositories, organizations are
leveraging these repositories for data analysis and reporting. An example of
this is in retail organizations where timely decision-making can directly
impact margins and profitability. Real-time and historical data is accessed
from the CDP repository to support decision-making without interrupting or
slowing down retail operations on production application and database servers.
Bottom line, CDP is
not a niche product for any particular organization type or size. It is
proving an effective solution in improving availability, reliable recovery,
RPO and RTO for all sorts of data, structured and unstructured, at the
server and desktop level. Check out the CDP Buyers Guide to find vendor
contacts and to get more detailed information about available solutions, so
you can evaluate how CDP can integrate into your IT environment and improve
your data protection.
- Lamont, Vice President of Marketing, TimeSpring
Q: How does CDP augment traditional, scheduled backup systems?
A: Often touted for its ability to provide instant restoration and finely
granular recovery, CDP also solves the Backup Window Paradox by evaporating
the backup window itself, enabling backup copies to be made anytime, day or
night, without impacting online operations.
CDP systems can recreate copies of
the same time point repeatedly because the process is non-destructive to the
underlying data. These instantly created time-based versions of data can
easily be backed up using traditional, scheduled backup systems to removable
media, replicated over distance or farmed out for alternative uses, all
without impacting the ongoing online applications.
It is important to note that CDP
systems deliver what is known as an ‘atomic’ view of the data. All the data
across all the disks is shown at exactly the same moment in time. It’s as if
time stopped at that exact moment. This atomic view provides consistency and
stability across databases, applications, federations and even entire
datacenters. CDP can dynamically recreate entire application environments
without application involvement. In fact, the alternate view staging can be
done on a completely different SAN or even in a separate geographic location.
Once established, CDP technology can, in parallel to online applications, present
alternate views of data, from any point or event in time. An important use
for these time-based views is as a source for creating copies using
traditional, scheduled backup systems. A simple process creates a
time-stamped copy, mounts it on the backup server, initiates a backup,
dismounts and evaporates the copy – all at the touch of a button.
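As a rough illustration, that push-button sequence is easy to picture as a script. The sketch below is hypothetical: the cdpctl view commands, mount point, and backup_tool invocation are invented placeholders, not any particular vendor's CLI.

```python
import subprocess
from datetime import datetime, timezone

def backup_from_cdp_view(point_in_time: datetime, mount_point: str = "/mnt/cdp_view") -> None:
    ts = point_in_time.astimezone(timezone.utc).isoformat()
    # 1. Create a time-stamped view of the data (non-destructive to production).
    view = subprocess.run(
        ["cdpctl", "view", "create", "--at", ts],  # hypothetical CDP CLI
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    try:
        # 2. Mount the view on the backup server.
        subprocess.run(["mount", view, mount_point], check=True)
        # 3. Point the traditional, scheduled backup tool at the view;
        #    online applications are never interrupted.
        subprocess.run(["backup_tool", "--source", mount_point], check=True)
    finally:
        # 4. Dismount and evaporate the copy.
        subprocess.run(["umount", mount_point], check=False)
        subprocess.run(["cdpctl", "view", "destroy", view], check=False)

if __name__ == "__main__":
    backup_from_cdp_view(datetime.now(timezone.utc))
```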
There is a wide range of additional
uses for time-stamped views. For the first time, an audit can be performed
before a backup is performed – improving confidence in an effective recovery
should the backup copy ever be needed.
Obviously, if the backup were to
fail, it would be a simple matter to recreate the image and restart the backup.
Restoration? There are two general principles to restoration: Recovery Point
Objective (RPO) and Recovery Time Objective (RTO). RPO is the targeted
recovery point, and essentially defines the maximum amount of data loss that
can be tolerated in a recovery.
If a business determines that it
can’t lose more than six hours of data, then a backup copy of the data must
be created every six hours. Of course, because most logical data failures
aren’t found when they occur, but later (minutes, hours, even days), the RPO
typically is implied across a time frame like two or three days. So a
six-hour RPO over two days means that a backup occurs every six hours, four
per day, for a total of eight.
RTO defines how long the business can
tolerate being offline or down from a failure. It typically includes the
three phases of application recovery: analysis (typically 5% of the RTO);
data restoration (typically 90% of the RTO); and recovery (typically 5% of
the RTO). The caliber of restoration capabilities can be best judged in
terms of RPO and RTO, with a note about how easily the environment can be
audited so as to avoid restoration failures.
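For readers who like to see the arithmetic, here is a minimal sketch of the two calculations just described; the 5/90/5 phase split simply encodes the rule of thumb quoted above.

```python
def copies_retained(rpo_hours: float, window_days: float) -> int:
    """Recovery copies implied by an RPO held over a retention window.

    A 6-hour RPO kept for 2 days -> 24/6 = 4 copies per day * 2 days = 8.
    """
    return int((24 / rpo_hours) * window_days)

def rto_phases(rto_minutes: float) -> dict:
    """Typical split of an RTO across the three phases of application recovery."""
    return {
        "analysis": 0.05 * rto_minutes,     # ~5% of the RTO
        "restoration": 0.90 * rto_minutes,  # ~90% of the RTO
        "recovery": 0.05 * rto_minutes,     # ~5% of the RTO
    }

print(copies_retained(rpo_hours=6, window_days=2))  # 8
print(rto_phases(rto_minutes=60))  # analysis 3 min, restoration 54 min, recovery 3 min
```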
CDP as a technology offers infinite
RPOs. That is what the word ‘continuous’ in CDP refers to: the continuous
spectrum of recovery points. Any point or event in time means just that, so
the RPO becomes infinitely granular. With CDP, nothing extra has to be done;
the only choice left to the user is how far back they want the recovery
spectrum to extend.
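One way to picture an infinitely granular recovery spectrum is a time-stamped write journal: every write is captured as it occurs, and any past moment is reconstructed by replaying the journal up to that time. The toy class below is purely illustrative (an in-memory, block-level model), not how any particular CDP product is built.

```python
class CdpJournal:
    """Toy CDP write journal: any captured moment can be reconstructed."""

    def __init__(self):
        self._log = []  # (timestamp, block_number, data), appended in time order

    def write(self, ts: float, block: int, data: bytes) -> None:
        # Every write is journaled as it occurs -- no backup window, no gaps.
        self._log.append((ts, block, data))

    def view_at(self, ts: float) -> dict:
        """Rebuild the block map exactly as it existed at time ts (non-destructive)."""
        image = {}
        for t, block, data in self._log:
            if t > ts:
                break               # the log is time-ordered; stop at the target
            image[block] = data     # later writes overwrite earlier ones
        return image

journal = CdpJournal()
journal.write(1.0, 0, b"hello")
journal.write(2.0, 0, b"world")
print(journal.view_at(1.5))  # {0: b'hello'} -- the data exactly as of t = 1.5
```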
Recovery Time Objectives are a little
different. Most CDP systems provide the user with an image of the data from
the restoration point; this image can then be used as a source to copy into
the production environment, much like a snapshot. There are also CDP
solutions that implement Virtualized Recovery, which remaps or virtualizes
the recovery data right in the production environment, allowing RTOs to be
measured in seconds.
CDP: A New Solution for an Old
Problem. Backup has long plagued IT organizations, a problem compounded by
the unprecedented data growth of the past years and unrelenting competitive
pressure to remain operational around the clock. Luckily, CDP has come to
augment organizations’ traditional, scheduled backup systems to meet their
business and operational requirements.
- Fisher, Software Engineer, Revivio, Inc.
Information Lifecycle Management (ILM):
Q: I'm writing to you because
I ran into a problem when reading your article titled “Managing Data and
Storage Resources in Support of ILM” – a snapshot of a work in progress. I
believe only you can help me out.
The sentence in question reads:
“The intended audience includes groups defining standards using web services
related to business process management, enterprise content management, grid
computing and records information management.” I found at least three
possible ways to understand it: 1) groups defining standards by using web
services, where the web services are related to xxx; 2) groups defining
standards by using web services, where the standards are related to xxx
(i.e., groups using web services to define standards related to xxx); 3)
groups defining standards for using web services related to xxx.
Obviously, only the correct answer
can survive, and the other two must be struck out. My question is which is
the correct one? I'm also wondering if web services can be used to define
standards. A voice from an expert like you would be conclusive to my
question and highly appreciated. Thanks for your time on this message.
- David Y.
A: Thank you for your
question. I also received the same question via Arnold Jones and one of my
co-authors, Jack Gelb, while I was on vacation.
Rather than choosing 1, 2, or 3 from your list, let me first provide you
with the context. First, SNIA is defining standards for the management of
storage in SMI-S. The intent of the SNIA ILM TWG is to extend SMI-S "up the
stack" to the management of data as it relates to storage, storage services
and data management applications such as backup and replication management.
Second, the authors of SMI-S recognize the industry migration toward web
services for management applications and have identified a migration path
for SMI-S to include web services over the next couple of releases.
- Edgar St. Pierre, Chair of
the SNIA ILM Technical Workgroup, Chair of the DMF’s ILM Initiative and
Senior Technologist at EMC
Q: I’ve started a project for
rationalization and optimization using ILM. The environment is mainly EMC
but the customer wants to apply policies to optimize the storage. I want to
know how to define classification criteria, such as which indicators should
I consider, etc.?
At the moment, I'm considering 3 axes:
A - Availability (RTO and RPO)
B - Performance (usability, response time)
C - Retention requirements (due to legal or business reasons)
Should I consider more axes, variables to classify data and be able to
associate this classification to storage tiers?
Another question: Is there any published document with storage alternatives,
Storage SW and Hardware that could be useful to analyze alternatives?
- David from Barcelona
A: Great question! There's
no single answer, but let me give you some things to consider.
First, every customer's needs are unique to some degree. They may have
industry or business model-specific criteria on which they want to drive
classification of data, or they may have personal preferences that will
affect both the classification categories as well as the criteria. For
example, some CIOs tend to be more risk averse than others and the CEO pays
them to think that way.
The three you name are among the most common.
Availability / Data Recoverability is clearly business-driven based on the
potential losses to the business if their application is down. If the
classification determines the type of data recovery solutions that are to be
used (disk-based backup vs. tape, snapshots, CDP, etc.), then this is really
just about Data Recoverability. If the classification also determines the
class of storage configuration to be used so as to be more or less robust
with respect to hardware failures (MTBF of hardware, RAID config,
multi-paths, remote synchronous replication, etc.), then this criterion truly
determines availability. Note that these criteria can be combined or
separate, depending on how you've organized your storage solutions. Also
important here: you may need to distinguish Disaster Recovery from
Operational Recovery solutions. This can be determined in a DR plan based on
which applications need to be restored in which order. If a valuable
application cannot tolerate downtime in the primary data center, then a
fast RTO is in order for Operational Recovery. But if it is dependent on
many other applications being available, then achieving a fast Disaster
Recovery RTO is unlikely.
Performance: Yes, certainly. I've also seen performance combined with
hardware robustness (availability) since sometimes the two may change
together between different storage configurations. E.g., if your EMC
environment has both Symmetrix and CLARiiON, then you could easily configure
the individual storage arrays so that the Symmetrix provides better
performance and availability than the CLARiiON – at a price that should
reflect that difference.
Retention Requirements: In today's world of compliance, this is very often
used -- both for how long to retain the data, as well as retention criteria
such as immutability, that can be used to determine the storage
configuration required for the data. This criterion may be driven by a
records information manager, legal, or whichever part of the organization
the role of compliance officer has settled in.
Other criteria to consider are below. NOTE: you don't need to classify data
across all the different dimensions. The REQUIREMENTS of business, security,
legal, RIM, and other stakeholders should determine the criteria needed to
classify data across the different storage configurations you build to meet
them. (A rough sketch of mapping such criteria to storage tiers follows this list.)
Cost: business is about the bottom line. Why not reflect to them what it will
cost for each storage solution, and even allow them to specify the most they
are willing to spend as a key classification criterion? Quantification may
vary, but something like a fully loaded cost per GB per month works.
Search/index: if data requires frequent search to satisfy litigation
discovery claims, then it would be best placed in a storage configuration
that includes a search/index capability.
Security: this is the granddaddy of classification criteria. The data's
accessibility should provide guidance in how the data should be stored.
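To make the idea of mapping classification criteria to storage tiers concrete, here is a minimal sketch. The tier names, fields, and thresholds are invented for illustration; in practice the rule table would come from the business, security, legal, and RIM requirements discussed above.

```python
from dataclasses import dataclass

@dataclass
class DataClass:
    rto_hours: float          # availability / recoverability target
    iops_needed: int          # performance requirement
    retention_years: float    # legal or business retention
    immutable: bool           # WORM-style compliance requirement
    max_cost_gb_month: float  # most the business will pay, fully loaded
    searchable: bool          # litigation discovery needs search/index

def assign_tier(d: DataClass) -> str:
    # Hypothetical rule table -- most restrictive requirement wins.
    if d.immutable or d.retention_years >= 7:
        return "compliance-archive"  # WORM-capable, indexed when searchable
    if d.rto_hours <= 1 or d.iops_needed > 10_000:
        return "tier-1"              # replicated, high-performance arrays
    if d.max_cost_gb_month < 0.05:
        return "tier-3"              # capacity-optimized disk or tape
    return "tier-2"                  # general-purpose midrange storage

print(assign_tier(DataClass(0.5, 20_000, 1, False, 1.00, False)))  # tier-1
```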
I hope this helps. Please note, however, that the individual customer
requirements are just that -- individual. These are guidelines that may be
useful in helping you meet those customer requirements.
- Edgar St. Pierre, Chair of
the SNIA ILM Technical Workgroup and Senior Staff Software Engineer, EMC
Q: How does SNIA define ILM? And how does Information Lifecycle Management add
value to the enterprise?
A: SNIA's formal
definition for ILM can be found in the
SNIA dictionary "Information Lifecycle Management - ILM"
The policies, processes, practices, services and tools used to align the
business value of information with the most appropriate and cost-effective
infrastructure from the time information is created through its final
disposition. Information is aligned with business requirements through
management policies and service levels associated with applications,
metadata and data."
ILM applies to all operational aspects of the enterprise, from business policies,
processes and procedures to the mechanics of handling and managing the data
related to the business. It enables new and emerging requirements, like
compliance and governance, to be incorporated into your existing
environment, and assists with important day-to-day considerations like
optimal resource utilization, cost-effective technology implementations and
leverage, as well as long-term information and data handling, by applying
business-driven policies and requirements to Information Technology planning.
Resource usage and utilization,
information tracking and management, and enabling efficient deployment of
new business applications and processes – ILM helps corporations meet
these goals by applying an end-to-end approach across the lifecycle of
business information and aligning that with the most effective resources.
For more information on SNIA's
definition for ILM and how it relates to the enterprise, see
ILM Definition and Scope - An ILM Framework as well as
Information Convergence: Transforming the Information-Centric Enterprise which
describes the transformation of today's enterprise into an
information-centric enterprise.
Q: Is ILM
specific to certain vendors or products?
A: No, not at all! ILM is
business-driven and leverages technology -- not the other way around.
Although ILM's most recent emergence as an important approach and discipline
is receiving significant attention, there are many technologies and products
(from many vendors!) already deployed that assist with the implementation of
effective information management. Today, these technologies are deployed as
disparate islands but the promise of ILM is to bring these together along
with new products and approaches to manage information based upon
business-driven requirements, from creation to destruction.
For more information on the roadmap that is envisioned for ILM, including
some of the products and standards that are involved with delivering
heterogeneous solutions into the future, see the Information Lifecycle
Management Roadmap whitepaper.
Q: How can ILM help with my business applications?
A: Corporations cannot afford to deploy "technology
islands" for each business application; this has proven to be the most
expensive, unmanageable and risk-prone method of application deployment.
While it is expected that some applications will have unique user,
infrastructure or business-centric requirements, leveraging the existing
environment and associated investment across business applications is not
necessarily approached using holistic approaches and methods. Further,
adding in new government compliance and regulatory requirements to existing
business application environments has proven to also be a daunting and
complicated endeavor. ILM provides methods and guidelines, and highlights
important considerations with respect to information classification
techniques, deploying tiered infrastructures and creating new practices
to align the correct environment according to the business application's
requirements and the importance of the associated business information.
Proper classification is the
underlying principle surrounding this alignment. Classification of
information based on business requirements. Classification of data in
support of service level management. And classification of resources to
create a consolidated set of manageable, scalable, resources to meet the
needs of the business. For more information on classification and the
alignment of information requirements with the most effective resources, see
the recent tutorials developed by the DMF for
Storage Networking World and
Enterprise Information World industry events:
Information Classification: The Cornerstone to Information Management which
provides an excellent introduction to the area of
information classification, and
ILM: Tiered Storage & The Need for Classification which examines the
specific link between data classification, service level management, and
tiers of storage.
Professional Services Task Force:
What is the mission of the Professional Services Task Force?
A: The mission of this new
task force is to look at the areas of commonality shared by practitioners of
the art in Professional Services ILM engagements. The goal is to review
common areas of terminology, process, and scope, and to identify practices
that all vendors share.
- Gary Zasman and Edgar St. Pierre
100 Year Archive:
Q: Are CAS devices well suited
for long-term storage? There appear to be both benefits and downsides – can
you elaborate on these?
- Jacob from North Carolina
A: CAS devices by
definition are meant for long-term storage; however, "long-term" can be
relative, so it's definitely worthwhile to dive into this a little further.
First off, let's consider the difference between a CAS storage solution and a
plain (non-CAS) storage sub-system (i.e., just a bunch of disks or similar).
For the purpose of this writing, let's define CAS as a storage device that
has "software controls" in one of two places. On the front end (the
input/write end of the process), an application, instead of writing directly
to a file system, calls a storage application programming interface (API);
the API is responsible for controlling where and how the bytes are written
to the disk, plus many subsequent data management functions such as
mirroring, access controls for write once read many (WORM), and timed
permanent data destruction. Alternatively, the software controls sit on the
back end: an application performs the write to the disk sub-system with
normal operating system conventions (no third-party API), and the software
controls come into play after the write to disk, to further manage the bytes
written.
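As a rough illustration of the front-end style of control, the toy class below shows an application handing bytes to an API that derives the storage address from the content itself, enforces WORM semantics, and tracks a retention clock. All names and the hash-as-address scheme are illustrative assumptions, not any vendor's actual API.

```python
import hashlib
import time

class ToyCAS:
    """Illustrative content-addressed store with WORM and timed destruction."""

    def __init__(self):
        self._objects = {}     # content address -> bytes
        self._destroy_at = {}  # content address -> earliest deletion time (epoch)

    def put(self, data: bytes, retain_seconds: float) -> str:
        address = hashlib.sha256(data).hexdigest()  # the content IS the address
        self._objects.setdefault(address, data)     # same bytes, same address
        self._destroy_at[address] = time.time() + retain_seconds
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]               # ...read many

    def delete(self, address: str) -> None:
        if time.time() < self._destroy_at[address]:
            raise PermissionError("retention period has not expired")  # write once...
        del self._objects[address]
        del self._destroy_at[address]

cas = ToyCAS()
addr = cas.put(b"signed contract, 2006-05-01", retain_seconds=7 * 365 * 24 * 3600)
print(cas.get(addr))  # b'signed contract, 2006-05-01'
```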
So, the biggest difference is the “software” which has been introduced to
the process. And although the software is relatively transparent to the
end-user (consumer) of the storage, it is software nonetheless, and this is
where you should pay special attention relative to long-term storage if
that’s your concern.
The lifecycle of software is one of continual improvement based on bug fixes,
enhancements, etc. Each release of any software has variations on backwards
compatibility and varying requirements for patches vs. general releases,
etc. If a piece of software has major implications for your ability to get
to the data stored on your storage devices, you want to ask lots of
questions about that software from your CAS vendor – probably the most
important being their EOL (End of Life) support policy.
This can have a
major effect on the viability of your particular vendor for long-term
storage, or perhaps your actual strategy for long-term preservation of your
data. For example: how often is the controlling software updated for patches
vs. general releases? What is the support policy (for example, current
version minus one)? How are patches coordinated with potential firmware
updates?
What you want to ensure is supportability over the long haul
with respect to software and a level of comfort that you are not forced to
be on the bleeding edge of new releases based on an aggressive release cycle
and a current-minus-one support policy, for example. Review the EOL policies
for your operating system software and ask yourself whether the software
that is managing your data should have the same supportability requirements
– if long-term record-keeping is your requirement, this has to be a
consideration.
So to bottom-line it, the software controls are the major advantage to these
devices; they have allowed the industry to move forward with respect to newer
technologies. However, evaluate the software like any other piece of
software. Don’t be misguided by its seamlessness to the consumer.
All of that said, relative to software, there are a whole host of advantages
for data migration strategies to both disk and tape, mirroring for
redundancy, and well thought-out hardware obsolescence strategies found in
newer CAS devices. Also, tape migration strategies bring a necessary element
to the long-term preservation equation.
So, as with most technologies, there are pros and cons to the component
pieces found in all CAS technologies. I would recommend that you do not rely on any
one type of technology for long-term storage and preservation of electronic
records – approach the problem using a combination of storage technologies,
not for the purpose of hedging your bets but to exploit the benefits of
different technologies to solve your business problem with CAS devices being
just one type of technology in the overall mix.
Q: Why was the
100 Year Archive Task Force formed?
A: The 100 Year Archive
Task Force was formed for three main reasons:
1. Produce a “best practices for long term digital archive” document focused
on the long-term access, preservation, and usability of electronic
information in a business context.
2. Establish requirements and guidelines that will stimulate new
technologies to come to market to address the long-term retention and
preservation problem.
3. Develop a standard storage-archive format for long-term logical readability.
The records management industry and its associated long-term digital storage
have changed more drastically over the last five to seven years than in the
previous couple of decades. The new business requirements are driven
by more than just corporate record-keeping. The new business mantra involves
a new facet of "compliance", which in large part has been predicated on
corporate malfeasance (e.g., Enron, WorldCom, and others). This flurry of
new regulations, such as SOX, has impacted every industry with respect to
how, what and where they manage long-term digital records. Even those
companies not governed under SOX are adopting new best practices in order to
keep up with new requirements on corporate record-keeping. In addition,
e-mail has become one of the largest new data artifacts to be managed, which
has exacerbated these requirements: the volume of electronic data, the scale
of the digital storage it demands, and the complexity involved in managing
security and access are unprecedented. All of these changes have accelerated
the need for the 100 Year Archive Task Force, to ensure that best practices
moving forward are in accordance with the new business requirements.
Q: What types of problems will you encounter if you try to retain
information long term?
A: The problems
encountered center around the long-term usability of information stored
digitally on computerized information systems. The crucial problem affecting
the long-term usability of digital information is the failure to develop a
migration strategy for moving information to new media and technologies as
older ones are displaced. Digital information is technology dependent, and
therefore technology obsolescence is likely to be the most serious
impediment to the long-term usability of digital records. Therefore, the
development and implementation of a migration strategy and a standard
storage-archive format, to ensure that digital records created today can be
both processed by computers and intelligible to humans in the 21st century,
is essential.
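As a miniature illustration of what a self-describing, standard storage-archive format buys you, the sketch below wraps a record in an envelope built only from open, documented encodings (JSON, UTF-8, SHA-256), so a future reader needs no original application to interpret it. The field names are invented for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone

def archive_record(content: bytes, mime_type: str, source_app: str) -> bytes:
    """Wrap content in a self-describing envelope that uses only open,
    documented encodings, so no original application is needed to read it."""
    envelope = {
        "format_version": "1.0",
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "source_application": source_app,               # provenance / chain of custody
        "mime_type": mime_type,                         # how to interpret the payload
        "sha256": hashlib.sha256(content).hexdigest(),  # fixity check for authenticity
        "payload_hex": content.hex(),                   # technology-neutral encoding
    }
    return json.dumps(envelope, indent=2).encode("utf-8")

print(archive_record(b"Board minutes, 1 May 2006", "text/plain", "application A").decode())
```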
In addition to the above, you have many technical issues that involve hardware, operating
systems, applications and application versions, data portability for
migrations and the effect on chain of custody and authenticity. Also,
today's rich formats, embedded codes and programs, and the many different
applications participating in the records management process create issues
that involve more than just the data itself: the context of the data and its
value outside the originating application matter as well. These issues are
many and varied;
however at the heart of the issue is how will we retrieve an electronic
record created in application "A" when application "A" no longer exists and
the servers and storage devices that were maintaining the data must continue
to be upgraded in order to support new operating systems and technology? The
complexity of software and the changes and advances in storage technologies
have all contributed to the new issues for long-term record-keeping over a
100-year life-cycle.
- Peter Mojica and Gary Zasman