Tales from the Field

June 22, 2011

by Woody Hutsell, www.appICU.com

Instead of marketing from afar, I have been selling from the trenches, and let me tell you, the world looks very different from this viewpoint.

I have a variety of observations from my first 9 months of working closely with IT end-users:

  1. At least 50% of the IT people I talk to are generally unfamiliar with solid state storage.  These 50% are so busy worrying about backups, replication, storage capacity and virtualization that it would take a whole screaming train full of end users before they would care about performance.  What they think they know about SSDs is that they are unreliable and don’t have great write performance.  I always ask these end users about performance or interest in SSD and usually get fairly blank looks back.  Don’t get me wrong: their lack of interest in performance or SSD is no reflection on them, just a reflection on their situation.  Maybe they don’t need any more performance than they already get from their storage.  Maybe performance is so far down their list of concerns as not to matter.  Maybe they just can’t budget a big investment in SSD.
  2. Some high percentage of IT buying is done without any real research.  So much for technical marketing: you could write any number of case studies, brochures and white papers, and these guys would never see them unless the sales person sitting across from them drops in at just the right time, immediately after the aforementioned train full of end-users has started complaining about performance (and the IT guy happens to have budget to spend on something other than backup, storage capacity, replication or virtualization).
  3. These groups are deploying server virtualization en masse.
  4. These groups are standardizing on low cost storage solutions.  The rush to standardize is driven by the number one reality affecting many IT shops: they are understaffed and their budgets are constrained.  The lack of staffing means that it is hard to get staff trained on multiple products, and life is easier if they can manage multiple components from a single interface.  The lack of budget means that IT buyers have to make compromises when it comes to storage solutions.  Because of item #2 (above), they are reasonably likely to buy storage from their server vendor and often find their way to the bottom of the storage line-up to save money.

You might think these observations would be disheartening, but really I think the story is that SSD is just starting to make its way through to the more mature buyers in the market.  Eventually, I believe that all IT storage buyers will be as familiar with and concerned with protecting application performance as they are with capacity and reliability.

A case in point: I have run into at least two customers where the drive to standardize on VMware and low cost storage is crushing application performance for mission critical applications.  The good news for these IT shops is that they have low storage costs and an easy-to-manage environment (because they have one storage vendor and one server virtualization solution).  The bad news is that their core business is suffering.

From my limited point of view, standardization is something that the IT guys like and the application owners don’t.  You might assume that I think the IT guys are short-sighted, but no: increasingly I am seeing that they just don’t have a choice; they have to standardize or die under a staggering workload and a shrinking budget.  Something, though, has to give.

A core business of the first of these operations was risk analysis.  This company deployed low-cost storage and had virtualized the entire IT environment with VMware (including the SQL Server database).  The entire IT infrastructure ran great for this customer, but a mission critical sub-terabyte database was a victim of standardization.  The risk managers, whose decisions drove business profitability, were punished with slow application response times every time they ran complex analyses.

The second business is really a conglomerate of some 50+ departments.  These departments were not created equal: there were some really profitable big departments and some paper-pushing small departments.  To the benefit of some end users and the tremendous detriment of others, this business standardized on a middle-tier storage solution with generous capacity scalability but not so generous performance scalability.  Their premier revenue-generating department was suffering with, you won’t believe this, 60 millisecond latencies from storage for their transaction processing system.  Yikes.  For the non-storage geeks reading this blog: a really fast solid state storage system will return data to the host in well under 1 millisecond, and a well-tuned hard disk based RAID array will return data in 5 to 7 milliseconds.  A 60 millisecond response time is indicative of a major storage bottleneck.

Experiencing a 60 millisecond response time on a single request is no big deal, but during a batch process, or spread across many concurrent users, applications get to be very slow: end-users wait for seconds, or batch processes take too long to complete, resulting in blown batch processing windows.
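
To see how quickly per-request latency snowballs into a blown batch window, here is a back-of-the-envelope sketch.  The request count and latency figures are illustrative assumptions, not measurements from either customer:

```python
# Rough model: a batch job that issues storage reads serially,
# so total storage wait is simply requests x latency.
# All numbers below are illustrative assumptions.

def batch_wait_seconds(num_requests, latency_ms):
    """Total time a serial batch job spends waiting on storage."""
    return num_requests * latency_ms / 1000.0

requests = 1_000_000  # a modest overnight batch job

for label, latency_ms in [("SSD", 0.5),
                          ("tuned RAID", 6.0),
                          ("bottlenecked array", 60.0)]:
    hours = batch_wait_seconds(requests, latency_ms) / 3600.0
    print(f"{label:>20}: {hours:.1f} hours of storage wait")
```

At 60 milliseconds per request, the same million-request job spends almost 17 hours just waiting on storage; at sub-millisecond SSD latency the wait nearly disappears.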

For now, the story for these two environments is not finished.  Once companies head down the standardization trail they are pretty confident and committed.  Eventually, the wheels fall off and people begin to realize that it is as bad to standardize on all low cost storage as it is to standardize on all high end storage.  Eventually, people realize that IT needs to align to business and not the other way around.

As companies amass larger data stores and the price and options for deploying SSD evolves, SSD solutions will become more common in the data center and a part of each IT manager’s bag of tricks.  Zsolt Kerekes, at StorageSearch.com, put it best in his 2010 article “This Way to Petabyte SSD” (http://www.storagesearch.com/ssd-petabyte.html) when he said “The ability to leverage the data harvest will create new added value opportunities in the biggest data use markets – which means that backup will no longer be seen as an overhead cost. Instead archived data will be seen as a potential money making resource or profit center. Following the Google experience – that analyzing more data makes the product derived from that data even better. So more data is good rather than bad. (Even if it’s expensive.)”


Consistency Groups: The Trouble with Stand-alone SSDs

February 28, 2011

SSDs (Solid State Disks) are fast; everyone knows this.  So, if they are all so very fast, why are we still using spinning disks at all?  The thing about SSDs (OK, well, one of the things) is that while they are unarguably fast, they need to be implemented with reliability and availability in mind, just like any other storage media.  Deploying them in an Enterprise environment can be sort of like “putting all of your eggs in one basket”.  In order for them to meet the RAS needs of enterprise customers, they must be “backed up” in some meaningful way.  It is not good enough to make back-up copies occasionally; we must protect their data in real time, all of the time.  Enterprise storage systems do this in many different ways, and over time, we will touch upon all of these ways.  Today, we want to talk about one of the ways – replication.

One of the key concepts in data center replication is the consistency group.  A consistency group is the set of files that must be backed up, replicated, and restored together in order for the application to be properly restored.  Consistency groups are the cause of the most difficult discussions between end-users and SSD manufacturers.  At the end of this article, I will suggest some solutions to this problem.

The largest storage manufacturers have a corner on the enterprise data center marketplace because they have array-based replication tools that have been proven in many locations over many years.  For replicated data to be restored, an entire consistency group must be replicated using the same tool set.  This is where external SSDs encounter a problem.  External SSDs are not typically (though this is changing) used to store all application data; furthermore, they do not usually offer replication.  In a typical environment, the most frequently accessed components of an application are stored on SSD and the remaining, less frequently accessed data are stored on slower, less expensive disk.  If a site has array-based replication, that array no longer has the entire consistency group to replicate.

External SSD write caching solutions encounter a more significant version of this same problem.  Instead of storing specific files that are accessible to the array-based replication tool, the cache holds some writes that may or may not have been flushed through to the replicating array.  The replicating array has no way of knowing this, and will snapshot or replicate without a full set of consistent data, because some of that data is still sitting in the external caching solution.  I am aware that some of these third party write caching solutions do have a mechanism to flush cache and allow the external array to snapshot or replicate, but generally speaking, these caching SSDs have historically been used to cache only reads, since write-caching creates too many headaches.  Unless the external caching solution is explicitly certified and blessed by the manufacturer of the storage being cached, using these products for anything more than read caching can be a pretty risky decision.
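
A toy model makes the failure mode concrete.  The classes and keys below are entirely hypothetical; the point is only that a snapshot taken on the array misses whatever is still dirty in the external cache:

```python
# Toy model of the consistency problem with an external write cache
# sitting in front of a replicating array.  Purely illustrative.

class WriteCache:
    def __init__(self, backing_array):
        self.dirty = {}          # writes held in the cache, not yet on the array
        self.array = backing_array

    def write(self, key, value):
        self.dirty[key] = value  # acknowledged to the host immediately

    def flush(self):
        self.array.update(self.dirty)  # push everything down to the array
        self.dirty.clear()

array = {}                       # the replicating array's view of the data
cache = WriteCache(array)

cache.write("balance", 100)
cache.write("audit_log", "debit 100")

snapshot = dict(array)           # array-based snapshot taken *now*
print(snapshot)                  # {} -- the snapshot misses both writes

cache.flush()
print(dict(array))               # only now does the array see the data
```

Any array-based snapshot or replica taken before the flush captures an application state that never validly existed, which is exactly why uncertified write caching in front of a replicating array is risky.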

Automatic integration with array-based replication tools is a main reason that some customers will select disk form factor SSD rather than third party SSDs, in spite of huge performance benefits from the third party SSD.  If you are committed to attaining the absolute highest performance, and are willing to invest just a little bit of effort to maximize performance, the following discussion details some options for getting around this problem.

Solution 1:  Implement a preferred-read mirror.  For sites committed to array-based replication, a preferred-read mirror is often the best way to get benefit from an external SSD and yet keep using array-based replication.  A preferred-read mirror writes to both the external SSD and to the replicating SAN array.  In this way, the replicating array has all of the data needed to maintain the consistency group, and yet all reads come from the faster external SSD.  One side benefit of this model is that it allows a site to avoid mirroring two expensive external SSDs for reliability, saving money, because the existing array provides this role.  If your host operating system or individual software application does not offer preferred-read mirroring, then a common solution is to use a third-party storage application such as Symantec’s Veritas Storage Foundation to provide this feature.  You must bear in mind that a preferred-read mirror does not accelerate writes.
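
The mechanism is simple enough to sketch in a few lines.  This is a minimal illustration, not any vendor's implementation; the class and key names are hypothetical:

```python
# Minimal sketch of preferred-read mirroring: every write lands on both
# the external SSD and the replicating array, while every read is served
# from the faster SSD side.  Illustrative only.

class PreferredReadMirror:
    def __init__(self, ssd, array):
        self.ssd, self.array = ssd, array

    def write(self, key, value):
        # Both copies stay in sync, so the array still holds the
        # full consistency group for its replication tool.
        self.ssd[key] = value
        self.array[key] = value

    def read(self, key):
        # Reads always come from the preferred (fast) side.
        return self.ssd[key]

ssd, array = {}, {}
mirror = PreferredReadMirror(ssd, array)
mirror.write("row42", "hot data")
print(mirror.read("row42"))      # served from the SSD
print(array["row42"])            # the array still has everything to replicate
```

Note the write path touches both devices, which is why this approach accelerates reads but not writes.  On Linux, a broadly similar effect can be had by marking the slow leg of an md RAID-1 with mdadm's `--write-mostly` option, though whether that fits a given SAN configuration is site-specific.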

Solution 2:  Implement server-based replication.  There are an increasing number of good server-based replication solutions.  These tools allow you to maintain consistency groups from the server rather than from the controller inside the storage array, allowing one tool to replicate multiple heterogeneous storage solutions.

Solution 3:  Implement transaction log shipping.  For enterprise database environments, it is common for a site to replicate using transaction log shipping.  Transaction log shipping ensures all writes to a database are replicated to a remote site where the database can be rebuilt if needed.  This approach takes replication away from the array, moving it closer to the database application.
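
The idea can be illustrated with a toy log-and-replay loop.  The record format and account keys here are hypothetical; real databases ship binary WAL or transaction log segments, but the principle is the same:

```python
# Toy illustration of transaction log shipping: every change is appended
# to a log, the log is shipped to a standby site, and the standby replays
# it to rebuild an identical copy of the database.  Illustrative only.

primary_state, log = {}, []

def apply(state, record):
    """Apply one log record to a database state."""
    key, value = record
    state[key] = value

def write(key, value):
    record = (key, value)
    log.append(record)           # durable log entry, later shipped offsite
    apply(primary_state, record)

write("acct:1", 500)
write("acct:2", 250)

# At the standby site, replaying the shipped log reproduces the database.
standby_state = {}
for record in log:
    apply(standby_state, record)

print(standby_state == primary_state)   # True
```

Because the log itself is the unit of replication, consistency is guaranteed by the database rather than by the array, regardless of which physical devices hold the data files.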

Solution 4:  Implement a virtualizing controller with replication capabilities.  A few external SSD manufacturers have partnered with vendors that offer controller-based replication and support heterogeneous external storage behind that controller.  This moves the SSD behind a controller capable of performing replication.  The performance characteristics of the virtualizing controller now become a gating factor in determining the effectiveness, and indeed the value added, of the external SSD.  In other words, if the virtualizing controller adds latency (it must) or has bandwidth limitations (generally they do), those will now apply to the external SSD.  This can slow SSDs down by a factor of three to ten.  It is also the case that this approach solves the consistency group problem only if the entire consistency group is stored behind the virtualizing controller.
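
The gating effect is easy to see with some back-of-the-envelope arithmetic.  Both latency figures below are illustrative assumptions, chosen only to show how a small absolute overhead becomes a large relative slowdown:

```python
# Back-of-the-envelope effect of putting a fast external SSD behind a
# virtualizing controller.  Latencies are illustrative assumptions.

ssd_latency_ms = 0.3           # assumed native external SSD response time
controller_overhead_ms = 0.9   # assumed latency added by the controller

effective = ssd_latency_ms + controller_overhead_ms
slowdown = effective / ssd_latency_ms
print(f"effective latency: {effective:.1f} ms ({slowdown:.0f}x slower)")
```

An overhead that would be negligible in front of a 6 millisecond disk array multiplies a sub-millisecond SSD's latency several times over, which is why the controller, not the SSD, becomes the performance ceiling.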

Most companies implementing external SSD have had to grapple with the impact of consistency groups on application performance, replication and recovery speed.  Even so, the great speed of external SSDs often leads them to deploy one of the solutions we have discussed.

What has been your experience?