Power management, hardware failure and MTBF with Cassatt Collage. 
There has been a lot of "chatter" lately about power management, hardware failure and MTBF numbers. James, Aloof, Vinay and others have been discussing it, and it's been an interesting topic to follow.

Data centres are eating more and more power, both in terms of power to the servers themselves and to the air conditioning units cooling them.
We've talked about power savings from implementing virtualisation before: here, here and here, for example.

But IT people don't like power cycling servers, partly for the reasons that James & co. have suggested: historically, the extra load put on a computer at power-up has caused many a server to die. However, modern computers are better engineered and the Mean Time Between Failures (MTBF) is pretty respectable.

HOWEVER....

We in IT know the fundamental law of computing... Murphy's Law.
It's not the average that gets you, it's the exception to the rule, the machine that fails or the set of machines that fail destroying the MTBF number in your data centre in a moment. It's that one server that had that one application that you didn't realise was not being backed up properly, that you didn't realise the other applications needed, that you didn't realise the CEO looked at hourly!
That server is the one that will have dual power supplies fail when you turn it off and on again; Murphy's Law pretty much guarantees it.

There are two ways that Cassatt can help you deal with this.
One is balancing risk against power savings. By powering down all the servers that are not needed, you can save a large amount in utility bills. If the saving is larger than the risk of having to buy a new piece of hardware to get that one server back from the grave, then you win.
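The trade-off above can be put into rough numbers. The following is a minimal sketch, assuming one power cycle per server per day; every figure in it (server count, wattage, tariff, cooling overhead, failure probability, replacement cost) is a made-up illustration rather than a measured or vendor-supplied value.

```python
def net_annual_benefit(servers, watts_per_server, hours_off_per_day,
                       price_per_kwh, cooling_overhead,
                       failure_prob_per_cycle, replacement_cost):
    """Expected yearly power saving minus expected yearly replacement cost."""
    days = 365
    # Energy not drawn by the servers while they are powered down.
    kwh_saved = servers * watts_per_server / 1000 * hours_off_per_day * days
    # cooling_overhead is the extra cooling power per unit of server power.
    power_saving = kwh_saved * (1 + cooling_overhead) * price_per_kwh
    # Assumption: each server is power cycled once a day.
    expected_failures = servers * days * failure_prob_per_cycle
    risk_cost = expected_failures * replacement_cost
    return power_saving - risk_cost


# Hypothetical data centre: 100 servers at 300 W, off 12 hours a day.
benefit = net_annual_benefit(
    servers=100, watts_per_server=300, hours_off_per_day=12,
    price_per_kwh=0.10, cooling_overhead=0.5,
    failure_prob_per_cycle=0.0001, replacement_cost=2000.0,
)
print(f"Net yearly benefit: {benefit:.2f}")
```

If the result is positive, the saving outweighs the expected hardware bill; plug in your own tariff and failure estimates before drawing any conclusion.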

This saving can also be tracked and added to, say, a new hardware budget. That way you can have spare hardware ready or, better yet, use it to start putting the more advanced features of Cassatt into play.

Cassatt's Collage software offers automated power control, which is where the power savings discussed above come from.
Collage also allows you to automate the power, operating system and applications on a large number of hardware servers. With Collage's advanced features, you can push an entire application server (operating system and application) to bare metal, and even reconfigure the network switches to enable the port that the hardware you have just provisioned is plugged into.

With Collage, all your servers become nodes that any of your applications could potentially run on if required.

Collage will shut down that server we were talking about earlier at, say, 7pm, then try to turn it back on at, say, 7am (saving you all that money on power and cooling). Should the hardware fail for some reason, Collage will select a new server with appropriate CPU, RAM, disk, etc. and push the operating system and application onto it. It will reconfigure the network so that the new hardware is on the right VLAN, and as far as the rest of the world is concerned it will be the exact same server as before.
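To make the "select a new server with appropriate CPU, RAM, disk" step concrete, here is a small illustrative sketch. It is not Collage's actual selection logic or API; the Server type, the pick_replacement function and the "smallest adequate spare" rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    # Hypothetical inventory record, not a Collage data structure.
    name: str
    cpus: int
    ram_gb: int
    disk_gb: int
    healthy: bool = True


def pick_replacement(required: Server, spares: List[Server]) -> Optional[Server]:
    """Return the smallest healthy spare that meets the failed server's spec."""
    candidates = [
        s for s in spares
        if s.healthy
        and s.cpus >= required.cpus
        and s.ram_gb >= required.ram_gb
        and s.disk_gb >= required.disk_gb
    ]
    # Prefer the smallest adequate machine so larger spares stay available.
    return min(candidates, key=lambda s: (s.cpus, s.ram_gb, s.disk_gb), default=None)


failed = Server("app01", cpus=4, ram_gb=16, disk_gb=200, healthy=False)
pool = [
    Server("spare1", cpus=2, ram_gb=8, disk_gb=100),
    Server("spare2", cpus=8, ram_gb=32, disk_gb=500),
    Server("spare3", cpus=4, ram_gb=16, disk_gb=250),
]
replacement = pick_replacement(failed, pool)
print(replacement.name if replacement else "no suitable spare")
```

A real provisioning system would then image the chosen machine and move it onto the right VLAN; those steps are left out here because they depend entirely on the product and the network hardware.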

So getting back to the discussion around MTBF numbers and Mr. Murphy's law. What Cassatt can do is accept that Murphy's Law is in operation, that despite advances in technology hardware will fail, that it will fail at inconvenient moments (like when you are asleep, or on holiday) and it can cope with that situation to ensure that the service levels you have configured are maintained. This is above and beyond the mission to be "green" and save energy.

So... if you believe in MTBF numbers or in Murphy's law, Collage is able to do its thing. It'll save power by shutting machines down for you and if you go whole hog, it'll handle all the rest, so you can enjoy that Martini on the beach during your holiday (or should that be Eggnog by the fireside at this time of year?).

Links:
Cassatt Page
Creating Agile Data Centre Server Environment

Optimising WAN for Application Acceleration. 
In a survey of 235 organisations by Aberdeen Group, 93% of respondents said they increased their bandwidth over the last two years, but only 52% saw an improvement in application response times, only 36% saw WAN latency decrease and only 18% saw a 50% improvement in performance.

This is really no surprise. Bandwidth is rarely the problem, if only it were that easy!!

Aberdeen identifies “Best in Class” companies who exhibited common characteristics:

• Likely to have policies for prioritisation of WAN traffic.

• Centrally manage WAN optimisation appliances

• Use application-specific compression tools

They also identify actions to become “Best in Class”:

• Develop capabilities for monitoring and analysing performance of applications running on the WAN.

• Use historical data to plan changes to bandwidth capacity

• Deploy technology solutions for shaping WAN traffic

• Develop controls to limit the use of bandwidth for applications

Our view is that a lot of this makes sense. Application acceleration and WAN optimisation is a pretty complicated area, and many of the customers we talk to are confused by parts of it. We hear comments like “I can ping it OK but it runs like a dog.”

Ping is sometimes OK for testing connectivity, but that’s about it. CoS and QoS are not the same animal. Latency is a killer. Bandwidth limitations can often be overcome by other means. Even experienced network managers are often surprised when they get visibility of what is really running over their WAN. Knowing what changes (network behaviour) is key to troubleshooting as well as making sensible decisions about which applications to accelerate and which techniques to use. Once WAN optimisation and application acceleration have been applied, there is an ongoing need to monitor the WAN in detail to ensure the optimisation is doing what it is supposed to be doing.
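One way to see why latency hurts more than bandwidth: a single TCP connection can have at most one window of data in flight per round trip, so its throughput is capped at window size divided by round-trip time, however big the pipe is. The sketch below works that out; the 64 KB window (no window scaling) and the round-trip times are illustrative assumptions, not measurements from any customer network.

```python
def max_tcp_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on single-connection TCP throughput, in Mbit/s."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


def effective_mbps(link_mbps, window_bytes, rtt_ms):
    """A connection gets the lower of the link speed and the window/RTT bound."""
    return min(link_mbps, max_tcp_throughput_mbps(window_bytes, rtt_ms))


WINDOW = 65535  # bytes; classic maximum without TCP window scaling

for rtt in (10, 40, 200):
    bound = max_tcp_throughput_mbps(WINDOW, rtt)
    print(f"RTT {rtt:>3} ms -> at most {bound:.2f} Mbit/s per connection")
```

With these assumptions a 100 Mbit/s link at 40 ms round trip still delivers only about 13 Mbit/s to one connection, which is why adding bandwidth alone often changes nothing.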

Companies that can view bandwidth consumption per network location, per application and per user can better plan for changes in capacity and preserve optimal application performance. Those that can then apply policies to give granular control of those variables, and measure to confirm optimal acceleration, are the most likely to succeed with their application acceleration and to provide users on remote sites with services that allow them to work effectively.
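As a rough illustration of that kind of visibility, the sketch below totals traffic by any combination of location, application and user. The flow records and their field names are invented for the example; a real deployment would take them from whatever monitoring or flow-export tool is in use.

```python
from collections import defaultdict


def bytes_by(records, *keys):
    """Sum the 'bytes' field of each record, grouped by the given keys."""
    totals = defaultdict(int)
    for record in records:
        totals[tuple(record[k] for k in keys)] += record["bytes"]
    return dict(totals)


# Hypothetical flow records.
flows = [
    {"site": "London", "app": "CIFS", "user": "alice", "bytes": 500},
    {"site": "London", "app": "HTTP", "user": "alice", "bytes": 200},
    {"site": "Leeds", "app": "CIFS", "user": "bob", "bytes": 300},
    {"site": "London", "app": "CIFS", "user": "carol", "bytes": 100},
]

per_site = bytes_by(flows, "site")
per_site_app = bytes_by(flows, "site", "app")
print(per_site)
print(per_site_app)
```

The same grouping run over historical data gives the trend per site or per application that capacity planning needs.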

You can read the report on the Aberdeen site

What is RPO and why it matters with CDP using InMage DR-Scout. 
With our work in Continuous Data Protection (CDP) we often end up in discussions about this three letter acronym RPO.
RPO stands for Recovery Point Objective, and in basic terms it is that window of risk between backups. It is the amount of data the business is prepared to lose.

Wikipedia describes it like this:
"..Recovery point objective (RPO) describes a point in time to which data must be restored in order to be acceptable to the owner(s) of the processes supported by that data.
This is often thought of as the time between the last available backup and the time a disruption could potentially occur. The RPO is established based on tolerance for loss of data or re-entering of data..."

( http://en.wikipedia.org/wiki/Recovery_point_objective )

In real terms it works like this. If you do your backup at, say, midnight every day, then at 9am the next day the RPO would be 9 hours; at noon it would be 12 hours. The worst case scenario is at 11:59pm, where you have an RPO of 23 hours 59 minutes.
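The arithmetic in that example can be written down as a few lines of Python. This is only a sketch of the nightly-backup case described above; the function name and the dates used are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta


def data_at_risk(failure_time, backup_hour=0):
    """Time elapsed since the most recent daily backup at backup_hour."""
    last_backup = failure_time.replace(
        hour=backup_hour, minute=0, second=0, microsecond=0
    )
    # If today's backup has not run yet, the last one was yesterday.
    if last_backup > failure_time:
        last_backup -= timedelta(days=1)
    return failure_time - last_backup


for when in (datetime(2007, 12, 3, 9, 0),
             datetime(2007, 12, 3, 12, 0),
             datetime(2007, 12, 3, 23, 59)):
    print(when.strftime("%H:%M"), "->", data_at_risk(when))
```

The three times reproduce the 9 hour, 12 hour and 23 hours 59 minutes windows from the example.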

Now, why does this matter? It matters because if your server dies at lunchtime (RPO of 12 hours) you've lost all the data since the midnight backup. :(
In real terms you have probably only lost 3 hours' work (9am-12), but that's still 3 hours lost! How many sales orders, reports, invoices, etc. have been lost?

Why this comes up when we talk to people about CDP is pretty easy to illustrate. Keep in mind the example above and take a look at the chart below:



This is an RPO chart taken from a live server with a CDP product called InMage DR-Scout installed. The RPO times are in the range of 1-2 minutes, meaning that should something bad happen, at most 2 minutes of data would be lost. We can repeat the question "How many sales orders, reports, invoices, etc have been lost?" but instead of 3 hours think in terms of less than 3 minutes.

We should point out that the chart above is from a server being protected via DR-Scout to a target server over a 1MB WAN link (with about 40ms round trip ping times). This server has about 200MB of data "churn" per hour. So what the chart tells us is that even in a catastrophic disaster where the entire primary site was lost, our disaster recovery site would have all the data up to, at worst, 2 minutes before.
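To put the churn figure in perspective, the sketch below estimates how much data a 2 minute RPO leaves exposed and what average load 200MB per hour places on the link. Two things are assumptions here, not facts from the chart: churn is treated as evenly spread across the hour, and the link speed is read as 1 megabit per second. If the link is really 1 megabyte per second, pass 8.0 instead.

```python
def churn_summary(churn_mb_per_hour, rpo_minutes, link_mbit_per_s):
    """Return (MB exposed within the RPO, average Mbit/s needed, link utilisation)."""
    # Assumes churn is spread evenly over the hour; real workloads are bursty.
    mb_at_risk = churn_mb_per_hour * rpo_minutes / 60
    avg_mbit_per_s = churn_mb_per_hour * 8 / 3600
    utilisation = avg_mbit_per_s / link_mbit_per_s
    return mb_at_risk, avg_mbit_per_s, utilisation


at_risk, avg_rate, utilisation = churn_summary(
    churn_mb_per_hour=200, rpo_minutes=2, link_mbit_per_s=1.0
)
print(f"Data exposed at a 2 minute RPO: {at_risk:.1f} MB")
print(f"Average replication rate: {avg_rate:.2f} Mbit/s "
      f"({utilisation:.0%} of the assumed link)")
```

Under those assumptions roughly 7MB is exposed at any moment, compared with the hundreds of megabytes that build up between nightly backups.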

Further reading:
Should I replicate data or use continuous data protection?
Does Your Data Protection System Meet Your SLAs?
CDP or Replication for DR and Business Continuity?

Using Packeteer iShared WAFS to improve application performance over the WAN. 
WAFS is predominantly a technology used by users. By this I mean that in a typical scenario, your users in remote locations access files directly from the cache to get nice fast access to files from the central fileservers.

Recently however, we have worked with several firms to enable applications to access the cache rather than people. In one case we ran a live test where the cache was used exclusively by an application and users had no access to files on the cache whatsoever.

Overall, it has proven quite successful.
Caching data for applications provides the same speed benefits that users experience. We have seen the typical speed improvements that we have documented previously (previous blog entry), which have delivered improvements in performance for users but have also shown measurable time savings with large processing jobs.

Here are a couple of examples:

1) Web-based knowledge/document management system.

A lot of clients we work with are using knowledge management systems, especially those involved with large client projects, such as design firms or in construction. Typically built on a platform with a web front end and database such as IIS and SQL, like SharePoint, these systems allow users to "check-in" and "check-out" documents via a client or web interface. These systems often maintain copies of the data at remote sites for users at say a construction site itself or in remote offices.
We have used iShared to maintain this data, which has reliably served the data quickly and saved time especially where the file would not be on the remote site normally. Most of these systems use simple mechanisms to transfer the data from the central data store to the remote data store, using full file copies and standard protocols such as HTTP or CIFS/SMB.
By using iShared we gained considerable speed improvements just in transferring these files as the iShared system uses several optimisation methods and its own efficient protocol to send the files.

We have struck some "issues" with some packages, as obviously they were not built to have a smart cache like iShared. On one occasion we experienced some file loss as a result of the knowledge base software deleting files prior to copying files from the data centre. This resulted in the files being deleted obediently by the iShared system from the data centre. Then the software tried to copy the file it had just requested be deleted... bang!

It highlights the need to be cautious when implementing WAFS for applications as unless you have a thorough knowledge of how WAFS works and how the application interacts, some subtle but catastrophic problems can arise.

2) Data processing and report runs.
Although it seems mad in this day and age, running batch jobs to process data and/or produce reports is still a required evil. Often, the location needing the reports is not the one producing the data. For example, in the manufacturing sector, "head office" might be the ones needing the reports based on the data being produced at the "factory". Moving the report runs to the factory might be one solution, but the reports would still need to be moved to HQ at some point.
WAFS can help in this scenario too. iShared is able to optimise file transfers and can also optimise/accelerate application protocols, for example SQL traffic. We have done practical and lab experiments where we have optimised SQL queries (see here for some details), which have produced some marked improvements. We have also worked on examples where data had to be exported to XML prior to importing into another application; this again is handled well by iShared. The "byte-level differencing" in fact becomes a considerable optimisation, especially when exporting to the same file on a regular basis, as only the new data added is transferred.
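The idea behind byte-level differencing can be shown with a deliberately simplified sketch. This is not iShared's algorithm or protocol, which are not described here; it is a common-prefix comparison, chosen because it matches the "same file, new data added" export case. The function names and the sample XML are made up for illustration.

```python
def delta_for_append(old: bytes, new: bytes):
    """Return (offset, tail): where the versions diverge and the bytes to send."""
    offset = 0
    limit = min(len(old), len(new))
    # Walk forward while both versions hold the same byte.
    while offset < limit and old[offset] == new[offset]:
        offset += 1
    return offset, new[offset:]


def apply_delta(old: bytes, offset: int, tail: bytes) -> bytes:
    """Rebuild the new version on the remote side from the cached old copy."""
    return old[:offset] + tail


yesterday = b"<rows><row>1</row><row>2</row></rows>"
today = b"<rows><row>1</row><row>2</row><row>3</row></rows>"

offset, tail = delta_for_append(yesterday, today)
rebuilt = apply_delta(yesterday, offset, tail)
print(f"Sent {len(tail)} of {len(today)} bytes")
```

Only the part of the file after the first changed byte crosses the WAN; a production differencing engine would also handle changes in the middle of a file far more efficiently than this prefix-only approach.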

The biggest issues we have come across when using WAFS to optimise applications have been related to the internal workings of iShared and the specific applications. We've discovered that running iShared in a "normal" configuration, as you might find it "out of the box", is often not ideal and can in fact cause some of the issues that we have observed. This is because the application's behaviour has not been designed to work with a cache (or at least not with a cache that the application itself does not maintain). We have on occasion had to advise our clients that WAFS, although providing some impressive speed gains, would simply not be a sensible solution, as we discovered an incompatibility in their application that would have caused serious problems.

If you are considering WAFS as a way to improve application performance you are probably on the right track, just be careful in terms of making sure you have a pretty detailed understanding of both your application and WAFS, as getting it right can take some tweaking and getting it wrong could be P45 territory.

Product Page:
Packeteer iShared WAFS

Gartner Says Agility Will Become the Primary Measure of Data Centre Excellence by 2012  
Analysts Examine Business Agility and Data-Centre Virtualisation at Gartner's Data Centre Summit 2007, 22-24 October 2007, London

The next five years will see agility become the primary measure of data-centre excellence. Analysts advised that through 2012 virtualisation will be the most significant factor on data centres. It greatly reduces the number of servers, space, power and cooling demands and ultimately enables agility.

"An agile data centre will handle exceptions effectively, but learn from exceptions to improve standards and processes," said Tom Bittman, Gartner vice-president and distinguished analyst. "Agility will become a major business differentiator in a connected world. Business agility requires agility in the data centre, which is difficult as many of the technologies for improving the intelligence and self-management of IT are very immature, but they will evolve over the next ten years."

Within the data centre, agility should be measured in terms that make sense to the business, such as the time and cost to deploy new servers, to install new software or to fix a problem.

Gartner defines agility as the ability of an organisation to sense environmental change and respond efficiently and effectively. However, no organisation will be agile if its infrastructure is not designed for agility. Mr Bittman said: "Agility is the right strategic balance between speed and operational efficiency."

As a core enabler of agility, virtualisation is the abstraction of IT resources in a way that makes it possible to creatively build IT services. While the vast majority of large organisations have started to virtualise their servers, Gartner estimates that currently only 6 per cent of the addressable market is penetrated by virtualisation, a figure set to rise to 11 per cent by 2009. However, the number of virtualised machines deployed on servers is expected to grow from 1.2 million today to 4 million in 2009.

"Virtualisation changes virtually everything," said Mr Bittman. He explained that it is not just about consolidation but also includes transitioning resource management from individual servers to pools, increasing server deployment speeds up to 30 times.

Virtualisation is a major enabler for infrastructure automation, and will help accelerate the trend toward IT operations process automation.

However, Gartner warned that tools alone are not a substitute for a good process and made the following recommendations to organisations planning or implementing virtualisation:

• When looking at IT projects, balance the virtualised and unvirtualised services. Also look at the investments and trade-offs;
• Reuse virtualised services across the portfolio. Every new project does not warrant a new virtualisation technology or approach;
• Understand the impact of virtualisation on the project's life cycle. In particular, look for licensing, support and testing constraints;
• Focus not just on virtualisation platforms, but also on the management tools and the impact on operations;
• Look at emerging standards for the management and virtualisation space.

Mr Bittman concluded: "IT organisations should have strategic plans in place that include agility improvements. Ultimately, agility requirements are determined and valued by the business."

More Reading:

Cassatt Collage
VMware
XenSource
Virtuozzo
Monitoring Virtual Environments
