Should I replicate data or use continuous data protection? 
When your bits are on the line and the whole world is shouting at you to get the system back up, that can rapidly become a very critical decision!

Remember that the whole point of any such protection scheme is the restore/recovery/continuity benefit; that must be the most significant consideration.

With any protection scheme there is always a trade-off between cost and functionality, so it's vital to understand how the business sees the risk and what service levels it requires. The reality is that hardware failure, human error, software corruption and viruses/malware are the most common causes of data loss.

Most organisations can tolerate a few minutes or tens of minutes of data loss, and the majority can be without some of their systems for a few minutes (we were told of one a few weeks ago that was happy to be without Exchange for a week! There is always an extreme).

Putting it in context, until recently most people were doing daily backups, so had 24 hours of data at risk, and restore times were horrible. Picking on Exchange again: find a new box, install the OS, add it to the domain, set it up the same as the previous machine, reinstall Exchange, restore the database, probably run a consistency check, replay the logs and so on. A couple of days could easily pass by.

So, with a dose of realism, ten minutes either way is not a big deal (usually).

Going back to the original question, replication or continuous data protection?

Replication, however achieved, implies the copy is a "replica" of the original data: corrupt the data and you corrupt the replica. Ah, two copies of corrupt data; that's not good.

Continuous data protection, on the other hand, implies that each change is saved as a separate change, so you can go back to any point in time. In reality, lots of so-called continuous data protection systems don't work like that: they simply replicate, then take snapshots on some schedule. So I'm talking about grown-up continuous data protection, not the "pseudo" version where marketing hype takes over.
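The distinction can be sketched in a few lines of toy code (illustrative only, not any vendor's implementation): a plain replica mirrors the latest write, corruption and all, while a true CDP journal keeps every change and can replay the history up to any instant.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """Plain replication: the copy always mirrors the latest write."""
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        self.data[key] = value  # a corrupt write overwrites the only copy

@dataclass
class CDPJournal:
    """True CDP: every change is journalled, so any point in time is recoverable."""
    journal: list = field(default_factory=list)

    def write(self, t, key, value):
        self.journal.append((t, key, value))

    def restore(self, point_in_time):
        """Replay the journal up to (and including) the chosen instant."""
        state = {}
        for t, key, value in self.journal:
            if t <= point_in_time:
                state[key] = value
        return state

# A corruption at t=3 destroys the replica's only copy, but not the CDP history.
replica, cdp = Replica(), CDPJournal()
for t, val in [(1, "good"), (2, "better"), (3, "CORRUPT")]:
    replica.write("record", val)
    cdp.write(t, "record", val)

print(replica.data["record"])    # the replica faithfully copied the damage
print(cdp.restore(2)["record"])  # roll back to just before the corruption
```

The snapshot-based "pseudo" CDP sits between the two: it can only roll back to whatever points the snapshot schedule happened to capture.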

Having a local copy of data for those day-to-day annoyances - "I deleted the wrong file, can I get it back please", or some random data corruption incident - is really useful. Having an off-site continuous data protection copy is great for site disasters, especially if you have (perhaps) virtual servers set up on the remote site ready to mount the backup version. There is no reason why such a system should not have you back up and running in 10 minutes with negligible data loss.

Netuitive Named "Best of VMworld" Finalist for Performance Monitoring and Optimisation 
Netuitive SI for VMware is a really neat, intelligent agentless monitoring solution for ESX server and the VMs running on it. Others have clearly realised the strengths we have long since recognised.

Netuitive SI for VMware, the self-learning management solution for virtualised environments, was named a Best of VMworld Award Winner at VMworld 2007, the world's largest virtualisation event. An independent team of judges consisting of experts and editors from SearchServerVirtualisation.com chose Netuitive SI for VMware over 22 competing products in the largest and most competitive category which included technologies from CA, AMD and IBM.

"We are honored to be recognised as a Best of VMworld Award Winner," stated Daniel Heimlich, vice president at Netuitive. "This award not only validates the innovation of Netuitive SI for VMware, but indicates that conventional approaches for system management are insufficient in the dynamic, fluid world of virtualisation. As more companies use VMware for deploying mission-critical applications, self-learning technology will be the key for maintaining optimal performance."

Netuitive SI for VMware self-learns and cross-correlates the performance behaviour characteristics of the virtual infrastructure -- each virtual machine, the host server and the resource pool -- enabling optimal allocation of the resource pool, automatic baselining and threshold administration, performance degradation forecasts up to 2 hours in advance, immediate pinpointing of problems and proactive management of system health. Netuitive quickly isolates root-causes, accurately identifies badly behaving virtual machines and spells out necessary corrective actions in plain English -- all to greatly reduce mean time to repair (MTTR).

Product details…


Presence information on/from mobile phones. 
Presence information is becoming increasingly important, and also increasingly accessible and usable, thanks to products such as Presence Networks' "Networker" application. With Networker you can show your network of contacts not only that you are online, offline, away, in a meeting, on the phone and so on; you can also customise that status to, for example, "Busy - in meeting about product x" or "Available - Working on proposal for xyz ltd".

Why is this important?
It is important because it adds context to your presence information. "Online" is useful, but "Online - on the train" is more useful. "Away" is useful (maybe), but "Away - going to meet client 'Z' to discuss problems with widget quality" is more useful to a colleague who might be interested in speaking to Client 'Z', or perhaps is responsible for the quality control on your widget factory floor.

Real-world example.
Part of what we do as a business is support clients who use the technologies we provide for them. Sometimes this includes site visits to complete system software upgrades. Recently we sent an engineer out to do an out-of-hours upgrade on a client site, starting at 10pm. Our engineer had to stay overnight, and left early to arrive in plenty of time to check into the hotel prior to visiting the client.

With a bit of luck traversing the M6 motorway, he arrived several hours before the upgrade and, after checking in, decided to make use of the spare time to go for a run.
Just before leaving, our engineer changed his status to "Away - going for a run".
Shortly after this our client called, checking on where the engineer was, as he had expected him earlier than we had put in the calendar. We flicked quickly to the Presence Networks Networker application, saw his status message, and were immediately able to tell the client that the engineer was in town, checked into the hotel and out for a run, but would be with him shortly after a shower, in plenty of time.

In our example the status message was not changed using a traditional instant messaging desktop client. It was in fact changed using a Java (J2ME) client running on our engineer's mobile phone.


Who or what is consuming your network? 
So you have all this bandwidth. You have an acceptable use policy. You've done all the other things you are supposed to do.

Why does everyone point at the network when things go slow? Because they can, and a defence is hard to mount: knowing what is happening now, what is normal, and what happened 5 minutes and 26 seconds ago is genuinely difficult.

Good news!! The MTI (Mean Time to Innocence) has just got a whole lot quicker ;-))

Imagine something that sucks in all the NetFlow (and similar - such as sFlow, jFlow and IPFIX) data and draws a network map, showing what's connected to what and how much is going across those links. Then add deep packet inspection, so you can right-click on one of those links and see what protocols are running, which end is the client and which is the server. Now you know what is running across the network, and where it's going from and to. Now add AD integration, so you can see whose credentials were used to create the traffic. Hey, now we are cooking! Now we know what, where and who, and in real time.
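The core of that idea is just aggregation over flow records. A minimal sketch (illustrative data structures only, not how any particular product works; the `user` field stands in for the AD lookup):

```python
from collections import defaultdict

# Each flow record: (source, destination, protocol, bytes, user).
# In a real deployment these would arrive as NetFlow/sFlow/IPFIX exports,
# and the user would come from correlating the source IP with AD logon events.
flows = [
    ("10.0.0.5", "10.0.1.20", "smtp", 120_000, "alice"),
    ("10.0.0.5", "10.0.1.20", "smtp",  80_000, "alice"),
    ("10.0.0.9", "10.0.1.20", "http",  40_000, "bob"),
]

link_bytes = defaultdict(int)        # how much is crossing each link
link_detail = defaultdict(set)       # what is running across it, and who

for src, dst, proto, nbytes, user in flows:
    link_bytes[(src, dst)] += nbytes
    link_detail[(src, dst)].add((proto, user))

# The "network map": every link, its volume, and the what/who behind it.
for link, total in sorted(link_bytes.items()):
    print(link, total, sorted(link_detail[link]))
```

Record those aggregates on a schedule and the "what happened 5 minutes 26 seconds ago" question becomes a lookup rather than guesswork.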

If we record all that activity, we can look back a month or two and see how usage has changed. That's good.

Add heuristics to look for odd behaviour, oh I like that. So if a worm starts to do something nasty then it will show up on the network activity, of course it will. So by looking at the history, we can see where the nasty activity comes from, so we can see what's infected without having to run scans. And you can see what services those infected hosts are providing, and any dependencies, so planning a containment strategy is done with a good deal of understanding of what's what and the consequences of disconnecting that port, or pulling the mains cable on that server.

This is really good. If it were agentless, to make deployment easier - perhaps just an appliance - then life would be, ah! It would mean I could see what was happening all the time, I could prove my innocence straight away, and I could even help with security issues at the same time. I'd be a hero! Super Network Operations Guy (SNOG)!! I could even wear my pants outside my tights (oops, too much).

Have a look at Mazu Networks and get your reputation back. The network guy is innocent!!

Using InMage DR Scout to protect Web applications. 
Increasingly we find our clients are using web-based applications both to serve customers and to serve staff.
Be it the intranet wiki or the Internet e-commerce site, maintaining a useful replica of that site, software and database is essential.
What we have also noticed is that an overnight backup is just not cutting it any more.

Specifically, plain old time-based backups don't fit an international model very well. A client has also come to that conclusion, so asked us to come up with a mechanism to help.

InMage is a Continuous Data Protection (CDP) provider who have a product called DR-Scout which allows us to do event based backup and restore operations (along with traditional time based backups and manual restores). We have recently been working directly on a web-based application using InMage to provide a client the ability to restore the entire web application back to the moment before a specific action in the application occurred.

The web application is a typical LAMP (Linux, Apache, MySQL, PHP) architecture; in fact, in our development rig it is a WAMP (Windows, Apache, MySQL, PHP) environment. There is a database (of course) and staff enter their "widgets" information via the web GUI: serial numbers, part numbers, in dates, out dates, etc.

Using the InMage system we created a replication of the volumes that the various stacks reside on. So we have a volume that contains the PHP application itself, one for the MySQL DB, etc. These replicas are continuously updated as changes are written to the disk; the RPO (Recovery Point Objective) on the local replica is less than a second. We also have a remote replica on a DR site, where the RPO is larger: x seconds.
It is worth noting that the RPO is a product of the data churn (how much data is being written), how much compression of the changes is possible, and the bandwidth over which the changes are transmitted to the replica.
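That relationship lends itself to a back-of-envelope check. A sketch with made-up numbers (purely illustrative, nothing to do with InMage internals): if the compressed change stream fits within the WAN link, the remote RPO stays small; if it doesn't, the replica falls steadily behind.

```python
# Back-of-envelope remote-RPO estimate (all figures are illustrative assumptions).
churn_mb_per_s = 4.0       # data being written on the protected volumes
compression_ratio = 0.25   # changes shrink to 25% of original size on the wire
link_mbit_per_s = 10.0     # WAN bandwidth to the DR site

wire_mb_per_s = churn_mb_per_s * compression_ratio  # what must be shipped
link_mb_per_s = link_mbit_per_s / 8                 # what the link can carry

if wire_mb_per_s <= link_mb_per_s:
    print("Link keeps pace - the remote RPO stays near the transmit latency")
else:
    backlog_growth = wire_mb_per_s - link_mb_per_s
    print(f"Replica falls behind by {backlog_growth:.2f} MB/s - the RPO grows over time")
```

Running the same sums against your own churn and link figures tells you quickly whether a given WAN pipe can sustain the RPO the business asked for.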

InMage allows us to restore data from any point in time using this replica. We can restore to physical volumes or to virtual volumes. So we can restore data just prior to a disaster at 01:00:30.25 AM by restoring the data from 01:00:30.24 AM. Yes, we can be that precise! Obviously we probably want to go a bit further back, so we can create another restore virtual drive at, say, 01:00:25.00 AM. We can mount both sets of data, see what the damage is, and use the most appropriate. That covers unforeseen failures with all the flexibility you could need.

But... what would be better is if we could restore from the point when something happened in the business/application - say the last successful transaction, or at least the last big one. With InMage this is done with events (or bookmarks if you prefer), which we can search and restore from. So in this case we set up InMage to record an event just prior to any deletion of a record, before and after the billing cycle, and before and after all big maintenance jobs.
We also set up the system to record an event immediately before and after any update to the production PHP code.
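The idea behind those bookmarks can be sketched as a named marker in the protection journal (a toy class with a hypothetical API, illustrating the concept rather than DR-Scout's real interface):

```python
import time

class ProtectionJournal:
    """Toy CDP journal with named event bookmarks.
    Hypothetical API - sketches the bookmark idea, not DR-Scout itself."""
    def __init__(self):
        self.bookmarks = {}

    def record_event(self, name):
        """Bookmark the current instant under a business-meaningful name."""
        self.bookmarks[name] = time.time()

    def restore_point_for(self, name):
        """Look up a restore point by name instead of by raw timestamp."""
        return self.bookmarks[name]

journal = ProtectionJournal()

def run_billing_cycle(journal):
    journal.record_event("EVENT_PRE_RUNNING_INVOICES")
    # ... billing work happens here ...
    journal.record_event("EVENT_POST_RUNNING_INVOICES")

run_billing_cycle(journal)

# Restore to the instant before billing ran, by name rather than timestamp:
restore_point = journal.restore_point_for("EVENT_PRE_RUNNING_INVOICES")
```

The operator never has to know that billing started at 01:00:30.25 AM; the name carries the meaning.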

What this allows our client to do is look at the recovery web GUI and choose to restore, say, the PHP code that powers the system to the point before the last update, or to restore a volume to the point before the last DB purge or billing run. This is all human-readable and human-understandable; it is also in the language of the business, not of IT. So the event for the billing cycle appears as "EVENT_PRE_RUNNING_INVOICES" and for code updates it says "EVENT_BEFORE_SYSTEM_SOFTWARE_UPDATE".

At this stage of the project that is all we have rolled out, but the client is so impressed they want to extend the technology to other critical areas of their business. They have an MS Exchange server that we are integrating into the InMage system, so that the client can do push-button failover of Exchange from the primary site to a secondary server in a matter of just a few minutes, with no loss of valuable emails. We are able to make sure that when we record an event it is "application consistent": all the VSS magic is done and we know that the restore will work.

Watch this space for more as we progress....
