Nov 28, 2011

How much credence do you put in the claim that preventative maintenance of data centers results in decreased reliability?

The president of MTechnology claimed during his keynote speech at the 7x24 Exchange Fall Conference that preventative maintenance of data centers is the most common threat to reliability. Essentially, his position is that maintenance often introduces new defects and can disrupt configurations that were already working reliably. He used Three Mile Island as an example of maintenance creating a system failure. I don't know that much about the particulars of the Three Mile Island meltdown, but the premise that maintenance is essentially a bad thing seems counterintuitive to me. Is there validity to his claim?


I think his point might be lost in the way it was phrased. I can see him having a point about maintenance schedules that are aggressive to the point of being excessive. Companies that sell you hardware sometimes push pretty aggressive maintenance packages, just as a lot of car dealerships try to expand on the manufacturer's recommendations and get you to pay for expensive and unnecessary maintenance. If you accept that there is some risk of an error each time a maintenance procedure is performed, then the more often those procedures are performed, the more opportunities there are for an error. I think that is one challenge in developing a good preventative maintenance schedule: finding the frequency that is often enough to prevent problems, but not so often that the odds of an error outweigh the benefits.
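To put rough numbers on that tradeoff, here is a toy model in Python. It is only a sketch: the function names, the 2% chance of an error per maintenance visit, and the assumption that wear-out failures fall off as 1 / (1 + visits) are all made up for illustration, not taken from the keynote or from any real data center.

```python
def expected_failures(visits_per_year,
                      p_error_per_visit=0.02,
                      wearout_failures_unmaintained=1.0):
    """Expected failures per year under a toy model (all numbers hypothetical)."""
    # Failures introduced by the maintenance itself grow linearly with visits.
    induced = visits_per_year * p_error_per_visit
    # Wear-out failures shrink with diminishing returns as visits increase
    # (an assumed shape, not a measured one).
    wearout = wearout_failures_unmaintained / (1 + visits_per_year)
    return induced + wearout


def best_schedule(max_visits=24, **model_params):
    """Visits per year (0..max_visits) that minimize expected failures."""
    return min(range(max_visits + 1),
               key=lambda n: expected_failures(n, **model_params))


if __name__ == "__main__":
    for n in (0, 1, 2, 4, 6, 12, 24):
        print(n, "visits/year:", round(expected_failures(n), 3))
    print("best schedule:", best_schedule(), "visits/year")
```

With these made-up numbers the minimum falls somewhere in the middle: doing nothing is worse than a moderate schedule, and so is doing maintenance every couple of weeks. If you raise p_error_per_visit far enough, the best schedule in this model drops to zero visits, which is roughly the speaker's claim taken to its extreme.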

Well, what's his alternative then? Do no maintenance whatsoever and just wait for things to break down eventually? Like it or not, some maintenance has to be done. It's like owning a car and never taking it in for inspection and fixes. If you don't do it, then sooner or later you're going to have a serious malfunction.

Maintenance probably does introduce some issues, but it's necessary and has to happen. Otherwise things will just slowly fall apart over time and then you have an even bigger crisis to deal with. I doubt most people want to go that route.
