This ties in with the concept of experimentation. Thomas Grohser told a story the other night about a case of “yeah, the database failed and we tried to do a restore and found out we couldn’t.”
Apparently their system could somehow make backups, but couldn’t restore them. BIG OOPS. (Apparently they managed to create an empty database and replay 4.5 years of transaction logs and recover their data. That’s impressive in its own right.)
This is not the first time I’ve worked with a client or heard of a company whose disaster recovery plan failed the first time it was actually needed. It may sound obvious, but companies need to test their DR plans. In fact, I’m working with a partner on a new business to help companies think about their DR plans. Note that we’re NOT writing or creating DR plans for companies; we’re going to focus on how companies go about actually implementing and testing their DR plans.
Fortunately, right now I’m working with a client that has an uncommon use case. They wanted a restore of the previous night’s backup to a different server every day.
They also wanted to log-ship the database in question to another location.
This wasn’t hard to implement.
But what is very nice about this setup is that every 15 minutes we have a built-in automatic test of their log-backups. If for any reason log-backups stop working or a log gets corrupted, we’ll know fairly quickly.
And, with the database copy, we’ll know within a day if their backups fail. They’re in a position where they’ll never find out 4.5 years later that their backups don’t work.
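One way to turn that into an explicit alert is to check how old the most recent successful restores are. Here is a minimal Python sketch, not the client’s actual setup: the function name `stale_checks`, the two thresholds, and the assumption that the restore timestamps can be read from somewhere (a job history table, a log file) are all mine, for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, loosely based on the schedule described above:
# logs ship every 15 minutes and the full backup is restored nightly.
LOG_RESTORE_MAX_AGE = timedelta(minutes=45)
FULL_RESTORE_MAX_AGE = timedelta(hours=36)


def stale_checks(now, last_log_restore, last_full_restore):
    """Return a list of warnings; an empty list means both restores are fresh."""
    warnings = []
    if now - last_log_restore > LOG_RESTORE_MAX_AGE:
        warnings.append("log restore is stale")
    if now - last_full_restore > FULL_RESTORE_MAX_AGE:
        warnings.append("full restore is stale")
    return warnings


if __name__ == "__main__":
    now = datetime.now()
    # Example timestamps; in practice these would come from wherever
    # the restore jobs record their last successful run.
    print(stale_checks(now, now - timedelta(minutes=10), now - timedelta(hours=8)))
```

The thresholds are deliberately looser than the schedule itself, so one slow run doesn’t raise an alarm but a couple of missed runs will.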
This client’s DR plan needs a lot of work; they actually have nothing formal written down. However, they know for a fact their data is safe. This is a huge improvement over companies that have a DR plan, but have no idea if their data is safe.
Moral of the story: I’d rather know my data is safe and my DR plan needs work than have a DR plan but not have safe data.