Back online!


18 January 2009

After about 40 days of offline-ness, rainskit.com is back up, mostly. gwen.rainskit.com is still down, but the rest of the websites are up. tru_tags (and instructions) are back up. The gallery is working. From a visitor’s point of view, everything (except gwen.rainskit.com) is working as it was before.

Of course, all is not quite as it seems…

So, back before the outage, here’s basically how things were set up:

  • A friend was hosting all my websites (including galleries) on his Mac Mini, in his apartment at Columbia University, where he had a static IP and really good upload speeds
  • That server was also the primary mail server for rainskit.com, but my home server acted as a secondary mail server (so if his server was down, email would still get delivered), and all email was also forwarded to Gmail
  • He had an amazing set of anti-spam tools, so we got very little spam
  • I had a great set of daily backup scripts that backed up everything (of mine) from his server, plus everything from my home server, plus everything on my home desktop, to a dedicated hard drive in the home server

So I felt very confident that all my data and services were secure, and that a single outage of anything wouldn’t cause a problem. And mostly I was right.

What happened is that about 5 weeks ago (right after I finally blogged again!), the hard drive in the Mac Mini bit the dust. OK, not such a big deal (for me) because all my data is backed up (right?) and all my email is automatically routing through the secondary mail server (right?) and forwarding on to Gmail. And yes, I checked – all my data was backed up, and my email was making it to Gmail. Woohoo! Mark up a win for preparedness!

But then things got complicated. My friend was in the middle of finals, plus the Christmas holidays were coming up, plus I (later) found out that my friend’s backup disk was in a format that could only (easily) be read by the Mac Mini – so it was going to be very difficult for him to get the backup data up and running on a new machine. So suddenly I was looking at a month-long (or longer) outage. That complicates things.

First, I needed to notify visitors about the problem. That was fairly easy – I just updated the DNS records for all my sites to point at the home server, and put up an error page. Check.

Second, I needed to figure out what to do about the email that was queuing up. See, email that goes to my secondary server (here at home) usually gets forwarded on to my friend’s server, for spam filtering and final storage (so I can see it in Thunderbird, for example). But with his server down, it was just sitting on my server, and eventually my server would give up trying to forward it on, and bounce it back to the original sender, making them think that I never received it, even though I would have already seen it in Gmail, by then. So I poked around the qmail documentation, and discovered that I could increase the timeout before mail would bounce. It is currently set to about 80 days, and there are currently (after 40 days) about 800 messages in the queue, waiting to be delivered. But OK, that will work, temporarily.
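For anyone wanting to do the same thing: qmail keeps its queue lifetime as a number of seconds in a control file called queuelifetime, with a default of 604800 (one week). The post doesn't name the exact setting that was changed, so treat this as a sketch that assumes it was that file. The arithmetic for an 80-day lifetime looks like this:

```python
# Sketch: convert a queue lifetime in days to the seconds value that
# qmail's queuelifetime control file expects. The 80-day figure comes
# from the post; the helper name is made up for illustration.
SECONDS_PER_DAY = 24 * 60 * 60

QMAIL_DEFAULT_LIFETIME = 604800  # qmail's documented default: one week


def queue_lifetime_seconds(days):
    """Return the number of seconds in the given number of days."""
    return days * SECONDS_PER_DAY


if __name__ == "__main__":
    print(queue_lifetime_seconds(80))                             # 6912000
    print(queue_lifetime_seconds(7) == QMAIL_DEFAULT_LIFETIME)    # True
```

The resulting number goes into the control file as a single line (on a standard install that is usually /var/qmail/control/queuelifetime, but check your own layout), and qmail-send needs a restart to pick it up.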

Third, a bunch of people wanted access to the tru_tags documentation, but it wasn’t accessible. I finally tracked down a version at archive.org and posted that on the error page. It really should have been included in the plugin, but I took it out because it was so large, and I figured it was good enough to be on my website. Oops!

So finally today I decided that I wasn’t going to wait for my friend to get his site back up, and sat down to get my websites back up on my home server. And mostly I was successful. A few things I wasn’t prepared for, though:

  1. The hard drive I was planning on using (on my home server) didn’t have enough free disk space to host my gallery (10 GB), so I had to put the sites on the drive I use for backups – which means that if that single drive fails, I lose the pictures and the backups of the pictures. Which makes the backups pretty much useless. Crap. That needs to be fixed ASAP, but for now (as long as I don’t upload new pictures), it’s no worse than it was yesterday (when the pictures were only on that backup drive…)
  2. I discovered that I forgot to back up the database for gwen.rainskit.com. I have all the pictures, but the database has users, descriptions, comments, etc. So I’m hoping that my friend has the database on his backup, and I’m waiting until I find out to try to fix it.
  3. I discovered that I forgot to back up my wife’s email. Oops. Gmail has everything, but it’s not the same as having her Inbox, and her folders, and all her old emails right where she left them. Again, I hope my friend has them backed up, and until I find out, I’m not fixing the email setup.
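The last two items are the kind of gap a simple coverage check could catch ahead of time. Here is a minimal sketch of one, not the actual backup scripts: it takes a list of things that are supposed to be in the backup and reports which ones aren't there. The function name and all the paths are hypothetical, chosen only for illustration.

```python
# Sketch of a backup coverage check. Everything here (function name,
# directory layout, file names) is an assumption for illustration and
# not taken from the real backup scripts.
import os
import tempfile


def missing_from_backup(expected_paths, backup_root):
    """Return the expected relative paths that do not exist under backup_root."""
    missing = []
    for rel_path in expected_paths:
        if not os.path.exists(os.path.join(backup_root, rel_path)):
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    # Demo against a throwaway directory that only contains the pictures.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "gallery", "photos"))
        expected = ["gallery/photos", "gallery/database.sql", "mail/inbox"]
        for item in missing_from_backup(expected, root):
            print("NOT BACKED UP:", item)
```

Run daily alongside the backups, a check like this only helps if the expected list is kept current whenever a new site or mailbox is added.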

So there’s still a lot of work to be done before I’ll be able to withstand another failure (of any sort!). And it’s not fun work, but I’ve already seen its value once :)
