Topic: Recent Downtime; March 7th-8th 2011

Posted under General

Hey guys.

Just letting you know a bit of background information and giving an explanation for the downtime over the last 24 and a bit hours.

At approximately 11pm on Sunday 7th March, e621 went offline due to a stupid issue with Debian and Xen virtualization. Namely... the clock started becoming inaccurate (I believe it went backwards), and the operating system freaked the fuck out and crashed.

Around 2-3hrs later, I woke up due to jetlag (I'm back in the UK now, on GMT) and noticed the site was down. I performed some scheduled updates that needed to happen anyway and fixed the clock skew problem (fortunately, an update that fixed it was available).

The site was back online and working fine for approximately 3 hours, from about 5am until 8am - at which point the Ruby on Rails server that runs the actual e621 application fell over - again, for no apparent reason.

Unfortunately, because I recently changed from a US to a UK cellphone, the automated downtime monitoring service that watches e621 and my other sites did not notify me - so I went about my entire day oblivious to the fact the site was down (I'd checked an hour or two after it went back up, and there were no problems).

This entire situation blows, and is one more reason why I hate Debian and Ruby on Rails with a passion (as well as myself for not bothering to check the site more frequently).

But anyway; to stop this happening again, I have set up and tested the downtime notification so it sends messages to my correct phone number, and have performed a large number of updates on the server e621 runs on, including turning on automatic time updating (which should in theory reduce 'clock skew', the reason that e621 shat itself in the first place).
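For anyone curious what 'clock skew' means in practice, here's a rough Python sketch of the general idea. This is not what runs on the e621 server, and the function name `detect_clock_step` and the one-second tolerance are made up purely for illustration. It compares how much time the wall clock says has passed against a monotonic clock (one that is only supposed to go forwards), and reports the difference if the two disagree by more than the tolerance.

```python
import time


def detect_clock_step(wall_before, wall_after, mono_before, mono_after, tolerance=1.0):
    """Return the apparent wall-clock step in seconds, or 0.0 if within tolerance.

    A negative result means the wall clock moved backwards relative to the
    monotonic clock; a positive result means it jumped forwards.
    (Hypothetical helper, for illustration only.)
    """
    wall_elapsed = wall_after - wall_before
    mono_elapsed = mono_after - mono_before
    step = wall_elapsed - mono_elapsed
    return step if abs(step) > tolerance else 0.0


def sample():
    """Take one reading of both clocks: (wall clock, monotonic clock)."""
    return time.time(), time.monotonic()


if __name__ == "__main__":
    w0, m0 = sample()
    time.sleep(0.1)
    w1, m1 = sample()
    # On a healthy machine the two clocks agree, so this reports no step.
    print("apparent clock step (s):", detect_clock_step(w0, w1, m0, m1))
```

A check like this only spots a jump after it has happened; it doesn't fix anything, and it assumes the monotonic clock itself is behaving, which may not hold on a misbehaving virtual machine.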

I could've handled this so much better, but since I'd just shifted my body clock through seven timezones, I was pretty lucky just to be awake and able to solve the problem in the first place.

In future, I think I might move e621 across to a different operating system and try to optimise its performance, before moving everybody across to the new instance. This will improve maintainability (how easily I can keep the software going) as well as the ease with which bugs can be fixed.

Thanks everyone for your patience; I'm so, so sorry that the site was down for over 24hrs, and I apologise to everyone who lost out on fap time because of it. :(

Varka

Updated by Blaziken

Was just gonna say thumbnails are broken again, but you appear to have fixed that too!

Updated by anonymous

Char

Former Staff

Thanks for all your hard work Varka. <3

Updated by anonymous

Cheers dude, sounds like a bitch.

Debian is meant to be the most stable distro! Tsk.

Updated by anonymous

Actually, in a way I'm almost thankful for the downtime, Varka. I had totally forgotten about the toypics site you run and had not known it had been updated; it was a nice find. Plus I got to go to a site I used to go to all the time till I found e621, and they had a good number of new images there too. So no worries, we're all glad it's back up but we survived. ;-)

Updated by anonymous

I was worried, but this is very much a relief. Thank you for fixing it, Varka.

Updated by anonymous

Awesome, keep it up, Varka. Must be a bitch keeping the site running; I know I couldn't.

Updated by anonymous
