Hey guys.
Just giving you a bit of background and an explanation for the downtime over the last 24-and-a-bit hours.
At approximately 11pm on Sunday 7th March, e621 went offline due to a stupid issue with Debian and Xen virtualization. Namely, the clock became inaccurate (I believe it went backwards), and the operating system freaked the fuck out and crashed.
Around 2-3hrs later, I woke up due to jetlag (I'm back in the UK now, on GMT) and noticed the site was down. I performed some scheduled updates that needed to happen anyway and fixed the clock skew problem (fortunately, an update that fixed it was available).
The site was back online and working fine for approximately 3 hours, from about 5am until 8am - at which point the Ruby on Rails server that runs the actual e621 application fell over, again for no apparent reason.
Unfortunately, because I recently changed from a US to a UK cellphone, the automated downtime monitoring service that watches e621 and my other sites did not notify me - so I went about my entire day oblivious to the site being down (I'd checked an hour or two after it came back up, and there were no problems then).
This entire situation blows, and is one more reason why I hate Debian and Ruby on Rails with a passion (as well as myself for not bothering to check the site more frequently).
But anyway: to stop this happening again, I have set up and tested the downtime notification so it sends messages to my correct phone number, and have performed a large number of updates on the server e621 runs on, including turning on automatic time updating (which should in theory reduce 'clock skew', the reason that e621 shat itself in the first place).
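For anyone curious, a downtime check of this kind boils down to something like the Python sketch below. It is illustrative only and is not the monitoring service mentioned above; the names `site_is_up` and `check_and_notify` and the `notify` callback are made up for the example, and `notify` stands in for whatever actually sends the text message.

```python
import urllib.request


def site_is_up(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx/3xx status, False otherwise."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts are all OSError subclasses.
        return False


def check_and_notify(url, notify, is_up=site_is_up):
    """Check the site once; call notify(message) if it looks down.

    `notify` is a hypothetical callback, e.g. a function that sends an SMS.
    """
    if is_up(url):
        return True
    notify(f"{url} appears to be down")
    return False
```

Running `check_and_notify` with a stub `is_up` that always returns `False` is a quick way to confirm the alert actually reaches the right phone, which is exactly the part that failed here.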
I could've handled this so much better, but having just dragged my body clock through seven timezones, I was pretty lucky to be awake and able to solve the problem in the first place.
In future, I think I might move e621 across to a different operating system and try to optimise its performance, before moving everybody across to the new instance. This will improve maintainability (how easily I can keep the software going) and make bugs easier to fix.
Thanks everyone for your patience; I'm so, so sorry that the site was down for over 24hrs, and I apologise to everyone who lost out on fap time because of it. :(
Varka
Updated by Blaziken