Xepher.Net Forums

Xepher.net => Hosting Q&A => Topic started by: fesworks on March 17, 2007, 06:03:11 PM

Title: What happened with the server? It went down a couple times?
Post by: fesworks on March 17, 2007, 06:03:11 PM
Last two mornings all of Xepher was down for a bit.
Post by: Xepher on March 17, 2007, 06:20:15 PM
I couldn't tell if it was my crummy ISP here, who's messed up and/or broken their routing tables in the past, or if it was the datacenter having connection issues. The server itself hasn't actually gone down at all; it's at 60+ days uptime right now, so it must've been network issues at or near the datacenter. In other words... nothing I can do about it, but I'm sorry nonetheless.
Post by: fesworks on March 17, 2007, 07:31:49 PM
cool :)
Post by: reinder on March 18, 2007, 07:19:24 PM
Seems to have been down again. I got a message from a reader about it.
Post by: Xepher on March 18, 2007, 08:37:31 PM
What times (in UTC or with your timezone/offset) are you running into problems? I've personally only had outages of a few minutes in length at a time when trying to reach the server, not even long enough to disconnect the SSH sessions I leave open overnight, it seems. I know the datacenter had a note about adding new bandwidth lines, and that would cause some minor glitches, but they suggested it would only be for an hour or so during the install. I wonder if it's related.

I'm gonna look into it further with the datacenter, and I'll let you know what I find.
Post by: reinder on March 18, 2007, 08:45:56 PM
I read the message at about 17:00 CET, but it came in on an old email address that I rarely check (and will indeed be canceled in a few months), so it may even have referred to yesterday's outage. What it said was that "ROCR had no IP address" (quoting from memory because I only read that address on the studio machine anyway, and am now at home). Two outages that I was personally witness to happened in the afternoon (CET) on Friday and Saturday. The Friday one was long enough for Project Wonderful to send me email that my ads were suspended until they could confirm that the site was up again.
Post by: Xepher on March 18, 2007, 09:31:39 PM
I wasn't around most of Friday, and didn't have SSH running or anything. That could've been a lot longer and I wouldn't have known.

Okay, found out the datacenter had a major piece of routing equipment fail on them. Their replacement went awry. They've got people up in arms on their support forums, demanding money back and such. Reality is there's nothing to be done while they wait on a shipment of a specialized part.

As for me/us... This is the sort of thing that happens from time to time with this datacenter. IIRC it's been nearly a year since they last had this sort of problem, which isn't THAT unreasonable to me. This is why I brought up the idea of placing the new server with a different facility, but as I pointed out in that thread, they all cost a fair amount more.

I hate downtime enough when the server I run crashes or I personally screw up an upgrade or something... but I take pride that I usually get those things fixed pretty quickly. Having everything randomly down, through no fault of my design or attention... that annoys me. Of course, when the server screws up, I'm grateful that most everyone here doesn't hold it against me, and trust I'm doing my best. I try to hold the same attitude towards the datacenter hosts.

If what they say is true, they had the outage on the 16th and replaced the equipment within 2 hours with their own spare. After a few more hours, they realized something was still wrong and got on the phone with Cisco support, who had them test some things that caused additional outages but eventually determined a component in the new hardware was bad. They replaced the part the morning of the 17th (another outage), but it's not working properly, so they're replacing the whole system. That won't arrive until the 20th, when we can expect one (hopefully final) outage during "offpeak" hours. They say they'll post the exact time on Tuesday.


I know it sucks, but I think I'd rather put up with this once a year or so, than pay twice as much for bandwidth. What do you think? I'd like honest opinions on how important uptime is to you. If we really need to do something different, we should figure that out before it's time to put the new server online.
Post by: reinder on March 18, 2007, 10:14:16 PM
Well, taken as a whole, XN isn't down all that much. It's unnerving when it happens but I could deal with that by being less neurotic.
Post by: griever on March 19, 2007, 12:10:51 AM
Nothing's 100% perfect and this has been running fairly well and for a good price, right?  Unless the frequency increases, I'd say they're still good.  A lot of the sites here have longevity or at least a dedicated fanbase, so I would think that the occasional downtime shouldn't kill them.

I also think I live in offpeak hours. :(
Post by: Xepher on March 19, 2007, 03:21:53 AM
Well, the server is in Chicago, about a thousand miles straight north from me. That's UTC-5 right now due to the stupid new daylight saving time rules. Lowest traffic is usually around 4-6am local, or 0900-1100 UTC, but I wouldn't be surprised if they do it a bit closer to "normal" work hours. They claim it should be a quick downtime this next go. I'm crossing my fingers.
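
For anyone who wants to work out a server-local window in UTC (and from there their own timezone), here's a rough Python sketch. It assumes the fixed UTC-5 offset mentioned above; the function name is made up for illustration, and the 20th is used only as an example date.

```python
from datetime import datetime, timedelta, timezone

# Assumption: server-local time is a fixed UTC-5 offset, as stated in the post.
SERVER_TZ = timezone(timedelta(hours=-5))

def local_window_to_utc(year, month, day, start_hour, end_hour):
    """Convert a server-local hour window on one date to UTC datetimes."""
    start = datetime(year, month, day, start_hour, tzinfo=SERVER_TZ)
    end = datetime(year, month, day, end_hour, tzinfo=SERVER_TZ)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)

# Example: the 4-6am local low-traffic window on March 20, 2007.
start_utc, end_utc = local_window_to_utc(2007, 3, 20, 4, 6)
print(start_utc.strftime("%H:%M"), "-", end_utc.strftime("%H:%M"), "UTC")
```

Swap in whatever hours the datacenter actually posts; the offset would need changing if the daylight saving rules shift again.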