Messages - Xepher

#256
Well, I feel I owe everyone an explanation for the day-long outage yesterday. Let's start from the top.

For the past year, xepher.net was hosted on a virtual machine at a company I used to work for, because I got a discount there, paying $63 for what should've been $250 normally. They finally decided I've been gone long enough that the discount has to go. So I look for new hosting. I find a place (call it option A), and it's $160/month for ostensibly the exact same level of service. I put in an order, and don't hear anything for a couple days, despite emails to their support and such. This isn't very promising, so I look at other options. I find another place (option B), really cheap, $55/month, with a slightly less powerful setup, but more bandwidth and disk space, so I figure, why not? They get me set up pretty much right away, and I spend a day or so moving stuff over there.

I switch all the services on, and it runs great for a few hours, some hiccups, but not a big deal. After a couple days, I figure, eh, this'll work for now, and my contract is up in a day or two anyway on the original service, so I have to move, shutting down the old host and closing that account. I go ahead and let the order for A stay in place though, since I'm curious to compare the two. It comes through about 48 hours ago, 6 full days after I actually ordered it, and by then I'm noticing more and more performance issues with B. The problem is that the CPU is powerful enough, but they have too many users on one machine, sharing the same disk, so anything needing a file gets delayed really badly. Since we have about 150 users here, all with websites, this shows up as a system load of over 100... i.e. there's 100 people waiting in line for a file. At this point, I get a letter from B, saying other customers are complaining because I'm using all the CPU. This isn't the case, as it's the disk that's lagging, and it's not my fault. Other customers may not have 100 people in line, but... well, picture it this way. I have 100 people in line with buckets, to collect their files as they trickle out of the spigot, so my "load" is 100-in-line. The other customers may not have 100 waiting, but they're bringing dumptrucks instead... big file transfers, but since only one person is waiting, their load is only "5" or some such. I tested this by shutting down all services for a bit in the middle of the night, and the load was still sky high, even when my server was doing nothing. Anyway, beside the point... what happens next is the fun part.
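For anyone wondering how a "load" of 100 can happen without the CPU being busy: on Linux, the load average counts tasks that are runnable plus tasks stuck in uninterruptible sleep (state "D", which usually means waiting on disk). Here's a minimal sketch of reading those numbers. The sample line and the function names are made up for illustration; only the /proc/loadavg field layout is standard.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict.

    Field layout: 1-, 5- and 15-minute load averages, then
    "runnable/total" scheduling entities, then the last PID used.
    """
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load_1m": float(fields[0]),
        "load_5m": float(fields[1]),
        "load_15m": float(fields[2]),
        "runnable_now": int(runnable),
        "total_tasks": int(total),
    }


def count_disk_waiters(states):
    """Count tasks in state 'D' (uninterruptible sleep, typically disk I/O).

    `states` is any iterable of one-letter process states, e.g. taken
    from the STAT column of `ps`.
    """
    return sum(1 for state in states if state == "D")


if __name__ == "__main__":
    # Hypothetical sample, not real data from the server in this post.
    sample = "104.12 98.50 75.33 3/412 28731"
    info = parse_loadavg(sample)
    print(info["load_1m"], "load with only", info["runnable_now"], "runnable")
```

A high load average next to a small runnable count is the signature of tasks queued on I/O rather than on CPU, which is the "people in line with buckets" situation.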

The owner of B emails me, with the complaints... but instead of giving me a chance to respond or work on it... he just shuts down my server. And in the email he tells me he's installed a script that will automatically shut it down again anytime load goes over 10. Keep in mind, there's already a line of 20 dumptrucks (each from a different customer) in line, but if I so much as get 10 people with buckets at the end of the line, I'm shut down again. I can reboot the thing through the control panel, but it obviously mucks with stuff to just reboot every few minutes. I write out a reply, explaining all this, but he apparently went to bed, because I don't get any response for 8 hours. In the meantime, I do the only thing I can. I shut down the web service and try to migrate stuff as quickly as possible from host B to host A.

And then it gets more fun... host A is even slower on the disk response than B. As in, I can literally download it from the internet faster than it can save it to disk, so it's taking like 18 hours to move all this stuff. I just kept at it for 6 hours or so, but realized this place will probably be even worse, and for $160 there's got to be a better option. Finally I go with ThePlanet, and get a dedicated server. No sharing it with anyone... no VMs, no VPS, pure hardware all for me! And it's $125.

I order it, but one catch, it doesn't come with Gentoo, the OS I use. So either I rewrite all my management scripts and all the custom programming, or I find a way to install Gentoo myself. The thing has two disks, so it should work, but... well... OS install via the internet. Yay. It's slow going, and I make some headway, but finally, I reboot and it won't come back up. I give in, and tack on another $30/month for a KVM (virtual keyboard/monitor access) to let me fix things. That takes another couple hours for them to set up.

So now I have KVM, and I work through stuff for a good solid 30 minutes, then it refuses to boot because I screwed up a boot menu option. No biggy, reset it and pick another thing from the menu. FAIL! I can't reset it with the ctrl+alt+del, since it's frozen. So I use their panel to power it off and back on... but my KVM goes dead and stops responding as soon as I do this. Later, I find out the KVM is powered by USB, and takes like 60 seconds to come up. So, by the time it powers up, the computer has already booted, and once again gone with the broken default!! So I put in a ticket for them to go reboot it manually, and wait some more.

They're actually pretty good on this, get it done in like 20 minutes, and I finally get a working, bootable system. Then I start copying all data off of B, because that's the last place a complete copy of xepher.net ever actually made it to. (Well, I have a full backup of everything on my desktop, but uploading from a home connection would take 3 days or more.) The attempted copy to A has been going for the entire day, and it's still not done either! I nix that, and just focus on moving everything into the physical machine. It finally finished about two hours ago. I've been up since 5pm on Tuesday, and it's now the wee hours of the morning on Thursday. 31 hours awake, and 25 of them trying to sort all this out.

There are a thousand more little things that went wrong of course, but I'm too tired to rant anymore. Bottom line is, while it wasn't my fault, I do apologize. I never like to get caught with my pants down, so to speak, and all this hit when my options/infrastructure were weakest. Almost any other time I have a backup plan that's better than this, but having to move only 2 days after you just finished one... and with a script rebooting your only functional system anytime you so much as look at it funny... Oi!

Well, with all that said, I'm off to bed. If you run into any problems or snags, it's entirely possible I missed 1 (or 100) things in this, so do let me know. Also, any of you hosting here... feel free to tell your readers/visitors what happened, and send 'em here, anyone is welcome to ask questions if you want clarification or whatnot... it's not just for members here.

Somehow, I keep thinking all this must've been 24 hours early, because I would swear this much fail would have to be an April Fool's Day prank!
#257
Announcements / Re: Server Move
April 01, 2010, 03:18:28 AM
Ugh... See updated post at top.
#258
General Chat / Re: Battlefield Bad Company 2
March 31, 2010, 03:31:59 AM
Yeah, I don't play too often, but I'll probably still be playing in a few months... it's the kind of game I enjoy dropping into a couple times a week.
#259
Applications / Re: The Weekly Toot
March 27, 2010, 09:51:48 PM
Sorry I haven't had time to look at this... I'm still running into headaches from the server move. I'll get to this after things get sorted out.
#260
Announcements / Re: Server Move
March 27, 2010, 06:28:38 AM
Okay, rough ride so far. Server ran out of memory a bit ago, made a lot of things act wonky, including DNS. Some sites may have cached bad DNS results, but that should expire in an hour or so. I'm tuning things to try and use less memory, since this server has no swap. Let me know if you come across any other issues.
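If you want to check for the same "no swap" situation on a Linux box of your own, a rough sketch like this reads it out of /proc/meminfo-style text. The sample numbers and function names are invented for illustration and are not this server's actual figures.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer value}.

    Most values are in kB; a few fields (e.g. HugePages_Total)
    are plain counts with no unit.
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def has_swap(meminfo):
    """True if the system reports any swap space at all."""
    return meminfo.get("SwapTotal", 0) > 0


if __name__ == "__main__":
    # Hypothetical sample: a VM with memory but no swap configured.
    sample = (
        "MemTotal:        2097152 kB\n"
        "MemFree:           51200 kB\n"
        "SwapTotal:             0 kB\n"
        "SwapFree:              0 kB\n"
    )
    info = parse_meminfo(sample)
    print("swap available:", has_swap(info))
```

With no swap, the kernel has nowhere to push idle pages once memory fills up, so processes start failing allocations or getting killed, which would fit the kind of wonkiness described above.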
#261
Announcements / Re: Server Move
March 26, 2010, 10:50:21 PM
Just realized the localtime hack above wouldn't work either, since it's only for initial boot. No worries though, since they synced the host clock now.
#262
Announcements / Re: Server Move
March 27, 2010, 08:20:55 AM
As for why I can't set the time... it's OpenVZ (not Xen), which shares a kernel across instances. The clock is not virtualized, so changing the kernel clock changes it for all VMs on the node... not something you want a customer being able to do, and so CAP_SYS_TIME is off by default in VMs.
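For the curious, here's a minimal sketch of how you could confirm the missing capability from inside a VM. It parses the CapEff line that Linux exposes in /proc/self/status and tests the bit for CAP_SYS_TIME (number 25 in linux/capability.h). The function name is just for illustration.

```python
CAP_SYS_TIME = 25  # capability bit number from linux/capability.h


def has_capability(status_text, cap_bit):
    """Check one capability bit in the CapEff mask of /proc/<pid>/status text.

    CapEff is a hexadecimal bitmask of the process's effective capabilities.
    """
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << cap_bit))
    raise ValueError("no CapEff line found")


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as handle:
            status = handle.read()
        print("can set the clock:", has_capability(status, CAP_SYS_TIME))
    except (OSError, ValueError):
        # Not Linux, or /proc is unavailable or unreadable.
        print("could not read capabilities on this system")
```

Run it as root inside the VM; if it reports False even for root, the host has stripped the capability and no amount of `date -s` is going to work.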

Faking the timezone isn't good either, because everything automatically adjusts based on the timezone. For example, the forums here display all times/dates converted to whatever timezone the user sets in their profile. If I set the system timezone to Australia, it's then going to convert BACK to a US zone, and it'll still be off by 12 hours (and 2 minutes, 51 seconds.) Though yes, I could probably hack it by setting the system to think the clock is in localtime on the "hardware" clock, and making a custom timezone file that's offset by 12 hours, 3 minutes... but really, they should just run NTP on the node/host anyway, since every client is gonna have the same problem. I'm just waiting for the support ticket I raised to get attention, then it should get fixed properly.
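To put numbers on why faking the timezone doesn't help, here's a small sketch. It assumes (my simplification) that software like the forum starts from the system's UTC time and converts to each user's zone; the function names and the example date are made up. Note that the system timezone setting never enters the calculation, so the error is always the raw clock skew.

```python
from datetime import datetime, timedelta, timezone

# Skew mentioned above: the node clock is 12 hours, 2 minutes, 51 seconds fast.
SKEW = timedelta(hours=12, minutes=2, seconds=51)


def displayed_time(true_utc, clock_skew, user_offset_hours):
    """Time shown to a user when the system clock is off by `clock_skew`."""
    system_utc = true_utc + clock_skew  # what the skewed clock reports
    user_tz = timezone(timedelta(hours=user_offset_hours))
    return system_utc.astimezone(user_tz)


def display_error(true_utc, clock_skew, user_offset_hours):
    """Difference between what the user sees and the correct local time."""
    user_tz = timezone(timedelta(hours=user_offset_hours))
    correct = true_utc.astimezone(user_tz)
    return displayed_time(true_utc, clock_skew, user_offset_hours) - correct


if __name__ == "__main__":
    now = datetime(2010, 3, 26, 18, 0, tzinfo=timezone.utc)  # arbitrary example
    for offset in (-6, 0, 10):  # e.g. US Central, UTC, eastern Australia
        print(offset, display_error(now, SKEW, offset))
```

Whatever zone the user picks, the error comes out as the same 12:02:51, which is why the real fix has to be on the host's clock (NTP on the node) rather than anything done inside the VM.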
#263
Announcements / Re: Server Move
March 26, 2010, 11:32:19 PM
Alright, move is complete. If you're reading this, it's on the new server.

One small problem so far... clock on this node is 12 hours fast. As it's a VM, I can't change it, so I've put in a ticket. Until they correct it though, times (such as on the forum) are going to be off a bit. Shouldn't really impact much though.


If you run into any other problems, or have connection issues, let me know. I don't really know how solid this new service is going to be, so report any issues to me so I can have a better overview of the quality.
#264
Announcements / Re: Server Move
March 26, 2010, 08:44:37 AM
Okay, move is going to happen in about 2 hours or so. There will be some interruption in service, since I need to halt everything while I do a final sync to the new machine.

Also, this is a secondary host I'm moving to. The initial choice I made... well, I put in an order, and 36 hours (and 2 emails) later, I still haven't heard a word from them, so I went with another, cheaper option, who was MUCH more responsive. We'll see how this works out. Please give me until 10AM CST, after that, if there are any problems, let me know. I'm real curious to see how the performance on this new system is.
#265
General Chat / Re: Battlefield Bad Company 2
March 26, 2010, 02:26:33 AM
Yeah, I should've said, I'm on the PC version. I don't like aiming with my left thumb on a console. :-)
#266
General Chat / Battlefield Bad Company 2
March 24, 2010, 11:46:55 PM
Anyone here play it? I got it a couple weeks ago, and finally found a game I'm enjoying again (even though the server reliability sucks). Problem is, I don't know anyone else who plays. If you do, and want to add me as a friend, my soldier name is Xepher42 (surprise!)
#267
Announcements / Re: Server Move
March 24, 2010, 11:27:39 PM
I've decided to go with VPSVillage. They offer actual 4GB Xen instances at a price about what I was paying for colo on the arclight server a year ago... $160. That's about $100 more than I've been paying this past year, but I figure better to give money to some actual geeks doing quality work, than half-cocked, outsourced budget hosting. As for the actual hosting, well, it seems to offer nearly everything that Linode does, up to and including letting you run your own custom kernels and OS images, but for a cheaper price. Anyway, I've placed my order, and they say "up to 3 days for setup" but I hope it'll be quicker than that. Tentatively, I'm aiming to do migration late at night in the next few days. Should be done by Monday at the latest. I'll post more updates when I have an exact time.
#268
Announcements / Re: Server Move
March 24, 2010, 07:41:24 AM
Oh, and if anyone has suggestions for a good VPS host that supports Gentoo, lemme know. I'm currently considering Burst, Hostlatch, and Delimiter. All of them are rather on the cheap side, but it looks like I may actually get MORE bandwidth/storage for less money than even my super-cheap discount does now. This could be a good thing overall.
#269
Announcements / Server Move
March 24, 2010, 04:44:01 AM
UPDATE UPDATE: FRAK! Well, that didn't go so well, did it? Long story short, we moved 2 more times in the past 24 hours. I think we're settled down finally. I've been up 30 hours right now, and I'll make a new post later tonight or maybe tomorrow that'll be LONG and explain everything that happened.

UPDATE: Move is complete. So far, bumpy ride, and there've been some issues and brief outages. I'm tuning things to try and make it work better on the new system. Please let me know if/when you run into problems.

Fasten your seatbelts, we're about to move to a new server. Xepher.net has been hosted on a discounted plan I got through my old employer, and now that discount is going away. As such, I'm going to have to find a new home for everything. Thankfully things are all virtual these days, so it should be just a matter of finding a new host and transferring the data. I expect it to be mostly seamless, with a few interruptions in service here and there, maybe a couple hours downtime at most. There's always the chance things could go south though, so I'm giving as much notice as I can here. I just found out a few hours ago myself. Worst case, I do have full, nightly backups of everything on my own PC here, so no matter what happens, all the data is safe, and I will get it all back up and working sooner rather than later.

The good news about still being unemployed though, is that I have plenty of free time to work on this. :-) The move may happen as soon as tomorrow, so by the time some of you read this, it might already be past tense. In any case, keep an eye on this thread for updates, and if xepher.net is unreachable, feel free to ping me for status updates on AIM/Gtalk/YahooIM... my screenname is Xepher42.
#270
Applications / Re: Furtigo
March 23, 2010, 12:05:36 AM
You said you have two issues on web comics nation already... you should start by linking or showing us those.