
[UPCOMING] Multi-stage server maintenance


Nekone



UPDATE March 17: Extra work will be needed after today's maintenance. This will be scheduled for early April.

 

UPDATE March 13: Unfortunately, I got sick this week and I'm entering a long workweek, so Stage 1 maintenance has been postponed until Saturday, March 17th at 7:00 PM CDT (12:00 AM GMT on Sunday, March 18th). Sorry.

 

 

I will be conducting major server work on the Kametsu server within the next few weeks. Due to the extensive and complicated nature of this maintenance, it will be done in multiple stages so as not to disrupt the community for too long a time.

 

The first stage will only affect the web components of Kametsu. This includes the forums as well as the various project sites and project-related services, and should be relatively short in duration. The second stage will affect *EVERYTHING* on the server, INCLUDING the XDCC bots themselves, and will likely be of significant duration, although efforts will be made to reduce this impact as much as possible.

 

During the first stage, I will be making some critical adjustments to the SSL encryption parameters as well as updating the SSL software libraries responsible for providing this encryption. I am allocating a maintenance window of 2 hours maximum, although I anticipate that any downtime will be limited to 5-15 minutes at any given time during this window. Again, this stage will only affect the web-based services. XDCC and IRC functions will continue to operate without interruption during the maintenance, although the XDCC parser website itself will not be operable during this time. We will direct people to the IRC channel during the maintenance work, as updates will be posted there first.
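
As a rough illustration of the kind of "SSL encryption parameters" involved, here is a minimal Python sketch that builds a client-side context refusing anything older than TLS 1.2 and reports what a server actually negotiates. The function names and the TLS 1.2 floor are assumptions made for the example, not the actual settings being applied to the Kametsu server.

```python
import socket
import ssl


def make_strict_context() -> ssl.SSLContext:
    """Build a client context that refuses protocols older than TLS 1.2 (assumed floor)."""
    ctx = ssl.create_default_context()  # verifies certificates and hostnames by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_parameters(host: str, port: int = 443, timeout: float = 5.0):
    """Connect to host and return (protocol version, cipher name) as negotiated."""
    ctx = make_strict_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Running `negotiated_parameters` against a site before and after this sort of maintenance is one simple way to see whether the protocol version or cipher on offer has changed.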

 

The second stage is going to involve a lot more work, and likely quite a bit of downtime. I will be applying critical security updates to the Linux kernel that powers this server. Unfortunately, because the Linux kernel is the core of the operating system itself, and because the particular security updates in question cannot be live-patched into it, at least one, and possibly two, server reboots will be required to fully implement them. This will result in community-wide disruption during each reboot, affecting everything from the web-based functions (forums, project sites, etc.) to IRC and XDCC as well. Because of this, a date for this particular portion of the work will not be set until adequate planning has been done ahead of time - I will be consulting with @Koby on how to go about this.

 

The tentative maintenance schedule is as follows. This post will be updated if anything changes, so monitor this thread carefully.

 

  • Maintenance, Stage 1: Complete. Please report ANY problems to IkarosBD.
  • Maintenance, Stage 2
    • When: Postponed indefinitely
    • Duration: 3 hours (tentatively)
    • Downtime: Up to 1 hour (tentatively)
    • Affected services: All (web, IRC, XDCC functions)
    • For: Major Linux kernel update + kernel security updates

 

I sincerely apologize in advance for any inconvenience this may cause you. We care very deeply about the security of our community and strongly believe this is necessary to maintain that secure environment. Thank you for your understanding... and as always, thank you for being a part of this community! It just wouldn't be Kametsu without all of you!

Edited by IkarosBD

Your work is underappreciated and we love you. Thanks for the heads up and for taking care of this in a way more professional manner than basically every other community I'm currently a part of.

 

seriously, like 1/3-1/2 of the trackers I regularly use are down or broken in some way right now with no notice or any idea of why, or how long it'll be til they come back. this here is top-tier service, people.


Can people be redirected to a page that explains why it is down? I'm not web savvy, so I'm not sure if that's even a thing.

If it is a thing, it should be Catar's avatar saying, "Yup. I broke it. Be back soon!" or some way of simply implying Catar is 100% at fault.

 

This was a fairly long post for joking around with Catar. I will see myself out.


  • Catar unpinned this topic
6 hours ago, NeutralHatred said:

Can people be redirected to a page that explains why it is down? I'm not web savvy, so I'm not sure if that's even a thing.

If it is a thing, it should be Catar's avatar saying, "Yup. I broke it. Be back soon!" or some way of simply implying Catar is 100% at fault.

 

This was a fairly long post for joking around with Catar. I will see myself out.

Unfortunately, since the web server software itself will be stopped in order to apply the updates, redirecting isn't possible. Nor could I run a temporary server on it as it would also have to be capable of SSL since we use SSL everywhere now...and that won't work because the SSL libraries are exactly what I'm updating. Nor would this work for the second stage affecting the kernel, as the entire server itself would have to be rebooted.

 

The only way to do something even remotely similar is to temporarily alter the DNS records. This also won't work, for a number of reasons. For one, DNS is not instantaneous: when changes are made to a domain's DNS records, they take time to become fully effective and known across the internet, due to the design of DNS.

 

And even once the DNS update has fully taken hold, users who visited the site shortly before the change will STILL keep trying to connect to the old IP, because their own DNS resolvers hold a cached response that hasn't expired yet. Caching works against us here: those resolvers won't bother asking for updated records until the record's "Time to Live" (TTL) period expires, even if a change is already in effect. So for a time, a (possibly significant) number of users will continue to hit the old IP even after we've changed DNS to point at a temporary IP hosting the temporary webserver.
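
To make the caching problem concrete, here is a toy Python model of a caching resolver. It is only a sketch: the hostname, the IP addresses (taken from the reserved documentation ranges), and the one-hour TTL are made-up values, and real resolvers are considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    ip: str
    expires_at: float  # time (in seconds) at which the cached answer stops being used


class CachingResolver:
    """Toy resolver: reuses a cached answer until its TTL runs out."""

    def __init__(self, authoritative: dict):
        self.authoritative = authoritative  # hostname -> (ip, ttl_seconds)
        self.cache = {}

    def resolve(self, name: str, now: float) -> str:
        cached = self.cache.get(name)
        if cached is not None and now < cached.expires_at:
            return cached.ip  # still fresh, so the authoritative data is not consulted
        ip, ttl = self.authoritative[name]
        self.cache[name] = CachedRecord(ip, now + ttl)
        return ip


# Hypothetical zone data: the main server, published with a 1-hour TTL.
zone = {"forum.example": ("192.0.2.10", 3600)}
resolver = CachingResolver(zone)

before = resolver.resolve("forum.example", now=0)      # user visits before maintenance
zone["forum.example"] = ("198.51.100.7", 3600)         # DNS switched to the temporary server
during = resolver.resolve("forum.example", now=600)    # 10 minutes later: cache still valid
after = resolver.resolve("forum.example", now=3600)    # TTL has expired: new answer fetched
```

Here `during` is still the old address even though the zone was changed ten minutes earlier, which is exactly the window in which visitors would keep hitting the server that is down. Lowering the TTL well ahead of the switch shortens that window but does not eliminate it.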

 

Sorry if that got too technical for you, @NeutralHatred. The TL;DR is that, given the way this site operates now and the way this work is being done, there's no feasible way to keep any sort of temporary live site up without interruption. Can't do it with DNS, can't do it with anything else.


8 hours ago, IkarosBD said:

Unfortunately, since the web server software itself will be stopped in order to apply the updates, redirecting isn't possible. Nor could I run a temporary server on it as it would also have to be capable of SSL since we use SSL everywhere now...and that won't work because the SSL libraries are exactly what I'm updating. Nor would this work for the second stage affecting the kernel, as the entire server itself would have to be rebooted.

 

The only way to do something even remotely similar is to temporarily alter the DNS records. This also won't work, for a number of reasons. For one, DNS is not instantaneous: when changes are made to a domain's DNS records, they take time to become fully effective and known across the internet, due to the design of DNS.

 

And even once the DNS update has fully taken hold, users who visited the site shortly before the change will STILL keep trying to connect to the old IP, because their own DNS resolvers hold a cached response that hasn't expired yet. Caching works against us here: those resolvers won't bother asking for updated records until the record's "Time to Live" (TTL) period expires, even if a change is already in effect. So for a time, a (possibly significant) number of users will continue to hit the old IP even after we've changed DNS to point at a temporary IP hosting the temporary webserver.

 

Sorry if that got too technical for you, @NeutralHatred. The TL;DR is that, given the way this site operates now and the way this work is being done, there's no feasible way to keep any sort of temporary live site up without interruption. Can't do it with DNS, can't do it with anything else.

Technically speaking, you could spin up another web server (e.g. lighttpd with just HTTP) to display the maintenance message during your maintenance window. After you're done with whatever you're doing, you can just turn off lighttpd and start Apache (or nginx) again.
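
For illustration, the same idea can be sketched with Python's built-in `http.server` standing in for lighttpd. This is an assumption-laden stand-in, not a recommendation for the actual setup: the message text, the 15-minute `Retry-After` value, and the port are placeholders.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

MAINTENANCE_BODY = b"Down for maintenance. Please check back soon.\n"  # placeholder text


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answers every GET with 503 Service Unavailable and a short notice."""

    def do_GET(self):
        self.send_response(503)
        self.send_header("Retry-After", "900")  # hint to clients: retry in 15 minutes
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(MAINTENANCE_BODY)))
        self.end_headers()
        self.wfile.write(MAINTENANCE_BODY)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port: int = 8080) -> HTTPServer:
    """Bind to localhost only; a real deployment would listen on the public interface."""
    return HTTPServer(("127.0.0.1", port), MaintenanceHandler)

# To run it by hand: make_server(8080).serve_forever()
```

Note that this only answers plain HTTP requests. Visitors arriving over HTTPS would still see a connection error, since nothing is listening on the TLS port while the SSL libraries are being updated.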


Thanks for the heads up.

 

If this site has any social media accounts, like Twitter, that would be a good way to give status updates on how things are going while the actual site is down. If anything exists other than Discord I'll definitely follow (I don't use Discord, not even the slightest clue how it works).


1 hour ago, GleefulChibi said:

Thanks for the heads up.

 

If this site has any social media accounts, like Twitter, that would be a good way to give status updates on how things are going while the actual site is down. If anything exists other than Discord I'll definitely follow (I don't use Discord, not even the slightest clue how it works).

We don't offer enough in terms of services to need a Twitter account for it. Discord and IRC should suffice for any announcements we may need to make.


1 hour ago, GleefulChibi said:

Thanks for the heads up.

 

If this site has any social media accounts, like Twitter, that would be a good way to give status updates on how things are going while the actual site is down. If anything exists other than Discord I'll definitely follow (I don't use Discord, not even the slightest clue how it works).

There actually are official Facebook and Twitter accounts, linked from the buttons at the top and bottom of every page. They're mostly unused, though.


8 hours ago, ZeroPenguins said:

Technically speaking, you could spin up another web server (e.g. lighttpd with just HTTP) to display the maintenance message during your maintenance window. After you're done with whatever you're doing, you can just turn off lighttpd and start Apache (or nginx) again.

@ZeroPenguins yeah, I'm aware. I didn't want to go that route because we use HTTPS. I mean, on a temporary basis maybe, and we could only do that because we don't (yet) enforce HTTPS via the HSTS mechanism. Enabling HSTS is part of the SSL maintenance, actually. Since we already have HTTPS enabled for pretty much anything hosted on the domain, it now makes practical sense to turn HSTS on.
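
For anyone unfamiliar with HSTS, the sketch below shows roughly what is involved, under stated assumptions. The `Strict-Transport-Security` header name and its `max-age` and `includeSubDomains` directives are standard, but the helper functions and the one-year value are purely illustrative and say nothing about what will actually be configured here.

```python
def hsts_header(max_age_seconds: int, include_subdomains: bool = False) -> tuple:
    """Build the (name, value) pair a server would send to switch HSTS on."""
    value = f"max-age={max_age_seconds}"
    if include_subdomains:
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


def browser_would_force_https(seen_at: float, max_age_seconds: int, now: float) -> bool:
    """Simplified browser rule: once the header has been seen, plain HTTP is
    upgraded to HTTPS until max-age has elapsed."""
    return now < seen_at + max_age_seconds


header = hsts_header(31536000, include_subdomains=True)  # one year, illustrative value
```

This is why a plain-HTTP maintenance page stops being an option once HSTS is on: a returning visitor's browser would insist on HTTPS for as long as `browser_would_force_https` holds, and would never request the HTTP page at all.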


On 3/3/2018 at 5:17 PM, IkarosBD said:

@ZeroPenguins yeah, I'm aware. I didn't want to go that route because we use HTTPS. I mean, on a temporary basis maybe, and we could only do that because we don't (yet) enforce HTTPS via the HSTS mechanism. Enabling HSTS is part of the SSL maintenance, actually. Since we already have HTTPS enabled for pretty much anything hosted on the domain, it now makes practical sense to turn HSTS on.

 

Damn it! @IkarosBD stop being so damn nice, I'm used to admins acting like they're gods, where no one (and I mean no one) can give them feedback (positive or not); it's their way or banned... welp, time to donate again to the nice ppl. :D


6 hours ago, Alacardjr said:

 

Damn it! @IkarosBD stop being so damn nice, I'm used to admins acting like they're gods, where no one (and I mean no one) can give them feedback (positive or not); it's their way or banned... welp, time to donate again to the nice ppl. :D

(off-topic)

You wouldn't know it by the way I carry on these days, but back when I was fresh out of high school...hell, not even then, barely coming out of my junior year of high school in fact...that was exactly the type of admin I was on an MMORPG community forum AND its associated IRC channel (neither of which exists today because of staff infighting and disagreements between mods/admins; nobody worked well together at all). Even though I was technically not its owner, I still ran around the place acting like I owned it, not caring for anyone's opinion but my own, and banning people left and right at even the slightest hint of defiance. Heck, I banned people just for looking at me the wrong way (e.g. the simple "I don't like you" ban). I was much worse on IRC than I was on the forum itself - one year I ended up with a kick counter of 1,200+, more than 15 times that of anyone else on staff.

 

Thankfully for everyone here, those days are at least a decade behind me, and over that time I've matured quite well and I certainly have a hell of a lot more sense than I did back then. Those days were nasty for me, and looking back at them now, it's very difficult to believe just how nasty I was back then, just how much of an ass I was...and above all else, just how much I truly hated myself for it.

(/off-topic)

 

Back on topic: after an inquiry with @Koby, I have updated the original post, as we have decided on at least a tentative day for the 2nd half of the planned work. It will be on or after March 18th, depending on my own (personal) schedule. There's also still some uncertainty about exactly how much time I'll need to make sure this is done right, and how much downtime I'll require. Right now I have a 3-hour window allocated with up to 1 hour of downtime, but this will likely change as we get closer. To help minimize this as much as possible, I'm going to repeatedly go back over every step of the process, making adjustments and ensuring my bases are covered so this goes as smoothly as possible. As part of this, I'll be continually ensuring the environment is consistent and fully up-to-date leading up to the actual work, so as to minimize any potential for problems.

 

Also, what we will probably do during maintenance (at least for the 2nd portion of it) is point DNS at a temporary webserver that will have links to IRC and our Discord. The webserver will likely be hosted off-site on one of my servers. This page will also contain any status update messages I feel are necessary during the process. It will have an auto-refresh mechanism so that you don't have to constantly refresh manually. We will tweak DNS so that any switch between the temporary webserver and the main one goes as quickly as possible.
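
As a rough idea of what such a page could look like, here is a small Python sketch that renders a status page with an auto-refresh tag. Everything specific in it is a placeholder: the function name, the 60-second interval, and the IRC and Discord URLs are assumptions for the example, not the real links or the real page.

```python
from html import escape


def render_status_page(messages, refresh_seconds=60,
                       irc_url="irc://irc.example.net/channel",
                       discord_url="https://discord.example/invite"):
    """Return an HTML status page that reloads itself every refresh_seconds."""
    items = "\n".join(f"    <li>{escape(m)}</li>" for m in messages)
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta http-equiv="refresh" content="{int(refresh_seconds)}">
  <title>Maintenance in progress</title>
</head>
<body>
  <h1>Maintenance in progress</h1>
  <p>Live updates: <a href="{escape(irc_url)}">IRC</a> | <a href="{escape(discord_url)}">Discord</a></p>
  <ul>
{items}
  </ul>
</body>
</html>
"""


page = render_status_page(["Rebooting <now>"], refresh_seconds=30)
```

The `meta http-equiv="refresh"` tag is what does the auto-refreshing: the browser simply reloads the page at the given interval, so new status messages show up without any scripting.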

 

SIDE NOTE: While I did mention IRC as being 'affected' for the 2nd part of the work, this only affects the particular server that is attached to the IRC network hosting our main channel. The IRC network itself will very much remain fully available to connect to via its remaining servers.


How dare you shut down for any time at all, and for what, doing maintenance on the site to make it better? How dare you, sir? Just how dare you? Nah, I'm just kidding. Seriously though, thank you very much for all the time and effort put into the site and trying to make it run smoothly and everything. And thanks for the heads up so everybody isn't worried about it being taken down because of an attack or anything.


Welp, I ended up getting sick to start off the week. On the only two off-days I have this week, no less. I'm entering a long workweek, so the time I had to do this has just disappeared. Stage 1 maintenance is postponed until Saturday evening. Main post has been updated. Obviously, no server changes have occurred yet.


35 minutes ago, IkarosBD said:

Welp, I ended up getting sick to start off the week. On the only two off-days I have this week, no less. I'm entering a long workweek, so the time I had to do this has just disappeared. Stage 1 maintenance is postponed until Saturday evening. Main post has been updated. Obviously, no server changes have occurred yet.

Hope you get better.


  • Catar unfeatured this topic
